[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table
[ https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652958#comment-16652958 ] Yonatan Gottesman commented on OMID-117: looks good > Ensure timeouts are configured low for RPCs to commit table > --- > > Key: OMID-117 > URL: https://issues.apache.org/jira/browse/OMID-117 > Project: Apache Omid > Issue Type: Bug >Reporter: James Taylor >Priority: Major > Attachments: OMID-117.patch, OMID-117_addendum1.patch, > OMID-117_hbase2.patch, OMID-117_v2.patch, OMID-117_v3.patch, > OMID-117_v4.patch, OMID-117_v5.patch, OMID-117_v6.patch, OMID-117_v7.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table
[ https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652692#comment-16652692 ] James Taylor commented on OMID-117: --- [~yonigo] - please review my addendum patch. I lowered the default number of read timeouts to one - this seems to have fixed the issue and wouldn't be an issue in production. Also, worst case, we could always override the default (without needing a release) to be larger if we encounter issues.
[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table
[ https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650343#comment-16650343 ] James Taylor commented on OMID-117: ---
{quote}About the retries, what's the worst thing that can happen with this? How bad is it to have it like this?{quote}
Check out the comment I added to RegionConnectionFactory:
{code:java}
// This setting controls how many retries occur on the region server if an
// IOException occurs while trying to access the commit table. Because a
// handler thread will be in use while these retries occur and the client
// will be blocked waiting, it must not tie up the call for longer than
// the client RPC timeout. Otherwise, the client will initiate retries on its
// end, tying up yet another handler thread. It's best if the retries can be
// zero, as in that case the handler is released and the retries occur on the
// client side. In testing, we've seen NoServerForRegionException occur which
// is a DoNotRetryIOException which are not retried on the client. It's not
// clear if this is a real issue or a test-only issue.
private static final int DEFAULT_COMMIT_TABLE_ACCESS_ON_READ_RETRIES_NUMBER = 11;
private static final int DEFAULT_COMMIT_TABLE_ACCESS_ON_READ_RETRY_PAUSE = 100;
{code}
As it is with this patch, if retries are necessary to reach the RS hosting the commit table, they will occur from the RS handling the scan for 48 seconds. During this time, the handler thread will be tied up (i.e. it won't be able to be used by any other HBase client). If this occurs for all the handler threads on the RS, then all incoming requests would be queued. For example, non-transactional queries would potentially not be processed during this time. If the retries (and pauses) occur on the client side, then non-transactional workloads wouldn't be impacted.
Ideally, we'd have a test that reproduces this NoServerForRegionException and see if any changes are needed to handle this situation. You might be able to repro this by manually splitting the commit table and then performing a read against a transactional table. It may also occur only the very first time an RS tries to reach the commit table after the commit table is created.
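The 48-second figure in the comment above can be checked with a small, dependency-free sketch. It assumes the pauses follow the backoff multiplier table from HBase 1.x (HConstants.RETRY_BACKOFF), copied here as a local constant; that table, the class name RetryPauseBudget, and the method totalPauseMillis are assumptions made for illustration and are not part of the patch. Jitter and the time spent inside each RPC are ignored, so this is only a nominal lower bound.

```java
final class RetryPauseBudget {
    // Assumption: mirrors HConstants.RETRY_BACKOFF in HBase 1.x; verify against your HBase version.
    private static final int[] RETRY_BACKOFF =
            {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    /** Sums the nominal pause (no jitter) taken before each of the given number of retries. */
    static long totalPauseMillis(long basePauseMillis, int retries) {
        long total = 0;
        for (int attempt = 0; attempt < retries; attempt++) {
            // Past the end of the table, keep using the last multiplier.
            int index = Math.min(attempt, RETRY_BACKOFF.length - 1);
            total += basePauseMillis * RETRY_BACKOFF[index];
        }
        return total;
    }

    public static void main(String[] args) {
        // Defaults quoted in the comment above: 11 retries with a 100 ms base pause.
        System.out.println(totalPauseMillis(100, 11) + " ms"); // prints "48100 ms"
        // With a single retry, the handler thread is held for one base pause only.
        System.out.println(totalPauseMillis(100, 1) + " ms");  // prints "100 ms"
    }
}
```

Under these assumptions, most of the budget comes from the later retries, which is why lowering the retry count shortens the time a handler thread is held far more than lowering the base pause would.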
[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table
[ https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649798#comment-16649798 ] Yonatan Gottesman commented on OMID-117: Hi, it looks good. I don't understand what you said about changing the poms; all tests pass without changing anything (hbase2 too). About the retries, what's the worst thing that can happen with this? How bad is it to have it like this? If it's bad I can investigate why the tests don't pass. OK to commit.
[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table
[ https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649618#comment-16649618 ] James Taylor commented on OMID-117: --- The InterRegionServerRpcController is only a few lines of code and you'd need the constructor in both classes, so I don't think it's worth it. I've attached a v7 that passes for hbase-1 and hbase-2. For the tests to pass, I had to have the server-side retry. Without this, we'd get a NoServerForRegionException which is a DoNotRetryIOException so doesn't trigger any retries on the client. I'm not sure if this is a test-only issue. I added a long comment, but I don't have the cycles to explore further. For hbase-2 I could only test it by changing the poms to make hbase-2 the default profile (see attached). I'll let that be figured out for OMID-109. Is this ok to commit, [~yonigo]?
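The point about DoNotRetryIOException can be pictured with a minimal sketch of a client-side retry loop that gives up at once on a non-retriable failure. NonRetriableException, Call, and attemptsMade are hypothetical stand-ins written for this illustration; they are not HBase classes, and the loop is not HBase's actual retry code. The sketch uses Java 8 lambdas.

```java
import java.io.IOException;

final class ClientRetrySketch {
    /** Hypothetical stand-in for an exception type that the client must not retry. */
    static class NonRetriableException extends IOException {
        NonRetriableException(String message) {
            super(message);
        }
    }

    /** Hypothetical remote call that may fail with an IOException. */
    interface Call {
        void run() throws IOException;
    }

    /** Returns how many attempts were made before the call succeeded or the loop gave up. */
    static int attemptsMade(Call call, int maxAttempts) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            try {
                call.run();
                return attempts; // success, stop retrying
            } catch (NonRetriableException e) {
                return attempts; // fail fast: no further attempts are made
            } catch (IOException e) {
                // retriable failure: fall through and try again
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        int fatal = attemptsMade(() -> { throw new NonRetriableException("no server for region"); }, 5);
        int retriable = attemptsMade(() -> { throw new IOException("timeout"); }, 5);
        System.out.println(fatal + " " + retriable); // prints "1 5"
    }
}
```

In this picture, a failure of the non-retriable kind surfaces after a single attempt, which is consistent with the tests failing unless some retry happens on the server side.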
[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table
[ https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648987#comment-16648987 ] James Taylor commented on OMID-117: --- The patch applied to the head of the phoenix-integration branch (which includes 116 as it's checked in already).
[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table
[ https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648847#comment-16648847 ] Yonatan Gottesman commented on OMID-117: Thanks [~jamestaylor], I cannot apply v6. I tried on top of the 116 patch but it didn't work. The master has changed a bit to try to fix 109. If you give me a patch that applies on 116, I'll do the rebase myself and fix the hbase-2 issues. Thanks.
[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table
[ https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648289#comment-16648289 ] James Taylor commented on OMID-117: --- Please hold off on reviewing the latest patch - I found an issue for hbase-2.
[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table
[ https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647975#comment-16647975 ] James Taylor commented on OMID-117: --- Ping [~yonigo] or [~ohads]. Ok to commit now?
[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table
[ https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646674#comment-16646674 ] James Taylor commented on OMID-117: --- I've attached a v5 that fixes the license header and adds back the test-only inject constructor. Please review, [~yonigo].
[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table
[ https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643998#comment-16643998 ] James Taylor commented on OMID-117: --- As part of OMID-113, we'll need to move the coprocessor implementations into the shim module since Optional is not supported in Java 1.7 (which needs to be the target for hbase-1). We should try to make these implementations as thin as possible to reduce code duplication. We should be able to delegate to classes in hbase-common that do most of the work. I'll add the header where it's missing and add back the HBaseCommitTable constructor with the @Inject, but this should only be used for testing.
[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table
[ https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643959#comment-16643959 ] Yonatan Gottesman commented on OMID-117: Hi [~jamestaylor], thanks.
1) I'm unable to "mvn install" because the license headers of some of the new files are not good.
2) Many tests don't pass; I get this error:
{code:java}
1) Could not find a suitable constructor in org.apache.omid.committable.hbase.HBaseCommitTable. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
{code}
Is this the problem you had? How did you fix it? It seems really strange.
In the coprocessors you removed the method
{code:java}
public Optional<RegionObserver> getRegionObserver() {
    return Optional.of(this);
}
{code}
But this is required since HBase 2; look at [this|https://hbase.apache.org/book.html]:
{code:java}
Coprocessor APIs have changed in HBase 2.0+

All Coprocessor APIs have been refactored to improve supportability around binary API compatibility for future versions of HBase. If you or applications you rely on have custom HBase coprocessors, you should read the release notes for HBASE-18169 for details of changes you will need to make prior to upgrading to HBase 2.0+.

For example, if you had a BaseRegionObserver in HBase 1.2 then at a minimum you will need to update it to implement both RegionObserver and RegionCoprocessor and add the method

...
@Override
public Optional<RegionObserver> getRegionObserver() {
    return Optional.of(this);
}
...
{code}
Did you try to run "mvn test -Phbase-2"? I think it won't work in this case. What do you think?
[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table
[ https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641157#comment-16641157 ] James Taylor commented on OMID-117: --- For some reason git apply was ok, but git am was giving an error. I've removed the changes to ScrambledZipfianGenerator.java (which was just for an unnecessary cast) and attached a v4 which seems to work fine with git am as well. Please try again.
[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table
[ https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641143#comment-16641143 ] Yonatan Gottesman commented on OMID-117: error: patch failed: benchmarks/src/main/java/org/apache/omid/benchmarks/utils/ScrambledZipfianGenerator.java:117
[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table
[ https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640786#comment-16640786 ] Yonatan Gottesman commented on OMID-117: [~jamestaylor], git am doesn't work on the _v2 version (applying the first version works). Should I apply something first? I'm synchronized with Apache git.
[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table
[ https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640782#comment-16640782 ] James Taylor commented on OMID-117: --- Never mind. Turns out it was an environmental issue which went away after I rebooted. Please review the patch, [~ohads] or [~yonigo]. We still need to ensure that Java 1.7 is used for hbase-1 and Java 1.8 is used for hbase-2. I don't know how to do that with profiles, but if we use a compat module approach (similar to Tephra), it's pretty straightforward.
[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table
[ https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640592#comment-16640592 ] James Taylor commented on OMID-117: --- Actually, looks like TestHBaseCommitTable hasn't passed since 81672f016b535546444ea1e3b551ae5dca4bf3ef. Any ideas?
[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table
[ https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640577#comment-16640577 ] James Taylor commented on OMID-117: --- Need some help on this, [~ohads] and [~yonigo]. This patch adds new constructors to HBaseCommitTable so that you can pass the correctly configured Connection on the server side. However, I can't get TestHBaseCommitTable to pass. I'm not familiar with TestNG or Google Guice. Maybe I broke something? The strange thing is that even if I try running the unit tests with an old commit, it still doesn't work.
The key things on the server-side connection:
* A single shared connection (rather than a new one created potentially per region) needs to be used, otherwise your region server will die when many regions attempt to connect to the commit table but are unable.
* The timeouts need to be overridden because HBase by default multiplies the default settings by 10x for server to server RPCs (which would cause all your handler threads to get tied up and make your region server inaccessible).
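The first point above, one shared connection instead of one per region, can be sketched without any HBase dependency. The Shared holder, the connectionsCreated helper, and the use of a plain Object in place of an HBase Connection are all hypothetical and exist only for this illustration; the real patch may be structured differently. The sketch uses Java 8 (Supplier and lambdas).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class SharedConnectionSketch {
    /** Creates the wrapped resource once, on first use, and hands the same instance to every caller. */
    static final class Shared<T> {
        private final Supplier<T> factory;
        private T instance; // guarded by "this"

        Shared(Supplier<T> factory) {
            this.factory = factory;
        }

        synchronized T get() {
            if (instance == null) {
                instance = factory.get();
            }
            return instance;
        }
    }

    /** Counts how often the factory runs when the given number of regions each ask for a connection. */
    static int connectionsCreated(int regions) {
        AtomicInteger created = new AtomicInteger();
        // Hypothetical stand-in for an expensive server-side connection.
        Shared<Object> shared = new Shared<>(() -> {
            created.incrementAndGet();
            return new Object();
        });
        for (int region = 0; region < regions; region++) {
            shared.get();
        }
        return created.get();
    }

    public static void main(String[] args) {
        System.out.println(connectionsCreated(100)); // prints "1"
    }
}
```

However many regions ask, the factory runs once, so an unreachable commit table costs the region server one connection's worth of resources rather than one per region.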