[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table

2018-10-16 Thread Yonatan Gottesman (JIRA)


[ 
https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652958#comment-16652958
 ] 

Yonatan Gottesman commented on OMID-117:


looks good

> Ensure timeouts are configured low for RPCs to commit table
> ---
>
> Key: OMID-117
> URL: https://issues.apache.org/jira/browse/OMID-117
> Project: Apache Omid
>  Issue Type: Bug
>Reporter: James Taylor
>Priority: Major
> Attachments: OMID-117.patch, OMID-117_addendum1.patch, 
> OMID-117_hbase2.patch, OMID-117_v2.patch, OMID-117_v3.patch, 
> OMID-117_v4.patch, OMID-117_v5.patch, OMID-117_v6.patch, OMID-117_v7.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table

2018-10-16 Thread James Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652692#comment-16652692
 ] 

James Taylor commented on OMID-117:
---

[~yonigo] - please review my addendum patch. I lowered the default number of 
read timeouts to one - this seems to have fixed the issue and wouldn't be an 
issue in production. Also, worst case, we could always override the default 
(without needing a release) to be larger if we encounter issues.

> Ensure timeouts are configured low for RPCs to commit table
> ---
>
> Key: OMID-117
> URL: https://issues.apache.org/jira/browse/OMID-117
> Project: Apache Omid
>  Issue Type: Bug
>Reporter: James Taylor
>Priority: Major
> Attachments: OMID-117.patch, OMID-117_addendum1.patch, 
> OMID-117_hbase2.patch, OMID-117_v2.patch, OMID-117_v3.patch, 
> OMID-117_v4.patch, OMID-117_v5.patch, OMID-117_v6.patch, OMID-117_v7.patch
>
>






[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table

2018-10-15 Thread James Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650343#comment-16650343
 ] 

James Taylor commented on OMID-117:
---

{quote}About the retries, what's the worst thing that can happen with this? How 
bad is it to have it like this?
{quote}
Check out the comment I added to RegionConnectionFactory:
{code:java}
// This setting controls how many retries occur on the region server if an
// IOException occurs while trying to access the commit table. Because a
// handler thread will be in use while these retries occur and the client
// will be blocked waiting, it must not tie up the call for longer than
// the client RPC timeout. Otherwise, the client will initiate retries on its
// end, tying up yet another handler thread. It's best if the retries can be
// zero, as in that case the handler is released and the retries occur on the
// client side. In testing, we've seen NoServerForRegionException occur, which
// is a DoNotRetryIOException and so is not retried on the client. It's not
// clear if this is a real issue or a test-only issue.
private static final int DEFAULT_COMMIT_TABLE_ACCESS_ON_READ_RETRIES_NUMBER = 11;
private static final int DEFAULT_COMMIT_TABLE_ACCESS_ON_READ_RETRY_PAUSE = 100;
{code}
As it is with this patch, if retries are necessary to reach the RS hosting the 
commit table, they will occur on the RS handling the scan for about 48 seconds. 
During this time, the handler thread will be tied up (i.e. no other HBase 
client can use it). If this occurs for all the handler threads on the RS, then 
all incoming requests would be queued. For example, non-transactional queries 
would potentially not be processed during this time. If the retries (and 
pauses) occur on the client side, then non-transactional workloads wouldn't be 
impacted.
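
The 48-second figure can be reproduced with a little arithmetic. The sketch below is an estimate only: it assumes the pause is in milliseconds and that each retry sleeps for the base pause times a multiplier from HBase's backoff table (the values here are HConstants.RETRY_BACKOFF as recalled, so treat them as an assumption). It ignores the small random jitter the real client adds and the time spent in each RPC attempt. The class and method names are made up for illustration.

```java
// Sketch: estimate how long a caller may spend sleeping between retries.
final class RetryPauseEstimate {
    // Backoff multipliers as recalled from HBase's HConstants.RETRY_BACKOFF (assumption).
    private static final int[] RETRY_BACKOFF =
            {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    /** Sum of the pauses, in milliseconds, across the given number of retries (jitter ignored). */
    static long totalPauseMillis(long pauseMillis, int retries) {
        long total = 0;
        for (int attempt = 0; attempt < retries; attempt++) {
            // Past the end of the table, the last multiplier is reused.
            int index = Math.min(attempt, RETRY_BACKOFF.length - 1);
            total += pauseMillis * RETRY_BACKOFF[index];
        }
        return total;
    }

    public static void main(String[] args) {
        // Defaults quoted above: 11 retries with a base pause of 100.
        System.out.println(totalPauseMillis(100, 11) + " ms"); // prints "48100 ms"
        // With a single retry the handler sleeps for only one base pause.
        System.out.println(totalPauseMillis(100, 1) + " ms");  // prints "100 ms"
    }
}
```

Under these assumptions, dropping the retry count shrinks the time a handler thread is held far more than lowering the base pause does, because the later multipliers dominate the sum.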

Ideally, we'd have a test that reproduces this NoServerForRegionException so we 
can see if any changes are needed to handle this situation. You might be able to 
repro this by manually splitting the commit table and then performing a read 
against a transactional table. It may also just occur the very first time an RS 
tries to reach the commit table after the commit table is created.

 

> Ensure timeouts are configured low for RPCs to commit table
> ---
>
> Key: OMID-117
> URL: https://issues.apache.org/jira/browse/OMID-117
> Project: Apache Omid
>  Issue Type: Bug
>Reporter: James Taylor
>Priority: Major
> Attachments: OMID-117.patch, OMID-117_hbase2.patch, 
> OMID-117_v2.patch, OMID-117_v3.patch, OMID-117_v4.patch, OMID-117_v5.patch, 
> OMID-117_v6.patch, OMID-117_v7.patch
>
>






[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table

2018-10-15 Thread Yonatan Gottesman (JIRA)


[ 
https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649798#comment-16649798
 ] 

Yonatan Gottesman commented on OMID-117:


Hi, it looks good.

I don't understand what you said about changing the poms; all tests pass without 
changing anything (hbase2 too).

About the retries, what's the worst thing that can happen with this? How bad is 
it to have it like this? If it's bad, I can investigate why the tests don't pass.

OK to commit.

> Ensure timeouts are configured low for RPCs to commit table
> ---
>
> Key: OMID-117
> URL: https://issues.apache.org/jira/browse/OMID-117
> Project: Apache Omid
>  Issue Type: Bug
>Reporter: James Taylor
>Priority: Major
> Attachments: OMID-117.patch, OMID-117_hbase2.patch, 
> OMID-117_v2.patch, OMID-117_v3.patch, OMID-117_v4.patch, OMID-117_v5.patch, 
> OMID-117_v6.patch, OMID-117_v7.patch
>
>






[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table

2018-10-14 Thread James Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16649618#comment-16649618
 ] 

James Taylor commented on OMID-117:
---

The InterRegionServerRpcController is only a few lines of code and you'd need 
the constructor in both classes, so I don't think it's worth it.

I've attached a v7 that passes for hbase-1 and hbase-2. For the tests to pass, 
I had to have the server-side retry. Without this, we'd get a 
NoServerForRegionException which is a DoNotRetryIOException so doesn't trigger 
any retries on the client. I'm not sure if this is a test-only issue. I added a 
long comment, but I don't have the cycles to explore further. For hbase-2 I 
could only test it by changing the poms to make hbase-2 the default profile 
(see attached). I'll let that be figured out for OMID-109.
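
To illustrate why a DoNotRetryIOException never triggers client-side retries, here is a minimal sketch of a retry loop that gives up immediately on a marked exception type. It is not the HBase client's code: `NonRetriableIOException` is a stand-in for DoNotRetryIOException defined here so the example compiles on its own, and `callWithRetries` is a hypothetical helper.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Sketch: a retry loop that retries ordinary IOExceptions but never a "do not retry" one.
final class ClientRetrySketch {
    /** Stand-in for HBase's DoNotRetryIOException (hypothetical, for illustration only). */
    static class NonRetriableIOException extends IOException {
        NonRetriableIOException(String message) {
            super(message);
        }
    }

    /** Runs the call once, then up to maxRetries more times on retriable failures. */
    static <T> T callWithRetries(Callable<T> call, int maxRetries) throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return call.call();
            } catch (NonRetriableIOException e) {
                // Surfaces on the first failure, regardless of the retry budget.
                throw e;
            } catch (IOException e) {
                // Retriable: remember it and try again.
                last = e;
            }
        }
        throw last;
    }
}
```

With a loop of this shape, an exception such as NoServerForRegionException reaches the caller after a single attempt, which matches the behaviour described above: only a retry on the server side gives the call another chance.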

Is this ok to commit, [~yonigo]?

 

> Ensure timeouts are configured low for RPCs to commit table
> ---
>
> Key: OMID-117
> URL: https://issues.apache.org/jira/browse/OMID-117
> Project: Apache Omid
>  Issue Type: Bug
>Reporter: James Taylor
>Priority: Major
> Attachments: OMID-117.patch, OMID-117_v2.patch, OMID-117_v3.patch, 
> OMID-117_v4.patch, OMID-117_v5.patch, OMID-117_v6.patch, OMID-117_v7.patch
>
>






[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table

2018-10-13 Thread James Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648987#comment-16648987
 ] 

James Taylor commented on OMID-117:
---

The patch applies to the head of the phoenix-integration branch (which includes 
116, as it's already checked in).

> Ensure timeouts are configured low for RPCs to commit table
> ---
>
> Key: OMID-117
> URL: https://issues.apache.org/jira/browse/OMID-117
> Project: Apache Omid
>  Issue Type: Bug
>Reporter: James Taylor
>Priority: Major
> Attachments: OMID-117.patch, OMID-117_v2.patch, OMID-117_v3.patch, 
> OMID-117_v4.patch, OMID-117_v5.patch, OMID-117_v6.patch
>
>






[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table

2018-10-13 Thread Yonatan Gottesman (JIRA)


[ 
https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648847#comment-16648847
 ] 

Yonatan Gottesman commented on OMID-117:


Thanks [~jamestaylor],

I cannot apply v6. I tried on top of the 116 patch but it didn't work.

The master has changed a bit to try to fix 109.

If you give me a patch that applies on 116, I'll do the rebase myself and fix 
the hbase-2 issues.

Thanks.

> Ensure timeouts are configured low for RPCs to commit table
> ---
>
> Key: OMID-117
> URL: https://issues.apache.org/jira/browse/OMID-117
> Project: Apache Omid
>  Issue Type: Bug
>Reporter: James Taylor
>Priority: Major
> Attachments: OMID-117.patch, OMID-117_v2.patch, OMID-117_v3.patch, 
> OMID-117_v4.patch, OMID-117_v5.patch, OMID-117_v6.patch
>
>






[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table

2018-10-12 Thread James Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648289#comment-16648289
 ] 

James Taylor commented on OMID-117:
---

Please hold off on reviewing the latest patch - I found an issue for hbase-2.

> Ensure timeouts are configured low for RPCs to commit table
> ---
>
> Key: OMID-117
> URL: https://issues.apache.org/jira/browse/OMID-117
> Project: Apache Omid
>  Issue Type: Bug
>Reporter: James Taylor
>Priority: Major
> Attachments: OMID-117.patch, OMID-117_v2.patch, OMID-117_v3.patch, 
> OMID-117_v4.patch, OMID-117_v5.patch
>
>






[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table

2018-10-12 Thread James Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16647975#comment-16647975
 ] 

James Taylor commented on OMID-117:
---

Ping [~yonigo] or [~ohads]. Ok to commit now?

> Ensure timeouts are configured low for RPCs to commit table
> ---
>
> Key: OMID-117
> URL: https://issues.apache.org/jira/browse/OMID-117
> Project: Apache Omid
>  Issue Type: Bug
>Reporter: James Taylor
>Priority: Major
> Attachments: OMID-117.patch, OMID-117_v2.patch, OMID-117_v3.patch, 
> OMID-117_v4.patch, OMID-117_v5.patch
>
>






[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table

2018-10-11 Thread James Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646674#comment-16646674
 ] 

James Taylor commented on OMID-117:
---

I've attached a v5 that fixes the license header and adds back the test-only 
inject constructor. Please review, [~yonigo].

> Ensure timeouts are configured low for RPCs to commit table
> ---
>
> Key: OMID-117
> URL: https://issues.apache.org/jira/browse/OMID-117
> Project: Apache Omid
>  Issue Type: Bug
>Reporter: James Taylor
>Priority: Major
> Attachments: OMID-117.patch, OMID-117_v2.patch, OMID-117_v3.patch, 
> OMID-117_v4.patch, OMID-117_v5.patch
>
>






[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table

2018-10-09 Thread James Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643998#comment-16643998
 ] 

James Taylor commented on OMID-117:
---

As part of OMID-113, we'll need to move the coprocessor implementations into 
the shim module since Optional is not supported in Java 1.7 (which needs to be 
the target for hbase-1). We should try to make these implementations as thin as 
possible to reduce code duplication. We should be able to delegate to classes 
in hbase-common that do most of the work.

I'll add the header where it's missing and add back the HBaseCommitTable 
constructor with the @Inject, but this should only be used for testing. 

> Ensure timeouts are configured low for RPCs to commit table
> ---
>
> Key: OMID-117
> URL: https://issues.apache.org/jira/browse/OMID-117
> Project: Apache Omid
>  Issue Type: Bug
>Reporter: James Taylor
>Priority: Major
> Attachments: OMID-117.patch, OMID-117_v2.patch, OMID-117_v3.patch, 
> OMID-117_v4.patch
>
>






[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table

2018-10-09 Thread Yonatan Gottesman (JIRA)


[ 
https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643959#comment-16643959
 ] 

Yonatan Gottesman commented on OMID-117:


Hi [~jamestaylor] thanks.

1) I'm unable to run "mvn install" because the license headers of some of the 
new files are not correct.

2) Many tests don't pass; I get this error:
{code:java}
1) Could not find a suitable constructor in 
org.apache.omid.committable.hbase.HBaseCommitTable. Classes must have either 
one (and only one) constructor annotated with @Inject or a zero-argument 
constructor that is not private.

{code}
Is this the problem you had? How did you fix it? It seems really strange.

In the coprocessors you removed the method 

 
{code:java}
public Optional<RegionObserver> getRegionObserver() {
    return Optional.of(this);
}
{code}
But this is required since HBase 2; see 
[this|https://hbase.apache.org/book.html]:
{code:java}
Coprocessor APIs have changed in HBase 2.0+
All Coprocessor APIs have been refactored to improve supportability around 
binary API compatibility for future versions of HBase. If you or applications 
you rely on have custom HBase coprocessors, you should read the release notes 
for HBASE-18169 for details of changes you will need to make prior to upgrading 
to HBase 2.0+.

For example, if you had a BaseRegionObserver in HBase 1.2 then at a minimum you 
will need to update it to implement both RegionObserver and RegionCoprocessor 
and add the method

...
  @Override
  public Optional<RegionObserver> getRegionObserver() {
    return Optional.of(this);
  }
...
{code}
 

Did you try to run "mvn test -Phbase-2"? I think it won't work in this case.

What do you think?

 

> Ensure timeouts are configured low for RPCs to commit table
> ---
>
> Key: OMID-117
> URL: https://issues.apache.org/jira/browse/OMID-117
> Project: Apache Omid
>  Issue Type: Bug
>Reporter: James Taylor
>Priority: Major
> Attachments: OMID-117.patch, OMID-117_v2.patch, OMID-117_v3.patch, 
> OMID-117_v4.patch
>
>






[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table

2018-10-07 Thread James Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641157#comment-16641157
 ] 

James Taylor commented on OMID-117:
---

For some reason git apply was ok, but git am was giving an error. I've removed 
the changes to ScrambledZipfianGenerator.java (which was just for an 
unnecessary cast) and attached a v4 which seems to work fine with git am as 
well. Please try again.

> Ensure timeouts are configured low for RPCs to commit table
> ---
>
> Key: OMID-117
> URL: https://issues.apache.org/jira/browse/OMID-117
> Project: Apache Omid
>  Issue Type: Bug
>Reporter: James Taylor
>Priority: Major
> Attachments: OMID-117.patch, OMID-117_v2.patch, OMID-117_v3.patch, 
> OMID-117_v4.patch
>
>






[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table

2018-10-07 Thread Yonatan Gottesman (JIRA)


[ 
https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641143#comment-16641143
 ] 

Yonatan Gottesman commented on OMID-117:


error: patch failed: 
benchmarks/src/main/java/org/apache/omid/benchmarks/utils/ScrambledZipfianGenerator.java:117

> Ensure timeouts are configured low for RPCs to commit table
> ---
>
> Key: OMID-117
> URL: https://issues.apache.org/jira/browse/OMID-117
> Project: Apache Omid
>  Issue Type: Bug
>Reporter: James Taylor
>Priority: Major
> Attachments: OMID-117.patch, OMID-117_v2.patch, OMID-117_v3.patch
>
>






[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table

2018-10-06 Thread Yonatan Gottesman (JIRA)


[ 
https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640786#comment-16640786
 ] 

Yonatan Gottesman commented on OMID-117:


[~jamestaylor], git am doesn't work on the _v2 version (applying the first 
version works).

Should I apply something first? I'm in sync with Apache git.

 

> Ensure timeouts are configured low for RPCs to commit table
> ---
>
> Key: OMID-117
> URL: https://issues.apache.org/jira/browse/OMID-117
> Project: Apache Omid
>  Issue Type: Bug
>Reporter: James Taylor
>Priority: Major
> Attachments: OMID-117.patch, OMID-117_v2.patch
>
>






[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table

2018-10-06 Thread James Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640782#comment-16640782
 ] 

James Taylor commented on OMID-117:
---

Never mind. Turns out it was an environmental issue which went away after I 
rebooted. Please review the patch, [~ohads] or [~yonigo]. We still need to 
ensure that Java 1.7 is used for hbase-1 and Java 1.8 is used for hbase-2. I 
don't know how to do that with profiles, but if we use a compat module approach 
(similar to Tephra), it's pretty straightforward.

> Ensure timeouts are configured low for RPCs to commit table
> ---
>
> Key: OMID-117
> URL: https://issues.apache.org/jira/browse/OMID-117
> Project: Apache Omid
>  Issue Type: Bug
>Reporter: James Taylor
>Priority: Major
> Attachments: OMID-117.patch, OMID-117_v2.patch
>
>






[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table

2018-10-06 Thread James Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640592#comment-16640592
 ] 

James Taylor commented on OMID-117:
---

Actually, looks like TestHBaseCommitTable hasn't passed since 
81672f016b535546444ea1e3b551ae5dca4bf3ef.

Any ideas?

> Ensure timeouts are configured low for RPCs to commit table
> ---
>
> Key: OMID-117
> URL: https://issues.apache.org/jira/browse/OMID-117
> Project: Apache Omid
>  Issue Type: Bug
>Reporter: James Taylor
>Priority: Major
> Attachments: OMID-117.patch
>
>






[jira] [Commented] (OMID-117) Ensure timeouts are configured low for RPCs to commit table

2018-10-06 Thread James Taylor (JIRA)


[ 
https://issues.apache.org/jira/browse/OMID-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640577#comment-16640577
 ] 

James Taylor commented on OMID-117:
---

Need some help on this, [~ohads] and [~yonigo]. This patch adds new 
constructors to HBaseCommitTable so that you can pass the correctly configured 
Connection on the server side. However, I can't get TestHBaseCommitTable to 
pass. I'm not familiar with TestNG or Google Guice. Maybe I broke something? 
The strange thing is that even if I try running the unit tests with an old 
commit, it still doesn't work.

The key things on the server-side connection:
 * A single shared connection (rather than a new one created potentially per 
region) needs to be used, otherwise your region server will die when many 
regions attempt to connect to the commit table but are unable to.
 * The timeouts need to be overridden because HBase by default multiplies the 
default settings by 10x for server-to-server RPCs (which would cause all your 
handler threads to get tied up and make your region server inaccessible).
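
The two points above can be sketched without any HBase dependency. This is not the code in the patch: the class name and the `connector` factory are hypothetical, the settings are modelled as a plain map, and the values are illustrative. The property names are HBase client keys as recalled, so verify them against your HBase version.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch: one lazily created connection shared by every region on a server,
// built from settings whose retry and timeout values are forced low.
final class SharedCommitTableConnection<C> {
    private final Map<String, String> settings;
    private final Function<Map<String, String>, C> connector; // hypothetical factory
    private C connection; // guarded by "this"

    SharedCommitTableConnection(Map<String, String> base,
                                Function<Map<String, String>, C> connector) {
        Map<String, String> copy = new HashMap<>(base);
        // Illustrative values only; the thread does not state the ones used in the patch.
        copy.put("hbase.client.retries.number", "1");
        copy.put("hbase.client.pause", "100");
        copy.put("hbase.rpc.timeout", "2000");
        this.settings = Collections.unmodifiableMap(copy);
        this.connector = connector;
    }

    /** The effective settings, with the low retry and timeout overrides applied. */
    Map<String, String> settings() {
        return settings;
    }

    /** Every caller gets the same instance; the factory runs on first use only. */
    synchronized C get() {
        if (connection == null) {
            connection = connector.apply(settings);
        }
        return connection;
    }
}
```

Overriding the values explicitly after copying the base settings means any inflated server-side defaults inherited from the region server's configuration do not leak into the commit table connection.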

 

> Ensure timeouts are configured low for RPCs to commit table
> ---
>
> Key: OMID-117
> URL: https://issues.apache.org/jira/browse/OMID-117
> Project: Apache Omid
>  Issue Type: Bug
>Reporter: James Taylor
>Priority: Major
> Attachments: OMID-117.patch
>
>



