[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913964#comment-13913964
 ] 

Hudson commented on HBASE-10566:


FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #100 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/100/])
HBASE-10566 cleanup rpcTimeout in the client - addendum (nkeywal: rev 1572033)
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java


> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch, 
> 10566.v3.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913012#comment-13913012
 ] 

Hudson commented on HBASE-10566:


FAILURE: Integrated in HBase-TRUNK #4957 (See 
[https://builds.apache.org/job/HBase-TRUNK/4957/])
HBASE-10566 cleanup rpcTimeout in the client - addendum (nkeywal: rev 1572033)
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java


> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch, 
> 10566.v3.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-26 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912826#comment-13912826
 ] 

Nicolas Liochon commented on HBASE-10566:
-

bq. Will do as an addendum. 
Done.

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch, 
> 10566.v3.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912401#comment-13912401
 ] 

Hudson commented on HBASE-10566:


FAILURE: Integrated in HBase-TRUNK-on-Hadoop-1.1 #99 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/99/])
HBASE-10566 cleanup rpcTimeout in the client - missing TimeLimitedRpcController 
(nkeywal: rev 1571730)
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/TimeLimitedRpcController.java
HBASE-10566 cleanup rpcTimeout in the client (nkeywal: rev 1571727)
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientSmallScanner.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionManager.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/DelegatingRetryingCallable.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/MultiServerCallable.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Put.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RegionServerCallable.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RetryingCallable.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCaller.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/PayloadCarryingRpcController.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RegionCoprocessorRpcChannel.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ReplicationProtbufUtil.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEditsReplaySink.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestEndToEndSplitTransaction.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionServerBulkLoad.java


> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch, 
> 10566.v3.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> Th

[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-25 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911944#comment-13911944
 ] 

Nicolas Liochon commented on HBASE-10566:
-

bq. Belated +1
Thanks ;-)

bq. Yes.
Will do as an addendum. 

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch, 
> 10566.v3.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911926#comment-13911926
 ] 

Hudson commented on HBASE-10566:


FAILURE: Integrated in HBase-TRUNK #4953 (See 
[https://builds.apache.org/job/HBase-TRUNK/4953/])
HBASE-10566 cleanup rpcTimeout in the client - missing TimeLimitedRpcController 
(nkeywal: rev 1571730)
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/TimeLimitedRpcController.java
HBASE-10566 cleanup rpcTimeout in the client (nkeywal: rev 1571727)
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ClientSmallScanner.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionManager.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/DelegatingRetryingCallable.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/MultiServerCallable.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Put.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RegionServerCallable.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RetryingCallable.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCaller.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/PayloadCarryingRpcController.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RegionCoprocessorRpcChannel.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/RpcClient.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RpcServer.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/ReplicationProtbufUtil.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/SplitLogWorker.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALEditsReplaySink.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestEndToEndSplitTransaction.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegionServerBulkLoad.java


> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch, 
> 10566.v3.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system 

[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-25 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911856#comment-13911856
 ] 

Andrew Purtell commented on HBASE-10566:


Belated +1

bq. btw, I'm seeing this only now, but I extended the original naming 
'ipc.socket.timeout', and that was not prefixed by 'hbase.' I should change 
this, no?

Yes.

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch, 
> 10566.v3.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-25 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911703#comment-13911703
 ] 

Nicolas Liochon commented on HBASE-10566:
-

btw, I'm seeing this only now, but I extended the original naming 
'ipc.socket.timeout', and that was not prefixed by 'hbase.' I should change 
this, no?

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch, 
> 10566.v3.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911682#comment-13911682
 ] 

stack commented on HBASE-10566:
---

+1 on fix in separate patch

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch, 
> 10566.v3.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-25 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911557#comment-13911557
 ] 

Nicolas Liochon commented on HBASE-10566:
-

I plan to commit this shortly. The ??callWithRetries(callable, 
HConstants.DEFAULT_HBASE_CLIENT_OPERATION_TIMEOUT);?? looks terrible, but may 
it's 'feature by accident', so it's better to fix this in a separate patch.

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch, 
> 10566.v3.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911548#comment-13911548
 ] 

Hadoop QA commented on HBASE-10566:
---

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12630929/10566.v3.patch
  against trunk revision .
  ATTACHMENT ID: 12630929

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8797//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8797//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8797//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8797//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8797//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8797//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8797//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8797//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8797//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8797//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8797//console

This message is automatically generated.

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch, 
> 10566.v3.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).

[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-25 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911488#comment-13911488
 ] 

Nicolas Liochon commented on HBASE-10566:
-

Added to that, the existing code does this:
{code}
  public synchronized T callWithRetries(RetryingCallable callable) throws 
IOException,
  RuntimeException {
return callWithRetries(callable, 
HConstants.DEFAULT_HBASE_CLIENT_OPERATION_TIMEOUT);
  }
{code}

In other words, the setting is not taken into account; we use the hardcoded 
default value in many cases
The patch does not change this.

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch, 
> 10566.v3.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-25 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911487#comment-13911487
 ] 

Nicolas Liochon commented on HBASE-10566:
-

v3:
 - fix the javadoc warnings
 - add a test
 - test showed that some HTable paths were not setting the timeout. Fixed.

ProtobufUtils contains code that belong to the client code. I  haven't fixed 
them all, the patch would become too big. I think we're as good as before.


> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch, 
> 10566.v3.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-24 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910954#comment-13910954
 ] 

stack commented on HBASE-10566:
---

Fix javadoc warning on commit?

This is a a great comment: "We're spending a lot of time wrapping the 
exceptions, and then unwrapping them to discover what really happened."  File 
an issue for this one when you get a chance.

Patch looks great to me.  Commit.



> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910746#comment-13910746
 ] 

Hadoop QA commented on HBASE-10566:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12630733/10566.v2.patch
  against trunk revision .
  ATTACHMENT ID: 12630733

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 3 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8787//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8787//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8787//console

This message is automatically generated.

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will mak

[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-24 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910602#comment-13910602
 ] 

Nicolas Liochon commented on HBASE-10566:
-

v2 fixes the test error. I'm not sure that we should not get rid of 
'wrapException' however. We're spending a lot of time wrapping the exceptions, 
and then unwrapping them to discover what really happened.

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910592#comment-13910592
 ] 

Hadoop QA commented on HBASE-10566:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12630699/10566.v1.patch
  against trunk revision .
  ATTACHMENT ID: 12630699

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 3 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.client.TestClientOperationInterrupt

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8785//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8785//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8785//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8785//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8785//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8785//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8785//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8785//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8785//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8785//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8785//console

This message is automatically generated.

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch, 10566.v1.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> qu

[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-24 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910452#comment-13910452
 ] 

Nicolas Liochon commented on HBASE-10566:
-

v1 is a first attempt. I haven't run all the tests locally, but I had no error 
after a 30 minutes run.

3 different socket timeouts
- connect
- read
- write

For all of them, we should be able to set them to low value, something like 2 / 
5 / 5, without any impact. Likely I will need to write a test for this. The 
existing timeout of 60s is a global timeout for the operation. I need to double 
check how we were using the existing operationTimout, my feeling is that it was 
buggy, and that it was overriding the individual timeout. If it's the case, 
it's still buggy.




> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch, 10566.v1.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-24 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910145#comment-13910145
 ] 

Nicolas Liochon commented on HBASE-10566:
-

bq. I suppose it is ok. Maybe rename the class so it is not confused with 
Callable.
Actually, we never use the fact that it's a java Callable. But changing the 
name can impact a lot of code. I will try (intelliJ will do the change for me, 
but it can make the patch much bigger, I don"t know).

bq. Is TimeLimitedRpcController left as an exercise to the reader
I forgot it (usual stuff: not added to git, so not included in git diff). But 
the patch globally compiles but does not set the timeout all the time.

bq. Doesn't "callTimeout" make more sense for this parameter name?
Often timeout indicates a duration, while here I used something like a cutoff 
time. That's what I wanted to express. There is an implication however: the 
client and the server time must be in sync. Even if it's a common requirement, 
I'm not sure I'm not going to change my mind.

Thanks a lot for the feedback, I'm going to try to write the full patch.




> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908830#comment-13908830
 ] 

stack commented on HBASE-10566:
---

I like Nick's namings

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-21 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908726#comment-13908726
 ] 

Nick Dimiduk commented on HBASE-10566:
--

bq. A confusion between the socket timeout and the call timeout

Yet you chose the name "finishBefore". Doesn't "callTimeout" make more sense 
for this parameter name?

Is {{TimeLimitedRpcController}} left as an exercise to the reader? Perhaps this 
is what you mean by "sample". Again with the name choice (for my own clarity 
mostly), I understand this to be an implementation of RpcController that honors 
an RpcCallTimeout setting.

bq. Nick pointed me to this, it's quite interesting

My intention was to show how that implementation manages Rpc timeout. I've 
studied their implementation a bit, but don't know how to map those concepts 
back to ours. In their implementation, the timeout management is handled as an 
ExecutorService implementation.

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908707#comment-13908707
 ] 

stack commented on HBASE-10566:
---

TimeLimitedRpcController.java is missing from the patch (but I think I can 
imagine what it looks like -- smile).

Hmm this is radical difference:

-public interface RetryingCallable extends Callable {
+public interface RetryingCallable  {

I suppose it is ok.  Maybe rename the class so it is not confused with Callable.

The cleanup in RpcRetryingCaller is great.

Why does this have to be a data member in RpcClient?

+final long finishBefore;

... oh I see ... there is a Call per method invocation... makes sense.

Fix spelling: finisheBefore

Nice cleanup in RpcClient.java removing the TL

Patch lgtm.


> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908668#comment-13908668
 ] 

stack commented on HBASE-10566:
---

bq. I'm not that ambitious.

That is because you are more sensible than I (smile).

Sounds like we'd have to remove the TLS anyway, whether new transport or not.

Let me review the patch.  Can do new transport in another issue (smile).

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-21 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908647#comment-13908647
 ] 

Nicolas Liochon commented on HBASE-10566:
-

bq. You think we should just do a new transport altogether? You are fixing ugly 
legacy.
I'm not that ambitious :-). I'm not saying it does not make sense.
I'm trying to remove the non standard behaviors: here, the TLS is both buggy 
and impact the threading model. By removing it I can have a different one, 
likely more compatible with any other rpc layer, but, anyway and short term, 
less buggy. If I remove the TLS, I can have a thread pool for all the servers, 
and this will allow me to have a single code path after 10525. In any case, we 
need to have standard features. Client side, the issues were the ping and the 
rpc timeout. I think that if we remove them, we're clean imho. Server side it's 
more complex: delayed call, priorities, configurable schedulers, ...

Thanks a lot for reviewing: the patch is not complete, but I need feedback on 
the direction, especially because it's going to touch a lot of parts (trivial 
changes, but all over the place).

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908619#comment-13908619
 ] 

stack commented on HBASE-10566:
---

bq.  Nick pointed me to this, it's quite interesting: 
https://code.google.com/p/protobuf-rpc-pro/wiki/RpcTimeout

Yes.  It would be fun to try and drop in a new transport, one that had 
fancyness like that of pb-rpc-pro with bidirectional messaging and cancel, etc.

Or this dead one: https://code.google.com/p/netty-protobuf-rpc/  There are 
others too..

This is a thorny issue N.  Thanks for digging in.

bq.  if we can not write to the socket in 10 second, we have an issue.

Yes.  We've inherited a load of our timeouts from our batch-orientated parent 
and have yet to change them in many cases.

bq. Today, we have a single value, it does not allow us to have low socket read 
timeouts.

Yes. Excellent.  This sloppyness has been allowed prevail down through the 
years (sorry about that).

bq. May be protobuf makes this complicated, to be confirmed), but as well it 
does not really work, because we can have multithreading issues 

This we inherited from the hadoop rpc.

You think we should just do a new transport altogether?  You are fixing ugly 
legacy.

bq. we could send the call timeout to the server as well: 

Yes.  Server might reject a call, even before it starts working on it, because 
it is already past its timeout (for whatever reason)


bq.   getStub().multi(null, request); << 
pcrc not used

Good.

bq. we would need to instantiate one rpcController per call (we more or less do 
that already)

We do this already -- at least IIRC, this is what the model puts up on us.

rpcController we use at the moment for carrying our cellblock across the pb rpc 
interface (it doesn't allow for extra args as is).

Let me look at the patch






> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-21 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908563#comment-13908563
 ] 

Nicolas Liochon commented on HBASE-10566:
-

Nick pointed me to this, it's quite interesting: 
https://code.google.com/p/protobuf-rpc-pro/wiki/RpcTimeout

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-21 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908493#comment-13908493
 ] 

Nicolas Liochon commented on HBASE-10566:
-

The difficult point is to correlate this with the cancel. The RpcController 
supports the cancellation, but we don't really show this class to the client 
code. However, doing a cancel is a client decision. So we can be protobuf 
compliant with the timeout, as it's something we manage as a conf parameter, so 
we don't have to show the RpcController to the client,  but we can't easily be 
protobuf compliant with the cancel, as it would require to expose this 
RpcController.

the path for this jira is:
clean rpc timeout -> shared thread pool for the connection instead of 2 threads 
per region server -> cancellation server side
likely, this would help to put Netty or byteBuffer in the loop as well.
And this would allow to have shorter socket timeout: today, if we want to 
support an operation that can last a few minutes (a scan with a lot of filters 
under heavy load for example), we need to have a socket timeout of a few 
minutes as well.

So I'm interested in any feedback. [~saint@gmail.com], 
[~andrew.purt...@gmail.com]; [~enis]; [~devaraj] ? 

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-21 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908332#comment-13908332
 ] 

Nicolas Liochon commented on HBASE-10566:
-

Interestingly: "A specific RpcController implementation may very well provide a 
SetTimeout() method — Google’s internal implementation does exactly this." - 
http://steve.vinoski.net/blog/2008/07/13/protocol-buffers-leaky-rpc/

So likely, it's likely the right approach from a protobuf PoV, even if I'm not 
sure that we're totally rpc protobuf friendly...

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-21 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908255#comment-13908255
 ] 

Nicolas Liochon commented on HBASE-10566:
-

As well to include in this patch: same policy for ServerRpcController: one 
throws UnsupportedException, one does nothing.

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-20 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907013#comment-13907013
 ] 

Nicolas Liochon commented on HBASE-10566:
-

Any feedback? Is the approach for the rpcController ok?

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-19 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905868#comment-13905868
 ] 

Nicolas Liochon commented on HBASE-10566:
-

Here a sample, using the rpcController to give the info:
- call() is replaced by call(finishBefore): it indicates that the call should 
finish before this time
- call(0) means default timeout
- we could forward the info to the server (somewhere in protobuf)
- we would need to instantiate one rpcController per call (we more or less do 
that already)

Any opinion on the approach? It seems that's how protobuf should be used, but I 
may be wrong. 

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-19 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905738#comment-13905738
 ] 

Nicolas Liochon commented on HBASE-10566:
-

To be fixed in this patch, there is a bug in HTable#mutateRow

{code}
  public void mutateRow(final RowMutations rm) throws IOException {
RegionServerCallable callable =
new RegionServerCallable(connection, getName(), rm.getRow()) {
  public Void call() throws IOException {
try {
  RegionAction.Builder regionMutationBuilder = 
RequestConverter.buildRegionAction(
getLocation().getRegionInfo().getRegionName(), rm);
  regionMutationBuilder.setAtomic(true);
  MultiRequest request =

MultiRequest.newBuilder().addRegionAction(regionMutationBuilder.build()).build();
  PayloadCarryingRpcController pcrc = new 
PayloadCarryingRpcController();
  pcrc.setPriority(tableName);
  getStub().multi(null, request); << pcrc 
not used
} catch (ServiceException se) {
  throw ProtobufUtil.getRemoteException(se);
}
return null;
  }
};
rpcCallerFactory. newCaller().callWithRetries(callable, 
this.operationTimeout);
  }
{code}

> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout
> Socket timeouts should be minimal: a default like 20 seconds, that could be 
> lowered to single digits timeouts for some apps: if we can not write to the 
> socket in 10 second, we have an issue. This is different from the total 
> duration (send query + do query + receive query), that can be longer, as it 
> can include remotes calls on the server and so on. Today, we have a single 
> value, it does not allow us to have low socket read timeouts.
> 2) The timeout can be different between the calls. Typically, if the total 
> time, retries included is 60 seconds but failed after 2 seconds, then the 
> remaining is 58s. HBase does this today, but by hacking with a thread local 
> storage variable. It's a hack (it should have been a parameter of the 
> methods, the TLS allowed to bypass all the layers. May be protobuf makes this 
> complicated, to be confirmed), but as well it does not really work, because 
> we can have multithreading issues (we use the updated rpc timeout of someone 
> else, or we create a new BlockingRpcChannelImplementation with a random 
> default timeout).
> Ideally, we could send the call timeout to the server as well: it will be 
> able to dismiss alone the calls that it received but git stick in the request 
> queue or in the internal retries (on hdfs for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to 
> something that fits well with protobuf...
> Then it should be easy to have a pool of thread for writers and readers, w/o 
> a single thread per region server as today. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)