[jira] [Updated] (HBASE-10566) cleanup rpcTimeout in the client

2017-08-04 Thread Dave Latham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Latham updated HBASE-10566:

Release Note: 
3 new settings are now available to configure the socket in the HBase client:
- connect timeout: "hbase.ipc.client.socket.timeout.connect" (milliseconds, default: 10 seconds)
- read timeout: "hbase.ipc.client.socket.timeout.read" (milliseconds, default: 20 seconds)
- write timeout: "hbase.ipc.client.socket.timeout.write" (milliseconds, default: 60 seconds)

ipc.socket.timeout is not used anymore.
The per-operation timeout is still controlled by hbase.rpc.timeout.
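
A minimal sketch (not part of the release note) of how these properties might be set from client code; Configuration and HBaseConfiguration are the standard Hadoop/HBase client classes, and the values simply echo the defaults listed above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ClientSocketTimeouts {
  public static Configuration configure() {
    Configuration conf = HBaseConfiguration.create();
    // Low-level socket timeouts, in milliseconds (the documented defaults):
    conf.setInt("hbase.ipc.client.socket.timeout.connect", 10000);
    conf.setInt("hbase.ipc.client.socket.timeout.read", 20000);
    conf.setInt("hbase.ipc.client.socket.timeout.write", 60000);
    // The per-operation timeout remains a separate knob (example value):
    conf.setInt("hbase.rpc.timeout", 60000);
    return conf;
  }
}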


  was:
3 new settings are now available to configure the socket in the HBase client:
- connect timeout: "hbase.ipc.client.socket.timeout.connect" (milliseconds, 
default: 10 seconds)
- write timeout: "hbase.ipc.client.socket.timeout.read" (milliseconds, default: 
20 seconds)
- read timeout: "hbase.ipc.client.socket.timeout.write" (milliseconds, default: 
60 seconds)

ipc.socket.timeout is not used anymore.
The per-operation timeout is still controlled by hbase.rpc.timeout.



> cleanup rpcTimeout in the client
> 
>
> Key: HBASE-10566
> URL: https://issues.apache.org/jira/browse/HBASE-10566
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.99.0
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
> Fix For: 0.99.0
>
> Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch, 
> 10566.v3.patch
>
>
> There are two issues:
> 1) A confusion between the socket timeout and the call timeout.
> Socket timeouts should be minimal: a default like 20 seconds, which could be
> lowered to single-digit timeouts for some apps: if we cannot write to the
> socket in 10 seconds, we have an issue. This is different from the total
> duration (send query + do query + receive query), which can be longer, as it
> can include remote calls on the server and so on. Today we have a single
> value, so it does not allow us to have low socket read timeouts.
> 2) The timeout can differ between calls. Typically, if the total time,
> retries included, is 60 seconds and a call failed after 2 seconds, then the
> remaining budget is 58s. HBase does this today, but by hacking with a
> thread-local storage variable. It's a hack (it should have been a parameter
> of the methods; the TLS allowed it to bypass all the layers. Maybe protobuf
> makes this complicated, to be confirmed), and it does not really work
> anyway, because we can have multithreading issues (we use the updated rpc
> timeout of someone else, or we create a new BlockingRpcChannelImplementation
> with a random default timeout).
> Ideally, we could send the call timeout to the server as well: it would then
> be able to dismiss on its own the calls that it received but that got stuck
> in the request queue or in internal retries (on hdfs, for example).
> This will make the system more reactive to failure.
> I think we can solve this now, especially after 10525. The main issue is to
> find something that fits well with protobuf...
> Then it should be easy to have a pool of threads for writers and readers,
> rather than a single thread per region server as today.
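
As an illustration of the budgeting described above (a hypothetical sketch, not code from this issue or from HBase): the remaining operation time can be carried as an explicit per-call parameter instead of a thread-local. The CallBudget class and its methods are invented for this example.

import java.util.concurrent.TimeUnit;

/** Hypothetical helper: track an operation's deadline and derive per-call timeouts from it. */
public final class CallBudget {
  private final long deadlineNanos;

  public CallBudget(long totalTimeoutMillis) {
    this.deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(totalTimeoutMillis);
  }

  /** Remaining milliseconds, e.g. a 60s budget minus a 2s failed attempt leaves 58s. */
  public long remainingMillis() {
    return Math.max(0L, TimeUnit.NANOSECONDS.toMillis(deadlineNanos - System.nanoTime()));
  }
}

Each retry would then be issued with remainingMillis() as its rpc timeout, rather than reading a timeout that another layer stashed in thread-local storage.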





[jira] [Updated] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-26 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10566:


Release Note: 
3 new settings are now available to configure the socket in the HBase client:
- connect timeout: hbase.ipc.client.socket.timeout.connect (milliseconds, 
default: 10 seconds)
- write timeout: hbase.ipc.client.socket.timeout.read (milliseconds, default: 
20 seconds)
- read timeout: hbase.ipc.client.socket.timeout.write (milliseconds, default: 
60 seconds)

ipc.socket.timeout is not used anymore.
The per-operation timeout is still controlled by hbase.rpc.timeout.


  was:
3 settings are now available to configure the socket in the HBase client:
- connect timeout: ipc.socket.timeout.connect (default: 10 seconds)
- write timeout: ipc.socket.timeout.read (default: 20 seconds)
- read timeout: ipc.socket.timeout.write (default: 60 seconds)

The per-operation timeout is still controlled by hbase.rpc.timeout.



 cleanup rpcTimeout in the client
 

 Key: HBASE-10566
 URL: https://issues.apache.org/jira/browse/HBASE-10566
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch, 
 10566.v3.patch


 There are two issues:
 1) A confusion between the socket timeout and the call timeout.
 Socket timeouts should be minimal: a default like 20 seconds, which could be
 lowered to single-digit timeouts for some apps: if we cannot write to the
 socket in 10 seconds, we have an issue. This is different from the total
 duration (send query + do query + receive query), which can be longer, as it
 can include remote calls on the server and so on. Today we have a single
 value, so it does not allow us to have low socket read timeouts.
 2) The timeout can differ between calls. Typically, if the total time,
 retries included, is 60 seconds and a call failed after 2 seconds, then the
 remaining budget is 58s. HBase does this today, but by hacking with a
 thread-local storage variable. It's a hack (it should have been a parameter
 of the methods; the TLS allowed it to bypass all the layers. Maybe protobuf
 makes this complicated, to be confirmed), and it does not really work
 anyway, because we can have multithreading issues (we use the updated rpc
 timeout of someone else, or we create a new BlockingRpcChannelImplementation
 with a random default timeout).
 Ideally, we could send the call timeout to the server as well: it would then
 be able to dismiss on its own the calls that it received but that got stuck
 in the request queue or in internal retries (on hdfs, for example).
 This will make the system more reactive to failure.
 I think we can solve this now, especially after 10525. The main issue is to
 find something that fits well with protobuf...
 Then it should be easy to have a pool of threads for writers and readers,
 rather than a single thread per region server as today.





[jira] [Updated] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-25 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10566:


Status: Open  (was: Patch Available)

 cleanup rpcTimeout in the client
 

 Key: HBASE-10566
 URL: https://issues.apache.org/jira/browse/HBASE-10566
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch, 
 10566.v3.patch


 There are two issues:
 1) A confusion between the socket timeout and the call timeout.
 Socket timeouts should be minimal: a default like 20 seconds, which could be
 lowered to single-digit timeouts for some apps: if we cannot write to the
 socket in 10 seconds, we have an issue. This is different from the total
 duration (send query + do query + receive query), which can be longer, as it
 can include remote calls on the server and so on. Today we have a single
 value, so it does not allow us to have low socket read timeouts.
 2) The timeout can differ between calls. Typically, if the total time,
 retries included, is 60 seconds and a call failed after 2 seconds, then the
 remaining budget is 58s. HBase does this today, but by hacking with a
 thread-local storage variable. It's a hack (it should have been a parameter
 of the methods; the TLS allowed it to bypass all the layers. Maybe protobuf
 makes this complicated, to be confirmed), and it does not really work
 anyway, because we can have multithreading issues (we use the updated rpc
 timeout of someone else, or we create a new BlockingRpcChannelImplementation
 with a random default timeout).
 Ideally, we could send the call timeout to the server as well: it would then
 be able to dismiss on its own the calls that it received but that got stuck
 in the request queue or in internal retries (on hdfs, for example).
 This will make the system more reactive to failure.
 I think we can solve this now, especially after 10525. The main issue is to
 find something that fits well with protobuf...
 Then it should be easy to have a pool of threads for writers and readers,
 rather than a single thread per region server as today.





[jira] [Updated] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-25 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10566:


Attachment: 10566.v3.patch

 cleanup rpcTimeout in the client
 

 Key: HBASE-10566
 URL: https://issues.apache.org/jira/browse/HBASE-10566
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch, 
 10566.v3.patch


 There are two issues:
 1) A confusion between the socket timeout and the call timeout.
 Socket timeouts should be minimal: a default like 20 seconds, which could be
 lowered to single-digit timeouts for some apps: if we cannot write to the
 socket in 10 seconds, we have an issue. This is different from the total
 duration (send query + do query + receive query), which can be longer, as it
 can include remote calls on the server and so on. Today we have a single
 value, so it does not allow us to have low socket read timeouts.
 2) The timeout can differ between calls. Typically, if the total time,
 retries included, is 60 seconds and a call failed after 2 seconds, then the
 remaining budget is 58s. HBase does this today, but by hacking with a
 thread-local storage variable. It's a hack (it should have been a parameter
 of the methods; the TLS allowed it to bypass all the layers. Maybe protobuf
 makes this complicated, to be confirmed), and it does not really work
 anyway, because we can have multithreading issues (we use the updated rpc
 timeout of someone else, or we create a new BlockingRpcChannelImplementation
 with a random default timeout).
 Ideally, we could send the call timeout to the server as well: it would then
 be able to dismiss on its own the calls that it received but that got stuck
 in the request queue or in internal retries (on hdfs, for example).
 This will make the system more reactive to failure.
 I think we can solve this now, especially after 10525. The main issue is to
 find something that fits well with protobuf...
 Then it should be easy to have a pool of threads for writers and readers,
 rather than a single thread per region server as today.





[jira] [Updated] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-25 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10566:


Status: Patch Available  (was: Open)

 cleanup rpcTimeout in the client
 

 Key: HBASE-10566
 URL: https://issues.apache.org/jira/browse/HBASE-10566
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch, 
 10566.v3.patch


 There are two issues:
 1) A confusion between the socket timeout and the call timeout.
 Socket timeouts should be minimal: a default like 20 seconds, which could be
 lowered to single-digit timeouts for some apps: if we cannot write to the
 socket in 10 seconds, we have an issue. This is different from the total
 duration (send query + do query + receive query), which can be longer, as it
 can include remote calls on the server and so on. Today we have a single
 value, so it does not allow us to have low socket read timeouts.
 2) The timeout can differ between calls. Typically, if the total time,
 retries included, is 60 seconds and a call failed after 2 seconds, then the
 remaining budget is 58s. HBase does this today, but by hacking with a
 thread-local storage variable. It's a hack (it should have been a parameter
 of the methods; the TLS allowed it to bypass all the layers. Maybe protobuf
 makes this complicated, to be confirmed), and it does not really work
 anyway, because we can have multithreading issues (we use the updated rpc
 timeout of someone else, or we create a new BlockingRpcChannelImplementation
 with a random default timeout).
 Ideally, we could send the call timeout to the server as well: it would then
 be able to dismiss on its own the calls that it received but that got stuck
 in the request queue or in internal retries (on hdfs, for example).
 This will make the system more reactive to failure.
 I think we can solve this now, especially after 10525. The main issue is to
 find something that fits well with protobuf...
 Then it should be easy to have a pool of threads for writers and readers,
 rather than a single thread per region server as today.





[jira] [Updated] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-25 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10566:


  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk, thanks for the review, Nick & Stack.
Release notes and derivative jiras are coming.

 cleanup rpcTimeout in the client
 

 Key: HBASE-10566
 URL: https://issues.apache.org/jira/browse/HBASE-10566
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch, 
 10566.v3.patch


 There are two issues:
 1) A confusion between the socket timeout and the call timeout.
 Socket timeouts should be minimal: a default like 20 seconds, which could be
 lowered to single-digit timeouts for some apps: if we cannot write to the
 socket in 10 seconds, we have an issue. This is different from the total
 duration (send query + do query + receive query), which can be longer, as it
 can include remote calls on the server and so on. Today we have a single
 value, so it does not allow us to have low socket read timeouts.
 2) The timeout can differ between calls. Typically, if the total time,
 retries included, is 60 seconds and a call failed after 2 seconds, then the
 remaining budget is 58s. HBase does this today, but by hacking with a
 thread-local storage variable. It's a hack (it should have been a parameter
 of the methods; the TLS allowed it to bypass all the layers. Maybe protobuf
 makes this complicated, to be confirmed), and it does not really work
 anyway, because we can have multithreading issues (we use the updated rpc
 timeout of someone else, or we create a new BlockingRpcChannelImplementation
 with a random default timeout).
 Ideally, we could send the call timeout to the server as well: it would then
 be able to dismiss on its own the calls that it received but that got stuck
 in the request queue or in internal retries (on hdfs, for example).
 This will make the system more reactive to failure.
 I think we can solve this now, especially after 10525. The main issue is to
 find something that fits well with protobuf...
 Then it should be easy to have a pool of threads for writers and readers,
 rather than a single thread per region server as today.





[jira] [Updated] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-25 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10566:


Release Note: 
3 settings are now available to configure the socket in the HBase client:
- connect timeout: ipc.socket.timeout.connect (default: 10 seconds)
- write timeout: ipc.socket.timeout.read (default: 20 seconds)
- read timeout: ipc.socket.timeout.write (default: 60 seconds)

The per-operation timeout is still controlled by hbase.rpc.timeout.


 cleanup rpcTimeout in the client
 

 Key: HBASE-10566
 URL: https://issues.apache.org/jira/browse/HBASE-10566
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch, 
 10566.v3.patch


 There are two issues:
 1) A confusion between the socket timeout and the call timeout.
 Socket timeouts should be minimal: a default like 20 seconds, which could be
 lowered to single-digit timeouts for some apps: if we cannot write to the
 socket in 10 seconds, we have an issue. This is different from the total
 duration (send query + do query + receive query), which can be longer, as it
 can include remote calls on the server and so on. Today we have a single
 value, so it does not allow us to have low socket read timeouts.
 2) The timeout can differ between calls. Typically, if the total time,
 retries included, is 60 seconds and a call failed after 2 seconds, then the
 remaining budget is 58s. HBase does this today, but by hacking with a
 thread-local storage variable. It's a hack (it should have been a parameter
 of the methods; the TLS allowed it to bypass all the layers. Maybe protobuf
 makes this complicated, to be confirmed), and it does not really work
 anyway, because we can have multithreading issues (we use the updated rpc
 timeout of someone else, or we create a new BlockingRpcChannelImplementation
 with a random default timeout).
 Ideally, we could send the call timeout to the server as well: it would then
 be able to dismiss on its own the calls that it received but that got stuck
 in the request queue or in internal retries (on hdfs, for example).
 This will make the system more reactive to failure.
 I think we can solve this now, especially after 10525. The main issue is to
 find something that fits well with protobuf...
 Then it should be easy to have a pool of threads for writers and readers,
 rather than a single thread per region server as today.





[jira] [Updated] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-24 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10566:


Status: Patch Available  (was: Open)

 cleanup rpcTimeout in the client
 

 Key: HBASE-10566
 URL: https://issues.apache.org/jira/browse/HBASE-10566
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10566.sample.patch, 10566.v1.patch


 There are two issues:
 1) A confusion between the socket timeout and the call timeout.
 Socket timeouts should be minimal: a default like 20 seconds, which could be
 lowered to single-digit timeouts for some apps: if we cannot write to the
 socket in 10 seconds, we have an issue. This is different from the total
 duration (send query + do query + receive query), which can be longer, as it
 can include remote calls on the server and so on. Today we have a single
 value, so it does not allow us to have low socket read timeouts.
 2) The timeout can differ between calls. Typically, if the total time,
 retries included, is 60 seconds and a call failed after 2 seconds, then the
 remaining budget is 58s. HBase does this today, but by hacking with a
 thread-local storage variable. It's a hack (it should have been a parameter
 of the methods; the TLS allowed it to bypass all the layers. Maybe protobuf
 makes this complicated, to be confirmed), and it does not really work
 anyway, because we can have multithreading issues (we use the updated rpc
 timeout of someone else, or we create a new BlockingRpcChannelImplementation
 with a random default timeout).
 Ideally, we could send the call timeout to the server as well: it would then
 be able to dismiss on its own the calls that it received but that got stuck
 in the request queue or in internal retries (on hdfs, for example).
 This will make the system more reactive to failure.
 I think we can solve this now, especially after 10525. The main issue is to
 find something that fits well with protobuf...
 Then it should be easy to have a pool of threads for writers and readers,
 rather than a single thread per region server as today.





[jira] [Updated] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-24 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10566:


Attachment: 10566.v1.patch

 cleanup rpcTimeout in the client
 

 Key: HBASE-10566
 URL: https://issues.apache.org/jira/browse/HBASE-10566
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10566.sample.patch, 10566.v1.patch


 There are two issues:
 1) A confusion between the socket timeout and the call timeout.
 Socket timeouts should be minimal: a default like 20 seconds, which could be
 lowered to single-digit timeouts for some apps: if we cannot write to the
 socket in 10 seconds, we have an issue. This is different from the total
 duration (send query + do query + receive query), which can be longer, as it
 can include remote calls on the server and so on. Today we have a single
 value, so it does not allow us to have low socket read timeouts.
 2) The timeout can differ between calls. Typically, if the total time,
 retries included, is 60 seconds and a call failed after 2 seconds, then the
 remaining budget is 58s. HBase does this today, but by hacking with a
 thread-local storage variable. It's a hack (it should have been a parameter
 of the methods; the TLS allowed it to bypass all the layers. Maybe protobuf
 makes this complicated, to be confirmed), and it does not really work
 anyway, because we can have multithreading issues (we use the updated rpc
 timeout of someone else, or we create a new BlockingRpcChannelImplementation
 with a random default timeout).
 Ideally, we could send the call timeout to the server as well: it would then
 be able to dismiss on its own the calls that it received but that got stuck
 in the request queue or in internal retries (on hdfs, for example).
 This will make the system more reactive to failure.
 I think we can solve this now, especially after 10525. The main issue is to
 find something that fits well with protobuf...
 Then it should be easy to have a pool of threads for writers and readers,
 rather than a single thread per region server as today.





[jira] [Updated] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-24 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10566:


Status: Open  (was: Patch Available)

 cleanup rpcTimeout in the client
 

 Key: HBASE-10566
 URL: https://issues.apache.org/jira/browse/HBASE-10566
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch


 There are two issues:
 1) A confusion between the socket timeout and the call timeout.
 Socket timeouts should be minimal: a default like 20 seconds, which could be
 lowered to single-digit timeouts for some apps: if we cannot write to the
 socket in 10 seconds, we have an issue. This is different from the total
 duration (send query + do query + receive query), which can be longer, as it
 can include remote calls on the server and so on. Today we have a single
 value, so it does not allow us to have low socket read timeouts.
 2) The timeout can differ between calls. Typically, if the total time,
 retries included, is 60 seconds and a call failed after 2 seconds, then the
 remaining budget is 58s. HBase does this today, but by hacking with a
 thread-local storage variable. It's a hack (it should have been a parameter
 of the methods; the TLS allowed it to bypass all the layers. Maybe protobuf
 makes this complicated, to be confirmed), and it does not really work
 anyway, because we can have multithreading issues (we use the updated rpc
 timeout of someone else, or we create a new BlockingRpcChannelImplementation
 with a random default timeout).
 Ideally, we could send the call timeout to the server as well: it would then
 be able to dismiss on its own the calls that it received but that got stuck
 in the request queue or in internal retries (on hdfs, for example).
 This will make the system more reactive to failure.
 I think we can solve this now, especially after 10525. The main issue is to
 find something that fits well with protobuf...
 Then it should be easy to have a pool of threads for writers and readers,
 rather than a single thread per region server as today.





[jira] [Updated] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-24 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10566:


Status: Patch Available  (was: Open)

 cleanup rpcTimeout in the client
 

 Key: HBASE-10566
 URL: https://issues.apache.org/jira/browse/HBASE-10566
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10566.sample.patch, 10566.v1.patch, 10566.v2.patch


 There are two issues:
 1) A confusion between the socket timeout and the call timeout.
 Socket timeouts should be minimal: a default like 20 seconds, which could be
 lowered to single-digit timeouts for some apps: if we cannot write to the
 socket in 10 seconds, we have an issue. This is different from the total
 duration (send query + do query + receive query), which can be longer, as it
 can include remote calls on the server and so on. Today we have a single
 value, so it does not allow us to have low socket read timeouts.
 2) The timeout can differ between calls. Typically, if the total time,
 retries included, is 60 seconds and a call failed after 2 seconds, then the
 remaining budget is 58s. HBase does this today, but by hacking with a
 thread-local storage variable. It's a hack (it should have been a parameter
 of the methods; the TLS allowed it to bypass all the layers. Maybe protobuf
 makes this complicated, to be confirmed), and it does not really work
 anyway, because we can have multithreading issues (we use the updated rpc
 timeout of someone else, or we create a new BlockingRpcChannelImplementation
 with a random default timeout).
 Ideally, we could send the call timeout to the server as well: it would then
 be able to dismiss on its own the calls that it received but that got stuck
 in the request queue or in internal retries (on hdfs, for example).
 This will make the system more reactive to failure.
 I think we can solve this now, especially after 10525. The main issue is to
 find something that fits well with protobuf...
 Then it should be easy to have a pool of threads for writers and readers,
 rather than a single thread per region server as today.





[jira] [Updated] (HBASE-10566) cleanup rpcTimeout in the client

2014-02-19 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-10566:


Attachment: 10566.sample.patch

 cleanup rpcTimeout in the client
 

 Key: HBASE-10566
 URL: https://issues.apache.org/jira/browse/HBASE-10566
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.99.0
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
 Fix For: 0.99.0

 Attachments: 10566.sample.patch


 There are two issues:
 1) A confusion between the socket timeout and the call timeout.
 Socket timeouts should be minimal: a default like 20 seconds, which could be
 lowered to single-digit timeouts for some apps: if we cannot write to the
 socket in 10 seconds, we have an issue. This is different from the total
 duration (send query + do query + receive query), which can be longer, as it
 can include remote calls on the server and so on. Today we have a single
 value, so it does not allow us to have low socket read timeouts.
 2) The timeout can differ between calls. Typically, if the total time,
 retries included, is 60 seconds and a call failed after 2 seconds, then the
 remaining budget is 58s. HBase does this today, but by hacking with a
 thread-local storage variable. It's a hack (it should have been a parameter
 of the methods; the TLS allowed it to bypass all the layers. Maybe protobuf
 makes this complicated, to be confirmed), and it does not really work
 anyway, because we can have multithreading issues (we use the updated rpc
 timeout of someone else, or we create a new BlockingRpcChannelImplementation
 with a random default timeout).
 Ideally, we could send the call timeout to the server as well: it would then
 be able to dismiss on its own the calls that it received but that got stuck
 in the request queue or in internal retries (on hdfs, for example).
 This will make the system more reactive to failure.
 I think we can solve this now, especially after 10525. The main issue is to
 find something that fits well with protobuf...
 Then it should be easy to have a pool of threads for writers and readers,
 rather than a single thread per region server as today.


