[jira] [Commented] (HBASE-11608) Add synchronous split
[ https://issues.apache.org/jira/browse/HBASE-11608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14085219#comment-14085219 ]

Jerry He commented on HBASE-11608:
--

Yes. I was looking at the current synchronous HBaseAdmin APIs and how they are implemented. There are two different approaches.

+1+ Inside the API implementation, the first part sends the request asynchronously. The second part then loops, checking the status via a status inquiry (a direct RPC inquiry, or an indirect inquiry through the meta status, etc.) until the request is completed. Examples are deleteTable, disableTable and snapshot. The identifier used to poll the status is the table name or the snapshot name in these cases.

+2+ The second approach is a real synchronous call. The server side will not return until the work is really completed. An example is flush.

Other HBaseAdmin APIs are asynchronous, for example the current compact and split APIs. The client only submits the request, without waiting for status at all.

For this JIRA, the 2nd approach is cleaner. We could do the 1st approach instead. For example, the client asynchronously submits the split request, then loops until the split completes, polling the meta status to wait for the parent to split and the daughters to come online. This is similar to what deleteTable and disableTable do. The identifier used to poll and query the status is the parent region name. But this approach will be a little messy.

Add synchronous split
-
Key: HBASE-11608
URL: https://issues.apache.org/jira/browse/HBASE-11608
Project: HBase
Issue Type: New Feature
Components: Admin
Affects Versions: 0.99.0, 0.98.5
Reporter: Jerry He

Users have been asking for this. We have an internal requirement for this as well. The goal is to provide an Admin API (and shell command) so that users can request to split a region or table and get the split completion result synchronously.

-- This message was sent by Atlassian JIRA (v6.2#6252)
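The first approach in the comment above (submit asynchronously, then poll a status check until the work is done) could be sketched roughly as follows. This is a generic illustration, not HBase code: `PollingSketch`, `submitAndWait` and its parameters are hypothetical names, and the status check is passed in as a plain callback rather than a real RPC or meta lookup.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of "submit asynchronously, then poll"; not part of HBase.
class PollingSketch {

    // submit:        fires the asynchronous request (assumed to return immediately)
    // isDone:        status inquiry, e.g. keyed on a table or parent region name
    // pollMillis:    pause between two status inquiries
    // timeoutMillis: give up after this long
    static void submitAndWait(Runnable submit, BooleanSupplier isDone,
                              long pollMillis, long timeoutMillis)
            throws InterruptedException, TimeoutException {
        submit.run();
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!isDone.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("operation did not complete in " + timeoutMillis + " ms");
            }
            Thread.sleep(pollMillis);
        }
    }
}
```

Note that the caller only ever learns "done" or "timed out" here; a failure on the server has to be inferred from the status source, which is part of why this approach is described as messy.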
[jira] [Commented] (HBASE-11608) Add synchronous split
[ https://issues.apache.org/jira/browse/HBASE-11608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14085250#comment-14085250 ]

Matteo Bertozzi commented on HBASE-11608:
-

The current admin implementation is kind of broken. Since we lack the infrastructure to build proper sync operations, over time we implemented workarounds to obtain sync behavior. The problems are:
* If you have the sync call on the server and the operation is long: 1) you are blocking all the other rpc handlers. 2) in case of a failure on the client side or the server side, you don't know the state of the operation.
* If you poll by relying on something like META, we are stuck keeping that execution order and that META format to preserve client compatibility. (One solution to avoid this problem is to have something like the isDone method for snapshot, where the server gives you the status of the operation. But you end up having tons of isDone methods, one for each sync operation that you want.)

I'd say that we should stop adding features that can't be implemented in a straightforward way and that may prevent future rolling upgrade compatibility. HBASE-5487 and HBASE-9864 should solve all these problems with long operations and operations across the cluster.
[jira] [Commented] (HBASE-11608) Add synchronous split
[ https://issues.apache.org/jira/browse/HBASE-11608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14085436#comment-14085436 ]

Jerry He commented on HBASE-11608:
--

Linked the issues here: HBASE-5487, HBASE-9864 and HBASE-10544.
[jira] [Commented] (HBASE-11608) Add synchronous split
[ https://issues.apache.org/jira/browse/HBASE-11608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14082941#comment-14082941 ]

Nick Dimiduk commented on HBASE-11608:
--

I was thinking of the case when the connection between client and RS is dropped. When that happens, the result of the split operation cannot be discovered, other than by side-effect. There are many other meta operations like this where the client cannot know a result except by side-effect. As I understand it, resolving these kinds of issues in a general case is the primary motivation behind HBASE-5487.
[jira] [Commented] (HBASE-11608) Add synchronous split
[ https://issues.apache.org/jira/browse/HBASE-11608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14078179#comment-14078179 ]

Jerry He commented on HBASE-11608:
--

Linked to an old issue that raised this request.
[jira] [Commented] (HBASE-11608) Add synchronous split
[ https://issues.apache.org/jira/browse/HBASE-11608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14078215#comment-14078215 ]

Jerry He commented on HBASE-11608:
--

Carrying over the design comment from HBASE-2949 so that we can continue the discussion here.

+Current implementation on the client side+

In HBaseAdmin.split():
1. Input is a region name
1a. If a split key is given, we send a split request to the region server hosting the region containing the key. Asynchronous operation.
1b. If no split key is given, we send a split request to the region server hosting the region, to split based on the split policy. Asynchronous operation.
2. Input is a table name
2a. If a split key is given, we send a split request to the region server hosting the region containing the key. Asynchronous operation.
2b. If no split key is given, we send split requests in a loop to each (region, rs) pair for all the regions of the table, to split based on the split policy. Asynchronous operation.

+Current implementation on the region server side+

HRegionServer -> RSRpcServices.splitRegion() -> compactSplitThread.requestSplit() --> CompactSplitThread.splits.execute(new SplitRequest())
CompactSplitThread.splits is a ThreadPoolExecutor. SplitRequest is a Runnable that does all the split work.

+Goal+

Backward compatible, no impact on existing system-triggered splits, minimum overall impact, more??

+Proposed change on the region server side+

Provide a synchronous compactSplitThread.requestSplitSynchronous() --> CompactSplitThread.splits.submit(new SplitRequest()) and wait for the 'future' to complete. SplitRequest is currently a Runnable. Possibly change it to Callable<Void> so that we can propagate up any exceptions thrown during the split operation, as we want to be notified of any errors/exceptions synchronously as well.
HRegionServer -> RSRpcServices.splitRegionSynchronous() will use the new requestSplitSynchronous().
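The proposed server-side change (submit a Callable<Void> instead of executing a Runnable, then wait on the future) could look roughly like the sketch below. It is a standalone illustration using only java.util.concurrent, not the actual CompactSplitThread: the class name `SyncSplitSketch`, the single-thread pool, and the choice to rethrow failures as IOException are assumptions made for the example.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the region server's split executor; not HBase code.
class SyncSplitSketch {
    // Assumed pool size of 1, purely for illustration.
    private final ExecutorService splits = Executors.newFixedThreadPool(1);

    // Existing style: fire and forget. Any exception stays inside the worker thread.
    void requestSplit(Runnable splitRequest) {
        splits.execute(splitRequest);
    }

    // Proposed style: submit a Callable<Void> and block until it finishes,
    // surfacing any exception thrown by the split work to the caller.
    void requestSplitSynchronous(Callable<Void> splitRequest)
            throws IOException, InterruptedException {
        Future<Void> future = splits.submit(splitRequest);
        try {
            future.get(); // blocks until the split work has run
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            throw new IOException("split failed", cause);
        }
    }

    void shutdown() {
        splits.shutdown();
    }
}
```

Because both methods share the same executor, the existing system-triggered path through requestSplit() is untouched; only callers that opt in to requestSplitSynchronous() block.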
+On the HBaseAdmin side+

If we go synchronous in HBaseAdmin.split(), one use case will become less efficient, i.e. the 2b case from above:
bq. 2b. If no split key is given, we send split requests in a loop to each (region, rs) pair for all the regions of the table, to split based on the split policy.
This is because in this case each (region, rs) pair will be requested serially and synchronously.

We'll provide a HBaseAdmin.splitSynchronous() and keep the current asynchronous HBaseAdmin.split(). Users are kept aware of the choices they have. In the future HBaseAdmin.splitSynchronous() can be improved (e.g. by using a Global procedure to split regions in parallel).

This is my initial thinking. Your comments please: whether this is doable, whether there are any concerns, whether I missed or misunderstood anything. I am sure I did :-)
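To make the cost of the 2b case concrete, here is a small sketch contrasting a serial loop of blocking per-region splits with a parallel variant. The parallel version uses a plain client-side thread pool, which is a substitute chosen for illustration and not the Global procedure mechanism mentioned in the comment. `TableSplitSketch`, both method names, and the `blockingSplit` callback are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch; region names are plain strings and blockingSplit
// stands in for a synchronous per-region split call.
class TableSplitSketch {

    // Serial: total time is roughly the sum of all per-region split times.
    static void splitSerially(List<String> regions, Consumer<String> blockingSplit) {
        for (String region : regions) {
            blockingSplit.accept(region);
        }
    }

    // Parallel: still synchronous for the caller, but the per-region waits overlap.
    static void splitInParallel(List<String> regions, Consumer<String> blockingSplit, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (String region : regions) {
                tasks.add(() -> {
                    blockingSplit.accept(region);
                    return null;
                });
            }
            // invokeAll returns once every task has finished; get() rethrows failures.
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
    }
}
```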
[jira] [Commented] (HBASE-11608) Add synchronous split
[ https://issues.apache.org/jira/browse/HBASE-11608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14078343#comment-14078343 ]

Nick Dimiduk commented on HBASE-11608:
--

Nice feature [~jinghe]. How will your future approach handle intermittent connectivity issues between client and RS? Have you looked at HBASE-5487?
[jira] [Commented] (HBASE-11608) Add synchronous split
[ https://issues.apache.org/jira/browse/HBASE-11608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14078597#comment-14078597 ]

Jerry He commented on HBASE-11608:
--

Hi, [~ndimiduk]

I was aware of the grand plan in HBASE-5487, and just went through it again :-)

bq. How will your future approach handle intermittent connectivity issues between client and RS?

If the client issues a synchronous split request, the client RPC connection will be held for a longer period of time. Or do you mean: in the future, if we want to use Master-coordinated tasks to split the table synchronously, how do we deal with master and/or RS failures?