[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469986#comment-13469986 ] Hudson commented on HBASE-6165: --- Integrated in HBase-0.92-security #143 (See [https://builds.apache.org/job/HBase-0.92-security/143/]) HBASE-6724 Port HBASE-6165 'Replication can overrun .META. scans on cluster re-start' to 0.92 (Revision 1381451) Result = FAILURE tedyu : Files : * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/HConstants.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java Replication can overrun .META. scans on cluster re-start Key: HBASE-6165 URL: https://issues.apache.org/jira/browse/HBASE-6165 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Himanshu Vashishtha Fix For: 0.94.2, 0.96.0 Attachments: 6165-v6.txt, HBase-6165-94-v1.patch, HBase-6165-94-v2.patch, HBase-6165-v1.patch, HBase-6165-v2.patch, HBase-6165-v3.patch, HBase-6165-v4.patch, HBase-6165-v5.patch When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452491#comment-13452491 ] Jean-Daniel Cryans commented on HBASE-6165: --- [~whitingj], originally replication was using the normal handlers and was just deadlocking the clusters in a different way. ReplicationSink uses the HBase client, which can block for ungodly amounts of time, so it would fill up the handlers and the RS would stop serving requests. HBASE-6550 changed the latter a bit by setting low timeouts via replication-specific client-side configuration parameters (if it was using the normal client configurations it would also affect all the other clients). With HBASE-6165 it's even safer since replication is sandboxed.
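To illustrate what "sandboxed" means in the comment above, here is a minimal sketch that uses only java.util.concurrent. It is not the HBaseServer code from the patch: the class name, the CallType enum, and the pool sizes are all hypothetical. It only shows the general idea that calls which block in one handler pool cannot starve a separate pool.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SandboxedHandlers {
  // Hypothetical call classes; the real server decides priority per RPC.
  enum CallType { NORMAL, HIGH_PRIORITY, REPLICATION }

  private final ExecutorService normal;
  private final ExecutorService priority;
  private final ExecutorService replication;

  SandboxedHandlers(int normalCount, int priorityCount, int replicationCount) {
    normal = Executors.newFixedThreadPool(normalCount);
    priority = Executors.newFixedThreadPool(priorityCount);
    replication = Executors.newFixedThreadPool(replicationCount);
  }

  // Each call type gets its own pool, so one type cannot occupy another's threads.
  ExecutorService poolFor(CallType type) {
    switch (type) {
      case HIGH_PRIORITY: return priority;
      case REPLICATION: return replication;
      default: return normal;
    }
  }

  void shutdownNow() {
    normal.shutdownNow();
    priority.shutdownNow();
    replication.shutdownNow();
  }

  // Blocks every replication handler, then checks that a high-priority call still runs.
  static boolean priorityCallSurvivesStuckReplication() {
    SandboxedHandlers server = new SandboxedHandlers(2, 2, 2);
    CountDownLatch release = new CountDownLatch(1);
    CountDownLatch priorityDone = new CountDownLatch(1);
    try {
      // Submit more blocked replication calls than there are replication handlers.
      for (int i = 0; i < 4; i++) {
        server.poolFor(CallType.REPLICATION).execute(() -> {
          try {
            release.await();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      // Stand-in for something like a .META. scan arriving while replication is stuck.
      server.poolFor(CallType.HIGH_PRIORITY).execute(priorityDone::countDown);
      return priorityDone.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      release.countDown();
      server.shutdownNow();
    }
  }
}
```

If the replication calls were routed to the same pool as the high-priority call, the check would time out once the pool was full, which is roughly the failure mode this issue describes.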
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452552#comment-13452552 ] Jeff Whiting commented on HBASE-6165: - @stack and @jdcryans Thanks for the explanation. I can see how it would deadlock on itself. I also found HBASE-3401 which talks about the deadlock. We patched our cdh4 cluster with HBASE-6724 and it has been running much smoother.
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13452557#comment-13452557 ] Himanshu Vashishtha commented on HBASE-6165: [~whitingj] Specifically, the replication-specific jira about deadlocking on normal handlers is HBASE-4280.
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449245#comment-13449245 ] Jeff Whiting commented on HBASE-6165: - I may be a little late to the party, but why is replication using any kind of higher-than-normal priority handlers? It looks like we all agree that they shouldn't be using the high priority handlers. It looks like they now have their own medium priority handlers. But I don't see an argument as to why they don't just use the normal priority handlers.
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449382#comment-13449382 ] Hudson commented on HBASE-6165: --- Integrated in HBase-0.92 #558 (See [https://builds.apache.org/job/HBase-0.92/558/]) HBASE-6724 Port HBASE-6165 'Replication can overrun .META. scans on cluster re-start' to 0.92 (Revision 1381451) Result = FAILURE Tedyu : Files : * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/HConstants.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java * /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13449409#comment-13449409 ] stack commented on HBASE-6165: -- @Jeff IIRC they need to be on a channel other than the user priority queue because they can overwhelm user loadings (e.g. big cluster replication into small cluster). We've been learning a bunch of late about replicating, and it's fair to say that some pieces need a bit of a rethink to make them more robust around cases such as the aforementioned large into small, or one we ran into ourselves recently where we couldn't start the small cluster because the high priority handlers were all occupied by replication soon after startup (this patch would help w/ that scenario). I see that this patch has just been backported to 0.92 -- hopefully that will be of help to you in your current predicament.
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448255#comment-13448255 ] Hudson commented on HBASE-6165: --- Integrated in HBase-0.94-security #51 (See [https://builds.apache.org/job/HBase-0.94-security/51/]) HBASE-6165 Replication can overrun .META. scans on cluster re-start (Himanshu Vashishtha) (Revision 1379236) Result = FAILURE larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448324#comment-13448324 ] Hudson commented on HBASE-6165: --- Integrated in HBase-0.94-security-on-Hadoop-23 #7 (See [https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/7/]) HBASE-6165 Replication can overrun .META. scans on cluster re-start (Himanshu Vashishtha) (Revision 1379236) Result = FAILURE larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445447#comment-13445447 ] Lars Hofhansl commented on HBASE-6165: -- The canonical repository is the SVN repository, Himanshu.
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445464#comment-13445464 ] Himanshu Vashishtha commented on HBASE-6165: good to know; will set up a svn/eclipse environment.
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445465#comment-13445465 ] Lars Hofhansl commented on HBASE-6165: -- I'll make a patch for now. For folks who like git, svn is a pain (or so I heard) :)
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445498#comment-13445498 ] Himanshu Vashishtha commented on HBASE-6165: Thanks for the final patch Lars :)
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445614#comment-13445614 ] Hudson commented on HBASE-6165: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #155 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/155/]) HBASE-6165 Replication can overrun .META. scans on cluster re-start (Himanshu Vashishtha) (Revision 1379235) Result = FAILURE larsh : Files : * /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestPriorityRpc.java
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445641#comment-13445641 ] Hudson commented on HBASE-6165: --- Integrated in HBase-0.94 #443 (See [https://builds.apache.org/job/HBase-0.94/443/]) HBASE-6165 Replication can overrun .META. scans on cluster re-start (Himanshu Vashishtha) (Revision 1379236) Result = SUCCESS larsh : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/HConstants.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13442786#comment-13442786 ] Jean-Daniel Cryans commented on HBASE-6165: --- FWIW the v4 patch really doesn't apply on 0.94:
{noformat}
su-jdcryans-2:hbase-git-su jdcryans$ patch -p1 -F 10 --dry-run HBase-6165-v4.patch
patching file src/main/java/org/apache/hadoop/hbase/HConstants.java
Hunk #1 succeeded at 650 with fuzz 2 (offset -42 lines).
patching file src/main/java/org/apache/hadoop/hbase/ipc/HBaseRpcMetrics.java
Hunk #1 succeeded at 98 (offset -11 lines).
patching file src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
Hunk #1 succeeded at 225 (offset -51 lines).
Hunk #2 succeeded at 1304 (offset -360 lines).
Hunk #3 succeeded at 1335 with fuzz 1 (offset -414 lines).
Hunk #4 succeeded at 1356 (offset -415 lines).
Hunk #5 succeeded at 1526 (offset -405 lines).
Hunk #6 succeeded at 1630 with fuzz 3 (offset -415 lines).
Hunk #7 succeeded at 1652 (offset -423 lines).
Hunk #8 succeeded at 1664 (offset -423 lines).
patching file src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
Hunk #1 succeeded at 449 with fuzz 2 (offset 153 lines).
Hunk #2 FAILED at 658.
Hunk #3 succeeded at 486 (offset -87 lines).
Hunk #4 succeeded at 504 (offset -87 lines).
Hunk #5 succeeded at 520 (offset -87 lines).
Hunk #6 succeeded at 536 (offset -87 lines).
Hunk #7 succeeded at 3159 (offset 1061 lines).
Hunk #8 succeeded at 3170 with fuzz 1 (offset 1059 lines).
Hunk #9 succeeded at 3630 with fuzz 3 (offset 529 lines).
Hunk #10 FAILED at 3836.
Hunk #11 FAILED at 3883.
Hunk #12 FAILED at 3911.
Hunk #13 FAILED at 3998.
Hunk #14 FAILED at 4037.
Hunk #15 FAILED at 4068.
Hunk #16 FAILED at 4097.
Hunk #17 FAILED at 4131.
9 out of 17 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java.rej
{noformat}
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13442790#comment-13442790 ] Himanshu Vashishtha commented on HBASE-6165: The above patch was for trunk; will upload a 0.94 one.
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13442868#comment-13442868 ] Hadoop QA commented on HBASE-6165: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12542698/HBase-6165-94-v1.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2710//console This message is automatically generated.
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13437478#comment-13437478 ] Hadoop QA commented on HBASE-6165: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12541517/HBase-6165-v4.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 8 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2619//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2619//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2619//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2619//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2619//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2619//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2619//console This message is automatically generated.
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13437450#comment-13437450 ] Hadoop QA commented on HBASE-6165: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12541512/HBase-6165-v3.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings). -1 findbugs. The patch appears to introduce 8 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestFromClientSide Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2616//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2616//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2616//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2616//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2616//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2616//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2616//console This message is automatically generated.
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13437452#comment-13437452 ] Zhihong Ted Yu commented on HBASE-6165: ---
Patch v3 looks clean. nit:
{code}
+if(handlers != null) {
+ for(Handler h : handlers) {
{code}
Space should be added immediately before '('
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433313#comment-13433313 ] Himanshu Vashishtha commented on HBASE-6165: So, shall I upload with a positive default value for the number of custom handlers then? For the naming of existing handlers, can I open another jira? Thoughts?
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433327#comment-13433327 ] Zhihong Ted Yu commented on HBASE-6165: --- Sounds good. Consider renaming hbase.regionserver.custom.priority.handler.count to hbase.regionserver.custom.handler.count
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433690#comment-13433690 ] Zhihong Ted Yu commented on HBASE-6165: --- From https://builds.apache.org/job/PreCommit-HBASE-Build/2556/console: {code} /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/dev-support/test-patch.sh: line 353: 393 Aborted {code}
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433700#comment-13433700 ] Himanshu Vashishtha commented on HBASE-6165: On current trunk (with commit 7b9cbf0c0b35468591b3a1cf5c93951461590f8c), it applied cleanly. Shall I upload again?
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433703#comment-13433703 ] Zhihong Ted Yu commented on HBASE-6165: --- Yes, please. An aborted test run is different from a compilation error.
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433707#comment-13433707 ] Elliott Clark commented on HBASE-6165: -- I still don't understand the naming. There's nothing custom about these handlers. They handle replication. REPLICATION_OPS, MISC_OPS, INTERNAL_OPS: any of those seems to convey more about the type of operations these threads will handle.
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433718#comment-13433718 ] Himanshu Vashishtha commented on HBASE-6165: @Elliott: I don't want to tie them to replication. As you see, they have a positive default value now, so it will not be correct to call them REPLICATION_OPS. Any method marked with CUSTOM_OPS will be handled by them. The nearest candidate to use this is the security-related methods, I think. MISC/INTERNAL doesn't convey anything specific either. I don't know, but CUSTOM still looks ok to me... :) But I will be glad to change to a more appropriate name. @Ted: What does that error mean, btw?
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433725#comment-13433725 ] Elliott Clark commented on HBASE-6165: -- It's not that custom doesn't convey enough meaning (I could live with that). Custom implies that there's been some modification from normal or stock. That is not the case. These handlers are there for things that are built in. Replication and security are core pieces of functionality. Naming things custom gives the impression that they are not as supported as other operations, which is not the case.
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433756#comment-13433756 ] Zhihong Ted Yu commented on HBASE-6165: --- @Himanshu: I don't know the root cause of the aborted QA run. w.r.t. queue naming, can I assume that misc(ellaneous) is acceptable to everyone?
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13433823#comment-13433823 ] Andrew Purtell commented on HBASE-6165: --- MISC doesn't have any meaning. Neither does custom. IMO, name these after what they actually do. If this is for replication, name it REPLICATION_QOS.
[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432014#comment-13432014 ] Elliott Clark commented on HBASE-6165: -- A better name is probably needed for the Queue. Custom doesn't really get across what can go into that qos level (replication). Since this starts 0 custom priority handlers by default it will add another undocumented step when enabling replication. We should either make the number of handlers start by default > 0, or have the number depend on if replication is enabled. Why choose the number 5 for the priority, given that the QOS_THRESHOLD is 10? (Even if the values are arbitrary, it seems like we should have some reason and a comment about the numbering scheme.) Thanks for doing this.
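To make the numbering question concrete, here is a minimal sketch of how a server could route a call to one of three queues by its QoS level. It is an illustration only and not the code in the attached patch: the class name, the queue names, the constant name MEDIUM_QOS, and the strict `>` comparison against the threshold are all assumptions. Only the values 10 and 5 come from the comment above.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration; not taken from HBaseServer or from the patch.
class QosRoutingSketch {
    // Values mentioned in the discussion: threshold 10, new level 5.
    static final int QOS_THRESHOLD = 10;
    static final int MEDIUM_QOS = 5; // assumed name for the level under discussion

    final BlockingQueue<String> priorityQueue = new LinkedBlockingQueue<>();
    final BlockingQueue<String> mediumQueue = new LinkedBlockingQueue<>();
    final BlockingQueue<String> generalQueue = new LinkedBlockingQueue<>();

    /** Places the call on a queue and returns that queue's name. */
    String dispatch(String call, int qosLevel) {
        if (qosLevel > QOS_THRESHOLD) {
            // Above the threshold: served by the priority (meta) handlers.
            priorityQueue.add(call);
            return "priority";
        }
        if (qosLevel == MEDIUM_QOS) {
            // Exactly the medium level: served by the separate handler set.
            mediumQueue.add(call);
            return "medium";
        }
        // Everything else: served by the regular handlers.
        generalQueue.add(call);
        return "general";
    }
}
```

Because each queue is drained by its own handler threads, calls piling up in the medium queue cannot occupy the handlers that serve the priority queue, whatever numeric value is chosen for the medium level.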
[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432015#comment-13432015 ] Lars Hofhansl commented on HBASE-6165: --
Patch looks good generally. A few comments:
# The naming is weird. These are not CustomQOS, but MediumQOS methods, right?
# Is there a way to generalize this to sets of Handlers with different priority (not important, though).
# By default now (if hbase.regionserver.custom.priority.handler.count is not set), replicateWALEntry would use non-priority handlers... Which is not right, I think. It should revert back to the current behavior in that case (which is to use the priorityQOS).
What I still do not understand... Does this problem always happen? Does it happen because replicateWALEntry takes too long to finish? Does this only happen when the slave is already degraded for other reasons? Should we also work on replicateWALEntry failing faster in case of problems (shorter/fewer retries, etc)?
[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432022#comment-13432022 ] Zhihong Ted Yu commented on HBASE-6165: --- w.r.t. default value for hbase.regionserver.custom.priority.handler.count, I agree with Lars and Elliott that the default should be > 0. Actually we should perform a check on the actual value: if user specifies 0 and either replication or security is enabled, we should raise the value to, say, 3.
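The check suggested above could look roughly like the following. This is a sketch under stated assumptions, not code from the patch: the class and method names are hypothetical, the floor of 3 is the "say, 3" from the comment, and treating negative values like 0 is an added assumption.

```java
// Hypothetical illustration of the suggested startup check.
class HandlerCountSketch {
    // "say, 3" in the comment above; the exact floor is open to discussion.
    static final int MIN_HANDLERS_WHEN_NEEDED = 3;

    /**
     * Returns the handler count to use for the separate handler set,
     * raising a zero (or, by assumption, negative) configured value when
     * a feature that relies on these handlers is enabled.
     */
    static int effectiveHandlerCount(int configured,
                                     boolean replicationEnabled,
                                     boolean securityEnabled) {
        boolean needed = replicationEnabled || securityEnabled;
        if (configured <= 0 && needed) {
            return MIN_HANDLERS_WHEN_NEEDED;
        }
        return Math.max(configured, 0);
    }
}
```

With this shape an operator who enables replication without touching the handler setting still gets a working handler pool, which addresses the "undocumented step" concern raised earlier in the thread.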
[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432023#comment-13432023 ] Elliott Clark commented on HBASE-6165: -- @Lars We had this happen when a large cluster is replicating to a small cluster. Source (Large Cluster) -> Sink (Small cluster). After the sink goes down or re-starts, the source waits for meta to come up. After that, lots of replicate wal edits are shipped to all the servers. So many, in fact, that the server holding meta does not have any handlers left to answer meta scans or edits.
[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432029#comment-13432029 ] Himanshu Vashishtha commented on HBASE-6165:
[~eclark]: I used custom, because the current naming scheme is not appropriate in my opinion (I started with medium/semi QOS, but then changed it to Custom). Using priority is kind of a misnomer as there is no priority as such, it's just a different set of handlers that is serving the requests. Though we call them priorityHandlers, etc, they are just like regular handlers but for meta operations. I think we should change their name to metaOpsHandlers (or metaHandlers). Yeah, I just picked a value between 0 and 10.
bq. Since this starts 0 custom priority handlers by default it will add another undocumented step when enabling replication. We should either make the number of handlers start by default > 0, or have the number depend on if replication is enabled.
I am ok with a > 0 default; don't think it should be tied to replication as they can be used for other methods too (such as Security, etc)
@Lars:
bq. The naming is weird. These are not CustomQOS, but MediumQOS methods, right?
Hope you find it rational now.
bq. By default now (if hbase.regionserver.custom.priority.handler.count is not set), replicateWALEntry would use non-priority handlers... Which is not right, I think. It should revert back to the current behavior in that case (which is to use the priorityQOS).
A > 0 default sounds good?
bq. What I still do not understand... Does this problem always happen? Does it happen because replicateWALEntry takes too long to finish? Does this only happen when the slave is already degraded for other reasons? Should we also work on replicateWALEntry failing faster in case of problems (shorter/fewer retries, etc)?
It can occur when the slave cluster is slow. And whenever it happens, it will make the entire cluster unresponsive. I have a patch which adds the fail-fast behavior in the sink and have been testing it too. It looks good so far. I tried creating a new JIRA but got an IOE while creating it (see INFRA-5131). Will attach the patch once it's created.
[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432045#comment-13432045 ] Lars Hofhansl commented on HBASE-6165: -- @Himanshu: Thanks. Yes makes sense. I like MetaHandlers. Re: failing fast: I think instead of using an HTablePool the sink should create a Connection and ThreadPool and then create HTable on demand using these (see: HBASE-4805), together with short timeouts and few retries.
[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432050#comment-13432050 ] Zhihong Ted Yu commented on HBASE-6165: --- +1 on shifting away from using HTablePool in the JIRA for fail-fast.
[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432053#comment-13432053 ] Himanshu Vashishtha commented on HBASE-6165: Lars, Ted and Elliott: Thanks for the feedback. @Lars: Changing the name is beyond the scope of this jira, no? Another jira for that? re: failfast: Yeah, the patch still uses HTablePool, but submits the batch in a threadpool (of ReplicationSink). Meanwhile, the handler keeps checking whether the client is still alive or not, while waiting for the task to finish. If the client is gone, it cancels the task. Also, ReplicationSink now has its own conf object, which it can decorate with its own timeout, number of retries, etc. Is there an open jira for ReplicationSink (can't create a jira yet)?
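The fail-fast pattern described above (submit the batch to a thread pool, poll for client liveness while waiting, cancel if the client is gone) can be sketched with plain java.util.concurrent. This is an illustration under assumptions and not the code from the fail-fast patch: the class and method names, the `clientAlive` callback, and the polling interval are all hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical illustration of the wait-and-cancel loop in a handler.
class FailFastSinkSketch {
    /**
     * Runs the batch on the pool and waits for it in short slices.
     * Returns true if the batch finished, false if it was cancelled
     * because the (assumed) clientAlive check reported the client gone.
     * A failure inside the batch surfaces as an ExecutionException.
     */
    static boolean runWhileClientAlive(ExecutorService pool,
                                       Callable<Void> batch,
                                       BooleanSupplier clientAlive,
                                       long pollMillis) throws Exception {
        Future<Void> pending = pool.submit(batch);
        while (true) {
            try {
                pending.get(pollMillis, TimeUnit.MILLISECONDS);
                return true; // batch completed normally
            } catch (TimeoutException stillRunning) {
                if (!clientAlive.getAsBoolean()) {
                    // Interrupt the worker so the handler is freed promptly.
                    pending.cancel(true);
                    return false;
                }
            }
        }
    }
}
```

The handler thread is held only as long as the caller is still waiting for an answer, so a slow sink cannot keep handlers occupied on behalf of clients that have already given up.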
[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432056#comment-13432056 ] Hadoop QA commented on HBASE-6165: --
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12540074/HBase-6165-v1.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.coprocessor.TestClassLoading org.apache.hadoop.hbase.master.TestAssignmentManager org.apache.hadoop.hbase.TestLocalHBaseCluster
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2542//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2542//console
This message is automatically generated.
[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432068#comment-13432068 ] Himanshu Vashishtha commented on HBASE-6165: Created fail-fast replicationSink jira HBase-6550 (https://issues.apache.org/jira/browse/HBASE-6550)
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432205#comment-13432205 ] Elliott Clark commented on HBASE-6165: -- {quote}Using priority is kind of a misnomer as there is no priority as such{quote} The actual handlers don't imply some sort of QOS, but the naming does correspond to the {low|medium|high} priority set of operations that can be in that handler's queue.
[jira] [Commented] (HBASE-6165) Replication can overrun .META. scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13432224#comment-13432224 ] Himanshu Vashishtha commented on HBASE-6165: Yeah, and I think the naming should be changed to reflect what they actually do. So, renaming the QOS levels and their respective handlers along the lines of CLIENT_OPS, CUSTOM_OPS, and META_OPS seems more appropriate.
[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396052#comment-13396052 ] Jean-Daniel Cryans commented on HBASE-6165: --- The other solution is to have a different set of handlers, but this requires either hacking HBaseServer to add another queue and priority level or refactoring it to make it more configurable. Replication can overrun .META scans on cluster re-start --- Key: HBASE-6165 URL: https://issues.apache.org/jira/browse/HBASE-6165 Project: HBase Issue Type: Bug Reporter: Elliott Clark When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.
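The "different set of handlers" idea in Jean-Daniel Cryans's comment can be illustrated with a minimal sketch: each priority level gets its own bounded call queue, so a flood of replication calls can only fill the replication queue and cannot crowd out .META. scans. This is not HBaseServer's actual code; the class name HandlerPoolSketch, the Qos level names, and the per-queue capacity are all hypothetical and chosen only for illustration.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class HandlerPoolSketch {
    /** Hypothetical QoS levels; illustrative names, not HBase constants. */
    public enum Qos { NORMAL, REPLICATION, HIGH }

    // One bounded queue per level, so one level cannot starve another.
    private final Map<Qos, BlockingQueue<String>> queues = new EnumMap<>(Qos.class);

    public HandlerPoolSketch(int capacityPerQueue) {
        for (Qos qos : Qos.values()) {
            queues.put(qos, new LinkedBlockingQueue<String>(capacityPerQueue));
        }
    }

    /** Enqueues the call on the queue for its level; false if that queue is full. */
    public boolean dispatch(String call, Qos qos) {
        return queues.get(qos).offer(call);
    }

    /** Number of calls currently waiting at the given level. */
    public int queued(Qos qos) {
        return queues.get(qos).size();
    }

    public static void main(String[] args) {
        HandlerPoolSketch server = new HandlerPoolSketch(2);
        // Replication traffic saturates only its own queue.
        for (int i = 1; i <= 3; i++) {
            boolean accepted = server.dispatch("replicateLogEntries#" + i, Qos.REPLICATION);
            System.out.println("replication call " + i + " accepted: " + accepted);
        }
        // A .META. scan still finds room on the high-priority queue.
        System.out.println("meta scan accepted: " + server.dispatch("metaScan", Qos.HIGH));
    }
}
```

With a capacity of 2 per queue, the third replication call is rejected while the meta scan is still accepted. A real server would attach a dedicated handler thread pool to each queue; the sketch only shows the routing and isolation.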
[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13393636#comment-13393636 ] Elliott Clark commented on HBASE-6165: -- Upping the number of privileged ipc threads is the workaround that we're going to deploy soon. Replication can overrun .META scans on cluster re-start --- Key: HBASE-6165 URL: https://issues.apache.org/jira/browse/HBASE-6165 Project: HBase Issue Type: Bug Reporter: Elliott Clark When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.
[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start
[ https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13296141#comment-13296141 ] Lars Hofhansl commented on HBASE-6165: -- What's a good approach to avoid this? Replication can overrun .META scans on cluster re-start --- Key: HBASE-6165 URL: https://issues.apache.org/jira/browse/HBASE-6165 Project: HBase Issue Type: Bug Reporter: Elliott Clark When restarting a large set of regions on a reasonably small cluster the replication from another cluster tied up every xceiver meaning nothing could be onlined.