[jira] [Commented] (HBASE-1936) ClassLoader that loads from hdfs; useful adding filters to classpath without having to restart services
[ https://issues.apache.org/jira/browse/HBASE-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254268#comment-13254268 ] Jieshan Bean commented on HBASE-1936: - I will upload the patch to review board after more tests. Based on stack's original design, ClassPathLocalizer is a configurable class, that's why we use the refrection to call the method. I think I can change this. @stack, what do you think? I'm also thinking about how to add test case. Thank you, Ted. @Andrew, thank you. I will try to reuse the code. I agree with we should use one classloader. The classloader in CP is not a general one. ClassLoader that loads from hdfs; useful adding filters to classpath without having to restart services --- Key: HBASE-1936 URL: https://issues.apache.org/jira/browse/HBASE-1936 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Jieshan Bean Labels: noob Attachments: HBASE-1936-trunk(forReview).patch, cp_from_hdfs.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1936) ClassLoader that loads from hdfs; useful adding filters to classpath without having to restart services
[ https://issues.apache.org/jira/browse/HBASE-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249655#comment-13249655 ] Jieshan Bean commented on HBASE-1936: - Thank you, stack. I will make a patch based on this one. ClassLoader that loads from hdfs; useful adding filters to classpath without having to restart services --- Key: HBASE-1936 URL: https://issues.apache.org/jira/browse/HBASE-1936 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Daniel Ploeg Labels: noob Attachments: cp_from_hdfs.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1936) ClassLoader that loads from hdfs; useful adding filters to classpath without having to restart services
[ https://issues.apache.org/jira/browse/HBASE-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249674#comment-13249674 ] Jieshan Bean commented on HBASE-1936: - Thank you Lars. I will check:) ClassLoader that loads from hdfs; useful adding filters to classpath without having to restart services --- Key: HBASE-1936 URL: https://issues.apache.org/jira/browse/HBASE-1936 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Daniel Ploeg Labels: noob Attachments: cp_from_hdfs.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1936) ClassLoader that loads from hdfs; useful adding filters to classpath without having to restart services
[ https://issues.apache.org/jira/browse/HBASE-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248335#comment-13248335 ] Jieshan Bean commented on HBASE-1936: - Yes. Regarding on how to add the new class into HDFS seems not mentioned in this patch. Should we also take care of this? ClassLoader that loads from hdfs; useful adding filters to classpath without having to restart services --- Key: HBASE-1936 URL: https://issues.apache.org/jira/browse/HBASE-1936 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Daniel Ploeg Labels: noob Attachments: cp_from_hdfs.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1936) ClassLoader that loads from hdfs; useful adding filters to classpath without having to restart services
[ https://issues.apache.org/jira/browse/HBASE-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247180#comment-13247180 ] Jieshan Bean commented on HBASE-1936: - @stack, can you look back to this new feature again? I think it's very useful either for hot deployment of Filter or dynamic coprocessor. Any potential reasons for hanging this task up? Thank you. ClassLoader that loads from hdfs; useful adding filters to classpath without having to restart services --- Key: HBASE-1936 URL: https://issues.apache.org/jira/browse/HBASE-1936 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Daniel Ploeg Labels: noob Attachments: cp_from_hdfs.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)
[ https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244093#comment-13244093 ] Jieshan Bean commented on HBASE-5682: - Everything seems good to me. Only a minor doubt, is it necessary to close zooKeeper before set it as null? If HConnectionImplementation#managed is true, HConnectionImplementation#abort doesn't set closed to true, just calls close method. It makes sense to me:). So the retry logic introduced in HBASE-5153 seems redundant. If one want to manage the connection by himself. If the connection is aborted. We should suggest to recreate the HConnection and HTable, right? Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only) -- Key: HBASE-5682 URL: https://issues.apache.org/jira/browse/HBASE-5682 Project: HBase Issue Type: Improvement Components: client Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Critical Fix For: 0.94.0 Attachments: 5682-all-v2.txt, 5682-all-v3.txt, 5682-all.txt, 5682-v2.txt, 5682.txt Just realized that without this HBASE-4805 is broken. I.e. there's no point keeping a persistent HConnection around if it can be rendered permanently unusable if the ZK connection is lost temporarily. Note that this is fixed in 0.96 with HBASE-5399 (but that seems to big to backport) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5312) Closed parent region present in Hlog.lastSeqWritten
[ https://issues.apache.org/jira/browse/HBASE-5312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13240340#comment-13240340 ] Jieshan Bean commented on HBASE-5312: - I'm afraid it's not the same issue with HBASE-5568. No concurrent flushing happened when the edit with the sequenceId 20312224 added into Region 2acaf8e3acfd2e8a5825a1f6f0aca4a8. Though that problem may also happened in this issue. {noformat} 00:28:25,460 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~153.9m for region Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. in 8444ms, sequenceid=20294110, compaction requested=true 00:28:25,460 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. because regionserver20020.cacheFlusher; priority=2, compaction queue size=5835 00:30:35,328 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. 00:30:35,943 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8., current region memstore size 129.5m 00:30:37,963 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for region Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8., flushing=true, writesEnabled=true 00:30:37,971 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. 00:30:37,971 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8., current region memstore size 129.7m 00:30:47,907 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/.tmp/5960455867013769207 to hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/value/3682074154882687307 00:30:47,907 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/.tmp/5960455867013769207 to hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/value/3682074154882687307 00:30:49,241 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/value/3682074154882687307, entries=233841, sequenceid=20311822, memsize=129.5m, filesize=89.5m 00:30:49,242 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~129.5m for region Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. in 13299ms, sequenceid=20311822, compaction requested=true 00:30:49,242 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. because User-triggered split; priority=1, compaction queue size=5840 00:30:55,214 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/.tmp/1755862026714756815 to hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/value/973789709483406123 00:30:55,214 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/.tmp/1755862026714756815 to hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/value/973789709483406123 00:30:59,614 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/value/973789709483406123, entries=7537, sequenceid=20312223, memsize=4.2m, filesize=2.9m 00:30:59,787 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~133.5m for region Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. in 21816ms, sequenceid=20312223, compaction requested=true 00:30:59,787 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. because regionserver20020.cacheFlusher; priority=0, compaction queue size=5840 00:31:12,605 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. 00:31:12,607 INFO org.apache.hadoop.hbase.regionserver.HRegion: completed compaction on region Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. after 0sec 00:31:12,607 INFO
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219951#comment-13219951 ] Jieshan Bean commented on HBASE-4991: - @Jieshan Yes that'll work. How you do it? You have a patch? @Stack: I have written a client tool to do this for 90. Only delete the specified regions. I will modify it and submit a patch if this tool is necessary. Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4991) Provide capability to delete named region
[ https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219785#comment-13219785 ] Jieshan Bean commented on HBASE-4991: - We also have this requirement: delete some specified regions, specillaly about the data in HDFS, and merge those regions. We do it after diable the table. So it seems very simple: 1. Ensure the table has been disabled. 2. Scan META, and find our all the regions should be deleted. 3. Delete information from .META. 4. Delete Region directory in HDFS. 5. Add a new empty region to avoid region hole in .META. 6. enable table. Provide capability to delete named region - Key: HBASE-4991 URL: https://issues.apache.org/jira/browse/HBASE-4991 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Mubarak Seyed Fix For: 0.94.0 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch See discussion titled 'Able to control routing to Solr shards or not' on lily-discuss User may want to quickly dispose of out of date records by deleting specific regions. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213237#comment-13213237 ] Jieshan Bean commented on HBASE-5396: - All tests passed for TRUNK. Handle the regions in regionPlans while processing ServerShutdownHandler Key: HBASE-5396 URL: https://issues.apache.org/jira/browse/HBASE-5396 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-final.patch, HBASE-5396-90-forReview.patch, HBASE-5396-90.patch, HBASE-5396-92.patch, HBASE-5396-trunk.patch, Logs-TestFor92.rar The regions plan to open on this server while ServerShutdownHandler is handling, just be removed from AM.regionPlans, and only left to TimeoutMonitor handle these regions. This need to optimize. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213349#comment-13213349 ] Jieshan Bean commented on HBASE-5396: - @stack, RegionServer process was killed manually. Not aborted itself. This is the steps of my test: Suppose there's 3 nodes in cluster: A, B, C. 1. Create a table with 1000 regions. 2. Assign all regions to C. 3. Kill C. During the ServerShutdownHandler processing, then kill A(At this time, some regions may been openning on this server.) See whether all the regions can be opened in time(And also check the regions which in regionPlans). Handle the regions in regionPlans while processing ServerShutdownHandler Key: HBASE-5396 URL: https://issues.apache.org/jira/browse/HBASE-5396 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-final.patch, HBASE-5396-90-forReview.patch, HBASE-5396-90.patch, HBASE-5396-92.patch, HBASE-5396-trunk.patch, Logs-TestFor92.rar The regions plan to open on this server while ServerShutdownHandler is handling, just be removed from AM.regionPlans, and only left to TimeoutMonitor handle these regions. This need to optimize. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213381#comment-13213381 ] Jieshan Bean commented on HBASE-5396: - Only see the below and related logs. It means the region in regionPlans can be reassigned with this patch. {noformat} 2012-02-20 23:25:36,472 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: 1 regions which planned to open on C3S31,20020,1329798177361 be re-assigned. {noformat} Thanks, Stack. Handle the regions in regionPlans while processing ServerShutdownHandler Key: HBASE-5396 URL: https://issues.apache.org/jira/browse/HBASE-5396 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-final.patch, HBASE-5396-90-forReview.patch, HBASE-5396-90.patch, HBASE-5396-92.patch, HBASE-5396-trunk.patch, Logs-TestFor92.rar The regions plan to open on this server while ServerShutdownHandler is handling, just be removed from AM.regionPlans, and only left to TimeoutMonitor handle these regions. This need to optimize. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13211917#comment-13211917 ] Jieshan Bean commented on HBASE-5396: - I tested the patch for 92 by unit test and also in real cluster. This problem seems not represent in 92 and trunk version. I will give more tests and went through the code to check whether it's necessary for 92 and trunk. Handle the regions in regionPlans while processing ServerShutdownHandler Key: HBASE-5396 URL: https://issues.apache.org/jira/browse/HBASE-5396 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-final.patch, HBASE-5396-90-forReview.patch, HBASE-5396-90.patch The regions plan to open on this server while ServerShutdownHandler is handling, just be removed from AM.regionPlans, and only left to TimeoutMonitor handle these regions. This need to optimize. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212352#comment-13212352 ] Jieshan Bean commented on HBASE-5396: - RegionServer Log: {noformat} 2012-02-20 23:24:49,432 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:20020-0x3359b077bc90018 Attempting to transition node 897ad476f426e58a36cae77c7302be1d from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING 2012-02-20 23:24:49,447 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:20020-0x3359b077bc90018 Successfully transitioned node 897ad476f426e58a36cae77c7302be1d from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING 2012-02-20 23:24:49,448 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Opening region: {NAME = 'jeason,0014,1329747764680.897ad476f426e58a36cae77c7302be1d.', STARTKEY = '0014', ENDKEY = '0014', ENCODED = 897ad476f426e58a36cae77c7302be1d,} 2012-02-20 23:24:49,448 INFO org.apache.hadoop.hbase.regionserver.HRegion: Setting up tabledescriptor config now ... 2012-02-20 23:24:49,448 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Instantiated jeason,0014,1329747764680.897ad476f426e58a36cae77c7302be1d. 2012-02-20 23:24:49,455 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined jeason,0014,1329747764680.897ad476f426e58a36cae77c7302be1d.; next sequenceid=1 //This regionserver was killed just after the below log. 2012-02-20 23:24:49,455 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:20020-0x3359b077bc90018 Attempting to transition node 897ad476f426e58a36cae77c7302be1d from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING {noformat} HMaster Log: {noformat} 2012-02-20 23:25:36,442 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region jeason,0014,1329747764680.897ad476f426e58a36cae77c7302be1d. to C3S32,20020,1329798176762 2012-02-20 23:25:36,442 DEBUG org.apache.hadoop.hbase.master.ServerManager: New connection to C3S32,20020,1329798176762 2012-02-20 23:25:36,462 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=C3S32,20020,1329798176762, region=561e68708199320894bfcfc533cae772 2012-02-20 23:25:36,462 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for jeason,0013,1329747764680.561e68708199320894bfcfc533cae772. from C3S32,20020,1329798176762; deleting unassigned node 2012-02-20 23:25:36,462 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:2-0x2359b0778ec000d Deleting existing unassigned node for 561e68708199320894bfcfc533cae772 that is in expected state RS_ZK_REGION_OPENED 2012-02-20 23:25:36,463 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=C3S32,20020,1329798176762, region=3455f468b76dcf7cfe3881f9e156d0c8 2012-02-20 23:25:36,468 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:2-0x2359b0778ec000d Successfully deleted unassigned node for region 561e68708199320894bfcfc533cae772 in expected state RS_ZK_REGION_OPENED 2012-02-20 23:25:36,468 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region jeason,0013,1329747764680.561e68708199320894bfcfc533cae772. has been deleted. 2012-02-20 23:25:36,468 INFO org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the region jeason,0013,1329747764680.561e68708199320894bfcfc533cae772. that was online on C3S32,20020,1329798176762 2012-02-20 23:25:36,472 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: 1 regions which planned to open on C3S31,20020,1329798177361 be re-assigned. {noformat} Handle the regions in regionPlans while processing ServerShutdownHandler Key: HBASE-5396 URL: https://issues.apache.org/jira/browse/HBASE-5396 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-final.patch, HBASE-5396-90-forReview.patch, HBASE-5396-90.patch, HBASE-5396-92.patch, HBASE-5396-trunk.patch The regions plan to open on this server while ServerShutdownHandler is handling, just be removed from AM.regionPlans, and only left to TimeoutMonitor handle these regions. This need to optimize. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212357#comment-13212357 ] Jieshan Bean commented on HBASE-5396: - I'm running the unit tests for TRUNK. Handle the regions in regionPlans while processing ServerShutdownHandler Key: HBASE-5396 URL: https://issues.apache.org/jira/browse/HBASE-5396 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-final.patch, HBASE-5396-90-forReview.patch, HBASE-5396-90.patch, HBASE-5396-92.patch, HBASE-5396-trunk.patch The regions plan to open on this server while ServerShutdownHandler is handling, just be removed from AM.regionPlans, and only left to TimeoutMonitor handle these regions. This need to optimize. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208378#comment-13208378 ] Jieshan Bean commented on HBASE-5396: - All tests passed. And I also tested it in real cluster. Handle the regions in regionPlans while processing ServerShutdownHandler Key: HBASE-5396 URL: https://issues.apache.org/jira/browse/HBASE-5396 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.90.7 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-forReview.patch, HBASE-5396-90.patch The regions plan to open on this server while ServerShutdownHandler is handling, just be removed from AM.regionPlans, and only left to TimeoutMonitor handle these regions. This need to optimize. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208222#comment-13208222 ] Jieshan Bean commented on HBASE-5396: - Thanks, Ted. Only one doubt basing on your comments: bq.Since the member of the Set isn't RegionPlan, I suggest renaming the above field to regionsOnServer. Since the Set came from the regionPlans. So can I keep this name as regionPlanOnThisServer? I think regionsOnServer will bring some misunderstanding. right? I have test this patch in 90 cluster for many times. And it works fine. I will upload the new patch today. Handle the regions in regionPlans while processing ServerShutdownHandler Key: HBASE-5396 URL: https://issues.apache.org/jira/browse/HBASE-5396 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.90.7 Attachments: HBASE-5396-90.patch The regions plan to open on this server while ServerShutdownHandler is handling, just be removed from AM.regionPlans, and only left to TimeoutMonitor handle these regions. This need to optimize. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler
[ https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208227#comment-13208227 ] Jieshan Bean commented on HBASE-5396: - ok..I rename it as regionsOnServer. I will upload the new patch after tests. Thanks, Ted. Handle the regions in regionPlans while processing ServerShutdownHandler Key: HBASE-5396 URL: https://issues.apache.org/jira/browse/HBASE-5396 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.6 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.90.7 Attachments: HBASE-5396-90.patch The regions plan to open on this server while ServerShutdownHandler is handling, just be removed from AM.regionPlans, and only left to TimeoutMonitor handle these regions. This need to optimize. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194541#comment-13194541 ] Jieshan Bean commented on HBASE-5153: - @Lars: One more doubt basing on the previous comment, regading on the below code: {noformat} + try { +setupZookeeperTrackers(); +break; + } catch (ZooKeeperConnectionException zkce) { +if (tries = this.numRetries) { + throw zkce; +} + } {noformat} if ZookeeperNodeTracker#start get an exception, then catches and calls abort(Doesn't throw it). The above retries will be break under this situation. So this retries takes no effects.Correct me if am wrong. Add retry logic in HConnectionImplementation#resetZooKeeperTrackers --- Key: HBASE-5153 URL: https://issues.apache.org/jira/browse/HBASE-5153 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a HConnectionManager$HConnectionImplementation@18fb1f7 closed exception. It solve the problem of stale connection can't removed. But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated. Actually, there's two aproach to solve this: 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround. 2. In HBase Client side, catch this exception, and re-create connection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194406#comment-13194406 ] Jieshan Bean commented on HBASE-5153: - It should not leading to an endless loop. Unless, each retry will get a ZookeeperLossException. If this exception happened for long time, Zookeeper must has some problem. so when create a new Zookeeper instance, it already thrown a Exception. So it won't be an endless loop: {noformat} if ((t instanceof KeeperException.SessionExpiredException) || (t instanceof KeeperException.ConnectionLossException)) { try { LOG.info(This client just lost it's session with ZooKeeper, trying + to reconnect.); resetZooKeeperTrackersWithRetries(); LOG.info(Reconnected successfully. This disconnect could have been + caused by a network partition or a long-running GC pause, + either way it's recommended that you verify your environment.); return; } catch (ZooKeeperConnectionException e) { LOG.error(Could not reconnect to ZooKeeper after session + expiration, aborting); t = e; } } if (t != null) LOG.fatal(msg, t); else LOG.fatal(msg); HConnectionManager.deleteStaleConnection(this); {noformat} Add retry logic in HConnectionImplementation#resetZooKeeperTrackers --- Key: HBASE-5153 URL: https://issues.apache.org/jira/browse/HBASE-5153 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a HConnectionManager$HConnectionImplementation@18fb1f7 closed exception. It solve the problem of stale connection can't removed. But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated. Actually, there's two aproach to solve this: 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround. 2. In HBase Client side, catch this exception, and re-create connection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194412#comment-13194412 ] Jieshan Bean commented on HBASE-5153: - As discussed with Ted. Trunk and 92 already including a retry logic in RecoverableZooKeeper. So that makes the retry logic in resetZooKeeperTrackersWithRetries less important. Add retry logic in HConnectionImplementation#resetZooKeeperTrackers --- Key: HBASE-5153 URL: https://issues.apache.org/jira/browse/HBASE-5153 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a HConnectionManager$HConnectionImplementation@18fb1f7 closed exception. It solve the problem of stale connection can't removed. But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated. Actually, there's two aproach to solve this: 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround. 2. In HBase Client side, catch this exception, and re-create connection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194426#comment-13194426 ] Jieshan Bean commented on HBASE-5153: - Thanks, Ted...I will take a look at that stack trace. If a ZooKeeperConnectionException thrown by the below code: {noformat} try { if (setupZookeeperTrackers(isLastTime)) { break; } } catch (ZooKeeperConnectionException zkce) { if (isLastTime) { throw zkce; } } {noformat} If will be catched in abort method, then calling LOG.fatal(msg, t); No problem here. Don't know whether I get you correctly:(. Add retry logic in HConnectionImplementation#resetZooKeeperTrackers --- Key: HBASE-5153 URL: https://issues.apache.org/jira/browse/HBASE-5153 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a HConnectionManager$HConnectionImplementation@18fb1f7 closed exception. It solve the problem of stale connection can't removed. But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated. Actually, there's two aproach to solve this: 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround. 2. In HBase Client side, catch this exception, and re-create connection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1319#comment-1319 ] Jieshan Bean commented on HBASE-5153: - Thanks Lars. It's nice of you:) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers --- Key: HBASE-5153 URL: https://issues.apache.org/jira/browse/HBASE-5153 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a HConnectionManager$HConnectionImplementation@18fb1f7 closed exception. It solve the problem of stale connection can't removed. But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated. Actually, there's two aproach to solve this: 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround. 2. In HBase Client side, catch this exception, and re-create connection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194445#comment-13194445 ] Jieshan Bean commented on HBASE-5153: - The endless loop happens when ZK is actually down. If ZK is actually down, the below code will throw a Exception: this.zooKeeper = getZooKeeperWatcher(); Then catched by the below code: {noformat} try { LOG.info(This client just lost it's session with ZooKeeper, trying + to reconnect.); resetZooKeeperTrackersWithRetries(); LOG.info(Reconnected successfully. This disconnect could have been + caused by a network partition or a long-running GC pause, + either way it's recommended that you verify your environment.); return; } catch (ZooKeeperConnectionException e) { LOG.error(Could not reconnect to ZooKeeper after session + expiration, aborting); t = e; } if (t != null) LOG.fatal(msg, t); else LOG.fatal(msg); HConnectionManager.deleteStaleConnection(this); {noformat} It should not be a endless loop. Does that make sense? Add retry logic in HConnectionImplementation#resetZooKeeperTrackers --- Key: HBASE-5153 URL: https://issues.apache.org/jira/browse/HBASE-5153 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a HConnectionManager$HConnectionImplementation@18fb1f7 closed exception. It solve the problem of stale connection can't removed. But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated. Actually, there's two aproach to solve this: 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround. 2. In HBase Client side, catch this exception, and re-create connection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194447#comment-13194447 ] Jieshan Bean commented on HBASE-5153: - bq. Are you guys saying we do not need this in 0.92+? We need this, but the patch for 0.92+ should take notice of this:) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers --- Key: HBASE-5153 URL: https://issues.apache.org/jira/browse/HBASE-5153 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a HConnectionManager$HConnectionImplementation@18fb1f7 closed exception. It solve the problem of stale connection can't removed. But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated. Actually, there's two aproach to solve this: 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround. 2. In HBase Client side, catch this exception, and re-create connection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194451#comment-13194451 ] Jieshan Bean commented on HBASE-5153: - @Lars: Adding a isResettingZKTrackers sounds good to me. One doubt: Is it necessary to add the keyword of volatile? Add retry logic in HConnectionImplementation#resetZooKeeperTrackers --- Key: HBASE-5153 URL: https://issues.apache.org/jira/browse/HBASE-5153 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a HConnectionManager$HConnectionImplementation@18fb1f7 closed exception. It solve the problem of stale connection can't removed. But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated. Actually, there's two aproach to solve this: 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround. 2. In HBase Client side, catch this exception, and re-create connection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194519#comment-13194519 ] Jieshan Bean commented on HBASE-5153: - @Lars: During the retry, if we get any exceptions, Zookeeper and the Trackers also need to close. {noformat} + try { +setupZookeeperTrackers(); +break; + } catch (ZooKeeperConnectionException zkce) { +if (tries = this.numRetries) { + throw zkce; +} + } {noformat} Add retry logic in HConnectionImplementation#resetZooKeeperTrackers --- Key: HBASE-5153 URL: https://issues.apache.org/jira/browse/HBASE-5153 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.90.6, 0.92.1 Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a HConnectionManager$HConnectionImplementation@18fb1f7 closed exception. It solve the problem of stale connection can't removed. But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated. Actually, there's two aproach to solve this: 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround. 2. In HBase Client side, catch this exception, and re-create connection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187442#comment-13187442 ] Jieshan Bean commented on HBASE-5153: - Thank you, Ted. Add retry logic in HConnectionImplementation#resetZooKeeperTrackers --- Key: HBASE-5153 URL: https://issues.apache.org/jira/browse/HBASE-5153 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.94.0, 0.92.1, 0.90.6 Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a HConnectionManager$HConnectionImplementation@18fb1f7 closed exception. It solve the problem of stale connection can't removed. But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated. Actually, there's two aproach to solve this: 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround. 2. In HBase Client side, catch this exception, and re-create connection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186721#comment-13186721 ] Jieshan Bean commented on HBASE-5153: - The test is still running. Before I get the results, I want your comments:). Thank you. Add retry logic in HConnectionImplementation#resetZooKeeperTrackers --- Key: HBASE-5153 URL: https://issues.apache.org/jira/browse/HBASE-5153 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.90.6 Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a HConnectionManager$HConnectionImplementation@18fb1f7 closed exception. It solve the problem of stale connection can't removed. But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated. Actually, there's two aproach to solve this: 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround. 2. In HBase Client side, catch this exception, and re-create connection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186730#comment-13186730 ] Jieshan Bean commented on HBASE-5153: - I will upload the new patch after the tests finish(Including this minor change), Thanks, Ted. Add retry logic in HConnectionImplementation#resetZooKeeperTrackers --- Key: HBASE-5153 URL: https://issues.apache.org/jira/browse/HBASE-5153 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.90.6 Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a HConnectionManager$HConnectionImplementation@18fb1f7 closed exception. It solve the problem of stale connection can't removed. But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated. Actually, there's two aproach to solve this: 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround. 2. In HBase Client side, catch this exception, and re-create connection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185487#comment-13185487 ] Jieshan Bean commented on HBASE-5153: - Maybe retry in resetZooKeeperTrackers is better. We try our best to re-use the original connection. The worst case is even if we retried max times and still fail, then abort, but i think we're responsible for letting the user level know this. HConnection re-creation in HTable after HConnection abort - Key: HBASE-5153 URL: https://issues.apache.org/jira/browse/HBASE-5153 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.90.6 Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a HConnectionManager$HConnectionImplementation@18fb1f7 closed exception. It solve the problem of stale connection can't removed. But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated. Actually, there's two aproach to solve this: 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround. 2. In HBase Client side, catch this exception, and re-create connection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185601#comment-13185601 ] Jieshan Bean commented on HBASE-5153: - Thanks, Ted. Without HBASE-3065, 0.90 doesn't handle the ConnectionLossException correctly. Consider the below case: 1. Somewhere trigger a HConnection#abort. 2. Suppose the check of if (t instanceof KeeperException.SessionExpiredException) is true. Then called the resetZooKeeperTrackers(). 3. A ConnectionLossException occur during ZookeeperNodeTracker#start. then trigger a new HConnection#abort. At this scenario, the previous abort may print a log of Reconnected successfully. This disconnect could have been caused by a network partition or a long-running GC pause.. 4. The new abort carry a Throwable with a type which is not KeeperException.SessionExpiredException. so this time abort directly. It seems a recursion here. Either re-use the old connection by resetZooKeeperTrackers, or re-create the connection, the ZookeeperWatcher will be a new one. So I still think the patch for 0.90 is reasonable. Trunk patch will be made big changes. So any other good suggestions? Thanks. HConnection re-creation in HTable after HConnection abort - Key: HBASE-5153 URL: https://issues.apache.org/jira/browse/HBASE-5153 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.90.6 Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a HConnectionManager$HConnectionImplementation@18fb1f7 closed exception. It solve the problem of stale connection can't removed. But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated. Actually, there's two aproach to solve this: 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround. 2. In HBase Client side, catch this exception, and re-create connection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13184705#comment-13184705 ] Jieshan Bean commented on HBASE-5153: - @Ted, the patch for TRUNK seems very different, and i still need some time to check it. hope i can provide today:) @Stack, I think ConnectionUtils is reasonable. I can add it:). I will update the patch. Thank you all. HConnection re-creation in HTable after HConnection abort - Key: HBASE-5153 URL: https://issues.apache.org/jira/browse/HBASE-5153 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.90.6 Attachments: HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153.patch HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a HConnectionManager$HConnectionImplementation@18fb1f7 closed exception. It solve the problem of stale connection can't removed. But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated. Actually, there's two aproach to solve this: 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround. 2. In HBase Client side, catch this exception, and re-create connection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13183290#comment-13183290 ] Jieshan Bean commented on HBASE-5153: - Only do that in flushCommits maybe not enough. I'll go though the code and give a more considerate approach. and also will give a patch for TRUNK. HConnection re-creation in HTable after HConnection abort - Key: HBASE-5153 URL: https://issues.apache.org/jira/browse/HBASE-5153 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.90.6 Attachments: HBASE-5153-V2.patch, HBASE-5153.patch HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a HConnectionManager$HConnectionImplementation@18fb1f7 closed exception. It solve the problem of stale connection can't removed. But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated. Actually, there's two aproach to solve this: 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround. 2. In HBase Client side, catch this exception, and re-create connection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort
[ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182472#comment-13182472 ] Jieshan Bean commented on HBASE-5153: - Thanks, Ted. I'll add the unit test code immediately. HConnection re-creation in HTable after HConnection abort - Key: HBASE-5153 URL: https://issues.apache.org/jira/browse/HBASE-5153 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.90.6 Attachments: HBASE-5153.patch HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a HConnectionManager$HConnectionImplementation@18fb1f7 closed exception. It solve the problem of stale connection can't removed. But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated. Actually, there's two aproach to solve this: 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround. 2. In HBase Client side, catch this exception, and re-create connection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13181900#comment-13181900 ] Jieshan Bean commented on HBASE-5088: - It seems my reply is too late. Thank you all:) I suggest backport this patch to 0.90.6. A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Lars Hofhansl Priority: Critical Fix For: 0.92.0, 0.94.0 Attachments: 5088-final.txt, 5088-final2.txt, 5088-final3.txt, 5088-syncObj.txt, 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, HBase5088Reproduce.java, PerformanceTestResults.png SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13180256#comment-13180256 ] Jieshan Bean commented on HBASE-5088: - We can finish the tests tommorow,then i'll give the results. Thank you, Lars:) A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Jieshan Bean Attachments: 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088Reproduce.java SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13181156#comment-13181156 ] Jieshan Bean commented on HBASE-5088: - Either the patch of replace TreeMap with ConcurrentSkipListMap, or the patch of replace SoftValueSortedMap with ConcurrentSkipListMap, the performance slightly degraded. The latter one seems better. Please find the test results from the attachment PerformanceTestResults.png. A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Jieshan Bean Priority: Critical Fix For: 0.92.0 Attachments: 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088-trunk-replaceTreeMap.patch, HBase5088Reproduce.java, PerformanceTestResults.png SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13181161#comment-13181161 ] Jieshan Bean commented on HBASE-5088: - @Lars: patch of replace TreeMap with ConcurrentSkipListMap is same with the patch you attached named 5088.generics.txt. A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Jieshan Bean Priority: Critical Fix For: 0.92.0 Attachments: 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088-90-replaceSoftValueSortedMap.patch, HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, HBase5088Reproduce.java, PerformanceTestResults.png SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178362#comment-13178362 ] Jieshan Bean commented on HBASE-5088: - @Lars: It makes sense to me. I also made plan to test this patch, but the tests haven't finished yet(Encountered some problems during the tests). Thank you for your attention on this issue, Lars:) A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Jieshan Bean Attachments: 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088Reproduce.java SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5100) Rollback of split would cause closed region to opened
[ https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176993#comment-13176993 ] Jieshan Bean commented on HBASE-5100: - The patch is good. If region has been closed by other thread, just abondon the split.That region should not be online again while rolling back. Meanwhile, we just need to clean the splitDir. Rollback of split would cause closed region to opened -- Key: HBASE-5100 URL: https://issues.apache.org/jira/browse/HBASE-5100 Project: HBase Issue Type: Bug Reporter: chunhui shen Assignee: chunhui shen Attachments: hbase-5100.patch If master sending close region to rs and region's split transaction concurrently happen, it may cause closed region to opened. See the detailed code in SplitTransaction#createDaughters {code} ListStoreFile hstoreFilesToSplit = null; try{ hstoreFilesToSplit = this.parent.close(false); if (hstoreFilesToSplit == null) { // The region was closed by a concurrent thread. We can't continue // with the split, instead we must just abandon the split. If we // reopen or split this could cause problems because the region has // probably already been moved to a different server, or is in the // process of moving to a different server. throw new IOException(Failed to close region: already closed by + another thread); } } finally { this.journal.add(JournalEntry.CLOSED_PARENT_REGION); } {code} when rolling back, the JournalEntry.CLOSED_PARENT_REGION causes this.parent.initialize(); Although this region is not onlined in the regionserver, it may bring some potential problem. For example, in our environment, the closed parent region is rolled back sucessfully , and then starting compaction and split again. The parent region is f892dd6107b6b4130199582abc78e9c1 master log {code} 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., src=dw87.kgb.sqa.cm4,60020,1324827866085, dest=dw80.kgb.sqa.cm4,60020,1324827865780 2011-12-26 00:24:42,693 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. (offlining) 2011-12-26 00:24:42,694 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1., server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING) 2011-12-26 00:24:42,699 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,348 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED event for f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1. state=CLOSED, ts=1324830285347 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x13447f283f40e73 Creating (or updating) unassigned node for f892dd6107b6b4130199582abc78e9c1 with OFFLINE state 2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, region=f892dd6107b6b4130199582abc78e9c1 2011-12-26 00:24:45,354 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for
[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176074#comment-13176074 ] Jieshan Bean commented on HBASE-5088: - @Ted: Upgrate the jdk version seems very difficult to us, because all the applications base on that version. I remember the 1.6.0_22 is the recommanded version from community before. Can you tell me why should I use 1.6.0_29? thanks. A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Jieshan Bean Attachments: 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088Reproduce.java SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176078#comment-13176078 ] Jieshan Bean commented on HBASE-5088: - ok..Thanks...We're doing the performance tests accross reads vs writes with two different patches according to your suggestion:) A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Jieshan Bean Attachments: 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088Reproduce.java SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175887#comment-13175887 ] Jieshan Bean commented on HBASE-5088: - @Ted, At first, I also thought we would get a higher performance with this patch, because all the keywords of synchronized removed. But it slowdown. I agree with the explaination from Lars. Our JDK version is 1.6.0_22. And the below is our OS information: {noformat} C3S3:~ # cat /proc/version Linux version 2.6.32.12-0.7-default (geeko@buildhost) (gcc version 4.3.4 [gcc-4_ 3-branch revision 152973] (SUSE Linux) ) #1 SMP 2010-05-20 11:14:20 +0200 C3S3:/proc # lsb_release -a LSB Version: core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-x86_64:core-3.2-x86_64:core-4.0-x86_64:desktop-4.0-amd64:desktop-4.0-noarch:graphics-2.0-amd64:graphics-2.0-noarch:graphics-3.2-amd64:graphics-3.2-noarch:graphics-4.0-amd64:graphics-4.0-noarch Distributor ID: SUSE LINUX Description:SUSE Linux Enterprise Server 11 (x86_64) Release:11 Codename: n/a {noformat} We'll take more tests accross the read vs write and give out the results. @Lars, Sorry, I didn't do another comparison with SoftvalueSortedMap replaced by ConcurrentSkiplistMap.And am planning to do it. Including the functional test and the performance test. And then, we can choose a better one. A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Jieshan Bean Attachments: 5088-useMapInterfaces.txt, 5088.generics.txt, HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088Reproduce.java SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175253#comment-13175253 ] Jieshan Bean commented on HBASE-5088: - Loop into the mothod of TreeMap#fixAfterDeletion(EntryK,V x), once the x is null(It may caused by a concurrency issue), it can't come out of that loop. At that time, CPU usage is high. That's what we saw. All the time, the thread was blocked in that method. Currently, the patch is made just replace TreeMap with ConcurrentSkipListMap as Anoop and Lars's suggestion. We're verifying the patch. @Ted, actually, the heapmap method of ConcurrentSkipListMap is different from TreeMap, it is also backed by the original thread-safe map. what do you think? A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Jieshan Bean SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175283#comment-13175283 ] Jieshan Bean commented on HBASE-5088: - +1 Use the concrete Map type is always not recommended. A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Jieshan Bean SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175306#comment-13175306 ] Jieshan Bean commented on HBASE-5088: - I have made a patch replace the TreeMap with ConcurrentSplitListMap in SoftValueSortedMap, but not replace SoftValueSortedMap. I'm running the test on that patch. once finished, I'll submit the patch. A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Jieshan Bean Attachments: 5088-useMapInterfaces.txt SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175309#comment-13175309 ] Jieshan Bean commented on HBASE-5088: - Thanks Lars, I will modify the patch under your suggestion, to use the interface replace the concrete type:)...Will submit it later. A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Jieshan Bean Attachments: 5088-useMapInterfaces.txt SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use headMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174606#comment-13174606 ] Jieshan Bean commented on HBASE-5088: - We found this issue while one thread can't get out of the loop in TreeMap#fixAfterDeletion: {noformat} Thread-923 prio=10 tid=0x7f3d40553000 nid=0x3ed6 runnable [0x7f3d05c1a000] java.lang.Thread.State: RUNNABLE at java.util.TreeMap.fixAfterDeletion(TreeMap.java:2193) at java.util.TreeMap.deleteEntry(TreeMap.java:2151) at java.util.TreeMap.remove(TreeMap.java:585) at java.util.TreeMap$NavigableSubMap.remove(TreeMap.java:1395) at org.apache.hadoop.hbase.util.SoftValueSortedMap.get(SoftValueSortedMap.java:101) - locked 0x7f3d94f24f70 (a org.apache.hadoop.hbase.util.SoftValueSortedMap) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getCachedLocation(HConnectionManager.java:846) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:668) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:594) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:559) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:416) at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57) at org.apache.hadoop.hbase.client.ScannerCallable.instantiateServer(ScannerCallable.java:63) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1018) at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1104) at org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:1027) at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:535) at com.huawei.icbc.query.SingleTabQuery.querybatch(SingleTabQuery.java:197) at com.huawei.icbc.benchmark.SingleTabQueryAction.query(SingleTabQueryAction.java:181) at framework.QueryThread.run(QueryThread.java:47) {noformat} A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use heapMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap
[ https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174615#comment-13174615 ] Jieshan Bean commented on HBASE-5088: - Good suggestion, Anoop. Actually there's 2 solutions now: 1. Use synchronized while we operation on the view of the original map. 2. Use ConcurrentSkipListMap insteadof TreeMap. Don't know which one is better. A concurrency issue on SoftValueSortedMap - Key: HBASE-5088 URL: https://issues.apache.org/jira/browse/HBASE-5088 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.90.4, 0.94.0 Reporter: Jieshan Bean Assignee: Jieshan Bean SoftValueSortedMap is backed by a TreeMap. All the methods in this class are synchronized. If we use this method to add/delete elements, it's ok. But in HConnectionManager#getCachedLocation, it use heapMap to get a view from SoftValueSortedMap#internalMap. Once we operate on this view map(like add/delete) in other threads, a concurrency issue may occur. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira