[jira] [Commented] (HBASE-4861) Fix some misspells and extraneous characters in logs; set some to TRACE
[ https://issues.apache.org/jira/browse/HBASE-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157006#comment-13157006 ]
Hudson commented on HBASE-4861:
---
Integrated in HBase-0.92-security #13 (See [https://builds.apache.org/job/HBase-0.92-security/13/])
HBASE-4861 Fix some misspells and extraneous characters in logs; set some to TRACE
stack :
Files :
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/SplitRegionHandler.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java
> Fix some misspells and extraneous characters in logs; set some to TRACE
> ---
>
> Key: HBASE-4861
> URL: https://issues.apache.org/jira/browse/HBASE-4861
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Assignee: stack
> Fix For: 0.92.0
>
> Attachments: 4861.txt
>
>
> Some small clean up in logs.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart due to batch.installed doubly counted
[ https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157008#comment-13157008 ]
Hudson commented on HBASE-4855:
---
Integrated in HBase-0.92-security #13 (See [https://builds.apache.org/job/HBase-0.92-security/13/])
HBASE-4855 SplitLogManager hangs on cluster restart due to batch.installed doubly counted
tedyu :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java
> SplitLogManager hangs on cluster restart due to batch.installed doubly counted
> --
>
> Key: HBASE-4855
> URL: https://issues.apache.org/jira/browse/HBASE-4855
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.0
>
> Attachments: HBASE-4855.patch
>
>
> Start a master and an RS.
> The RS goes down (kill -9).
> Wait for ServerShutdownHandler to create the splitlog nodes. As no RS is
> there, they cannot be processed.
> Restart the master and bring up an RS.
> The master hangs in SplitLogManager.waitforTasks().
> I feel that batch.done is not getting incremented properly. I have not yet
> dug in fully.
> This may be the reason for the occasional failure of
> TestDistributedLogSplitting.testWorkerAbort().
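The issue title points at a counting invariant. The Java sketch below assumes the master's wait loop exits only once `done + error == installed`; `TaskBatchSketch` and its methods are hypothetical names for illustration, not the actual SplitLogManager code. It shows why counting one task twice in `installed` leaves that condition permanently false, and one way (de-duplicating by task path) to keep the count consistent.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a split-log task batch; names are illustrative only.
class TaskBatchSketch {
    int installed = 0; // tasks the batch waits for
    int done = 0;      // tasks that finished successfully
    int error = 0;     // tasks that finished with an error
    private final Set<String> tasks = new HashSet<>();

    // Counts a task only the first time it is installed, even if the install
    // path runs again for the same task (for example after a master restart).
    boolean install(String taskPath) {
        if (tasks.add(taskPath)) {
            installed++;
            return true;
        }
        return false;
    }

    // Buggy variant: counts every install call, so a re-installed task is counted twice.
    void installCountingEveryCall(String taskPath) {
        tasks.add(taskPath);
        installed++;
    }

    // Records the successful completion of a known task.
    void taskDone(String taskPath) {
        if (tasks.contains(taskPath)) {
            done++;
        }
    }

    // A wait loop built on this condition can only exit once it returns true.
    boolean allTasksFinished() {
        return done + error == installed;
    }
}
```

With the buggy variant, one task installed twice and finished once leaves `installed` at 2 and `done` at 1, so a loop waiting on `allTasksFinished()` would never return.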
[jira] [Commented] (HBASE-4856) Upgrade zookeeper to 3.4.0 release
[ https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157007#comment-13157007 ]
Hudson commented on HBASE-4856:
---
Integrated in HBase-0.92-security #13 (See [https://builds.apache.org/job/HBase-0.92-security/13/])
HBASE-4856 Upgrade zookeeper to 3.4.0 release
tedyu :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/pom.xml
> Upgrade zookeeper to 3.4.0 release
> --
>
> Key: HBASE-4856
> URL: https://issues.apache.org/jira/browse/HBASE-4856
> Project: HBase
> Issue Type: Task
> Reporter: Ted Yu
> Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: 4856.txt
>
>
> Zookeeper 3.4.0 has been released.
> We should upgrade.
[jira] [Commented] (HBASE-4864) TestMasterObserver#testRegionTransitionOperations occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157005#comment-13157005 ]
Hudson commented on HBASE-4864:
---
Integrated in HBase-0.92-security #13 (See [https://builds.apache.org/job/HBase-0.92-security/13/])
HBASE-4864 TestMasterObserver#testRegionTransitionOperations occasionally fails (Gao Jinchao)
tedyu :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java
> TestMasterObserver#testRegionTransitionOperations occasionally fails
>
>
> Key: HBASE-4864
> URL: https://issues.apache.org/jira/browse/HBASE-4864
> Project: HBase
> Issue Type: Test
> Components: test
> Reporter: gaojinchao
> Assignee: gaojinchao
> Priority: Minor
> Fix For: 0.92.0, 0.94.0
>
> Attachments: HBASE-4864_Branch92.patch
>
>
> Look at these logs:
> https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/
> It seems that we should wait until the region is added to the online region set.
> I made a patch; please review.
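The HBASE-4864 fix idea is to wait until the region appears in the online region set before asserting, rather than checking once. A minimal polling helper might look like the following; it assumes nothing about HBase's own test utilities, and `WaitUtil.waitFor` is a hypothetical name.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper: poll a condition until it holds or a timeout expires.
final class WaitUtil {
    private WaitUtil() {}

    // Returns true as soon as the condition holds, or false once timeoutMs has elapsed.
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            Thread.sleep(intervalMs);
        }
    }
}
```

A test would pass a condition that checks whether the region server reports the region as online, then assert on the returned value. The timeout ensures a genuine failure still fails the test instead of hanging it.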
[jira] [Commented] (HBASE-4838) Port 2856 (TestAcidGuarantee is failing) to 0.92
[ https://issues.apache.org/jira/browse/HBASE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157000#comment-13157000 ]
Lars Hofhansl commented on HBASE-4838:
--
I pinpointed the difference to the compactions of the daughters (again with just 2 keys).
In 0.92 (with this patch) I see this for the 1st daughter region (which is compacted last):
{noformat}
2011-11-24 22:08:51,324 INFO [RegionServer:2;localhost,42385,1322201325234-smallCompactions-1322201331230] regionserver.HRegion(1012): Starting compaction on testFamily in region testFilterAcrossMutlipleRegions,,1322201330936.0db66f8aabdf138dbbcf6c04f857c284.
2011-11-24 22:08:51,332 INFO [RegionServer:2;localhost,42385,1322201325234-smallCompactions-1322201331230] regionserver.Store(725): Starting compaction of 1 file(s) in testFamily of testFilterAcrossMutlipleRegions,,1322201330936.0db66f8aabdf138dbbcf6c04f857c284. into tmpdir=hdfs://localhost:52206/user/lars/testFilterAcrossMutlipleRegions/0db66f8aabdf138dbbcf6c04f857c284/.tmp, seqid=3, totalSize=662.0
2011-11-24 22:08:51,333 DEBUG [RegionServer:2;localhost,42385,1322201325234-smallCompactions-1322201331230] regionserver.Store(1174): Compacting hdfs://localhost:52206/user/lars/testFilterAcrossMutlipleRegions/0db66f8aabdf138dbbcf6c04f857c284/testFamily/85a0a11b15a248c69e09e44e0e9e052e.4e293f99103a49243c16eb104996554b-hdfs://localhost:52206/user/lars/testFilterAcrossMutlipleRegions/4e293f99103a49243c16eb104996554b/testFamily/85a0a11b15a248c69e09e44e0e9e052e-bottom, keycount=2, bloomtype=NONE, size=662.0
2011-11-24 22:08:51,388 INFO [RegionServer:2;localhost,42385,1322201325234-smallCompactions-1322201331230] regionserver.Store(1322): Renaming compacted file at hdfs://localhost:52206/user/lars/testFilterAcrossMutlipleRegions/0db66f8aabdf138dbbcf6c04f857c284/.tmp/7e7f4acb121e4696bd3c7d64e26a66b9 to hdfs://localhost:52206/user/lars/testFilterAcrossMutlipleRegions/0db66f8aabdf138dbbcf6c04f857c284/testFamily/7e7f4acb121e4696bd3c7d64e26a66b9
2011-11-24 22:08:51,402 INFO [RegionServer:2;localhost,42385,1322201325234-smallCompactions-1322201331230] regionserver.Store(746): Completed major compaction of 1 file(s) in testFamily of testFilterAcrossMutlipleRegions,,1322201330936.0db66f8aabdf138dbbcf6c04f857c284. into 7e7f4acb121e4696bd3c7d64e26a66b9, size=662.0; total size for store is 662.0
{noformat}
In trunk I see this for the 1st daughter region:
{noformat}
2011-11-24 22:15:18,205 INFO [RegionServer:0;localhost,46427,1322201712357-smallCompactions-1322201718071] regionserver.HRegion(1097): Starting compaction on testFamily in region testFilterAcrossMutlipleRegions,,1322201717807.2bdeac6934712efdd694ec44ae48d1b2.
2011-11-24 22:15:18,206 INFO [RegionServer:0;localhost,46427,1322201712357-smallCompactions-1322201718071] regionserver.Store(797): Starting compaction of 1 file(s) in testFamily of testFilterAcrossMutlipleRegions,,1322201717807.2bdeac6934712efdd694ec44ae48d1b2. into tmpdir=hdfs://localhost:37213/user/lars/testFilterAcrossMutlipleRegions/2bdeac6934712efdd694ec44ae48d1b2/.tmp, seqid=3, totalSize=718.0
2011-11-24 22:15:18,206 DEBUG [RegionServer:0;localhost,46427,1322201712357-smallCompactions-1322201718071] regionserver.Store(1255): Compacting hdfs://localhost:37213/user/lars/testFilterAcrossMutlipleRegions/2bdeac6934712efdd694ec44ae48d1b2/testFamily/64908313825b4c0599b86c26b33797e3.215be88f57f1ca63b6ead035b39c4d2e-hdfs://localhost:37213/user/lars/testFilterAcrossMutlipleRegions/215be88f57f1ca63b6ead035b39c4d2e/testFamily/64908313825b4c0599b86c26b33797e3-bottom, keycount=2, bloomtype=NONE, size=718.0
2011-11-24 22:15:18,211 INFO [RegionServer:0;localhost,46427,1322201712357-smallCompactions-1322201718071] regionserver.Store(818): Completed major compaction of 1 file(s) in testFamily of testFilterAcrossMutlipleRegions,,1322201717807.2bdeac6934712efdd694ec44ae48d1b2. into none, size=none; total size for store is 0.0
{noformat}
The keys in both cases are aaa and aab, and the split key is aaa, so the 1st region (''-'aaa') should indeed be empty after compaction. In trunk it is correctly compacted to an empty file. In 0.92 it somehow wrote out the entire file again (so the keys are found in the store files for both regions).
> Port 2856 (TestAcidGuarantee is failing) to 0.92
>
>
> Key: HBASE-4838
> URL: https://issues.apache.org/jira/browse/HBASE-4838
> Project: HBase
> Issue Type: Sub-task
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Fix For: 0.92.0
>
> Attachments: 4838-v1.txt
>
>
> Moving the back port into a separate issue (as suggested by JonH), because this
> is not trivial.
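Lars's expectation that the first daughter ('' to 'aaa') should compact to nothing can be checked with a small model of split reference files. The sketch assumes the usual half-file rule: the bottom half serves keys strictly below the split key, and the top half serves keys at or above it. It uses `String` ordering in place of HBase's byte-wise comparator, which gives the same order for these ASCII keys. `HalfFileSketch` is a hypothetical name, not HBase code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of how a parent store file is divided between two daughters.
final class HalfFileSketch {
    private HalfFileSketch() {}

    // Keys visible through the bottom reference (daughter covering [startKey, splitKey)).
    static List<String> bottomHalf(List<String> sortedKeys, String splitKey) {
        List<String> out = new ArrayList<>();
        for (String k : sortedKeys) {
            if (k.compareTo(splitKey) < 0) {
                out.add(k);
            }
        }
        return out;
    }

    // Keys visible through the top reference (daughter covering [splitKey, endKey)).
    static List<String> topHalf(List<String> sortedKeys, String splitKey) {
        List<String> out = new ArrayList<>();
        for (String k : sortedKeys) {
            if (k.compareTo(splitKey) >= 0) {
                out.add(k);
            }
        }
        return out;
    }
}
```

With keys `aaa` and `aab` and split key `aaa`, `bottomHalf` is empty, which matches trunk compacting the first daughter `into none`. `topHalf` returns both keys, so any copy of those keys in the first daughter's store file points to the 0.92 bug.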
[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chunhui shen updated HBASE-4862:
Attachment: hbase-4862v1 for trunk.diff
> Splitting hlog and opening region concurrently may cause data loss
> --
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.2
> Reporter: chunhui shen
> Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff
>
>
> Case Description:
> 1. The split-hlog thread creates a writer for the file region A/recovered.edits/123456
> and is appending log entries.
> 2. The regionserver is now opening region A, and in replayRecoveredEditsIfAny()
> it deletes the file region A/recovered.edits/123456.
> 3. The split-hlog thread catches the IOException and stops parsing this log file.
> If skipError = true, it adds the file to the corrupt logs. However, data for
> other regions in this log file will be lost.
> 4. Or, if skipError = false, it checks the filesystem. Of course, the filesystem
> is OK, so it only prints an error log and continues assigning regions.
> Therefore, data in other log files will also be lost!!
> The case may happen as follows:
> 1. Move a region from server A to server B
> 2. Kill server A and server B
> 3. Restart server A and server B
> We could prevent this exception by forbidding deletion of a recovered.edits file
> that is still being appended to by the split-hlog thread.
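The proposed fix is to forbid deleting a recovered.edits file while the split-hlog thread is still appending to it. The sketch below shows that rule inside a single JVM. `RecoveredEditsGuard` and its methods are hypothetical and do not come from the attached patches. In a real cluster the splitter runs in the master while the region opens on a region server, so an actual fix would need some form of cross-process coordination, which this sketch does not attempt.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical single-process model of "don't delete a file the splitter is still writing".
final class RecoveredEditsGuard {
    private final Set<String> beingWritten = new HashSet<>();

    // Called by a split-hlog writer before it starts appending to a recovered.edits file.
    synchronized void startWriting(String path) {
        beingWritten.add(path);
    }

    // Called by the writer after it has closed the file.
    synchronized void finishWriting(String path) {
        beingWritten.remove(path);
    }

    // Region open calls this after replaying a file. The check and the delete run under
    // the same lock, so a writer cannot register the file in between.
    synchronized boolean deleteIfIdle(String path, Runnable delete) {
        if (beingWritten.contains(path)) {
            return false; // keep the file; the splitter is still appending to it
        }
        delete.run();
        return true;
    }
}
```

Region open would leave a busy file in place rather than remove it underneath the writer, so the writer never hits the missing-file exception that aborts the rest of the split.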
[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chunhui shen updated HBASE-4862:
Attachment: hbase-4862v1 for 0.90.diff
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156991#comment-13156991 ]
chunhui shen commented on HBASE-4862:
-
@Ted @Todd
I'm sorry my explanation was not clear. I think I should describe the detailed case first.
Throughout the following process, a client is putting data to region C.
1. Successfully move region C from server A to server B. At this moment, there are log entries for region C in both server A's log file and server B's log file.
2. Kill server A and server B.
3. Restart server B. Now the master starts ServerShutdownHandler for server B and assigns region C to server D.
4. Before region C is opened on server D, restart server A. Now the master starts ServerShutdownHandler for server A and splits server A's log file. Because there are log entries for region C in server A's log file (why? see 1), the split-hlog thread creates a file F in region C's recovered.edits directory.
5. In the region C opening process, the region server executes replayRecoveredEdits() and then deletes file F.
6. Therefore, the split in step 4 throws an IOException because file F does not exist, which stops parsing of the current server A hlog file; the other data in this server A hlog file is lost.
The posted region server log is server B's log, and it is doing replayRecoveredEditsIfAny(). Although it prints a failed delete of file recovered.edits/13156791680, in fact this file has been deleted, and the master throws a file-does-not-exist exception:
2011-11-16 11:50:13,037 FATAL org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-1 Got while writing log entry to log org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156791680 File does not exist.
I'm not sure whether this is clear now; I'm waiting for your questions. Thanks!
[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart due to batch.installed doubly counted
[ https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramkrishna.s.vasudevan updated HBASE-4855:
--
Fix Version/s: 0.92.0
Thanks Ted for reviewing and committing the patch. Updating the fix version to 0.92.
[jira] [Created] (HBASE-4867) A tool to merge configuration files
A tool to merge configuration files
---
Key: HBASE-4867
URL: https://issues.apache.org/jira/browse/HBASE-4867
Project: HBase
Issue Type: New Feature
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor

With our cluster configuration setup, it would be good to have a tool that merges HBase configuration files, so that files appearing later in the list override properties specified in earlier files. This way we could merge an application-specific configuration file with the cluster-specific configuration file (with the latter overriding the former) and produce a single HBase configuration file to install on the cluster.
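The following is a minimal sketch of the requested merge. It assumes the inputs use the standard Hadoop `<configuration><property><name>/<value>` XML layout and applies files in list order, so a later file overrides an earlier one. `ConfigMerger` is a hypothetical class that uses only the JDK's DOM parser. Unlike Hadoop's `Configuration`, which keeps a property marked `<final>` from being overridden by later resources, it ignores `<final>` and any other property elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical merge tool: later configuration files override earlier ones.
final class ConfigMerger {
    private ConfigMerger() {}

    // Reads <property><name>..</name><value>..</value></property> entries from one document.
    static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            NodeList values = p.getElementsByTagName("value");
            props.put(name, values.getLength() > 0 ? values.item(0).getTextContent() : "");
        }
        return props;
    }

    // Applies files in order; a property seen again in a later file replaces the earlier value.
    static Map<String, String> merge(List<String> xmlFiles) throws Exception {
        Map<String, String> merged = new LinkedHashMap<>();
        for (String xml : xmlFiles) {
            merged.putAll(parse(xml));
        }
        return merged;
    }

    // Writes the merged properties back out in the same <configuration> layout.
    static String toXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n")
              .append("    <name>").append(escape(e.getKey())).append("</name>\n")
              .append("    <value>").append(escape(e.getValue())).append("</value>\n")
              .append("  </property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    // Escapes the characters that would otherwise break the generated XML.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Passing the application-specific file first and the cluster-specific file second gives a map in which the cluster settings win. `toXml` then produces the single file to install.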
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156967#comment-13156967 ]
chunhui shen commented on HBASE-4862:
-
After a region has been successfully moved from server A to server B, the log entries for this region in server A's log file are not a problem, because they have already been flushed. However, if this exception is encountered while splitting the hlog, it affects the log data of other regions in server A's log file.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156965#comment-13156965 ]
chunhui shen commented on HBASE-4862:
-
@Ted Yu @Todd Lipcon
It can happen concurrently in the following case:
1. Move a region from server A to server B (for example, during balancing).
2. Kill server A and server B.
3. Restart server A and server B immediately.
Before we restart server A and server B, log data for this region appears in both servers' log files.
4. After we restart server B, ServerShutdownHandler processes this dead server and assigns this region.
5. At the same time, ServerShutdownHandler processes dead server B and splits server B's hlog.
Because 4 and 5 are concurrent, replayRecoveredEditsIfAny in 4 and the appending of log entries to this region's recovered.edits file are concurrent. So when the recovered.edits file is deleted by replayRecoveredEdits, an exception is thrown.
The master and region server logs in this case are as follows:
master log:
2011-11-16 11:50:13,037 FATAL org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-1 Got while writing log entry to log
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156791680 File does not exist. [Lease. Holder: DFSClient_hb_m_dw75.kgb.sqa.cm4:6_1321413286871, pendingcreates: 54]
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1542)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1533)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1449)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)
	at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:49)
	at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.writeBuffer(HLogSplitter.java:962)
	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.doRun(HLogSplitter.java:926)
	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.run(HLogSplitter.java:898)
regionserver log:
2011-11-16 11:49:49,727 ERROR org.apache.hadoop.hbase.regionserver.HRegion: Failed delete of hdfs://dw74.kgb.sqa.cm4:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156791680
2011-11-16 11:49:49,732 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Deleted recovered.edits file=hdfs://dw74.kgb.sqa.cm4:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156800103
[jira] [Commented] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test
[ https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156964#comment-13156964 ]
Ted Yu commented on HBASE-4863:
---
{code}
testSleepWithoutInterrupt(org.apache.hadoop.hbase.util.TestThreads)  Time elapsed: 5.004 sec  <<< FAILURE!
java.lang.AssertionError
	at org.junit.Assert.fail(Assert.java:92)
	at org.junit.Assert.assertTrue(Assert.java:43)
	at org.junit.Assert.assertTrue(Assert.java:54)
	at org.apache.hadoop.hbase.util.TestThreads.testSleepWithoutInterrupt(TestThreads.java:57)
{code}
points to this line:
{code}
assertTrue(sleeper.isInterrupted());
{code}
> Make HBase Thrift server more configurable and add a command-line UI test
> -
>
> Key: HBASE-4863
> URL: https://issues.apache.org/jira/browse/HBASE-4863
> Project: HBase
> Issue Type: Improvement
> Reporter: Mikhail Bautin
> Assignee: Mikhail Bautin
> Attachments: 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, D531.2.patch, D531.3.patch
>
>
> This started as an internal hotfix where we found out that the Thrift server
> spawned 15000 threads. To bound the thread pool size I added a custom thread
> pool server implementation called HBaseThreadPoolServer into the HBase codebase,
> and made the following parameters configurable both from the command line and as
> config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests.
> Under an increasing load, the server creates a new thread for every connection
> until the pool size reaches minWorkerThreads. After that, the server puts
> new connections into the queue and only creates a new thread when the queue
> is full. If an attempt to create a new thread fails, the server drops the
> connection. The default TThreadPoolServer would crash in that case, but that
> never happened because the thread pool was unbounded, so the server would
> hang indefinitely, consume a lot of memory, and cause huge latency spikes on
> the client side.
> Another part of this fix is refactoring and unit testing of the command-line
> part of the Thrift server. The logic there is sufficiently complicated, and
> the existing ThriftServer class does not test that part at all. The new
> TestThriftServerCmdLine test starts the Thrift server on a random port with
> various combinations of options and talks to it through the client API from
> another thread.
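The described policy corresponds closely to the documented semantics of the JDK's `ThreadPoolExecutor` with a bounded queue: grow to `minWorkerThreads`, then queue up to `maxQueuedRequests`, then add threads up to `maxWorkerThreads`, then drop the connection. The sketch below expresses it that way. It illustrates the policy and is not the HBaseThreadPoolServer code; `BoundedPoolSketch` is a hypothetical name. `maxQueuedRequests` must be at least 1, because `ArrayBlockingQueue` rejects a zero capacity, and `maxWorkerThreads` must be at least `minWorkerThreads`.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of the bounded pool policy using the JDK executor (not the HBase class).
final class BoundedPoolSketch {
    private BoundedPoolSketch() {}

    static ThreadPoolExecutor create(int minWorkerThreads, int maxWorkerThreads, int maxQueuedRequests) {
        // Below the core size each new task starts a thread; after that tasks are queued,
        // and extra threads (up to the maximum) start only when the queue is full.
        return new ThreadPoolExecutor(
            minWorkerThreads,
            maxWorkerThreads,
            60L, TimeUnit.SECONDS, // idle threads above the core size exit after this
            new ArrayBlockingQueue<>(maxQueuedRequests),
            new ThreadPoolExecutor.AbortPolicy()); // reject once threads and queue are exhausted
    }

    // Hands a connection to the pool; returns false if it was rejected and should be dropped.
    static boolean tryServe(ThreadPoolExecutor pool, Runnable connectionHandler) {
        try {
            pool.execute(connectionHandler);
            return true;
        } catch (RejectedExecutionException e) {
            return false; // the caller closes the client connection instead of crashing
        }
    }
}
```

Rejecting excess work, instead of letting the pool grow without limit, keeps thread count and memory bounded under load. Clients then see a dropped connection rather than a server that hangs.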
[jira] [Commented] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test
[ https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156956#comment-13156956 ] Ted Yu commented on HBASE-4863: --- In thrift2/ThriftServer.java: {code} } else { server = getTThreadPoolServer(protocolFactory, processor, transportFactory, inetSocketAddress); {code} where {code} TThreadPoolServer.Args serverArgs = new TThreadPoolServer.Args(serverTransport); {code} It would be nice to incorporate TBoundedThreadPoolServer into the above module. This can be done in a separate JIRA. > Make HBase Thrift server more configurable and add a command-line UI test > - > > Key: HBASE-4863 > URL: https://issues.apache.org/jira/browse/HBASE-4863 > Project: HBase > Issue Type: Improvement >Reporter: Mikhail Bautin >Assignee: Mikhail Bautin > Attachments: > 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, > D531.2.patch, D531.3.patch > > > This started as an internal hotfix where we found out that the Thrift server > spawned 15000 threads. To bound the thread pool size I added a custom thread > pool server implementation called HBaseThreadPoolServer into HBase codebase, > and made the following parameters configurable from both command line and as > config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. > Under an increasing load, the server creates new threads for every connection > before the pool size reaches minWorkerThreads. After that, the server puts > new connections into the queue and only creates a new thread when the queue > is full. If an attempt to create a new thread fails, the server drops > connection. The default TThreadPoolServer would crash in that case, but it > never happened because the thread pool was unbounded, so the server would > hang indefinitely, consume a lot of memory, and cause huge latency spikes on > the client side. 
> Another part of this fix is refactoring and unit testing of the command-line > part of the Thrift server. The logic there is sufficiently complicated, and > the existing ThriftServer class does not test that part at all. The new > TestThriftServerCmdLine test starts the Thrift server on a random port with > various combinations of options and talks to it through the client API from > another thread. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
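The connection-handling policy described in this issue (a new thread per connection up to minWorkerThreads, then queueing up to maxQueuedRequests, extra threads only once the queue is full, and a dropped connection when no thread can be added) corresponds closely to how java.util.concurrent.ThreadPoolExecutor behaves with a bounded queue. The sketch below illustrates that policy under the assumption that maxWorkerThreads caps the extra threads; it is not the HBaseThreadPoolServer/TBoundedThreadPoolServer code from the patch. BoundedPoolSketch and its methods are hypothetical, and a plain Runnable stands in for a Thrift client connection.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a bounded worker pool for incoming connections. */
public class BoundedPoolSketch {

  /**
   * A ThreadPoolExecutor with a bounded queue starts a new thread per task while
   * below corePoolSize, then queues tasks, and grows toward maximumPoolSize only
   * when the queue is full. Beyond that, execute() throws RejectedExecutionException.
   */
  public static ThreadPoolExecutor newBoundedPool(
      int minWorkerThreads, int maxWorkerThreads, int maxQueuedRequests) {
    return new ThreadPoolExecutor(
        minWorkerThreads,
        maxWorkerThreads,
        60L, TimeUnit.SECONDS, // extra idle threads exit after 60 s (arbitrary choice)
        new ArrayBlockingQueue<Runnable>(maxQueuedRequests));
  }

  /** Returns true if the connection was handed to the pool, false if it was dropped. */
  public static boolean handleConnection(ThreadPoolExecutor pool, Runnable connectionTask) {
    try {
      pool.execute(connectionTask);
      return true;
    } catch (RejectedExecutionException e) {
      // No free thread and no queue space: drop the connection instead of
      // spawning yet another thread. A real server would also close the socket.
      return false;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    ThreadPoolExecutor pool = newBoundedPool(1, 2, 1);
    CountDownLatch release = new CountDownLatch(1);
    // Each "connection" blocks until released, so the pool fills up deterministically.
    Runnable connection = () -> {
      try {
        release.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    };
    int accepted = 0;
    for (int i = 0; i < 4; i++) {
      if (handleConnection(pool, connection)) {
        accepted++;
      }
    }
    // 1 core thread + 1 queued task + 1 extra thread once the queue is full; the 4th is dropped.
    System.out.println("accepted=" + accepted); // prints "accepted=3"
    release.countDown();
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
  }
}
```

Relying on ThreadPoolExecutor keeps the growth rules in well-tested library code; the server-specific part reduces to deciding what "dropping" a rejected connection means.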
[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart due to batch.installed doubly counted
[ https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4855: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) > SplitLogManager hangs on cluster restart due to batch.installed doubly counted > -- > > Key: HBASE-4855 > URL: https://issues.apache.org/jira/browse/HBASE-4855 > Project: HBase > Issue Type: Bug >Affects Versions: 0.92.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Attachments: HBASE-4855.patch > > > Start a master and an RS. > The RS goes down (kill -9). > Wait for ServerShutdownHandler to create the splitlog nodes. As no RS is > running, they cannot be processed. > Restart the master and bring up an RS. > The master hangs in SplitLogManager.waitforTasks(). > I suspect that batch.done is not being incremented properly; I have not dug > in fully yet. > This may be the reason for the occasional failures of > TestDistributedLogSplitting.testWorkerAbort().
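The summary attributes the hang to batch.installed being counted twice. If the master waits until the number of finished tasks equals the number of installed tasks, a single extra increment means the wait can never end. The minimal sketch below illustrates that failure mode and one possible guard (count each task path at most once). TaskBatchSketch, its methods, and the task path are hypothetical simplifications, not SplitLogManager's code or the committed patch.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified stand-in for a batch of log-splitting tasks. */
public class TaskBatchSketch {
  private int installed = 0; // tasks the waiter expects to finish
  private int done = 0;      // tasks reported as finished
  private final Set<String> installedPaths = new HashSet<>();

  /** Buggy variant: counts a task again every time it is (re)installed. */
  public void installNaive(String taskPath) {
    installed++;
  }

  /** Guarded variant: each task path contributes to 'installed' at most once. */
  public void installOnce(String taskPath) {
    if (installedPaths.add(taskPath)) {
      installed++;
    }
  }

  public void taskDone() {
    done++;
  }

  /** A waiter would loop until this is true; a double count keeps it false forever. */
  public boolean isComplete() {
    return done == installed;
  }

  public static void main(String[] args) {
    TaskBatchSketch naive = new TaskBatchSketch();
    naive.installNaive("splitlog/task-1"); // hypothetical task path
    naive.installNaive("splitlog/task-1"); // same task seen again, e.g. after a restart
    naive.taskDone();
    System.out.println("naive complete: " + naive.isComplete()); // prints "naive complete: false"

    TaskBatchSketch guarded = new TaskBatchSketch();
    guarded.installOnce("splitlog/task-1");
    guarded.installOnce("splitlog/task-1");
    guarded.taskDone();
    System.out.println("guarded complete: " + guarded.isComplete()); // prints "guarded complete: true"
  }
}
```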
[jira] [Commented] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test
[ https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156923#comment-13156923 ] Hadoop QA commented on HBASE-4863: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12505038/0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -162 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestAdmin org.apache.hadoop.hbase.util.TestThreads Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/364//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/364//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/364//console This message is automatically generated. > Make HBase Thrift server more configurable and add a command-line UI test > - > > Key: HBASE-4863 > URL: https://issues.apache.org/jira/browse/HBASE-4863 > Project: HBase > Issue Type: Improvement >Reporter: Mikhail Bautin >Assignee: Mikhail Bautin > Attachments: > 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, > D531.2.patch, D531.3.patch
[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart due to batch.installed doubly counted
[ https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156907#comment-13156907 ] Hudson commented on HBASE-4855: --- Integrated in HBase-TRUNK #2481 (See [https://builds.apache.org/job/HBase-TRUNK/2481/]) HBASE-4855 SplitLogManager hangs on cluster restart due to batch.installed doubly counted tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/SplitLogManager.java > SplitLogManager hangs on cluster restart due to batch.installed doubly counted > -- > > Key: HBASE-4855 > URL: https://issues.apache.org/jira/browse/HBASE-4855 > Project: HBase > Issue Type: Bug >Affects Versions: 0.92.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Attachments: HBASE-4855.patch
[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test
[ https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4863: -- Attachment: 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch The same as D531.3.patch but generated using "git format-patch --no-prefix HEAD^..HEAD" so that it can be applied using the normal patch command. > Make HBase Thrift server more configurable and add a command-line UI test > - > > Key: HBASE-4863 > URL: https://issues.apache.org/jira/browse/HBASE-4863 > Project: HBase > Issue Type: Improvement >Reporter: Mikhail Bautin >Assignee: Mikhail Bautin > Attachments: > 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, > D531.2.patch, D531.3.patch
[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test
[ https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Bautin updated HBASE-4863: -- Status: Patch Available (was: Open) > Make HBase Thrift server more configurable and add a command-line UI test > - > > Key: HBASE-4863 > URL: https://issues.apache.org/jira/browse/HBASE-4863 > Project: HBase > Issue Type: Improvement >Reporter: Mikhail Bautin >Assignee: Mikhail Bautin > Attachments: > 0001-Fix-thread-leaks-in-the-HBase-thread-pool-server.patch, D531.1.patch, > D531.2.patch, D531.3.patch
[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test
[ https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-4863: --- Attachment: D531.3.patch mbautin updated the revision "[jira] [HBASE-4863] Make HBase Thrift server more configurable and add a command-line UI test". Reviewers: JIRA, Kannan, tedyu, stack Addressing Ted's comments. I will re-run unit tests and cluster tests, and post an update. REVISION DETAIL https://reviews.facebook.net/D531 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/thrift/TBoundedThreadPoolServer.java src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java src/main/java/org/apache/hadoop/hbase/util/Threads.java src/main/resources/hbase-default.xml src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServer.java src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServerCmdLine.java src/test/java/org/apache/hadoop/hbase/util/TestThreads.java > Make HBase Thrift server more configurable and add a command-line UI test > - > > Key: HBASE-4863 > URL: https://issues.apache.org/jira/browse/HBASE-4863 > Project: HBase > Issue Type: Improvement >Reporter: Mikhail Bautin >Assignee: Mikhail Bautin > Attachments: D531.1.patch, D531.2.patch, D531.3.patch
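The issue description says the new TestThriftServerCmdLine test starts the Thrift server on a random port. One common way to pick such a port in Java is to bind a ServerSocket to port 0 and let the operating system choose; the sketch below assumes that approach and is not taken from the patch. FreePortSketch and findFreePort are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

public class FreePortSketch {
  /** Hypothetical helper: asks the OS for a TCP port that is free right now. */
  public static int findFreePort() {
    // Binding to port 0 lets the OS pick an unused ephemeral port.
    try (ServerSocket probe = new ServerSocket(0)) {
      return probe.getLocalPort();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    int port = findFreePort();
    // The probe socket is closed again, so another process could grab the port
    // before the server binds it; a test may need to retry if that happens.
    System.out.println("start the server under test on port " + port);
  }
}
```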
[jira] [Commented] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test
[ https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156883#comment-13156883 ] Phabricator commented on HBASE-4863: tedyu has commented on the revision "[jira] [HBASE-4863] Make HBase Thrift server more configurable and add a command-line UI test". Should similar changes in thrift/ThriftServer.java be applied to thrift2/ThriftServer.java? INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java:111 Should this become a parameter the user can adjust? src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java:263 Should ttx.getType() be logged? src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java:179 Should read 'Exactly one ' REVISION DETAIL https://reviews.facebook.net/D531 > Make HBase Thrift server more configurable and add a command-line UI test > - > > Key: HBASE-4863 > URL: https://issues.apache.org/jira/browse/HBASE-4863 > Project: HBase > Issue Type: Improvement >Reporter: Mikhail Bautin >Assignee: Mikhail Bautin > Attachments: D531.1.patch, D531.2.patch
[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart due to batch.installed doubly counted
[ https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156873#comment-13156873 ] Ted Yu commented on HBASE-4855: --- The failed test was due to 'Too many open files'. Patch integrated into 0.92 and TRUNK. Thanks for the patch, Ramkrishna. > SplitLogManager hangs on cluster restart due to batch.installed doubly counted > -- > > Key: HBASE-4855 > URL: https://issues.apache.org/jira/browse/HBASE-4855 > Project: HBase > Issue Type: Bug >Affects Versions: 0.92.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Attachments: HBASE-4855.patch
[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart due to batch.installed doubly counted
[ https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4855: -- Summary: SplitLogManager hangs on cluster restart due to batch.installed doubly counted (was: SplitLogManager hangs on cluster restart. ) > SplitLogManager hangs on cluster restart due to batch.installed doubly counted > -- > > Key: HBASE-4855 > URL: https://issues.apache.org/jira/browse/HBASE-4855 > Project: HBase > Issue Type: Bug >Affects Versions: 0.92.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Attachments: HBASE-4855.patch
[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.
[ https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156869#comment-13156869 ] Hadoop QA commented on HBASE-4855: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12505020/HBASE-4855.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 javadoc. The javadoc tool appears to have generated -162 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestAdmin Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/363//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/363//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/363//console This message is automatically generated. > SplitLogManager hangs on cluster restart. > -- > > Key: HBASE-4855 > URL: https://issues.apache.org/jira/browse/HBASE-4855 > Project: HBase > Issue Type: Bug >Affects Versions: 0.92.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Attachments: HBASE-4855.patch
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156866#comment-13156866 ] Ted Yu commented on HBASE-4862: --- @Chunhui: Can you attach master and region server log snippets that show what happened? Thanks. > Splitting hlog and opening region concurrently may cause data loss > -- > > Key: HBASE-4862 > URL: https://issues.apache.org/jira/browse/HBASE-4862 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.2 >Reporter: chunhui shen >Assignee: chunhui shen > Fix For: 0.92.0, 0.94.0, 0.90.5 > > Attachments: 4862.patch > > > Case description: > 1. The split hlog thread creates a writer for the file region A/recovered.edits/123456 > and is appending log entries. > 2. The regionserver is now opening region A, and in > replayRecoveredEditsIfAny() it deletes the file region > A/recovered.edits/123456. > 3. The split hlog thread catches the IOException and stops parsing this log file. > If skipError = true, it adds the file to the corrupt logs. However, data for > other regions in this log file will be lost. > 4. Or, if skipError = false, it checks the filesystem. Of course, the file > system is OK, so it only prints an error log and continues assigning regions. > Therefore, data in other log files will also be lost! > The case may happen as follows: > 1. Move a region from server A to server B. > 2. Kill server A and server B. > 3. Restart server A and server B. > We could prevent this by forbidding deletion of a recovered.edits > file > that is still being appended to by the split hlog thread.
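The description ends by proposing that a recovered.edits file still being appended to by the split thread should not be deleted. One way to express that rule, sketched here under assumptions, is for the writer to use a temporary name and rename the file only once it is complete, while the region-open side replays (and later deletes) only files without the temporary suffix. The ".temp" suffix, RecoveredEditsSketch, and its methods are hypothetical and not necessarily what the attached 4862.patch does. Note that Todd Lipcon's comment on this issue argues that a region should never be opened before splitting for it has finished, so a scheme like this would only narrow the symptom.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecoveredEditsSketch {
  // Hypothetical suffix marking a file the splitter is still writing.
  static final String TEMP_SUFFIX = ".temp";

  /** Splitter side: write under a temporary name, then publish the finished file by renaming it. */
  public static Path writeEdits(Path editsDir, String seqId, List<String> edits) throws IOException {
    Path temp = editsDir.resolve(seqId + TEMP_SUFFIX);
    Files.write(temp, edits, StandardCharsets.UTF_8);
    // The final name only appears once the file is complete.
    return Files.move(temp, editsDir.resolve(seqId), StandardCopyOption.ATOMIC_MOVE);
  }

  /** Region-open side: only finished files are candidates for replay (and later deletion). */
  public static List<Path> filesToReplay(Path editsDir) throws IOException {
    try (Stream<Path> files = Files.list(editsDir)) {
      return files
          .filter(p -> !p.getFileName().toString().endsWith(TEMP_SUFFIX))
          .sorted()
          .collect(Collectors.toList());
    }
  }
}
```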
[jira] [Resolved] (HBASE-4856) Upgrade zookeeper to 3.4.0 release
[ https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-4856. --- Resolution: Fixed > Upgrade zookeeper to 3.4.0 release > -- > > Key: HBASE-4856 > URL: https://issues.apache.org/jira/browse/HBASE-4856 > Project: HBase > Issue Type: Task >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 0.92.0 > > Attachments: 4856.txt > > > ZooKeeper 3.4.0 has been released. > We should upgrade.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156858#comment-13156858 ] Todd Lipcon commented on HBASE-4862: wait, wait -- _why_ is this happening concurrently? A region should never be opened until the split process is done for that region. If this is happening we have a much larger issue, which we shouldn't be working around with tmp file names, etc. > Splitting hlog and opening region concurrently may cause data loss > -- > > Key: HBASE-4862 > URL: https://issues.apache.org/jira/browse/HBASE-4862 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.2 >Reporter: chunhui shen >Assignee: chunhui shen > Fix For: 0.92.0, 0.94.0, 0.90.5 > > Attachments: 4862.patch
[jira] [Commented] (HBASE-4856) Upgrade zookeeper to 3.4.0 release
[ https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156845#comment-13156845 ] Hudson commented on HBASE-4856: --- Integrated in HBase-TRUNK #2479 (See [https://builds.apache.org/job/HBase-TRUNK/2479/]) HBASE-4856 Upgrade zookeeper to 3.4.0 release tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/pom.xml > Upgrade zookeeper to 3.4.0 release > -- > > Key: HBASE-4856 > URL: https://issues.apache.org/jira/browse/HBASE-4856 > Project: HBase > Issue Type: Task >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 0.92.0 > > Attachments: 4856.txt
[jira] [Commented] (HBASE-4864) TestMasterObserver#testRegionTransitionOperations occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156844#comment-13156844 ] Hudson commented on HBASE-4864: --- Integrated in HBase-TRUNK #2479 (See [https://builds.apache.org/job/HBase-TRUNK/2479/]) HBASE-4864 TestMasterObserver#testRegionTransitionOperations occasionally fails (Gao Jinchao) tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java > TestMasterObserver#testRegionTransitionOperations occasionally fails > > > Key: HBASE-4864 > URL: https://issues.apache.org/jira/browse/HBASE-4864 > Project: HBase > Issue Type: Test > Components: test >Reporter: gaojinchao >Assignee: gaojinchao >Priority: Minor > Fix For: 0.92.0, 0.94.0 > > Attachments: HBASE-4864_Branch92.patch > > > Look at these logs: > https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/ > It seems that we should wait until the region is added to the online region set. > I made a patch; please review.
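The proposed fix is for the test to wait until the region has been added to the online region set instead of checking immediately. A generic polling helper for that kind of wait might look like the sketch below; WaitForSketch, waitFor, and the onlineRegions set are hypothetical stand-ins, not HBase test utilities or the attached patch.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class WaitForSketch {
  /**
   * Polls the condition every intervalMs until it holds or timeoutMs has elapsed.
   * Returns true if the condition became true in time, false otherwise.
   */
  public static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    // Hypothetical stand-in for a region server's set of online regions.
    Set<String> onlineRegions = ConcurrentHashMap.newKeySet();
    new Thread(() -> {
      try {
        Thread.sleep(200); // the region comes online asynchronously
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      onlineRegions.add("regionA");
    }).start();
    // Wait for the region instead of asserting right away.
    boolean online = waitFor(() -> onlineRegions.contains("regionA"), 5000, 50);
    System.out.println("region online: " + online);
  }
}
```

Returning a boolean rather than throwing lets the test decide how to report a timeout, for example with an assertion that names the region.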
[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.
[ https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156830#comment-13156830 ] Ted Yu commented on HBASE-4855: --- +1 on patch. > SplitLogManager hangs on cluster restart. > -- > > Key: HBASE-4855 > URL: https://issues.apache.org/jira/browse/HBASE-4855 > Project: HBase > Issue Type: Bug >Affects Versions: 0.92.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Attachments: HBASE-4855.patch
[jira] [Issue Comment Edited] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test
[ https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156821#comment-13156821 ] Ted Yu edited comment on HBASE-4863 at 11/24/11 5:14 PM: - I got a compilation error: {code} testRunThriftServer[0](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine) Time elapsed: 2.047 sec <<< ERROR! java.lang.Error: Unresolved compilation problem: Cannot make a static reference to the non-static method getColumnDescriptors() from the type TestThriftServer at org.apache.hadoop.hbase.thrift.TestThriftServer.createDropTable(TestThriftServer.java:111) {code} Since HBaseThreadPoolServer extends TServer, I think a better name for the class would be TBoundedThreadPoolServer (TThreadPoolServer is in thrift). was (Author: yuzhih...@gmail.com): I got a compilation error: {code} testRunThriftServer[0](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine) Time elapsed: 2.047 sec <<< ERROR! java.lang.Error: Unresolved compilation problem: Cannot make a static reference to the non-static method getColumnDescriptors() from the type TestThriftServer at org.apache.hadoop.hbase.thrift.TestThriftServer.createDropTable(TestThriftServer.java:111) {code} Since HBaseThreadPoolServer extends TServer, I think a better name for the class would be TThreadPoolServer. > Make HBase Thrift server more configurable and add a command-line UI test > - > > Key: HBASE-4863 > URL: https://issues.apache.org/jira/browse/HBASE-4863 > Project: HBase > Issue Type: Improvement >Reporter: Mikhail Bautin >Assignee: Mikhail Bautin > Attachments: D531.1.patch, D531.2.patch > > > This started as an internal hotfix after we found out that the Thrift server spawned 15000 threads.
To bound the thread pool size, I added a custom thread pool server implementation called HBaseThreadPoolServer to the HBase codebase, and made the following parameters configurable both from the command line and as config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. > Under increasing load, the server creates a new thread for every connection until the pool size reaches minWorkerThreads. After that, the server puts new connections into the queue and only creates a new thread when the queue is full. If an attempt to create a new thread fails, the server drops the connection. The default TThreadPoolServer would crash in that case, but that never happened because its thread pool was unbounded, so the server would hang indefinitely, consume a lot of memory, and cause huge latency spikes on the client side. > Another part of this fix is refactoring and unit testing of the command-line part of the Thrift server. The logic there is sufficiently complicated, and the existing ThriftServer class does not test that part at all. The new TestThriftServerCmdLine test starts the Thrift server on a random port with various combinations of options and talks to it through the client API from another thread.
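The admission order described above (a new thread per connection up to minWorkerThreads, then queueing, then extra threads only when the queue is full, then dropping) is the same order that `java.util.concurrent.ThreadPoolExecutor` applies when it is given a bounded queue. The sketch below only illustrates that mapping and is not the HBaseThreadPoolServer code from the patch. `newBoundedPool` and `tryServe` are hypothetical names. The 60-second idle keep-alive is an arbitrary choice. In this model, "an attempt to create a new thread fails" means that maxWorkerThreads has already been reached.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch only: maps minWorkerThreads / maxWorkerThreads / maxQueuedRequests onto a
// standard ThreadPoolExecutor. Not the HBaseThreadPoolServer implementation.
class BoundedWorkerPoolSketch {
  // maxQueuedRequests must be >= 1 (ArrayBlockingQueue rejects a zero capacity).
  static ThreadPoolExecutor newBoundedPool(int minWorkerThreads, int maxWorkerThreads,
      int maxQueuedRequests) {
    return new ThreadPoolExecutor(
        minWorkerThreads,                            // new thread per submission until this many exist
        maxWorkerThreads,                            // extra threads only once the queue is full
        60L, TimeUnit.SECONDS,                       // idle time before extra threads exit
        new ArrayBlockingQueue<>(maxQueuedRequests), // bounded request queue
        new ThreadPoolExecutor.AbortPolicy());       // reject instead of growing without bound
  }

  /** Returns false when the pool is saturated; the caller should then close the connection. */
  static boolean tryServe(ThreadPoolExecutor pool, Runnable connectionHandler) {
    try {
      pool.execute(connectionHandler);
      return true;
    } catch (RejectedExecutionException e) {
      return false;
    }
  }
}
```

Consider minWorkerThreads=1, maxWorkerThreads=2 and maxQueuedRequests=1, with every handler still busy. The first connection gets a new thread. The second waits in the queue. The third triggers a second thread. A fourth is rejected, so memory use stays bounded instead of growing with the connection count.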
[jira] [Commented] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test
[ https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156828#comment-13156828 ] Phabricator commented on HBASE-4863: tedyu has commented on the revision "[jira] [HBASE-4863] Make HBase Thrift server more configurable and add a command-line UI test". INLINE COMMENTS src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java:64 Please add javadoc for the keys. These keys should be placed into hbase-default.xml. src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java:80 Is TIME_TO_WAIT_AFTER_SHUTDOWN_MS a better name for this constant? REVISION DETAIL https://reviews.facebook.net/D531
[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart.
[ https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4855: -- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart.
[ https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4855: -- Attachment: HBASE-4855.patch TestDistributedLogSplitting is passing. I will have the results of the other test cases in the morning.
[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.
[ https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156822#comment-13156822 ] Ted Yu commented on HBASE-4855: --- The above analysis makes sense. Nice catch Ramkrishna.
[jira] [Commented] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test
[ https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156821#comment-13156821 ] Ted Yu commented on HBASE-4863: --- I got a compilation error: {code} testRunThriftServer[0](org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine) Time elapsed: 2.047 sec <<< ERROR! java.lang.Error: Unresolved compilation problem: Cannot make a static reference to the non-static method getColumnDescriptors() from the type TestThriftServer at org.apache.hadoop.hbase.thrift.TestThriftServer.createDropTable(TestThriftServer.java:111) {code} Since HBaseThreadPoolServer extends TServer, I think a better name for the class would be TThreadPoolServer.
[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.
[ https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156820#comment-13156820 ] ramkrishna.s.vasudevan commented on HBASE-4855: --- When the master restarts and sees splitlog nodes that have not been processed, the SplitLogManager calls handleUnassignedTasks:
{code}
Task task = findOrCreateOrphanTask(path);
{code}
As part of this call, the task is added:
{code}
task = tasks.putIfAbsent(path, orphanTask);
{code}
Later, in splitLogDistributed(), we call installTask(), which creates the task if it is absent:
{code}
Task oldtask = createTaskIfAbsent(path, batch);
{code}
Inside createTaskIfAbsent():
{code}
oldtask = tasks.putIfAbsent(path, new Task(batch));
if (oldtask != null && oldtask.isOrphan()) {
  LOG.info("Previously orphan task " + path + " is now being waited upon");
  oldtask.setBatch(batch);
  return (null);
}
{code}
putIfAbsent returns the task that was already added, so oldtask is not null. However, constructing new Task(batch) has already incremented batch.installed:
{code}
Task(TaskBatch tb) {
  incarnation = 0;
  last_version = -1;
  deleted = false;
  setBatch(tb);
  setUnassigned();
}

public void setBatch(TaskBatch batch) {
  if (batch != null && this.batch != null) {
    LOG.fatal("logic error - batch being overwritten");
  }
  this.batch = batch;
  if (batch != null) {
    batch.installed++;
  }
}
{code}
Since oldtask is not null, we then call oldtask.setBatch(batch), which increments batch.installed a second time. This is why batch.done can never reach batch.installed, and the following while loop keeps looping:
{code}
while ((batch.done + batch.error) != batch.installed) {
{code}
Please correct me if my analysis is wrong. I am uploading a patch that solved the problem. Kindly validate the fix.
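The double count can be reproduced with a small, single-threaded Java model. `TaskBatch.installed` and `Task.setBatch` mirror the snippets quoted in the analysis. Everything else is hypothetical and simplified, including the class name and `createTaskIfAbsentFixed`. The "fixed" method shows one possible way to count each task once, and it is not necessarily what HBASE-4855.patch does. Concurrent adoption of the same orphan is not modelled.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Simplified model of the batch.installed double count; not SplitLogManager itself.
class SplitLogCountingSketch {
  static class TaskBatch {
    int installed;
    int done;
    int error;
  }

  static class Task {
    TaskBatch batch;

    Task(TaskBatch tb) {
      setBatch(tb); // side effect: counts this task against tb
    }

    boolean isOrphan() {
      return batch == null;
    }

    void setBatch(TaskBatch tb) {
      this.batch = tb;
      if (tb != null) {
        tb.installed++;
      }
    }
  }

  final ConcurrentMap<String, Task> tasks = new ConcurrentHashMap<>();

  // Mirrors the reported logic: new Task(batch) is constructed (and counted) even when
  // putIfAbsent keeps the existing orphan, and setBatch then counts it a second time.
  Task createTaskIfAbsentBuggy(String path, TaskBatch batch) {
    Task oldtask = tasks.putIfAbsent(path, new Task(batch));
    if (oldtask != null && oldtask.isOrphan()) {
      oldtask.setBatch(batch);
      return null;
    }
    return oldtask;
  }

  // One possible fix: construct (and count) a new Task only when the path is really
  // absent, and count an adopted orphan exactly once.
  Task createTaskIfAbsentFixed(String path, TaskBatch batch) {
    Task task = tasks.computeIfAbsent(path, p -> new Task(batch));
    if (task.batch == batch) {
      return null; // created for (or already installed by) this batch: counted once
    }
    if (task.isOrphan()) {
      task.setBatch(batch); // adopt the orphan: counted once
      return null;
    }
    return task; // owned by a different batch
  }
}
```

With the buggy method, one orphan plus one installTask() call leaves `installed == 2` while only one task can ever finish. As a result, `(batch.done + batch.error) != batch.installed` stays true forever. With the fixed method, each path contributes exactly one to `installed`.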
[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart.
[ https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4855: -- Fix Version/s: (was: 0.92.0)
[jira] [Updated] (HBASE-4855) SplitLogManager hangs on cluster restart.
[ https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4855: -- Affects Version/s: 0.92.0 Fix Version/s: 0.92.0
[jira] [Commented] (HBASE-4866) Fix possible NPE in AssignmentManager#regionOnline
[ https://issues.apache.org/jira/browse/HBASE-4866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156779#comment-13156779 ] Jonathan Hsieh commented on HBASE-4866: --- Looks like it corresponds to this line, which is AssignmentManager:724 on the 0.90 branch: {code} HServerInfo hsiWithoutLoad = new HServerInfo( serverInfo.getServerAddress(), serverInfo.getStartCode(), serverInfo.getInfoPort(), serverInfo.getHostname()); {code} > Fix possible NPE in AssignmentManager#regionOnline > -- > > Key: HBASE-4866 > URL: https://issues.apache.org/jira/browse/HBASE-4866 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 0.90.4 >Reporter: Jonathan Hsieh > > NPE encountered in a user's HMaster logs: > {code} > 11/11/22 23:45:37 FATAL master.HMaster: Unhandled exception. Starting shutdown. > java.lang.NullPointerException > at org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:731) > at org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:215) > at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:422) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:295) > {code} > From user list: > http://mail-archives.apache.org/mod_mbox/hbase-user/20.mbox/%3C4ECC9AFC.6030307%40qualtrics.com%3E
[jira] [Commented] (HBASE-4861) Fix some misspells and extraneous characters in logs; set some to TRACE
[ https://issues.apache.org/jira/browse/HBASE-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156776#comment-13156776 ] Hudson commented on HBASE-4861: --- Integrated in HBase-TRUNK #2478 (See [https://builds.apache.org/job/HBase-TRUNK/2478/]) HBASE-4861 Fix some misspells and extraneous characters in logs; set some to TRACE stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/SplitRegionHandler.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java
[jira] [Created] (HBASE-4866) Fix possible NPE in AssignmentManager#regionOnline
Fix possible NPE in AssignmentManager#regionOnline -- Key: HBASE-4866 URL: https://issues.apache.org/jira/browse/HBASE-4866 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.4 Reporter: Jonathan Hsieh NPE encountered in a user's HMaster logs: {code} 11/11/22 23:45:37 FATAL master.HMaster: Unhandled exception. Starting shutdown. java.lang.NullPointerException at org.apache.hadoop.hbase.master.AssignmentManager.regionOnline(AssignmentManager.java:731) at org.apache.hadoop.hbase.master.AssignmentManager.processFailover(AssignmentManager.java:215) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:422) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:295) {code} From user list: http://mail-archives.apache.org/mod_mbox/hbase-user/20.mbox/%3C4ECC9AFC.6030307%40qualtrics.com%3E
[jira] [Commented] (HBASE-4865) HBaseAdmin addColumn, modifyColumn, deleteColumn are documented as asynchronous but are actually synchronous.
[ https://issues.apache.org/jira/browse/HBASE-4865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156771#comment-13156771 ] Ted Yu commented on HBASE-4865: --- W.r.t. the HBaseAdmin#createTable[Async] methods, see HBASE-3904 and HBASE-3229. We don't need to change their implementation now. > HBaseAdmin addColumn, modifyColumn, deleteColumn are documented as asynchronous but are actually synchronous. > - > > Key: HBASE-4865 > URL: https://issues.apache.org/jira/browse/HBASE-4865 > Project: HBase > Issue Type: Bug > Components: client, master >Affects Versions: 0.94.0 > Environment: all >Reporter: nkeywal >Priority: Minor > > The javadoc states that these methods are asynchronous, but the HMaster implementation does not use executorService; it calls process() directly. This is not true for all methods: enableTable, modifyTable and disableTable are truly asynchronous. > The other impact is that the listeners are not called, as this is done by the executorService. > I don't know whether we should change the documentation or the implementation. For consistency, I would change the implementation, but that may break existing code. > Two other comments: > 1) There is no real naming pattern here, although one would be useful: > HBaseAdmin#createTable is synchronous and calls the asynchronous HMaster#createTable > HBaseAdmin#createTableAsync is asynchronous and calls the asynchronous HMaster#createTable > HBaseAdmin#modifyTable is asynchronous and calls the asynchronous HMaster#modifyTable > HBaseAdmin#modifyColumn is documented as asynchronous and calls the synchronous HMaster#modifyColumn > 2) The coprocessor "post" semantics are not consistent across the services. > - When the service is synchronous, post is called after the service executes (e.g. addColumn with the current implementation).
> - When the service is asynchronous, post is called after the executorService has registered the service for execution, while the service itself has not yet been executed.
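Point 2 can be illustrated with a toy Java model. `PostHookOrderingSketch` and its methods are hypothetical stand-ins, not HMaster or coprocessor APIs. The latch exists only to make the demo deterministic. Without it, "post" is simply not guaranteed to come after "process" in the asynchronous case.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model of pre/process/post ordering for synchronous vs. executor-based operations.
class PostHookOrderingSketch {
  final List<String> events = Collections.synchronizedList(new ArrayList<>());
  final ExecutorService executor = Executors.newSingleThreadExecutor();

  // Synchronous style: the handler runs inline, so "post" observes finished work.
  void synchronousOp() {
    events.add("pre");
    events.add("process");
    events.add("post");
  }

  // Asynchronous style: the handler is only queued when "post" runs.
  // The gate holds the handler back so the ordering is deterministic for the demo.
  Future<?> asynchronousOp(CountDownLatch gate) {
    events.add("pre");
    Future<?> pending = executor.submit(() -> {
      try {
        gate.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      events.add("process");
    });
    events.add("post");
    return pending;
  }
}
```

A coprocessor's post hook can therefore rely on the work having completed only in the synchronous style. In the asynchronous style, it can only rely on the work having been submitted.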
[jira] [Commented] (HBASE-4856) Upgrade zookeeper to 3.4.0 release
[ https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156770#comment-13156770 ] Ted Yu commented on HBASE-4856: --- Integrated to 0.92 and TRUNK after verifying that the 3.4.0 artifacts could be pulled. > Upgrade zookeeper to 3.4.0 release > -- > > Key: HBASE-4856 > URL: https://issues.apache.org/jira/browse/HBASE-4856 > Project: HBase > Issue Type: Task >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 0.92.0 > > Attachments: 4856.txt > > > Zookeeper 3.4.0 has been released. > We should upgrade.
[jira] [Assigned] (HBASE-4862) Split hlog and open region concurrently happend may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-4862: - Assignee: chunhui shen > Split hlog and open region concurrently happend may cause data loss > --- > > Key: HBASE-4862 > URL: https://issues.apache.org/jira/browse/HBASE-4862 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.2 >Reporter: chunhui shen >Assignee: chunhui shen > Fix For: 0.92.0, 0.94.0, 0.90.5 > > Attachments: 4862.patch > > > Case Description: > 1. The split hlog thread creates a writer for the file region A/recovered.edits/123456 and is appending log entries. > 2. The regionserver is now opening region A, and in replayRecoveredEditsIfAny() it deletes the file region A/recovered.edits/123456. > 3. The split hlog thread catches the IOException and stops parsing this log file; if skipError = true, it adds the file to the corrupt logs. However, data for other regions in this log file will be lost. > 4. Or, if skipError = false, it checks the filesystem. Of course, the filesystem is OK, so it only logs an error and continues assigning regions. Therefore, data in other log files will also be lost! > The case may happen in the following scenario: > 1. Move a region from server A to server B. > 2. Kill server A and server B. > 3. Restart server A and server B. > We could prevent this exception by forbidding deletion of a recovered.edits file that the split hlog thread is still appending to.
[jira] [Updated] (HBASE-4862) Split hlog and open region concurrently happend may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4862: -- Fix Version/s: 0.90.5 0.94.0 0.92.0
[jira] [Updated] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4862: -- Summary: Splitting hlog and opening region concurrently may cause data loss (was: Split hlog and open region concurrently happend may cause data loss)
[jira] [Commented] (HBASE-4862) Split hlog and open region concurrently happend may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156765#comment-13156765 ] Ted Yu commented on HBASE-4862: --- Nice work. The patch doesn't apply to the 0.90 branch: {code} Hunk #4 succeeded at 783 (offset -332 lines). 1 out of 4 hunks FAILED -- saving rejects to file src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java.rej ... patch unexpectedly ends in middle of line 2 out of 2 hunks ignored -- saving rejects to file src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java.rej {code} Please rebase your patch for 0.90. A separate patch for TRUNK would be helpful so that HadoopQA can run the test suite. Comments about the changes: getTmpRecoveredEditsFileName() is only used once and there is no javadoc for it. Maybe we don't need to create the method; just append ".tmp" directly to the filename. {code} +// Convert file name ends with .tmp, so ensure region's replayRecoveredEdits {code} The beginning of the above should read 'Append filename with '.tmp' to ensure' > Split hlog and open region concurrently happend may cause data loss > --- > > Key: HBASE-4862 > URL: https://issues.apache.org/jira/browse/HBASE-4862 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.2 >Reporter: chunhui shen > Attachments: 4862.patch > > > Case Description: > 1.The split hlog thread creates a writer for the file region A/recovered.edits/123456 > and is appending log entries > 2.The regionserver is now opening region A, and in > replayRecoveredEditsIfAny() it deletes the file region > A/recovered.edits/123456 > 3.The split hlog thread catches the IO exception and stops parsing this log file, > and if skipError = true, adds it to the corrupt logs. However, data for > other regions in this log file will be lost > 4.Or, if skipError = false, it checks the filesystem. Of course, the file > system is OK, so it only prints an error log and continues assigning regions.
> Therefore, data in other log files will also be lost! > This case may happen in the following scenario: > 1.Move region from server A to server B > 2.Kill server A and server B > 3.Restart server A and server B > We could prevent this exception by forbidding deletion of a recovered.edits > file > that is still being appended to by the split hlog thread
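Ted's suggestion of appending ".tmp" to the file name can be sketched as below. This is a minimal, hypothetical illustration using plain `java.nio.file` calls on a local directory, not the code in 4862.patch: the class and method names (`RecoveredEditsSketch`, `writeEdits`, `filesToReplay`) are made up, and it assumes the replay side simply skips any file whose name ends in ".tmp", so an in-progress split output is never picked up (and deleted) by a region that opens concurrently.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a recovered.edits directory; not HBase's HLogSplitter code.
class RecoveredEditsSketch {
    static final String TMP_SUFFIX = ".tmp";

    // Splitter side: write every entry under "<name>.tmp", then rename to the
    // final name only once the file is complete.
    static Path writeEdits(Path dir, String name, List<String> entries) throws IOException {
        Path tmp = dir.resolve(name + TMP_SUFFIX);
        Files.write(tmp, entries, StandardCharsets.UTF_8);
        return Files.move(tmp, dir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    // Region-open side: only files without the ".tmp" suffix are replayed (and
    // may later be deleted); in-progress files are left to the splitter.
    static List<String> filesToReplay(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                String n = p.getFileName().toString();
                if (!n.endsWith(TMP_SUFFIX)) {
                    names.add(n);
                }
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

Because the final name only appears after the rename, a region opening concurrently sees either no file or a complete one. On HDFS the corresponding step would be a `FileSystem.rename` rather than a local `Files.move`.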
[jira] [Commented] (HBASE-4864) TestMasterObserver#testRegionTransitionOperations occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156753#comment-13156753 ] Ted Yu commented on HBASE-4864: --- Integrated into 0.92 and TRUNK. Thanks for the patch, Jinchao. > TestMasterObserver#testRegionTransitionOperations occasionally fails > > > Key: HBASE-4864 > URL: https://issues.apache.org/jira/browse/HBASE-4864 > Project: HBase > Issue Type: Test > Components: test >Reporter: gaojinchao >Assignee: gaojinchao >Priority: Minor > Fix For: 0.92.0, 0.94.0 > > Attachments: HBASE-4864_Branch92.patch > > > Look at these logs: > https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/ > It seems that we should wait until the region is added to the online region set. > I made a patch, please review.
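The fix proposed in the description, waiting until the region shows up in the online region set before asserting, amounts to a bounded poll. A minimal sketch, assuming a hypothetical helper `waitFor` that takes the condition as a `BooleanSupplier`; the actual patch may wait differently:

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper; not the code from HBASE-4864_Branch92.patch.
class WaitForRegionSketch {
    // Polls the condition every intervalMs until it holds or timeoutMs elapses.
    // Returns true if the condition became true in time, false on timeout.
    static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            Thread.sleep(intervalMs);
        }
        return true;
    }
}
```

In the test, the condition would check the region server's online regions for the region under test; the timeout keeps a genuinely broken transition from hanging the suite.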
[jira] [Updated] (HBASE-4864) TestMasterObserver#testRegionTransitionOperations occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4864: -- Resolution: Fixed Status: Resolved (was: Patch Available) > TestMasterObserver#testRegionTransitionOperations occasionally fails > > > Key: HBASE-4864 > URL: https://issues.apache.org/jira/browse/HBASE-4864 > Project: HBase > Issue Type: Test > Components: test >Reporter: gaojinchao >Assignee: gaojinchao >Priority: Minor > Fix For: 0.92.0, 0.94.0 > > Attachments: HBASE-4864_Branch92.patch > > > Look at these logs: > https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/ > It seems that we should wait until the region is added to the online region set. > I made a patch, please review.
[jira] [Updated] (HBASE-4864) testRegionTransitionOperations occasional failures
[ https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4864: -- Issue Type: Test (was: Bug) Hadoop Flags: Reviewed > testRegionTransitionOperations occasional failures > -- > > Key: HBASE-4864 > URL: https://issues.apache.org/jira/browse/HBASE-4864 > Project: HBase > Issue Type: Test > Components: test >Reporter: gaojinchao >Assignee: gaojinchao >Priority: Minor > Fix For: 0.92.0, 0.94.0 > > Attachments: HBASE-4864_Branch92.patch > > > Look at these logs: > https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/ > It seems that we should wait until the region is added to the online region set. > I made a patch, please review.
[jira] [Updated] (HBASE-4864) TestMasterObserver#testRegionTransitionOperations occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4864: -- Summary: TestMasterObserver#testRegionTransitionOperations occasionally fails (was: testRegionTransitionOperations occasional failures) > TestMasterObserver#testRegionTransitionOperations occasionally fails > > > Key: HBASE-4864 > URL: https://issues.apache.org/jira/browse/HBASE-4864 > Project: HBase > Issue Type: Test > Components: test >Reporter: gaojinchao >Assignee: gaojinchao >Priority: Minor > Fix For: 0.92.0, 0.94.0 > > Attachments: HBASE-4864_Branch92.patch > > > Look at these logs: > https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/ > It seems that we should wait until the region is added to the online region set. > I made a patch, please review.
[jira] [Assigned] (HBASE-4864) testRegionTransitionOperations occasional failures
[ https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-4864: - Assignee: gaojinchao > testRegionTransitionOperations occasional failures > -- > > Key: HBASE-4864 > URL: https://issues.apache.org/jira/browse/HBASE-4864 > Project: HBase > Issue Type: Bug > Components: test >Reporter: gaojinchao >Assignee: gaojinchao >Priority: Minor > Fix For: 0.92.0, 0.94.0 > > Attachments: HBASE-4864_Branch92.patch > > > Look at these logs: > https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/ > It seems that we should wait until the region is added to the online region set. > I made a patch, please review.
[jira] [Commented] (HBASE-4855) SplitLogManager hangs on cluster restart.
[ https://issues.apache.org/jira/browse/HBASE-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156732#comment-13156732 ] ramkrishna.s.vasudevan commented on HBASE-4855: --- batch.installed was being incremented twice. I will upload the patch for review shortly. I will post the test results tomorrow morning, as the run takes time. > SplitLogManager hangs on cluster restart. > -- > > Key: HBASE-4855 > URL: https://issues.apache.org/jira/browse/HBASE-4855 > Project: HBase > Issue Type: Bug >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > > Start a master and an RS. > The RS goes down (kill -9). > Wait for ServerShutDownHandler to create the splitlog nodes. As no RS is > available, they cannot be processed. > Restart the master and bring up an RS. > The master hangs in SplitLogManager.waitforTasks(). > I suspect that batch.done is not being incremented properly. I have not yet dug > into it fully. > This may be the reason for the occasional failure of > TestDistributedLogSplitting.testWorkerAbort().
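Why a double increment produces a hang rather than an error becomes clear in a simplified model of a task batch: the waiter only returns once `done` catches up with `installed`, so a single extra `installed` count keeps it waiting indefinitely. This is a hypothetical sketch (`TaskBatchSketch`, `install`, `complete` and `awaitAll` are illustrative names); the real SplitLogManager code is more involved, and the timeout is added here only so the hang can be observed.

```java
// Simplified, hypothetical model of a split-log task batch; not SplitLogManager's code.
class TaskBatchSketch {
    private int installed = 0;
    private int done = 0;

    // Called once per task created in ZooKeeper.
    synchronized void install() {
        installed++;
    }

    // Called once per task that finishes; wakes up any waiter.
    synchronized void complete() {
        done++;
        notifyAll();
    }

    // Waits until done == installed; returns false if the timeout expires first.
    synchronized boolean awaitAll(long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (done < installed) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            wait(remaining);
        }
        return true;
    }
}
```

With each task installed exactly once, two completions release the waiter; if one task is counted twice, `done` stays one short and the wait never succeeds.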
[jira] [Created] (HBASE-4865) HBaseAdmin addColumn, modifyColumn, deleteColumn are documented as asynchronous but are actually synchronous.
HBaseAdmin addColumn, modifyColumn, deleteColumn are documented as asynchronous but are actually synchronous. - Key: HBASE-4865 URL: https://issues.apache.org/jira/browse/HBASE-4865 Project: HBase Issue Type: Bug Components: client, master Affects Versions: 0.94.0 Environment: all Reporter: nkeywal Priority: Minor The javadoc states that these methods are asynchronous, but the HMaster implementation shows that they do not use the executorService and instead call process() directly. This does not apply to all methods: enableTable, modifyTable and disableTable are truly asynchronous. Another consequence is that the listeners are not called, since that is done by the executorService. I don't know whether we should change the documentation or the implementation. For consistency, I would change the implementation, but that may break existing code. Two other comments: 1) There is no real naming pattern here, although one would be useful: HBaseAdmin#createTable is synchronous and calls the asynchronous HMaster#createTable; HBaseAdmin#createTableAsync is asynchronous and calls the asynchronous HMaster#createTable; HBaseAdmin#modifyTable is asynchronous and calls the asynchronous HMaster#modifyTable; HBaseAdmin#modifyColumn is documented as asynchronous and calls the synchronous HMaster#modifyColumn. 2) The coprocessor "post" semantics are not consistent across the services: - when the service is synchronous, post is called after the service has executed (e.g. addColumn with the current implementation); - when the service is asynchronous, post is called after the executorService has registered the service for execution, but the service itself has not yet been executed.
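The two dispatch styles, and why the coprocessor "post" hook observes different states, can be sketched with a plain `ExecutorService`. This is a hypothetical model: `Handler`, `dispatchSync`, `dispatchAsync` and `demo` are illustrative names and do not correspond to real HMaster methods. The demo holds the queued handler on a latch so that the asynchronous ordering is deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model of the dispatch styles described in HBASE-4865.
class DispatchSketch {
    interface Handler { void process(); }

    // Synchronous style: process() runs on the caller's thread, then the post hook.
    static void dispatchSync(Handler handler, List<String> events) {
        handler.process();
        events.add("post");
    }

    // Asynchronous style: the handler is only queued, so the post hook may run
    // before the handler has done anything.
    static Future<?> dispatchAsync(ExecutorService executor, Handler handler, List<String> events) {
        Future<?> pending = executor.submit(() -> handler.process());
        events.add("post");
        return pending;
    }

    // Returns the recorded event order for the sync case and the async case.
    static List<List<String>> demo() throws Exception {
        List<String> syncEvents = Collections.synchronizedList(new ArrayList<>());
        dispatchSync(() -> syncEvents.add("process"), syncEvents);

        List<String> asyncEvents = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch release = new CountDownLatch(1);
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<?> pending = dispatchAsync(executor, () -> {
                try {
                    release.await(); // hold the handler until dispatch has returned
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                asyncEvents.add("process");
            }, asyncEvents);
            release.countDown();
            pending.get(); // wait for the queued handler to finish
        } finally {
            executor.shutdown();
        }
        return Arrays.asList(syncEvents, asyncEvents);
    }
}
```

In the synchronous path the recorded order is process then post; in the asynchronous path post comes first, which is the inconsistency described in point 2.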
[jira] [Commented] (HBASE-4861) Fix some misspells and extraneous characters in logs; set some to TRACE
[ https://issues.apache.org/jira/browse/HBASE-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156721#comment-13156721 ] Hudson commented on HBASE-4861: --- Integrated in HBase-TRUNK-security #8 (See [https://builds.apache.org/job/HBase-TRUNK-security/8/]) HBASE-4861 Fix some misspells and extraneous characters in logs; set some to TRACE stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/MetaEditor.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/SplitRegionHandler.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/SplitTransaction.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java > Fix some misspells and extraneous characters in logs; set some to TRACE > --- > > Key: HBASE-4861 > URL: https://issues.apache.org/jira/browse/HBASE-4861 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: stack > Fix For: 0.92.0 > > Attachments: 4861.txt > > > Some small clean up in logs.
[jira] [Commented] (HBASE-4864) testRegionTransitionOperations occasional failures
[ https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156710#comment-13156710 ] Hadoop QA commented on HBASE-4864: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12505006/HBASE-4864_Branch92.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 javadoc. The javadoc tool appears to have generated -162 warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 66 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestInstantSchemaChange org.apache.hadoop.hbase.client.TestAdmin Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/362//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/362//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/362//console This message is automatically generated. > testRegionTransitionOperations occasional failures > -- > > Key: HBASE-4864 > URL: https://issues.apache.org/jira/browse/HBASE-4864 > Project: HBase > Issue Type: Bug > Components: test >Reporter: gaojinchao >Priority: Minor > Fix For: 0.92.0, 0.94.0 > > Attachments: HBASE-4864_Branch92.patch > > > Look at these logs: > https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/ > It seems that we should wait until the region is added to the online region set. > I made a patch, please review.
[jira] [Commented] (HBASE-4789) On split, parent region is sticking around in oldest sequenceid to region map though not online; we don't cleanup WALs.
[ https://issues.apache.org/jira/browse/HBASE-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156654#comment-13156654 ] Hudson commented on HBASE-4789: --- Integrated in HBase-TRUNK #2477 (See [https://builds.apache.org/job/HBase-TRUNK/2477/]) HBASE-4853 HBASE-4789 does overzealous pruning of seqids HBASE-4853 HBASE-4789 does overzealous pruning of seqids; REVERT TEMPORARILY TO GET TED COMMENT IN HBASE-4853 HBASE-4789 does overzealous pruning of seqids stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java > On split, parent region is sticking around in oldest sequenceid to region map > though not online; we don't cleanup WALs. 
> --- > > Key: HBASE-4789 > URL: https://issues.apache.org/jira/browse/HBASE-4789 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: stack >Priority: Critical > Fix For: 0.92.0 > > Attachments: 4789-v2.txt, 4789-v3.txt, 4789-v4.txt, 4789.txt > > > Here is the log for a particular region: > {code} > 2011-11-15 05:46:31,382 INFO > org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the > master to process the split for 8bbd7388262dc8cb1ce2cf4f04a7281d > 2011-11-15 05:46:31,483 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > regionserver:7003-0x1337b0b92cd000a-0x1337b0b92cd000a Attempting to > transition node 8bbd7388262dc8cb1ce2cf4f04a7281d from RS_ZK_REGION_SPLIT to > RS_ZK_REGION_SPLIT > 2011-11-15 05:46:31,484 INFO > org.apache.hadoop.hbase.regionserver.SplitRequest: Region split, META > updated, and report to master. > Parent=TestTable,0862220095,1321335865649.8bbd7388262dc8cb1ce2cf4f04a7281d., > new regions: TestTable,0862220095,1321335989689.f00c683df3182d8ef33e315f77ca539c., > TestTable,0892568091,1321335989689.a56ca1eff5b4401432fcba04b4e851f8..
Split > took 1sec > 2011-11-15 05:46:37,705 DEBUG org.apache.hadoop.hbase.regionserver.Store: > Compacting > hdfs://sv4r11s38:7000/hbase/TestTable/a56ca1eff5b4401432fcba04b4e851f8/info/9ce16d8fa94e4938964c04775a6fa1a7.8bbd7388262dc8cb1ce2cf4f04a7281d- > hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9ce16d8fa94e4938964c04775a6fa1a7-top, > keycount=717559, bloomtype=NONE, size=711.1m > 2011-11-15 05:46:37,705 DEBUG org.apache.hadoop.hbase.regionserver.Store: > Compacting > hdfs://sv4r11s38:7000/hbase/TestTable/a56ca1eff5b4401432fcba04b4e851f8/info/9213f4d7ee9b4fda857a97603a001f9e.8bbd7388262dc8cb1ce2cf4f04a7281d- > hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9213f4d7ee9b4fda857a97603a001f9e-top, > keycount=416691, bloomtype=NONE, size=412.9m > 2011-11-15 05:46:53,090 DEBUG org.apache.hadoop.hbase.regionserver.Store: > Compacting > hdfs://sv4r11s38:7000/hbase/TestTable/f00c683df3182d8ef33e315f77ca539c/info/9ce16d8fa94e4938964c04775a6fa1a7.8bbd7388262dc8cb1ce2cf4f04a7281d- > hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9ce16d8fa94e4938964c04775a6fa1a7-bottom, > keycount=717559, bloomtype=NONE, size=711.1m > 2011-11-15 05:46:53,090 DEBUG org.apache.hadoop.hbase.regionserver.Store: > Compacting > hdfs://sv4r11s38:7000/hbase/TestTable/f00c683df3182d8ef33e315f77ca539c/info/9213f4d7ee9b4fda857a97603a001f9e.8bbd7388262dc8cb1ce2cf4f04a7281d- > hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9213f4d7ee9b4fda857a97603a001f9e-bottom, > keycount=416691, bloomtype=NONE, size=412.9m > 2011-11-15 05:48:00,690 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: > Found 3 hlogs to remove out of total 12; oldest outstanding sequenceid is > 5699 from region 8bbd7388262dc8cb1ce2cf4f04a7281d > 2011-11-15 05:57:54,083 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: > Too many hlogs: logs=33, maxlogs=32; forcing flush of 1 regions(s): > 
8bbd7388262dc8cb1ce2cf4f04a7281d > 2011-11-15 05:57:54,083 WARN org.apache.hadoop.hbase.regionserver.LogRoller: > Failed to schedule flush of 8bbd7388262dc8cb1ce2cf4f04a7281dr=null, > requester=null > 201
[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids
[ https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156653#comment-13156653 ] Hudson commented on HBASE-4853: --- Integrated in HBase-TRUNK #2477 (See [https://builds.apache.org/job/HBase-TRUNK/2477/]) HBASE-4853 HBASE-4789 does overzealous pruning of seqids HBASE-4853 HBASE-4789 does overzealous pruning of seqids; REVERT TEMPORARILY TO GET TED COMMENT IN HBASE-4853 HBASE-4789 does overzealous pruning of seqids stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java > HBASE-4789 does overzealous pruning of seqids > - > > Key: HBASE-4853 > URL: https://issues.apache.org/jira/browse/HBASE-4853 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: stack >Priority: Critical > Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v10.txt, > 4853-v4.txt, 4853-v5.txt, 4853-v6.txt, 4853-v7.txt, 4853-v8.txt, 4853-v9.txt, > 4853-v9.txt, 4853.txt > > > Working w/ J-D on failing replication test turned up hole in seqids made by > the patch over in hbase-4789. With this patch in place we see lots of > instances of the suspicious: 'Last sequenceid written is empty. 
Deleting all > old hlogs' > At a minimum, these lines need removing: > {code} > diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java > b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java > index 623edbe..a0bbe01 100644 > --- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java > +++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java > @@ -1359,11 +1359,6 @@ public class HLog implements Syncable { >// Cleaning up of lastSeqWritten is in the finally clause because we >// don't want to confuse getOldestOutstandingSeqNum() >this.lastSeqWritten.remove(getSnapshotName(encodedRegionName)); > - Long l = this.lastSeqWritten.remove(encodedRegionName); > - if (l != null) { > -LOG.warn("Why is there a raw encodedRegionName in lastSeqWritten? > name=" + > - Bytes.toString(encodedRegionName) + ", seqid=" + l); > - } >this.cacheFlushLock.unlock(); > } >} > {code} > ... but above is no good w/o figuring why WALs are not being rotated off.
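The rule that decides which rolled WALs can be archived fits in a few lines, and modelling it shows both failure modes discussed in HBASE-4789 and HBASE-4853: a split parent that is never removed from the per-region map pins every later log, while pruning entries too eagerly (down to an empty map, as in "Last sequenceid written is empty. Deleting all old hlogs") makes every log look removable. This is a simplified, hypothetical sketch; `OldLogCleanupSketch`, `rolledLogs` and `removableLogs` are illustrative names, and only `lastSeqWritten` echoes the field in the diff above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Simplified, hypothetical model of WAL cleanup; does not reproduce HLog.
class OldLogCleanupSketch {
    // encoded region name -> oldest sequence id not yet flushed for that region
    final Map<String, Long> lastSeqWritten = new HashMap<>();
    // highest sequence id contained in each rolled log -> log name
    final SortedMap<Long, String> rolledLogs = new TreeMap<>();

    // A rolled log may be archived only if all of its edits are older than the
    // oldest unflushed edit of every region still tracked in lastSeqWritten.
    List<String> removableLogs() {
        if (lastSeqWritten.isEmpty()) {
            // Nothing outstanding: every rolled log looks safe to delete.
            return new ArrayList<>(rolledLogs.values());
        }
        long oldestOutstanding = Collections.min(lastSeqWritten.values());
        // headMap is exclusive, so only logs whose highest seqid is strictly
        // below the oldest outstanding seqid are returned.
        return new ArrayList<>(rolledLogs.headMap(oldestOutstanding).values());
    }
}
```

For example, if a split parent's entry is left at seqid 5699, a log whose highest seqid is 6000 is retained even after both daughters have flushed past it; once the parent's entry is removed, that log becomes removable.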
[jira] [Updated] (HBASE-4864) testRegionTransitionOperations occasional failures
[ https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-4864: -- Status: Patch Available (was: Open) > testRegionTransitionOperations occasional failures > -- > > Key: HBASE-4864 > URL: https://issues.apache.org/jira/browse/HBASE-4864 > Project: HBase > Issue Type: Bug > Components: test >Reporter: gaojinchao >Priority: Minor > Fix For: 0.92.0, 0.94.0 > > Attachments: HBASE-4864_Branch92.patch > > > Look at these logs: > https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/ > It seems that we should wait until the region is added to the online region set. > I made a patch, please review.
[jira] [Commented] (HBASE-4864) testRegionTransitionOperations occasional failures
[ https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156646#comment-13156646 ] Ted Yu commented on HBASE-4864: --- +1 on patch. > testRegionTransitionOperations occasional failures > -- > > Key: HBASE-4864 > URL: https://issues.apache.org/jira/browse/HBASE-4864 > Project: HBase > Issue Type: Bug > Components: test >Reporter: gaojinchao >Priority: Minor > Fix For: 0.92.0, 0.94.0 > > Attachments: HBASE-4864_Branch92.patch > > > Look at these logs: > https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/ > It seems that we should wait until the region is added to the online region set. > I made a patch, please review.
[jira] [Updated] (HBASE-4864) testRegionTransitionOperations occasional failures
[ https://issues.apache.org/jira/browse/HBASE-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaojinchao updated HBASE-4864: -- Attachment: HBASE-4864_Branch92.patch > testRegionTransitionOperations occasional failures > -- > > Key: HBASE-4864 > URL: https://issues.apache.org/jira/browse/HBASE-4864 > Project: HBase > Issue Type: Bug > Components: test >Reporter: gaojinchao >Priority: Minor > Fix For: 0.92.0, 0.94.0 > > Attachments: HBASE-4864_Branch92.patch > > > Look at these logs: > https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/ > It seems that we should wait until the region is added to the online region set. > I made a patch, please review.
[jira] [Created] (HBASE-4864) testRegionTransitionOperations occasional failures
testRegionTransitionOperations occasional failures -- Key: HBASE-4864 URL: https://issues.apache.org/jira/browse/HBASE-4864 Project: HBase Issue Type: Bug Components: test Reporter: gaojinchao Priority: Minor Fix For: 0.92.0, 0.94.0 Look at these logs: https://builds.apache.org/job/HBase-TRUNK-security/ws/trunk/target/surefire-reports/ It seems that we should wait until the region is added to the online region set. I made a patch, please review.
[jira] [Commented] (HBASE-4856) Upgrade zookeeper to 3.4.0 release
[ https://issues.apache.org/jira/browse/HBASE-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156634#comment-13156634 ] Hudson commented on HBASE-4856: --- Integrated in HBase-TRUNK-security #7 (See [https://builds.apache.org/job/HBase-TRUNK-security/7/]) HBASE-4856 Upgrade zookeeper to 3.4.0 release - revert, Apache maven repository not ready HBASE-4856 Upgrade zookeeper to 3.4.0 release tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/pom.xml tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/pom.xml > Upgrade zookeeper to 3.4.0 release > -- > > Key: HBASE-4856 > URL: https://issues.apache.org/jira/browse/HBASE-4856 > Project: HBase > Issue Type: Task >Reporter: Ted Yu >Assignee: Ted Yu > Fix For: 0.92.0 > > Attachments: 4856.txt > > > Zookeeper 3.4.0 has been released. > We should upgrade.
[jira] [Commented] (HBASE-4789) On split, parent region is sticking around in oldest sequenceid to region map though not online; we don't cleanup WALs.
[ https://issues.apache.org/jira/browse/HBASE-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156635#comment-13156635 ] Hudson commented on HBASE-4789: --- Integrated in HBase-TRUNK-security #7 (See [https://builds.apache.org/job/HBase-TRUNK-security/7/]) HBASE-4853 HBASE-4789 does overzealous pruning of seqids HBASE-4853 HBASE-4789 does overzealous pruning of seqids; REVERT TEMPORARILY TO GET TED COMMENT IN HBASE-4853 HBASE-4789 does overzealous pruning of seqids stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java > On split, parent region is sticking around in oldest sequenceid to region map > though not online; we don't cleanup WALs. 
> --- > > Key: HBASE-4789 > URL: https://issues.apache.org/jira/browse/HBASE-4789 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: stack >Priority: Critical > Fix For: 0.92.0 > > Attachments: 4789-v2.txt, 4789-v3.txt, 4789-v4.txt, 4789.txt > > > Here is the log for a particular region: > {code} > 2011-11-15 05:46:31,382 INFO > org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the > master to process the split for 8bbd7388262dc8cb1ce2cf4f04a7281d > 2011-11-15 05:46:31,483 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: > regionserver:7003-0x1337b0b92cd000a-0x1337b0b92cd000a Attempting to > transition node 8bbd7388262dc8cb1ce2cf4f04a7281d from RS_ZK_REGION_SPLIT to > RS_ZK_REGION_SPLIT > 2011-11-15 05:46:31,484 INFO > org.apache.hadoop.hbase.regionserver.SplitRequest: Region split, META > updated, and report to master. > Parent=TestTable,0862220095,1321335865649.8bbd7388262dc8cb1ce2cf4f04a7281d., > new regions: TestTable,0862220095,1321335989689.f00c683df3182d8ef33e315f77ca539c., > TestTable,0892568091,1321335989689.a56ca1eff5b4401432fcba04b4e851f8..
Split > took 1sec > 2011-11-15 05:46:37,705 DEBUG org.apache.hadoop.hbase.regionserver.Store: > Compacting > hdfs://sv4r11s38:7000/hbase/TestTable/a56ca1eff5b4401432fcba04b4e851f8/info/9ce16d8fa94e4938964c04775a6fa1a7.8bbd7388262dc8cb1ce2cf4f04a7281d- > hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9ce16d8fa94e4938964c04775a6fa1a7-top, > keycount=717559, bloomtype=NONE, size=711.1m > 2011-11-15 05:46:37,705 DEBUG org.apache.hadoop.hbase.regionserver.Store: > Compacting > hdfs://sv4r11s38:7000/hbase/TestTable/a56ca1eff5b4401432fcba04b4e851f8/info/9213f4d7ee9b4fda857a97603a001f9e.8bbd7388262dc8cb1ce2cf4f04a7281d- > hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9213f4d7ee9b4fda857a97603a001f9e-top, > keycount=416691, bloomtype=NONE, size=412.9m > 2011-11-15 05:46:53,090 DEBUG org.apache.hadoop.hbase.regionserver.Store: > Compacting > hdfs://sv4r11s38:7000/hbase/TestTable/f00c683df3182d8ef33e315f77ca539c/info/9ce16d8fa94e4938964c04775a6fa1a7.8bbd7388262dc8cb1ce2cf4f04a7281d- > hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9ce16d8fa94e4938964c04775a6fa1a7-bottom, > keycount=717559, bloomtype=NONE, size=711.1m > 2011-11-15 05:46:53,090 DEBUG org.apache.hadoop.hbase.regionserver.Store: > Compacting > hdfs://sv4r11s38:7000/hbase/TestTable/f00c683df3182d8ef33e315f77ca539c/info/9213f4d7ee9b4fda857a97603a001f9e.8bbd7388262dc8cb1ce2cf4f04a7281d- > hdfs://sv4r11s38:7000/hbase/TestTable/8bbd7388262dc8cb1ce2cf4f04a7281d/info/9213f4d7ee9b4fda857a97603a001f9e-bottom, > keycount=416691, bloomtype=NONE, size=412.9m > 2011-11-15 05:48:00,690 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: > Found 3 hlogs to remove out of total 12; oldest outstanding sequenceid is > 5699 from region 8bbd7388262dc8cb1ce2cf4f04a7281d > 2011-11-15 05:57:54,083 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: > Too many hlogs: logs=33, maxlogs=32; forcing flush of 1 regions(s): > 
8bbd7388262dc8cb1ce2cf4f04a7281d > 2011-11-15 05:57:54,083 WARN org.apache.hadoop.hbase.regionserver.LogRoller: > Failed to schedule flush of 8bbd7388262dc8cb1ce2cf4f04a7281dr=null, > requeste
[jira] [Commented] (HBASE-4772) Utility to Create StoreFiles
[ https://issues.apache.org/jira/browse/HBASE-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156633#comment-13156633 ] Hudson commented on HBASE-4772: --- Integrated in HBase-TRUNK-security #7 (See [https://builds.apache.org/job/HBase-TRUNK-security/7/]) HBASE-4772 Utility to Create StoreFiles karthik : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/CreateRandomStoreFile.java > Utility to Create StoreFiles > > > Key: HBASE-4772 > URL: https://issues.apache.org/jira/browse/HBASE-4772 > Project: HBase > Issue Type: Test >Affects Versions: 0.94.0 >Reporter: Nicolas Spiegelberg >Assignee: Mikhail Bautin >Priority: Minor > Fix For: 0.94.0 > > Attachments: HBASE-4772-B.patch, HBASE-4772.patch > > > Add a tool to create a StoreFile with the specified number of key/value > pairs, with the specified compression and Bloom filter type. This is useful > for creating HFileV1 & HFileV2 store files for testing.
[jira] [Commented] (HBASE-4857) Recursive loop on KeeperException in AuthenticationTokenSecretManager/ZKLeaderManager
[ https://issues.apache.org/jira/browse/HBASE-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156636#comment-13156636 ] Hudson commented on HBASE-4857: --- Integrated in HBase-TRUNK-security #7 (See [https://builds.apache.org/job/HBase-TRUNK-security/7/]) HBASE-4857 Recursive loop on KeeperException in AuthenticationTokenSecretManager garyh : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/security/src/main/java/org/apache/hadoop/hbase/security/token/AuthenticationTokenSecretManager.java > Recursive loop on KeeperException in > AuthenticationTokenSecretManager/ZKLeaderManager > - > > Key: HBASE-4857 > URL: https://issues.apache.org/jira/browse/HBASE-4857 > Project: HBase > Issue Type: Bug > Components: security >Affects Versions: 0.92.0, 0.94.0 >Reporter: Gary Helmling >Assignee: Gary Helmling >Priority: Critical > Fix For: 0.92.0 > > Attachments: HBASE-4857.patch > > > Looking through stack traces for {{TestMasterFailover}}, I see a case where > the leader {{AuthenticationTokenSecretManager}} can get into a recursive loop > when a {{KeeperException}} is encountered: > {noformat} > "Thread-1-EventThread" daemon prio=10 tid=0x7f9fb47b2800 nid=0x77f6 > waiting on condition [0x7f9fab376000] >java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at java.lang.Thread.sleep(Thread.java:302) > at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328) > at > org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:55) > at > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:206) > at > org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:891) > at > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.createBaseZNodes(ZooKeeperWatcher.java:161) > at > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:154) > at >
org.apache.hadoop.hbase.master.HMaster.tryRecoveringExpiredZKSession(HMaster.java:1397) > at org.apache.hadoop.hbase.master.HMaster.abortNow(HMaster.java:1435) > at org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:1374) > at > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.abort(ZooKeeperWatcher.java:450) > at > org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:166) > at > org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293) > at > org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167) > at > org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293) > at > org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.stepDownAsLeader(ZKLeaderManager.java:167) > at > org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager$LeaderElector.stop(AuthenticationTokenSecretManager.java:293) > at > org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.handleLeaderChange(ZKLeaderManager.java:96) > at > org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.nodeDeleted(ZKLeaderManager.java:78) > at > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:286) > at > org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497) > {noformat} > The {{KeeperException}} causes {{ZKLeaderManager}} to call > {{AuthenticationTokenSecretManager$LeaderElector.stop()}}, which calls > {{ZKLeaderManager.stepDownAsLeader()}}, which will encounter another > {{KeeperException}}, and so on...
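The cycle described above (`stop()` calls `stepDownAsLeader()`, which fails and calls `stop()` again) can be modeled in a few lines. This is a hypothetical sketch, not the HBASE-4857 patch: the class name, the simulated exception standing in for `KeeperException`, and the `AtomicBoolean` guard are all assumptions, used only to show one way of making `stop()` idempotent so that a failure while stepping down cannot re-enter it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, NOT the HBASE-4857 patch: a leader whose step-down can
// fail, where the failure handler calls back into stop().
class LeaderElectorSketch {
    // Set exactly once; any later (re-entrant) call to stop() returns immediately.
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    private int stepDownAttempts = 0;

    /** Simulates stepDownAsLeader() hitting a ZooKeeper error on every attempt. */
    private void stepDownAsLeader() {
        stepDownAttempts++;
        try {
            throw new IllegalStateException("simulated KeeperException");
        } catch (IllegalStateException e) {
            // Mirrors the reported trace: the error path calls stop() again.
            stop();
        }
    }

    void stop() {
        // compareAndSet lets only the first caller proceed, so the
        // stop() -> stepDownAsLeader() -> stop() cycle ends after one round.
        if (!stopped.compareAndSet(false, true)) {
            return;
        }
        stepDownAsLeader();
    }

    int getStepDownAttempts() {
        return stepDownAttempts;
    }
}
```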
[jira] [Commented] (HBASE-4783) Improve RowCounter to count rows in a specific key range.
[ https://issues.apache.org/jira/browse/HBASE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156632#comment-13156632 ] Hudson commented on HBASE-4783: --- Integrated in HBase-TRUNK-security #7 (See [https://builds.apache.org/job/HBase-TRUNK-security/7/]) HBASE-4783 Improve RowCounter to count rows in a specific key range. nspiegelberg : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java > Improve RowCounter to count rows in a specific key range. > - > > Key: HBASE-4783 > URL: https://issues.apache.org/jira/browse/HBASE-4783 > Project: HBase > Issue Type: Improvement >Reporter: Nicolas Spiegelberg >Assignee: Nicolas Spiegelberg >Priority: Trivial > Fix For: 0.94.0 > > Attachments: 4783.txt, HBASE-4783.patch > > > Currently, RowCounter in the MR package is a very simple map-only job that does a > full scan of a table. Enhance the utility to let the user specify a key range > and count the number of rows in this range.
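A minimal sketch of the range semantics, assuming an in-memory `TreeMap` of row keys stands in for the table and that the range follows the HBase `Scan` convention of an inclusive start row and an exclusive stop row. `KeyRangeCounter` and `countRows` are hypothetical names; the real RowCounter is a map-only MapReduce job over a scan, and `String` keys order the same way as HBase's byte-wise row ordering only for ASCII keys.

```java
import java.util.NavigableMap;

// Hypothetical sketch, not RowCounter: counts rows with keys in [startRow, stopRow).
class KeyRangeCounter {
    static <V> long countRows(NavigableMap<String, V> table, String startRow, String stopRow) {
        // An empty or inverted range counts nothing in this sketch.
        if (startRow != null && stopRow != null && startRow.compareTo(stopRow) >= 0) {
            return 0;
        }
        // A null bound means "unbounded" on that side, like a full-table scan.
        NavigableMap<String, V> view = table;
        if (startRow != null) {
            view = view.tailMap(startRow, true);  // inclusive start row
        }
        if (stopRow != null) {
            view = view.headMap(stopRow, false);  // exclusive stop row
        }
        return view.size();
    }
}
```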
[jira] [Commented] (HBASE-4739) Master dying while going to close a region can leave it in transition forever
[ https://issues.apache.org/jira/browse/HBASE-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156628#comment-13156628 ] Hudson commented on HBASE-4739: --- Integrated in HBase-TRUNK-security #7 (See [https://builds.apache.org/job/HBase-TRUNK-security/7/]) HBASE-4739 Master dying while going to close a region can leave it in transition forever (Gao Jinchao) tedyu : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/RegionTransitionData.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/UnAssignCallable.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java > Master dying while going to close a region can leave it in transition forever > - > > Key: HBASE-4739 > URL: https://issues.apache.org/jira/browse/HBASE-4739 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.4 >Reporter: Jean-Daniel Cryans >Assignee: gaojinchao >Priority: Minor > Fix For: 0.92.0, 0.94.0 > > Attachments: 4739_trial2.patch, 4739_trialV3.patch, > HBASE-4739_Branch092.patch, HBASE-4739_Trunk.patch, > HBASE-4739_Trunk_V2.patch, HBASE-4739_V7.patch, HBASE-4739_trail5.patch, > HBASE-4739_trial.patch, HBASE-4739_trial6.patch > > > I saw this in the aftermath of HBASE-4729 on a 0.92 refreshed yesterday, when > the master died it had just created the RIT znode for a region but didn't > tell the RS to close it yet. > When the master restarted it saw the znode and started printing this: > {quote} > 2011-11-03 00:02:49,130 INFO > org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed > out: TestTable,0007560564,1320253568406.f76899564cabe7e9857c3aeb526ec9dc. 
> state=CLOSING, ts=1320253605285, server=sv4r11s38,62003,1320195046948 > 2011-11-03 00:02:49,130 INFO > org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSING for > too long, this should eventually complete or the server will expire, doing > nothing > {quote} > It's never going to happen, and it's blocking balancing. > I'm marking this as minor since I believe this situation is pretty rare > unless you hit other bugs while trying out stuff to root bugs out.
[jira] [Commented] (HBASE-4787) Make corePool as a configurable parameter in HTable
[ https://issues.apache.org/jira/browse/HBASE-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156627#comment-13156627 ] Hudson commented on HBASE-4787: --- Integrated in HBase-TRUNK-security #7 (See [https://builds.apache.org/job/HBase-TRUNK-security/7/]) HBASE-4787 Rename HTable thread pool nspiegelberg : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java > Make corePool as a configurable parameter in HTable > --- > > Key: HBASE-4787 > URL: https://issues.apache.org/jira/browse/HBASE-4787 > Project: HBase > Issue Type: Improvement >Reporter: Nicolas Spiegelberg >Priority: Trivial > Fix For: 0.94.0 > > Attachments: HBASE-4787.patch > > > Make the corePool a configurable parameter in HTable, so we can tune this > parameter in the config file. While at it, change the core pool name so we > can distinguish it from other AppServer pools.
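A minimal sketch of the idea in plain `java.util.concurrent`, assuming a `Properties` object stands in for the HBase configuration. The key `example.htable.threads.core`, its default of 1, and the `htable-pool-` thread-name prefix are hypothetical and are not the names used in the HBASE-4787 patch. The core size comes from configuration, and a custom `ThreadFactory` gives the pool's threads a distinctive name so they stand out from other pools in a thread dump.

```java
import java.util.Properties;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not HTable: a pool whose core size is read from configuration.
class ConfigurablePoolSketch {
    // Hypothetical key and default; NOT real HBase configuration names.
    static final String CORE_POOL_KEY = "example.htable.threads.core";
    static final int DEFAULT_CORE_POOL = 1;

    static ThreadPoolExecutor createPool(Properties conf) {
        int core = Integer.parseInt(
                conf.getProperty(CORE_POOL_KEY, String.valueOf(DEFAULT_CORE_POOL)));
        AtomicInteger seq = new AtomicInteger();
        // A distinctive name prefix makes these threads easy to tell apart
        // from other application-server pools in a thread dump.
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, "htable-pool-" + seq.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
        // Fixed-size pool: core == max, with an unbounded work queue.
        return new ThreadPoolExecutor(core, core, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(), factory);
    }
}
```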
[jira] [Commented] (HBASE-4785) Improve recovery time of HBase client when a region server dies.
[ https://issues.apache.org/jira/browse/HBASE-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156629#comment-13156629 ] Hudson commented on HBASE-4785: --- Integrated in HBase-TRUNK-security #7 (See [https://builds.apache.org/job/HBase-TRUNK-security/7/]) HBASE-4785 Improve recovery time of HBase client when a region server dies. nspiegelberg : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/SoftValueSortedMap.java > Improve recovery time of HBase client when a region server dies. > > > Key: HBASE-4785 > URL: https://issues.apache.org/jira/browse/HBASE-4785 > Project: HBase > Issue Type: Improvement >Reporter: Nicolas Spiegelberg >Assignee: Nicolas Spiegelberg >Priority: Minor > Fix For: 0.92.0 > > Attachments: HBASE-4785.patch, HBASE-4785.patch > > > When a region server dies, the HBase client waits until the RPC times out > before learning that it needs to check META to find the new location of the > region. And it incurs this *timeout* cost for every region being served by > the dead region server. Remove this overhead by clearing the entries in the cache > that have the dead region server as their values.
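A minimal sketch of the eviction described above, assuming a cache keyed by region start key whose values are server names. The patch itself touches `HConnectionManager` and `SoftValueSortedMap`; this sketch uses a plain `ConcurrentSkipListMap` instead, and `RegionLocationCacheSketch` and `evictServer` are hypothetical names. One pass over the map drops every location that points at the dead server, so later lookups for those regions go back to META instead of each one waiting out an RPC timeout.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch, not HConnectionManager: a client-side cache mapping
// region start keys to the name of the server currently hosting each region.
class RegionLocationCacheSketch {
    private final ConcurrentSkipListMap<String, String> locations = new ConcurrentSkipListMap<>();

    void cacheLocation(String regionStartKey, String serverName) {
        locations.put(regionStartKey, serverName);
    }

    String getCachedServer(String regionStartKey) {
        return locations.get(regionStartKey);
    }

    /** Removes every cached location hosted on deadServer and returns how many were removed. */
    int evictServer(String deadServer) {
        int removed = 0;
        Iterator<Map.Entry<String, String>> it = locations.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue().equals(deadServer)) {
                it.remove(); // ConcurrentSkipListMap iterators support remove()
                removed++;
            }
        }
        return removed;
    }
}
```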
[jira] [Commented] (HBASE-4308) Race between RegionOpenedHandler and AssignmentManager
[ https://issues.apache.org/jira/browse/HBASE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156630#comment-13156630 ] Hudson commented on HBASE-4308: --- Integrated in HBase-TRUNK-security #7 (See [https://builds.apache.org/job/HBase-TRUNK-security/7/]) HBASE-4308 Race between RegionOpenedHandler and AssignmentManager(Ram) ramkrishna : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java > Race between RegionOpenedHandler and AssignmentManager > -- > > Key: HBASE-4308 > URL: https://issues.apache.org/jira/browse/HBASE-4308 > Project: HBase > Issue Type: Bug >Affects Versions: 0.92.0 >Reporter: Todd Lipcon >Assignee: ramkrishna.s.vasudevan > Fix For: 0.92.0 > > Attachments: HBASE-4308.patch, HBASE-4308_1.patch, HBASE-4308_2.patch > > > When the master is processing a ZK event for REGION_OPENED, it calls delete() > on the znode before it removes the node from RegionsInTransition. If the > notification of that delete comes back into AssignmentManager before the > region is removed from RIT, you see an error like: > 2011-08-30 17:43:29,537 WARN [main-EventThread] > master.AssignmentManager(861): Node deleted but still in RIT: > .META.,,1.1028785192 state=OPEN, ts=1314751409532, > server=todd-w510,55655,1314751396840 > Not certain if it causes issues, but it's a concerning log message.
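The race can be reproduced in a toy model where the delete notification is delivered synchronously, which is the worst case for the ordering described above. Everything here is hypothetical (the class, its methods, and reordering as a remedy); it is not the `AssignmentManager` or `OpenedRegionHandler` code and does not claim to be what the HBASE-4308 patch does. It only shows that clearing the regions-in-transition entry before deleting the znode means the deletion callback never observes a stale entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical toy model of the HBASE-4308 ordering race.
class RitOrderingSketch {
    final ConcurrentHashMap<String, String> regionsInTransition = new ConcurrentHashMap<>();
    final List<String> warnings = new ArrayList<>();

    // Stand-in for the nodeDeleted() watcher callback. It fires synchronously here,
    // i.e. the notification always "comes back" before the caller continues.
    private void onNodeDeleted(String region) {
        if (regionsInTransition.containsKey(region)) {
            warnings.add("Node deleted but still in RIT: " + region);
        }
    }

    private void deleteZnode(String region) {
        onNodeDeleted(region);
    }

    /** Order reported in the issue: delete the znode, then clean up RIT. */
    void handleOpenedDeleteFirst(String region) {
        deleteZnode(region);
        regionsInTransition.remove(region);
    }

    /** Reordered: clean up RIT first, so the delete notification finds nothing stale. */
    void handleOpenedRemoveFirst(String region) {
        regionsInTransition.remove(region);
        deleteZnode(region);
    }
}
```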
[jira] [Commented] (HBASE-4853) HBASE-4789 does overzealous pruning of seqids
[ https://issues.apache.org/jira/browse/HBASE-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156631#comment-13156631 ] Hudson commented on HBASE-4853: --- Integrated in HBase-TRUNK-security #7 (See [https://builds.apache.org/job/HBase-TRUNK-security/7/]) HBASE-4853 HBASE-4789 does overzealous pruning of seqids HBASE-4853 HBASE-4789 does overzealous pruning of seqids; REVERT TEMPORARILY TO GET TED COMMENT IN HBASE-4853 HBASE-4789 does overzealous pruning of seqids stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java stack : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/TestGlobalMemStoreSize.java > HBASE-4789 does overzealous pruning of seqids > - > > Key: HBASE-4853 > URL: https://issues.apache.org/jira/browse/HBASE-4853 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: stack >Priority: Critical > Attachments: 4853--no-prefix.txt, 4853-trunk.txt, 4853-v10.txt, > 4853-v4.txt, 4853-v5.txt, 4853-v6.txt, 4853-v7.txt, 4853-v8.txt, 4853-v9.txt, > 4853-v9.txt, 4853.txt > > > Working w/ J-D on failing replication test turned up hole in seqids made by > the patch over in hbase-4789. With this patch in place we see lots of > instances of the suspicious: 'Last sequenceid written is empty. 
Deleting all > old hlogs' > At a minimum, these lines need removing: > {code} > diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java > b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java > index 623edbe..a0bbe01 100644 > --- a/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java > +++ b/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java > @@ -1359,11 +1359,6 @@ public class HLog implements Syncable { >// Cleaning up of lastSeqWritten is in the finally clause because we >// don't want to confuse getOldestOutstandingSeqNum() >this.lastSeqWritten.remove(getSnapshotName(encodedRegionName)); > - Long l = this.lastSeqWritten.remove(encodedRegionName); > - if (l != null) { > -LOG.warn("Why is there a raw encodedRegionName in lastSeqWritten? > name=" + > - Bytes.toString(encodedRegionName) + ", seqid=" + l); > - } >this.cacheFlushLock.unlock(); > } >} > {code} > ... but above is no good w/o figuring why WALs are not being rotated off.
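A simplified model of the pruning rule at stake, assuming each region maps to the sequence id of its oldest edit not yet flushed and each rolled log is keyed by the highest sequence id it contains. Under those assumptions a log can be removed only when its highest sequence id is below the oldest outstanding one, and an empty map makes every log eligible; that is presumably the path behind the suspicious "Last sequenceid written is empty" message, and it shows why pruning the map too aggressively is dangerous. `WalPruningSketch` and `logsToRemove` are hypothetical names; this is not the `HLog` code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Simplified, hypothetical model of WAL pruning; not the HLog implementation.
class WalPruningSketch {
    /**
     * @param oldestUnflushedSeqIds region -> sequence id of its oldest edit not yet flushed
     * @param logsByMaxSeqId        highest sequence id in each rolled log -> log name
     * @return logs whose edits are all older than every outstanding (unflushed) edit
     */
    static List<String> logsToRemove(Map<String, Long> oldestUnflushedSeqIds,
                                     NavigableMap<Long, String> logsByMaxSeqId) {
        if (oldestUnflushedSeqIds.isEmpty()) {
            // Nothing outstanding, so every rolled log looks removable. If entries were
            // dropped from the map by mistake, logs still needed would be removed here.
            return new ArrayList<>(logsByMaxSeqId.values());
        }
        long oldestOutstanding = Collections.min(oldestUnflushedSeqIds.values());
        // Keep any log that may hold the oldest outstanding edit or anything newer.
        return new ArrayList<>(logsByMaxSeqId.headMap(oldestOutstanding, false).values());
    }
}
```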
[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test
[ https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-4863: --- Attachment: D531.2.patch mbautin updated the revision "[jira] [HBASE-4863] Make HBase Thrift server more configurable and add a command-line UI test". Reviewers: JIRA, Kannan, tedyu, stack Updating with the most recent version. Posted a stale version at first -- sorry for spam. REVISION DETAIL https://reviews.facebook.net/D531 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java src/main/java/org/apache/hadoop/hbase/util/Threads.java src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServer.java src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServerCmdLine.java src/test/java/org/apache/hadoop/hbase/util/TestThreads.java > Make HBase Thrift server more configurable and add a command-line UI test > - > > Key: HBASE-4863 > URL: https://issues.apache.org/jira/browse/HBASE-4863 > Project: HBase > Issue Type: Improvement >Reporter: Mikhail Bautin >Assignee: Mikhail Bautin > Attachments: D531.1.patch, D531.2.patch > > > This started as an internal hotfix where we found out that the Thrift server > spawned 15000 threads. To bound the thread pool size I added a custom thread > pool server implementation called HBaseThreadPoolServer into HBase codebase, > and made the following parameters configurable from both command line and as > config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. > Under an increasing load, the server creates new threads for every connection > before the pool size reaches minWorkerThreads. After that, the server puts > new connections into the queue and only creates a new thread when the queue > is full. If an attempt to create a new thread fails, the server drops > connection. 
The default TThreadPoolServer would crash in that case, but it > never happened because the thread pool was unbounded, so the server would > hang indefinitely, consume a lot of memory, and cause huge latency spikes on > the client side. > Another part of this fix is refactoring and unit testing of the command-line > part of the Thrift server. The logic there is sufficiently complicated, and > the existing ThriftServer class does not test that part at all. The new > TestThriftServerCmdLine test starts the Thrift server on a random port with > various combinations of options and talks to it through the client API from > another thread.
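The growth policy described above (start threads up to `minWorkerThreads`, then queue, then add threads only when the queue is full, then drop the connection) matches the standard `java.util.concurrent.ThreadPoolExecutor` rules when the executor is given a bounded queue. The sketch below assumes that mapping; it is not `HBaseThreadPoolServer` itself, and `BoundedWorkerPoolSketch` and `tryServe` are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch only: maps the three described settings onto ThreadPoolExecutor.
class BoundedWorkerPoolSketch {
    static ThreadPoolExecutor create(int minWorkerThreads, int maxWorkerThreads, int maxQueuedRequests) {
        // ThreadPoolExecutor's execute() policy:
        //  1. fewer than core threads running -> start a new thread for the task
        //  2. otherwise -> try to enqueue the task
        //  3. queue full and fewer than max threads -> start a new thread
        //  4. otherwise -> reject (AbortPolicy throws RejectedExecutionException)
        return new ThreadPoolExecutor(minWorkerThreads, maxWorkerThreads,
                60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(maxQueuedRequests),
                new ThreadPoolExecutor.AbortPolicy());
    }

    /** Returns false when the connection would be dropped because the pool is saturated. */
    static boolean tryServe(ThreadPoolExecutor pool, Runnable connectionHandler) {
        try {
            pool.execute(connectionHandler);
            return true;
        } catch (RejectedExecutionException e) {
            // A real server would close the client socket here instead of crashing.
            return false;
        }
    }
}
```

With `create(1, 2, 1)` and handlers that block, the first handler starts a thread, the second is queued, the third starts a second thread, and the fourth is rejected.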
[jira] [Updated] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test
[ https://issues.apache.org/jira/browse/HBASE-4863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HBASE-4863: --- Attachment: D531.1.patch mbautin requested code review of "[jira] [HBASE-4863] Make HBase Thrift server more configurable and add a command-line UI test". Reviewers: JIRA, Kannan, tedyu, stack This started as an internal hotfix where we found out that the Thrift server spawned 15000 threads. To bound the thread pool size I added a custom thread pool server implementation called HBaseThreadPoolServer into HBase codebase, and made the following parameters configurable from both command line and as config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. Under an increasing load, the server creates new threads for every connection before the pool size reaches minWorkerThreads. After that, the server puts new connections into the queue and only creates a new thread when the queue is full. If an attempt to create a new thread fails, the server drops connection. The default TThreadPoolServer would crash in that case, but it never happened because the thread pool was unbounded, so the server would hang indefinitely, consume a lot of memory, and cause huge latency spikes on the client side. Another part of this fix is refactoring and unit testing of the command-line part of the Thrift server. The logic there is sufficiently complicated, and the existing ThriftServer class does not test that part at all. The new TestThriftServerCmdLine test starts the Thrift server on a random port with various combinations of options and talks to it through the client API from another thread. TEST PLAN Unit tests, cluster test with a Python Thrift client. I will post an update when I'm done with testing. 
REVISION DETAIL https://reviews.facebook.net/D531 AFFECTED FILES src/main/java/org/apache/hadoop/hbase/thrift/HBaseThreadPoolServer.java src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java src/main/java/org/apache/hadoop/hbase/util/Threads.java src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServer.java src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServerCmdLine.java src/test/java/org/apache/hadoop/hbase/util/TestThreads.java MANAGE HERALD DIFFERENTIAL RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? https://reviews.facebook.net/herald/transcript/1167/ Tip: use the X-Herald-Rules header to filter Herald messages in your client. > Make HBase Thrift server more configurable and add a command-line UI test > - > > Key: HBASE-4863 > URL: https://issues.apache.org/jira/browse/HBASE-4863 > Project: HBase > Issue Type: Improvement >Reporter: Mikhail Bautin >Assignee: Mikhail Bautin > Attachments: D531.1.patch > > > This started as an internal hotfix where we found out that the Thrift server > spawned 15000 threads. To bound the thread pool size I added a custom thread > pool server implementation called HBaseThreadPoolServer into HBase codebase, > and made the following parameters configurable from both command line and as > config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. > Under an increasing load, the server creates new threads for every connection > before the pool size reaches minWorkerThreads. After that, the server puts > new connections into the queue and only creates a new thread when the queue > is full. If an attempt to create a new thread fails, the server drops > connection. The default TThreadPoolServer would crash in that case, but it > never happened because the thread pool was unbounded, so the server would > hang indefinitely, consume a lot of memory, and cause huge latency spikes on > the client side. 
> Another part of this fix is refactoring and unit testing of the command-line > part of the Thrift server. The logic there is sufficiently complicated, and > the existing ThriftServer class does not test that part at all. The new > TestThriftServerCmdLine test starts the Thrift server on a random port with > various combinations of options and talks to it through the client API from > another thread.
[jira] [Created] (HBASE-4863) Make HBase Thrift server more configurable and add a command-line UI test
Make HBase Thrift server more configurable and add a command-line UI test - Key: HBASE-4863 URL: https://issues.apache.org/jira/browse/HBASE-4863 Project: HBase Issue Type: Improvement Reporter: Mikhail Bautin Assignee: Mikhail Bautin This started as an internal hotfix where we found out that the Thrift server spawned 15000 threads. To bound the thread pool size I added a custom thread pool server implementation called HBaseThreadPoolServer into HBase codebase, and made the following parameters configurable from both command line and as config settings: minWorkerThreads, maxWorkerThreads, and maxQueuedRequests. Under an increasing load, the server creates new threads for every connection before the pool size reaches minWorkerThreads. After that, the server puts new connections into the queue and only creates a new thread when the queue is full. If an attempt to create a new thread fails, the server drops connection. The default TThreadPoolServer would crash in that case, but it never happened because the thread pool was unbounded, so the server would hang indefinitely, consume a lot of memory, and cause huge latency spikes on the client side. Another part of this fix is refactoring and unit testing of the command-line part of the Thrift server. The logic there is sufficiently complicated, and the existing ThriftServer class does not test that part at all. The new TestThriftServerCmdLine test starts the Thrift server on a random port with various combinations of options and talks to it through the client API from another thread.