[jira] [Commented] (HBASE-11798) TestBucketWriterThread can zombie test
[ https://issues.apache.org/jira/browse/HBASE-11798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105794#comment-14105794 ]

Alex Newman commented on HBASE-11798:
-------------------------------------

I noticed when trying to commit HBASE-4955 that TestBucketWriterThread can zombie in its setup function. I also noticed the following in the jstack at https://builds.apache.org/job/PreCommit-HBASE-Build/10511/console

    main prio=10 tid=0x7fa9e000a800 nid=0x571 waiting on condition [0x7fa9e6184000]
       java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:143)
        at org.apache.hadoop.hbase.io.hfile.bucket.TestBucketWriterThread.setUp(TestBucketWriterThread.java:78)

Looking at the code:

    this.plainCacheable = Mockito.mock(Cacheable.class);
    bc.cacheBlock(this.plainKey, plainCacheable);
    while (!bc.ramCache.isEmpty()) Threads.sleep(1);  // <- where we hang
    assertTrue(q.isEmpty());
    // Now writer thread should be disabled.

At first I was confused, but then I realized that it isn't Thread.sleep, it is Threads.sleep:

    /**
     * If interrupted, just prints out the interrupt on STDOUT, resets interrupt and returns
     * @param millis How long to sleep for in milliseconds.
     */
    public static void sleep(long millis) {
      try {
        Thread.sleep(millis);
      } catch (InterruptedException e) {
        e.printStackTrace();
        Thread.currentThread().interrupt();
      }
    }

I don't know if we need this. I am curious whether we can fix it with a different sleep command.

TestBucketWriterThread can zombie test
--------------------------------------
Key: HBASE-11798
URL: https://issues.apache.org/jira/browse/HBASE-11798
Project: HBase
Issue Type: Bug
Reporter: Alex Newman
Assignee: Alex Newman

--
This message was sent by Atlassian JIRA (v6.2#6252)
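To make the behaviour of the quoted helper concrete, here is a minimal, hypothetical sketch. The class name InterruptSwallowingSleepDemo and the method pollWhileInterrupted are illustrative only and are not part of HBase; the sleep() body is copied from the helper quoted in the comment above. It assumes nothing beyond what that code shows: the interrupt is printed, the flag is restored, and the call returns. Under that assumption, a polling loop built on the helper keeps iterating after an interrupt unless the loop checks the flag itself. This illustrates the helper's behaviour only and makes no claim about the root cause of the hang.

```java
// Hypothetical demo; only the sleep() body mirrors the helper quoted above.
final class InterruptSwallowingSleepDemo {

    /** Same shape as the quoted helper: print, restore the interrupt flag, return. */
    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            e.printStackTrace();
            Thread.currentThread().interrupt();
        }
    }

    /**
     * Interrupts the current thread, then polls with the helper.
     * Returns how many iterations still ran despite the pending interrupt.
     * The loop is capped so the demo itself cannot hang.
     */
    static int pollWhileInterrupted(int maxIterations) {
        Thread.currentThread().interrupt(); // simulate something trying to stop us
        int iterations = 0;
        while (iterations < maxIterations) {
            sleep(1);       // returns immediately, with the flag set again
            iterations++;
        }
        Thread.interrupted(); // clear the flag so the caller is not affected
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println("iterations after interrupt: " + pollWhileInterrupted(3));
    }
}
```

A loop that wants to stop on interrupt would have to test Thread.currentThread().isInterrupted() in its own condition, or use a sleep that propagates InterruptedException.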
[jira] [Commented] (HBASE-11798) TestBucketWriterThread can zombie test
[ https://issues.apache.org/jira/browse/HBASE-11798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105877#comment-14105877 ]

Alex Newman commented on HBASE-11798:
-------------------------------------

Thinking about it, this test should not sleep in a loop forever. Let's limit it to 1000 tries. If it can't succeed by then, we will be stuck with a slow or flaky test. I think that causes more harm than good.

Attachments: HBASE-11798-v1.patch
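The bounded retry proposed above could look roughly like the following. This is a sketch under assumptions, not the content of HBASE-11798-v1.patch: the class BoundedWaitSketch, the method waitUntil and its parameters are hypothetical names, and the 1000-try cap is simply the figure from the comment.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper; not taken from the attached patch.
final class BoundedWaitSketch {

    /**
     * Polls the condition up to maxTries times, sleeping sleepMillis between
     * attempts. Returns true as soon as the condition holds, false if the
     * tries run out or the thread is interrupted.
     */
    static boolean waitUntil(BooleanSupplier condition, int maxTries, long sleepMillis) {
        for (int attempt = 0; attempt < maxTries; attempt++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;                       // and stop waiting
            }
        }
        return condition.getAsBoolean(); // one last check after the final sleep
    }

    public static void main(String[] args) {
        long deadline = System.currentTimeMillis() + 20;
        boolean ok = waitUntil(() -> System.currentTimeMillis() >= deadline, 1000, 1);
        System.out.println("condition met: " + ok);
    }
}
```

In a test setup method the result would typically be asserted, for example assertTrue(waitUntil(() -> bc.ramCache.isEmpty(), 1000, 1)), so that a timeout fails the test instead of hanging the build.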
[jira] [Commented] (HBASE-11798) TestBucketWriterThread can zombie test
[ https://issues.apache.org/jira/browse/HBASE-11798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106022#comment-14106022 ]

Sergey Soldatov commented on HBASE-11798:
-----------------------------------------

Here is a small evaluation. The test creates a BucketCache, which creates a WriterThread. After that the test sets disableWriter on the WriterThread, so according to its logic the thread should stop working. At the time it sets disableWriter, the test expects the WriterThread to be blocked waiting for a new entry in ramQueue. The truth is that even if the WriterThread has already been created and the following asserts have passed:

    assertEquals(writerThreadsCount, bc.writerThreads.length);
    assertEquals(writerThreadsCount, bc.writerQueues.size());

that doesn't mean that run() has already been executed. So sometimes the real execution can be the following:

1. The test creates the BucketCache.
2. The BucketCache creates the WriterThread.
3. The test sets WriterThread.disableWriter.
4. The WriterThread executes run() and stops at the first check that the writer is not disabled.

To fix it I suggest waiting until the WriterThread is executing. One way to do that without adding an additional flag is to check the thread state. It should be WAITING or BLOCKED, or in other words not RUNNABLE. A patch is attached.

Attachments: HBASE-11798-v1.patch, HBASE-11798-v2.patch, HBASE-11798-v3.patch
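The thread-state check described above can be sketched as follows. This is an illustration under assumptions rather than the attached patch: WriterStateWaitSketch, waitUntilParked and startStandInWriter are hypothetical names, and a plain thread blocked in BlockingQueue.take() stands in for the real WriterThread. The wait is capped at a number of tries so the check itself cannot hang.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the test setup; not the HBase classes.
final class WriterStateWaitSketch {

    /**
     * Waits, for at most maxTries one-millisecond sleeps, until the thread
     * reports WAITING or BLOCKED, i.e. it has started and is parked.
     */
    static boolean waitUntilParked(Thread thread, int maxTries) {
        for (int attempt = 0; attempt < maxTries; attempt++) {
            Thread.State state = thread.getState();
            if (state == Thread.State.WAITING || state == Thread.State.BLOCKED) {
                return true;
            }
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    /** Starts a daemon thread that blocks on queue.take(), like a writer waiting for work. */
    static Thread startStandInWriter(BlockingQueue<String> queue) {
        Thread writer = new Thread(() -> {
            try {
                queue.take(); // parks here until an entry arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "stand-in-writer");
        writer.setDaemon(true);
        writer.start();
        return writer;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread writer = startStandInWriter(queue);
        System.out.println("writer parked: " + waitUntilParked(writer, 1000));
        queue.add("entry"); // let the stand-in writer finish
    }
}
```

Only after waitUntilParked returns true would the test go on to disable the writer. The state check is a heuristic: a thread can be briefly WAITING or BLOCKED for unrelated reasons, so an explicit signal such as a latch counted down inside run() would be a stricter alternative at the cost of adding a field.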
[jira] [Commented] (HBASE-11798) TestBucketWriterThread can zombie test
[ https://issues.apache.org/jira/browse/HBASE-11798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106026#comment-14106026 ]

Esteban Gutierrez commented on HBASE-11798:
-------------------------------------------

I've been hitting this very often, [~posix4e]. +1 for v3.

Attachments: HBASE-11798-v1.patch, HBASE-11798-v2.patch, HBASE-11798-v3.patch
[jira] [Commented] (HBASE-11798) TestBucketWriterThread can zombie test
[ https://issues.apache.org/jira/browse/HBASE-11798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106038#comment-14106038 ]

Alex Newman commented on HBASE-11798:
-------------------------------------

OK, I realized we are not trying to kill them. They aren't zombies; they are just tests that never completed.

Attachments: HBASE-11798-v1.patch, HBASE-11798-v2.patch, HBASE-11798-v3.patch
[jira] [Commented] (HBASE-11798) TestBucketWriterThread can zombie test
[ https://issues.apache.org/jira/browse/HBASE-11798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106064#comment-14106064 ]

Alex Newman commented on HBASE-11798:
-------------------------------------

I combined the good aspects of both of our fixes. In essence, since @Before can't time out, it's important that we not try forever. In my opinion Sergey needs to get the credit for this one, though. He's the one who found the root of the problem. I also cleaned up some nits in the file. I am not sure whether that should be a separate commit, but there were also lies in the javadocs.

Attachments: HBASE-11798-v1.patch, HBASE-11798-v2.patch, HBASE-11798-v3.patch, HBASE-11798-v4.patch
[jira] [Commented] (HBASE-11798) TestBucketWriterThread can zombie test
[ https://issues.apache.org/jira/browse/HBASE-11798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106335#comment-14106335 ]

Ted Yu commented on HBASE-11798:
--------------------------------

Patch v4 looks good. I ran the test locally with it. Let me integrate this, since the test hangs quite often.

Fix For: 2.0.0
Attachments: HBASE-11798-v1.patch, HBASE-11798-v2.patch, HBASE-11798-v3.patch, HBASE-11798-v4.patch