[jira] [Commented] (HBASE-11798) TestBucketWriterThread can zombie test

2014-08-21 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105794#comment-14105794
 ] 

Alex Newman commented on HBASE-11798:
-

I noticed when trying to commit HBASE-4955 that TestBucketWriterThread can 
zombie in its setup function.

I also noticed this in the jstack at 
https://builds.apache.org/job/PreCommit-HBASE-Build/10511/console:

main prio=10 tid=0x7fa9e000a800 nid=0x571 waiting on condition [0x7fa9e6184000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:143)
        at org.apache.hadoop.hbase.io.hfile.bucket.TestBucketWriterThread.setUp(TestBucketWriterThread.java:78)

Looking at the code

this.plainCacheable = Mockito.mock(Cacheable.class);
bc.cacheBlock(this.plainKey, plainCacheable);
while (!bc.ramCache.isEmpty()) Threads.sleep(1); // <- this is where we hang
assertTrue(q.isEmpty());
// Now writer thread should be disabled.

At first I was confused, but then I realized that this isn't Thread.sleep, it is 
Threads.sleep:
  /**
   * If interrupted, just prints out the interrupt on STDOUT, resets interrupt
   * and returns
   * @param millis How long to sleep for in milliseconds.
   */
  public static void sleep(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      e.printStackTrace();
      Thread.currentThread().interrupt();
    }
  }


I don't know if we need this. I am curious if we can fix it with a different 
sleep command.

 TestBucketWriterThread can zombie test
 --

 Key: HBASE-11798
 URL: https://issues.apache.org/jira/browse/HBASE-11798
 Project: HBase
  Issue Type: Bug
Reporter: Alex Newman
Assignee: Alex Newman





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11798) TestBucketWriterThread can zombie test

2014-08-21 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105877#comment-14105877
 ] 

Alex Newman commented on HBASE-11798:
-

Thinking about it, this test should not sleep in a loop forever. Let's limit it 
to 1000 tries. If it can't succeed by then, we will be stuck with a slow or 
flaky test. I think that causes more harm than good.
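
As a rough illustration of the idea above (not the attached patch), a bounded 
poll could look like the sketch below. The class name BoundedWait, the helper 
waitFor, and the queue standing in for bc.ramCache are all made up for this 
example; it uses Java 8 lambdas for brevity.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.BooleanSupplier;

public class BoundedWait {
  /**
   * Polls the condition up to maxTries times, sleeping 1 ms between polls.
   * Returns true as soon as the condition holds, false if the tries run out
   * or the calling thread is interrupted while sleeping.
   */
  static boolean waitFor(BooleanSupplier condition, int maxTries) {
    for (int i = 0; i < maxTries; i++) {
      if (condition.getAsBoolean()) {
        return true;
      }
      try {
        Thread.sleep(1);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and stop waiting instead of looping on.
        Thread.currentThread().interrupt();
        return false;
      }
    }
    // One last check after the final sleep.
    return condition.getAsBoolean();
  }

  public static void main(String[] args) {
    // Hypothetical stand-in for bc.ramCache: a queue drained by another thread.
    ConcurrentLinkedQueue<String> ramCache = new ConcurrentLinkedQueue<>();
    ramCache.add("block");
    new Thread(ramCache::clear).start();
    System.out.println("drained=" + waitFor(ramCache::isEmpty, 1000));

    // A condition that never holds returns after a bounded wait, not a hang.
    System.out.println("never=" + waitFor(() -> false, 50));
  }
}
```

In a test, the caller would assert on the return value, so that a writer that 
never drains the cache fails the test instead of hanging it.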

 TestBucketWriterThread can zombie test
 --

 Key: HBASE-11798
 URL: https://issues.apache.org/jira/browse/HBASE-11798
 Project: HBase
  Issue Type: Bug
Reporter: Alex Newman
Assignee: Alex Newman
 Attachments: HBASE-11798-v1.patch








[jira] [Commented] (HBASE-11798) TestBucketWriterThread can zombie test

2014-08-21 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106022#comment-14106022
 ] 

Sergey Soldatov commented on HBASE-11798:
-

Here is a short analysis.
The test creates a BucketCache, which in turn creates a WriterThread.
The test then sets disableWriter on the WriterThread, so according to its logic 
the thread should stop working.
When it sets disableWriter, the test expects the WriterThread to be blocked 
waiting for a new entry in ramQueue.
In fact, even if the WriterThread has already been created and the following 
asserts pass:
assertEquals(writerThreadsCount, bc.writerThreads.length);
assertEquals(writerThreadsCount, bc.writerQueues.size());
that doesn't mean that run() has already started executing. So the actual order 
of execution can sometimes be:
1. The test creates the BucketCache.
2. The BucketCache creates the WriterThread.
3. The test sets WriterThread.disableWriter.
4. The WriterThread executes run() and stops at the first check that the writer 
is not disabled.

To fix it, I suggest waiting until the WriterThread is actually executing. One 
way to do that without adding an extra flag is to check the thread state: it 
should be WAITING or BLOCKED or, in other words, not RUNNABLE.
A patch is attached.
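
For readers without the patch at hand, the thread-state check described above 
might be sketched roughly as follows. This is not the attached patch: the class 
name WaitForWriterThread, the helper waitUntilParked, and the plain 
LinkedBlockingQueue standing in for the writer's queue are assumptions made for 
the example, and the wait is bounded so it cannot hang.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class WaitForWriterThread {
  /**
   * Polls until the thread reports WAITING or BLOCKED, which means it has
   * entered run() and is parked. Gives up after maxTries polls of 1 ms each.
   */
  static boolean waitUntilParked(Thread t, int maxTries) {
    for (int i = 0; i < maxTries; i++) {
      Thread.State state = t.getState();
      if (state == Thread.State.WAITING || state == Thread.State.BLOCKED) {
        return true;
      }
      try {
        Thread.sleep(1);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Hypothetical stand-in for a writer thread blocked on its input queue.
    BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    Thread writer = new Thread(() -> {
      try {
        queue.take(); // parks the thread while the queue is empty
      } catch (InterruptedException e) {
        // interrupted: just exit
      }
    });
    writer.setDaemon(true);
    writer.start();

    // Only after this returns true is it safe to assume run() has started.
    System.out.println("parked=" + waitUntilParked(writer, 1000));
    writer.interrupt();
  }
}
```

Checking for WAITING or BLOCKED explicitly, rather than for "not RUNNABLE", 
also rules out a thread that is still NEW or already TERMINATED.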

 TestBucketWriterThread can zombie test
 --

 Key: HBASE-11798
 URL: https://issues.apache.org/jira/browse/HBASE-11798
 Project: HBase
  Issue Type: Bug
Reporter: Alex Newman
Assignee: Alex Newman
 Attachments: HBASE-11798-v1.patch, HBASE-11798-v2.patch, 
 HBASE-11798-v3.patch








[jira] [Commented] (HBASE-11798) TestBucketWriterThread can zombie test

2014-08-21 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106026#comment-14106026
 ] 

Esteban Gutierrez commented on HBASE-11798:
---

I've been hitting this very often, [~posix4e]. +1 for v3.



[jira] [Commented] (HBASE-11798) TestBucketWriterThread can zombie test

2014-08-21 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106038#comment-14106038
 ] 

Alex Newman commented on HBASE-11798:
-

OK I realized we are not trying to kill them. They aren't zombies, they are 
just tests that never completed. 



[jira] [Commented] (HBASE-11798) TestBucketWriterThread can zombie test

2014-08-21 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106064#comment-14106064
 ] 

Alex Newman commented on HBASE-11798:
-

I combined the good aspects of both of our fixes. In essence, since @Before 
can't time out, it's important that we not try forever. In my opinion Sergey 
needs to get credit for this one, though. He's the one who found the root of 
the problem. I also cleaned up some nits in the file. I am not sure if it 
should be a separate commit, but there were also lies in the javadocs.

 TestBucketWriterThread can zombie test
 --

 Key: HBASE-11798
 URL: https://issues.apache.org/jira/browse/HBASE-11798
 Project: HBase
  Issue Type: Bug
Reporter: Alex Newman
Assignee: Alex Newman
 Attachments: HBASE-11798-v1.patch, HBASE-11798-v2.patch, 
 HBASE-11798-v3.patch, HBASE-11798-v4.patch








[jira] [Commented] (HBASE-11798) TestBucketWriterThread can zombie test

2014-08-21 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106335#comment-14106335
 ] 

Ted Yu commented on HBASE-11798:


Patch v4 looks good - I ran the test locally with it.

Let me integrate this - the test hangs quite often.

 TestBucketWriterThread can zombie test
 --

 Key: HBASE-11798
 URL: https://issues.apache.org/jira/browse/HBASE-11798
 Project: HBase
  Issue Type: Bug
Reporter: Alex Newman
Assignee: Alex Newman
 Fix For: 2.0.0

 Attachments: HBASE-11798-v1.patch, HBASE-11798-v2.patch, 
 HBASE-11798-v3.patch, HBASE-11798-v4.patch





