[ 
https://issues.apache.org/jira/browse/HBASE-8419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-8419.
--------------------------

    Resolution: Cannot Reproduce

We don't see these hangs anymore.  We've rigged the ec2 and apache builds to 
log zombies if they see them.  I think this was fixed by a combination of 
issues, including the upgrade to hadoop-2.0.5-alpha.  Will open a new issue if 
the hang shows up again.
                
> Hadoop2 MR tests fail with delete failing/hanging threads present
> -----------------------------------------------------------------
>
>                 Key: HBASE-8419
>                 URL: https://issues.apache.org/jira/browse/HBASE-8419
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Jonathan Hsieh
>             Fix For: 0.95.2
>
>
> In flaky failures on hadoop2 runs of tests such as: 
> * TestImportTsv/testBulkOutputWithoutAnExistingTable
> * TestImportTsv/testMROnTable
> * TestImportExport/testWithFilter
> * (and many others)
> We have logs with hanging threads and failed file deletes that look like this.
> {code}
> 2013-04-24 06:05:01,807 WARN  [ContainersLauncher #0] nodemanager.DefaultContainerExecutor(193): Exit code from task is : 137
> 2013-04-24 06:05:06,520 INFO  [pool-1-thread-1] hbase.ResourceChecker(171): after: mapreduce.TestImportExport#testExportScannerBatching Thread=539 (was 534)
> Potentially hanging thread: hbase-table-pool-25-thread-1
>       sun.misc.Unsafe.park(Native Method)
>       java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
>       java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:424)
> ...
> <threads seemingly related to dfs connection>
> {code}
> {code}
> 2013-04-24 06:03:28,351 WARN  [DeletionService #0] nodemanager.DefaultContainerExecutor(276): delete returned false for path: [/var/lib/jenkins/workspace/apache-hbase-trunk-hadoop2/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-0_0/usercache/jenkins/appcache/application_1366808588748_0001/container_1366808588748_0001_01_000001]
> 2013-04-24 06:03:28,353 WARN  [DeletionService #1] nodemanager.DefaultContainerExecutor(276): delete returned false for path: [/var/lib/jenkins/workspace/apache-hbase-trunk-hadoop2/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-0_1/usercache/jenkins/appcache/application_1366808588748_0001/container_1366808588748_0001_01_000001]
> 2013-04-24 06:03:28,353 WARN  [DeletionService #2] nodemanager.DefaultContainerExecutor(276): delete returned false for path: [/var/lib/jenkins/workspace/apache-hbase-trunk-hadoop2/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-0_2/usercache/jenkins/appcache/application_1366808588748_0001/container_1366808588748_0001_01_000001]
> 2013-04-24 06:03:28,354 WARN  [DeletionService #0] nodemanager.DefaultContainerExecutor(276): delete returned false for path: [/var/lib/jenkins/workspace/apache-hbase-trunk-hadoop2/trunk/hbase-server/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-localDir-nm-0_3/usercache/jenkins/appcache/application_1366808588748_0001/container_1366808588748_0001_01_000001]
> {code}
