[jira] [Commented] (TEZ-2342) TestFaultTolerance.testRandomFailingTasks fails due to timeout

2015-04-27 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513676#comment-14513676
 ] 

Jeff Zhang commented on TEZ-2342:
-

[~bikassaha] No other issues after running it many times. I checked the logs on
the Windows Jenkins server, and it failed due to the timeout.
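The stack trace quoted in the issue description shows JUnit's timeout firing while `runDAGAndVerify` was blocked in `Thread.sleep`, which suggests a loop that sleeps while waiting for the DAG to finish. Here is a minimal sketch of that pattern with an explicit deadline. It is not the actual `TestFaultTolerance` code; `pollUntil`, its parameters, and the simulated conditions are hypothetical. Bounding the wait this way lets a run that is merely slow be told apart from one that never completes.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class PollUntil {

    /**
     * Hypothetical helper: checks {@code done} every {@code intervalMillis}
     * until it returns true or {@code timeoutMillis} has elapsed.
     * Returns true if the condition was met, false on timeout or interrupt.
     */
    static boolean pollUntil(BooleanSupplier done, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!done.getAsBoolean()) {
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out: the caller can log the last observed state
            }
            try {
                Thread.sleep(intervalMillis); // the kind of wait seen in the stack trace
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // Stand-in for "DAG finished": becomes true after roughly 300 ms.
        BooleanSupplier finishesSoon =
            () -> System.nanoTime() - start > TimeUnit.MILLISECONDS.toNanos(300);
        BooleanSupplier neverFinishes = () -> false;

        System.out.println("slow run completed: " + pollUntil(finishesSoon, 5_000, 50));
        System.out.println("hung run completed: " + pollUntil(neverFinishes, 500, 50));
    }
}
```

A caller that gets `false` back can fail with a message naming the DAG state it last saw. A bare JUnit timeout only reports where the thread happened to be sleeping.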



 TestFaultTolerance.testRandomFailingTasks fails due to timeout
 --

 Key: TEZ-2342
 URL: https://issues.apache.org/jira/browse/TEZ-2342
 Project: Apache Tez
  Issue Type: Bug
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Minor
 Attachments: TEZ-2342-1.patch, syslog_dag_1429582868137_0001_1


 {code}
 Error Message
 test timed out after 12 milliseconds
 Stacktrace
 java.lang.Exception: test timed out after 12 milliseconds
   at java.lang.Thread.sleep(Native Method)
   at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:126)
   at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:114)
   at org.apache.tez.test.TestFaultTolerance.testRandomFailingTasks(TestFaultTolerance.java:723)
 Standard Output
 2015-04-17 07:46:10,952 INFO  [main] test.TestFaultTolerance (TestFaultTolerance.java:setup(65)) - Starting mini clusters
 2015-04-17 07:46:11,508 INFO  [main] hdfs.MiniDFSCluster (MiniDFSCluster.java:init(446)) - starting cluster: numNameNodes=1, numDataNodes=1
 Formatting using clusterid: testClusterID
 2015-04-17 07:46:12,919 INFO  [main] namenode.FSNamesystem (FSNamesystem.java:init(716)) - No KeyProvider found.
 2015-04-17 07:46:12,920 INFO  [main] namenode.FSNamesystem (FSNamesystem.java:init(726)) - fsLock is fair:true
 2015-04-17 07:46:13,021 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1173)) - hadoop.configured.node.mapping is deprecated. Instead, use net.topology.configured.node.mapping
 2015-04-17 07:46:13,021 INFO  [main] blockmanagement.DatanodeManager (DatanodeManager.java:init(239)) - dfs.block.invalidate.limit=1000
 2015-04-17 07:46:13,022 INFO  [main] blockmanagement.DatanodeManager (DatanodeManager.java:init(245)) - dfs.namenode.datanode.registration.ip-hostname-check=true
 2015-04-17 07:46:13,022 INFO  [main] blockmanagement.BlockManager (InvalidateBlocks.java:printBlockDeletionTime(71)) - dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
 2015-04-17 07:46:13,025 INFO  [main] blockmanagement.BlockManager (InvalidateBlocks.java:printBlockDeletionTime(76)) - The block deletion will start around 2015 Apr 17 07:46:13
 2015-04-17 07:46:13,029 INFO  [main] util.GSet (LightWeightGSet.java:computeCapacity(354)) - Computing capacity for map BlocksMap
 2015-04-17 07:46:13,030 INFO  [main] util.GSet (LightWeightGSet.java:computeCapacity(355)) - VM type   = 64-bit
 2015-04-17 07:46:13,032 INFO  [main] util.GSet (LightWeightGSet.java:computeCapacity(356)) - 2.0% max memory 910.3 MB = 18.2 MB
 2015-04-17 07:46:13,033 INFO  [main] util.GSet (LightWeightGSet.java:computeCapacity(361)) - capacity  = 2^21 = 2097152 entries
 2015-04-17 07:46:13,079 INFO  [main] blockmanagement.BlockManager (BlockManager.java:createBlockTokenSecretManager(365)) - dfs.block.access.token.enable=false
 2015-04-17 07:46:13,080 INFO  [main] blockmanagement.BlockManager (BlockManager.java:init(350)) - defaultReplication = 1
 2015-04-17 07:46:13,080 INFO  [main] blockmanagement.BlockManager (BlockManager.java:init(351)) - maxReplication = 512
 2015-04-17 07:46:13,083 INFO  [main] blockmanagement.BlockManager (BlockManager.java:init(352)) - minReplication = 1
 2015-04-17 07:46:13,083 INFO  [main] blockmanagement.BlockManager (BlockManager.java:init(353)) - maxReplicationStreams  = 2
 2015-04-17 07:46:13,083 INFO  [main] blockmanagement.BlockManager (BlockManager.java:init(354)) - shouldCheckForEnoughRacks  = false
 2015-04-17 07:46:13,084 INFO  [main] blockmanagement.BlockManager (BlockManager.java:init(355)) - replicationRecheckInterval = 3000
 2015-04-17 07:46:13,084 INFO  [main] blockmanagement.BlockManager (BlockManager.java:init(356)) - encryptDataTransfer= false
 2015-04-17 07:46:13,084 INFO  [main] blockmanagement.BlockManager (BlockManager.java:init(357)) - maxNumBlocksToLog  = 1000
 2015-04-17 07:46:13,115 INFO  [main] namenode.FSNamesystem (FSNamesystem.java:init(746)) - fsOwner = jenkins (auth:SIMPLE)
 2015-04-17 07:46:13,116 INFO  [main] namenode.FSNamesystem (FSNamesystem.java:init(747)) - supergroup  = supergroup
 2015-04-17 07:46:13,116 INFO  [main] namenode.FSNamesystem (FSNamesystem.java:init(748)) - isPermissionEnabled = true
 2015-04-17 07:46:13,116 INFO  [main] namenode.FSNamesystem
 

[jira] [Commented] (TEZ-2342) TestFaultTolerance.testRandomFailingTasks fails due to timeout

2015-04-27 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514427#comment-14514427
 ] 

Bikas Saha commented on TEZ-2342:
-

Sounds good. +1


[jira] [Commented] (TEZ-2342) TestFaultTolerance.testRandomFailingTasks fails due to timeout

2015-04-22 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14506515#comment-14506515
 ] 

Jeff Zhang commented on TEZ-2342:
-

[~hitesh] [~bikassaha] Please help review it. 


[jira] [Commented] (TEZ-2342) TestFaultTolerance.testRandomFailingTasks fails due to timeout

2015-04-22 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507543#comment-14507543
 ] 

Bikas Saha commented on TEZ-2342:
-

If this passes with the increased timeout (instead of hanging permanently), then
the change looks good. Could you please run this in a loop 10-20 times and see if
there are any further issues? If none, then let's commit this. Otherwise, let's
look for a code/test bug.
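The suggestion to run the test in a loop 10-20 times can be scripted. Below is a plain-Java sketch with no JUnit dependency. `runRepeatedly`, `Summary`, and the flaky stand-in body are hypothetical; in practice the body would invoke the real test. The loop keeps going after a failure, so intermittent failures are counted instead of stopping at the first one.

```java
import java.util.ArrayList;
import java.util.List;

public class RepeatRunner {

    /** Hypothetical result type: how many runs, and which iterations failed. */
    static final class Summary {
        final int runs;
        final List<Integer> failedIterations;

        Summary(int runs, List<Integer> failedIterations) {
            this.runs = runs;
            this.failedIterations = failedIterations;
        }

        boolean allPassed() {
            return failedIterations.isEmpty();
        }
    }

    /**
     * Runs {@code body} {@code times} times. A RuntimeException or
     * AssertionError marks that iteration as failed and the loop continues;
     * other Errors (e.g. OutOfMemoryError) propagate.
     */
    static Summary runRepeatedly(Runnable body, int times) {
        List<Integer> failed = new ArrayList<>();
        for (int i = 1; i <= times; i++) {
            try {
                body.run();
            } catch (RuntimeException | AssertionError e) {
                failed.add(i);
                System.out.println("iteration " + i + " failed: " + e);
            }
        }
        return new Summary(times, failed);
    }

    public static void main(String[] args) {
        // Stand-in for the real test: fails on every 7th call to mimic flakiness.
        int[] calls = {0};
        Runnable flaky = () -> {
            calls[0]++;
            if (calls[0] % 7 == 0) {
                throw new AssertionError("simulated timeout");
            }
        };
        Summary s = runRepeatedly(flaky, 20);
        // Last line printed: "2 of 20 runs failed"
        System.out.println(s.failedIterations.size() + " of " + s.runs + " runs failed");
    }
}
```

With JUnit 4 on the classpath, the body could instead call `JUnitCore.runClasses(TestFaultTolerance.class)` and throw if `Result.wasSuccessful()` is false. That would run every test in the class, not just `testRandomFailingTasks`.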
