Rushabh S Shah created MAPREDUCE-7076:
-----------------------------------------

             Summary: TestNNBench#testNNBenchCreateReadAndDelete failing in our internal build
                 Key: MAPREDUCE-7076
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7076
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: test
    Affects Versions: 2.8.0
            Reporter: Rushabh S Shah


TestNNBench#testNNBenchCreateReadAndDelete failed a couple of times in our internal Jenkins build.
{noformat}
java.lang.AssertionError: create_write should create the file
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.apache.hadoop.hdfs.TestNNBench.testNNBenchCreateReadAndDelete(TestNNBench.java:55)
{noformat}

Below is my analysis of why the file was not created.
{code:title=NNBench.java|borderStyle=solid}
// Some comments here
public void map(Text key, LongWritable value,
    OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
  if (barrier()) {
    String fileName = "file_" + value;
    if (op.equals(OP_CREATE_WRITE)) {
      startTimeTPmS = System.currentTimeMillis();
      doCreateWriteOp(fileName, reporter);
    }
    ...
  } else {
    output.collect(new Text("l:latemaps"), new Text("1"));
  }
}

// Below are the relevant parts of barrier() method
private boolean barrier() {
  ..
  // If the sleep time is greater than 0, then sleep and return
  ...
  LOG.info("Waiting in barrier for: " + sleepTime + " ms");
  return retVal;
}

// Below are the relevant parts of the doCreateWriteOp
private void doCreateWriteOp(String name, Reporter reporter) {
  FSDataOutputStream out;
  byte[] buffer = new byte[bytesToWrite];
  for (long l = 0l; l < numberOfFiles; l++) {
    Path filePath = new Path(new Path(baseDir, dataDirName), name + "_" + l);
  }
  ....
}
{code}

The file {{BASE_DIR/data/file_0_0}} is created only if the map task starts before the time given by {{startTime}}; see the excerpt above.
{{map(..)}} calls {{barrier()}}, and *only if* {{barrier()}} returns true does it call {{doCreateWriteOp}}, which eventually creates the file.
In the test case, the delay is 3 seconds, as set by {{"-startTime", "" + (Time.now() / 1000 + 3)}}.
In the failing run, I can see the task starting at least 6 seconds after the test case started.
{noformat}
2017-01-27 03:11:15,387 INFO  [Thread-4] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(289)) - Submitting tokens for job: job_local1711545156_0001
2017-01-27 03:11:23,405 INFO  [Thread-4] mapreduce.Job (Job.java:submit(1345)) - The url to track the job: http://localhost:8080/
{noformat}

Also, when I run this test on my laptop, I see the following line printed.
{noformat}
2017-01-27 17:09:27,982 INFO  [LocalJobRunner Map Task Executor #0] hdfs.NNBench (NNBench.java:barrier(676)) - Waiting in barrier for: 1018 ms
{noformat}

This line is printed only in the {{barrier()}} method, and it does not appear in the logs of the failed test.
In our environment, the Jenkins server was very slow and took more than 6 seconds to launch a map task.
The correct fix, in my opinion, would be for {{barrier()}} to return true when no sleep is needed (the start time has already passed), and to return false only when an exception occurs.
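
The difference between the behaviour described in this report and the proposed fix can be sketched as follows. This is a simplified model, not the actual NNBench source: the body of {{barrier()}} is elided in the excerpt, so the class name {{BarrierSketch}}, both method names, and the explicit time parameters are assumptions made for illustration. The "as reported" variant encodes one reading of the description, namely that the method returns true only after it has slept.

```java
/**
 * Simplified model of the barrier logic discussed in this report.
 * NOT the real NNBench code: names, parameters, and structure are
 * hypothetical, chosen only to make the two behaviours comparable.
 */
class BarrierSketch {

    /** Behaviour as described in the report: true only if the task slept. */
    static boolean barrierAsReported(long startTimeMs, long nowMs) {
        long sleepTime = startTimeMs - nowMs;
        boolean retVal = false;
        if (sleepTime > 0) {
            try {
                Thread.sleep(sleepTime);
                retVal = true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                retVal = false;
            }
        }
        // Still false when the task launched after the start time.
        return retVal;
    }

    /** Proposed behaviour: false only when the sleep is interrupted. */
    static boolean barrierProposed(long startTimeMs, long nowMs) {
        long sleepTime = startTimeMs - nowMs;
        if (sleepTime > 0) {
            try {
                Thread.sleep(sleepTime);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        // No sleep needed (late start) or sleep completed normally.
        return true;
    }

    public static void main(String[] args) {
        long start = 3_000L;      // hypothetical: start time 3 s after test start
        long lateLaunch = 6_000L; // hypothetical: map task launched 6 s after test start
        System.out.println("as reported: " + barrierAsReported(start, lateLaunch));
        System.out.println("proposed:    " + barrierProposed(start, lateLaunch));
    }
}
```

With a map task that launches after the start time, the first variant returns false, so {{map(..)}} would take the {{l:latemaps}} branch and never create the file, while the second returns true and lets {{doCreateWriteOp}} run. Whether late maps should still be counted separately is a design question this sketch does not address.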