[jira] [Created] (MAPREDUCE-6831) Flaky test TestJobImpl.testKilledDuringKillAbort
Peter Bacsko created MAPREDUCE-6831: --- Summary: Flaky test TestJobImpl.testKilledDuringKillAbort Key: MAPREDUCE-6831 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6831 Project: Hadoop Map/Reduce Issue Type: Test Components: mrv2 Reporter: Peter Bacsko Assignee: Peter Bacsko The test case TestJobImpl.testKilledDuringKillAbort() is flaky. Example of a failure: {noformat:title=Error Message} expected: but was: {noformat} {noformat:title=Stack Trace} java.lang.AssertionError: expected: but was: at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:978) at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testKilledDuringKillAbort(TestJobImpl.java:516) {noformat} {noformat:title=Standard Output} 2016-12-12 00:26:29,724 INFO [Thread-12] event.AsyncDispatcher (AsyncDispatcher.java:register(202)) - Registering class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler 2016-12-12 00:26:29,729 INFO [Thread-12] event.AsyncDispatcher (AsyncDispatcher.java:register(202)) - Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl$StubbedJob 2016-12-12 00:26:29,729 INFO [Thread-12] event.AsyncDispatcher (AsyncDispatcher.java:register(202)) - Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$2a4993a5 2016-12-12 00:26:29,730 INFO [Thread-12] event.AsyncDispatcher (AsyncDispatcher.java:register(202)) - Registering class org.apache.hadoop.mapreduce.jobhistory.EventType for class org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$2a4993a5 2016-12-12 
00:26:29,730 INFO [Thread-12] event.AsyncDispatcher (AsyncDispatcher.java:register(202)) - Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$2a4993a5 2016-12-12 00:26:29,730 INFO [Thread-12] impl.JobImpl (JobImpl.java:setup(1523)) - Adding job token for job_123456789_0001 to jobTokenSecretManager 2016-12-12 00:26:29,731 WARN [Thread-12] impl.JobImpl (JobImpl.java:setup(1529)) - Shuffle secret key missing from job credentials. Using job token secret as shuffle secret. 2016-12-12 00:26:29,733 INFO [Thread-12] impl.JobImpl (JobImpl.java:makeUberDecision(1294)) - Not uberizing job_123456789_0001 because: not enabled; 2016-12-12 00:26:29,734 INFO [Thread-12] impl.JobImpl (JobImpl.java:createMapTasks(1551)) - Input size for job job_123456789_0001 = 0. Number of splits = 2 2016-12-12 00:26:29,734 INFO [Thread-12] impl.JobImpl (JobImpl.java:createReduceTasks(1568)) - Number of reduces for job job_123456789_0001 = 1 2016-12-12 00:26:29,734 INFO [Thread-12] impl.JobImpl (JobImpl.java:handle(1006)) - job_123456789_0001Job Transitioned from NEW to INITED 2016-12-12 00:26:29,736 INFO [CommitterEvent Processor #0] commit.CommitterEventHandler (CommitterEventHandler.java:run(231)) - Processing the event EventType: JOB_SETUP 2016-12-12 00:26:29,737 INFO [Thread-12] impl.JobImpl (JobImpl.java:handle(1006)) - job_123456789_0001Job Transitioned from INITED to SETUP 2016-12-12 00:26:29,738 INFO [AsyncDispatcher event handler] impl.JobImpl (JobImpl.java:handle(1006)) - job_123456789_0001Job Transitioned from SETUP to RUNNING {noformat} Reproduction: insert a {{Thread.sleep(50);}} after {{job.handle(new JobStartEvent(jobId));}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
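The reproduction step (a short sleep after {{JobStartEvent}}) shows that the assertion depends on how quickly the AsyncDispatcher thread moves the job from SETUP to RUNNING. One common way to remove this kind of timing dependence in a test is to wait for an expected intermediate state before sending the next event or asserting. The sketch below is a generic polling helper. {{StateWaiter}} and {{waitForState}} are hypothetical names, not part of Hadoop, and this is not the fix that was committed for this issue.

```java
import java.util.function.Supplier;

/**
 * Hypothetical test helper (not part of Hadoop): polls a state supplier until
 * it reports the expected value or the timeout elapses.
 */
final class StateWaiter {
  private StateWaiter() {}

  static <T> boolean waitForState(Supplier<T> current, T expected,
      long timeoutMs, long pollMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (expected.equals(current.get())) {
        return true; // reached the expected state
      }
      if (System.nanoTime() - deadline >= 0) {
        return false; // gave up; the caller decides how to fail
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        return false;
      }
    }
  }
}
```

In a test, the supplier would read the job's internal state, and the test would only continue once the dispatcher has produced the state the next step relies on.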
[jira] [Reopened] (MAPREDUCE-6201) TestNetworkedJob fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko reopened MAPREDUCE-6201: - Assignee: Peter Bacsko (was: Brahma Reddy Battula) I'm reopening this because I was able to reproduce the failure. > TestNetworkedJob fails on trunk > --- > > Key: MAPREDUCE-6201 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6201 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Robert Kanter >Assignee: Peter Bacsko > > Currently, {{TestNetworkedJob}} is failing on trunk: > {noformat} > Running org.apache.hadoop.mapred.TestNetworkedJob > Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 215.01 sec > <<< FAILURE! - in org.apache.hadoop.mapred.TestNetworkedJob > testNetworkedJob(org.apache.hadoop.mapred.TestNetworkedJob) Time elapsed: > 67.363 sec <<< FAILURE! > java.lang.AssertionError: expected:<0> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.mapred.TestNetworkedJob.testNetworkedJob(TestNetworkedJob.java:195) > {noformat}
[jira] [Created] (MAPREDUCE-6856) TestRecovery.testSpeculative fails if testCrashed fails
Peter Bacsko created MAPREDUCE-6856: --- Summary: TestRecovery.testSpeculative fails if testCrashed fails Key: MAPREDUCE-6856 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6856 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Peter Bacsko Assignee: Peter Bacsko The test {{testSpeculative}} in {{org.apache.hadoop.mapreduce.v2.app.TestRecovery}} is unstable. Based on my findings, the test itself is not problematic. It only fails if {{testCrashed}} in the same class fails before it. The reason is not completely clear to me, but whenever I explicitly stop the MRAppMaster in a finally block in {{testCrashed}}, the issue disappears. I think the reason is that both tests use the same folder for staging. Solution: wrap the logic in {{testCrashed}} in a try-finally block and stop the MRAppMaster in the finally block.
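The proposed solution is the standard try-finally cleanup pattern: the AM is stopped even when an assertion in {{testCrashed}} throws. The sketch below shows only that pattern. {{FakeAppMaster}} and {{CrashTestSketch}} are hypothetical stand-ins, not the real {{MRAppMaster}} or test code.

```java
/** Hypothetical stand-in for the MRAppMaster started by the test. */
final class FakeAppMaster {
  private boolean stopped;

  void stop() {
    stopped = true;
  }

  boolean isStopped() {
    return stopped;
  }
}

final class CrashTestSketch {
  /** Runs the test body and always stops the AM, even if the body fails. */
  static void runWithCleanup(FakeAppMaster app, Runnable testBody) {
    try {
      testBody.run();
    } finally {
      // Stopping here keeps a failed test from leaving a running AM behind,
      // which could otherwise interfere with later tests (for example, via a
      // shared staging directory).
      app.stop();
    }
  }
}
```

The original failure still propagates to JUnit, so the test is reported as failed, but the next test starts from a clean state.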
[jira] [Created] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file
Peter Bacsko created MAPREDUCE-6892: --- Summary: Issues with the count of failed/killed tasks in the jhist file Key: MAPREDUCE-6892 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, jobhistoryserver Reporter: Peter Bacsko Assignee: Peter Bacsko Recently we encountered some issues with the number of failed tasks. After parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, although there were actually failures. Another minor issue is that you cannot get the number of killed tasks (although it can be calculated). The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the successful map/reduce task counts. The numbers of failed (or killed) tasks are not stored.
[jira] [Created] (MAPREDUCE-6898) TestKill.testKillTask is flaky
Peter Bacsko created MAPREDUCE-6898: --- Summary: TestKill.testKillTask is flaky Key: MAPREDUCE-6898 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6898 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, test Reporter: Peter Bacsko Assignee: Peter Bacsko TestKill.testKillTask() can fail if the async dispatcher thread is slower than the test's thread. {noformat} 2017-05-26 11:43:26,532 INFO [AsyncDispatcher event handler] impl.JobImpl (JobImpl.java:handle(1006)) - job_0_Job Transitioned from INITED to SETUP Job State is : RUNNING Job State is : RUNNING Waiting for state : SUCCEEDED map progress : 0.0 reduce progress : 0.0 2017-05-26 11:43:26,538 INFO [CommitterEvent Processor #0] commit.CommitterEventHandler (CommitterEventHandler.java:run(231)) - Processing the event EventType: JOB_SETUP 2017-05-26 11:43:26,540 INFO [AsyncDispatcher event handler] impl.TaskImpl (TaskImpl.java:handle(661)) - task_0__m_00 Task Transitioned from NEW to KILLED 2017-05-26 11:43:26,540 ERROR [AsyncDispatcher event handler] impl.JobImpl (JobImpl.java:handle(998)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_COMPLETED at SETUP at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1366) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1362) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) at java.lang.Thread.run(Thread.java:745) 2017-05-26 11:43:26,541 INFO [AsyncDispatcher event handler] impl.JobImpl (JobImpl.java:handle(1006)) - job_0_Job Transitioned from SETUP to ERROR 2017-05-26 11:43:26,542 INFO [AsyncDispatcher event handler] app.MRAppMaster (MRAppMaster.java:serviceStop(978)) - Skipping cleaning up the staging dir. assuming AM will be retried. {noformat}
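In the log, the task is killed while the job is still in SETUP, so {{JOB_TASK_COMPLETED}} arrives in a state that cannot handle it and the job goes to ERROR. A possible test-side approach (an assumption, not necessarily the committed fix) is to let the test thread block until the dispatcher reports RUNNING before issuing the kill. The sketch models this with a {{CountDownLatch}}; {{GatedJob}} and its methods are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical job stand-in whose state is advanced by a dispatcher thread. */
final class GatedJob {
  private final CountDownLatch running = new CountDownLatch(1);
  private volatile String state = "SETUP";

  /** Called by the dispatcher thread once job setup has completed. */
  void markRunning() {
    state = "RUNNING";
    running.countDown(); // releases any test thread waiting in awaitRunning()
  }

  /** Blocks the test thread until the job is RUNNING or the timeout expires. */
  boolean awaitRunning(long timeout, TimeUnit unit) {
    try {
      return running.await(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  String getState() {
    return state;
  }
}
```

Only after {{awaitRunning}} returns true would the test send the task kill event, so the completion event can no longer reach the job while it is in SETUP.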
[jira] [Created] (MAPREDUCE-6939) Follow-up on MAPREDUCE-6870
Peter Bacsko created MAPREDUCE-6939: --- Summary: Follow-up on MAPREDUCE-6870 Key: MAPREDUCE-6939 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6939 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Reporter: Peter Bacsko Assignee: Peter Bacsko Priority: Minor Some minor changes should be made after MAPREDUCE-6870 was committed upstream: 1. Fix the JavaDoc in {{JobImpl.java}} 2. Correct the description of the method: it might not be entirely clear what the "improvement" is or what it actually improves 3. Fix a small typo in the name of the new test case
[jira] [Created] (MAPREDUCE-6953) Skip the testcase testJobWithChangePriority if FairScheduler is used
Peter Bacsko created MAPREDUCE-6953: --- Summary: Skip the testcase testJobWithChangePriority if FairScheduler is used Key: MAPREDUCE-6953 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6953 Project: Hadoop Map/Reduce Issue Type: Test Components: client Reporter: Peter Bacsko Assignee: Peter Bacsko We run the unit tests with Fair Scheduler downstream. FS does not support priorities at the moment, so TestMRJobs#testJobWithChangePriority fails. Just add {{Assume.assumeFalse(usingFairScheduler);}} and JUnit will skip the test.
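The issue does not say how {{usingFairScheduler}} is computed. One plausible way, shown here as an assumption, is to compare the configured RM scheduler class ({{yarn.resourcemanager.scheduler.class}}) with the Fair Scheduler class name. A plain {{Map}} stands in for a Hadoop {{Configuration}}, and {{SchedulerCheck}} is a hypothetical helper.

```java
import java.util.Map;

/**
 * Hypothetical helper: derives the usingFairScheduler flag from the
 * configured scheduler class. An unset key is treated as "not Fair
 * Scheduler" in this sketch.
 */
final class SchedulerCheck {
  static final String SCHEDULER_CLASS_KEY = "yarn.resourcemanager.scheduler.class";
  static final String FAIR_SCHEDULER =
      "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler";

  static boolean usingFairScheduler(Map<String, String> conf) {
    return FAIR_SCHEDULER.equals(conf.get(SCHEDULER_CLASS_KEY));
  }
}
```

The resulting flag is then passed to {{Assume.assumeFalse(...)}} at the top of the test, which makes JUnit report the test as skipped rather than failed.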
[jira] [Created] (MAPREDUCE-6954) Disable erasure coding for files that are uploaded to the MR staging area
Peter Bacsko created MAPREDUCE-6954: --- Summary: Disable erasure coding for files that are uploaded to the MR staging area Key: MAPREDUCE-6954 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6954 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Reporter: Peter Bacsko Assignee: Peter Bacsko Depending on the encoder/decoder used and the type of MR workload, EC might negatively affect the performance of an MR job if too many files are localized. In such a scenario, users might want to disable EC in the staging area to speed up execution.
[jira] [Created] (MAPREDUCE-7015) Possible race condition in JHS if the job is not loaded
Peter Bacsko created MAPREDUCE-7015: --- Summary: Possible race condition in JHS if the job is not loaded Key: MAPREDUCE-7015 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7015 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: Peter Bacsko Assignee: Peter Bacsko There could be a race condition inside JHS. In our build environment, {{TestMRJobClient.testJobClient()}} failed with this exception: {noformat} java.io.FileNotFoundException: File does not exist: hdfs://localhost:32836/tmp/hadoop-yarn/staging/history/done_intermediate/jenkins/job_1509975084722_0001_conf.xml at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1266) at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1258) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1258) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:340) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292) at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2123) at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2092) at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2068) at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:460) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.mapreduce.TestMRJobClient.runTool(TestMRJobClient.java:94) at org.apache.hadoop.mapreduce.TestMRJobClient.testConfig(TestMRJobClient.java:551) at org.apache.hadoop.mapreduce.TestMRJobClient.testJobClient(TestMRJobClient.java:167) {noformat} Root cause: 1. MapReduce job completes 2. CLI calls {{cluster.getJob(jobid)}} 3. The job is finished and the client side gets redirected to JHS 4. The job data is missing from CachedHistoryStorage, so JHS tries to find the job 5. First it scans the intermediate directory and finds the job 6. 
The call moveToDone() is scheduled for execution on a separate thread inside moveToDoneExecutor but does not get the chance to run immediately 7. RPC invocation returns with the path pointing to /tmp/hadoop-yarn/staging/history/done_intermediate 8. The call to moveToDone() completes, which moves the contents of done_intermediate to done 9. Hadoop CLI tries to download the config file from done_intermediate, but it's no longer there. Usually step #6 is fast enough to complete before step #7, but sometimes it falls behind, causing this race condition.
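Steps 6 to 9 can be modelled with a single-thread executor: the lookup schedules the move and then reports a location that may already be stale. The sketch contrasts that with a variant that waits on the returned {{Future}} before answering, which is one way to close the window (whether the actual fix did this is not stated in the issue). {{HistoryLookupSketch}} and its methods are hypothetical and only mimic the roles of {{moveToDoneExecutor}} and the RPC reply.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical model of the JHS race; names are illustrative, not Hadoop's. */
final class HistoryLookupSketch {
  private final ExecutorService moveToDoneExecutor =
      Executors.newSingleThreadExecutor();
  private final AtomicReference<String> location =
      new AtomicReference<>("done_intermediate");

  /** Racy: the reply may still point at done_intermediate (steps 6 and 7). */
  String lookupRacy() {
    moveToDoneExecutor.submit(() -> location.set("done"));
    return location.get();
  }

  /** Waits for the scheduled move, so the reply matches the final location. */
  String lookupAfterMove() {
    Future<?> move = moveToDoneExecutor.submit(() -> location.set("done"));
    try {
      move.get(); // block until the background move has finished
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while moving", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("move failed", e.getCause());
    }
    return location.get();
  }

  void shutdown() {
    moveToDoneExecutor.shutdown();
  }
}
```

The trade-off is that the RPC handler now blocks for the duration of the move; an alternative would be to return the final (done) path directly.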
[jira] [Created] (MAPREDUCE-7046) Enhance logging related to retrieving Job
Peter Bacsko created MAPREDUCE-7046: --- Summary: Enhance logging related to retrieving Job Key: MAPREDUCE-7046 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7046 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Reporter: Peter Bacsko Assignee: Peter Bacsko We recently encountered an interesting problem. In one case, Hive Driver was unable to retrieve the status of a MapReduce job. The following stack trace was printed: {noformat} [main] INFO org.apache.hadoop.hive.ql.exec.Task - 2018-01-15 00:18:09,324 Stage-2 map = 0%, reduce = 0%, Cumulative CPU 1679.31 sec [main] ERROR org.apache.hadoop.hive.ql.exec.Task - Ended Job = job_1511036412170_1322169 with exception 'java.io.IOException(Could not find status of job:job_1511036412170_1322169)' java.io.IOException: Could not find status of job:job_1511036412170_1322169 at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:295) at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:549) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:435) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:137) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1782) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1539) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1318) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1127) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1115) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:220) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:172) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:383) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:318) at 
org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:416) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:432) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:726) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:693) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:628) at org.apache.oozie.action.hadoop.HiveMain.runHive(HiveMain.java:325) at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:302) at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:49) {noformat} We examined the logs from the JHS and the AM but did not see anything suspicious. For some reason, {{null}} was returned, and it is not obvious why. The MR job was running at this point. Some ideas: 1. We already have logging in place related to JobClient->AM and JobClient->JHS communication, but it is at TRACE level, which might be too low. It might make more sense to raise the level to DEBUG. 2. Add new {{LOG.debug()}} calls at some crucial points.
[jira] [Created] (MAPREDUCE-7048) AM can still crash after MAPREDUCE-7020
Peter Bacsko created MAPREDUCE-7048: --- Summary: AM can still crash after MAPREDUCE-7020 Key: MAPREDUCE-7048 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7048 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mr-am Affects Versions: 3.1.0, 3.0.1, 2.10.0, 2.9.1, 2.8.4, 2.7.6 Reporter: Peter Bacsko Assignee: Peter Bacsko The testcase TestUberAM#testThreadDumpOnTaskTimeout was supposed to be fixed by MAPREDUCE-7020. However, it still fails, see: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7325/testReport/junit/org.apache.hadoop.mapreduce.v2/TestMRJobs/testThreadDumpOnTaskTimeout/ (note: other tests failed as well, but those look unrelated). When I tried to reproduce it locally, it failed again, although with a slightly different error message (it was actually the same as before): {noformat} [INFO] --- [INFO] T E S T S [INFO] --- [INFO] Running org.apache.hadoop.mapreduce.v2.TestUberAM [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 128.192 s <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.TestUberAM [ERROR] testThreadDumpOnTaskTimeout(org.apache.hadoop.mapreduce.v2.TestUberAM) Time elapsed: 79.539 s <<< FAILURE! java.lang.AssertionError: No AppMaster log found! 
expected:<1> but was:<2> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.mapreduce.v2.TestMRJobs.testThreadDumpOnTaskTimeout(TestMRJobs.java:1228) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) {noformat} *Root cause:* {{System.exit()}} is still invoked in {{Task.statusUpdate()}}:
{noformat}
public void statusUpdate(TaskUmbilicalProtocol umbilical) throws IOException {
  int retries = MAX_RETRIES;
  while (true) {
    try {
      if (!umbilical.statusUpdate(getTaskID(), taskStatus).getTaskFound()) {
        LOG.warn("Parent died. Exiting "+taskId);
        System.exit(66);
      }
      taskStatus.clearStatus();
      return;
...
{noformat}
At this point, the task is not found, so {{getTaskFound()}} on the result of {{umbilical.statusUpdate()}} returns false and {{System.exit(66)}} is called. Checking whether we run in uber mode seems to solve the problem.
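The suggested check can be sketched as follows: when the task runs in uber mode it shares the AM's JVM, so exiting the JVM on a "task not found" response would also terminate the AM. {{StatusUpdateGuard}} and {{onStatusUpdate}} are hypothetical, and injecting the exit call as an {{IntConsumer}} is a testing convenience assumed here, not how {{Task}} is structured; only the exit code 66 comes from the snippet above.

```java
import java.util.function.IntConsumer;

/**
 * Hypothetical sketch: only exit the JVM on a lost parent when the task runs
 * in its own container. In uber mode the task runs inside the AM's JVM.
 */
final class StatusUpdateGuard {
  private final boolean uberMode;
  private final IntConsumer exiter; // e.g. System::exit; a stub in tests

  StatusUpdateGuard(boolean uberMode, IntConsumer exiter) {
    this.uberMode = uberMode;
    this.exiter = exiter;
  }

  /**
   * Handles the taskFound flag of a status update response.
   * Returns true if the task should stop reporting status.
   */
  boolean onStatusUpdate(boolean taskFound) {
    if (taskFound) {
      return false; // normal case: keep running
    }
    if (!uberMode) {
      exiter.accept(66); // same exit code as Task.statusUpdate()
    }
    return true; // in uber mode, stop reporting but leave the AM JVM alive
  }
}
```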
[jira] [Created] (MAPREDUCE-7049) Testcase TestMRJobs#testJobClassloaderWithCustomClasses fails
Peter Bacsko created MAPREDUCE-7049: --- Summary: Testcase TestMRJobs#testJobClassloaderWithCustomClasses fails Key: MAPREDUCE-7049 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7049 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client, test Reporter: Peter Bacsko Assignee: Peter Bacsko The testcase TestMRJobs#testJobClassloaderWithCustomClasses fails consistently with this error: {noformat} [INFO] --- [INFO] T E S T S [INFO] --- [INFO] Running org.apache.hadoop.mapreduce.v2.TestMRJobs [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 54.325 s <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.TestMRJobs [ERROR] testJobClassloaderWithCustomClasses(org.apache.hadoop.mapreduce.v2.TestMRJobs) Time elapsed: 10.531 s <<< FAILURE! java.lang.AssertionError: Job status: Application application_1517928628935_0001 failed 2 times due to AM Container for appattempt_1517928628935_0001_02 exited with exitCode: 1 Failing this attempt.Diagnostics: [2018-02-06 15:50:38.688]Exception from container-launch. Container id: container_1517928628935_0001_02_01 Exit code: 1 [2018-02-06 15:50:38.693]Container exited with a non-zero exit code 1. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : Last 4096 bytes of stderr : log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. [2018-02-06 15:50:38.694]Container exited with a non-zero exit code 1. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : Last 4096 bytes of stderr : log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. 
For more detailed output, check the application tracking page: http://ubuntu:46235/cluster/app/application_1517928628935_0001 Then click on links to logs of each attempt. . Failing the application. at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.mapreduce.v2.TestMRJobs.testJobClassloader(TestMRJobs.java:529) at org.apache.hadoop.mapreduce.v2.TestMRJobs.testJobClassloaderWithCustomClasses(TestMRJobs.java:477) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) {noformat} Today I found the offending commit with {{git bisect}} and this failure is caused by {{YARN-2185}}. 
The application master fails because of the following error: {noformat} 2018-02-05 17:15:18,530 DEBUG [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException 1: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:265) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1694) Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$3.call(MRAppMaster.java:554) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$3.call(MRAppMaster.java:534) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1802) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:534) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:311) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$6.run(MRAppMaster.java:1760)
[jira] [Created] (MAPREDUCE-7052) TestFixedLengthInputFormat#testFormatCompressedIn is flaky
Peter Bacsko created MAPREDUCE-7052: --- Summary: TestFixedLengthInputFormat#testFormatCompressedIn is flaky Key: MAPREDUCE-7052 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7052 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, test Reporter: Peter Bacsko Assignee: Peter Bacsko Sometimes the test case TestFixedLengthInputFormat#testFormatCompressedIn can fail with the following error: {noformat} java.lang.OutOfMemoryError: Requested array size exceeds VM limit at org.apache.hadoop.mapred.TestFixedLengthInputFormat.runRandomTests(TestFixedLengthInputFormat.java:322) at org.apache.hadoop.mapred.TestFixedLengthInputFormat.testFormatCompressedIn(TestFixedLengthInputFormat.java:90) {noformat} *Root cause:* under special circumstances, the following line can yield a huge number:
{noformat}
// Test a split size that is less than record len
numSplits = (int)(fileSize/Math.floor(recordLength/2));
{noformat}
For example, let {{seed}} be 2026428718. This causes {{recordLength}} to be 1 at iteration 19. {{Math.floor(recordLength/2)}} then evaluates to 0.0, and dividing a positive {{fileSize}} by 0.0 yields positive infinity. Casting that to {{int}} yields {{Integer.MAX_VALUE}}. Eventually we get an OOME because the test wants to create a huge {{InputSplit}} array.
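The arithmetic can be checked in isolation. The sketch assumes {{fileSize}} is a {{long}} and {{recordLength}} an {{int}} (the declarations are not shown in the issue). It relies on standard Java semantics: {{1/2}} is 0 in integer division, {{x/0.0}} is positive infinity for positive {{x}}, and a narrowing cast of positive infinity to {{int}} gives {{Integer.MAX_VALUE}}. The {{guarded}} variant, which clamps the split size to at least one byte, is a hypothetical alternative; how the real test should handle {{recordLength == 1}} is a separate decision.

```java
/** Reproduces the split-count arithmetic from the test in isolation. */
final class SplitCountSketch {
  /** Same expression as in the test. */
  static int original(long fileSize, int recordLength) {
    // recordLength / 2 is integer division: 1 / 2 == 0, so Math.floor(0) == 0.0
    // and fileSize / 0.0 == +Infinity, which casts to Integer.MAX_VALUE.
    return (int) (fileSize / Math.floor(recordLength / 2));
  }

  /** Hypothetical guard: never use a split size below one byte. */
  static int guarded(long fileSize, int recordLength) {
    long splitSize = Math.max(1, recordLength / 2);
    return (int) Math.min(Integer.MAX_VALUE, fileSize / splitSize);
  }
}
```

For example, {{original(1000L, 1)}} returns {{Integer.MAX_VALUE}}, while {{guarded(1000L, 1)}} returns 1000; for {{recordLength = 10}} both return 200.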
[jira] [Created] (MAPREDUCE-7056) Ensure that mapreduce.job.reduces is not negative
Peter Bacsko created MAPREDUCE-7056: --- Summary: Ensure that mapreduce.job.reduces is not negative Key: MAPREDUCE-7056 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7056 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Reporter: Peter Bacsko Assignee: Peter Bacsko Recently we've seen a strange problem that was related to {{mapreduce.job.reduces}} being set to -1. If this value is negative, two things can happen: 1. If we use the old API, the mappers will pass, but the number of reducers will be recorded as "-1" when the job is later opened in the JHS. This can confuse Hadoop users. 2. If we use the new API, we'll see a not-so-obvious stack trace: {noformat} 2018-02-20 06:37:35,493 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output 2018-02-20 06:37:35,507 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output 2018-02-20 06:37:35,507 INFO [main] org.apache.hadoop.mapred.MapTask: kvbuffer is null. Skipping flush. 2018-02-20 06:37:35,508 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IllegalArgumentException at java.nio.ByteBuffer.allocate(ByteBuffer.java:334) at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:51) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1891) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1527) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:735) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:805) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) {noformat} and the job fails.
To avoid this, we should either fail if this property is negative or set it to 0.
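Both options can be expressed as a small validation step applied to the configured value before the job is set up. {{ReducesCheck}} and its methods are hypothetical names; where such a check would live in the client is not specified in the issue.

```java
/** Two hypothetical ways to handle a negative mapreduce.job.reduces value. */
final class ReducesCheck {
  static final String KEY = "mapreduce.job.reduces";

  /** Option 1: fail fast with a clear message instead of a late stack trace. */
  static int requireNonNegative(int reduces) {
    if (reduces < 0) {
      throw new IllegalArgumentException(
          KEY + " must not be negative, was " + reduces);
    }
    return reduces;
  }

  /** Option 2: fall back to 0, i.e. run the job as map-only. */
  static int clampToZero(int reduces) {
    return Math.max(0, reduces);
  }
}
```

Failing fast surfaces the misconfiguration to the user, while clamping silently changes the job into a map-only job; the first is usually easier to diagnose.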
[jira] [Created] (MAPREDUCE-7064) Flaky test TestTaskAttempt#testReducerCustomResourceTypes
Peter Bacsko created MAPREDUCE-7064: --- Summary: Flaky test TestTaskAttempt#testReducerCustomResourceTypes Key: MAPREDUCE-7064 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7064 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, test Reporter: Peter Bacsko Assignee: Peter Bacsko The test {{TestTaskAttempt#testReducerCustomResourceTypes}} can occasionally fail with the following error: {noformat} org.apache.hadoop.yarn.exceptions.ResourceNotFoundException: Unknown resource 'a-custom-resource'. Known resources are [name: memory-mb, units: Mi, type: COUNTABLE, value: 0, minimum allocation: 0, maximum allocation: 9223372036854775807, name: vcores, units: , type: COUNTABLE, value: 0, minimum allocation: 0, maximum allocation: 9223372036854775807] at org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt.createReduceTaskAttemptImplForTest(TestTaskAttempt.java:434) at org.apache.hadoop.mapreduce.v2.app.job.impl.TestTaskAttempt.testReducerCustomResourceTypes(TestTaskAttempt.java:1535) {noformat} The root cause seems to be interference from previous tests that start instance(s) of {{FailingAttemptsMRApp}} or {{FailingAttemptsDuringAssignedMRApp}}. When I disabled these tests, {{testReducerCustomResourceTypes}} always passed.
[jira] [Created] (MAPREDUCE-7132) Check erasure coding in JobSplitWriter to avoid warnings
Peter Bacsko created MAPREDUCE-7132: --- Summary: Check erasure coding in JobSplitWriter to avoid warnings Key: MAPREDUCE-7132 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7132 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 3.1.1 Reporter: Peter Bacsko Assignee: Peter Bacsko Currently, {{JobSplitWriter}} compares the number of hosts for a given block against a static limit that comes from {{mapreduce.job.max.split.locations}}. However, an EC schema such as RS-10-4 (10 data and 4 parity units) requires at least 14 hosts. In this case, 14 block locations are returned, and if that exceeds the configured limit, {{JobSplitWriter}} prints a warning, which can confuse users. A possible solution is to check whether EC is enabled for a block and, if needed, raise this limit dynamically.
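A minimal sketch of the proposed dynamic limit, assuming the caller already knows whether the block is erasure-coded and how many data and parity units its EC policy uses (in practice these would come from HDFS). The class and method names are hypothetical, and the configured value of 10 is only an example, not necessarily the default.

```java
/**
 * Hypothetical sketch of the proposed check; not the actual JobSplitWriter code.
 */
public class SplitLocationLimit {
    /** Raises the configured limit to the block group width for erasure-coded blocks. */
    static int effectiveMaxLocations(int configuredMax, boolean erasureCoded,
                                     int dataUnits, int parityUnits) {
        if (!erasureCoded) {
            return configuredMax;
        }
        // Each data and parity unit of a block group may be stored on a different host.
        return Math.max(configuredMax, dataUnits + parityUnits);
    }

    /** A warning is only warranted when the locations exceed the (effective) limit. */
    static boolean shouldWarn(int locations, int limit) {
        return locations > limit;
    }

    public static void main(String[] args) {
        int configured = 10; // example value only
        int limit = effectiveMaxLocations(configured, true, 10, 4); // RS-10-4
        System.out.println("limit=" + limit + ", warn=" + shouldWarn(14, limit));
    }
}
```

Using {{Math.max}} keeps any larger limit an administrator has configured and only raises it when the EC block group is wider.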
[jira] [Created] (MAPREDUCE-7144) Speculative execution can cause race condition
Peter Bacsko created MAPREDUCE-7144: --- Summary: Speculative execution can cause race condition Key: MAPREDUCE-7144 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7144 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Reporter: Peter Bacsko In our internal build environment, we observed that the test case {{TestMRIntermediateDataEncryption#testMultipleReducers}} was flaky and failed randomly on multiple branches. After a long investigation, it turned out that the failures were caused by speculative execution and the timing issues around it. Detailed explanation: 1. The AppMaster speculatively starts two reducers: {noformat} 2018-09-19 04:09:31,022 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator: ATTEMPT_START task_1537355349087_0001_r_01 ... 2018-09-19 04:09:31,025 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator: ATTEMPT_START task_1537355349087_0001_r_00 {noformat} 2. Both attempts are scheduled and run in parallel: {noformat} 2018-09-19 04:09:31,025 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1537355349087_0001_r_00_0 TaskAttempt Transitioned from ASSIGNED to RUNNING ... 2018-09-19 04:09:46,036 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1537355349087_0001_r_00_1 TaskAttempt Transitioned from ASSIGNED to RUNNING {noformat} 3. attempt_1537355349087_0001_r_00_1 finishes first and reaches a progress of 1.0: {noformat} 2018-09-19 04:10:05,747 INFO [IPC Server handler 3 on 36796] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1537355349087_0001_r_00_1 is : 1.0 2018-09-19 04:10:05,751 INFO [IPC Server handler 2 on 36796] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from attempt_1537355349087_0001_r_00_1 {noformat} 4. 
There's no need for attempt_1537355349087_0001_r_00_0, so the AppMaster decides to kill it: {noformat} 2018-09-19 04:10:05,755 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Issuing kill to other attempt attempt_1537355349087_0001_r_00_0 {noformat} 5. Right after this, the MapReduce job transitions to the COMMITTING phase, which involves moving files on HDFS, deleting the temporary directory, and creating a file named _SUCCESS: {noformat} 2018-09-19 04:10:05,836 DEBUG [IPC Server handler 3 on 45026] hdfs.StateChange (FSNamesystem.java:deleteInt(4181)) - DIR* NameSystem.delete: /test/output/_temporary 2018-09-19 04:10:05,836 DEBUG [IPC Server handler 3 on 45026] hdfs.StateChange (FSDirectory.java:delete(1334)) - DIR* FSDirectory.delete: /test/output/_temporary 2018-09-19 04:10:05,837 DEBUG [IPC Server handler 3 on 45026] hdfs.StateChange (FSDirectory.java:unprotectedDelete(1480)) - DIR* FSDirectory.unprotectedDelete: _temporary is removed 2018-09-19 04:10:05,837 DEBUG [IPC Server handler 3 on 45026] hdfs.StateChange (FSNamesystem.java:deleteInternal(4251)) - DIR* Namesystem.delete: /test/output/_temporary is removed 2018-09-19 04:10:05,837 INFO [IPC Server handler 3 on 45026] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(9826)) - allowed=true ugi=jenkins (auth:SIMPLE) ip=/127.0.0.1 cmd=delete src=/test/output/_temporary dst=null perm=null proto=rpc 2018-09-19 04:10:05,837 DEBUG [IPC Server handler 3 on 45026] metrics.TopMetrics (TopMetrics.java:report(122)) - a metric is reported: cmd: delete user: jenkins (auth:SIMPLE) 2018-09-19 04:10:05,837 DEBUG [IPC Server handler 3 on 45026] top.TopAuditLogger (TopAuditLogger.java:logAuditEvent(78)) - --- logged event for top service: allowed=true ugi=jenkins (auth:SIMPLE) ip=/127.0.0.1 cmd=delete src=/test/output/_temporary dst=null perm=null 2018-09-19 04:10:05,839 DEBUG [IPC Server handler 2 on 45026] hdfs.StateChange (NameNodeRpcServer.java:create(596)) - *DIR* NameNode.create: 
file /test/output/_SUCCESS for DFSClient_NONMAPREDUCE_-188083900_1 at 127.0.0.1 2018-09-19 04:10:05,839 DEBUG [IPC Server handler 2 on 45026] hdfs.StateChange (FSNamesystem.java:startFileInt(2748)) - DIR* NameSystem.startFile: src=/test/output/_SUCCESS, holder=DFSClient_NONMAPREDUCE_-188083900_1, clientMachine=127.0.0.1, createParent=true, replication=2, createFlag=[CREATE, OVERWRITE], blockSize=134217728, supportedVersions=[CryptoProtocolVersion{description='Encryption zones', version=2, unknownValue=null}] 2018-09-19 04:10:05,839 DEBUG [IPC Server handler 2 on 45026] namenode.FSDirectory (FSDirectory.java:copyINodeDefaultAcl(2272)) - child: _SUCCESS, posixAclInheritanceEnabled: false, modes: { masked: rw-rw-rw-, unmasked
[jira] [Created] (MAPREDUCE-7152) LD_LIBRARY_PATH is always passed from MR AM to tasks
Peter Bacsko created MAPREDUCE-7152: --- Summary: LD_LIBRARY_PATH is always passed from MR AM to tasks Key: MAPREDUCE-7152 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7152 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Peter Bacsko Assignee: Peter Bacsko {{LD_LIBRARY_PATH}} is set to {{$HADOOP_COMMON_HOME/lib/native}} by default in Hadoop (as part of {{mapreduce.admin.user.env}} and {{yarn.app.mapreduce.am.user.env}}) and is passed as an environment variable from the AM container to the task containers in the container launch context. If {{HADOOP_COMMON_HOME}} differs between the AM node and the task node, tasks fail to load the native library. A reliable way to fix this is to add {{LD_LIBRARY_PATH}} to {{yarn.nodemanager.admin-env}} instead. Another approach is to evaluate {{LD_LIBRARY_PATH}} lazily on the NM side.
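The difference between resolving {{$HADOOP_COMMON_HOME/lib/native}} on the AM node (eager) and on the task's node (lazy) can be sketched with a small, hypothetical {{$NAME}} expander. The expander, the class name and the two example install paths are assumptions for illustration; this does not model how YARN actually expands container environments.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical illustration of eager vs. lazy variable expansion;
 * not YARN's actual environment handling.
 */
public class EnvExpansionDemo {
    private static final Pattern VAR = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    /** Replaces every $NAME with its value in env; unknown names expand to "". */
    static String expand(String value, Map<String, String> env) {
        Matcher m = VAR.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = env.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "$HADOOP_COMMON_HOME/lib/native";
        Map<String, String> amEnv = new HashMap<>();
        amEnv.put("HADOOP_COMMON_HOME", "/opt/hadoop");      // example AM-node layout
        Map<String, String> nmEnv = new HashMap<>();
        nmEnv.put("HADOOP_COMMON_HOME", "/usr/lib/hadoop");  // example task-node layout

        // Eager: resolved on the AM node, so the task inherits the AM's path.
        String eager = expand(template, amEnv);
        // Lazy: the literal template is shipped and resolved on the task's node.
        String lazy = expand(template, nmEnv);
        System.out.println("eager=" + eager + " lazy=" + lazy);
    }
}
```

Once the value has been expanded eagerly, it no longer contains a variable, so expanding it again on the task node cannot correct the path.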
[jira] [Created] (MAPREDUCE-7156) NullPointerException when reaching max shuffle connections
Peter Bacsko created MAPREDUCE-7156: --- Summary: NullPointerException when reaching max shuffle connections Key: MAPREDUCE-7156 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7156 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Reporter: Peter Bacsko Assignee: Peter Bacsko
[jira] [Created] (MAPREDUCE-7159) FrameworkUploader: ensure proper permissions of generated framework tar.gz if restrictive umask is used
Peter Bacsko created MAPREDUCE-7159: --- Summary: FrameworkUploader: ensure proper permissions of generated framework tar.gz if restrictive umask is used Key: MAPREDUCE-7159 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7159 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 3.1.1 Reporter: Peter Bacsko Assignee: Peter Bacsko
[jira] [Created] (MAPREDUCE-7175) JobSubmitter: validateFilePath() throws an exception because it requests a local FS unnecessarily
Peter Bacsko created MAPREDUCE-7175: --- Summary: JobSubmitter: validateFilePath() throws an exception because it requests a local FS unnecessarily Key: MAPREDUCE-7175 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7175 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 2.9.2, 3.1.1 Reporter: Peter Bacsko Assignee: Peter Bacsko After a security fix, we receive the following exception in Oozie if we want to use {{mapreduce.job.log4j-properties-file}} {noformat} org.apache.oozie.action.ActionExecutorException: UnsupportedOperationException: Accessing local file system is not allowed at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:446) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1246) at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1424) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63) at org.apache.oozie.command.XCommand.call(XCommand.java:286) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.UnsupportedOperationException: Accessing local file system is not allowed at org.apache.hadoop.fs.RawLocalFileSystem.initialize(RawLocalFileSystem.java:48) at org.apache.hadoop.fs.LocalFileSystem.initialize(LocalFileSystem.java:47) at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2816) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:98) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2853) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2835) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387) at org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:358) at org.apache.hadoop.mapreduce.JobResourceUploader.validateFilePath(JobResourceUploader.java:303) at org.apache.hadoop.mapreduce.JobResourceUploader.copyLog4jPropertyFile(JobResourceUploader.java:248) at org.apache.hadoop.mapreduce.JobResourceUploader.addLog4jToDistributedCache(JobResourceUploader.java:223) at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:175) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:578) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:573) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:573) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:564) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1231) ... 
11 more{noformat} Note that this happens even if the scheme is {{hdfs://}}. The solution is what was mentioned in MAPREDUCE-6052: move {{FileSystem localFs = FileSystem.getLocal(conf);}} inside the {{if}} block.
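The proposed fix is to request the local file system only on the branch that needs it. Below is a minimal sketch of that idea using only the JDK. A {{Supplier}} stands in for {{FileSystem.getLocal(conf)}} so that we can see whether it is invoked. The class, the method body and the "no scheme or file scheme" condition are assumptions; this is not the actual {{JobResourceUploader.validateFilePath}} code.

```java
import java.net.URI;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of the fix; not the actual JobResourceUploader code.
 * The Supplier stands in for FileSystem.getLocal(conf).
 */
public class LazyLocalFsDemo {
    /** Qualifies a path, requesting the local file system only for local paths. */
    static String validateFilePath(String file, Supplier<String> localFsPrefix) {
        URI uri = URI.create(file);
        String scheme = uri.getScheme();
        if (scheme == null || "file".equals(scheme)) {
            // The local file system is obtained here, inside the if block, and nowhere else.
            return localFsPrefix.get() + uri.getPath();
        }
        // Paths with another scheme (e.g. hdfs://) never touch the local file system.
        return file;
    }

    public static void main(String[] args) {
        Supplier<String> localFs = () -> {
            // Mimics an environment where local file system access is forbidden.
            throw new UnsupportedOperationException("Accessing local file system is not allowed");
        };
        System.out.println(validateFilePath("hdfs://nn:8020/conf/log4j.properties", localFs));
    }
}
```

With the lookup moved inside the branch, an {{hdfs://}} path succeeds even where local access is forbidden. A local path would still fail there, which is the expected behaviour.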
[jira] [Resolved] (MAPREDUCE-7175) JobSubmitter: validateFilePath() throws an exception because it requests a local FS unnecessarily
[ https://issues.apache.org/jira/browse/MAPREDUCE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko resolved MAPREDUCE-7175. - Resolution: Duplicate > JobSubmitter: validateFilePath() throws an exception because it requests a > local FS unnecessarily > - > > Key: MAPREDUCE-7175 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7175 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Affects Versions: 3.1.1, 2.9.2 >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > > After a security fix, we receive the following exception in Oozie if we want > to use {{mapreduce.job.log4j-properties-file}} > {noformat} > org.apache.oozie.action.ActionExecutorException: > UnsupportedOperationException: Accessing local file system is not allowed > at > org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:446) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1246) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1424) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:232) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63) > at org.apache.oozie.command.XCommand.call(XCommand.java:286) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:332) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:261) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:179) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.UnsupportedOperationException: 
Accessing local file > system is not allowed > at > org.apache.hadoop.fs.RawLocalFileSystem.initialize(RawLocalFileSystem.java:48) > at org.apache.hadoop.fs.LocalFileSystem.initialize(LocalFileSystem.java:47) > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2816) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:98) > at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2853) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2835) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387) > at org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:358) > at > org.apache.hadoop.mapreduce.JobResourceUploader.validateFilePath(JobResourceUploader.java:303) > at > org.apache.hadoop.mapreduce.JobResourceUploader.copyLog4jPropertyFile(JobResourceUploader.java:248) > at > org.apache.hadoop.mapreduce.JobResourceUploader.addLog4jToDistributedCache(JobResourceUploader.java:223) > at > org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:175) > at > org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:99) > at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:194) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307) > at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924) > at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:578) > at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:573) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924) > at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:573) > at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:564) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1231) > ... 11 more{noformat} > > Note that this happens even if the scheme is {{hdfs://}}. The solution is > what mentioned in MAPREDUCE-6052: move > {noformat} > FileSystem localFs = FileSystem.getLocal(conf);{noformat} > inside the {{if}} block.
[jira] [Reopened] (MAPREDUCE-6441) Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes
[ https://issues.apache.org/jira/browse/MAPREDUCE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko reopened MAPREDUCE-6441: - Reopening this to attach a patch for branch-3.1 too. > Improve temporary directory name generation in LocalDistributedCacheManager > for concurrent processes > > > Key: MAPREDUCE-6441 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6441 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: William Watson >Assignee: Haibo Chen >Priority: Major > Fix For: 3.2.0 > > Attachments: HADOOP-10924.02.patch, > HADOOP-10924.03.jobid-plus-uuid.patch, MAPREDUCE-6441-branch-3.1.001.patch, > MAPREDUCE-6441.004.patch, MAPREDUCE-6441.005.patch, MAPREDUCE-6441.006.patch, > MAPREDUCE-6441.008.patch, MAPREDUCE-6441.009.patch, MAPREDUCE-6441.010.patch, > MAPREDUCE-6441.011.patch > > > Kicking off many sqoop processes in different threads results in: > {code} > 2014-08-01 13:47:24 -0400: INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: > Encountered IOException running import job: java.io.IOException: > java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot > overwrite non empty destination directory > /tmp/hadoop-hadoop/mapred/local/1406915233073 > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) > 2014-08-01 13:47:24 -0400: INFO -at > java.security.AccessController.doPrivileged(Native 
Method) > 2014-08-01 13:47:24 -0400: INFO -at > javax.security.auth.Subject.doAs(Subject.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.run(Sqoop.java:145) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runTool(Sqoop.java:220) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runTool(Sqoop.java:229) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.main(Sqoop.java:238) > {code} > If two are kicked off in the same second. The issue is the following lines of > code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class: > {code} > // Generating unique numbers for FSDownload. 
> AtomicLong uniqueNumberGenerator = >new AtomicLong(System.currentTimeMillis()); > {code} > and > {code} > Long.toString(uniqueNumberGenerator.incrementAndGet())), > {code}
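In the quoted code, each job seeds its own counter with {{System.currentTimeMillis()}}. Two jobs started close together can therefore hand out overlapping values, not just identical first values: a counter seeded at t yields t+1, t+2, ..., and one seeded at t+1 yields t+2 first. The sketch below shows this and contrasts it with a name built from a job id plus a random UUID. That alternative is only suggested by the attachment name HADOOP-10924.03.jobid-plus-uuid.patch; the exact format used here, and the class and method names, are assumptions.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical illustration; not the actual LocalDistributedCacheManager code.
 * Each job creates its own generator, as in the quoted snippet.
 */
public class UniqueDirNameDemo {
    /** Timestamp-seeded counter: unique within one generator, not across generators. */
    static String timestampName(AtomicLong generator) {
        return Long.toString(generator.incrementAndGet());
    }

    /** Job id plus a random UUID: does not depend on when the generator was created. */
    static String jobIdPlusUuidName(String jobId) {
        return jobId + "_" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        AtomicLong first = new AtomicLong(now);       // job started at t
        AtomicLong second = new AtomicLong(now + 1);  // job started 1 ms later
        timestampName(first);                         // first job uses t + 1
        String a = timestampName(first);              // first job uses t + 2
        String b = timestampName(second);             // second job also uses t + 2
        System.out.println("collision=" + a.equals(b));
    }
}
```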
[jira] [Reopened] (MAPREDUCE-7240) Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER' cause job error
[ https://issues.apache.org/jira/browse/MAPREDUCE-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Bacsko reopened MAPREDUCE-7240: - Reopening it to attach patches for branch-3.2 and branch-3.1. > Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at > SUCCESS_FINISHING_CONTAINER' cause job error > > > Key: MAPREDUCE-7240 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7240 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.2 >Reporter: luhuachao >Assignee: luhuachao >Priority: Critical > Labels: Reviewed, applicationmaster, mrv2 > Fix For: 3.3.0 > > Attachments: MAPREDUCE-7240-001.patch, MAPREDUCE-7240-002.patch, > application_1566552310686_260041.log > > > *log in appmaster* > {noformat} > 2019-09-03 17:18:43,090 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_52_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_49_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_51_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_50_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_53_0 ... 
> raising fetch failure to map > 2019-09-03 17:18:43,092 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > attempt_1566552310686_260041_m_52_0 transitioned from state SUCCEEDED to > FAILED, event type is TA_TOO_MANY_FETCH_FAILURE and nodeId=yarn095:45454 > 2019-09-03 17:18:43,092 ERROR [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle > this event at current state for attempt_1566552310686_260041_m_49_0 > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1450) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:745) > 2019-09-03 17:18:43,093 ERROR [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle > this event at current state for attempt_1566552310686_260041_m_51_0 > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER > at > 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458) > at > org.apache.hadoop.
[jira] [Created] (MAPREDUCE-7250) FrameworkUploader: add option to skip replication check
Peter Bacsko created MAPREDUCE-7250: --- Summary: FrameworkUploader: add option to skip replication check Key: MAPREDUCE-7250 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7250 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Reporter: Peter Bacsko Assignee: Peter Bacsko The framework uploader tool has this piece of code, which makes sure that all blocks of the uploaded MapReduce tarball have been replicated: {noformat} while(endTime - startTime < timeout * 1000 && currentReplication < acceptableReplication) { Thread.sleep(1000); endTime = System.currentTimeMillis(); currentReplication = getSmallestReplicatedBlockCount(); } {noformat} There are cases, however, when we don't want to wait for this (e.g. when we want to speed up a Hadoop installation). I suggest adding a {{--skiprelicationcheck}} switch which disables this replication check.
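A minimal, self-contained sketch of the quoted polling loop with the proposed skip flag in front of it. An {{IntSupplier}} stands in for {{getSmallestReplicatedBlockCount()}}. The method name, the boolean parameter and the return convention are assumptions, not the actual FrameworkUploader code.

```java
import java.util.function.IntSupplier;

/**
 * Hypothetical sketch of the proposed switch; not the actual FrameworkUploader code.
 * smallestReplication stands in for getSmallestReplicatedBlockCount().
 */
public class ReplicationWaitDemo {
    /**
     * Waits until every block reaches acceptableReplication or the timeout expires.
     * Returns true if the target was reached or the check was skipped.
     */
    static boolean waitForReplication(boolean skipReplicationCheck, int acceptableReplication,
                                      long timeoutSeconds, IntSupplier smallestReplication) {
        if (skipReplicationCheck) {
            return true; // do not query replication at all
        }
        long startTime = System.currentTimeMillis();
        long endTime = startTime;
        int currentReplication = smallestReplication.getAsInt();
        // Same polling loop as quoted in the ticket, with the timeout in seconds.
        while (endTime - startTime < timeoutSeconds * 1000
                && currentReplication < acceptableReplication) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
            endTime = System.currentTimeMillis();
            currentReplication = smallestReplication.getAsInt();
        }
        return currentReplication >= acceptableReplication;
    }

    public static void main(String[] args) {
        IntSupplier neverQueried = () -> {
            throw new IllegalStateException("replication should not be queried");
        };
        System.out.println(waitForReplication(true, 3, 60, neverQueried));
    }
}
```

Checking the flag before the first replication query means a skipped check costs nothing, not even one round trip to the NameNode.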
[jira] [Created] (MAPREDUCE-7273) JHS: make sure that Kerberos relogin is performed when KDC becomes offline then online again
Peter Bacsko created MAPREDUCE-7273: --- Summary: JHS: make sure that Kerberos relogin is performed when KDC becomes offline then online again Key: MAPREDUCE-7273 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7273 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Peter Bacsko Assignee: Peter Bacsko In JHS, if the KDC goes offline, the IPC layer does try to relogin, but this is not always enough: the next retry only happens 60 seconds later. If the KDC comes back in the meantime, the following error might occur: {noformat} 2020-04-09 03:27:52,075 DEBUG ipc.Server (Server.java:processSaslToken(1952)) - Have read input token of size 708 for processing by saslServer.evaluateResponse() 2020-04-09 03:27:52,077 DEBUG ipc.Server (Server.java:saslProcess(1829)) - javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Invalid argument (400) - Cannot find key of appropriate type to decrypt AP REP - AES128 CTS mode with HMAC SHA1-96)] at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:199) ... {noformat} When this happens, JHS has to be restarted.
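The ticket only states that, after the KDC goes offline, the next relogin retry happens 60 seconds later. The class below is a hypothetical model of such a throttle, showing why a KDC that comes back shortly after a failed attempt is not picked up until the window expires. It is not Hadoop's {{UserGroupInformation}} logic, and it does not show the eventual fix.

```java
/**
 * Hypothetical model of a relogin attempt throttled to one per interval;
 * not Hadoop's UserGroupInformation logic.
 */
public class ReloginThrottleDemo {
    private final long minIntervalMillis;
    private long lastAttemptMillis;
    private boolean attempted;

    ReloginThrottleDemo(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Returns true if a relogin may be attempted at nowMillis, and records the attempt. */
    boolean tryRelogin(long nowMillis) {
        if (attempted && nowMillis - lastAttemptMillis < minIntervalMillis) {
            return false; // too soon: the caller must wait for the next window
        }
        attempted = true;
        lastAttemptMillis = nowMillis;
        return true;
    }

    public static void main(String[] args) {
        ReloginThrottleDemo throttle = new ReloginThrottleDemo(60_000);
        System.out.println(throttle.tryRelogin(0));      // attempt allowed, but the KDC is down
        System.out.println(throttle.tryRelogin(10_000)); // KDC is back, attempt still throttled
        System.out.println(throttle.tryRelogin(60_000)); // next window opens
    }
}
```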
[jira] [Created] (MAPREDUCE-7302) Upgrading to JUnit 4.13 causes tests in TestFetcher.testCorruptedIFile() fail
Peter Bacsko created MAPREDUCE-7302: --- Summary: Upgrading to JUnit 4.13 causes tests in TestFetcher.testCorruptedIFile() fail Key: MAPREDUCE-7302 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7302 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Peter Bacsko Assignee: Peter Bacsko See related ticket YARN-10460. JUnit 4.13 causes the same test failure.
[jira] [Created] (MAPREDUCE-7303) Fix TestJobResourceUploader failures after HADOOP-16878
Peter Bacsko created MAPREDUCE-7303: --- Summary: Fix TestJobResourceUploader failures after HADOOP-16878 Key: MAPREDUCE-7303 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7303 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Peter Bacsko Assignee: Peter Bacsko Currently, two test cases fail with an NPE: {{org.apache.hadoop.mapreduce.TestJobResourceUploader.testOriginalPathIsRoot()}} and {{org.apache.hadoop.mapreduce.TestJobResourceUploader.testOriginalPathEndsInSlash()}}. The root cause is the src/dst qualified path check introduced by HADOOP-16878.
[jira] [Created] (MAPREDUCE-7352) ArithmeticException in some MapReduce tests
Peter Bacsko created MAPREDUCE-7352: --- Summary: ArithmeticException in some MapReduce tests Key: MAPREDUCE-7352 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7352 Project: Hadoop Map/Reduce Issue Type: Task Components: test Reporter: Peter Bacsko Assignee: Peter Bacsko There are some ArithmeticException failures in certain MapReduce test cases, for example: {noformat} 2021-06-14 14:14:20,078 INFO [main] service.AbstractService (AbstractService.java:noteFailure(267)) - Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state STARTED java.lang.ArithmeticException: / by zero at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:304) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1015) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:141) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1544) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1263) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194) at org.apache.hadoop.mapreduce.v2.app.MRApp.submit(MRApp.java:301) at org.apache.hadoop.mapreduce.v2.app.MRApp.submit(MRApp.java:285) at org.apache.hadoop.mapreduce.v2.app.TestMRApp.testUpdatedNodes(TestMRApp.java:223) {noformat} We have to set {{detailsInterval}} when the async dispatcher is spied on: for some reason, even though {{serviceInit()}} is called, this variable remains zero.
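In Java, integer division or remainder by zero throws {{ArithmeticException}} with the message "/ by zero". The sketch below assumes, as the error and the proposed fix suggest, that the handler performs a modulo check against {{detailsInterval}}. The class is a hypothetical stand-in for the dispatcher, and 5000 is only an example interval.

```java
/**
 * Hypothetical stand-in for an event handler that logs queue details periodically;
 * assumes a modulo check on detailsInterval, which the "/ by zero" error suggests.
 */
public class DetailsIntervalDemo {
    private final int detailsInterval; // 0 models a field that was never initialized

    DetailsIntervalDemo(int detailsInterval) {
        this.detailsInterval = detailsInterval;
    }

    /** Throws ArithmeticException for a non-empty queue when detailsInterval is 0. */
    boolean shouldLogDetails(int queueSize) {
        return queueSize != 0 && queueSize % detailsInterval == 0;
    }

    public static void main(String[] args) {
        try {
            new DetailsIntervalDemo(0).shouldLogDetails(1);
        } catch (ArithmeticException e) {
            System.out.println("Uninitialized interval: " + e.getMessage());
        }
        // Setting a positive interval on the spied dispatcher, as the ticket proposes, avoids it.
        System.out.println(new DetailsIntervalDemo(5000).shouldLogDetails(5000));
    }
}
```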
[jira] [Resolved] (MAPREDUCE-7352) ArithmeticException in some MapReduce tests
[ https://issues.apache.org/jira/browse/MAPREDUCE-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko resolved MAPREDUCE-7352.
-
Resolution: Duplicate

> ArithmeticException in some MapReduce tests
> ---
>
> Key: MAPREDUCE-7352
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7352
> Project: Hadoop Map/Reduce
> Issue Type: Task
> Components: test
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major