[jira] [Commented] (MAPREDUCE-6555) TestMRAppMaster fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15026767#comment-15026767 ] Junping Du commented on MAPREDUCE-6555: --- bq. I'm checking by running the test case. Please wait a moment. The Jenkins test report above already shows it, doesn't it? > TestMRAppMaster fails on trunk > -- > > Key: MAPREDUCE-6555 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6555 > Project: Hadoop Map/Reduce > Issue Type: Bug > Reporter: Varun Saxena > Assignee: Junping Du > Attachments: MAPREDUCE-6555.patch > > > Observed in QA report of YARN-3840 > {noformat} > Running org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster > Tests run: 9, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 20.699 sec > <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster > testMRAppMasterMidLock(org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster) > Time elapsed: 0.474 sec <<< FAILURE! > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster.testMRAppMasterMidLock(TestMRAppMaster.java:174) > testMRAppMasterSuccessLock(org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster) > Time elapsed: 0.175 sec <<< ERROR! 
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.io.FileNotFoundException: File > file:/home/varun/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/staging/history/done_intermediate/TestAppMasterUser/job_1317529182569_0004-1448100479292-TestAppMasterUser-%3Cmissing+job+name%3E-1448100479413-0-0-SUCCEEDED-default-1448100479292.jhist_tmp > does not exist > at > org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:640) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:866) > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:630) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:340) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292) > at > org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:372) > at > org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:513) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.moveTmpToDone(JobHistoryEventHandler.java:1346) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processDoneFiles(JobHistoryEventHandler.java:1154) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > at > org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157) > at > org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1751) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1247) > at > org.apache.hadoop.mapreduce.v2.app.TestMRAppMaster.testMRAppMasterSuccessLock(TestMRAppMaster.java:254) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6557) Some tests in mapreduce-client-app are writing outside of target
[ https://issues.apache.org/jira/browse/MAPREDUCE-6557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15026743#comment-15026743 ] Junping Du commented on MAPREDUCE-6557: --- Thanks for the patch, [~ajisakaa]! As I just mentioned in MAPREDUCE-6555, maybe we also want to fix the issue that the directory is not cleaned up after the tests finish? > Some tests in mapreduce-client-app are writing outside of target > > > Key: MAPREDUCE-6557 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6557 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: build > Affects Versions: 3.0.0 > Reporter: Allen Wittenauer > Assignee: Akira AJISAKA > Priority: Blocker > Attachments: MAPREDUCE-6557.00.patch > > > There is a staging directory appearing. It should not.
[jira] [Commented] (MAPREDUCE-6555) TestMRAppMaster fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15026732#comment-15026732 ] Junping Du commented on MAPREDUCE-6555: --- bq. TestMRAppMaster is using directory other than test.build.data, so the intermediate files are checked by Apache Rat. We should fix it in a separate jira. Agree. Worse, the test doesn't clean up the directory at the end, because cleanup() gets called before each test rather than after. Let's file a separate JIRA to fix that.
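The fix direction sketched in the comments above (root the test's scratch space under test.build.data, falling back to target/, and delete it after each test instead of only before) might look like the following standalone helper. This is a sketch only; the directory layout and names are illustrative, not TestMRAppMaster's actual code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

public class StagingDirCleanup {
    // Root the scratch space under test.build.data (or target/) so stray
    // files never land in the source tree where Apache Rat would flag them.
    static Path stagingRoot(String testName) {
        String base = System.getProperty("test.build.data", "target/test-dir");
        return Paths.get(base, testName, "staging");
    }

    // Create the scratch tree with one fake history file, as a test would.
    static Path makeScratch(String testName) {
        Path root = stagingRoot(testName);
        try {
            Files.createDirectories(root.resolve("history/done_intermediate"));
            Files.writeString(root.resolve("history/done_intermediate/job.jhist_tmp"), "x");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return root;
    }

    // Recursive delete, meant to run AFTER each test (e.g. from an @After
    // method) so nothing is left behind once the suite finishes.
    static void deleteRecursively(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = makeScratch("demo");
        deleteRecursively(root);
        System.out.println("scratch removed: " + !Files.exists(root));
    }
}
```

Deleting children before parents (the reverse-order sort) is what makes the recursive delete safe on a plain file system.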
[jira] [Assigned] (MAPREDUCE-6545) Test committer.commitJob() behavior during committing when MR AM get failed.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reassigned MAPREDUCE-6545: - Assignee: Junping Du > Test committer.commitJob() behavior during committing when MR AM get failed. > > > Key: MAPREDUCE-6545 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6545 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Reporter: Junping Du > Assignee: Junping Du > > In MAPREDUCE-5485, we are adding an additional API (isCommitJobRepeatable) to allow a job commit to tolerate AM failure in some cases (like FileOutputCommitter with the v2 algorithm). Although we have unit tests covering most of the flows, we may want a complete end-to-end test to verify the whole workflow. > The scenario includes: > 1. For FileOutputCommitter (or some subclass), emulate an MR AM failure or restart while commitJob() is in progress. > 2. Check the different behavior for v1 and v2 (supporting isCommitJobRepeatable() or not).
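A pure-Java simulation of the scenario this issue asks to test (the first AM attempt dies mid-commit, and a second attempt re-runs commitJob() only when the committer says that is safe) could look like this. The class and method names below are made up for illustration; only isCommitJobRepeatable() mirrors the real API:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CommitRetrySimulation {
    // Stand-in for an OutputCommitter: v2-style commits are repeatable, v1 not.
    static class SimCommitter {
        final boolean repeatable;
        final AtomicInteger commitCalls = new AtomicInteger();
        SimCommitter(boolean repeatable) { this.repeatable = repeatable; }
        boolean isCommitJobRepeatable() { return repeatable; }
        void commitJob() { commitCalls.incrementAndGet(); }
    }

    // First attempt "crashes" mid-commit; returns whether the job succeeds overall.
    static boolean runWithOneCrash(SimCommitter c) {
        c.commitJob();                     // attempt 1 dies right after starting commit
        if (!c.isCommitJobRepeatable()) {
            return false;                  // v1: the restarted AM must fail the job
        }
        c.commitJob();                     // v2: the restarted AM repeats the commit
        return true;
    }

    public static void main(String[] args) {
        System.out.println("v2 survives crash: " + runWithOneCrash(new SimCommitter(true)));
        System.out.println("v1 survives crash: " + runWithOneCrash(new SimCommitter(false)));
    }
}
```

An end-to-end test along these lines would replace SimCommitter with a real FileOutputCommitter and a restarted MRAppMaster, as the issue describes.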
[jira] [Updated] (MAPREDUCE-6555) TestMRAppMaster fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6555: -- Target Version/s: 3.0.0
[jira] [Updated] (MAPREDUCE-6555) TestMRAppMaster fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6555: -- Status: Patch Available (was: Open)
[jira] [Commented] (MAPREDUCE-6555) TestMRAppMaster fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15024466#comment-15024466 ] Junping Du commented on MAPREDUCE-6555: --- The previous failure is because, since MAPREDUCE-5485, we allow an MR job to retry on AM failure during the committing stage (if the committer is repeatable). So MRAppMaster.initAndStartAppMaster() no longer throws a fatal exception when a commit start file exists (which hints that the previous AM failed in the middle of a commit) for FileOutputCommitter, which defaults to the version 2 algorithm on trunk. I think we don't need this fix in branch-2, as the default algorithm version there is 1.
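The restart-time check described in the comment above can be reduced to a single predicate. This is a sketch, not the real MRAppMaster.initAndStartAppMaster() logic, and only isCommitJobRepeatable() corresponds to an actual API method:

```java
public class CommitRecovery {
    // Functional stand-in for the committer; FileOutputCommitter's v2
    // algorithm answers true here, v1 answers false.
    interface Committer {
        boolean isCommitJobRepeatable();
    }

    // true: the restarted AM may proceed; false: it must treat the
    // half-finished commit as fatal and fail the job.
    static boolean canRestart(boolean commitStartFileExists, Committer c) {
        if (!commitStartFileExists) {
            return true;                   // previous attempt never began committing
        }
        return c.isCommitJobRepeatable();  // a repeatable commit can simply be redone
    }

    public static void main(String[] args) {
        System.out.println(canRestart(true, () -> true));   // v2 after a mid-commit crash
        System.out.println(canRestart(true, () -> false));  // v1 after a mid-commit crash
    }
}
```

The test failure discussed here arose exactly because the v2 default flipped the second case's answer on trunk.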
[jira] [Updated] (MAPREDUCE-6555) TestMRAppMaster fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6555: -- Attachment: MAPREDUCE-6555.patch Uploaded a quick fix for the test failure.
[jira] [Assigned] (MAPREDUCE-6555) TestMRAppMaster fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reassigned MAPREDUCE-6555: - Assignee: Junping Du
[jira] [Commented] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15024401#comment-15024401 ] Junping Du commented on MAPREDUCE-5485: --- Sure. Thanks [~ajisakaa] for the reminder on this! > Allow repeating job commit by extending OutputCommitter API > --- > > Key: MAPREDUCE-5485 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Affects Versions: 2.1.0-beta > Reporter: Nemon Lou > Assignee: Junping Du > Priority: Critical > Fix For: 2.7.3 > > Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch, > MAPREDUCE-5485-v1.patch, MAPREDUCE-5485-v2.patch, MAPREDUCE-5485-v3.1.patch, > MAPREDUCE-5485-v3.patch, MAPREDUCE-5485-v4.1.patch, MAPREDUCE-5485-v4.patch, > MAPREDUCE-5485-v5-branch-2.7.patch, MAPREDUCE-5485-v5.patch > > > There are chances the MRAppMaster crashes during job commit, or a NodeManager restart causes the committing AM to exit due to container expiry. In these cases, the job will fail. > However, some jobs can redo the commit, so failing the job becomes unnecessary. > Letting clients tell the AM whether to allow redoing the commit is a better choice. > This idea comes from Jason Lowe's comments in MAPREDUCE-4819
[jira] [Updated] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5485: -- Attachment: MAPREDUCE-5485-v5-branch-2.7.patch Uploaded a patch for branch-2.7 in case we want to commit it to 2.7.3.
[jira] [Commented] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007167#comment-15007167 ] Junping Du commented on MAPREDUCE-5485: --- bq. Junping Du, there are some findbugs and ut failures, mind checking? The unit test failure is unrelated and was just fixed in MAPREDUCE-6533. The findbugs warning comes from a wrong check: org.apache.hadoop.mapred.FileOutputCommitter.isCommitJobRepeatable(Context) overrides the right method in the parent abstract class (findbugs wrongly recognizes it as overriding a different abstract method). So I think we should ignore these warnings.
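The override mix-up behind that findbugs warning can be illustrated with a toy hierarchy (not Hadoop's real classes): the @Override annotation gives compile-time proof that a subclass method overrides the intended parent method, regardless of what a static analyzer claims:

```java
public class OverrideCheckDemo {
    // Toy context type, loosely modeled on the real JobContext parameter
    // of isCommitJobRepeatable(JobContext); all names here are illustrative.
    static class JobContext {}

    abstract static class BaseCommitter {
        boolean isCommitJobRepeatable(JobContext ctx) { return false; } // v1-style default
    }

    static class V2StyleCommitter extends BaseCommitter {
        @Override // the compiler confirms this really overrides the parent's method
        boolean isCommitJobRepeatable(JobContext ctx) { return true; }
    }

    public static void main(String[] args) {
        BaseCommitter c = new V2StyleCommitter();
        System.out.println(c.isCommitJobRepeatable(new JobContext())); // dispatches to the override
    }
}
```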
[jira] [Updated] (MAPREDUCE-6542) HistoryViewer use SimpleDateFormat,But SimpleDateFormat is not threadsafe
[ https://issues.apache.org/jira/browse/MAPREDUCE-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6542: -- Hadoop Flags: (was: Reviewed) > HistoryViewer use SimpleDateFormat,But SimpleDateFormat is not threadsafe > - > > Key: MAPREDUCE-6542 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6542 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver > Affects Versions: 2.2.0, 2.7.1 > Environment: CentOS6.5 Hadoop > Reporter: zhangyubiao > Assignee: zhangyubiao > Attachments: MAPREDUCE-6542-v2.patch, MAPREDUCE-6542.patch > > > I used SimpleDateFormat to format the JobHistory file times before: > private static final SimpleDateFormat dateFormat = > new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); > public static String getJobDetail(JobInfo job) { > StringBuffer jobDetails = new StringBuffer(""); > SummarizedJob ts = new SummarizedJob(job); > jobDetails.append(job.getJobId().toString().trim()).append("\t"); > jobDetails.append(job.getUsername()).append("\t"); > jobDetails.append(job.getJobname().replaceAll("\\n", "")).append("\t"); > jobDetails.append(job.getJobQueueName()).append("\t"); > jobDetails.append(job.getPriority()).append("\t"); > jobDetails.append(job.getJobConfPath()).append("\t"); > jobDetails.append(job.getUberized()).append("\t"); > jobDetails.append(dateFormat.format(job.getSubmitTime())).append("\t"); > jobDetails.append(dateFormat.format(job.getLaunchTime())).append("\t"); > jobDetails.append(dateFormat.format(job.getFinishTime())).append("\t"); > return jobDetails.toString(); > } > But when I queried the SubmitTime and LaunchTime in Hive and compared them with the JobHistory file times, I found that the submitTime and launchTime were wrong. > Finally, I changed to FastDateFormat to format the times, and the times became right.
[jira] [Updated] (MAPREDUCE-6542) HistoryViewer use SimpleDateFormat,But SimpleDateFormat is not threadsafe
[ https://issues.apache.org/jira/browse/MAPREDUCE-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6542: -- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6542) HistoryViewer use SimpleDateFormat,But SimpleDateFormat is not threadsafe
[ https://issues.apache.org/jira/browse/MAPREDUCE-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002265#comment-15002265 ] Junping Du commented on MAPREDUCE-6542: --- Hi [~piaoyu zhang], thanks for the patch! The "Reviewed" flag is only set after a committer gives +1 on your patch. In addition, you should click "Submit Patch" when you upload a new patch so the JIRA status becomes "Patch Available" and Jenkins can automatically trigger tests against your patch. I will fix it for you this time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
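The fix described above replaces the shared SimpleDateFormat (which is not thread-safe) with commons-lang's FastDateFormat. As a minimal JDK-only sketch of the same idea, java.time.format.DateTimeFormatter is likewise immutable and thread-safe, so one shared instance can safely serve many threads; the class and method names below are illustrative, not the patch's actual code:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class JobTimeFormat {
    // Unlike SimpleDateFormat, DateTimeFormatter is immutable and
    // thread-safe, so a single shared instance may be used concurrently.
    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
                         .withZone(ZoneOffset.UTC);

    static String format(long epochMillis) {
        return FMT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // Epoch millisecond 0 rendered in UTC
        System.out.println(format(0L)); // prints "1970-01-01 00:00:00"
    }
}
```

FastDateFormat remains the right choice on the Java 6/7-era Hadoop codebase this JIRA targets; DateTimeFormatter requires Java 8.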
[jira] [Commented] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000916#comment-15000916 ] Junping Du commented on MAPREDUCE-5485: --- bq. About the overall test. The main overall change is to allow the retry AM to continue after seeing an in-progress commit from the previous AM. It seems incomplete to not have a test for that. I agree that it is better to add as many cases as possible in unit tests. But due to limitations of our current unit test framework, we miss many functional tests, especially those related to MR AM failure/restart; for example, in the rolling-upgrade story we have no tests that check AM failover during NM/RM restart. Instead, we may have to split the whole functionality into pieces and test each piece. Sadly, that may not be good enough, which is why we still need to test/verify that the feature works end to end on a real cluster. bq. However if you think that we dont have existing infra for that code path then we should create a follow up jira to add that infra and relevant tests. I have not followed the MR AM code changes for a while, so I cannot recall any existing test cases off the top of my head. Other committers may have some ideas. Just filed MAPREDUCE-6545 to track the additional test effort. bq. With that caveat, the latest patch looks good to me. Thanks for your patience through the reviews. Thanks Bikas for your careful review.
> Allow repeating job commit by extending OutputCommitter API > --- > > Key: MAPREDUCE-5485 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Affects Versions: 2.1.0-beta > Reporter: Nemon Lou > Assignee: Junping Du > Priority: Critical > Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch, MAPREDUCE-5485-v1.patch, MAPREDUCE-5485-v2.patch, MAPREDUCE-5485-v3.1.patch, MAPREDUCE-5485-v3.patch, MAPREDUCE-5485-v4.1.patch, MAPREDUCE-5485-v4.patch, MAPREDUCE-5485-v5.patch > > > There are chances the MRAppMaster crashes during job commit, or a NodeManager restart causes the committing AM to exit due to container expiry. In these cases, the job will fail. > However, some jobs can redo the commit, so failing the job is unnecessary. > Letting clients tell the AM whether redoing the commit is allowed is a better choice. > This idea comes from Jason Lowe's comments in MAPREDUCE-4819 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (MAPREDUCE-6545) Test committer.commitJob() behavior during committing when MR AM get failed.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du moved YARN-4346 to MAPREDUCE-6545: - Target Version/s: (was: 2.8.0) Key: MAPREDUCE-6545 (was: YARN-4346) Project: Hadoop Map/Reduce (was: Hadoop YARN) > Test committer.commitJob() behavior during committing when MR AM get failed. > > > Key: MAPREDUCE-6545 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6545 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Reporter: Junping Du > > In MAPREDUCE-5485, we are adding an additional API (isCommitJobRepeatable) to allow the job commit to tolerate AM failure in some cases (like FileOutputCommitter with the v2 algorithm). Although we have unit tests covering most of the flows, we want a complete end-to-end test to verify the whole workflow. > The scenarios include: > 1. For FileOutputCommitter (or a subclass), emulate an MR AM failure or restart while commitJob() is in progress. > 2. Check the different behaviors for v1 and v2 (supporting isCommitJobRepeatable() or not). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
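The v1/v2 split that the test plan above targets can be sketched in plain Java. This is a hypothetical stand-in, not the real FileOutputCommitter: only the v2 algorithm (tasks write directly to the final output location) can safely re-run commitJob() after an AM failure, so only it reports the commit as repeatable:

```java
// Hypothetical stand-in for a FileOutputCommitter-style class, illustrating
// the isCommitJobRepeatable() contract described in MAPREDUCE-5485.
public class SketchCommitter {
    private final int algorithmVersion; // 1 or 2, as in the description above

    public SketchCommitter(int algorithmVersion) {
        this.algorithmVersion = algorithmVersion;
    }

    // v2 writes task output straight to the final location, so repeating
    // commitJob() after an AM restart is safe; v1's rename-based commit is not.
    public boolean isCommitJobRepeatable() {
        return algorithmVersion == 2;
    }
}
```

An AM retry would consult this flag before deciding whether to continue a commit interrupted by the previous attempt or to fail the job.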
[jira] [Created] (MAPREDUCE-6544) yarn rmadmin -updateNodeResource doesn't work
Junping Du created MAPREDUCE-6544: - Summary: yarn rmadmin -updateNodeResource doesn't work Key: MAPREDUCE-6544 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6544 Project: Hadoop Map/Reduce Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.1 Reporter: Junping Du Assignee: Junping Du Priority: Critical YARN-313 added a CLI to update node resources. It works fine for batch-mode updates. However, a single-node update via "yarn rmadmin -updateNodeResource" fails because the resource is not set properly in the request being sent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000727#comment-15000727 ] Junping Du commented on MAPREDUCE-5485: --- The test failure is not related, and I believe the findbugs/checkstyle issues have nothing to do with this patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5485: -- Attachment: MAPREDUCE-5485-v5.patch Updated the v5 patch to address Bikas's comments above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000472#comment-15000472 ] Junping Du commented on MAPREDUCE-5485: --- bq. This introduces duplication of code for checking commit status and can cause a bug if the logic changes in either place. And also makes extra RPC calls to HDFS for checking file status - which is avoidable. Moving the code to the place where earlier we were failing due to in-progress commit, will allow this method to do exactly as it name suggests - cleanup in progress commit markers. Does that clarify? Thanks for clarifying. That sounds good. Will update in the v5 patch. bq. 1) Test MR Appmaster new functionality that allows commit to proceed in a retried AM if commit is repeatable. Theoretically, I agree it would be nice to have something fully functional. However, I don't think that is easy in this case. Do we have other tests of job commit (not retry) with a fully functional AppMaster launch? If not, I would prefer to add it later in another JIRA once we have more ideas on how to do it. bq. 2) Test in FileOutputCommitter that for repeatable commit - a filenotfoundexception is not counted as an error (new behavior). Can you check FileOutputCommitter#testCommitterRepeatableV1() and FileOutputCommitter#testCommitterRepeatableV2()? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6533) testDetermineCacheVisibilities of TestClientDistributedCacheManager is broken
[ https://issues.apache.org/jira/browse/MAPREDUCE-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000263#comment-15000263 ] Junping Du commented on MAPREDUCE-6533: --- The failure of testDetermineCacheVisibilities() is quite annoying. Thanks [~lichangleo] for working on it and [~jlowe] for reviewing it. The v4 patch looks good overall except for one place: {code} + private static final Path TEST_ROOT_DIR = + new Path(System.getProperty("test.build.data", "/tmp")); {code} We should replace "/tmp" with System.getProperty("java.io.tmpdir") so the test runs smoothly on platforms other than Linux. > testDetermineCacheVisibilities of TestClientDistributedCacheManager is broken > - > > Key: MAPREDUCE-6533 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6533 > Project: Hadoop Map/Reduce > Issue Type: Bug > Reporter: Chang Li > Assignee: Chang Li > Attachments: MAPREDUCE-6533.2.patch, MAPREDUCE-6533.3.patch, MAPREDUCE-6533.4.patch, MAPREDUCE-6533.4.patch, MAPREDUCE-6533.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
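The suggested replacement can be sketched as follows. This is a standalone illustration (the real patch builds a Hadoop Path; a plain String is used here): prefer the build-provided directory and fall back to the JVM's platform temp dir rather than a hard-coded "/tmp":

```java
public class TestRootDir {
    // "test.build.data" is the Hadoop build's test-data property (from the
    // patch above); "java.io.tmpdir" is always set by the JVM, so this also
    // works on non-Linux platforms such as Windows where /tmp does not exist.
    static String testRootDir() {
        return System.getProperty("test.build.data",
                                  System.getProperty("java.io.tmpdir"));
    }

    public static void main(String[] args) {
        System.out.println(testRootDir());
    }
}
```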
[jira] [Commented] (MAPREDUCE-6505) Migrate io test cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-6505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000240#comment-15000240 ] Junping Du commented on MAPREDUCE-6505: --- I think this JIRA should be moved to the Hadoop project instead of the MapReduce project, as all the changes happen in hadoop-common. > Migrate io test cases > - > > Key: MAPREDUCE-6505 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6505 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: test > Reporter: Dustin Cote > Assignee: Dustin Cote > Priority: Trivial > Attachments: MAPREDUCE-6505-1.patch, MAPREDUCE-6505-2.patch, MAPREDUCE-6505-3.patch > > > Migrating just the io test cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5485: -- Attachment: MAPREDUCE-5485-v4.1.patch Fixed minor issues with whitespace, findbugs, etc. in the v4.1 patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5485: -- Attachment: MAPREDUCE-5485-v4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998995#comment-14998995 ] Junping Du commented on MAPREDUCE-5485: --- Thanks [~bikassaha] for the comments! bq. Is the above check too early in the code. E.g. IIRC, at this point we have not checked whether the previous job commit was succeeded or failed - in which case we cannot recover and there is nothing to do. cleanupInterruptedCommit() already checks whether the previous job commit succeeded or failed. Am I missing anything here? bq. Also, we have already changed the startCommit operation to be repeatable via the overwrite flag. After that is there a need to delete the files upfront. Delete may be an expensive operation on some cloud stores. I don't see much difference between deleting and then writing a small or empty file versus overwriting an existing file (updating timestamp, contents) on any cloud store. I just prefer not to add more if-else cases to existing code that is already complicated. If we do observe a performance difference on a real cluster, we can optimize it then. What do you think? bq. Mapred javadoc fixes are missing. Also there are some typos in there. E.g. Nice catch! Will fix it in the v4 patch. bq. This part of the code change could use some tests. OK. Added TestFileOutputCommitter for the class in the mapred package. bq. Tests for repeatable success marker file and FileExistsException for repeatable deletes would be good to have. Previously, the success marker file was already written with overwrite (fs.create() defaults to overwrite), so there is not much difference here. Added extra tests for duplicated job commit in the v4 patch.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
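Bikas's point about catching FileNotFoundException for a repeatable commit can be sketched without Hadoop types. Everything here is a hypothetical illustration (the Store interface and method names are invented, not the actual FileOutputCommitter code): a delete can race with the previous AM's own cleanup, and for a repeatable commit that race is benign:

```java
import java.io.FileNotFoundException;
import java.io.IOException;

public class RepeatableCleanup {
    /** Hypothetical stand-in for the FileSystem delete used by a committer. */
    interface Store {
        void delete(String path) throws IOException;
    }

    // A previous AM attempt may remove the path between our existence check
    // and our delete. When the commit is repeatable, the file being already
    // gone is exactly the desired end state, so the exception is swallowed;
    // otherwise it is surfaced as before.
    static void deleteForCommit(Store fs, String path, boolean repeatable)
            throws IOException {
        try {
            fs.delete(path);
        } catch (FileNotFoundException e) {
            if (!repeatable) {
                throw e;
            }
        }
    }
}
```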
[jira] [Commented] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997796#comment-14997796 ] Junping Du commented on MAPREDUCE-5485: --- Actually, the test failure of TestClientDistributedCacheManager.testDetermineCacheVisibilities is already tracked by MAPREDUCE-6533. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5485: -- Attachment: MAPREDUCE-5485-v3.1.patch The v3.1 patch fixes minor issues (whitespace, checkstyle, javadoc, etc.) reported by Jenkins. The unit test failure TestClientDistributedCacheManager.testDetermineCacheVisibilities is unrelated; will file a separate JIRA to fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997661#comment-14997661 ] Junping Du commented on MAPREDUCE-5485: --- Uploaded the v3 patch to address the following comment: bq. I am not disagreeing with the AM retry in an absolute sense. However, it does not seem to belong to this jira and is likely better done as a follow up. Makes sense. We can separate this part (AM retry after commit failure) out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5485: -- Attachment: MAPREDUCE-5485-v3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996979#comment-14996979 ] Junping Du commented on MAPREDUCE-5485: --- bq. doing ++retries here can remove code duplication for the < check in the while? Sorry, I missed this comment in the patch I just uploaded. Will update it in the next patch. bq. Even for a non-repeatable committer, if there is a classpath issue (which can get fixed by retrying the AM) then the AM should retry, right? I agree this could potentially be a separate topic. However, it would take more time and effort to make sure that retrying a non-repeatable committer cannot produce a seemingly successful commit whose result is wrong and that should have failed earlier. For a repeatable committer there seems to be no such risk: it may pay the price of an unnecessary retry in some cases but gain a better chance of succeeding in the commit stage in others, and you cannot tell in advance which case you are in. Take the exception from a failed temp-directory delete: it could be due to the AM's connection to HDFS (we should retry) or to HDFS being down permanently (we shouldn't retry). I would prefer the current trade-off: simple, with a best effort toward commit success in the repeatable case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
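The retry shape under discussion (Bikas's "++retries" suggestion: one bound, checked in one place, re-throwing only when the final attempt fails) can be sketched as follows. Everything here is illustrative, not the actual MRAppMaster code:

```java
import java.io.IOException;

public class CommitRetry {
    /** Hypothetical commit action that may fail transiently. */
    interface Commit {
        void run() throws IOException;
    }

    // Pre-increment keeps the bound check in a single place: the failure is
    // surfaced only once the attempt counter reaches maxAttempts, so
    // maxAttempts == 1 reproduces the old no-retry behavior.
    static int commitWithRetry(Commit commit, int maxAttempts) throws IOException {
        int attempt = 0;
        while (true) {
            attempt++;
            try {
                commit.run();
                return attempt; // number of attempts actually used
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e; // bound exhausted; surface the failure
                }
            }
        }
    }
}
```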
[jira] [Updated] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5485: -- Attachment: MAPREDUCE-5485-v2.patch Updated the patch according to review comments from Bikas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6096) SummarizedJob class NPEs with some jhist files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6096: -- Assignee: zhangyubiao > SummarizedJob class NPEs with some jhist files > -- > > Key: MAPREDUCE-6096 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6096 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver > Reporter: zhangyubiao > Assignee: zhangyubiao > Labels: easyfix, patch > Attachments: MAPREDUCE-6096-v8.patch, job_1446203652278_66705-1446308686422-dd_edw-insert+overwrite+table+bkactiv...dp%3D%27ACTIVE%27%28Stage-1446308802181-233-0-SUCCEEDED-bdp_jdw_corejob.jhist > > > When parsing a JobHistory file, I use the map-reduce-client-core classes org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser and HistoryViewer$SummarizedJob (on a file like job_1408862281971_489761-1410883171851_XXX.jhist), > and it throws an exception like: > Exception in thread "pool-1-thread-1" java.lang.NullPointerException > at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer$SummarizedJob.(HistoryViewer.java:626) > at com.jd.hadoop.log.parse.ParseLogService.getJobDetail(ParseLogService.java:70) > After looking at the SummarizedJob class, I found that attempt.getTaskStatus() is null, so I changed > attempt.getTaskStatus().equals(TaskStatus.State.FAILED.toString()) to > TaskStatus.State.FAILED.toString().equals(attempt.getTaskStatus()) > and it works well. > So I wonder if we can change all attempt.getTaskStatus() comparisons to put TaskStatus.State.XXX.toString() first? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6096) SummarizedJob class NPEs with some jhist files
[ https://issues.apache.org/jira/browse/MAPREDUCE-6096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6096: -- Labels: easyfix patch (was: BB2015-05-TBR easyfix patch) > SummarizedJob class NPEs with some jhist files > -- > > Key: MAPREDUCE-6096 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6096 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Reporter: zhangyubiao > Labels: easyfix, patch > Attachments: MAPREDUCE-6096-v8.patch, > job_1446203652278_66705-1446308686422-dd_edw-insert+overwrite+table+bkactiv...dp%3D%27ACTIVE%27%28Stage-1446308802181-233-0-SUCCEEDED-bdp_jdw_corejob.jhist > > > When I Parse the JobHistory in the HistoryFile,I use the Hadoop System's > map-reduce-client-core project > org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser class and > HistoryViewer$SummarizedJob to Parse the JobHistoryFile(Just Like > job_1408862281971_489761-1410883171851_XXX.jhist) > and it throw an Exception Just Like > Exception in thread "pool-1-thread-1" java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.jobhistory.HistoryViewer$SummarizedJob.(HistoryViewer.java:626) > at > com.jd.hadoop.log.parse.ParseLogService.getJobDetail(ParseLogService.java:70) > After I'm see the SummarizedJob class I find that attempt.getTaskStatus() is > NULL , So I change the order of > attempt.getTaskStatus().equals (TaskStatus.State.FAILED.toString()) to > TaskStatus.State.FAILED.toString().equals(attempt.getTaskStatus()) > and it works well . > So I wonder If we can change all attempt.getTaskStatus() after > TaskStatus.State.XXX.toString() ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992102#comment-14992102 ] Junping Du commented on MAPREDUCE-5485: --- Thanks Bikas for the review and comments! bq. If the commit actually failed then there does not seem any reason to assume that retrying it will succeed. IMO if the commit reports a failure then AM should fail. Similarly, if a commit failure file exists (from a previous AM version) then the new version of the AM should respect that and fail since the commit has been reported to be failed. There are still failure causes specific to the AM itself, e.g. the previous AM could not connect to the FS (HDFS or another cloud FS), or the committer misbehaved because it was loaded incorrectly (due to a classpath problem or other defect), etc. I think it makes sense to make a best effort to retry after a commit failure (as for other causes of AM failure), given that the commit is repeatable and all tasks completed successfully. bq. Javadoc could be improved. Inline Yes. I will. bq. num-retries instead of retries? Also, if its num-retries then default should be 0. If its num-attempts then default should be 1. OK, I will update it to num-attempts, with a default of 1 (no retry) to keep consistent with the previous behavior. bq. Retry count checking code in the catch block subsumes the check retry count check in the while block? I don't think so. Can you take a look at it again? bq. The previous operation could delete the path after the if check has succeeded. So we probably also need to catch FileNotFoundException exception class here and ignore it if repeatableCommit is true. That's a good point. Will fix it. bq. Do testcases need an @Test annotation? No. The test class extends TestCase, so all methods starting with "test" are executed automatically. bq. firstTimeFail is probably a more clear name for what its doing - failing on the first attempt. Would be good to have a test that version 2 and retry = 1 will fail also. 
Testcases missing for specific changes in FileOutputCommitter for create/delete operation changes? Sounds good. Will fix/add later. > Allow repeating job commit by extending OutputCommitter API > --- > > Key: MAPREDUCE-5485 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.1.0-beta >Reporter: Nemon Lou >Assignee: Junping Du >Priority: Critical > Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch, > MAPREDUCE-5485-v1.patch > > > There are chances MRAppMaster crush during job committing,or NodeManager > restart cause the committing AM exit due to container expire.In these cases > ,the job will fail. > However,some jobs can redo commit so failing the job becomes unnecessary. > Let clients tell AM to allow redo commit or not is a better choice. > This idea comes from Jason Lowe's comments in MAPREDUCE-4819 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
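The retry scheme discussed in the review above (a num-attempts counter defaulting to 1, retries only when the commit is repeatable, and FileNotFoundException tolerated because a previous AM may already have moved the path) can be sketched roughly as follows. This is a hedged illustration, not the actual MAPREDUCE-5485 patch; the Committer interface and method names here are hypothetical stand-ins for the real OutputCommitter API:

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical sketch of the retry scheme discussed in the review;
// names are illustrative, not taken from the actual patch.
class CommitRetrySketch {
    interface Committer {
        void commitJob() throws IOException;
        boolean isCommitJobRepeatable();
    }

    // Attempt the commit up to maxAttempts times. A default of 1 means
    // "no retry", matching the previous behavior. A FileNotFoundException
    // is ignored when the commit is repeatable, since a previous AM may
    // already have completed that step.
    static void commitWithRetry(Committer committer, int maxAttempts)
            throws IOException {
        int attempt = 0;
        while (true) {
            attempt++;
            try {
                committer.commitJob();
                return; // success
            } catch (FileNotFoundException fnfe) {
                if (committer.isCommitJobRepeatable()) {
                    return; // previous AM likely finished this step already
                }
                throw fnfe;
            } catch (IOException ioe) {
                boolean canRetry = committer.isCommitJobRepeatable()
                        && attempt < maxAttempts;
                if (!canRetry) {
                    // Per the discussion: surface the failure so the AM
                    // (not the job) fails, leaving room for another AM.
                    throw ioe;
                }
            }
        }
    }
}
```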
[jira] [Updated] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5485: -- Priority: Critical (was: Major) > Allow repeating job commit by extending OutputCommitter API > --- > > Key: MAPREDUCE-5485 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.1.0-beta >Reporter: Nemon Lou >Assignee: Junping Du >Priority: Critical > Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch, > MAPREDUCE-5485-v1.patch > > > There are chances MRAppMaster crush during job committing,or NodeManager > restart cause the committing AM exit due to container expire.In these cases > ,the job will fail. > However,some jobs can redo commit so failing the job becomes unnecessary. > Let clients tell AM to allow redo commit or not is a better choice. > This idea comes from Jason Lowe's comments in MAPREDUCE-4819 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5485: -- Target Version/s: 2.6.3, 2.7.3 > Allow repeating job commit by extending OutputCommitter API > --- > > Key: MAPREDUCE-5485 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.1.0-beta >Reporter: Nemon Lou >Assignee: Junping Du > Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch, > MAPREDUCE-5485-v1.patch > > > There are chances MRAppMaster crush during job committing,or NodeManager > restart cause the committing AM exit due to container expire.In these cases > ,the job will fail. > However,some jobs can redo commit so failing the job becomes unnecessary. > Let clients tell AM to allow redo commit or not is a better choice. > This idea comes from Jason Lowe's comments in MAPREDUCE-4819 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5485: -- Attachment: MAPREDUCE-5485-v1.patch Attaching a new patch to address Bikas' comments above. It includes: 1. Moving the retry logic into committer.commitJob() rather than MRAppMaster. 2. Failing the AM instead of the job when an exception happens during job commit, if commitJob() is repeatable. 3. Adding related unit tests. I verified that this feature works well on a small-scale cluster by killing the AM during the job commit stage; the job continued and succeeded after the AM restarted. > Allow repeating job commit by extending OutputCommitter API > --- > > Key: MAPREDUCE-5485 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.1.0-beta >Reporter: Nemon Lou >Assignee: Junping Du > Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch, > MAPREDUCE-5485-v1.patch > > > There are chances MRAppMaster crush during job committing,or NodeManager > restart cause the committing AM exit due to container expire.In these cases > ,the job will fail. > However,some jobs can redo commit so failing the job becomes unnecessary. > Let clients tell AM to allow redo commit or not is a better choice. > This idea comes from Jason Lowe's comments in MAPREDUCE-4819 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6528) Memory leak for HistoryFileManager.getJobSummary()
[ https://issues.apache.org/jira/browse/MAPREDUCE-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14982852#comment-14982852 ] Junping Du commented on MAPREDUCE-6528: --- Thanks Jason for reviewing and committing this! > Memory leak for HistoryFileManager.getJobSummary() > -- > > Key: MAPREDUCE-6528 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6528 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.7.2, 2.6.3 > > Attachments: MAPREDUCE-6528.patch > > > We meet memory leak issues for JHS in a large cluster which is caused by code > below doesn't release FSDataInputStream in exception case. MAPREDUCE-6273 > should fix most cases that exceptions get thrown. However, we still need to > fix the memory leak for occasional case. > {code} > private String getJobSummary(FileContext fc, Path path) throws IOException { > Path qPath = fc.makeQualified(path); > FSDataInputStream in = fc.open(qPath); > String jobSummaryString = in.readUTF(); > in.close(); > return jobSummaryString; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14982375#comment-14982375 ] Junping Du commented on MAPREDUCE-5485: --- Thanks [~bikassaha] for the comments! I agree it makes more sense to move the retry logic into committer.commitJob() if it supports repeatable commits. My original thinking was to combine the retry for committer.commitJob() with other AM exceptions in handleJobCommit (outside of the committer), such as a failure to write endCommitSuccessFile. But now I think we should separate committer retry from AM-specific handling, for the reason you mentioned above. In this case, I would prefer to let the AM exit directly instead of failing the job (if the job commit is repeatable). This is mostly the same as what [~nemon] proposed above, with one slight difference: we should fail the AM (not the job) even when committer.commitJob() fails after retries, to handle corner cases where something committer-related goes wrong in this AM but could still succeed in another AM, given that the commit is repeatable. I will update the patch soon. > Allow repeating job commit by extending OutputCommitter API > --- > > Key: MAPREDUCE-5485 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.1.0-beta >Reporter: Nemon Lou >Assignee: Junping Du > Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch > > > There are chances MRAppMaster crush during job committing,or NodeManager > restart cause the committing AM exit due to container expire.In these cases > ,the job will fail. > However,some jobs can redo commit so failing the job becomes unnecessary. > Let clients tell AM to allow redo commit or not is a better choice. > This idea comes from Jason Lowe's comments in MAPREDUCE-4819 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6528) Memory leak for HistoryFileManager.getJobSummary()
[ https://issues.apache.org/jira/browse/MAPREDUCE-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14982317#comment-14982317 ] Junping Du commented on MAPREDUCE-6528: --- Thanks [~brahmareddy]! Can someone commit this patch? It is quite straightforward. > Memory leak for HistoryFileManager.getJobSummary() > -- > > Key: MAPREDUCE-6528 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6528 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: MAPREDUCE-6528.patch > > > We meet memory leak issues for JHS in a large cluster which is caused by code > below doesn't release FSDataInputStream in exception case. MAPREDUCE-6273 > should fix most cases that exceptions get thrown. However, we still need to > fix the memory leak for occasional case. > {code} > private String getJobSummary(FileContext fc, Path path) throws IOException { > Path qPath = fc.makeQualified(path); > FSDataInputStream in = fc.open(qPath); > String jobSummaryString = in.readUTF(); > in.close(); > return jobSummaryString; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6528) Memory leak for HistoryFileManager.getJobSummary()
[ https://issues.apache.org/jira/browse/MAPREDUCE-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14982214#comment-14982214 ] Junping Du commented on MAPREDUCE-6528: --- Good point, Vinod! Let's keep the patch as it is, since try-with-resources isn't supported on earlier JDK versions. > Memory leak for HistoryFileManager.getJobSummary() > -- > > Key: MAPREDUCE-6528 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6528 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: MAPREDUCE-6528.patch > > > We meet memory leak issues for JHS in a large cluster which is caused by code > below doesn't release FSDataInputStream in exception case. MAPREDUCE-6273 > should fix most cases that exceptions get thrown. However, we still need to > fix the memory leak for occasional case. > {code} > private String getJobSummary(FileContext fc, Path path) throws IOException { > Path qPath = fc.makeQualified(path); > FSDataInputStream in = fc.open(qPath); > String jobSummaryString = in.readUTF(); > in.close(); > return jobSummaryString; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5485: -- Attachment: MAPREDUCE-5485-demo-2.patch Updated the 2nd demo patch with the following fixes: 1. Make FileOutputCommitter repeatable only when using algorithm 2 (algorithm 1 is not supported yet). 2. Make the temporary-directory delete operation idempotent by tolerating the case where the directory does not exist (it may have been deleted by the first AM). 3. Make the SUCCESS marker file creation idempotent by allowing the file to already exist (it may have been created by the first AM). Testing is still ongoing; I will add unit tests in the next patch. > Allow repeating job commit by extending OutputCommitter API > --- > > Key: MAPREDUCE-5485 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.1.0-beta >Reporter: Nemon Lou >Assignee: Junping Du > Attachments: MAPREDUCE-5485-demo-2.patch, MAPREDUCE-5485-demo.patch > > > There are chances MRAppMaster crush during job committing,or NodeManager > restart cause the committing AM exit due to container expire.In these cases > ,the job will fail. > However,some jobs can redo commit so failing the job becomes unnecessary. > Let clients tell AM to allow redo commit or not is a better choice. > This idea comes from Jason Lowe's comments in MAPREDUCE-4819 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
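The two idempotency changes described above can be illustrated with plain java.nio.file calls. The real patch works against Hadoop's FileSystem API, so this is only an analogy, and the helper names are hypothetical:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Analogy (java.nio.file, not Hadoop's FileSystem API) for the two
// idempotent commit steps: a repeated delete and a repeated marker create
// must both succeed when a previous AM already performed them.
class IdempotentCommitSteps {
    // Delete is idempotent: a missing directory means a previous AM
    // already deleted it, which is fine when the commit is repeatable.
    static void deleteTempDir(Path tmpDir) throws IOException {
        try {
            Files.delete(tmpDir);
        } catch (NoSuchFileException alreadyGone) {
            // ignore: the first AM may have deleted it already
        }
    }

    // Marker creation is idempotent: an existing success-marker file
    // means a previous AM already created it.
    static void createSuccessMarker(Path marker) throws IOException {
        try {
            Files.createFile(marker);
        } catch (FileAlreadyExistsException alreadyThere) {
            // ignore: the first AM may have created it already
        }
    }
}
```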
[jira] [Commented] (MAPREDUCE-6528) Memory leak for HistoryFileManager.getJobSummary()
[ https://issues.apache.org/jira/browse/MAPREDUCE-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14980892#comment-14980892 ] Junping Du commented on MAPREDUCE-6528: --- bq. Thanks for reporting this..can you use try-with-resources..? Given that code with a finally block is already there, is there any advantage to try-with-resources? > Memory leak for HistoryFileManager.getJobSummary() > -- > > Key: MAPREDUCE-6528 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6528 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: MAPREDUCE-6528.patch > > > We meet memory leak issues for JHS in a large cluster which is caused by code > below doesn't release FSDataInputStream in exception case. MAPREDUCE-6273 > should fix most cases that exceptions get thrown. However, we still need to > fix the memory leak for occasional case. > {code} > private String getJobSummary(FileContext fc, Path path) throws IOException { > Path qPath = fc.makeQualified(path); > FSDataInputStream in = fc.open(qPath); > String jobSummaryString = in.readUTF(); > in.close(); > return jobSummaryString; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6528) Memory leak for HistoryFileManager.getJobSummary()
[ https://issues.apache.org/jira/browse/MAPREDUCE-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6528: -- Status: Patch Available (was: Open) > Memory leak for HistoryFileManager.getJobSummary() > -- > > Key: MAPREDUCE-6528 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6528 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: MAPREDUCE-6528.patch > > > We meet memory leak issues for JHS in a large cluster which is caused by code > below doesn't release FSDataInputStream in exception case. MAPREDUCE-6273 > should fix most cases that exceptions get thrown. However, we still need to > fix the memory leak for occasional case. > {code} > private String getJobSummary(FileContext fc, Path path) throws IOException { > Path qPath = fc.makeQualified(path); > FSDataInputStream in = fc.open(qPath); > String jobSummaryString = in.readUTF(); > in.close(); > return jobSummaryString; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6528) Memory leak for HistoryFileManager.getJobSummary()
[ https://issues.apache.org/jira/browse/MAPREDUCE-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6528: -- Attachment: MAPREDUCE-6528.patch Attach a simple patch to fix it. The fix is quite straightforward, so no need for unit test. > Memory leak for HistoryFileManager.getJobSummary() > -- > > Key: MAPREDUCE-6528 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6528 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: MAPREDUCE-6528.patch > > > We meet memory leak issues for JHS in a large cluster which is caused by code > below doesn't release FSDataInputStream in exception case. MAPREDUCE-6273 > should fix most cases that exceptions get thrown. However, we still need to > fix the memory leak for occasional case. > {code} > private String getJobSummary(FileContext fc, Path path) throws IOException { > Path qPath = fc.makeQualified(path); > FSDataInputStream in = fc.open(qPath); > String jobSummaryString = in.readUTF(); > in.close(); > return jobSummaryString; > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6528) Memory leak for HistoryFileManager.getJobSummary()
Junping Du created MAPREDUCE-6528: - Summary: Memory leak for HistoryFileManager.getJobSummary() Key: MAPREDUCE-6528 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6528 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: Junping Du Assignee: Junping Du Priority: Critical We meet memory leak issues for JHS in a large cluster which is caused by code below doesn't release FSDataInputStream in exception case. MAPREDUCE-6273 should fix most cases that exceptions get thrown. However, we still need to fix the memory leak for occasional case. {code} private String getJobSummary(FileContext fc, Path path) throws IOException { Path qPath = fc.makeQualified(path); FSDataInputStream in = fc.open(qPath); String jobSummaryString = in.readUTF(); in.close(); return jobSummaryString; } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
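The leak in the getJobSummary() snippet above is that in.close() is skipped whenever readUTF() throws, so the FSDataInputStream is never released. The fix direction discussed on this issue is to close the stream in a finally block; a minimal sketch of that pattern, using plain java.io streams in place of Hadoop's FileContext/FSDataInputStream:

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Sketch of the close-in-finally pattern discussed on this issue,
// with java.io standing in for Hadoop's FileContext/FSDataInputStream.
class JobSummaryReadSketch {
    // The stream is closed on both the normal and the exception path,
    // so nothing is leaked when readUTF() throws.
    static String readSummary(String path) throws IOException {
        DataInputStream in = new DataInputStream(new FileInputStream(path));
        try {
            return in.readUTF();
        } finally {
            in.close(); // runs even if readUTF() threw
        }
    }
}
```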
[jira] [Updated] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5485: -- Status: Patch Available (was: Open) > Allow repeating job commit by extending OutputCommitter API > --- > > Key: MAPREDUCE-5485 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.1.0-beta >Reporter: Nemon Lou >Assignee: Junping Du > Attachments: MAPREDUCE-5485-demo.patch > > > There are chances MRAppMaster crush during job committing,or NodeManager > restart cause the committing AM exit due to container expire.In these cases > ,the job will fail. > However,some jobs can redo commit so failing the job becomes unnecessary. > Let clients tell AM to allow redo commit or not is a better choice. > This idea comes from Jason Lowe's comments in MAPREDUCE-4819 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (MAPREDUCE-6201) TestNetworkedJob fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved MAPREDUCE-6201. --- Resolution: Cannot Reproduce > TestNetworkedJob fails on trunk > --- > > Key: MAPREDUCE-6201 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6201 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Robert Kanter >Assignee: Brahma Reddy Battula > > Currently, {{TestNetworkedJob}} is failing on trunk: > {noformat} > Running org.apache.hadoop.mapred.TestNetworkedJob > Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 215.01 sec > <<< FAILURE! - in org.apache.hadoop.mapred.TestNetworkedJob > testNetworkedJob(org.apache.hadoop.mapred.TestNetworkedJob) Time elapsed: > 67.363 sec <<< FAILURE! > java.lang.AssertionError: expected:<0> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.mapred.TestNetworkedJob.testNetworkedJob(TestNetworkedJob.java:195) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6201) TestNetworkedJob fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971415#comment-14971415 ] Junping Du commented on MAPREDUCE-6201: --- Oh. My bad... I will roll back as Cannot Reproduce. Thanks for pointing that out, [~brahmareddy]! > TestNetworkedJob fails on trunk > --- > > Key: MAPREDUCE-6201 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6201 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Robert Kanter >Assignee: Brahma Reddy Battula > > Currently, {{TestNetworkedJob}} is failing on trunk: > {noformat} > Running org.apache.hadoop.mapred.TestNetworkedJob > Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 215.01 sec > <<< FAILURE! - in org.apache.hadoop.mapred.TestNetworkedJob > testNetworkedJob(org.apache.hadoop.mapred.TestNetworkedJob) Time elapsed: > 67.363 sec <<< FAILURE! > java.lang.AssertionError: expected:<0> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.mapred.TestNetworkedJob.testNetworkedJob(TestNetworkedJob.java:195) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (MAPREDUCE-6201) TestNetworkedJob fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reopened MAPREDUCE-6201: --- > TestNetworkedJob fails on trunk > --- > > Key: MAPREDUCE-6201 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6201 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Robert Kanter >Assignee: Brahma Reddy Battula > > Currently, {{TestNetworkedJob}} is failing on trunk: > {noformat} > Running org.apache.hadoop.mapred.TestNetworkedJob > Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 215.01 sec > <<< FAILURE! - in org.apache.hadoop.mapred.TestNetworkedJob > testNetworkedJob(org.apache.hadoop.mapred.TestNetworkedJob) Time elapsed: > 67.363 sec <<< FAILURE! > java.lang.AssertionError: expected:<0> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.mapred.TestNetworkedJob.testNetworkedJob(TestNetworkedJob.java:195) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6508) TestNetworkedJob fails consistently due to delegation token changes on RM.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6508: -- Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) I have committed the latest patch to trunk and branch-2. Thanks [~ajisakaa] for delivering the patch! > TestNetworkedJob fails consistently due to delegation token changes on RM. > -- > > Key: MAPREDUCE-6508 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6508 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Reporter: Rohith Sharma K S >Assignee: Akira AJISAKA > Fix For: 2.8.0 > > Attachments: MAPREDUCE-6508.00.patch, MAPREDUCE-6508.01.patch > > > {noformat} > Running org.apache.hadoop.mapred.TestNetworkedJob > Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 84.215 sec > <<< FAILURE! - in org.apache.hadoop.mapred.TestNetworkedJob > testNetworkedJob(org.apache.hadoop.mapred.TestNetworkedJob) Time elapsed: > 31.537 sec <<< ERROR! 
> java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: > java.io.IOException: Delegation Token can be issued only with kerberos > authentication > at > org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getDelegationToken(ClientRMService.java:1044) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getDelegationToken(ApplicationClientProtocolPBServiceImpl.java:325) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:483) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2236) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2232) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2230) > Caused by: java.io.IOException: Delegation Token can be issued only with > kerberos authentication > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getDelegationToken(ClientRMService.java:1017) > ... 
10 more > at org.apache.hadoop.ipc.Client.call(Client.java:1448) > at org.apache.hadoop.ipc.Client.call(Client.java:1379) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) > at com.sun.proxy.$Proxy84.getDelegationToken(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getDelegationToken(ApplicationClientProtocolPBClientImpl.java:339) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:251) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) > at com.sun.proxy.$Proxy85.getDelegationToken(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getRMDelegationToken(YarnClientImpl.java:541) > at > org.apache.hadoop.mapred.ResourceMgrDelegate.getDelegationToken(ResourceMgrDelegate.java:177) > at > org.apache.hadoop.mapred.YARNRunner.getDelegationToken(YARNRunner.java:231) > at > org.apache.hadoop.mapreduce.Cluster.getDelegationToken(Cluster.java:401) > at org.apache.hadoop.mapred.JobClient$16.run(JobClient.java:1234) > at org.apache.hadoop.mapred.JobClient$16.run(JobClient.java:1231) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667) > at > org.apache.hadoop.mapred.JobClient.getDelegationToken(JobClient.java:1230) > at > org.apache.hadoop.mapred.TestNetworkedJob.testNetworkedJob(TestNetworkedJob.java:260) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6508) TestNetworkedJob fails consistently due to delegation token changes on RM.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6508: -- Summary: TestNetworkedJob fails consistently due to delegation token changes on RM. (was: TestNetworkedJob fails intermittently) > TestNetworkedJob fails consistently due to delegation token changes on RM. > -- > > Key: MAPREDUCE-6508 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6508 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Reporter: Rohith Sharma K S >Assignee: Akira AJISAKA > Attachments: MAPREDUCE-6508.00.patch, MAPREDUCE-6508.01.patch > > > {noformat} > Running org.apache.hadoop.mapred.TestNetworkedJob > Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 84.215 sec > <<< FAILURE! - in org.apache.hadoop.mapred.TestNetworkedJob > testNetworkedJob(org.apache.hadoop.mapred.TestNetworkedJob) Time elapsed: > 31.537 sec <<< ERROR! > java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: > java.io.IOException: Delegation Token can be issued only with kerberos > authentication > at > org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getDelegationToken(ClientRMService.java:1044) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getDelegationToken(ApplicationClientProtocolPBServiceImpl.java:325) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:483) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2236) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2232) > at java.security.AccessController.doPrivileged(Native Method) > at 
javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2230) > Caused by: java.io.IOException: Delegation Token can be issued only with > kerberos authentication > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getDelegationToken(ClientRMService.java:1017) > ... 10 more > at org.apache.hadoop.ipc.Client.call(Client.java:1448) > at org.apache.hadoop.ipc.Client.call(Client.java:1379) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) > at com.sun.proxy.$Proxy84.getDelegationToken(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getDelegationToken(ApplicationClientProtocolPBClientImpl.java:339) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:251) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) > at com.sun.proxy.$Proxy85.getDelegationToken(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getRMDelegationToken(YarnClientImpl.java:541) > at > org.apache.hadoop.mapred.ResourceMgrDelegate.getDelegationToken(ResourceMgrDelegate.java:177) > at > org.apache.hadoop.mapred.YARNRunner.getDelegationToken(YARNRunner.java:231) > at > org.apache.hadoop.mapreduce.Cluster.getDelegationToken(Cluster.java:401) > at org.apache.hadoop.mapred.JobClient$16.run(JobClient.java:1234) > at org.apache.hadoop.mapred.JobClient$16.run(JobClient.java:1231) > at java.security.AccessController.doPrivileged(Native Method) > at 
javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667) > at > org.apache.hadoop.mapred.JobClient.getDelegationToken(JobClient.java:1230) > at > org.apache.hadoop.mapred.TestNetworkedJob.testNetworkedJob(TestNetworkedJob.java:260) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
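The root cause of the error above is a simple server-side guard: the RM refuses to issue a delegation token unless the caller authenticated via Kerberos, and the MiniMRClientCluster used by the test runs with simple authentication. A minimal self-contained sketch of that guard (illustrative only — names are invented, this is not the actual ClientRMService code):

```java
import java.io.IOException;

// Simplified model of the RM-side check that produces
// "Delegation Token can be issued only with kerberos authentication".
// The real check lives in ClientRMService#getDelegationToken; the names
// here are illustrative.
public class DelegationTokenGuard {

    public enum AuthMethod { SIMPLE, KERBEROS, TOKEN }

    // Only a kerberos-authenticated caller may be issued a delegation
    // token; every other authentication method is rejected.
    public static void checkCanIssueToken(AuthMethod method) throws IOException {
        if (method != AuthMethod.KERBEROS) {
            throw new IOException(
                "Delegation Token can be issued only with kerberos authentication");
        }
    }

    public static void main(String[] args) {
        try {
            checkCanIssueToken(AuthMethod.SIMPLE);  // what the test effectively does
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Since the mini cluster cannot easily be switched to Kerberos, the test's getDelegationToken call will hit this rejection every time, which is why the failure is consistent rather than intermittent.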
[jira] [Resolved] (MAPREDUCE-6201) TestNetworkedJob fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du resolved MAPREDUCE-6201. --- Resolution: Duplicate > TestNetworkedJob fails on trunk > --- > > Key: MAPREDUCE-6201 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6201 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Robert Kanter >Assignee: Brahma Reddy Battula > > Currently, {{TestNetworkedJob}} is failing on trunk: > {noformat} > Running org.apache.hadoop.mapred.TestNetworkedJob > Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 215.01 sec > <<< FAILURE! - in org.apache.hadoop.mapred.TestNetworkedJob > testNetworkedJob(org.apache.hadoop.mapred.TestNetworkedJob) Time elapsed: > 67.363 sec <<< FAILURE! > java.lang.AssertionError: expected:<0> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.mapred.TestNetworkedJob.testNetworkedJob(TestNetworkedJob.java:195) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (MAPREDUCE-6201) TestNetworkedJob fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du reopened MAPREDUCE-6201: --- I believe TestNetworkedJob still fails consistently on trunk, so we shouldn't close this JIRA as "Cannot Reproduce". We should resolve it as a duplicate of MAPREDUCE-6508, which already has a patch ready to go. > TestNetworkedJob fails on trunk > --- > > Key: MAPREDUCE-6201 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6201 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Robert Kanter >Assignee: Brahma Reddy Battula > > Currently, {{TestNetworkedJob}} is failing on trunk: > {noformat} > Running org.apache.hadoop.mapred.TestNetworkedJob > Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 215.01 sec > <<< FAILURE! - in org.apache.hadoop.mapred.TestNetworkedJob > testNetworkedJob(org.apache.hadoop.mapred.TestNetworkedJob) Time elapsed: > 67.363 sec <<< FAILURE! > java.lang.AssertionError: expected:<0> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.mapred.TestNetworkedJob.testNetworkedJob(TestNetworkedJob.java:195) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6508) TestNetworkedJob fails intermittently
[ https://issues.apache.org/jira/browse/MAPREDUCE-6508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971204#comment-14971204 ] Junping Du commented on MAPREDUCE-6508: --- Thanks for your reply, [~ajisakaa]! I agree it is right to remove the code in the unit test that makes the test fail consistently. The "intermittently" in the JIRA title is a little misleading, and I will correct it soon. The latest patch (01) LGTM. +1. I will commit it shortly. > TestNetworkedJob fails intermittently > - > > Key: MAPREDUCE-6508 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6508 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Reporter: Rohith Sharma K S >Assignee: Akira AJISAKA > Attachments: MAPREDUCE-6508.00.patch, MAPREDUCE-6508.01.patch > > > {noformat} > Running org.apache.hadoop.mapred.TestNetworkedJob > Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 84.215 sec > <<< FAILURE! - in org.apache.hadoop.mapred.TestNetworkedJob > testNetworkedJob(org.apache.hadoop.mapred.TestNetworkedJob) Time elapsed: > 31.537 sec <<< ERROR! 
> java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: > java.io.IOException: Delegation Token can be issued only with kerberos > authentication > at > org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getDelegationToken(ClientRMService.java:1044) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getDelegationToken(ApplicationClientProtocolPBServiceImpl.java:325) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:483) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2236) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2232) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2230) > Caused by: java.io.IOException: Delegation Token can be issued only with > kerberos authentication > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getDelegationToken(ClientRMService.java:1017) > ... 
10 more > at org.apache.hadoop.ipc.Client.call(Client.java:1448) > at org.apache.hadoop.ipc.Client.call(Client.java:1379) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) > at com.sun.proxy.$Proxy84.getDelegationToken(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getDelegationToken(ApplicationClientProtocolPBClientImpl.java:339) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:251) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) > at com.sun.proxy.$Proxy85.getDelegationToken(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getRMDelegationToken(YarnClientImpl.java:541) > at > org.apache.hadoop.mapred.ResourceMgrDelegate.getDelegationToken(ResourceMgrDelegate.java:177) > at > org.apache.hadoop.mapred.YARNRunner.getDelegationToken(YARNRunner.java:231) > at > org.apache.hadoop.mapreduce.Cluster.getDelegationToken(Cluster.java:401) > at org.apache.hadoop.mapred.JobClient$16.run(JobClient.java:1234) > at org.apache.hadoop.mapred.JobClient$16.run(JobClient.java:1231) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667) > at > org.apache.hadoop.mapred.JobClient.getDelegationToken(JobClient.java:1230) > at > org.apache.hadoop.mapred.TestNetworkedJob.testNetworkedJob(TestNetworkedJob.java:260) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6449) MR Code should not throw and catch YarnRuntimeException to communicate internal exceptions
[ https://issues.apache.org/jira/browse/MAPREDUCE-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14971033#comment-14971033 ] Junping Du commented on MAPREDUCE-6449: --- Does this patch break our MR rolling-upgrade story, in which old and new MR jobs can coexist in a single cluster (during the upgrade)? At least, the changes on the HS side sound like they would. If so, I would be very concerned about this. > MR Code should not throw and catch YarnRuntimeException to communicate > internal exceptions > -- > > Key: MAPREDUCE-6449 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6449 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Neelesh Srinivas Salian > Labels: mapreduce > Attachments: MAPREDUCE-6449.001.patch, MAPREDUCE-6449.002.patch, > MAPREDUCE-6499-prelim.patch > > > In discussion of MAPREDUCE-6439 we discussed how throwing and catching > YarnRuntimeException in MR code is incorrect and we should instead use some > MR specific exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6201) TestNetworkedJob fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970991#comment-14970991 ] Junping Du commented on MAPREDUCE-6201: --- MAPREDUCE-6508 is still open to track failures of TestNetworkedJob. I think it still fails intermittently, doesn't it? > TestNetworkedJob fails on trunk > --- > > Key: MAPREDUCE-6201 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6201 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Robert Kanter >Assignee: Brahma Reddy Battula > > Currently, {{TestNetworkedJob}} is failing on trunk: > {noformat} > Running org.apache.hadoop.mapred.TestNetworkedJob > Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 215.01 sec > <<< FAILURE! - in org.apache.hadoop.mapred.TestNetworkedJob > testNetworkedJob(org.apache.hadoop.mapred.TestNetworkedJob) Time elapsed: > 67.363 sec <<< FAILURE! > java.lang.AssertionError: expected:<0> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.mapred.TestNetworkedJob.testNetworkedJob(TestNetworkedJob.java:195) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6508) TestNetworkedJob fails intermittently
[ https://issues.apache.org/jira/browse/MAPREDUCE-6508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970973#comment-14970973 ] Junping Du commented on MAPREDUCE-6508: --- Hi [~ajisakaa], thanks for the patch. However, have we figured out why the job fails intermittently? Maybe we should try to understand and fix the test problem rather than remove the test code? > TestNetworkedJob fails intermittently > - > > Key: MAPREDUCE-6508 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6508 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Reporter: Rohith Sharma K S >Assignee: Akira AJISAKA > Attachments: MAPREDUCE-6508.00.patch, MAPREDUCE-6508.01.patch > > > {noformat} > Running org.apache.hadoop.mapred.TestNetworkedJob > Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 84.215 sec > <<< FAILURE! - in org.apache.hadoop.mapred.TestNetworkedJob > testNetworkedJob(org.apache.hadoop.mapred.TestNetworkedJob) Time elapsed: > 31.537 sec <<< ERROR! 
> java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: > java.io.IOException: Delegation Token can be issued only with kerberos > authentication > at > org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getDelegationToken(ClientRMService.java:1044) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getDelegationToken(ApplicationClientProtocolPBServiceImpl.java:325) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:483) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:976) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2236) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2232) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2230) > Caused by: java.io.IOException: Delegation Token can be issued only with > kerberos authentication > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getDelegationToken(ClientRMService.java:1017) > ... 
10 more > at org.apache.hadoop.ipc.Client.call(Client.java:1448) > at org.apache.hadoop.ipc.Client.call(Client.java:1379) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) > at com.sun.proxy.$Proxy84.getDelegationToken(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getDelegationToken(ApplicationClientProtocolPBClientImpl.java:339) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:251) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103) > at com.sun.proxy.$Proxy85.getDelegationToken(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getRMDelegationToken(YarnClientImpl.java:541) > at > org.apache.hadoop.mapred.ResourceMgrDelegate.getDelegationToken(ResourceMgrDelegate.java:177) > at > org.apache.hadoop.mapred.YARNRunner.getDelegationToken(YARNRunner.java:231) > at > org.apache.hadoop.mapreduce.Cluster.getDelegationToken(Cluster.java:401) > at org.apache.hadoop.mapred.JobClient$16.run(JobClient.java:1234) > at org.apache.hadoop.mapred.JobClient$16.run(JobClient.java:1231) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1667) > at > org.apache.hadoop.mapred.JobClient.getDelegationToken(JobClient.java:1230) > at > org.apache.hadoop.mapred.TestNetworkedJob.testNetworkedJob(TestNetworkedJob.java:260) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5485: -- Attachment: MAPREDUCE-5485-demo.patch Uploading a demo patch first; more unit tests will be added later. BTW, it adopts some code from MAPREDUCE-5718, which has a similar purpose, so please share the credit with the contributors of MAPREDUCE-5718 if we commit the follow-up patches of this JIRA in the future. > Allow repeating job commit by extending OutputCommitter API > --- > > Key: MAPREDUCE-5485 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.1.0-beta >Reporter: Nemon Lou >Assignee: Junping Du > Attachments: MAPREDUCE-5485-demo.patch > > > There are chances MRAppMaster crush during job committing,or NodeManager > restart cause the committing AM exit due to container expire.In these cases > ,the job will fail. > However,some jobs can redo commit so failing the job becomes unnecessary. > Let clients tell AM to allow redo commit or not is a better choice. > This idea comes from Jason Lowe's comments in MAPREDUCE-4819 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
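The shape of the proposal is roughly: on AM restart, a job commit that may already have been in flight is retried only when the committer declares commit repeatable; otherwise the job fails, as today. The self-contained sketch below models that decision — interface and method names are hypothetical, chosen for illustration, and do not reproduce the patch's actual OutputCommitter API:

```java
// Hypothetical, self-contained sketch of "repeatable commit" recovery
// logic in the AM. Names are illustrative, not the real MR classes.
public class RepeatableCommit {

    // Stand-in for the extended OutputCommitter API the JIRA proposes.
    interface Committer {
        boolean isCommitJobRepeatable();
        void commitJob() throws Exception;
    }

    enum Decision { COMMIT, FAIL_JOB }

    // On AM recovery: if a previous attempt may have started committing,
    // only retry the commit when the committer says it is repeatable.
    static Decision onRecovery(boolean commitWasInProgress, Committer c) {
        if (commitWasInProgress && !c.isCommitJobRepeatable()) {
            return Decision.FAIL_JOB;  // current behavior: fail the job
        }
        return Decision.COMMIT;        // safe to (re)run commitJob()
    }
}
```

The point of letting the committer (and thus the client) decide is that some output formats can re-run commit idempotently, so failing the whole job on an AM crash mid-commit is unnecessarily conservative for them.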
[jira] [Commented] (MAPREDUCE-5485) Allow repeating job commit by extending OutputCommitter API
[ https://issues.apache.org/jira/browse/MAPREDUCE-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955012#comment-14955012 ] Junping Du commented on MAPREDUCE-5485: --- The proposal above sounds good to me. [~nemon], thanks for filing this JIRA, which is quite useful in some scenarios. If you don't mind, I would like to work on it and move it forward. Thanks! > Allow repeating job commit by extending OutputCommitter API > --- > > Key: MAPREDUCE-5485 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5485 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.1.0-beta >Reporter: Nemon Lou > > There are chances MRAppMaster crush during job committing,or NodeManager > restart cause the committing AM exit due to container expire.In these cases > ,the job will fail. > However,some jobs can redo commit so failing the job becomes unnecessary. > Let clients tell AM to allow redo commit or not is a better choice. > This idea comes from Jason Lowe's comments in MAPREDUCE-4819 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6478) Add an option to skip cleanupJob stage or ignore cleanup failure during commitJob().
[ https://issues.apache.org/jira/browse/MAPREDUCE-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876054#comment-14876054 ] Junping Du commented on MAPREDUCE-6478: --- Thanks [~leftnoteasy] for review and commit! > Add an option to skip cleanupJob stage or ignore cleanup failure during > commitJob(). > > > Key: MAPREDUCE-6478 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6478 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Junping Du >Assignee: Junping Du > Fix For: 2.8.0 > > Attachments: MAPREDUCE-6478-v1.1.patch, MAPREDUCE-6478-v1.patch > > > In some of our test cases for MR on public cloud scenario, a very big MR job > with hundreds or thousands of reducers cannot finish successfully because of > Job Cleanup failures which is caused by different scale/performance impact > for File System on the cloud (like AzureFS) which replacing HDFS's deletion > for whole directory with REST API calls on deleting each sub-directories > recursively. Even it get successfully, that could take much longer time > (hours) which is not necessary and waste time/resources especially in public > cloud scenario. > In these scenarios, some failures of cleanupJob can be ignored or user choose > to skip cleanupJob() completely make more sense. This is because making whole > job finish successfully with side effect of wasting some user spaces is much > better as user's jobs are usually comes and goes in public cloud, so have > choices to tolerant some temporary files exists with get rid of big job > re-run (or saving job's running time) is quite effective in time/resource > cost. > We should allow user to have this option (ignore failure or skip job cleanup > stage completely) especially when user know the cleanup failure is not due to > HDFS abnormal status but other FS' different performance trade-off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6478) Add an option to skip cleanupJob stage or ignore cleanup failure during commitJob().
[ https://issues.apache.org/jira/browse/MAPREDUCE-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6478: -- Attachment: MAPREDUCE-6478-v1.1.patch Fixed the whitespace issue in the v1.1 patch. > Add an option to skip cleanupJob stage or ignore cleanup failure during > commitJob(). > > > Key: MAPREDUCE-6478 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6478 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Junping Du >Assignee: Junping Du > Attachments: MAPREDUCE-6478-v1.1.patch, MAPREDUCE-6478-v1.patch > > > In some of our test cases for MR on public cloud scenario, a very big MR job > with hundreds or thousands of reducers cannot finish successfully because of > Job Cleanup failures which is caused by different scale/performance impact > for File System on the cloud (like AzureFS) which replacing HDFS's deletion > for whole directory with REST API calls on deleting each sub-directories > recursively. Even it get successfully, that could take much longer time > (hours) which is not necessary and waste time/resources especially in public > cloud scenario. > In these scenarios, some failures of cleanupJob can be ignored or user choose > to skip cleanupJob() completely make more sense. This is because making whole > job finish successfully with side effect of wasting some user spaces is much > better as user's jobs are usually comes and goes in public cloud, so have > choices to tolerant some temporary files exists with get rid of big job > re-run (or saving job's running time) is quite effective in time/resource > cost. > We should allow user to have this option (ignore failure or skip job cleanup > stage completely) especially when user know the cleanup failure is not due to > HDFS abnormal status but other FS' different performance trade-off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6478) Add an option to skip cleanupJob stage or ignore cleanup failure during commitJob().
[ https://issues.apache.org/jira/browse/MAPREDUCE-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6478: -- Status: Patch Available (was: Open) > Add an option to skip cleanupJob stage or ignore cleanup failure during > commitJob(). > > > Key: MAPREDUCE-6478 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6478 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Junping Du >Assignee: Junping Du > Attachments: MAPREDUCE-6478-v1.patch > > > In some of our test cases for MR on public cloud scenario, a very big MR job > with hundreds or thousands of reducers cannot finish successfully because of > Job Cleanup failures which is caused by different scale/performance impact > for File System on the cloud (like AzureFS) which replacing HDFS's deletion > for whole directory with REST API calls on deleting each sub-directories > recursively. Even it get successfully, that could take much longer time > (hours) which is not necessary and waste time/resources especially in public > cloud scenario. > In these scenarios, some failures of cleanupJob can be ignored or user choose > to skip cleanupJob() completely make more sense. This is because making whole > job finish successfully with side effect of wasting some user spaces is much > better as user's jobs are usually comes and goes in public cloud, so have > choices to tolerant some temporary files exists with get rid of big job > re-run (or saving job's running time) is quite effective in time/resource > cost. > We should allow user to have this option (ignore failure or skip job cleanup > stage completely) especially when user know the cleanup failure is not due to > HDFS abnormal status but other FS' different performance trade-off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6478) Add an option to skip cleanupJob stage or ignore cleanup failure during commitJob().
[ https://issues.apache.org/jira/browse/MAPREDUCE-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6478: -- Attachment: MAPREDUCE-6478-v1.patch Putting up a quick patch that adds two configurations allowing the user to skip cleanupJob or to ignore cleanupJob failures. It is quite straightforward, so a unit test is unnecessary here. > Add an option to skip cleanupJob stage or ignore cleanup failure during > commitJob(). > > > Key: MAPREDUCE-6478 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6478 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Junping Du >Assignee: Junping Du > Attachments: MAPREDUCE-6478-v1.patch > > > In some of our test cases for MR on public cloud scenario, a very big MR job > with hundreds or thousands of reducers cannot finish successfully because of > Job Cleanup failures which is caused by different scale/performance impact > for File System on the cloud (like AzureFS) which replacing HDFS's deletion > for whole directory with REST API calls on deleting each sub-directories > recursively. Even it get successfully, that could take much longer time > (hours) which is not necessary and waste time/resources especially in public > cloud scenario. > In these scenarios, some failures of cleanupJob can be ignored or user choose > to skip cleanupJob() completely make more sense. This is because making whole > job finish successfully with side effect of wasting some user spaces is much > better as user's jobs are usually comes and goes in public cloud, so have > choices to tolerant some temporary files exists with get rid of big job > re-run (or saving job's running time) is quite effective in time/resource > cost. > We should allow user to have this option (ignore failure or skip job cleanup > stage completely) especially when user know the cleanup failure is not due to > HDFS abnormal status but other FS' different performance trade-off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
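The two switches described in this JIRA could be wired into commitJob() roughly as below. This is an illustrative, self-contained sketch rather than the attached patch itself: the flag names and the Runnable stand-in for the temp-directory delete are invented for the example.

```java
import java.io.IOException;

// Illustrative sketch of commitJob() honoring two switches: skip the
// cleanup stage entirely, or run it but tolerate its failure. The flag
// names are invented; see the attached patch for the real config keys.
public class CommitWithOptionalCleanup {

    private final boolean skipCleanup;
    private final boolean ignoreCleanupFailure;
    private final Runnable cleanup;   // stands in for deleting the temp dir

    public CommitWithOptionalCleanup(boolean skipCleanup,
                                     boolean ignoreCleanupFailure,
                                     Runnable cleanup) {
        this.skipCleanup = skipCleanup;
        this.ignoreCleanupFailure = ignoreCleanupFailure;
        this.cleanup = cleanup;
    }

    // Returns true when the job commit succeeds.
    public boolean commitJob() throws IOException {
        // ... move outputs into place, write the _SUCCESS marker, etc. ...
        if (!skipCleanup) {
            try {
                cleanup.run();
            } catch (RuntimeException e) {
                if (!ignoreCleanupFailure) {
                    throw new IOException("cleanupJob failed", e);
                }
                // Cleanup failed but the user chose to tolerate it:
                // temp files may be left behind, yet the job still succeeds.
            }
        }
        return true;
    }
}
```

This captures the trade-off argued in the description: on a file system where recursive delete is slow or flaky (e.g. a cloud FS issuing per-directory REST calls), leaving some temporary files behind can be far cheaper than failing and re-running a large job.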
[jira] [Updated] (MAPREDUCE-6478) Add an option to skip cleanupJob stage or ignore cleanup failure during commitJob().
[ https://issues.apache.org/jira/browse/MAPREDUCE-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6478: -- Description: In some of our test cases for MR on public cloud scenario, a very big MR job with hundreds or thousands of reducers cannot finish successfully because of Job Cleanup failures which is caused by different scale/performance impact for File System on the cloud (like AzureFS) which replacing HDFS's deletion for whole directory with REST API calls on deleting each sub-directories recursively. Even it get successfully, that could take much longer time (hours) which is not necessary and waste time/resources especially in public cloud scenario. In these scenarios, some failures of cleanupJob can be ignored or user choose to skip cleanupJob() completely make more sense. This is because making whole job finish successfully with side effect of wasting some user spaces is much better as user's jobs are usually comes and goes in public cloud, so have choices to tolerant some temporary files exists with get rid of big job re-run (or saving job's running time) is quite effective in time/resource cost. We should allow user to have this option (ignore failure or skip job cleanup stage completely) especially when user know the cleanup failure is not due to HDFS abnormal status but other FS' different performance trade-off. was: In some our test cases for MR on public cloud scenario, a very big MR job with hundreds or thousands of reducers cannot finish successfully because of Job Cleanup failures which is caused by different scale/performance impact for File System on the cloud (like AzureFS) which replacing HDFS's deletion for whole directory with REST API calls on deleting each sub-directories recursively. That could take much longer time (hours) which is not necessary in public cloud scenario. 
Also, it also more easily to get failed to cleanup in these cases, so some failures of cleanupJob can be ignored in this case. Making whole job finish successfully with side effect of wasting some user spaces make more sense in these cases as user's job is usually comes and goes in public cloud, so have a trade off to tolerant some temporary files exists with get rid of big job re-run is quite cost effective. We should allow user to have this option (ignore failure or skip job cleanup stage completely) especially when user know the cleanup failure is not due to HDFS abnormal status but other FS' different performance trade-off. > Add an option to skip cleanupJob stage or ignore cleanup failure during > commitJob(). > > > Key: MAPREDUCE-6478 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6478 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Junping Du >Assignee: Junping Du > > In some of our test cases for MR on public cloud scenario, a very big MR job > with hundreds or thousands of reducers cannot finish successfully because of > Job Cleanup failures which is caused by different scale/performance impact > for File System on the cloud (like AzureFS) which replacing HDFS's deletion > for whole directory with REST API calls on deleting each sub-directories > recursively. Even it get successfully, that could take much longer time > (hours) which is not necessary and waste time/resources especially in public > cloud scenario. > In these scenarios, some failures of cleanupJob can be ignored or user choose > to skip cleanupJob() completely make more sense. This is because making whole > job finish successfully with side effect of wasting some user spaces is much > better as user's jobs are usually comes and goes in public cloud, so have > choices to tolerant some temporary files exists with get rid of big job > re-run (or saving job's running time) is quite effective in time/resource > cost. 
> We should allow user to have this option (ignore failure or skip job cleanup > stage completely) especially when user know the cleanup failure is not due to > HDFS abnormal status but other FS' different performance trade-off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6478) Add an option to skip cleanupJob stage or ignore cleanup failure during commitJob().
Junping Du created MAPREDUCE-6478: - Summary: Add an option to skip cleanupJob stage or ignore cleanup failure during commitJob(). Key: MAPREDUCE-6478 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6478 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Junping Du Assignee: Junping Du In some our test cases for MR on public cloud scenario, a very big MR job with hundreds or thousands of reducers cannot finish successfully because of Job Cleanup failures which is caused by different scale/performance impact for File System on the cloud (like AzureFS) which replacing HDFS's deletion for whole directory with REST API calls on deleting each sub-directories recursively. That could take much longer time (hours) which is not necessary in public cloud scenario. Also, it also more easily to get failed to cleanup in these cases, so some failures of cleanupJob can be ignored in this case. Making whole job finish successfully with side effect of wasting some user spaces make more sense in these cases as user's job is usually comes and goes in public cloud, so have a trade off to tolerant some temporary files exists with get rid of big job re-run is quite cost effective. We should allow user to have this option (ignore failure or skip job cleanup stage completely) especially when user know the cleanup failure is not due to HDFS abnormal status but other FS' different performance trade-off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6458) Figure out the way to pass build-in classpath (files in distributed cache, etc.) from parent to spawned shells
[ https://issues.apache.org/jira/browse/MAPREDUCE-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706488#comment-14706488 ] Junping Du commented on MAPREDUCE-6458: --- bq. Re-assigning this to me and updating the description to reflect reality, since I actually understand how bash works. Please feel free to take it if you have bandwidth to work on it immediately. > Figure out the way to pass build-in classpath (files in distributed cache, > etc.) from parent to spawned shells > -- > > Key: MAPREDUCE-6458 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6458 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Allen Wittenauer > > In MAPREDUCE-6454 (target for branch-2.x), we provide a way with constraints > to pass built-in classpath from parent to child shell, via HADOOP_CLASSPATH, > so jars in distributed cache can still work in child tasks. In trunk, we may > think some way different, like: involve additional env var to safely pass > build-in classpath. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6458) Figure out the way to pass build-in classpath (files in distributed cache, etc.) from parent to spawned shells
[ https://issues.apache.org/jira/browse/MAPREDUCE-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6458: -- Description: In MAPREDUCE-6454 (target for branch-2.x), we provide a way with constraints to pass built-in classpath from parent to child shell, via HADOOP_CLASSPATH, so jars in distributed cache can still work in child tasks. In trunk, we may think some way different, like: involve additional env var to safely pass build-in classpath. (was: In MAPREDUCE-6454 (target for branch-2.x), we provide an extremely fragile way to pass built-in classpath from parent to child shell, via HADOOP_CLASSPATH, so jars in distributed cache can still work in child tasks. In trunk, we may think some way different, like: involve additional env var to safely pass build-in classpath.) > Figure out the way to pass build-in classpath (files in distributed cache, > etc.) from parent to spawned shells > -- > > Key: MAPREDUCE-6458 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6458 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Allen Wittenauer > > In MAPREDUCE-6454 (target for branch-2.x), we provide a way with constraints > to pass built-in classpath from parent to child shell, via HADOOP_CLASSPATH, > so jars in distributed cache can still work in child tasks. In trunk, we may > think some way different, like: involve additional env var to safely pass > build-in classpath. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6454) MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706044#comment-14706044 ] Junping Du commented on MAPREDUCE-6454: --- Thanks [~vinodkv] for review/commit and [~aw] for comments. bq. For trunk at least, it would probably be better to have a different var that is handled via mapreduce's shellprofile.d bit. This also sounds like a good way. Agree that we can discuss more later (on another JIRA) for trunk. bq. We will have to think more about the right-approach for trunk. Will open a separate ticket for this. +1. Just filed MAPREDUCE-6458 to address this. > MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache. > > > Key: MAPREDUCE-6454 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.7.2, 2.6.2 > > Attachments: MAPREDUCE-6454-v2.1.patch, MAPREDUCE-6454-v2.patch, > MAPREDUCE-6454-v3.1.patch, MAPREDUCE-6454-v3.patch, MAPREDUCE-6454.patch > > > We already set lib jars on distributed-cache to CLASSPATH. However, in some > corner cases (like: MR local mode, Hive Map side local join, etc.), we need > these jars on HADOOP_CLASSPATH so hadoop scripts can take it in launching > runjar process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6458) Figure out the way to pass build-in classpath (files in distributed cache, etc.) from parent to spawned shells
Junping Du created MAPREDUCE-6458: - Summary: Figure out the way to pass build-in classpath (files in distributed cache, etc.) from parent to spawned shells Key: MAPREDUCE-6458 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6458 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du In MAPREDUCE-6454 (targeted for branch-2.x), we provide a way to pass the built-in classpath from parent to child shell via HADOOP_CLASSPATH, so jars in the distributed cache can still work in child tasks. In trunk, we may take a different approach, such as introducing an additional env var to pass the built-in classpath. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
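The HADOOP_CLASSPATH hand-off referenced here boils down to extending the child process environment without clobbering what the user already exported. A minimal sketch, assuming a hypothetical helper; the real logic lives in Hadoop's launcher scripts and MRApps, not in a class like this:

```java
// Sketch of passing a parent's built-in classpath to a spawned shell via an
// environment variable: append the extra jars without clobbering any value
// the user already exported. Helper and class names are invented.
import java.util.LinkedHashMap;
import java.util.Map;

class ChildEnv {
    static Map<String, String> withClasspath(Map<String, String> parentEnv,
                                             String var, String extraJars) {
        Map<String, String> child = new LinkedHashMap<>(parentEnv);
        String existing = child.get(var);
        // preserve the user's own setting, then append the built-in entries
        child.put(var, (existing == null || existing.isEmpty())
                ? extraJars
                : existing + ":" + extraJars);
        return child;
    }
}
```

The fragility noted in the issue comes from exactly this style of hand-off: any intermediate shell that rewrites or drops the variable breaks the chain, which is why a dedicated env var was floated for trunk.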
[jira] [Updated] (MAPREDUCE-6454) MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6454: -- Attachment: MAPREDUCE-6454-v3.1.patch Fix minor issues like: javadoc warnings, checkstyle, etc. > MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache. > > > Key: MAPREDUCE-6454 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: MAPREDUCE-6454-v2.1.patch, MAPREDUCE-6454-v2.patch, > MAPREDUCE-6454-v3.1.patch, MAPREDUCE-6454-v3.patch, MAPREDUCE-6454.patch > > > We already set lib jars on distributed-cache to CLASSPATH. However, in some > corner cases (like: MR local mode, Hive Map side local join, etc.), we need > these jars on HADOOP_CLASSPATH so hadoop scripts can take it in launching > runjar process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6454) MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6454: -- Status: Patch Available (was: Open) > MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache. > > > Key: MAPREDUCE-6454 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: MAPREDUCE-6454-v2.1.patch, MAPREDUCE-6454-v2.patch, > MAPREDUCE-6454-v3.patch, MAPREDUCE-6454.patch > > > We already set lib jars on distributed-cache to CLASSPATH. However, in some > corner cases (like: MR local mode, Hive Map side local join, etc.), we need > these jars on HADOOP_CLASSPATH so hadoop scripts can take it in launching > runjar process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6454) MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6454: -- Attachment: MAPREDUCE-6454-v3.patch The previous fixes did not actually resolve the issues we hit; the v3 patch is verified to work. > MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache. > > > Key: MAPREDUCE-6454 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: MAPREDUCE-6454-v2.1.patch, MAPREDUCE-6454-v2.patch, > MAPREDUCE-6454-v3.patch, MAPREDUCE-6454.patch > > > We already set lib jars on distributed-cache to CLASSPATH. However, in some > corner cases (like: MR local mode, Hive Map side local join, etc.), we need > these jars on HADOOP_CLASSPATH so hadoop scripts can take it in launching > runjar process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6454) MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6454: -- Status: Open (was: Patch Available) > MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache. > > > Key: MAPREDUCE-6454 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: MAPREDUCE-6454-v2.1.patch, MAPREDUCE-6454-v2.patch, > MAPREDUCE-6454.patch > > > We already set lib jars on distributed-cache to CLASSPATH. However, in some > corner cases (like: MR local mode, Hive Map side local join, etc.), we need > these jars on HADOOP_CLASSPATH so hadoop scripts can take it in launching > runjar process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6454) MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6454: -- Attachment: MAPREDUCE-6454-v2.1.patch Fixed some whitespace issues in the v2.1 patch. > MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache. > > > Key: MAPREDUCE-6454 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: MAPREDUCE-6454-v2.1.patch, MAPREDUCE-6454-v2.patch, > MAPREDUCE-6454.patch > > > We already set lib jars on distributed-cache to CLASSPATH. However, in some > corner cases (like: MR local mode, Hive Map side local join, etc.), we need > these jars on HADOOP_CLASSPATH so hadoop scripts can take it in launching > runjar process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6454) MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6454: -- Status: Patch Available (was: Open) > MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache. > > > Key: MAPREDUCE-6454 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: MAPREDUCE-6454-v2.1.patch, MAPREDUCE-6454-v2.patch, > MAPREDUCE-6454.patch > > > We already set lib jars on distributed-cache to CLASSPATH. However, in some > corner cases (like: MR local mode, Hive Map side local join, etc.), we need > these jars on HADOOP_CLASSPATH so hadoop scripts can take it in launching > runjar process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6454) MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6454: -- Status: Open (was: Patch Available) > MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache. > > > Key: MAPREDUCE-6454 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: MAPREDUCE-6454-v2.patch, MAPREDUCE-6454.patch > > > We already set lib jars on distributed-cache to CLASSPATH. However, in some > corner cases (like: MR local mode, Hive Map side local join, etc.), we need > these jars on HADOOP_CLASSPATH so hadoop scripts can take it in launching > runjar process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6454) MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6454: -- Attachment: MAPREDUCE-6454-v2.patch The test failure occurs because YarnConfiguration.YARN_APPLICATION_CLASSPATH is an empty string when nothing is configured. In that case, we should fall back to YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH as the default value. Fixed in the v2 patch. > MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache. > > > Key: MAPREDUCE-6454 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: MAPREDUCE-6454-v2.patch, MAPREDUCE-6454.patch > > > We already set lib jars on distributed-cache to CLASSPATH. However, in some > corner cases (like: MR local mode, Hive Map side local join, etc.), we need > these jars on HADOOP_CLASSPATH so hadoop scripts can take it in launching > runjar process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
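The v2 fix described above can be sketched as a simple fallback: treat an unset or empty configured classpath as "use the defaults". The class and method names below are illustrative stand-ins for the YarnConfiguration constants mentioned in the comment:

```java
// Sketch of the v2 fix's logic: an unset or empty configured classpath must
// fall back to the default list rather than being used as-is. Names are
// stand-ins for YarnConfiguration.YARN_APPLICATION_CLASSPATH and
// YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH.
class ClasspathDefaults {
    static String[] resolve(String configured, String[] defaults) {
        if (configured == null || configured.trim().isEmpty()) {
            return defaults; // an empty string must not mask the defaults
        }
        return configured.split(","); // comma-separated classpath entries
    }
}
```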
[jira] [Updated] (MAPREDUCE-6454) MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6454: -- Attachment: MAPREDUCE-6454.patch MRApps already includes distributed-cache files when building its classpath environment. However, "jar" files are excluded, for a reason that no longer seems valid. This patch adds them back. > MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache. > > > Key: MAPREDUCE-6454 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: MAPREDUCE-6454.patch > > > We already set lib jars on distributed-cache to CLASSPATH. However, in some > corner cases (like: MR local mode, Hive Map side local join, etc.), we need > these jars on HADOOP_CLASSPATH so hadoop scripts can take it in launching > runjar process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
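The patch's change, dropping the old exclusion of jar files when turning distributed-cache entries into classpath entries, can be sketched as a filter with the exclusion made explicit, so before/after behavior is easy to compare. Purely illustrative; the real logic is in MRApps:

```java
// Sketch of the change: when building classpath entries from distributed
// cache files, stop excluding "jar" files. The boolean makes the old
// exclusion explicit for comparison; this is not the actual MRApps code.
import java.util.ArrayList;
import java.util.List;

class CacheClasspath {
    static List<String> entries(List<String> cacheFiles, boolean excludeJars) {
        List<String> out = new ArrayList<>();
        for (String f : cacheFiles) {
            if (excludeJars && f.endsWith(".jar")) {
                continue; // old behavior: jars from the cache silently dropped
            }
            out.add(f);
        }
        return out;
    }
}
```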
[jira] [Updated] (MAPREDUCE-6454) MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6454: -- Status: Patch Available (was: Open) > MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache. > > > Key: MAPREDUCE-6454 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: MAPREDUCE-6454.patch > > > We already set lib jars on distributed-cache to CLASSPATH. However, in some > corner cases (like: MR local mode, Hive Map side local join, etc.), we need > these jars on HADOOP_CLASSPATH so hadoop scripts can take it in launching > runjar process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MAPREDUCE-6454) MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
Junping Du created MAPREDUCE-6454: - Summary: MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache. Key: MAPREDUCE-6454 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du Priority: Critical We already set lib jars on distributed-cache to CLASSPATH. However, in some corner cases (like: MR local mode, Hive Map side local join, etc.), we need these jars on HADOOP_CLASSPATH so hadoop scripts can take it in launching runjar process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6443) Add JvmPauseMonitor to Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6443: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) > Add JvmPauseMonitor to Job History Server > - > > Key: MAPREDUCE-6443 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6443 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobhistoryserver >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Fix For: 2.8.0 > > Attachments: MAPREDUCE-6443.001.patch, MAPREDUCE-6443.002.patch > > > We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the Job History > Server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6443) Add JvmPauseMonitor to Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660053#comment-14660053 ] Junping Du commented on MAPREDUCE-6443: --- I have committed the patch to trunk and branch-2. Thanks [~rkanter] for the contribution! > Add JvmPauseMonitor to Job History Server > - > > Key: MAPREDUCE-6443 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6443 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobhistoryserver >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6443.001.patch, MAPREDUCE-6443.002.patch > > > We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the Job History > Server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6443) Add JvmPauseMonitor to Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660017#comment-14660017 ] Junping Du commented on MAPREDUCE-6443: --- +1. 002 patch LGTM. Will commit it shortly. > Add JvmPauseMonitor to Job History Server > - > > Key: MAPREDUCE-6443 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6443 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobhistoryserver >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6443.001.patch, MAPREDUCE-6443.002.patch > > > We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the Job History > Server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6443) Add JvmPauseMonitor to Job History Server
[ https://issues.apache.org/jira/browse/MAPREDUCE-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14655241#comment-14655241 ] Junping Du commented on MAPREDUCE-6443: --- Thanks for the patch, Robert. A comment similar to the one on YARN-4019: can we move the initialization of pauseMonitor into serviceInit() and leave only the start work in serviceStart()? > Add JvmPauseMonitor to Job History Server > - > > Key: MAPREDUCE-6443 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6443 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobhistoryserver >Affects Versions: 2.8.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Attachments: MAPREDUCE-6443.001.patch > > > We should add the {{JvmPauseMonitor}} from HADOOP-9618 to the Job History > Server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
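The lifecycle split requested in the review follows Hadoop's service pattern: allocate in serviceInit(), start in serviceStart(). A minimal stand-in showing that split, assuming a stubbed monitor; this is not the actual JobHistoryServer or JvmPauseMonitor code:

```java
// Minimal stand-in for the requested change: construct the pause monitor in
// serviceInit() and keep only the start call in serviceStart(), mirroring
// Hadoop's AbstractService lifecycle. Class names are invented stubs.
class PauseMonitorService {
    static class JvmPauseMonitorStub {
        boolean started;
        void start() { started = true; }
    }

    JvmPauseMonitorStub pauseMonitor;

    void serviceInit() {
        // allocation belongs in init, so a failed init never leaves a
        // half-started monitor behind
        pauseMonitor = new JvmPauseMonitorStub();
    }

    void serviceStart() {
        pauseMonitor.start(); // only the start work remains here
    }
}
```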
[jira] [Moved] (MAPREDUCE-6441) LocalDistributedCacheManager for concurrent sqoop processes fails to create unique directories
[ https://issues.apache.org/jira/browse/MAPREDUCE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du moved HADOOP-10924 to MAPREDUCE-6441: Key: MAPREDUCE-6441 (was: HADOOP-10924) Project: Hadoop Map/Reduce (was: Hadoop Common) > LocalDistributedCacheManager for concurrent sqoop processes fails to create > unique directories > -- > > Key: MAPREDUCE-6441 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6441 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: William Watson >Assignee: William Watson > Attachments: HADOOP-10924.02.patch, > HADOOP-10924.03.jobid-plus-uuid.patch > > > Kicking off many sqoop processes in different threads results in: > {code} > 2014-08-01 13:47:24 -0400: INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: > Encountered IOException running import job: java.io.IOException: > java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot > overwrite non empty destination directory > /tmp/hadoop-hadoop/mapred/local/1406915233073 > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner$Job.(LocalJobRunner.java:163) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) > 2014-08-01 13:47:24 -0400: INFO -at > java.security.AccessController.doPrivileged(Native Method) > 2014-08-01 13:47:24 -0400: INFO -at > javax.security.auth.Subject.doAs(Subject.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.run(Sqoop.java:145) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runTool(Sqoop.java:220) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runTool(Sqoop.java:229) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.main(Sqoop.java:238) > {code} > If two are kicked off in the same second. The issue is the following lines of > code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class: > {code} > // Generating unique numbers for FSDownload. > AtomicLong uniqueNumberGenerator = >new AtomicLong(System.currentTimeMillis()); > {code} > and > {code} > Long.toString(uniqueNumberGenerator.incrementAndGet())), > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
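The root cause quoted above is that an AtomicLong seeded with System.currentTimeMillis() is only unique within one JVM: two processes started in the same millisecond generate the same directory names. A sketch contrasting that scheme with a per-JVM UUID, in the spirit of the attached "jobid-plus-uuid" patch; the paths and helper names here are invented for illustration:

```java
// Why the millisecond-seeded AtomicLong collides: the seed is only unique
// within one JVM, so two processes launched in the same millisecond pick the
// same directories. The alternative adds a per-JVM random UUID.
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

class UniqueLocalDir {
    // old scheme: fully determined by the JVM start time
    static String timestampScheme(long startMillis, AtomicLong counter) {
        return "/tmp/mapred/local/" + (startMillis + counter.incrementAndGet());
    }

    // safer scheme: a random UUID keeps two same-millisecond JVMs apart
    static String uuidScheme(String jobId, AtomicLong counter) {
        return "/tmp/mapred/local/" + jobId + "_" + UUID.randomUUID()
                + "_" + counter.incrementAndGet();
    }
}
```

Modeling two concurrent sqoop launches as two counters seeded with the same start time makes the collision (and the fix) directly observable.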
[jira] [Created] (MAPREDUCE-6424) Store MR counters as timeline metrics instead of event
Junping Du created MAPREDUCE-6424: - Summary: Store MR counters as timeline metrics instead of event Key: MAPREDUCE-6424 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6424 Project: Hadoop Map/Reduce Issue Type: Sub-task Reporter: Junping Du Assignee: Junping Du In MAPREDUCE-6327, we make map/reduce counters get encoded from JobFinishedEvent as timeline events with counters details in JSON format. We need to store framework specific counters as metrics in timeline service to support query, aggregation, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6232) Task state is running when all task attempts fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6232: -- Labels: (was: BB2015-05-RFC) > Task state is running when all task attempts fail > - > > Key: MAPREDUCE-6232 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6232 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 2.6.0 >Reporter: Yang Hao >Assignee: Yang Hao > Attachments: MAPREDUCE-6232.patch, MAPREDUCE-6232.v2.patch, > TaskImpl.new.png, TaskImpl.normal.png, result.pdf > > > When task attempts fails, the task's state is still running. A clever way is > to check the task attempts's state, if none of the attempts is running, then > the task state should not be running -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6232) Task state is running when all task attempts fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6232: -- Target Version/s: (was: 2.6.0) > Task state is running when all task attempts fail > - > > Key: MAPREDUCE-6232 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6232 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 2.6.0 >Reporter: Yang Hao >Assignee: Yang Hao > Attachments: MAPREDUCE-6232.patch, MAPREDUCE-6232.v2.patch, > TaskImpl.new.png, TaskImpl.normal.png, result.pdf > > > When task attempts fails, the task's state is still running. A clever way is > to check the task attempts's state, if none of the attempts is running, then > the task state should not be running -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5965) Hadoop streaming throws error if list of input files is high. Error is: "error=7, Argument list too long at if number of input file is high"
[ https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555957#comment-14555957 ] Junping Du commented on MAPREDUCE-5965: --- Thanks, guys, for the good discussion here. +1 on the overall solution. Agreed that we don't need to put the new streaming configuration into *-default.xml, per previous practice. bq. If you really want to make it configurable the easiest way would be to roll the two settings in one. We could make the stream.truncate.long.jobconf.values an integer: -1 do not truncate otherwise truncate at the length given. That sounds better. Maybe we should rename "stream.truncate.long.jobconf.values" to something like "stream.jobconf.truncate.limit" and document somewhere that -1 is the default value (no truncation) and that 20K is a reasonable value for most cases? > Hadoop streaming throws error if list of input files is high. Error is: > "error=7, Argument list too long at if number of input file is high" > > > Key: MAPREDUCE-5965 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5965 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Arup Malakar >Assignee: Wilfred Spiegelenburg > Attachments: MAPREDUCE-5965.1.patch, MAPREDUCE-5965.2.patch, > MAPREDUCE-5965.patch > > > Hadoop streaming exposes all the key values in job conf as environment > variables when it forks a process for streaming code to run. Unfortunately > the variable mapreduce_input_fileinputformat_inputdir contains the list of > input files, and Linux has a limit on size of environment variables + > arguments. > Based on how long the list of files and their full path is this could be > pretty huge. And given all of these variables are not even used it stops user > from running hadoop job with large number of files, even though it could be > run. > Linux throws E2BIG if the size is greater than certain size which is error > code 7.
And java translates that to "error=7, Argument list too long". More: > http://man7.org/linux/man-pages/man2/execve.2.html I suggest skipping > variables if it is greater than certain length. That way if user code > requires the environment variable it would fail. It should also introduce a > config variable to skip long variables, and set it to false by default. That > way user has to specifically set it to true to invoke this feature. > Here is the exception: > {code} > Error: java.lang.RuntimeException: Error in configuring object at > org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) > at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at > org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at > org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at > java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:415) at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: > java.lang.reflect.InvocationTargetException at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) at > org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) > ... 
9 more Caused by: java.lang.RuntimeException: Error in configuring object > at > org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) > at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) > at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38) ... 14 > more Caused by: java.lang.reflect.InvocationTargetException at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) at > org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) > ... 17 more Caused by: java.lang.RuntimeException: configuration exception at > org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222) at > org.apache.hadoop.streaming.PipeMapper.configure(PipeMa
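The single-knob design discussed in the comment above, where -1 means "never truncate" and any non-negative value caps jobconf values exported to the environment, can be sketched as follows. The helper name is invented, and the semantics follow this thread's proposal, not necessarily the committed patch:

```java
// Sketch of the single-knob proposal: -1 exports jobconf values to the
// environment unchanged; any non-negative limit caps their length to avoid
// E2BIG (error=7) when the streaming child process is exec'd.
class JobconfTruncate {
    static String forEnv(String value, int limit) {
        if (limit < 0 || value.length() <= limit) {
            return value; // default (-1) or already short enough
        }
        return value.substring(0, limit); // truncated; consumers must not rely on it
    }
}
```

A truncated variable will break any user script that actually reads it, which is why the thread insists the feature stay opt-in rather than default behavior.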
[jira] [Updated] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6361: -- Target Version/s: 2.7.1 (was: 2.8.0) Fix Version/s: (was: 2.8.0) 2.7.1 Thanks [~ozawa] for reviewing and committing the patch! Moved the commit from 2.8 to 2.7.1 as we need this fix ASAP. > NPE issue in shuffle caused by concurrent issue between copySucceeded() in > one thread and copyFailed() in another thread on the same host > - > > Key: MAPREDUCE-6361 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.7.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.7.1 > > Attachments: MAPREDUCE-6361-v1.patch > > > The failure in log: > 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: > Exception running child : > org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in > shuffle in fetcher#25 > at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
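The race described in the summary, copySucceeded() in one fetcher thread removing a host's bookkeeping while copyFailed() in another still dereferences it, can be guarded in a minimal stand-in: synchronize both paths and avoid a bare get() whose null result would be unboxed. This illustrates the failure mode only, not the actual ShuffleSchedulerImpl fix:

```java
// Minimal stand-in for the race: copySucceeded() drops a host's entry while
// copyFailed() on another thread still expects it. Synchronizing both paths
// and using getOrDefault() (never unboxing a bare get()) avoids the NPE.
import java.util.HashMap;
import java.util.Map;

class ShuffleBookkeeping {
    private final Map<String, Integer> failuresByHost = new HashMap<>();

    synchronized void copySucceeded(String host) {
        failuresByHost.remove(host); // host finished; forget its failures
    }

    synchronized int copyFailed(String host) {
        // the entry may already be gone after a concurrent copySucceeded(),
        // so default to zero instead of dereferencing a possible null
        int failures = failuresByHost.getOrDefault(host, 0) + 1;
        failuresByHost.put(host, failures);
        return failures;
    }
}
```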
[jira] [Commented] (MAPREDUCE-6164) "mapreduce.reduce.shuffle.fetch.retry.timeout-ms" should be set to 3 minutes instead of 30 seconds by default to be consistent with other retry timeout
[ https://issues.apache.org/jira/browse/MAPREDUCE-6164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550284#comment-14550284 ]

Junping Du commented on MAPREDUCE-6164:
--
Do we still need this? It seems to have been pending for a long time...

> "mapreduce.reduce.shuffle.fetch.retry.timeout-ms" should be set to 3 minutes
> instead of 30 seconds by default to be consistent with other retry timeouts
>
> Key: MAPREDUCE-6164
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6164
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Junping Du
> Assignee: Junping Du
> Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6164.patch
>
> In MAPREDUCE-5891 we added retry logic to the MapReduce shuffle stage so that
> the fetcher can survive NM downtime (with the shuffle service down as well).
> In many places we set the default timeout to 3 minutes (connection timeout,
> etc.) to tolerate a longer NM outage, but we set
> "mapreduce.reduce.shuffle.fetch.retry.timeout-ms" to 30 seconds, which is
> inconsistent. We should change it to 180 seconds.
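The proposed change amounts to overriding one property. As a sketch only (the default would ultimately change in mapred-default.xml; this shows the equivalent per-cluster override in mapred-site.xml):

```xml
<!-- mapred-site.xml: align the fetch retry timeout with the other
     3-minute shuffle timeouts; the value is in milliseconds. -->
<property>
  <name>mapreduce.reduce.shuffle.fetch.retry.timeout-ms</name>
  <value>180000</value>
</property>
```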
[jira] [Updated] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated MAPREDUCE-6361:
--
Status: Patch Available (was: Open)
[jira] [Updated] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated MAPREDUCE-6361:
--
Attachment: MAPREDUCE-6361-v1.patch

Uploaded a patch implementing the 2nd solution proposed above, with a unit test.
[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539594#comment-14539594 ]

Junping Du commented on MAPREDUCE-6361:
--
There are basically two ways to fix the race condition here:

1. Wrap the following code in a synchronized method, so that copySucceeded() is blocked until copyFailed() finishes:
{code}
scheduler.hostFailed(host.getHostName());
for (TaskAttemptID left : failedTasks) {
  scheduler.copyFailed(left, host, true, false);
}
{code}
This has a larger performance impact on shuffle, since a failure fetching one map output on one thread would block copySucceeded() on the other threads for longer.

2. Update copyFailed() so it no longer assumes the hostFailures entry is still present; if it has been cleaned up by another thread, re-add the host to hostFailures as if this were the host's first failure.

I prefer the 2nd option, which is more lightweight. Will deliver a quick patch soon.
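The 2nd option above can be sketched as follows. This is a minimal illustration, not the actual Hadoop patch: the class name `SimpleShuffleScheduler`, the `AtomicInteger` counters, and the failure threshold are simplified stand-ins for ShuffleSchedulerImpl's internals.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-in for ShuffleSchedulerImpl: copyFailed() no longer
// assumes the hostFailures entry survives a concurrent copySucceeded().
class SimpleShuffleScheduler {
    private final Map<String, AtomicInteger> hostFailures = new HashMap<>();
    private static final int MAX_HOST_FAILURES = 5; // illustrative threshold

    public synchronized void hostFailed(String hostname) {
        AtomicInteger failures = hostFailures.get(hostname);
        if (failures == null) {
            hostFailures.put(hostname, new AtomicInteger(1));
        } else {
            failures.incrementAndGet();
        }
    }

    public synchronized void copySucceeded(String hostname) {
        // A successful copy clears the failure history for the host --
        // this is the cleanup that races with copyFailed().
        hostFailures.remove(hostname);
    }

    public synchronized boolean copyFailed(String hostname) {
        // Defensive re-insert: the entry may have been removed by
        // copySucceeded() on another thread between the caller's
        // hostFailed() and this call.
        AtomicInteger failures = hostFailures.get(hostname);
        if (failures == null) {
            failures = new AtomicInteger(1);
            hostFailures.put(hostname, failures);
        }
        return failures.get() > MAX_HOST_FAILURES; // no NPE possible now
    }
}
```

The design choice matches the comment's reasoning: each method stays synchronized on its own, so there is no extra lock held across the hostFailed()/copyFailed() pair, and a lost entry simply restarts the failure count.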
[jira] [Updated] (MAPREDUCE-6164) "mapreduce.reduce.shuffle.fetch.retry.timeout-ms" should be set to 3 minutes instead of 30 seconds by default to be consistent with other retry timeout
[ https://issues.apache.org/jira/browse/MAPREDUCE-6164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated MAPREDUCE-6164:
--
Target Version/s: 2.8.0 (was: 2.6.1)
[jira] [Updated] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated MAPREDUCE-6361:
--
Priority: Critical (was: Major)
[jira] [Commented] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538071#comment-14538071 ]

Junping Du commented on MAPREDUCE-6361:
--
The NPE is thrown in copyFailed() at ShuffleSchedulerImpl.java:267:
{code}
boolean hostFail = hostFailures.get(hostname).get() > getMaxHostFailures() ? true : false;
{code}
This means hostFailures does not contain the hostname that just failed, which is unexpected, because we always call hostFailed() to put the host into hostFailures before calling copyFailed():
{code}
scheduler.hostFailed(host.getHostName());
for (TaskAttemptID left : failedTasks) {
  scheduler.copyFailed(left, host, true, false);
}
{code}
Although hostFailed() and copyFailed() are both synchronized methods (as is copySucceeded()), it is still possible (and is the only plausible cause here) for another thread to call copySucceeded() on the same host (for a different map output) between this thread's calls to hostFailed() and copyFailed() while it is handling a map output failure; copySucceeded() removes the host's hostFailures entry. We need to fix this race to get rid of the NPE, which currently fails the map output copy outright without any retry.
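The interleaving described above can be reproduced with a stripped-down model of the buggy behavior. The class `BuggyScheduler` and its threshold are hypothetical simplifications; only the line inside copyFailed() mirrors the code at ShuffleSchedulerImpl.java:267.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Stripped-down model of the pre-fix ShuffleSchedulerImpl behavior: each
// method is synchronized on its own, but nothing prevents copySucceeded()
// from running between a fetcher's hostFailed() and copyFailed() calls.
class BuggyScheduler {
    private final Map<String, AtomicInteger> hostFailures = new HashMap<>();

    public synchronized void hostFailed(String hostname) {
        AtomicInteger f = hostFailures.get(hostname);
        if (f == null) {
            hostFailures.put(hostname, new AtomicInteger(1));
        } else {
            f.incrementAndGet();
        }
    }

    public synchronized void copySucceeded(String hostname) {
        // Clears the entry that copyFailed() unconditionally dereferences.
        hostFailures.remove(hostname);
    }

    public synchronized boolean copyFailed(String hostname) {
        // Mirrors ShuffleSchedulerImpl.java:267 -- NPE when the entry is gone.
        return hostFailures.get(hostname).get() > 5;
    }
}
```

Running the three calls in the order hostFailed() → copySucceeded() (from the other thread) → copyFailed() surfaces the NullPointerException deterministically, which is what a unit test for the fix would exercise.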
[jira] [Created] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
Junping Du created MAPREDUCE-6361:
--
Summary: NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host
Key: MAPREDUCE-6361
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Junping Du
Assignee: Junping Du