[jira] [Commented] (MAPREDUCE-7069) Add ability to specify user environment variables individually
[ https://issues.apache.org/jira/browse/MAPREDUCE-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177385#comment-17177385 ]

Kihwal Lee commented on MAPREDUCE-7069:
---------------------------------------

This should have made it into the other release branches. Cherry-picked to branch-3.1 and branch-2.10.

> Add ability to specify user environment variables individually
> --------------------------------------------------------------
>
>                 Key: MAPREDUCE-7069
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7069
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Jim Brennan
>            Assignee: Jim Brennan
>            Priority: Major
>             Fix For: 3.2.0, 2.10.1, 3.1.5
>
>         Attachments: MAPREDUCE-7069.001.patch, MAPREDUCE-7069.002.patch, MAPREDUCE-7069.003.patch, MAPREDUCE-7069.004.patch, MAPREDUCE-7069.005.patch, MAPREDUCE-7069.006.patch, MAPREDUCE-7069.007.patch
>
> As reported in YARN-6830, it is currently not possible to specify an environment variable that contains commas via {{mapreduce.map.env}}, {{mapreduce.reduce.env}}, or {{mapreduce.admin.user.env}}.
> To address this, [~aw] proposed in [YARN-6830] that we add the ability to specify environment variables individually:
> {quote}e.g., mapreduce.map.env.[foo]=bar gets turned into foo=bar
> {quote}

--
This message was sent by Atlassian Jira (v8.3.4#803005)

---
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
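[Editorial note] A configuration sketch of the per-variable form proposed above; the variable name FOO and its value are made-up examples, not from the patch:

```xml
<!-- Hypothetical example of the proposed per-variable form: this sets the
     single variable FOO in the map task environment, so its value may
     freely contain commas without being split. -->
<property>
  <name>mapreduce.map.env.FOO</name>
  <value>bar,baz</value>
</property>
```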
[jira] [Updated] (MAPREDUCE-7069) Add ability to specify user environment variables individually
[ https://issues.apache.org/jira/browse/MAPREDUCE-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated MAPREDUCE-7069:
----------------------------------
    Fix Version/s: 3.1.5
                   2.10.1
[jira] [Created] (MAPREDUCE-7177) Disable speculative execution in TestDFSIO
Kihwal Lee created MAPREDUCE-7177:
-------------------------------------

             Summary: Disable speculative execution in TestDFSIO
                 Key: MAPREDUCE-7177
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7177
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 2.8.5, 3.2.0
            Reporter: Kihwal Lee

When TestDFSIO runs in an environment where a subset of the mappers are slow, speculative execution can kick in. In the write phase, this makes the original mapper fail on its next addBlock() call, since the speculative attempt overwrites the output files. To make the benchmark more predictable and repeatable, TestDFSIO itself should disable speculation.
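[Editorial note] The requested fix amounts to pinning the standard speculative-execution switches off for the benchmark job; a sketch of the equivalent client-side configuration (TestDFSIO would set the corresponding JobConf flags programmatically):

```xml
<!-- Disable speculative attempts for both task types so no second attempt
     can overwrite a slow mapper's output files. -->
<property>
  <name>mapreduce.map.speculative</name>
  <value>false</value>
</property>
<property>
  <name>mapreduce.reduce.speculative</name>
  <value>false</value>
</property>
```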
[jira] [Commented] (MAPREDUCE-6767) TestSlive fails after a common change
[ https://issues.apache.org/jira/browse/MAPREDUCE-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435706#comment-15435706 ]

Kihwal Lee commented on MAPREDUCE-6767:
---------------------------------------

E.g. https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/143/testReport/junit/org.apache.hadoop.fs.slive/TestSlive/testSelection/

> TestSlive fails after a common change
> -------------------------------------
>
>                 Key: MAPREDUCE-6767
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6767
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>
> It looks like this was broken after HADOOP-12726.
[jira] [Updated] (MAPREDUCE-6767) TestSlive fails after a common change
[ https://issues.apache.org/jira/browse/MAPREDUCE-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated MAPREDUCE-6767:
----------------------------------
    Description: It looks like this was broken after HADOOP-12726.
[jira] [Created] (MAPREDUCE-6767) TestSlive fails after a common change
Kihwal Lee created MAPREDUCE-6767:
-------------------------------------

             Summary: TestSlive fails after a common change
                 Key: MAPREDUCE-6767
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6767
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Kihwal Lee
[jira] [Updated] (MAPREDUCE-6750) TestHSAdminServer.testRefreshSuperUserGroups is failing
[ https://issues.apache.org/jira/browse/MAPREDUCE-6750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated MAPREDUCE-6750:
----------------------------------
    Fix Version/s:     (was: 2.9.0)
                   2.8.0

> TestHSAdminServer.testRefreshSuperUserGroups is failing
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-6750
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6750
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: test
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Minor
>             Fix For: 2.8.0
>
>         Attachments: MAPREDUCE-6750.patch
>
> HADOOP-13442 changed {{AccessControlList}} to call {{getGroups()}} instead of {{getGroupNames()}}. It should work if the mocks are updated to stub the right method and return the right type.
[jira] [Assigned] (MAPREDUCE-6750) TestHSAdminServer.testRefreshSuperUserGroups is failing
[ https://issues.apache.org/jira/browse/MAPREDUCE-6750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee reassigned MAPREDUCE-6750:
-------------------------------------
    Assignee: Kihwal Lee
[jira] [Updated] (MAPREDUCE-6750) TestHSAdminServer.testRefreshSuperUserGroups is failing
[ https://issues.apache.org/jira/browse/MAPREDUCE-6750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated MAPREDUCE-6750:
----------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (MAPREDUCE-6750) TestHSAdminServer.testRefreshSuperUserGroups is failing
[ https://issues.apache.org/jira/browse/MAPREDUCE-6750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated MAPREDUCE-6750:
----------------------------------
    Attachment: MAPREDUCE-6750.patch
[jira] [Created] (MAPREDUCE-6750) TestHSAdminServer.testRefreshSuperUserGroups is failing
Kihwal Lee created MAPREDUCE-6750:
-------------------------------------

             Summary: TestHSAdminServer.testRefreshSuperUserGroups is failing
                 Key: MAPREDUCE-6750
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6750
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: test
            Reporter: Kihwal Lee
            Priority: Minor

HADOOP-13442 changed {{AccessControlList}} to call {{getGroups()}} instead of {{getGroupNames()}}. It should work if the mocks are updated to stub the right method and return the right type.
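[Editorial note] The failure mode above can be illustrated with a self-contained sketch. The types below are hand-rolled stand-ins, not Hadoop's real classes: a checker that moved from getGroupNames() (String[]) to getGroups() (List<String>), and two stubs, one stale and one fixed:

```java
import java.util.Collections;
import java.util.List;

public class MockDrift {
    interface User {
        default String[] getGroupNames() { return new String[0]; }
        default List<String> getGroups() { return Collections.emptyList(); }
    }

    // Mimics AccessControlList after HADOOP-13442: it consults getGroups().
    static boolean isAllowed(User u, String group) {
        return u.getGroups().contains(group);
    }

    // A stale stub overriding only the method the checker no longer calls.
    static User staleStub() {
        return new User() {
            @Override public String[] getGroupNames() {
                return new String[] { "admin" };
            }
        };
    }

    // The fixed stub overrides the method actually called, returning the
    // right type (a List, not an array).
    static User fixedStub() {
        return new User() {
            @Override public List<String> getGroups() {
                return Collections.singletonList("admin");
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(staleStub(), "admin"));  // false
        System.out.println(isAllowed(fixedStub(), "admin"));  // true
    }
}
```

The stale stub silently stops influencing the check, which is exactly why the test started failing after the common change.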
[jira] [Commented] (MAPREDUCE-6527) Data race on field org.apache.hadoop.mapred.JobConf.credentials
[ https://issues.apache.org/jira/browse/MAPREDUCE-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15180367#comment-15180367 ]

Kihwal Lee commented on MAPREDUCE-6527:
---------------------------------------

LocalJobRunner is for testing map reduce locally without involving the actual cluster, so the impact of the race is minimal. If the input or output path is in a secure HDFS, it might cause the local job instance to fail. If the job uses the local file system or an HDFS with security disabled, there will be no issue.

> Data race on field org.apache.hadoop.mapred.JobConf.credentials
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-6527
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>            Reporter: Ali Kheradmand
>            Assignee: Haibo Chen
>         Attachments: mapreduce6527.001.patch
>
> I am running the test suite against a dynamic race detector called RV-Predict. Here is a race report that I got:
> {noformat}
> Data race on field org.apache.hadoop.mapred.JobConf.credentials: {{{
>     Concurrent read in thread T327 (locks held: {})
>         at org.apache.hadoop.mapred.JobConf.getCredentials(JobConf.java:505)
>         at org.apache.hadoop.mapreduce.task.JobContextImpl.<init>(JobContextImpl.java:70)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:524)
>     T327 is created by T22
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:218)
>     Concurrent write in thread T22 (locks held: {Monitor@496c673a, Monitor@496319b0})
>         at org.apache.hadoop.mapred.JobConf.setCredentials(JobConf.java:510)
>         at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:787)
>         at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:241)
>         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669)
>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>         at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:335)
>         - locked Monitor@496319b0 at org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob.submit(ControlledJob.java:n/a)
>         at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:245)
>         - locked Monitor@496c673a at org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl.run(JobControl.java:229)
>     T22 is created by T1
>         at org.apache.hadoop.mapred.jobcontrol.TestJobControl.doJobControlTest(TestJobControl.java:111)
> }}}
> {noformat}
> In the source code of the org.apache.hadoop.mapreduce.JobStatus.submitJob function, we have the following lines:
> {code}
> Job job = new Job(JobID.downgrade(jobid), jobSubmitDir);
> job.job.setCredentials(credentials);
> {code}
> It looks a bit suspicious: Job extends Thread, and at the end of its constructor it starts a new thread which creates a new instance of JobContextImpl, which reads credentials. However, the first thread concurrently sets credentials after creating the Job instance.
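[Editorial note] A minimal, self-contained sketch (hypothetical Holder class, not Hadoop's Job) of the safe alternative to the pattern flagged above: publish the credentials field before starting the worker thread, so the write happens-before Thread.start() and the worker is guaranteed to see it:

```java
public class SafePublication {
    static class Holder {
        private final String credentials;
        volatile String observed;            // what the worker thread saw
        private final Thread worker;

        Holder(String creds) {
            this.credentials = creds;        // set BEFORE start(): the write
                                             // happens-before the thread runs
            this.worker = new Thread(() -> observed = credentials);
            this.worker.start();
        }

        void await() throws InterruptedException { worker.join(); }
    }

    static String observedValue(String creds) throws InterruptedException {
        Holder h = new Holder(creds);
        h.await();                           // join() orders the read after
                                             // the worker's write to observed
        return h.observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(observedValue("token-123")); // token-123
    }
}
```

Setting the field after construction, as in the quoted snippet, provides no such ordering, which is what the race detector reports.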
[jira] [Commented] (MAPREDUCE-6451) DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic
[ https://issues.apache.org/jira/browse/MAPREDUCE-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983321#comment-14983321 ]

Kihwal Lee commented on MAPREDUCE-6451:
---------------------------------------

bq. did you forget the DynamicInputChunkContext class when you commit?

It is a Friday. :) Fixed it. Thanks for reporting.

> DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic
> -----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6451
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6451
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: distcp
>    Affects Versions: 2.6.0
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>             Fix For: 3.0.0, 2.7.2
>
>         Attachments: MAPREDUCE-6451-v1.patch, MAPREDUCE-6451-v2.patch, MAPREDUCE-6451-v3.patch, MAPREDUCE-6451-v4.patch, MAPREDUCE-6451-v5.patch
>
> DistCp, when used with the dynamic strategy, does not update the chunkFilePath and other static variables for any job other than the first. This is seen when DistCp::run() is used.
> A single copy succeeds, but multiple jobs finish successfully without any real copying.
[jira] [Commented] (MAPREDUCE-6451) DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic
[ https://issues.apache.org/jira/browse/MAPREDUCE-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983156#comment-14983156 ]

Kihwal Lee commented on MAPREDUCE-6451:
---------------------------------------

+1
[jira] [Updated] (MAPREDUCE-6451) DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic
[ https://issues.apache.org/jira/browse/MAPREDUCE-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated MAPREDUCE-6451:
----------------------------------
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 2.7.2
                   3.0.0
           Status: Resolved  (was: Patch Available)

I've committed this to trunk, branch-2 and branch-2.7. Thanks for working on the fix, Kuhu. Thank you gentlemen for the reviews.
[jira] [Commented] (MAPREDUCE-6451) DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic
[ https://issues.apache.org/jira/browse/MAPREDUCE-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705598#comment-14705598 ]

Kihwal Lee commented on MAPREDUCE-6451:
---------------------------------------

Kicked the precommit: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5947/
[jira] [Commented] (MAPREDUCE-5958) Wrong reduce task progress if map output is compressed
[ https://issues.apache.org/jira/browse/MAPREDUCE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201012#comment-14201012 ]

Kihwal Lee commented on MAPREDUCE-5958:
---------------------------------------

+1 the patch looks good. Thanks for adding the test case, Jason.

> Wrong reduce task progress if map output is compressed
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-5958
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5958
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.2.0, 2.3.0, 2.2.1, 2.4.0, 2.4.1
>            Reporter: Emilio Coppa
>            Assignee: Emilio Coppa
>            Priority: Minor
>              Labels: progress, reduce
>         Attachments: HADOOP-5958-v2.patch, MAPREDUCE-5958v3.patch
>
> If the map output is compressed (_mapreduce.map.output.compress_ set to _true_) then the reduce task progress may be highly underestimated.
> In the reduce phase (but also in the merge phase), the progress of a reduce task is computed as the ratio between the number of processed bytes and the number of total bytes. But:
> - the number of total bytes is computed by summing up the uncompressed segment sizes (_Merger.Segment.getRawDataLength()_)
> - the number of processed bytes is computed by exploiting the position of the current _IFile.Reader_ (using _IFile.Reader.getPosition()_), but this may refer to the position in the underlying on-disk file (which may be compressed)
> Thus, if the map outputs are compressed then the progress may be underestimated (e.g., with only 1 on-disk map output file whose compressed size is 25% of its original size, the reduce task progress during the reduce phase will range between 0 and 0.25 and then artificially jump to 1.0).
> Attached there is a patch: the number of processed bytes is now computed by exploiting _IFile.Reader.bytesRead_ (if the reader is in memory, then _getPosition()_ already returns exactly this field).
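[Editorial note] The arithmetic behind the underestimate can be checked with a self-contained sketch using the hypothetical numbers from the description (one on-disk map output whose compressed file is 25% of its raw size); the method names are illustrative, not Hadoop's:

```java
public class ProgressDemo {
    // Old computation: position in the possibly-compressed on-disk file,
    // divided by the sum of the UNcompressed segment sizes.
    static double positionBasedProgress(long filePos, long totalRawBytes) {
        return (double) filePos / totalRawBytes;
    }

    // Fixed computation: count the uncompressed bytes actually read.
    static double bytesReadProgress(long rawBytesRead, long totalRawBytes) {
        return (double) rawBytesRead / totalRawBytes;
    }

    public static void main(String[] args) {
        long totalRaw = 400;    // uncompressed size of the only segment
        long compressed = 100;  // on-disk size: 25% compression ratio

        // Even once the whole segment has been consumed, the old ratio tops
        // out at 0.25 before the framework jumps it to 1.0.
        System.out.println(positionBasedProgress(compressed, totalRaw)); // 0.25
        System.out.println(bytesReadProgress(totalRaw, totalRaw));       // 1.0
    }
}
```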
[jira] [Updated] (MAPREDUCE-5958) Wrong reduce task progress if map output is compressed
[ https://issues.apache.org/jira/browse/MAPREDUCE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated MAPREDUCE-5958:
----------------------------------
       Resolution: Fixed
    Fix Version/s: 2.6.0
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)
[jira] [Commented] (MAPREDUCE-5958) Wrong reduce task progress if map output is compressed
[ https://issues.apache.org/jira/browse/MAPREDUCE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201015#comment-14201015 ]

Kihwal Lee commented on MAPREDUCE-5958:
---------------------------------------

Committed this to trunk, branch-2 and branch-2.6.
[jira] [Commented] (MAPREDUCE-6022) map_input_file is missing from streaming job environment
[ https://issues.apache.org/jira/browse/MAPREDUCE-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188616#comment-14188616 ]

Kihwal Lee commented on MAPREDUCE-6022:
---------------------------------------

+1 looks good to me.

> map_input_file is missing from streaming job environment
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6022
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6022
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: MAPREDUCE-6022.patch, MAPREDUCE-6022v2.patch
>
> When running a streaming job the 'map_input_file' environment variable is not being set. This property is deprecated, but in the past deprecated properties still appeared in a streaming job's environment.
[jira] [Updated] (MAPREDUCE-6022) map_input_file is missing from streaming job environment
[ https://issues.apache.org/jira/browse/MAPREDUCE-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated MAPREDUCE-6022:
----------------------------------
       Resolution: Fixed
    Fix Version/s: 2.6.0
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

I've just committed this. Thanks for fixing the bug, Jason.
[jira] [Updated] (MAPREDUCE-6022) map_input_file is missing from streaming job environment
[ https://issues.apache.org/jira/browse/MAPREDUCE-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated MAPREDUCE-6022:
----------------------------------
    Assignee: Jason Lowe
[jira] [Created] (MAPREDUCE-5939) StartTime showing up as the epoch time in JHS UI after upgrade
Kihwal Lee created MAPREDUCE-5939:
-------------------------------------

             Summary: StartTime showing up as the epoch time in JHS UI after upgrade
                 Key: MAPREDUCE-5939
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5939
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 2.5.0
            Reporter: Kihwal Lee

After upgrading from 0.23.x to 2.5, the start time of old apps is showing up as the epoch time. It looks like 2.5 expects the start time to be encoded at the end of the jhist file name (-[timestamp].jhist). It should have been made backward compatible.
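[Editorial note] A hedged sketch of the backward-compatible parsing the report calls for; the class, method, and exact file-name format below are illustrative, not the actual JHS parser:

```java
public class JhistName {
    // Parses a trailing "-<millis>" before the .jhist extension, falling
    // back to a caller-supplied sentinel for old-style names that lack it
    // (instead of defaulting to 0, i.e. the epoch).
    static long startTimeOf(String fileName, long fallback) {
        if (!fileName.endsWith(".jhist")) {
            return fallback;
        }
        String base = fileName.substring(0, fileName.length() - ".jhist".length());
        int dash = base.lastIndexOf('-');
        if (dash < 0) {
            return fallback;                 // old-style name: no timestamp
        }
        try {
            return Long.parseLong(base.substring(dash + 1));
        } catch (NumberFormatException e) {
            return fallback;                 // suffix is not a timestamp
        }
    }

    public static void main(String[] args) {
        System.out.println(startTimeOf("job_1_0001-1405555555555.jhist", -1L)); // 1405555555555
        System.out.println(startTimeOf("job_1_0001.jhist", -1L));               // -1
    }
}
```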
[jira] [Commented] (MAPREDUCE-5868) TestPipeApplication causing nightly build to fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984461#comment-13984461 ]

Kihwal Lee commented on MAPREDUCE-5868:
---------------------------------------

The test output contains only this.
{panel}
2014-04-29 14:05:07,398 INFO \[main\] util.ProcessTree (ProcessTree.java:isSetsidSupported(64)) - setsid exited with exit code 0
{panel}
In the test workspace, I see {{cache.sh}} and {{outfile}}. {{outfile}} is 0-byte.

> TestPipeApplication causing nightly build to fail
> -------------------------------------------------
>
>                 Key: MAPREDUCE-5868
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5868
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: test
>    Affects Versions: trunk
>            Reporter: Jason Lowe
>
> TestPipeApplication appears to be timing out, which causes the nightly build to fail.
[jira] [Commented] (MAPREDUCE-5749) TestRMContainerAllocator#testReportedAppProgress Failed
[ https://issues.apache.org/jira/browse/MAPREDUCE-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13979753#comment-13979753 ]

Kihwal Lee commented on MAPREDUCE-5749:
---------------------------------------

This has been causing failures in the nightly build. Attaching the full test log from last night for reference.

> TestRMContainerAllocator#testReportedAppProgress Failed
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-5749
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5749
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: trunk
>            Reporter: shenhong
>         Attachments: MAPREDUCE-5749.patch
>
> When executing mvn test -Dtest=TestRMContainerAllocator#testReportedAppProgress, it failed with this message:
> {code}
> Caused by: java.io.FileNotFoundException: File /home/yuling.sh/hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/target/org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator/appattempt_1392009213299_0001_01/.staging/job_1392009213299_0001/job.xml does not exist
> {code}
> But in fact, the job.xml exists:
> {code}
> -rw-rw-r-- 1 yuling.sh yuling.sh 65791 2月 10 13:13 /home/yuling.sh/hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/target/org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator/yuling.sh/.staging/job_1392009213299_0001/job.xml
> {code}
> See the following code:
> {code}
> public Job submit(Configuration conf, boolean mapSpeculative,
>     boolean reduceSpeculative) throws Exception {
>   String user = conf.get(MRJobConfig.USER_NAME,
>       UserGroupInformation.getCurrentUser().getShortUserName());
>   conf.set(MRJobConfig.USER_NAME, user);
>   conf.set(MRJobConfig.MR_AM_STAGING_DIR, testAbsPath.toString());
>   conf.setBoolean(MRJobConfig.MR_AM_CREATE_JH_INTERMEDIATE_BASE_DIR, true);
>   // TODO: fix the bug where the speculator gets events with
>   // not-fully-constructed objects. For now, disable speculative exec
>   conf.setBoolean(MRJobConfig.MAP_SPECULATIVE, mapSpeculative);
>   conf.setBoolean(MRJobConfig.REDUCE_SPECULATIVE, reduceSpeculative);
>
>   init(conf);
>   start();
>   DefaultMetricsSystem.shutdown();
>   Job job = getContext().getAllJobs().values().iterator().next();
>   if (assignedQueue != null) {
>     job.setQueueName(assignedQueue);
>   }
>
>   // Write job.xml
>   String jobFile = MRApps.getJobFile(conf, user,
>       TypeConverter.fromYarn(job.getID()));
>   LOG.info("Writing job conf to " + jobFile);
>   new File(jobFile).getParentFile().mkdirs();
>   conf.writeXml(new FileOutputStream(jobFile));
>   return job;
> }
> {code}
> At first, user is yuling.sh, but the UGI is set to the attemptId inside start(); after that, job.xml is written to yuling.sh/.staging/job_1392009213299_0001/job.xml. But when the job is running, MRAppMaster can't find the job.xml at appattempt_1392009213299_0001_01/.staging/job_1392009213299_0001.
[jira] [Updated] (MAPREDUCE-5749) TestRMContainerAllocator#testReportedAppProgress Failed
[ https://issues.apache.org/jira/browse/MAPREDUCE-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-5749: -- Attachment: TestRMContainerAllocator_failure.txt TestRMContainerAllocator#testReportedAppProgress Failed --- Key: MAPREDUCE-5749 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5749 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: trunk Reporter: shenhong Attachments: MAPREDUCE-5749.patch, TestRMContainerAllocator_failure.txt When execute mvn test -Dtest=TestRMContainerAllocator#testReportedAppProgress, It failed with message: {code} Caused by: java.io.FileNotFoundException: File /home/yuling.sh/hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/target/org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator/appattempt_1392009213299_0001_01/.staging/job_1392009213299_0001/job.xml does not exist {code} But in fact, the job.xml exits: {code} -rw-rw-r-- 1 yuling.sh yuling.sh 65791 2月 10 13:13 /home/yuling.sh/hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/target/org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator/yuling.sh/.staging/job_1392009213299_0001/job.xml {code} See the following code: {code} public Job submit(Configuration conf, boolean mapSpeculative, boolean reduceSpeculative) throws Exception { String user = conf.get(MRJobConfig.USER_NAME, UserGroupInformation .getCurrentUser().getShortUserName()); conf.set(MRJobConfig.USER_NAME, user); conf.set(MRJobConfig.MR_AM_STAGING_DIR, testAbsPath.toString()); conf.setBoolean(MRJobConfig.MR_AM_CREATE_JH_INTERMEDIATE_BASE_DIR, true); // TODO: fix the bug where the speculator gets events with // not-fully-constructed objects. 
For now, disable speculative exec conf.setBoolean(MRJobConfig.MAP_SPECULATIVE, mapSpeculative); conf.setBoolean(MRJobConfig.REDUCE_SPECULATIVE, reduceSpeculative); init(conf); start(); DefaultMetricsSystem.shutdown(); Job job = getContext().getAllJobs().values().iterator().next(); if (assignedQueue != null) { job.setQueueName(assignedQueue); } // Write job.xml String jobFile = MRApps.getJobFile(conf, user, TypeConverter.fromYarn(job.getID())); LOG.info("Writing job conf to " + jobFile); new File(jobFile).getParentFile().mkdirs(); conf.writeXml(new FileOutputStream(jobFile)); return job; } {code} At first, user is yuling.sh, but the UGI is set to the attemptId in start(); after that, job.xml is written to yuling.sh/.staging/job_1392009213299_0001/job.xml. But when the job is running, MRAppMaster can't find the job.xml at appattempt_1392009213299_0001_01/.staging/job_1392009213299_0001. -- This message was sent by Atlassian JIRA (v6.2#6252)
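The mismatch above can be reduced to a tiny model: the staging path is derived from whoever the current user is at the moment it is computed, so changing the UGI between writing job.xml and reading it back yields two different directories. A simplified plain-Java sketch (not the Hadoop API; names and paths are illustrative):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Simplified model of the staging-directory lookup: the path is a function
// of the "current user", so a UGI switch between write and read produces
// the mismatch (yuling.sh/... vs appattempt_.../...) described above.
public class StagingDirModel {
    static Path stagingDir(String baseDir, String currentUser, String jobId) {
        return Paths.get(baseDir, currentUser, ".staging", jobId);
    }

    public static void main(String[] args) {
        String jobId = "job_1392009213299_0001";
        // Writer side: UGI still holds the submitting user.
        Path written = stagingDir("/target/test", "yuling.sh", jobId);
        // Reader side: UGI has been switched to the attempt id by start().
        Path read = stagingDir("/target/test", "appattempt_1392009213299_0001_01", jobId);
        System.out.println("written under: " + written);
        System.out.println("looked up at:  " + read);
        System.out.println("match: " + written.equals(read));
    }
}
```

The fix direction is to compute (or write) the file under the same identity on both sides.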
[jira] [Commented] (MAPREDUCE-5804) TestMRJobsWithProfiler#testProfiler timesout
[ https://issues.apache.org/jira/browse/MAPREDUCE-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13943304#comment-13943304 ] Kihwal Lee commented on MAPREDUCE-5804: --- +1 TestMRJobsWithProfiler#testProfiler timesout Key: MAPREDUCE-5804 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5804 Project: Hadoop Map/Reduce Issue Type: Test Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: LOG.txt, MAPREDUCE-5804.patch {noformat} testProfiler(org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler) Time elapsed: 154.972 sec ERROR! java.lang.Exception: test timed out after 12 milliseconds at java.io.UnixFileSystem.getBooleanAttributes0(Native Method) at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242) at java.io.File.exists(File.java:813) at sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:1080) at sun.misc.URLClassPath.getResource(URLClassPath.java:199) at java.net.URLClassLoader$1.run(URLClassLoader.java:358) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at org.apache.log4j.spi.LoggingEvent.init(LoggingEvent.java:165) at org.apache.log4j.Category.forcedLog(Category.java:391) at org.apache.log4j.Category.log(Category.java:856) at org.apache.commons.logging.impl.Log4JLogger.warn(Log4JLogger.java:208) at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:338) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:419) at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:532) at org.apache.hadoop.mapreduce.Job$1.run(Job.java:314) at org.apache.hadoop.mapreduce.Job$1.run(Job.java:311) 
at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1570) at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311) at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:599) at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1344) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1306) at org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler.testProfiler(TestMRJobsWithProfiler.java:138) Results : Tests in error: TestMRJobsWithProfiler.testProfiler:138 » test timed out after 12 millise... {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5804) TestMRJobsWithProfiler#testProfiler timesout
[ https://issues.apache.org/jira/browse/MAPREDUCE-5804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-5804: -- Resolution: Fixed Fix Version/s: 2.5.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for working on this. I've committed this to trunk and branch-2. TestMRJobsWithProfiler#testProfiler timesout Key: MAPREDUCE-5804 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5804 Project: Hadoop Map/Reduce Issue Type: Test Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: Mit Desai Fix For: 3.0.0, 2.5.0 Attachments: LOG.txt, MAPREDUCE-5804.patch {noformat} testProfiler(org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler) Time elapsed: 154.972 sec ERROR! java.lang.Exception: test timed out after 12 milliseconds at java.io.UnixFileSystem.getBooleanAttributes0(Native Method) at java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:242) at java.io.File.exists(File.java:813) at sun.misc.URLClassPath$FileLoader.getResource(URLClassPath.java:1080) at sun.misc.URLClassPath.getResource(URLClassPath.java:199) at java.net.URLClassLoader$1.run(URLClassLoader.java:358) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at org.apache.log4j.spi.LoggingEvent.init(LoggingEvent.java:165) at org.apache.log4j.Category.forcedLog(Category.java:391) at org.apache.log4j.Category.log(Category.java:856) at org.apache.commons.logging.impl.Log4JLogger.warn(Log4JLogger.java:208) at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:338) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:419) at 
org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:532) at org.apache.hadoop.mapreduce.Job$1.run(Job.java:314) at org.apache.hadoop.mapreduce.Job$1.run(Job.java:311) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1570) at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311) at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:599) at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1344) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1306) at org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler.testProfiler(TestMRJobsWithProfiler.java:138) Results : Tests in error: TestMRJobsWithProfiler.testProfiler:138 » test timed out after 12 millise... {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-3184) Improve handling of fetch failures when a tasktracker is not responding on HTTP
[ https://issues.apache.org/jira/browse/MAPREDUCE-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-3184: -- Assignee: Todd Lipcon (was: Jordan Zimmerman) Improve handling of fetch failures when a tasktracker is not responding on HTTP --- Key: MAPREDUCE-3184 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3184 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Affects Versions: 0.20.205.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 1.0.1 Attachments: mr-3184.txt On a 100 node cluster, we had an issue where one of the TaskTrackers was hit by MAPREDUCE-2386 and stopped responding to fetches. The behavior observed was the following: - every reducer would try to fetch the same map task, and fail after ~13 minutes. - At that point, all reducers would report this failed fetch to the JT for the same task, and the task would be re-run. - Meanwhile, the reducers would move on to the next map task that ran on the TT, and hang for another 13 minutes. The job essentially made no progress for hours, as each map task that ran on the bad node was serially marked failed. To combat this issue, we should introduce a second type of failed fetch notification, used when the TT does not respond at all (ie SocketTimeoutException, etc). These fetch failure notifications should count against the TT at large, rather than a single task. If more than half of the reducers report such an issue for a given TT, then all of the tasks from that TT should be re-run. -- This message was sent by Atlassian JIRA (v6.2#6252)
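The proposal above, a host-level failure count that trips once more than half of the reducers report a connection-level failure, can be sketched as follows (class and method names are hypothetical, not the Hadoop implementation):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hedged sketch: track connection-level fetch failures per tasktracker,
// separately from per-task fetch failures. When more than half of the
// reducers have reported that a tracker is unreachable, declare the whole
// tracker bad so all of its map outputs are re-run at once instead of
// serially, one map task at a time.
public class TrackerFaultSketch {
    private final int totalReducers;
    // host -> set of reducers that reported a connection-level failure
    private final Map<String, Set<String>> hostReports = new HashMap<>();

    public TrackerFaultSketch(int totalReducers) {
        this.totalReducers = totalReducers;
    }

    /** Returns true when the tracker should be declared bad. */
    public boolean reportConnectionFailure(String host, String reducerId) {
        hostReports.computeIfAbsent(host, h -> new HashSet<>()).add(reducerId);
        // "more than half of the reducers" threshold from the proposal above
        return hostReports.get(host).size() * 2 > totalReducers;
    }
}
```

Using a set of reducer ids (rather than a raw counter) keeps repeated reports from the same reducer from inflating the count.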
[jira] [Commented] (MAPREDUCE-5757) ConcurrentModificationException in JobControl.toList
[ https://issues.apache.org/jira/browse/MAPREDUCE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900842#comment-13900842 ] Kihwal Lee commented on MAPREDUCE-5757: --- +1 lgtm ConcurrentModificationException in JobControl.toList Key: MAPREDUCE-5757 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5757 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 0.23.10, 2.2.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: MAPREDUCE-5757.patch Despite having the fix for MAPREDUCE-5513 we saw another ConcurrentModificationException in JobControl, so something there still isn't fixed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5623) TestJobCleanup fails because of RejectedExecutionException and NPE.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847742#comment-13847742 ] Kihwal Lee commented on MAPREDUCE-5623: --- +1 The patch looks good to me. TestJobCleanup fails because of RejectedExecutionException and NPE. --- Key: MAPREDUCE-5623 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5623 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Tsuyoshi OZAWA Assignee: Jason Lowe Attachments: MAPREDUCE-5623.1.patch, MAPREDUCE-5623.2.patch, MAPREDUCE-5623.3.patch org.apache.hadoop.mapred.TestJobCleanup can fail because of a RejectedExecutionException thrown by NonAggregatingLogHandler. This problem is described in YARN-1409. TestJobCleanup can still fail after fixing the RejectedExecutionException, because of an NPE caused by Job#getCounters() returning null. {code} --- Test set: org.apache.hadoop.mapred.TestJobCleanup --- Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 140.933 sec FAILURE! - in org.apache.hadoop.mapred.TestJobCleanup testCustomAbort(org.apache.hadoop.mapred.TestJobCleanup) Time elapsed: 31.068 sec ERROR! java.lang.NullPointerException: null at org.apache.hadoop.mapred.TestJobCleanup.testFailedJob(TestJobCleanup.java:199) at org.apache.hadoop.mapred.TestJobCleanup.testCustomAbort(TestJobCleanup.java:296) {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (MAPREDUCE-5603) Ability to disable FileInputFormat listLocatedStatus optimization to save client memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13824255#comment-13824255 ] Kihwal Lee commented on MAPREDUCE-5603: --- The patch looks good. Since the MAPREDUCE build is broken, can you post your own test result? Ability to disable FileInputFormat listLocatedStatus optimization to save client memory --- Key: MAPREDUCE-5603 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5603 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client, mrv2 Affects Versions: 0.23.10, 2.2.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Minor Attachments: MAPREDUCE-5603.patch It would be nice if users had the option to disable the listLocatedStatus optimization in FileInputFormat to save client memory. -- This message was sent by Atlassian JIRA (v6.1#6144)
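The opt-out being requested above boils down to a boolean switch on the client. A minimal sketch of the pattern (the property name below is hypothetical, invented for illustration; the real key is whatever the patch defines):

```java
import java.util.Map;

// Hedged sketch of an opt-out flag for the listLocatedStatus optimization.
// KEY is a made-up property name, not the one added by the patch. When the
// flag is false, the client falls back to the plain listing path, trading
// extra RPCs for lower client-side memory use.
public class ListStatusToggle {
    static final String KEY = "mapreduce.input.fileinputformat.list-status.optimized"; // hypothetical

    static String chooseListing(Map<String, String> conf) {
        boolean optimized = Boolean.parseBoolean(conf.getOrDefault(KEY, "true"));
        return optimized ? "listLocatedStatus" : "listStatus";
    }
}
```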
[jira] [Commented] (MAPREDUCE-5373) TestFetchFailure.testFetchFailureMultipleReduces could fail intermittently
[ https://issues.apache.org/jira/browse/MAPREDUCE-5373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13824264#comment-13824264 ] Kihwal Lee commented on MAPREDUCE-5373: --- +1 Looks good to me. TestFetchFailure.testFetchFailureMultipleReduces could fail intermittently -- Key: MAPREDUCE-5373 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5373 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0, 2.1.0-beta Reporter: Chuan Liu Assignee: Jonathan Eagles Attachments: MAPREDUCE-5373.patch The unit test case could fail intermittently on both Linux and Windows in my testing. The error message seems to suggest the task status was wrong during testing. An example Linux failure: {noformat} --- Test set: org.apache.hadoop.mapreduce.v2.app.TestFetchFailure --- Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 12.235 sec FAILURE! testFetchFailureMultipleReduces(org.apache.hadoop.mapreduce.v2.app.TestFetchFailure) Time elapsed: 1261 sec FAILURE!
java.lang.AssertionError: expected:SUCCEEDED but was:SCHEDULED at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.mapreduce.v2.app.TestFetchFailure.testFetchFailureMultipleReduces(TestFetchFailure.java:332) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165) at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75) {noformat} An example Windows failure: {noformat} --- Test set: org.apache.hadoop.mapreduce.v2.app.TestFetchFailure --- Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 50.342 sec FAILURE! testFetchFailureMultipleReduces(org.apache.hadoop.mapreduce.v2.app.TestFetchFailure) Time elapsed: 36175 sec FAILURE! java.lang.AssertionError: expected:SUCCEEDED but was:RUNNING at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:147) at
[jira] [Commented] (MAPREDUCE-5446) TestJobHistoryEvents and TestJobHistoryParsing have race conditions
[ https://issues.apache.org/jira/browse/MAPREDUCE-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729589#comment-13729589 ] Kihwal Lee commented on MAPREDUCE-5446: --- +1 the patch looks good. TestJobHistoryEvents and TestJobHistoryParsing have race conditions --- Key: MAPREDUCE-5446 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5446 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, test Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Jason Lowe Attachments: MAPREDUCE-5446.patch TestJobHistoryEvents and TestJobHistoryParsing are not properly waiting for MRApp to finish. Currently they are polling the service state looking for Service.STATE.STOPPED, but the service can appear to be in that state *before* it is fully stopped. This causes tests to finish with MRApp threads still in-flight, and those threads can conflict with subsequent tests when they collide in the filesystem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
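The race described above can be reproduced in miniature: a state flag can read as "stopped" before cleanup threads have finished, so a test should wait on a completion signal that is raised only after teardown, not poll the state field. A plain-Java sketch (a simplified stand-in for the service API, not the Hadoop code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hedged sketch: the state string flips to STOPPED before teardown is done,
// so a poller that checks getState() can race ahead of cleanup. Waiting on
// the latch, which is counted down only after teardown, is the safe pattern.
public class StopRaceSketch {
    private volatile String state = "STARTED";
    private final CountDownLatch fullyStopped = new CountDownLatch(1);

    public void stop() {
        state = "STOPPED";                 // visible to pollers immediately...
        try { Thread.sleep(50); }          // ...while teardown is still running
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        fullyStopped.countDown();          // raised only once teardown is done
    }

    public boolean awaitStop(long millis) {
        try { return fullyStopped.await(millis, TimeUnit.MILLISECONDS); }
        catch (InterruptedException e) { return false; }
    }

    public String getState() { return state; }
}
```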
[jira] [Updated] (MAPREDUCE-5446) TestJobHistoryEvents and TestJobHistoryParsing have race conditions
[ https://issues.apache.org/jira/browse/MAPREDUCE-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-5446: -- Resolution: Fixed Fix Version/s: 2.1.1-beta 3.0.0 Status: Resolved (was: Patch Available) The patch has been committed to trunk, branch-2 and branch-2.1-beta. Thanks for the patch, Jason and for the review, Tsuyoshi. TestJobHistoryEvents and TestJobHistoryParsing have race conditions --- Key: MAPREDUCE-5446 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5446 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, test Affects Versions: 2.1.0-beta Reporter: Jason Lowe Assignee: Jason Lowe Fix For: 3.0.0, 2.1.1-beta Attachments: MAPREDUCE-5446.patch TestJobHistoryEvents and TestJobHistoryParsing are not properly waiting for MRApp to finish. Currently they are polling the service state looking for Service.STATE.STOPPED, but the service can appear to be in that state *before* it is fully stopped. This causes tests to finish with MRApp threads still in-flight, and those threads can conflict with subsequent tests when they collide in the filesystem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-3894) 0.23 and trunk MR builds fail intermittently
[ https://issues.apache.org/jira/browse/MAPREDUCE-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee resolved MAPREDUCE-3894. --- Resolution: Fixed 0.23 and trunk MR builds fail intermittently Key: MAPREDUCE-3894 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3894 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 0.24.0, 0.23.2 Reporter: Kihwal Lee The builds occasionally report ABORTED or FAILURE, which is not caused by the code changes included in the builds. We are not sure how long they have been broken this way, but Bobby's guess is around Feb 10. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-1981) Improve getSplits performance by using listFiles, the new FileSystem API
[ https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720137#comment-13720137 ] Kihwal Lee commented on MAPREDUCE-1981: --- +1 The patch for branch-0.23 looks good too. Improve getSplits performance by using listFiles, the new FileSystem API Key: MAPREDUCE-1981 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Attachments: mapredListFiles1.patch, mapredListFiles2.patch, mapredListFiles3.patch, mapredListFiles4.patch, mapredListFiles5.patch, mapredListFiles.patch, MAPREDUCE-1981.branch-0.23.patch, MAPREDUCE-1981.patch This jira will make FileInputFormat and CombineFileInputFormat use the new API, thus reducing the number of RPCs to the HDFS NameNode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-1981) Improve getSplits performance by using listFiles, the new FileSystem API
[ https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716688#comment-13716688 ] Kihwal Lee commented on MAPREDUCE-1981: --- +1 The patch looks good. I also ran some tests and they worked successfully. Thanks for fixing both mapred and mapreduce. Improve getSplits performance by using listFiles, the new FileSystem API Key: MAPREDUCE-1981 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Attachments: mapredListFiles1.patch, mapredListFiles2.patch, mapredListFiles3.patch, mapredListFiles4.patch, mapredListFiles5.patch, mapredListFiles.patch, MAPREDUCE-1981.patch This jira will make FileInputFormat and CombineFileInputFormat use the new API, thus reducing the number of RPCs to the HDFS NameNode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
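Roughly, the saving above is in RPC counts: the older path pays one listing call plus one block-locations call per file, while listFiles returns LocatedFileStatus entries so locations arrive with the (paged) listing itself. A back-of-the-envelope sketch (the batch size is an assumption for illustration, not a documented HDFS constant):

```java
// Hedged sketch of why listFiles helps getSplits: RPCs under the old path
// grow as 1 + N (listStatus, then getBlockLocations per file), while a
// located listing only needs about N / batchSize paged calls.
public class RpcCountSketch {
    static int rpcsOldPath(int files) {
        return 1 + files;                            // listStatus + per-file getBlockLocations
    }

    static int rpcsListFiles(int files, int batchSize) {
        return (files + batchSize - 1) / batchSize;  // paged listing, locations included
    }
}
```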
[jira] [Commented] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13633470#comment-13633470 ] Kihwal Lee commented on MAPREDUCE-5065: --- I've committed this to trunk, branch-2 and branch-0.23. DistCp should skip checksum comparisons if block-sizes are different on source/target. -- Key: MAPREDUCE-5065 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5065 Project: Hadoop Map/Reduce Issue Type: Bug Components: distcp Affects Versions: 2.0.3-alpha, 0.23.5 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: MAPREDUCE-5065.branch-0.23.patch, MAPREDUCE-5065.branch-2.patch When copying files between 2 clusters with different default block-sizes, one sees that the copy fails with a checksum-mismatch, even though the files have identical contents. The reason is that on HDFS, a file's checksum is unfortunately a function of the block-size of the file. So you could have 2 different files with identical contents (but different block-sizes) have different checksums. (Thus, it's also possible for DistCp to fail to copy files on the same file-system, if the source-file's block-size differs from HDFS default, and -pb isn't used.) I propose that we skip checksum comparisons under the following conditions: 1. -skipCrc is specified. 2. File-size is 0 (in which case the call to the checksum-servlet is moot). 3. source.getBlockSize() != target.getBlockSize(), since the checksums are guaranteed to differ in this case. I have a patch for #3. Edit: I've modified the fix to warn the user (instead of skipping the checksum-check). Skipping parity-checks is unsafe. The code now fails the copy, and suggests that the user either use -pb to preserve block-size, or consider -skipCrc (and forgo copy validation entirely). -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
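The three skip conditions proposed above can be expressed as a single predicate. A sketch (method and parameter names are illustrative, not the DistCp code; and per the edit above, the final patch warns rather than silently skips):

```java
// Hedged sketch of the proposed skip logic:
// 1. -skipCrc was specified,
// 2. the file is empty (nothing to compare),
// 3. block sizes differ, so HDFS checksums are guaranteed to differ
//    even for identical contents.
public class ChecksumSkipSketch {
    static boolean shouldSkipChecksum(boolean skipCrcFlag, long fileSize,
                                      long srcBlockSize, long dstBlockSize) {
        if (skipCrcFlag) return true;        // 1. user opted out explicitly
        if (fileSize == 0) return true;      // 2. checksum servlet call is moot
        return srcBlockSize != dstBlockSize; // 3. mismatch is a false negative
    }
}
```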
[jira] [Updated] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-5065: -- Resolution: Fixed Fix Version/s: 0.23.8 2.0.5-beta 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) DistCp should skip checksum comparisons if block-sizes are different on source/target. -- Key: MAPREDUCE-5065 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5065 Project: Hadoop Map/Reduce Issue Type: Bug Components: distcp Affects Versions: 2.0.3-alpha, 0.23.5 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Fix For: 3.0.0, 2.0.5-beta, 0.23.8 Attachments: MAPREDUCE-5065.branch-0.23.patch, MAPREDUCE-5065.branch-2.patch When copying files between 2 clusters with different default block-sizes, one sees that the copy fails with a checksum-mismatch, even though the files have identical contents. The reason is that on HDFS, a file's checksum is unfortunately a function of the block-size of the file. So you could have 2 different files with identical contents (but different block-sizes) have different checksums. (Thus, it's also possible for DistCp to fail to copy files on the same file-system, if the source-file's block-size differs from HDFS default, and -pb isn't used.) I propose that we skip checksum comparisons under the following conditions: 1. -skipCrc is specified. 2. File-size is 0 (in which case the call to the checksum-servlet is moot). 3. source.getBlockSize() != target.getBlockSize(), since the checksums are guaranteed to differ in this case. I have a patch for #3. Edit: I've modified the fix to warn the user (instead of skipping the checksum-check). Skipping parity-checks is unsafe. The code now fails the copy, and suggests that the user either use -pb to preserve block-size, or consider -skipCrc (and forgo copy validation entirely). -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626624#comment-13626624 ] Kihwal Lee commented on MAPREDUCE-5065: --- The patch looks good to me. [~cutting], are you okay with the change? DistCp should skip checksum comparisons if block-sizes are different on source/target. -- Key: MAPREDUCE-5065 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5065 Project: Hadoop Map/Reduce Issue Type: Bug Components: distcp Affects Versions: 2.0.3-alpha, 0.23.5 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: MAPREDUCE-5065.branch-0.23.patch, MAPREDUCE-5065.branch-2.patch When copying files between 2 clusters with different default block-sizes, one sees that the copy fails with a checksum-mismatch, even though the files have identical contents. The reason is that on HDFS, a file's checksum is unfortunately a function of the block-size of the file. So you could have 2 different files with identical contents (but different block-sizes) have different checksums. (Thus, it's also possible for DistCp to fail to copy files on the same file-system, if the source-file's block-size differs from HDFS default, and -pb isn't used.) I propose that we skip checksum comparisons under the following conditions: 1. -skipCrc is specified. 2. File-size is 0 (in which case the call to the checksum-servlet is moot). 3. source.getBlockSize() != target.getBlockSize(), since the checksums are guaranteed to differ in this case. I have a patch for #3. Edit: I've modified the fix to warn the user (instead of skipping the checksum-check). Skipping parity-checks is unsafe. The code now fails the copy, and suggests that the user either use -pb to preserve block-size, or consider -skipCrc (and forgo copy validation entirely). -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605626#comment-13605626 ] Kihwal Lee commented on MAPREDUCE-5065: --- Review comments: * Add a reasonable timeout to the test case. This is a relatively new rule. It applies even when you are modifying existing test cases. Please take into account that tests may run on slower hardware. * If we suggest -skipCrc along with -pb, we should probably inform users of the risk of skipping validation. DistCp should skip checksum comparisons if block-sizes are different on source/target. -- Key: MAPREDUCE-5065 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5065 Project: Hadoop Map/Reduce Issue Type: Bug Components: distcp Affects Versions: 2.0.3-alpha, 0.23.5 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Attachments: MAPREDUCE-5065.branch-0.23.patch, MAPREDUCE-5065.branch-2.patch When copying files between 2 clusters with different default block-sizes, one sees that the copy fails with a checksum-mismatch, even though the files have identical contents. The reason is that on HDFS, a file's checksum is unfortunately a function of the block-size of the file. So you could have 2 different files with identical contents (but different block-sizes) have different checksums. (Thus, it's also possible for DistCp to fail to copy files on the same file-system, if the source-file's block-size differs from HDFS default, and -pb isn't used.) I propose that we skip checksum comparisons under the following conditions: 1. -skipCrc is specified. 2. File-size is 0 (in which case the call to the checksum-servlet is moot). 3. source.getBlockSize() != target.getBlockSize(), since the checksums are guaranteed to differ in this case. I have a patch for #3. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
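The first review comment above, give the test a generous timeout rather than none so it still passes on slower hardware, is just @Test(timeout = ...) in JUnit 4. The same idea in self-contained plain Java, as a sketch:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hedged sketch of a test timeout: run the test body on a worker thread and
// bound the wait. Returns false (a timed-out test) instead of hanging the
// whole build when the body is too slow.
public class TimeoutSketch {
    static boolean runWithTimeout(Runnable body, long millis) {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        try {
            ex.submit(body).get(millis, TimeUnit.MILLISECONDS);
            return true;                         // finished within the budget
        } catch (TimeoutException e) {
            return false;                        // test would be failed as timed out
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            ex.shutdownNow();                    // interrupt a runaway body
        }
    }
}
```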
[jira] [Commented] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603493#comment-13603493 ] Kihwal Lee commented on MAPREDUCE-5065: --- bq. Another option might be to implement a checksum that's blocksize-independent... Reading the whole metadata may be too much, especially for huge files. It would be better if we make the computation happen where the data is. :) Most hashing is incremental, so if DFSClient feeds the last hash state into the next datanode and lets it continue updating it, the result will be independent of block size. The current way of doing file checksums allows calculating individual block checksums in parallel, but we are not taking advantage of it in DFSClient anyway. So I don't think there won't be any significant changes in performance or overhead. We should probably continue this discussion in a separate jira. DistCp should skip checksum comparisons if block-sizes are different on source/target. -- Key: MAPREDUCE-5065 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5065 Project: Hadoop Map/Reduce Issue Type: Bug Components: distcp Affects Versions: 2.0.3-alpha, 0.23.5 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan When copying files between 2 clusters with different default block-sizes, one sees that the copy fails with a checksum-mismatch, even though the files have identical contents. The reason is that on HDFS, a file's checksum is unfortunately a function of the block-size of the file. So you could have 2 different files with identical contents (but different block-sizes) have different checksums. (Thus, it's also possible for DistCp to fail to copy files on the same file-system, if the source-file's block-size differs from HDFS default, and -pb isn't used.) I propose that we skip checksum comparisons under the following conditions: 1. -skipCrc is specified. 2. File-size is 0 (in which case the call to the checksum-servlet is moot). 3. 
source.getBlockSize() != target.getBlockSize(), since the checksums are guaranteed to differ in this case. I have a patch for #3. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
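The incremental-hash idea above can be sketched outside HDFS (a hypothetical demo, not the DFSClient/datanode protocol): feeding one digest across block boundaries yields a result that depends only on the bytes, while hashing each block separately and combining the block digests (in the spirit of HDFS's MD5-of-block-MD5s checksum) changes with the block size.

```java
import java.security.MessageDigest;
import java.util.Arrays;

public class ChecksumDemo {
    // One incremental digest fed block by block: the result depends
    // only on the byte stream, not on how it is chunked.
    static byte[] incrementalDigest(byte[] data, int blockSize) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        for (int off = 0; off < data.length; off += blockSize) {
            md.update(data, off, Math.min(blockSize, data.length - off));
        }
        return md.digest();
    }

    // Hash each block separately, then hash the concatenated block
    // digests: the result varies with the block size.
    static byte[] perBlockDigest(byte[] data, int blockSize) throws Exception {
        MessageDigest outer = MessageDigest.getInstance("MD5");
        for (int off = 0; off < data.length; off += blockSize) {
            MessageDigest inner = MessageDigest.getInstance("MD5");
            inner.update(data, off, Math.min(blockSize, data.length - off));
            outer.update(inner.digest());
        }
        return outer.digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[1000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        // Same content, two different "block sizes".
        System.out.println(Arrays.equals(
                incrementalDigest(data, 128), incrementalDigest(data, 256))); // true
        System.out.println(Arrays.equals(
                perBlockDigest(data, 128), perBlockDigest(data, 256)));       // false
    }
}
```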
[jira] [Commented] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603494#comment-13603494 ] Kihwal Lee commented on MAPREDUCE-5065: --- bq. So I don't think there won't be any significant changes in performance or overhead. Sorry, unintended double negation.
[jira] [Commented] (MAPREDUCE-5065) DistCp should skip checksum comparisons if block-sizes are different on source/target.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603502#comment-13603502 ] Kihwal Lee commented on MAPREDUCE-5065: --- Filed HDFS-4605 for block-size independent FileChecksum in HDFS.
[jira] [Commented] (MAPREDUCE-1700) User supplied dependencies may conflict with MapReduce system JARs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13565758#comment-13565758 ] Kihwal Lee commented on MAPREDUCE-1700: --- Merged to branch-0.23. User supplied dependencies may conflict with MapReduce system JARs -- Key: MAPREDUCE-1700 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1700 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Reporter: Tom White Assignee: Tom White Fix For: 2.0.3-alpha Attachments: MAPREDUCE-1700-ccl.patch, MAPREDUCE-1700-ccl.patch, MAPREDUCE-1700.patch, MAPREDUCE-1700.patch, MAPREDUCE-1700.patch, MAPREDUCE-1700.patch, MAPREDUCE-1700.patch, MAPREDUCE-1700.patch, MAPREDUCE-1700.patch, MAPREDUCE-1700.patch If user code has a dependency on a version of a JAR that is different to the one that happens to be used by Hadoop, then it may not work correctly. This happened with user code using a different version of Avro, as reported [here|https://issues.apache.org/jira/browse/AVRO-493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852081#action_12852081]. The problem is analogous to the one that application servers have with WAR loading. Using a specialized classloader in the Child JVM is probably the way to solve this. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-1700) User supplied dependencies may conflict with MapReduce system JARs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547831#comment-13547831 ] Kihwal Lee commented on MAPREDUCE-1700: --- +1 The patch looks good. I hope people try this with many different use cases.
[jira] [Commented] (MAPREDUCE-1700) User supplied dependencies may conflict with MapReduce system JARs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526775#comment-13526775 ] Kihwal Lee commented on MAPREDUCE-1700: --- {quote} bq. Tom, one thing I've forgot to mention in my previous comment, we should see how to enable the classloader on the client side as well as it may be required (to use different JARs) for the submission code. I think this is a slightly different problem, since users generally have more control over the JVM they submit from than the JVM the task runs in. So, yes, another JIRA would be appropriate. {quote} I think the AM also runs user code, if a custom output format is defined.
[jira] [Commented] (MAPREDUCE-1700) User supplied dependencies may conflict with MapReduce system JARs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509888#comment-13509888 ] Kihwal Lee commented on MAPREDUCE-1700: --- Now that we have a much better way of dealing with dependency conflicts, what will be the fate of the mapreduce.job.user.classpath.first feature? Is there any use case where this feature works but the CCL approach doesn't, or where it is somehow preferred over CCL? If none, shall we deprecate it?
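The "CCL" under discussion is a child-first (parent-last) classloader. A minimal sketch of the idea, not Hadoop's actual implementation -- a real one must also exempt system classes (e.g. java.* and org.apache.hadoop.*) from child-first loading so framework types stay compatible:

```java
import java.net.URL;
import java.net.URLClassLoader;

// Child-first URLClassLoader sketch: look in the user's JARs before
// delegating to the parent, reversing the usual delegation order.
public class ChildFirstClassLoader extends URLClassLoader {
    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    c = findClass(name);              // user JARs first
                } catch (ClassNotFoundException e) {
                    c = super.loadClass(name, false); // then fall back to parent
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

By contrast, mapreduce.job.user.classpath.first only reorders a single flat classpath; the classloader approach actually isolates the user's copies of a dependency from the framework's.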
[jira] [Commented] (MAPREDUCE-4451) fairscheduler fail to init job with kerberos authentication configured
[ https://issues.apache.org/jira/browse/MAPREDUCE-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13463881#comment-13463881 ] Kihwal Lee commented on MAPREDUCE-4451: --- Erik, you can run src/test/bin/test-patch.sh manually and post the result. fairscheduler fail to init job with kerberos authentication configured -- Key: MAPREDUCE-4451 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4451 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/fair-share Affects Versions: 1.0.3 Reporter: Erik.fang Attachments: MAPREDUCE-4451_branch-1.patch, MAPREDUCE-4451_branch-1.patch, MAPREDUCE-4451_branch-1.patch, MAPREDUCE-4451_branch-1.patch, MAPREDUCE-4451_branch-1.patch Using FairScheduler in Hadoop 1.0.3 with kerberos authentication configured. Job initialization fails: {code} 2012-07-17 15:15:09,220 ERROR org.apache.hadoop.mapred.JobTracker: Job initialization failed: java.io.IOException: Call to /192.168.7.80:8020 failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at org.apache.hadoop.ipc.Client.wrapException(Client.java:1129) at org.apache.hadoop.ipc.Client.call(Client.java:1097) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229) at $Proxy7.getProtocolVersion(Unknown Source) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:411) at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:125) at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:329) at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:294) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:100) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1411) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1429) at 
org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187) at org.apache.hadoop.security.Credentials.writeTokenStorageFile(Credentials.java:169) at org.apache.hadoop.mapred.JobInProgress.generateAndStoreTokens(JobInProgress.java:3558) at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:696) at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3911) at org.apache.hadoop.mapred.FairScheduler$JobInitializer$InitJob.run(FairScheduler.java:301) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:543) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136) at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:488) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:590) at org.apache.hadoop.ipc.Client$Connection.access$2100(Client.java:187) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1228) at org.apache.hadoop.ipc.Client.call(Client.java:1072) ... 
20 more Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:194) at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:134) at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:385) at org.apache.hadoop.ipc.Client$Connection.access$1200(Client.java:187) at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:583) at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:580) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at
[jira] [Commented] (MAPREDUCE-4662) JobHistoryFilesManager thread pool never expands
[ https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461918#comment-13461918 ] Kihwal Lee commented on MAPREDUCE-4662: --- Hemanth, As you pointed out, the core thread timeout only takes effect when there is no work (i.e. no job completion). I took a heap dump of the JT to see the overhead of these extra threads. In terms of memory, counting the HashMapEntry, Thread, Worker and thread-local objects that are unique to a thread, it is well under 1KB per thread. I suspect the stack and other supporting system data structures take more memory. So, having them around all the time doesn't seem to add much resource overhead. Starting and stopping them frequently would probably create more overhead. JobHistoryFilesManager thread pool never expands Key: MAPREDUCE-4662 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4662 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 1.0.2 Reporter: Thomas Graves Assignee: Kihwal Lee Attachments: mapreduce-4662.branch-1.patch, mapreduce-4662.branch-1.patch The job history file manager creates a threadpool with core size 1 thread, max pool size 3. It never goes beyond 1 thread though because it's using a LinkedBlockingQueue which doesn't have a max size. void start() { executor = new ThreadPoolExecutor(1, 3, 1, TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>()); } According to the ThreadPoolExecutor javadoc page it only increases the number of threads when the queue is full. Since the queue we are using has no max size it never fills up and we never get more than 1 thread. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4662) JobHistoryFilesManager thread pool never expands
[ https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461978#comment-13461978 ] Kihwal Lee commented on MAPREDUCE-4662: --- I've also verified that all worker threads for the pool exit after the timeout.
[jira] [Commented] (MAPREDUCE-4662) JobHistoryFilesManager thread pool never expands
[ https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459713#comment-13459713 ] Kihwal Lee commented on MAPREDUCE-4662: --- bq. One solution is to specify maximum number of queued requests for LinkedBlockingQueue. That could be it, but this solution needs more changes. When the queue is full and the max number of threads are running, new tasks will be rejected. We could apply CallerRunsPolicy, but the whole point of having a ThreadPoolExecutor is to avoid blocking the JobTracker while doing job completion. I think the main requirements here are:
* Absorb bursty job completions - queueing with sufficient capacity, or fast dispatching with a large thread pool.
* Avoid limiting job throughput - enough worker threads.
* Minimize consumption of extra resources - limit the number of worker threads.
* Don't drop anything.
To satisfy the first and second requirements, one can think of the following two approaches:
* Have a bounded queue and a sufficiently large thread pool. Since we cannot drop any job completion, we want CallerRunsPolicy for rejected ones.
* Alternatively, use an unbounded queue and a reasonable number of core threads. No work will be rejected in this case.
Between the two, the second one has an advantage, considering the third requirement and its simplicity. The question is, what is a reasonable number of core threads to avoid lagging behind forever? Based on our experience, 3 to 5 seems reasonable. The moveToDone() throughput varies a lot, but it topped out at around 0.8/second in one of the busiest clusters I've seen. If the job completion rate stays above this for a long time, the queue will grow and history won't show up for most newer jobs. Here are the two approaches in code:
* The queue is bounded but will absorb bursts of about 100. If the core thread cannot keep up, up to 10 more threads will be created to help drain the queue. If the queue still cannot be drained fast enough, the caller will directly execute the work. This will block the JobTracker, since JobTracker#finalizeJob() is a synchronized method, so the thread pool size and the queue size must be sufficiently large.
{noformat}
executor = new ThreadPoolExecutor(1, 10, 1, TimeUnit.HOURS,
    new LinkedBlockingQueue<Runnable>(100),
    new ThreadPoolExecutor.CallerRunsPolicy());
{noformat}
* The following will eventually start up 5 threads and keep them running. Non-blocking, and the least amount of change.
{noformat}
executor = new ThreadPoolExecutor(5, 5, 1, TimeUnit.HOURS,
    new LinkedBlockingQueue<Runnable>());
{noformat}
What do you think is better? Or can you think of any better approaches?
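The behavior this issue describes -- an unbounded LinkedBlockingQueue preventing the pool from ever growing past its core size -- is easy to reproduce with a standalone ThreadPoolExecutor (a demo, not the JobHistoryFilesManager code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGrowthDemo {
    public static void main(String[] args) throws Exception {
        // Core 1, max 3, unbounded queue: extra threads are never created,
        // because ThreadPoolExecutor only grows past core size when offer() fails.
        ThreadPoolExecutor unbounded = new ThreadPoolExecutor(
                1, 3, 1, TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
        // Same pool, but with a tiny bounded queue: it does grow to 3 threads.
        ThreadPoolExecutor bounded = new ThreadPoolExecutor(
                1, 3, 1, TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>(1));

        CountDownLatch gate = new CountDownLatch(1);
        Runnable sleeper = () -> {
            try { gate.await(); } catch (InterruptedException ignored) { }
        };
        for (int i = 0; i < 4; i++) {
            unbounded.execute(sleeper);
            bounded.execute(sleeper);
        }
        System.out.println("unbounded pool size: " + unbounded.getPoolSize()); // 1
        System.out.println("bounded pool size:   " + bounded.getPoolSize());   // 3
        gate.countDown();
        unbounded.shutdown();
        bounded.shutdown();
    }
}
```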
[jira] [Commented] (MAPREDUCE-4662) JobHistoryFilesManager thread pool never expands
[ https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459892#comment-13459892 ] Kihwal Lee commented on MAPREDUCE-4662: --- In the second approach, we can also add {{executor.allowsCoreThreadTimeOut()}} to make core threads expire after the keepalive time. I think this will be very close to the original design intention.
[jira] [Assigned] (MAPREDUCE-4662) JobHistoryFilesManager thread pool never expands
[ https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee reassigned MAPREDUCE-4662: - Assignee: Kihwal Lee
[jira] [Updated] (MAPREDUCE-4662) JobHistoryFilesManager thread pool never expands
[ https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-4662: -- Attachment: mapreduce-4662.branch-1.patch
[jira] [Updated] (MAPREDUCE-4662) JobHistoryFilesManager thread pool never expands
[ https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-4662: -- Status: Patch Available (was: Open)
[jira] [Commented] (MAPREDUCE-4662) JobHistoryFilesManager thread pool never expands
[ https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459902#comment-13459902 ] Kihwal Lee commented on MAPREDUCE-4662: --- My bad. It should be {{executor.allowCoreThreadTimeOut(true)}}.
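With the corrected call, the second approach behaves as intended: idle core threads expire after the keep-alive, so the pool only holds threads while work is flowing. A standalone sketch (a short keep-alive is used here for demonstration; the patch under discussion uses 1 hour):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreTimeoutDemo {
    public static void main(String[] args) throws InterruptedException {
        // 5 core threads, unbounded queue, 100 ms keep-alive.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                5, 5, 100, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        executor.allowCoreThreadTimeOut(true); // let idle *core* threads exit too

        for (int i = 0; i < 5; i++) {
            executor.execute(() -> { }); // trivial tasks to spin up all core threads
        }
        System.out.println("busy pool size: " + executor.getPoolSize()); // 5
        Thread.sleep(1000); // well past the keep-alive; idle workers time out
        System.out.println("idle pool size: " + executor.getPoolSize()); // 0
        executor.shutdown();
    }
}
```

Without the allowCoreThreadTimeOut(true) call, the 5 core threads would stay alive indefinitely even when idle.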
[jira] [Updated] (MAPREDUCE-4662) JobHistoryFilesManager thread pool never expands
[ https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-4662: -- Attachment: mapreduce-4662.branch-1.patch
[jira] [Commented] (MAPREDUCE-4662) JobHistoryFilesManager thread pool never expands
[ https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459966#comment-13459966 ] Kihwal Lee commented on MAPREDUCE-4662: --- Manually ran test-patch against branch-1.
{noformat}
-1 overall.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 8 new Findbugs (version 1.3.9) warnings.
{noformat}
* Test not included, since it is hard to get at the private variables of JobHistoryFilesManager and the ThreadPoolExecutor inside it.
* Findbugs warnings: there are actually no new warnings. The numbers before and after the patch are identical.
[jira] [Updated] (MAPREDUCE-4467) IndexCache failures due to missing synchronization
[ https://issues.apache.org/jira/browse/MAPREDUCE-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-4467: -- Attachment: mapreduce-4467.patch.txt The new patch addresses Tom's comment. IndexCache failures due to missing synchronization -- Key: MAPREDUCE-4467 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4467 Project: Hadoop Map/Reduce Issue Type: Bug Components: nodemanager Affects Versions: 0.23.2 Reporter: Andrey Klochkov Assignee: Kihwal Lee Priority: Critical Fix For: 0.23.3, 3.0.0, 2.2.0-alpha Attachments: mapreduce-4467.patch.txt, mapreduce-4467.patch.txt TestMRJobs.testSleepJob fails randomly due to synchronization error in IndexCache: {code} 2012-07-20 19:32:34,627 ERROR [New I/O server worker #2-1] mapred.ShuffleHandler (ShuffleHandler.java:exceptionCaught(528)) - Shuffle error: java.lang.IllegalMonitorStateException at java.lang.Object.wait(Native Method) at org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:74) at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:471) at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:397) at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:148) at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:116) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:522) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261) at 
org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349) at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280) at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code} A related issue is MAPREDUCE-4384. The change introduced there removed the synchronized keyword, and hence the info.wait() call fails. This needs to be wrapped in a synchronized block. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
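The failure mode behind that stack trace is easy to demonstrate outside Hadoop. In this minimal sketch (the `info` object merely stands in for the per-map entry that IndexCache waits on; it is an illustrative placeholder, not the actual IndexCache structure), Object.wait() throws IllegalMonitorStateException unless the calling thread holds the object's monitor, which is why dropping the synchronized keyword broke getIndexInformation:

```java
public class WaitDemo {
    public static void main(String[] args) throws Exception {
        final Object info = new Object();

        // Without owning the monitor, wait() fails immediately with the
        // same IllegalMonitorStateException seen in the shuffle handler.
        boolean threw = false;
        try {
            info.wait(10);
        } catch (IllegalMonitorStateException e) {
            threw = true;
        }
        System.out.println("unsynchronized wait threw: " + threw);

        // The fix: enter a synchronized block on the same object first.
        synchronized (info) {
            info.wait(10);  // legal: this thread now owns info's monitor
        }
        System.out.println("synchronized wait returned normally");
    }
}
```

The same rule applies to notify() and notifyAll(): all three must be called while holding the target object's monitor.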
[jira] [Created] (MAPREDUCE-4470) Fix TestCombineFileInputFormat.testForEmptyFile
Kihwal Lee created MAPREDUCE-4470: - Summary: Fix TestCombineFileInputFormat.testForEmptyFile Key: MAPREDUCE-4470 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4470 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 2.0.0-alpha Reporter: Kihwal Lee Fix For: 2.1.0-alpha, 3.0.0 TestCombineFileInputFormat.testForEmptyFile started failing after HADOOP-8599. It expects one split on an empty input file, but with HADOOP-8599 it gets zero. The new behavior seems correct, but is it breaking anything else? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4427) Enable the RM to work with AM's that are not managed by it
[ https://issues.apache.org/jira/browse/MAPREDUCE-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13420993#comment-13420993 ] Kihwal Lee commented on MAPREDUCE-4427: --- TestClientRMService.testGetQueueInfo has been consistently failing since MAPREDUCE-4427. MAPREDUCE-4471 has been filed. Enable the RM to work with AM's that are not managed by it -- Key: MAPREDUCE-4427 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4427 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Bikas Saha Assignee: Bikas Saha Labels: mrv2 Fix For: 2.1.0-alpha Attachments: MAPREDUCE-4427-1.patch, MAPREDUCE-4427-2.patch, MAPREDUCE-4427-3.patch Currently, the RM itself manages the AM by allocating a container for it and negotiating the launch on the NodeManager and manages the AM lifecycle. Thereafter, the AM negotiates resources with the RM and launches tasks to do the real work. It would be a useful improvement to enhance this model by allowing the AM to be launched independently by the client without requiring the RM. These AM's would be launched on a gateway machine that can talk to the cluster. This would open up new use cases such as the following 1) Easy debugging of AM, especially during initial development. Having the AM launched on an arbitrary cluster node makes it hard to look at logs or attach a debugger to the AM. If it can be launched locally then these tasks would be easier. 2) Running AM's that need special privileges that may not be available on machines managed by the NodeManager -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-4471) TestClientRMService.testGetQueueInfo failing after MR-4427
Kihwal Lee created MAPREDUCE-4471: - Summary: TestClientRMService.testGetQueueInfo failing after MR-4427 Key: MAPREDUCE-4471 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4471 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.1.0-alpha Reporter: Kihwal Lee Fix For: 2.1.0-alpha, head TestClientRMService.testGetQueueInfo has been consistently failing since MAPREDUCE-4427. {noformat} java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:407) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:393) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetQueueInfo(TestClientRMService.java:138) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4471) TestClientRMService.testGetQueueInfo failing after MR-4427
[ https://issues.apache.org/jira/browse/MAPREDUCE-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13420995#comment-13420995 ] Kihwal Lee commented on MAPREDUCE-4471: --- Apparently MAPREDUCE-4440 fixed it. TestClientRMService.testGetQueueInfo failing after MR-4427 -- Key: MAPREDUCE-4471 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4471 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.1.0-alpha Reporter: Kihwal Lee Fix For: 2.1.0-alpha, head TestClientRMService.testGetQueueInfo has been consistently failing since MAPREDUCE-4427. {noformat} java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:407) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:393) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetQueueInfo(TestClientRMService.java:138) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-4471) TestClientRMService.testGetQueueInfo failing after MR-4427
[ https://issues.apache.org/jira/browse/MAPREDUCE-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee resolved MAPREDUCE-4471. --- Resolution: Invalid TestClientRMService.testGetQueueInfo failing after MR-4427 -- Key: MAPREDUCE-4471 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4471 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.1.0-alpha Reporter: Kihwal Lee Fix For: 2.1.0-alpha, head TestClientRMService.testGetQueueInfo has been consistently failing since MAPREDUCE-4427. {noformat} java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:407) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:393) at org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService.testGetQueueInfo(TestClientRMService.java:138) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4427) Enable the RM to work with AM's that are not managed by it
[ https://issues.apache.org/jira/browse/MAPREDUCE-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13420996#comment-13420996 ] Kihwal Lee commented on MAPREDUCE-4427: --- Nevermind. Arun fixed it in MAPREDUCE-4440. Enable the RM to work with AM's that are not managed by it -- Key: MAPREDUCE-4427 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4427 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Bikas Saha Assignee: Bikas Saha Labels: mrv2 Fix For: 2.1.0-alpha Attachments: MAPREDUCE-4427-1.patch, MAPREDUCE-4427-2.patch, MAPREDUCE-4427-3.patch Currently, the RM itself manages the AM by allocating a container for it and negotiating the launch on the NodeManager and manages the AM lifecycle. Thereafter, the AM negotiates resources with the RM and launches tasks to do the real work. It would be a useful improvement to enhance this model by allowing the AM to be launched independently by the client without requiring the RM. These AM's would be launched on a gateway machine that can talk to the cluster. This would open up new use cases such as the following 1) Easy debugging of AM, especially during initial development. Having the AM launched on an arbitrary cluster node makes it hard to look at logs or attach a debugger to the AM. If it can be launched locally then these tasks would be easier. 2) Running AM's that need special privileges that may not be available on machines managed by the NodeManager -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4467) IndexCache failures due to missing synchronization
[ https://issues.apache.org/jira/browse/MAPREDUCE-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-4467: -- Attachment: mapreduce-4467.patch.txt Sorry about the bug. Patch attached. IndexCache failures due to missing synchronization -- Key: MAPREDUCE-4467 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4467 Project: Hadoop Map/Reduce Issue Type: Bug Components: nodemanager Affects Versions: 0.23.2 Reporter: Andrey Klochkov Fix For: 0.23.3, 3.0.0, 2.2.0-alpha Attachments: mapreduce-4467.patch.txt TestMRJobs.testSleepJob fails randomly due to synchronization error in IndexCache: {code} 2012-07-20 19:32:34,627 ERROR [New I/O server worker #2-1] mapred.ShuffleHandler (ShuffleHandler.java:exceptionCaught(528)) - Shuffle error: java.lang.IllegalMonitorStateException at java.lang.Object.wait(Native Method) at org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:74) at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:471) at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:397) at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:148) at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:116) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:522) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261) at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349) at 
org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280) at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code} A related issue is MAPREDUCE-4384. The change introduced there removed the synchronized keyword, and hence the info.wait() call fails. This needs to be wrapped in a synchronized block. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (MAPREDUCE-4467) IndexCache failures due to missing synchronization
[ https://issues.apache.org/jira/browse/MAPREDUCE-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee reassigned MAPREDUCE-4467: - Assignee: Kihwal Lee IndexCache failures due to missing synchronization -- Key: MAPREDUCE-4467 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4467 Project: Hadoop Map/Reduce Issue Type: Bug Components: nodemanager Affects Versions: 0.23.2 Reporter: Andrey Klochkov Assignee: Kihwal Lee Fix For: 0.23.3, 3.0.0, 2.2.0-alpha Attachments: mapreduce-4467.patch.txt TestMRJobs.testSleepJob fails randomly due to synchronization error in IndexCache: {code} 2012-07-20 19:32:34,627 ERROR [New I/O server worker #2-1] mapred.ShuffleHandler (ShuffleHandler.java:exceptionCaught(528)) - Shuffle error: java.lang.IllegalMonitorStateException at java.lang.Object.wait(Native Method) at org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:74) at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:471) at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:397) at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:148) at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:116) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:522) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261) at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349) at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280) at 
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code} A related issue is MAPREDUCE-4384. The change introduced there removed the synchronized keyword, and hence the info.wait() call fails. This needs to be wrapped in a synchronized block. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4467) IndexCache failures due to missing synchronization
[ https://issues.apache.org/jira/browse/MAPREDUCE-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-4467: -- Fix Version/s: 2.2.0-alpha 3.0.0 0.23.3 Status: Patch Available (was: Open) IndexCache failures due to missing synchronization -- Key: MAPREDUCE-4467 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4467 Project: Hadoop Map/Reduce Issue Type: Bug Components: nodemanager Affects Versions: 0.23.2 Reporter: Andrey Klochkov Assignee: Kihwal Lee Fix For: 0.23.3, 3.0.0, 2.2.0-alpha Attachments: mapreduce-4467.patch.txt TestMRJobs.testSleepJob fails randomly due to synchronization error in IndexCache: {code} 2012-07-20 19:32:34,627 ERROR [New I/O server worker #2-1] mapred.ShuffleHandler (ShuffleHandler.java:exceptionCaught(528)) - Shuffle error: java.lang.IllegalMonitorStateException at java.lang.Object.wait(Native Method) at org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:74) at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:471) at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:397) at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:148) at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:116) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:522) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261) at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349) at 
org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280) at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code} A related issue is MAPREDUCE-4384. The change introduced there removed the synchronized keyword, and hence the info.wait() call fails. This needs to be wrapped in a synchronized block. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4467) IndexCache failures due to missing synchronization
[ https://issues.apache.org/jira/browse/MAPREDUCE-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419527#comment-13419527 ] Kihwal Lee commented on MAPREDUCE-4467: --- Additional test not needed. Existing test case detected the breakage. IndexCache failures due to missing synchronization -- Key: MAPREDUCE-4467 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4467 Project: Hadoop Map/Reduce Issue Type: Bug Components: nodemanager Affects Versions: 0.23.2 Reporter: Andrey Klochkov Assignee: Kihwal Lee Fix For: 0.23.3, 3.0.0, 2.2.0-alpha Attachments: mapreduce-4467.patch.txt TestMRJobs.testSleepJob fails randomly due to synchronization error in IndexCache: {code} 2012-07-20 19:32:34,627 ERROR [New I/O server worker #2-1] mapred.ShuffleHandler (ShuffleHandler.java:exceptionCaught(528)) - Shuffle error: java.lang.IllegalMonitorStateException at java.lang.Object.wait(Native Method) at org.apache.hadoop.mapred.IndexCache.getIndexInformation(IndexCache.java:74) at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.sendMapOutput(ShuffleHandler.java:471) at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:397) at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:148) at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:116) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.unfoldAndfireMessageReceived(ReplayingDecoder.java:522) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:506) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:443) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261) at 
org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349) at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280) at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code} A related issue is MAPREDUCE-4384. The change introduced there removed the synchronized keyword, and hence the info.wait() call fails. This needs to be wrapped in a synchronized block. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4416) Some tests fail if Clover is enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-4416: -- Attachment: mapreduce-4416.patch.txt Some tests fail if Clover is enabled Key: MAPREDUCE-4416 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Kihwal Lee Priority: Critical Fix For: 2.0.1-alpha, 3.0.0 Attachments: mapreduce-4416.patch.txt There are a number of tests running under hadoop-mapreduce-client-jobclient that fail if Clover is enabled. Whenever a job is launched, the AM doesn't start because it can't locate the Clover jar file. I thought MAPREDUCE-4253 had something to do with this, but I can reproduce the issue on an older revision. Although unrelated, MAPREDUCE-4253 does have a problem and it has been reported to the jira. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4416) Some tests fail if Clover is enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-4416: -- Status: Patch Available (was: Open) Some tests fail if Clover is enabled Key: MAPREDUCE-4416 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Kihwal Lee Priority: Critical Fix For: 2.0.1-alpha, 3.0.0 Attachments: mapreduce-4416.patch.txt There are a number of tests running under hadoop-mapreduce-client-jobclient that fail if Clover is enabled. Whenever a job is launched, the AM doesn't start because it can't locate the Clover jar file. I thought MAPREDUCE-4253 had something to do with this, but I can reproduce the issue on an older revision. Although unrelated, MAPREDUCE-4253 does have a problem and it has been reported to the jira. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4416) Some tests fail if Clover is enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-4416: -- Description: There are a number of tests running under hadoop-mapreduce-client-jobclient that fail if Clover is enabled. Whenever a job is launched, AM doesn't start because it can't locate the clover jar file. I thought MAPREDUCE-4253 had something to do with this, but I can reproduce the issue on an older revision. Although unrelated, MAPREDUCE-4253 does have a problem and it has been reported to the jira. was: There are number of tests running under hadoop-mapreduce-client-jobclient that fail if Clover is enabled. Whenever a job is launched, AM doesn't start because it can't locate the clover jar file. I thought MAPREDUCE-4253 had something to do this, but I can reproduce the issue on an older revision. Although unrelated, MAPREDUCE-4253 does have a problem and it has been reported to the jira. Some tests fail if Clover is enabled Key: MAPREDUCE-4416 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Kihwal Lee Priority: Critical Fix For: 2.0.1-alpha, 3.0.0 Attachments: mapreduce-4416.patch.txt There are a number of tests running under hadoop-mapreduce-client-jobclient that fail if Clover is enabled. Whenever a job is launched, the AM doesn't start because it can't locate the Clover jar file. I thought MAPREDUCE-4253 had something to do with this, but I can reproduce the issue on an older revision. Although unrelated, MAPREDUCE-4253 does have a problem and it has been reported to the jira. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4416) Some tests fail if Clover is enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-4416: -- Attachment: mapreduce-4416.patch.txt Some tests fail if Clover is enabled Key: MAPREDUCE-4416 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 2.0.1-alpha, 3.0.0 Attachments: mapreduce-4416.patch.txt, mapreduce-4416.patch.txt There are a number of tests running under hadoop-mapreduce-client-jobclient that fail if Clover is enabled. Whenever a job is launched, the AM doesn't start because it can't locate the Clover jar file. I thought MAPREDUCE-4253 had something to do with this, but I can reproduce the issue on an older revision. Although unrelated, MAPREDUCE-4253 does have a problem and it has been reported to the jira. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4393) PaaS on YARN: an YARN application to demonstrate that YARN can be used as a PaaS
[ https://issues.apache.org/jira/browse/MAPREDUCE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413294#comment-13413294 ] Kihwal Lee commented on MAPREDUCE-4393: --- I think use of ZK is fine since it won't be pretty for routers to poll status from RM (to get the list of AMs) and AM (to get updates on app instances). Multiple AMs can run on the same node, so a predefined port number cannot be used. Then there has to be a way to discover the port number. Having ZK in the picture certainly helps. But depending on the requirements on the router, all external dependencies (router, zk) can be substituted with another YARN app! PaaS System App? If we do this, the PaaS app can be made to talk to any one of the two types of management system. PaaS on YARN: an YARN application to demonstrate that YARN can be used as a PaaS Key: MAPREDUCE-4393 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4393 Project: Hadoop Map/Reduce Issue Type: Task Components: examples Affects Versions: 0.23.1 Reporter: Jaigak Song Assignee: Jaigak Song Fix For: 3.0.0 Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE4393.patch, MAPREDUCE4393.patch Original Estimate: 336h Remaining Estimate: 336h This application is to demonstrate that YARN can be used for non-mapreduce applications. As Hadoop has already been widely adopted and deployed, and its deployment will only increase in the future, we thought it has good potential to be used as a PaaS. I have implemented a proof of concept to demonstrate that YARN can be used as a PaaS (Platform as a Service). I have done a gap analysis against VMware's Cloud Foundry and tried to achieve as many PaaS functionalities as possible on YARN. I'd like to check in this POC as a YARN example application. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4393) PaaS on YARN: an YARN application to demonstrate that YARN can be used as a PaaS
[ https://issues.apache.org/jira/browse/MAPREDUCE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413337#comment-13413337 ] Kihwal Lee commented on MAPREDUCE-4393: --- I didn't mean that the manager AM is responsible for launching app AMs. I think it can be a separate yarn app. There doesn't even have to be any start-up dependency among them, if we design the communication protocol well. This also makes restart easy. If we can (re)launch the manager AM on one of a predefined set of hosts, most of the requirements can be met. By storing system state in HDFS and reading it back on restart, it can get back in sync quickly and offer service again. Routers can be provisioned similarly, but they will acquire state information from the manager AM. The service discovery is simplified by the fact that they will be on specific hosts. If a VIP is used to deal with service up/down or migration among the given set of hosts, the service discovery is further simplified. Since they are independent app instances or independent yarn apps, a crash/restart of one thing won't force termination of others. The one thing I am not sure about is the ability to specify a specific set of candidate hosts for launching the AM. If not supported already, we can launch the AM on a random host and then launch containers on a specific set of hosts, but that lowers the reliability. Or maybe the AM can be anywhere and the container launched from it will only be used for service discovery. I am not insisting on doing this now, but it will be nice if everything is contained in YARN so that setting up is simpler and it is easily demoable. 
PaaS on YARN: an YARN application to demonstrate that YARN can be used as a PaaS Key: MAPREDUCE-4393 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4393 Project: Hadoop Map/Reduce Issue Type: Task Components: examples Affects Versions: 0.23.1 Reporter: Jaigak Song Assignee: Jaigak Song Fix For: 3.0.0 Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE4393.patch, MAPREDUCE4393.patch Original Estimate: 336h Remaining Estimate: 336h This application is to demonstrate that YARN can be used for non-mapreduce applications. As Hadoop has already been adopted and deployed widely and its deployment in future will be highly increased, we thought that it's a good potential to be used as PaaS. I have implemented a proof of concept to demonstrate that YARN can be used as a PaaS (Platform as a Service). I have done a gap analysis against VMware's Cloud Foundry and tried to achieve as many PaaS functionalities as possible on YARN. I'd like to check in this POC as a YARN example application. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
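The save-state-in-HDFS/read-back-on-restart idea in the comment above can be sketched as follows. This is a minimal illustration only: it uses java.util.Properties and the local filesystem as a stand-in for HDFS (on a real cluster the reads and writes would go through Hadoop's FileSystem API), and the class and key names are hypothetical, not part of any patch here.

```java
import java.io.*;
import java.util.Properties;

// Hypothetical checkpoint helper: the manager AM would persist its view of
// the system (here, a flat key/value snapshot) and reload it after a restart.
public class ManagerStateStore {
    private final File stateFile;

    public ManagerStateStore(File stateFile) {
        this.stateFile = stateFile;
    }

    // Persist the current state snapshot to the backing file.
    public void save(Properties state) throws IOException {
        try (OutputStream out = new FileOutputStream(stateFile)) {
            state.store(out, "manager AM state snapshot");
        }
    }

    // Read the state back on restart; empty if no checkpoint exists yet.
    public Properties load() throws IOException {
        Properties state = new Properties();
        if (stateFile.exists()) {
            try (InputStream in = new FileInputStream(stateFile)) {
                state.load(in);
            }
        }
        return state;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("manager-state", ".properties");
        f.deleteOnExit();
        ManagerStateStore store = new ManagerStateStore(f);

        Properties before = new Properties();
        before.setProperty("app.web1.host", "host-01");
        store.save(before);

        // Simulate a restart: a fresh store instance reads the same file.
        Properties after = new ManagerStateStore(f).load();
        System.out.println(after.getProperty("app.web1.host")); // host-01
    }
}
```

Because the snapshot is re-read on every (re)launch, a crash of the manager AM only costs the time to reload the file, which is what makes the "no start-up dependency" design workable.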
[jira] [Commented] (MAPREDUCE-4253) Tests for mapreduce-client-core are lying under mapreduce-client-jobclient
[ https://issues.apache.org/jira/browse/MAPREDUCE-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13411316#comment-13411316 ] Kihwal Lee commented on MAPREDUCE-4253: --- Harsh, I noticed that your {{svn mv}} script actually moved only 19 files out of 31. I think the ones that go into non-existing directories failed. Tests for mapreduce-client-core are lying under mapreduce-client-jobclient -- Key: MAPREDUCE-4253 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4253 Project: Hadoop Map/Reduce Issue Type: Task Components: client Affects Versions: 2.0.0-alpha Reporter: Harsh J Fix For: 2.0.1-alpha Attachments: MR-4253.1.patch, MR-4253.2.patch, crossing_project_checker.rb, result.txt Many of the tests for client libs from mapreduce-client-core are lying under mapreduce-client-jobclient. We should investigate if this is the right thing to do and if not, move the tests back into client-core. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
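The likely failure mode behind the 19-of-31 result: `svn mv`, like plain `mv`, does not create missing parent directories, so moves into directories that don't exist yet fail unless those directories are created first. A plain-shell illustration of the same behavior (the paths are made up; `svn mv` additionally records the move in the working copy):

```shell
set -u
work=$(mktemp -d)
cd "$work"
mkdir -p src/test/java/old
touch src/test/java/old/TestFoo.java

# Fails: the destination directory does not exist yet.
mv src/test/java/old/TestFoo.java core/src/test/java/new/TestFoo.java 2>/dev/null \
  || echo "move failed: missing target directory"

# Create the target directory first, then the same move succeeds.
mkdir -p core/src/test/java/new
mv src/test/java/old/TestFoo.java core/src/test/java/new/TestFoo.java
ls core/src/test/java/new
```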
[jira] [Commented] (MAPREDUCE-4416) Some tests fail if Clover is enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13411354#comment-13411354 ] Kihwal Lee commented on MAPREDUCE-4416: --- The failing tests use MiniMRCluster and submit jobs. The jobs fail because the containers' classpath does not contain the clover jar. If I make the leaf project pick up the non-clovered mr-client-app, at least the AM works. But the custom mappers, reducers, etc. defined in mr-client-jobclient will be instrumented and be part of the code running inside containers, so the containers should be able to locate the clover jar. We have a record of these tests working with clover at least on June 24. So I went back and tried the old revision, but it didn't work this time... I wonder how it ever worked. Before MAPREDUCE-4082, it seems the classpath in mr-client-app contained the clover jar. The jira comments also show clover being in the generated classpath. The now-problematic clovered tests might have worked okay back then. Some tests were also being ignored. There is MAPREDUCE-4141, which removed the hard dependency on clover. If these tests accidentally worked before, this might have stopped it. Maybe running clovered test code in YARN containers does not make sense. They are separate processes launched by something other than the test framework. The clover instrumentation doesn't seem to be designed to naturally cover them. We could exclude some of the test helper classes from instrumentation. Some tests fail if Clover is enabled Key: MAPREDUCE-4416 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Kihwal Lee Priority: Critical Fix For: 2.0.1-alpha, 3.0.0 There are number of tests running under hadoop-mapreduce-client-jobclient that fail if Clover is enabled. Whenever a job is launched, AM doesn't start because it can't locate the clover jar file.
It seems this started happening after MAPREDUCE-4253. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4416) Some tests fail if Clover is enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13411372#comment-13411372 ] Kihwal Lee commented on MAPREDUCE-4416: --- The AM as well as the mappers/reducers fail if tests (e.g. {{TestChild}}) are run normally with {{-Pclover}}.
{noformat}
[CLOVER] FATAL ERROR: Clover could not be initialised. Are you sure you have Clover in the runtime classpath? (class java.lang.NoClassDefFoundError:com_cenqua_clover/CloverVersionInfo)
Exception in thread "main" java.lang.NoClassDefFoundError: com_cenqua_clover/CoverageRecorder
	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1017)
{noformat}
Some tests fail if Clover is enabled Key: MAPREDUCE-4416 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Kihwal Lee Priority: Critical Fix For: 2.0.1-alpha, 3.0.0 There are number of tests running under hadoop-mapreduce-client-jobclient that fail if Clover is enabled. Whenever a job is launched, AM doesn't start because it can't locate the clover jar file. It seems this started happening after MAPREDUCE-4253. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4416) Some tests fail if Clover is enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-4416: -- Description: There are number of tests running under hadoop-mapreduce-client-jobclient that fail if Clover is enabled. Whenever a job is launched, AM doesn't start because it can't locate the clover jar file. I thought MAPREDUCE-4253 had something to do this, but I can reproduce the issue on an older revision. Although unrelated, MAPREDUCE-4253 does have a problem and it has been reported to the jira. was: There are number of tests running under hadoop-mapreduce-client-jobclient that fail if Clover is enabled. Whenever a job is launched, AM doesn't start because it can't locate the clover jar file. It seems this started happening after MAPREDUCE-4253. Some tests fail if Clover is enabled Key: MAPREDUCE-4416 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Kihwal Lee Priority: Critical Fix For: 2.0.1-alpha, 3.0.0 There are number of tests running under hadoop-mapreduce-client-jobclient that fail if Clover is enabled. Whenever a job is launched, AM doesn't start because it can't locate the clover jar file. I thought MAPREDUCE-4253 had something to do this, but I can reproduce the issue on an older revision. Although unrelated, MAPREDUCE-4253 does have a problem and it has been reported to the jira. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4253) Tests for mapreduce-client-core are lying under mapreduce-client-jobclient
[ https://issues.apache.org/jira/browse/MAPREDUCE-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13411497#comment-13411497 ] Kihwal Lee commented on MAPREDUCE-4253: --- Maybe I am doing something wrong, but I still see only 19 files moved in both branch-2.0 and trunk from the revision history. The postings by Jenkins builds above also show only 19 files. Tests for mapreduce-client-core are lying under mapreduce-client-jobclient -- Key: MAPREDUCE-4253 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4253 Project: Hadoop Map/Reduce Issue Type: Task Components: client Affects Versions: 2.0.0-alpha Reporter: Harsh J Fix For: 2.0.1-alpha Attachments: MR-4253.1.patch, MR-4253.2.patch, crossing_project_checker.rb, result.txt Many of the tests for client libs from mapreduce-client-core are lying under mapreduce-client-jobclient. We should investigate if this is the right thing to do and if not, move the tests back into client-core. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4393) PaaS on YARN: an YARN application to demonstrate that YARN can be used as a PaaS
[ https://issues.apache.org/jira/browse/MAPREDUCE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13411911#comment-13411911 ] Kihwal Lee commented on MAPREDUCE-4393: --- You have probably done it already, but the first thing to make sure is that everything builds okay for all targets and profiles, e.g. build and run tests with clover (-Pclover). The test-patch process is most useful when existing code is modified, so in your case it would be nice if you could report more testing results. People would also like to hear about your experience writing a new YARN app. There is on-going work to make it easier to develop and debug apps. I am sure these efforts will benefit from your input. PaaS on YARN: an YARN application to demonstrate that YARN can be used as a PaaS Key: MAPREDUCE-4393 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4393 Project: Hadoop Map/Reduce Issue Type: Task Components: examples Affects Versions: 0.23.1 Reporter: Jaigak Song Assignee: Jaigak Song Fix For: 3.0.0 Attachments: HADOOPasPAAS_Architecture.pdf, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch, MAPREDUCE-4393.patch Original Estimate: 336h Remaining Estimate: 336h This application is to demonstrate that YARN can be used for non-mapreduce applications. As Hadoop has already been adopted and deployed widely and its deployment in future will be highly increased, we thought that it's a good potential to be used as PaaS. I have implemented a proof of concept to demonstrate that YARN can be used as a PaaS (Platform as a Service). I have done a gap analysis against VMware's Cloud Foundry and tried to achieve as many PaaS functionalities as possible on YARN. I'd like to check in this POC as a YARN example application. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4416) Some tests fail if Clover is enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13412381#comment-13412381 ] Kihwal Lee commented on MAPREDUCE-4416: --- Patrick, thanks for the comment. That might work, but the fact that it's not really an app-level dependency bothers me. So I ended up adding the dependency inside the clover profile in hadoop-project/pom.xml. This causes each module to have the clover jar as a dependency when -Pclover is specified. All mrapp-generated-classpath files will include the path to the clover jar. The package build will copy and include the clover jar, but we can't really use instrumented packages anyway. As an alternative to globally adding the dependency, we can do it per module whenever necessary. At least the following two need the dependency specified. - hadoop-yarn-applications - hadoop-mapreduce-client-jobclient Some tests fail if Clover is enabled Key: MAPREDUCE-4416 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Kihwal Lee Priority: Critical Fix For: 2.0.1-alpha, 3.0.0 There are number of tests running under hadoop-mapreduce-client-jobclient that fail if Clover is enabled. Whenever a job is launched, AM doesn't start because it can't locate the clover jar file. I thought MAPREDUCE-4253 had something to do this, but I can reproduce the issue on an older revision. Although unrelated, MAPREDUCE-4253 does have a problem and it has been reported to the jira. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
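The approach described above — declaring the Clover runtime inside the clover profile of hadoop-project/pom.xml so every module inherits it under -Pclover — would look roughly like the following. This is a hedged sketch, not the committed patch: the groupId/artifactId are those of the Clover Maven tooling of that era, and the version property name is illustrative.

```xml
<profile>
  <id>clover</id>
  <dependencies>
    <!-- Makes the Clover runtime a dependency of every module when -Pclover
         is active, so mrapp-generated-classpath files include its path and
         containers can resolve com_cenqua_clover classes at run time. -->
    <dependency>
      <groupId>com.cenqua.clover</groupId>
      <artifactId>clover</artifactId>
      <version>${clover.version}</version>
    </dependency>
  </dependencies>
</profile>
```

The per-module alternative mentioned in the comment would place the same dependency only in the poms of hadoop-yarn-applications and hadoop-mapreduce-client-jobclient, at the cost of having to remember it for every new module that launches containers.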
[jira] [Commented] (MAPREDUCE-4421) Remove dependency on deployed MR jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13410390#comment-13410390 ] Kihwal Lee commented on MAPREDUCE-4421: --- bq. How do we handle the native lib dependencies though? Do we ship that as well, or keep a static resource at the NM? There are two parts to this issue: # Specifying dependency: App-level dependencies on native libs should be specified within the app, not by YARN. Apps can also allow each job to specify additional dependencies. A proper merging of LD_LIBRARY_PATH from job, app and hadoop must be done (this was -Djava.library.path in mrv1). Who merges what needs to be made clear (a section in the app writer's guide?). # Making libs available: The simplest way is to let the app ship them for each job. But admins may choose to host app-level dependencies (in a multiversion-aware manner) and even some popular job-level dependencies. YARN should never automatically shove everything to apps. There are pros and cons to both approaches. A well designed app will support both. YARN should not remove or override legitimate app/job-level dependencies. I think YARN already satisfies this, but there might be some areas that need improvement. We should also provide a clear guide on how to manage dependencies for app writers and admins. This is quite similar to jar dependency management (this jira) in principle. Remove dependency on deployed MR jars - Key: MAPREDUCE-4421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4421 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Arun C Murthy Assignee: Arun C Murthy Currently MR AM depends on MR jars being deployed on all nodes via implicit dependency on YARN_APPLICATION_CLASSPATH. We should stop adding mapreduce jars to YARN_APPLICATION_CLASSPATH and, probably, just rely on adding a shaded MR jar along with job.jar to the dist-cache. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
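The "proper merging of LD_LIBRARY_PATH from job, app and hadoop" point above can be made concrete with a small sketch. This illustrates one plausible precedence order (job entries first, then app, then hadoop defaults) with duplicates collapsed; it is not the policy of any YARN release, and the class and method names are made up.

```java
import java.util.LinkedHashSet;

public class LibraryPathMerger {

    // Merge colon-separated path lists: job entries take precedence over app
    // entries, and app entries over hadoop defaults. A duplicate entry keeps
    // its first (highest-precedence) position; empty inputs are skipped.
    static String merge(String jobPath, String appPath, String hadoopPath) {
        LinkedHashSet<String> entries = new LinkedHashSet<>();
        for (String p : new String[] { jobPath, appPath, hadoopPath }) {
            if (p == null || p.isEmpty()) {
                continue;
            }
            for (String entry : p.split(":")) {
                if (!entry.isEmpty()) {
                    entries.add(entry); // add() is a no-op for duplicates
                }
            }
        }
        return String.join(":", entries);
    }

    public static void main(String[] args) {
        String merged = merge("/job/native",
                              "/app/native:/job/native",
                              "/usr/lib/hadoop/native");
        System.out.println(merged);
        // /job/native:/app/native:/usr/lib/hadoop/native
    }
}
```

The "who merges what" question from the comment is exactly the choice of which layer calls a method like this and in what order it passes the three sources.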
[jira] [Updated] (MAPREDUCE-4416) Some tests fail if Clover is enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-4416: -- Priority: Critical (was: Major) Some tests fail if Clover is enabled Key: MAPREDUCE-4416 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Kihwal Lee Priority: Critical Fix For: 2.0.1-alpha, 3.0.0 There are number of tests running under hadoop-mapreduce-client-jobclient that fail if Clover is enabled. Whenever a job is launched, AM doesn't start because it can't locate the clover jar file. It seems this started happening after MAPREDUCE-4253. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-4416) Some tests run twice or fail if Clover is enabled
Kihwal Lee created MAPREDUCE-4416: - Summary: Some tests run twice or fail if Clover is enabled Key: MAPREDUCE-4416 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Kihwal Lee Fix For: 2.0.1-alpha, 3.0.0 Some tests run twice. E.g. try mvn test -Dtest=TestJobConf. It runs under hadoop-mapreduce-client-core and hadoop-mapreduce-client-jobclient. There are number of tests running under hadoop-mapreduce-client-jobclient that fail if Clover is enabled. Whenever a job is launched, AM doesn't start because it can't locate the clover jar file. It seems this started happening after MAPREDUCE-4253. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4416) Some tests run twice or fail if Clover is enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13409918#comment-13409918 ] Kihwal Lee commented on MAPREDUCE-4416: --- There are actually two different TestJobConf classes: one in o.a.h.conf and another in o.a.h.mapred. It's confusing, but not really a problem. I had 37 failures/errors in jobclient when Clover is enabled.
{noformat}
Failed tests:
  testChild(org.apache.hadoop.mapreduce.TestChild)
  testDefaultCleanupAndAbort(org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter): Job failed!
  testCustomAbort(org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter): Job failed!
  testCustomCleanup(org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter): Job failed!
  testValidProxyUser(org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser)
  testJobSucceed(org.apache.hadoop.mapreduce.v2.TestMROldApiJobs): Job expected to succeed failed
  testJobFail(org.apache.hadoop.mapreduce.v2.TestMROldApiJobs)
  testSleepJob(org.apache.hadoop.mapreduce.v2.TestMRJobs)
  testRandomWriter(org.apache.hadoop.mapreduce.v2.TestMRJobs)
  testDistributedCache(org.apache.hadoop.mapreduce.v2.TestMRJobs)
  testSleepJob(org.apache.hadoop.mapreduce.v2.TestUberAM)
  testRandomWriter(org.apache.hadoop.mapreduce.v2.TestUberAM)
  testFailingMapper(org.apache.hadoop.mapreduce.v2.TestUberAM): expected:<false> but was:<true>
  testSpeculativeExecution(org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution)
  testLazyOutput(org.apache.hadoop.mapreduce.TestMapReduceLazyOutput)
  testHeapUsageCounter(org.apache.hadoop.mapred.TestJobCounters): Job job_1341837408279_0001 failed!
  testDefaultCleanupAndAbort(org.apache.hadoop.mapred.TestJobCleanup): Done file /home/y/var/builds/thread2/workspace/Cloud-Hadoop-All-2.0-Component/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/test-job-cleanup/output-0/_SUCCESS missing for job job_1341837505379_0001
  testCustomAbort(org.apache.hadoop.mapred.TestJobCleanup): Done file /home/y/var/builds/thread2/workspace/Cloud-Hadoop-All-2.0-Component/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/test-job-cleanup/output-1/_SUCCESS missing for job job_1341837505379_0002
  testCustomCleanup(org.apache.hadoop.mapred.TestJobCleanup): Done file /home/y/var/builds/thread2/workspace/Cloud-Hadoop-All-2.0-Component/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/test-dir/test-job-cleanup/output-2/_custom_cleanup missing for job job_1341837505379_0003
  testTaskTempDir(org.apache.hadoop.mapred.TestMiniMRChildTask)
  testTaskEnv(org.apache.hadoop.mapred.TestMiniMRChildTask): The environment checker job failed.
  testTaskOldEnv(org.apache.hadoop.mapred.TestMiniMRChildTask): The environment checker job failed.
  testJob(org.apache.hadoop.mapred.TestMiniMRClientCluster)
Tests in error:
  testFailingMapper(org.apache.hadoop.mapreduce.v2.TestMRJobs): 0
  testMR(org.apache.hadoop.mapred.TestClusterMRNotification): Job failed!
  testComplexName(org.apache.hadoop.mapred.TestJobName): Job failed!
  testComplexNameWithRegex(org.apache.hadoop.mapred.TestJobName): Job failed!
  testReduceFromPartialMem(org.apache.hadoop.mapred.TestReduceFetchFromPartialMem): Job failed!
  testClassPath(org.apache.hadoop.mapred.TestMiniMRClasspath): Job failed!
  testExternalWritable(org.apache.hadoop.mapred.TestMiniMRClasspath): Job failed!
  testWithDFS(org.apache.hadoop.mapred.TestJobSysDirWithDFS): Job failed!
  testReduceFromPartialMem(org.apache.hadoop.mapred.TestReduceFetchFromPartialMem): Job failed!
  testLazyOutput(org.apache.hadoop.mapred.TestLazyOutput): Job failed!
  testDistinctUsers(org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers): Job failed!
  testMultipleSpills(org.apache.hadoop.mapred.TestMiniMRWithDFSWithDistinctUsers): Job failed!
  testMapReduce(org.apache.hadoop.mapred.TestClusterMapReduceTestCase): Job failed!
  testMapReduceRestarting(org.apache.hadoop.mapred.TestClusterMapReduceTestCase): Job failed!
Tests run: 381, Failures: 23, Errors: 14, Skipped: 14
{noformat}
For the failing test cases, the containers' stderr files contain the following:
{noformat}
[CLOVER] FATAL ERROR: Clover could not be initialised. Are you sure you have Clover in the runtime classpath? (class java.lang.NoClassDefFoundError:com_cenqua_clover/CloverVersionInfo)
{noformat}
Some tests run twice or fail if Clover is enabled - Key: MAPREDUCE-4416 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Kihwal Lee Fix For: 2.0.1-alpha, 3.0.0
[jira] [Updated] (MAPREDUCE-4416) Some tests fail if Clover is enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-4416: -- Description: There are number of tests running under hadoop-mapreduce-client-jobclient that fail if Clover is enabled. Whenever a job is launched, AM doesn't start because it can't locate the clover jar file. It seems this started happening after MAPREDUCE-4253. was: Some tests run twice. E.g. try mvn test -Dtest=TestJobConf. It runs under hadoop-mapreduce-client-core and hadoop-mapreduce-client-jobclient. There are number of tests running under hadoop-mapreduce-client-jobclient that fail if Clover is enabled. Whenever a job is launched, AM doesn't start because it can't locate the clover jar file. It seems this started happening after MAPREDUCE-4253. Summary: Some tests fail if Clover is enabled (was: Some tests run twice or fail if Clover is enabled) Some tests fail if Clover is enabled Key: MAPREDUCE-4416 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4416 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Kihwal Lee Fix For: 2.0.1-alpha, 3.0.0 There are number of tests running under hadoop-mapreduce-client-jobclient that fail if Clover is enabled. Whenever a job is launched, AM doesn't start because it can't locate the clover jar file. It seems this started happening after MAPREDUCE-4253. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4384) Race conditions in IndexCache
[ https://issues.apache.org/jira/browse/MAPREDUCE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-4384: -- Attachment: mapreduce-4384.patch Attaching new patch without the test case. Race conditions in IndexCache - Key: MAPREDUCE-4384 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4384 Project: Hadoop Map/Reduce Issue Type: Bug Components: nodemanager Affects Versions: 2.0.0-alpha Reporter: Kihwal Lee Assignee: Kihwal Lee Fix For: 0.23.3, 2.0.1-alpha, 3.0.0 Attachments: mapreduce-4384.patch, mapreduce-4384.patch, mapreduce-4384.patch, mapreduce-4384.patch TestIndexCache is intermittently failing due to a race condition. Up on inspection of IndexCache implementation, more potential issues have been discovered. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4384) Race conditions in IndexCache
[ https://issues.apache.org/jira/browse/MAPREDUCE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13405121#comment-13405121 ] Kihwal Lee commented on MAPREDUCE-4384: --- MAPREDUCE-4253 moved the test file to a different directory. I will post an updated patch. Race conditions in IndexCache - Key: MAPREDUCE-4384 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4384 Project: Hadoop Map/Reduce Issue Type: Bug Components: nodemanager Affects Versions: 2.0.0-alpha Reporter: Kihwal Lee Assignee: Kihwal Lee Fix For: 0.23.3, 2.0.1-alpha, 3.0.0 Attachments: mapreduce-4384.patch TestIndexCache is intermittently failing due to a race condition. Up on inspection of IndexCache implementation, more potential issues have been discovered. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4384) Race conditions in IndexCache
[ https://issues.apache.org/jira/browse/MAPREDUCE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-4384: -- Attachment: mapreduce-4384.patch Race conditions in IndexCache - Key: MAPREDUCE-4384 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4384 Project: Hadoop Map/Reduce Issue Type: Bug Components: nodemanager Affects Versions: 2.0.0-alpha Reporter: Kihwal Lee Assignee: Kihwal Lee Fix For: 0.23.3, 2.0.1-alpha, 3.0.0 Attachments: mapreduce-4384.patch, mapreduce-4384.patch TestIndexCache is intermittently failing due to a race condition. Up on inspection of IndexCache implementation, more potential issues have been discovered. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4384) Race conditions in IndexCache
[ https://issues.apache.org/jira/browse/MAPREDUCE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated MAPREDUCE-4384: -- Attachment: mapreduce-4384.patch Race conditions in IndexCache - Key: MAPREDUCE-4384 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4384 Project: Hadoop Map/Reduce Issue Type: Bug Components: nodemanager Affects Versions: 2.0.0-alpha Reporter: Kihwal Lee Assignee: Kihwal Lee Fix For: 0.23.3, 2.0.1-alpha, 3.0.0 Attachments: mapreduce-4384.patch, mapreduce-4384.patch, mapreduce-4384.patch TestIndexCache is intermittently failing due to a race condition. Up on inspection of IndexCache implementation, more potential issues have been discovered. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4384) Race conditions in IndexCache
[ https://issues.apache.org/jira/browse/MAPREDUCE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13405245#comment-13405245 ] Kihwal Lee commented on MAPREDUCE-4384: --- I posted an updated patch but the PreCommit-Admin build job hasn't run for almost two hours... Race conditions in IndexCache - Key: MAPREDUCE-4384 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4384 Project: Hadoop Map/Reduce Issue Type: Bug Components: nodemanager Affects Versions: 2.0.0-alpha Reporter: Kihwal Lee Assignee: Kihwal Lee Fix For: 0.23.3, 2.0.1-alpha, 3.0.0 Attachments: mapreduce-4384.patch, mapreduce-4384.patch, mapreduce-4384.patch TestIndexCache is intermittently failing due to a race condition. Up on inspection of IndexCache implementation, more potential issues have been discovered. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4384) Race conditions in IndexCache
[ https://issues.apache.org/jira/browse/MAPREDUCE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13405289#comment-13405289 ] Kihwal Lee commented on MAPREDUCE-4384: --- I ran test-patch manually. There were 2066 (-3) javac warnings with the new patch. Race conditions in IndexCache - Key: MAPREDUCE-4384 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4384 Project: Hadoop Map/Reduce Issue Type: Bug Components: nodemanager Affects Versions: 2.0.0-alpha Reporter: Kihwal Lee Assignee: Kihwal Lee Fix For: 0.23.3, 2.0.1-alpha, 3.0.0 Attachments: mapreduce-4384.patch, mapreduce-4384.patch, mapreduce-4384.patch TestIndexCache is intermittently failing due to a race condition. Up on inspection of IndexCache implementation, more potential issues have been discovered. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-4387) RM gets fatal error and exits during TestRM
Kihwal Lee created MAPREDUCE-4387: - Summary: RM gets fatal error and exits during TestRM Key: MAPREDUCE-4387 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4387 Project: Hadoop Map/Reduce Issue Type: Bug Components: resourcemanager Affects Versions: 2.0.0-alpha Reporter: Kihwal Lee Fix For: 2.0.1-alpha, 3.0.0 It doesn't happen on my desktop, but it happens frequently during the builds with clover enabled. Surefire will report it as fork failure. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4387) RM gets fatal error and exits during TestRM
[ https://issues.apache.org/jira/browse/MAPREDUCE-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404197#comment-13404197 ] Kihwal Lee commented on MAPREDUCE-4387:
---
The test was calling {{ResourceManager#stop()}} after it thought it was done. This hit {{GenericEventHandler}} with an interrupt while it was trying to enqueue an event. The exception bubbled up and hit the RM's {{EventProcessor}} loop, which did System.exit(-1). The loop checks whether the JVM is being shut down, but this happens before {{ShutdownHookManager}} is activated. It seems {{EventProcessor}} shouldn't do exit(-1) if it gets an exception during shutdown.
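The failure mode above can be reproduced in a minimal, self-contained form. This is an illustrative sketch (class and flag names are assumptions, not the actual RM dispatcher): a thread blocked on a queue receives an interrupt during shutdown, and the handler distinguishes "interrupted because we are stopping" from a genuine fatal error by checking a stopped flag that is set before the interrupt is delivered:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the dispatcher pattern discussed in this issue.
public class InterruptDuringShutdown {
    // Returns true only if the interrupt was (wrongly) treated as fatal.
    public static boolean demo() throws Exception {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        AtomicBoolean stopped = new AtomicBoolean(false);
        AtomicBoolean sawFatal = new AtomicBoolean(false);

        Thread dispatcher = new Thread(() -> {
            while (!stopped.get() && !Thread.currentThread().isInterrupted()) {
                try {
                    queue.take();  // blocks; the shutdown interrupt lands here
                } catch (InterruptedException e) {
                    if (!stopped.get()) {
                        sawFatal.set(true);  // would have been System.exit(-1)
                    }
                    return;  // normal termination during shutdown
                }
            }
        });
        dispatcher.start();
        Thread.sleep(100);
        stopped.set(true);       // stop() sets the flag first...
        dispatcher.interrupt();  // ...then interrupts, so the handler stays quiet
        dispatcher.join();
        return sawFatal.get();
    }
}
```

Because the flag is set before the interrupt, the catch block can tell the two cases apart and the demo returns false (no fatal path taken).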
[jira] [Commented] (MAPREDUCE-4387) RM gets fatal error and exits during TestRM
[ https://issues.apache.org/jira/browse/MAPREDUCE-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404212#comment-13404212 ] Kihwal Lee commented on MAPREDUCE-4387:
---
{{EventProcessor#stop()}} is setting {{stopped}} to true before interrupting the thread. So we could just add one more condition and let it terminate normally.
{code}
while (!stopped && !Thread.currentThread().isInterrupted()) {
  try {
    event = eventQueue.take();
  } catch (InterruptedException e) {
    LOG.error("Returning, interrupted : " + e);
    return; // TODO: Kill RM.
  }

  try {
    scheduler.handle(event);
  } catch (Throwable t) {
+   if (stopped) {
+     LOG.warn("Exception during shutdown: ", t);
+     break;
+   }
    LOG.fatal("Error in handling event type " + event.getType()
        + " to the scheduler", t);
    if (shouldExitOnError && !ShutdownHookManager.get().isShutdownInProgress()) {
      LOG.info("Exiting, bbye..");
      System.exit(-1);
    }
  }
}
{code}
[jira] [Commented] (MAPREDUCE-4387) RM gets fatal error and exits during TestRM
[ https://issues.apache.org/jira/browse/MAPREDUCE-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404227#comment-13404227 ] Kihwal Lee commented on MAPREDUCE-4387:
---
I tested this idea and it worked. With {{-Pclover}}, TestRM always fails. With the added check, it succeeds.

BEFORE
{noformat}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.yarn.server.resourcemanager.TestRM

Results :

Tests run: 0, Failures: 0, Errors: 0, Skipped: 0

[INFO] hadoop-yarn-server-resourcemanager FAILURE [5.180s]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12:test (default-test) on project hadoop-yarn-server-resourcemanager: ExecutionException; nested exception is java.util.concurrent.ExecutionException: org.apache.maven.surefire.booter.SurefireBooterForkException: Error occurred in starting fork, check output in log - [Help 1]
{noformat}

AFTER
{noformat}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.yarn.server.resourcemanager.TestRM
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.973 sec

Results :

Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
{noformat}
[jira] [Assigned] (MAPREDUCE-4387) RM gets fatal error and exits during TestRM
[ https://issues.apache.org/jira/browse/MAPREDUCE-4387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee reassigned MAPREDUCE-4387:
-------------------------------------

Assignee: Kihwal Lee