[jira] [Commented] (MAPREDUCE-7351) CleanupJob during handle of SIGTERM signal
[ https://issues.apache.org/jira/browse/MAPREDUCE-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606137#comment-17606137 ] Prabhu Joseph commented on MAPREDUCE-7351: -- This patch removes the _temporary directory under the output path, not the output path itself. Note that when a job succeeds or fails, the _temporary directory under the output path is removed even without this patch. > CleanupJob during handle of SIGTERM signal > -- > > Key: MAPREDUCE-7351 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7351 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2 >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Shubham Gupta >Priority: Minor > Fix For: 3.3.0 > > Attachments: MAPREDUCE-7351-001.patch > > Time Spent: 20m > Remaining Estimate: 0h > > Currently MR CleanupJob happens when the job is either successful or fail. > But during kill, it is not handled. This leaves all the temporary folders > under the output path. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-7201) Make Job History File Permissions configurable
[ https://issues.apache.org/jira/browse/MAPREDUCE-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7201: - Fix Version/s: 3.4.0 Resolution: Fixed Status: Resolved (was: Patch Available) > Make Job History File Permissions configurable > -- > > Key: MAPREDUCE-7201 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7201 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobhistoryserver >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Ashutosh Gupta >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: MAPREDUCE-7201-001.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Job History File Permissions are hardcoded to 770. MAPREDUCE-7010 allows to > configure the intermediate user directory permission but still the jhist file > permission are not changed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Assigned] (MAPREDUCE-7369) MapReduce tasks timing out when spends more time on MultipleOutputs#close
[ https://issues.apache.org/jira/browse/MAPREDUCE-7369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph reassigned MAPREDUCE-7369: Assignee: Ravuri Sushma sree (was: Prabhu Joseph) > MapReduce tasks timing out when spends more time on MultipleOutputs#close > - > > Key: MAPREDUCE-7369 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7369 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.3.1 >Reporter: Prabhu Joseph >Assignee: Ravuri Sushma sree >Priority: Major > > MapReduce tasks timing out when spends more time on MultipleOutputs#close. > MultipleOutputs#closes takes more time when there are multiple files to be > closed & there is a high latency in closing a stream. > {code} > 2021-11-01 02:45:08,312 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics > report from attempt_1634949471086_61268_m_001115_0: > AttemptID:attempt_1634949471086_61268_m_001115_0 Timed out after 300 secs > {code} > MapReduce task timeout can be increased but it is tough to set the right > timeout value. The timeout can be disabled with 0 but that might lead to > hanging tasks not getting killed. > The tasks are sending the ping every 3 seconds which are not honored by > ApplicationMaster. It expects the status information which won't be send > during MultipleOutputs#close. This jira is to add a config which considers > the ping from task as part of Task Liveliness Check in the ApplicationMaster. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-7369) MapReduce tasks timing out when spends more time on MultipleOutputs#close
[ https://issues.apache.org/jira/browse/MAPREDUCE-7369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450668#comment-17450668 ] Prabhu Joseph commented on MAPREDUCE-7369: -- bq. Have you thought about also parallelising the close so that and the different outputs can be closed simultaneously? That will improve the speed. Have reported [MapReduce-7370|https://issues.apache.org/jira/browse/MAPREDUCE-7370] to handle the same. Thanks. > MapReduce tasks timing out when spends more time on MultipleOutputs#close > - > > Key: MAPREDUCE-7369 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7369 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.3.1 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > > MapReduce tasks timing out when spends more time on MultipleOutputs#close. > MultipleOutputs#closes takes more time when there are multiple files to be > closed & there is a high latency in closing a stream. > {code} > 2021-11-01 02:45:08,312 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics > report from attempt_1634949471086_61268_m_001115_0: > AttemptID:attempt_1634949471086_61268_m_001115_0 Timed out after 300 secs > {code} > MapReduce task timeout can be increased but it is tough to set the right > timeout value. The timeout can be disabled with 0 but that might lead to > hanging tasks not getting killed. > The tasks are sending the ping every 3 seconds which are not honored by > ApplicationMaster. It expects the status information which won't be send > during MultipleOutputs#close. This jira is to add a config which considers > the ping from task as part of Task Liveliness Check in the ApplicationMaster. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-7370) Parallelize MultipleOutputs#close call
[ https://issues.apache.org/jira/browse/MAPREDUCE-7370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7370: - Description: This call takes more time when there are lot of files to close and there is a high latency to close. Parallelize MultipleOutputs#close call to improve the speed. {code} public void close() throws IOException { for (RecordWriter writer : recordWriters.values()) { writer.close(null); } } {code} Idea is from [~ste...@apache.org] was: This call takes more time when there are lot of files to close and there is a high latency to close. Parallelize MultipleOutputs#close call to improve the speed. {code} public void close() throws IOException { for (RecordWriter writer : recordWriters.values()) { writer.close(null); } } {code} > Parallelize MultipleOutputs#close call > -- > > Key: MAPREDUCE-7370 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7370 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Ravuri Sushma sree >Priority: Major > > This call takes more time when there are lot of files to close and there is a > high latency to close. Parallelize MultipleOutputs#close call to improve the > speed. > {code} > public void close() throws IOException { > for (RecordWriter writer : recordWriters.values()) { > writer.close(null); > } > } > {code} > Idea is from [~ste...@apache.org] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Created] (MAPREDUCE-7370) Parallelize MultipleOutputs#close call
Prabhu Joseph created MAPREDUCE-7370: Summary: Parallelize MultipleOutputs#close call Key: MAPREDUCE-7370 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7370 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 3.3.0 Reporter: Prabhu Joseph Assignee: Ravuri Sushma sree This call takes more time when there are a lot of files to close and closing each one has high latency. Parallelize the MultipleOutputs#close call to improve the speed. {code} public void close() throws IOException { for (RecordWriter writer : recordWriters.values()) { writer.close(null); } } {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
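The sequential loop in the `{code}` snippet above closes writers one at a time, so total close time is the sum of all per-stream latencies. The following stdlib-only sketch shows how such a close can be parallelized with an `ExecutorService`, making total time roughly the slowest single close. This is not the actual Hadoop patch; the `Writer` interface is a hypothetical stand-in for Hadoop's `RecordWriter`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelClose {

    /** Hypothetical stand-in for Hadoop's RecordWriter; only close() matters here. */
    interface Writer {
        void close() throws IOException;
    }

    /**
     * Close every writer concurrently instead of one after another.
     * Failures from any close() are surfaced after all closes finish submitting.
     */
    static void closeAll(List<Writer> writers, int threads) throws IOException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Void>> pending = new ArrayList<>();
            for (Writer w : writers) {
                pending.add(pool.submit(() -> { w.close(); return null; }));
            }
            for (Future<Void> f : pending) {
                try {
                    f.get(); // propagate any close() failure
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IOException("interrupted while closing", e);
                } catch (ExecutionException e) {
                    throw new IOException("close failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws IOException {
        AtomicInteger closed = new AtomicInteger();
        List<Writer> writers = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            writers.add(closed::incrementAndGet); // each "close" just counts
        }
        closeAll(writers, 4);
        if (closed.get() != writers.size()) throw new AssertionError(closed.get());
        System.out.println("closed " + closed.get() + " writers");
    }
}
```

One design note: collecting the `Future`s and calling `get()` on each keeps the failure semantics of the sequential loop (an `IOException` still aborts the caller) while letting the slow I/O overlap.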
[jira] [Created] (MAPREDUCE-7369) MapReduce tasks timing out when spends more time on MultipleOutputs#close
Prabhu Joseph created MAPREDUCE-7369: Summary: MapReduce tasks timing out when spends more time on MultipleOutputs#close Key: MAPREDUCE-7369 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7369 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.3.1 Reporter: Prabhu Joseph Assignee: Prabhu Joseph MapReduce tasks time out when they spend more time in MultipleOutputs#close. MultipleOutputs#close takes more time when there are multiple files to be closed and there is high latency in closing a stream. {code} 2021-11-01 02:45:08,312 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1634949471086_61268_m_001115_0: AttemptID:attempt_1634949471086_61268_m_001115_0 Timed out after 300 secs {code} The MapReduce task timeout can be increased, but it is tough to set the right timeout value. The timeout can be disabled with 0, but that might lead to hanging tasks not getting killed. The tasks send a ping every 3 seconds, which is not honored by the ApplicationMaster; it expects status information, which is not sent during MultipleOutputs#close. This jira is to add a config that counts the ping from a task as part of the Task Liveliness Check in the ApplicationMaster. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-7351) CleanupJob during handle of SIGTERM signal
[ https://issues.apache.org/jira/browse/MAPREDUCE-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7351: - Labels: (was: pull-request-available) > CleanupJob during handle of SIGTERM signal > -- > > Key: MAPREDUCE-7351 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7351 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2 >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Shubham Gupta >Priority: Minor > Fix For: 3.3.0 > > Attachments: MAPREDUCE-7351-001.patch > > Time Spent: 20m > Remaining Estimate: 0h > > Currently MR CleanupJob happens when the job is either successful or fail. > But during kill, it is not handled. This leaves all the temporary folders > under the output path. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-7351) CleanupJob during handle of SIGTERM signal
[ https://issues.apache.org/jira/browse/MAPREDUCE-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7351: - Resolution: Fixed Status: Resolved (was: Patch Available) > CleanupJob during handle of SIGTERM signal > -- > > Key: MAPREDUCE-7351 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7351 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2 >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Shubham Gupta >Priority: Minor > Fix For: 3.3.0 > > Attachments: MAPREDUCE-7351-001.patch > > Time Spent: 20m > Remaining Estimate: 0h > > Currently MR CleanupJob happens when the job is either successful or fail. > But during kill, it is not handled. This leaves all the temporary folders > under the output path. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-7351) CleanupJob during handle of SIGTERM signal
[ https://issues.apache.org/jira/browse/MAPREDUCE-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376197#comment-17376197 ] Prabhu Joseph commented on MAPREDUCE-7351: -- Thanks [~shubhamod] for the patch. Have committed it to trunk. > CleanupJob during handle of SIGTERM signal > -- > > Key: MAPREDUCE-7351 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7351 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2 >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Shubham Gupta >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.0 > > Attachments: MAPREDUCE-7351-001.patch > > Time Spent: 20m > Remaining Estimate: 0h > > Currently MR CleanupJob happens when the job is either successful or fail. > But during kill, it is not handled. This leaves all the temporary folders > under the output path. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-7355) Fix MRAppMaster to getStagingAreaDir from Job Configuration
[ https://issues.apache.org/jira/browse/MAPREDUCE-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7355: - Description: When JobClient (runs as yarn user) uses RM_DELEGATION_TOKEN (owner:oozie) to submit the job, client uses /mapreducestaging/yarn as staging directory whereas MRAppMaster uses /mapreducestaging/oozie. This leads to below failure {code} Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: hdfs://yarncluster/mapreducestaging/oozie/.staging/job_1624284676187_0003/job.splitmetainfo: No such file or directory. at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1611) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1473) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1431) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1010) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:141) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1544) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1263) {code} MRAppMaster can rely on Job Configuration mapreduce.job.dir to avoid this issue. was: When JobClient (runs as yarn user) uses RM_DELEGATION_TOKEN (owner:oozie) to submit the job, client uses /mapreducestaging/yarn as staging directory whereas MRAppMaster uses /mapreducestaging/oozie. 
This leads to below failure {code} Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: wasb://oozie-2021-06-21t13-57-29-7...@ooziehdistorage.blob.core.windows.net/mapreducestaging/oozie/.staging/job_1624284676187_0003/job.splitmetainfo: No such file or directory. at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1611) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1473) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1431) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1010) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:141) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1544) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1263) {code} MRAppMaster can rely on Job Configuration mapreduce.job.dir to avoid this issue. 
> Fix MRAppMaster to getStagingAreaDir from Job Configuration > --- > > Key: MAPREDUCE-7355 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7355 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: job submission >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > > When JobClient (runs as yarn user) uses RM_DELEGATION_TOKEN (owner:oozie) to > submit the job, client uses /mapreducestaging/yarn as staging directory > whereas MRAppMaster uses /mapreducestaging/oozie. This leads to below failure > {code} > Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.io.FileNotFoundException: > hdfs://yarncluster/mapreducestaging/oozie/.staging/job_1624284676187_0003/job.splitmetainfo: > No such file or directory. > at > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1611) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1473) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1431) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.State
[jira] [Updated] (MAPREDUCE-7355) Fix MRAppMaster to getStagingAreaDir from Job Configuration
[ https://issues.apache.org/jira/browse/MAPREDUCE-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7355: - Description: When JobClient (runs as yarn user) uses RM_DELEGATION_TOKEN (owner:oozie) to submit the job, client uses /mapreducestaging/yarn as staging directory whereas MRAppMaster uses /mapreducestaging/oozie. This leads to below failure {code} Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: wasb://oozie-2021-06-21t13-57-29-7...@ooziehdistorage.blob.core.windows.net/mapreducestaging/oozie/.staging/job_1624284676187_0003/job.splitmetainfo: No such file or directory. at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1611) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1473) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1431) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1010) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:141) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1544) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1263) {code} MRAppMaster can rely on Job Configuration mapreduce.job.dir to avoid this issue. 
was: When JobClient (runs as yarn user) uses RM_DELEGATION_TOKEN (owner:oozie) to submit the job, client uses /mapreducestaging/yarn as staging directory whereas MRAppMaster uses /mapreducestaging/oozie. This leads to below failure {code} Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: wasb://oozie-2021-06-21t13-57-29-7...@ooziehdistorage.blob.core.windows.net/mapreducestaging/oozie/.staging/job_1624284676187_0003/job.splitmetainfo: No such file or directory. at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1611) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1473) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1431) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1010) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:141) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1544) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1263) {code} MRApps#getStagingAreaDir can rely on Job Configuration mapreduce.job.dir to avoid this issue. 
> Fix MRAppMaster to getStagingAreaDir from Job Configuration > --- > > Key: MAPREDUCE-7355 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7355 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: job submission >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > > When JobClient (runs as yarn user) uses RM_DELEGATION_TOKEN (owner:oozie) to > submit the job, client uses /mapreducestaging/yarn as staging directory > whereas MRAppMaster uses /mapreducestaging/oozie. This leads to below failure > {code} > Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.io.FileNotFoundException: > wasb://oozie-2021-06-21t13-57-29-7...@ooziehdistorage.blob.core.windows.net/mapreducestaging/oozie/.staging/job_1624284676187_0003/job.splitmetainfo: > No such file or directory. > at > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1611) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1473) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1431) > at > org.apache.hadoop.yarn.sta
[jira] [Updated] (MAPREDUCE-7355) Fix MRAppMaster to getStagingAreaDir from Job Configuration
[ https://issues.apache.org/jira/browse/MAPREDUCE-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7355: - Summary: Fix MRAppMaster to getStagingAreaDir from Job Configuration (was: Fix MRApps#getStagingAreaDir to fetch it from Job Configuration) > Fix MRAppMaster to getStagingAreaDir from Job Configuration > --- > > Key: MAPREDUCE-7355 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7355 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: job submission >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > > When JobClient (runs as yarn user) uses RM_DELEGATION_TOKEN (owner:oozie) to > submit the job, client uses /mapreducestaging/yarn as staging directory > whereas MRAppMaster uses /mapreducestaging/oozie. This leads to below failure > {code} > Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: > java.io.FileNotFoundException: > wasb://oozie-2021-06-21t13-57-29-7...@ooziehdistorage.blob.core.windows.net/mapreducestaging/oozie/.staging/job_1624284676187_0003/job.splitmetainfo: > No such file or directory. 
> at > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1611) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1473) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1431) > at > org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1010) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:141) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1544) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1263) > {code} > MRApps#getStagingAreaDir can rely on Job Configuration mapreduce.job.dir to > avoid this issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Created] (MAPREDUCE-7355) Fix MRApps#getStagingAreaDir to fetch it from Job Configuration
Prabhu Joseph created MAPREDUCE-7355: Summary: Fix MRApps#getStagingAreaDir to fetch it from Job Configuration Key: MAPREDUCE-7355 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7355 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission Affects Versions: 3.3.0 Reporter: Prabhu Joseph Assignee: Prabhu Joseph When JobClient (runs as yarn user) uses RM_DELEGATION_TOKEN (owner:oozie) to submit the job, client uses /mapreducestaging/yarn as staging directory whereas MRAppMaster uses /mapreducestaging/oozie. This leads to below failure {code} Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: wasb://oozie-2021-06-21t13-57-29-7...@ooziehdistorage.blob.core.windows.net/mapreducestaging/oozie/.staging/job_1624284676187_0003/job.splitmetainfo: No such file or directory. at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1611) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1473) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1431) at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1010) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:141) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1544) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1263) {code} MRApps#getStagingAreaDir can rely on Job Configuration 
mapreduce.job.dir to avoid this issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
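The proposed fix is for the MRAppMaster to trust the staging path recorded in the job configuration rather than re-derive it from its own user name (which can differ from the delegation-token owner, as in the yarn/oozie mismatch above). A minimal sketch of that preference order, using `java.util.Properties` as a hypothetical stand-in for Hadoop's `Configuration`; only the key `mapreduce.job.dir` comes from this report, and the resolver shape is an assumption, not the committed change.

```java
import java.util.Properties;

public class StagingDirResolver {

    /**
     * Prefer the job directory written into the configuration at submission
     * time (mapreduce.job.dir) over a path guessed from the current user,
     * which may not match the user the client actually submitted as.
     */
    static String stagingDir(Properties jobConf, String stagingRoot, String currentUser) {
        String fromConf = jobConf.getProperty("mapreduce.job.dir");
        if (fromConf != null && !fromConf.isEmpty()) {
            return fromConf;                    // authoritative: set by the submitter
        }
        return stagingRoot + "/" + currentUser; // fallback: user-derived guess
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Without the key, the AM guesses from its own user ("oozie" here),
        // while the client may have used "yarn" -- reproducing the mismatch.
        String guessed = stagingDir(conf, "/mapreducestaging", "oozie");
        if (!guessed.equals("/mapreducestaging/oozie")) throw new AssertionError(guessed);

        conf.setProperty("mapreduce.job.dir",
                "/mapreducestaging/yarn/.staging/job_0001"); // hypothetical value
        String resolved = stagingDir(conf, "/mapreducestaging", "oozie");
        if (!resolved.endsWith("job_0001")) throw new AssertionError(resolved);
        System.out.println(resolved);
    }
}
```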
[jira] [Commented] (MAPREDUCE-7351) CleanupJob during handle of SIGTERM signal
[ https://issues.apache.org/jira/browse/MAPREDUCE-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362473#comment-17362473 ] Prabhu Joseph commented on MAPREDUCE-7351: -- Yes right [~ste...@apache.org]. > CleanupJob during handle of SIGTERM signal > -- > > Key: MAPREDUCE-7351 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7351 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2 >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > > Currently MR CleanupJob happens when the job is either successful or fail. > But during kill, it is not handled. This leaves all the temporary folders > under the output path. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-7351) CleanupJob during handle of SIGTERM signal
[ https://issues.apache.org/jira/browse/MAPREDUCE-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7351: - Summary: CleanupJob during handle of SIGTERM signal (was: CleanupJob when handling SIGTERM signal) > CleanupJob during handle of SIGTERM signal > -- > > Key: MAPREDUCE-7351 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7351 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2 >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > > Currently MR CleanupJob happens when the job is either successful or fail. > But during kill, it is not handled. This leaves all the temporary folders > under the output path. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Created] (MAPREDUCE-7351) CleanupJob when handling SIGTERM signal
Prabhu Joseph created MAPREDUCE-7351: Summary: CleanupJob when handling SIGTERM signal Key: MAPREDUCE-7351 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7351 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 3.3.0 Reporter: Prabhu Joseph Assignee: Prabhu Joseph Currently MR CleanupJob happens when the job either succeeds or fails, but the kill case is not handled. This leaves all the temporary folders under the output path. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
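Per the comment earlier in the thread, the cleanup in question amounts to removing the `_temporary` directory under the output path when the AM is killed. A local-filesystem sketch using a JVM shutdown hook, which runs on SIGTERM as well as on normal exit; the real implementation would go through Hadoop's `FileSystem` and committer APIs, so treat this as illustrative only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class TempDirCleanup {

    /** Recursively delete the _temporary directory under the output path, if present. */
    static void cleanupJob(Path outputPath) throws IOException {
        Path tmp = outputPath.resolve("_temporary");
        if (!Files.exists(tmp)) {
            return; // nothing to clean
        }
        try (Stream<Path> walk = Files.walk(tmp)) {
            // Reverse order deletes children before their parent directories.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job-output");
        Files.createDirectories(out.resolve("_temporary").resolve("attempt_0"));
        Files.writeString(out.resolve("part-00000"), "data");

        // Registering cleanup as a shutdown hook makes it run when the process
        // receives SIGTERM, not only on success/failure paths.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try { cleanupJob(out); } catch (IOException ignored) { }
        }));

        cleanupJob(out); // invoked directly here to demonstrate the effect
        if (Files.exists(out.resolve("_temporary"))) throw new AssertionError();
        if (!Files.exists(out.resolve("part-00000"))) throw new AssertionError();
        System.out.println("cleaned");
    }
}
```

Note that only `_temporary` is removed; committed output files (`part-*`) under the output path are left untouched, matching the clarification in the comment above.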
[jira] [Commented] (MAPREDUCE-7250) FrameworkUploader: skip replication check entirely if timeout == 0
[ https://issues.apache.org/jira/browse/MAPREDUCE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987931#comment-16987931 ] Prabhu Joseph commented on MAPREDUCE-7250: -- Have committed it to trunk. Will resolve the Jira. > FrameworkUploader: skip replication check entirely if timeout == 0 > -- > > Key: MAPREDUCE-7250 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7250 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2 >Reporter: Peter Bacsko >Assignee: Peter Bacsko >Priority: Major > Attachments: MAPREDUCE-7250-001.patch > > > The framework uploader tool has this piece of code which makes sure that all > block of the uploaded mapreduce tarball has been replicated: > {noformat} > while(endTime - startTime < timeout * 1000 && >currentReplication < acceptableReplication) { > Thread.sleep(1000); > endTime = System.currentTimeMillis(); > currentReplication = getSmallestReplicatedBlockCount(); > } > {noformat} > There are cases, however, when we don't want to wait for this (eg. we want to > speed up Hadoop installation). > I suggest adding {{--skiprelicationcheck}} switch which disables this > replication test. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
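The wait loop quoted in the issue can be short-circuited before the first poll. A stdlib-only sketch of that skip: `Probe` is a hypothetical stand-in for the uploader's `getSmallestReplicatedBlockCount`, and the zero-timeout skip follows the issue title (the description instead proposes a `--skiprelicationcheck` switch), so this is one possible shape, not the committed patch.

```java
public class ReplicationWait {

    /** Hypothetical probe for the smallest replication count across the tarball's blocks. */
    interface Probe {
        int smallestReplicatedBlockCount();
    }

    /**
     * Poll until every block reaches acceptableReplication or the timeout
     * (seconds) elapses. A timeout of 0 skips the check entirely.
     * Returns the number of sleeps performed (for demonstration).
     */
    static int waitForReplication(Probe probe, int acceptableReplication,
                                  long timeoutSeconds, long pollMillis)
            throws InterruptedException {
        if (timeoutSeconds == 0) {
            return 0; // skip the replication check entirely
        }
        long deadline = System.currentTimeMillis() + timeoutSeconds * 1000;
        int polls = 0;
        while (System.currentTimeMillis() < deadline
                && probe.smallestReplicatedBlockCount() < acceptableReplication) {
            Thread.sleep(pollMillis);
            polls++;
        }
        return polls;
    }

    public static void main(String[] args) throws InterruptedException {
        // timeout == 0: never polls, regardless of replication state
        int skipped = waitForReplication(() -> 0, 3, 0, 10);
        // already replicated: the loop condition fails on the first check
        int satisfied = waitForReplication(() -> 3, 3, 5, 10);
        if (skipped != 0 || satisfied != 0) throw new AssertionError();
        System.out.println("ok");
    }
}
```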
[jira] [Updated] (MAPREDUCE-7250) FrameworkUploader: skip replication check entirely if timeout == 0
[ https://issues.apache.org/jira/browse/MAPREDUCE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7250: - Fix Version/s: 3.3.0
[jira] [Updated] (MAPREDUCE-7250) FrameworkUploader: skip replication check entirely if timeout == 0
[ https://issues.apache.org/jira/browse/MAPREDUCE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7250: - Resolution: Fixed Status: Resolved (was: Patch Available)
[jira] [Updated] (MAPREDUCE-7250) FrameworkUploader: skip replication check entirely if timeout == 0
[ https://issues.apache.org/jira/browse/MAPREDUCE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7250: - Labels: Reviewed (was: )
[jira] [Commented] (MAPREDUCE-7250) FrameworkUploader: skip replication check entirely if timeout == 0
[ https://issues.apache.org/jira/browse/MAPREDUCE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987917#comment-16987917 ] Prabhu Joseph commented on MAPREDUCE-7250: -- Thanks [~pbacsko] for the patch. +1, will commit it shortly. The fix does not change any existing behavior other than no longer logging the error message, so a test case will be skipped for this patch.
[jira] [Updated] (MAPREDUCE-7249) Invalid event TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP causes job failure
[ https://issues.apache.org/jira/browse/MAPREDUCE-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7249: - Resolution: Fixed Status: Resolved (was: Patch Available) > Invalid event TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP causes > job failure > > > Key: MAPREDUCE-7249 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7249 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: applicationmaster, mrv2 >Affects Versions: 3.1.0 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg >Priority: Critical > Labels: Reviewed > Fix For: 3.3.0, 3.1.4, 3.2.2 > > Attachments: MAPREDUCE-7249-001.patch, MAPREDUCE-7249-002.patch, > MAPREDUCE-7249-branch-3.2.001.patch > > > Same issue as in MAPREDUCE-7240 but this one has a different state in which > the Exception {{TA_TOO_MANY_FETCH_FAILURE}} event is received: > {code} > 2019-11-18 23:03:40,270 ERROR [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle > this event at current state for attempt_1568654141590_630203_m_003108_1 > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1183) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:148) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1388) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1380) > at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109) > {code} > The stack trace is from a CDH release, which is a highly patched 2.6 release.
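The usual fix pattern for this class of bug is to register an explicit "ignore" transition for the event in the state where it currently has no entry, so the dispatcher stays in its state instead of throwing. The enum and transition map below are a minimal illustration of that pattern, not Hadoop's actual `StateMachineFactory` API:

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

public class AttemptStateMachine {
    enum State { SUCCESS_CONTAINER_CLEANUP, SUCCEEDED }
    enum Event { TA_TOO_MANY_FETCH_FAILURE, TA_CONTAINER_CLEANED }

    private final Map<State, Map<Event, State>> transitions =
        new EnumMap<>(State.class);
    private State current;

    AttemptStateMachine(State initial) {
        current = initial;
        Map<Event, State> fromCleanup = new EnumMap<>(Event.class);
        // The fix: a self-transition (i.e. ignore the event) instead of a
        // missing table entry that would make handle() throw.
        fromCleanup.put(Event.TA_TOO_MANY_FETCH_FAILURE,
                        State.SUCCESS_CONTAINER_CLEANUP);
        fromCleanup.put(Event.TA_CONTAINER_CLEANED, State.SUCCEEDED);
        transitions.put(State.SUCCESS_CONTAINER_CLEANUP, fromCleanup);
    }

    State handle(Event event) {
        State next = transitions
            .getOrDefault(current, Collections.<Event, State>emptyMap())
            .get(event);
        if (next == null) {
            // Mirrors the "Invalid event: X at Y" failure in the log above.
            throw new IllegalStateException(
                "Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }
}
```

Without the `TA_TOO_MANY_FETCH_FAILURE` entry, a late fetch-failure report arriving during container cleanup would hit the `IllegalStateException` branch, which is the analogue of the job-failing `InvalidStateTransitonException` in the stack trace.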
[jira] [Commented] (MAPREDUCE-7249) Invalid event TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP causes job failure
[ https://issues.apache.org/jira/browse/MAPREDUCE-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984259#comment-16984259 ] Prabhu Joseph commented on MAPREDUCE-7249: -- Have committed to trunk, branch-3.2 and branch-3.1. Will resolve the Jira.
[jira] [Updated] (MAPREDUCE-7249) Invalid event TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP causes job failure
[ https://issues.apache.org/jira/browse/MAPREDUCE-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7249: - Fix Version/s: 3.2.2 3.1.4 3.3.0
[jira] [Updated] (MAPREDUCE-7249) Invalid event TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP causes job failure
[ https://issues.apache.org/jira/browse/MAPREDUCE-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7249: - Labels: Reviewed (was: )
[jira] [Commented] (MAPREDUCE-7249) Invalid event TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP causes job failure
[ https://issues.apache.org/jira/browse/MAPREDUCE-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984155#comment-16984155 ] Prabhu Joseph commented on MAPREDUCE-7249: -- Thanks [~wilfreds] for fixing the issue. Patch looks good, +1. Will commit it shortly.
[jira] [Updated] (MAPREDUCE-7240) Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER' cause job error
[ https://issues.apache.org/jira/browse/MAPREDUCE-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7240: - Resolution: Fixed Status: Resolved (was: Patch Available) > Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at > SUCCESS_FINISHING_CONTAINER' cause job error > > > Key: MAPREDUCE-7240 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7240 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.2 >Reporter: luhuachao >Assignee: luhuachao >Priority: Critical > Labels: applicationmaster, mrv2 > Fix For: 3.3.0 > > Attachments: MAPREDUCE-7240-001.patch, MAPREDUCE-7240-002.patch, > application_1566552310686_260041.log > > > *log in appmaster* > {noformat} > 2019-09-03 17:18:43,090 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_52_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_49_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_51_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_50_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_53_0 ... 
> raising fetch failure to map > 2019-09-03 17:18:43,092 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > attempt_1566552310686_260041_m_52_0 transitioned from state SUCCEEDED to > FAILED, event type is TA_TOO_MANY_FETCH_FAILURE and nodeId=yarn095:45454 > 2019-09-03 17:18:43,092 ERROR [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle > this event at current state for attempt_1566552310686_260041_m_49_0 > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1450) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:745) > 2019-09-03 17:18:43,093 ERROR [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle > this event at current state for attempt_1566552310686_260041_m_51_0 > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER > at > 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458) > at > org.apache.hadoop.ma
[jira] [Updated] (MAPREDUCE-7240) Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER' cause job error
[ https://issues.apache.org/jira/browse/MAPREDUCE-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7240: - Fix Version/s: 3.3.0
[jira] [Updated] (MAPREDUCE-7240) Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER' cause job error
[ https://issues.apache.org/jira/browse/MAPREDUCE-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7240: - Labels: Reviewed applicationmaster mrv2 (was: applicationmaster mrv2)
[jira] [Commented] (MAPREDUCE-7240) Exception 'Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER' causes job error
[ https://issues.apache.org/jira/browse/MAPREDUCE-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983298#comment-16983298 ] Prabhu Joseph commented on MAPREDUCE-7240: -- Patch [^MAPREDUCE-7240-002.patch] looks good, +1. Have committed to trunk. Thanks [~Huachao] and [~pbacsko] for the patch and [~wilfreds] for the review. > Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at > SUCCESS_FINISHING_CONTAINER' cause job error > > > Key: MAPREDUCE-7240 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7240 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.2 >Reporter: luhuachao >Assignee: luhuachao >Priority: Critical > Labels: applicationmaster, mrv2 > Attachments: MAPREDUCE-7240-001.patch, MAPREDUCE-7240-002.patch, > application_1566552310686_260041.log > > > *log in appmaster* > {noformat} > 2019-09-03 17:18:43,090 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_52_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_49_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_51_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_50_0 ... 
> raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_53_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,092 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > attempt_1566552310686_260041_m_52_0 transitioned from state SUCCEEDED to > FAILED, event type is TA_TOO_MANY_FETCH_FAILURE and nodeId=yarn095:45454 > 2019-09-03 17:18:43,092 ERROR [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle > this event at current state for attempt_1566552310686_260041_m_49_0 > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1450) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:745) > 2019-09-03 17:18:43,093 ERROR [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle > this 
event at current state for attempt_1566552310686_260041_m_51_0 > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146) > at > org.apache.hadoop.mapreduce
[jira] [Commented] (MAPREDUCE-7240) Exception 'Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER' causes job error
[ https://issues.apache.org/jira/browse/MAPREDUCE-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983191#comment-16983191 ] Prabhu Joseph commented on MAPREDUCE-7240: -- Thanks [~wilfreds] and [~pbacsko] for the clarification. Yes, the SUCCEEDED map attempt is also marked as FAILED on TA_TOO_MANY_FETCH_FAILURE. {code} // Transitions from SUCCEEDED .addTransition(TaskAttemptStateInternal.SUCCEEDED, //only possible for map attempts TaskAttemptStateInternal.FAILED, TaskAttemptEventType.TA_TOO_MANY_FETCH_FAILURE, new TooManyFetchFailureTransition()) {code} The patch looks good except below. Have fixed it in [^MAPREDUCE-7240-002.patch] . 1. @Test is missed in the testcase. > Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at > SUCCESS_FINISHING_CONTAINER' cause job error > > > Key: MAPREDUCE-7240 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7240 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.2 >Reporter: luhuachao >Assignee: luhuachao >Priority: Critical > Labels: kerberos > Attachments: MAPREDUCE-7240-001.patch, MAPREDUCE-7240-002.patch, > application_1566552310686_260041.log > > > *log in appmaster* > {noformat} > 2019-09-03 17:18:43,090 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_52_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_49_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_51_0 ... 
> raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_50_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_53_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,092 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > attempt_1566552310686_260041_m_52_0 transitioned from state SUCCEEDED to > FAILED, event type is TA_TOO_MANY_FETCH_FAILURE and nodeId=yarn095:45454 > 2019-09-03 17:18:43,092 ERROR [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle > this event at current state for attempt_1566552310686_260041_m_49_0 > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1450) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:745) > 2019-09-03 17:18:43,093 ERROR [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle > this event at current state for attempt_1566552310686_260041_m_51_0 > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) >
[jira] [Updated] (MAPREDUCE-7240) Exception 'Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER' causes job error
[ https://issues.apache.org/jira/browse/MAPREDUCE-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7240: - Attachment: MAPREDUCE-7240-002.patch > Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at > SUCCESS_FINISHING_CONTAINER' cause job error > > > Key: MAPREDUCE-7240 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7240 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.2 >Reporter: luhuachao >Assignee: luhuachao >Priority: Critical > Labels: kerberos > Attachments: MAPREDUCE-7240-001.patch, MAPREDUCE-7240-002.patch, > application_1566552310686_260041.log > > > *log in appmaster* > {noformat} > 2019-09-03 17:18:43,090 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_52_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_49_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_51_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_50_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_53_0 ... 
> raising fetch failure to map > 2019-09-03 17:18:43,092 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > attempt_1566552310686_260041_m_52_0 transitioned from state SUCCEEDED to > FAILED, event type is TA_TOO_MANY_FETCH_FAILURE and nodeId=yarn095:45454 > 2019-09-03 17:18:43,092 ERROR [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle > this event at current state for attempt_1566552310686_260041_m_49_0 > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1450) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:745) > 2019-09-03 17:18:43,093 ERROR [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle > this event at current state for attempt_1566552310686_260041_m_51_0 > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER > at > 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.jav
[jira] [Commented] (MAPREDUCE-7240) Exception 'Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER' causes job error
[ https://issues.apache.org/jira/browse/MAPREDUCE-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982254#comment-16982254 ] Prabhu Joseph commented on MAPREDUCE-7240: -- [~Huachao] Have a doubt, what happens if the successfully finishing container simply ignores this event. > Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at > SUCCESS_FINISHING_CONTAINER' cause job error > > > Key: MAPREDUCE-7240 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7240 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.8.2 >Reporter: luhuachao >Assignee: luhuachao >Priority: Critical > Labels: kerberos > Attachments: application_1566552310686_260041.log > > > *log in appmaster* > {noformat} > 2019-09-03 17:18:43,090 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_52_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_49_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_51_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_50_0 ... > raising fetch failure to map > 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1566552310686_260041_m_53_0 ... 
> raising fetch failure to map > 2019-09-03 17:18:43,092 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > attempt_1566552310686_260041_m_52_0 transitioned from state SUCCEEDED to > FAILED, event type is TA_TOO_MANY_FETCH_FAILURE and nodeId=yarn095:45454 > 2019-09-03 17:18:43,092 ERROR [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle > this event at current state for attempt_1566552310686_260041_m_49_0 > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1450) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:745) > 2019-09-03 17:18:43,093 ERROR [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle > this event at current state for attempt_1566552310686_260041_m_51_0 > org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: > TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER > at > 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458) > at > org.apache.hadoop.mapreduce
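The behaviour discussed in this thread can be illustrated with a small, self-contained sketch. This is not the real org.apache.hadoop.yarn.state.StateMachineFactory API, and the class name, enum values, and exact handling in the committed patch are simplified assumptions; it only shows why an event with no registered transition for the current state raises an "Invalid event" error, and how registering a transition for SUCCESS_FINISHING_CONTAINER (or an explicit ignore, as asked above) avoids it:

```java
import java.util.EnumMap;
import java.util.Map;

// Simplified transition-table sketch, loosely modelled on Hadoop's
// StateMachineFactory but NOT the real API. It reproduces the bug in this
// thread: TA_TOO_MANY_FETCH_FAILURE had a registered transition from
// SUCCEEDED but none from SUCCESS_FINISHING_CONTAINER, so the dispatcher
// raised "Invalid event ... at SUCCESS_FINISHING_CONTAINER".
public class TaskAttemptTransitionSketch {
    enum State { SUCCESS_FINISHING_CONTAINER, SUCCEEDED, FAILED }
    enum Event { TA_TOO_MANY_FETCH_FAILURE }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);

    void addTransition(State from, Event on, State to) {
        table.computeIfAbsent(from, k -> new EnumMap<>(Event.class)).put(on, to);
    }

    State doTransition(State current, Event event) {
        Map<Event, State> row = table.get(current);
        if (row == null || !row.containsKey(event)) {
            // The failure mode from the AM log quoted in this thread.
            throw new IllegalStateException(
                "Invalid event: " + event + " at " + current);
        }
        return row.get(event);
    }

    public static void main(String[] args) {
        TaskAttemptTransitionSketch sm = new TaskAttemptTransitionSketch();
        // Pre-patch table: only the SUCCEEDED row handles fetch failures.
        sm.addTransition(State.SUCCEEDED,
            Event.TA_TOO_MANY_FETCH_FAILURE, State.FAILED);
        try {
            sm.doTransition(State.SUCCESS_FINISHING_CONTAINER,
                Event.TA_TOO_MANY_FETCH_FAILURE);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // the error reported here
        }
        // The fix registers a transition for that state too, so the attempt
        // is failed cleanly instead of crashing the dispatcher. (Ignoring
        // the event, as asked above, would be a self-transition instead.)
        sm.addTransition(State.SUCCESS_FINISHING_CONTAINER,
            Event.TA_TOO_MANY_FETCH_FAILURE, State.FAILED);
        System.out.println(sm.doTransition(State.SUCCESS_FINISHING_CONTAINER,
            Event.TA_TOO_MANY_FETCH_FAILURE)); // FAILED
    }
}
```

The table-driven shape mirrors the `.addTransition(...)` snippet quoted earlier in the thread: whether a state "handles" an event is purely a question of whether a row/column pair was registered at build time.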
[jira] [Updated] (MAPREDUCE-7238) TestMRJobs.testJobClassloader fails intermittently
[ https://issues.apache.org/jira/browse/MAPREDUCE-7238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7238: - Attachment: stdout > TestMRJobs.testJobClassloader fails intermittent > > > Key: MAPREDUCE-7238 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7238 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: stdout > > > *TestMRJobs.testJobClassloader fails intermittent* observed in > {code} > ERROR] testJobClassloader(org.apache.hadoop.mapreduce.v2.TestMRJobs) Time > elapsed: 29.77 s <<< FAILURE! > java.lang.AssertionError: > Job status: Application application_1567255842834_0009 failed 2 times due to > AM Container for appattempt_1567255842834_0009_02 exited with exitCode: 1 > Failing this attempt.Diagnostics: [2019-08-31 12:54:14.542]Exception from > container-launch. > Container id: container_1567255842834_0009_02_01 > Exit code: 1 > [2019-08-31 12:54:14.546]Container exited with a non-zero exit code 1. Error > file: prelaunch.err. > Last 4096 bytes of prelaunch.err : > Last 4096 bytes of stderr : > [2019-08-31 12:54:14.547]Container exited with a non-zero exit code 1. Error > file: prelaunch.err. > Last 4096 bytes of prelaunch.err : > Last 4096 bytes of stderr : > For more detailed output, check the application tracking page: > http://6437fb7eb209:32931/cluster/app/application_1567255842834_0009 Then > click on links to logs of each attempt. > . Failing the application. 
> at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.mapreduce.v2.TestMRJobs.testJobClassloader(TestMRJobs.java:531) > at > org.apache.hadoop.mapreduce.v2.TestMRJobs.testJobClassloader(TestMRJobs.java:473) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-7238) TestMRJobs.testJobClassloader fails intermittently
[ https://issues.apache.org/jira/browse/MAPREDUCE-7238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7238: - Description: *TestMRJobs.testJobClassloader fails intermittent* observed in {code} ERROR] testJobClassloader(org.apache.hadoop.mapreduce.v2.TestMRJobs) Time elapsed: 29.77 s <<< FAILURE! java.lang.AssertionError: Job status: Application application_1567255842834_0009 failed 2 times due to AM Container for appattempt_1567255842834_0009_02 exited with exitCode: 1 Failing this attempt.Diagnostics: [2019-08-31 12:54:14.542]Exception from container-launch. Container id: container_1567255842834_0009_02_01 Exit code: 1 [2019-08-31 12:54:14.546]Container exited with a non-zero exit code 1. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : Last 4096 bytes of stderr : [2019-08-31 12:54:14.547]Container exited with a non-zero exit code 1. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : Last 4096 bytes of stderr : For more detailed output, check the application tracking page: http://6437fb7eb209:32931/cluster/app/application_1567255842834_0009 Then click on links to logs of each attempt. . Failing the application. 
at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.mapreduce.v2.TestMRJobs.testJobClassloader(TestMRJobs.java:531) at org.apache.hadoop.mapreduce.v2.TestMRJobs.testJobClassloader(TestMRJobs.java:473) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) {code} was: TestMRJobs.testThreadDumpOnTaskTimeout fails {code} [ERROR] testThreadDumpOnTaskTimeout(org.apache.hadoop.mapreduce.v2.TestMRJobs) Time elapsed: 43.282 s <<< FAILURE! 
java.lang.AssertionError: No thread dump at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.mapreduce.v2.TestMRJobs.testThreadDumpOnTaskTimeout(TestMRJobs.java:1222) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:745) {code} > TestMRJobs.testJobClassloader fails intermittent > > > Key: MAPREDUCE-7238 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7238 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: stdout > > > *TestMRJobs.testJobClassloader fails intermittent* observed in > {code} > ERROR] testJobClassloader(org.apache.hadoop.mapreduce.v2.TestMRJobs) Time > elapsed: 29.77 s <<< FAILURE! > java.lang.AssertionError: > Job status: Application application_1567255842834_0009 failed 2 times due to > AM Container for appattempt_1567255842834_0009_02 exited with exitCode: 1 > Failing this attempt.Diagnostics: [2019-
[jira] [Updated] (MAPREDUCE-7238) TestMRJobs.testJobClassloader fails intermittently
[ https://issues.apache.org/jira/browse/MAPREDUCE-7238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7238: - Summary: TestMRJobs.testJobClassloader fails intermittent (was: TestMRJobs.testThreadDumpOnTaskTimeout fails) > TestMRJobs.testJobClassloader fails intermittent > > > Key: MAPREDUCE-7238 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7238 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > > TestMRJobs.testThreadDumpOnTaskTimeout fails > {code} > [ERROR] > testThreadDumpOnTaskTimeout(org.apache.hadoop.mapreduce.v2.TestMRJobs) Time > elapsed: 43.282 s <<< FAILURE! > java.lang.AssertionError: No thread dump > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.assertTrue(Assert.java:41) > at > org.apache.hadoop.mapreduce.v2.TestMRJobs.testThreadDumpOnTaskTimeout(TestMRJobs.java:1222) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by 
Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Created] (MAPREDUCE-7238) TestMRJobs.testThreadDumpOnTaskTimeout fails
Prabhu Joseph created MAPREDUCE-7238: Summary: TestMRJobs.testThreadDumpOnTaskTimeout fails Key: MAPREDUCE-7238 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7238 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 3.3.0 Reporter: Prabhu Joseph Assignee: Prabhu Joseph TestMRJobs.testThreadDumpOnTaskTimeout fails {code} [ERROR] testThreadDumpOnTaskTimeout(org.apache.hadoop.mapreduce.v2.TestMRJobs) Time elapsed: 43.282 s <<< FAILURE! java.lang.AssertionError: No thread dump at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.mapreduce.v2.TestMRJobs.testThreadDumpOnTaskTimeout(TestMRJobs.java:1222) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-7230) TestHSWebApp.testLogsViewSingle fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908162#comment-16908162 ] Prabhu Joseph commented on MAPREDUCE-7230: -- Thanks [~snemeth]. > TestHSWebApp.testLogsViewSingle fails > - > > Key: MAPREDUCE-7230 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7230 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver, test >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Fix For: 3.3.0, 3.2.1, 3.1.3 > > Attachments: MAPREDUCE-7230-001.patch > > > TestHSWebApp.testLogsViewSingle fails. > {code} > [ERROR] > testLogsViewSingle(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp) > Time elapsed: 0.294 s <<< FAILURE! > Argument(s) are different! Wanted: > printWriter.write( > "Logs not available for container_10_0001_01_01. Aggregation may not > be complete, Check back later or try the nodemanager at localhost:1234" > ); > -> at > org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp.testLogsViewSingle(TestHSWebApp.java:234) > Actual invocations have different arguments: > printWriter.print( > " "http://www.w3.org/TR/html4/strict.dtd";>" > ); > -> at > org.apache.hadoop.yarn.webapp.view.TextView.echoWithoutEscapeHtml(TextView.java:62) > printWriter.write( > " "http://www.w3.org/TR/html4/strict.dtd";>" > ); > -> at java.io.PrintWriter.print(PrintWriter.java:617) > printWriter.write( > " "http://www.w3.org/TR/html4/strict.dtd";>", > 0, > 90 > ); > -> at java.io.PrintWriter.write(PrintWriter.java:473) > printWriter.println( > > ); > -> at > org.apache.hadoop.yarn.webapp.view.TextView.putWithoutEscapeHtml(TextView.java:81) > printWriter.print( > " ); > -> at > org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl.printStartTag(HamletImpl.java:273) > printWriter.write( > " ); > -> at java.io.PrintWriter.print(PrintWriter.java:603) > printWriter.write( > " 0, > 5 > ); > {code} -- This message was sent by Atlassian 
JIRA (v7.6.14#76016) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-7230) TestHSWebApp.testLogsViewSingle fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907452#comment-16907452 ] Prabhu Joseph commented on MAPREDUCE-7230: -- Yes [~snemeth], as the testcase fails in 3.2 and 3.1 as well. > TestHSWebApp.testLogsViewSingle fails > - > > Key: MAPREDUCE-7230 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7230 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver, test >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7230-001.patch > > > TestHSWebApp.testLogsViewSingle fails. > {code} > [ERROR] > testLogsViewSingle(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp) > Time elapsed: 0.294 s <<< FAILURE! > Argument(s) are different! Wanted: > printWriter.write( > "Logs not available for container_10_0001_01_01. Aggregation may not > be complete, Check back later or try the nodemanager at localhost:1234" > ); > -> at > org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp.testLogsViewSingle(TestHSWebApp.java:234) > Actual invocations have different arguments: > printWriter.print( > " "http://www.w3.org/TR/html4/strict.dtd";>" > ); > -> at > org.apache.hadoop.yarn.webapp.view.TextView.echoWithoutEscapeHtml(TextView.java:62) > printWriter.write( > " "http://www.w3.org/TR/html4/strict.dtd";>" > ); > -> at java.io.PrintWriter.print(PrintWriter.java:617) > printWriter.write( > " "http://www.w3.org/TR/html4/strict.dtd";>", > 0, > 90 > ); > -> at java.io.PrintWriter.write(PrintWriter.java:473) > printWriter.println( > > ); > -> at > org.apache.hadoop.yarn.webapp.view.TextView.putWithoutEscapeHtml(TextView.java:81) > printWriter.print( > " ); > -> at > org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl.printStartTag(HamletImpl.java:273) > printWriter.write( > " ); > -> at java.io.PrintWriter.print(PrintWriter.java:603) > printWriter.write( > " 0, > 5 > ); > {code} -- This message was sent by 
Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (MAPREDUCE-7230) TestHSWebApp.testLogsViewSingle fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906519#comment-16906519 ] Prabhu Joseph commented on MAPREDUCE-7230: -- [~snemeth] Can you review this Jira when you get time. This fixes failing test case TestHSWebApp.testLogsViewSingle caused by YARN-9451. Have missed it earlier as Jenkins Build for YARN-9451 patch did not trigger testcases of hadoop-mapreduce-client-hs. > TestHSWebApp.testLogsViewSingle fails > - > > Key: MAPREDUCE-7230 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7230 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver, test >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7230-001.patch > > > TestHSWebApp.testLogsViewSingle fails. > {code} > [ERROR] > testLogsViewSingle(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp) > Time elapsed: 0.294 s <<< FAILURE! > Argument(s) are different! Wanted: > printWriter.write( > "Logs not available for container_10_0001_01_01. 
Aggregation may not > be complete, Check back later or try the nodemanager at localhost:1234" > ); > -> at > org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp.testLogsViewSingle(TestHSWebApp.java:234) > Actual invocations have different arguments: > printWriter.print( > " "http://www.w3.org/TR/html4/strict.dtd";>" > ); > -> at > org.apache.hadoop.yarn.webapp.view.TextView.echoWithoutEscapeHtml(TextView.java:62) > printWriter.write( > " "http://www.w3.org/TR/html4/strict.dtd";>" > ); > -> at java.io.PrintWriter.print(PrintWriter.java:617) > printWriter.write( > " "http://www.w3.org/TR/html4/strict.dtd";>", > 0, > 90 > ); > -> at java.io.PrintWriter.write(PrintWriter.java:473) > printWriter.println( > > ); > -> at > org.apache.hadoop.yarn.webapp.view.TextView.putWithoutEscapeHtml(TextView.java:81) > printWriter.print( > " ); > -> at > org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl.printStartTag(HamletImpl.java:273) > printWriter.write( > " ); > -> at java.io.PrintWriter.print(PrintWriter.java:603) > printWriter.write( > " 0, > 5 > ); > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-7230) TestHSWebApp.testLogsViewSingle fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7230: - Status: Patch Available (was: Open) > TestHSWebApp.testLogsViewSingle fails > - > > Key: MAPREDUCE-7230 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7230 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver, test >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7230-001.patch > > > TestHSWebApp.testLogsViewSingle fails. > {code} > [ERROR] > testLogsViewSingle(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp) > Time elapsed: 0.294 s <<< FAILURE! > Argument(s) are different! Wanted: > printWriter.write( > "Logs not available for container_10_0001_01_01. Aggregation may not > be complete, Check back later or try the nodemanager at localhost:1234" > ); > -> at > org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp.testLogsViewSingle(TestHSWebApp.java:234) > Actual invocations have different arguments: > printWriter.print( > " "http://www.w3.org/TR/html4/strict.dtd";>" > ); > -> at > org.apache.hadoop.yarn.webapp.view.TextView.echoWithoutEscapeHtml(TextView.java:62) > printWriter.write( > " "http://www.w3.org/TR/html4/strict.dtd";>" > ); > -> at java.io.PrintWriter.print(PrintWriter.java:617) > printWriter.write( > " "http://www.w3.org/TR/html4/strict.dtd";>", > 0, > 90 > ); > -> at java.io.PrintWriter.write(PrintWriter.java:473) > printWriter.println( > > ); > -> at > org.apache.hadoop.yarn.webapp.view.TextView.putWithoutEscapeHtml(TextView.java:81) > printWriter.print( > " ); > -> at > org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl.printStartTag(HamletImpl.java:273) > printWriter.write( > " ); > -> at java.io.PrintWriter.print(PrintWriter.java:603) > printWriter.write( > " 0, > 5 > ); > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: 
[jira] [Updated] (MAPREDUCE-7230) TestHSWebApp.testLogsViewSingle fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7230: - Attachment: MAPREDUCE-7230-001.patch > TestHSWebApp.testLogsViewSingle fails > - > > Key: MAPREDUCE-7230 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7230 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver, test >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7230-001.patch > > > TestHSWebApp.testLogsViewSingle fails. > {code} > [ERROR] > testLogsViewSingle(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp) > Time elapsed: 0.294 s <<< FAILURE! > Argument(s) are different! Wanted: > printWriter.write( > "Logs not available for container_10_0001_01_01. Aggregation may not > be complete, Check back later or try the nodemanager at localhost:1234" > ); > -> at > org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp.testLogsViewSingle(TestHSWebApp.java:234) > Actual invocations have different arguments: > printWriter.print( > " "http://www.w3.org/TR/html4/strict.dtd";>" > ); > -> at > org.apache.hadoop.yarn.webapp.view.TextView.echoWithoutEscapeHtml(TextView.java:62) > printWriter.write( > " "http://www.w3.org/TR/html4/strict.dtd";>" > ); > -> at java.io.PrintWriter.print(PrintWriter.java:617) > printWriter.write( > " "http://www.w3.org/TR/html4/strict.dtd";>", > 0, > 90 > ); > -> at java.io.PrintWriter.write(PrintWriter.java:473) > printWriter.println( > > ); > -> at > org.apache.hadoop.yarn.webapp.view.TextView.putWithoutEscapeHtml(TextView.java:81) > printWriter.print( > " ); > -> at > org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl.printStartTag(HamletImpl.java:273) > printWriter.write( > " ); > -> at java.io.PrintWriter.print(PrintWriter.java:603) > printWriter.write( > " 0, > 5 > ); > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: 
[jira] [Created] (MAPREDUCE-7231) hadoop-mapreduce-client-jobclient fails with timeout
Prabhu Joseph created MAPREDUCE-7231: Summary: hadoop-mapreduce-client-jobclient fails with timeout Key: MAPREDUCE-7231 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7231 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, test Affects Versions: 3.3.0 Reporter: Prabhu Joseph Assignee: Prabhu Joseph Attachments: Maven_TestCase_Report.txt hadoop-mapreduce-client-jobclient fails with timeout {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) on project hadoop-mapreduce-client-jobclient: There was a timeout or other error in the fork -> [Help 1] {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
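The fork timeout reported above gives surefire no chance to emit per-test results. One way to make such failures easier to diagnose — a hypothetical pom.xml sketch, not the project's actual build configuration, and the 2700-second value is an assumption — is to cap each forked JVM explicitly so a hung test produces a thread dump instead of an opaque "timeout or other error in the fork":

```xml
<!-- Hypothetical surefire configuration sketch; the timeout value is an
     assumption, not taken from the hadoop-mapreduce-client-jobclient pom. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Kill a hung forked JVM after this many seconds and dump its threads -->
    <forkedProcessTimeoutInSeconds>2700</forkedProcessTimeoutInSeconds>
    <!-- Keep full stack traces in the report for post-mortem analysis -->
    <trimStackTrace>false</trimStackTrace>
  </configuration>
</plugin>
```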
[jira] [Created] (MAPREDUCE-7230) TestHSWebApp.testLogsViewSingle fails
Prabhu Joseph created MAPREDUCE-7230: Summary: TestHSWebApp.testLogsViewSingle fails Key: MAPREDUCE-7230 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7230 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver, test Affects Versions: 3.3.0 Reporter: Prabhu Joseph Assignee: Prabhu Joseph TestHSWebApp.testLogsViewSingle fails. {code} [ERROR] testLogsViewSingle(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp) Time elapsed: 0.294 s <<< FAILURE! Argument(s) are different! Wanted: printWriter.write( "Logs not available for container_10_0001_01_01. Aggregation may not be complete, Check back later or try the nodemanager at localhost:1234" ); -> at org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp.testLogsViewSingle(TestHSWebApp.java:234) Actual invocations have different arguments: printWriter.print( "http://www.w3.org/TR/html4/strict.dtd";>" ); -> at org.apache.hadoop.yarn.webapp.view.TextView.echoWithoutEscapeHtml(TextView.java:62) printWriter.write( "http://www.w3.org/TR/html4/strict.dtd";>" ); -> at java.io.PrintWriter.print(PrintWriter.java:617) printWriter.write( "http://www.w3.org/TR/html4/strict.dtd";>", 0, 90 ); -> at java.io.PrintWriter.write(PrintWriter.java:473) printWriter.println( ); -> at org.apache.hadoop.yarn.webapp.view.TextView.putWithoutEscapeHtml(TextView.java:81) printWriter.print( " at org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl.printStartTag(HamletImpl.java:273) printWriter.write( " at java.io.PrintWriter.print(PrintWriter.java:603) printWriter.write( "
[jira] [Updated] (MAPREDUCE-7216) Fix TeraSort Job failing on S3 DirectoryStagingCommitter
[ https://issues.apache.org/jira/browse/MAPREDUCE-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7216: - Status: Patch Available (was: Open) > Fix TeraSort Job failing on S3 DirectoryStagingCommitter > > > Key: MAPREDUCE-7216 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7216 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: examples >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Attachments: MAPREDUCE-7216-001.patch > > > TeraSort Job fails on S3 with below exception. Terasort creates OutputPath > and writes partition filename but DirectoryStagingCommitter expects output > path to not exist. > {code} > 9/06/07 14:13:34 INFO mapreduce.Job: Job job_1559891760159_0011 failed with > state FAILED due to: Job setup failed : > org.apache.hadoop.fs.PathExistsException: `s3a://bucket/OUTPUT': Setting job > as Task committer attempt_1559891760159_0011_m_00_0: Destination path > exists and committer conflict resolution mode is "fail" > at > org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.failDestinationExists(StagingCommitter.java:878) > at > org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter.setupJob(DirectoryStagingCommitter.java:71) > at > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255) > at > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > Creating partition filename in /tmp or some other directory fixes the issue. 
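The description's fix — creating the partition filename in /tmp or another directory outside the job output path — can be sketched as follows. This is a minimal, self-contained illustration of the idea using only the JDK; the class and method names are hypothetical, not TeraSort's actual code, which operates on Hadoop `Path` objects rather than `java.nio` paths:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class PartitionFileLocation {
    // Hypothetical helper mirroring the described fix: derive the
    // _partition.lst location from a scratch directory instead of the job
    // output path, so DirectoryStagingCommitter never finds a pre-existing
    // output directory and fails with PathExistsException.
    static Path partitionFile(Path scratchDir, String jobId) {
        return scratchDir.resolve(jobId + "_partition.lst");
    }

    public static void main(String[] args) throws IOException {
        // Scratch location stands in for /tmp on the cluster filesystem.
        Path scratch = Files.createTempDirectory("terasort-scratch");
        Path part = partitionFile(scratch, "job_1559891760159_0011");
        Files.createFile(part); // the sampled partition keys would be written here
        System.out.println(Files.exists(part)); // true
    }
}
```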
[jira] [Updated] (MAPREDUCE-7216) Fix TeraSort Job failing on S3 DirectoryStagingCommitter
[ https://issues.apache.org/jira/browse/MAPREDUCE-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7216: - Attachment: MAPREDUCE-7216-001.patch > Fix TeraSort Job failing on S3 DirectoryStagingCommitter > > > Key: MAPREDUCE-7216 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7216 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: examples >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > Attachments: MAPREDUCE-7216-001.patch > > > TeraSort Job fails on S3 with below exception. Terasort creates OutputPath > and writes partition filename but DirectoryStagingCommitter expects output > path to not exist. > {code} > 9/06/07 14:13:34 INFO mapreduce.Job: Job job_1559891760159_0011 failed with > state FAILED due to: Job setup failed : > org.apache.hadoop.fs.PathExistsException: `s3a://bucket/OUTPUT': Setting job > as Task committer attempt_1559891760159_0011_m_00_0: Destination path > exists and committer conflict resolution mode is "fail" > at > org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.failDestinationExists(StagingCommitter.java:878) > at > org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter.setupJob(DirectoryStagingCommitter.java:71) > at > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255) > at > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > Creating partition filename in /tmp or some other directory fixes the issue. 
[jira] [Updated] (MAPREDUCE-7216) Fix TeraSort Job failing on S3 DirectoryStagingCommitter
[ https://issues.apache.org/jira/browse/MAPREDUCE-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7216: - Component/s: examples > Fix TeraSort Job failing on S3 DirectoryStagingCommitter > > > Key: MAPREDUCE-7216 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7216 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: examples >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > > TeraSort Job fails on S3 with below exception. Terasort creates OutputPath > and writes partition filename but DirectoryStagingCommitter expects output > path to not exist. > {code} > 9/06/07 14:13:34 INFO mapreduce.Job: Job job_1559891760159_0011 failed with > state FAILED due to: Job setup failed : > org.apache.hadoop.fs.PathExistsException: `s3a://bucket/OUTPUT': Setting job > as Task committer attempt_1559891760159_0011_m_00_0: Destination path > exists and committer conflict resolution mode is "fail" > at > org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.failDestinationExists(StagingCommitter.java:878) > at > org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter.setupJob(DirectoryStagingCommitter.java:71) > at > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255) > at > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > Creating partition filename in /tmp or some other directory fixes the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-7216) Fix TeraSort Job failing on S3 DirectoryStagingCommitter
[ https://issues.apache.org/jira/browse/MAPREDUCE-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7216: - Summary: Fix TeraSort Job failing on S3 DirectoryStagingCommitter (was: TeraSort Job Fails on S3) > Fix TeraSort Job failing on S3 DirectoryStagingCommitter > > > Key: MAPREDUCE-7216 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7216 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > > TeraSort Job fails on S3 with below exception. Terasort creates OutputPath > and writes partition filename but DirectoryStagingCommitter expects output > path to not exist. > {code} > 9/06/07 14:13:34 INFO mapreduce.Job: Job job_1559891760159_0011 failed with > state FAILED due to: Job setup failed : > org.apache.hadoop.fs.PathExistsException: `s3a://bucket/OUTPUT': Setting job > as Task committer attempt_1559891760159_0011_m_00_0: Destination path > exists and committer conflict resolution mode is "fail" > at > org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.failDestinationExists(StagingCommitter.java:878) > at > org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter.setupJob(DirectoryStagingCommitter.java:71) > at > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255) > at > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > Creating partition filename in /tmp or some other directory fixes the issue. 
[jira] [Updated] (MAPREDUCE-7216) Fix TeraSort Job failing on S3 DirectoryStagingCommitter
[ https://issues.apache.org/jira/browse/MAPREDUCE-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7216: - Affects Version/s: 3.3.0 > Fix TeraSort Job failing on S3 DirectoryStagingCommitter > > > Key: MAPREDUCE-7216 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7216 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Minor > > TeraSort Job fails on S3 with below exception. Terasort creates OutputPath > and writes partition filename but DirectoryStagingCommitter expects output > path to not exist. > {code} > 9/06/07 14:13:34 INFO mapreduce.Job: Job job_1559891760159_0011 failed with > state FAILED due to: Job setup failed : > org.apache.hadoop.fs.PathExistsException: `s3a://bucket/OUTPUT': Setting job > as Task committer attempt_1559891760159_0011_m_00_0: Destination path > exists and committer conflict resolution mode is "fail" > at > org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.failDestinationExists(StagingCommitter.java:878) > at > org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter.setupJob(DirectoryStagingCommitter.java:71) > at > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255) > at > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} > Creating partition filename in /tmp or some other directory fixes the issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-7217) TestMRTimelineEventHandling.testMRTimelineEventHandling fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7217: - Description: *TestMRTimelineEventHandling.testMRTimelineEventHandling fails.* {code:java} ERROR] testMRTimelineEventHandling(org.apache.hadoop.mapred.TestMRTimelineEventHandling) Time elapsed: 46.337 s <<< FAILURE! org.junit.ComparisonFailure: expected:<[AM_STAR]TED> but was:<[JOB_SUBMIT]TED> at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.mapred.TestMRTimelineEventHandling.testMRTimelineEventHandling(TestMRTimelineEventHandling.java:147) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) {code} *TestJobHistoryEventHandler.testTimelineEventHandling* {code} [ERROR] testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler) Time elapsed: 5.858 s <<< FAILURE! java.lang.AssertionError: expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:834) at org.junit.Assert.assertEquals(Assert.java:645) at org.junit.Assert.assertEquals(Assert.java:631) at org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:597) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) {code} was: *TestMRTimelineEventHandling.testMRTimelineEventHandling fails.* {code:java} ERROR] testMRTimelineEventHandling(org.apache.hadoop.mapred.TestMRTimelineEventHandling) Time elapsed: 46.337 s <<< FAILURE! org.junit.ComparisonFailure: expected:<[AM_STAR]TED> but was:<[JOB_SUBMIT]TED> at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.as
[jira] [Commented] (MAPREDUCE-7217) TestMRTimelineEventHandling.testMRTimelineEventHandling fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859805#comment-16859805 ] Prabhu Joseph commented on MAPREDUCE-7217: -- Testcase fails as the putEntities to {{ApplicationHistoryServer}} failed. {code:java} 2019-06-10 12:14:27,283 ERROR [RM Timeline dispatcher] metrics.TimelineServiceV1Publisher (TimelineServiceV1Publisher.java:putEntity(385)) - Error when publishing entity [YARN_CONTAINER,container_1560149051337_0001_01_04] org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response from the timeline server. HTTP error code: 403 at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:139) at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:92) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:178) at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.putEntity(TimelineServiceV1Publisher.java:383) at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.access$100(TimelineServiceV1Publisher.java:52) at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher$TimelineV1EventHandler.handle(TimelineServiceV1Publisher.java:408) at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher$TimelineV1EventHandler.handle(TimelineServiceV1Publisher.java:404) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:200) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:131) at java.lang.Thread.run(Thread.java:745) {code}  {{TimelineWebService}} fails the request as the remote user is null. 
{code:java} 2019-06-10 12:14:27,282 ERROR [qtp564893839-417] webapp.TimelineWebServices (TimelineWebServices.java:postEntities(237)) - The owner of the posted timeline entities is not set{code}  This happens when Filters are set with {{RMAuthenticationFilter}} and {{TimelineAuthenticationFIlter}} which both conflicts. {code:java} 2019-06-10 12:14:12,685 INFO [Listener at hw12663/50946] timeline.TimelineServerUtils (TimelineServerUtils.java:setTimelineFilters(73)) - Filter initializers set for timeline service: org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilterInitializer,org.apache.hadoop.http.lib.StaticUserWebFilter,org.apache.hadoop.yarn.server.timeline.security.TimelineAuthenticationFilterInitializer{code}  {{MiniYarnCluster}} uses same config for starting both {{ResourceManager}} and {{ApplicationHistoryServer}}. RM started first with {{RMAuthenticationFilter}} and then {{ApplicationHistoryServer}} appends the config {{hadoop.http.filter.initializers}} with {{TimelineAuthenticationFIlter}}. Have ignored {{RMAuthenticationFilter}} in {{MiniYarnCluster}} while starting {{ApplicationHistoryServer}} which fixes the issue.  > TestMRTimelineEventHandling.testMRTimelineEventHandling fails > - > > Key: MAPREDUCE-7217 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7217 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7217-001.patch > > > *TestMRTimelineEventHandling.testMRTimelineEventHandling fails.* > {code:java} > ERROR] > testMRTimelineEventHandling(org.apache.hadoop.mapred.TestMRTimelineEventHandling) > Time elapsed: 46.337 s <<< FAILURE! 
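The fix described above — having MiniYarnCluster drop RMAuthenticationFilterInitializer from the shared `hadoop.http.filter.initializers` value before starting the ApplicationHistoryServer — amounts to filtering one class name out of a comma-separated list. A self-contained sketch of that string manipulation, with a hypothetical class and method name (the actual patch edits MiniYARNCluster directly):

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class FilterInitializerCleanup {
    // Hypothetical helper illustrating the described fix: remove the
    // RM-specific authentication filter initializer from the shared
    // hadoop.http.filter.initializers value so the ApplicationHistoryServer
    // does not inherit it and conflict with TimelineAuthenticationFilter.
    static String withoutRmAuthFilter(String initializers) {
        return Arrays.stream(initializers.split(","))
                .map(String::trim)
                .filter(c -> !c.equals(
                        "org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilterInitializer"))
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // The three initializers from the log message above.
        String shared =
                "org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilterInitializer,"
                + "org.apache.hadoop.http.lib.StaticUserWebFilter,"
                + "org.apache.hadoop.yarn.server.timeline.security.TimelineAuthenticationFilterInitializer";
        // Only the StaticUserWebFilter and TimelineAuthenticationFilterInitializer remain.
        System.out.println(withoutRmAuthFilter(shared));
    }
}
```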
> org.junit.ComparisonFailure: expected:<[AM_STAR]TED> but was:<[JOB_SUBMIT]TED> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.mapred.TestMRTimelineEventHandling.testMRTimelineEventHandling(TestMRTimelineEventHandling.java:147) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRun
[jira] [Updated] (MAPREDUCE-7217) TestMRTimelineEventHandling.testMRTimelineEventHandling fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7217: - Status: Patch Available (was: Open) > TestMRTimelineEventHandling.testMRTimelineEventHandling fails > - > > Key: MAPREDUCE-7217 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7217 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.3.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7217-001.patch > > > *TestMRTimelineEventHandling.testMRTimelineEventHandling fails.* > {code:java} > ERROR] > testMRTimelineEventHandling(org.apache.hadoop.mapred.TestMRTimelineEventHandling) > Time elapsed: 46.337 s <<< FAILURE! > org.junit.ComparisonFailure: expected:<[AM_STAR]TED> but was:<[JOB_SUBMIT]TED> > at org.junit.Assert.assertEquals(Assert.java:115) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.mapred.TestMRTimelineEventHandling.testMRTimelineEventHandling(TestMRTimelineEventHandling.java:147) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at 
org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-7217) TestMRTimelineEventHandling.testMRTimelineEventHandling fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7217: - Attachment: MAPREDUCE-7217-001.patch
[jira] [Created] (MAPREDUCE-7217) TestMRTimelineEventHandling.testMRTimelineEventHandling fails
Prabhu Joseph created MAPREDUCE-7217: Summary: TestMRTimelineEventHandling.testMRTimelineEventHandling fails Key: MAPREDUCE-7217 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7217 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.3.0 Reporter: Prabhu Joseph Assignee: Prabhu Joseph *TestMRTimelineEventHandling.testMRTimelineEventHandling fails.*
[jira] [Created] (MAPREDUCE-7216) TeraSort Job Fails on S3
Prabhu Joseph created MAPREDUCE-7216: Summary: TeraSort Job Fails on S3 Key: MAPREDUCE-7216 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7216 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Prabhu Joseph Assignee: Prabhu Joseph TeraSort Job fails on S3 with the exception below. TeraSort creates the output path and writes the partition file under it, but DirectoryStagingCommitter expects the output path to not exist. {code} 19/06/07 14:13:34 INFO mapreduce.Job: Job job_1559891760159_0011 failed with state FAILED due to: Job setup failed : org.apache.hadoop.fs.PathExistsException: `s3a://bucket/OUTPUT': Setting job as Task committer attempt_1559891760159_0011_m_00_0: Destination path exists and committer conflict resolution mode is "fail" at org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.failDestinationExists(StagingCommitter.java:878) at org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter.setupJob(DirectoryStagingCommitter.java:71) at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255) at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} Creating the partition file in /tmp or some other directory fixes the issue.
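For readers hitting the same PathExistsException: besides relocating the partition file (the fix this issue proposes), the S3A staging committers expose a conflict-resolution setting. The fragment below is an illustrative job-level override, not part of the patch on this issue; verify the property name and default against your Hadoop version's S3A committer documentation.

```xml
<!-- Illustrative override: tells the staging committer to replace
     existing output instead of failing when the destination path
     already exists. Accepted values are fail, append, and replace. -->
<property>
  <name>fs.s3a.committer.staging.conflict-mode</name>
  <value>replace</value>
</property>
```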
[jira] [Commented] (MAPREDUCE-7201) Make Job History File Permissions configurable
[ https://issues.apache.org/jira/browse/MAPREDUCE-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841928#comment-16841928 ] Prabhu Joseph commented on MAPREDUCE-7201: -- Thanks [~erwaman] for reviewing. [~eyang] Can you review this Jira when you get time? This makes the jhist file permission configurable. > Make Job History File Permissions configurable > -- > > Key: MAPREDUCE-7201 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7201 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobhistoryserver >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7201-001.patch > > > Job History File Permissions are hardcoded to 770. MAPREDUCE-7010 allows to > configure the intermediate user directory permission but still the jhist file > permission are not changed.
[jira] [Created] (MAPREDUCE-7203) TestRuntimeEstimators fails intermittent
Prabhu Joseph created MAPREDUCE-7203: Summary: TestRuntimeEstimators fails intermittent Key: MAPREDUCE-7203 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7203 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 3.3.0 Reporter: Prabhu Joseph Assignee: Prabhu Joseph TestRuntimeEstimators fails intermittent. {code} [ERROR] testExponentialEstimator(org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators) Time elapsed: 9.637 s <<< FAILURE! java.lang.AssertionError: We got the wrong number of successful speculations. expected:<3> but was:<1> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:834) at org.junit.Assert.assertEquals(Assert.java:645) at org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.coreTestEstimator(TestRuntimeEstimators.java:243) at org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator(TestRuntimeEstimators.java:257) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345) at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Assigned] (MAPREDUCE-6993) Provide additional aggregated task stats at the Map / Reduce level
[ https://issues.apache.org/jira/browse/MAPREDUCE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph reassigned MAPREDUCE-6993: Assignee: Prabhu Joseph > Provide additional aggregated task stats at the Map / Reduce level > -- > > Key: MAPREDUCE-6993 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6993 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > > MapReduce ApplicationMaster can log aggregated tasks stats for Map / Reduce > stage like below which will make debugging easier. Similar to what Tez > provides TEZ-930 > firstTaskStartTime, > firstTasksToStart > lastTaskFinishTime > lastTasksToFinish > minTaskDuration > maxTaskDuration > avgTaskDuration > numSuccessfulTasks > shortestDurationTasks > longestDurationTasks > numFailedTaskAttempts > numKilledTaskAttempts > numCompletedTasks > numSucceededTasks > numKilledTasks > numFailedTasks -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
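The aggregates proposed above are simple reductions over per-task durations. A rough sketch in plain Java of how the min/max/avg stats could be computed (class and method names are illustrative, not the MR ApplicationMaster's actual API):

```java
import java.util.Arrays;

public class TaskStatsSketch {
    // Smallest per-task duration (ms) seen in the stage, 0 if no tasks.
    static long minTaskDuration(long[] durations) {
        return Arrays.stream(durations).min().orElse(0);
    }

    // Largest per-task duration (ms) seen in the stage, 0 if no tasks.
    static long maxTaskDuration(long[] durations) {
        return Arrays.stream(durations).max().orElse(0);
    }

    // Mean per-task duration (ms), truncated to a long, 0 if no tasks.
    static long avgTaskDuration(long[] durations) {
        return durations.length == 0
            ? 0 : (long) Arrays.stream(durations).average().orElse(0);
    }

    public static void main(String[] args) {
        long[] mapDurations = {1200, 900, 3000};
        System.out.println(minTaskDuration(mapDurations)); // 900
        System.out.println(maxTaskDuration(mapDurations)); // 3000
        System.out.println(avgTaskDuration(mapDurations)); // 1700
    }
}
```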
[jira] [Commented] (MAPREDUCE-7201) Make Job History File Permissions configurable
[ https://issues.apache.org/jira/browse/MAPREDUCE-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836413#comment-16836413 ] Prabhu Joseph commented on MAPREDUCE-7201: -- Tested with mapreduce.jobhistory.intermediate-user-done-dir.permissions=775, and both the intermediate user done dir and the history file permissions are set correctly. {code}
[ambari-qa@yarn-ats-1 ~]$ hadoop fs -ls /mr-history/tmp/
Found 1 items
drwxrwxr-x - ambari-qa supergroup 0 2019-05-09 13:32 /mr-history/tmp/ambari-qa
[ambari-qa@yarn-ats-1 ~]$ hadoop fs -ls /mr-history/tmp/ambari-qa
Found 3 items
-rwxrwxr-x 3 ambari-qa supergroup 22926 2019-05-09 13:32 /mr-history/tmp/ambari-qa/job_1556909089920_0006-1557408730503-ambari%2Dqa-word+count-1557408748287-1-1-SUCCEEDED-default-1557408736193.jhist
-rwxrwxr-x 3 ambari-qa supergroup 444 2019-05-09 13:32 /mr-history/tmp/ambari-qa/job_1556909089920_0006.summary
-rwxrwxr-x 3 ambari-qa supergroup 220806 2019-05-09 13:32 /mr-history/tmp/ambari-qa/job_1556909089920_0006_conf.xml
{code} > Make Job History File Permissions configurable > -- > > Key: MAPREDUCE-7201 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7201 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobhistoryserver >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7201-001.patch > > > Job History File Permissions are hardcoded to 770. MAPREDUCE-7010 allows to > configure the intermediate user directory permission but still the jhist file > permission are not changed.
[jira] [Updated] (MAPREDUCE-7201) Make Job History File Permissions configurable
[ https://issues.apache.org/jira/browse/MAPREDUCE-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7201: - Attachment: MAPREDUCE-7201-001.patch > Make Job History File Permissions configurable > -- > > Key: MAPREDUCE-7201 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7201 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobhistoryserver >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7201-001.patch > > > Job History File Permissions are hardcoded to 770. MAPREDUCE-7010 allows to > configure the intermediate user directory permission but still the jhist file > permission are not changed. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-7201) Make Job History File Permissions configurable
[ https://issues.apache.org/jira/browse/MAPREDUCE-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7201: - Status: Patch Available (was: Open) > Make Job History File Permissions configurable > -- > > Key: MAPREDUCE-7201 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7201 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobhistoryserver >Affects Versions: 3.2.0 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7201-001.patch > > > Job History File Permissions are hardcoded to 770. MAPREDUCE-7010 allows to > configure the intermediate user directory permission but still the jhist file > permission are not changed. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Created] (MAPREDUCE-7201) Make Job History File Permissions configurable
Prabhu Joseph created MAPREDUCE-7201: Summary: Make Job History File Permissions configurable Key: MAPREDUCE-7201 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7201 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver Affects Versions: 3.2.0 Reporter: Prabhu Joseph Assignee: Prabhu Joseph Job History File Permissions are hardcoded to 770. MAPREDUCE-7010 allows configuring the intermediate user directory permission, but the jhist file permissions are still not changed.
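For context, a permission value such as 770 or 775 is a three-digit octal mode (owner/group/other). The sketch below shows how such a configured octal string maps to rwx bits; it is illustrative only, since Hadoop itself represents these values with FsPermission.

```java
public class OctalPermissionSketch {
    // Decode one octal digit (0-7) into an rwx triplet, the same way
    // a mode string like "770" or "775" is interpreted.
    static String triplet(int bits) {
        return (((bits & 4) != 0) ? "r" : "-")
             + (((bits & 2) != 0) ? "w" : "-")
             + (((bits & 1) != 0) ? "x" : "-");
    }

    // Render a three-digit octal mode string as ls-style permissions.
    static String render(String octal) {
        StringBuilder sb = new StringBuilder();
        for (char c : octal.toCharArray()) {
            sb.append(triplet(c - '0'));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("770")); // rwxrwx---
        System.out.println(render("775")); // rwxrwxr-x
    }
}
```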
[jira] [Assigned] (MAPREDUCE-7184) TestJobCounters byte counters omitting crc file bytes read
[ https://issues.apache.org/jira/browse/MAPREDUCE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph reassigned MAPREDUCE-7184: Assignee: (was: Prabhu Joseph) > TestJobCounters byte counters omitting crc file bytes read > -- > > Key: MAPREDUCE-7184 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7184 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7184-001.patch, MAPREDUCE-7184-002.patch, > MAPREDUCE-7184-003.patch > > > TestJobCounters test cases are failing in trunk while validating the input > files size with BYTES_READ by the job. The crc files are considered in > getFileSize whereas the job FileInputFormat ignores them. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-7184) TestJobCounters byte counters omitting crc file bytes read
[ https://issues.apache.org/jira/browse/MAPREDUCE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766091#comment-16766091 ] Prabhu Joseph commented on MAPREDUCE-7184: -- Thanks [~ste...@apache.org] for the details. I am OK either way. > TestJobCounters byte counters omitting crc file bytes read > -- > > Key: MAPREDUCE-7184 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7184 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7184-001.patch, MAPREDUCE-7184-002.patch, > MAPREDUCE-7184-003.patch > > > TestJobCounters test cases are failing in trunk while validating the input > files size with BYTES_READ by the job. The crc files are considered in > getFileSize whereas the job FileInputFormat ignores them. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-7184) TestJobCounters#getFileSize can ignore crc file
[ https://issues.apache.org/jira/browse/MAPREDUCE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765994#comment-16765994 ] Prabhu Joseph commented on MAPREDUCE-7184: -- [~tasanuma0829] FileInputFormat does not include hidden files (HiddenFileFilter) in the input list, so I suspect the crc files started getting created after some change, which is when the test case broke. Since FileInputFormat does not consider crc files, getFileSize can ignore them as well. > TestJobCounters#getFileSize can ignore crc file > --- > > Key: MAPREDUCE-7184 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7184 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7184-001.patch, MAPREDUCE-7184-002.patch, > MAPREDUCE-7184-003.patch > > > TestJobCounters test cases are failing in trunk while validating the input > files size with BYTES_READ by the job. The crc files are considered in > getFileSize whereas the job FileInputFormat ignores them.
[jira] [Updated] (MAPREDUCE-7184) TestJobCounters#getFileSize can ignore crc file
[ https://issues.apache.org/jira/browse/MAPREDUCE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7184: - Attachment: MAPREDUCE-7184-003.patch > TestJobCounters#getFileSize can ignore crc file > --- > > Key: MAPREDUCE-7184 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7184 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7184-001.patch, MAPREDUCE-7184-002.patch, > MAPREDUCE-7184-003.patch > > > TestJobCounters test cases are failing in trunk while validating the input > files size with BYTES_READ by the job. The crc files are considered in > getFileSize whereas the job FileInputFormat ignores them. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-7184) TestJobCounters#getFileSize can ignore crc file
[ https://issues.apache.org/jira/browse/MAPREDUCE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7184: - Attachment: MAPREDUCE-7184-002.patch > TestJobCounters#getFileSize can ignore crc file > --- > > Key: MAPREDUCE-7184 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7184 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7184-001.patch, MAPREDUCE-7184-002.patch > > > TestJobCounters test cases are failing in trunk while validating the input > files size with BYTES_READ by the job. The crc files are considered in > getFileSize whereas the job FileInputFormat ignores them. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-7184) TestJobCounters#getFileSize can ignore crc file
[ https://issues.apache.org/jira/browse/MAPREDUCE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16764670#comment-16764670 ] Prabhu Joseph commented on MAPREDUCE-7184: -- Hi [~sunilg], Can you review this jira which fixes TestJobCounters failing on trunk. > TestJobCounters#getFileSize can ignore crc file > --- > > Key: MAPREDUCE-7184 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7184 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7184-001.patch > > > TestJobCounters test cases are failing in trunk while validating the input > files size with BYTES_READ by the job. The crc files are considered in > getFileSize whereas the job FileInputFormat ignores them. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-7184) TestJobCounters#getFileSize can ignore crc file
[ https://issues.apache.org/jira/browse/MAPREDUCE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7184: - Status: Patch Available (was: Open) > TestJobCounters#getFileSize can ignore crc file > --- > > Key: MAPREDUCE-7184 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7184 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7184-001.patch > > > TestJobCounters test cases are failing in trunk while validating the input > files size with BYTES_READ by the job. The crc files are considered in > getFileSize whereas the job FileInputFormat ignores them. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-7184) TestJobCounters#getFileSize can ignore crc file
[ https://issues.apache.org/jira/browse/MAPREDUCE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7184: - Attachment: MAPREDUCE-7184-001.patch > TestJobCounters#getFileSize can ignore crc file > --- > > Key: MAPREDUCE-7184 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7184 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: MAPREDUCE-7184-001.patch > > > TestJobCounters test cases are failing in trunk while validating the input > files size with BYTES_READ by the job. The crc files are considered in > getFileSize whereas the job FileInputFormat ignores them. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Created] (MAPREDUCE-7184) TestJobCounters#getFileSize can ignore crc file
Prabhu Joseph created MAPREDUCE-7184: Summary: TestJobCounters#getFileSize can ignore crc file Key: MAPREDUCE-7184 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7184 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Prabhu Joseph Assignee: Prabhu Joseph TestJobCounters test cases are failing in trunk while validating the input files size with BYTES_READ by the job. The crc files are considered in getFileSize whereas the job FileInputFormat ignores them. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
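FileInputFormat skips hidden files (names starting with `_` or `.`), so `.crc` checksum side-files never count toward a job's BYTES_READ; a test helper computing the expected input size should apply the same rule. A minimal sketch of that logic, with class and method names that are illustrative rather than Hadoop's own:

```java
import java.io.File;

public class HiddenFileFilterSketch {
    // Mirrors FileInputFormat's hidden-file rule: skip names that
    // begin with '_' or '.' (which covers ".part-00000.crc").
    static boolean isVisible(String name) {
        return !name.startsWith("_") && !name.startsWith(".");
    }

    // Sum the sizes of only the visible files in a directory, so
    // checksum side-files do not inflate the expected BYTES_READ.
    static long visibleFileSize(File dir) {
        long total = 0;
        File[] files = dir.listFiles();
        if (files == null) {
            return 0;
        }
        for (File f : files) {
            if (f.isFile() && isVisible(f.getName())) {
                total += f.length();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(isVisible("part-00000"));      // true
        System.out.println(isVisible(".part-00000.crc")); // false
        System.out.println(isVisible("_SUCCESS"));        // false
    }
}
```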
[jira] [Commented] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
[ https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749762#comment-16749762 ] Prabhu Joseph commented on MAPREDUCE-7026: -- [~sunilg] Can you review this jira as well? Thanks. > Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler > -- > > Key: MAPREDUCE-7026 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Labels: supportability > Attachments: MAPREDUCE-7026.1.patch, MAPREDUCE-7026.2.patch, > MAPREDUCE-7026.3.patch > > > A job is failing because its reduce tasks failed to fetch map output, and the > NodeManager ShuffleHandler failed to serve the map outputs with an > IOException like the one below. ShuffleHandler sends the actual error message in > the response inside sendError(), but the Fetcher does not log this message.
> Logs from NodeManager ShuffleHandler: > {code} > 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler > (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating > headers : > java.io.IOException: Error Reading IndexFile > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089) > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) > at > 
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) > at > org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107) > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88) > at > org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: Owner 'hbase' for path > /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index > d
[jira] [Assigned] (MAPREDUCE-7145) Improve ShuffleHandler Logging
[ https://issues.apache.org/jira/browse/MAPREDUCE-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph reassigned MAPREDUCE-7145: Assignee: Prabhu Joseph > Improve ShuffleHandler Logging > -- > > Key: MAPREDUCE-7145 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7145 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > > ShuffleHandler logs SpillFile not found when there is a permission denied > issue which is misleading. > {code} > try { > spill = SecureIOUtils.openForRandomRead(spillfile, "r", user, null); > } catch (FileNotFoundException e) { > LOG.info(spillfile + " not found"); > return null; > } > {code} > SecureIOUtils.openForRandomRead should log "Permission denied" or "No such > file or directory" instead of generic "file not found" -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Created] (MAPREDUCE-7145) Improve ShuffleHandler Logging
Prabhu Joseph created MAPREDUCE-7145: Summary: Improve ShuffleHandler Logging Key: MAPREDUCE-7145 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7145 Project: Hadoop Map/Reduce Issue Type: Bug Components: nodemanager Affects Versions: 2.7.3 Reporter: Prabhu Joseph ShuffleHandler logs SpillFile not found when there is a permission denied issue which is misleading. {code} try { spill = SecureIOUtils.openForRandomRead(spillfile, "r", user, null); } catch (FileNotFoundException e) { LOG.info(spillfile + " not found"); return null; } {code} SecureIOUtils.openForRandomRead should log "Permission denied" or "No such file or directory" instead of generic "file not found" -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
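The fix suggested above can be sketched as follows. This is a minimal illustration, not the actual ShuffleHandler code: the helper name `describeOpenFailure` is hypothetical, and it relies on the fact that the JDK's FileNotFoundException detail message typically carries the OS-level reason, e.g. "/path (Permission denied)" vs. "/path (No such file or directory)".

```java
import java.io.FileNotFoundException;
import java.io.IOException;

public class SpillFileLogging {

    // Hypothetical helper: surface the exception's own message instead of a
    // generic "not found", so permission problems are not misreported.
    static String describeOpenFailure(IOException e, String spillfile) {
        String reason = e.getMessage();
        return "Failed to open " + spillfile + ": "
                + (reason != null ? reason : e.getClass().getSimpleName());
    }

    public static void main(String[] args) {
        // Simulated failures as the JDK usually reports them.
        IOException denied =
                new FileNotFoundException("/tmp/spill.out (Permission denied)");
        IOException missing =
                new FileNotFoundException("/tmp/spill.out (No such file or directory)");
        System.out.println(describeOpenFailure(denied, "/tmp/spill.out"));
        System.out.println(describeOpenFailure(missing, "/tmp/spill.out"));
    }
}
```

Logging the message this way distinguishes the two failure modes without changing SecureIOUtils itself.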
[jira] [Updated] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
[ https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7026: - Attachment: MAPREDUCE-7026.3.patch > Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler > -- > > Key: MAPREDUCE-7026 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Labels: supportability > Attachments: MAPREDUCE-7026.1.patch, MAPREDUCE-7026.2.patch, > MAPREDUCE-7026.3.patch > > > A job is failing with reduce tasks failed to fetch map output and the > NodeManager ShuffleHandler failed to serve the map outputs with some > IOException like below. ShuffleHandler sends the actual error message in > response inside sendError() but the Fetcher does not log this message. > Logs from NodeManager ShuffleHandler: > {code} > 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler > (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating > headers : > java.io.IOException: Error Reading IndexFile > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089) > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) > at > org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107) > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88) > at > org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: Owner 'hbase' for path > /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index > did not match expected owner 'bde' > at > org.apache.hadoop.io.S
[jira] [Created] (MAPREDUCE-7087) NNBench shows invalid Avg exec time and Avg Lat
Prabhu Joseph created MAPREDUCE-7087: Summary: NNBench shows invalid Avg exec time and Avg Lat Key: MAPREDUCE-7087 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7087 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 2.7.3 Reporter: Prabhu Joseph Assignee: Prabhu Joseph NNBench shows invalid Avg exec time and Avg Lat when there are zero successful file operations. It would be better not to show them at all than to print invalid numbers. {code} 18/04/25 09:57:33 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: Infinity 18/04/25 09:57:33 INFO hdfs.NNBench: Avg Lat (ms): Create/Write: NaN {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
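The Infinity and NaN values come from floating-point division by zero (x / 0.0 yields Infinity, 0.0 / 0.0 yields NaN). A minimal sketch of the guard the report suggests, assuming a hypothetical helper `avgOrSkip` rather than the actual NNBench code:

```java
import java.util.Locale;

public class NNBenchAvg {

    // Hypothetical helper: report an average only when there was at least one
    // successful operation; returning null signals the caller to omit the line.
    static String avgOrSkip(long totalTimeMs, long successfulOps) {
        if (successfulOps == 0) {
            return null; // avoids Infinity (x / 0.0) and NaN (0.0 / 0.0)
        }
        return String.format(Locale.ROOT, "%.2f", (double) totalTimeMs / successfulOps);
    }

    public static void main(String[] args) {
        String avg = avgOrSkip(1200, 4);
        // Only print the summary line when the value is meaningful.
        if (avg != null) {
            System.out.println("Avg exec time (ms): Create/Write/Close: " + avg);
        }
        if (avgOrSkip(1200, 0) == null) {
            System.out.println("Avg exec time (ms): <omitted, no successful ops>");
        }
    }
}
```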
[jira] [Comment Edited] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
[ https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450133#comment-16450133 ] Prabhu Joseph edited comment on MAPREDUCE-7026 at 4/24/18 4:08 PM: --- Thanks [~jlowe] for the review. Yes, you are right; the Fetcher error message in the description is different. The patch addresses the one below: {code} 2018-04-19 12:24:39,511 WARN [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Invalid map id java.lang.IllegalArgumentException: TaskAttemptId string : TTP/1.1 500 Internal Server Error Content-Type: text/plain; charset=UTF is not properly formed at org.apache.hadoop.mapreduce.TaskAttemptID.forName(TaskAttemptID.java:201) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:517) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:345) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:200) {code} I have added instrumentation code to throw FileNotFoundException. Below is the complete response the ShuffleHandler has sent: {code} TTP/1.1 500 Internal Server Error Content-Type: text/plain; charset=UTF name: mapreduce version: 1.0.0 /tmp/file (No such file or directory) {code} Below is the sample log output after the patch: {code} 2018-04-19 12:24:39,510 WARN [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Error message from Shuffle Handler: name: mapreduce version: 1.0.0 /tmp/file (No such file or directory) {code} was (Author: prabhu joseph): Thanks [~jlowe] for the review. I have added instrumentation code to throw FileNotFoundException. Below is the complete response the ShuffleHandler has sent: {code} TTP/1.1 500 Internal Server Error Content-Type: text/plain; charset=UTF name: mapreduce version: 1.0.0 /tmp/file (No such file or directory) {code} Below is the sample log output after the patch. 
{code} 2018-04-19 12:24:39,510 WARN [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Error message from Shuffle Handler: name: mapreduce version: 1.0.0 /tmp/file (No such file or directory) {code} > Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler > -- > > Key: MAPREDUCE-7026 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Labels: supportability > Attachments: MAPREDUCE-7026.1.patch, MAPREDUCE-7026.2.patch > > > A job is failing with reduce tasks failed to fetch map output and the > NodeManager ShuffleHandler failed to serve the map outputs with some > IOException like below. ShuffleHandler sends the actual error message in > response inside sendError() but the Fetcher does not log this message. > Logs from NodeManager ShuffleHandler: > {code} > 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler > (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating > headers : > java.io.IOException: Error Reading IndexFile > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089) > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived
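The idea behind the patch discussed above can be sketched as follows: skip the status line and headers of the error response and log only the body, which carries the text that sendError() wrote. This is an illustration of the approach, not the actual Fetcher code; the helper name `errorBody` is hypothetical, and it assumes the headers are separated from the body by a blank line.

```java
public class ShuffleErrorBody {

    // Hypothetical helper: return everything after the first blank line of a
    // raw HTTP-style error response, i.e. the ShuffleHandler's own message.
    static String errorBody(String rawResponse) {
        int blank = rawResponse.indexOf("\n\n");
        return blank >= 0 ? rawResponse.substring(blank + 2).trim()
                          : rawResponse.trim();
    }

    public static void main(String[] args) {
        // Simulated response, modeled on the one quoted in this thread.
        String resp = "TTP/1.1 500 Internal Server Error\n"
                + "Content-Type: text/plain; charset=UTF\n"
                + "\n"
                + "name: mapreduce version: 1.0.0\n"
                + "/tmp/file (No such file or directory)\n";
        System.out.println("Error message from Shuffle Handler: " + errorBody(resp));
    }
}
```

Logging the extracted body, instead of feeding the whole response to TaskAttemptID.forName(), avoids the misleading "Invalid map id" IllegalArgumentException.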
[jira] [Updated] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
[ https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7026: - Description: A job is failing with reduce tasks failed to fetch map output and the NodeManager ShuffleHandler failed to serve the map outputs with some IOException like below. ShuffleHandler sends the actual error message in response inside sendError() but the Fetcher does not log this message. Logs from NodeManager ShuffleHandler: {code} 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating headers : java.io.IOException: Error Reading IndexFile at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089) at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107) at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88) at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Owner 'hbase' for path /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index did not match expected owner 'bde' at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285) at 
org.apache.hadoop.io.SecureIOUtils.forceSecureOpenFSDataInputStream(SecureIOUtils.java:174) at org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:158) at org.apache.hadoop.mapred.SpillRecord.&lt;init&gt;(SpillRecord.java:70) at org.apache.hadoop.mapred.SpillRecord.&lt;init&gt;(SpillRecord.java:62) at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:119) {code} Fetcher Logs below without the actual error message: {code} 2018-04-19 12:24:39,521 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3 at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
[jira] [Commented] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
[ https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450133#comment-16450133 ] Prabhu Joseph commented on MAPREDUCE-7026: -- Thanks [~jlowe] for the review. I have added instrumentation code to throw FileNotFoundException. Below is the complete response the ShuffleHandler has sent: {code} TTP/1.1 500 Internal Server Error Content-Type: text/plain; charset=UTF name: mapreduce version: 1.0.0 /tmp/file (No such file or directory) {code} Below is the sample log output after the patch: {code} 2018-04-19 12:24:39,510 WARN [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Error message from Shuffle Handler: name: mapreduce version: 1.0.0 /tmp/file (No such file or directory) {code} > Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler > -- > > Key: MAPREDUCE-7026 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Labels: supportability > Attachments: MAPREDUCE-7026.1.patch, MAPREDUCE-7026.2.patch > > > A job is failing with reduce tasks failed to fetch map output and the > NodeManager ShuffleHandler failed to serve the map outputs with some > IOException like below. ShuffleHandler sends the actual error message in > response inside sendError() but the Fetcher does not log this message. 
> Logs from NodeManager ShuffleHandler: > {code} > 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler > (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating > headers : > java.io.IOException: Error Reading IndexFile > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089) > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) > at > 
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) > at > org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107) > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88) > at > org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > org.jboss.netty.util.internal.DeadLockProo
[jira] [Updated] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
[ https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7026: - Description: A job is failing with reduce tasks failed to fetch map output and the NodeManager ShuffleHandler failed to serve the map outputs with some IOException like below. ShuffleHandler sends the actual error message in response inside sendError() but the Fetcher does not log this message. Logs from NodeManager ShuffleHandler: {code} 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating headers : java.io.IOException: Error Reading IndexFile at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089) at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107) at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88) at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Owner 'hbase' for path /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index did not match expected owner 'bde' at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285) at 
org.apache.hadoop.io.SecureIOUtils.forceSecureOpenFSDataInputStream(SecureIOUtils.java:174) at org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:158) at org.apache.hadoop.mapred.SpillRecord.&lt;init&gt;(SpillRecord.java:70) at org.apache.hadoop.mapred.SpillRecord.&lt;init&gt;(SpillRecord.java:62) at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:119) {code} Fetcher Logs below without the actual error message: {code} 2018-04-19 12:24:39,511 WARN [fetcher#3] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Invalid map id java.lang.IllegalArgumentException: TaskAttemptId string : TTP/1.1 500 Internal Server Error Content-Type: text/plain; charset=UTF is not properly formed at org.
[jira] [Commented] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
[ https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449590#comment-16449590 ] Prabhu Joseph commented on MAPREDUCE-7026: -- [~sunilg] Can you review this when you get time. > Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler > -- > > Key: MAPREDUCE-7026 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Labels: supportability > Attachments: MAPREDUCE-7026.1.patch, MAPREDUCE-7026.2.patch > > > A job is failing with reduce tasks failed to fetch map output and the > NodeManager ShuffleHandler failed to serve the map outputs with some > IOException like below. ShuffleHandler sends the actual error message in > response inside sendError() but the Fetcher does not log this message. > Logs from NodeManager ShuffleHandler: > {code} > 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler > (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating > headers : > java.io.IOException: Error Reading IndexFile > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089) > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) > at > org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107) > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88) > at > org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: Owner 'hbase' for path > /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index > did not match expected owner 'bd
[jira] [Updated] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
[ https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7026: - Attachment: MAPREDUCE-7026.2.patch > Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler > -- > > Key: MAPREDUCE-7026 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Labels: supportability > Attachments: MAPREDUCE-7026.1.patch, MAPREDUCE-7026.2.patch > > > A job is failing with reduce tasks failed to fetch map output and the > NodeManager ShuffleHandler failed to serve the map outputs with some > IOException like below. ShuffleHandler sends the actual error message in > response inside sendError() but the Fetcher does not log this message. > Logs from NodeManager ShuffleHandler: > {code} > 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler > (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating > headers : > java.io.IOException: Error Reading IndexFile > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089) > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) > at > org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107) > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88) > at > org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: Owner 'hbase' for path > /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index > did not match expected owner 'bde' > at > org.apache.hadoop.io.SecureIOUtils.checkStat(Secu
[jira] [Updated] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
[ https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7026: - Status: Patch Available (was: Open) > Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler > -- > > Key: MAPREDUCE-7026 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Labels: supportability > Attachments: MAPREDUCE-7026.1.patch > > > A job is failing with reduce tasks failed to fetch map output and the > NodeManager ShuffleHandler failed to serve the map outputs with some > IOException like below. ShuffleHandler sends the actual error message in > response inside sendError() but the Fetcher does not log this message. > Logs from NodeManager ShuffleHandler: > {code} > 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler > (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating > headers : > java.io.IOException: Error Reading IndexFile > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089) > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > 
org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) > at > org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107) > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88) > at > org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: Owner 'hbase' for path > /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index > did not match expected owner 'bde' > at > org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285) >
[jira] [Updated] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
[ https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7026: - Attachment: MAPREDUCE-7026.1.patch > Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler > -- > > Key: MAPREDUCE-7026 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Labels: supportability > Attachments: MAPREDUCE-7026.1.patch > > > A job is failing with reduce tasks failed to fetch map output and the > NodeManager ShuffleHandler failed to serve the map outputs with some > IOException like below. ShuffleHandler sends the actual error message in > response inside sendError() but the Fetcher does not log this message. > Logs from NodeManager ShuffleHandler: > {code} > 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler > (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating > headers : > java.io.IOException: Error Reading IndexFile > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089) > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > 
org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) > at > org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107) > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88) > at > org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: Owner 'hbase' for path > /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index > did not match expected owner 'bde' > at > org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285) >
[jira] [Assigned] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
[ https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph reassigned MAPREDUCE-7026: Assignee: Prabhu Joseph > Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler > -- > > Key: MAPREDUCE-7026 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Labels: supportability > > A job is failing with reduce tasks failed to fetch map output and the > NodeManager ShuffleHandler failed to serve the map outputs with some > IOException like below. ShuffleHandler sends the actual error message in > response inside sendError() but the Fetcher does not log this message. > Logs from NodeManager ShuffleHandler: > {code} > 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler > (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating > headers : > java.io.IOException: Error Reading IndexFile > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089) > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > 
org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) > at > org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107) > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88) > at > org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: Owner 'hbase' for path > /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index > did not match expected owner 'bde' > at > org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285) > at > org.apache.hadoop.io.SecureIOUtils.force
[jira] [Updated] (MAPREDUCE-7071) Bypass the Fetcher and read directly from the local filesystem if source Mapper ran on the same host
[ https://issues.apache.org/jira/browse/MAPREDUCE-7071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7071: - Description: When the source mapper and the reducer run on the same host, bypass the Fetcher and read the map output directly from the local filesystem. The idea is from Tez - https://issues.apache.org/jira/browse/TEZ-1343 was:When the source mapper and the reducer run on the same host, bypass the Fetcher and read the map output directly from the local filesystem > Bypass the Fetcher and read directly from the local filesystem if source > Mapper ran on the same host > > > Key: MAPREDUCE-7071 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7071 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: task >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > > When the source mapper and the reducer run on the same host, bypass the > Fetcher and read the map output directly from the local filesystem. > The idea is from Tez - https://issues.apache.org/jira/browse/TEZ-1343 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Created] (MAPREDUCE-7071) Bypass the Fetcher and read directly from the local filesystem if source Mapper ran on the same host
Prabhu Joseph created MAPREDUCE-7071: Summary: Bypass the Fetcher and read directly from the local filesystem if source Mapper ran on the same host Key: MAPREDUCE-7071 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7071 Project: Hadoop Map/Reduce Issue Type: Task Components: task Affects Versions: 2.7.3 Reporter: Prabhu Joseph Assignee: Prabhu Joseph When the source mapper and the reducer run on the same host, bypass the Fetcher and read the map output directly from the local filesystem
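The optimization described above amounts to a host check before the shuffle: if the map output already lives on the reducer's own node, skip the HTTP round trip and open the spill file directly. A minimal sketch of that decision, with hypothetical names (`canReadLocally`, `fetchMapOutput`) that are not taken from the actual MapReduce source:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.nio.file.Files;
import java.nio.file.Path;

public class LocalFetchSketch {
    // Decide whether the map output can be read locally instead of going
    // through the HTTP Fetcher. Hypothetical helper, not the real API.
    static boolean canReadLocally(String mapHost, String reducerHost) {
        return mapHost != null && mapHost.equalsIgnoreCase(reducerHost);
    }

    static byte[] fetchMapOutput(String mapHost, Path localFile) throws IOException {
        String self = InetAddress.getLocalHost().getHostName();
        if (canReadLocally(mapHost, self) && Files.exists(localFile)) {
            // Same host: bypass the shuffle HTTP transfer entirely.
            return Files.readAllBytes(localFile);
        }
        // Different host: fall back to the normal remote fetch (omitted here).
        throw new IOException("remote fetch not implemented in this sketch");
    }
}
```

TEZ-1343 applies the same idea in Tez; the real implementation also has to handle secure-mode file ownership and the index file, which this sketch omits.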
[jira] [Created] (MAPREDUCE-7045) JobListCache grows unlimited when the jobs are failed to move to done directory
Prabhu Joseph created MAPREDUCE-7045: Summary: JobListCache grows unlimited when the jobs are failed to move to done directory Key: MAPREDUCE-7045 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7045 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 2.7.3 Reporter: Prabhu Joseph When jobs fail to move to the done directory for some reason, such as a permission issue, the JobListCache grows without limit, holding every failed job, and addIfAbsent() has to scan all the cached items.
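The unbounded-growth pattern above can be avoided with a size-bounded cache that evicts its eldest entry once a cap is reached, and an addIfAbsent() that is a hash lookup rather than a full scan. A sketch under assumed names (the class name and cap are illustrative, not the JobHistoryServer defaults):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Size-bounded cache: once the cap is hit, the eldest entry is evicted,
// so repeatedly failing jobs cannot grow the cache without limit.
public class BoundedJobListCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedJobListCache(int maxEntries) {
        super(16, 0.75f, true); // access-order gives LRU eviction
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    // Mirrors the addIfAbsent() semantics mentioned in the report, but as
    // an O(1) hash insert instead of a scan over every cached item.
    public synchronized V addIfAbsent(K key, V value) {
        return putIfAbsent(key, value);
    }
}
```

With a cap of 2, inserting a third job evicts the least-recently-used one instead of growing the cache.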
[jira] [Updated] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
[ https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7026: - Description: A job is failing with reduce tasks failed to fetch map output and the NodeManager ShuffleHandler failed to serve the map outputs with some IOException like below. ShuffleHandler sends the actual error message in response inside sendError() but the Fetcher does not log this message. Logs from NodeManager ShuffleHandler: {code} 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating headers : java.io.IOException: Error Reading IndexFile at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089) at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107) at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88) at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Owner 'hbase' for path /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index did not match expected owner 'bde' at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285) at 
org.apache.hadoop.io.SecureIOUtils.forceSecureOpenFSDataInputStream(SecureIOUtils.java:174) at org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:158) at org.apache.hadoop.mapred.SpillRecord.&lt;init&gt;(SpillRecord.java:70) at org.apache.hadoop.mapred.SpillRecord.&lt;init&gt;(SpillRecord.java:62) at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:119) {code} Fetcher Logs below without the actual error message: {code} 2017-12-18 10:10:17,688 INFO [IPC Server handler 1 on 35118] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1511248592679_0039_r_00_0: Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
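The fix described in this issue amounts to reading the body that the ShuffleHandler's sendError() puts in the HTTP response and surfacing it in the reducer's diagnostics, instead of logging only a generic ShuffleError. A rough sketch of that pattern over java.net.HttpURLConnection; the real Fetcher code differs, and the helper names here are illustrative:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class ShuffleErrorLogging {
    // Build a diagnostic line that includes the server-sent message, so the
    // reducer log shows *why* the ShuffleHandler refused the fetch.
    static String describeFailure(int status, String serverMessage) {
        String detail = (serverMessage == null || serverMessage.isEmpty())
                ? "(no error body from ShuffleHandler)"
                : serverMessage.trim();
        return "Shuffle fetch failed, HTTP " + status + ": " + detail;
    }

    // On a 4xx/5xx response the body is on the error stream, not the
    // input stream; drain it so the message is not silently dropped.
    static String readErrorBody(HttpURLConnection conn) throws IOException {
        InputStream err = conn.getErrorStream();
        if (err == null) {
            return "";
        }
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(err, StandardCharsets.UTF_8))) {
            return r.lines().collect(Collectors.joining("\n"));
        }
    }
}
```

With this in place, the reducer-side log would carry the "Owner 'hbase' ... did not match expected owner" detail that today appears only in the NodeManager log.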
[jira] [Updated] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
[ https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated MAPREDUCE-7026: - Labels: supportability (was: ) > Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler > -- > > Key: MAPREDUCE-7026 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: task >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph > Labels: supportability > > A job is failing with reduce tasks failed to fetch map output and the > NodeManager ShuffleHandler failed to serve the map outputs with some > IOException like below. ShuffleHandler sends the actual error message in > response inside sendError() but the Fetcher does not log this message. > Logs from NodeManager ShuffleHandler: > {code} > 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler > (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating > headers : > java.io.IOException: Error Reading IndexFile > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089) > at > org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > 
org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) > at > org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) > at > org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107) > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88) > at > org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: Owner 'hbase' for path > /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index > did not match expected owner 'bde' > at > org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285) > at > org.apache.hadoop.io.SecureIOUtils.forceSecureOpenFSDataInputStream(SecureIOUtils.java:174) > at
[jira] [Created] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
Prabhu Joseph created MAPREDUCE-7026: Summary: Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler Key: MAPREDUCE-7026 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Affects Versions: 2.7.3 Reporter: Prabhu Joseph A job is failing with reduce tasks failed to fetch map output and the NodeManager ShuffleHandler failed to serve the map outputs with some IOException like below. ShuffleHandler sends the actual error message in response inside sendError() but the Fetcher does not log this message. Logs from NodeManager ShuffleHandler: {code} 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating headers : java.io.IOException: Error Reading IndexFile at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089) at org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) at org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) at org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536) at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107) at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312) at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88) at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Owner 'hbase' for path /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index did not match expected owner 'bde' at 
org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285) at org.apache.hadoop.io.SecureIOUtils.forceSecureOpenFSDataInputStream(SecureIOUtils.java:174) at org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:158) at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:70) at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:62) at org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:119) {code} The Fetcher instead logs the following, without the actual error message: {code} 2017-12-18 10:10:17,688 INFO [IPC Server h
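The improvement asked for above — surfacing the server-side message in the Fetcher's own log — can be sketched as follows. This is a minimal, self-contained illustration, not the actual Hadoop patch: the helper name and log format are hypothetical, and in a real Fetcher the stream would come from the shuffle connection (e.g. HttpURLConnection.getErrorStream()) rather than an in-memory buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class ShuffleErrorLogging {
    // Hypothetical helper: drain the HTTP error body that the server's
    // sendError() produced, so the fetching side can log it verbatim.
    static String readErrorBody(InputStream errorStream) {
        try (Scanner s = new Scanner(errorStream, "UTF-8").useDelimiter("\\A")) {
            return s.hasNext() ? s.next().trim() : "";
        }
    }

    public static void main(String[] args) {
        // Simulated error body, standing in for what ShuffleHandler sends.
        String body = "Owner 'hbase' for path file.out.index did not match expected owner 'bde'";
        InputStream in = new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
        System.out.println("Shuffle failed; server reported: " + readErrorBody(in));
    }
}
```

With this in place, the Fetcher's log line would carry the same ownership error the ShuffleHandler already reported, instead of a bare fetch failure.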
[jira] [Commented] (MAPREDUCE-6975) Logging task counters
[ https://issues.apache.org/jira/browse/MAPREDUCE-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240222#comment-16240222 ] Prabhu Joseph commented on MAPREDUCE-6975: -- Thanks a lot [~Naganarasimha] > Logging task counters > -- > > Key: MAPREDUCE-6975 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6975 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: task >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > Fix For: 2.9.0, 2.7.5, 3.0.0, 2.8.3 > > Attachments: Log_Output, MAPREDUCE-6975.1.patch, > MAPREDUCE-6975.2.patch, MAPREDUCE-6975.patch > > > Logging counters for each task at the end of it's syslog will make debug > easier with just application logs. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6975) Logging task counters
[ https://issues.apache.org/jira/browse/MAPREDUCE-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16237523#comment-16237523 ] Prabhu Joseph commented on MAPREDUCE-6975: -- Hi [~Naganarasimha], please review this when you get some time. > Logging task counters > -- > > Key: MAPREDUCE-6975 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6975 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: task >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > Attachments: Log_Output, MAPREDUCE-6975.1.patch, > MAPREDUCE-6975.2.patch, MAPREDUCE-6975.patch > > > Logging counters for each task at the end of it's syslog will make debug > easier with just application logs. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Created] (MAPREDUCE-6993) Provide additional aggregated task stats at the Map / Reduce level
Prabhu Joseph created MAPREDUCE-6993: Summary: Provide additional aggregated task stats at the Map / Reduce level Key: MAPREDUCE-6993 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6993 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.7.3 Reporter: Prabhu Joseph The MapReduce ApplicationMaster can log aggregated task stats for the Map / Reduce stage like the list below, which will make debugging easier, similar to what Tez provides (TEZ-930):
firstTaskStartTime, firstTasksToStart
lastTaskFinishTime, lastTasksToFinish
minTaskDuration
maxTaskDuration
avgTaskDuration
numSuccessfulTasks
shortestDurationTasks
longestDurationTasks
numFailedTaskAttempts
numKilledTaskAttempts
numCompletedTasks
numSucceededTasks
numKilledTasks
numFailedTasks
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
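Most of the proposed aggregates can be derived in one pass from per-task start/finish timestamps. The sketch below illustrates that under assumed inputs; the Task record and field names are hypothetical, not MR ApplicationMaster internals.

```java
import java.util.Arrays;
import java.util.List;
import java.util.LongSummaryStatistics;

public class TaskStats {
    // Hypothetical per-task record: start/finish wall-clock times in ms.
    record Task(long startMs, long finishMs) {
        long duration() { return finishMs - startMs; }
    }

    // Aggregate the per-task durations (min/max/avg/count in one pass).
    static LongSummaryStatistics durationStats(List<Task> tasks) {
        return tasks.stream().mapToLong(Task::duration).summaryStatistics();
    }

    public static void main(String[] args) {
        List<Task> tasks = Arrays.asList(
                new Task(1000, 4000), new Task(1500, 2500), new Task(1200, 6200));

        long firstStart = tasks.stream().mapToLong(Task::startMs).min().getAsLong();
        long lastFinish = tasks.stream().mapToLong(Task::finishMs).max().getAsLong();
        LongSummaryStatistics dur = durationStats(tasks);

        System.out.println("firstTaskStartTime=" + firstStart);            // 1000
        System.out.println("lastTaskFinishTime=" + lastFinish);            // 6200
        System.out.println("minTaskDuration=" + dur.getMin());             // 1000
        System.out.println("maxTaskDuration=" + dur.getMax());             // 5000
        System.out.println("avgTaskDuration=" + (long) dur.getAverage());  // 3000
    }
}
```

Counts such as numFailedTasks or numKilledTaskAttempts would come from task/attempt state rather than timestamps, so they are omitted from this sketch.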
[jira] [Commented] (MAPREDUCE-6975) Logging task counters
[ https://issues.apache.org/jira/browse/MAPREDUCE-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16218324#comment-16218324 ] Prabhu Joseph commented on MAPREDUCE-6975: -- Thanks [~Naganarasimha] > Logging task counters > -- > > Key: MAPREDUCE-6975 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6975 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: task >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > Attachments: Log_Output, MAPREDUCE-6975.1.patch, > MAPREDUCE-6975.2.patch, MAPREDUCE-6975.patch > > > Logging counters for each task at the end of it's syslog will make debug > easier with just application logs. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6975) Logging task counters
[ https://issues.apache.org/jira/browse/MAPREDUCE-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16218306#comment-16218306 ] Prabhu Joseph commented on MAPREDUCE-6975: -- [~Naganarasimha] [~rohithsharma] Can you help review this? > Logging task counters > -- > > Key: MAPREDUCE-6975 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6975 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: task >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > Attachments: Log_Output, MAPREDUCE-6975.1.patch, > MAPREDUCE-6975.2.patch, MAPREDUCE-6975.patch > > > Logging counters for each task at the end of it's syslog will make debug > easier with just application logs. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6975) Logging task counters
[ https://issues.apache.org/jira/browse/MAPREDUCE-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206187#comment-16206187 ] Prabhu Joseph commented on MAPREDUCE-6975: -- [~Naganarasimha] Did some testing with failure cases. When the AM fails, diagnostics from stderr are displayed on the client and RM UI side, and these won't have any logging of task counters. When a task fails, there are no diagnostics captured. As per the testing, logging of task counters did not cause any confusion. > Logging task counters > -- > > Key: MAPREDUCE-6975 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6975 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: task >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > Attachments: Log_Output, MAPREDUCE-6975.1.patch, > MAPREDUCE-6975.2.patch, MAPREDUCE-6975.patch > > > Logging counters for each task at the end of it's syslog will make debug > easier with just application logs. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6975) Logging task counters
[ https://issues.apache.org/jira/browse/MAPREDUCE-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201783#comment-16201783 ] Prabhu Joseph commented on MAPREDUCE-6975: -- [~Naganarasimha] Looks like we capture the diagnostic message from the ApplicationMaster and not from task containers; the counters we log are only for task containers and will be in the syslog. Correct me if I am missing something. > Logging task counters > -- > > Key: MAPREDUCE-6975 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6975 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: task >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > Attachments: Log_Output, MAPREDUCE-6975.1.patch, > MAPREDUCE-6975.2.patch, MAPREDUCE-6975.patch > > > Logging counters for each task at the end of it's syslog will make debug > easier with just application logs. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6975) Logging task counters
[ https://issues.apache.org/jira/browse/MAPREDUCE-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201723#comment-16201723 ] Prabhu Joseph commented on MAPREDUCE-6975: -- [~Naganarasimha] Yes, I have used Counters.toString(), which outputs each counter on a separate line. > Logging task counters > -- > > Key: MAPREDUCE-6975 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6975 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: task >Affects Versions: 2.7.3 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph > Attachments: Log_Output, MAPREDUCE-6975.1.patch, > MAPREDUCE-6975.2.patch, MAPREDUCE-6975.patch > > > Logging counters for each task at the end of it's syslog will make debug > easier with just application logs. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
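The one-counter-per-line output discussed in this thread can be sketched as follows. This is a self-contained stand-in using a plain map, not Hadoop's actual Counters.toString() implementation; the counter names and exact layout are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CounterLogging {
    // Stand-in for a one-counter-per-line layout: a header with the
    // counter count, then each counter indented on its own line.
    static String format(Map<String, Long> counters) {
        StringBuilder sb = new StringBuilder("Counters: ").append(counters.size());
        for (Map.Entry<String, Long> e : counters.entrySet()) {
            sb.append(System.lineSeparator())
              .append('\t').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Long> counters = new LinkedHashMap<>();
        counters.put("MAP_INPUT_RECORDS", 1024L);
        counters.put("SPILLED_RECORDS", 0L);
        // Emitting this at the end of a task's syslog keeps per-counter
        // values greppable from the application logs alone.
        System.out.println(format(counters));
    }
}
```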