[jira] [Commented] (MAPREDUCE-7351) CleanupJob during handle of SIGTERM signal

2022-09-17 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606137#comment-17606137
 ] 

Prabhu Joseph commented on MAPREDUCE-7351:
--

This patch removes the _temporary directory under the output path, not the 
output path itself. And whether the job succeeds or fails, the _temporary 
directory under the output path is removed even without this patch.
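
For illustration, a minimal sketch of the cleanup idea, assuming a 
FileOutputCommitter-style committer: a JVM shutdown hook runs the same abortJob 
cleanup (which removes the _temporary directory) that a failed job would run, 
so a SIGTERM does not leave it behind. The class and helper names are 
illustrative, not the committed patch.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobStatus;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;

public class CleanupOnSigterm {
  // Illustrative helper: wire the committer's abortJob into a shutdown hook.
  public static void registerCleanupHook(Configuration conf, Path outputPath)
      throws Exception {
    TaskAttemptContext context =
        new TaskAttemptContextImpl(conf, new TaskAttemptID());
    // FileOutputCommitter#abortJob deletes the _temporary dir under outputPath.
    OutputCommitter committer = new FileOutputCommitter(outputPath, context);
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
      try {
        committer.abortJob(context, JobStatus.State.KILLED);
      } catch (Exception e) {
        // Best-effort cleanup while the JVM is shutting down.
      }
    }));
  }
}
{code}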

> CleanupJob during handle of SIGTERM signal
> --
>
> Key: MAPREDUCE-7351
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7351
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Shubham Gupta
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: MAPREDUCE-7351-001.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, MR CleanupJob happens when the job either succeeds or fails. But 
> during a kill it is not handled, which leaves all the temporary folders 
> under the output path.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7201) Make Job History File Permissions configurable

2022-07-10 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7201:
-
Fix Version/s: 3.4.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Make Job History File Permissions configurable
> --
>
> Key: MAPREDUCE-7201
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7201
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: MAPREDUCE-7201-001.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Job History File Permissions are hardcoded to 770. MAPREDUCE-7010 allows 
> configuring the intermediate user directory permission, but the jhist file 
> permissions are still not configurable.
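
For illustration, a minimal sketch of what configurable jhist permissions could 
look like; the property name below is an assumption for the example, not 
necessarily the key introduced by this Jira.

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class JhistPermissionSketch {
  // Hypothetical property name used only for this sketch.
  static final String JHIST_PERM_KEY = "mapreduce.jobhistory.jhist.permissions";

  static void setHistoryFilePermission(Configuration conf, FileSystem fs,
      Path jhistFile) throws IOException {
    // Fall back to the previously hardcoded 770 when the property is unset.
    short mode = Short.parseShort(conf.get(JHIST_PERM_KEY, "770"), 8);
    fs.setPermission(jhistFile, new FsPermission(mode));
  }
}
{code}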



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Assigned] (MAPREDUCE-7369) MapReduce tasks timing out when spends more time on MultipleOutputs#close

2021-12-06 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph reassigned MAPREDUCE-7369:


Assignee: Ravuri Sushma sree  (was: Prabhu Joseph)

> MapReduce tasks timing out when spends more time on MultipleOutputs#close
> -
>
> Key: MAPREDUCE-7369
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7369
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.3.1
>Reporter: Prabhu Joseph
>Assignee: Ravuri Sushma sree
>Priority: Major
>
> MapReduce tasks time out when they spend more time in MultipleOutputs#close. 
> MultipleOutputs#close takes longer when there are multiple files to be 
> closed and there is high latency in closing a stream.
> {code}
> 2021-11-01 02:45:08,312 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
> report from attempt_1634949471086_61268_m_001115_0: 
> AttemptID:attempt_1634949471086_61268_m_001115_0 Timed out after 300 secs
> {code}
> The MapReduce task timeout can be increased, but it is tough to set the right 
> value. The timeout can be disabled by setting it to 0, but that might leave 
> hanging tasks that never get killed.
> The tasks send a ping every 3 seconds, which the ApplicationMaster does not 
> honor; it expects status information, which is not sent during 
> MultipleOutputs#close. This jira is to add a config which counts the ping 
> from the task as part of the Task Liveliness Check in the ApplicationMaster.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7369) MapReduce tasks timing out when spends more time on MultipleOutputs#close

2021-11-29 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450668#comment-17450668
 ] 

Prabhu Joseph commented on MAPREDUCE-7369:
--

bq. Have you thought about also parallelising the close so that the different 
outputs can be closed simultaneously?

That will improve the speed. I have reported 
[MAPREDUCE-7370|https://issues.apache.org/jira/browse/MAPREDUCE-7370] to handle 
the same. Thanks.


> MapReduce tasks timing out when spends more time on MultipleOutputs#close
> -
>
> Key: MAPREDUCE-7369
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7369
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.3.1
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> MapReduce tasks time out when they spend more time in MultipleOutputs#close. 
> MultipleOutputs#close takes longer when there are multiple files to be 
> closed and there is high latency in closing a stream.
> {code}
> 2021-11-01 02:45:08,312 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
> report from attempt_1634949471086_61268_m_001115_0: 
> AttemptID:attempt_1634949471086_61268_m_001115_0 Timed out after 300 secs
> {code}
> The MapReduce task timeout can be increased, but it is tough to set the right 
> value. The timeout can be disabled by setting it to 0, but that might leave 
> hanging tasks that never get killed.
> The tasks send a ping every 3 seconds, which the ApplicationMaster does not 
> honor; it expects status information, which is not sent during 
> MultipleOutputs#close. This jira is to add a config which counts the ping 
> from the task as part of the Task Liveliness Check in the ApplicationMaster.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7370) Parallelize MultipleOutputs#close call

2021-11-29 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7370:
-
Description: 
This call takes more time when there are a lot of files to close and there is 
high latency when closing each one. Parallelize the MultipleOutputs#close call 
to improve the speed.

{code}
  public void close() throws IOException {
    for (RecordWriter writer : recordWriters.values()) {
      writer.close(null);
    }
  }
{code}

Idea is from [~ste...@apache.org]

  was:
This call takes more time when there are a lot of files to close and there is 
high latency when closing each one. Parallelize the MultipleOutputs#close call 
to improve the speed.

{code}
  public void close() throws IOException {
    for (RecordWriter writer : recordWriters.values()) {
      writer.close(null);
    }
  }
{code}


> Parallelize MultipleOutputs#close call
> --
>
> Key: MAPREDUCE-7370
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7370
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Ravuri Sushma sree
>Priority: Major
>
> This call takes more time when there are a lot of files to close and there 
> is high latency when closing each one. Parallelize the MultipleOutputs#close 
> call to improve the speed.
> {code}
>   public void close() throws IOException {
>     for (RecordWriter writer : recordWriters.values()) {
>       writer.close(null);
>     }
>   }
> {code}
> Idea is from [~ste...@apache.org]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7370) Parallelize MultipleOutputs#close call

2021-11-29 Thread Prabhu Joseph (Jira)
Prabhu Joseph created MAPREDUCE-7370:


 Summary: Parallelize MultipleOutputs#close call
 Key: MAPREDUCE-7370
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7370
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Ravuri Sushma sree


This call takes more time when there are a lot of files to close and there is 
high latency when closing each one. Parallelize the MultipleOutputs#close call 
to improve the speed.

{code}
  public void close() throws IOException {
    for (RecordWriter writer : recordWriters.values()) {
      writer.close(null);
    }
  }
{code}
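
A minimal, self-contained sketch of the parallel close using a plain JDK thread 
pool; the helper name, pool size, and error propagation below are illustrative 
choices, not the committed implementation.

{code}
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelClose {
  // Close all writers concurrently instead of one at a time.
  static void closeAll(Collection<? extends Closeable> writers, int threads)
      throws IOException, InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Void>> pending = new ArrayList<>();
      for (Closeable writer : writers) {
        pending.add(pool.submit(() -> { writer.close(); return null; }));
      }
      for (Future<Void> result : pending) {
        try {
          result.get();   // surface any failure from an individual close()
        } catch (ExecutionException e) {
          throw new IOException(e.getCause());
        }
      }
    } finally {
      pool.shutdown();
    }
  }
}
{code}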



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7369) MapReduce tasks timing out when spends more time on MultipleOutputs#close

2021-11-18 Thread Prabhu Joseph (Jira)
Prabhu Joseph created MAPREDUCE-7369:


 Summary: MapReduce tasks timing out when spends more time on 
MultipleOutputs#close
 Key: MAPREDUCE-7369
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7369
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.3.1
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


MapReduce tasks time out when they spend more time in MultipleOutputs#close. 
MultipleOutputs#close takes longer when there are multiple files to be closed 
and there is high latency in closing a stream.

{code}
2021-11-01 02:45:08,312 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report 
from attempt_1634949471086_61268_m_001115_0: 
AttemptID:attempt_1634949471086_61268_m_001115_0 Timed out after 300 secs
{code}

The MapReduce task timeout can be increased, but it is tough to set the right 
timeout value. The timeout can be disabled by setting it to 0, but that might 
leave hanging tasks that never get killed.

The tasks send a ping every 3 seconds, which the ApplicationMaster does not 
honor; it expects status information, which is not sent during 
MultipleOutputs#close. This jira is to add a config which counts the ping from 
the task as part of the Task Liveliness Check in the ApplicationMaster.
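
As context for the workaround mentioned above, the task timeout is controlled 
by the standard mapreduce.task.timeout property; a minimal sketch of raising it 
at job submission time (the job name and the chosen value are arbitrary):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TaskTimeoutExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Per-task liveness timeout in milliseconds; 0 disables the check,
    // at the risk of hung tasks never being killed.
    conf.setLong("mapreduce.task.timeout", 30 * 60 * 1000L);
    Job job = Job.getInstance(conf, "multiple-outputs-job");
    // ... set mapper/reducer, input/output paths, MultipleOutputs, etc.
  }
}
{code}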








--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7351) CleanupJob during handle of SIGTERM signal

2021-07-06 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7351:
-
Labels:   (was: pull-request-available)

> CleanupJob during handle of SIGTERM signal
> --
>
> Key: MAPREDUCE-7351
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7351
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Shubham Gupta
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: MAPREDUCE-7351-001.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, MR CleanupJob happens when the job either succeeds or fails. But 
> during a kill it is not handled, which leaves all the temporary folders 
> under the output path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7351) CleanupJob during handle of SIGTERM signal

2021-07-06 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7351:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> CleanupJob during handle of SIGTERM signal
> --
>
> Key: MAPREDUCE-7351
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7351
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Shubham Gupta
>Priority: Minor
> Fix For: 3.3.0
>
> Attachments: MAPREDUCE-7351-001.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, MR CleanupJob happens when the job either succeeds or fails. But 
> during a kill it is not handled, which leaves all the temporary folders 
> under the output path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7351) CleanupJob during handle of SIGTERM signal

2021-07-06 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376197#comment-17376197
 ] 

Prabhu Joseph commented on MAPREDUCE-7351:
--

Thanks [~shubhamod] for the patch. Have committed it to trunk.

> CleanupJob during handle of SIGTERM signal
> --
>
> Key: MAPREDUCE-7351
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7351
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Shubham Gupta
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.0
>
> Attachments: MAPREDUCE-7351-001.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, MR CleanupJob happens when the job either succeeds or fails. But 
> during a kill it is not handled, which leaves all the temporary folders 
> under the output path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7355) Fix MRAppMaster to getStagingAreaDir from Job Configuration

2021-06-23 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7355:
-
Description: 
When the JobClient (running as the yarn user) uses an RM_DELEGATION_TOKEN 
(owner: oozie) to submit the job, the client uses /mapreducestaging/yarn as the 
staging directory whereas MRAppMaster uses /mapreducestaging/oozie. This leads 
to the failure below:

{code}
Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
java.io.FileNotFoundException: 
hdfs://yarncluster/mapreducestaging/oozie/.staging/job_1624284676187_0003/job.splitmetainfo:
 No such file or directory.
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1611)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1473)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1431)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1010)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:141)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1544)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1263)
{code}

MRAppMaster can rely on Job Configuration mapreduce.job.dir to avoid this issue.

  was:
When the JobClient (running as the yarn user) uses an RM_DELEGATION_TOKEN 
(owner: oozie) to submit the job, the client uses /mapreducestaging/yarn as the 
staging directory whereas MRAppMaster uses /mapreducestaging/oozie. This leads 
to the failure below:

{code}
Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
java.io.FileNotFoundException: 
wasb://oozie-2021-06-21t13-57-29-7...@ooziehdistorage.blob.core.windows.net/mapreducestaging/oozie/.staging/job_1624284676187_0003/job.splitmetainfo:
 No such file or directory.
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1611)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1473)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1431)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1010)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:141)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1544)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1263)
{code}

MRAppMaster can rely on Job Configuration mapreduce.job.dir to avoid this issue.


> Fix MRAppMaster to getStagingAreaDir from Job Configuration
> ---
>
> Key: MAPREDUCE-7355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7355
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> When the JobClient (running as the yarn user) uses an RM_DELEGATION_TOKEN 
> (owner: oozie) to submit the job, the client uses /mapreducestaging/yarn as 
> the staging directory whereas MRAppMaster uses /mapreducestaging/oozie. This 
> leads to the failure below:
> {code}
> Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.io.FileNotFoundException: 
> hdfs://yarncluster/mapreducestaging/oozie/.staging/job_1624284676187_0003/job.splitmetainfo:
>  No such file or directory.
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1611)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1473)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1431)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.State

[jira] [Updated] (MAPREDUCE-7355) Fix MRAppMaster to getStagingAreaDir from Job Configuration

2021-06-21 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7355:
-
Description: 
When the JobClient (running as the yarn user) uses an RM_DELEGATION_TOKEN 
(owner: oozie) to submit the job, the client uses /mapreducestaging/yarn as the 
staging directory whereas MRAppMaster uses /mapreducestaging/oozie. This leads 
to the failure below:

{code}
Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
java.io.FileNotFoundException: 
wasb://oozie-2021-06-21t13-57-29-7...@ooziehdistorage.blob.core.windows.net/mapreducestaging/oozie/.staging/job_1624284676187_0003/job.splitmetainfo:
 No such file or directory.
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1611)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1473)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1431)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1010)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:141)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1544)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1263)
{code}

MRAppMaster can rely on Job Configuration mapreduce.job.dir to avoid this issue.

  was:
When the JobClient (running as the yarn user) uses an RM_DELEGATION_TOKEN 
(owner: oozie) to submit the job, the client uses /mapreducestaging/yarn as the 
staging directory whereas MRAppMaster uses /mapreducestaging/oozie. This leads 
to the failure below:

{code}
Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
java.io.FileNotFoundException: 
wasb://oozie-2021-06-21t13-57-29-7...@ooziehdistorage.blob.core.windows.net/mapreducestaging/oozie/.staging/job_1624284676187_0003/job.splitmetainfo:
 No such file or directory.
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1611)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1473)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1431)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1010)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:141)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1544)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1263)
{code}

MRApps#getStagingAreaDir can rely on Job Configuration mapreduce.job.dir to 
avoid this issue.


> Fix MRAppMaster to getStagingAreaDir from Job Configuration
> ---
>
> Key: MAPREDUCE-7355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7355
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> When the JobClient (running as the yarn user) uses an RM_DELEGATION_TOKEN 
> (owner: oozie) to submit the job, the client uses /mapreducestaging/yarn as 
> the staging directory whereas MRAppMaster uses /mapreducestaging/oozie. This 
> leads to the failure below:
> {code}
> Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.io.FileNotFoundException: 
> wasb://oozie-2021-06-21t13-57-29-7...@ooziehdistorage.blob.core.windows.net/mapreducestaging/oozie/.staging/job_1624284676187_0003/job.splitmetainfo:
>  No such file or directory.
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1611)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1473)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1431)
> at 
> org.apache.hadoop.yarn.sta

[jira] [Updated] (MAPREDUCE-7355) Fix MRAppMaster to getStagingAreaDir from Job Configuration

2021-06-21 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7355:
-
Summary: Fix MRAppMaster to getStagingAreaDir from Job Configuration  (was: 
Fix MRApps#getStagingAreaDir to fetch it from Job Configuration)

> Fix MRAppMaster to getStagingAreaDir from Job Configuration
> ---
>
> Key: MAPREDUCE-7355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7355
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> When the JobClient (running as the yarn user) uses an RM_DELEGATION_TOKEN 
> (owner: oozie) to submit the job, the client uses /mapreducestaging/yarn as 
> the staging directory whereas MRAppMaster uses /mapreducestaging/oozie. This 
> leads to the failure below:
> {code}
> Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
> java.io.FileNotFoundException: 
> wasb://oozie-2021-06-21t13-57-29-7...@ooziehdistorage.blob.core.windows.net/mapreducestaging/oozie/.staging/job_1624284676187_0003/job.splitmetainfo:
>  No such file or directory.
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1611)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1473)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1431)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1010)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:141)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1544)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1263)
> {code}
> MRApps#getStagingAreaDir can rely on Job Configuration mapreduce.job.dir to 
> avoid this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7355) Fix MRApps#getStagingAreaDir to fetch it from Job Configuration

2021-06-21 Thread Prabhu Joseph (Jira)
Prabhu Joseph created MAPREDUCE-7355:


 Summary: Fix MRApps#getStagingAreaDir to fetch it from Job 
Configuration
 Key: MAPREDUCE-7355
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7355
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: job submission
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


When the JobClient (running as the yarn user) uses an RM_DELEGATION_TOKEN 
(owner: oozie) to submit the job, the client uses /mapreducestaging/yarn as the 
staging directory whereas MRAppMaster uses /mapreducestaging/oozie. This leads 
to the failure below:

{code}
Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
java.io.FileNotFoundException: 
wasb://oozie-2021-06-21t13-57-29-7...@ooziehdistorage.blob.core.windows.net/mapreducestaging/oozie/.staging/job_1624284676187_0003/job.splitmetainfo:
 No such file or directory.
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1611)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1473)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1431)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$500(StateMachineFactory.java:46)
at 
org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:487)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:1010)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:141)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1544)
at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1263)
{code}

MRApps#getStagingAreaDir can rely on Job Configuration mapreduce.job.dir to 
avoid this issue.
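
A minimal sketch of the idea, assuming the submitted job configuration carries 
mapreduce.job.dir (which, per the stack trace above, points at 
<staging-dir>/<user>/.staging/<job-id>); the helper name, the parent-directory 
derivation, and the fallback are illustrative assumptions, not the committed 
change.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class StagingDirFromJobConf {
  // Derive the staging area from the submitted job's configuration instead of
  // recomputing it from the user the MRAppMaster happens to run as.
  static Path getStagingAreaDir(Configuration jobConf, Path fallback) {
    String jobDir = jobConf.get("mapreduce.job.dir");
    if (jobDir == null) {
      return fallback;   // keep the existing behaviour when the key is unset
    }
    // The parent of the per-job directory is the submitter's .staging dir.
    return new Path(jobDir).getParent();
  }
}
{code}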



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7351) CleanupJob during handle of SIGTERM signal

2021-06-13 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362473#comment-17362473
 ] 

Prabhu Joseph commented on MAPREDUCE-7351:
--

Yes, right [~ste...@apache.org].

> CleanupJob during handle of SIGTERM signal
> --
>
> Key: MAPREDUCE-7351
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7351
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
>
> Currently, MR CleanupJob happens when the job either succeeds or fails. But 
> during a kill it is not handled, which leaves all the temporary folders 
> under the output path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7351) CleanupJob during handle of SIGTERM signal

2021-06-12 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7351:
-
Summary: CleanupJob during handle of SIGTERM signal  (was: CleanupJob when 
handling SIGTERM signal)

> CleanupJob during handle of SIGTERM signal
> --
>
> Key: MAPREDUCE-7351
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7351
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
>
> Currently, MR CleanupJob happens when the job either succeeds or fails. But 
> during a kill it is not handled, which leaves all the temporary folders 
> under the output path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7351) CleanupJob when handling SIGTERM signal

2021-06-12 Thread Prabhu Joseph (Jira)
Prabhu Joseph created MAPREDUCE-7351:


 Summary: CleanupJob when handling SIGTERM signal
 Key: MAPREDUCE-7351
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7351
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


Currently, MR CleanupJob happens when the job either succeeds or fails. But 
during a kill it is not handled, which leaves all the temporary folders under 
the output path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7250) FrameworkUploader: skip replication check entirely if timeout == 0

2019-12-04 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987931#comment-16987931
 ] 

Prabhu Joseph commented on MAPREDUCE-7250:
--

Have committed it to trunk. Will resolve the Jira.

> FrameworkUploader: skip replication check entirely if timeout == 0
> --
>
> Key: MAPREDUCE-7250
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7250
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: MAPREDUCE-7250-001.patch
>
>
> The framework uploader tool has this piece of code, which makes sure that 
> all blocks of the uploaded mapreduce tarball have been replicated:
> {noformat}
>   while (endTime - startTime < timeout * 1000 &&
>       currentReplication < acceptableReplication) {
>     Thread.sleep(1000);
>     endTime = System.currentTimeMillis();
>     currentReplication = getSmallestReplicatedBlockCount();
>   }
> {noformat}
> There are cases, however, when we don't want to wait for this (e.g. we want 
> to speed up Hadoop installation).
> I suggest adding a {{--skipreplicationcheck}} switch which disables this 
> replication test.
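
A minimal sketch of the suggested skip, assuming the timeout value and the 
replication helper from the snippet above; the class and method names are 
illustrative, not the committed change.

{code}
abstract class UploaderReplicationWait {
  // Helper from the snippet above, assumed to exist on the uploader.
  abstract int getSmallestReplicatedBlockCount();

  void waitForReplication(int timeout, int acceptableReplication)
      throws InterruptedException {
    if (timeout == 0) {
      return;   // timeout == 0: do not wait for replication at all
    }
    long startTime = System.currentTimeMillis();
    long endTime = startTime;
    int currentReplication = getSmallestReplicatedBlockCount();
    while (endTime - startTime < timeout * 1000L
        && currentReplication < acceptableReplication) {
      Thread.sleep(1000);
      endTime = System.currentTimeMillis();
      currentReplication = getSmallestReplicatedBlockCount();
    }
  }
}
{code}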



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7250) FrameworkUploader: skip replication check entirely if timeout == 0

2019-12-04 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7250:
-
Fix Version/s: 3.3.0

> FrameworkUploader: skip replication check entirely if timeout == 0
> --
>
> Key: MAPREDUCE-7250
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7250
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: Reviewed
> Fix For: 3.3.0
>
> Attachments: MAPREDUCE-7250-001.patch
>
>
> The framework uploader tool has this piece of code, which makes sure that 
> all blocks of the uploaded mapreduce tarball have been replicated:
> {noformat}
>   while (endTime - startTime < timeout * 1000 &&
>       currentReplication < acceptableReplication) {
>     Thread.sleep(1000);
>     endTime = System.currentTimeMillis();
>     currentReplication = getSmallestReplicatedBlockCount();
>   }
> {noformat}
> There are cases, however, when we don't want to wait for this (e.g. we want 
> to speed up Hadoop installation).
> I suggest adding a {{--skipreplicationcheck}} switch which disables this 
> replication test.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7250) FrameworkUploader: skip replication check entirely if timeout == 0

2019-12-04 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7250:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> FrameworkUploader: skip replication check entirely if timeout == 0
> --
>
> Key: MAPREDUCE-7250
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7250
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: Reviewed
> Attachments: MAPREDUCE-7250-001.patch
>
>
> The framework uploader tool has this piece of code, which makes sure that 
> all blocks of the uploaded mapreduce tarball have been replicated:
> {noformat}
>   while (endTime - startTime < timeout * 1000 &&
>       currentReplication < acceptableReplication) {
>     Thread.sleep(1000);
>     endTime = System.currentTimeMillis();
>     currentReplication = getSmallestReplicatedBlockCount();
>   }
> {noformat}
> There are cases, however, when we don't want to wait for this (e.g. we want 
> to speed up Hadoop installation).
> I suggest adding a {{--skipreplicationcheck}} switch which disables this 
> replication test.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7250) FrameworkUploader: skip replication check entirely if timeout == 0

2019-12-04 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7250:
-
Labels: Reviewed  (was: )

> FrameworkUploader: skip replication check entirely if timeout == 0
> --
>
> Key: MAPREDUCE-7250
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7250
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
>  Labels: Reviewed
> Attachments: MAPREDUCE-7250-001.patch
>
>
> The framework uploader tool has this piece of code, which makes sure that 
> all blocks of the uploaded mapreduce tarball have been replicated:
> {noformat}
>   while (endTime - startTime < timeout * 1000 &&
>       currentReplication < acceptableReplication) {
>     Thread.sleep(1000);
>     endTime = System.currentTimeMillis();
>     currentReplication = getSmallestReplicatedBlockCount();
>   }
> {noformat}
> There are cases, however, when we don't want to wait for this (e.g. we want 
> to speed up Hadoop installation).
> I suggest adding a {{--skipreplicationcheck}} switch which disables this 
> replication test.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7250) FrameworkUploader: skip replication check entirely if timeout == 0

2019-12-04 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987917#comment-16987917
 ] 

Prabhu Joseph commented on MAPREDUCE-7250:
--

Thanks [~pbacsko] for the patch. +1, will commit it shortly.

The fix does not change any existing behavior other than not logging the error 
message, so I will skip adding a test case for this patch.

> FrameworkUploader: skip replication check entirely if timeout == 0
> --
>
> Key: MAPREDUCE-7250
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7250
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
>Priority: Major
> Attachments: MAPREDUCE-7250-001.patch
>
>
> The framework uploader tool has this piece of code, which makes sure that 
> all blocks of the uploaded mapreduce tarball have been replicated:
> {noformat}
>   while (endTime - startTime < timeout * 1000 &&
>       currentReplication < acceptableReplication) {
>     Thread.sleep(1000);
>     endTime = System.currentTimeMillis();
>     currentReplication = getSmallestReplicatedBlockCount();
>   }
> {noformat}
> There are cases, however, when we don't want to wait for this (e.g. we want 
> to speed up Hadoop installation).
> I suggest adding a {{--skipreplicationcheck}} switch which disables this 
> replication test.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7249) Invalid event TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP causes job failure

2019-11-28 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7249:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Invalid event TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP causes 
> job failure 
> 
>
> Key: MAPREDUCE-7249
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7249
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 3.1.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Critical
>  Labels: Reviewed
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: MAPREDUCE-7249-001.patch, MAPREDUCE-7249-002.patch, 
> MAPREDUCE-7249-branch-3.2.001.patch
>
>
> Same issue as in MAPREDUCE-7240, but this one has a different state in which 
> the {{TA_TOO_MANY_FETCH_FAILURE}} event is received and raises the exception:
> {code}
> 2019-11-18 23:03:40,270 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1568654141590_630203_m_003108_1
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1183)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:148)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1388)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1380)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> {code}
> The stack trace is from a CDH release, which is a highly patched 2.6 release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7249) Invalid event TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP causes job failure

2019-11-28 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984259#comment-16984259
 ] 

Prabhu Joseph commented on MAPREDUCE-7249:
--

Have committed it to trunk, branch-3.2 and branch-3.1. Will resolve the Jira.

> Invalid event TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP causes 
> job failure 
> 
>
> Key: MAPREDUCE-7249
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7249
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 3.1.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Critical
>  Labels: Reviewed
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: MAPREDUCE-7249-001.patch, MAPREDUCE-7249-002.patch, 
> MAPREDUCE-7249-branch-3.2.001.patch
>
>
> Same issue as in MAPREDUCE-7240, but this one has a different state in which 
> the {{TA_TOO_MANY_FETCH_FAILURE}} event is received and raises the exception:
> {code}
> 2019-11-18 23:03:40,270 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1568654141590_630203_m_003108_1
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1183)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:148)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1388)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1380)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> {code}
> The stack trace is from a CDH release, which is a highly patched 2.6 release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7249) Invalid event TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP causes job failure

2019-11-28 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7249:
-
Fix Version/s: 3.2.2
   3.1.4
   3.3.0

> Invalid event TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP causes 
> job failure 
> 
>
> Key: MAPREDUCE-7249
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7249
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 3.1.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Critical
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: MAPREDUCE-7249-001.patch, MAPREDUCE-7249-002.patch, 
> MAPREDUCE-7249-branch-3.2.001.patch
>
>
> Same issue as in MAPREDUCE-7240, but this one has a different state in which 
> the {{TA_TOO_MANY_FETCH_FAILURE}} event is received and raises the exception:
> {code}
> 2019-11-18 23:03:40,270 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1568654141590_630203_m_003108_1
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1183)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:148)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1388)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1380)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> {code}
> The stack trace is from a CDH release, which is a highly patched 2.6 release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7249) Invalid event TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP causes job failure

2019-11-28 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7249:
-
Labels: Reviewed  (was: )

> Invalid event TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP causes 
> job failure 
> 
>
> Key: MAPREDUCE-7249
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7249
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 3.1.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Critical
>  Labels: Reviewed
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
> Attachments: MAPREDUCE-7249-001.patch, MAPREDUCE-7249-002.patch, 
> MAPREDUCE-7249-branch-3.2.001.patch
>
>
> Same issue as in MAPREDUCE-7240, but this one has a different state in which 
> the {{TA_TOO_MANY_FETCH_FAILURE}} event is received and raises the exception:
> {code}
> 2019-11-18 23:03:40,270 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1568654141590_630203_m_003108_1
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1183)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:148)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1388)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1380)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> {code}
> The stack trace is from a CDH release, which is a highly patched 2.6 release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7249) Invalid event TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP causes job failure

2019-11-27 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984155#comment-16984155
 ] 

Prabhu Joseph commented on MAPREDUCE-7249:
--

Thanks [~wilfreds] for fixing the issue. Patch looks good, +1. Will commit it 
shortly.

> Invalid event TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP causes 
> job failure 
> 
>
> Key: MAPREDUCE-7249
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7249
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 3.1.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Critical
> Attachments: MAPREDUCE-7249-001.patch, MAPREDUCE-7249-002.patch, 
> MAPREDUCE-7249-branch-3.2.001.patch
>
>
> Same issue as in MAPREDUCE-7240, but this one has a different state in which 
> the {{TA_TOO_MANY_FETCH_FAILURE}} event is received and raises the exception:
> {code}
> 2019-11-18 23:03:40,270 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1568654141590_630203_m_003108_1
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at SUCCESS_CONTAINER_CLEANUP
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1183)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:148)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1388)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1380)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
> {code}
> The stack trace is from a CDH release, which is a highly patched 2.6 release.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7240) Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER' cause job error

2019-11-27 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7240:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at 
> SUCCESS_FINISHING_CONTAINER' cause job error
> 
>
> Key: MAPREDUCE-7240
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7240
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.8.2
>Reporter: luhuachao
>Assignee: luhuachao
>Priority: Critical
>  Labels: applicationmaster, mrv2
> Fix For: 3.3.0
>
> Attachments: MAPREDUCE-7240-001.patch, MAPREDUCE-7240-002.patch, 
> application_1566552310686_260041.log
>
>
> *log in appmaster*
> {noformat}
> 2019-09-03 17:18:43,090 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_52_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_49_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_51_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_50_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_53_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,092 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1566552310686_260041_m_52_0 transitioned from state SUCCEEDED to 
> FAILED, event type is TA_TOO_MANY_FETCH_FAILURE and nodeId=yarn095:45454
> 2019-09-03 17:18:43,092 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1566552310686_260041_m_49_0
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1450)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>   at java.lang.Thread.run(Thread.java:745)
> 2019-09-03 17:18:43,093 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1566552310686_260041_m_51_0
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458)
>   at 
> org.apache.hadoop.ma

[jira] [Updated] (MAPREDUCE-7240) Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER' cause job error

2019-11-27 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7240:
-
Fix Version/s: 3.3.0

> Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at 
> SUCCESS_FINISHING_CONTAINER' cause job error
> 
>
> Key: MAPREDUCE-7240
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7240
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.8.2
>Reporter: luhuachao
>Assignee: luhuachao
>Priority: Critical
>  Labels: applicationmaster, mrv2
> Fix For: 3.3.0
>
> Attachments: MAPREDUCE-7240-001.patch, MAPREDUCE-7240-002.patch, 
> application_1566552310686_260041.log
>
>
> *log in appmaster*
> {noformat}
> 2019-09-03 17:18:43,090 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_52_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_49_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_51_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_50_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_53_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,092 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1566552310686_260041_m_52_0 transitioned from state SUCCEEDED to 
> FAILED, event type is TA_TOO_MANY_FETCH_FAILURE and nodeId=yarn095:45454
> 2019-09-03 17:18:43,092 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1566552310686_260041_m_49_0
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1450)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>   at java.lang.Thread.run(Thread.java:745)
> 2019-09-03 17:18:43,093 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1566552310686_260041_m_51_0
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDis

[jira] [Updated] (MAPREDUCE-7240) Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER' cause job error

2019-11-27 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7240:
-
Labels: Reviewed applicationmaster mrv2  (was: applicationmaster mrv2)

> Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at 
> SUCCESS_FINISHING_CONTAINER' cause job error
> 
>
> Key: MAPREDUCE-7240
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7240
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.8.2
>Reporter: luhuachao
>Assignee: luhuachao
>Priority: Critical
>  Labels: Reviewed, applicationmaster, mrv2
> Fix For: 3.3.0
>
> Attachments: MAPREDUCE-7240-001.patch, MAPREDUCE-7240-002.patch, 
> application_1566552310686_260041.log
>
>
> *log in appmaster*
> {noformat}
> 2019-09-03 17:18:43,090 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_52_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_49_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_51_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_50_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_53_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,092 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1566552310686_260041_m_52_0 transitioned from state SUCCEEDED to 
> FAILED, event type is TA_TOO_MANY_FETCH_FAILURE and nodeId=yarn095:45454
> 2019-09-03 17:18:43,092 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1566552310686_260041_m_49_0
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1450)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>   at java.lang.Thread.run(Thread.java:745)
> 2019-09-03 17:18:43,093 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1566552310686_260041_m_51_0
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458)
>   at 
> org.ap

[jira] [Commented] (MAPREDUCE-7240) Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER' cause job error

2019-11-27 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983298#comment-16983298
 ] 

Prabhu Joseph commented on MAPREDUCE-7240:
--

Patch [^MAPREDUCE-7240-002.patch] looks good, +1. Have committed to trunk.

Thanks [~Huachao] and [~pbacsko] for the patch and [~wilfreds] for the review.

> Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at 
> SUCCESS_FINISHING_CONTAINER' cause job error
> 
>
> Key: MAPREDUCE-7240
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7240
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.8.2
>Reporter: luhuachao
>Assignee: luhuachao
>Priority: Critical
>  Labels: applicationmaster, mrv2
> Attachments: MAPREDUCE-7240-001.patch, MAPREDUCE-7240-002.patch, 
> application_1566552310686_260041.log
>
>
> *log in appmaster*
> {noformat}
> 2019-09-03 17:18:43,090 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_52_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_49_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_51_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_50_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_53_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,092 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1566552310686_260041_m_52_0 transitioned from state SUCCEEDED to 
> FAILED, event type is TA_TOO_MANY_FETCH_FAILURE and nodeId=yarn095:45454
> 2019-09-03 17:18:43,092 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1566552310686_260041_m_49_0
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1450)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>   at java.lang.Thread.run(Thread.java:745)
> 2019-09-03 17:18:43,093 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1566552310686_260041_m_51_0
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>   at 
> org.apache.hadoop.mapreduce

[jira] [Commented] (MAPREDUCE-7240) Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER' cause job error

2019-11-26 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16983191#comment-16983191
 ] 

Prabhu Joseph commented on MAPREDUCE-7240:
--

Thanks [~wilfreds] and [~pbacsko] for the clarification. Yes, the SUCCEEDED map 
attempt is also marked as FAILED on TA_TOO_MANY_FETCH_FAILURE.

{code}
 // Transitions from SUCCEEDED
 .addTransition(TaskAttemptStateInternal.SUCCEEDED, //only possible for map attempts
 TaskAttemptStateInternal.FAILED,
 TaskAttemptEventType.TA_TOO_MANY_FETCH_FAILURE,
 new TooManyFetchFailureTransition())
{code}
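
For readers following the thread, here is a rough, purely illustrative sketch of the kind of extra transition being discussed, assuming the same StateMachineFactory builder in TaskAttemptImpl quoted above; the actual change is whatever the attached patch does:

{code}
 // Illustrative sketch only: also handle TA_TOO_MANY_FETCH_FAILURE while the
 // attempt is still in SUCCESS_FINISHING_CONTAINER, instead of treating it as
 // an invalid event for that state.
 .addTransition(TaskAttemptStateInternal.SUCCESS_FINISHING_CONTAINER,
     TaskAttemptStateInternal.FAILED,
     TaskAttemptEventType.TA_TOO_MANY_FETCH_FAILURE,
     new TooManyFetchFailureTransition())
{code}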

The patch looks good except for the issue below, which I have fixed in 
[^MAPREDUCE-7240-002.patch].

1. The @Test annotation is missing from the testcase.
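
For completeness, a minimal, self-contained JUnit 4 illustration (hypothetical class and method names, unrelated to the attached patch) of why the annotation matters: the JUnit 4 runner only executes methods carrying it.

{code}
import org.junit.Assert;
import org.junit.Test;

public class TestAnnotationExample {
  // Without the @Test annotation JUnit 4 silently skips this method;
  // with it, the method is executed as part of the test run.
  @Test
  public void testIsPickedUpByJUnit() {
    Assert.assertTrue(true);
  }
}
{code}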



> Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at 
> SUCCESS_FINISHING_CONTAINER' cause job error
> 
>
> Key: MAPREDUCE-7240
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7240
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.8.2
>Reporter: luhuachao
>Assignee: luhuachao
>Priority: Critical
>  Labels: kerberos
> Attachments: MAPREDUCE-7240-001.patch, MAPREDUCE-7240-002.patch, 
> application_1566552310686_260041.log
>
>
> *log in appmaster*
> {noformat}
> 2019-09-03 17:18:43,090 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_52_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_49_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_51_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_50_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_53_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,092 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1566552310686_260041_m_52_0 transitioned from state SUCCEEDED to 
> FAILED, event type is TA_TOO_MANY_FETCH_FAILURE and nodeId=yarn095:45454
> 2019-09-03 17:18:43,092 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1566552310686_260041_m_49_0
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1450)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>   at java.lang.Thread.run(Thread.java:745)
> 2019-09-03 17:18:43,093 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1566552310686_260041_m_51_0
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> 

[jira] [Updated] (MAPREDUCE-7240) Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER' cause job error

2019-11-26 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7240:
-
Attachment: MAPREDUCE-7240-002.patch

> Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at 
> SUCCESS_FINISHING_CONTAINER' cause job error
> 
>
> Key: MAPREDUCE-7240
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7240
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.8.2
>Reporter: luhuachao
>Assignee: luhuachao
>Priority: Critical
>  Labels: kerberos
> Attachments: MAPREDUCE-7240-001.patch, MAPREDUCE-7240-002.patch, 
> application_1566552310686_260041.log
>
>
> *log in appmaster*
> {noformat}
> 2019-09-03 17:18:43,090 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_52_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_49_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_51_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_50_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_53_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,092 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1566552310686_260041_m_52_0 transitioned from state SUCCEEDED to 
> FAILED, event type is TA_TOO_MANY_FETCH_FAILURE and nodeId=yarn095:45454
> 2019-09-03 17:18:43,092 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1566552310686_260041_m_49_0
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1450)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>   at java.lang.Thread.run(Thread.java:745)
> 2019-09-03 17:18:43,093 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1566552310686_260041_m_51_0
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.jav

[jira] [Commented] (MAPREDUCE-7240) Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER' cause job error

2019-11-26 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982254#comment-16982254
 ] 

Prabhu Joseph commented on MAPREDUCE-7240:
--

[~Huachao] I have a doubt: what happens if the successfully finishing container 
simply ignores this event?
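
For context, a purely hypothetical sketch of what that alternative could look like, assuming the same StateMachineFactory builder quoted elsewhere in this thread (this is a question, not a proposed patch):

{code}
 // Hypothetical: swallow the event and stay in the same state, i.e. keep the
 // attempt in SUCCESS_FINISHING_CONTAINER when TA_TOO_MANY_FETCH_FAILURE arrives.
 .addTransition(TaskAttemptStateInternal.SUCCESS_FINISHING_CONTAINER,
     TaskAttemptStateInternal.SUCCESS_FINISHING_CONTAINER,
     TaskAttemptEventType.TA_TOO_MANY_FETCH_FAILURE)
{code}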

> Exception ' Invalid event: TA_TOO_MANY_FETCH_FAILURE at 
> SUCCESS_FINISHING_CONTAINER' cause job error
> 
>
> Key: MAPREDUCE-7240
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7240
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.8.2
>Reporter: luhuachao
>Assignee: luhuachao
>Priority: Critical
>  Labels: kerberos
> Attachments: application_1566552310686_260041.log
>
>
> *log in appmaster*
> {noformat}
> 2019-09-03 17:18:43,090 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_52_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_49_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_51_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_50_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,091 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1566552310686_260041_m_53_0 ... 
> raising fetch failure to map
> 2019-09-03 17:18:43,092 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1566552310686_260041_m_52_0 transitioned from state SUCCEEDED to 
> FAILED, event type is TA_TOO_MANY_FETCH_FAILURE and nodeId=yarn095:45454
> 2019-09-03 17:18:43,092 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1566552310686_260041_m_49_0
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1450)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>   at java.lang.Thread.run(Thread.java:745)
> 2019-09-03 17:18:43,093 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1566552310686_260041_m_51_0
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at SUCCESS_FINISHING_CONTAINER
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1206)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:146)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1458)
>   at 
> org.apache.hadoop.mapreduce

[jira] [Updated] (MAPREDUCE-7238) TestMRJobs.testJobClassloader fails intermittent

2019-09-03 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7238:
-
Attachment: stdout

> TestMRJobs.testJobClassloader fails intermittent
> 
>
> Key: MAPREDUCE-7238
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7238
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: stdout
>
>
> *TestMRJobs.testJobClassloader fails intermittently*, observed in: 
> {code}
> [ERROR] testJobClassloader(org.apache.hadoop.mapreduce.v2.TestMRJobs)  Time 
> elapsed: 29.77 s  <<< FAILURE!
> java.lang.AssertionError: 
> Job status: Application application_1567255842834_0009 failed 2 times due to 
> AM Container for appattempt_1567255842834_0009_02 exited with  exitCode: 1
> Failing this attempt.Diagnostics: [2019-08-31 12:54:14.542]Exception from 
> container-launch.
> Container id: container_1567255842834_0009_02_01
> Exit code: 1
> [2019-08-31 12:54:14.546]Container exited with a non-zero exit code 1. Error 
> file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> Last 4096 bytes of stderr :
> [2019-08-31 12:54:14.547]Container exited with a non-zero exit code 1. Error 
> file: prelaunch.err.
> Last 4096 bytes of prelaunch.err :
> Last 4096 bytes of stderr :
> For more detailed output, check the application tracking page: 
> http://6437fb7eb209:32931/cluster/app/application_1567255842834_0009 Then 
> click on links to logs of each attempt.
> . Failing the application.
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.mapreduce.v2.TestMRJobs.testJobClassloader(TestMRJobs.java:531)
>   at 
> org.apache.hadoop.mapreduce.v2.TestMRJobs.testJobClassloader(TestMRJobs.java:473)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7238) TestMRJobs.testJobClassloader fails intermittent

2019-09-03 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7238:
-
Description: 
*TestMRJobs.testJobClassloader fails intermittently*, observed in: 

{code}
[ERROR] testJobClassloader(org.apache.hadoop.mapreduce.v2.TestMRJobs)  Time 
elapsed: 29.77 s  <<< FAILURE!
java.lang.AssertionError: 
Job status: Application application_1567255842834_0009 failed 2 times due to AM 
Container for appattempt_1567255842834_0009_02 exited with  exitCode: 1
Failing this attempt.Diagnostics: [2019-08-31 12:54:14.542]Exception from 
container-launch.
Container id: container_1567255842834_0009_02_01
Exit code: 1

[2019-08-31 12:54:14.546]Container exited with a non-zero exit code 1. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :


[2019-08-31 12:54:14.547]Container exited with a non-zero exit code 1. Error 
file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :


For more detailed output, check the application tracking page: 
http://6437fb7eb209:32931/cluster/app/application_1567255842834_0009 Then click 
on links to logs of each attempt.
. Failing the application.
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.mapreduce.v2.TestMRJobs.testJobClassloader(TestMRJobs.java:531)
at 
org.apache.hadoop.mapreduce.v2.TestMRJobs.testJobClassloader(TestMRJobs.java:473)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)

{code}

  was:
TestMRJobs.testThreadDumpOnTaskTimeout fails

{code}
[ERROR] testThreadDumpOnTaskTimeout(org.apache.hadoop.mapreduce.v2.TestMRJobs)  
Time elapsed: 43.282 s  <<< FAILURE!
java.lang.AssertionError: No thread dump
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.mapreduce.v2.TestMRJobs.testThreadDumpOnTaskTimeout(TestMRJobs.java:1222)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:745)
{code}


> TestMRJobs.testJobClassloader fails intermittent
> 
>
> Key: MAPREDUCE-7238
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7238
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: stdout
>
>
> *TestMRJobs.testJobClassloader fails intermittently*, observed in: 
> {code}
> [ERROR] testJobClassloader(org.apache.hadoop.mapreduce.v2.TestMRJobs)  Time 
> elapsed: 29.77 s  <<< FAILURE!
> java.lang.AssertionError: 
> Job status: Application application_1567255842834_0009 failed 2 times due to 
> AM Container for appattempt_1567255842834_0009_02 exited with  exitCode: 1
> Failing this attempt.Diagnostics: [2019-

[jira] [Updated] (MAPREDUCE-7238) TestMRJobs.testJobClassloader fails intermittent

2019-09-03 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7238:
-
Summary: TestMRJobs.testJobClassloader fails intermittent  (was: 
TestMRJobs.testThreadDumpOnTaskTimeout fails)

> TestMRJobs.testJobClassloader fails intermittent
> 
>
> Key: MAPREDUCE-7238
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7238
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> TestMRJobs.testThreadDumpOnTaskTimeout fails
> {code}
> [ERROR] 
> testThreadDumpOnTaskTimeout(org.apache.hadoop.mapreduce.v2.TestMRJobs)  Time 
> elapsed: 43.282 s  <<< FAILURE!
> java.lang.AssertionError: No thread dump
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> org.apache.hadoop.mapreduce.v2.TestMRJobs.testThreadDumpOnTaskTimeout(TestMRJobs.java:1222)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7238) TestMRJobs.testThreadDumpOnTaskTimeout fails

2019-08-31 Thread Prabhu Joseph (Jira)
Prabhu Joseph created MAPREDUCE-7238:


 Summary: TestMRJobs.testThreadDumpOnTaskTimeout fails
 Key: MAPREDUCE-7238
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7238
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


TestMRJobs.testThreadDumpOnTaskTimeout fails

{code}
[ERROR] testThreadDumpOnTaskTimeout(org.apache.hadoop.mapreduce.v2.TestMRJobs)  
Time elapsed: 43.282 s  <<< FAILURE!
java.lang.AssertionError: No thread dump
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.mapreduce.v2.TestMRJobs.testThreadDumpOnTaskTimeout(TestMRJobs.java:1222)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:745)
{code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7230) TestHSWebApp.testLogsViewSingle fails

2019-08-15 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16908162#comment-16908162
 ] 

Prabhu Joseph commented on MAPREDUCE-7230:
--

Thanks [~snemeth].

> TestHSWebApp.testLogsViewSingle fails
> -
>
> Key: MAPREDUCE-7230
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7230
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver, test
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Fix For: 3.3.0, 3.2.1, 3.1.3
>
> Attachments: MAPREDUCE-7230-001.patch
>
>
> TestHSWebApp.testLogsViewSingle fails.
> {code}
> [ERROR] 
> testLogsViewSingle(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp)  
> Time elapsed: 0.294 s  <<< FAILURE!
> Argument(s) are different! Wanted:
> printWriter.write(
> "Logs not available for container_10_0001_01_01. Aggregation may not 
> be complete, Check back later or try the nodemanager at localhost:1234"
> );
> -> at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp.testLogsViewSingle(TestHSWebApp.java:234)
> Actual invocations have different arguments:
> printWriter.print(
> " "http://www.w3.org/TR/html4/strict.dtd";>"
> );
> -> at 
> org.apache.hadoop.yarn.webapp.view.TextView.echoWithoutEscapeHtml(TextView.java:62)
> printWriter.write(
> " "http://www.w3.org/TR/html4/strict.dtd";>"
> );
> -> at java.io.PrintWriter.print(PrintWriter.java:617)
> printWriter.write(
> " "http://www.w3.org/TR/html4/strict.dtd";>",
> 0,
> 90
> );
> -> at java.io.PrintWriter.write(PrintWriter.java:473)
> printWriter.println(
> 
> );
> -> at 
> org.apache.hadoop.yarn.webapp.view.TextView.putWithoutEscapeHtml(TextView.java:81)
> printWriter.print(
> " );
> -> at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl.printStartTag(HamletImpl.java:273)
> printWriter.write(
> " );
> -> at java.io.PrintWriter.print(PrintWriter.java:603)
> printWriter.write(
> " 0,
> 5
> );
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7230) TestHSWebApp.testLogsViewSingle fails

2019-08-14 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907452#comment-16907452
 ] 

Prabhu Joseph commented on MAPREDUCE-7230:
--

Yes [~snemeth], as the testcase fails in 3.2 and 3.1 as well.

> TestHSWebApp.testLogsViewSingle fails
> -
>
> Key: MAPREDUCE-7230
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7230
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver, test
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: MAPREDUCE-7230-001.patch
>
>
> TestHSWebApp.testLogsViewSingle fails.
> {code}
> [ERROR] 
> testLogsViewSingle(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp)  
> Time elapsed: 0.294 s  <<< FAILURE!
> Argument(s) are different! Wanted:
> printWriter.write(
> "Logs not available for container_10_0001_01_01. Aggregation may not 
> be complete, Check back later or try the nodemanager at localhost:1234"
> );
> -> at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp.testLogsViewSingle(TestHSWebApp.java:234)
> Actual invocations have different arguments:
> printWriter.print(
> " "http://www.w3.org/TR/html4/strict.dtd";>"
> );
> -> at 
> org.apache.hadoop.yarn.webapp.view.TextView.echoWithoutEscapeHtml(TextView.java:62)
> printWriter.write(
> " "http://www.w3.org/TR/html4/strict.dtd";>"
> );
> -> at java.io.PrintWriter.print(PrintWriter.java:617)
> printWriter.write(
> " "http://www.w3.org/TR/html4/strict.dtd";>",
> 0,
> 90
> );
> -> at java.io.PrintWriter.write(PrintWriter.java:473)
> printWriter.println(
> 
> );
> -> at 
> org.apache.hadoop.yarn.webapp.view.TextView.putWithoutEscapeHtml(TextView.java:81)
> printWriter.print(
> " );
> -> at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl.printStartTag(HamletImpl.java:273)
> printWriter.write(
> " );
> -> at java.io.PrintWriter.print(PrintWriter.java:603)
> printWriter.write(
> " 0,
> 5
> );
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7230) TestHSWebApp.testLogsViewSingle fails

2019-08-13 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906519#comment-16906519
 ] 

Prabhu Joseph commented on MAPREDUCE-7230:
--

[~snemeth] Can you review this Jira when you get time? This fixes the failing test 
case TestHSWebApp.testLogsViewSingle caused by YARN-9451. I missed it 
earlier because the Jenkins build for the YARN-9451 patch did not trigger the test cases of 
hadoop-mapreduce-client-hs.
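
For reference, a hedged sketch of the kind of Mockito verification involved here (the mock variable name and the exact wiring are assumptions; the real assertion is in the attached patch):

{code}
// Hypothetical sketch: verify that the rendered block writes the updated
// "Logs not available ..." message for the requested container.
verify(spyPrintWriter).write(
    "Logs not available for container_10_0001_01_01. Aggregation may not"
    + " be complete, Check back later or try the nodemanager at localhost:1234");
{code}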

> TestHSWebApp.testLogsViewSingle fails
> -
>
> Key: MAPREDUCE-7230
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7230
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver, test
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: MAPREDUCE-7230-001.patch
>
>
> TestHSWebApp.testLogsViewSingle fails.
> {code}
> [ERROR] 
> testLogsViewSingle(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp)  
> Time elapsed: 0.294 s  <<< FAILURE!
> Argument(s) are different! Wanted:
> printWriter.write(
> "Logs not available for container_10_0001_01_01. Aggregation may not 
> be complete, Check back later or try the nodemanager at localhost:1234"
> );
> -> at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp.testLogsViewSingle(TestHSWebApp.java:234)
> Actual invocations have different arguments:
> printWriter.print(
> " "http://www.w3.org/TR/html4/strict.dtd";>"
> );
> -> at 
> org.apache.hadoop.yarn.webapp.view.TextView.echoWithoutEscapeHtml(TextView.java:62)
> printWriter.write(
> " "http://www.w3.org/TR/html4/strict.dtd";>"
> );
> -> at java.io.PrintWriter.print(PrintWriter.java:617)
> printWriter.write(
> " "http://www.w3.org/TR/html4/strict.dtd";>",
> 0,
> 90
> );
> -> at java.io.PrintWriter.write(PrintWriter.java:473)
> printWriter.println(
> 
> );
> -> at 
> org.apache.hadoop.yarn.webapp.view.TextView.putWithoutEscapeHtml(TextView.java:81)
> printWriter.print(
> " );
> -> at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl.printStartTag(HamletImpl.java:273)
> printWriter.write(
> " );
> -> at java.io.PrintWriter.print(PrintWriter.java:603)
> printWriter.write(
> " 0,
> 5
> );
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7230) TestHSWebApp.testLogsViewSingle fails

2019-08-13 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7230:
-
Status: Patch Available  (was: Open)

> TestHSWebApp.testLogsViewSingle fails
> -
>
> Key: MAPREDUCE-7230
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7230
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver, test
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: MAPREDUCE-7230-001.patch
>
>
> TestHSWebApp.testLogsViewSingle fails.
> {code}
> [ERROR] 
> testLogsViewSingle(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp)  
> Time elapsed: 0.294 s  <<< FAILURE!
> Argument(s) are different! Wanted:
> printWriter.write(
> "Logs not available for container_10_0001_01_01. Aggregation may not 
> be complete, Check back later or try the nodemanager at localhost:1234"
> );
> -> at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp.testLogsViewSingle(TestHSWebApp.java:234)
> Actual invocations have different arguments:
> printWriter.print(
> " "http://www.w3.org/TR/html4/strict.dtd";>"
> );
> -> at 
> org.apache.hadoop.yarn.webapp.view.TextView.echoWithoutEscapeHtml(TextView.java:62)
> printWriter.write(
> " "http://www.w3.org/TR/html4/strict.dtd";>"
> );
> -> at java.io.PrintWriter.print(PrintWriter.java:617)
> printWriter.write(
> " "http://www.w3.org/TR/html4/strict.dtd";>",
> 0,
> 90
> );
> -> at java.io.PrintWriter.write(PrintWriter.java:473)
> printWriter.println(
> 
> );
> -> at 
> org.apache.hadoop.yarn.webapp.view.TextView.putWithoutEscapeHtml(TextView.java:81)
> printWriter.print(
> " );
> -> at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl.printStartTag(HamletImpl.java:273)
> printWriter.write(
> " );
> -> at java.io.PrintWriter.print(PrintWriter.java:603)
> printWriter.write(
> " 0,
> 5
> );
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7230) TestHSWebApp.testLogsViewSingle fails

2019-08-13 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7230:
-
Attachment: MAPREDUCE-7230-001.patch

> TestHSWebApp.testLogsViewSingle fails
> -
>
> Key: MAPREDUCE-7230
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7230
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver, test
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: MAPREDUCE-7230-001.patch
>
>
> TestHSWebApp.testLogsViewSingle fails.
> {code}
> [ERROR] 
> testLogsViewSingle(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp)  
> Time elapsed: 0.294 s  <<< FAILURE!
> Argument(s) are different! Wanted:
> printWriter.write(
> "Logs not available for container_10_0001_01_01. Aggregation may not 
> be complete, Check back later or try the nodemanager at localhost:1234"
> );
> -> at 
> org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp.testLogsViewSingle(TestHSWebApp.java:234)
> Actual invocations have different arguments:
> printWriter.print(
> " "http://www.w3.org/TR/html4/strict.dtd";>"
> );
> -> at 
> org.apache.hadoop.yarn.webapp.view.TextView.echoWithoutEscapeHtml(TextView.java:62)
> printWriter.write(
> " "http://www.w3.org/TR/html4/strict.dtd";>"
> );
> -> at java.io.PrintWriter.print(PrintWriter.java:617)
> printWriter.write(
> " "http://www.w3.org/TR/html4/strict.dtd";>",
> 0,
> 90
> );
> -> at java.io.PrintWriter.write(PrintWriter.java:473)
> printWriter.println(
> 
> );
> -> at 
> org.apache.hadoop.yarn.webapp.view.TextView.putWithoutEscapeHtml(TextView.java:81)
> printWriter.print(
> " );
> -> at 
> org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl.printStartTag(HamletImpl.java:273)
> printWriter.write(
> " );
> -> at java.io.PrintWriter.print(PrintWriter.java:603)
> printWriter.write(
> " 0,
> 5
> );
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7231) hadoop-mapreduce-client-jobclient fails with timeout

2019-08-13 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7231:


 Summary: hadoop-mapreduce-client-jobclient fails with timeout
 Key: MAPREDUCE-7231
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7231
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, test
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph
 Attachments: Maven_TestCase_Report.txt

hadoop-mapreduce-client-jobclient fails with timeout

{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) on 
project hadoop-mapreduce-client-jobclient: There was a timeout or other error 
in the fork -> [Help 1]
{code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7230) TestHSWebApp.testLogsViewSingle fails

2019-08-13 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7230:


 Summary: TestHSWebApp.testLogsViewSingle fails
 Key: MAPREDUCE-7230
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7230
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, test
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


TestHSWebApp.testLogsViewSingle fails.

{code}
[ERROR] 
testLogsViewSingle(org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp)  Time 
elapsed: 0.294 s  <<< FAILURE!
Argument(s) are different! Wanted:
printWriter.write(
"Logs not available for container_10_0001_01_01. Aggregation may not be 
complete, Check back later or try the nodemanager at localhost:1234"
);
-> at 
org.apache.hadoop.mapreduce.v2.hs.webapp.TestHSWebApp.testLogsViewSingle(TestHSWebApp.java:234)
Actual invocations have different arguments:
printWriter.print(
"<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">"
);
-> at 
org.apache.hadoop.yarn.webapp.view.TextView.echoWithoutEscapeHtml(TextView.java:62)
printWriter.write(
"<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">"
);
-> at java.io.PrintWriter.print(PrintWriter.java:617)
printWriter.write(
"<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">",
0,
90
);
-> at java.io.PrintWriter.write(PrintWriter.java:473)
printWriter.println(

);
-> at 
org.apache.hadoop.yarn.webapp.view.TextView.putWithoutEscapeHtml(TextView.java:81)
printWriter.print(
"<html"
);
-> at 
org.apache.hadoop.yarn.webapp.hamlet2.HamletImpl.printStartTag(HamletImpl.java:273)
printWriter.write(
"<html"
);
-> at java.io.PrintWriter.print(PrintWriter.java:603)
printWriter.write(
"

[jira] [Updated] (MAPREDUCE-7216) Fix TeraSort Job failing on S3 DirectoryStagingCommitter

2019-06-10 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7216:
-
Status: Patch Available  (was: Open)

> Fix TeraSort Job failing on S3 DirectoryStagingCommitter
> 
>
> Key: MAPREDUCE-7216
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7216
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: examples
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: MAPREDUCE-7216-001.patch
>
>
> TeraSort Job fails on S3 with below exception. Terasort creates OutputPath 
> and writes partition filename but DirectoryStagingCommitter expects output 
> path to not exist.
> {code}
> 9/06/07 14:13:34 INFO mapreduce.Job: Job job_1559891760159_0011 failed with 
> state FAILED due to: Job setup failed : 
> org.apache.hadoop.fs.PathExistsException: `s3a://bucket/OUTPUT': Setting job 
> as Task committer attempt_1559891760159_0011_m_00_0: Destination path 
> exists and committer conflict resolution mode is "fail"
>   at 
> org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.failDestinationExists(StagingCommitter.java:878)
>   at 
> org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter.setupJob(DirectoryStagingCommitter.java:71)
>   at 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255)
>   at 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Creating partition filename in /tmp or some other directory fixes the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7216) Fix TeraSort Job failing on S3 DirectoryStagingCommitter

2019-06-10 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7216:
-
Attachment: MAPREDUCE-7216-001.patch

> Fix TeraSort Job failing on S3 DirectoryStagingCommitter
> 
>
> Key: MAPREDUCE-7216
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7216
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: examples
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
> Attachments: MAPREDUCE-7216-001.patch
>
>
> TeraSort Job fails on S3 with below exception. Terasort creates OutputPath 
> and writes partition filename but DirectoryStagingCommitter expects output 
> path to not exist.
> {code}
> 9/06/07 14:13:34 INFO mapreduce.Job: Job job_1559891760159_0011 failed with 
> state FAILED due to: Job setup failed : 
> org.apache.hadoop.fs.PathExistsException: `s3a://bucket/OUTPUT': Setting job 
> as Task committer attempt_1559891760159_0011_m_00_0: Destination path 
> exists and committer conflict resolution mode is "fail"
>   at 
> org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.failDestinationExists(StagingCommitter.java:878)
>   at 
> org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter.setupJob(DirectoryStagingCommitter.java:71)
>   at 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255)
>   at 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Creating partition filename in /tmp or some other directory fixes the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7216) Fix TeraSort Job failing on S3 DirectoryStagingCommitter

2019-06-10 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7216:
-
Component/s: examples

> Fix TeraSort Job failing on S3 DirectoryStagingCommitter
> 
>
> Key: MAPREDUCE-7216
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7216
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: examples
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
>
> TeraSort Job fails on S3 with below exception. Terasort creates OutputPath 
> and writes partition filename but DirectoryStagingCommitter expects output 
> path to not exist.
> {code}
> 9/06/07 14:13:34 INFO mapreduce.Job: Job job_1559891760159_0011 failed with 
> state FAILED due to: Job setup failed : 
> org.apache.hadoop.fs.PathExistsException: `s3a://bucket/OUTPUT': Setting job 
> as Task committer attempt_1559891760159_0011_m_00_0: Destination path 
> exists and committer conflict resolution mode is "fail"
>   at 
> org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.failDestinationExists(StagingCommitter.java:878)
>   at 
> org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter.setupJob(DirectoryStagingCommitter.java:71)
>   at 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255)
>   at 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Creating partition filename in /tmp or some other directory fixes the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7216) Fix TeraSort Job failing on S3 DirectoryStagingCommitter

2019-06-10 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7216:
-
Summary: Fix TeraSort Job failing on S3 DirectoryStagingCommitter  (was: 
TeraSort Job Fails on S3)

> Fix TeraSort Job failing on S3 DirectoryStagingCommitter
> 
>
> Key: MAPREDUCE-7216
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7216
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
>
> TeraSort Job fails on S3 with below exception. Terasort creates OutputPath 
> and writes partition filename but DirectoryStagingCommitter expects output 
> path to not exist.
> {code}
> 9/06/07 14:13:34 INFO mapreduce.Job: Job job_1559891760159_0011 failed with 
> state FAILED due to: Job setup failed : 
> org.apache.hadoop.fs.PathExistsException: `s3a://bucket/OUTPUT': Setting job 
> as Task committer attempt_1559891760159_0011_m_00_0: Destination path 
> exists and committer conflict resolution mode is "fail"
>   at 
> org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.failDestinationExists(StagingCommitter.java:878)
>   at 
> org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter.setupJob(DirectoryStagingCommitter.java:71)
>   at 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255)
>   at 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Creating partition filename in /tmp or some other directory fixes the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7216) Fix TeraSort Job failing on S3 DirectoryStagingCommitter

2019-06-10 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7216:
-
Affects Version/s: 3.3.0

> Fix TeraSort Job failing on S3 DirectoryStagingCommitter
> 
>
> Key: MAPREDUCE-7216
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7216
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Minor
>
> TeraSort Job fails on S3 with below exception. Terasort creates OutputPath 
> and writes partition filename but DirectoryStagingCommitter expects output 
> path to not exist.
> {code}
> 9/06/07 14:13:34 INFO mapreduce.Job: Job job_1559891760159_0011 failed with 
> state FAILED due to: Job setup failed : 
> org.apache.hadoop.fs.PathExistsException: `s3a://bucket/OUTPUT': Setting job 
> as Task committer attempt_1559891760159_0011_m_00_0: Destination path 
> exists and committer conflict resolution mode is "fail"
>   at 
> org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.failDestinationExists(StagingCommitter.java:878)
>   at 
> org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter.setupJob(DirectoryStagingCommitter.java:71)
>   at 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255)
>   at 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Creating partition filename in /tmp or some other directory fixes the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7217) TestMRTimelineEventHandling.testMRTimelineEventHandling fails

2019-06-10 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7217:
-
Description: 
*TestMRTimelineEventHandling.testMRTimelineEventHandling fails.*

{code:java}

ERROR] 
testMRTimelineEventHandling(org.apache.hadoop.mapred.TestMRTimelineEventHandling)
  Time elapsed: 46.337 s  <<< FAILURE!
org.junit.ComparisonFailure: expected:<[AM_STAR]TED> but was:<[JOB_SUBMIT]TED>
at org.junit.Assert.assertEquals(Assert.java:115)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.mapred.TestMRTimelineEventHandling.testMRTimelineEventHandling(TestMRTimelineEventHandling.java:147)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}

*TestJobHistoryEventHandler.testTimelineEventHandling* 

{code}
[ERROR] 
testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler)
  Time elapsed: 5.858 s  <<< FAILURE!
java.lang.AssertionError: expected:<1> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:631)
at 
org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:597)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{code}

  was:
*TestMRTimelineEventHandling.testMRTimelineEventHandling fails.*

{code:java}

ERROR] 
testMRTimelineEventHandling(org.apache.hadoop.mapred.TestMRTimelineEventHandling)
  Time elapsed: 46.337 s  <<< FAILURE!
org.junit.ComparisonFailure: expected:<[AM_STAR]TED> but was:<[JOB_SUBMIT]TED>
at org.junit.Assert.assertEquals(Assert.java:115)
at org.junit.Assert.as

[jira] [Commented] (MAPREDUCE-7217) TestMRTimelineEventHandling.testMRTimelineEventHandling fails

2019-06-10 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859805#comment-16859805
 ] 

Prabhu Joseph commented on MAPREDUCE-7217:
--

The test case fails because the putEntities call to the {{ApplicationHistoryServer}} failed.
{code:java}
2019-06-10 12:14:27,283 ERROR [RM Timeline dispatcher] 
metrics.TimelineServiceV1Publisher 
(TimelineServiceV1Publisher.java:putEntity(385)) - Error when publishing entity 
[YARN_CONTAINER,container_1560149051337_0001_01_04]
org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response 
from the timeline server. HTTP error code: 403
at 
org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:139)
at 
org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:92)
at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:178)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.putEntity(TimelineServiceV1Publisher.java:383)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.access$100(TimelineServiceV1Publisher.java:52)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher$TimelineV1EventHandler.handle(TimelineServiceV1Publisher.java:408)
at 
org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher$TimelineV1EventHandler.handle(TimelineServiceV1Publisher.java:404)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:200)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:131)
at java.lang.Thread.run(Thread.java:745)
{code}
 
 {{TimelineWebService}} fails the request as the remote user is null.
{code:java}
2019-06-10 12:14:27,282 ERROR [qtp564893839-417] webapp.TimelineWebServices 
(TimelineWebServices.java:postEntities(237)) - The owner of the posted timeline 
entities is not set{code}
 
 This happens when the filter initializers include both {{RMAuthenticationFilter}} and 
{{TimelineAuthenticationFilter}}, which conflict with each other.
{code:java}
2019-06-10 12:14:12,685 INFO  [Listener at hw12663/50946] 
timeline.TimelineServerUtils (TimelineServerUtils.java:setTimelineFilters(73)) 
- Filter initializers set for timeline service: 
org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilterInitializer,org.apache.hadoop.http.lib.StaticUserWebFilter,org.apache.hadoop.yarn.server.timeline.security.TimelineAuthenticationFilterInitializer{code}
 
 {{MiniYarnCluster}} uses the same config for starting both the {{ResourceManager}} and 
the {{ApplicationHistoryServer}}. The RM starts first with {{RMAuthenticationFilter}}, 
and then the {{ApplicationHistoryServer}} appends {{TimelineAuthenticationFilter}} to the 
{{hadoop.http.filter.initializers}} config. Ignoring {{RMAuthenticationFilter}} in 
{{MiniYarnCluster}} while starting the {{ApplicationHistoryServer}} fixes the issue.
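
For illustration, a minimal sketch of that idea, assuming the shared config is copied 
and scrubbed before the ApplicationHistoryServer is started; the helper class and 
method names below are hypothetical and this is not the attached MAPREDUCE-7217-001.patch:

{code:java}
// Hypothetical helper, not the attached patch: drop the RM-only authentication
// filter initializer from the shared config so that only
// TimelineAuthenticationFilter is installed on the ApplicationHistoryServer web app.
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;

final class AhsFilterConfig {
  private static final String FILTER_KEY = "hadoop.http.filter.initializers";
  private static final String RM_AUTH_FILTER =
      "org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilterInitializer";

  /** Returns a copy of conf with RMAuthenticationFilterInitializer removed. */
  static Configuration withoutRmAuthFilter(Configuration conf) {
    Configuration ahsConf = new Configuration(conf);
    List<String> kept = new ArrayList<>();
    for (String initializer : ahsConf.getTrimmedStrings(FILTER_KEY)) {
      if (!RM_AUTH_FILTER.equals(initializer)) {
        kept.add(initializer);
      }
    }
    ahsConf.set(FILTER_KEY, String.join(",", kept));
    return ahsConf;
  }
}
{code}

MiniYarnCluster could then hand the returned copy to the ApplicationHistoryServer only, 
leaving the ResourceManager config untouched.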

 

> TestMRTimelineEventHandling.testMRTimelineEventHandling fails
> -
>
> Key: MAPREDUCE-7217
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7217
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: MAPREDUCE-7217-001.patch
>
>
> *TestMRTimelineEventHandling.testMRTimelineEventHandling fails.*
> {code:java}
> ERROR] 
> testMRTimelineEventHandling(org.apache.hadoop.mapred.TestMRTimelineEventHandling)
>   Time elapsed: 46.337 s  <<< FAILURE!
> org.junit.ComparisonFailure: expected:<[AM_STAR]TED> but was:<[JOB_SUBMIT]TED>
>   at org.junit.Assert.assertEquals(Assert.java:115)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.mapred.TestMRTimelineEventHandling.testMRTimelineEventHandling(TestMRTimelineEventHandling.java:147)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRun

[jira] [Updated] (MAPREDUCE-7217) TestMRTimelineEventHandling.testMRTimelineEventHandling fails

2019-06-10 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7217:
-
Status: Patch Available  (was: Open)

> TestMRTimelineEventHandling.testMRTimelineEventHandling fails
> -
>
> Key: MAPREDUCE-7217
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7217
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: MAPREDUCE-7217-001.patch
>
>
> *TestMRTimelineEventHandling.testMRTimelineEventHandling fails.*
> {code:java}
> ERROR] 
> testMRTimelineEventHandling(org.apache.hadoop.mapred.TestMRTimelineEventHandling)
>   Time elapsed: 46.337 s  <<< FAILURE!
> org.junit.ComparisonFailure: expected:<[AM_STAR]TED> but was:<[JOB_SUBMIT]TED>
>   at org.junit.Assert.assertEquals(Assert.java:115)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.mapred.TestMRTimelineEventHandling.testMRTimelineEventHandling(TestMRTimelineEventHandling.java:147)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7217) TestMRTimelineEventHandling.testMRTimelineEventHandling fails

2019-06-10 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7217:
-
Attachment: MAPREDUCE-7217-001.patch

> TestMRTimelineEventHandling.testMRTimelineEventHandling fails
> -
>
> Key: MAPREDUCE-7217
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7217
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: MAPREDUCE-7217-001.patch
>
>
> *TestMRTimelineEventHandling.testMRTimelineEventHandling fails.*
> {code:java}
> ERROR] 
> testMRTimelineEventHandling(org.apache.hadoop.mapred.TestMRTimelineEventHandling)
>   Time elapsed: 46.337 s  <<< FAILURE!
> org.junit.ComparisonFailure: expected:<[AM_STAR]TED> but was:<[JOB_SUBMIT]TED>
>   at org.junit.Assert.assertEquals(Assert.java:115)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.mapred.TestMRTimelineEventHandling.testMRTimelineEventHandling(TestMRTimelineEventHandling.java:147)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7217) TestMRTimelineEventHandling.testMRTimelineEventHandling fails

2019-06-09 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7217:


 Summary: TestMRTimelineEventHandling.testMRTimelineEventHandling 
fails
 Key: MAPREDUCE-7217
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7217
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


*TestMRTimelineEventHandling.testMRTimelineEventHandling fails.*

{code:java}

ERROR] 
testMRTimelineEventHandling(org.apache.hadoop.mapred.TestMRTimelineEventHandling)
  Time elapsed: 46.337 s  <<< FAILURE!
org.junit.ComparisonFailure: expected:<[AM_STAR]TED> but was:<[JOB_SUBMIT]TED>
at org.junit.Assert.assertEquals(Assert.java:115)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.mapred.TestMRTimelineEventHandling.testMRTimelineEventHandling(TestMRTimelineEventHandling.java:147)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7216) TeraSort Job Fails on S3

2019-06-07 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7216:


 Summary: TeraSort Job Fails on S3
 Key: MAPREDUCE-7216
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7216
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


TeraSort Job fails on S3 with the below exception. TeraSort creates the output path and 
writes the partition file under it, but DirectoryStagingCommitter expects the output 
path to not exist.


{code}
9/06/07 14:13:34 INFO mapreduce.Job: Job job_1559891760159_0011 failed with 
state FAILED due to: Job setup failed : 
org.apache.hadoop.fs.PathExistsException: `s3a://bucket/OUTPUT': Setting job as 
Task committer attempt_1559891760159_0011_m_00_0: Destination path exists 
and committer conflict resolution mode is "fail"

at 
org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.failDestinationExists(StagingCommitter.java:878)

at 
org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter.setupJob(DirectoryStagingCommitter.java:71)

at 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:255)

at 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:235)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)
{code}

Creating the partition file in /tmp or some other directory fixes the issue.
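
For illustration only, a rough sketch of that workaround using the standard MapReduce 
Job API; the scratch path, the class name and the "_partition.lst" file name are example 
assumptions, not the committed fix:

{code:java}
// Hedged sketch: keep the partition list outside the job output path so the
// staging committer can still insist that the destination does not pre-exist.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

final class TeraSortPartitionFileSketch {
  static Job configure(Configuration conf) throws Exception {
    Job job = Job.getInstance(conf, "terasort");
    // Destination on S3; with conflict mode "fail" it must not exist yet.
    Path outputDir = new Path("s3a://bucket/OUTPUT");
    FileOutputFormat.setOutputPath(job, outputDir);
    // Write the sampled partition boundaries to a scratch directory instead of
    // under outputDir (example path; any writable location outside OUTPUT works).
    Path partitionFile =
        new Path("/tmp/terasort-" + System.currentTimeMillis(), "_partition.lst");
    // ... sample the input and write the partition file to partitionFile here ...
    // Ship it to the tasks through the distributed cache under a fixed link name.
    job.addCacheFile(new URI(partitionFile.toUri() + "#_partition.lst"));
    return job;
  }
}
{code}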




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7201) Make Job History File Permissions configurable

2019-05-16 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841928#comment-16841928
 ] 

Prabhu Joseph commented on MAPREDUCE-7201:
--

Thanks [~erwaman] for reviewing.

[~eyang] Can you review this Jira when you get time? It makes the jhist file 
permissions configurable.

> Make Job History File Permissions configurable
> --
>
> Key: MAPREDUCE-7201
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7201
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: MAPREDUCE-7201-001.patch
>
>
> Job History File Permissions are hardcoded to 770. MAPREDUCE-7010 allows to 
> configure the intermediate user directory permission but still the jhist file 
> permission are not changed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7203) TestRuntimeEstimators fails intermittent

2019-05-10 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7203:


 Summary: TestRuntimeEstimators fails intermittent
 Key: MAPREDUCE-7203
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7203
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


TestRuntimeEstimators fails intermittent.

{code}
[ERROR] 
testExponentialEstimator(org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators)
  Time elapsed: 9.637 s  <<< FAILURE!
java.lang.AssertionError: We got the wrong number of successful speculations. 
expected:<3> but was:<1>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at 
org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.coreTestEstimator(TestRuntimeEstimators.java:243)
at 
org.apache.hadoop.mapreduce.v2.app.TestRuntimeEstimators.testExponentialEstimator(TestRuntimeEstimators.java:257)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
at 
org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Assigned] (MAPREDUCE-6993) Provide additional aggregated task stats at the Map / Reduce level

2019-05-10 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph reassigned MAPREDUCE-6993:


Assignee: Prabhu Joseph

> Provide additional aggregated task stats at the Map / Reduce level
> --
>
> Key: MAPREDUCE-6993
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6993
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> MapReduce ApplicationMaster can log aggregated tasks stats for Map / Reduce 
> stage like below which will make debugging easier. Similar to what Tez 
> provides TEZ-930
> firstTaskStartTime,
> firstTasksToStart
> lastTaskFinishTime
> lastTasksToFinish
> minTaskDuration
> maxTaskDuration 
> avgTaskDuration
> numSuccessfulTasks
> shortestDurationTasks
> longestDurationTasks
> numFailedTaskAttempts
> numKilledTaskAttempts
> numCompletedTasks
> numSucceededTasks
> numKilledTasks
> numFailedTasks



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7201) Make Job History File Permissions configurable

2019-05-09 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16836413#comment-16836413
 ] 

Prabhu Joseph commented on MAPREDUCE-7201:
--

Tested with mapreduce.jobhistory.intermediate-user-done-dir.permissions=775; both the 
intermediate user done dir and the history file permissions are set correctly.

{code}
[ambari-qa@yarn-ats-1 ~]$ hadoop fs -ls /mr-history/tmp/
Found 1 items
drwxrwxr-x   - ambari-qa supergroup  0 2019-05-09 13:32 
/mr-history/tmp/ambari-qa
[ambari-qa@yarn-ats-1 ~]$ 
[ambari-qa@yarn-ats-1 ~]$ hadoop fs -ls /mr-history/tmp/ambari-qa
Found 3 items
-rwxrwxr-x   3 ambari-qa supergroup  22926 2019-05-09 13:32 
/mr-history/tmp/ambari-qa/job_1556909089920_0006-1557408730503-ambari%2Dqa-word+count-1557408748287-1-1-SUCCEEDED-default-1557408736193.jhist
-rwxrwxr-x   3 ambari-qa supergroup444 2019-05-09 13:32 
/mr-history/tmp/ambari-qa/job_1556909089920_0006.summary
-rwxrwxr-x   3 ambari-qa supergroup 220806 2019-05-09 13:32 
/mr-history/tmp/ambari-qa/job_1556909089920_0006_conf.xml
{code}
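
As a minimal illustration of the setting exercised above (775 is just the tested 
example value):

{code:java}
// Sketch only: set the intermediate user done-dir permission before submitting jobs.
Configuration conf = new Configuration();
conf.set("mapreduce.jobhistory.intermediate-user-done-dir.permissions", "775");
{code}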

> Make Job History File Permissions configurable
> --
>
> Key: MAPREDUCE-7201
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7201
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: MAPREDUCE-7201-001.patch
>
>
> Job History File Permissions are hardcoded to 770. MAPREDUCE-7010 allows to 
> configure the intermediate user directory permission but still the jhist file 
> permission are not changed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7201) Make Job History File Permissions configurable

2019-05-09 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7201:
-
Attachment: MAPREDUCE-7201-001.patch

> Make Job History File Permissions configurable
> --
>
> Key: MAPREDUCE-7201
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7201
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: MAPREDUCE-7201-001.patch
>
>
> Job History File Permissions are hardcoded to 770. MAPREDUCE-7010 allows to 
> configure the intermediate user directory permission but still the jhist file 
> permission are not changed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7201) Make Job History File Permissions configurable

2019-05-09 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7201:
-
Status: Patch Available  (was: Open)

> Make Job History File Permissions configurable
> --
>
> Key: MAPREDUCE-7201
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7201
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Affects Versions: 3.2.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: MAPREDUCE-7201-001.patch
>
>
> Job History File Permissions are hardcoded to 770. MAPREDUCE-7010 allows to 
> configure the intermediate user directory permission but still the jhist file 
> permission are not changed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7201) Make Job History File Permissions configurable

2019-05-03 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7201:


 Summary: Make Job History File Permissions configurable
 Key: MAPREDUCE-7201
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7201
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver
Affects Versions: 3.2.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


Job History File Permissions are hardcoded to 770. MAPREDUCE-7010 allows configuring 
the intermediate user directory permission, but the jhist file permissions still 
cannot be changed.
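
A minimal sketch of the shape such a change could take; the property key below is a 
placeholder chosen for illustration, not the key introduced by the attached patch, and 
770 mirrors today's hardcoded default:

{code:java}
// Hypothetical config key for illustration only.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

final class HistoryFilePermissionSketch {
  static void applyHistoryFilePermission(Configuration conf, FileSystem fs,
      Path jhistFile) throws IOException {
    FsPermission perm = new FsPermission(
        conf.get("mapreduce.jobhistory.jhist.file.permissions", "770"));
    fs.setPermission(jhistFile, perm);
  }
}
{code}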



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Assigned] (MAPREDUCE-7184) TestJobCounters byte counters omitting crc file bytes read

2019-02-12 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph reassigned MAPREDUCE-7184:


Assignee: (was: Prabhu Joseph)

> TestJobCounters byte counters omitting crc file bytes read
> --
>
> Key: MAPREDUCE-7184
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7184
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Prabhu Joseph
>Priority: Major
> Attachments: MAPREDUCE-7184-001.patch, MAPREDUCE-7184-002.patch, 
> MAPREDUCE-7184-003.patch
>
>
> TestJobCounters test cases are failing in trunk while validating the input 
> files size with BYTES_READ by the job. The crc files are considered in 
> getFileSize whereas the job FileInputFormat ignores them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7184) TestJobCounters byte counters omitting crc file bytes read

2019-02-12 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766091#comment-16766091
 ] 

Prabhu Joseph commented on MAPREDUCE-7184:
--

Thanks [~ste...@apache.org] for the details. I am OK either way. 

> TestJobCounters byte counters omitting crc file bytes read
> --
>
> Key: MAPREDUCE-7184
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7184
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: MAPREDUCE-7184-001.patch, MAPREDUCE-7184-002.patch, 
> MAPREDUCE-7184-003.patch
>
>
> TestJobCounters test cases are failing in trunk while validating the input 
> files size with BYTES_READ by the job. The crc files are considered in 
> getFileSize whereas the job FileInputFormat ignores them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7184) TestJobCounters#getFileSize can ignore crc file

2019-02-12 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765994#comment-16765994
 ] 

Prabhu Joseph commented on MAPREDUCE-7184:
--

[~tasanuma0829] FileInputFormat does not consider hidden files (HiddenFileFilter) in 
the input list, so I suspect the crc files started getting created after some change, 
which is when the test case broke. Since FileInputFormat does not consider crc files, 
getFileSize can ignore them as well.
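
A small sketch of that idea; the class and method names are illustrative and the 
attached patches may differ:

{code:java}
// Hedged sketch: sum input bytes the way FileInputFormat sees them,
// skipping hidden files such as .crc checksums and _SUCCESS markers.
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final class InputSizeSketch {
  static long visibleInputBytes(FileSystem fs, Path dir) throws IOException {
    long total = 0;
    for (FileStatus status : fs.listStatus(dir)) {
      String name = status.getPath().getName();
      if (name.startsWith(".") || name.startsWith("_")) {
        continue; // hidden files are not part of the job input
      }
      total += status.getLen();
    }
    return total;
  }
}
{code}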

> TestJobCounters#getFileSize can ignore crc file
> ---
>
> Key: MAPREDUCE-7184
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7184
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: MAPREDUCE-7184-001.patch, MAPREDUCE-7184-002.patch, 
> MAPREDUCE-7184-003.patch
>
>
> TestJobCounters test cases are failing in trunk while validating the input 
> files size with BYTES_READ by the job. The crc files are considered in 
> getFileSize whereas the job FileInputFormat ignores them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7184) TestJobCounters#getFileSize can ignore crc file

2019-02-11 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7184:
-
Attachment: MAPREDUCE-7184-003.patch

> TestJobCounters#getFileSize can ignore crc file
> ---
>
> Key: MAPREDUCE-7184
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7184
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: MAPREDUCE-7184-001.patch, MAPREDUCE-7184-002.patch, 
> MAPREDUCE-7184-003.patch
>
>
> TestJobCounters test cases are failing in trunk while validating the input 
> files size with BYTES_READ by the job. The crc files are considered in 
> getFileSize whereas the job FileInputFormat ignores them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7184) TestJobCounters#getFileSize can ignore crc file

2019-02-11 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7184:
-
Attachment: MAPREDUCE-7184-002.patch

> TestJobCounters#getFileSize can ignore crc file
> ---
>
> Key: MAPREDUCE-7184
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7184
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: MAPREDUCE-7184-001.patch, MAPREDUCE-7184-002.patch
>
>
> TestJobCounters test cases are failing in trunk while validating the input 
> files size with BYTES_READ by the job. The crc files are considered in 
> getFileSize whereas the job FileInputFormat ignores them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7184) TestJobCounters#getFileSize can ignore crc file

2019-02-10 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16764670#comment-16764670
 ] 

Prabhu Joseph commented on MAPREDUCE-7184:
--

Hi [~sunilg], can you review this jira, which fixes TestJobCounters failing on 
trunk?

> TestJobCounters#getFileSize can ignore crc file
> ---
>
> Key: MAPREDUCE-7184
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7184
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: MAPREDUCE-7184-001.patch
>
>
> TestJobCounters test cases are failing in trunk while validating the input 
> files size with BYTES_READ by the job. The crc files are considered in 
> getFileSize whereas the job FileInputFormat ignores them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7184) TestJobCounters#getFileSize can ignore crc file

2019-02-10 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7184:
-
Status: Patch Available  (was: Open)

> TestJobCounters#getFileSize can ignore crc file
> ---
>
> Key: MAPREDUCE-7184
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7184
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: MAPREDUCE-7184-001.patch
>
>
> TestJobCounters test cases are failing in trunk while validating the input 
> files size with BYTES_READ by the job. The crc files are considered in 
> getFileSize whereas the job FileInputFormat ignores them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7184) TestJobCounters#getFileSize can ignore crc file

2019-02-10 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7184:
-
Attachment: MAPREDUCE-7184-001.patch

> TestJobCounters#getFileSize can ignore crc file
> ---
>
> Key: MAPREDUCE-7184
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7184
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: MAPREDUCE-7184-001.patch
>
>
> TestJobCounters test cases are failing in trunk while validating the input 
> files size with BYTES_READ by the job. The crc files are considered in 
> getFileSize whereas the job FileInputFormat ignores them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7184) TestJobCounters#getFileSize can ignore crc file

2019-02-10 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7184:


 Summary: TestJobCounters#getFileSize can ignore crc file
 Key: MAPREDUCE-7184
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7184
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


TestJobCounters test cases are failing in trunk while validating the input file sizes 
against the job's BYTES_READ counter. The crc files are counted in getFileSize, 
whereas the job's FileInputFormat ignores them.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler

2019-01-23 Thread Prabhu Joseph (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749762#comment-16749762
 ] 

Prabhu Joseph commented on MAPREDUCE-7026:
--

[~sunilg] Can you review this jira as well? Thanks.

> Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
> --
>
> Key: MAPREDUCE-7026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: supportability
> Attachments: MAPREDUCE-7026.1.patch, MAPREDUCE-7026.2.patch, 
> MAPREDUCE-7026.3.patch
>
>
> A job is failing with reduce tasks failed to fetch map output and the 
> NodeManager ShuffleHandler failed to serve the map outputs with some 
> IOException like below. ShuffleHandler sends the actual error message in 
> response inside sendError() but the Fetcher does not log this message.
> Logs from NodeManager ShuffleHandler:
> {code}
> 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler 
> (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating 
> headers :
> java.io.IOException: Error Reading IndexFile
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089)
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
> at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
> at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
> at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
> at 
> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
> at 
> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
> at 
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> at 
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Owner 'hbase' for path 
> /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index
>  d

[jira] [Assigned] (MAPREDUCE-7145) Improve ShuffleHandler Logging

2018-09-26 Thread Prabhu Joseph (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph reassigned MAPREDUCE-7145:


Assignee: Prabhu Joseph

> Improve ShuffleHandler Logging
> --
>
> Key: MAPREDUCE-7145
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7145
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> ShuffleHandler logs SpillFile not found when there is a permission denied 
> issue which is misleading.
> {code}
>  try {
> spill = SecureIOUtils.openForRandomRead(spillfile, "r", user, null);
>   } catch (FileNotFoundException e) {
> LOG.info(spillfile + " not found");
> return null;
> }
> {code}
> SecureIOUtils.openForRandomRead should log  "Permission denied" or  "No such 
> file or directory" instead of generic "file not found"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7145) Improve ShuffleHandler Logging

2018-09-26 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7145:


 Summary: Improve ShuffleHandler Logging
 Key: MAPREDUCE-7145
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7145
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.3
Reporter: Prabhu Joseph


ShuffleHandler logs that the SpillFile was not found when the underlying 
problem is a permission-denied error, which is misleading.

{code}
 try {
spill = SecureIOUtils.openForRandomRead(spillfile, "r", user, null);
  } catch (FileNotFoundException e) {
LOG.info(spillfile + " not found");
return null;
}
{code}

The caller of SecureIOUtils.openForRandomRead should log "Permission denied" or 
"No such file or directory" instead of the generic "file not found" message.
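A minimal sketch of the suggested logging change (the class, method, user
parameter and System.err call are placeholders, not the actual ShuffleHandler
code): surface e.getMessage(), which on Linux reads "<path> (Permission
denied)" or "<path> (No such file or directory)".

{code}
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;

import org.apache.hadoop.io.SecureIOUtils;

final class SpillFileReader {
  // Log the underlying reason instead of the generic "not found" text.
  static RandomAccessFile open(File spillfile, String user) throws IOException {
    try {
      return SecureIOUtils.openForRandomRead(spillfile, "r", user, null);
    } catch (FileNotFoundException e) {
      // e.getMessage() distinguishes permission problems from missing files.
      System.err.println("Could not open spill file " + spillfile + ": "
          + e.getMessage());
      return null;
    }
  }
}
{code}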



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler

2018-04-25 Thread Prabhu Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7026:
-
Attachment: MAPREDUCE-7026.3.patch

> Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
> --
>
> Key: MAPREDUCE-7026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: supportability
> Attachments: MAPREDUCE-7026.1.patch, MAPREDUCE-7026.2.patch, 
> MAPREDUCE-7026.3.patch
>
>
> A job is failing with reduce tasks failed to fetch map output and the 
> NodeManager ShuffleHandler failed to serve the map outputs with some 
> IOException like below. ShuffleHandler sends the actual error message in 
> response inside sendError() but the Fetcher does not log this message.
> Logs from NodeManager ShuffleHandler:
> {code}
> 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler 
> (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating 
> headers :
> java.io.IOException: Error Reading IndexFile
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089)
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
> at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
> at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
> at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
> at 
> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
> at 
> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
> at 
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> at 
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Owner 'hbase' for path 
> /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index
>  did not match expected owner 'bde'
> at 
> org.apache.hadoop.io.S

[jira] [Created] (MAPREDUCE-7087) NNBench shows invalid Avg exec time and Avg Lat

2018-04-25 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7087:


 Summary: NNBench shows invalid Avg exec time and Avg Lat
 Key: MAPREDUCE-7087
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7087
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 2.7.3
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


NNBench shows an invalid Avg exec time and Avg Lat when there are zero 
successful file operations. It would be better not to show them at all than to 
print invalid numbers.

{code}
18/04/25 09:57:33 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: 
Infinity
18/04/25 09:57:33 INFO hdfs.NNBench: Avg Lat (ms): Create/Write: NaN
{code}
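
A minimal sketch of the proposed guard (the method and field names are
illustrative, not the actual NNBench code): skip the division when no file
operation succeeded, so Infinity and NaN never reach the report.

{code}
final class NNBenchReport {
  // Only print averages when at least one file operation succeeded.
  static void printAverages(long totalExecTimeMs, long totalLatMs,
                            long successfulOps) {
    if (successfulOps > 0) {
      System.out.println("Avg exec time (ms): Create/Write/Close: "
          + ((double) totalExecTimeMs / successfulOps));
      System.out.println("Avg Lat (ms): Create/Write: "
          + ((double) totalLatMs / successfulOps));
    } else {
      // No successful operations: report nothing numeric at all.
      System.out.println(
          "Avg exec time (ms): Create/Write/Close: - (no successful operations)");
      System.out.println(
          "Avg Lat (ms): Create/Write: - (no successful operations)");
    }
  }
}
{code}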



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler

2018-04-24 Thread Prabhu Joseph (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450133#comment-16450133
 ] 

Prabhu Joseph edited comment on MAPREDUCE-7026 at 4/24/18 4:08 PM:
---

Thanks [~jlowe] for the review. Yes, you are right, the Fetcher error message 
in the description is different. The patch tries to address the one below.
{code}
2018-04-19 12:24:39,511 WARN [fetcher#3] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: Invalid map id
java.lang.IllegalArgumentException: TaskAttemptId string : TTP/1.1 500 Internal 
Server Error
Content-Type: text/plain; charset=UTF is not properly formed
at 
org.apache.hadoop.mapreduce.TaskAttemptID.forName(TaskAttemptID.java:201)
at 
org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:517)
at 
org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:345)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:200)
{code}

I have added instrumentation code to throw a FileNotFoundException. Below is 
the complete response the ShuffleHandler has sent.

{code}
TTP/1.1 500 Internal Server Error
Content-Type: text/plain; charset=UTF
name: mapreduce
version: 1.0.0

/tmp/file (No such file or directory)

{code}

Below is the sample log output after the patch. 
{code}
2018-04-19 12:24:39,510 WARN [fetcher#3] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: Error message from Shuffle 
Handler:
name: mapreduce
version: 1.0.0

/tmp/file (No such file or directory)
{code}


was (Author: prabhu joseph):
Thanks [~jlowe] for the review. 

Have added instrumentation code to throw FileNotFoundException. Below is the 
complete response the ShuffleHandler has send.

{code}
TTP/1.1 500 Internal Server Error
Content-Type: text/plain; charset=UTF
name: mapreduce
version: 1.0.0

/tmp/file (No such file or directory)

{code}

Below is the sample log output after the patch. 
{code}
2018-04-19 12:24:39,510 WARN [fetcher#3] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: Error message from Shuffle 
Handler:
name: mapreduce
version: 1.0.0

/tmp/file (No such file or directory)
{code}

> Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
> --
>
> Key: MAPREDUCE-7026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: supportability
> Attachments: MAPREDUCE-7026.1.patch, MAPREDUCE-7026.2.patch
>
>
> A job is failing with reduce tasks failed to fetch map output and the 
> NodeManager ShuffleHandler failed to serve the map outputs with some 
> IOException like below. ShuffleHandler sends the actual error message in 
> response inside sendError() but the Fetcher does not log this message.
> Logs from NodeManager ShuffleHandler:
> {code}
> 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler 
> (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating 
> headers :
> java.io.IOException: Error Reading IndexFile
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089)
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
> at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived

[jira] [Updated] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler

2018-04-24 Thread Prabhu Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7026:
-
Description: 
A job is failing because its reduce tasks fail to fetch map output, and the 
NodeManager ShuffleHandler fails to serve the map outputs with an IOException 
like the one below. The ShuffleHandler sends the actual error message in the 
response inside sendError(), but the Fetcher does not log this message.

Logs from NodeManager ShuffleHandler:

{code}
2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler 
(ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating 
headers :
java.io.IOException: Error Reading IndexFile
at 
org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089)
at 
org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958)
at 
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
at 
org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
at 
org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
at 
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at 
org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
at 
org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
at 
org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
at 
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Owner 'hbase' for path 
/grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index
 did not match expected owner 'bde'
at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285)
at 
org.apache.hadoop.io.SecureIOUtils.forceSecureOpenFSDataInputStream(SecureIOUtils.java:174)
at 
org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:158)
at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:70)
at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:62)
at 
org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:119)
{code}

Fetcher Logs below without the actual error message:

{code}
2018-04-19 12:24:39,521 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : 
org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle 
in fetcher#3
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
   

[jira] [Commented] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler

2018-04-24 Thread Prabhu Joseph (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450133#comment-16450133
 ] 

Prabhu Joseph commented on MAPREDUCE-7026:
--

Thanks [~jlowe] for the review. 

I have added instrumentation code to throw a FileNotFoundException. Below is 
the complete response the ShuffleHandler has sent.

{code}
TTP/1.1 500 Internal Server Error
Content-Type: text/plain; charset=UTF
name: mapreduce
version: 1.0.0

/tmp/file (No such file or directory)

{code}

Below is the sample log output after the patch. 
{code}
2018-04-19 12:24:39,510 WARN [fetcher#3] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: Error message from Shuffle 
Handler:
name: mapreduce
version: 1.0.0

/tmp/file (No such file or directory)
{code}
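
A minimal sketch of the idea behind the patch (a standalone helper, not the
actual Fetcher code): when the shuffle request does not return 200, read the
error body the ShuffleHandler wrote in sendError() and log it, instead of
letting it be parsed as a TaskAttemptId and lost.

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

final class ShuffleErrorLogger {
  // Reads and logs the error body of a failed shuffle HTTP request.
  static void logErrorBody(HttpURLConnection connection) throws IOException {
    if (connection.getResponseCode() == HttpURLConnection.HTTP_OK) {
      return;
    }
    InputStream err = connection.getErrorStream();
    if (err == null) {
      return;
    }
    StringBuilder body = new StringBuilder();
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(err, StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        body.append(line).append('\n');
      }
    }
    System.err.println("Error message from Shuffle Handler:\n" + body);
  }
}
{code}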

> Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
> --
>
> Key: MAPREDUCE-7026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: supportability
> Attachments: MAPREDUCE-7026.1.patch, MAPREDUCE-7026.2.patch
>
>
> A job is failing with reduce tasks failed to fetch map output and the 
> NodeManager ShuffleHandler failed to serve the map outputs with some 
> IOException like below. ShuffleHandler sends the actual error message in 
> response inside sendError() but the Fetcher does not log this message.
> Logs from NodeManager ShuffleHandler:
> {code}
> 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler 
> (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating 
> headers :
> java.io.IOException: Error Reading IndexFile
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089)
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
> at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
> at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
> at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
> at 
> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
> at 
> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
> at 
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> at 
> org.jboss.netty.util.internal.DeadLockProo

[jira] [Updated] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler

2018-04-24 Thread Prabhu Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7026:
-
Description: 
A job is failing because its reduce tasks fail to fetch map output, and the 
NodeManager ShuffleHandler fails to serve the map outputs with an IOException 
like the one below. The ShuffleHandler sends the actual error message in the 
response inside sendError(), but the Fetcher does not log this message.

Logs from NodeManager ShuffleHandler:

{code}
2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler 
(ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating 
headers :
java.io.IOException: Error Reading IndexFile
at 
org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089)
at 
org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958)
at 
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
at 
org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
at 
org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
at 
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at 
org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
at 
org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
at 
org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
at 
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Owner 'hbase' for path 
/grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index
 did not match expected owner 'bde'
at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285)
at 
org.apache.hadoop.io.SecureIOUtils.forceSecureOpenFSDataInputStream(SecureIOUtils.java:174)
at 
org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:158)
at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:70)
at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:62)
at 
org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:119)
{code}

Fetcher Logs below without the actual error message:

{code}
2018-04-19 12:24:39,511 WARN [fetcher#3] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: Invalid map id
java.lang.IllegalArgumentException: TaskAttemptId string : TTP/1.1 500 Internal 
Server Error
Content-Type: text/plain; charset=UTF is not properly formed
at 
org.

[jira] [Commented] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler

2018-04-24 Thread Prabhu Joseph (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449590#comment-16449590
 ] 

Prabhu Joseph commented on MAPREDUCE-7026:
--

[~sunilg] Can you review this when you get time?

> Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
> --
>
> Key: MAPREDUCE-7026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: supportability
> Attachments: MAPREDUCE-7026.1.patch, MAPREDUCE-7026.2.patch
>
>
> A job is failing with reduce tasks failed to fetch map output and the 
> NodeManager ShuffleHandler failed to serve the map outputs with some 
> IOException like below. ShuffleHandler sends the actual error message in 
> response inside sendError() but the Fetcher does not log this message.
> Logs from NodeManager ShuffleHandler:
> {code}
> 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler 
> (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating 
> headers :
> java.io.IOException: Error Reading IndexFile
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089)
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
> at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
> at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
> at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
> at 
> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
> at 
> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
> at 
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> at 
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Owner 'hbase' for path 
> /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index
>  did not match expected owner 'bd

[jira] [Updated] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler

2018-04-19 Thread Prabhu Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7026:
-
Attachment: MAPREDUCE-7026.2.patch

> Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
> --
>
> Key: MAPREDUCE-7026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: supportability
> Attachments: MAPREDUCE-7026.1.patch, MAPREDUCE-7026.2.patch
>
>
> A job is failing with reduce tasks failed to fetch map output and the 
> NodeManager ShuffleHandler failed to serve the map outputs with some 
> IOException like below. ShuffleHandler sends the actual error message in 
> response inside sendError() but the Fetcher does not log this message.
> Logs from NodeManager ShuffleHandler:
> {code}
> 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler 
> (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating 
> headers :
> java.io.IOException: Error Reading IndexFile
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089)
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
> at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
> at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
> at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
> at 
> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
> at 
> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
> at 
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> at 
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Owner 'hbase' for path 
> /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index
>  did not match expected owner 'bde'
> at 
> org.apache.hadoop.io.SecureIOUtils.checkStat(Secu

[jira] [Updated] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler

2018-04-19 Thread Prabhu Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7026:
-
Status: Patch Available  (was: Open)

> Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
> --
>
> Key: MAPREDUCE-7026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: supportability
> Attachments: MAPREDUCE-7026.1.patch
>
>
> A job is failing with reduce tasks failed to fetch map output and the 
> NodeManager ShuffleHandler failed to serve the map outputs with some 
> IOException like below. ShuffleHandler sends the actual error message in 
> response inside sendError() but the Fetcher does not log this message.
> Logs from NodeManager ShuffleHandler:
> {code}
> 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler 
> (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating 
> headers :
> java.io.IOException: Error Reading IndexFile
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089)
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
> at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
> at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
> at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
> at 
> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
> at 
> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
> at 
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> at 
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Owner 'hbase' for path 
> /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index
>  did not match expected owner 'bde'
> at 
> org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285)
> 

[jira] [Updated] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler

2018-04-19 Thread Prabhu Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7026:
-
Attachment: MAPREDUCE-7026.1.patch

> Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
> --
>
> Key: MAPREDUCE-7026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: supportability
> Attachments: MAPREDUCE-7026.1.patch
>
>
> A job is failing with reduce tasks failed to fetch map output and the 
> NodeManager ShuffleHandler failed to serve the map outputs with some 
> IOException like below. ShuffleHandler sends the actual error message in 
> response inside sendError() but the Fetcher does not log this message.
> Logs from NodeManager ShuffleHandler:
> {code}
> 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler 
> (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating 
> headers :
> java.io.IOException: Error Reading IndexFile
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089)
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
> at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
> at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
> at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
> at 
> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
> at 
> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
> at 
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> at 
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Owner 'hbase' for path 
> /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index
>  did not match expected owner 'bde'
> at 
> org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285)
>   

[jira] [Assigned] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler

2018-04-19 Thread Prabhu Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph reassigned MAPREDUCE-7026:


Assignee: Prabhu Joseph

> Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
> --
>
> Key: MAPREDUCE-7026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>  Labels: supportability
>
> A job is failing with reduce tasks failed to fetch map output and the 
> NodeManager ShuffleHandler failed to serve the map outputs with some 
> IOException like below. ShuffleHandler sends the actual error message in 
> response inside sendError() but the Fetcher does not log this message.
> Logs from NodeManager ShuffleHandler:
> {code}
> 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler 
> (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating 
> headers :
> java.io.IOException: Error Reading IndexFile
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089)
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
> at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
> at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
> at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
> at 
> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
> at 
> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
> at 
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> at 
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Owner 'hbase' for path 
> /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index
>  did not match expected owner 'bde'
> at 
> org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285)
> at 
> org.apache.hadoop.io.SecureIOUtils.force

[jira] [Updated] (MAPREDUCE-7071) Bypass the Fetcher and read directly from the local filesystem if source Mapper ran on the same host

2018-03-29 Thread Prabhu Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7071:
-
Description: 
When the source mapper and the reducer run on the same host, bypass the Fetcher 
and read the map output directly from the local filesystem.

The idea is from Tez - https://issues.apache.org/jira/browse/TEZ-1343

  was:In the case of the source mapper and reducer are on the same host bypass 
the Fetcher and read it directly from the local filesystem


> Bypass the Fetcher and read directly from the local filesystem if source 
> Mapper ran on the same host
> 
>
> Key: MAPREDUCE-7071
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7071
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: task
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> In the case of the source mapper and reducer are on the same host bypass the 
> Fetcher and read it directly from the local filesystem
> Idea is from Tez - https://issues.apache.org/jira/browse/TEZ-1343



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7071) Bypass the Fetcher and read directly from the local filesystem if source Mapper ran on the same host

2018-03-29 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7071:


 Summary: Bypass the Fetcher and read directly from the local 
filesystem if source Mapper ran on the same host
 Key: MAPREDUCE-7071
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7071
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: task
Affects Versions: 2.7.3
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


When the source mapper and the reducer run on the same host, bypass the Fetcher 
and read the map output directly from the local filesystem.
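
A minimal sketch of the host check this would need (illustrative only, not the
Fetcher API): compare the map output host with the local host and fall back to
the HTTP shuffle when they differ.

{code}
import java.io.IOException;
import java.net.InetAddress;

final class LocalFetchCheck {
  // True when the map output was produced on this same host and could be
  // read straight from the local filesystem instead of over HTTP.
  static boolean canReadLocally(String mapOutputHost) throws IOException {
    String localHost = InetAddress.getLocalHost().getCanonicalHostName();
    return localHost.equalsIgnoreCase(mapOutputHost);
  }
}
{code}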



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7045) JobListCache grows unlimited when the jobs are failed to move to done directory

2018-01-29 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7045:


 Summary: JobListCache grows unlimited when the jobs are failed to 
move to done directory
 Key: MAPREDUCE-7045
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7045
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 2.7.3
Reporter: Prabhu Joseph


When jobs fail to move to the done directory for some reason, such as a 
permission issue, the JobListCache grows without limit, holding all the failed 
jobs, and addIfAbsent() has to scan every cached item.
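
One way to bound such a cache, sketched with a plain LinkedHashMap
(illustrative only, not the JobHistoryServer JobListCache implementation):

{code}
import java.util.LinkedHashMap;
import java.util.Map;

// Evicts the eldest entry once a configured maximum is exceeded, so repeated
// move failures cannot grow the cache without limit.
final class BoundedJobCache<K, V> extends LinkedHashMap<K, V> {
  private final int maxEntries;

  BoundedJobCache(int maxEntries) {
    super(16, 0.75f, true /* access order */);
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > maxEntries;
  }
}
{code}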



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler

2017-12-18 Thread Prabhu Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7026:
-
Description: 
A job is failing because its reduce tasks fail to fetch map output, and the 
NodeManager ShuffleHandler fails to serve the map outputs with an IOException 
like the one below. The ShuffleHandler sends the actual error message in the 
response inside sendError(), but the Fetcher does not log this message.

Logs from NodeManager ShuffleHandler:

{code}
2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler 
(ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating 
headers :
java.io.IOException: Error Reading IndexFile
at 
org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089)
at 
org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958)
at 
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
at 
org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
at 
org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
at 
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at 
org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
at 
org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
at 
org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
at 
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Owner 'hbase' for path 
/grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index
 did not match expected owner 'bde'
at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285)
at 
org.apache.hadoop.io.SecureIOUtils.forceSecureOpenFSDataInputStream(SecureIOUtils.java:174)
at 
org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:158)
at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:70)
at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:62)
at 
org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:119)
{code}

The Fetcher logs below do not include the actual error message:

{code}
2017-12-18 10:10:17,688 INFO [IPC Server handler 1 on 35118] 
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from 
attempt_1511248592679_0039_r_00_0: Error: 
org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle 
in fetcher#3

[jira] [Updated] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler

2017-12-18 Thread Prabhu Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-7026:
-
Labels: supportability  (was: )

> Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler
> --
>
> Key: MAPREDUCE-7026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>  Labels: supportability
>
> A job is failing because the reduce tasks cannot fetch map output, and the 
> NodeManager ShuffleHandler fails to serve the map outputs with an IOException 
> like the one below. The ShuffleHandler sends the actual error message in the 
> response via sendError(), but the Fetcher does not log this message.
> Logs from NodeManager ShuffleHandler:
> {code}
> 2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler 
> (ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating 
> headers :
> java.io.IOException: Error Reading IndexFile
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089)
> at 
> org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
> at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
> at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
> at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
> at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
> at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
> at 
> org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
> at 
> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
> at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
> at 
> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
> at 
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> at 
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Owner 'hbase' for path 
> /grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index
>  did not match expected owner 'bde'
> at 
> org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285)
> at 
> org.apache.hadoop.io.SecureIOUtils.forceSecureOpenFSDataInputStream(SecureIOUtils.java:174)
> at 

[jira] [Created] (MAPREDUCE-7026) Shuffle Fetcher does not log the actual error message thrown by ShuffleHandler

2017-12-18 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-7026:


 Summary: Shuffle Fetcher does not log the actual error message 
thrown by ShuffleHandler
 Key: MAPREDUCE-7026
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7026
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Affects Versions: 2.7.3
Reporter: Prabhu Joseph


A job is failing because the reduce tasks cannot fetch map output, and the 
NodeManager ShuffleHandler fails to serve the map outputs with an IOException 
like the one below. The ShuffleHandler sends the actual error message in the 
response via sendError(), but the Fetcher does not log this message.

Logs from NodeManager ShuffleHandler:

{code}
2017-12-18 10:10:30,728 ERROR mapred.ShuffleHandler 
(ShuffleHandler.java:messageReceived(962)) - Shuffle error in populating 
headers :
java.io.IOException: Error Reading IndexFile
at 
org.apache.hadoop.mapred.ShuffleHandler$Shuffle.populateHeaders(ShuffleHandler.java:1089)
at 
org.apache.hadoop.mapred.ShuffleHandler$Shuffle.messageReceived(ShuffleHandler.java:958)
at 
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
at 
org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:142)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
at 
org.jboss.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:148)
at 
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at 
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:787)
at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at 
org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
at 
org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
at 
org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
at 
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:555)
at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at 
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:107)
at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at 
org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at 
org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Owner 'hbase' for path 
/grid/7/hadoop/yarn/local/usercache/bde/appcache/application_1512457770852_9447/output/attempt_1512457770852_9447_1_01_07_0_10004/file.out.index
 did not match expected owner 'bde'
at org.apache.hadoop.io.SecureIOUtils.checkStat(SecureIOUtils.java:285)
at 
org.apache.hadoop.io.SecureIOUtils.forceSecureOpenFSDataInputStream(SecureIOUtils.java:174)
at 
org.apache.hadoop.io.SecureIOUtils.openFSDataInputStream(SecureIOUtils.java:158)
at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:70)
at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:62)
at 
org.apache.hadoop.mapred.IndexCache.readIndexFileToCache(IndexCache.java:119)
{code}

The Fetcher logs below, in contrast, do not include the actual error message:

{code}
2017-12-18 10:10:17,688 INFO [IPC Server h

[jira] [Commented] (MAPREDUCE-6975) Logging task counters

2017-11-06 Thread Prabhu Joseph (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240222#comment-16240222
 ] 

Prabhu Joseph commented on MAPREDUCE-6975:
--

Thanks a lot [~Naganarasimha]

> Logging task counters 
> --
>
> Key: MAPREDUCE-6975
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6975
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
> Fix For: 2.9.0, 2.7.5, 3.0.0, 2.8.3
>
> Attachments: Log_Output, MAPREDUCE-6975.1.patch, 
> MAPREDUCE-6975.2.patch, MAPREDUCE-6975.patch
>
>
> Logging counters for each task at the end of its syslog will make debugging 
> easier with just the application logs. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6975) Logging task counters

2017-11-03 Thread Prabhu Joseph (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16237523#comment-16237523
 ] 

Prabhu Joseph commented on MAPREDUCE-6975:
--

Hi [~Naganarasimha], if you get some time, could you review this?

> Logging task counters 
> --
>
> Key: MAPREDUCE-6975
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6975
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: Log_Output, MAPREDUCE-6975.1.patch, 
> MAPREDUCE-6975.2.patch, MAPREDUCE-6975.patch
>
>
> Logging counters for each task at the end of its syslog will make debugging 
> easier with just the application logs. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-6993) Provide additional aggregated task stats at the Map / Reduce level

2017-10-27 Thread Prabhu Joseph (JIRA)
Prabhu Joseph created MAPREDUCE-6993:


 Summary: Provide additional aggregated task stats at the Map / 
Reduce level
 Key: MAPREDUCE-6993
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6993
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.7.3
Reporter: Prabhu Joseph


The MapReduce ApplicationMaster can log aggregated task stats for the Map / Reduce 
stages, like the list below, which will make debugging easier. This is similar to 
what Tez provides in TEZ-930; a minimal aggregation sketch follows the list.

firstTaskStartTime
firstTasksToStart
lastTaskFinishTime
lastTasksToFinish
minTaskDuration
maxTaskDuration 
avgTaskDuration
numSuccessfulTasks
shortestDurationTasks
longestDurationTasks
numFailedTaskAttempts
numKilledTaskAttempts
numCompletedTasks
numSucceededTasks
numKilledTasks
numFailedTasks
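
A minimal sketch of how a few of these could be derived from per-task start and 
finish times (TaskTiming is a stand-in for whatever the AM already tracks; this 
is not the proposed patch):

{code}
// Hedged sketch: aggregate a handful of the stats listed above from per-task
// timings. The TaskTiming type is a stand-in, not a real MapReduce class.
import java.util.Arrays;
import java.util.List;
import java.util.LongSummaryStatistics;

public class TaskStageStats {

  static class TaskTiming {
    final long startTime, finishTime;
    TaskTiming(long startTime, long finishTime) {
      this.startTime = startTime;
      this.finishTime = finishTime;
    }
    long duration() { return finishTime - startTime; }
  }

  public static void main(String[] args) {
    List<TaskTiming> tasks = Arrays.asList(
        new TaskTiming(1000, 4000), new TaskTiming(1200, 2200),
        new TaskTiming(1500, 9000));

    LongSummaryStatistics d = tasks.stream()
        .mapToLong(TaskTiming::duration).summaryStatistics();
    long firstStart = tasks.stream().mapToLong(t -> t.startTime).min().getAsLong();
    long lastFinish = tasks.stream().mapToLong(t -> t.finishTime).max().getAsLong();

    System.out.println("firstTaskStartTime = " + firstStart);
    System.out.println("lastTaskFinishTime = " + lastFinish);
    System.out.println("minTaskDuration    = " + d.getMin());
    System.out.println("maxTaskDuration    = " + d.getMax());
    System.out.println("avgTaskDuration    = " + d.getAverage());
    System.out.println("numCompletedTasks  = " + d.getCount());
  }
}
{code}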



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6975) Logging task counters

2017-10-25 Thread Prabhu Joseph (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16218324#comment-16218324
 ] 

Prabhu Joseph commented on MAPREDUCE-6975:
--

Thanks [~Naganarasimha]

> Logging task counters 
> --
>
> Key: MAPREDUCE-6975
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6975
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
> Attachments: Log_Output, MAPREDUCE-6975.1.patch, 
> MAPREDUCE-6975.2.patch, MAPREDUCE-6975.patch
>
>
> Logging counters for each task at the end of its syslog will make debugging 
> easier with just the application logs. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6975) Logging task counters

2017-10-25 Thread Prabhu Joseph (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16218306#comment-16218306
 ] 

Prabhu Joseph commented on MAPREDUCE-6975:
--

[~Naganarasimha] [~rohithsharma] Could you help review this?

> Logging task counters 
> --
>
> Key: MAPREDUCE-6975
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6975
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
> Attachments: Log_Output, MAPREDUCE-6975.1.patch, 
> MAPREDUCE-6975.2.patch, MAPREDUCE-6975.patch
>
>
> Logging counters for each task at the end of its syslog will make debugging 
> easier with just the application logs. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6975) Logging task counters

2017-10-16 Thread Prabhu Joseph (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206187#comment-16206187
 ] 

Prabhu Joseph commented on MAPREDUCE-6975:
--

[~Naganarasimha] Did some testing with failure cases. When the AM fails, the 
diagnostics from stderr are displayed on the client and in the RM UI, and they do 
not include the logged task counters. When a task fails, no diagnostics are 
captured. Based on this testing, I did not see any confusion caused by logging the 
task counters.



> Logging task counters 
> --
>
> Key: MAPREDUCE-6975
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6975
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
> Attachments: Log_Output, MAPREDUCE-6975.1.patch, 
> MAPREDUCE-6975.2.patch, MAPREDUCE-6975.patch
>
>
> Logging counters for each task at the end of its syslog will make debugging 
> easier with just the application logs. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6975) Logging task counters

2017-10-12 Thread Prabhu Joseph (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201783#comment-16201783
 ] 

Prabhu Joseph commented on MAPREDUCE-6975:
--

[~Naganarasimha] It looks like we capture the diagnostic message from the 
ApplicationMaster and not from the task containers; the counters we log are only 
for task containers and will be in the syslog. Correct me if I am missing something.

> Logging task counters 
> --
>
> Key: MAPREDUCE-6975
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6975
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
> Attachments: Log_Output, MAPREDUCE-6975.1.patch, 
> MAPREDUCE-6975.2.patch, MAPREDUCE-6975.patch
>
>
> Logging counters for each task at the end of its syslog will make debugging 
> easier with just the application logs. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6975) Logging task counters

2017-10-12 Thread Prabhu Joseph (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16201723#comment-16201723
 ] 

Prabhu Joseph commented on MAPREDUCE-6975:
--

[~Naganarasimha] Yes, I have used Counters.toString(), which outputs each counter 
on a separate line.
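
For illustration only (a plain map stands in here for the real Counters object, 
whose toString() already does this), the one-counter-per-line form that ends up 
in the task syslog looks roughly like:

{code}
// Hedged demo: build a "Counters"-style, one-counter-per-line string the way
// the task syslog would show it. A Map stands in for the real Counters class.
import java.util.LinkedHashMap;
import java.util.Map;

public class CounterLogDemo {
  public static void main(String[] args) {
    Map<String, Long> counters = new LinkedHashMap<>();
    counters.put("MAP_INPUT_RECORDS", 1250L);
    counters.put("SPILLED_RECORDS", 1250L);
    counters.put("GC_TIME_MILLIS", 42L);

    StringBuilder out = new StringBuilder("Counters: " + counters.size());
    counters.forEach((name, value) -> out.append(System.lineSeparator())
        .append('\t').append(name).append('=').append(value));
    System.out.println(out);   // in a task this output goes to its syslog
  }
}
{code}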

> Logging task counters 
> --
>
> Key: MAPREDUCE-6975
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6975
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 2.7.3
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
> Attachments: Log_Output, MAPREDUCE-6975.1.patch, 
> MAPREDUCE-6975.2.patch, MAPREDUCE-6975.patch
>
>
> Logging counters for each task at the end of it's syslog will make debug 
> easier with just application logs. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org


