[jira] [Created] (MAPREDUCE-7473) Entity id/type not updated for HistoryEvent NORMALIZED_RESOURCE

2024-03-20 Thread Bilwa S T (Jira)
Bilwa S T created MAPREDUCE-7473:


 Summary: Entity id/type not updated for HistoryEvent 
NORMALIZED_RESOURCE
 Key: MAPREDUCE-7473
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7473
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Bilwa S T
Assignee: Bilwa S T


Getting the below exception in the MR AM logs:

2024-03-09 16:23:30,329 ERROR [Job ATS Event Dispatcher] org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Error putting entity null to TimelineServer
org.apache.hadoop.yarn.exceptions.YarnException: Incomplete entity without entity id/type
	at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:88)
	at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:187)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1129)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleTimelineEvent(JobHistoryEventHandler.java:745)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.access$1200(JobHistoryEventHandler.java:93)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1795)
	at org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1791)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:241)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:156)
	at java.base/java.lang.Thread.run(Thread.java:840)
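The validation that rejects the entity can be sketched as follows. This is a minimal illustration, not the actual Hadoop classes: `Entity` stands in for `TimelineEntity`, and `isComplete` mirrors the guard in `TimelineWriter.putEntities` that throws the `YarnException` above. Presumably the NORMALIZED_RESOURCE history event trips this check because its conversion to a timeline entity never fills in the id/type fields.

```java
// Minimal sketch of the id/type guard in TimelineWriter.putEntities.
// Class and field names are illustrative, not the real Hadoop API.
public class EntityCheck {

    // Stand-in for org.apache.hadoop.yarn.api.records.timeline.TimelineEntity.
    public static class Entity {
        String entityId;
        String entityType;
    }

    // An entity is accepted only when both id and type are set; otherwise the
    // writer raises "Incomplete entity without entity id/type".
    public static boolean isComplete(Entity e) {
        return e.entityId != null && e.entityType != null;
    }

    public static void main(String[] args) {
        Entity e = new Entity();
        e.entityId = "job_1622107691213_1054"; // id set, but type left null
        System.out.println(isComplete(e)
                ? "ok"
                : "Incomplete entity without entity id/type");
    }
}
```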



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6784) JobImpl state changes for containers reuse

2022-10-13 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-6784:
-
Status: Patch Available  (was: Open)

> JobImpl state changes for containers reuse
> --
>
> Key: MAPREDUCE-6784
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6784
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-6784-v0.patch
>
>
> Add JobImpl state changes for supporting reusing of containers.






[jira] [Commented] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2021-09-02 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17408773#comment-17408773
 ] 

Bilwa S T commented on MAPREDUCE-7169:
--

Hi [~jeagles]

Thanks for your review.

* Why should denying racks and hosts be enabled separately? Can you please 
elaborate? Currently we try to avoid launching on the same rack as the old 
attempt; if there are no containers on a different rack, we then try choosing 
a node other than the old attempt's node.
* I will update the patch by changing blacklist to denylist.
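The selection order described above (prefer a different rack; failing that, at least a different node) can be sketched as below. All names here are hypothetical, not the actual RMContainerAllocator code or the attached patch.

```java
// Illustrative placement preference for a speculative attempt: first try a
// node on a different rack than the original attempt; if none exists, settle
// for a different host on the same rack. Names are hypothetical.
public class SpeculativePlacement {

    public static class Node {
        final String host;
        final String rack;
        public Node(String host, String rack) { this.host = host; this.rack = rack; }
    }

    // Returns null only when the sole candidate is the original node itself.
    public static Node pick(Node[] candidates, Node original) {
        // First preference: any candidate on a different rack.
        for (Node n : candidates) {
            if (!n.rack.equals(original.rack)) {
                return n;
            }
        }
        // Second preference: same rack, but not the original host.
        for (Node n : candidates) {
            if (!n.host.equals(original.host)) {
                return n;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Node original = new Node("node-group-1ZYEq0002", "/rack-2");
        Node[] candidates = {
            new Node("node-group-1ZYEq0003", "/rack-2"),
            new Node("node-group-1ZYEq0004", "/rack-3")
        };
        // The off-rack candidate wins over the same-rack one.
        System.out.println(pick(candidates, original).host); // prints node-group-1ZYEq0004
    }
}
```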





> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, MAPREDUCE-7169.004.patch, MAPREDUCE-7169.005.patch, 
> MAPREDUCE-7169.006.patch, MAPREDUCE-7169.007.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, speculative execution may place the 
> speculative task on the same node as the original task. From what I have 
> read, it only tries to launch one more task attempt; I haven't seen any 
> place mentioning that it should not run on the same node. This is 
> unreasonable: if the node has problems that make task execution very slow, 
> placing the speculative task on the same node cannot help the problematic 
> task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Commented] (MAPREDUCE-7353) Mapreduce job fails when NM is stopped

2021-07-07 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17377054#comment-17377054
 ] 

Bilwa S T commented on MAPREDUCE-7353:
--

Thank you [~epayne]

> Mapreduce job fails when NM is stopped
> --
>
> Key: MAPREDUCE-7353
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7353
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Fix For: 3.4.0, 2.10.2, 3.2.3, 3.3.2
>
> Attachments: MAPREDUCE-7353.001.patch, MAPREDUCE-7353.002.patch
>
>
> Job fails because its task fails due to too many fetch failures:
> {code:java}
> Line 48048: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | 
> Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container 
> container_e03_1622107691213_1054_01_05 taskAttempt 
> attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:394
>   Line 48053: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | 
> KILLING attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:209
>   Line 58026: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
> handler | TaskAttempt killed because it ran on unusable node 
> node-group-1ZYEq0002:26009. AttemptId:attempt_1622107691213_1054_m_00_0 | 
> JobImpl.java:1401
>   Line 58030: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 58035: 2021-06-02 16:26:34,034 | INFO  | RMCommunicator Allocator 
> | Killing taskAttempt:attempt_1622107691213_1054_m_00_0 because it is 
> running on unusable node:node-group-1ZYEq0002:26009 | 
> RMContainerAllocator.java:1066
>   Line 58043: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 58054: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
>   Line 58055: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
> handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
> Container released on a *lost* node | TaskAttemptImpl.java:2649
>   Line 58057: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 60317: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
> handler | Too many fetch-failures for output of task attempt: 
> attempt_1622107691213_1054_m_00_0 ... raising fetch failure to map | 
> JobImpl.java:2005
>   Line 60319: 2021-06-02 16:26:57,057 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_TOO_MANY_FETCH_FAILURE | TaskAttemptImpl.java:1390
>   Line 60320: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
> handler | attempt_1622107691213_1054_m_00_0 transitioned from state 
> SUCCESS_CONTAINER_CLEANUP to FAILED, event type is TA_TOO_MANY_FETCH_FAILURE 
> and nodeId=node-group-1ZYEq0002:26009 | TaskAttemptImpl.java:1411
>   Line 69487: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
>   Line 69527: 2021-06-02 16:30:02,002 | INFO  | AsyncDispatcher event 
> handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
> cleanup failed for container container_e03_1622107691213_1054_01_05 : 
> java.net.ConnectException: Call From node-group-1ZYEq0001/192.168.0.66 to 
> node-group-1ZYEq0002:26009 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>   Line 69607: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
>   Line 69609: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
>   Line 73645: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | Fetcher 9 
> going to fetch from node-group-1ZYEq0002:26008 for: 
> [attempt_1622107691213_1054_m_00_0] | Fetcher.java:318
>   Line 73646: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | MapOutput URL 
> for node-group-1ZYEq0002:26008 -> 
> http://node-group-1ZYEq0002:26008/mapOutput?job=job_1622107691213_1054=4=attempt_1622107691213_1054_m_00_0
>  | Fetcher.java:686
> {code}

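The decision behind the log line "Too many fetch-failures ... raising fetch failure to map" can be sketched as follows: once enough reducers report that they cannot fetch a map's output, the already-successful map attempt is failed and re-run. This is a simplified illustration; the class name and the threshold value are assumptions, not the actual JobImpl logic or its configured limits.

```java
// Illustrative sketch of fetch-failure escalation: a completed map attempt is
// failed once the fraction of reducers reporting fetch failures against it
// crosses a threshold. The threshold here is an assumption for illustration.
public class FetchFailureCheck {

    static final double MAX_FAILED_FRACTION = 0.5; // assumed, not the real config value

    // Returns true when the map's output should be declared lost
    // ("raising fetch failure to map" in the AM log).
    public static boolean shouldFailMap(int failureReports, int runningReducers) {
        if (runningReducers == 0) {
            return false; // nothing is shuffling, nothing to escalate
        }
        return (double) failureReports / runningReducers > MAX_FAILED_FRACTION;
    }

    public static void main(String[] args) {
        // 3 of 4 reducers report failures against the map: escalate.
        System.out.println(shouldFailMap(3, 4)); // prints true
    }
}
```

In the logs above, the escalation is what moves `attempt_1622107691213_1054_m_00_0` from SUCCESS_CONTAINER_CLEANUP to FAILED.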
[jira] [Commented] (MAPREDUCE-6786) TaskAttemptImpl state changes for containers reuse

2021-07-03 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17373949#comment-17373949
 ] 

Bilwa S T commented on MAPREDUCE-6786:
--

Hi [~devaraj] [~brahma]

Can you please take a look at this whenever you get time? Thank you

> TaskAttemptImpl state changes for containers reuse
> --
>
> Key: MAPREDUCE-6786
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6786
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-6786-MR-6749.001.patch, 
> MAPREDUCE-6786-MR-6749.002.patch, MAPREDUCE-6786-v0.patch, 
> MAPREDUCE-6786.001.patch, MAPREDUCE-6786.002.patch
>
>
> Update TaskAttemptImpl to support the reuse of containers.






[jira] [Commented] (MAPREDUCE-7353) Mapreduce job fails when NM is stopped

2021-06-30 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371932#comment-17371932
 ] 

Bilwa S T commented on MAPREDUCE-7353:
--

Hi [~epayne], can you please check the updated patch? Thanks

> Mapreduce job fails when NM is stopped
> --
>
> Key: MAPREDUCE-7353
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7353
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7353.001.patch, MAPREDUCE-7353.002.patch
>
>
> Job fails as task fail due to too many fetch failures 

[jira] [Commented] (MAPREDUCE-7353) Mapreduce job fails when NM is stopped

2021-06-24 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368812#comment-17368812
 ] 

Bilwa S T commented on MAPREDUCE-7353:
--

Thanks [~epayne] for your review. I have added a UT. Please take a look

> Mapreduce job fails when NM is stopped
> --
>
> Key: MAPREDUCE-7353
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7353
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7353.001.patch, MAPREDUCE-7353.002.patch
>
>
> Job fails as task fail due to too many fetch failures 

[jira] [Updated] (MAPREDUCE-7353) Mapreduce job fails when NM is stopped

2021-06-24 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7353:
-
Attachment: MAPREDUCE-7353.002.patch

> Mapreduce job fails when NM is stopped
> --
>
> Key: MAPREDUCE-7353
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7353
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7353.001.patch, MAPREDUCE-7353.002.patch
>
>
> Job fails as task fail due to too many fetch failures 

[jira] [Commented] (MAPREDUCE-7353) Mapreduce job fails when NM is stopped

2021-06-23 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17368087#comment-17368087
 ] 

Bilwa S T commented on MAPREDUCE-7353:
--

Hi [~epayne]
Can you please take a look at this today if possible? Thanks

> Mapreduce job fails when NM is stopped
> --
>
> Key: MAPREDUCE-7353
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7353
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7353.001.patch
>
>
> Job fails as task fail due to too many fetch failures 

[jira] [Commented] (MAPREDUCE-7353) Mapreduce job fails when NM is stopped

2021-06-17 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17364993#comment-17364993
 ] 

Bilwa S T commented on MAPREDUCE-7353:
--

Ok Thanks [~epayne]

> Mapreduce job fails when NM is stopped
> --
>
> Key: MAPREDUCE-7353
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7353
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7353.001.patch
>
>
> Job fails as task fail due to too many fetch failures 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_TOO_MANY_FETCH_FAILURE | TaskAttemptImpl.java:1390
>   Line 60320: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
> handler | attempt_1622107691213_1054_m_00_0 transitioned from state 
> SUCCESS_CONTAINER_CLEANUP to FAILED, event type is TA_TOO_MANY_FETCH_FAILURE 
> and nodeId=node-group-1ZYEq0002:26009 | TaskAttemptImpl.java:1411
>   Line 69487: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
>   Line 69527: 2021-06-02 16:30:02,002 | INFO  | AsyncDispatcher event 
> handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
> cleanup failed for container container_e03_1622107691213_1054_01_05 : 
> java.net.ConnectException: Call From node-group-1ZYEq0001/192.168.0.66 to 
> node-group-1ZYEq0002:26009 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>   Line 69607: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
>   Line 69609: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
>   Line 73645: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | Fetcher 9 
> going to fetch from node-group-1ZYEq0002:26008 for: 
> [attempt_1622107691213_1054_m_00_0] | Fetcher.java:318
>   Line 73646: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | MapOutput URL 
> for node-group-1ZYEq0002:26008 -> 
> http://node-group-1ZYEq0002:26008/mapOutput?job=job_1622107691213_1054&reduce=4&map=attempt_1622107691213_1054_m_00_0
>  | Fetcher.java:686
>   Line 74093: 2021-06-02 16:26:56,056 | INFO  | fetcher#9 | Reporting 
> 

[jira] [Updated] (MAPREDUCE-7353) Mapreduce job fails when NM is stopped

2021-06-16 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7353:
-
Status: Patch Available  (was: Open)

> Mapreduce job fails when NM is stopped
> --
>
> Key: MAPREDUCE-7353
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7353
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7353.001.patch
>
>
> The job fails because a task fails due to too many fetch failures:
> {code:java}
> Line 48048: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | 
> Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container 
> container_e03_1622107691213_1054_01_05 taskAttempt 
> attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:394
>   Line 48053: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | 
> KILLING attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:209
>   Line 58026: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
> handler | TaskAttempt killed because it ran on unusable node 
> node-group-1ZYEq0002:26009. AttemptId:attempt_1622107691213_1054_m_00_0 | 
> JobImpl.java:1401
>   Line 58030: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 58035: 2021-06-02 16:26:34,034 | INFO  | RMCommunicator Allocator 
> | Killing taskAttempt:attempt_1622107691213_1054_m_00_0 because it is 
> running on unusable node:node-group-1ZYEq0002:26009 | 
> RMContainerAllocator.java:1066
>   Line 58043: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 58054: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
>   Line 58055: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
> handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
> Container released on a *lost* node | TaskAttemptImpl.java:2649
>   Line 58057: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 60317: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
> handler | Too many fetch-failures for output of task attempt: 
> attempt_1622107691213_1054_m_00_0 ... raising fetch failure to map | 
> JobImpl.java:2005
>   Line 60319: 2021-06-02 16:26:57,057 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_TOO_MANY_FETCH_FAILURE | TaskAttemptImpl.java:1390
>   Line 60320: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
> handler | attempt_1622107691213_1054_m_00_0 transitioned from state 
> SUCCESS_CONTAINER_CLEANUP to FAILED, event type is TA_TOO_MANY_FETCH_FAILURE 
> and nodeId=node-group-1ZYEq0002:26009 | TaskAttemptImpl.java:1411
>   Line 69487: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
>   Line 69527: 2021-06-02 16:30:02,002 | INFO  | AsyncDispatcher event 
> handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
> cleanup failed for container container_e03_1622107691213_1054_01_05 : 
> java.net.ConnectException: Call From node-group-1ZYEq0001/192.168.0.66 to 
> node-group-1ZYEq0002:26009 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>   Line 69607: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
>   Line 69609: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
>   Line 73645: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | Fetcher 9 
> going to fetch from node-group-1ZYEq0002:26008 for: 
> [attempt_1622107691213_1054_m_00_0] | Fetcher.java:318
>   Line 73646: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | MapOutput URL 
> for node-group-1ZYEq0002:26008 -> 
> http://node-group-1ZYEq0002:26008/mapOutput?job=job_1622107691213_1054&reduce=4&map=attempt_1622107691213_1054_m_00_0
>  | Fetcher.java:686
>   Line 74093: 2021-06-02 16:26:56,056 | INFO  | fetcher#9 | Reporting 
> fetch failure for 

[jira] [Updated] (MAPREDUCE-7353) Mapreduce job fails when NM is stopped

2021-06-16 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7353:
-
Attachment: MAPREDUCE-7353.001.patch

> Mapreduce job fails when NM is stopped
> --
>
> Key: MAPREDUCE-7353
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7353
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7353.001.patch
>
>
> The job fails because a task fails due to too many fetch failures:
> {code:java}
> Line 48048: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | 
> Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container 
> container_e03_1622107691213_1054_01_05 taskAttempt 
> attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:394
>   Line 48053: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | 
> KILLING attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:209
>   Line 58026: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
> handler | TaskAttempt killed because it ran on unusable node 
> node-group-1ZYEq0002:26009. AttemptId:attempt_1622107691213_1054_m_00_0 | 
> JobImpl.java:1401
>   Line 58030: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 58035: 2021-06-02 16:26:34,034 | INFO  | RMCommunicator Allocator 
> | Killing taskAttempt:attempt_1622107691213_1054_m_00_0 because it is 
> running on unusable node:node-group-1ZYEq0002:26009 | 
> RMContainerAllocator.java:1066
>   Line 58043: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 58054: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
>   Line 58055: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
> handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
> Container released on a *lost* node | TaskAttemptImpl.java:2649
>   Line 58057: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 60317: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
> handler | Too many fetch-failures for output of task attempt: 
> attempt_1622107691213_1054_m_00_0 ... raising fetch failure to map | 
> JobImpl.java:2005
>   Line 60319: 2021-06-02 16:26:57,057 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_TOO_MANY_FETCH_FAILURE | TaskAttemptImpl.java:1390
>   Line 60320: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
> handler | attempt_1622107691213_1054_m_00_0 transitioned from state 
> SUCCESS_CONTAINER_CLEANUP to FAILED, event type is TA_TOO_MANY_FETCH_FAILURE 
> and nodeId=node-group-1ZYEq0002:26009 | TaskAttemptImpl.java:1411
>   Line 69487: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
>   Line 69527: 2021-06-02 16:30:02,002 | INFO  | AsyncDispatcher event 
> handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
> cleanup failed for container container_e03_1622107691213_1054_01_05 : 
> java.net.ConnectException: Call From node-group-1ZYEq0001/192.168.0.66 to 
> node-group-1ZYEq0002:26009 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>   Line 69607: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
>   Line 69609: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
>   Line 73645: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | Fetcher 9 
> going to fetch from node-group-1ZYEq0002:26008 for: 
> [attempt_1622107691213_1054_m_00_0] | Fetcher.java:318
>   Line 73646: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | MapOutput URL 
> for node-group-1ZYEq0002:26008 -> 
> http://node-group-1ZYEq0002:26008/mapOutput?job=job_1622107691213_1054&reduce=4&map=attempt_1622107691213_1054_m_00_0
>  | Fetcher.java:686
>   Line 74093: 2021-06-02 16:26:56,056 | INFO  | fetcher#9 | Reporting 
> fetch failure for 

[jira] [Commented] (MAPREDUCE-7353) Mapreduce job fails when NM is stopped

2021-06-16 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17364233#comment-17364233
 ] 

Bilwa S T commented on MAPREDUCE-7353:
--

cc [~epayne] [~jbrennan]

> Mapreduce job fails when NM is stopped
> --
>
> Key: MAPREDUCE-7353
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7353
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7353.001.patch
>
>
> The job fails because a task fails due to too many fetch failures:
> {code:java}
> Line 48048: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | 
> Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container 
> container_e03_1622107691213_1054_01_05 taskAttempt 
> attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:394
>   Line 48053: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | 
> KILLING attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:209
>   Line 58026: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
> handler | TaskAttempt killed because it ran on unusable node 
> node-group-1ZYEq0002:26009. AttemptId:attempt_1622107691213_1054_m_00_0 | 
> JobImpl.java:1401
>   Line 58030: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 58035: 2021-06-02 16:26:34,034 | INFO  | RMCommunicator Allocator 
> | Killing taskAttempt:attempt_1622107691213_1054_m_00_0 because it is 
> running on unusable node:node-group-1ZYEq0002:26009 | 
> RMContainerAllocator.java:1066
>   Line 58043: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 58054: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
>   Line 58055: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
> handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
> Container released on a *lost* node | TaskAttemptImpl.java:2649
>   Line 58057: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
> TaskAttemptImpl.java:1390
>   Line 60317: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
> handler | Too many fetch-failures for output of task attempt: 
> attempt_1622107691213_1054_m_00_0 ... raising fetch failure to map | 
> JobImpl.java:2005
>   Line 60319: 2021-06-02 16:26:57,057 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_TOO_MANY_FETCH_FAILURE | TaskAttemptImpl.java:1390
>   Line 60320: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
> handler | attempt_1622107691213_1054_m_00_0 transitioned from state 
> SUCCESS_CONTAINER_CLEANUP to FAILED, event type is TA_TOO_MANY_FETCH_FAILURE 
> and nodeId=node-group-1ZYEq0002:26009 | TaskAttemptImpl.java:1411
>   Line 69487: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
>   Line 69527: 2021-06-02 16:30:02,002 | INFO  | AsyncDispatcher event 
> handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
> cleanup failed for container container_e03_1622107691213_1054_01_05 : 
> java.net.ConnectException: Call From node-group-1ZYEq0001/192.168.0.66 to 
> node-group-1ZYEq0002:26009 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>   Line 69607: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
>   Line 69609: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
> handler | Processing attempt_1622107691213_1054_m_00_0 of type 
> TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
>   Line 73645: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | Fetcher 9 
> going to fetch from node-group-1ZYEq0002:26008 for: 
> [attempt_1622107691213_1054_m_00_0] | Fetcher.java:318
>   Line 73646: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | MapOutput URL 
> for node-group-1ZYEq0002:26008 -> 
> http://node-group-1ZYEq0002:26008/mapOutput?job=job_1622107691213_1054=4=attempt_1622107691213_1054_m_00_0
>  | Fetcher.java:686
>   Line 74093: 2021-06-02 16:26:56,056 | INFO  | fetcher#9 | 

[jira] [Created] (MAPREDUCE-7353) Mapreduce job fails when NM is stopped

2021-06-16 Thread Bilwa S T (Jira)
Bilwa S T created MAPREDUCE-7353:


 Summary: Mapreduce job fails when NM is stopped
 Key: MAPREDUCE-7353
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7353
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Bilwa S T
Assignee: Bilwa S T


The job fails because a task fails due to too many fetch failures:
{code:java}
Line 48048: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | Processing 
the event EventType: CONTAINER_REMOTE_CLEANUP for container 
container_e03_1622107691213_1054_01_05 taskAttempt 
attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:394
Line 48053: 2021-06-02 16:25:02,002 | INFO  | ContainerLauncher #6 | 
KILLING attempt_1622107691213_1054_m_00_0 | ContainerLauncherImpl.java:209
Line 58026: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
handler | TaskAttempt killed because it ran on unusable node 
node-group-1ZYEq0002:26009. AttemptId:attempt_1622107691213_1054_m_00_0 | 
JobImpl.java:1401
Line 58030: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
TaskAttemptImpl.java:1390
Line 58035: 2021-06-02 16:26:34,034 | INFO  | RMCommunicator Allocator 
| Killing taskAttempt:attempt_1622107691213_1054_m_00_0 because it is 
running on unusable node:node-group-1ZYEq0002:26009 | 
RMContainerAllocator.java:1066
Line 58043: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
TaskAttemptImpl.java:1390
Line 58054: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
handler | Processing attempt_1622107691213_1054_m_00_0 of type 
TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
Line 58055: 2021-06-02 16:26:34,034 | INFO  | AsyncDispatcher event 
handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
Container released on a *lost* node | TaskAttemptImpl.java:2649
Line 58057: 2021-06-02 16:26:34,034 | DEBUG | AsyncDispatcher event 
handler | Processing attempt_1622107691213_1054_m_00_0 of type TA_KILL | 
TaskAttemptImpl.java:1390
Line 60317: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
handler | Too many fetch-failures for output of task attempt: 
attempt_1622107691213_1054_m_00_0 ... raising fetch failure to map | 
JobImpl.java:2005
Line 60319: 2021-06-02 16:26:57,057 | DEBUG | AsyncDispatcher event 
handler | Processing attempt_1622107691213_1054_m_00_0 of type 
TA_TOO_MANY_FETCH_FAILURE | TaskAttemptImpl.java:1390
Line 60320: 2021-06-02 16:26:57,057 | INFO  | AsyncDispatcher event 
handler | attempt_1622107691213_1054_m_00_0 transitioned from state 
SUCCESS_CONTAINER_CLEANUP to FAILED, event type is TA_TOO_MANY_FETCH_FAILURE 
and nodeId=node-group-1ZYEq0002:26009 | TaskAttemptImpl.java:1411
Line 69487: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
handler | Processing attempt_1622107691213_1054_m_00_0 of type 
TA_DIAGNOSTICS_UPDATE | TaskAttemptImpl.java:1390
Line 69527: 2021-06-02 16:30:02,002 | INFO  | AsyncDispatcher event 
handler | Diagnostics report from attempt_1622107691213_1054_m_00_0: 
cleanup failed for container container_e03_1622107691213_1054_01_05 : 
java.net.ConnectException: Call From node-group-1ZYEq0001/192.168.0.66 to 
node-group-1ZYEq0002:26009 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused
Line 69607: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
handler | Processing attempt_1622107691213_1054_m_00_0 of type 
TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
Line 69609: 2021-06-02 16:30:02,002 | DEBUG | AsyncDispatcher event 
handler | Processing attempt_1622107691213_1054_m_00_0 of type 
TA_CONTAINER_CLEANED | TaskAttemptImpl.java:1390
Line 73645: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | Fetcher 9 
going to fetch from node-group-1ZYEq0002:26008 for: 
[attempt_1622107691213_1054_m_00_0] | Fetcher.java:318
Line 73646: 2021-06-02 16:23:56,056 | DEBUG | fetcher#9 | MapOutput URL 
for node-group-1ZYEq0002:26008 -> 
http://node-group-1ZYEq0002:26008/mapOutput?job=job_1622107691213_1054&reduce=4&map=attempt_1622107691213_1054_m_00_0
 | Fetcher.java:686
Line 74093: 2021-06-02 16:26:56,056 | INFO  | fetcher#9 | Reporting 
fetch failure for attempt_1622107691213_1054_m_00_0 to MRAppMaster. | 
ShuffleSchedulerImpl.java:349
{code}

As we can see from the logs, the RM notified the AM about the node update at 
16:26:34, but the event was skipped because TA_KILL is ignored while 
TaskAttemptImpl is in the SUCCESS_CONTAINER_CLEANUP state. The attempt 
therefore remains successful, reducers keep trying to fetch its output from 
the stopped NM, and the resulting TA_TOO_MANY_FETCH_FAILURE event fails the 
task, and ultimately the job.
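The skipped-transition behavior described above can be sketched as a toy state machine (illustrative only — the enum values and the handler below are simplified stand-ins, not the actual TaskAttemptImpl transition table):

```java
// Toy sketch of the transition behavior described above. The state and event
// names mirror the MR AM logs, but this is a simplified illustration, not the
// real TaskAttemptImpl state machine.
public class TaskAttemptSketch {
    enum State { SUCCESS_CONTAINER_CLEANUP, SUCCEEDED, FAILED, KILLED }

    static State handle(State current, String event) {
        switch (event) {
            case "TA_KILL":
                // The kill triggered by the unusable-node report is ignored
                // while the attempt is in SUCCESS_CONTAINER_CLEANUP, so the
                // attempt stays "successful" on a dead node.
                if (current == State.SUCCESS_CONTAINER_CLEANUP) {
                    return current;
                }
                return State.KILLED;
            case "TA_TOO_MANY_FETCH_FAILURE":
                // Reducers later fail to fetch from the stopped NM; the
                // accumulated fetch failures fail the map attempt outright
                // instead of it having been rescheduled as killed.
                return State.FAILED;
            default:
                return current;
        }
    }

    public static void main(String[] args) {
        State s = State.SUCCESS_CONTAINER_CLEANUP;
        s = handle(s, "TA_KILL");                    // ignored in this state
        s = handle(s, "TA_TOO_MANY_FETCH_FAILURE");  // attempt fails
        System.out.println(s);                       // FAILED
    }
}
```

Handling TA_KILL in this state — e.g. marking the attempt KILLED so it can be rescheduled — would presumably avoid the later fetch-failure path.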
 




[jira] [Commented] (MAPREDUCE-6786) TaskAttemptImpl state changes for containers reuse

2021-05-30 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17353932#comment-17353932
 ] 

Bilwa S T commented on MAPREDUCE-6786:
--

Hi [~brahma] [~devaraj]
I have rebased this branch. Could you please help review this patch? Thanks

> TaskAttemptImpl state changes for containers reuse
> --
>
> Key: MAPREDUCE-6786
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6786
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-6786-MR-6749.001.patch, 
> MAPREDUCE-6786-MR-6749.002.patch, MAPREDUCE-6786-v0.patch, 
> MAPREDUCE-6786.001.patch, MAPREDUCE-6786.002.patch
>
>
> Update TaskAttemptImpl to support the reuse of containers.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6786) TaskAttemptImpl state changes for containers reuse

2021-05-29 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-6786:
-
Attachment: MAPREDUCE-6786-MR-6749.002.patch

> TaskAttemptImpl state changes for containers reuse
> --
>
> Key: MAPREDUCE-6786
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6786
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-6786-MR-6749.001.patch, 
> MAPREDUCE-6786-MR-6749.002.patch, MAPREDUCE-6786-v0.patch, 
> MAPREDUCE-6786.001.patch, MAPREDUCE-6786.002.patch
>
>
> Update TaskAttemptImpl to support the reuse of containers.






[jira] [Commented] (MAPREDUCE-7199) HsJobsBlock reuse JobACLsManager for checkAccess

2021-04-02 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313912#comment-17313912
 ] 

Bilwa S T commented on MAPREDUCE-7199:
--

Hi [~brahmareddy], can we backport this to branch-3.3?

> HsJobsBlock reuse JobACLsManager for checkAccess
> 
>
> Key: MAPREDUCE-7199
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7199
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Minor
> Fix For: 3.4.0
>
> Attachments: MAPREDUCE-7199-001.patch, MAPREDUCE-7199.002.patch, 
> MAPREDUCE-7199.003.patch
>
>
> Reuse JobAclManager.checkAccess
> {code} 
>  private boolean checkAccess(String userName) {
> if (!areAclsEnabled) {
>   return true;
> }
> // User could see its own job.
> if (ugi.getShortUserName().equals(userName)) {
>   return true;
> }
> // Admin could also see all jobs
> if (adminAclList != null && adminAclList.isUserAllowed(ugi)) {
>   return true;
> }
> return false;
>   }
> {code} 
> {code}
> jobACLsManager
>   .checkAccess(ugi, JobACL.VIEW_JOB, ..
>   new AccessControlList()))
> {code}
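The manual checks above collapse into a single boolean. A minimal standalone sketch of the same logic (the parameters and the Set-based admin list are simplified stand-ins for Hadoop's UserGroupInformation and AccessControlList, not the real JobACLsManager API):

```java
import java.util.Set;

// Standalone sketch of the view-access check being consolidated above.
// The String user and Set<String> admin list are simplified stand-ins for
// Hadoop's UserGroupInformation and AccessControlList types.
public class CheckAccessSketch {
    static boolean checkAccess(boolean aclsEnabled, String currentUser,
                               String jobOwner, Set<String> adminUsers) {
        if (!aclsEnabled) {
            return true;  // ACLs disabled: everyone may view
        }
        // A user may always see their own job; admins may see all jobs.
        return currentUser.equals(jobOwner) || adminUsers.contains(currentUser);
    }

    public static void main(String[] args) {
        System.out.println(checkAccess(true, "alice", "alice", Set.of("ops")));  // true
        System.out.println(checkAccess(true, "bob", "alice", Set.of("ops")));    // false
        System.out.println(checkAccess(false, "bob", "alice", Set.of("ops")));   // true
    }
}
```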






[jira] [Commented] (MAPREDUCE-6826) Job fails with InvalidStateTransitonException: Invalid event: JOB_TASK_COMPLETED at SUCCEEDED/COMMITTING

2021-03-31 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312094#comment-17312094
 ] 

Bilwa S T commented on MAPREDUCE-6826:
--

[~brahmareddy] can you please backport this to branch-3.3? Thanks

> Job fails with InvalidStateTransitonException: Invalid event: 
> JOB_TASK_COMPLETED at SUCCEEDED/COMMITTING
> 
>
> Key: MAPREDUCE-6826
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6826
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Varun Saxena
>Assignee: Bilwa S T
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: MAPREDUCE-6826-001.patch, MAPREDUCE-6826-002.patch, 
> MAPREDUCE-6826-003.patch
>
>
> This happens if a container is preempted by the scheduler after the job starts 
> committing.
> This exception in turn leads to the application being marked as FAILED in 
> YARN.
> I think we can probably ignore the JOB_TASK_COMPLETED event while the JobImpl 
> state is COMMITTING or SUCCEEDED, as the job is in the process of finishing.
> Also, is there any point in attempting to schedule another task attempt if the 
> job is already in the COMMITTING or SUCCEEDED state?
> {noformat}
> 2016-12-23 09:10:38,642 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
> task_1482404625971_23910_m_04 Task Transitioned from RUNNING to SUCCEEDED
> 2016-12-23 09:10:38,642 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 5
> 2016-12-23 09:10:38,643 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
> job_1482404625971_23910Job Transitioned from RUNNING to COMMITTING
> 2016-12-23 09:10:38,644 INFO [ContainerLauncher #5] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing 
> the event EventType: CONTAINER_REMOTE_CLEANUP for container 
> container_e55_1482404625971_23910_01_10 taskAttempt 
> attempt_1482404625971_23910_m_04_1
> 2016-12-23 09:10:38,644 INFO [ContainerLauncher #5] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING 
> attempt_1482404625971_23910_m_04_1
> 2016-12-23 09:10:38,644 INFO [ContainerLauncher #5] 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: 
> Opening proxy : linux-19:26009
> 2016-12-23 09:10:38,644 INFO [CommitterEvent Processor #4] 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing 
> the event EventType: JOB_COMMIT
> 2016-12-23 09:10:38,724 INFO [IPC Server handler 0 on 27113] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : 
> jvm_1482404625971_23910_m_60473139527690 asked for a task
> 2016-12-23 09:10:38,724 INFO [IPC Server handler 0 on 27113] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: 
> jvm_1482404625971_23910_m_60473139527690 is invalid and will be killed.
> 2016-12-23 09:10:38,797 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Calling handler for 
> JobFinishedEvent 
> 2016-12-23 09:10:38,797 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
> job_1482404625971_23910Job Transitioned from COMMITTING to SUCCEEDED
> 2016-12-23 09:10:38,798 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Job finished cleanly, 
> recording last MRAppMaster retry
> 2016-12-23 09:10:38,798 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator 
> isAMLastRetry: true
> 2016-12-23 09:10:38,798 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: RMCommunicator notified 
> that shouldUnregistered is: true
> 2016-12-23 09:10:38,799 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify JHEH isAMLastRetry: 
> true
> 2016-12-23 09:10:38,799 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: 
> JobHistoryEventHandler notified that forceJobCompletion is true
> 2016-12-23 09:10:38,799 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Calling stop for all the 
> services
> 2016-12-23 09:10:38,800 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopping 
> JobHistoryEventHandler. Size of the outstanding queue size is 1
> 2016-12-23 09:10:38,989 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 
> AssignedReds:0 CompletedMaps:5 CompletedReds:0 ContAlloc:8 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-23 09:10:38,993 INFO [RMCommunicator Allocator] 
> 

[jira] [Commented] (MAPREDUCE-6809) Create ContainerRequestor interface and refactor RMContainerRequestor to use it

2021-02-23 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17289220#comment-17289220
 ] 

Bilwa S T commented on MAPREDUCE-6809:
--

Attached a rebased patch.

> Create ContainerRequestor interface and refactor RMContainerRequestor to use 
> it
> ---
>
> Key: MAPREDUCE-6809
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6809
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Devaraj Kavali
>Priority: Major
> Fix For: MR-6749
>
> Attachments: MAPREDUCE-6809-MR-6749.001.patch, 
> MAPREDUCE-6809-MR-6749.002.patch, MAPREDUCE-6809.001.patch
>
>
> As per the discussion in MAPREDUCE-6773, create a ContainerRequestor 
> interface and refactor RMContainerRequestor to use this interface.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6809) Create ContainerRequestor interface and refactor RMContainerRequestor to use it

2021-02-23 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-6809:
-
Attachment: MAPREDUCE-6809.001.patch

> Create ContainerRequestor interface and refactor RMContainerRequestor to use 
> it
> ---
>
> Key: MAPREDUCE-6809
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6809
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Devaraj Kavali
>Priority: Major
> Fix For: MR-6749
>
> Attachments: MAPREDUCE-6809-MR-6749.001.patch, 
> MAPREDUCE-6809-MR-6749.002.patch, MAPREDUCE-6809.001.patch
>
>
> As per the discussion in MAPREDUCE-6773, create a ContainerRequestor 
> interface and refactor RMContainerRequestor to use this interface.






[jira] [Commented] (MAPREDUCE-6772) Add MR Job Configurations for Containers reuse

2021-02-23 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289191#comment-17289191
 ] 

Bilwa S T commented on MAPREDUCE-6772:
--

MAPREDUCE-6772.001: rebased the patch against current trunk code.

> Add MR Job Configurations for Containers reuse
> --
>
> Key: MAPREDUCE-6772
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6772
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Devaraj Kavali
>Priority: Major
> Fix For: MR-6749
>
> Attachments: MAPREDUCE-6772-MR-6749.004.patch, 
> MAPREDUCE-6772-v0.patch, MAPREDUCE-6772-v1.patch, MAPREDUCE-6772.001.patch, 
> MR-6749-MAPREDUCE-6772.003.patch
>
>
> This task adds configurations required for MR AM Container reuse feature.
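The task above introduces configuration knobs for container reuse. As a minimal sketch of how such settings could be read, using java.util.Properties as a stand-in for Hadoop's Configuration class (the property names below are assumptions for illustration, not the actual keys added by this JIRA):

```java
import java.util.Properties;

public class ContainerReuseConfigDemo {
    public static void main(String[] args) {
        Properties conf = new Properties();
        // Hypothetical keys: enable reuse and cap tasks per reused container.
        conf.setProperty("mapreduce.am.container.reuse.enabled", "true");
        conf.setProperty("mapreduce.am.container.reuse.max-maptasks", "10");

        // Read the values back with safe defaults, as a Configuration would.
        boolean reuseEnabled = Boolean.parseBoolean(
            conf.getProperty("mapreduce.am.container.reuse.enabled", "false"));
        int maxMaps = Integer.parseInt(
            conf.getProperty("mapreduce.am.container.reuse.max-maptasks", "1"));

        System.out.println(reuseEnabled + " " + maxMaps); // prints: true 10
    }
}
```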






[jira] [Updated] (MAPREDUCE-6772) Add MR Job Configurations for Containers reuse

2021-02-23 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-6772:
-
Attachment: (was: LICENSE-binary)

> Add MR Job Configurations for Containers reuse
> --
>
> Key: MAPREDUCE-6772
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6772
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Devaraj Kavali
>Priority: Major
> Fix For: MR-6749
>
> Attachments: MAPREDUCE-6772-MR-6749.004.patch, 
> MAPREDUCE-6772-v0.patch, MAPREDUCE-6772-v1.patch, MAPREDUCE-6772.001.patch, 
> MR-6749-MAPREDUCE-6772.003.patch
>
>
> This task adds configurations required for MR AM Container reuse feature.






[jira] [Updated] (MAPREDUCE-6772) Add MR Job Configurations for Containers reuse

2021-02-23 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-6772:
-
Attachment: MAPREDUCE-6772.001.patch

> Add MR Job Configurations for Containers reuse
> --
>
> Key: MAPREDUCE-6772
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6772
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Devaraj Kavali
>Priority: Major
> Fix For: MR-6749
>
> Attachments: MAPREDUCE-6772-MR-6749.004.patch, 
> MAPREDUCE-6772-v0.patch, MAPREDUCE-6772-v1.patch, MAPREDUCE-6772.001.patch, 
> MR-6749-MAPREDUCE-6772.003.patch
>
>
> This task adds configurations required for MR AM Container reuse feature.






[jira] [Updated] (MAPREDUCE-6772) Add MR Job Configurations for Containers reuse

2021-02-23 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-6772:
-
Attachment: LICENSE-binary

> Add MR Job Configurations for Containers reuse
> --
>
> Key: MAPREDUCE-6772
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6772
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Devaraj Kavali
>Priority: Major
> Fix For: MR-6749
>
> Attachments: MAPREDUCE-6772-MR-6749.004.patch, 
> MAPREDUCE-6772-v0.patch, MAPREDUCE-6772-v1.patch, MAPREDUCE-6772.001.patch, 
> MR-6749-MAPREDUCE-6772.003.patch
>
>
> This task adds configurations required for MR AM Container reuse feature.






[jira] [Commented] (MAPREDUCE-6749) MR AM should reuse containers for Map/Reduce Tasks

2021-02-17 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285715#comment-17285715
 ] 

Bilwa S T commented on MAPREDUCE-6749:
--

Attached a test report for this feature.

> MR AM should reuse containers for Map/Reduce Tasks
> --
>
> Key: MAPREDUCE-6749
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6749
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Devaraj Kavali
>Priority: Major
> Attachments: Container Reuse Performance Report.pdf, 
> MAPREDUCE-6749-Container Reuse-v0.pdf
>
>
> As a continuation of MAPREDUCE-3902, the MR AM should reuse containers for 
> Map/Reduce Tasks, similar to the JVM Reuse feature we had in MRv1.






[jira] [Updated] (MAPREDUCE-6749) MR AM should reuse containers for Map/Reduce Tasks

2021-02-17 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-6749:
-
Attachment: Container Reuse Performance Report.pdf

> MR AM should reuse containers for Map/Reduce Tasks
> --
>
> Key: MAPREDUCE-6749
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6749
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Devaraj Kavali
>Priority: Major
> Attachments: Container Reuse Performance Report.pdf, 
> MAPREDUCE-6749-Container Reuse-v0.pdf
>
>
> As a continuation of MAPREDUCE-3902, the MR AM should reuse containers for 
> Map/Reduce Tasks, similar to the JVM Reuse feature we had in MRv1.






[jira] [Commented] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2021-01-25 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271291#comment-17271291
 ] 

Bilwa S T commented on MAPREDUCE-7169:
--

Hi [~epayne]

Can you please help in reviewing this patch?

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, MAPREDUCE-7169.004.patch, MAPREDUCE-7169.005.patch, 
> MAPREDUCE-7169.006.patch, MAPREDUCE-7169.007.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, Speculative Execution may place the 
> speculative task on the same node as the original task. From what I have 
> read, it only tries to start one more task attempt; I haven't seen any place 
> mentioning that the attempt must not run on the same node. This is 
> unreasonable: if a node has problems that make task execution very slow, 
> placing the speculative task on that same node cannot help the problematic 
> task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 
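The avoidance rule proposed in this issue can be sketched as follows. This is a standalone illustration, not the patch's actual code; the class and method names are hypothetical, and plain host strings stand in for YARN node identifiers:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SpeculativeNodeFilter {
    // Returns the candidate hosts for a speculative attempt, with the host
    // running the original (slow) attempt removed.
    static List<String> candidateHosts(List<String> clusterHosts,
                                       String originalAttemptHost) {
        List<String> result = new ArrayList<>();
        for (String host : clusterHosts) {
            if (!host.equals(originalAttemptHost)) {
                result.add(host); // never place the speculative attempt on the slow node
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("node1", "node2", "node3");
        System.out.println(candidateHosts(hosts, "node2")); // [node1, node3]
    }
}
```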






[jira] [Updated] (MAPREDUCE-7314) Job will hang if NM is restarted while its running

2021-01-20 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7314:
-
Attachment: MAPREDUCE-7314-MR-6749.001.patch

> Job will hang if NM is restarted while its running
> --
>
> Key: MAPREDUCE-7314
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7314
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7314-MR-6749.001.patch
>
>
> This is due to three different reasons:
>  # PRIORITY_FAST_FAIL_MAP priority containers should be considered for reuse.
>  # Whenever CONTAINER_REMOTE_CLEANUP is fired for a task attempt, it won't 
> kill the current attempt assigned to the container, because the task attempt 
> is not updated in the ContainerLauncherImpl#Container class.
>  # A container gets assigned to a task attempt even after the container has 
> stopped running, i.e. its container-completed event has been processed. This 
> is because we add the reuse-container map to the allocated list: 
> makeRemoteRequest returns the same container in the allocation response even 
> though the RM has also reported it in the finished-container list. To avoid 
> this we need to make sure the allocated list doesn't contain any finished 
> containers.
> Test credits : [~Rajshree]






[jira] [Updated] (MAPREDUCE-7314) Job will hang if NM is restarted while its running

2021-01-20 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7314:
-
Status: Patch Available  (was: Open)

> Job will hang if NM is restarted while its running
> --
>
> Key: MAPREDUCE-7314
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7314
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7314-MR-6749.001.patch
>
>
> This is due to three different reasons:
>  # PRIORITY_FAST_FAIL_MAP priority containers should be considered for reuse.
>  # Whenever CONTAINER_REMOTE_CLEANUP is fired for a task attempt, it won't 
> kill the current attempt assigned to the container, because the task attempt 
> is not updated in the ContainerLauncherImpl#Container class.
>  # A container gets assigned to a task attempt even after the container has 
> stopped running, i.e. its container-completed event has been processed. This 
> is because we add the reuse-container map to the allocated list: 
> makeRemoteRequest returns the same container in the allocation response even 
> though the RM has also reported it in the finished-container list. To avoid 
> this we need to make sure the allocated list doesn't contain any finished 
> containers.
> Test credits : [~Rajshree]






[jira] [Commented] (MAPREDUCE-7314) Job will hang if NM is restarted while its running

2021-01-20 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268769#comment-17268769
 ] 

Bilwa S T commented on MAPREDUCE-7314:
--

Hi [~epayne]

This occurs on branch MR-6749 when container reuse is enabled.

> Job will hang if NM is restarted while its running
> --
>
> Key: MAPREDUCE-7314
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7314
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
>
> This is due to three different reasons:
>  # PRIORITY_FAST_FAIL_MAP priority containers should be considered for reuse.
>  # Whenever CONTAINER_REMOTE_CLEANUP is fired for a task attempt, it won't 
> kill the current attempt assigned to the container, because the task attempt 
> is not updated in the ContainerLauncherImpl#Container class.
>  # A container gets assigned to a task attempt even after the container has 
> stopped running, i.e. its container-completed event has been processed. This 
> is because we add the reuse-container map to the allocated list: 
> makeRemoteRequest returns the same container in the allocation response even 
> though the RM has also reported it in the finished-container list. To avoid 
> this we need to make sure the allocated list doesn't contain any finished 
> containers.
> Test credits : [~Rajshree]






[jira] [Commented] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2021-01-19 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267823#comment-17267823
 ] 

Bilwa S T commented on MAPREDUCE-7169:
--

Hi [~ahussein]

Sorry, I missed your comment. All the requested changes have been made; this 
is good to go.

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, MAPREDUCE-7169.004.patch, MAPREDUCE-7169.005.patch, 
> MAPREDUCE-7169.006.patch, MAPREDUCE-7169.007.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, Speculative Execution may place the 
> speculative task on the same node as the original task. From what I have 
> read, it only tries to start one more task attempt; I haven't seen any place 
> mentioning that the attempt must not run on the same node. This is 
> unreasonable: if a node has problems that make task execution very slow, 
> placing the speculative task on that same node cannot help the problematic 
> task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Updated] (MAPREDUCE-7314) Job will hang if NM is restarted while its running

2020-12-15 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7314:
-
Description: 
This is due to three different reasons:
 # PRIORITY_FAST_FAIL_MAP priority containers should be considered for reuse.
 # Whenever CONTAINER_REMOTE_CLEANUP is fired for a task attempt, it won't 
kill the current attempt assigned to the container, because the task attempt 
is not updated in the ContainerLauncherImpl#Container class.
 # A container gets assigned to a task attempt even after the container has 
stopped running, i.e. its container-completed event has been processed. This 
is because we add the reuse-container map to the allocated list: 
makeRemoteRequest returns the same container in the allocation response even 
though the RM has also reported it in the finished-container list. To avoid 
this we need to make sure the allocated list doesn't contain any finished 
containers.

Test credits : [~Rajshree]

  was:
This is due to three different reasons:
 # PRIORITY_FAST_FAIL_MAP priority containers should be considered for reuse.
 # Whenever CONTAINER_REMOTE_CLEANUP is fired for a task attempt, it won't 
kill the current attempt assigned to the container, because the task attempt 
is not updated in the ContainerLauncherImpl#Container class.
 # A container gets assigned to a task attempt even after the container has 
stopped running, i.e. its container-completed event has been processed. This 
is because we add the reuse-container map to the allocated list: 
makeRemoteRequest returns the same container in the allocation response even 
though the RM has also reported it in the finished-container list. To avoid 
this we need to make sure the allocated list doesn't contain any finished 
containers.


> Job will hang if NM is restarted while its running
> --
>
> Key: MAPREDUCE-7314
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7314
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
>
> This is due to three different reasons:
>  # PRIORITY_FAST_FAIL_MAP priority containers should be considered for reuse.
>  # Whenever CONTAINER_REMOTE_CLEANUP is fired for a task attempt, it won't 
> kill the current attempt assigned to the container, because the task attempt 
> is not updated in the ContainerLauncherImpl#Container class.
>  # A container gets assigned to a task attempt even after the container has 
> stopped running, i.e. its container-completed event has been processed. This 
> is because we add the reuse-container map to the allocated list: 
> makeRemoteRequest returns the same container in the allocation response even 
> though the RM has also reported it in the finished-container list. To avoid 
> this we need to make sure the allocated list doesn't contain any finished 
> containers.
> Test credits : [~Rajshree]






[jira] [Created] (MAPREDUCE-7314) Job will hang if NM is restarted while its running

2020-12-15 Thread Bilwa S T (Jira)
Bilwa S T created MAPREDUCE-7314:


 Summary: Job will hang if NM is restarted while its running
 Key: MAPREDUCE-7314
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7314
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Bilwa S T
Assignee: Bilwa S T


This is due to three different reasons:
 # PRIORITY_FAST_FAIL_MAP priority containers should be considered for reuse.
 # Whenever CONTAINER_REMOTE_CLEANUP is fired for a task attempt, it won't 
kill the current attempt assigned to the container, because the task attempt 
is not updated in the ContainerLauncherImpl#Container class.
 # A container gets assigned to a task attempt even after the container has 
stopped running, i.e. its container-completed event has been processed. This 
is because we add the reuse-container map to the allocated list: 
makeRemoteRequest returns the same container in the allocation response even 
though the RM has also reported it in the finished-container list. To avoid 
this we need to make sure the allocated list doesn't contain any finished 
containers.
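Reason #3 above amounts to filtering the finished containers out of the allocated list before assigning task attempts. A minimal standalone sketch, with plain long ids standing in for YARN ContainerIds (the class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AllocatedListFilter {
    // Drops any allocated container whose id also appears in the
    // finished-container list from the same allocate response.
    static List<Long> removeFinished(List<Long> allocated, Set<Long> finished) {
        List<Long> live = new ArrayList<>();
        for (Long id : allocated) {
            if (!finished.contains(id)) {
                live.add(id); // only still-running containers may receive a task attempt
            }
        }
        return live;
    }

    public static void main(String[] args) {
        List<Long> allocated = Arrays.asList(1L, 2L, 3L);
        Set<Long> finished = new HashSet<>(Collections.singleton(2L));
        System.out.println(removeFinished(allocated, finished)); // [1, 3]
    }
}
```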






[jira] [Commented] (MAPREDUCE-6773) Implement RM Container Reuse Requestor to handle the reuse containers for resource requests

2020-12-09 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246463#comment-17246463
 ] 

Bilwa S T commented on MAPREDUCE-6773:
--

Hi [~devaraj]

I see that containers are reused only if their priority is PRIORITY_MAP or 
PRIORITY_REDUCE. Currently, containers with any other priority are not even 
cleaned up, so the job hangs whenever containers are killed by an NM restart 
(since tasks are never assigned to those containers). Is there any reason why 
containers with priority PRIORITY_FAST_FAIL_MAP are not being reused?
{quote}RMContainerReuseRequestor line 133: we check for PRIORITY_MAP 
and PRIORITY_REDUCE; what about other priorities? Do we need to send 
CONTAINER_REMOTE_CLEANUP?
{quote}
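The gating being questioned above can be sketched like this. The priority values mirror the MR AM's well-known request priorities (fast-fail map = 5, reduce = 10, map = 20), but the class and method here are illustrative, not the patch's actual code; including PRIORITY_FAST_FAIL_MAP is the change this comment argues for:

```java
public class ReuseEligibility {
    // Values matching the MR AM's well-known container request priorities.
    static final int PRIORITY_FAST_FAIL_MAP = 5;
    static final int PRIORITY_REDUCE = 10;
    static final int PRIORITY_MAP = 20;

    static boolean isReusable(int priority) {
        return priority == PRIORITY_MAP
            || priority == PRIORITY_REDUCE
            || priority == PRIORITY_FAST_FAIL_MAP; // proposed addition
    }

    public static void main(String[] args) {
        System.out.println(isReusable(PRIORITY_FAST_FAIL_MAP)); // true
        System.out.println(isReusable(99));                     // false
    }
}
```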

> Implement RM Container Reuse Requestor to handle the reuse containers for 
> resource requests
> ---
>
> Key: MAPREDUCE-6773
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6773
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Affects Versions: MR-6749
>Reporter: Devaraj Kavali
>Assignee: Devaraj Kavali
>Priority: Major
> Fix For: MR-6749
>
> Attachments: MAPREDUCE-6773-MR-6749.003.patch, 
> MAPREDUCE-6773-MR-6749.004.patch, MAPREDUCE-6773-MR-6749.005.patch, 
> MAPREDUCE-6773-v0.patch, MAPREDUCE-6773-v1.patch, MAPREDUCE-6773-v2.patch
>
>
> Add RM Container Reuse Requestor which handles the reuse containers against 
> the Job reource requests.






[jira] [Updated] (MAPREDUCE-7308) Containers never get reused as containersToReuse map gets cleared on makeRemoteRequest

2020-11-23 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7308:
-
Attachment: MAPREDUCE-7308-MR-6749.001.patch

> Containers never get reused as containersToReuse map gets cleared on 
> makeRemoteRequest
> --
>
> Key: MAPREDUCE-7308
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7308
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7308-MR-6749.001.patch
>
>
> In RMContainerReuseRequestor, whenever containerAssigned is called it checks 
> whether the allocated container can be reused. This always returns false 
> because the map is cleared on makeRemoteRequest. I think the container can 
> instead be removed from the containersToReuse map once it's used.






[jira] [Updated] (MAPREDUCE-7308) Containers never get reused as containersToReuse map gets cleared on makeRemoteRequest

2020-11-23 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7308:
-
Status: Patch Available  (was: Open)

> Containers never get reused as containersToReuse map gets cleared on 
> makeRemoteRequest
> --
>
> Key: MAPREDUCE-7308
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7308
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7308-MR-6749.001.patch
>
>
> In RMContainerReuseRequestor, whenever containerAssigned is called it checks 
> whether the allocated container can be reused. This always returns false 
> because the map is cleared on makeRemoteRequest. I think the container can 
> instead be removed from the containersToReuse map once it's used.






[jira] [Assigned] (MAPREDUCE-6784) JobImpl state changes for containers reuse

2020-11-23 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T reassigned MAPREDUCE-6784:


Assignee: Bilwa S T  (was: Devaraj Kavali)

> JobImpl state changes for containers reuse
> --
>
> Key: MAPREDUCE-6784
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6784
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-6784-v0.patch
>
>
> Add JobImpl state changes for supporting reusing of containers.






[jira] [Commented] (MAPREDUCE-6786) TaskAttemptImpl state changes for containers reuse

2020-11-23 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237293#comment-17237293
 ] 

Bilwa S T commented on MAPREDUCE-6786:
--

Hi [~devaraj] [~brahmareddy]

Can you please review this patch?

> TaskAttemptImpl state changes for containers reuse
> --
>
> Key: MAPREDUCE-6786
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6786
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-6786-MR-6749.001.patch, 
> MAPREDUCE-6786-v0.patch, MAPREDUCE-6786.001.patch, MAPREDUCE-6786.002.patch
>
>
> Update TaskAttemptImpl to support the reuse of containers.






[jira] [Updated] (MAPREDUCE-7308) Containers never get reused as containersToReuse map gets cleared on makeRemoteRequest

2020-11-23 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7308:
-
Description: In RMContainerReuseRequestor, whenever containerAssigned is 
called it checks whether the allocated container can be reused. This always 
returns false because the map is cleared on makeRemoteRequest. I think the 
container can instead be removed from the containersToReuse map once it's 
used.  (was: In RMContainerReuseRequestor, whenever containerAssigned is 
called it checks whether the allocated container can be reused. This always 
returns false because the map is cleared on makeRemoteRequest. I think there 
is no need to clear the map, as the container will be removed from 
containersToReuse once it's used.)

> Containers never get reused as containersToReuse map gets cleared on 
> makeRemoteRequest
> --
>
> Key: MAPREDUCE-7308
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7308
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
>
> In RMContainerReuseRequestor, whenever containerAssigned is called it checks 
> whether the allocated container can be reused. This always returns false 
> because the map is cleared on makeRemoteRequest. I think the container can 
> instead be removed from the containersToReuse map once it's used.






[jira] [Created] (MAPREDUCE-7308) Containers never get reused as containersToReuse map gets cleared on makeRemoteRequest

2020-11-18 Thread Bilwa S T (Jira)
Bilwa S T created MAPREDUCE-7308:


 Summary: Containers never get reused as containersToReuse map gets 
cleared on makeRemoteRequest
 Key: MAPREDUCE-7308
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7308
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Bilwa S T
Assignee: Bilwa S T


In RMContainerReuseRequestor, whenever containerAssigned is called it checks 
whether the allocated container can be reused. This always returns false 
because the map is cleared on makeRemoteRequest. I think there is no need to 
clear the map, as the container will be removed from containersToReuse once 
it's used.
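The fix suggested above is to take a container out of the reuse pool only when it is actually assigned, instead of wiping the whole map on every makeRemoteRequest. A standalone sketch of that lifecycle, with long ids standing in for ContainerIds (the class and method names are illustrative, not the patch's code):

```java
import java.util.HashMap;
import java.util.Map;

public class ReuseMapDemo {
    // Container id -> host, mirroring the role of containersToReuse.
    private final Map<Long, String> containersToReuse = new HashMap<>();

    void markReusable(long containerId, String host) {
        containersToReuse.put(containerId, host);
    }

    // Called when a container is handed to a task attempt: remove it from
    // the pool exactly at the moment it is used.
    boolean assignFromReusePool(long containerId) {
        return containersToReuse.remove(containerId) != null;
    }

    void makeRemoteRequest() {
        // Intentionally does NOT call containersToReuse.clear(), so reusable
        // containers survive the allocate heartbeat.
    }

    public static void main(String[] args) {
        ReuseMapDemo demo = new ReuseMapDemo();
        demo.markReusable(42L, "node1");
        demo.makeRemoteRequest();                          // pool survives the heartbeat
        System.out.println(demo.assignFromReusePool(42L)); // true
        System.out.println(demo.assignFromReusePool(42L)); // false: already used
    }
}
```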






[jira] [Commented] (MAPREDUCE-7293) All pages in JHS should honor yarn.webapp.filter-entity-list-by-user

2020-11-12 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230764#comment-17230764
 ] 

Bilwa S T commented on MAPREDUCE-7293:
--

Resolving this issue: the jobACL info was null because the client didn't enable ACLs.

> All pages in JHS should honor yarn.webapp.filter-entity-list-by-user
> 
>
> Key: MAPREDUCE-7293
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7293
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
>
> Currently only HsJobsBlock checks for access: a user who doesn't have 
> permission to view a job page can still open it, which is wrong. So we need 
> the check below in HsJobBlock, HsTasksBlock and HsTaskPage:
> {code:java}
>   if (isFilterAppListByUserEnabled && ugi != null && !aclsManager
>   .checkAccess(ugi, JobACL.VIEW_JOB, job.getUserName(), null)) {
> 
>   }
> {code}
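The filtering rule quoted above can be sketched in a self-contained form. The method below is a hypothetical stand-in for JobACLsManager#checkAccess combined with the filter-enabled flag, using plain strings and a set for the VIEW_JOB ACL:

```java
import java.util.Set;

public class JhsFilterDemo {
    // Returns true when the page block should be rendered for remoteUser.
    static boolean shouldRender(boolean filterEnabled, String remoteUser,
                                String jobOwner, Set<String> viewJobAcl) {
        if (!filterEnabled || remoteUser == null) {
            return true; // filtering disabled or unauthenticated: show the page
        }
        // The job owner and anyone on the VIEW_JOB ACL pass the check;
        // everyone else is filtered out, mirroring the quoted condition.
        return remoteUser.equals(jobOwner) || viewJobAcl.contains(remoteUser);
    }

    public static void main(String[] args) {
        Set<String> acl = Set.of("alice");
        System.out.println(shouldRender(true, "alice", "bob", acl));   // true
        System.out.println(shouldRender(true, "mallory", "bob", acl)); // false
    }
}
```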






[jira] [Resolved] (MAPREDUCE-7293) All pages in JHS should honor yarn.webapp.filter-entity-list-by-user

2020-11-12 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T resolved MAPREDUCE-7293.
--
Resolution: Not A Problem

> All pages in JHS should honor yarn.webapp.filter-entity-list-by-user
> 
>
> Key: MAPREDUCE-7293
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7293
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
>
> Currently only HsJobsBlock checks for access. A user who doesn't have 
> permission to view a job page can still access it, which is wrong. So we need 
> the check below in HsJobBlock, HsTasksBlock and HsTaskPage:
> {code:java}
>   if (isFilterAppListByUserEnabled && ugi != null && !aclsManager
>   .checkAccess(ugi, JobACL.VIEW_JOB, job.getUserName(), null)) {
> 
>   }
> {code}






[jira] [Updated] (MAPREDUCE-6781) YarnChild should wait for another task when reuse is enabled

2020-10-07 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-6781:
-
Attachment: MAPREDUCE-6781-MR-6749.001.patch

> YarnChild should wait for another task when reuse is enabled
> 
>
> Key: MAPREDUCE-6781
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6781
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-6781-MR-6749.001.patch, MAPREDUCE-6781-v0.patch
>
>







[jira] [Updated] (MAPREDUCE-6781) YarnChild should wait for another task when reuse is enabled

2020-10-07 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-6781:
-
Status: Patch Available  (was: Open)

> YarnChild should wait for another task when reuse is enabled
> 
>
> Key: MAPREDUCE-6781
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6781
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-6781-MR-6749.001.patch, MAPREDUCE-6781-v0.patch
>
>







[jira] [Assigned] (MAPREDUCE-6781) YarnChild should wait for another task when reuse is enabled

2020-10-07 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T reassigned MAPREDUCE-6781:


Assignee: Bilwa S T  (was: Devaraj Kavali)

> YarnChild should wait for another task when reuse is enabled
> 
>
> Key: MAPREDUCE-6781
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6781
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-6781-v0.patch
>
>







[jira] [Updated] (MAPREDUCE-7297) Exception thrown in log when trying to download conf from JHS

2020-10-07 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7297:
-
Status: Patch Available  (was: Open)

> Exception thrown in log when trying to download conf from JHS
> -
>
> Key: MAPREDUCE-7297
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7297
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: MAPREDUCE-7297.001.patch
>
>
> The below exception is thrown in the JHS logs:
> {code:java}
> 2020-09-23 21:53:07,437 | ERROR | qtp1635772897-51 | error handling URI: 
> /jobhistory/downloadconf/job_1600668504751_0001 | Dispatcher.java:175
> java.lang.IllegalStateException: No view rendered for 200
> at 
> org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:171)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
> at 
> com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287)
> at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277)
> at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)
> at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
> at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
> at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941)
> at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875)
> at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829)
> at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
> at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)
> at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)
> at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)
> at 
> com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)
> at 
> com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1591)
> at 
> org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1591)
> at 
> com.huawei.hadoop.adapter.sso.LogoutFilter.doFilter(LogoutFilter.java:62)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1591)
> at 
> com.huawei.hadoop.adapter.sso.RefererCheckFilter.doFilter(RefererCheckFilter.java:76)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1591)
> at 
> org.jasig.cas.client.util.HttpServletRequestWrapperFilter.doFilter(HttpServletRequestWrapperFilter.java:70)
> at 
> com.huawei.hadoop.adapter.sso.HttpServletRequestWrapperFilterWrapper.doFilter(HttpServletRequestWrapperFilterWrapper.java:75)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1591)
> at 
> org.jasig.cas.client.validation.AbstractTicketValidationFilter.doFilter(AbstractTicketValidationFilter.java:238)
> at 
> com.huawei.hadoop.adapter.sso.Cas20ProxyReceivingTicketValidationFilterWrapper.doFilter(Cas20ProxyReceivingTicketValidationFilterWrapper.java:40)
> {code}






[jira] [Updated] (MAPREDUCE-7297) Exception thrown in log when trying to download conf from JHS

2020-10-07 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7297:
-
Attachment: MAPREDUCE-7297.001.patch

> Exception thrown in log when trying to download conf from JHS
> -
>
> Key: MAPREDUCE-7297
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7297
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.1
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: MAPREDUCE-7297.001.patch
>
>
> The below exception is thrown in the JHS logs:
> {code:java}
> 2020-09-23 21:53:07,437 | ERROR | qtp1635772897-51 | error handling URI: 
> /jobhistory/downloadconf/job_1600668504751_0001 | Dispatcher.java:175
> java.lang.IllegalStateException: No view rendered for 200
> at 
> org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:171)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
> at 
> com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287)
> at 
> com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277)
> at 
> com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)
> at 
> com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
> at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
> at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941)
> at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875)
> at 
> com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829)
> at 
> com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
> at 
> com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)
> at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)
> at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)
> at 
> com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)
> at 
> com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1591)
> at 
> org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1591)
> at 
> com.huawei.hadoop.adapter.sso.LogoutFilter.doFilter(LogoutFilter.java:62)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1591)
> at 
> com.huawei.hadoop.adapter.sso.RefererCheckFilter.doFilter(RefererCheckFilter.java:76)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1591)
> at 
> org.jasig.cas.client.util.HttpServletRequestWrapperFilter.doFilter(HttpServletRequestWrapperFilter.java:70)
> at 
> com.huawei.hadoop.adapter.sso.HttpServletRequestWrapperFilterWrapper.doFilter(HttpServletRequestWrapperFilterWrapper.java:75)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1591)
> at 
> org.jasig.cas.client.validation.AbstractTicketValidationFilter.doFilter(AbstractTicketValidationFilter.java:238)
> at 
> com.huawei.hadoop.adapter.sso.Cas20ProxyReceivingTicketValidationFilterWrapper.doFilter(Cas20ProxyReceivingTicketValidationFilterWrapper.java:40)
> {code}






[jira] [Updated] (MAPREDUCE-6786) TaskAttemptImpl state changes for containers reuse

2020-09-30 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-6786:
-
Attachment: MAPREDUCE-6786-MR-6749.001.patch

> TaskAttemptImpl state changes for containers reuse
> --
>
> Key: MAPREDUCE-6786
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6786
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-6786-MR-6749.001.patch, 
> MAPREDUCE-6786-v0.patch, MAPREDUCE-6786.001.patch, MAPREDUCE-6786.002.patch
>
>
> Update TaskAttemptImpl to support the reuse of containers.






[jira] [Updated] (MAPREDUCE-6786) TaskAttemptImpl state changes for containers reuse

2020-09-28 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-6786:
-
Attachment: MAPREDUCE-6786.002.patch

> TaskAttemptImpl state changes for containers reuse
> --
>
> Key: MAPREDUCE-6786
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6786
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-6786-v0.patch, MAPREDUCE-6786.001.patch, 
> MAPREDUCE-6786.002.patch
>
>
> Update TaskAttemptImpl to support the reuse of containers.






[jira] [Updated] (MAPREDUCE-6786) TaskAttemptImpl state changes for containers reuse

2020-09-28 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-6786:
-
Status: Patch Available  (was: Open)

> TaskAttemptImpl state changes for containers reuse
> --
>
> Key: MAPREDUCE-6786
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6786
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-6786-v0.patch, MAPREDUCE-6786.001.patch
>
>
> Update TaskAttemptImpl to support the reuse of containers.






[jira] [Updated] (MAPREDUCE-6786) TaskAttemptImpl state changes for containers reuse

2020-09-28 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-6786:
-
Attachment: MAPREDUCE-6786.001.patch

> TaskAttemptImpl state changes for containers reuse
> --
>
> Key: MAPREDUCE-6786
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6786
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-6786-v0.patch, MAPREDUCE-6786.001.patch
>
>
> Update TaskAttemptImpl to support the reuse of containers.






[jira] [Commented] (MAPREDUCE-6786) TaskAttemptImpl state changes for containers reuse

2020-09-27 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17202846#comment-17202846
 ] 

Bilwa S T commented on MAPREDUCE-6786:
--

Hi [~devaraj] [~Naganarasimha]
I would like to work on this, so I am assigning it to myself.

> TaskAttemptImpl state changes for containers reuse
> --
>
> Key: MAPREDUCE-6786
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6786
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-6786-v0.patch
>
>
> Update TaskAttemptImpl to support the reuse of containers.






[jira] [Assigned] (MAPREDUCE-6786) TaskAttemptImpl state changes for containers reuse

2020-09-27 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T reassigned MAPREDUCE-6786:


Assignee: Bilwa S T  (was: Naganarasimha G R)

> TaskAttemptImpl state changes for containers reuse
> --
>
> Key: MAPREDUCE-6786
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6786
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj Kavali
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-6786-v0.patch
>
>
> Update TaskAttemptImpl to support the reuse of containers.






[jira] [Created] (MAPREDUCE-7297) Exception thrown in log when trying to download conf from JHS

2020-09-23 Thread Bilwa S T (Jira)
Bilwa S T created MAPREDUCE-7297:


 Summary: Exception thrown in log when trying to download conf from 
JHS
 Key: MAPREDUCE-7297
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7297
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.1.1
Reporter: Bilwa S T
Assignee: Bilwa S T


The below exception is thrown in the JHS logs:

{code:java}
2020-09-23 21:53:07,437 | ERROR | qtp1635772897-51 | error handling URI: 
/jobhistory/downloadconf/job_1600668504751_0001 | Dispatcher.java:175
java.lang.IllegalStateException: No view rendered for 200
at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:171)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287)
at 
com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277)
at 
com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)
at 
com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
at 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)
at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)
at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)
at 
com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1591)
at 
org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1591)
at 
com.huawei.hadoop.adapter.sso.LogoutFilter.doFilter(LogoutFilter.java:62)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1591)
at 
com.huawei.hadoop.adapter.sso.RefererCheckFilter.doFilter(RefererCheckFilter.java:76)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1591)
at 
org.jasig.cas.client.util.HttpServletRequestWrapperFilter.doFilter(HttpServletRequestWrapperFilter.java:70)
at 
com.huawei.hadoop.adapter.sso.HttpServletRequestWrapperFilterWrapper.doFilter(HttpServletRequestWrapperFilterWrapper.java:75)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1591)
at 
org.jasig.cas.client.validation.AbstractTicketValidationFilter.doFilter(AbstractTicketValidationFilter.java:238)
at 
com.huawei.hadoop.adapter.sso.Cas20ProxyReceivingTicketValidationFilterWrapper.doFilter(Cas20ProxyReceivingTicketValidationFilterWrapper.java:40)
{code}







[jira] [Commented] (MAPREDUCE-7293) All pages in JHS should honor yarn.webapp.filter-entity-list-by-user

2020-08-22 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17182270#comment-17182270
 ] 

Bilwa S T commented on MAPREDUCE-7293:
--

Hi [~sunilg]

I analysed this again and found that checkAccess is already performed for 
all these pages in AppController#checkAccess, which is called whenever 
getJob, getTask, or getTasks is invoked. This works fine while the AM is 
running, but for the JHS it currently does not work because the jobACL info in 
the code below comes back null. The same issue affects REST API calls, since 
checkAccess is invoked from the REST path too. I think fixing the jobACL issue 
would solve this problem, so there is no need to add the check to all the other 
pages.
{code:java}
@Override
public boolean checkAccess(UserGroupInformation callerUGI, JobACL jobOperation) {
  Map<JobACL, AccessControlList> jobACLs = jobInfo.getJobACLs();
  AccessControlList jobACL = jobACLs.get(jobOperation);
  if (jobACL == null) {
    return true;
  }
  return aclsMgr.checkAccess(callerUGI, jobOperation,
      jobInfo.getUsername(), jobACL);
}{code}
Thanks for taking a look at this.
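The null-ACL fallthrough described above can be illustrated with a simplified, self-contained model (plain String types stand in for UserGroupInformation and AccessControlList; this is not the Hadoop code itself):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the quoted checkAccess logic: when no ACL entry exists
// for the requested operation, access is granted to everyone. This is
// exactly why a null jobACL made every caller pass the JHS check.
public class AclSketch {
    // operation name -> the single user allowed to perform it
    static Map<String, String> jobACLs = new HashMap<>();

    static boolean checkAccess(String caller, String operation) {
        String acl = jobACLs.get(operation);
        if (acl == null) {
            return true; // no ACL configured: default allow
        }
        return acl.equals(caller);
    }

    public static void main(String[] args) {
        System.out.println(checkAccess("mallory", "VIEW_JOB")); // true: no ACL yet
        jobACLs.put("VIEW_JOB", "alice");
        System.out.println(checkAccess("mallory", "VIEW_JOB")); // false once an ACL exists
        System.out.println(checkAccess("alice", "VIEW_JOB"));   // true for the listed user
    }
}
```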

> All pages in JHS should honor yarn.webapp.filter-entity-list-by-user
> 
>
> Key: MAPREDUCE-7293
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7293
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
>
> Currently only HsJobsBlock checks for access. A user who doesn't have 
> permission to view a job page can still access it, which is wrong. So we need 
> the check below in HsJobBlock, HsTasksBlock and HsTaskPage:
> {code:java}
>   if (isFilterAppListByUserEnabled && ugi != null && !aclsManager
>   .checkAccess(ugi, JobACL.VIEW_JOB, job.getUserName(), null)) {
> 
>   }
> {code}






[jira] [Commented] (MAPREDUCE-7293) All pages in JHS should honor yarn.webapp.filter-entity-list-by-user

2020-08-17 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17178760#comment-17178760
 ] 

Bilwa S T commented on MAPREDUCE-7293:
--

Hi [~sunilg]

In MAPREDUCE-7097 I see that checkAccess used to be called from HsJobBlock and 
was then moved to HsJobsBlock. But HsJobBlock can still be accessed by another 
user, so I think the check should be added in all places. Any suggestions?

> All pages in JHS should honor yarn.webapp.filter-entity-list-by-user
> 
>
> Key: MAPREDUCE-7293
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7293
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
>
> Currently only HsJobsBlock checks for access. A user who doesn't have 
> permission to view a job page can still access it, which is wrong. So we need 
> the check below in HsJobBlock, HsTasksBlock and HsTaskPage:
> {code:java}
>   if (isFilterAppListByUserEnabled && ugi != null && !aclsManager
>   .checkAccess(ugi, JobACL.VIEW_JOB, job.getUserName(), null)) {
> 
>   }
> {code}






[jira] [Created] (MAPREDUCE-7293) All pages in JHS should honor yarn.webapp.filter-entity-list-by-user

2020-08-14 Thread Bilwa S T (Jira)
Bilwa S T created MAPREDUCE-7293:


 Summary: All pages in JHS should honor 
yarn.webapp.filter-entity-list-by-user
 Key: MAPREDUCE-7293
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7293
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Bilwa S T
Assignee: Bilwa S T


Currently only HsJobsBlock checks for access. A user who doesn't have 
permission to view a job page can still access it, which is wrong. So we need 
the check below in HsJobBlock, HsTasksBlock and HsTaskPage:

{code:java}
  if (isFilterAppListByUserEnabled && ugi != null && !aclsManager
  .checkAccess(ugi, JobACL.VIEW_JOB, job.getUserName(), null)) {

  }
{code}







[jira] [Assigned] (MAPREDUCE-6726) YARN Registry based AM discovery with retry and in-flight task persistent via JHS

2020-08-10 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T reassigned MAPREDUCE-6726:


Assignee: Bilwa S T  (was: Srikanth Sampath)

> YARN Registry based AM discovery with retry and in-flight task persistent via 
> JHS
> -
>
> Key: MAPREDUCE-6726
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6726
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster
>Reporter: Junping Du
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-6726-MAPREDUCE-6608.001.patch, 
> MAPREDUCE-6726-MAPREDUCE-6608.001.patch, 
> MAPREDUCE-6726-MAPREDUCE-6608.002.patch, 
> MAPREDUCE-6726-MAPREDUCE-6608.003.patch, WorkPreservingMRAppMaster.pdf
>
>
> Several tasks will be achieved in this JIRA based on the demo patch in 
> MAPREDUCE-6608:
> 1. AM discovery based on the YARN registry service. Could be replaced by 
> YARN-4758 later due to scale-up issues.
> 2. Retry logic for the TaskUmbilicalProtocol RPC connection.
> 3. In-flight task recovery after AM restart via JHS.
> 4. Configuration to keep the behavior compatible with previous releases when 
> this feature is not enabled (the default).
> All security-related issues and other concerns discussed in MAPREDUCE-6608 
> will be addressed in follow-up JIRAs.
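Item 2 in the list above (retry logic for the TaskUmbilicalProtocol RPC connection) boils down to a bounded retry loop with backoff. A generic, self-contained sketch with hypothetical names, not the patch's actual code:

```java
import java.util.concurrent.Callable;

// Generic bounded-retry helper, illustrating the kind of retry logic a
// task would need while the restarted AM re-registers. Illustrative only.
public class RetrySketch {
    static <T> T callWithRetries(Callable<T> call, int maxAttempts, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;                       // remember failure, back off, retry
                Thread.sleep(backoffMs * attempt);
            }
        }
        throw last;                             // all attempts exhausted
    }

    public static void main(String[] args) throws Exception {
        final int[] failures = {2};             // fail twice, then succeed
        String result = callWithRetries(() -> {
            if (failures[0]-- > 0) {
                throw new RuntimeException("AM not yet re-registered");
            }
            return "connected";
        }, 5, 1L);
        System.out.println(result);             // connected
    }
}
```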






[jira] [Commented] (MAPREDUCE-6726) YARN Registry based AM discovery with retry and in-flight task persistent via JHS

2020-08-04 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170590#comment-17170590
 ] 

Bilwa S T commented on MAPREDUCE-6726:
--

Hi [~srikanth.sampath]
[~dmmkr] and I would like to work on this further, if you are okay with it.

> YARN Registry based AM discovery with retry and in-flight task persistent via 
> JHS
> -
>
> Key: MAPREDUCE-6726
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6726
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster
>Reporter: Junping Du
>Assignee: Srikanth Sampath
>Priority: Major
> Attachments: MAPREDUCE-6726-MAPREDUCE-6608.001.patch, 
> MAPREDUCE-6726-MAPREDUCE-6608.001.patch, 
> MAPREDUCE-6726-MAPREDUCE-6608.002.patch, 
> MAPREDUCE-6726-MAPREDUCE-6608.003.patch, WorkPreservingMRAppMaster.pdf
>
>
> Several tasks will be achieved in this JIRA based on the demo patch in 
> MAPREDUCE-6608:
> 1. AM discovery based on the YARN registry service. Could be replaced by 
> YARN-4758 later due to scale-up issues.
> 2. Retry logic for the TaskUmbilicalProtocol RPC connection.
> 3. In-flight task recovery after AM restart via JHS.
> 4. Configuration to keep the behavior compatible with previous releases when 
> this feature is not enabled (the default).
> All security-related issues and other concerns discussed in MAPREDUCE-6608 
> will be addressed in follow-up JIRAs.






[jira] [Commented] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2020-08-01 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169240#comment-17169240
 ] 

Bilwa S T commented on MAPREDUCE-7169:
--

Hi [~ahussein]

I have addressed all the review comments and fixed the checkstyle issues that 
could be fixed.

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, MAPREDUCE-7169.004.patch, MAPREDUCE-7169.005.patch, 
> MAPREDUCE-7169.006.patch, MAPREDUCE-7169.007.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, speculative execution may place the 
> speculative task on the same node as the original task. From what I have read, 
> it will only try to start one more task attempt; I haven't seen any place that 
> says it must not run on the same node. That is unreasonable: if the node has 
> problems that make task execution very slow, placing the speculative task on 
> the same node cannot help the problematic task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 
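The requested behavior, not placing a speculative attempt on the original task's node, can be sketched as a toy node picker (illustrative names only, not MapReduce scheduler internals):

```java
import java.util.HashSet;
import java.util.Set;

// Toy scheduler showing the requested placement rule: prefer any node
// other than the one running the (slow) original attempt.
public class SpeculationSketch {
    static String pickNode(Set<String> clusterNodes, String originalNode) {
        for (String node : clusterNodes) {
            if (!node.equals(originalNode)) {
                return node;         // first node that is not the original
            }
        }
        return originalNode;         // single-node cluster: no alternative
    }

    public static void main(String[] args) {
        Set<String> nodes = new HashSet<>();
        nodes.add("node1");
        nodes.add("node2");
        // The speculative attempt avoids node1, where the original runs.
        System.out.println(pickNode(nodes, "node1")); // node2
    }
}
```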






[jira] [Updated] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2020-08-01 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7169:
-
Attachment: MAPREDUCE-7169.007.patch

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, MAPREDUCE-7169.004.patch, MAPREDUCE-7169.005.patch, 
> MAPREDUCE-7169.006.patch, MAPREDUCE-7169.007.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, speculative execution may place the 
> speculative task on the same node as the original task. From what I have read, 
> it will only try to start one more task attempt; I haven't seen any place that 
> says it must not run on the same node. That is unreasonable: if the node has 
> problems that make task execution very slow, placing the speculative task on 
> the same node cannot help the problematic task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Updated] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2020-07-31 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7169:
-
Attachment: MAPREDUCE-7169.006.patch

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, MAPREDUCE-7169.004.patch, MAPREDUCE-7169.005.patch, 
> MAPREDUCE-7169.006.patch, image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, speculative execution may place the 
> speculative task on the same node as the original task. What I have read only 
> says that one more task attempt will be tried; I have not seen any mention 
> that it should avoid the same node. This is unreasonable: if a node has a 
> problem that makes task execution very slow, placing the speculative task on 
> the same node cannot help the struggling task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Assigned] (MAPREDUCE-6976) mapred job -set-priority claims to set priority higher than yarn.cluster.max-application-priority

2020-07-09 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T reassigned MAPREDUCE-6976:


Assignee: (was: Bilwa S T)

> mapred job -set-priority claims to set priority higher than 
> yarn.cluster.max-application-priority
> -
>
> Key: MAPREDUCE-6976
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6976
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.9.0, 2.8.1, 3.1.0
>Reporter: Eric Payne
>Priority: Minor
>
> With {{yarn.cluster.max-application-priority}} set to 20 and 
> {{job_1507226760578_0002}} running at priority 0, run the following command:
> {noformat}
> $ mapred job -set-priority job_1507226760578_0002 21
> Changed job priority.
> {noformat}
> The above command sets {{job_1507226760578_0002}} to priority 20. If 
> {{job_1507226760578_0002}} is already at 20, the command does nothing.
> Compare this behavior to running the {{yarn application -updatePriority}} 
> command:
> {code}
> $ yarn application -updatePriority 21 -appId application_1507226760578_0002
> Updating priority of an aplication application_1507226760578_0002
> Updated priority of an application  application_1507226760578_0002 to cluster 
> max priority OR keeping old priority as application is in final states
> {code}
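The clamping behaviour described above can be sketched as follows. This is a
minimal illustration with hypothetical names, not the actual YARN code: the RM
silently caps a requested priority at {{yarn.cluster.max-application-priority}},
while the {{mapred}} client reports unconditional success.

```java
// Sketch of priority clamping; names are illustrative, not YARN's.
final class PriorityClamp {
    /** Clamp a requested priority to the cluster maximum, as the RM does. */
    static int effectivePriority(int requested, int clusterMax) {
        return Math.min(requested, clusterMax);
    }

    public static void main(String[] args) {
        int clusterMax = 20; // yarn.cluster.max-application-priority
        int effective = effectivePriority(21, clusterMax);
        // A less misleading client message would report the clamped value:
        System.out.println("Changed job priority to " + effective
            + " (requested 21, cluster max " + clusterMax + ")");
    }
}
```

The bug report is about the message, not the clamping itself: reporting the
effective (clamped) priority would match the `yarn application -updatePriority`
behaviour shown above.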






[jira] [Assigned] (MAPREDUCE-6976) mapred job -set-priority claims to set priority higher than yarn.cluster.max-application-priority

2020-05-28 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T reassigned MAPREDUCE-6976:


Assignee: Bilwa S T

> mapred job -set-priority claims to set priority higher than 
> yarn.cluster.max-application-priority
> -
>
> Key: MAPREDUCE-6976
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6976
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.9.0, 2.8.1, 3.1.0
>Reporter: Eric Payne
>Assignee: Bilwa S T
>Priority: Minor
>
> With {{yarn.cluster.max-application-priority}} set to 20 and 
> {{job_1507226760578_0002}} running at priority 0, run the following command:
> {noformat}
> $ mapred job -set-priority job_1507226760578_0002 21
> Changed job priority.
> {noformat}
> The above command sets {{job_1507226760578_0002}} to priority 20. If 
> {{job_1507226760578_0002}} is already at 20, the command does nothing.
> Compare this behavior to running the {{yarn application -updatePriority}} 
> command:
> {code}
> $ yarn application -updatePriority 21 -appId application_1507226760578_0002
> Updating priority of an aplication application_1507226760578_0002
> Updated priority of an application  application_1507226760578_0002 to cluster 
> max priority OR keeping old priority as application is in final states
> {code}






[jira] [Commented] (MAPREDUCE-6826) Job fails with InvalidStateTransitonException: Invalid event: JOB_TASK_COMPLETED at SUCCEEDED/COMMITTING

2020-05-13 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106907#comment-17106907
 ] 

Bilwa S T commented on MAPREDUCE-6826:
--

cc [~inigoiri] [~brahmareddy]

> Job fails with InvalidStateTransitonException: Invalid event: 
> JOB_TASK_COMPLETED at SUCCEEDED/COMMITTING
> 
>
> Key: MAPREDUCE-6826
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6826
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Varun Saxena
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-6826-001.patch, MAPREDUCE-6826-002.patch, 
> MAPREDUCE-6826-003.patch
>
>
> This happens if a container is preempted by the scheduler after the job 
> starts committing, and the resulting exception leads to the application being 
> marked as FAILED in YARN.
> I think we can probably ignore the JOB_TASK_COMPLETED event while the JobImpl 
> state is COMMITTING or SUCCEEDED, as the job is in the process of finishing.
> Also, is there any point in attempting to schedule another task attempt if 
> the job is already in COMMITTING or SUCCEEDED state?
> {noformat}
> 2016-12-23 09:10:38,642 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
> task_1482404625971_23910_m_04 Task Transitioned from RUNNING to SUCCEEDED
> 2016-12-23 09:10:38,642 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 5
> 2016-12-23 09:10:38,643 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
> job_1482404625971_23910Job Transitioned from RUNNING to COMMITTING
> 2016-12-23 09:10:38,644 INFO [ContainerLauncher #5] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing 
> the event EventType: CONTAINER_REMOTE_CLEANUP for container 
> container_e55_1482404625971_23910_01_10 taskAttempt 
> attempt_1482404625971_23910_m_04_1
> 2016-12-23 09:10:38,644 INFO [ContainerLauncher #5] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING 
> attempt_1482404625971_23910_m_04_1
> 2016-12-23 09:10:38,644 INFO [ContainerLauncher #5] 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: 
> Opening proxy : linux-19:26009
> 2016-12-23 09:10:38,644 INFO [CommitterEvent Processor #4] 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing 
> the event EventType: JOB_COMMIT
> 2016-12-23 09:10:38,724 INFO [IPC Server handler 0 on 27113] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : 
> jvm_1482404625971_23910_m_60473139527690 asked for a task
> 2016-12-23 09:10:38,724 INFO [IPC Server handler 0 on 27113] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: 
> jvm_1482404625971_23910_m_60473139527690 is invalid and will be killed.
> 2016-12-23 09:10:38,797 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Calling handler for 
> JobFinishedEvent 
> 2016-12-23 09:10:38,797 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
> job_1482404625971_23910Job Transitioned from COMMITTING to SUCCEEDED
> 2016-12-23 09:10:38,798 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Job finished cleanly, 
> recording last MRAppMaster retry
> 2016-12-23 09:10:38,798 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator 
> isAMLastRetry: true
> 2016-12-23 09:10:38,798 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: RMCommunicator notified 
> that shouldUnregistered is: true
> 2016-12-23 09:10:38,799 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify JHEH isAMLastRetry: 
> true
> 2016-12-23 09:10:38,799 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: 
> JobHistoryEventHandler notified that forceJobCompletion is true
> 2016-12-23 09:10:38,799 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Calling stop for all the 
> services
> 2016-12-23 09:10:38,800 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopping 
> JobHistoryEventHandler. Size of the outstanding queue size is 1
> 2016-12-23 09:10:38,989 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 
> AssignedReds:0 CompletedMaps:5 CompletedReds:0 ContAlloc:8 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-23 09:10:38,993 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received 
> completed container 

[jira] [Updated] (MAPREDUCE-6826) Job fails with InvalidStateTransitonException: Invalid event: JOB_TASK_COMPLETED at SUCCEEDED/COMMITTING

2020-05-13 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-6826:
-
Attachment: MAPREDUCE-6826-003.patch

> Job fails with InvalidStateTransitonException: Invalid event: 
> JOB_TASK_COMPLETED at SUCCEEDED/COMMITTING
> 
>
> Key: MAPREDUCE-6826
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6826
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Varun Saxena
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-6826-001.patch, MAPREDUCE-6826-002.patch, 
> MAPREDUCE-6826-003.patch
>
>
> This happens if a container is preempted by the scheduler after the job 
> starts committing, and the resulting exception leads to the application being 
> marked as FAILED in YARN.
> I think we can probably ignore the JOB_TASK_COMPLETED event while the JobImpl 
> state is COMMITTING or SUCCEEDED, as the job is in the process of finishing.
> Also, is there any point in attempting to schedule another task attempt if 
> the job is already in COMMITTING or SUCCEEDED state?
> {noformat}
> 2016-12-23 09:10:38,642 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
> task_1482404625971_23910_m_04 Task Transitioned from RUNNING to SUCCEEDED
> 2016-12-23 09:10:38,642 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 5
> 2016-12-23 09:10:38,643 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
> job_1482404625971_23910Job Transitioned from RUNNING to COMMITTING
> 2016-12-23 09:10:38,644 INFO [ContainerLauncher #5] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing 
> the event EventType: CONTAINER_REMOTE_CLEANUP for container 
> container_e55_1482404625971_23910_01_10 taskAttempt 
> attempt_1482404625971_23910_m_04_1
> 2016-12-23 09:10:38,644 INFO [ContainerLauncher #5] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING 
> attempt_1482404625971_23910_m_04_1
> 2016-12-23 09:10:38,644 INFO [ContainerLauncher #5] 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: 
> Opening proxy : linux-19:26009
> 2016-12-23 09:10:38,644 INFO [CommitterEvent Processor #4] 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing 
> the event EventType: JOB_COMMIT
> 2016-12-23 09:10:38,724 INFO [IPC Server handler 0 on 27113] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : 
> jvm_1482404625971_23910_m_60473139527690 asked for a task
> 2016-12-23 09:10:38,724 INFO [IPC Server handler 0 on 27113] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: 
> jvm_1482404625971_23910_m_60473139527690 is invalid and will be killed.
> 2016-12-23 09:10:38,797 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Calling handler for 
> JobFinishedEvent 
> 2016-12-23 09:10:38,797 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
> job_1482404625971_23910Job Transitioned from COMMITTING to SUCCEEDED
> 2016-12-23 09:10:38,798 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Job finished cleanly, 
> recording last MRAppMaster retry
> 2016-12-23 09:10:38,798 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify RMCommunicator 
> isAMLastRetry: true
> 2016-12-23 09:10:38,798 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator: RMCommunicator notified 
> that shouldUnregistered is: true
> 2016-12-23 09:10:38,799 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Notify JHEH isAMLastRetry: 
> true
> 2016-12-23 09:10:38,799 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: 
> JobHistoryEventHandler notified that forceJobCompletion is true
> 2016-12-23 09:10:38,799 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Calling stop for all the 
> services
> 2016-12-23 09:10:38,800 INFO [Thread-93] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Stopping 
> JobHistoryEventHandler. Size of the outstanding queue size is 1
> 2016-12-23 09:10:38,989 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:1 
> AssignedReds:0 CompletedMaps:5 CompletedReds:0 ContAlloc:8 ContRel:0 
> HostLocal:0 RackLocal:0
> 2016-12-23 09:10:38,993 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received 
> completed container 

[jira] [Resolved] (MAPREDUCE-7116) UT failure in TestMRTimelineEventHandling#testMRNewTimelineServiceEventHandling -Hadoop-3.1

2020-05-13 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T resolved MAPREDUCE-7116.
--
Resolution: Invalid

> UT failure in 
> TestMRTimelineEventHandling#testMRNewTimelineServiceEventHandling -Hadoop-3.1
> ---
>
> Key: MAPREDUCE-7116
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7116
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: K G Bakthavachalam
>Assignee: Bilwa S T
>Priority: Major
>
> Unit test failure in 
> TestMRTimelineEventHandling#testMRNewTimelineServiceEventHandling due to a 
> timeline server issue: java.io.IOException: Job didn't finish in 30 seconds 
> at 
> org.apache.hadoop.mapred.UtilsForTests.runJobSucceed(UtilsForTests.java:659)






[jira] [Comment Edited] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2020-05-09 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103381#comment-17103381
 ] 

Bilwa S T edited comment on MAPREDUCE-7169 at 5/9/20, 5:21 PM:
---

Hi [~ahussein]

What we are trying to achieve here is that a speculative attempt should not be 
launched on a faulty node. Even if the task gets killed, there is no point 
launching it on that node, since it will be slow there. This is the expected 
behaviour.
{quote} 
 * Assuming that a new speculative attempt is created. Following the 
implementation, the new attempt X will have blacklisted nodes and skipped racks 
relevant to the original taskAttempt Y
 * Assuming taskAttempt Y is killed before attempt X gets assigned.
 * The RMContainerAllocator would still assign a host to attemptX based on the 
dated blacklists.
 Is this the expected behavior? or it is supposed to clear attemptX' 
blacklisted nodes?{quote}
Yes, I think these two cases should be handled.
{quote} * Should that object be synchronized? I believe there are more than one 
thread reading/writing to that object. Perhaps changing 
{{taskAttemptToEventMapping}} to {{ConcurrentHashMap}} would be sufficient. 
What do you think?
 * In {{taskAttemptToEventMapping}}, the data is only removed when the 
taskAttempt is assigned. If taskAttempt is killed before being assigned, 
{{taskAttemptToEventMapping}} would still have the taskAttempt.{quote}
I will update this.
{quote} * Racks are going to be blacklisted too, not just nodes. I believe 
that the javadoc and description in default.xml should emphasize that enabling 
the flag also avoids the local rack unless no other rack is available for 
scheduling.{quote}
Actually, when a task attempt is killed, its Avataar defaults to VIRGIN; that 
is a defect which needs to be addressed. If a speculative task attempt is 
killed, it is relaunched as a normal task attempt.
{quote} * why do we need {{mapTaskAttemptToAvataar}} when each taskAttempt has 
a field called {{avataar}}?{quote}
How would you get the taskAttempt details in RMContainerAllocator?
{quote} - That's a design issue. One would expect that RequestEvent's lifetime 
should not survive the {{handle()}} call. Therefore, the metadata should be 
consumed by the handlers. In the patch, 
{{ContainerRequestEvent.blacklistedNodes}} could be a field in taskAttempt. 
Then you won't need the {{TaskAttemptBlacklistManager}} class.{quote}

Thanks 


was (Author: bilwast):
Hi [~ahussein]

What we are trying to achieve here is that a speculative attempt should not be 
launched on a faulty node. Even if the task gets killed, there is no point 
launching it on that node, since it will be slow there. This is the expected 
behaviour.
{quote} 
 * Assuming that a new speculative attempt is created. Following the 
implementation, the new attempt X will have blacklisted nodes and skipped racks 
relevant to the original taskAttempt Y
 * Assuming taskAttempt Y is killed before attempt X gets assigned.
 * The RMContainerAllocator would still assign a host to attemptX based on the 
dated blacklists.
 Is this the expected behavior? or it is supposed to clear attemptX' 
blacklisted nodes?{quote}
Yes, I think these two cases should be handled.
{quote} * Should that object be synchronized? I believe there are more than one 
thread reading/writing to that object. Perhaps changing 
{{taskAttemptToEventMapping}} to {{ConcurrentHashMap}} would be sufficient. 
What do you think?
 * In {{taskAttemptToEventMapping}}, the data is only removed when the 
taskAttempt is assigned. If taskAttempt is killed before being assigned, 
{{taskAttemptToEventMapping}} would still have the taskAttempt.{quote}
I will update this.
{quote} * Racks are going to be blacklisted too, not just nodes. I believe 
that the javadoc and description in default.xml should emphasize that enabling 
the flag also avoids the local rack unless no other rack is available for 
scheduling.{quote}
Actually, when a task attempt is killed, its Avataar defaults to VIRGIN; that 
is a defect which needs to be addressed. If a speculative task attempt is 
killed, it is relaunched as a normal task attempt.
{quote} * why do we need {{mapTaskAttemptToAvataar}} when each taskAttempt has 
a field called {{avataar}}?{quote}
How would you get the taskAttempt details in RMContainerAllocator?
{quote} - That's a design issue. One would expect that RequestEvent's lifetime 
should not survive the {{handle()}} call. Therefore, the metadata should be 
consumed by the handlers. In the patch, 
{{ContainerRequestEvent.blacklistedNodes}} could be a field in taskAttempt. 
Then you won't need the {{TaskAttemptBlacklistManager}} class.{quote}

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: 

[jira] [Commented] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2020-05-09 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103381#comment-17103381
 ] 

Bilwa S T commented on MAPREDUCE-7169:
--

Hi [~ahussein]

What we are trying to achieve here is that a speculative attempt should not be 
launched on a faulty node. Even if the task gets killed, there is no point 
launching it on that node, since it will be slow there. This is the expected 
behaviour.
{quote} 
 * Assuming that a new speculative attempt is created. Following the 
implementation, the new attempt X will have blacklisted nodes and skipped racks 
relevant to the original taskAttempt Y
 * Assuming taskAttempt Y is killed before attempt X gets assigned.
 * The RMContainerAllocator would still assign a host to attemptX based on the 
dated blacklists.
 Is this the expected behavior? or it is supposed to clear attemptX' 
blacklisted nodes?{quote}
Yes, I think these two cases should be handled.
{quote} * Should that object be synchronized? I believe there are more than one 
thread reading/writing to that object. Perhaps changing 
{{taskAttemptToEventMapping}} to {{ConcurrentHashMap}} would be sufficient. 
What do you think?
 * In {{taskAttemptToEventMapping}}, the data is only removed when the 
taskAttempt is assigned. If taskAttempt is killed before being assigned, 
{{taskAttemptToEventMapping}} would still have the taskAttempt.{quote}
I will update this.
{quote} * Racks are going to be blacklisted too, not just nodes. I believe 
that the javadoc and description in default.xml should emphasize that enabling 
the flag also avoids the local rack unless no other rack is available for 
scheduling.{quote}
Actually, when a task attempt is killed, its Avataar defaults to VIRGIN; that 
is a defect which needs to be addressed. If a speculative task attempt is 
killed, it is relaunched as a normal task attempt.
{quote} * why do we need {{mapTaskAttemptToAvataar}} when each taskAttempt has 
a field called {{avataar}}?{quote}
How would you get the taskAttempt details in RMContainerAllocator?
{quote} - That's a design issue. One would expect that RequestEvent's lifetime 
should not survive the {{handle()}} call. Therefore, the metadata should be 
consumed by the handlers. In the patch, 
{{ContainerRequestEvent.blacklistedNodes}} could be a field in taskAttempt. 
Then you won't need the {{TaskAttemptBlacklistManager}} class.{quote}
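The thread-safety suggestion above can be sketched as follows. This is a
minimal illustration with stand-in types (string attempt IDs rather than the
real MR {{TaskAttemptId}}), not the actual patch: a {{ConcurrentHashMap}}
replaces the plain map, and the entry is removed both on assignment and on
kill so that attempts killed before assignment do not leak entries.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: allocator and event-handler threads share this map safely.
final class BlacklistMapping {
    private final Map<String, List<String>> taskAttemptToEventMapping =
        new ConcurrentHashMap<>();

    /** Record the nodes a speculative attempt must avoid. */
    void record(String taskAttemptId, List<String> blacklistedNodes) {
        taskAttemptToEventMapping.put(taskAttemptId, blacklistedNodes);
    }

    /** Remove the entry when the attempt is assigned AND when it is killed,
     *  so killed-before-assignment attempts do not linger in the map. */
    List<String> consume(String taskAttemptId) {
        return taskAttemptToEventMapping.remove(taskAttemptId);
    }

    int size() {
        return taskAttemptToEventMapping.size();
    }
}
```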

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, MAPREDUCE-7169.004.patch, MAPREDUCE-7169.005.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, speculative execution may place the 
> speculative task on the same node as the original task. What I have read only 
> says that one more task attempt will be tried; I have not seen any mention 
> that it should avoid the same node. This is unreasonable: if a node has a 
> problem that makes task execution very slow, placing the speculative task on 
> the same node cannot help the struggling task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Comment Edited] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2020-04-29 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095502#comment-17095502
 ] 

Bilwa S T edited comment on MAPREDUCE-7169 at 4/29/20, 2:40 PM:


Hi [~ahussein]
{code:java}
The same node will be picked if there are no other available nodes. In 
MAPREDUCE-7169.005.patch , what is the expected behavior if the resources 
available to run the taskAttempt are only available on the same node? I do not 
see this case in the unit test.{code}

The container is not assigned until resources are available on another node; 
the task will wait until it gets a container on another node. Per the use 
case, we do not want the container to be launched on the same node, as that 
node might have a problem.


was (Author: bilwast):
Hi [~ahussein]
{code:java}
The same node will be picked if there are no other available nodes. In 
MAPREDUCE-7169.005.patch , what is the expected behavior if the resources 
available to run the taskAttempt are only available on the same node? I do not 
see this case in the unit test.{code}

The container is not assigned until resources are available on another node. 

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, MAPREDUCE-7169.004.patch, MAPREDUCE-7169.005.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, speculative execution may place the 
> speculative task on the same node as the original task. What I have read only 
> says that one more task attempt will be tried; I have not seen any mention 
> that it should avoid the same node. This is unreasonable: if a node has a 
> problem that makes task execution very slow, placing the speculative task on 
> the same node cannot help the struggling task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Commented] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2020-04-29 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095502#comment-17095502
 ] 

Bilwa S T commented on MAPREDUCE-7169:
--

Hi [~ahussein]
{code:java}
The same node will be picked if there are no other available nodes. In 
MAPREDUCE-7169.005.patch , what is the expected behavior if the resources 
available to run the taskAttempt are only available on the same node? I do not 
see this case in the unit test.{code}

The container is not assigned until resources are available on another node. 
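The placement rule discussed here (wait rather than reuse the suspect node)
can be illustrated with a simple candidate filter. The names below are
hypothetical; the real allocator works on ResourceRequests and blacklists,
not plain string lists.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch: pick candidate hosts for a speculative attempt, excluding the
// node running the original attempt. An empty result means the attempt
// keeps waiting instead of falling back to the excluded node.
final class SpeculativePlacement {
    static List<String> candidates(List<String> nodesWithCapacity,
                                   Set<String> blacklistedNodes) {
        return nodesWithCapacity.stream()
            .filter(node -> !blacklistedNodes.contains(node))
            .collect(Collectors.toList());
    }
}
```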

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, MAPREDUCE-7169.004.patch, MAPREDUCE-7169.005.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, speculative execution may place the 
> speculative task on the same node as the original task. What I have read only 
> says that one more task attempt will be tried; I have not seen any mention 
> that it should avoid the same node. This is unreasonable: if a node has a 
> problem that makes task execution very slow, placing the speculative task on 
> the same node cannot help the struggling task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Commented] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2020-04-28 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094600#comment-17094600
 ] 

Bilwa S T commented on MAPREDUCE-7169:
--

Hi [~jeagles]

Could you please review this when you have some free time?

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, MAPREDUCE-7169.004.patch, MAPREDUCE-7169.005.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, speculative execution may place the 
> speculative task on the same node as the original task. What I have read only 
> says that one more task attempt will be tried; I have not seen any mention 
> that it should avoid the same node. This is unreasonable: if a node has a 
> problem that makes task execution very slow, placing the speculative task on 
> the same node cannot help the struggling task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Commented] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2020-04-15 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084024#comment-17084024
 ] 

Bilwa S T commented on MAPREDUCE-7169:
--

Hi [~jeagles]

Could you please help review this?

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, MAPREDUCE-7169.004.patch, MAPREDUCE-7169.005.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, Speculative Execution may place the 
> speculative task on the same node as the original task. From what I have 
> read, it only tries to launch one more task attempt; I haven't seen any 
> mention that it should not run on the same node. This is unreasonable: if 
> the node has problems that make task execution very slow, then placing the 
> speculative task on the same node cannot help the problematic task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Commented] (MAPREDUCE-7199) HsJobsBlock reuse JobACLsManager for checkAccess

2020-04-13 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082254#comment-17082254
 ] 

Bilwa S T commented on MAPREDUCE-7199:
--

Hi [~surendrasingh],
I have attached a new patch. As part of the MAPREDUCE-7097 addendum, it was 
added that the job owner and the MR admin should be able to view a job when 
filtering the entity list by user is enabled. Since JobACLsManager takes care 
of verifying the admin ACL, there is no need to pass the admin ACL separately.
Please review the latest patch.

> HsJobsBlock reuse JobACLsManager for checkAccess
> 
>
> Key: MAPREDUCE-7199
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7199
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: MAPREDUCE-7199-001.patch, MAPREDUCE-7199.002.patch, 
> MAPREDUCE-7199.003.patch
>
>
> Reuse JobAclManager.checkAccess
> {code} 
>  private boolean checkAccess(String userName) {
> if(!areAclsEnabled) {
>   return true;
> }
> // User could see its own job.
> if (ugi.getShortUserName().equals(userName)) {
>   return true;
> }
> // Admin could also see all jobs
> if (adminAclList != null && adminAclList.isUserAllowed(ugi)) {
>   return true;
> }
> return false;
>   }
> {code} 
> {code}
> jobACLsManager
>   .checkAccess(ugi, JobACL.VIEW_JOB, ..
>   new AccessControlList()))
> {code}
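The quoted checkAccess logic can be condensed into a small self-contained sketch. This is a hypothetical stand-in (plain strings instead of UserGroupInformation, a Set of names instead of AccessControlList), not the actual Hadoop code; it only illustrates the decision order the issue wants centralized in JobACLsManager:

```java
import java.util.Set;

/**
 * Hypothetical stand-in for HsJobsBlock#checkAccess: plain strings replace
 * UserGroupInformation and a Set of user names replaces AccessControlList.
 */
public class CheckAccessSketch {
    public static boolean checkAccess(boolean aclsEnabled, String currentUser,
                                      String jobOwner, Set<String> admins) {
        if (!aclsEnabled) {
            return true;                     // ACLs disabled: everyone may view
        }
        if (currentUser.equals(jobOwner)) {
            return true;                     // a user can see their own job
        }
        return admins.contains(currentUser); // admins can see all jobs
    }

    public static void main(String[] args) {
        Set<String> admins = Set.of("mradmin");
        System.out.println(checkAccess(true, "alice", "alice", admins));   // true
        System.out.println(checkAccess(true, "bob", "alice", admins));     // false
        System.out.println(checkAccess(true, "mradmin", "alice", admins)); // true
    }
}
```

Centralizing these three checks in one place, as JobACLsManager#checkAccess does, removes the duplicated owner/admin logic the issue points out.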






[jira] [Updated] (MAPREDUCE-7199) HsJobsBlock reuse JobACLsManager for checkAccess

2020-04-13 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7199:
-
Attachment: MAPREDUCE-7199.003.patch

> HsJobsBlock reuse JobACLsManager for checkAccess
> 
>
> Key: MAPREDUCE-7199
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7199
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: MAPREDUCE-7199-001.patch, MAPREDUCE-7199.002.patch, 
> MAPREDUCE-7199.003.patch
>
>
> Reuse JobAclManager.checkAccess
> {code} 
>  private boolean checkAccess(String userName) {
> if(!areAclsEnabled) {
>   return true;
> }
> // User could see its own job.
> if (ugi.getShortUserName().equals(userName)) {
>   return true;
> }
> // Admin could also see all jobs
> if (adminAclList != null && adminAclList.isUserAllowed(ugi)) {
>   return true;
> }
> return false;
>   }
> {code} 
> {code}
> jobACLsManager
>   .checkAccess(ugi, JobACL.VIEW_JOB, ..
>   new AccessControlList()))
> {code}






[jira] [Updated] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2020-04-11 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7169:
-
Attachment: MAPREDUCE-7169.005.patch

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, MAPREDUCE-7169.004.patch, MAPREDUCE-7169.005.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, Speculative Execution may place the 
> speculative task on the same node as the original task. From what I have 
> read, it only tries to launch one more task attempt; I haven't seen any 
> mention that it should not run on the same node. This is unreasonable: if 
> the node has problems that make task execution very slow, then placing the 
> speculative task on the same node cannot help the problematic task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Commented] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2020-04-11 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081325#comment-17081325
 ] 

Bilwa S T commented on MAPREDUCE-7169:
--

cc [~ahussein] [~jeagles] [~jiwq]

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, MAPREDUCE-7169.004.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, Speculative Execution may place the 
> speculative task on the same node as the original task. From what I have 
> read, it only tries to launch one more task attempt; I haven't seen any 
> mention that it should not run on the same node. This is unreasonable: if 
> the node has problems that make task execution very slow, then placing the 
> speculative task on the same node cannot help the problematic task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Updated] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2020-04-11 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7169:
-
Attachment: MAPREDUCE-7169.004.patch

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, MAPREDUCE-7169.004.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, Speculative Execution may place the 
> speculative task on the same node as the original task. From what I have 
> read, it only tries to launch one more task attempt; I haven't seen any 
> mention that it should not run on the same node. This is unreasonable: if 
> the node has problems that make task execution very slow, then placing the 
> speculative task on the same node cannot help the problematic task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Comment Edited] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2020-04-11 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081201#comment-17081201
 ] 

Bilwa S T edited comment on MAPREDUCE-7169 at 4/11/20, 7:32 AM:


Hi [~ahussein],

Sorry for the late reply.
{quote}I see that you add the node hosting the original task to the blacklist 
of the speculative task. Wouldn't it be easier just to change the order of the 
dataLocalHosts so that the node will be picked last in the loop? In that case, 
the speculative task will run on the same node only if all other nodes cannot 
be assigned to the speculative task.
{quote}
With a change like this, there is still a chance that the task attempt will be 
launched on the same node where it originally ran, which wouldn't solve the 
problem. If we blacklist the node, then in the next iteration of container 
assignment it will be launched on another node.
{quote}My intuition is that changing the policy to pick the node for the 
speculative task will inherently change the efficiency of the speculation.
For example, picking a different node may increase the startup time of the 
speculative task. This implies change of the speculation efficiency compared to 
the legacy behavior. Thus, I suggest to give the option for the user to 
enable/disable the new policy in case she prefers to evaluate the new behavior 
and revert back to the legacy one if necessary.
{quote}

I agree with this. I will change the code accordingly; I am currently working on it.
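The two ideas in this exchange — blacklisting the original host for the speculative attempt, and gating the new policy behind a switch the user can disable — can be sketched together. All names here (pickNodeForSpeculation, avoidOriginalNode) are hypothetical illustrations, not the actual MAPREDUCE-7169 patch:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch of the placement policy discussed above: when choosing a host for a
 * speculative attempt, exclude the node running the original attempt. The
 * method and flag names are hypothetical, not the real Hadoop code.
 */
public class SpeculationPlacementSketch {
    // Stand-in for the proposed enable/disable switch for the new policy.
    static final boolean AVOID_ORIGINAL_NODE = true;

    public static String pickNodeForSpeculation(List<String> candidateHosts,
                                                String originalHost) {
        Set<String> blacklist = new HashSet<>();
        if (AVOID_ORIGINAL_NODE) {
            blacklist.add(originalHost);   // exclude the (possibly slow) node
        }
        for (String host : candidateHosts) {
            if (!blacklist.contains(host)) {
                return host;               // first non-blacklisted host wins
            }
        }
        // No other host available: fall back to the original node.
        return originalHost;
    }

    public static void main(String[] args) {
        System.out.println(pickNodeForSpeculation(
            List.of("nodeA", "nodeB"), "nodeA")); // nodeB
        System.out.println(pickNodeForSpeculation(
            List.of("nodeA"), "nodeA"));          // nodeA (fallback)
    }
}
```

With the flag off, the blacklist stays empty and the legacy behavior (first candidate host, possibly the original node) is preserved — which is the revert path suggested in the review.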


was (Author: bilwast):
Hi [~ahussein],

Sorry for the late reply.
{quote}I see that you add the node hosting the original task to the blacklist 
of the speculative task. Wouldn't it be easier just to change the order of the 
dataLocalHosts so that the node will be picked last in the loop? In that case, 
the speculative task will run on the same node only if all other nodes cannot 
be assigned to the speculative task.
{quote}

With a change like this, there is still a chance that the task attempt will be 
launched on the same node where it originally ran, which wouldn't solve the 
problem. If we blacklist the node, then in the next iteration of container 
assignment it will be launched on another node.

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, Speculative Execution may place the 
> speculative task on the same node as the original task. From what I have 
> read, it only tries to launch one more task attempt; I haven't seen any 
> mention that it should not run on the same node. This is unreasonable: if 
> the node has problems that make task execution very slow, then placing the 
> speculative task on the same node cannot help the problematic task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Commented] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2020-04-11 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17081201#comment-17081201
 ] 

Bilwa S T commented on MAPREDUCE-7169:
--

Hi [~ahussein],

Sorry for the late reply.
{quote}I see that you add the node hosting the original task to the blacklist 
of the speculative task. Wouldn't it be easier just to change the order of the 
dataLocalHosts so that the node will be picked last in the loop? In that case, 
the speculative task will run on the same node only if all other nodes cannot 
be assigned to the speculative task.
{quote}

With a change like this, there is still a chance that the task attempt will be 
launched on the same node where it originally ran, which wouldn't solve the 
problem. If we blacklist the node, then in the next iteration of container 
assignment it will be launched on another node.

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, Speculative Execution may place the 
> speculative task on the same node as the original task. From what I have 
> read, it only tries to launch one more task attempt; I haven't seen any 
> mention that it should not run on the same node. This is unreasonable: if 
> the node has problems that make task execution very slow, then placing the 
> speculative task on the same node cannot help the problematic task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Commented] (MAPREDUCE-7199) HsJobsBlock reuse JobACLsManager for checkAccess

2020-04-10 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080617#comment-17080617
 ] 

Bilwa S T commented on MAPREDUCE-7199:
--

Thanks [~surendrasingh] for the review comments.

Instead of creating a JobACLsManager object and calling 
JobACLsManager#checkAccess, we can directly call
{code:java}
job.checkAccess(ugi, JobACL.VIEW_JOB)
{code}
which calls JobACLsManager#checkAccess internally. I have uploaded a patch. Please review.
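The delegation described here — the page asking the Job, and the Job deferring to a single shared ACL manager — can be sketched with simplified stand-in types. These are not the real Hadoop Job or JobACLsManager interfaces, just an illustration of the shape of the refactoring:

```java
/**
 * Sketch of the delegation discussed above: instead of HsJobsBlock
 * re-implementing the ACL check, it asks the Job, which defers to one shared
 * ACL manager. All types here are simplified stand-ins, not the Hadoop API.
 */
public class DelegationSketch {
    public interface Job {
        // Stands in for Job#checkAccess(ugi, JobACL.VIEW_JOB).
        boolean checkAccess(String user);
    }

    // Single place that knows about ownership and admin ACLs
    // (here the admin is hard-coded to "mradmin" for illustration).
    public static class AclManager {
        public boolean check(String user, String owner) {
            return user.equals(owner) || user.equals("mradmin");
        }
    }

    public static Job newJob(String owner, AclManager acls) {
        // The Job delegates; no duplicate owner/admin logic in the caller.
        return user -> acls.check(user, owner);
    }

    public static void main(String[] args) {
        Job job = newJob("alice", new AclManager());
        System.out.println(job.checkAccess("alice")); // true: owner
        System.out.println(job.checkAccess("bob"));   // false
    }
}
```

The design point is that callers such as the history-server web blocks never see the admin ACL at all, so it cannot drift out of sync with the central check.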

> HsJobsBlock reuse JobACLsManager for checkAccess
> 
>
> Key: MAPREDUCE-7199
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7199
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: MAPREDUCE-7199-001.patch, MAPREDUCE-7199.002.patch
>
>
> Reuse JobAclManager.checkAccess
> {code} 
>  private boolean checkAccess(String userName) {
> if(!areAclsEnabled) {
>   return true;
> }
> // User could see its own job.
> if (ugi.getShortUserName().equals(userName)) {
>   return true;
> }
> // Admin could also see all jobs
> if (adminAclList != null && adminAclList.isUserAllowed(ugi)) {
>   return true;
> }
> return false;
>   }
> {code} 
> {code}
> jobACLsManager
>   .checkAccess(ugi, JobACL.VIEW_JOB, ..
>   new AccessControlList()))
> {code}






[jira] [Updated] (MAPREDUCE-7199) HsJobsBlock reuse JobACLsManager for checkAccess

2020-04-10 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7199:
-
Attachment: MAPREDUCE-7199.002.patch

> HsJobsBlock reuse JobACLsManager for checkAccess
> 
>
> Key: MAPREDUCE-7199
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7199
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Bibin Chundatt
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: MAPREDUCE-7199-001.patch, MAPREDUCE-7199.002.patch
>
>
> Reuse JobAclManager.checkAccess
> {code} 
>  private boolean checkAccess(String userName) {
> if(!areAclsEnabled) {
>   return true;
> }
> // User could see its own job.
> if (ugi.getShortUserName().equals(userName)) {
>   return true;
> }
> // Admin could also see all jobs
> if (adminAclList != null && adminAclList.isUserAllowed(ugi)) {
>   return true;
> }
> return false;
>   }
> {code} 
> {code}
> jobACLsManager
>   .checkAccess(ugi, JobACL.VIEW_JOB, ..
>   new AccessControlList()))
> {code}






[jira] [Commented] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2019-12-21 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001832#comment-17001832
 ] 

Bilwa S T commented on MAPREDUCE-7169:
--

Hi [~jeagles], can you please review the patch?

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, Speculative Execution may place the 
> speculative task on the same node as the original task. From what I have 
> read, it only tries to launch one more task attempt; I haven't seen any 
> mention that it should not run on the same node. This is unreasonable: if 
> the node has problems that make task execution very slow, then placing the 
> speculative task on the same node cannot help the problematic task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Updated] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2019-11-26 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7169:
-
Attachment: MAPREDUCE-7169-003.patch

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> MAPREDUCE-7169-003.patch, image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, Speculative Execution may place the 
> speculative task on the same node as the original task. From what I have 
> read, it only tries to launch one more task attempt; I haven't seen any 
> mention that it should not run on the same node. This is unreasonable: if 
> the node has problems that make task execution very slow, then placing the 
> speculative task on the same node cannot help the problematic task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Commented] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2019-11-26 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16983178#comment-16983178
 ] 

Bilwa S T commented on MAPREDUCE-7169:
--

Hi [~jeagles], I will update it by EOD.

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, Speculative Execution may place the 
> speculative task on the same node as the original task. From what I have 
> read, it only tries to launch one more task attempt; I haven't seen any 
> mention that it should not run on the same node. This is unreasonable: if 
> the node has problems that make task execution very slow, then placing the 
> speculative task on the same node cannot help the problematic task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Commented] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2019-11-16 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975784#comment-16975784
 ] 

Bilwa S T commented on MAPREDUCE-7169:
--

[~jiwq], sorry, I missed your comment on the patch. I have uploaded the latest 
patch. Please check.

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, Speculative Execution may place the 
> speculative task on the same node as the original task. From what I have 
> read, it only tries to launch one more task attempt; I haven't seen any 
> mention that it should not run on the same node. This is unreasonable: if 
> the node has problems that make task execution very slow, then placing the 
> speculative task on the same node cannot help the problematic task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Commented] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2019-11-16 Thread Bilwa S T (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975783#comment-16975783
 ] 

Bilwa S T commented on MAPREDUCE-7169:
--

[~ahussein] I have updated the patch.

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, Speculative Execution may place the 
> speculative task on the same node as the original task. From what I have 
> read, it only tries to launch one more task attempt; I haven't seen any 
> mention that it should not run on the same node. This is unreasonable: if 
> the node has problems that make task execution very slow, then placing the 
> speculative task on the same node cannot help the problematic task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Updated] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2019-11-16 Thread Bilwa S T (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7169:
-
Attachment: MAPREDUCE-7169-002.patch

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, MAPREDUCE-7169-002.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, Speculative Execution may place the 
> speculative task on the same node as the original task. From what I have 
> read, it only tries to launch one more task attempt; I haven't seen any 
> mention that it should not run on the same node. This is unreasonable: if 
> the node has problems that make task execution very slow, then placing the 
> speculative task on the same node cannot help the problematic task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Commented] (MAPREDUCE-7199) HsJobsBlock reuse JobACLsManager for checkAccess

2019-04-26 Thread Bilwa S T (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826671#comment-16826671
 ] 

Bilwa S T commented on MAPREDUCE-7199:
--

Hi [~bibinchundatt], I have attached a patch. Please review.

> HsJobsBlock reuse JobACLsManager for checkAccess
> 
>
> Key: MAPREDUCE-7199
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7199
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: MAPREDUCE-7199-001.patch
>
>
> Reuse JobAclManager.checkAccess
> {code} 
>  private boolean checkAccess(String userName) {
> if(!areAclsEnabled) {
>   return true;
> }
> // User could see its own job.
> if (ugi.getShortUserName().equals(userName)) {
>   return true;
> }
> // Admin could also see all jobs
> if (adminAclList != null && adminAclList.isUserAllowed(ugi)) {
>   return true;
> }
> return false;
>   }
> {code} 
> {code}
> jobACLsManager
>   .checkAccess(ugi, JobACL.VIEW_JOB, ..
>   new AccessControlList()))
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7199) HsJobsBlock reuse JobACLsManager for checkAccess

2019-04-26 Thread Bilwa S T (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7199:
-
Attachment: MAPREDUCE-7199-001.patch

> HsJobsBlock reuse JobACLsManager for checkAccess
> 
>
> Key: MAPREDUCE-7199
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7199
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: MAPREDUCE-7199-001.patch
>
>
> Reuse JobAclManager.checkAccess
> {code} 
>  private boolean checkAccess(String userName) {
> if(!areAclsEnabled) {
>   return true;
> }
> // User could see its own job.
> if (ugi.getShortUserName().equals(userName)) {
>   return true;
> }
> // Admin could also see all jobs
> if (adminAclList != null && adminAclList.isUserAllowed(ugi)) {
>   return true;
> }
> return false;
>   }
> {code} 
> {code}
> jobACLsManager
>   .checkAccess(ugi, JobACL.VIEW_JOB, ..
>   new AccessControlList()))
> {code}






[jira] [Updated] (MAPREDUCE-7199) HsJobsBlock reuse JobACLsManager for checkAccess

2019-04-26 Thread Bilwa S T (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7199:
-
Status: Patch Available  (was: Open)

> HsJobsBlock reuse JobACLsManager for checkAccess
> 
>
> Key: MAPREDUCE-7199
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7199
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Minor
> Attachments: MAPREDUCE-7199-001.patch
>
>
> Reuse JobAclManager.checkAccess
> {code} 
>  private boolean checkAccess(String userName) {
> if(!areAclsEnabled) {
>   return true;
> }
> // User could see its own job.
> if (ugi.getShortUserName().equals(userName)) {
>   return true;
> }
> // Admin could also see all jobs
> if (adminAclList != null && adminAclList.isUserAllowed(ugi)) {
>   return true;
> }
> return false;
>   }
> {code} 
> {code}
> jobACLsManager
>   .checkAccess(ugi, JobACL.VIEW_JOB, ..
>   new AccessControlList()))
> {code}






[jira] [Assigned] (MAPREDUCE-7199) HsJobsBlock reuse JobACLsManager for checkAccess

2019-04-23 Thread Bilwa S T (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T reassigned MAPREDUCE-7199:


Assignee: Bilwa S T

> HsJobsBlock reuse JobACLsManager for checkAccess
> 
>
> Key: MAPREDUCE-7199
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7199
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Minor
>
> Reuse JobAclManager.checkAccess
> {code} 
>  private boolean checkAccess(String userName) {
> if(!areAclsEnabled) {
>   return true;
> }
> // User could see its own job.
> if (ugi.getShortUserName().equals(userName)) {
>   return true;
> }
> // Admin could also see all jobs
> if (adminAclList != null && adminAclList.isUserAllowed(ugi)) {
>   return true;
> }
> return false;
>   }
> {code} 
> {code}
> jobACLsManager
>   .checkAccess(ugi, JobACL.VIEW_JOB, ..
>   new AccessControlList()))
> {code}






[jira] [Assigned] (MAPREDUCE-7116) UT failure in TestMRTimelineEventHandling#testMRNewTimelineServiceEventHandling -Hadoop-3.1

2019-04-18 Thread Bilwa S T (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T reassigned MAPREDUCE-7116:


Assignee: Bilwa S T

> UT failure in 
> TestMRTimelineEventHandling#testMRNewTimelineServiceEventHandling -Hadoop-3.1
> ---
>
> Key: MAPREDUCE-7116
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7116
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: K G Bakthavachalam
>Assignee: Bilwa S T
>Priority: Major
>
> Unit test failure in 
> TestMRTimelineEventHandling#testMRNewTimelineServiceEventHandling due to a 
> timeline server issue: java.io.IOException: Job didn't finish in 
> 30 seconds at 
> org.apache.hadoop.mapred.UtilsForTests.runJobSucceed(UtilsForTests.java:659)






[jira] [Updated] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2019-03-21 Thread Bilwa S T (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7169:
-
Attachment: MAPREDUCE-7169-001.patch

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, speculative execution may place the 
> speculative task on the same node as the original task. From what I have read, 
> it only tries to launch one more task attempt; I haven't seen any mention that 
> it must not be on the same node. This is unreasonable: if the node has problems 
> that make task execution very slow, placing the speculative task on the same 
> node cannot help the problematic task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Commented] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2019-03-21 Thread Bilwa S T (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798074#comment-16798074
 ] 

Bilwa S T commented on MAPREDUCE-7169:
--

cc [~bibinchundatt] [~jlowe]

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, speculative execution may place the 
> speculative task on the same node as the original task. From what I have read, 
> it only tries to launch one more task attempt; I haven't seen any mention that 
> it must not be on the same node. This is unreasonable: if the node has problems 
> that make task execution very slow, placing the speculative task on the same 
> node cannot help the problematic task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Updated] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2019-03-21 Thread Bilwa S T (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T updated MAPREDUCE-7169:
-
Status: Patch Available  (was: Open)

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: MAPREDUCE-7169-001.patch, 
> image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, speculative execution may place the 
> speculative task on the same node as the original task. From what I have read, 
> it only tries to launch one more task attempt; I haven't seen any mention that 
> it must not be on the same node. This is unreasonable: if the node has problems 
> that make task execution very slow, placing the speculative task on the same 
> node cannot help the problematic task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Assigned] (MAPREDUCE-7195) Mapreduce task timeout to zero could cause too many status update

2019-03-21 Thread Bilwa S T (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bilwa S T reassigned MAPREDUCE-7195:


Assignee: Bilwa S T

> Mapreduce task timeout to zero could cause too many status update
> -
>
> Key: MAPREDUCE-7195
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7195
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bilwa S T
>Priority: Major
> Attachments: screenshot-1.png
>
>
> Setting mapreduce.task.timeout=0 can cause too many status updates:
> {code}
>   public static long getTaskProgressReportInterval(final Configuration conf) {
>     long taskHeartbeatTimeOut = conf.getLong(
>         MRJobConfig.TASK_TIMEOUT, MRJobConfig.DEFAULT_TASK_TIMEOUT_MILLIS);
>     return conf.getLong(MRJobConfig.TASK_PROGRESS_REPORT_INTERVAL,
>         (long) (TASK_REPORT_INTERVAL_TO_TIMEOUT_RATIO * taskHeartbeatTimeOut));
>   }
> {code}
> mapreduce.task.timeout=0 is used to disable the timeout feature.
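The arithmetic in the snippet above can be checked in isolation: when no explicit report interval is configured and the timeout is 0, the derived progress-report interval is 0 ms, so the reporter loops without any delay. A minimal sketch; the 0.01 ratio is an assumed illustrative value standing in for TASK_REPORT_INTERVAL_TO_TIMEOUT_RATIO, not necessarily Hadoop's actual constant.

```java
public class ReportIntervalSketch {
    // Assumed illustrative ratio; stands in for
    // TASK_REPORT_INTERVAL_TO_TIMEOUT_RATIO in the snippet above.
    static final double RATIO = 0.01;

    // Mirrors the derivation in getTaskProgressReportInterval when no explicit
    // report interval is configured: interval = ratio * timeout.
    static long progressReportInterval(long taskTimeoutMs) {
        return (long) (RATIO * taskTimeoutMs);
    }

    public static void main(String[] args) {
        System.out.println(progressReportInterval(600_000)); // normal timeout -> sane interval
        System.out.println(progressReportInterval(0));       // timeout disabled -> 0 ms interval
    }
}
```

A 0 ms interval means a status update on every loop iteration, matching the flood described here; a fix would need to special-case a zero (disabled) timeout.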






[jira] [Comment Edited] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2019-03-17 Thread Bilwa S T (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794711#comment-16794711
 ] 

Bilwa S T edited comment on MAPREDUCE-7169 at 3/18/19 4:57 AM:
---

Hi [~uranus]

Same as what [~bibinchundatt] had suggested: we can skip the nodes where the 
previous attempts were launched.
{quote}
* In TaskImpl#addAndScheduleAttempt, whenever the Avataar is SPECULATIVE, 
update the previous attempts' nodes and racks.
* In TaskAttemptImpl#RequestContainerTransition, when the Avataar is 
SPECULATIVE, we can skip the previous attempts' nodes and racks. We also need 
to keep a record of blacklisted nodes.
* During allocation for mappers in RMContainerAllocator#assignMapsWithLocality, 
we have three types of resource requests:
** Hosts - we have already updated the data hosts for the task attempt.
** Racks - this is also taken care of in TaskAttemptImpl.
** Any - in this case we need the blacklisted-node record we updated, so that 
we can check whether the node is blacklisted for the task. If it is 
blacklisted, we skip the allocation and it retries for another container.
{quote}
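The first two bullets amount to filtering the candidate hosts for a speculative attempt. Below is a hypothetical sketch of that filter; `candidateHosts`, its parameters, and the class name are invented for illustration and are not the MapReduce AM's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: when an attempt is SPECULATIVE, drop the hosts that
// earlier attempts of the same task already ran on before requesting containers.
public class SpeculativePlacementSketch {
    static List<String> candidateHosts(List<String> dataHosts,
                                       Set<String> previousAttemptHosts,
                                       boolean speculative) {
        if (!speculative) {
            return dataHosts;                 // original attempt: no restriction
        }
        List<String> filtered = new ArrayList<>();
        for (String host : dataHosts) {
            if (!previousAttemptHosts.contains(host)) {
                filtered.add(host);           // skip nodes the task already ran on
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        List<String> hosts = List.of("node1", "node2", "node3");
        // The first attempt ran on node1, so the speculative attempt avoids it.
        System.out.println(candidateHosts(hosts, Set.of("node1"), true));
    }
}
```

The same idea applies per rack; the ANY-request case instead consults the recorded blacklist at allocation time, as the third bullet describes.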


was (Author: bilwast):
Hi [~uranus]

Same as what [~bibinchundatt] had suggested. We can skip nodes where previous 
attempt was launched.
{quote} * In TaskImpl#addAndScheduleAttempt whenever Avataar is SPECULATIVE, 
update previous attempts' nodes and racks
 * In TaskAttemptImpl#RequestContainerTransition when Avataar is SPECULATIVE, 
we can skip previous attempts' nodes and racks. Also we need to keep a record 
of blacklisted nodes
 * During allocation for mappers in RMContainerAllocator#assignMapsWithLocality, 
we have three types of resource requests: 1. Hosts - we have already updated 
the data hosts for the task attempt 2. Racks - this is also taken care of in 
TaskAttemptImpl 3. Any - in this case we need the blacklisted-node record we 
had updated so that we can check if the node is blacklisted for the task. If it 
is blacklisted then we skip allocating and it would retry for another container 
{quote}

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, speculative execution may place the 
> speculative task on the same node as the original task. From what I have read, 
> it only tries to launch one more task attempt; I haven't seen any mention that 
> it must not be on the same node. This is unreasonable: if the node has problems 
> that make task execution very slow, placing the speculative task on the same 
> node cannot help the problematic task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Comment Edited] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2019-03-17 Thread Bilwa S T (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794711#comment-16794711
 ] 

Bilwa S T edited comment on MAPREDUCE-7169 at 3/18/19 4:53 AM:
---

Hi [~uranus]

Same as what [~bibinchundatt] had suggested. We can skip nodes where previous 
attempt was launched.
{quote} * In TaskImpl#addAndScheduleAttempt whenever Avataar is SPECULATIVE, 
update previous attempts' nodes and racks
 * In TaskAttemptImpl#RequestContainerTransition when Avataar is SPECULATIVE, 
we can skip previous attempts' nodes and racks. Also we need to keep a record 
of blacklisted nodes
 * During allocation for mappers in RMContainerAllocator#assignMapsWithLocality, 
we have three types of resource requests: 1. Hosts - we have already updated 
the data hosts for the task attempt 2. Racks - this is also taken care of in 
TaskAttemptImpl 3. Any - in this case we need the blacklisted-node record we 
had updated so that we can check if the node is blacklisted for the task. If it 
is blacklisted then we skip allocating and it would retry for another container 
{quote}


was (Author: bilwast):
Hi [~uranus]

Same as what [~bibinchundatt] had suggested. We can skip nodes where previous 
attempt was launched.

1. In TaskImpl#addAndScheduleAttempt whenever Avataar is SPECULATIVE, update 
previous attempts' nodes and racks
2. In TaskAttemptImpl#RequestContainerTransition when Avataar is SPECULATIVE, 
we can skip previous attempts' nodes and racks. Also we need to keep a record 
of blacklisted nodes
3. During allocation for mappers, i.e. in 
RMContainerAllocator#assignMapsWithLocality, we have three types of resource 
requests: 1. Hosts - we have already updated the data hosts for the task 
attempt 2. Racks - this is also taken care of in TaskAttemptImpl 3. Any - in 
this case we need the blacklisted-node record we had updated so that we can 
check if the node is blacklisted for the task. If it is blacklisted then we 
skip allocating and it would retry for another container 

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, speculative execution may place the 
> speculative task on the same node as the original task. From what I have read, 
> it only tries to launch one more task attempt; I haven't seen any mention that 
> it must not be on the same node. This is unreasonable: if the node has problems 
> that make task execution very slow, placing the speculative task on the same 
> node cannot help the problematic task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 






[jira] [Comment Edited] (MAPREDUCE-7169) Speculative attempts should not run on the same node

2019-03-17 Thread Bilwa S T (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16794711#comment-16794711
 ] 

Bilwa S T edited comment on MAPREDUCE-7169 at 3/18/19 4:56 AM:
---

Hi [~uranus]

Same as what [~bibinchundatt] had suggested. We can skip nodes where previous 
attempt was launched.
{quote} * In TaskImpl#addAndScheduleAttempt whenever Avataar is SPECULATIVE, 
update previous attempts' nodes and racks
 * In TaskAttemptImpl#RequestContainerTransition when Avataar is SPECULATIVE, 
we can skip previous attempts' nodes and racks. Also we need to keep a record 
of blacklisted nodes
 * During allocation for mappers in RMContainerAllocator#assignMapsWithLocality, 
we have three types of resource requests: 1. Hosts - we have already updated 
the data hosts for the task attempt 2. Racks - this is also taken care of in 
TaskAttemptImpl 3. Any - in this case we need the blacklisted-node record we 
had updated so that we can check if the node is blacklisted for the task. If it 
is blacklisted then we skip allocating and it would retry for another container 
{quote}


was (Author: bilwast):
Hi [~uranus]

Same as what [~bibinchundatt] had suggested. We can skip nodes where previous 
attempt was launched.
{quote} # In TaskImpl#addAndScheduleAttempt whenever Avataar is SPECULATIVE, 
update previous attempts' nodes and racks
 # In TaskAttemptImpl#RequestContainerTransition when Avataar is SPECULATIVE, 
we can skip previous attempts' nodes and racks. Also we need to keep a record 
of blacklisted nodes
 # During allocation for mappers in RMContainerAllocator#assignMapsWithLocality, 
we have three types of resource requests: 1. Hosts - we have already updated 
the data hosts for the task attempt 2. Racks - this is also taken care of in 
TaskAttemptImpl 3. Any - in this case we need the blacklisted-node record we 
had updated so that we can check if the node is blacklisted for the task. If it 
is blacklisted then we skip allocating and it would retry for another container 
{quote}

> Speculative attempts should not run on the same node
> 
>
> Key: MAPREDUCE-7169
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7169
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: yarn
>Affects Versions: 2.7.2
>Reporter: Lee chen
>Assignee: Bilwa S T
>Priority: Major
> Attachments: image-2018-12-03-09-54-07-859.png
>
>
>   I found that in all versions of YARN, speculative execution may place the 
> speculative task on the same node as the original task. From what I have read, 
> it only tries to launch one more task attempt; I haven't seen any mention that 
> it must not be on the same node. This is unreasonable: if the node has problems 
> that make task execution very slow, placing the speculative task on the same 
> node cannot help the problematic task.
>  In our cluster (version 2.7.2, 2700 nodes), this phenomenon appears 
> almost every day.
>  !image-2018-12-03-09-54-07-859.png! 





