[jira] [Created] (YARN-11640) capacity scheduler supports application priority with FairOrderingPolicy

2024-01-04 Thread Ming Chen (Jira)
Ming Chen created YARN-11640:


 Summary: capacity scheduler supports application priority with 
FairOrderingPolicy
 Key: YARN-11640
 URL: https://issues.apache.org/jira/browse/YARN-11640
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: yarn
Reporter: Ming Chen









[jira] [Commented] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802793#comment-17802793
 ] 

Shilun Fan commented on YARN-574:
-

Bulk update: moved all 3.4.0 non-blocker issues to 3.5.0; please move an issue 
back if it is a blocker.

> PrivateLocalizer does not support parallel resource download via 
> ContainerLocalizer
> ---
>
> Key: YARN-574
> URL: https://issues.apache.org/jira/browse/YARN-574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Omkar Vinit Joshi
>Assignee: Ajith S
>Priority: Major
> Attachments: YARN-574.03.patch, YARN-574.04.patch, YARN-574.05.patch, 
> YARN-574.1.patch, YARN-574.2.patch
>
>
> At present, private resources are downloaded in parallel only if multiple 
> containers request the same resource; otherwise the downloads are serial. 
> The protocol between PrivateLocalizer and ContainerLocalizer supports 
> multiple downloads; however, this is not used and only one resource is sent 
> for downloading at a time.
> I think we can increase / assure parallelism (even for a single container 
> requesting resources) for private/application resources by allowing multiple 
> downloads per ContainerLocalizer.
> Total Parallelism before
> = number of threads allotted for PublicLocalizer [public resources] + number 
> of containers [private and application resources]
> Total Parallelism after
> = number of threads allotted for PublicLocalizer [public resources] + number 
> of containers * max downloads per container [private and application resources]
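
For illustration, a rough worked example of those two bounds with made-up 
numbers (the thread count, container count, and proposed per-container 
download limit below are all hypothetical):

{code}
// Hypothetical sizing example for the parallelism formulas quoted above.
public class LocalizerParallelism {
  public static void main(String[] args) {
    int publicLocalizerThreads = 4;    // pool used by PublicLocalizer for public resources
    int runningContainers = 10;        // containers localizing private/app resources
    int maxDownloadsPerContainer = 4;  // proposed parallel downloads per ContainerLocalizer

    int before = publicLocalizerThreads + runningContainers;                           // 14
    int after = publicLocalizerThreads + runningContainers * maxDownloadsPerContainer; // 44
    System.out.println("before=" + before + " after=" + after);
  }
}
{code}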






[jira] [Updated] (YARN-574) PrivateLocalizer does not support parallel resource download via ContainerLocalizer

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-574:

Target Version/s: 3.5.0  (was: 3.4.0)

> PrivateLocalizer does not support parallel resource download via 
> ContainerLocalizer
> ---
>
> Key: YARN-574
> URL: https://issues.apache.org/jira/browse/YARN-574
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Omkar Vinit Joshi
>Assignee: Ajith S
>Priority: Major
> Attachments: YARN-574.03.patch, YARN-574.04.patch, YARN-574.05.patch, 
> YARN-574.1.patch, YARN-574.2.patch
>
>
> At present, private resources are downloaded in parallel only if multiple 
> containers request the same resource; otherwise the downloads are serial. 
> The protocol between PrivateLocalizer and ContainerLocalizer supports 
> multiple downloads; however, this is not used and only one resource is sent 
> for downloading at a time.
> I think we can increase / assure parallelism (even for a single container 
> requesting resources) for private/application resources by allowing multiple 
> downloads per ContainerLocalizer.
> Total Parallelism before
> = number of threads allotted for PublicLocalizer [public resources] + number 
> of containers [private and application resources]
> Total Parallelism after
> = number of threads allotted for PublicLocalizer [public resources] + number 
> of containers * max downloads per container [private and application resources]






[jira] [Commented] (YARN-1564) add workflow YARN services

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802788#comment-17802788
 ] 

Shilun Fan commented on YARN-1564:
--

Bulk update: moved all 3.4.0 non-blocker issues to 3.5.0; please move an issue 
back if it is a blocker.

> add workflow YARN services
> --
>
> Key: YARN-1564
> URL: https://issues.apache.org/jira/browse/YARN-1564
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: oct16-hard
> Attachments: YARN-1564-001.patch, YARN-1564-002.patch, 
> YARN-1564-003.patch
>
>   Original Estimate: 24h
>  Time Spent: 48h
>  Remaining Estimate: 0h
>
> I've been using some alternative composite services to help build workflows 
> of process execution in a YARN AM.
> They and their tests could be moved into YARN for use by others - this would 
> make it easier to build aggregate services in an AM.
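
As a rough illustration of the kind of aggregate service this describes, here 
is a minimal sketch built on Hadoop's existing 
org.apache.hadoop.service.CompositeService; the WorkflowService name and the 
way the children are wired are hypothetical, not the classes proposed in the 
attached patches.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.CompositeService;
import org.apache.hadoop.service.Service;

// Sketch: a parent service that groups child services so an AM can drive a
// whole workflow of steps through a single Service lifecycle.
public class WorkflowService extends CompositeService {

  public WorkflowService(String name, Service... children) {
    super(name);
    for (Service child : children) {
      // children are inited/started in the order added and stopped in reverse
      addService(child);
    }
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    // propagate configuration to every registered child
    super.serviceInit(conf);
  }
}
{code}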






[jira] [Updated] (YARN-1426) YARN Components need to unregister their beans upon shutdown

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-1426:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> YARN Components need to unregister their beans upon shutdown
> 
>
> Key: YARN-1426
> URL: https://issues.apache.org/jira/browse/YARN-1426
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.3.0, 3.0.0-alpha1
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
>  Labels: oct16-easy
> Attachments: YARN-1426.2.patch, YARN-1426.patch, YARN-1426.patch
>
>







[jira] [Updated] (YARN-867) Isolation of failures in aux services

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-867:

Target Version/s: 3.5.0  (was: 3.4.0)

> Isolation of failures in aux services 
> --
>
> Key: YARN-867
> URL: https://issues.apache.org/jira/browse/YARN-867
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Hitesh Shah
>Assignee: Xuan Gong
>Priority: Major
> Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, 
> YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, 
> YARN-867.sampleCode.2.patch
>
>
> Today, a malicious application can bring down the NM by sending bad data to a 
> service. For example, sending data to the ShuffleService such that it results 
> in any non-IOException will cause the NM's async dispatcher to exit, as the 
> service's INIT APP event is not handled properly. 
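
One way to picture the isolation being asked for is to fence each aux-service 
call so its failures stay local. The sketch below is only illustrative and 
fully self-contained; the generic "handler" stands in for a service's event 
handling and is not an actual NodeManager class.

{code}
import java.util.function.Consumer;

// Sketch: keep one aux service's unchecked exception from escaping into the
// NM's shared async dispatcher thread.
public final class IsolatedDispatch {
  public static <E> void dispatch(String serviceName, Consumer<E> handler, E event) {
    try {
      handler.accept(event);  // e.g. the application-init handling of one service
    } catch (Throwable t) {
      // Log and fail only this service instead of letting the exception
      // terminate the dispatcher that every other service depends on.
      System.err.println("Aux service " + serviceName + " failed handling " + event);
      t.printStackTrace();
    }
  }
}
{code}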






[jira] [Updated] (YARN-1564) add workflow YARN services

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-1564:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> add workflow YARN services
> --
>
> Key: YARN-1564
> URL: https://issues.apache.org/jira/browse/YARN-1564
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: api, nodemanager, resourcemanager
>Affects Versions: 2.8.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: oct16-hard
> Attachments: YARN-1564-001.patch, YARN-1564-002.patch, 
> YARN-1564-003.patch
>
>   Original Estimate: 24h
>  Time Spent: 48h
>  Remaining Estimate: 0h
>
> I've been using some alternative composite services to help build workflows 
> of process execution in a YARN AM.
> They and their tests could be moved into YARN for use by others - this would 
> make it easier to build aggregate services in an AM.






[jira] [Updated] (YARN-1946) need Public interface for WebAppUtils.getProxyHostAndPort

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-1946:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> need Public interface for WebAppUtils.getProxyHostAndPort
> -
>
> Key: YARN-1946
> URL: https://issues.apache.org/jira/browse/YARN-1946
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, webapp
>Affects Versions: 2.4.0
>Reporter: Thomas Graves
>Priority: Major
>
> ApplicationMasters are supposed to go through the ResourceManager web app 
> proxy if they have web UIs, so that they are properly secured.  There is currently 
> no public interface for Application Masters to conveniently get the proxy 
> host and port.  There is a function in WebAppUtils, but that class is 
> private.  
> We should provide this as a utility since any properly written AM will need 
> to do this.
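
Until such a public interface exists, an AM can approximate the lookup from 
configuration. The sketch below is an assumption-laden simplification: it uses 
the standard yarn.web-proxy.address and yarn.resourcemanager.webapp.address 
keys and ignores the RM HA and HTTPS-policy handling that the real WebAppUtils 
logic performs.

{code}
import org.apache.hadoop.conf.Configuration;

// Rough lookup of the proxy host:port an AM should use in its tracking URL:
// prefer a standalone web proxy, else fall back to the RM web app address.
public final class ProxyAddressLookup {
  static String getProxyHostAndPort(Configuration conf) {
    String proxy = conf.get("yarn.web-proxy.address");
    if (proxy == null || proxy.isEmpty()) {
      proxy = conf.get("yarn.resourcemanager.webapp.address", "0.0.0.0:8088");
    }
    return proxy;
  }
}
{code}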






[jira] [Commented] (YARN-1426) YARN Components need to unregister their beans upon shutdown

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802789#comment-17802789
 ] 

Shilun Fan commented on YARN-1426:
--

Bulk update: moved all 3.4.0 non-blocker issues to 3.5.0; please move an issue 
back if it is a blocker.

> YARN Components need to unregister their beans upon shutdown
> 
>
> Key: YARN-1426
> URL: https://issues.apache.org/jira/browse/YARN-1426
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.3.0, 3.0.0-alpha1
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
>  Labels: oct16-easy
> Attachments: YARN-1426.2.patch, YARN-1426.patch, YARN-1426.patch
>
>







[jira] [Commented] (YARN-1946) need Public interface for WebAppUtils.getProxyHostAndPort

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802787#comment-17802787
 ] 

Shilun Fan commented on YARN-1946:
--

Bulk update: moved all 3.4.0 non-blocker issues to 3.5.0; please move an issue 
back if it is a blocker.

> need Public interface for WebAppUtils.getProxyHostAndPort
> -
>
> Key: YARN-1946
> URL: https://issues.apache.org/jira/browse/YARN-1946
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, webapp
>Affects Versions: 2.4.0
>Reporter: Thomas Graves
>Priority: Major
>
> ApplicationMasters are supposed to go through the ResourceManager web app 
> proxy if they have web UIs, so that they are properly secured.  There is currently 
> no public interface for Application Masters to conveniently get the proxy 
> host and port.  There is a function in WebAppUtils, but that class is 
> private.  
> We should provide this as a utility since any properly written AM will need 
> to do this.






[jira] [Commented] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802785#comment-17802785
 ] 

Shilun Fan commented on YARN-2024:
--

Bulk update: moved all 3.4.0 non-blocker issues to 3.5.0; please move an issue 
back if it is a blocker.

> IOException in AppLogAggregatorImpl does not give stacktrace and leaves 
> aggregated TFile in a bad state.
> 
>
> Key: YARN-2024
> URL: https://issues.apache.org/jira/browse/YARN-2024
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Eric Payne
>Assignee: Xuan Gong
>Priority: Major
>
> Multiple issues were encountered when AppLogAggregatorImpl encountered an 
> IOException in AppLogAggregatorImpl#uploadLogsForContainer while aggregating 
> yarn-logs for an application that had very large (>150G each) error logs.
> - An IOException was encountered during the LogWriter#append call, and a 
> message was printed, but no stacktrace was provided. Message: "ERROR: 
> Couldn't upload logs for container_n_nnn_nn_nn. Skipping 
> this container."
> - After the IOException, the TFile is in a bad state, so subsequent calls to 
> LogWriter#append fail with the following stacktrace:
> 2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[LogAggregationService #17907,5,main] threw an Exception.
> java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE
> at 
> org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528)
> at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:262)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:128)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:164)
> ...
> - At this point, the yarn-logs cleaner still thinks the thread is 
> aggregating, so the huge yarn-logs never get cleaned up for that application.
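
On the first point, preserving the stacktrace is mostly a matter of passing 
the caught exception to the logger. A generic, self-contained sketch follows; 
the upload step and container id are stand-ins, not the actual 
AppLogAggregatorImpl code.

{code}
import java.io.IOException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Passing the caught exception as the last logger argument is what makes the
// full stacktrace appear instead of only the one-line error message.
public final class UploadLogging {
  private static final Logger LOG = LoggerFactory.getLogger(UploadLogging.class);

  static void uploadContainerLogs(String containerId) {
    try {
      upload(containerId);  // illustrative upload step
    } catch (IOException e) {
      LOG.error("Couldn't upload logs for " + containerId + ". Skipping this container.", e);
    }
  }

  private static void upload(String containerId) throws IOException {
    throw new IOException("simulated failure for " + containerId);
  }
}
{code}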






[jira] [Updated] (YARN-2024) IOException in AppLogAggregatorImpl does not give stacktrace and leaves aggregated TFile in a bad state.

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-2024:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> IOException in AppLogAggregatorImpl does not give stacktrace and leaves 
> aggregated TFile in a bad state.
> 
>
> Key: YARN-2024
> URL: https://issues.apache.org/jira/browse/YARN-2024
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Affects Versions: 0.23.10, 2.4.0
>Reporter: Eric Payne
>Assignee: Xuan Gong
>Priority: Major
>
> Multiple issues were encountered when AppLogAggregatorImpl encountered an 
> IOException in AppLogAggregatorImpl#uploadLogsForContainer while aggregating 
> yarn-logs for an application that had very large (>150G each) error logs.
> - An IOException was encountered during the LogWriter#append call, and a 
> message was printed, but no stacktrace was provided. Message: "ERROR: 
> Couldn't upload logs for container_n_nnn_nn_nn. Skipping 
> this container."
> - After the IOException, the TFile is in a bad state, so subsequent calls to 
> LogWriter#append fail with the following stacktrace:
> 2014-04-16 13:29:09,772 [LogAggregationService #17907] ERROR 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[LogAggregationService #17907,5,main] threw an Exception.
> java.lang.IllegalStateException: Incorrect state to start a new key: IN_VALUE
> at 
> org.apache.hadoop.io.file.tfile.TFile$Writer.prepareAppendKey(TFile.java:528)
> at 
> org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.append(AggregatedLogFormat.java:262)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainer(AppLogAggregatorImpl.java:128)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:164)
> ...
> - At this point, the yarn-logs cleaner still thinks the thread is 
> aggregating, so the huge yarn-logs never get cleaned up for that application.






[jira] [Updated] (YARN-2098) App priority support in Fair Scheduler

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-2098:
-
Target Version/s: 3.5.0

> App priority support in Fair Scheduler
> --
>
> Key: YARN-2098
> URL: https://issues.apache.org/jira/browse/YARN-2098
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Attachments: YARN-2098.patch, YARN-2098.patch
>
>
> This jira is created for supporting app priorities in the fair scheduler. 
> AppSchedulable hard-codes the priority of apps to 1; we should change this to 
> get the priority from ApplicationSubmissionContext.
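
A minimal sketch of the change being described, with the surrounding scheduler 
code paraphrased (only ApplicationSubmissionContext and Priority are real 
APIs; the helper itself is hypothetical):

{code}
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.Priority;

// Instead of treating every app as priority 1, read the priority the client
// set at submission time, keeping 1 as the fallback when none was provided.
public final class AppPriorityResolver {
  static int resolveAppPriority(ApplicationSubmissionContext submissionContext) {
    Priority submitted = submissionContext.getPriority();
    return (submitted != null) ? submitted.getPriority() : 1;
  }
}
{code}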






[jira] [Updated] (YARN-2031) YARN Proxy model doesn't support REST APIs in AMs

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-2031:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> YARN Proxy model doesn't support REST APIs in AMs
> -
>
> Key: YARN-2031
> URL: https://issues.apache.org/jira/browse/YARN-2031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: YARN-2031-002.patch, YARN-2031-003.patch, 
> YARN-2031-004.patch, YARN-2031-005.patch, YARN-2031.patch.001
>
>
> AMs can't support REST APIs because
> # the AM filter redirects all requests to the proxy with a 302 response (not 
> 307)
> # the proxy doesn't forward PUT/POST/DELETE verbs
> Either the AM filter needs to return 307 and the proxy needs to forward the 
> verbs, or the AM filter should not filter the REST part of the web site.
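
For the first half of that, a plain servlet-API sketch of what a 307 redirect 
looks like (this is not the actual AM filter code, just the mechanism):

{code}
import java.io.IOException;
import javax.servlet.http.HttpServletResponse;

// A 302 lets clients downgrade a redirected PUT/POST to GET; a 307 requires
// the same verb and body to be replayed against the proxy URL.
public final class ProxyRedirect {
  static void redirectToProxy(HttpServletResponse resp, String proxyUrl) throws IOException {
    resp.setStatus(HttpServletResponse.SC_TEMPORARY_REDIRECT);  // 307
    resp.setHeader("Location", proxyUrl);
  }
}
{code}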






[jira] [Commented] (YARN-2031) YARN Proxy model doesn't support REST APIs in AMs

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802784#comment-17802784
 ] 

Shilun Fan commented on YARN-2031:
--

Bulk update: moved all 3.4.0 non-blocker issues to 3.5.0; please move an issue 
back if it is a blocker.

> YARN Proxy model doesn't support REST APIs in AMs
> -
>
> Key: YARN-2031
> URL: https://issues.apache.org/jira/browse/YARN-2031
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: YARN-2031-002.patch, YARN-2031-003.patch, 
> YARN-2031-004.patch, YARN-2031-005.patch, YARN-2031.patch.001
>
>
> AMs can't support REST APIs because
> # the AM filter redirects all requests to the proxy with a 302 response (not 
> 307)
> # the proxy doesn't forward PUT/POST/DELETE verbs
> Either the AM filter needs to return 307 and the proxy needs to forward the 
> verbs, or the AM filter should not filter the REST part of the web site.






[jira] [Commented] (YARN-2098) App priority support in Fair Scheduler

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802783#comment-17802783
 ] 

Shilun Fan commented on YARN-2098:
--

Bulk update: moved all 3.4.0 non-blocker issues to 3.5.0; please move an issue 
back if it is a blocker.

> App priority support in Fair Scheduler
> --
>
> Key: YARN-2098
> URL: https://issues.apache.org/jira/browse/YARN-2098
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Attachments: YARN-2098.patch, YARN-2098.patch
>
>
> This jira is created for supporting app priorities in the fair scheduler. 
> AppSchedulable hard-codes the priority of apps to 1; we should change this to 
> get the priority from ApplicationSubmissionContext.






[jira] [Reopened] (YARN-2098) App priority support in Fair Scheduler

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reopened YARN-2098:
--
  Assignee: Shilun Fan

> App priority support in Fair Scheduler
> --
>
> Key: YARN-2098
> URL: https://issues.apache.org/jira/browse/YARN-2098
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Attachments: YARN-2098.patch, YARN-2098.patch
>
>
> This jira is created for supporting app priorities in the fair scheduler. 
> AppSchedulable hard-codes the priority of apps to 1; we should change this to 
> get the priority from ApplicationSubmissionContext.






[jira] [Updated] (YARN-2098) App priority support in Fair Scheduler

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-2098:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> App priority support in Fair Scheduler
> --
>
> Key: YARN-2098
> URL: https://issues.apache.org/jira/browse/YARN-2098
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Priority: Major
>  Labels: pull-request-available
> Attachments: YARN-2098.patch, YARN-2098.patch
>
>
> This jira is created for supporting app priorities in the fair scheduler. 
> AppSchedulable hard-codes the priority of apps to 1; we should change this to 
> get the priority from ApplicationSubmissionContext.






[jira] [Commented] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802782#comment-17802782
 ] 

Shilun Fan commented on YARN-2681:
--

Bulk update: moved all 3.4.0 non-blocker issues to 3.5.0; please move an issue 
back if it is a blocker.

> Support bandwidth enforcement for containers while reading from HDFS
> 
>
> Key: YARN-2681
> URL: https://issues.apache.org/jira/browse/YARN-2681
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.5.1
> Environment: Linux
>Reporter: Nam H. Do
>Priority: Major
> Attachments: Traffic Control Design.png, YARN-2681.001.patch, 
> YARN-2681.002.patch, YARN-2681.003.patch, YARN-2681.004.patch, 
> YARN-2681.005.patch, YARN-2681.patch
>
>
> To read/write data from HDFS, applications establish TCP/IP connections with 
> the datanode. The HDFS read can be controlled by using the Linux Traffic 
> Control (TC) subsystem on the data node to apply filters to the appropriate 
> connections.
> The current cgroups net_cls concept cannot be applied on the node where the 
> container is launched, nor on the data node, since:
> -   TC handles outgoing bandwidth only, so it cannot be set on the container 
> node (HDFS read = incoming data for the container)
> -   Since the HDFS data node is handled by only one process, it is not 
> possible to use net_cls to separate connections from different containers to 
> the datanode.
> Tasks:
> 1) Extend the Resource model to define a bandwidth enforcement rate
> 2) Monitor the TCP/IP connections established by the container handling 
> process and its child processes
> 3) Set Linux Traffic Control rules on the data node based on address:port 
> pairs in order to enforce bandwidth limits on outgoing data
> Concept: http://www.hit.bme.hu/~do/papers/EnforcementDesign.pdf
> Implementation: 
> http://www.hit.bme.hu/~dohoai/documents/HdfsTrafficControl.pdf
> http://www.hit.bme.hu/~dohoai/documents/HdfsTrafficControl_UML_diagram.png
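
As a rough illustration of task 3 only, the sketch below shells out to tc to 
attach a per-connection egress filter on the datanode side. The device name, 
class ids, and address/port are placeholders, it assumes a rate-limited HTB 
class 1:20 was created beforehand, and real code would need privileges and 
error handling.

{code}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Sketch: steer the outgoing datanode traffic of one container's HDFS read
// into an existing rate-limited class by matching destination address:port.
public final class TcFilter {
  static void limitConnection(String device, String dstAddr, int dstPort) throws IOException {
    List<String> cmd = Arrays.asList(
        "tc", "filter", "add", "dev", device, "protocol", "ip", "parent", "1:0",
        "prio", "1", "u32",
        "match", "ip", "dst", dstAddr + "/32",
        "match", "ip", "dport", String.valueOf(dstPort), "0xffff",
        "flowid", "1:20");  // 1:20 = pre-created rate-limited class
    new ProcessBuilder(cmd).inheritIO().start();
  }
}
{code}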






[jira] [Resolved] (YARN-2098) App priority support in Fair Scheduler

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan resolved YARN-2098.
--
Target Version/s:   (was: 3.5.0)
  Resolution: Done

> App priority support in Fair Scheduler
> --
>
> Key: YARN-2098
> URL: https://issues.apache.org/jira/browse/YARN-2098
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Ashwin Shankar
>Priority: Major
>  Labels: pull-request-available
> Attachments: YARN-2098.patch, YARN-2098.patch
>
>
> This jira is created for supporting app priorities in the fair scheduler. 
> AppSchedulable hard-codes the priority of apps to 1; we should change this to 
> get the priority from ApplicationSubmissionContext.






[jira] [Updated] (YARN-2684) FairScheduler: When failing an application due to changes in queue config or placement policy, indicate the cause.

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-2684:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> FairScheduler: When failing an application due to changes in queue config or 
> placement policy, indicate the cause.
> --
>
> Key: YARN-2684
> URL: https://issues.apache.org/jira/browse/YARN-2684
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.5.1
>Reporter: Karthik Kambatla
>Priority: Major
> Attachments: 0001-YARN-2684.patch, 0002-YARN-2684.patch
>
>
> YARN-2308 fixes this issue for CS; this JIRA is to fix it for FS. 






[jira] [Updated] (YARN-2681) Support bandwidth enforcement for containers while reading from HDFS

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-2681:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Support bandwidth enforcement for containers while reading from HDFS
> 
>
> Key: YARN-2681
> URL: https://issues.apache.org/jira/browse/YARN-2681
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 2.5.1
> Environment: Linux
>Reporter: Nam H. Do
>Priority: Major
> Attachments: Traffic Control Design.png, YARN-2681.001.patch, 
> YARN-2681.002.patch, YARN-2681.003.patch, YARN-2681.004.patch, 
> YARN-2681.005.patch, YARN-2681.patch
>
>
> To read/write data from HDFS, applications establish TCP/IP connections with 
> the datanode. The HDFS read can be controlled by using the Linux Traffic 
> Control (TC) subsystem on the data node to apply filters to the appropriate 
> connections.
> The current cgroups net_cls concept cannot be applied on the node where the 
> container is launched, nor on the data node, since:
> -   TC handles outgoing bandwidth only, so it cannot be set on the container 
> node (HDFS read = incoming data for the container)
> -   Since the HDFS data node is handled by only one process, it is not 
> possible to use net_cls to separate connections from different containers to 
> the datanode.
> Tasks:
> 1) Extend the Resource model to define a bandwidth enforcement rate
> 2) Monitor the TCP/IP connections established by the container handling 
> process and its child processes
> 3) Set Linux Traffic Control rules on the data node based on address:port 
> pairs in order to enforce bandwidth limits on outgoing data
> Concept: http://www.hit.bme.hu/~do/papers/EnforcementDesign.pdf
> Implementation: 
> http://www.hit.bme.hu/~dohoai/documents/HdfsTrafficControl.pdf
> http://www.hit.bme.hu/~dohoai/documents/HdfsTrafficControl_UML_diagram.png






[jira] [Updated] (YARN-2748) Upload logs in the sub-folders under the local log dir when aggregating logs

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-2748:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Upload logs in the sub-folders under the local log dir when aggregating logs
> 
>
> Key: YARN-2748
> URL: https://issues.apache.org/jira/browse/YARN-2748
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: log-aggregation
>Affects Versions: 2.6.0
>Reporter: Zhijie Shen
>Assignee: Varun Saxena
>Priority: Major
> Attachments: YARN-2748.001.patch, YARN-2748.002.patch, 
> YARN-2748.03.patch, YARN-2748.04.patch
>
>
> YARN-2734 has a temporary fix to skip sub-folders to avoid an exception. 
> Ideally, if the app is creating a sub-folder and putting its rolling logs 
> there, we need to upload these logs as well.






[jira] [Commented] (YARN-2836) RM behaviour on token renewal failures is broken

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802779#comment-17802779
 ] 

Shilun Fan commented on YARN-2836:
--

Bulk update: moved all 3.4.0 non-blocker issues to 3.5.0; please move an issue 
back if it is a blocker.

> RM behaviour on token renewal failures is broken
> 
>
> Key: YARN-2836
> URL: https://issues.apache.org/jira/browse/YARN-2836
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Major
>
> Found this while reviewing YARN-2834.
> We now completely ignore token renewal failures. For things like Timeline 
> tokens, which are automatically obtained whether the app needs them or not 
> (we should fix this to be user driven), we can ignore failures. But for HDFS 
> tokens etc., ignoring failures is bad because it (1) wastes resources, as AMs 
> will continue and eventually fail, and (2) the app doesn't know what happened 
> when it eventually fails.






[jira] [Commented] (YARN-3232) Some application states are not necessarily exposed to users

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802777#comment-17802777
 ] 

Shilun Fan commented on YARN-3232:
--

Bulk update: moved all 3.4.0 non-blocker issues to 3.5.0; please move an issue 
back if it is a blocker.

> Some application states are not necessarily exposed to users
> 
>
> Key: YARN-3232
> URL: https://issues.apache.org/jira/browse/YARN-3232
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Jian He
>Assignee: Varun Saxena
>Priority: Major
> Attachments: YARN-3232.002.patch, YARN-3232.01.patch, 
> YARN-3232.02.patch, YARN-3232.v2.01.patch
>
>
> The application NEW_SAVING and SUBMITTED states do not necessarily need to be 
> exposed to users, as they are mostly internal to the system, transient, and 
> not user-facing. We may deprecate these two states and remove them from the 
> web UI.






[jira] [Updated] (YARN-2836) RM behaviour on token renewal failures is broken

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-2836:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> RM behaviour on token renewal failures is broken
> 
>
> Key: YARN-2836
> URL: https://issues.apache.org/jira/browse/YARN-2836
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Major
>
> Found this while reviewing YARN-2834.
> We now completely ignore token renewal failures. For things like Timeline 
> tokens, which are automatically obtained whether the app needs them or not 
> (we should fix this to be user driven), we can ignore failures. But for HDFS 
> tokens etc., ignoring failures is bad because it (1) wastes resources, as AMs 
> will continue and eventually fail, and (2) the app doesn't know what happened 
> when it eventually fails.






[jira] [Updated] (YARN-3232) Some application states are not necessarily exposed to users

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-3232:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Some application states are not necessarily exposed to users
> 
>
> Key: YARN-3232
> URL: https://issues.apache.org/jira/browse/YARN-3232
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Jian He
>Assignee: Varun Saxena
>Priority: Major
> Attachments: YARN-3232.002.patch, YARN-3232.01.patch, 
> YARN-3232.02.patch, YARN-3232.v2.01.patch
>
>
> The application NEW_SAVING and SUBMITTED states do not necessarily need to be 
> exposed to users, as they are mostly internal to the system, transient, and 
> not user-facing. We may deprecate these two states and remove them from the 
> web UI.






[jira] [Updated] (YARN-3514) Active directory usernames like domain\login cause YARN failures

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-3514:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Active directory usernames like domain\login cause YARN failures
> 
>
> Key: YARN-3514
> URL: https://issues.apache.org/jira/browse/YARN-3514
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
> Environment: CentOS6
>Reporter: john lilley
>Priority: Minor
>  Labels: oct16-easy
> Attachments: YARN-3514.001.patch, YARN-3514.002.patch
>
>
> We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is 
> Kerberos-enabled and uses an external AD domain controller for the KDC.  We 
> are able to authenticate, browse HDFS, etc.  However, YARN fails during 
> localization because it seems to get confused by the presence of a \ 
> character in the local user name.
> Our AD authentication on the nodes goes through sssd and is configured to 
> map AD users onto the form domain\username.  For example, our test user has a 
> Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user 
> "domain\hadoopuser".  We have no problem validating that user with PAM, 
> logging in as that user, su-ing to that user, etc.
> However, when we attempt to run a YARN application master, the localization 
> step fails when setting up the local cache directory for the AM.  The error 
> that comes out of the RM logs:
> 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: 
> ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, 
> diagnostics='Application application_1429295486450_0001 failed 1 times due to 
> AM Container for appattempt_1429295486450_0001_01 exited with  exitCode: 
> -1000 due to: Application application_1429295486450_0001 initialization 
> failed (exitCode=255) with output: main : command provided 0
> main : user is DOMAIN\hadoopuser
> main : requested yarn user is domain\hadoopuser
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create 
> directory: 
> /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10
> at 
> org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347)
> .Failing this attempt.. Failing the application.'
> However, when we look on the node launching the AM, we see this:
> [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache
> [root@rpb-cdh-kerb-2 usercache]# ls -l
> drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser
> There appears to be different treatment of the \ character in different 
> places.  Something creates the directory as "domain\hadoopuser" but something 
> else later attempts to use it as "domain%5Chadoopuser".  I’m not sure where 
> or why the URL escapement converts the \ to %5C or why this is not consistent.
> I should also mention, for the sake of completeness, our auth_to_local rule 
> is set up to map u...@domain.com to domain\user:
> RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g
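
The "domain%5Chadoopuser" form is simply the URL percent-encoding of the 
backslash; a small self-contained demo of the mismatch (unrelated to the 
actual localizer code paths):

{code}
import java.net.URLDecoder;
import java.net.URLEncoder;

public class BackslashEscapeDemo {
  public static void main(String[] args) throws Exception {
    String user = "domain\\hadoopuser";                       // literal domain\hadoopuser
    String escaped = URLEncoder.encode(user, "UTF-8");
    System.out.println(escaped);                              // prints domain%5Chadoopuser
    System.out.println(URLDecoder.decode(escaped, "UTF-8"));  // prints domain\hadoopuser
  }
}
{code}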






[jira] [Commented] (YARN-3625) RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802774#comment-17802774
 ] 

Shilun Fan commented on YARN-3625:
--

Bulk update: moved all 3.4.0 non-blocker issues to 3.5.0; please move an issue 
back if it is a blocker.

> RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put
> --
>
> Key: YARN-3625
> URL: https://issues.apache.org/jira/browse/YARN-3625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
>  Labels: oct16-medium
> Attachments: YARN-3625.1.patch, YARN-3625.2.patch
>
>
> RollingLevelDBTimelineStore batches all entities in the same put to improve 
> performance. However, this causes an error when relating to an entity in the 
> same put.






[jira] [Commented] (YARN-3514) Active directory usernames like domain\login cause YARN failures

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802775#comment-17802775
 ] 

Shilun Fan commented on YARN-3514:
--

Bulk update: moved all 3.4.0 non-blocker issues to 3.5.0; please move an issue 
back if it is a blocker.

> Active directory usernames like domain\login cause YARN failures
> 
>
> Key: YARN-3514
> URL: https://issues.apache.org/jira/browse/YARN-3514
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.2.0
> Environment: CentOS6
>Reporter: john lilley
>Priority: Minor
>  Labels: oct16-easy
> Attachments: YARN-3514.001.patch, YARN-3514.002.patch
>
>
> We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is 
> Kerberos-enabled and uses an external AD domain controller for the KDC.  We 
> are able to authenticate, browse HDFS, etc.  However, YARN fails during 
> localization because it seems to get confused by the presence of a \ 
> character in the local user name.
> Our AD authentication on the nodes goes through sssd and is configured to 
> map AD users onto the form domain\username.  For example, our test user has a 
> Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user 
> "domain\hadoopuser".  We have no problem validating that user with PAM, 
> logging in as that user, su-ing to that user, etc.
> However, when we attempt to run a YARN application master, the localization 
> step fails when setting up the local cache directory for the AM.  The error 
> that comes out of the RM logs:
> 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: 
> ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, 
> diagnostics='Application application_1429295486450_0001 failed 1 times due to 
> AM Container for appattempt_1429295486450_0001_01 exited with  exitCode: 
> -1000 due to: Application application_1429295486450_0001 initialization 
> failed (exitCode=255) with output: main : command provided 0
> main : user is DOMAIN\hadoopuser
> main : requested yarn user is domain\hadoopuser
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create 
> directory: 
> /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10
> at 
> org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347)
> .Failing this attempt.. Failing the application.'
> However, when we look on the node launching the AM, we see this:
> [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache
> [root@rpb-cdh-kerb-2 usercache]# ls -l
> drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser
> There appears to be different treatment of the \ character in different 
> places.  Something creates the directory as "domain\hadoopuser" but something 
> else later attempts to use it as "domain%5Chadoopuser".  I’m not sure where 
> or why the URL escapement converts the \ to %5C or why this is not consistent.
> I should also mention, for the sake of completeness, our auth_to_local rule 
> is set up to map u...@domain.com to domain\user:
> RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g






[jira] [Updated] (YARN-3625) RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-3625:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> RollingLevelDBTimelineStore Incorrectly Forbids Related Entity in Same Put
> --
>
> Key: YARN-3625
> URL: https://issues.apache.org/jira/browse/YARN-3625
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Reporter: Jonathan Turner Eagles
>Assignee: Jonathan Turner Eagles
>Priority: Major
>  Labels: oct16-medium
> Attachments: YARN-3625.1.patch, YARN-3625.2.patch
>
>
> RollingLevelDBTimelineStore batches all entities in the same put to improve 
> performance. However, this causes an error when relating to an entity in the 
> same put.






[jira] [Updated] (YARN-4485) [Umbrella] Capture per-application and per-queue container allocation latency

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-4485:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> [Umbrella] Capture per-application and per-queue container allocation latency
> -
>
> Key: YARN-4485
> URL: https://issues.apache.org/jira/browse/YARN-4485
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Karthik Kambatla
>Priority: Major
>  Labels: supportability, tuning
>
> Per-application and per-queue container allocation latencies would go a long 
> way towards helping with tuning scheduler queue configs. 
> This umbrella JIRA tracks adding these metrics. 






[jira] [Updated] (YARN-4435) Add RM Delegation Token DtFetcher Implementation for DtUtil

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-4435:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Add RM Delegation Token DtFetcher Implementation for DtUtil
> ---
>
> Key: YARN-4435
> URL: https://issues.apache.org/jira/browse/YARN-4435
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client, security, yarn
>Affects Versions: 3.0.0-alpha2
>Reporter: Matthew Paduano
>Assignee: Matthew Paduano
>Priority: Major
>  Labels: oct16-medium
> Attachments: YARN-4435-003.patch, YARN-4435-003.patch, 
> YARN-4435.00.patch.txt, YARN-4435.01.patch, YARN-4435.02.patch, 
> proposed_solution
>
>
> Add a class to the yarn project that implements the DtFetcher interface to 
> return an RM delegation token object.  
> I attached a proposed class implementation that does this, but it cannot be 
> added as a patch until the interface is merged in HADOOP-12563.






[jira] [Updated] (YARN-3861) Add fav icon to YARN & MR daemons web UI

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-3861:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Add fav icon to YARN & MR daemons web UI
> 
>
> Key: YARN-3861
> URL: https://issues.apache.org/jira/browse/YARN-3861
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: webapp
>Reporter: Devaraj Kavali
>Assignee: Devaraj Kavali
>Priority: Major
>  Labels: oct16-easy
> Attachments: RM UI in Chrome-With Patch.png, RM UI in Chrome-Without 
> Patch.png, RM UI in IE-With Patch.png, RM UI in IE-Without Patch.png.png, 
> YARN-3861.patch, hadoop-fav-transparent.png, hadoop-fav.png
>
>
> Add fav icon image to all YARN & MR daemons web UI.






[jira] [Commented] (YARN-4485) [Umbrella] Capture per-application and per-queue container allocation latency

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802770#comment-17802770
 ] 

Shilun Fan commented on YARN-4485:
--

Bulk update: moved all 3.4.0 non-blocker issues to 3.5.0; please move an issue 
back if it is a blocker.

> [Umbrella] Capture per-application and per-queue container allocation latency
> -
>
> Key: YARN-4485
> URL: https://issues.apache.org/jira/browse/YARN-4485
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Karthik Kambatla
>Priority: Major
>  Labels: supportability, tuning
>
> Per-application and per-queue container allocation latencies would go a long 
> way towards helping with tuning scheduler queue configs. 
> This umbrella JIRA tracks adding these metrics. 






[jira] [Updated] (YARN-4495) add a way to tell AM container increase/decrease request is invalid

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-4495:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> add a way to tell AM container increase/decrease request is invalid
> ---
>
> Key: YARN-4495
> URL: https://issues.apache.org/jira/browse/YARN-4495
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api, client
>Reporter: sandflee
>Priority: Major
>  Labels: oct16-hard
> Attachments: YARN-4495.01.patch
>
>
> Right now the RM may pass an InvalidResourceRequestException to the AM or 
> just ignore the change request; the former will bring AMRMClientAsync down, 
> and the latter will leave the AM waiting for the reply.  
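
From the AM side, the only hook today is the async client's error callback. 
The sketch below assumes the classic AMRMClientAsync.CallbackHandler interface 
and simply surfaces a rejected change request instead of treating every error 
as fatal; the handling itself is hypothetical.

{code}
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;
import org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException;

public class ChangeAwareCallbackHandler implements AMRMClientAsync.CallbackHandler {

  @Override
  public void onError(Throwable e) {
    if (e instanceof InvalidResourceRequestException) {
      // A rejected increase/decrease request: log and adjust the pending
      // change instead of tearing the whole client down.
      System.err.println("Container change request rejected: " + e.getMessage());
    }
    // other errors: fall through to whatever fatal handling the AM already has
  }

  @Override public void onContainersCompleted(List<ContainerStatus> statuses) { }
  @Override public void onContainersAllocated(List<Container> containers) { }
  @Override public void onShutdownRequest() { }
  @Override public void onNodesUpdated(List<NodeReport> updatedNodes) { }
  @Override public float getProgress() { return 0.0f; }
}
{code}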






[jira] [Commented] (YARN-4495) add a way to tell AM container increase/decrease request is invalid

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802764#comment-17802764
 ] 

Shilun Fan commented on YARN-4495:
--

Bulk update: moved all 3.4.0 non-blocker issues to 3.5.0; please move an issue 
back if it is a blocker.

> add a way to tell AM container increase/decrease request is invalid
> ---
>
> Key: YARN-4495
> URL: https://issues.apache.org/jira/browse/YARN-4495
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api, client
>Reporter: sandflee
>Priority: Major
>  Labels: oct16-hard
> Attachments: YARN-4495.01.patch
>
>
> Right now the RM may pass an InvalidResourceRequestException to the AM or 
> just ignore the change request; the former will bring AMRMClientAsync down, 
> and the latter will leave the AM waiting for the reply.  






[jira] [Commented] (YARN-4636) Make blacklist tracking policy pluggable for more extensions.

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-4636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802763#comment-17802763
 ] 

Shilun Fan commented on YARN-4636:
--

Bulk update: moved all 3.4.0 non-blocker issues to 3.5.0; please move an issue 
back if it is a blocker.

> Make blacklist tracking policy pluggable for more extensions.
> -
>
> Key: YARN-4636
> URL: https://issues.apache.org/jira/browse/YARN-4636
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Sunil G
>Priority: Major
>







[jira] [Updated] (YARN-4636) Make blacklist tracking policy pluggable for more extensions.

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-4636:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Make blacklist tracking policy pluggable for more extensions.
> -
>
> Key: YARN-4636
> URL: https://issues.apache.org/jira/browse/YARN-4636
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Sunil G
>Priority: Major
>







[jira] [Updated] (YARN-4637) AM launching blacklist purge mechanism (time based)

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-4637:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> AM launching blacklist purge mechanism (time based)
> ---
>
> Key: YARN-4637
> URL: https://issues.apache.org/jira/browse/YARN-4637
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Sunil G
>Priority: Major
>







[jira] [Updated] (YARN-4713) Warning by unchecked conversion in TestTimelineWebServices

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-4713:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Warning by unchecked conversion in TestTimelineWebServices 
> ---
>
> Key: YARN-4713
> URL: https://issues.apache.org/jira/browse/YARN-4713
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Reporter: Tsuyoshi Ozawa
>Priority: Major
>  Labels: newbie
> Attachments: YARN-4713.1.patch, YARN-4713.2.patch
>
>
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java:[123,38]
>  [unchecked] unchecked conversion
> {code}
>   Enumeration names = mock(Enumeration.class);
> {code}
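One way to silence this class of warning (a sketch, not the committed patch; it assumes the test assigns the raw mock to a generic {{Enumeration}} of String, e.g. for parameter or header names) is to confine an @SuppressWarnings("unchecked") to a single declaration:

{code}
import static org.mockito.Mockito.mock;

import java.util.Enumeration;

public class UncheckedMockSketch {
  @SuppressWarnings("unchecked")
  static Enumeration<String> mockNames() {
    // mock(Enumeration.class) returns a raw Enumeration, so returning it as
    // Enumeration<String> is an unchecked conversion; the annotation confines the
    // suppression to this one helper instead of the whole test class.
    return mock(Enumeration.class);
  }
}
{code}

Scoping the suppression to a one-line helper keeps the rest of the test checked by the compiler.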



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4638) Node whitelist support for AM launching

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-4638:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Node whitelist support for AM launching 
> 
>
> Key: YARN-4638
> URL: https://issues.apache.org/jira/browse/YARN-4638
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4713) Warning by unchecked conversion in TestTimelineWebServices

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802759#comment-17802759
 ] 

Shilun Fan commented on YARN-4713:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Warning by unchecked conversion in TestTimelineWebServices 
> ---
>
> Key: YARN-4713
> URL: https://issues.apache.org/jira/browse/YARN-4713
> Project: Hadoop YARN
>  Issue Type: Test
>  Components: test
>Reporter: Tsuyoshi Ozawa
>Priority: Major
>  Labels: newbie
> Attachments: YARN-4713.1.patch, YARN-4713.2.patch
>
>
> [WARNING] 
> /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/timeline/webapp/TestTimelineWebServices.java:[123,38]
>  [unchecked] unchecked conversion
> {code}
>   Enumeration names = mock(Enumeration.class);
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4758) Enable discovery of AMs by containers

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-4758:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Enable discovery of AMs by containers
> -
>
> Key: YARN-4758
> URL: https://issues.apache.org/jira/browse/YARN-4758
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Junping Du
>Priority: Major
> Attachments: YARN-4758. AM Discovery Service for YARN Container.pdf
>
>
> {color:red}
> This is already discussed on the umbrella JIRA YARN-1489.
> Copying some of my condensed summary from the design doc (section 3.2.10.3) 
> of YARN-4692.
> {color}
> Even after the existing work in Work-preserving AM restart (Section 3.1.2 / 
> YARN-1489), we still haven’t solved the problem of old running containers not 
> knowing where the new AM starts running after the previous AM crashes. This 
> is an especially important problem to solve for long-running services 
> where we’d like to avoid killing service containers when AMs fail over. So 
> far, we have left this as a task for the apps, but solving it in YARN is much 
> more desirable. (Task) This looks very much like service-registry (YARN-913), 
> but for app containers to discover their own AMs.
> Combining this requirement (of any container being able to find its AM 
> across fail-overs) with those of services (to be able to find through DNS 
> where a service container is running - YARN-4757) will put our registry 
> scalability needs much higher than those of just service end-points. 
> This calls for a more distributed solution for registry readers, something 
> that is discussed in the comments section of YARN-1489 and MAPREDUCE-6608.
> See comment 
> https://issues.apache.org/jira/browse/YARN-1489?focusedCommentId=13862359=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13862359



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4758) Enable discovery of AMs by containers

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802758#comment-17802758
 ] 

Shilun Fan commented on YARN-4758:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Enable discovery of AMs by containers
> -
>
> Key: YARN-4758
> URL: https://issues.apache.org/jira/browse/YARN-4758
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Junping Du
>Priority: Major
> Attachments: YARN-4758. AM Discovery Service for YARN Container.pdf
>
>
> {color:red}
> This is already discussed on the umbrella JIRA YARN-1489.
> Copying some of my condensed summary from the design doc (section 3.2.10.3) 
> of YARN-4692.
> {color}
> Even after the existing work in Work-preserving AM restart (Section 3.1.2 / 
> YARN-1489), we still haven’t solved the problem of old running containers not 
> knowing where the new AM starts running after the previous AM crashes. This 
> is an especially important problem to solve for long-running services 
> where we’d like to avoid killing service containers when AMs fail over. So 
> far, we have left this as a task for the apps, but solving it in YARN is much 
> more desirable. (Task) This looks very much like service-registry (YARN-913), 
> but for app containers to discover their own AMs.
> Combining this requirement (of any container being able to find its AM 
> across fail-overs) with those of services (to be able to find through DNS 
> where a service container is running - YARN-4757) will put our registry 
> scalability needs much higher than those of just service end-points. 
> This calls for a more distributed solution for registry readers, something 
> that is discussed in the comments section of YARN-1489 and MAPREDUCE-6608.
> See comment 
> https://issues.apache.org/jira/browse/YARN-1489?focusedCommentId=13862359=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13862359



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4808) SchedulerNode can use a few more cosmetic changes

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802753#comment-17802753
 ] 

Shilun Fan commented on YARN-4808:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> SchedulerNode can use a few more cosmetic changes
> -
>
> Key: YARN-4808
> URL: https://issues.apache.org/jira/browse/YARN-4808
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Major
> Attachments: yarn-4808-1.patch, yarn-4808-2.patch
>
>
> We have made some cosmetic changes to SchedulerNode recently. While working 
> on YARN-4511, I realized we could improve it a little more:
> # Remove volatile variables - I don't see the need for them to be volatile
> # Some methods end up doing very similar things, so consolidate them
> # Renaming totalResource to capacity. YARN-4511 plans to add inflatedCapacity 
> to include the un-utilized resources, and having two totals can be a little 
> confusing. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4804) [Umbrella] Improve test run duration

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-4804:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> [Umbrella] Improve test run duration
> 
>
> Key: YARN-4804
> URL: https://issues.apache.org/jira/browse/YARN-4804
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Priority: Major
>
> Our tests take a long time to run; e.g., the RM tests take 67 minutes. Given 
> that our precommit builds run our tests against two Java versions, this issue is 
> exacerbated. 
> Filing this umbrella JIRA to address this. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4808) SchedulerNode can use a few more cosmetic changes

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-4808:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> SchedulerNode can use a few more cosmetic changes
> -
>
> Key: YARN-4808
> URL: https://issues.apache.org/jira/browse/YARN-4808
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.8.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>Priority: Major
> Attachments: yarn-4808-1.patch, yarn-4808-2.patch
>
>
> We have made some cosmetic changes to SchedulerNode recently. While working 
> on YARN-4511, I realized we could improve it a little more:
> # Remove volatile variables - I don't see the need for them to be volatile
> # Some methods end up doing very similar things, so consolidate them
> # Renaming totalResource to capacity. YARN-4511 plans to add inflatedCapacity 
> to include the un-utilized resources, and having two totals can be a little 
> confusing. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9354) Resources should be created with ResourceTypesTestHelper instead of TestUtils

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-9354:
-
Target Version/s: 3.4.0  (was: 3.5.0)

> Resources should be created with ResourceTypesTestHelper instead of TestUtils
> -
>
> Key: YARN-9354
> URL: https://issues.apache.org/jira/browse/YARN-9354
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Andras Gyori
>Priority: Trivial
>  Labels: newbie, newbie++
> Fix For: 3.3.0, 3.2.2, 3.4.0
>
> Attachments: YARN-9354.001.patch, YARN-9354.002.patch, 
> YARN-9354.003.patch, YARN-9354.004.patch, YARN-9354.branch-3.2.001.patch, 
> YARN-9354.branch-3.2.002.patch, YARN-9354.branch-3.2.003.patch
>
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestUtils#createResource
>  has a not identical, but very similar, implementation to 
> org.apache.hadoop.yarn.resourcetypes.ResourceTypesTestHelper#newResource. 
> Since these 2 methods essentially do the same thing and 
> ResourceTypesTestHelper is newer and more widely used, TestUtils#createResource 
> should be replaced with ResourceTypesTestHelper#newResource in all 
> occurrences.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4843) [Umbrella] Revisit YARN ProtocolBuffer int32 usages that need to upgrade to int64

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-4843:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> [Umbrella] Revisit YARN ProtocolBuffer int32 usages that need to upgrade to 
> int64
> -
>
> Key: YARN-4843
> URL: https://issues.apache.org/jira/browse/YARN-4843
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api
>Affects Versions: 3.0.0-alpha1
>Reporter: Wangda Tan
>Priority: Major
>
> This JIRA is to track all int32 usages in YARN's ProtocolBuffer APIs that we 
> possibly need to update to int64.
> One example is the resource API. We use int32 for memory now; if a cluster has 
> 10k nodes and each node has 210G of memory, we will get a negative total cluster 
> memory.
> We may have other fields that need to upgrade from int32 to int64. 
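To make the arithmetic above concrete, here is a plain-Java illustration (not YARN code; the variable names are made up): 10,000 nodes at 210G each exceeds Integer.MAX_VALUE when expressed in MB, while a long holds the total comfortably.

{code}
public class Int32OverflowSketch {
  public static void main(String[] args) {
    int memoryMbPerNode = 210 * 1024;            // 215,040 MB per node
    int totalInt32 = 10_000 * memoryMbPerNode;   // exceeds Integer.MAX_VALUE (~2.147e9) and goes negative
    long totalInt64 = 10_000L * memoryMbPerNode; // 2,150,400,000 MB, the correct total
    System.out.println(totalInt32 + " vs " + totalInt64);
  }
}
{code}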



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4843) [Umbrella] Revisit YARN ProtocolBuffer int32 usages that need to upgrade to int64

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802752#comment-17802752
 ] 

Shilun Fan commented on YARN-4843:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> [Umbrella] Revisit YARN ProtocolBuffer int32 usages that need to upgrade to 
> int64
> -
>
> Key: YARN-4843
> URL: https://issues.apache.org/jira/browse/YARN-4843
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api
>Affects Versions: 3.0.0-alpha1
>Reporter: Wangda Tan
>Priority: Major
>
> This JIRA is to track all int32 usages in YARN's ProtocolBuffer APIs that we 
> possibly need to update to int64.
> One example is the resource API. We use int32 for memory now; if a cluster has 
> 10k nodes and each node has 210G of memory, we will get a negative total cluster 
> memory.
> We may have other fields that need to upgrade from int32 to int64. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-9354) Resources should be created with ResourceTypesTestHelper instead of TestUtils

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-9354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-9354:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Resources should be created with ResourceTypesTestHelper instead of TestUtils
> -
>
> Key: YARN-9354
> URL: https://issues.apache.org/jira/browse/YARN-9354
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Szilard Nemeth
>Assignee: Andras Gyori
>Priority: Trivial
>  Labels: newbie, newbie++
> Fix For: 3.3.0, 3.2.2, 3.4.0
>
> Attachments: YARN-9354.001.patch, YARN-9354.002.patch, 
> YARN-9354.003.patch, YARN-9354.004.patch, YARN-9354.branch-3.2.001.patch, 
> YARN-9354.branch-3.2.002.patch, YARN-9354.branch-3.2.003.patch
>
>
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestUtils#createResource
>  has a not identical, but very similar, implementation to 
> org.apache.hadoop.yarn.resourcetypes.ResourceTypesTestHelper#newResource. 
> Since these 2 methods essentially do the same thing and 
> ResourceTypesTestHelper is newer and more widely used, TestUtils#createResource 
> should be replaced with ResourceTypesTestHelper#newResource in all 
> occurrences.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4971) RM fails to re-bind to wildcard IP after failover in multi homed clusters

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802749#comment-17802749
 ] 

Shilun Fan commented on YARN-4971:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> RM fails to re-bind to wildcard IP after failover in multi homed clusters
> -
>
> Key: YARN-4971
> URL: https://issues.apache.org/jira/browse/YARN-4971
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-4971.1.patch
>
>
> If the RM has {{yarn.resourcemanager.bind-host}} set to 0.0.0.0, then the first 
> time the service becomes active, binding to the wildcard works as expected. If 
> the service has transitioned from active to standby and then becomes active 
> again after failovers, the service only binds to one of the IP addresses.
> There is a difference between the services inside the RM: it only seems to 
> happen for the services listening on ports 8030 and 8032.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4988) Limit filter in ApplicationBaseProtocol#getApplications should return latest applications

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802748#comment-17802748
 ] 

Shilun Fan commented on YARN-4988:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Limit filter in ApplicationBaseProtocol#getApplications should return latest 
> applications
> -
>
> Key: YARN-4988
> URL: https://issues.apache.org/jira/browse/YARN-4988
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
>  Labels: oct16-medium
> Attachments: YARN-4988-wip.patch
>
>
> Whenever the limit filter is used to get application reports using 
> ApplicationBaseProtocol#getApplications, the applications retrieved are not 
> the latest. The retrieved applications are effectively random, based on the hashcode. 
> The reason for the above problem is that the RM maintains the apps in a map where 
> insertion of the application id is based on the hashcode. So if there are 10 
> applications from app-1 to app-10 and the limit is 5, one would expect 
> applications app-6 to app-10 to be retrieved. But now the first 5 apps in the 
> map are retrieved, so the applications retrieved are an arbitrary 5.
> I think limit should retrieve the latest applications only.
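A minimal sketch of the suggested behaviour in plain Java (not the RM implementation; a HashMap of integer ids stands in for the RM's application map, and "latest" is approximated by the largest key): sort by a monotonically increasing key before applying the limit.

{code}
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LimitFilterSketch {
  public static void main(String[] args) {
    Map<Integer, String> apps = new HashMap<>();
    for (int i = 1; i <= 10; i++) {
      apps.put(i, "app-" + i);
    }

    // Naive limit: takes whatever order the hash-based map exposes.
    List<String> firstFive = apps.values().stream().limit(5).collect(Collectors.toList());

    // Sort by a monotonically increasing key (id or start time) before limiting,
    // so the latest five applications are returned.
    List<String> latestFive = apps.entrySet().stream()
        .sorted(Map.Entry.<Integer, String>comparingByKey(Comparator.reverseOrder()))
        .limit(5)
        .map(Map.Entry::getValue)
        .collect(Collectors.toList());

    System.out.println(firstFive);
    System.out.println(latestFive);
  }
}
{code}

The same idea applies with the real application id or application start time as the sort key.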



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4953) Delete completed container log folder when rolling log aggregation is enabled

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-4953:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Delete completed container log folder when rolling log aggregation is enabled
> -
>
> Key: YARN-4953
> URL: https://issues.apache.org/jira/browse/YARN-4953
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
>
> There would be a potential bottleneck when a cluster is running a very large 
> number of containers on the same NodeManager for a single application. Linux 
> limits the subfolder count to 32K. If the number of containers is greater 
> than 32K for an application, container launches will fail. At that 
> point, no more containers can be launched on this node.
> Currently log folders are deleted after the app is finished. Rolling log 
> aggregation aggregates logs to HDFS periodically. 
> I think if aggregation is completed for finished containers, then clean-up 
> can be done, i.e. deleting the log folders for finished containers. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4969) Fix more loggings in CapacityScheduler

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802750#comment-17802750
 ] 

Shilun Fan commented on YARN-4969:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Fix more loggings in CapacityScheduler
> --
>
> Key: YARN-4969
> URL: https://issues.apache.org/jira/browse/YARN-4969
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
>  Labels: oct16-easy
> Attachments: YARN-4969.1.patch
>
>
> YARN-3966 did logging cleanup for the Capacity Scheduler before; however, 
> there are some loggings we still need to improve:
> Container allocation / complete / reservation / un-reserve messages for every 
> hierarchy (app/leaf/parent-queue) should be printed at INFO level:
> I'm debugging an issue where the root queue's resource usage could be negative; 
> it is very hard to reproduce, so we cannot enable debug logging from RM 
> start, as the log would not fit on a single disk.
> The existing CS prints an INFO message when a container cannot be allocated, such as 
> on re-reservation / node heartbeat, etc.; we should avoid printing such messages 
> at INFO level.
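As a rough illustration of the intended log-level split (assuming an SLF4J-style logger; the class and method names here are made up, not CapacityScheduler code):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative only: keep allocation decisions at INFO and demote
// per-heartbeat "cannot allocate" chatter to DEBUG.
public class SchedulerLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(SchedulerLoggingSketch.class);

  void onAllocation(String containerId, String queue) {
    LOG.info("Allocated container {} on queue {}", containerId, queue);
  }

  void onSkippedNode(String nodeId, String reason) {
    LOG.debug("Cannot allocate on node {}: {}", nodeId, reason); // was INFO, too noisy per heartbeat
  }
}
{code}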



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4988) Limit filter in ApplicationBaseProtocol#getApplications should return latest applications

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-4988:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Limit filter in ApplicationBaseProtocol#getApplications should return latest 
> applications
> -
>
> Key: YARN-4988
> URL: https://issues.apache.org/jira/browse/YARN-4988
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
>  Labels: oct16-medium
> Attachments: YARN-4988-wip.patch
>
>
> Whenever the limit filter is used to get application reports using 
> ApplicationBaseProtocol#getApplications, the applications retrieved are not 
> the latest. The retrieved applications are effectively random, based on the hashcode. 
> The reason for the above problem is that the RM maintains the apps in a map where 
> insertion of the application id is based on the hashcode. So if there are 10 
> applications from app-1 to app-10 and the limit is 5, one would expect 
> applications app-6 to app-10 to be retrieved. But now the first 5 apps in the 
> map are retrieved, so the applications retrieved are an arbitrary 5.
> I think limit should retrieve the latest applications only.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4969) Fix more loggings in CapacityScheduler

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-4969:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Fix more loggings in CapacityScheduler
> --
>
> Key: YARN-4969
> URL: https://issues.apache.org/jira/browse/YARN-4969
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
>  Labels: oct16-easy
> Attachments: YARN-4969.1.patch
>
>
> YARN-3966 did logging cleanup for the Capacity Scheduler before; however, 
> there are some loggings we still need to improve:
> Container allocation / complete / reservation / un-reserve messages for every 
> hierarchy (app/leaf/parent-queue) should be printed at INFO level:
> I'm debugging an issue where the root queue's resource usage could be negative; 
> it is very hard to reproduce, so we cannot enable debug logging from RM 
> start, as the log would not fit on a single disk.
> The existing CS prints an INFO message when a container cannot be allocated, such as 
> on re-reservation / node heartbeat, etc.; we should avoid printing such messages 
> at INFO level.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4953) Delete completed container log folder when rolling log aggregation is enabled

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802751#comment-17802751
 ] 

Shilun Fan commented on YARN-4953:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Delete completed container log folder when rolling log aggregation is enabled
> -
>
> Key: YARN-4953
> URL: https://issues.apache.org/jira/browse/YARN-4953
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Rohith Sharma K S
>Assignee: Rohith Sharma K S
>Priority: Major
>
> There would be a potential bottleneck when a cluster is running a very large 
> number of containers on the same NodeManager for a single application. Linux 
> limits the subfolder count to 32K. If the number of containers is greater 
> than 32K for an application, container launches will fail. At that 
> point, no more containers can be launched on this node.
> Currently log folders are deleted after the app is finished. Rolling log 
> aggregation aggregates logs to HDFS periodically. 
> I think if aggregation is completed for finished containers, then clean-up 
> can be done, i.e. deleting the log folders for finished containers. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4971) RM fails to re-bind to wildcard IP after failover in multi homed clusters

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-4971:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> RM fails to re-bind to wildcard IP after failover in multi homed clusters
> -
>
> Key: YARN-4971
> URL: https://issues.apache.org/jira/browse/YARN-4971
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-4971.1.patch
>
>
> If the RM has {{yarn.resourcemanager.bind-host}} set to 0.0.0.0, then the first 
> time the service becomes active, binding to the wildcard works as expected. If 
> the service has transitioned from active to standby and then becomes active 
> again after failovers, the service only binds to one of the IP addresses.
> There is a difference between the services inside the RM: it only seems to 
> happen for the services listening on ports 8030 and 8032.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5205) yarn logs for live applications does not provide log files which may have already been aggregated

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-5205:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> yarn logs for live applications does not provide log files which may have 
> already been aggregated
> -
>
> Key: YARN-5205
> URL: https://issues.apache.org/jira/browse/YARN-5205
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Siddharth Seth
>Priority: Major
>
> With periodic aggregation enabled, the logs which have been partially 
> aggregated are not always displayed by the yarn logs command.
> If a file exists in the log dir for a container, all previously aggregated 
> files with the same name, along with the current file, will be part of the 
> yarn log output.
> Files which have been previously aggregated, for which a file with the same 
> name does not exist in the container log dir, do not show up in the output.
> After the app completes, all logs are available.
> cc [~xgong]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5414) Integrate NodeQueueLoadMonitor with ClusterNodeTracker

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802745#comment-17802745
 ] 

Shilun Fan commented on YARN-5414:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Integrate NodeQueueLoadMonitor with ClusterNodeTracker
> --
>
> Key: YARN-5414
> URL: https://issues.apache.org/jira/browse/YARN-5414
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: container-queuing, distributed-scheduling, scheduler
>Reporter: Arun Suresh
>Assignee: Abhishek Modi
>Priority: Major
>
> The {{ClusterNodeTracker}} tracks the states of clusterNodes and provides 
> convenience methods like sort and filter.
> The {{NodeQueueLoadMonitor}} should use the {{ClusterNodeTracker}} instead of 
> maintaining its own data-structure of node information.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5414) Integrate NodeQueueLoadMonitor with ClusterNodeTracker

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-5414:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Integrate NodeQueueLoadMonitor with ClusterNodeTracker
> --
>
> Key: YARN-5414
> URL: https://issues.apache.org/jira/browse/YARN-5414
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: container-queuing, distributed-scheduling, scheduler
>Reporter: Arun Suresh
>Assignee: Abhishek Modi
>Priority: Major
>
> The {{ClusterNodeTracker}} tracks the states of clusterNodes and provides 
> convenience methods like sort and filter.
> The {{NodeQueueLoadMonitor}} should use the {{ClusterNodeTracker}} instead of 
> maintaining its own data-structure of node information.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802744#comment-17802744
 ] 

Shilun Fan commented on YARN-5464:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Server-Side NM Graceful Decommissioning with RM HA
> --
>
> Key: YARN-5464
> URL: https://issues.apache.org/jira/browse/YARN-5464
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, yarn
>Reporter: Robert Kanter
>Assignee: Gergely Pollák
>Priority: Major
> Attachments: YARN-5464.001.patch, YARN-5464.002.patch, 
> YARN-5464.003.patch, YARN-5464.004.patch, YARN-5464.005.patch, 
> YARN-5464.006.patch, YARN-5464.wip.patch
>
>
> Make sure to remove the note added by YARN-7094 about RM HA failover not 
> working right.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5464) Server-Side NM Graceful Decommissioning with RM HA

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-5464:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Server-Side NM Graceful Decommissioning with RM HA
> --
>
> Key: YARN-5464
> URL: https://issues.apache.org/jira/browse/YARN-5464
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful, yarn
>Reporter: Robert Kanter
>Assignee: Gergely Pollák
>Priority: Major
> Attachments: YARN-5464.001.patch, YARN-5464.002.patch, 
> YARN-5464.003.patch, YARN-5464.004.patch, YARN-5464.005.patch, 
> YARN-5464.006.patch, YARN-5464.wip.patch
>
>
> Make sure to remove the note added by YARN-7094 about RM HA failover not 
> working right.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5194) Avoid adding yarn-site to all Configuration instances created by the JVM

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-5194:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Avoid adding yarn-site to all Configuration instances created by the JVM
> 
>
> Key: YARN-5194
> URL: https://issues.apache.org/jira/browse/YARN-5194
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Priority: Major
>
> {code}
> static {
> addDeprecatedKeys();
> Configuration.addDefaultResource(YARN_DEFAULT_CONFIGURATION_FILE);
> Configuration.addDefaultResource(YARN_SITE_CONFIGURATION_FILE);
>   }
> {code}
> This puts the contents of yarn-default and yarn-site into every configuration 
> instance created in the VM after YarnConfiguration has been initialized.
> This should be changed to a local addResource for the specific 
> YarnConfiguration instance, instead of polluting every Configuration instance.
> Incompatible change. Have set the target version to 3.x. 
> The same applies to HdfsConfiguration (hdfs-site.xml), and Configuration 
> (core-site.xml etc).
> core-site may be worth including everywhere; however, it would be better to 
> expect users to explicitly add the relevant resources.
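A rough sketch of the per-instance alternative described above (an assumption about the direction, not the actual Hadoop source; the class name and file constants are illustrative):

{code}
// Sketch only: the class shape and file-name constants mirror the description above,
// not the real YarnConfiguration implementation.
public class YarnConfigurationSketch extends org.apache.hadoop.conf.Configuration {
  private static final String YARN_DEFAULT_CONFIGURATION_FILE = "yarn-default.xml";
  private static final String YARN_SITE_CONFIGURATION_FILE = "yarn-site.xml";

  public YarnConfigurationSketch() {
    super();
    // Scoped to this instance: plain Configuration objects created elsewhere in the
    // JVM no longer pick up yarn-default.xml / yarn-site.xml implicitly.
    addResource(YARN_DEFAULT_CONFIGURATION_FILE);
    addResource(YARN_SITE_CONFIGURATION_FILE);
  }
}
{code}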



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5536) Multiple format support (JSON, etc.) for exclude node file in NM graceful decommission with timeout

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802742#comment-17802742
 ] 

Shilun Fan commented on YARN-5536:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Multiple format support (JSON, etc.) for exclude node file in NM graceful 
> decommission with timeout
> ---
>
> Key: YARN-5536
> URL: https://issues.apache.org/jira/browse/YARN-5536
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Priority: Major
>
> Per discussion in YARN-4676, we agree that multiple formats (other than XML) 
> should be supported for decommissioning nodes with timeout values.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5465) Server-Side NM Graceful Decommissioning subsequent call behavior

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-5465:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Server-Side NM Graceful Decommissioning subsequent call behavior
> 
>
> Key: YARN-5465
> URL: https://issues.apache.org/jira/browse/YARN-5465
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Robert Kanter
>Priority: Major
>
> The Server-Side NM Graceful Decommissioning feature added by YARN-4676 has 
> the following behavior when subsequent calls are made:
> # Start a long-running job that has containers running on nodeA
> # Add nodeA to the exclude file
> # Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully 
> decommissioning nodeA
> # Wait 30 seconds
> # Add nodeB to the exclude file
> # Run {{-refreshNodes -g 30 -server}} (30sec)
> # After 30 seconds, both nodeA and nodeB shut down
> In a nutshell, issuing a subsequent call to gracefully decommission nodes 
> updates the timeout for any currently decommissioning nodes.  This makes it 
> impossible to gracefully decommission different sets of nodes with different 
> timeouts.  Though it does let you easily update the timeout of currently 
> decommissioning nodes.
> Another behavior we could do is this:
> # {color:grey}Start a long-running job that has containers running on nodeA{color}
> # {color:grey}Add nodeA to the exclude file{color}
> # {color:grey}Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully 
> decommissioning nodeA{color}
> # {color:grey}Wait 30 seconds{color}
> # {color:grey}Add nodeB to the exclude file{color}
> # {color:grey}Run {{-refreshNodes -g 30 -server}} (30sec){color}
> # After 30 seconds, nodeB shuts down
> # After 60 more seconds, nodeA shuts down
> This keeps the nodes affected by each call to gracefully decommission nodes 
> independent.  You can now have different sets of decommissioning nodes with 
> different timeouts.  However, to update the timeout of a currently 
> decommissioning node, you'd have to first recommission it, and then 
> decommission it again.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5536) Multiple format support (JSON, etc.) for exclude node file in NM graceful decommission with timeout

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-5536:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Multiple format support (JSON, etc.) for exclude node file in NM graceful 
> decommission with timeout
> ---
>
> Key: YARN-5536
> URL: https://issues.apache.org/jira/browse/YARN-5536
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Junping Du
>Priority: Major
>
> Per discussion in YARN-4676, we agree that multiple formats (other than XML) 
> should be supported for decommissioning nodes with timeout values.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5625) FairScheduler should use FSContext more aggressively to avoid constructors with many parameters

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802741#comment-17802741
 ] 

Shilun Fan commented on YARN-5625:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> FairScheduler should use FSContext more aggressively to avoid constructors 
> with many parameters
> ---
>
> Key: YARN-5625
> URL: https://issues.apache.org/jira/browse/YARN-5625
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Priority: Major
>
> YARN-5609 introduces FSContext, a structure to capture basic FairScheduler 
> information. In addition to preemption details, it could host references to 
> the scheduler, QueueManager, AllocationConfiguration etc. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5465) Server-Side NM Graceful Decommissioning subsequent call behavior

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802743#comment-17802743
 ] 

Shilun Fan commented on YARN-5465:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Server-Side NM Graceful Decommissioning subsequent call behavior
> 
>
> Key: YARN-5465
> URL: https://issues.apache.org/jira/browse/YARN-5465
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: graceful
>Reporter: Robert Kanter
>Priority: Major
>
> The Server-Side NM Graceful Decommissioning feature added by YARN-4676 has 
> the following behavior when subsequent calls are made:
> # Start a long-running job that has containers running on nodeA
> # Add nodeA to the exclude file
> # Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully 
> decommissioning nodeA
> # Wait 30 seconds
> # Add nodeB to the exclude file
> # Run {{-refreshNodes -g 30 -server}} (30sec)
> # After 30 seconds, both nodeA and nodeB shut down
> In a nutshell, issuing a subsequent call to gracefully decommission nodes 
> updates the timeout for any currently decommissioning nodes.  This makes it 
> impossible to gracefully decommission different sets of nodes with different 
> timeouts.  Though it does let you easily update the timeout of currently 
> decommissioning nodes.
> Another behavior we could do is this:
> # {color:grey}Start a long-running job that has containers running on nodeA{color}
> # {color:grey}Add nodeA to the exclude file{color}
> # {color:grey}Run {{-refreshNodes -g 120 -server}} (2min) to begin gracefully 
> decommissioning nodeA{color}
> # {color:grey}Wait 30 seconds{color}
> # {color:grey}Add nodeB to the exclude file{color}
> # {color:grey}Run {{-refreshNodes -g 30 -server}} (30sec){color}
> # After 30 seconds, nodeB shuts down
> # After 60 more seconds, nodeA shuts down
> This keeps the nodes affected by each call to gracefully decommission nodes 
> independent.  You can now have different sets of decommissioning nodes with 
> different timeouts.  However, to update the timeout of a currently 
> decommissioning node, you'd have to first recommission it, and then 
> decommission it again.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5625) FairScheduler should use FSContext more aggressively to avoid constructors with many parameters

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-5625:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> FairScheduler should use FSContext more aggressively to avoid constructors 
> with many parameters
> ---
>
> Key: YARN-5625
> URL: https://issues.apache.org/jira/browse/YARN-5625
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: fairscheduler
>Affects Versions: 2.9.0
>Reporter: Karthik Kambatla
>Priority: Major
>
> YARN-5609 introduces FSContext, a structure to capture basic FairScheduler 
> information. In addition to preemption details, it could host references to 
> the scheduler, QueueManager, AllocationConfiguration etc. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5674) FairScheduler handles "dots" in user names inconsistently in the config

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-5674:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> FairScheduler handles "dots" in user names inconsistently in the config
> ---
>
> Key: YARN-5674
> URL: https://issues.apache.org/jira/browse/YARN-5674
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>
> A user name can contain a dot; because it could be used as the queue name, we 
> replace the dot with a defined separator. When defining queues in the 
> configuration for users containing a dot, we expect the dot to be replaced 
> by the "\_dot\_" string.
> In the user limits we do not do that, and user limits need a normal dot in the 
> user name. This is confusing: when you create a scheduler configuration, in 
> some places you need to replace the dot and in others you do not. This can 
> cause issues when user limits are not enforced as expected.
> We should use one way to specify the user, and since the queue naming cannot 
> be changed we should also use the same "\_dot\_" in the user limits and 
> enforce it correctly.
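A minimal sketch of the consistent escaping suggested above (the class and method names are illustrative, not FairScheduler APIs): apply the same "_dot_" replacement when building the queue name and when looking up user limits.

{code}
// Minimal illustration only: the same escaping is used for queue names
// and for user-limit lookups, so both agree on "_dot_".
public class DotEscapeSketch {
  static String escapeUser(String user) {
    return user.replace(".", "_dot_");
  }

  public static void main(String[] args) {
    String user = "first.last";
    String queueName = "root." + escapeUser(user); // root.first_dot_last
    String userLimitKey = escapeUser(user);        // first_dot_last, same form everywhere
    System.out.println(queueName + " / " + userLimitKey);
  }
}
{code}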



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5683) Support specifying storage type for per-application local dirs

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802739#comment-17802739
 ] 

Shilun Fan commented on YARN-5683:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Support specifying storage type for per-application local dirs
> --
>
> Key: YARN-5683
> URL: https://issues.apache.org/jira/browse/YARN-5683
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: oct16-hard
> Attachments: YARN-5683-1.patch, YARN-5683-2.patch, YARN-5683-3.patch, 
> flow_diagram_for_MapReduce-2.png, flow_diagram_for_MapReduce.png
>
>
> h3.  Introduction
> * Some applications of various frameworks (Flink, Spark, MapReduce etc.) that 
> use local storage (checkpoint, shuffle etc.) might require high IO 
> performance. It's useful to allocate local directories to high performance 
> storage media for these applications on heterogeneous clusters.
> * YARN does not distinguish different storage types and hence applications 
> cannot selectively use storage media with different performance 
> characteristics. Adding awareness of storage media can allow YARN to make 
> better decisions about the placement of local directories.
> h3.  Approach
> * NodeManager will distinguish storage types for local directories.
> ** yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs configuration 
> should allow the cluster administrator to optionally specify the storage type 
> for each local directories. Example: 
> [SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equals to 
> [SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
> ** StorageType defines DISK/SSD storage types and takes DISK as the default 
> storage type. 
> ** StorageLocation separates storage type and directory path, and is used by 
> LocalDirAllocator to be aware of the types of local dirs; the default storage type 
> is DISK.
> ** The getLocalPathForWrite method of LocalDirAllocator will prefer to choose a 
> local directory of the specified storage type, and will fall back to ignoring the 
> storage type if the requirement cannot be satisfied.
> ** Support for container related local/log directories by ContainerLaunch. 
> All application frameworks can set the environment variables 
> (LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE) to specify the desired storage 
> type of local/log directories, and can choose not to launch the container on 
> fallback via these environment variables (ENSURE_LOCAL_STORAGE_TYPE and 
> ENSURE_LOG_STORAGE_TYPE).
> * Allow specified storage type for various frameworks (Take MapReduce as an 
> example)
> ** New configurations should allow the application administrator to 
> optionally specify the storage type of local/log directories and the fallback 
> strategy (MapReduce configurations: mapreduce.job.local-storage-type, 
> mapreduce.job.log-storage-type, mapreduce.job.ensure-local-storage-type and 
> mapreduce.job.ensure-log-storage-type).
> ** Support for container work directories. Set the environment variables, 
> including LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE, according to the configurations 
> above for ContainerLaunchContext and ApplicationSubmissionContext. (MapReduce 
> should update YARNRunner and TaskAttemptImpl)
> ** Add storage type prefix for request path to support for other local 
> directories of frameworks (such as shuffle directories for MapReduce). 
> (MapReduce should update YarnOutputFiles, MROutputFiles and YarnChild to 
> support for output/work directories)
> ** Flow diagram for MapReduce framework
> !flow_diagram_for_MapReduce-2.png!
> h3.  Further Discussion
> * Scheduling: the storage type requirement for local/log directories may 
> not be satisfied on some nodes of heterogeneous clusters. To achieve a 
> global optimum, the scheduler should be aware of and manage disk resources. 
> ** Approach-1: Based on node attributes (YARN-3409), Scheduler can allocate 
> containers which have SSD requirement on nodes with attribute:ssd=true.
> ** Approach-2: Based on extended resource model (YARN-3926), it's easy to 
> support scheduling through extending resource models like vdisk and vssd 
> using this feature, but hard to measure for applications and isolate for 
> non-CFQ based disks.
> * The fallback strategy still needs consideration. Certain applications might 
> not work well when the storage type requirement is not satisfied. When 
> no disk of the desired storage type is available, should container launching 
> fail, or should the AM handle it? We have implemented a fallback strategy that fails 
> to launch the container when no disk of the desired storage type 

[jira] [Commented] (YARN-5674) FairScheduler handles "dots" in user names inconsistently in the config

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802740#comment-17802740
 ] 

Shilun Fan commented on YARN-5674:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> FairScheduler handles "dots" in user names inconsistently in the config
> ---
>
> Key: YARN-5674
> URL: https://issues.apache.org/jira/browse/YARN-5674
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.6.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>
> A user name can contain a dot because it could be used as the queue name we 
> replace the dot with a defined separator. When defining queues in the 
> configuration for users containing a dot we expect that the dot is replaced 
> by the "\_dot\_" string.
> In the user limits we do not do that and user limits need a normal dot in the 
> user name. This is confusing when you create a scheduler configuration in 
> some places you need to replace in others you do not. This can cause issue 
> when user limits are not enforced as expected.
> We should use one way to specify the user and since the queue naming can not 
> be changed we should also use the same "\_dot\_" in the user limits and 
> enforce correctly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5683) Support specifying storage type for per-application local dirs

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-5683:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Support specifying storage type for per-application local dirs
> --
>
> Key: YARN-5683
> URL: https://issues.apache.org/jira/browse/YARN-5683
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Tao Yang
>Assignee: Tao Yang
>Priority: Major
>  Labels: oct16-hard
> Attachments: YARN-5683-1.patch, YARN-5683-2.patch, YARN-5683-3.patch, 
> flow_diagram_for_MapReduce-2.png, flow_diagram_for_MapReduce.png
>
>
> h3.  Introduction
> * Some applications of various frameworks (Flink, Spark, MapReduce etc.) that 
> use local storage (checkpoint, shuffle etc.) might require high IO 
> performance. It's useful to allocate local directories to high performance 
> storage media for these applications on heterogeneous clusters.
> * YARN does not distinguish different storage types and hence applications 
> cannot selectively use storage media with different performance 
> characteristics. Adding awareness of storage media can allow YARN to make 
> better decisions about the placement of local directories.
> h3.  Approach
> * NodeManager will distinguish storage types for local directories.
> ** yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs configuration 
> should allow the cluster administrator to optionally specify the storage type 
> for each local directories. Example: 
> [SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir (equals to 
> [SSD]/disk1/nm-local-dir,[DISK]/disk2/nm-local-dir,[DISK]/disk3/nm-local-dir)
> ** StorageType defines DISK/SSD storage types and takes DISK as the default 
> storage type. 
> ** StorageLocation separates storage type and directory path, and is used by 
> LocalDirAllocator to be aware of the types of local dirs; the default storage type 
> is DISK.
> ** The getLocalPathForWrite method of LocalDirAllocator will prefer to choose a 
> local directory of the specified storage type, and will fall back to ignoring the 
> storage type if the requirement cannot be satisfied.
> ** Support for container related local/log directories by ContainerLaunch. 
> All application frameworks can set the environment variables 
> (LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE) to specified the desired storage 
> type of local/log directories, and choose to not launch container if fallback 
> through these environment variables (ENSURE_LOCAL_STORAGE_TYPE and 
> ENSURE_LOG_STORAGE_TYPE).
> * Allow specified storage type for various frameworks (Take MapReduce as an 
> example)
> ** Add new configurations should allow application administrator to 
> optionally specify the storage type of local/log directories and fallback 
> strategy (MapReduce configurations: mapreduce.job.local-storage-type, 
> mapreduce.job.log-storage-type, mapreduce.job.ensure-local-storage-type and 
> mapreduce.job.ensure-log-storage-type).
> ** Support for container work directories. Set the environment variables 
> includes LOCAL_STORAGE_TYPE and LOG_STORAGE_TYPE according to configurations 
> above for ContainerLaunchContext and ApplicationSubmissionContext. (MapReduce 
> should update YARNRunner and TaskAttemptImpl)
> ** Add storage type prefix for request path to support for other local 
> directories of frameworks (such as shuffle directories for MapReduce). 
> (MapReduce should update YarnOutputFiles, MROutputFiles and YarnChild to 
> support for output/work directories)
> ** Flow diagram for MapReduce framework
> !flow_diagram_for_MapReduce-2.png!
> h3.  Further Discussion
> * Scheduling: the storage type requirement for local/log directories may not 
> be satisfiable on some nodes of a heterogeneous cluster. To achieve a global 
> optimum, the scheduler should be aware of and manage disk resources. 
> ** Approach-1: Based on node attributes (YARN-3409), the scheduler can allocate 
> containers that require SSD on nodes with attribute:ssd=true.
> ** Approach-2: Based on the extended resource model (YARN-3926), it is easy to 
> support scheduling by extending resource models such as vdisk and vssd, but 
> hard to measure usage for applications and to isolate on non-CFQ based disks.
> * The fallback strategy still needs consideration. Certain applications might 
> not work well when the storage type requirement is not satisfied. When no 
> disk of the desired storage type is available, should container launch fail, 
> or should the AM handle it? We have implemented a fallback strategy that 
> fails to launch the container when no disk of the desired storage type is 
> available. Are there better methods? 
> This feature has been used for half a year to meet 
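As an illustration of the proposed configuration format quoted above, a yarn-site.xml sketch (paths are examples; untagged directories would default to DISK per the description):

{code:xml}
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>[SSD]/disk1/nm-local-dir,/disk2/nm-local-dir,/disk3/nm-local-dir</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>[SSD]/disk1/nm-log-dir,/disk2/nm-log-dir</value>
</property>
{code}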

[jira] [Updated] (YARN-5814) Add druid as storage backend in YARN Timeline Service

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-5814:
-
Target Version/s: 3.5.0  (was: 3.4.0)

>  Add druid as storage backend in YARN Timeline Service
> --
>
> Key: YARN-5814
> URL: https://issues.apache.org/jira/browse/YARN-5814
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: ATSv2
>Affects Versions: 3.0.0-alpha2
>Reporter: Bingxue Qiu
>Priority: Major
> Attachments: Add-Druid-in-YARN-Timeline-Service.pdf
>
>
> h3. Introduction
> I propose to add druid as a storage backend in the YARN Timeline Service.
> We run more than 6000 applications and generate 450 million metrics daily in 
> Alibaba clusters with thousands of nodes. We need to collect and store 
> meta/events/metrics data, analyze utilization reports across various 
> dimensions online, and display trends of allocated/used resources for the 
> cluster by joining and aggregating data. This helps us manage and optimize 
> the cluster by tracking resource utilization.
> To achieve this we switched from HBase to druid as the storage and have seen 
> sub-second OLAP performance in our production environment for a few months. 
> h3. Analysis
> Currently the YARN Timeline Service only supports aggregating metrics at a) 
> the flow level by FlowRunCoprocessor and b) the application level by 
> AppLevelTimelineCollector; offline (time-based periodic) aggregation for 
> flows/users/queues for reporting and analysis is planned but not yet 
> implemented. The YARN Timeline Service uses Apache HBase as the primary 
> storage backend, and HBase is not a good fit for OLAP.
> For arbitrary exploration of data, such as online analysis of utilization 
> reports across various dimensions (Queue, Flow, Users, Application, CPU, 
> Memory) by joining and aggregating data, Druid's custom column format enables 
> ad-hoc queries without pre-computation. The format also enables fast scans on 
> columns, which is important for good aggregation performance.
> To achieve our goal of online analysis of utilization reports across various 
> dimensions, display of allocation/usage trends for the cluster, and arbitrary 
> exploration of data, we propose to add druid storage and implement 
> DruidWriter/DruidReader in the YARN Timeline Service.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5814) Add druid as storage backend in YARN Timeline Service

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802738#comment-17802738
 ] 

Shilun Fan commented on YARN-5814:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

>  Add druid as storage backend in YARN Timeline Service
> --
>
> Key: YARN-5814
> URL: https://issues.apache.org/jira/browse/YARN-5814
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: ATSv2
>Affects Versions: 3.0.0-alpha2
>Reporter: Bingxue Qiu
>Priority: Major
> Attachments: Add-Druid-in-YARN-Timeline-Service.pdf
>
>
> h3. Introduction
> I propose to add druid as a storage backend in the YARN Timeline Service.
> We run more than 6000 applications and generate 450 million metrics daily in 
> Alibaba clusters with thousands of nodes. We need to collect and store 
> meta/events/metrics data, analyze utilization reports across various 
> dimensions online, and display trends of allocated/used resources for the 
> cluster by joining and aggregating data. This helps us manage and optimize 
> the cluster by tracking resource utilization.
> To achieve this we switched from HBase to druid as the storage and have seen 
> sub-second OLAP performance in our production environment for a few months. 
> h3. Analysis
> Currently the YARN Timeline Service only supports aggregating metrics at a) 
> the flow level by FlowRunCoprocessor and b) the application level by 
> AppLevelTimelineCollector; offline (time-based periodic) aggregation for 
> flows/users/queues for reporting and analysis is planned but not yet 
> implemented. The YARN Timeline Service uses Apache HBase as the primary 
> storage backend, and HBase is not a good fit for OLAP.
> For arbitrary exploration of data, such as online analysis of utilization 
> reports across various dimensions (Queue, Flow, Users, Application, CPU, 
> Memory) by joining and aggregating data, Druid's custom column format enables 
> ad-hoc queries without pre-computation. The format also enables fast scans on 
> columns, which is important for good aggregation performance.
> To achieve our goal of online analysis of utilization reports across various 
> dimensions, display of allocation/usage trends for the cluster, and arbitrary 
> exploration of data, we propose to add druid storage and implement 
> DruidWriter/DruidReader in the YARN Timeline Service.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5902) yarn.scheduler.increment-allocation-mb and yarn.scheduler.increment-allocation-vcores are undocumented in yarn-default.xml

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-5902:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> yarn.scheduler.increment-allocation-mb and 
> yarn.scheduler.increment-allocation-vcores are undocumented in 
> yarn-default.xml
> --
>
> Key: YARN-5902
> URL: https://issues.apache.org/jira/browse/YARN-5902
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: YARN-5902.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5902) yarn.scheduler.increment-allocation-mb and yarn.scheduler.increment-allocation-vcores are undocumented in yarn-default.xml

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802735#comment-17802735
 ] 

Shilun Fan commented on YARN-5902:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> yarn.scheduler.increment-allocation-mb and 
> yarn.scheduler.increment-allocation-vcores are undocumented in 
> yarn-default.xml
> --
>
> Key: YARN-5902
> URL: https://issues.apache.org/jira/browse/YARN-5902
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Daniel Templeton
>Assignee: Daniel Templeton
>Priority: Major
> Attachments: YARN-5902.001.patch
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5852) Consolidate CSAssignment, ContainerAllocation, ContainerAllocationContext class in CapacityScheduler

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802736#comment-17802736
 ] 

Shilun Fan commented on YARN-5852:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Consolidate CSAssignment, ContainerAllocation, ContainerAllocationContext 
> class in CapacityScheduler
> 
>
> Key: YARN-5852
> URL: https://issues.apache.org/jira/browse/YARN-5852
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Priority: Major
>
> Quite a few data structures with similar names wrap container-related info: 
> CSAssignment, ContainerAllocation, ContainerAllocationContext, and there is a 
> bunch of code to convert one into another. We should consolidate them into a 
> single one.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5852) Consolidate CSAssignment, ContainerAllocation, ContainerAllocationContext class in CapacityScheduler

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-5852:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Consolidate CSAssignment, ContainerAllocation, ContainerAllocationContext 
> class in CapacityScheduler
> 
>
> Key: YARN-5852
> URL: https://issues.apache.org/jira/browse/YARN-5852
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Priority: Major
>
> Quite a few data structures with similar names wrap container-related info: 
> CSAssignment, ContainerAllocation, ContainerAllocationContext, and there is a 
> bunch of code to convert one into another. We should consolidate them into a 
> single one.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802734#comment-17802734
 ] 

Shilun Fan commented on YARN-5995:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>Priority: Major
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.0003.patch, YARN-5995.0004.patch, YARN-5995.0005.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
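A minimal sketch of how such timings could be exposed through the Hadoop metrics2 library; the class, metric, and method names below are hypothetical and not taken from the attached patches:

{code:java}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableRate;

// Records how long RMStateStore event transitions take.
@Metrics(about = "RMStateStore transition durations", context = "yarn")
public class RMStateStoreTransitionMetrics {

  @Metric("Time spent in store-application-state transitions")
  MutableRate storeAppTransition;

  public RMStateStoreTransitionMetrics() {
    // Register this source so the rate shows up in the RM metrics output.
    DefaultMetricsSystem.instance().register(
        "RMStateStoreTransitionMetrics",
        "Durations of RMStateStore event transitions", this);
  }

  // Called by a transition after measuring its elapsed time in milliseconds.
  public void recordStoreAppTransition(long elapsedMs) {
    storeAppTransition.add(elapsedMs);
  }
}
{code}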



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-5995) Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition performance

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-5995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-5995:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance
> ---
>
> Key: YARN-5995
> URL: https://issues.apache.org/jira/browse/YARN-5995
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: metrics, resourcemanager
>Affects Versions: 2.7.1
> Environment: CentOS7.2 Hadoop-2.7.1 
>Reporter: zhangyubiao
>Assignee: zhangyubiao
>Priority: Major
>  Labels: patch
> Attachments: YARN-5995.0001.patch, YARN-5995.0002.patch, 
> YARN-5995.0003.patch, YARN-5995.0004.patch, YARN-5995.0005.patch, 
> YARN-5995.patch
>
>
> Add RMStateStore metrics to monitor all RMStateStoreEventTypeTransition 
> performance



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6147) Blacklisting nodes not happening for AM containers

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802732#comment-17802732
 ] 

Shilun Fan commented on YARN-6147:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Blacklisting nodes not happening for AM containers
> --
>
> Key: YARN-6147
> URL: https://issues.apache.org/jira/browse/YARN-6147
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Major
>
> Blacklisting of nodes is not happening in the following scenarios:
> 1. RMAppAttempt is in ALLOCATED and a LAUNCH_FAILED event arrives while the NM 
> is down.
> 2. RMAppAttempt is in LAUNCHED and an EXPIRE event arrives while the NM is down.
> In both cases the AppAttempt goes to *FINAL_SAVING* and eventually to the 
> *FINAL* state before the *CONTAINER_FINISHED* event is triggered by 
> {{RMContainerImpl}}, and in the {{FINAL}} state the {{CONTAINER_FINISHED}} 
> event is ignored.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6147) Blacklisting nodes not happening for AM containers

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-6147:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Blacklisting nodes not happening for AM containers
> --
>
> Key: YARN-6147
> URL: https://issues.apache.org/jira/browse/YARN-6147
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Major
>
> Blacklisting of nodes is not happening in the following scenarios:
> 1. RMAppAttempt is in ALLOCATED and a LAUNCH_FAILED event arrives while the NM 
> is down.
> 2. RMAppAttempt is in LAUNCHED and an EXPIRE event arrives while the NM is down.
> In both cases the AppAttempt goes to *FINAL_SAVING* and eventually to the 
> *FINAL* state before the *CONTAINER_FINISHED* event is triggered by 
> {{RMContainerImpl}}, and in the {{FINAL}} state the {{CONTAINER_FINISHED}} 
> event is ignored.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6088) RM UI has to redirect to AHS for completed applications logs

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-6088:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> RM UI has to redirect to AHS for completed applications logs
> 
>
> Key: YARN-6088
> URL: https://issues.apache.org/jira/browse/YARN-6088
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: webapp
>Affects Versions: 2.7.3
>Reporter: Sunil G
>Priority: Major
>
> Currently the AM container logs link in RMAppBlock is hardcoded to the 
> container's host node. If that node is unavailable, we will not have enough 
> information.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6382) Address race condition on TimelineWriter.flush() caused by buffer-sized flush

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-6382:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Address race condition on TimelineWriter.flush() caused by buffer-sized flush
> -
>
> Key: YARN-6382
> URL: https://issues.apache.org/jira/browse/YARN-6382
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha2
>Reporter: Haibo Chen
>Assignee: Yousef Abu-Salah
>Priority: Major
>
> YARN-6376 fixes the race condition between putEntities() and the periodic 
> flush() by WriterFlushThread in TimelineCollectorManager, or between 
> putEntities() calls in different threads.
> However, BufferedMutator can have an internal size-based flush as well. We 
> need to address the resulting race condition.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-6315:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Improve LocalResourcesTrackerImpl#isResourcePresent to return false for 
> corrupted files
> ---
>
> Key: YARN-6315
> URL: https://issues.apache.org/jira/browse/YARN-6315
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3, 2.8.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-6315.001.patch, YARN-6315.002.patch, 
> YARN-6315.003.patch, YARN-6315.004.patch, YARN-6315.005.patch, 
> YARN-6315.006.patch
>
>
> We currently check if a resource is present by making sure that the file 
> exists locally. There can be a case where the LocalizationTracker thinks it 
> has the resource because the file exists, even though its size is 0 or less 
> than the "expected" size of the LocalResource. This JIRA tracks the change to 
> harden the isResourcePresent call to address that case.
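A minimal sketch of the hardened check described above, assuming the expected size is available from the LocalResource (names are illustrative, not the actual patch):

{code:java}
import java.io.File;

public final class ResourcePresenceCheck {

  private ResourcePresenceCheck() {
  }

  // Treat a missing or truncated file as "not present" so the resource is
  // localized again instead of being served from a corrupted copy.
  public static boolean isResourcePresent(String localPath, long expectedSize) {
    File file = new File(localPath);
    return file.exists() && file.length() >= expectedSize;
  }
}
{code}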



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6315) Improve LocalResourcesTrackerImpl#isResourcePresent to return false for corrupted files

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802727#comment-17802727
 ] 

Shilun Fan commented on YARN-6315:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.


> Improve LocalResourcesTrackerImpl#isResourcePresent to return false for 
> corrupted files
> ---
>
> Key: YARN-6315
> URL: https://issues.apache.org/jira/browse/YARN-6315
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.3, 2.8.1
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: YARN-6315.001.patch, YARN-6315.002.patch, 
> YARN-6315.003.patch, YARN-6315.004.patch, YARN-6315.005.patch, 
> YARN-6315.006.patch
>
>
> We currently check if a resource is present by making sure that the file 
> exists locally. There can be a case where the LocalizationTracker thinks it 
> has the resource because the file exists, even though its size is 0 or less 
> than the "expected" size of the LocalResource. This JIRA tracks the change to 
> harden the isResourcePresent call to address that case.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6382) Address race condition on TimelineWriter.flush() caused by buffer-sized flush

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802725#comment-17802725
 ] 

Shilun Fan commented on YARN-6382:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Address race condition on TimelineWriter.flush() caused by buffer-sized flush
> -
>
> Key: YARN-6382
> URL: https://issues.apache.org/jira/browse/YARN-6382
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha2
>Reporter: Haibo Chen
>Assignee: Yousef Abu-Salah
>Priority: Major
>
> YARN-6376 fixes the race condition between putEntities() and the periodic 
> flush() by WriterFlushThread in TimelineCollectorManager, or between 
> putEntities() calls in different threads.
> However, BufferedMutator can have an internal size-based flush as well. We 
> need to address the resulting race condition.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6409) RM does not blacklist node for AM launch failures

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802724#comment-17802724
 ] 

Shilun Fan commented on YARN-6409:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> RM does not blacklist node for AM launch failures
> -
>
> Key: YARN-6409
> URL: https://issues.apache.org/jira/browse/YARN-6409
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Haibo Chen
>Assignee: Kanwaljeet Sachdev
>Priority: Major
> Attachments: YARN-6409.00.patch, YARN-6409.01.patch, 
> YARN-6409.02.patch, YARN-6409.03.patch
>
>
> Currently, node blacklisting upon AM failures only handles failures that 
> happen after the AM container is launched (see 
> RMAppAttemptImpl.shouldCountTowardsNodeBlacklisting()). However, AM launch 
> can also fail if the NM where the AM container is allocated goes 
> unresponsive. Because that case is not handled, the scheduler may continue to 
> allocate AM containers on that same NM for the following app attempts. 
> {code}
> Application application_1478721503753_0870 failed 2 times due to Error 
> launching appattempt_1478721503753_0870_02. Got exception: 
> java.io.IOException: Failed on local exception: java.io.IOException: 
> java.net.SocketTimeoutException: 6 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/17.111.179.113:46702 remote=*.me.com/17.111.178.125:8041]; Host 
> Details : local host is: "*.me.com/17.111.179.113"; destination host is: 
> "*.me.com":8041; 
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1475) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1408) 
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>  
> at com.sun.proxy.$Proxy86.startContainers(Unknown Source) 
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
>  
> at sun.reflect.GeneratedMethodAccessor155.invoke(Unknown Source) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
> at java.lang.reflect.Method.invoke(Method.java:497) 
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>  
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>  
> at com.sun.proxy.$Proxy87.startContainers(Unknown Source) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:120)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:256)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  
> at java.lang.Thread.run(Thread.java:745) 
> Caused by: java.io.IOException: java.net.SocketTimeoutException: 6 millis 
> timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/17.111.179.113:46702 
> remote=*.me.com/17.111.178.125:8041] 
> at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:687) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:422) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>  
> at 
> org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:650)
>  
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:738) 
> at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375) 
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1524) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1447) 
> ... 15 more 
> Caused by: java.net.SocketTimeoutException: 6 millis timeout while 
> waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/17.111.179.113:46702 
> remote=*.me.com/17.111.178.125:8041] 
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) 
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) 
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) 
> at java.io.FilterInputStream.read(FilterInputStream.java:133) 
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) 
> at java.io.BufferedInputStream.read(BufferedInputStream.java:265) 
> at java.io.DataInputStream.readInt(DataInputStream.java:387) 
> at 
> 

[jira] [Commented] (YARN-6429) Revisit implementation of LocalitySchedulingPlacementSet to avoid invoke methods of AppSchedulingInfo

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802722#comment-17802722
 ] 

Shilun Fan commented on YARN-6429:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Revisit implementation of LocalitySchedulingPlacementSet to avoid invoke 
> methods of AppSchedulingInfo
> -
>
> Key: YARN-6429
> URL: https://issues.apache.org/jira/browse/YARN-6429
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
>
> An example is LocalitySchedulingPlacementSet#decrementOutstanding: it calls 
> appSchedulingInfo directly, which could potentially cause trouble since it 
> tries to modify the parent from the child. Is it possible to move this logic 
> to AppSchedulingInfo#allocate?
> Other methods need to be checked as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6426) Compress ZK YARN keys to scale up (especially AppStateData

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-6426:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Compress ZK YARN keys to scale up (especially AppStateData
> --
>
> Key: YARN-6426
> URL: https://issues.apache.org/jira/browse/YARN-6426
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0-alpha2
>Reporter: Roni Burd
>Assignee: Roni Burd
>Priority: Major
>  Labels: patch
> Attachments: zkcompression.patch
>
>
> ZK today stores the protobuf files uncompressed. This is not an issue except 
> that if a customer job has thousands of files, AppStateData will store the 
> user context as a string with multiple URLs and it is easy to get to 1MB or 
> more. 
> This can put unnecessary strain on ZK and make the process slow. 
> The proposal is to simply compress protobufs before sending them to ZK
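A minimal sketch of the proposal, assuming the state-store payload is available as a byte array; the attached zkcompression.patch may use a different codec or integration point:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public final class ZkPayloadCodec {

  private ZkPayloadCodec() {
  }

  // Compress serialized protobuf bytes before writing them to ZooKeeper.
  public static byte[] compress(byte[] raw) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
      gzip.write(raw);
    }
    return bos.toByteArray();
  }

  // Decompress bytes read back from ZooKeeper before parsing the protobuf.
  public static byte[] decompress(byte[] compressed) throws IOException {
    try (GZIPInputStream gzip =
             new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      int n;
      while ((n = gzip.read(buf)) != -1) {
        bos.write(buf, 0, n);
      }
      return bos.toByteArray();
    }
  }
}
{code}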



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6426) Compress ZK YARN keys to scale up (especially AppStateData

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802723#comment-17802723
 ] 

Shilun Fan commented on YARN-6426:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Compress ZK YARN keys to scale up (especially AppStateData
> --
>
> Key: YARN-6426
> URL: https://issues.apache.org/jira/browse/YARN-6426
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0-alpha2
>Reporter: Roni Burd
>Assignee: Roni Burd
>Priority: Major
>  Labels: patch
> Attachments: zkcompression.patch
>
>
> ZK today stores the protobuf files uncompressed. This is not an issue except 
> that if a customer job has thousands of files, AppStateData will store the 
> user context as a string with multiple URLs and it is easy to get to 1MB or 
> more. 
> This can put unnecessary strain on ZK and make the process slow. 
> The proposal is to simply compress protobufs before sending them to ZK



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6409) RM does not blacklist node for AM launch failures

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-6409:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> RM does not blacklist node for AM launch failures
> -
>
> Key: YARN-6409
> URL: https://issues.apache.org/jira/browse/YARN-6409
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0-alpha2
>Reporter: Haibo Chen
>Assignee: Kanwaljeet Sachdev
>Priority: Major
> Attachments: YARN-6409.00.patch, YARN-6409.01.patch, 
> YARN-6409.02.patch, YARN-6409.03.patch
>
>
> Currently, node blacklisting upon AM failures only handles failures that 
> happen after the AM container is launched (see 
> RMAppAttemptImpl.shouldCountTowardsNodeBlacklisting()). However, AM launch 
> can also fail if the NM where the AM container is allocated goes 
> unresponsive. Because that case is not handled, the scheduler may continue to 
> allocate AM containers on that same NM for the following app attempts. 
> {code}
> Application application_1478721503753_0870 failed 2 times due to Error 
> launching appattempt_1478721503753_0870_02. Got exception: 
> java.io.IOException: Failed on local exception: java.io.IOException: 
> java.net.SocketTimeoutException: 6 millis timeout while waiting for 
> channel to be ready for read. ch : java.nio.channels.SocketChannel[connected 
> local=/17.111.179.113:46702 remote=*.me.com/17.111.178.125:8041]; Host 
> Details : local host is: "*.me.com/17.111.179.113"; destination host is: 
> "*.me.com":8041; 
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1475) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1408) 
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>  
> at com.sun.proxy.$Proxy86.startContainers(Unknown Source) 
> at 
> org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
>  
> at sun.reflect.GeneratedMethodAccessor155.invoke(Unknown Source) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
> at java.lang.reflect.Method.invoke(Method.java:497) 
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>  
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>  
> at com.sun.proxy.$Proxy87.startContainers(Unknown Source) 
> at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:120)
>  
> at 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:256)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  
> at java.lang.Thread.run(Thread.java:745) 
> Caused by: java.io.IOException: java.net.SocketTimeoutException: 6 millis 
> timeout while waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/17.111.179.113:46702 
> remote=*.me.com/17.111.178.125:8041] 
> at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:687) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:422) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>  
> at 
> org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:650)
>  
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:738) 
> at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375) 
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1524) 
> at org.apache.hadoop.ipc.Client.call(Client.java:1447) 
> ... 15 more 
> Caused by: java.net.SocketTimeoutException: 6 millis timeout while 
> waiting for channel to be ready for read. ch : 
> java.nio.channels.SocketChannel[connected local=/17.111.179.113:46702 
> remote=*.me.com/17.111.178.125:8041] 
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164) 
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) 
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) 
> at java.io.FilterInputStream.read(FilterInputStream.java:133) 
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) 
> at java.io.BufferedInputStream.read(BufferedInputStream.java:265) 
> at java.io.DataInputStream.readInt(DataInputStream.java:387) 
> at 
> org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:367) 
> at 
> 

[jira] [Updated] (YARN-6429) Revisit implementation of LocalitySchedulingPlacementSet to avoid invoke methods of AppSchedulingInfo

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-6429:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Revisit implementation of LocalitySchedulingPlacementSet to avoid invoke 
> methods of AppSchedulingInfo
> -
>
> Key: YARN-6429
> URL: https://issues.apache.org/jira/browse/YARN-6429
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
>
> An example is LocalitySchedulingPlacementSet#decrementOutstanding: it calls 
> appSchedulingInfo directly, which could potentially cause trouble since it 
> tries to modify the parent from the child. Is it possible to move this logic 
> to AppSchedulingInfo#allocate?
> Other methods need to be checked as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6466) Provide shaded framework jar for containers

2024-01-04 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-6466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated YARN-6466:
-
Target Version/s: 3.5.0  (was: 3.4.0)

> Provide shaded framework jar for containers
> ---
>
> Key: YARN-6466
> URL: https://issues.apache.org/jira/browse/YARN-6466
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: build, yarn
>Affects Versions: 3.0.0-alpha1
>Reporter: Sean Busbey
>Assignee: Haibo Chen
>Priority: Major
>
> We should build on the existing shading work to provide a jar with all of the 
> bits needed within a YARN application's container to talk to the resource 
> manager and node manager.
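A rough sketch of the kind of shading this could build on, using a maven-shade-plugin relocation; the relocation pattern and shaded prefix below are placeholders, not the actual build change:

{code:xml}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <!-- Keep third-party dependencies off the application classpath. -->
          <relocation>
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>org.apache.hadoop.shaded.com.google.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}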



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6488) Remove continuous scheduling tests

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802720#comment-17802720
 ] 

Shilun Fan commented on YARN-6488:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> Remove continuous scheduling tests
> --
>
> Key: YARN-6488
> URL: https://issues.apache.org/jira/browse/YARN-6488
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: fairscheduler
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
>
> Remove all continuous scheduling tests from the code



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6606) The implementation of LocalizationStatus in ContainerStatusProto

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-6606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802717#comment-17802717
 ] 

Shilun Fan commented on YARN-6606:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> The implementation of LocalizationStatus in ContainerStatusProto
> 
>
> Key: YARN-6606
> URL: https://issues.apache.org/jira/browse/YARN-6606
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: nodemanager
>Affects Versions: 2.9.0
>Reporter: Bingxue Qiu
>Priority: Major
> Attachments: YARN-6606.1.patch, YARN-6606.2.patch
>
>
> We have a use case where the full implementation of localization status in 
> ContainerStatusProto 
> [Continuous-resource-localization|https://issues.apache.org/jira/secure/attachment/12825041/Continuous-resource-localization.pdf]
> needs to be done, so we implemented it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


