[jira] [Commented] (YARN-3427) Remove deprecated methods from ResourceCalculatorProcessTree
[ https://issues.apache.org/jira/browse/YARN-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946414#comment-15946414 ]

Hitesh Shah commented on YARN-3427:
-----------------------------------

Thanks for the heads up [~dan...@cloudera.com]. \cc [~sseth] [~rajesh.balamohan]

> Remove deprecated methods from ResourceCalculatorProcessTree
> ------------------------------------------------------------
>
>                  Key: YARN-3427
>                  URL: https://issues.apache.org/jira/browse/YARN-3427
>              Project: Hadoop YARN
>           Issue Type: Improvement
>     Affects Versions: 2.7.0
>             Reporter: Karthik Kambatla
>             Assignee: Miklos Szegedi
>             Priority: Blocker
>          Attachments: YARN-3427.000.patch, YARN-3427.001.patch
>
> In 2.7, we made ResourceCalculatorProcessTree Public and exposed some
> existing ill-formed methods as deprecated ones for use by Tez.
> We should remove them in 3.0.0, considering that the methods have been
> deprecated for all the 2.x.y releases in which the class is marked Public.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-1593) support out-of-proc AuxiliaryServices
[ https://issues.apache.org/jira/browse/YARN-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665094#comment-15665094 ]

Hitesh Shah commented on YARN-1593:
-----------------------------------

Thanks [~vvasudev]. It does so partially. My concern is around the feedback loop in terms of failure handling by the apps when the system container dies at any of the following points:
- the system container dies before an allocated container is launched on that node
- it dies while a container is running
- it dies after a container has completed

Would applications that define affinity to these system services now get updates (notifications) when system service containers go down or come back up?

In addition to the feedback loop, is there any behavior change as a result of this? i.e. if the system container is not alive, will the app container still get launched given that its dependent service is down? For shuffle this might be ok if the system container eventually comes up, but there might be other services that provide more synchronous functionality, such as a caching layer.

> support out-of-proc AuxiliaryServices
> -------------------------------------
>
>          Key: YARN-1593
>          URL: https://issues.apache.org/jira/browse/YARN-1593
>      Project: Hadoop YARN
>   Issue Type: Improvement
>   Components: nodemanager, rolling upgrade
>     Reporter: Ming Ma
>     Assignee: Varun Vasudev
>  Attachments: SystemContainersandSystemServices.pdf
>
> AuxiliaryServices such as ShuffleHandler currently run in the same process
> as the NM. There are some benefits to hosting them in dedicated processes.
> 1. NM rolling restart. If we want to upgrade YARN, an NM restart will force
> the ShuffleHandler to restart. If ShuffleHandler runs as a separate process,
> it can continue to run during the NM restart, and the NM can reconnect to
> the running ShuffleHandler after restarting.
> 2. Resource management. It is possible that other types of AuxiliaryServices
> will be implemented. AuxiliaryServices are considered YARN-application
> specific and could consume lots of resources. Running AuxiliaryServices in
> separate processes allows easier resource management: the NM could
> potentially stop a specific AuxiliaryService process if it consumes
> resources way above its allocation.
> Here are some high level ideas:
> 1. NM provides a hosting process for each AuxiliaryService. The existing
> AuxiliaryService API doesn't change.
> 2. The hosting process provides an RPC server for the AuxiliaryService
> proxy object inside the NM to connect to.
> 3. When we rolling-restart the NM, the existing AuxiliaryService processes
> continue to run. The NM can reconnect to the running AuxiliaryService
> processes upon restart.
> 4. Policy and resource management of AuxiliaryServices. So far we don't
> have an immediate need for this. An AuxiliaryService could run inside a
> container, its resource utilization could be taken into account by the RM,
> and the RM could detect when a specific type of application overutilizes
> cluster resources.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
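High-level ideas 1-3 in the quoted proposal can be sketched roughly as follows. All class and method names here are invented for illustration and are not the real YARN AuxiliaryService API; the transport is a stand-in for an actual RPC client. The NM keeps only a thin in-process proxy, so an NM restart loses no service state: on restart it constructs a new proxy pointing at the still-running host process.

```java
import java.util.function.BiConsumer;

// Illustrative sketch only (names invented, not the real YARN API):
// the NM-side proxy forwards AuxiliaryService lifecycle calls over an
// RPC transport to the out-of-proc hosting process.
interface AuxServiceLifecycle {
    void initializeApplication(String appId);
    void stopApplication(String appId);
}

class OutOfProcAuxServiceProxy implements AuxServiceLifecycle {
    // Stand-in for an RPC client: (operation, appId) -> remote call.
    private final BiConsumer<String, String> transport;

    OutOfProcAuxServiceProxy(BiConsumer<String, String> transport) {
        this.transport = transport;
    }

    @Override public void initializeApplication(String appId) {
        transport.accept("initializeApplication", appId);
    }

    @Override public void stopApplication(String appId) {
        transport.accept("stopApplication", appId);
    }
}
```

Because the existing AuxiliaryService API is unchanged (idea 1), the NM code that today calls an in-process service would call the proxy identically.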
[jira] [Comment Edited] (YARN-1593) support out-of-proc AuxiliaryServices
[ https://issues.apache.org/jira/browse/YARN-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651723#comment-15651723 ]

Hitesh Shah edited comment on YARN-1593 at 11/9/16 6:55 PM:
------------------------------------------------------------

[~vvasudev] One question on the design doc. The doc does not seem to cover how user applications can define dependencies on these system services. For example, how to ensure that an MR/Tez/xyz container that requires the shuffle service does not get launched on a node where the system service is not running. This has two aspects: firstly, how to ensure container allocations happen on the correct nodes where these services are running; and secondly, the service might be down when the container actually gets launched, so how will the behavior change as a result (does the container eventually fail, does the NM itself stop the launch of the container and send an error back, etc.)? Is this something that will be looked at later, or should it be designed for now, to simplify the use of system services for user applications?

was (Author: hitesh):
[~vvasudev] One question on the design doc. The doc does not seem to cover how user applications can define dependencies on these system services. For example, how to ensure that an MR/Tez/xyz container that requires the shuffle service does not get launched on a node where the system service is not running. This has two aspects: firstly, how to ensure container allocations happen on the correct nodes where these services are running; and secondly, the service might be down when the container actually gets launched, so how will the behavior change as a result (does the container eventually fail, does the NM itself stop the launch of the container and send an error back, etc.)?
[jira] [Commented] (YARN-1593) support out-of-proc AuxiliaryServices
[ https://issues.apache.org/jira/browse/YARN-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651723#comment-15651723 ]

Hitesh Shah commented on YARN-1593:
-----------------------------------

[~vvasudev] One question on the design doc. The doc does not seem to cover how user applications can define dependencies on these system services. For example, how to ensure that an MR/Tez/xyz container that requires the shuffle service does not get launched on a node where the system service is not running. This has two aspects: firstly, how to ensure container allocations happen on the correct nodes where these services are running; and secondly, the service might be down when the container actually gets launched, so how will the behavior change as a result (does the container eventually fail, does the NM itself stop the launch of the container and send an error back, etc.)?
[jira] [Commented] (YARN-5759) Capability to register for a notification/callback on the expiry of timeouts
[ https://issues.apache.org/jira/browse/YARN-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593093#comment-15593093 ]

Hitesh Shah commented on YARN-5759:
-----------------------------------

Will this add support for a post-app action executed by YARN after the application reaches an end state? i.e. somewhat like a finally block for a YARN app?

> Capability to register for a notification/callback on the expiry of timeouts
> ----------------------------------------------------------------------------
>
>          Key: YARN-5759
>          URL: https://issues.apache.org/jira/browse/YARN-5759
>      Project: Hadoop YARN
>   Issue Type: Sub-task
>   Components: resourcemanager
>     Reporter: Gour Saha
>
> There is a need for the YARN native services REST-API service to take
> certain actions once a timeout of an application expires. For example, an
> immediate requirement is to destroy a Slider application once its lifetime
> timeout expires and YARN has stopped the application. Destroying a Slider
> application means cleaning up the Slider HDFS state store and ZK paths for
> that application.
> Potentially, there will be advanced requirements from the REST-API service
> and other services in the future, which will make this feature very handy.
[jira] [Commented] (YARN-5659) getPathFromYarnURL should use standard methods
[ https://issues.apache.org/jira/browse/YARN-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15530407#comment-15530407 ]

Hitesh Shah commented on YARN-5659:
-----------------------------------

[~sershe] Just annotate the functions that are only required by unit tests with @Private and @VisibleForTesting.

> getPathFromYarnURL should use standard methods
> ----------------------------------------------
>
>          Key: YARN-5659
>          URL: https://issues.apache.org/jira/browse/YARN-5659
>      Project: Hadoop YARN
>   Issue Type: Bug
>     Reporter: Sergey Shelukhin
>     Assignee: Sergey Shelukhin
>  Attachments: YARN-5659.01.patch, YARN-5659.02.patch, YARN-5659.03.patch,
>               YARN-5659.04.patch, YARN-5659.04.patch, YARN-5659.patch
>
> getPathFromYarnURL does some string shenanigans where standard ctors should
> suffice.
> There are also bugs in it, e.g. passing an empty scheme to the URI ctor is
> invalid; null should be used.
[jira] [Comment Edited] (YARN-5659) getPathFromYarnURL should use standard methods
[ https://issues.apache.org/jira/browse/YARN-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15530407#comment-15530407 ]

Hitesh Shah edited comment on YARN-5659 at 9/28/16 6:06 PM:
------------------------------------------------------------

[~sershe] Just annotate the functions that are only required by unit tests with @Private and @VisibleForTesting. In this case, the simplest approach would be to use the above annotations for all the new functions that are added as part of this patch.

was (Author: hitesh):
[~sershe] Just annotate the functions that are only required by unit tests with @Private and @VisibleForTesting.
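The suggested annotation pattern looks like the sketch below. In Hadoop the real annotations are `org.apache.hadoop.classification.InterfaceAudience.Private` and Guava's `com.google.common.annotations.VisibleForTesting`; they are declared locally here as stand-ins so the sketch compiles without Hadoop or Guava on the classpath, and the method bodies are invented for illustration, not the actual getPathFromYarnURL code.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Local stand-ins for InterfaceAudience.Private (Hadoop) and
// @VisibleForTesting (Guava), so the sketch is self-contained.
@Retention(RetentionPolicy.SOURCE) @interface Private {}
@Retention(RetentionPolicy.SOURCE) @interface VisibleForTesting {}

class YarnUrlUtil {
    // Public, supported entry point (body invented for illustration).
    static String getPath(String url) {
        int schemeEnd = url.indexOf("://");
        int pathStart = url.indexOf('/', schemeEnd < 0 ? 0 : schemeEnd + 3);
        return pathStart < 0 ? "/" : url.substring(pathStart);
    }

    // Helper needed only by unit tests: annotating it makes the intended
    // audience explicit and keeps it out of the public API surface.
    @Private
    @VisibleForTesting
    static String stripScheme(String url) {
        int i = url.indexOf("://");
        return i < 0 ? url : url.substring(i + 3);
    }
}
```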
[jira] [Updated] (YARN-3877) YarnClientImpl.submitApplication swallows exceptions
[ https://issues.apache.org/jira/browse/YARN-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah updated YARN-3877:
------------------------------
    Target Version/s: 2.7.3  (was: 2.8.0)

> YarnClientImpl.submitApplication swallows exceptions
> ----------------------------------------------------
>
>              Key: YARN-3877
>              URL: https://issues.apache.org/jira/browse/YARN-3877
>          Project: Hadoop YARN
>       Issue Type: Improvement
>       Components: client
> Affects Versions: 2.7.2
>         Reporter: Steve Loughran
>         Assignee: Varun Saxena
>         Priority: Minor
>      Attachments: YARN-3877.01.patch, YARN-3877.02.patch, YARN-3877.03.patch
>
> When {{YarnClientImpl.submitApplication}} spins waiting for the application
> to be accepted, any interruption during its Sleep() calls is logged and
> swallowed.
> This makes it hard to interrupt the thread during shutdown. Really it should
> throw some form of exception and let the caller deal with it.
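The pattern the issue asks for can be sketched as below. The names are invented; this is not the actual YarnClientImpl code. The point is that the accept-poll loop should let InterruptedException propagate from `Thread.sleep` rather than catch, log, and continue, so a caller shutting down the thread can react.

```java
// Sketch of the requested behavior (names invented, not YarnClientImpl):
// the poll loop propagates interruption instead of swallowing it.
class SubmitPoller {
    interface StatusCheck { boolean accepted(); }

    // Returns true once accepted, false after maxAttempts polls; an
    // interrupt during sleep surfaces as InterruptedException.
    static boolean waitUntilAccepted(StatusCheck check, long intervalMs,
                                     int maxAttempts) throws InterruptedException {
        for (int i = 0; i < maxAttempts; i++) {
            if (check.accepted()) {
                return true;
            }
            Thread.sleep(intervalMs); // do NOT catch-and-swallow here
        }
        return false;
    }
}
```

A caller that must not propagate the checked exception can instead catch it, call `Thread.currentThread().interrupt()` to restore the flag, and abort the submission.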
[jira] [Commented] (YARN-5659) getPathFromYarnURL should use standard methods
[ https://issues.apache.org/jira/browse/YARN-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507971#comment-15507971 ]

Hitesh Shah commented on YARN-5659:
-----------------------------------

\cc [~leftnoteasy] [~vvasudev] [~djp]
[jira] [Updated] (YARN-5219) When an export var command fails in launch_container.sh, the full container launch should fail
[ https://issues.apache.org/jira/browse/YARN-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah updated YARN-5219:
------------------------------
    Description:
Today, a container fails if certain files fail to localize. However, if certain env vars fail to get set up properly, either due to bugs in the yarn application or misconfiguration, the actual process launch still gets triggered. This results in either confusing error messages if the process fails to launch or, worse yet, the process launches but then starts behaving wrongly if the env var is used to control some behavioral aspects.

In this scenario, the issue was reproduced by trying to do export abc="$\{foo.bar}", which is invalid as var names cannot contain "." in bash.

  was:
Today, a container fails if certain files fail to localize. However, if certain env vars fail to get set up properly, either due to bugs in the yarn application or misconfiguration, the actual process launch still gets triggered. This results in either confusing error messages if the process fails to launch or, worse yet, the process launches but then starts behaving wrongly if the env var is used to control some behavioral aspects.

In this scenario, the issue was reproduced by trying to do export abc="$\X{foo.bar}", which is invalid as var names cannot contain "." in bash.

> When an export var command fails in launch_container.sh, the full container launch should fail
> ----------------------------------------------------------------------------------------------
>
>          Key: YARN-5219
>          URL: https://issues.apache.org/jira/browse/YARN-5219
>      Project: Hadoop YARN
>   Issue Type: Bug
>     Reporter: Hitesh Shah
>
> Today, a container fails if certain files fail to localize. However, if
> certain env vars fail to get set up properly, either due to bugs in the
> yarn application or misconfiguration, the actual process launch still gets
> triggered. This results in either confusing error messages if the process
> fails to launch or, worse yet, the process launches but then starts
> behaving wrongly if the env var is used to control some behavioral aspects.
> In this scenario, the issue was reproduced by trying to do export
> abc="$\{foo.bar}", which is invalid as var names cannot contain "." in bash.
[jira] [Updated] (YARN-5219) When an export var command fails in launch_container.sh, the full container launch should fail
[ https://issues.apache.org/jira/browse/YARN-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah updated YARN-5219:
------------------------------
    Description:
Today, a container fails if certain files fail to localize. However, if certain env vars fail to get set up properly, either due to bugs in the yarn application or misconfiguration, the actual process launch still gets triggered. This results in either confusing error messages if the process fails to launch or, worse yet, the process launches but then starts behaving wrongly if the env var is used to control some behavioral aspects.

In this scenario, the issue was reproduced by trying to do export abc="$\X{foo.bar}", which is invalid as var names cannot contain "." in bash.

  was:
Today, a container fails if certain files fail to localize. However, if certain env vars fail to get set up properly, either due to bugs in the yarn application or misconfiguration, the actual process launch still gets triggered. This results in either confusing error messages if the process fails to launch or, worse yet, the process launches but then starts behaving wrongly if the env var is used to control some behavioral aspects.

In this scenario, the issue was reproduced by trying to do export abc="${foo.bar}", which is invalid as var names cannot contain "." in bash.
[jira] [Created] (YARN-5219) When an export var command fails in launch_container.sh, the full container launch should fail
Hitesh Shah created YARN-5219:
---------------------------------

             Summary: When an export var command fails in launch_container.sh, the full container launch should fail
                 Key: YARN-5219
                 URL: https://issues.apache.org/jira/browse/YARN-5219
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Hitesh Shah

Today, a container fails if certain files fail to localize. However, if certain env vars fail to get set up properly, either due to bugs in the yarn application or misconfiguration, the actual process launch still gets triggered. This results in either confusing error messages if the process fails to launch or, worse yet, the process launches but then starts behaving wrongly if the env var is used to control some behavioral aspects.

In this scenario, the issue was reproduced by trying to do export abc="${foo.bar}", which is invalid as var names cannot contain "." in bash.
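The failure mode described above is easy to reproduce outside YARN; this is an illustrative one-liner, not an excerpt from an actual launch_container.sh. Because "." is not valid in a bash variable name, the expansion `${foo.bar}` is a fatal "bad substitution" error, the export never takes effect, and the shell exits non-zero before reaching the next command:

```shell
# "." is invalid in a bash variable name, so ${foo.bar} is a fatal
# "bad substitution" error; the command after the export never runs.
bash -c 'export abc="${foo.bar}"; echo "launching container process"' \
    || echo "export failed with non-zero exit code"
```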
[jira] [Comment Edited] (YARN-5131) Distributed shell AM fails because of InterruptedException
[ https://issues.apache.org/jira/browse/YARN-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297174#comment-15297174 ]

Hitesh Shah edited comment on YARN-5131 at 5/23/16 10:04 PM:
-------------------------------------------------------------

The error in the description is not really an error. The thread was interrupted, and it does not match the title's reference to an NPE.

was (Author: hitesh):
The error in the description is not really an error. The thread was interrupted.

> Distributed shell AM fails because of InterruptedException
> -----------------------------------------------------------
>
>          Key: YARN-5131
>          URL: https://issues.apache.org/jira/browse/YARN-5131
>      Project: Hadoop YARN
>   Issue Type: Bug
>     Reporter: Sumana Sathish
>     Assignee: Wangda Tan
>
> DShell AM fails with the following exception
> {code}
> INFO impl.AMRMClientAsyncImpl: Interrupted while waiting for queue
> java.lang.InterruptedException
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2052)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>         at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:287)
> End of LogType:AppMaster.stderr
> {code}
[jira] [Commented] (YARN-5131) Distributed shell AM fails because of InterruptedException
[ https://issues.apache.org/jira/browse/YARN-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297174#comment-15297174 ]

Hitesh Shah commented on YARN-5131:
-----------------------------------

The error in the description is not really an error. The thread was interrupted.
[jira] [Commented] (YARN-1151) Ability to configure auxiliary services from HDFS-based JAR files
[ https://issues.apache.org/jira/browse/YARN-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287966#comment-15287966 ]

Hitesh Shah commented on YARN-1151:
-----------------------------------

For paths, the impl should do auto-resolving, i.e. use the default fs if no fs is specified, and support both file:// and hdfs://, which is likely to happen especially if we have a mix of jars coming in via rpms vs the hdfs-based cache.

> Ability to configure auxiliary services from HDFS-based JAR files
> ------------------------------------------------------------------
>
>              Key: YARN-1151
>              URL: https://issues.apache.org/jira/browse/YARN-1151
>          Project: Hadoop YARN
>       Issue Type: Improvement
>       Components: nodemanager
> Affects Versions: 2.1.0-beta, 2.9.0
>         Reporter: john lilley
>         Assignee: Xuan Gong
>           Labels: auxiliary-service, yarn
>      Attachments: YARN-1151.1.patch
>
> I would like to install an auxiliary service in Hadoop YARN without actually
> installing files/services on every node in the system. Discussions on the
> user@ list indicate that this is not easily done. The reason we want an
> auxiliary service is that our application has some persistent-data
> components that are not appropriate for HDFS. In fact, they are somewhat
> analogous to the mapper output of MapReduce's shuffle, which is what led me
> to auxiliary-services in the first place. It would be much easier if we
> could just place our service's JARs in HDFS.
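The auto-resolving suggested in the comment above can be sketched in pure Java with `java.net.URI` (class and method names are invented for illustration; a real implementation inside Hadoop would instead qualify the path via the `FileSystem`/`Path` APIs, e.g. `Path#getFileSystem(Configuration)`): a path with an explicit `file://` or `hdfs://` scheme passes through unchanged, while a scheme-less path is qualified against the cluster's default filesystem.

```java
import java.net.URI;

// Pure-Java sketch (not Hadoop's FileSystem/Path): resolve a configured
// aux-service jar path against the default filesystem when no scheme is
// given, and pass explicit file:// or hdfs:// URIs through unchanged.
class AuxJarPathResolver {
    static String resolve(String configured, String defaultFsUri) {
        URI uri = URI.create(configured);
        if (uri.getScheme() != null) {
            return configured; // explicit scheme: use as-is
        }
        // Scheme-less: qualify against the default filesystem URI.
        String base = defaultFsUri.endsWith("/")
                ? defaultFsUri.substring(0, defaultFsUri.length() - 1)
                : defaultFsUri;
        String path = configured.startsWith("/") ? configured : "/" + configured;
        return base + path;
    }
}
```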
[jira] [Commented] (YARN-1151) Ability to configure auxiliary services from HDFS-based JAR files
[ https://issues.apache.org/jira/browse/YARN-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287963#comment-15287963 ]

Hitesh Shah commented on YARN-1151:
-----------------------------------

How can one specify an archive? Or would only simple files be supported? Additionally, would a fat jar (jar-with-dependencies) work out of the box?
[jira] [Commented] (YARN-5079) [Umbrella] Native YARN framework layer for services and beyond
[ https://issues.apache.org/jira/browse/YARN-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15282902#comment-15282902 ]

Hitesh Shah commented on YARN-5079:
-----------------------------------

It might be better to have this discussion on the mailing lists instead of JIRA.

> [Umbrella] Native YARN framework layer for services and beyond
> --------------------------------------------------------------
>
>          Key: YARN-5079
>          URL: https://issues.apache.org/jira/browse/YARN-5079
>      Project: Hadoop YARN
>   Issue Type: New Feature
>     Reporter: Vinod Kumar Vavilapalli
>     Assignee: Vinod Kumar Vavilapalli
>
> (See overview doc at YARN-4692, modifying and copy-pasting some of the
> relevant pieces and sub-section 3.3.1 to track the specific sub-item.)
> (This is a companion to YARN-4793 in our effort to simplify the entire
> story, but focusing on APIs.)
> So far, YARN by design has restricted itself to having a very low-level API
> that can support any type of application. Frameworks like Apache Hadoop
> MapReduce, Apache Tez, Apache Spark, Apache REEF, Apache Twill, Apache Helix
> and others ended up exposing higher level APIs that end-users can directly
> leverage to build their applications on top of YARN. On the services side,
> Apache Slider has done something similar.
> With our current attention on making services first-class and simplified,
> it's time to take a fresh look at how we can make Apache Hadoop YARN support
> services well out of the box. Beyond the functionality that I outlined in
> the previous sections in the doc on how NodeManagers can be enhanced to help
> services, the biggest missing piece is the framework itself. There is a lot
> of very important functionality that a services framework can own together
> with YARN in executing services end-to-end.
> In this JIRA I propose we look at having a native Apache Hadoop framework
> for running services natively on YARN.
[jira] [Commented] (YARN-2506) TimelineClient should NOT be in yarn-common project
[ https://issues.apache.org/jira/browse/YARN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280992#comment-15280992 ] Hitesh Shah commented on YARN-2506: --- Why not do this in trunk? > TimelineClient should NOT be in yarn-common project > --- > > Key: YARN-2506 > URL: https://issues.apache.org/jira/browse/YARN-2506 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen >Priority: Critical > > YARN-2298 incorrectly moved TimelineClient to yarn-common project. It doesn't > belong there, we should move it back to yarn-client module. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-2506) TimelineClient should NOT be in yarn-common project
[ https://issues.apache.org/jira/browse/YARN-2506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280994#comment-15280994 ] Hitesh Shah commented on YARN-2506: --- Seems like something useful as part of 3.x given that the client library is meant to be part of yarn-client-api > TimelineClient should NOT be in yarn-common project > --- > > Key: YARN-2506 > URL: https://issues.apache.org/jira/browse/YARN-2506 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Zhijie Shen >Priority: Critical > > YARN-2298 incorrectly moved TimelineClient to yarn-common project. It doesn't > belong there, we should move it back to yarn-client module. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-5068) The AM does not know the queue from which it is launched.
[ https://issues.apache.org/jira/browse/YARN-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279479#comment-15279479 ] Hitesh Shah commented on YARN-5068: --- The main use-case is this: - tez runs multiple dags in a single yarn application - for each dag, we publish data to yarn timeline and support searching for these dags based on the data published. - Timeline does not support doing searches which require a join between app-specific data and data published by the yarn framework. - Even though AHS has queue data, a single webservice call to ATS cannot be used to retrieve dags that were submitted to a particular queue. - To do this, we need to access the queue information in the AM and publish it along with the dag data as app-specific data and then use it for filtering. > The AM does not know the queue from which it is launched. > - > > Key: YARN-5068 > URL: https://issues.apache.org/jira/browse/YARN-5068 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Harish Jaiprakash >Assignee: Harish Jaiprakash > Attachments: MAPREDUCE-6692.patch > > > The AM needs to know the queue name in which it was launched. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4851) Metric improvements for ATS v1.5 storage components
[ https://issues.apache.org/jira/browse/YARN-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15263118#comment-15263118 ] Hitesh Shah commented on YARN-4851: --- New entries look fine. And agree that the others can be done in follow-up jiras. > Metric improvements for ATS v1.5 storage components > --- > > Key: YARN-4851 > URL: https://issues.apache.org/jira/browse/YARN-4851 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4851-trunk.001.patch, YARN-4851-trunk.002.patch, > YARN-4851-trunk.003.patch > > > We can add more metrics to the ATS v1.5 storage systems, including purging, > cache hit/misses, read latency, etc.
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261187#comment-15261187 ] Hitesh Shah commented on YARN-4844: --- bq. Per my understanding, changing from int to long won't affect downstream project a lot, it's an error which can be captured by compiler directly. And getMemory/getVCores should not be used intensively by downstream project. For example, MR uses only ~20 times of getMemory()/VCores for non-testing code. Which can be easily fixed. If you are going to force downstream apps to change, I don't understand why you are not forcing them to do this in the first 3.0.0 release? What benefit does this provide anyone by delaying it to some later 3.x.y release? It just means that you have to do the production stability verification of upstream apps all over again. > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch, YARN-4844.2.patch, YARN-4844.3.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow.
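The overflow described in the quoted report is easy to verify with plain Java arithmetic: 10k nodes at 210 GB each, with memory tracked in MB as an int32, already exceeds Integer.MAX_VALUE (2,147,483,647). A minimal standalone sketch with illustrative numbers only — this is not Hadoop code:

```java
public class ResourceOverflowDemo {
    public static void main(String[] args) {
        int nodes = 10_000;
        int memPerNodeMb = 210 * 1024;       // 210 GB per node, expressed in MB

        // Summing cluster memory with int32, as the current Resource fields do,
        // wraps around into a negative value.
        int totalInt = nodes * memPerNodeMb;
        // Widening one operand to long before multiplying avoids the wrap,
        // which is what moving the field to int64 buys.
        long totalLong = (long) nodes * memPerNodeMb;

        System.out.println(totalInt);        // -2144567296 (negative!)
        System.out.println(totalLong);       // 2150400000
    }
}
```

The aggregated-pending-resources case in the report fails the same way: any intermediate int32 sum over enough containers can wrap, regardless of per-app caps.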
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15254970#comment-15254970 ] Hitesh Shah commented on YARN-4844: --- Additionally we are not talking about use in production but rather making upstream apps change as needed to work with 3.x and over time stabilize 3.x. Making an API change earlier rather than later is actually better as the API changes in this case have no relevance to production stability. > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch, YARN-4844.2.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15254967#comment-15254967 ] Hitesh Shah commented on YARN-4844: --- bq. considering there are hundreds of blockers and criticals of 3.0.0 release, nobody will actually use the new release in production even if 3.0-alpha can be released. We can mark Resource API of trunk to be unstable and update it in future 3.x releases. So the plan is to force users to change their usage of these APIs in some version of 3.x but not in 3.0.0 ? > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch, YARN-4844.2.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4851) Metric improvements for ATS v1.5 storage components
[ https://issues.apache.org/jira/browse/YARN-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15254750#comment-15254750 ] Hitesh Shah commented on YARN-4851: --- Some general comments on usability ( have not reviewed the patch in detail) - names need a bit of work e.g. SummaryDataReadTimeNumOps and SummaryDataReadTimeAvgTime - not sure why NumOps has a relation to ReadTime and time in ReadTimeAvgTime seems redundant. - would be good to have the scale in there i.e. is time in millis or seconds? - updates to the timeline server docs for these metrics seem missing. - what is the difference between CacheRefreshTimeNumOps and CacheRefreshOps ? - Likewise for LogCleanTimeNumOps vs LogsDirsCleaned or PutDomainTimeNumOps vs PutDomainOps - cache eviction rates needed? - how do we get a count of how many cache refreshes were due to stale data vs never cached/evicted earlier? do we need this? - should there be 2 levels of metrics - one group enabled by default and a second group for more detailed monitoring to reduce load on the metrics system? - would be good to understand the request count at the ATSv1.5 level itself to understand which calls end up going to summary vs cache vs fs-based lookups ( i.e. across all gets ). - at the overall ATS level, an overall avg latency across all reqs might be useful for a general health check > Metric improvements for ATS v1.5 storage components > --- > > Key: YARN-4851 > URL: https://issues.apache.org/jira/browse/YARN-4851 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4851-trunk.001.patch, YARN-4851-trunk.002.patch > > > We can add more metrics to the ATS v1.5 storage systems, including purging, > cache hit/misses, read latency, etc.
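For context on the naming question above: the `<name>NumOps` / `<name>AvgTime` pairing is the convention Hadoop's metrics2 `MutableRate` uses, which emits an operation counter and an average-time gauge for each rate metric (the unit is whatever the caller samples, typically milliseconds — which is why the scale is not obvious from the name alone). A dependency-free sketch of that convention; this is an illustration, not the actual metrics2 implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RateMetricDemo {
    private long numOps;
    private double totalMillis;

    // Record one operation and its duration, as MutableRate.add() would.
    void add(double millis) {
        numOps++;
        totalMillis += millis;
    }

    // Emit the conventional metric pair for a given base name.
    Map<String, Number> snapshot(String name) {
        Map<String, Number> out = new LinkedHashMap<>();
        out.put(name + "NumOps", numOps);
        out.put(name + "AvgTime", numOps == 0 ? 0.0 : totalMillis / numOps);
        return out;
    }

    public static void main(String[] args) {
        RateMetricDemo reads = new RateMetricDemo();
        reads.add(10.0);
        reads.add(30.0);
        System.out.println(reads.snapshot("SummaryDataReadTime"));
        // {SummaryDataReadTimeNumOps=2, SummaryDataReadTimeAvgTime=20.0}
    }
}
```

Under this convention, a name like SummaryDataReadTimeNumOps reads as "number of operations recorded against the SummaryDataReadTime rate", which explains the NumOps/ReadTime coupling questioned above, though it does not answer the unit or the NumOps-vs-Ops duplication questions.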
[jira] [Created] (YARN-4990) Re-direction of a particular log file within a container in NM UI does not redirect properly to Log Server ( history ) on container completion
Hitesh Shah created YARN-4990: - Summary: Re-direction of a particular log file within a container in NM UI does not redirect properly to Log Server ( history ) on container completion Key: YARN-4990 URL: https://issues.apache.org/jira/browse/YARN-4990 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah The NM does the redirection to the history server correctly. However if the user is viewing or has a link to a particular file, the redirect ends up going to the top level page for the container and not redirecting to the specific file. Additionally, the start param to show logs from offset 0 also goes missing.
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253084#comment-15253084 ] Hitesh Shah commented on YARN-4844: --- bq. It is not a very hard thing to drop it, we'd better to do it close to first branch-3 release. I believe a recent comment on the mailing list was trying to target a 3.0 release within the next few weeks so I guess that means we make this change now? > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch, YARN-4844.2.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15253028#comment-15253028 ] Hitesh Shah commented on YARN-4844: --- getMemoryLong(), etc just seems messy. I can understand why this is needed on branch-2 if we need to support long but for trunk, it seems better to change getMemory() to return a long. > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-868) YarnClient should set the service address in tokens returned by getRMDelegationToken()
[ https://issues.apache.org/jira/browse/YARN-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-868: - Attachment: (was: YARN-868.0003.patch) > YarnClient should set the service address in tokens returned by > getRMDelegationToken() > -- > > Key: YARN-868 > URL: https://issues.apache.org/jira/browse/YARN-868 > Project: Hadoop YARN > Issue Type: Bug > Components: client, resourcemanager >Affects Versions: 2.7.0 >Reporter: Hitesh Shah >Assignee: Varun Saxena > Labels: BB2015-05-RFC > Attachments: YARN-868.02.patch, YARN-868.patch > > > Either the client should set this information into the token or the client > layer should expose an api that returns the service address. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-868) YarnClient should set the service address in tokens returned by getRMDelegationToken()
[ https://issues.apache.org/jira/browse/YARN-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186366#comment-15186366 ] Hitesh Shah commented on YARN-868: -- Removed patch 3 for now - discovered basic errors/wrong assumptions on the serviceAddr/renewer > YarnClient should set the service address in tokens returned by > getRMDelegationToken() > -- > > Key: YARN-868 > URL: https://issues.apache.org/jira/browse/YARN-868 > Project: Hadoop YARN > Issue Type: Bug > Components: client, resourcemanager >Affects Versions: 2.7.0 >Reporter: Hitesh Shah >Assignee: Varun Saxena > Labels: BB2015-05-RFC > Attachments: YARN-868.02.patch, YARN-868.patch > > > Either the client should set this information into the token or the client > layer should expose an api that returns the service address. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-868) YarnClient should set the service address in tokens returned by getRMDelegationToken()
[ https://issues.apache.org/jira/browse/YARN-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185685#comment-15185685 ] Hitesh Shah commented on YARN-868: -- Sorry - mixed contents on the previous comments. There are 2 issues: - the service address should be set in tokens handed back to user-code by the framework itself - the user-code should not need to pass in the renewer string when requesting a token patch 2 seems to try address the former and patch 3 is my attempt to fix the latter. \cc [~vvasudev] [~vinodkv] for comments > YarnClient should set the service address in tokens returned by > getRMDelegationToken() > -- > > Key: YARN-868 > URL: https://issues.apache.org/jira/browse/YARN-868 > Project: Hadoop YARN > Issue Type: Bug > Components: client, resourcemanager >Affects Versions: 2.7.0 >Reporter: Hitesh Shah >Assignee: Varun Saxena > Labels: BB2015-05-RFC > Attachments: YARN-868.0003.patch, YARN-868.02.patch, YARN-868.patch > > > Either the client should set this information into the token or the client > layer should expose an api that returns the service address. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-868) YarnClient should set the service address in tokens returned by getRMDelegationToken()
[ https://issues.apache.org/jira/browse/YARN-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15185618#comment-15185618 ] Hitesh Shah commented on YARN-868: -- [~varun_saxena] to add to your previous comment, yes and yes. I also made the ClientRMProxy api public given that it is currently unstable. > YarnClient should set the service address in tokens returned by > getRMDelegationToken() > -- > > Key: YARN-868 > URL: https://issues.apache.org/jira/browse/YARN-868 > Project: Hadoop YARN > Issue Type: Bug > Components: client, resourcemanager >Affects Versions: 2.7.0 >Reporter: Hitesh Shah >Assignee: Varun Saxena > Labels: BB2015-05-RFC > Attachments: YARN-868.0003.patch, YARN-868.02.patch, YARN-868.patch > > > Either the client should set this information into the token or the client > layer should expose an api that returns the service address. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-868) YarnClient should set the service address in tokens returned by getRMDelegationToken()
[ https://issues.apache.org/jira/browse/YARN-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-868: - Attachment: YARN-868.0003.patch [~varun_saxena] Sorry - this completely dropped off my radar. Looking at the patch and the current code, I tried to take a different approach and came up with a patch. [~vinodkv] Mind taking a look at patch 02 and 03 and giving your comments. > YarnClient should set the service address in tokens returned by > getRMDelegationToken() > -- > > Key: YARN-868 > URL: https://issues.apache.org/jira/browse/YARN-868 > Project: Hadoop YARN > Issue Type: Bug > Components: client, resourcemanager >Affects Versions: 2.7.0 >Reporter: Hitesh Shah >Assignee: Varun Saxena > Labels: BB2015-05-RFC > Attachments: YARN-868.0003.patch, YARN-868.02.patch, YARN-868.patch > > > Either the client should set this information into the token or the client > layer should expose an api that returns the service address. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305
[ https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046055#comment-15046055 ] Hitesh Shah commented on YARN-3996: --- Why should a minimum resource of 0 be ever supported? > YARN-789 (Support for zero capabilities in fairscheduler) is broken after > YARN-3305 > --- > > Key: YARN-3996 > URL: https://issues.apache.org/jira/browse/YARN-3996 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler, fairscheduler >Reporter: Anubhav Dhoot >Assignee: Neelesh Srinivas Salian >Priority: Critical > Attachments: YARN-3996.001.patch, YARN-3996.002.patch, > YARN-3996.003.patch, YARN-3996.prelim.patch > > > RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest > with mininumResource for the incrementResource. This causes normalize to > return zero if minimum is set to zero as per YARN-789 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2513) Host framework UIs in YARN for use with the ATS
[ https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986731#comment-14986731 ] Hitesh Shah commented on YARN-2513: --- [~yeshavora] and I are trying to use this on a secure cluster. I can kinit and make a curl call to "/ws/v1/timeline/TEZ_DAG_ID?limit=1" and it works correctly but when trying to make a curl call to the hosted UI, it fails. I see that KerberosAuthenticationHandler.java:init(214) is being invoked twice. The error being thrown is: {code} 2015-11-03 05:01:45,864 WARN server.AuthenticationFilter (AuthenticationFilter.java:doFilter(551)) - Authentication exception: GSSException: Failure unspecified at GSS-API level (Mechanism level: Request is a replay (34)) org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: Failure unspecified at GSS-API level (Mechanism level: Request is a replay (34)) at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:399) at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationHandler.authenticate(DelegationTokenAuthenticationHandler.java:347) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:507) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1225) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) {code} [~jeagles] [~vinodkv] [~steve_l] Any suggestions? > Host framework UIs in YARN for use with the ATS > --- > > Key: YARN-2513 > URL: https://issues.apache.org/jira/browse/YARN-2513 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Fix For: 3.0.0, 2.8.0, 2.7.2 > > Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, > YARN-2513.v3.patch, YARN-2513.v4.patch, YARN-2513.v5.patch > > > Allow for pluggable UIs as described by TEZ-8. Yarn can provide the > infrastructure to host java script and possible java UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2513) Host framework UIs in YARN for use with the ATS
[ https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986732#comment-14986732 ] Hitesh Shah commented on YARN-2513: --- The below patch seems to work but I am not sure what else I may be breaking: {code} --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/Applicatio +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/Applicatio @@ -305,16 +305,16 @@ private void startWebApp() { WebAppContext uiWebAppContext = new WebAppContext(); uiWebAppContext.setContextPath(webPath); uiWebAppContext.setWar(onDiskPath); - final String[] ALL_URLS = { "/*" }; - FilterHolder[] filterHolders = - webAppContext.getServletHandler().getFilters(); - for (FilterHolder filterHolder: filterHolders) { - if (!"guice".equals(filterHolder.getName())) { - HttpServer2.defineFilter(uiWebAppContext, filterHolder.getName(), - filterHolder.getClassName(), filterHolder.getInitParameters(), - ALL_URLS); - } - } + //final String[] ALL_URLS = { "/*" }; + //FilterHolder[] filterHolders = + // webAppContext.getServletHandler().getFilters(); + //for (FilterHolder filterHolder: filterHolders) { + // if (!"guice".equals(filterHolder.getName())) { + //HttpServer2.defineFilter(uiWebAppContext, filterHolder.getName(), + //filterHolder.getClassName(), filterHolder.getInitParameters(), + //ALL_URLS); + // } + //} {code} > Host framework UIs in YARN for use with the ATS > --- > > Key: YARN-2513 > URL: https://issues.apache.org/jira/browse/YARN-2513 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Fix For: 3.0.0, 2.8.0, 2.7.2 > > Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, > YARN-2513.v3.patch, YARN-2513.v4.patch, 
YARN-2513.v5.patch > > > Allow for pluggable UIs as described by TEZ-8. Yarn can provide the > infrastructure to host java script and possible java UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4323) AMRMClient does not respect SchedulerResourceTypes post YARN-2448
[ https://issues.apache.org/jira/browse/YARN-4323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-4323: -- Affects Version/s: 2.6.0 > AMRMClient does not respect SchedulerResourceTypes post YARN-2448 > - > > Key: YARN-4323 > URL: https://issues.apache.org/jira/browse/YARN-4323 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Hitesh Shah > > Given that the RM now informs the AM of the resources it supports, AMRMClient > should be changed to match correctly by normalizing the invalid resource > types. > i.e. AMRMClient::getMatchingRequests() should correctly return back matches > by only looking at the resource types that are valid. > \cc [~vvasudev] [~bikassaha] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4323) AMRMClient does not respect SchedulerResourceTypes post YARN-2448
Hitesh Shah created YARN-4323: - Summary: AMRMClient does not respect SchedulerResourceTypes post YARN-2448 Key: YARN-4323 URL: https://issues.apache.org/jira/browse/YARN-4323 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Given that the RM now informs the AM of the resources it supports, AMRMClient should be changed to match correctly by normalizing the invalid resource types. i.e. AMRMClient::getMatchingRequests() should correctly return back matches by only looking at the resource types that are valid. \cc [~vvasudev] [~bikassaha] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2513) Host framework UIs in YARN for use with the ATS
[ https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967596#comment-14967596 ] Hitesh Shah commented on YARN-2513: --- Tested the latest patch with multiple UIs being hosted. Works fine now. > Host framework UIs in YARN for use with the ATS > --- > > Key: YARN-2513 > URL: https://issues.apache.org/jira/browse/YARN-2513 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, > YARN-2513.v3.patch, YARN-2513.v4.patch, YARN-2513.v5.patch > > > Allow for pluggable UIs as described by TEZ-8. Yarn can provide the > infrastructure to host java script and possible java UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2513) Host framework UIs in YARN for use with the ATS
[ https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967598#comment-14967598 ] Hitesh Shah commented on YARN-2513: --- +1 > Host framework UIs in YARN for use with the ATS > --- > > Key: YARN-2513 > URL: https://issues.apache.org/jira/browse/YARN-2513 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, > YARN-2513.v3.patch, YARN-2513.v4.patch, YARN-2513.v5.patch > > > Allow for pluggable UIs as described by TEZ-8. Yarn can provide the > infrastructure to host java script and possible java UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API
[ https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941935#comment-14941935 ] Hitesh Shah commented on YARN-4009: --- [~jeagles] Did you get a chance to look at the latest patch? > CORS support for ResourceManager REST API > - > > Key: YARN-4009 > URL: https://issues.apache.org/jira/browse/YARN-4009 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Varun Vasudev > Attachments: YARN-4009.001.patch, YARN-4009.002.patch, > YARN-4009.003.patch, YARN-4009.004.patch, YARN-4009.005.patch > > > Currently the REST API's do not have CORS support. This means any UI (running > in browser) cannot consume the REST API's. For ex Tez UI would like to use > the REST API for getting application, application attempt information exposed > by the API's. > It would be very useful if CORS is enabled for the REST API's. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API
[ https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901169#comment-14901169 ] Hitesh Shah commented on YARN-4009: --- Thinking more on this, a global config might be something that is okay to start with ( we already have a huge proliferation of configs which users do not set ). If there are concerns raised down the line, it should likely be easy enough to add yarn and hdfs specific configs which would override the global one in a compatible manner? [~jeagles] comments? > CORS support for ResourceManager REST API > - > > Key: YARN-4009 > URL: https://issues.apache.org/jira/browse/YARN-4009 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Varun Vasudev > Attachments: YARN-4009.001.patch, YARN-4009.002.patch, > YARN-4009.003.patch, YARN-4009.004.patch > > > Currently the REST API's do not have CORS support. This means any UI (running > in browser) cannot consume the REST API's. For ex Tez UI would like to use > the REST API for getting application, application attempt information exposed > by the API's. > It would be very useful if CORS is enabled for the REST API's. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API
[ https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791225#comment-14791225 ] Hitesh Shah commented on YARN-4009: --- A couple of questions:
{code}
if (!initializers.contains(CrossOriginFilterInitializer.class.getName())) {
  if (conf.getBoolean(
      YarnConfiguration.TIMELINE_SERVICE_HTTP_CROSS_ORIGIN_ENABLED,
      YarnConfiguration.TIMELINE_SERVICE_HTTP_CROSS_ORIGIN_ENABLED_DEFAULT)) {
    initializers =
        CrossOriginFilterInitializer.class.getName() + "," + initializers;
    modifiedInitializers = true;
  }
}
{code}
I see this code in Timeline, which makes it easier to enable cross-origin support just for Timeline. I am assuming Timeline also looks at the hadoop filters defined in core-site? What happens when both of these are enabled at the same time with different settings? There may also be a question of selectively enabling CORS support for different services, such as NN web services vs. RM web services. Apart from the above, if a global config is good enough, the patch looks good. > CORS support for ResourceManager REST API > - > > Key: YARN-4009 > URL: https://issues.apache.org/jira/browse/YARN-4009 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Varun Vasudev > Attachments: YARN-4009.001.patch, YARN-4009.002.patch, > YARN-4009.003.patch, YARN-4009.004.patch > > > Currently the REST APIs do not have CORS support. This means any UI (running > in a browser) cannot consume the REST APIs. For example, the Tez UI would like to use > the REST API to get the application and application-attempt information exposed > by the APIs. > It would be very useful if CORS were enabled for the REST APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
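[Editorial note] Stripped of the Hadoop Configuration/YarnConfiguration types, the quoted snippet's logic reduces to prepending a filter class to a comma-separated initializer list only when the feature flag is on and the class is not already present. A self-contained sketch of just that logic:

```java
// Sketch of the prepend-if-enabled logic from the quoted snippet, with the
// Hadoop types replaced by plain parameters for illustration.
public class FilterInitializers {
    static String maybePrepend(String initializers, String filterClass,
                               boolean corsEnabled) {
        // Only prepend when the filter is absent AND the flag is set,
        // so a filter already registered via core-site is not added twice.
        if (!initializers.contains(filterClass) && corsEnabled) {
            return filterClass + "," + initializers;
        }
        return initializers;
    }
}
```

The `contains` check is what avoids duplicate registration when both the Timeline-specific flag and a global core-site filter list are enabled; the open question in the comment is what happens when the two sources carry different settings for the same filter.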
[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API
[ https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791228#comment-14791228 ] Hitesh Shah commented on YARN-4009: --- [~jeagles] Any comments on the patch? > CORS support for ResourceManager REST API > - > > Key: YARN-4009 > URL: https://issues.apache.org/jira/browse/YARN-4009 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Varun Vasudev > Attachments: YARN-4009.001.patch, YARN-4009.002.patch, > YARN-4009.003.patch, YARN-4009.004.patch > > > Currently the REST APIs do not have CORS support. This means any UI (running > in a browser) cannot consume the REST APIs. For example, the Tez UI would like to use > the REST API to get the application and application-attempt information exposed > by the APIs. > It would be very useful if CORS were enabled for the REST APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4009) CORS support for ResourceManager REST API
[ https://issues.apache.org/jira/browse/YARN-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791226#comment-14791226 ] Hitesh Shah commented on YARN-4009: --- [~jeagles] ? > CORS support for ResourceManager REST API > - > > Key: YARN-4009 > URL: https://issues.apache.org/jira/browse/YARN-4009 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Prakash Ramachandran >Assignee: Varun Vasudev > Attachments: YARN-4009.001.patch, YARN-4009.002.patch, > YARN-4009.003.patch, YARN-4009.004.patch > > > Currently the REST APIs do not have CORS support. This means any UI (running > in a browser) cannot consume the REST APIs. For example, the Tez UI would like to use > the REST API to get the application and application-attempt information exposed > by the APIs. > It would be very useful if CORS were enabled for the REST APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS
[ https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742159#comment-14742159 ] Hitesh Shah commented on YARN-3942: --- Thanks [~gss2002]. One point to note: if you use long-running Hive sessions, this will cause an OOM in the timeline server, as the data is cached on a per-"session" basis. I am not sure if there is another simple way to disable Hive session re-use in HiveServer. \cc [~vikram.dixit] > Timeline store to read events from HDFS > --- > > Key: YARN-3942 > URL: https://issues.apache.org/jira/browse/YARN-3942 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-3942.001.patch > > > This adds a new timeline store plugin that is intended as a stop-gap measure > to mitigate some of the issues we've seen with ATS v1 while waiting for ATS > v2. The intent of this plugin is to provide a workable solution for running > the Tez UI against the timeline server on large-scale clusters running many > thousands of jobs per day. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS
[ https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729815#comment-14729815 ] Hitesh Shah commented on YARN-3942: --- Some ideas from an offline discussion with [~bikassaha] and [~vinodkv]: - option 1) Could we just use leveldb as an LRU cache instead of a memory-based cache to handle the OOM issue? - option 2) Could we just take the data from HDFS, write it out to leveldb, and use leveldb to serve the data? This would address the OOM issue too. \cc [~jlowe] [~jeagles] > Timeline store to read events from HDFS > --- > > Key: YARN-3942 > URL: https://issues.apache.org/jira/browse/YARN-3942 > Project: Hadoop YARN > Issue Type: Improvement > Components: timelineserver >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: YARN-3942.001.patch > > > This adds a new timeline store plugin that is intended as a stop-gap measure > to mitigate some of the issues we've seen with ATS v1 while waiting for ATS > v2. The intent of this plugin is to provide a workable solution for running > the Tez UI against the timeline server on large-scale clusters running many > thousands of jobs per day. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
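[Editorial note] The eviction behavior behind option 1 — bound the in-memory footprint and drop (or spill) the least-recently-used entry on overflow — can be sketched with a `LinkedHashMap` in access order. This is a toy illustration, not the actual timeline store code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy LRU cache: once capacity is exceeded, the least-recently-accessed
// entry is evicted. A leveldb-backed variant, as suggested in option 1,
// would spill evicted entries to disk instead of dropping them.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true);  // accessOrder = true gives LRU iteration order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;  // called after each put/putAll
    }
}
```

Either option bounds memory the same way; they differ in whether evicted data is recomputed from HDFS on a miss (option 1) or served directly out of leveldb (option 2).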
[jira] [Commented] (YARN-3942) Timeline store to read events from HDFS
[ https://issues.apache.org/jira/browse/YARN-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720574#comment-14720574 ] Hitesh Shah commented on YARN-3942: --- [~jlowe] [~rajesh.balamohan] observed that the timeline server was running out of memory in a certain scenario. In this scenario, we are using Hive-on-Tez, but Hive re-uses the application to run hundreds of DAGs/queries (doAs=false with perimeter security using, say, Ranger or Sentry). The EntityFileStore sizes its cache based on the number of applications it can cache, but in the above scenario even a single app could be very large. Ideally, if each DAG were in a separate file and all of its entries were treated as a single cache entity, that would probably work better, but making this generic enough may be a bit tricky. Any suggestions here? Timeline store to read events from HDFS --- Key: YARN-3942 URL: https://issues.apache.org/jira/browse/YARN-3942 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-3942.001.patch This adds a new timeline store plugin that is intended as a stop-gap measure to mitigate some of the issues we've seen with ATS v1 while waiting for ATS v2. The intent of this plugin is to provide a workable solution for running the Tez UI against the timeline server on large-scale clusters running many thousands of jobs per day. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4085) Generate file with container resource limits in the container work dir
[ https://issues.apache.org/jira/browse/YARN-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717524#comment-14717524 ] Hitesh Shah commented on YARN-4085: --- Should the values be set in the environment rather than in a file? If a file, should it be a properties file with all useful information written into it, not just the resource size info? Generate file with container resource limits in the container work dir -- Key: YARN-4085 URL: https://issues.apache.org/jira/browse/YARN-4085 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Priority: Minor Currently, a container doesn't know what resource limits are being imposed on it. It would be helpful if the NM generated a simple file in the container work dir with the resource limits specified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
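[Editorial note] If the file route were taken, a `java.util.Properties` file is one natural shape for the suggestion above. A sketch only — the key names below are made up, not an agreed-upon format:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch: render a properties file carrying a container's resource limits.
// The key names are hypothetical illustrations, not a proposed standard.
public class ContainerLimitsFile {
    static String render(long memoryMb, int vcores) {
        Properties props = new Properties();
        props.setProperty("container.resource.memory-mb", Long.toString(memoryMb));
        props.setProperty("container.resource.vcores", Integer.toString(vcores));
        StringWriter out = new StringWriter();
        try {
            props.store(out, "container resource limits");
        } catch (IOException e) {
            // cannot happen when writing to a StringWriter
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

A properties file has the advantage raised in the comment: other useful per-container information (log dirs, ports, etc.) could be added later as extra keys without breaking existing readers.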
[jira] [Commented] (YARN-4087) Set YARN_FAIL_FAST to be false by default
[ https://issues.apache.org/jira/browse/YARN-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717526#comment-14717526 ] Hitesh Shah commented on YARN-4087: --- It would be good to rename the config property to something that provides a bit more clarity on what the config knob is meant to control. Set YARN_FAIL_FAST to be false by default - Key: YARN-4087 URL: https://issues.apache.org/jira/browse/YARN-4087 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Assignee: Jian He Attachments: YARN-4087.1.patch Increasingly, I feel setting this property to false makes more sense, especially in production environments, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3944) Connection refused to nodemanagers are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-3944: -- Labels: 2.6.1-candidate (was: ) Connection refused to nodemanagers are retried at multiple levels - Key: YARN-3944 URL: https://issues.apache.org/jira/browse/YARN-3944 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Labels: 2.6.1-candidate Attachments: YARN-3944.v1.patch This is related to YARN-3238. When the NM is down, the IPC client will get a ConnectException.
Caused by: java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
 at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
 at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
 at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
 at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
 at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
 at org.apache.hadoop.ipc.Client.call(Client.java:1438)
However, retries happen at two layers (the IPC layer retrying 40 times and serverProxy retrying 91 times), which could end up with a ~1 hour retry interval. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
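[Editorial note] The compounding the description refers to is multiplicative: nesting one retry layer inside another multiplies the attempt counts. A sketch of the arithmetic, using a hypothetical one-second per-attempt interval purely for illustration (the JIRA's ~1 hour figure comes from the actual ipc/serverProxy settings):

```java
// Sketch: two independent retry layers multiply their attempt counts.
public class RetryMath {
    static long worstCaseSeconds(int ipcRetries, int proxyRetries,
                                 long perAttemptSeconds) {
        // each of the outer (proxy) attempts pays for a full inner (ipc) cycle
        return (long) ipcRetries * proxyRetries * perAttemptSeconds;
    }

    public static void main(String[] args) {
        // 40 ipc-level attempts nested inside 91 proxy-level attempts:
        System.out.println(worstCaseSeconds(40, 91, 1));  // 3640 seconds, ~1 hour
    }
}
```

With the counts quoted in the description, 40 × 91 = 3640 attempts, which at roughly a second apiece matches the ~1 hour figure; the fix direction is to retry at one layer only.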
[jira] [Updated] (YARN-4047) ClientRMService getApplications has high scheduler lock contention
[ https://issues.apache.org/jira/browse/YARN-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-4047: -- Labels: 2.6.1-candidate (was: ) ClientRMService getApplications has high scheduler lock contention -- Key: YARN-4047 URL: https://issues.apache.org/jira/browse/YARN-4047 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Jason Lowe Assignee: Jason Lowe Labels: 2.6.1-candidate Attachments: YARN-4047.001.patch The getApplications call can be particularly expensive because the code can call checkAccess on every application being tracked by the RM. checkAccess will often call scheduler.checkAccess, which will grab the big scheduler lock. This can cause a lot of contention with the scheduler thread, which is busy trying to process node heartbeats, app allocation requests, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3978) Configurably turn off the saving of container info in Generic AHS
[ https://issues.apache.org/jira/browse/YARN-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-3978: -- Labels: 2.6.1-candidate (was: ) Configurably turn off the saving of container info in Generic AHS - Key: YARN-3978 URL: https://issues.apache.org/jira/browse/YARN-3978 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver, yarn Affects Versions: 2.8.0, 2.7.1 Reporter: Eric Payne Assignee: Eric Payne Labels: 2.6.1-candidate Fix For: 3.0.0, 2.8.0, 2.7.2 Attachments: YARN-3978.001.patch, YARN-3978.002.patch, YARN-3978.003.patch, YARN-3978.004.patch Depending on how each application's metadata is stored, one week's worth of data stored in the Generic Application History Server's database can grow to be almost a terabyte of local disk space. In order to alleviate this, I suggest that there is a need for a configuration option to turn off saving of non-AM container metadata in the GAHS data store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4032) Corrupted state from a previous version can still cause RM to fail with NPE due to same reasons as YARN-2834
[ https://issues.apache.org/jira/browse/YARN-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-4032: -- Labels: 2.6.1-candidate (was: ) Corrupted state from a previous version can still cause RM to fail with NPE due to same reasons as YARN-2834 Key: YARN-4032 URL: https://issues.apache.org/jira/browse/YARN-4032 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.1 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Priority: Critical Labels: 2.6.1-candidate YARN-2834 ensures in 2.6.0 there will not be any inconsistent state. But if someone is upgrading from a previous version, the state can still be inconsistent and then RM will still fail with NPE after upgrade to 2.6.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1809) Synchronize RM and Generic History Service Web-UIs
[ https://issues.apache.org/jira/browse/YARN-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-1809: -- Labels: 2.6.1-candidate (was: ) Synchronize RM and Generic History Service Web-UIs -- Key: YARN-1809 URL: https://issues.apache.org/jira/browse/YARN-1809 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Xuan Gong Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: YARN-1809.1.patch, YARN-1809.10.patch, YARN-1809.11.patch, YARN-1809.12.patch, YARN-1809.13.patch, YARN-1809.14.patch, YARN-1809.15-rebase.patch, YARN-1809.15.patch, YARN-1809.16.patch, YARN-1809.17.patch, YARN-1809.17.rebase.patch, YARN-1809.17.rebase.patch, YARN-1809.2.patch, YARN-1809.3.patch, YARN-1809.4.patch, YARN-1809.5.patch, YARN-1809.5.patch, YARN-1809.6.patch, YARN-1809.7.patch, YARN-1809.8.patch, YARN-1809.9.patch After YARN-953, the web UI of the generic history service provides more information than that of the RM: the details about app attempts and containers. It's good to provide similar web UIs but retrieve the data from separate sources, i.e., the RM cache and the history store respectively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3287) TimelineClient kerberos authentication failure uses wrong login context.
[ https://issues.apache.org/jira/browse/YARN-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-3287: -- Labels: 2.6.1-candidate (was: ) TimelineClient kerberos authentication failure uses wrong login context. Key: YARN-3287 URL: https://issues.apache.org/jira/browse/YARN-3287 Project: Hadoop YARN Issue Type: Bug Reporter: Jonathan Eagles Assignee: Daryn Sharp Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: YARN-3287.1.patch, YARN-3287.2.patch, YARN-3287.3.patch, timeline.patch TimelineClientImpl:doPosting is not wrapped in a doAs, which can cause failure for yarn clients to create timeline domains during job submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3725) App submission via REST API is broken in secure mode due to Timeline DT service address is empty
[ https://issues.apache.org/jira/browse/YARN-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-3725: -- Labels: 2.6.1-candidate (was: ) App submission via REST API is broken in secure mode due to Timeline DT service address is empty Key: YARN-3725 URL: https://issues.apache.org/jira/browse/YARN-3725 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, timelineserver Affects Versions: 2.7.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Labels: 2.6.1-candidate Fix For: 2.7.1 Attachments: YARN-3725.1.patch YARN-2971 changes TimelineClient to use the service address from the Timeline DT to renew the DT instead of the configured address. This breaks the procedure of submitting a YARN app via the REST API in secure mode. The problem is that the service address is set by the client instead of the server in Java code. The REST API response is an encoded token String, so it is inconvenient to deserialize it, set the service address, and serialize it again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3493) RM fails to come up with error Failed to load/recover state when mem settings are changed
[ https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-3493: -- Labels: 2.6.1-candidate (was: ) RM fails to come up with error Failed to load/recover state when mem settings are changed Key: YARN-3493 URL: https://issues.apache.org/jira/browse/YARN-3493 Project: Hadoop YARN Issue Type: Bug Components: yarn Affects Versions: 2.7.0 Reporter: Sumana Sathish Assignee: Jian He Priority: Critical Labels: 2.6.1-candidate Fix For: 2.8.0, 2.7.1 Attachments: YARN-3493.1.patch, YARN-3493.2.patch, YARN-3493.3.patch, YARN-3493.4.patch, YARN-3493.5.patch, yarn-yarn-resourcemanager.log.zip RM fails to come up for the following case: 1. Change yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in background and wait for the job to reach running state 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 before the above job completes 4. Restart RM 5. 
RM fails to come up with the below error:
{code:title=RM error for Mem settings changed}
- RM app submission failed in validating AM resource request for application application_1429094976272_0008
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=3072, maxMemory=2048
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=3072, maxMemory=2048
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
 at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
 at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
 at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
 at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at
[jira] [Updated] (YARN-2900) Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500)
[ https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-2900: -- Labels: 2.6.1-candidate (was: ) Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500) --- Key: YARN-2900 URL: https://issues.apache.org/jira/browse/YARN-2900 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Labels: 2.6.1-candidate Fix For: 2.7.1 Attachments: YARN-2900-b2-2.patch, YARN-2900-b2.patch, YARN-2900-branch-2.7.20150530.patch, YARN-2900.20150529.patch, YARN-2900.20150530.patch, YARN-2900.20150530.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222) at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679) at org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218) ... 59 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3267) Timelineserver applies the ACL rules after applying the limit on the number of records
[ https://issues.apache.org/jira/browse/YARN-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-3267: -- Labels: 2.6.1-candidate (was: ) Timelineserver applies the ACL rules after applying the limit on the number of records -- Key: YARN-3267 URL: https://issues.apache.org/jira/browse/YARN-3267 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Prakash Ramachandran Assignee: Chang Li Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: YARN-3267.3.patch, YARN-3267.4.patch, YARN-3267.5.patch, YARN_3267_V1.patch, YARN_3267_V2.patch, YARN_3267_WIP.patch, YARN_3267_WIP1.patch, YARN_3267_WIP2.patch, YARN_3267_WIP3.patch While fetching the entities from timelineserver, the limit is applied on the entities to be fetched from leveldb, the ACL filters are applied after this (TimelineDataManager.java::getEntities). this could mean that even if there are entities available which match the query criteria, we could end up not getting any results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2513) Host framework UIs in YARN for use with the ATS
[ https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638192#comment-14638192 ] Hitesh Shah commented on YARN-2513: --- I am not sure if multiple UIs work. Tried the following configs:
{code}
<property>
  <name>yarn.timeline-service.ui-names</name>
  <value>tezui,tezui2</value>
</property>
<property>
  <name>yarn.timeline-service.ui-web-path.tezui</name>
  <value>/tezui</value>
</property>
<property>
  <name>yarn.timeline-service.ui-on-disk-path.tezui</name>
  <value>/install//tez/ui/</value>
</property>
<property>
  <name>yarn.timeline-service.ui-web-path.tezui2</name>
  <value>/tezui2</value>
</property>
<property>
  <name>yarn.timeline-service.ui-on-disk-path.tezui2</name>
  <value>/install/tez/tez-ui-0.8.0-SNAPSHOT.war</value>
</property>
{code}
Logs:
{code}
2015-07-22 22:43:09,643 ERROR applicationhistoryservice.ApplicationHistoryServer - AHSWebApp failed to start.
java.lang.NullPointerException
 at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.startWebApp(ApplicationHistoryServer.java:295)
 at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceStart(ApplicationHistoryServer.java:114)
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:162)
 at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:171)
2015-07-22 22:43:09,644 INFO service.AbstractService - Service org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: AHSWebApp failed to start.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: AHSWebApp failed to start.
at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.startWebApp(ApplicationHistoryServer.java:305) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.serviceStart(ApplicationHistoryServer.java:114) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.launchAppHistoryServer(ApplicationHistoryServer.java:162) at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.main(ApplicationHistoryServer.java:171) Caused by: java.lang.NullPointerException at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer.startWebApp(ApplicationHistoryServer.java:295) ... 4 more {code} Host framework UIs in YARN for use with the ATS --- Key: YARN-2513 URL: https://issues.apache.org/jira/browse/YARN-2513 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Labels: 2.6.1-candidate Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, YARN-2513.v3.patch Allow for pluggable UIs as described by TEZ-8. Yarn can provide the infrastructure to host java script and possible java UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2513) Host framework UIs in YARN for use with the ATS
[ https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-2513: -- Labels: 2.6.1-candidate (was: ) Host framework UIs in YARN for use with the ATS --- Key: YARN-2513 URL: https://issues.apache.org/jira/browse/YARN-2513 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Labels: 2.6.1-candidate Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, YARN-2513.v3.patch Allow for pluggable UIs as described by TEZ-8. Yarn can provide the infrastructure to host java script and possible java UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2890) MiniYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-2890: -- Labels: 2.6.1-candidate (was: ) MiniYarnCluster should turn on timeline service if configured to do so -- Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Labels: 2.6.1-candidate Fix For: 2.8.0 Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, YARN-2890.4.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-2859: -- Labels: 2.6.1-candidate (was: ) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster -- Key: YARN-2859 URL: https://issues.apache.org/jira/browse/YARN-2859 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Hitesh Shah Assignee: Zhijie Shen Priority: Critical Labels: 2.6.1-candidate In a mini cluster, a random port should be used. Also, the config is not updated with the host that the process actually bound to.
{code}
2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer address: localhost:10200
2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer web address: 0.0.0.0:8188
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622691#comment-14622691 ] Hitesh Shah commented on YARN-867: -- [~vinodkv] [~xgong] Is this still open or addressed elsewhere? Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results in any non-IOException will cause the NM's async dispatcher to exit, as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2513) Host framework UIs in YARN for use with the ATS
[ https://issues.apache.org/jira/browse/YARN-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571822#comment-14571822 ] Hitesh Shah commented on YARN-2513: --- +1 to making this available for ATS v1. Would be useful in various deployments . Host framework UIs in YARN for use with the ATS --- Key: YARN-2513 URL: https://issues.apache.org/jira/browse/YARN-2513 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: YARN-2513-v1.patch, YARN-2513-v2.patch, YARN-2513.v3.patch Allow for pluggable UIs as described by TEZ-8. Yarn can provide the infrastructure to host java script and possible java UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-900) YarnClientApplication uses composition to hold GetNewApplicationResponse instead of having a simpler flattened structure
[ https://issues.apache.org/jira/browse/YARN-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523409#comment-14523409 ] Hitesh Shah commented on YARN-900: -- Probably too late to make this change now due to compatibility issues. YarnClientApplication uses composition to hold GetNewApplicationResponse instead of having a simpler flattened structure Key: YARN-900 URL: https://issues.apache.org/jira/browse/YARN-900 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Instead of YarnClientApplication having APIs like getApplicationId, getMaximumResourceCapability, etc., it currently holds a GetNewApplicationResponse object. It might be simpler to get rid of GetNewApplicationResponse and return a more well-suited object both at the client as well as over the RPC layer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-867) Isolation of failures in aux services
[ https://issues.apache.org/jira/browse/YARN-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-867: - Target Version/s: 2.8.0 (was: 2.3.0) Isolation of failures in aux services -- Key: YARN-867 URL: https://issues.apache.org/jira/browse/YARN-867 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Critical Attachments: YARN-867.1.sampleCode.patch, YARN-867.3.patch, YARN-867.4.patch, YARN-867.5.patch, YARN-867.6.patch, YARN-867.sampleCode.2.patch Today, a malicious application can bring down the NM by sending bad data to a service. For example, sending data to the ShuffleService such that it results in any non-IOException will cause the NM's async dispatcher to exit as the service's INIT APP event is not handled properly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-900) YarnClientApplication uses composition to hold GetNewApplicationResponse instead of having a simpler flattened structure
[ https://issues.apache.org/jira/browse/YARN-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah resolved YARN-900. -- Resolution: Not A Problem YarnClientApplication uses composition to hold GetNewApplicationResponse instead of having a simpler flattened structure Key: YARN-900 URL: https://issues.apache.org/jira/browse/YARN-900 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Instead of YarnClientApplication having APIs like getApplicationId, getMaximumResourceCapability, etc., it currently holds a GetNewApplicationResponse object. It might be simpler to get rid of GetNewApplicationResponse and return a more well-suited object both at the client as well as over the RPC layer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2916) MiniYARNCluster should support enabling Timeline and new services via config
[ https://issues.apache.org/jira/browse/YARN-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah resolved YARN-2916. --- Resolution: Duplicate MiniYARNCluster should support enabling Timeline and new services via config Key: YARN-2916 URL: https://issues.apache.org/jira/browse/YARN-2916 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.2.0, 2.3.0, 2.4.0, 2.5.0, 2.6.0 Reporter: Hitesh Shah For any application to use the MiniYARNCluster without a shim, supporting new components/services within the MiniYARNCluster should be done via config based flags instead of additional params to the constructor. Currently, for the same code to compile against 2.2/2.3/2.4/2.5/2.6, one needs different invocations to MiniYARNCluster if timeline needs to be enabled for versions of hadoop that support it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2840) Timeline should support creation of Domains where domainId is not provided by the user
[ https://issues.apache.org/jira/browse/YARN-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523603#comment-14523603 ] Hitesh Shah commented on YARN-2840: --- [~zjshen] Is there any thinking of how acls will work with v2? Can you either move this into the v2 sub-tasks or create a new jira and close this out as a wont-fix assuming v1 will not be enhanced. Timeline should support creation of Domains where domainId is not provided by the user -- Key: YARN-2840 URL: https://issues.apache.org/jira/browse/YARN-2840 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Reporter: Hitesh Shah Current expectation is that the user has to come up with a unique domain id. When using this with applications such as Pig/Hive/Oozie, these applications will need to come up with a cluster-wide unique id to be able to create a domain as domainIds need to be unique. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2833) Timeline Domains should be immutable by default
[ https://issues.apache.org/jira/browse/YARN-2833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523602#comment-14523602 ] Hitesh Shah commented on YARN-2833: --- [~zjshen] Is there any thinking of how acls will work with v2? Can you either move this into the v2 sub-tasks or create a new jira and close this out as a wont-fix assuming v1 will not be enhanced. Timeline Domains should be immutable by default --- Key: YARN-2833 URL: https://issues.apache.org/jira/browse/YARN-2833 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Hitesh Shah In a general sense, when ACLs are defined for applications in various orgs deploying Hadoop clusters, the ratio of unique ACLs to no. of jobs run on the cluster should likely be a low value. In such a situation, it makes sense to have a way to normalize the ACL set to generate an immutable domain id. This should likely have performance and storage benefits. There may be some cases where domains may need to be mutable. For that, I propose a flag to be set when the domain is being created ( flag's default value being immutable ). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
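The normalization idea proposed above — many jobs sharing a small number of unique ACL sets — can be sketched by sorting and de-duplicating the ACL entries and hashing the result, so identical ACL sets always map to the same immutable domain id. This is an illustrative sketch of the proposal, not YARN code; the class and id format are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collection;
import java.util.TreeSet;

// Sketch: derive a deterministic domain id from a normalized ACL set. Jobs
// with identical ACLs then share one immutable domain instead of each
// minting its own, which is the storage/performance win described above.
public class DomainIds {
    public static String domainIdFor(Collection<String> aclEntries) {
        try {
            // TreeSet normalizes: duplicates collapse, ordering becomes canonical.
            TreeSet<String> normalized = new TreeSet<>(aclEntries);
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            for (String entry : normalized) {
                sha.update(entry.getBytes(StandardCharsets.UTF_8));
                sha.update((byte) 0); // separator so ["ab","c"] != ["a","bc"]
            }
            StringBuilder hex = new StringBuilder("domain_");
            for (byte b : sha.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("SHA-256 is always available", e);
        }
    }
}
```

Under this scheme the same ACL set, supplied in any order, always yields the same domain id, which is what makes immutability by default workable.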
[jira] [Updated] (YARN-857) Localization failures should be available in container diagnostics
[ https://issues.apache.org/jira/browse/YARN-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-857: - Summary: Localization failures should be available in container diagnostics (was: Errors when localizing end up with the localization failure not being seen by the NM) Localization failures should be available in container diagnostics -- Key: YARN-857 URL: https://issues.apache.org/jira/browse/YARN-857 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Vinod Kumar Vavilapalli Priority: Critical Attachments: YARN-857.1.patch, YARN-857.2.patch at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:106) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:978) Traced this down to DefaultExecutor which does not look at the exit code for the localizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-857) Errors when localizing end up with the localization failure not being seen by the NM
[ https://issues.apache.org/jira/browse/YARN-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-857: - Priority: Critical (was: Major) Errors when localizing end up with the localization failure not being seen by the NM Key: YARN-857 URL: https://issues.apache.org/jira/browse/YARN-857 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Vinod Kumar Vavilapalli Priority: Critical Attachments: YARN-857.1.patch, YARN-857.2.patch at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:106) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:978) Traced this down to DefaultExecutor which does not look at the exit code for the localizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-857) Errors when localizing end up with the localization failure not being seen by the NM
[ https://issues.apache.org/jira/browse/YARN-857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-857: - Target Version/s: 2.8.0 (was: 2.1.0-beta) Errors when localizing end up with the localization failure not being seen by the NM Key: YARN-857 URL: https://issues.apache.org/jira/browse/YARN-857 Project: Hadoop YARN Issue Type: Sub-task Reporter: Hitesh Shah Assignee: Vinod Kumar Vavilapalli Attachments: YARN-857.1.patch, YARN-857.2.patch at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:106) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:978) Traced this down to DefaultExecutor which does not look at the exit code for the localizer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
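The root cause traced above is the executor ignoring the localizer's exit code. The missing check can be sketched as follows — a simplified stand-in, not the DefaultContainerExecutor source; the helper name is hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Sketch: after launching a localizer subprocess, inspect its exit code and
// surface stderr as diagnostics instead of silently discarding the failure.
public class LocalizerCheck {
    public static void runAndCheck(ProcessBuilder pb) throws IOException, InterruptedException {
        Process proc = pb.start();
        StringBuilder diagnostics = new StringBuilder();
        try (BufferedReader err = new BufferedReader(
                new InputStreamReader(proc.getErrorStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = err.readLine()) != null) {
                diagnostics.append(line).append('\n');
            }
        }
        int exit = proc.waitFor();
        if (exit != 0) {
            throw new IOException("Localizer exited with code " + exit
                + "; diagnostics: " + diagnostics);
        }
    }
}
```

Propagating that exception is what would make localization failures visible in container diagnostics, per the retitled summary of this issue.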
[jira] [Resolved] (YARN-971) hadoop-yarn-api pom does not define a dependencies tag
[ https://issues.apache.org/jira/browse/YARN-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah resolved YARN-971. -- Resolution: Duplicate hadoop-yarn-api pom does not define a dependencies tag -- Key: YARN-971 URL: https://issues.apache.org/jira/browse/YARN-971 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Attachments: yarn-971-v1.patch As there is no dependencies tag defined in the pom, it inherits all the dependencies defined in hadoop-yarn-project/pom.xml which contains a huge list with dependencies like guice, netty, hdfs, jersey etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-931) Overlapping classes across hadoop-yarn-api and hadoop-yarn-common
[ https://issues.apache.org/jira/browse/YARN-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-931: - Target Version/s: (was: 2.1.0-beta) Overlapping classes across hadoop-yarn-api and hadoop-yarn-common - Key: YARN-931 URL: https://issues.apache.org/jira/browse/YARN-931 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah hadoop-yarn-common-3.0.0-SNAPSHOT.jar, hadoop-yarn-api-3.0.0-SNAPSHOT.jar define 3 overlapping classes: [WARNING] - org.apache.hadoop.yarn.factories.package-info [WARNING] - org.apache.hadoop.yarn.util.package-info [WARNING] - org.apache.hadoop.yarn.factory.providers.package-info -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2833) Timeline Domains should be immutable by default
[ https://issues.apache.org/jira/browse/YARN-2833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523622#comment-14523622 ] Hitesh Shah commented on YARN-2833: --- [~zjshen] Please create a dup of this with the relevant info so that it is considered for the v2 design. I will close this one out. Timeline Domains should be immutable by default --- Key: YARN-2833 URL: https://issues.apache.org/jira/browse/YARN-2833 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Hitesh Shah In a general sense, when ACLs are defined for applications in various orgs deploying Hadoop clusters, the ratio of unique ACLs to no. of jobs run on the cluster should likely be a low value. In such a situation, it makes sense to have a way to normalize the ACL set to generate an immutable domain id. This should likely have performance and storage benefits. There may be some cases where domains may need to be mutable. For that, I propose a flag to be set when the domain is being created ( flag's default value being immutable ). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2840) Timeline should support creation of Domains where domainId is not provided by the user
[ https://issues.apache.org/jira/browse/YARN-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523623#comment-14523623 ] Hitesh Shah commented on YARN-2840: --- [~zjshen] Please create a dup of this with the relevant info so that it is considered for the v2 design. I will close this one out. Timeline should support creation of Domains where domainId is not provided by the user -- Key: YARN-2840 URL: https://issues.apache.org/jira/browse/YARN-2840 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Reporter: Hitesh Shah Current expectation is that the user has to come up with a unique domain id. When using this with applications such as Pig/Hive/Oozie, these applications will need to come up with a cluster-wide unique id to be able to create a domain as domainIds need to be unique. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2833) Timeline Domains should be immutable by default
[ https://issues.apache.org/jira/browse/YARN-2833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah resolved YARN-2833. --- Resolution: Won't Fix Timeline Domains should be immutable by default --- Key: YARN-2833 URL: https://issues.apache.org/jira/browse/YARN-2833 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Hitesh Shah In a general sense, when ACLs are defined for applications in various orgs deploying Hadoop clusters, the ratio of unique ACLs to no. of jobs run on the cluster should likely be a low value. In such a situation, it makes sense to have a way to normalize the ACL set to generate an immutable domain id. This should likely have performance and storage benefits. There may be some cases where domains may need to be mutable. For that, I propose a flag to be set when the domain is being created ( flag's default value being immutable ). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2840) Timeline should support creation of Domains where domainId is not provided by the user
[ https://issues.apache.org/jira/browse/YARN-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah resolved YARN-2840. --- Resolution: Won't Fix Timeline should support creation of Domains where domainId is not provided by the user -- Key: YARN-2840 URL: https://issues.apache.org/jira/browse/YARN-2840 Project: Hadoop YARN Issue Type: Improvement Components: timelineserver Reporter: Hitesh Shah Current expectation is that the user has to come up with a unique domain id. When using this with applications such as Pig/Hive/Oozie, these applications will need to come up with a cluster-wide unique id to be able to create a domain as domainIds need to be unique. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
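Until the timeline service can mint ids itself, the burden described above falls on the client. A minimal sketch of what a framework like Pig/Hive/Oozie would have to do (the helper is hypothetical, not a YARN API):

```java
import java.util.UUID;

// Sketch: mint a cluster-wide unique domain id client-side by combining a
// framework/application-scoped prefix with a random UUID. This is exactly the
// per-client boilerplate the issue argues the server should absorb.
public class UniqueDomainId {
    public static String newDomainId(String framework, String applicationId) {
        return framework + "_" + applicationId + "_" + UUID.randomUUID();
    }
}
```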
[jira] [Commented] (YARN-3544) AM logs link missing in the RM UI for a completed app
[ https://issues.apache.org/jira/browse/YARN-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520378#comment-14520378 ] Hitesh Shah commented on YARN-3544: --- Doesn't the NM log link redirect the log server after the logs have been aggregated? AM logs link missing in the RM UI for a completed app -- Key: YARN-3544 URL: https://issues.apache.org/jira/browse/YARN-3544 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.7.0 Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Blocker Attachments: Screen Shot 2015-04-27 at 6.24.05 PM.png, YARN-3544.1.patch AM log links should always be present ( for both running and completed apps). Likewise node info is also empty. This is usually quite crucial when trying to debug where an AM was launched and a pointer to which NM's logs to look at if the AM failed to launch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3544) AM logs link missing in the RM UI for a completed app
[ https://issues.apache.org/jira/browse/YARN-3544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520379#comment-14520379 ] Hitesh Shah commented on YARN-3544: --- I meant redirect to the log server AM logs link missing in the RM UI for a completed app -- Key: YARN-3544 URL: https://issues.apache.org/jira/browse/YARN-3544 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.7.0 Reporter: Hitesh Shah Assignee: Xuan Gong Priority: Blocker Attachments: Screen Shot 2015-04-27 at 6.24.05 PM.png, YARN-3544.1.patch AM log links should always be present ( for both running and completed apps). Likewise node info is also empty. This is usually quite crucial when trying to debug where an AM was launched and a pointer to which NM's logs to look at if the AM failed to launch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14515016#comment-14515016 ] Hitesh Shah commented on YARN-2859: --- [~zjshen] Are you planning to look at this? [~vinodkv] this will be a good candidate for 2.6.1 ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster -- Key: YARN-2859 URL: https://issues.apache.org/jira/browse/YARN-2859 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Hitesh Shah Assignee: Zhijie Shen Priority: Critical In mini cluster, a random port should be used. Also, the config is not updated to the host that the process got bound to. {code} 2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer address: localhost:10200 2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer web address: 0.0.0.0:8188 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2092) Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/YARN-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14515828#comment-14515828 ] Hitesh Shah commented on YARN-2092: --- Closing this out as it is no longer an issue for tez. Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT Key: YARN-2092 URL: https://issues.apache.org/jira/browse/YARN-2092 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Came across this when trying to integrate with the timeline server. Using a 1.8.8 dependency of jackson works fine against 2.4.0 but fails against 2.5.0-SNAPSHOT which needs 1.9.13. This is in the scenario where the user jars are first in the classpath. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2092) Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/YARN-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah resolved YARN-2092. --- Resolution: Not A Problem Incompatible org.codehaus.jackson* dependencies when moving from 2.4.0 to 2.5.0-SNAPSHOT Key: YARN-2092 URL: https://issues.apache.org/jira/browse/YARN-2092 Project: Hadoop YARN Issue Type: Bug Reporter: Hitesh Shah Came across this when trying to integrate with the timeline server. Using a 1.8.8 dependency of jackson works fine against 2.4.0 but fails against 2.5.0-SNAPSHOT which needs 1.9.13. This is in the scenario where the user jars are first in the classpath. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2859) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster
[ https://issues.apache.org/jira/browse/YARN-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-2859: -- Target Version/s: 2.6.1, 2.8.0 (was: 2.8.0) ApplicationHistoryServer binds to default port 8188 in MiniYARNCluster -- Key: YARN-2859 URL: https://issues.apache.org/jira/browse/YARN-2859 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Hitesh Shah Assignee: Zhijie Shen Priority: Critical In mini cluster, a random port should be used. Also, the config is not updated to the host that the process got bound to. {code} 2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster (MiniYARNCluster.java:serviceStart(722)) - MiniYARN ApplicationHistoryServer address: localhost:10200 2014-11-13 13:07:01,905 INFO [main] server.MiniYARNCluster (MiniYARNCluster.java:serviceStart(724)) - MiniYARN ApplicationHistoryServer web address: 0.0.0.0:8188 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3544) AM logs link missing in the RM UI for a completed app
Hitesh Shah created YARN-3544: - Summary: AM logs link missing in the RM UI for a completed app Key: YARN-3544 URL: https://issues.apache.org/jira/browse/YARN-3544 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Hitesh Shah AM log links should always be present ( for both running and completed apps). Likewise node info is also empty. This is usually quite crucial when trying to debug where an AM was launched and a pointer to which NM's logs to look at if the AM failed to launch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2976) Invalid docs for specifying yarn.nodemanager.docker-container-executor.exec-name
[ https://issues.apache.org/jira/browse/YARN-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498470#comment-14498470 ] Hitesh Shah commented on YARN-2976: --- Typo: meant to say that it *does clash with the current config property name. Invalid docs for specifying yarn.nodemanager.docker-container-executor.exec-name Key: YARN-2976 URL: https://issues.apache.org/jira/browse/YARN-2976 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Vijay Bhat Priority: Minor Docs on http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html mention setting docker -H=tcp://0.0.0.0:4243 for yarn.nodemanager.docker-container-executor.exec-name. However, the actual implementation does a fileExists for the specified value. Either the docs need to be fixed or the impl changed to allow relative paths or commands with additional args -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2976) Invalid docs for specifying yarn.nodemanager.docker-container-executor.exec-name
[ https://issues.apache.org/jira/browse/YARN-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498466#comment-14498466 ] Hitesh Shah commented on YARN-2976: --- The latter definitely makes more sense but it does not clash with the config property name. Maybe, we can deprecate the old one in favor of the newer config property which supports a flexible command ( relative path, args, etc)? For the old/current one, we can fix the docs to say that it does a file exists check and does not support additional args? Invalid docs for specifying yarn.nodemanager.docker-container-executor.exec-name Key: YARN-2976 URL: https://issues.apache.org/jira/browse/YARN-2976 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Vijay Bhat Priority: Minor Docs on http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html mention setting docker -H=tcp://0.0.0.0:4243 for yarn.nodemanager.docker-container-executor.exec-name. However, the actual implementation does a fileExists for the specified value. Either the docs need to be fixed or the impl changed to allow relative paths or commands with additional args -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2976) Invalid docs for specifying yarn.nodemanager.docker-container-executor.exec-name
[ https://issues.apache.org/jira/browse/YARN-2976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498493#comment-14498493 ] Hitesh Shah commented on YARN-2976: --- Agreed. The newer one would take precedence. Invalid docs for specifying yarn.nodemanager.docker-container-executor.exec-name Key: YARN-2976 URL: https://issues.apache.org/jira/browse/YARN-2976 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Hitesh Shah Assignee: Vijay Bhat Priority: Minor Docs on http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/DockerContainerExecutor.html mention setting docker -H=tcp://0.0.0.0:4243 for yarn.nodemanager.docker-container-executor.exec-name. However, the actual implementation does a fileExists for the specified value. Either the docs need to be fixed or the impl changed to allow relative paths or commands with additional args -- This message was sent by Atlassian JIRA (v6.3.4#6332)
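The precedence rule agreed in the exchange above — a newer, more flexible property winning over the deprecated one — can be sketched like this. The newer property name is hypothetical (the discussion does not fix one), and a plain map stands in for Hadoop's Configuration:

```java
import java.util.Map;

// Sketch: prefer the newer exec-command property when set, fall back to the
// deprecated exec-name property, and finally to a default command.
public class DockerExecConfig {
    static final String OLD_KEY = "yarn.nodemanager.docker-container-executor.exec-name";
    // Hypothetical newer key supporting relative paths and extra args:
    static final String NEW_KEY = "yarn.nodemanager.docker-container-executor.exec-command";

    public static String resolveDockerCommand(Map<String, String> conf, String defaultCmd) {
        String cmd = conf.get(NEW_KEY);        // newer property takes precedence
        if (cmd == null) {
            cmd = conf.get(OLD_KEY);           // deprecated fallback
        }
        return cmd != null ? cmd : defaultCmd;
    }
}
```

In real Hadoop code this kind of migration is usually expressed via Configuration's key-deprecation machinery rather than a hand-rolled lookup; the sketch only shows the precedence order.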
[jira] [Updated] (YARN-2890) MiniYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated YARN-2890: -- Summary: MiniYarnCluster should turn on timeline service if configured to do so (was: MiniMRYarnCluster should turn on timeline service if configured to do so) MiniYarnCluster should turn on timeline service if configured to do so -- Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, YARN-2890.4.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14486033#comment-14486033 ] Hitesh Shah commented on YARN-2890: --- +1. Thanks for patiently addressing review comments. Committing shortly. MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, YARN-2890.4.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2890) MiniMRYarnCluster should turn on timeline service if configured to do so
[ https://issues.apache.org/jira/browse/YARN-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393445#comment-14393445 ] Hitesh Shah commented on YARN-2890: --- Sorry did not check the last update. Minor nit: Some of the test changes in TestMRTimelineEventHandling probably need to belong in TestMiniYarnCluster if that exists as yarn timeline flag behaviour checks should ideally be tested in yarn code and not MR code. MiniMRYarnCluster should turn on timeline service if configured to do so Key: YARN-2890 URL: https://issues.apache.org/jira/browse/YARN-2890 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: YARN-2890.1.patch, YARN-2890.2.patch, YARN-2890.3.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch, YARN-2890.patch Currently the MiniMRYarnCluster does not consider the configuration value for enabling timeline service before starting. The MiniYarnCluster should only start the timeline service if it is configured to do so. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
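The behavior this patch establishes — starting the timeline service only when the config flag says so, instead of via an extra constructor parameter — reduces to a check like the following. This is a simplified stand-in, not the actual MiniYARNCluster code, with a plain map in place of YarnConfiguration:

```java
import java.util.Map;

// Sketch: gate timeline-service startup on a configuration flag so the same
// client code compiles and runs against Hadoop versions with and without
// timeline support (the shim problem described in YARN-2916 above).
public class TimelineToggle {
    static final String TIMELINE_ENABLED = "yarn.timeline-service.enabled";

    public static boolean shouldStartTimelineService(Map<String, String> conf) {
        // Disabled by default; only an explicit "true" turns the service on.
        return Boolean.parseBoolean(conf.getOrDefault(TIMELINE_ENABLED, "false"));
    }
}
```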
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388817#comment-14388817 ] Hitesh Shah commented on YARN-3304: --- [~kasha] 3.0.0 is a major release. I would assume all deprecated apis should be removed. Given the length of time after which new major releases come into existence, there would be no point of deprecating apis if they are not removed in the next major release. ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters Key: YARN-3304 URL: https://issues.apache.org/jira/browse/YARN-3304 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3304-appendix-v2.patch, YARN-3304-appendix.patch, YARN-3304-v2.patch, YARN-3304-v3.patch, YARN-3304-v4-boolean-way.patch, YARN-3304-v4-negative-way-MR.patch, YARN-3304-v4-negtive-value-way.patch, YARN-3304-v6-no-rename.patch, YARN-3304-v6-with-rename.patch, YARN-3304-v7.patch, YARN-3304-v8.patch, YARN-3304.patch, yarn-3304-5.patch Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for unavailable case while other resource metrics are return 0 in the same case which sounds inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387345#comment-14387345 ] Hitesh Shah commented on YARN-3304: --- Sigh. This broke compatibility again. Was there a reason why the APIs were simply removed/renamed, instead of supporting both APIs with a way to check at runtime whether the plugin supports the old or the new ones (with the old ones deprecated)?
{code}
public int getValueFromOldAPI();
public int getValueFromNewAPI();
public boolean supportsNewAPI() { return false; }

// caller:
if (supportsNewAPI()) {
  getValueFromNewAPI();
} else {
  getValueFromOldAPI();
}
...
{code}
ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters Key: YARN-3304 URL: https://issues.apache.org/jira/browse/YARN-3304 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3304-v2.patch, YARN-3304-v3.patch, YARN-3304-v4-boolean-way.patch, YARN-3304-v4-negative-way-MR.patch, YARN-3304-v4-negtive-value-way.patch, YARN-3304-v6-no-rename.patch, YARN-3304-v6-with-rename.patch, YARN-3304-v7.patch, YARN-3304-v8.patch, YARN-3304.patch, yarn-3304-5.patch Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for the unavailable case, while other resource metrics return 0 in the same case, which is inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
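For reference, the capability-probe pattern sketched in the comment above could look like the following in plain Java. This is a self-contained illustration, not Hadoop code; every class and method name here is hypothetical.

```java
// Base class keeps the deprecated old method, adds the new one, and exposes a
// runtime probe so callers can pick whichever contract the plugin implements.
abstract class MetricsPlugin {
    /** Old contract: returns 0 when the value is unavailable. */
    @Deprecated
    public abstract int getValueFromOldAPI();

    /** New contract: returns -1 when the value is unavailable. */
    public int getValueFromNewAPI() {
        throw new UnsupportedOperationException("new API not implemented");
    }

    /** Old plugins inherit 'false'; new plugins override to return true. */
    public boolean supportsNewAPI() {
        return false;
    }
}

/** A plugin compiled against only the old interface: probe stays false. */
class LegacyPlugin extends MetricsPlugin {
    @Override public int getValueFromOldAPI() { return 0; }  // unavailable -> 0
}

/** A plugin implementing the new contract: probe flipped to true. */
class ModernPlugin extends MetricsPlugin {
    @Override public int getValueFromOldAPI() { return 0; }
    @Override public int getValueFromNewAPI() { return -1; } // unavailable -> -1
    @Override public boolean supportsNewAPI() { return true; }
}

class Caller {
    /** Callers branch on the probe instead of breaking when a method vanishes. */
    static int readValue(MetricsPlugin plugin) {
        return plugin.supportsNewAPI()
                ? plugin.getValueFromNewAPI()
                : plugin.getValueFromOldAPI();
    }
}
```

The point of the default `supportsNewAPI() == false` is that plugins compiled against the old base class keep working unchanged, while upgraded plugins opt in explicitly.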
[jira] [Reopened] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah reopened YARN-3304: --- Also, FWIW, ResourceCalculatorProcessTree is a public API. Re-opening as this breaks Tez (again). ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters Key: YARN-3304 URL: https://issues.apache.org/jira/browse/YARN-3304 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3304-v2.patch, YARN-3304-v3.patch, YARN-3304-v4-boolean-way.patch, YARN-3304-v4-negative-way-MR.patch, YARN-3304-v4-negtive-value-way.patch, YARN-3304-v6-no-rename.patch, YARN-3304-v6-with-rename.patch, YARN-3304-v7.patch, YARN-3304-v8.patch, YARN-3304.patch, yarn-3304-5.patch Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for the unavailable case, while other resource metrics return 0 in the same case, which is inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387810#comment-14387810 ] Hitesh Shah commented on YARN-3304: --- https://issues.apache.org/jira/browse/YARN-3297 is probably relevant too. ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters Key: YARN-3304 URL: https://issues.apache.org/jira/browse/YARN-3304 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3304-v2.patch, YARN-3304-v3.patch, YARN-3304-v4-boolean-way.patch, YARN-3304-v4-negative-way-MR.patch, YARN-3304-v4-negtive-value-way.patch, YARN-3304-v6-no-rename.patch, YARN-3304-v6-with-rename.patch, YARN-3304-v7.patch, YARN-3304-v8.patch, YARN-3304.patch, yarn-3304-5.patch Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for the unavailable case, while other resource metrics return 0 in the same case, which is inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387808#comment-14387808 ] Hitesh Shah commented on YARN-3304: --- [~aw] Thanks for putting it so bluntly. You may wish to look at the related JIRAs such as https://issues.apache.org/jira/browse/YARN-3296. ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters Key: YARN-3304 URL: https://issues.apache.org/jira/browse/YARN-3304 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3304-v2.patch, YARN-3304-v3.patch, YARN-3304-v4-boolean-way.patch, YARN-3304-v4-negative-way-MR.patch, YARN-3304-v4-negtive-value-way.patch, YARN-3304-v6-no-rename.patch, YARN-3304-v6-with-rename.patch, YARN-3304-v7.patch, YARN-3304-v8.patch, YARN-3304.patch, yarn-3304-5.patch Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for the unavailable case, while other resource metrics return 0 in the same case, which is inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3304) ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters
[ https://issues.apache.org/jira/browse/YARN-3304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387656#comment-14387656 ] Hitesh Shah commented on YARN-3304: --- Forgot to add: we use this for resource monitoring of a task within a container. Given that we run multiple tasks within the same container, this API's stability becomes more important, as YARN cannot provide resource monitoring at the granularity that we need. ResourceCalculatorProcessTree#getCpuUsagePercent default return value is inconsistent with other getters Key: YARN-3304 URL: https://issues.apache.org/jira/browse/YARN-3304 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Junping Du Assignee: Junping Du Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3304-v2.patch, YARN-3304-v3.patch, YARN-3304-v4-boolean-way.patch, YARN-3304-v4-negative-way-MR.patch, YARN-3304-v4-negtive-value-way.patch, YARN-3304-v6-no-rename.patch, YARN-3304-v6-with-rename.patch, YARN-3304-v7.patch, YARN-3304-v8.patch, YARN-3304.patch, yarn-3304-5.patch Per discussions in YARN-3296, getCpuUsagePercent() will return -1 for the unavailable case, while other resource metrics return 0 in the same case, which is inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
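To illustrate the consumer-side consequence of the mixed "unavailable" conventions this JIRA discusses, here is a hedged, self-contained sketch of how a task-level monitor (like the one described in the comment above) might normalize readings. The `ProcessTreeMetrics` interface is a stand-in for illustration only, not the real org.apache.hadoop.yarn.util.ResourceCalculatorProcessTree.

```java
// Stand-in for the process-tree getters under discussion: CPU may report -1
// for "unavailable" while the memory getter reports 0 in the same situation.
interface ProcessTreeMetrics {
    float getCpuUsagePercent(); // may return -1 when no sample is available yet
    long getRssMemorySize();    // returns 0 when no sample is available yet
}

class TaskMonitor {
    static final float CPU_UNAVAILABLE = -1.0f;

    /** Treat any negative CPU reading as "no data yet", never as a real sample. */
    static float sampleCpu(ProcessTreeMetrics tree) {
        float cpu = tree.getCpuUsagePercent();
        return cpu < 0 ? CPU_UNAVAILABLE : cpu;
    }

    /** Memory uses 0 for "unavailable"; clamp defensively so callers can skip 0. */
    static long sampleRss(ProcessTreeMetrics tree) {
        return Math.max(0L, tree.getRssMemorySize());
    }
}
```

A monitor that averages CPU over time would skip `CPU_UNAVAILABLE` samples rather than mixing -1 into the mean, which is exactly the kind of caller-side special-casing the inconsistency forces.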