Re: "ExcludeNodes" for an Apex application

2016-12-02 Thread Munagala Ramanath
Agree it should be via YARN; the poison pill would be the final barrier in the event all other mechanisms have failed -- sort of like an API call which documents that a parameter should be non-null but nevertheless checks it internally and throws an exception if it finds null. Additionally, it

Re: "ExcludeNodes" for an Apex application

2016-12-02 Thread Munagala Ramanath
The OP is claiming (in the comment to the first response) that he actually tried the proposed solution and it did not work for him and shows the RM code fragment that is clobbering his preference. Ram On Fri, Dec 2, 2016 at 12:17 AM, Sandesh Hegde wrote: > Yarn allows

Re: "ExcludeNodes" for an Apex application

2016-12-02 Thread Amol Kekre
Excluding a node for Stram should be done via Yarn; a poison pill is not a good way, as it induces a termination for the wrong reasons. Thks Amol On Fri, Dec 2, 2016 at 7:13 AM, Munagala Ramanath wrote: > Could STRAM include a poison pill where it simply exits with diagnostic if > its host

Re: "ExcludeNodes" for an Apex application

2016-12-02 Thread Munagala Ramanath
Could STRAM include a poison pill where it simply exits with a diagnostic if its host name is blacklisted? Ram On Thu, Dec 1, 2016 at 11:52 PM, Amol Kekre wrote: > Yarn will deploy AM (Stram) on a node of its choice, thereby rendering any > attribute within the app
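
The poison-pill idea above could look roughly like the following. This is only a minimal sketch under stated assumptions: the class name ExcludedHostCheck, the comma-separated list format, and passing the list as a program argument are all hypothetical, and none of this is Apex or YARN code. It resolves the local host name and exits with a diagnostic if that name is on the exclude list.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of Apex or YARN.
final class ExcludedHostCheck {
    private final Set<String> excludedHosts = new HashSet<>();

    // Parses a comma-separated host list; blank entries are ignored.
    ExcludedHostCheck(String commaSeparatedHosts) {
        for (String host : commaSeparatedHosts.split(",")) {
            String normalized = normalize(host);
            if (!normalized.isEmpty()) {
                excludedHosts.add(normalized);
            }
        }
    }

    // Host names are compared case-insensitively, ignoring surrounding spaces.
    private static String normalize(String host) {
        return host.trim().toLowerCase(Locale.ROOT);
    }

    boolean isExcluded(String host) {
        return excludedHosts.contains(normalize(host));
    }

    public static void main(String[] args) throws UnknownHostException {
        // Assumption: the exclude list arrives as the first program argument.
        ExcludedHostCheck check = new ExcludedHostCheck(args.length > 0 ? args[0] : "");
        String localHost = InetAddress.getLocalHost().getHostName();
        if (check.isExcluded(localHost)) {
            // The "poison pill": refuse to run on an excluded host.
            System.err.println("Exiting: host " + localHost + " is on the exclude list");
            System.exit(1);
        }
        System.out.println("Host " + localHost + " is allowed; continuing startup");
    }
}
```

A check like this only detects a bad placement after the process has already started on the node, which is why it is described elsewhere in the thread as a last barrier and not as a replacement for placement by Yarn.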

Re: "ExcludeNodes" for an Apex application

2016-12-02 Thread Milind Barve
So all Apex will need to do is make sure, as part of the initial configuration validation, that the node selected to run the master is not part of the "excludeNode" list. On Fri, Dec 2, 2016 at 1:47 PM, Sandesh Hegde wrote: > Yarn allows the AppMaster to run on

Re: "ExcludeNodes" for an Apex application

2016-12-02 Thread Sandesh Hegde
Yarn allows the AppMaster to run on a selected node, and Apex shouldn't select the blacklisted nodes, so it is possible to avoid running the Apex containers on certain nodes. http://stackoverflow.com/questions/29302659/run-my-own-application-master-on-a-specific-node-in-a-yarn-cluster On

Re: "ExcludeNodes" for an Apex application

2016-12-01 Thread Amol Kekre
Yarn will deploy AM (Stram) on a node of its choice, thereby rendering any attribute within the app un-enforceable in terms of not deploying master on a node. Thks Amol On Thu, Dec 1, 2016 at 11:19 PM, Milind Barve wrote: > Additionally, this would apply to Stram as well

Re: "ExcludeNodes" for an Apex application

2016-12-01 Thread Milind Barve
While it is possible to extend anti-affinity to take care of this, I feel it will cause confusion from a user perspective. As a user, when I think about anti-affinity, what comes to mind right away is a relative relation between operators. On the other hand, the current ask is not that, but a

Re: "ExcludeNodes" for an Apex application

2016-12-01 Thread Bhupesh Chawda
Okay, I think that serves an alternate purpose of detecting any node that has newly gone bad and excluding it. +1 for covering the original scenario under anti-affinity. ~ Bhupesh On Fri, Dec 2, 2016 at 9:14 AM, Munagala Ramanath wrote: > It only takes effect after failures -- no

Re: "ExcludeNodes" for an Apex application

2016-12-01 Thread Munagala Ramanath
It only takes effect after failures -- no way to exclude from the get-go. Ram On Dec 1, 2016 7:15 PM, "Bhupesh Chawda" wrote: > As suggested by Sandesh, the parameter > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST seems to do exactly what > is needed. > Why would

Re: "ExcludeNodes" for an Apex application

2016-12-01 Thread AJAY GUPTA
Hi, Can't we make use of the existing Node Label + queue feature in Yarn to achieve this? Though we will have to redeploy the cluster, it's still possible to exclude nodes. https://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/NodeLabel.html Thanks, Ajay On Fri, Dec 2, 2016 at 5:57 AM,

Re: "ExcludeNodes" for an Apex application

2016-12-01 Thread Amol Kekre
I agree, this should be on top of the affinity work. Thks Amol On Thu, Dec 1, 2016 at 1:01 PM, Pramod Immaneni wrote: > I see a host locality available as an attribute in DAG for individual > operators. If affinity doesn't support this today, we could probably add > it. You

Re: "ExcludeNodes" for an Apex application

2016-12-01 Thread Pramod Immaneni
I see a host locality available as an attribute in DAG for individual operators. If affinity doesn't support this today, we could probably add it. You could also make setting a blacklist directly a convenience function on top of affinity. On Thu, Dec 1, 2016 at 11:58 AM, Sandesh Hegde

Re: "ExcludeNodes" for an Apex application

2016-12-01 Thread Pramod Immaneni
Shouldn't this already be covered by anti-affinity? Today users can specify multiple affinity rules; for each rule they can specify positive or negative affinity, locality and operator selection. If an affinity rule specifying negative affinity, node locality and all operators does not work then

Re: "ExcludeNodes" for an Apex application

2016-12-01 Thread Sandesh Hegde
I have created a jira for adding the list of blacklisted nodes: https://issues.apache.org/jira/browse/APEXCORE-584 On Wed, Nov 30, 2016 at 11:06 PM Sanjay Pujare wrote: > Yes, Ram explained to me that in practice this would be a useful feature > for Apex devops who

Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Sanjay Pujare
Yes, Ram explained to me that in practice this would be a useful feature for Apex devops who typically have no control over Hadoop/Yarn cluster. On 11/30/16, 9:22 PM, "Mohit Jotwani" wrote: This is a practical scenario where developers would be required to exclude

Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Sandesh Hegde
Apex has automatic blacklisting of the troublesome nodes, please take a look at the following attributes, MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST https://www.datatorrent.com/docs/apidocs/com/datatorrent/api/Context.DAGContext.html#MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
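
To illustrate the kind of behaviour an attribute like MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST describes, here is a rough sketch of per-node consecutive-failure counting. It is an assumption-laden illustration, not Apex's actual implementation: the class FailureBlacklist, its method names, and the rule that a success resets the count are all hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not the Apex implementation.
final class FailureBlacklist {
    private final int maxConsecutiveFailures;
    private final Map<String, Integer> consecutiveFailures = new HashMap<>();
    private final Set<String> blacklisted = new HashSet<>();

    FailureBlacklist(int maxConsecutiveFailures) {
        if (maxConsecutiveFailures < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    // Records a container failure; blacklists the host once the threshold is reached.
    void containerFailed(String host) {
        int count = consecutiveFailures.merge(host, 1, Integer::sum);
        if (count >= maxConsecutiveFailures) {
            blacklisted.add(host);
        }
    }

    // Assumption: a successful container resets the consecutive-failure count.
    void containerSucceeded(String host) {
        consecutiveFailures.remove(host);
    }

    boolean isBlacklisted(String host) {
        return blacklisted.contains(host);
    }

    public static void main(String[] args) {
        FailureBlacklist tracker = new FailureBlacklist(3);
        for (int i = 0; i < 3; i++) {
            tracker.containerFailed("node7");
        }
        System.out.println("node7 blacklisted: " + tracker.isBlacklisted("node7"));
    }
}
```

A scheme of this shape reacts only to observed failures, so by itself it gives no way to name a node to avoid before anything has run there.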

Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Munagala Ramanath
Not sure if this is what Milind had in mind but we often run into situations where the dev group working with Apex has no control over cluster configuration -- to make any changes to the cluster they need to go through an elaborate process that can take many days. Meanwhile, if they notice that a

Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Amol Kekre
I agree, a randomly rebooting node is a Yarn issue. Even anti-affinity between apps should be in Yarn in the long run. We could contribute the above jira. Thks Amol On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare wrote: > To me both use cases appear to be generic resource

Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Sanjay Pujare
To me both use cases appear to be generic resource management use cases. For example, a randomly rebooting node is not good for any purpose esp. long running apps so it is a bit of a stretch to imagine that these nodes will be acceptable for some batch jobs in Yarn. So such a node should be

Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Munagala Ramanath
But then, what's the solution to the 2 problem scenarios that Milind describes? Ram On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare wrote: > I think “exclude nodes” and such is really the job of the resource manager > i.e. Yarn. So I am not sure taking over some of

Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Sanjay Pujare
I think “exclude nodes” and such is really the job of the resource manager i.e. Yarn. So I am not sure taking over some of these tasks in Apex would be very useful. I agree with Amol that apps should be node neutral. Resource management in Yarn together with fault tolerance in Apex should

Re: "ExcludeNodes" for an Apex application

2016-11-29 Thread Amol Kekre
We do have this feature in Yarn, but that applies to all applications. I am not sure if Yarn has anti-affinity. This feature may be used, but in general there is danger in an application taking over resource allocation. Another quirk is that big data apps should ideally be node-neutral. This is a

"ExcludeNodes" for an Apex application

2016-11-29 Thread Milind Barve
We have seen 2 cases, mentioned below, where it would have been nice if Apex allowed us to exclude a node from the cluster for an application. 1. A node in the cluster had gone bad (was randomly rebooting) and so an Apex app should not use it - other apps could use it as they were batch jobs. 2. A