Re: "ExcludeNodes" for an Apex application

2016-12-02 Thread Munagala Ramanath
Agree it should be via YARN; the poison pill would be the final barrier in
the event all other mechanisms have failed -- sort of like an API call which
documents that a parameter should be non-null but nevertheless checks it
internally and throws an exception if it finds null.

Additionally, it also helps teams that do not have control over YARN
configuration.

Ram
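
The poison-pill idea above can be sketched roughly as follows. This is only
an illustration under stated assumptions, not Apex or STRAM code: the class
name, the comma-separated exclude-list format and the diagnostic text are all
hypothetical. A real check would presumably read the local host name (for
example via InetAddress.getLocalHost().getHostName()) rather than the
hard-coded value used in main.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch of a startup "poison pill" check; not Apex code. */
final class PoisonPillCheck {

  /** Parses a comma-separated exclude list (assumed format) into normalized host names. */
  static Set<String> parseExcludeList(String csv) {
    return Arrays.stream(csv.split(","))
        .map(h -> h.trim().toLowerCase(Locale.ROOT))
        .filter(h -> !h.isEmpty())
        .collect(Collectors.toSet());
  }

  /** Returns a diagnostic if the host is excluded, otherwise an empty Optional. */
  static Optional<String> diagnosticIfExcluded(String localHost, Set<String> excluded) {
    String host = localHost.trim().toLowerCase(Locale.ROOT);
    if (excluded.contains(host)) {
      return Optional.of("Master started on excluded host " + host + "; exiting");
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Hard-coded, hypothetical values so the sketch runs anywhere.
    Set<String> excluded = parseExcludeList("node20, node21");
    Optional<String> diagnostic = diagnosticIfExcluded("node20", excluded);
    if (diagnostic.isPresent()) {
      System.err.println(diagnostic.get());
      System.exit(1); // the "poison pill": refuse to run on an excluded host
    }
    System.out.println("Host is not excluded; continuing startup");
  }
}
```

As in the non-null analogy, the check is a last barrier: it does nothing when
placement already kept the master off the excluded hosts.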

On Fri, Dec 2, 2016 at 7:15 AM, Amol Kekre  wrote:

> Stram exclude node should be via Yarn, poison pill is not a good way as it
> induces a terminate for wrong reasons.
>
> Thks
> Amol
>
>
> On Fri, Dec 2, 2016 at 7:13 AM, Munagala Ramanath 
> wrote:
>
> > Could STRAM include a poison pill where it simply exits with diagnostic
> if
> > its host name is blacklisted ?
> >
> > Ram
> >
> > On Thu, Dec 1, 2016 at 11:52 PM, Amol Kekre 
> wrote:
> >
> > > Yarn will deploy AM (Stram) on a node of its choice, thereby rendering
> any
> > > attribute within the app un-enforceable in terms of not deploying
> master
> > on
> > > a node.
> > >
> > > Thks
> > > Amol
> > >
> > >
> > > On Thu, Dec 1, 2016 at 11:19 PM, Milind Barve 
> wrote:
> > >
> > > > Additionally, this would apply to Stram as well i.e. the master
> should
> > > also
> > > > not be deployed on these nodes. Not sure if anti-affinity goes beyond
> > > > operators.
> > > >
> > > > On Fri, Dec 2, 2016 at 12:47 PM, Milind Barve 
> > wrote:
> > > >
> > > > > My previous mail explains it, but just forgot to add : -1 to cover
> > this
> > > > > under anti affinity.
> > > > >
> > > > > On Fri, Dec 2, 2016 at 12:46 PM, Milind Barve 
> > > wrote:
> > > > >
> > > > >> While it is possible to extend anti-affinity to take care of
> this, I
> > > > feel
> > > > >> it will cause confusion from a user perspective. As a user, when I
> > > think
> > > > >> about anti-affinity, what comes to mind right away is a relative
> > > > relation
> > > > >> between operators.
> > > > >>
> > > > >> On the other hand, the current ask is not that, but a relation at
> an
> > > > >> application level w.r.t. a node. (Further, we might even think of
> > > > extending
> > > > >> this at an operator level - which would mean do not deploy an
> > operator
> > > > on a
> > > > >> particular node)
> > > > >>
> > > > >> We would be better off clearly articulating and allowing users to
> > > > > >> configure it separately as against using anti-affinity.
> > > > >>
> > > > >> On Fri, Dec 2, 2016 at 10:03 AM, Bhupesh Chawda <
> > > > bhup...@datatorrent.com>
> > > > >> wrote:
> > > > >>
> > > > >>> Okay, I think that serves an alternate purpose of detecting any
> > newly
> > > > >>> gone
> > > > >>> bad node and excluding it.
> > > > >>>
> > > > >>> +1 for covering the original scenario under anti-affinity.
> > > > >>>
> > > > >>> ~ Bhupesh
> > > > >>>
> > > > >>> On Fri, Dec 2, 2016 at 9:14 AM, Munagala Ramanath <
> > > r...@datatorrent.com
> > > > >
> > > > >>> wrote:
> > > > >>>
> > > > >>> > It only takes effect after failures -- no way to exclude from
> the
> > > > >>> get-go.
> > > > >>> >
> > > > >>> > Ram
> > > > >>> >
> > > > >>> > On Dec 1, 2016 7:15 PM, "Bhupesh Chawda" <
> > bhup...@datatorrent.com>
> > > > >>> wrote:
> > > > >>> >
> > > > >>> > > As suggested by Sandesh, the parameter
> > > > >>> > > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST seems to do
> > > > exactly
> > > > >>> > what
> > > > >>> > > is needed.
> > > > >>> > > Why would this not work?
> > > > >>> > >
> > > > >>> > > ~ Bhupesh
> > > > >>> > >
> > > > >>> >
> > > > >>>
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> ~Milind bee at gee mail dot com
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > ~Milind bee at gee mail dot com
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > ~Milind bee at gee mail dot com
> > > >
> > >
> >
>


Re: "ExcludeNodes" for an Apex application

2016-12-02 Thread Munagala Ramanath
The OP is claiming (in the comment to the first response) that he actually
tried the proposed solution and it did not work for him and shows the RM
code fragment that is clobbering his preference.

Ram

On Fri, Dec 2, 2016 at 12:17 AM, Sandesh Hegde 
wrote:

> Yarn allows the AppMaster to run on the selected node, Apex shouldn't
> select the blacklisted nodes, so it is possible to achieve not running the
> Apex containers on certain nodes.
>
> http://stackoverflow.com/questions/29302659/run-my-own-
> application-master-on-a-specific-node-in-a-yarn-cluster
>
>
> On Thu, Dec 1, 2016 at 11:52 PM Amol Kekre  wrote:
>
> > Yarn will deploy AM (Stram) on a node of its choice, thereby rendering any
> > attribute within the app un-enforceable in terms of not deploying master
> on
> > a node.
> >
> > Thks
> > Amol
> >
> >
> > On Thu, Dec 1, 2016 at 11:19 PM, Milind Barve  wrote:
> >
> > > Additionally, this would apply to Stram as well i.e. the master should
> > also
> > > not be deployed on these nodes. Not sure if anti-affinity goes beyond
> > > operators.
> > >
> > > On Fri, Dec 2, 2016 at 12:47 PM, Milind Barve 
> wrote:
> > >
> > > > My previous mail explains it, but just forgot to add : -1 to cover
> this
> > > > under anti affinity.
> > > >
> > > > On Fri, Dec 2, 2016 at 12:46 PM, Milind Barve 
> > wrote:
> > > >
> > > >> While it is possible to extend anti-affinity to take care of this, I
> > > feel
> > > >> it will cause confusion from a user perspective. As a user, when I
> > think
> > > >> about anti-affinity, what comes to mind right away is a relative
> > > relation
> > > >> between operators.
> > > >>
> > > >> On the other hand, the current ask is not that, but a relation at an
> > > >> application level w.r.t. a node. (Further, we might even think of
> > > extending
> > > >> this at an operator level - which would mean do not deploy an
> operator
> > > on a
> > > >> particular node)
> > > >>
> > > >> We would be better off clearly articulating and allowing users to
> > > >> configure it separately as against using anti-affinity.
> > > >>
> > > >> On Fri, Dec 2, 2016 at 10:03 AM, Bhupesh Chawda <
> > > bhup...@datatorrent.com>
> > > >> wrote:
> > > >>
> > > >>> Okay, I think that serves an alternate purpose of detecting any
> newly
> > > >>> gone
> > > >>> bad node and excluding it.
> > > >>>
> > > >>> +1 for covering the original scenario under anti-affinity.
> > > >>>
> > > >>> ~ Bhupesh
> > > >>>
> > > >>> On Fri, Dec 2, 2016 at 9:14 AM, Munagala Ramanath <
> > r...@datatorrent.com
> > > >
> > > >>> wrote:
> > > >>>
> > > >>> > It only takes effect after failures -- no way to exclude from the
> > > >>> get-go.
> > > >>> >
> > > >>> > Ram
> > > >>> >
> > > >>> > On Dec 1, 2016 7:15 PM, "Bhupesh Chawda" <
> bhup...@datatorrent.com>
> > > >>> wrote:
> > > >>> >
> > > >>> > > As suggested by Sandesh, the parameter
> > > >>> > > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST seems to do
> > > exactly
> > > >>> > what
> > > >>> > > is needed.
> > > >>> > > Why would this not work?
> > > >>> > >
> > > >>> > > ~ Bhupesh
> > > >>> > >
> > > >>> >
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> ~Milind bee at gee mail dot com
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > ~Milind bee at gee mail dot com
> > > >
> > >
> > >
> > >
> > > --
> > > ~Milind bee at gee mail dot com
> > >
> >
>


Re: "ExcludeNodes" for an Apex application

2016-12-02 Thread Amol Kekre
Stram node exclusion should be via Yarn; a poison pill is not a good way, as
it induces a termination for the wrong reasons.

Thks
Amol


On Fri, Dec 2, 2016 at 7:13 AM, Munagala Ramanath 
wrote:

> Could STRAM include a poison pill where it simply exits with diagnostic if
> its host name is blacklisted ?
>
> Ram
>
> On Thu, Dec 1, 2016 at 11:52 PM, Amol Kekre  wrote:
>
> > Yarn will deploy AM (Stram) on a node of its choice, thereby rendering any
> > attribute within the app un-enforceable in terms of not deploying master
> on
> > a node.
> >
> > Thks
> > Amol
> >
> >
> > On Thu, Dec 1, 2016 at 11:19 PM, Milind Barve  wrote:
> >
> > > Additionally, this would apply to Stram as well i.e. the master should
> > also
> > > not be deployed on these nodes. Not sure if anti-affinity goes beyond
> > > operators.
> > >
> > > On Fri, Dec 2, 2016 at 12:47 PM, Milind Barve 
> wrote:
> > >
> > > > My previous mail explains it, but just forgot to add : -1 to cover
> this
> > > > under anti affinity.
> > > >
> > > > On Fri, Dec 2, 2016 at 12:46 PM, Milind Barve 
> > wrote:
> > > >
> > > >> While it is possible to extend anti-affinity to take care of this, I
> > > feel
> > > >> it will cause confusion from a user perspective. As a user, when I
> > think
> > > >> about anti-affinity, what comes to mind right away is a relative
> > > relation
> > > >> between operators.
> > > >>
> > > >> On the other hand, the current ask is not that, but a relation at an
> > > >> application level w.r.t. a node. (Further, we might even think of
> > > extending
> > > >> this at an operator level - which would mean do not deploy an
> operator
> > > on a
> > > >> particular node)
> > > >>
> > > >> We would be better off clearly articulating and allowing users to
> > > >> configure it separately as against using anti-affinity.
> > > >>
> > > >> On Fri, Dec 2, 2016 at 10:03 AM, Bhupesh Chawda <
> > > bhup...@datatorrent.com>
> > > >> wrote:
> > > >>
> > > >>> Okay, I think that serves an alternate purpose of detecting any
> newly
> > > >>> gone
> > > >>> bad node and excluding it.
> > > >>>
> > > >>> +1 for covering the original scenario under anti-affinity.
> > > >>>
> > > >>> ~ Bhupesh
> > > >>>
> > > >>> On Fri, Dec 2, 2016 at 9:14 AM, Munagala Ramanath <
> > r...@datatorrent.com
> > > >
> > > >>> wrote:
> > > >>>
> > > >>> > It only takes effect after failures -- no way to exclude from the
> > > >>> get-go.
> > > >>> >
> > > >>> > Ram
> > > >>> >
> > > >>> > On Dec 1, 2016 7:15 PM, "Bhupesh Chawda" <
> bhup...@datatorrent.com>
> > > >>> wrote:
> > > >>> >
> > > >>> > > As suggested by Sandesh, the parameter
> > > >>> > > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST seems to do
> > > exactly
> > > >>> > what
> > > >>> > > is needed.
> > > >>> > > Why would this not work?
> > > >>> > >
> > > >>> > > ~ Bhupesh
> > > >>> > >
> > > >>> >
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> ~Milind bee at gee mail dot com
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > ~Milind bee at gee mail dot com
> > > >
> > >
> > >
> > >
> > > --
> > > ~Milind bee at gee mail dot com
> > >
> >
>


Re: "ExcludeNodes" for an Apex application

2016-12-02 Thread Munagala Ramanath
Could STRAM include a poison pill where it simply exits with a diagnostic if
its host name is blacklisted?

Ram

On Thu, Dec 1, 2016 at 11:52 PM, Amol Kekre  wrote:

> Yarn will deploy AM (Stram) on a node of its choice, thereby rendering any
> attribute within the app un-enforceable in terms of not deploying master on
> a node.
>
> Thks
> Amol
>
>
> On Thu, Dec 1, 2016 at 11:19 PM, Milind Barve  wrote:
>
> > Additionally, this would apply to Stram as well i.e. the master should
> also
> > not be deployed on these nodes. Not sure if anti-affinity goes beyond
> > operators.
> >
> > On Fri, Dec 2, 2016 at 12:47 PM, Milind Barve  wrote:
> >
> > > My previous mail explains it, but just forgot to add : -1 to cover this
> > > under anti affinity.
> > >
> > > On Fri, Dec 2, 2016 at 12:46 PM, Milind Barve 
> wrote:
> > >
> > >> While it is possible to extend anti-affinity to take care of this, I
> > feel
> > >> it will cause confusion from a user perspective. As a user, when I
> think
> > >> about anti-affinity, what comes to mind right away is a relative
> > relation
> > >> between operators.
> > >>
> > >> On the other hand, the current ask is not that, but a relation at an
> > >> application level w.r.t. a node. (Further, we might even think of
> > extending
> > >> this at an operator level - which would mean do not deploy an operator
> > on a
> > >> particular node)
> > >>
> > >> We would be better off clearly articulating and allowing users to
> > >> configure it separately as against using anti-affinity.
> > >>
> > >> On Fri, Dec 2, 2016 at 10:03 AM, Bhupesh Chawda <
> > bhup...@datatorrent.com>
> > >> wrote:
> > >>
> > >>> Okay, I think that serves an alternate purpose of detecting any newly
> > >>> gone
> > >>> bad node and excluding it.
> > >>>
> > >>> +1 for covering the original scenario under anti-affinity.
> > >>>
> > >>> ~ Bhupesh
> > >>>
> > >>> On Fri, Dec 2, 2016 at 9:14 AM, Munagala Ramanath <
> r...@datatorrent.com
> > >
> > >>> wrote:
> > >>>
> > >>> > It only takes effect after failures -- no way to exclude from the
> > >>> get-go.
> > >>> >
> > >>> > Ram
> > >>> >
> > >>> > On Dec 1, 2016 7:15 PM, "Bhupesh Chawda" 
> > >>> wrote:
> > >>> >
> > >>> > > As suggested by Sandesh, the parameter
> > >>> > > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST seems to do
> > exactly
> > >>> > what
> > >>> > > is needed.
> > >>> > > Why would this not work?
> > >>> > >
> > >>> > > ~ Bhupesh
> > >>> > >
> > >>> >
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> ~Milind bee at gee mail dot com
> > >>
> > >
> > >
> > >
> > > --
> > > ~Milind bee at gee mail dot com
> > >
> >
> >
> >
> > --
> > ~Milind bee at gee mail dot com
> >
>


Re: "ExcludeNodes" for an Apex application

2016-12-02 Thread Milind Barve
So all Apex needs to do is make sure, as part of the initial configuration
validation, that the node selected to run the master is not in the
"excludeNode" list.

On Fri, Dec 2, 2016 at 1:47 PM, Sandesh Hegde 
wrote:

> Yarn allows the AppMaster to run on the selected node, Apex shouldn't
> select the blacklisted nodes, so it is possible to achieve not running the
> Apex containers on certain nodes.
>
> http://stackoverflow.com/questions/29302659/run-my-own-
> application-master-on-a-specific-node-in-a-yarn-cluster
>
>
> On Thu, Dec 1, 2016 at 11:52 PM Amol Kekre  wrote:
>
> > Yarn will deploy AM (Stram) on a node of its choice, thereby rendering any
> > attribute within the app un-enforceable in terms of not deploying master
> on
> > a node.
> >
> > Thks
> > Amol
> >
> >
> > On Thu, Dec 1, 2016 at 11:19 PM, Milind Barve  wrote:
> >
> > > Additionally, this would apply to Stram as well i.e. the master should
> > also
> > > not be deployed on these nodes. Not sure if anti-affinity goes beyond
> > > operators.
> > >
> > > On Fri, Dec 2, 2016 at 12:47 PM, Milind Barve 
> wrote:
> > >
> > > > My previous mail explains it, but just forgot to add : -1 to cover
> this
> > > > under anti affinity.
> > > >
> > > > On Fri, Dec 2, 2016 at 12:46 PM, Milind Barve 
> > wrote:
> > > >
> > > >> While it is possible to extend anti-affinity to take care of this, I
> > > feel
> > > >> it will cause confusion from a user perspective. As a user, when I
> > think
> > > >> about anti-affinity, what comes to mind right away is a relative
> > > relation
> > > >> between operators.
> > > >>
> > > >> On the other hand, the current ask is not that, but a relation at an
> > > >> application level w.r.t. a node. (Further, we might even think of
> > > extending
> > > >> this at an operator level - which would mean do not deploy an
> operator
> > > on a
> > > >> particular node)
> > > >>
> > > >> We would be better off clearly articulating and allowing users to
> > > >> configure it separately as against using anti-affinity.
> > > >>
> > > >> On Fri, Dec 2, 2016 at 10:03 AM, Bhupesh Chawda <
> > > bhup...@datatorrent.com>
> > > >> wrote:
> > > >>
> > > >>> Okay, I think that serves an alternate purpose of detecting any
> newly
> > > >>> gone
> > > >>> bad node and excluding it.
> > > >>>
> > > >>> +1 for covering the original scenario under anti-affinity.
> > > >>>
> > > >>> ~ Bhupesh
> > > >>>
> > > >>> On Fri, Dec 2, 2016 at 9:14 AM, Munagala Ramanath <
> > r...@datatorrent.com
> > > >
> > > >>> wrote:
> > > >>>
> > > >>> > It only takes effect after failures -- no way to exclude from the
> > > >>> get-go.
> > > >>> >
> > > >>> > Ram
> > > >>> >
> > > >>> > On Dec 1, 2016 7:15 PM, "Bhupesh Chawda" <
> bhup...@datatorrent.com>
> > > >>> wrote:
> > > >>> >
> > > >>> > > As suggested by Sandesh, the parameter
> > > >>> > > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST seems to do
> > > exactly
> > > >>> > what
> > > >>> > > is needed.
> > > >>> > > Why would this not work?
> > > >>> > >
> > > >>> > > ~ Bhupesh
> > > >>> > >
> > > >>> >
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> ~Milind bee at gee mail dot com
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > ~Milind bee at gee mail dot com
> > > >
> > >
> > >
> > >
> > > --
> > > ~Milind bee at gee mail dot com
> > >
> >
>



-- 
~Milind bee at gee mail dot com


Re: "ExcludeNodes" for an Apex application

2016-12-02 Thread Sandesh Hegde
Yarn allows the AppMaster to run on a selected node, and Apex shouldn't
select blacklisted nodes, so it is possible to avoid running the Apex
containers on certain nodes.

http://stackoverflow.com/questions/29302659/run-my-own-application-master-on-a-specific-node-in-a-yarn-cluster
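
One way to picture "Apex shouldn't select the blacklisted nodes" is a simple
filter over candidate hosts, sketched below. This is purely illustrative: the
class, method and host names are hypothetical, and the sketch says nothing
about whether YARN honors the resulting preference for the AppMaster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper that filters candidate hosts against an exclude list. */
final class CandidateNodeFilter {

  /** Returns the candidates, in their original order, that are not excluded. */
  static List<String> allowedNodes(List<String> candidates, Set<String> excluded) {
    List<String> allowed = new ArrayList<>();
    for (String host : candidates) {
      if (!excluded.contains(host)) {
        allowed.add(host);
      }
    }
    return allowed;
  }

  public static void main(String[] args) {
    List<String> candidates = Arrays.asList("node18", "node19", "node20");
    Set<String> excluded = new HashSet<>(Arrays.asList("node20"));
    System.out.println(allowedNodes(candidates, excluded)); // prints [node18, node19]
  }
}
```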


On Thu, Dec 1, 2016 at 11:52 PM Amol Kekre  wrote:

> Yarn will deploy AM (Stram) on a node of its choice, thereby rendering any
> attribute within the app un-enforceable in terms of not deploying master on
> a node.
>
> Thks
> Amol
>
>
> On Thu, Dec 1, 2016 at 11:19 PM, Milind Barve  wrote:
>
> > Additionally, this would apply to Stram as well i.e. the master should
> also
> > not be deployed on these nodes. Not sure if anti-affinity goes beyond
> > operators.
> >
> > On Fri, Dec 2, 2016 at 12:47 PM, Milind Barve  wrote:
> >
> > > My previous mail explains it, but just forgot to add : -1 to cover this
> > > under anti affinity.
> > >
> > > On Fri, Dec 2, 2016 at 12:46 PM, Milind Barve 
> wrote:
> > >
> > >> While it is possible to extend anti-affinity to take care of this, I
> > feel
> > >> it will cause confusion from a user perspective. As a user, when I
> think
> > >> about anti-affinity, what comes to mind right away is a relative
> > relation
> > >> between operators.
> > >>
> > >> On the other hand, the current ask is not that, but a relation at an
> > >> application level w.r.t. a node. (Further, we might even think of
> > extending
> > >> this at an operator level - which would mean do not deploy an operator
> > on a
> > >> particular node)
> > >>
> > >> We would be better off clearly articulating and allowing users to
> > >> configure it separately as against using anti-affinity.
> > >>
> > >> On Fri, Dec 2, 2016 at 10:03 AM, Bhupesh Chawda <
> > bhup...@datatorrent.com>
> > >> wrote:
> > >>
> > >>> Okay, I think that serves an alternate purpose of detecting any newly
> > >>> gone
> > >>> bad node and excluding it.
> > >>>
> > >>> +1 for covering the original scenario under anti-affinity.
> > >>>
> > >>> ~ Bhupesh
> > >>>
> > >>> On Fri, Dec 2, 2016 at 9:14 AM, Munagala Ramanath <
> r...@datatorrent.com
> > >
> > >>> wrote:
> > >>>
> > >>> > It only takes effect after failures -- no way to exclude from the
> > >>> get-go.
> > >>> >
> > >>> > Ram
> > >>> >
> > >>> > On Dec 1, 2016 7:15 PM, "Bhupesh Chawda" 
> > >>> wrote:
> > >>> >
> > >>> > > As suggested by Sandesh, the parameter
> > >>> > > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST seems to do
> > exactly
> > >>> > what
> > >>> > > is needed.
> > >>> > > Why would this not work?
> > >>> > >
> > >>> > > ~ Bhupesh
> > >>> > >
> > >>> >
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> ~Milind bee at gee mail dot com
> > >>
> > >
> > >
> > >
> > > --
> > > ~Milind bee at gee mail dot com
> > >
> >
> >
> >
> > --
> > ~Milind bee at gee mail dot com
> >
>


Re: "ExcludeNodes" for an Apex application

2016-12-01 Thread Amol Kekre
Yarn will deploy the AM (Stram) on a node of its choice, thereby rendering any
attribute within the app unenforceable in terms of not deploying the master on
a node.

Thks
Amol


On Thu, Dec 1, 2016 at 11:19 PM, Milind Barve  wrote:

> Additionally, this would apply to Stram as well i.e. the master should also
> not be deployed on these nodes. Not sure if anti-affinity goes beyond
> operators.
>
> On Fri, Dec 2, 2016 at 12:47 PM, Milind Barve  wrote:
>
> > My previous mail explains it, but just forgot to add : -1 to cover this
> > under anti affinity.
> >
> > On Fri, Dec 2, 2016 at 12:46 PM, Milind Barve  wrote:
> >
> >> While it is possible to extend anti-affinity to take care of this, I
> feel
> >> it will cause confusion from a user perspective. As a user, when I think
> >> about anti-affinity, what comes to mind right away is a relative
> relation
> >> between operators.
> >>
> >> On the other hand, the current ask is not that, but a relation at an
> >> application level w.r.t. a node. (Further, we might even think of
> extending
> >> this at an operator level - which would mean do not deploy an operator
> on a
> >> particular node)
> >>
> >> We would be better off clearly articulating and allowing users to
> >> configure it separately as against using anti-affinity.
> >>
> >> On Fri, Dec 2, 2016 at 10:03 AM, Bhupesh Chawda <
> bhup...@datatorrent.com>
> >> wrote:
> >>
> >>> Okay, I think that serves an alternate purpose of detecting any newly
> >>> gone
> >>> bad node and excluding it.
> >>>
> >>> +1 for covering the original scenario under anti-affinity.
> >>>
> >>> ~ Bhupesh
> >>>
> >>> On Fri, Dec 2, 2016 at 9:14 AM, Munagala Ramanath  >
> >>> wrote:
> >>>
> >>> > It only takes effect after failures -- no way to exclude from the
> >>> get-go.
> >>> >
> >>> > Ram
> >>> >
> >>> > On Dec 1, 2016 7:15 PM, "Bhupesh Chawda" 
> >>> wrote:
> >>> >
> >>> > > As suggested by Sandesh, the parameter
> >>> > > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST seems to do
> exactly
> >>> > what
> >>> > > is needed.
> >>> > > Why would this not work?
> >>> > >
> >>> > > ~ Bhupesh
> >>> > >
> >>> >
> >>>
> >>
> >>
> >>
> >> --
> >> ~Milind bee at gee mail dot com
> >>
> >
> >
> >
> > --
> > ~Milind bee at gee mail dot com
> >
>
>
>
> --
> ~Milind bee at gee mail dot com
>


Re: "ExcludeNodes" for an Apex application

2016-12-01 Thread Milind Barve
While it is possible to extend anti-affinity to take care of this, I feel
it will cause confusion from a user perspective. As a user, when I think
about anti-affinity, what comes to mind right away is a relative relation
between operators.

On the other hand, the current ask is not that, but a relation at an
application level w.r.t. a node. (Further, we might even think of extending
this at an operator level - which would mean do not deploy an operator on a
particular node)

We would be better off clearly articulating and allowing users to configure
it separately as against using anti-affinity.

On Fri, Dec 2, 2016 at 10:03 AM, Bhupesh Chawda 
wrote:

> Okay, I think that serves an alternate purpose of detecting any newly gone
> bad node and excluding it.
>
> +1 for covering the original scenario under anti-affinity.
>
> ~ Bhupesh
>
> On Fri, Dec 2, 2016 at 9:14 AM, Munagala Ramanath 
> wrote:
>
> > It only takes effect after failures -- no way to exclude from the get-go.
> >
> > Ram
> >
> > On Dec 1, 2016 7:15 PM, "Bhupesh Chawda" 
> wrote:
> >
> > > As suggested by Sandesh, the parameter
> > > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST seems to do exactly
> > what
> > > is needed.
> > > Why would this not work?
> > >
> > > ~ Bhupesh
> > >
> >
>



-- 
~Milind bee at gee mail dot com


Re: "ExcludeNodes" for an Apex application

2016-12-01 Thread Bhupesh Chawda
Okay, I think that serves an alternate purpose of detecting any node that
has newly gone bad and excluding it.

+1 for covering the original scenario under anti-affinity.

~ Bhupesh

On Fri, Dec 2, 2016 at 9:14 AM, Munagala Ramanath 
wrote:

> It only takes effect after failures -- no way to exclude from the get-go.
>
> Ram
>
> On Dec 1, 2016 7:15 PM, "Bhupesh Chawda"  wrote:
>
> > As suggested by Sandesh, the parameter
> > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST seems to do exactly
> what
> > is needed.
> > Why would this not work?
> >
> > ~ Bhupesh
> >
>


Re: "ExcludeNodes" for an Apex application

2016-12-01 Thread Munagala Ramanath
It only takes effect after failures -- no way to exclude from the get-go.

Ram
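
To illustrate why a failure-count threshold cannot exclude a node from the
get-go, here is a small model of failure-driven blacklisting. It is an
assumption-laden sketch, not Apex's implementation: the class and method
names are hypothetical, and both the ">=" comparison and the reset on success
are guesses at what "consecutive" implies.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of failure-driven blacklisting; not Apex's implementation. */
final class FailureBlacklist {
  private final int maxConsecutiveFailures;
  private final Map<String, Integer> consecutiveFailures = new HashMap<>();
  private final Set<String> blacklisted = new HashSet<>();

  FailureBlacklist(int maxConsecutiveFailures) {
    this.maxConsecutiveFailures = maxConsecutiveFailures;
  }

  /** Records a container failure; blacklists the host once the threshold is reached. */
  void containerFailed(String host) {
    int count = consecutiveFailures.merge(host, 1, Integer::sum);
    if (count >= maxConsecutiveFailures) {
      blacklisted.add(host);
    }
  }

  /** A successful container run resets the consecutive-failure count (assumed). */
  void containerSucceeded(String host) {
    consecutiveFailures.remove(host);
  }

  boolean isBlacklisted(String host) {
    return blacklisted.contains(host);
  }

  public static void main(String[] args) {
    FailureBlacklist tracker = new FailureBlacklist(3);
    System.out.println(tracker.isBlacklisted("node20")); // false: no failures seen yet
    tracker.containerFailed("node20");
    tracker.containerFailed("node20");
    System.out.println(tracker.isBlacklisted("node20")); // false: below the threshold
    tracker.containerFailed("node20");
    System.out.println(tracker.isBlacklisted("node20")); // true: threshold reached
  }
}
```

In this model a known-bad node still receives containers until it has failed
the configured number of times, which is the gap a static exclude list would
close.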

On Dec 1, 2016 7:15 PM, "Bhupesh Chawda"  wrote:

> As suggested by Sandesh, the parameter
> MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST seems to do exactly what
> is needed.
> Why would this not work?
>
> ~ Bhupesh
>


Re: "ExcludeNodes" for an Apex application

2016-12-01 Thread AJAY GUPTA
Hi,

Can't we make use of the existing Node Label + queue feature in Yarn to
achieve this? Though we will have to redeploy the cluster, it's still
possible to exclude nodes.
https://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/NodeLabel.html


Thanks,
Ajay

On Fri, Dec 2, 2016 at 5:57 AM, Amol Kekre  wrote:

> I agree, this should be on top of affinity work
>
> Thks
> Amol
>
> On Thu, Dec 1, 2016 at 1:01 PM, Pramod Immaneni 
> wrote:
>
> > I see a host locality available as an attribute in DAG for individual
> > operators. If affinity doesn't support this today, we could probably add
> > it. You could also make setting a blacklist directly a convenience
> function
> > on top of affinity.
> >
> > On Thu, Dec 1, 2016 at 11:58 AM, Sandesh Hegde 
> > wrote:
> >
> > > Pramod,
> > >
> > > How to specify,  "don't deploy any operators on Node20" using
> > > anti-affinity?
> > >
> > > I don't see any examples here,
> > > http://apex.apache.org/docs/apex/application_development/#
> affinity-rules
> > >
> > >
> > > On Thu, Dec 1, 2016 at 11:31 AM Pramod Immaneni <
> pra...@datatorrent.com>
> > > wrote:
> > >
> > > > Shouldn't this be already covered by anti-affinity. Today users can
> > > specify
> > > > multiple affinity rules, for each rule they can specify positive or
> > > > negative affinity, locality and operator selection. If an affinity
> rule
> > > > specifying negative affinity, node locality and all operators, does
> not
> > > > work then let's fix that scenario instead of creating a new option.
> > > >
> > > > On Thu, Dec 1, 2016 at 11:17 AM, Sandesh Hegde <
> > sand...@datatorrent.com>
> > > > wrote:
> > > >
> > > > > I have created a jira, for adding the list of blacklisted nodes,
> > > > > https://issues.apache.org/jira/browse/APEXCORE-584
> > > > >
> > > > > On Wed, Nov 30, 2016 at 11:06 PM Sanjay Pujare <
> > san...@datatorrent.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Yes, Ram explained to me that in practice this would be a useful
> > > > feature
> > > > > > for Apex devops who typically have no control over Hadoop/Yarn
> > > cluster.
> > > > > >
> > > > > > On 11/30/16, 9:22 PM, "Mohit Jotwani" 
> > wrote:
> > > > > >
> > > > > > This is a practical scenario where developers would be
> required
> > > to
> > > > > > exclude
> > > > > > certain nodes as they might be required for some mission
> > critical
> > > > > > applications. It would be good to have this feature.
> > > > > >
> > > > > > I understand that Stram should not get into resourcing and
> > still
> > > > rely
> > > > > > on
> > > > > > Yarn, however, as the App Master it should have the right to
> > > reject
> > > > > the
> > > > > > nodes offered by Yarn and request for other resources.
> > > > > >
> > > > > > Regards,
> > > > > > Mohit
> > > > > >
> > > > > > On Thu, Dec 1, 2016 at 2:34 AM, Sandesh Hegde <
> > > > > sand...@datatorrent.com
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Apex has automatic blacklisting of the troublesome nodes,
> > > please
> > > > > > take a
> > > > > > > look at the following attributes,
> > > > > > >
> > > > > > > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
> > > > > > > https://www.datatorrent.com/docs/apidocs/com/datatorrent/
> > > > > > > api/Context.DAGContext.html#MAX_CONSECUTIVE_CONTAINER_
> > > > > > > FAILURES_FOR_BLACKLIST
> > > > > > >
> > > > > > > BLACKLISTED_NODE_REMOVAL_TIME_MILLIS
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Nov 30, 2016 at 12:56 PM Munagala Ramanath <
> > > > > > r...@datatorrent.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > Not sure if this is what Milind had in mind but we often
> run
> > > into
> > > > > > > situations where the dev group
> > > > > > > working with Apex has no control over cluster configuration
> > --
> > > to
> > > > > > make any
> > > > > > > changes to the cluster they need to
> > > > > > > go through an elaborate process that can take many days.
> > > > > > >
> > > > > > > Meanwhile, if they notice that a particular node is
> > > consistently
> > > > > > causing
> > > > > > > problems for their
> > > > > > > app, having a simple way to exclude it would be very
> helpful
> > > > since
> > > > > > it gives
> > > > > > > them a way
> > > > > > > to bypass communication and process issues within their own
> > > > > > organization.
> > > > > > >
> > > > > > > Ram
> > > > > > >
> > > > > > > On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare <
> > > > > > san...@datatorrent.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > To me both use cases appear to be generic resource
> > management
> > > > use
> > > > > > cases.
> > > > > > > > For example, a randomly rebooting node is not 

Re: "ExcludeNodes" for an Apex application

2016-12-01 Thread Amol Kekre
I agree, this should be on top of affinity work

Thks
Amol

On Thu, Dec 1, 2016 at 1:01 PM, Pramod Immaneni 
wrote:

> I see a host locality available as an attribute in DAG for individual
> operators. If affinity doesn't support this today, we could probably add
> it. You could also make setting a blacklist directly a convenience function
> on top of affinity.
>
> On Thu, Dec 1, 2016 at 11:58 AM, Sandesh Hegde 
> wrote:
>
> > Pramod,
> >
> > How to specify,  "don't deploy any operators on Node20" using
> > anti-affinity?
> >
> > I don't see any examples here,
> > http://apex.apache.org/docs/apex/application_development/#affinity-rules
> >
> >
> > On Thu, Dec 1, 2016 at 11:31 AM Pramod Immaneni 
> > wrote:
> >
> > > Shouldn't this be already covered by anti-affinity. Today users can
> > specify
> > > multiple affinity rules, for each rule they can specify positive or
> > > negative affinity, locality and operator selection. If an affinity rule
> > > specifying negative affinity, node locality and all operators, does not
> > > work then let's fix that scenario instead of creating a new option.
> > >
> > > On Thu, Dec 1, 2016 at 11:17 AM, Sandesh Hegde <
> sand...@datatorrent.com>
> > > wrote:
> > >
> > > > I have created a jira, for adding the list of blacklisted nodes,
> > > > https://issues.apache.org/jira/browse/APEXCORE-584
> > > >
> > > > On Wed, Nov 30, 2016 at 11:06 PM Sanjay Pujare <
> san...@datatorrent.com
> > >
> > > > wrote:
> > > >
> > > > > Yes, Ram explained to me that in practice this would be a useful
> > > feature
> > > > > for Apex devops who typically have no control over Hadoop/Yarn
> > cluster.
> > > > >
> > > > > On 11/30/16, 9:22 PM, "Mohit Jotwani" 
> wrote:
> > > > >
> > > > > This is a practical scenario where developers would be required
> > to
> > > > > exclude
> > > > > certain nodes as they might be required for some mission
> critical
> > > > > applications. It would be good to have this feature.
> > > > >
> > > > > I understand that Stram should not get into resourcing and
> still
> > > rely
> > > > > on
> > > > > Yarn, however, as the App Master it should have the right to
> > reject
> > > > the
> > > > > nodes offered by Yarn and request for other resources.
> > > > >
> > > > > Regards,
> > > > > Mohit
> > > > >
> > > > > On Thu, Dec 1, 2016 at 2:34 AM, Sandesh Hegde <
> > > > sand...@datatorrent.com
> > > > > >
> > > > > wrote:
> > > > >
> > > > > > Apex has automatic blacklisting of the troublesome nodes,
> > please
> > > > > take a
> > > > > > look at the following attributes,
> > > > > >
> > > > > > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
> > > > > > https://www.datatorrent.com/docs/apidocs/com/datatorrent/
> > > > > > api/Context.DAGContext.html#MAX_CONSECUTIVE_CONTAINER_
> > > > > > FAILURES_FOR_BLACKLIST
> > > > > >
> > > > > > BLACKLISTED_NODE_REMOVAL_TIME_MILLIS
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Nov 30, 2016 at 12:56 PM Munagala Ramanath <
> > > > > r...@datatorrent.com>
> > > > > > wrote:
> > > > > >
> > > > > > Not sure if this is what Milind had in mind but we often run
> > into
> > > > > > situations where the dev group
> > > > > > working with Apex has no control over cluster configuration
> --
> > to
> > > > > make any
> > > > > > changes to the cluster they need to
> > > > > > go through an elaborate process that can take many days.
> > > > > >
> > > > > > Meanwhile, if they notice that a particular node is
> > consistently
> > > > > causing
> > > > > > problems for their
> > > > > > app, having a simple way to exclude it would be very helpful
> > > since
> > > > > it gives
> > > > > > them a way
> > > > > > to bypass communication and process issues within their own
> > > > > organization.
> > > > > >
> > > > > > Ram
> > > > > >
> > > > > > On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare <
> > > > > san...@datatorrent.com>
> > > > > > wrote:
> > > > > >
> > > > > > > To me both use cases appear to be generic resource
> management
> > > use
> > > > > cases.
> > > > > > > For example, a randomly rebooting node is not good for any
> > > > purpose
> > > > > esp.
> > > > > > > long running apps so it is a bit of a stretch to imagine
> that
> > > > > these nodes
> > > > > > > will be acceptable for some batch jobs in Yarn. So such a
> > node
> > > > > should be
> > > > > > > marked “Bad” or Unavailable in Yarn itself.
> > > > > > >
> > > > > > > Second use case is also typical anti-affinity use case
> which
> > > > > ideally
> > > > > > > should be implemented in Yarn – Milind’s example can also
> > apply
> > > > to
> > > > > > non-Apex
> > > > > > > batch jobs. In any case it looks like Yarn 

Re: "ExcludeNodes" for an Apex application

2016-12-01 Thread Pramod Immaneni
I see that host locality is available as a DAG attribute for individual
operators. If affinity doesn't support this today, we could probably add
it. You could also make setting a blacklist directly a convenience function
on top of affinity.
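
A rough sketch of what such a convenience function might look like follows.
It is a self-contained model, not the Apex API: `SketchRule`, its fields,
`blacklistToRules` and `violates` are hypothetical names invented for
illustration, and it assumes an anti-affinity rule can name a host directly,
which is exactly the open question in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of an affinity rule; NOT the actual Apex class.
class SketchRule {
    final boolean negative;       // true means anti-affinity
    final String locality;        // e.g. "NODE"
    final String operatorRegex;   // operators the rule applies to
    final String host;            // host the rule refers to (assumed to be expressible)

    SketchRule(boolean negative, String locality, String operatorRegex, String host) {
        this.negative = negative;
        this.locality = locality;
        this.operatorRegex = operatorRegex;
        this.host = host;
    }
}

class BlacklistConvenience {
    // Expands a host blacklist into one anti-affinity rule per host,
    // each covering all operators (regex ".*").
    static List<SketchRule> blacklistToRules(List<String> hosts) {
        List<SketchRule> rules = new ArrayList<>();
        for (String host : hosts) {
            rules.add(new SketchRule(true, "NODE", ".*", host));
        }
        return rules;
    }

    // Returns true if placing the named operator on the host violates any rule.
    static boolean violates(List<SketchRule> rules, String operator, String host) {
        for (SketchRule r : rules) {
            if (r.negative && r.host.equals(host) && operator.matches(r.operatorRegex)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<SketchRule> rules = blacklistToRules(Arrays.asList("Node20"));
        System.out.println(violates(rules, "anyOperator", "Node20"));
        System.out.println(violates(rules, "anyOperator", "Node21"));
    }
}
```

The point of the design is that users would only supply host names, while
the rule expansion stays an internal detail on top of affinity.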

On Thu, Dec 1, 2016 at 11:58 AM, Sandesh Hegde 
wrote:

> Pramod,
>
> How do I specify "don't deploy any operators on Node20" using
> anti-affinity?
>
> I don't see any examples here,
> http://apex.apache.org/docs/apex/application_development/#affinity-rules
>
>
> On Thu, Dec 1, 2016 at 11:31 AM Pramod Immaneni 
> wrote:
>
> > Shouldn't this already be covered by anti-affinity? Today users can
> > specify multiple affinity rules; for each rule they can specify positive
> > or negative affinity, locality and operator selection. If an affinity
> > rule specifying negative affinity, node locality and all operators does
> > not work, then let's fix that scenario instead of creating a new option.
> >
> > On Thu, Dec 1, 2016 at 11:17 AM, Sandesh Hegde 
> > wrote:
> >
> > > I have created a jira, for adding the list of blacklisted nodes,
> > > https://issues.apache.org/jira/browse/APEXCORE-584
> > >
> > > On Wed, Nov 30, 2016 at 11:06 PM Sanjay Pujare  >
> > > wrote:
> > >
> > > > Yes, Ram explained to me that in practice this would be a useful
> > feature
> > > > for Apex devops who typically have no control over Hadoop/Yarn
> cluster.
> > > >
> > > > On 11/30/16, 9:22 PM, "Mohit Jotwani"  wrote:
> > > >
> > > > This is a practical scenario where developers would be required
> to
> > > > exclude
> > > > certain nodes as they might be required for some mission critical
> > > > applications. It would be good to have this feature.
> > > >
> > > > I understand that Stram should not get into resourcing and still
> > rely
> > > > on
> > > > Yarn, however, as the App Master it should have the right to
> reject
> > > the
> > > > nodes offered by Yarn and request for other resources.
> > > >
> > > > Regards,
> > > > Mohit
> > > >
> > > > On Thu, Dec 1, 2016 at 2:34 AM, Sandesh Hegde <
> > > sand...@datatorrent.com
> > > > >
> > > > wrote:
> > > >
> > > > > Apex has automatic blacklisting of the troublesome nodes,
> please
> > > > take a
> > > > > look at the following attributes,
> > > > >
> > > > > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
> > > > > https://www.datatorrent.com/docs/apidocs/com/datatorrent/
> > > > > api/Context.DAGContext.html#MAX_CONSECUTIVE_CONTAINER_
> > > > > FAILURES_FOR_BLACKLIST
> > > > >
> > > > > BLACKLISTED_NODE_REMOVAL_TIME_MILLIS
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Nov 30, 2016 at 12:56 PM Munagala Ramanath <
> > > > r...@datatorrent.com>
> > > > > wrote:
> > > > >
> > > > > Not sure if this is what Milind had in mind but we often run
> into
> > > > > situations where the dev group
> > > > > working with Apex has no control over cluster configuration --
> to
> > > > make any
> > > > > changes to the cluster they need to
> > > > > go through an elaborate process that can take many days.
> > > > >
> > > > > Meanwhile, if they notice that a particular node is
> consistently
> > > > causing
> > > > > problems for their
> > > > > app, having a simple way to exclude it would be very helpful
> > since
> > > > it gives
> > > > > them a way
> > > > > to bypass communication and process issues within their own
> > > > organization.
> > > > >
> > > > > Ram
> > > > >
> > > > > On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare <
> > > > san...@datatorrent.com>
> > > > > wrote:
> > > > >
> > > > > > To me both use cases appear to be generic resource management
> > use
> > > > cases.
> > > > > > For example, a randomly rebooting node is not good for any
> > > purpose
> > > > esp.
> > > > > > long running apps so it is a bit of a stretch to imagine that
> > > > these nodes
> > > > > > will be acceptable for some batch jobs in Yarn. So such a
> node
> > > > should be
> > > > > > marked “Bad” or Unavailable in Yarn itself.
> > > > > >
> > > > > > Second use case is also typical anti-affinity use case which
> > > > ideally
> > > > > > should be implemented in Yarn – Milind’s example can also
> apply
> > > to
> > > > > non-Apex
> > > > > > batch jobs. In any case it looks like Yarn still doesn’t have
> > it
> > > (
> > > > > > https://issues.apache.org/jira/browse/YARN-1042) so if Apex
> > > needs
> > > > it we
> > > > > > will need to do it ourselves.
> > > > > >
> > > > > > On 11/30/16, 10:39 AM, "Munagala Ramanath" <
> > r...@datatorrent.com>
> > > > wrote:
> > > > > >
> > > > > > But then, what's the solution to the 2 problem scenarios
> > that
> > > > Milind
> > > > 

Re: "ExcludeNodes" for an Apex application

2016-12-01 Thread Pramod Immaneni
Shouldn't this already be covered by anti-affinity? Today users can specify
multiple affinity rules; for each rule they can specify positive or
negative affinity, locality and operator selection. If an affinity rule
specifying negative affinity, node locality and all operators does not
work, then let's fix that scenario instead of creating a new option.

On Thu, Dec 1, 2016 at 11:17 AM, Sandesh Hegde 
wrote:

> I have created a jira, for adding the list of blacklisted nodes,
> https://issues.apache.org/jira/browse/APEXCORE-584
>
> On Wed, Nov 30, 2016 at 11:06 PM Sanjay Pujare 
> wrote:
>
> > Yes, Ram explained to me that in practice this would be a useful feature
> > for Apex devops who typically have no control over Hadoop/Yarn cluster.
> >
> > On 11/30/16, 9:22 PM, "Mohit Jotwani"  wrote:
> >
> > This is a practical scenario where developers would be required to
> > exclude
> > certain nodes as they might be required for some mission critical
> > applications. It would be good to have this feature.
> >
> > I understand that Stram should not get into resourcing and still rely
> > on
> > Yarn, however, as the App Master it should have the right to reject
> the
> > nodes offered by Yarn and request for other resources.
> >
> > Regards,
> > Mohit
> >
> > On Thu, Dec 1, 2016 at 2:34 AM, Sandesh Hegde <
> sand...@datatorrent.com
> > >
> > wrote:
> >
> > > Apex has automatic blacklisting of the troublesome nodes, please
> > take a
> > > look at the following attributes,
> > >
> > > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
> > > https://www.datatorrent.com/docs/apidocs/com/datatorrent/
> > > api/Context.DAGContext.html#MAX_CONSECUTIVE_CONTAINER_
> > > FAILURES_FOR_BLACKLIST
> > >
> > > BLACKLISTED_NODE_REMOVAL_TIME_MILLIS
> > >
> > > Thanks
> > >
> > >
> > >
> > > On Wed, Nov 30, 2016 at 12:56 PM Munagala Ramanath <
> > r...@datatorrent.com>
> > > wrote:
> > >
> > > Not sure if this is what Milind had in mind but we often run into
> > > situations where the dev group
> > > working with Apex has no control over cluster configuration -- to
> > make any
> > > changes to the cluster they need to
> > > go through an elaborate process that can take many days.
> > >
> > > Meanwhile, if they notice that a particular node is consistently
> > causing
> > > problems for their
> > > app, having a simple way to exclude it would be very helpful since
> > it gives
> > > them a way
> > > to bypass communication and process issues within their own
> > organization.
> > >
> > > Ram
> > >
> > > On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare <
> > san...@datatorrent.com>
> > > wrote:
> > >
> > > > To me both use cases appear to be generic resource management use
> > cases.
> > > > For example, a randomly rebooting node is not good for any
> purpose
> > esp.
> > > > long running apps so it is a bit of a stretch to imagine that
> > these nodes
> > > > will be acceptable for some batch jobs in Yarn. So such a node
> > should be
> > > > marked “Bad” or Unavailable in Yarn itself.
> > > >
> > > > Second use case is also typical anti-affinity use case which
> > ideally
> > > > should be implemented in Yarn – Milind’s example can also apply
> to
> > > non-Apex
> > > > batch jobs. In any case it looks like Yarn still doesn’t have it
> (
> > > > https://issues.apache.org/jira/browse/YARN-1042) so if Apex
> needs
> > it we
> > > > will need to do it ourselves.
> > > >
> > > > On 11/30/16, 10:39 AM, "Munagala Ramanath" 
> > wrote:
> > > >
> > > > But then, what's the solution to the 2 problem scenarios that
> > Milind
> > > > describes ?
> > > >
> > > > Ram
> > > >
> > > > On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare <
> > > > san...@datatorrent.com>
> > > > wrote:
> > > >
> > > > > I think “exclude nodes” and such is really the job of the
> > resource
> > > > manager
> > > > > i.e. Yarn. So I am not sure taking over some of these tasks
> > in Apex
> > > > would
> > > > > be very useful.
> > > > >
> > > > > I agree with Amol that apps should be node neutral.
> Resource
> > > > management in
> > > > > Yarn together with fault tolerance in Apex should minimize
> > the need
> > > > for
> > > > > this feature although I am sure one can find use cases.
> > > > >
> > > > >
> > > > > On 11/29/16, 10:41 PM, "Amol Kekre" 
> > wrote:
> > > > >
> > > > > We do have this feature in Yarn, but that applies to
> all
> > > > applications.
> > > > > I am
> > > > > not sure if Yarn has anti-affinity. This 

Re: "ExcludeNodes" for an Apex application

2016-12-01 Thread Sandesh Hegde
I have created a JIRA for adding the list of blacklisted nodes:
https://issues.apache.org/jira/browse/APEXCORE-584

On Wed, Nov 30, 2016 at 11:06 PM Sanjay Pujare 
wrote:

> Yes, Ram explained to me that in practice this would be a useful feature
> for Apex devops who typically have no control over Hadoop/Yarn cluster.
>
> On 11/30/16, 9:22 PM, "Mohit Jotwani"  wrote:
>
> This is a practical scenario where developers would be required to
> exclude
> certain nodes as they might be required for some mission critical
> applications. It would be good to have this feature.
>
> I understand that Stram should not get into resourcing and still rely
> on
> Yarn, however, as the App Master it should have the right to reject the
> nodes offered by Yarn and request for other resources.
>
> Regards,
> Mohit
>
> On Thu, Dec 1, 2016 at 2:34 AM, Sandesh Hegde  >
> wrote:
>
> > Apex has automatic blacklisting of the troublesome nodes, please
> take a
> > look at the following attributes,
> >
> > MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
> > https://www.datatorrent.com/docs/apidocs/com/datatorrent/
> > api/Context.DAGContext.html#MAX_CONSECUTIVE_CONTAINER_
> > FAILURES_FOR_BLACKLIST
> >
> > BLACKLISTED_NODE_REMOVAL_TIME_MILLIS
> >
> > Thanks
> >
> >
> >
> > On Wed, Nov 30, 2016 at 12:56 PM Munagala Ramanath <
> r...@datatorrent.com>
> > wrote:
> >
> > Not sure if this is what Milind had in mind but we often run into
> > situations where the dev group
> > working with Apex has no control over cluster configuration -- to
> make any
> > changes to the cluster they need to
> > go through an elaborate process that can take many days.
> >
> > Meanwhile, if they notice that a particular node is consistently
> causing
> > problems for their
> > app, having a simple way to exclude it would be very helpful since
> it gives
> > them a way
> > to bypass communication and process issues within their own
> organization.
> >
> > Ram
> >
> > On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare <
> san...@datatorrent.com>
> > wrote:
> >
> > > To me both use cases appear to be generic resource management use
> cases.
> > > For example, a randomly rebooting node is not good for any purpose
> esp.
> > > long running apps so it is a bit of a stretch to imagine that
> these nodes
> > > will be acceptable for some batch jobs in Yarn. So such a node
> should be
> > > marked “Bad” or Unavailable in Yarn itself.
> > >
> > > Second use case is also typical anti-affinity use case which
> ideally
> > > should be implemented in Yarn – Milind’s example can also apply to
> > non-Apex
> > > batch jobs. In any case it looks like Yarn still doesn’t have it (
> > > https://issues.apache.org/jira/browse/YARN-1042) so if Apex needs
> it we
> > > will need to do it ourselves.
> > >
> > > On 11/30/16, 10:39 AM, "Munagala Ramanath" 
> wrote:
> > >
> > > But then, what's the solution to the 2 problem scenarios that
> Milind
> > > describes ?
> > >
> > > Ram
> > >
> > > On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare <
> > > san...@datatorrent.com>
> > > wrote:
> > >
> > > > I think “exclude nodes” and such is really the job of the
> resource
> > > manager
> > > > i.e. Yarn. So I am not sure taking over some of these tasks
> in Apex
> > > would
> > > > be very useful.
> > > >
> > > > I agree with Amol that apps should be node neutral. Resource
> > > management in
> > > > Yarn together with fault tolerance in Apex should minimize
> the need
> > > for
> > > > this feature although I am sure one can find use cases.
> > > >
> > > >
> > > > On 11/29/16, 10:41 PM, "Amol Kekre" 
> wrote:
> > > >
> > > > We do have this feature in Yarn, but that applies to all
> > > applications.
> > > > I am
> > > > not sure if Yarn has anti-affinity. This feature may be
> used,
> > > but in
> > > > general there is danger is an application taking over
> resource
> > > > allocation.
> > > > Another quirk is that big data apps should ideally be
> > > node-neutral.
> > > > This is
> > > > a good idea, if we are able to carve out something where
> need
> > is
> > > app
> > > > specific.
> > > >
> > > > Thks
> > > > Amol
> > > >
> > > >
> > > > On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve <
> > > mili...@gmail.com>
> > > > wrote:
> > > >
> > > > > We have seen 2 cases mentioned 

Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Sanjay Pujare
Yes, Ram explained to me that in practice this would be a useful feature for
Apex devops, who typically have no control over the Hadoop/Yarn cluster.

On 11/30/16, 9:22 PM, "Mohit Jotwani"  wrote:

This is a practical scenario where developers would be required to exclude
certain nodes, as they might be required for some mission critical
applications. It would be good to have this feature.

I understand that Stram should not get into resourcing and should still rely
on Yarn; however, as the App Master it should have the right to reject the
nodes offered by Yarn and request other resources.
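
The reject-and-request-again idea can be sketched roughly as below. This is
a minimal, self-contained model under stated assumptions: `Offer`,
`ExcludeFilter` and `process` are hypothetical names, and no YARN client
calls are made. A real application master would release the rejected
containers back to Yarn and issue new requests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a container offered by the resource manager.
class Offer {
    final String containerId;
    final String host;

    Offer(String containerId, String host) {
        this.containerId = containerId;
        this.host = host;
    }
}

class ExcludeFilter {
    private final Set<String> excludedHosts;
    final List<Offer> accepted = new ArrayList<>();
    final List<Offer> rejected = new ArrayList<>();

    ExcludeFilter(Set<String> excludedHosts) {
        this.excludedHosts = new HashSet<>(excludedHosts);
    }

    // Sorts each offer into accepted or rejected and returns the number of
    // replacement containers that would have to be requested again.
    int process(List<Offer> offers) {
        int toRequestAgain = 0;
        for (Offer o : offers) {
            if (excludedHosts.contains(o.host)) {
                rejected.add(o);      // would be released back to the resource manager
                toRequestAgain++;
            } else {
                accepted.add(o);
            }
        }
        return toRequestAgain;
    }

    public static void main(String[] args) {
        ExcludeFilter filter = new ExcludeFilter(new HashSet<>(Arrays.asList("node20")));
        int again = filter.process(Arrays.asList(
            new Offer("c1", "node20"), new Offer("c2", "node7")));
        System.out.println("containers to request again: " + again);
    }
}
```

Note that a filter like this only covers containers the application master
receives after it has started; it cannot influence where Yarn places the
master itself, a limitation raised elsewhere in this thread.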

Regards,
Mohit

On Thu, Dec 1, 2016 at 2:34 AM, Sandesh Hegde 
wrote:

> Apex has automatic blacklisting of the troublesome nodes, please take a
> look at the following attributes,
>
> MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
> https://www.datatorrent.com/docs/apidocs/com/datatorrent/
> api/Context.DAGContext.html#MAX_CONSECUTIVE_CONTAINER_
> FAILURES_FOR_BLACKLIST
>
> BLACKLISTED_NODE_REMOVAL_TIME_MILLIS
>
> Thanks
>
>
>
> On Wed, Nov 30, 2016 at 12:56 PM Munagala Ramanath 
> wrote:
>
> Not sure if this is what Milind had in mind but we often run into
> situations where the dev group working with Apex has no control over
> cluster configuration -- to make any changes to the cluster they need to
> go through an elaborate process that can take many days.
>
> Meanwhile, if they notice that a particular node is consistently causing
> problems for their app, having a simple way to exclude it would be very
> helpful since it gives them a way to bypass communication and process
> issues within their own organization.
>
> Ram
>
> On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare 
> wrote:
>
> > To me both use cases appear to be generic resource management use cases.
> > For example, a randomly rebooting node is not good for any purpose esp.
> > long running apps so it is a bit of a stretch to imagine that these
> > nodes will be acceptable for some batch jobs in Yarn. So such a node
> > should be marked “Bad” or Unavailable in Yarn itself.
> >
> > Second use case is also typical anti-affinity use case which ideally
> > should be implemented in Yarn – Milind’s example can also apply to
> > non-Apex batch jobs. In any case it looks like Yarn still doesn’t have
> > it (https://issues.apache.org/jira/browse/YARN-1042) so if Apex needs it
> > we will need to do it ourselves.
> >
> > On 11/30/16, 10:39 AM, "Munagala Ramanath"  wrote:
> >
> > But then, what's the solution to the 2 problem scenarios that Milind
> > describes ?
> >
> > Ram
> >
> > On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare <
> > san...@datatorrent.com>
> > wrote:
> >
> > > I think “exclude nodes” and such is really the job of the resource
> > > manager i.e. Yarn. So I am not sure taking over some of these tasks in
> > > Apex would be very useful.
> > >
> > > I agree with Amol that apps should be node neutral. Resource
> > > management in Yarn together with fault tolerance in Apex should
> > > minimize the need for this feature although I am sure one can find use
> > > cases.
> > >
> > >
> > > On 11/29/16, 10:41 PM, "Amol Kekre"  wrote:
> > >
> > > We do have this feature in Yarn, but that applies to all
> > applications.
> > > I am
> > > not sure if Yarn has anti-affinity. This feature may be used,
> > but in
> > > general there is danger is an application taking over resource
> > > allocation.
> > > Another quirk is that big data apps should ideally be
> > node-neutral.
> > > This is
> > > a good idea, if we are able to carve out something where need
> is
> > app
> > > specific.
> > >
> > > Thks
> > > Amol
> > >
> > >
> > > On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve <
> > mili...@gmail.com>
> > > wrote:
> > >
> > > > We have seen 2 cases mentioned below, where, it would have
> > been nice
> > > if
> > > > Apex allowed us to exclude a node from the cluster for an
> > > application.
> > > >
> > > > 1. A node in the cluster had gone bad (was randomly
> rebooting)
> > and
> > > so an
> > > > Apex app should not use it - other apps can use it as they
> were
> > > batch jobs.
> > > > 2. A node is 

Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Sandesh Hegde
Apex has automatic blacklisting of troublesome nodes; please take a
look at the following attributes:

MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
https://www.datatorrent.com/docs/apidocs/com/datatorrent/api/Context.DAGContext.html#MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST

BLACKLISTED_NODE_REMOVAL_TIME_MILLIS
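
The behaviour these two attribute names suggest can be sketched as follows.
This is a model built on assumptions, not Stram's implementation: the class
`FailureBlacklist`, its methods, and the exact semantics (counting
consecutive failures per host, a success resetting the count, entries
expiring after a fixed time) are inferred from the attribute names only.

```java
import java.util.HashMap;
import java.util.Map;

class FailureBlacklist {
    // Models MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST (assumed meaning).
    private final int maxConsecutiveFailures;
    // Models BLACKLISTED_NODE_REMOVAL_TIME_MILLIS (assumed meaning).
    private final long removalTimeMillis;

    private final Map<String, Integer> consecutiveFailures = new HashMap<>();
    private final Map<String, Long> blacklistedAt = new HashMap<>();

    FailureBlacklist(int maxConsecutiveFailures, long removalTimeMillis) {
        this.maxConsecutiveFailures = maxConsecutiveFailures;
        this.removalTimeMillis = removalTimeMillis;
    }

    // Records a container failure on a host; blacklists the host once the
    // consecutive failure count reaches the threshold.
    void containerFailed(String host, long nowMillis) {
        int count = consecutiveFailures.merge(host, 1, Integer::sum);
        if (count >= maxConsecutiveFailures) {
            blacklistedAt.put(host, nowMillis);
        }
    }

    // Assumption: a successful container resets the consecutive count.
    void containerSucceeded(String host) {
        consecutiveFailures.remove(host);
    }

    // A host stays blacklisted until the removal time has elapsed.
    boolean isBlacklisted(String host, long nowMillis) {
        Long since = blacklistedAt.get(host);
        if (since == null) {
            return false;
        }
        if (nowMillis - since >= removalTimeMillis) {
            blacklistedAt.remove(host);
            consecutiveFailures.remove(host);
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        FailureBlacklist blacklist = new FailureBlacklist(2, 60_000L);
        blacklist.containerFailed("node20", 0L);
        blacklist.containerFailed("node20", 1_000L);
        System.out.println(blacklist.isBlacklisted("node20", 2_000L));
    }
}
```

Unlike an explicit exclude list, this mechanism only reacts after failures
have been observed, which is why the thread goes on to discuss a
user-supplied list.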

Thanks



On Wed, Nov 30, 2016 at 12:56 PM Munagala Ramanath 
wrote:

Not sure if this is what Milind had in mind but we often run into
situations where the dev group working with Apex has no control over
cluster configuration -- to make any changes to the cluster they need to
go through an elaborate process that can take many days.

Meanwhile, if they notice that a particular node is consistently causing
problems for their app, having a simple way to exclude it would be very
helpful since it gives them a way to bypass communication and process
issues within their own organization.

Ram

On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare 
wrote:

> To me both use cases appear to be generic resource management use cases.
> For example, a randomly rebooting node is not good for any purpose esp.
> long running apps so it is a bit of a stretch to imagine that these nodes
> will be acceptable for some batch jobs in Yarn. So such a node should be
> marked “Bad” or Unavailable in Yarn itself.
>
> Second use case is also typical anti-affinity use case which ideally
> should be implemented in Yarn – Milind’s example can also apply to
> non-Apex batch jobs. In any case it looks like Yarn still doesn’t have it
> (https://issues.apache.org/jira/browse/YARN-1042) so if Apex needs it we
> will need to do it ourselves.
>
> On 11/30/16, 10:39 AM, "Munagala Ramanath"  wrote:
>
> But then, what's the solution to the 2 problem scenarios that Milind
> describes ?
>
> Ram
>
> On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare <
> san...@datatorrent.com>
> wrote:
>
> > I think “exclude nodes” and such is really the job of the resource
> manager
> > i.e. Yarn. So I am not sure taking over some of these tasks in Apex
> would
> > be very useful.
> >
> > I agree with Amol that apps should be node neutral. Resource
> management in
> > Yarn together with fault tolerance in Apex should minimize the need
> for
> > this feature although I am sure one can find use cases.
> >
> >
> > On 11/29/16, 10:41 PM, "Amol Kekre"  wrote:
> >
> > We do have this feature in Yarn, but that applies to all applications.
> > I am not sure if Yarn has anti-affinity. This feature may be used, but
> > in general there is danger in an application taking over resource
> > allocation. Another quirk is that big data apps should ideally be
> > node-neutral. This is a good idea, if we are able to carve out something
> > where the need is app specific.
> >
> > Thks
> > Amol
> >
> >
> > On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve <
> mili...@gmail.com>
> > wrote:
> >
> > > We have seen the 2 cases mentioned below, where it would have been
> > > nice if Apex allowed us to exclude a node from the cluster for an
> > > application.
> > >
> > > 1. A node in the cluster had gone bad (was randomly rebooting) and so
> > > an Apex app should not use it - other apps can use it as they were
> > > batch jobs.
> > > 2. A node is being used for a mission critical app (could be an Apex
> > > app itself), but another Apex app which is mission critical should
> > > not be using resources on that node.
> > >
> > > Can we have a way in which Stram and YARN can coordinate with each
> > > other to not use a set of nodes for the application? It can be done
> > > in 2 ways:
> > >
> > > 1. Have a list of "exclude" nodes with Stram: when YARN allocates
> > > resources on any of these, Stram rejects them and gets resources
> > > allocated again from YARN.
> > > 2. Have a list of nodes that can be used for an app. This can be a
> > > part of config. However, I don't think this would be the right way to
> > > do so, as we will need support from YARN as well. Further, this might
> > > be difficult to change at runtime if need be.
> > >
> > > Any thoughts?
> > >
> > >
> > > --
> > > ~Milind bee at gee mail dot com
> > >
> >
> >
> >
> >
>
>
>
>


Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Munagala Ramanath
Not sure if this is what Milind had in mind but we often run into
situations where the dev group working with Apex has no control over
cluster configuration -- to make any changes to the cluster they need to
go through an elaborate process that can take many days.

Meanwhile, if they notice that a particular node is consistently causing
problems for their app, having a simple way to exclude it would be very
helpful since it gives them a way to bypass communication and process
issues within their own organization.

Ram

On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare 
wrote:

> To me both use cases appear to be generic resource management use cases.
> For example, a randomly rebooting node is not good for any purpose esp.
> long running apps so it is a bit of a stretch to imagine that these nodes
> will be acceptable for some batch jobs in Yarn. So such a node should be
> marked “Bad” or Unavailable in Yarn itself.
>
> Second use case is also typical anti-affinity use case which ideally
> should be implemented in Yarn – Milind’s example can also apply to non-Apex
> batch jobs. In any case it looks like Yarn still doesn’t have it (
> https://issues.apache.org/jira/browse/YARN-1042) so if Apex needs it we
> will need to do it ourselves.
>
> On 11/30/16, 10:39 AM, "Munagala Ramanath"  wrote:
>
> But then, what's the solution to the 2 problem scenarios that Milind
> describes ?
>
> Ram
>
> On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare <
> san...@datatorrent.com>
> wrote:
>
> > I think “exclude nodes” and such is really the job of the resource
> manager
> > i.e. Yarn. So I am not sure taking over some of these tasks in Apex
> would
> > be very useful.
> >
> > I agree with Amol that apps should be node neutral. Resource
> management in
> > Yarn together with fault tolerance in Apex should minimize the need
> for
> > this feature although I am sure one can find use cases.
> >
> >
> > On 11/29/16, 10:41 PM, "Amol Kekre"  wrote:
> >
> > We do have this feature in Yarn, but that applies to all applications.
> > I am not sure if Yarn has anti-affinity. This feature may be used, but
> > in general there is danger in an application taking over resource
> > allocation. Another quirk is that big data apps should ideally be
> > node-neutral. This is a good idea, if we are able to carve out something
> > where the need is app specific.
> >
> > Thks
> > Amol
> >
> >
> > On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve <
> mili...@gmail.com>
> > wrote:
> >
> > > We have seen the 2 cases mentioned below, where it would have been
> > > nice if Apex allowed us to exclude a node from the cluster for an
> > > application.
> > >
> > > 1. A node in the cluster had gone bad (was randomly rebooting) and so
> > > an Apex app should not use it - other apps can use it as they were
> > > batch jobs.
> > > 2. A node is being used for a mission critical app (could be an Apex
> > > app itself), but another Apex app which is mission critical should
> > > not be using resources on that node.
> > >
> > > Can we have a way in which Stram and YARN can coordinate with each
> > > other to not use a set of nodes for the application? It can be done
> > > in 2 ways:
> > >
> > > 1. Have a list of "exclude" nodes with Stram: when YARN allocates
> > > resources on any of these, Stram rejects them and gets resources
> > > allocated again from YARN.
> > > 2. Have a list of nodes that can be used for an app. This can be a
> > > part of config. However, I don't think this would be the right way to
> > > do so, as we will need support from YARN as well. Further, this might
> > > be difficult to change at runtime if need be.
> > >
> > > Any thoughts?
> > >
> > >
> > > --
> > > ~Milind bee at gee mail dot com
> > >
> >
> >
> >
> >
>
>
>
>


Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Amol Kekre
I agree, a randomly rebooting node is a Yarn issue. Even anti-affinity between
apps should be in Yarn in the long run. We could contribute to the above JIRA.

Thks
Amol


On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare 
wrote:

> To me both use cases appear to be generic resource management use cases.
> For example, a randomly rebooting node is not good for any purpose esp.
> long running apps so it is a bit of a stretch to imagine that these nodes
> will be acceptable for some batch jobs in Yarn. So such a node should be
> marked “Bad” or Unavailable in Yarn itself.
>
> Second use case is also typical anti-affinity use case which ideally
> should be implemented in Yarn – Milind’s example can also apply to non-Apex
> batch jobs. In any case it looks like Yarn still doesn’t have it (
> https://issues.apache.org/jira/browse/YARN-1042) so if Apex needs it we
> will need to do it ourselves.
>
> On 11/30/16, 10:39 AM, "Munagala Ramanath"  wrote:
>
> But then, what's the solution to the 2 problem scenarios that Milind
> describes ?
>
> Ram
>
> On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare <
> san...@datatorrent.com>
> wrote:
>
> > I think “exclude nodes” and such is really the job of the resource
> manager
> > i.e. Yarn. So I am not sure taking over some of these tasks in Apex
> would
> > be very useful.
> >
> > I agree with Amol that apps should be node neutral. Resource
> management in
> > Yarn together with fault tolerance in Apex should minimize the need
> for
> > this feature although I am sure one can find use cases.
> >
> >
> > On 11/29/16, 10:41 PM, "Amol Kekre"  wrote:
> >
> > We do have this feature in Yarn, but that applies to all applications.
> > I am not sure if Yarn has anti-affinity. This feature may be used, but
> > in general there is danger in an application taking over resource
> > allocation. Another quirk is that big data apps should ideally be
> > node-neutral. This is a good idea, if we are able to carve out something
> > where the need is app specific.
> >
> > Thks
> > Amol
> >
> >
> > On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve <
> mili...@gmail.com>
> > wrote:
> >
> > > We have seen the 2 cases mentioned below, where it would have been
> > > nice if Apex allowed us to exclude a node from the cluster for an
> > > application.
> > >
> > > 1. A node in the cluster had gone bad (was randomly rebooting) and so
> > > an Apex app should not use it - other apps can use it as they were
> > > batch jobs.
> > > 2. A node is being used for a mission critical app (could be an Apex
> > > app itself), but another Apex app which is mission critical should
> > > not be using resources on that node.
> > >
> > > Can we have a way in which Stram and YARN can coordinate with each
> > > other to not use a set of nodes for the application? It can be done
> > > in 2 ways:
> > >
> > > 1. Have a list of "exclude" nodes with Stram: when YARN allocates
> > > resources on any of these, Stram rejects them and gets resources
> > > allocated again from YARN.
> > > 2. Have a list of nodes that can be used for an app. This can be a
> > > part of config. However, I don't think this would be the right way to
> > > do so, as we will need support from YARN as well. Further, this might
> > > be difficult to change at runtime if need be.
> > >
> > > Any thoughts?
> > >
> > >
> > > --
> > > ~Milind bee at gee mail dot com
> > >
> >
> >
> >
> >
>
>
>
>


Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Sanjay Pujare
To me both use cases appear to be generic resource management use cases. For
example, a randomly rebooting node is not good for any purpose, especially
long running apps, so it is a bit of a stretch to imagine that these nodes will
be acceptable for some batch jobs in Yarn. Such a node should be marked “Bad”
or Unavailable in Yarn itself.

The second use case is also a typical anti-affinity use case, which ideally
should be implemented in Yarn – Milind’s example can also apply to non-Apex
batch jobs. In any case it looks like Yarn still doesn’t have it
(https://issues.apache.org/jira/browse/YARN-1042), so if Apex needs it we will
need to do it ourselves.

On 11/30/16, 10:39 AM, "Munagala Ramanath"  wrote:

But then, what's the solution to the 2 problem scenarios that Milind
describes ?

Ram

On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare 
wrote:

> I think “exclude nodes” and such is really the job of the resource manager
> i.e. Yarn. So I am not sure taking over some of these tasks in Apex would
> be very useful.
>
> I agree with Amol that apps should be node neutral. Resource management in
> Yarn together with fault tolerance in Apex should minimize the need for
> this feature although I am sure one can find use cases.
>
>
> On 11/29/16, 10:41 PM, "Amol Kekre"  wrote:
>
> We do have this feature in Yarn, but that applies to all applications. I am
> not sure if Yarn has anti-affinity. This feature may be used, but in
> general there is danger in an application taking over resource allocation.
> Another quirk is that big data apps should ideally be node-neutral. This is
> a good idea, if we are able to carve out something where the need is app
> specific.
>
> Thks
> Amol
>
>
> On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve 
> wrote:
>
> > We have seen the 2 cases mentioned below, where it would have been nice
> > if Apex allowed us to exclude a node from the cluster for an application.
> >
> > 1. A node in the cluster had gone bad (was randomly rebooting) and so an
> > Apex app should not use it - other apps can use it as they were batch
> > jobs.
> > 2. A node is being used for a mission critical app (could be an Apex app
> > itself), but another Apex app which is mission critical should not be
> > using resources on that node.
> >
> > Can we have a way in which Stram and YARN can coordinate with each
> > other to not use a set of nodes for the application? It can be done in
> > 2 ways -
> >
> > 1. Have a list of "exclude" nodes with Stram - when YARN allocates
> > resources on any of these, STRAM rejects them and gets resources
> > allocated again from YARN.
> > 2. Have a list of nodes that can be used for an app - this can be a part
> > of config. However, I don't think this would be the right way to do so
> > as we will need support from YARN as well. Further, this might be
> > difficult to change at runtime if need be.
> >
> > Any thoughts?
> >
> >
> > --
> > ~Milind bee at gee mail dot com
> >
>
>
>
>





Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Munagala Ramanath
But then, what's the solution to the 2 problem scenarios that Milind
describes ?

Ram

On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare 
wrote:

> I think “exclude nodes” and such is really the job of the resource manager
> i.e. Yarn. So I am not sure taking over some of these tasks in Apex would
> be very useful.
>
> I agree with Amol that apps should be node neutral. Resource management in
> Yarn together with fault tolerance in Apex should minimize the need for
> this feature although I am sure one can find use cases.
>
>
> On 11/29/16, 10:41 PM, "Amol Kekre"  wrote:
>
> We do have this feature in Yarn, but that applies to all applications. I am
> not sure if Yarn has anti-affinity. This feature may be used, but in
> general there is danger in an application taking over resource allocation.
> Another quirk is that big data apps should ideally be node-neutral. This is
> a good idea, if we are able to carve out something where the need is app
> specific.
>
> Thks
> Amol
>
>
> On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve 
> wrote:
>
> > We have seen the 2 cases mentioned below, where it would have been nice
> > if Apex allowed us to exclude a node from the cluster for an application.
> >
> > 1. A node in the cluster had gone bad (was randomly rebooting) and so an
> > Apex app should not use it - other apps can use it as they were batch
> > jobs.
> > 2. A node is being used for a mission critical app (could be an Apex app
> > itself), but another Apex app which is mission critical should not be
> > using resources on that node.
> >
> > Can we have a way in which Stram and YARN can coordinate with each
> > other to not use a set of nodes for the application? It can be done in
> > 2 ways -
> >
> > 1. Have a list of "exclude" nodes with Stram - when YARN allocates
> > resources on any of these, STRAM rejects them and gets resources
> > allocated again from YARN.
> > 2. Have a list of nodes that can be used for an app - this can be a part
> > of config. However, I don't think this would be the right way to do so
> > as we will need support from YARN as well. Further, this might be
> > difficult to change at runtime if need be.
> >
> > Any thoughts?
> >
> >
> > --
> > ~Milind bee at gee mail dot com
> >
>
>
>
>


Re: "ExcludeNodes" for an Apex application

2016-11-30 Thread Sanjay Pujare
I think “exclude nodes” and the like is really the job of the resource 
manager, i.e. Yarn, so I am not sure taking over some of these tasks in Apex 
would be very useful.

I agree with Amol that apps should be node-neutral. Resource management in Yarn 
together with fault tolerance in Apex should minimize the need for this 
feature, although I am sure one can find use cases.


On 11/29/16, 10:41 PM, "Amol Kekre"  wrote:

We do have this feature in Yarn, but that applies to all applications. I am
not sure if Yarn has anti-affinity. This feature may be used, but in
general there is danger in an application taking over resource allocation.
Another quirk is that big data apps should ideally be node-neutral. This is
a good idea, if we are able to carve out something where the need is app
specific.

Thks
Amol


On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve  wrote:

> We have seen the 2 cases mentioned below, where it would have been nice if
> Apex allowed us to exclude a node from the cluster for an application.
>
> 1. A node in the cluster had gone bad (was randomly rebooting) and so an
> Apex app should not use it - other apps can use it as they were batch jobs.
> 2. A node is being used for a mission critical app (could be an Apex app
> itself), but another Apex app which is mission critical should not be using
> resources on that node.
>
> Can we have a way in which Stram and YARN can coordinate with each
> other to not use a set of nodes for the application? It can be done in 2
> ways -
>
> 1. Have a list of "exclude" nodes with Stram - when YARN allocates resources
> on any of these, STRAM rejects them and gets resources allocated again from
> YARN.
> 2. Have a list of nodes that can be used for an app - this can be a part of
> config. However, I don't think this would be the right way to do so as we
> will need support from YARN as well. Further, this might be difficult to
> change at runtime if need be.
>
> Any thoughts?
>
>
> --
> ~Milind bee at gee mail dot com
>





Re: "ExcludeNodes" for an Apex application

2016-11-29 Thread Amol Kekre
We do have this feature in Yarn, but that applies to all applications. I am
not sure if Yarn has anti-affinity. This feature may be used, but in
general there is danger in an application taking over resource allocation.
Another quirk is that big data apps should ideally be node-neutral. This is
a good idea, if we are able to carve out something where the need is app
specific.

Thks
Amol


On Tue, Nov 29, 2016 at 10:00 PM, Milind Barve  wrote:

> We have seen the 2 cases mentioned below, where it would have been nice if
> Apex allowed us to exclude a node from the cluster for an application.
>
> 1. A node in the cluster had gone bad (was randomly rebooting) and so an
> Apex app should not use it - other apps can use it as they were batch jobs.
> 2. A node is being used for a mission critical app (could be an Apex app
> itself), but another Apex app which is mission critical should not be using
> resources on that node.
>
> Can we have a way in which Stram and YARN can coordinate with each
> other to not use a set of nodes for the application? It can be done in 2
> ways -
>
> 1. Have a list of "exclude" nodes with Stram - when YARN allocates resources
> on any of these, STRAM rejects them and gets resources allocated again from
> YARN.
> 2. Have a list of nodes that can be used for an app - this can be a part of
> config. However, I don't think this would be the right way to do so as we
> will need support from YARN as well. Further, this might be difficult to
> change at runtime if need be.
>
> Any thoughts?
>
>
> --
> ~Milind bee at gee mail dot com
>
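
For illustration only, the first option in the proposal (an "exclude" list
held by Stram, with allocations on listed hosts rejected and requested again)
could be sketched roughly as below. This is not Apex or YARN code: the class
ExcludeNodeFilter, its Result type, and the host names are all hypothetical,
and containers are modelled as plain host-name strings so the sketch runs
without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of an application-level exclude list; not Apex or YARN code. */
final class ExcludeNodeFilter {

    /** Outcome of checking one batch of allocations against the exclude list. */
    static final class Result {
        /** Hosts whose containers the application master would keep. */
        final List<String> accepted = new ArrayList<>();
        /** Hosts whose containers it would release and request again. */
        final List<String> rejected = new ArrayList<>();
    }

    private final Set<String> excludedHosts;

    ExcludeNodeFilter(Set<String> excludedHosts) {
        // Defensive copy so later changes by the caller do not affect the filter.
        this.excludedHosts = new HashSet<>(excludedHosts);
    }

    /** Splits the hosts of newly allocated containers into accepted and rejected. */
    Result filter(List<String> allocatedHosts) {
        Result result = new Result();
        for (String host : allocatedHosts) {
            if (excludedHosts.contains(host)) {
                result.rejected.add(host);
            } else {
                result.accepted.add(host);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed example data: node3 is the host the application must avoid.
        ExcludeNodeFilter filter =
            new ExcludeNodeFilter(new HashSet<>(Arrays.asList("node3")));
        Result result = filter.filter(Arrays.asList("node1", "node3", "node2"));
        System.out.println("accepted=" + result.accepted
            + " rejected=" + result.rejected);
    }
}
```

In a real application master the rejected containers would have to be
released back to YARN and the request repeated, which is where the concern
raised in this thread about an application taking over resource allocation
applies. The YARN client library also has AMRMClient.updateBlacklist(), which
may be a more direct way to ask the ResourceManager to avoid given nodes for
one application; whether it covers these use cases is not established here.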