Agree it should be via YARN; the poison pill would be the final barrier in
the event all other mechanisms have failed -- sort of like an API call which
documents that a parameter should be non-null but nevertheless checks it
internally and throws an exception if it finds null.
Additionally, it
The OP is claiming (in the comment to the first response) that he actually
tried the proposed solution and it did not work for him, and shows the RM
code fragment that is clobbering his preference.
Ram
On Fri, Dec 2, 2016 at 12:17 AM, Sandesh Hegde
wrote:
> Yarn allows
Excluding nodes for Stram should be done via Yarn; a poison pill is not a
good way, as it induces a termination for the wrong reasons.
Thks
Amol
On Fri, Dec 2, 2016 at 7:13 AM, Munagala Ramanath
wrote:
> Could STRAM include a poison pill where it simply exits with diagnostic if
> its host
Could STRAM include a poison pill where it simply exits with diagnostic if
its host name is blacklisted ?
Ram
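
A minimal sketch of the poison-pill idea above, in plain Java. It is not Apex
or YARN code: the class HostExclusionCheck, its methods, and the
comma-separated exclude-list format are hypothetical names chosen for
illustration. The process looks up its own host name and exits with a
diagnostic if that name is on the exclude list.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Apex or YARN.
class HostExclusionCheck {

  // Parses a comma-separated list such as "node1,node2" into a set of
  // lower-cased host names. Blank entries are ignored. (Assumed format.)
  static Set<String> parseExcludeList(String csv) {
    Set<String> hosts = new HashSet<>();
    if (csv == null) {
      return hosts;
    }
    for (String entry : csv.split(",")) {
      String host = entry.trim().toLowerCase(Locale.ROOT);
      if (!host.isEmpty()) {
        hosts.add(host);
      }
    }
    return hosts;
  }

  // Returns true if the given host name is in the exclude set
  // (case-insensitive comparison).
  static boolean isExcluded(String host, Set<String> excluded) {
    return host != null
        && excluded.contains(host.trim().toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) throws UnknownHostException {
    // The exclude list is taken from the first argument in this sketch.
    Set<String> excluded = parseExcludeList(args.length > 0 ? args[0] : "");
    String localHost = InetAddress.getLocalHost().getHostName();
    if (isExcluded(localHost, excluded)) {
      // The "poison pill": stop with a diagnostic instead of running here.
      System.err.println("Host " + localHost + " is excluded; exiting.");
      System.exit(1);
    }
    System.out.println("Host " + localHost + " is allowed; continuing.");
  }
}
```

As discussed in the thread, a check like this would only be a last line of
defence; it does not stop the resource manager from choosing the node in the
first place.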
On Thu, Dec 1, 2016 at 11:52 PM, Amol Kekre wrote:
> Yarn will deploy AM (Stram) on a node of its choice, thereby rendering any
> attribute within the app
So all Apex needs to do is make sure, as part of the initial configuration
validation, that the node selected to run the master is not on the
"excludeNode" list.
On Fri, Dec 2, 2016 at 1:47 PM, Sandesh Hegde
wrote:
> Yarn allows the AppMaster to run on
Yarn allows the AppMaster to run on a selected node. Apex shouldn't select
blacklisted nodes, so it is possible to avoid running Apex containers on
certain nodes.
http://stackoverflow.com/questions/29302659/run-my-own-application-master-on-a-specific-node-in-a-yarn-cluster
On
Yarn will deploy the AM (Stram) on a node of its choice, thereby rendering
any attribute within the app unenforceable in terms of not deploying the
master on a node.
Thks
Amol
On Thu, Dec 1, 2016 at 11:19 PM, Milind Barve wrote:
> Additionally, this would apply to Stram as well
While it is possible to extend anti-affinity to take care of this, I feel
it will cause confusion from a user perspective. As a user, when I think
about anti-affinity, what comes to mind right away is a relative relation
between operators.
On the other hand, the current ask is not that, but a
Okay, I think that serves an alternate purpose of detecting any newly gone
bad node and excluding it.
+1 for covering the original scenario under anti-affinity.
~ Bhupesh
On Fri, Dec 2, 2016 at 9:14 AM, Munagala Ramanath
wrote:
> It only takes effect after failures -- no
It only takes effect after failures -- no way to exclude from the get-go.
Ram
On Dec 1, 2016 7:15 PM, "Bhupesh Chawda" wrote:
> As suggested by Sandesh, the parameter
> MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST seems to do exactly what
> is needed.
> Why would
Hi,
Can't we make use of the existing Node Label + queue feature in Yarn to
achieve this? Though we will have to redeploy the cluster, it's still
possible to exclude nodes.
https://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/NodeLabel.html
Thanks,
Ajay
On Fri, Dec 2, 2016 at 5:57 AM,
I agree, this should be on top of affinity work
Thks
Amol
On Thu, Dec 1, 2016 at 1:01 PM, Pramod Immaneni
wrote:
> I see a host locality available as an attribute in DAG for individual
> operators. If affinity doesn't support this today, we could probably add
> it. You
I see a host locality available as an attribute in DAG for individual
operators. If affinity doesn't support this today, we could probably add
it. You could also make setting a blacklist directly a convenience function
on top of affinity.
On Thu, Dec 1, 2016 at 11:58 AM, Sandesh Hegde
Shouldn't this already be covered by anti-affinity? Today users can specify
multiple affinity rules; for each rule they can specify positive or
negative affinity, locality and operator selection. If an affinity rule
specifying negative affinity, node locality and all operators, does not
work then
I have created a JIRA for adding the list of blacklisted nodes:
https://issues.apache.org/jira/browse/APEXCORE-584
On Wed, Nov 30, 2016 at 11:06 PM Sanjay Pujare
wrote:
> Yes, Ram explained to me that in practice this would be a useful feature
> for Apex devops who
Yes, Ram explained to me that in practice this would be a useful feature for
Apex devops who typically have no control over Hadoop/Yarn cluster.
On 11/30/16, 9:22 PM, "Mohit Jotwani" wrote:
This is a practical scenario where developers would be required to exclude
Apex has automatic blacklisting of troublesome nodes; please take a look at
the following attribute:
MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
https://www.datatorrent.com/docs/apidocs/com/datatorrent/api/Context.DAGContext.html#MAX_CONSECUTIVE_CONTAINER_FAILURES_FOR_BLACKLIST
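
To make the behaviour of a consecutive-failure threshold concrete, here is a
self-contained Java sketch. It is not the Apex implementation: the class
ConsecutiveFailureBlacklist, its method names, the rule that a success resets
the count, and the use of ">=" against the threshold are all assumptions made
for illustration. It shows why this kind of blacklist only takes effect after
failures have been observed on a node.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of failure-based blacklisting; not Apex code.
class ConsecutiveFailureBlacklist {
  private final int maxConsecutiveFailures;
  private final Map<String, Integer> consecutiveFailures = new HashMap<>();
  private final Set<String> blacklist = new HashSet<>();

  ConsecutiveFailureBlacklist(int maxConsecutiveFailures) {
    if (maxConsecutiveFailures < 1) {
      throw new IllegalArgumentException("threshold must be at least 1");
    }
    this.maxConsecutiveFailures = maxConsecutiveFailures;
  }

  // Records a container failure on the host and blacklists the host once
  // the consecutive-failure count reaches the threshold.
  void containerFailed(String host) {
    int count = consecutiveFailures.merge(host, 1, Integer::sum);
    if (count >= maxConsecutiveFailures) {
      blacklist.add(host);
    }
  }

  // Assumption: a successful container resets the consecutive count.
  // Hosts that are already blacklisted stay blacklisted in this sketch.
  void containerSucceeded(String host) {
    consecutiveFailures.remove(host);
  }

  boolean isBlacklisted(String host) {
    return blacklist.contains(host);
  }
}
```

With a threshold of 3, a host is not blacklisted until three failures in a row
have been recorded on it, so a node known to be bad in advance cannot be
excluded up front by this mechanism alone.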
Not sure if this is what Milind had in mind but we often run into
situations where the dev group working with Apex has no control over
cluster configuration -- to make any changes to the cluster they need to
go through an elaborate process that can take many days.
Meanwhile, if they notice that a
I agree, a randomly rebooting node is a Yarn issue. Even anti-affinity
between apps should be handled by Yarn in the long run. We could contribute
the above JIRA.
Thks
Amol
On Wed, Nov 30, 2016 at 10:58 AM, Sanjay Pujare
wrote:
> To me both use cases appear to be generic resource
To me both use cases appear to be generic resource management use cases. For
example, a randomly rebooting node is not good for any purpose esp. long
running apps so it is a bit of a stretch to imagine that these nodes will be
acceptable for some batch jobs in Yarn. So such a node should be
But then, what's the solution to the 2 problem scenarios that Milind
describes ?
Ram
On Wed, Nov 30, 2016 at 10:34 AM, Sanjay Pujare
wrote:
> I think “exclude nodes” and such is really the job of the resource manager
> i.e. Yarn. So I am not sure taking over some of
I think “exclude nodes” and such is really the job of the resource manager i.e.
Yarn. So I am not sure taking over some of these tasks in Apex would be very
useful.
I agree with Amol that apps should be node neutral. Resource management in Yarn
together with fault tolerance in Apex should
We do have this feature in Yarn, but that applies to all applications. I am
not sure if Yarn has anti-affinity. This feature may be used, but in
general there is danger in an application taking over resource allocation.
Another quirk is that big data apps should ideally be node-neutral. This is
a
We have seen the 2 cases mentioned below, where it would have been nice if
Apex allowed us to exclude a node from the cluster for an application.
1. A node in the cluster had gone bad (was randomly rebooting) and so an
Apex app should not use it - other apps can use it as they were batch jobs.
2. A