Re: Review Request 51633: SAMZA-1013: Add YARN Node label support

2016-10-20 Thread Jagadish Venkatraman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51633/#review153465
---


Ship it!




- Jagadish Venkatraman


On Oct. 7, 2016, 12:08 a.m., Maxim Logvinenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51633/
> ---
> 
> (Updated Oct. 7, 2016, 12:08 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1013
> https://issues.apache.org/jira/browse/SAMZA-1013
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> YARN Node labels were introduced in Hadoop version 2.6, which allows to group 
> nodes with similar characteristics and allows applications to specify where 
> to run. This patch adds support for YARN node labels in Samza.
> 
> In this implementation, node labels are defined directly in yarnConfig in 
> YarnClusterResourceManager. It might be better to have node labels as a part 
> of SamzaResourceRequest and SamzaResource classes, but 
> org.apache.hadoop.yarn.api.records.Container class doesn't contain node label 
> and hence we have nothing to pass to the SamzaResource constructor in 
> onContainersAllocated method of YarnClusterResourceManager class.
> 
> 
> Diffs
> -
> 
>   samza-yarn/src/main/java/org/apache/samza/config/YarnConfig.java 8f2dc48 
>   
> samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnClusterResourceManager.java
>  96d3d7c 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/ClientHelper.scala 
> 0998c43 
> 
> Diff: https://reviews.apache.org/r/51633/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Maxim Logvinenko
> 
>



Re: Review Request 51633: SAMZA-1013: Add YARN Node label support

2016-10-06 Thread Jagadish Venkatraman


> On Oct. 5, 2016, 6:18 p.m., Jagadish Venkatraman wrote:
> > Overall, the patch looks great! This is exciting given that Samza can 
> > support scheduling based on tags. For example, jobs with rocksdb can be 
> > assigned to nodes with SSDs.
> > 
> > 
> > Can you please add some detail on testing this feature? 
> > What was the label setup of the cluster? (for example: Did we use an 
> > exclusive node label?), How many node labels? How many containers were 
> > requested for the job?
> 
> Maxim Logvinenko wrote:
> We haven't tested it in production, but the main idea is the next: we 
> have 3 different types of nodes in our hadoop cluster. The first type is used 
> for ApplicationMasters (actually, we put up to 4 AM containers on one node). 
> The second type is used for stateless jobs and this type of nodes has a small 
> amount of memory. And the last type is used for stateful jobs and has more 
> memory than others. So, there are 3 labels in our cluster: taskam, 
> tasklowmem, taskhighmem. Now we force YARN to put containers on a particular 
> type of nodes by a small trick with resources (we chose resources for node in 
> such a way that YARN doesn't have any other variants except only one type of 
> nodes). But hadoop labels is a more natural way to request containers to be 
> placed on a specific node's type.

So, in this case, do you not care about *host affinity* at all when the job 
re-starts? Are you okay with your container coming back up on a different host 
(as long as it is a host with label `taskHighMem`)? We should make it explicit 
that when host-affinity.enabled=true, then node labelling will be ignored. Is 
my understanding reasonable?


- Jagadish


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51633/#review151519
---


On Oct. 7, 2016, 12:08 a.m., Maxim Logvinenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51633/
> ---
> 
> (Updated Oct. 7, 2016, 12:08 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1013
> https://issues.apache.org/jira/browse/SAMZA-1013
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> YARN Node labels were introduced in Hadoop version 2.6, which allows to group 
> nodes with similar characteristics and allows applications to specify where 
> to run. This patch adds support for YARN node labels in Samza.
> 
> In this implementation, node labels are defined directly in yarnConfig in 
> YarnClusterResourceManager. It might be better to have node labels as a part 
> of SamzaResourceRequest and SamzaResource classes, but 
> org.apache.hadoop.yarn.api.records.Container class doesn't contain node label 
> and hence we have nothing to pass to the SamzaResource constructor in 
> onContainersAllocated method of YarnClusterResourceManager class.
> 
> 
> Diffs
> -
> 
>   samza-yarn/src/main/java/org/apache/samza/config/YarnConfig.java 8f2dc48 
>   
> samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnClusterResourceManager.java
>  96d3d7c 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/ClientHelper.scala 
> 0998c43 
> 
> Diff: https://reviews.apache.org/r/51633/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Maxim Logvinenko
> 
>



Re: Review Request 51633: SAMZA-1013: Add YARN Node label support

2016-10-06 Thread Maxim Logvinenko


> On Oct. 5, 2016, 6:18 p.m., Jagadish Venkatraman wrote:
> > samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnClusterResourceManager.java,
> >  line 187
> > 
> >
> > nit: containerLabel so that we distinguish it from the AM label.

done


> On Oct. 5, 2016, 6:18 p.m., Jagadish Venkatraman wrote:
> > samza-yarn/src/main/scala/org/apache/samza/job/yarn/ClientHelper.scala, 
> > line 97
> > 
> >
> > nit: rename this to appMasterLabel (to avoid a conflict with 
> > containerLabel).

done


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51633/#review151519
---


On Oct. 7, 2016, 12:08 a.m., Maxim Logvinenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51633/
> ---
> 
> (Updated Oct. 7, 2016, 12:08 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1013
> https://issues.apache.org/jira/browse/SAMZA-1013
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> YARN Node labels were introduced in Hadoop version 2.6, which allows to group 
> nodes with similar characteristics and allows applications to specify where 
> to run. This patch adds support for YARN node labels in Samza.
> 
> In this implementation, node labels are defined directly in yarnConfig in 
> YarnClusterResourceManager. It might be better to have node labels as a part 
> of SamzaResourceRequest and SamzaResource classes, but 
> org.apache.hadoop.yarn.api.records.Container class doesn't contain node label 
> and hence we have nothing to pass to the SamzaResource constructor in 
> onContainersAllocated method of YarnClusterResourceManager class.
> 
> 
> Diffs
> -
> 
>   samza-yarn/src/main/java/org/apache/samza/config/YarnConfig.java 8f2dc48 
>   
> samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnClusterResourceManager.java
>  96d3d7c 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/ClientHelper.scala 
> 0998c43 
> 
> Diff: https://reviews.apache.org/r/51633/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Maxim Logvinenko
> 
>



Re: Review Request 51633: SAMZA-1013: Add YARN Node label support

2016-10-06 Thread Maxim Logvinenko


> On Oct. 5, 2016, 6:18 p.m., Jagadish Venkatraman wrote:
> > Overall, the patch looks great! This is exciting given that Samza can 
> > support scheduling based on tags. For example, jobs with rocksdb can be 
> > assigned to nodes with SSDs.
> > 
> > 
> > Can you please add some detail on testing this feature? 
> > What was the label setup of the cluster? (for example: Did we use an 
> > exclusive node label?), How many node labels? How many containers were 
> > requested for the job?

We haven't tested it in production, but the main idea is the next: we have 3 
different types of nodes in our hadoop cluster. The first type is used for 
ApplicationMasters (actually, we put up to 4 AM containers on one node). The 
second type is used for stateless jobs and this type of nodes has a small 
amount of memory. And the last type is used for stateful jobs and has more 
memory than others. So, there are 3 labels in our cluster: taskam, tasklowmem, 
taskhighmem. Now we force YARN to put containers on a particular type of nodes 
by a small trick with resources (we chose resources for node in such a way that 
YARN doesn't have any other variants except only one type of nodes). But hadoop 
labels is a more natural way to request containers to be placed on a specific 
node's type.


> On Oct. 5, 2016, 6:18 p.m., Jagadish Venkatraman wrote:
> > samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnClusterResourceManager.java,
> >  line 207
> > 
> >
> > Question: How does the node-label feature inter-play with the preferred 
> > host feature in Yarn? (Samza leverages this for host-affinity)
> > 
> > Does the label take precedence over the host? (or vice-versa).

If a preffered host is requested then label expression will be ignored. Here is 
from the hadoop documentation: Any please note that node label expression now 
can only take effect when the resource request has resourceName = ANY.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51633/#review151519
---


On Oct. 7, 2016, 12:08 a.m., Maxim Logvinenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51633/
> ---
> 
> (Updated Oct. 7, 2016, 12:08 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1013
> https://issues.apache.org/jira/browse/SAMZA-1013
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> YARN Node labels were introduced in Hadoop version 2.6, which allows to group 
> nodes with similar characteristics and allows applications to specify where 
> to run. This patch adds support for YARN node labels in Samza.
> 
> In this implementation, node labels are defined directly in yarnConfig in 
> YarnClusterResourceManager. It might be better to have node labels as a part 
> of SamzaResourceRequest and SamzaResource classes, but 
> org.apache.hadoop.yarn.api.records.Container class doesn't contain node label 
> and hence we have nothing to pass to the SamzaResource constructor in 
> onContainersAllocated method of YarnClusterResourceManager class.
> 
> 
> Diffs
> -
> 
>   samza-yarn/src/main/java/org/apache/samza/config/YarnConfig.java 8f2dc48 
>   
> samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnClusterResourceManager.java
>  96d3d7c 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/ClientHelper.scala 
> 0998c43 
> 
> Diff: https://reviews.apache.org/r/51633/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Maxim Logvinenko
> 
>



Re: Review Request 51633: SAMZA-1013: Add YARN Node label support

2016-10-06 Thread Maxim Logvinenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51633/
---

(Updated Oct. 7, 2016, 12:08 a.m.)


Review request for samza.


Changes
---

Addressed comments


Bugs: SAMZA-1013
https://issues.apache.org/jira/browse/SAMZA-1013


Repository: samza


Description
---

YARN Node labels were introduced in Hadoop version 2.6, which allows to group 
nodes with similar characteristics and allows applications to specify where to 
run. This patch adds support for YARN node labels in Samza.

In this implementation, node labels are defined directly in yarnConfig in 
YarnClusterResourceManager. It might be better to have node labels as a part of 
SamzaResourceRequest and SamzaResource classes, but 
org.apache.hadoop.yarn.api.records.Container class doesn't contain node label 
and hence we have nothing to pass to the SamzaResource constructor in 
onContainersAllocated method of YarnClusterResourceManager class.


Diffs (updated)
-

  samza-yarn/src/main/java/org/apache/samza/config/YarnConfig.java 8f2dc48 
  
samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnClusterResourceManager.java
 96d3d7c 
  samza-yarn/src/main/scala/org/apache/samza/job/yarn/ClientHelper.scala 
0998c43 

Diff: https://reviews.apache.org/r/51633/diff/


Testing
---


Thanks,

Maxim Logvinenko



Re: Review Request 51633: SAMZA-1013: Add YARN Node label support

2016-10-05 Thread Jagadish Venkatraman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51633/#review151519
---



Overall, the patch looks great! This is exciting given that Samza can support 
scheduling based on tags. For example, jobs with rocksdb can be assigned to 
nodes with SSDs.


Can you please add some detail on testing this feature? 
What was the label setup of the cluster? (for example: Did we use an exclusive 
node label?), How many node labels? How many containers were requested for the 
job?


samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnClusterResourceManager.java
 (line 187)


nit: containerLabel so that we distinguish it from the AM label.



samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnClusterResourceManager.java
 (line 207)


Question: How does the node-label feature inter-play with the preferred 
host feature in Yarn? (Samza leverages this for host-affinity)

Does the label take precedence over the host? (or vice-versa).



samza-yarn/src/main/scala/org/apache/samza/job/yarn/ClientHelper.scala (line 97)


nit: rename this to appMasterLabel (to avoid a conflict with 
containerLabel).


- Jagadish Venkatraman


On Sept. 5, 2016, 5:16 p.m., Maxim Logvinenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51633/
> ---
> 
> (Updated Sept. 5, 2016, 5:16 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1013
> https://issues.apache.org/jira/browse/SAMZA-1013
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> YARN Node labels were introduced in Hadoop version 2.6, which allows to group 
> nodes with similar characteristics and allows applications to specify where 
> to run. This patch adds support for YARN node labels in Samza.
> 
> In this implementation, node labels are defined directly in yarnConfig in 
> YarnClusterResourceManager. It might be better to have node labels as a part 
> of SamzaResourceRequest and SamzaResource classes, but 
> org.apache.hadoop.yarn.api.records.Container class doesn't contain node label 
> and hence we have nothing to pass to the SamzaResource constructor in 
> onContainersAllocated method of YarnClusterResourceManager class.
> 
> 
> Diffs
> -
> 
>   samza-yarn/src/main/java/org/apache/samza/config/YarnConfig.java 8f2dc48 
>   
> samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnClusterResourceManager.java
>  96d3d7c 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/ClientHelper.scala 
> 0998c43 
> 
> Diff: https://reviews.apache.org/r/51633/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Maxim Logvinenko
> 
>