[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-09-07 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16607445#comment-16607445
 ] 

Jonathan Hung commented on YARN-8200:
-

Build https://builds.apache.org/view/H-L/view/Hadoop/job/PreCommit-YARN-Build/21779 timed out:
{noformat}
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common
/opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-common.txt 2>&1
Elapsed:   2m 40s
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
/opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt 2>&1
Elapsed:  15m 20s
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice
/opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-applicationhistoryservice.txt 2>&1
Elapsed:   4m 49s
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
/opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt 2>&1
Elapsed:  79m 41s
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests
/opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt 2>&1
Elapsed:   3m 59s
cd /testptch/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client
/opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-slave/workspace/PreCommit-YARN-Build/yetus-m2/hadoop-branch-2-patch-0 -Ptest-patch -Pparallel-tests -Pshelltest -Pnative -Drequire.fuse -Drequire.openssl -Drequire.snappy -Drequire.valgrind -Drequire.test.libhadoop -Pyarn-ui clean test -fae > /testptch/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt 2>&1
Build timed out (after 500 minutes). Marking the build as aborted.
Build was aborted
Performing Post build task...
Match found for :. : True
Logical operation result is TRUE
Running script  : #!/bin/bash
{noformat}
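
For context, the per-module timings in the log above can be totalled with a short script. This is only a sketch, not part of the build tooling: the ELAPSED table is copied by hand from the "Elapsed:" lines above, and the helper name to_seconds is hypothetical. It suggests that the five modules that reported a time account for well under two hours of the 500-minute limit. The last command shown (hadoop-yarn-client) has no "Elapsed:" line, and the excerpt alone does not show where the rest of the time went.

```python
import re

# Elapsed values copied by hand from the build log excerpt above.
ELAPSED = {
    "hadoop-yarn-server-common": "2m 40s",
    "hadoop-yarn-server-nodemanager": "15m 20s",
    "hadoop-yarn-server-applicationhistoryservice": "4m 49s",
    "hadoop-yarn-server-resourcemanager": "79m 41s",
    "hadoop-yarn-server-tests": "3m 59s",
}


def to_seconds(text):
    """Convert a string such as '79m 41s' into seconds (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+)m\s+(\d+)s\s*", text)
    if match is None:
        raise ValueError(f"unrecognised elapsed value: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


# Sum the reported times and find the module that took longest.
total = sum(to_seconds(value) for value in ELAPSED.values())
slowest = max(ELAPSED, key=lambda name: to_seconds(ELAPSED[name]))

print(f"reported total: {total // 60}m {total % 60}s")
print(f"slowest module: {slowest}")
```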

> Backport resource types/GPU features to branch-2
> 
>
> Key: YARN-8200
> URL: https://issues.apache.org/jira/browse/YARN-8200
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-8200-branch-2.001.patch, 
> counter.scheduler.operation.allocate.csv.defaultResources, 
> counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json
>
>
> Currently we have a need for GPU scheduling on our YARN clusters to support 
> deep learning workloads. However, our main production clusters are running 
> older versions of branch-2 (2.7 in our case). To prevent supporting too many 
> very different hadoop versions across multiple clusters, we would like to 
> backport the resource types/resource profiles feature to branch-2, as well as 
> the GPU specific support.
>  
> We have done a trial backport of YARN-3926 and some miscellaneous patches in 
> YARN-7069 based on issues 

[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-09-06 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606091#comment-16606091
 ] 

Jonathan Hung commented on YARN-8200:
-

Rebased the YARN-8200 branch onto branch-2. Attached the full diff between
branch-2 and YARN-8200 (001).

> Backport resource types/GPU features to branch-2
> 
>
> Key: YARN-8200
> URL: https://issues.apache.org/jira/browse/YARN-8200
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
> Attachments: YARN-8200-branch-2.001.patch, 
> counter.scheduler.operation.allocate.csv.defaultResources, 
> counter.scheduler.operation.allocate.csv.gpuResources, synth_sls.json
>
>
> Currently we have a need for GPU scheduling on our YARN clusters to support 
> deep learning workloads. However, our main production clusters are running 
> older versions of branch-2 (2.7 in our case). To prevent supporting too many 
> very different hadoop versions across multiple clusters, we would like to 
> backport the resource types/resource profiles feature to branch-2, as well as 
> the GPU specific support.
>  
> We have done a trial backport of YARN-3926 and some miscellaneous patches in 
> YARN-7069 based on issues we uncovered, and the backport was fairly smooth. 
> We also did a trial backport of most of YARN-6223 (sans docker support).
>  
> Regarding the backports, perhaps we can do the development in a feature 
> branch and then merge to branch-2 when ready.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-09-04 Thread Konstantin Shvachko (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603634#comment-16603634
 ] 

Konstantin Shvachko commented on YARN-8200:
---

I tried to build the YARN-8200 branch with this build:
https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-branch2-java7-linux-x86-jhung/8/console
It is failing in a way similar to HADOOP-15644. I think the YARN-8200 branch
needs to be rebased onto the latest branch-2.




[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-08-01 Thread Wangda Tan (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566119#comment-16566119
 ] 

Wangda Tan commented on YARN-8200:
--

[~jhung], thanks for sharing the results. Overall the numbers look good.




[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-07-24 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554949#comment-16554949
 ] 

Jonathan Hung commented on YARN-8200:
-

Uploaded the scheduler allocation counters for default resources (mem/cpu) and
for GPU resources. Also uploaded the synth_sls.json configuration used to
generate the synthetic trace (4k nodes, 20k jobs).

The SLS simulation took 2hr 10min with default resources and 2hr 25min with
GPU resources. In the GPU SLS simulation we hardcoded each mapper and reducer
to request 1 GPU.
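
To put the two SLS run times side by side, the relative difference can be worked out as below. This is a rough sketch that only uses the two wall-clock durations quoted above; total simulation time is a coarse proxy for scheduler overhead, and the variable names are illustrative.

```python
from datetime import timedelta

# Wall-clock durations quoted in the comment above.
default_run = timedelta(hours=2, minutes=10)
gpu_run = timedelta(hours=2, minutes=25)

# Extra time taken by the GPU run, absolute and relative to the default run.
extra = gpu_run - default_run
overhead_pct = 100 * extra / default_run  # timedelta / timedelta gives a float

print(f"extra time: {extra}")
print(f"relative overhead: {overhead_pct:.1f}%")
```

Under these numbers the GPU run takes 15 minutes longer, roughly 11.5% more than the default-resources run.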




[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-07-09 Thread Jonathan Hung (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537738#comment-16537738
 ] 

Jonathan Hung commented on YARN-8200:
-

Perf unit test:

2 resources:
{noformat}
2 14652.015
4 20876.826
6 29455.08
8 45045.047
10 37735.848
12 40816.33
14 46403.71
16 47169.812
18 49261.082
20 48543.688
22 49140.05
24 47393.363
26 48899.754
28 48899.754
30 49751.242
32 50125.312
34 46296.297
36 48780.49
38 47961.63
40 47732.695
42 47732.695
44 48076.92
46 49019.61
48 46728.973
50 42643.92
52 46296.297
54 48426.15
56 49504.95
58 47846.89
60 48543.688
62 47393.363
64 48899.754
66 48661.8
68 49140.05
70 49019.61
72 48780.49
74 48899.754
76 49382.715
78 47393.363
80 48076.92
82 48192.77
84 47732.695
86 50125.312
88 48899.754
90 49019.61
92 48076.92
94 48192.77
96 48076.92
98 42553.19
100 47846.89
102 47846.89
104 48780.49
106 47961.63
108 49140.05
110 47169.812
112 47846.89
114 47619.047
116 47619.047
118 49875.312
120 47619.047
122 47393.363
124 47505.938
126 48899.754
128 48780.49
130 46189.375
132 47505.938
134 45871.56
136 47619.047
138 48543.688
140 47619.047
142 48076.92
144 48076.92
146 47732.695
148 47281.324
150 48543.688
152 48661.8
154 47393.363
156 48543.688
158 47961.63
160 46296.297
162 47846.89
164 47846.89
166 48543.688
168 47505.938
170 47281.324
172 48309.18
174 48309.18
176 5.0
178 47505.938
180 48192.77
182 48192.77
184 48309.18
186 48543.688
188 48661.8
190 48192.77
192 47846.89
194 42105.26
196 48899.754
198 47961.63
#ResourceTypes = 2. Avg of fastest 20: 49382.715
2018-06-26 17:12:59,756 ERROR [Thread[Thread-11,5,main]] delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(696)) - ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
{noformat}
3 resources:
{noformat}
2 10964.912
4 15760.441
6 26990.553
8 24752.475
10 32733.225
12 30487.805
14 27397.26
16 35778.176
18 33112.582
20 34843.207
22 31347.963
24 37383.176
26 34482.758
28 39062.5
30 38095.24
32 35842.293
34 32154.342
36 39447.73
38 37878.79
40 38240.918
42 36101.082
44 38167.938
46 38834.953
48 38022.812
50 38610.04
52 37105.75
54 38610.04
56 39215.688
58 38022.812
60 39215.688
62 37950.664
64 39138.94
66 37735.848
68 38684.72
70 38986.355
72 37735.848
74 37243.95
76 38535.645
78 37807.184
80 38314.176
82 36900.367
84 38610.04
86 39370.08
88 38314.176
90 39525.69
92 38461.54
94 39761.43
96 39370.08
98 38910.504
100 38022.812
102 39138.94
104 38314.176
106 39292.73
108 39292.73
110 39370.08
112 39292.73
114 38314.176
116 39840.637
118 39062.5
120 39370.08
122 37950.664
124 39062.5
126 37664.785
128 38684.72
130 38986.355
132 39525.69
134 40322.582
136 39292.73
138 37664.785
140 39525.69
142 39138.94
144 39370.08
146 39840.637
148 37037.035
150 38387.715
152 39525.69
154 37523.453
156 39603.96
158 36764.707
160 32362.459
162 29542.098
164 31250.0
166 29112.082
168 32000.0
170 27662.518
172 27100.271
174 26845.637
176 33388.98
178 35714.285
180 31152.648
182 36832.414
184 35650.625
186 38461.54
188 34662.047
190 31104.2
192 32573.29
194 36900.367
196 26702.27
198 30211.48
#ResourceTypes = 3. Avg of fastest 20: 39525.69
2018-06-26 17:16:14,530 ERROR [Thread[Thread-11,5,main]] delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(696)) - ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
{noformat}
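
As a quick reading aid for the 2- and 3-resource summaries above, the two "Avg of fastest 20" figures can be compared directly. This is a minimal sketch: the values are copied from the output above, the unit is not stated there, and treating a higher number as better is an assumption.

```python
# Summary figures copied from the "Avg of fastest 20" lines above.
# The unit is not stated in the output, so the numbers are treated as an
# opaque throughput figure where higher is better (an assumption).
fastest_20_avg = {2: 49382.715, 3: 39525.69}

# Report each figure and its change relative to the 2-resource-type run.
baseline = fastest_20_avg[2]
for resource_types, value in sorted(fastest_20_avg.items()):
    change_pct = 100 * (value - baseline) / baseline
    print(f"{resource_types} resource types: {value:.1f} ({change_pct:+.1f}% vs 2)")
```

With these two figures, the 3-resource-type summary comes out roughly 20% below the 2-resource-type one.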
4 resources:
{noformat}
2 13166.557
4 21299.254
6 33222.59
8 37174.723
10 32786.887
12 33955.855
14 38095.24
16 37243.95
18 38167.938
20 37807.184
22 36900.367
24 39370.08
26 36563.07
28 38240.918
30 38759.69
32 39370.08
34 35523.98
36 39370.08
38 38610.04
40 38759.69
42 39603.96
44 37878.79
46 38910.504
48 38684.72
50 39682.54
52 38461.54
54 38535.645
56 37105.75
58 38910.504
60 38095.24
62 38684.72
64 38910.504
66 39138.94
68 39292.73
70 38095.24
72 39215.688
74 39447.73
76 39447.73
78 4.0
80 38759.69
82 38910.504
84 39603.96
86 38834.953
88 

[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-05-09 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469835#comment-16469835
 ] 

Sunil G commented on YARN-8200:
---

Thanks [~jhung]

Could you also please share the SLS test comparison between branch-2 and the
feature branch, as well as the perf UT comparison?




[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-05-09 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469193#comment-16469193
 ] 

Jonathan Hung commented on YARN-8200:
-

FYI we just updated YARN-8200 with the latest set of backports (thanks [~zhz] 
for the help). 

 

[~sunilg] / [~leftnoteasy] / [~templedf], please let us know if we are missing 
anything major. Thanks!




[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-05-02 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461622#comment-16461622
 ] 

Konstantin Shvachko commented on YARN-8200:
---

Hey guys, I rebased the YARN-8200 branch onto branch-2 and pushed [~jhung]'s
commits into it.
Please take a look. Testing is in progress, as I hear.




[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-04-30 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459363#comment-16459363
 ] 

Konstantin Shvachko commented on YARN-8200:
---

[~sunilg], thanks for the hints on the benchmarks.
Also I agree we should branch off of branch-2 rather than 2.9. Will re-branch.




[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-04-25 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453407#comment-16453407
 ] 

Sunil G commented on YARN-8200:
---

Hi [~shv] and [~jhung]

We used SLS to benchmark trunk first and then used the same test bench to
verify the branch.
 # We used slsnodes.json files with 4k, 8k and 20k nodes to verify the
above-mentioned case.
 # 
{{org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPerf}}
 is the perf test UT, which covers multiple resource types.

One suggestion from me: wouldn't it be better to cut the branch from branch-2
rather than 2.9?




[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-04-25 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16453121#comment-16453121
 ] 

Konstantin Shvachko commented on YARN-8200:
---

Cut a branch for this jira out of branch-2.9. [~jhung], could you please merge
your backports there?
[~sunilg], [~templedf], could you please advise on the tools for measuring the
performance impact on the Capacity Scheduler and Resource Manager?




[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-04-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451385#comment-16451385
 ] 

Wangda Tan commented on YARN-8200:
--

+1 to having a branch for this, so that we can more easily tell which patches
got backported.




[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-04-24 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451384#comment-16451384
 ] 

Konstantin Shvachko commented on YARN-8200:
---

What do people think about creating a branch so that Jonathan can apply his
backporting work?
That way we can make this discussion more material.
Also, you guys will be able to try it and see if it fits your requirements.




[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-04-24 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450136#comment-16450136
 ] 

Daniel Templeton commented on YARN-8200:


The challenge in backporting resource types into 3.0 was mostly just in 
splitting resource types from resource profiles.  Otherwise it wasn't bad.  But 
that was a pull from 3.x into 3.0.  Going back into 2.x will be much trickier.  
The code that resource types touches is code that you want to handle very, very 
carefully because it's at the core of what the resource manager does.

I don't think it's a good idea to pull resource types back into 2.x.  Resource 
types represent a major change to the way the resource manager functions.  It's 
not something that's appropriate for a minor release.  In fact, I would argue 
that resource types is one of the scarier changes in 3.0, so if you're willing 
to take on that risk in 2.x, you're probably better served just moving to 3.0.

> Backport resource types/GPU features to branch-2
> 
>
> Key: YARN-8200
> URL: https://issues.apache.org/jira/browse/YARN-8200
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Jonathan Hung
>Assignee: Jonathan Hung
>Priority: Major
>
> Currently we have a need for GPU scheduling on our YARN clusters to support 
> deep learning workloads. However, our main production clusters are running 
> older versions of branch-2 (2.7 in our case). To prevent supporting too many 
> very different hadoop versions across multiple clusters, we would like to 
> backport the resource types/resource profiles feature to branch-2, as well as 
> the GPU specific support.
>  
> We have done a trial backport of YARN-3926 and some miscellaneous patches in 
> YARN-7069 based on issues we uncovered, and the backport was fairly smooth. 
> We also did a trial backport of most of YARN-6223 (sans docker support).
>  
> Regarding the backports, perhaps we can do the development in a feature 
> branch and then merge to branch-2 when ready.






[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-04-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449312#comment-16449312
 ] 

Wangda Tan commented on YARN-8200:
--

[~chris.douglas], as [~sunilg] has already pointed out, the multiple resource 
type backport could be very tricky. IIRC, [~templedf] spent a lot of time 
backporting from trunk to branch-3.0 last year, and the backport caused several 
issues. The branches have since diverged further: trunk has gained more changes 
(about 5+ months' worth), including many scheduler-related changes.

[~shv], I understand you want a bridge release. I'm still +1 to have a 2.x 
bridge release and backporting GPU related changes to branch-2. But it might be 
worthwhile to look at the 3.x release and fix migration issues so all users who 
want to migrate to 3.x can benefit from such efforts. Just my $0.02.




[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-04-23 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449284#comment-16449284
 ] 

Sunil G commented on YARN-8200:
---

Hi [~jhung]

YARN-3926 has two parts: supporting multiple resource types, and resource 
profiles. Due to the complex nature of resource profiles, Hadoop-3.0 still 
contains only resource types, which is one part of YARN-3926. Moreover, there 
is a chance that CS performance could be impacted. For 3.0 and 3.1, there were 
many changes to the core scheduler compared to branch-2. The work was easier 
there because the whole resource types development happened on trunk-based 
code, so we could cross-check the impacted areas well. In branch-2, there may 
be a greater chance of performance impact, as the code differences between 3.0 
and branch-2.10 are quite large.




[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-04-23 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449023#comment-16449023
 ] 

Chris Douglas commented on YARN-8200:
-

bq. I would suggest to try use 3.x instead back porting this to 2.x so 
everybody is on the same codebase and improvement it. To me, the effort of 
backporting YARN-3926 + YARN-6223 will be comparable to upgrading a 3.x release 
and fixing (incompatible) issues
From [~jhung]'s analysis, the backports were relatively straightforward 
(mostly new code). Keeping it in sync with fixes/improvements in 3.x will 
require ongoing maintenance, which is unfortunate. Are there specific areas 
where you suspect the backport could become difficult to maintain?




[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-04-23 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449000#comment-16449000
 ] 

Konstantin Shvachko commented on YARN-8200:
---

Hey [~leftnoteasy], we discussed it in [this 
thread|https://lists.apache.org/thread.html/6e200891756aefbfd8b36cd1d9f22f99626284b656671ab719ee1496@%3Chdfs-dev.hadoop.apache.org%3E]
 some time ago.
Clearly we want everybody on the same code base, but it's a challenge to get 
there. So the thread proposed to build a bridge release, to help cross over to 
3.




[jira] [Commented] (YARN-8200) Backport resource types/GPU features to branch-2

2018-04-23 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448943#comment-16448943
 ] 

Wangda Tan commented on YARN-8200:
--

[~jhung], I would suggest trying 3.x instead of backporting this to 2.x, so 
everybody is on the same codebase and can improve it. To me, the effort of 
backporting YARN-3926 + YARN-6223 will be comparable to upgrading to a 3.x 
release and fixing (incompatible) issues. Both of the features are more than 
0.5 MB and change many files.

I'm fine with backporting this to branch-2, but backporting itself could be 
very tricky.
