[jira] [Commented] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-11-19 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692696#comment-16692696 ] Zhankun Tang commented on YARN-8714: [~liuxun323], [~leftnoteasy], [~yuan_zac]. Updated a new version.

[jira] [Comment Edited] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-11-19 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692696#comment-16692696 ] Zhankun Tang edited comment on YARN-8714 at 11/20/18 5:58 AM: -- [~sunilg],

[jira] [Comment Edited] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-11-19 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692696#comment-16692696 ] Zhankun Tang edited comment on YARN-8714 at 11/20/18 5:55 AM: -- [~liuxun323],

[jira] [Updated] (YARN-8882) Phase 1 - Add a shared device mapping manager for device plugin to use

2018-11-20 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8882: --- Attachment: YARN-8882-trunk.004.patch > Phase 1 - Add a shared device mapping manager for device

[jira] [Updated] (YARN-8882) Phase 1 - Add a shared device mapping manager for device plugin to use

2018-11-20 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8882: --- Attachment: YARN-8882-trunk.007.patch > Phase 1 - Add a shared device mapping manager for device

[jira] [Commented] (YARN-8882) Phase 1 - Add a shared device mapping manager for device plugin to use

2018-11-21 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694457#comment-16694457 ] Zhankun Tang commented on YARN-8882: [~goyal.sunil] , [~leftnoteasy], Could you please help to review?

[jira] [Comment Edited] (YARN-8882) Phase 1 - Add a shared device mapping manager for device plugin to use

2018-11-21 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694457#comment-16694457 ] Zhankun Tang edited comment on YARN-8882 at 11/21/18 9:28 AM: -- [~sunilg] ,

[jira] [Commented] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-11-21 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694779#comment-16694779 ] Zhankun Tang commented on YARN-8714: [~liuxun323] , One question from me. Do you need to support

[jira] [Updated] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-11-22 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8714: --- Attachment: YARN-8714-trunk.002.patch > [Submarine] Support files/tarballs to be localized for a

[jira] [Updated] (YARN-8887) Support isolation in pluggable device framework

2018-11-22 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8887: --- Description: Device isolation needs a complete description in the API specs (DeviceRuntimeSpec) and a

[jira] [Commented] (YARN-9042) Javadoc error in deviceplugin package

2018-11-22 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695693#comment-16695693 ] Zhankun Tang commented on YARN-9042: [~rohithsharma] , the patch is uploaded. Please help to review.

[jira] [Commented] (YARN-8882) Phase 1 - Add a shared device mapping manager for device plugin to use

2018-11-21 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695504#comment-16695504 ] Zhankun Tang commented on YARN-8882: [~leftnoteasy] , Yeah. And it will include a list of device

[jira] [Updated] (YARN-9042) Javadoc error in deviceplugin package

2018-11-22 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9042: --- Attachment: YARN-9042-trunk.001.patch > Javadoc error in deviceplugin package >

[jira] [Updated] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-11-21 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8714: --- Description: See

[jira] [Assigned] (YARN-9042) Javadoc error in deviceplugin package

2018-11-21 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang reassigned YARN-9042: -- Assignee: Zhankun Tang > Javadoc error in deviceplugin package >

[jira] [Commented] (YARN-9042) Javadoc error in deviceplugin package

2018-11-21 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695507#comment-16695507 ] Zhankun Tang commented on YARN-9042: Thanks [~rohithsharma] for pointing this out. I'll fix it. :) >

[jira] [Updated] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-11-21 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8714: --- Attachment: YARN-8714-trunk.001.patch > [Submarine] Support files/tarballs to be localized for a

[jira] [Commented] (YARN-5106) Provide a builder interface for FairScheduler allocations for use in tests

2018-11-20 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692955#comment-16692955 ] Zhankun Tang commented on YARN-5106: Thanks, [~snemeth] . +1, This LGTM. > Provide a builder

[jira] [Updated] (YARN-8882) Phase 1 - Add a shared device mapping manager for device plugin to use

2018-11-20 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8882: --- Description: Since a few devices use a FIFO policy to assign devices to the container, we use a shared

[jira] [Created] (YARN-9059) Support RESTful API in NM for query FPGA allocation

2018-11-26 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9059: -- Summary: Support RESTful API in NM for query FPGA allocation Key: YARN-9059 URL: https://issues.apache.org/jira/browse/YARN-9059 Project: Hadoop YARN Issue

[jira] [Created] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation in native container-executor

2018-11-26 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9060: -- Summary: [YARN-8851] Phase 1 - Support device isolation in native container-executor Key: YARN-9060 URL: https://issues.apache.org/jira/browse/YARN-9060 Project: Hadoop

[jira] [Updated] (YARN-8851) [Umbrella] A pluggable device plugin framework to ease vendor plugin development

2018-11-26 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8851: --- Summary: [Umbrella] A pluggable device plugin framework to ease vendor plugin development (was:

[jira] [Updated] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation in native container-executor

2018-11-26 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9060: --- Attachment: YARN-9060-trunk.001.patch > [YARN-8851] Phase 1 - Support device isolation in native

[jira] [Commented] (YARN-8822) Nvidia-docker v2 support

2018-11-27 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700056#comment-16700056 ] Zhankun Tang commented on YARN-8822: [~Charo Zhang], Thanks for the patch! It looks good to me.

[jira] [Updated] (YARN-9061) Improve the GPU/FPGA module log message of container-executor

2018-11-26 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9061: --- Attachment: YARN-9061-trunk.001.patch > Improve the GPU/FPGA module log message of container-executor

[jira] [Updated] (YARN-9061) Improve the GPU/FPGA module log message of container-executor

2018-11-26 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9061: --- Attachment: YARN-9061-trunk.002.patch > Improve the GPU/FPGA module log message of container-executor

[jira] [Created] (YARN-9061) Improve the GPU/FPGA module log message of container-executor

2018-11-26 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9061: -- Summary: Improve the GPU/FPGA module log message of container-executor Key: YARN-9061 URL: https://issues.apache.org/jira/browse/YARN-9061 Project: Hadoop YARN

[jira] [Commented] (YARN-8822) Nvidia-docker v2 support

2018-11-26 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698902#comment-16698902 ] Zhankun Tang commented on YARN-8822: [~Charo Zhang] , Thanks for the patch. Since a new pluggable

[jira] [Commented] (YARN-8822) Nvidia-docker v2 support

2018-11-26 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16699114#comment-16699114 ] Zhankun Tang commented on YARN-8822: [~Charo Zhang] , the patch name should be like

[jira] [Updated] (YARN-8820) [Umbrella] GPU support on YARN - Phase 2

2018-11-26 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8820: --- Description: In YARN-6223, we've done a basic support for Nvidia GPU on YARN including resource

[jira] [Assigned] (YARN-8823) Monitor the healthy state of GPU

2018-11-26 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang reassigned YARN-8823: -- Assignee: Zhankun Tang > Monitor the healthy state of GPU > >

[jira] [Updated] (YARN-8821) GPU hierarchy scheduling support

2018-11-26 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8821: --- Description: GPU topology affects performance dramatically. There's been a discussion in YARN-7481.

[jira] [Created] (YARN-9033) ResourceHandlerChain#bootstrap is invoked twice during NM start if LinuxContainerExecutor enabled

2018-11-17 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9033: -- Summary: ResourceHandlerChain#bootstrap is invoked twice during NM start if LinuxContainerExecutor enabled Key: YARN-9033 URL: https://issues.apache.org/jira/browse/YARN-9033

[jira] [Updated] (YARN-9033) ResourceHandlerChain#bootstrap is invoked twice during NM start if LinuxContainerExecutor enabled

2018-11-17 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9033: --- Description: The ResourceHandlerChain#bootstrap will always be invoked in NM's

[jira] [Updated] (YARN-8823) Monitor the healthy state of GPU

2018-11-26 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8823: --- Description: We have GPU resource discovered when the NM bootstrap but not updated through later

[jira] [Assigned] (YARN-8821) GPU hierarchy scheduling support

2018-11-26 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang reassigned YARN-8821: -- Assignee: Zhankun Tang > GPU hierarchy scheduling support > >

[jira] [Created] (YARN-9190) [Submarine] Submarine job will fail to run as a first job on a new created Hadoop 3.2.0 RC1

2019-01-10 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9190: -- Summary: [Submarine] Submarine job will fail to run as a first job on a new created Hadoop 3.2.0 RC1 Key: YARN-9190 URL: https://issues.apache.org/jira/browse/YARN-9190

[jira] [Updated] (YARN-9190) [Submarine] Submarine job will fail to run as a first job on a new created Hadoop 3.2.0 RC1 cluster

2019-01-10 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9190: --- Summary: [Submarine] Submarine job will fail to run as a first job on a new created Hadoop 3.2.0 RC1

[jira] [Commented] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-01-07 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735941#comment-16735941 ] Zhankun Tang commented on YARN-8927: A draft patch (WIP). Please comment if this is the wrong direction.

[jira] [Updated] (YARN-9168) DistributedShell client timeout should be -1 by default

2019-01-07 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9168: --- Attachment: YARN-9168-trunk.001.patch > DistributedShell client timeout should be -1 by default >

[jira] [Commented] (YARN-9168) DistributedShell client timeout should be -1 by default

2019-01-07 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735902#comment-16735902 ] Zhankun Tang commented on YARN-9168: [~cheersyang] , Yeah. Agree. Please take a look at the patch. I

[jira] [Updated] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-01-07 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8927: --- Summary: Support trust top-level image like "centos" when "library" is configured in

[jira] [Updated] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-01-07 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8927: --- Attachment: YARN-8927-trunk.001.patch > Support trust top-level image like "centos" when "library" is

[jira] [Updated] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-01-07 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8927: --- Attachment: YARN-8927-trunk.002.patch > Support trust top-level image like "centos" when "library" is

[jira] [Commented] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-01-07 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736650#comment-16736650 ] Zhankun Tang commented on YARN-8927: [~eyang] , Thanks for the review! Yeah, it doesn't consider the

[jira] [Commented] (YARN-9176) [Submarine] Repair 404 error of links in documentation

2019-01-04 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733924#comment-16733924 ] Zhankun Tang commented on YARN-9176: [~hongdd] , Thanks for raising this. Could you post a screenshot

[jira] [Created] (YARN-9168) DistributedShell client timeout should be -1 by default

2019-01-02 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9168: -- Summary: DistributedShell client timeout should be -1 by default Key: YARN-9168 URL: https://issues.apache.org/jira/browse/YARN-9168 Project: Hadoop YARN Issue

[jira] [Commented] (YARN-9190) [Submarine] Submarine job will fail to run as a first job on a new created Hadoop 3.2.0 RC1 cluster

2019-01-10 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739899#comment-16739899 ] Zhankun Tang commented on YARN-9190: [~billie.rinaldi] , [~eyang] , [~csingh] . Do you know which

[jira] [Updated] (YARN-9190) [Submarine] Submarine job will fail to run as a first job on a new created Hadoop 3.2.0 RC1 cluster

2019-01-10 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9190: --- Description: This issue was found when verifying submarine in Hadoop 3.2.0 RC1 planning. The

[jira] [Commented] (YARN-9190) [Submarine] Submarine job will fail to run as a first job on a new created Hadoop 3.2.0 RC1 cluster

2019-01-13 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741752#comment-16741752 ] Zhankun Tang commented on YARN-9190: [~billie.rinaldi] , Thanks for the reply! One thing I forget to

[jira] [Updated] (YARN-8725) Submarine job staging directory has a lot of useless PRIMARY_WORKER-launch-script-***.sh scripts when submitting a job multiple times

2018-09-17 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8725: --- Attachment: YARN-8725-trunk.001.patch > Submarine job staging directory has a lot of useless >

[jira] [Commented] (YARN-8725) Submarine job staging directory has a lot of useless PRIMARY_WORKER-launch-script-***.sh scripts when submitting a job multiple times

2018-09-17 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617600#comment-16617600 ] Zhankun Tang commented on YARN-8725: Added a patch which does following: # add a new option

[jira] [Comment Edited] (YARN-8725) Submarine job staging directory has a lot of useless PRIMARY_WORKER-launch-script-***.sh scripts when submitting a job multiple times

2018-09-17 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617600#comment-16617600 ] Zhankun Tang edited comment on YARN-8725 at 9/17/18 2:36 PM: - Added a patch

[jira] [Commented] (YARN-7715) Support NM promotion/demotion of running containers.

2018-12-10 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714443#comment-16714443 ] Zhankun Tang commented on YARN-7715: [~miklos.szeg...@cloudera.com], [~asuresh], Is this JIRA depend

[jira] [Updated] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation in native container-executor

2018-12-10 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9060: --- Attachment: YARN-9060-trunk.004.patch > [YARN-8851] Phase 1 - Support device isolation in native

[jira] [Commented] (YARN-9099) GpuResourceAllocator.getReleasingGpus calculates number of GPUs in a wrong way

2018-12-10 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714513#comment-16714513 ] Zhankun Tang commented on YARN-9099: [~snemeth], Thanks for catching this! The patch looks good to

[jira] [Created] (YARN-9104) Fix the bug in DeviceMappingManager#getReleasingDevices

2018-12-10 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9104: -- Summary: Fix the bug in DeviceMappingManager#getReleasingDevices Key: YARN-9104 URL: https://issues.apache.org/jira/browse/YARN-9104 Project: Hadoop YARN Issue

[jira] [Created] (YARN-9103) Fix the bug in DeviceMappingManager#getReleasingDevices

2018-12-10 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9103: -- Summary: Fix the bug in DeviceMappingManager#getReleasingDevices Key: YARN-9103 URL: https://issues.apache.org/jira/browse/YARN-9103 Project: Hadoop YARN Issue

[jira] [Resolved] (YARN-9104) Fix the bug in DeviceMappingManager#getReleasingDevices

2018-12-10 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang resolved YARN-9104. Resolution: Duplicate Resolved because JIRA duplicated the issue creation > Fix the bug in

[jira] [Commented] (YARN-9033) ResourceHandlerChain#bootstrap is invoked twice during NM start if LinuxContainerExecutor enabled

2018-12-19 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725010#comment-16725010 ] Zhankun Tang commented on YARN-9033: [~snemeth], thanks for looking at this.  {quote}"But actually,

[jira] [Updated] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation in native container-executor

2019-01-26 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9060: --- Attachment: YARN-9060-trunk.012.patch > [YARN-8851] Phase 1 - Support device isolation in native

[jira] [Updated] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation in native container-executor

2019-01-24 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9060: --- Attachment: YARN-9060-trunk.011.patch > [YARN-8851] Phase 1 - Support device isolation in native

[jira] [Updated] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation in native container-executor

2019-01-24 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9060: --- Attachment: YARN-9060-trunk.010.patch > [YARN-8851] Phase 1 - Support device isolation in native

[jira] [Updated] (YARN-8821) GPU hierarchy/topology scheduling support

2019-01-26 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8821: --- Attachment: YARN-8821-trunk.001.patch > GPU hierarchy/topology scheduling support >

[jira] [Updated] (YARN-8821) GPU hierarchy/topology scheduling support

2019-01-26 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8821: --- Description: GPU topology affects performance dramatically. There's been a discussion in YARN-7481.

[jira] [Commented] (YARN-9205) When using custom resource type, application will fail to run due to the CapacityScheduler throws InvalidResourceRequestException(GREATER_THEN_MAX_ALLOCATION)

2019-01-22 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748839#comment-16748839 ] Zhankun Tang commented on YARN-9205: [~sunilg], [~leftnoteasy] , The v08.patch is the latest patch

[jira] [Updated] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation in native container-executor

2019-01-27 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9060: --- Description: Due to the cgroups v1 implementation policy in linux kernel, we cannot update the value 

[jira] [Updated] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation and use the Nvidia GPU plugin as an example

2019-01-27 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9060: --- Summary: [YARN-8851] Phase 1 - Support device isolation and use the Nvidia GPU plugin as an example

[jira] [Updated] (YARN-8821) GPU hierarchy/topology scheduling support

2019-02-18 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8821: --- Attachment: YARN-8821-trunk.007.patch > GPU hierarchy/topology scheduling support >

[jira] [Updated] (YARN-8821) GPU hierarchy/topology scheduling support

2019-02-18 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8821: --- Description: h2. Background GPU topology affects performance. There's been a discussion in

[jira] [Updated] (YARN-8821) GPU hierarchy/topology scheduling support

2019-02-18 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8821: --- Attachment: GPUTopologyPerformance.png > GPU hierarchy/topology scheduling support >

[jira] [Updated] (YARN-8821) GPU hierarchy/topology scheduling support

2019-02-18 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8821: --- Description: h2. Background GPU topology affects performance. There's been a discussion in

[jira] [Updated] (YARN-8821) GPU hierarchy/topology scheduling support

2019-02-18 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8821: --- Description: h2. Background GPU topology affects performance. There's been a discussion in

[jira] [Updated] (YARN-8821) GPU hierarchy/topology scheduling support

2019-02-18 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8821: --- Attachment: YARN-8821-trunk.008.patch > GPU hierarchy/topology scheduling support >

[jira] [Updated] (YARN-8821) GPU hierarchy/topology scheduling support

2019-02-18 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8821: --- Attachment: YARN-8821-trunk.009.patch > GPU hierarchy/topology scheduling support >

[jira] [Updated] (YARN-8821) GPU hierarchy/topology scheduling support

2019-02-18 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8821: --- Description: h2. Background GPU topology affects performance. There's been a discussion in

[jira] [Updated] (YARN-8821) GPU hierarchy/topology scheduling support

2019-02-18 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8821: --- Description: h2. Background GPU topology affects performance. There's been a discussion in

[jira] [Updated] (YARN-8821) [YARN-8851] GPU hierarchy/topology scheduling support based on pluggable device framework

2019-02-18 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8821: --- Summary: [YARN-8851] GPU hierarchy/topology scheduling support based on pluggable device framework

[jira] [Commented] (YARN-8821) [YARN-8851] GPU hierarchy/topology scheduling support based on pluggable device framework

2019-02-24 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776288#comment-16776288 ] Zhankun Tang commented on YARN-8821: [~sunilg] , [~Weiwei Yang] , Thanks for the review! >

[jira] [Commented] (YARN-9331) [YARN-8851] Fix a bug that lacking cgroup initialization when bootstrap DeviceResourceHandlerImpl

2019-02-25 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16777491#comment-16777491 ] Zhankun Tang commented on YARN-9331: [~cheersyang], Thanks a lot! > [YARN-8851] Fix a bug that

[jira] [Updated] (YARN-9331) [YARN-8851] Fix a bug that lacking cgroup initialization when bootstrap DeviceResourceHandlerImpl

2019-02-25 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9331: --- Summary: [YARN-8851] Fix a bug that lacking cgroup initialization when bootstrap

[jira] [Resolved] (YARN-8887) Support isolation in pluggable device framework

2019-02-19 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang resolved YARN-8887. Resolution: Duplicate Resolved as a duplicate of YARN-9060 > Support isolation in pluggable

[jira] [Resolved] (YARN-8889) Add well-defined interface in container-executor to support vendor plugins isolation request

2019-02-19 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang resolved YARN-8889. Resolution: Duplicate Resolve this as already implemented in YARN-9060 > Add well-defined

[jira] [Resolved] (YARN-9103) Fix the bug in DeviceMappingManager#getReleasingDevices

2019-02-19 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang resolved YARN-9103. Resolution: Won't Fix Resolve it as it is fixed in YARN-9060 > Fix the bug in

[jira] [Resolved] (YARN-8888) Support device topology scheduling

2019-02-19 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang resolved YARN-8888. Resolution: Won't Fix Resolve it due to the GPU topology algorithm is better implemented in the

[jira] [Commented] (YARN-8821) [YARN-8851] GPU hierarchy/topology scheduling support based on pluggable device framework

2019-02-19 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771683#comment-16771683 ] Zhankun Tang commented on YARN-8821: The unit test seems unrelated to this patch. > [YARN-8851] GPU

[jira] [Resolved] (YARN-8883) Phase 1 - Provide an example of fake vendor plugin

2019-02-19 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang resolved YARN-8883. Resolution: Duplicate Resolve it due to the YARN-9060 has an example of Nvidia GPU plugin > Phase

[jira] [Updated] (YARN-9319) YARN-9060 does not compile

2019-02-21 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9319: --- Attachment: YARN-9319-trunk.001.patch > YARN-9060 does not compile > -- > >

[jira] [Commented] (YARN-9319) YARN-9060 does not compile

2019-02-21 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774212#comment-16774212 ] Zhankun Tang commented on YARN-9319: [~jojochuang] , it's caused by a compiler inconsistent behavior

[jira] [Assigned] (YARN-9319) YARN-9060 does not compile

2019-02-21 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang reassigned YARN-9319: -- Assignee: Zhankun Tang > YARN-9060 does not compile > -- > >

[jira] [Comment Edited] (YARN-9319) YARN-9060 does not compile

2019-02-21 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774212#comment-16774212 ] Zhankun Tang edited comment on YARN-9319 at 2/21/19 3:52 PM: - [~jojochuang] ,

[jira] [Comment Edited] (YARN-9319) YARN-9060 does not compile

2019-02-21 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774212#comment-16774212 ] Zhankun Tang edited comment on YARN-9319 at 2/21/19 3:52 PM: - [~jojochuang] ,

[jira] [Resolved] (YARN-8890) Port existing GPU module into pluggable device framework

2019-03-06 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang resolved YARN-8890. Resolution: Duplicate Since we have a sample Nvidia GPU plugin merged in YARN-9060. Need not to do

[jira] [Commented] (YARN-8891) Documentation of the pluggable device framework

2019-02-22 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16775067#comment-16775067 ] Zhankun Tang commented on YARN-8891: [~sunilg] , the Jenkins result is ok too. Could you help to merge

[jira] [Commented] (YARN-8891) Documentation of the pluggable device framework

2019-02-21 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774833#comment-16774833 ] Zhankun Tang commented on YARN-8891: [~sunilg] , All great suggestions. Thanks a lot! Fixed all above

[jira] [Updated] (YARN-8891) Documentation of the pluggable device framework

2019-02-21 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8891: --- Attachment: YARN-8891-trunk.007.patch > Documentation of the pluggable device framework >

[jira] [Commented] (YARN-9121) Users of GpuDiscoverer.getInstance() are not possible to test as instance is a static field

2019-02-24 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776499#comment-16776499 ] Zhankun Tang commented on YARN-9121: [~snemeth] , thanks for the patch. Looks good to me. +1 > Users

[jira] [Updated] (YARN-9331) Fix a bug that lacking cgroup initialization when bootstrap DeviceResourceHandlerImpl

2019-02-25 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9331: --- Attachment: YARN-9331-trunk.001.patch > Fix a bug that lacking cgroup initialization when bootstrap

[jira] [Created] (YARN-9331) Fix a bug that lacking cgroup initialization when bootstrap DeviceResourceHandlerImpl

2019-02-25 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9331: -- Summary: Fix a bug that lacking cgroup initialization when bootstrap DeviceResourceHandlerImpl Key: YARN-9331 URL: https://issues.apache.org/jira/browse/YARN-9331

[jira] [Commented] (YARN-9156) [YARN-8851] Improve debug message in device plugin method compatibility check of ResourcePluginManager

2019-02-21 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774684#comment-16774684 ] Zhankun Tang commented on YARN-9156: [~cheersyang] , could you please help to review this? Thanks. >
