[jira] [Created] (YARN-9190) [Submarine] Submarine job will fail to run as a first job on a new created Hadoop 3.2.0 RC1

2019-01-10 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9190: -- Summary: [Submarine] Submarine job will fail to run as a first job on a new created Hadoop 3.2.0 RC1 Key: YARN-9190 URL: https://issues.apache.org/jira/browse/YARN-9190

[jira] [Updated] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-01-07 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8927: --- Attachment: YARN-8927-trunk.002.patch > Support trust top-level image like "centos" when "library" is

[jira] [Commented] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-01-07 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736650#comment-16736650 ] Zhankun Tang commented on YARN-8927: [~eyang] , Thanks for the review! Yeah, it doesn't consider the

[jira] [Commented] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-01-07 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735941#comment-16735941 ] Zhankun Tang commented on YARN-8927: A draft patch (WIP). Please comment if this is the wrong direction.

[jira] [Updated] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-01-07 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8927: --- Attachment: YARN-8927-trunk.001.patch > Support trust top-level image like "centos" when "library" is

[jira] [Updated] (YARN-8927) Support trust top-level image like "centos" when "library" is configured in "docker.trusted.registries"

2019-01-07 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8927: --- Summary: Support trust top-level image like "centos" when "library" is configured in

[jira] [Updated] (YARN-9168) DistributedShell client timeout should be -1 by default

2019-01-07 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9168: --- Attachment: YARN-9168-trunk.001.patch > DistributedShell client timeout should be -1 by default >

[jira] [Commented] (YARN-9168) DistributedShell client timeout should be -1 by default

2019-01-07 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735902#comment-16735902 ] Zhankun Tang commented on YARN-9168: [~cheersyang] , Yeah. Agree. Please take a look at the patch. I

[jira] [Commented] (YARN-9176) [Submarine] Repair 404 error of links in documentation

2019-01-04 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733924#comment-16733924 ] Zhankun Tang commented on YARN-9176: [~hongdd] , Thanks for raising this. Could you post a screenshot

[jira] [Resolved] (YARN-9172) Correct the typo related to "DominantResourceCalculator" in error message of CapacityScheduler when resource types is more than two

2019-01-02 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang resolved YARN-9172. Resolution: Fixed Already fixed in trunk. Closed this JIRA. > Correct the typo related to

[jira] [Updated] (YARN-9172) Correct the typo related to "DominantResourceCalculator" in error message of CapacityScheduler when resource types is more than two

2019-01-02 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9172: --- Summary: Correct the typo related to "DominantResourceCalculator" in error message of

[jira] [Created] (YARN-9172) Correct the typo "DominantResourceCalculator" in CapacityScheduler

2019-01-02 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9172: -- Summary: Correct the typo "DominantResourceCalculator" in CapacityScheduler Key: YARN-9172 URL: https://issues.apache.org/jira/browse/YARN-9172 Project: Hadoop YARN

[jira] [Commented] (YARN-9168) DistributedShell client timeout should be -1 by default

2019-01-02 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732582#comment-16732582 ] Zhankun Tang commented on YARN-9168: [~cheersyang], Yeah. Thanks for reviewing this. I'm fine with the

[jira] [Created] (YARN-9168) DistributedShell client timeout should be -1 by default

2019-01-02 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9168: -- Summary: DistributedShell client timeout should be -1 by default Key: YARN-9168 URL: https://issues.apache.org/jira/browse/YARN-9168 Project: Hadoop YARN Issue

[jira] [Created] (YARN-9167) [Submarine] Support fault tolerance when Tensorflow worker container fails

2019-01-01 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9167: -- Summary: [Submarine] Support fault tolerance when Tensorflow worker container fails Key: YARN-9167 URL: https://issues.apache.org/jira/browse/YARN-9167 Project: Hadoop

[jira] [Updated] (YARN-9160) [Submarine] Document "PYTHONPATH" environment variable setting when using -localization options

2018-12-27 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9160: --- Summary: [Submarine] Document "PYTHONPATH" environment variable setting when using -localization

[jira] [Updated] (YARN-9160) Document "PYTHONPATH" environment variable setting when using -localization options

2018-12-27 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9160: --- Attachment: YARN-9160-trunk.001.patch > Document "PYTHONPATH" environment variable setting when using

[jira] [Updated] (YARN-9160) Document "PYTHONPATH" environment variable setting when using -localization options

2018-12-27 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9160: --- Summary: Document "PYTHONPATH" environment variable setting when using -localization options (was:

[jira] [Created] (YARN-9160) Add document for "PYTHONPATH" environment variable setting when using -localization options

2018-12-27 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9160: -- Summary: Add document for "PYTHONPATH" environment variable setting when using -localization options Key: YARN-9160 URL: https://issues.apache.org/jira/browse/YARN-9160

[jira] [Updated] (YARN-9156) [YARN-8851] Improve debug message in device plugin method compatibility check of ResourcePluginManager

2018-12-24 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9156: --- Attachment: YARN-9156-trunk.001.patch > [YARN-8851] Improve debug message in device plugin method

[jira] [Updated] (YARN-9156) [YARN-8851] Improve debug message in device plugin method compatibility check of ResourcePluginManager

2018-12-24 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9156: --- Summary: [YARN-8851] Improve debug message in device plugin method compatibility check of

[jira] [Created] (YARN-9156) [YARN-8851] Improve device plugin method compatibility debug message in ResourcePluginManager

2018-12-24 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9156: -- Summary: [YARN-8851] Improve device plugin method compatibility debug message in ResourcePluginManager Key: YARN-9156 URL: https://issues.apache.org/jira/browse/YARN-9156

[jira] [Commented] (YARN-9033) ResourceHandlerChain#bootstrap is invoked twice during NM start if LinuxContainerExecutor enabled

2018-12-19 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725010#comment-16725010 ] Zhankun Tang commented on YARN-9033: [~snemeth], thanks for looking at this.  {quote}"But actually,

[jira] [Commented] (YARN-9120) Need to have a way to turn off GPU auto-discovery in GpuDiscoverer

2018-12-17 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722821#comment-16722821 ] Zhankun Tang commented on YARN-9120: [~pbacsko], talked with [~rohithsharma] offline. The Ambari can

[jira] [Commented] (YARN-8822) Nvidia-docker v2 support

2018-12-16 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722453#comment-16722453 ] Zhankun Tang commented on YARN-8822: I verified in a GPU server with DS running cgroup and Nvidia

[jira] [Commented] (YARN-9120) Need to have a way to turn off GPU auto-discovery in GpuDiscoverer

2018-12-16 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16722443#comment-16722443 ] Zhankun Tang commented on YARN-9120: [~snemeth], I double-checked that if we remove "yarn.io/gpu" from

[jira] [Updated] (YARN-8822) Nvidia-docker v2 support

2018-12-16 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8822: --- Attachment: Nv2-3.png > Nvidia-docker v2 support > > > Key:

[jira] [Updated] (YARN-8822) Nvidia-docker v2 support

2018-12-16 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8822: --- Attachment: Nv2-2.png > Nvidia-docker v2 support > > > Key:

[jira] [Updated] (YARN-8822) Nvidia-docker v2 support

2018-12-16 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8822: --- Attachment: Nv2-1.png > Nvidia-docker v2 support > > > Key:

[jira] [Commented] (YARN-9120) Need to have a way to turn off GPU auto-discovery in GpuDiscoverer

2018-12-14 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721237#comment-16721237 ] Zhankun Tang commented on YARN-9120: {quote}1. Could you please confirm whether only removing the

[jira] [Comment Edited] (YARN-9120) Need to have a way to turn off GPU auto-discovery in GpuDiscoverer

2018-12-13 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721027#comment-16721027 ] Zhankun Tang edited comment on YARN-9120 at 12/14/18 7:46 AM: -- [~snemeth] , 

[jira] [Commented] (YARN-9120) Need to have a way to turn off GPU auto-discovery in GpuDiscoverer

2018-12-13 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721027#comment-16721027 ] Zhankun Tang commented on YARN-9120: [~snemeth] , Thanks for the explanation! Agree that it's valuable

[jira] [Updated] (YARN-9113) [Submarine] Add proper shutdown hook when user interrupts the client

2018-12-13 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9113: --- Attachment: YARN-9113-trunk.002.patch > [Submarine] Add proper shutdown hook when user interrupts the

[jira] [Updated] (YARN-9122) Add table of contents to YARN Service API document

2018-12-13 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9122: --- Attachment: s1.png > Add table of contents to YARN Service API document >

[jira] [Commented] (YARN-9122) Add table of contents to YARN Service API document

2018-12-13 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720975#comment-16720975 ] Zhankun Tang commented on YARN-9122: [~gsaha], happened to be compiling the website. A patch for your

[jira] [Updated] (YARN-9122) Add table of contents to YARN Service API document

2018-12-13 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9122: --- Attachment: YARN-9122-trunk.001.patch > Add table of contents to YARN Service API document >

[jira] [Commented] (YARN-9120) Need to have a way to turn off GPU auto-discovery in GpuDiscoverer

2018-12-13 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720929#comment-16720929 ] Zhankun Tang commented on YARN-9120: [~snemeth] , I'm not quite clear on the requirement here. Are we

[jira] [Comment Edited] (YARN-9120) Need to have a way to turn off GPU auto-discovery in GpuDiscoverer

2018-12-13 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720929#comment-16720929 ] Zhankun Tang edited comment on YARN-9120 at 12/14/18 5:15 AM: -- [~snemeth] , 

[jira] [Comment Edited] (YARN-9120) Need to have a way to turn off GPU auto-discovery in GpuDiscoverer

2018-12-13 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720929#comment-16720929 ] Zhankun Tang edited comment on YARN-9120 at 12/14/18 5:14 AM: -- [~snemeth] , 

[jira] [Comment Edited] (YARN-9120) Need to have a way to turn off GPU auto-discovery in GpuDiscoverer

2018-12-13 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720929#comment-16720929 ] Zhankun Tang edited comment on YARN-9120 at 12/14/18 5:12 AM: -- [~snemeth] , 

[jira] [Commented] (YARN-9033) ResourceHandlerChain#bootstrap is invoked twice during NM start if LinuxContainerExecutor enabled

2018-12-12 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718643#comment-16718643 ] Zhankun Tang commented on YARN-9033: [~haibochen] , [~cheersyang]. If you have a chance, could you help to

[jira] [Updated] (YARN-9113) [Submarine] Add proper shutdown hook when user interrupts the client

2018-12-12 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9113: --- Attachment: YARN-9113-trunk.001.patch > [Submarine] Add proper shutdown hook when user interrupts the

[jira] [Updated] (YARN-9113) [Submarine] Add proper shutdown hook when user interrupts the client

2018-12-12 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9113: --- Description: When the job run with "wait_job_finish" option and then the user "ctrl+c" the job. The

[jira] [Created] (YARN-9113) [Submarine] Add proper shutdown hook when user interrupts the client

2018-12-11 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9113: -- Summary: [Submarine] Add proper shutdown hook when user interrupts the client Key: YARN-9113 URL: https://issues.apache.org/jira/browse/YARN-9113 Project: Hadoop YARN

[jira] [Commented] (YARN-9112) [Submarine] Support polling applicationId when it's not ready in cluster

2018-12-11 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718431#comment-16718431 ] Zhankun Tang commented on YARN-9112: [~wangda] , please help to review. Thanks. > [Submarine] Support

[jira] [Updated] (YARN-9112) [Submarine] Support polling applicationId when it's not ready in cluster

2018-12-11 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9112: --- Attachment: YARN-9112-trunk.001.patch > [Submarine] Support polling applicationId when it's not ready

[jira] [Created] (YARN-9112) [Submarine] Support polling applicationId when it's not ready in cluster

2018-12-11 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9112: -- Summary: [Submarine] Support polling applicationId when it's not ready in cluster Key: YARN-9112 URL: https://issues.apache.org/jira/browse/YARN-9112 Project: Hadoop

[jira] [Updated] (YARN-9033) ResourceHandlerChain#bootstrap is invoked twice during NM start if LinuxContainerExecutor enabled

2018-12-10 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9033: --- Attachment: YARN-9033-trunk.002.patch > ResourceHandlerChain#bootstrap is invoked twice during NM

[jira] [Updated] (YARN-9033) ResourceHandlerChain#bootstrap is invoked twice during NM start if LinuxContainerExecutor enabled

2018-12-10 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9033: --- Attachment: YARN-9033-trunk.001.patch > ResourceHandlerChain#bootstrap is invoked twice during NM

[jira] [Updated] (YARN-9033) ResourceHandlerChain#bootstrap is invoked twice during NM start if LinuxContainerExecutor enabled

2018-12-10 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9033: --- Description: The ResourceHandlerChain#bootstrap will always be invoked in NM's

[jira] [Updated] (YARN-9033) ResourceHandlerChain#bootstrap is invoked twice during NM start if LinuxContainerExecutor enabled

2018-12-10 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9033: --- Description: The ResourceHandlerChain#bootstrap will always be invoked in NM's

[jira] [Updated] (YARN-9033) ResourceHandlerChain#bootstrap is invoked twice during NM start if LinuxContainerExecutor enabled

2018-12-10 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9033: --- Description: The ResourceHandlerChain#bootstrap will always be invoked in NM's

[jira] [Updated] (YARN-9033) ResourceHandlerChain#bootstrap is invoked twice during NM start if LinuxContainerExecutor enabled

2018-12-10 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9033: --- Description: The ResourceHandlerChain#bootstrap will always be invoked in NM's

[jira] [Updated] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation in native container-executor

2018-12-10 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9060: --- Attachment: YARN-9060-trunk.005.patch > [YARN-8851] Phase 1 - Support device isolation in native

[jira] [Resolved] (YARN-9104) Fix the bug in DeviceMappingManager#getReleasingDevices

2018-12-10 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang resolved YARN-9104. Resolution: Duplicate Resolved this because JIRA duplicated the issue creation. > Fix the bug in

[jira] [Created] (YARN-9103) Fix the bug in DeviceMappingManager#getReleasingDevices

2018-12-10 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9103: -- Summary: Fix the bug in DeviceMappingManager#getReleasingDevices Key: YARN-9103 URL: https://issues.apache.org/jira/browse/YARN-9103 Project: Hadoop YARN Issue

[jira] [Created] (YARN-9104) Fix the bug in DeviceMappingManager#getReleasingDevices

2018-12-10 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9104: -- Summary: Fix the bug in DeviceMappingManager#getReleasingDevices Key: YARN-9104 URL: https://issues.apache.org/jira/browse/YARN-9104 Project: Hadoop YARN Issue

[jira] [Commented] (YARN-9099) GpuResourceAllocator.getReleasingGpus calculates number of GPUs in a wrong way

2018-12-10 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714513#comment-16714513 ] Zhankun Tang commented on YARN-9099: [~snemeth], Thanks for catching this! The patch looks good to

[jira] [Updated] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation in native container-executor

2018-12-10 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9060: --- Attachment: YARN-9060-trunk.004.patch > [YARN-8851] Phase 1 - Support device isolation in native

[jira] [Commented] (YARN-7715) Support NM promotion/demotion of running containers.

2018-12-10 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714443#comment-16714443 ] Zhankun Tang commented on YARN-7715: [~miklos.szeg...@cloudera.com], [~asuresh], Does this JIRA depend

[jira] [Updated] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation in native container-executor

2018-12-09 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9060: --- Attachment: YARN-9060-trunk.003.patch > [YARN-8851] Phase 1 - Support device isolation in native

[jira] [Commented] (YARN-9009) Fix flaky test TestEntityGroupFSTimelineStore.testCleanLogs

2018-12-09 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714240#comment-16714240 ] Zhankun Tang commented on YARN-9009: +1. LGTM. Could you help to review, [~cheersyang]? > Fix flaky

[jira] [Commented] (YARN-9015) Phase 1 - Add an interface for device plugin to provide customized scheduler

2018-12-08 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713633#comment-16713633 ] Zhankun Tang commented on YARN-9015: [~leftnoteasy], Please help review. Thanks. > Phase 1 - Add an

[jira] [Updated] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-12-08 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8714: --- Attachment: YARN-8714-trunk.010.patch > [Submarine] Support files/tarballs to be localized for a

[jira] [Updated] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-12-08 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8714: --- Attachment: YARN-8714-trunk.009.patch > [Submarine] Support files/tarballs to be localized for a

[jira] [Updated] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-12-07 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8714: --- Attachment: YARN-8714-trunk.008.patch > [Submarine] Support files/tarballs to be localized for a

[jira] [Updated] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-12-06 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8714: --- Attachment: YARN-8714-trunk.007.patch > [Submarine] Support files/tarballs to be localized for a

[jira] [Updated] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-12-06 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8714: --- Attachment: YARN-8714-trunk.006.patch > [Submarine] Support files/tarballs to be localized for a

[jira] [Updated] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-12-06 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8714: --- Attachment: YARN-8714-trunk.005.patch > [Submarine] Support files/tarballs to be localized for a

[jira] [Updated] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-12-05 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8714: --- Attachment: YARN-8714-trunk.004.patch > [Submarine] Support files/tarballs to be localized for a

[jira] [Updated] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-12-05 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-8714: --- Attachment: YARN-8714-trunk.003.patch > [Submarine] Support files/tarballs to be localized for a

[jira] [Comment Edited] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-12-04 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16709547#comment-16709547 ] Zhankun Tang edited comment on YARN-8714 at 12/5/18 3:04 AM: - [~leftnoteasy],

[jira] [Updated] (YARN-9083) Support remote directory localization in yarn native service

2018-12-04 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9083: --- Description: While refining YARN-8714, I found that the YARN localizer seems able to handle remote

[jira] [Created] (YARN-9083) Support remote directory localization in yarn native service

2018-12-04 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9083: -- Summary: Support remote directory localization in yarn native service Key: YARN-9083 URL: https://issues.apache.org/jira/browse/YARN-9083 Project: Hadoop YARN

[jira] [Comment Edited] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-12-04 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16709547#comment-16709547 ] Zhankun Tang edited comment on YARN-8714 at 12/5/18 2:33 AM: - [~leftnoteasy],

[jira] [Commented] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-12-04 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16709547#comment-16709547 ] Zhankun Tang commented on YARN-8714: [~leftnoteasy], [~liuxun323] . Per my testing, the YARN localizer

[jira] [Comment Edited] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-12-04 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708874#comment-16708874 ] Zhankun Tang edited comment on YARN-8714 at 12/4/18 3:32 PM: - [~leftnoteasy],

[jira] [Comment Edited] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-12-04 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708874#comment-16708874 ] Zhankun Tang edited comment on YARN-8714 at 12/4/18 3:29 PM: - [~leftnoteasy],

[jira] [Commented] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-12-04 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708874#comment-16708874 ] Zhankun Tang commented on YARN-8714: [~leftnoteasy], [~liuxun323] , While refining the patch, I found

[jira] [Commented] (YARN-8918) [Submarine] Correct method usage of str.subString in CliUtils

2018-12-04 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708338#comment-16708338 ] Zhankun Tang commented on YARN-8918: [~sunilg] , it's a minor change, not important. > [Submarine]

[jira] [Commented] (YARN-8885) Phase 1 - Support NM APIs to query device resource allocation

2018-12-03 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708187#comment-16708187 ] Zhankun Tang commented on YARN-8885: [~leftnoteasy], Below is an example of the output: {code:java} {

[jira] [Updated] (YARN-9015) Phase 1 - Add an interface for device plugin to provide customized scheduler

2018-12-03 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9015: --- Attachment: YARN-9015-trunk.004.patch > Phase 1 - Add an interface for device plugin to provide

[jira] [Commented] (YARN-9015) Phase 1 - Add an interface for device plugin to provide customized scheduler

2018-12-03 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708164#comment-16708164 ] Zhankun Tang commented on YARN-9015: [~leftnoteasy], Thanks for the review! {quote}1)

[jira] [Commented] (YARN-9078) [Submarine] Clean up the code of CliUtils#parseResourcesString

2018-12-03 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708102#comment-16708102 ] Zhankun Tang commented on YARN-9078: [~leftnoteasy] , Our CLI options message doesn't have the "[]",

[jira] [Comment Edited] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation in native container-executor

2018-12-03 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707359#comment-16707359 ] Zhankun Tang edited comment on YARN-9060 at 12/3/18 4:00 PM: - [~leftnoteasy] ,

[jira] [Comment Edited] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation in native container-executor

2018-12-03 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707359#comment-16707359 ] Zhankun Tang edited comment on YARN-9060 at 12/3/18 3:27 PM: - [~leftnoteasy] ,

[jira] [Comment Edited] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation in native container-executor

2018-12-03 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707359#comment-16707359 ] Zhankun Tang edited comment on YARN-9060 at 12/3/18 3:27 PM: - [~leftnoteasy] ,

[jira] [Comment Edited] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation in native container-executor

2018-12-03 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707359#comment-16707359 ] Zhankun Tang edited comment on YARN-9060 at 12/3/18 3:26 PM: - [~leftnoteasy] ,

[jira] [Commented] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation in native container-executor

2018-12-03 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707359#comment-16707359 ] Zhankun Tang commented on YARN-9060: [~leftnoteasy] , Let's first see the bug (YARN-9073) we involve

[jira] [Updated] (YARN-9060) [YARN-8851] Phase 1 - Support device isolation in native container-executor

2018-12-03 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9060: --- Attachment: YARN-9060-trunk.002.patch > [YARN-8851] Phase 1 - Support device isolation in native

[jira] [Updated] (YARN-9077) [Submarine] Improve job state check

2018-12-02 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9077: --- Attachment: YARN-9077-trunk.002.patch > [Submarine] Improve job state check >

[jira] [Updated] (YARN-9078) [Submarine] Clean up the code of CliUtils#parseResourcesString

2018-12-01 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9078: --- Attachment: YARN-9078-trunk.001.patch > [Submarine] Clean up the code of

[jira] [Created] (YARN-9078) [Submarine] Clean up the code of CliUtils#parseResourcesString

2018-12-01 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9078: -- Summary: [Submarine] Clean up the code of CliUtils#parseResourcesString Key: YARN-9078 URL: https://issues.apache.org/jira/browse/YARN-9078 Project: Hadoop YARN

[jira] [Updated] (YARN-9077) [Submarine] Improve job state check

2018-12-01 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9077: --- Attachment: YARN-9077-trunk.001.patch > [Submarine] Improve job state check >

[jira] [Created] (YARN-9077) [Submarine] Improve job state check

2018-12-01 Thread Zhankun Tang (JIRA)
Zhankun Tang created YARN-9077: -- Summary: [Submarine] Improve job state check Key: YARN-9077 URL: https://issues.apache.org/jira/browse/YARN-9077 Project: Hadoop YARN Issue Type: Sub-task

[jira] [Commented] (YARN-8885) Phase 1 - Support NM APIs to query device resource allocation

2018-11-30 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704400#comment-16704400 ] Zhankun Tang commented on YARN-8885: Please help review if you're available. [~leftnoteasy], [~sunilg]

[jira] [Commented] (YARN-9015) Phase 1 - Add an interface for device plugin to provide customized scheduler

2018-11-30 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704401#comment-16704401 ] Zhankun Tang commented on YARN-9015: Please help review if you're available. [~leftnoteasy], [~sunilg] 

[jira] [Comment Edited] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-11-29 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704227#comment-16704227 ] Zhankun Tang edited comment on YARN-8714 at 11/30/18 6:47 AM: -- {quote}1) Why

[jira] [Commented] (YARN-8714) [Submarine] Support files/tarballs to be localized for a training job.

2018-11-29 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704227#comment-16704227 ] Zhankun Tang commented on YARN-8714: {quote}1) Why hardcoded to handle {{hdfs://}}, it could be s3,

[jira] [Updated] (YARN-9073) GPU/FPGA whitelist configuration in container-executor.cfg won't work when yarn-site.xml's allowed devices doesn't align with it

2018-11-29 Thread Zhankun Tang (JIRA)
[ https://issues.apache.org/jira/browse/YARN-9073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhankun Tang updated YARN-9073: --- Description: The current GPU/FPGA behavior may have an issue when c-g.cfg doesn't align with
