[jira] [Updated] (YARN-3159) DOCKER_IMAGE_PATTERN should support multilayered path of docker images

2020-08-10 Thread Leitao Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-3159:
-
Description: 
Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match 
Docker image names like "sequenceiq/hadoop-docker:2.6.0", which have at most 
one "/" in the path.
{code:java}
public static final String DOCKER_IMAGE_PATTERN = 
"^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$";
{code}
In our cluster, image names have multiple path layers, such as 
"docker-registry:8080/cloud/hadoop-docker:2.6.0". Such names work with 
"docker pull IMAGE_NAME" but cannot pass the image-name check in 
saneDockerImage().
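A relaxed pattern that also accepts a registry host with port plus any number of path segments could look like the sketch below. The pattern is hypothetical, for illustration only; it is not the attached YARN-3159.patch:

```java
import java.util.regex.Pattern;

public class DockerImagePatternCheck {
    // Hypothetical relaxed pattern: optional "host[:port]/" prefix, then any
    // number of "segment/" layers, then the image name with an optional tag.
    static final Pattern MULTI_LAYER =
        Pattern.compile("^(([\\w.-]+)(:\\d+)?/)?([\\w.-]+/)*[\\w.:-]+$");

    public static void main(String[] args) {
        String[] images = {
            "hadoop-docker:2.6.0",                            // bare image
            "sequenceiq/hadoop-docker:2.6.0",                 // one layer
            "docker-registry:8080/cloud/hadoop-docker:2.6.0"  // registry + layer
        };
        for (String image : images) {
            System.out.println(image + " -> " + MULTI_LAYER.matcher(image).matches());
        }
    }
}
```

All three names match this sketch, while the original pattern rejects the third because it permits at most one "/".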

  was:
Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match 
Docker image names like "sequenceiq/hadoop-docker:2.6.0", which have at most 
one "/" in the path.

{code}
public static final String DOCKER_IMAGE_PATTERN = 
"^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$";
{code}

In our cluster, image names have multiple path layers, such as 
"docker-registry.qiyi.virtual:8080/cloud/hadoop-docker:2.6.0". Such names work 
with "docker pull IMAGE_NAME" but cannot pass the image-name check in 
saneDockerImage().


> DOCKER_IMAGE_PATTERN should support multilayered path of docker images
> --
>
> Key: YARN-3159
> URL: https://issues.apache.org/jira/browse/YARN-3159
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Leitao Guo
>Assignee: Leitao Guo
>Priority: Major
>  Labels: BB2015-05-TBR
> Attachments: YARN-3159.patch
>
>
> Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match 
> Docker image names like "sequenceiq/hadoop-docker:2.6.0", which have at most 
> one "/" in the path.
> {code:java}
> public static final String DOCKER_IMAGE_PATTERN = 
> "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$";
> {code}
> In our cluster, image names have multiple path layers, such as 
> "docker-registry:8080/cloud/hadoop-docker:2.6.0". Such names work with 
> "docker pull IMAGE_NAME" but cannot pass the image-name check in 
> saneDockerImage().



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (YARN-1729) TimelineWebServices always passes primary and secondary filters as strings

2014-09-14 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133488#comment-14133488
 ] 

Leitao Guo commented on YARN-1729:
--

[~zjshen], sorry! Assigning this to myself was my mistake.

> TimelineWebServices always passes primary and secondary filters as strings
> --
>
> Key: YARN-1729
> URL: https://issues.apache.org/jira/browse/YARN-1729
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Billie Rinaldi
>Assignee: Billie Rinaldi
> Fix For: 2.4.0
>
> Attachments: YARN-1729.1.patch, YARN-1729.2.patch, YARN-1729.3.patch, 
> YARN-1729.4.patch, YARN-1729.5.patch, YARN-1729.6.patch, YARN-1729.7.patch
>
>
> Primary filter and secondary filter values can be arbitrary JSON-compatible 
> Objects. The web services should determine whether the filters specified as 
> query parameters are objects or strings before passing them to the store.
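A minimal sketch of the idea above (hypothetical; the actual patch presumably uses a real JSON parser): try to interpret a query-parameter value as a typed value, and fall back to the raw string:

```java
public class FilterValueSketch {
    // Hypothetical stand-in for the web-service parsing step: only handles
    // booleans and numbers; anything else stays a plain string.
    static Object parseFilterValue(String raw) {
        String t = raw.trim();
        if (t.equals("true") || t.equals("false")) {
            return Boolean.valueOf(t);
        }
        try {
            return Long.valueOf(t);    // integral JSON number
        } catch (NumberFormatException ignored) { }
        try {
            return Double.valueOf(t);  // floating-point JSON number
        } catch (NumberFormatException ignored) { }
        return raw;                    // fall back to the string itself
    }

    public static void main(String[] args) {
        System.out.println(parseFilterValue("42").getClass().getSimpleName());
        System.out.println(parseFilterValue("true").getClass().getSimpleName());
        System.out.println(parseFilterValue("appType").getClass().getSimpleName());
    }
}
```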





[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-09-15 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-
Attachment: (was: 3.before-patch.JPG)

> ResourceManager web UI should display server-side time instead of UTC time
> --
>
> Key: YARN-2348
> URL: https://issues.apache.org/jira/browse/YARN-2348
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
>Assignee: Leitao Guo
> Attachments: YARN-2348.2.patch
>
>
> ResourceManager web UI, including the application list and scheduler, displays 
> UTC time by default, which will confuse users who are not in the UTC time 
> zone. The web UI should display server-side time by default.





[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-09-15 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-
Attachment: (was: 4.after-patch.JPG)

> ResourceManager web UI should display server-side time instead of UTC time
> --
>
> Key: YARN-2348
> URL: https://issues.apache.org/jira/browse/YARN-2348
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
>Assignee: Leitao Guo
> Attachments: YARN-2348.2.patch
>
>
> ResourceManager web UI, including the application list and scheduler, displays 
> UTC time by default, which will confuse users who are not in the UTC time 
> zone. The web UI should display server-side time by default.





[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-09-15 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-
Attachment: YARN-2348.3.patch

> ResourceManager web UI should display server-side time instead of UTC time
> --
>
> Key: YARN-2348
> URL: https://issues.apache.org/jira/browse/YARN-2348
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
>Assignee: Leitao Guo
> Attachments: YARN-2348.2.patch, YARN-2348.3.patch, afterpatch.jpg
>
>
> ResourceManager web UI, including the application list and scheduler, displays 
> UTC time by default, which will confuse users who are not in the UTC time 
> zone. The web UI should display server-side time by default.





[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-09-15 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-
Attachment: afterpatch.jpg

> ResourceManager web UI should display server-side time instead of UTC time
> --
>
> Key: YARN-2348
> URL: https://issues.apache.org/jira/browse/YARN-2348
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
>Assignee: Leitao Guo
> Attachments: YARN-2348.2.patch, YARN-2348.3.patch, afterpatch.jpg
>
>
> ResourceManager web UI, including the application list and scheduler, displays 
> UTC time by default, which will confuse users who are not in the UTC time 
> zone. The web UI should display server-side time by default.





[jira] [Commented] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue

2014-10-10 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166558#comment-14166558
 ] 

Leitao Guo commented on YARN-1582:
--

Any updates on this JIRA? Why not also add yarn.scheduler.maximum-allocation-vcores 
to each queue?

> Capacity Scheduler: add a maximum-allocation-mb setting per queue 
> --
>
> Key: YARN-1582
> URL: https://issues.apache.org/jira/browse/YARN-1582
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacityscheduler
>Affects Versions: 3.0.0, 0.23.10, 2.2.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Attachments: YARN-1582-branch-0.23.patch
>
>
> We want to allow certain queues to use larger container sizes while limiting 
> other queues to smaller container sizes.  Setting it per queue will help 
> prevent abuse, help limit the impact of reservations, and allow changes in 
> the maximum container size to be rolled out more easily.
> One reason this is needed is that more application types are becoming available 
> on YARN, and certain applications require more memory to run efficiently. While 
> we want to allow for that, we don't want other applications to abuse it by 
> requesting bigger containers than they really need.
> Note that we could have this based on application type, but that might not be 
> totally accurate either since for example you might want to allow certain 
> users on MapReduce to use larger containers, while limiting other users of 
> MapReduce to smaller containers.
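For context, a per-queue limit of this shape would presumably be configured in capacity-scheduler.xml along the lines below. The queue name and exact property key are illustrative; check the final patch for the actual key:

```xml
<!-- Illustrative only: cap container requests in the "large" queue at 16 GB,
     while other queues keep a smaller cluster-wide maximum. -->
<property>
  <name>yarn.scheduler.capacity.root.large.maximum-allocation-mb</name>
  <value>16384</value>
</property>
```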





[jira] [Created] (YARN-2321) NodeManager WebUI get wrong configuration of isPmemCheckEnabled()

2014-07-18 Thread Leitao Guo (JIRA)
Leitao Guo created YARN-2321:


 Summary: NodeManager WebUI get wrong configuration of 
isPmemCheckEnabled()
 Key: YARN-2321
 URL: https://issues.apache.org/jira/browse/YARN-2321
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo


The NodeManager web UI shows the wrong configuration value for "Pmem enforcement enable".
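The class of bug can be sketched as follows. The config keys below are the real YARN property names, but the rendering code is an illustrative stand-in, not the actual NodeManager page code:

```java
import java.util.HashMap;
import java.util.Map;

public class PmemUiSketch {
    static final String PMEM_CHECK = "yarn.nodemanager.pmem-check-enabled";
    static final String VMEM_CHECK = "yarn.nodemanager.vmem-check-enabled";

    // Buggy rendering: the "Pmem enforcement" row reads the vmem key.
    static boolean buggyPmemCell(Map<String, Boolean> conf) {
        return conf.getOrDefault(VMEM_CHECK, true);
    }

    // Fixed rendering: the row reads the pmem key it claims to show.
    static boolean fixedPmemCell(Map<String, Boolean> conf) {
        return conf.getOrDefault(PMEM_CHECK, true);
    }

    public static void main(String[] args) {
        Map<String, Boolean> conf = new HashMap<>();
        conf.put(PMEM_CHECK, true);
        conf.put(VMEM_CHECK, false);
        System.out.println("Pmem enforcement (buggy): " + buggyPmemCell(conf));
        System.out.println("Pmem enforcement (fixed): " + fixedPmemCell(conf));
    }
}
```

With pmem checking on and vmem checking off, the buggy cell reports false while the fixed cell reports true.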





[jira] [Updated] (YARN-2321) NodeManager WebUI get wrong configuration of isPmemCheckEnabled()

2014-07-18 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2321:
-

Attachment: YARN-2321.patch

> NodeManager WebUI get wrong configuration of isPmemCheckEnabled()
> -
>
> Key: YARN-2321
> URL: https://issues.apache.org/jira/browse/YARN-2321
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
> Attachments: YARN-2321.patch
>
>
> The NodeManager web UI shows the wrong configuration value for "Pmem enforcement enable".





[jira] [Commented] (YARN-2321) NodeManager web UI can incorrectly report Pmem enforcement

2014-07-21 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069661#comment-14069661
 ] 

Leitao Guo commented on YARN-2321:
--

Thanks Jason Lowe!

> NodeManager web UI can incorrectly report Pmem enforcement
> --
>
> Key: YARN-2321
> URL: https://issues.apache.org/jira/browse/YARN-2321
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
>Assignee: Leitao Guo
> Fix For: 3.0.0, 2.6.0
>
> Attachments: YARN-2321.patch
>
>
> The NodeManager web UI shows the wrong configuration value for "Pmem enforcement enable".





[jira] [Created] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time

2014-07-23 Thread Leitao Guo (JIRA)
Leitao Guo created YARN-2348:


 Summary: ResourceManager web UI should display locale time instead 
of UTC time
 Key: YARN-2348
 URL: https://issues.apache.org/jira/browse/YARN-2348
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
 Attachments: 1.before-change.jpg, 2.after-change.jpg

ResourceManager web UI, including the application list and scheduler, displays UTC 
time by default, which will confuse users who are not in the UTC time zone. The 
web UI should display the user's local time.
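As a minimal illustration of the requested behavior (not the attached patch), the same epoch timestamp can be formatted in an explicit time zone rather than UTC; rendering with TimeZone.getDefault() would give the server-side time:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class ServerTimeRender {
    // Format an epoch timestamp in the given time zone.
    static String render(long epochMillis, TimeZone tz) {
        SimpleDateFormat fmt =
            new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy", Locale.US);
        fmt.setTimeZone(tz);
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println("UTC:    " + render(now, TimeZone.getTimeZone("UTC")));
        System.out.println("server: " + render(now, TimeZone.getDefault()));
    }
}
```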





[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time

2014-07-23 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Attachment: 2.after-change.jpg

> ResourceManager web UI should display locale time instead of UTC time
> -
>
> Key: YARN-2348
> URL: https://issues.apache.org/jira/browse/YARN-2348
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
> Attachments: 1.before-change.jpg, 2.after-change.jpg
>
>
> ResourceManager web UI, including the application list and scheduler, displays 
> UTC time by default, which will confuse users who are not in the UTC time 
> zone. The web UI should display the user's local time.





[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time

2014-07-23 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Attachment: 1.before-change.jpg

> ResourceManager web UI should display locale time instead of UTC time
> -
>
> Key: YARN-2348
> URL: https://issues.apache.org/jira/browse/YARN-2348
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
> Attachments: 1.before-change.jpg, 2.after-change.jpg
>
>
> ResourceManager web UI, including the application list and scheduler, displays 
> UTC time by default, which will confuse users who are not in the UTC time 
> zone. The web UI should display the user's local time.





[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time

2014-07-23 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Attachment: YARN-2348.patch

Please review the patch.

> ResourceManager web UI should display locale time instead of UTC time
> -
>
> Key: YARN-2348
> URL: https://issues.apache.org/jira/browse/YARN-2348
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
> Attachments: 1.before-change.jpg, 2.after-change.jpg, YARN-2348.patch
>
>
> ResourceManager web UI, including the application list and scheduler, displays 
> UTC time by default, which will confuse users who are not in the UTC time 
> zone. The web UI should display the user's local time.





[jira] [Created] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB

2014-07-29 Thread Leitao Guo (JIRA)
Leitao Guo created YARN-2368:


 Summary: ResourceManager failed when ZKRMStateStore tries to 
update znode data larger than 1MB
 Key: YARN-2368
 URL: https://issues.apache.org/jira/browse/YARN-2368
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.1
Reporter: Leitao Guo
Priority: Critical


Both ResourceManagers threw STATE_STORE_OP_FAILED events and eventually failed. 
The ZooKeeper log shows that ZKRMStateStore tried to update a znode larger than 
1MB, which is the default limit configured by 'jute.maxbuffer' on both the 
ZooKeeper server and client.
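For reference, ZooKeeper's default jute.maxbuffer is 0xfffff bytes (1048575, just under 1 MB), enforced on both server and client; a packet above that is rejected with a "Len error". A minimal size guard could look like the sketch below (illustrative only, not the attached patch; raising -Djute.maxbuffer on both sides is the usual workaround):

```java
public class ZnodeSizeGuard {
    // ZooKeeper's default jute.maxbuffer: 0xfffff bytes (~1 MB).
    static final int DEFAULT_JUTE_MAXBUFFER = 0xfffff;

    // True if a znode payload of this length would exceed the default limit.
    static boolean exceedsDefault(int payloadLength) {
        return payloadLength > DEFAULT_JUTE_MAXBUFFER;
    }

    public static void main(String[] args) {
        // 1530747 is the length reported in the ZooKeeper "Len error" log.
        System.out.println("1530747 bytes over limit: " + exceedsDefault(1530747));
        System.out.println("512 KB over limit:        " + exceedsDefault(512 * 1024));
    }
}
```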





[jira] [Updated] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB

2014-07-29 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2368:
-

Description: 
Both ResourceManagers threw STATE_STORE_OP_FAILED events and eventually failed. 
The ZooKeeper log shows that ZKRMStateStore tried to update a znode larger than 
1MB, which is the default limit configured by 'jute.maxbuffer' on both the 
ZooKeeper server and client.

The ResourceManager log shows the following:

2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session connected
2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session restored
2014-07-25 22:33:11,214 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for 
/rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:620)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:745)


Meanwhile, the ZooKeeper log shows the following:

2014-07-25 22:10:09,742 [myid:1] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247684586e70006 due to java.io.IOException: Len 
error 1530747
... ...
2014-07-25 22:33:10,966 [myid:1] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247684586e70006 due to java.io.IOException: Len 
error 1530747


  was:Both ResourceManagers threw STATE_STORE_OP_FAILED events and eventually 
failed. The ZooKeeper log shows that ZKRMStateStore tried to update a znode 
larger than 1MB, which is the default limit configured by 'jute.maxbuffer' on 
both the ZooKeeper server and client.


> ResourceManager failed when ZKRMStateStore tries to update znode data larger 
> than 1MB
> -
>
> Key: YARN-2368
> URL: https://issues.apache.org/jira/browse/YARN-2368
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
>Priority: Critical
>
> Both ResourceManagers threw STATE_STORE_OP_FAILED events and eventually 
> failed. The ZooKeeper log shows that ZKRMStateStore tried to update a znode 
> larger than 1MB, which is the default limit configured by 'jute.maxbuffer' on 
> both the ZooKeeper server and client.
> The ResourceManager log shows the following:
> 2014-07-25 22:33:11,078 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2014-07-25 22:33:11,078 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2014-07-25 22:33:11,214 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
> org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause:
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for 
> /rmstore/ZKRMStateRoot/RMAppRoot/

[jira] [Updated] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB

2014-07-29 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2368:
-

Description: 
Both ResourceManagers threw STATE_STORE_OP_FAILED events and eventually failed. 
The ZooKeeper log shows that ZKRMStateStore tried to update a znode larger than 
1MB, which is the default limit configured by 'jute.maxbuffer' on both the 
ZooKeeper server and client.

The ResourceManager log shows the following:



2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session connected
2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session restored
2014-07-25 22:33:11,214 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for 
/rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:620)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:745)


Meanwhile, the ZooKeeper log shows the following:

2014-07-25 22:10:09,742 [myid:1] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247684586e70006 due to java.io.IOException: Len 
error 1530747
... ...
2014-07-25 22:33:10,966 [myid:1] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247684586e70006 due to java.io.IOException: Len 
error 1530747


  was:
Both ResourceManagers threw STATE_STORE_OP_FAILED events and eventually failed. 
The ZooKeeper log shows that ZKRMStateStore tried to update a znode larger than 
1MB, which is the default limit configured by 'jute.maxbuffer' on both the 
ZooKeeper server and client.

The ResourceManager log shows the following:

2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session connected
2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session restored
2014-07-25 22:33:11,214 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for 
/rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
at 
org.apache.hadoop.yarn.server.re

[jira] [Updated] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB

2014-07-29 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2368:
-

Attachment: YARN-2368.patch

> ResourceManager failed when ZKRMStateStore tries to update znode data larger 
> than 1MB
> -
>
> Key: YARN-2368
> URL: https://issues.apache.org/jira/browse/YARN-2368
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
>Priority: Critical
> Attachments: YARN-2368.patch
>
>
> Both ResourceManagers threw STATE_STORE_OP_FAILED events and eventually 
> failed. The ZooKeeper log shows that ZKRMStateStore tried to update a znode 
> larger than 1MB, which is the default limit configured by 'jute.maxbuffer' on 
> both the ZooKeeper server and client.
> The ResourceManager log shows the following:
> 
> 2014-07-25 22:33:11,078 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session connected
> 2014-07-25 22:33:11,078 INFO 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
> ZKRMStateStore Session restored
> 2014-07-25 22:33:11,214 FATAL 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
> org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
> STATE_STORE_OP_FAILED. Cause:
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
> = ConnectionLoss for 
> /rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:923)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:620)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> Meanwhile, the ZooKeeper log shows the following:
> 
> 2014-07-25 22:10:09,742 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
> causing close of session 0x247684586e70006 due to java.io.IOException: Len 
> error 1530747
> ... ...
> 2014-07-25 22:33:10,966 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
> causing close of session 0x247684586e70006 due to java.io.IOException: Len 
> error 1530747





[jira] [Updated] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB

2014-07-29 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2368:
-

Description: 
Both ResourceManagers threw STATE_STORE_OP_FAILED events and eventually failed. 
The ZooKeeper log shows that ZKRMStateStore tried to update a znode larger than 
1MB, which is the default limit configured by 'jute.maxbuffer' on both the 
ZooKeeper server and client.

The ResourceManager log shows the following:

2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session connected
2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session restored
2014-07-25 22:33:11,214 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for 
/rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:620)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:745)



Meanwhile, the ZooKeeper log shows the following:

2014-07-25 22:10:09,742 [myid:1] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247684586e70006 due to java.io.IOException: Len 
error 1530747
... ...
2014-07-25 22:33:10,966 [myid:1] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247684586e70006 due to java.io.IOException: Len 
error 1530747
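The "Len error 1530747" above is ZooKeeper rejecting a request whose serialized length exceeds jute.maxbuffer. A quick check of the margin, assuming the default 1 MB limit:

```shell
# "Len error 1530747": the request was 1,530,747 bytes, but ZooKeeper's
# default jute.maxbuffer is 1 MB (1,048,576 bytes).
LEN_ERROR=1530747
DEFAULT_JUTE_MAXBUFFER=$((1024 * 1024))
echo $((LEN_ERROR - DEFAULT_JUTE_MAXBUFFER))  # prints 482171 (bytes over the limit)
```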



[jira] [Updated] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB

2014-07-29 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2368:
-

Description: 
Both ResourceManagers threw out STATE_STORE_OP_FAILED events and eventually 
failed. The ZooKeeper log shows that ZKRMStateStore tried to update a znode 
larger than 1 MB, the default limit configured on both the ZooKeeper server 
and client via 'jute.maxbuffer'.

The ResourceManager (ip addr: 10.153.80.8) log shows the following:
{code}
2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session connected
2014-07-25 22:33:11,078 INFO 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: 
ZKRMStateStore Session restored
2014-07-25 22:33:11,214 FATAL 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a 
org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type 
STATE_STORE_OP_FAILED. Cause:
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = 
ConnectionLoss for 
/rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:923)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:620)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:745)
{code}


Meanwhile, the ZooKeeper log shows the following:
{code}
2014-07-25 22:10:09,728 [myid:1] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted 
socket connection from /10.153.80.8:58890
2014-07-25 22:10:09,730 [myid:1] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client 
attempting to renew session 0x247684586e70006 at /10.153.80.8:58890
2014-07-25 22:10:09,730 [myid:1] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating client: 
0x247684586e70006
2014-07-25 22:10:09,730 [myid:1] - INFO  
[QuorumPeer[myid=1]/0.0.0.0:2181:ZooKeeperServer@595] - Established session 
0x247684586e70006 with negotiated timeout 1 for client /10.153.80.8:58890
2014-07-25 22:10:09,730 [myid:1] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@863] - got auth 
packet /10.153.80.8:58890
2014-07-25 22:10:09,730 [myid:1] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@897] - auth success 
/10.153.80.8:58890
2014-07-25 22:10:09,742 [myid:1] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247684586e70006 due to java.io.IOException: Len 
error 1530747
2014-07-25 22:10:09,743 [myid:1] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket 
connection for client /10.153.80.8:58890 which had sessionid 0x247684586e70006
... ...
2014-07-25 22:33:10,966 [myid:1] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247684586e70006 due to java.io.IOException: Len 
error 1530747
{code}
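A common mitigation (not part of this report) is raising jute.maxbuffer, which must be done consistently on both the ZooKeeper servers and every client JVM, here the ResourceManager. A minimal sketch, assuming the usual zkServer.sh and yarn-env.sh environment hooks; the 4 MB value is illustrative, not a recommendation:

```shell
# Hedged sketch: jute.maxbuffer must match on BOTH sides, or requests that
# one side accepts will still be rejected by the other. 4 MB is illustrative.
JUTE_MAXBUFFER=$((4 * 1024 * 1024))
# ZooKeeper server side (e.g. via zkServer.sh's environment):
export SERVER_JVMFLAGS="-Djute.maxbuffer=${JUTE_MAXBUFFER}"
# ResourceManager (ZooKeeper client) side (e.g. via yarn-env.sh):
export YARN_RESOURCEMANAGER_OPTS="-Djute.maxbuffer=${JUTE_MAXBUFFER}"
echo "${SERVER_JVMFLAGS}"
```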


[jira] [Commented] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB

2014-07-29 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078954#comment-14078954
 ] 

Leitao Guo commented on YARN-2368:
--

Thanks [~ozawa] for your comments. 

I deployed hadoop-2.3.0-cdh5.1.0 with a 22-queue fair scheduler on my 20-node 
cluster. The two ResourceManagers are deployed on dedicated nodes, 10.153.80.8 
and 10.153.80.18.

Jobs are submitted from gridmix:
{code}
sudo -u mapred hadoop jar /usr/lib/hadoop-mapreduce/hadoop-gridmix.jar 
-Dgridmix.min.file.size=10485760 
-Dgridmix.job-submission.use-queue-in-trace=true 
-Dgridmix.distributed-cache-emulation.enable=false  -generate 34816m 
hdfs:///user/mapred/foo/ hdfs:///tmp/job-trace.json
{code}
job-trace.json is generated by Rumen and contains 6,000 jobs, with an average 
of 320 map tasks and 25 reduce tasks per job.

In 3 of the GridMix runs (it was run more than 3 times), the ResourceManager 
failed while handling a STATE_STORE_OP_FAILED event. At the same time, 
ZooKeeper threw a 'Len error' IOException:
{code}
... ...
2014-07-24 21:00:51,170 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted 
socket connection from /10.153.80.8:47135
2014-07-24 21:00:51,171 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client 
attempting to renew session 0x247678daa88001a at /10.153.80.8:47135
2014-07-24 21:00:51,171 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating client: 
0x247678daa88001a
2014-07-24 21:00:51,171 [myid:3] - INFO  
[QuorumPeer[myid=3]/0.0.0.0:2181:ZooKeeperServer@595] - Established session 
0x247678daa88001a with negotiated timeout 1 for client /10.153.80.8:47135
2014-07-24 21:00:51,171 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@863] - got auth 
packet /10.153.80.8:47135
2014-07-24 21:00:51,172 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@897] - auth success 
/10.153.80.8:47135
2014-07-24 21:00:51,186 [myid:3] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247678daa88001a due to java.io.IOException: Len 
error 1813411
2014-07-24 21:00:51,186 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket 
connection for client /10.153.80.8:47135 which had sessionid 0x247678daa88001a

... ...

2014-07-25 22:10:08,919 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted 
socket connection from /10.153.80.8:50480
2014-07-25 22:10:08,921 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client 
attempting to renew session 0x247684586e70006 at /10.153.80.8:50480
2014-07-25 22:10:08,922 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@595] - Established 
session 0x247684586e70006 with negotiated timeout 1 for client 
/10.153.80.8:50480
2014-07-25 22:10:08,922 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@863] - got auth 
packet /10.153.80.8:50480
2014-07-25 22:10:08,923 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@897] - auth success 
/10.153.80.8:50480
2014-07-25 22:10:08,934 [myid:3] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x247684586e70006 due to java.io.IOException: Len 
error 1530747
2014-07-25 22:10:08,934 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket 
connection for client /10.153.80.8:50480 which had sessionid 0x247684586e70006

... ...

2014-07-26 02:22:59,627 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted 
socket connection from /10.153.80.18:60588
2014-07-26 02:22:59,629 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client 
attempting to renew session 0x2476de7c1af0002 at /10.153.80.18:60588
2014-07-26 02:22:59,629 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@595] - Established 
session 0x2476de7c1af0002 with negotiated timeout 1 for client 
/10.153.80.18:60588
2014-07-26 02:22:59,630 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@863] - got auth 
packet /10.153.80.18:60588
2014-07-26 02:22:59,630 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@897] - auth success 
/10.153.80.18:60588
2014-07-26 02:22:59,648 [myid:3] - WARN  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception 
causing close of session 0x2476de7c1af0002 due to java.io.IOException: Len 
error 1649043
2014-07-26 02:22:59,648 [myid:3] - INFO  
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket 
connection for client /10.153.80.18:60588 which had sessionid 0x2476de7c1af0
{code}

[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time

2014-07-30 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Attachment: YARN-2348.2.patch

Please find the new patch for this issue in YARN-2348.2.patch. 

With this patch, the ResourceManager formats the Start/Finish Time dates on 
the server, instead of rendering the dates in the browser.
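The idea can be illustrated with a shell one-liner, not the patch's actual code: the server converts the raw epoch-millisecond timestamp and applies its own timezone before the page is rendered. GNU date is assumed, and the timestamp and format string are illustrative:

```shell
# Hedged illustration: convert an epoch-milliseconds start time to a
# human-readable string in the server's timezone (TZ), instead of sending
# raw millis to the browser for client-side rendering.
START_MILLIS=1406298791078
TZ=UTC date -d "@$((START_MILLIS / 1000))" '+%Y-%m-%d %H:%M:%S %Z'
# prints: 2014-07-25 14:33:11 UTC
```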

> ResourceManager web UI should display locale time instead of UTC time
> -
>
> Key: YARN-2348
> URL: https://issues.apache.org/jira/browse/YARN-2348
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
> Attachments: YARN-2348.2.patch, YARN-2348.patch
>
>
> ResourceManager web UI, including the application list and scheduler, 
> displays UTC time by default, which will confuse users who do not use UTC 
> time. The web UI should display the user's local time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time

2014-07-30 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Attachment: (was: 2.after-change.jpg)



[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time

2014-07-30 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Attachment: (was: 1.before-change.jpg)



[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time

2014-07-30 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Attachment: 4.after-patch.JPG
3.before-patch.JPG

Here are new snapshots of my cluster's web UI from before and after the patch. 



[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-07-30 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Summary: ResourceManager web UI should display server-side time instead of 
UTC time  (was: ResourceManager web UI should display locale time instead of 
UTC time)



[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-07-30 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Description: ResourceManager web UI, including application list and 
scheduler, displays UTC time in default,  this will confuse users who do not 
use UTC time. This web UI should display server-side time in default.  (was: 
ResourceManager web UI, including application list and scheduler, displays UTC 
time in default,  this will confuse users who do not use UTC time. This web UI 
should display server-side time 
in default.)

> ResourceManager web UI should display server-side time instead of UTC time
> --
>
> Key: YARN-2348
> URL: https://issues.apache.org/jira/browse/YARN-2348
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
> Attachments: 3.before-patch.JPG, 4.after-patch.JPG, 
> YARN-2348.2.patch, YARN-2348.patch
>
>
> ResourceManager web UI, including the application list and scheduler, displays 
> UTC time by default, which can confuse users who do not use UTC time. The 
> web UI should display server-side time by default.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-07-30 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Description: 
ResourceManager web UI, including application list and scheduler, displays UTC 
time in default,  this will confuse users who do not use UTC time. This web UI 
should display server-side time 
in default.

  was:ResourceManager web UI, including application list and scheduler, 
displays UTC time in default,  this will confuse users who do not use UTC time. 
This web UI should display local time of users.


> ResourceManager web UI should display server-side time instead of UTC time
> --
>
> Key: YARN-2348
> URL: https://issues.apache.org/jira/browse/YARN-2348
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
> Attachments: 3.before-patch.JPG, 4.after-patch.JPG, 
> YARN-2348.2.patch, YARN-2348.patch
>
>
> ResourceManager web UI, including the application list and scheduler, displays 
> UTC time by default, which can confuse users who do not use UTC time. The 
> web UI should display server-side time by default.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-07-30 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079299#comment-14079299
 ] 

Leitao Guo commented on YARN-2348:
--

Hi [~aw] [~tucu00] [~raviprak], thanks for your comments. I agree that the web 
UI should display the same time as the server side. Please take a look at the 
new patch, thanks!

> ResourceManager web UI should display server-side time instead of UTC time
> --
>
> Key: YARN-2348
> URL: https://issues.apache.org/jira/browse/YARN-2348
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
> Attachments: 3.before-patch.JPG, 4.after-patch.JPG, 
> YARN-2348.2.patch, YARN-2348.patch
>
>
> ResourceManager web UI, including the application list and scheduler, displays 
> UTC time by default, which can confuse users who do not use UTC time. The 
> web UI should display server-side time by default.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-07-30 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079423#comment-14079423
 ] 

Leitao Guo commented on YARN-2348:
--

[~chengbing.liu] thanks, Bing!

> ResourceManager web UI should display server-side time instead of UTC time
> --
>
> Key: YARN-2348
> URL: https://issues.apache.org/jira/browse/YARN-2348
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
> Attachments: 3.before-patch.JPG, 4.after-patch.JPG, 
> YARN-2348.2.patch, YARN-2348.2.patch, YARN-2348.patch
>
>
> ResourceManager web UI, including the application list and scheduler, displays 
> UTC time by default, which can confuse users who do not use UTC time. The 
> web UI should display server-side time by default.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-07-31 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Attachment: (was: YARN-2348.patch)

> ResourceManager web UI should display server-side time instead of UTC time
> --
>
> Key: YARN-2348
> URL: https://issues.apache.org/jira/browse/YARN-2348
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
> Attachments: 3.before-patch.JPG, 4.after-patch.JPG, YARN-2348.2.patch
>
>
> ResourceManager web UI, including the application list and scheduler, displays 
> UTC time by default, which can confuse users who do not use UTC time. The 
> web UI should display server-side time by default.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time

2014-07-31 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-2348:
-

Attachment: (was: YARN-2348.2.patch)

> ResourceManager web UI should display server-side time instead of UTC time
> --
>
> Key: YARN-2348
> URL: https://issues.apache.org/jira/browse/YARN-2348
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.4.1
>Reporter: Leitao Guo
> Attachments: 3.before-patch.JPG, 4.after-patch.JPG, YARN-2348.2.patch
>
>
> ResourceManager web UI, including the application list and scheduler, displays 
> UTC time by default, which can confuse users who do not use UTC time. The 
> web UI should display server-side time by default.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-1729) TimelineWebServices always passes primary and secondary filters as strings

2014-08-06 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo reassigned YARN-1729:


Assignee: Leitao Guo  (was: Billie Rinaldi)

> TimelineWebServices always passes primary and secondary filters as strings
> --
>
> Key: YARN-1729
> URL: https://issues.apache.org/jira/browse/YARN-1729
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Billie Rinaldi
>Assignee: Leitao Guo
> Fix For: 2.4.0
>
> Attachments: YARN-1729.1.patch, YARN-1729.2.patch, YARN-1729.3.patch, 
> YARN-1729.4.patch, YARN-1729.5.patch, YARN-1729.6.patch, YARN-1729.7.patch
>
>
> Primary filter and secondary filter values can be arbitrary JSON-compatible 
> Objects.  The web services should determine whether the filters specified as 
> query parameters are objects or strings before passing them to the store.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2466) Umbrella issue for Yarn launched Docker Containers

2015-01-22 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288608#comment-14288608
 ] 

Leitao Guo commented on YARN-2466:
--

Currently, if I want to use DCE in my cluster, all applications must run in 
DCE, which is not practical in our cluster. Could 
"yarn.nodemanager.container-executor.class" be made configurable per 
application? That way, some applications could use DCE while others still use 
LCE.

> Umbrella issue for Yarn launched Docker Containers
> --
>
> Key: YARN-2466
> URL: https://issues.apache.org/jira/browse/YARN-2466
> Project: Hadoop YARN
>  Issue Type: New Feature
>Affects Versions: 2.4.1
>Reporter: Abin Shahab
>Assignee: Abin Shahab
>
> Docker (https://www.docker.io/) is, increasingly, a very popular container 
> technology.
> In context of YARN, the support for Docker will provide a very elegant 
> solution to allow applications to package their software into a Docker 
> container (entire Linux file system incl. custom versions of perl, python 
> etc.) and use it as a blueprint to launch all their YARN containers with 
> requisite software environment. This provides both consistency (all YARN 
> containers will have the same software environment) and isolation (no 
> interference with whatever is installed on the physical machine).
> In addition to software isolation mentioned above, Docker containers will 
> provide resource, network, and user-namespace isolation. 
> Docker provides resource isolation through cgroups, similar to 
> LinuxContainerExecutor. This prevents one job from taking other jobs 
> resource(memory and CPU) on the same hadoop cluster. 
> User-namespace isolation will ensure that root in the container is mapped to 
> an unprivileged user on the host. This is currently being added to Docker.
> Network isolation will ensure that one user’s network traffic is completely 
> isolated from another user’s network traffic. 
> Last but not least, the interaction of Docker and Kerberos will have to 
> be worked out. These Docker containers must work in a secure hadoop 
> environment.
> Additional details are here: 
> https://wiki.apache.org/hadoop/dineshs/IsolatingYarnAppsInDockerContainers



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2718) Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor

2015-01-26 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292809#comment-14292809
 ] 

Leitao Guo commented on YARN-2718:
--

I think this would be good for our hadoop cluster, since we have a few 
applications that have to run in docker containers, but most of the apps need 
LCE. So we need a CompositeContainerExecutor that lets apps configure which 
container executor they need.

> Create a CompositeConatainerExecutor that combines DockerContainerExecutor 
> and DefaultContainerExecutor
> ---
>
> Key: YARN-2718
> URL: https://issues.apache.org/jira/browse/YARN-2718
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Abin Shahab
> Attachments: YARN-2718.patch
>
>
> There should be a composite container that allows users to run their jobs in 
> DockerContainerExecutor, but switch to DefaultContainerExecutor for debugging 
> purposes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2718) Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor

2015-01-26 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292910#comment-14292910
 ] 

Leitao Guo commented on YARN-2718:
--

[~chenchun], in the following code, I think you should return directly in the 
'containerExecutor == null' case; otherwise the null value still falls through 
to builder.setContainerExecutor():
{code}
  @Override
  public void setContainerExecutor(String containerExecutor) {
    maybeInitBuilder();
    if (containerExecutor == null) {
      builder.clearContainerExecutor();
    }
    builder.setContainerExecutor(containerExecutor);
  }
{code}
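A minimal sketch of the suggested fix (the stub Builder below merely stands in for the generated protobuf builder, which rejects null values; the names follow the snippet quoted above):

```java
public class ContainerExecutorFix {
    // Stub for the generated protobuf builder; real generated builders throw
    // NullPointerException when a setter is passed null.
    static class Builder {
        String containerExecutor;
        void clearContainerExecutor() { containerExecutor = null; }
        void setContainerExecutor(String v) {
            if (v == null) throw new NullPointerException();
            containerExecutor = v;
        }
    }

    private final Builder builder = new Builder();

    public void setContainerExecutor(String containerExecutor) {
        if (containerExecutor == null) {
            builder.clearContainerExecutor();
            return; // return early so null never reaches the builder
        }
        builder.setContainerExecutor(containerExecutor);
    }

    public String getContainerExecutor() { return builder.containerExecutor; }

    public static void main(String[] args) {
        ContainerExecutorFix pb = new ContainerExecutorFix();
        pb.setContainerExecutor("DockerContainerExecutor");
        pb.setContainerExecutor(null); // now clears instead of throwing
        System.out.println(pb.getContainerExecutor()); // null
    }
}
```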

> Create a CompositeConatainerExecutor that combines DockerContainerExecutor 
> and DefaultContainerExecutor
> ---
>
> Key: YARN-2718
> URL: https://issues.apache.org/jira/browse/YARN-2718
> Project: Hadoop YARN
>  Issue Type: New Feature
>Reporter: Abin Shahab
> Attachments: YARN-2718.patch
>
>
> There should be a composite container that allows users to run their jobs in 
> DockerContainerExecutor, but switch to DefaultContainerExecutor for debugging 
> purposes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3159) DOCKER_IMAGE_PATTERN should support multilayered path of docker images

2015-02-09 Thread Leitao Guo (JIRA)
Leitao Guo created YARN-3159:


 Summary: DOCKER_IMAGE_PATTERN should support multilayered path of 
docker images
 Key: YARN-3159
 URL: https://issues.apache.org/jira/browse/YARN-3159
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Leitao Guo


Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match 
docker images with a path like "sequenceiq/hadoop-docker:2.6.0", which has 
only one "/" in the path.

{code}
public static final String DOCKER_IMAGE_PATTERN = 
"^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$";
{code}

In our cluster, image names have multilayered paths, such as 
"docker-registry.qiyi.virtual:8080/cloud/hadoop-docker:2.6.0", which works 
with "docker pull IMAGE_NAME" but cannot pass the image-name check in 
saneDockerImage().
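The mismatch is easy to reproduce against the current pattern; the RELAXED regex below, which lets the leading "name[:port]/" group repeat, is only an illustration of the idea, not the pattern from the attached patch:

```java
import java.util.regex.Pattern;

public class DockerImagePatternCheck {
    // Pattern currently in DockerContainerExecutor: allows at most one "/".
    static final String CURRENT = "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$";

    // Hypothetical relaxed pattern: the leading group may repeat, so
    // "registry-host:port/namespace/image:tag" also matches.
    static final String RELAXED = "^(([\\w\\.-]+)(:\\d+)?\\/)*[\\w\\.:-]+$";

    public static void main(String[] args) {
        String singleLayer = "sequenceiq/hadoop-docker:2.6.0";
        String multiLayer  = "docker-registry:8080/cloud/hadoop-docker:2.6.0";

        System.out.println(Pattern.matches(CURRENT, singleLayer)); // true
        System.out.println(Pattern.matches(CURRENT, multiLayer));  // false
        System.out.println(Pattern.matches(RELAXED, singleLayer)); // true
        System.out.println(Pattern.matches(RELAXED, multiLayer));  // true
    }
}
```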



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3159) DOCKER_IMAGE_PATTERN should support multilayered path of docker images

2015-02-09 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-3159:
-
Attachment: YARN-3159.patch

> DOCKER_IMAGE_PATTERN should support multilayered path of docker images
> --
>
> Key: YARN-3159
> URL: https://issues.apache.org/jira/browse/YARN-3159
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Leitao Guo
> Attachments: YARN-3159.patch
>
>
> Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match 
> docker images with a path like "sequenceiq/hadoop-docker:2.6.0", which has 
> only one "/" in the path.
> {code}
> public static final String DOCKER_IMAGE_PATTERN = 
> "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$";
> {code}
> In our cluster, image names have multilayered paths, such as 
> "docker-registry.qiyi.virtual:8080/cloud/hadoop-docker:2.6.0", which works 
> with "docker pull IMAGE_NAME" but cannot pass the image-name check in 
> saneDockerImage().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3159) DOCKER_IMAGE_PATTERN should support multilayered path of docker images

2015-02-09 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312151#comment-14312151
 ] 

Leitao Guo commented on YARN-3159:
--

Ok, I'll add the unit test.

> DOCKER_IMAGE_PATTERN should support multilayered path of docker images
> --
>
> Key: YARN-3159
> URL: https://issues.apache.org/jira/browse/YARN-3159
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.6.0
>Reporter: Leitao Guo
>Assignee: Leitao Guo
> Attachments: YARN-3159.patch
>
>
> Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match 
> docker images with a path like "sequenceiq/hadoop-docker:2.6.0", which has 
> only one "/" in the path.
> {code}
> public static final String DOCKER_IMAGE_PATTERN = 
> "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$";
> {code}
> In our cluster, image names have multilayered paths, such as 
> "docker-registry.qiyi.virtual:8080/cloud/hadoop-docker:2.6.0", which works 
> with "docker pull IMAGE_NAME" but cannot pass the image-name check in 
> saneDockerImage().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3282) DockerContainerExecutor should support environment variables setting

2015-02-28 Thread Leitao Guo (JIRA)
Leitao Guo created YARN-3282:


 Summary: DockerContainerExecutor should support environment 
variables setting
 Key: YARN-3282
 URL: https://issues.apache.org/jira/browse/YARN-3282
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: applications, nodemanager
Affects Versions: 2.6.0
Reporter: Leitao Guo


Currently, DockerContainerExecutor mounts "yarn.nodemanager.local-dirs" and 
"yarn.nodemanager.log-dirs" into containers automatically. However, 
applications may need to set more environment variables before launching 
containers.

In our applications, as in the following command, we need to mount several 
directories into, and set some environment variables on, the docker containers.

{code}
docker run -i -t -v /data/transcode:/data/tmp -v /etc/qcs:/etc/qcs -v /mnt:/mnt 
-e VTC_MQTYPE=rabbitmq -e VTC_APP=ugc -e VTC_LOCATION=sh -e VTC_RUNTIME=vtc 
sequenceiq/hadoop-docker:2.6.0 /bin/bash
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3282) DockerContainerExecutor should support environment variables setting

2015-02-28 Thread Leitao Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leitao Guo updated YARN-3282:
-
Attachment: YARN-3282.01.patch

After this patch, when using DockerContainerExecutor, mapreduce jobs can set 
docker environment variables via "yarn.nodemanager.docker-container-executor.env".

e.g. 
{code}
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples.jar wordcount 
-Dyarn.app.mapreduce.am.env="yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.6.0"
 
-Dmapreduce.map.env="yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.6.0"
  
-Dmapreduce.reduce.env="yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.6.0"
   -Dyarn.nodemanager.docker-container-executor.env="-v 
/data/transcode:/data/tmp -v /etc/qcs:/etc/qcs -v /mnt:/mnt -e 
VTC_MQTYPE=rabbitmq -e VTC_APP=ugc -e VTC_LOCATION=sh -e VTC_RUNTIME=vtc" 
/wordcount_input /wordcount_output
{code}

> DockerContainerExecutor should support environment variables setting
> 
>
> Key: YARN-3282
> URL: https://issues.apache.org/jira/browse/YARN-3282
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications, nodemanager
>Affects Versions: 2.6.0
>Reporter: Leitao Guo
> Attachments: YARN-3282.01.patch
>
>
> Currently, DockerContainerExecutor mounts "yarn.nodemanager.local-dirs" 
> and "yarn.nodemanager.log-dirs" into containers automatically. However, 
> applications may need to set more environment variables before launching 
> containers. 
> In our applications, as in the following command, we need to mount several 
> directories into, and set some environment variables on, the docker containers. 
> {code}
> docker run -i -t -v /data/transcode:/data/tmp -v /etc/qcs:/etc/qcs -v 
> /mnt:/mnt -e VTC_MQTYPE=rabbitmq -e VTC_APP=ugc -e VTC_LOCATION=sh -e 
> VTC_RUNTIME=vtc sequenceiq/hadoop-docker:2.6.0 /bin/bash
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3282) DockerContainerExecutor should support environment variables setting

2015-03-03 Thread Leitao Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346315#comment-14346315
 ] 

Leitao Guo commented on YARN-3282:
--

Hi [~ashahab], thanks for your comments. 

I agree that a docker image should try to be self-sufficient, but in our 
scenario the data are stored in a remote shared storage system. We mount that 
storage as local dirs on the NodeManager, and only a few applications running 
in docker containers need to access those dirs, so they are not suitable for 
"yarn.nodemanager.local-dirs", which is mounted into the docker container by 
default. IMHO an extra way to set environment variables for docker containers 
is necessary.

> DockerContainerExecutor should support environment variables setting
> 
>
> Key: YARN-3282
> URL: https://issues.apache.org/jira/browse/YARN-3282
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: applications, nodemanager
>Affects Versions: 2.6.0
>Reporter: Leitao Guo
> Attachments: YARN-3282.01.patch
>
>
> Currently, DockerContainerExecutor mounts "yarn.nodemanager.local-dirs" 
> and "yarn.nodemanager.log-dirs" into containers automatically. However, 
> applications may need to set more environment variables before launching 
> containers. 
> In our applications, as in the following command, we need to mount several 
> directories into, and set some environment variables on, the docker containers. 
> {code}
> docker run -i -t -v /data/transcode:/data/tmp -v /etc/qcs:/etc/qcs -v 
> /mnt:/mnt -e VTC_MQTYPE=rabbitmq -e VTC_APP=ugc -e VTC_LOCATION=sh -e 
> VTC_RUNTIME=vtc sequenceiq/hadoop-docker:2.6.0 /bin/bash
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)