[jira] [Updated] (YARN-3159) DOCKER_IMAGE_PATTERN should support multilayered path of docker images
[ https://issues.apache.org/jira/browse/YARN-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-3159: - Description: Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match docker images with a path like "sequenceiq/hadoop-docker:2.6.0", which has only one "/" in the path. {code:java} public static final String DOCKER_IMAGE_PATTERN = "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$"; {code} In our cluster, image names have multiple layers, such as "docker-registry:8080/cloud/hadoop-docker:2.6.0", which works with "docker pull IMAGE_NAME" but cannot pass the image-name check in saneDockerImage(). was: Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match docker images with a path like "sequenceiq/hadoop-docker:2.6.0", which has only one "/" in the path. {code} public static final String DOCKER_IMAGE_PATTERN = "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$"; {code} In our cluster, image names have multiple layers, such as "docker-registry.qiyi.virtual:8080/cloud/hadoop-docker:2.6.0", which works with "docker pull IMAGE_NAME" but cannot pass the image-name check in saneDockerImage(). > DOCKER_IMAGE_PATTERN should support multilayered paths of docker images > -- > > Key: YARN-3159 > URL: https://issues.apache.org/jira/browse/YARN-3159 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager > Affects Versions: 2.6.0 > Reporter: Leitao Guo > Assignee: Leitao Guo > Priority: Major > Labels: BB2015-05-TBR > Attachments: YARN-3159.patch > > > Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match > docker images with a path like "sequenceiq/hadoop-docker:2.6.0", which has > only one "/" in the path. 
> {code:java} > public static final String DOCKER_IMAGE_PATTERN = > "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$"; > {code} > In our cluster, image names have multiple layers, such as > "docker-registry:8080/cloud/hadoop-docker:2.6.0", which works with > "docker pull IMAGE_NAME" but cannot pass the image-name check in > saneDockerImage(). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
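A minimal sketch of the mismatch, plus a hypothetical relaxed pattern. The extra ([\w.-]+/)* group is an illustration of one way to accept intermediate path layers, not necessarily the committed patch:

```java
import java.util.regex.Pattern;

public class DockerImagePatternSketch {
    // Pattern currently in DockerContainerExecutor: at most one "/".
    static final String ORIGINAL =
            "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$";
    // Hypothetical relaxation: the added ([\w.-]+/)* group accepts
    // intermediate layers such as registry:port/namespace/image:tag.
    static final String MULTILAYER =
            "^(([\\w\\.-]+)(:\\d+)*\\/)?([\\w\\.-]+\\/)*[\\w\\.:-]+$";

    public static void main(String[] args) {
        String layered = "docker-registry:8080/cloud/hadoop-docker:2.6.0";
        System.out.println(Pattern.matches(ORIGINAL,
                "sequenceiq/hadoop-docker:2.6.0"));               // true: one "/"
        System.out.println(Pattern.matches(ORIGINAL, layered));   // false: rejected
        System.out.println(Pattern.matches(MULTILAYER, layered)); // true: accepted
    }
}
```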
[jira] [Commented] (YARN-1729) TimelineWebServices always passes primary and secondary filters as strings
[ https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133488#comment-14133488 ] Leitao Guo commented on YARN-1729: -- [~zjshen], sorry! I must have assigned this to myself by mistake. > TimelineWebServices always passes primary and secondary filters as strings > -- > > Key: YARN-1729 > URL: https://issues.apache.org/jira/browse/YARN-1729 > Project: Hadoop YARN > Issue Type: Sub-task > Reporter: Billie Rinaldi > Assignee: Billie Rinaldi > Fix For: 2.4.0 > > Attachments: YARN-1729.1.patch, YARN-1729.2.patch, YARN-1729.3.patch, > YARN-1729.4.patch, YARN-1729.5.patch, YARN-1729.6.patch, YARN-1729.7.patch > > > Primary filter and secondary filter values can be arbitrary JSON-compatible > objects. The web services should determine whether the filters specified as > query parameters are objects or strings before passing them to the store.
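To illustrate the decision described in the issue above, here is a hedged, stdlib-only sketch. The real fix parses values with a JSON mapper; looksLikeJsonValue is a made-up helper name standing in for that check:

```java
public class FilterValueSketch {
    // Rough stand-in for "is this query parameter a JSON-compatible value
    // rather than a plain string?". Illustrative heuristic only; a real
    // implementation would attempt a full JSON parse.
    static boolean looksLikeJsonValue(String raw) {
        String t = raw.trim();
        return t.startsWith("{") || t.startsWith("[")
                || t.equals("true") || t.equals("false")
                || t.matches("-?\\d+(\\.\\d+)?");
    }

    public static void main(String[] args) {
        // A numeric or object-shaped filter should be parsed before being
        // handed to the store, not passed through as the literal string.
        System.out.println(looksLikeJsonValue("{\"level\":5}")); // true
        System.out.println(looksLikeJsonValue("MAPREDUCE"));     // false
    }
}
```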
[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Attachment: (was: 3.before-patch.JPG) > ResourceManager web UI should display server-side time instead of UTC time > -- > > Key: YARN-2348 > URL: https://issues.apache.org/jira/browse/YARN-2348 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Affects Versions: 2.4.1 > Reporter: Leitao Guo > Assignee: Leitao Guo > Attachments: YARN-2348.2.patch > > > The ResourceManager web UI, including the application list and scheduler, displays > UTC time by default, which will confuse users who do not use UTC time. The > web UI should display server-side time by default.
[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Attachment: (was: 4.after-patch.JPG) > ResourceManager web UI should display server-side time instead of UTC time > -- > > Key: YARN-2348 > URL: https://issues.apache.org/jira/browse/YARN-2348 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Affects Versions: 2.4.1 > Reporter: Leitao Guo > Assignee: Leitao Guo > Attachments: YARN-2348.2.patch > > > The ResourceManager web UI, including the application list and scheduler, displays > UTC time by default, which will confuse users who do not use UTC time. The > web UI should display server-side time by default.
[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Attachment: YARN-2348.3.patch > ResourceManager web UI should display server-side time instead of UTC time > -- > > Key: YARN-2348 > URL: https://issues.apache.org/jira/browse/YARN-2348 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Affects Versions: 2.4.1 > Reporter: Leitao Guo > Assignee: Leitao Guo > Attachments: YARN-2348.2.patch, YARN-2348.3.patch, afterpatch.jpg > > > The ResourceManager web UI, including the application list and scheduler, displays > UTC time by default, which will confuse users who do not use UTC time. The > web UI should display server-side time by default.
[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Attachment: afterpatch.jpg > ResourceManager web UI should display server-side time instead of UTC time > -- > > Key: YARN-2348 > URL: https://issues.apache.org/jira/browse/YARN-2348 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Affects Versions: 2.4.1 > Reporter: Leitao Guo > Assignee: Leitao Guo > Attachments: YARN-2348.2.patch, YARN-2348.3.patch, afterpatch.jpg > > > The ResourceManager web UI, including the application list and scheduler, displays > UTC time by default, which will confuse users who do not use UTC time. The > web UI should display server-side time by default.
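As a rough illustration of the behavior proposed in YARN-2348 (not the actual patch, which changes the web UI rendering): format the epoch timestamps the ResourceManager stores using the server's default time zone instead of UTC.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class ServerSideTimeSketch {
    // Sketch only: render an epoch-millis timestamp in the server's
    // default time zone rather than UTC.
    static String render(long epochMillis) {
        SimpleDateFormat fmt =
                new SimpleDateFormat("EEE MMM dd HH:mm:ss zzz yyyy");
        fmt.setTimeZone(TimeZone.getDefault()); // server-side zone, not UTC
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(render(System.currentTimeMillis()));
    }
}
```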
[jira] [Commented] (YARN-1582) Capacity Scheduler: add a maximum-allocation-mb setting per queue
[ https://issues.apache.org/jira/browse/YARN-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166558#comment-14166558 ] Leitao Guo commented on YARN-1582: -- Any updates on this JIRA? Why not add yarn.scheduler.maximum-allocation-vcores to each queue as well? > Capacity Scheduler: add a maximum-allocation-mb setting per queue > -- > > Key: YARN-1582 > URL: https://issues.apache.org/jira/browse/YARN-1582 > Project: Hadoop YARN > Issue Type: Improvement > Components: capacityscheduler > Affects Versions: 3.0.0, 0.23.10, 2.2.0 > Reporter: Thomas Graves > Assignee: Thomas Graves > Attachments: YARN-1582-branch-0.23.patch > > > We want to allow certain queues to use larger container sizes while limiting > other queues to smaller container sizes. Setting it per queue will help > prevent abuse, help limit the impact of reservations, and allow changes in > the maximum container size to be rolled out more easily. > One reason this is needed is that more application types are becoming available > on YARN, and certain applications require more memory to run efficiently. While > we want to allow for that, we don't want other applications to abuse it and > start requesting bigger containers than they really need. > Note that we could base this on application type, but that might not be > totally accurate either, since for example you might want to allow certain > users on MapReduce to use larger containers while limiting other users of > MapReduce to smaller containers.
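If the per-queue override lands, the configuration might look like the sketch below. The property names follow existing CapacityScheduler naming conventions but are assumptions here, and "root.largejobs" is an example queue path:

```xml
<!-- capacity-scheduler.xml: hypothetical per-queue maximum container size. -->
<property>
  <name>yarn.scheduler.capacity.root.largejobs.maximum-allocation-mb</name>
  <value>16384</value>
</property>
<property>
  <!-- The vcores analogue asked about in the comment above. -->
  <name>yarn.scheduler.capacity.root.largejobs.maximum-allocation-vcores</name>
  <value>8</value>
</property>
```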
[jira] [Created] (YARN-2321) NodeManager WebUI gets wrong configuration of isPmemCheckEnabled()
Leitao Guo created YARN-2321: Summary: NodeManager WebUI gets wrong configuration of isPmemCheckEnabled() Key: YARN-2321 URL: https://issues.apache.org/jira/browse/YARN-2321 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.4.1 Reporter: Leitao Guo The NodeManager WebUI shows the wrong configuration for "Pmem enforcement enable".
[jira] [Updated] (YARN-2321) NodeManager WebUI gets wrong configuration of isPmemCheckEnabled()
[ https://issues.apache.org/jira/browse/YARN-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2321: - Attachment: YARN-2321.patch > NodeManager WebUI gets wrong configuration of isPmemCheckEnabled() > - > > Key: YARN-2321 > URL: https://issues.apache.org/jira/browse/YARN-2321 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager > Affects Versions: 2.4.1 > Reporter: Leitao Guo > Attachments: YARN-2321.patch > > > The NodeManager WebUI shows the wrong configuration for "Pmem enforcement enable".
[jira] [Commented] (YARN-2321) NodeManager web UI can incorrectly report Pmem enforcement
[ https://issues.apache.org/jira/browse/YARN-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069661#comment-14069661 ] Leitao Guo commented on YARN-2321: -- Thanks, Jason Lowe! > NodeManager web UI can incorrectly report Pmem enforcement > -- > > Key: YARN-2321 > URL: https://issues.apache.org/jira/browse/YARN-2321 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager > Affects Versions: 2.4.1 > Reporter: Leitao Guo > Assignee: Leitao Guo > Fix For: 3.0.0, 2.6.0 > > Attachments: YARN-2321.patch > > > The NodeManager WebUI shows the wrong configuration for "Pmem enforcement enable".
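The gist of the issue above: the web UI must read the same key the containers monitor actually enforces. A hedged, self-contained sketch, using java.util.Properties as a stand-in for Hadoop's Configuration class:

```java
import java.util.Properties;

public class PmemCheckSketch {
    // "yarn.nodemanager.pmem-check-enabled" is the NodeManager key that
    // controls physical-memory enforcement (default true). Properties
    // stands in for org.apache.hadoop.conf.Configuration in this sketch.
    static boolean isPmemCheckEnabled(Properties conf) {
        return Boolean.parseBoolean(
                conf.getProperty("yarn.nodemanager.pmem-check-enabled", "true"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(isPmemCheckEnabled(conf)); // true: the default
        conf.setProperty("yarn.nodemanager.pmem-check-enabled", "false");
        System.out.println(isPmemCheckEnabled(conf)); // false: overridden
    }
}
```

Reading a different (or mistyped) key than the one the monitor uses is exactly how the UI ends up reporting the wrong value.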
[jira] [Created] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time
Leitao Guo created YARN-2348: Summary: ResourceManager web UI should display locale time instead of UTC time Key: YARN-2348 URL: https://issues.apache.org/jira/browse/YARN-2348 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Attachments: 1.before-change.jpg, 2.after-change.jpg The ResourceManager web UI, including the application list and scheduler, displays UTC time by default, which will confuse users who do not use UTC time. The web UI should display users' local time.
[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Attachment: 2.after-change.jpg > ResourceManager web UI should display locale time instead of UTC time > - > > Key: YARN-2348 > URL: https://issues.apache.org/jira/browse/YARN-2348 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Affects Versions: 2.4.1 > Reporter: Leitao Guo > Attachments: 1.before-change.jpg, 2.after-change.jpg > > > The ResourceManager web UI, including the application list and scheduler, displays > UTC time by default, which will confuse users who do not use UTC time. The > web UI should display users' local time.
[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Attachment: 1.before-change.jpg > ResourceManager web UI should display locale time instead of UTC time > - > > Key: YARN-2348 > URL: https://issues.apache.org/jira/browse/YARN-2348 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Affects Versions: 2.4.1 > Reporter: Leitao Guo > Attachments: 1.before-change.jpg, 2.after-change.jpg > > > The ResourceManager web UI, including the application list and scheduler, displays > UTC time by default, which will confuse users who do not use UTC time. The > web UI should display users' local time.
[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Attachment: YARN-2348.patch Please review the patch. > ResourceManager web UI should display locale time instead of UTC time > - > > Key: YARN-2348 > URL: https://issues.apache.org/jira/browse/YARN-2348 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Affects Versions: 2.4.1 > Reporter: Leitao Guo > Attachments: 1.before-change.jpg, 2.after-change.jpg, YARN-2348.patch > > > The ResourceManager web UI, including the application list and scheduler, displays > UTC time by default, which will confuse users who do not use UTC time. The > web UI should display users' local time.
[jira] [Created] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB
Leitao Guo created YARN-2368: Summary: ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB Key: YARN-2368 URL: https://issues.apache.org/jira/browse/YARN-2368 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.1 Reporter: Leitao Guo Priority: Critical Both ResourceManagers throw STATE_STORE_OP_FAILED events and eventually fail. The ZooKeeper log shows that ZKRMStateStore tries to update a znode larger than 1MB, which exceeds the default 'jute.maxbuffer' limit of both the ZooKeeper server and client.
[jira] [Updated] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB
[ https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2368: - Description: Both ResourceManagers throw STATE_STORE_OP_FAILED events and eventually fail. The ZooKeeper log shows that ZKRMStateStore tries to update a znode larger than 1MB, which exceeds the default 'jute.maxbuffer' limit of both the ZooKeeper server and client. The ResourceManager log shows the following:
2014-07-25 22:33:11,078 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected
2014-07-25 22:33:11,078 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored
2014-07-25 22:33:11,214 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:923)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:620)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
        at java.lang.Thread.run(Thread.java:745)
Meanwhile, ZooKeeper logs the following:
2014-07-25 22:10:09,742 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x247684586e70006 due to java.io.IOException: Len error 1530747
... ...
2014-07-25 22:33:10,966 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x247684586e70006 due to java.io.IOException: Len error 1530747
was: Both ResourceManagers throw STATE_STORE_OP_FAILED events and eventually fail. The ZooKeeper log shows that ZKRMStateStore tries to update a znode larger than 1MB, which exceeds the default 'jute.maxbuffer' limit of both the ZooKeeper server and client.
> ResourceManager failed when ZKRMStateStore tries to update znode data larger > than 1MB > - > > Key: YARN-2368 > URL: https://issues.apache.org/jira/browse/YARN-2368 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Affects Versions: 2.4.1 > Reporter: Leitao Guo > Priority: Critical > > Both ResourceManagers throw STATE_STORE_OP_FAILED events and eventually > fail. The ZooKeeper log shows that ZKRMStateStore tries to update a znode > larger than 1MB, which exceeds the default 'jute.maxbuffer' limit of both > the ZooKeeper server and client. 
> The ResourceManager log shows the following: > 2014-07-25 22:33:11,078 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session connected > 2014-07-25 22:33:11,078 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session restored > 2014-07-25 22:33:11,214 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a > org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type > STATE_STORE_OP_FAILED. Cause: > org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode > = ConnectionLoss for > /rmstore/ZKRMStateRoot/RMAppRoot/
[jira] [Updated] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB
[ https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2368: - Description: Both ResourceManagers throw STATE_STORE_OP_FAILED events and eventually fail. The ZooKeeper log shows that ZKRMStateStore tries to update a znode larger than 1MB, which exceeds the default 'jute.maxbuffer' limit of both the ZooKeeper server and client. The ResourceManager log shows the following:
2014-07-25 22:33:11,078 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected
2014-07-25 22:33:11,078 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored
2014-07-25 22:33:11,214 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:923)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:620)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
        at java.lang.Thread.run(Thread.java:745)
Meanwhile, ZooKeeper logs the following:
2014-07-25 22:10:09,742 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x247684586e70006 due to java.io.IOException: Len error 1530747
... ...
2014-07-25 22:33:10,966 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x247684586e70006 due to java.io.IOException: Len error 1530747
was: Both ResourceManagers throw STATE_STORE_OP_FAILED events and eventually fail. The ZooKeeper log shows that ZKRMStateStore tries to update a znode larger than 1MB, which exceeds the default 'jute.maxbuffer' limit of both the ZooKeeper server and client. The ResourceManager log shows the following:
2014-07-25 22:33:11,078 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected
2014-07-25 22:33:11,078 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored
2014-07-25 22:33:11,214 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED.
Cause: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973)
        at org.apache.hadoop.yarn.server.re
[jira] [Updated] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB
[ https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2368: - Attachment: YARN-2368.patch > ResourceManager failed when ZKRMStateStore tries to update znode data larger > than 1MB > - > > Key: YARN-2368 > URL: https://issues.apache.org/jira/browse/YARN-2368 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Affects Versions: 2.4.1 > Reporter: Leitao Guo > Priority: Critical > Attachments: YARN-2368.patch > > > Both ResourceManagers throw STATE_STORE_OP_FAILED events and eventually > fail. The ZooKeeper log shows that ZKRMStateStore tries to update a znode > larger than 1MB, which exceeds the default 'jute.maxbuffer' limit of both > the ZooKeeper server and client. > The ResourceManager log shows the following: > > 2014-07-25 22:33:11,078 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session connected > 2014-07-25 22:33:11,078 INFO > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: > ZKRMStateStore Session restored > 2014-07-25 22:33:11,214 FATAL > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a > org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type > STATE_STORE_OP_FAILED. 
Cause: > org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode > = ConnectionLoss for > /rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01 > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041) > at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:923) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:620) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) > at > org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) > at java.lang.Thread.run(Thread.java:745) > Meanwhile, ZooKeeper logs the following: > > 2014-07-25 22:10:09,742 [myid:1] - WARN > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception > causing close of session 0x247684586e70006 due to java.io.IOException: Len > error 1530747 > ... ... > 2014-07-25 22:33:10,966 [myid:1] - WARN > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception > causing close of session 0x247684586e70006 due to java.io.IOException: Len > error 1530747
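The "Len error 1530747" lines are consistent with the znode exceeding ZooKeeper's default jute.maxbuffer of 0xfffff bytes (about 1 MB). A small sketch of the arithmetic, plus the commonly cited workaround of raising the limit as a JVM system property on both the server and client JVMs; the 4 MB value below is an arbitrary example, not a recommendation:

```java
public class JuteMaxBufferSketch {
    public static void main(String[] args) {
        int defaultMax = 0xfffff;   // ZooKeeper's built-in jute.maxbuffer: 1048575 bytes
        int observedLen = 1530747;  // "Len error 1530747" from the ZooKeeper log above
        System.out.println(observedLen > defaultMax); // true: the update is rejected
        // Workaround sketch: the property must match on BOTH the ZooKeeper
        // server JVMs (e.g. -Djute.maxbuffer=4194304 in the server flags)
        // and the ZK client JVM, here the ResourceManager, before connecting:
        System.setProperty("jute.maxbuffer", Integer.toString(4 * 1024 * 1024));
    }
}
```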
[jira] [Updated] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB
[jira] [Updated] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB
[ https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2368: - Description: Both ResourceManagers throw STATE_STORE_OP_FAILED events and eventually fail. The ZooKeeper log shows that ZKRMStateStore tries to update a znode larger than 1MB, the default 'jute.maxbuffer' limit of both the ZooKeeper server and client. The ResourceManager (IP: 10.153.80.8) log shows the following: {code} 2014-07-25 22:33:11,078 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session connected 2014-07-25 22:33:11,078 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: ZKRMStateStore Session restored 2014-07-25 22:33:11,214 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED. Cause: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /rmstore/ZKRMStateRoot/RMAppRoot/application_1406264354826_1645/appattempt_1406264354826_1645_01 at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:926) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$8.run(ZKRMStateStore.java:923) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:973) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:992) at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.existsWithRetries(ZKRMStateStore.java:923) at 
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.updateApplicationAttemptStateInternal(ZKRMStateStore.java:620) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:675) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:766) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:761) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) {code} Meanwhile, the ZooKeeper log shows the following: {code} 2014-07-25 22:10:09,728 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.153.80.8:58890 2014-07-25 22:10:09,730 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client attempting to renew session 0x247684586e70006 at /10.153.80.8:58890 2014-07-25 22:10:09,730 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating client: 0x247684586e70006 2014-07-25 22:10:09,730 [myid:1] - INFO [QuorumPeer[myid=1]/0.0.0.0:2181:ZooKeeperServer@595] - Established session 0x247684586e70006 with negotiated timeout 1 for client /10.153.80.8:58890 2014-07-25 22:10:09,730 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@863] - got auth packet /10.153.80.8:58890 2014-07-25 22:10:09,730 [myid:1] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@897] - auth success /10.153.80.8:58890 2014-07-25 22:10:09,742 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x247684586e70006 due to java.io.IOException: Len error 1530747 2014-07-25 22:10:09,743 [myid:1] - INFO 
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /10.153.80.8:58890 which had sessionid 0x247684586e70006 ... ... 2014-07-25 22:33:10,966 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x247684586e70006 due to java.io.IOException: Len error 1530747 {code} was: Both ResouceManagers throw out STATE_STORE_OP_FAILED events and failed finally. ZooKeeper log shows that ZKRMStateStore tries to update a znode larger than 1MB, which is the default configuration of ZooKeeper server and client in 'jute.maxbuffer'. ResourceManager log shows as the following: 2014-07-25 22:33:11,078 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRM
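The "Len error" warnings above come from ZooKeeper's connection layer rejecting any request packet whose serialized length exceeds 'jute.maxbuffer' (1 MB by default). The following is a minimal sketch of that check; the class and method names (`JuteMaxBufferCheck`, `checkPacketLength`) are illustrative, not ZooKeeper's actual code:

```java
import java.io.IOException;

// Hypothetical sketch of the jute.maxbuffer packet-length check that produces
// the "Len error <n>" warning: requests larger than the limit are rejected
// with an IOException, which closes the client session.
public class JuteMaxBufferCheck {
    // jute.maxbuffer defaults to 1 MB on both server and client
    static final int JUTE_MAX_BUFFER =
            Integer.getInteger("jute.maxbuffer", 1024 * 1024);

    static void checkPacketLength(int len) throws IOException {
        if (len < 0 || len > JUTE_MAX_BUFFER) {
            // mirrors the log: "... due to java.io.IOException: Len error 1530747"
            throw new IOException("Len error " + len);
        }
    }

    public static void main(String[] args) throws IOException {
        checkPacketLength(4096);          // a normal-sized znode update passes
        try {
            checkPacketLength(1530747);   // the size from the log is rejected
        } catch (IOException e) {
            System.out.println(e.getMessage());  // prints "Len error 1530747"
        }
    }
}
```

By this reading, the app-attempt znode in the log carried roughly 1.5 MB of state, so every retry of the update hit the same limit and the session kept closing.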
[jira] [Commented] (YARN-2368) ResourceManager failed when ZKRMStateStore tries to update znode data larger than 1MB
[ https://issues.apache.org/jira/browse/YARN-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078954#comment-14078954 ] Leitao Guo commented on YARN-2368: -- Thanks [~ozawa] for your comments. I deployed hadoop-2.3.0-cdh5.1.0 with a 22-queue fair scheduler on my 20-node cluster. Two ResourceManagers are deployed on dedicated nodes, 10.153.80.8 and 10.153.80.18. Jobs are submitted via gridmix: {code} sudo -u mapred hadoop jar /usr/lib/hadoop-mapreduce/hadoop-gridmix.jar -Dgridmix.min.file.size=10485760 -Dgridmix.job-submission.use-queue-in-trace=true -Dgridmix.distributed-cache-emulation.enable=false -generate 34816m hdfs:///user/mapred/foo/ hdfs:///tmp/job-trace.json {code} job-trace.json was generated by Rumen and contains 6,000 jobs, averaging 320 map tasks and 25 reduce tasks per job. Across more than three gridmix runs, the ResourceManager failed three times while handling a STATE_STORE_OP_FAILED event. Each time, ZooKeeper throws a 'Len error' IOException: {code} ... ... 
2014-07-24 21:00:51,170 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.153.80.8:47135 2014-07-24 21:00:51,171 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client attempting to renew session 0x247678daa88001a at /10.153.80.8:47135 2014-07-24 21:00:51,171 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:Learner@107] - Revalidating client: 0x247678daa88001a 2014-07-24 21:00:51,171 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:ZooKeeperServer@595] - Established session 0x247678daa88001a with negotiated timeout 1 for client /10.153.80.8:47135 2014-07-24 21:00:51,171 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@863] - got auth packet /10.153.80.8:47135 2014-07-24 21:00:51,172 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@897] - auth success /10.153.80.8:47135 2014-07-24 21:00:51,186 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x247678daa88001a due to java.io.IOException: Len error 1813411 2014-07-24 21:00:51,186 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /10.153.80.8:47135 which had sessionid 0x247678daa88001a ... ... 
2014-07-25 22:10:08,919 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.153.80.8:50480 2014-07-25 22:10:08,921 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client attempting to renew session 0x247684586e70006 at /10.153.80.8:50480 2014-07-25 22:10:08,922 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@595] - Established session 0x247684586e70006 with negotiated timeout 1 for client /10.153.80.8:50480 2014-07-25 22:10:08,922 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@863] - got auth packet /10.153.80.8:50480 2014-07-25 22:10:08,923 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@897] - auth success /10.153.80.8:50480 2014-07-25 22:10:08,934 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x247684586e70006 due to java.io.IOException: Len error 1530747 2014-07-25 22:10:08,934 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /10.153.80.8:50480 which had sessionid 0x247684586e70006 ... ... 
2014-07-26 02:22:59,627 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.153.80.18:60588 2014-07-26 02:22:59,629 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@832] - Client attempting to renew session 0x2476de7c1af0002 at /10.153.80.18:60588 2014-07-26 02:22:59,629 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@595] - Established session 0x2476de7c1af0002 with negotiated timeout 1 for client /10.153.80.18:60588 2014-07-26 02:22:59,630 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@863] - got auth packet /10.153.80.18:60588 2014-07-26 02:22:59,630 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@897] - auth success /10.153.80.18:60588 2014-07-26 02:22:59,648 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception causing close of session 0x2476de7c1af0002 due to java.io.IOException: Len error 1649043 2014-07-26 02:22:59,648 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /10.153.80.18:60588 which had sessionid 0x2476de7c1af0
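A hedged sketch of the usual workaround for these failures (the 4 MB value below is an assumption for illustration, not a recommendation from this thread): 'jute.maxbuffer' is read as a JVM system property, so it must be raised consistently on every ZooKeeper server and on the ZooKeeper client inside the ResourceManager, since both sides enforce the limit independently.

```java
// Sketch of the client-side half of the workaround. jute.maxbuffer must be set
// before the ZooKeeper client classes read it; servers need -Djute.maxbuffer
// raised the same way, or oversized packets are still dropped with "Len error".
public class JuteMaxBufferConfig {
    static int configuredLimit() {
        // same lookup pattern the jute serialization layer uses
        return Integer.getInteger("jute.maxbuffer", 1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("default limit: " + configuredLimit());  // 1048576
        // illustrative value only: 4 MB
        System.setProperty("jute.maxbuffer", String.valueOf(4 * 1024 * 1024));
        System.out.println("raised limit: " + configuredLimit());   // 4194304
        // the 1530747-byte app-attempt znode from the log now fits
        System.out.println(1530747 <= configuredLimit());           // true
    }
}
```

The deeper fix, of course, is to keep app-attempt state small enough that the limit is never approached; raising the buffer only moves the ceiling.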
[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Attachment: YARN-2348.2.patch Please find the new patch for this issue in YARN-2348.2.patch. In this patch, the ResourceManager server formats the Start/FinishTime dates itself, instead of rendering the dates in the browser. > ResourceManager web UI should display locale time instead of UTC time > - > > Key: YARN-2348 > URL: https://issues.apache.org/jira/browse/YARN-2348 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.1 >Reporter: Leitao Guo > Attachments: YARN-2348.2.patch, YARN-2348.patch > > > ResourceManager web UI, including application list and scheduler, displays > UTC time in default, this will confuse users who do not use UTC time. This > web UI should display local time of users. -- This message was sent by Atlassian JIRA (v6.2#6252)
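The server-side formatting approach described above can be sketched as follows; `formatServerTime` and the date pattern are illustrative assumptions, not the actual code of YARN-2348.2.patch:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Illustrative sketch: the server formats epoch millis with its own default
// time zone before emitting HTML, so every viewer sees the server's local
// time instead of a browser-rendered UTC timestamp.
public class ServerTimeFormat {
    static String formatServerTime(long epochMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat("EEE MMM dd HH:mm:ss Z yyyy");
        fmt.setTimeZone(TimeZone.getDefault()); // server-side zone, not the client's
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        // e.g. an application start time, rendered once on the server
        System.out.println(formatServerTime(System.currentTimeMillis()));
    }
}
```

One design consequence: all viewers see a single consistent time, at the cost of users in other zones having to convert mentally, which is the trade-off the later comments on this issue settle in favor of server-side time.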
[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Attachment: (was: 2.after-change.jpg) > ResourceManager web UI should display locale time instead of UTC time > - > > Key: YARN-2348 > URL: https://issues.apache.org/jira/browse/YARN-2348 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.1 >Reporter: Leitao Guo > Attachments: YARN-2348.2.patch, YARN-2348.patch > > > ResourceManager web UI, including application list and scheduler, displays > UTC time in default, this will confuse users who do not use UTC time. This > web UI should display local time of users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Attachment: (was: 1.before-change.jpg) > ResourceManager web UI should display locale time instead of UTC time > - > > Key: YARN-2348 > URL: https://issues.apache.org/jira/browse/YARN-2348 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.1 >Reporter: Leitao Guo > Attachments: YARN-2348.2.patch, YARN-2348.patch > > > ResourceManager web UI, including application list and scheduler, displays > UTC time in default, this will confuse users who do not use UTC time. This > web UI should display local time of users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2348) ResourceManager web UI should display locale time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Attachment: 4.after-patch.JPG 3.before-patch.JPG Here are new screenshots of the web UI of my cluster, before and after the patch. > ResourceManager web UI should display locale time instead of UTC time > - > > Key: YARN-2348 > URL: https://issues.apache.org/jira/browse/YARN-2348 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.1 >Reporter: Leitao Guo > Attachments: 3.before-patch.JPG, 4.after-patch.JPG, > YARN-2348.2.patch, YARN-2348.patch > > > ResourceManager web UI, including application list and scheduler, displays > UTC time in default, this will confuse users who do not use UTC time. This > web UI should display local time of users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Summary: ResourceManager web UI should display server-side time instead of UTC time (was: ResourceManager web UI should display locale time instead of UTC time) > ResourceManager web UI should display server-side time instead of UTC time > -- > > Key: YARN-2348 > URL: https://issues.apache.org/jira/browse/YARN-2348 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.1 >Reporter: Leitao Guo > Attachments: 3.before-patch.JPG, 4.after-patch.JPG, > YARN-2348.2.patch, YARN-2348.patch > > > ResourceManager web UI, including application list and scheduler, displays > UTC time in default, this will confuse users who do not use UTC time. This > web UI should display local time of users. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Description: ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display server-side time in default. (was: ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display server-side time in default.) > ResourceManager web UI should display server-side time instead of UTC time > -- > > Key: YARN-2348 > URL: https://issues.apache.org/jira/browse/YARN-2348 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.1 >Reporter: Leitao Guo > Attachments: 3.before-patch.JPG, 4.after-patch.JPG, > YARN-2348.2.patch, YARN-2348.patch > > > ResourceManager web UI, including application list and scheduler, displays > UTC time in default, this will confuse users who do not use UTC time. This > web UI should display server-side time in default. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Description: ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display server-side time in default. was:ResourceManager web UI, including application list and scheduler, displays UTC time in default, this will confuse users who do not use UTC time. This web UI should display local time of users. > ResourceManager web UI should display server-side time instead of UTC time > -- > > Key: YARN-2348 > URL: https://issues.apache.org/jira/browse/YARN-2348 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.1 >Reporter: Leitao Guo > Attachments: 3.before-patch.JPG, 4.after-patch.JPG, > YARN-2348.2.patch, YARN-2348.patch > > > ResourceManager web UI, including application list and scheduler, displays > UTC time in default, this will confuse users who do not use UTC time. This > web UI should display server-side time > in default. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079299#comment-14079299 ] Leitao Guo commented on YARN-2348: -- Hi [~aw] [~tucu00] [~raviprak] , thanks for your comments. I agree that the web UI should display the same time as the server side. Please take a look at the new patch, thanks! > ResourceManager web UI should display server-side time instead of UTC time > -- > > Key: YARN-2348 > URL: https://issues.apache.org/jira/browse/YARN-2348 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.1 >Reporter: Leitao Guo > Attachments: 3.before-patch.JPG, 4.after-patch.JPG, > YARN-2348.2.patch, YARN-2348.patch > > > ResourceManager web UI, including application list and scheduler, displays > UTC time in default, this will confuse users who do not use UTC time. This > web UI should display server-side time in default. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14079423#comment-14079423 ] Leitao Guo commented on YARN-2348: -- [~chengbing.liu] thanks, Bing! > ResourceManager web UI should display server-side time instead of UTC time > -- > > Key: YARN-2348 > URL: https://issues.apache.org/jira/browse/YARN-2348 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.1 >Reporter: Leitao Guo > Attachments: 3.before-patch.JPG, 4.after-patch.JPG, > YARN-2348.2.patch, YARN-2348.2.patch, YARN-2348.patch > > > ResourceManager web UI, including application list and scheduler, displays > UTC time in default, this will confuse users who do not use UTC time. This > web UI should display server-side time in default. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Attachment: (was: YARN-2348.patch) > ResourceManager web UI should display server-side time instead of UTC time > -- > > Key: YARN-2348 > URL: https://issues.apache.org/jira/browse/YARN-2348 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.1 >Reporter: Leitao Guo > Attachments: 3.before-patch.JPG, 4.after-patch.JPG, YARN-2348.2.patch > > > ResourceManager web UI, including application list and scheduler, displays > UTC time in default, this will confuse users who do not use UTC time. This > web UI should display server-side time in default. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2348) ResourceManager web UI should display server-side time instead of UTC time
[ https://issues.apache.org/jira/browse/YARN-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-2348: - Attachment: (was: YARN-2348.2.patch) > ResourceManager web UI should display server-side time instead of UTC time > -- > > Key: YARN-2348 > URL: https://issues.apache.org/jira/browse/YARN-2348 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.4.1 >Reporter: Leitao Guo > Attachments: 3.before-patch.JPG, 4.after-patch.JPG, YARN-2348.2.patch > > > ResourceManager web UI, including application list and scheduler, displays > UTC time in default, this will confuse users who do not use UTC time. This > web UI should display server-side time in default. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1729) TimelineWebServices always passes primary and secondary filters as strings
[ https://issues.apache.org/jira/browse/YARN-1729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo reassigned YARN-1729: Assignee: Leitao Guo (was: Billie Rinaldi) > TimelineWebServices always passes primary and secondary filters as strings > -- > > Key: YARN-1729 > URL: https://issues.apache.org/jira/browse/YARN-1729 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Billie Rinaldi >Assignee: Leitao Guo > Fix For: 2.4.0 > > Attachments: YARN-1729.1.patch, YARN-1729.2.patch, YARN-1729.3.patch, > YARN-1729.4.patch, YARN-1729.5.patch, YARN-1729.6.patch, YARN-1729.7.patch > > > Primary filters and secondary filter values can be arbitrary json-compatible > Object. The web services should determine if the filters specified as query > parameters are objects or strings before passing them to the store. -- This message was sent by Atlassian JIRA (v6.2#6252)
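A minimal sketch of the idea in the description above, assuming a fallback chain from typed literals to plain strings (the actual YARN-1729 patch is not reproduced here; the class and method names are invented for the example):

```java
public class FilterValueParse {
    // Interpret a query-parameter filter value as a typed object where
    // possible (boolean, integer, floating point), falling back to the raw
    // String. A real implementation would parse arbitrary JSON values; this
    // sketch only covers scalar literals.
    static Object parseFilterValue(String raw) {
        if (raw == null) {
            return null;
        }
        if (raw.equals("true") || raw.equals("false")) {
            return Boolean.valueOf(raw);
        }
        try {
            return Long.valueOf(raw);           // integer literal
        } catch (NumberFormatException ignored) { }
        try {
            return Double.valueOf(raw);         // floating-point literal
        } catch (NumberFormatException ignored) { }
        return raw;                             // plain string
    }

    public static void main(String[] args) {
        System.out.println(parseFilterValue("42").getClass().getSimpleName());
        System.out.println(parseFilterValue("3.14").getClass().getSimpleName());
        System.out.println(parseFilterValue("abc").getClass().getSimpleName());
    }
}
```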
[jira] [Commented] (YARN-2466) Umbrella issue for Yarn launched Docker Containers
[ https://issues.apache.org/jira/browse/YARN-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288608#comment-14288608 ] Leitao Guo commented on YARN-2466: -- Currently, if I want to use DCE in my cluster, all applications have to run in DCE, which is not practical in our cluster. Could "yarn.nodemanager.container-executor.class" be made configurable per application? That way we could use DCE for some applications while others still use LCE. > Umbrella issue for Yarn launched Docker Containers > -- > > Key: YARN-2466 > URL: https://issues.apache.org/jira/browse/YARN-2466 > Project: Hadoop YARN > Issue Type: New Feature >Affects Versions: 2.4.1 >Reporter: Abin Shahab >Assignee: Abin Shahab > > Docker (https://www.docker.io/) is, increasingly, a very popular container > technology. > In the context of YARN, support for Docker will provide a very elegant > solution to allow applications to package their software into a Docker > container (entire Linux file system incl. custom versions of perl, python > etc.) and use it as a blueprint to launch all their YARN containers with > the requisite software environment. This provides both consistency (all YARN > containers will have the same software environment) and isolation (no > interference with whatever is installed on the physical machine). > In addition to the software isolation mentioned above, Docker containers will > provide resource, network, and user-namespace isolation. > Docker provides resource isolation through cgroups, similar to > LinuxContainerExecutor. This prevents one job from taking other jobs' > resources (memory and CPU) on the same hadoop cluster. > User-namespace isolation will ensure that root on the container is mapped > to an unprivileged user on the host. This is currently being added to Docker. > Network isolation will ensure that one user’s network traffic is completely > isolated from another user’s network traffic. 
> Last but not least, the interaction of Docker and Kerberos will have to > be worked out. These Docker containers must work in a secure hadoop > environment. > Additional details are here: > https://wiki.apache.org/hadoop/dineshs/IsolatingYarnAppsInDockerContainers -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2718) Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292809#comment-14292809 ] Leitao Guo commented on YARN-2718: -- I think this would be good for our hadoop cluster, since we have a few applications that have to run in docker containers, while most apps need LCE. So we need a CompositeContainerExecutor that lets apps configure which ContainerExecutor they need. > Create a CompositeConatainerExecutor that combines DockerContainerExecutor > and DefaultContainerExecutor > --- > > Key: YARN-2718 > URL: https://issues.apache.org/jira/browse/YARN-2718 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Abin Shahab > Attachments: YARN-2718.patch > > > There should be a composite container that allows users to run their jobs in > DockerContainerExecutor, but switch to DefaultContainerExecutor for debugging > purposes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2718) Create a CompositeConatainerExecutor that combines DockerContainerExecutor and DefaultContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292910#comment-14292910 ] Leitao Guo commented on YARN-2718: -- [~chenchun], in the following code, I think you should return directly when 'containerExecutor == null'; otherwise the null value still reaches builder.setContainerExecutor():
{code:java}
@Override
public void setContainerExecutor(String containerExecutor) {
  maybeInitBuilder();
  if (containerExecutor == null) {
    builder.clearContainerExecutor();
  }
  builder.setContainerExecutor(containerExecutor);
}
{code}
> Create a CompositeConatainerExecutor that combines DockerContainerExecutor > and DefaultContainerExecutor > --- > > Key: YARN-2718 > URL: https://issues.apache.org/jira/browse/YARN-2718 > Project: Hadoop YARN > Issue Type: New Feature >Reporter: Abin Shahab > Attachments: YARN-2718.patch > > > There should be a composite container that allows users to run their jobs in > DockerContainerExecutor, but switch to DefaultContainerExecutor for debugging > purposes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
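A self-contained sketch of the fix suggested in the comment above. The real class wraps a protobuf builder, whose generated setters throw NullPointerException on null arguments; the stand-in Builder below only mirrors that behavior to show why the early return matters. All names here are invented for the example.

```java
import java.util.Objects;

public class SetterNullReturnDemo {
    // Stand-in for the protobuf-generated builder: its setter rejects null,
    // just like real protobuf setters do.
    static class Builder {
        private String containerExecutor;
        void clearContainerExecutor() { containerExecutor = null; }
        void setContainerExecutor(String value) {
            this.containerExecutor = Objects.requireNonNull(value);
        }
        String getContainerExecutor() { return containerExecutor; }
    }

    final Builder builder = new Builder();

    public void setContainerExecutor(String containerExecutor) {
        if (containerExecutor == null) {
            builder.clearContainerExecutor();
            return; // without this return, null would reach the setter and throw
        }
        builder.setContainerExecutor(containerExecutor);
    }

    public static void main(String[] args) {
        SetterNullReturnDemo demo = new SetterNullReturnDemo();
        demo.setContainerExecutor("DockerContainerExecutor");
        demo.setContainerExecutor(null); // clears the field without throwing
        System.out.println(demo.builder.getContainerExecutor()); // prints "null"
    }
}
```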
[jira] [Created] (YARN-3159) DOCKER_IMAGE_PATTERN should support multilayered path of docker images
Leitao Guo created YARN-3159: Summary: DOCKER_IMAGE_PATTERN should support multilayered path of docker images Key: YARN-3159 URL: https://issues.apache.org/jira/browse/YARN-3159 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Reporter: Leitao Guo Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match docker images with a path like "sequenceiq/hadoop-docker:2.6.0", which has only one "/" in the path. {code:java} public static final String DOCKER_IMAGE_PATTERN = "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$"; {code} In our cluster, image names have multilayered paths, such as "docker-registry.qiyi.virtual:8080/cloud/hadoop-docker:2.6.0", which works with "docker pull IMAGE_NAME" but cannot pass the image-name check in saneDockerImage(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
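The mismatch can be reproduced with java.util.regex. The relaxed pattern below is only an illustrative sketch (it simply also admits "/" in the tail so registry:port/namespace/image:tag paths match), not the attached YARN-3159 patch.

```java
import java.util.regex.Pattern;

public class DockerImagePatternCheck {
    // Pattern currently in DockerContainerExecutor: one optional
    // registry prefix, then a tail that may not contain "/".
    static final String CURRENT =
        "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$";
    // Illustrative relaxation (not the attached patch): also allow "/"
    // in the tail so multilayered paths match.
    static final String RELAXED =
        "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.\\/:-]+$";

    public static void main(String[] args) {
        String single  = "sequenceiq/hadoop-docker:2.6.0";
        String layered = "docker-registry:8080/cloud/hadoop-docker:2.6.0";

        System.out.println(Pattern.matches(CURRENT, single));   // true
        System.out.println(Pattern.matches(CURRENT, layered));  // false: second "/" rejected
        System.out.println(Pattern.matches(RELAXED, layered));  // true
    }
}
```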
[jira] [Updated] (YARN-3159) DOCKER_IMAGE_PATTERN should support multilayered path of docker images
[ https://issues.apache.org/jira/browse/YARN-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-3159: - Attachment: YARN-3159.patch > DOCKER_IMAGE_PATTERN should support multilayered path of docker images > -- > > Key: YARN-3159 > URL: https://issues.apache.org/jira/browse/YARN-3159 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Leitao Guo > Attachments: YARN-3159.patch > > > Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match > docker images with the path like "sequenceiq/hadoop-docker:2.6.0", which has > only 1 "/" in the path. > {code} > public static final String DOCKER_IMAGE_PATTERN = > "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$"; > {code} > In our cluster, the image name have multi layers, such as > "docker-registry.qiyi.virtual:8080/cloud/hadoop-docker:2.6.0", which is > workable when using "docker pull IMAGE_NAME", but can not pass the check of > image name in saneDockerImage(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3159) DOCKER_IMAGE_PATTERN should support multilayered path of docker images
[ https://issues.apache.org/jira/browse/YARN-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312151#comment-14312151 ] Leitao Guo commented on YARN-3159: -- Ok, I'll add the unit test. > DOCKER_IMAGE_PATTERN should support multilayered path of docker images > -- > > Key: YARN-3159 > URL: https://issues.apache.org/jira/browse/YARN-3159 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Leitao Guo >Assignee: Leitao Guo > Attachments: YARN-3159.patch > > > Currently, DOCKER_IMAGE_PATTERN in DockerContainerExecutor can only match > docker images with the path like "sequenceiq/hadoop-docker:2.6.0", which has > only 1 "/" in the path. > {code} > public static final String DOCKER_IMAGE_PATTERN = > "^(([\\w\\.-]+)(:\\d+)*\\/)?[\\w\\.:-]+$"; > {code} > In our cluster, the image name have multi layers, such as > "docker-registry.qiyi.virtual:8080/cloud/hadoop-docker:2.6.0", which is > workable when using "docker pull IMAGE_NAME", but can not pass the check of > image name in saneDockerImage(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3282) DockerContainerExecutor should support environment variables setting
Leitao Guo created YARN-3282: Summary: DockerContainerExecutor should support environment variables setting Key: YARN-3282 URL: https://issues.apache.org/jira/browse/YARN-3282 Project: Hadoop YARN Issue Type: Improvement Components: applications, nodemanager Affects Versions: 2.6.0 Reporter: Leitao Guo Currently, DockerContainerExecutor automatically mounts "yarn.nodemanager.local-dirs" and "yarn.nodemanager.log-dirs" into containers. However, applications may need to set more environment variables before launching containers. In our applications, as in the following command, we need to mount several directories into, and set some environment variables in, the docker containers. {code} docker run -i -t -v /data/transcode:/data/tmp -v /etc/qcs:/etc/qcs -v /mnt:/mnt -e VTC_MQTYPE=rabbitmq -e VTC_APP=ugc -e VTC_LOCATION=sh -e VTC_RUNTIME=vtc sequenceiq/hadoop-docker:2.6.0 /bin/bash {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3282) DockerContainerExecutor should support environment variables setting
[ https://issues.apache.org/jira/browse/YARN-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Leitao Guo updated YARN-3282: - Attachment: YARN-3282.01.patch With this patch, MapReduce jobs running under DockerContainerExecutor can pass extra docker options (volume mounts and environment variables) via "yarn.nodemanager.docker-container-executor.env", e.g.:
{code}
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples.jar wordcount \
  -Dyarn.app.mapreduce.am.env="yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.6.0" \
  -Dmapreduce.map.env="yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.6.0" \
  -Dmapreduce.reduce.env="yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.6.0" \
  -Dyarn.nodemanager.docker-container-executor.env="-v /data/transcode:/data/tmp -v /etc/qcs:/etc/qcs -v /mnt:/mnt -e VTC_MQTYPE=rabbitmq -e VTC_APP=ugc -e VTC_LOCATION=sh -e VTC_RUNTIME=vtc" \
  /wordcount_input /wordcount_output
{code}
> DockerContainerExecutor should support environment variables setting > > > Key: YARN-3282 > URL: https://issues.apache.org/jira/browse/YARN-3282 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications, nodemanager >Affects Versions: 2.6.0 >Reporter: Leitao Guo > Attachments: YARN-3282.01.patch > > > Currently, DockerContainerExecutor automatically mounts "yarn.nodemanager.local-dirs" > and "yarn.nodemanager.log-dirs" into containers. However, > applications may need to set more environment variables before launching > containers. > In our applications, as in the following command, we need to mount several > directories into, and set some environment variables in, the docker containers. > {code} > docker run -i -t -v /data/transcode:/data/tmp -v /etc/qcs:/etc/qcs -v > /mnt:/mnt -e VTC_MQTYPE=rabbitmq -e VTC_APP=ugc -e VTC_LOCATION=sh -e > VTC_RUNTIME=vtc sequenceiq/hadoop-docker:2.6.0 /bin/bash > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
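The intended effect can be sketched as follows. This is a hypothetical, minimal illustration (not the actual YARN-3282 patch): the NodeManager would splice the configured extra docker arguments between "docker run" and the image name when building the launch command. The class and method names are invented for the example.

```java
public class DockerRunCommandSketch {
    // Build a docker run command line, inserting the user-supplied extra
    // arguments (volume mounts, -e environment variables) before the image
    // name, where the docker CLI expects run options to appear.
    static String buildRunCommand(String extraDockerArgs,
                                  String imageName,
                                  String containerCmd) {
        StringBuilder cmd = new StringBuilder("docker run -d ");
        if (extraDockerArgs != null && !extraDockerArgs.trim().isEmpty()) {
            cmd.append(extraDockerArgs.trim()).append(' ');
        }
        cmd.append(imageName).append(' ').append(containerCmd);
        return cmd.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildRunCommand(
            "-v /data/transcode:/data/tmp -e VTC_APP=ugc",
            "sequenceiq/hadoop-docker:2.6.0",
            "/bin/bash"));
        // -> docker run -d -v /data/transcode:/data/tmp -e VTC_APP=ugc sequenceiq/hadoop-docker:2.6.0 /bin/bash
    }
}
```

Options must precede the image name because everything after it is treated by docker as the container command.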
[jira] [Commented] (YARN-3282) DockerContainerExecutor should support environment variables setting
[ https://issues.apache.org/jira/browse/YARN-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346315#comment-14346315 ] Leitao Guo commented on YARN-3282: -- Hi [~ashahab], thanks for your comments. I agree that a docker image should try to be self-sufficient, but in our scenario the data is stored in a remote shared storage system. We mount this storage as local dirs on the NodeManager, and only a few applications running in docker containers need to access these dirs, so they are not suitable for "yarn.nodemanager.local-dirs", which is mounted into docker containers by default. IMHO, an extra way to set environment variables for docker containers is necessary. > DockerContainerExecutor should support environment variables setting > > > Key: YARN-3282 > URL: https://issues.apache.org/jira/browse/YARN-3282 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications, nodemanager >Affects Versions: 2.6.0 >Reporter: Leitao Guo > Attachments: YARN-3282.01.patch > > > Currently, DockerContainerExecutor automatically mounts "yarn.nodemanager.local-dirs" > and "yarn.nodemanager.log-dirs" into containers. However, > applications may need to set more environment variables before launching > containers. > In our applications, as in the following command, we need to mount several > directories into, and set some environment variables in, the docker containers. > {code} > docker run -i -t -v /data/transcode:/data/tmp -v /etc/qcs:/etc/qcs -v > /mnt:/mnt -e VTC_MQTYPE=rabbitmq -e VTC_APP=ugc -e VTC_LOCATION=sh -e > VTC_RUNTIME=vtc sequenceiq/hadoop-docker:2.6.0 /bin/bash > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)