[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
[ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HADOOP-15059: - Attachment: HADOOP-15059.006.patch The patch looks good to me. [~daryn], don't get scared by exception handling! Uploading a new patch that is same as that of Jason's but with SerializedFormat made static. The test case failures reported here are unrelated. Though it is surprising how badly the unit tests are broken - I'll debug them independently and file bugs. If Jenkins says okay, I'll commit this unless [~daryn] / [~jlowe] say no. > 3.0 deployment cannot work with old version MR tar ball which break rolling > upgrade > --- > > Key: HADOOP-15059 > URL: https://issues.apache.org/jira/browse/HADOOP-15059 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Junping Du >Assignee: Jason Lowe >Priority: Blocker > Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, > HADOOP-15059.003.patch, HADOOP-15059.004.patch, HADOOP-15059.005.patch, > HADOOP-15059.006.patch > > > I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed > because following error: > {noformat} > 2017-11-21 12:42:50,911 INFO [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for > application appattempt_1511295641738_0003_01 > 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: > Unable to load native-hadoop library for your platform... using builtin-java > classes where applicable > 2017-11-21 12:42:51,118 FATAL [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster > java.lang.RuntimeException: Unable to determine current user > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212) > at > org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638) > Caused by: java.io.IOException: Exception reading > /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689) > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252) > ... 4 more > Caused by: java.io.IOException: Unknown version 1 in token storage. > at > org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226) > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205) > ... 8 more > 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting > with status 1: java.lang.RuntimeException: Unable to determine current user > {noformat} > I think it is due to token incompatiblity change between 2.9 and 3.0. As we > claim "rolling upgrade" is supported in Hadoop 3, we should fix this before > we ship 3.0 otherwise all MR running applications will get stuck during/after > upgrade. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
[ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HADOOP-15059: - Hadoop Flags: Reviewed Status: Patch Available (was: Open) > 3.0 deployment cannot work with old version MR tar ball which break rolling > upgrade > --- > > Key: HADOOP-15059 > URL: https://issues.apache.org/jira/browse/HADOOP-15059 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Junping Du >Assignee: Jason Lowe >Priority: Blocker > Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, > HADOOP-15059.003.patch, HADOOP-15059.004.patch, HADOOP-15059.005.patch, > HADOOP-15059.006.patch > > > I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed > because following error: > {noformat} > 2017-11-21 12:42:50,911 INFO [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for > application appattempt_1511295641738_0003_01 > 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: > Unable to load native-hadoop library for your platform... using builtin-java > classes where applicable > 2017-11-21 12:42:51,118 FATAL [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster > java.lang.RuntimeException: Unable to determine current user > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212) > at > org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638) > Caused by: java.io.IOException: Exception reading > /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689) > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252) > ... 4 more > Caused by: java.io.IOException: Unknown version 1 in token storage. > at > org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226) > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205) > ... 8 more > 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting > with status 1: java.lang.RuntimeException: Unable to determine current user > {noformat} > I think it is due to token incompatiblity change between 2.9 and 3.0. As we > claim "rolling upgrade" is supported in Hadoop 3, we should fix this before > we ship 3.0 otherwise all MR running applications will get stuck during/after > upgrade. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
[ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HADOOP-15059: - Status: Open (was: Patch Available) > 3.0 deployment cannot work with old version MR tar ball which break rolling > upgrade > --- > > Key: HADOOP-15059 > URL: https://issues.apache.org/jira/browse/HADOOP-15059 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Junping Du >Assignee: Jason Lowe >Priority: Blocker > Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, > HADOOP-15059.003.patch, HADOOP-15059.004.patch, HADOOP-15059.005.patch > > > I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed > because following error: > {noformat} > 2017-11-21 12:42:50,911 INFO [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for > application appattempt_1511295641738_0003_01 > 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: > Unable to load native-hadoop library for your platform... using builtin-java > classes where applicable > 2017-11-21 12:42:51,118 FATAL [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster > java.lang.RuntimeException: Unable to determine current user > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212) > at > org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638) > Caused by: java.io.IOException: Exception reading > /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689) > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252) > ... 4 more > Caused by: java.io.IOException: Unknown version 1 in token storage. > at > org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226) > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205) > ... 8 more > 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting > with status 1: java.lang.RuntimeException: Unable to determine current user > {noformat} > I think it is due to token incompatiblity change between 2.9 and 3.0. As we > claim "rolling upgrade" is supported in Hadoop 3, we should fix this before > we ship 3.0 otherwise all MR running applications will get stuck during/after > upgrade. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
[ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated HADOOP-15059: Attachment: HADOOP-15059.005.patch Thanks for the reviews, Daryn and Vinod! I'm attaching a patch that moves the byte value to enum mapping into the enum itself. I'm not convinced the refactor was worth it, but here it is for you to decide. bq. If only YARN had a system whereby one could dynamically label a node based upon the current software stack. Then one could schedule around that and/or set up distributed cache content to match. The RM tracks which version each NM registered with, so it knows the software stack being provided by that node. Unfortunately YARN doesn't know what software stack the user's application expects, so it cannot fixup the distributed cache to match the expectation. In addition the framework could be embedded in the user's app which cannot be fixed in the general case by tweaking the distributed cache. > 3.0 deployment cannot work with old version MR tar ball which break rolling > upgrade > --- > > Key: HADOOP-15059 > URL: https://issues.apache.org/jira/browse/HADOOP-15059 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Junping Du >Assignee: Jason Lowe >Priority: Blocker > Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, > HADOOP-15059.003.patch, HADOOP-15059.004.patch, HADOOP-15059.005.patch > > > I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed > because following error: > {noformat} > 2017-11-21 12:42:50,911 INFO [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for > application appattempt_1511295641738_0003_01 > 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: > Unable to load native-hadoop library for your platform... using builtin-java > classes where applicable > 2017-11-21 12:42:51,118 FATAL [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster > java.lang.RuntimeException: Unable to determine current user > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212) > at > org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638) > Caused by: java.io.IOException: Exception reading > /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689) > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252) > ... 4 more > Caused by: java.io.IOException: Unknown version 1 in token storage. > at > org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226) > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205) > ... 8 more > 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting > with status 1: java.lang.RuntimeException: Unable to determine current user > {noformat} > I think it is due to token incompatiblity change between 2.9 and 3.0. As we > claim "rolling upgrade" is supported in Hadoop 3, we should fix this before > we ship 3.0 otherwise all MR running applications will get stuck during/after > upgrade. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
[ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated HADOOP-15059: Attachment: HADOOP-15059.004.patch Thanks for joining the conversation, Allen, and for pointing out the motivations behind the protobuf change. Do you know of existing use cases that are relying on the new format? I completely agree the new format is a great path forward for extensibility and portability, but unfortunately it breaks a number of existing use cases. bq. Let's be clear: this is only a problem if one has a bundled hadoop-common.jar. It's also important to point out that this is a rather common occurrence. Besides the typical habit of users running their *-with-dependencies.jar on the cluster, anyone leveraging the framework-on-HDFS approach will be bitten by this as soon as the nodemanager upgrades. Having frameworks deploy via HDFS rather than picking them up from the nodemanager's jars has proven to be a very useful way to better isolate apps during cluster rolling upgrades and support multiple versions of the framework on the cluster simultaneously. bq. Is the end result of this JIRA going to be that all file formats are locked forever, regardless of where they come from? I don't think so. As discussed above, we should be able to remove support for the Writable format when Hadoop no longer supports 2.x apps. Yes, that's likely quite a long time, but it does not have to be forever. bq. Hadoop releases have broken rolling upgrade (and non-rolling upgrades, for that matter) in the middle of the 2.x stream before by removing things such as container execution types. We've completed rolling upgrades across all of our clusters for every minor release of 2.x since rolling upgrades were first supported in 2.6, so we must not have hit this landmine. Was this the removal of the dedicated Docker container executor in favor of the unified Linux executor that does everything? I'm attaching a patch that implements the "bridge release(s)" approach where the code supports reading the new format but will write the old format by default. Code can still request the new format explicitly if necessary. The main drawback is that we don't get to easily leverage the benefits of the new format since it's not the default format. However I'm hoping native services and other things that need the new protobuf format can leverage dtutil to translate the credentials format for easier consumption. > 3.0 deployment cannot work with old version MR tar ball which break rolling > upgrade > --- > > Key: HADOOP-15059 > URL: https://issues.apache.org/jira/browse/HADOOP-15059 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Junping Du >Assignee: Jason Lowe >Priority: Blocker > Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, > HADOOP-15059.003.patch, HADOOP-15059.004.patch > > > I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed > because following error: > {noformat} > 2017-11-21 12:42:50,911 INFO [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for > application appattempt_1511295641738_0003_01 > 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: > Unable to load native-hadoop library for your platform... using builtin-java > classes where applicable > 2017-11-21 12:42:51,118 FATAL [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster > java.lang.RuntimeException: Unable to determine current user > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212) > at > org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638) > Caused by: java.io.IOException: Exception reading > /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689) > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
[ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated HADOOP-15059: Attachment: HADOOP-15059.003.patch One last tweak to get rid of the last checkstyle warning, although it makes the new line stick out like a sore thumb compared to the others. I'd be happy to put the warnings back in just for the sake of code consistency if someone cares. Filed YARN-7576 for the existing findbugs warning in trunk. > 3.0 deployment cannot work with old version MR tar ball which break rolling > upgrade > --- > > Key: HADOOP-15059 > URL: https://issues.apache.org/jira/browse/HADOOP-15059 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Junping Du >Assignee: Jason Lowe >Priority: Blocker > Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, > HADOOP-15059.003.patch > > > I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed > because following error: > {noformat} > 2017-11-21 12:42:50,911 INFO [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for > application appattempt_1511295641738_0003_01 > 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: > Unable to load native-hadoop library for your platform... using builtin-java > classes where applicable > 2017-11-21 12:42:51,118 FATAL [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster > java.lang.RuntimeException: Unable to determine current user > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212) > at > org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638) > Caused by: java.io.IOException: Exception reading > /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689) > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252) > ... 4 more > Caused by: java.io.IOException: Unknown version 1 in token storage. > at > org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226) > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205) > ... 8 more > 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting > with status 1: java.lang.RuntimeException: Unable to determine current user > {noformat} > I think it is due to token incompatiblity change between 2.9 and 3.0. As we > claim "rolling upgrade" is supported in Hadoop 3, we should fix this before > we ship 3.0 otherwise all MR running applications will get stuck during/after > upgrade. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
[ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated HADOOP-15059: Attachment: HADOOP-15059.002.patch Heh, clearly the container-executor changes were broken. Attaching a new patch that fixes those along with changes to address the checkstyle issues. The findbug issue is pre-existing and unrelated, and the unit test failure also appears to be unrelated. It passes for me locally with the patch applied. I manually tested the container-executor changes on a single-node cluster configured to run with LinuxContainerExecutor. > 3.0 deployment cannot work with old version MR tar ball which break rolling > upgrade > --- > > Key: HADOOP-15059 > URL: https://issues.apache.org/jira/browse/HADOOP-15059 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Junping Du >Assignee: Jason Lowe >Priority: Blocker > Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch > > > I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed > because following error: > {noformat} > 2017-11-21 12:42:50,911 INFO [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for > application appattempt_1511295641738_0003_01 > 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: > Unable to load native-hadoop library for your platform... using builtin-java > classes where applicable > 2017-11-21 12:42:51,118 FATAL [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster > java.lang.RuntimeException: Unable to determine current user > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212) > at > org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638) > Caused by: java.io.IOException: Exception reading > /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689) > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252) > ... 4 more > Caused by: java.io.IOException: Unknown version 1 in token storage. > at > org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226) > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205) > ... 8 more > 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting > with status 1: java.lang.RuntimeException: Unable to determine current user > {noformat} > I think it is due to token incompatiblity change between 2.9 and 3.0. As we > claim "rolling upgrade" is supported in Hadoop 3, we should fix this before > we ship 3.0 otherwise all MR running applications will get stuck during/after > upgrade. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
[ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated HADOOP-15059: Attachment: HADOOP-15059.001.patch Attaching a patch that has the container launch process write two token files, the legacy token format in the existing container_tokens file and the new version 1 format in a new container_tokens-v1 file. I left the container localizer path alone since localizers are running the same code as the nodemanager and therefore can directly support the new v1 format as-is. The basic idea is to tack on the "-v1" suffix to the legacy token pathname to form the new v1 token pathname. The container launcher and container executors both do this, so the interface between them did not have to change. The legacy path is passed between them to indicate where both token files can be located (once the suffix is applied to form the new token path). It's definitely not the cleanest, but it was relatively simple to implement. I refactored some names in the container start context to make it more clear which path is being used. This needs a lot more testing, but I was able to run a sleep job on a simple security pseudo-distributed cluster and manually verified both container token files were being written and each was the proper format. I also manually forced the launcher to omit the new environment variable for the version 1 file, forcing the UGI to load the legacy token file, and that worked as well. I have not had a chance yet to test the rolling-upgrade-with-tarball scenario nor the native container-executor changes, but I thought it was far enough along to at least get some feedback. If others could take a look at the patch and/or take it for a test drive that would be great. > 3.0 deployment cannot work with old version MR tar ball which break rolling > upgrade > --- > > Key: HADOOP-15059 > URL: https://issues.apache.org/jira/browse/HADOOP-15059 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Junping Du >Assignee: Jason Lowe >Priority: Blocker > Attachments: HADOOP-15059.001.patch > > > I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed > because following error: > {noformat} > 2017-11-21 12:42:50,911 INFO [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for > application appattempt_1511295641738_0003_01 > 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: > Unable to load native-hadoop library for your platform... using builtin-java > classes where applicable > 2017-11-21 12:42:51,118 FATAL [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster > java.lang.RuntimeException: Unable to determine current user > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212) > at > org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638) > Caused by: java.io.IOException: Exception reading > /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689) > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252) > ... 4 more > Caused by: java.io.IOException: Unknown version 1 in token storage. > at > org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226) > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205) > ... 8 more > 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting > with status 1: java.lang.RuntimeException: Unable to determine current user > {noformat} > I think it is due to token incompatiblity change between 2.9 and 3.0. As we > claim "rolling upgrade" is supported in Hadoop 3, we should fix this before > we ship 3.0 otherwise all MR running applications will get stuck during/after > upgrade. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
[ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated HADOOP-15059: Status: Patch Available (was: Open) > 3.0 deployment cannot work with old version MR tar ball which break rolling > upgrade > --- > > Key: HADOOP-15059 > URL: https://issues.apache.org/jira/browse/HADOOP-15059 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Junping Du >Assignee: Jason Lowe >Priority: Blocker > Attachments: HADOOP-15059.001.patch > > > I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed > because following error: > {noformat} > 2017-11-21 12:42:50,911 INFO [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for > application appattempt_1511295641738_0003_01 > 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: > Unable to load native-hadoop library for your platform... using builtin-java > classes where applicable > 2017-11-21 12:42:51,118 FATAL [main] > org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster > java.lang.RuntimeException: Unable to determine current user > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220) > at > org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212) > at > org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638) > Caused by: java.io.IOException: Exception reading > /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208) > at > org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907) > at > org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820) > at > org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689) > at > org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252) > ... 4 more > Caused by: java.io.IOException: Unknown version 1 in token storage. > at > org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226) > at > org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205) > ... 8 more > 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting > with status 1: java.lang.RuntimeException: Unable to determine current user > {noformat} > I think it is due to token incompatiblity change between 2.9 and 3.0. As we > claim "rolling upgrade" is supported in Hadoop 3, we should fix this before > we ship 3.0 otherwise all MR running applications will get stuck during/after > upgrade. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org