[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade

2017-12-06 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15059:
-
Attachment: HADOOP-15059.006.patch

The patch looks good to me.

[~daryn], don't get scared by exception handling!

Uploading a new patch that is same as that of Jason's but with SerializedFormat 
made static.

The test case failures reported here are unrelated. Though it is surprising how 
badly the unit tests are broken - I'll debug them independently and file bugs.

If Jenkins says okay, I'll commit this unless [~daryn] / [~jlowe] say no.

> 3.0 deployment cannot work with old version MR tar ball which break rolling 
> upgrade
> ---
>
> Key: HADOOP-15059
> URL: https://issues.apache.org/jira/browse/HADOOP-15059
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Junping Du
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, 
> HADOOP-15059.003.patch, HADOOP-15059.004.patch, HADOOP-15059.005.patch, 
> HADOOP-15059.006.patch
>
>
> I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed 
> because following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_01
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212)
>   at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>   ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>   ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to token incompatiblity change between 2.9 and 3.0. As we 
> claim "rolling upgrade" is supported in Hadoop 3, we should fix this before 
> we ship 3.0 otherwise all MR running applications will get stuck during/after 
> upgrade.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade

2017-12-06 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15059:
-
Hadoop Flags: Reviewed
  Status: Patch Available  (was: Open)

> 3.0 deployment cannot work with old version MR tar ball which break rolling 
> upgrade
> ---
>
> Key: HADOOP-15059
> URL: https://issues.apache.org/jira/browse/HADOOP-15059
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Junping Du
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, 
> HADOOP-15059.003.patch, HADOOP-15059.004.patch, HADOOP-15059.005.patch, 
> HADOOP-15059.006.patch
>
>
> I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed 
> because following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_01
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212)
>   at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>   ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>   ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to token incompatiblity change between 2.9 and 3.0. As we 
> claim "rolling upgrade" is supported in Hadoop 3, we should fix this before 
> we ship 3.0 otherwise all MR running applications will get stuck during/after 
> upgrade.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade

2017-12-06 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated HADOOP-15059:
-
Status: Open  (was: Patch Available)

> 3.0 deployment cannot work with old version MR tar ball which break rolling 
> upgrade
> ---
>
> Key: HADOOP-15059
> URL: https://issues.apache.org/jira/browse/HADOOP-15059
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Junping Du
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, 
> HADOOP-15059.003.patch, HADOOP-15059.004.patch, HADOOP-15059.005.patch
>
>
> I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed 
> because following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_01
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212)
>   at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>   ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>   ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to token incompatiblity change between 2.9 and 3.0. As we 
> claim "rolling upgrade" is supported in Hadoop 3, we should fix this before 
> we ship 3.0 otherwise all MR running applications will get stuck during/after 
> upgrade.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade

2017-11-30 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HADOOP-15059:

Attachment: HADOOP-15059.005.patch

Thanks for the reviews, Daryn and Vinod!  I'm attaching a patch that moves the 
byte value to enum mapping into the enum itself.  I'm not convinced the 
refactor was worth it, but here it is for you to decide.

bq. If only YARN had a system whereby one could dynamically label a node based 
upon the current software stack. Then one could schedule around that and/or set 
up distributed cache content to match.

The RM tracks which version each NM registered with, so it knows the software 
stack being provided by that node.  Unfortunately YARN doesn't know what 
software stack the user's application expects, so it cannot fixup the 
distributed cache to match the expectation.  In addition the framework could be 
embedded in the user's app which cannot be fixed in the general case by 
tweaking the distributed cache.

> 3.0 deployment cannot work with old version MR tar ball which break rolling 
> upgrade
> ---
>
> Key: HADOOP-15059
> URL: https://issues.apache.org/jira/browse/HADOOP-15059
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Junping Du
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, 
> HADOOP-15059.003.patch, HADOOP-15059.004.patch, HADOOP-15059.005.patch
>
>
> I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed 
> because following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_01
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212)
>   at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>   ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>   ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to token incompatiblity change between 2.9 and 3.0. As we 
> claim "rolling upgrade" is supported in Hadoop 3, we should fix this before 
> we ship 3.0 otherwise all MR running applications will get stuck during/after 
> upgrade.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade

2017-11-29 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HADOOP-15059:

Attachment: HADOOP-15059.004.patch

Thanks for joining the conversation, Allen, and for pointing out the 
motivations behind the protobuf change.  Do you know of existing use cases that 
are relying on the new format?

I completely agree the new format is a great path forward for extensibility and 
portability, but unfortunately it breaks a number of existing use cases.

bq. Let's be clear: this is only a problem if one has a bundled 
hadoop-common.jar.

It's also important to point out that this is a rather common occurrence.  
Besides the typical habit of users running their *-with-dependencies.jar on the 
cluster, anyone leveraging the framework-on-HDFS approach will be bitten by 
this as soon as the nodemanager upgrades.  

Having frameworks deploy via HDFS rather than picking them up from the 
nodemanager's jars has proven to be a very useful way to better isolate apps 
during cluster rolling upgrades and support multiple versions of the framework 
on the cluster simultaneously.

bq. Is the end result of this JIRA going to be that all file formats are locked 
forever, regardless of where they come from?

I don't think so.  As discussed above, we should be able to remove support for 
the Writable format when Hadoop no longer supports 2.x apps.  Yes, that's 
likely quite a long time, but it does not have to be forever.

bq. Hadoop releases have broken rolling upgrade (and non-rolling upgrades, for 
that matter) in the middle of the 2.x stream before by removing things such as 
container execution types.

We've completed rolling upgrades across all of our clusters for every minor 
release of 2.x since rolling upgrades were first supported in 2.6, so we must 
not have hit this landmine.  Was this the removal of the dedicated Docker 
container executor in favor of the unified Linux executor that does everything?

I'm attaching a patch that implements the "bridge release(s)" approach where 
the code supports reading the new format but will write the old format by 
default.  Code can still request the new format explicitly if necessary.  The 
main drawback is that we don't get to easily leverage the benefits of the new 
format since it's not the default format.  However I'm hoping native services 
and other things that need the new protobuf format can leverage dtutil to 
translate the credentials format for easier consumption.

> 3.0 deployment cannot work with old version MR tar ball which break rolling 
> upgrade
> ---
>
> Key: HADOOP-15059
> URL: https://issues.apache.org/jira/browse/HADOOP-15059
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Junping Du
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, 
> HADOOP-15059.003.patch, HADOOP-15059.004.patch
>
>
> I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed 
> because following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_01
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212)
>   at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)

[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade

2017-11-28 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HADOOP-15059:

Attachment: HADOOP-15059.003.patch

One last tweak to get rid of the last checkstyle warning, although it makes the 
new line stick out like a sore thumb compared to the others.  I'd be happy to 
put the warnings back in just for the sake of code consistency if someone cares.

Filed YARN-7576 for the existing findbugs warning in trunk.


> 3.0 deployment cannot work with old version MR tar ball which break rolling 
> upgrade
> ---
>
> Key: HADOOP-15059
> URL: https://issues.apache.org/jira/browse/HADOOP-15059
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Junping Du
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, 
> HADOOP-15059.003.patch
>
>
> I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed 
> because following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_01
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212)
>   at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>   ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>   ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to token incompatiblity change between 2.9 and 3.0. As we 
> claim "rolling upgrade" is supported in Hadoop 3, we should fix this before 
> we ship 3.0 otherwise all MR running applications will get stuck during/after 
> upgrade.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade

2017-11-28 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HADOOP-15059:

Attachment: HADOOP-15059.002.patch

Heh, clearly the container-executor changes were broken.  Attaching a new patch 
that fixes those along with changes to address the checkstyle issues.  The 
findbug issue is pre-existing and unrelated, and the unit test failure also 
appears to be unrelated.  It passes for me locally with the patch applied.

I manually tested the container-executor changes on a single-node cluster 
configured to run with LinuxContainerExecutor.

> 3.0 deployment cannot work with old version MR tar ball which break rolling 
> upgrade
> ---
>
> Key: HADOOP-15059
> URL: https://issues.apache.org/jira/browse/HADOOP-15059
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Junping Du
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch
>
>
> I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed 
> because following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_01
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212)
>   at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>   ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>   ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to token incompatiblity change between 2.9 and 3.0. As we 
> claim "rolling upgrade" is supported in Hadoop 3, we should fix this before 
> we ship 3.0 otherwise all MR running applications will get stuck during/after 
> upgrade.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade

2017-11-27 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HADOOP-15059:

Attachment: HADOOP-15059.001.patch

Attaching a patch that has the container launch process write two token files, 
the legacy token format in the existing container_tokens file and the new 
version 1 format in a new container_tokens-v1 file.  I left the container 
localizer path alone since localizers are running the same code as the 
nodemanager and therefore can directly support the new v1 format as-is.

The basic idea is to tack on the "-v1" suffix to the legacy token pathname to 
form the new v1 token pathname.  The container launcher and container executors 
both do this, so the interface between them did not have to change.  The legacy 
path is passed between them to indicate where both token files can be located 
(once the suffix is applied to form the new token path).  It's definitely not 
the cleanest, but it was relatively simple to implement.  I refactored some 
names in the container start context to make it more clear which path is being 
used.

This needs a lot more testing, but I was able to run a sleep job on a simple 
security pseudo-distributed cluster and manually verified both container token 
files were being written and each was the proper format.  I also manually 
forced the launcher to omit the new environment variable for the version 1 
file, forcing the UGI to load the legacy token file, and that worked as well.  
I have not had a chance yet to test the rolling-upgrade-with-tarball scenario 
nor the native container-executor changes, but I thought it was far enough 
along to at least get some feedback.

If others could take a look at the patch and/or take it for a test drive that 
would be great.


> 3.0 deployment cannot work with old version MR tar ball which break rolling 
> upgrade
> ---
>
> Key: HADOOP-15059
> URL: https://issues.apache.org/jira/browse/HADOOP-15059
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Junping Du
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: HADOOP-15059.001.patch
>
>
> I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed 
> because following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_01
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212)
>   at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>   ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>   ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to token incompatiblity change between 2.9 and 3.0. As we 
> claim "rolling upgrade" is supported in Hadoop 3, we should fix this before 
> we ship 3.0 otherwise all MR running applications will get stuck during/after 
> upgrade.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade

2017-11-27 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HADOOP-15059:

Status: Patch Available  (was: Open)

> 3.0 deployment cannot work with old version MR tar ball which break rolling 
> upgrade
> ---
>
> Key: HADOOP-15059
> URL: https://issues.apache.org/jira/browse/HADOOP-15059
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Junping Du
>Assignee: Jason Lowe
>Priority: Blocker
> Attachments: HADOOP-15059.001.patch
>
>
> I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed 
> because following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for 
> application appattempt_1511295641738_0003_01
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: 
> Unable to load native-hadoop library for your platform... using builtin-java 
> classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:220)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.(Configuration.java:212)
>   at 
> org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading 
> /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_01/container_tokens
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
>   at 
> org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
>   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
>   at 
> org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
>   ... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
>   at 
> org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
>   ... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting 
> with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to token incompatiblity change between 2.9 and 3.0. As we 
> claim "rolling upgrade" is supported in Hadoop 3, we should fix this before 
> we ship 3.0 otherwise all MR running applications will get stuck during/after 
> upgrade.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org