[jira] [Updated] (MAPREDUCE-7059) Downward Compatibility issue: MR job fails because of unknown setErasureCodingPolicy method from 3.x client to HDFS 2.x cluster
[ https://issues.apache.org/jira/browse/MAPREDUCE-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei (Eddy) Xu updated MAPREDUCE-7059:
    Fix Version/s: (was: 3.0.2)
                   3.0.3

> Downward Compatibility issue: MR job fails because of unknown setErasureCodingPolicy method from 3.x client to HDFS 2.x cluster
> ---
>
>                 Key: MAPREDUCE-7059
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7059
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: job submission
>    Affects Versions: 3.0.0
>            Reporter: Jiandan Yang
>            Assignee: Jiandan Yang
>            Priority: Critical
>             Fix For: 3.1.0, 3.0.3
>
>         Attachments: MAPREDUCE-7059.001.patch, MAPREDUCE-7059.002.patch, MAPREDUCE-7059.003.patch, MAPREDUCE-7059.004.patch, MAPREDUCE-7059.005.patch, MAPREDUCE-7059.006.patch
>
>
> Running teragen with a hadoop-3.1 client fails when the HDFS server is 2.8.
> {code:java}
> bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0-SNAPSHOT.jar teragen 10 /teragen
> {code}
> The failure occurs because HDFS 2.8 does not have setErasureCodingPolicy.
> One solution is to parse the RemoteException in JobResourceUploader#disableErasureCodingForPath, like this:
> {code:java}
> private void disableErasureCodingForPath(FileSystem fs, Path path)
>     throws IOException {
>   try {
>     // Only HDFS supports erasure coding policies.
>     if (jtFs instanceof DistributedFileSystem) {
>       LOG.info("Disabling Erasure Coding for path: " + path);
>       DistributedFileSystem dfs = (DistributedFileSystem) jtFs;
>       dfs.setErasureCodingPolicy(path,
>           SystemErasureCodingPolicies.getReplicationPolicy().getName());
>     }
>   } catch (RemoteException e) {
>     // Rethrow anything other than "the server does not know this RPC method".
>     if (!e.getClassName().equals(RpcNoSuchMethodException.class.getName())) {
>       throw e;
>     } else {
>       // An older (2.x) NameNode has no setErasureCodingPolicy; skip the call.
>       LOG.warn(
>           "hdfs server does not have method setErasureCodingPolicy,"
>               + " and skip disableErasureCodingForPath", e);
>     }
>   }
> }
> {code}
> Does anyone have a better solution?
> The detailed exception trace is:
> {code:java}
> 2018-02-26 11:22:53,178 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1518615699369_0006
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RpcNoSuchMethodException): Unknown method setErasureCodingPolicy called on org.apache.hadoop.hdfs.protocol.ClientProtocol protocol.
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:436)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:846)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:789)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1804)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2457)
>     at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1491)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>     at com.sun.proxy.$Proxy11.setErasureCodingPolicy(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setErasureCodingPolicy(ClientNamenodeProtocolTranslatorPB.java:1583)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>     at com.sun.proxy.$Proxy12.setErasureCodingPolicy(Unknown Source)
>     at
[jira] [Commented] (MAPREDUCE-7069) Add ability to specify user environment variables individually
[ https://issues.apache.org/jira/browse/MAPREDUCE-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425908#comment-16425908 ]

Jim Brennan commented on MAPREDUCE-7069:

[~jlowe] thanks for the thorough review. Much appreciated!

{quote}It's a bit odd and inefficient that setVMEnv calls MRApps.setEnvFromInputProperty twice. I think it would be clearer and more efficient to call it once, place the results in a temporary map (like it already does in the second call), then only set HADOOP_ROOT_LOGGER and HADOOP_CLIENT_OPTS in the environment if they are not set in the temporary map. Then at the end we can simply call addAll to dump the contents of the temporary map into the environment map.
{quote}
It was done with two calls because of the way environment variables are handled when they are already defined in the environment map. If an environment variable that you are updating already exists in the environment, the setEnvFromInput* functions append the new value to the existing value, using the appropriate separator. The special handling for HADOOP_ROOT_LOGGER and HADOOP_CLIENT_OPTS is there to overwrite them instead of appending.

That said, I can definitely change it to work the way you suggest, except that I can't just use addAll(): ultimately, Apps.addToEnvironment has to be called on each k/v pair. I could expose an Apps.setEnvFromInputStringMapNoExpand() (or add a noExpand boolean to the existing one) to handle this, though.

Thanks for the documentation/comment recommendations. I was going to ask about that. I'll clean those up.

{quote}Nit: setEnvFromInputStringMap does not need to be public.
{quote}
Will fix. In an earlier iteration I was calling this directly.

{quote}Would it be easier to call tmpEnv.addAll(inputMap) and pass tmpEnv instead of inputMap? Then we don't need to explicitly iterate the map.
{quote}
Yes. I will make this change.
> Add ability to specify user environment variables individually
> --
>
>                 Key: MAPREDUCE-7069
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7069
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Jim Brennan
>            Assignee: Jim Brennan
>            Priority: Major
>         Attachments: MAPREDUCE-7069.001.patch, MAPREDUCE-7069.002.patch
>
>
> As reported in YARN-6830, it is currently not possible to specify an environment variable that contains commas via {{mapreduce.map.env}}, mapreduce.reduce.env, or {{mapreduce.admin.user.env}}.
> To address this, [~aw] proposed in [YARN-6830] that we add the ability to specify environment variables individually:
> {quote}e.g, mapreduce.map.env.[foo]=bar gets turned into foo=bar
> {quote}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
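The append-versus-overwrite distinction described in the comment above can be illustrated with a minimal, standalone Java sketch. It does not use the Hadoop classes: `EnvMergeSketch`, `appendToEnv`, and `overwriteEnv` are hypothetical names, and the `:` separator and sample values are assumptions chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: this is not the Hadoop Apps/MRApps implementation.
final class EnvMergeSketch {

    // Hypothetical helper: if the key already exists, append the new value
    // after the separator; otherwise create the entry.
    static void appendToEnv(Map<String, String> env, String key,
                            String value, String separator) {
        env.merge(key, value, (oldValue, newValue) -> oldValue + separator + newValue);
    }

    // Hypothetical helper: replace whatever value was there before.
    static void overwriteEnv(Map<String, String> env, String key, String value) {
        env.put(key, value);
    }

    public static void main(String[] args) {
        Map<String, String> env = new LinkedHashMap<>();
        env.put("LD_LIBRARY_PATH", "/opt/a");       // assumed pre-existing value
        env.put("HADOOP_CLIENT_OPTS", "-Xmx512m");  // assumed pre-existing value

        // Ordinary variables accumulate values.
        appendToEnv(env, "LD_LIBRARY_PATH", "/opt/b", ":");
        // Variables with special handling are replaced rather than appended.
        overwriteEnv(env, "HADOOP_CLIENT_OPTS", "-Xmx1g");

        System.out.println(env);
    }
}
```

A plain bulk copy into the environment map would behave like `overwriteEnv` for every key, which is why the comment notes that each pair still has to go through the appending helper.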
[jira] [Commented] (MAPREDUCE-7069) Add ability to specify user environment variables individually
[ https://issues.apache.org/jira/browse/MAPREDUCE-7069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425770#comment-16425770 ]

Jason Lowe commented on MAPREDUCE-7069:

Thanks for the patch!

It's a bit odd and inefficient that setVMEnv calls MRApps.setEnvFromInputProperty twice. I think it would be clearer and more efficient to call it once, place the results in a temporary map (like it already does in the second call), then only set HADOOP_ROOT_LOGGER and HADOOP_CLIENT_OPTS in the environment if they are not set in the temporary map. Then at the end we can simply call addAll to dump the contents of the temporary map into the environment map.

The example documentation in JobConf is confusing. It uses "MAPRED_MAP_TASK_ENV" and "MAPRED_REDUCE_TASK_ENV" but those literal strings should not be used in the property name. It would be clearer if this used "mapreduce.map.env" and "mapreduce.reduce.env" in the examples. Either that or give the example in the Java realm with something like set(MAPRED_MAP_TASK_ENV + ".varName", varValue) so it's clearly not a literal string in the property name. My preference is the former.

The relevant property descriptions in mapred-default.xml should be updated to reflect the new functionality.

It would be good to update MapReduceTutorial.md to document the options for passing environment variables to tasks.

There are a number of comments in setEnvFromString that should be fixed up. I realize this is mostly cut-n-paste from the old setEnvFromInputString, but since we're refactoring it would be nice to clean it up a bit in the process. There's no such thing as a tt (tasktracker) in YARN, and the comments imply this is only called to set up the env by a nodemanager for a child process. That's not always the case. "note" s/b "not", etc.

For javadoc comments it's not necessary to state the type of the variable after the variable name. Javadoc can automatically extract this from the method signature.
Nit: setEnvFromInputStringMap does not need to be public.

Would it be easier to call tmpEnv.addAll(inputMap) and pass tmpEnv instead of inputMap? Then we don't need to explicitly iterate the map.

The unit test should add new properties with commas and/or equal signs in the value and verify the values come through in the environment map.

Does it make sense to split some of the unit test up into separate tests? For example, the null input test can easily stand by itself. Separate tests make it easier to identify what's working and what's broken, rather than a stacktrace with a line number in the middle of a large unit test that is testing many different aspects.
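To make the commas-and-equal-signs case concrete, here is a small standalone Java sketch of the idea behind per-variable properties. It is a simplified model, not the patch under review: `PerVariableEnvSketch`, `fromCommaList`, and `fromIndividualProps` are hypothetical names, and the comma-list parser is deliberately naive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a simplified model, not the MRApps/Apps code.
final class PerVariableEnvSketch {

    // Naive parser for a "K1=V1,K2=V2" list. A comma inside a value is
    // indistinguishable from a pair separator, so the value gets cut short.
    static Map<String, String> fromCommaList(String list) {
        Map<String, String> env = new LinkedHashMap<>();
        if (list == null || list.isEmpty()) {
            return env;
        }
        for (String pair : list.split(",")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                env.put(pair.substring(0, eq).trim(), pair.substring(eq + 1));
            }
        }
        return env;
    }

    // Per-variable form: a property named "<base>.<VAR>" carries the whole
    // value of VAR, so commas and equal signs need no special treatment.
    static Map<String, String> fromIndividualProps(Map<String, String> conf,
                                                   String baseProperty) {
        String prefix = baseProperty + ".";
        Map<String, String> env = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            String name = e.getKey();
            if (name.startsWith(prefix) && name.length() > prefix.length()) {
                env.put(name.substring(prefix.length()), e.getValue());
            }
        }
        return env;
    }

    public static void main(String[] args) {
        // The comma list loses everything after the first comma in the value.
        System.out.println(fromCommaList("FOO=a,b"));

        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("mapreduce.map.env.FOO", "a,b=c");
        // The per-variable property keeps the value intact.
        System.out.println(fromIndividualProps(conf, "mapreduce.map.env"));
    }
}
```

A unit test along the lines suggested above would assert that a value such as `a,b=c` arrives in the environment map unchanged when supplied through the per-variable property.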
[jira] [Commented] (MAPREDUCE-7072) mapred job -history prints duplicate counter in human output
[ https://issues.apache.org/jira/browse/MAPREDUCE-7072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425564#comment-16425564 ]

Wilfred Spiegelenburg commented on MAPREDUCE-7072:

The root cause of the issue is located in the {{AbstractCounters}} code, in {{getGroupNames()}}.
When you track through the code in the debugger, the number of counter groups returned is higher than expected. This is because we add the deprecated counter names to the list of counter group names before we return. The display names of the counters tracked in the deprecated list, which is stored in the legacyMap, are the same as the display names of the non-deprecated counters. The deprecated counters that are added are already in the non-deprecated list, which causes the duplication.

It works in the JSON format because that format internally uses a HashMap. The HashMap uses the name of the counter groups as the key. The keys clash, so we overwrite the existing value with the value from the deprecated counter group.

To track where this issue is coming from: MAPREDUCE-4053 changed the iteration to work for oozie and seems related to OOZIE-777 and the HadoopELFunctions, which still seem to use the deprecated counter name. Changing what the method returns is thus not possible without breaking oozie.

We can instead use the iterator returned by the abstract counters, as it does not include the deprecated names.

> mapred job -history prints duplicate counter in human output
> 
>
>                 Key: MAPREDUCE-7072
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7072
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 3.0.0
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>            Priority: Major
>
> 'mapred job -history' command prints duplicate entries for counters only for the human output format. It does not do this for the JSON format.
> mapred job -history /user/history/somefile.jhist -format human
> {code}
> 
> |Job Counters |Total megabyte-seconds taken by all map tasks|0 |0 |268,288,000
> ...
> |Job Counters |Total megabyte-seconds taken by all map tasks|0 |0 |268,288,000
> 
> {code}
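The difference between the two output formats can be reproduced in miniature with plain Java collections. The sketch below is only a model of the behaviour described in the comment, not the Hadoop code: `CounterGroupDuplicationSketch`, `humanRows`, `jsonView`, and the group names `new.JobCounter` and `legacy.JobCounter` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: models the described behaviour with plain collections.
final class CounterGroupDuplicationSketch {

    // "Human" style output: one row per group name. If a deprecated name and
    // its replacement share a display name, the row appears twice.
    static List<String> humanRows(List<String> groupNames,
                                  Map<String, String> displayNames) {
        List<String> rows = new ArrayList<>();
        for (String name : groupNames) {
            rows.add(displayNames.get(name));
        }
        return rows;
    }

    // "JSON" style output: keyed by display name, so the second entry with
    // the same key silently overwrites the first.
    static Map<String, String> jsonView(List<String> groupNames,
                                        Map<String, String> displayNames) {
        Map<String, String> byDisplayName = new HashMap<>();
        for (String name : groupNames) {
            byDisplayName.put(displayNames.get(name), name);
        }
        return byDisplayName;
    }

    public static void main(String[] args) {
        // Hypothetical group names: a current one followed by its legacy alias.
        List<String> groupNames = Arrays.asList("new.JobCounter", "legacy.JobCounter");
        Map<String, String> displayNames = new HashMap<>();
        displayNames.put("new.JobCounter", "Job Counters");
        displayNames.put("legacy.JobCounter", "Job Counters");

        System.out.println(humanRows(groupNames, displayNames).size()); // two rows
        System.out.println(jsonView(groupNames, displayNames).size());  // one entry
    }
}
```

Iterating over a list that omits the legacy aliases, as proposed in the comment, would give the row-based output the same single entry without changing what the name-listing method returns.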
[jira] [Created] (MAPREDUCE-7072) mapred job -history prints duplicate counter in human output
Wilfred Spiegelenburg created MAPREDUCE-7072:

             Summary: mapred job -history prints duplicate counter in human output
                 Key: MAPREDUCE-7072
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7072
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: client
    Affects Versions: 3.0.0
            Reporter: Wilfred Spiegelenburg
            Assignee: Wilfred Spiegelenburg

'mapred job -history' command prints duplicate entries for counters only for the human output format. It does not do this for the JSON format.

mapred job -history /user/history/somefile.jhist -format human
{code}

|Job Counters |Total megabyte-seconds taken by all map tasks|0 |0 |268,288,000
...
|Job Counters |Total megabyte-seconds taken by all map tasks|0 |0 |268,288,000

{code}