[jira] [Updated] (MAPREDUCE-3283) mapred classpath CLI does not display the complete classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated MAPREDUCE-3283:
    Resolution: Fixed
    Fix Version/s: 2.7.0
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

+1 for the patch. This time, mapred.cmd worked correctly in my tests. I committed it to trunk and branch-2. Thank you for the contribution, Varun.

mapred classpath CLI does not display the complete classpath

Key: MAPREDUCE-3283
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3283
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: scripts
Affects Versions: 0.23.0, 2.6.0
Reporter: Ramya Sunil
Assignee: Varun Saxena
Priority: Minor
Labels: newbie
Fix For: 2.7.0
Attachments: MAPREDUCE-3283-branch-2.001.patch, MAPREDUCE-3283-branch-2.patch, MAPREDUCE-3283.002.patch, MAPREDUCE-3283.003.patch, MAPREDUCE-3283.004.patch, MAPREDUCE-3283.005.patch

bin/yarn classpath does not display the complete classpath. Below is how the classpath looks like:
{noformat}
$HADOOP_CONF_DIR:$HADOOP_CONF_DIR::$TOOLS_JAR:$HADOOP_COMMON_HOME/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_HDFS_HOME/*:$HADOOP_HDFS_HOME/lib/*:
$HADOOP_MAPRED_HOME/bin/../modules/*:$HADOOP_MAPRED_HOME/bin/../lib/*
{noformat}
* has to be substituted with the actual jars.
Also, $HADOOP_CONF_DIR appears twice in the classpath

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
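The two complaints in the MAPREDUCE-3283 report (unexpanded `*` entries and a duplicated `$HADOOP_CONF_DIR`) can be illustrated with a small sketch. This is not the committed patch, which changed the `mapred` and `mapred.cmd` scripts; the function name `expand_classpath` is hypothetical, and the sketch assumes environment variables have already been expanded by the shell and that a trailing `/*` means "every .jar directly inside that directory".

```python
import glob
import os


def expand_classpath(classpath, sep=":"):
    """Expand `dir/*` entries to the jars they match and drop duplicates.

    Hypothetical helper for illustration only; order of first appearance
    is preserved so that earlier entries keep their precedence.
    """
    seen = set()
    result = []
    for entry in classpath.split(sep):
        if not entry:
            # Skip empty entries such as the one produced by "::".
            continue
        if entry.endswith("/*"):
            # Assumed wildcard meaning: every .jar directly inside the directory.
            candidates = sorted(glob.glob(os.path.join(entry[:-2], "*.jar")))
        else:
            candidates = [entry]
        for path in candidates:
            if path not in seen:
                seen.add(path)
                result.append(path)
    return sep.join(result)
```

Keeping first-appearance order matters here because class loading resolves duplicates by position, so the deduplication should never move an entry later in the list.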
[jira] [Commented] (MAPREDUCE-6057) Remove obsolete entries from mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286253#comment-14286253 ]

Ray Chiang commented on MAPREDUCE-6057:

RE: findbugs
No code changes in the files mentioned.

Remove obsolete entries from mapred-default.xml

Key: MAPREDUCE-6057
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6057
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
Labels: newbie
Attachments: MAPREDUCE-6057-01.patch, MAPREDUCE-6057-02.patch, MAPREDUCE-6057-03.patch, MAPREDUCE-6057.004.patch

The following properties are defined in mapred-default.xml but no longer exist in MRJobConfig.
  map.sort.class
  mapred.child.env
  mapred.child.java.opts
  mapreduce.app-submission.cross-platform
  mapreduce.client.completion.pollinterval
  mapreduce.client.output.filter
  mapreduce.client.progressmonitor.pollinterval
  mapreduce.client.submit.file.replication
  mapreduce.cluster.acls.enabled
  mapreduce.cluster.local.dir
  mapreduce.framework.name
  mapreduce.ifile.readahead
  mapreduce.ifile.readahead.bytes
  mapreduce.input.fileinputformat.list-status.num-threads
  mapreduce.input.fileinputformat.split.minsize
  mapreduce.input.lineinputformat.linespermap
  mapreduce.job.counters.limit
  mapreduce.job.max.split.locations
  mapreduce.job.reduce.shuffle.consumer.plugin.class
  mapreduce.jobhistory.address
  mapreduce.jobhistory.admin.acl
  mapreduce.jobhistory.admin.address
  mapreduce.jobhistory.cleaner.enable
  mapreduce.jobhistory.cleaner.interval-ms
  mapreduce.jobhistory.client.thread-count
  mapreduce.jobhistory.datestring.cache.size
  mapreduce.jobhistory.done-dir
  mapreduce.jobhistory.http.policy
  mapreduce.jobhistory.intermediate-done-dir
  mapreduce.jobhistory.joblist.cache.size
  mapreduce.jobhistory.keytab
  mapreduce.jobhistory.loadedjobs.cache.size
  mapreduce.jobhistory.max-age-ms
  mapreduce.jobhistory.minicluster.fixed.ports
  mapreduce.jobhistory.move.interval-ms
  mapreduce.jobhistory.move.thread-count
  mapreduce.jobhistory.principal
  mapreduce.jobhistory.recovery.enable
  mapreduce.jobhistory.recovery.store.class
  mapreduce.jobhistory.recovery.store.fs.uri
  mapreduce.jobhistory.store.class
  mapreduce.jobhistory.webapp.address
  mapreduce.local.clientfactory.class.name
  mapreduce.map.skip.proc.count.autoincr
  mapreduce.output.fileoutputformat.compress
  mapreduce.output.fileoutputformat.compress.codec
  mapreduce.output.fileoutputformat.compress.type
  mapreduce.reduce.skip.proc.count.autoincr
  mapreduce.shuffle.connection-keep-alive.enable
  mapreduce.shuffle.connection-keep-alive.timeout
  mapreduce.shuffle.max.connections
  mapreduce.shuffle.max.threads
  mapreduce.shuffle.port
  mapreduce.shuffle.ssl.enabled
  mapreduce.shuffle.ssl.file.buffer.size
  mapreduce.shuffle.transfer.buffer.size
  mapreduce.shuffle.transferTo.allowed
  yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts

Submitting bug for comment/feedback about which properties should be kept in mapred-default.xml.
[jira] [Commented] (MAPREDUCE-3283) mapred classpath CLI does not display the complete classpath
[ https://issues.apache.org/jira/browse/MAPREDUCE-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286384#comment-14286384 ]

Hudson commented on MAPREDUCE-3283:

FAILURE: Integrated in Hadoop-trunk-Commit #6907 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6907/])
MAPREDUCE-3283. mapred classpath CLI does not display the complete classpath. Contributed by Varun Saxena. (cnauroth: rev 0742591335f15d2f8916555704c2db6124107618)
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/bin/mapred
* hadoop-mapreduce-project/bin/mapred.cmd
[jira] [Commented] (MAPREDUCE-6192) Create unit test to automatically compare MR related classes and mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286276#comment-14286276 ]

Ray Chiang commented on MAPREDUCE-6192:

RE: Failing unit tests
Both tests pass in my tree.

Create unit test to automatically compare MR related classes and mapred-default.xml

Key: MAPREDUCE-6192
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6192
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
Labels: supportability
Attachments: MAPREDUCE-6192.001.patch, MAPREDUCE-6192.002.patch

Create a unit test that will automatically compare the fields in the various MapReduce related classes and mapred-default.xml. It should throw an error if a property is missing in either the class or the file.
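The comparison that MAPREDUCE-6192 asks for is a two-way set difference between the property names in the XML file and the keys declared in the classes. The following is a minimal sketch of that idea only, not the attached patch (which presumably inspects the Java classes directly); `property_names` and `compare_properties` are hypothetical names, and the class-side keys are assumed to have been collected already.

```python
import xml.etree.ElementTree as ET


def property_names(xml_text):
    """Return the set of <name> values found under <property> elements."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name").strip()
        for prop in root.iter("property")
        if prop.findtext("name")  # skips missing or empty <name> elements
    }


def compare_properties(xml_text, class_keys):
    """Report keys that appear on only one side (hypothetical helper)."""
    xml_keys = property_names(xml_text)
    class_keys = set(class_keys)
    return {
        "only_in_xml": sorted(xml_keys - class_keys),
        "only_in_class": sorted(class_keys - xml_keys),
    }
```

A test built on this would fail whenever either list is non-empty, which matches the requirement that a property missing from either the class or the file is an error.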
[jira] [Commented] (MAPREDUCE-5785) Derive heap size or mapreduce.*.memory.mb automatically
[ https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286866#comment-14286866 ]

Hudson commented on MAPREDUCE-5785:

SUCCESS: Integrated in Hadoop-trunk-Commit #6910 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6910/])
MAPREDUCE-5785. Derive heap size or mapreduce.*.memory.mb automatically. (Gera Shegalov and Karthik Kambatla via gera) (gera: rev a003f71cacd35834a1abbc2ffb5446a1166caf73)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestMapReduceChildJVM.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/JobConf.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapReduceChildJVM.java

Derive heap size or mapreduce.*.memory.mb automatically

Key: MAPREDUCE-5785
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5785
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: mr-am, task
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Fix For: 3.0.0
Attachments: MAPREDUCE-5785.v01.patch, MAPREDUCE-5785.v02.patch, MAPREDUCE-5785.v03.patch, mr-5785-4.patch, mr-5785-5.patch, mr-5785-6.patch, mr-5785-7.patch, mr-5785-8.patch, mr-5785-9.patch

Currently users have to set 2 memory-related configs per Job / per task type. One first chooses some container size mapreduce.*.memory.mb and then a corresponding maximum Java heap size Xmx < mapreduce.*.memory.mb. This makes sure that the JVM's C-heap (native memory + Java heap) does not exceed this mapreduce.*.memory.mb. If one forgets to tune Xmx, MR-AM might be
- allocating big containers whereas the JVM will only use the default -Xmx200m.
- allocating small containers that will OOM because Xmx is too high.
With this JIRA, we propose to set Xmx automatically based on an empirical ratio that can be adjusted. Xmx is not changed automatically if provided by the user.
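The rule described in MAPREDUCE-5785 (derive Xmx from the container size by a ratio, but leave a user-supplied Xmx alone) can be sketched as follows. This is an illustration of the stated behavior, not the committed Java code: the function name `derive_java_opts` is hypothetical, and the default `heap_ratio=0.8` is a placeholder for the "empirical ratio" the description mentions, not a value taken from the patch.

```python
import math
import re

# Matches an explicit -Xmx setting such as -Xmx200m or -Xmx2g.
_XMX = re.compile(r"(^|\s)-Xmx\d+[kKmMgG]?(\s|$)")


def derive_java_opts(java_opts, container_mb, heap_ratio=0.8):
    """Append -Xmx derived from the container size unless one is already set.

    heap_ratio is an assumed placeholder; rounding down keeps the heap
    strictly inside the container so native memory still has headroom.
    """
    if _XMX.search(java_opts):
        # Xmx provided by the user: do not change it automatically.
        return java_opts
    heap_mb = math.floor(container_mb * heap_ratio)
    return f"{java_opts} -Xmx{heap_mb}m".strip()
```

For example, `derive_java_opts("-Dfoo=bar", 2048, 0.5)` yields `-Dfoo=bar -Xmx1024m`, while any options string that already carries `-Xmx512m` is returned unchanged.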
[jira] [Updated] (MAPREDUCE-5785) Derive heap size or mapreduce.*.memory.mb automatically
[ https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gera Shegalov updated MAPREDUCE-5785:
    Resolution: Fixed
    Assignee: Gera Shegalov (was: Karthik Kambatla)
    Status: Resolved (was: Patch Available)

Committed to trunk. Thanks [~kasha] for collaborating on this patch!
[jira] [Commented] (MAPREDUCE-6209) Implement a heuristic to auto-size Java heap of MRAppMaster container proportional to the job size
[ https://issues.apache.org/jira/browse/MAPREDUCE-6209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285503#comment-14285503 ]

Tsuyoshi OZAWA commented on MAPREDUCE-6209:

Karthik and Gera, thanks for your explanation. I got it. Do you have good heuristics to decide the base value for making the heap size proportional?

Implement a heuristic to auto-size Java heap of MRAppMaster container proportional to the job size

Key: MAPREDUCE-6209
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6209
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: applicationmaster
Reporter: Gera Shegalov

The size of the Java heap required by the AM is linearly proportional to the size of the MR job (number of mappers/splits and reducers). It would be nice if users did not have to adjust the AM container size when transitioning from testing a job on a small sample to a production job on a full-scale dataset.
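Since MAPREDUCE-6209 states that the AM heap grows linearly with the number of tasks, one possible shape for such a heuristic is a fixed base plus a per-task increment, capped at some maximum. The sketch below only shows that shape: `am_heap_mb` is a hypothetical name, and `base_mb`, `kb_per_task` and `max_mb` are made-up placeholders, because the thread itself leaves the choice of base value as an open question.

```python
def am_heap_mb(num_maps, num_reduces, base_mb=512, kb_per_task=50, max_mb=8192):
    """Linear AM heap estimate in MB: base + per-task cost, capped at max_mb.

    All three defaults are illustrative placeholders, not measured values.
    """
    tasks = num_maps + num_reduces
    # Integer ceiling division: round the per-task total up to whole MB.
    per_task_mb = -(-(tasks * kb_per_task) // 1024)
    return min(max_mb, base_mb + per_task_mb)
```

With these placeholder constants, a 12-task test job and a 101,000-task production job get different estimates from the same configuration, which is the transition the issue wants to make automatic.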
[jira] [Commented] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files
[ https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285390#comment-14285390 ]

Gera Shegalov commented on MAPREDUCE-4815:

Thanks for the latest patch, [~l201514]! Some comments/questions:

1. We are changing the behavior and not the API, so we can have a property {{mapreduce.fileoutputcommitter.algorithm.version}}:
   1: the old behavior. This should be the default unless we have solved the upgrade in an efficient bullet-proof manner.
   2: the new proposed design.

Why is the flag for the new behavior not initialized when {{FileOutputCommitter#FileOutputCommitter(Path, TaskAttemptContext)}} is used?

There is only a minor difference between {{runOldCommitJob}} and {{runNewCommitJob}}: the lengthy copy iterator is skipped. Therefore, there is no need to duplicate code. Enclose this copy loop in some {{if (version == 1)}}. I think it’s sufficient to have such checks for {{commit/recoverTask}} as well.

The code under the comment
{code}
//for backwards compatibility after upgrade to the new fileOutputCommitter,
//check if there are any output left in committedTaskPath
{code}
seems misplaced and should actually be under {{runNewRecoverTask}}. This scenario will need a test. Equally, the existing tests should be run under both the new and the old logic.

FileOutputCommitter.commitJob can be very slow for jobs with many output files

Key: MAPREDUCE-4815
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
Attachments: MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch, MAPREDUCE-4815.v7.patch, MAPREDUCE-4815.v8.patch, MAPREDUCE-4815.v9.patch

If a job generates many files to commit then the commitJob method call at the end of the job can take minutes. This is a performance regression from 1.x, as 1.x had the tasks commit directly to the final output directory as they were completing and commitJob had very little to do. The commit work was processed in parallel and overlapped the processing of outstanding tasks. In 0.23/2.x, the commit is single-threaded and waits until all tasks have completed before commencing.
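The review comment on MAPREDUCE-4815 suggests a single code path in which the lengthy copy loop runs only under `if (version == 1)`. A toy model on the local filesystem can show what that means. It is not the FileOutputCommitter code: the function names are hypothetical, and it assumes (from the issue description of 1.x behavior) that version 2 lets each task move its output straight into the final directory, while version 1 parks it in a committed-task directory until job commit.

```python
import os
import shutil


def _merge(src_dir, dst_dir):
    """Move every entry of src_dir into dst_dir, creating dst_dir if needed."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        shutil.move(os.path.join(src_dir, name), os.path.join(dst_dir, name))


def commit_task(version, attempt_dir, committed_dir, output_dir, task_id):
    """v1: park task output under committed_dir; v2: move it to output_dir now."""
    if version == 1:
        os.makedirs(committed_dir, exist_ok=True)
        shutil.move(attempt_dir, os.path.join(committed_dir, task_id))
    else:
        _merge(attempt_dir, output_dir)


def commit_job(version, committed_dir, output_dir):
    """Shared job commit; only v1 still has per-task output left to move."""
    if version == 1:
        # The single-threaded loop the issue reports as slow for many files.
        for task_id in sorted(os.listdir(committed_dir)):
            _merge(os.path.join(committed_dir, task_id), output_dir)
```

In this model the version 2 work is spread across tasks as they finish, so `commit_job` has nothing left to move, which mirrors the reviewer's point that the two commit paths differ only by the skipped loop.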