[jira] [Updated] (MAPREDUCE-3283) mapred classpath CLI does not display the complete classpath

2015-01-21 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated MAPREDUCE-3283:
-
       Resolution: Fixed
    Fix Version/s: 2.7.0
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

+1 for the patch.  This time, mapred.cmd worked correctly in my tests.  I 
committed it to trunk and branch-2.  Thank you for the contribution, Varun.

 mapred classpath CLI does not display the complete classpath
 

 Key: MAPREDUCE-3283
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3283
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.23.0, 2.6.0
Reporter: Ramya Sunil
Assignee: Varun Saxena
Priority: Minor
  Labels: newbie
 Fix For: 2.7.0

 Attachments: MAPREDUCE-3283-branch-2.001.patch, 
 MAPREDUCE-3283-branch-2.patch, MAPREDUCE-3283.002.patch, 
 MAPREDUCE-3283.003.patch, MAPREDUCE-3283.004.patch, MAPREDUCE-3283.005.patch


 bin/yarn classpath does not display the complete classpath. Below is what the
 classpath looks like:
 {noformat}
 $HADOOP_CONF_DIR:$HADOOP_CONF_DIR::$TOOLS_JAR:$HADOOP_COMMON_HOME/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_HDFS_HOME/*:$HADOOP_HDFS_HOME/lib/*:
 $HADOOP_MAPRED_HOME/bin/../modules/*:$HADOOP_MAPRED_HOME/bin/../lib/*
 {noformat}
 Each * wildcard has to be replaced with the actual jars. Also, $HADOOP_CONF_DIR
 appears twice in the classpath.
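
 The two requests in the description (expand each * to the actual jars, and
 list each entry only once) can be illustrated with a small Python sketch. It
 is not the committed fix, which changed the bin/mapred scripts; the function
 name expand_classpath is hypothetical. Dropping empty entries such as the one
 produced by "::" is a choice made in this sketch, not something the issue
 asks for.

```python
import glob
import os


def expand_classpath(classpath, sep=":"):
    """Expand trailing /* entries to jar files and drop repeated entries."""
    seen = set()
    result = []
    for entry in classpath.split(sep):
        if not entry:
            continue  # sketch-only choice: skip empty entries (from "::")
        if entry.endswith("/*"):
            # A classpath wildcard stands for the jar files in that directory.
            directory = glob.escape(entry[:-2])
            candidates = sorted(glob.glob(os.path.join(directory, "*.jar")))
        else:
            candidates = [entry]
        for item in candidates:
            if item not in seen:  # keep only the first occurrence
                seen.add(item)
                result.append(item)
    return sep.join(result)
```

 Calling expand_classpath on a string such as "conf:conf::lib/*" would return
 "conf" once, followed by the jars found under lib in sorted order.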



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6057) Remove obsolete entries from mapred-default.xml

2015-01-21 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286253#comment-14286253
 ] 

Ray Chiang commented on MAPREDUCE-6057:
---

RE: findbugs

No code changes in the files mentioned.

 Remove obsolete entries from mapred-default.xml
 ---

 Key: MAPREDUCE-6057
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6057
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
  Labels: newbie
 Attachments: MAPREDUCE-6057-01.patch, MAPREDUCE-6057-02.patch, 
 MAPREDUCE-6057-03.patch, MAPREDUCE-6057.004.patch


 The following properties are defined in mapred-default.xml but no longer 
 exist in MRJobConfig.
   map.sort.class
   mapred.child.env
   mapred.child.java.opts
   mapreduce.app-submission.cross-platform
   mapreduce.client.completion.pollinterval
   mapreduce.client.output.filter
   mapreduce.client.progressmonitor.pollinterval
   mapreduce.client.submit.file.replication
   mapreduce.cluster.acls.enabled
   mapreduce.cluster.local.dir
   mapreduce.framework.name
   mapreduce.ifile.readahead
   mapreduce.ifile.readahead.bytes
   mapreduce.input.fileinputformat.list-status.num-threads
   mapreduce.input.fileinputformat.split.minsize
   mapreduce.input.lineinputformat.linespermap
   mapreduce.job.counters.limit
   mapreduce.job.max.split.locations
   mapreduce.job.reduce.shuffle.consumer.plugin.class
   mapreduce.jobhistory.address
   mapreduce.jobhistory.admin.acl
   mapreduce.jobhistory.admin.address
   mapreduce.jobhistory.cleaner.enable
   mapreduce.jobhistory.cleaner.interval-ms
   mapreduce.jobhistory.client.thread-count
   mapreduce.jobhistory.datestring.cache.size
   mapreduce.jobhistory.done-dir
   mapreduce.jobhistory.http.policy
   mapreduce.jobhistory.intermediate-done-dir
   mapreduce.jobhistory.joblist.cache.size
   mapreduce.jobhistory.keytab
   mapreduce.jobhistory.loadedjobs.cache.size
   mapreduce.jobhistory.max-age-ms
   mapreduce.jobhistory.minicluster.fixed.ports
   mapreduce.jobhistory.move.interval-ms
   mapreduce.jobhistory.move.thread-count
   mapreduce.jobhistory.principal
   mapreduce.jobhistory.recovery.enable
   mapreduce.jobhistory.recovery.store.class
   mapreduce.jobhistory.recovery.store.fs.uri
   mapreduce.jobhistory.store.class
   mapreduce.jobhistory.webapp.address
   mapreduce.local.clientfactory.class.name
   mapreduce.map.skip.proc.count.autoincr
   mapreduce.output.fileoutputformat.compress
   mapreduce.output.fileoutputformat.compress.codec
   mapreduce.output.fileoutputformat.compress.type
   mapreduce.reduce.skip.proc.count.autoincr
   mapreduce.shuffle.connection-keep-alive.enable
   mapreduce.shuffle.connection-keep-alive.timeout
   mapreduce.shuffle.max.connections
   mapreduce.shuffle.max.threads
   mapreduce.shuffle.port
   mapreduce.shuffle.ssl.enabled
   mapreduce.shuffle.ssl.file.buffer.size
   mapreduce.shuffle.transfer.buffer.size
   mapreduce.shuffle.transferTo.allowed
   yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts
 Submitting bug for comment/feedback about which properties should be kept in 
 mapred-default.xml.





[jira] [Commented] (MAPREDUCE-3283) mapred classpath CLI does not display the complete classpath

2015-01-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286384#comment-14286384
 ] 

Hudson commented on MAPREDUCE-3283:
---

FAILURE: Integrated in Hadoop-trunk-Commit #6907 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6907/])
MAPREDUCE-3283. mapred classpath CLI does not display the complete classpath. 
Contributed by Varun Saxena. (cnauroth: rev 
0742591335f15d2f8916555704c2db6124107618)
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/bin/mapred
* hadoop-mapreduce-project/bin/mapred.cmd




[jira] [Commented] (MAPREDUCE-6192) Create unit test to automatically compare MR related classes and mapred-default.xml

2015-01-21 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286276#comment-14286276
 ] 

Ray Chiang commented on MAPREDUCE-6192:
---

RE: Failing unit tests

Both tests pass in my tree.

 Create unit test to automatically compare MR related classes and 
 mapred-default.xml
 ---

 Key: MAPREDUCE-6192
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6192
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Minor
  Labels: supportability
 Attachments: MAPREDUCE-6192.001.patch, MAPREDUCE-6192.002.patch


 Create a unit test that will automatically compare the fields in the various 
 MapReduce related classes and mapred-default.xml. It should throw an error if 
 a property is missing in either the class or the file.
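
 A minimal sketch of the comparison described above, written in Python for
 brevity rather than as the JUnit test the issue proposes. The function names
 and the sample key set are hypothetical; a real test would collect the
 property names from the constants declared in the MapReduce classes.

```python
import xml.etree.ElementTree as ET


def xml_property_names(xml_text):
    """Return the set of <name> values found under <property> elements."""
    names = set()
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        if name and name.strip():
            names.add(name.strip())
    return names


def compare_properties(xml_text, class_keys):
    """Report properties present on only one side of the comparison."""
    xml_keys = xml_property_names(xml_text)
    class_keys = set(class_keys)
    return {
        "xml_only": sorted(xml_keys - class_keys),
        "class_only": sorted(class_keys - xml_keys),
    }


def check_in_sync(xml_text, class_keys):
    """Raise an error if a property is missing in either place."""
    diff = compare_properties(xml_text, class_keys)
    if diff["xml_only"] or diff["class_only"]:
        raise AssertionError("properties out of sync: %r" % diff)
```

 Reporting both directions separately makes it clear whether an entry is
 obsolete in the XML file or undocumented in the class.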





[jira] [Commented] (MAPREDUCE-5785) Derive heap size or mapreduce.*.memory.mb automatically

2015-01-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286866#comment-14286866
 ] 

Hudson commented on MAPREDUCE-5785:
---

SUCCESS: Integrated in Hadoop-trunk-Commit #6910 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6910/])
MAPREDUCE-5785. Derive heap size or mapreduce.*.memory.mb automatically. (Gera 
Shegalov and Karthik Kambatla via gera) (gera: rev 
a003f71cacd35834a1abbc2ffb5446a1166caf73)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestMapReduceChildJVM.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* hadoop-mapreduce-project/CHANGES.txt
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/JobConf.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapReduceChildJVM.java


 Derive heap size or mapreduce.*.memory.mb automatically
 ---

 Key: MAPREDUCE-5785
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5785
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mr-am, task
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Fix For: 3.0.0

 Attachments: MAPREDUCE-5785.v01.patch, MAPREDUCE-5785.v02.patch, 
 MAPREDUCE-5785.v03.patch, mr-5785-4.patch, mr-5785-5.patch, mr-5785-6.patch, 
 mr-5785-7.patch, mr-5785-8.patch, mr-5785-9.patch


 Currently users have to set 2 memory-related configs per job / per task type.
 One first chooses some container size mapreduce.*.memory.mb and then a
 corresponding maximum Java heap size Xmx < mapreduce.*.memory.mb. This
 makes sure that the JVM's C-heap (native memory + Java heap) does not exceed
 this mapreduce.*.memory.mb. If one forgets to tune Xmx, the MR-AM might be
 - allocating big containers whereas the JVM will only use the default
   -Xmx200m.
 - allocating small containers that will OOM because Xmx is too high.
 With this JIRA, we propose to set Xmx automatically based on an empirical
 ratio that can be adjusted. Xmx is not changed automatically if provided by
 the user.
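
 The proposal can be sketched as follows, assuming the JVM options arrive as
 a single string. The function name derive_java_opts and the ratio of 0.8 are
 illustrative assumptions of this sketch; the description only says that the
 ratio is empirical and adjustable.

```python
import math
import re

# Matches an explicit heap limit such as -Xmx200m or -Xmx2g.
XMX_PATTERN = re.compile(r"-Xmx\d+[kKmMgG]?")


def derive_java_opts(container_mb, java_opts, heap_ratio=0.8):
    """Append -Xmx derived from the container size unless the user set one."""
    if XMX_PATTERN.search(java_opts):
        return java_opts  # a user-provided Xmx is never changed
    # Keep the heap below the container size to leave room for native memory.
    heap_mb = math.floor(container_mb * heap_ratio)
    return (java_opts + " -Xmx%dm" % heap_mb).strip()
```

 With these assumptions, a 1024 MB container and empty options yield
 -Xmx819m, while options that already contain -Xmx512m are returned unchanged.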





[jira] [Updated] (MAPREDUCE-5785) Derive heap size or mapreduce.*.memory.mb automatically

2015-01-21 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated MAPREDUCE-5785:
-
    Resolution: Fixed
      Assignee: Gera Shegalov  (was: Karthik Kambatla)
        Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks [~kasha] for collaborating on this patch! 





[jira] [Commented] (MAPREDUCE-6209) Implement a heuristic to auto-size Java heap of MRAppMaster container proportional to the job size

2015-01-21 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285503#comment-14285503
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-6209:
---

Karthik and Gera, thanks for your explanation. I got it. Do you have a good
heuristic for deciding the base value when making the heap size proportional?

 Implement a heuristic to auto-size Java heap of MRAppMaster container 
 proportional to the job size
 --

 Key: MAPREDUCE-6209
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6209
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: applicationmaster
Reporter: Gera Shegalov

 The size of the Java heap required by the AM is linearly proportional to the
 size of the MR job (number of mappers/splits and reducers). It would be nice
 if users did not have to adjust the AM container size when transitioning from
 testing a job on a small sample to a production job on a full-scale dataset.
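
 One possible shape for such a heuristic is a fixed base plus a per-task
 increment, sketched below in Python. The function name am_heap_mb and both
 constants (a 256 MB base and 50 KB per task) are placeholders invented for
 this sketch, not measured values; choosing them is the open question raised
 in the comment above.

```python
def am_heap_mb(num_maps, num_reduces, base_mb=256, kb_per_task=50):
    """Estimate the AM heap in MB as a linear function of the task count.

    base_mb and kb_per_task are hypothetical placeholders, not tuned values.
    """
    tasks = num_maps + num_reduces
    # Round the per-task share up to a whole number of megabytes.
    extra_mb = (tasks * kb_per_task + 1023) // 1024
    return base_mb + extra_mb
```

 A linear form keeps small test jobs at the base size while letting the
 estimate grow with the number of splits and reducers.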





[jira] [Commented] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

2015-01-21 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285390#comment-14285390
 ] 

Gera Shegalov commented on MAPREDUCE-4815:
--

Thanks for the latest patch, [~l201514]! Some comments/questions:

1. Since we are changing the behavior and not the API, we can have a property
{{mapreduce.fileoutputcommitter.algorithm.version}} with the values
1: the old behavior. This should be the default unless we have solved the
upgrade in an efficient, bullet-proof manner.
2: the new proposed design.

Why is the flag for the new behavior not initialized when
{{FileOutputCommitter#FileOutputCommitter(Path, TaskAttemptContext)}} is used?

There is only a minor difference between {{runOldCommitJob}} and
{{runNewCommitJob}}: the lengthy copy loop is skipped. Therefore, there is no
need to duplicate code. Enclose this copy loop in an {{if (version == 1)}}
check. I think it’s sufficient to have such checks for {{commit/recoverTask}}
as well.

Code under the comment 
{code}
//for backwards compatibility after upgrade to the new fileOutputCommitter,
//check if there are any output left in committedTaskPath
{code} 
seems misplaced and should actually be under {{runNewRecoverTask}}. This
scenario will need a test. Likewise, the existing tests should be run under
both the new and the old logic.
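
The single-code-path suggestion can be illustrated with a toy model in Python,
where a dict stands in for the file system. This is not the patch under
review: the function names and directory keys are hypothetical, and the sketch
assumes that in version 2 tasks commit directly to the final directory, as the
issue description says 1.x did.

```python
def commit_task(fs, task_id, files, version):
    """Record a task's output; fs maps a directory name to a list of files."""
    if version == 1:
        # Old behavior: park the output in a per-task committed directory.
        fs.setdefault("committed/" + task_id, []).extend(files)
    else:
        # Assumed new behavior: write straight to the final directory.
        fs.setdefault("final", []).extend(files)


def commit_job(fs, version):
    """Finish the job and return how many files commit_job had to move."""
    moved = 0
    if version == 1:
        # The lengthy copy loop runs only for the old algorithm.
        for directory in sorted(k for k in fs if k.startswith("committed/")):
            files = fs.pop(directory)
            fs.setdefault("final", []).extend(files)
            moved += len(files)
    fs.setdefault("final", [])
    return moved
```

Both versions end with the same contents in the final directory; only the
amount of work left for commit_job differs, which is why one method with a
version check is enough in this model.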

 FileOutputCommitter.commitJob can be very slow for jobs with many output files
 --

 Key: MAPREDUCE-4815
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
Reporter: Jason Lowe
Assignee: Siqi Li
 Attachments: MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, 
 MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch, MAPREDUCE-4815.v7.patch, 
 MAPREDUCE-4815.v8.patch, MAPREDUCE-4815.v9.patch


 If a job generates many files to commit then the commitJob method call at the 
 end of the job can take minutes.  This is a performance regression from 1.x, 
 as 1.x had the tasks commit directly to the final output directory as they 
 were completing and commitJob had very little to do.  The commit work was 
 processed in parallel and overlapped the processing of outstanding tasks.  In 
 0.23/2.x, the commit is single-threaded and waits until all tasks have 
 completed before commencing.


