[jira] [Commented] (MAPREDUCE-5420) Remove mapreduce.task.tmp.dir from mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14239075#comment-14239075 ]

Sandy Ryza commented on MAPREDUCE-5420:
---------------------------------------

What's up with the findbugs warnings though?

Remove mapreduce.task.tmp.dir from mapred-default.xml
-----------------------------------------------------

Key: MAPREDUCE-5420
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5420
Project: Hadoop Map/Reduce
Issue Type: Task
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: James Carman
Labels: newbie
Attachments: MAPREDUCE-5420.patch, MAPREDUCE-5420.patch

mapreduce.task.tmp.dir no longer has any effect, so it should no longer be documented in mapred-default. (There is no YARN equivalent for the property. It now is just always ./tmp).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5420) Remove mapreduce.task.tmp.dir from mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14239074#comment-14239074 ]

Sandy Ryza commented on MAPREDUCE-5420:
---------------------------------------

This looks good to me.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5785) Derive task attempt JVM max heap size and io.sort.mb automatically from mapreduce.*.memory.mb
[ https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220641#comment-14220641 ]

Sandy Ryza commented on MAPREDUCE-5785:
---------------------------------------

Hi Karthik. Took a look at the patch. Had some comments - mostly stylistic.

{code}
-    return adminClasspath + " " + userClasspath;
+      return jobConf.getTaskJavaOpts(isMapTask ? TaskType.MAP : TaskType.REDUCE);
{code}
Wrong indentation?

{code}
+  public String getTaskJavaOpts(TaskType taskType) {
+
+    String javaOpts = getConfiguredTaskJavaOpts(taskType);
{code}
Unnecessary blank line.

{code}
+      LOG.info("Task java.opts does not specify max heap size, setting using "
+          + "mapreduce.*.memory.mb * " + MRJobConfig.HEAP_MEMORY_MB_RATIO);
{code}
Can we condense this and the log further down into a single message?

{code}
+    if (LOG.isWarnEnabled()) {
{code}
Why use isWarnEnabled when we don't use isInfoEnabled?

{code}
+    final int taskContainerMb = getMemoryRequired(taskType);
{code}
Any reason this should be final? Convention is usually not to declare local variables final unless they need to be (like referenced by an anonymous class).

{code}
+    int taskHeapSize =(int)Math.ceil(taskContainerMb * heapRatio);
{code}
Should have a space after the =.

{code}
+  public static int parseMaximumHeapSizeMB(String javaOpts) {
{code}
Can this (and others) be marked Private?

{code}
+    int memory = 1024;
{code}
It looks like this value will be overwritten in all cases.

{code}
+        (heapSize = parseMaximumHeapSizeMB(
+            getConfiguredTaskJavaOpts(taskType))) > 0) {
{code}
This is a little weird. Can we assign heapSize outside of the condition?

{code}
+    memory =
+        getInt(MRJobConfig.REDUCE_MEMORY_MB,
+            MRJobConfig.DEFAULT_REDUCE_MEMORY_MB);
{code}
This can be on 2 lines.

{code}
+    If -Xmx is not set, it is inferred from mapreduc.{map|reduce}.memory.mb and
{code}
Missing an e at the end of mapreduc.
{code}
+  <description>The ratio between heap-size and container-size
+    If no -Xmx specified, it's calculated from the container memory
+    requirement: mapreduce.*.memory.mb * mapreduce.heap.memory-mb.ratio.
+    If -Xmx is specified but not mapreduce.*.memory.mb, it's calculated as
+    heapSize / mapreduce.heap.memory-mb.ratio.
{code}
Need a period after container size. * meaning both multiplication and either map or reduce is a little confusing here. It might be better to spell out {map|reduce} inside the config properties, which would also be consistent with how they're referenced above. Also, other descriptions tend to use "it is" instead of "it's".

{code}
   <description>The amount of memory to request from the scheduler for each
-    reduce task.
+    reduce task. If this is not specified, it is inferred from
{code}
Indentation here is inconsistent with other places.

Any reason to have getTaskJavaOpts in JobConf instead of MapReduceChildJVM?

Derive task attempt JVM max heap size and io.sort.mb automatically from mapreduce.*.memory.mb
---------------------------------------------------------------------------------------------

Key: MAPREDUCE-5785
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5785
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: mr-am, task
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Attachments: MAPREDUCE-5785.v01.patch, MAPREDUCE-5785.v02.patch, MAPREDUCE-5785.v03.patch, mr-5785-4.patch

Currently users have to set 2 memory-related configs per Job / per task type. One first chooses some container size mapreduce.\*.memory.mb and then a corresponding maximum Java heap size Xmx < mapreduce.\*.memory.mb. This makes sure that the JVM's C-heap (native memory + Java heap) does not exceed this mapreduce.*.memory.mb. If one forgets to tune Xmx, MR-AM might be
- allocating big containers whereas the JVM will only use the default -Xmx200m.
- allocating small containers that will OOM because Xmx is too high.

With this JIRA, we propose to set Xmx automatically based on an empirical ratio that can be adjusted. Xmx is not changed automatically if provided by the user.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
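The derivation the JIRA proposes can be sketched in a few lines. This is a hedged illustration, not the patch's code: the 0.8 default ratio and the name HEAP_MEMORY_MB_RATIO are assumptions made for the example.

```java
// Illustrative sketch of deriving -Xmx from the container size, as proposed
// in MAPREDUCE-5785. The 0.8 default ratio is an assumption for the example.
public class HeapRatioSketch {
    static final double HEAP_MEMORY_MB_RATIO = 0.8; // assumed default

    // taskHeapSize = ceil(containerMb * heapRatio), mirroring the reviewed line
    static int deriveTaskHeapSizeMb(int taskContainerMb, double heapRatio) {
        return (int) Math.ceil(taskContainerMb * heapRatio);
    }

    public static void main(String[] args) {
        // A 1024 MB container with the assumed 0.8 ratio yields -Xmx820m
        System.out.println(deriveTaskHeapSizeMb(1024, HEAP_MEMORY_MB_RATIO));
    }
}
```

With this scheme, forgetting to set -Xmx no longer means falling back to -Xmx200m inside a large container: the heap follows whichever container size the user requested.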
[jira] [Commented] (MAPREDUCE-5785) Derive task attempt JVM max heap size and io.sort.mb automatically from mapreduce.*.memory.mb
[ https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14221522#comment-14221522 ]

Sandy Ryza commented on MAPREDUCE-5785:
---------------------------------------

This looks almost there.

Leaving the indentation seems fine to me. JobConf is already kind of a god class. I think that the more we can avoid further cluttering it by moving code closer to its access points, the better.

{code}
+    final JobConf conf = new JobConf(new Configuration());
{code}
The patch uses final in a lot of places that MR code conventionally does not. Even if this is better practice, I don't think now is the time to start.

Derive task attempt JVM max heap size and io.sort.mb automatically from mapreduce.*.memory.mb

Key: MAPREDUCE-5785
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5785
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: mr-am, task
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Attachments: MAPREDUCE-5785.v01.patch, MAPREDUCE-5785.v02.patch, MAPREDUCE-5785.v03.patch, mr-5785-4.patch, mr-5785-5.patch

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
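The parseMaximumHeapSizeMB helper discussed in this review could look roughly like the sketch below. This is an assumption about its shape, not the patch's actual implementation; the regex and unit handling are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hedged sketch of parsing -Xmx out of a java.opts string; the real
// parseMaximumHeapSizeMB in the MAPREDUCE-5785 patch may differ.
// Returns the heap size in MB, or -1 if no -Xmx option is present.
public class XmxParseSketch {
    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([gGmMkK]?)");

    static int parseMaximumHeapSizeMB(String javaOpts) {
        Matcher m = XMX.matcher(javaOpts);
        int mb = -1;
        while (m.find()) { // the last -Xmx wins, matching JVM behavior
            long value = Long.parseLong(m.group(1));
            char unit = m.group(2).isEmpty()
                ? 'b' : Character.toLowerCase(m.group(2).charAt(0));
            switch (unit) {
                case 'g': mb = (int) (value * 1024); break;
                case 'm': mb = (int) value; break;
                case 'k': mb = (int) (value / 1024); break;
                default:  mb = (int) (value >> 20); break; // plain bytes
            }
        }
        return mb;
    }

    public static void main(String[] args) {
        System.out.println(parseMaximumHeapSizeMB("-server -Xmx2g")); // 2048
    }
}
```

The reverse direction mentioned in the config description (mapreduce.*.memory.mb inferred as heapSize / ratio when only -Xmx is given) would start from this parsed value.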
[jira] [Commented] (MAPREDUCE-5785) Derive heap size or mapreduce.*.memory.mb automatically
[ https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14221630#comment-14221630 ]

Sandy Ryza commented on MAPREDUCE-5785:
---------------------------------------

+1

Derive heap size or mapreduce.*.memory.mb automatically

Key: MAPREDUCE-5785
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5785
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: mr-am, task
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Attachments: MAPREDUCE-5785.v01.patch, MAPREDUCE-5785.v02.patch, MAPREDUCE-5785.v03.patch, mr-5785-4.patch, mr-5785-5.patch, mr-5785-6.patch

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5785) Derive heap size or mapreduce.*.memory.mb automatically
[ https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated MAPREDUCE-5785:
----------------------------------
Target Version/s: 3.0.0 (was: 2.4.0)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6009) Map-only job with new-api runs wrong OutputCommitter when cleanup scheduled in a reduce slot
[ https://issues.apache.org/jira/browse/MAPREDUCE-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14178503#comment-14178503 ]

Sandy Ryza commented on MAPREDUCE-6009:
---------------------------------------

+1

Map-only job with new-api runs wrong OutputCommitter when cleanup scheduled in a reduce slot

Key: MAPREDUCE-6009
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6009
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: client, job submission
Affects Versions: 1.2.1
Reporter: Gera Shegalov
Assignee: Gera Shegalov
Priority: Blocker
Attachments: MAPREDUCE-6009.v01-branch-1.2.patch, MAPREDUCE-6009.v02-branch-1.2.patch

In branch-1, job commit is executed in a JOB_CLEANUP task that may run in either a map or reduce slot. In org.apache.hadoop.mapreduce.Job#setUseNewAPI there is logic setting the new-api flag only for reduce-ful jobs:
{code}
if (numReduces != 0) {
  conf.setBooleanIfUnset("mapred.reducer.new-api",
      conf.get(oldReduceClass) == null);
  ...
{code}
Therefore, when cleanup runs in a reduce slot, ReduceTask inits using the old API and runs the incorrect default OutputCommitter, instead of consulting the OutputFormat.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6009) Map-only job with new-api runs wrong OutputCommitter when cleanup scheduled in a reduce slot
[ https://issues.apache.org/jira/browse/MAPREDUCE-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated MAPREDUCE-6009:
----------------------------------
Resolution: Fixed
Fix Version/s: 1.2.2, 1.3.0
Status: Resolved (was: Patch Available)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6009) Map-only job with new-api runs wrong OutputCommitter when cleanup scheduled in a reduce slot
[ https://issues.apache.org/jira/browse/MAPREDUCE-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14167401#comment-14167401 ]

Sandy Ryza commented on MAPREDUCE-6009:
---------------------------------------

Thanks, this looks good to me. Have you been able to run tests?

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6009) Map-only job with new-api runs wrong OutputCommitter when cleanup scheduled in a reduce slot
[ https://issues.apache.org/jira/browse/MAPREDUCE-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165985#comment-14165985 ]

Sandy Ryza commented on MAPREDUCE-6009:
---------------------------------------

Is this an issue in Hadoop 2 as well?

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6009) Map-only job with new-api runs wrong OutputCommitter when cleanup scheduled in a reduce slot
[ https://issues.apache.org/jira/browse/MAPREDUCE-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166012#comment-14166012 ]

Sandy Ryza commented on MAPREDUCE-6009:
---------------------------------------

Just to be conservative with changes, could we leave these in the if statement?
{code}
+    if (conf.getUseNewReducer()) {
+      String mode = "new reduce API";
+      ensureNotSet("mapred.output.format.class", mode);
+      ensureNotSet(oldReduceClass, mode);
+    } else {
+      String mode = "reduce compatability";
+      ensureNotSet(JobContext.OUTPUT_FORMAT_CLASS_ATTR, mode);
+      ensureNotSet(JobContext.REDUCE_CLASS_ATTR, mode);
+    }
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
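The branch-1 bug behind this thread can be shown with a toy model. This is not Hadoop code: the Map stands in for the job configuration, and the method names are invented for the illustration, but the guard reproduces the quoted `if (numReduces != 0)` logic.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (not the Hadoop source) of the setUseNewAPI guard quoted in this
// issue: the reducer new-api flag is only set when numReduces != 0, so a
// map-only job never gets it, and a JOB_CLEANUP attempt launched in a reduce
// slot falls back to the old API and its default OutputCommitter.
public class NewApiFlagSketch {
    static boolean usesNewReduceApi(Map<String, String> conf, int numReduces) {
        if (numReduces != 0) { // the guard at the root of the bug
            conf.putIfAbsent("mapred.reducer.new-api",
                String.valueOf(!conf.containsKey("mapred.reducer.class")));
        }
        return Boolean.parseBoolean(
            conf.getOrDefault("mapred.reducer.new-api", "false"));
    }

    public static void main(String[] args) {
        // A new-api job with reducers gets the flag; a map-only one does not.
        System.out.println(usesNewReduceApi(new HashMap<>(), 1)); // true
        System.out.println(usesNewReduceApi(new HashMap<>(), 0)); // false
    }
}
```

The false result for the map-only case is exactly the state the cleanup ReduceTask observes, which is why it initializes with the old API.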
[jira] [Moved] (MAPREDUCE-6066) Speculative attempts should not run on the same node as their original attempt
[ https://issues.apache.org/jira/browse/MAPREDUCE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza moved YARN-2491 to MAPREDUCE-6066: - Component/s: (was: scheduler) scheduler applicationmaster Affects Version/s: (was: 3.0.0) 3.0.0 Key: MAPREDUCE-6066 (was: YARN-2491) Project: Hadoop Map/Reduce (was: Hadoop YARN) Speculative attempts should not run on the same node as their original attempt -- Key: MAPREDUCE-6066 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6066 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, scheduler Affects Versions: 3.0.0 Reporter: Todd Lipcon I'm seeing a behavior on trunk with fair scheduler enabled where a speculative reduce attempt is getting run on the same node as its original attempt. This doesn't make sense -- the main reason for speculative execution is to deal with a slow node, so scheduling a second attempt on the same node would just make the problem worse if anything. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5130) Add missing job config options to mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107472#comment-14107472 ]

Sandy Ryza commented on MAPREDUCE-5130:
---------------------------------------

My bad. Thanks for catching this. Just pushed a fix.

Add missing job config options to mapred-default.xml

Key: MAPREDUCE-5130
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5130
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: documentation
Affects Versions: 2.0.4-alpha
Reporter: Sandy Ryza
Assignee: Ray Chiang
Attachments: MAPREDUCE-5130-1.patch, MAPREDUCE-5130-1.patch, MAPREDUCE-5130-2.patch, MAPREDUCE-5130-3.patch, MAPREDUCE-5130-4.patch, MAPREDUCE-5130-5.patch, MAPREDUCE-5130-6.patch, MAPREDUCE-5130.patch, MAPREDUCE-5130.patch

I noticed that mapreduce.map.java.opts and mapreduce.reduce.java.opts were missing in mapred-default.xml. I'll do a fuller sweep to see what else is missing before posting a patch. List so far:
- mapreduce.map/reduce.java.opts
- mapreduce.map/reduce.memory.mb
- mapreduce.job.jvm.numtasks
- mapreduce.input.lineinputformat.linespermap
- mapreduce.task.combine.progress.records
- mapreduce.map/reduce.env

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5130) Add missing job config options to mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated MAPREDUCE-5130:
----------------------------------
Assignee: Ray Chiang (was: Sandy Ryza)

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5130) Add missing job config options to mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104152#comment-14104152 ]

Sandy Ryza commented on MAPREDUCE-5130:
---------------------------------------

+1

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027977#comment-14027977 ]

Sandy Ryza commented on MAPREDUCE-5896:
---------------------------------------

bq. Can we make InputSplitLocationInfo extend InputSplit? It doesn't make sense for any class to implement only InputSplitLocationInfo without implementing InputSplit.

Will do.

bq. Nothing to do with this patch. It is unfortunate that mapreduce.InputSplit doesn't implement mapred.InputSplit. Would it be easy to fix it?

Not entirely sure of the reasoning there, but as this stuff can have binary compatibility implications in mysterious ways, I'd rather not touch it if we don't need to.

bq. The following two constants should probably be in SplitLocationInfo?

They're only used in FileSplit and not in SplitLocationInfo - is there utility in moving them away from where they're used? I'd like to avoid adding these constants to the API because, when we include additional storage types, each SplitLocationInfo could end up as a union of storage types - needing to add an ON_DISK_AND_IN_FLASH_AND_IN_MEMORY would be ugly.

bq. Instead of assigning ON_DISK by default, would it make sense to set it post null-check after the loop for checking if it is in memory.

Any advantage to this? It would add extra code, an extra branch, and I don't think it would be particularly more readable.

bq. Do you think it would make sense to include the string corresponding to the location in SplitLocationInfo itself?

Will do.

Allow InputSplits to indicate which locations have the block cached in memory

Key: MAPREDUCE-5896
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: MAPREDUCE-5896-1.patch, MAPREDUCE-5896.patch

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5896: -- Attachment: MAPREDUCE-5896-2.patch Allow InputSplits to indicate which locations have the block cached in memory - Key: MAPREDUCE-5896 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5896-1.patch, MAPREDUCE-5896-2.patch, MAPREDUCE-5896.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015502#comment-14015502 ]

Sandy Ryza commented on MAPREDUCE-5777:
---------------------------------------

Thanks for the patch Zhihai. It looks good, just a few nits:

{code}
+    int newMaxLineLength = (int) Math.min(3L + (long) maxLineLength,
+        Integer.MAX_VALUE);
{code}
Why use longs here?

{code}
+      LOG.info("Found UTF-8 BOM and Skipped it");
{code}
Skipped shouldn't be capitalized.

Can we move the new code into a sub-method, skipUtfByteOrderMark?

Support utf-8 text with BOM (byte order marker)

Key: MAPREDUCE-5777
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5777
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 0.22.0, 2.2.0
Reporter: bc Wong
Assignee: zhihai xu
Attachments: MAPREDUCE-5777.000.patch, MAPREDUCE-5777.001.patch, MAPREDUCE-5777.002.patch, MAPREDUCE-5777.003.patch

UTF-8 text may have a BOM. TextInputFormat, KeyValueTextInputFormat and friends should recognize the BOM and not treat it as actual data.

--
This message was sent by Atlassian JIRA (v6.2#6252)
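The BOM handling the review asks to factor into a skipUtfByteOrderMark() helper amounts to: if the first three bytes of the stream are the UTF-8 BOM (0xEF 0xBB 0xBF), drop them; otherwise leave the stream untouched. A minimal sketch, not the Hadoop implementation (which works on LineRecordReader's buffers rather than raw streams):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hedged sketch of skipping a UTF-8 BOM at the start of a text stream.
public class BomSkipSketch {
    static InputStream skipUtfByteOrderMark(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pin.read(head, 0, 3);
        boolean bom = n == 3
            && (head[0] & 0xFF) == 0xEF
            && (head[1] & 0xFF) == 0xBB
            && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: put the bytes back
        }
        return pin;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        InputStream in = skipUtfByteOrderMark(new ByteArrayInputStream(data));
        System.out.print((char) in.read());
        System.out.println((char) in.read()); // prints "hi"
    }
}
```

Keeping this in a named helper, as the review suggests, also makes the "found and skipped" log message easy to place in one spot.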
[jira] [Commented] (MAPREDUCE-5777) Support utf-8 text with BOM (byte order marker)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016122#comment-14016122 ]

Sandy Ryza commented on MAPREDUCE-5777:
---------------------------------------

+1

Support utf-8 text with BOM (byte order marker)

Key: MAPREDUCE-5777
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5777
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 0.22.0, 2.2.0
Reporter: bc Wong
Assignee: zhihai xu
Attachments: MAPREDUCE-5777.000.patch, MAPREDUCE-5777.001.patch, MAPREDUCE-5777.002.patch, MAPREDUCE-5777.003.patch, MAPREDUCE-5777.004.patch

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5896: -- Attachment: MAPREDUCE-5896-1.patch Allow InputSplits to indicate which locations have the block cached in memory - Key: MAPREDUCE-5896 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5896-1.patch, MAPREDUCE-5896.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011391#comment-14011391 ] Sandy Ryza commented on MAPREDUCE-5896: --- Updated patch sets all the new APIs to Evolving, fixes the typo that Tom noticed, includes cached hosts in mapred.FileInputFormat split generation, and adds tests. Allow InputSplits to indicate which locations have the block cached in memory - Key: MAPREDUCE-5896 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5896-1.patch, MAPREDUCE-5896.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5862) Line records longer than 2x split size aren't handled correctly
[ https://issues.apache.org/jira/browse/MAPREDUCE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011505#comment-14011505 ]

Sandy Ryza commented on MAPREDUCE-5862:
---------------------------------------

Thanks Jason!

Line records longer than 2x split size aren't handled correctly

Key: MAPREDUCE-5862
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5862
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.3.0
Reporter: bc Wong
Assignee: bc Wong
Priority: Critical
Fix For: 3.0.0, 2.5.0
Attachments: 0001-Handle-records-larger-than-2x-split-size.1.patch, 0001-Handle-records-larger-than-2x-split-size.patch, 0001-Handle-records-larger-than-2x-split-size.patch, 0001-MAPREDUCE-5862.-Line-records-longer-than-2x-split-si.patch, recordSpanningMultipleSplits.txt.bz2

Suppose this split (100-200) is in the middle of a record (90-240):
{noformat}
0         100       200       300
|  split  |  curr   |  split  |
    ---------- record ----------
    90                       240
{noformat}
Currently, the first split would read the entire record, up to offset 240, which is good. But the 2nd split has a bug in producing a phantom record of (200, 240).

--
This message was sent by Atlassian JIRA (v6.2#6252)
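The invariant behind this bug report can be stated as a one-line predicate. This is a sketch of the ownership rule, not LineRecordReader itself: a record belongs to the split containing its first byte; the owning split reads past its end to finish the record, and every later split the record spills into must emit nothing.

```java
// Sketch of the split/record ownership rule from MAPREDUCE-5862.
public class SplitOwnershipSketch {
    static boolean ownsRecord(long splitStart, long splitLength, long recordStart) {
        return recordStart >= splitStart && recordStart < splitStart + splitLength;
    }

    public static void main(String[] args) {
        // Record (90-240) with 100-byte splits: only split [0,100) owns it.
        System.out.println(ownsRecord(0, 100, 90));    // true
        System.out.println(ownsRecord(100, 100, 90));  // false
        System.out.println(ownsRecord(200, 100, 90));  // false: the phantom-record case
    }
}
```

The reported bug is precisely the third case: split (200-300) emitted a phantom record (200, 240) even though record start 90 is outside it.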
[jira] [Commented] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009055#comment-14009055 ] Sandy Ryza commented on MAPREDUCE-5896: --- What should the criteria be for marking this stable? I'd like to start using this in downstream projects (Spark, and I believe Tez could benefit as well) as soon as possible, and an Evolving annotation would prevent this. Allow InputSplits to indicate which locations have the block cached in memory - Key: MAPREDUCE-5896 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5896.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14004424#comment-14004424 ]

Sandy Ryza commented on MAPREDUCE-5896:
---------------------------------------

Given HDFS's plans for hierarchical storage management, I think it would be good to make this extensible to handle storage mediums beyond memory. I talked this over with [~andrew.wang] and [~atm] and we think the right interface would be something like a SplitLocationInfo class, with isInMemory() and isOnDisk() methods. We can later add isInFlash() and possibly even getDisk() to return which disk the data is on. InputSplits would have a SplitLocationInfo[] getLocationInfo() method that returns info about how the data is stored on each host returned by getLocations().

--
This message was sent by Atlassian JIRA (v6.2#6252)
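The interface proposed in this comment can be sketched as below. Only isInMemory(), isOnDisk(), and the per-host location string come from the discussion; the constructor, field names, and the class name suffix are assumptions for the illustration.

```java
// Hedged sketch of the per-host storage info proposed in MAPREDUCE-5896,
// attached to each host returned by an InputSplit's getLocations().
public class SplitLocationInfoSketch {
    private final String location;  // hostname, per getLocations()
    private final boolean inMemory; // block cached in memory on this host

    public SplitLocationInfoSketch(String location, boolean inMemory) {
        this.location = location;
        this.inMemory = inMemory;
    }

    public String getLocation() { return location; }
    public boolean isInMemory() { return inMemory; }

    // Disk is treated as always present in this sketch; isInFlash() or
    // getDisk() could be added later per the hierarchical-storage plans.
    public boolean isOnDisk() { return true; }

    public static void main(String[] args) {
        SplitLocationInfoSketch info = new SplitLocationInfoSketch("host1", true);
        System.out.println(info.getLocation() + " inMemory=" + info.isInMemory());
    }
}
```

A scheduler consuming SplitLocationInfo[] getLocationInfo() could then prefer hosts where isInMemory() is true without changing the existing getLocations() contract.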
[jira] [Updated] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5896: -- Attachment: MAPREDUCE-5896.patch Allow InputSplits to indicate which locations have the block cached in memory - Key: MAPREDUCE-5896 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5896.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5896: -- Status: Patch Available (was: Open) Allow InputSplits to indicate which locations have the block cached in memory - Key: MAPREDUCE-5896 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5896.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005194#comment-14005194 ] Sandy Ryza commented on MAPREDUCE-5896: --- Uploaded a POC patch. I'll add some tests if others think the APIs make sense. Allow InputSplits to indicate which locations have the block cached in memory - Key: MAPREDUCE-5896 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5896.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (MAPREDUCE-5896) Allow InputSplits to indicate which locations have the block cached in memory
Sandy Ryza created MAPREDUCE-5896: - Summary: Allow InputSplits to indicate which locations have the block cached in memory Key: MAPREDUCE-5896 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5896 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Sandy Ryza Assignee: Sandy Ryza -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5887) Move split creation from submission client to MRAppMaster
[ https://issues.apache.org/jira/browse/MAPREDUCE-5887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5887: -- Resolution: Duplicate Status: Resolved (was: Patch Available) Move split creation from submission client to MRAppMaster - Key: MAPREDUCE-5887 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5887 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster, client Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5887.v01.patch This JIRA is filed to improve scalability of job submission, specifically when there is a significant latency between the submission client and the cluster nodes RM and NN, e.g. in a multi-datacenter environment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5887) Move split creation from submission client to MRAppMaster
[ https://issues.apache.org/jira/browse/MAPREDUCE-5887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13994812#comment-13994812 ] Sandy Ryza commented on MAPREDUCE-5887: --- Mind attaching your patch on MAPREDUCE-207? Move split creation from submission client to MRAppMaster - Key: MAPREDUCE-5887 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5887 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster, client Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5887.v01.patch This JIRA is filed to improve scalability of job submission, specifically when there is a significant latency between the submission client and the cluster nodes RM and NN, e.g. in a multi-datacenter environment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5877) Inconsistency between JT/TT for tasks taking a long time to launch
[ https://issues.apache.org/jira/browse/MAPREDUCE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5877: -- Resolution: Fixed Fix Version/s: 1.3.0 Target Version/s: (was: 1.2.2) Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to branch-1 Inconsistency between JT/TT for tasks taking a long time to launch -- Key: MAPREDUCE-5877 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5877 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker, tasktracker Affects Versions: 1.2.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Fix For: 1.3.0 Attachments: mr-5877-1.patch, repro-mr-5877.patch For the tasks that take too long to launch (for genuine reasons like large distributed caches), JT expires the task. Depending on whether job recovery is enabled and the JT's restart state, another attempt is launched or not even when the JT is not restarted. The status of the attempt changes to Error launching task. Meanwhile, the TT is not informed of this task expiry and eventually launches the task. Also, the new attempt might be assigned to the same TT leading to more inconsistent behavior. To avoid this, one can bump up the mapred.tasktracker.expiry.interval, but leading to long TT failure discovery times. We should have a per-job timeout for task launches/ heartbeat and JT/TT should be consistent in what they say. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5877) Inconsistency between JT/TT for tasks taking a long time to launch
[ https://issues.apache.org/jira/browse/MAPREDUCE-5877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990063#comment-13990063 ] Sandy Ryza commented on MAPREDUCE-5877: --- This approach makes sense to me - +1. Inconsistency between JT/TT for tasks taking a long time to launch -- Key: MAPREDUCE-5877 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5877 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker, tasktracker Affects Versions: 1.2.1 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: mr-5877-1.patch, repro-mr-5877.patch For the tasks that take too long to launch (for genuine reasons like large distributed caches), JT expires the task. Depending on whether job recovery is enabled and the JT's restart state, another attempt is launched or not even when the JT is not restarted. The status of the attempt changes to Error launching task. Meanwhile, the TT is not informed of this task expiry and eventually launches the task. Also, the new attempt might be assigned to the same TT leading to more inconsistent behavior. To avoid this, one can bump up the mapred.tasktracker.expiry.interval, but leading to long TT failure discovery times. We should have a per-job timeout for task launches/ heartbeat and JT/TT should be consistent in what they say. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5875) Make Counter limits consistent conf across JobClient, MRAppMaster, and YarnChild
[ https://issues.apache.org/jira/browse/MAPREDUCE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1398#comment-1398 ] Sandy Ryza commented on MAPREDUCE-5875: --- Is this a dupe of MAPREDUCE-5856? Make Counter limits consistent conf across JobClient, MRAppMaster, and YarnChild Key: MAPREDUCE-5875 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5875 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, client, task Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5875.v01.patch Currently, counter limits mapreduce.job.counters.* handled by {{org.apache.hadoop.mapreduce.counters.Limits}} are initialized asymmetrically: on the client side, and on the AM, job.xml is ignored whereas it's taken into account in YarnChild. It would be good to make the Limits job-configurable, such that max counters/groups is only increased when needed. With the current Limits implementation relying on static constants, it's going to be challenging for tools that submit jobs concurrently (e.g. via class loading isolation). The patch that I am uploading is not perfect but demonstrates the issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5862) Line records longer than 2x split size aren't handled correctly
[ https://issues.apache.org/jira/browse/MAPREDUCE-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983935#comment-13983935 ] Sandy Ryza commented on MAPREDUCE-5862: --- {code} +checkRecordSpanningMultipleSplits("recordSpanningMultipleSplits.txt.bz2", + 200 * 1000, + true); {code} indentation should be: {code} +checkRecordSpanningMultipleSplits("recordSpanningMultipleSplits.txt.bz2", +200 * 1000, true); {code} I can fix these on commit. Otherwise, the updated patch looks good to me. [~jlowe], anything you see that I'm missing? Line records longer than 2x split size aren't handled correctly --- Key: MAPREDUCE-5862 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5862 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.3.0 Reporter: bc Wong Assignee: bc Wong Priority: Critical Attachments: 0001-Handle-records-larger-than-2x-split-size.1.patch, 0001-Handle-records-larger-than-2x-split-size.patch, 0001-Handle-records-larger-than-2x-split-size.patch, recordSpanningMultipleSplits.txt.bz2 Suppose this split (100-200) is in the middle of a record (90-240): {noformat} 0 100 200 300 | split | curr | split | --- record --- 90 240 {noformat} Currently, the first split would read the entire record, up to offset 240, which is good. But the 2nd split has a bug, producing a phantom record of (200, 240). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5844) Reducer Preemption is too aggressive
[ https://issues.apache.org/jira/browse/MAPREDUCE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973866#comment-13973866 ] Sandy Ryza commented on MAPREDUCE-5844: --- Hi [~maysamyabandeh], you're not misreading the code - headroom calculation in the Fair Scheduler needs to be fixed. I filed YARN-1959 for this. Reducer Preemption is too aggressive Key: MAPREDUCE-5844 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5844 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh We observed cases where the reducer preemption makes the job finish much later, and the preemption does not seem to be necessary since after preemption both the preempted reducer and the mapper are assigned immediately--meaning that there was already enough space for the mapper. The logic for triggering preemption is at RMContainerAllocator::preemptReducesIfNeeded The preemption is triggered if the following is true: {code} headroom + am * |m| + pr * |r| <= mapResourceRequest {code} where am: number of assigned mappers, |m| is mapper size, pr is number of reducers being preempted, and |r| is the reducer size. The original idea apparently was that if headroom is not big enough for the new mapper requests, reducers should be preempted. This would work if the job is alone in the cluster. Once we have queues, the headroom calculation becomes more complicated and it would require a separate headroom calculation per queue/job. So, as a result the headroom variable is kind of given up currently: *headroom is always set to 0* What this implies is that preemption becomes very aggressive, not considering whether there is enough space for the mappers or not. -- This message was sent by Atlassian JIRA (v6.2#6252)
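The trigger quoted above can be restated as a tiny standalone check (an illustrative restatement of the inequality only, not the actual RMContainerAllocator code; names and units are assumptions):

```java
// Illustrative restatement of the preemption trigger described above. With
// headroom pinned to 0 by the scheduler bug, the left-hand side shrinks and
// the condition fires far more often than intended.
public class PreemptionCheck {
  public static boolean shouldPreemptReducer(long headroomMb,
                                             int assignedMaps, long mapSizeMb,
                                             int preemptingReducers, long reducerSizeMb,
                                             long mapRequestMb) {
    // headroom + am * |m| + pr * |r| <= mapResourceRequest
    return headroomMb + (long) assignedMaps * mapSizeMb
        + (long) preemptingReducers * reducerSizeMb <= mapRequestMb;
  }
}
```

With a correct per-queue headroom on the left, a pending map request would only trigger preemption when the queue genuinely has no room.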
[jira] [Moved] (MAPREDUCE-5822) FairScheduler isStartvedForFairShare does not work when fairShare == 1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza moved YARN-1909 to MAPREDUCE-5822: - Key: MAPREDUCE-5822 (was: YARN-1909) Project: Hadoop Map/Reduce (was: Hadoop YARN) FairScheduler isStartvedForFairShare does not work when fairShare == 1 -- Key: MAPREDUCE-5822 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5822 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot If the fair share returned by the scheduler getFairShare() == 1 the pool will never be marked as being starved because of the following calculation: {code} boolean isStarvedForFairShare(PoolSchedulable sched) { int desiredFairShare = (int) Math.floor(Math.min( sched.getFairShare() / 2, sched.getDemand())); return (sched.getRunningTasks() < desiredFairShare); } {code} getFairShare() returns 1, so the Math.min calculation will return 0.5, and Math.floor() will cause the desiredFairShare to be set to 0 and the return value to be 'false' (0 < 0). If you have a small job without a minimum set it will not get scheduled if a large job is hogging the slots. -- This message was sent by Atlassian JIRA (v6.2#6252)
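The rounding problem can be reproduced in isolation (a simplified sketch; PoolSchedulable is replaced by plain parameters, so this is not the scheduler's actual code):

```java
// Simplified reproduction of the rounding bug in isStarvedForFairShare:
// with a fair share of exactly 1 slot, floor(min(1/2, demand)) is 0, so
// runningTasks < 0 is never true and the pool is never flagged as starved.
public class StarvationCheck {
  public static int desiredFairShare(double fairShare, double demand) {
    return (int) Math.floor(Math.min(fairShare / 2, demand));
  }

  public static boolean isStarvedForFairShare(double fairShare, double demand,
                                              int runningTasks) {
    return runningTasks < desiredFairShare(fairShare, demand);
  }
}
```

With fairShare == 1 and zero running tasks, desiredFairShare is 0 and the starvation check returns false, matching the behavior described in the report; larger fair shares behave as intended.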
[jira] [Updated] (MAPREDUCE-5822) FairScheduler isStartvedForFairShare does not work when fairShare == 1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5822: -- Affects Version/s: 1.2.1 FairScheduler isStartvedForFairShare does not work when fairShare == 1 -- Key: MAPREDUCE-5822 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5822 Project: Hadoop Map/Reduce Issue Type: Bug Components: scheduler Affects Versions: 1.2.1 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot If the fair share returned by the scheduler getFairShare() == 1 the pool will never be marked as being starved because of the following calculation: {code} boolean isStarvedForFairShare(PoolSchedulable sched) { int desiredFairShare = (int) Math.floor(Math.min( sched.getFairShare() / 2, sched.getDemand())); return (sched.getRunningTasks() < desiredFairShare); } {code} getFairShare() returns 1, so the Math.min calculation will return 0.5, and Math.floor() will cause the desiredFairShare to be set to 0 and the return value to be 'false' (0 < 0). If you have a small job without a minimum set it will not get scheduled if a large job is hogging the slots. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5822) FairScheduler isStartvedForFairShare does not work when fairShare == 1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5822: -- Component/s: scheduler FairScheduler isStartvedForFairShare does not work when fairShare == 1 -- Key: MAPREDUCE-5822 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5822 Project: Hadoop Map/Reduce Issue Type: Bug Components: scheduler Affects Versions: 1.2.1 Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot If the fair share returned by the scheduler getFairShare() == 1 the pool will never be marked as being starved because of the following calculation: {code} boolean isStarvedForFairShare(PoolSchedulable sched) { int desiredFairShare = (int) Math.floor(Math.min( sched.getFairShare() / 2, sched.getDemand())); return (sched.getRunningTasks() < desiredFairShare); } {code} getFairShare() returns 1, so the Math.min calculation will return 0.5, and Math.floor() will cause the desiredFairShare to be set to 0 and the return value to be 'false' (0 < 0). If you have a small job without a minimum set it will not get scheduled if a large job is hogging the slots. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5759) Remove unnecessary conf load in Limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5759: -- Resolution: Fixed Fix Version/s: 2.5.0 Status: Resolved (was: Patch Available) Remove unnecessary conf load in Limits -- Key: MAPREDUCE-5759 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5759 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.3.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.5.0 Attachments: MAPREDUCE-5759.patch This is a continuation of MAPREDUCE-5487. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5785) Derive task attempt JVM max heap size automatically from mapreduce.*.memory.mb
[ https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925159#comment-13925159 ] Sandy Ryza commented on MAPREDUCE-5785: --- Something like this has been long-needed. Though I'm worried that it's not backwards-compatible - users would see their JVM max heaps change in certain situations. In situations where they didn't set the max heap, were cutting it close, but were still OK, they could see OutOfMemoryErrors after the change. Another thing is that, as a user, I care more about my max heap size than how much I request from YARN. The latter is usually a consequence of the former. One possible way around both of these would be to add a new parameter that controls max heap size and sets mapreduce.*.memory.mb accordingly. Derive task attempt JVM max heap size automatically from mapreduce.*.memory.mb -- Key: MAPREDUCE-5785 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5785 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mr-am, task Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5785.v01.patch Currently users have to set 2 memory-related configs per job / per task type. One first chooses some container size mapreduce.*.memory.mb and then a corresponding Xmx < mapreduce.*.memory.mb to make sure that the JVM with the user code heap and its native memory does not exceed this limit. If one forgets to tune Xmx, MR-AM might be allocating big containers whereas the JVM will only use the default -Xmx200m. With this JIRA, we propose to set Xmx automatically based on an empirical ratio that can be adjusted. Xmx is not changed automatically if provided by the user. -- This message was sent by Atlassian JIRA (v6.2#6252)
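The proposal can be illustrated with a small sketch (the 0.8 ratio, the megabytes-only -Xmx parsing, and the method names are assumptions for illustration, not the committed implementation):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the idea above: honor a user-provided -Xmx, otherwise derive
// the heap from the container size and an empirical ratio. The 0.8 ratio
// and megabytes-only parsing are illustrative assumptions.
public class HeapSizer {
  static final double HEAP_MEMORY_MB_RATIO = 0.8;
  private static final Pattern XMX_MB = Pattern.compile("-Xmx(\\d+)m");

  public static int deriveHeapMb(String javaOpts, int containerMb) {
    Matcher m = XMX_MB.matcher(javaOpts);
    if (m.find()) {
      return Integer.parseInt(m.group(1)); // user-set max heap wins
    }
    return (int) Math.ceil(containerMb * HEAP_MEMORY_MB_RATIO);
  }
}
```

Under this sketch, a task with mapreduce.map.memory.mb=1024 and no -Xmx in its java opts would get roughly an 820 MB heap, while an explicit -Xmx200m would be left untouched, preserving backwards compatibility for users who set it.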
[jira] [Created] (MAPREDUCE-5763) Warn message about httpshuffle in NM logs
Sandy Ryza created MAPREDUCE-5763: - Summary: Warn message about httpshuffle in NM logs Key: MAPREDUCE-5763 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5763 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Sandy Ryza Assignee: Naren Koneru {code} 2014-02-20 12:08:45,141 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The Auxilurary Service named 'mapreduce_shuffle' in the configuration is for class class org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'. Because these are not the same tools trying to send ServiceData and read Service Meta Data may have issues unless the refer to the name in the config. 2014-02-20 12:08:45,142 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Adding auxiliary service httpshuffle, mapreduce_shuffle {code} I'm seeing this in my NodeManager logs, even though things work fine. A WARN is being caused by some sort of mismatch between the name of the service (in terms of org.apache.hadoop.service.Service.getName()) and the name of the auxiliary service. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (MAPREDUCE-5759) Remove unnecessary conf load in Limits
Sandy Ryza created MAPREDUCE-5759: - Summary: Remove unnecessary conf load in Limits Key: MAPREDUCE-5759 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5759 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.3.0 Reporter: Sandy Ryza Assignee: Sandy Ryza This is a continuation of MAPREDUCE-5487. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5487) In task processes, JobConf is unnecessarily loaded again in Limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13904300#comment-13904300 ] Sandy Ryza commented on MAPREDUCE-5487: --- Filed MAPREDUCE-5759 In task processes, JobConf is unnecessarily loaded again in Limits -- Key: MAPREDUCE-5487 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5487 Project: Hadoop Map/Reduce Issue Type: Improvement Components: performance, task Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.3.0 Attachments: MAPREDUCE-5487-1.patch, MAPREDUCE-5487.patch Limits statically loads a JobConf, which incurs costs of reading files from disk and parsing XML. The contents of this JobConf are identical to the one loaded by YarnChild (before adding job.xml as a resource). Allowing Limits to initialize with the JobConf loaded in YarnChild would reduce task startup time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5759) Remove unnecessary conf load in Limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5759: -- Attachment: MAPREDUCE-5759.patch Remove unnecessary conf load in Limits -- Key: MAPREDUCE-5759 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5759 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.3.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5759.patch This is a continuation of MAPREDUCE-5487. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5759) Remove unnecessary conf load in Limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5759: -- Status: Patch Available (was: Open) Remove unnecessary conf load in Limits -- Key: MAPREDUCE-5759 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5759 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.3.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5759.patch This is a continuation of MAPREDUCE-5487. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5487) In task processes, JobConf is unnecessarily loaded again in Limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13904299#comment-13904299 ] Sandy Ryza commented on MAPREDUCE-5487: --- Very good point. Not sure how I missed that. In task processes, JobConf is unnecessarily loaded again in Limits -- Key: MAPREDUCE-5487 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5487 Project: Hadoop Map/Reduce Issue Type: Improvement Components: performance, task Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.3.0 Attachments: MAPREDUCE-5487-1.patch, MAPREDUCE-5487.patch Limits statically loads a JobConf, which incurs costs of reading files from disk and parsing XML. The contents of this JobConf are identical to the one loaded by YarnChild (before adding job.xml as a resource). Allowing Limits to initialize with the JobConf loaded in YarnChild would reduce task startup time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (MAPREDUCE-5745) thread may hang forever, even after it receives all the expected data
[ https://issues.apache.org/jira/browse/MAPREDUCE-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved MAPREDUCE-5745. --- Resolution: Invalid thread may hang forever, even after it receives all the expected data - Key: MAPREDUCE-5745 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5745 Project: Hadoop Map/Reduce Issue Type: Wish Reporter: Jinfeng Ni Priority: Trivial Please discard this JIRA issue (I should open it under a different project). Tried to cancel this issue, but could not find a way to do so. Sorry about this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5745) thread may hang forever, even after it receives all the expected data
[ https://issues.apache.org/jira/browse/MAPREDUCE-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894268#comment-13894268 ] Sandy Ryza commented on MAPREDUCE-5745: --- Hi Jinfeng, So you know for the future, if you accidentally open an issue under the wrong project, JIRA has a feature that allows you to move it to the intended one by clicking More and then Move. thread may hang forever, even after it receives all the expected data - Key: MAPREDUCE-5745 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5745 Project: Hadoop Map/Reduce Issue Type: Wish Reporter: Jinfeng Ni Priority: Trivial Please discard this JIRA issue (I should open it under a different project). Tried to cancel this issue, but could not find a way to do so. Sorry about this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5699) Tagging support for MR jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13892756#comment-13892756 ] Sandy Ryza commented on MAPREDUCE-5699: --- bq. In addition to pass tags when being submitted to YARN, should we make the running MRAppMaster/Job remember the tags as well. Because the tags go in the job configuration, the MRAppMaster and JHS will have access to them. Comments on the patch: {code} <?xml version="1.0"?> +<?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -15,7 +16,6 @@ See the License for the specific language governing permissions and limitations under the License. --> -<?xml-stylesheet type="text/xsl" href="configuration.xsl"?> {code} False change? {code} +<description>Tags for the job, the corresponding YARN application + inherits these tags. The ResourceManager can be queried for any of + these tags to fetch this application. {code} First sentence is a little awkward. Second makes it a little unclear that AHS can be queried for them too. Maybe use "Tags for the job that will be passed to YARN at submission time. Queries to YARN for applications can filter on these tags." Otherwise, LGTM. Tagging support for MR jobs --- Key: MAPREDUCE-5699 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5699 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: mr-5699-1.patch, mr-5699-1.patch, mr-5699-2.patch YARN-1399 / YARN-1461 add support for tagging YARN applications. MR should expose this to users, so they can set tags on an MR job. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
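For reference, once a patch along these lines lands, a job could carry tags via a plain configuration entry (a hypothetical per-job configuration fragment; the exact property name and comma-separated format are taken from the patch under review, not a released API):

```xml
<!-- Hypothetical per-job configuration fragment: comma-separated tags that
     the MR client passes to YARN at job submission time. -->
<property>
  <name>mapreduce.job.tags</name>
  <value>etl,nightly</value>
</property>
```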
[jira] [Commented] (MAPREDUCE-5699) Tagging support for MR jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-5699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13892855#comment-13892855 ] Sandy Ryza commented on MAPREDUCE-5699: --- +1 pending jenkins Tagging support for MR jobs --- Key: MAPREDUCE-5699 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5699 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.3.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Attachments: mr-5699-1.patch, mr-5699-1.patch, mr-5699-2.patch, mr-5699-3.patch YARN-1399 / YARN-1461 add support for tagging YARN applications. MR should expose to users, so they can set tags on an MR job. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5719) Potential null pointer access in AbstractYarnScheduler#getTransferredContainers()
[ https://issues.apache.org/jira/browse/MAPREDUCE-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13887062#comment-13887062 ] Sandy Ryza commented on MAPREDUCE-5719: --- Is my understanding correct that the null pointer can only happen in the test? Is there any way to work around this in the test code? If not, we should at least comment that we're doing the check for that reason. Potential null pointer access in AbstractYarnScheduler#getTransferredContainers() - Key: MAPREDUCE-5719 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5719 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0, 2.3.0 Reporter: Ted Yu Assignee: Ted Yu Attachments: mapreduce-5719-v1.txt, mapreduce-5719-v2.txt From https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1666/console : {code} Tests run: 14, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 63.12 sec FAILURE! - in org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator testCompletedTasksRecalculateSchedule(org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator) Time elapsed: 2.083 sec ERROR! 
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.getTransferredContainers(AbstractYarnScheduler.java:50) at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:277) at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:154) at org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator$MyContainerAllocator.register(TestRMContainerAllocator.java:1476) at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStart(RMCommunicator.java:112) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStart(RMContainerAllocator.java:219) at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) at org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator$MyContainerAllocator.init(TestRMContainerAllocator.java:1444) at org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator$RecalculateContainerAllocator.init(TestRMContainerAllocator.java:1629) at org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator.testCompletedTasksRecalculateSchedule(TestRMContainerAllocator.java:1665) {code} In above case getMasterContainer() returned null. AbstractYarnScheduler#getTransferredContainers() should check such condition. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5732) Report proper queue when job has been automatically placed
[ https://issues.apache.org/jira/browse/MAPREDUCE-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5732: -- Resolution: Fixed Fix Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Report proper queue when job has been automatically placed -- Key: MAPREDUCE-5732 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5732 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.4.0 Attachments: MAPREDUCE-5732.patch Some schedulers, such as the Fair Scheduler, provide the ability to automatically place an application into a queue based on attributes such as the user and group of the submitter. In these cases, the JobHistoryServer and AM web UI report the requested queue, not the queue that the app is actually running in. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5464) Add analogs of the SLOTS_MILLIS counters that jive with the YARN resource model
[ https://issues.apache.org/jira/browse/MAPREDUCE-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5464: -- Resolution: Fixed Fix Version/s: 2.3.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Add analogs of the SLOTS_MILLIS counters that jive with the YARN resource model --- Key: MAPREDUCE-5464 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5464 Project: Hadoop Map/Reduce Issue Type: Task Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.3.0 Attachments: MAPREDUCE-5464.patch Per discussion on MAPREDUCE-5311, it would be good to have analogs for SLOTS_MILLIS that better fit the MR2 resource model. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5732) Report proper queue when job has been automatically placed
[ https://issues.apache.org/jira/browse/MAPREDUCE-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5732: -- Status: Patch Available (was: Open) Report proper queue when job has been automatically placed -- Key: MAPREDUCE-5732 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5732 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5732.patch Some schedulers, such as the Fair Scheduler, provide the ability to automatically place an application into a queue based on attributes such as the user and group of the submitter. In these cases, the JobHistoryServer and AM web UI report the requested queue, not the queue that the app is actually running in. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5732) Report proper queue when job has been automatically placed
[ https://issues.apache.org/jira/browse/MAPREDUCE-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5732: -- Attachment: MAPREDUCE-5732.patch Report proper queue when job has been automatically placed -- Key: MAPREDUCE-5732 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5732 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5732.patch Some schedulers, such as the Fair Scheduler, provide the ability to automatically place an application into a queue based on attributes such as the user and group of the submitter. In these cases, the JobHistoryServer and AM web UI report the requested queue, not the queue that the app is actually running in. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5732) Report proper queue when job has been automatically placed
[ https://issues.apache.org/jira/browse/MAPREDUCE-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883559#comment-13883559 ] Sandy Ryza commented on MAPREDUCE-5732: --- Attached patch fixes the issue by adding a JobQueueChangeEvent that updates a Job's queue in the history. I verified manually in addition to the tests. Report proper queue when job has been automatically placed -- Key: MAPREDUCE-5732 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5732 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5732.patch Some schedulers, such as the Fair Scheduler, provide the ability to automatically place an application into a queue based on attributes such as the user and group of the submitter. In these cases, the JobHistoryServer and AM web UI report the requested queue, not the queue that the app is actually running in. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5732) Report proper queue when job has been automatically placed
[ https://issues.apache.org/jira/browse/MAPREDUCE-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883599#comment-13883599 ] Sandy Ryza commented on MAPREDUCE-5732: --- Test failure is unrelated: MAPREDUCE-5719 Will fix the javac warnings Report proper queue when job has been automatically placed -- Key: MAPREDUCE-5732 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5732 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5732.patch Some schedulers, such as the Fair Scheduler, provide the ability to automatically place an application into a queue based on attributes such as the user and group of the submitter. In these cases, the JobHistoryServer and AM web UI report the requested queue, not the queue that the app is actually running in. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5732) Report proper queue when job has been automatically placed
[ https://issues.apache.org/jira/browse/MAPREDUCE-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883605#comment-13883605 ] Sandy Ryza commented on MAPREDUCE-5732: --- Actually, the javac warnings appear to be benign as well. They come from accessing deprecated fields of Avro objects in JobQueueChangeEvent. This is the way other job history events use Avro, so I think we should keep this consistent. Report proper queue when job has been automatically placed -- Key: MAPREDUCE-5732 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5732 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5732.patch Some schedulers, such as the Fair Scheduler, provide the ability to automatically place an application into a queue based on attributes such as the user and group of the submitter. In these cases, the JobHistoryServer and AM web UI report the requested queue, not the queue that the app is actually running in. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5464) Add analogs of the SLOTS_MILLIS counters that jive with the YARN resource model
[ https://issues.apache.org/jira/browse/MAPREDUCE-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5464: -- Summary: Add analogs of the SLOTS_MILLIS counters that jive with the YARN resource model (was: Add analogs of the SLOTS_MILLIS counters that fit the MR2 resource model) Add analogs of the SLOTS_MILLIS counters that jive with the YARN resource model --- Key: MAPREDUCE-5464 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5464 Project: Hadoop Map/Reduce Issue Type: Task Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Per discussion on MAPREDUCE-5311, it would be good to have analogs for SLOTS_MILLIS that better fit the MR2 resource model. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5464) Add analogs of the SLOTS_MILLIS counters that fit the MR2 resource model
[ https://issues.apache.org/jira/browse/MAPREDUCE-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5464: -- Summary: Add analogs of the SLOTS_MILLIS counters that fit the MR2 resource model (was: Add MEM_MILLIS_MAPS and MEM_MILLIS_REDUCES counter) Add analogs of the SLOTS_MILLIS counters that fit the MR2 resource model Key: MAPREDUCE-5464 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5464 Project: Hadoop Map/Reduce Issue Type: Task Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Per discussion on MAPREDUCE-5311, it would be good to have analogs for SLOTS_MILLIS that better fit the MR2 resource model. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5464) Add analogs of the SLOTS_MILLIS counters that jive with the YARN resource model
[ https://issues.apache.org/jira/browse/MAPREDUCE-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5464: -- Attachment: MAPREDUCE-5464.patch Add analogs of the SLOTS_MILLIS counters that jive with the YARN resource model --- Key: MAPREDUCE-5464 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5464 Project: Hadoop Map/Reduce Issue Type: Task Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5464.patch Per discussion on MAPREDUCE-5311, it would be good to have analogs for SLOTS_MILLIS that better fit the MR2 resource model. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5464) Add analogs of the SLOTS_MILLIS counters that jive with the YARN resource model
[ https://issues.apache.org/jira/browse/MAPREDUCE-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13879139#comment-13879139 ] Sandy Ryza commented on MAPREDUCE-5464: --- Attached patch adds MILLIS_MAPS, MILLIS_REDUCES, MB_MILLIS_MAPS, MB_MILLIS_REDUCES, VCORES_MILLIS_MAPS, and VCORES_MILLIS_REDUCES Add analogs of the SLOTS_MILLIS counters that jive with the YARN resource model --- Key: MAPREDUCE-5464 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5464 Project: Hadoop Map/Reduce Issue Type: Task Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5464.patch Per discussion on MAPREDUCE-5311, it would be good to have analogs for SLOTS_MILLIS that better fit the MR2 resource model. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5464) Add analogs of the SLOTS_MILLIS counters that jive with the YARN resource model
[ https://issues.apache.org/jira/browse/MAPREDUCE-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5464: -- Status: Patch Available (was: Open) Add analogs of the SLOTS_MILLIS counters that jive with the YARN resource model --- Key: MAPREDUCE-5464 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5464 Project: Hadoop Map/Reduce Issue Type: Task Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5464.patch Per discussion on MAPREDUCE-5311, it would be good to have analogs for SLOTS_MILLIS that better fit the MR2 resource model. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (MAPREDUCE-5732) Report proper queue when job has been automatically placed
Sandy Ryza created MAPREDUCE-5732: - Summary: Report proper queue when job has been automatically placed Key: MAPREDUCE-5732 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5732 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Sandy Ryza Assignee: Sandy Ryza Some schedulers, such as the Fair Scheduler, provide the ability to automatically place an application into a queue based on attributes such as the user and group of the submitter. In these cases, the JobHistoryServer and AM web UI report the requested queue, not the queue that the app is actually running in. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (MAPREDUCE-5464) Add MEM_MILLIS_MAPS and MEM_MILLIS_REDUCES counter
[ https://issues.apache.org/jira/browse/MAPREDUCE-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned MAPREDUCE-5464: - Assignee: Sandy Ryza Add MEM_MILLIS_MAPS and MEM_MILLIS_REDUCES counter -- Key: MAPREDUCE-5464 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5464 Project: Hadoop Map/Reduce Issue Type: Task Affects Versions: 2.1.0-beta Reporter: Sandy Ryza Assignee: Sandy Ryza Per discussion on MAPREDUCE-5311, it would be good to have analogs for SLOTS_MILLIS that better fit the MR2 resource model. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5725) TestNetworkedJob relies on the Capacity Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5725: -- Status: Patch Available (was: Open) TestNetworkedJob relies on the Capacity Scheduler - Key: MAPREDUCE-5725 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5725 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5275.patch We should either make this explicit or make it scheduler-agnostic. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5725) TestNetworkedJob relies on the Capacity Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5725: -- Attachment: MAPREDUCE-5275.patch TestNetworkedJob relies on the Capacity Scheduler - Key: MAPREDUCE-5725 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5725 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5275.patch We should either make this explicit or make it scheduler-agnostic. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5729) mapred job -list throws NPE
[ https://issues.apache.org/jira/browse/MAPREDUCE-5729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876694#comment-13876694 ] Sandy Ryza commented on MAPREDUCE-5729: --- +1 pending jenkins mapred job -list throws NPE --- Key: MAPREDUCE-5729 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5729 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.2.0 Reporter: Karthik Kambatla Assignee: Karthik Kambatla Priority: Critical Attachments: mr-5729-1.patch mapred job -list throws the following NPE: {noformat} Exception in thread main java.lang.NullPointerException at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:445) at org.apache.hadoop.mapreduce.TypeConverter.fromYarnApps(TypeConverter.java:460) at org.apache.hadoop.mapred.ResourceMgrDelegate.getAllJobs(ResourceMgrDelegate.java:125) at org.apache.hadoop.mapred.YARNRunner.getAllJobs(YARNRunner.java:164) {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5725) TestNetworkedJob relies on the Capacity Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5725: -- Attachment: MAPREDUCE-5725-1.patch TestNetworkedJob relies on the Capacity Scheduler - Key: MAPREDUCE-5725 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5725 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5275.patch, MAPREDUCE-5725-1.patch We should either make this explicit or make it scheduler-agnostic. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5725) TestNetworkedJob relies on the Capacity Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5725: -- Resolution: Fixed Fix Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) TestNetworkedJob relies on the Capacity Scheduler - Key: MAPREDUCE-5725 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5725 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Sandy Ryza Assignee: Sandy Ryza Fix For: 2.4.0 Attachments: MAPREDUCE-5275.patch, MAPREDUCE-5725-1.patch We should either make this explicit or make it scheduler-agnostic. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5650) Job fails when hprof mapreduce.task.profile.map/reduce.params is specified
[ https://issues.apache.org/jira/browse/MAPREDUCE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5650: -- Resolution: Fixed Fix Version/s: 2.3.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I just committed this to trunk, branch-2, and branch-2.3. Thanks Gera! Job fails when hprof mapreduce.task.profile.map/reduce.params is specified -- Key: MAPREDUCE-5650 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5650 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.2.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Fix For: 2.3.0 Attachments: MAPREDUCE-5650.v01.patch, MAPREDUCE-5650.v02.patch, MAPREDUCE-5650.v03.patch, MAPREDUCE-5650.v04.patch When one uses dedicated hprof mapreduce.task.profile.map.params or mapreduce.task.profile.reduce.params, the profiled tasks will fail to launch because hprof parameters are supplied to the child jvm twice. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running
[ https://issues.apache.org/jira/browse/MAPREDUCE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13872826#comment-13872826 ] Sandy Ryza commented on MAPREDUCE-5724: --- The approach sounds reasonable to me. {code} + //DistributedFileSystem returns a RemoteException with a message stating + // SafeModeException in it. So this is only way to check it is because of + // being in safe mode. {code} Should use block comments like the other (even non-public) methods in the class. Applies other places as well. {code} + throw new YarnRuntimeException( + "Timed out waiting for FileSystem to become available"); {code} Should report how long it waited. {code} +return ex.toString().contains("SafeModeException"); {code} Two spaces between return and ex. Also, can we get the SafeModeException from ex.getCause()? {code} +boolean done = false; {code} Nit: "succeeded" would be clearer to me than "done". Not a big deal either way. {code} + void createHistoryDirs(Clock clock, long intervalCheck, long timeOut) + throws Exception { ... + Thread.sleep(intervalCheck); {code} Should handle interrupted exception - we don't want to fail if we hit it? Then we can remove throws Exception because the only other exception in the method isn't checked? {code} + Assert.assertTrue(dfsCluster.getFileSystem().isInSafeMode()); +} catch (Exception ex) { + Assert.fail(ex.toString()); +} {code} Can asserts in other threads ever cause the test to fail? 
JobHistoryServer does not start if HDFS is not running -- Key: MAPREDUCE-5724 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Attachments: MAPREDUCE-5724.patch Starting JHS without HDFS running fails with the following error: {code} STARTUP_MSG: build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z STARTUP_MSG: java = 1.7.0_45 / 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT] 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143) at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217) Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at
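The retry loop being reviewed above can be sketched in isolation. This is a self-contained illustration of the intervalCheck/timeOut polling pattern plus the interrupt handling Sandy asks for; the helper name and shape are invented here, not the committed HistoryFileManager code:

```java
import java.util.function.BooleanSupplier;

// Sketch of the startup retry loop discussed above: poll an operation
// (e.g. "create the history dirs") every intervalMs until it succeeds or
// timeOutMs elapses. Illustration only, not the committed code.
public class StartupRetry {

  public static boolean waitUntil(BooleanSupplier op, long intervalMs,
      long timeOutMs) {
    long deadline = System.currentTimeMillis() + timeOutMs;
    // "succeeded" rather than "done", per the review nit above.
    boolean succeeded = op.getAsBoolean();
    while (!succeeded && System.currentTimeMillis() < deadline) {
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException ie) {
        // Per the review: don't fail on interrupt; restore the interrupt
        // status and stop waiting.
        Thread.currentThread().interrupt();
        break;
      }
      succeeded = op.getAsBoolean();
    }
    return succeeded;
  }

  public static void main(String[] args) {
    final int[] attempts = {0};
    // Simulated FileSystem that becomes available on the third check.
    boolean ok = waitUntil(() -> ++attempts[0] >= 3, 10, 1000);
    System.out.println(ok + " after " + attempts[0] + " attempts");
  }
}
```

A caller that times out simply gets `false` back and can then throw the YarnRuntimeException with how long it waited, as requested in the review.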
[jira] [Created] (MAPREDUCE-5725) TestNetworkedJob relies on the Capacity Scheduler
Sandy Ryza created MAPREDUCE-5725: - Summary: TestNetworkedJob relies on the Capacity Scheduler Key: MAPREDUCE-5725 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5725 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Sandy Ryza Assignee: Sandy Ryza We should either make this explicit or make it scheduler-agnostic. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running
[ https://issues.apache.org/jira/browse/MAPREDUCE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13873003#comment-13873003 ] Sandy Ryza commented on MAPREDUCE-5724: --- bq. Regarding removing the throw Exception from the createHistoryDirs(), not possible because the tryCreateHistoryDirs does throw a checked exception if the reason is other than the FS not being avail. In that case, createHistoryDirs should just throw an IOException, no? {code} return ex.toString().contains(SafeModeException); {code} Can we get the SafeModeException from ex.getCause()? Otherwise, LGTM. JobHistoryServer does not start if HDFS is not running -- Key: MAPREDUCE-5724 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Attachments: MAPREDUCE-5724.patch, MAPREDUCE-5724.patch Starting JHS without HDFS running fails with the following error: {code} STARTUP_MSG: build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z STARTUP_MSG: java = 1.7.0_45 / 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT] 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217) Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at 
org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at
[jira] [Commented] (MAPREDUCE-5724) JobHistoryServer does not start if HDFS is not running
[ https://issues.apache.org/jira/browse/MAPREDUCE-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13873075#comment-13873075 ] Sandy Ryza commented on MAPREDUCE-5724: --- bq. Regarding detecting the SafeModeException by cause, I've tried that at first, the problem is that the cause is NULL Makes sense +1 JobHistoryServer does not start if HDFS is not running -- Key: MAPREDUCE-5724 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5724 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0, 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Attachments: MAPREDUCE-5724.patch, MAPREDUCE-5724.patch, MAPREDUCE-5724.patch Starting JHS without HDFS running fails with the following error: {code} STARTUP_MSG: build = git://git.apache.org/hadoop-common.git -r ad74e8850b99e03b0b6435b04f5b3e9995bc3956; compiled by 'tucu' on 2014-01-14T22:40Z STARTUP_MSG: java = 1.7.0_45 / 2014-01-14 16:47:40,264 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: registered UNIX signal handlers for [TERM, HUP, INT] 2014-01-14 16:47:40,883 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 2014-01-14 16:47:41,101 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: JobHistory Init 2014-01-14 16:47:41,710 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Error creating done directory: [hdfs://localhost:8020/tmp/hadoop-yarn/staging/history/done] at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:505) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:143) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:207) at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:217) Caused by: java.net.ConnectException: Call From dontknow.local/172.20.10.4 to localhost:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at 
org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730) at org.apache.hadoop.ipc.Client.call(Client.java:1410) at org.apache.hadoop.ipc.Client.call(Client.java:1359) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:185) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101) at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:671) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1722) at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:124) at
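The detection strategy settled on above can be shown in miniature. This is a self-contained illustration of falling back to string matching when the cause chain is lost (getCause() returns null); LocalSafeModeException is a stand-in class defined here, not HDFS's real SafeModeException:

```java
// Illustration of the agreed-upon fallback: prefer the typed cause chain,
// but when the remote cause is not preserved, the class name only survives
// in the exception's string form.
public class SafeModeCheck {

  public static class LocalSafeModeException extends Exception {
    public LocalSafeModeException(String msg) { super(msg); }
  }

  public static boolean isBecauseSafeMode(Throwable ex) {
    // Preferred check: walk the cause chain for the typed exception.
    for (Throwable t = ex.getCause(); t != null; t = t.getCause()) {
      if (t instanceof LocalSafeModeException) {
        return true;
      }
    }
    // Fallback string match, mirroring the patch's
    // ex.toString().contains("SafeModeException"), for the case where the
    // cause chain was not preserved across the RPC boundary.
    return ex.toString().contains("SafeModeException");
  }
}
```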
[jira] [Commented] (MAPREDUCE-5712) Backport Fair Scheduler pool placement by secondary group
[ https://issues.apache.org/jira/browse/MAPREDUCE-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13869956#comment-13869956 ] Sandy Ryza commented on MAPREDUCE-5712: --- +1 Backport Fair Scheduler pool placement by secondary group - Key: MAPREDUCE-5712 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5712 Project: Hadoop Map/Reduce Issue Type: Improvement Components: scheduler Reporter: Ted Malaska Fix For: 1.3.0 Attachments: MAPREDUCE-5712 YARN-1423 introduced a queue policy that supports selecting a queue if a secondary group is found in the defined queues. This functionality would be useful and minimally invasive in MR1 as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (MAPREDUCE-5712) Backport Fair Scheduler pool placement by secondary group
[ https://issues.apache.org/jira/browse/MAPREDUCE-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved MAPREDUCE-5712. --- Resolution: Fixed Assignee: Ted Malaska Hadoop Flags: Reviewed I just committed this to branch-1 Backport Fair Scheduler pool placement by secondary group - Key: MAPREDUCE-5712 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5712 Project: Hadoop Map/Reduce Issue Type: Improvement Components: scheduler Reporter: Ted Malaska Assignee: Ted Malaska Fix For: 1.3.0 Attachments: MAPREDUCE-5712 YARN-1423 introduced a queue policy that supports selecting a queue if a secondary group is found in the defined queues. This functionality would be useful and minimally invasive in MR1 as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5712) Backport Fair Scheduler pool placement by secondary group
[ https://issues.apache.org/jira/browse/MAPREDUCE-5712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13867304#comment-13867304 ] Sandy Ryza commented on MAPREDUCE-5712: --- The code changes look good to me. Have you done any verification? Backport Fair Scheduler pool placement by secondary group - Key: MAPREDUCE-5712 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5712 Project: Hadoop Map/Reduce Issue Type: Improvement Components: scheduler Reporter: Ted Malaska Fix For: 1.3.0 Attachments: MAPREDUCE-5712 YARN-1423 introduced a queue policy that supports selecting a queue if a secondary group was found in the defined queues. This functionality would be useful and minimally invasive in MR1 as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5707) JobClient does not allow setting RPC timeout for communications with JT/RM
[ https://issues.apache.org/jira/browse/MAPREDUCE-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5707: -- Summary: JobClient does not allow setting RPC timeout for communications with JT/RM (was: JobClient does not allow to setting RPC timeout for communications with JT/RM) JobClient does not allow setting RPC timeout for communications with JT/RM -- Key: MAPREDUCE-5707 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5707 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 1.2.1, 2.2.0 Reporter: Gilad Wolff The ApplicationClientProtocolPBClientImpl c'tor (and the JobClient 0.20.2 c'tor as well) creates an rpc proxy that eventually uses '0' as the rpcTimeout:
{code}
public ApplicationClientProtocolPBClientImpl(long clientVersion,
    InetSocketAddress addr, Configuration conf) throws IOException {
  RPC.setProtocolEngine(conf, ApplicationClientProtocolPB.class,
      ProtobufRpcEngine.class);
  proxy = RPC.getProxy(ApplicationClientProtocolPB.class, clientVersion, addr, conf);
}
{code}
which leads to this call in RPC:
{code}
public static <T> ProtocolProxy<T> getProtocolProxy(Class<T> protocol,
    long clientVersion, InetSocketAddress addr, UserGroupInformation ticket,
    Configuration conf, SocketFactory factory) throws IOException {
  return getProtocolProxy(
      protocol, clientVersion, addr, ticket, conf, factory, 0, null);
{code}
(the '0' above is the rpc timeout). Clients should be able to specify the rpc timeout. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5707) JobClient does not allow to setting RPC timeout for communications with JT/RM
[ https://issues.apache.org/jira/browse/MAPREDUCE-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5707: -- Summary: JobClient does not allow to setting RPC timeout for communications with JT/RM (was: ApplicationClientProtocolPBClientImpl (and JobClient) does not allow to set rpcTimeout) JobClient does not allow to setting RPC timeout for communications with JT/RM - Key: MAPREDUCE-5707 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5707 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 1.2.1, 2.2.0 Reporter: Gilad Wolff The ApplicationClientProtocolPBClientImpl c'tor (and the JobClient 0.20.2 c'tor as well) creates an rpc proxy that eventually uses '0' as the rpcTimeout:
{code}
public ApplicationClientProtocolPBClientImpl(long clientVersion,
    InetSocketAddress addr, Configuration conf) throws IOException {
  RPC.setProtocolEngine(conf, ApplicationClientProtocolPB.class,
      ProtobufRpcEngine.class);
  proxy = RPC.getProxy(ApplicationClientProtocolPB.class, clientVersion, addr, conf);
}
{code}
which leads to this call in RPC:
{code}
public static <T> ProtocolProxy<T> getProtocolProxy(Class<T> protocol,
    long clientVersion, InetSocketAddress addr, UserGroupInformation ticket,
    Configuration conf, SocketFactory factory) throws IOException {
  return getProtocolProxy(
      protocol, clientVersion, addr, ticket, conf, factory, 0, null);
{code}
(the '0' above is the rpc timeout). Clients should be able to specify the rpc timeout. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5707) JobClient does not allow to setting RPC timeout for communications with JT/RM
[ https://issues.apache.org/jira/browse/MAPREDUCE-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5707: -- Affects Version/s: (was: 0.20.2) 1.2.1 JobClient does not allow to setting RPC timeout for communications with JT/RM - Key: MAPREDUCE-5707 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5707 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 1.2.1, 2.2.0 Reporter: Gilad Wolff The ApplicationClientProtocolPBClientImpl c'tor (and the JobClient 0.20.2 c'tor as well) creates an rpc proxy that eventually uses '0' as the rpcTimeout:
{code}
public ApplicationClientProtocolPBClientImpl(long clientVersion,
    InetSocketAddress addr, Configuration conf) throws IOException {
  RPC.setProtocolEngine(conf, ApplicationClientProtocolPB.class,
      ProtobufRpcEngine.class);
  proxy = RPC.getProxy(ApplicationClientProtocolPB.class, clientVersion, addr, conf);
}
{code}
which leads to this call in RPC:
{code}
public static <T> ProtocolProxy<T> getProtocolProxy(Class<T> protocol,
    long clientVersion, InetSocketAddress addr, UserGroupInformation ticket,
    Configuration conf, SocketFactory factory) throws IOException {
  return getProtocolProxy(
      protocol, clientVersion, addr, ticket, conf, factory, 0, null);
{code}
(the '0' above is the rpc timeout). Clients should be able to specify the rpc timeout. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
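The fix the report asks for amounts to reading a client-configurable timeout instead of hard-coding 0. A minimal sketch of that pattern, using a plain `Map` as a stand-in for Hadoop's `Configuration`; the property name `ipc.client.rpc-timeout.ms` is a hypothetical example, not necessarily the key the eventual patch used:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a plain Map stands in for Hadoop's Configuration, and the
// key name below is illustrative, not an actual Hadoop property.
public class RpcTimeoutSketch {
    static final String RPC_TIMEOUT_KEY = "ipc.client.rpc-timeout.ms";

    // Defaulting to 0 preserves today's behavior (wait forever);
    // clients that want a bounded wait can override it.
    static int getRpcTimeout(Map<String, String> conf) {
        String v = conf.get(RPC_TIMEOUT_KEY);
        return v == null ? 0 : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(getRpcTimeout(conf));   // default: 0 (no timeout)
        conf.put(RPC_TIMEOUT_KEY, "30000");
        System.out.println(getRpcTimeout(conf));   // configured: 30000
    }
}
```

Keeping 0 as the default matters for compatibility: existing deployments that rely on the infinite-wait behavior see no change unless they opt in.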
[jira] [Updated] (MAPREDUCE-5710) Backport MAPREDUCE-1305 to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5710: -- Summary: Backport MAPREDUCE-1305 to branch-1 (was: To backport MAPREDUCE-1305 to branch-1) Backport MAPREDUCE-1305 to branch-1 --- Key: MAPREDUCE-5710 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5710 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.1 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 1.3.0 Attachments: MAPREDUCE-5710.001.patch, MAPREDUCE-5710.002.patch File this bug for backporting MAPREDUCE-1305 to branch-1. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (MAPREDUCE-5651) Backport Fair Scheduler queue placement policies to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved MAPREDUCE-5651. --- Resolution: Fixed Fix Version/s: 1.3.0 Hadoop Flags: Reviewed I just committed this to branch-1. Thanks Ted! Backport Fair Scheduler queue placement policies to branch-1 Key: MAPREDUCE-5651 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5651 Project: Hadoop Map/Reduce Issue Type: Improvement Components: scheduler Reporter: Sandy Ryza Assignee: Ted Malaska Fix For: 1.3.0 Attachments: MAPREDUCE-5651.2.patch, MAPREDUCE-5651.3.patch, MAPREDUCE-5651.4.patch, MAPREDUCE-5651.5.patch, MAPREDUCE-5651.patch YARN-1392 introduced general policies for assigning applications to queues in the YARN fair scheduler. This functionality would be useful and minimally invasive in MR1 as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5611) CombineFileInputFormat only requests a single location per split when more could be optimal
[ https://issues.apache.org/jira/browse/MAPREDUCE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13863446#comment-13863446 ] Sandy Ryza commented on MAPREDUCE-5611: --- bq. What if, JT is not able to schedule tasks on this node (slot limitation etc). Then it will pick any random node and schedule the task (having all the blocks non local). That's right. However, picking a node with a small fraction of the input data is not much better than picking a node without any of the input data. It is only useful to place a task on a node if the majority of the data is on that node. There may be more optimal approaches to this that take into account the number of bytes on each node, but I think using the intersection is a good start that we know will not cause perf regressions. bq. What if there is no intersection i.e. common nodes for blocks in a split? The change was proposed to affect the code where we are building splits out of the nodeToBlocks map. In this part of the split creation process, there will always be an intersection because the blocks are all chosen from a specific node. CombineFileInputFormat only requests a single location per split when more could be optimal --- Key: MAPREDUCE-5611 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5611 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.1 Reporter: Chandra Prakash Bhagtani Assignee: Chandra Prakash Bhagtani Attachments: CombineFileInputFormat-trunk.patch I have come across an issue with CombineFileInputFormat. Actually I ran a hive query on approx 1.2 GB data with CombineHiveInputFormat which internally uses CombineFileInputFormat. My cluster size is 9 datanodes and max.split.size is 256 MB. When I ran this query with replication factor 9, hive consistently creates all 6 rack-local tasks and with replication factor 3 it creates 5 rack-local and 1 data-local task.
When replication factor is 9 (equal to cluster size), all the tasks should be data-local as each datanode contains all the replicas of the input data, but that is not happening i.e. all the tasks are rack-local. When I dug into CombineFileInputFormat.java code in getMoreSplits method, I found the issue with the following snippet (especially in case of higher replication factor)
{code:title=CombineFileInputFormat.java|borderStyle=solid}
    for (Iterator<Map.Entry<String, List<OneBlockInfo>>> iter =
        nodeToBlocks.entrySet().iterator(); iter.hasNext();) {
      Map.Entry<String, List<OneBlockInfo>> one = iter.next();
      nodes.add(one.getKey());
      List<OneBlockInfo> blocksInNode = one.getValue();
      // for each block, copy it into validBlocks. Delete it from
      // blockToNodes so that the same block does not appear in
      // two different splits.
      for (OneBlockInfo oneblock : blocksInNode) {
        if (blockToNodes.containsKey(oneblock)) {
          validBlocks.add(oneblock);
          blockToNodes.remove(oneblock);
          curSplitSize += oneblock.length;
          // if the accumulated split size exceeds the maximum, then
          // create this split.
          if (maxSize != 0 && curSplitSize >= maxSize) {
            // create an input split and add it to the splits array
            addCreatedSplit(splits, nodes, validBlocks);
            curSplitSize = 0;
            validBlocks.clear();
          }
        }
      }
{code}
First node in the map nodeToBlocks has all the replicas of input file, so the above code creates 6 splits all with only one location. Now if JT doesn't schedule these tasks on that node, all the tasks will be rack-local, even though all the other datanodes have all the other replicas. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
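The "intersection" idea from the discussion above can be sketched as follows: a split's candidate locations are the nodes that hold *every* block in the split, rather than the single node the split happened to be built from. This is an illustration with plain strings, not the actual CombineFileInputFormat fields; it assumes each block has at least one location.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of computing a split's location set as the intersection of the
// location sets of its constituent blocks. Node names are illustrative.
public class SplitLocationSketch {
    static Set<String> commonNodes(List<Set<String>> blockLocations) {
        Set<String> common = new HashSet<>(blockLocations.get(0));
        for (Set<String> nodes : blockLocations) {
            common.retainAll(nodes);  // keep only nodes holding every block
        }
        return common;
    }

    public static void main(String[] args) {
        // When replication equals cluster size, every node holds every block,
        // so the intersection is the whole cluster and any node is data-local.
        List<Set<String>> locs = Arrays.asList(
            new HashSet<>(Arrays.asList("n1", "n2", "n3")),
            new HashSet<>(Arrays.asList("n1", "n3")),
            new HashSet<>(Arrays.asList("n1", "n2", "n3")));
        System.out.println(commonNodes(locs));  // contains n1 and n3
    }
}
```

As Sandy notes above, in the nodeToBlocks phase the intersection is never empty, because all the blocks in a split were chosen from one specific node, which is therefore always a member.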
[jira] [Commented] (MAPREDUCE-5651) Backport Fair Scheduler queue placement policies to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13863910#comment-13863910 ] Sandy Ryza commented on MAPREDUCE-5651: --- One more thing: have you checked that all the tests in TestFairScheduler pass? Backport Fair Scheduler queue placement policies to branch-1 Key: MAPREDUCE-5651 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5651 Project: Hadoop Map/Reduce Issue Type: Improvement Components: scheduler Reporter: Sandy Ryza Assignee: Ted Malaska Attachments: MAPREDUCE-5651.2.patch, MAPREDUCE-5651.3.patch, MAPREDUCE-5651.4.patch, MAPREDUCE-5651.5.patch, MAPREDUCE-5651.patch YARN-1392 introduced general policies for assigning applications to queues in the YARN fair scheduler. This functionality would be useful and minimally invasive in MR1 as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5651) Backport Fair Scheduler queue placement policies to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5651: -- Assignee: Ted Malaska Backport Fair Scheduler queue placement policies to branch-1 Key: MAPREDUCE-5651 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5651 Project: Hadoop Map/Reduce Issue Type: Improvement Components: scheduler Reporter: Sandy Ryza Assignee: Ted Malaska Attachments: MAPREDUCE-5651.2.patch, MAPREDUCE-5651.3.patch, MAPREDUCE-5651.4.patch, MAPREDUCE-5651.5.patch, MAPREDUCE-5651.patch YARN-1392 introduced general policies for assigning applications to queues in the YARN fair scheduler. This functionality would be useful and minimally invasive in MR1 as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5651) Backport Fair Scheduler queue placement policies to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860701#comment-13860701 ] Sandy Ryza commented on MAPREDUCE-5651: --- Thanks Ted. getSimplePlacementRules still needs the change at the bottom of my last comment. After that, the patch looks good to me. Backport Fair Scheduler queue placement policies to branch-1 Key: MAPREDUCE-5651 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5651 Project: Hadoop Map/Reduce Issue Type: Improvement Components: scheduler Reporter: Sandy Ryza Attachments: MAPREDUCE-5651.2.patch, MAPREDUCE-5651.3.patch, MAPREDUCE-5651.patch YARN-1392 introduced general policies for assigning applications to queues in the YARN fair scheduler. This functionality would be useful and minimally invasive in MR1 as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5651) Backport Fair Scheduler queue placement policies to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859648#comment-13859648 ] Sandy Ryza commented on MAPREDUCE-5651: --- Thanks Ted. A few more comments:
{code}
+//CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING,
+placementPolicyConfig.setClass("hadoop.security.group.mapping",
{code}
Can we take out the comment and just use CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING as the argument? All mentions of queue should still be replaced with mentions of pool.
{code}
+rules.add(new QueuePlacementRule.Specified().initialize(true, null));
{code}
In the simple placement rules, the create param on the Specified rule should only be true if the allow-undeclared-pools property is true. Backport Fair Scheduler queue placement policies to branch-1 Key: MAPREDUCE-5651 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5651 Project: Hadoop Map/Reduce Issue Type: Improvement Components: scheduler Reporter: Sandy Ryza Attachments: MAPREDUCE-5651.2.patch, MAPREDUCE-5651.patch YARN-1392 introduced general policies for assigning applications to queues in the YARN fair scheduler. This functionality would be useful and minimally invasive in MR1 as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5698) Backport MAPREDUCE-1285 to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13859677#comment-13859677 ] Sandy Ryza commented on MAPREDUCE-5698: --- +1 Backport MAPREDUCE-1285 to branch-1 --- Key: MAPREDUCE-5698 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5698 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.1 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 1.3.0 Attachments: MAPREDUCE-5698.001.patch I found that MAPREDUCE-1285 is not in branch-1. File this issue for backporting. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5698) Backport MAPREDUCE-1285 to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5698: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I just committed this. Thanks Yongjun! Backport MAPREDUCE-1285 to branch-1 --- Key: MAPREDUCE-5698 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5698 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.1 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 1.3.0 Attachments: MAPREDUCE-5698.001.patch I found that MAPREDUCE-1285 is not in branch-1. File this issue for backporting. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (MAPREDUCE-5691) Throttle shuffle's bandwidth utilization
[ https://issues.apache.org/jira/browse/MAPREDUCE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857249#comment-13857249 ] Sandy Ryza commented on MAPREDUCE-5691: --- In that case, adding network IO as a YARN resource and limiting it using cgroups might be a way to solve this problem as well. Throttle shuffle's bandwidth utilization Key: MAPREDUCE-5691 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5691 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Liyin Liang Attachments: ganglia-slave.jpg In our hadoop cluster, a reducer of a big job can utilize all the bandwidth during shuffle phase. Then any task reading data from the machine which is running that reducer becomes very, very slow. It's better to move DataTransferThrottler from hadoop-hdfs to hadoop-common, and create a throttler for Shuffle to throttle each Fetcher. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
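A minimal sketch of the kind of per-fetcher throttler the issue proposes moving out of hadoop-hdfs. This is not the actual DataTransferThrottler API, just the core rate computation: given a bandwidth cap, how long must a fetcher pause after moving some bytes so the average rate stays under the cap.

```java
// Illustration only: the method name and units are assumptions, not the
// real DataTransferThrottler interface.
public class ThrottleSketch {
    // Extra milliseconds to sleep so that (bytes / totalElapsed) stays at
    // or under bytesPerMs.
    static long requiredPauseMs(long bytes, long elapsedMs, long bytesPerMs) {
        long minElapsedMs = bytes / bytesPerMs;  // time the transfer should take
        return Math.max(0, minElapsedMs - elapsedMs);
    }

    public static void main(String[] args) {
        // 10 MB moved in 100 ms under a 50 KB/ms cap should have taken 200 ms,
        // so the fetcher pauses another 100 ms.
        System.out.println(requiredPauseMs(10_000_000, 100, 50_000));  // 100
    }
}
```

This also shows why the cgroups alternative raised in the comment is attractive: an in-process sleep loop like this only governs one fetcher's average rate, whereas an OS-level network-IO limit covers everything the container does.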
[jira] [Updated] (MAPREDUCE-5692) Add explicit diagnostics when a task attempt is killed due to speculative execution
[ https://issues.apache.org/jira/browse/MAPREDUCE-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5692: -- Resolution: Fixed Fix Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I just committed this to trunk and branch-2. Thanks Gera! Add explicit diagnostics when a task attempt is killed due to speculative execution --- Key: MAPREDUCE-5692 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5692 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 2.2.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Fix For: 2.4.0 Attachments: MAPREDUCE-5692.v01.patch, MAPREDUCE-5692.v02.patch We need to clearly indicate when a task attempt is killed because another task attempt succeeded first when speculative execution is enabled. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (MAPREDUCE-5550) Task Status message (reporter.setStatus) not shown in UI with Hadoop 2.0
[ https://issues.apache.org/jira/browse/MAPREDUCE-5550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854508#comment-13854508 ] Sandy Ryza commented on MAPREDUCE-5550: --- Just noticed: what's the reason for this change?
{code}
-ta.getLaunchTime() - ta.getShuffleFinishTime(), elapsedShuffleTime);
+ta.getShuffleFinishTime() - ta.getLaunchTime(), elapsedShuffleTime);
{code}
Task Status message (reporter.setStatus) not shown in UI with Hadoop 2.0 Key: MAPREDUCE-5550 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5550 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.5-alpha Reporter: Vrushali C Assignee: Gera Shegalov Attachments: MAPREDUCE-5550.v01.patch, MAPREDUCE-5550.v02.patch, MAPREDUCE-5550.v03.patch, MAPREDUCE-5550.v04.patch, Map_tasks_new_UI.png, Map_tasks_oldUI.png, Screen Shot 2013-10-15 at 11.15.24 AM.png, Screen Shot 2013-10-15 at 11.16.02 AM.png Hadoop 1.0 JobTracker UI displays the task status message when the list of mapper or reduce tasks is shown. This gives an idea of how that task is making progress. Hadoop 2.0 AM/JHS UI does not have this. It would be good to have this on AM/JHS UI. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (MAPREDUCE-5550) Task Status message (reporter.setStatus) not shown in UI with Hadoop 2.0
[ https://issues.apache.org/jira/browse/MAPREDUCE-5550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854546#comment-13854546 ] Sandy Ryza commented on MAPREDUCE-5550: --- Makes sense. +1 Task Status message (reporter.setStatus) not shown in UI with Hadoop 2.0 Key: MAPREDUCE-5550 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5550 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.5-alpha Reporter: Vrushali C Assignee: Gera Shegalov Attachments: MAPREDUCE-5550.v01.patch, MAPREDUCE-5550.v02.patch, MAPREDUCE-5550.v03.patch, MAPREDUCE-5550.v04.patch, Map_tasks_new_UI.png, Map_tasks_oldUI.png, Screen Shot 2013-10-15 at 11.15.24 AM.png, Screen Shot 2013-10-15 at 11.16.02 AM.png Hadoop 1.0 JobTracker UI displays the task status message when the list of mapper or reduce tasks is shown. This gives an idea of how that task is making progress. Hadoop 2.0 AM/JHS UI does not have this. It would be good to have this on AM/JHS UI. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (MAPREDUCE-5550) Task Status message (reporter.setStatus) not shown in UI with Hadoop 2.0
[ https://issues.apache.org/jira/browse/MAPREDUCE-5550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated MAPREDUCE-5550: -- Resolution: Fixed Fix Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I just committed this to trunk and branch-2. Thanks Gera. Task Status message (reporter.setStatus) not shown in UI with Hadoop 2.0 Key: MAPREDUCE-5550 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5550 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.5-alpha Reporter: Vrushali C Assignee: Gera Shegalov Fix For: 2.4.0 Attachments: MAPREDUCE-5550.v01.patch, MAPREDUCE-5550.v02.patch, MAPREDUCE-5550.v03.patch, MAPREDUCE-5550.v04.patch, Map_tasks_new_UI.png, Map_tasks_oldUI.png, Screen Shot 2013-10-15 at 11.15.24 AM.png, Screen Shot 2013-10-15 at 11.16.02 AM.png Hadoop 1.0 JobTracker UI displays the task status message when the list of mapper or reduce tasks is shown. This gives an idea of how that task is making progress. Hadoop 2.0 AM/JHS UI does not have this. It would be good to have this on AM/JHS UI. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (MAPREDUCE-5691) Throttle shuffle's bandwidth utilization
[ https://issues.apache.org/jira/browse/MAPREDUCE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854570#comment-13854570 ] Sandy Ryza commented on MAPREDUCE-5691: --- Would the throttling go on the server (NodeManager) side or the client (reducer) side? Throttle shuffle's bandwidth utilization Key: MAPREDUCE-5691 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5691 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Liyin Liang In our hadoop cluster, a reducer of a big job can utilize all the bandwidth during shuffle phase. Then any task reading data from the machine which is running that reducer becomes very, very slow. It's better to move DataTransferThrottler from hadoop-hdfs to hadoop-common, and create a throttler for Shuffle to throttle each Fetcher. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (MAPREDUCE-5650) Job fails when hprof mapreduce.task.profile.map/reduce.params is specified
[ https://issues.apache.org/jira/browse/MAPREDUCE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853061#comment-13853061 ] Sandy Ryza commented on MAPREDUCE-5650: --- In that case I agree with you that this is a bug. MapReduce properties that have specific map/reduce versions (e.g. mapreduce.map.java.opts) should ignore the generic version (e.g. mapred.child.java.opts) when the specific versions are specified. I'd like to get some other opinions on whether this is compatible before committing it though. Job fails when hprof mapreduce.task.profile.map/reduce.params is specified -- Key: MAPREDUCE-5650 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5650 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.2.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5650.v01.patch, MAPREDUCE-5650.v02.patch When one uses dedicated hprof mapreduce.task.profile.map.params or mapreduce.task.profile.reduce.params, the profiled tasks will fail to launch because hprof parameters are supplied to the child jvm twice. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
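The precedence rule described above can be sketched as follows: when a task-type-specific property is set, the generic fallback is ignored rather than concatenated (concatenation is what supplied the hprof agent twice). A plain `Map` stands in for Hadoop's `Configuration`; this is an illustration of the rule, not the actual JobConf code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: the lookup logic, with a Map standing in for Configuration.
public class OptsPrecedenceSketch {
    static String taskJavaOpts(Map<String, String> conf, boolean isMap) {
        String specific = conf.get(isMap
            ? "mapreduce.map.java.opts" : "mapreduce.reduce.java.opts");
        if (specific != null) {
            return specific;  // specific version wins outright
        }
        // only fall back to the generic property when no specific one is set
        return conf.getOrDefault("mapred.child.java.opts", "");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapred.child.java.opts", "-Xmx200m");
        conf.put("mapreduce.map.java.opts", "-Xmx1g");
        System.out.println(taskJavaOpts(conf, true));   // -Xmx1g
        System.out.println(taskJavaOpts(conf, false));  // -Xmx200m
    }
}
```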
[jira] [Commented] (MAPREDUCE-5650) Job fails when hprof mapreduce.task.profile.map/reduce.params is specified
[ https://issues.apache.org/jira/browse/MAPREDUCE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853065#comment-13853065 ] Sandy Ryza commented on MAPREDUCE-5650: --- For reference, here is the JIRA where these options were added: MAPREDUCE-3426. It seems unrelated to them. Job fails when hprof mapreduce.task.profile.map/reduce.params is specified -- Key: MAPREDUCE-5650 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5650 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.2.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5650.v01.patch, MAPREDUCE-5650.v02.patch When one uses dedicated hprof mapreduce.task.profile.map.params or mapreduce.task.profile.reduce.params, the profiled tasks will fail to launch because hprof parameters are supplied to the child jvm twice. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (MAPREDUCE-5692) Add explicit diagnostics when a task attempt is killed due to speculative execution
[ https://issues.apache.org/jira/browse/MAPREDUCE-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853396#comment-13853396 ] Sandy Ryza commented on MAPREDUCE-5692: --- The patch looks good to me other than a couple small things:
{code}
+task.eventHandler.handle(new TaskAttemptKillEvent(attemptID,
+  SPECULATION + task.commitAttempt + " committed first!"));
{code}
Second line should be indented four spaces past the first line. This applies to a few places.
{code}
-job.setSpeculativeExecution(false);
{code}
Why is this necessary?
{code}
+  private static TaskAttempt[] makeFirstAttemptWin(
+      EventHandler appEventHandler, Task speculatedTask)
+  {
{code}
Curly brace should go on same line as method definition
{code}
+    TaskAttempt[] ta; // finish 1st TA, 2nd will be killed
+    final Iterator<TaskAttempt> it = speculatedTask.getAttempts().
+        values().iterator();
+    ta = new TaskAttempt[] { it.next(), it.next() };
{code}
can use toArray here? Add explicit diagnostics when a task attempt is killed due to speculative execution --- Key: MAPREDUCE-5692 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5692 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 2.2.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5692.v01.patch We need to clearly indicate when a task attempt is killed because another task attempt succeeded first when speculative execution is enabled. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
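The "can use toArray here?" suggestion above looks like this in isolation: a single `toArray` call replaces the manual iterator juggling. Strings stand in for TaskAttempt objects in this sketch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration of Collection.toArray replacing two it.next() calls.
// A LinkedHashMap keeps insertion order, mirroring the attempt ordering
// the test above relies on.
public class ToArraySketch {
    public static void main(String[] args) {
        Map<String, String> attempts = new LinkedHashMap<>();
        attempts.put("attempt_0", "first");
        attempts.put("attempt_1", "second");

        // One call instead of iterator().next() twice:
        String[] ta = attempts.values().toArray(new String[0]);
        System.out.println(Arrays.toString(ta));  // [first, second]
    }
}
```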
[jira] [Commented] (MAPREDUCE-5692) Add explicit diagnostics when a task attempt is killed due to speculative execution
[ https://issues.apache.org/jira/browse/MAPREDUCE-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13853476#comment-13853476 ] Sandy Ryza commented on MAPREDUCE-5692: --- +1 pending jenkins Add explicit diagnostics when a task attempt is killed due to speculative execution --- Key: MAPREDUCE-5692 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5692 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 2.2.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5692.v01.patch, MAPREDUCE-5692.v02.patch We need to clearly indicate when a task attempt is killed because another task attempt succeeded first when speculative execution is enabled. -- This message was sent by Atlassian JIRA (v6.1.4#6159)