[jira] [Created] (MAPREDUCE-5045) UtilTest#isCygwin method appears to be unused
Chris Nauroth created MAPREDUCE-5045:

Summary: UtilTest#isCygwin method appears to be unused
Key: MAPREDUCE-5045
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5045
Project: Hadoop Map/Reduce
Issue Type: Test
Components: contrib/streaming, test
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Priority: Trivial

Method {{UtilTest#isCygwin}} in /hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/UtilTest.java appears to be unused. If so, then we need to remove it. If anything is calling it, then we need to rename it to isWindows, or perhaps just change the call sites to use {{Shell#WINDOWS}}.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
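A minimal sketch of the suggested replacement, assuming the call sites only need to know whether the JVM is running on Windows. Hadoop's {{Shell#WINDOWS}} is assumed here to be derived from the os.name system property in this way; the class PlatformCheck and its isWindows helper are hypothetical stand-ins so the example compiles without Hadoop on the classpath.

```java
/**
 * Hypothetical stand-in for a test utility. Mirrors the idea behind
 * Shell#WINDOWS (assumed to be based on the os.name system property).
 */
class PlatformCheck {
    // Assumption: detect Windows from the JVM's os.name property,
    // rather than probing for a Cygwin installation.
    static final boolean WINDOWS =
            System.getProperty("os.name").startsWith("Windows");

    // Hypothetical replacement for the old UtilTest#isCygwin name.
    static boolean isWindows() {
        return WINDOWS;
    }

    public static void main(String[] args) {
        // A call site would branch on the platform flag directly.
        if (isWindows()) {
            System.out.println("Using Windows-specific test settings");
        } else {
            System.out.println("Using POSIX test settings");
        }
    }
}
```

Referencing the constant directly at each call site, as the issue suggests, avoids keeping a separate helper whose name no longer matches what it checks.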
[jira] [Created] (MAPREDUCE-5046) backport MAPREDUCE-1423 to mapred.lib.CombineFileInputFormat
Sangjin Lee created MAPREDUCE-5046:

Summary: backport MAPREDUCE-1423 to mapred.lib.CombineFileInputFormat
Key: MAPREDUCE-5046
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5046
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: client
Affects Versions: 1.1.1
Reporter: Sangjin Lee

The CombineFileInputFormat class in org.apache.hadoop.mapred.lib (the old API) has a couple of issues. These issues were addressed in the new API (MAPREDUCE-1423), but the old class was not fixed. The main issue MAPREDUCE-1423 refers to is a performance problem. However, IMO there is a more serious problem that was fixed alongside it: a thread-safety issue involving rackToNodes. What is the policy on addressing issues in the old API? Can we backport this fix to the old class?
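The report does not say exactly how MAPREDUCE-1423 fixed the rackToNodes thread-safety issue. One common pattern, assumed here purely for illustration, is to build such a rack-to-hosts mapping as a local variable inside each split computation instead of mutating a map shared across calls, so concurrent callers never touch the same HashMap. The names RackMapping and buildRackToNodes are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: the rack-to-nodes map is created per call,
 * so there is no shared mutable state between concurrent callers.
 */
class RackMapping {
    // Each pair is {rackName, hostName}; duplicates are ignored.
    static Map<String, List<String>> buildRackToNodes(String[][] rackHostPairs) {
        // Local map: owned by this call only, so no synchronization is needed.
        Map<String, List<String>> rackToNodes = new HashMap<>();
        for (String[] pair : rackHostPairs) {
            String rack = pair[0];
            String host = pair[1];
            List<String> hosts = rackToNodes.computeIfAbsent(rack, r -> new ArrayList<>());
            if (!hosts.contains(host)) {
                hosts.add(host);
            }
        }
        return rackToNodes;
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"/rack1", "host1"}, {"/rack1", "host2"},
            {"/rack2", "host3"}, {"/rack1", "host1"}
        };
        System.out.println(buildRackToNodes(pairs));
    }
}
```

A shared field could instead be made thread-safe with a concurrent map or locking, but a per-call local map removes the sharing altogether, which is usually simpler when the map is only needed while computing splits.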
[jira] [Created] (MAPREDUCE-5047) keep.failed.task.files=true causes job failure on secure clusters
Sandy Ryza created MAPREDUCE-5047:

Summary: keep.failed.task.files=true causes job failure on secure clusters
Key: MAPREDUCE-5047
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5047
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: task, tasktracker
Affects Versions: 1.1.1
Reporter: Sandy Ryza
Assignee: Sandy Ryza

To support IsolationRunner, split info is written to local directories. This happens inside MapTask#localizeConfiguration, which is called both by the tasktracker and by the child JVM. On a secure cluster, the tasktracker's attempt to write the split info fails, because the tasktracker does not have permission to write to the user's directory. It is likely that the call to localizeConfiguration in the tasktracker can be removed.
[jira] [Created] (MAPREDUCE-5048) streaming combiner feature breaks when input binary, output text
Antonio Piccolboni created MAPREDUCE-5048:

Summary: streaming combiner feature breaks when input binary, output text
Key: MAPREDUCE-5048
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5048
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/streaming
Affects Versions: 1.0.2
Environment: CentOS 6.2
Reporter: Antonio Piccolboni

When running a Hadoop Streaming job with binary (typedbytes) input and shuffle but text output, with a combiner enabled, the job fails with the error:

java.lang.RuntimeException: java.io.IOException: wrong key class: class org.apache.hadoop.io.Text is not class org.apache.hadoop.typedbytes.TypedBytesWritable

Repro:

hadoop jar <streaming jar> -D 'stream.map.input=typedbytes' -D 'stream.map.output=typedbytes' -D 'stream.reduce.input=typedbytes' -input <sequence file containing typedbytes> -output <any valid dir> -mapper cat -combiner cat -reducer cat -inputformat 'org.apache.hadoop.streaming.AutoInputFormat'

If you remove the -combiner option, the job works, with only performance implications. If you additionally specify -D 'stream.reduce.output=typedbytes', it succeeds but outputs raw typedbytes (without the sequence file superstructure).

I asked in the discussion of HADOOP-1722 (where typedbytes was first introduced) whether this is a bug or my misunderstanding of the spec, and a committer chipped in saying it seems like a bug to him too. This was originally reported by a user of the rmr2 package for R, and filed by me here: https://github.com/RevolutionAnalytics/rmr2/issues/16
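The error text has the shape of a strict key-class check on the writer for intermediate (map and combiner) output. A reading consistent with the report, though not confirmed in it, is that the combiner's output is encoded according to stream.reduce.output (presumably text when unset), producing Text keys where TypedBytesWritable keys are declared; the fact that adding -D 'stream.reduce.output=typedbytes' avoids the failure fits this reading. The Java sketch uses hypothetical stand-in classes (TextKey, TypedBytesKey, IntermediateWriter), not Hadoop's real types.

```java
import java.io.IOException;

// Hypothetical stand-ins for org.apache.hadoop.io.Text and
// org.apache.hadoop.typedbytes.TypedBytesWritable.
class TextKey {}
class TypedBytesKey {}

// Minimal sketch of an intermediate writer that enforces one declared key class.
class IntermediateWriter {
    private final Class<?> keyClass;

    IntermediateWriter(Class<?> keyClass) {
        this.keyClass = keyClass;
    }

    // True only when the key's runtime class matches the declared key class exactly.
    boolean accepts(Object key) {
        return key.getClass() == keyClass;
    }

    // Rejects mismatched keys with a message shaped like the one in the report.
    void append(Object key) throws IOException {
        if (!accepts(key)) {
            throw new IOException("wrong key class: " + key.getClass() + " is not " + keyClass);
        }
        // A real writer would serialize the key here; omitted in this sketch.
    }

    public static void main(String[] args) {
        // Intermediate keys declared as typed bytes (stream.map.output=typedbytes).
        IntermediateWriter writer = new IntermediateWriter(TypedBytesKey.class);
        try {
            writer.append(new TypedBytesKey()); // map output: accepted
            writer.append(new TextKey());       // combiner output encoded as text: rejected
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this reading, the fix would be for the combiner's output to follow the intermediate (map output) encoding rather than the final reduce output encoding.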
[jira] [Created] (MAPREDUCE-5049) CombineFileInputFormat counts all compressed files non-splitable
Sandy Ryza created MAPREDUCE-5049:

Summary: CombineFileInputFormat counts all compressed files non-splitable
Key: MAPREDUCE-5049
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5049
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Sandy Ryza
Assignee: Sandy Ryza

In branch-1, CombineFileInputFormat doesn't take SplittableCompressionCodec into account and treats all compressed input files as non-splittable. This is a regression relative to the handling for non-splittable compression codecs originally added in MAPREDUCE-1597, and it seems to have crept in when the code was pulled from 0.22 into branch-1.
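In Hadoop, SplittableCompressionCodec marks codecs whose compressed streams can be split (BZip2Codec is one; gzip is not). The sketch contrasts the branch-1 behavior described above, where any codec makes a file unsplittable, with a check that also allows splittable codecs. Codec, SplittableCodec, and the extension-based codecFor lookup are hypothetical stand-ins for Hadoop's CompressionCodec, SplittableCompressionCodec, and codec factory.

```java
// Hypothetical stand-ins for Hadoop's codec interfaces.
interface Codec {}
interface SplittableCodec extends Codec {}

class GzipLikeCodec implements Codec {}             // not splittable
class Bzip2LikeCodec implements SplittableCodec {}  // splittable

class SplitCheck {
    // Hypothetical lookup by file extension; null means "uncompressed".
    static Codec codecFor(String path) {
        if (path.endsWith(".gz")) return new GzipLikeCodec();
        if (path.endsWith(".bz2")) return new Bzip2LikeCodec();
        return null;
    }

    // Behavior described in the report: any codec means "not splittable".
    static boolean isSplitableBuggy(String path) {
        return codecFor(path) == null;
    }

    // Intended behavior: uncompressed files and splittable-codec files can be split.
    static boolean isSplitable(String path) {
        Codec codec = codecFor(path);
        return codec == null || codec instanceof SplittableCodec;
    }

    public static void main(String[] args) {
        for (String p : new String[] {"a.txt", "b.gz", "c.bz2"}) {
            System.out.println(p + " buggy=" + isSplitableBuggy(p) + " fixed=" + isSplitable(p));
        }
    }
}
```

The two checks differ only for files whose codec is splittable, which is exactly the case this issue reports as mishandled.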