[jira] [Created] (MAPREDUCE-5045) UtilTest#isCygwin method appears to be unused

2013-03-05 Thread Chris Nauroth (JIRA)
Chris Nauroth created MAPREDUCE-5045:


 Summary: UtilTest#isCygwin method appears to be unused
 Key: MAPREDUCE-5045
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5045
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: contrib/streaming, test
Affects Versions: 3.0.0
Reporter: Chris Nauroth
Priority: Trivial


Method {{UtilTest#isCygwin}} in
/hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/streaming/UtilTest.java
appears to be unused. If so, it should be removed. If anything does call it,
the method should be renamed to isWindows, or the call sites should simply be
changed to use {{Shell#WINDOWS}}.
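If call sites do turn out to exist, a renamed helper could look roughly like the standalone sketch below. It assumes that {{Shell#WINDOWS}} amounts to checking whether the os.name system property starts with "Windows". The class and method names are hypothetical, and real call sites would more likely reference Shell.WINDOWS directly.

```java
// Hypothetical sketch of a renamed helper replacing UtilTest#isCygwin.
// Assumption: Shell#WINDOWS boils down to an os.name prefix check.
final class WindowsCheckSketch {
    private WindowsCheckSketch() {}

    // Pure function so the check can be exercised with any OS name.
    static boolean isWindows(String osName) {
        return osName != null && osName.startsWith("Windows");
    }

    // Convenience overload that inspects the running JVM.
    static boolean isWindows() {
        return isWindows(System.getProperty("os.name"));
    }
}
```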

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-5046) backport MAPREDUCE-1423 to mapred.lib.CombineFileInputFormat

2013-03-05 Thread Sangjin Lee (JIRA)
Sangjin Lee created MAPREDUCE-5046:

 Summary: backport MAPREDUCE-1423 to mapred.lib.CombineFileInputFormat
 Key: MAPREDUCE-5046
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5046
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 1.1.1
Reporter: Sangjin Lee


The CombineFileInputFormat class in org.apache.hadoop.mapred.lib (the old API) 
has a couple of issues. These issues were addressed in the new API 
(MAPREDUCE-1423), but the old class was not fixed.

The main issue MAPREDUCE-1423 addresses is a performance problem. In my
opinion, however, a more serious problem is the thread-safety issue
(rackToNodes) that was fixed alongside it.
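To illustrate the class of bug described above, the sketch below contrasts a rack-to-nodes map kept in a shared instance field with one built locally for each call. It is a simplified, hypothetical model (the Block type and the method names are invented for illustration), not the actual CombineFileInputFormat code, and it assumes the old class keeps rackToNodes as mutable state shared across getSplits invocations.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the rackToNodes bookkeeping; not Hadoop code.
class RackToNodesSketch {
    // Simplified block location: parallel arrays of rack names and hosts.
    static final class Block {
        final String[] racks;
        final String[] hosts;

        Block(String[] racks, String[] hosts) {
            this.racks = racks;
            this.hosts = hosts;
        }
    }

    // Unsafe pattern: one plain HashMap shared by every call on this
    // instance. Concurrent callers can corrupt it, and even sequential
    // callers see racks left over from earlier calls.
    private final Map<String, Set<String>> sharedRackToNodes = new HashMap<>();

    Map<String, Set<String>> buildShared(List<Block> blocks) {
        for (Block b : blocks) {
            addAll(sharedRackToNodes, b);
        }
        return sharedRackToNodes;
    }

    // Safer pattern: each call builds and returns its own map, so no
    // mutable state is shared between callers.
    static Map<String, Set<String>> buildLocal(List<Block> blocks) {
        Map<String, Set<String>> rackToNodes = new HashMap<>();
        for (Block b : blocks) {
            addAll(rackToNodes, b);
        }
        return rackToNodes;
    }

    private static void addAll(Map<String, Set<String>> rackToNodes, Block b) {
        int n = Math.min(b.racks.length, b.hosts.length);
        for (int i = 0; i < n; i++) {
            rackToNodes.computeIfAbsent(b.racks[i], k -> new HashSet<>()).add(b.hosts[i]);
        }
    }
}
```

Keeping the map local (or passing it explicitly between helper methods) trades a small per-call allocation for not needing any synchronization.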

What is the policy on addressing issues in the old API? Can we backport this to 
the old class?



[jira] [Created] (MAPREDUCE-5047) keep.failed.task.files=true causes job failure on secure clusters

2013-03-05 Thread Sandy Ryza (JIRA)
Sandy Ryza created MAPREDUCE-5047:

 Summary: keep.failed.task.files=true causes job failure on secure clusters
 Key: MAPREDUCE-5047
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5047
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task, tasktracker
Affects Versions: 1.1.1
Reporter: Sandy Ryza
Assignee: Sandy Ryza


To support IsolationRunner, split info is written to local directories. This
occurs inside MapTask#localizeConfiguration, which is called both by the
tasktracker and by the child JVM. On a secure cluster, the tasktracker's
attempt to write the split info fails, because the tasktracker does not have
permission to write to the user's directory. It is likely that the call to
localizeConfiguration in the tasktracker can be removed.



[jira] [Created] (MAPREDUCE-5048) streaming combiner feature breaks when input binary, output text

2013-03-05 Thread Antonio Piccolboni (JIRA)
Antonio Piccolboni created MAPREDUCE-5048:

 Summary: streaming combiner feature breaks when input binary, output text
 Key: MAPREDUCE-5048
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5048
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 1.0.2
 Environment: centos 6.2
Reporter: Antonio Piccolboni


When running a Hadoop streaming job with binary (typedbytes) input and shuffle
data but text output, with a combiner enabled, the job fails with the error:

java.lang.RuntimeException: java.io.IOException: wrong key class: class org.apache.hadoop.io.Text is not class org.apache.hadoop.typedbytes.TypedBytesWritable
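The message is consistent with a key-class check of the kind performed when intermediate records are written: the writer is bound to one key class and rejects a record whose key has a different class. One plausible reading, offered as an assumption rather than a confirmed diagnosis, is that the combiner's output is written as Text (following the text reduce-output setting) while the intermediate key class is TypedBytesWritable. The self-contained sketch below uses hypothetical stand-in classes, not the real Hadoop ones, to show how such a check produces this kind of error.

```java
import java.io.IOException;

// Hypothetical stand-ins; these are NOT the real Hadoop classes.
class KeyClassCheckSketch {
    static final class Text {}
    static final class TypedBytesWritable {}

    // Minimal writer that is bound to a single key class when created,
    // similar in spirit to a writer for intermediate map output.
    static final class TypedWriter {
        private final Class<?> keyClass;
        private int records = 0;

        TypedWriter(Class<?> keyClass) {
            this.keyClass = keyClass;
        }

        void append(Object key) throws IOException {
            // An exact class match is required; any other class is rejected.
            if (key.getClass() != keyClass) {
                throw new IOException("wrong key class: " + key.getClass()
                        + " is not " + keyClass);
            }
            records++;
        }

        int records() {
            return records;
        }
    }
}
```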


Repro:

hadoop jar <streaming jar> -D 'stream.map.input=typedbytes' \
  -D 'stream.map.output=typedbytes' -D 'stream.reduce.input=typedbytes' \
  -input <sequence file containing typedbytes> -output <any valid dir> \
  -mapper cat -combiner cat -reducer cat \
  -inputformat 'org.apache.hadoop.streaming.AutoInputFormat'

If you remove the -combiner option, the job works, at only a performance cost.
If you additionally specify -D 'stream.reduce.output=typedbytes', the job
succeeds but outputs raw typedbytes (without the sequence file superstructure).

I asked in the discussion of HADOOP-1722 (where typedbytes was first
introduced) whether this is a bug or a misunderstanding of that spec on my
part, and a committer chipped in saying it seems like a bug to him too.
This was originally reported by a user of the rmr2 package for R and filed by
me here: https://github.com/RevolutionAnalytics/rmr2/issues/16



[jira] [Created] (MAPREDUCE-5049) CombineFileInputFormat counts all compressed files non-splitable

2013-03-05 Thread Sandy Ryza (JIRA)
Sandy Ryza created MAPREDUCE-5049:

 Summary: CombineFileInputFormat counts all compressed files non-splitable
 Key: MAPREDUCE-5049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5049
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1
Reporter: Sandy Ryza
Assignee: Sandy Ryza


In branch-1, CombineFileInputFormat doesn't take SplittableCompressionCodec
into account and treats all compressed input files as non-splittable. This is
a regression from the handling of non-splittable compression codecs that was
originally added in MAPREDUCE-1597, and it seems to have crept in when the code
was pulled from 0.22 into branch-1.
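The expected behavior can be sketched as follows: a file with no compression codec is splittable, and a compressed file is splittable only if its codec implements SplittableCompressionCodec (as Hadoop's bzip2 codec does). The interfaces below are simplified, hypothetical stand-ins for Hadoop's CompressionCodec and SplittableCompressionCodec so the example compiles on its own. The codec lookup by file suffix, which Hadoop performs through CompressionCodecFactory, is omitted.

```java
// Hypothetical, self-contained model of the splittability decision.
final class SplittabilitySketch {
    // Stand-ins mirroring the shape of Hadoop's codec interfaces.
    interface CompressionCodec {}
    interface SplittableCompressionCodec extends CompressionCodec {}

    // Example codecs: one non-splittable, one splittable.
    static final class GzipLikeCodec implements CompressionCodec {}
    static final class Bzip2LikeCodec implements SplittableCompressionCodec {}

    private SplittabilitySketch() {}

    // Behavior described in the report: any compressed file is treated
    // as non-splittable.
    static boolean isSplitableBuggy(CompressionCodec codec) {
        return codec == null;
    }

    // Expected behavior: uncompressed files are splittable, and compressed
    // files are splittable only when their codec supports it.
    static boolean isSplitable(CompressionCodec codec) {
        if (codec == null) {
            return true;
        }
        return codec instanceof SplittableCompressionCodec;
    }
}
```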

