[jira] [Comment Edited] (MAPREDUCE-5119) Splitting issue when using NLineInputFormat with compression

2013-03-29 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617327#comment-13617327
 ] 

Suresh Srinivas edited comment on MAPREDUCE-5119 at 3/29/13 1:28 PM:
-

bq. It could be a bug, for Hadoop not splitting compressed data correctly using 
NLineInputFormat. 
The description of the jira made it sound like you were asking a question. 
There are many such jiras created in Hadoop where jira is misused for asking 
questions. Perhaps this could be a bug. So reopening is the right thing to do. 
I will ask someone with more mapreduce background to comment on this.

I am also moving this jira to MapReduce.

  was (Author: sureshms):
bq. It could be a bug, for Hadoop not splitting compressed data correctly 
using NLineInputFormat. 
The description of the jira made is sound like you were asking a question. 
There are many such jiras created in Hadoop where jira is misused for asking 
questions. Perhaps this could be a bug. So reopening is the right thing to do. 
I will ask someone with more mapreduce background to comment on this.

I am also moving this to jira to MapReduce.
  
 Splitting issue when using NLineInputFormat with compression
 

 Key: MAPREDUCE-5119
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5119
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.2
 Environment: Tried in Apache Hadoop 1.1.1, CDH4, and Amazon EMR. Same 
 result.
Reporter: Qiming He
Priority: Minor

 # Make a long single-line text file. Only long lines seem to trigger the issue.
 $ cat abook.txt | base64 -w 0 > onelinetext.b64 # 200KB+ long
 $ hadoop fs -put onelinetext.b64 /input/onelinetext.b64
 $ hadoop jar hadoop-streaming.jar  \
 -input /input/onelinetext.b64 \
 -output /output \
 -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
 -mapper wc
 Num task: 1, and output has one line:
 Line 1: 1 2 202699
 which makes sense because one line per mapper is intended.
 Then, using compression with NLineInputFormat 
 $ bzip2 onelinetext.b64
 $ hadoop fs -put onelinetext.b64.bz2  /input/onelinetext.b64.bz2
 $ hadoop jar hadoop-streaming.jar \
   -Dmapred.input.compress=true \
   -Dmapred.input.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
   -input /input/onelinetext.b64.gz \
   -output /output \
   -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
   -mapper wc
 I expected the same result as above, because decompression should occur 
 before the one-line text is processed (i.e. by wc). Instead, I am getting:
 Num task: 397 (or another large number, depending on the environment), and 
 the output has 397 lines:
 Line1-396: 0 0 0
 Line 397: 1 2 202699
 Any idea why there are so many map tasks (mapred.map.tasks >> 1)? Is it 
 incorrect splitting? I purposely chose gzip because I believe it is NOT 
 splittable. I got similar results when using the bzip2 and lzop codecs.
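
 One hypothesis that would fit the numbers above, and that nothing in this 
 thread confirms, is that the split computation counts lines on the raw 
 compressed bytes instead of on the decompressed text. The sketch below does 
 not use Hadoop at all. It only shows that a gzip stream of a single long 
 base64 line usually contains many 0x0A bytes, which a reader scanning the 
 compressed file directly would treat as line ends. The class and method 
 names are made up for illustration, and the input is seeded random data 
 rather than the reporter's abook.txt.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Hadoop: counts newline bytes the way a
// naive line reader would if it scanned a file without decompressing it.
final class RawNewlineCount {

    // Number of 0x0A bytes in the buffer.
    static int countNewlines(byte[] data) {
        int n = 0;
        for (byte b : data) {
            if (b == '\n') {
                n++;
            }
        }
        return n;
    }

    // Builds one long base64 line (200,000 characters) ending in one newline.
    // The basic (non-MIME) encoder never inserts line breaks.
    static byte[] oneLongLine(long seed) {
        byte[] raw = new byte[150_000];
        new Random(seed).nextBytes(raw);
        String line = Base64.getEncoder().encodeToString(raw) + "\n";
        return line.getBytes(StandardCharsets.US_ASCII);
    }

    // Gzip-compresses the buffer in memory.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] plain = oneLongLine(42L);
        byte[] compressed = gzip(plain);
        System.out.println("newlines in plain text:       " + countNewlines(plain));
        System.out.println("newlines in compressed bytes: " + countNewlines(compressed));
    }
}
```

 If the second count is far above 1 while the first is exactly 1, a 
 one-line-per-split policy applied to the compressed bytes would produce many 
 splits, most of which hold no complete line of real text. Whether 
 NLineInputFormat actually behaves this way in the affected versions would 
 need to be checked against its source.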

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5086) MR app master deletes staging dir when sent a reboot command from the RM

2013-03-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617757#comment-13617757
 ] 

Jason Lowe commented on MAPREDUCE-5086:
---

There are other examples of tests using MRAppMaster during multiple tests, and 
MRApp is one example of how this could be handled.  For example, we could put a 
DefaultMetricsSystem.shutdown call in a try..finally block around the tests or 
have an After method that calls it to make sure the metric registrations are 
cleaned up after each test.
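
The try..finally option could look roughly like the sketch below. It is a 
model only: FakeMetricsSystem is a hypothetical stand-in written for this 
example so that it runs without Hadoop on the classpath, and the source name 
"MRAppMetrics" is likewise invented. In a real test the cleanup line would be 
the DefaultMetricsSystem.shutdown call mentioned above.

```java
import java.util.HashSet;
import java.util.Set;

final class MetricsCleanupSketch {

    // Hypothetical stand-in for a process-wide metrics registry. It is NOT
    // Hadoop's DefaultMetricsSystem; it only models "each name registers once".
    static final class FakeMetricsSystem {
        private static final Set<String> SOURCES = new HashSet<>();

        static void register(String name) {
            if (!SOURCES.add(name)) {
                throw new IllegalStateException(
                    "Metrics source " + name + " already exists");
            }
        }

        // Drops every registration, leaving the registry empty.
        static void shutdown() {
            SOURCES.clear();
        }

        static int registeredCount() {
            return SOURCES.size();
        }
    }

    // Models one test that starts an app master, which registers a source.
    static void runOneTest() {
        try {
            FakeMetricsSystem.register("MRAppMetrics");
            // ... the test body would go here ...
        } finally {
            // Always clean up so the next test starts with an empty registry.
            FakeMetricsSystem.shutdown();
        }
    }

    public static void main(String[] args) {
        runOneTest();
        runOneTest(); // would throw if the first run had leaked its registration
        System.out.println("registered after tests: "
            + FakeMetricsSystem.registeredCount());
    }
}
```

An After method achieves the same effect with less repetition when many tests 
in one class start an app master, since the cleanup is then written once.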

 MR app master deletes staging dir when sent a reboot command from the RM
 

 Key: MAPREDUCE-5086
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5086
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: jian he
Assignee: jian he
 Attachments: YARN-472.1.patch, YARN-472.2.patch, YARN-472.3.patch, 
 YARN-472.4.patch, YARN-472.5.patch, YARN-472.6.patch


 If the RM is restarted when the MR job is running, then it sends a reboot 
 command to the job. The job ends up deleting the staging dir and that causes 
 the next attempt to fail.



[jira] [Updated] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core

2013-03-29 Thread Andrey Klochkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Klochkov updated MAPREDUCE-4980:
---

Attachment: MAPREDUCE-4980--n3.patch

Updating the patch according to changes in trunk

 Parallel test execution of hadoop-mapreduce-client-core
 ---

 Key: MAPREDUCE-4980
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: test
Affects Versions: 3.0.0
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: MAPREDUCE-4980.1.patch, MAPREDUCE-4980--n3.patch, 
 MAPREDUCE-4980.patch


 The Maven Surefire plugin supports a parallel testing feature. By using it, 
 the tests can be run faster.



[jira] [Commented] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core

2013-03-29 Thread Andrey Klochkov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617813#comment-13617813
 ] 

Andrey Klochkov commented on MAPREDUCE-4980:


I believe something's wrong with the QA robot. The patch is perfectly 
applicable to the current trunk.

 Parallel test execution of hadoop-mapreduce-client-core
 ---

 Key: MAPREDUCE-4980
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: test
Affects Versions: 3.0.0
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
 Attachments: MAPREDUCE-4980.1.patch, MAPREDUCE-4980--n3.patch, 
 MAPREDUCE-4980.patch


 The Maven Surefire plugin supports a parallel testing feature. By using it, 
 the tests can be run faster.
