[jira] [Comment Edited] (MAPREDUCE-5119) Splitting issue when using NLineInputFormat with compression
[ https://issues.apache.org/jira/browse/MAPREDUCE-5119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617327#comment-13617327 ]

Suresh Srinivas edited comment on MAPREDUCE-5119 at 3/29/13 1:28 PM:
---------------------------------------------------------------------

bq. It could be a bug, for Hadoop not splitting compressed data correctly using NLineInputFormat.

The description of the jira made it sound like you were asking a question. There are many such jiras created in Hadoop where jira is misused for asking questions. Perhaps this could be a bug, so reopening is the right thing to do. I will ask someone with more MapReduce background to comment on this. I am also moving this jira to MapReduce.

was (Author: sureshms):
bq. It could be a bug, for Hadoop not splitting compressed data correctly using NLineInputFormat.
The description of the jira made is sound like you were asking a question. There are many such jiras created in Hadoop where jira is misused for asking questions. Perhaps this could be a bug. So reopening is the right thing to do. I will ask someone with more mapreduce background to comment on this. I am also moving this to jira to MapReduce.

Splitting issue when using NLineInputFormat with compression
------------------------------------------------------------

Key: MAPREDUCE-5119
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5119
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.1.2
Environment: Try in Apache Hadoop 1.1.1, CDH4, and Amazon EMR. Same result.
Reporter: Qiming He
Priority: Minor

# Make a long text line. It seems that only long-line text causes the issue.
$ cat abook.txt | base64 -w 0 > onelinetext.b64   # 200KB+ long
$ hadoop fs -put onelinetext.b64 /input/onelinetext.b64
$ hadoop jar hadoop-streaming.jar \
    -input /input/onelinetext.b64 \
    -output /output \
    -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
    -mapper wc

Num task: 1, and output has one line:
Line 1: 1 2 202699
which makes sense because one line per mapper is intended.
Then, using compression with NLineInputFormat:

$ gzip onelinetext.b64
$ hadoop fs -put onelinetext.b64.gz /input/onelinetext.b64.gz
$ hadoop jar hadoop-streaming.jar \
    -Dmapred.input.compress=true \
    -Dmapred.input.compression.codec=org.apache.hadoop.io.compress.GzipCodec \
    -input /input/onelinetext.b64.gz \
    -output /output \
    -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
    -mapper wc

I am expecting the same results as above, because decompression should occur before the one-line text is processed (i.e. by wc). However, I am getting:

Num task: 397 (or another large number, depending on the environment), and output has 397 lines:
Lines 1-396: 0 0 0
Line 397: 1 2 202699

Any idea why there are so many map tasks (mapred.map.tasks > 1)? Is it incorrect splitting? I purposely chose gzip because I believe it is NOT splittable. I got similar results when using the bzip2 and lzop codecs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
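
A quick local check related to the question above, offered as a sketch and not as a diagnosis. It rests on an assumption that nothing in this thread confirms: that split computation might count newline bytes in the raw compressed file instead of lines in the decompressed text. The script builds a single long base64 line, gzips it, and compares the line count after decompression with the number of newline bytes (0x0A) that happen to occur in the compressed stream. The file names and the random input are illustrative and do not come from the report.

```shell
#!/bin/sh
# Hypothetical local check; paths and input data are illustrative only.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Build one long base64 line (about 200 KB) followed by a single newline.
# tr removes any line wrapping that the local base64 tool may add.
head -c 150000 /dev/urandom | base64 | tr -d '\n' > "$workdir/oneline.b64"
printf '\n' >> "$workdir/oneline.b64"

gzip -c "$workdir/oneline.b64" > "$workdir/oneline.b64.gz"

# Lines visible after decompression: what a mapper would be expected to see.
decoded_lines=$(gzip -dc "$workdir/oneline.b64.gz" | wc -l | tr -d ' ')

# Newline bytes in the raw compressed stream: what a reader that does not
# decompress first would treat as line boundaries (assumption, see above).
raw_newlines=$(LC_ALL=C tr -cd '\n' < "$workdir/oneline.b64.gz" | wc -c | tr -d ' ')

echo "lines after decompression: $decoded_lines"
echo "newline bytes in compressed file: $raw_newlines"
```

If the second number is large while the first is 1, then a line reader applied to the compressed bytes would see many "lines", which would be one way to end up with hundreds of mostly empty map tasks. Whether NLineInputFormat actually behaves this way in the affected versions would need to be checked against its source.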
[jira] [Commented] (MAPREDUCE-5086) MR app master deletes staging dir when sent a reboot command from the RM
[ https://issues.apache.org/jira/browse/MAPREDUCE-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617757#comment-13617757 ]

Jason Lowe commented on MAPREDUCE-5086:
---------------------------------------

There are other examples of tests using MRAppMaster during multiple tests, and MRApp is one example of how this could be handled. For example, we could put a DefaultMetricsSystem.shutdown call in a try..finally block around the tests or have an After method that calls it to make sure the metric registrations are cleaned up after each test.

MR app master deletes staging dir when sent a reboot command from the RM
------------------------------------------------------------------------

Key: MAPREDUCE-5086
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5086
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: jian he
Assignee: jian he
Attachments: YARN-472.1.patch, YARN-472.2.patch, YARN-472.3.patch, YARN-472.4.patch, YARN-472.5.patch, YARN-472.6.patch

If the RM is restarted when the MR job is running, then it sends a reboot command to the job. The job ends up deleting the staging dir and that causes the next attempt to fail.
[jira] [Updated] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core
[ https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Klochkov updated MAPREDUCE-4980:
--------------------------------------

Attachment: MAPREDUCE-4980--n3.patch

Updating the patch according to changes in trunk.

Parallel test execution of hadoop-mapreduce-client-core
-------------------------------------------------------

Key: MAPREDUCE-4980
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980
Project: Hadoop Map/Reduce
Issue Type: Test
Components: test
Affects Versions: 3.0.0
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
Attachments: MAPREDUCE-4980.1.patch, MAPREDUCE-4980--n3.patch, MAPREDUCE-4980.patch

The Maven Surefire plugin supports a parallel testing feature. By using it, the tests can be run faster.
[jira] [Commented] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core
[ https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617813#comment-13617813 ]

Andrey Klochkov commented on MAPREDUCE-4980:
--------------------------------------------

I believe something's wrong with the QA robot. The patch is perfectly applicable to the current trunk.

Parallel test execution of hadoop-mapreduce-client-core
-------------------------------------------------------

Key: MAPREDUCE-4980
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980
Project: Hadoop Map/Reduce
Issue Type: Test
Components: test
Affects Versions: 3.0.0
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
Attachments: MAPREDUCE-4980.1.patch, MAPREDUCE-4980--n3.patch, MAPREDUCE-4980.patch

The Maven Surefire plugin supports a parallel testing feature. By using it, the tests can be run faster.