[jira] [Commented] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2018-02-21 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371926#comment-16371926
 ] 

Sean Mackrory commented on HADOOP-6852:
---

Committed - thanks for getting to the bottom of this long-standing issue!

> apparent bug in concatenated-bzip2 support (decoding)
> -
>
> Key: HADOOP-6852
> URL: https://issues.apache.org/jira/browse/HADOOP-6852
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: io
>Affects Versions: 0.22.0
> Environment: Linux x86_64 running 32-bit Hadoop, JDK 1.6.0_15
>Reporter: Greg Roelofs
>Assignee: Zsolt Venczel
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: HADOOP-6852.01.patch, HADOOP-6852.02.patch, 
> HADOOP-6852.03.patch, HADOOP-6852.04.patch
>
>
> The following simplified code (manually picked out of testMoreBzip2() in 
> https://issues.apache.org/jira/secure/attachment/12448272/HADOOP-6835.v4.trunk-hadoop-mapreduce.patch)
>  triggers a "java.io.IOException: bad block header" in 
> org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.initBlock( 
> CBZip2InputStream.java:527):
> {noformat}
> JobConf jobConf = new JobConf(defaultConf);
> CompressionCodec bzip2 = new BZip2Codec();
> ReflectionUtils.setConf(bzip2, jobConf);
> localFs.delete(workDir, true);
> // copy multiple-member test file to HDFS
> String fn2 = "testCompressThenConcat.txt" + bzip2.getDefaultExtension();
> Path fnLocal2 = new Path(System.getProperty("test.concat.data","/tmp"),fn2);
> Path fnHDFS2  = new Path(workDir, fn2);
> localFs.copyFromLocalFile(fnLocal2, fnHDFS2);
> FileInputFormat.setInputPaths(jobConf, workDir);
> final FileInputStream in2 = new FileInputStream(fnLocal2.toString());
> CompressionInputStream cin2 = bzip2.createInputStream(in2);
> LineReader in = new LineReader(cin2);
> Text out = new Text();
> int numBytes, totalBytes=0, lineNum=0;
> while ((numBytes = in.readLine(out)) > 0) {
>   ++lineNum;
>   totalBytes += numBytes;
> }
> in.close();
> {noformat}
> The specified file is also included in the H-6835 patch linked above, and 
> some additional debug output is included in the commented-out test loop 
> above.  (Only in the linked, "v4" version of the patch, however--I'm about to 
> remove the debug stuff for checkin.)
> It's possible I've done something completely boneheaded here, but the file, 
> at least, checks out in a subsequent set of subtests and with stock bzip2 
> itself.  Only the code above is problematic; it reads through the first 
> concatenated chunk (17 lines of text) just fine but chokes on the header of 
> the second one.  Altogether, the test file contains 84 lines of text and 4 
> concatenated bzip2 files.
> (It's possible this is a mapreduce issue rather than common, but note that 
> the identical gzip test works fine.  Possibly it's related to the 
> stream-vs-decompressor dichotomy, though; intentionally not supported?)






[jira] [Commented] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2018-02-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371951#comment-16371951
 ] 

Hudson commented on HADOOP-6852:


SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13698 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13698/])
HADOOP-6852. apparent bug in concatenated-bzip2 support (decoding). 
(mackrorysd: rev 2bc3351eaf240ea685bcf5042d79f1554bf89e00)
* (edit) 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/BZip2Codec.java
* (edit) 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestConcatenatedCompressedInput.java
* (add) 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/resources/testdata/testConcatThenCompress.txt.gz
* (add) 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/resources/testdata/concat.bz2
* (add) 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/resources/testdata/testCompressThenConcat.txt.gz
* (add) 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/resources/testdata/concat.gz
* (add) 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/resources/testdata/testConcatThenCompress.txt.bz2
* (add) 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/resources/testdata/testCompressThenConcat.txt.bz2
* (edit) hadoop-client-modules/hadoop-client-minicluster/pom.xml





[jira] [Commented] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2018-02-21 Thread Zsolt Venczel (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371917#comment-16371917
 ] 

Zsolt Venczel commented on HADOOP-6852:
---

Thanks for checking [~mackrorysd]!

The binary files were first added to git in commit
a196766ea07775f18ded69bd9e8d239f8cfd3ccc, as part of the restructuring
described in HADOOP-7106. I assume they were present in the MapReduce SVN
repository before that.




[jira] [Commented] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2018-02-21 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371694#comment-16371694
 ] 

Sean Mackrory commented on HADOOP-6852:
---

Actually, one last thing: I'm not big on the idea of binary files checked into
source control. Where did these come from? Since these are tests that were
commented out at some point, I suspect you simply restored the files from
before, but we should make sure that's documented, ideally including the source
and how they were generated. I also wonder whether generating them at run-time
would let problems go unnoticed, because we would be testing recently
compressed files rather than files compressed with an old implementation; both
should work. So overall I'm okay committing this as-is, but I'd like to
document where the binaries came from.
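
A rough sketch of what generating such a multi-member test file at run-time
could look like, using only the public BZip2Codec API (illustrative only, with
made-up class and file names; this is not part of the patch):

{noformat}
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.BZip2Codec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.util.ReflectionUtils;

public class ConcatBzip2TestData {
  /** Compresses each chunk as a separate bzip2 member and appends the members. */
  public static byte[] compressThenConcat(String[] chunks) throws Exception {
    Configuration conf = new Configuration();
    BZip2Codec codec = ReflectionUtils.newInstance(BZip2Codec.class, conf);
    ByteArrayOutputStream concatenated = new ByteArrayOutputStream();
    for (String chunk : chunks) {
      ByteArrayOutputStream member = new ByteArrayOutputStream();
      CompressionOutputStream out = codec.createOutputStream(member);
      out.write(chunk.getBytes(StandardCharsets.UTF_8));
      out.close();                  // completes this bzip2 member
      member.writeTo(concatenated); // append it, as "cat a.bz2 b.bz2" would
    }
    return concatenated.toByteArray();
  }

  public static void main(String[] args) throws Exception {
    byte[] data = compressThenConcat(
        new String[] {"first member\n", "second member\n"});
    try (OutputStream f = new FileOutputStream("testCompressThenConcat.txt.bz2")) {
      f.write(data);
    }
  }
}
{noformat}

The trade-off mentioned above still applies: data generated this way exercises
only the current compressor, not files produced by an older implementation.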




[jira] [Commented] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2018-02-21 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371680#comment-16371680
 ] 

Sean Mackrory commented on HADOOP-6852:
---

Thanks for fixing the checkstyle issue. The test failure appears unrelated to
me: I ran the tests with and without your patch locally and saw no difference,
except that the tests that used to be commented out now run and pass. Will
commit today.




[jira] [Commented] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2018-02-21 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371568#comment-16371568
 ] 

genericqa commented on HADOOP-6852:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
24s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 22m 
39s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-client-modules/hadoop-client-minicluster {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
40s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
58s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
 8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 18m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
36s{color} | {color:green} root: The patch generated 0 new + 69 unchanged - 11 
fixed = 69 total (was 80) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 44s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-client-modules/hadoop-client-minicluster {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
8s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
55s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}139m 40s{color} 
| {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
39s{color} | {color:green} hadoop-client-minicluster in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
45s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}267m  8s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image

[jira] [Commented] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2018-02-14 Thread Sean Mackrory (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365109#comment-16365109
 ] 

Sean Mackrory commented on HADOOP-6852:
---

So after much review and reading the code, I'm a +1. I'm not an expert on the 
compression code, though, so I'm gonna hold off on committing for 1 more day in 
case anyone else watching the issue wants to chime in. I'll re-run tests 
tomorrow on latest trunk as well.




[jira] [Commented] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2018-02-06 Thread Zsolt Venczel (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354517#comment-16354517
 ] 

Zsolt Venczel commented on HADOOP-6852:
---

The above two unit test failures seem to be unrelated to the provided patch.




[jira] [Commented] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2018-02-06 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354289#comment-16354289
 ] 

genericqa commented on HADOOP-6852:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 
59s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 
31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  0s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-client-modules/hadoop-client-minicluster {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
40s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 
16s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m  6s{color} | {color:orange} root: The patch generated 1 new + 69 unchanged - 
11 fixed = 70 total (was 80) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m  
7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
2s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 24s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-client-modules/hadoop-client-minicluster {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 
22s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}127m  5s{color} 
| {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
31s{color} | {color:green} hadoop-client-minicluster in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
42s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}239m 46s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.mapreduce.v2.TestUberAM |
| 

[jira] [Commented] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2018-02-06 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353881#comment-16353881
 ] 

genericqa commented on HADOOP-6852:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
27s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 
31s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red} 18m 
23s{color} | {color:red} root in trunk failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 26s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-client-modules/hadoop-client-minicluster {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 
57s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red} 17m 57s{color} 
| {color:red} root generated 183 new + 1054 unchanged - 0 fixed = 1237 total 
(was 1054) {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
2m 24s{color} | {color:orange} root: The patch generated 16 new + 71 unchanged 
- 9 fixed = 87 total (was 80) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
1s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 25s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-client-modules/hadoop-client-minicluster {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 
18s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}130m 15s{color} 
| {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
3s{color} | {color:green} hadoop-client-minicluster in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
56s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}243m 49s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Faile

[jira] [Commented] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2018-02-05 Thread Zsolt Venczel (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352392#comment-16352392
 ] 

Zsolt Venczel commented on HADOOP-6852:
---

In the attached patch I have added the previously available test files back,
re-enabled the test cases, and added a proposed fix:

Within BZip2Codec I changed the default read mode from CONTINUOUS to BYBLOCK in
BZip2Codec.createInputStream(InputStream in, Decompressor decompressor), since
the BYBLOCK read mode handles concatenated bzip2 correctly and is also
consistent with the decompressor creation logic in mapred/LineRecordReader and
input/LineRecordReader. With this change the concatenated-bzip2 issue is fixed.
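
To illustrate the described change (this is a sketch of the idea, not the exact
committed diff; the offset arguments shown are assumptions):

{noformat}
// Java (non-native) fallback inside BZip2Codec.createInputStream(InputStream, Decompressor)

// before: default read mode, i.e. CONTINUOUS
return new BZip2CompressionInputStream(in);

// after: BYBLOCK read mode, matching the LineRecordReader decompressor logic
return new BZip2CompressionInputStream(in, 0L, Long.MAX_VALUE,
    SplittableCompressionCodec.READ_MODE.BYBLOCK);
{noformat}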



[jira] [Commented] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2018-02-05 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352394#comment-16352394
 ] 

genericqa commented on HADOOP-6852:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  3m 
54s{color} | {color:red} Docker failed to build yetus/hadoop:5b98639. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-6852 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12909222/HADOOP-6852.01.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-HADOOP-Build/14070/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2014-03-17 Thread Bob Tiernay (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938152#comment-13938152
 ] 

Bob Tiernay commented on HADOOP-6852:
-

Just hit this bug. Are there any plans to fix it, or do we as a community need
to add something to a project like Elephant Bird to work around the issue?
Thanks.



[jira] [Commented] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2012-07-03 Thread Iker Jimenez (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405813#comment-13405813
 ] 

Iker Jimenez commented on HADOOP-6852:
--

This is happening to us too when we compress files with pbzip2.
Any ETA for a fix?





[jira] [Commented] (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2011-10-27 Thread Len Trigg (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13137479#comment-13137479
 ] 

Len Trigg commented on HADOOP-6852:
---

We have been using the ant-based bzip2 library for our project and needed to
be able to decompress concatenated bzip2 files. After poking around we came
across the hadoop extensions and immediately found that they did not function
correctly, due to this bug. Essentially, when crossing block boundaries the
skipToNextMarker method leaves the stream position at the end of the block
delimiter, but initBlock expects to be at the beginning of the block delimiter.
After looking at the poor structure of the initBlock method, and the
thread-unsafety introduced into this class by the numberOfBytesTillNextMarker()
method, we decided to avoid the hadoop version of this class altogether.
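
A toy model of the mismatch described above (illustrative Java only, not
Hadoop's actual CBZip2InputStream code): the scan step consumes the 48-bit
block delimiter, so a parse step that expects the delimiter to still be ahead
of the cursor reads the wrong data and rejects the header.

{noformat}
public class MarkerMismatchDemo {
  static final long BLOCK_MAGIC = 0x314159265359L;   // bzip2 block delimiter
  // Fake stream of 48-bit words standing in for the compressed bit stream.
  static final long[] STREAM = {0xDEADL, BLOCK_MAGIC, 0xBEEFL, 0xCAFEL};
  static int pos = 0;

  // Models skipToNextMarker: scans forward and returns with the cursor
  // positioned *after* the delimiter.
  static void skipToNextMarker() {
    while (STREAM[pos++] != BLOCK_MAGIC) { /* keep scanning */ }
  }

  // Models initBlock: expects the delimiter to be the next thing it reads.
  static void initBlock() {
    if (STREAM[pos++] != BLOCK_MAGIC) {
      throw new IllegalStateException("bad block header"); // the reported error
    }
  }

  public static void main(String[] args) {
    skipToNextMarker(); // cursor is now one word past the delimiter
    initBlock();        // throws, because the delimiter was already consumed
  }
}
{noformat}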





[jira] Commented: (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2011-01-24 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986044#action_12986044
 ] 

Todd Lipcon commented on HADOOP-6852:
-

Wouter: do you have HADOOP-6925 in your build? Worth trying that to see if your 
problem is the same as Greg's or the one we fixed in the other JIRA.




[jira] Commented: (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2010-08-26 Thread Wouter de Bie (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903063#action_12903063
 ] 

Wouter de Bie commented on HADOOP-6852:
---

I'm having the same problem. We write files from our logging application using 
the BZip2Codec, but the application rotates files on an hourly basis. In 
addition, when we flush log lines we compress, which in turn produces different 
bzip2 block sizes.
Anyway, we're also getting the 'bad block header' exception.
Looking through the code and trying different things, we got things to work by 
changing the following in 
org.apache.hadoop.io.compress.bzip2.CBZip2InputStream:

{noformat}
  public CBZip2InputStream(final InputStream in) throws IOException {
    this(in, READ_MODE.CONTINUOUS);
  }
{noformat}

to

{noformat}
  public CBZip2InputStream(final InputStream in) throws IOException {
    this(in, READ_MODE.BYBLOCK);
  }
{noformat}

This causes CBZip2InputStream to use the block size in initBlock() instead of 
scanning for the magic numbers.

I'll be digging into the code more tomorrow, but to move forward quickly on 
this issue I'd like to know why CONTINUOUS is the default mode and whether 
there is a single place where the read mode is determined. BZip2 is block 
based, so why not default to that?

I'll also try the piece of code above and see whether it works with BYBLOCK.
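
For anyone wanting to reproduce this locally, here is a small sketch (not from 
any patch on this issue) that produces a concatenated, multi-member .bz2 file 
of the kind that triggers the 'bad block header' exception: each member is a 
complete bzip2 stream, and the members are simply appended. It assumes Apache 
Commons Compress (BZip2CompressorOutputStream) purely for illustration; piping 
chunks through the command-line bzip2 and joining them with cat yields the same 
layout.

{noformat}
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream;

// Writes two complete bzip2 members back to back into one file, the layout
// that a CONTINUOUS-mode reader stops handling after the first member.
public class ConcatBzip2Writer {
  public static void main(String[] args) throws IOException {
    String[] members = {"first member\n", "second member\n"};
    try (OutputStream out = new FileOutputStream("concat-test.txt.bz2")) {
      for (String chunk : members) {
        // Each BZip2CompressorOutputStream emits a self-contained stream
        // ("BZh" header ... end-of-stream magic); finishing it and starting
        // a new one on the same OutputStream concatenates the members.
        BZip2CompressorOutputStream bz = new BZip2CompressorOutputStream(out);
        bz.write(chunk.getBytes(StandardCharsets.UTF_8));
        bz.finish();
      }
    }
  }
}
{noformat}

Feeding the resulting file to the snippet in the issue description should fail 
with the same 'bad block header' exception at the start of the second member, 
while the stock bzip2 command decompresses both members.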






[jira] Commented: (HADOOP-6852) apparent bug in concatenated-bzip2 support (decoding)

2010-08-24 Thread Greg Roelofs (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902268#action_12902268
 ] 

Greg Roelofs commented on HADOOP-6852:
--

Alas, it doesn't. :-(
