[jira] [Updated] (MAPREDUCE-5656) bzip2 codec can drop records when reading data in splits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-5656:
----------------------------------
    Resolution: Fixed
    Fix Version/s: 2.4.0, 3.0.0
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Thanks to Nathan, Chris, and Vinay for the reviews! I committed this to trunk and branch-2.

bzip2 codec can drop records when reading data in splits
--------------------------------------------------------
    Key: MAPREDUCE-5656
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-5656
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Affects Versions: 2.0.4-alpha, 0.23.8
    Reporter: Jason Lowe
    Assignee: Jason Lowe
    Priority: Critical
    Fix For: 3.0.0, 2.4.0
    Attachments: HADOOP-9622-2.patch, HADOOP-9622-testcase.patch, HADOOP-9622.patch, MAPREDUCE-5656-2.patch, MAPREDUCE-5656.patch, blockEndingInCR.txt.bz2, blockEndingInCRThenLF.txt.bz2

Bzip2Codec.BZip2CompressionInputStream can cause records to be dropped when reading them in splits based on where record delimiters occur relative to compression block boundaries.

Thanks to [~knoguchi] for discovering this problem while working on PIG-3251.

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
[jira] [Updated] (MAPREDUCE-5656) bzip2 codec can drop records when reading data in splits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-5656:
----------------------------------
    Attachment: MAPREDUCE-5656-2.patch

Slightly updated patch to fix the spacing issue in SplitLineReader.

bzip2 codec can drop records when reading data in splits
--------------------------------------------------------
    Key: MAPREDUCE-5656
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-5656
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Affects Versions: 2.0.4-alpha, 0.23.8
    Reporter: Jason Lowe
    Assignee: Jason Lowe
    Priority: Critical
    Attachments: HADOOP-9622-2.patch, HADOOP-9622-testcase.patch, HADOOP-9622.patch, MAPREDUCE-5656-2.patch, MAPREDUCE-5656.patch, blockEndingInCR.txt.bz2, blockEndingInCRThenLF.txt.bz2

Bzip2Codec.BZip2CompressionInputStream can cause records to be dropped when reading them in splits based on where record delimiters occur relative to compression block boundaries.

Thanks to [~knoguchi] for discovering this problem while working on PIG-3251.
[jira] [Updated] (MAPREDUCE-5656) bzip2 codec can drop records when reading data in splits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-5656:
----------------------------------
    Attachment: MAPREDUCE-5656.patch

Cleanup work in preparation for commit. Moving this JIRA to MAPREDUCE since it's primarily changes in that project. Also uploading a binary patch that can be applied with git-apply as a reference for what will be committed (same patch as before with binary test files added).

Awaiting Jenkins confirmation and commit of MAPREDUCE-5640 to avoid test name conflicts.

bzip2 codec can drop records when reading data in splits
--------------------------------------------------------
    Key: MAPREDUCE-5656
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-5656
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Affects Versions: 2.0.4-alpha, 0.23.8
    Reporter: Jason Lowe
    Assignee: Jason Lowe
    Priority: Critical
    Attachments: HADOOP-9622-2.patch, HADOOP-9622-testcase.patch, HADOOP-9622.patch, MAPREDUCE-5656.patch, blockEndingInCR.txt.bz2, blockEndingInCRThenLF.txt.bz2

Bzip2Codec.BZip2CompressionInputStream can cause records to be dropped when reading them in splits based on where record delimiters occur relative to compression block boundaries.

Thanks to [~knoguchi] for discovering this problem while working on PIG-3251.
[jira] [Updated] (MAPREDUCE-5656) bzip2 codec can drop records when reading data in splits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-5656:
----------------------------------
    Target Version/s: 2.3.0
    Status: Patch Available (was: Open)

bzip2 codec can drop records when reading data in splits
--------------------------------------------------------
    Key: MAPREDUCE-5656
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-5656
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Affects Versions: 0.23.8, 2.0.4-alpha
    Reporter: Jason Lowe
    Assignee: Jason Lowe
    Priority: Critical
    Attachments: HADOOP-9622-2.patch, HADOOP-9622-testcase.patch, HADOOP-9622.patch, MAPREDUCE-5656.patch, blockEndingInCR.txt.bz2, blockEndingInCRThenLF.txt.bz2

Bzip2Codec.BZip2CompressionInputStream can cause records to be dropped when reading them in splits based on where record delimiters occur relative to compression block boundaries.

Thanks to [~knoguchi] for discovering this problem while working on PIG-3251.