[jira] [Commented] (HIVE-4788) RCFile and bzip2 compression not working
[ https://issues.apache.org/jira/browse/HIVE-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238822#comment-14238822 ]

Mithun Radhakrishnan commented on HIVE-4788:

@[~Navis]: Could you please clarify why this solves the problem? Wouldn't this have an effect on data that's compressed using, say, GZip?

RCFile and bzip2 compression not working

Key: HIVE-4788
URL: https://issues.apache.org/jira/browse/HIVE-4788
Project: Hive
Issue Type: Bug
Components: Compression
Affects Versions: 0.10.0
Environment: CDH4.2
Reporter: Johndee Burks
Assignee: Navis
Priority: Minor
Attachments: HIVE-4788.1.patch.txt, HIVE-4788.2.patch.txt

The issue is that bzip2-compressed RCFile data produces an error when queried, even with the simplest query, select *. The issue is easily reproducible using the following steps. Create a table and load the sample data below.

DDL:
create table source_data (a string, b string) row format delimited fields terminated by ',';

Sample data:
apple,sauce

Test:
Do the following and you should receive the error listed below for the rcfile table with bz2 compression.

create table rc_nobz2 (a string, b string) stored as rcfile;
insert into table rc_nobz2 select * from source_data;
SET io.seqfile.compression.type=BLOCK;
SET hive.exec.compress.output=true;
SET mapred.compress.map.output=true;
SET mapred.output.compress=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;
create table rc_bz2 (a string, b string) stored as rcfile;
insert into table rc_bz2 select * from source_data;

hive> select * from rc_bz2;
Failed with exception java.io.IOException:java.io.IOException: Stream is not BZip2 formatted: expected 'h' as first byte but got '�'

hive> select * from rc_nobz2;
apple sauce

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
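The exception in the report says the bzip2 decoder did not find the header byte it expected at the start of the stream. A bzip2 stream begins with the bytes "BZh" followed by a block-size digit, while a gzip stream begins with 0x1f 0x8b. The Python sketch below only illustrates that kind of header check and why a decoder for one codec rejects bytes written by another. It is not Hive or Hadoop code, it does not read RCFile, and it says nothing about what the attached patches change. The function names sniff_codec and decompress_as_bzip2 are hypothetical.

```python
import bz2
import gzip

# Leading bytes that identify each stream format.
BZIP2_MAGIC = b"BZh"        # followed by a block-size digit '1'..'9'
GZIP_MAGIC = b"\x1f\x8b"


def sniff_codec(data: bytes) -> str:
    """Guess the compression format from the first bytes of a buffer."""
    if len(data) >= 4 and data[:3] == BZIP2_MAGIC and data[3:4] in b"123456789":
        return "bzip2"
    if data[:2] == GZIP_MAGIC:
        return "gzip"
    return "unknown"


def decompress_as_bzip2(data: bytes) -> bytes:
    """Decompress a buffer as bzip2, rejecting it early if the header is wrong."""
    if sniff_codec(data) != "bzip2":
        raise OSError(f"Stream is not bzip2 formatted: starts with {data[:4]!r}")
    return bz2.decompress(data)


if __name__ == "__main__":
    payload = b"apple,sauce\n"          # sample row from the report
    as_bzip2 = bz2.compress(payload)
    as_gzip = gzip.compress(payload)

    print(sniff_codec(as_bzip2), sniff_codec(as_gzip), sniff_codec(payload))
    print(decompress_as_bzip2(as_bzip2))

    # Handing gzip bytes to the bzip2 path fails on the header check.
    try:
        decompress_as_bzip2(as_gzip)
    except OSError as err:
        print("error:", err)
```

A reader that picks its decoder from a session setting rather than from what was recorded when the data was written can hit this kind of mismatch. Whether that is the cause here is for the patch discussion to settle.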
[jira] [Commented] (HIVE-4788) RCFile and bzip2 compression not working
[ https://issues.apache.org/jira/browse/HIVE-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100627#comment-14100627 ]

Hive QA commented on HIVE-4788:

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12662453/HIVE-4788.2.patch.txt

{color:green}SUCCESS:{color} +1 5820 tests passed

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/378/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/378/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-378/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12662453
[jira] [Commented] (HIVE-4788) RCFile and bzip2 compression not working
[ https://issues.apache.org/jira/browse/HIVE-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101699#comment-14101699 ]

Navis commented on HIVE-4788:

This is really a simple patch. Could anyone review this?
[jira] [Commented] (HIVE-4788) RCFile and bzip2 compression not working
[ https://issues.apache.org/jira/browse/HIVE-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071891#comment-14071891 ]

Hive QA commented on HIVE-4788:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12657276/HIVE-4788.1.patch.txt

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5737 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_fail_8
org.apache.hive.jdbc.TestJdbcDriver2.testParentReferences
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/20/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/20/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-20/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12657276

RCFile and bzip2 compression not working

Key: HIVE-4788
URL: https://issues.apache.org/jira/browse/HIVE-4788
Project: Hive
Issue Type: Bug
Components: Compression
Affects Versions: 0.10.0
Environment: CDH4.2
Reporter: Johndee Burks
Assignee: Navis
Priority: Minor
Attachments: HIVE-4788.1.patch.txt
[jira] [Commented] (HIVE-4788) RCFile and bzip2 compression not working
[ https://issues.apache.org/jira/browse/HIVE-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13745222#comment-13745222 ]

Tzur Turkenitz commented on HIVE-4788:

This bug persists in Hive 0.10.0 as shipped with HDP.

RCFile and bzip2 compression not working

Key: HIVE-4788
URL: https://issues.apache.org/jira/browse/HIVE-4788
Project: Hive
Issue Type: Bug
Components: Compression
Affects Versions: 0.10.0
Environment: CDH4.2
Reporter: Johndee Burks
Priority: Minor