Pls help

---------- Forwarded message ---------
From: Tim Armstrong (JIRA) <j...@apache.org>
Date: Mon, Apr 9, 2018 at 7:18 PM
Subject: [jira] [Resolved] (IMPALA-6829) how to get compressed hdfs file using impala or hive
To: <kumar.sathish...@gmail.com>

[ https://issues.apache.org/jira/browse/IMPALA-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-6829.
-----------------------------------
    Resolution: Not A Bug

We're happy to help you out with learning Impala, but it would be best to have the discussion on the user list: user@impala.apache.org

We mainly use JIRA for tracking changes we want to make to Impala, so discussions with users tend to get lost here.

> how to get compressed hdfs file using impala or hive
> ----------------------------------------------------
>
>                 Key: IMPALA-6829
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6829
>             Project: IMPALA
>          Issue Type: Question
>            Reporter: sathishkumar paramasivam
>            Priority: Major
>
> hi,
>
> i am doing the self learning now the impala and trying to enable the compression for the table but could not see the hdfs file getting the extension?
> referring to
> [ https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_txtfile.html ]
> but not sure how the final compressed file are creating.
> When I try sqoop, i can get the compress file. please guide.
> create table csv_compressed (a string, b string, c string)
> row format delimited fields terminated by ",";
> insert into csv_compressed values
> ('one - uncompressed', 'two - uncompressed', 'three - uncompressed'),
> ('abc - uncompressed', 'xyz - uncompressed', '123 - uncompressed');
> ...make equivalent .gz, .bz2, and .snappy files and load them into same table directory...
> select * from csv_compressed;
> +--------------------+--------------------+----------------------+
> | a                  | b                  | c                    |
> +--------------------+--------------------+----------------------+
> | one - snappy       | two - snappy       | three - snappy       |
> | one - uncompressed | two - uncompressed | three - uncompressed |
> | abc - uncompressed | xyz - uncompressed | 123 - uncompressed   |
> | one - bz2          | two - bz2          | three - bz2          |
> | abc - bz2          | xyz - bz2          | 123 - bz2            |
> | one - gzip         | two - gzip         | three - gzip         |
> | abc - gzip         | xyz - gzip         | 123 - gzip           |
> +--------------------+--------------------+----------------------+
> $ hdfs dfs -ls 'hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/';
> ...truncated for readability...
> 75 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed.snappy
> 79 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_bz2.csv.bz2
> 80 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_gzip.csv.gz
> 116 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/dd414df64d67d49b_data.0

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
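
The quoted example skips over the "make equivalent .gz, .bz2, and .snappy files" step, so here is a rough sketch of what that step could look like for the gzip and bz2 cases. It is only an illustration: the function name write_compressed and the output directory are made up for this sketch, the file names are copied from the quoted listing, and the snappy file is left out because the Python standard library has no snappy codec.

```python
import bz2
import gzip
from pathlib import Path

# Rows mirroring the quoted example; the " - gzip" / " - bz2" suffixes only
# label which file each row comes from.
ROWS = {
    "gzip": [("one - gzip", "two - gzip", "three - gzip"),
             ("abc - gzip", "xyz - gzip", "123 - gzip")],
    "bz2": [("one - bz2", "two - bz2", "three - bz2"),
            ("abc - bz2", "xyz - bz2", "123 - bz2")],
}


def to_csv(rows):
    """Join fields with ',' to match 'fields terminated by ","' in the table DDL."""
    return "".join(",".join(row) + "\n" for row in rows).encode("utf-8")


def write_compressed(out_dir):
    """Write one gzip and one bz2 CSV file into out_dir (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # File names follow the quoted hdfs listing; the extension is what marks
    # each file as compressed.
    gz_path = out_dir / "csv_compressed_gzip.csv.gz"
    bz_path = out_dir / "csv_compressed_bz2.csv.bz2"

    with gzip.open(gz_path, "wb") as f:
        f.write(to_csv(ROWS["gzip"]))
    with bz2.open(bz_path, "wb") as f:
        f.write(to_csv(ROWS["bz2"]))

    return gz_path, bz_path


if __name__ == "__main__":
    for path in write_compressed("compressed_out"):
        print(path)
```

The resulting files would then presumably be copied into the table directory (for example with hdfs dfs -put) and picked up in Impala with REFRESH csv_compressed. In the quoted listing, the only file without a compression extension is dd414df64d67d49b_data.0, which looks like the file written by the INSERT statement; that is consistent with the compressed files having been created outside Impala.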