Parquet compresses data within its own format, using one of several codecs, so you shouldn't expect the compression to show up in the file name. You may be able to use parquet-tools (http://kitesdk.org/docs/0.17.1/labs/4-using-parquet-tools.html) to get metadata about a Parquet file, including how it's compressed.
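
To make the distinction concrete, here is a minimal Python sketch, not Impala's actual logic. For text files, the codec is guessed from the file suffix, as described in the quoted reply below. A Parquet file is instead recognized by the 4-byte "PAR1" magic at its start and end, and its codec is recorded in the footer metadata rather than in the name. The names codec_from_suffix, parquet_footer_length and TEXT_SUFFIX_TO_CODEC are hypothetical, and the suffix table only covers the three suffixes mentioned in this thread. The sketch only locates the footer; decoding it to read the codec needs a Parquet reader such as parquet-tools.

```python
import os
import struct

# Suffixes taken from this thread; the mapping is illustrative only and is
# not a copy of Impala's implementation.
TEXT_SUFFIX_TO_CODEC = {".gz": "gzip", ".bz2": "bzip2", ".snappy": "snappy"}

# Parquet files start and end with this 4-byte magic string.
PARQUET_MAGIC = b"PAR1"


def codec_from_suffix(filename):
    """Guess a text file's codec from its suffix; None if the suffix is unknown."""
    _, ext = os.path.splitext(filename)
    return TEXT_SUFFIX_TO_CODEC.get(ext.lower())


def parquet_footer_length(path):
    """Return the footer metadata length in bytes if the file carries the
    Parquet magic at both ends, else None. The file name is never consulted."""
    if os.path.getsize(path) < 12:  # 4 (magic) + 4 (length) + 4 (magic)
        return None
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-8, os.SEEK_END)
        tail = f.read(8)
    if head != PARQUET_MAGIC or tail[4:] != PARQUET_MAGIC:
        return None
    # The 4 bytes before the trailing magic hold the little-endian footer length.
    (footer_len,) = struct.unpack("<I", tail[:4])
    return footer_len
```

With this sketch, codec_from_suffix("csv_compressed_gzip.csv.gz") gives "gzip", while a Parquet data file with no telling suffix is still identified by parquet_footer_length, whatever codec was set at insert time.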

-- Philip

On Wed, Apr 11, 2018 at 11:18 AM, Sathishkumar Paramasivam <kumar.sathish...@gmail.com> wrote:

> Hi,
>
> thanks for your attention on this issue.
>
> My question is, can we create compressed files with .snappy,gz,bz2 using
> impala create table/insert statement?
>
> If not, then how about the set compression_codec=snappy statement. Or it
> is not possible in Impala, but only in hive to create compressed files in
> hdfs?
>
> Impala>set compression_code=snappy;
> Impala> create table test(a string) stored as parquet;
> Impala> insert into test values('1');
>
> I am setting compression in impala and inserting data into text/parquet
> table but not able to see hdfs_file_name*.snappy* file extension in the
> hdfs. doing this in oracle quickstart VM provided by cloudera.
>
>
> I could create compressed file in hive but trying to understand the steps
> in impala for that same. I know there are certain restriction for
> compression/file format but i tried wit parquet only which support all
> compression and create & insert in impala.
>
> https://impala.apache.org/docs/build/html/topics/impala_file_formats.html
>
> Please guide.
>
> On 11 April 2018 at 12:33, Tim Armstrong <tarmstr...@cloudera.com> wrote:
>
>> Hi,
>> If I understood correctly, the query is behaving as expected but you're
>> wondering how it works, right?
>>
>> Impala detects the compression type based on the file suffix. We mention
>> this in the docs in the "Using gzip, bzip2, or Snappy-Compressed Text
>> Files" section:
>> https://impala.apache.org/docs/build/html/topics/impala_txtfile.html
>>
>> - Tim
>>
>> On Mon, Apr 9, 2018 at 7:30 PM, Sathishkumar Paramasivam <
>> kumar.sathish...@gmail.com> wrote:
>>
>>>
>>>
>>> Pls help
>>>
>>> ---------- Forwarded message ---------
>>> From: Tim Armstrong (JIRA) <j...@apache.org>
>>> Date: Mon, Apr 9, 2018 at 7:18 PM
>>> Subject: [jira] [Resolved] (IMPALA-6829) how to get compressed hdfs file
>>> using impala or hive
>>> To: <kumar.sathish...@gmail.com>
>>>
>>>
>>>
>>> [ https://issues.apache.org/jira/browse/IMPALA-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>>
>>> Tim Armstrong resolved IMPALA-6829.
>>> -----------------------------------
>>> Resolution: Not A Bug
>>>
>>> We're happy to help you out with learning Impala, but it would be best
>>> to have the discussion on the user list: user@impala.apache.org
>>>
>>> We mainly use JIRA for tracking changes we want to make to Impala, so
>>> discussions with users tend to get lost here.
>>>
>>> > how to get compressed hdfs file using impala or hive
>>> > ----------------------------------------------------
>>> >
>>> > Key: IMPALA-6829
>>> > URL: https://issues.apache.org/jira/browse/IMPALA-6829
>>> > Project: IMPALA
>>> > Issue Type: Question
>>> > Reporter: sathishkumar paramasivam
>>> > Priority: Major
>>> >
>>> > hi,
>>> >
>>> > i am doing the self learning now the impala and trying to enable the
>>> > compression for the table but could not see the hdfs file getting the
>>> > extension?
>>> > referring to
>>> > [https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_txtfile.html]
>>> > but not sure how the final compressed file are creating.
>>> > When I try sqoop, i can get the compress file. please guide.
>>> > create table csv_compressed (a string, b string, c string)
>>> > row format delimited fields terminated by ",";
>>> > insert into csv_compressed values
>>> > ('one - uncompressed', 'two - uncompressed', 'three - uncompressed'),
>>> > ('abc - uncompressed', 'xyz - uncompressed', '123 - uncompressed');
>>> > ...make equivalent .gz, .bz2, and .snappy files and load them into same table directory...
>>> > select * from csv_compressed;
>>> > +--------------------+--------------------+----------------------+
>>> > | a                  | b                  | c                    |
>>> > +--------------------+--------------------+----------------------+
>>> > | one - snappy       | two - snappy       | three - snappy       |
>>> > | one - uncompressed | two - uncompressed | three - uncompressed |
>>> > | abc - uncompressed | xyz - uncompressed | 123 - uncompressed   |
>>> > | one - bz2          | two - bz2          | three - bz2          |
>>> > | abc - bz2          | xyz - bz2          | 123 - bz2            |
>>> > | one - gzip         | two - gzip         | three - gzip         |
>>> > | abc - gzip         | xyz - gzip         | 123 - gzip           |
>>> > +--------------------+--------------------+----------------------+
>>> > $ hdfs dfs -ls 'hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/';
>>> > ...truncated for readability...
>>> > 75 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed.snappy
>>> > 79 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_bz2.csv.bz2
>>> > 80 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_gzip.csv.gz
>>> > 116 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/dd414df64d67d49b_data.0.
>>>
>>>
>>>
>>> --
>>> This message was sent by Atlassian JIRA
>>> (v7.6.3#76005)
>>>
>>
>>
>