You can control the compression for Impala INSERTs with this query option:

set compression_codec=gzip;
<do your insert here>

or

set compression_codec=snappy;
<do your insert here>

Impala uses Snappy by default when inserting into Parquet.

On Wed, Apr 11, 2018 at 12:03 PM, Philip Zeyliger <phi...@cloudera.com> wrote:

> Parquet compresses files within its own format, using a variety of codecs.
> You shouldn't expect to see the Parquet compression expressed in the
> filename. You may be able to use parquet-tools (
> http://kitesdk.org/docs/0.17.1/labs/4-using-parquet-tools.html) to get
> metadata about a Parquet file, including how it's compressed.
>
> -- Philip
>
> On Wed, Apr 11, 2018 at 11:18 AM, Sathishkumar Paramasivam <
> kumar.sathish...@gmail.com> wrote:
>
>> Hi,
>>
>> thanks for your attention on this issue.
>>
>> My question is, can we create compressed files with .snappy,gz,bz2 using
>> impala create table/insert statement?
>>
>> If not, then how about the set compression_codec=snappy statement. Or it
>> is not possible in Impala, but only in hive to create compressed files in
>> hdfs?
>>
>> Impala>set compression_code=snappy;
>> Impala> create table test(a string) stored as parquet;
>> Impala> insert into test values('1');
>>
>> I am setting compression in impala and inserting data into text/parquet
>> table but not able to see hdfs_file_name*.snappy* file extension in the
>> hdfs. doing this in oracle quickstart VM provided by cloudera.
>>
>>
>> I could create compressed file in hive but trying to understand the
>> steps in impala for that same. I know there are certain restriction for
>> compression/file format but i tried wit parquet only which support all
>> compression and create & insert in impala.
>>
>> https://impala.apache.org/docs/build/html/topics/impala_file_formats.html
>>
>> Please guide.
>>
>> On 11 April 2018 at 12:33, Tim Armstrong <tarmstr...@cloudera.com> wrote:
>>
>>> Hi,
>>> If I understood correctly, the query is behaving as expected but
>>> you're wondering how it works, right?
>>>
>>> Impala detects the compression type based on the file suffix.
>>> We mention this in the docs in the "Using gzip, bzip2, or Snappy-Compressed
>>> Text Files" section:
>>> https://impala.apache.org/docs/build/html/topics/impala_txtfile.html
>>>
>>> - Tim
>>>
>>> On Mon, Apr 9, 2018 at 7:30 PM, Sathishkumar Paramasivam <
>>> kumar.sathish...@gmail.com> wrote:
>>>
>>>>
>>>>
>>>> Pls help
>>>>
>>>> ---------- Forwarded message ---------
>>>> From: Tim Armstrong (JIRA) <j...@apache.org>
>>>> Date: Mon, Apr 9, 2018 at 7:18 PM
>>>> Subject: [jira] [Resolved] (IMPALA-6829) how to get compressed hdfs
>>>> file using impala or hive
>>>> To: <kumar.sathish...@gmail.com>
>>>>
>>>>
>>>>
>>>> [ https://issues.apache.org/jira/browse/IMPALA-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>>>
>>>> Tim Armstrong resolved IMPALA-6829.
>>>> -----------------------------------
>>>> Resolution: Not A Bug
>>>>
>>>> We're happy to help you out with learning Impala, but it would be best
>>>> to have the discussion on the user list: user@impala.apache.org
>>>>
>>>> We mainly use JIRA for tracking changes we want to make to Impala, so
>>>> discussions with users tend to get lost here.
>>>>
>>>> > how to get compressed hdfs file using impala or hive
>>>> > ----------------------------------------------------
>>>> >
>>>> > Key: IMPALA-6829
>>>> > URL: https://issues.apache.org/jira/browse/IMPALA-6829
>>>> > Project: IMPALA
>>>> > Issue Type: Question
>>>> > Reporter: sathishkumar paramasivam
>>>> > Priority: Major
>>>> >
>>>> > hi,
>>>> >
>>>> > i am doing the self learning now the impala and trying to enable the
>>>> > compression for the table but could not see the hdfs file getting the
>>>> > extension?
>>>> > referring to
>>>> > [https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_txtfile.html]
>>>> > but not sure how the final compressed file are creating.
>>>> > When I try sqoop, i can get the compress file. please guide.
>>>> > create table csv_compressed (a string, b string, c string)
>>>> > row format delimited fields terminated by ",";
>>>> > insert into csv_compressed values
>>>> > ('one - uncompressed', 'two - uncompressed', 'three - uncompressed'),
>>>> > ('abc - uncompressed', 'xyz - uncompressed', '123 - uncompressed');
>>>> > ...make equivalent .gz, .bz2, and .snappy files and load them into same table directory...
>>>> > select * from csv_compressed;
>>>> > +--------------------+--------------------+----------------------+
>>>> > | a                  | b                  | c                    |
>>>> > +--------------------+--------------------+----------------------+
>>>> > | one - snappy       | two - snappy       | three - snappy       |
>>>> > | one - uncompressed | two - uncompressed | three - uncompressed |
>>>> > | abc - uncompressed | xyz - uncompressed | 123 - uncompressed   |
>>>> > | one - bz2          | two - bz2          | three - bz2          |
>>>> > | abc - bz2          | xyz - bz2          | 123 - bz2            |
>>>> > | one - gzip         | two - gzip         | three - gzip         |
>>>> > | abc - gzip         | xyz - gzip         | 123 - gzip           |
>>>> > +--------------------+--------------------+----------------------+
>>>> > $ hdfs dfs -ls 'hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/';
>>>> > ...truncated for readability...
>>>> > 75 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed.snappy
>>>> > 79 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_bz2.csv.bz2
>>>> > 80 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_gzip.csv.gz
>>>> > 116 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/dd414df64d67d49b_data.0.
>>>>
>>>>
>>>>
>>>> --
>>>> This message was sent by Atlassian JIRA
>>>> (v7.6.3#76005)
>>>>
>>>
>>>
>>
>
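
For anyone reading the thread later, here is a rough Python sketch of the suffix-based detection Tim describes for text files. It is only an illustration, not Impala's code: the function name guess_text_codec and the SUFFIX_TO_CODEC table are made up for this example, and the table covers only the three suffixes that appear in the listing quoted above.

```python
import posixpath

# Hypothetical mapping, limited to the suffixes shown in this thread.
# Not taken from Impala's source.
SUFFIX_TO_CODEC = {
    ".gz": "gzip",
    ".bz2": "bzip2",
    ".snappy": "snappy",
}


def guess_text_codec(path):
    """Return the codec implied by the file suffix, or None if there is no
    recognized suffix (treated here as an uncompressed text file)."""
    name = posixpath.basename(path)
    _, ext = posixpath.splitext(name)
    return SUFFIX_TO_CODEC.get(ext.lower())


if __name__ == "__main__":
    # File names copied from the hdfs dfs -ls output quoted in the thread.
    for name in (
        "csv_compressed.snappy",
        "csv_compressed_bz2.csv.bz2",
        "csv_compressed_gzip.csv.gz",
        "dd414df64d67d49b_data.0.",
    ):
        print(name, "->", guess_text_codec(name))
```

This only applies to text tables. As Philip notes, Parquet records the codec inside the file itself, so a Parquet file written with compression_codec=snappy will not carry a .snappy extension.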