Re: [jira] [Resolved] (IMPALA-6829) how to get compressed hdfs file using impala or hive

Alexander Behm Wed, 11 Apr 2018 16:54:07 -0700

You can control the compression for Impala INSERTS with this query option:
set compression_codec=gzip;
<do your insert here>


or
set compression_codec=snappy;
<do your insert here>

Impala uses snappy by default when inserting into Parquet.
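
To make the sequence concrete, here is a minimal sketch of a session. It assumes a scratch database and two hypothetical table names (parquet_gzip and parquet_none), neither of which comes from this thread. The query option applies to the INSERT statements that follow it in the same session. The codec is recorded inside the Parquet file rather than in its name, so the file listing should not be expected to show a .gz or .snappy suffix.

```sql
-- Hypothetical table names, used only for illustration.
create table parquet_gzip (a string) stored as parquet;
create table parquet_none (a string) stored as parquet;

-- Write gzip-compressed Parquet data.
set compression_codec=gzip;
insert into parquet_gzip values ('1');

-- Write uncompressed Parquet data for comparison.
set compression_codec=none;
insert into parquet_none values ('1');

-- Go back to snappy, the default for Parquet inserts.
set compression_codec=snappy;

-- List the data files that back each table.
show files in parquet_gzip;
show files in parquet_none;
```

To confirm which codec a particular file uses, inspect its metadata with a tool such as parquet-tools, as Philip suggests in the quoted message below.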



On Wed, Apr 11, 2018 at 12:03 PM, Philip Zeyliger <phi...@cloudera.com>
wrote:

> Parquet compresses files within its own format, using a variety of codecs.
> You shouldn't expect to see the Parquet compression expressed in the
> filename. You may be able to use parquet-tools (
> http://kitesdk.org/docs/0.17.1/labs/4-using-parquet-tools.html) to get
> metadata about a Parquet file, including how it's compressed.
>
> -- Philip
>
> On Wed, Apr 11, 2018 at 11:18 AM, Sathishkumar Paramasivam <
> kumar.sathish...@gmail.com> wrote:
>
>> Hi,
>>
>> Thanks for your attention to this issue.
>>
>> My question is: can we create compressed files with .snappy, .gz, or .bz2
>> extensions using an Impala CREATE TABLE/INSERT statement?
>>
>> If not, then what does the set compression_codec=snappy statement do? Or
>> is it not possible in Impala, but only in Hive, to create compressed files
>> in HDFS?
>>
>> Impala> set compression_codec=snappy;
>> Impala> create table test(a string) stored as parquet;
>> Impala> insert into test values('1');
>>
>> I am setting compression in Impala and inserting data into a text/Parquet
>> table, but I am not able to see the *.snappy* file extension on the file
>> name in HDFS. I am doing this in the Oracle quickstart VM provided by
>> Cloudera.
>>
>>
>> I could create a compressed file in Hive, but I am trying to understand the
>> steps for doing the same in Impala. I know there are certain restrictions
>> on compression/file format combinations, but I tried with Parquet only,
>> which supports all compression codecs as well as CREATE and INSERT in
>> Impala.
>>
>> https://impala.apache.org/docs/build/html/topics/impala_file_formats.html
>>
>> Please guide.
>>
>> On 11 April 2018 at 12:33, Tim Armstrong <tarmstr...@cloudera.com> wrote:
>>
>>> Hi,
>>>   If I understood correctly, the query is behaving as expected but
>>> you're wondering how it works, right?
>>>
>>> Impala detects the compression type based on the file suffix. We mention
>>> this in the docs in the "Using gzip, bzip2, or Snappy-Compressed Text
>>> Files" section:
>>> https://impala.apache.org/docs/build/html/topics/impala_txtfile.html
>>>
>>> - Tim
>>>
>>> On Mon, Apr 9, 2018 at 7:30 PM, Sathishkumar Paramasivam <
>>> kumar.sathish...@gmail.com> wrote:
>>>
>>>>
>>>>
>>>> Please help.
>>>>
>>>> ---------- Forwarded message ---------
>>>> From: Tim Armstrong (JIRA) <j...@apache.org>
>>>> Date: Mon, Apr 9, 2018 at 7:18 PM
>>>> Subject: [jira] [Resolved] (IMPALA-6829) how to get compressed hdfs
>>>> file using impala or hive
>>>> To: <kumar.sathish...@gmail.com>
>>>>
>>>>
>>>>
>>>>      [ https://issues.apache.org/jira/browse/IMPALA-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>>>
>>>> Tim Armstrong resolved IMPALA-6829.
>>>> -----------------------------------
>>>>     Resolution: Not A Bug
>>>>
>>>> We're happy to help you out with learning Impala, but it would be best
>>>> to have the discussion on the user list: user@impala.apache.org
>>>>
>>>> We mainly use JIRA for tracking changes we want to make to Impala, so
>>>> discussions with users tend to get lost here.
>>>>
>>>> > how to get compressed hdfs file using impala or hive
>>>> > ----------------------------------------------------
>>>> >
>>>> >                 Key: IMPALA-6829
>>>> >                 URL: https://issues.apache.org/jira/browse/IMPALA-6829
>>>> >             Project: IMPALA
>>>> >          Issue Type: Question
>>>> >            Reporter: sathishkumar paramasivam
>>>> >            Priority: Major
>>>> >
>>>> > Hi,
>>>> >
>>>> > I am learning Impala on my own and trying to enable compression for a
>>>> > table, but I do not see the HDFS file getting a compression extension.
>>>> > I am referring to
>>>> > [https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_txtfile.html]
>>>> > but I am not sure how the final compressed files are created.
>>>> > When I try Sqoop, I can get the compressed file. Please guide.
>>>> > create table csv_compressed (a string, b string, c string)
>>>> >   row format delimited fields terminated by ",";
>>>> > insert into csv_compressed values
>>>> >   ('one - uncompressed', 'two - uncompressed', 'three - uncompressed'),
>>>> >   ('abc - uncompressed', 'xyz - uncompressed', '123 - uncompressed');
>>>> > ...make equivalent .gz, .bz2, and .snappy files and load them into same table directory...
>>>> > select * from csv_compressed;
>>>> > +--------------------+--------------------+----------------------+
>>>> > | a                  | b                  | c                    |
>>>> > +--------------------+--------------------+----------------------+
>>>> > | one - snappy       | two - snappy       | three - snappy       |
>>>> > | one - uncompressed | two - uncompressed | three - uncompressed |
>>>> > | abc - uncompressed | xyz - uncompressed | 123 - uncompressed   |
>>>> > | one - bz2          | two - bz2          | three - bz2          |
>>>> > | abc - bz2          | xyz - bz2          | 123 - bz2            |
>>>> > | one - gzip         | two - gzip         | three - gzip         |
>>>> > | abc - gzip         | xyz - gzip         | 123 - gzip           |
>>>> > +--------------------+--------------------+----------------------+
>>>> > $ hdfs dfs -ls 'hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/';
>>>> > ...truncated for readability...
>>>> > 75 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed.snappy
>>>> > 79 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_bz2.csv.bz2
>>>> > 80 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_gzip.csv.gz
>>>> > 116 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/dd414df64d67d49b_data.0.
>>>>
>>>>
>>>>
>>>> --
>>>> This message was sent by Atlassian JIRA
>>>> (v7.6.3#76005)
>>>>
>>>
>>>
>>
>
