Re: [jira] [Resolved] (IMPALA-6829) how to get compressed hdfs file using impala or hive

Sathishkumar Paramasivam Wed, 11 Apr 2018 11:20:00 -0700

Hi,

Thanks for your attention to this issue.


My question is: can we create compressed files with the .snappy, .gz, or
.bz2 extensions using Impala CREATE TABLE / INSERT statements?

If not, what does the set compression_codec=snappy statement do? Or is
creating compressed files in HDFS possible only in Hive, not in Impala?

Impala> set compression_codec=snappy;
Impala> create table test(a string) stored as parquet;
Impala> insert into test values('1');

I am setting the compression codec in Impala and inserting data into
text and Parquet tables, but I do not see a .snappy extension on the file
names in HDFS (hdfs_file_name.snappy). I am doing this in the Oracle
quickstart VM provided by Cloudera.


I could create compressed files in Hive, but I am trying to understand the
steps for doing the same in Impala. I know there are certain restrictions
on compression per file format, but I tried with Parquet only, which
supports all the compression codecs as well as CREATE and INSERT in Impala.

https://impala.apache.org/docs/build/html/topics/impala_file_formats.html

Please guide.
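
For the text-table case, the documentation example quoted at the bottom of
this thread builds the compressed files outside Impala and then places them
in the table directory. A minimal sketch of that step for .gz and .bz2,
using only the Python standard library. The function name, file names, and
rows are illustrative assumptions, and snappy is left out because it needs
a third-party library:

```python
import bz2
import gzip
import os
import tempfile

# Illustrative rows only; real data and file names are up to you.
ROWS = [("one", "two", "three"), ("abc", "xyz", "123")]


def write_compressed_csvs(out_dir, base_name="csv_compressed"):
    """Write the same CSV rows as .gz and .bz2 files and return their paths."""
    text = "".join(",".join(row) + "\n" for row in ROWS)
    data = text.encode("utf-8")
    paths = {}
    # Each opener produces a file in the format its suffix advertises.
    for suffix, opener in ((".csv.gz", gzip.open), (".csv.bz2", bz2.open)):
        path = os.path.join(out_dir, base_name + suffix)
        with opener(path, "wb") as f:
            f.write(data)
        paths[suffix] = path
    return paths


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    for suffix, path in write_compressed_csvs(out_dir).items():
        print(suffix, os.path.getsize(path), path)
```

The resulting files would then be copied into the table's HDFS directory
(for example with hdfs dfs -put) and the table refreshed in Impala before
querying.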

On 11 April 2018 at 12:33, Tim Armstrong <[email protected]> wrote:

> Hi,
>   If I understood correctly, the query is behaving as expected but you're
> wondering how it works, right?
>
> Impala detects the compression type based on the file suffix. We mention
> this in the docs in the "Using gzip, bzip2, or Snappy-Compressed Text
> Files" section:
> https://impala.apache.org/docs/build/html/topics/impala_txtfile.html
>
> - Tim
>
> On Mon, Apr 9, 2018 at 7:30 PM, Sathishkumar Paramasivam <
> [email protected]> wrote:
>
>>
>>
>> Pls help
>>
>> ---------- Forwarded message ---------
>> From: Tim Armstrong (JIRA) <[email protected]>
>> Date: Mon, Apr 9, 2018 at 7:18 PM
>> Subject: [jira] [Resolved] (IMPALA-6829) how to get compressed hdfs file
>> using impala or hive
>> To: <[email protected]>
>>
>>
>>
>>      [ https://issues.apache.org/jira/browse/IMPALA-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>
>> Tim Armstrong resolved IMPALA-6829.
>> -----------------------------------
>>     Resolution: Not A Bug
>>
>> We're happy to help you out with learning Impala, but it would be best to
>> have the discussion on the user list: [email protected]
>>
>> We mainly use JIRA for tracking changes we want to make to Impala, so
>> discussions with users tend to get lost here.
>>
>> > how to get compressed hdfs file using impala or hive
>> > ----------------------------------------------------
>> >
>> >                 Key: IMPALA-6829
>> >                 URL: https://issues.apache.org/jira/browse/IMPALA-6829
>> >             Project: IMPALA
>> >          Issue Type: Question
>> >            Reporter: sathishkumar paramasivam
>> >            Priority: Major
>> >
>> > Hi,
>> >
>> > I am learning Impala on my own and trying to enable compression for a
>> > table, but I do not see the HDFS file getting a compression extension.
>> > I am referring to
>> > [https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_txtfile.html]
>> > but I am not sure how the final compressed files are created.
>> > When I try Sqoop, I can get the compressed file. Please guide.
>> > create table csv_compressed (a string, b string, c string)
>> >   row format delimited fields terminated by ",";
>> > insert into csv_compressed values
>> >   ('one - uncompressed', 'two - uncompressed', 'three - uncompressed'),
>> >   ('abc - uncompressed', 'xyz - uncompressed', '123 - uncompressed');
>> > ...make equivalent .gz, .bz2, and .snappy files and load them into same table directory...
>> > select * from csv_compressed;
>> > +--------------------+--------------------+----------------------+
>> > | a                  | b                  | c                    |
>> > +--------------------+--------------------+----------------------+
>> > | one - snappy       | two - snappy       | three - snappy       |
>> > | one - uncompressed | two - uncompressed | three - uncompressed |
>> > | abc - uncompressed | xyz - uncompressed | 123 - uncompressed   |
>> > | one - bz2          | two - bz2          | three - bz2          |
>> > | abc - bz2          | xyz - bz2          | 123 - bz2            |
>> > | one - gzip         | two - gzip         | three - gzip         |
>> > | abc - gzip         | xyz - gzip         | 123 - gzip           |
>> > +--------------------+--------------------+----------------------+
>> > $ hdfs dfs -ls 'hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/';
>> > ...truncated for readability...
>> > 75 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed.snappy
>> > 79 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_bz2.csv.bz2
>> > 80 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_gzip.csv.gz
>> > 116 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/dd414df64d67d49b_data.0.
>>
>>
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v7.6.3#76005)
>>
>
>
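
Tim's point above, that the compression type of a text file is detected
from the file suffix, can be illustrated with a small sketch. This is not
Impala's actual implementation; the function name and the codec labels are
assumptions made for illustration, and only the three suffixes named in
this thread are covered:

```python
import os

# Suffixes named in this thread; the codec labels are just descriptive strings.
SUFFIX_TO_CODEC = {".gz": "gzip", ".bz2": "bzip2", ".snappy": "snappy"}


def codec_from_suffix(path):
    """Return the codec implied by the file suffix, or 'none' if unrecognized."""
    _, ext = os.path.splitext(path)
    return SUFFIX_TO_CODEC.get(ext.lower(), "none")


if __name__ == "__main__":
    # File names taken from the hdfs dfs -ls listing quoted above.
    for name in (
        "csv_compressed.snappy",
        "csv_compressed_bz2.csv.bz2",
        "csv_compressed_gzip.csv.gz",
        "dd414df64d67d49b_data.0.",
    ):
        print(name, "->", codec_from_suffix(name))
```

Under this model, a file without one of these suffixes, such as the one
written by the INSERT statement in the quoted example, is read as
uncompressed text.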
