Hi,
Thanks for your attention to this issue.
My question is: can we create compressed files (.snappy, .gz, .bz2) using
an Impala CREATE TABLE/INSERT statement?
If not, then what about the set compression_codec=snappy statement? Or is
this not possible in Impala, and only Hive can create compressed files in
HDFS?
Impala> set compression_codec=snappy;
Impala> create table test(a string) stored as parquet;
Impala> insert into test values('1');
I am setting compression in Impala and inserting data into a text/Parquet
table, but I am not able to see the hdfs_file_name*.snappy* file extension
in HDFS. I am doing this in the Oracle quickstart VM provided by Cloudera.
I could create a compressed file in Hive, but I am trying to understand the
steps to do the same in Impala. I know there are certain restrictions on
compression/file format combinations, but I tried with Parquet only, which
supports all the compression codecs as well as CREATE and INSERT in Impala.
https://impala.apache.org/docs/build/html/topics/impala_file_formats.html
Please guide.
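
For reference, here is a minimal sketch of the approach described in the
documentation excerpt quoted further down: the compressed text files are
produced outside Impala and then copied into the table directory. The
scratch directory, the local file names, and the commented-out upload and
refresh commands are assumptions for illustration and were not run on the
cluster. Snappy is left out because this sketch only relies on the common
gzip and bzip2 command-line tools.

```shell
#!/bin/sh
# Sketch: build compressed copies of a small CSV outside Impala.
set -eu

workdir=$(mktemp -d)   # scratch directory (hypothetical location)
cd "$workdir"

# Two sample rows in the same three-column layout as csv_compressed.
printf '%s\n' \
  'one - gzip,two - gzip,three - gzip' \
  'abc - gzip,xyz - gzip,123 - gzip' > csv_compressed_gzip.csv

# gzip replaces the input file with csv_compressed_gzip.csv.gz
gzip csv_compressed_gzip.csv

# Same idea with bzip2, only if the tool is installed on this machine.
if command -v bzip2 >/dev/null 2>&1; then
  printf '%s\n' \
    'one - bz2,two - bz2,three - bz2' \
    'abc - bz2,xyz - bz2,123 - bz2' > csv_compressed_bz2.csv
  bzip2 csv_compressed_bz2.csv
fi

ls -l "$workdir"

# Assumed follow-up on the cluster (not executed here):
#   hdfs dfs -put csv_compressed_gzip.csv.gz \
#     /user/hive/warehouse/file_formats.db/csv_compressed/
#   impala-shell -q 'refresh file_formats.csv_compressed'
```

The file suffix (.gz, .bz2) is what matters after the upload, since that
is what the reply below says Impala uses to detect the compression type.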
On 11 April 2018 at 12:33, Tim Armstrong <[email protected]> wrote:
> Hi,
> If I understood correctly, the query is behaving as expected but you're
> wondering how it works, right?
>
> Impala detects the compression type based on the file suffix. We mention
> this in the docs in the "Using gzip, bzip2, or Snappy-Compressed Text
> Files" section:
> https://impala.apache.org/docs/build/html/topics/impala_txtfile.html
>
> - Tim
>
> On Mon, Apr 9, 2018 at 7:30 PM, Sathishkumar Paramasivam <
> [email protected]> wrote:
>
>>
>>
>> Pls help
>>
>> ---------- Forwarded message ---------
>> From: Tim Armstrong (JIRA) <[email protected]>
>> Date: Mon, Apr 9, 2018 at 7:18 PM
>> Subject: [jira] [Resolved] (IMPALA-6829) how to get compressed hdfs file
>> using impala or hive
>> To: <[email protected]>
>>
>>
>>
>> [ https://issues.apache.org/jira/browse/IMPALA-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>
>> Tim Armstrong resolved IMPALA-6829.
>> -----------------------------------
>> Resolution: Not A Bug
>>
>> We're happy to help you out with learning Impala, but it would be best to
>> have the discussion on the user list: [email protected]
>>
>> We mainly use JIRA for tracking changes we want to make to Impala, so
>> discussions with users tend to get lost here.
>>
>> > how to get compressed hdfs file using impala or hive
>> > ----------------------------------------------------
>> >
>> > Key: IMPALA-6829
>> > URL: https://issues.apache.org/jira/browse/IMPALA-6829
>> > Project: IMPALA
>> > Issue Type: Question
>> > Reporter: sathishkumar paramasivam
>> > Priority: Major
>> >
>> > hi,
>> >
>> > I am learning Impala on my own and trying to enable compression for a
>> > table, but I do not see the HDFS file getting a compression extension.
>> > I am referring to
>> > [https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_txtfile.html]
>> > but I am not sure how the final compressed files are created.
>> > When I use Sqoop, I do get a compressed file. Please guide.
>> > create table csv_compressed (a string, b string, c string)
>> > row format delimited fields terminated by ",";
>> > insert into csv_compressed values
>> > ('one - uncompressed', 'two - uncompressed', 'three - uncompressed'),
>> > ('abc - uncompressed', 'xyz - uncompressed', '123 - uncompressed');
>> > ...make equivalent .gz, .bz2, and .snappy files and load them into same
>> > table directory...
>> > select * from csv_compressed;
>> > +--------------------+--------------------+----------------------+
>> > | a                  | b                  | c                    |
>> > +--------------------+--------------------+----------------------+
>> > | one - snappy       | two - snappy       | three - snappy       |
>> > | one - uncompressed | two - uncompressed | three - uncompressed |
>> > | abc - uncompressed | xyz - uncompressed | 123 - uncompressed   |
>> > | one - bz2          | two - bz2          | three - bz2          |
>> > | abc - bz2          | xyz - bz2          | 123 - bz2            |
>> > | one - gzip         | two - gzip         | three - gzip         |
>> > | abc - gzip         | xyz - gzip         | 123 - gzip           |
>> > +--------------------+--------------------+----------------------+
>> > $ hdfs dfs -ls 'hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/';
>> > ...truncated for readability...
>> > 75 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed.snappy
>> > 79 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_bz2.csv.bz2
>> > 80 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_gzip.csv.gz
>> > 116 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/dd414df64d67d49b_data.0.
>>
>>
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v7.6.3#76005)
>>
>
>