Re: [jira] [Resolved] (IMPALA-6829) how to get compressed hdfs file using impala or hive

2018-04-11 Thread Alexander Behm
You can control the compression for Impala INSERTs with this query option:

set compression_codec=gzip;

or

set compression_codec=snappy;

Impala uses Snappy by default when inserting into Parquet.
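Why setting the option never produces a file named *.snappy: the codec is a write-time setting recorded inside the data file, not in its name. Python's standard library has neither Parquet nor Snappy support, so the sketch below uses zipfile purely as an analogy. zipfile is a container format that, like Parquet, stores per-entry codec information in its own metadata. The member name data_0 and the codec labels are hypothetical; nothing here is Impala's or Parquet's API.

```python
import io
import zipfile

# Analogy only: zipfile is NOT Parquet. As with Impala's compression_codec
# query option, the codec is chosen at write time and recorded in the
# container's own metadata; the stored name never changes.
CODECS = {
    "none": zipfile.ZIP_STORED,
    "deflate": zipfile.ZIP_DEFLATED,  # DEFLATE, the algorithm inside gzip
    "bzip2": zipfile.ZIP_BZIP2,
}
MEMBER = "data_0"  # hypothetical name, identical for every codec


def write_with_codec(rows, codec):
    """Return an in-memory archive whose single member uses `codec`."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=CODECS[codec]) as zf:
        zf.writestr(MEMBER, "\n".join(rows))
    return buf.getvalue()


def codec_from_metadata(blob):
    """Read the codec back from the archive's metadata, not from any name."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        info = zf.getinfo(MEMBER)
        return info.filename, info.compress_type


if __name__ == "__main__":
    rows = ["1"] * 1000
    for codec in CODECS:
        blob = write_with_codec(rows, codec)
        print(codec, codec_from_metadata(blob), len(blob))
```

Each run prints the same member name; only compress_type (read from metadata) and the archive size change. For Parquet, reading that metadata is the job a tool such as parquet-tools does.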



On Wed, Apr 11, 2018 at 12:03 PM, Philip Zeyliger wrote:

> Parquet compresses files within its own format, using a variety of codecs.
> You shouldn't expect to see the Parquet compression expressed in the
> filename. You may be able to use parquet-tools
> (http://kitesdk.org/docs/0.17.1/labs/4-using-parquet-tools.html) to get
> metadata about a Parquet file, including how it's compressed.
>
> -- Philip
>

Re: [jira] [Resolved] (IMPALA-6829) how to get compressed hdfs file using impala or hive

2018-04-11 Thread Philip Zeyliger
Parquet compresses files within its own format, using a variety of codecs.
You shouldn't expect to see the Parquet compression expressed in the
filename. You may be able to use parquet-tools
(http://kitesdk.org/docs/0.17.1/labs/4-using-parquet-tools.html) to get
metadata about a Parquet file, including how it's compressed.

-- Philip
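Where that metadata lives: according to the Parquet format specification, a file starts with the 4-byte magic PAR1 and ends with a Thrift-encoded FileMetaData footer, a 4-byte little-endian footer length, and PAR1 again. Each column chunk's codec is recorded in that footer, which is why neither the file name nor the leading bytes reveal it. The stdlib-only sketch below merely locates the raw footer; decoding the Thrift structure is left to a real tool such as parquet-tools (for example its meta command). The function name parquet_footer is hypothetical.

```python
import struct

MAGIC = b"PAR1"  # Parquet files begin and end with these four bytes


def parquet_footer(blob: bytes) -> bytes:
    """Return the raw, still Thrift-encoded FileMetaData footer.

    Layout: PAR1 | column chunks | footer | footer length (4 bytes, LE) | PAR1
    The per-column-chunk codec lives inside the returned footer bytes.
    """
    if len(blob) < 12 or blob[:4] != MAGIC or blob[-4:] != MAGIC:
        raise ValueError("not a Parquet file")
    (footer_len,) = struct.unpack("<I", blob[-8:-4])
    start = len(blob) - 8 - footer_len
    if start < 4:
        raise ValueError("footer length points outside the file")
    return blob[start:-8]
```

For a small file copied out of HDFS with hdfs dfs -get, passing open(path, "rb").read() is enough; for large files, seeking to the last 8 bytes first would avoid reading everything.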

On Wed, Apr 11, 2018 at 11:18 AM, Sathishkumar Paramasivam <
kumar.sathish...@gmail.com> wrote:

> Hi,
>
> Thanks for your attention to this issue.
>
> My question is: can we create compressed files with .snappy, .gz, or .bz2
> extensions using Impala CREATE TABLE/INSERT statements?
>
> If not, then what does the set compression_codec=snappy statement do? Or
> is it not possible to create compressed files in HDFS with Impala, only
> with Hive?
>
> Impala> set compression_codec=snappy;
> Impala> create table test(a string) stored as parquet;
> Impala> insert into test values('1');
>
> I am setting compression in Impala and inserting data into text and
> Parquet tables, but I don't see an hdfs_file_name.snappy file extension
> in HDFS. I am doing this in the Oracle QuickStart VM provided by Cloudera.
>
> I could create compressed files in Hive, but I am trying to understand the
> equivalent steps in Impala. I know there are certain restrictions on
> compression/file format combinations, but I tried only with Parquet, which
> supports all compression types, and did the CREATE and INSERT in Impala.
>
> https://impala.apache.org/docs/build/html/topics/impala_file_formats.html
>
> Please guide.
>

Re: [jira] [Resolved] (IMPALA-6829) how to get compressed hdfs file using impala or hive

2018-04-11 Thread Sathishkumar Paramasivam
Hi,

Thanks for your attention to this issue.

My question is: can we create compressed files with .snappy, .gz, or .bz2
extensions using Impala CREATE TABLE/INSERT statements?

If not, then what does the set compression_codec=snappy statement do? Or is
it not possible to create compressed files in HDFS with Impala, only with
Hive?

Impala> set compression_codec=snappy;
Impala> create table test(a string) stored as parquet;
Impala> insert into test values('1');

I am setting compression in Impala and inserting data into text and Parquet
tables, but I don't see an hdfs_file_name.snappy file extension in HDFS.
I am doing this in the Oracle QuickStart VM provided by Cloudera.

I could create compressed files in Hive, but I am trying to understand the
equivalent steps in Impala. I know there are certain restrictions on
compression/file format combinations, but I tried only with Parquet, which
supports all compression types, and did the CREATE and INSERT in Impala.

https://impala.apache.org/docs/build/html/topics/impala_file_formats.html

Please guide.

On 11 April 2018 at 12:33, Tim Armstrong wrote:

> Hi,
>   If I understood correctly, the query is behaving as expected but you're
> wondering how it works, right?
>
> Impala detects the compression type based on the file suffix. We mention
> this in the docs in the "Using gzip, bzip2, or Snappy-Compressed Text
> Files" section:
> https://impala.apache.org/docs/build/html/topics/impala_txtfile.html
>
> - Tim
>


Re: [jira] [Resolved] (IMPALA-6829) how to get compressed hdfs file using impala or hive

2018-04-11 Thread Tim Armstrong
Hi,
  If I understood correctly, the query is behaving as expected but you're
wondering how it works, right?

Impala detects the compression type based on the file suffix. We mention
this in the docs in the "Using gzip, bzip2, or Snappy-Compressed Text
Files" section:
https://impala.apache.org/docs/build/html/topics/impala_txtfile.html

- Tim
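Tim's point can be made concrete with a small sketch of suffix-based detection. The mapping below is hypothetical and covers only the suffixes that appear in this thread; Impala's actual lookup lives in its own code, and its list may differ between versions. Only the name is inspected, so a file without a recognized suffix is treated as uncompressed regardless of its contents.

```python
from pathlib import PurePosixPath

# Hypothetical suffix table, limited to the files discussed in this thread.
SUFFIX_TO_CODEC = {
    ".gz": "gzip",
    ".bz2": "bzip2",
    ".snappy": "snappy",
}


def codec_for(path: str) -> str:
    """Guess a text file's codec from its last suffix alone.

    The file contents are never read; an unknown or missing suffix
    means "none" (plain text).
    """
    return SUFFIX_TO_CODEC.get(PurePosixPath(path).suffix, "none")


if __name__ == "__main__":
    table_dir = "/user/hive/warehouse/file_formats.db/csv_compressed/"
    for name in ("csv_compressed.snappy", "csv_compressed_bz2.csv.bz2",
                 "csv_compressed_gzip.csv.gz", "plain.csv"):
        # Prints snappy, bzip2, gzip, none in that order.
        print(name, "->", codec_for(table_dir + name))
```

This is also why a Parquet data file written by Impala carries no codec suffix: as Philip notes, Parquet records its compression inside the file, so there is nothing for a name-based check to find.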

On Mon, Apr 9, 2018 at 7:30 PM, Sathishkumar Paramasivam <
kumar.sathish...@gmail.com> wrote:

>
>
> Pls help
>
> -- Forwarded message -
> From: Tim Armstrong (JIRA) 
> Date: Mon, Apr 9, 2018 at 7:18 PM
> Subject: [jira] [Resolved] (IMPALA-6829) how to get compressed hdfs file
> using impala or hive
> To: 
>
>
>
>  [ https://issues.apache.org/jira/browse/IMPALA-6829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Tim Armstrong resolved IMPALA-6829.
> ---
> Resolution: Not A Bug
>
> We're happy to help you out with learning Impala, but it would be best to
> have the discussion on the user list: user@impala.apache.org
>
> We mainly use JIRA for tracking changes we want to make to Impala, so
> discussions with users tend to get lost here.
>
> > how to get compressed hdfs file using impala or hive
> >
> >         Key: IMPALA-6829
> >         URL: https://issues.apache.org/jira/browse/IMPALA-6829
> >     Project: IMPALA
> >  Issue Type: Question
> >    Reporter: sathishkumar paramasivam
> >    Priority: Major
> >
> >
> > Hi,
> >
> > I am learning Impala on my own and trying to enable compression for a
> > table, but I could not see the HDFS file getting the compression
> > extension.
> > Referring to
> > [https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_txtfile.html],
> > I am not sure how the final compressed files are created.
> > When I try Sqoop, I do get the compressed file. Please guide.
> > create table csv_compressed (a string, b string, c string)
> >   row format delimited fields terminated by ",";
> > insert into csv_compressed values
> >   ('one - uncompressed', 'two - uncompressed', 'three - uncompressed'),
> >   ('abc - uncompressed', 'xyz - uncompressed', '123 - uncompressed');
> > ...make equivalent .gz, .bz2, and .snappy files and load them into same table directory...
> > select * from csv_compressed;
> > +--------------------+--------------------+----------------------+
> > | a                  | b                  | c                    |
> > +--------------------+--------------------+----------------------+
> > | one - snappy       | two - snappy       | three - snappy       |
> > | one - uncompressed | two - uncompressed | three - uncompressed |
> > | abc - uncompressed | xyz - uncompressed | 123 - uncompressed   |
> > | one - bz2          | two - bz2          | three - bz2          |
> > | abc - bz2          | xyz - bz2          | 123 - bz2            |
> > | one - gzip         | two - gzip         | three - gzip         |
> > | abc - gzip         | xyz - gzip         | 123 - gzip           |
> > +--------------------+--------------------+----------------------+
> > $ hdfs dfs -ls 'hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/';
> > ...truncated for readability...
> > 75 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed.snappy
> > 79 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_bz2.csv.bz2
> > 80 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/csv_compressed_gzip.csv.gz
> > 116 hdfs://127.0.0.1:8020/user/hive/warehouse/file_formats.db/csv_compressed/dd414df64d67d49b_data.0.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>
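The JIRA example skips the step that creates the compressed copies ("...make equivalent .gz, .bz2, and .snappy files..."). As the Impala file-format documentation describes it, compressed text is loaded through LOAD DATA, Hive, or manual HDFS copies rather than written by an Impala INSERT. A minimal stdlib sketch of the .gz and .bz2 part follows; the function name, stem, and rows are hypothetical. Snappy is deliberately left out: the standard library has no Snappy codec, and Hadoop's .snappy text files use Hadoop's own block framing, so output from a generic Snappy tool may not be readable.

```python
import bz2
import gzip
from pathlib import Path


def write_compressed_csv(rows, out_dir: Path, stem: str) -> list[Path]:
    """Write the same comma-delimited rows as <stem>.csv.gz and <stem>.csv.bz2.

    The suffixes matter: Impala uses them to pick the codec when it reads
    the files in a text table's directory.
    """
    data = "".join(",".join(row) + "\n" for row in rows).encode("utf-8")
    written = []
    for suffix, opener in ((".csv.gz", gzip.open), (".csv.bz2", bz2.open)):
        path = out_dir / f"{stem}{suffix}"
        with opener(path, "wb") as fh:  # both openers accept path-like objects
            fh.write(data)
        written.append(path)
    return written


if __name__ == "__main__":
    import tempfile

    rows = [("one", "two", "three"), ("abc", "xyz", "123")]
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_compressed_csv(rows, Path(tmp), "csv_compressed"):
            print(path.name, path.stat().st_size, "bytes")
```

The resulting files could then be copied into the table directory with hdfs dfs -put, followed by REFRESH csv_compressed in impala-shell so Impala picks up the new files.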