Re: Saving parquet table as uncompressed with write.mode("overwrite").

2016-07-03 Thread Mich Talebzadeh
I checked: the default is gzip.
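One way to confirm this yourself is to ask the session for the value currently in effect. This is only a minimal sketch, assuming a Spark 1.6 spark-shell where `sqlContext` is predefined and the key has not been changed earlier in the session. The `Try` wrapper is a precaution, on the assumption that some versions may throw if no value is known for the key.

```scala
import scala.util.Try

// Assumption: running inside spark-shell, so `sqlContext` already exists.
val key = "spark.sql.parquet.compression.codec"

// getConf(key) returns the explicitly set value if there is one; otherwise
// Spark is expected to fall back to its built-in default for this key.
val effectiveCodec: Try[String] = Try(sqlContext.getConf(key))

println(s"$key -> ${effectiveCodec.getOrElse("<no value reported>")}")
```

The printed value depends on the Spark version and on any earlier setConf calls in the same session.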



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.





Re: Saving parquet table as uncompressed with write.mode("overwrite").

2016-07-03 Thread Mich Talebzadeh
Thanks Ted, that was it :)

scala> val c = sqlContext.setConf("spark.sql.parquet.compression.codec",
"uncompressed")
c: Unit = ()
scala> val s4 = s.write.mode("overwrite").parquet("/user/hduser/sales4")
s4: Unit = ()
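Note that setConf changes the codec for every later Parquet write in the same session, not just this one. If that is not wanted, one option is to remember the previous value and put it back afterwards. This is a sketch under the same assumptions as the session above (Spark 1.6 spark-shell, `sqlContext` predefined, `s` being the DataFrame read earlier). The names `key`, `outputPath` and `previousCodec` are mine.

```scala
// Assumptions: spark-shell (Spark 1.6), `sqlContext` predefined, and `s` is
// the DataFrame read earlier in this thread.
val key = "spark.sql.parquet.compression.codec"
val outputPath = "/user/hduser/sales4"

// Remember whatever codec is currently in effect so it can be put back.
val previousCodec = sqlContext.getConf(key)

try {
  sqlContext.setConf(key, "uncompressed")
  s.write.mode("overwrite").parquet(outputPath)
} finally {
  // Restore the earlier value even if the write fails.
  sqlContext.setConf(key, previousCodec)
}
```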


Before:

-rw-r--r--   2 hduser supergroup  17487 2016-07-03 22:28 /user/hduser/sales4/part-r-00199-9dcd4fb8-148d-48ba-9da3-8d68aa24aa5c.gz.parquet

After:

hduser@rhes564:: :/home/hduser/dba/bin/sales> hdfs dfs -ls /user/hduser/sales4
-rw-r--r--   2 hduser supergroup  40190 2016-07-03 23:23 /user/hduser/sales4/part-r-0-19100306-f3d6-44fb-8bde-55307101cf3f.parquet
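The same check can be done from inside spark-shell instead of with hdfs dfs -ls. This is a rough sketch, assuming `sc` is the predefined SparkContext, the output directory already exists, and gzip-compressed part files carry a `.gz.parquet` suffix as in the listing above. `names` and `gzipped` are illustrative names.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Assumption: spark-shell, so `sc` (the SparkContext) is predefined.
val fs = FileSystem.get(sc.hadoopConfiguration)

// Collect the names of the Parquet part files in the output directory.
val names = fs
  .listStatus(new Path("/user/hduser/sales4"))
  .map(_.getPath.getName)
  .filter(_.endsWith(".parquet"))

names.foreach(println)

// Count the part files whose name indicates gzip compression.
val gzipped = names.count(_.endsWith(".gz.parquet"))
println(s"$gzipped of ${names.length} part files look gzip-compressed")
```

This only looks at file names, so it is a quick sanity check rather than proof of the codec recorded inside the files.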


Now the question is: if you do not specify the compression codec with
setConf, does it default to gzip compression?

val s4 = s.write.mode("overwrite").parquet("/user/hduser/sales4")


Cheers

Dr Mich Talebzadeh








Re: Saving parquet table as uncompressed with write.mode("overwrite").

2016-07-03 Thread Ted Yu
Have you tried the following (note the extraneous dot in your config name)?

val c = sqlContext.setConf("spark.sql.parquet.compression.codec", "none")

Also, parquet() has a compression parameter, which defaults to None.

FYI
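A `compression` parameter defaulting to `None` looks like the PySpark signature of parquet(); that is my reading, not something stated in this thread. In Scala, the closest equivalent I know of is the `compression` option on the writer. This sketch assumes Spark 2.0 or later, since the Spark 1.6 shell used in this thread may not honour the option, and assumes `s` is the DataFrame from the original question.

```scala
// Assumption: Spark 2.0 or later, where DataFrameWriter accepts a
// "compression" option for Parquet output. `s` is the DataFrame from the thread.
s.write
  .mode("overwrite")
  .option("compression", "uncompressed") // applies to this write only
  .parquet("/user/hduser/sales4")
```

Unlike setConf, the option is scoped to this single write and leaves the session-wide setting untouched.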



Saving parquet table as uncompressed with write.mode("overwrite").

2016-07-03 Thread Mich Talebzadeh
Hi,

I simply read a Parquet table

scala> val s = sqlContext.read.parquet("oraclehadoop.sales2")
s: org.apache.spark.sql.DataFrame = [prod_id: bigint, cust_id: bigint,
time_id: timestamp, channel_id: bigint, promo_id: bigint, quantity_sold:
decimal(10,0), amount_sold: decimal(10,0)]

Now all I want is to save the data uncompressed. By default, Spark saves
the table *gzipped*:

val s4 = s.write.mode("overwrite").parquet("/user/hduser/sales4")

However, I want to use this approach without explicitly creating the table
myself with sqlContext etc.

This does not seem to work:

val c = sqlContext.setConf("spark.sql.parquet.compression.codec.",
"uncompressed")

Can I do this through a method on the DataFrame "s" above, so that the
table is saved uncompressed?

Thanks,

Dr Mich Talebzadeh


