Re: SparkSQL Hive orc snappy table

2015-12-30 Thread Dawid Wysakowicz
I do understand that Snappy is not splittable as such, but ORCFile is. In ORC, blocks are compressed with Snappy, so there should be no problem with it. Anyway, ZLIB (used by default in both ORC and Parquet) is also not splittable, but it works perfectly fine.

2015-12-30 16:26 GMT+01:00 Chris Fregly :
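
To illustrate the point about block-level compression, here is a minimal sketch of why a container format can stay splittable even when its codec is not. It is not the ORC format itself: the helper names `write_blocks` and `read_block` are hypothetical, and zlib from the Python standard library stands in for Snappy. Each block is compressed on its own and its offset is recorded in an index, so a reader can jump straight to any block without decompressing the ones before it.

```python
import zlib


def write_blocks(records, block_size):
    """Compress records in independent blocks; return (payload, index)."""
    payload = bytearray()
    index = []  # one (offset, compressed_length) entry per block
    for start in range(0, len(records), block_size):
        raw = "\n".join(records[start:start + block_size]).encode("utf-8")
        compressed = zlib.compress(raw)
        index.append((len(payload), len(compressed)))
        payload.extend(compressed)
    return bytes(payload), index


def read_block(payload, index, block_no):
    """Decompress a single block using only its index entry."""
    offset, length = index[block_no]
    raw = zlib.decompress(payload[offset:offset + length])
    return raw.decode("utf-8").split("\n")


records = [f"row-{i}" for i in range(10)]
payload, index = write_blocks(records, block_size=4)

# Each block could be handed to a different task; here only the last one is read.
last_block = read_block(payload, index, 2)
```

A file compressed as one single stream has no such index, which is why only one task can read it. With per-block compression the unit of parallelism is the block, not the codec.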

Re: SparkSQL Hive orc snappy table

2015-12-30 Thread Chris Fregly
Reminder that Snappy is not a splittable format. I've had success with Hive + LZF (splittable) and bzip2 (also splittable). Gzip is also not splittable, so you won't be utilizing your cluster to process this data in parallel, as only one task can read and process unsplittable data - versus many task

Re: SparkSQL Hive orc snappy table

2015-12-30 Thread Dawid Wysakowicz
Hasn't anyone used Spark with ORC and Snappy compression?

2015-12-29 18:25 GMT+01:00 Dawid Wysakowicz :
> Hi,
>
> I have a table in hive stored as orc with compression = snappy. I try to
> execute a query on that table that fails (previously I run it on table in
> orc-zlib format and parquet so i