I do understand that Snappy is not splittable as such, but ORCFile is. In
ORC, blocks are compressed with Snappy, so there should be no problem with it.
Anyway, ZLIB (used by default in both ORC and Parquet) is also not splittable,
but it works perfectly fine.
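To illustrate the point that a container format can be splittable even when its codec is not, here is a minimal sketch. It compresses fixed-size groups of records independently and keeps an offset index, so any one block can be decompressed on its own, the way separate tasks could each read their own part of a file. Assumptions: Python's standard library has no Snappy binding, so `zlib` stands in for the codec, and `write_blocks` / `read_block` are hypothetical helpers. ORC's real layout (stripes, streams, a footer index) is considerably more involved.

```python
import zlib

def write_blocks(records, block_size):
    """Compress groups of records independently (hypothetical helper).

    zlib stands in for Snappy here; the point is that each block is its own
    compressed unit, not one stream spanning the whole file.
    Returns the concatenated bytes plus an (offset, length) index per block.
    """
    blob = bytearray()
    index = []
    for start in range(0, len(records), block_size):
        chunk = "\n".join(records[start:start + block_size]).encode("utf-8")
        compressed = zlib.compress(chunk)
        index.append((len(blob), len(compressed)))
        blob.extend(compressed)
    return bytes(blob), index

def read_block(blob, index, i):
    """Decompress block i alone, without touching any earlier block."""
    offset, length = index[i]
    return zlib.decompress(blob[offset:offset + length]).decode("utf-8").split("\n")

records = [f"row-{n}" for n in range(10)]
blob, index = write_blocks(records, block_size=4)
print(len(index))                  # 3 blocks: rows 0-3, 4-7, 8-9
print(read_block(blob, index, 2))  # ['row-8', 'row-9']
```

A whole-file zlib or gzip stream would have no such index, so a reader could only start at byte 0.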
2015-12-30 16:26 GMT+01:00 Chris Fregly :
Reminder that Snappy is not a splittable format.
I've had success with Hive + LZF (splittable) and bzip2 (also splittable).
Gzip is also not splittable, so you won't be using your cluster to
process this data in parallel: only one task can read and process
unsplittable data, versus many tasks for splittable data.
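The parallelism argument above comes down to how many input splits, and therefore tasks, a file produces. The sketch below uses a hypothetical `count_splits` helper, loosely modelled on Hadoop-style input formats. It deliberately ignores details such as minimum/maximum split-size settings and only shows the contrast between a splittable and an unsplittable file.

```python
import math

def count_splits(file_size, split_size, splittable):
    """Hypothetical helper: number of input splits (parallel tasks) for one file."""
    if file_size == 0:
        return 0
    if not splittable:
        # e.g. a single whole-file gzip stream must be decoded from byte 0,
        # so the entire file becomes one split read by one task.
        return 1
    # A splittable file is divided into roughly split_size-sized pieces.
    return math.ceil(file_size / split_size)

MB = 1024 * 1024
print(count_splits(1024 * MB, 128 * MB, splittable=True))   # 8
print(count_splits(1024 * MB, 128 * MB, splittable=False))  # 1
```

With a 1 GB file and 128 MB splits, the splittable case can keep eight tasks busy, while the unsplittable case leaves the rest of the cluster idle for that file.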
Hasn't anyone used Spark with ORC and Snappy compression?
2015-12-29 18:25 GMT+01:00 Dawid Wysakowicz :
> Hi,
>
> I have a table in Hive stored as ORC with compression = snappy. I try to
> execute a query on that table that fails (previously I ran it on tables in
> ORC-ZLIB format and Parquet, so i