Re: parquet vs orc files

Jörn Franke Wed, 21 Feb 2018 13:40:29 -0800

In the latest Spark version both are equally well supported.

You need to insert the data sorted on the columns you filter on.
Then you will benefit from the min/max indexes and, in the case of ORC,
additionally from bloom filters, if you configure them.
In any case I also recommend partitioning the files (not to be confused with
Spark partitioning).
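
As a rough sketch of what I mean (not a definitive recipe): the helper below
takes an existing Spark DataFrame and writes it as ORC, sorted on the filter
column and partitioned on disk. The function name and its parameters are made
up for illustration. I am also assuming that your Spark build passes the
"orc.bloom.filter.columns" option through to the ORC writer, so please verify
that on your version.

```python
def write_sorted_orc(df, path, partition_col, filter_col):
    """Write df as ORC, partitioned on disk and sorted on the filter column.

    Hypothetical helper: df is an existing Spark DataFrame, the column
    names are whatever you partition and filter on in your own data.
    """
    (
        df.repartition(partition_col)  # bring rows of one partition value together
        # Sort on the partition column first, then the filter column, so each
        # written file holds a narrow min/max range of the filter column.
        .sortWithinPartitions(partition_col, filter_col)
        .write.partitionBy(partition_col)  # file (directory) partitioning
        # ORC only; assumes the writer honours this option on your version.
        .option("orc.bloom.filter.columns", filter_col)
        .mode("overwrite")
        .orc(path)
    )
```

For example, write_sorted_orc(events, "/data/events_orc", "day", "user_id"),
where events, the path and both column names are placeholders. For Parquet you
would drop the bloom filter option and call .parquet(path) instead.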


What is best for you, you will have to figure out in a test. This depends
heavily on the data and the analysis you want to do.
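
One possible starting point for such a test, again only a sketch with made-up
names: read the files back with a filter and print the physical plan. The
helper assumes an existing SparkSession and that ORC filter pushdown has to be
switched on via "spark.sql.orc.filterPushdown", which depends on your Spark
version.

```python
def explain_filtered_read(spark, path, filter_col, value):
    """Read ORC files with an equality filter and print the query plan.

    Hypothetical helper: spark is an existing SparkSession, path points at
    ORC files written earlier, filter_col/value describe a typical query.
    """
    # Assumption: pushdown for ORC may be disabled by default on your version.
    spark.conf.set("spark.sql.orc.filterPushdown", "true")
    df = spark.read.orc(path)
    filtered = df.filter(df[filter_col] == value)
    # The scan node of the printed plan should list the pushed-down filters.
    filtered.explain()
    return filtered.count()
```

A pushed filter in the plan only shows that the predicate reached the reader,
not that data was actually skipped. To see the effect of sorting, compare run
time and bytes read for the same query on sorted and unsorted copies of your
data.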

> On 21. Feb 2018, at 21:54, Kane Kim <[email protected]> wrote:
> 
> Hello,
> 
> Which format is better supported in spark, parquet or orc?
> Will spark use internal sorting of parquet/orc files (and how to test that)?
> Can spark save sorted parquet/orc files? 
> 
> Thanks!

