In the latest version both are equally well supported. You need to insert the data sorted on the filtering columns. Then you will benefit from min/max indexes and, in the case of ORC, additionally from bloom filters if you configure them. In any case I also recommend partitioning the files (not to be confused with Spark partitioning).
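
As a rough sketch of the sorted, partitioned write described above, in PySpark. The function name `write_sorted` and its parameters are made up for illustration; the DataFrame calls (`repartition`, `sortWithinPartitions`, `partitionBy`) and the `orc.bloom.filter.columns` writer option are standard Spark/ORC. It assumes one column used for file partitioning and one column used in filters:

```python
def write_sorted(df, path, partition_col, sort_col, fmt="orc"):
    """Write `df` partitioned on disk by `partition_col`, with rows sorted
    by `sort_col` inside each output file.

    Hypothetical helper, not part of Spark. `df` is a pyspark.sql.DataFrame.
    """
    writer = (
        # Bring all rows of one partition value into the same Spark task,
        # so each directory gets few files instead of one per task.
        df.repartition(partition_col)
        # Sort inside each task; the files are written in this order, which
        # keeps the min/max statistics of each stripe / row group narrow.
        .sortWithinPartitions(sort_col)
        .write
        .mode("overwrite")
        # File partitioning: one directory per value of partition_col.
        .partitionBy(partition_col)
    )
    if fmt == "orc":
        # Bloom filters are ORC-only here and have to be requested explicitly.
        writer = writer.option("orc.bloom.filter.columns", sort_col)
    writer.format(fmt).save(path)
```

One way to check whether a filter actually reaches the file reader is to read the data back, apply the filter, and call `.explain()` on the result: the physical plan lists the pushed-down predicates under `PushedFilters`. Depending on the Spark version, `spark.sql.orc.filterPushdown` may have to be enabled for ORC.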
What is best for you is something you will have to figure out with a test. It depends heavily on the data and the analysis you want to do.

> On 21. Feb 2018, at 21:54, Kane Kim <kane.ist...@gmail.com> wrote:
>
> Hello,
>
> Which format is better supported in spark, parquet or orc?
> Will spark use internal sorting of parquet/orc files (and how to test that)?
> Can spark save sorted parquet/orc files?
>
> Thanks!