Thanks. How does the min/max index work? Can Spark itself configure bloom filters when saving as ORC?

On Wed, Feb 21, 2018 at 1:40 PM, Jörn Franke <jornfra...@gmail.com> wrote:

> In the latest version both are equally well supported.
>
> You need to insert the data sorted on the filtering columns.
> Then you will benefit from min/max indexes and, in the case of ORC,
> additionally from bloom filters, if you configure them.
> In any case I also recommend partitioning the files (not to be confused
> with Spark partitioning).
>
> What is best for you is something you have to figure out in a test. It
> depends heavily on the data and the analysis you want to do.
>
> > On 21. Feb 2018, at 21:54, Kane Kim <kane.ist...@gmail.com> wrote:
> >
> > Hello,
> >
> > Which format is better supported in Spark, Parquet or ORC?
> > Will Spark use the internal sorting of Parquet/ORC files (and how to
> > test that)?
> > Can Spark save sorted Parquet/ORC files?
> >
> > Thanks!
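
To make the point about sorted inserts and min/max indexes concrete, here is a minimal sketch in plain Python. It is not Spark, ORC, or Parquet code: the `Chunk` class and the `write_chunks` and `scan_equals` helpers are hypothetical names standing in for a stripe or row group and for a reader that checks per-chunk statistics before reading. The assumption is only that each chunk stores the minimum and maximum of a column, so a reader can skip any chunk whose range cannot contain the filter value.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    """Hypothetical stand-in for one ORC stripe or Parquet row group."""
    values: List[int]
    min_value: int
    max_value: int


def write_chunks(values: List[int], chunk_size: int) -> List[Chunk]:
    """Split values into fixed-size chunks and record min/max for each."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        part = list(values[start:start + chunk_size])
        chunks.append(Chunk(part, min(part), max(part)))
    return chunks


def scan_equals(chunks: List[Chunk], target: int) -> Tuple[List[int], int]:
    """Return the matching values and how many chunks had to be read."""
    hits: List[int] = []
    chunks_read = 0
    for chunk in chunks:
        # The min/max check: skip chunks whose range cannot hold the target.
        if target < chunk.min_value or target > chunk.max_value:
            continue
        chunks_read += 1
        hits.extend(v for v in chunk.values if v == target)
    return hits, chunks_read


# The values 0..99 in a scrambled order (37 and 100 are coprime).
data = [(i * 37) % 100 for i in range(100)]

unsorted_chunks = write_chunks(data, 10)
sorted_chunks = write_chunks(sorted(data), 10)

hits_unsorted, unsorted_reads = scan_equals(unsorted_chunks, 42)
hits_sorted, sorted_reads = scan_equals(sorted_chunks, 42)

print("chunks read without sorting:", unsorted_reads)
print("chunks read with sorting:", sorted_reads)
```

With sorted input each chunk covers a narrow, non-overlapping range, so the filter touches a single chunk. With scrambled input most chunks span nearly the whole value range and the min/max check can rule out very few of them, which is why the data has to be written sorted on the filtering columns.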