Thanks. How does the min/max index work? Can Spark itself configure bloom
filters when saving as ORC?
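
As a rough illustration of the min/max question (a toy model in plain Python, not how Spark, Parquet or ORC actually implement it): each block of rows stores the smallest and largest value of a column, and a reader can skip any block whose range cannot contain the value being filtered on. The `Chunk` class, the chunk size of 4 and the sample values are all made up for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """One block of rows plus the min/max statistics kept for it."""
    values: List[int]
    min_value: int
    max_value: int


def build_chunks(values: List[int], chunk_size: int) -> List[Chunk]:
    """Split values into fixed-size chunks and record min/max for each one."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        block = values[start:start + chunk_size]
        chunks.append(Chunk(block, min(block), max(block)))
    return chunks


def chunks_to_read(chunks: List[Chunk], target: int) -> int:
    """Count the chunks whose [min, max] range could contain target."""
    return sum(1 for c in chunks if c.min_value <= target <= c.max_value)


# Hypothetical data: the same 12 values, once unsorted and once sorted.
unsorted_values = [7, 1, 12, 3, 9, 2, 11, 4, 8, 5, 10, 6]
sorted_values = sorted(unsorted_values)

# Filter "value == 5" against both layouts.
unsorted_hits = chunks_to_read(build_chunks(unsorted_values, 4), target=5)
sorted_hits = chunks_to_read(build_chunks(sorted_values, 4), target=5)
print(unsorted_hits, sorted_hits)
```

With unsorted data every chunk's range overlaps the target, so nothing can be skipped; with sorted data only one chunk has to be read. That is why sorting on the filter column matters for this kind of index.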

On Wed, Feb 21, 2018 at 1:40 PM, Jörn Franke <jornfra...@gmail.com> wrote:

> In the latest version both are equally well supported.
>
> You need to insert the data sorted on the filtering columns.
> Then you will benefit from min/max indexes and, in the case of ORC,
> additionally from bloom filters, if you configure them.
> In any case I also recommend partitioning the files (not to be confused
> with Spark partitioning).
>
> What is best for you is something you will have to figure out in a test.
> This depends heavily on the data and the analysis you want to do.
>
> > On 21. Feb 2018, at 21:54, Kane Kim <kane.ist...@gmail.com> wrote:
> >
> > Hello,
> >
> > Which format is better supported in Spark, Parquet or ORC?
> > Will Spark use the internal sorting of Parquet/ORC files (and how to test
> > that)?
> > Can Spark save sorted Parquet/ORC files?
> >
> > Thanks!
>

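For the questions about saving sorted files and configuring bloom filters, a minimal sketch of one way this might look from PySpark follows. It assumes `df` is a `pyspark.sql.DataFrame`; the function name `write_sorted_orc`, the path and the column name are hypothetical. Whether the `orc.bloom.filter.columns` writer option is honored depends on the Spark version and the ORC implementation in use, so treat it as something to verify in a test rather than as a guarantee.

```python
def write_sorted_orc(df, path, filter_column):
    """Sort rows within each Spark partition, then write them as ORC.

    df is expected to be a pyspark.sql.DataFrame. path and filter_column
    are placeholders chosen for this example.
    """
    (
        # Sort inside each partition so every output file is ordered on
        # the filter column, which keeps per-block min/max ranges narrow.
        df.sortWithinPartitions(filter_column)
        .write
        .format("orc")
        # Ask the ORC writer for a bloom filter on the same column
        # (assumption: the running Spark/ORC version passes this through).
        .option("orc.bloom.filter.columns", filter_column)
        .save(path)
    )
```

It would be called as `write_sorted_orc(df, "/tmp/events_orc", "user_id")`. File-level partitioning, as recommended in the reply, would be an additional `partitionBy(...)` call on the writer, ideally on a different column from the one used for sorting.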