I'm not aware of anyone having written a tuning guide for ORC. If someone
has one, it would be great to add it to the ORC website.

Some of the top-level points:
* Stripe size is a tradeoff:
    + larger is better for throughput and compression
    + smaller is better for parallelism and memory consumption
* Sorting on the filtered column is a big win for predicate pushdown with
either equality or comparison operators, because each index entry then
covers a narrow min/max range.
* Stride size is the granularity of the index:
    + larger consumes less space
    + smaller provides faster seeks and finer-grained predicate pushdown
* Bloom filters are good for columns that are filtered with equality
predicates.
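
To make the sorting and stride points concrete, here is a rough sketch in
Python. It is a toy model of a min/max index, not ORC's actual reader; the
function names, the row count, and the stride values are all made up for
illustration. It counts how many index entries an equality predicate would
have to read for sorted versus shuffled data.

```python
import random


def build_index(values, stride):
    """Return one (min, max) pair per group of `stride` consecutive rows."""
    return [
        (min(values[i:i + stride]), max(values[i:i + stride]))
        for i in range(0, len(values), stride)
    ]


def strides_to_read(index, target):
    """Count the row groups whose [min, max] range could contain `target`."""
    return sum(1 for lo, hi in index if lo <= target <= hi)


random.seed(0)
rows = list(range(100_000))      # hypothetical column of unique ids, sorted
shuffled = rows[:]
random.shuffle(shuffled)         # the same values in arbitrary order

for stride in (1_000, 10_000):   # illustrative stride sizes, in rows
    total = len(rows) // stride
    sorted_hits = strides_to_read(build_index(rows, stride), 42_424)
    unsorted_hits = strides_to_read(build_index(shuffled, stride), 42_424)
    print(f"stride={stride}: sorted reads {sorted_hits}/{total} groups, "
          f"unsorted reads {unsorted_hits}/{total} groups")
```

With sorted data only one group can match, and a smaller stride means that
one group holds fewer rows. With shuffled data nearly every group's range
spans the target, so the index prunes very little. The smaller stride also
produces ten times as many index entries, which is the space cost.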

.. Owen

On Mon, Oct 3, 2016 at 6:04 AM, Rohit <[email protected]> wrote:

> Is there a design and tuning guide for ORC that may cover things like
> choosing and implications / impact of:
> -  partitioning column
> -  sorting column(s)
> -  stripe size
> -  stride size
> -  bloom filters
> -  anything else ...
>
> Rohit
>
