I was about to say that for IN lists of size 20 or more, Drill uses a more
efficient Values operator instead of OR conditions but then realized the OR
filter is referencing 4 different columns : $1..$4 and each of those
individual lists is less than 20.  Sungwook,  can you please provide the
SQL query and any view definitions or anything that goes with it ?  It is
difficult to figure out things without the full picture.
thanks,
Aman

On Mon, Aug 24, 2015 at 5:10 PM, Ted Dunning <[email protected]> wrote:

> On Mon, Aug 24, 2015 at 4:50 PM, Sungwook Yoon <[email protected]> wrote:
>
> > Still, the performance drop down due to OR filtering is just
> astounding...
> >
>
> That is what query optimizers are for and why getting them to work well is
> important.
>
> The difference in performance that you are observing is not surprising
> given the redundant work that you are seeing. Using the OR operator
> prevents any significant short-circuiting and the repeated conversion
> operations that are happening make the evaluation much more expensive than
> it would otherwise be (a dozen extra copies where only one is needed).
>
> Other queries that can be subject to similar problems include common table
> expressions that read the same (large) input file many times.  So far,
> Drill doesn't optimize all such expressions well.
>

Reply via email to