I was about to say that for IN lists of size 20 or more, Drill uses a more efficient Values operator instead of OR conditions, but then I realized that the OR filter references 4 different columns ($1..$4) and that each of those individual lists has fewer than 20 entries.

Sungwook, can you please provide the SQL query, along with any view definitions or anything else that goes with it? It is difficult to figure things out without the full picture.

Thanks,
Aman
On Mon, Aug 24, 2015 at 5:10 PM, Ted Dunning <[email protected]> wrote:

> On Mon, Aug 24, 2015 at 4:50 PM, Sungwook Yoon <[email protected]> wrote:
>
> > Still, the performance drop down due to OR filtering is just
> > astounding...
>
> That is what query optimizers are for and why getting them to work well is
> important.
>
> The difference in performance that you are observing is not surprising
> given the redundant work that you are seeing. Using the OR operator
> prevents any significant short-circuiting and the repeated conversion
> operations that are happening make the evaluation much more expensive than
> it would otherwise be (a dozen extra copies where only one is needed).
>
> Other queries that can be subject to similar problems include common table
> expressions that read the same (large) input file many times. So far,
> Drill doesn't optimize all such expressions well.
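
As a rough illustration of the redundant-work point in the quoted message, here is a minimal Python sketch. It is not Drill code and does not reproduce Sungwook's query. The names (`to_int`, `or_filter`, `convert_once_filter`, `run`), the row values, and the dozen literals are all made up, and the OR evaluation is assumed to run every disjunct without short-circuiting. It only counts how often a per-row conversion runs when it is repeated inside each OR branch versus done once.

```python
# Hypothetical illustration only; none of these names come from Drill.
conversions = 0  # counts how many times the stand-in conversion runs


def to_int(raw):
    """Stand-in for a per-row conversion such as a cast from text."""
    global conversions
    conversions += 1
    return int(raw)


ROWS = ["7", "42", "99", "3"]   # made-up column values
WANTED = list(range(1, 13))     # a dozen made-up literals


def or_filter(raw):
    # Each disjunct repeats the conversion, and every disjunct is
    # evaluated before the results are combined (no short-circuiting).
    results = [to_int(raw) == w for w in WANTED]
    return any(results)


def convert_once_filter(raw, wanted=frozenset(WANTED)):
    # Convert a single time, then do one set-membership test.
    return to_int(raw) in wanted


def run(predicate):
    """Return (rows kept, number of conversions performed)."""
    global conversions
    conversions = 0
    kept = [r for r in ROWS if predicate(r)]
    return kept, conversions


or_kept, or_count = run(or_filter)
once_kept, once_count = run(convert_once_filter)
print(or_kept == once_kept, or_count, once_count)
```

With these made-up inputs both filters keep the same rows, but the OR-style filter performs 48 conversions (4 rows times 12 disjuncts) while the convert-once filter performs 4.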
