Tom Lane wrote:
Yes, letting the planner make its own decision would seem best (in
accordance with what we do for different join paths). But for large IN
lists, a substantial part of the planner is spent in estimating the
selectivity of the ScalarArrayExpr by calling scalararraysel. If we are
not eliminating this step in processing the IN list then we are not
doing any optimization. Asking the planner to do scalararraysel and also
compute cost of any other way and choose between the two is asking
planner to do more work.
"Atul Deopujari" <[EMAIL PROTECTED]> writes:
Tom Lane wrote:
I thought of giving this a shot and while I was working on it, it
occurred to me that we need to decide on a threshold value of the IN
list size above which such transformation should take place.
That's the least of the problems. We really ought to convert such cases
into an IN (VALUES(...)) type of query, since often repeated indexscans
aren't the best implementation.
I see no good reason to suppose that there is/should be a constant
threshold --- most likely it depends on size of table, availability of
indexes, etc. Having the planner try it both ways and compare costs
would be best.
Factors such as size of table, availability of index etc. would affect
both the ways similarly. So, if we see a gain in the execution of the IN
list due to an external factor then we will also see a similar gain in
the execution of the transformed IN (VALUES(...)) clause.
I agree that one value would not fit all cases. The problem with this
approach is that for some cases, large IN list would perform better than
the transformed IN (VALUES(...)) clause. But we know that the
transformed IN (VALUES(...)) clause has almost a steady state behavior
and it would not blow off the planner estimates. The error would be just
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?