Re: [PERFORM] Some question about lazy subquery/procedures execution in SELECT ... ORDER BY... LIMIT N queries

Maxim Boguk Sun, 27 Nov 2011 16:38:36 -0800

On Mon, Nov 28, 2011 at 9:50 AM, Craig Ringer <ring...@ringerc.id.au> wrote:

> On 11/25/2011 06:53 AM, Maxim Boguk wrote:
>
>> I understand that position.
>> However if assumption: " the definition of ORDER BY --- it happens after
>> computing the select list, according to the SQL standard"
>> is correct,
>> then plans like:
>>
>> postgres=# EXPLAIN ANALYZE SELECT * from test order by _data limit 10
>> offset 1000;
>>                                                               QUERY PLAN
>> ------------------------------**------------------------------**
>> ------------------------------**------------------------------**
>> --------------
>>  Limit  (cost=2884.19..2913.03 rows=10 width=8) (actual
>> time=3.584..3.620 rows=10 loops=1)
>>    ->  Index Scan using random_key on test  (cost=0.00..2884190.16
>> rows=1000000 width=8) (actual time=0.103..3.354 rows=1010 loops=1)
>>  Total runtime: 3.663 ms
>> (3 rows)
>> should not be used at all.
>>
>
>
> `LIMIT' and `OFFSET' are explicitly defined to compute only that part of
> the SELECT list that is required. If they weren't specifically defined with
> that exception then you'd be right.
>
> LIMIT and OFFSET aren't standard anyway, so Pg can define them to mean
> whatever is most appropriate. The SQL standard is adding new and (as usual)
> painfully clumsily worded features that work like LIMIT and OFFSET, but I
> don't know whether they have the same rules about whether execution of
> functions can be skipped or not.
>
>
>  And it is not possible to predict in advance where and when you get hit
>> by that problem.
>>
>
> That's the biggest problem with statistics- and heuristics-based query
> planners in general, but this does seem to be a particularly difficult case.
>
> Setting a cost on the function call that more accurately reflects how
> expensive it is so PostgreSQL will work harder to avoid calling it might
> help. See http://www.postgresql.org/**docs/current/static/sql-**
> createfunction.html<http://www.postgresql.org/docs/current/static/sql-createfunction.html>.
>
> --
> Craig Ringer
>

Change cost for the functions in that case simple ignored by
planner/executor.

I think it should be possible always delay execution functions/subqueries
unrelated to order by list untill limit/offset were applied (even in the
worst case that will provide same performance as today), and no heuristics
need at all.

Hm, one more idea:  lets say I call the next sql query -
'SELECT ...,very_log_sure_toasted_field  FROM ... ORDER BY (something but
not very_log_toasted_field) LIMIT N'
which will use sort as top node.

Is detoasting of very_log_sure_toasted_field will be performed after
applying ORDER BY... LIMIT N or before it?

If  detoasting performed before applying order by/limit, than there exists
large class of queries where delayed/lazy detoasting can be huge
performance win.
If detoasting performed after applying order by/limit, than the same
mechanics can be used to delay subquery/function execution.

PS: Yes I know good response to my complaints: 'patch welcome', but I only
started study of postgresql source code and recovering my C coding skills.
Unfortunately, I don't think I will be ready to start hacking
planner/executor code in short future (planner/executor is most complicated
and easiest to break part of the postgresql code, that  is definitely not
newbie task).

-- 
Maxim Boguk

Re: [PERFORM] Some question about lazy subquery/procedures execution in SELECT ... ORDER BY... LIMIT N queries

Reply via email to