On Mon, Nov 28, 2011 at 9:50 AM, Craig Ringer <ring...@ringerc.id.au> wrote:
> On 11/25/2011 06:53 AM, Maxim Boguk wrote: > >> I understand that position. >> However if assumption: " the definition of ORDER BY --- it happens after >> computing the select list, according to the SQL standard" >> is correct, >> then plans like: >> >> postgres=# EXPLAIN ANALYZE SELECT * from test order by _data limit 10 >> offset 1000; >> QUERY PLAN >> ------------------------------**------------------------------** >> ------------------------------**------------------------------** >> -------------- >> Limit (cost=2884.19..2913.03 rows=10 width=8) (actual >> time=3.584..3.620 rows=10 loops=1) >> -> Index Scan using random_key on test (cost=0.00..2884190.16 >> rows=1000000 width=8) (actual time=0.103..3.354 rows=1010 loops=1) >> Total runtime: 3.663 ms >> (3 rows) >> should not be used at all. >> > > > `LIMIT' and `OFFSET' are explicitly defined to compute only that part of > the SELECT list that is required. If they weren't specifically defined with > that exception then you'd be right. > > LIMIT and OFFSET aren't standard anyway, so Pg can define them to mean > whatever is most appropriate. The SQL standard is adding new and (as usual) > painfully clumsily worded features that work like LIMIT and OFFSET, but I > don't know whether they have the same rules about whether execution of > functions can be skipped or not. > > > And it is not possible to predict in advance where and when you get hit >> by that problem. >> > > That's the biggest problem with statistics- and heuristics-based query > planners in general, but this does seem to be a particularly difficult case. > > Setting a cost on the function call that more accurately reflects how > expensive it is so PostgreSQL will work harder to avoid calling it might > help. See http://www.postgresql.org/**docs/current/static/sql-** > createfunction.html<http://www.postgresql.org/docs/current/static/sql-createfunction.html>. > > -- > Craig Ringer > Change cost for the functions in that case simple ignored by planner/executor. I think it should be possible always delay execution functions/subqueries unrelated to order by list untill limit/offset were applied (even in the worst case that will provide same performance as today), and no heuristics need at all. Hm, one more idea: lets say I call the next sql query - 'SELECT ...,very_log_sure_toasted_field FROM ... ORDER BY (something but not very_log_toasted_field) LIMIT N' which will use sort as top node. Is detoasting of very_log_sure_toasted_field will be performed after applying ORDER BY... LIMIT N or before it? If detoasting performed before applying order by/limit, than there exists large class of queries where delayed/lazy detoasting can be huge performance win. If detoasting performed after applying order by/limit, than the same mechanics can be used to delay subquery/function execution. PS: Yes I know good response to my complaints: 'patch welcome', but I only started study of postgresql source code and recovering my C coding skills. Unfortunately, I don't think I will be ready to start hacking planner/executor code in short future (planner/executor is most complicated and easiest to break part of the postgresql code, that is definitely not newbie task). -- Maxim Boguk