Re: Handle LIMIT/OFFSET before select clause (was: [HACKERS] Feature request: optimizer improvement)

2013-11-05 Thread Joe Love
I'm wondering what type of index would work for this as it is a volatile
function. Not knowing how PGs optimizer runs, I'm at a loss as to why this
wouldn't be possible or worth doing. It seems to me that all functions in
the select part of the statement could be calculated at the end of the
query after the results have been gathered, and even after the sorting had
been done as long as the column wasn't part of the order by (or perhaps
group by).

I have an entire set of functions that perform in this way. For example,
I'm selecting a list of all my products and the function does a complex
calculation based on inventory in the warehouse + expected deliveries from
the factory to determine how many of each item is available, and when they
first become available.   What's helpful is for the users search criteria
to initially limit the search result, and then I want to paginate the
results and only show them a few at a time. In the verbose syntax I
mentioned originally, the query performs well, in the most straightforward
syntax, it does not. I'm not sure I even need to hint the optimizer to
perform this type of an optimization as it seems it would be beneficial (or
at least not detrimental) 100% of the time.


On Sat, Nov 2, 2013 at 10:06 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Atri Sharma atri.j...@gmail.com writes:
  I understand the reasons for executing SELECT before the sort. But,
  couldnt we get the planner to see the LIMIT part and push the sort
  node above the select node for this specific case?

 [ Shrug... ]  I don't see the point.  If the OP actually cares about the
 speed of this query, he's going to want to avoid the sort step too,
 which is what makes the index a good idea.

 More generally, this is not a transformation we could apply
 unconditionally --- at the very least it'd need to be avoided when
 volatile functions are involved, and I don't think it's invariably
 a win from a cost standpoint even if there aren't semantic blockers.
 But right now the planner has no real ability to reason about placement
 of SELECT-list evaluation: it's done in a fixed spot in any particular
 plan structure.  I don't think it's practical to add such considerations
 to the rat's nest that is grouping_planner and friends.  I have
 ambitions to replace all that with a Path-generation-and-comparison
 approach, and the Paths used for this would need to carry some
 indication of which expressions would be evaluated where.  So maybe
 once that's done we could think about whether this is worth doing.
 I remain dubious though.

 regards, tom lane




-- 
Joe's Computer Service
405-227-0951
Computer Running Slow? Call Joe!
$125, Your computer will run like new!
www.JoesComputerService.net


Re: Handle LIMIT/OFFSET before select clause (was: [HACKERS] Feature request: optimizer improvement)

2013-11-05 Thread Tom Lane
Joe Love j...@primoweb.com writes:
 I'm wondering what type of index would work for this as it is a volatile
 function. Not knowing how PGs optimizer runs, I'm at a loss as to why this
 wouldn't be possible or worth doing. It seems to me that all functions in
 the select part of the statement could be calculated at the end of the
 query after the results have been gathered, and even after the sorting had
 been done as long as the column wasn't part of the order by (or perhaps
 group by).

The short answer is that doing so directly contradicts the computational
model defined by the SQL standard, and will break applications that rely
on the current behavior.  Since there's already a clear way to write the
query in a way that specifies evaluating the functions after the
sort/limit steps (ie, put the order by/limit in a sub-select), IMHO that's
what you should do, not lobby to make the optimizer reinterpret what you
wrote.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Handle LIMIT/OFFSET before select clause (was: [HACKERS] Feature request: optimizer improvement)

2013-11-05 Thread Atri Sharma
On Tue, Nov 5, 2013 at 11:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Joe Love j...@primoweb.com writes:
 I'm wondering what type of index would work for this as it is a volatile
 function. Not knowing how PGs optimizer runs, I'm at a loss as to why this
 wouldn't be possible or worth doing. It seems to me that all functions in
 the select part of the statement could be calculated at the end of the
 query after the results have been gathered, and even after the sorting had
 been done as long as the column wasn't part of the order by (or perhaps
 group by).

 The short answer is that doing so directly contradicts the computational
 model defined by the SQL standard, and will break applications that rely
 on the current behavior.  Since there's already a clear way to write the
 query in a way that specifies evaluating the functions after the
 sort/limit steps (ie, put the order by/limit in a sub-select), IMHO that's
 what you should do, not lobby to make the optimizer reinterpret what you
 wrote.



+1.

I thought more about our earlier discussion on this, and I agree with
the point that making the planner push limit over select for this
specific case is not a good idea.



-- 
Regards,

Atri
l'apprenant


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Handle LIMIT/OFFSET before select clause (was: [HACKERS] Feature request: optimizer improvement)

2013-11-02 Thread Atri Sharma
On Sat, Nov 2, 2013 at 2:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Jim Nasby j...@nasby.net writes:
 On Oct 31, 2013, at 11:04 AM, Joe Love j...@primoweb.com wrote:
 In postgres 9.2 I have a function that is relatively expensive.  When I 
 write a query such as:

 select expensive_function(o.id),o.* from offeirng o where valid='Y' order 
 by name limit 1;

 Does anyone know what the SQL standard says about this, if anything?

 The computational model is that you evaluate the SELECT list before
 sorting; this must be so since you can write

  select somefunc(x) as y from tab order by y;

 In the general case, therefore, it's impossible to avoid evaluating the
 function at all rows.  I'm not sure what the spec says about whether it's
 okay to skip evaluation of functions that would be evaluated in a naive
 implementation of the computational model, so it's possible that what
 the OP is asking for is directly contrary to spec.  But more likely they
 permit implementations to skip unnecessary calls, if indeed they address
 this at all.

 As far as PG goes, I think the excess calls would only occur if the plan
 includes an explicit sort step, since the select list would be evaluated
 before the sort step.  If you've got an index on name (or maybe you'd
 need (valid, name) if there aren't many rows with valid = 'Y') I'd expect
 it to pick out the minimal name row with the index, avoiding any sort,
 and then the function would only be evaluated on the single fetched row.
 But these are implementation details not anything you're going to find
 support for in the spec.

 regards, tom lane

Doesnt that make the index mandatory here?

 If I understand correctly, if an index is present, the sort will be
avoided altogether. IMHO, thats avoiding the problem. The question
here is that whether we can add planner heuristics for understanding
this case and executing the LIMIT part first (before executing the
funtion).

I understand the reasons for executing SELECT before the sort. But,
couldnt we get the planner to see the LIMIT part and push the sort
node above the select node for this specific case?

So, seeing the LIMIT, the dataset is first sorted, then LIMITed, then
the function applied.

Is this process possible?

Regards,

Atri
-- 
Regards,

Atri
l'apprenant


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Handle LIMIT/OFFSET before select clause (was: [HACKERS] Feature request: optimizer improvement)

2013-11-02 Thread Tom Lane
Atri Sharma atri.j...@gmail.com writes:
 I understand the reasons for executing SELECT before the sort. But,
 couldnt we get the planner to see the LIMIT part and push the sort
 node above the select node for this specific case?

[ Shrug... ]  I don't see the point.  If the OP actually cares about the
speed of this query, he's going to want to avoid the sort step too,
which is what makes the index a good idea.

More generally, this is not a transformation we could apply
unconditionally --- at the very least it'd need to be avoided when
volatile functions are involved, and I don't think it's invariably
a win from a cost standpoint even if there aren't semantic blockers.
But right now the planner has no real ability to reason about placement
of SELECT-list evaluation: it's done in a fixed spot in any particular
plan structure.  I don't think it's practical to add such considerations
to the rat's nest that is grouping_planner and friends.  I have
ambitions to replace all that with a Path-generation-and-comparison
approach, and the Paths used for this would need to carry some
indication of which expressions would be evaluated where.  So maybe
once that's done we could think about whether this is worth doing.
I remain dubious though.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Handle LIMIT/OFFSET before select clause (was: [HACKERS] Feature request: optimizer improvement)

2013-11-01 Thread Jim Nasby
On Oct 31, 2013, at 11:04 AM, Joe Love j...@primoweb.com wrote:
 In postgres 9.2 I have a function that is relatively expensive.  When I write 
 a query such as:
 
 select expensive_function(o.id),o.* from offeirng o where valid='Y' order by 
 name limit 1;
 
 the query runs slow and appears to be running the function on each ID, which 
 in this case should be totally unnecessary as it really only needs to run on 
 1 row.
 
 When I rewrite the query like so:
 
 select expensive_function(o.id), o.*
 from (select *offering where valid='Y' order by name limit 1) o;
 
 the expensive function only runs once and thus, much faster. I would think 
 that the optimizer could handle this situation, especially when limit or 
 offset is used and the expensive function is not used in a group by, order by 
 or where.

Does anyone know what the SQL standard says about this, if anything? I can't 
see any way that this would change the result set, but of course if the 
function has external effects this would make a difference...

Re: Handle LIMIT/OFFSET before select clause (was: [HACKERS] Feature request: optimizer improvement)

2013-11-01 Thread Tom Lane
Jim Nasby j...@nasby.net writes:
 On Oct 31, 2013, at 11:04 AM, Joe Love j...@primoweb.com wrote:
 In postgres 9.2 I have a function that is relatively expensive.  When I 
 write a query such as:
 
 select expensive_function(o.id),o.* from offeirng o where valid='Y' order by 
 name limit 1;

 Does anyone know what the SQL standard says about this, if anything?

The computational model is that you evaluate the SELECT list before
sorting; this must be so since you can write

 select somefunc(x) as y from tab order by y;

In the general case, therefore, it's impossible to avoid evaluating the
function at all rows.  I'm not sure what the spec says about whether it's
okay to skip evaluation of functions that would be evaluated in a naive
implementation of the computational model, so it's possible that what
the OP is asking for is directly contrary to spec.  But more likely they
permit implementations to skip unnecessary calls, if indeed they address
this at all.

As far as PG goes, I think the excess calls would only occur if the plan
includes an explicit sort step, since the select list would be evaluated
before the sort step.  If you've got an index on name (or maybe you'd
need (valid, name) if there aren't many rows with valid = 'Y') I'd expect
it to pick out the minimal name row with the index, avoiding any sort,
and then the function would only be evaluated on the single fetched row.
But these are implementation details not anything you're going to find
support for in the spec.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers