Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-11-23 Thread Kevin Grittner
Merlin Moncure wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 Merlin Moncure mmonc...@gmail.com wrote:

 Hm, I bet it's possible (although probably not easy) to deduce
 volatility from the function body...maybe through the validator.
 If you could do that (perhaps warning in cases where you can't),
 then the performance regression-inducing-argument (which I agree
 with) becomes greatly ameliorated.

 For about the last 10 years the Wisconsin Courts have been parsing
 SQL code to generate Java query classes, including stored
 procedures, and determining information like this. For example,
 we set a readOnly property for the query class by examining the
 statements in the procedure and the readOnly status of each called
 procedure. It wasn't that hard for us, and I'm not sure what would
 make much it harder in PostgreSQL, if we can do it where a parse
 tree for the function is handy.
 
 hm, what do you do about 'after the fact' changes to things the
 procedure body is pointing to? what would the server have to do?

We did a regeneration of the whole set near the end of each release
cycle, and that or smaller changes as needed during the release
cycle. Of course, we didn't have any equivalent of pg_depend.

-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-20 Thread Merlin Moncure
On Wed, Sep 19, 2012 at 2:39 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 It still seems like awfully weird behavior.

 Why?  The WHERE condition relates only to the output of the _stats
 subquery, so why shouldn't it be evaluated there, rather than
 after the join?

 In another thread, Tom Lane t...@sss.pgh.pa.us wrote:
 It's easier to understand why this is if you realize that SQL has
 a very clear model of a pipeline of query execution.
 Conceptually, what happens is:

 1. Form the cartesian product of the tables listed in FROM (ie,
 all combinations of rows).

 2. Apply the WHERE condition to each row from 1, and drop rows
 that don't pass it.

 People expect that the results will be consistent with this model,
 even if the implementation is optimized under the covers.  I think
 correct semantics should trump performance here.

Hm, I bet it's possible (although probably not easy) to deduce
volatility from the function body...maybe through the validator. If
you could do that (perhaps warning in cases where you can't), then the
performance regression-inducing-argument (which I agree with) becomes
greatly ameliorated.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-20 Thread Merlin Moncure
On Thu, Sep 20, 2012 at 9:24 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Merlin Moncure mmonc...@gmail.com wrote:

 Hm, I bet it's possible (although probably not easy) to deduce
 volatility from the function body...maybe through the validator.
 If you could do that (perhaps warning in cases where you can't),
 then the performance regression-inducing-argument (which I agree
 with) becomes greatly ameliorated.

 For about the last 10 years the Wisconsin Courts have been parsing
 SQL code to generate Java query classes, including stored
 procedures, and determining information like this.  For example, we
 set a readOnly property for the query class by examining the
 statements in the procedure and the readOnly status of each called
 procedure.  It wasn't that hard for us, and I'm not sure what would
 make much it harder in PostgreSQL, if we can do it where a parse
 tree for the function is handy.

hm, what do you do about 'after the fact' changes to things the
procedure body is pointing to?  what would the server have to do?

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-19 Thread Tom Lane
florian.schoppm...@emc.com (Florian Schoppmann) writes:
 In PostgreSQL 9.1 and 9.2 (possibly also in earlier versions), the query

 --8--
 WITH source AS (
 SELECT i FROM generate_series(1,10) AS i
 )
 SELECT
 i
 FROM
 source, (
 SELECT
 count(*) AS _n
 FROM source
 ) AS _stats
 WHERE
 random()  5::DOUBLE PRECISION/_n;
 --8--

[ doesn't do what you think it should ]

I can't get excited about this.  Any time you put a volatile function
into WHERE, you're playing with fire.  The docs warn against it:
http://www.postgresql.org/docs/9.2/static/sql-expressions.html#SYNTAX-EXPRESS-EVAL

To do what you want, I'd suggest wrapping the join into a sub-select
with an OFFSET 0 clause, which will serve as an optimization fence
that prevents the random() call from being pushed down.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-19 Thread Merlin Moncure
On Wed, Sep 19, 2012 at 9:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 florian.schoppm...@emc.com (Florian Schoppmann) writes:
 In PostgreSQL 9.1 and 9.2 (possibly also in earlier versions), the query

 --8--
 WITH source AS (
 SELECT i FROM generate_series(1,10) AS i
 )
 SELECT
 i
 FROM
 source, (
 SELECT
 count(*) AS _n
 FROM source
 ) AS _stats
 WHERE
 random()  5::DOUBLE PRECISION/_n;
 --8--

 [ doesn't do what you think it should ]

 I can't get excited about this.  Any time you put a volatile function
 into WHERE, you're playing with fire.  The docs warn against it:
 http://www.postgresql.org/docs/9.2/static/sql-expressions.html#SYNTAX-EXPRESS-EVAL

 To do what you want, I'd suggest wrapping the join into a sub-select
 with an OFFSET 0 clause, which will serve as an optimization fence
 that prevents the random() call from being pushed down.

I like the more standard CTE approach to optimization fencing where it works:

postgres=# WITH source AS (
SELECT i, random() r
FROM generate_series(1,10) AS i
)
SELECT
i
FROM
source, (
SELECT
count(*) AS _n
FROM source
) AS _stats
WHERE
r  5::DOUBLE PRECISION/_n;

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-19 Thread Robert Haas
On Wed, Sep 19, 2012 at 10:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 florian.schoppm...@emc.com (Florian Schoppmann) writes:
 In PostgreSQL 9.1 and 9.2 (possibly also in earlier versions), the query

 --8--
 WITH source AS (
 SELECT i FROM generate_series(1,10) AS i
 )
 SELECT
 i
 FROM
 source, (
 SELECT
 count(*) AS _n
 FROM source
 ) AS _stats
 WHERE
 random()  5::DOUBLE PRECISION/_n;
 --8--

 [ doesn't do what you think it should ]

 I can't get excited about this.  Any time you put a volatile function
 into WHERE, you're playing with fire.  The docs warn against it:
 http://www.postgresql.org/docs/9.2/static/sql-expressions.html#SYNTAX-EXPRESS-EVAL

 To do what you want, I'd suggest wrapping the join into a sub-select
 with an OFFSET 0 clause, which will serve as an optimization fence
 that prevents the random() call from being pushed down.

You've repeatedly objected to complaints on pgsql-performance on the
grounds that WITH is an optimization fence.  It seems awfully
inconsistent to turn around and say, oh, sometimes it's not a fence
after all.  It seems that users may not rely on WITH either to do the
optimizations necessary to have good performance or to fail to do
optimizations that lead to wrong results.  Ouch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Sep 19, 2012 at 10:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 To do what you want, I'd suggest wrapping the join into a sub-select
 with an OFFSET 0 clause, which will serve as an optimization fence
 that prevents the random() call from being pushed down.

 You've repeatedly objected to complaints on pgsql-performance on the
 grounds that WITH is an optimization fence.  It seems awfully
 inconsistent to turn around and say, oh, sometimes it's not a fence
 after all.

Huh?  The join in question is not inside a WITH.  If it were, that
would work too, as noted by Merlin.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-19 Thread Robert Haas
On Wed, Sep 19, 2012 at 12:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Sep 19, 2012 at 10:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 To do what you want, I'd suggest wrapping the join into a sub-select
 with an OFFSET 0 clause, which will serve as an optimization fence
 that prevents the random() call from being pushed down.

 You've repeatedly objected to complaints on pgsql-performance on the
 grounds that WITH is an optimization fence.  It seems awfully
 inconsistent to turn around and say, oh, sometimes it's not a fence
 after all.

 Huh?  The join in question is not inside a WITH.  If it were, that
 would work too, as noted by Merlin.

Oh, hmm.  I see now: the problem isn't that random() is being pushed
into the WITH, it's that it's being pushed into the join.  Sorry, I
should have read that more carefully.

It still seems like awfully weird behavior.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 It still seems like awfully weird behavior.

Why?  The WHERE condition relates only to the output of the _stats
subquery, so why shouldn't it be evaluated there, rather than after
the join?

In a green field I might agree that we should de-optimize such cases,
but the problem with doing so is that it would totally destroy
performance for cases in which a user has defined a function that's
actually stable or immutable but they forgot to mark it so.  If
VOLATILE weren't the default marking, such a change wouldn't be so
problematic ... but it is.  Given that the behavior has been like
this since the late stone age, I'm not inclined to change it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-19 Thread Robert Haas
On Wed, Sep 19, 2012 at 1:26 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 It still seems like awfully weird behavior.

 Why?  The WHERE condition relates only to the output of the _stats
 subquery, so why shouldn't it be evaluated there, rather than after
 the join?

Because my mental model (and apparently that of the OP) is that the
WHERE clause gets evaluated separately for each row.  Obviously in
many cases that can be optimized without changing the results, but not
in this case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-19 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 It still seems like awfully weird behavior.
 
 Why?  The WHERE condition relates only to the output of the _stats
 subquery, so why shouldn't it be evaluated there, rather than
 after the join?

In another thread, Tom Lane t...@sss.pgh.pa.us wrote:
 It's easier to understand why this is if you realize that SQL has
 a very clear model of a pipeline of query execution. 
 Conceptually, what happens is:
 
 1. Form the cartesian product of the tables listed in FROM (ie,
 all combinations of rows).
 
 2. Apply the WHERE condition to each row from 1, and drop rows
 that don't pass it.
 
People expect that the results will be consistent with this model,
even if the implementation is optimized under the covers.  I think
correct semantics should trump performance here.
 
-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-19 Thread k...@rice.edu
On Wed, Sep 19, 2012 at 02:39:12PM -0500, Kevin Grittner wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  It still seems like awfully weird behavior.
  
  Why?  The WHERE condition relates only to the output of the _stats
  subquery, so why shouldn't it be evaluated there, rather than
  after the join?
 
 In another thread, Tom Lane t...@sss.pgh.pa.us wrote:
  It's easier to understand why this is if you realize that SQL has
  a very clear model of a pipeline of query execution. 
  Conceptually, what happens is:
  
  1. Form the cartesian product of the tables listed in FROM (ie,
  all combinations of rows).
  
  2. Apply the WHERE condition to each row from 1, and drop rows
  that don't pass it.
  
 People expect that the results will be consistent with this model,
 even if the implementation is optimized under the covers.  I think
 correct semantics should trump performance here.
  
 -Kevin
 

It seems like this is what happens here except that the function is
evaluated once for the WHERE and not once per ROW. Both of these meet
the criterion for 2 above and Tom's earlier comments both hold.

Regards,
Ken


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-19 Thread Kevin Grittner
k...@rice.edu k...@rice.edu wrote:
 On Wed, Sep 19, 2012 at 02:39:12PM -0500, Kevin Grittner wrote:
 In another thread, Tom Lane t...@sss.pgh.pa.us wrote:
 
 2. Apply the WHERE condition to each row from 1, and drop rows
 that don't pass it.
  
 People expect that the results will be consistent with this
 model, even if the implementation is optimized under the
 covers.  I think correct semantics should trump performance
 here.
 
 It seems like this is what happens here except that the function
 is evaluated once for the WHERE and not once per ROW. Both of
 these meet the criterion for 2 above and Tom's earlier comments
 both hold.
 
There really needs to be some way to specify that when an expression
is evaluated for each row in a set, a function used within that
expression is not optimized away for some rows.  Fortunately we have
a way:
 
http://www.postgresql.org/docs/9.2/interactive/sql-createfunction.html
 
| VOLATILE indicates that the function value can change even within
| a single table scan, so no optimizations can be made. Relatively
| few database functions are volatile in this sense; some examples
| are random(), [...]
 
The behavior in the OP's query would certainly be sane if the
function were not VOLATILE; as it is, I have a hard time seeing this
as anything but a bug.

There is a workaround, if you don't mind ugly:
 
CREATE FUNCTION random_really_i_mean_it(dummy int)
  RETURNS double precision
  LANGUAGE plpgsql
  VOLATILE
AS $$
BEGIN
  -- no need to reference dummy parameter
  RETURN random();
END;
$$;

WITH source AS (
SELECT i FROM generate_series(1,10) AS i
)
SELECT
i
FROM
source, (
SELECT
count(*) AS _n
FROM source
) AS _stats
WHERE
random_really_i_mean_it(i)  5::DOUBLE PRECISION/_n;
 
-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-19 Thread Kevin Grittner
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 There is a workaround, if you don't mind ugly:
 
Or, better:
 
WITH source AS (
SELECT i, random() AS r FROM generate_series(1,10) AS i
)
SELECT
i
FROM
source, (
SELECT
count(*) AS _n
FROM source
) AS _stats
WHERE
r  5::DOUBLE PRECISION/_n;
 
-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-19 Thread David Johnston
 -Original Message-
 There really needs to be some way to specify that when an expression is
 evaluated for each row in a set, a function used within that expression is
not
 optimized away for some rows.  Fortunately we have a way:
 
 http://www.postgresql.org/docs/9.2/interactive/sql-createfunction.html
 
 | VOLATILE indicates that the function value can change even within a
 | single table scan, so no optimizations can be made. Relatively few
 | database functions are volatile in this sense; some examples are
 | random(), [...]
 
 The behavior in the OP's query would certainly be sane if the function
were
 not VOLATILE; as it is, I have a hard time seeing this as anything but a
bug.

What are the arguments against adding a 4th identifier - call it PER_ROW for
this argument?  The main reason VOLATILE is broken is that it is the default
and in order to minimize beginner's penalty it is not treated as such in
some situations.  The new one could behave just like VOLATILE but would
never be optimized away and would always evaluate once for each row in its
context.  

Then the question is whether you write a new random() function or break
backwards compatibility and alter the existing version.

David J.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-19 Thread Kevin Grittner
David Johnston pol...@yahoo.com wrote:
 
 | VOLATILE indicates that the function value can change even
 | within a single table scan, so no optimizations can be made.
 | Relatively few database functions are volatile in this sense;
 | some examples are random(), [...]
 
 What are the arguments against adding a 4th identifier - call it
 PER_ROW for this argument?  The main reason VOLATILE is broken is
 that it is the default and in order to minimize beginner's penalty
 it is not treated as such in some situations.  The new one could
 behave just like VOLATILE but would never be optimized away and
 would always evaluate once for each row in its context.  
 
So how would you document that?  It sounds like the proposed level
would behave exactly as the VOLATILE level is currently documented
to behave; so I guess we could shift the documentation of VOLATILE
to PER_ROW (or whatever).  How would you then describe the behavior
of VOLATILE?
 
-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-19 Thread David Johnston
 -Original Message-
 
  | VOLATILE indicates that the function value can change even within a
  | single table scan, so no optimizations can be made.
  | Relatively few database functions are volatile in this sense; some
  | examples are random(), [...]
 
  What are the arguments against adding a 4th identifier - call it
  PER_ROW for this argument?  The main reason VOLATILE is broken is that
  it is the default and in order to minimize beginner's penalty it is
  not treated as such in some situations.  The new one could behave just
  like VOLATILE but would never be optimized away and would always
  evaluate once for each row in its context.
 
 So how would you document that?  It sounds like the proposed level would
 behave exactly as the VOLATILE level is currently documented to behave; so
I
 guess we could shift the documentation of VOLATILE to PER_ROW (or
 whatever).  How would you then describe the behavior of VOLATILE?
 

I'm not sure but however we would describe it we might as well make the
change now regardless of whether another level is added.

The main distinguishing characteristic is that VOLATILE is not guaranteed to
evaluate once-per-row if it is not dependent upon particular values within a
given row.  

VOLATILE: A Volatile function used in an ORDER BY or WHERE clause without
referencing any columns from the query itself (i.e., no parameters or all
constants) will be evaluated a single time and the result treated as a
constant (i.e., all rows will have identical values) for that part of the
query.

PER_ROW: A per_row function will be evaluated once for every row that is
visible to the function and will be treated as a virtual column of said
relation with each cell having an its own value as a result of the
function call.

Using random() as an example of the two possible behaviors should further
clarify the differences quite nicely.

Quick pass - hopefully, a) this inspires someone else, and b) this is the
correct understanding in the first place.

David J.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-19 Thread Kevin Grittner
David Johnston pol...@yahoo.com wrote:
 
 VOLATILE: A Volatile function used in an ORDER BY or WHERE clause
 without referencing any columns from the query itself (i.e., no
 parameters or all constants) will be evaluated a single time and
 the result treated as a constant (i.e., all rows will have
 identical values) for that part of the query.
 
I hope you're wrong about the ORDER BY part of that.  A quick test
confirms that it works in ORDER BY, at least for some cases.  If
there are any exceptions to that, I would sure like to know about
it -- and really soon.
 
select * from generate_series(1, 1) s(n)
  order by random() limit 10;
 
-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-19 Thread David Johnston
 -Original Message-
 From: Kevin Grittner [mailto:kevin.gritt...@wicourts.gov]
 Sent: Wednesday, September 19, 2012 5:51 PM
 To: k...@rice.edu; David Johnston
 Cc: 'Florian Schoppmann'; 'Robert Haas'; pgsql-hackers@postgresql.org;
'Tom
 Lane'
 Subject: RE: [HACKERS] Invalid optimization of VOLATILE function in WHERE
 clause?
 
 David Johnston pol...@yahoo.com wrote:
 
  VOLATILE: A Volatile function used in an ORDER BY or WHERE clause
  without referencing any columns from the query itself (i.e., no
  parameters or all constants) will be evaluated a single time and the
  result treated as a constant (i.e., all rows will have identical
  values) for that part of the query.
 
 I hope you're wrong about the ORDER BY part of that.  A quick test
confirms
 that it works in ORDER BY, at least for some cases.  If there are any
 exceptions to that, I would sure like to know about it -- and really soon.
 
 select * from generate_series(1, 1) s(n)
   order by random() limit 10;
 
 -Kevin

I'd rather have someone who knows the code assert one way or the other; I
tossed it in there because I thought I've seen people complain that random()
doesn't work as expected with ORDER BY but that may just be faulty memory.
It may or may not depend on whether LIMIT/OFFSET are involved...?  Used in
the SELECT-list it gets evaluated for each row and I guess the ORDER BY
could have that behavior as well (I would expect it to anyway), so is it
strictly limited to WHERE clause evaluation that this discrepancy manifests?

David J.






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-19 Thread Florian Schoppmann
Tom Lane t...@sss.pgh.pa.us wrote:

 florian.schoppm...@emc.com (Florian Schoppmann) writes:
  [VOLATILE function in WHERE clause *does* get optimized]
 
 I can't get excited about this.  Any time you put a volatile function
 into WHERE, you're playing with fire.  The docs warn against it:
 http://www.postgresql.org/docs/9.2/static/sql-expressions.html#SYNTAX-
 EXPRESS-EVAL

All this section tells me is that one cannot rely on the evaluation
order in expressions, and that side effects are dangerous in WHERE and
HAVING clauses.

I do not read in this section that VOLATILE functions are unsafe per se.
After all, a VOLATILE function is not required to have side effects
(suppose, e.g., somebody implemented a true random nubmer generator).

However,
http://www.postgresql.org/docs/9.2/interactive/sql-createfunction.html
is actually very clear in its wording:

| VOLATILE indicates that the function value can change even within a
| single table scan, so no optimizations can be made.

I therefore tend to see the behavior as a bug. I concede though that
there is also good reason to keep the current behavior (VOLATILE is the
default for UDFs, etc.). But then I think the documentation needs to be
changed, and it has to be made explicit where the optimizer may change
semantics and what, on the other hand, is defined behavior.

E.g., users should need to know if the following rewrite causes defined
or undefined behavior. Does it give correct results by accident, or are
correct results guaranteed?

--8--
WITH source AS (
SELECT i FROM generate_series(1,10) AS i
)
SELECT
i
FROM
source AS _stats
WHERE
random()  5::DOUBLE PRECISION 
   / (SELECT count(*) FROM source);
--8--

 To do what you want, I'd suggest wrapping the join into a sub-select
 with an OFFSET 0 clause, which will serve as an optimization fence
 that prevents the random() call from being pushed down.

My interpretation so far is that VOLATILE functions in a WHERE clause
can always be avoided: E.g., move the condition to the SELECT list and
embed in an outer query that then filters on the condition column. Or is
my assumption wrong, and the optimizer could theoretically interfere
even here?

Florian



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers