Hi all,

In PostgreSQL 9.1 and 9.2 (possibly also in earlier versions), the query

--8<--
WITH source AS (
    SELECT i FROM generate_series(1,10) AS i
)
SELECT
    i
FROM
    source, (
        SELECT
            count(*) AS _n
        FROM source
    ) AS _stats
WHERE
    random() < 5::DOUBLE PRECISION/_n;
-->8--

translates into the following query plan:

--8<--
 Nested Loop  (cost=35.00..65.03 rows=1000 width=4)
   CTE source
     ->  Function Scan on generate_series i  (cost=0.00..10.00 rows=1000 width=4)
   ->  Aggregate  (cost=25.00..25.02 rows=1 width=0)
         Filter: (random() < (5::double precision / (count(*))::double precision))
         ->  CTE Scan on source  (cost=0.00..20.00 rows=1000 width=0)
   ->  CTE Scan on source  (cost=0.00..20.00 rows=1000 width=4)
-->8--

The condition has been attached as a filter to the Aggregate node, so
random() is evaluated only once (for the single aggregate row) rather
than once per output row. In other words, the query returns either
exactly 0 or exactly 10 rows, each case with probability 0.5. I would
instead have expected each row to be sampled independently with
probability 0.5.
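
For comparison, here is a sketch of a rewrite that moves the count into
a second CTE. Since conditions are not pushed down into CTEs in these
versions, the filter should stay at the join level, and each row then
seems to be sampled independently (I have only tried this briefly, so
take it as a workaround sketch rather than a fix):

--8<--
WITH source AS (
    SELECT i FROM generate_series(1,10) AS i
), stats AS (
    -- keeping the count behind a CTE prevents the condition from
    -- being pushed into the aggregation
    SELECT count(*) AS _n FROM source
)
SELECT
    i
FROM
    source, stats
WHERE
    random() < 5::DOUBLE PRECISION/_n;
-->8--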

Since random() is volatile, so is the whole WHERE expression. I
therefore wonder why the condition is pushed down into the aggregate
subquery, given that doing so changes the results. Is this behavior
correct, i.e., specified somewhere, or is it a bug?
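
For reference, random() does appear to be marked volatile in the
catalog here, so this doesn't look like a mere stability
misclassification:

--8<--
SELECT provolatile FROM pg_proc WHERE proname = 'random';
-- returns 'v', i.e. volatile
-->8--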

Florian
