[HACKERS] Estimates not taking null_frac element into account with @@ operator? (8.4 .. git-head)

2011-02-17 Thread Jesper Krogh

Hi All.

The NULL element always surprises me in unpleasant ways; my brain simply
can't really follow the logic, so please let me know if this is one of
the cases where I should just put more effort into fixing my
understanding instead.


I have a table with a null_frac of 0.5, and I have verified that a where
clause that evaluates to NULL doesn't include rows in the result:

testdb=# select id from testtable where null @@ to_tsquery('testterm80');
 id
----
(0 rows)
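
For clarity, the predicate itself evaluates to NULL rather than false,
and WHERE treats NULL as not matching. A quick sanity check along those
lines (sketched by hand, not copied from the original session):

testdb=# select (null::tsvector @@ to_tsquery('testterm80')) is null;
 ?column?
----------
 t
(1 row)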

Then I'd expect the null_frac to be taken into account when computing the
estimates for the query:

testdb=# explain select id from testtable where fts @@ to_tsquery('testterm80');

                           QUERY PLAN
----------------------------------------------------------------
 Seq Scan on testtable  (cost=0.00..1985.03 rows=1966 width=4)
   Filter: (fts @@ to_tsquery('testterm80'::text))
(2 rows)

Whereas it actually does take it into account if I explicitly add an
"fts is not null" clause to the query:


testdb=# explain select id from testtable where fts @@ to_tsquery('testterm80') and fts is not null;

                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on testtable  (cost=130.34..1735.19 rows=983 width=4)
   Recheck Cond: ((fts @@ to_tsquery('testterm80'::text)) AND (fts IS NOT NULL))
   ->  Bitmap Index Scan on testtable_fts_idx  (cost=0.00..130.09 rows=983 width=0)
         Index Cond: ((fts @@ to_tsquery('testterm80'::text)) AND (fts IS NOT NULL))
(4 rows)

When something that evaluates to NULL isn't included in the result,
shouldn't the query planner take the null_frac into account when
computing the estimate?
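
To spell out the arithmetic I'd expect (my own back-of-the-envelope
numbers, taken from the two plans above): the first plan estimates
10002 * 0.1966 = 1966 rows, as if every row could match, while scaling
by the non-null fraction gives

    10002 * 0.1966 * (1 - null_frac) = 1966 * 0.5 = 983 rows,

which is exactly what the plan with the explicit "fts is not null"
clause shows.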

Trying to do the same thing using integers and the < operator does seem
to take the null_frac into account.

The snippet below reproduces the dataset:


create table testtable (id serial primary key, document text, fts tsvector);
create index on testtable using gist(fts);
CREATE OR REPLACE FUNCTION public.filltable(rows integer)
 RETURNS boolean
 LANGUAGE plpgsql
AS $function$
DECLARE
    count integer;
BEGIN
    count := 0;
    LOOP
        EXIT WHEN count = rows;
        count := count + 1;
        insert into testtable(document, fts)
            select document, to_tsvector('english', document)
            from (select string_agg(concat, ' ') as document
                  from (select concat('testterm' ||
                                      generate_series(1, floor(random()*100)::integer))) as foo
                 ) as bar;
    END LOOP;
    RETURN TRUE;
END;
$function$;

select filltable(10002);
testdb=# update testtable set fts = null where id % 2 = 0;
UPDATE 5001
testdb=# ANALYZE verbose testtable;
INFO:  analyzing "public.testtable"
INFO:  "testtable": scanned 1835 of 1835 pages, containing 10002 live rows and 5001 dead rows; 10002 rows in sample, 10002 estimated total rows
ANALYZE
testdb=# select null_frac from pg_stats where attname = 'fts';
 null_frac
-----------
       0.5
(1 row)

... trying with integers:

testdb=# ALTER TABLE testtable add column testint integer;
ALTER TABLE
testdb=# update testtable set testint = floor(random()*100);
UPDATE 10002
testdb=# ANALYZE verbose testtable;
INFO:  analyzing "public.testtable"
INFO:  "testtable": scanned 2186 of 2186 pages, containing 10002 live rows and 10002 dead rows; 10002 rows in sample, 10002 estimated total rows
ANALYZE
testdb=# update testtable set testint = null where id % 2 = 0;
UPDATE 5001
testdb=# ANALYZE verbose testtable;
INFO:  analyzing "public.testtable"
INFO:  "testtable": scanned 2282 of 2282 pages, containing 10002 live rows and 13335 dead rows; 10002 rows in sample, 10002 estimated total rows
ANALYZE
testdb=# explain select id from testtable where testint = 50;
                          QUERY PLAN
--------------------------------------------------------------
 Seq Scan on testtable  (cost=0.00..2407.03 rows=64 width=4)
   Filter: (testint = 50)
(2 rows)

testdb=# explain select id from testtable where testint = 1;
                          QUERY PLAN
--------------------------------------------------------------
 Seq Scan on testtable  (cost=0.00..2407.03 rows=48 width=4)
   Filter: (testint = 1)
(2 rows)

testdb=# explain select id from testtable where testint < 50;
                           QUERY PLAN
----------------------------------------------------------------
 Seq Scan on testtable  (cost=0.00..2407.03 rows=2470 width=4)
   Filter: (testint < 50)
(2 rows)
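
These estimates line up with the null_frac being applied; roughly (my
own arithmetic, assuming testint is close to uniform over 0..99):

    testint = 50:  10002 * (1/100)  * (1 - 0.5) =~   50  (planner: 64)
    testint < 50:  10002 * (50/100) * (1 - 0.5) =~ 2500  (planner: 2470)

The per-value differences (64 for testint = 50 vs. 48 for testint = 1)
come from the MCV statistics, but both sit near the null-adjusted guess.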


(found on 8.4 and reproduced on git-head)

The attached patch tries to align the behaviour.

Thanks.

--
Jesper
From 4b4be27864f683a9b3464b86d6cfa567f8ab6bd2 Mon Sep 17 00:00:00 2001
From: Jesper Krogh <j...@novozymes.com>
Date: Thu, 17 Feb 2011 22:21:52 +0100
Subject: [PATCH] Take null_frac into account in calculating selectivity for the @@ operator. This makes the behaviour align with what the < operator does for integers.

---
 src/backend/tsearch/ts_selfuncs.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/tsearch/ts_selfuncs.c b/src/backend/tsearch/ts_selfuncs.c

Re: [HACKERS] Estimates not taking null_frac element into account with @@ operator? (8.4 .. git-head)

2011-02-17 Thread Tom Lane
Jesper Krogh <jes...@krogh.cc> writes:
> When something that evaluates to NULL isn't included in the result,
> shouldn't the query planner take the null_frac into account when
> computing the estimate?

The proposed patch seems wrong to me: if we're estimating on the basis
of most-common-value fractions, the null_frac is already accounted for,
because it's not part of the MCV selectivity fractions.  IOW, aren't you
double-counting the null fraction?
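
For a scalar column you can see that directly in pg_stats: the MCV
fractions are fractions of *all* rows, nulls included, so together with
null_frac they can sum to at most 1. A quick check along those lines
(a sketch, not output from your session):

testdb=# select null_frac,
                (select sum(f) from unnest(most_common_freqs) f) as mcv_total
         from pg_stats
         where tablename = 'testtable' and attname = 'testint';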

regards, tom lane



Re: [HACKERS] Estimates not taking null_frac element into account with @@ operator? (8.4 .. git-head)

2011-02-17 Thread Jesper Krogh

On 2011-02-17 23:20, Tom Lane wrote:
> Jesper Krogh <jes...@krogh.cc> writes:
>> When something that evaluates to NULL isn't included in the result,
>> shouldn't the query planner take the null_frac into account when
>> computing the estimate?
>
> The proposed patch seems wrong to me: if we're estimating on the basis
> of most-common-value fractions, the null_frac is already accounted for,
> because it's not part of the MCV selectivity fractions.  IOW, aren't you
> double-counting the null fraction?

It might be the wrong place to fix, but here it seems like we're only
counting MCE-freqs based on non-null elements:

http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=src/backend/tsearch/ts_typanalyze.c;h=2654d644579fd1959282d83919474f42540ca703;hb=HEAD#l396

And the test data confirms the behaviour.
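
One way to see it directly (on releases new enough that pg_stats
exposes element statistics; treat this as a sketch):

testdb=# select null_frac, most_common_elem_freqs
         from pg_stats
         where tablename = 'testtable' and attname = 'fts';

The frequency listed for a term like testterm1, which occurs in nearly
every non-null row, comes out close to 1.0 rather than close to 0.5,
i.e. the element frequencies are relative to the non-null rows only.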

--
Jesper




Re: [HACKERS] Estimates not taking null_frac element into account with @@ operator? (8.4 .. git-head)

2011-02-17 Thread Tom Lane
Jesper Krogh <jes...@krogh.cc> writes:
> On 2011-02-17 23:20, Tom Lane wrote:
>> The proposed patch seems wrong to me: if we're estimating on the basis
>> of most-common-value fractions, the null_frac is already accounted for,
>> because it's not part of the MCV selectivity fractions.  IOW, aren't you
>> double-counting the null fraction?

> It might be the wrong place to fix, but here it seems like we're only
> counting MCE-freqs based on non-null elements:
> http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=src/backend/tsearch/ts_typanalyze.c;h=2654d644579fd1959282d83919474f42540ca703;hb=HEAD#l396

Hmm, you're right, and the specification in pg_statistic.h neglects to
say that.  This does need work.
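
(Roughly, since the MCE frequencies are computed over non-null rows
only, the @@ estimator presumably needs to end up scaling its result as

    selectivity = selec_from_MCE * (1 - null_frac)

so that it matches what the scalar estimators effectively compute.)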

regards, tom lane



Re: [HACKERS] Estimates not taking null_frac element into account with @@ operator? (8.4 .. git-head)

2011-02-17 Thread Tom Lane
Jesper Krogh <jes...@krogh.cc> writes:
> The attached patch tries to align the behaviour.

Applied with a bit of editorialization.

regards, tom lane
