date:20010826

[SQL] undocumented setval()

2001-08-26 Thread Cedar Cox



When looking through a dump file, I noticed a setval(text, int,
bool).  What is this?  It doesn't appear to be documented.

-Cedar


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

[SQL] Different Choices For Index/Sequential Scan With And Without A Join In 7.2

2001-08-26 Thread Mark kirkwood


Dear List,

I have been doing a little investigation on when the optimizer chooses a 
sequential scan over an index access. I have come accross what interesting 
behaviour in the current 7.2 sources ( 2001-08-17):

The consider two types of query on my "usual" tables :

SELECT 
   f.d0key,
   count(f.val)
FROM fact0 f
WHERE f.d0key BETWEEN  270 AND 
GROUP BY f.d0key;

and

SELECT
   d0.f1,
   count(f.val)
FROM dim0 d0,
 fact0 f
WHERE d0.d0key = f.d0key
AND   d0.f1 BETWEEN  '2000-01-26' AND <'date'>
GROUP BY d0.f1;


Note that  'f1' = '2000-01-26'  corrosponds to 'd0key' = 270 in the table 
'dim0';

I wanted to find the values for  and  for which the optimizer 
changed from and index acess to a seq scan of the 'fact0' table.

I used cpu_tuple_cost = 0.4, but everything else was fairly standard.

For the first query the value of  ( i.e : 'd0key' ) was 627
For the second the value of  (i.e 'f1' ) was '2000-02-05'  ( 
corrosponds to d0key = 279 ) 

It guess I was expecting the value that made the first query change from 
index to seq scan to be "close" to the value that made the second query use a 
sequential scanas the fact0 access of the second query is essentially the 
first query. However the results are vastly different - have I missed 
something obvious here ?


The script and explain output are listed below.

regards

Mark

<--script

SET cpu_tuple_cost=0.4;
SHOW cpu_tuple_cost;

--  show what keys are for what dates...
--
SELECT d0.d0key,
   d0.f1
FROM dim0 d0
WHERE d0.d0key IN ('270','279','280','626','627')
;


--  show when index scans change to sequential
--  for the fact0 table alone...
--
EXPLAIN
SELECT 
   f.d0key,
   count(f.val) 
FROM fact0 f
WHERE f.d0key BETWEEN  270 AND 626
GROUP BY f.d0key
;


EXPLAIN
SELECT 
   f.d0key,
   count(f.val)
FROM fact0 f
WHERE f.d0key BETWEEN  270 AND 627
GROUP BY f.d0key
;


--  show when index scans change to sequential
--  for the two table join
--EXPLAIN
SELECT
   d0.f1,
   count(f.val)
FROM dim0 d0,
 fact0 f
WHERE d0.d0key = f.d0key
AND   d0.f1 BETWEEN  '2000-01-26' AND '2000-02-04'
GROUP BY d0.f1
;


EXPLAIN
SELECT
   d0.f1,
   count(f.val)
FROM dim0 d0,
 fact0 f
WHERE d0.d0key = f.d0key
AND   d0.f1 BETWEEN  '2000-01-26' AND '2000-02-05'
GROUP BY d0.f1
;

<--results

SET VARIABLE
NOTICE:  cpu_tuple_cost is 0.4
SHOW VARIABLE
 d0key |   f1
---+
   270 | 2000-01-26 00:00:00+13
   279 | 2000-02-04 00:00:00+13
   280 | 2000-02-05 00:00:00+13
   626 | 2001-01-16 00:00:00+13
   627 | 2001-01-17 00:00:00+13
(5 rows)
   
NOTICE:  QUERY PLAN:

Aggregate  (cost=0.00..1308177.10 rows=33453 width=8)
  ->  Group  (cost=0.00..1307340.77 rows=334533 width=8)
->  Index Scan using fact0_pk on fact0 f  (cost=0.00..1306504.44 
rows=334533 width=8)

EXPLAIN
NOTICE:  QUERY PLAN:

Aggregate  (cost=1308030.21..1309707.21 rows=33540 width=8)
  ->  Group  (cost=1308030.21..1308868.71 rows=335400 width=8)
->  Sort  (cost=1308030.21..1308030.21 rows=335400 width=8)
  ->  Seq Scan on fact0 f  (cost=0.00..1272693.00 rows=335400 
width=8)

EXPLAIN
NOTICE:  QUERY PLAN:

Aggregate  (cost=0.00..1155870.07 rows=268 width=20)
  ->  Group  (cost=0.00..1155863.36 rows=2684 width=20)
->  Nested Loop  (cost=0.00..1155856.65 rows=2684 width=20)
  ->  Index Scan using dim0_q1 on dim0 d0  (cost=0.00..6.63 
rows=9 width=12)
  ->  Index Scan using fact0_pk on fact0 f  (cost=0.00..117117.99 
rows=3 width=8)

EXPLAIN
NOTICE:  QUERY PLAN:

Aggregate  (cost=1281572.52..1281587.43 rows=298 width=20)
  ->  Group  (cost=1281572.52..1281579.97 rows=2982 width=20)
->  Sort  (cost=1281572.52..1281572.52 rows=2982 width=20)
  ->  Hash Join  (cost=7.06..1281400.41 rows=2982 width=20)
->  Seq Scan on fact0 f  (cost=0.00..1257693.00 
rows=300 width=8)
->  Hash  (cost=7.04..7.04 rows=10 width=12)
  ->  Index Scan using dim0_q1 on dim0 d0  
(cost=0.00..7.04 rows=10 width=12)

EXPLAIN



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

[SQL] "AND", "OR" and Materialize :((((

2001-08-26 Thread Meszaros Attila


Hi all,

We've got the following 3 tables and 2 simple queries. The only difference
lies in the join condition: the first uses OR, the second uses AND.

I expected some difference in the performace according to the
difference in the evaluation of the logical form, but not
3 magnitudes !!!

So the question is:

WHY SHALL IT MATERIALIZE A CONSTANT RESULT IN A LOOP OF 12 TIMES ??
[it would be enough to materialize only once, or even never, because
the size of the materialized table is not larger than 1Mb... ]

ps.: there are indeces on all referenced fields.


atti=# explain select count(*) from _108 left join (_111 cross join _110) 
on (_108.objectid=_111._108objectid OR _108.objectid=_110._108objectid); 
NOTICE:  QUERY PLAN:

Aggregate  (cost=5017202.06..5017202.06 rows=1 width=24)
  ->  Nested Loop  (cost=0.00..5016900.05 rows=120806 width=24)
->  Seq Scan on _108  (cost=0.00..44.70 rows=1670 width=8)
->  Materialize  (cost=2097.79..2097.79 rows=60421 width=16)
  ->  Nested Loop  (cost=0.00..2097.79 rows=60421 width=16)
->  Seq Scan on _110  (cost=0.00..1.37 rows=37 width=8)
->  Seq Scan on _111  (cost=0.00..40.33 rows=1633 width=8)

EXPLAIN
atti=# explain select count(*) from _108 left join (_111 cross join _110) 
on _108.objectid=_111._108objectid AND _108.objectid=_110._108objectid; 
NOTICE:  QUERY PLAN:

Aggregate  (cost=7965.68..7965.68 rows=1 width=24)
  ->  Merge Join  (cost=7030.14..7961.51 rows=1670 width=24)
->  Sort  (cost=134.09..134.09 rows=1670 width=8)
  ->  Seq Scan on _108  (cost=0.00..44.70 rows=1670 width=8)
->  Sort  (cost=6896.05..6896.05 rows=60421 width=16)
  ->  Nested Loop  (cost=0.00..2097.79 rows=60421 width=16)
->  Seq Scan on _110  (cost=0.00..1.37 rows=37 width=8)
->  Seq Scan on _111  (cost=0.00..40.33 rows=1633 width=8)

Attila



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [SQL] undocumented setval()

2001-08-26 Thread Tom Lane

Cedar Cox <[EMAIL PROTECTED]> writes:
> When looking through a dump file, I noticed a setval(text, int,
> bool).  What is this?  It doesn't appear to be documented.

The bool parameter sets the is_called state bit of the sequence.
Without this you can't restore a sequence to a virgin "never nextval'd"
state.

Actually, none of the sequence-related functions are documented in
what I'd call a proper way.  I think there should be a separate section
of the user's guide "Functions and Operators" chapter that describes
them --- not a large section, but a section nonetheless.  Any
volunteers to write it?

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [SQL] Different Choices For Index/Sequential Scan With And Without A Join In 7.2

2001-08-26 Thread Tom Lane

Mark kirkwood <[EMAIL PROTECTED]> writes:
> Note that  'f1' = '2000-01-26'  corrosponds to 'd0key' = 270 in the table 
> 'dim0';

What do you mean by "corresponds to"?  Is there a one-to-one mapping
between distinct values of fact0.d0key and distinct values of dim0.f1?
Or do you just mean that the values play corresponding roles in these
two queries?

> I used cpu_tuple_cost = 0.4, but everything else was fairly standard.

?? You're claiming that the CPU time involved in processing a single
tuple is 40% as large as the time to fetch a page from disk.  Unless
you're running a high-end RAID array attached to an ENIAC, I don't
believe it.  This adjustment almost certainly will produce silly results.

> It guess I was expecting the value that made the first query change
> from index to seq scan to be "close" to the value that made the second
> query use a sequential scan...

Um, are you considering the effects of statistical density of the
values?  I see no particular reason to assume that a range of nine days
in a date column should be equally as selective as a range of nine
counts in an integer key column.  It all depends on what fraction of the
table entries actually fall within those ranges.

Have you looked at the ANALYZE statistics for the tables?  (You have
done an ANALYZE on them, I hope.)  Try
select * from pg_stats where tablename = 'fact';
The user documentation about 7.2 statistics is nonexistent as yet, but
you can read src/include/catalog/pg_statistic.h for info.

If the tables are large and have irregular distributions, you might find
that increasing the statistics target value for the key columns helps
the optimizer to produce good plan choices.  See ALTER TABLE SET
STATISTICS.  I'd be interested to hear about it if so --- the current
default target of 10 was picked "out of the air" and might well be
off-base.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re: [SQL] "AND", "OR" and Materialize :((((

2001-08-26 Thread Tom Lane

Meszaros Attila <[EMAIL PROTECTED]> writes:
> WHY SHALL IT MATERIALIZE A CONSTANT RESULT IN A LOOP OF 12 TIMES ??
> [it would be enough to materialize only once,

Which in fact is exactly what the materialize node is for.  The reported
costs are pretty bogus, but AFAICT the plan is the right thing.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl

Re: [SQL] "AND", "OR" and Materialize :((((

2001-08-26 Thread Meszaros Attila


Hi,

> > WHY SHALL IT MATERIALIZE A CONSTANT RESULT IN A LOOP OF 12 TIMES ??
> > [it would be enough to materialize only once,
> 
> Which in fact is exactly what the materialize node is for.  The reported
> costs are pretty bogus, but AFAICT the plan is the right thing.

Thanx for the answer.
I've thought the same (eg. materialize should reduce the amount
of work to be done, but I haven't felt this in the result)

Unfortunatelly the cost prediction in line 2 may be close to
the real cost. According to some measures:
time for the query with 'AND':2 sec
time for the query with 'OR':   421 sec

So the question is what to do? 
Can I speed up the second one?

[vacuum analyze and indices are done, postgres version is 7.1.2]

1:Aggregate  (cost=5017202.06..5017202.06 rows=1 width=24)
2:  ->  Nested Loop  (cost=0.00..5016900.05 rows=120806 width=24)
3:->  Seq Scan on _108  (cost=0.00..44.70 rows=1670 width=8)
4:->  Materialize  (cost=2097.79..2097.79 rows=60421 width=16)
5:  ->  Nested Loop  (cost=0.00..2097.79 rows=60421 width=16)
6:->  Seq Scan on _110  (cost=0.00..1.37 rows=37 width=8)
7:->  Seq Scan on _111  (cost=0.00..40.33 rows=1633 width=8)

Attila


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [SQL] Different Choices For Index/Sequential Scan With And Without A Join In 7.2

2001-08-26 Thread markir

> Mark kirkwood <[EMAIL PROTECTED]> writes:
> > Note that  'f1' = '2000-01-26'  corrosponds to 'd0key' = 270 in the table 
> > 'dim0';
> 
> What do you mean by "corresponds to"?  Is there a one-to-one mapping
> between distinct values of fact0.d0key and distinct values of dim0.f1?
> Or do you just mean that the values play corresponding roles in these
> two queries?
> 

Sorry Tom ... clearly I didnt explain this very well...
But if you look at the rows in dim0 (see bottom of the previous mail) the above 
d0key and f1 are in the same row of 'dim0' - I neglected to mention that 'f1' 
is unique and 'd0key' is the primary key for 'dim0', so yes there is a 1-1 
mapping between 'd0key' and 'f1' in 'dim0. Therefore there is also a 1-1 
mapping between distinct 'd0key' in 'fact0' and 'f1' in 'dim0'.

so to get the rows in 'fact0', 'dim0' where dim0.f1 = '2000-01-26' is the same 
as getting the rows in fact0, dim0 where dim0.d0key = 270.
Given that the join is dim0.d0key = fact0.d0key then this is equivalent to 
fact0.d0key = 270.

Of course you are correct about a date column on dim0 not having the same 
selectivity as an int on fact0... but it seems to me ( incorrectly ? ) that in 
order in access fact0 from the resulting dim0 rows for f1, the optimizer must 
use the set of d0key(s) extracted from dim0 and go to fact0 with then. This was 
exactly what the unjoined query was doing - which gets us back to my original 
question again ( I think ).

On the other points : cpu_tuple_cost and distribution - These are completely 
correct, I will use another similar table that has uniformly distributed data - 
this should mean no fiddling about with cpu_tuple_cost is required.

In addition, to clarify the issue furthur I am considering removing f1 from the 
example, and using d0key in both queries, to see what happens then.

Thanks for your patience on this.

regards

Mark

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://www.postgresql.org/search.mpl

[SQL] undocumented setval()

[SQL] Different Choices For Index/Sequential Scan With And Without A Join In 7.2

[SQL] "AND", "OR" and Materialize :((((

Re: [SQL] undocumented setval()

Re: [SQL] Different Choices For Index/Sequential Scan With And Without A Join In 7.2

Re: [SQL] "AND", "OR" and Materialize :((((

Re: [SQL] "AND", "OR" and Materialize :((((

Re: [SQL] Different Choices For Index/Sequential Scan With And Without A Join In 7.2

8 matches

Site Navigation

Mail list logo

Footer information