Re: [PERFORM] *very* inefficient choice made by the planner (regarding

2004-06-10 Thread Stephan Szabo

On Thu, 10 Jun 2004, Stephan Szabo wrote:

>
> On Thu, 10 Jun 2004, Jean-Luc Lachance wrote:
>
> > I agree, but it should be a simple rewrite. No?
>
> It's NULLs inside the subselect that are the issue.
>
> select 1 in (select a from foo)
> select exists ( select 1 from foo where a=1)
>
> If foo.a contains a row with NULL but no rows containing a 1, the above
> give different results (unknown and exists) and IIRC, exists cannot

Erm that exists should have been false

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] *very* inefficient choice made by the planner (regarding

2004-06-10 Thread Stephan Szabo

On Thu, 10 Jun 2004, Jean-Luc Lachance wrote:

> I agree, but it should be a simple rewrite. No?

It's NULLs inside the subselect that are the issue.

select 1 in (select a from foo)
select exists ( select 1 from foo where a=1)

If foo.a contains a row with NULL but no rows containing a 1, the above
give different results (unknown and exists) and IIRC, exists cannot
return unknown, so there's no simple rewrite of the subselect that gives
equivalent behavior.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] *very* inefficient choice made by the planner (regarding

2004-06-10 Thread Jean-Luc Lachance
I agree, but it should be a simple rewrite. No?
x IS NULL/IS NOT NULL AND/OR NOT EXISTS
Tom Lane wrote:
Jean-Luc Lachance <[EMAIL PROTECTED]> writes:
If the two statments are functionally equivalent, why can't PG rewrite 
the "NOT IN" version into the more efficient "NOT EXISTS"?

They're not equivalent.  In particular, the behavior in the presence of
NULLs is quite different.
regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Database Server Tuning

2004-06-10 Thread Vivek Khera
On Jun 10, 2004, at 12:02 PM, Anjan Dave wrote:
Vivek,
Was there anything specific that helped you decide on a RAID-5 and not 
a RAID-10?
performance testing on restore times.  My DB is more than 50% write, so 
I needed to optimize for writes.

I have my DBs on RAID10, and would soon be moving them on FC drives, 
and i am considering RAID-10.
If I had to do it over again, I'd most likely go with RAID-50, and take 
the hit on restore time for the advantage on reads.  I have to dig 
through my records again to see the details... but then I have to do 
all that for my OSCON presentation on this topic at the end of July in 
Portland, OR. ;-)



smime.p7s
Description: S/MIME cryptographic signature


Re: [PERFORM] *very* inefficient choice made by the planner (regarding

2004-06-10 Thread Tom Lane
Jean-Luc Lachance <[EMAIL PROTECTED]> writes:
> If the two statments are functionally equivalent, why can't PG rewrite 
> the "NOT IN" version into the more efficient "NOT EXISTS"?

They're not equivalent.  In particular, the behavior in the presence of
NULLs is quite different.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Database Server Tuning

2004-06-10 Thread Anjan Dave
Vivek,
 
Was there anything specific that helped you decide on a RAID-5 and not a RAID-10?
 
I have my DBs on RAID10, and would soon be moving them on FC drives, and i am 
considering RAID-10.
 
Thanks,
Anjan

-Original Message- 
From: Josh Berkus [mailto:[EMAIL PROTECTED] 
Sent: Tue 3/2/2004 4:27 PM 
To: Vivek Khera; [EMAIL PROTECTED] 
Cc: 
Subject: Re: [PERFORM] Database Server Tuning



Vivek,

> I did a bunch of testing with different RAID levels on a 14 disk
> array.  I finally settled on this:  RAID5 across 14 disks for the
> data, the OS (including syslog directory) and WAL on a RAID1 pair on
> the other channel of the same controller (I didn't want to spring for
> dual RAID controllers).  The biggest bumps in performance came from
> increasing the checkpoint_buffers since my DB is heavily written to,
> and increasing sort_mem.

With large RAID, have you found that having WAL on a seperate array actually
boosts performance?   The empirical tests we've seen so far don't seem to
support this.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] *very* inefficient choice made by the planner (regarding

2004-06-10 Thread Jean-Luc Lachance
The real question is:
If the two statments are functionally equivalent, why can't PG rewrite 
the "NOT IN" version into the more efficient "NOT EXISTS"?


Frank van Vugt wrote:
L.S.
Could anybody explain why the planner is doing what it is doing?
What could I do to make it easier to choose a better plan?

*
Summary
*
On a freshly vacuum/analysed pair of tables with 7389 and 64333 records, this:
select id from location where id not in (select location_id from 
location_carrier);

takes 581546,497 ms
While a variant like:
select id from location where not exists (select 1 from location_carrier where 
location_id = location.id);

takes only 124,625 ms
*
Details
*
=# select version();
   version
-
 PostgreSQL 7.4.2 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66
(1 row)
=# \d location
   Table "public.location"
   Column   |Type | Modifiers
+-+---
 id | integer | not null
Indexes:
"location_pkey" primary key, btree (id)
=# select count(*) from location;
 count
---
  7389
(1 row)
=# \d location_carrier
Table "public.location_carrier"
   Column|Type | Modifiers
-+-+---
 location_id | integer | not null
 carrier_id  | integer | not null
Indexes:
"location_carrier_pkey" primary key, btree (location_id, carrier_id)
=# select count(*) from location_carrier;
 count
---
 64333
(1 row)
=# explain select id from location where id not in (select location_id from 
location_carrier);
  QUERY PLAN
---
 Seq Scan on "location"  (cost=0.00..5077093.72 rows=3695 width=4)
   Filter: (NOT (subplan))
   SubPlan
 ->  Seq Scan on location_carrier  (cost=0.00..1213.33 rows=64333 width=4)
(4 rows)

=# explain analyse select id from location where id not in (select location_id 
from location_carrier);
   QUERY PLAN
-
 Seq Scan on "location"  (cost=0.00..5077093.72 rows=3695 width=4) (actual 
time=248.310..581541.483 rows=240 loops=1)
   Filter: (NOT (subplan))
   SubPlan
 ->  Seq Scan on location_carrier  (cost=0.00..1213.33 rows=64333 width=4) 
(actual time=0.007..48.517 rows=19364 loops=7389)
 Total runtime: 581542.560 ms
(5 rows)

Time: 581546,497 ms
=# explain analyse select id from location l left outer join location_carrier 
lc on l.id = lc.location_id where lc.location_id is null;
  QUERY 
PLAN
--
 Merge Left Join  (cost=0.00..3022.51 rows=7389 width=4) (actual 
time=0.083..435.841 rows=240 loops=1)
   Merge Cond: ("outer".id = "inner".location_id)
   Filter: ("inner".location_id IS NULL)
   ->  Index Scan using location_pkey on "location" l  (cost=0.00..258.85 
rows=7389 width=4) (actual time=0.041..26.211 rows=7389 loops=1)
   ->  Index Scan using location_carrier_pkey on location_carrier lc  
(cost=0.00..1941.22 rows=64333 width=4) (actual time=0.015..238.305 
rows=64333 loops=1)
 Total runtime: 436.213 ms
(6 rows)

Time: 440,787 ms
megafox=# explain analyse select id from location where not exists (select 1 
from location_carrier where location_id = location.id);
 QUERY 
PLAN
-
 Seq Scan on "location"  (cost=0.00..13242.14 rows=3695 width=4) (actual 
time=0.078..120.785 rows=240 loops=1)
   Filter: (NOT (subplan))
   SubPlan
 ->  Index Scan using location_carrier_pkey on location_carrier  
(cost=0.00..17.61 rows=10 width=0) (actual time=0.011..0.011 rows=1 
loops=7389)
   Index Cond: (location_id = $0)
 Total runtime: 121.165 ms
(6 rows)

Time: 124,625 ms




---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] *very* inefficient choice made by the planner (regarding IN(...))

2004-06-10 Thread Frank van Vugt
Wow,

The effectiveness of the pgsql mailinglists never ceases to amaze me.

Default sort mem it was, I guess I'd simply been to cautious with this 
per-client setting.




Stephan & Tom : thanks!




-- 
Best,




Frank.


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] *very* inefficient choice made by the planner (regarding IN(...))

2004-06-10 Thread Tom Lane
Frank van Vugt <[EMAIL PROTECTED]> writes:
> What could I do to make it easier to choose a better plan?

Increase sort_mem.  You want it to pick a "hashed subplan", but
it's not doing so because 64000 rows won't fit in the default
sort_mem.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] *very* inefficient choice made by the planner (regarding

2004-06-10 Thread Stephan Szabo

On Thu, 10 Jun 2004, Frank van Vugt wrote:

> Could anybody explain why the planner is doing what it is doing?
>
> What could I do to make it easier to choose a better plan?

You might try raising sort_mem to see if it chooses a better plan.  I
think it may be guessing that the hash won't fit and falling back to the
plan you were getting.


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] *very* inefficient choice made by the planner (regarding IN(...))

2004-06-10 Thread Frank van Vugt
L.S.

Could anybody explain why the planner is doing what it is doing?

What could I do to make it easier to choose a better plan?



*
Summary
*
On a freshly vacuum/analysed pair of tables with 7389 and 64333 records, this:

select id from location where id not in (select location_id from 
location_carrier);

takes 581546,497 ms


While a variant like:

select id from location where not exists (select 1 from location_carrier where 
location_id = location.id);

takes only 124,625 ms


*
Details
*
=# select version();
   version
-
 PostgreSQL 7.4.2 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66
(1 row)


=# \d location
   Table "public.location"
   Column   |Type | Modifiers
+-+---
 id | integer | not null
Indexes:
"location_pkey" primary key, btree (id)


=# select count(*) from location;
 count
---
  7389
(1 row)


=# \d location_carrier
Table "public.location_carrier"
   Column|Type | Modifiers
-+-+---
 location_id | integer | not null
 carrier_id  | integer | not null
Indexes:
"location_carrier_pkey" primary key, btree (location_id, carrier_id)


=# select count(*) from location_carrier;
 count
---
 64333
(1 row)


=# explain select id from location where id not in (select location_id from 
location_carrier);
  QUERY PLAN
---
 Seq Scan on "location"  (cost=0.00..5077093.72 rows=3695 width=4)
   Filter: (NOT (subplan))
   SubPlan
 ->  Seq Scan on location_carrier  (cost=0.00..1213.33 rows=64333 width=4)
(4 rows)


=# explain analyse select id from location where id not in (select location_id 
from location_carrier);
   QUERY PLAN
-
 Seq Scan on "location"  (cost=0.00..5077093.72 rows=3695 width=4) (actual 
time=248.310..581541.483 rows=240 loops=1)
   Filter: (NOT (subplan))
   SubPlan
 ->  Seq Scan on location_carrier  (cost=0.00..1213.33 rows=64333 width=4) 
(actual time=0.007..48.517 rows=19364 loops=7389)
 Total runtime: 581542.560 ms
(5 rows)

Time: 581546,497 ms


=# explain analyse select id from location l left outer join location_carrier 
lc on l.id = lc.location_id where lc.location_id is null;
  QUERY 
PLAN
--
 Merge Left Join  (cost=0.00..3022.51 rows=7389 width=4) (actual 
time=0.083..435.841 rows=240 loops=1)
   Merge Cond: ("outer".id = "inner".location_id)
   Filter: ("inner".location_id IS NULL)
   ->  Index Scan using location_pkey on "location" l  (cost=0.00..258.85 
rows=7389 width=4) (actual time=0.041..26.211 rows=7389 loops=1)
   ->  Index Scan using location_carrier_pkey on location_carrier lc  
(cost=0.00..1941.22 rows=64333 width=4) (actual time=0.015..238.305 
rows=64333 loops=1)
 Total runtime: 436.213 ms
(6 rows)

Time: 440,787 ms


megafox=# explain analyse select id from location where not exists (select 1 
from location_carrier where location_id = location.id);
 QUERY 
PLAN
-
 Seq Scan on "location"  (cost=0.00..13242.14 rows=3695 width=4) (actual 
time=0.078..120.785 rows=240 loops=1)
   Filter: (NOT (subplan))
   SubPlan
 ->  Index Scan using location_carrier_pkey on location_carrier  
(cost=0.00..17.61 rows=10 width=0) (actual time=0.011..0.011 rows=1 
loops=7389)
   Index Cond: (location_id = $0)
 Total runtime: 121.165 ms
(6 rows)

Time: 124,625 ms







-- 
Best,




Frank.


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Bulk INSERT performance in 7.4.1

2004-06-10 Thread Florian Weimer
Vivek Khera wrote:

> If you've got the time, could you try also doing the full bulk insert
> test with the checkpoint log files on another physical disk?  See if
> that's any faster.

We have been doing that for a few weeks, but the performance
improvements are less than what we expected.  There is hardly any disk
activity on the log RAID, even during checkpointing.

After I activated the tuned configuration, we are again mostly CPU-bound
(it seems that updating all four indices is quite expensive).  The
bulk INSERT process runs single-threaded right now, and if we switched
to multiple processes for that, we could reach some 1,500 INSERTs per
second, I believe.  This is more than sufficient for us; our real-time
data collector is tuned to emit about 150 records per second, on the
average.  (There is an on-disk queue to compensate temporary problems,
such as spikes in the data rate and database updates gone awry.)

-- 
Current mail filters: many dial-up/DSL/cable modem hosts, and the
following domains: atlas.cz, bigpond.com, freenet.de, hotmail.com,
libero.it, netscape.net, postino.it, tiscali.co.uk, tiscali.cz,
tiscali.it, voila.fr, wanadoo.fr, yahoo.com.

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[PERFORM] Feature request: smarter use of conditional indexes

2004-06-10 Thread John Siracusa
Given an index like this:

CREATE UNIQUE INDEX i1 ON t1 (c1) WHERE c1 IS NOT NULL;

and a query like this:

SELECT * FROM t1 WHERE c1 = 123;

I'd like the planner to be smart enough to use an index scan using i1.  Yes,
I can change the query to this:

SELECT * FROM t1 WHERE c1 = 123 AND c1 IS NOT NULL;

In which case the index will be used, but I shouldn't have to.  More
practically, since a lot of my SQL is auto-generated, it's difficult to make
this query change just in the cases where I need it.  And I'm loathe to
change every "column = value" pair in my auto-generated SQL into a double
pair of "(column = value and column is not null)"  It's redundant and looks
pretty silly, IMO.

Thanks for you consideration :)

-John


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly