Re: [PERFORM] *very* inefficient choice made by the planner (regarding
On Thu, 10 Jun 2004, Stephan Szabo wrote: > > On Thu, 10 Jun 2004, Jean-Luc Lachance wrote: > > > I agree, but it should be a simple rewrite. No? > > It's NULLs inside the subselect that are the issue. > > select 1 in (select a from foo) > select exists ( select 1 from foo where a=1) > > If foo.a contains a row with NULL but no rows containing a 1, the above > give different results (unknown and exists) and IIRC, exists cannot Erm that exists should have been false ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] *very* inefficient choice made by the planner (regarding
On Thu, 10 Jun 2004, Jean-Luc Lachance wrote: > I agree, but it should be a simple rewrite. No? It's NULLs inside the subselect that are the issue. select 1 in (select a from foo) select exists ( select 1 from foo where a=1) If foo.a contains a row with NULL but no rows containing a 1, the above give different results (unknown and exists) and IIRC, exists cannot return unknown, so there's no simple rewrite of the subselect that gives equivalent behavior. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] *very* inefficient choice made by the planner (regarding
I agree, but it should be a simple rewrite. No? x IS NULL/IS NOT NULL AND/OR NOT EXISTS Tom Lane wrote: Jean-Luc Lachance <[EMAIL PROTECTED]> writes: If the two statments are functionally equivalent, why can't PG rewrite the "NOT IN" version into the more efficient "NOT EXISTS"? They're not equivalent. In particular, the behavior in the presence of NULLs is quite different. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Database Server Tuning
On Jun 10, 2004, at 12:02 PM, Anjan Dave wrote: Vivek, Was there anything specific that helped you decide on a RAID-5 and not a RAID-10? performance testing on restore times. My DB is more than 50% write, so I needed to optimize for writes. I have my DBs on RAID10, and would soon be moving them on FC drives, and i am considering RAID-10. If I had to do it over again, I'd most likely go with RAID-50, and take the hit on restore time for the advantage on reads. I have to dig through my records again to see the details... but then I have to do all that for my OSCON presentation on this topic at the end of July in Portland, OR. ;-) smime.p7s Description: S/MIME cryptographic signature
Re: [PERFORM] *very* inefficient choice made by the planner (regarding
Jean-Luc Lachance <[EMAIL PROTECTED]> writes: > If the two statments are functionally equivalent, why can't PG rewrite > the "NOT IN" version into the more efficient "NOT EXISTS"? They're not equivalent. In particular, the behavior in the presence of NULLs is quite different. regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Database Server Tuning
Vivek, Was there anything specific that helped you decide on a RAID-5 and not a RAID-10? I have my DBs on RAID10, and would soon be moving them on FC drives, and i am considering RAID-10. Thanks, Anjan -Original Message- From: Josh Berkus [mailto:[EMAIL PROTECTED] Sent: Tue 3/2/2004 4:27 PM To: Vivek Khera; [EMAIL PROTECTED] Cc: Subject: Re: [PERFORM] Database Server Tuning Vivek, > I did a bunch of testing with different RAID levels on a 14 disk > array. I finally settled on this: RAID5 across 14 disks for the > data, the OS (including syslog directory) and WAL on a RAID1 pair on > the other channel of the same controller (I didn't want to spring for > dual RAID controllers). The biggest bumps in performance came from > increasing the checkpoint_buffers since my DB is heavily written to, > and increasing sort_mem. With large RAID, have you found that having WAL on a seperate array actually boosts performance? The empirical tests we've seen so far don't seem to support this. -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] *very* inefficient choice made by the planner (regarding
The real question is: If the two statments are functionally equivalent, why can't PG rewrite the "NOT IN" version into the more efficient "NOT EXISTS"? Frank van Vugt wrote: L.S. Could anybody explain why the planner is doing what it is doing? What could I do to make it easier to choose a better plan? * Summary * On a freshly vacuum/analysed pair of tables with 7389 and 64333 records, this: select id from location where id not in (select location_id from location_carrier); takes 581546,497 ms While a variant like: select id from location where not exists (select 1 from location_carrier where location_id = location.id); takes only 124,625 ms * Details * =# select version(); version - PostgreSQL 7.4.2 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66 (1 row) =# \d location Table "public.location" Column |Type | Modifiers +-+--- id | integer | not null Indexes: "location_pkey" primary key, btree (id) =# select count(*) from location; count --- 7389 (1 row) =# \d location_carrier Table "public.location_carrier" Column|Type | Modifiers -+-+--- location_id | integer | not null carrier_id | integer | not null Indexes: "location_carrier_pkey" primary key, btree (location_id, carrier_id) =# select count(*) from location_carrier; count --- 64333 (1 row) =# explain select id from location where id not in (select location_id from location_carrier); QUERY PLAN --- Seq Scan on "location" (cost=0.00..5077093.72 rows=3695 width=4) Filter: (NOT (subplan)) SubPlan -> Seq Scan on location_carrier (cost=0.00..1213.33 rows=64333 width=4) (4 rows) =# explain analyse select id from location where id not in (select location_id from location_carrier); QUERY PLAN - Seq Scan on "location" (cost=0.00..5077093.72 rows=3695 width=4) (actual time=248.310..581541.483 rows=240 loops=1) Filter: (NOT (subplan)) SubPlan -> Seq Scan on location_carrier (cost=0.00..1213.33 rows=64333 width=4) (actual time=0.007..48.517 rows=19364 loops=7389) Total runtime: 581542.560 ms (5 rows) Time: 581546,497 ms =# explain analyse select id from location l left outer join location_carrier lc on l.id = lc.location_id where lc.location_id is null; QUERY PLAN -- Merge Left Join (cost=0.00..3022.51 rows=7389 width=4) (actual time=0.083..435.841 rows=240 loops=1) Merge Cond: ("outer".id = "inner".location_id) Filter: ("inner".location_id IS NULL) -> Index Scan using location_pkey on "location" l (cost=0.00..258.85 rows=7389 width=4) (actual time=0.041..26.211 rows=7389 loops=1) -> Index Scan using location_carrier_pkey on location_carrier lc (cost=0.00..1941.22 rows=64333 width=4) (actual time=0.015..238.305 rows=64333 loops=1) Total runtime: 436.213 ms (6 rows) Time: 440,787 ms megafox=# explain analyse select id from location where not exists (select 1 from location_carrier where location_id = location.id); QUERY PLAN - Seq Scan on "location" (cost=0.00..13242.14 rows=3695 width=4) (actual time=0.078..120.785 rows=240 loops=1) Filter: (NOT (subplan)) SubPlan -> Index Scan using location_carrier_pkey on location_carrier (cost=0.00..17.61 rows=10 width=0) (actual time=0.011..0.011 rows=1 loops=7389) Index Cond: (location_id = $0) Total runtime: 121.165 ms (6 rows) Time: 124,625 ms ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] *very* inefficient choice made by the planner (regarding IN(...))
Wow, The effectiveness of the pgsql mailinglists never ceases to amaze me. Default sort mem it was, I guess I'd simply been to cautious with this per-client setting. Stephan & Tom : thanks! -- Best, Frank. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] *very* inefficient choice made by the planner (regarding IN(...))
Frank van Vugt <[EMAIL PROTECTED]> writes: > What could I do to make it easier to choose a better plan? Increase sort_mem. You want it to pick a "hashed subplan", but it's not doing so because 64000 rows won't fit in the default sort_mem. regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] *very* inefficient choice made by the planner (regarding
On Thu, 10 Jun 2004, Frank van Vugt wrote: > Could anybody explain why the planner is doing what it is doing? > > What could I do to make it easier to choose a better plan? You might try raising sort_mem to see if it chooses a better plan. I think it may be guessing that the hash won't fit and falling back to the plan you were getting. ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PERFORM] *very* inefficient choice made by the planner (regarding IN(...))
L.S. Could anybody explain why the planner is doing what it is doing? What could I do to make it easier to choose a better plan? * Summary * On a freshly vacuum/analysed pair of tables with 7389 and 64333 records, this: select id from location where id not in (select location_id from location_carrier); takes 581546,497 ms While a variant like: select id from location where not exists (select 1 from location_carrier where location_id = location.id); takes only 124,625 ms * Details * =# select version(); version - PostgreSQL 7.4.2 on i686-pc-linux-gnu, compiled by GCC egcs-2.91.66 (1 row) =# \d location Table "public.location" Column |Type | Modifiers +-+--- id | integer | not null Indexes: "location_pkey" primary key, btree (id) =# select count(*) from location; count --- 7389 (1 row) =# \d location_carrier Table "public.location_carrier" Column|Type | Modifiers -+-+--- location_id | integer | not null carrier_id | integer | not null Indexes: "location_carrier_pkey" primary key, btree (location_id, carrier_id) =# select count(*) from location_carrier; count --- 64333 (1 row) =# explain select id from location where id not in (select location_id from location_carrier); QUERY PLAN --- Seq Scan on "location" (cost=0.00..5077093.72 rows=3695 width=4) Filter: (NOT (subplan)) SubPlan -> Seq Scan on location_carrier (cost=0.00..1213.33 rows=64333 width=4) (4 rows) =# explain analyse select id from location where id not in (select location_id from location_carrier); QUERY PLAN - Seq Scan on "location" (cost=0.00..5077093.72 rows=3695 width=4) (actual time=248.310..581541.483 rows=240 loops=1) Filter: (NOT (subplan)) SubPlan -> Seq Scan on location_carrier (cost=0.00..1213.33 rows=64333 width=4) (actual time=0.007..48.517 rows=19364 loops=7389) Total runtime: 581542.560 ms (5 rows) Time: 581546,497 ms =# explain analyse select id from location l left outer join location_carrier lc on l.id = lc.location_id where lc.location_id is null; QUERY PLAN -- Merge Left Join (cost=0.00..3022.51 rows=7389 width=4) (actual time=0.083..435.841 rows=240 loops=1) Merge Cond: ("outer".id = "inner".location_id) Filter: ("inner".location_id IS NULL) -> Index Scan using location_pkey on "location" l (cost=0.00..258.85 rows=7389 width=4) (actual time=0.041..26.211 rows=7389 loops=1) -> Index Scan using location_carrier_pkey on location_carrier lc (cost=0.00..1941.22 rows=64333 width=4) (actual time=0.015..238.305 rows=64333 loops=1) Total runtime: 436.213 ms (6 rows) Time: 440,787 ms megafox=# explain analyse select id from location where not exists (select 1 from location_carrier where location_id = location.id); QUERY PLAN - Seq Scan on "location" (cost=0.00..13242.14 rows=3695 width=4) (actual time=0.078..120.785 rows=240 loops=1) Filter: (NOT (subplan)) SubPlan -> Index Scan using location_carrier_pkey on location_carrier (cost=0.00..17.61 rows=10 width=0) (actual time=0.011..0.011 rows=1 loops=7389) Index Cond: (location_id = $0) Total runtime: 121.165 ms (6 rows) Time: 124,625 ms -- Best, Frank. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Bulk INSERT performance in 7.4.1
Vivek Khera wrote: > If you've got the time, could you try also doing the full bulk insert > test with the checkpoint log files on another physical disk? See if > that's any faster. We have been doing that for a few weeks, but the performance improvements are less than what we expected. There is hardly any disk activity on the log RAID, even during checkpointing. After I activated the tuned configuration, we are again mostly CPU-bound (it seems that updating all four indices is quite expensive). The bulk INSERT process runs single-threaded right now, and if we switched to multiple processes for that, we could reach some 1,500 INSERTs per second, I believe. This is more than sufficient for us; our real-time data collector is tuned to emit about 150 records per second, on the average. (There is an on-disk queue to compensate temporary problems, such as spikes in the data rate and database updates gone awry.) -- Current mail filters: many dial-up/DSL/cable modem hosts, and the following domains: atlas.cz, bigpond.com, freenet.de, hotmail.com, libero.it, netscape.net, postino.it, tiscali.co.uk, tiscali.cz, tiscali.it, voila.fr, wanadoo.fr, yahoo.com. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[PERFORM] Feature request: smarter use of conditional indexes
Given an index like this: CREATE UNIQUE INDEX i1 ON t1 (c1) WHERE c1 IS NOT NULL; and a query like this: SELECT * FROM t1 WHERE c1 = 123; I'd like the planner to be smart enough to use an index scan using i1. Yes, I can change the query to this: SELECT * FROM t1 WHERE c1 = 123 AND c1 IS NOT NULL; In which case the index will be used, but I shouldn't have to. More practically, since a lot of my SQL is auto-generated, it's difficult to make this query change just in the cases where I need it. And I'm loathe to change every "column = value" pair in my auto-generated SQL into a double pair of "(column = value and column is not null)" It's redundant and looks pretty silly, IMO. Thanks for you consideration :) -John ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly