Re: [PERFORM] Shared buffers, Sort memory, Effective Cache Size
On Wed, 21 Apr 2004 10:01:30 -0700, Qing Zhao [EMAIL PROTECTED] wrote:

> I have recently configured my PG 7.3 on a G5 (8GB RAM) with shmmax set
> to 512MB and shared_buffer=5, sort_mem=4096 and effective cache size = 1.
> It seems to be working great so far, but I am wondering if I should make
> effective_cache_size larger myself.

Yes, much larger! And while you are at it, make shared_buffers smaller.

Servus
 Manfred

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
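To make the direction of this advice concrete: in 7.3, shared_buffers and effective_cache_size are both measured in 8KB disk pages. A sketch of settings for an 8GB machine of that era follows; the specific numbers are illustrative assumptions, not values from this thread:

```
# postgresql.conf -- illustrative values only
shared_buffers = 10000          # 8KB pages; ~80MB, kept modest on purpose
sort_mem = 4096                 # KB, per sort operation
effective_cache_size = 655360   # 8KB pages; ~5GB of expected OS disk cache
```

The idea is to let the operating system's disk cache do most of the caching and tell the planner about it via effective_cache_size, rather than giving PostgreSQL itself a huge shared_buffers pool.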
Re: [PERFORM] planner/optimizer question
Hi,

You should try the following queries:

SELECT support_person_id FROM ticket_crm_map WHERE crm_id = 7 GROUP BY support_person_id;
SELECT support_person_id FROM ticket_crm_map WHERE crm_id = 1 GROUP BY support_person_id;

They can use the 'ticket_crm_map_crm_id_suppid' index. In general, Postgres can use a k-column index if the columns in your conditions form a prefix of the index columns. For example:

CREATE INDEX test_idx ON test (col1, col2, col3, col4);

SELECT * FROM test WHERE col1 = 3 AND col2 = 13;  -- This can use the index.

But the following queries cannot use the index:

SELECT * FROM test WHERE col1 = 3 AND col3 = 13;
SELECT * FROM test WHERE col2 = 3;

If you have a problem with seq scans or sorts, you can disable them globally or locally:

SET enable_seqscan = 0;
SET enable_sort = 0;

Regards,
Antal Attila

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
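To illustrate the prefix rule above: one way around the limitation, sketched here (the index name is made up for illustration), is to add another index whose leading columns match the query's conditions:

```sql
-- Hypothetical extra index so that the (col1, col3) conditions form a prefix:
CREATE INDEX test_idx_col1_col3 ON test (col1, col3);

-- This query can now be served by test_idx_col1_col3:
SELECT * FROM test WHERE col1 = 3 AND col3 = 13;
```

The trade-off is extra write and maintenance cost for each additional index, so this only pays off for conditions you actually query frequently.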
[PERFORM] Join problem
Hi,

I came across a very intriguing thing: I had to join two tables, and in both tables I wanted to restrict the result set by some (text/varchar) attributes. Here is an example:

Table item (147,000 entries):

   Column    |  Type   | Modifiers
-------------+---------+-----------
 id          | integer | not null
 description | text    |
 comment     | text    | not null
 order_id    | integer |

Table orders (210,000 entries):

   Column   |          Type          | Modifiers
------------+------------------------+-----------
 order_id   | integer                |
 order_name | character varying(255) |

First I tried the following query, which took ages:

(Query 1)
EXPLAIN ANALYZE
SELECT item.id
FROM item, orders
WHERE orders.order_name ~* 'Smith'
  AND item.description ~* 'CD'
  AND orders.order_id = item.order_id;

I found out that changing the operator from '~*' to '=' for item.description brought a great boost in performance (425 secs down to 1 sec!), but not in estimated cost (query plans at the end).

(Query 2)
EXPLAIN ANALYZE
SELECT item.id
FROM item, orders
WHERE orders.order_name ~* 'Smith'
  AND item.description = 'CD'
  AND orders.order_id = item.order_id;

The main difference was that Query 2 used a Hash Join instead of a Nested Loop, so I disabled nested loops ('enable_nestloop') and got a time for Query 1 similar to Query 2's. Can anyone tell me why in one case the Hash Join and in the other the much worse Nested Loop is preferred? And my second question: is there any way to execute the first query without disabling nested loops first, but still get the good performance of the Hash Join?
Many thanks in advance for your help or suggestions,
Silke

QUERY PLANS:

Query 1:

Nested Loop  (cost=0.00..28836.75 rows=1 width=4) (actual time=65350.780..452130.702 rows=6 loops=1)
  Join Filter: ("inner".order_id = "outer".order_id)
  ->  Seq Scan on item  (cost=0.00..28814.24 rows=1 width=8) (actual time=33.180..1365.190 rows=716 loops=1)
        Filter: (description ~* 'CD'::text)
  ->  Seq Scan on orders  (cost=0.00..22.50 rows=1 width=4) (actual time=21.644..629.500 rows=18 loops=716)
        Filter: ((order_name)::text ~* 'Smith'::text)
Total runtime: 452130.782 ms

Query 2:

Hash Join  (cost=22.50..28840.44 rows=4 width=4) (actual time=1187.798..1187.798 rows=0 loops=1)
  Hash Cond: ("outer".order_id = "inner".order_id)
  ->  Seq Scan on item  (cost=0.00..28814.24 rows=733 width=8) (actual time=542.737..542.737 rows=0 loops=1)
        Filter: (description = 'CD'::text)
  ->  Hash  (cost=22.50..22.50 rows=1 width=4) (actual time=645.042..645.042 rows=0 loops=1)
        ->  Seq Scan on orders  (cost=0.00..22.50 rows=1 width=4) (actual time=22.373..644.996 rows=18 loops=1)
              Filter: ((order_name)::text ~* 'Smith'::text)
Total runtime: 1187.865 ms

Query 1 with 'set enable_nestloop to false':

Hash Join  (cost=22.50..28836.75 rows=1 width=4) (actual time=1068.593..2003.330 rows=6 loops=1)
  Hash Cond: ("outer".item_id = "inner".item_id)
  ->  Seq Scan on item  (cost=0.00..28814.24 rows=1 width=8) (actual time=33.347..1357.073 rows=716 loops=1)
        Filter: (description ~* 'CD'::text)
  ->  Hash  (cost=22.50..22.50 rows=1 width=4) (actual time=645.287..645.287 rows=0 loops=1)
        ->  Seq Scan on orders  (cost=0.00..22.50 rows=1 width=4) (actual time=22.212..645.239 rows=18 loops=1)
              Filter: ((order_name)::text ~* 'CD'::text)
Total runtime: 2003.409 ms

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
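On Silke's second question, one option (a sketch, not from the thread) is to scope the planner override to a single transaction with SET LOCAL, available since PostgreSQL 7.3, so nested loops stay enabled for everything else:

```sql
BEGIN;
SET LOCAL enable_nestloop = off;  -- reverts automatically at COMMIT/ROLLBACK

SELECT item.id
FROM item, orders
WHERE orders.order_name ~* 'Smith'
  AND item.description ~* 'CD'
  AND orders.order_id = item.order_id;

COMMIT;
```

This avoids leaving the setting disabled session-wide, where it could hurt other queries for which a nested loop really is the best plan.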
Re: [PERFORM] Weird context-switching issue on Xeon patch for 7.4.1
Dave,

> Are you testing this with Tom's code? You need to do a baseline
> measurement with 10 and then increase it. You will still get lots of
> cs, but it will be less.

No, that was just a test of 1000 straight up. Tom outlined a method, but I didn't see any code that would help me find a better level, other than just trying each +100 increase one at a time. This would take days of testing ...

-- Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: Have you searched our list archives?
http://archives.postgresql.org
Re: [PERFORM] OT: Help with performance problems
On Mon, 26 Apr 2004, Rob Fielding wrote:

> scott.marlowe wrote:
> > On Fri, 23 Apr 2004, Chris Hoover wrote:
> > > DB's on Powervaults 220S using raid 5 (over 6 disks)
> > What controller is this, the Adaptec? We've found it to be slower than
> > the LSI MegaRAID-based controller, but YMMV.
>
> Wow, really? Have you got any more details of the chipset, mobo and
> kernel driver?

We're running on a Dell 2650; the controller is the U320 LSI MegaRAID 2-channel (they only make the one that I know of right now). I don't know my mobo chipset offhand, but I might be able to find out what Dell includes on the 2650. The kernel driver is the latest megaraid2 driver as of about February this year.

> I've been taken to my wits' end wrestling with an LSI MegaRAID 320-1
> controller on a Supermicro board all weekend. I just couldn't get
> anything more than 10MB/sec out of it with megaraid driver v1 OR v2 in
> Linux 2.4.26, nor with the version in 2.6.6-rc2. After 2 days of humming
> the Adaptec mantra I gave in and switched the array straight onto the
> onboard Adaptec 160 controller (same cable and everything). Software
> RAID 5 gets me over 40MB/sec for a nominal CPU hit -- more than 4 times
> what I could get out of the MegaRAID controller :( Even the 2nd SCSI-2
> channel gets 40MB/sec max (pg_xlog :) And HOW LONG does it take to
> detect drives during POST? Never mind ... I really just wanna rant :)

There should be a free counseling service for enraged sysops. I wonder if your controller is broken or something? Or maybe it's on a PCI slot that has to share IRQs or something. I've had great luck with SuperMicro mobos in the past (we're talking dual PPro 200 mobos, so seriously, IN THE PAST here...). Hell, my dual PPro 200 with an old MegaRAID 428 got 18 megs a second transfer rate no problem. Have you tried that LSI card in another machine/mobo combo? Can you disable the onboard Adaptec? We have on our Dell 2650s; the only active controllers are the onboard IDE and the add-in LSI-320-2 controller.
We're running ours with 128MB cache (I think it could be 64) set to write-back. I think our throughput on a RAID-1 pair was somewhere around 40+ megs a second for reads with bonnie++. With RAID-5 it was not really much faster at reads (something like 60 megs a second), but it was much more scalable under heavy parallel read/write access for PostgreSQL. Have you updated the BIOS on the mobo to see if that helps? I'm just throwing darts at the wall here.

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Weird context-switching issue on Xeon patch for 7.4.1
Josh,

I think you can safely increase by orders of magnitude here instead of by +100. My wild-ass guess is that the sweet spot is where the spin time is approximately the time it takes to consume the resource. So if you have a really fast machine, the spin count should be higher. Also, you have to take your memory bus speed into consideration: with the pause instruction inserted in the loop, the timing is now dependent on memory speed. But... you need a baseline first.

Dave

On Tue, 2004-04-27 at 14:05, Josh Berkus wrote:
> Dave,
>
> > Are you testing this with Tom's code? You need to do a baseline
> > measurement with 10 and then increase it. You will still get lots of
> > cs, but it will be less.
>
> No, that was just a test of 1000 straight up. Tom outlined a method, but
> I didn't see any code that would help me find a better level, other than
> just trying each +100 increase one at a time. This would take days of
> testing ...

-- Dave Cramer
519 939 0336
ICQ # 14675561

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Weird context-switching issue on Xeon patch for 7.4.1
Dave,

> But... you need a baseline first.

A baseline on CS? I have that --

-- Josh Berkus
Aglio Database Solutions
San Francisco
[PERFORM] Simple join in PostgreSQL takes too long
Hello pgsql-performance,

I discussed the whole subject for some time on DevShed and didn't achieve much (in terms of results). I wonder if any of you guys can help out: http://forums.devshed.com/t136202/s.html

Regards,
Vitaly Belman

ICQ: 1912453
AIM: VitalyB1984
MSN: [EMAIL PROTECTED]
Yahoo!: VitalyBe
Re: [PERFORM] Simple join in PostgreSQL takes too long
Vitaly Belman wrote:
> Hello pgsql-performance,
>
> I discussed the whole subject for some time on DevShed and didn't
> achieve much (in terms of results). I wonder if any of you guys can
> help out: http://forums.devshed.com/t136202/s.html

So, cutting and pasting:

SCHEMA:

CREATE TABLE bv_bookgenres (
    book_id  INT NOT NULL,
    genre_id INT NOT NULL
);

CREATE TABLE bv_genre_children (
    genre_id       INT,
    genre_child_id INT
);

QUERY:

SELECT DISTINCT book_id
FROM bookgenres, genre_children
WHERE bookgenres.genre_id = genre_children.genre_child_id
  AND genre_children.genre_id = 1
LIMIT 10;

EXPLAIN ANALYZE:

Limit  (cost=6503.51..6503.70 rows=10 width=4) (actual time=703.000..703.000 rows=10 loops=1)
  ->  Unique  (cost=6503.51..6738.20 rows=12210 width=4) (actual time=703.000..703.000 rows=10 loops=1)
        ->  Sort  (cost=6503.51..6620.85 rows=46937 width=4) (actual time=703.000..703.000 rows=24 loops=1)
              Sort Key: bv_bookgenres.book_id
              ->  Merge Join  (cost=582.45..2861.57 rows=46937 width=4) (actual time=46.000..501.000 rows=45082 loops=1)
                    Merge Cond: ("outer".genre_id = "inner".genre_child_id)
                    ->  Index Scan using genre_id on bv_bookgenres  (cost=0.00..1462.84 rows=45082 width=8) (actual time=0.000..158.000 rows=45082 loops=1)
                    ->  Sort  (cost=582.45..598.09 rows=6256 width=2) (actual time=46.000..77.000 rows=49815 loops=1)
                          Sort Key: bv_genre_children.genre_child_id
                          ->  Index Scan using genre_id2 on bv_genre_children  (cost=0.00..187.98 rows=6256 width=2) (actual time=0.000..31.000 rows=6379 loops=1)
                                Index Cond: (genre_id = 1)
Total runtime: 703.000 ms

CONF SETTINGS:

shared_buffers = 1000   # min 16, at least max_connections*2, 8KB each
sort_mem = 1
#work_mem = 1024                # min 64, size in KB
#maintenance_work_mem = 16384   # min 1024, size in KB
#max_stack_depth = 2048         # min 100, size in KB

Have you VACUUM ANALYZEd recently? If not, do that, then rerun the EXPLAIN ANALYZE. You might want to bump shared_buffers. You have 512MB RAM, right?
You probably want to bump shared_buffers to 1, restart PG, then run a VACUUM ANALYZE. Then rerun the EXPLAIN ANALYZE. If that doesn't help, try doing a:

ALTER TABLE bv_genre_children ALTER COLUMN genre_child_id SET STATISTICS 100;

followed by a:

VACUUM ANALYZE bv_genre_children;

You might also want to tweak the effective_cache_size parameter in postgresql.conf, but I am unsure how this would work on Windows. Does Windows have a kernel disk cache, anyone?

HTH,
Nick
Re: [PERFORM] Simple join in PostgreSQL takes too long
Hi,

You can try some variations:

SELECT book_id
FROM bookgenres, genre_children
WHERE bookgenres.genre_id = genre_children.genre_child_id
  AND genre_children.genre_id = 1
GROUP BY book_id
LIMIT 10;

The next one works if 'genre_child_id' is UNIQUE in the 'genre_children' table:

SELECT book_id
FROM bookgenres
WHERE bookgenres.genre_id = (SELECT genre_child_id FROM genre_children WHERE genre_id = 1)
GROUP BY book_id
LIMIT 10;

You may need some indexes. Try these with EXPLAIN!

CREATE INDEX bookgenres_genre_id_book_id ON bookgenres (genre_id, book_id);
-- or
CREATE INDEX bookgenres_book_id_genre_id ON bookgenres (book_id, genre_id);

CREATE INDEX genre_children_genre_id ON genre_children (genre_id);

Regards,
Antal Attila

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Simple join in PostgreSQL takes too long
On Tue, 2004-04-27 at 17:27, Vitaly Belman wrote:
> Hello pgsql-performance,
>
> I discussed the whole subject for some time on DevShed and didn't
> achieve much (in terms of results). I wonder if any of you guys can
> help out: http://forums.devshed.com/t136202/s.html

You're taking the wrong approach. Rather than using a SELECT DISTINCT query to ensure that book_id is distinct, add a constraint to the table so that it is guaranteed:

CREATE UNIQUE INDEX bv_bookgenres_unq ON bv_bookgenres (book_id, genre_id);

Now you can do a simple join (drop the DISTINCT keyword) and achieve the same results. The point is that a book cannot be of a certain genre more than once. Without the DISTINCT, this should take a matter of a few milliseconds to execute.
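With that unique index in place, a sketch of the simplified query using the table names from the schema posted earlier in the thread (assuming, as the poster implies, that the DISTINCT was only compensating for duplicate (book_id, genre_id) rows):

```sql
-- The join without DISTINCT; aliases are illustrative.
SELECT b.book_id
FROM bv_bookgenres b
JOIN bv_genre_children c ON b.genre_id = c.genre_child_id
WHERE c.genre_id = 1
LIMIT 10;
```

Note that this skips the Sort/Unique steps that dominated the original plan, which is where the claimed speedup would come from.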
Re: [PERFORM] planner/optimizer question
[EMAIL PROTECTED] writes:
> ... Wouldn't the most efficient plan be to scan the index regardless of
> crm_id because the only columns needed are in the index?

No. People coming from other databases often have the misconception that queries can be answered by looking only at an index. That is never true in Postgres, because row validity info is only stored in the table; so we must always visit the table entry to make sure the row is still valid/visible for the current query. Accordingly, columns added to the index that aren't constrained by the WHERE clause are not very useful ...

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [JDBC] [PERFORM] is a good practice to create an index on the
AFAIK, oids aren't used for anything internally, so duplicates don't really matter. Besides, what would you do about duplicate oids? The best suggestion is of course his last: don't use them.

On Mon, 2004-04-26 at 22:48, Christopher Kings-Lynne wrote:

> > I am using the oid of the table as the main key, and I've found that
> > it is not indexed (maybe because I have declared another primary key
> > in the table). Is it good practice to create an index like this on the
> > oid of a table?
> >
> > CREATE INDEX idoid ON annuncio400 USING btree (oid);
>
> Yes it is -- in fact you really should add a unique index, not just a
> normal index, as you want to enforce uniqueness of the oid column. It
> is theoretically possible to end up with duplicate oids in wraparound
> situations. Even better though is to not use oids at all, of course...
>
> Chris

-- Dave Cramer
519 939 0336
ICQ # 14675561
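For completeness, the unique variant Chris recommends would look like this (a sketch; the index name is made up for illustration):

```sql
-- Enforce uniqueness of the oid column, guarding against wraparound duplicates:
CREATE UNIQUE INDEX annuncio400_oid_key ON annuncio400 USING btree (oid);
```

If a duplicate oid ever does get assigned, the insert will then fail outright instead of silently corrupting the application's assumption that oids are unique keys.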
Re: [JDBC] [PERFORM] is a good practice to create an index on the
> AFAIK, oids aren't used for anything internally, so duplicates don't
> really matter. Besides, what would you do about duplicate oids?

If he's using them _externally_, then he does have to worry about duplicates.

Chris
Re: [JDBC] [PERFORM] is a good practice to create an index on the
> I am going to use them as primary key of the table, so I'll surely need
> them unique :)

Eduoardo, I REALLY suggest you don't use them at all. You should make a primary key like this:

CREATE TABLE blah (
    id SERIAL PRIMARY KEY,
    ...
);

Also note that by default, OIDs are NOT dumped by pg_dump. You will need to add extra switches to your pg_dump backup to ensure that they are.

Chris