Re: [PERFORM] Simply join in PostrgeSQL takes too long
On Thu, 29 Apr 2004 13:36:47 -0400, Rod Taylor [EMAIL PROTECTED] wrote: The reason for the function is that the sort routines (hash aggregation included) will not stop in mid-sort Good point. Servus Manfred ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Simply join in PostrgeSQL takes too long
Vitaly, I am in the middle of going through them all, till now disabling the enable_mergejoin really helped. In that case, your random_page_cost is probably too low. Check the ratio of per-tuple times on index vs. seqscan seeks. -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Simply join in PostrgeSQL takes too long
On Tue, 27 Apr 2004 18:01:34 -0400, Rod Taylor [EMAIL PROTECTED] wrote: On Tue, 2004-04-27 at 17:27, Vitaly Belman wrote: Hello pgsql-performance, I discussed the whole subject for some time in DevShed and didn't achieve much (as for results). I wonder if any of you guys can help out: http://forums.devshed.com/t136202/s.html The point is that a book cannot be of a certain genre more than once. Rod, he has a hierarchy of genres. Genre 1 has 6379 child genres and a book can be in more than one of these. Vitaly, though LIMIT makes this look like a small query, DISTINCT requires the whole result set to be retrieved. 0.7 seconds doesn't look so bad for several thousand rows. Did you try with other genre_ids? Maybe a merge join is not the best choice. Set enable_mergejoin to false and see whether you get a (hopefully faster) hash join, assuming that sort_mem is large enough to keep the hash table in memory. If you send me your table contents I'll try it on Linux. Servus Manfred ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[PERFORM] Simply join in PostrgeSQL takes too long
Hello pgsql-performance, I discussed the whole subject for some time in DevShed and didn't achieve much (as for results). I wonder if any of you guys can help out: http://forums.devshed.com/t136202/s.html Regards, Vitaly Belman ICQ: 1912453 AIM: VitalyB1984 MSN: [EMAIL PROTECTED] Yahoo!: VitalyBe ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Simply join in PostrgeSQL takes too long
Vitaly Belman wrote: Hello pgsql-performance, I discussed the whole subject for some time in DevShed and didn't achieve much (as for results). I wonder if any of you guys can help out: http://forums.devshed.com/t136202/s.html So cutting and pasting: - SCHEMA - CREATE TABLE bv_bookgenres ( book_id INT NOT NULL, genre_id INT NOT NULL ); CREATE TABLE bv_genre_children ( genre_id INT, genre_child_id INT ); --- - QUERY - select DISTINCT book_id from bookgenres, genre_children WHERE bookgenres.genre_id = genre_children.genre_child_id AND genre_children.genre_id = 1 LIMIT 10 - - EXPLAIN ANALYZE - QUERY PLAN Limit (cost=6503.51..6503.70 rows=10 width=4) (actual time=703.000..703.000 rows=10 loops=1) - Unique (cost=6503.51..6738.20 rows=12210 width=4) (actual time=703.000..703.000 rows=10 loops=1) - Sort (cost=6503.51..6620.85 rows=46937 width=4) (actual time=703.000..703.000 rows=24 loops=1) Sort Key: bv_bookgenres.book_id - Merge Join (cost=582.45..2861.57 rows=46937 width=4) (actual time=46.000..501.000 rows=45082 loops=1) Merge Cond: (outer.genre_id = inner.genre_child_id) - Index Scan using genre_id on bv_bookgenres (cost=0.00..1462.84 rows=45082 width=8) (actual time=0.000..158.000 rows=45082 loops=1) - Sort (cost=582.45..598.09 rows=6256 width=2) (actual time=46.000..77.000 rows=49815 loops=1) Sort Key: bv_genre_children.genre_child_id - Index Scan using genre_id2 on bv_genre_children (cost=0.00..187.98 rows=6256 width=2) (actual time=0.000..31.000 rows=6379 loops=1) Index Cond: (genre_id = 1) Total runtime: 703.000 ms --- - CONF SETTINGS - shared_buffers = 1000 # min 16, at least max_connections*2, 8KB each sort_mem = 1 #work_mem = 1024# min 64, size in KB #maintenance_work_mem = 16384 # min 1024, size in KB #max_stack_depth = 2048 # min 100, size in KB - Have you VACUUM ANALYZED recently. If not do that then rerun the EXPLAIN ANALYZE. You might wanna bump shared_buffers. You have 512MB RAM right? You probably want to bump shared_buffers to 1, restart PG then run a VACUUM ANALYZE. Then rerun the EXPLAIN ANALYZE. If that doesnt help try doing a ALTER TABLE bv_genre_children ALTER COLUMN genre_child_id SET STATISTICS 100; followed by a: VACUUM ANALYZE bv_genre_children; You might also want to be tweaking the effective_cache_size parameter in postgresql.conf, but I am unsure how this would work on Windows. Does Windows have a kernel disk cache anyone? HTH Nick ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Simply join in PostrgeSQL takes too long
Hi, You can try some variation: SELECT book_id FROM bookgenres, genre_children WHERE bookgenres.genre_id = genre_children.genre_child_id AND genre_children.genre_id = 1 GROUP BY book_id LIMIT 10 The next works if the 'genre_child_id' is UNIQUE on the 'genre_children' table. SELECT book_id FROM bookgenres WHERE bookgenres.genre_id = (SELECT genre_child_id FROM genre_children WHERE genre_id = 1) GROUP BY book_id LIMIT 10 You may need some index. Try these with EXPLAIN! CREATE INDEX bookgenres_genre_id_book_id ON bookgenres(genre_id, book_id); or CREATE INDEX bookgenres_book_id_genre_id ON bookgenres(book_id, genre_id); CREATE INDEX genre_children_genre_id ON genre_children(genre_id); Regards, Antal Attila ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Simply join in PostrgeSQL takes too long
On Tue, 2004-04-27 at 17:27, Vitaly Belman wrote: Hello pgsql-performance, I discussed the whole subject for some time in DevShed and didn't achieve much (as for results). I wonder if any of you guys can help out: http://forums.devshed.com/t136202/s.html You're taking the wrong approach. Rather than using a select query to ensure that the book_id is distinct, add a constraint to the table so that is guaranteed. CREATE UNIQUE INDEX bv_bookgeneres_unq ON bv_bookgenres(book_id, genre_id); Now you can do a simple join (Drop the DISTINCT keyword) and achieve the same results. The point is that a book cannot be of a certain genre more than once. Without the distinct, this should take a matter of a few milliseconds to execute. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings