Re: [PERFORM] Simple join in PostgreSQL takes too long

2004-04-29 Thread Manfred Koizar
On Thu, 29 Apr 2004 13:36:47 -0400, Rod Taylor [EMAIL PROTECTED] wrote:
The reason for the function is that the sort routines (hash aggregation
included) will not stop in mid-sort

Good point.
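The function approach could look something like this -- a hypothetical sketch using Vitaly's table names, not the code posted earlier in the thread.  It dedupes by hand and exits as soon as it has enough rows, which a DISTINCT sort cannot do:

CREATE OR REPLACE FUNCTION books_by_genre(int, int) RETURNS SETOF int AS '
DECLARE
    r    RECORD;
    seen int[] := ''{}'';
BEGIN
    FOR r IN
        SELECT bg.book_id
        FROM bv_bookgenres bg, bv_genre_children gc
        WHERE bg.genre_id = gc.genre_child_id
          AND gc.genre_id = $1
    LOOP
        -- dedupe by hand instead of sorting the whole result set
        IF NOT (r.book_id = ANY (seen)) THEN
            seen := array_append(seen, r.book_id);
            RETURN NEXT r.book_id;
            -- stop as soon as we have enough rows
            EXIT WHEN array_upper(seen, 1) >= $2;
        END IF;
    END LOOP;
    RETURN;
END;
' LANGUAGE plpgsql;

SELECT * FROM books_by_genre(1, 10);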

Servus
 Manfred

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Simple join in PostgreSQL takes too long

2004-04-29 Thread Josh Berkus
Vitaly,
 
 I am in the middle of going through them all, till now disabling the
 enable_mergejoin really helped.

In that case, your random_page_cost is probably too low.   Check the ratio of 
per-tuple times on index vs. seqscan seeks.
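One way to measure that is to force each scan type in turn and compare milliseconds per row.  This is an illustrative session only -- substitute a table and predicate from your own workload:

SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM bv_bookgenres WHERE genre_id = 1;

SET enable_seqscan = on;
SET enable_indexscan = off;
EXPLAIN ANALYZE SELECT * FROM bv_bookgenres WHERE genre_id = 1;

Compare (actual time / rows) for the two plans; if the measured ratio of index-scan to seqscan cost per tuple is far from what your random_page_cost implies, adjust the setting toward the measured ratio.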

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] Simple join in PostgreSQL takes too long

2004-04-28 Thread Manfred Koizar
On Tue, 27 Apr 2004 18:01:34 -0400, Rod Taylor [EMAIL PROTECTED] wrote:
On Tue, 2004-04-27 at 17:27, Vitaly Belman wrote:
 Hello pgsql-performance,
 
   I discussed the whole subject for some time in DevShed and didn't
   achieve much (as for results). I wonder if any of you guys can help
   out:
 
   http://forums.devshed.com/t136202/s.html

The point is that a book cannot be of a certain genre more than once.

Rod, he has a hierarchy of genres.  Genre 1 has 6379 child genres and a
book can be in more than one of these.

Vitaly, though LIMIT makes this look like a small query, DISTINCT
requires the whole result set to be retrieved.  0.7 seconds doesn't look
so bad for several thousand rows.  Did you try with other genre_ids?

Maybe a merge join is not the best choice.  Set enable_mergejoin to
false and see whether you get a (hopefully faster) hash join, assuming
that sort_mem is large enough to keep the hash table in memory.
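Something along these lines (the sort_mem value is only an example; it must be large enough for the hash table):

SET enable_mergejoin TO off;
SET sort_mem = 10000;  -- in KB

EXPLAIN ANALYZE
SELECT DISTINCT book_id
FROM bv_bookgenres, bv_genre_children
WHERE bv_bookgenres.genre_id = bv_genre_children.genre_child_id
  AND bv_genre_children.genre_id = 1
LIMIT 10;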

If you send me your table contents I'll try it on Linux.

Servus
 Manfred



[PERFORM] Simple join in PostgreSQL takes too long

2004-04-27 Thread Vitaly Belman
Hello pgsql-performance,

  I discussed the whole subject for some time in DevShed and didn't
  achieve much (as for results). I wonder if any of you guys can help
  out:

  http://forums.devshed.com/t136202/s.html

Regards,
 Vitaly Belman
 
 ICQ: 1912453
 AIM: VitalyB1984
 MSN: [EMAIL PROTECTED]
 Yahoo!: VitalyBe




Re: [PERFORM] Simple join in PostgreSQL takes too long

2004-04-27 Thread Nick Barr
Vitaly Belman wrote:
Hello pgsql-performance,
  I discussed the whole subject for some time in DevShed and didn't
  achieve much (as for results). I wonder if any of you guys can help
  out:
  http://forums.devshed.com/t136202/s.html
So cutting and pasting:
- SCHEMA -
CREATE TABLE bv_bookgenres (
book_id INT NOT NULL,
genre_id INT NOT NULL
);
CREATE TABLE bv_genre_children (
genre_id INT,
genre_child_id INT
);
---
- QUERY -
SELECT DISTINCT
  book_id
FROM
  bv_bookgenres,
  bv_genre_children
WHERE
  bv_bookgenres.genre_id = bv_genre_children.genre_child_id AND
  bv_genre_children.genre_id = 1
LIMIT 10
-
- EXPLAIN ANALYZE -
QUERY PLAN
Limit  (cost=6503.51..6503.70 rows=10 width=4) (actual time=703.000..703.000 rows=10 loops=1)
  ->  Unique  (cost=6503.51..6738.20 rows=12210 width=4) (actual time=703.000..703.000 rows=10 loops=1)
        ->  Sort  (cost=6503.51..6620.85 rows=46937 width=4) (actual time=703.000..703.000 rows=24 loops=1)
              Sort Key: bv_bookgenres.book_id
              ->  Merge Join  (cost=582.45..2861.57 rows=46937 width=4) (actual time=46.000..501.000 rows=45082 loops=1)
                    Merge Cond: ("outer".genre_id = "inner".genre_child_id)
                    ->  Index Scan using genre_id on bv_bookgenres  (cost=0.00..1462.84 rows=45082 width=8) (actual time=0.000..158.000 rows=45082 loops=1)
                    ->  Sort  (cost=582.45..598.09 rows=6256 width=2) (actual time=46.000..77.000 rows=49815 loops=1)
                          Sort Key: bv_genre_children.genre_child_id
                          ->  Index Scan using genre_id2 on bv_genre_children  (cost=0.00..187.98 rows=6256 width=2) (actual time=0.000..31.000 rows=6379 loops=1)
                                Index Cond: (genre_id = 1)
Total runtime: 703.000 ms
---

- CONF SETTINGS -
shared_buffers = 1000   # min 16, at least max_connections*2, 8KB each
sort_mem = 10000
#work_mem = 1024# min 64, size in KB
#maintenance_work_mem = 16384   # min 1024, size in KB
#max_stack_depth = 2048 # min 100, size in KB
-
Have you VACUUM ANALYZEd recently? If not, do that, then rerun the
EXPLAIN ANALYZE.

You might want to bump shared_buffers. You have 512MB RAM, right? You
probably want to raise shared_buffers to 10000 (about 80MB at 8KB per
buffer), restart PG, then run a VACUUM ANALYZE. Then rerun the EXPLAIN
ANALYZE.

If that doesn't help, try:

ALTER TABLE bv_genre_children ALTER COLUMN genre_child_id SET STATISTICS 100;

followed by:

VACUUM ANALYZE bv_genre_children;
You might also want to tweak the effective_cache_size parameter in
postgresql.conf, but I am unsure how this works on Windows. Does
Windows have a kernel disk cache, anyone?
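A hypothetical starting point -- the right value depends on how much RAM the OS actually devotes to disk caching:

# postgresql.conf -- effective_cache_size is measured in 8KB pages;
# 25000 pages is roughly 200MB of assumed OS disk cache
effective_cache_size = 25000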


HTH
Nick




Re: [PERFORM] Simple join in PostgreSQL takes too long

2004-04-27 Thread Atesz
Hi,

You can try some variations:

SELECT 
  book_id
FROM  
  bookgenres, genre_children
WHERE
   bookgenres.genre_id = genre_children.genre_child_id AND  
   genre_children.genre_id = 1
GROUP BY book_id
LIMIT 10

The next one moves the child-genre lookup into a subquery. IN is used
because genre 1 has many child genres, so a scalar '=' subquery would
fail:

SELECT
  book_id
FROM
  bookgenres
WHERE
  bookgenres.genre_id IN (SELECT genre_child_id FROM genre_children
                          WHERE genre_id = 1)
GROUP BY book_id
LIMIT 10

You may need some indexes. Try these with EXPLAIN!

CREATE INDEX bookgenres_genre_id_book_id ON bookgenres(genre_id, book_id);
  or
CREATE INDEX bookgenres_book_id_genre_id ON bookgenres(book_id, genre_id);

CREATE INDEX genre_children_genre_id ON genre_children(genre_id);

Regards, Antal Attila





Re: [PERFORM] Simple join in PostgreSQL takes too long

2004-04-27 Thread Rod Taylor
On Tue, 2004-04-27 at 17:27, Vitaly Belman wrote:
 Hello pgsql-performance,
 
   I discussed the whole subject for some time in DevShed and didn't
   achieve much (as for results). I wonder if any of you guys can help
   out:
 
   http://forums.devshed.com/t136202/s.html

You're taking the wrong approach. Rather than using a select query to
ensure that the book_id is distinct, add a constraint to the table so
that it is guaranteed.

CREATE UNIQUE INDEX bv_bookgenres_unq ON bv_bookgenres(book_id, genre_id);

Now you can do a simple join (Drop the DISTINCT keyword) and achieve the
same results.
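With that index in place, the simple join would be (a sketch using the table names from Vitaly's schema):

SELECT book_id
FROM bv_bookgenres, bv_genre_children
WHERE bv_bookgenres.genre_id = bv_genre_children.genre_child_id
  AND bv_genre_children.genre_id = 1
LIMIT 10;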

The point is that a book cannot be of a certain genre more than once.

Without the DISTINCT, this should take only a few milliseconds to
execute.


