Re: [PERFORM] Simple join in PostgreSQL takes too long

2004-04-29 Thread Josh Berkus
Vitaly,
 
> I am in the middle of going through them all; so far, disabling
> enable_mergejoin has really helped.

In that case, your random_page_cost is probably too low.   Check the ratio of 
per-tuple times on index vs. seqscan seeks.
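
One way to get both numbers (a sketch only; the table, index, and genre
value are taken from Vitaly's post elsewhere in this thread):

SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM bv_bookgenres WHERE genre_id = 1;

SET enable_seqscan = on;
SET enable_indexscan = off;
EXPLAIN ANALYZE SELECT * FROM bv_bookgenres WHERE genre_id = 1;

-- Per-tuple time is actual total time divided by rows for each plan.
-- If the index scan does not cost roughly random_page_cost times the
-- seqscan per tuple, adjust the setting, e.g.:
SET random_page_cost = 3;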

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] Simple join in PostgreSQL takes too long

2004-04-29 Thread Vitaly Belman
Hello Manfred,

Thanks, everyone, for helping me on this - so many tips.

I am in the middle of going through them all; so far, disabling
enable_mergejoin has really helped.

Also, I agree that the design might be flawed (I could use triggers
and stuff like that), but for now I am just comparing how my project
will run on PostgreSQL (considering migration from MySQL).

I'll be reporting back on how the other stuff helped.

Regards,
 Vitaly Belman
 
 ICQ: 1912453
 AIM: VitalyB1984
 MSN: [EMAIL PROTECTED]
 Yahoo!: VitalyBe

Wednesday, April 28, 2004, 11:24:41 AM, you wrote:

MK> On Tue, 27 Apr 2004 18:01:34 -0400, Rod Taylor <[EMAIL PROTECTED]> wrote:
>>On Tue, 2004-04-27 at 17:27, Vitaly Belman wrote:
>>> Hello pgsql-performance,
>>> 
>>>   I discussed the whole subject for some time in DevShed and didn't
>>>   achieve much (as for results). I wonder if any of you guys can help
>>>   out:
>>> 
>>>   http://forums.devshed.com/t136202/s.html

>>The point is that a book cannot be of a certain genre more than once.

MK> Rod, he has a hierarchy of genres.  Genre 1 has 6379 child genres and a
MK> book can be in more than one of these.

MK> Vitaly, though LIMIT makes this look like a small query, DISTINCT
MK> requires the whole result set to be retrieved.  0.7 seconds doesn't look
MK> so bad for several thousand rows.  Did you try with other genre_ids?

MK> Maybe a merge join is not the best choice.  Set enable_mergejoin to
MK> false and see whether you get a (hopefully faster) hash join, assuming
MK> that sort_mem is large enough to keep the hash table in memory.

MK> If you send me your table contents I'll try it on Linux.

MK> Servus
MK>  Manfred




Re: [PERFORM] Simple join in PostgreSQL takes too long

2004-04-29 Thread Manfred Koizar
On Thu, 29 Apr 2004 13:36:47 -0400, Rod Taylor <[EMAIL PROTECTED]> wrote:
>The reason for the function is that the sort routines (hash aggregation
>included) will not stop in mid-sort

Good point.

Servus
 Manfred



Re: [PERFORM] Simple join in PostgreSQL takes too long

2004-04-29 Thread Rod Taylor
On Thu, 2004-04-29 at 13:13, Manfred Koizar wrote:
> On Wed, 28 Apr 2004 08:23:35 -0400, Rod Taylor <[EMAIL PROTECTED]> wrote:
> >The below plpgsql forces the kind of algorithm we wish the planner could
>choose. It should be fairly quick regardless of dataset.
> 
> That reminds me of hash aggregation.  So here's another idea for Vitaly:

The reason for the function is that the sort routines (hash aggregation
included) will not stop in mid-sort, although I believe that feature is
on the TODO list.

I believe Vitaly will achieve 10ms or less query times using that
function.





Re: [PERFORM] Simple join in PostgreSQL takes too long

2004-04-29 Thread Manfred Koizar
On Wed, 28 Apr 2004 08:23:35 -0400, Rod Taylor <[EMAIL PROTECTED]> wrote:
>The below plpgsql forces the kind of algorithm we wish the planner could
>choose. It should be fairly quick regardless of dataset.

That reminds me of hash aggregation.  So here's another idea for Vitaly:

SELECT book_id
  FROM ...
 WHERE ...
 GROUP BY book_id
 LIMIT ...
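
Filled in with the table names from Vitaly's post (a sketch; the bv_
prefix follows the posted schema):

SELECT book_id
  FROM bv_bookgenres, bv_genre_children
 WHERE bv_bookgenres.genre_id = bv_genre_children.genre_child_id
   AND bv_genre_children.genre_id = 1
 GROUP BY book_id
 LIMIT 10;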

Servus
 Manfred



Re: [PERFORM] Simple join in PostgreSQL takes too long

2004-04-28 Thread Rod Taylor
> Rod, he has a hierarchy of genres.  Genre 1 has 6379 child genres and a
> book can be in more than one of these.

>  bookgenres.genre_id = genre_children.genre_child_id AND
>  genre_children.genre_id = 1

I see, sorry. I didn't notice the genre_child_id in the where clause.
At first glance I had read them all as genre_id.

When I run into this I usually create a third table, managed by triggers,
that relates the book to all genre entries. Inserts take a little
longer, but the selects can still be very quick.
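
A minimal sketch of that third table and its trigger (the flat table,
function, and trigger names are illustrative, not from this thread; the
bv_* tables are from Vitaly's schema):

CREATE TABLE bv_book_genre_flat (
    book_id  INT NOT NULL,
    genre_id INT NOT NULL,
    UNIQUE (genre_id, book_id)
);

CREATE OR REPLACE FUNCTION flatten_book_genre() RETURNS trigger AS '
BEGIN
    -- File the book under every genre that lists its genre as a child,
    -- skipping (book, genre) pairs that are already present.
    INSERT INTO bv_book_genre_flat (book_id, genre_id)
        SELECT NEW.book_id, gc.genre_id
          FROM bv_genre_children AS gc
         WHERE gc.genre_child_id = NEW.genre_id
           AND NOT EXISTS (SELECT 1 FROM bv_book_genre_flat AS f
                            WHERE f.book_id = NEW.book_id
                              AND f.genre_id = gc.genre_id);
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER bv_bookgenres_flatten AFTER INSERT ON bv_bookgenres
    FOR EACH ROW EXECUTE PROCEDURE flatten_book_genre();

The select then needs no DISTINCT at all:

SELECT book_id FROM bv_book_genre_flat WHERE genre_id = 1 LIMIT 10;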

The below plpgsql forces the kind of algorithm we wish the planner could
choose. It should be fairly quick regardless of dataset.


CREATE OR REPLACE FUNCTION book_results(numeric) RETURNS SETOF numeric
AS '
DECLARE
  v_genre ALIAS FOR $1;
  v_limit integer := 10;
  t_rows RECORD;
  v_transmitted integer := 0;

  -- Start with an empty list; a non-empty initializer would wrongly
  -- mark that book_id as already transmitted.
  v_transmitted_values numeric[] := ''{}'';

BEGIN
  FOR t_rows IN SELECT book_id
                  FROM bv_bookgenres AS b
                  JOIN bv_genre_children AS g ON (b.genre_id = g.genre_child_id)
                 WHERE g.genre_id = v_genre
  LOOP

    -- If this is a new value, transmit it to the end user
    IF NOT t_rows.book_id = ANY(v_transmitted_values) THEN
      v_transmitted_values := array_append(v_transmitted_values, t_rows.book_id);
      v_transmitted := v_transmitted + 1;
      RETURN NEXT t_rows.book_id;
    END IF;

    -- Stop scanning as soon as enough rows have been returned; this is
    -- the early exit that DISTINCT ... LIMIT cannot do.
    EXIT WHEN v_transmitted >= v_limit;
  END LOOP;

  RETURN;
END;
' LANGUAGE plpgsql;

EXPLAIN ANALYZE SELECT * FROM book_results(1);
SELECT * FROM book_results(1);




Re: [PERFORM] Simple join in PostgreSQL takes too long

2004-04-28 Thread Manfred Koizar
On Tue, 27 Apr 2004 18:01:34 -0400, Rod Taylor <[EMAIL PROTECTED]> wrote:
>On Tue, 2004-04-27 at 17:27, Vitaly Belman wrote:
>> Hello pgsql-performance,
>> 
>>   I discussed the whole subject for some time in DevShed and didn't
>>   achieve much (as for results). I wonder if any of you guys can help
>>   out:
>> 
>>   http://forums.devshed.com/t136202/s.html

>The point is that a book cannot be of a certain genre more than once.

Rod, he has a hierarchy of genres.  Genre 1 has 6379 child genres and a
book can be in more than one of these.

Vitaly, though LIMIT makes this look like a small query, DISTINCT
requires the whole result set to be retrieved.  0.7 seconds doesn't look
so bad for several thousand rows.  Did you try with other genre_ids?

Maybe a merge join is not the best choice.  Set enable_mergejoin to
false and see whether you get a (hopefully faster) hash join, assuming
that sort_mem is large enough to keep the hash table in memory.
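
In session terms that is just the following (a sketch; the query is
Vitaly's, with the bv_ table names from the posted schema):

SET enable_mergejoin TO off;
SET sort_mem = 16384;    -- KB, session-local; only if you have the RAM

EXPLAIN ANALYZE
SELECT DISTINCT book_id
  FROM bv_bookgenres, bv_genre_children
 WHERE bv_bookgenres.genre_id = bv_genre_children.genre_child_id
   AND bv_genre_children.genre_id = 1
 LIMIT 10;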

If you send me your table contents I'll try it on Linux.

Servus
 Manfred



Re: [PERFORM] Simple join in PostgreSQL takes too long

2004-04-27 Thread Josh Berkus
Vitaly,

I'm afraid that your helper on DevShed is right; 7.5 for Windows is still in
development; we've not even *started* to check it for performance yet.

Since the Merge Join is taking 90% of your query time, I might suggest 
increasing shared_buffers and sort_mem to see if that helps. 
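
For example (illustrative values only, not a recommendation; the right
numbers depend on your RAM and concurrent workload):

# postgresql.conf
shared_buffers = 10000   # 8KB pages, about 80MB
sort_mem = 16384         # KB per sort operation

sort_mem can also be raised for a single session with SET sort_mem = 16384;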

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] Simple join in PostgreSQL takes too long

2004-04-27 Thread Rod Taylor
On Tue, 2004-04-27 at 17:27, Vitaly Belman wrote:
> Hello pgsql-performance,
> 
>   I discussed the whole subject for some time in DevShed and didn't
>   achieve much (as for results). I wonder if any of you guys can help
>   out:
> 
>   http://forums.devshed.com/t136202/s.html

You're taking the wrong approach. Rather than using a select query to
ensure that the book_id is distinct, add a constraint to the table so
that it is guaranteed.

CREATE UNIQUE INDEX bv_bookgenres_unq ON bv_bookgenres(book_id, genre_id);

Now you can do a simple join (drop the DISTINCT keyword) and achieve the
same results.
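
A sketch of that join, using the bv_ table names from Vitaly's schema:

SELECT book_id
  FROM bv_bookgenres, bv_genre_children
 WHERE bv_bookgenres.genre_id = bv_genre_children.genre_child_id
   AND bv_genre_children.genre_id = 1
 LIMIT 10;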

The point is that a book cannot be of a certain genre more than once.

Without the DISTINCT, this should take a few milliseconds to execute.





Re: [PERFORM] Simple join in PostgreSQL takes too long

2004-04-27 Thread Atesz
Hi,

You can try some variations:

SELECT 
  book_id
FROM  
  bookgenres, genre_children
WHERE
   bookgenres.genre_id = genre_children.genre_child_id AND  
   genre_children.genre_id = 1
GROUP BY book_id
LIMIT 10

The next works if 'genre_child_id' is UNIQUE on the 'genre_children'
table. (Note the IN: genre 1 has thousands of child genres, so a
scalar '=' subquery would fail with a "more than one row" error.)

SELECT
  book_id
FROM
  bookgenres
WHERE
  bookgenres.genre_id IN (SELECT genre_child_id FROM genre_children
                          WHERE genre_id = 1)
GROUP BY book_id
LIMIT 10

You may need some indexes. Try these with EXPLAIN!

CREATE INDEX bookgenres_genre_id_book_id ON bookgenres(genre_id, book_id);
  or
CREATE INDEX bookgenres_book_id_genre_id ON bookgenres(book_id, genre_id);
CREATE INDEX genre_children_genre_id ON genre_children(genre_id);

Regards, Antal Attila





Re: [PERFORM] Simple join in PostgreSQL takes too long

2004-04-27 Thread Nick Barr
Vitaly Belman wrote:
> Hello pgsql-performance,
>
>   I discussed the whole subject for some time in DevShed and didn't
>   achieve much (as for results). I wonder if any of you guys can help
>   out:
>
>   http://forums.devshed.com/t136202/s.html
So cutting and pasting:
- SCHEMA -
CREATE TABLE bv_bookgenres (
book_id INT NOT NULL,
genre_id INT NOT NULL
);
CREATE TABLE bv_genre_children (
genre_id INT,
genre_child_id INT
);
---
- QUERY -
select DISTINCT
  book_id
from
  bookgenres,
  genre_children
WHERE
 bookgenres.genre_id = genre_children.genre_child_id AND
 genre_children.genre_id = 1
LIMIT 10
-
- EXPLAIN ANALYZE -
QUERY PLAN
Limit  (cost=6503.51..6503.70 rows=10 width=4) (actual time=703.000..703.000 rows=10 loops=1)
  ->  Unique  (cost=6503.51..6738.20 rows=12210 width=4) (actual time=703.000..703.000 rows=10 loops=1)
        ->  Sort  (cost=6503.51..6620.85 rows=46937 width=4) (actual time=703.000..703.000 rows=24 loops=1)
              Sort Key: bv_bookgenres.book_id
              ->  Merge Join  (cost=582.45..2861.57 rows=46937 width=4) (actual time=46.000..501.000 rows=45082 loops=1)
                    Merge Cond: ("outer".genre_id = "inner".genre_child_id)
                    ->  Index Scan using genre_id on bv_bookgenres  (cost=0.00..1462.84 rows=45082 width=8) (actual time=0.000..158.000 rows=45082 loops=1)
                    ->  Sort  (cost=582.45..598.09 rows=6256 width=2) (actual time=46.000..77.000 rows=49815 loops=1)
                          Sort Key: bv_genre_children.genre_child_id
                          ->  Index Scan using genre_id2 on bv_genre_children  (cost=0.00..187.98 rows=6256 width=2) (actual time=0.000..31.000 rows=6379 loops=1)
                                Index Cond: (genre_id = 1)
Total runtime: 703.000 ms
---

- CONF SETTINGS -
shared_buffers = 1000   # min 16, at least max_connections*2, 8KB each
sort_mem = 10000
#work_mem = 1024# min 64, size in KB
#maintenance_work_mem = 16384   # min 1024, size in KB
#max_stack_depth = 2048 # min 100, size in KB
-
Have you VACUUM ANALYZEd recently? If not, do that, then rerun the EXPLAIN
ANALYZE.

You might want to bump shared_buffers. You have 512MB RAM, right? You
probably want to bump shared_buffers to 10000, restart PG, then run a
VACUUM ANALYZE. Then rerun the EXPLAIN ANALYZE.

If that doesn't help, try doing a:

ALTER TABLE bv_genre_children ALTER COLUMN genre_child_id SET STATISTICS 100;

followed by a:

VACUUM ANALYZE bv_genre_children;

You might also want to tweak the effective_cache_size parameter in
postgresql.conf, but I am unsure how this would work on Windows. Does
Windows have a kernel disk cache, anyone?
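
For example (an illustrative value only; it assumes most of the 512MB
box ends up used as OS disk cache, which may not hold on Windows):

# postgresql.conf
effective_cache_size = 50000    # 8KB pages, ~400MB of assumed OS cache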


HTH
Nick

