Re: [PERFORM] How to read query plan

2005-03-14 Thread Miroslav Šulc
John Arbash Meinel wrote:
In fact, on MySQL I didn't see any slow reactions so I didn't measure
and inspect it. But I can try it if I figure out how to copy the
database from PostgreSQL to MySQL.

I figured you still had a copy of the MySQL around to compare to. You
probably don't need to spend too much time on it yet.
So I have some results. I have tested the query on both PostgreSQL 8.0.1 
and MySQL 4.1.8 with LIMIT set to 30 and OFFSET set to 6000. PostgreSQL 
result is 11,667.916 ms, MySQL result is 448.4 ms.

Both databases are running on the same machine (my laptop) and contain 
the same data. However there are some differences in the data table 
definitions:
1) in PostgreSQL I use 'varchar(1)' for a lot of fields and in MySQL I 
use 'enum'
2) in PostgreSQL in some cases I use connection fields that are not of 
the same type (smallint - integer (SERIAL)), in MySQL I use the same types

John
=:-
Miroslav




Re: [PERFORM] How to read query plan

2005-03-14 Thread Miroslav Šulc
Tom Lane wrote:
...
I think the reason this is popping to the top of the runtime is that the
joins are so wide (an average of ~85 columns in a join tuple according
to the numbers above).  Because there are lots of variable-width columns
involved, most of the time the fast path for field access doesn't apply
and we end up going to nocachegetattr --- which itself is going to be
slow because it has to scan over so many columns.  So the cost is
roughly O(N^2) in the number of columns.
 

As there are a lot of varchar(1) columns in the AdDevicesSites table, wouldn't
it be helpful to change them to char(1)? Would it solve the variable-width
problem at least for some fields and speed up the query?

As a short-term hack, you might be able to improve matters if you can
reorder your LEFT JOINs to have the minimum number of columns
propagating up from the earlier join steps.  In other words make the
later joins add more columns than the earlier, as much as you can.
 

That will be hard, as the main table, which contains most of the fields, is
LEFT JOINed with the others. I'll look into whether I can find some way to
improve it.

I'm not sure I understand the process of executing the plan, but I imagine
that the data from AdDevicesSites are retrieved only once when they are
loaded and maybe stored in memory. Are the columns stored in the order they
appear in the SQL command? If so, wouldn't it be useful to move all varchar
fields to the end of the SELECT query? I'm just guessing, because I don't
know at all how a database server is implemented and what it really does.

..
			regards, tom lane
 

Miroslav




Re: [PERFORM] How to read query plan

2005-03-14 Thread PFC
	Instead of a varchar(1) containing 'y' or 'n' you could use a BOOL or an
integer.
	Your query seems to be of the form:

	SELECT FROM main_table LEFT JOIN a lot of tables ORDER BY sort_key LIMIT
N OFFSET M;

	I would suggest rewriting it in a simpler way: instead of generating
the whole result set, sorting it, and then grabbing a slice, generate only
the row ids, grab a slice, and then generate the full rows from that.

	- If you order by a field which is in main_table :
	SELECT FROM main_table LEFT JOIN a lot of tables WHERE main_table.id IN  
(SELECT id FROM main_table ORDER BY sort_key LIMIT N OFFSET M
) ORDER BY sort_key LIMIT N OFFSET M;

	- If you order by a field in one of the child tables, I guess you only
want to display the rows in the main table which have this field, i.e.
not null in the LEFT JOIN. You can also use the principle above.

- You can use a straight join instead of an IN.
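
	As a sketch of that last variant, in the same schematic style as above
(the names are placeholders, not the real schema):

	SELECT FROM main_table
	JOIN (SELECT id FROM main_table
	      ORDER BY sort_key LIMIT N OFFSET M) AS page USING (id)
	LEFT JOIN a lot of tables
	ORDER BY sort_key;

	The inner subselect only needs main_table and its sort_key index, so
the expensive LEFT JOINs run for just the N rows of the page.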



Re: [PERFORM] How to read query plan

2005-03-14 Thread Miroslav Šulc
Christopher Kings-Lynne wrote:
1) in PostgreSQL I use 'varchar(1)' for a lot of fields and in MySQL 
I use 'enum'
2) in PostgreSQL in some cases I use connection fields that are not 
of the same type (smallint - integer (SERIAL)), in MySQL I use the 
same types

Well both those things will make PostgreSQL slower...
I think so. I'll change the varchar(1) fields to char(1) where possible and
will think about what to do with the smallint - integer JOINs. Something
like SMALLSERIAL would be pleasant :-) I thought I would wait for Tom
Lane's reaction to the improvement suggestions I posted in another mail,
but maybe it is deep night for him because of the time zone difference.

Chris
Miroslav




Re: [PERFORM] How to read query plan

2005-03-14 Thread Christopher Kings-Lynne
1) in PostgreSQL I use 'varchar(1)' for a lot of fields and in MySQL I 
use 'enum'
2) in PostgreSQL in some cases I use connection fields that are not of 
the same type (smallint - integer (SERIAL)), in MySQL I use the same 
types
Well both those things will make PostgreSQL slower...
Chris


Re: [PERFORM] How to read query plan

2005-03-14 Thread Miroslav Šulc
PFC wrote:
Instead of a varchar(1) containing 'y' or 'n' you could use a BOOL 
or an  integer.
Sure I could. The problem is our project still supports both MySQL and
PostgreSQL. We used enum('Y','N') in MySQL, so there would be a lot of
changes in the code if we changed to the BOOL data type.

Your query seems to be of the form:
SELECT FROM main_table LEFT JOIN a lot of tables ORDER BY sort_key 
LIMIT  N OFFSET M;

I would suggest rewriting it in a simpler way: instead of
generating the whole result set, sorting it, and then grabbing a
slice, generate only the row ids, grab a slice, and then generate
the full rows from that.

- If you order by a field which is in main_table :
SELECT FROM main_table LEFT JOIN a lot of tables WHERE 
main_table.id IN  (SELECT id FROM main_table ORDER BY sort_key LIMIT N 
OFFSET M
) ORDER BY sort_key LIMIT N OFFSET M;

- If you order by a field in one of the child tables, I guess you 
only  want to display the rows in the main table which have this 
field, ie.  not-null in the LEFT JOIN. You can also use the principle 
above.

- You can use a straight join instead of an IN.
Do you mean something like this?
SELECT Table.IDPK, Table2.varchar1, Table2.varchar2, ...
FROM Table
LEFT JOIN many tables
INNER JOIN Table AS Table2
Miroslav




Re: [PERFORM] How to read query plan

2005-03-14 Thread Tom Lane
Miroslav Šulc [EMAIL PROTECTED] writes:
 As there are a lot of varchar(1) columns in the AdDevicesSites table, wouldn't
 it be helpful to change them to char(1)? Would it solve the variable-width
 problem at least for some fields and speed up the query?

No, because char(1) isn't physically fixed-width (consider multibyte
encodings).  There's really no advantage to char(N) in Postgres.

I don't know what you're doing with those fields, but if they are
effectively booleans or small codes you might be able to convert them to
bool or int fields.  There is also the "char" datatype (not to be
confused with char(1)) which can hold single ASCII characters, but is
nonstandard and a bit impoverished as to functionality.
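
For illustration, such a conversion could look like this (a sketch only;
the column name is one of the varchar(1) flags from the query later in the
thread, and the casts assume the 8.0 ALTER TABLE ... TYPE ... USING form):

    ALTER TABLE AdDevicesSites
        ALTER COLUMN OwnLightening TYPE bool
        USING (OwnLightening = 'Y');

    -- or, using the single-byte "char" type:
    ALTER TABLE AdDevicesSites
        ALTER COLUMN OwnLightening TYPE "char"
        USING OwnLightening::"char";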

However, I doubt this is worth pursuing.  One of the things I tested
yesterday was a quick hack to organize the storage of intermediate join
tuples with fixed-width fields first and non-fixed ones later.  It
really didn't help much at all :-(.  I think the trouble with your
example is that in the existing code, the really fast path applies only
when the tuple contains no nulls --- and since you're doing all that
left joining, there's frequently at least one null lurking.

regards, tom lane



Re: [PERFORM] How to read query plan

2005-03-14 Thread Harald Fuchs
In article [EMAIL PROTECTED],
Miroslav Šulc [EMAIL PROTECTED] writes:

 Instead of a varchar(1) containing 'y' or 'n' you could use a
 BOOL or an  integer.

 Sure I could. The problem is our project still supports both MySQL and
 PostgreSQL. We used enum('Y','N') in MySQL, so there would be a lot of
 changes in the code if we changed to the BOOL data type.

Since BOOL is exactly what you want to express and since MySQL also
supports BOOL (*), you should make that change anyway.

(*) MySQL recognizes BOOL as a column type and silently uses
TINYINT(1) instead.
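
For illustration, the same definition (a hypothetical table, not from the
thread) is accepted by both systems:

    CREATE TABLE ad_flags (
        idpk    integer NOT NULL,
        deleted BOOL NOT NULL DEFAULT FALSE
    );
    -- PostgreSQL stores a real boolean; MySQL (4.1.0 and later) silently
    -- turns the column into TINYINT(1)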




Re: [PERFORM] How to read query plan

2005-03-14 Thread Miroslav Šulc
Tom Lane wrote:
Miroslav Šulc [EMAIL PROTECTED] writes:
 

As there are a lot of varchar(1) columns in the AdDevicesSites table, wouldn't
it be helpful to change them to char(1)? Would it solve the variable-width
problem at least for some fields and speed up the query?

No, because char(1) isn't physically fixed-width (consider multibyte
encodings).  There's really no advantage to char(N) in Postgres.
 

I was aware of that :-(
I don't know what you're doing with those fields, but if they are
effectively booleans or small codes you might be able to convert them to
bool or int fields.  There is also the "char" datatype (not to be
confused with char(1)) which can hold single ASCII characters, but is
nonstandard and a bit impoverished as to functionality.
 

The problem lies in the migration from MySQL to PostgreSQL. In MySQL we
(badly) chose enum for yes/no switches (as far as I know there's nothing
like a boolean field type in MySQL, but we could have used tinyint). It
will be very time consuming to rewrite all such enums and check that the
code still works.

However, I doubt this is worth pursuing.  One of the things I tested
yesterday was a quick hack to organize the storage of intermediate join
tuples with fixed-width fields first and non-fixed ones later.  It
really didn't help much at all :-(.  I think the trouble with your
example is that in the existing code, the really fast path applies only
when the tuple contains no nulls --- and since you're doing all that
left joining, there's frequently at least one null lurking.
 

Unfortunately I don't see any other way than LEFT JOINing in this case.
			regards, tom lane
 

Miroslav




Re: [PERFORM] How to read query plan

2005-03-14 Thread Miroslav Šulc
Harald Fuchs wrote:
Sure I could. The problem is our project still supports both MySQL and
PostgreSQL. We used enum('Y','N') in MySQL, so there would be a lot of
changes in the code if we changed to the BOOL data type.
   

Since BOOL is exactly what you want to express and since MySQL also
supports BOOL (*), you should make that change anyway.
 

I know that. The time will have to come.
(*) MySQL recognizes BOOL as a column type and silently uses
TINYINT(1) instead.
 

I've checked that and you are right, but BOOL has been in MySQL only since
version 4.1.0. Still, we could have used tinyint instead of enum - our bad
choice.

Miroslav




Re: [PERFORM] How to read query plan

2005-03-14 Thread John Arbash Meinel
Miroslav Šulc wrote:
Tom Lane wrote:
...
I think the reason this is popping to the top of the runtime is that the
joins are so wide (an average of ~85 columns in a join tuple according
to the numbers above).  Because there are lots of variable-width columns
involved, most of the time the fast path for field access doesn't apply
and we end up going to nocachegetattr --- which itself is going to be
slow because it has to scan over so many columns.  So the cost is
roughly O(N^2) in the number of columns.

As there are a lot of varchar(1) columns in the AdDevicesSites table, wouldn't
it be helpful to change them to char(1)? Would it solve the
variable-width problem at least for some fields and speed up the query?
I'm guessing there really wouldn't be a difference. I think varchar()
and char() are stored the same way, just one always has space padding. I
believe they are both varlena types, so they are still variable length.
As a short-term hack, you might be able to improve matters if you can
reorder your LEFT JOINs to have the minimum number of columns
propagating up from the earlier join steps.  In other words make the
later joins add more columns than the earlier, as much as you can.

That will be hard, as the main table, which contains most of the fields,
is LEFT JOINed with the others. I'll look into whether I can find some
way to improve it.
One thing that you could try, is to select just the primary keys from
the main table, and then later on, join back to that table to get the
rest of the columns. It is a little bit hackish, but if it makes your
query faster, you might want to try it.
I'm not sure I understand the process of executing the plan, but I
imagine that the data from AdDevicesSites are retrieved only once when
they are loaded and maybe stored in memory. Are the columns stored in
the order they appear in the SQL command? If so, wouldn't it be useful
to move all varchar fields to the end of the SELECT query? I'm just
guessing, because I don't know at all how a database server is
implemented and what it really does.
I don't think they are stored in the order of the SELECT portion. I'm
guessing they are loaded and saved as you go, but the order of the
LEFT JOINs at the end is probably important.
..
regards, tom lane

Miroslav

John
=:-




Re: [PERFORM] How to read query plan

2005-03-14 Thread Greg Stark
Miroslav Šulc [EMAIL PROTECTED] writes:

 I think so. I'll change the varchar(1) fields to char(1) where possible 

char isn't faster than varchar on postgres. If anything it may be slightly
slower because every comparison first needs to pad both sides with spaces.
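
A quick way to see the padding semantics Greg describes (a throwaway
illustration, runnable on any PostgreSQL):

    SELECT 'a'::char(5)    = 'a'::char(2)     AS char_equal,    -- true
           'a'::varchar(5) = 'a '::varchar(5) AS varchar_equal; -- false

Trailing blanks are insignificant when comparing char(n) values, so both
sides are effectively padded to match first; varchar comparisons are not.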

-- 
greg




Re: [PERFORM] How to read query plan

2005-03-14 Thread Tom Lane
Miroslav Šulc [EMAIL PROTECTED] writes:
 PFC wrote:
 Instead of a varchar(1) containing 'y' or 'n' you could use a BOOL 
 or an  integer.

 Sure I could. The problem is our project still supports both MySQL and
 PostgreSQL. We used enum('Y','N') in MySQL, so there would be a lot of
 changes in the code if we changed to the BOOL data type.

Just FYI, I did a quick search-and-replace on your dump to replace
varchar(1) by "char", which makes the column fixed-width without any
change in the visible data.  This made hardly any difference in the
join speed though :-(.  So that is looking like a dead end.

John's idea about re-joining to the main table to pick up the bulk of
its fields only after joining to the sub-tables might work.

regards, tom lane



Re: [PERFORM] How to read query plan

2005-03-14 Thread Miroslav Šulc
Tom Lane wrote:
Just FYI, I did a quick search-and-replace on your dump to replace
varchar(1) by "char", which makes the column fixed-width without any
change in the visible data.  This made hardly any difference in the
join speed though :-(.  So that is looking like a dead end.
 

I'll try to change the data type to bool, but the problem I face is that
the code expects that SELECTs return 'Y' or 'N', while what I have found
out till now is that PostgreSQL returns 't' or 'f' for bool data. I can
think of some solutions but they cost CPU :-(

John's idea about re-joining to the main table to pick up the bulk of
its fields only after joining to the sub-tables might work.
 

I'll try that. It seems it could work.
			regards, tom lane
 

Miroslav




Re: [PERFORM] How to read query plan

2005-03-14 Thread John Arbash Meinel
Miroslav Šulc wrote:
Do you mean something like this?
SELECT Table.IDPK, Table2.varchar1, Table2.varchar2, ...
FROM Table
LEFT JOIN many tables
INNER JOIN Table AS Table2
Miroslav
I would also recommend using the subselect format, where any columns
that you are going to need to sort on show up in the subselect.
So you would have:
SELECT ...
  FROM main_table
  LEFT JOIN tablea ON ...
  LEFT JOIN tableb ON ...
  ...
  JOIN other_table ON ...
 WHERE main_table.idpk IN
       (SELECT idpk
          FROM main_table
          JOIN other_table ON main_table.idpk = other_table.main_idpk
         WHERE ...
         ORDER BY other_table.abcd
         LIMIT n OFFSET m);
I think the final LIMIT + OFFSET would give you the wrong results, since
you have already filtered out the important rows.
I also think you don't need the final order by, since the results should
already be in sorted order.
Now this also assumes that if someone is sorting on a row, then they
don't want null entries. If they do, then you can change the subselect
into a left join. But with appropriate selectivity and indexes, an inner
join can filter out a lot of rows, and give you better performance.
The inner subselect gives you selectivity on the main table, so that you
don't have to deal with all the columns in the search, and then you
don't have to deal with all the rows later on.
I think you can also do this:
SELECT ...
  FROM (SELECT main_table.idpk, other_table.columns
          FROM main_table
          JOIN other_table ON ...) AS p
  LEFT JOIN ...
  JOIN main_table ON main_table.idpk = p.idpk;
In that case instead of selecting out the id and putting that into the
where, you put it in the from, and then join against it.
I don't really know which is better.
John
=:-




Re: [PERFORM] How to read query plan

2005-03-14 Thread Kaloyan Iliev Iliev

Hi,

I have an idea about your problem. Would it be difficult to change not
the entire code but only the queries? You can change the type in
Postgres to bool. Then, when selecting data, you can use CASE..WHEN to
return 'Y' or 'N', or even write a little function which accepts bool
and returns 'Y' or 'N'. In that case, in all your queries you would have
to replace the select of the bool field with a select from the function.
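
A minimal sketch of such a helper function (the name bool_to_yn is made
up for the example):

    CREATE OR REPLACE FUNCTION bool_to_yn(bool) RETURNS char(1) AS '
        SELECT CASE WHEN $1 THEN ''Y'' ELSE ''N'' END;
    ' LANGUAGE SQL IMMUTABLE;

    -- usage: SELECT bool_to_yn(Deleted) AS Deleted FROM AdDevicesSites;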

Kaloyan


Re: [PERFORM] How to read query plan

2005-03-14 Thread Miroslav Šulc
Kaloyan Iliev Iliev wrote:
Hi,
I have an idea about your problem. Will it be difficult not to change 
the entire code but only the queries? You can change type in the 
Postgres to bool. Then, when select data you can use a CASE..WHEN to 
return 'Y' or 'N' or even write a little function which accepts bool 
and returns 'Y' or 'N'. In this case in all your queries you will have 
to replace the select of bool field with select form the function.
Thank you for your suggestion. I had a private message exchange with
Harald Fuchs, who suggested the same (except the function). Here is what
we exchanged:

Harald Fuchs wrote:
If you can control the SELECTs, just use
 SELECT CASE col WHEN true THEN 'Y' ELSE 'N' END
instead of
 SELECT col
Thus you wouldn't need to change your application code.
 

I use a single SELECT for both PostgreSQL and MySQL. I could use your
solution. It would just require some tagging of bool fields in SELECTs so
that I could add the CASE statement when using the PostgreSQL backend.

Miroslav




Re: [PERFORM] How to read query plan

2005-03-13 Thread John Arbash Meinel
Miroslav Šulc wrote:
Hi all,
I am new to PostgreSQL and query optimizations. We have recently moved
our project from MySQL to PostgreSQL and we are having a performance
problem with one of our most often used queries. On MySQL the speed
was sufficient but PostgreSQL chooses a time-expensive query plan. I
would like to optimize it somehow but the query plan from EXPLAIN
ANALYZE is a little bit cryptic to me.
So the first thing I would like is to understand the query plan. I
have read performance tips and FAQ but it didn't move me too much
further.
I would appreciate if someone could help me to understand the query
plan and what are the possible general options I can test. I think at
this moment the most expensive part is the Sort. Am I right? If so,
how could I generally avoid it (turning something on or off, using
parentheses for JOINs etc.) to force some more efficient query plan?
Thank you for any suggestions.
You really need to post the original query, so we can see *why* postgres
thinks it needs to run the plan this way.
Also, the final sort actually isn't that expensive.
When you have the numbers (cost=xxx..yyy) the xxx is the time when the
step can start, and the yyy is the time when the step can finish. For a
lot of steps, it can start running while the sub-steps are still feeding
back more data, for others, it has to wait for the sub-steps to finish.
The first thing to look for, is to make sure the estimated number of
rows is close to the actual number of rows. If they are off, then
postgres may be mis-estimating the optimal plan. (If postgres thinks it
is going to only need 10 rows, it may use an index scan, but when 1000
rows are returned, a seq scan might have been faster.)
You seem to be doing a lot of outer joins. Is that necessary? I don't
really know what you are looking for, but you are joining against enough
tables that I think this query is always going to be slow.
From what I can tell, you have 1 table which has 6364 rows, and you are
grabbing all of those rows, and then outer joining it with about 11
other tables.
I would actually guess that the most expensive parts of the plan are the
NESTED LOOPS which when they go to materialize have to do a sequential
scan, and they get executed 6364 times. It looks like the other tables
are small (only 3-5 rows), so it takes about 0.05 ms for each seqscan,
the problem is that because you are doing it 6k times, it ends up taking
about 300ms of your time.
You could try setting enable_nestloop to off.
I don't know that it will be faster, but it could be.
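
For example (the setting is per-session, so it is easy to undo; put the
problem query where the ellipsis is):

    SET enable_nestloop = off;
    EXPLAIN ANALYZE SELECT ...;
    RESET enable_nestloop;
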
In general, though, it seems like you should be asking a different
question, rather than trying to optimize the query that you have.
Can you post the original SQL statement, and maybe describe what you are
trying to do?
John
=:-




Re: [PERFORM] How to read query plan

2005-03-13 Thread Ragnar Hafstað
On Sun, 2005-03-13 at 16:32 +0100, Miroslav Šulc wrote:
 Hi all,
 
 I am new to PostgreSQL and query optimizations. We have recently moved
 our project from MySQL to PostgreSQL and we are having a performance
 problem with one of our most often used queries. On MySQL the speed was
 sufficient but PostgreSQL chooses a time-expensive query plan. I would
 like to optimize it somehow but the query plan from EXPLAIN ANALYZE is
 a little bit cryptic to me.
 

[snip output of EXPLAIN ANALYZE]

for those of us who have not yet reached the level where one can
infer it from the query plan, how about showing us the actual
query too?

but as an example of what to look for, consider the first few lines
(reformatted): 

 Merge Right Join  (cost=9868.84..9997.74 rows=6364 width=815)
   (actual time=9982.022..10801.216 rows=6364 loops=1)
   Merge Cond: ("outer".idpk = "inner".cadastralunitidfk)
   ->  Index Scan using cadastralunits_pkey on cadastralunits
       (cost=0.00..314.72 rows=13027 width=31)
       (actual time=0.457..0.552 rows=63 loops=1)
   ->  Sort  (cost=9868.84..9884.75 rows=6364 width=788)
       (actual time=9981.405..10013.708 rows=6364 loops=1)

notice that the index scan is expected to return 13027 rows, but
actually returns 63. this might influence the choice of plan.

gnari






Re: [PERFORM] How to read query plan

2005-03-13 Thread Miroslav Šulc
Hi John,
thank you for your response.
John Arbash Meinel wrote:
You really need to post the original query, so we can see *why* postgres
thinks it needs to run the plan this way.
Here it is:
SELECT AdDevicesSites.IDPK, AdDevicesSites.AdDevicesSiteSizeIDFK, 
AdDevicesSites.AdDevicesSiteRegionIDFK, 
AdDevicesSites.AdDevicesSiteCountyIDFK, 
AdDevicesSites.AdDevicesSiteCityIDFK, 
AdDevicesSites.AdDevicesSiteDistrictIDFK, 
AdDevicesSites.AdDevicesSiteStreetIDFK, 
AdDevicesSites.AdDevicesSiteStreetDescriptionIDFK, 
AdDevicesSites.AdDevicesSitePositionIDFK, 
AdDevicesSites.AdDevicesSiteVisibilityIDFK, 
AdDevicesSites.AdDevicesSiteStatusTypeIDFK, 
AdDevicesSites.AdDevicesSitePartnerIdentificationOperatorIDFK, 
AdDevicesSites.AdDevicesSitePartnerElectricitySupplierIDFK, 
AdDevicesSites.AdDevicesSitePartnerMaintainerIDFK, 
AdDevicesSites.AdDevicesSitePartnerStickerIDFK, 
AdDevicesSites.CadastralUnitIDFK, AdDevicesSites.MediaType, 
AdDevicesSites.Mark, AdDevicesSites.Amount, AdDevicesSites.Distance, 
AdDevicesSites.OwnLightening, AdDevicesSites.LocationDownTown, 
AdDevicesSites.LocationSuburb, AdDevicesSites.LocationBusinessDistrict, 
AdDevicesSites.LocationResidentialDistrict, 
AdDevicesSites.LocationIndustrialDistrict, 
AdDevicesSites.LocationNoBuildings, AdDevicesSites.ParkWayHighWay, 
AdDevicesSites.ParkWayFirstClassRoad, AdDevicesSites.ParkWayOtherRoad, 
AdDevicesSites.ParkWayStreet, AdDevicesSites.ParkWayAccess, 
AdDevicesSites.ParkWayExit, AdDevicesSites.ParkWayParkingPlace, 
AdDevicesSites.ParkWayPassangersOnly, AdDevicesSites.ParkWayCrossRoad, 
AdDevicesSites.PositionStandAlone, 
AdDevicesSites.NeighbourhoodPublicTransportation, 
AdDevicesSites.NeighbourhoodInterCityTransportation, 
AdDevicesSites.NeighbourhoodPostOffice, 
AdDevicesSites.NeighbourhoodNewsStand, 
AdDevicesSites.NeighbourhoodAmenities, 
AdDevicesSites.NeighbourhoodSportsSpot, 
AdDevicesSites.NeighbourhoodHealthServiceSpot, 
AdDevicesSites.NeighbourhoodShops, 
AdDevicesSites.NeighbourhoodShoppingCenter, 
AdDevicesSites.NeighbourhoodSuperMarket, 
AdDevicesSites.NeighbourhoodPetrolStation, 
AdDevicesSites.NeighbourhoodSchool, AdDevicesSites.NeighbourhoodBank, 
AdDevicesSites.NeighbourhoodRestaurant, 
AdDevicesSites.NeighbourhoodHotel, AdDevicesSites.RestrictionCigarettes, 
AdDevicesSites.RestrictionPolitics, AdDevicesSites.RestrictionSpirits, 
AdDevicesSites.RestrictionSex, AdDevicesSites.RestrictionOther, 
AdDevicesSites.RestrictionNote, AdDevicesSites.SpotMapFile, 
AdDevicesSites.SpotPhotoFile, AdDevicesSites.SourcePhotoTimeStamp, 
AdDevicesSites.SourceMapTimeStamp, AdDevicesSites.Price, 
AdDevicesSites.WebPrice, AdDevicesSites.CadastralUnitCode, 
AdDevicesSites.BuildingNumber, AdDevicesSites.ParcelNumber, 
AdDevicesSites.GPSLatitude, AdDevicesSites.GPSLongitude, 
AdDevicesSites.GPSHeight, AdDevicesSites.MechanicalOpticalCoordinates, 
AdDevicesSites.Deleted, AdDevicesSites.Protected, 
AdDevicesSites.DateCreated, AdDevicesSites.DateLastModified, 
AdDevicesSites.DateDeleted, AdDevicesSites.CreatedByUserIDFK, 
AdDevicesSites.LastModifiedByUserIDFK, AdDevicesSites.DeletedByUserIDFK, 
AdDevicesSites.PhotoLastModificationDate, 
AdDevicesSites.MapLastModificationDate, AdDevicesSites.DateLastImported, 
AdDevicesSiteRegions.Name AS AdDevicesSiteRegionName, 
AdDevicesSiteCounties.Name AS AdDevicesSiteCountyName, 
AdDevicesSiteCities.Name AS AdDevicesSiteCityName, 
AdDevicesSiteStreets.Name AS AdDevicesSiteStreetName, 
AdDevicesSiteDistricts.Name AS AdDevicesSiteDistrictName, 
AdDevicesSiteStreetDescriptions.Name_cs AS 
AdDevicesSiteStreetDescriptionName_cs, 
AdDevicesSiteStreetDescriptions.Name_en AS 
AdDevicesSiteStreetDescriptionName_en, AdDevicesSiteSizes.Name AS 
AdDevicesSiteSizeName, SUBSTRING(AdDevicesSiteVisibilities.Name_cs, 3) 
AS AdDevicesSiteVisibilityName_cs, 
SUBSTRING(AdDevicesSiteVisibilities.Name_en, 3) AS 
AdDevicesSiteVisibilityName_en, AdDevicesSitePositions.Name_cs AS 
AdDevicesSitePositionName_cs, AdDevicesSitePositions.Name_en AS 
AdDevicesSitePositionName_en, AdDevicesSiteStatusTypes.Name_cs AS 
AdDevicesSiteStatusTypeName_cs, AdDevicesSiteStatusTypes.Name_en AS 
AdDevicesSiteStatusTypeName_en, PartnerIdentificationsOperator.Name AS 
PartnerIdentificationOperatorName, PartnersElectricitySupplier.Name AS 
PartnerElectricitySupplierName, PartnersMaintainer.Name AS 
PartnerMaintainerName, PartnersSticker.Name AS PartnerStickerName, 
CadastralUnits.Code AS CadastralUnitCodeNative, CadastralUnits.Name AS 
CadastralUnitName
FROM AdDevicesSites
LEFT JOIN AdDevicesSiteRegions ON AdDevicesSites.AdDevicesSiteRegionIDFK 
= AdDevicesSiteRegions.IDPK
LEFT JOIN AdDevicesSiteCounties ON 
AdDevicesSites.AdDevicesSiteCountyIDFK = AdDevicesSiteCounties.IDPK
LEFT JOIN AdDevicesSiteCities ON AdDevicesSites.AdDevicesSiteCityIDFK = 
AdDevicesSiteCities.IDPK
LEFT JOIN AdDevicesSiteStreets ON AdDevicesSites.AdDevicesSiteStreetIDFK 
= AdDevicesSiteStreets.IDPK
LEFT JOIN AdDevicesSiteStreetDescriptions ON 
AdDevicesSites.AdDevicesSiteStreetDescriptionIDFK 

Re: [PERFORM] How to read query plan

2005-03-13 Thread Miroslav Šulc
Hi Ragnar,
Ragnar Hafstað wrote:
[snip output of EXPLAIN ANALYZE]
for those of us who have not yet reached the level where one can
infer it from the query plan, how abour showing us the actual
query too ?
 

I thought it would be sufficient to show where the main bottleneck is.
And in fact, the query is rather lengthy. But I have included it in the
response to John. Sorry for the incompleteness.

but as an example of what to look for, consider the first few lines
(reformatted): 
 

Merge Right Join  (cost=9868.84..9997.74 rows=6364 width=815)
  (actual time=9982.022..10801.216 rows=6364 loops=1)
  Merge Cond: ("outer".idpk = "inner".cadastralunitidfk)
  ->  Index Scan using cadastralunits_pkey on cadastralunits
      (cost=0.00..314.72 rows=13027 width=31)
      (actual time=0.457..0.552 rows=63 loops=1)
  ->  Sort  (cost=9868.84..9884.75 rows=6364 width=788)
      (actual time=9981.405..10013.708 rows=6364 loops=1)

notice that the index scan is expected to return 13027 rows, but
actually returns 63. this might influence the choice of plan.
 

Yes, the situation in this scenario is that the CadastralUnits table
contains all units in the country, but the AdDevices in this case are only
from 63 CadastralUnits. So the result - 63 rows - is just this little
subset. Moreover, not all AdDevices have CadastralUnitIDFK set to an IDPK
that exists in CadastralUnits; some have it set to zero (= no CadastralUnit
set).

gnari
 

Miroslav Šulc




Re: [PERFORM] How to read query plan

2005-03-13 Thread John Arbash Meinel
Miroslav Šulc wrote:
Hi John,
thank you for your response.
How about a quick side track.
Have you played around with your shared_buffers, maintenance_work_mem,
and work_mem settings?
What version of postgres are you using? The above names changed in 8.0,
and 8.0 also has some performance improvements over the 7.4 series.
What is your hardware? Are you testing this while there is load on the
system, or under no load?
Are you re-running the query multiple times, and reporting the later
speeds, or just the first time? (If nothing is loaded into memory, the
first run is easily 10x slower than later ones.)
Just some background info. If you have set these to reasonable values,
we probably don't need to spend much time here, but it's just one of
those things to check.
John
=:-




Re: [PERFORM] How to read query plan

2005-03-13 Thread John Arbash Meinel
Miroslav Šulc wrote:
Hi John,
thank you for your response.
I will comment on things separately.
John Arbash Meinel wrote:
...
These external tables contain information that are a unique parameter
of the AdDevice (like Position, Region, County, City etc.), in some
containing localized description of the property attribute. Some of
them could be moved into the main table but that would create a
redundancy, some of them cannot be moved into the main table (like
information about Partners which is definitely another object with
respect to AdDevices). I think the names of the tables are
self-explanatory so it should be clear what each table stores. Is this
design incorrect?
It's actually more of a question as to why you are doing left outer
joins, rather than simple joins.
Are the tables not fully populated? If so, why not?
How are you using this information? Why is it useful to get back rows
that don't have all of their information filled out?
Why is it useful to have so many columns returned? It seems like in most
cases you are only going to be able to use *some* of the information,
so why not create more queries that are specialized, rather than one
"get everything" query?
In fact, we only need about 30 records at a time but LIMIT can
speed up the query only when looking for the first 30 records. Setting
OFFSET slows the query down.
Have you thought about using a cursor instead of using limit + offset?
This may not help the overall time, but it might let you split up when
the time is spent.
BEGIN;
DECLARE cursor_name CURSOR FOR SELECT ... FROM ...;
FETCH FORWARD 30 FROM cursor_name;
FETCH FORWARD 30 FROM cursor_name;
...
END;
I don't
really know what you are looking for, but you are joining against enough
tables that I think this query is always going to be slow.

In MySQL the query was not so slow and I don't see any reason why
there should be large differences in SELECT speed. But if the design
of the tables is incorrect, we will correct it.
In the other post I asked about your postgres settings. The defaults are
pretty stingy, so that *might* be an issue.
From what I can tell, you have 1 table which has 6364 rows, and you are
grabbing all of those rows, and then outer joining it with about 11
other tables.

Here are the exact numbers:
AdDevicesSites - 6364
AdDevicesSiteRegions - 15
AdDevicesSiteCounties - 110
AdDevicesSiteCities - 124
AdDevicesSiteStreets - 2858
AdDevicesSiteStreetDescriptions - 4585
AdDevicesSiteDistricts - 344
AdDevicesSiteSizes - 110
AdDevicesSiteVisibilities - 4
AdDevicesSitePositions - 3
AdDevicesSiteStatusTypes - 5
PartnerIdentifications - 61
Partners - 61
CadastralUnits - 13027
And if I understand correctly, you consider all of these to be outer
joins. Meaning you want *all* of AdDevicesSites, and whatever info goes
along with it, but there are no restrictions as to what rows you want.
You want everything you can get.
Do you actually need *everything*? You mention only needing 30, what for?
I would actually guess that the most expensive parts of the plan are the
NESTED LOOPS which when they go to materialize have to do a sequential
scan, and they get executed 6364 times. It looks like the other tables
are small (only 3-5 rows), so it takes about 0.05 ms for each seqscan,
the problem is that because you are doing it 6k times, it ends up taking
about 300ms of your time.
You could try setting enable_nestloop to off.
I don't know that it will be faster, but it could be.

I have tried that and it resulted in about a 2 sec slowdown :-(
Generally, the optimizer *does* select the best query plan. As long as
it has accurate statistics, which it seems to in this case.

In general, though, it seems like you should be asking a different
question, rather than trying to optimize the query that you have.

You mean how should I improve the design to make the query faster?
There is one possibility if we don't find anything nicer, which is to
create a lazy materialized view. Basically, you run this query, and
store it in a table. Then when you want to do the SELECT, you just do
that against the unrolled table.
You can then create triggers, etc to keep the data up to date.
Here is a good documentation of it:
http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html
It is basically a way that you can un-normalize data, in a safe way.
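
A minimal sketch of the lazy form (hypothetical names; the linked article
covers the triggers that keep it current):

    CREATE TABLE addevicessites_mv AS
        SELECT ...      -- the big join query from earlier in the thread
        FROM AdDevicesSites
        LEFT JOIN ...;

    CREATE UNIQUE INDEX addevicessites_mv_idpk ON addevicessites_mv (IDPK);

    -- paging then touches only the unrolled table:
    SELECT * FROM addevicessites_mv ORDER BY IDPK LIMIT 30 OFFSET 6000;
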
Also, another thing that you can do, is instead of using a cursor, you
can create a temporary table with the results of the query, and create a
primary key which is just a simple counter. Then instead of doing limit
+ offset, you can select * where id > 0 and id <= 30; ... select * where
id > 30 and id <= 60; etc.
It still requires the original query to be run, though, so it is not
necessarily optimal for you.
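
A sketch of that approach (hypothetical names; the SERIAL counter is
assigned as rows arrive from the ORDER BY, which is what makes the id
ranges line up with pages):

    BEGIN;
    CREATE TEMP TABLE page_cache (id SERIAL, idpk integer) ON COMMIT DROP;
    INSERT INTO page_cache (idpk)
        SELECT IDPK FROM AdDevicesSites ORDER BY sort_key;
    SELECT * FROM page_cache WHERE id > 0  AND id <= 30;  -- page 1
    SELECT * FROM page_cache WHERE id > 30 AND id <= 60;  -- page 2
    COMMIT;

Here only the primary keys are cached; the full rows could be stored the
same way, or re-joined by idpk.
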
Can you post the original SQL statement, and maybe describe what you are
trying to do?

I hope the explanation above is clear and sufficient :-)
John
=:-

Unfortunately, I don't really see any obvious problems with your query
in the way that you are using it. 

Re: [PERFORM] How to read query plan

2005-03-13 Thread Miroslav Šulc
John Arbash Meinel wrote:
How about a quick side track.
Have you played around with your shared_buffers, maintenance_work_mem,
and work_mem settings?
I have tried to set shared_buffers to 48000 now but no speedup
(11,098.813 ms on the third try). The others are still at their defaults.
I'll look at the documentation and play with the other parameters.

What version of postgres are you using?
8.0.1
The above names changed in 8.0,
and 8.0 also has some performance improvements over the 7.4 series.
What is your hardware?
My dev notebook Acer TravelMate 292LMi
$ cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 9
model name  : Intel(R) Pentium(R) M processor 1500MHz
stepping: 5
cpu MHz : 1495.485
cache size  : 1024 KB
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 2
wp  : yes
flags   : fpu vme de pse tsc msr mce cx8 sep mtrr pge mca cmov 
pat clflush dts acpi mmx fxsr sse sse2 tm pbe est tm2
bogomips: 2957.31

$ cat /proc/meminfo
MemTotal:   516136 kB
MemFree: 18024 kB
Buffers: 21156 kB
Cached: 188868 kB
SwapCached: 24 kB
Active: 345596 kB
Inactive:   119344 kB
HighTotal:   0 kB
HighFree:0 kB
LowTotal:   516136 kB
LowFree: 18024 kB
SwapTotal: 1004020 kB
SwapFree:  1003996 kB
Dirty:   4 kB
Writeback:   0 kB
Mapped: 343676 kB
Slab:18148 kB
CommitLimit:   1262088 kB
Committed_AS:   951536 kB
PageTables:   2376 kB
VmallocTotal:   516056 kB
VmallocUsed: 90528 kB
VmallocChunk:   424912 kB
IDE disc.
# hdparm -Tt /dev/hda
/dev/hda:
Timing cached reads:   1740 MB in  2.00 seconds = 870.13 MB/sec
Timing buffered disk reads:   40 MB in  3.30 seconds =  12.10 MB/sec
Are you testing this while there is load on the
system, or under no load.
The load is low. This is a few seconds after I have run the EXPLAIN ANALYZE.
# cat /proc/loadavg
0.31 0.51 0.33 1/112 6909
Are you re-running the query multiple times, and reporting the later
speeds, or just the first time? (If nothing is loaded into memory, the
first run is easily 10x slower than later ones.)
The times change only a little. The first run was about 13 sec, the second
about 10 sec, the third about 11 sec, etc.

Just some background info. If you have set these to reasonable values,
we probably don't need to spend much time here, but it's just one of
those things to check.
Sure you are right. I'll try the other parameters.
John
=:-
Miroslav




Re: [PERFORM] How to read query plan

2005-03-13 Thread Miroslav Šulc
John Arbash Meinel wrote:
It's actually more of a question as to why you are doing left outer
joins, rather than simple joins.
Are the tables not fully populated? If so, why not?
Some records do not contain full information (they are collected from
different sources which use different approaches to data collection), so
using INNER JOIN would cause some records not to be displayed, which is
unacceptable.

How are you using this information? Why is it useful to get back rows
that don't have all of their information filled out?
Each row contains the main information, which is important. The other
information is also important but may be missing. The information is
displayed in lists of 30 rows or on a card. When using a filter the query
is much faster, but the case without a filter gives these results.

Why is it useful to have so many columns returned? It seems like in most
cases you are only going to be able to use *some* of the information,
so why not create more queries that are specialized, rather than one
"get everything" query?
Many of the columns are just varchar(1) (because of the migration from
the MySQL enum field type) so the records are not as long as they might
seem. These fields are just switches (Y(es) or N(o)). The problem is that
users can define their own templates, and in different scenarios different
information might be displayed, so reducing the number of fields would
mean that in some cases it wouldn't work as expected. But if we can't
speed the query up, we will try to improve it another way.
Is there any serious reason not to use so many fields, except memory
usage? It seems to me that it shouldn't have a great impact on the speed
in this case.

Have you thought about using a cursor instead of using limit + offset?
This may not help the overall time, but it might let you split up when
the time is spent.
..
No. I come from the MySQL world where these things are not common (at
least when using MyISAM databases). The other reason (if I understand it
well) is that the retrieval of the packages of 30 records is not
sequential. Our app is web based and we use paging. A user can select page
1 and then page 10, then go backward to page 9, etc.

And if I understand correctly, you consider all of these to be outer
joins. Meaning you want *all* of AdDevicesSites, and whatever info goes
along with it, but there are no restrictions as to what rows you want.
You want everything you can get.
Do you actually need *everything*? You mention only needing 30, what for?
For display of a single page consisting of 30 rows. The reason I query all
rows is that this is one of the filters users can use. A user can display
just bigboards or billboards (or specify more advanced filters), but
he/she can also display AdDevices without any filter (page by page).
Before I select the 30 rows, I need to order them by a key and after that
select the records, so this is also the reason to ask for all rows. The
key for sorting might be different for each run.

There is one possibility if we don't find anything nicer, which is to
create a lazy materialized view. Basically, you run this query, and
store it in a table. Then when you want to do the SELECT, you just do
that against the unrolled table.
You can then create triggers, etc to keep the data up to date.
Here is a good documentation of it:
http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html
It is basically a way that you can un-normalize data, in a safe way.
Also, another thing that you can do, is instead of using a cursor, you
can create a temporary table with the results of the query, and create a
primary key which is just a simple counter. Then instead of doing limit
+ offset, you can select * where id > 0 and id <= 30; ... select * where
id > 30 and id <= 60; etc.
It still requires the original query to be run, though, so it is not
necessarily optimal for you.
These might be the next steps in case we cannot speed up the query. I
would prefer to speed the query up :-)

Unfortunately, I don't really see any obvious problems with your query
in the way that you are using it. The problem is that you are not
applying any selectivity, so postgres has to go to all the tables, and
get all the rows, and then try to logically merge them together. It is
doing a hash merge, which is generally one of the faster ones and it
seems to be doing the right thing.
I would be curious to see how mysql was handling this query, to see if
there was something different it was trying to do. I'm also curious how
much of a difference there was.
In fact, on MySQL I didn't see any slow reactions so I didn't measure 
and inspect it. But I can try it if I figure out how to copy the 
database from PostgreSQL to MySQL.

John
=:-
Thank you for your time and help.
Miroslav

Re: [PERFORM] How to read query plan

2005-03-13 Thread Tom Lane
John Arbash Meinel [EMAIL PROTECTED] writes:
 How about a quick side track.
 Have you played around with your shared_buffers, maintenance_work_mem,
 and work_mem settings?

Indeed.  The hash joins seem unreasonably slow considering how little
data they are processing (unless this is being run on some ancient
toaster...).  One thought that comes to mind is that work_mem may be
set so small that the hashes are forced into multiple batches.

Another question worth asking is what are the data types of the columns
being joined on.  If they are character types, what locale and encoding
is the database using?

 Are you re-running the query multiple times, and reporting the later
 speeds, or just the first time? (If nothing is loaded into memory, the
 first run is easily 10x slower than later ones.)

That cost would be paid during the bottom-level scans though.  The thing
that strikes me here is that nearly all of the cost is being spent
joining.

 What version of postgres are you using?

And what's the platform (hardware and OS)?

regards, tom lane



Re: [PERFORM] How to read query plan

2005-03-13 Thread John Arbash Meinel
Miroslav Šulc wrote:
John Arbash Meinel wrote:
...
Many of the columns are just varchar(1) (because of the migration from
the MySQL enum field type) so the records are not as long as they might
seem. These fields are just switches (Y(es) or N(o)). The problem is that
users can define their own templates, and in different scenarios different
information might be displayed, so reducing the number of fields would
mean that in some cases it wouldn't work as expected. But if we can't
speed the query up, we will try to improve it another way.
Is there any serious reason not to use so many fields, except memory
usage? It seems to me that it shouldn't have a great impact on the
speed in this case.
Is there a reason to use varchar(1) instead of char(1)? There probably
is 0 performance difference, I'm just curious.

Have you thought about using a cursor instead of using limit + offset?
This may not help the overall time, but it might let you split up when
the time is spent.
..

No. I come from the MySQL world where these things are not common (at
least when using MyISAM databases). The other reason (if I understand
it well) is that the retrieval of the packages of 30 records is not
sequential. Our app is web based and we use paging. A user can select
page 1 and then page 10, then go backward to page 9, etc.
Well, with cursors you can also do "FETCH ABSOLUTE 1 FROM
cursor_name", which sets the cursor position, and then you can "FETCH
FORWARD 30".
I honestly don't know how the performance will be, but it is something
that you could try.
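
A sketch of the jump-to-a-page version (the cursor name is made up; 30
rows per page, jumping to page 10):

    BEGIN;
    DECLARE addev_cur SCROLL CURSOR FOR
        SELECT ... FROM AdDevicesSites LEFT JOIN ... ORDER BY sort_key;
    MOVE ABSOLUTE 270 IN addev_cur;   -- position just before row 271
    FETCH FORWARD 30 FROM addev_cur;  -- rows 271..300 = page 10
    COMMIT;
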
And if I understand correctly, you consider all of these to be outer
joins. Meaning you want *all* of AdDevicesSites, and whatever info goes
along with it, but there are no restrictions as to what rows you want.
You want everything you can get.
Do you actually need *everything*? You mention only needing 30, what
for?

For display of a single page consisting of 30 rows. The reason I query
all rows is that this is one of the filters users can use. A user can
display just bigboards or billboards (or specify more advanced
filters), but he/she can also display AdDevices without any filter
(page by page). Before I select the 30 rows, I need to order them by a
key and after that select the records, so this is also the reason to
ask for all rows. The key for sorting might be different for each run.
How are you caching the information in the background in order to
support paging? Since you aren't using limit/offset, and you don't seem
to be creating a temporary table, I assume you have a layer in between
the web server and the database (or possibly inside the webserver) which
keeps track of current session information. Is that true?
These might be the next steps in case we cannot speed up the query. I
would prefer to speed the query up :-)
Naturally a fast query comes first. I just have the feeling it is either a
postgres configuration problem, or a problem intrinsic to postgres.
Given your constraints, there's not much that we can change about the
query itself.
In fact, on MySQL I didn't see any slow reactions so I didn't measure
and inspect it. But I can try it if I figure out how to copy the
database from PostgreSQL to MySQL.
I figured you still had a copy of the MySQL around to compare to. You
probably don't need to spend too much time on it yet.
John
=:-




Re: [PERFORM] How to read query plan

2005-03-13 Thread Miroslav Šulc
Tom Lane wrote:
John Arbash Meinel [EMAIL PROTECTED] writes:
 

How about a quick side track.
Have you played around with your shared_buffers, maintenance_work_mem,
and work_mem settings?
   

Indeed.  The hash joins seem unreasonably slow considering how little
data they are processing (unless this is being run on some ancient
toaster...).  One thought that comes to mind is that work_mem may be
set so small that the hashes are forced into multiple batches.
 

I've just tried to uncomment the settings for these parameters, with no
impact on the query speed.

shared_buffers = 48000          # min 16, at least max_connections*2, 8KB each
work_mem = 1024                 # min 64, size in KB
maintenance_work_mem = 16384    # min 1024, size in KB
max_stack_depth = 2048          # min 100, size in KB

Another question worth asking is what are the data types of the columns
being joined on.  If they are character types, what locale and encoding
is the database using?
 

I have checked this and there are some JOINs of smallint against integer.
Is that a problem? I would use smallint for the IDPKs of some smaller
tables, but the lack of SMALLSERIAL and my laziness made me use SERIAL
instead, which is integer.
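
Incidentally, SMALLSERIAL is only missing as shorthand; an explicit
sequence gives the same effect. A sketch with a made-up table (not one
of the real ones):

    CREATE SEQUENCE site_type_id_seq;

    -- smallint primary key fed by the sequence, i.e. a hand-rolled SMALLSERIAL
    CREATE TABLE site_type (
        id   smallint PRIMARY KEY DEFAULT nextval('site_type_id_seq'),
        name varchar(100) NOT NULL
    );

Of course that only helps the joins if the referencing columns are made
smallint as well; otherwise it just reintroduces the cross-type comparisons.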

That cost would be paid during the bottom-level scans though.  The thing
that strikes me here is that nearly all of the cost is being spent
joining.
 

What version of postgres are you using?
   

And what's the platform (hardware and OS)?
 

I've already posted the hardware info. OS is Linux (Gentoo) with kernel 
2.6.11.

			regards, tom lane
 

Miroslav




Re: [PERFORM] How to read query plan

2005-03-13 Thread Miroslav Šulc
John Arbash Meinel wrote:
Is there a reason to use varchar(1) instead of char(1)? There probably
is zero performance difference; I'm just curious.
No, not at all. I'm just not used to char().
Well, with cursors you can also do FETCH ABSOLUTE 1 FROM
cursor_name, which sets the cursor position, and then you can FETCH
FORWARD 30.
I honestly don't know how the performance will be, but it is something
that you could try.
This is science for me at this moment :-)
For display of a single page consisting of 30 rows. The reason I query
all rows is that this is one of the filters users can use. The user can
display just bigboards or billboards (or specify more advanced
filters), but he/she can also display AdDevices without any filter
(page by page). Before I select the 30 rows, I need to order them by a
key and only then select the records, so that is also why I ask for
all rows. The key for sorting might be different for each run.

How are you caching the information in the background in order to
support paging? Since you aren't using limit/offset, and you don't seem
to be creating a temporary table, I assume you have a layer in between
the web server and the database (or possibly inside the web server)
which keeps track of current session information. Is that true?
I just need three pieces of information:
1) the filter used (stored in the session, identified by a filter index
   in the query string)
2) the page length (statically predefined)
3) which page to display (from the query string)
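
With the cursor sketch from earlier, those three pieces map directly
onto the fetch. For example, for page 10 with the predefined length of
30, the application would issue (addev_cur being the hypothetical cursor
declared over the filtered, ordered result):

    MOVE ABSOLUTE 270 IN addev_cur;   -- (10 - 1) * 30
    FETCH FORWARD 30 FROM addev_cur;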

In fact, on MySQL I didn't see any slow reactions so I didn't measure
and inspect it. But I can try it if I figure out how to copy the
database from PostgreSQL to MySQL.

I figured you still had a copy of the MySQL around to compare to. You
probably don't need to spend too much time on it yet.
It's not so simple, because there are some differences between MySQL and
PostgreSQL in how they handle case sensitivity etc. The table structures
are not the same either, because the two systems support different data
types and value ranges.

John
=:-
Miroslav




Re: [PERFORM] How to read query plan

2005-03-13 Thread Tom Lane
Miroslav Šulc <[EMAIL PROTECTED]> writes:
 I've just tried to uncomment the settings for these parameters, with
 no impact on the query speed.

 shared_buffers = 48000          # min 16, at least max_connections*2, 8KB each
 work_mem = 1024                 # min 64, size in KB
 maintenance_work_mem = 16384    # min 1024, size in KB
 max_stack_depth = 2048          # min 100, size in KB

Hmm.  Given the small size of the auxiliary tables, you'd think they'd
fit in 1MB work_mem no problem.  But try bumping work_mem up to 10MB
just to see if it makes a difference.  (BTW, you do know that altering
the .conf file doesn't in itself do anything?  You have to SIGHUP the
postmaster to make it notice the change ... and for certain parameters
such as shared_buffers, you actually have to stop and restart the
postmaster.  You can use the SHOW command to verify whether a change
has taken effect.)
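
For a quick experiment it may be easier to change work_mem per session,
since it does not require a restart; a minimal sketch (the SELECT stands
in for the query under discussion):

    SHOW work_mem;                -- verify the value actually in effect

    SET work_mem = 10240;         -- 10MB for this session only, size in KB
    EXPLAIN ANALYZE SELECT ...;   -- re-run the query and compare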

 I have checked this and there are some JOINs of smallint against integer.
 Is that a problem?

That probably explains why some of the joins are merges instead of
hashes --- hash join doesn't work across datatypes.  Doesn't seem like
it should be a huge problem though.  I was more concerned about the
possibility of slow locale-dependent string comparisons.

regards, tom lane



Re: [PERFORM] How to read query plan

2005-03-13 Thread Miroslav Šulc
Tom Lane wrote:
Miroslav Šulc <[EMAIL PROTECTED]> writes:
 

shared_buffers = 48000          # min 16, at least max_connections*2, 8KB each
work_mem = 1024                 # min 64, size in KB
maintenance_work_mem = 16384    # min 1024, size in KB
max_stack_depth = 2048          # min 100, size in KB
   

Hmm.  Given the small size of the auxiliary tables, you'd think they'd
fit in 1MB work_mem no problem.  But try bumping work_mem up to 10MB
just to see if it makes a difference.  (BTW, you do know that altering
the .conf file doesn't in itself do anything?  You have to SIGHUP the
postmaster to make it notice the change ... and for certain parameters
such as shared_buffers, you actually have to stop and restart the
postmaster.  You can use the SHOW command to verify whether a change
has taken effect.)
 

I've tried setting work_mem to 10240, restarted the postmaster, and reran
the EXPLAIN ANALYZE, but there is only about a 200 ms speedup.

I have checked this and there are some JOINs of smallint against integer.
Is that a problem?
   

That probably explains why some of the joins are merges instead of
hashes --- hash join doesn't work across datatypes.  Doesn't seem like
it should be a huge problem though.  I was more concerned about the
possibility of slow locale-dependent string comparisons.
 

The JOINs are only number against number. I've tried changing one of
the fields from smallint to integer, but there was no speedup.
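
For reference, 8.0 can do that conversion in place; a sketch with
made-up table and column names:

    -- widen a smallint key column to match the integer SERIAL it joins against
    ALTER TABLE some_lookup ALTER COLUMN parent_id TYPE integer;

Given that the mismatch mainly turns would-be hash joins into merge
joins, it is consistent that aligning the types bought no speedup here.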

			regards, tom lane
 

Miroslav




Re: [PERFORM] How to read query plan

2005-03-13 Thread Tom Lane
Miroslav Šulc <[EMAIL PROTECTED]> writes:
 Is the data confidential?  If you'd be willing to send me a pg_dump
 off-list, I'd like to replicate this test and try to see where the time
 is going.
 
 Thank you very much for your offer. The data are partially confidential,
 so I hashed some of the text information and changed some values (not
 the values used for the JOINs) so I could send it to you. I've checked
 with EXPLAIN ANALYZE whether anything changed, and the result is nearly
 the same (maybe about 1 sec slower, perhaps because the hashing made the
 text data longer).

No problem; thank you for supplying the test case.  What I find is
rather surprising: most of the runtime can be blamed on disassembling
and reassembling tuples during the join steps.  Here are the hot spots
according to gprof:

-----------------------------------------------
                1.27    7.38     8277/103737     ExecScan [16]
                2.93   17.02    19092/103737     ExecNestLoop <cycle 2> [14]
                3.91   22.70    25456/103737     ExecMergeJoin <cycle 2> [13]
                7.81   45.40    50912/103737     ExecHashJoin <cycle 2> [12]
[9]     86.3   15.92   92.50   103737            ExecProject [9]
                7.65   76.45  8809835/9143692    ExecEvalVar [10]
                3.42    4.57   103737/103775     heap_formtuple [17]
                0.03    0.24    12726/143737     ExecMakeFunctionResultNoSets [24]
                0.02    0.12   103737/290777     ExecStoreTuple [44]
                0.01    0.00        2/2          ExecEvalFunc [372]
                0.00    0.00        2/22         ExecMakeFunctionResult [166]
-----------------------------------------------
                0.00    0.00       42/9143692    ExecEvalFuncArgs [555]
                0.05    0.51    59067/9143692    ExecHashGetHashValue [32]
                0.24    2.38   274748/9143692    ExecMakeFunctionResultNoSets [24]
                7.65   76.45  8809835/9143692    ExecProject [9]
[10]    69.5    7.94   79.34  9143692            ExecEvalVar [10]
               79.34    0.00  8750101/9175517    nocachegetattr [11]
-----------------------------------------------

I think the reason this is popping to the top of the runtime is that the
joins are so wide (an average of ~85 columns in a join tuple according
to the numbers above).  Because there are lots of variable-width columns
involved, most of the time the fast path for field access doesn't apply
and we end up going to nocachegetattr --- which itself is going to be
slow because it has to scan over so many columns.  So the cost is
roughly O(N^2) in the number of columns.

As a short-term hack, you might be able to improve matters if you can
reorder your LEFT JOINs to have the minimum number of columns
propagating up from the earlier join steps.  In other words make the
later joins add more columns than the earlier, as much as you can.
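
To illustrate, a sketch of that reordering (every table except
AdDevicesSites is a placeholder, and the explicit LEFT JOIN order is
what gets executed, since the planner does not reorder outer joins):

    SELECT s.*, n1.name, n2.name, w.long_description
    FROM AdDevicesSites s
    LEFT JOIN narrow_lookup_1 n1 ON n1.id = s.lookup_1_id  -- few columns: join early
    LEFT JOIN narrow_lookup_2 n2 ON n2.id = s.lookup_2_id
    LEFT JOIN wide_texts w ON w.id = s.wide_id             -- wide columns: join last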

This is actually good news, because before 8.0 we had much worse
problems than this with extremely wide tuples --- there were O(N^2)
behaviors all over the place due to the old List algorithms.  Neil
Conway's rewrite of the List code got rid of those problems, and now
we can see the places that are left to optimize.  The fact that there
seems to be just one is very nice indeed.

Since ExecProject operations within a nest of joins are going to be
dealing entirely with Vars, I wonder if we couldn't speed matters up
by having a short-circuit case for a projection that is only Vars.
Essentially it would be a lot like execJunk.c, except able to cope
with two input tuples.  Using heap_deformtuple instead of retail
extraction of fields would eliminate the O(N^2) penalty for wide tuples.

regards, tom lane
