Re: [PERFORM] Comparative performance

2005-10-04 Thread PFC


It's more understandable if the table names are in front of the column names:

SELECT relationship.topic_id1, relationship.topic_id2, topic.topic_name,
	topic.categ_id, topic.list_name, topic.title, topic.url,
	topic.page_type, relationship.rel_type, entry_type.inverse_id,
	relationship.description AS rel_descrip, relationship.created,
	relationship.updated
FROM relationship, topic, entry_type
WHERE
	((relationship.topic_id1 = topic.topic_id AND relationship.topic_id2 = 1252)
	OR (relationship.topic_id2 = topic.topic_id AND relationship.topic_id1 = 1252))
	AND relationship.rel_type = entry_type.type_id
	AND entry_type.class_id = 2
ORDER BY rel_type, list_name;

I see a few problems in your schema.
- topic_id1 and topic_id2 play the same role; there is no constraint to
determine which is which, so it is possible to define the same
relationship twice (once in each direction).

- as you search on two columns with OR, you need a UNION for the indexes
to be used.
- lack of indexes
- I don't understand why the planner doesn't pick a hash join...
- if you use a version before 8, the type mismatch will prevent use of
the indexes.


I'd suggest rewriting the query like this:

SELECT topic.*, foo.*
FROM topic,
	(SELECT topic_id2 AS fetch_id, topic_id1, topic_id2, rel_type,
		description AS rel_descrip, created, updated
	FROM relationship
	WHERE rel_type IN (SELECT type_id FROM entry_type WHERE class_id = 2)
		AND topic_id1 = 1252
	UNION
	SELECT topic_id1 AS fetch_id, topic_id1, topic_id2, rel_type,
		description AS rel_descrip, created, updated
	FROM relationship
	WHERE rel_type IN (SELECT type_id FROM entry_type WHERE class_id = 2)
		AND topic_id2 = 1252) AS foo
WHERE topic.topic_id = foo.fetch_id


CREATE INDEX'es ON

entry_type( class_id )

relationship( topic_id1, rel_type, topic_id2 )    which becomes your new PRIMARY KEY

relationship( topic_id2, rel_type, topic_id1 )
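
Spelled out as DDL, that would be something like this (a sketch; the index
names are made up, and relationship_pkey assumes PostgreSQL's default name
for the existing primary key):

CREATE INDEX entry_type_class_id ON entry_type (class_id);

-- reorder the primary key so rel_type sits between the two topic ids
ALTER TABLE relationship DROP CONSTRAINT relationship_pkey;
ALTER TABLE relationship ADD PRIMARY KEY (topic_id1, rel_type, topic_id2);

-- the mirror-image index for lookups that enter through topic_id2
CREATE INDEX relationship_topic2_rel ON relationship (topic_id2, rel_type, topic_id1);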

Of course, this doesn't explain how MySQL manages to execute the query  
in about 9 msec.  The only minor differences in the schema are:   
entry_type.title and rel_title are char(32) in MySQL,  
entry_type.class_id is a tinyint, and topic.categ_id, page_type and  
dark_ind are also tinyints. MySQL also doesn't have the REFERENCES.


Can you post the result from MySQL's EXPLAIN?

You might be interested in the following code. Just replace mysql_ with
pg_; it's quite useful.


$global_queries_log = array();

function _getmicrotime()
{
	list($u, $s) = explode(' ', microtime());
	return $u + $s;
}

/*	Formats query, with given arguments, escaping all strings as needed.
	db_quote_query( 'UPDATE junk SET a=%s WHERE b=%s', array( 1, "po'po" ) )
	=>	UPDATE junk SET a='1' WHERE b='po\'po'
*/
function db_quote_query( $sql, $params=false )
{
	// if no params, send query raw
	if( !$params )
		return $sql;

	// quote and escape each param (arrays become comma-separated lists)
	foreach( $params as $key => $val )
	{
		if( is_array( $val ))
			$val = implode( ',', $val );
		$params[$key] = "'".mysql_real_escape_string( $val )."'";
	}
	return vsprintf( $sql, $params );
}

/*	Formats query, with given arguments, escaping all strings as needed.
	Runs the query, logging its execution time.
	Returns the query result, or dies with an error.
*/
function db_query( $sql, $params=false )
{
	// it's already a query result
	if( is_resource( $sql ))
		return $sql;

	$sql = db_quote_query( $sql, $params );

	$t = _getmicrotime();
	$r = mysql_query( $sql );
	if( !$r )
	{
		// dump the error, the query and a backtrace, then abort
		echo "<div class=bigerror><b>MySQL error:</b><br>".mysql_error()
			."<br><br><b>Query</b>:<br>".$sql
			."<br><br><b>Traceback</b>:<pre>";
		foreach( debug_backtrace() as $t ) xdump( $t );
		echo "</pre></div>";
		die();
	}
	global $global_queries_log;
	$global_queries_log[] = array( _getmicrotime()-$t, $sql );
	return $r;
}

At the end of your page, display the contents of $global_queries_log.




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Comparative performance

2005-10-04 Thread Jim C. Nasby
On Thu, Sep 29, 2005 at 08:44:16AM -0400, Joe wrote:
 CREATE TABLE entry (
   entry_id serial PRIMARY KEY,
   title VARCHAR(128) NOT NULL,
   subtitle VARCHAR(128),
   subject_type SMALLINT,
   subject_id INTEGER REFERENCES topic,
   actor_type SMALLINT,
   actor_id INTEGER REFERENCES topic,
   actor VARCHAR(64),
   actor_role VARCHAR(64),
   rel_entry_id INTEGER,
   rel_entry VARCHAR(64),
   description VARCHAR(255),
   quote text,
   url VARCHAR(255),
   entry_date CHAR(10),
   created DATE NOT NULL DEFAULT CURRENT_DATE,
   updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP)
 WITHOUT OIDS;
 CREATE INDEX entry_actor_id ON entry (actor_id);
 CREATE INDEX entry_subject_id ON entry (subject_id);

A few tips...

Fields in PostgreSQL have alignment requirements, so the smallints
aren't saving you anything right now. If you put both of them together
though, you'll save 4 bytes on most hardware.

You'll also get some minor gains from putting all the variable-length
fields at the end, as well as nullable fields. If you search the
archives for 'field order' you should be able to find some useful info.
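
For instance, a cut-down sketch of the idea (entry_packed is only an
illustrative name): fixed-width columns first, the two smallints adjacent
so they pack into a single 4-byte-aligned slot, and nullable or
variable-length columns at the end.

CREATE TABLE entry_packed (
  entry_id serial PRIMARY KEY,
  subject_type SMALLINT,     -- adjacent smallints share one aligned slot
  actor_type SMALLINT,
  subject_id INTEGER REFERENCES topic,
  actor_id INTEGER REFERENCES topic,
  created DATE NOT NULL DEFAULT CURRENT_DATE,
  updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
  title VARCHAR(128) NOT NULL,   -- variable-length and nullable columns last
  subtitle VARCHAR(128),
  description VARCHAR(255),
  url VARCHAR(255),
  quote TEXT
) WITHOUT OIDS;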

Make sure these indexes exist if you'll be updating or inserting into
entry:

CREATE INDEX topic__subject_id ON topic(subject_id);
CREATE INDEX topic__actor_id ON topic(actor_id);

Also, the fact that subject and actor both point to topic along with
subject_type and actor_type make me suspect that your design is
de-normalized. Of course there's no way to know without more info.

FWIW, I usually use timestamptz for both created and updated fields.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Comparative performance

2005-10-04 Thread Jim C. Nasby
On Thu, Sep 29, 2005 at 04:39:36PM -0400, Joe wrote:
 Andreas Pflug wrote:
 Hm, if you only have 4 tables, why do you need 12 queries?
 To reduce queries, join them in the query; no need to merge them 
 physically. If you have only two main tables, I'd bet you only need 1-2 
 queries for the whole page.
 
 There are more than four tables and the queries are not functionally 
 overlapping.  As an example, allow me to refer to the page 
 www.freedomcircle.com/topic.php/Economists.
 
 The top row of navigation buttons (Life, Liberty, etc.) is created from a 
 query of the 'topic' table.  It could've been hard-coded as a PHP array, 
 but with less flexibility.  The alphabetical links are from a SELECT 
 DISTINCT substring from topic.  It could've been generated by a PHP for 
 loop (originally implemented that way) but again with less flexibility.  
 The listing of economists is another SELECT from topic.  The subheadings 
 (Articles, Books) come from a SELECT of an entry_type table --which 
 currently has 70 rows-- and is read into a PHP array since we don't know 
 what headings will be used in a given page.  The detail of the entries 

I suspect this might be something better done in a join.

 comes from that query that I posted earlier, but there are three additional 
 queries that are used for specialized entry types (relationships between 
 topics --e.g., Prof. Williams teaches at George Mason, events, and 
 multi-author or multi-subject articles and books).  And there's yet another 

Likewise...

 table for the specific book information.  Once the data is retrieved it's 
 sorted internally with PHP, at the heading level, before display.

It's often better to let the database sort and/or aggregate data.

 Maybe there is some way to merge all the queries (some already fairly 
 complex) that fetch the data for the entries box but I believe it would be 
 a monstrosity with over 100 lines of SQL.

Also, just because no one else has mentioned it, remember that it's very
easy to get MySQL into a mode where you have no data integrity. If
that's the case it's going to be faster than PostgreSQL (though I'm not
sure how much that affects the performance of SELECTs).
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Comparative performance

2005-10-04 Thread Joe

Hi Jim,

Jim C. Nasby wrote:

Also, just because no one else has mentioned it, remember that it's very
easy to get MySQL into a mode where you have no data integrity. If
that's the case it's going to be faster than PostgreSQL (though I'm not
sure how much that affects the performance of SELECTs).


Yes indeed.  When I added the REFERENCES to the schema and reran the conversion 
scripts, aside from having to reorder the table creation and loading (they used 
to be in alphabetical order), I also found a few referential integrity errors in 
the MySQL data.


Joe


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Comparative performance

2005-10-04 Thread Joe

PFC wrote:
- if you use a version before 8, type mismatch will prevent use of the  
indexes.


I'm using 8.0.3, but the type mismatch between relationship.rel_type and 
entry_type.type_id was unintended.  The current databases use SMALLINT for both. 
 The PostgreSQL schema was derived from an export script stored in Subversion, 
apparently before the column datatypes were changed.
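
If the live PostgreSQL schema still carries the stale type on one side,
something like this would bring the two columns back in line (a sketch;
8.0 can change a column's type in place, rebuilding dependent indexes):

ALTER TABLE relationship ALTER COLUMN rel_type TYPE SMALLINT;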



CREATE INDEX'es ON
entry_type( class_id )

relationship( topic_id1, rel_type, topic_id2 )    which becomes your new PRIMARY KEY

relationship( topic_id2, rel_type, topic_id1 )


Creating the second relationship index was sufficient to change the query plan 
and cut the reported runtime down to zero:


 Sort  (cost=75.94..75.95 rows=2 width=381) (actual time=0.000..0.000 rows=0 loops=1)
   Sort Key: r.rel_type, t.list_name
   ->  Nested Loop  (cost=16.00..75.93 rows=2 width=381) (actual time=0.000..0.000 rows=0 loops=1)
         Join Filter: ((("outer".topic_id1 = "inner".topic_id) AND ("outer".topic_id2 = 1252)) OR (("outer".topic_id2 = "inner".topic_id) AND ("outer".topic_id1 = 1252)))
         ->  Nested Loop  (cost=16.00..35.11 rows=1 width=169) (actual time=0.000..0.000 rows=0 loops=1)
               Join Filter: ("inner".rel_type = "outer".type_id)
               ->  Seq Scan on entry_type e  (cost=0.00..18.75 rows=4 width=4) (actual time=0.000..0.000 rows=15 loops=1)
                     Filter: (class_id = 2)
               ->  Materialize  (cost=16.00..16.04 rows=4 width=167) (actual time=0.000..0.000 rows=0 loops=15)
                     ->  Seq Scan on relationship r  (cost=0.00..16.00 rows=4 width=167) (actual time=0.000..0.000 rows=0 loops=1)
                           Filter: ((topic_id2 = 1252) OR (topic_id1 = 1252))
         ->  Seq Scan on topic t  (cost=0.00..30.94 rows=494 width=216) (never executed)
 Total runtime: 0.000 ms
(13 rows)

The overall execution time for the Economists page for PostgreSQL is within 4% 
of the MySQL time, so for the time being I'll leave the query in its current form.


Thanks for your help.

Joe


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Comparative performance

2005-10-04 Thread Joe

Jim C. Nasby wrote:

Make sure these indexes exist if you'll be updating or inserting into
entry:

CREATE INDEX topic__subject_id ON topic(subject_id);
CREATE INDEX topic__actor_id ON topic(actor_id);


Actually, topic's primary key is topic_id.


Also, the fact that subject and actor both point to topic along with
subject_type and actor_type make me suspect that your design is
de-normalized. Of course there's no way to know without more info.


Yes, the design is denormalized.  The reason is that a book or article is 
usually by a single author (an actor topic) and it will be listed under one 
main topic (a subject topic).  There's a topic_entry table where additional 
actors and subjects can be added.


It's somewhat ironic because I used to teach and/or preach normalization and the 
goodness of a 3NF+ design (also about having the database do aggregation and 
sorting as you mentioned in your other email).



FWIW, I usually use timestamptz for both created and updated fields.


IIRC 'created' ended up as a DATE because MySQL 4 only allows a single 
TIMESTAMP column per table to take a default of CURRENT_TIMESTAMP.
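
On the PostgreSQL side there is no such restriction, so both columns can
be timestamptz, as Jim suggests (a sketch; audit_demo is only an
illustrative table name):

CREATE TABLE audit_demo (
	id serial PRIMARY KEY,
	created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
	updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP
) WITHOUT OIDS;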


Joe


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Comparative performance

2005-10-04 Thread Jim C. Nasby
On Tue, Oct 04, 2005 at 05:11:19PM -0400, Joe wrote:
 Hi Jim,
 
 Jim C. Nasby wrote:
 Also, just because no one else has mentioned it, remember that it's very
 easy to get MySQL into a mode where you have no data integrity. If
 that's the case it's going to be faster than PostgreSQL (though I'm not
 sure how much that affects the performance of SELECTs).
 
 Yes indeed.  When I added the REFERENCES to the schema and reran the 
 conversion scripts, aside from having to reorder the table creation and 
 loading (they used to be in alphabetical order), I also found a few 
 referential integrity errors in the MySQL data.

Data integrity != referential integrity. :) It's very easy to
accidentally get MyISAM tables in MySQL, which means you are nowhere
near ACID, which in turn means you can't get anything close to an
apples-to-apples comparison with PostgreSQL.
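
A quick way to check from the MySQL side (a sketch; MySQL 4.0 spells the
conversion TYPE=..., while 4.1 and later prefer ENGINE=...):

SHOW TABLE STATUS;                       -- the Type column shows MyISAM or InnoDB
ALTER TABLE relationship TYPE = InnoDB;  -- convert a table that should be transactional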
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Comparative performance

2005-10-04 Thread Dario
PostgreSQL uses MVCC to ensure data integrity. The server must choose the
right version of each tuple according to the transaction ID of the
statement. Even for a SELECT (of PostgreSQL's ACID features, I think C and
I apply here), it must do some extra work.

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Joe
Sent: Tuesday, October 4, 2005 18:11
To: Jim C. Nasby
CC: Andreas Pflug; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Comparative performance


Hi Jim,

Jim C. Nasby wrote:
 Also, just because no one else has mentioned it, remember that it's very
 easy to get MySQL into a mode where you have no data integrity. If
 that's the case it's going to be faster than PostgreSQL (though I'm not
 sure how much that affects the performance of SELECTs).

Yes indeed.  When I added the REFERENCES to the schema and reran the
conversion scripts, aside from having to reorder the table creation and
loading (they used to be in alphabetical order), I also found a few
referential integrity errors in the MySQL data.

Joe




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Comparative performance

2005-10-03 Thread Joe

PFC wrote:
Even though this query isn't that optimized, it's still only 16  
milliseconds.

Why does it take this long for PHP to get the results ?

Can you try pg_query'ing this exact same query, FROM PHP, and timing 
it  with getmicrotime() ?


That query took about 27 msec in actual PHP execution time.  It turns out the 
real culprit is the following query, which interestingly enough retrieves zero 
rows in the case of the Economists page that I've been using for testing, yet it 
uses up about 1370 msec in actual runtime:


SELECT topic_id1, topic_id2, topic_name, categ_id, list_name, t.title,
	url, page_type, rel_type, inverse_id, r.description AS rel_descrip,
	r.created, r.updated
FROM relationship r, topic t, entry_type e
WHERE ((topic_id1 = topic_id AND topic_id2 = 1252)
	OR (topic_id2 = topic_id AND topic_id1 = 1252))
	AND rel_type = type_id AND e.class_id = 2
ORDER BY rel_type, list_name;

The EXPLAIN ANALYZE output, after I ran VACUUM ANALYZE on the three tables, is:

 Sort  (cost=4035.55..4035.56 rows=1 width=131) (actual time=2110.000..2110.000 rows=0 loops=1)
   Sort Key: r.rel_type, t.list_name
   ->  Nested Loop  (cost=36.06..4035.54 rows=1 width=131) (actual time=2110.000..2110.000 rows=0 loops=1)
         Join Filter: ((("inner".topic_id1 = "outer".topic_id) AND ("inner".topic_id2 = 1252)) OR (("inner".topic_id2 = "outer".topic_id) AND ("inner".topic_id1 = 1252)))
         ->  Seq Scan on topic t  (cost=0.00..38.34 rows=1234 width=90) (actual time=0.000..15.000 rows=1234 loops=1)
         ->  Materialize  (cost=36.06..37.13 rows=107 width=45) (actual time=0.000..0.509 rows=466 loops=1234)
               ->  Merge Join  (cost=30.31..35.96 rows=107 width=45) (actual time=0.000..0.000 rows=466 loops=1)
                     Merge Cond: ("outer".type_id = "inner".rel_type)
                     ->  Index Scan using entry_type_pkey on entry_type e  (cost=0.00..3.94 rows=16 width=4) (actual time=0.000..0.000 rows=15 loops=1)
                           Filter: (class_id = 2)
                     ->  Sort  (cost=30.31..31.48 rows=466 width=43) (actual time=0.000..0.000 rows=466 loops=1)
                           Sort Key: r.rel_type
                           ->  Seq Scan on relationship r  (cost=0.00..9.66 rows=466 width=43) (actual time=0.000..0.000 rows=466 loops=1)
 Total runtime: 2110.000 ms
(14 rows)

The tables are as follows:

CREATE TABLE entry_type (
  type_id SMALLINT NOT NULL PRIMARY KEY,
  title VARCHAR(32) NOT NULL,
  rel_title VARCHAR(32),
  class_id SMALLINT NOT NULL DEFAULT 1,
  inverse_id SMALLINT,
  updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP)
WITHOUT OIDS;

CREATE TABLE topic (
  topic_id serial PRIMARY KEY,
  topic_name VARCHAR(48) NOT NULL UNIQUE,
  categ_id SMALLINT NOT NULL,
  parent_entity INTEGER,
  parent_concept INTEGER,
  crossref_id INTEGER,
  list_name VARCHAR(80) NOT NULL,
  title VARCHAR(80),
  description VARCHAR(255),
  url VARCHAR(64),
  page_type SMALLINT NOT NULL,
  dark_ind BOOLEAN NOT NULL DEFAULT FALSE,
  ad_code INTEGER,
  created DATE NOT NULL DEFAULT CURRENT_DATE,
  updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP)
WITHOUT OIDS;

CREATE TABLE relationship (
  topic_id1 INTEGER NOT NULL REFERENCES topic,
  topic_id2 INTEGER NOT NULL REFERENCES topic,
  rel_type INTEGER NOT NULL,
  description VARCHAR(255),
  created DATE NOT NULL DEFAULT CURRENT_DATE,
  updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (topic_id1, topic_id2, rel_type))
WITHOUT OIDS;

I'm thinking that perhaps I need to set up another index with topic_id2 first 
and topic_id1 second.  In addition, an index on entry_type.class_id may improve 
things.  Another possibility would be to rewrite the query as a UNION.
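
Concretely, those two indexes would look something like this (a sketch;
the index names are made up):

CREATE INDEX relationship_topic_id2 ON relationship (topic_id2, topic_id1);
CREATE INDEX entry_type_class_id ON entry_type (class_id);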


Of course, this doesn't explain how MySQL manages to execute the query in about 
9 msec.  The only minor differences in the schema are:  entry_type.title and 
rel_title are char(32) in MySQL, entry_type.class_id is a tinyint, and 
topic.categ_id, page_type and dark_ind are also tinyints. MySQL also doesn't 
have the REFERENCES.


A couple of interesting side notes from my testing.  First, pg_connect()
took about 39 msec but mysql_connect() took only 4 msec; on the other hand,
pg_pconnect() took 0.14 msec while mysql_pconnect() took 0.99 msec (all
tests were repeated five times and the quoted results are averages).
Second, PostgreSQL's performance appears to be much more consistent on
certain queries.  For example, the query that retrieves the list of
subtopics (the names and descriptions of economists) took 17 msec in PG,
with a low of 15 (three times) and a high of 21, whereas MySQL took 60 msec
on average, with a low of 22 and a high of 102 msec.


Joe


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Comparative performance

2005-09-30 Thread Magnus Hagander
  This smells like a TCP communication problem.
 
 I'm puzzled by that remark.  How much does TCP get into the 
 picture in a local Windows client/server environment?

Windows has no Unix Domain Sockets (no surprise there), so TCP
connections over the loopback interface are used to connect to the
server.

//Magnus

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Comparative performance

2005-09-29 Thread PFC


It appears that PostgreSQL is two to three times slower than MySQL.  For  
example, some pages that have some 30,000 characters (when saved as  
HTML) take 1 to 1 1/2 seconds with MySQL but 3 to 4 seconds with  
PostgreSQL.  I had read that the former was generally faster than the  
latter, particularly for simple web applications but I was hoping that  
Postgres' performance would not be that noticeably slower.


	From my experience, the postgres libraries in PHP are a piece of crap,
and add a lot of overhead even for small queries.
	For instance, a simple query like SELECT * FROM table WHERE
primary_key_id=1234 can take the following time, on my laptop, with data
in the filesystem cache of course:

EXPLAIN ANALYZE     0.1 ms
python + psycopg 2  0.1 ms (damn fast)
php + mysql         0.3 ms
php + postgres      1-2 ms (damn slow)

	So, if your pages are designed in The PHP Way (i.e. a large number of
small queries), I might suggest using a language with a decent postgres
interface (python, among others), or rewriting your bunches of small
queries as stored procedures or joins, which will provide large speedups.
Doing 50 queries on a page is always a bad idea, but it's tolerable in
php-mysql, not in php-postgres.

	If it's only one large query, there is a problem, as postgres is
usually a lot smarter about query optimization.

	If you use the usual mysql techniques (like storing a page counter in a
row in a table, or storing sessions in a table), beware: these are no-nos
for postgres.  These things should NOT be done with a database anyway; try
memcached, for instance.


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Comparative performance

2005-09-29 Thread Joe

Magnus Hagander wrote:

That actually depends a lot on *how* you use it. I've seen pg-on-windows
deployments that come within a few percent of the linux performance.
I've also seen those that are absolutely horrible compared.

One sure way to kill the performance is to do a lot of small
connections. Using persistent connection is even more important on
Windows than it is on Unix. It could easily explain a difference like
this.


I just tried using pg_pconnect() and I didn't notice any significant 
improvement.  What bothers me most is that with Postgres I tend to see jerky 
behavior on almost every page:  the upper 1/2 or 2/3 of the page is displayed 
first and you can see a blank bottom (or you can see a half-filled completion 
bar).  With MySQL each page is generally displayed in one swoop.


Joe


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Comparative performance

2005-09-29 Thread Steinar H. Gunderson
On Thu, Sep 29, 2005 at 08:16:11AM -0400, Joe wrote:
 I just tried using pg_pconnect() and I didn't notice any significant 
 improvement.

PHP persistent connections are not really persistent -- or so I've been told.

Anyhow, what was discussed here was pg_query, not pg_connect. You really want
to reduce the number of pg_query() calls in any case; you haven't told us how
many there are yet, but it sounds like there are a lot of them.

 What bothers me most is that with Postgres I tend to see jerky behavior on
 almost every page:  the upper 1/2 or 2/3 of the page is displayed first and
 you can see a blank bottom (or you can see a half-filled completion bar).
 With MySQL each page is generally displayed in one swoop.

This might just be your TCP/IP stack finding out that the rest of the page
isn't likely to come anytime soon, and start sending it out... or something
else. I wouldn't put too much weight on it, it's likely to go away as soon as
you fix the rest of the problem.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Comparative performance

2005-09-29 Thread Gavin Sherry
On Thu, 29 Sep 2005, Joe wrote:

 Magnus Hagander wrote:
  That actually depends a lot on *how* you use it. I've seen pg-on-windows
  deployments that come within a few percent of the linux performance.
  I've also seen those that are absolutely horrible compared.
 
  One sure way to kill the performance is to do a lot of small
  connections. Using persistent connection is even more important on
  Windows than it is on Unix. It could easily explain a difference like
  this.

 I just tried using pg_pconnect() and I didn't notice any significant
 improvement.  What bothers me most is that with Postgres I tend to see jerky
 behavior on almost every page:  the upper 1/2 or 2/3 of the page is displayed
 first and you can see a blank bottom (or you can see a half-filled completion
 bar).  With MySQL each page is generally displayed in one swoop.

Please post the table definitions, queries and explain analyze results so
we can tell you why the performance is poor.

Gavin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Comparative performance

2005-09-29 Thread Joe

PFC wrote:
From my experience, the postgres libraries in PHP are a piece of 
crap,  and add a lot of overhead even from small queries.
For instance, a simple query like SELECT * FROM table WHERE  
primary_key_id=1234 can take the following time, on my laptop, with 
data  in the filesystem cache of course :


EXPLAIN ANALYZE     0.1 ms
python + psycopg 2  0.1 ms (damn fast)
php + mysql         0.3 ms
php + postgres      1-2 ms (damn slow)


As a Trac user I was considering moving to Python, so it's good to know that, 
but the rewrite is a longer-term project.


So, if your pages are designed in The PHP Way (ie. a large number 
of  small queries), I might suggest using a language with a decent 
postgres  interface (python, among others), or rewriting your bunches of 
small  queries as Stored Procedures or Joins, which will provide large 
speedups.  Doing 50 queries on a page is always a bad idea, but it's 
tolerable in  php-mysql, not in php-postgres.


The pages do use a number of queries to collect all the data for display but 
nowhere near 50.  I'd say it's probably less than a dozen.  As an aside, one of 
my tasks (before the conversion) was to analyze the queries and see where they 
could be tweaked for performance, but with MySQL that was never a top priority.


The schema is fairly simple having two main tables: topic and entry (sort of 
like account and transaction in an accounting scenario).  There are two 
additional tables that perhaps could be merged into the entry table (and that 
would reduce the number of queries) but I do not want to make major changes to 
the schema (and the app) for the PostgreSQL conversion.


Joe


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Comparative performance

2005-09-29 Thread Joe

Gavin Sherry wrote:

Please post the table definitions, queries and explain analyze results so
we can tell you why the performance is poor.


I did try to post that last night but apparently my reply didn't make it to the 
list.  Here it is again:


Matthew Nuzum wrote:

 This is the right list. Post detail and I'm sure you'll get some suggestions.


Thanks, Matthew (and Chris and Gavin).

The main table used in the query is defined as follows:

CREATE TABLE entry (
  entry_id serial PRIMARY KEY,
  title VARCHAR(128) NOT NULL,
  subtitle VARCHAR(128),
  subject_type SMALLINT,
  subject_id INTEGER REFERENCES topic,
  actor_type SMALLINT,
  actor_id INTEGER REFERENCES topic,
  actor VARCHAR(64),
  actor_role VARCHAR(64),
  rel_entry_id INTEGER,
  rel_entry VARCHAR(64),
  description VARCHAR(255),
  quote text,
  url VARCHAR(255),
  entry_date CHAR(10),
  created DATE NOT NULL DEFAULT CURRENT_DATE,
  updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP)
WITHOUT OIDS;
CREATE INDEX entry_actor_id ON entry (actor_id);
CREATE INDEX entry_subject_id ON entry (subject_id);

It has 3422 rows at this time.

The query for one of the pages is the following:

SELECT entry_id, subject_type AS type, subject_type, subject_id,
	actor_type, actor_id, actor, actor_role, rel_entry_id, rel_entry,
	title, subtitle, description, url, quote AS main_quote,
	NULL AS rel_quote, substring(entry_date from 8) AS dom,
	substring(entry_date from 1) AS date_ymd,
	substring(entry_date from 1 for 7) AS date_ym,
	substring(entry_date from 1 for 4) AS date_y, created, updated
FROM entry WHERE subject_id = 1079
UNION
SELECT entry_id, actor_type AS type, subject_type, subject_id,
	actor_type, actor_id, actor, actor_role, rel_entry_id, rel_entry,
	title, subtitle, description, url, quote AS main_quote,
	NULL AS rel_quote, substring(entry_date from 8) AS dom,
	substring(entry_date from 1) AS date_ymd,
	substring(entry_date from 1 for 7) AS date_ym,
	substring(entry_date from 1 for 4) AS date_y, created, updated
FROM entry WHERE actor_id = 1079
ORDER BY type, title, subtitle;


The output of EXPLAIN ANALYZE is:

 Sort  (cost=158.98..159.14 rows=62 width=568) (actual time=16.000..16.000 rows=59 loops=1)
   Sort Key: type, title, subtitle
   ->  Unique  (cost=153.57..157.14 rows=62 width=568) (actual time=16.000..16.000 rows=59 loops=1)
         ->  Sort  (cost=153.57..153.73 rows=62 width=568) (actual time=16.000..16.000 rows=59 loops=1)
               Sort Key: entry_id, type, subject_type, subject_id, actor_type, actor_id, actor, actor_role, rel_entry_id, rel_entry, title, subtitle, description, url, main_quote, rel_quote, dom, date_ymd, date_ym, date_y, created, updated
               ->  Append  (cost=0.00..151.73 rows=62 width=568) (actual time=0.000..16.000 rows=59 loops=1)
                     ->  Subquery Scan "*SELECT* 1"  (cost=0.00..17.21 rows=4 width=568) (actual time=0.000..0.000 rows=3 loops=1)
                           ->  Index Scan using entry_subject_id on entry  (cost=0.00..17.17 rows=4 width=568) (actual time=0.000..0.000 rows=3 loops=1)
                                 Index Cond: (subject_id = 1079)
                     ->  Subquery Scan "*SELECT* 2"  (cost=0.00..134.52 rows=58 width=568) (actual time=0.000..16.000 rows=56 loops=1)
                           ->  Seq Scan on entry  (cost=0.00..133.94 rows=58 width=568) (actual time=0.000..16.000 rows=56 loops=1)
                                 Filter: (actor_id = 1079)
 Total runtime: 16.000 ms
(13 rows)

What I don't quite understand is why it's doing a sequential scan on actor_id 
instead of using the entry_actor_id index.  Note that actor_id has 928 non-null 
values (27%), whereas subject_id has 3089 non-null values (90%).
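
One way to check whether the planner's choice is actually right is to
price the index path for the actor_id arm of the query (a diagnostic
sketch for a psql session only, not something to leave on in production):

SET enable_seqscan = off;   -- make sequential scans look prohibitively expensive
EXPLAIN ANALYZE SELECT * FROM entry WHERE actor_id = 1079;
SET enable_seqscan = on;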


Note that the entry_date column was originally a MySQL date but it had partial 
dates, i.e., some days and months are set to zero.  Eventually I hope to define 
a PostgreSQL datatype for it and to simplify the substring retrievals.  However, 
I don't think the extra computational time should affect the overall runtime 
significantly.
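
For what it's worth, until such a datatype exists, a CHECK-constrained
domain could at least police the format (a sketch; partial_date is a
made-up name, and zeroed month/day parts are what make the dates partial):

CREATE DOMAIN partial_date AS CHAR(10)
	CHECK (VALUE ~ '^[0-9]{4}-[0-9]{2}-[0-9]{2}$');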


Gavin, I'm using PostgreSQL 8.0.3, Apache 1.3.28, PHP 4.3.4, MySQL 4.0.16 and 
I'm comparing both databases on XP (on a Pentium 4, 1.6 GHz, 256 MB RAM).


Thanks for any feedback.

Joe


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Comparative performance

2005-09-29 Thread Andreas Pflug

Joe wrote:




The pages do use a number of queries to collect all the data for display 
but nowhere near 50.  I'd say it's probably less than a dozen.  

The schema is fairly simple having two main tables: topic and entry 
(sort of like account and transaction in an accounting scenario).  There 
are two additional tables that perhaps could be merged into the entry 
table 


Hm, if you only have 4 tables, why do you need 12 queries?
To reduce queries, join them in the query; no need to merge them 
physically. If you have only two main tables, I'd bet you only need 1-2 
queries for the whole page.
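
For instance, a topic and its entries could come back in one round trip
along these lines (a sketch reusing the posted entry/topic schema; the
WHERE clause is only an example):

SELECT t.topic_name, e.title, e.url
FROM topic t
	JOIN entry e ON e.subject_id = t.topic_id
WHERE t.topic_id = 1079;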


Regards,
Andreas

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Comparative performance

2005-09-29 Thread PFC



  Total runtime: 16.000 ms


	Even though this query isn't that optimized, it's still only 16  
milliseconds.

Why does it take this long for PHP to get the results ?

	Can you try pg_query'ing this exact same query, FROM PHP, and timing it  
with getmicrotime() ?


	You can even do an EXPLAIN ANALYZE from pg_query and display the results  
in your webpage, to check how long the query takes on the server.


You can also try it on a Linux box.

This smells like a TCP communication problem.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Comparative performance

2005-09-29 Thread Joe

Andreas Pflug wrote:

Hm, if you only have 4 tables, why do you need 12 queries?
To reduce queries, join them in the query; no need to merge them 
physically. If you have only two main tables, I'd bet you only need 1-2 
queries for the whole page.


There are more than four tables and the queries are not functionally 
overlapping.  As an example, allow me to refer to the page 
www.freedomcircle.com/topic.php/Economists.


The top row of navigation buttons (Life, Liberty, etc.) is created from a query 
of the 'topic' table.  It could've been hard-coded as a PHP array, but with less 
flexibility.  The alphabetical links are from a SELECT DISTINCT substring from 
topic.  It could've been generated by a PHP for loop (originally implemented 
that way) but again with less flexibility.  The listing of economists is another 
SELECT from topic.  The subheadings (Articles, Books) come from a SELECT of an 
entry_type table --which currently has 70 rows-- and is read into a PHP array 
since we don't know what headings will be used in a given page.  The detail of 
the entries comes from that query that I posted earlier, but there are three 
additional queries that are used for specialized entry types (relationships 
between topics --e.g., Prof. Williams teaches at George Mason, events, and 
multi-author or multi-subject articles and books).  And there's yet another 
table for the specific book information.  Once the data is retrieved it's sorted 
internally with PHP, at the heading level, before display.


Maybe there is some way to merge all the queries (some already fairly complex) 
that fetch the data for the entries box but I believe it would be a monstrosity 
with over 100 lines of SQL.


Thanks,

Joe


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Comparative performance

2005-09-29 Thread Joe

PFC wrote:
Even though this query isn't that optimized, it's still only 16  
milliseconds.

Why does it take this long for PHP to get the results ?

Can you try pg_query'ing this exact same query, FROM PHP, and timing 
it  with getmicrotime() ?


Thanks, that's what I was looking for.  It's microtime(), BTW.  It'll take me 
some time to instrument it, but that way I can pinpoint what is really slow.


You can even do an EXPLAIN ANALYZE from pg_query and display the 
results  in your webpage, to check how long the query takes on the server.


You can also try it on a Linux box.


My current host only supports MySQL.  I contacted hub.org to see if they could 
assist in this transition but I haven't heard back.



This smells like a TCP communication problem.


I'm puzzled by that remark.  How much does TCP get into the picture in a local 
Windows client/server environment?


Joe


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Comparative performance

2005-09-28 Thread Gavin Sherry
On Wed, 28 Sep 2005, Joe wrote:

 I'm converting a relatively small database (2 MB) from MySQL to PostgreSQL.  
 It
 is used to generate web pages using PHP.  Although the actual website runs 
 under
 Linux, the development is done under XP.  I've completed most of the data
 conversion and rewrite of the PHP scripts, so now I'm comparing relative
 performance.

 It appears that PostgreSQL is two to three times slower than MySQL.  For
 example, some pages that have some 30,000 characters (when saved as HTML) 
 take 1
 to 1 1/2 seconds with MySQL but 3 to 4 seconds with PostgreSQL.  I had read 
 that
 the former was generally faster than the latter, particularly for simple web
 applications but I was hoping that Postgres' performance would not be that
 noticeably slower.

Are you comparing PostgreSQL on XP to MySQL on XP or PostgreSQL on Linux
to MySQL on Linux? Our performance on XP is not great. Also, which version
of PostgreSQL are you using?


 I'm trying to determine if the difference can be attributed to anything that
 I've done or missed.  I've run VACUUM ANALYZE on the two main tables and I'm
 looking at the results of EXPLAIN on the query that drives the retrieval of
 probably 80% of the data for the pages in question.

Good.


 Before I post the EXPLAIN and the table schema I'd appreciate confirmation 
 that
 this list is the appropriate forum.  I'm a relative newcomer to PostgreSQL 
 (but
 not to relational databases), so I'm not sure if this belongs in the novice or
 general lists.

You can post the results of EXPLAIN ANALYZE here. Please including schema
definitions and the query string(s) themselves.

Thanks,

Gavin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Comparative performance

2005-09-28 Thread Dennis Bjorklund
On Wed, 28 Sep 2005, Joe wrote:

 Before I post the EXPLAIN and the table schema I'd appreciate
 confirmation that this list is the appropriate forum. 

It is, and useful things to show are

 * the slow query
 * EXPLAIN ANALYZE of the query
 * the output of \d for each table involved in the query
 * the output of SHOW ALL;
 * The amount of memory the machine has

The settings that are most important to tune in postgresql.conf for
performance are, in my opinion: shared_buffers, effective_cache_size and
(to a lesser extent) work_mem.
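
For example, you can inspect the current values from psql (the targets in
the comments are only illustrative; the right numbers depend on the
workload and belong in postgresql.conf):

SHOW shared_buffers;        -- e.g. 8192 (8 kB pages, ~64 MB) instead of the tiny default
SHOW effective_cache_size;  -- e.g. 16384 pages (~128 MB); a planner hint, not an allocation
SHOW work_mem;              -- e.g. 4096 (kB per sort/hash step); it applies per operation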

-- 
/Dennis Björklund


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org