Re: [PERFORM] shared_buffers advice

2010-03-18 Thread Dimitri Fontaine
Greg Smith <g...@2ndquadrant.com> writes:
  I'm not sure how to make progress on similar ideas about
 tuning closer to the filesystem level without having something automated
 that takes over the actual benchmark running and data recording steps; it's
 just way too time consuming to do those right now with every tool that's
 available for PostgreSQL so far.  That's the problem I work on, there are
 easily a half dozen good ideas for improvements here floating around where
 coding time is dwarfed by required performance validation time.

I still think the best tool around currently for this kind of testing is
tsung, but I've yet to have the time to put my money where my mouth is, as
they say. Still, I'd be happy to take some time and help you decide whether
it's the tool you want to base your performance testing suite on or not.

  http://tsung.erlang-projects.org/

Regards,
-- 
dim

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg_dump far too slow

2010-03-18 Thread Matthew Wakeling

On Sun, 14 Mar 2010, David Newall wrote:

  nohup time pg_dump -f database.dmp -Z9 database

I presumed pg_dump was CPU-bound because of gzip compression, but a test I 
ran makes that seem unlikely...


There was some discussion about this a few months ago at 
http://archives.postgresql.org/pgsql-performance/2009-07/msg00348.php


It seems that getting pg_dump to do the compression is a fair amount 
slower than piping the plain format dump straight through gzip. You get a 
bit more parallelism that way too.
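
For reference, the piped approach described above would look something like
this (file and database names are placeholders):

  pg_dump database | gzip > database.dmp.gz

That way pg_dump produces the plain-format dump and gzip compresses it in a
separate process, so the two steps can run on different cores.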


Matthew


--
I'm always interested when [cold callers] try to flog conservatories.
Anyone who can actually attach a conservatory to a fourth floor flat
stands a marginally better than average chance of winning my custom.
(Seen on Usenet)

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] mysql to postgresql, performance questions

2010-03-18 Thread Corin

Hi all,

I'm running quite a large social community website (250k users, 16GB
database). We are currently preparing a complete relaunch and thinking
about switching from MySQL 5.1.37 InnoDB to PostgreSQL 8.4.2. The
database server is a dual dual-core Opteron 2216 with 12GB RAM running on
Debian amd64.


For a first impression I ran a simple query on our users table (a snapshot
with only ~ 45,000 records). The table has an index on birthday_age
[integer]. The test executes the same query 10 times and simply discards
the results. I ran the tests using a PHP and a Ruby script; the results
are almost the same.


Unfortunately MySQL seems to be around 3x as fast as PostgreSQL for this
simple query. There's no swapping or disc reading involved... everything is
in RAM.


query
select * from users where birthday_age between 12 and 13 or birthday_age 
between 20 and 22 limit 1000


mysql
{select_type=SIMPLE, key_len=1, id=1, table=users, 
type=range, possible_keys=birthday_age, rows=7572, 
Extra=Using where, ref=nil, key=birthday_age}

15.104055404663
14.209032058716
18.857002258301
15.714883804321
14.73593711853
15.048027038574
14.589071273804
14.847040176392
15.192985534668
15.115976333618

postgresql
{QUERY PLAN=Limit  (cost=125.97..899.11 rows=1000 width=448) (actual time=0.927..4.990 rows=1000 loops=1)}
{QUERY PLAN=  ->  Bitmap Heap Scan on users  (cost=125.97..3118.00 rows=3870 width=448) (actual time=0.925..3.420 rows=1000 loops=1)}
{QUERY PLAN=        Recheck Cond: (((birthday_age >= 12) AND (birthday_age <= 13)) OR ((birthday_age >= 20) AND (birthday_age <= 22)))}
{QUERY PLAN=        ->  BitmapOr  (cost=125.97..125.97 rows=3952 width=0) (actual time=0.634..0.634 rows=0 loops=1)}
{QUERY PLAN=              ->  Bitmap Index Scan on birthday_age  (cost=0.00..41.67 rows=1341 width=0) (actual time=0.260..0.260 rows=1327 loops=1)}
{QUERY PLAN=                    Index Cond: ((birthday_age >= 12) AND (birthday_age <= 13))}
{QUERY PLAN=              ->  Bitmap Index Scan on birthday_age  (cost=0.00..82.37 rows=2611 width=0) (actual time=0.370..0.370 rows=2628 loops=1)}
{QUERY PLAN=                    Index Cond: ((birthday_age >= 20) AND (birthday_age <= 22))}

{QUERY PLAN=Total runtime: 5.847 ms}
44.173002243042
41.156768798828
39.988040924072
40.470123291016
40.035963058472
40.077924728394
40.94386100769
40.183067321777
39.83211517334
40.256977081299

I also wonder why the reported runtime of 5.847 ms is so different from
the runtime reported by my scripts (both PHP and Ruby are almost the
same). What's the best tool to time queries in PostgreSQL? Can this be
done from pgAdmin?


Thanks,
Corin


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] mysql to postgresql, performance questions

2010-03-18 Thread tv
I guess we need some more details about the test. Is the
connection/disconnection part of each test iteration? And how are the
databases connected (using a socket / localhost / a different host)?

Anyway, measuring such simple queries will tell you almost nothing about
the general app performance - use the queries that are used in the
application.

 I also wonder why the reported runtime of 5.847 ms is so much different
 to the runtime reported of my scripts (both php and ruby are almost the
 same). What's the best tool to time queries in postgresql? Can this be
 done from pgadmin?

I doubt there's a 'best tool' to time queries, but I'd vote for logging
from the application itself, as it measures the performance from the
end-user viewpoint (and that's what you're interested in). Just put some
simple logging into the database access layer.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] mysql to postgresql, performance questions

2010-03-18 Thread Kenneth Marshall
If you expect this DB to be memory resident, you should update
the cpu/disk cost parameters in postgresql.conf. There was a
post earlier today with some more reasonable starting values.
Certainly your test DB will be memory resident.

Ken

On Thu, Mar 18, 2010 at 03:31:18PM +0100, Corin wrote:
 Hi all,

 I'm running quite a large social community website (250k users, 16gb 
 database). We are currently preparing a complete relaunch and thinking 
 about switching from mysql 5.1.37 innodb to postgresql 8.4.2. The database 
 server is a dual dualcore operton 2216 with 12gb ram running on debian 
 amd64.

 For a first impression I ran a simple query on our users table (snapshot 
 with only ~ 45.000 records). The table has an index on birthday_age 
 [integer]. The test executes 10 times the same query and simply discards 
 the results. I ran the tests using a php and a ruby script, the results are 
 almost the same.

 Unluckily mysql seems to be around 3x as fast as postgresql for this simple 
 query. There's no swapping, disc reading involved...everything is in ram.

 query
 select * from users where birthday_age between 12 and 13 or birthday_age 
 between 20 and 22 limit 1000

 mysql
 {select_type=SIMPLE, key_len=1, id=1, table=users, 
 type=range, possible_keys=birthday_age, rows=7572, 
 Extra=Using where, ref=nil, key=birthday_age}
 15.104055404663
 14.209032058716
 18.857002258301
 15.714883804321
 14.73593711853
 15.048027038574
 14.589071273804
 14.847040176392
 15.192985534668
 15.115976333618

 postgresql
 {QUERY PLAN=Limit  (cost=125.97..899.11 rows=1000 width=448) (actual time=0.927..4.990 rows=1000 loops=1)}
 {QUERY PLAN=  ->  Bitmap Heap Scan on users  (cost=125.97..3118.00 rows=3870 width=448) (actual time=0.925..3.420 rows=1000 loops=1)}
 {QUERY PLAN=        Recheck Cond: (((birthday_age >= 12) AND (birthday_age <= 13)) OR ((birthday_age >= 20) AND (birthday_age <= 22)))}
 {QUERY PLAN=        ->  BitmapOr  (cost=125.97..125.97 rows=3952 width=0) (actual time=0.634..0.634 rows=0 loops=1)}
 {QUERY PLAN=              ->  Bitmap Index Scan on birthday_age  (cost=0.00..41.67 rows=1341 width=0) (actual time=0.260..0.260 rows=1327 loops=1)}
 {QUERY PLAN=                    Index Cond: ((birthday_age >= 12) AND (birthday_age <= 13))}
 {QUERY PLAN=              ->  Bitmap Index Scan on birthday_age  (cost=0.00..82.37 rows=2611 width=0) (actual time=0.370..0.370 rows=2628 loops=1)}
 {QUERY PLAN=                    Index Cond: ((birthday_age >= 20) AND (birthday_age <= 22))}
 {QUERY PLAN=Total runtime: 5.847 ms}
 44.173002243042
 41.156768798828
 39.988040924072
 40.470123291016
 40.035963058472
 40.077924728394
 40.94386100769
 40.183067321777
 39.83211517334
 40.256977081299

 I also wonder why the reported runtime of 5.847 ms is so much different to 
 the runtime reported of my scripts (both php and ruby are almost the same). 
 What's the best tool to time queries in postgresql? Can this be done from 
 pgadmin?

 Thanks,
 Corin


 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] mysql to postgresql, performance questions

2010-03-18 Thread Thom Brown
On 18 March 2010 14:31, Corin <wakath...@gmail.com> wrote:

 Hi all,

 I'm running quite a large social community website (250k users, 16gb
 database). We are currently preparing a complete relaunch and thinking about
 switching from mysql 5.1.37 innodb to postgresql 8.4.2. The database server
 is a dual dualcore operton 2216 with 12gb ram running on debian amd64.

 For a first impression I ran a simple query on our users table (snapshot
 with only ~ 45.000 records). The table has an index on birthday_age
 [integer]. The test executes 10 times the same query and simply discards the
 results. I ran the tests using a php and a ruby script, the results are
 almost the same.

 Unluckily mysql seems to be around 3x as fast as postgresql for this simple
 query. There's no swapping, disc reading involved...everything is in ram.

 query
 select * from users where birthday_age between 12 and 13 or birthday_age
 between 20 and 22 limit 1000

 mysql
 {select_type=SIMPLE, key_len=1, id=1, table=users,
 type=range, possible_keys=birthday_age, rows=7572,
 Extra=Using where, ref=nil, key=birthday_age}
 15.104055404663
 14.209032058716
 18.857002258301
 15.714883804321
 14.73593711853
 15.048027038574
 14.589071273804
 14.847040176392
 15.192985534668
 15.115976333618

 postgresql
 {QUERY PLAN=Limit  (cost=125.97..899.11 rows=1000 width=448) (actual time=0.927..4.990 rows=1000 loops=1)}
 {QUERY PLAN=  ->  Bitmap Heap Scan on users  (cost=125.97..3118.00 rows=3870 width=448) (actual time=0.925..3.420 rows=1000 loops=1)}
 {QUERY PLAN=        Recheck Cond: (((birthday_age >= 12) AND (birthday_age <= 13)) OR ((birthday_age >= 20) AND (birthday_age <= 22)))}
 {QUERY PLAN=        ->  BitmapOr  (cost=125.97..125.97 rows=3952 width=0) (actual time=0.634..0.634 rows=0 loops=1)}
 {QUERY PLAN=              ->  Bitmap Index Scan on birthday_age  (cost=0.00..41.67 rows=1341 width=0) (actual time=0.260..0.260 rows=1327 loops=1)}
 {QUERY PLAN=                    Index Cond: ((birthday_age >= 12) AND (birthday_age <= 13))}
 {QUERY PLAN=              ->  Bitmap Index Scan on birthday_age  (cost=0.00..82.37 rows=2611 width=0) (actual time=0.370..0.370 rows=2628 loops=1)}
 {QUERY PLAN=                    Index Cond: ((birthday_age >= 20) AND (birthday_age <= 22))}
 {QUERY PLAN=Total runtime: 5.847 ms}
 44.173002243042
 41.156768798828
 39.988040924072
 40.470123291016
 40.035963058472
 40.077924728394
 40.94386100769
 40.183067321777
 39.83211517334
 40.256977081299

 I also wonder why the reported runtime of 5.847 ms is so much different to
 the runtime reported of my scripts (both php and ruby are almost the same).
 What's the best tool to time queries in postgresql? Can this be done from
 pgadmin?


pgAdmin will return the query time in the status bar of a query window.
Similarly, you can use psql and activate query times by using \timing.
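
For example, a quick check with the query from the original post (a minimal
sketch; the timing psql prints is measured at the client, but without any
scripting-language overhead):

  \timing
  select * from users where birthday_age between 12 and 13
    or birthday_age between 20 and 22 limit 1000;

With \timing on, psql reports a "Time: ... ms" line after each statement.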

Regards

Thom


Re: [PERFORM] mysql to postgresql, performance questions

2010-03-18 Thread Grzegorz Jaśkiewicz
The time that psql or pgAdmin shows is purely the PostgreSQL time.
The question here was about the actual application's time. Sometimes the data
transmission, fetching and processing on the app's side can take longer than
the 'postgresql' time.


Re: [PERFORM] mysql to postgresql, performance questions

2010-03-18 Thread Stephen Frost
Corin,

* Corin (wakath...@gmail.com) wrote:
 I'm running quite a large social community website (250k users, 16gb  
 database). We are currently preparing a complete relaunch and thinking  
 about switching from mysql 5.1.37 innodb to postgresql 8.4.2. The  
 database server is a dual dualcore operton 2216 with 12gb ram running on  
 debian amd64.

Can you provide at least your postgresql.conf?  That could be useful,
though this does seem like a really simple query.

 For a first impression I ran a simple query on our users table (snapshot  
 with only ~ 45.000 records). The table has an index on birthday_age  
 [integer]. The test executes 10 times the same query and simply discards  
 the results. I ran the tests using a php and a ruby script, the results  
 are almost the same.

I wouldn't expect it to matter a whole lot, but have you considered
using prepared queries?
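
In plain SQL a prepared query would look roughly like this (sketch only;
most client libraries also expose server-side prepared statements directly):

  PREPARE users_by_age (int, int, int, int) AS
    SELECT * FROM users
     WHERE birthday_age BETWEEN $1 AND $2
        OR birthday_age BETWEEN $3 AND $4
     LIMIT 1000;

  EXECUTE users_by_age (12, 13, 20, 22);

That saves the parse/plan step on each execution, which can matter for very
cheap queries run in a tight loop.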

 Unluckily mysql seems to be around 3x as fast as postgresql for this  
 simple query. There's no swapping, disc reading involved...everything is  
 in ram.

 query
 select * from users where birthday_age between 12 and 13 or birthday_age  
 between 20 and 22 limit 1000

Do you use every column from users, and do you really want 1000 records
back?

 {QUERY PLAN=Total runtime: 5.847 ms}

This runtime is the amount of time it took for the backend to run the
query.

 44.173002243042

These times are including all the time required to get the data back to
the client.  If you don't use cursors, all data from the query is
returned all at once.  Can you post the script you're using along with
the table schema and maybe some sample or example data?  Also, are you
doing this all inside a single transaction, or are you creating a new
transaction for every query?  I trust you're not reconnecting to the
database for every query..

 I also wonder why the reported runtime of 5.847 ms is so much different  
 to the runtime reported of my scripts (both php and ruby are almost the  
 same). What's the best tool to time queries in postgresql? Can this be  
 done from pgadmin?

As was mentioned elsewhere, certainly the best tool to test with is your
actual application, if that's possible..  Or at least the language your
application is in.

Thanks,

Stephen




Re: [PERFORM] shared_buffers advice

2010-03-18 Thread Greg Smith

Dimitri Fontaine wrote:

I still think the best tool around currently for this kind of testing is
tsung


I am happy to say that for now, pgbench is the only actual testing tool 
supported.  Done; now I don't need tsung. 
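
(For anyone who hasn't used it, a typical pgbench run is just something like
the following; the database name, scale factor, client count and duration
are arbitrary examples:

  createdb bench
  pgbench -i -s 100 bench    # populate at scale factor 100
  pgbench -c 8 -T 300 bench  # 8 clients, 5 minutes

The point is that it ships as a contrib module with PostgreSQL itself, so
the harness can assume it is already available.)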

However, that doesn't actually solve any of the problems I was talking 
about, which is why I'm not even talking about that part.  We 
need the glue to pull out software releases, run whatever testing tool 
is appropriate, and then save the run artifacts in some standardized 
form so they can be referenced with associated build/configuration 
information to track down a regression when it does show up.  Building 
those boring bits is the real issue here; load testing tools are easy 
to find because those are fun to work on.


And as a general commentary on the vision here, tsung will never fit 
into this anyway because something that can run on the buildfarm 
machines with the software they already have installed is the primary 
target.  I don't see anything about tsung so interesting that it trumps 
that priority, even though it is an interesting tool.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] mysql to postgresql, performance questions

2010-03-18 Thread Magnus Hagander
On Thu, Mar 18, 2010 at 16:09, Stephen Frost <sfr...@snowman.net> wrote:
 Corin,

 * Corin (wakath...@gmail.com) wrote:
 {QUERY PLAN=Total runtime: 5.847 ms}

 This runtime is the amount of time it took for the backend to run the
 query.

 44.173002243042

 These times are including all the time required to get the data back to
 the client.  If you don't use cursors, all data from the query is
 returned all at once.  Can you post the script you're using along with
 the table schema and maybe some sample or example data?  Also, are you
 doing this all inside a single transaction, or are you creating a new
 transaction for every query?  I trust you're not reconnecting to the
 database for every query..

Just as a note here, since the OP is using Debian: if you are
connecting over TCP, Debian will default to SSL on your connection,
which obviously adds a *lot* of overhead. If you're not actively using
it (in which case you will control this from pg_hba.conf), just edit
postgresql.conf and disable SSL, then restart the server.
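
Concretely, that means something like this in postgresql.conf (and making
sure pg_hba.conf uses plain "host" rather than "hostssl" entries), followed
by a restart:

  ssl = off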


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] mysql to postgresql, performance questions

2010-03-18 Thread Scott Marlowe
On Thu, Mar 18, 2010 at 8:31 AM, Corin <wakath...@gmail.com> wrote:
 Hi all,

 I'm running quite a large social community website (250k users, 16gb
 database). We are currently preparing a complete relaunch and thinking about
 switching from mysql 5.1.37 innodb to postgresql 8.4.2. The database server
 is a dual dualcore operton 2216 with 12gb ram running on debian amd64.

 For a first impression I ran a simple query on our users table (snapshot
 with only ~ 45.000 records). The table has an index on birthday_age
 [integer]. The test executes 10 times the same query and simply discards the
 results. I ran the tests using a php and a ruby script, the results are
 almost the same.

 Unluckily mysql seems to be around 3x as fast as postgresql for this simple
 query. There's no swapping, disc reading involved...everything is in ram.

 query
 select * from users where birthday_age between 12 and 13 or birthday_age
 between 20 and 22 limit 1000

 mysql
 {select_type=SIMPLE, key_len=1, id=1, table=users,
 type=range, possible_keys=birthday_age, rows=7572,
 Extra=Using where, ref=nil, key=birthday_age}
 15.104055404663
 14.209032058716
 18.857002258301
 15.714883804321
 14.73593711853
 15.048027038574
 14.589071273804
 14.847040176392
 15.192985534668
 15.115976333618

 postgresql
 {QUERY PLAN=Limit  (cost=125.97..899.11 rows=1000 width=448) (actual time=0.927..4.990 rows=1000 loops=1)}
 {QUERY PLAN=  ->  Bitmap Heap Scan on users  (cost=125.97..3118.00 rows=3870 width=448) (actual time=0.925..3.420 rows=1000 loops=1)}
 {QUERY PLAN=        Recheck Cond: (((birthday_age >= 12) AND (birthday_age <= 13)) OR ((birthday_age >= 20) AND (birthday_age <= 22)))}
 {QUERY PLAN=        ->  BitmapOr  (cost=125.97..125.97 rows=3952 width=0) (actual time=0.634..0.634 rows=0 loops=1)}
 {QUERY PLAN=              ->  Bitmap Index Scan on birthday_age  (cost=0.00..41.67 rows=1341 width=0) (actual time=0.260..0.260 rows=1327 loops=1)}
 {QUERY PLAN=                    Index Cond: ((birthday_age >= 12) AND (birthday_age <= 13))}
 {QUERY PLAN=              ->  Bitmap Index Scan on birthday_age  (cost=0.00..82.37 rows=2611 width=0) (actual time=0.370..0.370 rows=2628 loops=1)}
 {QUERY PLAN=                    Index Cond: ((birthday_age >= 20) AND (birthday_age <= 22))}
 {QUERY PLAN=Total runtime: 5.847 ms}
 44.173002243042
 41.156768798828
 39.988040924072
 40.470123291016
 40.035963058472
 40.077924728394
 40.94386100769
 40.183067321777
 39.83211517334
 40.256977081299

 I also wonder why the reported runtime of 5.847 ms is so much different to
 the runtime reported of my scripts (both php and ruby are almost the same).
 What's the best tool to time queries in postgresql? Can this be done from
 pgadmin?

It's different because it only takes pgsql 5 milliseconds to run the
query, and 40 seconds to transfer the data across to your application,
which THEN promptly throws it away.  If you run it as

MySQL's client lib doesn't transfer over the whole thing.  This is
more about how each db interface is implemented in those languages.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] mysql to postgresql, performance questions

2010-03-18 Thread Arjen van der Meijden

On 18-3-2010 16:50 Scott Marlowe wrote:

It's different because it only takes pgsql 5 milliseconds to run the
query, and 40 seconds to transfer the data across to your application,
which THEN promptly throws it away.  If you run it as

MySQL's client lib doesn't transfer over the whole thing.  This is
more about how each db interface is implemented in those languages.


It's the default behavior of both PostgreSQL and MySQL to transfer the 
whole resultset over to the client. Or is that different for Ruby's 
MySQL-driver? At least in PHP the behavior is similar for both.
And I certainly do hope it's 40ms rather than 40s, otherwise it would be 
a really badly performing network in either case (15s for mysql) or very 
large records (which I doubt).


I'm wondering if a new connection is made between each query. PostgreSQL 
is (afaik it still is, but I haven't compared that recently) a bit slower 
in that department than MySQL.


Best regards,

Arjen

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Building multiple indexes concurrently

2010-03-18 Thread Hannu Krosing
On Wed, 2010-03-17 at 16:49 -0400, Greg Smith wrote:
 Alvaro Herrera wrote:
  Andres Freund escribió:
 

  I find it much easier to believe such issues exist on tables in 
  contrast to indexes. The likelihood of getting sequential accesses on an 
  index is small enough on a big table to make it unlikely to matter much.
  
 
  Vacuum walks indexes sequentially, for one.

 
 That and index-based range scans were the main two use-cases I was 
 concerned would be degraded by interleaving index builds, compared with 
 doing them in succession. 

I guess that tweaking file systems to allocate in bigger chunks would help
here? I know that xfs can be tuned in that regard, but how about other
common file systems like ext3?

- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Forcing index scan on query produces 16x faster

2010-03-18 Thread Dave Crooke
I've also observed the same behaviour on a very large table (200GB data,
170GB for 2 indexes) 

I have a table which has 6 small columns, let's call them (a, b, c, d, e, f)
and about 1 billion rows. There is an index on (a, b, c, d) - not my idea,
Hibernate requires primary keys for every table.

If I do the following query:

*select max(c) from tbl where a=[constant literal] and b=[other constant
literal];*

 then with maxed out analysis histograms, and no changes to any of the
page_cost type stuff, it still desperately wants to do a full table scan,
which is ... kinda slow.

Of course, a billion row table is also rather suboptimal (our app collects a
lot more data than it used to) and so I'm bypassing Hibernate, and sharding
it all by time, so that the tables and indexes will be a manageable size,
and will also be vacuum-free as my aging out process is now DROP TABLE :-)
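
As a rough sketch of that time-sharding scheme (made-up table names; one
table per month, so aging out becomes a simple drop):

  CREATE TABLE tbl_2010_03 (LIKE tbl INCLUDING DEFAULTS);
  CREATE INDEX tbl_2010_03_a_b_idx ON tbl_2010_03 (a, b);
  -- a month later, retire the oldest slice:
  DROP TABLE tbl_2010_02;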

Cheers
Dave

On Wed, Mar 17, 2010 at 8:01 PM, Eger, Patrick pe...@automotive.com wrote:

 I'm running 8.4.2 and have noticed a similar heavy preference for
 sequential scans and hash joins over index scans and nested loops.  Our
 database can basically fit in cache 100% so this may not be
 applicable to your situation, but the following params seemed to help
 us:

 seq_page_cost = 1.0
 random_page_cost = 1.01
 cpu_tuple_cost = 0.0001
 cpu_index_tuple_cost = 0.5
 cpu_operator_cost = 0.25
 effective_cache_size = 1000MB
 shared_buffers = 1000MB


 Might I suggest the Postgres developers reconsider these defaults for
 9.0 release, or perhaps provide a few sets of tuning params for
 different workloads in the default install/docs? The cpu_*_cost in
 particular seem to be way off afaict. I may be dead wrong though, fwiw
 =)

 -Original Message-
 From: pgsql-performance-ow...@postgresql.org
 [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Christian
 Brink
 Sent: Wednesday, March 17, 2010 2:26 PM
 To: pgsql-performance@postgresql.org
 Subject: [PERFORM] Forcing index scan on query produces 16x faster

 I am running into a problem with a particular query. The execution plan
 cost shows that the Seq Scan is a better bet (cost=54020.49..54020.55)
 over the forced index 'enable_seqscan =  false'
 (cost=1589703.87..1589703.93). But when I run the query both ways I get
 a vastly different result (below). It appears not to want to bracket the

 salesitems off of the 'id' foreign_key unless I force it.

 Is there a way to rewrite or hint the planner to get me the better plan
 without resorting to 'enable_seqscan' manipulation (or am I missing
 something)?

 postream=> select version();
                                                     version
 --------------------------------------------------------------------------------------------------------------
  PostgreSQL 8.0.3 on i386-redhat-linux-gnu, compiled by GCC i386-redhat-linux-gcc (GCC) 4.0.0 20050505 (Red Hat 4.0.0-4)


 postream=> SET enable_seqscan = false;
 SET
 postream=> EXPLAIN ANALYZE
 postream-> SELECT si.group1_id as name, sum(si.qty) as count, sum(si.amt) as amt
 postream->   FROM salesitems si, sales s, sysstrings
 postream->  WHERE si.id = s.id
 postream->    AND si.group1_id != ''
 postream->    AND si.group1_id IS NOT NULL
 postream->    AND NOT si.void
 postream->    AND NOT s.void
 postream->    AND NOT s.suspended
 postream->    AND s.tranzdate >= (cast('2010-02-15' as date) + cast(sysstrings.data as time))
 postream->    AND s.tranzdate < ((cast('2010-02-15' as date) + 1) + cast(sysstrings.data as time))
 postream->    AND sysstrings.id='net/Console/Employee/Day End Time'
 postream->  GROUP BY name;

                                   QUERY PLAN
 ------------------------------------------------------------------------------
  HashAggregate  (cost=1589703.87..1589703.93 rows=13 width=35) (actual time=33.414..33.442 rows=12 loops=1)
    ->  Nested Loop  (cost=0.01..1588978.22 rows=96753 width=35) (actual time=0.284..22.115 rows=894 loops=1)
          ->  Nested Loop  (cost=0.01..2394.31 rows=22530 width=4) (actual time=0.207..4.671 rows=225 loops=1)
                ->  Index Scan using sysstrings_pkey on sysstrings  (cost=0.00..5.78 rows=1 width=175) (actual time=0.073..0.078 rows=1 loops=1)
                      Index Cond: (id = 'net/Console/Employee/Day End Time'::text)
                ->  Index Scan using sales_tranzdate_index on sales s  (cost=0.01..1825.27 rows=22530 width=12) (actual time=0.072..3.464 rows=225 loops=1)
                      Index Cond: ((s.tranzdate >= ('2010-02-15'::date + (outer.data)::time without time zone)) AND (s.tranzdate < ('2010-02-16'::date + (outer.data)::time without time zone)))
                      Filter: ((NOT void) AND (NOT suspended))
          ->  Index Scan using salesitems_pkey on salesitems si  (cost=0.00..70.05 rows=30 width=39) (actual time=0.026..0.052 rows=4 loops=225)
                Index 

Re: [PERFORM] mysql to postgresql, performance questions

2010-03-18 Thread Ivan Voras

Corin wrote:

Hi all,

I'm running quite a large social community website (250k users, 16gb 
database). We are currently preparing a complete relaunch and thinking 
about switching from mysql 5.1.37 innodb to postgresql 8.4.2. The 


A 'relaunch' looks like you are nearing the end (the launch) of the 
project - if so, you should know that switching databases near the 
project deadline is almost always a suicidal act. Even if the big 
differences are easily fixable, the small differences will kill you.


database server is a dual dualcore operton 2216 with 12gb ram running on 
debian amd64.


For a first impression I ran a simple query on our users table (snapshot 
with only ~ 45.000 records). The table has an index on birthday_age 
[integer]. The test executes 10 times the same query and simply discards 
the results. I ran the tests using a php and a ruby script, the results 
are almost the same.


Your table will probably fit in RAM but the whole database obviously 
won't. Not that it matters here.


Did you configure anything at all in postgresql.conf? The defaults 
assume a very small database.


Unluckily mysql seems to be around 3x as fast as postgresql for this 
simple query. There's no swapping, disc reading involved...everything is 
in ram.


It depends...


15.115976333618


So this is 15 ms?


postgresql
{QUERY PLAN=Limit  (cost=125.97..899.11 rows=1000 width=448) (actual time=0.927..4.990 rows=1000 loops=1)}
{QUERY PLAN=  ->  Bitmap Heap Scan on users  (cost=125.97..3118.00 rows=3870 width=448) (actual time=0.925..3.420 rows=1000 loops=1)}
{QUERY PLAN=        Recheck Cond: (((birthday_age >= 12) AND (birthday_age <= 13)) OR ((birthday_age >= 20) AND (birthday_age <= 22)))}
{QUERY PLAN=        ->  BitmapOr  (cost=125.97..125.97 rows=3952 width=0) (actual time=0.634..0.634 rows=0 loops=1)}
{QUERY PLAN=              ->  Bitmap Index Scan on birthday_age  (cost=0.00..41.67 rows=1341 width=0) (actual time=0.260..0.260 rows=1327 loops=1)}
{QUERY PLAN=                    Index Cond: ((birthday_age >= 12) AND (birthday_age <= 13))}
{QUERY PLAN=              ->  Bitmap Index Scan on birthday_age  (cost=0.00..82.37 rows=2611 width=0) (actual time=0.370..0.370 rows=2628 loops=1)}
{QUERY PLAN=                    Index Cond: ((birthday_age >= 20) AND (birthday_age <= 22))}

{QUERY PLAN=Total runtime: 5.847 ms}
44.173002243042


I also wonder why the reported runtime of 5.847 ms is so much different 
to the runtime reported of my scripts (both php and ruby are almost the 


It looks like you are spending ~38 ms delivering the data to your 
application. Whatever you are using, stop using it :)


same). What's the best tool to time queries in postgresql? Can this be 
done from pgadmin?


The only rational way is to measure at the database itself and not 
include other factors like the network, scripting language libraries, 
etc. To do this, log in to your db server with a shell and use psql. 
Start it as "psql databasename username" and issue a statement like 
"EXPLAIN ANALYZE SELECT ...your_query;". Unless magic happens, this 
will open a local unix socket connection to the database for the query, 
which has the least overhead.
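
For the query from the original post, that session would look something like:

  $ psql databasename username
  databasename=> EXPLAIN ANALYZE
  databasename-> select * from users where birthday_age between 12 and 13
  databasename->    or birthday_age between 20 and 22 limit 1000;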


You can of course also do this for MySQL though I don't know if it has 
an equivalent of EXPLAIN ANALYZE.


But even after you have found where the problem is, and even if you see 
that Pg is faster than MySQL, you will still need realistic loads to 
test the real-life performance difference.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance