Re: [PERFORM] opportunity to benchmark a quad core Xeon

2006-12-16 Thread Arjen van der Meijden

On 16-12-2006 4:24 Jeff Frost wrote:
We can add more RAM and drives for testing purposes.  Can someone
suggest what benchmarks with what settings would be desirable to see how
this system performs?  I don't believe I've seen any postgres benchmarks
done on a quad xeon yet.


We've done our standard benchmark on a dual X5355:
http://tweakers.net/reviews/661

Verdict is that on price/performance ratio you're better off with a
5160, but in absolute performance the X5355 does win.


Best regards,

Arjen



Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-16 Thread Cosimo Streppone

Alexander Staubo wrote:


On Dec 15, 2006, at 17:53 , Ron wrote:


At 09:50 AM 12/15/2006, Greg Smith wrote:


On Fri, 15 Dec 2006, Merlin Moncure wrote:

The slowdown is probably due to the unroll-loops switch, which can
actually hurt code due to the larger footprint (worse cache locality).


The cache issues are so important with current processors that I'd
suggest throwing -Os (optimize for size) into the mix of flags people test.


So far I have been compiling PostgreSQL and running my pgbench script  
manually, but this makes me want to modify my script to run pgbench  
automatically using all possible permutations of a set of compiler  flags.


I don't know if it's practical, but this link comes to mind:

http://clusty.com/search?query=acovea

--
Cosimo



[PERFORM] partition text/varchar check problem

2006-12-16 Thread jamcito
Hello,
I am trying to make partitions:

CREATE SEQUENCE data_seq;
CREATE TABLE data (
identyf bigint,
name    varchar,
added   timestamp default now()
);

/*/
CREATE TABLE data_a (CHECK (name LIKE varchar 'a%')
) INHERITS (data);
--
CREATE TABLE data_b (CHECK (name LIKE varchar 'b%')
) INHERITS (data);

/*/
CREATE INDEX data_a_idx ON data_a(name);
CREATE INDEX data_b_idx ON data_b(name);

/*/
CREATE RULE data_insert_a AS ON INSERT TO data WHERE (name LIKE 'a%')
DO INSTEAD INSERT INTO data_a(identyf,name) VALUES
(nextval('data_seq'),NEW.name);
--
CREATE RULE data_insert_b AS ON INSERT TO data WHERE (name LIKE 'b%')
DO INSTEAD INSERT INTO data_b(identyf,name) VALUES
(nextval('data_seq'),NEW.name);


I put some data and vacuum:

/*/
INSERT INTO data(name) VALUES ('aaa');
INSERT INTO data(name) VALUES ('aab');
INSERT INTO data(name) VALUES ('baa');
INSERT INTO data(name) VALUES ('bab');

VACUUM ANALYZE data_a;
VACUUM ANALYZE data_b;

/*/
SET constraint_exclusion=off;
SET
EXPLAIN SELECT * FROM data WHERE name = 'aaa';
                              QUERY PLAN
------------------------------------------------------------------------
 Result  (cost=0.00..24.42 rows=7 width=48)
   ->  Append  (cost=0.00..24.42 rows=7 width=48)
         ->  Seq Scan on data  (cost=0.00..22.38 rows=5 width=48)
               Filter: ((name)::text = 'aaa'::text)
         ->  Seq Scan on data_a data  (cost=0.00..1.02 rows=1 width=23)
               Filter: ((name)::text = 'aaa'::text)
         ->  Seq Scan on data_b data  (cost=0.00..1.02 rows=1 width=23)
               Filter: ((name)::text = 'aaa'::text)
(8 rows)


/*/
SET constraint_exclusion=on;
SET

SHOW constraint_exclusion;
 constraint_exclusion
----------------------
 on
(1 row)

EXPLAIN SELECT * FROM data WHERE name = 'aaa';
                              QUERY PLAN
------------------------------------------------------------------------
 Result  (cost=0.00..24.42 rows=7 width=48)
   ->  Append  (cost=0.00..24.42 rows=7 width=48)
         ->  Seq Scan on data  (cost=0.00..22.38 rows=5 width=48)
               Filter: ((name)::text = 'aaa'::text)
         ->  Seq Scan on data_a data  (cost=0.00..1.02 rows=1 width=23)
               Filter: ((name)::text = 'aaa'::text)
         ->  Seq Scan on data_b data  (cost=0.00..1.02 rows=1 width=23)
               Filter: ((name)::text = 'aaa'::text)
(8 rows)


I have also tried name as text in the data table and in the CHECK. Where
is my error? Is it possible to make partitions on string columns?

Thank you for any clues.

Best regards,
jamcito





Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-16 Thread Ron

At 07:06 PM 12/15/2006, Michael Stone wrote:

On Fri, Dec 15, 2006 at 12:24:46PM -0500, Ron wrote:
ATM, the most we can say is that in a number of systems with modest 
physical IO subsystems



So I reran it on a 3.2GHz xeon with 6G RAM off a ramdisk; I/O ain't
the bottleneck on that one. Results didn't show any significant gains
regardless of compilation options (results hovered around 12k tps). If
people want to continue this, I will point out
that they should make sure they're linked against the optimized 
libpq rather than an existing one elsewhere in the library path. 
Beyond that, I'm done with this thread. Maybe there are some gains 
to be found somewhere, but the testing done thus far (while limited) 
is sufficient, IMO, to demonstrate that compiler options aren't 
going to provide a blow-your-socks-off dramatic performance improvement.
AFAICT, no one has stated there would be a blow-your-socks-off 
dramatic performance improvement for pg due to compilation 
options.  Just that there might be some, and there might be some that 
are arch specific.


So far these experiments have shown:
= multiple instances of a ~30-35% performance improvement going from
-O0 to -O3
= 1 instance of arch-specific options hurting performance when
combined with -O3
= 1 instance of arch-specific options helping performance on an OS
that only one person has tested (Gentoo Linux)
= that a 2.33 GHz C2D Mac laptop (under what OS?) with a typical
modest laptop physical IO subsystem can do ~2100 tps
= that pg has a speed limit on a 3.2GHz Xeon (which kind?) with 6G
RAM off a ramdisk (under what OS?) of ~12K tps
(I'd be curious to see what this limit is with better CPUs and memory
subsystems)


Note that except for the first point, all the other results are
singletons that have not yet been reproduced.


The most important gain IMO is knowledge, and I'd say there is
still more to learn and/or verify. YMMV.


Ron Peacetree




Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-16 Thread Steinar H. Gunderson
On Sat, Dec 16, 2006 at 10:53:21AM -0500, Ron wrote:
 AFAICT, no one has stated there would be a blow-your-socks-off 
 dramatic performance improvement for pg due to compilation 
 options.  Just that there might be some, and there might be some that 
 are arch specific.

FWIW, the original claim was: "It's really important to have your GLIBC
compiled for your processor. It is essential for performance."

/* Steinar */
-- 
Homepage: http://www.sesse.net/



[PERFORM] Scaling concerns

2006-12-16 Thread tsuraan

I'm writing a webmail-type application that is meant to be used in a
corporate environment.  The core of my system is a Postgres database
that is used as a message header cache.  The two (relevant) tables
being used are pasted into the end of this message.  My problem is
that, as the messages table increases to tens of millions of rows,
pgsql slows down considerably.  Even an operation like select
count(*) from messages can take minutes, with a totally idle system.
Postgres seems to be the most scalable Free database out there, so I
must be doing something wrong.

As for the most common strategy of having a slower (more rows)
archival database and a smaller, faster live database, all the
clients in the company are using their normal corporate email server
for day-to-day email handling.  The webmail is used for accessing email
that's no longer on the corporate server, so it's not really simple to
say which emails should be considered live and which are really
out-of-date.

My postgres settings are entirely default with the exception of
shared_buffers being set to 40,000 and max_connections set to 400.
I'm not sure what the meaning of most of the other settings are, so I
haven't touched them.  The machines running the database servers are
my home desktop (a dual-core athlon 3200+ with 2GB RAM and a 120GB
SATA II drive), and a production server with two dual-core Intel
chips, 4 GB RAM, and a RAID 5 array of SATA II drives on a 3Ware 9550
controller.  Both machines are running Gentoo Linux with a 2.6.1x
kernel, and both exhibit significant performance degradation when I
start getting tens of millions of records.

Any advice would be most appreciated.  Thanks in advance!

Tables:

CREATE TABLE EmailAddresses (
 emailid   SERIAL PRIMARY KEY, --  The unique identifier of this address
 name  TEXT NOT NULL,  --  The friendly name in the address
 addrspec  TEXT NOT NULL,  --  The [EMAIL PROTECTED] part of the address
 UNIQUE(name, addrspec)
);

and

CREATE TABLE Messages (
 -- Store info:
 msgkeyBIGSERIAL PRIMARY KEY,  --  Unique identifier for a message
 path  TEXT NOT NULL,  --  Where the message is on the file system
 inserted  TIMESTAMP DEFAULT now(),--  When the message was fetched
 -- Message Info:
 msgid TEXT UNIQUE NOT NULL,   --  Message's Message-Id field
 mfrom INTEGER --  Who sent the message
   REFERENCES EmailAddresses
   DEFAULT NULL,
 mdate TIMESTAMP DEFAULT NULL, --  Message date header field
 replyto   TEXT DEFAULT NULL,  --  Message-ID of replied-to message
 subject   TEXT DEFAULT NULL,  --  Message subject header field
 numatch   INTEGER DEFAULT NULL,   --  Number of attachments
 UNIQUE(path)
);



Re: [PERFORM] Scaling concerns

2006-12-16 Thread Steinar H. Gunderson
On Sat, Dec 16, 2006 at 11:26:02AM -0600, tsuraan wrote:
 Even an operation like select count(*) from messages can take minutes,
 with a totally idle system.  Postgres seems to be the most scalable Free
 database out there, so I must be doing something wrong.

Unqualified SELECT COUNT(*) FROM foo is one of the most expensive operations
you can do on your system, since the visibility information has to be checked
on disk for each row. Instead, try real queries on real data, and post here
if some are too slow for you.
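
A side note: if an approximate count is good enough, the planner's own
statistics are nearly free to read. A sketch (the estimate is only as
fresh as the last VACUUM or ANALYZE on the table):

  -- approximate row count from the system catalog
  SELECT reltuples::bigint AS approx_rows
  FROM pg_class
  WHERE relname = 'messages';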

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-16 Thread Michael Stone

On Sat, Dec 16, 2006 at 10:53:21AM -0500, Ron wrote:
The most important gain IMO is Knowledge, and I'd say there is 
still more to learn and/or verify IMHO. YMMV.


Well, I think there are other areas where I can spend my time where 
potential gains are more likely. YMMV (although, I note, you don't seem 
to be spending much of your own time testing this)


Mike Stone



Re: [PERFORM] Scaling concerns

2006-12-16 Thread tsuraan

Unqualified SELECT COUNT(*) FROM foo is one of the most expensive operations
you can do on your system, since the visibility information has to be checked
on disk for each row. Instead, try real queries on real data, and post here
if some are too slow for you.


Ok, that's a bad example.  I'm learning :-)  Is insert ... select also
really expensive then?  I have a table loaded with message-id and path
information of currently-existing messages.  It has ~20 million rows.
Trying to do INSERT INTO Messages(path, msgid) SELECT (path, msgid)
FROM tmpMessages took a really long time before psql died with an
out-of-memory error.  Is there a more sane way to do a table copy, or
should I have just dropped all the indices from the Message table and
loaded into that?
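
For what it's worth, the statement as quoted has a subtle bug: SELECT
(path, msgid) with parentheses produces a single composite value per row
rather than two columns, so it won't match the INSERT's column list. A
minimal sketch of the usual bulk-load pattern, using the table names from
this thread (untested):

  -- no parentheses around the select list
  INSERT INTO Messages (path, msgid)
  SELECT path, msgid
  FROM tmpMessages;

  -- refresh planner statistics once the load is done
  ANALYZE Messages;

Dropping nonessential indexes before the load and recreating them
afterwards is usually faster than maintaining them row by row.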

Thanks!



Re: [PERFORM] partition text/varchar check problem -- solved

2006-12-16 Thread jamcito
Tom Lane wrote:
 CREATE TABLE data_a (CHECK (name LIKE varchar 'a%')
 ) INHERITS (data);
 --
 CREATE TABLE data_b (CHECK (name LIKE varchar 'b%')
 ) INHERITS (data);
 
 That's not going to work because the planner is unable to prove anything
 about the behavior of LIKE --- there is nothing in the system that
 offers a relationship between the = operator and the LIKE operator.
 You'll need something like
 
  CHECK (name >= 'a' AND name < 'b')
  CHECK (name >= 'b' AND name < 'c')
 
 etc.  (These work for a query like WHERE name = 'foo' because
the =, >=, and < operators are all members of the same btree opclass,
 so the planner knows how to reason about them.)
 
   regards, tom lane

Thank you, it works!
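
For the archives, the working setup looks like this (a sketch following
Tom's suggestion, using the tables from upthread):

  CREATE TABLE data_a (
      CHECK (name >= 'a' AND name < 'b')
  ) INHERITS (data);

  CREATE TABLE data_b (
      CHECK (name >= 'b' AND name < 'c')
  ) INHERITS (data);

  -- with constraint_exclusion = on, the planner can now prove that
  -- data_b cannot hold rows with name = 'aaa', so only data and
  -- data_a are scanned:
  SET constraint_exclusion = on;
  EXPLAIN SELECT * FROM data WHERE name = 'aaa';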

Cheers,
jamcito





Re: [PERFORM] partition text/varchar check problem

2006-12-16 Thread Ireneusz Pluta

jamcito wrote:

/*/
SET constraint_exclusion=on;
SET

SHOW constraint_exclusion;
 constraint_exclusion
----------------------
 on
(1 row)

EXPLAIN SELECT * FROM data WHERE name = 'aaa';
                              QUERY PLAN
------------------------------------------------------------------------
 Result  (cost=0.00..24.42 rows=7 width=48)
   ->  Append  (cost=0.00..24.42 rows=7 width=48)
         ->  Seq Scan on data  (cost=0.00..22.38 rows=5 width=48)
               Filter: ((name)::text = 'aaa'::text)
         ->  Seq Scan on data_a data  (cost=0.00..1.02 rows=1 width=23)
               Filter: ((name)::text = 'aaa'::text)
         ->  Seq Scan on data_b data  (cost=0.00..1.02 rows=1 width=23)
               Filter: ((name)::text = 'aaa'::text)
(8 rows)

Can you show what you get from:
EXPLAIN SELECT * FROM data WHERE name LIKE 'a%'

?

Irek.




Re: [PERFORM] partition text/varchar check problem

2006-12-16 Thread jamcito
Ireneusz Pluta wrote:
 Can you show what you get from:
 EXPLAIN SELECT * FROM data WHERE name LIKE 'a%'
 
 ?
 
 Irek.

I get:
                              QUERY PLAN
------------------------------------------------------------------------
 Result  (cost=0.00..24.42 rows=8 width=48)
   ->  Append  (cost=0.00..24.42 rows=8 width=48)
         ->  Seq Scan on data  (cost=0.00..22.38 rows=5 width=48)
               Filter: ((name)::text ~~ 'a%'::text)
         ->  Seq Scan on data_a data  (cost=0.00..1.02 rows=2 width=23)
               Filter: ((name)::text ~~ 'a%'::text)
         ->  Seq Scan on data_b data  (cost=0.00..1.02 rows=1 width=23)
               Filter: ((name)::text ~~ 'a%'::text)
(8 rows)

Both partition tables are scanned.

Best,
jamcito





Re: [PERFORM] partition text/varchar check problem

2006-12-16 Thread Tom Lane
Ireneusz Pluta [EMAIL PROTECTED] writes:
 Can you show what you get from:
 EXPLAIN SELECT * FROM data WHERE name LIKE 'a%'

Won't help.  Exact equality of the WHERE condition is useful for
partial-index cases, because there the planner needs to prove that
the WHERE condition implies the index predicate before it can use
the index; and exact equality is certainly sufficient for that.
But for constraint exclusion, the problem is to prove that the
WHERE condition refutes the constraint, rather than implies it.
Knowing that name LIKE 'a%' disproves name LIKE 'b%' requires
more knowledge about LIKE than the planner has got.

regards, tom lane



Re: [PERFORM] Scaling concerns

2006-12-16 Thread Dimitri Fontaine
On Saturday 16 December 2006 at 18:32, Steinar H. Gunderson wrote:
 Instead, try real queries on real data,
 and post here if some are too slow for you.

To quickly find out a subset of slow queries on your production system, you 
can use the pgfouine tool:
  http://pgfouine.projects.postgresql.org/

If you then want to measure PostgreSQL performance under different
settings and compare the results, consider using the tsung tool (and
maybe its companion tool tsung-ploter, to graph several benchmarks onto
the same charts for comparison):
  http://pgfouine.projects.postgresql.org/tsung.html
  http://tsung.erlang-projects.org/
  http://debian.dalibo.org/unstable/

This latter link also contains a .tar.gz archive of tsung-ploter in case 
you're not running a debian system. Dependencies are python and matplotlib.

Regards,
-- 
Dimitri Fontaine
http://www.dalibo.com/




Re: [PERFORM] Scaling concerns

2006-12-16 Thread Andreas Kostyrka
* tsuraan [EMAIL PROTECTED] [061216 18:26]:
 I'm writing a webmail-type application that is meant to be used in a
 corporate environment.  The core of my system is a Postgres database
 that is used as a message header cache.  The two (relevant) tables
 being used are pasted into the end of this message.  My problem is
 that, as the messages table increases to tens of millions of rows,
 pgsql slows down considerably.  Even an operation like select
 count(*) from messages can take minutes, with a totally idle system.
 Postgres seems to be the most scalable Free database out there, so I
 must be doing something wrong.

select count(*) from table is the worst case in PostgreSQL (MVCC
systems in general, I guess).

If you really need to run count(*) you need to think about the
required isolation level of these operations and make some aggregate
table yourself.
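
A minimal sketch of such an aggregate table, maintained by a trigger.
Note that every writer now updates the same counter row, which serializes
concurrent inserts -- that is the isolation tradeoff mentioned above:

  CREATE TABLE messages_count (n bigint NOT NULL);
  INSERT INTO messages_count VALUES (0);

  CREATE FUNCTION messages_count_adjust() RETURNS trigger AS $$
  BEGIN
      IF TG_OP = 'INSERT' THEN
          UPDATE messages_count SET n = n + 1;
          RETURN NEW;
      ELSE
          UPDATE messages_count SET n = n - 1;
          RETURN OLD;
      END IF;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER messages_count_trig
      AFTER INSERT OR DELETE ON messages
      FOR EACH ROW EXECUTE PROCEDURE messages_count_adjust();

  -- a single-row read instead of a full scan:
  SELECT n FROM messages_count;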

(BTW, select aggregate(*) from bigtable is something that no database
likes; it's just that the degree of slowness sometimes differs.)

For scaling you should consider Slony. Either hang out in #slony on
Freenode.net or ask on the mailing list if you have questions.

 As for the most common strategy of having a slower (more rows)
 archival database and a smaller, faster live database, all the
 clients in the company are using their normal corporate email server
 for day-to-day email handling.  The webmail is used for accessing email
 that's no longer on the corporate server, so it's not really simple to
 say which emails should be considered live and which are really
 out-of-date.
 
 My postgres settings are entirely default with the exception of
 shared_buffers being set to 40,000 and max_connections set to 400.
 I'm not sure what the meaning of most of the other settings are, so I
 haven't touched them.  The machines running the database servers are
 my home desktop (a dual-core athlon 3200+ with 2GB RAM and a 120GB
 SATA II drive), and a production server with two dual-core Intel

Intel chips => define more. There are Intel boxes known to have issues
under specific load scenarios with PostgreSQL (again, specific
versions). To make it funnier, these are really, really hard to track
down ;)

 chips, 4 GB RAM, and a RAID 5 array of SATA II drives on a 3Ware 9550
 controller.  Both machines are running Gentoo Linux with a 2.6.1x
 kernel, and both exhibit significant performance degradation when I
 start getting tens of millions of records.
 
 Any advice would be most appreciated.  Thanks in advance!

Cluster. One box that applies changes, and multiple boxes that read
the data.

If you cannot afford multiple boxes from the start, still design your
application to work with two connections: one connection to a user with
read/write permissions, and one connecting to a user having only select
permissions => this way you can later easily add a loadbalancer to the
mix, and use multiple postgres boxes for reading stuff.
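
A minimal sketch of the two-role setup (role names are made up for
illustration):

  -- the application opens one connection as each role
  CREATE ROLE webmail_rw LOGIN PASSWORD 'changeme';
  CREATE ROLE webmail_ro LOGIN PASSWORD 'changeme';

  -- the writer gets full DML on the application tables
  GRANT SELECT, INSERT, UPDATE, DELETE ON messages, emailaddresses
      TO webmail_rw;

  -- the reader can only select, so its connections can later be
  -- pointed at a read-only replica
  GRANT SELECT ON messages, emailaddresses TO webmail_ro;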

Andreas



Re: [PERFORM] Scaling concerns

2006-12-16 Thread Luke Lonergan
Tsuraan,

Select count(*) from bigtable tests your disk drive speed up to
about 300MB/s, after which it is CPU limited in Postgres.

My guess is that your system has a very slow I/O configuration, either due
to faulty driver/hardware or the configuration.

The first thing you should do is run a simple I/O test on your data
directory - write a file twice the size of memory using dd like this
(where memory_size is your RAM size in bytes):

  time bash -c "dd if=/dev/zero of=data_directory/bigfile bs=8k \
      count=$((2 * memory_size / 8192)) && sync"

  time dd if=data_directory/bigfile of=/dev/null bs=8k

Then report the times here.

- Luke

On 12/16/06 9:26 AM, tsuraan [EMAIL PROTECTED] wrote:

 [tsuraan's original message, quoted in full above, is trimmed here]


