Re: [PERFORM] opportunity to benchmark a quad core Xeon
On 16-12-2006 4:24, Jeff Frost wrote:
> We can add more RAM and drives for testing purposes. Can someone
> suggest what benchmarks with what settings would be desirable to see
> how this system performs? I don't believe I've seen any postgres
> benchmarks done on a quad xeon yet.

We've done our standard benchmark on a dual X5355: http://tweakers.net/reviews/661

Verdict is that on price/performance you're better off with a 5160, but in absolute performance the X5355 does win.

Best regards,

Arjen

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] New to PostgreSQL, performance considerations
Alexander Staubo wrote:
> On Dec 15, 2006, at 17:53, Ron wrote:
>> At 09:50 AM 12/15/2006, Greg Smith wrote:
>>> On Fri, 15 Dec 2006, Merlin Moncure wrote:
>>>> The slowdown is probably due to the unroll-loops switch, which can
>>>> actually hurt code due to the larger footprint (less cache
>>>> coherency).
>>> The cache issues are so important with current processors that I'd
>>> suggest throwing -Os (optimize for size) into the mix people test.
> So far I have been compiling PostgreSQL and running my pgbench script
> manually, but this makes me want to modify my script to run pgbench
> automatically using all possible permutations of a set of compiler
> flags.

I don't know if it's practical, but this link comes to mind:
http://clusty.com/search?query=acovea

-- Cosimo
[PERFORM] partition text/varchar check problem
Hello,

I am trying to make partitions:

CREATE SEQUENCE data_seq;

CREATE TABLE data (
    identyf bigint,
    name    varchar,
    added   timestamp DEFAULT now()
);

CREATE TABLE data_a (
    CHECK (name LIKE varchar 'a%')
) INHERITS (data);

CREATE TABLE data_b (
    CHECK (name LIKE varchar 'b%')
) INHERITS (data);

CREATE INDEX data_a_idx ON data_a(name);
CREATE INDEX data_b_idx ON data_b(name);

CREATE RULE data_insert_a AS ON INSERT TO data
    WHERE (name LIKE 'a%')
    DO INSTEAD INSERT INTO data_a(identyf, name)
    VALUES (nextval('data_seq'), NEW.name);

CREATE RULE data_insert_b AS ON INSERT TO data
    WHERE (name LIKE 'b%')
    DO INSTEAD INSERT INTO data_b(identyf, name)
    VALUES (nextval('data_seq'), NEW.name);

I put in some data and vacuum:

INSERT INTO data(name) VALUES ('aaa');
INSERT INTO data(name) VALUES ('aab');
INSERT INTO data(name) VALUES ('baa');
INSERT INTO data(name) VALUES ('bab');

VACUUM ANALYZE data_a;
VACUUM ANALYZE data_b;

SET constraint_exclusion=off;

EXPLAIN SELECT * FROM data WHERE name = 'aaa';
                             QUERY PLAN
---------------------------------------------------------------------
 Result  (cost=0.00..24.42 rows=7 width=48)
   ->  Append  (cost=0.00..24.42 rows=7 width=48)
         ->  Seq Scan on data  (cost=0.00..22.38 rows=5 width=48)
               Filter: ((name)::text = 'aaa'::text)
         ->  Seq Scan on data_a data  (cost=0.00..1.02 rows=1 width=23)
               Filter: ((name)::text = 'aaa'::text)
         ->  Seq Scan on data_b data  (cost=0.00..1.02 rows=1 width=23)
               Filter: ((name)::text = 'aaa'::text)
(8 rows)

SET constraint_exclusion=on;

SHOW constraint_exclusion;
 constraint_exclusion
----------------------
 on
(1 row)

EXPLAIN SELECT * FROM data WHERE name = 'aaa';
                             QUERY PLAN
---------------------------------------------------------------------
 Result  (cost=0.00..24.42 rows=7 width=48)
   ->  Append  (cost=0.00..24.42 rows=7 width=48)
         ->  Seq Scan on data  (cost=0.00..22.38 rows=5 width=48)
               Filter: ((name)::text = 'aaa'::text)
         ->  Seq Scan on data_a data  (cost=0.00..1.02 rows=1 width=23)
               Filter: ((name)::text = 'aaa'::text)
         ->  Seq Scan on data_b data  (cost=0.00..1.02 rows=1 width=23)
               Filter: ((name)::text = 'aaa'::text)
(8 rows)

I have also tried with name as text in the data table and in
the CHECK. Where is my error? Is it possible to make partitions with strings?

Thank you for any clues. Best regards,
jamcito
Re: [PERFORM] New to PostgreSQL, performance considerations
At 07:06 PM 12/15/2006, Michael Stone wrote:
> On Fri, Dec 15, 2006 at 12:24:46PM -0500, Ron wrote:
>> ATM, the most we can say is that in a number of systems with modest
>> physical IO subsystems
> So I reran it on a 3.2GHz Xeon with 6G RAM off a ramdisk; I/O ain't
> the bottleneck on that one. Results didn't show any significant gains
> regardless of compilation options (results hovered around 12k tps).
> If people want to continue this, I will point out that they should
> make sure they're linked against the optimized libpq rather than an
> existing one elsewhere in the library path. Beyond that, I'm done with
> this thread. Maybe there are some gains to be found somewhere, but the
> testing done thus far (while limited) is sufficient, IMO, to
> demonstrate that compiler options aren't going to provide a
> blow-your-socks-off dramatic performance improvement.

AFAICT, no one has stated there would be a blow-your-socks-off dramatic performance improvement for pg due to compilation options. Just that there might be some, and there might be some that are arch specific. So far these experiments have shown:

= multiple instances of a ~30-35% performance improvement going from -O0 to -O3
= 1 instance of arch-specific options hurting performance when combined with -O3
= 1 instance of arch-specific options helping performance on an OS that only one person has tested (Gentoo Linux)
= that a 2.33 GHz C2D Mac laptop (under what OS?) with a typical modest laptop physical IO subsystem can do ~2100tps
= that pg has a speed limit on a 3.2GHz Xeon (which kind?) with 6G RAM off a ramdisk (under what OS?) of ~12K tps (I'd be curious to see what this limit is with better CPUs and memory subsystems)

Note that except for the first point, all the other results are singletons that as of yet have not been reproduced.

The most important gain IMO is Knowledge, and I'd say there is still more to learn and/or verify IMHO. YMMV.
Ron Peacetree
Re: [PERFORM] New to PostgreSQL, performance considerations
On Sat, Dec 16, 2006 at 10:53:21AM -0500, Ron wrote:
> AFAICT, no one has stated there would be a blow-your-socks-off
> dramatic performance improvement for pg due to compilation options.
> Just that there might be some, and there might be some that are arch
> specific.

FWIW, the original claim was: "It's really important to have your GLIBC compiled for your processor. It is essential for performance."

/* Steinar */
-- Homepage: http://www.sesse.net/
[PERFORM] Scaling concerns
I'm writing a webmail-type application that is meant to be used in a corporate environment. The core of my system is a Postgres database that is used as a message header cache. The two (relevant) tables being used are pasted at the end of this message. My problem is that, as the messages table increases to tens of millions of rows, pgsql slows down considerably. Even an operation like select count(*) from messages can take minutes, with a totally idle system. Postgres seems to be the most scalable Free database out there, so I must be doing something wrong.

As for the most common strategy of having a slower (more rows) archival database and a smaller, faster live database: all the clients in the company are using their normal corporate email server for day-to-day email handling. The webmail is used for accessing email that's no longer on the corporate server, so it's not really simple to say which emails should be considered live and which are really out-of-date.

My postgres settings are entirely default with the exception of shared_buffers being set to 40,000 and max_connections set to 400. I'm not sure what the meaning of most of the other settings is, so I haven't touched them.

The machines running the database servers are my home desktop (a dual-core athlon 3200+ with 2GB RAM and a 120GB SATA II drive) and a production server with two dual-core Intel chips, 4 GB RAM, and a RAID 5 array of SATA II drives on a 3Ware 9550 controller. Both machines are running Gentoo Linux with a 2.6.1x kernel, and both exhibit significant performance degradation when I start getting tens of millions of records. Any advice would be most appreciated. Thanks in advance!
Tables:

CREATE TABLE EmailAddresses (
    emailid  SERIAL PRIMARY KEY,      -- The unique identifier of this address
    name     TEXT NOT NULL,           -- The friendly name in the address
    addrspec TEXT NOT NULL,           -- The [EMAIL PROTECTED] part of the address
    UNIQUE(name, addrspec)
);

and

CREATE TABLE Messages (
    -- Store info:
    msgkey   BIGSERIAL PRIMARY KEY,   -- Unique identifier for a message
    path     TEXT NOT NULL,           -- Where the message is on the file system
    inserted TIMESTAMP DEFAULT now(), -- When the message was fetched
    -- Message info:
    msgid    TEXT UNIQUE NOT NULL,    -- Message's Message-Id field
    mfrom    INTEGER                  -- Who sent the message
             REFERENCES EmailAddresses DEFAULT NULL,
    mdate    TIMESTAMP DEFAULT NULL,  -- Message date header field
    replyto  TEXT DEFAULT NULL,       -- Message-ID of replied-to message
    subject  TEXT DEFAULT NULL,       -- Message subject header field
    numatch  INTEGER DEFAULT NULL,    -- Number of attachments
    UNIQUE(path)
);
Re: [PERFORM] Scaling concerns
On Sat, Dec 16, 2006 at 11:26:02AM -0600, tsuraan wrote:
> Even an operation like select count(*) from messages can take minutes,
> with a totally idle system. Postgres seems to be the most scalable
> Free database out there, so I must be doing something wrong.

An unqualified SELECT COUNT(*) FROM foo is one of the most expensive operations you can do on your system, since the visibility information has to be checked on disk for each row. Instead, try real queries on real data, and post here if some are too slow for you.

/* Steinar */
-- Homepage: http://www.sesse.net/
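[Editor's note: when only a rough figure is needed, the planner's own statistics are a much cheaper substitute for an exact count. A minimal sketch, assuming the table has been VACUUMed or ANALYZEd recently, since reltuples is only an estimate maintained by those commands:]

```sql
-- Approximate row count from the system catalog: reads one catalog row
-- instead of scanning the whole table. Accuracy depends on how recently
-- VACUUM or ANALYZE ran on the table.
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'messages';
```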
Re: [PERFORM] New to PostgreSQL, performance considerations
On Sat, Dec 16, 2006 at 10:53:21AM -0500, Ron wrote:
> The most important gain IMO is Knowledge, and I'd say there is still
> more to learn and/or verify IMHO. YMMV.

Well, I think there are other areas where I can spend my time where potential gains are more likely. YMMV (although, I note, you don't seem to be spending much of your own time testing this).

Mike Stone
Re: [PERFORM] Scaling concerns
> Unqualified SELECT COUNT(*) FROM foo is one of the most expensive
> operations you can do on your system, since the visibility information
> has to be checked on disk for each row. Instead, try real queries on
> real data, and post here if some are too slow for you.

Ok, that's a bad example. I'm learning :-) Is INSERT ... SELECT also really expensive, then? I have a table loaded with the message-id and path information of currently-existing messages. It has ~20 million rows. Trying to do

INSERT INTO Messages(path, msgid) SELECT (path, msgid) FROM tmpMessages

took a really long time before psql died with an out-of-memory error. Is there a more sane way to do a table copy, or should I have just dropped all the indices from the Messages table and loaded into that? Thanks!
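[Editor's note: `SELECT (path, msgid)` with parentheses builds a single composite-row value rather than two columns, which by itself can make this INSERT fail. A hedged sketch of a friendlier bulk copy follows; the constraint names are PostgreSQL's defaults and may differ on the actual table:]

```sql
BEGIN;

-- Unique constraints and indexes are maintained row by row during the
-- load; dropping them first and rebuilding once afterwards is usually
-- much faster for tens of millions of rows.
ALTER TABLE Messages DROP CONSTRAINT messages_msgid_key;
ALTER TABLE Messages DROP CONSTRAINT messages_path_key;

-- Note: no parentheses around the select list.
INSERT INTO Messages (path, msgid)
SELECT path, msgid FROM tmpMessages;

-- Rebuild the constraints in one pass at the end.
ALTER TABLE Messages ADD CONSTRAINT messages_msgid_key UNIQUE (msgid);
ALTER TABLE Messages ADD CONSTRAINT messages_path_key UNIQUE (path);

COMMIT;
```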
Re: [PERFORM] partition text/varchar check problem -- solved
Tom Lane wrote:
>> CREATE TABLE data_a (CHECK (name LIKE varchar 'a%')) INHERITS (data);
>> CREATE TABLE data_b (CHECK (name LIKE varchar 'b%')) INHERITS (data);
>
> That's not going to work because the planner is unable to prove
> anything about the behavior of LIKE --- there is nothing in the system
> that offers a relationship between the = operator and the LIKE
> operator. You'll need something like
>
>     CHECK (name >= 'a' AND name < 'b')
>     CHECK (name >= 'b' AND name < 'c')
>
> etc. (These work for a query like WHERE name = 'foo' because the =,
> >=, and < operators are all members of the same btree opclass, so the
> planner knows how to reason about them.)
>
> regards, tom lane

Thank you, it works!

Cheers,
jamcito
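[Editor's note: putting Tom's range CHECKs together with the original setup, a working version of the partition DDL might look like the following sketch, which reuses the names from the original post:]

```sql
CREATE TABLE data (
    identyf bigint,
    name    varchar,
    added   timestamp DEFAULT now()
);

-- Range CHECKs built from btree operators, which the planner can
-- reason about when doing constraint exclusion:
CREATE TABLE data_a (
    CHECK (name >= 'a' AND name < 'b')
) INHERITS (data);

CREATE TABLE data_b (
    CHECK (name >= 'b' AND name < 'c')
) INHERITS (data);

-- With SET constraint_exclusion = on, a query such as
--   EXPLAIN SELECT * FROM data WHERE name = 'aaa';
-- should now scan only the parent table and data_a.
```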
Re: [PERFORM] partition text/varchar check problem
jamcito wrote:
> SET constraint_exclusion=on;
>
> SHOW constraint_exclusion;
>  constraint_exclusion
> ----------------------
>  on
> (1 row)
>
> EXPLAIN SELECT * FROM data WHERE name = 'aaa';
>                              QUERY PLAN
> ---------------------------------------------------------------------
>  Result  (cost=0.00..24.42 rows=7 width=48)
>    ->  Append  (cost=0.00..24.42 rows=7 width=48)
>          ->  Seq Scan on data  (cost=0.00..22.38 rows=5 width=48)
>                Filter: ((name)::text = 'aaa'::text)
>          ->  Seq Scan on data_a data  (cost=0.00..1.02 rows=1 width=23)
>                Filter: ((name)::text = 'aaa'::text)
>          ->  Seq Scan on data_b data  (cost=0.00..1.02 rows=1 width=23)
>                Filter: ((name)::text = 'aaa'::text)
> (8 rows)

Can you show what you get from:

EXPLAIN SELECT * FROM data WHERE name LIKE 'a%'

?

Irek.
Re: [PERFORM] partition text/varchar check problem
Ireneusz Pluta wrote:
> Can you show what you get from:
> EXPLAIN SELECT * FROM data WHERE name LIKE 'a%'

I get:

                             QUERY PLAN
---------------------------------------------------------------------
 Result  (cost=0.00..24.42 rows=8 width=48)
   ->  Append  (cost=0.00..24.42 rows=8 width=48)
         ->  Seq Scan on data  (cost=0.00..22.38 rows=5 width=48)
               Filter: ((name)::text ~~ 'a%'::text)
         ->  Seq Scan on data_a data  (cost=0.00..1.02 rows=2 width=23)
               Filter: ((name)::text ~~ 'a%'::text)
         ->  Seq Scan on data_b data  (cost=0.00..1.02 rows=1 width=23)
               Filter: ((name)::text ~~ 'a%'::text)
(8 rows)

Both partition tables are scanned.

Best,
jamcito
Re: [PERFORM] partition text/varchar check problem
Ireneusz Pluta [EMAIL PROTECTED] writes:
> Can you show what you get from:
> EXPLAIN SELECT * FROM data WHERE name LIKE 'a%'

Won't help. Exact equality of the WHERE condition is useful for partial-index cases, because there the planner needs to prove that the WHERE condition implies the index predicate before it can use the index; and exact equality is certainly sufficient for that. But for constraint exclusion, the problem is to prove that the WHERE condition refutes the constraint, rather than implies it. Knowing that "name LIKE 'a%'" disproves "name LIKE 'b%'" requires more knowledge about LIKE than the planner has got.

regards, tom lane
Re: [PERFORM] Scaling concerns
On Saturday 16 December 2006 18:32, Steinar H. Gunderson wrote:
> Instead, try real queries on real data, and post here if some are too
> slow for you.

To quickly find a subset of the slow queries on your production system, you can use the pgfouine tool:
http://pgfouine.projects.postgresql.org/

If you then want to measure PostgreSQL performance with some different settings and compare them, consider using the tsung tool (and maybe the tsung-ploter companion tool, to graph several benchmarks onto the same charts for comparison):
http://pgfouine.projects.postgresql.org/tsung.html
http://tsung.erlang-projects.org/
http://debian.dalibo.org/unstable/

This latter link also contains a .tar.gz archive of tsung-ploter in case you're not running a Debian system. Its dependencies are python and matplotlib.

Regards,
-- Dimitri Fontaine http://www.dalibo.com/
Re: [PERFORM] Scaling concerns
* tsuraan [EMAIL PROTECTED] [061216 18:26]:
> I'm writing a webmail-type application that is meant to be used in a
> corporate environment. The core of my system is a Postgres database
> that is used as a message header cache. The two (relevant) tables
> being used are pasted into the end of this message. My problem is
> that, as the messages table increases to tens of millions of rows,
> pgsql slows down considerably. Even an operation like select count(*)
> from messages can take minutes, with a totally idle system. Postgres
> seems to be the most scalable Free database out there, so I must be
> doing something wrong.

select count(*) from table is the worst case in PostgreSQL (and in MVCC systems in general, I guess). If you really need to run count(*), you need to think about the required isolation level of these operations and make some aggregate table yourself. (Btw, "select aggregate(*) from bigtable" is something that no database likes; it's just the degree of slowness that sometimes differs.)

For scaling you should consider Slony. Either hang out on #slony on Freenode.net or ask on the mailing list if you have questions.

> As for the most common strategy of having a slower (more rows)
> archival database and a smaller, faster live database, all the clients
> in the company are using their normal corporate email server for
> day-to-day email handling. The webmail is used for accessing email
> that's no longer on the corporate server, so it's not really simple to
> say which emails should be considered live and which are really
> out-of-date.
>
> My postgres settings are entirely default with the exception of
> shared_buffers being set to 40,000 and max_connections set to 400. I'm
> not sure what the meaning of most of the other settings are, so I
> haven't touched them.
>
> The machines running the database servers are my home desktop (a
> dual-core athlon 3200+ with 2GB RAM and a 120GB SATA II drive), and a
> production server with two dual-core Intel

Intel chips => define more.
There are Intel boxes known to have issues under specific load scenarios with PostgreSQL (again, specific versions). To make it funnier, these are really, really hard to track down ;)

> chips, 4 GB RAM, and a RAID 5 array of SATA II drives on a 3Ware 9550
> controller. Both machines are running Gentoo Linux with a 2.6.1x
> kernel, and both exhibit significant performance degradation when I
> start getting tens of millions of records. Any advice would be most
> appreciated. Thanks in advance!

Cluster. One box that applies changes, and multiple boxes that read the data. If you cannot afford multiple boxes from the start, design your application to work with two connections: one connecting as a user with read/write permissions, and one connecting as a user having only select permissions => this way you can later easily add a load balancer to the mix, and use multiple postgres boxes for reading stuff.

Andreas
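[Editor's note: the aggregate table Andreas mentions can be as small as a one-row counter maintained by a trigger. A sketch follows; the function and trigger names are invented here, and note that the single counter row becomes a lock contention point under heavy concurrent inserts:]

```sql
-- One-row side table holding the current count of Messages.
CREATE TABLE messages_count (n bigint NOT NULL);
INSERT INTO messages_count VALUES (0);

CREATE OR REPLACE FUNCTION messages_count_update() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE messages_count SET n = n + 1;
    ELSE  -- DELETE
        UPDATE messages_count SET n = n - 1;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER messages_count_trig
    AFTER INSERT OR DELETE ON Messages
    FOR EACH ROW EXECUTE PROCEDURE messages_count_update();

-- Now "SELECT n FROM messages_count" replaces the full-table count(*).
```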
Re: [PERFORM] Scaling concerns
Tsuraan,

"select count(*) from bigtable" is testing your disk drive speed up till about 300MB/s, after which it is CPU limited in Postgres. My guess is that your system has a very slow I/O configuration, either due to faulty driver/hardware or the configuration.

The first thing you should do is run a simple I/O test on your data directory -- write a file twice the size of memory using dd like this:

  time bash -c "dd if=/dev/zero of=data_directory/bigfile bs=8k count=<2 * memory_size / 8192> && sync"
  time dd if=data_directory/bigfile of=/dev/null bs=8k

Then report the times here.

- Luke

On 12/16/06 9:26 AM, tsuraan [EMAIL PROTECTED] wrote:
> I'm writing a webmail-type application that is meant to be used in a
> corporate environment. The core of my system is a Postgres database
> that is used as a message header cache. The two (relevant) tables
> being used are pasted into the end of this message. My problem is
> that, as the messages table increases to tens of millions of rows,
> pgsql slows down considerably. Even an operation like select count(*)
> from messages can take minutes, with a totally idle system. Postgres
> seems to be the most scalable Free database out there, so I must be
> doing something wrong.
>
> As for the most common strategy of having a slower (more rows)
> archival database and a smaller, faster live database, all the clients
> in the company are using their normal corporate email server for
> day-to-day email handling. The webmail is used for accessing email
> that's no longer on the corporate server, so it's not really simple to
> say which emails should be considered live and which are really
> out-of-date.
>
> My postgres settings are entirely default with the exception of
> shared_buffers being set to 40,000 and max_connections set to 400. I'm
> not sure what the meaning of most of the other settings are, so I
> haven't touched them.
> The machines running the database servers are my home desktop (a
> dual-core athlon 3200+ with 2GB RAM and a 120GB SATA II drive), and a
> production server with two dual-core Intel chips, 4 GB RAM, and a RAID
> 5 array of SATA II drives on a 3Ware 9550 controller. Both machines
> are running Gentoo Linux with a 2.6.1x kernel, and both exhibit
> significant performance degradation when I start getting tens of
> millions of records. Any advice would be most appreciated. Thanks in
> advance!
>
> [table definitions snipped -- identical to those in the original message]