Re: [PERFORM] Huge Data sets, simple queries
On Fri, Jan 27, 2006 at 08:23:55PM -0500, Mike Biamonte wrote:
> This query took 18 hours on PG 8.1 on a Dual Xeon, RHEL3 (2.4 kernel)
> with RAID-10 (15K drives) and 12 GB RAM. I was expecting it to take
> about 4 hours - based on some experience with a similar dataset on a
> different machine (RH9, PG 7.3, Dual Xeon, 4GB RAM, RAID-5 10K drives).
>
> It's difficult to just try various ideas because each attempt takes a
> full day to test. Real experience is needed here!

It seems like you are changing multiple variables at the same time.

First, compare the query plans with EXPLAIN SELECT on both machines to see whether they are significantly different. Your upgrade from 7.3 to 8.1 may have resulted in a less optimal plan.

Second, monitor your IO performance during the query execution, and test it independently of Postgres. Then compare the stats between the two systems.

As a side note, if you have many disks and you are concerned about bottlenecks on read operations, RAID 5 may perform better than RAID 10.

-Mike

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
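A quick way to compare planner behavior across the two servers is to run the same EXPLAIN on both and diff the output. A minimal sketch (the table and column names below are hypothetical, not from the original post):

```sql
-- On each server, capture the plan for the problem query.
-- Plain EXPLAIN is cheap; EXPLAIN ANALYZE actually runs the query.
EXPLAIN
SELECT count(*)
FROM orders o
JOIN line_items li ON li.order_id = o.id
WHERE o.created > '2005-01-01';

-- If the 8.1 plan shows a sequential scan where 7.3 used an index,
-- make sure the planner statistics are current before anything else:
VACUUM ANALYZE orders;
VACUUM ANALYZE line_items;
```

If the plans turn out to differ only in cost estimates, the bottleneck is more likely the IO subsystem than the planner.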
Re: [PERFORM] SAN/NAS options
On Sat, Jan 14, 2006 at 09:37:01PM -0500, Charles Sprickman wrote:
> Following up to myself again...
>
> On Wed, 14 Dec 2005, Charles Sprickman wrote:
>> Hello all,
>>
>> Supermicro 1U w/SCA backplane and 4 bays
>> 2x 2.8 GHz Xeons
>> Adaptec 2015S zero channel RAID card
>
> I don't want to throw away the four machines like that that we have. I
> do want to throw away the ZCR cards... :) If I ditch those, I still
> have a 1U box with a U320 SCSI plug on the back. I'm vaguely
> considering pairing these two devices:
>
> http://www.areca.us/products/html/products.htm
>
> That's an Areca 16-channel SATA II (I haven't even read up on what's
> new in SATA II) RAID controller with an optional U320 SCSI daughter
> card to connect to the host(s).
>
> http://www.chenbro.com.tw/Chenbro_Special/RM321.php

Not sure how significant, but the RM321 backplane claims to support SATA 150 (aka SATA I) only.

-Mike
Re: [PERFORM] fine tuning for logging server
On Wed, Mar 30, 2005 at 08:41:43PM -0600, John Arbash Meinel wrote:
> If all you are doing is append-only logging, the fastest thing is
> probably just a flat file. You could have something that comes along
> later to move it into the database. It doesn't really sound like you
> are using any features a database provides (normalization, foreign
> keys, indexes, etc.).

Here are two ideas that I don't think have been mentioned yet: use COPY to bulk-load the data instead of individual INSERTs. And if you get desperate, you can run pg with fsync=false, since you don't seem to mind having to re-initialize your whole database in the case of an unexpected interruption.

-Mike
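A minimal sketch of the COPY approach (the file path and table definition are hypothetical):

```sql
-- Hypothetical log table; COPY loads the whole flat file in one
-- command, avoiding per-row parse/plan/commit overhead.
CREATE TABLE access_log (
    ts      timestamp,
    client  inet,
    request text
);

-- Server-side bulk load of a tab-separated file:
COPY access_log FROM '/var/log/access.tsv';
```

The fsync=false setting, by contrast, goes in postgresql.conf, not in SQL, and trades crash safety for write throughput.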
Re: [PERFORM] Peformance Tuning Opterons/ Hard Disk Layout
On Wed, Feb 23, 2005 at 02:15:52PM -0500, John Allgood wrote:
> ... using custom scripts. Maybe I have given a better explanation of
> the application. My biggest concern is how to partition the shared
> storage for maximum performance. Is there a real benefit to having
> more than one RAID 5 partition, or am I wasting my time?

I think the simplest and most generic solution would be to put the OS and pg_xlog on a RAID 1 pair and dedicate the rest of the drives to a RAID 5 or RAID 1+0 (striped set of mirrors) array.

Depending on the nature of your work, you may get better performance by placing individual tables/indices on dedicated spindles for parallel access.

-Mike Adler
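The usual way to put pg_xlog on its own array is a symlink from the data directory. A rough sketch with hypothetical paths (stop the postmaster first when doing this on a real cluster; the /tmp paths below just stand in for the real mount points):

```shell
# Relocate pg_xlog onto a dedicated RAID 1 volume via a symlink.
DATADIR=/tmp/pgdata-demo          # stands in for the real data directory
XLOGVOL=/tmp/raid1-vol            # stands in for the RAID 1 mount point

mkdir -p "$DATADIR/pg_xlog" "$XLOGVOL"
mv "$DATADIR/pg_xlog" "$XLOGVOL/pg_xlog"
ln -s "$XLOGVOL/pg_xlog" "$DATADIR/pg_xlog"

# The data directory now points at the dedicated volume:
ls -ld "$DATADIR/pg_xlog"
```

This keeps sequential WAL writes off the spindles doing random data-file IO, which is the main point of the split.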
Re: [PERFORM] Low Performance for big hospital server ..
On Sun, Jan 02, 2005 at 09:54:32AM +0700, [EMAIL PROTECTED] wrote:
> postgresql 7.3.2-1 with RH 9 on a machine of 2 Xeon 3.0 GHz and RAM
> of 4 GB.

You may want to try disabling hyperthreading, if you don't mind rebooting.

> grew up to 3.5 GB and there were more than 160 concurrent connections.

Looks like your growing dataset won't fit in your OS disk cache any longer. Isolate your most problematic queries and check out their query plans. I bet you have some sequential scans that used to read from cache but now need to read the disk. An index may help you. More RAM wouldn't hurt. =)

-Mike Adler
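The "isolate your most problematic queries" step can be sketched like this (table and column names are hypothetical):

```sql
-- A "Seq Scan" node in the plan of a frequent query is the thing to
-- look for once the working set outgrows the OS disk cache:
EXPLAIN ANALYZE
SELECT * FROM admissions WHERE patient_id = 12345;

-- If the plan shows a sequential scan, an index turns the full-table
-- read into a few page fetches:
CREATE INDEX admissions_patient_id_idx ON admissions (patient_id);
ANALYZE admissions;
```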
Re: [PERFORM] PG Logging is Slow
On Mon, Dec 20, 2004 at 03:17:11PM +1100, Theo Galanakis wrote:
> Under postgres 7.3 logging is incredibly slow! I have applied the
> following settings:
>
>   syslog = 2
>   syslog_facility = 'LOCAL0'
>   syslog_ident = 'postgres'
>   log_connections = true
>   log_duration = true
>   log_pid = true
>   log_statement = true
>   log_timestamp = true
>
> This severely impacted the performance of our production system. A
> search page which took 1-3 seconds now takes over 30. Is this normal?
> I need to get some performance indicators from our production db,
> however I can't turn on logging with such performance degradation.

I've experienced this problem many times, due to hanging DNS lookups. /etc/resolv.conf may point to a nonexistent nameserver. Comment it out and restart syslogd. Or use a syslog implementation that allows you to disable DNS lookups. Or just give the nameserver a kick.

-Mike Adler
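If a dead nameserver really is the culprit, the fix is a one-line change. A sketch, using a hypothetical (unreachable) nameserver address:

```
# /etc/resolv.conf
# syslogd may try a reverse lookup per message; if this server is down,
# each lookup hangs until timeout. Comment it out and restart syslogd:
#nameserver 192.0.2.53
```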
Re: [PERFORM] memcached and PostgreSQL
On Wed, Nov 17, 2004 at 09:13:09AM -0800, Darcy Buskermolen wrote:
> On November 16, 2004 08:00 pm, Michael Adler wrote:
>> http://pugs.postgresql.org/sfpug/archives/21.html
>>
>> I noticed that some of you left coasters were talking about memcached
>> and pgsql. I'm curious to know what was discussed.
>
> Have a look at the pdf presentation found on the following site:
>
> http://people.freebsd.org/~seanc/pgmemcache/

Thanks for that.

That presentation was rather broad and the API seems rather general-purpose, but I wonder why you would really want to access the cache by way of the DB. If one major point of memcached is to allocate RAM to a low-overhead server instead of to the RDBMS's disk cache, why would you add the overhead of the RDBMS to the process? (This is a bit of a straw man, but I'm just trying to flesh out the pros and cons.)

Still, it seems like a convenient way to maintain cache coherency, assuming that your application doesn't already have a clean way to do that.

(just my uninformed opinion, though...)

-Mike
[PERFORM] memcached and PostgreSQL
http://pugs.postgresql.org/sfpug/archives/21.html

I noticed that some of you left coasters were talking about memcached and pgsql. I'm curious to know what was discussed.

In reading about memcached, it seems that many people are using it to circumvent the scalability problems of MySQL (lack of MVCC). From their site:

> Shouldn't the database do this?
>
> Regardless of what database you use (MS-SQL, Oracle, Postgres,
> MySQL-InnoDB, etc..), there's a lot of overhead in implementing ACID
> properties in a RDBMS, especially when disks are involved, which means
> queries are going to block. For databases that aren't ACID-compliant
> (like MySQL-MyISAM), that overhead doesn't exist, but reading threads
> block on the writing threads. memcached never blocks.

So what does memcached offer pgsql users? It would still seem to offer the benefit of a multi-machined cache.

-Mike
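The multi-machine cache typically gets used in a read-through ("cache-aside") pattern. A minimal sketch of that pattern, with a plain dict standing in for the memcached client and a hypothetical fetch_from_db() standing in for a real query:

```python
# Cache-aside pattern: check the cache first, fall back to the database
# on a miss, then populate the cache for the next reader.
cache = {}

def fetch_from_db(key):
    # Hypothetical stand-in for an expensive database query.
    return "row-for-%s" % key

def get(key):
    if key in cache:               # cache hit: no database work at all
        return cache[key]
    value = fetch_from_db(key)     # cache miss: query, then populate
    cache[key] = value
    return value

get("user:42")   # first call misses and hits the "database"
get("user:42")   # second call is served from memory
```

The coherency question raised above is exactly the hard part of this pattern: the application must invalidate or update cache entries when the underlying rows change.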
Re: [PERFORM] Excessive context switching on SMP Xeons
On Thu, Oct 07, 2004 at 11:48:41AM -0400, Bill Montgomery wrote:
> Alan Stange wrote:
>
> The same test on a Dell PowerEdge 1750, Dual Xeon 3.2 GHz, 512k cache,
> HT on, Linux 2.4.21-20.ELsmp (RHEL 3), 4GB memory, pg 7.4.5:
>
> Far less performance than the Dual Opterons with a low number of
> clients, but the gap narrows as the number of clients goes up. Anyone
> smarter than me care to explain?

You'll have to wait for someone smarter than you, but I will posit this: did you use a tmpfs filesystem like Alan? You didn't mention either way. Alan did that as an attempt to remove IO as a variable.

-Mike
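For reference, the tmpfs trick removes the IO subsystem from the benchmark entirely, so any remaining difference is CPU and lock behavior. A rough sketch (run as root; paths are hypothetical, and everything in tmpfs is lost at unmount, so this is for benchmarking only, never real data):

```shell
# Build a throwaway cluster on a RAM-backed filesystem.
mkdir -p /mnt/pgtmp
mount -t tmpfs -o size=1g tmpfs /mnt/pgtmp
chown postgres /mnt/pgtmp
su - postgres -c 'initdb -D /mnt/pgtmp/data'
su - postgres -c 'pg_ctl -D /mnt/pgtmp/data start'
# ... run the same pgbench workload against this cluster ...
```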
Re: [PERFORM] [HACKERS] spurious function execution in prepared statements.
On Thu, Sep 30, 2004 at 09:45:51AM -0400, Merlin Moncure wrote:
> Now, if the same query is executed as a prepared statement,
>
>   prepare ps(...) as select f(t.c) from t where [expr] limit 1;
>   execute ps;
>
> now, if ps ends up using an index scan on t, everything is ok.
> However, if ps does a seqscan, f executes for every row on t examined
> until the [expr] criteria is met. Is this a bug? If necessary I should
> be able to set up a reproducible example. The easy workaround is to
> not use prepared statements in these situations, but I need to be able
> to guarantee that f only executes once (even if that means exploring
> subqueries).

Here's another workaround that may let you use a prepared statement:

  prepare ps(...) as
    select f(c) from (select c from t where [expr] limit 1) as t1;

The subquery applies the limit before f is evaluated, so f runs on at most one row regardless of whether the outer scan is an index scan or a seqscan.

-Mike
Re: [PERFORM] Caching of Queries (now with pgpool)
On Thu, Sep 23, 2004 at 09:23:51PM -0400, Jason Coene wrote:
> I ran some previous queries to get pgpool to pre-establish all the
> connections, and ab ran for a few minutes (with one query per page,
> eek!). It was still exhibiting the same problems as before. While so
> many new connections at once can surely make the problem worse (and
> pgpool will surely help there), shouldn't this prove that it's not the
> only issue? Any ideas?

Now that your connections are persistent, you may benefit from using PREPAREd queries.

-Mike
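With persistent connections, a statement can be prepared once per connection and executed cheaply on every subsequent page view. A minimal sketch (statement, table, and column names are hypothetical):

```sql
-- Done once per pooled connection:
PREPARE page_query (int) AS
  SELECT title, body FROM stories WHERE id = $1;

-- Each page view then skips parse/plan overhead:
EXECUTE page_query(42);
```

Note that PREPARE is per-session, so this only pays off because pgpool keeps the sessions alive between requests.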
Re: [PERFORM] Performance over a LAN
On Fri, Jul 23, 2004 at 03:20:54PM +0930, William Carney wrote:
> But with the server running on one machine and the client running on
> another, the two machines being connected by a 100 Mb ethernet, with
> nothing else on the network, this test takes 17 minutes to run. I have
> tried changing the frequency of COMMIT operations, but with only a
> small effect. The machines used are P4s running FreeBSD 5.2.1. The
> Postgres version is 7.4.3. Can anyone tell me why there's such a big
> difference?

Can you reproduce this problem in a tiny test case? If your application is doing other networky things (e.g. many name resolutions that hang for 10 seconds each), they may be slowing down the PostgreSQL work. Just a WAG.

-mike
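One quick way to rule name resolution in or out is to time a lookup directly on the client machine; a lookup that takes seconds rather than milliseconds points at DNS, not PostgreSQL. A small sketch (the host name to test is whatever your connection string uses; "localhost" here is just a placeholder):

```python
import socket
import time

def resolve_time(host):
    # Time a single name resolution, the same call the client library
    # makes before it can open a connection.
    start = time.time()
    socket.getaddrinfo(host, None)
    return time.time() - start

elapsed = resolve_time("localhost")
print("resolved in %.3f s" % elapsed)
```

If this reports multi-second times for the server's hostname, fixing /etc/resolv.conf (or using an IP address in the connection string) is a cheaper first step than tuning Postgres.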
Re: [PERFORM] string casting for index usage
On Thu, Mar 18, 2004 at 03:39:12PM -0500, Tom Lane wrote:
> Michael Adler <[EMAIL PROTECTED]> writes:
>> In porting an application from v7.2 and v7.3, I noticed that a join
>> on a varchar column and a text column was ignoring indices that were
>> helpful in v7.2. When I explicitly cast the text to a varchar (or set
>> ENABLE_SEQSCAN TO false) the index is scanned and it works as
>> efficiently as in v7.2.
>
> Maybe you should be moving to 7.4, instead.

That's a fair suggestion, but it's not practical for our 75 sites, most without decent network access. If this is in fact addressed in newer releases, then my point is mostly inconsequential. We use Debian stable (7.2.1-2woody4) and Debian testing (7.3.4-9).

> A desultory test didn't show any difference between 7.2.4 and 7.3.6 in
> this respect, however. Perhaps you forgot to ANALYZE yet in the new
> database?

I have a test with sample data and queries to demonstrate what I'm seeing. I hope it is useful. Having to do manual casts is not cruel and unusual, but it's not encouraging to see performance go down after an upgrade. If anyone has any clever solutions, let me know.

tables, data, and queries:
http://www.panix.com/~adler/manual-cast-for-index-scan.sql

my test output:
http://www.panix.com/~adler/manual-cast-for-index-scan_7.3.4-9.out
http://www.panix.com/~adler/manual-cast-for-index-scan_7.2.1-2woody4.out

(The times are not horrific in these specific examples, but the sequential scan makes them unscalable.)
manual-cast-for-index-scan_7.3.4-9.out:

DROP TABLE t1;
DROP TABLE
DROP TABLE t2;
DROP TABLE
CREATE TABLE t1 ( key_col text, grp text );
CREATE TABLE
COPY t1 FROM stdin;
CREATE UNIQUE INDEX tempindex1 ON t1 USING btree (key_col);
CREATE INDEX
CREATE TABLE t2 ( item_num character varying(5), key_col character varying(14) );
CREATE TABLE
COPY t2 FROM stdin;
CREATE INDEX tempindex2 ON t2 USING btree (key_col);
CREATE INDEX
VACUUM ANALYZE;
VACUUM
SELECT version();
 PostgreSQL 7.3.4 on i386-pc-linux-gnu, compiled by GCC i386-linux-gcc (GCC) 3.3.2 (Debian)

EXPLAIN ANALYZE SELECT item_num, t1.key_col FROM t1 LEFT JOIN t2 ON (t2.key_col = t1.key_col) WHERE grp = '24';
 Nested Loop  (cost=0.00..23803.27 rows=194 width=31) (actual time=20.95..1401.46 rows=69 loops=1)
   Join Filter: (("inner".key_col)::text = "outer".key_col)
   ->  Seq Scan on t1  (cost=0.00..492.94 rows=194 width=18) (actual time=0.32..30.27 rows=69 loops=1)
         Filter: (grp = '24'::text)
   ->  Seq Scan on t2  (cost=0.00..66.87 rows=4287 width=13) (actual time=0.01..12.06 rows=4287 loops=69)
 Total runtime: 1401.73 msec

EXPLAIN ANALYZE SELECT item_num, t1.key_col FROM t1 LEFT JOIN t2 ON (t2.key_col::text = t1.key_col) WHERE grp = '24';
 Nested Loop  (cost=0.00..23803.27 rows=194 width=31) (actual time=20.27..1398.82 rows=69 loops=1)
   Join Filter: (("inner".key_col)::text = "outer".key_col)
   ->  Seq Scan on t1  (cost=0.00..492.94 rows=194 width=18) (actual time=0.26..25.91 rows=69 loops=1)
         Filter: (grp = '24'::text)
   ->  Seq Scan on t2  (cost=0.00..66.87 rows=4287 width=13) (actual time=0.01..12.02 rows=4287 loops=69)
 Total runtime: 1399.08 msec

EXPLAIN ANALYZE SELECT item_num, t1.key_col FROM t1 LEFT JOIN t2 ON (t2.key_col = t1.key_col::varchar(24)) WHERE grp = '24';
 Nested Loop  (cost=0.00..4819.13 rows=194 width=31) (actual time=0.52..27.46 rows=69 loops=1)
   ->  Seq Scan on t1  (cost=0.00..492.94 rows=194 width=18) (actual time=0.27..25.94 rows=69 loops=1)
         Filter: (grp = '24'::text)
   ->  Index Scan using tempindex2 on t2  (cost=0.00..22.17 rows=12 width=13) (actual time=0.01..0.01 rows=0 loops=69)
         Index Cond: (t2.key_col = ("outer".key_col)::character varying(24))
 Total runtime: 27.70 msec

manual-cast-for-index-scan_7.2.1-2woody4.out:

DROP TABLE t1;
DROP
DROP TABLE t2;
DROP
CREATE TABLE t1 ( key_col text, grp text );
CREATE
COPY t1 FROM stdin;
CREATE UNIQUE INDEX tempindex1 ON t1 USING btree (key_col);
CREATE
CREATE TABLE t2 ( item_num character varying(5), key_col character varying(14) );
CREATE
COPY t2 FROM stdin;
CREATE INDEX tempindex2 ON t2 USING btree (key_col);
CREATE
VACUUM ANALYZE;
VACUUM
SELECT version();
 PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.95.4

EXPLAIN ANALYZE SELECT item_num, t1.key_col FROM t1 LEFT JOIN t2 ON (t2.key_col = t1.key_col) WHERE grp = '24';
psql:castedneed.sql:29127: NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..1405.88 rows=204 width=32) (actual time=0.46..40.60 rows=69 loops=1)
  ->  Seq Scan on t1  (cost=0.00..505.94 rows=204 width=18) (actual time=0.35..39.09 rows=69 loops=1)
  ->  Index Scan using tempindex2 on t2  (cost=0.00..4.27 rows=11 width=14) (actual time=0.01..0.01 rows=0 loops=69)
Total runtime: 40.81 msec

EXPLAIN
EXPLAIN ANALYZE SELECT item_num, t1.key_col FROM t1 LEFT JOIN t2 ON (t2.key_col::text = t1.key_col) WHERE grp = '24';
psql:castedneed.sql:29128: NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..1405.88 rows=204 width
Re: [PERFORM] inferior SCSI performance
On Wed, 17 Sep 2003, Tom Lane wrote:
> Michael Adler <[EMAIL PROTECTED]> writes:
>> I have been experimenting with a new Seagate Cheetah 10k-RPM SCSI
>> drive to compare with a cheaper Seagate Barracuda 7200-RPM IDE drive
>> (each in a single-drive configuration). The Cheetah definitely
>> dominates the generic IO tests such as bonnie++, but fares poorly
>> with pgbench (and other postgresql operations).
>
> It's fairly common for ATA drives to be configured to lie about write
> completion (ie, claim write-complete as soon as data is accepted into
> their onboard RAM buffer), whereas SCSI drives usually report write
> complete only when the data is actually down to disk. The performance
> differential may thus be coming at the expense of reliability. If you
> run Postgres with fsync off, does the differential go away?

Yes, they both perform equally at about 190 tps with fsync off.

The culprit turns out to be write caching on the IDE drive. It is enabled by default, but can be disabled with "hdparm -W0 /dev/hdx". After it is disabled, the tps are proportional to rpms.

There's a (2001) Linux kernel thread on this if anyone is interested:
http://www.ussg.iu.edu/hypermail/linux/kernel/0103.0/0331.html

So the question is whether it is ever sensible to use write caching and expect comparable persistence.

Thanks,
Michael Adler
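The "tps proportional to rpms" observation follows from rotational latency: with the write cache disabled, each synchronous commit must wait for the WAL write to actually reach the platter, which caps a single-client workload at roughly one commit per disk revolution. A back-of-the-envelope sketch (a simplification: it ignores seek time, command overhead, and multiple clients committing together):

```python
def max_fsync_tps(rpm):
    # With write caching off, the ceiling is about one committed
    # transaction per disk revolution: rpm revolutions per minute,
    # divided by 60 seconds.
    return rpm / 60.0

print(max_fsync_tps(7200))   # 7200-RPM Barracuda: ~120 tps ceiling
print(max_fsync_tps(10000))  # 10k-RPM Cheetah: ~167 tps ceiling
```

That both drives hit ~190 tps with fsync off shows how far above this rotational ceiling a lying write cache can push the numbers.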