[PERFORM] Samsung 32GB SATA SSD tested
For background, please read the thread "Fusion-io ioDrive", archived at
http://archives.postgresql.org/pgsql-performance/2008-07/msg00010.php

To recap, I tested an ioDrive versus a 6-disk RAID with pgbench on an
ordinary PC. I now also have a 32GB Samsung SATA SSD, and I have tested it
in the same machine with the same software and configuration. I tested it
connected to the NVIDIA CK804 SATA controller on the motherboard, and as a
pass-through disk on the Areca RAID controller, with write-back caching
enabled.

                                  Service Time Percentile, millis
              R/W TPS   R-O TPS   50th   80th   90th   95th
 RAID           182       673      18     32     42     64
 Fusion         971      4792       8      9     10     11
 SSD+NV         442      4399      12     18     36     43
 SSD+Areca      252      5937      12     15     17     21

As you can see, there are tradeoffs. The motherboard's SATA ports are
substantially faster on the TPC-B type of workload. This little, cheap SSD
achieves almost half the performance of the ioDrive (i.e. similar
performance to a 50-disk SAS array). The RAID controller does a better job
on the read-only workload, surpassing the ioDrive by 20%.

Strangely, the RAID controller behaves badly on the TPC-B workload. It is
faster than disk, but not by a lot, and it's much slower than the other
flash configurations. The read/write benchmark did not vary when changing
the number of clients between 1 and 8. I suspect this is some kind of
problem with Areca's kernel driver or firmware. On the bright side, the
Samsung+Areca configuration offers excellent service time distribution,
comparable to that achieved by the ioDrive. Using the motherboard's SATA
ports gave service times comparable to the disk RAID.

The performance is respectable for a $400 device. You get about half the
tps and half the capacity of the ioDrive, but for one fifth the price and
in the much more convenient SATA form factor.

Your faithful investigator,
jwb

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
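For reference, a sketch of how a service-time table like the one above is
derived from per-transaction latency logs (the sample values and the
nearest-rank method below are illustrative; they are not the actual
measurements or pgbench's own reporting code):

```python
# Sketch: summarizing per-transaction service times into percentiles, as in
# the table above. The latency samples here are invented, not measured data.
def percentile(samples, p):
    s = sorted(samples)
    # nearest-rank method: the smallest value with at least p% of the
    # samples at or below it
    k = max(0, int(round(p / 100.0 * len(s))) - 1)
    return s[k]

latencies_ms = [8, 9, 10, 11, 12, 15, 18, 21, 25, 40]
for p in (50, 80, 90, 95):
    print(p, percentile(latencies_ms, p))
```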
Re: [PERFORM] Storing Digital Video
On Tue, 2006-01-31 at 16:32 -0800, Rodrigo Madera wrote:
> I am concerned with performance issues involving the storage of DV on
> a database. I thought of some options, which would be the most advised
> for speed?
>
> 1) Pack N frames inside a container and store the container to the db.
> 2) Store each frame in a separate record in the table "frames".
> 3) (type something here)

How about some more color? _Why_, for example, would you store video in a
relational database?

-jwb

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Huge Data sets, simple queries
On Tue, 2006-01-31 at 21:53 -0800, Luke Lonergan wrote:
> Jeffrey,
>
> On 1/31/06 8:09 PM, "Jeffrey W. Baker" [EMAIL PROTECTED] wrote:
>> ... Prove it.

I think I've proved my point. Software RAID1 read balancing provides 0%,
300%, 100%, and 100% speedup on 1, 2, 4, and 8 threads, respectively. In
the presence of random I/O, the results are even better. Anyone who thinks
they have a single-threaded workload has not yet encountered the
autovacuum daemon.

> Good data - interesting case. I presume from your results that you had
> to make the I/Os non-overlapping (the skip option to dd) in order to
> get the concurrent access to work. Why the particular choice of offset
> - 3.2GB in this case?

No particular reason. 8k x 10 is what the last guy used upthread.

> So - the bandwidth doubles in specific circumstances under concurrent
> workloads - not relevant to "Huge Data sets, simple queries", but
> possibly helpful for certain kinds of OLTP applications.

Ah, but someday Pg will be able to concurrently read from two datastreams
to complete a single query. And that day will be glorious and fine, and
you'll want as much disk concurrency as you can get your hands on.

-jwb
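The concurrent non-overlapping dd read test being discussed has roughly
this shape; the sketch below runs against a scratch file rather than the
original raw RAID device, and the file name, size, and offsets are
illustrative, not the ones from the thread:

```shell
# Shape of the concurrent-read test discussed above, against a scratch
# file instead of a raw /dev/md* device (names and sizes are made up).
dd if=/dev/zero of=/tmp/scratch bs=8k count=2048 2>/dev/null   # 16MB test file

# Two readers at non-overlapping offsets (the "skip" option), run
# concurrently; on real RAID1 the kernel can balance them across mirrors:
dd if=/tmp/scratch of=/dev/null bs=8k skip=0    count=1024 2>/dev/null &
dd if=/tmp/scratch of=/dev/null bs=8k skip=1024 count=1024 2>/dev/null &
wait
echo "both readers finished"
```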
Re: [PERFORM] Index Usage using IN
On Thu, 2006-02-02 at 09:12 +1300, Ralph Mason wrote:
> Hi,
>
> I have 2 tables both have an index on ID (both ID columns are an oid).
>
> I want to find only rows in one and not the other.
>
> Select ID from TableA where ID not IN ( Select ID from Table B)

Have you considered this:

SELECT ID FROM TableA EXCEPT SELECT ID FROM TableB

?

-jwb
Re: [PERFORM] Index Usage using IN
On Wed, 2006-02-01 at 12:22 -0800, Jeffrey W. Baker wrote:
> On Thu, 2006-02-02 at 09:12 +1300, Ralph Mason wrote:
>> Hi,
>>
>> I have 2 tables both have an index on ID (both ID columns are an oid).
>>
>> I want to find only rows in one and not the other.
>>
>> Select ID from TableA where ID not IN ( Select ID from Table B)
>
> Have you considered this:
>
> SELECT ID FROM TableA EXCEPT SELECT ID FROM TableB

Alternately:

SELECT a.ID
  FROM TableA AS a
  LEFT JOIN TableB AS b ON a.ID = b.ID
 WHERE b.ID IS NULL

-jwb
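The three anti-join spellings in this thread (NOT IN, EXCEPT, and the LEFT
JOIN) can be compared on a toy dataset. This sketch uses sqlite3 as a
stand-in for Postgres, with invented table contents:

```python
import sqlite3

# Toy comparison of the three anti-join spellings discussed above, using
# sqlite3 as a stand-in for Postgres (table contents are invented).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE TableA (ID INTEGER PRIMARY KEY);
    CREATE TABLE TableB (ID INTEGER PRIMARY KEY);
    INSERT INTO TableA VALUES (1), (2), (3), (4);
    INSERT INTO TableB VALUES (2), (4);
""")

queries = {
    "not_in":    "SELECT ID FROM TableA WHERE ID NOT IN (SELECT ID FROM TableB)",
    "except":    "SELECT ID FROM TableA EXCEPT SELECT ID FROM TableB",
    "left_join": """SELECT a.ID FROM TableA a
                    LEFT JOIN TableB b ON a.ID = b.ID
                    WHERE b.ID IS NULL""",
}
for name, sql in queries.items():
    rows = sorted(r[0] for r in con.execute(sql))
    print(name, rows)  # all three yield [1, 3] here
```

One caveat worth remembering: NOT IN behaves differently when TableB.ID
contains NULLs (it then returns no rows), which is one reason the EXCEPT
and anti-join forms are often preferred.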
Re: [PERFORM] Huge Data sets, simple queries
On Sun, 2006-01-29 at 13:44 -0500, Luke Lonergan wrote:
>> [mailto:[EMAIL PROTECTED] On Behalf Of hubert depesz lubaczewski
>> Sent: Sunday, January 29, 2006 3:25 AM
>>
>> hmm .. do i understand correctly that you're suggesting that using
>> raid 10 and/or hardware raid adapter might hurt disc subsystem
>> performance? could you elaborate on the reasons, please? it's not
>> that i'm against the idea - i'm just curious as this is very
>> against-common-sense. and i always found it interesting when somebody
>> states something that uncommon...
>
> Oh - and about RAID 10 - for large data work it's more often a waste
> of disk performance-wise compared to RAID 5 these days. RAID5 will
> almost double the performance on a reasonable number of drives.

I think you might want to be more specific here. I would agree with you
for data warehousing, decision support, data mining, and similar
read-mostly non-transactional loads. For transactional loads RAID-5 is,
generally speaking, a disaster due to the read-before-write problem.

While we're on the topic, I just installed another one of those Areca
ARC-1130 controllers with 1GB cache. It's ludicrously fast: 250MB/sec
burst writes, CPU-limited reads. I can't recommend them highly enough.

-jwb

PS: Could you look into fixing your mailer? Your messages sometimes don't
contain In-Reply-To headers, and therefore don't thread properly.
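The read-before-write problem can be made concrete with a little
arithmetic. This is a simplified model (it ignores controller caching and
full-stripe writes) of the physical I/Os behind one small random write:

```python
# Simplified model of the RAID-5 small-write penalty mentioned above
# (ignores controller write-back caching and full-stripe writes).
def physical_ios_per_small_write(level):
    if level == "raid10":
        return 2   # write the data block to both mirrors
    if level == "raid5":
        # read old data + read old parity (to recompute parity),
        # then write new data + write new parity
        return 4
    raise ValueError(level)

print(physical_ios_per_small_write("raid10"))  # → 2
print(physical_ios_per_small_write("raid5"))   # → 4
```

This 4-vs-2 ratio is why RAID-5 can look fine on read-mostly warehouse
loads yet fall apart under a transactional write stream.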
Re: [PERFORM] Huge Data sets, simple queries
On Sat, 2006-01-28 at 10:55 -0500, Tom Lane wrote:
> Assuming that "month" means what it sounds like, the above would
> result in running twelve parallel sort/uniq operations, one for each
> month grouping, to eliminate duplicates before counting. You've got
> sortmem set high enough to blow out RAM in that scenario ...

Hrmm, why is it that with a similar query I get a far simpler plan than
you describe, and relatively snappy runtime?

select date
     , count(1) as nads
     , sum(case when premium then 1 else 0 end) as npremium
     , count(distinct(keyword)) as nwords
     , count(distinct(advertiser)) as nadvertisers
  from data
 group by date
 order by date asc

                                QUERY PLAN
---------------------------------------------------------------------------
 GroupAggregate  (cost=0.00..14452743.09 rows=721 width=13)
   ->  Index Scan using data_date_idx on data
         (cost=0.00..9075144.27 rows=430206752 width=13)
(2 rows)

=# show server_version;
 server_version
----------------
 8.1.2
(1 row)

-jwb
Re: [PERFORM] Huge Data sets, simple queries
On Fri, 2006-01-27 at 20:23 -0500, Mike Biamonte wrote:
> Does anyone have any experience with extremely large data sets? I mean
> hundreds of millions of rows.

Sure, I think more than a few of us do. Just today I built a summary
table from a 25GB primary table with ~430 million rows. This took about
45 minutes.

> The queries I need to run on my 200 million transactions are
> relatively simple:
>
>    select month, count(distinct(cardnum)) count(*), sum(amount) from
>    transactions group by month;
>
> This query took 18 hours on PG 8.1 on a Dual Xeon, RHEL3, (2.4 Kernel)
> with RAID-10 (15K drives) and 12 GB Ram. I was expecting it to take
> about 4 hours - based on some experience with a similar dataset on a
> different machine (RH9, PG7.3 Dual Xeon, 4GB RAM, Raid-5 10K drives)

Possibly the latter machine has a faster I/O subsystem. How large is the
table on disk?

> This machine is COMPLETELY devoted to running these relatively simple
> queries one at a time. (No multi-user support needed!) I've been
> tooling with the various performance settings: effective_cache at 5GB,
> shared_buffers at 2 GB, workmem, sortmem at 1 GB each. (Shared buffers
> puzzles me a bit - my instinct says to set it as high as possible, but
> everything I read says that too high can hurt performance.)
>
> Any ideas for performance tweaking in this kind of application would
> be greatly appreciated. We've got indexes on the fields being grouped,
> and always vacuum analyze after building them.

Probably vacuum makes no difference.

> It's difficult to just try various ideas because each attempt takes a
> full day to test. Real experience is needed here!

Can you send us an EXPLAIN of the query? I believe what you're seeing
here is probably:

Aggregate
+-Sort
  +-Sequential Scan

or perhaps:

Aggregate
+-Index Scan

I have a feeling that the latter will be much faster. If your table has
been created over time, then it is probably naturally ordered by date,
and therefore also ordered by month. You might expect a Sequential Scan
to be the fastest, but the Sort step will be a killer. On the other
hand, if your table is badly disordered by date, the Index Scan could
also be very slow.

Anyway, send us the query plan and also perhaps a sample of vmstat
during the query.

For what it's worth, I have:

 effective_cache_size | 70
 cpu_tuple_cost       | 0.01
 cpu_index_tuple_cost | 0.001
 random_page_cost     | 3
 shared_buffers       | 5
 temp_buffers         | 1000
 work_mem             | 1048576  (for this query only)

And here's a few lines from vmstat during the query:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b  swpd   free   buff    cache  si  so    bi    bo    in   cs  us sy id wa
 2  1    76  43476  94916  7655148   0   0  78800    0  1662  788  68 12  0 20
 1  1    76  45060  91196  7658088   0   0  78028    0  1639  712  71 11  0 19
 2  0    76  44668  87624  7662960   0   0  78924   52  1650  736  69 12  0 19
 2  0    76  45300  83672  7667432   0   0  83536   16  1688  768  71 12  0 18
 1  1    76  45744  80652  7670712   0   0  84052    0  1691  796  70 12  0 17

That's about 80MB/sec sequential input, for comparison purposes.

-jwb
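The 80MB/sec figure falls out of vmstat's "bi" column, which is reported
in 1024-byte blocks per second. A quick sanity check on the sampled
values above:

```python
# Converting vmstat's "bi" column (1024-byte blocks per second) to MB/s,
# using the samples from the run above.
bi_samples = [78800, 78028, 78924, 83536, 84052]
mb_per_sec = [round(bi * 1024 / 1e6, 1) for bi in bi_samples]
print(mb_per_sec)  # hovers around 80 MB/s of sequential input throughout
```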
[PERFORM] Process executing COPY opens and reads every table on the system
I have an instance of PG 7.4 where I would really like to execute some
schema changes, but every schema change is blocked waiting for a process
doing a COPY. That query is:

COPY drill.trades (manager, sec_id, ticker, bridge_tkr, date, type,
short, quantity, price, prin, net_money, factor) TO stdout;

So it's only involved with a single table in a single schema.
Unfortunately, what this process is doing is opening and reading every
table in the database:

# strace -e open,close -p 29859
Process 29859 attached - interrupt to quit
open("/var/lib/postgres/data/base/7932340/2442094542", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.1", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.3", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.5", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.5", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.3", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.6", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.9", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.3", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2414561511", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2426495316", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2426495316", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2414561511", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2426495316", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2426495316", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2414561511", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2429205386", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2429205433", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2426495316", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2414561511", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2429205441", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2414561511", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2426495316", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.3", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.10", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.16", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2023517557.9", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/2298808676/2298808939.10", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/358185104.15", O_RDWR) = 49  close(49) = 0
open("/var/lib/postgres/data/base/7932340/2414561511", O_RDWR) = 49  close(49) = 0
[PERFORM] Invulnerable VACUUM process thrashing everything
A few WEEKS ago, the autovacuum on my instance of pg 7.4 unilaterally
decided to VACUUM a table which has not been updated in over a year and
is more than one terabyte on the disk. Because of the very high
transaction load on this database, this VACUUM has been ruining
performance for almost a month. Unfortunately it seems invulnerable to
killing by signals:

# ps ax | grep VACUUM
15308 ?  D  588:00 postgres: postgres skunk [local] VACUUM
# kill -HUP 15308
# ps ax | grep VACUUM
15308 ?  D  588:00 postgres: postgres skunk [local] VACUUM
# kill -INT 15308
# ps ax | grep VACUUM
15308 ?  D  588:00 postgres: postgres skunk [local] VACUUM
# kill -PIPE 15308
# ps ax | grep VACUUM
15308 ?  D  588:00 postgres: postgres skunk [local] VACUUM

o/~ But the cat came back, the very next day ...

I assume that if I kill this with SIGKILL, that will bring down every
other postgres process, so that should be avoided. But surely there is a
way to interrupt this. If I had some reason to shut down the instance,
I'd be screwed, it seems.

-jwb
Re: [PERFORM] Invulnerable VACUUM process thrashing everything
On Thu, 2005-12-29 at 22:53 +0000, Russ Garrett wrote:
> In my experience a kill -9 has never resulted in any data loss in this
> situation (it will cause postgres to detect that the process died,
> shut down, then recover), and most of the time it only causes a
> 5-10sec outage. I'd definitely hesitate to recommend it in a
> production context though, especially since I think there are some
> known race-condition bugs in 7.4.
>
> VACUUM *will* respond to a SIGTERM, but it doesn't check very often -
> I've often had to wait hours for it to determine that it's been
> killed, and my tables aren't anywhere near 1TB. Maybe this is a place
> where things could be improved...

FWIW, I murdered this process with SIGKILL, and the recovery was very
short.

> Incidentally, I have to kill -9 some of our MySQL instances quite
> regularly because they do odd things. Not something you want to be
> doing, especially when MySQL takes 30mins to recover.

Agreed. After mysql shutdown with MyISAM, all tables must be checked and
usually many need to be repaired. This takes a really long time.

-jwb

> Russ Garrett
> Last.fm Ltd.
> [EMAIL PROTECTED]
>
> Ron wrote:
>> Ick. Can you get users and foreign connections off that machine, lock
>> them out for some period, and renice the VACUUM?
>>
>> Shedding load and keeping it off while VACUUM runs high priority
>> might allow it to finish in a reasonable amount of time. Or shedding
>> load and dropping the VACUUM priority might allow a kill signal to
>> get through.
>>
>> Hope this helps,
>> Ron
>>
>> At 05:09 PM 12/29/2005, Jeffrey W. Baker wrote:
>>> A few WEEKS ago, the autovacuum on my instance of pg 7.4
>>> unilaterally decided to VACUUM a table which has not been updated in
>>> over a year and is more than one terabyte on the disk. Because of
>>> the very high transaction load on this database, this VACUUM has
>>> been ruining performance for almost a month.
>>>
>>> Unfortunately it seems invulnerable to killing by signals:
>>> [ps output and kill -HUP/-INT/-PIPE attempts snipped]
>>>
>>> I assume that if I kill this with SIGKILL, that will bring down
>>> every other postgres process, so that should be avoided. But surely
>>> there is a way to interrupt this. If I had some reason to shut down
>>> the instance, I'd be screwed, it seems.
Re: [PERFORM] opinion on disk speed
On Thu, 2005-12-08 at 11:52 -0500, Vivek Khera wrote:
> I have a choice to make on a RAID enclosure:
>
> 14x 36GB 15kRPM ultra 320 SCSI drives
>
> OR
>
> 12x 72GB 10kRPM ultra 320 SCSI drives
>
> both would be configured into RAID 10 over two SCSI channels using a
> megaraid 320-2x card. My goal is speed. Either would provide more disk
> space than I would need over the next two years. The database does a
> good number of write transactions, and a decent number of sequential
> scans over the whole DB (about 60GB including indexes) for large
> reports.

The STR of the 15k drives is quite a bit higher than the 10k. I'd be
inclined toward the 15k if it doesn't impact the budget.

For the write transactions, the speed and size of the DIMM on that LSI
card will matter the most. I believe the max memory on that adapter is
512MB. These cost so little that it wouldn't make sense to go with
anything smaller.

When comparing the two disks, don't forget to check for supported SCSI
features. In the past I've been surprised that some 10k disks don't
support packetization, QAS, and so forth. All 15k disks seem to support
these.

Don't forget to post some benchmarks when your vendor delivers ;)

-jwb
Re: [HACKERS] [PERFORM] A Better External Sort?
On Wed, 2005-10-05 at 12:14 -0400, Ron Peacetree wrote:
> I've now gotten verification from multiple working DBA's that DB2,
> Oracle, and SQL Server can achieve ~250MBps ASTR (with as much as
> ~500MBps ASTR in setups akin to Oracle RAC) when attached to a decent
> (not outrageous, but decent) HD subsystem...
>
> I've not yet had any RW DBA verify Jeff Baker's supposition that
> ~1GBps ASTR is attainable. Cache based bursts that high, yes. ASTR,
> no.

I find your tone annoying. That you do not have access to this level of
hardware proves nothing, other than pointing out that your repeated
emails on this list are based on supposition.

If you want 1GB/sec STR you need:

1) 1 or more Itanium CPUs
2) 24 or more disks
3) 2 or more SATA controllers
4) Linux

Have fun.

-jwb
Re: [PERFORM] Ultra-cheap NVRAM device
On Mon, 2005-10-03 at 11:15 -0600, Dan Harris wrote:
> On Oct 3, 2005, at 5:02 AM, Steinar H. Gunderson wrote:
>> I thought this might be interesting, not the least due to the
>> extremely low price ($150 + the price of regular DIMMs):
>
> This has been posted before, and the main reason nobody got very
> excited is that:
>
> a) it only uses the PCI bus to provide power to the device, not for
>    I/O
> b) It is limited to SATA bandwidth
> c) The benchmarks did not prove it to be noticeably faster than a
>    good single SATA drive
>
> A few of us were really excited at first too, until seeing the
> benchmarks...

Also, no ECC support. You'd be crazy to use it for anything.

-jwb
Re: [HACKERS] [PERFORM] A Better External Sort?
On Thu, 2005-09-29 at 10:06 -0700, Luke Lonergan wrote:
> Josh,
>
> On 9/29/05 9:54 AM, "Josh Berkus" josh@agliodbs.com wrote:
>> Following an index creation, we see that 95% of the time required is
>> the external sort, which averages 2mb/s. This is with separate drives
>> for the WAL, the pg_tmp, the table and the index. I've confirmed that
>> increasing work_mem beyond a small minimum (around 128mb) had no
>> benefit on the overall index creation speed.
>
> Yp! That about sums it up - regardless of taking 1 or 2 passes through
> the heap being sorted, 1.5 - 2 MB/s is the wrong number.

Yeah, this is really bad ... approximately the speed of GNU sort.

Josh, do you happen to know how many passes are needed in the multiphase
merge on your 60GB table?

Looking through tuplesort.c, I have a couple of initial ideas. Are we
allowed to fork here? That would open up the possibility of using the
CPU and the I/O in parallel. I see that tuplesort.c also suffers from
the kind of postgresql-wide disease of calling all the way up and down a
big stack of software for each tuple individually. Perhaps it could be
changed to work on vectors.

I think the largest speedup will be to dump the multiphase merge and
merge all tapes in one pass, no matter how large M. Currently M is
capped at 6, so a sort of 60GB with 1GB sort memory needs 13 passes over
the tape. It could be done in a single pass heap merge with N*log(M)
comparisons, and, more importantly, far less input and output.

I would also recommend using an external process to asynchronously feed
the tuples into the heap during the merge.

What's the timeframe for 8.2?

-jwb
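The single-pass heap merge proposed above can be sketched in a few lines.
This is purely an illustration (PostgreSQL's tuplesort.c is C and merges
on-disk tapes, not in-memory lists):

```python
import heapq

# Toy sketch of the single-pass heap merge proposed above: merge M sorted
# "tapes" in one pass with ~N*log(M) comparisons, instead of a multiphase
# merge that re-reads and re-writes the data on each of its passes.
def one_pass_merge(tapes):
    # heapq.merge keeps one element per tape in a heap and repeatedly
    # yields the globally smallest item, touching each input exactly once.
    return list(heapq.merge(*tapes))

tapes = [[1, 4, 7], [2, 5, 8], [0, 3, 6, 9]]
print(one_pass_merge(tapes))  # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The I/O saving, not the comparison count, is the point: every extra merge
pass reads and writes the full dataset again.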
Re: [HACKERS] [PERFORM] A Better External Sort?
On Tue, 2005-09-27 at 13:15 -0400, Ron Peacetree wrote:
> That Btree can be used to generate a physical reordering of the data
> in one pass, but that's the weakest use for it. The more powerful uses
> involve allowing the Btree to persist and using it for more efficient
> re-searches or combining it with other such Btrees (either as a step
> in task distribution across multiple CPUs or as a more efficient way
> to do things like joins by manipulating these Btrees rather than the
> actual records.)

Maybe you could describe some concrete use cases. I can see what you are
getting at, and I can imagine some advantageous uses, but I'd like to
know what you are thinking.

Specifically I'd like to see some cases where this would beat sequential
scan. I'm thinking that in your example of a terabyte table with a
column having only two values, all the queries I can think of would be
better served with a sequential scan.

Perhaps I believe this because you can now buy as much sequential I/O as
you want. Random I/O is the only real savings.

-jwb
Re: [PERFORM] Battery Backed Cache for RAID
On Wed, 2005-09-14 at 11:25 -0700, Peter Darley wrote:
> I'm getting a new server for our database, and I have a quick question
> about RAID controllers with a battery backed cache. I understand that
> the cache will allow the cache to be written out if the power fails to
> the box, which allows it to report a write as committed safely when
> it's not actually committed.

Actually the cache will just hold its contents while the power is out.
When the power is restored, the RAID controller will complete the writes
to disk. If the battery does not last through the outage, the data is
lost.

> My question is, if the power goes off, and the drives stop, how does
> the battery backed cache save things out to the dead drives? Is there
> another component that is implied that will provide power to the
> drives that I should be looking into as well?

A UPS would allow you to do an orderly shutdown and write contents to
disk during a power failure. However a UPS can be an extra point of
failure.

-jwb
Re: [PERFORM] Observation about db response time
On Tue, 2005-08-30 at 08:13 -0500, Frank Wiles wrote:
> On Tue, 30 Aug 2005 18:35:30 +0530
> Akshay Mathur [EMAIL PROTECTED] wrote:
>> Hello Friends,
>>
>> We were having a database in pgsql7.4.2. The database was responding
>> very slowly even after full vacuum analyze (select count(*) from
>> some_table_having_18000_records was taking 18 sec). We took a backup
>> of that db and restored it back. Now the same db on the same PC is
>> responding fast (the same query is taking 18 ms). But we can't do the
>> same as a solution for slow response. Has anybody faced a similar
>> problem? Is this due to any internal problem of pgsql? Is there any
>> clue to speed up the database?
>
> This could be because you don't have max_fsm_pages and
> max_fsm_relations set up correctly or are not doing full vacuums often
> enough.
>
> If your database deletes a ton of data as a matter of course then
> sometimes a full vacuum will not clear up as much space as it could.
>
> Try increasing those configuration values and doing vacuums more
> often.
>
> You should also explore upgrading to the latest 8.0 as you will no
> doubt see noticeable speed improvements.

This can also be caused by index bloat. VACUUM does not clear out the
index. You must use REINDEX for that.

-jwb
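The maintenance sequence being discussed can be sketched with sqlite3 as
a stand-in (in Postgres 7.x/8.x the analogous commands are VACUUM and
REINDEX TABLE; the table and index names below are invented):

```python
import sqlite3

# sqlite3 stand-in for the maintenance sequence discussed above; in
# Postgres the analogous commands are VACUUM and REINDEX. Names invented.
con = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
con.execute("CREATE INDEX t_v_idx ON t(v)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [(i, str(i)) for i in range(1000)])
con.execute("DELETE FROM t WHERE id < 900")  # leaves dead entries behind
con.execute("REINDEX t_v_idx")               # rebuild the index compactly
con.execute("VACUUM")                        # reclaim dead table space
print(con.execute("SELECT count(*) FROM t").fetchone()[0])  # → 100
```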
Re: [PERFORM] Limit + group + join
On Thu, 2005-08-25 at 18:56 -0700, Jeffrey W. Baker wrote:
> On Fri, 2005-08-26 at 02:27 +0200, Tobias Brox wrote:
>> Consider this setup - which is a gross simplification of parts of our
>> production system ;-)
>>
>>   create table c (id integer primary key);
>>   create table b (id integer primary key, c_id integer);
>>   create index b_on_c on b(c_id)
>>
>>   insert into c (select ... lots of IDs ...);
>>   insert into b (select id, id from c); /* keep it simple :-) */
>>
>> Now, I'm just interested in some few rows. All those give good plans:
>>
>>   explain select c.id from c order by c.id limit 1;
>>   explain select c.id from c group by c.id order by c.id limit 1;
>>   explain select c.id from c join b on c_id=c.id order by c.id limit 1;
>>
>> ... BUT ... combining join, group and limit makes havoc:
>>
>>   explain select c.id from c join b on c_id=c.id group by c.id
>>     order by c.id desc limit 5;
>
> Where's b in this join clause? It looks like a cartesian product to me.

Nevermind. I read "c_id" as "c.id".

-jwb
Re: [PERFORM] Read/Write block sizes
On Wed, 2005-08-24 at 01:56 -0400, Tom Lane wrote:
> "Jeffrey W. Baker" [EMAIL PROTECTED] writes:
>> On Wed, 2005-08-24 at 17:20 +1200, Guy Thornley wrote:
>>> Don't forget that already in postgres, you have a process per
>>> connection, and all the processes take care of their own I/O.
>>
>> That's the problem. Instead you want 1 or 4 or 10 i/o slaves
>> coordinating the I/O of all the backends optimally. For instance,
>> with synchronous scanning.
>
> And why exactly are we going to do a better job of I/O scheduling than
> the OS itself can do? ... There are some things we could do to reduce
> the impedance between us and the OS --- for instance, the upthread
> criticism that a seqscan asks the OS for only 8K at a time is fair
> enough. But that doesn't translate to a conclusion that we should
> schedule the I/O instead of the OS.

Synchronous scanning is a fairly huge and obvious win. If you have two
processes 180 degrees out-of-phase in a linear read, neither process is
going to get anywhere near the throughput they would get from a single
scan.

I think you're being deliberately obtuse with regards to file I/O and
the operating system. The OS isn't magical. It has to strike a balance
between a reasonable read latency and a reasonable throughput. As far as
the kernel is concerned, a busy postgresql server is indistinguishable
from 100 unrelated activities. All backends will be served equally, even
if in this case "equally" means quite badly all around.

An I/O slave process could be a big win in Postgres for many kinds of
reads. Instead of opening and reading files, the backends would connect
to the I/O slave and request the file be read. If a scan of that file
were already underway, the new backends would be attached. Otherwise a
new scan would commence. In either case, the slave process can issue
(sometimes non-dependent) reads well ahead of the needs of the backend.
You may think the OS can do this for you, but it can't. Only postgres
knows that it needs the whole file from beginning to end. The OS can
only guess.

Ask me sometime about my replacement for GNU sort. It uses the same
sorting algorithm, but it's an order of magnitude faster due to better
I/O strategy. Someday, in my infinite spare time, I hope to demonstrate
that kind of improvement with a patch to pg.

-jwb
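The I/O slave idea described above can be sketched in miniature: one
reader performs the sequential scan and broadcasts each block to every
attached consumer. This is a toy illustration using Python threads and
queues, not the proposed postgres implementation:

```python
import os
import queue
import tempfile
import threading

# Toy sketch of the "I/O slave" idea described above: a single reader does
# the sequential scan and broadcasts each block to every attached backend,
# so the backends share one scan instead of issuing competing reads.
def io_slave(path, consumers, bs=8192):
    with open(path, "rb") as f:
        while True:
            block = f.read(bs)
            for q in consumers:
                q.put(block)        # an empty block signals end-of-scan
            if not block:
                break

def backend(q, results):
    total = 0
    while True:
        block = q.get()
        if not block:
            break
        total += len(block)
    results.append(total)

path = tempfile.mkstemp()[1]
with open(path, "wb") as f:
    f.write(b"x" * 20000)

queues = [queue.Queue() for _ in range(2)]
results = []
backends = [threading.Thread(target=backend, args=(q, results)) for q in queues]
for t in backends:
    t.start()
io_slave(path, queues)              # one physical scan serves both backends
for t in backends:
    t.join()
os.remove(path)
print(sorted(results))              # → [20000, 20000]
```

A real implementation would also need late joiners to be handed the
already-passed portion of the file, which is the interesting part of
synchronized scanning.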
Re: [PERFORM] Read/Write block sizes (Was: Caching by Postgres)
On Tue, 2005-08-23 at 19:12 -0400, Michael Stone wrote:
> On Tue, Aug 23, 2005 at 05:29:01PM -0400, Jignesh Shah wrote:
>> Actually some of that readahead, etc. the OS does already if it does
>> some sort of throttling/clubbing of reads/writes.
>
> Note that I specified the fully cached case--even with the workload in
> RAM the system still has to process a heck of a lot of read calls.
>
>> * Introduce a multiblock or extent tunable variable where you can
>>   define a multiple of 8K (or BlockSize tuneable) to read a bigger
>>   chunk and store it in the bufferpool.. (Maybe writes too) (Most
>>   devices now support up to 1MB chunks for reads and writes)
>
> Yeah. The problem with relying on OS readahead is that the OS doesn't
> know whether you're doing a sequential scan or an index scan; if you
> have the OS aggressively readahead you'll kill your seek performance.
> OTOH, if you don't do readaheads you'll kill your sequential scan
> performance. At the app level you know which makes sense for each
> operation.

This is why we have MADV_RANDOM and MADV_SEQUENTIAL.

-jwb
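Those hints are issued with madvise(2) on a mapped region. A minimal
sketch, assuming Linux and Python 3.8+ (which exposes madvise on mmap
objects); the scratch file is a throwaway example:

```python
import mmap
import os
import tempfile

# Sketch of the access-pattern hints named above: madvise(MADV_SEQUENTIAL)
# tells the kernel aggressive readahead is welcome; MADV_RANDOM tells it
# readahead is wasted. Assumes Linux and Python 3.8+.
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * (1 << 20))          # 1MB scratch file
os.close(fd)

with open(path, "rb") as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    if hasattr(mmap, "MADV_SEQUENTIAL"):
        m.madvise(mmap.MADV_SEQUENTIAL)  # hint: we will scan start to end
    data = m.read(8192)                  # the scan itself
    m.close()

os.remove(path)
print(len(data))  # → 8192
```

An index scan would issue MADV_RANDOM on the same region instead, which
is exactly the per-operation knowledge the application has and the kernel
lacks.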
Re: [PERFORM] Read/Write block sizes
On Tue, 2005-08-23 at 19:31 -0700, Josh Berkus wrote: Steve, I would assume that dbt2 with STP helps minimize the amount of hours someone has to invest to determine performance gains with configurable options? Actually, these I/O operation issues show up mainly with DW workloads, so the STP isn't much use there. If I can ever get some of these machines back from the build people, I'd like to start testing some stuff. One issue with testing this is that currently PostgreSQL doesn't support block sizes above 128K. We've already done testing on that (well, Mark has) and the performance gains aren't even worth the hassle of remembering you're on a different block size (like, +4%). What the Sun people have done with other DB systems is show that substantial performance gains are possible on large databases (100G) using block sizes of 1MB. I believe that's possible (and that it probably makes more of a difference on Solaris than on BSD) but we can't test it without some hackery first. To get decent I/O you need 1MB fundamental units all the way down the stack. You need a filesystem that can take a 1MB write well, and you need an I/O scheduler that will keep it together, and you need a storage controller that can eat a 1MB request at once. Ideally you'd like an architecture with a 1MB page (Itanium has this, and AMD64 Linux will soon have this.) The Lustre people have done some work in this area, opening up the datapaths in the kernel so they can keep the hardware really working. They even modified the QLogic SCSI/FC driver so it supports such large transfers. Their work has shown that you can get significant perf boost on Linux just by thinking in terms of larger transfers. Unfortunately I'm really afraid that this conversation is about trees when the forest is the problem. PostgreSQL doesn't even have an async reader, which is the sort of thing that could double or triple its performance. 
You're talking about block sizes and such, but the kinds of improvements you can get there are in the tens of percent at most.

-jwb

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Read/Write block sizes
On Wed, 2005-08-24 at 17:20 +1200, Guy Thornley wrote:
> As for the async IO, sure you might think 'oh async IO would be so cool!!' and I did, once, too. But then I sat down and _thought_ about it, and decided well, no, actually, there's _very_ few areas it could actually help, and in most cases it just makes it easier to drive your box into lseek()-induced IO collapse. Don't forget that already in postgres, you have a process per connection, and all the processes take care of their own I/O.

That's the problem. Instead you want 1 or 4 or 10 I/O slaves coordinating the I/O of all the backends optimally. For instance, with synchronous scanning.

-jwb

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Query plan looks OK, but slow I/O - settings advice?
On Tue, 2005-08-16 at 10:46 -0700, Roger Hand wrote:
> The disks are ext3 with journalling type of ordered, but this was later changed to writeback with no apparent change in speed. They're on a Dell PowerEdge 6650 with LSI RAID card, set up as follows:
> 4 disks RAID 10 for indexes (145GB) - sdc1
> 6 disks RAID 10 for data (220GB) - sdd1
> 2 mirrored disks for logs - sdb1
> stripe size is 32k
> cache policy: cached I/O (am told the controller has BBU)
> write policy: write-back
> read policy: readahead

I assume you are using Linux 2.6. Have you considered booting your machine with elevator=deadline? You can also change this at runtime using sysfs. These read speeds are not too impressive; perhaps this is a slow controller. Alternately you might need bigger CPUs. There are a lot of possibilities, obviously :) I'd start with the elevator, since that's easily tested.

-jwb

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
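The runtime sysfs change mentioned above looks roughly like this (device name assumed, requires root):

```shell
# Hypothetical sketch: switch one disk to the deadline I/O scheduler
# at runtime. On 2.6 kernels the same choice can be made for all disks
# at boot with the elevator=deadline kernel parameter.
cat /sys/block/sdd/queue/scheduler        # e.g. noop [anticipatory] deadline cfq
echo deadline > /sys/block/sdd/queue/scheduler
cat /sys/block/sdd/queue/scheduler        # deadline now shown in brackets
```

Because the change is per-device and reversible, it is cheap to A/B test against the default anticipatory scheduler before committing to a boot parameter.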
Re: [PERFORM] Need for speed
On Wed, 2005-08-17 at 11:15 +0200, Ulrich Wisser wrote:
> Hello, thanks for all your suggestions. I can see that the Linux system is 90% waiting for disc io. At that time all my queries are *very* slow. My scsi raid controller and disc are already the fastest available.

What RAID controller? Initially you said you have only 2 disks, and since you have your xlog on a separate spindle, I assume you have 1 disk for the xlog and 1 for the data. Even so, if you have a RAID, I'm going to further assume you are using RAID 1, since no sane person would use RAID 0. In those cases you are getting the performance of a single disk, which is never going to be very impressive. You need a RAID. Please be more precise when describing your system to this list.

-jwb

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Need for speed
On Tue, 2005-08-16 at 17:39 +0200, Ulrich Wisser wrote:
> Hello, one of our services is click counting for on line advertising. We do this by importing Apache log files every five minutes. This results in a lot of insert and delete statements. At the same time our customers shall be able to do on line reporting. We have a box with Linux Fedora Core 3, Postgres 7.4.2, Intel(R) Pentium(R) 4 CPU 2.40GHz

This is not a good CPU for this workload. Try an Opteron or Xeon. Also of major importance is the amount of memory. If possible, you would like to have memory larger than the size of your database.

> 2 scsi 76GB disks (15.000RPM, 2ms)

If you decide your application is I/O bound, here's an obvious place for improvement. More disks == faster.

> I did put pg_xlog on another file system on other discs.

Did that have a beneficial effect?

> Still when several users are on line the reporting gets very slow. Queries can take more than 2 min.

Is this all the time or only during the insert?

> I need some ideas how to improve performance in some orders of magnitude. I already thought of a box with the whole database on a ram disc. So really any idea is welcome.

You don't need a RAM disk, just a lot of RAM. Your operating system will cache disk contents in memory if possible. You have a very small configuration, so more CPU, more memory, and especially more disks will probably all yield improvements.

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [SPAM?] Re: [PERFORM] PG8 Tuning
On Fri, 2005-08-12 at 08:47 +, Steve Poe wrote:
> Paul,
> Before I say anything else, one online document which may be of assistance to you is: http://www.powerpostgresql.com/PerfList/
> Some thoughts I have:
> 3) Your shared RAM setting seems overkill to me. Part of the challenge is you're going from 1000 to 262K with no assessment in between. Each situation can be different, but try in the range of 10 - 50K.
> 4) pg_xlog: If your pg_xlog is on a spindle that is *only* for pg_xlog, you're better off.

Like Mr. Stone said earlier, this is pure dogma. In my experience, xlogs on the same volume with data is much faster if both are on battery-backed write-back RAID controller memory. Moving from this situation to xlogs on a single normal disk is going to be much slower in most cases.

-jwb

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Performance problems on 4/8way Opteron (dualcore) HP
On Fri, 2005-07-29 at 10:46 -0700, Josh Berkus wrote:
> Dirk,
> > does anybody have experience with this machine (4x 875 dual core Opteron CPUs)? I'm using dual 275s without problems.
> Nope. I suspect that you may be the first person to report in on dual-cores. There may be special compile issues with dual-cores that we've not yet encountered.

Doubtful. However you could see improvements using recent Linux kernel code. There have been some patches for optimizing scheduling and memory allocations. However, if you are running this machine in 32-bit mode, why did you bother paying $14,000 for your CPUs? You will get FAR better performance in 64-bit mode. 64-bit mode will give you 30-50% better performance on PostgreSQL loads, in my experience. Also, if I remember correctly, the 32-bit x86 kernel doesn't understand Opteron NUMA topology, so you may be seeing poor memory allocation decisions.

-jwb

> We run RHEL 3.0, 32bit and under high load it is a drag. We mostly run memory demanding queries. Context switches are pretty much around 20,000 on the average, no cs spikes when we run many processes in parallel. Actually we only see two processes in running state! When there are only a few processes running context switches go much higher. At the moment we are much slower than with a 4-way Xeon box (DL580).

Um, that was a bit incoherent. Are you seeing a CS storm or aren't you?

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Cheap RAM disk?
On Tue, 2005-07-26 at 11:34 -0500, John A Meinel wrote:
> I saw a review of a relatively inexpensive RAM disk over at anandtech.com, the Gigabyte i-RAM: http://www.anandtech.com/storage/showdoc.aspx?i=2480
> Basically, it is a PCI card which takes standard DDR RAM and has a SATA port on it, so that to the system it looks like a normal SATA drive. The card costs about $100-150, and you fill it with your own RAM, so for a 4GB (max size) disk, it costs around $500. Looking for solid state storage devices, the cheapest I found was around $5k for 2GB.
> Gigabyte claims that the battery backup can last up to 16h, which seems decent, if not really long (the $5k solution has a built-in harddrive so that if the power goes out, it uses the battery power to copy the ramdisk onto the harddrive for more permanent storage).
> Anyway, would something like this be reasonable as a drive for storing pg_xlog?

With 4GB you could have as many as 256 checkpoint segments. I haven't tried this product, but the microbenchmarks seem truly slow. I think you would get a similar benefit by simply sticking a 1GB or 2GB DIMM -- battery-backed, of course -- in your RAID controller.

-jwb

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
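The 256-segment figure follows directly from the WAL segment size; a quick sketch of the arithmetic (assuming the stock 16MB segment size):

```shell
# Sanity check of the "256 checkpoint segments" claim above: each WAL
# segment is 16MB by default, so a 4GB i-RAM holds 4096/16 of them.
device_mb=4096
segment_mb=16
segments=$(( device_mb / segment_mb ))
echo "$segments"   # prints 256
```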
Re: [PERFORM] Huge performance problem between 7.4.1 and 8.0.3 -
On Sun, 2005-07-17 at 21:34 -0600, Robert Creager wrote:
> Sigh... I recently upgraded from 7.4.1 to 8.0.3. The application did not change. I'm now running both databases concurrently (on different ports, same machine) just so I could verify the problem really exists.
> The application is a custom test application for testing mechanical systems. The runs in question (4 at a time) each generate 16 queries at a time, the results of which are sent to the mechanical system, which processes them anywhere from 10 to 120 seconds. The system is capable of completing between 4 and 8 jobs at once. So, once the system is running, at most there will be 8 queries per run simultaneously. The entire database fits into RAM (2GB), as evidenced by no disk activity and relatively small database size. pg_xlog is on different disks from the db.
> The problem is that on version 8.0.3, once I get 3 or more concurrent runs going, the query times start tanking (20 seconds). On 7.4.1, the applications hum along with queries typically below .2 seconds on over 5 concurrent runs. Needless to say, 7.4.1 behaves as expected... The only change between runs is the port connected to. Both DBs are up at the same time. For 8.0.3, pg_autovacuum is running. On 7.4.1, I set up a cron job to vacuum analyze every 5 minutes.
> The system is Mandrake Linux running a 2.4.22 kernel with dual Intel Xeon CPUs with HT enabled. On an 8.0.3 run, the context switching is up around 60k. On 7.4.1, it maxes around 23k and averages 1k.

Did you build 8.0.3 yourself, or install it from packages? I've seen in the past where pg would build with the wrong kind of mutexes on some machines, and that would send the CS through the roof. If you did build it yourself, check your ./configure logs. If not, try strace.

-jwb

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
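A hypothetical way to run those two checks (the source path and backend PID below are made up for illustration, not from the thread):

```shell
# Sketch: look for the spinlock decision in a source build's configure
# log, then count semaphore syscalls from a running backend. If the
# build fell back to SysV semaphores instead of spinlocks, a semop
# storm shows up here.
grep -i spinlock /usr/src/postgresql-8.0.3/config.log
timeout 10 strace -c -p 12345 -e trace=semop   # summary printed on detach
```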
[PERFORM] more filesystem benchmarks
In our last installment, we saw that JFS provides higher pgbench performance than either XFS or ext3. Using a direct-I/O patch stolen from 8.1, JFS achieved 105 tps with 100 clients. To refresh, the machine in question has 5 7200RPM SATA disks, an Areca RAID controller with 128MB cache, and 1GB of main memory. pgbench is being run with a scale factor of 1000 and 10 total transactions.

At the suggestion of Andreas Dilger of clusterfs, I tried modulating the size of the ext3 journal, and the mount options (data=journal, writeback, and ordered). It turns out that you can achieve a substantial improvement (almost 50%) by simply mounting the ext3 volume with data=writeback instead of data=ordered (the default). Changing the journal size did not seem to make a difference, except that 256MB is for some reason pathological (9% slower than the best time). 128MB, the default for a large volume, gave the same performance as 400MB (the max) or 32MB. In the end, the ext3 volume mounted with -o noatime,data=writeback yielded 88 tps with 100 clients. This is about 16% off the performance of JFS with default options.

Andreas pointed me to experimental patches to ext3's block allocation code and writeback strategy. I will test these, but I expect the database community, which seems so attached to its data, will be very interested in code that has not yet entered mainstream use.

Another frequent suggestion is to put the xlog on a separate device. I tried this, and, for a given number of disks, it appears to be counter-productive. A RAID5 of 5 disks holding both logs and data is about 15% faster than a RAID5 of 3 disks with the data, plus a mirror of two disks holding the xlog.
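The mount-option change above is a one-line affair; a hedged sketch (the device and mount point are assumptions, and an ext3 journal must be removed before it can be recreated at a new size):

```shell
# Sketch with assumed device/mount point. data=writeback relaxes
# ext3's write-ordering guarantees, which is where the speedup comes from.
mount -o noatime,data=writeback /dev/sdb1 /mnt/pgdata

# Changing journal size means recreating the journal while unmounted:
umount /mnt/pgdata
tune2fs -O ^has_journal /dev/sdb1    # remove the existing journal
tune2fs -j -J size=128 /dev/sdb1     # recreate it at 128MB
mount -o noatime,data=writeback /dev/sdb1 /mnt/pgdata
```

The tradeoff is that after a crash, data=writeback can leave stale data in recently-allocated blocks; whether that is acceptable under a database that does its own WAL is a judgment call.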
Here are the pgbench results for each permutation of ext3:

Journal Size | Journal Mode | 1 Client | 10 Clients | 100 Clients
-------------+--------------+----------+------------+------------
          32 | ordered      |       28 |         51 |          57
          32 | writeback    |       34 |         70 |          88
          64 | ordered      |       29 |         52 |          61
          64 | writeback    |       32 |         69 |          87
         128 | ordered      |       32 |         54 |          62
         128 | writeback    |       34 |         70 |          88
         256 | ordered      |       28 |         51 |          60
         256 | writeback    |       29 |         64 |          79
         400 | ordered      |       26 |         49 |          59
         400 | writeback    |       32 |         70 |          87

-jwb

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Really bad diskio
On Fri, 2005-07-15 at 15:04 -0600, Ron Wills wrote:
> At Fri, 15 Jul 2005 13:45:07 -0700, Joshua D. Drake wrote:
> > Ron Wills wrote:
> > > Hello all. I'm running a postgres 7.4.5, on a dual 2.4Ghz Athlon, 1Gig RAM and a 3Ware SATA raid.
> > 2 drives? 4 drives? 8 drives?
> 3 drives raid 5. I don't believe it's the raid. I've tested this by moving the database onto the mirrored software RAID where the root is found, and onto the SATA raid. Neither relieved the IO problems.

Hard or soft RAID? Which controller? Many of the 3ware controllers (85xx and 95xx) have extremely bad RAID 5 performance. Did you take any pgbench or other benchmark figures before you started using the DB?

-jwb

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Really bad diskio
On Fri, 2005-07-15 at 15:29 -0600, Ron Wills wrote:
> Here's a bit of a dump of the system that should be useful.
>
> Processors x2:
> vendor_id  : AuthenticAMD
> cpu family : 6
> model      : 8
> model name : AMD Athlon(tm) MP 2400+
> stepping   : 1
> cpu MHz    : 2000.474
> cache size : 256 KB
>
> MemTotal: 903804 kB
> Mandrake 10.0, Linux kernel 2.6.3-19mdk
>
> The raid controller, which is using the hardware raid configuration:
> 3ware 9000 Storage Controller device driver for Linux v2.26.02.001.
> scsi0 : 3ware 9000 Storage Controller
> 3w-9xxx: scsi0: Found a 3ware 9000 Storage Controller at 0xe802, IRQ: 17.
> 3w-9xxx: scsi0: Firmware FE9X 2.02.00.011, BIOS BE9X 2.02.01.037, Ports: 4.
> Vendor: 3ware  Model: Logical Disk 00  Rev: 1.00
> Type: Direct-Access  ANSI SCSI revision: 00
> SCSI device sda: 624955392 512-byte hdwr sectors (319977 MB)
> SCSI device sda: drive cache: write back, no read (daft)
>
> This is also on a 3.6 reiser filesystem. Here's the iostat for 10 mins, every 10 secs. I've removed the stats from the idle drives to reduce the size of this email.
>
> Linux 2.6.3-19mdksmp (photo_server)  07/15/2005
>
> avg-cpu:  %user  %nice  %sys  %iowait  %idle
>            2.85   1.53  2.15   39.52  53.95
>
> Device:  tps    Blk_read/s  Blk_wrtn/s  Blk_read    Blk_wrtn
> sda      82.49  4501.73     188.38      1818836580  76110154
>
> avg-cpu:  %user  %nice  %sys  %iowait  %idle
>            0.30   0.00  1.00   96.30   2.40
>
> Device:  tps    Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
> sda      87.80  6159.20     340.00      61592     3400

These I/O numbers are not so horrible, really. 100% iowait is not necessarily a symptom of misconfiguration; it just means you are disk-limited. With a database 20 times larger than main memory, this is no surprise. If I had to speculate about the best way to improve your performance, I would say:

1a) Get a better RAID controller. The 3ware hardware RAID5 is very bad.
1b) Get more disks.
2) Get a (much) newer kernel.
3) Try XFS or JFS. Reiser3 has never looked good in my pgbench runs.

By the way, are you experiencing bad application performance, or are you just unhappy with the iostat figures?
Regards, jwb ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[PERFORM] JFS fastest filesystem for PostgreSQL?
[reposted due to delivery error -jwb]

I just took delivery of a new system, and used the opportunity to benchmark postgresql 8.0 performance on various filesystems. The system in question runs Linux 2.6.12, has one CPU and 1GB of system memory, and 5 7200RPM SATA disks attached to an Areca hardware RAID controller having 128MB of cache. The caches are all write-back. I ran pgbench with a scale factor of 1000 and a total of 100,000 transactions per run. I varied the number of clients between 10 and 100.

It appears from my test that JFS is much faster than both ext3 and XFS for this workload. JFS and XFS were made with the mkfs defaults. ext3 was made with -T largefile4 and -E stride=32. The deadline scheduler was used for all runs (the anticipatory scheduler is much worse).

Here's the result, in transactions per second:

             ext3   jfs   xfs
10 Clients     55    81    68
100 Clients    61   100    64

-jwb

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
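A sketch of how such a run might be reproduced (device names and the scheduler path are assumptions; only the ext3 mkfs flags are quoted from the text):

```shell
# Hypothetical reproduction of the benchmark setup. /dev/sdb1 and
# /sys/block/sdb are assumptions.
mkfs.jfs /dev/sdb1                               # JFS, mkfs defaults
# mkfs.xfs /dev/sdb1                             # XFS, mkfs defaults
# mkfs.ext3 -T largefile4 -E stride=32 /dev/sdb1
echo deadline > /sys/block/sdb/queue/scheduler   # deadline elevator
pgbench -i -s 1000 bench                         # initialize, scale factor 1000
pgbench -c 100 -t 1000 bench                     # 100 clients x 1000 txns = 100,000
```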
Re: [PERFORM] JFS fastest filesystem for PostgreSQL?
On Thu, 2005-07-14 at 10:03 +0200, Dawid Kuroczko wrote:
> On 7/14/05, Jeffrey W. Baker [EMAIL PROTECTED] wrote:
> > [...pgbench filesystem results quoted from the posting above...]
> If you still have a chance, could you do tests with other journaling options for ext3 (journal=writeback, journal=data)? And could you give figures about performance of other IO elevators? I mean, you wrote that anticipatory is much worse -- how much worse? :) Could you give numbers for deadline, anticipatory, cfq elevators? :) And, additionally, would it be possible to give numbers for bonnie++ results? To see how pgbench relates to bonnie++?

Phew, that's a lot of permutations. At 20-30 minutes per run, I'm thinking 5-8 hours or so. Still, for you dear readers, I'll somehow accomplish this tedious feat.

As for Bonnie, JFS is a good 60-80% faster than ext3. See my message to ext3-users yesterday. Using bonnie++ with a 10GB fileset, in MB/s:

          ext3   jfs   xfs
Read       112   188   141
Write       97   157   167
Rewrite     51    71    60

-jwb

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org