You might want to contact Greenplum. They have a lot of experience with multi-terabyte data warehouses using PostgreSQL. They don't have local support yet, but they've been bragging region-wide (ASEAN) about displacing Oracle at Smart when it's not true. :) Regardless, they are the go-to people for very large PostgreSQL sites.

On 4/26/07, Gerald Timothy Quimpo <[EMAIL PROTECTED]> wrote:
hi all,

Is there anyone on the list with

1. experience with very large linux (or generally, *nixen) postgresql servers? or, possibly, with
2. bad experiences with linux or LVM on very large RAID arrays?

My company might be interested in contracting for services from #1 above. We're probably going to contract with official postgresql support companies overseas, but if there's someone local, we'd be glad to take up some of your time.

Here's the situation that brings this up. We had a CentOS server with an Adaptec RAID card (i don't know the model number, or any of the version numbers) and 8 SATA drives. The SATA drives were in one RAID-51 array, yielding around 1.5 TB of space. Since CentOS didn't have official XFS support yet at the time this was first configured, the server was configured with LVM and reiserfs. (Yes, we could have gone with XFS, but I recommended reiserfs because I'd had a LOT of problems with XFS requiring a full fsck after random power outages. This was my mistake: random power outages are very rare at our data center, and the UPSs and generator cover those rare situations.)

The server worked very well for many months. It was possibly slightly slower than optimal because of LVM, but that wasn't a major issue yet.

We had a postgresql database on that server which was at a bit more than 500GB (data + indexes). This bloated up to around 700GB because a lot of the tables had large datasets reloaded (deleted and updated). I tried to vacuum (not full, just regular vacuum) the whole database. That took so long I killed it and tried individual table vacuums. Those also took too long for every table I tried, so I gave up on that too. During one of these vacuums the server had a kernel panic (reiserfs error, something about hash size too big). So I gave up on vacuum.

I then did cluster on individual tables, largest tables first. This ran faster than the corresponding vacuum and had the benefit of improving queries that used the clustered tables.
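For readers following along, the "cluster table by table, largest first" approach described above could be sketched roughly as below. This is only an illustration under stated assumptions: the table names, sizes, and index names are hypothetical (none come from the original post), and the script only prints SQL rather than connecting to a database. It uses the `CLUSTER table USING index` form from newer PostgreSQL releases; older releases wrote it as `CLUSTER index ON table`.

```python
from typing import List, NamedTuple


class Table(NamedTuple):
    """One table to maintain. All values used below are hypothetical."""
    name: str
    size_gb: float
    cluster_index: str


def cluster_plan(tables: List[Table]) -> List[str]:
    """Return one CLUSTER statement per table, largest table first."""
    ordered = sorted(tables, key=lambda t: t.size_gb, reverse=True)
    # Newer PostgreSQL syntax; older releases used "CLUSTER index ON table".
    return [f"CLUSTER {t.name} USING {t.cluster_index};" for t in ordered]


# Hypothetical inventory, e.g. as gathered from pg_total_relation_size().
tables = [
    Table("events", 80.0, "events_pkey"),
    Table("lookups", 4.5, "lookups_pkey"),
    Table("orders", 35.0, "orders_pkey"),
]

for statement in cluster_plan(tables):
    print(statement)
```

Running the statements one at a time keeps each step small enough to abandon if it misbehaves. Note that CLUSTER takes an exclusive lock on the table and needs enough free disk space for a rewritten copy of it.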
Strangely, cluster failed on a relatively small table (less than 5GB of data) when it had succeeded with much larger tables (80GB).

Does anyone have any insights into any of this? The server has been reinstalled with FreeBSD and UFS2 (not my choice, but a pretty good one), so all of this is now of theoretical value only. Insights into the stability of reiserfs, LVM, or reiserfs+LVM would be interesting.

Oh yeah, after the system crashed, the drives in the array were scanned for bad blocks by the RAID controller. No errors were found. I was sort of convinced that the problem was due to media errors because, toward the end there, the same cluster command on the same table/index caused the same kernel panic twice. So I thought there must be a bad sector either in that table or index, so that reading that bad part made reiserfs die. This seems to be disproven, though, by the fact that the RAID controller diagnostics found no media errors.

tiger

_________________________________________________
Philippine Linux Users' Group (PLUG) Mailing List
plug@lists.linux.org.ph (#PLUG @ irc.free.net.ph)
Read the Guidelines: http://linux.org.ph/lists
Searchable Archives: http://archives.free.net.ph
--
Orlando Andico
Senior Sales Consultant - Embedded GTMi
Oracle (Philippines) Corporation

The statements and opinions expressed here are my own and do not necessarily represent those of Oracle Corporation.