how's the service being used in the first place?

i haven't dealt with a DB+ReiserFS+LVM combo before;
mine is more of a DB+JFS setup, minus LVM. yes, i removed
the LVM layer due to a performance hit on the production
server. we're now doing our own "snapshots" from a
separate slave machine.

if you're using CentOS and you're not dependent on
the EL features, you might as well use the vanilla
kernel and keep yourself updated with regard to
filesystem (and driver) fixes (that bit me once).

and if you ask me, i'd just as soon remove the RAID
layer too. sounds insane to some, but i really
hate the performance hit when it is rebuilding in the
background (heck, others can deal w/o using any
FS journalling, hehe). the machine may be up in no
time, but if it can't meet the performance demands of
the users, _it_is_just_as_good_as_being_offline_.
and besides, there are lots of ways to recover from
a failure, but that all depends on the infra.




On 4/26/07, Gerald Timothy Quimpo <[EMAIL PROTECTED]> wrote:

hi all,

Is there anyone on the list with

   1. experience with very large linux (or generally, *nixen)
      postgresql servers? or, possibly, with

   2. bad experiences with linux or LVM on very large RAID arrays?

My company might be interested in contracting for services from #1
above.  We're probably going to contract with official postgresql
support companies overseas, but if there's someone local, we'd be glad
to take up some of your time.

Here's the situation that brings this up.

We had a CentOS server with an Adaptec RAID card (I don't know the model
number, or any of the version numbers) and 8 SATA drives.  The SATA
drives were in one RAID-51 array, yielding around 1.5 TB of space.
Since CentOS didn't have official XFS support yet at the time this was
first configured, the server was configured with LVM and reiserfs.
(Yes, we could have gone with xfs, but I recommended reiserfs because I'd
had a LOT of problems with xfs requiring a full fsck after random power
outages. That was my own bugbear: random power outages are very rare at
our data center, and the UPSs and generators cover those rare situations.)
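
As a sanity check on the capacity figure, here is a minimal sketch of
the arithmetic. It assumes "RAID-51" means two equal RAID-5 sets
mirrored against each other, and it assumes 500 GB drives; the drive
size is not stated anywhere above, and the function name is made up for
illustration.

```python
def raid51_usable_gb(total_drives, drive_gb):
    """Usable capacity of a RAID 5+1 layout: two equal RAID-5 sets, mirrored.

    Assumption: the drives are split evenly between the two sets.
    """
    if total_drives % 2 != 0 or total_drives < 6:
        raise ValueError("RAID 5+1 needs an even number of drives, at least 6")
    drives_per_set = total_drives // 2
    # Each RAID-5 set gives up one drive's worth of space to parity,
    # and the mirror means only one set's capacity is usable.
    return (drives_per_set - 1) * drive_gb


# 8 drives -> two 4-drive RAID-5 sets -> 3 data drives' worth of space.
print(raid51_usable_gb(8, 500))
```

With those assumptions, 8 drives work out to three drives' worth of
usable space, which is consistent with "around 1.5 TB".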

The server worked very well for many months.  It was possibly slightly
slower than optimal because of LVM, but that wasn't a major issue yet.

We had a postgresql database on that server which was at a bit more than
500GB (data + indexes). This bloated up to around 700GB because a lot of
the tables had large datasets reloaded (deleted and updated).  I tried
to vacuum (not full, just regular vacuum) the whole database. That took
so long that I killed it and tried individual table vacuums. Those took
so long on every table I tried that I gave up on that too.  During one of
these vacuums the server had a kernel panic (reiserfs error, something
about hash size too big).  So I gave up on vacuum altogether.

I then did cluster on individual tables, largest tables first. This ran
faster than the corresponding vacuum and had the benefit of improving
queries that used the clustered tables.  Strangely, cluster failed on
a relatively small table (less than 5GB of data) when it succeeded with
much larger tables (80GB).
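
The "largest tables first" ordering can be sketched as below. This is
only an illustration, not the script that was actually run: the table
and index names are hypothetical, the sizes would have to come from
somewhere like pg_total_relation_size(), and the
`CLUSTER table USING index` form is the PostgreSQL 8.3+ syntax (older
releases spell it `CLUSTER indexname ON tablename`).

```python
def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def cluster_plan(tables):
    """Build CLUSTER statements, largest table first.

    tables: list of (table_name, index_name, size_bytes) tuples.
    Uses the 8.3+ syntax; adjust for older servers.
    """
    ordered = sorted(tables, key=lambda t: t[2], reverse=True)
    return [
        "CLUSTER {} USING {};".format(quote_ident(table), quote_ident(index))
        for table, index, _size in ordered
    ]


GB = 2 ** 30
# Hypothetical tables, sized like the ones mentioned above.
plan = cluster_plan([
    ("small_table", "small_table_pkey", 5 * GB),
    ("big_table", "big_table_pkey", 80 * GB),
])
for statement in plan:
    print(statement)
```

Running one statement per table keeps each CLUSTER in its own
transaction, so a failure on one table does not undo the others.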

Does anyone have any insights into any of this? The server has been
reinstalled with FreeBSD and UFS2 (not my choice, but a pretty good
one), so all of this is now of theoretical value only.  Insights into
the stability of reiserfs, LVM, reiserfs+LVM would be interesting.

Oh yeah, after the system crashed, the drives in the array were scanned
for bad blocks by the RAID controller. No errors were found. I was sort
of convinced that the problem would be due to some media errors because
toward the end there, the same cluster command on the same table/index
caused the same kernel panic twice.  So I thought there must be a bad
sector either in that table or index so that reading that bad part made
reiserfs die.  This seems to be disproven though, by the fact that the
RAID controller diagnostics found no media errors.

tiger

_________________________________________________
Philippine Linux Users' Group (PLUG) Mailing List
plug@lists.linux.org.ph (#PLUG @ irc.free.net.ph)
Read the Guidelines: http://linux.org.ph/lists
Searchable Archives: http://archives.free.net.ph