Re: [PERFORM] Index files

2007-09-14 Thread Harsh Azad
Great, creating a new tablespace for indexes worked! Now the question is
whether existing tables/indexes can be moved to the new tablespace using an
ALTER command, or whether the only way is to drop and recreate them?
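
I was hoping something along these lines would work (just a sketch, with
hypothetical table/index names, assuming the new tablespace is called
indexspace):

  alter table my_table set tablespace indexspace;       -- move the table's files
  alter index my_table_pkey set tablespace indexspace;  -- move a single index

My understanding is that both physically rewrite the files in the new
location and hold an exclusive lock on the relation while doing so.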

Azad

On 9/14/07, Jean-David Beyer [EMAIL PROTECTED] wrote:

 Harsh Azad wrote:
  Hi,
 
  Where are the database index files located in the $PGDATA directory? I
  was thinking of soft-linking them to another physical hard disk array.
 
 I am not an expert, but what I have done is put the Write-Ahead-Log on one
 hard drive, some little-used relations and their indices on a second hard
 drive, and the main database files on four other drives. These are SCSI hard
 drives and I have two SCSI controllers. /dev/sda and /dev/sdb are on one
 controller, and the other four hard drives are on the other controller.
 These controllers are on a PCI-X bus all their own.

 I put $PGDATA (I do not actually set or use that global variable) on
 /dev/sda.

 [/srv/dbms/dataA/pgsql/data]$ ls -l
 total 88
 -rw------- 1 postgres postgres     4 Aug 11 13:32 PG_VERSION
 drwx------ 5 postgres postgres  4096 Aug 11 13:32 base
 drwx------ 2 postgres postgres  4096 Sep 14 09:16 global
 drwx------ 2 postgres postgres  4096 Sep 13 23:35 pg_clog
 -rw------- 1 postgres postgres  3396 Aug 11 13:32 pg_hba.conf
 -rw------- 1 root     root      3396 Aug 16 14:32 pg_hba.conf.dist
 -rw------- 1 postgres postgres  1460 Aug 11 13:32 pg_ident.conf
 drwx------ 4 postgres postgres  4096 Aug 11 13:32 pg_multixact
 drwx------ 2 postgres postgres  4096 Sep 14 09:16 pg_subtrans
 drwx------ 2 postgres postgres  4096 Aug 12 16:14 pg_tblspc
 drwx------ 2 postgres postgres  4096 Aug 11 13:32 pg_twophase
 drwx------ 3 postgres postgres  4096 Sep 14 09:13 pg_xlog
 -rw------- 1 postgres postgres 15526 Sep 11 22:31 postgresql.conf
 -rw------- 1 postgres postgres 13659 Aug 11 13:32 postgresql.conf.dist
 -rw------- 1 postgres postgres    56 Sep 14 07:33 postmaster.opts
 -rw------- 1 postgres postgres    52 Sep 14 07:33 postmaster.pid

 In /dev/sdb are

 ]$ ls -l
 total 12
 drwxr-x--- 2 postgres postgres 4096 Aug 18 00:00 pg_log
 -rw------- 1 postgres postgres 2132 Sep 14 07:25 pgstartup.log
 drwx------ 3 postgres postgres 4096 Aug 12 21:06 stock

 The stuff in stock is little-used tables and their indices.

 Everything else is on the other four drives. I put the index for a table on
 a separate drive from the data for the table.

 --
   .~.  Jean-David Beyer  Registered Linux User 85642.
   /V\  PGP-Key: 9A2FC99A Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 09:10:01 up 1:37, 4 users, load average: 5.77, 5.12, 4.58





-- 
Harsh Azad
===
[EMAIL PROTECTED]


[PERFORM] Index files

2007-09-13 Thread Harsh Azad
Hi,

Where are the database index files located in the $PGDATA directory? I was
thinking of soft-linking them to another physical hard disk array.

Thanks,
Azad


Re: [PERFORM] Index files

2007-09-13 Thread Harsh Azad
Ah, thanks. I didn't realize tablespaces can be specified while creating an
index. BTW, are soft links OK to use for pg_clog / pg_xlog? I moved the
existing directories to /mnt/logs/pglogs and made soft links for both
directories in $PGDATA.

Thanks

On 9/14/07, Ow Mun Heng [EMAIL PROTECTED] wrote:

 On Fri, 2007-09-14 at 08:20 +0530, Harsh Azad wrote:
  Hi,
 
  Where are the database index files located in the $PGDATA directory? I
  was thinking of soft-linking them to another physical hard disk array.

 You have to search through pg_class for the relfilenode number.
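 For example (a rough sketch, with 'my_index' as a placeholder name):

 select relname, relfilenode from pg_class where relname = 'my_index';

 The file itself then lives under $PGDATA/base/<database OID>/<relfilenode>
 for relations kept in the default tablespace.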

 Alternatively, you can try using tablespaces.

 create tablespace indexspace location '/mnt/fastarray';
 create index newindex on my_table (index_1) tablespace indexspace;
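
 It should also be possible (since 8.0, I believe) to have new objects land
 there by default instead of naming the tablespace every time:

 set default_tablespace = indexspace;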




-- 
Harsh Azad
===
[EMAIL PROTECTED]


Re: [PERFORM] SAN vs Internal Disks

2007-09-11 Thread Harsh Azad
Yeah, the DAS we are considering is the Dell MD3000. It has redundant
hot-swappable RAID controllers in active-active mode and provision for a hot
spare hard disk. It can take up to 15 disks in 3U, and you can attach two
more MD1000s to it, giving 45 disks in total.

-- Harsh

On 9/12/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

 On Tue, 11 Sep 2007, Decibel! wrote:

  On Tue, Sep 11, 2007 at 05:09:00PM -0400, Michael Stone wrote:
  On Tue, Sep 11, 2007 at 03:55:51PM -0500, Decibel! wrote:
  Also, to reply to someone else's email... there is one big reason to
 use
  a SAN over direct storage: you can do HA that results in 0 data loss.
  Good SANs are engineered to be highly redundant, with multiple
  controllers, PSUs, etc, so that the odds of losing the SAN itself are
  very, very low. The same isn't true with DAS.
 
  You can get DAS arrays with multiple controllers, PSUs, etc.  DAS !=
  single disk.
 
  It's still in the same chassis, though, which means if you lose memory
  or mobo you're still screwed. In a SAN setup for redundancy, there's
  very little in the way of a single point of failure; generally only the
  backplane, and because there's very little that's on there it's
  extremely rare for one to fail.

 Not necessarily. Direct-attached doesn't mean in the same chassis;
 external drive shelves attached via SCSI are still DAS.

 you can even have DAS attached to a pair of machines, with the second box
 configured to mount the drives only if the first one dies.

 David Lang





-- 
Harsh Azad
===
[EMAIL PROTECTED]


[PERFORM] SAN vs Internal Disks

2007-09-06 Thread Harsh Azad
Hi,

We are currently running our DB on a DualCore, Dual Proc 3.Ghz Xeon, 8GB
RAM, 4x SAS 146 GB 15K RPM on RAID 5.

The current data size is about 50GB, but we want to purchase the hardware to
scale to about 1TB as we think our business will need to support that much
soon.
- Currently we have an 80% read / 20% write percentage.
- Currently with this configuration the database is showing signs of
overloading.
- Auto-vacuum etc. run on this database; vacuum full runs nightly.
- Currently CPU loads are about 20%, memory utilization is full (but this is
also due to linux caching disk blocks) and IO waits are frequent.
- We have a load of about 400 queries per second

Now we are considering purchasing our own servers and in the process are
facing the usual dilemmas. First I'll list out what machine we have decided
to use:
2x Quad Xeon 2.4 GHz (4-way, only 2 populated right now)
32 GB RAM
OS Only storage - 2x SCSI 146 GB 15k RPM on RAID-1
(Data Storage mentioned below)

We have already decided to split our database across 3 machines on the basis
of disjoint sets of data. So we will be purchasing three of these boxes.

HELP 1: Does something look wrong with the above configuration? I know there
will be small differences b/w Opteron/Xeon, but do you think there is
something against going for 2.4 GHz quad Xeons (Clovertown, I think)?

HELP 2: The main confusion is with regards to Data Storage. We have the
option of going for:

A: IBM N-3700 SAN box with 12x FC 300GB disks, partitioned into 3 disks in
RAID-4 for WAL/backup and 9 disks in RAID-DP for data, plus 2 hot spares. We
are also considering a similar solution from EMC - CX310C.

B: Go for internal or DAS-based storage. Here, for each server, we should be
able to have: 2x disks on RAID-1 for logs, 6x disks on RAID-10 for
tablespace1 and 6x disks on RAID-10 for tablespace2. Or maybe 12x disks on
RAID-10 as a single tablespace.

What do I think? Well..
SAN wins on manageability, replication (say to a DR site), backup, etc...
DAS wins on cost

But keeping these aside for a moment, I wanted to discuss which one is the
winner purely on the performance side. It feels like internal disks will
perform better, but I need to understand a rough magnitude of the difference
in performance to see if it's worth losing the manageability features.

Also, if we choose to go with DAS, what would be the best tool to do async
replication to a DR site, and maybe, as an extra plus, a second read-only DB
server to distribute select loads?

Regards,
Azad


Re: [PERFORM] SAN vs Internal Disks

2007-09-06 Thread Harsh Azad
Thanks Mark.

If I replicate a snapshot of the data and log files (basically the entire PG
data directory) and I maintain the same version of Postgres on both servers,
it should work, right?

I am also thinking that having SAN storage will give me the ability to keep a
warm standby DB. By just shutting one server down and starting the other,
mounting the same file system, I should be able to bring my DB up when the
primary incurs a physical failure.

I'm only considering SAN storage for this feature - has anyone ever used a SAN
for replication and warm standby on Postgres?

Regards,
Harsh

On 9/6/07, Mark Lewis [EMAIL PROTECTED] wrote:

 On Thu, 2007-09-06 at 18:05 +0530, Harsh Azad wrote:
  Hi,
 
  We are currently running our DB on a DualCore, Dual Proc 3.Ghz Xeon,
  8GB RAM, 4x SAS 146 GB 15K RPM on RAID 5.
 
  The current data size is about 50GB, but we want to purchase the
  hardware to scale to about 1TB as we think our business will need to
  support that much soon.
  - Currently we have an 80% read / 20% write percentage.
  - Currently with this configuration the database is showing signs of
  overloading.
  - Auto-vacuum etc. run on this database; vacuum full runs nightly.
  - Currently CPU loads are about 20%, memory utilization is full (but
  this is also due to linux caching disk blocks) and IO waits are
  frequent.
  - We have a load of about 400 queries per second
 
  Now we are considering purchasing our own servers and in the process
  are facing the usual dilemmas. First I'll list out what machine we
  have decided to use:
  2x Quad Xeon 2.4 GHz (4-way, only 2 populated right now)
  32 GB RAM
  OS Only storage - 2x SCSI 146 GB 15k RPM on RAID-1
  (Data Storage mentioned below)
 
  We have already decided to split our database across 3 machines on the
  basis of disjoint sets of data. So we will be purchasing three of
  these boxes.
 
  HELP 1: Does something look wrong with the above configuration? I know
  there will be small differences b/w Opteron/Xeon, but do you think
  there is something against going for 2.4 GHz quad Xeons (Clovertown, I
  think)?
 
  HELP 2: The main confusion is with regards to Data Storage. We have
  the option of going for:
 
  A: IBM N-3700 SAN box with 12x FC 300GB disks, partitioned into 3
  disks in RAID-4 for WAL/backup and 9 disks in RAID-DP for data, plus
  2 hot spares. We are also considering a similar solution from EMC -
  CX310C.

  B: Go for internal or DAS-based storage. Here, for each server, we
  should be able to have: 2x disks on RAID-1 for logs, 6x disks on
  RAID-10 for tablespace1 and 6x disks on RAID-10 for tablespace2. Or
  maybe 12x disks on RAID-10 as a single tablespace.
 
  What do I think? Well..
  SAN wins on manageability, replication (say to a DR site), backup,
  etc...
  DAS wins on cost
 
  But keeping these aside for a moment, I wanted to discuss which one
  is the winner purely on the performance side. It feels like internal
  disks will perform better, but I need to understand a rough magnitude
  of the difference in performance to see if it's worth losing the
  manageability features.

  Also, if we choose to go with DAS, what would be the best tool to do
  async replication to a DR site, and maybe, as an extra plus, a second
  read-only DB server to distribute select loads?

 Sounds like a good candidate for Slony replication for backups /
 read-only slaves.

 I haven't seen a SAN yet whose DR / replication facilities are on par
 with a good database replication solution.  My impression is that those
 facilities are mostly for file servers, mail servers, etc.  It would be
 difficult for a SAN to properly replicate a database given the strict
 ordering, size and consistency requirements for the data files.  Not
 impossible, but in my limited experience I haven't found one that I
 trust to do it reliably either, vendor boastings to the contrary
 notwithstanding.  (Hint: make sure you know exactly what your vendor's
 definition of the term 'snapshot' really means).

 So before you invest in a SAN, make sure that you're actually going to
 be able to (and want to) use all the nice management features you're
 paying for.  We have some SANs that are basically acting just as
 expensive external RAID arrays because we do the database
 replication/backup in software anyway.

 -- Mark Lewis




-- 
Harsh Azad
===
[EMAIL PROTECTED]


Re: [PERFORM] SAN vs Internal Disks

2007-09-06 Thread Harsh Azad
Thanks Scott, we have now requested IBM/EMC to provide test machines.
Interestingly, since you mentioned the importance of RAID controllers and the
drivers: we are planning to use CentOS 5 for hosting the DB.

Firstly, I could only find Postgres 8.1.x RPMs for CentOS 5 and could not find
any RPM for 8.2.4. Is there an 8.2.4 RPM for CentOS 5?

Secondly, would investing in Red Hat Enterprise Linux give any performance
advantage? I know all the SAN boxes are only certified on RHEL and not
CentOS. Or, since CentOS is similar to RHEL, would it be fine?

Regards,
Harsh

On 9/6/07, Scott Marlowe [EMAIL PROTECTED] wrote:

 On 9/6/07, Harsh Azad [EMAIL PROTECTED] wrote:
  Hi,
 
  We are currently running our DB on a DualCore, Dual Proc 3.Ghz Xeon, 8GB
  RAM, 4x SAS 146 GB 15K RPM on RAID 5.
 
  The current data size is about 50GB, but we want to purchase the
 hardware to
  scale to about 1TB as we think our business will need to support that
 much
  soon.
  - Currently we have an 80% read / 20% write percentage.

 For this type of load, you should be running on RAID10, not RAID5.  Or, if
 you must use RAID 5, use more disks and have a battery-backed caching
 RAID controller known to perform well with RAID5 and large arrays.

  - Currently with this configuration the Database is showing signs of
  over-loading.

 On I/O or CPU?  If you're running out of CPU, then look to increasing
 CPU horsepower and tuning postgresql.
 If I/O then you need to look into a faster I/O subsystem.

  - Auto-vacuum etc. run on this database; vacuum full runs nightly.

 Generally speaking, if you need to run vacuum fulls, you're doing
 something wrong.  Is there a reason you're running vacuum full, or is
 this just precautionary?  Vacuum full can bloat your indexes, so you
 shouldn't run it regularly.  Reindexing might be a better choice if
 you do need to regularly shrink your db.  The better option is to
 monitor your FSM usage and adjust FSM settings / autovacuum settings
 as necessary.
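
 For example, something along these lines (a rough sketch; my_table is just a
 placeholder):

 -- the tail of the output reports how many pages/relations the free space
 -- map is tracking, to compare against max_fsm_pages / max_fsm_relations
 vacuum verbose;

 -- rebuilds all indexes on one table, as an alternative to vacuum full
 reindex table my_table;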

   - Currently CPU loads are about 20%, memory utilization is full (but
 this
  is also due to linux caching disk blocks) and IO waits are frequent.
  - We have a load of about 400 queries per second

 What does vmstat et. al. say about CPU versus I/O wait?

  Now we are considering purchasing our own servers and in the process
 are
  facing the usual dilemmas. First I'll list out what machine we have
 decided
  to use:
  2x Quad Xeon 2.4 GHz (4-way, only 2 populated right now)
  32 GB RAM
  OS Only storage - 2x SCSI 146 GB 15k RPM on RAID-1
  (Data Storage mentioned below)
 
  We have already decided to split our database across 3 machines on the
  basis of disjoint sets of data. So we will be purchasing three of these boxes.
 
  HELP 1: Does something look wrong with the above configuration? I know there
  will be small differences b/w Opteron/Xeon, but do you think there is
  something against going for 2.4 GHz quad Xeons (Clovertown, I think)?

 Looks like good machines, plenty of memory.

  HELP 2: The main confusion is with regards to Data Storage. We have the
  option of going for:
 
  A: IBM N-3700 SAN box with 12x FC 300GB disks, partitioned into 3 disks
  in RAID-4 for WAL/backup and 9 disks in RAID-DP for data, plus 2 hot
  spares. We are also considering a similar solution from EMC - CX310C.

  B: Go for internal or DAS-based storage. Here, for each server, we should
  be able to have: 2x disks on RAID-1 for logs, 6x disks on RAID-10 for
  tablespace1 and 6x disks on RAID-10 for tablespace2. Or maybe 12x disks
  on RAID-10 as a single tablespace.
 
  What do I think? Well..
  SAN wins on manageability, replication (say to a DR site), backup,
 etc...
  DAS wins on cost

 The problem with SAN is that it's apparently very easy to build a big
 expensive system that performs poorly.  We've seen reports of such
 here on the lists a few times.  I would definitely demand an
 evaluation period from your supplier to make sure it performs well if
 you go SAN.

  But keeping these aside for a moment, I wanted to discuss which one is
  the winner purely on the performance side. It feels like internal disks
  will perform better, but I need to understand a rough magnitude of the
  difference in performance to see if it's worth losing the manageability
  features.

 That really really really depends.  The quality of RAID controllers
 for either setup is very important, as is the driver support, etc...
 All things being equal, I'd lean towards the local storage.

  Also, if we choose to go with DAS, what would be the best tool to do
  async replication to a DR site, and maybe, as an extra plus, a second
  read-only DB server to distribute select loads?

 Look at Slony, or PITR with continuous recovery.  Of those two, I've
 only used Slony in production, and I was very happy with its
 performance; it was also very easy to write a bash script to monitor
 the replication for failures.
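
 If you go the PITR route, the basic flow is roughly this (a sketch for the
 8.1/8.2 series, assuming archive_command in postgresql.conf already copies
 WAL segments somewhere the standby can read, e.g. a hypothetical
 /mnt/archive):

 -- on the primary:
 select pg_start_backup('base_backup');
 -- ... copy $PGDATA (minus pg_xlog) to the standby with rsync/tar ...
 select pg_stop_backup();
 -- on the standby, recovery.conf's restore_command (e.g. 'cp /mnt/archive/%f %p',
 -- or a script that waits for the next segment) replays the archived WAL
 -- until you bring the standby up.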




-- 
Harsh Azad
===
[EMAIL PROTECTED]


Re: [PERFORM] SAN vs Internal Disks

2007-09-06 Thread Harsh Azad
Hi,

How about the Dell PERC 5/i card with 512MB battery-backed cache, or the IBM
ServeRAID-8k adapter?

I hope I am sending relevant information here; I am not too well versed in
RAID controllers.

Regards,
Harsh

On 9/6/07, Scott Marlowe [EMAIL PROTECTED] wrote:

 On 9/6/07, Harsh Azad [EMAIL PROTECTED] wrote:
  Thanks Scott, we have now requested IBM/EMC to provide test machines.
  Interestingly, since you mentioned the importance of RAID controllers and
  the drivers: we are planning to use CentOS 5 for hosting the DB.

 What RAID controllers have you looked at?  Seems the two most popular
 in terms of performance here have been Areca and 3Ware / Escalade.
 LSI seems to come in a pretty close third.  Adaptec is to be avoided,
 as are cheap RAID controllers (e.g. Promise, etc.).  Battery-backed
 cache is a must, and the bigger the better.

  Firstly, I could only find Postgres 8.1.x RPMs for CentOS 5 and could not
  find any RPM for 8.2.4. Is there an 8.2.4 RPM for CentOS 5?
 
  Secondly, would investing in Red Hat Enterprise Linux give any
  performance advantage? I know all the SAN boxes are only certified on
  RHEL and not CentOS. Or, since CentOS is similar to RHEL, would it be fine?

 For all intents and purposes, CentOS and RHEL are the same OS, so any
 pgsql RPM for one should pretty much work for the other.  At the
 worst, you might have to get an SRPM and rebuild it for CentOS / White
 Box.




-- 
Harsh Azad
===
[EMAIL PROTECTED]