Re: [PERFORM] Performance on 8CPU's and 32GB of RAM

2007-09-06 Thread Ansgar -59cobalt- Wiechers
On 2007-09-05 Scott Marlowe wrote:
 On 9/5/07, Ansgar -59cobalt- Wiechers [EMAIL PROTECTED] wrote:
 On 2007-09-05 Scott Marlowe wrote:
 On 9/5/07, Ansgar -59cobalt- Wiechers [EMAIL PROTECTED] wrote:
 On 2007-09-05 Scott Marlowe wrote:
 And there's the issue that with windows / NTFS that when one
 process opens a file for read, it locks it for all other users.
 This means that things like virus scanners can cause odd,
 unpredictable failures of your database.

 Uh... what? Locking isn't done by the filesystem but by
 applications (which certainly can decide to not lock a file when
 opening it). And no one in his right mind would ever have a virus
 scanner access the files of a running database, regardless of
 operating system or filesystem.

 Exactly, the default is to lock the file.  The application has to
 explicitly NOT lock it.  It's the opposite of linux.

 Yes. So? It's still up to the application, and it still has nothing
 at all to do with the filesystem.
 
 And if you look at my original reply, you'll see that I said WINDOWS /
 NTFS, not just NTFS. I.e., it's a Windows-ism.

I am aware of what you wrote. However, since the locking behaviour is
exactly the same with Windows/FAT32 or Windows/%ANY_OTHER_FILESYSTEM%,
your statement is still wrong.

Regards
Ansgar Wiechers
-- 
The Mac OS X kernel should never panic because, when it does, it
seriously inconveniences the user.
--http://developer.apple.com/technotes/tn2004/tn2118.html

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] Hardware spec

2007-09-06 Thread Willo van der Merwe

Hi guys,

I have the rare opportunity to spec the hardware for a new database
server. It's going to replace an older one, driving a social networking
web application. The current server (a quad Opteron with 4Gb of RAM and
80Gb fast SCSI RAID10) is coping with an average load ranging between
1.5 and 3.5.

The new machine spec I have so far:
   2 x Intel Xeon 2.33 GHz Dual Core Woodcrest Processors
   4 Gb RAM
   5x73 GB Ultra320 SCSI RAID 5 (288 GB storage)

I've heard that RAID 5 is not necessarily the best performer. Also, are
there any special tricks when partitioning the file system?

Regards,

Willo




Re: [PERFORM] Hardware spec

2007-09-06 Thread Richard Huxton

Willo van der Merwe wrote:

Hi guys,

I have the rare opportunity to spec the hardware for a new database
server. It's going to replace an older one, driving a social networking
web application. The current server (a quad Opteron with 4Gb of RAM and
80Gb fast SCSI RAID10) is coping with an average load ranging between
1.5 and 3.5.

The new machine spec I have so far:


What's the limiting factor on your current machine - disk, memory, cpu?

--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Hardware spec

2007-09-06 Thread Willo van der Merwe

Richard Huxton wrote:

Willo van der Merwe wrote:

Hi guys,

I have the rare opportunity to spec the hardware for a new database
server. It's going to replace an older one, driving a social networking
web application. The current server (a quad Opteron with 4Gb of RAM and
80Gb fast SCSI RAID10) is coping with an average load ranging between
1.5 and 3.5.

The new machine spec I have so far:

What's the limiting factor on your current machine - disk, memory, cpu?
I'm a bit embarrassed to admit that I'm not sure. The reason we're 
changing machines is that we might be changing ISPs and we're renting / 
leasing the machines from the ISP.




Re: [PERFORM] Hardware spec

2007-09-06 Thread Jean-David Beyer

Willo van der Merwe wrote:
 Richard Huxton wrote:
 Willo van der Merwe wrote:
 Hi guys,
 
 I have the rare opportunity to spec the hardware for a new database
  server. It's going to replace an older one, driving a social
 networking web application. The current server (a quad Opteron with
 4Gb of RAM and 80Gb fast SCSI RAID10) is coping with an average load
 ranging between 1.5 and 3.5.
 
 The new machine spec I have so far:
 What's the limiting factor on your current machine - disk, memory,
 cpu?
 I'm a bit embarrassed to admit that I'm not sure. The reason we're 
 changing machines is that we might be changing ISPs and we're renting / 
 leasing the machines from the ISP.
 
Before you get rid of the current ISP, better examine what is going on with
the present setup. It would be good to know if you are memory, processor, or
IO limited. That way you could increase what needs to be increased, and not
waste money where the bottleneck is not.

- --
  .~.  Jean-David Beyer  Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A Registered Machine   241939.
 /( )\ Shrewsbury, New Jerseyhttp://counter.li.org
 ^^-^^ 07:10:01 up 28 days, 10:32, 4 users, load average: 5.48, 4.77, 4.37



Re: [PERFORM] Hardware spec

2007-09-06 Thread Willo van der Merwe

Jean-David Beyer wrote:


Willo van der Merwe wrote:
  

Richard Huxton wrote:


Willo van der Merwe wrote:
  

Hi guys,

I have the rare opportunity to spec the hardware for a new database
 server. It's going to replace an older one, driving a social
networking web application. The current server (a quad Opteron with
4Gb of RAM and 80Gb fast SCSI RAID10) is coping with an average load
ranging between 1.5 and 3.5.

The new machine spec I have so far:


What's the limiting factor on your current machine - disk, memory,
cpu?
  
I'm a bit embarrassed to admit that I'm not sure. The reason we're 
changing machines is that we might be changing ISPs and we're renting / 
leasing the machines from the ISP.




Before you get rid of the current ISP, better examine what is going on with
the present setup. It would be good to know if you are memory, processor, or
IO limited. That way you could increase what needs to be increased, and not
waste money where the bottleneck is not.
  
Good advice. After running vmstat and iostat, it is clear, to my mind
anyway, that the most likely bottleneck is IO; the next is probably
RAM.

Here's the output:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0  29688  80908 128308 331579200 8636 8 17  2 80  1



avg-cpu:  %user   %nice   %sys  %iowait   %idle
          17.18    0.00   1.93     0.81   80.08

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              14.57        66.48       506.45   58557617  446072213
sda1              0.60         0.27         4.70     235122    4136128
sda2              0.38         0.77         2.27     678754    2002576
sda3              2.37         0.49        18.61     429171   16389960
sda4              0.00         0.00         0.00          2          0
sda5              0.71         0.66         5.46     578307    4807087
sda6              0.03         0.01         0.24       6300     214196
sda7              0.02         0.00         0.19       2622     165992
sda8             60.19        64.29       474.98   56626211  418356226
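Since-boot averages like the output above tend to hide bursts; interval sampling shows what the box is doing right now. A small helper sketch (it assumes standard procps vmstat output, where the "wa" iowait figure is the last column; the interval and count in the usage line are arbitrary):

```shell
#!/bin/sh
# avg_iowait: read vmstat output on stdin and print the average of the
# "wa" (I/O wait) column. Skips the two header lines and the first
# sample, which is a misleading since-boot average.
avg_iowait() {
  awk 'NR > 3 { sum += $16; n++ } END { if (n) printf "%d\n", sum / n }'
}

# Typical use: six 10-second samples.
#   vmstat 10 6 | avg_iowait
```

A sustained average well above a few percent suggests the disks, not the CPUs, are the bottleneck; `iostat -x 10` gives the per-device view.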




[PERFORM] SAN vs Internal Disks

2007-09-06 Thread Harsh Azad
Hi,

We are currently running our DB on a dual-core, dual-proc 3 GHz Xeon, 8GB
RAM, 4x SAS 146 GB 15K RPM on RAID 5.

The current data size is about 50GB, but we want to purchase the hardware to
scale to about 1TB as we think our business will need to support that much
soon.
- Currently we have 80% read and 20% write percentages.
- Currently with this configuration the database is showing signs of
overloading.
- Autovacuum etc. run on this database; vacuum full runs nightly.
- Currently CPU loads are about 20%, memory utilization is full (but this is
also due to Linux caching disk blocks) and IO waits are frequent.
- We have a load of about 400 queries per second

Now we are considering purchasing our own servers and in the process are
facing the usual dilemmas. First I'll list out the machine we have decided
to use:
2x Quad Xeon 2.4 Ghz (4-way only 2 populated right now)
32 GB RAM
OS Only storage - 2x SCSI 146 GB 15k RPM on RAID-1
(Data Storage mentioned below)

We have already decided to split our database into 3 machines on the basis
on disjoint sets of data. So we will be purchasing three of these boxes.

HELP 1: Does something look wrong with the above configuration? I know there
will be small differences b/w Opteron/Xeon. But do you think there is
something against going for 2.4 GHz Quad Xeons (Clovertown, I think)?

HELP 2: The main confusion is with regards to Data Storage. We have the
option of going for:

A: IBM N-3700 SAN box, having 12x FC 300GB disks, partitioned into 3 disks
as RAID-4 for WAL/backup, 9 disks as RAID-DP for data, and 2 hot spares. We
are also considering a similar solution from EMC - CX310C.

B: Go for internal or DAS-based storage. Here for each server we should be
able to have: 2x disks on RAID-1 for logs, 6x disks on RAID-10 for
tablespace1 and 6x disks on RAID-10 for tablespace2. Or maybe 12x disks on
RAID-10 as a single tablespace.

What do I think? Well..
SAN wins on manageability, replication (say to a DR site), backup, etc...
DAS wins on cost

But for a moment keeping these aside, I wanted to discuss, purely on the
performance side, which one is the winner? It feels like internal disks will
perform better, but I need to understand a rough magnitude of the difference
in performance to see if it's worth losing the manageability features.

Also, if we choose to go with DAS, what would be the best tool to do async
replication to a DR site, and maybe even, as an extra plus, a second
read-only DB server to distribute select loads?

Regards,
Azad


Re: [PERFORM] utilising multi-cpu/core machines?

2007-09-06 Thread Kevin Grittner
 On Wed, Sep 5, 2007 at  5:41 PM, in message [EMAIL PROTECTED],
Thomas Finneid [EMAIL PROTECTED] wrote: 
 how does pg utilise multi cpus/cores, i.e. does it use more than one
 core? And possibly how - is there any documentation about this?
 
For portability reasons PostgreSQL doesn't use threads, per se, but spawns
a new process for each connection, and a few for other purposes.  Each
process may be running on a separate CPU, but a single connection will
only be using one -- directly, anyway.  (The OS may well be using the
other for I/O, etc.)
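A quick way to see the process-per-connection model on a Linux box is to count the "postgres" processes. A sketch (it reads `ps` output on stdin so the parsing is easy to check; the usage line assumes a standard procps `ps`):

```shell
#!/bin/sh
# One backend process per client connection, plus a few utility
# processes (background writer, stats collector, etc.). Counts lines
# on stdin whose command name is exactly "postgres".
count_backends() {
  grep -c '^postgres$'
}

# Typical use:
#   ps -eo comm | count_backends
```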
 
For documentation, you could start with this:
 
http://www.postgresql.org/docs/8.2/interactive/app-postgres.html
 
-Kevin
 





Re: [PERFORM] Hardware spec

2007-09-06 Thread Florian Weimer
* Willo van der Merwe:

 Good advice. After running a vmstat and iostat, it is clear, to my
 mind anyway, that the most likely bottleneck is IO, next is probably
 some more RAM.

 Here's the output:
 procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
  0  0  29688  80908 128308 331579200 8636 8 17 2 80  1

You need to run vmstat 10 (for ten-second averages) and report a
couple of lines.

 avg-cpu:  %user   %nice%sys %iowait   %idle
  17.180.001.930.81   80.08

Same for iostat.

Your initial numbers suggest that your server isn't I/O-bound, though
(the percentage spent in iowait is much too small, and so are the tps
numbers).

-- 
Florian Weimer[EMAIL PROTECTED]
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99



Re: [PERFORM] Hardware spec

2007-09-06 Thread Florian Weimer
* Willo van der Merwe:

 Florian Weimer wrote:
 You need to run vmstat 10 (for ten-second averages) and report a
 couple of lines.

 2 80  1
 5  0  61732  37052  28180 34319560014   987 2320  2021 38

 sda3  3.30 0.0026.40  0264
 sda8 97.90 0.00   783.20  0   7832

These values don't look I/O bound to me.  CPU usage is pretty low,
too.

-- 
Florian Weimer[EMAIL PROTECTED]
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99



Re: [PERFORM] [ADMIN] ADO -PostgreSQL OLE DB Provider

2007-09-06 Thread Richard Broersma Jr
--- Jayaram Bhat [EMAIL PROTECTED] wrote:
 I am using a postgres setup in Windows. It is working fine using the
 ODBC driver, but not the ADO PostgreSQL OLE DB Provider,
 
 giving an error on test connection: 'Test connection failed because of an
 error in initializing provider. Unspecified error'
 
 Can I get help for the same?

This is my understanding regarding the OLE DB driver for PostgreSQL:

(1) The OLE DB Provider lacks the robustness that the ODBC driver has.
(2) The OLE DB Provider lacks much of the functionality that an OLE DB provider
should have.
(3) And this probably will not change soon, since the OLE DB driver is not
actively supported.

Regards,
Richard Broersma Jr.



Re: [PERFORM] SAN vs Internal Disks

2007-09-06 Thread Scott Marlowe
On 9/6/07, Harsh Azad [EMAIL PROTECTED] wrote:
 Hi,

 We are currently running our DB on a DualCore, Dual Proc 3.Ghz Xeon, 8GB
 RAM, 4x SAS 146 GB 15K RPM on RAID 5.

 The current data size is about 50GB, but we want to purchase the hardware to
 scale to about 1TB as we think our business will need to support that much
 soon.
 - Currently we have a 80% read and 20% write percentages.

For this type of load, you should be running on RAID10, not RAID5.  Or, if
you must use RAID 5, use more disks and a battery-backed caching
RAID controller known to perform well with RAID5 and large arrays.
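The reason is the small-write penalty. A back-of-envelope sketch (the ~180 random IOPS per 15k RPM disk figure is an assumption for illustration, not a measurement):

```shell
#!/bin/sh
# Rough random-write capacity of a 12-disk array.
# RAID5: each small write costs ~4 disk I/Os (read old data, read old
# parity, write new data, write new parity).
# RAID10: each write costs 2 disk I/Os (one per mirror half).
disks=12
iops_per_disk=180   # assumed figure for a 15k RPM drive

raid5_write_iops=$(( disks * iops_per_disk / 4 ))
raid10_write_iops=$(( disks * iops_per_disk / 2 ))

echo "RAID5:  $raid5_write_iops write IOPS"
echo "RAID10: $raid10_write_iops write IOPS"
```

With a 20% write load that factor-of-two gap matters; a battery-backed write cache narrows it but does not remove it.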

 - Currently with this configuration the Database is showing signs of
 over-loading.

On I/O or CPU?  If you're running out of CPU, then look to increasing
CPU horsepower and tuning postgresql.
If I/O then you need to look into a faster I/O subsystem.

 - Auto-vaccum, etc run on this database, vaccum full runs nightly.

Generally speaking, if you need to run vacuum fulls, you're doing
something wrong.  Is there a reason you're running vacuum full, or is
this just precautionary?  vacuum full can bloat your indexes, so you
shouldn't run it regularly.  Reindexing might be a better choice if
you do need to regularly shrink your db.  The better option is to
monitor your fsm usage and adjust fsm settings / autovacuum settings
as necessary.
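One way to monitor fsm usage on an 8.x server: the tail of a database-wide VACUUM VERBOSE prints a free-space-map summary. A sketch (the database name in the usage line is a placeholder, and the exact wording of the summary lines varies a little by version):

```shell
#!/bin/sh
# Pull the free-space-map summary out of VACUUM VERBOSE output.
# PostgreSQL prints VERBOSE messages on stderr, hence the 2>&1 below;
# "mydb" is a placeholder database name.
#   psql -d mydb -c 'VACUUM VERBOSE' 2>&1 | fsm_summary
fsm_summary() {
  grep -iE 'free space map|max_fsm_pages'
}
```

If the reported page count approaches max_fsm_pages, raise the setting (it needs a restart on 8.x) or vacuum more aggressively.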

  - Currently CPU loads are about 20%, memory utilization is full (but this
 is also due to linux caching disk blocks) and IO waits are frequent.
 - We have a load of about 400 queries per second

What does vmstat et. al. say about CPU versus I/O wait?

 Now we are considering to purchase our own servers and in the process are
 facing the usual dilemmas. First I'll list out what machine we have decided
 to use:
 2x Quad Xeon 2.4 Ghz (4-way only 2 populated right now)
 32 GB RAM
 OS Only storage - 2x SCSI 146 GB 15k RPM on RAID-1
 (Data Storage mentioned below)

 We have already decided to split our database into 3 machines on the basis
 on disjoint sets of data. So we will be purchasing three of these boxes.

 HELP 1: Does something look wrong with above configuration, I know there
 will be small differences b/w opetron/xeon. But do you think there is
 something against going for 2.4Ghz Quad Xeons (clovertown i think)?

Those look like good machines, plenty of memory.

 HELP 2: The main confusion is with regards to Data Storage. We have the
 option of going for:

 A: IBM N-3700 SAN Box, having 12x FC 300GB disks,  Partitioned into 3 disks
 into RAID-4 for WAL/backup, and 9 disks on RAID-DP for data, 2 hot spare. We
 are also considering similar solution from EMC - CX310C.

 B: Go for Internal of DAS based storage. Here for each server we should be
 able to have: 2x disks on RAID-1 for logs, 6x disks on RAID-10 for
 tablespace1 and 6x disks on RAID-10 for tablespace2. Or maybe 12x disks on
 RAID-10 single table-space.

 What do I think? Well..
 SAN wins on manageability, replication (say to a DR site), backup, etc...
 DAS wins on cost

The problem with SAN is that it's apparently very easy to build a big
expensive system that performs poorly.  We've seen reports of such
here on the lists a few times.  I would definitely demand an
evaluation period from your supplier to make sure it performs well if
you go SAN.

 But for a moment keeping these aside, i wanted to discuss, purely on
 performance side which one is a winner? It feels like internal-disks will
 perform better, but need to understand a rough magnitude of difference in
 performance to see if its worth loosing the manageability features.

That really really really depends.  The quality of RAID controllers
for either setup is very important, as is the driver support, etc...
All things being even, I'd lean towards the local storage.

 Also if we choose to go with DAS, what would be the best tool to do async
 replication to DR site and maybe even as a extra plus a second read-only DB
 server to distribute select loads.

Look at Slony, or PITR with continuous recovery.  Of those two, I've
only used Slony in production, and I was very happy with its
performance, and it was very easy to write a bash script to monitor
the replication for failures.
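For the PITR route, the moving parts are just two settings; a sketch (8.2-era syntax, and the host and paths are placeholders):

```
# postgresql.conf on the primary: ship each completed WAL segment to the
# standby. The command must exit 0 only once the file is safely stored.
archive_command = 'rsync -a %p standby:/var/lib/pgsql/wal_archive/%f'

# recovery.conf on the warm standby: replay archived segments as they
# arrive (a plain copy shown here; a waiting restore script is typical).
restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'
```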



Re: [PERFORM] SAN vs Internal Disks

2007-09-06 Thread Harsh Azad
Thanks Mark.

If I replicate a snapshot of the data and log files (basically the entire PG
data directory) and I maintain the same version of postgres on both servers,
it should work, right?

I am also thinking that having SAN storage will provide me with the facility
of keeping a warm standby DB. By just shutting one server down and starting
the other, mounting the same file system, I should be able to bring my DB up
when the primary incurs a physical failure.

I'm only considering SAN storage for this feature - has anyone ever used SAN
for replication and warm standby on Postgres?

Regards,
Harsh

On 9/6/07, Mark Lewis [EMAIL PROTECTED] wrote:

 On Thu, 2007-09-06 at 18:05 +0530, Harsh Azad wrote:
  Hi,
 
  We are currently running our DB on a DualCore, Dual Proc 3.Ghz Xeon,
  8GB RAM, 4x SAS 146 GB 15K RPM on RAID 5.
 
  The current data size is about 50GB, but we want to purchase the
  hardware to scale to about 1TB as we think our business will need to
  support that much soon.
  - Currently we have a 80% read and 20% write perecntages.
  - Currently with this configuration the Database is showing signs of
  over-loading.
  - Auto-vaccum, etc run on this database, vaccum full runs nightly.
  - Currently CPU loads are about 20%, memory utilization is full (but
  this is also due to linux caching disk blocks) and IO waits are
  frequent.
  - We have a load of about 400 queries per second
 
  Now we are considering to purchase our own servers and in the process
  are facing the usual dilemmas. First I'll list out what machine we
  have decided to use:
  2x Quad Xeon 2.4 Ghz (4-way only 2 populated right now)
  32 GB RAM
  OS Only storage - 2x SCSI 146 GB 15k RPM on RAID-1
  (Data Storage mentioned below)
 
  We have already decided to split our database into 3 machines on the
  basis on disjoint sets of data. So we will be purchasing three of
  these boxes.
 
  HELP 1: Does something look wrong with above configuration, I know
  there will be small differences b/w opetron/xeon. But do you think
  there is something against going for 2.4Ghz Quad Xeons (clovertown i
  think)?
 
  HELP 2: The main confusion is with regards to Data Storage. We have
  the option of going for:
 
  A: IBM N-3700 SAN Box, having 12x FC 300GB disks,  Partitioned into 3
  disks into RAID-4 for WAL/backup, and 9 disks on RAID-DP for data, 2
  hot spare. We are also considering similar solution from EMC -
  CX310C.
 
  B: Go for Internal of DAS based storage. Here for each server we
  should be able to have: 2x disks on RAID-1 for logs, 6x disks on
  RAID-10 for tablespace1 and 6x disks on RAID-10 for tablespace2. Or
  maybe 12x disks on RAID-10 single table-space.
 
  What do I think? Well..
  SAN wins on manageability, replication (say to a DR site), backup,
  etc...
  DAS wins on cost
 
  But for a moment keeping these aside, i wanted to discuss, purely on
  performance side which one is a winner? It feels like internal-disks
  will perform better, but need to understand a rough magnitude of
  difference in performance to see if its worth loosing the
  manageability features.
 
  Also if we choose to go with DAS, what would be the best tool to do
  async replication to DR site and maybe even as a extra plus a second
  read-only DB server to distribute select loads.

 Sounds like a good candidate for Slony replication for backups /
 read-only slaves.

 I haven't seen a SAN yet whose DR / replication facilities are on par
 with a good database replication solution.  My impression is that those
 facilities are mostly for file servers, mail servers, etc.  It would be
 difficult for a SAN to properly replicate a database given the strict
 ordering, size and consistency requirements for the data files.  Not
 impossible, but in my limited experience I haven't found one that I
 trust to do it reliably either, vendor boastings to the contrary
 notwithstanding.  (Hint: make sure you know exactly what your vendor's
 definition of the term 'snapshot' really means).

 So before you invest in a SAN, make sure that you're actually going to
 be able to (and want to) use all the nice management features you're
 paying for.  We have some SAN's that are basically acting just as
 expensive external RAID arrays because we do the database
 replication/backup in software anyway.

 -- Mark Lewis




-- 
Harsh Azad
===
[EMAIL PROTECTED]


Re: [PERFORM] SAN vs Internal Disks

2007-09-06 Thread Mark Lewis
On Thu, 2007-09-06 at 18:05 +0530, Harsh Azad wrote:
 Hi,
 
 We are currently running our DB on a DualCore, Dual Proc 3.Ghz Xeon,
 8GB RAM, 4x SAS 146 GB 15K RPM on RAID 5.
 
 The current data size is about 50GB, but we want to purchase the
 hardware to scale to about 1TB as we think our business will need to
 support that much soon. 
 - Currently we have a 80% read and 20% write perecntages. 
 - Currently with this configuration the Database is showing signs of
 over-loading.
 - Auto-vaccum, etc run on this database, vaccum full runs nightly.
 - Currently CPU loads are about 20%, memory utilization is full (but
 this is also due to linux caching disk blocks) and IO waits are
 frequent.
 - We have a load of about 400 queries per second
 
 Now we are considering to purchase our own servers and in the process
 are facing the usual dilemmas. First I'll list out what machine we
 have decided to use: 
 2x Quad Xeon 2.4 Ghz (4-way only 2 populated right now)
 32 GB RAM
 OS Only storage - 2x SCSI 146 GB 15k RPM on RAID-1
 (Data Storage mentioned below)
 
 We have already decided to split our database into 3 machines on the
 basis on disjoint sets of data. So we will be purchasing three of
 these boxes. 
 
 HELP 1: Does something look wrong with above configuration, I know
 there will be small differences b/w opetron/xeon. But do you think
 there is something against going for 2.4Ghz Quad Xeons (clovertown i
 think)?
 
 HELP 2: The main confusion is with regards to Data Storage. We have
 the option of going for:
 
 A: IBM N-3700 SAN Box, having 12x FC 300GB disks,  Partitioned into 3
 disks into RAID-4 for WAL/backup, and 9 disks on RAID-DP for data, 2
 hot spare. We are also considering similar solution from EMC -
 CX310C. 
 
 B: Go for Internal of DAS based storage. Here for each server we
 should be able to have: 2x disks on RAID-1 for logs, 6x disks on
 RAID-10 for tablespace1 and 6x disks on RAID-10 for tablespace2. Or
 maybe 12x disks on RAID-10 single table-space. 
 
 What do I think? Well..
 SAN wins on manageability, replication (say to a DR site), backup,
 etc...
 DAS wins on cost
 
 But for a moment keeping these aside, i wanted to discuss, purely on
 performance side which one is a winner? It feels like internal-disks
 will perform better, but need to understand a rough magnitude of
 difference in performance to see if its worth loosing the
 manageability features. 
 
 Also if we choose to go with DAS, what would be the best tool to do
 async replication to DR site and maybe even as a extra plus a second
 read-only DB server to distribute select loads.

Sounds like a good candidate for Slony replication for backups /
read-only slaves.

I haven't seen a SAN yet whose DR / replication facilities are on par
with a good database replication solution.  My impression is that those
facilities are mostly for file servers, mail servers, etc.  It would be
difficult for a SAN to properly replicate a database given the strict
ordering, size and consistency requirements for the data files.  Not
impossible, but in my limited experience I haven't found one that I
trust to do it reliably either, vendor boastings to the contrary
notwithstanding.  (Hint: make sure you know exactly what your vendor's
definition of the term 'snapshot' really means).

So before you invest in a SAN, make sure that you're actually going to
be able to (and want to) use all the nice management features you're
paying for.  We have some SAN's that are basically acting just as
expensive external RAID arrays because we do the database
replication/backup in software anyway.

-- Mark Lewis





Re: [PERFORM] SAN vs Internal Disks

2007-09-06 Thread Harsh Azad
Thanks Scott, we have now requested IBM/EMC to provide test machines.
Interestingly, since you mentioned the importance of RAID controllers and
drivers: we are planning to use CentOS 5 for hosting the DB.

Firstly, I could only find postgres 8.1.x RPMs for CentOS 5; I could not find
any RPM for 8.2.4. Is there an 8.2.4 RPM for CentOS 5?

Secondly, would investing in Red Hat Enterprise Linux give any
performance advantage? I know all the SAN boxes are only certified on RHEL
and not CentOS. Or, since CentOS is similar to RHEL, would it be fine?

Regards,
Harsh

On 9/6/07, Scott Marlowe [EMAIL PROTECTED] wrote:

 On 9/6/07, Harsh Azad [EMAIL PROTECTED] wrote:
  Hi,
 
  We are currently running our DB on a DualCore, Dual Proc 3.Ghz Xeon, 8GB
  RAM, 4x SAS 146 GB 15K RPM on RAID 5.
 
  The current data size is about 50GB, but we want to purchase the
 hardware to
  scale to about 1TB as we think our business will need to support that
 much
  soon.
  - Currently we have a 80% read and 20% write percentages.

 For this type load, you should be running on RAID10 not RAID5.  Or, if
 you must use RAID 5, use more disks and have a battery backed caching
 RAID controller known to perform well with RAID5 and large arrays.

  - Currently with this configuration the Database is showing signs of
  over-loading.

 On I/O or CPU?  If you're running out of CPU, then look to increasing
 CPU horsepower and tuning postgresql.
 If I/O then you need to look into a faster I/O subsystem.

  - Auto-vaccum, etc run on this database, vaccum full runs nightly.

 Generally speaking, if you need to run vacuum fulls, you're doing
 something wrong.  Is there a reason you're running vacuum full or is
 this just precautionary.  vacuum full can bloat your indexes, so you
 shouldn't run it regularly.  reindexing might be a better choice if
 you do need to regularly shrink your db.  The better option is to
 monitor your fsm usage and adjust fsm settings / autovacuum settings
 as necessary.

   - Currently CPU loads are about 20%, memory utilization is full (but
 this
  is also due to linux caching disk blocks) and IO waits are frequent.
  - We have a load of about 400 queries per second

 What does vmstat et. al. say about CPU versus I/O wait?

  Now we are considering to purchase our own servers and in the process
 are
  facing the usual dilemmas. First I'll list out what machine we have
 decided
  to use:
  2x Quad Xeon 2.4 Ghz (4-way only 2 populated right now)
  32 GB RAM
  OS Only storage - 2x SCSI 146 GB 15k RPM on RAID-1
  (Data Storage mentioned below)
 
  We have already decided to split our database into 3 machines on the
 basis
  on disjoint sets of data. So we will be purchasing three of these boxes.
 
  HELP 1: Does something look wrong with above configuration, I know there
  will be small differences b/w opetron/xeon. But do you think there is
  something against going for 2.4Ghz Quad Xeons (clovertown i think)?

 Look like good machines, plenty fo memory.

  HELP 2: The main confusion is with regards to Data Storage. We have the
  option of going for:
 
  A: IBM N-3700 SAN Box, having 12x FC 300GB disks,  Partitioned into 3
 disks
  into RAID-4 for WAL/backup, and 9 disks on RAID-DP for data, 2 hot
 spare. We
  are also considering similar solution from EMC - CX310C.
 
  B: Go for Internal of DAS based storage. Here for each server we should
 be
  able to have: 2x disks on RAID-1 for logs, 6x disks on RAID-10 for
  tablespace1 and 6x disks on RAID-10 for tablespace2. Or maybe 12x disks
 on
  RAID-10 single table-space.
 
  What do I think? Well..
  SAN wins on manageability, replication (say to a DR site), backup,
 etc...
  DAS wins on cost

 The problem with SAN is that it's apparently very easy to build a big
 expensive system that performs poorly.  We've seen reports of such
 here on the lists a few times.  I would definitely demand an
 evaluation period from your supplier to make sure it performs well if
 you go SAN.

  But for a moment keeping these aside, i wanted to discuss, purely on
  performance side which one is a winner? It feels like internal-disks
 will
  perform better, but need to understand a rough magnitude of difference
 in
  performance to see if its worth loosing the manageability features.

 That really really really depends.  The quality of RAID controllers
 for either setup is very important, as is the driver support, etc...
 All things being even, I'd lean towards the local storage.

  Also if we choose to go with DAS, what would be the best tool to do
 async
  replication to DR site and maybe even as a extra plus a second read-only
 DB
  server to distribute select loads.

 Look at slony, or PITR with continuous recovery.  Of those two, I've
 only used Slony in production, and I was very happy with it's
 performance, and it was very easy to write a bash script to monitor
 the replication for failures.




-- 
Harsh Azad
===
[EMAIL PROTECTED]


Re: [PERFORM] SAN vs Internal Disks

2007-09-06 Thread Joshua D. Drake

Harsh Azad wrote:
 Thanks Scott, we have now requested IBM/EMC to provide test machines.
 Interestingly since you mentioned the importance of Raid controllers and the
 drivers; we are planning to use Cent OS 5 for hosting the DB.
 
 Firstly, I could only find postgres 8.1.x RPM for CentOS 5, could not find
 any RPM for 8.2.4. Is there any 8.2.4 RPM for CentOS 5?

Look under the RHEL section of ftp.postgresql.org

Joshua D. Drake

 
 Secondly, would investing into Redhat enterprise edition give any
 performance advantage? I know all the SAN boxes are only certified on RHEL
 and not CentOS. Or since CentOS is similar to RHEL it would be fine?
 
 Regards,
 Harsh
 
 On 9/6/07, Scott Marlowe [EMAIL PROTECTED] wrote:
 On 9/6/07, Harsh Azad [EMAIL PROTECTED] wrote:
 Hi,

  We are currently running our DB on a dual-core, dual-proc 3 GHz Xeon, 8GB
  RAM, 4x SAS 146 GB 15K RPM on RAID 5.

  The current data size is about 50GB, but we want to purchase the
  hardware to scale to about 1TB, as we think our business will need to
  support that much soon.
 - Currently we have a 80% read and 20% write percentages.
 For this type of load, you should be running on RAID-10, not RAID-5.  Or, if
 you must use RAID-5, use more disks and have a battery-backed caching
 RAID controller known to perform well with RAID-5 and large arrays.

 - Currently with this configuration the Database is showing signs of
 over-loading.
 On I/O or CPU?  If you're running out of CPU, then look to increasing
 CPU horsepower and tuning postgresql.
 If I/O then you need to look into a faster I/O subsystem.

  - Autovacuum etc. run on this database; vacuum full runs nightly.
 Generally speaking, if you need to run vacuum fulls, you're doing
 something wrong.  Is there a reason you're running vacuum full, or is
 this just precautionary?  Vacuum full can bloat your indexes, so you
 shouldn't run it regularly.  Reindexing might be a better choice if
 you do need to regularly shrink your db.  The better option is to
 monitor your fsm usage and adjust fsm settings / autovacuum settings
 as necessary.
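A sketch of that fsm monitoring, for the 8.1/8.2 era being discussed: the heredoc below is an illustrative stand-in (made-up numbers) for the summary that `psql -c 'VACUUM VERBOSE'` prints at the end, and `MAX_FSM_PAGES` stands in for your postgresql.conf setting:

```shell
#!/bin/sh
# Compare the page-slot demand reported by VACUUM VERBOSE against
# max_fsm_pages.  Value below is an assumed configuration setting.
MAX_FSM_PAGES=200000

# Stand-in for:  psql -d mydb -c 'VACUUM VERBOSE' 2>&1 | tail -n 4
vacuum_tail() {
cat <<'EOF'
INFO:  free space map contains 154312 pages in 412 relations
DETAIL:  A total of 186720 page slots are used (including overhead).
186720 page slots are required to track all free space.
Current limits are:  200000 page slots, 1000 relations, using 1240 kB.
EOF
}

# Pull the "required" figure out of the vacuum output.
needed=$(vacuum_tail | awk '/page slots are required/ {print $1}')

if [ "$needed" -gt "$MAX_FSM_PAGES" ]; then
    echo "WARNING: raise max_fsm_pages (need $needed, have $MAX_FSM_PAGES)"
else
    echo "OK: $needed of $MAX_FSM_PAGES page slots needed"
fi
```

When the required slots approach the limit, dead space stops being tracked and tables bloat, which is exactly when people reach for vacuum full.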

  - Currently CPU loads are about 20%, memory utilization is full (but
 this is also due to Linux caching disk blocks) and IO waits are frequent.
 - We have a load of about 400 queries per second
 What does vmstat et al. say about CPU versus I/O wait?
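To make that vmstat check concrete, here is a rough sketch that averages the idle (`id`) and I/O-wait (`wa`) columns over a few intervals. The heredoc stands in for `vmstat 10 5 | tail -n +3` on a live box, and both the sample rows and the 20%/10% thresholds are made-up assumptions:

```shell
#!/bin/sh
# Distinguish CPU saturation from an I/O bottleneck by averaging the
# id/wa columns of several vmstat intervals (never trust the first row
# of a vmstat run -- it is the average since boot).
vmstat_sample() {
cat <<'EOF'
 1  0  61732  37052  28180 3431956    0    0    14   987 2320  2021 38  4 56  2
 3  0  61620  37620  28484 3432740    0    0     0   580 4088  6792 40  5 54  1
 2  0  61596  33716  28748 3433520    0    0    24   415 2087  1890 44  4 49  2
EOF
}

# In this 16-column vmstat layout, field 15 is id and field 16 is wa.
vmstat_sample | awk '{ id += $15; wa += $16; n++ }
    END { printf "avg id=%.1f%% wa=%.1f%%\n", id/n, wa/n
          if (wa/n > 20) print "likely I/O bound"
          else if (id/n < 10) print "likely CPU bound"
          else print "headroom available" }'
```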

 Now we are considering to purchase our own servers and in the process
 are
 facing the usual dilemmas. First I'll list out what machine we have
 decided
 to use:
 2x Quad Xeon 2.4 Ghz (4-way only 2 populated right now)
 32 GB RAM
 OS Only storage - 2x SCSI 146 GB 15k RPM on RAID-1
 (Data Storage mentioned below)

 We have already decided to split our database into 3 machines on the
 basis
 on disjoint sets of data. So we will be purchasing three of these boxes.

 HELP 1: Does something look wrong with the above configuration? I know there
 will be small differences b/w Opteron/Xeon. But do you think there is
 something against going for 2.4GHz quad Xeons (Clovertown, I think)?
 They look like good machines, plenty of memory.

 HELP 2: The main confusion is with regards to Data Storage. We have the
 option of going for:

 A: IBM N-3700 SAN box, having 12x FC 300GB disks, partitioned into 3
 disks as RAID-4 for WAL/backup and 9 disks as RAID-DP for data, with 2 hot
 spares. We are also considering a similar solution from EMC - the CX310C.

 B: Go for internal or DAS-based storage. Here for each server we should
 be able to have: 2x disks on RAID-1 for logs, 6x disks on RAID-10 for
 tablespace1 and 6x disks on RAID-10 for tablespace2. Or maybe 12x disks
 on RAID-10 single table-space.

 What do I think? Well..
 SAN wins on manageability, replication (say to a DR site), backup,
 etc...
 DAS wins on cost
 The problem with SAN is that it's apparently very easy to build a big
 expensive system that performs poorly.  We've seen reports of such
 here on the lists a few times.  I would definitely demand an
 evaluation period from your supplier to make sure it performs well if
 you go SAN.

 But for a moment keeping these aside, I wanted to discuss, purely on the
 performance side, which one is a winner? It feels like internal disks
 will perform better, but I need to understand a rough magnitude of the
 difference in performance to see if it's worth losing the manageability
 features.
 That really really really depends.  The quality of RAID controllers
 for either setup is very important, as is the driver support, etc...
 All things being even, I'd lean towards the local storage.

 Also if we choose to go with DAS, what would be the best tool to do
 async replication to a DR site, and maybe even, as an extra plus, a
 second read-only DB server to distribute select loads.
 Look at Slony, or PITR with continuous recovery.  Of those two, I've
 only used Slony in production, and I was very happy with its
 performance, and it was very easy to write a bash script to monitor
 the replication for failures.

 
 
 


- 

Re: [PERFORM] SAN vs Internal Disks

2007-09-06 Thread Arjen van der Meijden

On 6-9-2007 14:35 Harsh Azad wrote:

2x Quad Xeon 2.4 Ghz (4-way only 2 populated right now)


I don't understand this sentence. You seem to imply you might be able to 
fit more processors in your system?
Currently the only quad-cores you can buy are dual-processor 
processors, unless you already got a quote for a system that uses the 
new Intel Tigerton processors.
I.e. if they are Clovertowns, they are indeed Intel Core-architecture 
processors, but you won't be able to fit more than 2 in the system and 
get 8 cores.
If they are Tigerton, I'm a bit surprised you got a quote for that, 
although HP seems to offer a system for those. If they are the old 
dual-core MPs (70xx or 71xx), you don't want those...



32 GB RAM
OS Only storage - 2x SCSI 146 GB 15k RPM on RAID-1
(Data Storage mentioned below)


I doubt you need 15k-rpm drives for the OS... But that won't matter much in 
the total cost.


HELP 1: Does something look wrong with above configuration, I know there 
will be small differences b/w opetron/xeon. But do you think there is 
something against going for 2.4Ghz Quad Xeons (clovertown i think)?


Apart from your implication that you may be able to stick more 
processors in it: no, not to me. Two quad-core Xeons were even faster 
than 8 dual-core Opterons in our benchmarks, although that might also 
indicate limited OS-, postgres- or underlying I/O-scaling.
Obviously the new AMD Barcelona line of processors (coming next week 
or so) and the new Intel quad-core DP (Penryn?) and MP (Tigerton) may 
be interesting to look at; I don't know how soon systems will be 
available with those processors (HP seems to offer a Tigerton server).


B: Go for internal or DAS-based storage. Here for each server we should 
be able to have: 2x disks on RAID-1 for logs, 6x disks on RAID-10 for 
tablespace1 and 6x disks on RAID-10 for tablespace2. Or maybe 12x disks 
on RAID-10 single table-space.


You don't necessarily need to use internal disks for DAS, since you can 
also link an external SAS enclosure, either with or without an integrated 
raid-controller (IBM, Sun, Dell, HP and others have options for that), 
and those can be expanded to either multiple enclosures tied to 
each other or to a controller in the server.
Those may also be usable in a warm-standby scenario and may be quite a 
bit cheaper than FC hardware.


But for a moment keeping these aside, I wanted to discuss, purely on the 
performance side, which one is a winner? It feels like internal disks 
will perform better, but I need to understand a rough magnitude of 
difference in performance to see if it's worth losing the manageability 
features.


As said, you don't necessarily need real internal disks, since SAS can 
be used with external enclosures as well, still being DAS. I have no 
idea what difference you will or may see between those in terms of 
performance. It probably largely depends on the raid-controller 
available; afaik the disks will be mostly the same. And it might depend 
on your available bandwidth: external SAS offers you a 4-lane connection 
allowing for a 12Gbit link between a disk enclosure and a 
controller. While - as I understand it - even expensive SAN-controllers 
only offer dual-ported, 8Gbit connections?
What's more important is probably the amount of disks and raid-cache you 
can buy in the SAN vs DAS scenario. If you can buy 24 disks when going 
for DAS vs only 12 with SAN...


But then again, I'm no real storage expert, we only have two Dell MD1000 
DAS-units at our site.


Best regards and good luck,

Arjen

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] SAN vs Internal Disks

2007-09-06 Thread Scott Marlowe
On 9/6/07, Harsh Azad [EMAIL PROTECTED] wrote:
 Thanks Scott, we have now requested IBM/EMC to provide test machines.
 Interestingly since you mentioned the importance of Raid controllers and the
 drivers; we are planning to use Cent OS 5 for hosting the DB.

What RAID controllers have you looked at?  Seems the two most popular
in terms of performance here have been Areca and 3ware / Escalade.
LSI seems to come in a pretty close third.  Adaptec is to be avoided,
as are cheap RAID controllers (i.e. Promise etc.).  Battery-backed
cache is a must, and the bigger the better.

 Firstly, I could only find postgres 8.1.x RPM for CentOS 5, could not find
 any RPM for 8.2.4. Is there any 8.2.4 RPM for CentOS 5?

 Secondly, would investing into Redhat enterprise edition give any
 performance advantage? I know all the SAN boxes are only certified on RHEL
 and not CentOS. Or since CentOS is similar to RHEL it would be fine?

For all intents and purposes, CentOS and RHEL are the same OS, so any
pgsql RPM for one should pretty much work for the other.  At the
worst, you might have to get an SRPM and rebuild it for CentOS / White
Box.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] SAN vs Internal Disks

2007-09-06 Thread Harsh Azad
Hi,

How about the Dell Perc 5/i card, 512MB battery backed cache or IBM
ServeRAID-8k Adapter?

I hope I am sending relevant information here, I am not too well versed with
RAID controllers.

Regards,
Harsh

On 9/6/07, Scott Marlowe [EMAIL PROTECTED] wrote:

 On 9/6/07, Harsh Azad [EMAIL PROTECTED] wrote:
  Thanks Scott, we have now requested IBM/EMC to provide test machines.
  Interestingly since you mentioned the importance of Raid controllers and
 the
  drivers; we are planning to use Cent OS 5 for hosting the DB.

 What RAID controllers have you looked at?  Seems the two most popular
 in terms of performance here have been Areca and 3ware / Escalade.
 LSI seems to come in a pretty close third.  Adaptec is to be avoided,
 as are cheap RAID controllers (i.e. Promise etc.).  Battery-backed
 cache is a must, and the bigger the better.

  Firstly, I could only find postgres 8.1.x RPM for CentOS 5, could not
 find
  any RPM for 8.2.4. Is there any 8.2.4 RPM for CentOS 5?
 
  Secondly, would investing into Redhat enterprise edition give any
  performance advantage? I know all the SAN boxes are only certified on
 RHEL
  and not CentOS. Or since CentOS is similar to RHEL it would be fine?

 For all intents and purposes, CentOS and RHEL are the same OS, so any
 pgsql RPM for one should pretty much work for the other.  At the
 worst, you might have to get an SRPM and rebuild it for CentOS / White
 Box.




-- 
Harsh Azad
===
[EMAIL PROTECTED]


Re: [PERFORM] Hardware spec]

2007-09-06 Thread Jean-David Beyer
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Willo van der Merwe wrote:
 Jean-David Beyer wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Willo van der Merwe wrote:

 Richard Huxton wrote:

 Willo van der Merwe wrote:

 Hi guys,

 I'm have the rare opportunity to spec the hardware for a new database
  server. It's going to replace an older one, driving a social
 networking web application. The current server (a quad opteron with
 4Gb of RAM and 80Gb fast SCSI RAID10) is coping with an average load
 of ranging between 1.5 and 3.5.

 The new machine spec I have so far:

 What's the limiting factor on your current machine - disk, memory,
 cpup?

 I'm a bit embarrassed to admit that I'm not sure. The reason we're
 changing machines is that we might be changing ISPs and we're renting
 / leasing the machines from the ISP.


 Before you get rid of the current ISP, better examine what is going on
 with
 the present setup. It would be good to know if you are memory,
 processor, or
 IO limited. That way you could increase what needs to be increased,
 and not
 waste money where the bottleneck is not.

 Good advice. After running a vmstat and iostat, it is clear, to my mind
 anyway, that the most likely bottleneck is IO, next is probably some
 more RAM.
 Here's the output:
 procs ---memory-- ---swap-- -io --system-- --cpu--
 r  b   swpd   free   buff  cache   si   so   bi   bo   in   cs us sy id wa
 0  0  29688  80908 128308 331579200 8636 8 17  2 80  1


 avg-cpu:  %user   %nice%sys %iowait   %idle
  17.180.001.930.81   80.08

 Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
 sda  14.5766.48   506.45   58557617  446072213
 sda1  0.60 0.27 4.70 2351224136128
 sda2  0.38 0.77 2.27 6787542002576
 sda3  2.37 0.4918.61 429171   16389960
 sda4  0.00 0.00 0.00  2  0
 sda5  0.71 0.66 5.46 5783074807087
 sda6  0.03 0.01 0.24   6300 214196
 sda7  0.02 0.00 0.19   2622 165992
 sda8 60.1964.29   474.98   56626211  418356226


1.) If this is when the system is heavily loaded, you have more capacity
than you need, on the average. Processors are idle, not in wait state. You
have enough memory (no swapping going on), disks not too busy (probably).
But if it is not heavily loaded, run these when it is.

2.) Did you let vmstat and iostat run just once, or are these the last of
several reports?  The first report each gives after starting shows averages
since boot, so if these are first reports the numbers may not mean much.

Here is iostat for my machine that is running right now. Only lines involved
in postgreSQL are shown:

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda8            409.33          0.67       3274.00         40     196440
sdb7  0.12 0.00 0.93  0 56
sdc1 44.73 0.13   357.73  8  21464
sdd1 23.43 0.00   187.47  0  11248
sde1            133.45          0.00       1067.60          0      64056
sdf1 78.25 0.00   626.00  0  37560

On sda8 is the Write-Ahead-Log.
On sdb7 are some small, seldom-used relations, but also the input files
being loaded. sda and sdb are what my system uses for other stuff as well,
though sda is not too heavily used and sdb even less.
I am presently loading into the database. sdc1, sdd1, sde1, and sdf1 are
the drives reserved for database only.

I would suggest getting at least two hard drives on the new system and
perhaps putting the WAL on one and the rest on the other. My sda and sdb
drives are around 72 GBytes and sdc, sdd, sde, and sdf are about 18
GBytes. I believe in the more spindles the better (within reason), and in
dividing the data up with the indices separate from the associated data
to reduce seeking.
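On 8.x there is no WAL tablespace, so the usual way to give the WAL its own spindle is to move pg_xlog to the other drive and leave a symlink behind, with the server stopped. A sketch under that assumption (the directories below are throwaway demo stand-ins so it can be run as-is; on a real system you would use your actual data directory and the dedicated disk):

```shell
#!/bin/sh
set -e
# Demo scaffolding: temporary stand-ins for the real data directory
# ($PGDATA) and the dedicated WAL disk.
BASE=$(mktemp -d)
PGDATA="$BASE/pgdata"
WALDISK="$BASE/waldisk"
mkdir -p "$PGDATA/pg_xlog" "$WALDISK"

# On a real server, stop PostgreSQL first:  pg_ctl -D "$PGDATA" stop
mv "$PGDATA/pg_xlog" "$WALDISK/pg_xlog"     # move the WAL to the other disk
ln -s "$WALDISK/pg_xlog" "$PGDATA/pg_xlog"  # leave a symlink in its place
# ...then start it again:  pg_ctl -D "$PGDATA" start

ls -ld "$PGDATA/pg_xlog"   # now a symlink pointing at the WAL disk
```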

$ vmstat 30
procs ---memory-- ---swap-- -io --system-- --cpu--
 r  b   swpd   free   buff   cache   si   so   bi   bo   in   cs us sy id wa
 5  1   1340 248456 248496 66526720028   149  2 1 94  2   3  0
 4  2   1340 244240 248948 665636400 0  2670   1248 11320 80  4 14  2
 4  1   1340 241008 249492 665986400 0  2701   1222 11432 82  4 12  2
 5  0   1340 246268 249868 665364400 0  2799   1223 11412 83  4 12  2

My machine was idle most of the time since boot (other than running BOINC
stuff), but is busy loading data into the database at the moment. See how
the first line differs from the others? I have 8GBytes RAM on this 32-bit
machine running Red Hat Enterprise Linux 5.



- --
  .~.  Jean-David Beyer  

Re: [PERFORM] SAN vs Internal Disks

2007-09-06 Thread Mark Lewis
On Thu, 2007-09-06 at 22:28 +0530, Harsh Azad wrote:
 Thanks Mark.
 
 If I replicate a snapshot of data and log files (basically the entire
 PG data directory) and I maintain the same version of postgres on both
 servers, it should work, right?
 
 I am also thinking that having SAN storage will provide me with the
 facility of keeping a warm standby DB. By just shutting one server
 down and starting the other, mounting the same file system, I should be
 able to bring my DB up when the primary incurs a physical failure. 
 
 I'm only considering SAN storage for this feature - has anyone ever
 used SAN for replication and warm stand-by on Postgres?
 
 Regards,
 Harsh


We used to use a SAN for warm standby of a database, but with Oracle and
not PG.  It worked kinda sorta, except for occasional crashes due to
buggy drivers.

But after going through the exercise, we realized that we hadn't gained
anything over just doing master/slave replication between two servers,
except that it was more expensive, had a tendency to expose buggy
drivers, had a single point of failure in the SAN array, failover took
longer and we couldn't use the warm standby server to perform read-only
queries.  So we reverted back and just used the SAN as expensive DAS and
set up a separate box for DB replication.

So if that's the only reason you're considering a SAN, then I'd advise
you to spend the extra money on more DAS disks.

Maybe I'm jaded by past experiences, but the only real use case I can
see to justify a SAN for a database would be something like Oracle RAC,
but I'm not aware of any PG equivalent to that.

-- Mark Lewis

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] SAN vs Internal Disks

2007-09-06 Thread Scott Marlowe
On 9/6/07, Harsh Azad [EMAIL PROTECTED] wrote:
 Hi,

 How about the Dell Perc 5/i card, 512MB battery backed cache or IBM
 ServeRAID-8k Adapter?

All Dell Percs have so far been based on either adaptec or LSI
controllers, and have ranged from really bad to fairly decent
performers.  There were some recent posts on this list where someone
was benchmarking one, I believe.  searching the list archives might
prove useful.

I am not at all familiar with IBM's ServeRAID controllers.

Do either of these come with, or have the option for, a battery backup
module for the cache?

 I hope I am sending relevant information here, I am not too well versed with
 RAID controllers.

Yep.  Definitely look for a chance to evaluate whichever ones you're
considering.  The Arecas are in the same price range as the IBM
controller you're considering, maybe a few hundred dollars more.  See
if you can get one for review while looking at these other
controllers.

I'd recommend against Dell unless you're at a company that orders
computers by the hundred lot.  My experience with Dell has been that
unless you are a big customer you're just another number (a small one
at that) on a spreadsheet.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Hardware spec

2007-09-06 Thread Willo van der Merwe

Florian Weimer wrote:

You need to run vmstat 10 (for ten-second averages) and report a
couple of lines.
  
procs ---memory-- ---swap-- -io --system-- --cpu--
 r  b   swpd   free   buff  cache   si   so   bi   bo   in   cs us sy id wa
 1  0  61732  47388  27908 34313440010651 4 17  2 80  1
 5  0  61732  37052  28180 34319560014   987 2320  2021 38  4 56  2
 1  0  61620  43076  28356 343225600 0   367 1691  1321 28  3 67  1
 3  0  61620  37620  28484 343274000 0   580 4088  6792 40  5 54  1
 2  0  61596  33716  28748 34335200024   415 2087  1890 44  4 49  2
 3  0  61592  45300  28904 34162003061   403 2282  2154 41  4 54  1
 7  0  61592  30172  29092 34169640019   358 2779  3478 31  6 63  1
 1  0  61580  62948  29180 34173686027   312 3632  4396 38  4 57  1
 1  0  61444  62388  29400 341796400 6   354 2163  1918 31  4 64  1
 2  0  61444  53988  29648 341798800 0   553 2095  1687 33  3 63  1
 1  0  61444  63988  29832 341834800 6   352 1767  1424 22  3 73  1
 1  1  61444  51148  30052 34191480050   349 1524   834 22  3 74  2
 1  0  61432  53460  30524 341957270 7   868 4434  6706 43  6 49  2
 1  0  61432  58668  30628 342014800 0   284 1785  1628 27  3 69  1


iostat: sda8 is where the pg_data resides, sda3 is /var/log
avg-cpu:  %user   %nice%sys %iowait   %idle
 17.360.001.960.82   79.86
Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda3  2.38 0.4918.71 432395   16672800
sda8 62.3474.46   491.74   6634  438143794

avg-cpu:  %user   %nice%sys %iowait   %idle
 30.500.003.571.70   64.22
Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda3  5.60 0.0044.80  0448
sda8120.20   134.40   956.00   1344   9560

avg-cpu:  %user   %nice%sys %iowait   %idle
 20.680.003.431.35   74.54
Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda3  3.30 0.0026.40  0264
sda8 97.90 0.00   783.20  0   7832

avg-cpu:  %user   %nice%sys %iowait   %idle
 22.310.002.750.68   74.27
Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda3  2.10 0.0016.78  0168
sda8 60.34 0.80   481.92  8   4824

avg-cpu:  %user   %nice%sys %iowait   %idle
 11.650.001.601.03   85.72
Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda3  1.70 0.0013.61  0136
sda8 59.36 0.00   474.87  0   4744


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] SAN vs Internal Disks

2007-09-06 Thread Joe Uhl
Scott Marlowe wrote:
 On 9/6/07, Harsh Azad [EMAIL PROTECTED] wrote:
   
 Hi,

 How about the Dell Perc 5/i card, 512MB battery backed cache or IBM
 ServeRAID-8k Adapter?
 

 All Dell Percs have so far been based on either adaptec or LSI
 controllers, and have ranged from really bad to fairly decent
 performers.  There were some recent posts on this list where someone
 was benchmarking one, I believe.  searching the list archives might
 prove useful.

 I am not at all familiar with IBM's ServeRAID controllers.

 Do either of these come with or have the option for battery back
 module for the cache?

   
 I hope I am sending relevant information here, I am not too well versed with
 RAID controllers.
 

 Yep.  Def look for a chance to evaluate whichever ones you're
 considering.  The Areca's are in the same price range as the IBM
 controller you're considering, maybe a few hundred dollars more.  See
 if you can get one for review while looking at these other
 controllers.

 I'd recommend against Dell unless you're at a company that orders
 computers by the hundred lot.  My experience with Dell has been that
 unless you are a big customer you're just another number (a small one
 at that) on a spreadsheet.
   
If you do go with Dell, get connected with an account manager instead of
ordering online.  You work with the same people every time you have an
order, and in my experience they can noticeably beat the best prices I
can find.  This is definitely the way to go if you don't want to get
lost in the volume.  The group I have worked with for the past ~2 years
is very responsive, remembers me and my company across the 3 - 6 month
gaps between purchases, and the server/storage person in the group is
reasonably knowledgeable and helpful.  This is for small lots of
machines; our first order was just 2 boxes and I've only placed 4 orders
total in the past 2 years.

Just my personal experience; I'd be happy to pass along the account
manager's information if anyone is interested.
Joe Uhl
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] SAN vs Internal Disks

2007-09-06 Thread Joel Fradkin
I am not sure I agree with that evaluation.
I only have 2 Dell database servers, and they have been 100% reliable.
Maybe he is referring to support, which does tend to be up to who you get.
When I asked about performance on my new server they were very helpful, but I
did have a bad time on my NAS device (though I had the really cheap support
plan on it). They did help me get it fixed, but I had to RMA all the drives on
the NAS as they were all bad, and it was no fun installing the OS as it had no
floppy. I got the better support for both the database servers, which are
using JBOD from Dell for the disk array. The quad-proc Opteron with dual
cores and 16 gig of memory has been extremely fast (like 70%) over my older
4-proc 32-bit single-core machine with 8 gig. But both are running postgres
and perform the needed functionality. I would like to have redundant backups
of these as they are mission critical, but all in good time.

I'd recommend against Dell unless you're at a company that orders
computers by the hundred lot.  My experience with Dell has been that
unless you are a big customer you're just another number (a small one
at that) on a spreadsheet.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] SAN vs Internal Disks

2007-09-06 Thread Arjen van der Meijden

On 6-9-2007 20:42 Scott Marlowe wrote:

On 9/6/07, Harsh Azad [EMAIL PROTECTED] wrote:

Hi,

How about the Dell Perc 5/i card, 512MB battery backed cache or IBM
ServeRAID-8k Adapter?


All Dell Percs have so far been based on either adaptec or LSI
controllers, and have ranged from really bad to fairly decent
performers.  There were some recent posts on this list where someone
was benchmarking one, I believe.  searching the list archives might
prove useful.


The Dell PERC5 cards are based on LSI chips and perform quite well. 
Afaik Dell hasn't used Adaptecs for a while now, but even recent 
(non-cheap ;) ) Adaptecs aren't that bad, afaik.


The disadvantage of using Areca or 3ware is obviously the lack of 
support in A-brand servers and the lack of support for SAS disks. Only 
recently has Areca stepped into the SAS market, and I have no idea how 
easily those controllers are integrated in standard servers (they tend 
to be quite large, which will not fit in 2U and maybe not even in 3U or 
4U servers).


Arjen

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] SAN vs Internal Disks

2007-09-06 Thread Arjen van der Meijden

On 6-9-2007 20:29 Mark Lewis wrote:

Maybe I'm jaded by past experiences, but the only real use case I can
see to justify a SAN for a database would be something like Oracle RAC,
but I'm not aware of any PG equivalent to that.


PG Cluster II seems to be able to do that, but I don't know whether 
that's production quality already...


Arjen

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


[PERFORM] Reasonable amount of indices

2007-09-06 Thread Patric

Hi,
   I've a question about amount of indices.
  
   I explain my issue based on an example:

   Table which contains person information, one row per person.
   There will be lots of SELECTS doing search by special criteria, 
e.g.: Age, Gender.


   Now there will be 4 User groups which will select on the table:
   Group 1) Always doing reads on specific continents.
   Group 2) Always doing reads in specific country.
   Group 3) Always doing reads in specific region within a country.
   Group 4) Always doing reads in specific city.

   I 'm indexing the the important attributes. Would be about 5 to 6 
independent indexes.
   As there will be millions of rows, quite a lot of hits will be 
returned, I guess
   it will generate big bitmaps to calculate the intersection of the 
indices.


   Ok to prevent this from happening I'd wanted to create 4 Indexes per 
attribute, with
   special predicate, so users which only query for a country don't 
scan an index

   which indexed the entire globe:

   e.g ..
   CREATE index BlaBla_city on table tblusers(dtage) WHERE dtcity='London';
   CREATE index BlaBla_country on table tblusers(dtage) WHERE 
dtcountry='uk';
   CREATE index BlaBla_continent on table tblusers(dtage) WHERE 
dtcontinent='europe';

   etc.

   SELECT * FROM tblusers WHERE dtcontinent='europe' and age='23'
   would then postgres lead to use the special index made for europe.

   Now that I've 4 Indexes. an Insert or update will lead to some more 
overhead,  but which would be ok.


   My Question now is: Is it wise to do so, and create hundreds or 
maybe thousands of Indices

   which partition the table for the selections.

   Does postgres scale good on the selecton of indices or is the 
predicate for indices not

   layed out for such a usage?

   (PS: Don't want partition with method postgres offers..)

thanks in advance,
patric


  


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Performance on 8CPU's and 32GB of RAM

2007-09-06 Thread James Mansion

Scott Marlowe wrote:

And there's the issue that with windows / NTFS that when one process
opens a file for read, it locks it for all other users.  This means
that things like virus scanners can cause odd, unpredictable failures
of your database.

  

Can you provide some justification for this?

James


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Performance on 8CPU's and 32GB of RAM

2007-09-06 Thread James Mansion

Scott Marlowe wrote:

Where unixes generally outperform windows is in starting up new
backends, better file systems, and handling very large shared_buffer
settings.
  


Why do you think that UNIX systems are better at handling large shared 
buffers than Windows?
32-bit Windows systems can suffer from fragmented address space, to be 
sure, but if the performance of the operating-system-supplied mutex or 
semaphore isn't good enough, you can just use the raw atomic ops.

If what you mean is that pg has a design that's heavily oriented towards 
things that tend to
be cheap on POSIX and doesn't use the core Win32 features effectively, 
then let's track

that as an optimisation opportunity for the Win32 port.


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Performance on 8CPU's and 32GB of RAM

2007-09-06 Thread Scott Marlowe
On 9/6/07, James Mansion [EMAIL PROTECTED] wrote:
 Scott Marlowe wrote:
  And there's the issue that with windows / NTFS that when one process
  opens a file for read, it locks it for all other users.  This means
  that things like virus scanners can cause odd, unpredictable failures
  of your database.
 
 
 Can you provide some justification for this?

Seeing as I didn't write Windows or any of the plethora of anti-virus
software, no I really can't. It's unforgivable behaviour.

Can I provide evidence that it happens?  Just read the archives of
this list for the evidence.  I've seen it often enough to know that
most anti-virus software seems to open files in exclusive mode and
cause problems for postgresql, among other apps.


 Why do you think that UNIX systems are better at handling large shared
 buffers than Wndows?

Because we've seen lots of problems with large shared buffers on windows here.

Now, maybe for a windows specific app it's all fine and dandy.  but
for the way pgsql works, windows and large shared buffers don't seem
to get along.

I'm done.  Use windows all you want.  I'll stick to unix.  It seems to
just work for pgsql.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Reasonable amount of indices

2007-09-06 Thread Alvaro Herrera
Patric wrote:

My question now is: Is it wise to do so, and create hundreds or maybe
thousands of indices which partition the table for the selections?

No, this is not helpful -- basically what you are doing is taking the
first level (the first couple of levels maybe) of the index out of it,
and charging the planner with the selection of which one is the best for
each query.

Of course, I am assuming you were simplifying your queries and don't
actually store the name of the continent etc. on every row.
Because if you actually do that, then there's your first opportunity for
actual optimization (which happens to be a more normalized model), far
more effective than the partial indexes you are suggesting.
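
As a minimal sketch of the normalization being described (all table and
column names here are invented for illustration, not taken from the
original poster's schema):

```sql
-- Hypothetical denormalized layout: the continent name repeated on every row.
--   CREATE TABLE places (place_id int, continent text, city text, ...);

-- Normalized alternative: factor the repeated label into a lookup table.
CREATE TABLE continents (
    continent_id serial PRIMARY KEY,
    name         text NOT NULL UNIQUE
);

CREATE TABLE places (
    place_id     serial PRIMARY KEY,
    continent_id int  NOT NULL REFERENCES continents,
    city         text NOT NULL
);

-- One ordinary index now serves every per-continent query, instead of
-- one partial index per continent:
CREATE INDEX places_continent_idx ON places (continent_id);

SELECT p.*
  FROM places p
  JOIN continents c ON c.continent_id = p.continent_id
 WHERE c.name = 'Europe';
```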

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Performance on 8CPU's and 32GB of RAM

2007-09-06 Thread Carlo Stonebanks
Wow - it's nice to hear someone say that... out loud.

Thanks, you gave me hope!

-Original Message-
From: James Mansion [mailto:[EMAIL PROTECTED] 
Sent: September 6, 2007 4:55 PM
To: Carlo Stonebanks
Cc: Scott Marlowe; Alvaro Herrera; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance on 8CPU's and 32GB of RAM

Carlo Stonebanks wrote:
 Isn't it just easier to assume that Windows Server can't do anything
right?
 ;-)

   
Well, avoiding the ;-) - people do, and it's remarkably foolish of them.  It's
a long-standing whinge that many people with a UNIX background seem to
just assume that Windows sucks, but you could run 40,000 sockets from a
single Win32 process for a while and some well-known UNIX systems would
still struggle to do this, libevent or no.  Admittedly, the way a Win32
app is architected would be rather different from a typical POSIX one.

Windows has been a cheap target, but it's remarkably adequate, and the
TPC results speak for themselves.






---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] How planner decides left-anchored LIKE can use index

2007-09-06 Thread Tom Lane
Carlo Stonebanks [EMAIL PROTECTED] writes:
 There is an index on lower(last_name). I have seen the planner convert the 
 LIKE to lower(last_name) >= 'smith' and lower(last_name) < 'smiti' on 8.2.4 
 systems, but a slow sequence scan and filter on 8.1.9 - is this related to 
 the version difference (8.1.9 vs 8.2.4) or is this related to something like 
 operator classes that have been installed?

Most likely you used C locale for the 8.2.4 installation and some other
locale for the other one.

In non-C locale you can still get the optimization if you create an
index using the text_pattern_ops opclass ... but beware that this index
is useless for the normal locale-aware operators.
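
For reference, the kind of index being described might look like this
(the table name "t" is assumed here; only lower(last_name) comes from the
thread):

```sql
-- In a non-C locale, a plain index on lower(last_name) cannot be used
-- for left-anchored LIKE.  An index built with text_pattern_ops can:
CREATE INDEX t_lower_last_name_pattern_idx
    ON t (lower(last_name) text_pattern_ops);

-- This query can now use the index; the planner rewrites the LIKE into
-- the >= 'smith' AND < 'smiti' range scan discussed above:
SELECT * FROM t WHERE lower(last_name) LIKE 'smith%';

-- Caveat: this index is not used by ordinary locale-aware comparisons
-- (e.g. ORDER BY lower(last_name)); keep a regular index as well if you
-- need those.
```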

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] SAN vs Internal Disks

2007-09-06 Thread Scott Marlowe
On 9/6/07, Joel Fradkin [EMAIL PROTECTED] wrote:
 I am not sure I agree with that evaluation.
 I only have 2 Dell database servers and they have been 100% reliable.
 Maybe he is referring to support, which does tend to be up to who you get.
 When I asked about performance on my new server they were very helpful, but I
 did have a bad time on my NAS device (but had the really cheap support plan
 on it). They did help me get it fixed, but I had to RMA all the drives on the
 NAS as they were all bad, and it was no fun installing the OS as it had no
 floppy. I got the better support for both the database servers, which are
 using JBOD from Dell for the disk array. The quad-proc Opteron with dual
 cores and 16 gig of memory has been extremely fast (like 70% faster) than my
 older 4-proc 32-bit single-core machine with 8 gig. But both are running
 Postgres and perform needed functionality. I would like to have redundant
 backups of these as they are mission critical, but all in good time.

Dell's OK if by support you mean replacing simple broken parts, etc...

I'm talking about issues like the one we had with our 26xx servers
which, thankfully, Dell hasn't made in a while.  We had the Adaptec
controllers that locked up once every 1 to 3 months for no reason, and
with 4 servers this meant a lockup of one every 1 to 2 weeks.  Not
acceptable in a production environment.  It took almost 2 years to get
Dell to agree to ship us the LSI based RAID controllers for those 4
machines.  Our account rep told us not to worry about returning the
RAID controllers as they were so old as to be obsolete.  One of the
four replacement controllers they sent us was bad, so they
cross-shipped another one.  Sometimes you get a broken part, it
happens.

One month after we got all our RAID controllers replaced, we started
getting nasty calls from their parts people wanting those parts back,
saying they were gonna charge us for them, etc...  Two thirds of the
parts were still in the server because we hadn't gotten a sheet
identifying them (the RAID key and battery) and had left them in, not
worrying about it because we'd been told they didn't need to be returned.

These are production servers, we can't just shut them off for fun to
pull a part we were told we didn't need to return.

On top of that, the firmware updates come as an .exe that has to be
put on a Windows floppy.  We don't have a single Windows machine with
a floppy at the company I work for (lots of laptops with Windows).  We
had to dig out a floppy drive and a Windows CD to create a bootable
floppy to install a firmware update for a Linux server.

So, my main complaint is about their customer service.  The machines,
when they work, are pretty good.  Normally the hardware gets along
with RH OSes.  But when things go wrong, Dell will not admit to a
design problem that's staring them in the face, and they have wasted
literally hundreds of man hours for us with their stonewalling on this
subject.  So, I see no reason to buy more hardware from them when
there are dealers big and small who have so far treated us much
better.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Reasonable amount of indices

2007-09-06 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Patric wrote:
 My question now is: Is it wise to do so, and create hundreds or maybe
 thousands of indices which partition the table for the selections.

 No, this is not helpful -- basically what you are doing is taking the
 first level (the first couple of levels maybe) of the index out of it,

Right --- you'd be *far* better off using a small number of multicolumn
indexes.  I wouldn't want to bet that the planner code scales
effectively to thousands of indexes, and even if it does, you're
throwing away any chance of using parameterized queries.

The update overhead is unpleasant to contemplate as well (or is this a
read-only table?)
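
As an illustration of the multicolumn approach (all names invented; this
assumes the geographic columns from earlier in the thread):

```sql
-- Instead of thousands of partial indexes along the lines of:
--   CREATE INDEX ... ON places (city)
--       WHERE continent = 'Europe' AND country = 'Germany';
-- a single multicolumn index covers every combination:
CREATE INDEX places_cont_country_city_idx
    ON places (continent, country, city);

-- And unlike the partial-index scheme, it works for parameterized
-- queries, since no constant has to match an index predicate:
SELECT * FROM places
 WHERE continent = $1 AND country = $2 AND city = $3;
```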

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Performance on 8CPU's and 32GB of RAM

2007-09-06 Thread Carlo Stonebanks

If what you mean is that pg has a design that's heavily oriented towards
things that tend to be cheap on POSIX and doesn't use the core Win32
features effectively, then let's track that as an optimisation opportunity
for the Win32 port.


Isn't it just easier to assume that Windows Server can't do anything right?
;-)



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Performance on 8CPU's and 32GB of RAM

2007-09-06 Thread James Mansion

Carlo Stonebanks wrote:

Isn't it just easier to assume that Windows Server can't do anything right?
;-)

  

Well, avoiding the ;-) - people do, and it's remarkably foolish of them.  It's
a long-standing whinge that many people with a UNIX background seem to
just assume that Windows sucks, but you could run 40,000 sockets from a
single Win32 process for a while and some well-known UNIX systems would
still struggle to do this, libevent or no.  Admittedly, the way a Win32
app is architected would be rather different from a typical POSIX one.

Windows has been a cheap target, but it's remarkably adequate, and the
TPC results speak for themselves.






---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] SAN vs Internal Disks

2007-09-06 Thread Greg Smith

On Thu, 6 Sep 2007, Harsh Azad wrote:


Firstly, I could only find postgres 8.1.x RPM for CentOS 5, could not find
any RPM for 8.2.4. Is there any 8.2.4 RPM for CentOS 5?


You've already been pointed in the right direction. Devrim, the person who 
handles this packaging, does a great job of building all the RPMs.  But I 
have a small philosophical difference with the suggested instructions for 
actually installing them.  I just finished an alternate 
installation guide for RHEL/CentOS 5 that's now posted at 
http://www.westnet.com/~gsmith/content/postgresql/pgrpm.htm ; you may want 
to take a look at it.


Secondly, would investing in Red Hat Enterprise Linux give any 
performance advantage? I know all the SAN boxes are only certified on 
RHEL and not CentOS. Or since CentOS is similar to RHEL it would be 
fine?


Wouldn't expect a performance advantage.  The situation you have to 
consider is this: your SAN starts having funky problems, and your 
database is down because of it.  You call the vendor.  They find out 
you're running CentOS instead of RHEL and say that's the cause of your 
problem (even though it probably isn't).  How much will such a 
passing-the-buck problem cost your company?  If it's a significant number, 
you'd be foolish to run CentOS instead of the real RHEL.  Some SAN vendors 
can be very, very picky about what they will support, and for most business 
environments the RHEL subscription isn't so expensive that it's worth 
wandering into an area where your support situation is fuzzy just to save 
that money.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Performance on 8CPU's and 32GB of RAM

2007-09-06 Thread Alvaro Herrera
James Mansion wrote:

 If what you mean is that pg has a design that's heavily oriented
 towards things that tend to be cheap on POSIX and doesn't use the core
 Win32 features effectively, then let's track that as an optimisation
 opportunity for the Win32 port.

Already done for 8.3 (actual performance improvements still to be
reported), but that doesn't help those poor users still on 8.2.

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
"The gods do not protect fools.  Fools receive protection from other,
better-endowed fools" (Louis Wu, Ringworld)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match