Re: [PERFORM] Performance on 8CPU's and 32GB of RAM
On 2007-09-05 Scott Marlowe wrote:
> On 9/5/07, Ansgar -59cobalt- Wiechers [EMAIL PROTECTED] wrote:
>> On 2007-09-05 Scott Marlowe wrote:
>>> On 9/5/07, Ansgar -59cobalt- Wiechers [EMAIL PROTECTED] wrote:
>>>> On 2007-09-05 Scott Marlowe wrote:
>>>>> And there's the issue that with Windows / NTFS, when one process opens a file for read, it locks it for all other users. This means that things like virus scanners can cause odd, unpredictable failures of your database.
>>>> Uh... what? Locking isn't done by the filesystem but by applications (which certainly can decide not to lock a file when opening it). And no one in his right mind would ever have a virus scanner access the files of a running database, regardless of operating system or filesystem.
>>> Exactly, the default is to lock the file. The application has to explicitly NOT lock it. It's the opposite of Linux.
>> Yes. So? It's still up to the application, and it still has nothing at all to do with the filesystem.
> And if you look at my original reply, you'll see that I said WINDOWS / NTFS. Not just NTFS. I.e., it's a Windows-ism.

I am aware of what you wrote. However, since the locking behaviour is exactly the same with Windows/FAT32 or Windows/%ANY_OTHER_FILESYSTEM%, your statement is still wrong.

Regards
Ansgar Wiechers
--
"The Mac OS X kernel should never panic because, when it does, it seriously inconveniences the user." --http://developer.apple.com/technotes/tn2004/tn2118.html

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
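Ansgar's point, that locking is requested by the opening application rather than enforced by the filesystem, can be illustrated with POSIX advisory locks. This is a Unix-side analogue offered purely as a sketch; on Windows the mechanism is the share mode the application passes to CreateFile, which this snippet does not use.

```python
# Sketch: application-level (advisory) locking on Unix. A second opener
# can still open the file; it is only blocked if it *also* asks for a
# conflicting lock. Nothing is enforced by the filesystem on its own.
import fcntl
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    path = tmp.name

f1 = open(path, "r+")
fcntl.flock(f1, fcntl.LOCK_EX)   # app 1 chooses to take an exclusive lock

f2 = open(path, "r+")            # app 2 opens the same file with no trouble
try:
    # ...but a competing, non-blocking lock request is refused
    fcntl.flock(f2, fcntl.LOCK_EX | fcntl.LOCK_NB)
    got_second_lock = True
except OSError:
    got_second_lock = False

print(got_second_lock)           # expected False on Linux
```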
[PERFORM] Hardware spec
Hi guys,

I have the rare opportunity to spec the hardware for a new database server. It's going to replace an older one, driving a social networking web application. The current server (a quad Opteron with 4GB of RAM and 80GB fast SCSI RAID 10) is coping with an average load ranging between 1.5 and 3.5. The new machine spec I have so far:

2 x Intel Xeon 2.33 GHz Dual Core Woodcrest processors
4 GB RAM
5x73 GB Ultra320 SCSI RAID 5 (288 GB storage)

I've heard that RAID 5 is not necessarily the best performer. Also, are there any special tricks when partitioning the file system?

Regards, Willo
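As background for the RAID 5 question: the usual argument is the small-write penalty. Each small random write on RAID 5 costs a read-modify-write of data plus parity, while RAID 10 just writes both mirror sides. The numbers below are a rough textbook model, not measurements of any particular controller (a battery-backed write-back cache changes the picture considerably):

```python
# Back-of-the-envelope small-random-write cost per logical write:
#   RAID 10: write data to both mirror sides            -> 2 disk I/Os
#   RAID 5:  read old data, read old parity,
#            write new data, write new parity           -> 4 disk I/Os
def disk_ios_for_writes(raid_level: str, logical_writes: int) -> int:
    penalty = {"raid10": 2, "raid5": 4}[raid_level]
    return logical_writes * penalty

for level in ("raid10", "raid5"):
    print(level, disk_ios_for_writes(level, 1000))
# raid10 2000
# raid5 4000
```

So for the same disk budget, a write-heavy workload gets roughly twice the random-write throughput out of RAID 10.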
Re: [PERFORM] Hardware spec
Willo van der Merwe wrote:
> Hi guys, I have the rare opportunity to spec the hardware for a new database server. It's going to replace an older one, driving a social networking web application. The current server (a quad Opteron with 4GB of RAM and 80GB fast SCSI RAID 10) is coping with an average load ranging between 1.5 and 3.5. The new machine spec I have so far:

What's the limiting factor on your current machine - disk, memory, CPU?

--
Richard Huxton
Archonet Ltd
Re: [PERFORM] Hardware spec
Richard Huxton wrote:
> Willo van der Merwe wrote:
>> Hi guys, I have the rare opportunity to spec the hardware for a new database server. It's going to replace an older one, driving a social networking web application. The current server (a quad Opteron with 4GB of RAM and 80GB fast SCSI RAID 10) is coping with an average load ranging between 1.5 and 3.5. The new machine spec I have so far:
> What's the limiting factor on your current machine - disk, memory, CPU?

I'm a bit embarrassed to admit that I'm not sure. The reason we're changing machines is that we might be changing ISPs, and we're renting / leasing the machines from the ISP.
Re: [PERFORM] Hardware spec
Willo van der Merwe wrote:
> Richard Huxton wrote:
>> What's the limiting factor on your current machine - disk, memory, CPU?
> I'm a bit embarrassed to admit that I'm not sure. The reason we're changing machines is that we might be changing ISPs, and we're renting / leasing the machines from the ISP.

Before you get rid of the current ISP, better examine what is going on with the present setup. It would be good to know if you are memory, processor, or IO limited. That way you could increase what needs to be increased, and not waste money where the bottleneck is not.

--
Jean-David Beyer          Registered Linux User 85642.
PGP-Key: 9A2FC99A         Registered Machine 241939.
Shrewsbury, New Jersey    http://counter.li.org
Re: [PERFORM] Hardware spec
Jean-David Beyer wrote:
> Before you get rid of the current ISP, better examine what is going on with the present setup. It would be good to know if you are memory, processor, or IO limited. That way you could increase what needs to be increased, and not waste money where the bottleneck is not.

Good advice. After running a vmstat and iostat, it is clear, to my mind anyway, that the most likely bottleneck is IO; next is probably some more RAM. Here's the output:

procs -----------memory---------- ---swap-- --io-- --system-- ----cpu----
 r b  swpd  free   buff    cache   si so bi bo  in cs us sy id wa
 0 0 29688 80908 128308 3315792     0  0  8636   8 17  2 80  1

avg-cpu:  %user  %nice  %sys  %iowait  %idle
          17.18   0.00  1.93     0.81  80.08

Device:    tps  Blk_read/s  Blk_wrtn/s   Blk_read   Blk_wrtn
sda      14.57       66.48      506.45   58557617  446072213
sda1      0.60        0.27        4.70     235122    4136128
sda2      0.38        0.77        2.27     678754    2002576
sda3      2.37        0.49       18.61     429171   16389960
sda4      0.00        0.00        0.00          2          0
sda5      0.71        0.66        5.46     578307    4807087
sda6      0.03        0.01        0.24       6300     214196
sda7      0.02        0.00        0.19       2622     165992
sda8     60.19       64.29      474.98   56626211  418356226
[PERFORM] SAN vs Internal Disks
Hi,

We are currently running our DB on a DualCore, Dual Proc 3.0 GHz Xeon, 8GB RAM, 4x SAS 146 GB 15K RPM on RAID 5. The current data size is about 50GB, but we want to purchase the hardware to scale to about 1TB, as we think our business will need to support that much soon.

- Currently we have 80% read and 20% write percentages.
- Currently with this configuration the database is showing signs of over-loading.
- Auto-vacuum, etc. run on this database; vacuum full runs nightly.
- Currently CPU loads are about 20%, memory utilization is full (but this is also due to Linux caching disk blocks) and IO waits are frequent.
- We have a load of about 400 queries per second.

Now we are considering purchasing our own servers, and in the process are facing the usual dilemmas. First I'll list out what machine we have decided to use:

2x Quad Xeon 2.4 GHz (4-way, only 2 populated right now)
32 GB RAM
OS-only storage - 2x SCSI 146 GB 15k RPM on RAID-1 (data storage mentioned below)

We have already decided to split our database into 3 machines on the basis of disjoint sets of data. So we will be purchasing three of these boxes.

HELP 1: Does something look wrong with the above configuration? I know there will be small differences b/w Opteron/Xeon. But do you think there is something against going for 2.4 GHz Quad Xeons (Clovertown, I think)?

HELP 2: The main confusion is with regards to data storage. We have the option of going for:

A: IBM N-3700 SAN box, having 12x FC 300GB disks, partitioned into 3 disks in RAID-4 for WAL/backup and 9 disks on RAID-DP for data, 2 hot spares. We are also considering a similar solution from EMC - CX310C.

B: Go for internal or DAS-based storage. Here for each server we should be able to have: 2x disks on RAID-1 for logs, 6x disks on RAID-10 for tablespace1 and 6x disks on RAID-10 for tablespace2. Or maybe 12x disks on RAID-10 as a single tablespace.

What do I think? Well... SAN wins on manageability, replication (say to a DR site), backup, etc. DAS wins on cost.

But for a moment keeping these aside, I wanted to discuss, purely on the performance side, which one is a winner? It feels like internal disks will perform better, but I need to understand a rough magnitude of difference in performance to see if it's worth losing the manageability features.

Also, if we choose to go with DAS, what would be the best tool to do async replication to a DR site, and maybe even, as an extra plus, a second read-only DB server to distribute select loads?

Regards, Azad
Re: [PERFORM] utilising multi-cpu/core machines?
On Wed, Sep 5, 2007 at 5:41 PM, in message [EMAIL PROTECTED], Thomas Finneid [EMAIL PROTECTED] wrote:
> how does pg utilise multi cpus/cores, i.e. does it use more than one core? and possibly, how, are there any documentation about this.

For portability reasons PostgreSQL doesn't use threads, per se, but spawns a new process for each connection, and a few for other purposes. Each process may be running on a separate CPU, but a single connection will only be using one -- directly, anyway. (The OS may well be using the other for I/O, etc.) For documentation, you could start with this:

http://www.postgresql.org/docs/8.2/interactive/app-postgres.html

-Kevin
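The process-per-connection model Kevin describes can be sketched as follows. This is purely an illustration of the architecture using Python's multiprocessing, not PostgreSQL code; all names here are made up for the example.

```python
# Illustration: a "postmaster" spawns one worker process per connection,
# so independent connections can run on different CPUs, while any single
# connection is served by exactly one process (one core at a time).
from multiprocessing import Process, Queue

def backend(conn_id, query, results):
    # Each backend is a separate OS process, like a PostgreSQL backend.
    results.put((conn_id, "executed: " + query))

def postmaster(connections):
    results = Queue()
    workers = [Process(target=backend, args=(cid, q, results))
               for cid, q in connections]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sorted(results.get() for _ in workers)

if __name__ == "__main__":
    print(postmaster([(1, "SELECT 1"), (2, "SELECT 2")]))
```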
Re: [PERFORM] Hardware spec
* Willo van der Merwe:

> Good advice. After running a vmstat and iostat, it is clear, to my mind anyway, that the most likely bottleneck is IO; next is probably some more RAM. Here's the output:
>
> procs -----------memory---------- ---swap-- --io-- --system-- ----cpu----
>  r b  swpd  free   buff    cache   si so bi bo  in cs us sy id wa
>  0 0 29688 80908 128308 3315792     0  0  8636   8 17  2 80  1

You need to run "vmstat 10" (for ten-second averages) and report a couple of lines.

> avg-cpu:  %user  %nice  %sys  %iowait  %idle
>           17.18   0.00  1.93     0.81  80.08

Same for iostat. Your initial numbers suggest that your server isn't I/O-bound, though (the percentage spent in iowait is much too small, and so are the tps numbers).

--
Florian Weimer  [EMAIL PROTECTED]
BFK edv-consulting GmbH  http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe  fax: +49-721-96201-99
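To make Florian's check concrete: the interesting columns in a "vmstat 10" data line are the last four (us, sy, id, wa). A small parsing sketch, assuming the classic procps column layout; the sample line is made up for the example, not taken from the thread:

```python
# Hedged sketch: extract the cpu columns from a procps vmstat data line.
# A high "wa" (iowait) percentage points at an I/O bottleneck; high "us"
# or "sy" points at the CPU instead.
def cpu_breakdown(vmstat_line: str) -> dict:
    us, sy, idle, wa = (int(tok) for tok in vmstat_line.split()[-4:])
    return {"user": us, "system": sy, "idle": idle, "iowait": wa}

sample = " 1 0 0 80908 128308 3315792 0 0 120 480 300 250 17 2 80 1"
print(cpu_breakdown(sample))
# {'user': 17, 'system': 2, 'idle': 80, 'iowait': 1}
```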
Re: [PERFORM] Hardware spec
* Willo van der Merwe:

> Florian Weimer wrote:
>> You need to run "vmstat 10" (for ten-second averages) and report a couple of lines.
>
>  5 0 61732 37052 28180 3431956 0 0 14 987 2320 2021 38
>
> sda3   3.30  0.00  26.40  0   264
> sda8  97.90  0.00 783.20  0  7832

These values don't look I/O bound to me. CPU usage is pretty low, too.

--
Florian Weimer  [EMAIL PROTECTED]
BFK edv-consulting GmbH  http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe  fax: +49-721-96201-99
Re: [PERFORM] [ADMIN] ADO -PostgreSQL OLE DB Provider
--- Jayaram Bhat [EMAIL PROTECTED] wrote:
> I am using a Postgres setup on Windows. It is working fine using the ODBC driver, but not with the ADO PostgreSQL OLE DB Provider, which gives the error: "Test connection failed because of an error in initializing provider. Unspecified error". Can I get help for the same?

This is my understanding regarding the OLE DB driver for PostgreSQL:
(1) The OLE DB Provider lacks the robustness that the ODBC driver has.
(2) The OLE DB Provider lacks much of the functionality that an OLE DB provider should have.
(3) And this probably will not change soon, since the OLE DB driver is not actively supported.

Regards, Richard Broersma Jr.
Re: [PERFORM] SAN vs Internal Disks
On 9/6/07, Harsh Azad [EMAIL PROTECTED] wrote:
> Hi, We are currently running our DB on a DualCore, Dual Proc 3.0 GHz Xeon, 8GB RAM, 4x SAS 146 GB 15K RPM on RAID 5. The current data size is about 50GB, but we want to purchase the hardware to scale to about 1TB, as we think our business will need to support that much soon.
> - Currently we have 80% read and 20% write percentages.

For this type of load, you should be running on RAID 10, not RAID 5. Or, if you must use RAID 5, use more disks and have a battery-backed caching RAID controller known to perform well with RAID 5 and large arrays.

> - Currently with this configuration the database is showing signs of over-loading.

On I/O or CPU? If you're running out of CPU, then look to increasing CPU horsepower and tuning PostgreSQL. If I/O, then you need to look into a faster I/O subsystem.

> - Auto-vacuum, etc. run on this database; vacuum full runs nightly.

Generally speaking, if you need to run vacuum fulls, you're doing something wrong. Is there a reason you're running vacuum full, or is this just precautionary? Vacuum full can bloat your indexes, so you shouldn't run it regularly. Reindexing might be a better choice if you do need to regularly shrink your db. The better option is to monitor your fsm usage and adjust fsm settings / autovacuum settings as necessary.

> - Currently CPU loads are about 20%, memory utilization is full (but this is also due to Linux caching disk blocks) and IO waits are frequent.
> - We have a load of about 400 queries per second.

What does vmstat et al. say about CPU versus I/O wait?

> Now we are considering purchasing our own servers, and in the process are facing the usual dilemmas. First I'll list out what machine we have decided to use: 2x Quad Xeon 2.4 GHz (4-way, only 2 populated right now), 32 GB RAM, OS-only storage - 2x SCSI 146 GB 15k RPM on RAID-1 (data storage mentioned below). We have already decided to split our database into 3 machines on the basis of disjoint sets of data. So we will be purchasing three of these boxes.
> HELP 1: Does something look wrong with the above configuration? I know there will be small differences b/w Opteron/Xeon. But do you think there is something against going for 2.4 GHz Quad Xeons (Clovertown, I think)?

Looks like good machines, plenty of memory.

> HELP 2: The main confusion is with regards to data storage. We have the option of going for:
> A: IBM N-3700 SAN box, having 12x FC 300GB disks, partitioned into 3 disks in RAID-4 for WAL/backup and 9 disks on RAID-DP for data, 2 hot spares. We are also considering a similar solution from EMC - CX310C.
> B: Go for internal or DAS-based storage. Here for each server we should be able to have: 2x disks on RAID-1 for logs, 6x disks on RAID-10 for tablespace1 and 6x disks on RAID-10 for tablespace2. Or maybe 12x disks on RAID-10 as a single tablespace.
> What do I think? Well... SAN wins on manageability, replication (say to a DR site), backup, etc. DAS wins on cost.

The problem with SAN is that it's apparently very easy to build a big expensive system that performs poorly. We've seen reports of such here on the lists a few times. I would definitely demand an evaluation period from your supplier to make sure it performs well if you go SAN.

> But for a moment keeping these aside, I wanted to discuss, purely on the performance side, which one is a winner? It feels like internal disks will perform better, but I need to understand a rough magnitude of difference in performance to see if it's worth losing the manageability features.

That really, really, really depends. The quality of RAID controllers for either setup is very important, as is the driver support, etc. All things being even, I'd lean towards the local storage.

> Also, if we choose to go with DAS, what would be the best tool to do async replication to a DR site, and maybe even, as an extra plus, a second read-only DB server to distribute select loads?

Look at Slony, or PITR with continuous recovery. Of those two, I've only used Slony in production, and I was very happy with its performance, and it was very easy to write a bash script to monitor the replication for failures.
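A sketch of the kind of replication monitor Scott mentions, in Python rather than bash for brevity. The lag query shown in the comment assumes a Slony-style sl_status view; the cluster schema name and the five-minute threshold are assumptions for the example, not details from the thread.

```python
# Hedged sketch: alert when replication lag exceeds a threshold. A real
# monitor would fetch the lag from the cluster, e.g. (hypothetical schema
# name "_mycluster"):
#   SELECT st_lag_time FROM _mycluster.sl_status;
# and page an operator whenever lag_alert(...) returns True.
from datetime import timedelta

def lag_alert(lag: timedelta, threshold: timedelta = timedelta(minutes=5)) -> bool:
    """True when the observed replication lag is past the allowed threshold."""
    return lag > threshold

print(lag_alert(timedelta(minutes=12)))   # True: lag well past the default
print(lag_alert(timedelta(seconds=30)))   # False: replica keeping up
```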
Re: [PERFORM] SAN vs Internal Disks
Thanks, Mark. If I replicate a snapshot of the data and log files (basically the entire PG data directory) and I maintain the same version of Postgres on both servers, it should work, right?

I am also thinking that having SAN storage will provide me with the facility of keeping a warm standby DB. By just shutting one server down and starting the other, mounting the same file system, I should be able to bring my DB up when the primary incurs a physical failure. I'm only considering SAN storage for this feature - has anyone ever used a SAN for replication and warm standby on Postgres?

Regards, Harsh

On 9/6/07, Mark Lewis [EMAIL PROTECTED] wrote:
> On Thu, 2007-09-06 at 18:05 +0530, Harsh Azad wrote:
>> [... original post quoted in full above ...]
> Sounds like a good candidate for Slony replication for backups / read-only slaves. I haven't seen a SAN yet whose DR / replication facilities are on par with a good database replication solution. My impression is that those facilities are mostly for file servers, mail servers, etc. It would be difficult for a SAN to properly replicate a database, given the strict ordering, size and consistency requirements for the data files. Not impossible, but in my limited experience I haven't found one that I trust to do it reliably either, vendor boastings to the contrary notwithstanding. (Hint: make sure you know exactly what your vendor's definition of the term 'snapshot' really means.)
> So before you invest in a SAN, make sure that you're actually going to be able to (and want to) use all the nice management features you're paying for. We have some SANs that are basically acting just as expensive external RAID arrays, because we do the database replication/backup in software anyway.
> -- Mark Lewis

--
Harsh Azad
===
[EMAIL PROTECTED]
Re: [PERFORM] SAN vs Internal Disks
On Thu, 2007-09-06 at 18:05 +0530, Harsh Azad wrote:
> Hi, We are currently running our DB on a DualCore, Dual Proc 3.0 GHz Xeon, 8GB RAM, 4x SAS 146 GB 15K RPM on RAID 5. The current data size is about 50GB, but we want to purchase the hardware to scale to about 1TB, as we think our business will need to support that much soon.
> [... full spec and HELP questions quoted as in the original post above ...]
> Also, if we choose to go with DAS, what would be the best tool to do async replication to a DR site, and maybe even, as an extra plus, a second read-only DB server to distribute select loads?

Sounds like a good candidate for Slony replication for backups / read-only slaves. I haven't seen a SAN yet whose DR / replication facilities are on par with a good database replication solution. My impression is that those facilities are mostly for file servers, mail servers, etc. It would be difficult for a SAN to properly replicate a database, given the strict ordering, size and consistency requirements for the data files. Not impossible, but in my limited experience I haven't found one that I trust to do it reliably either, vendor boastings to the contrary notwithstanding. (Hint: make sure you know exactly what your vendor's definition of the term 'snapshot' really means.)

So before you invest in a SAN, make sure that you're actually going to be able to (and want to) use all the nice management features you're paying for. We have some SANs that are basically acting just as expensive external RAID arrays, because we do the database replication/backup in software anyway.

-- Mark Lewis
Re: [PERFORM] SAN vs Internal Disks
Thanks, Scott. We have now requested IBM/EMC to provide test machines.

Interestingly, since you mentioned the importance of RAID controllers and drivers: we are planning to use CentOS 5 for hosting the DB. Firstly, I could only find a Postgres 8.1.x RPM for CentOS 5; I could not find any RPM for 8.2.4. Is there an 8.2.4 RPM for CentOS 5? Secondly, would investing in Red Hat Enterprise edition give any performance advantage? I know all the SAN boxes are only certified on RHEL and not CentOS. Or, since CentOS is similar to RHEL, would it be fine?

Regards, Harsh

On 9/6/07, Scott Marlowe [EMAIL PROTECTED] wrote:
> On 9/6/07, Harsh Azad [EMAIL PROTECTED] wrote:
>> [... original post quoted in full above ...]
> For this type of load, you should be running on RAID 10, not RAID 5. Or, if you must use RAID 5, use more disks and have a battery-backed caching RAID controller known to perform well with RAID 5 and large arrays.
> [... remainder of Scott's reply quoted in full above ...]

--
Harsh Azad
===
[EMAIL PROTECTED]
Re: [PERFORM] SAN vs Internal Disks
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Harsh Azad wrote: Thanks Scott, we have now requested IBM/EMC to provide test machines. Interestingly since you mentioned the importance of Raid controllers and the drivers; we are planning to use Cent OS 5 for hosting the DB. Firstly, I could only find postgres 8.1.x RPM for CentOS 5, could not find any RPM for 8.2.4. Is there any 8.2.4 RPM for CentOS 5? Look under the RHEL section of ftp.postgresql.org Joshua D. Drake Secondly, would investing into Redhat enterprise edition give any performance advantage? I know all the SAN boxes are only certified on RHEL and not CentOS. Or since CentOS is similar to RHEL it would be fine? Regards, Harsh On 9/6/07, Scott Marlowe [EMAIL PROTECTED] wrote: On 9/6/07, Harsh Azad [EMAIL PROTECTED] wrote: Hi, We are currently running our DB on a DualCore, Dual Proc 3.Ghz Xeon, 8GB RAM, 4x SAS 146 GB 15K RPM on RAID 5. The current data size is about 50GB, but we want to purchase the hardware to scale to about 1TB as we think our business will need to support that much soon. - Currently we have a 80% read and 20% write percentages. For this type load, you should be running on RAID10 not RAID5. Or, if you must use RAID 5, use more disks and have a battery backed caching RAID controller known to perform well with RAID5 and large arrays. - Currently with this configuration the Database is showing signs of over-loading. On I/O or CPU? If you're running out of CPU, then look to increasing CPU horsepower and tuning postgresql. If I/O then you need to look into a faster I/O subsystem. - Auto-vaccum, etc run on this database, vaccum full runs nightly. Generally speaking, if you need to run vacuum fulls, you're doing something wrong. Is there a reason you're running vacuum full or is this just precautionary. vacuum full can bloat your indexes, so you shouldn't run it regularly. reindexing might be a better choice if you do need to regularly shrink your db. 
The better option is to monitor your fsm usage and adjust fsm settings / autovacuum settings as necessary. - Currently CPU loads are about 20%, memory utilization is full (but this is also due to linux caching disk blocks) and IO waits are frequent. - We have a load of about 400 queries per second What does vmstat et al. say about CPU versus I/O wait? Now we are considering purchasing our own servers and in the process are facing the usual dilemmas. First I'll list out what machine we have decided to use: 2x Quad Xeon 2.4 Ghz (4-way only 2 populated right now) 32 GB RAM OS Only storage - 2x SCSI 146 GB 15k RPM on RAID-1 (Data Storage mentioned below) We have already decided to split our database into 3 machines on the basis of disjoint sets of data. So we will be purchasing three of these boxes. HELP 1: Does something look wrong with the above configuration? I know there will be small differences b/w opteron/xeon. But do you think there is something against going for 2.4Ghz Quad Xeons (Clovertown I think)? Looks like good machines, plenty of memory. HELP 2: The main confusion is with regards to Data Storage. We have the option of going for: A: IBM N-3700 SAN Box, having 12x FC 300GB disks, partitioned into 3 disks in RAID-4 for WAL/backup, and 9 disks in RAID-DP for data, 2 hot spare. We are also considering a similar solution from EMC - CX310C. B: Go for internal or DAS based storage. Here for each server we should be able to have: 2x disks on RAID-1 for logs, 6x disks on RAID-10 for tablespace1 and 6x disks on RAID-10 for tablespace2. Or maybe 12x disks on RAID-10 single table-space. What do I think? Well.. SAN wins on manageability, replication (say to a DR site), backup, etc... DAS wins on cost. The problem with SAN is that it's apparently very easy to build a big expensive system that performs poorly. We've seen reports of such here on the lists a few times.
I would definitely demand an evaluation period from your supplier to make sure it performs well if you go SAN. But for a moment keeping these aside, I wanted to discuss, purely on the performance side, which one is a winner? It feels like internal disks will perform better, but I need to understand a rough magnitude of difference in performance to see if it's worth losing the manageability features. That really really really depends. The quality of RAID controllers for either setup is very important, as is the driver support, etc... All things being even, I'd lean towards the local storage. Also if we choose to go with DAS, what would be the best tool to do async replication to a DR site and maybe even as an extra plus a second read-only DB server to distribute select loads. Look at slony, or PITR with continuous recovery. Of those two, I've only used Slony in production, and I was very happy with its performance, and it was very easy to write a bash script to monitor the replication for failures.
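To make the fsm advice from earlier in this thread concrete, here is a hedged sketch of how this was typically checked on 8.1/8.2-era servers (the table name and the settings shown are illustrative placeholders, not recommendations):

```sql
-- A database-wide VACUUM VERBOSE ends with lines reporting how many
-- free-space-map page slots are in use versus how many are needed;
-- if "needed" exceeds max_fsm_pages, dead space is being leaked.
VACUUM VERBOSE;

-- If so, raise the limits in postgresql.conf (illustrative values;
-- changing them requires a server restart on 8.x):
--   max_fsm_pages = 500000
--   max_fsm_relations = 2000

-- A plain vacuum plus an occasional reindex avoids the index bloat
-- that a nightly VACUUM FULL causes ("mytable" is a hypothetical name):
VACUUM ANALYZE mytable;
REINDEX TABLE mytable;
```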
Re: [PERFORM] SAN vs Internal Disks
On 6-9-2007 14:35 Harsh Azad wrote: 2x Quad Xeon 2.4 Ghz (4-way only 2 populated right now) I don't understand this sentence. You seem to imply you might be able to fit more processors in your system? Currently the only Quad Cores you can buy are dual-processor processors, unless you already got a quote for a system that yields the new Intel Tigerton processors. I.e. if they are Clovertowns they are indeed Intel Core-architecture processors, but you won't be able to fit more than 2 in the system and get 8 cores in a system. If they are Tigerton, I'm a bit surprised you got a quote for that, although HP seems to offer a system for those. If they are the old dual-core MP's (70xx or 71xx), you don't want those... 32 GB RAM OS Only storage - 2x SCSI 146 GB 15k RPM on RAID-1 (Data Storage mentioned below) I doubt you need 15k-rpm drives for OS... But that won't matter much on the total cost. HELP 1: Does something look wrong with above configuration, I know there will be small differences b/w opteron/xeon. But do you think there is something against going for 2.4Ghz Quad Xeons (Clovertown I think)? Apart from your implication that you may be able to stick more processors in it: no, not to me. Two Quad Core Xeons were even faster than 8 dual-core opterons in our benchmarks, although that might also indicate limited OS-, postgres- or underlying I/O-scaling. Obviously the new AMD Barcelona line of processors (coming next week or so) and the new Intel Quad Cores DP (Penryn?) and MP (Tigerton) may be interesting to look at; I don't know how soon systems will be available with those processors (HP seems to offer a Tigerton server). B: Go for internal or DAS based storage. Here for each server we should be able to have: 2x disks on RAID-1 for logs, 6x disks on RAID-10 for tablespace1 and 6x disks on RAID-10 for tablespace2. Or maybe 12x disks on RAID-10 single table-space.
You don't necessarily need to use internal disks for DAS, since you can also link an external SAS enclosure either with or without an integrated raid controller (IBM, Sun, Dell, HP and others have options for that), and those are able to be expanded to either multiple enclosures tied to each other or to a controller in the server. Those may also be usable in a warm-standby scenario and may be quite a bit cheaper than FC hardware. But for a moment keeping these aside, I wanted to discuss, purely on the performance side, which one is a winner? It feels like internal disks will perform better, but I need to understand a rough magnitude of difference in performance to see if it's worth losing the manageability features. As said, you don't necessarily need real internal disks, since SAS can be used with external enclosures as well, still being DAS. I have no idea what difference you will or may see between those in terms of performance. It probably largely depends on the raid controller available; afaik the disks will be mostly the same. And it might depend on your available bandwidth; external SAS offers you a 4-port connection allowing for a 12Gbit connection between a disk enclosure and a controller. While - as I understand it - even expensive SAN controllers only offer dual-ported, 8Gbit connections? What's more important is probably the amount of disks and raid cache you can buy in the SAN vs DAS scenario. If you can buy 24 disks when going for DAS vs only 12 with SAN... But then again, I'm no real storage expert, we only have two Dell MD1000 DAS units at our site. Best regards and good luck, Arjen ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] SAN vs Internal Disks
On 9/6/07, Harsh Azad [EMAIL PROTECTED] wrote: Thanks Scott, we have now requested IBM/EMC to provide test machines. Interestingly since you mentioned the importance of Raid controllers and the drivers; we are planning to use Cent OS 5 for hosting the DB. What RAID controllers have you looked at. Seems the two most popular in terms of performance here have been Areca and 3Ware / Escalade. LSI seems to come in a pretty close third. Adaptec is to be avoided as are cheap RAID controllers (i.e. promise etc...) battery backed cache is a must, and the bigger the better. Firstly, I could only find postgres 8.1.x RPM for CentOS 5, could not find any RPM for 8.2.4. Is there any 8.2.4 RPM for CentOS 5? Secondly, would investing into Redhat enterprise edition give any performance advantage? I know all the SAN boxes are only certified on RHEL and not CentOS. Or since CentOS is similar to RHEL it would be fine? for all intents and purposes, CentOS and RHEL are the same OS, so any pgsql rpm for one should pretty much work for the other. At the worst, you might have to get a srpm and rebuild it for CentOS / White Box. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] SAN vs Internal Disks
Hi, How about the Dell Perc 5/i card, 512MB battery backed cache or IBM ServeRAID-8k Adapter? I hope I am sending relevant information here, I am not too well versed with RAID controllers. Regards, Harsh On 9/6/07, Scott Marlowe [EMAIL PROTECTED] wrote: On 9/6/07, Harsh Azad [EMAIL PROTECTED] wrote: Thanks Scott, we have now requested IBM/EMC to provide test machines. Interestingly since you mentioned the importance of Raid controllers and the drivers; we are planning to use Cent OS 5 for hosting the DB. What RAID controllers have you looked at. Seems the two most popular in terms of performance here have been Areca and 3Ware / Escalade. LSI seems to come in a pretty close third. Adaptec is to be avoided as are cheap RAID controllers (i.e. promise etc...) battery backed cache is a must, and the bigger the better. Firstly, I could only find postgres 8.1.x RPM for CentOS 5, could not find any RPM for 8.2.4. Is there any 8.2.4 RPM for CentOS 5? Secondly, would investing into Redhat enterprise edition give any performance advantage? I know all the SAN boxes are only certified on RHEL and not CentOS. Or since CentOS is similar to RHEL it would be fine? for all intents and purposes, CentOS and RHEL are the same OS, so any pgsql rpm for one should pretty much work for the other. At the worst, you might have to get a srpm and rebuild it for CentOS / White Box. -- Harsh Azad === [EMAIL PROTECTED]
Re: [PERFORM] Hardware spec]
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Willo van der Merwe wrote: Jean-David Beyer wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Willo van der Merwe wrote: Richard Huxton wrote: Willo van der Merwe wrote: Hi guys, I'm have the rare opportunity to spec the hardware for a new database server. It's going to replace an older one, driving a social networking web application. The current server (a quad opteron with 4Gb of RAM and 80Gb fast SCSI RAID10) is coping with an average load of ranging between 1.5 and 3.5. The new machine spec I have so far: What's the limiting factor on your current machine - disk, memory, cpup? I'm a bit embarrassed to admit that I'm not sure. The reason we're changing machines is that we might be changing ISPs and we're renting / leasing the machines from the ISP. Before you get rid of the current ISP, better examine what is going on with the present setup. It would be good to know if you are memory, processor, or IO limited. That way you could increase what needs to be increased, and not waste money where the bottleneck is not. Good advice. After running a vmstat and iostat, it is clear, to my mind anyway, that the most likely bottleneck is IO, next is probably some more RAM. Here's the output: procs ---memory-- ---swap-- -io --system-- cpu r b swpd free buff cache si sobibo incs us sy id wa 0 0 29688 80908 128308 331579200 8636 8 17 2 80 1 avg-cpu: %user %nice%sys %iowait %idle 17.180.001.930.81 80.08 Device:tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 14.5766.48 506.45 58557617 446072213 sda1 0.60 0.27 4.70 2351224136128 sda2 0.38 0.77 2.27 6787542002576 sda3 2.37 0.4918.61 429171 16389960 sda4 0.00 0.00 0.00 2 0 sda5 0.71 0.66 5.46 5783074807087 sda6 0.03 0.01 0.24 6300 214196 sda7 0.02 0.00 0.19 2622 165992 sda8 60.1964.29 474.98 56626211 418356226 1.) If this is when the system is heavily loaded, you have more capacity than you need, on the average. Processors are idle, not in wait state. 
You have enough memory (no swapping going on), disks not too busy (probably). But if it is not heavily loaded, run these when it is. 2.) Did you let vmstat and iostat run just once, or are these the last of several reports. Because these tell the average since boot for the first report of each when they run the first time. If so, the numbers may not mean much. Here is iostat for my machine that is running right now. Only lines involved in postgreSQL are shown: Device:tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda8409.33 0.67 3274.00 40 196440 sdb7 0.12 0.00 0.93 0 56 sdc1 44.73 0.13 357.73 8 21464 sdd1 23.43 0.00 187.47 0 11248 sde1133.45 0.00 1067.60 0 64056 sdf1 78.25 0.00 626.00 0 37560 On sda8 is the Write-Ahead-Log. on sdb7 are some small seldom-used relations, but also the input files that sda and sdb are what my system uses for other stuff as well, though sda is not too heavily used and sdb even less. I am presently loading into the database. sdc1, sdd1, sde1, and sdf1 are the drives reserved for database only. I would suggest getting at least two hard drives on the new system and perhaps putting the WAL on one and the rest on the other. My sda and sdb drives are around 72 GBytes and the sdc, sdd, sde, and sdf are about 18 GBytes. I believe the more spindles the better (within reason) and dividing the stuff up with the indices separate from the associated data to reduce seeking. $ vmstat 30 procs ---memory-- ---swap-- -io --system-- - -cpu-- r b swpd free buff cache si sobibo incs us sy id wa 5 1 1340 248456 248496 66526720028 149 2 1 94 2 3 0 4 2 1340 244240 248948 665636400 0 2670 1248 11320 80 4 14 2 4 1 1340 241008 249492 665986400 0 2701 1222 11432 82 4 12 2 5 0 1340 246268 249868 665364400 0 2799 1223 11412 83 4 12 2 My machine was idle most of the time since boot (other than running BOINC stuff), but is busy loading data into the database at the moment. See how the first line differs from the others? 
I have 8GBytes RAM on this 32-bit machine running Red Hat Enterprise Linux 5. - -- .~. Jean-David Beyer
Re: [PERFORM] SAN vs Internal Disks
On Thu, 2007-09-06 at 22:28 +0530, Harsh Azad wrote: Thanks Mark. If I replicate a snapshot of data and log files (basically the entire PG data directory) and I maintain the same version of postgres on both servers, it should work right? I am also thinking that having SAN storage will provide me with the facility of keeping a warm standby DB. By just shutting one server down and starting the other mounting the same file system I should be able to bring my DB up when the primary incurs a physical failure. I'm only considering SAN storage for this feature - has anyone ever used SAN for replication and warm standby on Postgres? Regards, Harsh We used to use a SAN for warm standby of a database, but with Oracle and not PG. It worked kinda sorta, except for occasional crashes due to buggy drivers. But after going through the exercise, we realized that we hadn't gained anything over just doing master/slave replication between two servers, except that it was more expensive, had a tendency to expose buggy drivers, had a single point of failure in the SAN array, failover took longer and we couldn't use the warm standby server to perform read-only queries. So we reverted back and just used the SAN as expensive DAS and set up a separate box for DB replication. So if that's the only reason you're considering a SAN, then I'd advise you to spend the extra money on more DAS disks. Maybe I'm jaded by past experiences, but the only real use case I can see to justify a SAN for a database would be something like Oracle RAC, but I'm not aware of any PG equivalent to that. -- Mark Lewis ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
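For comparison with the shared-storage failover discussed above, a minimal sketch of the WAL-shipping warm standby ("PITR with continuous recovery") that needs no SAN at all; the hostname, paths, and the wait_for_wal.sh helper script are placeholders I am assuming, not a tested recipe:

```conf
# postgresql.conf on the primary: copy each completed 16MB WAL segment
# to the standby as soon as it is finished
archive_command = 'rsync -a %p standby:/var/lib/pgsql/walarchive/%f'

# recovery.conf on the standby: keep replaying segments as they arrive;
# the helper script is expected to loop until the requested file exists
restore_command = '/usr/local/bin/wait_for_wal.sh /var/lib/pgsql/walarchive/%f %p'
```

The standby stays in continuous recovery until the helper is told to stop, at which point it comes up as the new primary - so there is no single shared array to fail, unlike the SAN approach.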
Re: [PERFORM] SAN vs Internal Disks
On 9/6/07, Harsh Azad [EMAIL PROTECTED] wrote: Hi, How about the Dell Perc 5/i card, 512MB battery backed cache or IBM ServeRAID-8k Adapter? All Dell Percs have so far been based on either Adaptec or LSI controllers, and have ranged from really bad to fairly decent performers. There were some recent posts on this list where someone was benchmarking one, I believe. Searching the list archives might prove useful. I am not at all familiar with IBM's ServeRAID controllers. Do either of these come with or have the option for a battery backup module for the cache? I hope I am sending relevant information here, I am not too well versed with RAID controllers. Yep. Def look for a chance to evaluate whichever ones you're considering. The Areca's are in the same price range as the IBM controller you're considering, maybe a few hundred dollars more. See if you can get one for review while looking at these other controllers. I'd recommend against Dell unless you're at a company that orders computers by the hundred lot. My experience with Dell has been that unless you are a big customer you're just another number (a small one at that) on a spreadsheet. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Hardware spec
Florian Weimer wrote: You need to run vmstat 10 (for ten-second averages) and report a couple of lines. procs ---memory-- ---swap-- -io --system-- cpu r b swpd free buff cache si sobibo incs us sy id wa 1 0 61732 47388 27908 34313440010651 4 17 2 80 1 5 0 61732 37052 28180 34319560014 987 2320 2021 38 4 56 2 1 0 61620 43076 28356 343225600 0 367 1691 1321 28 3 67 1 3 0 61620 37620 28484 343274000 0 580 4088 6792 40 5 54 1 2 0 61596 33716 28748 34335200024 415 2087 1890 44 4 49 2 3 0 61592 45300 28904 34162003061 403 2282 2154 41 4 54 1 7 0 61592 30172 29092 34169640019 358 2779 3478 31 6 63 1 1 0 61580 62948 29180 34173686027 312 3632 4396 38 4 57 1 1 0 61444 62388 29400 341796400 6 354 2163 1918 31 4 64 1 2 0 61444 53988 29648 341798800 0 553 2095 1687 33 3 63 1 1 0 61444 63988 29832 341834800 6 352 1767 1424 22 3 73 1 1 1 61444 51148 30052 34191480050 349 1524 834 22 3 74 2 1 0 61432 53460 30524 341957270 7 868 4434 6706 43 6 49 2 1 0 61432 58668 30628 342014800 0 284 1785 1628 27 3 69 1 iostat sda8 is the where the pg_data resides, sda3 is /var/log avg-cpu: %user %nice%sys %iowait %idle 17.360.001.960.82 79.86 Device:tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda3 2.38 0.4918.71 432395 16672800 sda8 62.3474.46 491.74 6634 438143794 avg-cpu: %user %nice%sys %iowait %idle 30.500.003.571.70 64.22 Device:tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda3 5.60 0.0044.80 0448 sda8120.20 134.40 956.00 1344 9560 avg-cpu: %user %nice%sys %iowait %idle 20.680.003.431.35 74.54 Device:tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda3 3.30 0.0026.40 0264 sda8 97.90 0.00 783.20 0 7832 avg-cpu: %user %nice%sys %iowait %idle 22.310.002.750.68 74.27 Device:tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda3 2.10 0.0016.78 0168 sda8 60.34 0.80 481.92 8 4824 avg-cpu: %user %nice%sys %iowait %idle 11.650.001.601.03 85.72 Device:tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda3 1.70 0.0013.61 0136 sda8 59.36 0.00 474.87 0 4744 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] SAN vs Internal Disks
Scott Marlowe wrote: On 9/6/07, Harsh Azad [EMAIL PROTECTED] wrote: Hi, How about the Dell Perc 5/i card, 512MB battery backed cache or IBM ServeRAID-8k Adapter? All Dell Percs have so far been based on either adaptec or LSI controllers, and have ranged from really bad to fairly decent performers. There were some recent posts on this list where someone was benchmarking one, I believe. searching the list archives might prove useful. I am not at all familiar with IBM's ServeRAID controllers. Do either of these come with or have the option for battery back module for the cache? I hope I am sending relevant information here, I am not too well versed with RAID controllers. Yep. Def look for a chance to evaluate whichever ones you're considering. The Areca's are in the same price range as the IBM controller you're considering, maybe a few hundred dollars more. See if you can get one for review while looking at these other controllers. I'd recommend against Dell unless you're at a company that orders computers by the hundred lot. My experience with Dell has been that unless you are a big customer you're just another number (a small one at that) on a spreadsheet. If you do go with Dell get connected with an account manager instead of ordering online. You work with the same people every time you have an order and in my experience they can noticeably beat the best prices I can find. This is definitely the way to go if you don't want to get lost in the volume. The group I have worked with for the past ~2 years is very responsive, remembers me and my company across the 3 - 6 month gaps between purchases, and the server/storage person in the group is reasonably knowledgeable and helpful. This is for small lots of machines, our first order was just 2 boxes and i've only placed 4 orders total in the past 2 years. Just my personal experience, i'd be happy to pass along the account manager's information if anyone is interested. 
Joe Uhl [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] SAN vs Internal Disks
I am not sure I agree with that evaluation. I only have 2 Dell database servers and they have been 100% reliable. Maybe he is referring to support, which does tend to be up to who you get. When I asked about performance on my new server they were very helpful, but I did have a bad time on my NAS device (but had the really cheap support plan on it). They did help me get it fixed, but I had to RMA all the drives on the NAS as they were all bad and it was no fun installing the OS as it had no floppy. I got the better support for both the database servers, which are using JBOD from Dell for the disk array. The quad proc opteron with dual cores and 16 gig of memory has been extremely fast (like 70%) over my older 4 proc 32 bit single core machine with 8 gig. But both are running postgres and perform needed functionality. I would like to have redundant backups of these as they are mission critical, but all in good time. I'd recommend against Dell unless you're at a company that orders computers by the hundred lot. My experience with Dell has been that unless you are a big customer you're just another number (a small one at that) on a spreadsheet. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] SAN vs Internal Disks
On 6-9-2007 20:42 Scott Marlowe wrote: On 9/6/07, Harsh Azad [EMAIL PROTECTED] wrote: Hi, How about the Dell Perc 5/i card, 512MB battery backed cache or IBM ServeRAID-8k Adapter? All Dell Percs have so far been based on either adaptec or LSI controllers, and have ranged from really bad to fairly decent performers. There were some recent posts on this list where someone was benchmarking one, I believe. searching the list archives might prove useful. The Dell PERC5-cards are based on LSI-chips and perform quite well. Afaik Dell hasn't used adaptecs for a while now, but even recent (non-cheap ;) ) adaptecs aren't that bad afaik. The disadvantage of using Areca or 3Ware is obviously the lack of support in A-brand servers and the lack of support for SAS-disks. Only recently Areca has stepped in the SAS-market, but I have no idea how easily those controllers are integrated in standard servers (they tend to be quite large, which will not fit in 2U and maybe not even in 3U or 4U-servers). Arjen ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] SAN vs Internal Disks
On 6-9-2007 20:29 Mark Lewis wrote: Maybe I'm jaded by past experiences, but the only real use case I can see to justify a SAN for a database would be something like Oracle RAC, but I'm not aware of any PG equivalent to that. PG Cluster II seems to be able to do that, but I don't know whether that's production quality already... Arjen ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] Reasonable amount of indices
Hi, I've a question about the amount of indices. I'll explain my issue based on an example: a table which contains person information, one row per person. There will be lots of SELECTs doing search by special criteria, e.g. age, gender. Now there will be 4 user groups which will select on the table: Group 1) Always doing reads on specific continents. Group 2) Always doing reads in a specific country. Group 3) Always doing reads in a specific region within a country. Group 4) Always doing reads in a specific city. I'm indexing the important attributes. Would be about 5 to 6 independent indexes. As there will be millions of rows, quite a lot of hits will be returned; I guess it will generate big bitmaps to calculate the intersection of the indices. OK, to prevent this from happening I wanted to create 4 indexes per attribute, with a special predicate, so users which only query for a country don't scan an index which indexed the entire globe: e.g. CREATE INDEX BlaBla_city ON tblusers(dtage) WHERE dtcity='London'; CREATE INDEX BlaBla_country ON tblusers(dtage) WHERE dtcountry='uk'; CREATE INDEX BlaBla_continent ON tblusers(dtage) WHERE dtcontinent='europe'; etc. SELECT * FROM tblusers WHERE dtcontinent='europe' and age='23' would then lead postgres to use the special index made for europe. Now that I've 4 indexes, an insert or update will lead to some more overhead, but that would be OK. My question now is: Is it wise to do so, and create hundreds or maybe thousands of indices which partition the table for the selections? Does postgres scale well on the selection of indices, or is the predicate for indices not laid out for such a usage? (PS: Don't want to partition with the method postgres offers..) thanks in advance, patric ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Performance on 8CPU's and 32GB of RAM
Scott Marlowe wrote: And there's the issue that with windows / NTFS that when one process opens a file for read, it locks it for all other users. This means that things like virus scanners can cause odd, unpredictable failures of your database. Can you provide some justification for this? James ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Performance on 8CPU's and 32GB of RAM
Scott Marlowe wrote: Where unixes generally outperform windows is in starting up new backends, better file systems, and handling very large shared_buffer settings. Why do you think that UNIX systems are better at handling large shared buffers than Windows? 32 bit Windows systems can suffer from fragmented address space, to be sure, but if the performance of the operating-system supplied mutex or semaphore isn't good enough, you can just use the raw atomic ops. If what you mean is that pg has a design that's heavily oriented towards things that tend to be cheap on POSIX and doesn't use the core Win32 features effectively, then let's track that as an optimisation opportunity for the Win32 port. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Performance on 8CPU's and 32GB of RAM
On 9/6/07, James Mansion [EMAIL PROTECTED] wrote: Scott Marlowe wrote: And there's the issue that with windows / NTFS that when one process opens a file for read, it locks it for all other users. This means that things like virus scanners can cause odd, unpredictable failures of your database. Can you provide some justification for this? Seeing as I didn't write Windows or any of the plethora of anti-virus software, no I really can't. It's unforgivable behaviour. Can I provide evidence that it happens? Just read the archives of this list for the evidence. I've seen it often enough to know that most anti-virus software seems to open files in exclusive mode and cause problems for postgresql, among other apps. Why do you think that UNIX systems are better at handling large shared buffers than Wndows? Because we've seen lots of problems with large shared buffers on windows here. Now, maybe for a windows specific app it's all fine and dandy. but for the way pgsql works, windows and large shared buffers don't seem to get along. I'm done. Use windows all you want. I'll stick to unix. It seems to just work for pgsql. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Reasonable amount of indices
Patric wrote: My Question now is: Is it wise to do so, and create hundreds or maybe thousands of Indices which partition the table for the selections. No, this is not helpful -- basically what you are doing is taking the first level (the first couple of levels maybe) of the index out of it, and charging the planner with the selection of which one is the best for each query. Of course, I am assuming you were simplifying your queries and don't actually store the name of the continent etc. on each and every row. Because if you actually do that, then there's your first opportunity for actual optimization (which happens to be a more normalized model), far more effective than the partial indexes you are suggesting. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Performance on 8CPU's and 32GB of RAM
Wow - it's nice to hear someone say that... out loud. Thanks, you gave me hope! -Original Message- From: James Mansion [mailto:[EMAIL PROTECTED] Sent: September 6, 2007 4:55 PM To: Carlo Stonebanks Cc: Scott Marlowe; Alvaro Herrera; pgsql-performance@postgresql.org Subject: Re: [PERFORM] Performance on 8CPU's and 32GB of RAM Carlo Stonebanks wrote: Isn't it just easier to assume that Windows Server can't do anything right? ;-) Well, avoiding the ;-) - people do, and it's remarkably foolish of them. It's a long-standing whinge that many people with a UNIX background seem to just assume that Windows sucks, but you could run 40,000 sockets from a single Win32 process for a while and some well-known UNIX systems would still struggle to do this, libevent or no. Admittedly, the way a Win32 app is architected would be rather different from a typical POSIX one. Windows has been a cheap target but it's remarkably adequate and the TPC results speak for themselves. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] How planner decides left-anchored LIKE can use index
Carlo Stonebanks [EMAIL PROTECTED] writes: There is an index on lower(last_name). I have seen the planner convert the LIKE to lower(last_name) >= 'smith' and lower(last_name) < 'smiti' on 8.2.4 systems, but a slow sequence scan and filter on 8.1.9 - is this related to the version difference (8.1.9 vs 8.2.4) or is this related to something like operators/classes that have been installed? Most likely you used C locale for the 8.2.4 installation and some other locale for the other one. In non-C locale you can still get the optimization if you create an index using the text_pattern_ops opclass ... but beware that this index is useless for the normal locale-aware operators. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
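A hedged sketch of the index Tom describes, against a hypothetical people table:

```sql
-- In a non-C locale, a plain index on lower(last_name) cannot serve a
-- left-anchored LIKE; one declared with text_pattern_ops can:
CREATE INDEX people_lastname_pattern
    ON people (lower(last_name) text_pattern_ops);

-- This query can now be planned as an index range scan:
SELECT * FROM people WHERE lower(last_name) LIKE 'smith%';

-- ...but a text_pattern_ops index is useless for ordinary locale-aware
-- comparison and sorting, so those need a second, regular index:
CREATE INDEX people_lastname ON people (lower(last_name));
```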
Re: [PERFORM] SAN vs Internal Disks
On 9/6/07, Joel Fradkin [EMAIL PROTECTED] wrote:
> I am not sure I agree with that evaluation. I only have 2 Dell database servers, and they have been 100% reliable. Maybe he is referring to support, which does tend to be up to who you get. When I asked about performance on my new server they were very helpful, but I did have a bad time with my NAS device (though I had the really cheap support plan on it). They did help me get it fixed, but I had to RMA all the drives on the NAS, as they were all bad, and it was no fun installing the OS as it had no floppy. I got the better support for both of the database servers, which are using JBOD from Dell for the disk array. The quad-processor Opteron with dual cores and 16 GB of memory has been extremely fast (like 70% faster) compared to my older 4-processor 32-bit single-core machine with 8 GB. But both are running Postgres and perform the needed functionality. I would like to have redundant backups of these, as they are mission critical, but all in good time.

Dell's OK if by support you mean replacing simple broken parts, etc. I'm talking about issues like the one we had with our 26xx servers, which, thankfully, Dell hasn't made in a while. We had the Adaptec controllers that locked up once every 1 to 3 months for no reason, and with 4 servers this meant a lockup every 1 to 2 weeks. Not acceptable in a production environment. It took almost 2 years to get Dell to agree to ship us the LSI-based RAID controllers for those 4 machines. Our account rep told us not to worry about returning the old RAID controllers, as they were so old as to be obsolete. One of the four replacement controllers they sent us was bad, so they cross-shipped another one. Sometimes you get a broken part; it happens. One month after we got all our RAID controllers replaced, we started getting nasty calls from their parts people wanting those parts back, saying they were gonna charge us for them, etc.

Two thirds of the parts were still in the server because we hadn't gotten a sheet identifying them (the RAID key and battery) and had left them in, not worrying about it, because we'd been told they didn't need to be returned. These are production servers; we can't just shut them off for fun to pull a part we were told we didn't need to return.

On top of that, the firmware updates come as an .exe that has to be put on a Windows floppy. We don't have a single Windows machine with a floppy at the company I work for (lots of laptops with Windows). We had to dig out a floppy drive and a Windows CD to create a bootable floppy to install a firmware update for a Linux server.

So, my main complaint is about their customer service. The machines, when they work, are pretty good. Normally the hardware gets along with RH OSes. But when things go wrong, Dell will not admit to a design problem that's staring them in the face, and they have wasted literally hundreds of man-hours for us with their stonewalling on this subject. So I see no reason to buy more hardware from them when there are dealers big and small who have so far treated us much better.
Re: [PERFORM] Reasonable amount of indices
Alvaro Herrera [EMAIL PROTECTED] writes:
> Patric wrote:
>> My question now is: is it wise to do so and create hundreds or maybe thousands of indices which partition the table for the selections?
> No, this is not helpful -- basically what you are doing is taking the first level (the first couple of levels, maybe) of the index out of it,

Right --- you'd be *far* better off using a small number of multicolumn indexes. I wouldn't want to bet that the planner code scales effectively to thousands of indexes, and even if it does, you're throwing away any chance of using parameterized queries. The update overhead is unpleasant to contemplate as well (or is this a read-only table?)

regards, tom lane
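The multicolumn-index advice can be illustrated with a hypothetical schema (table and column names are invented for the example):

```sql
-- Instead of creating one narrow index per partition of the data, e.g.:
--   CREATE INDEX orders_cust42_idx ON orders (order_date) WHERE customer_id = 42;
--   CREATE INDEX orders_cust43_idx ON orders (order_date) WHERE customer_id = 43;
--   ... (thousands more, one per customer)
-- put that partitioning level back into the index as its leading column:
CREATE INDEX orders_customer_date_idx ON orders (customer_id, order_date);

-- One parameterized query can now use the same index for every customer,
-- rather than the planner having to pick among thousands of indexes:
SELECT *
FROM orders
WHERE customer_id = $1
  AND order_date >= $2;
```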
Re: [PERFORM] Performance on 8CPU's and 32GB of RAM
> If what you mean is that pg has a design that's heavily oriented towards things that tend to be cheap on POSIX and doesn't use the core Win32 features effectively, then let's track that as an optimisation opportunity for the Win32 port.

Isn't it just easier to assume that Windows Server can't do anything right? ;-)
Re: [PERFORM] Performance on 8CPU's and 32GB of RAM
Carlo Stonebanks wrote:
> Isn't it just easier to assume that Windows Server can't do anything right? ;-)

Well, avoiding the ;-) - people do, and it's remarkably foolish of them. It's a long-standing whinge that many people with a UNIX background seem to just assume that Windows sucks, but you could run 40,000 sockets from a single Win32 process for a while now, and some well-known UNIX systems would still struggle to do this, libevent or no. Admittedly, the way a Win32 app is architected would be rather different from a typical POSIX one. Windows has been a cheap target, but it's remarkably adequate, and the TPC results speak for themselves.
Re: [PERFORM] SAN vs Internal Disks
On Thu, 6 Sep 2007, Harsh Azad wrote:
> Firstly, I could only find Postgres 8.1.x RPMs for CentOS 5, and could not find any RPM for 8.2.4. Is there any 8.2.4 RPM for CentOS 5?

You've already been pointed in the right direction. Devrim, the person who handles this packaging, does a great job of building all the RPMs. But I have a small philosophical difference with the suggested instructions for actually installing them. I just finished an alternate installation guide for RHEL/CentOS 5 that's now posted at http://www.westnet.com/~gsmith/content/postgresql/pgrpm.htm which you may want to take a look at.

> Secondly, would investing in Red Hat Enterprise edition give any performance advantage? I know all the SAN boxes are only certified on RHEL and not CentOS. Or, since CentOS is similar to RHEL, would it be fine?

I wouldn't expect a performance advantage. The situation you have to consider is this: your SAN starts having funky problems, and your database is down because of it. You call the vendor. They find out you're running CentOS instead of RHEL and say that's the cause of your problem (even though it probably isn't). How much will such a buck-passing problem cost your company? If it's a significant number, you'd be foolish to run CentOS instead of the real RHEL. Some SAN vendors can be very, very picky about what they will support, and for most business environments the RHEL subscription isn't so expensive that it's worth wandering into an area where your support situation is fuzzy just to save that money.

-- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Performance on 8CPU's and 32GB of RAM
James Mansion wrote:
> If what you mean is that pg has a design that's heavily oriented towards things that tend to be cheap on POSIX and doesn't use the core Win32 features effectively, then let's track that as an optimisation opportunity for the Win32 port.

Already done for 8.3 (actual performance improvements still to be reported), but that doesn't help those poor users still on 8.2.

-- Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J "The gods do not protect fools. Fools are protected by other, better-equipped fools." (Louis Wu, Ringworld)