Re: [PERFORM] Index files
Great, creating a new tablespace for indexes worked! Now the question is
whether existing tables/indexes can be moved to the new tablespace using an
ALTER command, or whether the only way is to drop and recreate them?

Azad

On 9/14/07, Jean-David Beyer [EMAIL PROTECTED] wrote:
> Harsh Azad wrote:
>> Hi,
>>
>> Where are the database index files located in the $PGDATA directory? I
>> was thinking of soft linking them to another physical hard disk array.
>
> I am not an expert, but what I have done is put the Write-Ahead-Log on
> one hard drive, some little-used relations and their indices on a second
> hard drive, and the main database files on four other drives. These are
> SCSI hard drives and I have two SCSI controllers. /dev/sda and /dev/sdb
> are on one controller, and the other four hard drives are on the other
> controller. These controllers are on a PCI-X bus all their own.
>
> I put $PGDATA (I do not actually set or use that global variable) on
> /dev/sda:
>
> [/srv/dbms/dataA/pgsql/data]$ ls -l
> total 88
> -rw-------  1 postgres postgres     4 Aug 11 13:32 PG_VERSION
> drwx------  5 postgres postgres  4096 Aug 11 13:32 base
> drwx------  2 postgres postgres  4096 Sep 14 09:16 global
> drwx------  2 postgres postgres  4096 Sep 13 23:35 pg_clog
> -rw-------  1 postgres postgres  3396 Aug 11 13:32 pg_hba.conf
> -rw-------  1 root     root      3396 Aug 16 14:32 pg_hba.conf.dist
> -rw-------  1 postgres postgres  1460 Aug 11 13:32 pg_ident.conf
> drwx------  4 postgres postgres  4096 Aug 11 13:32 pg_multixact
> drwx------  2 postgres postgres  4096 Sep 14 09:16 pg_subtrans
> drwx------  2 postgres postgres  4096 Aug 12 16:14 pg_tblspc
> drwx------  2 postgres postgres  4096 Aug 11 13:32 pg_twophase
> drwx------  3 postgres postgres  4096 Sep 14 09:13 pg_xlog
> -rw-------  1 postgres postgres 15526 Sep 11 22:31 postgresql.conf
> -rw-------  1 postgres postgres 13659 Aug 11 13:32 postgresql.conf.dist
> -rw-------  1 postgres postgres    56 Sep 14 07:33 postmaster.opts
> -rw-------  1 postgres postgres    52 Sep 14 07:33 postmaster.pid
>
> In /dev/sdb are:
>
> ]$ ls -l
> total 12
> drwxr-x---  1 postgres postgres  4096 Aug 18 00:00 pg_log
> -rw-------  1 postgres postgres  2132 Sep 14 07:25 pgstartup.log
> drwx------  3 postgres postgres  4096 Aug 12 21:06 stock
>
> The stuff in stock are little-used tables and their indices. Everything
> else is on the other four drives. I put the index for a table on a
> separate drive from the data for the table.
>
> --
>   .~.  Jean-David Beyer         Registered Linux User 85642.
>   /V\  PGP-Key: 9A2FC99A        Registered Machine  241939.
>  /( )\ Shrewsbury, New Jersey   http://counter.li.org
>  ^^-^^ 09:10:01 up 1:37, 4 users, load average: 5.77, 5.12, 4.58

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
http://www.postgresql.org/about/donate

--
Harsh Azad
===
[EMAIL PROTECTED]
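[Editorial note: the question above has a direct answer in PostgreSQL 8.0 and later — existing relations can be moved without dropping them via ALTER ... SET TABLESPACE. A sketch; the table and index names are hypothetical, reusing the thread's `indexspace` tablespace:]

```sql
-- Move an existing table, and one of its indexes, to another tablespace.
-- Each statement rewrites the relation's files in the new location and
-- holds an exclusive lock on the relation for the duration.
ALTER TABLE orders SET TABLESPACE indexspace;
ALTER INDEX orders_pkey SET TABLESPACE indexspace;
```

Note that moving a table does not move its indexes; each index must be moved with its own ALTER INDEX statement.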
[PERFORM] Index files
Hi,

Where are the database index files located in the $PGDATA directory? I was
thinking of soft linking them to another physical hard disk array.

Thanks,
Azad
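[Editorial note: a reply further down points at pg_class for "the number". For reference, the lookup might go like this — a sketch; `my_index` is a hypothetical relation name, and the path shown assumes the relation lives in the default tablespace:]

```sql
-- Map a relation name to its file node on disk:
SELECT relfilenode FROM pg_class WHERE relname = 'my_index';

-- The directory under base/ is named after the database's OID:
SELECT oid FROM pg_database WHERE datname = current_database();

-- The index file is then $PGDATA/base/<database oid>/<relfilenode>
```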
Re: [PERFORM] Index files
Ah, thanks. Didn't realize tablespaces can be specified while creating an
index. BTW, are soft links OK to use for pg_clog / pg_xlog? I moved the
existing directories to /mnt/logs/pglogs and made soft links for both
directories in $PGDATA.

Thanks

On 9/14/07, Ow Mun Heng [EMAIL PROTECTED] wrote:
> On Fri, 2007-09-14 at 08:20 +0530, Harsh Azad wrote:
>> Hi,
>>
>> Where are the database index files located in the $PGDATA directory? I
>> was thinking of soft linking them to another physical hard disk array.
>
> You have to search through pg_class for the number.
>
> Alternatively, you can try using tablespaces:
>
>   create tablespace indexspace location '/mnt/fastarray'
>   create index newindex on table (index_1) tablespace indexspace

--
Harsh Azad
===
[EMAIL PROTECTED]
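[Editorial note: the move-and-symlink step described above can be sketched as below. This demonstrates the pattern on scratch directories; against a real cluster the server must be stopped first, and the operation targets $PGDATA/pg_xlog (and pg_clog) before restarting.]

```shell
# Demonstrate the move-and-symlink pattern on scratch directories.
# With a real cluster: pg_ctl stop, perform the mv + ln -s on the real
# $PGDATA/pg_xlog, then pg_ctl start.
PGDATA=$(mktemp -d)        # stand-in for the real data directory
LOGDIR=$(mktemp -d)        # stand-in for /mnt/logs/pglogs
mkdir "$PGDATA/pg_xlog"
touch "$PGDATA/pg_xlog/000000010000000000000001"   # a fake WAL segment
mv "$PGDATA/pg_xlog" "$LOGDIR/pg_xlog"             # move the directory
ln -s "$LOGDIR/pg_xlog" "$PGDATA/pg_xlog"          # leave a symlink behind
ls -l "$PGDATA/pg_xlog/"   # the segment is still reachable via the link
```

The server follows the symlink transparently, which is why this works for relocating pg_xlog onto a separate spindle.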
Re: [PERFORM] SAN vs Internal Disks
Yeah, the DAS we are considering is the Dell MD3000; it has redundant
hot-swappable RAID controllers in active-active mode, and provision for a
hot-spare hard disk. It can take up to 15 disks in 3U, and you can attach
two more MD1000 shelves to it, giving a total of 45 disks.

--
Harsh

On 9/12/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
> On Tue, 11 Sep 2007, Decibel! wrote:
>> On Tue, Sep 11, 2007 at 05:09:00PM -0400, Michael Stone wrote:
>>> On Tue, Sep 11, 2007 at 03:55:51PM -0500, Decibel! wrote:
>>>> Also, to reply to someone else's email... there is one big reason to
>>>> use a SAN over direct storage: you can do HA that results in 0 data
>>>> loss. Good SANs are engineered to be highly redundant, with multiple
>>>> controllers, PSUs, etc., so that the odds of losing the SAN itself
>>>> are very, very low. The same isn't true with DAS.
>>>
>>> You can get DAS arrays with multiple controllers, PSUs, etc. DAS !=
>>> single disk.
>>
>> It's still in the same chassis, though, which means if you lose memory
>> or mobo you're still screwed. In a SAN setup for redundancy, there's
>> very little in the way of a single point of failure; generally only
>> the backplane, and because there's very little that's on there it's
>> extremely rare for one to fail.
>
> Not necessarily. Direct attached doesn't mean in the same chassis;
> external drive shelves attached via SCSI are still DAS. You can even
> have DAS attached to a pair of machines, with the second box configured
> to mount the drives only if the first one dies.
>
> David Lang

--
Harsh Azad
===
[EMAIL PROTECTED]
[PERFORM] SAN vs Internal Disks
Hi,

We are currently running our DB on a dual-core, dual-proc 3 GHz Xeon, 8GB
RAM, 4x SAS 146 GB 15K RPM on RAID 5. The current data size is about 50GB,
but we want to purchase the hardware to scale to about 1TB as we think our
business will need to support that much soon.

- Currently we have 80% read and 20% write percentages.
- Currently with this configuration the database is showing signs of
  over-loading.
- Auto-vacuum, etc. run on this database; vacuum full runs nightly.
- Currently CPU loads are about 20%, memory utilization is full (but this
  is also due to Linux caching disk blocks) and IO waits are frequent.
- We have a load of about 400 queries per second.

Now we are considering purchasing our own servers and in the process are
facing the usual dilemmas. First I'll list out what machine we have decided
to use:

2x Quad Xeon 2.4 GHz (4-way, only 2 populated right now)
32 GB RAM
OS-only storage - 2x SCSI 146 GB 15k RPM on RAID-1
(Data storage mentioned below)

We have already decided to split our database into 3 machines on the basis
of disjoint sets of data, so we will be purchasing three of these boxes.

HELP 1: Does something look wrong with the above configuration? I know
there will be small differences b/w Opteron/Xeon, but do you think there is
something against going for 2.4 GHz Quad Xeons (Clovertown, I think)?

HELP 2: The main confusion is with regard to data storage. We have the
option of going for:

A: IBM N-3700 SAN box, having 12x FC 300GB disks, partitioned into 3 disks
in RAID-4 for WAL/backup and 9 disks in RAID-DP for data, with 2 hot
spares. We are also considering a similar solution from EMC - CX310C.

B: Go for internal or DAS-based storage. Here for each server we should be
able to have: 2x disks on RAID-1 for logs, 6x disks on RAID-10 for
tablespace1 and 6x disks on RAID-10 for tablespace2. Or maybe 12x disks on
RAID-10 as a single tablespace.

What do I think? Well... SAN wins on manageability, replication (say to a
DR site), backup, etc.; DAS wins on cost.

But for a moment keeping these aside, I wanted to discuss, purely on the
performance side, which one is the winner? It feels like internal disks
will perform better, but I need to understand a rough magnitude of the
difference in performance to see if it's worth losing the manageability
features.

Also, if we choose to go with DAS, what would be the best tool to do async
replication to a DR site, and maybe even, as an extra plus, a second
read-only DB server to distribute select loads?

Regards,
Azad
Re: [PERFORM] SAN vs Internal Disks
Thanks Mark. If I replicate a snapshot of the data and log files (basically
the entire PG data directory) and I maintain the same version of Postgres
on both servers, it should work, right?

I am also thinking that having SAN storage will provide me with the
facility of keeping a warm standby DB. By just shutting one server down and
starting the other, mounting the same file system, I should be able to
bring my DB up when the primary incurs a physical failure. I'm only
considering SAN storage for this feature - has anyone ever used SAN for
replication and warm standby on Postgres?

Regards,
Harsh

On 9/6/07, Mark Lewis [EMAIL PROTECTED] wrote:
> On Thu, 2007-09-06 at 18:05 +0530, Harsh Azad wrote:
>> Hi,
>>
>> We are currently running our DB on a dual-core, dual-proc 3 GHz Xeon,
>> 8GB RAM, 4x SAS 146 GB 15K RPM on RAID 5. [... same hardware details
>> and SAN-vs-DAS questions as in the original post above ...]
>>
>> Also, if we choose to go with DAS, what would be the best tool to do
>> async replication to a DR site, and maybe even, as an extra plus, a
>> second read-only DB server to distribute select loads?
>
> Sounds like a good candidate for Slony replication for backups /
> read-only slaves.
>
> I haven't seen a SAN yet whose DR / replication facilities are on par
> with a good database replication solution. My impression is that those
> facilities are mostly for file servers, mail servers, etc. It would be
> difficult for a SAN to properly replicate a database given the strict
> ordering, size and consistency requirements for the data files. Not
> impossible, but in my limited experience I haven't found one that I
> trust to do it reliably either, vendor boastings to the contrary
> notwithstanding. (Hint: make sure you know exactly what your vendor's
> definition of the term 'snapshot' really means.)
>
> So before you invest in a SAN, make sure that you're actually going to
> be able to (and want to) use all the nice management features you're
> paying for. We have some SANs that are basically acting just as
> expensive external RAID arrays because we do the database
> replication/backup in software anyway.
>
> -- Mark Lewis

--
Harsh Azad
===
[EMAIL PROTECTED]
Re: [PERFORM] SAN vs Internal Disks
Thanks Scott, we have now requested IBM/EMC to provide test machines.

Interestingly, since you mentioned the importance of RAID controllers and
drivers: we are planning to use CentOS 5 for hosting the DB. Firstly, I
could only find Postgres 8.1.x RPMs for CentOS 5 and could not find any RPM
for 8.2.4. Is there any 8.2.4 RPM for CentOS 5? Secondly, would investing
in Red Hat Enterprise edition give any performance advantage? I know all
the SAN boxes are only certified on RHEL and not CentOS. Or since CentOS is
similar to RHEL, would it be fine?

Regards,
Harsh

On 9/6/07, Scott Marlowe [EMAIL PROTECTED] wrote:
> On 9/6/07, Harsh Azad [EMAIL PROTECTED] wrote:
>> Hi,
>>
>> We are currently running our DB on a dual-core, dual-proc 3 GHz Xeon,
>> 8GB RAM, 4x SAS 146 GB 15K RPM on RAID 5. The current data size is
>> about 50GB, but we want to purchase the hardware to scale to about 1TB
>> as we think our business will need to support that much soon.
>>
>> - Currently we have 80% read and 20% write percentages.
>
> For this type of load, you should be running on RAID-10, not RAID-5.
> Or, if you must use RAID-5, use more disks and have a battery-backed
> caching RAID controller known to perform well with RAID-5 and large
> arrays.
>
>> - Currently with this configuration the database is showing signs of
>>   over-loading.
>
> On I/O or CPU? If you're running out of CPU, then look to increasing
> CPU horsepower and tuning PostgreSQL. If I/O, then you need to look
> into a faster I/O subsystem.
>
>> - Auto-vacuum, etc. run on this database; vacuum full runs nightly.
>
> Generally speaking, if you need to run vacuum full, you're doing
> something wrong. Is there a reason you're running vacuum full, or is
> this just precautionary? vacuum full can bloat your indexes, so you
> shouldn't run it regularly. Reindexing might be a better choice if you
> do need to regularly shrink your db. The better option is to monitor
> your fsm usage and adjust fsm settings / autovacuum settings as
> necessary.
>
>> - Currently CPU loads are about 20%, memory utilization is full (but
>>   this is also due to Linux caching disk blocks) and IO waits are
>>   frequent.
>> - We have a load of about 400 queries per second.
>
> What does vmstat et al. say about CPU versus I/O wait?
>
>> Now we are considering purchasing our own servers [...] 2x Quad Xeon
>> 2.4 GHz (4-way, only 2 populated right now), 32 GB RAM [...] We have
>> already decided to split our database into 3 machines on the basis of
>> disjoint sets of data, so we will be purchasing three of these boxes.
>>
>> HELP 1: Does something look wrong with the above configuration? I know
>> there will be small differences b/w Opteron/Xeon, but do you think
>> there is something against going for 2.4 GHz Quad Xeons (Clovertown, I
>> think)?
>
> Look like good machines, plenty of memory.
>
>> HELP 2: The main confusion is with regard to data storage. We have the
>> option of going for: A: IBM N-3700 SAN box, having 12x FC 300GB disks
>> [...] or B: internal or DAS-based storage [...] SAN wins on
>> manageability, replication (say to a DR site), backup, etc.; DAS wins
>> on cost.
>
> The problem with SAN is that it's apparently very easy to build a big
> expensive system that performs poorly. We've seen reports of such here
> on the lists a few times. I would definitely demand an evaluation
> period from your supplier to make sure it performs well if you go SAN.
>
>> But for a moment keeping these aside, I wanted to discuss, purely on
>> the performance side, which one is the winner? It feels like internal
>> disks will perform better, but I need to understand a rough magnitude
>> of the difference in performance to see if it's worth losing the
>> manageability features.
>
> That really, really, really depends. The quality of RAID controllers
> for either setup is very important, as is the driver support, etc. All
> things being even, I'd lean towards the local storage.
>
>> Also, if we choose to go with DAS, what would be the best tool to do
>> async replication to a DR site, and maybe even, as an extra plus, a
>> second read-only DB server to distribute select loads?
>
> Look at Slony, or PITR with continuous recovery. Of those two, I've
> only used Slony in production, and I was very happy with its
> performance, and it was very easy to write a bash script to monitor the
> replication for failures.

--
Harsh Azad
===
[EMAIL PROTECTED]
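[Editorial note: the "PITR with continuous recovery" option Scott mentions above amounts to WAL shipping. A minimal configuration sketch for the 8.x-era setup discussed in this thread; the host and paths are hypothetical:]

```
# On the primary's postgresql.conf: ship each completed WAL segment
# to the DR box.
archive_command = 'rsync -a %p dr-host:/wal_archive/%f'

# On the DR box, a recovery.conf keeps a warm standby replaying WAL:
#   restore_command = 'cp /wal_archive/%f %p'
# In practice restore_command is wrapped in a script that waits for the
# next segment to arrive, so the standby stays in continuous recovery.
```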
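[Editorial note: Scott's "What does vmstat say about CPU versus I/O wait?" question above can be answered with a quick check like this — a sketch, assuming the procps vmstat found on most Linux systems:]

```shell
# Sample system activity every 5 seconds, 3 samples. The first data line
# is an average since boot, so read the later lines.
vmstat 5 3
# Columns of interest: "us"/"sy" (user/system CPU), "id" (idle), and
# "wa" (CPU time stalled waiting on I/O). A persistently high "wa" with
# low "us" points at an I/O-bound system rather than a CPU-bound one.
```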
Re: [PERFORM] SAN vs Internal Disks
Hi,

How about the Dell Perc 5/i card with 512MB battery-backed cache, or the
IBM ServeRAID-8k adapter? I hope I am sending relevant information here; I
am not too well versed with RAID controllers.

Regards,
Harsh

On 9/6/07, Scott Marlowe [EMAIL PROTECTED] wrote:
> On 9/6/07, Harsh Azad [EMAIL PROTECTED] wrote:
>> Thanks Scott, we have now requested IBM/EMC to provide test machines.
>> Interestingly, since you mentioned the importance of RAID controllers
>> and drivers: we are planning to use CentOS 5 for hosting the DB.
>
> What RAID controllers have you looked at? Seems the two most popular in
> terms of performance here have been Areca and 3ware / Escalade. LSI
> seems to come in a pretty close third. Adaptec is to be avoided, as are
> cheap RAID controllers (i.e. Promise etc.). Battery-backed cache is a
> must, and the bigger the better.
>
>> Firstly, I could only find Postgres 8.1.x RPMs for CentOS 5 and could
>> not find any RPM for 8.2.4. Is there any 8.2.4 RPM for CentOS 5?
>> Secondly, would investing in Red Hat Enterprise edition give any
>> performance advantage? I know all the SAN boxes are only certified on
>> RHEL and not CentOS. Or since CentOS is similar to RHEL, would it be
>> fine?
>
> For all intents and purposes, CentOS and RHEL are the same OS, so any
> pgsql RPM for one should pretty much work for the other. At worst, you
> might have to get an SRPM and rebuild it for CentOS / White Box.

--
Harsh Azad
===
[EMAIL PROTECTED]