John Allgood wrote:
In the below statement you mention putting each database on its own raid mirror.
"However, sticking with your arrangement, it would seem that you might be able to get some extra performance if each database is on it's own raid, since you are fairly likely to have 2 transactions occuring at the same time, that don't affect eachother (since you wouldn't have any foreign keys, etc on 2 separate databases.)"
That would take alot of disk drives to accomplish. I was thinking maybe putting three or four databases on each raid and dividing the heaviest used databases on each mirrored set. And for each of these sets have its own mirror for pg_xlog. My question is what is the best way to setup postgres databases on different disks. I have setup multiple postmasters on this system as a test. The only problem was configuring each databases "ie postgresql.conf, pg_hba.conf". Is there anyway in postgres to have everything in one cluster and have it seperated onto multiple drives. Here is a example of what is was thinking about.
I think this is something that you would have to try and see what works. My first feeling is that 8-disks in RAID10 is better than 4 sets of RAID1.
MIRROR1 - Database Group 1 MIRROR2 - pg_xlog for database group 1 MIRROR3 - Database Group 2 MIRROR4 - pg_xlog for database group 2 MIRROR5 - Database Group 3 MIRROR6 - pg_xlog for database group 3
This will take about 12 disk drives. I have a 14 bay Storage Bay I can use two of the drives for hotspare's.
I would have all of them in 1 database cluster, which means they are all served by the same postgres daemon. Which I believe means that they all use the same pg_xlog. That means you only need 1 raid for pg_xlog, though I would make it a 4-drive RAID10. (RAID1 is redundant, but actually slower on writes, you need the 0 to speed up reading/writing, I could be wrong).
I believe you can still split each database onto it's own raid later on if you find that you need to.
So this is my proposal 1: OS RAID (sounds like this is not in the Storage Bay). 4-drives RAID10 pg_xlog 8-drives RAID10 database cluster 2-drives Hot spares / RAID1
If you feel like you want to partition your databases, you could also do proposal 2: 4-drives RAID10 pg_xlog 4-drives RAID10 databases master + 1-4 4-drives RAID10 databases 5-9 2-drives hotspare / RAID1
If you think partitioning is better than striping, you could do proposal 3: 4-drives RAID10 pg_xlog 2-drives RAID1 master database 2-drives RAID1 databases 1,2,3 2-drives RAID1 databases 4,5 2-drives RAID1 databases 6,7 2-drives RAID1 databases 8,9
There are certainly a lot of potential arrangements here, and it's not like I've tried a lot of them. pg_xlog seems like a big enough bottleneck that it would be good to put it on it's own RAID10, to make it as fast as possible.
It also depends a lot on whether you will be write heavy/read heavy, etc. RAID5 works quite well for reading, very poor for writing. But if the only reason to have the master database is to perform read heavy queries, and all the writing is done at night in bulk fashion with careful tuning to avoid saturation, then maybe you would want to put the master database on a RAID5 so that you can get extra disk space. You could do proposal 4: 4-drive RAID10 pg_xlog 4-drive RAID5 master db 2-drive RAID1 dbs 1-3 2-drive RAID1 dbs 4-6 2-drive RAID1 dbs 7-9
You might also do some testing and find that pg_xlog doesn't deserve it's own 4 disks, and they would be better off in the bulk tables.
Unfortunately a lot of this would come down to performance testing on your dataset, with a real data load. Which isn't very easy to do. I personally like the simplicity of proposal 1.
John Allgood - ESC Systems Administrator
Description: OpenPGP digital signature