Re: [PERFORM] Tuning massive UPDATES and GROUP BY's?

2011-03-13 Thread runner
Don't insert data into an indexed table. A very important point with
bulk-loading is that you should load all the data first, then create
the indexes. Running multiple (different) CREATE INDEX queries in
parallel can additionally save a lot of time. Also don't move data
back and forth between the tables, just drop the original when you're
done.
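
In SQL terms, that pattern looks roughly like the sketch below. The table,
column, and index names are invented, and COPY from a CSV file is just one
way of getting the data in:

    -- 1. Load into a table that has no indexes yet
    CREATE TABLE accounts_load (LIKE accounts INCLUDING DEFAULTS);
    COPY accounts_load FROM '/tmp/accounts.csv' WITH (FORMAT csv);

    -- 2. Build the indexes afterwards; running each CREATE INDEX in its
    --    own session lets them proceed in parallel
    CREATE INDEX accounts_load_owner_idx   ON accounts_load (owner_id);
    CREATE INDEX accounts_load_created_idx ON accounts_load (created_at);

    -- 3. Swap the new table in and drop the original instead of copying
    --    rows back and forth
    BEGIN;
    DROP TABLE accounts;
    ALTER TABLE accounts_load RENAME TO accounts;
    COMMIT;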

I just saw your post and it looks similar to what I'm doing.
We're going to be loading 12G of data from a MySQL dump into our
pg 9.0.3 database next weekend.  I've been testing this for the last
two weeks.  I tried removing the indexes and other constraints just for
the import, but for a noob like me this was too much to ask.  Maybe
when I get more experience.  So I *WILL* be importing all of my data
into indexed tables.  I timed it and it will take eight hours.
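
If I work up the nerve to try that again on a later pass, the sketch I've
jotted down is along these lines (untested, and the table and index names
are placeholders):

    -- grab the existing index definitions so they can be replayed later
    SELECT indexdef
      FROM pg_indexes
     WHERE schemaname = 'public'
       AND tablename  = 'orders';

    -- drop the indexes, run the import, then re-run the saved
    -- CREATE INDEX statements (several at once, in separate sessions)
    DROP INDEX orders_customer_idx;
    -- ... bulk import here ...
    CREATE INDEX orders_customer_idx ON orders (customer_id);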

I'm sure I could get it down to two or three hours for the import
if I really knew more about Postgres, but that's the price you pay when
you slam-dunk a project and your staff isn't familiar with the
database back-end.  Other than being inefficient and consuming more
time than necessary, is there any other downside to importing into an
indexed table?  In the four test imports I've done, everything seems
to work fine; it just takes a long time.

Sorry for hijacking your thread here!



[PERFORM] Basic performance tuning on dedicated server

2011-03-10 Thread runner


I'm setting up my first PostgreSQL server to replace an existing MySQL server.
I've been reading Gregory Smith's book PostgreSQL 9.0 High Performance and also
Riggs/Krosing's PostgreSQL 9 Administration Cookbook.  While both of these
books are excellent, I am completely new to PostgreSQL and I cannot possibly
read and understand every aspect of tuning in the short amount of time before I
have to have this server running.

I started out by using the 11-step process for tuning a new dedicated server
(page 145 in Gregory Smith's book), but I found I had more questions than I
could get answered in the short amount of time I have.  So plan B is to use
pgtune to get a ballpark configuration and then fine-tune later as I learn more.

I ran some performance tests where I imported my 11 GB database from our old
MySQL server into PostgreSQL 9.0.3.  In my testing I left the postgresql.conf 
at default values.  The PostgreSQL test database completely blew away the old 
MySQL server in performance.  Again, the postgresql.conf was never optimized so 
I feel I will be OK if I just get in the ballpark with tuning the 
postgresql.conf file.

I'd like to run my plan by you guys to see if it seems sane and make sure I'm 
not leaving out anything major. 

I'll be running PostgreSQL 9.0.3 on a Solaris 10 64-bit (SPARC) box with 16G of
RAM.  The local file system is ZFS.  The database file systems are UFS and
are SAN mounted from VERY fast disks with battery-backed write cache.  I don't
know anybody else who is running a mix of ZFS and UFS file systems, but I
cannot change this.  ZFS has its own file system cache, so I'm concerned about
the ramifications of having caches for both ZFS and UFS.  The only
database-related files stored on the local ZFS file system are the PostgreSQL
binaries and the system logs.
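
One idea I'm still weighing is capping the ZFS ARC in /etc/system so the ZFS
cache can't grow into the RAM I want left over for the UFS cache and
PostgreSQL.  The number below is only an example; I haven't settled on a
value yet:

    * /etc/system -- limit the ZFS ARC (value is in bytes)
    * 0x80000000 = 2 GB, an example figure only
    set zfs:zfs_arc_max = 0x80000000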

From the extensive reading I've done, it seems generally accepted to set the
UFS file system cache to use 50% of the system RAM.  That leaves 8G for
PostgreSQL.  Well, not really 8G; I've reserved 1G for system use, which
leaves me with 7G for PostgreSQL.  I ran pgtune and specified 7G as the
memory ( 7 * 1024 * 1024 = 7340032 ) and 300 connections.  The resulting
postgresql.conf is what I plan to use.
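
For what it's worth, the values that approach produces for 7G and 300
connections are in the neighborhood of the lines below.  These are just the
usual rules of thumb (shared_buffers around 25% of the memory given to
PostgreSQL, effective_cache_size covering most of the rest), not the literal
pgtune output:

    # postgresql.conf -- ballpark figures only
    max_connections      = 300
    shared_buffers       = 1792MB   # ~25% of the 7G set aside for PostgreSQL
    effective_cache_size = 5GB      # rough guess at memory available for caching
    work_mem             = 8MB      # kept small because of the 300 connections
    maintenance_work_mem = 448MB    # helps CREATE INDEX during the big import
    checkpoint_segments  = 16
    wal_buffers          = 16MB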

After reading Gregory Smith's book, I've decided to put the database on one UFS 
file system, the WAL on a separate UFS file system (mounted with forcedirectio) 
and the archive logs on yet another UFS file system.  I'll be on Solaris 10, so
I've set wal_sync_method = fdatasync based on recommendations from other
Solaris users and a lot of Google searching on wal_sync_method and Solaris.
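
The relevant pieces, as I currently have them sketched out, are below.  The
device names and mount points are placeholders for ours, and pg_xlog would be
a symlink to a directory on the forcedirectio file system:

    # /etc/vfstab -- WAL on its own UFS file system, mounted forcedirectio
    # device to mount   device to fsck      mount point  FS   pass  boot  options
    /dev/dsk/cXtYdZs6   /dev/rdsk/cXtYdZs6  /pg_wal      ufs  2     yes   forcedirectio

    # postgresql.conf
    wal_sync_method = fdatasync
    archive_mode    = on
    archive_command = 'cp %p /pg_archive/%f'   # archive logs on their own file system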

That's what I plan to go live with in a few days.  Since my test server with 
default configs already blows away the old database server, I think I can get 
away with this strategy.  Time is not on my side.

I originally installed the 32-bit PostgreSQL binaries but later switched to the
64-bit binaries.  I've read that the 32-bit version is faster and uses less
memory than the 64-bit version, but at this point I'm assuming I need the
64-bit binaries in order to take full advantage of the 7G of RAM I have
allocated to PostgreSQL.  If I am wrong here please let me know.

This has been a lot of information to cram in during the short amount of time
I've had to deal with this project.  I'm going to have to go back and read the
PostgreSQL 9.0 High Performance book two or three more times and really dig
into the details, but for now I'm going to cheat and use pgtune as described
above.  Thank you in advance for any advice or additional tips you may be able
to provide.

Rick