Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS
At 12:46 AM 12/28/2006, Guy Rouillier wrote:

I don't want to violate any license agreement by discussing performance, so I'll refer to a large, commercial PostgreSQL-compatible DBMS only as BigDBMS here. I'm trying to convince my employer to replace BigDBMS with PostgreSQL for at least some of our Java applications. As a proof of concept, I started with a high-volume (but conceptually simple) network data collection application. This application collects files of 5-minute usage statistics from our network devices, and stores a raw form of these stats into one table and a normalized form into a second table. We are currently storing about 12 million rows a day in the normalized table, and each month we start new tables. For the normalized data, the app inserts rows initialized to zero for the entire current day first thing in the morning, then throughout the day, as stats are received, executes updates against existing rows. So the app has very high update activity.

In my test environment, I have a dual-x86 Linux platform running the application, and an old 4-CPU Sun Enterprise 4500 running BigDBMS and PostgreSQL 8.2.0 (only one at a time). The Sun box has 4 disk arrays attached, each with 12 SCSI hard disks (a D1000 and 3 A1000, for those familiar with these devices). The arrays are set up with RAID5. So I'm working with a consistent hardware platform for this comparison. I'm only processing a small subset of files (144). BigDBMS processed this set of data in 20000 seconds, with all foreign keys in place. With all foreign keys in place, PG took 54000 seconds to complete the same job. I've tried various approaches to autovacuum (none, 30 seconds) and it doesn't seem to make much difference. What does seem to make a difference is eliminating all the foreign keys; in that configuration, PG takes about 30000 seconds. Better, but BigDBMS still has it beat significantly.

If you are using pg configured as installed by default, you are not getting pg's best performance. Ditto if you are using data structures optimized for BigDBMS.

A= Go through each query and see what work_mem needs to be for that query to be as RAM resident as possible (a worked sketch follows at the end of this message). If you have enough RAM, set work_mem for that query that large. Remember that work_mem is =per query=, so queries running in parallel eat the sum of each of their work_mem's.

B= Make sure shared_buffers is set reasonably. A good rule of thumb for 8.x is that shared buffers should be at least ~1/4 of your RAM. If your E4500 is maxed out with RAM, there's a good chance shared buffers should be considerably more than 1/4 of RAM.

C= What file system are you using? Unlike BigDBMS, pg does not have its own native one, so you have to choose the one that best suits your needs. For update-heavy applications involving lots of small updates, jfs and XFS should both be seriously considered.

D= Your table schema and physical table layout probably need to change. What BigDBMS likes here is most likely different from what pg likes.

E= pg does not actually update records in place. It appends new records to the table and marks the old version invalid. This means that things like page size, RAID stripe size, etc. may need to have different values than they do for BigDBMS. Another consequence is that pg likes RAID 10 even more than most of its competitors do.

F= This may seem obvious, but how many of the foreign keys and other overhead do you actually need? Get rid of the unnecessary ones.
G= Bother the folks at Sun, like Josh Berkus, who know pg inside and out +and+ know your HW inside and out (or have access to those who do ;-) ). I'll bet they'll have ideas I'm not thinking of.

H= EXPLAIN ANALYZE is your friend. Slow queries may need better table statistics, or better SQL, or may be symptoms of issues "C" or "D" above, or ...

I've got PG configured so that the system database is on disk array 2, as are the transaction log files. The default tablespace for the test database is disk array 3. I've got all the reference tables (the tables to which the foreign keys in the stats tables refer) on this array. I also store the stats tables on this array. Finally, I put the indexes for the stats tables on disk array 4. I don't use disk array 1 because I believe it is a software array.

I= With 4 arrays of 12 HDs each, you definitely have enough spindles to place pg_xlog somewhere separate from all the other pg tables. In addition, you should analyze your table access patterns and then scatter them across your 4 arrays in such a way as to minimize head contention.

I'm out of ideas how to improve this picture any further. I'd appreciate some suggestions. Thanks.

Hope this helps,
Ron Peacetree
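As a follow-up to points A and H, a minimal sketch of sizing work_mem per query; the table and column names (stats_200612, device_id, bytes_in) are made up for illustration, and a bare integer value is interpreted as KB:

-- Raise work_mem for this session only, then time the query.
SET work_mem = 65536;            -- 64 MB, session-local
EXPLAIN ANALYZE
SELECT device_id, sum(bytes_in)
FROM stats_200612
GROUP BY device_id
ORDER BY device_id;
-- Re-run with different work_mem values and compare the "Total runtime" lines;
-- once raising work_mem stops helping, the sort/hash steps fit in RAM.
RESET work_mem;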
Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS
Hi all,

> A= Go through each query and see what work_mem needs to be for that query to be as RAM resident as possible. If you have enough RAM, set work_mem for that query that large. Remember that work_mem is =per query=, so queries running in parallel eat the sum of each of their work_mem's.

How can I know how much work_mem a query needs?

Regards
--
Arnau
[PERFORM] Backup/Restore too slow
Hi,

We have a database with one table of 10,000,000 tuples and 4 tables with 5,000,000 tuples. While in SQL Server it takes 3 minutes to restore this complete database, in PostgreSQL it takes more than 2 hours. The backup takes 6 minutes in SQL Server and 13 minutes in PostgreSQL (which is not a problem).

We are running PostgreSQL 8.1 for Windows and we are using:

C:\pg_dump.exe -i -h localhost -p 5432 -U usuario -F c -b -v -f "C:\BK\file.backup" base

and

C:\pg_restore.exe -i -h localhost -p 5432 -U usuario -d base -O -v "C:\BK\file.backup"

We use those parameters because we copied them from pgAdmin III. Is there any way to make it faster?

Thanks
Sebastián
[PERFORM] What you would consider as heavy traffic?
Hey Everyone,

I am having a bit of trouble with a web host, and was wondering what you would class as a high level of traffic to a database (queries per second) for an average server running Postgres in a shared hosting environment (very modern servers).

Many Thanks in Advance,
Oliver
[PERFORM] performance implications of binary placement
Hello all,

I've been running performance tests on various incantations of Postgres on and off for a month or so, and I've just come across some unexpected results. When I start my Postgres build as such:

# (Scenario 1)
./configure --prefix=/usr --libdir=/usr/lib --bindir=/usr/bin --includedir=/usr/include/pgsql --datadir=/usr/share/postgresql --mandir=/usr/share/man --with-docdir=/usr/share/doc/packages --disable-rpath --enable-thread-safety --enable-integer-datetimes --without-python --without-perl --without-tcl --without-tk

it performs significantly worse than when I start my build like this:

# (Scenario 2)
./configure --disable-rpath --enable-thread-safety --enable-integer-datetimes --without-python --without-perl --without-tcl --without-tk

Note: the only differences are that "Scenario 1" includes these options:

--prefix=/usr --libdir=/usr/lib --bindir=/usr/bin --includedir=/usr/include/pgsql --datadir=/usr/share/postgresql --mandir=/usr/share/man --with-docdir=/usr/share/doc/packages

And, to be clear, "Scenario 1" performs worse than "Scenario 2": simple insert statements are taking significantly longer. I did not expect to see a performance hit with these options, especially since "/usr/" on the test machine is mounted as its own partition, and in both cases all of the binaries, include files, etc. are in that partition. Has anyone seen this before? Are hard drive mechanics the only thing in play here?

The only difference I'm seeing in logging between the two versions is that Scenario 2 has several of this message littered throughout the logfile:

ERROR: could not open relation "pg_index_indexrelid_index": No such file or directory

But that doesn't seem to be affecting functionality or performance (especially considering the fact that the logfile that contains that message is part of the test that is performing better). We're using Postgres 7.4.8, building from the SLES9 Postgres 7.4.8 source RPM.

Thanks for any help you can provide. I can provide more detail if needed.

Thanks again,
Bob
[PERFORM] PostgreSQL configuration and overflow
Good day,

I have been reading about the configuration of PostgreSQL, but I have a server that does not give the performance it should. The tables are indexed and vacuumed regularly, and I monitor with top, ps and pg_stat_activity; when I checked, it was slow even without a heavy load average. Previously, the server reached 2000 connections to PostgreSQL (with max_connections = 3000 set for future workload). I divided the load with another server for better performance, and it now reaches 500 connections, but it is still overloaded. My question is about how much memory I should configure in shared_buffers and effective_cache_size.

Features:
- 4 Intel Xeon Dual 3.0 GHz processors
- 12 GB RAM
- 2 disks in RAID 1 for the OS
- 4 disks in RAID 5 for the DB
- OS: Slackware 11.0, Linux 2.6.17.7
- PostgreSQL 8.1.4

On the internet I found this:

Tuning PostgreSQL for performance
2 Some basic parameters
2.1 Shared buffers
# Start at 4MB (512) for a workstation
# Medium size data set and 256-512MB available RAM: 16-32MB (2048-4096)
# Large dataset and lots of available RAM (1-4GB): 64-256MB (8192-32768)

My postgresql.conf configuration is:

#---
# FILE LOCATIONS
#---
# The default values of these variables are driven from the -D command line
# switch or PGDATA environment variable, represented here as ConfigDir.
#data_directory = 'ConfigDir'           # use data in another directory
#hba_file = 'ConfigDir/pg_hba.conf'     # host-based authentication file
#ident_file = 'ConfigDir/pg_ident.conf' # IDENT configuration file
# If external_pid_file is not explicitly set, no extra pid file is written.
#external_pid_file = '(none)'           # write an extra pid file

#---
# CONNECTIONS AND AUTHENTICATION
#---
# - Connection Settings -
listen_addresses = '*'          # what IP address(es) to listen on;
                                # comma-separated list of addresses;
                                # defaults to 'localhost', '*' = all
port = 5432
max_connections = 3000
# note: increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction). You
# might also need to raise shared_buffers to support more connections.
#superuser_reserved_connections = 2
#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777 # octal
#bonjour_name = ''              # defaults to the computer name

#---
# RESOURCE USAGE (except WAL)
#---
# - Memory -
shared_buffers = 81920          # min 16 or max_connections*2, 8KB each
temp_buffers = 5000             # min 100, 8KB each
max_prepared_transactions = 1000        # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 10240                # min 64, size in KB
maintenance_work_mem = 253952   # min 1024, size in KB
max_stack_depth = 4096          # min 100, size in KB

# - Free Space Map -
#max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000       # min 100, ~70 bytes each

# - Kernel Resource Usage -
#max_files_per_process = 1000   # min 25
#preload_libraries = ''

# - Cost-Based Vacuum Delay -
#vacuum_cost_delay = 0          # 0-1000 milliseconds
#vacuum_cost_page_hit = 1       # 0-10000 credits
#vacuum_cost_page_miss = 10     # 0-10000 credits
#vacuum_cost_page_dirty = 20    # 0-10000 credits
#vacuum_cost_limit = 200        # 0-10000 credits

# - Background writer -
#bgwriter_delay = 200           # 10-10000 milliseconds between rounds
#bgwriter_lru_percent = 1.0     # 0-100% of LRU buffers scanned/round
#bgwriter_lru_maxpages = 5      # 0-1000 buffers max written/round
#bgwriter_all_percent = 0.333   # 0-100% of all buffers scanned/round
#bgwriter_all_maxpages = 5      # 0-1000 buffers max written/round

#---
# WRITE AHEAD LOG
#---
# - Settings -
#fsync = on                     # turns forced synchronization on or off
#wal_sync_method = fsync        # the default is the first option
                                # supported by the operating system:
                                # open_datasync
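One quick sanity check on the two parameters the question asks about (a sketch only; pg_settings exists in 8.1, and the parameter list is just an example) is to ask the running server what it is actually using:

-- Show the effective values of the main memory-related settings.
SELECT name, setting
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem',
               'maintenance_work_mem', 'effective_cache_size',
               'max_connections');

In 8.1, shared_buffers and effective_cache_size are counted in 8 kB buffers/pages, so the shared_buffers = 81920 above corresponds to 640 MB, roughly 5% of the 12 GB of RAM.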
Re: [PERFORM] Backup/Restore too slow
Rebuilding the indexes or the integrity confirmations are probably taking most of the time.

What is your work_mem setting?

On 22-Dec-06, at 9:32 AM, Sebastián Baioni wrote:

Hi, We have a database with one table of 10,000,000 tuples and 4 tables with 5,000,000 tuples. While in SQL Server it takes 3 minutes to restore this complete database, in PostgreSQL it takes more than 2 hours. The backup takes 6 minutes in SQL Server and 13 minutes in PostgreSQL (which is not a problem). We are running PostgreSQL 8.1 for Windows and we are using:

C:\pg_dump.exe -i -h localhost -p 5432 -U usuario -F c -b -v -f "C:\BK\file.backup" base

and

C:\pg_restore.exe -i -h localhost -p 5432 -U usuario -d base -O -v "C:\BK\file.backup"

We use those parameters because we copied them from pgAdmin III. Is there any way to make it faster?

Thanks
Sebastián
Re: [PERFORM] What you would consider as heavy traffic?
Depends on what the query is. If the queries take 3 to 5 days to execute, then 1 query per day on a 4-CPU machine would be at capacity.

On 23-Dec-06, at 3:12 AM, [EMAIL PROTECTED] wrote:

Hey Everyone, I am having a bit of trouble with a web host, and was wondering what you would class as a high level of traffic to a database (queries per second) for an average server running Postgres in a shared hosting environment (very modern servers). Many Thanks in Advance, Oliver
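To put an actual number on the traffic an existing database sees, one rough approach (a sketch only; 'yourdb' is a placeholder, and the statistics collector must be enabled) is to sample the commit counter in pg_stat_database twice and divide by the elapsed time:

-- Sample the committed-transaction counter for one database.
SELECT datname, xact_commit, xact_rollback, now() AS sampled_at
FROM pg_stat_database
WHERE datname = 'yourdb';      -- placeholder database name
-- Run the same query again roughly 60 seconds later;
-- (later xact_commit - earlier xact_commit) / elapsed seconds ~ transactions per second.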
Re: [PERFORM] Backup/Restore too slow
Rod Taylor <[EMAIL PROTECTED]> writes:
> Rebuilding the indexes or integrity confirmations are probably taking
> most of the time.
> What is your work_mem setting?

maintenance_work_mem is the thing to look at, actually. I concur that bumping it up might help.

regards, tom lane
[PERFORM] Need Help
How can I get the disk usage for each table? Can I do it via SQL?

Thanks,
Mailing-Lists
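One way to do this in 8.1 and later (a sketch, not an answer from the thread): the built-in size functions report per-table disk usage, where pg_relation_size() counts the main table file and pg_total_relation_size() also includes indexes and TOAST data:

-- Per-table disk usage, largest first (system catalogs are included too).
SELECT relname,
       pg_size_pretty(pg_relation_size(oid))       AS table_size,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'            -- ordinary tables only
ORDER BY pg_total_relation_size(oid) DESC;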
Re: [PERFORM] Backup/Restore too slow
Thanks for answering. This is my configuration:

# - Memory -
shared_buffers = 1000           # min 16, at least max_connections*2, 8KB each
#work_mem = 1024                # min 64, size in KB
#maintenance_work_mem = 16384   # min 1024, size in KB
#max_stack_depth = 2048         # min 100, size in KB

The PC where we are running the PostgreSQL server is: AMD Athlon(tm) 64 Processor 3000+, 1.79 GHz, 1.93 GB RAM, with Windows XP Professional, Version 2002, Service Pack 2.

How should we set it?

Thanks a lot!
Sebastián

Tom Lane <[EMAIL PROTECTED]> wrote:

Rod Taylor writes:
> Rebuilding the indexes or integrity confirmations are probably taking
> most of the time.
> What is your work_mem setting?

maintenance_work_mem is the thing to look at, actually. I concur that bumping it up might help.

regards, tom lane
Re: [PERFORM] URGENT: Out of disk space pg_xlog
On Sat, 2006-12-23 at 13:13 -0500, Bruce Momjian wrote:
> The bottom line is that we know of no cases where a long-running
> transaction would delay recycling of the WAL files, so there is
> certainly something not understood here.

We can see from all of this that a checkpoint definitely didn't occur. Tom's causal chain was just one way that could have happened; there could well be others.

I've noticed previously that a checkpoint can be starved out when trying to acquire the CheckpointStartLock. I've witnessed a delay of two minutes or more in obtaining the lock in the face of heavy transactions. If wal_buffers is small enough, the WAL write rate high enough and the transaction rate high enough, a long queue can form for the WALWriteLock, which ensures that the CheckpointStartLock would queue indefinitely.

I've tried implementing a queueable shared lock for the CheckpointStartLock. That helps the checkpoint, but it harms performance of other transactions waiting to commit, so I let that idea go.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Re: [PERFORM] Backup/Restore too slow
Sebastián Baioni wrote:
> Thanks for answering. This is my configuration:
> # - Memory -
> shared_buffers = 1000           # min 16, at least max_connections*2, 8KB each
> #work_mem = 1024                # min 64, size in KB
> #maintenance_work_mem = 16384   # min 1024, size in KB
> #max_stack_depth = 2048         # min 100, size in KB
> The PC where we are running the PostgreSQL server is: AMD Athlon(tm) 64 Processor 3000+, 1.79 GHz, 1.93 GB RAM, with Windows XP Professional, Version 2002, Service Pack 2.
> How should we set it?

Shared buffers even on a workstation should be higher than 1000 if you want some performance. It depends how much memory you have spare to use for PostgreSQL, but something like

shared_buffers = 20000
maintenance_work_mem = 256000

will certainly give you a performance boost. You will have to adjust those figures based on whatever else you are doing on the machine.

Russell Smith.

> Thanks a lot!
> Sebastián
>
> Tom Lane <[EMAIL PROTECTED]> wrote:
> Rod Taylor writes:
> > Rebuilding the indexes or integrity confirmations are probably taking
> > most of the time.
> > What is your work_mem setting?
> maintenance_work_mem is the thing to look at, actually. I concur that bumping it up might help.
> regards, tom lane
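Following the maintenance_work_mem advice for the restore itself, one approach (a sketch only; the value is illustrative and is given as an integer number of KB, as 8.1 expects) is to set a per-database default on the target database, which the sessions opened by pg_restore will then pick up:

-- New connections to "base" (the database named in the pg_restore command above)
-- will build indexes and check foreign keys with 256 MB of maintenance_work_mem.
ALTER DATABASE base SET maintenance_work_mem = 262144;

-- shared_buffers, by contrast, can only be changed in postgresql.conf
-- and requires a server restart to take effect.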
Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS
Ron wrote:
> C= What file system are you using? Unlike BigDBMS, pg does not have
> its own native one, so you have to choose the one that best suits
> your needs. For update heavy applications involving lots of small
> updates jfs and XFS should both be seriously considered.

Actually it has been suggested that a combination of ext2 (for WAL) and ext3 (for data, with data journalling disabled) is a good performer. AFAIK you don't want the overhead of journalling for the WAL partition.

--
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS
On Fri, 29 Dec 2006, Alvaro Herrera wrote:

> Ron wrote:
> > C= What file system are you using? Unlike BigDBMS, pg does not have its own native one, so you have to choose the one that best suits your needs. For update heavy applications involving lots of small updates jfs and XFS should both be seriously considered.
>
> Actually it has been suggested that a combination of ext2 (for WAL) and ext3 (for data, with data journalling disabled) is a good performer. AFAIK you don't want the overhead of journalling for the WAL partition.

When benchmarking various options for a new PG server at one of my clients, I tried ext2 and ext3 (data=writeback) for the WAL and it appeared to be fastest to have ext2 for the WAL. The winning time was 157m46.713s for ext2, 159m47.098s for combined ext3 data/xlog and 158m25.822s for ext3 data=writeback. This was on an 8x150GB Raptor RAID10 on an Areca 1130 w/ 1GB BBU cache.

This config benched out faster than a 6-disk RAID10 + 2-disk RAID1, for those of you who have been wondering whether the BBU write-back cache mitigates the need for a separate WAL (at least on this workload). Those are the fastest times for each config, but ext2 WAL was always faster than the other two options. I didn't test any other filesystems in this go-around.

--
Jeff Frost, Owner   <[EMAIL PROTECTED]>
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908   FAX: 650-649-1954
Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS
You should search the archives for Luke Lonergan's posting about how IO in Postgresql is significantly bottlenecked because it's not async. A 12-disk array is going to max out Postgresql's theoretical write capacity to disk, and therefore BigRDBMS is always going to win in such a config. You can also look towards Bizgres, which allegedly eliminates some of these problems and is cheaper than most BigRDBMS products.

Alex.

On 12/28/06, Guy Rouillier <[EMAIL PROTECTED]> wrote:

I don't want to violate any license agreement by discussing performance, so I'll refer to a large, commercial PostgreSQL-compatible DBMS only as BigDBMS here. I'm trying to convince my employer to replace BigDBMS with PostgreSQL for at least some of our Java applications. As a proof of concept, I started with a high-volume (but conceptually simple) network data collection application. This application collects files of 5-minute usage statistics from our network devices, and stores a raw form of these stats into one table and a normalized form into a second table. We are currently storing about 12 million rows a day in the normalized table, and each month we start new tables. For the normalized data, the app inserts rows initialized to zero for the entire current day first thing in the morning, then throughout the day, as stats are received, executes updates against existing rows. So the app has very high update activity.

In my test environment, I have a dual-x86 Linux platform running the application, and an old 4-CPU Sun Enterprise 4500 running BigDBMS and PostgreSQL 8.2.0 (only one at a time). The Sun box has 4 disk arrays attached, each with 12 SCSI hard disks (a D1000 and 3 A1000, for those familiar with these devices). The arrays are set up with RAID5. So I'm working with a consistent hardware platform for this comparison. I'm only processing a small subset of files (144). BigDBMS processed this set of data in 20000 seconds, with all foreign keys in place. With all foreign keys in place, PG took 54000 seconds to complete the same job. I've tried various approaches to autovacuum (none, 30 seconds) and it doesn't seem to make much difference. What does seem to make a difference is eliminating all the foreign keys; in that configuration, PG takes about 30000 seconds. Better, but BigDBMS still has it beat significantly.

I've got PG configured so that the system database is on disk array 2, as are the transaction log files. The default tablespace for the test database is disk array 3. I've got all the reference tables (the tables to which the foreign keys in the stats tables refer) on this array. I also store the stats tables on this array. Finally, I put the indexes for the stats tables on disk array 4. I don't use disk array 1 because I believe it is a software array.

I'm out of ideas how to improve this picture any further. I'd appreciate some suggestions. Thanks.

--
Guy Rouillier