Re: [PERFORM] Postgresql works too slow
Simon Riggs wrote:
> On Mon, 2005-04-18 at 08:50 +0400, Nurlan Mukhanov (AL/EKZ) wrote:
>> I'm trying to restore my database from a dump in several parallel
>> processes, but the restore process works too slowly.
>> Number of rows: about 100,000,000
>> RAM: 8192M
>> CPU: UltraSPARC 3
>> Number of CPUs: 4
>> OS: SunOS sun 5.8
>> RDBMS: PostgreSQL 8.0
>> How do I increase PostgreSQL's speed? Why does postgres take only
>> 5.0% of CPU time?
>
> When you say restore... what are you actually doing? An archive
> recovery? A reload? A file-level restore of the database?

If you are doing a restore off a pg_dump, did you dump the data as INSERTs? This takes a lot more time to restore.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.

---(end of broadcast)---
TIP 8: explain analyze is your friend
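For reference, a dump taken in the default COPY-based format restores far faster than one taken with per-row INSERTs. A minimal sketch (database and file names here are hypothetical):

```bash
# Fast to restore: the default dump format reloads data with COPY.
pg_dump mydb > mydb.sql

# Slow to restore: --inserts emits one INSERT statement per row.
pg_dump --inserts mydb > mydb_inserts.sql

# Either file reloads through psql; the COPY-based dump loads rows in bulk.
psql -d mydb_restored -f mydb.sql
```

If the dump was already taken with INSERTs, wrapping the reload in a single transaction helps, but re-dumping in the default format is the real fix.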
Re: [PERFORM] Moving pg_xlog
Keith Worthington wrote:
> On Wed, 01 Jun 2005 12:19:40 -0400, Tom Lane wrote:
>> Keith Worthington <[EMAIL PROTECTED]> writes:
>>> I have been reading about increasing PostgreSQL performance by
>>> relocating the pg_xlog to a disk other than the one where the
>>> database resides. I have the following pg_xlogs on my system:
>>>
>>>   /raid02/databases/pg_xlog
>>>   /raid02/rhdb_databases/pg_xlog
>>>   /raid02/databases-8.0.0/pg_xlog
>>>   /var/lib/pgsql/data/pg_xlog
>>>
>>> I have no idea why the fourth entry is there. It is in the PostgreSQL
>>> installation directory.
>>
>> It's there because the RPM sets up a database under /var/lib/pgsql/data.
>>
>>> 1) stop the postmaster
>>> 2) rm -rf /var/lib/pgsql/data/pg_xlog
>>> 3) mv /raid02/databases/pg_xlog /var/lib/pgsql/data/pg_xlog
>>> 4) ln -s /var/lib/pgsql/data/pg_xlog /raid02/databases/pg_xlog
>>> 5) start postmaster
>>
>> Put the xlog anywhere BUT there!
>>
>>> If I can do that and place the pg_xlog in the installation directory,
>>> will I create any installation issues the next time I upgrade
>>> PostgreSQL?
>>
>> Oh, the installation will be just fine ... but your database will not
>> be, after the upgrade wipes out your WAL. Put the xlog under some
>> non-system-defined directory.
>>
>> regards, tom lane
>
> Thanks Tom. I am glad I asked before I leaped. 8-0
>
> Is there a convention that most people follow? It would seem that
> anywhere in the installation directory is a bad idea. From what I have
> read on other threads, it does not want to be in the database directory,
> since in most cases that would put it on the same disk as the database.

We tend to use something that associates the WAL with the appropriate cluster, like /var/lib/CLUSTER for the data and /var/lib/CLUSTER_WAL for the WAL files.

> I am assuming, due to the lack of reaction, that the symbolic link is
> not an issue. Is there a cleaner or more appropriate way of moving the
> pg_xlog?

A symbolic link is the standard way to do it.
> Finally, am I correct in assuming that as long as the postmaster is
> shut down, moving the log is safe?

You are correct. Moving the WAL files with the postmaster running would be a very bad thing.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
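Putting the thread's advice together, a sketch of the safe procedure (the cluster path and WAL destination are hypothetical; adjust to your layout and take a backup first):

```bash
# 1. Stop the postmaster so nothing is writing WAL.
pg_ctl -D /var/lib/CLUSTER stop

# 2. Move the WAL directory to a non-system-defined location,
#    ideally on a different physical disk than the data.
mv /var/lib/CLUSTER/pg_xlog /var/lib/CLUSTER_WAL

# 3. Leave a symbolic link behind so the server still finds it.
ln -s /var/lib/CLUSTER_WAL /var/lib/CLUSTER/pg_xlog

# 4. Restart the postmaster.
pg_ctl -D /var/lib/CLUSTER start
```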
Re: [PERFORM] 'Real' auto vacuum?
Mindaugas Riauba wrote:
>>> When a row is orphaned, it's added to a list of possibly available
>>> rows. When a new row is needed, the list of possible rows is
>>> examined, and the first one with a transaction id less than the
>>> lowest running transaction id is chosen to be the new row. These
>>> rows can be kept in a heap, so it's really fast to find one.
>>
>> This is the long-term plan. However, it's actually a lot harder than
>> it sounds. Patches welcome.
>
> Some ETA? Since that would be the most welcome addition for us. We
> have a few very heavily updated databases where table bloat and
> constant vacuuming are killing performance.

How often are you vacuuming (the definition of 'constantly' tends to vary)? Are you vacuuming the whole database each time? If so, identify which tables are being updated frequently, and vacuum those often. Vacuum other tables less frequently. Also, are you using VACUUM FULL? (If so, you certainly don't want to be.)

--
Brad Nicholson 416-673-4106 [EMAIL PROTECTED]
Database Administrator, Afilias Canada Corp.
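The advice above amounts to giving the hot tables their own vacuum schedule. A sketch (table names are hypothetical):

```sql
-- Run frequently, against the heavily-updated tables only.
-- Plain VACUUM does not take an exclusive lock, unlike VACUUM FULL.
VACUUM ANALYZE hot_orders;
VACUUM ANALYZE hot_sessions;

-- Run much less often, for the whole database.
VACUUM ANALYZE;
```

The frequent per-table vacuums keep dead space from accumulating faster than the free space map can track it, which is what turns bloat into a permanent problem.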
[PERFORM] Investigating IO Saturation
I'm investigating a potential IO issue. We're running 7.4 on AIX 5.1. During periods of high activity (reads, writes, and vacuums), we are seeing iostat reporting 100% disk usage. I have a feeling that the iostat numbers are misleading. I can make iostat usage jump from less than 10% to greater than 95% by running a single vacuum against a moderately sized table (no noticeable change in the other activity).

Do I actually have a problem with IO? Whether I do or not, at what point should I start to be concerned about IO problems? If my understanding is correct, it should be based on the wait time.

Here's the output of vmstat during heavy load (reads, writes, several daily vacuums, and a nightly pg_dump). Wait times appear to be all right, but my understanding of when to start being concerned about IO starvation is foggy at best.

  vmstat 5
  kthr     memory              page               faults         cpu
  ----- -------------- -------------------- ----------------- -----------
   r  b     avm    fre re pi po   fr    sr cy   in    sy    cs us sy id wa
   2  2 1548418  67130  0  0  0  141    99  0 1295 22784 13128 11  4 71 14
   3  3 1548422  66754  0  0  0 2127  2965  0 2836 29981 25091 26  4 39 31
   2  3 1548423  66908  0  0  0 2369  3221  0 3130 34725 28424 25  7 38 30
   3  5 1548423  67029  0  0  0 2223  3097  0 2722 31885 25929 26  9 33 32
   3  3 1548423  67066  0  0  0 2366  3194  0 2824 43546 36226 30  5 35 31
   2  4 1548423  67004  0  0  0 2123  3236  0 2662 25756 21841 22  4 39 35
   2  4 1548957  66277  0  0  0 1928 10322  0 2941 36340 29906 28  6 34 33
   3  5 1549245  66024  0  0  0 2324 14291  0 2872 39413 25615 25  4 34 37
   2  6 1549282  66107  0  0  0 1930 11189  0 2832 72062 32311 26  5 32 38
   2  4 1549526  65855  0  0  0 2375  9278  0 2822 40368 32156 29  5 37 29
   2  3 1548984  66227  0  0  0 1732  5065  0 2825 39240 30788 26  5 40 30
   3  4 1549341  66027  0  0  0 2325  6453  0 2790 37567 30509 28  5 37 30
   2  4 1549377  65789  0  0  0 1633  2731  0 2648 35533 27395 20  5 39 36
   1  5 1549765  65666  0  0  0 2272  3340  0 2792 43002 34090 26  5 29 40
   2  3 1549787  65646  0  0  0 1779  2679  0 2596 37446 29184 22  5 37 36
   2  5 1548985  66263  0  0  0 2077  3086  0 2778 49579 39940 26  9 35 30
   2  4 1548985  66473  0  0  0 2078  3093  0 2682 23274 18460 22  3 41 34
   4  3 1548985  66263  0  0  0 2177  3344  0 2734 43029 35536 29  5 38 28
   1  4 1548985  66491  0  0  0 1978  3215  0 2739 28291 22672 23  4 41 32
   3  3 1548985  66422  0  0  0 1732  2469  0 2852 71865 30850 28  5 38 29

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Investigating IO Saturation
Joshua D. Drake wrote:
> Brad Nicholson wrote:
>> I'm investigating a potential IO issue. We're running 7.4 on AIX 5.1.
>> During periods of high activity (reads, writes, and vacuums), we are
>> seeing iostat reporting 100% disk usage. I have a feeling that the
>> iostat numbers are misleading. I can make iostat usage jump from less
>> than 10% to greater than 95% by running a single vacuum against a
>> moderately sized table (no noticeable change in the other activity).
>
> Well, that isn't surprising. Vacuum is brutal, especially on 7.4, as
> that is pre background writer. What type of IO do you have available
> (RAID, SCSI)?

The data LUN is RAID 10, the WAL LUN is RAID 1.

--
Brad Nicholson 416-673-4106 [EMAIL PROTECTED]
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] vacuum full seems to hang on very small table
Dan Harris wrote:
> I have a table with 1 live row that I found has 115000 dead rows in it
> (from a testing run). I'm trying to VACUUM FULL the table, and it has
> run for over 18 hours without completing. Considering the hardware on
> this box and the fact that performance seems reasonable in all other
> aspects, I'm confused as to why this would happen. The database other
> than this table is quite large (70 gigs on disk), and I would expect
> that to take days to complete, but I just did 'vacuum full
> table_stats'. That should only do that table, correct? I'm running
> 8.0.3.

VACUUM FULL requires an exclusive lock on the table that it's vacuuming. Chances are something else has a lock on the table that is blocking the vacuum from obtaining the necessary lock. Check pg_locks for ungranted locks; you'll probably find that the request from the vacuum is ungranted.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
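Something along these lines will show the waiting lock requests (the regclass cast turns the relation OID into a table name):

```sql
-- Lock requests that have not been granted yet; the stuck VACUUM FULL
-- should show up here, waiting on its AccessExclusiveLock.
SELECT pid, relation::regclass AS table_name, mode, granted
FROM pg_locks
WHERE NOT granted;
```

Once you know the relation, querying pg_locks again for granted locks on that same relation identifies the backend holding the conflicting lock.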
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
Tom Lane wrote:
> This is unfortunately not going to help you as far as getting that
> machine into production now (unless you're brave enough to run CVS tip
> as production, which I certainly am not). I'm afraid you're most
> likely going to have to ship that pSeries back at the end of the
> month, but while you've got it, it'd be awfully nice if we could use
> it as a testbed.

We have pSeries boxes here that won't be going away anytime soon. If there are any specific test cases that need to run, I should be able to find the time to do it.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] newly created database makes queries run 300% faster
Ksenia Marasanova wrote:
> Hi,
> After hours of adjusting the performance of the queries in my Postgres
> 7.3 database - reprogramming the queries, VACUUMing, changing the
> value of enable_seqscan - I gave up, recreated the database, and
> transferred the dump of the old database into it. The queries went
> from 15 sec to 50 msec!! Wow. Now I would really love to know how the
> old database got that slow, and how I can avoid it in the future. Any
> tips are greatly appreciated!

If memory serves me (and it might not in this case), vacuum in 7.3 had issues with indexes. Reindexing or clustering your tables might have helped. Both are blocking operations.

How to avoid it in the future is simple: upgrade to a modern version of Postgres and vacuum your database properly. People work on this thing for a reason :-)

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
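For the record, the two blocking repairs mentioned look roughly like this (table and index names are hypothetical; note that CLUSTER used the `indexname ON tablename` syntax in releases of that era):

```sql
-- Rebuild every index on a bloated table (takes an exclusive lock):
REINDEX TABLE orders;

-- Or rewrite the table itself in index order, which also compacts the
-- heap (pre-8.3 syntax; blocking as well):
CLUSTER orders_pkey ON orders;
```

Either one removes index bloat; CLUSTER additionally rewrites the heap, which is why a dump-and-reload, which rebuilds everything from scratch, had the same dramatic effect.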
Re: [PERFORM] Vacuum not identifying rows for removal..
On Mon, 2006-08-21 at 11:50 -0700, Eamonn Kent wrote:
> So, my best guess is that something in our application is preventing
> vacuum from removing dead rows. What could cause this? Would it be
> caused by a long-living transaction? What is the best way to track the
> problem down? Right now, I am looking through pg_stat_activity and
> pg_locks to find processes that are "in transaction" and what locks
> they are holding.

If you have any long-running transactions - idle or active - that's your problem. Vacuum can only clear out dead tuples older than the oldest transaction. Deal with those. Make sure every single transaction your app initiates commits or rolls back every single time. You'll generally find them in pg_stat_activity, but not always; ps may show you idle transactions not showing as idle in pg_stat_activity.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
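On servers of this vintage, idle transactions report a literal '<IDLE> in transaction' query string, so a query along these lines (a sketch; the column names are the pre-8.3 ones) lists them oldest-first:

```sql
-- Sessions sitting idle inside an open transaction, oldest first.
-- These pin the vacuum horizon and prevent dead-tuple removal.
SELECT procpid, usename, query_start
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY query_start;
```

Anything old in that list is a candidate for fixing in the application (a connection that began a transaction and never committed or rolled back).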
Re: [PERFORM] Identifying bloated tables
On Mon, 2006-08-28 at 16:39 +0200, Michal Taborsky - Internet Mall wrote:
> I just put together a view which helps us in identifying which
> database tables are suffering from space bloat, i.e. they take up much
> more space than they actually should. I thought this might be useful
> for some folk here, because questions about bloat-related performance
> degradation are quite common.

Are you sure you haven't reinvented the wheel? Have you checked out contrib/pgstattuple?

Brad.
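For comparison, contrib/pgstattuple reports the dead-space figures directly, at the cost of scanning the table (the table name here is hypothetical):

```sql
-- After installing the pgstattuple contrib module:
SELECT * FROM pgstattuple('orders');
-- Returns table_len, tuple_count, tuple_len, dead_tuple_count,
-- dead_tuple_len, free_space, free_percent, and related columns.
```

A high dead_tuple_percent or free_percent on a frequently-queried table is the signature of the bloat the view above is trying to detect.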
Re: [PERFORM] PostgreSQL and sql-bench
On Thu, 2006-09-21 at 07:52 -0700, yoav x wrote:
> Hi,
> After upgrading DBI and DBD::Pg, this benchmark still picks MySQL as
> the winner (at least on Linux RH3 on a Dell 1875 server with 2
> hyperthreaded 3.6GHz CPUs and 4GB RAM). I've applied the following
> parameters to postgresql.conf:
>
>   max_connections = 500
>   shared_buffers = 3000
>   work_mem = 10
>   effective_cache_size = 30
>
> Most queries still perform slower than with MySQL. Is there anything
> else that can be tweaked, or is this a limitation of PG or the
> benchmark?

As mentioned by others, you are using a benchmark that is slanted towards MySQL.
Re: [PERFORM] recommended benchmarks
On Fri, 2006-09-22 at 13:14 -0400, Charles Sprickman wrote:
> Hi all,
> I still have a dual dual-core Opteron box with a 3Ware 9550SX-12
> sitting here, and I need to start getting it ready for production. I
> also have to send back one processor, since we were mistakenly sent
> two. Before I do that, I would like to record some stats for posterity
> and post them to the list so that others can see how this particular
> hardware performs. It looks to be more than adequate for our needs...
> What are the standard benchmarks that people here use for comparison
> purposes? I know all benchmarks are flawed in some way, but I'd at
> least like to measure with the same tools that folks here generally
> use, to get a ballpark figure.

Check out the OSDL stuff.

http://www.osdl.org/lab_activities/kernel_testing/osdl_database_test_suite/

Brad.
Re: [PERFORM] PostgreSQL Caching
On Wed, 2006-10-04 at 07:38 -0500, Dave Dutcher wrote:
>> -----Original Message-----
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
>> Adnan DURSUN
>>
>> I want to be able to read an execution plan when I look at it. So, is
>> there any doc about how it should be read?
>
> You are asking how to read the output from EXPLAIN? This page is a
> good place to start:
> http://www.postgresql.org/docs/8.1/interactive/performance-tips.html

Robert Treat's "Explaining Explain" presentation from OSCON is also very good:

http://redivi.com/~bob/oscon2005_pgsql_pdf/OSCON_Explaining_Explain_Public.pdf

Brad.
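As a starting point, running EXPLAIN ANALYZE on even a trivial query shows the pieces those documents teach you to read (the table and index names are hypothetical, and the numbers will of course vary):

```sql
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;

-- The output has roughly this shape:
--   Index Scan using orders_customer_idx on orders
--     (cost=0.00..8.27 rows=1 width=64)
--     (actual time=0.031..0.034 rows=3 loops=1)
--
-- "cost" and the first "rows" are the planner's estimates; the
-- "actual" figures come from really executing the query. Large gaps
-- between estimated and actual row counts are the usual sign of a
-- bad plan (often fixable with ANALYZE or better statistics).
```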
Re: [PERFORM] PostgreSQL Caching
On Tue, 2006-10-03 at 18:29 -0700, Tomeh, Husam wrote:
>> * When any session updates data that is already in the shared
>> buffers, does Postgres synchronize the data to both disk and the
>> shared buffers area immediately?
>
> Not necessarily true. When a block is modified in the shared buffers,
> the modified block is written to the Postgres WAL log. A periodic DB
> checkpoint is performed to flush the modified blocks in the shared
> buffers to the data files.

Postgres 8.0 and beyond have a process called bgwriter that continually flushes dirty buffers to disk, to minimize the work that needs to be done at checkpoint time.
Re: [PERFORM] Priority to a mission critical transaction
On Tue, 2006-11-21 at 21:43 -0200, Carlos H. Reimer wrote:
> Hi,
> We have an application that is mission critical and normally very
> fast, but when an I/O- or CPU-bound transaction appears, the mission
> critical application suffers. Is there a way to give some kind of
> priority to this kind of application?
> Reimer

Not that I'm aware of. Depending on what the problem transactions are, setting up a replica on a separate machine and running those transactions against the replica might be the solution.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] New to PostgreSQL, performance considerations
On Mon, 2006-12-11 at 20:22 -0200, Daniel van Ham Colchete wrote:
> I'm thinking about writing a script to run all the tests (more than 3
> times each), get the data, and plot some graphs. I don't have the time
> right now to do it; maybe next week I will.

Check out the OSDL test suite stuff. It runs the tests and handles all the reporting for you (including graphs).

http://www.osdl.org/lab_activities/kernel_testing/osdl_database_test_suite/

> I invite everyone to comment/suggest on the procedure or the results.

I'd recommend running each test for quite a bit longer. Get your buffers dirty and exercised, and see what things look like then. Also, if you have the disk, offload your WAL files to a separate disk.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Not Picking Index
On Fri, 2007-02-16 at 20:01 +0530, Gauri Kanekar wrote:
> I want the planner to ignore a specific index. I am testing some query
> output, and for that purpose I don't want the index used. Is it
> possible to have the planner ignore an index?

If the indexed field is an integer, add 0 to it.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
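The trick works because the planner will not use a plain index on a column once the column is wrapped in an expression. A sketch (table and column names are hypothetical):

```sql
-- Can use an index on id, if one exists:
SELECT * FROM orders WHERE id = 42;

-- The added no-op expression defeats the index, forcing another plan:
SELECT * FROM orders WHERE id + 0 = 42;

-- The equivalent trick for text columns is appending an empty string:
-- ... WHERE name || '' = 'foo'
```

For a quick session-local test, `SET enable_indexscan = off;` is another way to steer the planner away from index scans, though it disables them for all indexes rather than one.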
[PERFORM] Long Running Commits - Not Checkpoints
I'm having a problem with long-running commits appearing in my database logs. It may be hardware related, as the problem appeared when we moved the database to a new server connected to a different disk array. The disk array is a lower-class array, but still more than powerful enough to handle the IO requirements. One big difference, though, is that the old array had 16 GB of cache; the new one has 4 GB.

We are running Postgres 8.1.8 on AIX 5.3.

We have enough IO to spare that we have the bgwriter cranked up pretty high; dirty buffers are getting written quickly. Vmstat indicates 0 io wait time, and no swapping or anything nasty like that going on. The long-running commits do not line up with checkpoint times.

The postgresql.conf configs are identical, except that wal_buffers was 8 on the old master, and it is set to 16 on the new one.

We have other installations of this product running on the same array (different servers, though), and they are not suffering from this problem. The only other thing of note is that the wal files sit on the same disk as the data directory. This has not changed between the old and new config, but the installs that are running fine do have their wal files on a separate partition.

Any ideas where the problem could lie? Could having the wal files on the same partition as the data cause long-running commits when there is plenty of IO to spare?

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Long Running Commits - Not Checkpoints
On Thu, 2007-09-13 at 10:15 -0400, Brad Nicholson wrote:
> I'm having a problem with long-running commits appearing in my
> database logs. It may be hardware related, as the problem appeared
> when we moved the database to a new server connected to a different
> disk array.
> [snip]
> The only other thing of note is that the wal files sit on the same
> disk as the data directory. This has not changed between the old and
> new config, but the installs that are running fine do have their wal
> files on a separate partition.

More on this - we also have long-running commits on installations that do have the wal files on a separate partition.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Long Running Commits - Not Checkpoints
On Thu, 2007-09-13 at 11:10 -0400, Tom Lane wrote:
> Brad Nicholson <[EMAIL PROTECTED]> writes:
>> On Thu, 2007-09-13 at 10:15 -0400, Brad Nicholson wrote:
>>> I'm having a problem with long-running commits appearing in my
>>> database logs. It may be hardware related, as the problem appeared
>>> when we moved the database to a new server connected to a different
>>> disk array.
>>
>> More on this - we also have long-running commits on installations
>> that do have the wal files on a separate partition.
>
> What's your definition of "long running commit" --- seconds?
> milliseconds? Exactly what are you measuring? Can you correlate the
> problem with [...]?

log_min_duration_statement is set to 150 ms. Commits are running over that, up to 788 ms. Here is what we see in the logs (with obfuscated dbname, username, and IP):

2007-09-13 10:01:49.787 CUT [782426] dbname username 1.2.3.171 LOG: duration: 224.286 ms statement: EXECUTE <unnamed> [PREPARE: commit]
2007-09-13 10:19:16.373 CUT [737404] dbname username 1.2.3.174 LOG: duration: 372.545 ms statement: EXECUTE <unnamed> [PREPARE: commit]
2007-09-13 10:19:24.437 CUT [1806498] dbname username 11.2.3.171 LOG: duration: 351.544 ms statement: EXECUTE <unnamed> [PREPARE: commit]
2007-09-13 10:33:11.204 CUT [962598] dbname username 1.2.3.170 LOG: duration: 504.057 ms statement: EXECUTE <unnamed> [PREPARE: commit]
2007-09-13 10:40:33.735 CUT [1282104] dbname username 1.2.3.174 LOG: duration: 250.127 ms statement: EXECUTE <unnamed> [PREPARE: commit]
2007-09-13 10:49:54.752 CUT [1188032] dbname username 1.2.3.170 LOG: duration: 382.781 ms statement: EXECUTE <unnamed> [PREPARE: commit]
2007-09-13 11:30:43.339 CUT [1589464] dbname username 1.2.3.172 LOG: duration: 408.463 ms statement: EXECUTE <unnamed> [PREPARE: commit]

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Long Running Commits - Not Checkpoints
On Thu, 2007-09-13 at 12:12 -0400, Greg Smith wrote:
> On Thu, 13 Sep 2007, Brad Nicholson wrote:
> I'd be curious to see how you've got your background writer
> configured, to see if it matches situations like this I've seen in the
> past. The parameters controlling the all scan are the ones you might
> consider turning down: definitely the percentage, and possibly the
> maxpages as well.

bgwriter_delay = 50           # 10-10000 milliseconds between rounds
bgwriter_lru_percent = 20.0   # 0-100% of LRU buffers scanned/round
bgwriter_lru_maxpages = 300   # 0-1000 buffers max written/round
bgwriter_all_percent = 20     # 0-100% of all buffers scanned/round
bgwriter_all_maxpages = 600   # 0-1000 buffers max written/round

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Long Running Commits - Not Checkpoints
On Thu, 2007-09-13 at 12:19 -0400, Brad Nicholson wrote:
> On Thu, 2007-09-13 at 12:12 -0400, Greg Smith wrote:
>> I'd be curious to see how you've got your background writer
>> configured, to see if it matches situations like this I've seen in
>> the past. The parameters controlling the all scan are the ones you
>> might consider turning down: definitely the percentage, and possibly
>> the maxpages as well.
>
> bgwriter_delay = 50           # 10-10000 milliseconds between rounds
> bgwriter_lru_percent = 20.0   # 0-100% of LRU buffers scanned/round
> bgwriter_lru_maxpages = 300   # 0-1000 buffers max written/round
> bgwriter_all_percent = 20     # 0-100% of all buffers scanned/round
> bgwriter_all_maxpages = 600   # 0-1000 buffers max written/round

I should add, there are 6 back ends running on this disk array (different servers and different data partitions) with these bgwriter settings.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Long Running Commits - Not Checkpoints
On Thu, 2007-09-13 at 12:12 -0400, Greg Smith wrote:
> Since you're probably not monitoring I/O waits and similar statistics
> on how the disk array's cache is being used, whether this is happening
> or not to you won't be obvious from what the operating system is
> reporting.

A sysadmin looked at cache usage on the disk array. The read cache is being used heavily, and the write cache is not.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Curious about dead rows.
On Wed, 2007-11-14 at 17:46 -0500, Tom Lane wrote:
> Russell Smith <[EMAIL PROTECTED]> writes:
>> Is it possible that analyze is not getting the number of dead rows
>> right?
>
> Hah, I think you are on to something. ANALYZE is telling the truth
> about how many "dead" rows it saw, but its notion of "dead" is "not
> good according to SnapshotNow". Thus, rows inserted by a
> not-yet-committed transaction would be counted as dead. So if these
> are background auto-analyzes being done in parallel with inserting
> transactions that run for awhile, seeing a few not-yet-committed rows
> would be unsurprising.
>
> I wonder if that is worth fixing? I'm not especially concerned about
> the cosmetic aspect of it, but if we mistakenly launch an autovacuum
> on the strength of an inflated estimate of dead rows, that could be
> costly.

Sounds to me like that could result in autovacuum kicking off while doing large data loads. This sounds suspiciously like a problem someone on -novice was having - tripping over a Windows autovac bug while doing a data load:

http://archives.postgresql.org/pgsql-novice/2007-11/msg00025.php

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
[PERFORM] 7.4 Checkpoint Question
I have a legacy system still on 7.4 (I know, I know... the upgrade is coming soon). I have a fairly big spike happening once a day, every day, at the same time. It happens during a checkpoint - no surprise there. I know the solution to the problem (upgrade to a modern version), but what I'm looking for is an explanation as to why one particular checkpoint would be so bad on a low-volume system, so I can appease certain management concerns.

This is a _really_ low-volume system: less than 500 writes/hour. Normal operation sees checkpoint-related spikes of around 200-300 milliseconds. We always checkpoint at the checkpoint timeout (every 5 minutes). During this one checkpoint, I'm seeing transactions running 2-3 seconds. During this time, writes are 5/minute.

Relevant settings:

  shared_buffers = 1
  checkpoint_segments = 30
  checkpoint_timeout = 300

What gives?

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] 7.4 Checkpoint Question
On Thu, 2007-11-29 at 16:14 +0000, Simon Riggs wrote:
> On Thu, 2007-11-29 at 10:10 -0500, Brad Nicholson wrote:
>> I have a legacy system still on 7.4 (I know, I know... the upgrade is
>> coming soon). I have a fairly big spike happening once a day, every
>> day, at the same time. It happens during a checkpoint - no surprise
>> there.
>> [snip]
>> What gives?
>
> If the timing is regular, it's most likely a human-initiated action
> rather than a behavioural characteristic. VACUUM runs in the
> background at that time and updates loads of blocks, which then need
> to be written out at checkpoint time. That slows queries down at that
> time, but not at others.

Bingo. The big daily vacuum completes shortly before this checkpoint. Thanks.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Replication Syatem
On Mon, 2008-04-28 at 19:35 +0530, Gauri Kanekar wrote:
> Peter,
> We are doing vacuum full every alternate day. We also do vacuum
> analyze very often. We are currently using version 8.1.3. Autovacuum
> is already on, but table1 is so busy that autovacuum doesn't get a
> sufficient chance to vacuum it :(.

You should seriously consider upgrading to PG 8.3. There have been substantial improvements to VACUUM since 8.1.

Brad.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Monitoring buffercache...
On Mon, 2008-11-24 at 11:43 -0700, Kevin Kempter wrote:
> Hi All;
> I've installed pg_buffercache and I want to use it to help define the
> optimal shared_buffers size.
>
> Currently I run this every 15 min via cron:
>
>   insert into buffercache_stats
>   select now(), isdirty, count(*) as buffers, (count(*) * 8192) as memory
>   from pg_buffercache
>   group by 1,2;
>
> and here's its explain plan:
>
>   explain insert into buffercache_stats select now(), isdirty,
>   count(*) as buffers, (count(*) * 8192) as memory from pg_buffercache
>   group by 1,2;
>                                    QUERY PLAN
>   ----------------------------------------------------------------------------
>    Subquery Scan "*SELECT*"  (cost=65.00..65.23 rows=2 width=25)
>      ->  HashAggregate  (cost=65.00..65.12 rows=2 width=1)
>            ->  Function Scan on pg_buffercache_pages p  (cost=0.00..55.00 rows=1000 width=1)
>   (3 rows)
>
> Then once a day I will pull a report from the buffercache_stats table.
> The buffercache_stats table is our own creation:
>
>   \d buffercache_stats
>              Table "public.buffercache_stats"
>        Column     |            Type             | Modifiers
>   ----------------+-----------------------------+-----------
>    snap_timestamp | timestamp without time zone |
>    isdirty        | boolean                     |
>    buffers        | integer                     |
>    memory         | integer                     |
>
> Here's my issue: the server that we'll eventually roll this out to is
> extremely busy, and the every-15-min query above has the potential to
> have a huge impact on performance.

I wouldn't routinely run pg_buffercache on a busy database. Plus, I don't think that pg_buffercache will answer this question for you. It will tell you what's currently in the buffer pool and its clean/dirty status, but that's not the first place I'd look. What you really need is to figure out the hit ratio on the buffer pool and go from there.

> Does anyone have any suggestions for a better approach, or maybe a way
> to improve the performance of the above query?

You should be able to use the blocks-hit vs blocks-read data in the pg_stat_database view (for the overall database), and drill down into pg_statio_user_tables/pg_statio_all_tables to get more detailed data if you want.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.
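A hit-ratio query along these lines, built on the standard statistics views, is far cheaper than scanning the buffer pool (a sketch; adjust the rounding to taste):

```sql
-- Buffer-cache hit percentage per database, worst first.
-- blks_hit counts reads satisfied from shared buffers,
-- blks_read counts reads that had to go to the kernel/disk.
SELECT datname,
       blks_read,
       blks_hit,
       round(blks_hit * 100.0 / nullif(blks_hit + blks_read, 0), 2)
           AS hit_pct
FROM pg_stat_database
ORDER BY hit_pct;
```

A consistently low hit percentage under normal load is the signal that shared_buffers may be too small; a high one suggests growing it further will not help much.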
-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
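Brad's pg_stat_database suggestion can be sketched as a single query. This is a sketch against the statistics views of that era, not something from the original mail; `blks_hit` and `blks_read` are the relevant columns:

```sql
-- Rough buffer-cache hit ratio for the current database.
-- blks_hit  = reads satisfied from shared_buffers
-- blks_read = reads that had to go to the OS / disk
SELECT datname,
       blks_hit,
       blks_read,
       round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 4) AS hit_ratio
FROM pg_stat_database
WHERE datname = current_database();
```

The same calculation can be drilled down per table using heap_blks_hit/heap_blks_read in pg_statio_user_tables.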
Re: [PERFORM] Monitoring buffercache...
On Mon, 2008-11-24 at 12:46 -0700, Scott Marlowe wrote: On Mon, Nov 24, 2008 at 11:43 AM, Kevin Kempter [EMAIL PROTECTED] wrote: Hi All; I've installed pg_buffercache and I want to use it to help define the optimal shared_buffers size. Currently I run this every 15 minutes via cron: insert into buffercache_stats select now(), isdirty, count(*) as buffers, (count(*) * 8192) as memory from pg_buffercache group by 1,2; and here's its explain plan: Subquery Scan "*SELECT*" (cost=65.00..65.23 rows=2 width=25) -> HashAggregate (cost=65.00..65.12 rows=2 width=1) -> Function Scan on pg_buffercache_pages p (cost=0.00..55.00 rows=1000 width=1) (3 rows). Then once a day I will pull a report from the buffercache_stats table. The buffercache_stats table is our own creation, with columns snap_timestamp (timestamp without time zone), isdirty (boolean), buffers (integer) and memory (integer). Here's my issue: the server that we'll eventually roll this out to is extremely busy, and the every-15-minutes query above has the potential to have a huge impact on performance. Does anyone have any suggestions for a better approach, or maybe a way to improve the performance of the above query?

I wouldn't worry about running it every 15 minutes unless it's on a REALLY slow machine. I just ran it in a loop over and over on my 8-core Opteron server, and it ran the load factor up by almost exactly 1.0. Under our normal daily load, it sits at 1.9 to 2.5, and it climbed to 2.9 under the new load of running that query over and over. So, it doesn't seem to be blocking or anything.

The internal docs for pg_buffercache_pages.c state: "To get a consistent picture of the buffer state, we must lock all partitions of the buffer map. Needless to say, this is horrible for concurrency. Must grab locks in increasing order to avoid possible deadlocks." I'd be concerned about that running routinely.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Sl_log_1 and sl_log_2 not getting truncated.
On Wed, 2009-01-07 at 18:18 +0530, Nimesh Satam wrote: Hi, We have recently installed Slony and started replication on one of our test machines. When we start inserting data into the replicated database, the replication works properly. Over a period of time, the lag increases between the two databases.

You should sign up to the Slony list and ask your question there.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] scheduling autovacuum at lean hours only.
On Wed, 2009-02-11 at 22:57 +0530, Rajesh Kumar Mallah wrote: On Wed, Feb 11, 2009 at 10:03 PM, Grzegorz Jaśkiewicz gryz...@gmail.com wrote: On Wed, Feb 11, 2009 at 2:57 PM, Rajesh Kumar Mallah mallah.raj...@gmail.com wrote: vacuum_cost_delay = 150, vacuum_cost_page_hit = 1, vacuum_cost_page_miss = 10, vacuum_cost_page_dirty = 20, vacuum_cost_limit = 1000, autovacuum_vacuum_cost_delay = 300. Why is it not a good idea to give end users control over when they want to run it?

Effectively, you have control over autovacuum via these params. You have to remember that autovacuum doesn't cost much, and it makes the planner know more about the data. It's not there to clean up databases, as you might imagine - it is there to update stats and mark pages as free. So make sure you tweak that config first, because I have a funny feeling that you just think that vacuuming bogs down your machine, and _can_ be turned off without any bad consequences, which is simply not true.

Our usage pattern is such that peak activity (indicated by load average) during the daytime is 10 times that of the night hours. Autovacuum just puts more pressure on the system. If a less stressful version is used, then it takes longer to complete one cycle, which means less performance for a longer time. Less performance queues up queries and encourages people to resubmit their queries, which again adds to bogging down the system.

That's not exactly how it works in practice, if tuned properly. It may take longer, but it is less intensive while running. We had one system that had spikes happening due to the exact case you described - there were noticeably high IO wait times while certain tables were being vacuumed. We set the cost delay and the wait times dropped to the point where it was a non-issue. Vacuums take twice as long, but there is no measurable impact on performance.
In our case I feel the hardware is a bit underscaled compared to the load; that's why I think running in lean hours is the best of both worlds: no performance sacrifices and intelligent vacuuming.

That is a different issue altogether. Not vacuuming a running system at all during peak hours is not considered intelligent vacuuming, IMHO. There are plenty of use cases where small, frequent vacuums keep tables under control at a very low cost. Letting them go for extended periods of time without vacuuming causes bloat and eventual slowdowns to table access, which manifest in higher IO usage across the board. If you really are dead set on vacuuming only at night, you may want to do a careful analysis of which tables need to be vacuumed and when, and trigger manual vacuums from cron.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
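Brad's closing suggestion - analyze which tables need vacuuming and drive them from cron - might look something like the fragment below. The database and table names are placeholders, not from the thread:

```cron
# Sketch: vacuum/analyze the known-hot tables at 02:00, during the lean hours
# described above. "appdb", "hot_table_1" and "hot_table_2" are placeholders.
0 2 * * * /usr/bin/vacuumdb --analyze -t hot_table_1 -t hot_table_2 appdb >> /var/log/pg_vacuum.log 2>&1
```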
[PERFORM] Long Running Commits
backend, pid=667692 socket=9 2009-03-04 09:56:13.933 CUT [561402]DEBUG: server process (PID 864402) exited with exit code 0 2009-03-04 09:56:13.933 CUT [561402]DEBUG: server process (PID 594124) exited with exit code 0 2009-03-04 09:56:13.934 CUT [561402]DEBUG: server process (PID 852202) exited with exit code 0 2009-03-04 09:56:13.935 CUT [561402]DEBUG: server process (PID 2433156) exited with exit code 0 2009-03-04 09:56:13.935 CUT [561402]DEBUG: server process (PID 2216120) exited with exit code 0 2009-03-04 09:56:13.936 CUT [561402]DEBUG: server process (PID 1761484) exited with exit code 0 2009-03-04 09:56:13.936 CUT [561402]DEBUG: server process (PID 815268) exited with exit code 0 2009-03-04 09:56:13.936 CUT [561402]DEBUG: server process (PID 876668) exited with exit code 0 2009-03-04 09:56:13.937 CUT [561402]DEBUG: server process (PID 897060) exited with exit code 0 2009-03-04 09:56:13.937 CUT [561402]DEBUG: server process (PID 2199742) exited with exit code 0 2009-03-04 09:56:13.937 CUT [561402]DEBUG: server process (PID 913618) exited with exit code 0 2009-03-04 09:56:13.937 CUT [561402]DEBUG: server process (PID 2334774) exited with exit code 0 2009-03-04 09:56:13.984 CUT [561402]DEBUG: forked new backend, pid=2334776 socket=9 2009-03-04 09:56:14.065 CUT [602282] appdb appuser 1.2.3.3 LOG: duration: 872.457 ms statement: EXECUTE unnamed [PREPARE: commit] 2009-03-04 09:56:14.065 CUT [868552] appdb appuser 1.2.3.4 LOG: duration: 873.756 ms statement: EXECUTE unnamed [PREPARE: commit] 2009-03-04 09:56:14.075 CUT [2212000] appdb appuser 1.2.3.5 LOG: duration: 586.276 ms statement: EXECUTE unnamed [PREPARE: commit] 2009-03-04 09:56:14.076 CUT [561402]DEBUG: forked new backend, pid=913620 socket=9 2009-03-04 09:56:14.095 CUT [561402]DEBUG: forked new backend, pid=782510 socket=9 2009-03-04 09:56:14.106 CUT [561402]DEBUG: forked new backend, pid=2285776 socket=9 2009-03-04 09:56:14.179 CUT [561402]DEBUG: forked new backend, pid=999650 socket=9 2009-03-04 
09:56:14.194 CUT [561402]DEBUG: forked new backend, pid=1646620 socket=9 2009-03-04 09:56:14.245 CUT [561402]DEBUG: forked new backend, pid=921634 socket=9 2009-03-04 09:56:14.255 CUT [561402]DEBUG: forked new backend, pid=1257600 socket=9 2009-03-04 09:56:14.265 CUT [561402]DEBUG: forked new backend, pid=1622114 socket=9 2009-03-04 09:56:14.325 CUT [561402]DEBUG: forked new backend, pid=1519664 socket=9 2009-03-04 09:56:14.326 CUT [561402]DEBUG: forked new backend, pid=1839238 socket=9 2009-03-04 09:56:14.335 CUT [561402]DEBUG: forked new backend, pid=2412680 socket=9 2009-03-04 09:56:14.425 CUT [561402]DEBUG: forked new backend, pid=2007238 socket=9 2009-03-04 09:56:14.426 CUT [561402]DEBUG: server process (PID 2363424) exited with exit code 0 2009-03-04 09:56:14.426 CUT [561402]DEBUG: server process (PID 1749056) exited with exit code 0 2009-03-04 09:56:14.426 CUT [561402]DEBUG: server process (PID 2056242) exited with exit code 0 2009-03-04 09:56:14.427 CUT [561402]DEBUG: server process (PID 770156) exited with exit code 0 2009-03-04 09:56:14.433 CUT [561402]DEBUG: forked new backend, pid=770158 socket=9 2009-03-04 09:56:14.433 CUT [561402]DEBUG: server process (PID 2379944) exited with exit code 0 2009-03-04 09:56:14.484 CUT [561402]DEBUG: forked new backend, pid=2379946 socket=9 2009-03-04 09:56:14.485 CUT [561402]DEBUG: server process (PID 1130616) exited with exit code 0 2009-03-04 09:56:14.494 CUT [561402]DEBUG: forked new backend, pid=1130618 socket=9 2009-03-04 09:56:14.505 CUT [561402]DEBUG: forked new backend, pid=516338 socket=9 2009-03-04 09:56:14.506 CUT [561402]DEBUG: forked new backend, pid=2199744 socket=9 -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] SSD + RAID
Greg Smith wrote: Karl Denninger wrote: With the write cache off on these disks they still are huge wins for very-heavy-read applications, which many are. Very read-heavy applications would do better to buy a ton of RAM instead and just make sure they populate from permanent media (say by reading everything in early at sequential rates to prime the cache). There is an extremely narrow use-case where SSDs are the right technology, and it's only in a subset even of read-heavy apps where they make sense.

Out of curiosity, what are those narrow use cases where you think SSDs are the correct technology?

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] SSD + RAID
On Tue, 2009-11-17 at 11:36 -0500, Merlin Moncure wrote: 2009/11/13 Greg Smith g...@2ndquadrant.com: As far as what real-world apps have that profile, I like SSDs for small to medium web applications that have to be responsive, where the user shows up and wants their randomly distributed and uncached data with minimal latency. SSDs can also be used effectively as second-tier targeted storage for things that have a performance-critical but small and random bit as part of a larger design that doesn't have those characteristics; putting indexes on SSD can work out well for example (and there the write durability stuff isn't quite as critical, as you can always drop an index and rebuild if it gets corrupted). I am right now talking to someone on PostgreSQL IRC who is measuring 15k iops from an X25-E and no data loss following a power plug test. I am becoming increasingly suspicious that Peter's results are not representative: given that 90% of bonnie++ seeks are read only, the math doesn't add up, and they contradict broadly published tests on the internet. Has anybody independently verified the results?

How many times have they run the plug test? I've read other reports of people (not on Postgres) losing data on this drive with the write cache on.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Is Diskeeper Automatic Mode safe?
On Mon, 2009-11-16 at 23:57 -0500, cb wrote: On Nov 16, 2009, at 8:31 PM, Tom Lane wrote: Myself and the other guy responsible for the underlying hardware have already gone down this route. The big bosses know our stance and know it isn't us preventing the upgrade. After that, there isn't too much more I can do except sit back and shake my head each time something goes wrong and I get sent on a wild goose chase to find any reason for the failure OTHER than PG.

What you need to do is stop the wild goose chases. If the problem is your PG version, no amount of investigation into other areas is going to change that. Your company is simply wasting money by ignoring this and blindly hoping that the problem will be something else. It can be a difficult battle, but it can be won.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] SSD + RAID
On Thu, 2009-11-19 at 19:01 +0100, Anton Rommerskirchen wrote: On Thursday, 19 November 2009 13:29:56, Craig Ringer wrote: On 19/11/2009 12:22 PM, Scott Carey wrote: 3: Have PG wait a half second (configurable) after the checkpoint fsync() completes before deleting/overwriting any WAL segments. This would be a trivial feature to add to a postgres release, I think. How does that help? It doesn't provide any guarantee that the data has hit main storage - it could lurk in SSD cache for hours. 4: Yet another solution: The drives DO adhere to write barriers properly. A filesystem that used these in the process of fsync() would be fine too. So XFS without LVM or MD (or the newer versions of those that don't ignore barriers) would work too. *if* the WAL is also on the SSD. If the WAL is on a separate drive, the write barriers do you no good, because they won't ensure that the data hits the main drive storage before the WAL recycling hits the WAL disk storage. The two drives operate independently and the write barriers don't interact. You'd need some kind of inter-drive write barrier. -- Craig Ringer

Hello! As I understand this: SSD performance is great, but caching is the problem. Questions: 1. What about conventional disks with 32/64 MB cache? How do they handle the plug test if their caches are on?

If they aren't battery backed, they can lose data. This is not specific to SSD.

2. What about using a separate power supply for the disks? Is it possible to write back the cache after switching the SATA to another machine controller?

Not sure. I only use devices with battery backed caches or no cache. I would be concerned however about the drive not flushing itself and still running out of power.

3. What about making a statement about a lacking enterprise feature (aka an emergency-battery-equipped SSD) and submitting this to the producers?
The producers aren't making Enterprise products; they are using caches to accelerate the speeds of consumer products to make their drives more appealing to consumers. They aren't going to slow them down to make them more reliable, especially when the core consumer doesn't know about this issue, and is even less likely to understand it if explained. They may stamp the word Enterprise on them, but it's nothing more than marketing.

I found that one of them (OCZ) seems to handle suggestions from customers (see the write speed discussions on the Vertex, for example) and another (Intel) seems to handle serious problems with its disks by rewriting and sometimes redesigning its products - if you tell them and the market dictates that they react (see the degradation of performance before the 1.11 firmware). Perhaps it's time to act and not only to complain about the fact.

Or, you could just buy higher quality equipment that was designed with this in mind. There is nothing unique to SSD here, IMHO. I wouldn't run my production grade databases on consumer grade HDDs; I wouldn't run them on consumer grade SSDs either.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] DB is slow until DB is reloaded
On Mon, 2010-01-04 at 15:53 -0500, Madison Kelly wrote: Gary Doades wrote: From your queries it definitely looks like it's your stats that are the problem. When the stats get well out of date, the planner is choosing a hash join because it thinks thousands of rows are involved, whereas only a few are actually involved. That's why, with better stats, the second query is using a loop join over very few rows and running much quicker. Therefore it's ANALYZE you need to run as well as regular VACUUMing. There should be no need to VACUUM FULL at all as long as you VACUUM and ANALYZE regularly. Once a day may be enough, but you don't say how long it takes your database to become slow. You can VACUUM either the whole database (often easiest) or individual tables if you know in more detail what the problem is and that only certain tables need it. Setting up autovacuum may well be sufficient. Cheers, Gary.

That explains things, thank you! For the record: it was taking a few months for the performance to become intolerable. I've added CLUSTER - ANALYZE - VACUUM to my nightly routine and dropped the VACUUM FULL call. I'll see how this works.

I think you are going down the wrong route here - you should be looking at preventative maintenance instead of fixing it after it's broken. Ensure that autovacuum is running for the database (assuming that you are on a relatively modern version of PG), and possibly tune it to be more aggressive (we can help). This will ensure that the condition never comes up.

PS - if you do go with the route specified, there is no need to VACUUM after the CLUSTER. CLUSTER gets rid of the dead tuples - there's nothing for VACUUM to do.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
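Making autovacuum more aggressive, as suggested above, is a postgresql.conf change. The values below are illustrative starting points for an 8.x-era server, not tuned recommendations:

```ini
# postgresql.conf sketch -- illustrative values only
autovacuum = on
autovacuum_naptime = 1min              # look for work every minute
autovacuum_vacuum_scale_factor = 0.1   # vacuum after ~10% of a table changes
autovacuum_analyze_scale_factor = 0.05 # analyze after ~5% changes
autovacuum_vacuum_cost_delay = 20ms    # throttle the I/O impact
```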
Re: [PERFORM] DB is slow until DB is reloaded
On Mon, 2010-01-04 at 20:02 -0800, Craig James wrote: Madison Kelly wrote: You are right, autovacuum is not running after all. From your comment, I am wondering if you'd recommend I turn it on or not? If so, given that I doubt I will upgrade any time soon, how would I enable it? I suppose I could google that, but google rarely shares the gotchas. :)

Most of the pain of a Postgres upgrade is the dump/reload step. But you've already had to do that several times, so why the hesitation to upgrade? Upgrading Postgres to the latest release, even if you have to do it from the source code, takes almost no time at all compared to the time you've already burned trying to solve this problem.

Actually, the biggest pain going beyond 8.2 is the change to implicit casting. Do the upgrade, you won't regret it.

Agree.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] bgwriter tunables vs pg_stat_bgwriter
On Wed, 2010-02-17 at 08:30 -0500, Jeff wrote: Since getting on 8.4 I've been monitoring things fairly closely. I whipped up a quick script to monitor pg_stat_bgwriter and save deltas every minute so I can ensure my bgwriter is beating out the backends for writes (as it is supposed to do). Now, the odd thing I'm running into is this: bgwriter_delay is 100ms (i.e. 10 times a second, give or take), bgwriter_lru_maxpages is 500 (~5000 pages/second), bgwriter_lru_multiplier is 4. Now, assuming I understand these values right, the following is what should typically happen:

while (true)
{
    if (buffers_written > bgwriter_lru_maxpages
        or buffers_written > anticipated_pages_needed * bgwriter_lru_multiplier)
    {
        sleep(bgwriter_delay ms);
        continue;
    }
    ...
}

Correct.

so I should not be able to have more than ~5000 bgwriter_clean pages per minute (this assumes writing takes 0ms, which of course is inaccurate).

That works out to 5000/second, i.e. 300,000/minute.

However, I see this in my stats (they are deltas), and I'm reasonably sure it is not a bug in the code:

(timestamp, buffers_clean, buffers_checkpoint, buffers_backend)
2010-02-17 08:23:51.184018 |     1 |  1686 |    5
2010-02-17 08:22:51.170863 | 15289 | 12676 |  207
2010-02-17 08:21:51.155793 | 38467 |  8993 | 4277
2010-02-17 08:20:51.139199 | 35582 |     0 | 9437
2010-02-17 08:19:51.125025 |     8 |     0 |    3
2010-02-17 08:18:51.84     |  1140 |  1464 |    6
2010-02-17 08:17:51.098422 |     0 |  1682 |  228
2010-02-17 08:16:51.082804 |    50 |     0 |    6
2010-02-17 08:15:51.067886 |   789 |     0 |    1

perhaps some stats buffering occurring or something, or some general misunderstanding of some of these tunables?

-- Jeff Trout j...@jefftrout.com http://www.stuarthamm.net/ http://www.dellsmartexitin.com/

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
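The arithmetic behind the numbers being questioned above can be multiplied out directly. This sketch assumes the simplified loop described in the mail, not the actual bgwriter code:

```python
def max_lru_pages_per_minute(delay_ms: int, lru_maxpages: int) -> int:
    """Upper bound on buffers the bgwriter LRU scan can clean per minute,
    assuming writes take zero time (the simplification used in the thread)."""
    rounds_per_second = 1000 / delay_ms          # bgwriter wakeups per second
    return int(rounds_per_second * lru_maxpages * 60)

# bgwriter_delay = 100ms, bgwriter_lru_maxpages = 500
print(max_lru_pages_per_minute(100, 500))  # 300000 -- 5000 pages/s, not per minute
```

So the 38,467 buffers_clean delta in the stats above is comfortably inside the 300,000/minute ceiling; the "per minute" in the original question was the slip.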
Re: [PERFORM] Testing FusionIO
On Wed, 2010-03-17 at 14:30 +0200, Devrim GÜNDÜZ wrote: On Mon, 2010-03-08 at 09:38 -0800, Ben Chobot wrote: We've enjoyed our FusionIO drives very much. They can do 100k iops without breaking a sweat. Yeah, performance is excellent. I bet we could get more, but CPU was the bottleneck in our test, since it was just a demo server :(

Did you test the drive in all three modes? If so, what sort of differences did you see?

I've been hearing bad things from some folks about the quality of the FusionIO drives from a durability standpoint. I'm unsure if this is vendor-specific bias or not, but considering the source (which is not vendor-specific), I don't think so.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Testing FusionIO
On Wed, 2010-03-17 at 09:11 -0400, Justin Pitts wrote: On Mar 17, 2010, at 9:03 AM, Brad Nicholson wrote: I've been hearing bad things from some folks about the quality of the FusionIO drives from a durability standpoint. Can you be more specific about that? Durability over what time frame? How many devices in the sample set? How did FusionIO deal with the issue? I didn't get any specifics - as we are looking at other products. It did center around how FusionIO did wear-leveling though. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Testing FusionIO
On Wed, 2010-03-17 at 09:52 -0400, Justin Pitts wrote: FusionIO is publicly claiming 24 years @ 5TB/day on the 80GB SLC device, which wear levels across 100GB of actual installed capacity. http://community.fusionio.com/forums/p/34/258.aspx#258

20% of overall capacity free for levelling doesn't strike me as a lot. Some of the Enterprise grade stuff we are looking into (like TMS RamSan) leaves 40% (with much larger overall capacity). Also, running that drive at 80GB is the Maximum Capacity mode, which decreases the write performance. Max drive performance would be about 41TB/day, which coincidentally works out very close to the 3-year warranty they have on the devices.

To counter that: http://www.tomshardware.com/reviews/fusioinio-iodrive-flash,2140-2.html "Fusion-io's wear leveling algorithm is based on a cycle of 5 TB write/erase volume per day, resulting in 24 years run time for the 80 GB model, 48 years for the 160 GB version and 16 years for the MLC-based 320 GB type. However, since 5 TB could be written or erased rather quickly given the performance level, we recommend not relying on these approximations too much." FusionIO's claim _seems_ credible. I'd love to see some evidence to the contrary.

Vendor claims always seem credible. The key is to separate the marketing hype from the actual details. Again, I'm just passing along what I heard - which was from a vendor-neutral, major storage consulting firm that decided to stop recommending these drives to clients. Make of that what you will.

As an aside, some folks in our Systems Engineering department here did do some testing of FusionIO, and they found that the helper daemons were inefficient and placed a fair amount of load on the server. That might be something to watch out for, for those that are testing them.
-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Testing FusionIO
On Wed, 2010-03-17 at 14:11 -0400, Justin Pitts wrote: On Mar 17, 2010, at 10:41 AM, Brad Nicholson wrote: On Wed, 2010-03-17 at 09:52 -0400, Justin Pitts wrote: FusionIO is publicly claiming 24 years @ 5TB/day on the 80GB SLC device, which wear levels across 100GB of actual installed capacity. http://community.fusionio.com/forums/p/34/258.aspx#258 20% of overall capacity free for levelling doesn't strike me as a lot.

I don't have any idea how to judge what amount would be right.

Some of the Enterprise grade stuff we are looking into (like TMS RamSan) leaves 40% (with much larger overall capacity). Also, running that drive at 80GB is the Maximum Capacity mode, which decreases the write performance.

Very fair. In my favor, my proposed use case is probably at half capacity or less. I am getting the impression that partitioning/formatting the drive for the intended usage, and not the max capacity, is the way to go. Capacity isn't an issue with this workload. I cannot fit enough drives into these servers to get a tenth of the IOPS that even the Tom's Hardware article documents the ioDrive as capable of at reduced performance levels.

The actual media is only good for a very limited number of write cycles. The way the drives stay reliable is by constantly writing to different areas. The more you have free, the less often each area is re-used, and the longer the lifespan. This is done by the drive's wear-levelling algorithms, not by partitioning utilities, btw.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
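FusionIO's 24-years-at-5TB/day claim can be sanity-checked against the erase-cycle count it implies. This is back-of-envelope arithmetic over the numbers quoted in the thread, nothing more:

```python
def implied_erase_cycles(tb_per_day: float, years: float, capacity_gb: float) -> float:
    """Full write/erase cycles of the installed flash that the quoted
    endurance figure implies, assuming perfect wear levelling."""
    total_tb = tb_per_day * 365 * years      # lifetime write volume
    capacity_tb = capacity_gb / 1024         # installed flash, in TB
    return total_tb / capacity_tb

# 5 TB/day for 24 years, levelled across 100 GB of installed flash
print(round(implied_erase_cycles(5, 24, 100)))  # 448512
```

Whether ~450k cycles per cell is realistic for SLC flash is exactly the durability question being debated in the thread.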
Re: [PERFORM] Why Wal_buffer is 64KB
On Thu, 2010-03-25 at 20:31 +0530, Tadipathri Raghu wrote: Hi All, Can anybody clarify why wal_buffers is 64kB, and what the advantages and disadvantages are of increasing or decreasing it?

This is addressed in the documentation. http://www.postgresql.org/docs/8.4/interactive/wal-configuration.html

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] How to fast the REINDEX
On Thu, 2010-04-01 at 19:17 +0530, raghavendra t wrote: Hi All, System Config - CPU: Intel® Xeon® CPU, CPU Speed: 3.16 GHz, Server Model: Sun Fire X4150, RAM Size: 16GB. Steve: So am I to understand I don't need to do daily reindexing as a maintenance measure with 8.3.7 on FreeBSD? My question is something like Steve's: why should we not do reindexing as our maintenance task? I was doing reindex only to get a best fit and not fall short of the 90% hole, because my table has a lot of updates and deletes. We also have the weekly maintenance VACUUM, but the reindex still takes a lot of time.

This is your problem. You should enable autovacuum, let the vacuums happen more frequently, and this problem will go away. You will still have to fix the underlying bloat one last time, though.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Zeus IOPS
On Fri, 2010-05-28 at 13:48 -0600, Scott Marlowe wrote: Anybody on the list have any experience with these drives? They get good numbers, but I can't find diddly on them on the internet for the last year or so. http://www.stec-inc.com/product/zeusiops.php

I'd heard that they were a popular choice in the Enterprise market, but that was around 6 months ago or so, and purely anecdotal. Seems strange, though, for them to disappear off the radar when SSD-related information is becoming so prevalent. That could just be an effect of all the noise around the non-Enterprise grade SSDs floating around, though.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Zeus IOPS
On Tue, 2010-06-01 at 10:27 +0300, Mindaugas Riauba wrote: Hello, Anybody on the list have any experience with these drives? They get good numbers but I can't find diddly on them on the internet for the last year or so. http://www.stec-inc.com/product/zeusiops.php Most of the storage vendors (I have confirmation from EMC and HP) use those in their SAN boxes. I believe that is because they are the only SLC SSD makers that have supercapacitors on the SSD drive, which allows them to run with the write cache enabled. As a side effect, they are insanely expensive. :)

Texas Memory Systems also has these.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] PostgreSQL as a local in-memory cache
On Tue, 2010-06-29 at 21:39 -0400, Bruce Momjian wrote: Jignesh Shah wrote: On Tue, Jun 29, 2010 at 2:45 PM, Bruce Momjian br...@momjian.us wrote: Tom Lane wrote: Bruce Momjian br...@momjian.us writes: I asked on IRC and was told it is true, and looking at the C code it looks true. What synchronous_commit = false does is to delay writing the wal buffers to disk and fsyncing them, not just fsync, which is where the commit loss due to db process crash comes from. Ah, I see. Thanks. I am personally surprised it was designed that way; I thought we would just delay fsync. That would require writing and syncing to be separable actions. If you're using O_SYNC or similar, they aren't. Ah, very good point. I have added a C comment to clarify why this is the current behavior; attached and applied. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com

Though has anybody seen a behaviour where synchronous_commit=off is slower than synchronous_commit=on? Again there are two cases here, one with the O_* flags and the other with the f*sync flags. But I had seen that behavior with PostgreSQL 9.0 beta (2, I think), though I haven't really investigated it much yet (and now I don't remember which wal_sync_method flag). Just curious if anybody has seen that behavior.

I have trouble believing how synchronous_commit=off could be slower than 'on'. I wonder if it could be contention on WAL buffers? Say I've turned synchronous_commit off and I drive enough traffic to fill up my wal_buffers. I assume that we would have to start writing buffers down to disk before allocating buffers to the new process.

-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Need help in performance tuning.
On Fri, 2010-07-09 at 00:42 -0400, Tom Lane wrote: Samuel Gendler sgend...@ideasculptor.com writes: On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer cr...@postnewspapers.com.au wrote: If you're not using a connection pool, start using one. I see this issue and subsequent advice cross this list awfully frequently. Is there an architectural reason why postgres itself cannot pool incoming connections in order to eliminate the requirement for an external pool? Perhaps not, but there's no obvious benefit either. Since there's More Than One Way To Do It, it seems more practical to keep that as a separate problem that can be solved by a choice of add-on packages. This sounds similar to the approach taken with replication for years before it was moved into core. Just like replication, pooling has different approaches. I do think that in both cases, having a solution that works, easily, out of the box would meet the needs of most users. There is also the issue of perception/adoption here as well. One of my colleagues mentioned that at PG East he repeatedly heard people talking (negatively) about the over-reliance on add-on packages to deal with core DB functionality. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] dates and partitioning
On Tue, 2010-07-20 at 09:36 -0600, Kevin Kempter wrote: Hi All; we have a table partitioned by day, the check constraint on the child tables looks like this (this is the May 31st partition): CHECK (stime >= '2010-05-30 00:00:00+00'::timestamp with time zone AND stime <= '2010-05-30 23:59:59+00'::timestamp with time zone) We have a python based app that creates code like this: select * from table_a a, table_b b, table_d d where a.id = b.id and b.id = d.id and stime >= timestamp %s at time zone \'UTC\' and stime < timestamp %s at time zone \'UTC\' and stime >= timestamp %s at time zone d.name and stime < timestamp %s at time zone d.name ... so here are my questions: 1) the above app-generated query shows up like this in pg_stat_activity: and stime >= timestamp E'2010-07-17' at time zone 'UTC' and stime < timestamp E'2010-07-21' at time zone 'UTC' and stime >= timestamp E'2010-07-18' at time zone d.name and stime < timestamp E'2010-07-19' at time zone d.name what's the E'date' from? and why does it show up this way? That E is an escape-string prefix; Python is likely putting it in. See http://www.postgresql.org/docs/8.4/static/sql-syntax-lexical.html - section 4.1.2.2 2) the above query creates a plan that does a sequential scan + filter on every partition. Why won't it only hit the correct partitions? Is it due to the way the date was specified? or maybe the at time zone syntax? Do you have constraint_exclusion turned on? -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
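As a sketch of why the `at time zone d.name` predicates defeat pruning (table and column names follow the poster's example): constraint exclusion can only discard partitions when the bound is a constant the planner can compare against each child's CHECK constraint at plan time; an expression depending on another table's rows cannot be evaluated then.

```sql
-- Needed (pre-8.4 default was off) for the planner to use
-- child-table CHECK constraints to skip partitions:
SET constraint_exclusion = on;

-- Prunable: both bounds are plan-time constants.
SELECT *
FROM table_a
WHERE stime >= TIMESTAMP WITH TIME ZONE '2010-07-17 00:00:00+00'
  AND stime <  TIMESTAMP WITH TIME ZONE '2010-07-21 00:00:00+00';
```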
Re: [PERFORM] Testing Sandforce SSD
On 10-08-04 03:49 PM, Scott Carey wrote: On Aug 2, 2010, at 7:26 AM, Merlin Moncure wrote: On Fri, Jul 30, 2010 at 11:01 AM, Yeb Havingayebhavi...@gmail.com wrote: After a week of testing I think I can answer the question above: does it work like it's supposed to under PostgreSQL? YES The drive I have tested is the $435 50GB OCZ Vertex 2 Pro, http://www.newegg.com/Product/Product.aspx?Item=N82E16820227534 * it is safe to mount filesystems with barriers off, since it has a 'supercap backed cache'. That no data is lost was confirmed by a dozen power-switch-off tests while running either diskchecker.pl or pgbench. * the above implies it is also safe to use this SSD with barriers on, though that will perform worse, since this drive obeys write-through commands. * the highest pgbench tps number for the TPC-B test for a scale 300 database (~5GB) I could get was over 6700. Judging from the iostat average util of ~40% on the xlog partition, I believe that this number is limited by factors other than the SSD, like CPU, core count, core MHz, memory size/speed, and 8.4 pgbench running without threads. Unfortunately I don't have faster machines or ones with more cores available for testing right now. * pgbench numbers for a larger-than-RAM database, read-only, were over 25000 tps (details are at the end of this post), during which iostat reported ~18500 read iops and 100% utilization. * pgbench max reported latencies are 20% of comparable BBWC setups. * how reliable it is over time, and how it performs over time, I cannot say, since I tested it only for a week. Thank you very much for posting this analysis. This has IMNSHO the potential to be a game changer. There are still some unanswered questions in terms of how the drive wears, reliability, errors, and lifespan, but 6700 tps off of a single $400 device with decent fault tolerance is amazing (Intel, consider yourself upstaged). Ever since the first Samsung SSD hit the market I've felt the days of the spinning disk have been numbered.
Being able to build a 100k tps server on relatively inexpensive hardware, without an entire rack full of drives, is starting to look within reach. Intel's next-gen 'enterprise' SSDs are due out later this year. I have heard from those with access to test samples that they really like them -- these people rejected the previous versions because of the data loss on power failure. So, hopefully there will be some interesting competition later this year in the medium-price-range enterprise SSD market. I'll be doing some testing on enterprise-grade SSDs this year. I'll also be looking at some hybrid storage products that use SSDs as accelerators mixed with lower-cost storage. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
On 8/10/2010 12:21 PM, Greg Smith wrote: Scott Carey wrote: Also, the amount of data at risk in a power loss varies between drives. For Intel's drives, it's a small chunk of data (<256K). For some other drives, the cache can be over 30MB of outstanding writes. For some workloads this is acceptable. No, it isn't ever acceptable. You can expect that the type of data loss you get when a cache fails to honor write flush calls will result in catastrophic database corruption. It's not "I lost the last few seconds"; it's "the database is corrupted and won't start after a crash". This is why we pound on this topic on this list. An SSD that fails to honor flush requests is completely worthless for anything other than toy databases. You can expect significant work to recover any portion of your data after the first unexpected power loss under heavy write load in this environment, during which you're down. We do database corruption recovery at 2ndQuadrant; while I can't talk about the details of some recent incidents, I am not speaking theoretically when I warn about this. What about putting indexes on them? If the drive fails and drops writes on those, they could be rebuilt - assuming your system can function without the index(es) temporarily. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
On 8/10/2010 2:28 PM, Greg Smith wrote: Brad Nicholson wrote: What about putting indexes on them? If the drive fails and drops writes on those, they could be rebuilt - assuming your system can function without the index(es) temporarily. Dumping indexes on SSD is one of the better uses for them, presuming you can survive what is likely to be an outage from a "can the site handle full load?" perspective while they rebuild after a crash. As I'm sure Brad is painfully aware of already, index rebuilding in PostgreSQL can take a while. To spin my broken record here again, the main thing to note when you consider that--relocating indexes onto SSD--is that the ones you are most concerned about the performance of were likely already sitting in RAM anyway, meaning the SSD speedup doesn't help reads much. So the giant performance boost just isn't there in that case. The case where I'm thinking they may be of use is for indexes you can afford to lose. I'm thinking of ones that are needed by nightly batch jobs, downstream systems or reporting - the sorts of things that you can turn off during a rebuild, and where the data sets are not likely to be in cache. We have a few such cases, but we don't need the speed of SSDs for them. Personally, I wouldn't entertain any SSD without a capacitor backing it for anything, even indexes. Not worth the hassle to me. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
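For anyone who does decide to place rebuildable indexes on an SSD, the mechanics are just a tablespace; a minimal sketch (the mount point and the table/index names here are hypothetical):

```sql
-- Hypothetical mount point for the SSD volume.
CREATE TABLESPACE ssd_space LOCATION '/mnt/ssd/pgidx';

-- The index lives on the SSD; the table and WAL stay on reliable storage,
-- so losing the SSD costs only a REINDEX, not your data.
CREATE INDEX orders_batch_idx ON orders (batch_date) TABLESPACE ssd_space;
```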
Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
On 8/10/2010 2:38 PM, Karl Denninger wrote: Scott Marlowe wrote: On Tue, Aug 10, 2010 at 12:13 PM, Karl Denningerk...@denninger.net wrote: ANY disk that says write is complete when it really is not is entirely unsuitable for ANY real database use. It is simply a matter of time. What about read-only slaves where there's a master with 100+ spinning hard drives getting it right and you need a half dozen or so read slaves? I can imagine that being ok, as long as you don't restart a server after a crash without checking on it. A read-only slave isn't read-only, is it? I mean, c'mon - how does the data get there? A valid case is a Slony replica if used for query offloading (not for DR). It's considered a read-only subscriber from the perspective of Slony, as only Slony can modify the data (although you are technically correct, it is not read-only - controlled write may be more accurate). In case of failure, a rebuild + resubscribe gets you back to the same consistency. If you have high IO requirements, and don't have the budget to rack up extra disk arrays to meet them, it could be an option. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
On 8/10/2010 3:28 PM, Karl Denninger wrote: Brad Nicholson wrote: On 8/10/2010 2:38 PM, Karl Denninger wrote: Scott Marlowe wrote: On Tue, Aug 10, 2010 at 12:13 PM, Karl Denningerk...@denninger.net wrote: ANY disk that says write is complete when it really is not is entirely unsuitable for ANY real database use. It is simply a matter of time. What about read-only slaves where there's a master with 100+ spinning hard drives getting it right and you need a half dozen or so read slaves? I can imagine that being ok, as long as you don't restart a server after a crash without checking on it. A read-only slave isn't read-only, is it? I mean, c'mon - how does the data get there? A valid case is a Slony replica if used for query offloading (not for DR). It's considered a read-only subscriber from the perspective of Slony, as only Slony can modify the data (although you are technically correct, it is not read-only - controlled write may be more accurate). In case of failure, a rebuild + resubscribe gets you back to the same consistency. If you have high IO requirements, and don't have the budget to rack up extra disk arrays to meet them, it could be an option. CAREFUL with that model and beliefs. Specifically, the following will hose you without warning: 1. SLONY gets a change on the master. 2. SLONY commits it to the (read-only) slave. 3. Confirmation comes back to the master that the change was propagated. 4. Slave CRASHES without actually committing the changed data to stable storage. What will hose you is assuming that your data will be okay in the case of a failure, which is a very bad assumption to make in the case of unreliable SSDs. You are assuming I am implying that these should be treated like reliable media - I am not. In case of failure, you need to assume data loss until proven otherwise. If there is a problem, rebuild. When the slave restarts it will not know that the transaction was lost. Neither will the master, since it was told that it was committed.
Slony will happily go on its way and replicate forward, without any indication of a problem - except that on the slave, there are one or more transactions that are **missing**. Correct. Some time later you issue an update that goes to the slave, but the change previously lost causes the slave commit to violate referential integrity. SLONY will fail to propagate that change and all behind it - it effectively locks at that point in time. It will lock data flow to that subscriber, but not to others. You can recover from this by dropping the slave from replication and re-inserting it, but that forces a full-table copy of everything in the replication set. The bad news is that the queries to the slave in question may have been returning erroneous data for some unknown period of time prior to the lockup in replication (which hopefully you detect reasonably quickly - you ARE watching SLONY queue depth with some automated process, right?) There are ways around that - run two subscribers and redirect your queries on failure. Don't bring up the failed replica until it is verified or rebuilt. I can both cause this in the lab and have had it happen in the field. It's a nasty little problem that bit me on a series of disks that claimed to have write caching off, but in fact did not. I was very happy that the data on the master was good at that point, as if I had needed to fail over to the slave (thinking it was a good copy) I would have been in SERIOUS trouble. It's very easy to cause those sorts of problems. What I am saying is that the technology can have a use, if you are aware of the sharp edges, and can both work around them and live with them. Everything you are citing is correct, but it implies that they are blindly thrown in without understanding the risks and mitigating them. I'm also not suggesting that this is a configuration I would endorse, but it could potentially save a lot of money in certain use cases.
-- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
On 10-08-12 03:22 AM, Arjen van der Meijden wrote: On 12-8-2010 2:53 gnuo...@rcn.com wrote: - The value of SSD in the database world is not as A Faster HDD(tm). Never was, despite the naïve who assert otherwise. The value of SSD is to enable BCNF datastores. Period. If you're not going to do that, don't bother. Silicon storage will never reach equivalent volumetric density, ever. SSD will never be useful in the byte-bloat world of xml and other flat-file datastores (resident in databases or not). Industrial-strength SSD will always be more expensive/GB, and likely by a lot. (Re)factoring to high normalization strips out an order of magnitude of byte bloat, increases native data integrity by as much, reduces much of the redundant code, and puts the ACID where it belongs. All good things, but not effortless. It is actually quite common to under-utilize (short stroke) hard drives in the enterprise world, simply because 'they' need more IOps per amount of data than a completely utilized disk can offer. As such, the expense/GB can be much higher than simply dividing the capacity by its price (and if you're looking at fibre channel disks, that price is quite high already). And then it is relatively easy to find enterprise SSDs with better pricing for the whole system as soon as the IOps are more important than the capacity. And when you compare the ongoing operational costs of rack space, power and cooling for big arrays full of spinning disks to flash-based solutions, the price comparison evens itself out even more. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] locking issue on simple selects?
On 10-09-15 03:07 PM, Tobias Brox wrote: On 15 September 2010 15:39, Tom Lanet...@sss.pgh.pa.us wrote: An exclusive lock will block selects too. Have you looked into pg_locks for ungranted lock requests? Well - I thought so; we have a logging script that logs the content of the pg_locks table, and it didn't log anything interesting, but it may be a problem with the script itself. It does an inner join on pg_locks.relation = pg_class.oid, but when I check now, this join seems to remove most of the rows in the pg_locks table. Does it make sense at all to join pg_class with pg_locks? I will ask the sysadmin to change it to an outer join for now. You can also enable log_lock_waits, and the lock waits will appear in your Postgres logs. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
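The join problem described above comes from pg_locks rows whose relation column is NULL (transaction-id and advisory locks, for example), which an inner join against pg_class silently drops. A sketch of the outer-join version, plus the logging setting mentioned in the reply (column names as in the pg_locks/pg_class system views):

```sql
-- Keep lock rows even when they aren't tied to a relation.
SELECT l.locktype, l.mode, l.granted, c.relname, l.pid
FROM pg_locks l
LEFT JOIN pg_class c ON c.oid = l.relation
WHERE NOT l.granted;

-- Log any lock wait longer than deadlock_timeout (default 1s)
-- to the server log; requires superuser (or set it in postgresql.conf).
SET log_lock_waits = on;
```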
Re: [PERFORM] Memory usage - indexes
On 10-09-24 12:46 PM, Tobias Brox wrote: On 24 September 2010 18:23, Bob Lunneybob_lun...@yahoo.com wrote: Consult pg_statio_user_indexes to see which indexes have been used and how much. What are the main differences between pg_statio_user_indexes and pg_stat_user_indexes? The pg_stat_* views give you usage information (for indexes: number of scans, numbers of tuples read/fetched). The pg_statio_* views give you information about block reads and block hits. I'm not at all concerned about 1 and 2 above - we don't have any performance issues on the write part, and we have plenty of disk capacity. We are still doing the nightly vacuum thing, and it does hurt us a bit since it's dragging out longer and longer. Why is the vacuum dragging out over time? Is the size of your data increasing, are you doing more writes that leave dead tuples, or are your tables and/or indexes getting bloated? Also, is there a reason why you do nightly vacuums instead of letting autovacuum handle the work? We started doing far less vacuuming when we let autovacuum handle things. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
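A side-by-side sketch of the two views (column names as in the 8.x statistics collector views):

```sql
-- Usage: how often each index is scanned and what it returns.
SELECT indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes;

-- I/O: blocks satisfied from shared buffers (hits) vs. read in
-- from outside them (reads).
SELECT indexrelname, idx_blks_read, idx_blks_hit
FROM pg_statio_user_indexes;
```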
Re: [PERFORM] Memory usage - indexes
On 10-09-24 01:41 PM, Tobias Brox wrote: What do you mean - that you could run regular vacuum less frequently, or that the regular vacuum would go faster? It means that vacuums ran less frequently. With cron-triggered vacuums, we estimated when tables needed to be vacuumed, and vacuumed them accordingly. Because of unpredictable shifts in activity, we scheduled the vacuums to happen more often than needed. With autovacuum, we vacuum some of our large tables far less frequently. We have a few large tables that used to get vacuumed every other day that now get vacuumed once or twice a month. The vacuums themselves take longer now, as we use the vacuum cost delay to control the IO. That wasn't an option for us when we did manual vacuums, as that was in 8.1, when vacuums were still treated as long-running transactions. Stretching a vacuum out to a few hours prior to 8.2 would bloat other tables. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
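The cost-delay throttling described above can be sketched as follows (the table name is hypothetical, and the values are illustrative rather than recommendations):

```sql
-- Manual vacuum: pause 20ms each time vacuum_cost_limit "cost units"
-- of I/O have been accrued, smoothing the I/O impact at the price
-- of a longer-running vacuum.
SET vacuum_cost_delay = 20;
VACUUM ANALYZE big_table;

-- autovacuum has its own knobs in postgresql.conf:
--   autovacuum_vacuum_cost_delay, autovacuum_vacuum_cost_limit
```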
Re: [PERFORM] Memory usage - indexes
On 10-09-24 03:06 PM, Bob Lunney wrote: The pertinent difference between pg_stat_user_indexes and pg_statio_user_indexes is that the latter shows the number of blocks read from disk or found in the cache. I have a minor, but very important, correction involving this point. The pg_statio views show you which blocks are found in the Postgres buffer cache, and which ones are not. For the ones that are not, those blocks may come from the OS filesystem cache, a battery-backed cache, or the actual disk. There is a big difference in performance based on where you are actually getting those blocks from (and you can't get this info from Postgres). -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] New wiki page on write reliability
On 10-10-21 10:08 AM, Greg Smith wrote: Now that some of my recent writing has gone from NDA protected to public sample, I've added a new page to the PostgreSQL wiki that provides a good starting set of resources to learn about an ever popular topic here, how write cache problems can lead to database corruption: http://wiki.postgresql.org/wiki/Reliable_Writes Bruce also has a presentation he's been working on that adds pictures showing the flow of data through the various cache levels, to help people visualize the whole thing, that should get added into there once he's finished tweaking it. I'd like to get some feedback from the members of this list about what's still missing after this expanded data dump. Ultimately I'd like to get this page to be an authoritative enough resource that the Reliability section of the official documentation could point back to this as a recommendation for additional information. So much of this material requires singling out specific vendors and staying up to date with hardware changes, both things that the official docs are not a good place for. Looks like a good start. I think a warning about turning fsync off, the dangers of async commit, and the potential problems with disabling full_page_writes might be worth mentioning on this page, unless you want to leave that buried in the attached references. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
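For the settings called out above, a quick sketch of checking them on a running server (all three are standard GUCs):

```sql
-- All three should normally be 'on' for a durable production database.
SHOW fsync;               -- off = no crash safety at all
SHOW full_page_writes;    -- off risks torn-page corruption after a crash
SHOW synchronous_commit;  -- off can lose (but not corrupt) recent commits
```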
Re: [PERFORM] AIX slow buffer reads
On 10-10-25 03:26 PM, André Volpato wrote: | On Mon, Oct 25, 2010 at 2:21 PM, André Volpato |andre.volp...@ecomtecnologia.com.br wrote: | Hi all, | | We are tuning a PostgreSQL box with AIX 5.3 and got stuck in a | very odd situation. | When a query is run for the second time, the system seems to | deliver the results too slowly. | | Here's some background info: | | AIX Box: | PostgreSQL 8.4.4, AIX 5.3-9 64bits, SAN IBM DS3400, 8x450GB SAS 15K | Raid-5 | 8GB RAM, 2.3GB Shared buffers | | Debian Box: | PostgreSQL 8.4.4, Debian 4.3.2 64bits, SAN IBM DS3400, 5x300GB SAS | 15K Raid-0 | 7GB RAM, 2.1GB Shared buffers | | Right now, we changed lots of AIX tunables to increase disk and OS | performance. | Of course, postgres got tuned as well. I can post all changes made | until now if needed. | | To keep it simple, I will try to explain only the buffer read issue. | This query [1] took like 14s to run at AIX, and almost the same time | at Debian. | The issue is when I run it for the second time: | AIX - 8s | Debian - 0.3s | | These times keep repeating after the second run, and I can ensure | AIX isn't touching the disks anymore. | I've never seen this behaviour before. I heard about Direct I/O and | I was thinking about giving it a shot. | Any ideas? | | | I doubt disk/io is the problem. Me either. Like I said, AIX does not touch the storage when running the query. It became CPU-bound after data got into cache. Have you confirmed that the hardware is ok on both servers? Have both OS's been tuned by people that know how to tune the respective OS's? AIX is very different than Linux, and needs to be tuned accordingly. On AIX, can you trace why it is CPU-bound? What else is taking the CPU time, anything? Also, can you provide the output of pg_config from your AIX build? -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
Re: [PERFORM] AIX slow buffer reads
On 10-10-26 05:04 PM, André Volpato wrote: - Mensagem original - | On 10-10-25 03:26 PM, André Volpato wrote: | | On Mon, Oct 25, 2010 at 2:21 PM, André Volpato | |andre.volp...@ecomtecnologia.com.br wrote: (...) | | These times keep repeating after the second run, and I can | | ensure AIX isn't touching the disks anymore. | | I've never seen this behaviour before. I heard about Direct I/O | | and I was thinking about giving it a shot. | | | | Any ideas? | | | | | | I doubt disk/io is the problem. | | Me either. | Like I said, AIX does not touch the storage when running the query. | It became CPU-bound after data got into cache. | | Have you confirmed that the hardware is ok on both servers? | The hardware was recently installed and checked by the vendor team. AIX box is on JS22: PostgreSQL 8.4.4, AIX 5.3-9 64bits, SAN IBM DS3400, 8x450GB SAS 15K Raid-5 8GB RAM (DDR2 667) # lsconf System Model: IBM,7998-61X Processor Type: PowerPC_POWER6 Processor Implementation Mode: POWER 6 Processor Version: PV_6 Number Of Processors: 4 Processor Clock Speed: 4005 MHz CPU Type: 64-bit Kernel Type: 64-bit Memory Size: 7680 MB Debian box is on HS21: PostgreSQL 8.4.4, Debian 4.3.2 64bits, SAN IBM DS3400, 5x300GB SAS 15K Raid-0 7GB RAM (DDR2 667) We are forced to use RedHat on this machine, so we are virtualizing the Debian box. # cpuinfo processor : [0-7] vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Intel(R) Xeon(R) CPU E5420 @ 2.50GHz stepping: 6 cpu MHz : 2500.148 cache size : 6144 KB | Have both OS's been tuned by people that know how to tune the | respective OS's? AIX is very different than Linux, and needs to be tuned | accordingly. We've been tuning AIX for the last 3 weeks, and lots of tunables got changed. On Debian, we have far more experience, and it's been a challenge to understand how AIX works.
Most important tunables: page_steal_method=1 lru_file_repage=0 kernel_heap_psize=64k maxperm%=90 maxclient%=90 minperm%=20 Disk: chdev -l hdisk8 -a queue_depth=24 chdev -l hdisk8 -a reserve_policy=no_reserve chdev -l hdisk8 -a algorithm=round_robin chdev -l hdisk8 -a max_transfer=0x40 HBA: chdev -l fcs0 -P -a max_xfer_size=0x40 -a num_cmd_elems=1024 Postgres: shared_buffers = 2304MB effective_io_concurrency = 5 I wonder if effective_io_concurrency has anything to do with it. It was implemented and mainly tested on Linux, and I am unsure if it will do anything on AIX. The plan you posted for the query does bitmap index scans, which is what effective_io_concurrency will speed up. Can you post the output of EXPLAIN ANALYZE for that query on both AIX and Linux? That will show where the time is being spent. If it is being spent in the bitmap index scan, try setting effective_io_concurrency to 0 on Linux, and see what effect that has. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.
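The comparison suggested above can be sketched like this (the table and query are hypothetical stand-ins for the poster's query [1]; the point is to compare the "Bitmap Heap Scan" actual times between the two runs):

```sql
-- Baseline: posix_fadvise prefetching disabled.
SET effective_io_concurrency = 0;
EXPLAIN ANALYZE
SELECT count(*) FROM events WHERE event_date = DATE '2010-10-01';

-- Re-run with prefetching enabled and compare the bitmap scan
-- node times between the two plans.
SET effective_io_concurrency = 5;
EXPLAIN ANALYZE
SELECT count(*) FROM events WHERE event_date = DATE '2010-10-01';
```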
Re: [PERFORM] AIX slow buffer reads
On 10/27/2010 4:10 PM, Tom Lane wrote: Greg Smithg...@2ndquadrant.com writes: André Volpato wrote: I disabled effective_io_concurrency at AIX but it made no change to bitmap index times. Brad's point is that it probably doesn't do anything at all on AIX, and is already disabled accordingly. AFAICT from googling, AIX does have posix_fadvise, though maybe it doesn't do anything useful ... regards, tom lane Is there an easy way to check whether it does anything useful? If so, I can check it out. -- Brad Nicholson 416-673-4106 Database Administrator, Afilias Canada Corp.