Re: [PERFORM] Finding bottleneck
Merlin Moncure [EMAIL PROTECTED] writes:
> One thing that might interest you is that the penalty in 8.1 for
> stats_command_string=true in this type of access pattern is very high: I
> was experimenting to see if the new cpu efficiency gave me enough of a
> budget to start using this. This more than doubled the cpu load to
> around 70% with a runtime of 82 seconds. This is actually worse than
> 8.0 :(.

That seems quite peculiar; AFAICS the pgstat code shouldn't be any slower
than before. At first I thought it might be because we'd increased
PGSTAT_ACTIVITY_SIZE, but actually that happened before the 8.0 release, so
it shouldn't be a factor in this comparison.

Can anyone else confirm a larger penalty for stats_command_string in HEAD
than in 8.0? A self-contained test case would be nice too.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Finding bottleneck
> That seems quite peculiar; AFAICS the pgstat code shouldn't be any
> slower than before. At first I thought it might be because we'd
> increased PGSTAT_ACTIVITY_SIZE, but actually that happened before the
> 8.0 release, so it shouldn't be a factor in this comparison.

Just FYI, the last time I looked at stats was in the 8.0 beta period.

> Can anyone else confirm a larger penalty for stats_command_string in
> HEAD than in 8.0? A self-contained test case would be nice too.

Looking into it.

Merlin
Re: [PERFORM] Finding bottleneck
On Mon, 8 Aug 2005, Tom Lane wrote:
> What that sounds like to me is a machine with inadequate disk I/O
> bandwidth. Your earlier comment that checkpoint drives the machine into
> the ground fits right into that theory, too. You said there is almost
> no IO-wait but are you sure you are measuring that correctly?

Reducing checkpoint_timeout to 600 seconds had a positive effect. The
previous value was 1800 seconds. We have a spare disk array from the old
server and I'm planning to use it as a tablespace for the comment table
(the 100M+ rows one) as Ron suggested.

> > Queries accumulate and when checkpointing is over, there can be
> > something like 400 queries running but over 50% of cpu is just idling.
>
> 400 queries? Are you launching 400 separate backends to do that? Some
> sort of connection pooling seems like a good idea, if you don't have it
> in place already. If the system's effective behavior in the face of
> heavy load is to start even more concurrent backends, that could easily
> drive things into the ground.

Ok, I implemented connection pooling using pgpool and it increased
performance a lot! We are now delivering about 1500 dynamic pages a second
without problems. Each of the eight single-cpu webservers is running a
pgpool instance with 20 connections.

However, those configuration changes didn't have a significant effect on
oprofile results. AtEOXact_CatCache consumes even more cycles. This isn't
a problem right now but it may be in the future...
CPU: AMD64 processors, speed 2190.23 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 10

samples   %        symbol name
1147870   21.1602  AtEOXact_CatCache
 187466    3.4558  hash_seq_search
 174357    3.2142  AllocSetAlloc
 170896    3.1504  nocachegetattr
 131724    2.4282  ExecMakeFunctionResultNoSets
 125292    2.3097  SearchCatCache
 117264    2.1617  StrategyDirtyBufferList
 105741    1.9493  hash_search
  98245    1.8111  FunctionCall2
  97878    1.8043  yyparse
  90932    1.6763  LWLockAcquire
  83555    1.5403  LWLockRelease
  81045    1.4940  _bt_compare
... and so on ...

--8<-- Significant rows from current postgresql.conf --8<--

max_connections = 768       # unnecessarily large with connection pooling
shared_buffers = 15000
work_mem = 2048
maintenance_work_mem = 32768
max_fsm_pages = 100
max_fsm_relations = 5000
bgwriter_percent = 2
fsync = true
wal_buffers = 512
checkpoint_segments = 200   # less would probably be enough with 600 sec timeout
checkpoint_timeout = 600
effective_cache_size = 50
random_page_cost = 1.5
default_statistics_target = 150
stats_start_collector = true
stats_command_string = true

Kari Lavikka - [EMAIL PROTECTED] - (050) 380 3808
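For reference, the pooling arrangement described above (one pgpool instance per web server, 20 connections each) would correspond to a pgpool configuration along these lines. This is only a sketch: the parameter names are from pgpool of that era, the hostname is hypothetical, and you should check everything against your pgpool version's sample config.

```
# pgpool.conf sketch -- parameter names and values are illustrative only
port = 9999                    # web app connects here instead of 5432
backend_host_name = 'dbhost'   # hypothetical name of the real database server
backend_port = 5432
num_init_children = 20         # 20 pooled connections per web server
max_pool = 4                   # connection caches kept per pool process
```

With eight web servers at 20 children each, the backend sees at most 160 concurrent connections instead of 400+, which matches the improvement reported above.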
Re: [PERFORM] Finding bottleneck
Kari Lavikka [EMAIL PROTECTED] writes:
> However, those configuration changes didn't have a significant effect
> on oprofile results. AtEOXact_CatCache consumes even more cycles.

> I believe I've fixed that for 8.1.

Relative to 8.0, I am seeing a dramatic, almost miraculous reduction in
CPU load times in 8.1devel. This is for ISAM-style access patterns over
the parse/bind interface (IOW one record at a time, 90% read, 10% write).
Relative to commercial dedicated ISAM storage engines, pg holds up very
well except in cpu load, but 8.1 is a huge step towards addressing that.

So far, except for one minor (and completely understandable) issue with
bitmap scans, 8.1 has been a stellar performer. Also great is the
expansion of the pg_locks view (which I didn't see mentioned in Bruce's
TODO list, just FYI).

Merlin
Re: [PERFORM] Finding bottleneck
Merlin Moncure [EMAIL PROTECTED] writes:
> Relative to 8.0, I am seeing a dramatic, almost miraculous reduction in
> CPU load times in 8.1devel. This is for ISAM-style access patterns over
> the parse/bind interface (IOW one record at a time, 90% read, 10%
> write). Relative to commercial dedicated ISAM storage engines, pg holds
> up very well except in cpu load, but 8.1 is a huge step towards
> addressing that.

Cool --- we've done a fair amount of work on squeezing out internal
inefficiencies during this devel cycle, but it's always hard to predict
just how much anyone will notice in the real world. Care to do some
oprofile or gprof profiles to see where it's still bad?

regards, tom lane
Re: [PERFORM] Finding bottleneck
> Cool --- we've done a fair amount of work on squeezing out internal
> inefficiencies during this devel cycle, but it's always hard to predict
> just how much anyone will notice in the real world. Care to do some
> oprofile or gprof profiles to see where it's still bad?

Since the release of 8.0, we are a strictly windows shop :). I tried
building pg with the -pg flag and got errors in some of the satellite
libraries. I think this is solvable, though; at some point I'll spend more
time on it.

Anyways, just so you know the #s that I'm seeing, I've run several
benchmarks of various programs that access pg via our ISAM bridge. The
results are as consistent as they are good. These tests are on the same
box using the same .conf on the same freshly loaded data. The disk doesn't
play a major role in these tests. All data access is through the
ExecPrepared libpq C interface. The benchmark is run from a separate box
on a LAN.

Bill of Materials Traversal (~62k records):

                ISAM*    pg 8.0   pg 8.1devel   delta 8.0-8.1
running time    63 sec   90 secs  71 secs       21%
cpu load        17%      45%      32%           29%
loadsecs**      10.71    40.5     22.72         44%
recs/sec        984      688      873
recs/loadsec    5882     1530     2728

*ISAM is an anonymous commercial ISAM library in an optimized server
architecture (pg smokes the non-optimized flat file version).
**Loadsecs being seconds of CPU at 100% load.

IOW, the cpu load drop is around 44%. Amazing!

Merlin
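The derived rows in the table above follow directly from the measured runtime and CPU load. A quick sanity check of that arithmetic (loadsecs = runtime × cpu load, per the footnote), in Python purely for illustration:

```python
# Sanity-check the derived "loadsecs" figures from the benchmark table.
# Inputs are the measured runtime (seconds) and average CPU load fraction.
runs = {
    "ISAM": (63, 0.17),
    "pg 8.0": (90, 0.45),
    "pg 8.1devel": (71, 0.32),
}

def loadsecs(runtime_sec, cpu_load):
    # "Loadsecs being seconds of CPU at 100% load" (see footnote **)
    return runtime_sec * cpu_load

for name, (runtime, load) in runs.items():
    print(f"{name:12s} loadsecs = {loadsecs(runtime, load):.2f}")

# CPU cost reduction from 8.0 to 8.1devel, matching the ~44% quoted above:
drop = 1 - loadsecs(*runs["pg 8.1devel"]) / loadsecs(*runs["pg 8.0"])
print(f"8.0 -> 8.1 reduction: {drop:.0%}")
```

The computed values (10.71, 40.50, 22.72, 44%) reproduce the table's loadsecs column and the quoted reduction.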
Re: [PERFORM] Finding bottleneck
I think I have a solution for you. You have posted that you presently have
these RAID volumes and behaviors:

sda: data (10 spindles, raid10)
sdb: xlog + clog (2 spindles, raid1)
sdc: os and other stuff

Usually iostat (2 second interval) says:

avg-cpu:  %user  %nice  %sys  %iowait  %idle
          32.38   0.00  12.88    11.62  43.12

Device:   tps       kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
sda       202.00    1720.00    0.00       3440     0
sdb       152.50    4.00       2724.00    8        5448
sdc       0.00      0.00       0.00       0        0

And during checkpoint:

avg-cpu:  %user  %nice  %sys  %iowait  %idle
          31.25   0.00  14.75    54.00   0.00

Device:   tps       kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
sda       3225.50   1562.00    35144.00   3124     70288
sdb       104.50    10.00      2348.00    20       4696
sdc       0.00      0.00       0.00       0        0

During checkpoints sda is becoming saturated, essentially halting all
other DB activity involving sda. A lesser version of the problem is
probably occurring every time multiple entities on sda are being accessed
simultaneously, particularly during simultaneous writes.

My proposed solution:
- Put comment and its index on their own dedicated RAID volume.
- Put comment_archive and its index on their own dedicated RAID volume.
- Put the rest of the tables currently part of data on their own dedicated
  RAID volume.
- Put the rest of the indexes to the tables currently part of data on
  their own dedicated RAID volume.
- Put xlog on its own dedicated RAID volume.

The general idea here is to put any tables or indexes that tend to require
simultaneous access, particularly write access, on different spindles.
Like all things, there's a point of diminishing returns that is dependent
on the HW used and the DB load. If you must wring every last bit of IO out
of the HD subsystem, a more exact set of spindle assignments can be made
by analyzing your queries and then 1) making sure writes that tend to be
simultaneous are to different spindles, then (if you still need better IO)
2) making sure reads that tend to be simultaneous are to different
spindles. At some point, your controller will become the bottleneck.
At some point beyond that, the IO channels on the mainboard will become
the bottleneck. My suggestion should get you to within 80-90% of optimal
if I've understood the implications of your posts correctly. The other
suggestion I'd make is to bump your RAM from 16GB to 32GB as soon as you
can afford it, and then tune your PostgreSQL parameters to make best use
of it. The more RAM-resident your DB, the better.

Hope this helps,
Ron Peacetree

===Original Message Follows===
From: Kari Lavikka tuner ( at ) bdb ( dot ) fi
To: Merlin Moncure merlin ( dot ) moncure ( at ) rcsonline ( dot ) com
Subject: Re: Finding bottleneck
Date: Mon, 8 Aug 2005 19:19:09 +0300 (EETDST)

Actually I modified postgresql.conf a bit and there isn't commit delay any
more. That didn't make a noticeable difference though...

Workload is generated by a website with about 1000 dynamic page views a
second. Finland's biggest site among youths, btw. Anyway, there are about
70 tables and here are some of the most important:

     relname     |  reltuples
-----------------+-------------
 comment         | 1.00723e+08
 comment_archive | 9.12764e+07
 channel_comment | 6.93912e+06
 image           | 5.80314e+06
 admin_event     | 5.1936e+06
 user_channel    | 3.36877e+06
 users           | 325929
 channel         | 252267

Queries to the comment table are mostly IO-bound but are performing quite
well.
Here's an example:

(SELECT u.nick, c.comment, c.private, c.admin, c.visible, c.parsable,
        c.uid_sender, to_char(c.stamp, 'DD.MM.YY HH24:MI') AS stamp,
        c.comment_id
   FROM comment c
  INNER JOIN users u ON u.uid = c.uid_sender
  WHERE u.status = 'a'
    AND c.image_id = 15500900
    AND c.uid_target = 780345
  ORDER BY uid_target DESC, image_id DESC, c.comment_id DESC)
 LIMIT 36

And explain analyze:

 Limit  (cost=0.00..6.81 rows=1 width=103) (actual time=0.263..17.522 rows=12 loops=1)
   ->  Nested Loop  (cost=0.00..6.81 rows=1 width=103) (actual time=0.261..17.509 rows=12 loops=1)
         ->  Index Scan Backward using comment_uid_target_image_id_comment_id_20050527 on comment c  (cost=0.00..3.39 rows=1 width=92) (actual time=0.129..16.213 rows=12 loops=1)
               Index Cond: ((uid_target = 780345) AND (image_id = 15500900))
         ->  Index Scan using users_pkey on users u  (cost=0.00..3.40 rows=1 width=15) (actual time=0.084..0.085 rows=1 loops=12)
               Index Cond: (u.uid = "outer".uid_sender)
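The volume-splitting suggested above maps onto 8.0's tablespace feature. A sketch only: the tablespace names and mount points are hypothetical, and each ALTER ... SET TABLESPACE rewrites the relation on the new volume, so plan for the downtime on a 100M-row table.

```sql
-- Hypothetical mount points for the dedicated RAID volumes
CREATE TABLESPACE comment_data LOCATION '/mnt/raid_comment/pgdata';
CREATE TABLESPACE comment_idx  LOCATION '/mnt/raid_comment_idx/pgdata';

-- Move the hot table and its index off the shared 'data' volume
ALTER TABLE comment SET TABLESPACE comment_data;
ALTER INDEX comment_uid_target_image_id_comment_id_20050527
    SET TABLESPACE comment_idx;
```

Splitting the table and its index onto different spindles follows the same simultaneous-write reasoning as the rest of the proposal.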
Re: [PERFORM] Finding bottleneck
Kari Lavikka [EMAIL PROTECTED] writes:
> samples   %        symbol name
> 13513390  16.0074  AtEOXact_CatCache

> That seems quite odd --- I'm not used to seeing that function at the
> top of a profile. What is the workload being profiled, exactly?

He is running a commit_delay of 8. Could that be playing a role?

Merlin
Re: [PERFORM] Finding bottleneck
Kari Lavikka [EMAIL PROTECTED] writes:
> Disk configuration looks something like this:
> sda: data (10 spindles, raid10)
> sdb: xlog + clog (2 spindles, raid1)
> sdc: os and other stuff

That's definitely wrong. Put clog on the data disk. The entire point of
giving xlog its own spindle is that you don't ever want the disk heads
moving off the current xlog file. I'm not sure how much this is hurting
you, given that clog is relatively low volume, but if you're going to go
to the trouble of putting xlog on a separate spindle then it should be a
completely dedicated spindle.

regards, tom lane
[PERFORM] Finding bottleneck
Hello, we recently upgraded our dual Xeon Dell to a brand new Sun v40z
with 4 Opterons, 16GB of memory and a MegaRAID with enough disks. The OS
is Debian Sarge amd64, and PostgreSQL is 8.0.3. The size of the database
is something like 80GB and our website performs about 600 selects and
several updates/inserts a second.

The v40z performs somewhat better than our old Dell, but mostly due to the
increased amount of memory. The problem is... there seems to be plenty of
free CPU available and almost no IO-wait, but CPU-bound queries seem to
linger for some reason. The problem appears very clearly during
checkpointing. Queries accumulate, and when checkpointing is over, there
can be something like 400 queries running but over 50% of cpu is just
idling.

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff    cache    si   so    bi    bo    in    cs us sy id wa
 3  1      0 494008 159492 14107180     0    0   919  3164  3176 13031 29 12 52  8
 5  3      0 477508 159508 14118452     0    0  1071  4479  3474 13298 27 13 47 13
 0  0      0 463604 159532 14128832     0    0   922  2903  3352 12627 29 11 52  8
 3  1      0 442260 159616 14141668     0    0  1208  3153  3357 13163 28 12 52  9

An example of a lingering query (there are usually several of these or
similar):

SELECT u.uid, u.nick, u.name, u.showname, i.status, i.stamp, i.image_id,
       i.info, i.t_width, i.t_height
  FROM users u
 INNER JOIN image i ON i.uid = u.uid
 INNER JOIN user_online uo ON u.uid = uo.uid
 WHERE u.city_id = 5
   AND i.status = 'd'
   AND u.status = 'a'
 ORDER BY city_id, upper(u.nick)
 LIMIT (40 + 1) OFFSET 320

Tables involved contain no more than 4 million rows. Those are constantly
accessed and should fit nicely in cache. But the database is just slow for
some unknown reason. Any ideas?
--8<-- Relevant rows from postgresql.conf --8<--

shared_buffers = 15000          # min 16, at least max_connections*2, 8KB each
work_mem = 1536                 # min 64, size in KB
maintenance_work_mem = 32768    # min 1024, size in KB
max_fsm_pages = 100             # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 5000        # min 100, ~50 bytes each
vacuum_cost_delay = 15          # 0-1000 milliseconds
vacuum_cost_limit = 120         # 1-10000 credits
bgwriter_percent = 2            # 0-100% of dirty buffers in each round
fsync = true                    # turns forced synchronization on or off
                                # fsync, fdatasync, open_sync, or open_datasync
wal_buffers = 128               # min 4, 8KB each
commit_delay = 8                # range 0-100000, in microseconds
commit_siblings = 10            # range 1-1000
checkpoint_segments = 200       # in logfile segments, min 1, 16MB each
checkpoint_timeout = 1800       # range 30-3600, in seconds
effective_cache_size = 100      # typically 8KB each
random_page_cost = 1.8          # units are one sequential page fetch cost
default_statistics_target = 150 # range 1-1000
stats_start_collector = true
stats_command_string = true

Kari Lavikka - [EMAIL PROTECTED] - (050) 380 3808
Re: [PERFORM] Finding bottleneck
Hi,

On Thu, 28 Jul 2005, Kari Lavikka wrote:
> --8<-- Relevant rows from postgresql.conf --8<--
> shared_buffers = 15000
> work_mem = 1536

As an aside, I'd increase work_mem -- but it doesn't sound like that is
your problem.

> wal_buffers = 128

Some benchmarking results out today suggest that wal_buffers = 1024 or
even 2048 could greatly assist you.

> commit_delay = 8
> commit_siblings = 10

This may explain the fact that you've got backed-up queries and idle CPU
-- I'm not certain, though. What does disabling commit_delay do to your
situation?

Gavin
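Gavin's two suggestions, expressed as a postgresql.conf fragment. The wal_buffers figure is the benchmark-derived guess quoted above, not a verified optimum for this workload, so benchmark before and after changing it:

```
wal_buffers = 1024    # 8 MB of WAL buffers (1024 x 8 KB pages)
commit_delay = 0      # disable, to test whether it is what backs up queries
```

Both take effect only after a server restart on 8.0, so they are easiest to test together.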
Re: [PERFORM] Finding bottleneck
> effective_cache_size = 100  # typically 8KB each

I have this setting on PostgreSQL 7.4.8 on FreeBSD with 4 GB RAM:

effective_cache_size = 27462

So even though your machine runs Debian and you have four times as much
RAM as mine, your effective_cache_size is 36 times larger. You could try
lowering this setting.

regards
Claus
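Both settings are counts of pages rather than bytes, which makes the comparison easier to see in absolute terms. A small Python sketch; the 8192-byte page size assumes the default compile-time BLCKSZ, and the million-page figure is a hypothetical value for scale, not the poster's actual setting:

```python
BLCKSZ = 8192  # default PostgreSQL page size in bytes (compile-time BLCKSZ)

def cache_mb(pages, page_size=BLCKSZ):
    # effective_cache_size is expressed in pages of BLCKSZ bytes, not bytes
    return pages * page_size / (1024 * 1024)

# Claus's 7.4.8 box with 4 GB RAM:
print(f"27462 pages = {cache_mb(27462):.1f} MB")        # ~214 MB

# A hypothetical 1,000,000-page setting, for scale:
print(f"1000000 pages = {cache_mb(1_000_000):.1f} MB")  # ~7.6 GB
```

The point of the comparison: the planner treats effective_cache_size as the OS cache it can count on, so a value far above physical RAM misleads it.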
Re: [PERFORM] Finding bottleneck
Kari Lavikka wrote:
> shared_buffers = 15000

You can play around with this one, but in my experience it doesn't make
much difference anymore (it used to).

> work_mem = 1536  # min 64, size in KB

This seems low. Are you sure you are not getting sorts swapped to disk?

> fsync = true  # turns forced synchronization on or off

Does turning this off make a difference? This would help narrow down
where the problem is.

> commit_delay = 8

Hm! How did you arrive at this number? Try setting it to zero and
comparing.

> stats_start_collector = true
> stats_command_string = true

With a high query load you may want to consider turning these off. On
win32, I've had some problems with the stats collector under high load
conditions. Not on unix, but it's something to look at. Just turn off
stats for a while and see if it helps.

Good luck! Your hardware should be more than adequate.

Merlin
Re: [PERFORM] Finding bottleneck
On 7/28/05 2:21 AM, Kari Lavikka [EMAIL PROTECTED] wrote:

There's a new profiling tool called oprofile:

http://oprofile.sourceforge.net/download/

that can be run without instrumenting the binaries beforehand. To actually
find out what the code is doing during these stalls, oprofile can show you
in which routines the CPU is spending time when you start/stop the
profiling. As an alternative to the guess, change parameters, repeat
approach, this is the most direct way to find the exact nature of the
problem.

- Luke