Re: [PERFORM] Very slow update statement on 40mio rows
Hello Merlin, thanks for the feedback, I forwarded this to my developer, this is an interesting approach. -- Best regards Florian Schröck On 02/19/2013 09:04 PM, Merlin Moncure wrote: On Fri, Feb 15, 2013 at 9:32 AM, Florian Schröck fschro...@aycan.de wrote: Hello Kevin, not updating every row which doesn't need the update solved the problem! Your query took only 1 minute. :) Thank you so much for the fast response, have a great weekend! PS: When you switch to TEXT on the explain URL you can see the final runtime which was 66 minutes with the original statement. Best regards, Florian On 02/15/2013 03:59 PM, Kevin Grittner wrote: Florian Schröck fschro...@aycan.de wrote: UPDATE BackupFiles SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0, cStatus='NEW'::StatusT, bOnSetBlue=false, bOnSetYellow=false, nLastBackupTS= '0001-01-01 00:00:00' WHERE cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE'; Explain analyze: http://explain.depesz.com/s/8y5 The statement takes 60-90 minutes. The EXPLAIN ANALYZE at that URL shows a runtime of 3 minutes and 41 seconds. I tried to optimize the settings but until now without success. Can we optimize this update statement somehow? Do you have any other ideas? Are there any rows which would already have the values that you are setting? If so, it would be faster to skip those by using this query: UPDATE BackupFiles SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0, cStatus='NEW'::StatusT, bOnSetBlue=false, bOnSetYellow=false, nLastBackupTS= '0001-01-01 00:00:00' WHERE (cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE') AND (nTapeNr 0 OR nAFIOCounter 0 OR nBlockCounter 0 OR cStatus 'NEW'::StatusT OR bOnSetBlue IS DISTINCT FROM false OR bOnSetYellow IS DISTINCT FROM false OR nLastBackupTS '0001-01-01 00:00:00'); Another way to accomplish this is with the suppress_redundant_updates_trigger() trigger function: http://www.postgresql.org/docs/9.2/interactive/functions-trigger.html if the number of rows you actually update is not very large relative to size of the table, just for fun, try this: CREATE OR REPLACE FUNCTION BakupFilesCandidateReset(BackupFiles) RETURNS BOOL AS $$ SELECT ($1).cStatus IN('NEW', 'WRITING', 'ONTAPE') AND (($1).nTapeNr, ($1).nAFIOCounter, ($1).nBlockCounter, ($1).cStatus, ($1).bOnSetBlue, ($1).bOnSetYellow, ($1).nLastBackupTS) IS DISTINCT FROM /* simple != will suffice if values are never null */ (0, 0, 0, 'NEW'::StatusT, false, false, '0001-01-01 00:00:00'); $$ LANGUAGE SQL IMMUTABLE; CREATE INDEX ON BackupFiles(BakupFilesCandidateReset(BackupFiles)) WHERE BakupFilesCandidateReset(BackupFiles); SELECT * FROM BackupFiles WHERE BakupFilesCandidateReset(BackupFiles); UPDATE BackupFiles SET ... WHERE BakupFilesCandidateReset(BackupFiles); etc idea here is to maintain partial boolean index representing candidate records to update. plus it's nifty. this is basic mechanism that can be used as foundation for very fast push pull queues. merlin
Re: [PERFORM] Server stalls, all CPU 100% system time
On 25/02/2013 12:45 AM, Tom Lane wrote: Odd. I wonder if you are seeing some variant of the old context swap storm problem. The 99.8% system time reading is suggestive but hardly conclusive. Does top's report of context swap rate go to the moon? It would be interesting to strace a few of the server processes while one of these events is happening, too. regards, tom lane I used vmstat to look at the context swaps, they were around 5k and 15k interrupts per second. I thought that it was to many interrupts and after a bit of search a website mentioned that the network card driver could cause that. After updating kernel and the driver the stalling is not reproducible any more. Weird enough, when I load test the server now I have 35k interrupts and 250k context switches, but no problems at all. Thanks for pointing me into the right direction. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Are bitmap index scans slow to start?
On Mon, Feb 25, 2013 at 9:04 AM, Carlo Stonebanks stonec.regis...@sympatico.ca wrote: Hi Jeff, thanks for the insight. ** ** And then the next question would be, once they are in the cache, why don't they stay there? For that you would have to know what other types of activities are going on that might be driving the data out of the cache.** ** ** ** ** ** To give you an idea of the activity level, each physical machine hosts multiple DB’s with the same structure – one DB per client. ** ** We run automated ETL processes which digests client feeds (E) normalizes them (T) and then stores them in our DB (L). ** ** Looking at the stats from our audit log, the average feed load is 4 hours, divided up into 14 client sessions. Each session averages about 50 write (update, insert, no deletes) operations per second, representing 700 write operations per second. Is each of these write operations just covering a single row? Does this description apply to just one of the many (how many?) databases, so that there are really 14*N concurrent sessions? The ratio of reads per write is pretty high as the system goes through the transformation process. ** ** Since I don’t know how this compares to other PG installations, the question of using periodic REINDEX and CLUSTER brings up these questions:* *** ** ** **1) **Because we are hosting multiple DB’s, what is the impact on OS and disk caches? They have to share the RAM. One strategy would be run ETL processes only one at a time, rather than trying to run several concurrently, if that is what you are doing. That way you can concentrate one customers data in RAM, and then another's, to reduce the competition. **2) **Is there an automated CLUSTER and REINDEX strategy that will not interfere with normal operations? **3) **By PG standards, is this a busy DB - and does explain why the general caches expire? You really need to know whether those reads and writes are concentrated in a small region (relative to the amount of your RAM), or widely scattered. If you are reading and writing intensively (which you do seem to be doing) but only within a compact region, then it should not drive other data out of the cache. But, since you do seem to have IO problems from cache misses, and you do have a high level of activity, the easy conclusion is that you have too little RAM to hold the working size of your data. Cheers, Jeff
[PERFORM] Estimation question...
Howdy, the query generator in my app sometimes creates redundant filters of the form: project_id IN ( list of projects user has permission to see ) AND project_id = single project user is looking at ... and this is leading to a bad estimate (and thus a bad plan) on a few complex queries. I've included simplified examples below. This server is running 9.0.10 and the statistics target has been updated to 1000 on the project_id column. I've also loaded the one table into a 9.2.2 instance and replicated the behaviour. I can change how the query is being generated, but I'm curious why I'm getting a bad estimate. Is this an expected result? Thanks! Matt = 1) Filter on project_id only, row estimate for Bitmap Index Scan quite good. explain (analyze,buffers) select count(id) from versions WHERE project_id=115; QUERY PLAN --- Aggregate (cost=1218111.01..1218111.02 rows=1 width=4) (actual time=1531.341..1531.342 rows=1 loops=1) Buffers: shared hit=452619 - Bitmap Heap Scan on versions (cost=34245.06..1215254.86 rows=1142461 width=4) (actual time=148.394..1453.383 rows=1114197 loops=1) Recheck Cond: (project_id = 115) Buffers: shared hit=452619 - Bitmap Index Scan on versions_project_id (cost=0.00..33959.45 rows=1142461 width=0) (actual time=139.709..139.709 rows=1116037 loops=1) Index Cond: (project_id = 115) Buffers: shared hit=22077 Total runtime: 1531.399 ms 2) Filter on project_id IN () AND project_id. Row estimate is ~10x lower. explain (analyze,buffers) select count(id) from versions WHERE project_id IN (80,115) AND project_id=115;; QUERY PLAN - Aggregate (cost=327066.18..327066.19 rows=1 width=4) (actual time=1637.889..1637.889 rows=1 loops=1) Buffers: shared hit=458389 - Bitmap Heap Scan on versions (cost=3546.56..326793.17 rows=109201 width=4) (actual time=155.107..1557.453 rows=1114180 loops=1) Recheck Cond: ((project_id = ANY ('{80,115}'::integer[])) AND (project_id = 115)) Buffers: shared hit=458389 - Bitmap Index Scan on versions_project_id (cost=0.00..3519.26 rows=109201 width=0) (actual time=145.502..145.502 rows=1125436 loops=1) Index Cond: ((project_id = ANY ('{80,115}'::integer[])) AND (project_id = 115)) Buffers: shared hit=22076 Total runtime: 1637.941 ms -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] High CPU usage / load average after upgrading to Ubuntu 12.04
On Fri, Feb 15, 2013 at 10:52 AM, Scott Marlowe scott.marl...@gmail.comwrote: On Fri, Feb 15, 2013 at 11:26 AM, Josh Berkus j...@agliodbs.com wrote: On 02/14/2013 08:47 PM, Scott Marlowe wrote: If you run your benchmarks for more than a few minutes I highly recommend enabling sysstat service data collection, then you can look at it after the fact with sar. VERY useful stuff both for benchmarking and post mortem on live servers. Well, background sar, by default on Linux, only collects every 30min. For a benchmark run, you want to generate your own sar file, for example: On all my machines (debian and ubuntu) it collects every 5. All of mine were 10, but once I figured out to edit /etc/cron.d/sysstat they are now every 1 minute. sar has some remarkably opaque documentation, but I'm glad I tracked that down. Cheers, Jeff
Re: [PERFORM] High CPU usage / load average after upgrading to Ubuntu 12.04
On Tue, Feb 26, 2013 at 2:30 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Fri, Feb 15, 2013 at 10:52 AM, Scott Marlowe scott.marl...@gmail.com wrote: On Fri, Feb 15, 2013 at 11:26 AM, Josh Berkus j...@agliodbs.com wrote: On 02/14/2013 08:47 PM, Scott Marlowe wrote: If you run your benchmarks for more than a few minutes I highly recommend enabling sysstat service data collection, then you can look at it after the fact with sar. VERY useful stuff both for benchmarking and post mortem on live servers. Well, background sar, by default on Linux, only collects every 30min. For a benchmark run, you want to generate your own sar file, for example: On all my machines (debian and ubuntu) it collects every 5. All of mine were 10, but once I figured out to edit /etc/cron.d/sysstat they are now every 1 minute. oh yeah it's every 10 on the 5s. I too need to go to 1minute intervals. sar has some remarkably opaque documentation, but I'm glad I tracked that down. It's so incredibly useful. When a machine is acting up often getting it back online is more important than fixing it right then, and most of the system state stuff is lost on reboot / fix. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Server stalls, all CPU 100% system time
Andre, Please see the related thread on this list, High CPU usage / load average after upgrading to Ubuntu 12.04. You may be experiencing some of the same issues. General perspective seems to be that kernels 3.0 through 3.4 have serious performance issues. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Are bitmap index scans slow to start?
Is each of these write operations just covering a single row? Does this description apply to just one of the many (how many?) databases, so that there are really 14*N concurrent sessions? All writes are single row. All DB's have exactly the same structure, only the content is different. Currently the server is hosting five active DB's - although there 14 DB's actually on the host, the balance are backups and or testing environments. When a feed comes in, it can be anything from dozens to millions of rows, and may take minutes or days to run. I had asked that PG bouncer be installed in front of the host to act as a traffic cop. Try as I may to convince the engineering team that fewer sessions running faster is optimal, they say that the 14 concurrent sessions is based on real-world experience of what imports the fastest. You really need to know whether those reads and writes are concentrated in a small region (relative to the amount of your RAM), or widely scattered. If you are reading and writing intensively (which you do seem to be doing) but only within a compact region, then it should not drive other data out of the cache. But, since you do seem to have IO problems from cache misses, and you do have a high level of activity, the easy conclusion is that you have too little RAM to hold the working size of your data. It won't be a problem of physical RAM, I believe there is at least 32GB of RAM. What constitutes a compact region? The ETL process takes the feed and distributes it to 85 core tables. I have been through many PG configuration cycles with the generous help of people in this forum. I think the big problem when getting help has been this issue of those offering assistance understanding that the whopping majority of the time, the system is performing single row reads and writes. The assumption tends to be that the end point of an ETL should just be a series of COPY statements, and it should all happen very quickly in classic SQL bulk queries.
Re: [PERFORM] Server stalls, all CPU 100% system time
On Tue, Feb 26, 2013 at 4:29 PM, Josh Berkus j...@agliodbs.com wrote: Andre, Please see the related thread on this list, High CPU usage / load average after upgrading to Ubuntu 12.04. You may be experiencing some of the same issues. General perspective seems to be that kernels 3.0 through 3.4 have serious performance issues. Someone commented they think it might be related to this kernel bug: https://lkml.org/lkml/2012/10/9/210 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance