Re: [PERFORM] Very slow update statement on 40mio rows

2013-02-26 Thread Florian Schröck
Hello Merlin,
thanks for the feedback, I forwarded this to my developer, this is an
interesting approach.


-- 

Best regards

Florian Schröck

On 02/19/2013 09:04 PM, Merlin Moncure wrote:
 On Fri, Feb 15, 2013 at 9:32 AM, Florian Schröck fschro...@aycan.de wrote:
 Hello Kevin,
 not updating every row which doesn't need the update solved the problem!
 Your query took only 1 minute. :)

 Thank you so much for the fast response, have a great weekend!

 PS: When you switch to TEXT on the explain URL you can see the final
 runtime which was 66 minutes with the original statement.

 Best regards,
 Florian

 On 02/15/2013 03:59 PM, Kevin Grittner wrote:
 Florian Schröck fschro...@aycan.de wrote:

 UPDATE BackupFiles
SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0,
cStatus='NEW'::StatusT, bOnSetBlue=false,
bOnSetYellow=false, nLastBackupTS= '0001-01-01 00:00:00'
WHERE cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE';

 Explain analyze: http://explain.depesz.com/s/8y5
 The statement takes 60-90 minutes.
 The EXPLAIN ANALYZE at that URL shows a runtime of 3 minutes and 41
 seconds.

 I tried to optimize the settings but until now without success.

 Can we optimize this update statement somehow? Do you have any
 other ideas?
 Are there any rows which would already have the values that you are
 setting?  If so, it would be faster to skip those by using this
 query:

 UPDATE BackupFiles
   SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0,
   cStatus='NEW'::StatusT, bOnSetBlue=false,
   bOnSetYellow=false, nLastBackupTS= '0001-01-01 00:00:00'
   WHERE (cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE')
 AND (nTapeNr  0 OR nAFIOCounter  0 OR nBlockCounter  0
  OR cStatus  'NEW'::StatusT
  OR bOnSetBlue IS DISTINCT FROM false
  OR bOnSetYellow IS DISTINCT FROM false
  OR nLastBackupTS  '0001-01-01 00:00:00');

 Another way to accomplish this is with the
 suppress_redundant_updates_trigger() trigger function:

 http://www.postgresql.org/docs/9.2/interactive/functions-trigger.html
 if the number of rows you actually update is not very large relative
 to size of the table, just for fun, try this:


 CREATE OR REPLACE FUNCTION BakupFilesCandidateReset(BackupFiles)
 RETURNS BOOL AS
 $$
   SELECT ($1).cStatus IN('NEW', 'WRITING', 'ONTAPE')
   AND
 (($1).nTapeNr, ($1).nAFIOCounter, ($1).nBlockCounter,
 ($1).cStatus, ($1).bOnSetBlue, ($1).bOnSetYellow, ($1).nLastBackupTS)
 IS DISTINCT FROM /* simple != will suffice if values are never null */
(0, 0, 0, 'NEW'::StatusT, false, false, '0001-01-01 00:00:00');
 $$ LANGUAGE SQL IMMUTABLE;

 CREATE INDEX ON BackupFiles(BakupFilesCandidateReset(BackupFiles))
   WHERE BakupFilesCandidateReset(BackupFiles);


 SELECT * FROM BackupFiles WHERE BakupFilesCandidateReset(BackupFiles);
 UPDATE BackupFiles SET ... WHERE BakupFilesCandidateReset(BackupFiles);
 etc

 idea here is to maintain partial boolean index representing candidate
 records to update.  plus it's nifty.   this is basic mechanism that
 can be used as foundation for very fast push pull queues.

 merlin




Re: [PERFORM] Server stalls, all CPU 100% system time

2013-02-26 Thread Andre

On 25/02/2013 12:45 AM, Tom Lane wrote:
Odd. I wonder if you are seeing some variant of the old context swap storm problem. The 99.8% system time reading is suggestive but hardly conclusive. Does top's report of context swap rate go to the moon? It would be interesting to strace a few of the server processes while one of these events is happening, too. regards, tom lane 


I used vmstat to look at the context swaps, they were around 5k and 15k 
interrupts per second.
I thought that it was to many interrupts and after a bit of search a website 
mentioned that the network card driver could cause that. After updating kernel 
and the driver the stalling is not reproducible any more.

Weird enough, when I load test the server now I have 35k interrupts and 250k 
context switches, but no problems at all.

Thanks for pointing me into the right direction.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Are bitmap index scans slow to start?

2013-02-26 Thread Jeff Janes
On Mon, Feb 25, 2013 at 9:04 AM, Carlo Stonebanks 
stonec.regis...@sympatico.ca wrote:

 Hi Jeff, thanks for the insight.

 ** **

  And then the next question would be, once they are in the cache, why
 don't they stay there?  For that you would have to know what other types of
 activities are going on that might be driving the data out of the cache.**
 **

 ** **

 ** **

 To give you an idea of the activity level, each physical machine hosts
 multiple DB’s with the same structure – one DB per client.

 ** **

 We run automated ETL processes which digests client feeds (E) normalizes
 them (T) and then stores them in our DB (L).

 ** **

 Looking at the stats from our audit log, the average feed load is 4 hours,
 divided up into 14 client sessions. Each session averages about 50 write
 (update, insert, no deletes) operations per second, representing 700 write
 operations per second.


Is each of these write operations just covering a single row?  Does this
description apply to just one of the many (how many?) databases, so that
there are really 14*N concurrent sessions?


 The ratio of reads per write is pretty high as the system goes through the
 transformation process.

 ** **

 Since I don’t know how this compares to other PG installations, the
 question of using periodic REINDEX and CLUSTER brings up these questions:*
 ***

 ** **

 **1)  **Because we are hosting multiple DB’s, what is the impact on
 OS and disk caches?


They have to share the RAM.  One strategy would be run ETL processes only
one at a time, rather than trying to run several concurrently, if that is
what you are doing.  That way you can concentrate one customers data in
RAM, and then another's, to reduce the competition.


 

 **2)  **Is there an automated CLUSTER and REINDEX strategy that will
 not interfere with normal operations?

 **3)  **By PG standards, is this a busy DB - and does explain why the
 general caches expire?


You really need to know whether those reads and writes are concentrated in
a small region (relative to the amount of your RAM), or widely scattered.
If you are reading and writing intensively (which you do seem to be doing)
but only within a compact region, then it should not drive other data out
of the cache.  But, since you do seem to have IO problems from cache
misses, and you do have a high level of activity, the easy conclusion is
that you have too little RAM to hold the working size of your data.

Cheers,

Jeff


[PERFORM] Estimation question...

2013-02-26 Thread Matt Daw
Howdy, the query generator in my app sometimes creates redundant
filters of the form:

project_id IN ( list of projects user has permission to see ) AND
project_id = single project user is looking at 

... and this is leading to a bad estimate (and thus a bad plan) on a
few complex queries. I've included simplified examples below. This
server is running 9.0.10 and the statistics target has been updated to
1000 on the project_id column. I've also loaded the one table into a
9.2.2 instance and replicated the behaviour.

I can change how the query is being generated, but I'm curious why I'm
getting a bad estimate. Is this an expected result?

Thanks!

Matt

=

1) Filter on project_id only, row estimate for Bitmap Index Scan quite good.

explain (analyze,buffers) select count(id) from versions WHERE project_id=115;

 QUERY PLAN
---
 Aggregate  (cost=1218111.01..1218111.02 rows=1 width=4) (actual
time=1531.341..1531.342 rows=1 loops=1)
   Buffers: shared hit=452619
   -  Bitmap Heap Scan on versions  (cost=34245.06..1215254.86
rows=1142461 width=4) (actual time=148.394..1453.383 rows=1114197
loops=1)
 Recheck Cond: (project_id = 115)
 Buffers: shared hit=452619
 -  Bitmap Index Scan on versions_project_id
(cost=0.00..33959.45 rows=1142461 width=0) (actual
time=139.709..139.709 rows=1116037 loops=1)
   Index Cond: (project_id = 115)
   Buffers: shared hit=22077
 Total runtime: 1531.399 ms

2) Filter on project_id IN () AND project_id. Row estimate is ~10x lower.

explain (analyze,buffers) select count(id) from versions WHERE
project_id IN (80,115) AND project_id=115;;

QUERY PLAN
-
 Aggregate  (cost=327066.18..327066.19 rows=1 width=4) (actual
time=1637.889..1637.889 rows=1 loops=1)
   Buffers: shared hit=458389
   -  Bitmap Heap Scan on versions  (cost=3546.56..326793.17
rows=109201 width=4) (actual time=155.107..1557.453 rows=1114180
loops=1)
 Recheck Cond: ((project_id = ANY ('{80,115}'::integer[])) AND
(project_id = 115))
 Buffers: shared hit=458389
 -  Bitmap Index Scan on versions_project_id
(cost=0.00..3519.26 rows=109201 width=0) (actual time=145.502..145.502
rows=1125436 loops=1)
   Index Cond: ((project_id = ANY ('{80,115}'::integer[]))
AND (project_id = 115))
   Buffers: shared hit=22076
 Total runtime: 1637.941 ms


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] High CPU usage / load average after upgrading to Ubuntu 12.04

2013-02-26 Thread Jeff Janes
On Fri, Feb 15, 2013 at 10:52 AM, Scott Marlowe scott.marl...@gmail.comwrote:

 On Fri, Feb 15, 2013 at 11:26 AM, Josh Berkus j...@agliodbs.com wrote:
  On 02/14/2013 08:47 PM, Scott Marlowe wrote:
  If you run your benchmarks for more than a few minutes I highly
  recommend enabling sysstat service data collection, then you can look
  at it after the fact with sar.  VERY useful stuff both for
  benchmarking and post mortem on live servers.
 
  Well, background sar, by default on Linux, only collects every 30min.
  For a benchmark run, you want to generate your own sar file, for example:

 On all my machines (debian and ubuntu) it collects every 5.


All of mine were 10, but once I figured out to edit /etc/cron.d/sysstat
they are now every 1 minute.

sar has some remarkably opaque documentation, but I'm glad I tracked that
down.

Cheers,

Jeff


Re: [PERFORM] High CPU usage / load average after upgrading to Ubuntu 12.04

2013-02-26 Thread Scott Marlowe
On Tue, Feb 26, 2013 at 2:30 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Fri, Feb 15, 2013 at 10:52 AM, Scott Marlowe scott.marl...@gmail.com
 wrote:

 On Fri, Feb 15, 2013 at 11:26 AM, Josh Berkus j...@agliodbs.com wrote:
  On 02/14/2013 08:47 PM, Scott Marlowe wrote:
  If you run your benchmarks for more than a few minutes I highly
  recommend enabling sysstat service data collection, then you can look
  at it after the fact with sar.  VERY useful stuff both for
  benchmarking and post mortem on live servers.
 
  Well, background sar, by default on Linux, only collects every 30min.
  For a benchmark run, you want to generate your own sar file, for
  example:

 On all my machines (debian and ubuntu) it collects every 5.


 All of mine were 10, but once I figured out to edit /etc/cron.d/sysstat they
 are now every 1 minute.

oh yeah it's every 10 on the 5s.  I too need to go to 1minute intervals.

 sar has some remarkably opaque documentation, but I'm glad I tracked that
 down.

It's so incredibly useful.  When a machine is acting up often getting
it back online is more important than fixing it right then, and most
of the system state stuff is lost on reboot / fix.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Server stalls, all CPU 100% system time

2013-02-26 Thread Josh Berkus
Andre,

Please see the related thread on this list, High CPU usage / load
average after upgrading to Ubuntu 12.04.  You may be experiencing some
of the same issues.  General perspective seems to be that kernels 3.0
through 3.4 have serious performance issues.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Are bitmap index scans slow to start?

2013-02-26 Thread Carlo Stonebanks
Is each of these write operations just covering a single row?  Does this
description apply to just one of the many (how many?) databases, so that
there are really 14*N concurrent sessions?

 

 

All writes are single row. All DB's have exactly the same structure, only
the content is different. Currently the server is hosting five active DB's -
although there 14 DB's actually on the host, the balance are backups and or
testing environments.

 

When a feed comes in, it can be anything from dozens to millions of rows,
and may take minutes or days to run. I had asked that PG bouncer be
installed in front of the host to act as a traffic cop. Try as I may to
convince the engineering team that fewer sessions running faster is optimal,
they say that the 14 concurrent sessions is based on real-world experience
of what imports the fastest.

 

 You really need to know whether those reads and writes are concentrated
in a small region (relative to the amount of your RAM), or widely scattered.
If you are reading and writing intensively (which you do seem to be doing)
but only within a compact region, then it should not drive other data out of
the cache.  But, since you do seem to have IO problems from cache misses,
and you do have a high level of activity, the easy conclusion is that you
have too little RAM to hold the working size of your data.


 

It won't be a problem of physical RAM, I believe there is at least 32GB of
RAM. What constitutes a compact region? The ETL process takes the feed and
distributes it to 85 core tables.  I have been through many PG configuration
cycles with the generous help of people in this forum. I think the big
problem when getting help has been this issue of those offering assistance
understanding that the whopping majority of the time, the system is
performing single row reads and writes. The assumption tends to be that the
end point of an ETL should just be a series of COPY statements, and it
should all happen very quickly in classic SQL bulk queries.



Re: [PERFORM] Server stalls, all CPU 100% system time

2013-02-26 Thread Scott Marlowe
On Tue, Feb 26, 2013 at 4:29 PM, Josh Berkus j...@agliodbs.com wrote:
 Andre,

 Please see the related thread on this list, High CPU usage / load
 average after upgrading to Ubuntu 12.04.  You may be experiencing some
 of the same issues.  General perspective seems to be that kernels 3.0
 through 3.4 have serious performance issues.

Someone commented they think it might be related to this kernel bug:

https://lkml.org/lkml/2012/10/9/210


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance