[PERFORM] Can lots of small writes badly hamper reads from other tables?

2012-01-24 Thread Dave Crooke
Hi folks

This could be a sheer volume issue, but I thought I would ask the wisdom of
this forum as to next investigative steps.



We use PostgreSQL 8.4.4 which is bundled with our application as a VMware
virtual appliance. The bulk of the app's database activity is recording
performance data points which arrive in fairly large sustained bursts of
perhaps 10,000 rows a minute at a medium-sized customer; each data point is
a logically separate item and is committed as an individual transaction
(JDBC auto-commit mode). Our offshore QA team was assigned to track an
intermittent issue with the speed of some large queries on other tables, and
they believe, based on correlation, that the two activities may be contending.

The large query reads from different tables than the ones being
written to ... the raw data goes into a table named by day (partitioning is
all within the app, not PG), e.g. PERF_RAW_2012_01_24, and then there are a
bunch of rollup statements which run hourly to do the aggregations, e.g.

insert into PERF_HOURLY_2012_01_24
  select key_columns, avg(data), now()
    from perf_raw_2012_01_24
   where time_stamp between (now() - interval '1 hour') and now()
   group by key_columns

The big queries are hitting multiple of the PERF_HOURLY tables and pulling
a few dozen rows from each.

We are using a 64-bit VM with 8 virtual cores and 8GB RAM, of which Java
takes a bit over half; the OS is CentOS 5.x. PG has 1GB of
buffer cache and reasonable (AFAICT) resource limits for everything else,
which are intended to be workable for a range of client sizes out of the
box. True transactional consistency is disabled for performance reasons;
virtual environments do not take kindly to lots of small writes.

---

Is there any tweaking we should do on the PG settings, or on the pattern in
which the app is writing? We currently use 10 writer threads on the Java
side and they keep PG going pretty well.

I considered bundling the writes into larger transactions; will that really
help much with commit consistency off?

Is there some specific "usual suspect" stuff I should look at on the PG
side for efficiency issues such as index lock contention or a poor
buffer cache hit ratio? Will doing EXPLAIN ANALYSE on the big query be
informative, and if so, does it need to be done while the write load is
applied?

The other wacky idea I had was to have the writer threads pause or
throttle themselves when a big query is happening (it's all in one JVM and
we are using a connection pooler, so it's easy to intercept and track if
needed); however, that strikes me as a rather ugly hack and I'd prefer to do
something more robust, based on config tweaks that leverage existing
resource management in PG.

Relevant schema and config attached, all comments and advice welcome,
including general tuning tips and rationale for moving to PG 9.x  I'm
well aware this isn't the acme of PG tuning :)

Cheers
Dave


postgresql.conf
Description: Binary data
-- The current day's perf_raw table contains the extra b index
-- to facilitate certain queries including the rollup ones which
-- create the perf_hourly tables. This index is dropped once the
-- table is no longer being written to.

      Table "public.perf_raw_2011_12_31"
     Column      |            Type             | Modifiers
-----------------+-----------------------------+-----------
 guid_key        | integer                     |
 property_key    | integer                     |
 instance_id_key | integer                     |
 time_stamp      | timestamp without time zone |
 value           | double precision            |
Indexes:
    "perf_raw_2011_12_31_a" btree (guid_key, property_key, time_stamp)
    "perf_raw_2011_12_31_b" btree (time_stamp, property_key)

      Table "public.perf_raw_2011_12_29"
     Column      |            Type             | Modifiers
-----------------+-----------------------------+-----------
 guid_key        | integer                     |
 property_key    | integer                     |
 instance_id_key | integer                     |
 time_stamp      | timestamp without time zone |
 value           | double precision            |
Indexes:
    "perf_raw_2011_12_29_a" btree (guid_key, property_key, time_stamp)

      Table "public.perf_hourly_2011_12_29"
     Column      |            Type             | Modifiers
-----------------+-----------------------------+-----------
 guid_key        | integer                     |
 property_key    | integer                     |
 instance_id_key | integer                     |
 time_stamp      | timestamp without time zone |
 value           | double precision            |
Indexes:
    "perf_hourly_2011_12_29_a" btree (guid_key, property_key, time_stamp)




[PERFORM] Cursor fetch performance issue

2012-01-24 Thread Tony Capobianco
We are migrating our Oracle warehouse to Postgres 9.

This function responds well:

pg=# select public.getMemberAdminPrevious_sp2(247815829, 
1,'test.em...@hotmail.com', 'email', 'test');
 getmemberadminprevious_sp2 
----------------------------
 <unnamed portal 1>
(1 row)

Time: 7.549 ms

However, when testing, this fetch takes upwards of 38 minutes:

BEGIN;
select public.getMemberAdminPrevious_sp2(247815829, 1,'test.em...@hotmail.com', 
'email', 'test');
FETCH ALL IN "<unnamed portal 2>";

How can I diagnose any performance issues with the fetch in the cursor?

Thanks.
Tony




Re: [PERFORM] Cursor fetch performance issue

2012-01-24 Thread Pavel Stehule
Hello

2012/1/24 Tony Capobianco tcapobia...@prospectiv.com:
 We are migrating our Oracle warehouse to Postgres 9.

 This function responds well:

 pg=# select public.getMemberAdminPrevious_sp2(247815829, 
 1,'test.em...@hotmail.com', 'email', 'test');
  getmemberadminprevious_sp2
 
  unnamed portal 1
 (1 row)

 Time: 7.549 ms

 However, when testing, this fetch takes upwards of 38 minutes:

 BEGIN;
 select public.getMemberAdminPrevious_sp2(247815829, 
 1,'test.em...@hotmail.com', 'email', 'test');
 FETCH ALL IN unnamed portal 2;

 How can I diagnose any performance issues with the fetch in the cursor?


Cursors are optimized to return a small subset of the result - if you plan
to read the complete result, then set

set cursor_tuple_fraction to 1.0;

This is a session config value; you can set it before the queries that open the cursors.
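
For illustration, a minimal sketch of using it for one session (the function
call is the one from above; the portal name in the FETCH is whatever name
the SELECT actually returns):

BEGIN;
SET cursor_tuple_fraction = 1.0;  -- plan for reading the whole result
SELECT public.getMemberAdminPrevious_sp2(247815829, 1,
       'test.em...@hotmail.com', 'email', 'test');
FETCH ALL IN "<unnamed portal 2>";
COMMIT;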

Regards

Pavel Stehule

 Thanks.
 Tony





Re: [PERFORM] Cursor fetch performance issue

2012-01-24 Thread Tony Capobianco
Running just the sql of the function returns only 10 rows:

pg=# SELECT m.memberid, m.websiteid, m.emailaddress, 
pg-#        m.firstname, m.lastname, m.regcomplete, m.emailok
pg-# FROM   members m
pg-# WHERE  m.emailaddress LIKE 'test.em...@hotmail.com'
pg-# AND    m.changedate_id < 5868 ORDER BY m.emailaddress, m.websiteid;
 memberid  | websiteid |      emailaddress      | firstname | lastname | regcomplete | emailok 
-----------+-----------+------------------------+-----------+----------+-------------+---------
 247815829 |         1 | test.em...@hotmail.com | email     | test     |           1 |       1
 300960335 |        62 | test.em...@hotmail.com |           |          |           1 |       1
 300959937 |       625 | test.em...@hotmail.com |           |          |           1 |       1
 260152830 |      1453 | test.em...@hotmail.com |           |          |           1 |       1
 300960163 |      1737 | test.em...@hotmail.com | email     | test     |           1 |       1
 300960259 |      1824 | test.em...@hotmail.com | email     | test     |           1 |       1
 300959742 |      1928 | test.em...@hotmail.com | email     | test     |           1 |       1
 368122699 |      2457 | test.em...@hotmail.com | email     | test     |           1 |       1
 403218613 |      2464 | test.em...@hotmail.com | email     | test     |           1 |       0
 378951994 |      2656 | test.em...@hotmail.com |           |          |           1 |       1
(10 rows)

Time: 132.626 ms

So, it would seem that's a small enough number of rows.  Unfortunately, 
issuing: 

set cursor_tuple_fraction to 1.0;

Did not have an effect on performance.  Is it common to modify this
cursor_tuple_fraction parameter each time we execute the function?


On Tue, 2012-01-24 at 21:47 +0100, Pavel Stehule wrote:
 Hello
 
 2012/1/24 Tony Capobianco tcapobia...@prospectiv.com:
  We are migrating our Oracle warehouse to Postgres 9.
 
  This function responds well:
 
  pg=# select public.getMemberAdminPrevious_sp2(247815829, 
  1,'test.em...@hotmail.com', 'email', 'test');
   getmemberadminprevious_sp2
  
   unnamed portal 1
  (1 row)
 
  Time: 7.549 ms
 
  However, when testing, this fetch takes upwards of 38 minutes:
 
  BEGIN;
  select public.getMemberAdminPrevious_sp2(247815829, 
  1,'test.em...@hotmail.com', 'email', 'test');
  FETCH ALL IN unnamed portal 2;
 
  How can I diagnose any performance issues with the fetch in the cursor?
 
 
 Cursors are optimized to returns small subset of result - if you plan
 to read complete result, then set
 
 set cursor_tuple_fraction to 1.0;
 
 this is session config value, you can set it before selected cursors queries
 
 Regards
 
 Pavel Stehule
 
  Thanks.
  Tony
 
 
 





Re: [PERFORM] Can lots of small writes badly hamper reads from other tables?

2012-01-24 Thread Andy Colson

On 1/24/2012 2:16 PM, Dave Crooke wrote:

Hi folks

This could be a sheer volume issue, but I though I would ask the wisdom
of this forum as to next investigative steps.



We use PostgreSQL 8.4.4 which is bundled with our application as a
VMware virtual appliance. The bulk of the app's database activity is
recording performance data points which arrive in farily large sustained
bursts of perhaps 10,000 rows a minute at a medium sized customer, each
of which are logically separate items and being committed as individual
transactions (JDBC auto-commit mode). Our offshore QA team was assigned
to track an intermittent issue with speed of some large queries on other
tables, and they believe based on correlation the two activities may be
contending.


You have 10 connections, all doing:

begin
insert into PERF_RAW_2012_01_24  -- one record
commit


If that's what you're doing, yes, I'd say that's the slowest way possible.

Doing this would be faster:

begin
insert into PERF_RAW_2012_01_24  -- one record
insert into PERF_RAW_2012_01_24  -- one record
...
insert into PERF_RAW_2012_01_24  -- one record
commit

Doing this would be even faster:


begin
-- one insert, multiple rows
insert into PERF_RAW_2012_01_24 values (...), (...), (...), ..., (...);
insert into PERF_RAW_2012_01_24 values (...), (...), (...), ..., (...);
commit

And, fastest of all, use COPY.  But be careful, it's so fast 
it'll melt your face off :-)
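
For illustration, a minimal sketch of a COPY load into the raw table, using
the column names from the schema posted earlier (the data rows are made-up
placeholders; from Java the same stream can be pushed through the JDBC
driver's CopyManager):

COPY perf_raw_2012_01_24 (guid_key, property_key, instance_id_key, time_stamp, value)
FROM STDIN WITH CSV;
42,17,1001,2012-01-24 14:00:00,0.95
42,18,1001,2012-01-24 14:00:01,0.97
\.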



I didn't even bother trying to pick out the uncommented settings from 
your .conf file.  Way too much work.


VMs usually have pretty slow IO, so you might wanna watch vmstat and 
iostat to see if you are IO bound or CPU bound.


Also watching iostat before and after the change might be interesting.

If you keep having lots and lots of transactions, look into 
commit_delay; it'll help batch commits out to disk (if I remember 
correctly).
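
For illustration, the relevant postgresql.conf knobs would be something like
this (the values are placeholders, not recommendations):

commit_delay = 1000      # microseconds to wait so nearby commits share one flush
commit_siblings = 5      # only delay if at least this many other transactions are active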


-Andy



Re: [PERFORM] Can lots of small writes badly hamper reads from other tables?

2012-01-24 Thread Tomas Vondra
On 24 Leden 2012, 21:16, Dave Crooke wrote:
 Hi folks

 This could be a sheer volume issue, but I though I would ask the wisdom of
 this forum as to next investigative steps.

 

 We use PostgreSQL 8.4.4 which is bundled with our application as a VMware
 virtual appliance. The bulk of the app's database activity is recording
 performance data points which arrive in farily large sustained bursts of
 perhaps 10,000 rows a minute at a medium sized customer, each of which are
 logically separate items and being committed as individual transactions
 (JDBC auto-commit mode). Our offshore QA team was assigned to track an
 intermittent issue with speed of some large queries on other tables, and
 they believe based on correlation the two activities may be contending.

 The large query is coming off of different tables from the ones being
 written to ... the raw data goes into a table named by day (partitioning
 is
 all within the app, not PG) e.g. PERF_RAW_2012_01_24 and then there are a
 bunch of rollup statements which run hourly to do the aggregations, e.g.

Each storage device has some basic I/O limits - sequential speed (read/write)
and the maximum number of I/O operations it can handle. For example, a 7.2k
drive can do up to 160MB/s of sequential reads/writes, but not more than 120
I/O ops per second. Similarly for other devices - 15k drives can do up to
250 I/Os per second. SSDs can handle many more I/Os, e.g. an Intel 320 can
handle about 8k I/Os per second.

I have no idea what kind of storage device you're using and what amount of
sequential and random operations it can handle. But my guess is that you're
hitting the limit of random I/Os - each commit requires an fsync, and you're
doing 10,000 of them per minute, i.e. about 160 per second. If the queries
need to read data from the drive (e.g. randomly), this just adds more I/Os.

 Is there any tweaking we should do on the PG settings, or on the pattern
 in
 which the app is writing - we currently use 10 writer threads on the Java
 side and they keep PG going pretty good.

The first thing you should do is group the inserts into one transaction.
That'll lower the number of I/Os the database needs to do. Besides that,
you can move the WAL to a separate (physical) device, thus spreading the
I/Os over more drives.

 I considered bundling the writes into larger transactions, will that
 really
 help much with commit consistency off?

What do you mean by commit consistency off?

 Is there some specific usual suspect stuff I should look at on the PG
 side to look for efficiency issues such as index lock contention or a poor
 buffer cache hit ratio? Will doing EXPLAIN ANALYSE on the big query be
 informative, and if so, does it need to be done while the write load is
 applied?

The first thing you should do is gathering some basic I/O stats.

Run  pg_test_fsync (a contrib module) to see how many fsync operations the
I/O subsystem can handle (if it reports more than 500, use -o to get it
running for a longer time).

Then gather vmstat 1 and iostat -x 1 for a few seconds while the workload
(inserts and queries) is actually running. That should tell you how the
drives are actually utilized.

Post these results to this list.

 Relevant schema and config attached, all comments and advice welcome,
 including general tuning tips and rationale for moving to PG 9.x  I'm
 well aware this isn't the acme of PG tuning :)

There's a nice page about tuning at the wiki:

  http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

I'd recommend significantly increasing the number of checkpoint segments,
e.g. to 64 (1GB) and setting completion target to 0.9. This usually helps
write-heavy workloads. And enable log_checkpoints.
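
For illustration, the corresponding postgresql.conf lines might be (a sketch
of the values suggested above, not tuned for this workload):

checkpoint_segments = 64              # roughly 1GB of WAL between checkpoints
checkpoint_completion_target = 0.9    # spread checkpoint I/O over more time
log_checkpoints = on                  # log checkpoint frequency and duration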

Tomas




Re: [PERFORM] Cursor fetch performance issue

2012-01-24 Thread Andy Colson

On Tue, 2012-01-24 at 21:47 +0100, Pavel Stehule wrote:

Hello

2012/1/24 Tony Capobiancotcapobia...@prospectiv.com:

We are migrating our Oracle warehouse to Postgres 9.

This function responds well:

pg=# select public.getMemberAdminPrevious_sp2(247815829, 
1,'test.em...@hotmail.com', 'email', 'test');
  getmemberadminprevious_sp2

  unnamed portal 1
(1 row)

Time: 7.549 ms

However, when testing, this fetch takes upwards of 38 minutes:

BEGIN;
select public.getMemberAdminPrevious_sp2(247815829, 1,'test.em...@hotmail.com', 
'email', 'test');
FETCH ALL IN unnamed portal 2;

How can I diagnose any performance issues with the fetch in the cursor?



Cursors are optimized to returns small subset of result - if you plan
to read complete result, then set

set cursor_tuple_fraction to 1.0;

this is session config value, you can set it before selected cursors queries

Regards

Pavel Stehule


Thanks.
Tony











On 1/24/2012 2:57 PM, Tony Capobianco wrote:
 Running just the sql of the function returns only 10 rows:

 pg=# SELECT m.memberid, m.websiteid, m.emailaddress,
 pg-#m.firstname, m.lastname, m.regcomplete, m.emailok
 pg-# FROM   members m
 pg-# WHERE  m.emailaddress LIKE 'test.em...@hotmail.com'
pg-# AND    m.changedate_id < 5868 ORDER BY m.emailaddress, 
m.websiteid;
   memberid  | websiteid |emailaddress| firstname | 
lastname | regcomplete | emailok
 
---+---++---+--+-+-
   247815829 | 1 | test.em...@hotmail.com | email | test 
   |   1 |   1
   300960335 |62 | test.em...@hotmail.com |   | 
   |   1 |   1
   300959937 |   625 | test.em...@hotmail.com |   | 
   |   1 |   1
   260152830 |  1453 | test.em...@hotmail.com |   | 
   |   1 |   1
   300960163 |  1737 | test.em...@hotmail.com | email | test 
   |   1 |   1
   300960259 |  1824 | test.em...@hotmail.com | email | test 
   |   1 |   1
   300959742 |  1928 | test.em...@hotmail.com | email | test 
   |   1 |   1
   368122699 |  2457 | test.em...@hotmail.com | email | test 
   |   1 |   1
   403218613 |  2464 | test.em...@hotmail.com | email | test 
   |   1 |   0
   378951994 |  2656 | test.em...@hotmail.com |   | 
   |   1 |   1

 (10 rows)

 Time: 132.626 ms

 So, it would seem that's a small enough number of rows. 
Unfortunately, issuing:


 set cursor_tuple_fraction to 1.0;

 Did not have an effect on performance.  Is it common to modify this
 cursor_tuple_fraction parameter each time we execute the function?




So, is getMemberAdminPrevious_sp2() preparing a statement with wildcards?

SELECT m.memberid, m.websiteid, m.emailaddress,
   m.firstname, m.lastname, m.regcomplete, m.emailok
   FROM   members m
   WHERE  m.emailaddress LIKE $1
   AND    m.changedate_id < $2
  ORDER BY m.emailaddress, m.websiteid;

Or is it creating the string and executing it:

sql = 'SELECT m.memberid, m.websiteid, m.emailaddress, '
||  ' m.firstname, m.lastname, m.regcomplete, m.emailok '
||  ' FROM   members m '
||  ' WHERE  m.emailaddress LIKE ' || arg1
||  ' AND    m.changedate_id < ' || arg2
||  ' ORDER BY m.emailaddress, m.websiteid ';
execute(sql);

Maybe it's the planner not planning so well with $1 arguments vs. actual 
arguments.


-Andy





Re: [PERFORM] Cursor fetch performance issue

2012-01-24 Thread Pavel Stehule
Hello


 So, is getMemberAdminPrevious_sp2() preparing a statement with wildcards?

 SELECT m.memberid, m.websiteid, m.emailaddress,
       m.firstname, m.lastname, m.regcomplete, m.emailok
       FROM   members m
       WHERE  m.emailaddress LIKE $1
       AND    m.changedate_id < $2
      ORDER BY m.emailaddress, m.websiteid;

 Or is it creating the string and executing it:

 sql = 'SELECT m.memberid, m.websiteid, m.emailaddress, '
    ||  ' m.firstname, m.lastname, m.regcomplete, m.emailok '
    ||  ' FROM   members m '
    ||  ' WHERE  m.emailaddress LIKE ' || arg1
    ||  ' AND    m.changedate_id < ' || arg2
    ||  ' ORDER BY m.emailaddress, m.websiteid ';
 execute(sql);

 Maybe its the planner doesnt plan so well with $1 arguments vs actual
 arguments thing.


Sure, it could be a blind-optimization problem in plpgsql. Maybe you
have to use dynamic SQL - OPEN FOR EXECUTE stmt, probably:

http://www.postgresql.org/docs/9.1/interactive/plpgsql-cursors.html
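
For illustration, a minimal sketch of what that could look like for the query
in question (assumed variable names; quote_literal() keeps the pattern value
safely quoted while letting the planner see it as a constant):

OPEN ref FOR EXECUTE
  'SELECT m.memberid, m.websiteid, m.emailaddress,
          m.firstname, m.lastname, m.regcomplete, m.emailok
     FROM members m
    WHERE m.emailaddress LIKE ' || quote_literal(p_emailaddress) || '
      AND m.changedate_id < ' || l_sysdateid || '
    ORDER BY m.emailaddress, m.websiteid';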

Regards

Pavel Stehule

 -Andy





Re: [PERFORM] Cursor fetch performance issue

2012-01-24 Thread Tom Lane
Tony Capobianco tcapobia...@prospectiv.com writes:
 Running just the sql of the function returns only 10 rows:
 pg=# SELECT m.memberid, m.websiteid, m.emailaddress, 
 pg-#m.firstname, m.lastname, m.regcomplete, m.emailok
 pg-# FROM   members m
 pg-# WHERE  m.emailaddress LIKE 'test.em...@hotmail.com'
 pg-# AND    m.changedate_id < 5868 ORDER BY m.emailaddress, 
 m.websiteid;

Based on that, I'd bet your problem is that the function is executing
WHERE  m.emailaddress LIKE $1
(for some spelling of $1) and you are therefore not getting the benefit
of the index optimizations that can happen when LIKE's pattern is
constant.  Do you actually need LIKE rather than just = here?
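
For illustration (whether this is right depends on the app's semantics), if
plain equality is enough, the predicate in the prepared statement sketched
earlier would become:

WHERE  m.emailaddress = $1
AND    m.changedate_id < $2

With '=', a btree index on emailaddress (if one exists) is usable whether the
value is a constant or a parameter; with LIKE and a parameter, the planner
cannot assume an index-friendly prefix pattern.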

regards, tom lane



Re: [PERFORM] Cursor fetch performance issue

2012-01-24 Thread Tony Capobianco
Here's the explain:

pg=# explain select getMemberAdminPrevious_sp(247815829, 
1,'test.em...@hotmail.com', 'Email', 'Test');
QUERY PLAN
--
 Result  (cost=0.00..0.26 rows=1 width=0)
(1 row)

Time: 1.167 ms

There was discussion of 'LIKE' v. '=' and wildcard characters are not
being entered into the $1 parameter.

This is not generating a sql string.  I feel it's something to do with
the fetch of the refcursor.  The cursor is part of a larger function:

CREATE OR REPLACE FUNCTION PUBLIC.GETMEMBERADMINPREVIOUS_SP2 ( 
  p_memberid   IN numeric,
  p_websiteid  IN numeric,
  p_emailaddress   IN varchar,
  p_firstname  IN varchar,
  p_lastname   IN varchar)
RETURNS refcursor AS $$
DECLARE
  ref            refcursor;
  l_sysdateid    numeric;
BEGIN
  l_sysdateid := sysdateid();
  if (p_memberid != 0) then
if (p_emailaddress IS NOT NULL) then
  OPEN ref FOR
SELECT m.memberid, m.websiteid, m.emailaddress,
   m.firstname, m.lastname, m.regcomplete, m.emailok
FROM   members m
WHERE  m.emailaddress LIKE p_emailaddress
AND    m.changedate_id < l_sysdateid
ORDER BY m.emailaddress, m.websiteid;
end if;
  end if;
  Return ref;
EXCEPTION
WHEN NO_DATA_FOUND THEN
   Return null;
END;
$$ LANGUAGE 'plpgsql';


On Tue, 2012-01-24 at 22:17 +0100, Pavel Stehule wrote:
 Hello
 
 
  So, is getMemberAdminPrevious_sp2() preparing a statement with wildcards?
 
  SELECT m.memberid, m.websiteid, m.emailaddress,
m.firstname, m.lastname, m.regcomplete, m.emailok
FROM   members m
WHERE  m.emailaddress LIKE $1
    AND    m.changedate_id < $2
   ORDER BY m.emailaddress, m.websiteid;
 
  Or is it creating the string and executing it:
 
  sql = 'SELECT m.memberid, m.websiteid, m.emailaddress, '
 ||  ' m.firstname, m.lastname, m.regcomplete, m.emailok '
 ||  ' FROM   members m '
 ||  ' WHERE  m.emailaddress LIKE ' || arg1
 ||  ' AND    m.changedate_id < ' || arg2
 ||  ' ORDER BY m.emailaddress, m.websiteid ';
  execute(sql);
 
  Maybe its the planner doesnt plan so well with $1 arguments vs actual
  arguments thing.
 
 
 sure, it could be blind optimization problem in plpgsql. Maybe you
 have to use a dynamic SQL - OPEN FOR EXECUTE stmt probably
 
 http://www.postgresql.org/docs/9.1/interactive/plpgsql-cursors.html
 
 Regards
 
 Pavel Stehule
 
  -Andy
 
 
 





Re: [PERFORM] Can lots of small writes badly hamper reads from other tables?

2012-01-24 Thread Dave Crooke
Hi guys

Thanks for the quick followups, folks ... please note that I am deliberately
running a setup without commit guarantees, so a lot of the conventional
advice about not doing small writes isn't applicable, but I do want to
understand more about how this affects PG internals even if the I/O is
smoothed out.

By "commit consistency off" I am referring to the setting
synchronous_commit = off in postgresql.conf. IIRC this should mean,
semantically, that a DB crash may lose data that was reported back to the
app as successfully committed, but will result in a consistent state on
reboot and recovery. In this case IIUC the 120-commits-per-second-per-drive
limit does not apply, and I hope the advice about testing fsync is
similarly not applicable to my case. Also, IIUC, settings like
commit_siblings and commit_delay should be ignored by PG in my case.

I would be interested in learning what the **in-memory** constraints and
costs are on the PG server side of doing a lot of small commits when sync
writes are *off*, e.g. the implications for the locking system, and whether
this can affect the long queries on the other tables apart from general
resource contention.

The pattern of lots of tiny transactions is semantically correct for the
app, and I am using a JDBC prepared statement on the Java side, which I
believe the PG driver will turn into a server-side prepared statement after
enough uses (it does NOT do so on the first few hits). This should in theory
be even cheaper than a multi-row INSERT ... VALUES, which is all text and
has to be parsed.

However, if necessary for performance I can bundle the inserts into
slightly larger transactions - cases where writes fail are due only to
occasional duplicates (same primary key) coming from upstream and are
pretty rare, and in practice losing a batch of say 100 of these records
occasionally is not a big deal in my world (ignoring sound of cringing DBAs
:) so I could afford to bundle into transactions and then just drop a whole
bundle if any single write has a primary key collision.
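
For illustration, a minimal sketch of that batch-and-drop approach
(placeholder values; a duplicate key on any row aborts the whole
transaction, and the app simply rolls back and discards the batch):

BEGIN;
INSERT INTO perf_raw_2012_01_24
       (guid_key, property_key, instance_id_key, time_stamp, value)
VALUES (42, 17, 1001, now(), 0.95),
       (42, 18, 1001, now(), 0.97);   -- ... up to ~100 rows per batch
COMMIT;  -- or ROLLBACK from the app if any insert in the batch failed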

Storage setup varies by customer, but a typical setup is to take RAID
groups of about 5-10TB each net from something like an EMC Clariion and
slice each group into 1TB LUNs which become VMWare datastores, which are
written simultaneously from multiple hosts. A mid-size Clariion would host
perhaps 50-100 of these small LUNs, and a customer running a high
performance environment might have Fibrechannel disks and RAID-10, but SATA
and RAID-5/6 would also be normal, albeit with a substantial write-back
cache (maybe 1GB, IIRC a current Clariion SP has 4GB total). Each file on
the datastore corresponds to a virtual disk on a VM, and the datastore is
formatted with VMFS (concurrent writer filesystem, uses SCSI locking to
control access to block allocation and directory entries).

The other type of VMWare datastore works at the filesystem layer - instead
of a shared SAN with iSCSI / FC-AL, the VMware hosts are all pointed at a
shared NFS server directory. NetApp is the popular back end for this
configuration.

On top of this virtualization, I have PG laid out on two virtual disks -
WAL and log files are on the main system partition, index and table data on
a second partition. Both formatted with ext3fs.

One of my larger customers had his SAN guy complain to him that our app was
writing more data to the NetApp it was on than every other app combined, so
I am mindful of the volume being more than some of these systems were
planned for :)

Cheers
Dave

On Tue, Jan 24, 2012 at 3:09 PM, Tomas Vondra t...@fuzzy.cz wrote:

 On 24 Leden 2012, 21:16, Dave Crooke wrote:
  Hi folks
 
  This could be a sheer volume issue, but I though I would ask the wisdom
 of
  this forum as to next investigative steps.
 
  
 
  We use PostgreSQL 8.4.4 which is bundled with our application as a VMware
  virtual appliance. The bulk of the app's database activity is recording
  performance data points which arrive in farily large sustained bursts of
  perhaps 10,000 rows a minute at a medium sized customer, each of which
 are
  logically separate items and being committed as individual transactions
  (JDBC auto-commit mode). Our offshore QA team was assigned to track an
  intermittent issue with speed of some large queries on other tables, and
  they believe based on correlation the two activities may be contending.
 
  The large query is coming off of different tables from the ones being
  written to ... the raw data goes into a table named by day (partitioning
  is
  all within the app, not PG) e.g. PERF_RAW_2012_01_24 and then there are a
  bunch of rollup statements which run hourly to do the aggregations, e.g.

 Each storage device has some basic I/O limits - sequential speed
 (read/write)
 and the maximum number or I/O operations it can handle. For example a 7.2k
 drives can do up to 160MB/s sequential reads/writes, but not more than 120
 I/O ops per second. Similarly for other devices - 15k drives can do up to
 250 I/Os. 

Re: [PERFORM] Can lots of small writes badly hamper reads from other tables?

2012-01-24 Thread Tomas Vondra
On 24.1.2012 22:36, Dave Crooke wrote:
 Hi guys

 Thanks for the quick followups folks  please note I am deliberately
 running a setup without commit guarantees, so a lot of the conventional
 advice about not doing small writes isn't applicable, but I do want to
 understand more about how this affects PG internals even if the I/O is
 smoothed out.

 By commit consistency off I am referring to the setting
 synchronous_commit = off in postgresql.conf  IIRC this should mean
 semantically that a DB crash may lose data that was reported back to the
 app as successfully committed, but will result in a consistent state on
 reboot and recovery. In this case IIUC the 120 commits per second per
 drive limit does not apply, and I hope the advice about testing fsync
 is similarly not applicable to my case. Also, IIUC that settings like
 commit_siblings and commit_delay should be ignored by PG in my case.
Oh, I hadn't noticed the synchronous_commit=off bit. You're right about
the consistency guarantees (possibility of lost transactions but no
corruption).

IIRC the async commit still issues an fsync for each commit, but does not
wait for it to finish. The question is whether this improves the way the
I/O is used or not. That's difficult to answer without more detailed info
(vmstat/iostat).

In some cases this may actually hammer the system even worse, killing
performance, because you're removing the wait time, so the INSERT
processes end up submitting more fsync operations than the I/O subsystem
can handle.

There are cases when this may actually improve the I/O utilization (e.g.
when there's a lot of drives in RAID).

You need to watch the drive and CPU stats to identify the causes. Is it
CPU bound (100% cpu utilization)? Is it I/O bound (drives 100% utilized)?

Moreover, it's not just about the fsync operations. If there are
constraints that need to be checked (e.g. foreign keys, unique
constraints etc.), that may cause additional I/O operations.

Maybe you could get better results with commit_delay/commit_siblings.
That effectively groups commits into a single fsync operation (which
synchronous_commit=off does not do, IIRC).

I've seen really good results with large numbers of concurrent clients.
How many of those insert processes are there?

 I would be interested in learning what the **in-memory** constraints and
 costs are on the PG server side of doing a lot of small commits when
 sync wrties are _off_, e.g. the implications for the locking system, and
 whether this can affect the long queries on the other tables apart from
 general resource contention.
I really doubt this is the case. If you're interested in watching these
issues, set up a pgbench database with small scaling factor (so that the
DB fits into memory) and maybe set fsync=off. Then you'll be able to
observe the locking issues etc.

But this all is just a hypothesis, and my suggestion is that you really
verify it before trying to fix it - i.e. whether the bottleneck really is
inside PostgreSQL (locking or whatever).

Eliminate all the other usual bottlenecks first - I/O and CPU. Show us
some stats, e.g. vmstat, iostat etc.

 The pattern of lots of tiny transactions is semantically correct for the
 app, and I am using a JDBC prepared statement on the Java side, which I
 believe the PG driver will turn in to a pre-compiled statement with
 enough uses (it does NOT do so on the first few hits). This should in
 theory be even cheaper than a multiple INSERT VALUES which is all text
 and has to be parsed.

 However, if necessary for performance I can bundle the inserts into
 slightly larger transactions - cases where writes fail are due only to
 occasional duplicates (same primary key) coming from upstream and are
 pretty rare, and in practice losing a batch of say 100 of these records
 occasionally is not a big deal in my world (ignoring sound of cringing
 DBAs  so I could afford to bundle into transactions and then just drop
 a whole bundle if any single write has a primary key collision.
If it's semantically correct, let's try to keep it that way.

 Storage setup varies by customer, but a typical setup is to take RAID
 groups of about 5-10TB each net from something like an EMC Clariion and
 slice each group into 1TB LUNs which become VMWare datastores, which are
 written simultaneously from multiple hosts. A mid-size Clariion would
 host perhaps 50-100 of these small LUNs, and a customer running a high
 performance environment might have Fibrechannel disks and RAID-10, but
 SATA and RAID-5/6 would also be normal, albeit with a substantial
 write-back cache (maybe 1GB, IIRC a current Clariion SP has 4GB total).
 Each file on the datastore corresponds to a virtual disk on a VM, and
 the datastore is formatted with VMFS (concurrent writer filesystem, uses
 SCSI locking to control access to block allocation and directory entries).

 The other type of VMWare datastore works at the filesystem layer -
 instead of a shared SAN with iSCSI / FC-AL, the VMware hosts are all
 pointed at a