Re: [PERFORM] Can lots of small writes badly hamper reads from other tables?

2012-01-25 Thread Jeff Janes
On Tue, Jan 24, 2012 at 12:16 PM, Dave Crooke dcro...@gmail.com wrote:
 Hi folks

 This could be a sheer volume issue, but I thought I would ask the wisdom of
 this forum as to next investigative steps.

To answer the question in your subject: yes.  If the disk head is
positioned to write in one place, it can't be reading from some other
place.  The various levels of caches and re-ordering and other tricks
can improve the situation, but they have a finite capacity to do so.

 We use PostgreSQL 8.4.4 which is bundled with our application as a VMware
 virtual appliance. The bulk of the app's database activity is recording
 performance data points which arrive in fairly large sustained bursts of
 perhaps 10,000 rows a minute at a medium sized customer, each of which are
 logically separate items and being committed as individual transactions
 (JDBC auto-commit mode). Our offshore QA team was assigned to track an
 intermittent issue with speed of some large queries on other tables, and
 they believe based on correlation the two activities may be contending.

 The large query is coming off of different tables from the ones being
 written to ... the raw data goes into a table named by day (partitioning is
 all within the app, not PG) e.g. PERF_RAW_2012_01_24 and then there are a
 bunch of rollup statements which run hourly to do the aggregations, e.g.

In your attached schema there are two perf_raw tables, and they have
different sets of indexes on them.
Which set is in operation during the inserts?


 insert into PERF_HOURLY_2012_01_24 select key_columns, avg(data), now() from
 perf_raw_2012_01_24 where time_stamp between (now() - interval '1 hour') and
 now() group by key_columns

 The big queries are hitting multiple of the PERF_HOURLY tables and pulling a
 few dozen rows from each.

How big are those big queries, really?  A few dozen tables times
a few dozen rows each?

...

 Is there any tweaking we should do on the PG settings, or on the pattern in
 which the app is writing - we currently use 10 writer threads on the Java
 side and they keep PG going pretty good.

Do you need 10 writer threads?  What happens if you use fewer?


 I considered bundling the writes into larger transactions, will that really
 help much with commit consistency off?

With synchronous_commit=off, I wouldn't expect the transaction
structure to make much difference.  Especially not if the target of
the mass inserts is indexed.

 Is there some specific usual suspect stuff I should look at on the PG side
 to look for efficiency issues such as index lock contention or a poor buffer
 cache hit ratio? Will doing EXPLAIN ANALYSE on the big query be informative,
 and if so, does it need to be done while the write load is applied?

EXPLAIN would probably help, EXPLAIN ANALYSE while the problem is in
action would help more.

Even better would be to see where the queries are blocking during the
problem, but there is no easy way to get that in postgres.  I'd strace
-ttt -T the query process (although the mere act of stracing it can
slow it down enough to relieve the bottleneck you are trying to
identify).


 The other whacky idea I had was to have the writer threads pause or throttle
 themselves when a big query is happening (it's all in one JVM and we are
 using a connection pooler, so it's easy to intercept and track if needed)
 however that strikes me as a rather ugly hack and I'd prefer to do something
 more robust and based on config tweaks that leverage existing resource
 management in PG.

Why not just always throttle them?  If you slam the data in as fast as
possible during brief bursts, you are probably just setting yourself
up for this type of issue.  (The brief bursts can be useful if they
make better use of cache, but then you have to accept that other
things will be disrupted during those bursts.)

Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Can lots of small writes badly hamper reads from other tables?

2012-01-24 Thread Dave Crooke
Hi folks

This could be a sheer volume issue, but I thought I would ask the wisdom of
this forum as to next investigative steps.



We use PostgreSQL 8.4.4 which is bundled with our application as a VMware
virtual appliance. The bulk of the app's database activity is recording
performance data points which arrive in fairly large sustained bursts of
perhaps 10,000 rows a minute at a medium sized customer, each of which are
logically separate items and being committed as individual transactions
(JDBC auto-commit mode). Our offshore QA team was assigned to track an
intermittent issue with speed of some large queries on other tables, and
they believe based on correlation the two activities may be contending.

The large query is coming off of different tables from the ones being
written to ... the raw data goes into a table named by day (partitioning is
all within the app, not PG) e.g. PERF_RAW_2012_01_24 and then there are a
bunch of rollup statements which run hourly to do the aggregations, e.g.

insert into PERF_HOURLY_2012_01_24 select key_columns, avg(data), now()
from perf_raw_2012_01_24 where time_stamp between (now() - interval '1
hour') and now() group by key_columns

The big queries are hitting multiple of the PERF_HOURLY tables and pulling
a few dozen rows from each.

We are using a 64-bit VM with 8 virtual cores and 8GB RAM, of which Java
takes a bit over half, and Linux X with CentOS 5.x. PG has 1GB of
buffer cache and reasonable (AFAICT) resource limits for everything else,
which are intended to be workable for a range of client sizes out of the
box. True transactional consistency is disabled for performance reasons;
virtual environments do not take kindly to lots of small writes.

---

Is there any tweaking we should do on the PG settings, or on the pattern in
which the app is writing - we currently use 10 writer threads on the Java
side and they keep PG going pretty good.

I considered bundling the writes into larger transactions, will that really
help much with commit consistency off?

Is there some specific usual suspect stuff I should look at on the PG
side to look for efficiency issues such as index lock contention or a poor
buffer cache hit ratio? Will doing EXPLAIN ANALYSE on the big query be
informative, and if so, does it need to be done while the write load is
applied?

The other whacky idea I had was to have the writer threads pause or
throttle themselves when a big query is happening (it's all in one JVM and
we are using a connection pooler, so it's easy to intercept and track if
needed) however that strikes me as a rather ugly hack and I'd prefer to do
something more robust and based on config tweaks that leverage existing
resource management in PG.

Relevant schema and config attached, all comments and advice welcome,
including general tuning tips and rationale for moving to PG 9.x. I'm
well aware this isn't the acme of PG tuning :)

Cheers
Dave


postgresql.conf
Description: Binary data
-- The current day's perf_raw table contains the extra b index
-- to facilitate certain queries including the rollup ones which
-- create the perf_hourly tables. This index is dropped once the
-- table is no longer being written to.

Table public.perf_raw_2011_12_31
 Column  |Type | Modifiers 
-+-+---
 guid_key| integer | 
 property_key| integer | 
 instance_id_key | integer | 
 time_stamp  | timestamp without time zone | 
 value   | double precision| 
Indexes:
perf_raw_2011_12_31_a btree (guid_key, property_key, time_stamp)
perf_raw_2011_12_31_b btree (time_stamp, property_key)

Table public.perf_raw_2011_12_29
 Column  |Type | Modifiers 
-+-+---
 guid_key| integer | 
 property_key| integer | 
 instance_id_key | integer | 
 time_stamp  | timestamp without time zone | 
 value   | double precision| 
Indexes:
perf_raw_2011_12_29_a btree (guid_key, property_key, time_stamp)

   Table public.perf_hourly_2011_12_29
 Column  |Type | Modifiers 
-+-+---
 guid_key| integer | 
 property_key| integer | 
 instance_id_key | integer | 
 time_stamp  | timestamp without time zone | 
 value   | double precision| 
Indexes:
perf_hourly_2011_12_29_a btree (guid_key, property_key, time_stamp)




Re: [PERFORM] Can lots of small writes badly hamper reads from other tables?

2012-01-24 Thread Andy Colson

On 1/24/2012 2:16 PM, Dave Crooke wrote:

Hi folks

This could be a sheer volume issue, but I thought I would ask the wisdom
of this forum as to next investigative steps.



We use PostgreSQL 8.4.4 which is bundled with our application as a
VMware virtual appliance. The bulk of the app's database activity is
recording performance data points which arrive in fairly large sustained
bursts of perhaps 10,000 rows a minute at a medium sized customer, each
of which are logically separate items and being committed as individual
transactions (JDBC auto-commit mode). Our offshore QA team was assigned
to track an intermittent issue with speed of some large queries on other
tables, and they believe based on correlation the two activities may be
contending.


You have 10 connections, all doing:

begin
insert into PERF_RAW_2012_01_24  -- one record
commit


If that's what you're doing, yes, I'd say that's the slowest way possible.

Doing this would be faster:

begin
insert into PERF_RAW_2012_01_24  -- one record
insert into PERF_RAW_2012_01_24  -- one record
...
insert into PERF_RAW_2012_01_24  -- one record
commit

Doing this would be even faster:


begin
-- one insert, multiple rows
insert into PERF_RAW_2012_01_24 values (...), (...), (...), ..., (...);
insert into PERF_RAW_2012_01_24 values (...), (...), (...), ..., (...);
commit

And, fastest of all, use COPY.  But be careful, it's so fast 
it'll melt your face off :-)
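Andy's multi-row form can be generated mechanically on the Java side (the app in this thread is Java/JDBC). A minimal sketch, assuming the perf_raw column list from the attached schema; the class and method names are illustrative:

```java
import java.util.Collections;

public class MultiRowInsert {
    // Build a single INSERT carrying one parameter tuple per row, suitable
    // for a JDBC PreparedStatement. The column list is taken from the
    // perf_raw schema posted in this thread.
    static String buildInsert(String table, int rows) {
        String tuple = "(?, ?, ?, ?, ?)";
        return "insert into " + table
            + " (guid_key, property_key, instance_id_key, time_stamp, value) values "
            + String.join(", ", Collections.nCopies(rows, tuple));
    }

    public static void main(String[] args) {
        // One statement carrying 3 rows instead of 3 single-row statements.
        System.out.println(buildInsert("perf_raw_2012_01_24", 3));
    }
}
```

Because the values are bound as parameters rather than inlined text, this also avoids re-parsing literal data on every call.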



I didn't even bother trying to pick out the uncommented settings from 
your .conf file.  Way too much work.


VMs usually have pretty slow IO, so you might wanna watch vmstat and 
iostat to see if you are IO bound or CPU bound.


Also watching iostat before and after the change might be interesting.

If you keep having lots and lots of transactions, look into 
commit_delay, it'll help batch commits out to disk  (if I remember 
correctly).


-Andy



Re: [PERFORM] Can lots of small writes badly hamper reads from other tables?

2012-01-24 Thread Tomas Vondra
On 24 Leden 2012, 21:16, Dave Crooke wrote:
 Hi folks

 This could be a sheer volume issue, but I thought I would ask the wisdom of
 this forum as to next investigative steps.

 

 We use PostgreSQL 8.4.4 which is bundled with our application as a VMware
 virtual appliance. The bulk of the app's database activity is recording
 performance data points which arrive in fairly large sustained bursts of
 perhaps 10,000 rows a minute at a medium sized customer, each of which are
 logically separate items and being committed as individual transactions
 (JDBC auto-commit mode). Our offshore QA team was assigned to track an
 intermittent issue with speed of some large queries on other tables, and
 they believe based on correlation the two activities may be contending.

 The large query is coming off of different tables from the ones being
 written to ... the raw data goes into a table named by day (partitioning
 is
 all within the app, not PG) e.g. PERF_RAW_2012_01_24 and then there are a
 bunch of rollup statements which run hourly to do the aggregations, e.g.

Each storage device has some basic I/O limits - sequential speed (read/write)
and the maximum number of I/O operations it can handle. For example, a 7.2k
drive can do up to 160MB/s sequential reads/writes, but not more than 120
I/O ops per second. Similarly for other devices - 15k drives can do up to
250 I/Os. SSDs can handle many more I/Os, e.g. an Intel 320 can handle about
8k I/Os.

I have no idea what kind of storage device you're using and what amount of
sequential and random operations it can handle. But my guess is that you're
hitting the limit of random I/Os - each commit requires an fsync, and you're
doing 10,000 of them per minute, i.e. about 160 per second. If the queries need
to read data from the drive (e.g. randomly), this just adds more I/Os.

 Is there any tweaking we should do on the PG settings, or on the pattern
 in
 which the app is writing - we currently use 10 writer threads on the Java
 side and they keep PG going pretty good.

The first thing you should do is group the inserts into one transaction.
That'll lower the number of I/Os the database needs to do. Besides that,
you can move the WAL to a separate (physical) device, thus spreading the
I/Os to more drives.
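A minimal JDBC sketch of that grouping, in the Java the app already uses. The table and column names come from the thread; the batch size, key values, and method names are illustrative assumptions:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

public class BatchedInserts {
    // How many transactions a given batch size yields for n rows:
    // 10,000 rows/minute at batch size 500 means ~20 commits/minute
    // instead of 10,000 individually committed inserts.
    static int batchCount(int rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    // Sketch: group single-row inserts into one transaction per batch.
    static void insertBatch(Connection conn, double[] values) throws SQLException {
        conn.setAutoCommit(false); // one commit per batch, not per row
        String sql = "insert into perf_raw_2012_01_24 "
                   + "(guid_key, property_key, instance_id_key, time_stamp, value) "
                   + "values (?, ?, ?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (double v : values) {
                ps.setInt(1, 1); // illustrative key values
                ps.setInt(2, 1);
                ps.setInt(3, 1);
                ps.setTimestamp(4, new Timestamp(System.currentTimeMillis()));
                ps.setDouble(5, v);
                ps.addBatch();
            }
            ps.executeBatch();
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}
```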

 I considered bundling the writes into larger transactions, will that
 really
 help much with commit consistency off?

What do you mean by "commit consistency off"?

 Is there some specific usual suspect stuff I should look at on the PG
 side to look for efficiency issues such as index lock contention or a poor
 buffer cache hit ratio? Will doing EXPLAIN ANALYSE on the big query be
 informative, and if so, does it need to be done while the write load is
 applied?

The first thing you should do is gather some basic I/O stats.

Run  pg_test_fsync (a contrib module) to see how many fsync operations the
I/O subsystem can handle (if it reports more than 500, use -o to get it
running for a longer time).

Then gather vmstat 1 and iostat -x 1 for a few seconds while the workload
(inserts and queries) is actually running. That should tell you how the
drives are actually utilized.

Post these results to this list.

 Relevant schema and config attached, all comments and advice welcome,
 including general tuning tips and rationale for moving to PG 9.x  I'm
 well aware this isn't the acme of PG tuning :)

There's a nice page about tuning at the wiki:

  http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

I'd recommend significantly increasing the number of checkpoint segments,
e.g. to 64 (1GB) and setting completion target to 0.9. This usually helps
write-heavy workloads. And enable log_checkpoints.
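Expressed as a postgresql.conf fragment (an illustrative sketch of the suggestion above, using the 8.x-era setting names this thread is running; not measured recommendations):

```
checkpoint_segments = 64            # 64 x 16MB = ~1GB of WAL between checkpoints
checkpoint_completion_target = 0.9  # spread checkpoint writes over the interval
log_checkpoints = on                # log checkpoint timing so you can see the impact
```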

Tomas




Re: [PERFORM] Can lots of small writes badly hamper reads from other tables?

2012-01-24 Thread Dave Crooke
Hi guys

Thanks for the quick followups, folks. Please note I am deliberately
running a setup without commit guarantees, so a lot of the conventional
advice about not doing small writes isn't applicable, but I do want to
understand more about how this affects PG internals even if the I/O is
smoothed out.

By "commit consistency off" I am referring to the setting
synchronous_commit = off in postgresql.conf. IIRC this should mean
semantically that a DB crash may lose data that was reported back to the
app as successfully committed, but will result in a consistent state on
reboot and recovery. In this case IIUC the 120 commits per second per
drive limit does not apply, and I hope the advice about testing fsync is
similarly not applicable to my case. Also, IIUC that settings like
commit_siblings and commit_delay should be ignored by PG in my case.

I would be interested in learning what the **in-memory** constraints and
costs are on the PG server side of doing a lot of small commits when sync
writes are *off*, e.g. the implications for the locking system, and whether
this can affect the long queries on the other tables apart from general
resource contention.

The pattern of lots of tiny transactions is semantically correct for the
app, and I am using a JDBC prepared statement on the Java side, which I
believe the PG driver will turn into a pre-compiled statement with enough
uses (it does NOT do so on the first few hits). This should in theory be
even cheaper than a multiple INSERT VALUES which is all text and has to be
parsed.

However, if necessary for performance I can bundle the inserts into
slightly larger transactions - cases where writes fail are due only to
occasional duplicates (same primary key) coming from upstream and are
pretty rare, and in practice losing a batch of say 100 of these records
occasionally is not a big deal in my world (ignoring sound of cringing DBAs
:) so I could afford to bundle into transactions and then just drop a whole
bundle if any single write has a primary key collision.
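That bundle-and-drop strategy can be sketched in JDBC. SQLState 23505 is PostgreSQL's unique_violation code; the table name is from the thread, and the rest (method names, row layout) is an illustrative assumption:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BundleWriter {
    // PostgreSQL reports primary-key collisions as SQLState 23505
    // (unique_violation). Batch failures may surface the state on the
    // exception itself or on a chained exception, so walk the chain.
    static boolean isDuplicateKey(SQLException e) {
        for (SQLException cur = e; cur != null; cur = cur.getNextException()) {
            if ("23505".equals(cur.getSQLState())) return true;
        }
        return false;
    }

    // Sketch: commit the bundle, or drop it entirely on a PK collision.
    // Returns true if the bundle was committed.
    static boolean writeBundle(Connection conn, List<Object[]> rows) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(
                "insert into perf_raw_2012_01_24 values (?, ?, ?, ?, ?)")) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]);
                }
                ps.addBatch();
            }
            ps.executeBatch();
            conn.commit();
            return true;
        } catch (SQLException e) {
            conn.rollback();
            if (isDuplicateKey(e)) {
                return false; // the occasional lost bundle is acceptable here
            }
            throw e;
        }
    }
}
```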

Storage setup varies by customer, but a typical setup is to take RAID
groups of about 5-10TB each net from something like an EMC Clariion and
slice each group into 1TB LUNs which become VMWare datastores, which are
written simultaneously from multiple hosts. A mid-size Clariion would host
perhaps 50-100 of these small LUNs, and a customer running a high
performance environment might have Fibrechannel disks and RAID-10, but SATA
and RAID-5/6 would also be normal, albeit with a substantial write-back
cache (maybe 1GB, IIRC a current Clariion SP has 4GB total). Each file on
the datastore corresponds to a virtual disk on a VM, and the datastore is
formatted with VMFS (concurrent writer filesystem, uses SCSI locking to
control access to block allocation and directory entries).

The other type of VMWare datastore works at the filesystem layer - instead
of a shared SAN with iSCSI / FC-AL, the VMware hosts are all pointed at a
shared NFS server directory. NetApp is the popular back end for this
configuration.

On top of this virtualization, I have PG laid out on two virtual disks -
WAL and log files are on the main system partition, index and table data on
a second partition. Both formatted with ext3fs.

One of my larger customers had his SAN guy complain to him that our app was
writing more data to the NetApp it was on than every other app combined, so
I am mindful of the volume being more than some of these systems were
planned for :)

Cheers
Dave

On Tue, Jan 24, 2012 at 3:09 PM, Tomas Vondra t...@fuzzy.cz wrote:

 On 24 Leden 2012, 21:16, Dave Crooke wrote:
  Hi folks
 
  This could be a sheer volume issue, but I thought I would ask the wisdom
 of
  this forum as to next investigative steps.
 
  
 
  We use PostgreSQL 8.4.4 which is bundled with our application as a VMware
  virtual appliance. The bulk of the app's database activity is recording
  performance data points which arrive in fairly large sustained bursts of
  perhaps 10,000 rows a minute at a medium sized customer, each of which
 are
  logically separate items and being committed as individual transactions
  (JDBC auto-commit mode). Our offshore QA team was assigned to track an
  intermittent issue with speed of some large queries on other tables, and
  they believe based on correlation the two activities may be contending.
 
  The large query is coming off of different tables from the ones being
  written to ... the raw data goes into a table named by day (partitioning
  is
  all within the app, not PG) e.g. PERF_RAW_2012_01_24 and then there are a
  bunch of rollup statements which run hourly to do the aggregations, e.g.

 Each storage device has some basic I/O limits - sequential speed
 (read/write)
 and the maximum number or I/O operations it can handle. For example a 7.2k
 drives can do up to 160MB/s sequential reads/writes, but not more than 120
 I/O ops per second. Similarly for other devices - 15k drives can do up to
 250 I/Os. 

Re: [PERFORM] Can lots of small writes badly hamper reads from other tables?

2012-01-24 Thread Tomas Vondra
On 24.1.2012 22:36, Dave Crooke wrote:
 Hi guys

 Thanks for the quick followups, folks. Please note I am deliberately
 running a setup without commit guarantees, so a lot of the conventional
 advice about not doing small writes isn't applicable, but I do want to
 understand more about how this affects PG internals even if the I/O is
 smoothed out.

 By "commit consistency off" I am referring to the setting
 synchronous_commit = off in postgresql.conf. IIRC this should mean
 semantically that a DB crash may lose data that was reported back to the
 app as successfully committed, but will result in a consistent state on
 reboot and recovery. In this case IIUC the 120 commits per second per
 drive limit does not apply, and I hope the advice about testing fsync
 is similarly not applicable to my case. Also, IIUC that settings like
 commit_siblings and commit_delay should be ignored by PG in my case.
Oh, I hadn't noticed the synchronous_commit=off bit. You're right about
the consistency guarantees (possibility of lost transactions, but no
corruption).

IIRC the async commit issues fsync for each commit, but does not wait
for it to finish. The question is whether this improves the way the I/O
is used or not. That's difficult to answer without more detailed info
(vmstat/iostat).

In some cases this may actually hammer the system even worse, killing
the performance, because you're removing the wait time, so the INSERT
processes submit more fsync operations than the I/O subsystem can handle.

There are cases when this may actually improve the I/O utilization (e.g.
when there's a lot of drives in RAID).

You need to watch the drive and CPU stats to identify the causes. Is it
CPU bound (100% cpu utilization)? Is it I/O bound (drives 100% utilized)?

Moreover, it's not just about the fsync operations. If there are
constraints that need to be checked (e.g. foreign keys, unique
constrains etc.), that may cause additional I/O operations.

Maybe you could get better results with commit_delay/commit_siblings.
That effectively groups commits into a single fsync operation (which
synchronous_commit=off does not do, IIRC).

I've seen really good results with large numbers of concurrent clients.
How many of those insert processes are there?
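A hedged postgresql.conf fragment for that experiment; the values are illustrative starting points only:

```
commit_delay = 1000     # microseconds to wait for other commits to pile up
commit_siblings = 5     # only delay when at least this many other txns are active
```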

 I would be interested in learning what the **in-memory** constraints and
 costs are on the PG server side of doing a lot of small commits when
 sync writes are _off_, e.g. the implications for the locking system, and
 whether this can affect the long queries on the other tables apart from
 general resource contention.
I really doubt this is the case. If you're interested in watching these
issues, set up a pgbench database with small scaling factor (so that the
DB fits into memory) and maybe set fsync=off. Then you'll be able to
observe the locking issues etc.

But this is all just a hypothesis, and my suggestion is that you really
verify it before trying to fix it - whether the bottleneck really is inside
PostgreSQL (locking or whatever).

Eliminate all the other usual bottlenecks first - I/O and CPU. Show us
some stats, e.g. vmstat, iostat etc.

 The pattern of lots of tiny transactions is semantically correct for the
 app, and I am using a JDBC prepared statement on the Java side, which I
 believe the PG driver will turn into a pre-compiled statement with
 enough uses (it does NOT do so on the first few hits). This should in
 theory be even cheaper than a multiple INSERT VALUES which is all text
 and has to be parsed.

 However, if necessary for performance I can bundle the inserts into
 slightly larger transactions - cases where writes fail are due only to
 occasional duplicates (same primary key) coming from upstream and are
 pretty rare, and in practice losing a batch of say 100 of these records
 occasionally is not a big deal in my world (ignoring sound of cringing
 DBAs :) so I could afford to bundle into transactions and then just drop
 a whole bundle if any single write has a primary key collision.
If it's semantically correct, let's try to keep it that way.

 Storage setup varies by customer, but a typical setup is to take RAID
 groups of about 5-10TB each net from something like an EMC Clariion and
 slice each group into 1TB LUNs which become VMWare datastores, which are
 written simultaneously from multiple hosts. A mid-size Clariion would
 host perhaps 50-100 of these small LUNs, and a customer running a high
 performance environment might have Fibrechannel disks and RAID-10, but
 SATA and RAID-5/6 would also be normal, albeit with a substantial
 write-back cache (maybe 1GB, IIRC a current Clariion SP has 4GB total).
 Each file on the datastore corresponds to a virtual disk on a VM, and
 the datastore is formatted with VMFS (concurrent writer filesystem, uses
 SCSI locking to control access to block allocation and directory entries).

 The other type of VMWare datastore works at the filesystem layer -
 instead of a shared SAN with iSCSI / FC-AL, the VMware hosts are all
 pointed at a