Re: [PERFORM] Deleting bytea, autovacuum, and 8.2/8.4 differences

2010-03-14 Thread fka...@googlemail.com
Hi Dave,

thank you for your answers! Here some comments:

Dave Crooke:

> > * The table just has 5 unused int columns, a timestamp,
> > OIDs, and the bytea column, *no indices*; the bytea storage
> > type is 'extended', the 16 MB are compressed to approx. the
> > half.
> >
> 
> Why no indices?

Simply because the test case had just < 50 rows (deleting
all rows older than 2 minues). Later on I would use indices.


> > while it is planned to have the interval set to 6 hours in
> > the final version (thus creating a FIFO buffer for the
> > latest 6 hours of inserted data; so the FIFO will keep
> > approx.  10.000 rows spanning 160-200 GB data).
> >
> 
> That's not the way to keep a 6 hour rolling buffer ... what you need to do
> is run the delete frequently, with  *interval '6 hours'* in the SQL acting
> as the cutoff.

In fact the delete was run frequently to cut everything
older than 6 hours *immediately*.


> If you really do want to drop the entire table contents before refilling it,
> do a *DROP TABLE* and recreate it.

No, I do not want to drop the whole table.


> > * This deletion SQL command was simply repeatedly executed
> > by pgAdmin while my app kept adding the 16 MB rows.
> >
> 
> Are you sure you are timing the delete, and not pgAdmin re-populating some
> kind of buffer?

Quite sure, yes. Because I launched just the delete command
in pgAdmin while the rest was executed by my application
outside pgAdmin, of course.



> > * Autovacuum is on; I believe I need to keep it on,
> > otherwise I do not free the disk space, right? If I switch
> > it off, the deletion time reduces from the average 10s down
> > to 4s.
> >
> 
> You may be running autovaccum too aggressively, it may be interfering with
> I/O to the tables.

Hm, so would should I change then? I wonder if it helps to
run autovacuum less aggressive if there will not be a
situation were the whole process is stopped for a while. But
I'd like to understand what to change here.


> 8.4 has a lot of performance improvements. It's definitely worth a shot. I'd
> also consider switching to another OS where you can use a 64-bit version of
> PG and a much bigger buffer cache.

O.k., I'll give it a try.


Thank You.
 Felix


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Deleting bytea, autovacuum, and 8.2/8.4 differences

2010-03-13 Thread fka...@googlemail.com
Hi all,

my posting on 2010-01-14 about the performance when writing
bytea to disk caused a longer discussion. While the fact
still holds that the overall postgresql write performance is
roughly 25% of the serial I/O disk performance this was
compensated for my special use case here by doing some other
non-postgresql related things in parallel.

Now I cannot optimize my processes any further, however, now
I am facing another quite unexpected performance issue:
Deleting rows from my simple table (with the bytea column)
having 16 MB data each, takes roughly as long as writing
them!

Little more detail:

* The table just has 5 unused int columns, a timestamp,
OIDs, and the bytea column, no indices; the bytea storage
type is 'extended', the 16 MB are compressed to approx. the
half.

* All the usual optimizations are done to reach better
write through (pg_xlog on another disk, much tweaks to the
server conf etc), however, this does not matter here, since
not the absolute performance is of interest here but the
fact that deleting roughly takes 100% of the writing time.

* I need to write 15 rows of 16 MB each to disk in a maximum
time of 15 s, which is performed here in roughly 10 seconds,
however, now I am facing the problem that keeping my
database tidy (deleting rows) takes another 5-15 s (10s on
average), so my process exceeds the maximum time of 15s for
about 5s.

* Right now I am deleting like this:

DELETE FROM table WHERE (CURRENT_TIMESTAMP -
my_timestamp_column) > interval '2 minutes';

while it is planned to have the interval set to 6 hours in
the final version (thus creating a FIFO buffer for the
latest 6 hours of inserted data; so the FIFO will keep
approx.  10.000 rows spanning 160-200 GB data).

* This deletion SQL command was simply repeatedly executed
by pgAdmin while my app kept adding the 16 MB rows.

* Autovacuum is on; I believe I need to keep it on,
otherwise I do not free the disk space, right? If I switch
it off, the deletion time reduces from the average 10s down
to 4s.

* I am using server + libpq version 8.2.4, currently on
WinXP. Will an upgrade to 8.4 help here?

Do you have any other ideas to help me out?
Oh, please...

Thank You
 Felix





-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-25 Thread fka...@googlemail.com
Scott Carey:

> > (2) The tests:
> > 
> > Note: The standard speed was about 800MB/40s, so 20MB/s.
> > 
> > 
> > a)
> > What I changed: fsync=off
> > Result: 35s, so 5s faster.
> > 
> > 
> > b) like a) but:
> > checkpoint_segments=128 (was 3)
> > autovacuum=off
> > 
> > Result: 35s (no change...?!)
> > 
> 
> yes, more checkpoint_segments will help if your
> shared_buffers is larger, it won't do a whole lot
> otherwise.  Generally, I like to keep these roughly equal
> sized as a starting point for any small to medium sized
> configuration.  So if shared_buffers is 1GB, that takes 64
> checkpoint segments to hold for heavy write scenarios.

(1)

Ok, that's what I tested: 1024 MB shared_buffers, 64
checkpoint segments.

Unfortunatelly I could not run it on the same hardware
anymore: The data is written to a single disk now, not raid
anymore. So with the default shared_buffers of 8 MB (?) we
should expect 45s for writing the 800 MB. With the large
shared_buffers and checkpoints (mentioned above) I got this:

1. run (right after postgres server (re-)start): 28s (!)
2. run: 44s
3. run: 42s

So, roughly the same as with small buffers.


(2)
Then I switched again from 8.2.4 to 8.4.2:

1. run (after server start): 25s.
2. run: 38s
3. run: 38s

So, 8.4 helped a bit over 8.2.


(3) All in all

By (1) + (2) the performance bottleneck has, however,
changed a lot (as shown here by the performance monitor):

Now, the test system is definitly disk bound. Roughly
speaking, at the middle of the whole test, for about 40-50%
of the time, the 'data' disk was at 100% (and the 'WAL' at
20%), while before and after that the 'WAL' disk had a lot
of peaks at 100% (and 'data' disk at 30%).

The average MB/s of the 'data' disk was 40 MB/s (WAL:
20MB/s) -- while the raw performance is 800MB/40s = 20MB/s,
so still *half* what the disk does.

So, this remains as the last open question to me: It seems
the data is doubly written to the 'data' disk, although WAL
is written to the separate 'WAL' disk.



> > Ok, I've managed to use 8.4 here. Unfortunatelly: There was
> > nearly no improvement in speed. For example test 2d)
> > performed in 35s.
> > 
> 
> With a very small shared_buffers the improvements to
> Postgres' shared_buffer / checkpoint interaction can not
> be utilized.

See above.


Thank You
 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-22 Thread fka...@googlemail.com
Scott Carey:

> Well, something is causing the system to alternate between
> CPU and disk bound here.  (see below).
> It would be useful to see what affect the index has.

Ok, I simply deleted the index and repeated the test: I did
not notice any difference. This is probably so because in
fact I am doing just 10 INSERTs.


> > (B) perfmon34.png: Results/graphs (performance monitor):
> > 
> Great data!

BTW: I have some more screenshots but as they do not arrive
on the mailing list I keep it. The new graphs are basicly
the same anyway.


> > (1) The upper dark/gray graph touching the 100% sometimes is
> > "disk write time %" of the data disk G:
> > 
> > (2) The yellow graph is nearly completly overpainted by (1)
> > since it is "disk time %".
> > 
> > (3) The brown graph below (1) is "Disk Write Byts/s" divided
> > by 1.000.000, so around 40 MB/s average.
> > 
> 
> Looks like it is writing everything twice, or close to it.
> Alternatively the index writes occupy half, but that is
> unlikely.

'Writing twice': That is the most interesting point I
believe. Why is the data disk doing 40 MB/s *not* including
WAL, however, having 20 MB/s write thoughput in fact. Seems
like: 20 MB for data, 20 MB for X, 20 MB for WAL. 

Although that questions is still unanswered: I verified
again that I am disk bound by temporarily replacing the
raid-0 with slower solution: a singly attached sata disk
of the same type: This *did* slow down the test a lot
(approx.  20%). So, yes, I am disk bound but, again, why
that much...

About removing the index on OIDs: No impact (see above).


> > (4) The read graph is "Disk Time %" of the WAL drive E:,
> > average approx. 30%.
> > 
> 
> WAL doesn't look like a bottleneck here, as other tests
> have shown.  A larger wal_buffers setting might lower this
> more, since your record overflows the buffer for sure.
> You might want to change your test case to write records
> similar size to what you expect (do you expect 80MB?) and
> then set wal_buffers up to the size of one checkpoint
> segment (16MB) if you expect larger data per transaction.

Ok, without knowing each exact effect I changed some of the
configuration values (from the defaults in 8.2.4), and did
some tests:

(1) First, the most important 8.2.4 defaults (for you to
overlook):

#shared_buffers=32MB
#temp_buffers=8MB
#max_prepared_transactions=5
#work_mem=1MB
#maintenance_work_mem=16MB
#max_stack_depth=2MB
#max_fsm_pages=204800
#max_fsm_relations=1000
#max_files_per_process=1000
#shared_preload_libraries=''
#vacuum_cost_delay=0
#vacuum_cost_page_hit=1
#vacuum_cost_page_miss=10
#vacuum_cost_page_dirty=20
#vacuum_cost_limit=200
#bgwriter_delay=200ms
#bgwriter_lru_percent=1.0
#bgwriter_lru_maxpages=5
#bgwriter_all_percent=0.333
#bgwriter_all_maxpages=5
#fsync=on
#full_page_writes=on
#wal_buffers=64kB
#checkpoint_segments=3
#checkpoint_timeout=5min
#checkpoint_warning=30s
#seq_page_cost=1.0
#random_page_cost=4.0
#cpu_tuple_cost=0.01
#cpu_index_tuple_cost=0.005
#cpu_operator_cost=0.0025
#effective_cache_size=128MB
#default_statistics_target=10
#constraint_exclusion=off
#from_collapse_limit=8
#join_collapse_limit=8
#autovacuum=on
#autovacuum_naptime=1min
#autovacuum_vacuum_threshold=500
#autovacuum_analyze_threshold=250
#autovacuum_vacuum_scale_factor=0.2
#autovacuum_analyze_scale_factor=0.1
#autovacuum_freeze_max_age=2
#autovacuum_vacuum_cost_delay=-1
#autovacuum_vacuum_cost_limit=-1
#deadlock_timeout=1s
#max_locks_per_transaction=64


(2) The tests:

Note: The standard speed was about 800MB/40s, so 20MB/s.


a)
What I changed: fsync=off
Result: 35s, so 5s faster.


b) like a) but:
checkpoint_segments=128 (was 3)
autovacuum=off

Result: 35s (no change...?!)


c) like b) but:
temp_buffers=200MB (was 8)
wal_sync_method=open_datasync (was fsync)
wal_buffers=1024kB (was 64)

Result:
The best ever, it took just 29s, so 800MB/29s = 27.5MB/s.
However, having autovacuum=off probably means that deleted
rows will occupy disk space? And I also fear that
checkpoint_segments=128 mean that at some point in the
future there will be a huge delay then (?).


d) also like b) but:
temp_buffers=1000MB
wal_buffers=4096kB
checkpoint_segments=3
autovacuum=on

Result: Again slower 36s



I am not able to interprete that in depth.





> > (C) My interpretation
> > 
> > (1) Although the data disk G: sometimes hits 100%: All in
> > all it seems that neither the CPUs nor the data disk
> > (approx. 65%) nor the WAL disk (approx. 30%) are at their
> > limits. See also 1000 writes/s, 40MB/s write thoughput.
> > 
> 
> I think it is alternating.  Whatever is causing the 25%
> CPU jump during the 'slow' periods is a clue.  Some
> process on the system must be increasing its time
> significantly in these bursts.  I suspect it is postgres
> flushing data from its shared_buffers to the OS.  8.2 is
> not very efficient at its ability to write out to the OS
> in a constant stream, and tends to be very 'bursty' like
> this.  I suspect that 8.3 or 8.4 would perform a l

Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-19 Thread fka...@googlemail.com
Ivan Voras:

> [I just skimmed this thread - did you increase the number of WAL logs to 
> something very large, like 128?]

Yes, I tried even more.

I will be writing data quite constantly in the real scenario
later. So I wonder if increasing WAL logs will have a
positive effect or not: AFAIK when I increase it, the
duration after the max is hit will be longer then (which is
not acceptable in my case).

Could anyone confirm if I got it right?

 Felix


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-19 Thread fka...@googlemail.com
Pierre Frédéric Caillaud:

> I don't remember if you used TOAST compression or not.

I use 'bytea' and SET STORAGE EXTERNAL for this column.
AFAIK this switches off the compression.



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-19 Thread fka...@googlemail.com
Scott Carey:

> You are CPU bound.
> 
> 30% of 4 cores is greater than 25%.  25% is one core fully
> used.

I have measured the cores separately. Some of them reached
30%. I am not CPU bound here.

> The postgres compression of data in TOAST is
> probably the problem.  I'm assuming its doing Gzip, and at
> the default compression level, which on random data will
> be in the 15MB/sec range.  I don't know if TOAST will do
> compression at a lower compression level.

Hm. I use 'bytea' (see original posting) and SET STORAGE
EXTERNAL for this column which switches of the compression
AFAIK. I was doing this to measure the raw performance and
may not be able to use compression later in real scenario.

BTW: In the initial tests I used 200 blocks of 4 MB bytea
which is the real scenario; later on I was using 10 times
80MB each just to reduce the number of INSERT commands and
to make it easier to find the performance problem.


> Is your data typically random or incompressible?  If it is
> compressible then your test should be changed to reflect
> that.

Unfortunatelly I can't say much yet. Of course, in case
compression makes sense and fits the CPU performance I will
use it.


> If I am wrong, you are I/O bound

Yes. This is the first half of what we found out now.

> -- this will show up in
> windows Performance Monitor as "Disk Time (%)" -- which
> you can get on a per device or total basis, along with i/o
> per second (read and/or write) and bytes/sec metrics.

Yes, I am using this tool.

However, the deeper question is (sounds ridiculous): Why am
I I/O bound *this much* here. To recall: The write
performance in pg is about 20-25% of the worst case serial
write performance of the disk (and only about 8-10% of the
best disk perf) even though pg_xlog (WAL) is moved to
another disk, only 10 simple INSERT commands, a simple table
of 5 columns (4 unused, one bytea) and one index for OID, no
compression since STORAGE EXTERNAL, ntfs tweaks (noatime
etc), ...


> To prove that you are CPU bound, split your test in half,
> and run the two halves at the same time.  If you are CPU
> bound, then your bytes/sec performance will go up
> significantly, along with CPU usage.

Done already (see earlier posting). I am not CPU bound.
Speed was the same.


Thank You for the detailed reply.

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-19 Thread fka...@googlemail.com
fka...@googlemail.com:

> I'll try to execute these tests on a SSD
> and/or Raid system.

FYI:

On a sata raid-0 (mid range hardware) and recent 2x 1.5 TB
disks with a write performance of 100 MB/s (worst, to 200
MB/s max), I get a performance of 18.2 MB/s. Before, with
other disk 43 MB/s (worst to 70 MB/s max) postgres came to
14-16 MB/s.

So, I conclude finally:

(1) Postgresql write throughput (slowly) scales with the
harddisk speed.

(2) The throughput (not counting WAL doubling data) in
postgresql is 20-25% of the disk thoughput.


I want to thank you all for the very good support!

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-18 Thread fka...@googlemail.com
fka...@googlemail.com:

> I'll try to execute these tests on a SSD
> and/or Raid system.

FYI:

On a recent but mid-range performing SSD (128 MB size,
serially writing 80-140 MB, 100 MB average) the situation
was worse for some reason. No difference by fsync=on/off.

 Felix


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-18 Thread fka...@googlemail.com
Matthew Wakeling:

> The data needs to be written first to the WAL, in order to provide 
> crash-safety. So you're actually writing 1600MB, not 800.

I come back again to saving WAL to another disk. Now, after
all, I wonder: Doesn't the server wait anyway until WAL was
written to disk?

So, if true, does it should not really matter if WAL is
written to another disk then or not (besides some savings by
2x hd cache and less hd head moves).

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-18 Thread fka...@googlemail.com
Hannu Krosing:

> did you also test this with fsync=off ?

Yes. No significant difference.


> I suspect that what you are seeing is the effect of randomly writing to
> the index files. While sequential write performance can be up to
> 80MB/sec on modern drives, sequential writes are an order of magnitude
> slower. And at your data sizes you are very likely to hit a 
> CHECKPOINT, which needs to do some random writes.

Yes, from the server log I noticed that I hit checkpoints
too early and too often. I tried the astronomical value of
1000 for checkpoint_segments to not hit a single one for the
whole test run (copying 800 MB) -- even though that is no
good idea in practice of course.

It took even longer then. Probably because the server
created a lot of 16 MB log files (about 300 in my case)
which is presumly more costy (at least for the first run?)
than overwriting existing files. I am not too much into
that, though, since this is not a solution anyway on the
long run IMHO.

Thanks again.

 Felix
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-18 Thread fka...@googlemail.com
Dave Crooke:

> If you don't need that level of consistency for your 8MB blobs, write them
> to plain files named with some kind of id, and put the id in the database
> instead of the blob.

The problem here is that I later need to give access to the
database via TCP to an external client. This client will
then read out and *wipe* those masses of data
asynchronously, while I'll continue to writing into to
database.

Separating the data into an ID value (in the database) and
ordinary binary files (on disk elsewhere) means, that I need
to implement a separate TCP protocol and talk to the client
whenever it needs to read/delete the data. I try to avoid
that extra task. So postgres shall function here as a
communicator, too, not only for saving data to disk.

Thank you.

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-18 Thread fka...@googlemail.com
Hannu Krosing:

> On Thu, 2010-01-14 at 21:14 +0100, fka...@googlemail.com wrote:
> > Thanks a lot for your reply.
> > 
> > Hannu Krosing:
> > 
> > > > 4 Core CPU 3 Ghz, WinXP, 1 TB SATA disk.
> > > 
> > > try inserting the same data using 4 parallel connections or even 8
> > > parallel ones.
> > 
> > Interesting idea -- I forgot to mention though that 2-3
> > cores will be occupied soon with other tasks.
> 
> Even one core will probably be idling at the througput you mentioned, so
> the advice still stands, use more than one connection to get better
> throughput.

Thank You. Since connecting more than once would mean some
major changes in my db layer I fear considering it as a
solution.

BTW: I do not get the idea behind that. Since firstly, I
later will have just one core free for postgres processes,
and secondly neither the cpu nor the postgres processes seem
to be really busy yet. Do you mean a postgres process may be
programmed in a way that it waits for something unknown
which can be surrounded by feeding another postgres process
with work, even on the same CPU?

As a short check, this is what I did (see other postings
from today for further scenarios I tested):

Setting:

* About 11.1 GB data in the table "bin_table" on a
  separate "data disk" from the tests the last days (mostly
  rows of 80 MB bin data each)

* WAL/pg_xlog not symlinked to another disk anymore.

* created tables test + test2 "LIKE bin_table"

* 2x times pgAdminIII, running:
  INSERT INTO test SELECT * FROM bin_table;

  resp.

  INSERT INTO test2 SELECT * FROM bin_table;

Result:

* To copy those 11.1 GB into test + test2 in parallel it
  took 1699 s (13,17 MB/s)

This is what was to expect. It took quite exactly 2 times of
what 1 process needs for writing half of the data.


Thank You again.

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-18 Thread fka...@googlemail.com
Hello Pierre,

thank You for these useful test commands.

Here is what I did:


Pierre Frédéric Caillaud:

>   Try this :
> 
> CREATE TABLE test AS SELECT * FROM yourtable;
> 
>   This will test write speed, and TOAST compression speed.
>   Then try this:

(1)

Setting:

* pg_xlog sym'linked to another disk (to "system disk")
* having approx 11.1 GB in 'yourtable' on "data disk"
* executed SQL by pgAdmin III (as above, no transaction)

Speed:

* 754 s (14.5 MB/s)


> CREATE TABLE test (LIKE yourtable);
> COMMIT;
> INSERT INTO test SELECT * FROM yourtable;
> 
>   This does the same thing but also writes WAL.
>   I wonder what results you'll get.

(2)

Setting: like (1), and 'test' table removed first
Speed: 752 s (so, the same since pg_xlog sym'linked)


(3)

Setting: like (2), but removed symlink of pg_xlog, so
having it again on "data disk" where big data is

Speed: 801 s (so ~1 minute longer)

BTW: I expected longer duration for scenario (3).



IMHO: As neither the CPUs nor the disk throughput nor the
postgres.exe task's CPU consumption was at its limits: I
wonder what is the problem here. Maybe it is not postgresql
related at all. I'll try to execute these tests on a SSD
and/or Raid system.

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-15 Thread fka...@googlemail.com
Pierre Frédéric Caillaud:

>   At install, or use a symlink (they exist on windows too !...)
> 
>   http://stackoverflow.com/questions/1901405/postgresql-wal-on-windows

Very interesting! Did not help much though (see other
posting).

Thank You
 Felix


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-15 Thread fka...@googlemail.com
Matthew Wakeling:

> On Thu, 14 Jan 2010, fka...@googlemail.com wrote:
> > Nevertheless: If your explanation covers all what can be
> > said about it then replacing the hard disk by a faster one
> > should increase the performance here (I'll try to check that
> > out).
> 
> Probably. However, it is worth you running the test again, and looking at 
> how busy the CPU on the machine is. The disc may be the bottleneck, or the 
> CPU may be the bottleneck.

True.

I've changed the setting a bit:

(1) Replaced 7.200 disk by a 10.000 one, still sata though.

(2) Inserting rows only 10x times (instead of 100x times)
but 80mb each, so having the same amount of 800mb in total.

(3) Changed the WAL path to the system disk (by the
great 'junction' trick mentioned in the other posting), so
actually splitting the write access to the "system" disk and
the fast "data" disk.



And here is the frustrating result:

1. None of the 4 CPUs was ever more busy than 30% (never
less idle than 70%),

2. while both disks kept being far below the average write
performance: the "data" disk had 18 peaks of approx. 40 mb
but in total the average thoughput was 16-18 mb/s.


BTW:

* Disabling noatime and similar for ntfs did not change
things much (thanks though!).

* A short cross check copying 800mb random data file from
"system" to "data" disk showed a performance of constantly
75 mb/s.


So, I have no idea what remains as the bottleneck.

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread fka...@googlemail.com
Pierre Frédéric Caillaud:

> > 4 Core CPU 3 Ghz, WinXP, 1 TB SATA disk.
> 
>   Big CPU and slow disk...
> 
>   You should add another disk just for the WAL -- disks are pretty cheap  
> these days.
>   Writing the WAL on a second disk is the first thing to do on a  
> configuration like yours, if you are limited by writes.
>   It also reduces the fsync lag a lot since the disk is only doing WAL.

Good idea -- where can I set the path to WAL?

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread fka...@googlemail.com
Florian Weimer:

> > Do you have any further idea why 16MB/s seems to be the limit here?
> 
> BYTEA deserialization is very slow, and this could be a factor here.
> Have you checked that you are in fact I/O bound?

Could you elaborate that a bit? It sounds interesting but I
do not get what you mean by:

"bytea deserialization": Do you mean from an escaped string
back to real binary data? Does that apply to my case (I use
PGexecParam and have the Format arg set to 1, binary) ?

"I/O bound": What do you mean by that?


> You can speed things up by sending the data in binary, by passing
> approriate parameters to PQexecParams().

Do you mean the Format arg =1 ? If not, what is appropriate
here?

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread fka...@googlemail.com
Aidan Van Dyk:

> So, your SATA disk can do 43MB/s of sequential writes, but you're example
> is doing:
> 1) Sequential writes to WAL
> 2) Random writes to your index
> 3) Sequential writes to table heap
> 4) Sequential writes to table' toast heap
> 5) Any other OS-based FS overhead

Ok, I see. Thanks a lot for the detailed answer! Especially
writing to WAL may eat up 50% as I've learned now. So,
16MB/s x 2 would in fact be 32 MB/s, plus some extras...


However, does that mean: If I have a raw sequential
performance of 100%, I will get a binary write (like in my
example) which is about 33% as a general rule of thumb?

Just to mention:

* The system has two hard disks, the first for
  WinXP, the second purely for the postgres data.

* I was doing nothing else simultanously on the newly
  installed OS.

* The consumed time (50s, see my test case) were needed to
  99.9 % just by PGexecParam() function.

* No network connect to the postgres server (everything
  local).
  
* No complex sql command; just inserting 100x times using
  PGexecParam(), as a transaction.

* The binary data was marked as such in PGexecParam
  (Format = 1).

* What I meant by 43 MB/s "worst case": I downloaded
  some hd benchmarks which showed a performance of
  43-70 MB/s. (Whereas repetitions of my postgres test did
  never vary, but *constantly* performed at 16MB/s).

Hm.

Nevertheless: If your explanation covers all what can be
said about it then replacing the hard disk by a faster one
should increase the performance here (I'll try to check that
out).

Thanks again!

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread fka...@googlemail.com

Thanks a lot for your reply.

Hannu Krosing:

> > 4 Core CPU 3 Ghz, WinXP, 1 TB SATA disk.
> 
> try inserting the same data using 4 parallel connections or even 8
> parallel ones.

Interesting idea -- I forgot to mention though that 2-3
cores will be occupied soon with other tasks.

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread fka...@googlemail.com

Thanks a lot for the detailed reply.

Matthew Wakeling:

> On Thu, 14 Jan 2010, fka...@googlemail.com wrote:
> > This takes about 50s, so, 800MB/50s = 16MB/s.
> >
> > However the harddisk (sata) could write 43 MB/s in the worst
> > case! Why is write performance limited to 16 MB/s?
> 
> Several reasons:
> 
> The data needs to be written first to the WAL, in order to provide 
> crash-safety. So you're actually writing 1600MB, not 800.

I understand. So the actual throughput is 32MB/s which is
closer to 43 MB/s, of course.

Can I verify that by temporarily disabling WAL writes
completely and see if the thoughput is then doubled?

 Felix


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread fka...@googlemail.com
Thank You for your reply.

Ivan Voras:

> Are you doing it locally or over a network? If you are accessing the 
> server over a network then it could be the location of the bottleneck.

All is done locally (for now).
 
 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread fka...@googlemail.com
Hello together,

I need to increase the write performance when inserting
bytea of 8MB. I am using 8.2.4 on windows with libpq.

The test setting is simple:

I write 100x times a byte array (bytea) of 8 MB random data
into a table having a binary column (and oids and 3 other
int columns, oids are indexed). I realized that writing 8 MB
of 0-bytes is optimized away. With random data, the disk
space now is filled with 800MB each run as expected. I use a
transaction around the insert command.

This takes about 50s, so, 800MB/50s = 16MB/s.

However the harddisk (sata) could write 43 MB/s in the worst
case! Why is write performance limited to 16 MB/s?


Some more hints what I do:

I use PQexecParams() and the INSERT ... $001 notation to NOT
create a real escapted string from the data additionally but
use a pointer to the 8MB data buffer.

I altered the binary column to STORAGE EXTERNAL.

Some experiments with postgresql.conf (fsync off,
shared_buffers=1000MB, checkpoint_segments=256) did not
change the 50s- much (somtimes 60s sometimes a little less).

4 Core CPU 3 Ghz, WinXP, 1 TB SATA disk.


Do you have any further idea why 16MB/s seems to be the
limit here?

Thank You
 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance