Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-26 Thread Scott Carey

On Jan 25, 2010, at 6:55 AM, fka...@googlemail.com wrote:

 Scott Carey:
 
 (2) The tests:
 
 Note: The standard speed was about 800MB/40s, so 20MB/s.
 
 
 a)
 What I changed: fsync=off
 Result: 35s, so 5s faster.
 
 
 b) like a) but:
 checkpoint_segments=128 (was 3)
 autovacuum=off
 
 Result: 35s (no change...?!)
 
 
 yes, more checkpoint_segments will help if your
 shared_buffers is larger, it won't do a whole lot
 otherwise.  Generally, I like to keep these roughly equal
 sized as a starting point for any small to medium sized
 configuration.  So if shared_buffers is 1GB, that takes 64
 checkpoint segments to hold for heavy write scenarios.
 
 (1)
 
 Ok, that's what I tested: 1024 MB shared_buffers, 64
 checkpoint segments.
 
 Unfortunatelly I could not run it on the same hardware
 anymore: The data is written to a single disk now, not raid
 anymore. So with the default shared_buffers of 8 MB (?) we
 should expect 45s for writing the 800 MB. With the large
 shared_buffers and checkpoints (mentioned above) I got this:
 
 1. run (right after postgres server (re-)start): 28s (!)
 2. run: 44s
 3. run: 42s
 
 So, roughly the same as with small buffers.
 
 
 (2)
 Then I switched again from 8.2.4 to 8.4.2:
 
 1. run (after server start): 25s.
 2. run: 38s
 3. run: 38s
 

If you expect to typically only run a batch of these large inserts 
occasionally, hopefully the 25s performance will be what you get.  

 So, 8.4 helped a bit over 8.2.
 
 
 (3) All in all
 
 By (1) + (2) the performance bottleneck has, however,
 changed a lot (as shown here by the performance monitor):
 
 Now, the test system is definitly disk bound. Roughly
 speaking, at the middle of the whole test, for about 40-50%
 of the time, the 'data' disk was at 100% (and the 'WAL' at
 20%), while before and after that the 'WAL' disk had a lot
 of peaks at 100% (and 'data' disk at 30%).
 
 The average MB/s of the 'data' disk was 40 MB/s (WAL:
 20MB/s) -- while the raw performance is 800MB/40s = 20MB/s,
 so still *half* what the disk does.
 
 So, this remains as the last open question to me: It seems
 the data is doubly written to the 'data' disk, although WAL
 is written to the separate 'WAL' disk.
 

It appears as though there is clear evidence that the system is writing data 
twice (excluding WAL).  This is where my Postgres knowledge ends and someone 
else will have to comment.  Why would it write the TOAST data twice?



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-26 Thread Greg Smith

Scott Carey wrote:

#bgwriter_delay=200ms
#bgwriter_lru_percent=1.0
#bgwriter_lru_maxpages=5
#bgwriter_all_percent=0.333
#bgwriter_all_maxpages=5
#checkpoint_segments=3
#checkpoint_timeout=5min
#checkpoint_warning=30s



Check out this for info on these parameters
http://wiki.postgresql.org/wiki/User:Gsmith  (Is there a better link Greg?)
  


Nope.  I started working on that back when I had some hope that it was 
possible to improve the background writer in PostgreSQL 8.2 without 
completely gutting it and starting over.  The 8.3 development work 
proved that idea was mistaken, which meant historical trivia about how 
the ineffective 8.2 version worked wasn't worth cleaning up to 
presentation quality anymore.  Stuck it on my personal page on the wiki 
just so I didn't lose it and could point at it, never developed into a 
proper article.


Generally, my advice for people running 8.2 is to turn the background 
writer off altogether:


bgwriter_lru_maxpages=0
bgwriter_all_maxpages=0

Because what is there by default isn't enough to really work, and if you 
crank it up enough to do something useful it will waste a lot 
resources.  It's possible with careful study to find a useful middle 
ground--I know Kevin Grittner accomplished that on their 8.2 install, 
and I did it once in a way that wasn't horrible--but you're unlikely to 
just get one easily.


--
Greg Smith2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com



Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-25 Thread fka...@googlemail.com
Scott Carey:

  (2) The tests:
  
  Note: The standard speed was about 800MB/40s, so 20MB/s.
  
  
  a)
  What I changed: fsync=off
  Result: 35s, so 5s faster.
  
  
  b) like a) but:
  checkpoint_segments=128 (was 3)
  autovacuum=off
  
  Result: 35s (no change...?!)
  
 
 yes, more checkpoint_segments will help if your
 shared_buffers is larger, it won't do a whole lot
 otherwise.  Generally, I like to keep these roughly equal
 sized as a starting point for any small to medium sized
 configuration.  So if shared_buffers is 1GB, that takes 64
 checkpoint segments to hold for heavy write scenarios.

(1)

Ok, that's what I tested: 1024 MB shared_buffers, 64
checkpoint segments.

Unfortunatelly I could not run it on the same hardware
anymore: The data is written to a single disk now, not raid
anymore. So with the default shared_buffers of 8 MB (?) we
should expect 45s for writing the 800 MB. With the large
shared_buffers and checkpoints (mentioned above) I got this:

1. run (right after postgres server (re-)start): 28s (!)
2. run: 44s
3. run: 42s

So, roughly the same as with small buffers.


(2)
Then I switched again from 8.2.4 to 8.4.2:

1. run (after server start): 25s.
2. run: 38s
3. run: 38s

So, 8.4 helped a bit over 8.2.


(3) All in all

By (1) + (2) the performance bottleneck has, however,
changed a lot (as shown here by the performance monitor):

Now, the test system is definitly disk bound. Roughly
speaking, at the middle of the whole test, for about 40-50%
of the time, the 'data' disk was at 100% (and the 'WAL' at
20%), while before and after that the 'WAL' disk had a lot
of peaks at 100% (and 'data' disk at 30%).

The average MB/s of the 'data' disk was 40 MB/s (WAL:
20MB/s) -- while the raw performance is 800MB/40s = 20MB/s,
so still *half* what the disk does.

So, this remains as the last open question to me: It seems
the data is doubly written to the 'data' disk, although WAL
is written to the separate 'WAL' disk.



  Ok, I've managed to use 8.4 here. Unfortunatelly: There was
  nearly no improvement in speed. For example test 2d)
  performed in 35s.
  
 
 With a very small shared_buffers the improvements to
 Postgres' shared_buffer / checkpoint interaction can not
 be utilized.

See above.


Thank You
 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-22 Thread fka...@googlemail.com
Scott Carey:

 Well, something is causing the system to alternate between
 CPU and disk bound here.  (see below).
 It would be useful to see what affect the index has.

Ok, I simply deleted the index and repeated the test: I did
not notice any difference. This is probably so because in
fact I am doing just 10 INSERTs.


  (B) perfmon34.png: Results/graphs (performance monitor):
  
 Great data!

BTW: I have some more screenshots but as they do not arrive
on the mailing list I keep it. The new graphs are basicly
the same anyway.


  (1) The upper dark/gray graph touching the 100% sometimes is
  disk write time % of the data disk G:
  
  (2) The yellow graph is nearly completly overpainted by (1)
  since it is disk time %.
  
  (3) The brown graph below (1) is Disk Write Byts/s divided
  by 1.000.000, so around 40 MB/s average.
  
 
 Looks like it is writing everything twice, or close to it.
 Alternatively the index writes occupy half, but that is
 unlikely.

'Writing twice': That is the most interesting point I
believe. Why is the data disk doing 40 MB/s *not* including
WAL, however, having 20 MB/s write thoughput in fact. Seems
like: 20 MB for data, 20 MB for X, 20 MB for WAL. 

Although that questions is still unanswered: I verified
again that I am disk bound by temporarily replacing the
raid-0 with slower solution: a singly attached sata disk
of the same type: This *did* slow down the test a lot
(approx.  20%). So, yes, I am disk bound but, again, why
that much...

About removing the index on OIDs: No impact (see above).


  (4) The read graph is Disk Time % of the WAL drive E:,
  average approx. 30%.
  
 
 WAL doesn't look like a bottleneck here, as other tests
 have shown.  A larger wal_buffers setting might lower this
 more, since your record overflows the buffer for sure.
 You might want to change your test case to write records
 similar size to what you expect (do you expect 80MB?) and
 then set wal_buffers up to the size of one checkpoint
 segment (16MB) if you expect larger data per transaction.

Ok, without knowing each exact effect I changed some of the
configuration values (from the defaults in 8.2.4), and did
some tests:

(1) First, the most important 8.2.4 defaults (for you to
overlook):

#shared_buffers=32MB
#temp_buffers=8MB
#max_prepared_transactions=5
#work_mem=1MB
#maintenance_work_mem=16MB
#max_stack_depth=2MB
#max_fsm_pages=204800
#max_fsm_relations=1000
#max_files_per_process=1000
#shared_preload_libraries=''
#vacuum_cost_delay=0
#vacuum_cost_page_hit=1
#vacuum_cost_page_miss=10
#vacuum_cost_page_dirty=20
#vacuum_cost_limit=200
#bgwriter_delay=200ms
#bgwriter_lru_percent=1.0
#bgwriter_lru_maxpages=5
#bgwriter_all_percent=0.333
#bgwriter_all_maxpages=5
#fsync=on
#full_page_writes=on
#wal_buffers=64kB
#checkpoint_segments=3
#checkpoint_timeout=5min
#checkpoint_warning=30s
#seq_page_cost=1.0
#random_page_cost=4.0
#cpu_tuple_cost=0.01
#cpu_index_tuple_cost=0.005
#cpu_operator_cost=0.0025
#effective_cache_size=128MB
#default_statistics_target=10
#constraint_exclusion=off
#from_collapse_limit=8
#join_collapse_limit=8
#autovacuum=on
#autovacuum_naptime=1min
#autovacuum_vacuum_threshold=500
#autovacuum_analyze_threshold=250
#autovacuum_vacuum_scale_factor=0.2
#autovacuum_analyze_scale_factor=0.1
#autovacuum_freeze_max_age=2
#autovacuum_vacuum_cost_delay=-1
#autovacuum_vacuum_cost_limit=-1
#deadlock_timeout=1s
#max_locks_per_transaction=64


(2) The tests:

Note: The standard speed was about 800MB/40s, so 20MB/s.


a)
What I changed: fsync=off
Result: 35s, so 5s faster.


b) like a) but:
checkpoint_segments=128 (was 3)
autovacuum=off

Result: 35s (no change...?!)


c) like b) but:
temp_buffers=200MB (was 8)
wal_sync_method=open_datasync (was fsync)
wal_buffers=1024kB (was 64)

Result:
The best ever, it took just 29s, so 800MB/29s = 27.5MB/s.
However, having autovacuum=off probably means that deleted
rows will occupy disk space? And I also fear that
checkpoint_segments=128 mean that at some point in the
future there will be a huge delay then (?).


d) also like b) but:
temp_buffers=1000MB
wal_buffers=4096kB
checkpoint_segments=3
autovacuum=on

Result: Again slower 36s



I am not able to interprete that in depth.





  (C) My interpretation
  
  (1) Although the data disk G: sometimes hits 100%: All in
  all it seems that neither the CPUs nor the data disk
  (approx. 65%) nor the WAL disk (approx. 30%) are at their
  limits. See also 1000 writes/s, 40MB/s write thoughput.
  
 
 I think it is alternating.  Whatever is causing the 25%
 CPU jump during the 'slow' periods is a clue.  Some
 process on the system must be increasing its time
 significantly in these bursts.  I suspect it is postgres
 flushing data from its shared_buffers to the OS.  8.2 is
 not very efficient at its ability to write out to the OS
 in a constant stream, and tends to be very 'bursty' like
 this.  I suspect that 8.3 or 8.4 would perform a lot
 better here, when tuned right.

Ok, I've managed to use 8.4 here. 

Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-22 Thread Scott Carey

On Jan 21, 2010, at 12:35 AM, Greg Smith wrote:

 Scott Carey wrote:
 On Jan 20, 2010, at 5:32 AM, fka...@googlemail.com wrote:
 
 
 In the attachement you'll find 2 screenshots perfmon34.png
 and perfmon35.png (I hope 2x14 kb is o.k. for the mailing
 list).
 
 
 
 I don't think they made it to the list?  I didn't see it, presumably 
 Scott got a direct copy.  I'd like to get a copy and see the graphs even 
 if takes an off-list message.  If it's an 8.2 checkpoint issue, I know 
 exactly what shapes those take in terms of the disk I/O pattern.
 

Sorry -- I didn't get them from the list, I was CC'd along with the list, and 
so my copy has the images.

 -- 
 Greg Smith2ndQuadrant   Baltimore, MD
 PostgreSQL Training, Services and Support
 g...@2ndquadrant.com  www.2ndQuadrant.com
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-22 Thread Scott Carey
On Jan 22, 2010, at 12:42 PM, fka...@googlemail.com wrote:
 
 'Writing twice': That is the most interesting point I
 believe. Why is the data disk doing 40 MB/s *not* including
 WAL, however, having 20 MB/s write thoughput in fact. Seems
 like: 20 MB for data, 20 MB for X, 20 MB for WAL. 
 

There are a few things that can do this for non-TOAST stuff.  The other comment 
that TOAST writes all zeros first might be related too.

 Although that questions is still unanswered: I verified
 again that I am disk bound by temporarily replacing the
 raid-0 with slower solution: a singly attached sata disk
 of the same type: This *did* slow down the test a lot
 (approx.  20%). So, yes, I am disk bound but, again, why
 that much...
 

Sometimes disk bound (as the graphs show).  I suspect that if you artificially 
slow your CPU down (maybe force it into power saving mode with a utility) it 
will also be slower.  The I/O seems to be the most significant part though.

 
 (1) First, the most important 8.2.4 defaults (for you to
 overlook):
 
 #shared_buffers=32MB

Try 200MB for the above
 #temp_buffers=8MB

You tried making this larger, which helped some.

 #bgwriter_delay=200ms
 #bgwriter_lru_percent=1.0
 #bgwriter_lru_maxpages=5
 #bgwriter_all_percent=0.333
 #bgwriter_all_maxpages=5
 #checkpoint_segments=3
 #checkpoint_timeout=5min
 #checkpoint_warning=30s

Check out this for info on these parameters
http://wiki.postgresql.org/wiki/User:Gsmith  (Is there a better link Greg?)

 #fsync=on
Changing this probably helps the OS spend less time flushing to disk.

 
 (2) The tests:
 
 Note: The standard speed was about 800MB/40s, so 20MB/s.
 
 
 a)
 What I changed: fsync=off
 Result: 35s, so 5s faster.
 
 
 b) like a) but:
 checkpoint_segments=128 (was 3)
 autovacuum=off
 
 Result: 35s (no change...?!)
 

yes, more checkpoint_segments will help if your shared_buffers is larger, it 
won't do a whole lot otherwise.  Generally, I like to keep these roughly equal 
sized as a starting point for any small to medium sized configuration.  So if 
shared_buffers is 1GB, that takes 64 checkpoint segments to hold for heavy 
write scenarios.

 
 c) like b) but:
 temp_buffers=200MB (was 8)
 wal_sync_method=open_datasync (was fsync)
 wal_buffers=1024kB (was 64)
 
 Result:
 The best ever, it took just 29s, so 800MB/29s = 27.5MB/s.
 However, having autovacuum=off probably means that deleted
 rows will occupy disk space? And I also fear that
 checkpoint_segments=128 mean that at some point in the
 future there will be a huge delay then (?).

I am curious which of the two helped most.  I don't think temp_buffers should 
do anything (it is for temp tables afaik).

 d) also like b) but:
 temp_buffers=1000MB
 wal_buffers=4096kB
 checkpoint_segments=3
 autovacuum=on
 
 Result: Again slower 36s
 

Try changing shared_buffers.  This is where uncommitted data needs to avoid 
overflowing before a commit.  If this was non-TOAST data, i would suspect this 
is the cause of any double-writing. But I don't know enough about TOAST to know 
if the same things happen here.


 Ok, I've managed to use 8.4 here. Unfortunatelly: There was
 nearly no improvement in speed. For example test 2d)
 performed in 35s.
 

With a very small shared_buffers the improvements to Postgres' shared_buffer / 
checkpoint interaction can not be utilized.



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-21 Thread Scott Carey

On Jan 20, 2010, at 5:32 AM, fka...@googlemail.com wrote:

 
 Bulk inserts into an indexed table is always significantly
 slower than inserting unindexed and then indexing later.
 
 Agreed. However, shouldn't this be included in the disk-time
 counters? If so, it should by near 100%.
 

Well, something is causing the system to alternate between CPU and disk bound 
here. 
(see below).
It would be useful to see what affect the index has.


 
 THE TESTS:
 
 In the attachement you'll find 2 screenshots perfmon34.png
 and perfmon35.png (I hope 2x14 kb is o.k. for the mailing
 list).
 
 To explain it a bit:
 
 
 (A) The setting (to recall):
 
 WinXP, 4 CPU 3 Ghz, 4 GB RAM, Postgres 8.2.4, postgres
 system data on hd D:, postgres data on separate sata raid-0
 hd G: (serial write performance of 100-200 MB/s), pg_xlog
 symlinked to separate hd E: (sata 10.000 rpm); using libpq
 and PQexecParams() with $001.. notation and Format=1
 (binary) for doing 10 times a simple INSERT command to
 insert 80 MB of bytea data (so 800 MB in total) into a
 simple 5 col table (1 bytea with STORAGE EXTERNAL, rest
 unused cols, with oids, index on oid; all done locally.
 
 
 (B) perfmon34.png: Results/graphs (performance monitor):
 
Great data!

 (1) The upper dark/gray graph touching the 100% sometimes is
 disk write time % of the data disk G:
 
 (2) The yellow graph is nearly completly overpainted by (1)
 since it is disk time %.
 
 (3) The brown graph below (1) is Disk Write Byts/s divided
 by 1.000.000, so around 40 MB/s average.
 

Looks like it is writing everything twice, or close to it.  Alternatively the 
index writes occupy half, but that is unlikely.


 (4) The read graph is Disk Time % of the WAL drive E:,
 average approx. 30%.
 

WAL doesn't look like a bottleneck here, as other tests have shown.
A larger wal_buffers setting might lower this more, since your record overflows 
the buffer for sure.
You might want to change your test case to write records similar size to what 
you expect (do you expect 80MB?) and then set wal_buffers up to the size of one 
checkpoint segment (16MB) if you expect larger data per transaction.

 (5) Below (4) there is CPU time in total (average around
 20%), all 4 CPUs counted -- and please beleave me: separate
 CPU logs shows CPUs peaks never above 40-50%. Watching the
 several postgres processes: 0-10% CPU usage.

What I see here is the following:
The system is I/O bound, or close, and then it is CPU bound.  Note how the CPU 
spikes up by about 25% (one CPU) when the disk time drops.


 
 (6) The blue/cyan line is Disk Writes/sec divided by 100,
 so around 1000 writes/s max for the data drive G:
 
 (7) The pink graph (Disk reads/s of data disk G:) shows
 nearly zero activity.
 
 (8)
 Duration of it all 40s, so inserting 800MB is done at a
 speed of 20MB/s.
 
 (9) The other tool mentioned (DiskMon) tool had some
 problems to list all data writes in parallel. It continued
 to fill its list for 5 min.  after the test was done. I have
 not examined its results.
 
Yes, that tool by default will log a LOT of data.  It will be useful later if 
we want to figure out what 
sort of writes happen during the peaks and valleys on your chart.
 
 (C) My interpretation
 
 (1) Although the data disk G: sometimes hits 100%: All in
 all it seems that neither the CPUs nor the data disk
 (approx. 65%) nor the WAL disk (approx. 30%) are at their
 limits. See also 1000 writes/s, 40MB/s write thoughput.
 

I think it is alternating.  Whatever is causing the 25% CPU jump during the 
'slow' periods is a clue.  Some process on the system must be increasing its 
time significantly in these bursts.  I suspect it is postgres flushing data 
from its shared_buffers to the OS.  8.2 is not very efficient at its ability to 
write out to the OS in a constant stream, and tends to be very 'bursty' like 
this.  I suspect that 8.3 or 8.4 would perform a lot better here, when tuned 
right.

 (2) The screenshot also demonstrates that the whole process
 is not done smoothly but seems to be interrupted: About 15
 times the data disk time% changes between 100% and ~40%.
 
 
 (D) perfmod35.png: Further tests (fsync=off)
 
 I repeated the whole thing with fsync=off. The result was
 remarkably better (35s instead of 40s, so 22.8 MB/s). The
 former 100% peaks of the data disk G: are now flat 100%
 tops for approx 50% of the time.
 
 See attachement perfmon35.png
 
 
 (E) Remaining questions
 
 (1)
 It seems that I am disk bound, however it is still a bit
 unclear what the system is waiting for during these
 significant 'interrupts' when neither the disk nor the CPU
 (nor postgress processes) seem to be at its limits.
 

I suspect it is the postgres checkpoint system interacting with the write 
volume and size of shared_buffers.
This interaction was significantly improved in 8.3.  If you can, running a 
contemporary version would probably help a lot.  8.2.anything is rather old 
from a performance perspective.
Adjusting the work_mem and 

Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-21 Thread Greg Smith

Scott Carey wrote:

On Jan 20, 2010, at 5:32 AM, fka...@googlemail.com wrote:

  

In the attachement you'll find 2 screenshots perfmon34.png
and perfmon35.png (I hope 2x14 kb is o.k. for the mailing
list).




I don't think they made it to the list?  I didn't see it, presumably 
Scott got a direct copy.  I'd like to get a copy and see the graphs even 
if takes an off-list message.  If it's an 8.2 checkpoint issue, I know 
exactly what shapes those take in terms of the disk I/O pattern.


--
Greg Smith2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-21 Thread Matthew Wakeling

On Thu, 21 Jan 2010, Greg Smith wrote:

In the attachement you'll find 2 screenshots perfmon34.png
and perfmon35.png (I hope 2x14 kb is o.k. for the mailing
list).


I don't think they made it to the list?


No, it seems that no emails with image attachments ever make it through 
the list server. Someone mentioned something about banning the guy who set 
the list up from the internet or something. 
http://archives.postgresql.org/pgsql-performance/2008-01/msg00290.php


Matthew

--
Bashir: The point is, if you lie all the time, nobody will believe you, even
when you're telling the truth. (RE: The boy who cried wolf)
Garak: Are you sure that's the point, Doctor?
Bashir: What else could it be?-- Star Trek DS9
Garak: That you should never tell the same lie twice. -- Improbable Cause

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-21 Thread Florian Weimer
* Matthew Wakeling:

 The data needs to be written first to the WAL, in order to provide
 crash-safety. So you're actually writing 1600MB, not 800.

In addition to that, PostgreSQL 8.4.2 seems pre-fill TOAST files (or
all relations?) with zeros when they are written first, which adds
another 400 to 800 MB.

-- 
Florian Weimerfwei...@bfk.de
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-19 Thread fka...@googlemail.com
fka...@googlemail.com:

 I'll try to execute these tests on a SSD
 and/or Raid system.

FYI:

On a sata raid-0 (mid range hardware) and recent 2x 1.5 TB
disks with a write performance of 100 MB/s (worst, to 200
MB/s max), I get a performance of 18.2 MB/s. Before, with
other disk 43 MB/s (worst to 70 MB/s max) postgres came to
14-16 MB/s.

So, I conclude finally:

(1) Postgresql write throughput (slowly) scales with the
harddisk speed.

(2) The throughput (not counting WAL doubling data) in
postgresql is 20-25% of the disk thoughput.


I want to thank you all for the very good support!

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-19 Thread fka...@googlemail.com
Scott Carey:

 You are CPU bound.
 
 30% of 4 cores is greater than 25%.  25% is one core fully
 used.

I have measured the cores separately. Some of them reached
30%. I am not CPU bound here.

 The postgres compression of data in TOAST is
 probably the problem.  I'm assuming its doing Gzip, and at
 the default compression level, which on random data will
 be in the 15MB/sec range.  I don't know if TOAST will do
 compression at a lower compression level.

Hm. I use 'bytea' (see original posting) and SET STORAGE
EXTERNAL for this column which switches of the compression
AFAIK. I was doing this to measure the raw performance and
may not be able to use compression later in real scenario.

BTW: In the initial tests I used 200 blocks of 4 MB bytea
which is the real scenario; later on I was using 10 times
80MB each just to reduce the number of INSERT commands and
to make it easier to find the performance problem.


 Is your data typically random or incompressible?  If it is
 compressible then your test should be changed to reflect
 that.

Unfortunatelly I can't say much yet. Of course, in case
compression makes sense and fits the CPU performance I will
use it.


 If I am wrong, you are I/O bound

Yes. This is the first half of what we found out now.

 -- this will show up in
 windows Performance Monitor as Disk Time (%) -- which
 you can get on a per device or total basis, along with i/o
 per second (read and/or write) and bytes/sec metrics.

Yes, I am using this tool.

However, the deeper question is (sounds ridiculous): Why am
I I/O bound *this much* here. To recall: The write
performance in pg is about 20-25% of the worst case serial
write performance of the disk (and only about 8-10% of the
best disk perf) even though pg_xlog (WAL) is moved to
another disk, only 10 simple INSERT commands, a simple table
of 5 columns (4 unused, one bytea) and one index for OID, no
compression since STORAGE EXTERNAL, ntfs tweaks (noatime
etc), ...


 To prove that you are CPU bound, split your test in half,
 and run the two halves at the same time.  If you are CPU
 bound, then your bytes/sec performance will go up
 significantly, along with CPU usage.

Done already (see earlier posting). I am not CPU bound.
Speed was the same.


Thank You for the detailed reply.

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-19 Thread fka...@googlemail.com
Pierre Frédéric Caillaud:

 I don't remember if you used TOAST compression or not.

I use 'bytea' and SET STORAGE EXTERNAL for this column.
AFAIK this switches off the compression.



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-19 Thread Ivan Voras

On 01/19/10 11:16, fka...@googlemail.com wrote:

fka...@googlemail.com:


I'll try to execute these tests on a SSD
and/or Raid system.


FYI:

On a sata raid-0 (mid range hardware) and recent 2x 1.5 TB
disks with a write performance of 100 MB/s (worst, to 200
MB/s max), I get a performance of 18.2 MB/s. Before, with
other disk 43 MB/s (worst to 70 MB/s max) postgres came to
14-16 MB/s.


[I just skimmed this thread - did you increase the number of WAL logs to 
something very large, like 128?]



So, I conclude finally:

(1) Postgresql write throughput (slowly) scales with the
harddisk speed.

(2) The throughput (not counting WAL doubling data) in
postgresql is 20-25% of the disk thoughput.


And this is one of the more often forgot reasons why storing large 
objects in a database rather than in the file systems is a bad idea :)




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-19 Thread Richard Huxton

On 19/01/10 10:50, fka...@googlemail.com wrote:

However, the deeper question is (sounds ridiculous): Why am
I I/O bound *this much* here. To recall: The write
performance in pg is about 20-25% of the worst case serial
write performance of the disk (and only about 8-10% of the
best disk perf) even though pg_xlog (WAL) is moved to
another disk, only 10 simple INSERT commands, a simple table
of 5 columns (4 unused, one bytea) and one index for OID, no
compression since STORAGE EXTERNAL, ntfs tweaks (noatime
etc), ...


I'm no Windows expert, but the sysinternals tools (since bought by 
Microsoft) have always proved useful to me.


Diskmon should show you what's happening on your machine:
http://technet.microsoft.com/en-us/sysinternals/bb896646.aspx

Be aware that this will generate a *lot* of data very quickly and you'll 
need to spend a little time analysing it. Try it without PG running to 
see what your system is up to when idle first to get a baseline.


Unfortunately it doesn't show disk seek times (which is probably what 
you want to measure) but it should let you decode what reads/writes are 
taking place when. If two consecutive disk accesses aren't adjacent then 
that implies a seek of course. Worst case you get two or more processes 
each accessing different parts of the disk in an interleaved arrangement.


--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-19 Thread fka...@googlemail.com
Ivan Voras:

 [I just skimmed this thread - did you increase the number of WAL logs to 
 something very large, like 128?]

Yes, I tried even more.

I will be writing data quite constantly in the real scenario
later. So I wonder if increasing WAL logs will have a
positive effect or not: AFAIK when I increase it, the
duration after the max is hit will be longer then (which is
not acceptable in my case).

Could anyone confirm if I got it right?

 Felix


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-19 Thread Kevin Grittner
fka...@googlemail.com fka...@googlemail.com wrote: 
 Scott Carey:
 
 You are CPU bound.
 
 30% of 4 cores is greater than 25%.  25% is one core fully
 used.
 
 I have measured the cores separately. Some of them reached
 30%. I am not CPU bound here.
 
If you have numbers like that when running one big query, or a
stream of queries one-at-a-time, you are CPU bound.  A single
request only uses one CPU at a time although it could switch among a
number of them, if the OS doesn't make an effort to keep each
process with the same CPU.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-19 Thread Ivan Voras

On 01/19/10 14:36, fka...@googlemail.com wrote:

Ivan Voras:


[I just skimmed this thread - did you increase the number of WAL logs to
something very large, like 128?]


Yes, I tried even more.

I will be writing data quite constantly in the real scenario
later. So I wonder if increasing WAL logs will have a
positive effect or not: AFAIK when I increase it, the
duration after the max is hit will be longer then (which is
not acceptable in my case).

Could anyone confirm if I got it right?


It seems so - if you are writing constantly then you will probably get 
lower but more long-term-stable performance from a smaller number of WAL 
logs.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-19 Thread Scott Carey

On Jan 19, 2010, at 2:50 AM, fka...@googlemail.com wrote:

 Scott Carey:
 
 You are CPU bound.
 
 30% of 4 cores is greater than 25%.  25% is one core fully
 used.
 
 I have measured the cores separately. Some of them reached
 30%. I am not CPU bound here.
 

Measuring the cores isn't enough.  The OS switches threads between cores faster 
than it aggregates the usage time.  On Windows, measure the individual process 
CPU to see if any of them are near 100%.  One process can be using 100% (one 
full cpu, cpu bound) but switching between cores making it look like 25% on 
each.

If the individual postgres backend is significantly less than 100%, then you 
are probably not CPU bound.  If this is the case and additionally the system 
has significant disk wait time, then you are definitely not CPU bound.
Record and post the perfmon log if you wish.  In the Process section, select 
CPU time %, system time %, and user time % for all processes.  In the graph, 
you should 
see one (or two) processes eating up that CPU during the test run. 

 
 If I am wrong, you are I/O bound
 
 Yes. This is the first half of what we found out now.
 

Does the OS report that you are actually waiting on disk? See the PerfMon 
Physical Disk section.  Disk time % should be high if you are waiting on disk.

 -- this will show up in
 windows Performance Monitor as Disk Time (%) -- which
 you can get on a per device or total basis, along with i/o
 per second (read and/or write) and bytes/sec metrics.
 
 Yes, I am using this tool.
 

What does it report for disk time %?   How many I/O per second?

 However, the deeper question is (sounds ridiculous): Why am
 I I/O bound *this much* here. To recall: The write
 performance in pg is about 20-25% of the worst case serial
 write performance of the disk (and only about 8-10% of the
 best disk perf) even though pg_xlog (WAL) is moved to
 another disk, only 10 simple INSERT commands, a simple table
 of 5 columns (4 unused, one bytea) and one index for OID, no
 compression since STORAGE EXTERNAL, ntfs tweaks (noatime
 etc), ...
 

Its not going to be completely serial, we want to know if it is disk bound, and 
if so by what type of access.  The disk time %, i/o per second, and MB/sec are 
needed to figure this out. MB/sec alone is not enough.  PerfMon has tons of 
useful data you can extract on this -- i/o per second for writes and reads, 
size of writes, size of reads, time spent waiting on each disk, etc.

All the writes are not serial.   Enough disk seeks interleaved will kill the 
sequential writes.  
You have random writes due to the index.  Try this without the index and see 
what happens. The index is also CPU consuming.
You can probably move the index to the other disk too (with tablespaces), and 
the random disk activity may then follow it.  
To minimize index writes, increase shared_buffers. 

 
 To prove that you are CPU bound, split your test in half,
 and run the two halves at the same time.  If you are CPU
 bound, then your bytes/sec performance will go up
 significantly, along with CPU usage.
 
 Done already (see earlier posting). I am not CPU bound.
 Speed was the same.
 
If that result correlates with the system reporting high Disk Time (%), and the 
MB/sec written is low, then random writes or reads are causing the slowdown.  
The chief suspects for that are the WAL log, and the index.  The WAL is 
generally sequential itself, and not as much of a concern as the index.

Another reply references DiskMon from sysinternals.  This is also highly 
useful.  Once you have identified the bottleneck from PerfMon, you can use this 
to see the actual system API calls for the disk reads and writes, tracking down 
to which process on what file.  Much more than you can get from Linux easily.

Bulk inserts into an indexed table is always significantly slower than 
inserting unindexed and then indexing later.  Partitioned tables combined with 
staging tables can help here if you need to increase insert throughput more.  
Also, if random writes are your problem, a battery backed caching raid 
controller will significantly improve performance, as will anything that can 
improve random write performance (high quality SSD, faster RPM disks, more 
disks).

 
 Thank You for the detailed reply.
 
 Felix
 
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-18 Thread fka...@googlemail.com
Hello Pierre,

thank You for these useful test commands.

Here is what I did:


Pierre Frédéric Caillaud:

   Try this :
 
 CREATE TABLE test AS SELECT * FROM yourtable;
 
   This will test write speed, and TOAST compression speed.
   Then try this:

(1)

Setting:

* pg_xlog sym'linked to another disk (to system disk)
* having approx 11.1 GB in 'yourtable' on data disk
* executed SQL by pgAdmin III (as above, no transaction)

Speed:

* 754 s (14.5 MB/s)


 CREATE TABLE test (LIKE yourtable);
 COMMIT;
 INSERT INTO test SELECT * FROM yourtable;
 
   This does the same thing but also writes WAL.
   I wonder what results you'll get.

(2)

Setting: like (1), and 'test' table removed first
Speed: 752 s (so, the same since pg_xlog sym'linked)


(3)

Setting: like (2), but removed symlink of pg_xlog, so
having it again on data disk where big data is

Speed: 801 s (so ~1 minute longer)

BTW: I expected longer duration for scenario (3).



IMHO: As neither the CPUs nor the disk throughput nor the
postgres.exe task's CPU consumption was at its limits: I
wonder what is the problem here. Maybe it is not postgresql
related at all. I'll try to execute these tests on a SSD
and/or Raid system.

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-18 Thread fka...@googlemail.com
Hannu Krosing:

 On Thu, 2010-01-14 at 21:14 +0100, fka...@googlemail.com wrote:
  Thanks a lot for your reply.
  
  Hannu Krosing:
  
4 Core CPU 3 Ghz, WinXP, 1 TB SATA disk.
   
   try inserting the same data using 4 parallel connections or even 8
   parallel ones.
  
  Interesting idea -- I forgot to mention though that 2-3
  cores will be occupied soon with other tasks.
 
 Even one core will probably be idling at the througput you mentioned, so
 the advice still stands, use more than one connection to get better
 throughput.

Thank You. Since connecting more than once would mean some
major changes in my db layer I fear considering it as a
solution.

BTW: I do not get the idea behind that. Since firstly, I
later will have just one core free for postgres processes,
and secondly neither the cpu nor the postgres processes seem
to be really busy yet. Do you mean a postgres process may be
programmed in a way that it waits for something unknown
which can be surrounded by feeding another postgres process
with work, even on the same CPU?

As a short check, this is what I did (see other postings
from today for further scenarios I tested):

Setting:

* About 11.1 GB data in the table bin_table on a
  separate data disk from the tests the last days (mostly
  rows of 80 MB bin data each)

* WAL/pg_xlog not symlinked to another disk anymore.

* created tables test + test2 LIKE bin_table

* 2x times pgAdminIII, running:
  INSERT INTO test SELECT * FROM bin_table;

  resp.

  INSERT INTO test2 SELECT * FROM bin_table;

Result:

* To copy those 11.1 GB into test + test2 in parallel it
  took 1699 s (13,17 MB/s)

This is what was to expect. It took quite exactly 2 times of
what 1 process needs for writing half of the data.


Thank You again.

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-18 Thread fka...@googlemail.com
Dave Crooke:

 If you don't need that level of consistency for your 8MB blobs, write them
 to plain files named with some kind of id, and put the id in the database
 instead of the blob.

The problem here is that I later need to give access to the
database via TCP to an external client. This client will
then read out and *wipe* those masses of data
asynchronously, while I'll continue to writing into to
database.

Separating the data into an ID value (in the database) and
ordinary binary files (on disk elsewhere) means, that I need
to implement a separate TCP protocol and talk to the client
whenever it needs to read/delete the data. I try to avoid
that extra task. So postgres shall function here as a
communicator, too, not only for saving data to disk.

Thank you.

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-18 Thread fka...@googlemail.com
Hannu Krosing:

 did you also test this with fsync=off ?

Yes. No significant difference.


 I suspect that what you are seeing is the effect of randomly writing to
 the index files. While sequential write performance can be up to
 80MB/sec on modern drives, sequential writes are an order of magnitude
 slower. And at your data sizes you are very likely to hit a 
 CHECKPOINT, which needs to do some random writes.

Yes, from the server log I noticed that I hit checkpoints
too early and too often. I tried the astronomical value of
1000 for checkpoint_segments to not hit a single one for the
whole test run (copying 800 MB) -- even though that is no
good idea in practice of course.

It took even longer then. Probably because the server
created a lot of 16 MB log files (about 300 in my case)
which is presumly more costy (at least for the first run?)
than overwriting existing files. I am not too much into
that, though, since this is not a solution anyway on the
long run IMHO.

Thanks again.

 Felix
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-18 Thread fka...@googlemail.com
Matthew Wakeling:

 The data needs to be written first to the WAL, in order to provide 
 crash-safety. So you're actually writing 1600MB, not 800.

I come back again to saving WAL to another disk. Now, after
all, I wonder: Doesn't the server wait anyway until WAL was
written to disk?

So, if true, does it should not really matter if WAL is
written to another disk then or not (besides some savings by
2x hd cache and less hd head moves).

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-18 Thread fka...@googlemail.com
fka...@googlemail.com:

 I'll try to execute these tests on a SSD
 and/or Raid system.

FYI:

On a recent but mid-range performing SSD (128 MB size,
serially writing 80-140 MB, 100 MB average) the situation
was worse for some reason. No difference by fsync=on/off.

 Felix


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-18 Thread Scott Carey

On Jan 18, 2010, at 3:20 AM, fka...@googlemail.com wrote:

 Hello Pierre,
 
 thank You for these useful test commands.
 
 Here is what I did:
 
 
 Pierre Frédéric Caillaud:
 
  Try this :
 
 CREATE TABLE test AS SELECT * FROM yourtable;
 
  This will test write speed, and TOAST compression speed.
  Then try this:
 
 (1)
 
 Setting:
 
 * pg_xlog sym'linked to another disk (to system disk)
 * having approx 11.1 GB in 'yourtable' on data disk
 * executed SQL by pgAdmin III (as above, no transaction)
 
 Speed:
 
 * 754 s (14.5 MB/s)
 
 
 CREATE TABLE test (LIKE yourtable);
 COMMIT;
 INSERT INTO test SELECT * FROM yourtable;
 
  This does the same thing but also writes WAL.
  I wonder what results you'll get.
 
 (2)
 
 Setting: like (1), and 'test' table removed first
 Speed: 752 s (so, the same since pg_xlog sym'linked)
 
 
 (3)
 
 Setting: like (2), but removed symlink of pg_xlog, so
 having it again on data disk where big data is
 
 Speed: 801 s (so ~1 minute longer)
 
 BTW: I expected longer duration for scenario (3).
 
 
 
 IMHO: As neither the CPUs nor the disk throughput nor the
 postgres.exe task's CPU consumption was at its limits: I
 wonder what is the problem here. Maybe it is not postgresql
 related at all. I'll try to execute these tests on a SSD
 and/or Raid system.
 
 Felix
 

You are CPU bound.

30% of 4 cores is greater than 25%.  25% is one core fully used.  The postgres 
compression of data in TOAST is probably the problem.  I'm assuming its doing 
Gzip, and at the default compression level, which on random data will be in the 
15MB/sec range.  I don't know if TOAST will do compression at a lower 
compression level.  Is your data typically random or incompressible?  If it is 
compressible then your test should be changed to reflect that.


If I am wrong, you are I/O bound -- this will show up in windows Performance 
Monitor as Disk Time (%) -- which you can get on a per device or total basis, 
along with i/o per second (read and/or write) and bytes/sec metrics.

To prove that you are CPU bound, split your test in half, and run the two 
halves at the same time.  If you are CPU bound, then your bytes/sec performance 
will go up significantly, along with CPU usage.

If you are I/O bound, it will stay the same or get worse.

-Scott
 
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-16 Thread Pierre Frédéric Caillau d



I've changed the setting a bit:

(1) Replaced 7.200 disk by a 10.000 one, still sata though.

(2) Inserting rows only 10x times (instead of 100x times)
but 80mb each, so having the same amount of 800mb in total.

(3) Changed the WAL path to the system disk (by the
great 'junction' trick mentioned in the other posting), so
actually splitting the write access to the system disk and
the fast data disk.



And here is the frustrating result:

1. None of the 4 CPUs was ever more busy than 30% (never
less idle than 70%),

2. while both disks kept being far below the average write
performance: the data disk had 18 peaks of approx. 40 mb
but in total the average thoughput was 16-18 mb/s.


BTW:

* Disabling noatime and similar for ntfs did not change
things much (thanks though!).

* A short cross check copying 800mb random data file from
system to data disk showed a performance of constantly
75 mb/s.


So, I have no idea what remains as the bottleneck.

 Felix


Try this :

CREATE TABLE test AS SELECT * FROM yourtable;

This will test write speed, and TOAST compression speed.
Then try this:

CREATE TABLE test (LIKE yourtable);
COMMIT;
INSERT INTO test SELECT * FROM yourtable;

This does the same thing but also writes WAL.
I wonder what results you'll get.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-15 Thread Florian Weimer
 Florian Weimer:

  Do you have any further idea why 16MB/s seems to be the limit here?
 
 BYTEA deserialization is very slow, and this could be a factor here.
 Have you checked that you are in fact I/O bound?

 Could you elaborate that a bit? It sounds interesting but I
 do not get what you mean by:

 bytea deserialization: Do you mean from an escaped string
 back to real binary data?

Yes, that is what I meant.

 Does that apply to my case (I use PGexecParam and have the Format
 arg set to 1, binary) ?

Yes, this was my suggestion.  There is probably some other issue,
then.

 I/O bound: What do you mean by that?

You should check (presumably using the Windows performance monitoring
tools, but I'm not familiar with Windows) if the PostgreSQL process is
indeed waiting on disk I/O.

-- 
Florian Weimerfwei...@bfk.de
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-15 Thread Pierre Frédéric Caillau d



http://www.hagander.net/talks/Advanced%20PostgreSQL%20on%20Windows.pdf


Great doc ! I'm keeping that ;)



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-15 Thread Matthew Wakeling

On Thu, 14 Jan 2010, fka...@googlemail.com wrote:

The data needs to be written first to the WAL, in order to provide
crash-safety. So you're actually writing 1600MB, not 800.


I understand. So the actual throughput is 32MB/s which is
closer to 43 MB/s, of course.

Can I verify that by temporarily disabling WAL writes
completely and see if the thoughput is then doubled?


There isn't a magic setting in Postgres to disable the WAL. That would be 
far too tempting, and an easy way to break the database.


However, what you can do is to insert the data into the table in the same 
transaction as creating the table. Then Postgres knows that no other 
transactions can see the table, so it doesn't need to be so careful.


Unfortunately, I don't think even this strategy will work in your case, as 
you will be writing to the large object table, which already exists. Could 
someone who knows confirm this?


Matthew

--
Let's say I go into a field and I hear baa baa baa. Now, how do I work 
out whether that was baa followed by baa baa, or if it was baa baa

followed by baa?
- Computer Science Lecturer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-15 Thread Matthew Wakeling

On Thu, 14 Jan 2010, fka...@googlemail.com wrote:

Nevertheless: If your explanation covers all what can be
said about it then replacing the hard disk by a faster one
should increase the performance here (I'll try to check that
out).


Probably. However, it is worth you running the test again, and looking at 
how busy the CPU on the machine is. The disc may be the bottleneck, or the 
CPU may be the bottleneck.


Matthew

--
Take care that thou useth the proper method when thou taketh the measure of
high-voltage circuits so that thou doth not incinerate both thee and the
meter; for verily, though thou has no account number and can be easily
replaced, the meter doth have one, and as a consequence, bringeth much woe
upon the Supply Department.   -- The Ten Commandments of Electronics

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-15 Thread fka...@googlemail.com
Matthew Wakeling:

 On Thu, 14 Jan 2010, fka...@googlemail.com wrote:
  Nevertheless: If your explanation covers all what can be
  said about it then replacing the hard disk by a faster one
  should increase the performance here (I'll try to check that
  out).
 
 Probably. However, it is worth you running the test again, and looking at 
 how busy the CPU on the machine is. The disc may be the bottleneck, or the 
 CPU may be the bottleneck.

True.

I've changed the setting a bit:

(1) Replaced 7.200 disk by a 10.000 one, still sata though.

(2) Inserting rows only 10x times (instead of 100x times)
but 80mb each, so having the same amount of 800mb in total.

(3) Changed the WAL path to the system disk (by the
great 'junction' trick mentioned in the other posting), so
actually splitting the write access to the system disk and
the fast data disk.



And here is the frustrating result:

1. None of the 4 CPUs was ever more busy than 30% (never
less idle than 70%),

2. while both disks kept being far below the average write
performance: the data disk had 18 peaks of approx. 40 mb
but in total the average thoughput was 16-18 mb/s.


BTW:

* Disabling noatime and similar for ntfs did not change
things much (thanks though!).

* A short cross check copying 800mb random data file from
system to data disk showed a performance of constantly
75 mb/s.


So, I have no idea what remains as the bottleneck.

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-15 Thread fka...@googlemail.com
Pierre Frédéric Caillaud:

   At install, or use a symlink (they exist on windows too !...)
 
   http://stackoverflow.com/questions/1901405/postgresql-wal-on-windows

Very interesting! Did not help much though (see other
posting).

Thank You
 Felix


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-15 Thread Merlin Moncure
On Thu, Jan 14, 2010 at 9:29 AM, fka...@googlemail.com
fka...@googlemail.com wrote:
 Hello together,

 I need to increase the write performance when inserting
 bytea of 8MB. I am using 8.2.4 on windows with libpq.

 The test setting is simple:

 I write 100x times a byte array (bytea) of 8 MB random data
 into a table having a binary column (and oids and 3 other
 int columns, oids are indexed). I realized that writing 8 MB
 of 0-bytes is optimized away. With random data, the disk
 space now is filled with 800MB each run as expected. I use a
 transaction around the insert command.

 This takes about 50s, so, 800MB/50s = 16MB/s.

 However the harddisk (sata) could write 43 MB/s in the worst
 case! Why is write performance limited to 16 MB/s?


 Some more hints what I do:

 I use PQexecParams() and the INSERT ... $001 notation to NOT
 create a real escapted string from the data additionally but
 use a pointer to the 8MB data buffer.

 I altered the binary column to STORAGE EXTERNAL.

 Some experiments with postgresql.conf (fsync off,
 shared_buffers=1000MB, checkpoint_segments=256) did not
 change the 50s- much (somtimes 60s sometimes a little less).

 4 Core CPU 3 Ghz, WinXP, 1 TB SATA disk.


 Do you have any further idea why 16MB/s seems to be the
 limit here?

postgres is simply not geared towards this type of workload.  16mb
isn't too bad actually, and I bet you could significantly beat that
with better disks and multiple clients sending data, maybe even close
to saturate a gigabit line.  However, there are other ways to do this
(outside the db) that are more appropriate if efficiency is a big
concern.

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-15 Thread Dave Crooke
I'd second this  a database is doing all kinds of clever things to
ensure ACID consistency on every byte that gets written to it.

If you don't need that level of consistency for your 8MB blobs, write them
to plain files named with some kind of id, and put the id in the database
instead of the blob. This will reduce the amount of disk I/O for storing
each blob by nearly 50%, and will reduce marshaling overheads by a larger
magin.

From your account, it sounds like the database is performing nicely on that
hardware ... 16MB/sec to a raw disk or filesystem is rather slow by modern
standards, but 16MB/sec of database updates is pretty good for having
everything on one slow-ish spindle.

On Fri, Jan 15, 2010 at 3:15 PM, Merlin Moncure mmonc...@gmail.com wrote:

 On Thu, Jan 14, 2010 at 9:29 AM, fka...@googlemail.com
 fka...@googlemail.com wrote:
  Hello together,
 
  I need to increase the write performance when inserting
  bytea of 8MB. I am using 8.2.4 on windows with libpq.
 
  The test setting is simple:
 
  I write 100x times a byte array (bytea) of 8 MB random data
  into a table having a binary column (and oids and 3 other
  int columns, oids are indexed). I realized that writing 8 MB
  of 0-bytes is optimized away. With random data, the disk
  space now is filled with 800MB each run as expected. I use a
  transaction around the insert command.
 
  This takes about 50s, so, 800MB/50s = 16MB/s.
 
  However the harddisk (sata) could write 43 MB/s in the worst
  case! Why is write performance limited to 16 MB/s?
 
 
  Some more hints what I do:
 
  I use PQexecParams() and the INSERT ... $001 notation to NOT
  create a real escapted string from the data additionally but
  use a pointer to the 8MB data buffer.
 
  I altered the binary column to STORAGE EXTERNAL.
 
  Some experiments with postgresql.conf (fsync off,
  shared_buffers=1000MB, checkpoint_segments=256) did not
  change the 50s- much (somtimes 60s sometimes a little less).
 
  4 Core CPU 3 Ghz, WinXP, 1 TB SATA disk.
 
 
  Do you have any further idea why 16MB/s seems to be the
  limit here?

 postgres is simply not geared towards this type of workload.  16mb
 isn't too bad actually, and I bet you could significantly beat that
 with better disks and multiple clients sending data, maybe even close
 to saturate a gigabit line.  However, there are other ways to do this
 (outside the db) that are more appropriate if efficiency is a big
 concern.

 merlin

 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread Ivan Voras

fka...@googlemail.com wrote:

Hello together,

I need to increase the write performance when inserting
bytea of 8MB. I am using 8.2.4 on windows with libpq.




This takes about 50s, so, 800MB/50s = 16MB/s.

However the harddisk (sata) could write 43 MB/s in the worst
case! Why is write performance limited to 16 MB/s?


Do you have any further idea why 16MB/s seems to be the
limit here?


Are you doing it locally or over a network? If you are accessing the 
server over a network then it could be the location of the bottleneck.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread Matthew Wakeling

On Thu, 14 Jan 2010, fka...@googlemail.com wrote:

This takes about 50s, so, 800MB/50s = 16MB/s.

However the harddisk (sata) could write 43 MB/s in the worst
case! Why is write performance limited to 16 MB/s?


Several reasons:

The data needs to be written first to the WAL, in order to provide 
crash-safety. So you're actually writing 1600MB, not 800.


Postgres needs to update a few other things on disc (indexes on the large 
object table maybe?), and needs to call fsync a couple of times. That'll 
add a bit of time.


Your discs can't write 43MB/s in the *worst case* - the worst case is lots 
of little writes scattered over the disc, where it would be lucky to 
manage 1MB/s. Not all of the writes Postgres makes are sequential. A handy 
way of knowing how sequential the writes are is to listen to the disc as 
it writes - the clicking sounds are where it has to waste time moving the 
disc head from one part of the disc to another.


Matthew

--
No trees were killed in the sending of this message.  However a large
number of electrons were terribly inconvenienced.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread Aidan Van Dyk
* fka...@googlemail.com fka...@googlemail.com [100114 09:29]:
 
 This takes about 50s, so, 800MB/50s = 16MB/s.
 
 However the harddisk (sata) could write 43 MB/s in the worst
 case! Why is write performance limited to 16 MB/s?
 
 I altered the binary column to STORAGE EXTERNAL.
 
 Some experiments with postgresql.conf (fsync off,
 shared_buffers=1000MB, checkpoint_segments=256) did not
 change the 50s- much (somtimes 60s sometimes a little less).
 
 4 Core CPU 3 Ghz, WinXP, 1 TB SATA disk.
 
 
 Do you have any further idea why 16MB/s seems to be the
 limit here?

So, your SATA disk can do 43MB/s of sequential writes, but you're example
is doing:
1) Sequential writes to WAL
2) Random writes to your index
3) Sequential writes to table heap
4) Sequential writes to table' toast heap
5) Any other OS-based FS overhead

Now, writes #2,3 and 4 don't happen completely concurrently with your
WAL, some of them are still in postgres buffers, but easily enough to
interrupt the stream of WAL enough to certainly make it believable that
with everything going on on the disk, you can only write WAL at a
*sustained* 16 MB/s

If you're running a whole system on a single SATA which can stream
43MB/s, remember that for *every* other read/write sent do the disk, you
lose up to 1MB/s (12ms seek time, read/write, and back).  And in that
every other, you have FS metadata updates, any other file writes the
FS flushes, etc...  20 aditional blocks being that are either read or
written to disk are going to completely chop your 43MB/s rate...

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.


signature.asc
Description: Digital signature


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread Florian Weimer
 Do you have any further idea why 16MB/s seems to be the limit here?

BYTEA deserialization is very slow, and this could be a factor here.
Have you checked that you are in fact I/O bound?

You can speed things up by sending the data in binary, by passing
approriate parameters to PQexecParams().

-- 
Florian Weimerfwei...@bfk.de
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread Pierre Frédéric Caillau d



However the harddisk (sata) could write 43 MB/s in the worst
case! Why is write performance limited to 16 MB/s?

Some more hints what I do:

I use PQexecParams() and the INSERT ... $001 notation to NOT
create a real escapted string from the data additionally but
use a pointer to the 8MB data buffer.

I altered the binary column to STORAGE EXTERNAL.

Some experiments with postgresql.conf (fsync off,
shared_buffers=1000MB, checkpoint_segments=256) did not
change the 50s- much (somtimes 60s sometimes a little less).

4 Core CPU 3 Ghz, WinXP, 1 TB SATA disk.


Big CPU and slow disk...

	You should add another disk just for the WAL -- disks are pretty cheap  
these days.
	Writing the WAL on a second disk is the first thing to do on a  
configuration like yours, if you are limited by writes.

It also reduces the fsync lag a lot since the disk is only doing WAL.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread fka...@googlemail.com
Thank You for your reply.

Ivan Voras:

 Are you doing it locally or over a network? If you are accessing the 
 server over a network then it could be the location of the bottleneck.

All is done locally (for now).
 
 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread fka...@googlemail.com

Thanks a lot for the detailed reply.

Matthew Wakeling:

 On Thu, 14 Jan 2010, fka...@googlemail.com wrote:
  This takes about 50s, so, 800MB/50s = 16MB/s.
 
  However the harddisk (sata) could write 43 MB/s in the worst
  case! Why is write performance limited to 16 MB/s?
 
 Several reasons:
 
 The data needs to be written first to the WAL, in order to provide 
 crash-safety. So you're actually writing 1600MB, not 800.

I understand. So the actual throughput is 32MB/s which is
closer to 43 MB/s, of course.

Can I verify that by temporarily disabling WAL writes
completely and see if the thoughput is then doubled?

 Felix


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread fka...@googlemail.com

Thanks a lot for your reply.

Hannu Krosing:

  4 Core CPU 3 Ghz, WinXP, 1 TB SATA disk.
 
 try inserting the same data using 4 parallel connections or even 8
 parallel ones.

Interesting idea -- I forgot to mention though that 2-3
cores will be occupied soon with other tasks.

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread fka...@googlemail.com
Aidan Van Dyk:

 So, your SATA disk can do 43MB/s of sequential writes, but you're example
 is doing:
 1) Sequential writes to WAL
 2) Random writes to your index
 3) Sequential writes to table heap
 4) Sequential writes to table' toast heap
 5) Any other OS-based FS overhead

Ok, I see. Thanks a lot for the detailed answer! Especially
writing to WAL may eat up 50% as I've learned now. So,
16MB/s x 2 would in fact be 32 MB/s, plus some extras...


However, does that mean: If I have a raw sequential
performance of 100%, I will get a binary write (like in my
example) which is about 33% as a general rule of thumb?

Just to mention:

* The system has two hard disks, the first for
  WinXP, the second purely for the postgres data.

* I was doing nothing else simultanously on the newly
  installed OS.

* The consumed time (50s, see my test case) were needed to
  99.9 % just by PGexecParam() function.

* No network connect to the postgres server (everything
  local).
  
* No complex sql command; just inserting 100x times using
  PGexecParam(), as a transaction.

* The binary data was marked as such in PGexecParam
  (Format = 1).

* What I meant by 43 MB/s worst case: I downloaded
  some hd benchmarks which showed a performance of
  43-70 MB/s. (Whereas repetitions of my postgres test did
  never vary, but *constantly* performed at 16MB/s).

Hm.

Nevertheless: If your explanation covers all what can be
said about it then replacing the hard disk by a faster one
should increase the performance here (I'll try to check that
out).

Thanks again!

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread fka...@googlemail.com
Florian Weimer:

  Do you have any further idea why 16MB/s seems to be the limit here?
 
 BYTEA deserialization is very slow, and this could be a factor here.
 Have you checked that you are in fact I/O bound?

Could you elaborate that a bit? It sounds interesting but I
do not get what you mean by:

bytea deserialization: Do you mean from an escaped string
back to real binary data? Does that apply to my case (I use
PGexecParam and have the Format arg set to 1, binary) ?

I/O bound: What do you mean by that?


 You can speed things up by sending the data in binary, by passing
 approriate parameters to PQexecParams().

Do you mean the Format arg =1 ? If not, what is appropriate
here?

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread fka...@googlemail.com
Pierre Frédéric Caillaud:

  4 Core CPU 3 Ghz, WinXP, 1 TB SATA disk.
 
   Big CPU and slow disk...
 
   You should add another disk just for the WAL -- disks are pretty cheap  
 these days.
   Writing the WAL on a second disk is the first thing to do on a  
 configuration like yours, if you are limited by writes.
   It also reduces the fsync lag a lot since the disk is only doing WAL.

Good idea -- where can I set the path to WAL?

 Felix



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread Fernando Hevia
 

 -Mensaje original-
 De: fka...@googlemail.com

 Nevertheless: If your explanation covers all what can be said 
 about it then replacing the hard disk by a faster one should 
 increase the performance here (I'll try to check that out).
 

Moving the pg_xlog directory to the OS drive should make a difference and it
will cost you zero.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread Pierre Frédéric Caillau d
On Thu, 14 Jan 2010 22:28:07 +0100, fka...@googlemail.com  
fka...@googlemail.com wrote:



Pierre Frédéric Caillaud:


 4 Core CPU 3 Ghz, WinXP, 1 TB SATA disk.

Big CPU and slow disk...

You should add another disk just for the WAL -- disks are pretty cheap
these days.
Writing the WAL on a second disk is the first thing to do on a
configuration like yours, if you are limited by writes.
It also reduces the fsync lag a lot since the disk is only doing WAL.


Good idea -- where can I set the path to WAL?


At install, or use a symlink (they exist on windows too !...)

http://stackoverflow.com/questions/1901405/postgresql-wal-on-windows

	I've no idea of the other needed NTFS tweaks, like if there is a  
noatime/nodiratime ?...




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-14 Thread Magnus Hagander
2010/1/15 Pierre Frédéric Caillaud li...@peufeu.com:
 On Thu, 14 Jan 2010 22:28:07 +0100, fka...@googlemail.com 
 fka...@googlemail.com wrote:

 Pierre Frédéric Caillaud:

  4 Core CPU 3 Ghz, WinXP, 1 TB SATA disk.

        Big CPU and slow disk...

        You should add another disk just for the WAL -- disks are pretty 
 cheap
 these days.
        Writing the WAL on a second disk is the first thing to do on a
 configuration like yours, if you are limited by writes.
        It also reduces the fsync lag a lot since the disk is only doing WAL.

 Good idea -- where can I set the path to WAL?

        At install, or use a symlink (they exist on windows too !...)

        http://stackoverflow.com/questions/1901405/postgresql-wal-on-windows

        I've no idea of the other needed NTFS tweaks, like if there is a 
 noatime/nodiratime ?...

It does. See 
http://www.hagander.net/talks/Advanced%20PostgreSQL%20on%20Windows.pdf


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance