On 4/14/14, 5:51 PM, Joe Conway wrote:

On 04/14/2014 03:17 PM, Jim Nasby wrote:
On 4/14/14, 4:50 PM, Andres Freund wrote:
On 2014-04-14 14:33:03 -0700, Joe Conway wrote:
I realize there are many things that can be done to improve my
specific scenario, e.g. drop indexes before loading, change
various configs, etc. My purpose for this post is to ask if it
is really expected to get over 20 times as much WAL as heap
data?

I'd bet a large percentage of this will be full page images of
the index. The values you index are essentially distributed over
the whole index, so you'll modify the same index values
repeatedly. But often enough it won't be in the same checkpoint
and thus will create full page images.
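
A toy model may help show the effect described here. It is only a sketch, not PostgreSQL's actual WAL logic: it assumes uniformly random page access, a checkpoint every fixed number of inserts, and a made-up 64-byte record size for non-first touches. The function name and parameters are hypothetical.

```python
import random

PAGE_SIZE = 8192  # PostgreSQL's default block size in bytes


def simulate_index_wal(num_inserts, index_pages, inserts_per_checkpoint,
                       record_bytes=64, seed=0):
    """Return (full_page_images, wal_bytes) for random-key index inserts.

    Toy model: the first modification of a page after a checkpoint logs the
    whole page; later modifications log only a small record.
    record_bytes is an illustrative assumption, not a measured value.
    """
    rng = random.Random(seed)
    touched = set()  # pages already modified since the last checkpoint
    fpi = 0
    wal_bytes = 0
    for i in range(num_inserts):
        if i % inserts_per_checkpoint == 0:
            touched.clear()  # a checkpoint resets the "first touch" state
        page = rng.randrange(index_pages)  # keys land anywhere in the index
        if page not in touched:
            touched.add(page)
            fpi += 1
            wal_bytes += PAGE_SIZE  # first touch: full page image
        else:
            wal_bytes += record_bytes  # later touch: small record only
    return fpi, wal_bytes


if __name__ == "__main__":
    fpi, wal = simulate_index_wal(100_000, 50_000, 20_000)
    print("full page images:", fpi, "WAL bytes:", wal)
```

When the index has many more pages than are touched between two checkpoints, nearly every insert in this model is a first touch, so the WAL volume approaches one full page per inserted row.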

My thought exactly...

ISTM that we should be able to push all the index inserts to the
end of the transaction. That should greatly reduce the amount of
full page writes. That would also open the door for doing all the
index inserts in parallel.
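
A rough sketch of why batching could help, under stated assumptions: it compares the number of full page images when index pages are touched in arrival order versus when the same touches are deferred and grouped by page. Sorting the deferred touches is an assumption of this sketch (the proposal above only says to push the inserts to the end of the transaction), and the checkpoint is modeled as occurring every fixed number of touches. The function name is hypothetical.

```python
import random


def count_full_page_images(page_sequence, touches_per_checkpoint):
    """Count first-touch-after-checkpoint events in a sequence of page numbers."""
    touched = set()  # pages already modified since the last checkpoint
    fpi = 0
    for i, page in enumerate(page_sequence):
        if i % touches_per_checkpoint == 0:
            touched.clear()  # checkpoint: next touch of any page is a first touch
        if page not in touched:
            touched.add(page)
            fpi += 1
    return fpi


rng = random.Random(0)
# 100,000 inserts spread at random over a 1,000-page index (illustrative sizes).
pages = [rng.randrange(1000) for _ in range(100_000)]

as_they_arrive = count_full_page_images(pages, 5_000)
batched_sorted = count_full_page_images(sorted(pages), 5_000)

print("arrival order:", as_they_arrive)
print("deferred and grouped by page:", batched_sorted)
```

With the touches grouped, each page is logged in full at most once, plus once more if its run happens to straddle a checkpoint, so the count is bounded by the number of distinct pages plus the number of checkpoints.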

That's the thing. I'm sure there is tuning and other things to improve
this particular case, but creating over 20 times as much WAL as real
data seems like pathological behavior to me.

Can you take a look at what's actually going into WAL when the wheels fall off? 
I think it should be pretty easy to test the theory that it's a ton of full 
page writes of index leaf pages...
--
Jim C. Nasby, Data Architect                       j...@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net

