Re: [HACKERS] using a lot of maintenance_work_mem

2011-02-18 Thread Frederik Ramm

Tom & Kevin,

   thank you for your replies. Kevin, I had already employed all the 
tricks you mention except temporary tables, which would be hard for me 
due to the structure of my application (though I could try using 
something like pgbouncer) - but thanks a lot for sharing the ideas.


Tom Lane wrote:
>> If I were to either (a) increase MaxAllocSize to, say, 48 GB instead of
>> 1 GB, or (b) hack tuplesort.c to ignore MaxAllocSize, just for my local
>> setup - would that likely be viable in my situation, or would I break
>> countless things?
>
> You would break countless things.


Indeed I did. I tried to raise MaxAllocSize from 1 GB to a large 
number, but immediately got strange memory allocation errors during the 
regression tests (something that looked like a wrapped integer in a 
memory allocation request).
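
My guess at what happened - a sketch of the failure mode, not actual 
PostgreSQL code: once the guard constant is raised, a slot count held in 
a plain int can wrap past INT_MAX when doubled and turn into a 
nonsensical allocation request:

 #include <limits.h>
 #include <stdio.h>

 int main(void)
 {
     /* sketch: a tuple-slot count that has already grown past 1 billion */
     int slots = INT_MAX / 2 + 1;
     /* doubling wraps around; the converted result is typically negative */
     int doubled = (int) ((unsigned int) slots * 2);
     printf("doubled slot count: %d\n", doubled);
     return 0;
 }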


I reduced the number in steps, and found I could compile and run 
PostgreSQL 8.3 with a MaxAllocSize of 4 GB, and PostgreSQL 9.0 with 2 GB 
without breakage.
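
(For the record, the change is a one-line edit in 
src/include/utils/memutils.h; the constants I built with were roughly 
these - quoted from memory:)

 /* 8.3 test build: 4 GB - 1 instead of the stock 1 GB - 1 */
 #define MaxAllocSize	((Size) 0xffffffff)

 /* 9.0 test build: 2 GB - 1 */
 #define MaxAllocSize	((Size) 0x7fffffff)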


In a completely un-scientific test run, comprising 42 individual SQL 
statements aimed at importing and indexing a large volume of data, I got 
the following results:


pg8.3 with normal MaxAllocSize ............. 15284s
pg8.3 with MaxAllocSize increased to 4 GB .. 14609s (-4.5%)
pg9.0 with normal MaxAllocSize ............. 12969s (-15.2%)
pg9.0 with MaxAllocSize increased to 2 GB .. 13211s (-13.5%)


Tom Lane also wrote:
> I'd want to see some evidence that it's actually
> helpful for production situations.  I'm a bit dubious that you're going
> to gain much here.


So, on the whole it seems you were right: with the small memory increase 
I managed to build in without breaking things, performance improves only 
a little - and on PostgreSQL 9.0, not at all.


The single query that gained most from the increase in memory was an 
ALTER TABLE statement to add a BIGINT primary key to a table with about 
50 million records - this was 75% faster on both 8.3 and 9.0, but 
since it took only 120 seconds to begin with, it didn't change the 
overall result much.


The single query where pg9.0 beat pg8.3 by a country mile was a CREATE 
INDEX statement on a BIGINT column of a table with about 500 million 
records - this took 2679 seconds on normal 8.3, 2443 seconds on 
large-memory 8.3, and around 1650 seconds on 9.0, large memory or not.


The query that, on both 8.3 and 9.0, took about 10% longer with more 
memory was a CREATE INDEX statement on a TEXT column.


All this was, as I said, completely un-scientific - I did take care to 
flush caches and not run anything in parallel, but that was about all, 
so results might come out differently when run repeatedly.


My conclusion from all of this? Switch to 9.0, of course ;)

Bye
Frederik

--
Frederik Ramm  ##  eMail frede...@remote.org  ##  N49°00'09" E008°23'33"



[HACKERS] using a lot of maintenance_work_mem

2011-02-14 Thread Frederik Ramm

Hi,

I am (ab)using a PostgreSQL database (with PostGIS extension) in a 
large data processing job - each day, I load several GB of data, run a 
lot of analyses on it, and then throw everything away again. Loading, 
running, and dumping the results takes about 18 hours every day.


The job involves a lot of index building and sorting, and is run on a 
64-bit machine with 96 GB of RAM.


Naturally I would like the system to use as much RAM as possible before 
resorting to disk-based operations, but no amount of 
maintenance_work_mem seems to make it do my bidding.
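
For example, even something like this in postgresql.conf (the value is 
just an illustration) is accepted, but makes no difference for the big 
sorts:

 maintenance_work_mem = 32GB   # accepted, yet the in-memory sort
                               # still stops growing near 1 GB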


I'm using PostgreSQL 8.3 but would be willing and able to upgrade to any 
later version.


Some googling has unearthed the issue - which is likely known to all of 
you; I'm just repeating it to prove I've done my homework - that tuplesort.c 
always tries to double its memory allocation, and will refuse to do so 
if that would result in an allocation greater than MaxAllocSize:


 /* refuse to double the memtuples array once it would exceed MaxAllocSize */
 if ((Size) (state->memtupsize * 2) >= MaxAllocSize / sizeof(SortTuple))
     return false;

And MaxAllocSize is hardcoded to 1 GB in memutils.h.
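
The definition in src/include/utils/memutils.h reads:

 #define MaxAllocSize	((Size) 0x3fffffff)	/* 1 gigabyte - 1 */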

(All this based on Postgres 9.1alpha source - I didn't want to bring 
something up that has been fixed already.)
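
To put a number on the limit: assuming sizeof(SortTuple) is 24 bytes on 
a 64-bit build (an assumption - I haven't verified the exact figure), 
the array of sort tuples can never address more than about 44 million 
slots, no matter how much RAM is available:

 #include <stdio.h>

 int main(void)
 {
     size_t max_alloc = 0x3fffffff;  /* MaxAllocSize: 1 GB - 1 */
     size_t tuple_sz  = 24;          /* assumed sizeof(SortTuple), 64-bit */
     printf("cap: ~%zu million tuple slots\n",
            max_alloc / tuple_sz / 1000000);
     return 0;
 }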


Now I assume that there are reasons you're doing this. memutils.h 
has the (to me) cryptic comment about MaxAllocSize: "XXX This is 
deliberately chosen to correspond to the limiting size of varlena 
objects under TOAST. See VARATT_MASK_SIZE in postgres.h.", but 
VARATT_MASK_SIZE has zero other occurrences in the source code.


If I were to either (a) increase MaxAllocSize to, say, 48 GB instead of 
1 GB, or (b) hack tuplesort.c to ignore MaxAllocSize, just for my local 
setup - would that likely be viable in my situation, or would I break 
countless things?


I can afford some experimentation; as I said, I'm throwing away the 
database every day anyway. I just thought I'd solicit your advice before 
I do anything super stupid. If I can use my setup to somehow 
contribute to further PostgreSQL development by trying out some things, 
I'll be more than happy to do so. I do C/C++, but apart from building 
packages for several platforms, I haven't worked with the PostgreSQL 
source code.


Of course the cop-out solution would be to just create a huge RAM disk 
and instruct PostgreSQL to use that for disk-based sorting. I'll do that 
if all of you say "OMG don't touch MaxAllocSize" ;)
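
(If it comes to that, the sketch I have in mind is roughly the following - 
the path is a placeholder, and a tmpfs owned by the postgres user would 
have to be mounted there beforehand:)

 -- assuming a tmpfs is already mounted at /mnt/pg_ramdisk:
 CREATE TABLESPACE ram_temp LOCATION '/mnt/pg_ramdisk';
 -- per session, or via postgresql.conf:
 SET temp_tablespaces = 'ram_temp';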


Bye
Frederik

--
Frederik Ramm  ##  eMail frede...@remote.org  ##  N49°00'09" E008°23'33"
