On Wed, Aug 8, 2012 at 2:41 PM, Geert Mak <po...@verysmall.org> wrote:
> hello everybody,
>
> we are trying to move the data from table1 into table2 using a plpgsql stored
> procedure which performs a simple data conversion
>
> there are about 50 million rows
>
> the tables are relatively simple, with fewer than a dozen columns; most are
> integer, a couple are char(32), and one is a varchar holding URLs
>
> what happens is that when we execute the stored procedure, the execution eats
> up the 17 GB of free space on the server and the server crashes
>
> if we apply a limit of 1 million rows, the execution completes successfully 
> in about a minute
>
> we understand by now that plpgsql functions are executed in a single
> transaction, which commits only when the function comes to an end
>
> the question is -
>
> are plpgsql functions suitable for such massive operations on data, and can
> this work without eating so much disk space for something which should simply
> be "read-change-write, read-change-write, read-change-write, …"? i.e. is
> there any way to force a commit inside, or something along those lines?
>
> or should we rather implement this operation in an external scripting
> language? (speed is not that important; this is a one-time conversion)

What is the general structure of the procedure?  In particular, how
are you browsing and updating the rows?  There is (almost) no way to
force a commit inside a function -- there has been some discussion
about a stored procedure and/or autonomous transaction feature that
would get us there.
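
In the meantime, the usual workaround is to drive the batches from the
client so that each slice of rows is inserted (and committed) in its own
transaction -- essentially what you already stumbled onto with the 1
million row limit.  A rough sketch, assuming table1 has an integer key
"id" and the conversion can be written in plain SQL (both of those are
my assumptions, not something you said):

-- run from psql with autocommit on, so each statement commits on its
-- own; lower(url) is just a stand-in for your real conversion
INSERT INTO table2 (id, url)
SELECT id, lower(url)
FROM table1
WHERE id > 0 AND id <= 1000000;

INSERT INTO table2 (id, url)
SELECT id, lower(url)
FROM table1
WHERE id > 1000000 AND id <= 2000000;

-- ...and so on for the remaining ranges; the statements are easy to
-- generate with a small shell loop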

I say 'almost' because you can emulate some aspects of autonomous
transactions with dblink, but that may not be a very good fit for your
particular case.
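
For completeness, the dblink route looks roughly like this: the function
opens a loopback connection and pushes each batch through dblink_exec,
so every batch commits in that second session no matter what happens in
the calling transaction.  Very much a sketch -- it assumes the dblink
extension is installed, that a loopback connection can authenticate with
just the database name, and that table1 has an integer key "id" (again,
my invention):

CREATE EXTENSION IF NOT EXISTS dblink;

CREATE OR REPLACE FUNCTION convert_in_batches(batch_size integer DEFAULT 100000)
RETURNS void AS $$
DECLARE
    last_id integer := 0;
    max_id  integer;
BEGIN
    SELECT max(id) INTO max_id FROM table1;

    -- loopback connection: statements sent through it run in a second
    -- session and commit independently of this function's transaction
    PERFORM dblink_connect('conv', 'dbname=' || current_database());

    WHILE last_id < max_id LOOP
        PERFORM dblink_exec('conv', format(
            'INSERT INTO table2 (id, url)
             SELECT id, lower(url)  -- stand-in for the real conversion
             FROM table1
             WHERE id > %s AND id <= %s',
            last_id, last_id + batch_size));
        last_id := last_id + batch_size;
    END LOOP;

    PERFORM dblink_disconnect('conv');
END;
$$ LANGUAGE plpgsql;

Keep in mind that batches pushed through dblink stay committed even if
the outer function errors out later, so the function has to be safe to
re-run (or track its own progress).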

merlin
