On 05/29/2018 05:18 PM, David G. Johnston wrote:
On Tue, May 29, 2018 at 4:01 PM, Alvaro Herrera <alvhe...@2ndquadrant.com 
<mailto:alvhe...@2ndquadrant.com>>wrote:

    On 2018-May-29, Stuart McGraw wrote:

    > Alternatively, if there were a setting to tell Postgresql to
    > follow the SQL standard behavior of overwriting rather than
    > stacking savepoints, that too would solve my current problem,
    > I think.  Perhaps it is just my limited experience, but the
    > former behavior has always seemed more useful in practice
    > than the latter.

    I think if what we're doing breaks the semantics of the SQL spec, we're
    definitely open to changing our behavior.  But that wouldn't solve your
    problem today.  What I think could solve your problem today is a
    C-language extension that uses xact.c callbacks in order to expose a
    list that you can query from user space.
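For anyone following along, the stacking behavior in question is easy to see with a small sketch. This uses Python's bundled sqlite3 as a stand-in (SQLite happens to stack reused savepoint names the same way PostgreSQL does); the table and savepoint names are invented for illustration:

```python
import sqlite3

# Demo of savepoint *stacking*: reusing a savepoint name creates a new,
# nested savepoint rather than overwriting the earlier one (the SQL
# standard describes the overwriting behavior instead).
conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; we issue BEGIN ourselves
cur = conn.cursor()
cur.execute("CREATE TABLE t (v INTEGER)")

cur.execute("BEGIN")
cur.execute("INSERT INTO t VALUES (1)")
cur.execute("SAVEPOINT sp")                  # outer sp
cur.execute("INSERT INTO t VALUES (2)")
cur.execute("SAVEPOINT sp")                  # same name again: stacked, not overwritten
cur.execute("INSERT INTO t VALUES (3)")

cur.execute("ROLLBACK TO sp")                # undoes only row 3 (innermost sp)
cur.execute("RELEASE sp")                    # destroys the inner sp, uncovering the outer one
cur.execute("ROLLBACK TO sp")                # now undoes row 2 as well
cur.execute("COMMIT")

rows = [r[0] for r in cur.execute("SELECT v FROM t ORDER BY v")]
print(rows)  # [1] -- two savepoints existed under the one name
```

Under overwriting semantics the second SAVEPOINT would have replaced the first, so there would be only one savepoint to release or roll back to.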

Stuart:

That said, have you measured this "leaking" and can show that it is non-trivial 
(given the large size of the overall transaction)?

No, I haven't, and I'm not sure how I would.  Are you saying I shouldn't worry 
about it and just not bother releasing any of the savepoints?  I would feel a 
little uneasy about that, the same way I would feel about a program that never 
freed allocated memory or closed open files.  If I knew there were relatively 
small limits on how much data will be processed or how long the program will 
run, sure.  But in my case I don't control the size of the input data and I 
don't understand the internals of savepoints, so I think caution is prudent.
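FWIW, the usual way to keep the stack flat is to RELEASE each savepoint as soon as its row is handled, so there is never more than one outstanding.  A minimal sketch of that per-row pattern, again with sqlite3 standing in for Postgres (with psycopg2 the shape is the same, though after an error you must ROLLBACK TO before issuing any further statement):

```python
import sqlite3

# Per-row subtransaction pattern: one savepoint per row, always released,
# so the savepoint stack never grows past a depth of one.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")

cur.execute("BEGIN")
for item in [1, 2, 2, 3]:                    # the duplicate 2 violates the PRIMARY KEY
    cur.execute("SAVEPOINT row_sp")
    try:
        cur.execute("INSERT INTO items VALUES (?)", (item,))
    except sqlite3.IntegrityError:
        cur.execute("ROLLBACK TO row_sp")    # undo just this row, keep the transaction
    cur.execute("RELEASE row_sp")            # always release: nothing accumulates
cur.execute("COMMIT")

kept = [r[0] for r in cur.execute("SELECT id FROM items ORDER BY id")]
print(kept)  # [1, 2, 3] -- the bad row was skipped, the rest committed
```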

Also, I'm not sure the warnings against premature optimization, which concern 
code performance tweaks, apply to resource leaks.  The former attempt to make 
a program run faster but don't (in theory) affect its correctness.  Resource 
problems often show up unexpectedly and catastrophically.  So I think being 
preemptively concerned about the latter is justified.

Beyond that, bulk ETL leveraging SAVEPOINT is not something I've encountered or 
contemplated.  Expecting and reacting to errors is expensive and itself 
error-prone.  I'd much rather try to design something where failure is simply 
fatal - usually by bulk loading with fewer constraints and then ensuring that 
future queries don't attempt to do something illegal like insert duplicates.

Funny you should say that :-)  I am looking at rewriting these import programs 
(there are several) to do just that.  But it is not a trivial job, and in the 
meantime I need to keep what already exists working.


