Michael Bayer wrote:
> On Dec 22, 2009, at 5:20 PM, Igor Katson wrote:
>
>>> 0.5 supports this using the "postgres_returning" argument to update() and
>>> insert(), but the ORM doesn't make use of it. In 0.6, the feature has been
>>> generalized using the returning() method on insert() and update(), and the
>>> ORM will use it to fetch new primary key identifiers.
>>>
>>> It does not take effect within the ORM for inline SQL other than primary
>>> key identifiers, but you can use an insert() with returning() manually to
>>> achieve that effect.
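The round trip that RETURNING saves can be sketched outside SQLAlchemy entirely; here is a minimal illustration using the stdlib sqlite3 module (the table and values are made up, and the RETURNING clause needs SQLite 3.35 or later):

```python
import sqlite3

# Hypothetical single-row table for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obj (id INTEGER PRIMARY KEY, col INTEGER)")
conn.execute("INSERT INTO obj (id, col) VALUES (1, 10)")

# One statement both updates the row and hands back its new state,
# instead of an UPDATE followed by a separate SELECT round trip.
row = conn.execute(
    "UPDATE obj SET col = col + ? WHERE id = ? RETURNING id, col",
    (5, 1),
).fetchone()
print(row)  # -> (1, 15)
```

The same idea applies at the SQLAlchemy level, where the dialect emits the RETURNING clause for you.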
>>>
>> Why not, Michael? Is it hard to implement this? It seems pretty
>> useful for performance, reducing the number of SELECTs needed
>> when code is unaware of object expiry.
>
> because a) I don't have the time to implement conditional returning()
> awareness within the ORM and b) I don't think it will have as dramatic a
> performance impact as you expect. When I added returning() support for
> insert() statements by default, the number of function calls in our
> performance tests went up, due to the additional complexity. I have made the
> automatic-returning feature optional for all dialects since it's not really
> clear that performance is actually enhanced. So it's a race between Python
> method-call performance, which is abysmal, and the overhead of SQL round trips.
> Future enhancements like rewriting ResultProxy in C (on which some work has
> been done) could be game changers again.
>
> Other reasons performance might not increase much at all are:
>
> 1. A typical ORM application may create/modify a bunch of objects, then call
> commit(). By default everything is expired unconditionally in that case -
> RETURNING for non-primary-key columns is wasted overhead.
> 2. The "set an attribute to a SQL expression" use case is very rare. In most
> cases, deciding about returning() at the ORM level just adds complexity for
> no reason.
> 3. The use case of SQL-level defaults/onupdates on columns is there, but
> the typical use is for timestamps and such that usually aren't
> fetched in the same transaction as the one in which they are flushed -
> fetching them back just so they can be thrown away is again wasted overhead.
> 4. It's not clear overall that returning() is faster for all DBAPIs - in
> particular the mechanics of using RETURNING with Oracle and MSSQL are pretty
> complex.
>
>
>> P.S. And in SQLAlchemy 0.5, what's the most ORM-like way to solve
>> this? In my code it looks like this now, because I didn't find an easier
>> (fewer lines, less SQL-like) way to do it:
>> columns = ', '.join(str(col) for col in obj_table.c)
>> query = '''UPDATE %(table)s
>>     SET col = col + %(value)s
>>     WHERE id = %(id)s
>>     RETURNING %(columns)s''' % {
>>         'id': id,
>>         'columns': columns,
>>         'table': obj_table,
>>         'value': value,
>> }
>> obj = meta.Session.query(Obj).from_statement(query).first()
>
> As far as the rendering above, you should use the postgres_returning keyword
> in conjunction with an update() construct, instead of piecing together a
> string. It should be in the API documentation. The usage of query(Obj) to
> call it and get an object back from RETURNING is novel and I hadn't thought
> of that. You need to ensure that the rows you're matching are not already
> present in the session, or have been expired, else the new state will not be
> reflected in the objects.
>
> You also should take a measurement of performance enhancement with the above
> technique, vs. using query.update() or Session.execute(update()) and then
> re-selecting the rows back, or just allowing the natural expiration to result
> in individual SELECTs as needed.
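Such a measurement can be roughed out even without SQLAlchemy; a sketch with stdlib sqlite3 and time.perf_counter, timing UPDATE ... RETURNING against UPDATE followed by a separate SELECT (the table and iteration count are arbitrary, and RETURNING needs SQLite 3.35+):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obj (id INTEGER PRIMARY KEY, col INTEGER)")
conn.execute("INSERT INTO obj (id, col) VALUES (1, 0)")

N = 1000

# Variant 1: update and fetch new state in a single statement.
t0 = time.perf_counter()
for _ in range(N):
    returned = conn.execute(
        "UPDATE obj SET col = col + 1 WHERE id = 1 RETURNING col"
    ).fetchone()
t_returning = time.perf_counter() - t0

# Variant 2: update, then re-select the row in a second statement.
t0 = time.perf_counter()
for _ in range(N):
    conn.execute("UPDATE obj SET col = col + 1 WHERE id = 1")
    selected = conn.execute("SELECT col FROM obj WHERE id = 1").fetchone()
t_two_steps = time.perf_counter() - t0

# Both variants see the same data; the timings tell you whether the
# saved round trip actually matters for your workload.
print(returned, selected, t_returning, t_two_steps)
```

On a local in-memory database the difference may be negligible, which is exactly the point: only a measurement against your real database settles it.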
>
> If you truly have some very performance critical section where thousands of
> rows are being inserted/updated, the ORM is going to give you fairly mediocre
> performance with or without RETURNING, as it is designed to do lots of the
> thinking and effort for you in a very generic (read: unoptimized) way and
> hence is an expensive route to go no matter what. You'd be better off
> dropping into the usage of insert()/update()/delete() constructs and not
> attempting to load the new rows back into ORM objects, since that is
> an expensive process designed to make the production of source code easier.
> In this case your source code is being made more difficult to produce anyway
> so the ORM might not be worth it.
>
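A sketch of that Core-level route, assuming a hypothetical items table: a single insert() execution with a list of parameter dictionaries handles all the rows in one executemany pass, and no ORM objects are constructed at all:

```python
from sqlalchemy import (
    Column, Integer, MetaData, String, Table, create_engine, func, select,
)

# Hypothetical table; names are made up for the illustration.
engine = create_engine("sqlite://")
metadata = MetaData()
items = Table(
    "items",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)
metadata.create_all(engine)

with engine.begin() as conn:
    # One executemany round trip for all rows - no identity map,
    # no unit of work, no per-object bookkeeping.
    conn.execute(
        items.insert(),
        [{"name": "item-%d" % i} for i in range(1000)],
    )
    count = conn.execute(select(func.count()).select_from(items)).scalar_one()
print(count)  # -> 1000
```

If you later need a handful of the rows as ORM objects, a targeted query for just those rows is cheaper than materializing every inserted row.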
Thanks, Michael, it's clear now. I hadn't considered that Python's
overhead might be greater than the RDBMS's. After profiling my application
with repoze.profile and apache ab, I see that SQL takes only around 10%
of the request-serving time (it is extremely fast, serving 1000
requests in 1 second!); everything else is Python.
--
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.