Good comments. Yeah, I think as soon as one wants to get very fancy with SQL joins, temporary tables, etc., it's better just to use raw SQL with db.query().
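For example, something along these lines (untested, and the orders/estimates tables and columns are invented just to show the shape of it):

import web

db = web.database(dbn='postgres', db='mydb', user='me')  # made-up connection details
last_sync = '2013-03-29 00:00:00'  # made-up cutoff timestamp

# One round trip: push the latest estimate values straight into the matching orders.
db.query("""
    UPDATE orders o
    SET quantity = e.quantity,
        price = e.price
    FROM estimates e
    WHERE e.order_id = o.id
      AND e.updated_at > $since
""", vars={'since': last_sync})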
-Ben

On Sat, Mar 30, 2013 at 8:42 PM, zbouboutchi <[email protected]> wrote:
> Hi Ben,
>
> Thanks for your answer, I didn't think of the Pythonic freedom, that is indeed a really good reason ;).
>
> Indeed, my approach is pure SQL and it's quite hard to come up with a generic function for every case, since the SQL approach often needs a complete rewrite of the function. Depending on the DBMS you use there are more or fewer reasons to use this method, but there is a real pros-and-cons balance to the UPDATE ... FROM syntax.
>
> Pros: with this method you can optimize your queries very effectively, more than a thousand times faster than a standard update approach for large datasets and complex calculations.
> Cons: to use this method you have to design your database schema with it in mind. It's impossible to imagine a truly standard function, since you can use inner joins, GROUP BY, custom functions or inline SQL to "forge" your source dataset.
>
> I have no clear rule for when it's best to use the "ensemblistic" (I don't know if this word even exists... :D) method. But I think it's total overkill, when you have only 2 or 3 rows to update, to create a temporary table and feed it with data you already know on the Python side. This approach gives nearly magical results with large datasets, given that you can already use the INSERT into the temporary table to validate data domains and constraints before you send these rows to the final table. (This approach also applies to INSERT and DELETE statements.)
>
> The main useful case is when all the data you need is already in the database and you just have to propagate it with some calculated results. Let's look at an interesting example: imagine you work on a program that creates orders and estimates, and you want to let a user update an order from a last-minute change to the estimate. This method lets you update every order that differs from its estimate, reading and writing the information directly in the DBMS, instead of pulling it out to a client and sending it back to the DBMS, and without introducing any parameters into your query. It could even be launched by an ON UPDATE trigger, hiding the SQL part from the client entirely and adding some integrity checks before the transaction is committed.
>
> We can also consider a small trick to work on small datasets without creating temporary tables, because creating a temporary table before updating 2 or 3 rows is overkill, and you almost never need to validate data on the SQL side when you send it from Python. You can work with a "fake dataset" inline:
>
> update mytable t
> set name = s.name, city = s.city
> from (select 1 as id, 'toto' as name, 'London' as city
>       union select 2 as id, 'Roger' as name, 'Gandelnans-lès-moulin' as city) s
> where s.id = t.id;
>
> IMHO, Postgres enables a cleaner syntax, but the concept is the same:
>
> with source as (
>     select 1 as id, 'toto' as name, 'London' as city
>     union all select 2 as id, 'Roger' as name, 'Gandelnans-lès-moulin' as city
> )
> update mytable t
> set name = s.name, city = s.city
> from source s
> where s.id = t.id;
>
> I think it's possible to adapt your function to use this type of fake dataset; if I find some time this weekend I might have a go at writing something. Then it would be easy to benchmark them and see when it's overkill and when it's worthwhile.
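> Roughly, such an adaptation might look like this (a completely untested sketch, Postgres-flavoured; multiple_update_from is just a name I made up, and the quoting leans on web.db.sqlquote):
>
> import web
>
> def multiple_update_from(table, key, rows, database, _test=False):
>     """Update the given table from an inline "fake dataset" built with VALUES.
>     rows is a list of dicts that all share the same keys, one of which is
>     the key column. Untested sketch; Postgres syntax."""
>     if not rows:
>         return
>     columns = sorted(rows[0].keys())
>     # Inline source dataset: (VALUES (...), (...)) AS s (col1, col2, ...)
>     values = ',\n  '.join(
>         '(%s)' % ', '.join(str(web.db.sqlquote(row[col])) for col in columns)
>         for row in rows)
>     assignments = ', '.join('%s = s.%s' % (col, col)
>                             for col in columns if col != key)
>     query = ('UPDATE %s t SET %s\n'
>              'FROM (VALUES\n  %s\n) AS s (%s)\n'
>              'WHERE s.%s = t.%s'
>              % (table, assignments, values, ', '.join(columns), key, key))
>     if _test:
>         return query
>     database.query(query)
>
> e.g. multiple_update_from('mytable', 'id', [{'id': 1, 'name': 'toto', 'city': 'London'}, {'id': 2, 'name': 'Roger', 'city': 'Gandelnans-lès-moulin'}], database=db)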
> The main problem remains the never-uniform syntax around these not-so-well-known SQL habits, and the differences between DBMSes.
>
> Regards,
>
> Christophe.
>
> On 29/03/2013 22:17, Ben Hoyt wrote:
> The main utility of multiple_update is performance. In my tests it's 2-4x faster than separate updates, and this is the main reason I'm using it in our application.
>
> And putting this logic in a function means there's a nice Pythonic API to call, rather than hand-building SQL whenever you want to do this.
>
> In terms of the approach you've given with the temporary table, that's interesting. I haven't used the UPDATE...FROM syntax or temporary tables much at all, so I didn't think of this approach.
>
> It's not quite as simple as the separate updates in one query. However, I'd be happy to use that approach if it's significantly faster, and if it's factored out into the multiple_update() function. The nice thing about having a function is that you can change how it works under the covers, but the API can remain the same.
>
> Are you interested in benchmarking my multiple_update (second version) against your approach and posting results?
>
> -Ben
>
> On Fri, Mar 29, 2013 at 10:00 PM, zbouboutchi <[email protected]> wrote:
>> Hi Ben,
>>
>> I'm confused about the utility of this function. Since it's possible to use an external table in an UPDATE statement, you can certainly do the whole update with a single query. I prepared a little example that reproduces yours in pure SQL.
>>
>> -- let's create a similar dataset:
>> create temporary table test (id integer, age integer, name text);
>> insert into test values (1, 10, 'Rose'), (42, 45, 'Bobby'), (23, 38, 'Sarrah');
>>
>> -- Here are the values I want to update; in this example, a null won't update the test table.
>> create temporary table source (id integer, age integer, name text);
>> insert into source values (42, null, 'Bob'), (23, 30, 'Sarah');
>>
>> -- The "Bazinga" part:
>> update test t
>> set name = coalesce(s.name, t.name), -- a small workaround to keep the original value if the source is null
>>     age = coalesce(s.age, t.age)     -- feel free to use any function you want; it works exactly the same way as usual
>> from source s
>> where s.id = t.id;
>>
>> -- Here it is:
>> select * from test order by id;
>>
>> 1;10;Rose
>> 23;30;Sarah
>> 42;45;Bob
>>
>> -- Bazinga --
>>
>> This method is the fastest I know of. Even on the largest datasets, where you might waste a lot of time and energy executing millions and millions of queries, here you just update your whole dataset with one query and take advantage of all the database optimisations (indexes etc.). With a modern DBMS you can even check the execution plan and get useful tips on needed optimisations before you run the nightmare.
>>
>> The only limitation I know of is that you can't update more than one table at a time, but transactions let you defer constraint checks to the end of the transaction, so one update per table seems legit.
>>
>> I think not every DBMS can handle this type of update, but it's part of the SQL standards. At least the famous ones (Postgres, MSSQL, Oracle, MySQL, etc.) have this functionality, with more or less syntax variation.
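>> From Python with web.py, the same dance could be driven in one transaction, roughly like this (an untested sketch; the connection details are made up and it assumes the test table above):
>>
>> import web
>>
>> db = web.database(dbn='postgres', db='mydb', user='me')  # made-up connection details
>>
>> rows = [{'id': 42, 'age': None, 'name': 'Bob'},
>>         {'id': 23, 'age': 30, 'name': 'Sarah'}]
>>
>> with db.transaction():
>>     db.query("create temporary table source (id integer, age integer, name text)")
>>     # one multi-row INSERT; seqname=False because the temp table has no sequence
>>     db.multiple_insert('source', rows, seqname=False)
>>     db.query("""
>>         update test t
>>         set name = coalesce(s.name, t.name),
>>             age = coalesce(s.age, t.age)
>>         from source s
>>         where s.id = t.id
>>     """)
>>     db.query("drop table source")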
>> I hope this will help.
>>
>> Regards,
>>
>> Christophe NINUCCI
>>
>> On 27/03/2013 21:41, Ben Hoyt wrote:
>>
>>> Hi folks,
>>>
>>> In our own utility file, I've added a function called multiple_update() that executes multiple separate update statements in one query. Note that this is not the same as a single db.update() that updates multiple rows: with this function, the columns and vars for each row can be different.
>>>
>>> In my testing (Windows 64-bit, db is PostgreSQL running on another machine) I've found this to be about 3-4x as fast as individual db.update()s wrapped in a transaction, so a worthwhile speed increase. The speed-up factor seems to be about constant regardless of the number of rows (initially I was hoping it would go up with N, but it didn't in my tests).
>>>
>>> Any thoughts on the API signature, and whether this would be a good addition to web.DB? Any comments on this approach in general? Is the way I'm doing the cast to unicode and then '\n'.join(updates) the correct approach?
>>>
>>> A couple of notes:
>>>
>>> * If this were part of web.py, obviously it'd be a method on DB instead of a stand-alone function.
>>> * dbconn.db_connection is our "current database object". Again, this would go away if it were a DB method.
>>>
>>> Code below:
>>>
>>> -----
>>> def multiple_update(table, where, values_vars, database=None, _test=False):
>>>     """Execute multiple separate update statements in one query.
>>>
>>>     >>> database = web.DB(None, {})
>>>     >>> row1 = ({'name': 'Bob'}, {'id': 42})
>>>     >>> row2 = ({'name': 'Sarah', 'age': 30}, {'id': 23})
>>>     >>> print multiple_update('foo', 'id = $id', [row1, row2], _test=True, database=database)
>>>     UPDATE foo SET name = 'Bob' WHERE id = 42;
>>>     UPDATE foo SET age = 30, name = 'Sarah' WHERE id = 23;
>>>     >>> print multiple_update('foo', 'id = $id', [], _test=True, database=database)
>>>     None
>>>     """
>>>     if not values_vars:
>>>         return
>>>     if database is None:
>>>         database = dbconn.db_connection
>>>
>>>     # Build one UPDATE statement per (values, vars) pair.
>>>     updates = []
>>>     for values, vars in values_vars:
>>>         update = u'UPDATE {table} SET {values} WHERE {where};'.format(
>>>             table=table,
>>>             values=web.db.sqlwhere(values, ', '),
>>>             where=database._where(where, vars))
>>>         updates.append(unicode(update))
>>>
>>>     # Send all the statements to the server in a single query.
>>>     query = '\n'.join(updates)
>>>     if _test:
>>>         return query
>>>     database.query(query)
>>> -----
>>>
>>> Thanks,
>>> Ben.
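>>> PS: for benchmarking the two approaches, a rough harness along these lines might do (untested; it assumes a foo table with id, name and age columns, as in the doctest, and a db object from web.database):
>>>
>>> import time
>>>
>>> def bench(db, ids):
>>>     # One (values, vars) pair per row, as multiple_update expects.
>>>     values_vars = [({'name': 'name%d' % i, 'age': i % 90}, {'id': i}) for i in ids]
>>>
>>>     start = time.time()
>>>     with db.transaction():
>>>         for values, vars in values_vars:
>>>             db.update('foo', where='id = $id', vars=vars, **values)
>>>     print 'separate db.update()s: %.3fs' % (time.time() - start)
>>>
>>>     start = time.time()
>>>     multiple_update('foo', 'id = $id', values_vars, database=db)
>>>     print 'multiple_update():     %.3fs' % (time.time() - start)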
