Hi Ben,
thanks for your answer. I didn't think of the Pythonic freedom; that is
indeed a really good reason ;).
Indeed, my approach is pure SQL, and it's quite hard to come up with a
generic function for every case, since the SQL approach often needs a
complete rewrite of the query.
Depending on the DBMS you use, there are more or fewer reasons to use
this method; it really comes down to a balance of pros and cons for the
UPDATE ... FROM syntax.
The pro is that this method lets you optimize your queries very
efficiently: it can be more than a thousand times faster than a
row-by-row update approach for large datasets and complex calculations.
The con is that you have to design your database schema with this
method in mind. A truly generic function is hard to imagine, since you
can use inner joins, GROUP BY, custom functions, and inline SQL to
"forge" your source dataset.
I have no idea when the "ensemblistic" (set-based -- I don't know if
that word even exists... :D) method is best to use. But when you only
have 2 or 3 rows to update, it's total overkill to create a temporary
table and feed it with data you already know on the Python side.
This approach gives nearly magical results with large datasets, given
that you can already use the INSERT into the temporary table to validate
some data domains and constraints before you send those rows to the
final table. (The same approach also applies to INSERT and DELETE
statements.)
The main useful case is when every piece of data you need is already in
the database and you just have to propagate it with some calculation
results.
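To make the temporary-table validation idea concrete, here is a minimal sketch in Python. It uses SQLite (3.33+ for UPDATE ... FROM) as a stand-in for a real DBMS, and the table and column names (`accounts`, `staging`, `balance`) are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);
    INSERT INTO accounts VALUES (1, 100), (2, 200);
    -- The staging table carries a CHECK constraint, so invalid rows are
    -- rejected at INSERT time, before they ever reach the final table.
    CREATE TEMPORARY TABLE staging (
        id INTEGER PRIMARY KEY,
        balance INTEGER CHECK (balance >= 0)
    );
""")
conn.executemany("INSERT INTO staging VALUES (?, ?)", [(1, 150), (2, 50)])

# A row violating the domain is caught here, never touching `accounts`:
try:
    conn.execute("INSERT INTO staging VALUES (3, -10)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# One set-based statement then propagates every staged row at once.
conn.execute("""
    UPDATE accounts SET balance = s.balance
    FROM staging s WHERE s.id = accounts.id
""")
print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
# [(1, 150), (2, 50)]
```

The same shape carries over to Postgres, where the staging table can also carry foreign keys or domain types for richer validation.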
Let's look at an interesting example:
Imagine you work on a program that creates orders and estimates, and you
want to let a user update an order based on a last-minute change to the
estimate. This method enables you to update every order that differs
from its estimate, reading and writing the information directly in the
DBMS instead of pulling it down to a client and sending it back, and
without introducing any parameters into your query. It could even be
launched by an ON UPDATE trigger, hiding the SQL part from the client
entirely and adding some integrity checks before validating the
transaction.
We can also consider a small trick for working on small datasets without
creating temporary tables, since it's overkill to create a temporary
table before updating 2 or 3 rows, and you almost never need to validate
data on the SQL side when you send it from Python. You can work with an
inline "fake dataset":
update t
   set name = s.name, city = s.city
  from (select 1 as id, 'toto' as name, 'London' as city
        union
        select 2 as id, 'Roger' as name, 'Gandelnans-lès-moulin' as city) s
 where s.id = t.id;
Imho, Postgres enables a cleaner syntax with a CTE, but the concept is
the same:
with source as (
    select 1 as id, 'toto' as name, 'London' as city
    union all
    select 2 as id, 'Roger' as name, 'Gandelnans-lès-moulin' as city
)
update t
   set name = s.name, city = s.city
  from source s
 where s.id = t.id;
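The inline fake-dataset trick can be exercised from Python too. This sketch uses SQLite (3.33+ for UPDATE ... FROM) rather than Postgres, and the `people` table is invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO people VALUES (1, '', ''), (2, '', '');
""")
# The source rows are "forged" inline with UNION ALL -- no temporary table.
conn.execute("""
    UPDATE people SET name = s.name, city = s.city
    FROM (SELECT 1 AS id, 'toto' AS name, 'London' AS city
          UNION ALL
          SELECT 2, 'Roger', 'Gandelnans-lès-moulin') s
    WHERE s.id = people.id
""")
print(conn.execute("SELECT * FROM people ORDER BY id").fetchall())
# [(1, 'toto', 'London'), (2, 'Roger', 'Gandelnans-lès-moulin')]
```

In real code the inline SELECTs would of course be built with proper parameter binding rather than string literals.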
I think it's possible to adapt your function to use this type of fake
dataset; if I find some time this weekend I might take a crack at
writing something.
Then it would be easy to benchmark both and see when each one is
overkill and when it's a win. The main problem remains the never-uniform
syntax around these little-known SQL habits, and the differences between
DBMSes.
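A minimal benchmark along those lines could look like the following sketch. It uses SQLite from Python as a stand-in (an in-memory database, so absolute numbers will not resemble a networked Postgres, where per-statement round-trips dominate); the table and column names are made up:

```python
import sqlite3
import time

N = 10_000
rows = [(i, "name-%d" % i) for i in range(N)]

def fresh_table():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, '')", [(i,) for i in range(N)])
    return conn

# Approach 1: one UPDATE per row, wrapped in a single transaction.
conn1 = fresh_table()
start = time.perf_counter()
with conn1:
    for id_, name in rows:
        conn1.execute("UPDATE t SET name = ? WHERE id = ?", (name, id_))
per_row_secs = time.perf_counter() - start

# Approach 2: stage the rows, then one set-based UPDATE ... FROM.
conn2 = fresh_table()
start = time.perf_counter()
with conn2:
    conn2.execute("CREATE TEMPORARY TABLE src (id INTEGER PRIMARY KEY, name TEXT)")
    conn2.executemany("INSERT INTO src VALUES (?, ?)", rows)
    conn2.execute("UPDATE t SET name = src.name FROM src WHERE src.id = t.id")
set_based_secs = time.perf_counter() - start

print("per-row: %.4fs, set-based: %.4fs" % (per_row_secs, set_based_secs))
```

Both approaches end with identical tables, so the comparison only measures how the work is batched.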
Regards,
Christophe.
On 29/03/2013 22:17, Ben Hoyt wrote:
The main utility of multiple_update is performance. In my tests it's
2-4x faster than separate updates, and this is the main reason I'm
using it in our application.
And putting this logic in a function means there's a nice Pythonic API
to call, rather than hand-building SQL whenever you want to do this.
In terms of the approach you've given with the temporary table, that's
interesting. I haven't used the UPDATE...FROM syntax or temporary
tables much at all, so I didn't think of this approach.
It's not quite as simple as the separate updates in one query.
However, I'd be happy to use that approach if it's significantly
faster, and if it's factored out into the multiple_update() function.
The nice thing about having a function is that you can change how it
works under the covers, but the API can remain the same.
Are you interested in benchmarking my multiple_update (second version)
against your approach and posting results?
-Ben
On Fri, Mar 29, 2013 at 10:00 PM, zbouboutchi
<[email protected]> wrote:
Hi Ben,
I'm confused about the utility of this function... Since it's
possible to use an external table with UPDATE statements, you can
certainly do the whole update in a single request.
I prepared a little example that reproduces yours in pure SQL.
-- let's create a similar dataset:
create temporary table test (id integer, age integer, name text);
insert into test values (1, 10, 'Rose'), (42, 45, 'Bobby'), (23,
38, 'Sarrah');
-- Here are the values I want to update; in this example, null
won't overwrite the test table.
create temporary table source (id integer, age integer, name text);
insert into source values (42, null, 'Bob'), (23, 30, 'Sarah');
-- The "Bazinga" part:
update test t
   set name = coalesce(s.name, t.name), -- a small workaround to keep
                                        -- the original value if the
                                        -- source is null
       age = coalesce(s.age, t.age)     -- feel free to use any function
                                        -- you want; it works exactly
                                        -- the same way as usual.
  from source s
 where s.id = t.id;
-- Here it is:
select * from test order by id;
1;10;Rose
23;30;Sarah
42;45;Bob
-- Bazinga --
This method is the fastest I know, even on the largest datasets,
where you might otherwise waste a lot of time and energy executing
millions and millions of requests; here you update your whole
dataset with one request and take advantage of all the database
optimisations (indexes etc.). With a modern DBMS, you can even check
the execution plan and get useful tips on needed optimisations
before you launch the nightmare.
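As a quick sanity check, the same example can be reproduced from Python using SQLite (3.33+ supports UPDATE ... FROM) as a stand-in for Postgres. Note how coalesce() keeps the age of id 42 at 45, since the source age is null:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (id INTEGER, age INTEGER, name TEXT);
    INSERT INTO test VALUES (1, 10, 'Rose'), (42, 45, 'Bobby'),
                            (23, 38, 'Sarrah');
    CREATE TEMPORARY TABLE source (id INTEGER, age INTEGER, name TEXT);
    INSERT INTO source VALUES (42, NULL, 'Bob'), (23, 30, 'Sarah');
    -- coalesce() falls back to the current value wherever the source is NULL.
    UPDATE test SET name = coalesce(s.name, test.name),
                    age  = coalesce(s.age, test.age)
    FROM source s WHERE s.id = test.id;
""")
print(conn.execute("SELECT * FROM test ORDER BY id").fetchall())
# [(1, 10, 'Rose'), (23, 30, 'Sarah'), (42, 45, 'Bob')]
```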
The only limitation I know of is that you can't update more than one
table at a time, but transactions let you delay constraint checks
until they end, so one update per table seems legitimate.
Not every DBMS can handle this type of update -- UPDATE ... FROM is
a vendor extension rather than standard SQL -- but at least the
famous ones (Postgres, MSSQL, MySQL via multi-table UPDATE, Oracle
via MERGE) offer equivalent functionality, with more or less syntax
variation.
I hope this will help.
Regards,
Christophe NINUCCI
On 27/03/2013 21:41, Ben Hoyt wrote:
Hi folks,
In our own utility file, I've added a function called
multiple_update() that executes multiple separate update
statements in one query. Note that this is not the same as a
single db.update() that updates multiple rows -- with this
function, the columns and vars for each row can be different.
In my testing (Windows 64-bit, db is PostgreSQL running on
another machine) I've found this to be about 3-4x as fast as
individual db.update()s wrapped in a transaction, so a
worthwhile speed increase. The speed-increase factor seems to be
roughly constant regardless of the number of rows (initially I was
hoping it would go up with N, but it didn't in my tests).
Any thoughts on the API signature, and whether this would be a
good addition to web.DB? Any comments on this approach in
general? Is the way I'm doing the cast to unicode and then
'\n'.join(updates) the correct approach?
A couple of notes:
* If this were part of web.py, obviously it'd be a method on
DB instead of a stand-alone function.
* dbconn.db_connection is our "current database object".
Again, this would go away if it were a DB method.
Code below:
-----
def multiple_update(table, where, values_vars, database=None, _test=False):
    """Execute multiple separate update statements in one query.

    >>> database = web.DB(None, {})
    >>> row1 = ({'name': 'Bob'}, {'id': 42})
    >>> row2 = ({'name': 'Sarah', 'age': 30}, {'id': 23})
    >>> print multiple_update('foo', 'id = $id', [row1, row2],
    ...                       _test=True, database=database)
    UPDATE foo SET name = 'Bob' WHERE id = 42;
    UPDATE foo SET age = 30, name = 'Sarah' WHERE id = 23;
    >>> print multiple_update('foo', 'id = $id', [],
    ...                       _test=True, database=database)
    None
    """
    if not values_vars:
        return
    if database is None:
        database = dbconn.db_connection
    updates = []
    for values, vars in values_vars:
        update = u'UPDATE {table} SET {values} WHERE {where};'.format(
            table=table,
            values=web.db.sqlwhere(values, ', '),
            where=database._where(where, vars))
        updates.append(unicode(update))
    query = '\n'.join(updates)
    if _test:
        return query
    database.query(query)
-----
Thanks,
Ben.
--
You received this message because you are subscribed to the
Google Groups "web.py" group.
To unsubscribe from this group and stop receiving emails from
it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/webpy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.