Good comments. Yeah, I think as soon as one wants to get very fancy with SQL joins, temporary tables, etc., it's better just to use raw SQL with db.query().
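For example, something along these lines (untested, and the orders/estimates tables and columns are invented just to show the shape of it):

import web

db = web.database(dbn='postgres', db='mydb', user='me')  # made-up connection details
last_sync = '2013-03-29 00:00:00'  # made-up cutoff timestamp

# One round trip: push the latest estimate values straight into the matching orders.
db.query("""
    UPDATE orders o
    SET quantity = e.quantity,
        price = e.price
    FROM estimates e
    WHERE e.order_id = o.id
      AND e.updated_at > $since
""", vars={'since': last_sync})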
-Ben

On Sat, Mar 30, 2013 at 8:42 PM, zbouboutchi <[email protected]> wrote:
> Hi Ben,
>
> Thanks for your answer, I didn't think of the Pythonic freedom, that is indeed a really good reason ;).
>
> Indeed, my approach is pure SQL and it's quite hard to come up with a generic function for every case, since the SQL approach often needs a complete rewrite of the function. Depending on the DBMS you use there are more or fewer reasons to use this method, but there is a real pros-and-cons balance to the UPDATE ... FROM syntax.
>
> Pros: with this method you can optimize your queries very effectively, more than a thousand times faster than a standard update approach for large datasets and complex calculations.
> Cons: to use this method you have to design your database schema with it in mind. It's impossible to imagine a truly standard function, since you can use inner joins, GROUP BY, custom functions or inline SQL to "forge" your source dataset.
>
> I have no clear rule for when it's best to use the "ensemblistic" (I don't know if this word even exists... :D) method. But I think it's total overkill, when you have only 2 or 3 rows to update, to create a temporary table and feed it with data you already know on the Python side. This approach gives nearly magical results with large datasets, given that you can already use the INSERT into the temporary table to validate data domains and constraints before you send these rows to the final table. (This approach also applies to INSERT and DELETE statements.)
>
> The main useful case is when all the data you need is already in the database and you just have to propagate it with some calculated results. Let's look at an interesting example: imagine you work on a program that creates orders and estimates, and you want to let a user update an order from a last-minute change to the estimate. This method lets you update every order that differs from its estimate, reading and writing the information directly in the DBMS, instead of pulling it out to a client and sending it back to the DBMS, and without introducing any parameters into your query. It could even be launched by an ON UPDATE trigger, hiding the SQL part from the client entirely and adding some integrity checks before the transaction is committed.
>
> We can also consider a small trick to work on small datasets without creating temporary tables, because creating a temporary table before updating 2 or 3 rows is overkill, and you almost never need to validate data on the SQL side when you send it from Python. You can work with a "fake dataset" inline:
>
> update mytable t
> set name = s.name, city = s.city
> from (select 1 as id, 'toto' as name, 'London' as city
>       union select 2 as id, 'Roger' as name, 'Gandelnans-lès-moulin' as city) s
> where s.id = t.id;
>
> IMHO, Postgres enables a cleaner syntax, but the concept is the same:
>
> with source as (
>     select 1 as id, 'toto' as name, 'London' as city
>     union all select 2 as id, 'Roger' as name, 'Gandelnans-lès-moulin' as city
> )
> update mytable t
> set name = s.name, city = s.city
> from source s
> where s.id = t.id;
>
> I think it's possible to adapt your function to use this type of fake dataset; if I find some time this weekend I might have a go at writing something. Then it would be easy to benchmark them and see when it's overkill and when it's worthwhile.
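> Roughly, such an adaptation might look like this (a completely untested sketch, Postgres-flavoured; multiple_update_from is just a name I made up, and the quoting leans on web.db.sqlquote):
>
> import web
>
> def multiple_update_from(table, key, rows, database, _test=False):
>     """Update the given table from an inline "fake dataset" built with VALUES.
>     rows is a list of dicts that all share the same keys, one of which is
>     the key column. Untested sketch; Postgres syntax."""
>     if not rows:
>         return
>     columns = sorted(rows[0].keys())
>     # Inline source dataset: (VALUES (...), (...)) AS s (col1, col2, ...)
>     values = ',\n  '.join(
>         '(%s)' % ', '.join(str(web.db.sqlquote(row[col])) for col in columns)
>         for row in rows)
>     assignments = ', '.join('%s = s.%s' % (col, col)
>                             for col in columns if col != key)
>     query = ('UPDATE %s t SET %s\n'
>              'FROM (VALUES\n  %s\n) AS s (%s)\n'
>              'WHERE s.%s = t.%s'
>              % (table, assignments, values, ', '.join(columns), key, key))
>     if _test:
>         return query
>     database.query(query)
>
> e.g. multiple_update_from('mytable', 'id', [{'id': 1, 'name': 'toto', 'city': 'London'}, {'id': 2, 'name': 'Roger', 'city': 'Gandelnans-lès-moulin'}], database=db)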
> The main problem remains the never-uniform syntax around these not-so-well-known SQL habits, and the differences between DBMSes.
>
> Regards,
>
> Christophe.
>
> On 29/03/2013 22:17, Ben Hoyt wrote:
> The main utility of multiple_update is performance. In my tests it's 2-4x faster than separate updates, and this is the main reason I'm using it in our application.
>
> And putting this logic in a function means there's a nice Pythonic API to call, rather than hand-building SQL whenever you want to do this.
>
> In terms of the approach you've given with the temporary table, that's interesting. I haven't used the UPDATE...FROM syntax or temporary tables much at all, so I didn't think of this approach.
>
> It's not quite as simple as the separate updates in one query. However, I'd be happy to use that approach if it's significantly faster, and if it's factored out into the multiple_update() function. The nice thing about having a function is that you can change how it works under the covers, but the API can remain the same.
>
> Are you interested in benchmarking my multiple_update (second version) against your approach and posting results?
>
> -Ben
>
> On Fri, Mar 29, 2013 at 10:00 PM, zbouboutchi <[email protected]> wrote:
>> Hi Ben,
>>
>> I'm confused about the utility of this function. Since it's possible to use an external table in an UPDATE statement, you can certainly do the whole update with a single query. I prepared a little example that reproduces yours in pure SQL.
>>
>> -- let's create a similar dataset:
>> create temporary table test (id integer, age integer, name text);
>> insert into test values (1, 10, 'Rose'), (42, 45, 'Bobby'), (23, 38, 'Sarrah');
>>
>> -- Here are the values I want to update; in this example, a null won't update the test table.
>> create temporary table source (id integer, age integer, name text);
>> insert into source values (42, null, 'Bob'), (23, 30, 'Sarah');
>>
>> -- The "Bazinga" part:
>> update test t
>> set name = coalesce(s.name, t.name), -- a small workaround to keep the original value if the source is null
>>     age = coalesce(s.age, t.age)     -- feel free to use any function you want; it works exactly the same way as usual
>> from source s
>> where s.id = t.id;
>>
>> -- Here it is:
>> select * from test order by id;
>>
>> 1;10;Rose
>> 23;30;Sarah
>> 42;45;Bob
>>
>> -- Bazinga --
>>
>> This method is the fastest I know of. Even on the largest datasets, where you might waste a lot of time and energy executing millions and millions of queries, here you just update your whole dataset with one query and take advantage of all the database optimisations (indexes etc.). With a modern DBMS you can even check the execution plan and get useful tips on needed optimisations before you run the nightmare.
>>
>> The only limitation I know of is that you can't update more than one table at a time, but transactions let you defer constraint checks to the end of the transaction, so one update per table seems legit.
>>
>> I think not every DBMS can handle this type of update, but it's part of the SQL standards. At least the famous ones (Postgres, MSSQL, Oracle, MySQL, etc.) have this functionality, with more or less syntax variation.
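>> From Python with web.py, the same dance could be driven in one transaction, roughly like this (an untested sketch; the connection details are made up and it assumes the test table above):
>>
>> import web
>>
>> db = web.database(dbn='postgres', db='mydb', user='me')  # made-up connection details
>>
>> rows = [{'id': 42, 'age': None, 'name': 'Bob'},
>>         {'id': 23, 'age': 30, 'name': 'Sarah'}]
>>
>> with db.transaction():
>>     db.query("create temporary table source (id integer, age integer, name text)")
>>     # one multi-row INSERT; seqname=False because the temp table has no sequence
>>     db.multiple_insert('source', rows, seqname=False)
>>     db.query("""
>>         update test t
>>         set name = coalesce(s.name, t.name),
>>             age = coalesce(s.age, t.age)
>>         from source s
>>         where s.id = t.id
>>     """)
>>     db.query("drop table source")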
>> I hope this will help.
>>
>> Regards,
>>
>> Christophe NINUCCI
>>
>> On 27/03/2013 21:41, Ben Hoyt wrote:
>>
>>> Hi folks,
>>>
>>> In our own utility file, I've added a function called multiple_update() that executes multiple separate update statements in one query. Note that this is not the same as a single db.update() that updates multiple rows: with this function, the columns and vars for each row can be different.
>>>
>>> In my testing (Windows 64-bit, db is PostgreSQL running on another machine) I've found this to be about 3-4x as fast as individual db.update()s wrapped in a transaction, so a worthwhile speed increase. The speed-up factor seems to be about constant regardless of the number of rows (initially I was hoping it would go up with N, but it didn't in my tests).
>>>
>>> Any thoughts on the API signature, and whether this would be a good addition to web.DB? Any comments on this approach in general? Is the way I'm doing the cast to unicode and then '\n'.join(updates) the correct approach?
>>>
>>> A couple of notes:
>>>
>>> * If this were part of web.py, obviously it'd be a method on DB instead of a stand-alone function.
>>> * dbconn.db_connection is our "current database object". Again, this would go away if it were a DB method.
>>>
>>> Code below:
>>>
>>> -----
>>> def multiple_update(table, where, values_vars, database=None, _test=False):
>>>     """Execute multiple separate update statements in one query.
>>>
>>>     >>> database = web.DB(None, {})
>>>     >>> row1 = ({'name': 'Bob'}, {'id': 42})
>>>     >>> row2 = ({'name': 'Sarah', 'age': 30}, {'id': 23})
>>>     >>> print multiple_update('foo', 'id = $id', [row1, row2], _test=True, database=database)
>>>     UPDATE foo SET name = 'Bob' WHERE id = 42;
>>>     UPDATE foo SET age = 30, name = 'Sarah' WHERE id = 23;
>>>     >>> print multiple_update('foo', 'id = $id', [], _test=True, database=database)
>>>     None
>>>     """
>>>     if not values_vars:
>>>         return
>>>     if database is None:
>>>         database = dbconn.db_connection
>>>
>>>     # Build one UPDATE statement per (values, vars) pair.
>>>     updates = []
>>>     for values, vars in values_vars:
>>>         update = u'UPDATE {table} SET {values} WHERE {where};'.format(
>>>             table=table,
>>>             values=web.db.sqlwhere(values, ', '),
>>>             where=database._where(where, vars))
>>>         updates.append(unicode(update))
>>>
>>>     # Send all the statements to the server in a single query.
>>>     query = '\n'.join(updates)
>>>     if _test:
>>>         return query
>>>     database.query(query)
>>> -----
>>>
>>> Thanks,
>>> Ben.
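>>> PS: for benchmarking the two approaches, a rough harness along these lines might do (untested; it assumes a foo table with id, name and age columns, as in the doctest, and a db object from web.database):
>>>
>>> import time
>>>
>>> def bench(db, ids):
>>>     # One (values, vars) pair per row, as multiple_update expects.
>>>     values_vars = [({'name': 'name%d' % i, 'age': i % 90}, {'id': i}) for i in ids]
>>>
>>>     start = time.time()
>>>     with db.transaction():
>>>         for values, vars in values_vars:
>>>             db.update('foo', where='id = $id', vars=vars, **values)
>>>     print 'separate db.update()s: %.3fs' % (time.time() - start)
>>>
>>>     start = time.time()
>>>     multiple_update('foo', 'id = $id', values_vars, database=db)
>>>     print 'multiple_update():     %.3fs' % (time.time() - start)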
