Hi Ben,
thanks for your answer. I didn't think of the Pythonic freedom; that is
indeed a really good reason ;).
Indeed, my approach is pure SQL, and it's quite hard to come up with a
generic function for every case, since the SQL approach often needs a
complete rewrite of the query.
Depending on the DBMS you use, there are more or fewer reasons to use
this method; it really comes down to a balance of pros and cons for the
UPDATE ... FROM syntax.
The pro is that this method lets you optimize your queries very
efficiently: it can be more than a thousand times faster than a
row-by-row update approach for large datasets and complex calculations.
The con is that you have to design your database schema with this
method in mind. A truly generic function is hard to imagine, since you
can use inner joins, GROUP BY, custom functions, and inline SQL to
"forge" your source dataset.
I have no idea when the "ensemblistic" (set-based -- I don't know if
that word even exists... :D) method is best to use. But when you only
have 2 or 3 rows to update, it's total overkill to create a temporary
table and feed it with data you already know on the Python side.
This approach gives nearly magical results with large datasets, given
that you can already use the INSERT into the temporary table to validate
some data domains and constraints before you send those rows to the
final table. (The same approach also applies to INSERT and DELETE
statements.)
The main useful case is when every piece of data you need is already in
the database and you just have to propagate it with some calculation
results.
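To make the temporary-table validation idea concrete, here is a minimal sketch in Python. It uses SQLite (3.33+ for UPDATE ... FROM) as a stand-in for a real DBMS, and the table and column names (`accounts`, `staging`, `balance`) are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);
    INSERT INTO accounts VALUES (1, 100), (2, 200);
    -- The staging table carries a CHECK constraint, so invalid rows are
    -- rejected at INSERT time, before they ever reach the final table.
    CREATE TEMPORARY TABLE staging (
        id INTEGER PRIMARY KEY,
        balance INTEGER CHECK (balance >= 0)
    );
""")
conn.executemany("INSERT INTO staging VALUES (?, ?)", [(1, 150), (2, 50)])

# A row violating the domain is caught here, never touching `accounts`:
try:
    conn.execute("INSERT INTO staging VALUES (3, -10)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# One set-based statement then propagates every staged row at once.
conn.execute("""
    UPDATE accounts SET balance = s.balance
    FROM staging s WHERE s.id = accounts.id
""")
print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
# [(1, 150), (2, 50)]
```

The same shape carries over to Postgres, where the staging table can also carry foreign keys or domain types for richer validation.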
Let's look at an interesting example:
Imagine you work on a program that creates orders and estimates, and you
want to let a user update an order based on a last-minute change to the
estimate. This method enables you to update every order that differs
from its estimate, reading and writing the information directly in the
DBMS instead of pulling it down to a client and sending it back, and
without introducing any parameters into your query. It could even be
launched by an ON UPDATE trigger, hiding the SQL part from the client
entirely and adding some integrity checks before validating the
transaction.
We can also consider a small trick for working on small datasets without
creating temporary tables, since it's overkill to create a temporary
table before updating 2 or 3 rows, and you almost never need to validate
data on the SQL side when you send it from Python. You can work with an
inline "fake dataset":
update t
   set name = s.name, city = s.city
  from (select 1 as id, 'toto' as name, 'London' as city
        union
        select 2 as id, 'Roger' as name, 'Gandelnans-lès-moulin' as city) s
 where s.id = t.id;
Imho, Postgres enables a cleaner syntax with a CTE, but the concept is
the same:
with source as (
    select 1 as id, 'toto' as name, 'London' as city
    union all
    select 2 as id, 'Roger' as name, 'Gandelnans-lès-moulin' as city
)
update t
   set name = s.name, city = s.city
  from source s
 where s.id = t.id;
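The inline fake-dataset trick can be exercised from Python too. This sketch uses SQLite (3.33+ for UPDATE ... FROM) rather than Postgres, and the `people` table is invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO people VALUES (1, '', ''), (2, '', '');
""")
# The source rows are "forged" inline with UNION ALL -- no temporary table.
conn.execute("""
    UPDATE people SET name = s.name, city = s.city
    FROM (SELECT 1 AS id, 'toto' AS name, 'London' AS city
          UNION ALL
          SELECT 2, 'Roger', 'Gandelnans-lès-moulin') s
    WHERE s.id = people.id
""")
print(conn.execute("SELECT * FROM people ORDER BY id").fetchall())
# [(1, 'toto', 'London'), (2, 'Roger', 'Gandelnans-lès-moulin')]
```

In real code the inline SELECTs would of course be built with proper parameter binding rather than string literals.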
I think it's possible to adapt your function to use this type of fake
dataset; if I find some time this weekend I might take a crack at
writing something.
Then it would be easy to benchmark both and see when each one is
overkill and when it's a win. The main problem remains the never-uniform
syntax around these little-known SQL habits, and the differences between
DBMSes.
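A minimal benchmark along those lines could look like the following sketch. It uses SQLite from Python as a stand-in (an in-memory database, so absolute numbers will not resemble a networked Postgres, where per-statement round-trips dominate); the table and column names are made up:

```python
import sqlite3
import time

N = 10_000
rows = [(i, "name-%d" % i) for i in range(N)]

def fresh_table():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, '')", [(i,) for i in range(N)])
    return conn

# Approach 1: one UPDATE per row, wrapped in a single transaction.
conn1 = fresh_table()
start = time.perf_counter()
with conn1:
    for id_, name in rows:
        conn1.execute("UPDATE t SET name = ? WHERE id = ?", (name, id_))
per_row_secs = time.perf_counter() - start

# Approach 2: stage the rows, then one set-based UPDATE ... FROM.
conn2 = fresh_table()
start = time.perf_counter()
with conn2:
    conn2.execute("CREATE TEMPORARY TABLE src (id INTEGER PRIMARY KEY, name TEXT)")
    conn2.executemany("INSERT INTO src VALUES (?, ?)", rows)
    conn2.execute("UPDATE t SET name = src.name FROM src WHERE src.id = t.id")
set_based_secs = time.perf_counter() - start

print("per-row: %.4fs, set-based: %.4fs" % (per_row_secs, set_based_secs))
```

Both approaches end with identical tables, so the comparison only measures how the work is batched.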
Regards,
Christophe.
On 29/03/2013 22:17, Ben Hoyt wrote:
The main utility of multiple_update is performance. In my tests it's
2-4x faster than separate updates, and this is the main reason I'm
using it in our application.
And putting this logic in a function means there's a nice Pythonic API
to call, rather than hand-building SQL whenever you want to do this.
In terms of the approach you've given with the temporary table, that's
interesting. I haven't used the UPDATE...FROM syntax or temporary
tables much at all, so I didn't think of this approach.
It's not quite as simple as the separate updates in one query.
However, I'd be happy to use that approach if it's significantly
faster, and if it's factored out into the multiple_update() function.
The nice thing about having a function is that you can change how it
works under the covers, but the API can remain the same.
Are you interested in benchmarking my multiple_update (second version)
against your approach and posting results?
-Ben
On Fri, Mar 29, 2013 at 10:00 PM, zbouboutchi
<[email protected]> wrote:
Hi Ben,
I'm confused about the utility of this function... Since it's
possible to use an external table with UPDATE statements, you can
certainly do the whole update in a single request.
I prepared a little example that reproduces yours in pure SQL.
-- let's create a similar dataset:
create temporary table test (id integer, age integer, name text);
insert into test values (1, 10, 'Rose'), (42, 45, 'Bobby'), (23,
38, 'Sarrah');
-- Here are the values I want to update; in this example, null
won't overwrite the test table.
create temporary table source (id integer, age integer, name text);
insert into source values (42, null, 'Bob'), (23, 30, 'Sarah');
-- The "Bazinga" part:
update test t
   set name = coalesce(s.name, t.name), -- a small workaround to keep
                                        -- the original value if the
                                        -- source is null
       age = coalesce(s.age, t.age)     -- feel free to use any function
                                        -- you want; it works exactly
                                        -- the same way as usual.
  from source s
 where s.id = t.id;
-- Here it is:
select * from test order by id;
1;10;Rose
23;30;Sarah
42;45;Bob
-- Bazinga --
This method is the fastest I know, even on the largest datasets,
where you might otherwise waste a lot of time and energy executing
millions and millions of requests; here you update your whole
dataset with one request and take advantage of all the database
optimisations (indexes etc.). With a modern DBMS, you can even check
the execution plan and get useful tips on needed optimisations
before you launch the nightmare.
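As a quick sanity check, the same example can be reproduced from Python using SQLite (3.33+ supports UPDATE ... FROM) as a stand-in for Postgres. Note how coalesce() keeps the age of id 42 at 45, since the source age is null:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (id INTEGER, age INTEGER, name TEXT);
    INSERT INTO test VALUES (1, 10, 'Rose'), (42, 45, 'Bobby'),
                            (23, 38, 'Sarrah');
    CREATE TEMPORARY TABLE source (id INTEGER, age INTEGER, name TEXT);
    INSERT INTO source VALUES (42, NULL, 'Bob'), (23, 30, 'Sarah');
    -- coalesce() falls back to the current value wherever the source is NULL.
    UPDATE test SET name = coalesce(s.name, test.name),
                    age  = coalesce(s.age, test.age)
    FROM source s WHERE s.id = test.id;
""")
print(conn.execute("SELECT * FROM test ORDER BY id").fetchall())
# [(1, 10, 'Rose'), (23, 30, 'Sarah'), (42, 45, 'Bob')]
```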
The only limitation I know of is that you can't update more than one
table at a time, but transactions let you delay constraint checks
until they end, so one update per table seems legitimate.
Not every DBMS can handle this type of update -- UPDATE ... FROM is
a vendor extension rather than standard SQL -- but at least the
famous ones (Postgres, MSSQL, MySQL via multi-table UPDATE, Oracle
via MERGE) offer equivalent functionality, with more or less syntax
variation.
I hope this will help.
Regards,
Christophe NINUCCI
On 27/03/2013 21:41, Ben Hoyt wrote:
Hi folks,
In our own utility file, I've added a function called
multiple_update() that executes multiple separate update
statements in one query. Note that this is not the same as a
single db.update() that updates multiple rows -- with this
function, the columns and vars for each row can be different.
In my testing (Windows 64-bit, db is PostgreSQL running on
another machine) I've found this to be about 3-4x as fast as
individual db.update()s wrapped in a transaction, so a
worthwhile speed increase. The speed-increase factor seems to be
roughly constant regardless of the number of rows (initially I was
hoping it would go up with N, but it didn't in my tests).
Any thoughts on the API signature, and whether this would be a
good addition to web.DB? Any comments on this approach in
general? Is the way I'm doing the cast to unicode and then
'\n'.join(updates) the correct approach?
A couple of notes:
* If this were part of web.py, obviously it'd be a method on
DB instead of a stand-alone function.
* dbconn.db_connection is our "current database object".
Again, this would go away if it were a DB method.
Code below:
-----
def multiple_update(table, where, values_vars, database=None, _test=False):
    """Execute multiple separate update statements in one query.

    >>> database = web.DB(None, {})
    >>> row1 = ({'name': 'Bob'}, {'id': 42})
    >>> row2 = ({'name': 'Sarah', 'age': 30}, {'id': 23})
    >>> print multiple_update('foo', 'id = $id', [row1, row2],
    ...                       _test=True, database=database)
    UPDATE foo SET name = 'Bob' WHERE id = 42;
    UPDATE foo SET age = 30, name = 'Sarah' WHERE id = 23;
    >>> print multiple_update('foo', 'id = $id', [],
    ...                       _test=True, database=database)
    None
    """
    if not values_vars:
        return
    if database is None:
        database = dbconn.db_connection
    updates = []
    for values, vars in values_vars:
        update = u'UPDATE {table} SET {values} WHERE {where};'.format(
            table=table,
            values=web.db.sqlwhere(values, ', '),
            where=database._where(where, vars))
        updates.append(unicode(update))
    query = '\n'.join(updates)
    if _test:
        return query
    database.query(query)
-----
Thanks,
Ben.
--
You received this message because you are subscribed to the
Google Groups "web.py" group.
To unsubscribe from this group and stop receiving emails from
it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/webpy?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.