Hi Ben,

Thanks for your answer. I didn't think of the Pythonic freedom; that is indeed a really good reason ;).

Indeed, my approach is pure SQL, and it's quite hard to come up with a generic function for every case, since the SQL approach often requires a complete rewrite of the queries. Depending on the DBMS you use there are more or fewer reasons to choose this method, but in practice there is a real pros-and-cons balance to the UPDATE ... FROM syntax.

The pro is that with this method you can optimize your requests very effectively: for large datasets and complex calculations it can be more than a thousand times faster than the standard row-by-row update approach. The con is that you have to design your database schema with this method in mind. It's practically impossible to imagine a truly generic function, since you can use inner joins, GROUP BY, custom functions or inline SQL to "forge" your source dataset.
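
To give an idea of what I mean by "forging" the source, here is a rough sketch on a made-up schema (customers, orders and order_lines are hypothetical names), recomputing a per-customer total in one statement:

update customers c
   set total_spent = s.total
  from (-- the "forged" source: joins and aggregates are free here
        select o.customer_id, sum(ol.qty * ol.unit_price) as total
          from orders o
          join order_lines ol on ol.order_id = o.id
         group by o.customer_id) s
 where s.customer_id = c.id;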

I have no precise idea of when the set-based approach (I'm not sure "ensemblistic" even exists as a word... :D) becomes the better choice. When you only have 2 or 3 rows to update, it's totally overkill to create a temporary table and feed it with data you already know on the Python side. But this approach gives nearly magical results with large datasets, especially since the INSERT into the temporary table can already validate domains and constraints before you push those rows into the final table. (The same approach also applies to INSERT and DELETE statements.)
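
Here is a minimal sketch of that staging pattern, assuming a target table test(id, age, name) like the one in my previous mail; the staging table name and its constraints are made up:

create temporary table staging (
    id   integer primary key,
    age  integer check (age between 0 and 150),  -- domain rules live here
    name text not null
);

insert into staging values (42, 45, 'Bobby'), (23, 38, 'Sarah');  -- bad rows would be rejected at this point

update test t
   set age = s.age, name = s.name
  from staging s
 where s.id = t.id;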

The main useful case is when all the data you need is already in the database and you just have to propagate it with some calculated results.
Let's look at an interesting example:
Imagine you work on a program that creates orders and estimates, and you want to let a user apply a last-minute change to an estimate onto the corresponding order. This method lets you update every order that differs from its estimate, reading and writing the information directly in the DBMS instead of pulling it out to a client and sending it back, and without introducing any parameters into your query. It could even be launched from an ON UPDATE trigger, hiding the SQL part from the client entirely and adding some integrity checks before the transaction is validated.
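
A rough sketch of that propagation, on a hypothetical pair of tables orders(order_id, quantity, price) and estimates(order_id, quantity, price):

update orders o
   set quantity = e.quantity,
       price    = e.price
  from estimates e
 where e.order_id = o.order_id
   -- touch only the orders that no longer match their estimate
   and (o.quantity, o.price) is distinct from (e.quantity, e.price);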

We can also consider a small trick for working on small datasets without creating temporary tables, because it really is overkill to create a temporary table before updating 2 or 3 rows, and you almost never need to validate data on the SQL side when you send them from Python. You can work with an inline "fake dataset":

update mytable t  -- "mytable" stands for whatever table you want to update ("table" itself is a reserved word)
   set name = s.name,
       city = s.city
  from (select 1 as id, 'toto' as name, 'London' as city
        union all
        select 2 as id, 'Roger' as name, 'Gandelnans-lès-moulin' as city) s
 where s.id = t.id;

IMHO Postgres allows a cleaner syntax with a CTE, but the concept is the same:

with source as (
    select 1 as id, 'toto' as name, 'London' as city
    union all
    select 2 as id, 'Roger' as name, 'Gandelnans-lès-moulin' as city
)
update mytable t
   set name = s.name,
       city = s.city
  from source s
 where s.id = t.id;

I think it's possible to adapt your function to build this kind of fake dataset; if I find some time this weekend I might try to write something. Then it would be easy to benchmark the two approaches against each other and see when it's overkill and when it pays off. The main problem remains the never-uniform syntax around these not-so-well-known SQL habits, and the differences between DBMSs.
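
To make the idea concrete, for the two rows in your docstring example the adapted function would have to emit something along these lines (assuming id is the key column; the cast and the coalesce handle the age column that the first row doesn't provide):

update foo t
   set name = s.name,
       age  = coalesce(s.age, t.age)  -- row 42 gives no age, so keep the current value
  from (select 42 as id, 'Bob' as name, null::integer as age
        union all
        select 23 as id, 'Sarah' as name, 30 as age) s
 where s.id = t.id;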

Regards,

Christophe.


On 29/03/2013 22:17, Ben Hoyt wrote:
The main utility of multiple_update is performance. In my tests it's 2-4x faster than separate updates, and this is the main reason I'm using it in our application.

And putting this logic in a function means there's a nice Pythonic API to call, rather than hand-building SQL whenever you want to do this.

In terms of the approach you've given with the temporary table, that's interesting. I haven't used the UPDATE...FROM syntax or temporary tables much at all, so I didn't think of this approach.

It's not quite as simple as the separate updates in one query. However, I'd be happy to use that approach if it's significantly faster, and if it's factored out into the multiple_update() function. The nice thing about having a function is that you can change how it works under the covers, but the API can remain the same.

Are you interested in benchmarking my multiple_update (second version) against your approach and posting results?

-Ben



On Fri, Mar 29, 2013 at 10:00 PM, zbouboutchi <[email protected]> wrote:

    Hi Ben,

    I'm confused about the utility of this function... Since it's
    possible to use an external table in an UPDATE statement, you can
    certainly do the whole update in a single request.
    I prepared a little example that reproduces yours in pure SQL.

    -- let's create a similar dataset:
    create temporary table test (id integer, age integer, name text);
    insert into test values (1, 10, 'Rose'), (42, 45, 'Bobby'), (23, 38, 'Sarrah');

    -- Here are the values I want to update; in this example, a null
    -- won't overwrite the corresponding value in the test table.
    create temporary table source (id integer, age integer, name text);
    insert into source values (42, null, 'Bob'), (23, 30, 'Sarah');

    -- The "Bazinga" part:
    update test t
      -- coalesce() is a small workaround to keep the original value if
      -- the source is null; feel free to use any function you want, it
      -- works exactly the same way as usual
      set name = coalesce(s.name, t.name),
          age = coalesce(s.age, t.age)
      from source s
      where s.id = t.id;

    -- Here it is:
    select * from test order by id;

    1;10;Rose
    23;30;Sarah
    42;45;Bob

    -- Bazinga --

    This method is the fastest I know of, even on the largest datasets,
    where you would otherwise waste a lot of time and energy executing
    millions and millions of requests; here you update your whole
    dataset with one request and take advantage of all the database
    optimisations (indexes etc.). With a modern DBMS you can even check
    the execution plan and get useful hints about needed optimisations
    before you execute the nightmare.
    The only limitation I know of is that you can't update more than one
    table at a time, but transactions let you defer constraint checks
    until commit, so one update per table seems legitimate.
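
    (In Postgres at least, that deferral only applies to constraints
    declared DEFERRABLE; a minimal sketch of the pattern:)

    begin;
    set constraints all deferred;  -- checks on deferrable constraints are postponed
    -- ... one set-based update per table goes here ...
    commit;                        -- deferred constraints are checked now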

    I don't think every DBMS can handle this type of update, and the
    UPDATE ... FROM form is a common extension rather than strict
    standard SQL, but the famous ones (Postgres, MSSQL, Oracle, MySQL,
    etc.) offer the same functionality with more or less syntax
    variation.
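
    For instance, if I remember correctly, MySQL spells the same update
    as a multi-table UPDATE with a JOIN instead of FROM:

    update test t
      join source s on s.id = t.id
       set t.name = coalesce(s.name, t.name),
           t.age  = coalesce(s.age, t.age);
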
    I hope this will help.

    Regards,

    Christophe NINUCCI

    On 27/03/2013 21:41, Ben Hoyt wrote:

        Hi folks,

        In our own utility file, I've added a function called
        multiple_update() that executes multiple separate update
        statements in one query. Note that this is not the same as a
        single db.update() that updates multiple rows -- with this
        function, the columns and vars for each row can be different.

        In my testing (Windows 64-bit, db is PostgreSQL running on
        another machine) I've found this to be about 3-4x as fast as
        individual db.update()s wrapped in a transaction, so a
        worthwhile speed increase. The speed increase factor seems to
        be about constant regardless of the number of rows (initially
        I was hoping it would go up with N, but it didn't in my tests).

        Any thoughts on the API signature, and whether this would be a
        good addition to web.DB? Any comments on this approach in
        general? Is the way I'm doing the cast to unicode and then
        '\n'.join(updates) the correct approach?

        Couple of notes:

        * If this were part of web.py, obviously it'd be a method on
        DB instead of a stand-alone function.
        * dbconn.db_connection is our "current database object".
        Again, this would go away if it were a DB method.

        Code below:

        -----
        def multiple_update(table, where, values_vars, database=None, _test=False):
            """Execute multiple separate update statements in one query.

            >>> database = web.DB(None, {})
            >>> row1 = ({'name': 'Bob'}, {'id': 42})
            >>> row2 = ({'name': 'Sarah', 'age': 30}, {'id': 23})
            >>> print multiple_update('foo', 'id = $id', [row1, row2], _test=True, database=database)
            UPDATE foo SET name = 'Bob' WHERE id = 42;
            UPDATE foo SET age = 30, name = 'Sarah' WHERE id = 23;
            >>> print multiple_update('foo', 'id = $id', [], _test=True, database=database)
            None
            """
            if not values_vars:
                return
            if database is None:
                database = dbconn.db_connection

            updates = []
            for values, vars in values_vars:
                update = u'UPDATE {table} SET {values} WHERE {where};'.format(
                        table=table,
                        values=web.db.sqlwhere(values, ', '),
                        where=database._where(where, vars))
                updates.append(unicode(update))

            query = '\n'.join(updates)
            if _test:
                return query
            database.query(query)
        -----

        Thanks,
        Ben.