ok, executemany() it is. Thank you for your help
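[For reference, a minimal sketch of the executemany() approach settled on here, using SQLAlchemy Core (the SQL expression layer) rather than the ORM. Table and column names (`observations`, `obs_time`, `obs_value`) are hypothetical, and an in-memory SQLite URL stands in for the MySQL one; passing a list of parameter dicts to `execute()` is what triggers the executemany path.]

```python
# Hedged sketch: bulk insert of (datetime, float) observations via
# SQLAlchemy Core, bypassing the ORM. Names here are made up for the
# example; sqlite:// stands in for the real mysql:// URL.
from datetime import datetime, timedelta

from sqlalchemy import (Column, DateTime, Float, Integer, MetaData,
                        Table, create_engine)

engine = create_engine("sqlite://")
metadata = MetaData()
observations = Table(
    "observations", metadata,
    Column("id", Integer, primary_key=True),
    Column("obs_time", DateTime),
    Column("obs_value", Float),
)
metadata.create_all(engine)

base = datetime(2009, 11, 1)
rows = [{"obs_time": base + timedelta(minutes=i), "obs_value": float(i)}
        for i in range(1000)]

# Passing a list of dicts makes SQLAlchemy use the DB-API's
# cursor.executemany(): one statement, many parameter sets.
with engine.begin() as conn:
    conn.execute(observations.insert(), rows)

with engine.connect() as conn:
    inserted = conn.execute(observations.select()).fetchall()
print(len(inserted))
```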
On Nov 2, 10:36 am, Michael Bayer <[email protected]> wrote:
> On Nov 1, 2009, at 5:37 PM, BEES INC wrote:
>
> > Ah yes, I'm not sure of the terminology either; I got the 'extended'
> > from the mysqldump usage info (mysqldump -e, on by default, but you
> > can turn it off with mysqldump --extended-insert=false).
> >
> > Anyway, from looking at the mysqldump output, extended inserts are
> > covered by this bit from the INSERT documentation:
> >
> > "INSERT statements that use VALUES syntax can insert multiple rows. To
> > do this, include multiple lists of column values, each enclosed within
> > parentheses and separated by commas. Example:
> >
> > INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);"
> >
> > So it's the difference between a single INSERT with lots of data vs. a
> > separate INSERT for each row.
> >
> > The executemany thing is an option, but the relationships are a
> > little complex (26 tables and counting). There is only one object
> > added to the session; everything else is linked from there.
> > Implementing the inserts without the ORM might get a little involved
> > when it comes to ensuring the relations are established correctly, but
> > if that's the way to do it then so be it.
> >
> > I am also open to the possibility of rejiggering my schema/session
> > usage to fit better with SA, which is a fairly undefined goal, I know.
> > One thing that is very common in my program is a lot of (datetime,
> > float) observations. They are all mapped to different objects and all
> > go in different tables, as they belong to different objects by the
> > business logic. I'm not sure if it would be better to put them in one
> > table and use a secondary relationship table with 2 foreign keys
> > mapping objects to their given date/float observations. Indeed I had
> > it that way originally, but it has not seemed to have any impact on
> > performance either way.
> the ORM can't bundle INSERT statements together in any way, since it's
> necessary to retrieve information about each INSERT statement
> individually, particularly for MySQL, where it's the only way we can
> get at the "last inserted ID", necessary for assigning identity to
> each ORM-mapped instance. Additionally, when many objects are being
> INSERTed by the ORM, there's no guarantee that each INSERT statement
> would render in the same way, since missing values on an instance
> translate into that column being omitted from just that statement so
> that a server-side default can fire off.
>
> so you can have effortless or large-scale-performant for a large bulk
> insert; take your pick.
>
> I also would hypothesize that standard executemany() is a better
> choice than the "INSERT" with many values version, since the
> latter places a toll on MySQL's query parser and buffer.
>
> > Thanks for your reply
> >
> > On Nov 2, 5:38 am, Michael Bayer <[email protected]> wrote:
> >> On Nov 1, 2009, at 9:52 AM, BEES INC wrote:
> >
> >>> Hi
> >
> >>> Does sqlalchemy support mysql extended insert syntax?
> >>> I have been chasing down very slow performance. My program
> >>> generates a whole bunch of data then does a big SA commit() at the
> >>> end, which is proving to be inconveniently slow.
> >
> >>> This is a raw import of a mysqldump without extended inserts:
> >
> >>> dbregress$ time mysql -ubees -pbees tt_test < up50ma.dbdump6
> >
> >>> real  0m33.043s
> >>> user  0m1.343s
> >>> sys   0m1.572s
> >>> dbregress$
> >
> >>> Which roughly corresponds to the time it is taking SA to do the
> >>> commit.
> >
> >>> This is a raw import using extended inserts:
> >
> >>> dbregress$ time mysql -ubees -pbees tt_test < up50ma.dbdump6.e
> >
> >>> real  0m5.517s
> >>> user  0m0.081s
> >>> sys   0m0.024s
> >>> dbregress$
> >
> >>> Which is clearly noticeably faster.
> >>> Also, this is for a test dataset; the real dataset is much larger,
> >>> and when used, the final commit can take upwards of 25 minutes.
> >
> >>> I could not see anything in the documentation regarding this;
> >>> apologies if I missed something. I tried periodic commits vs. a
> >>> single commit at the end, but this did not offer any real
> >>> difference in performance, and also introduced other issues.
> >
> >> you'll get much faster performance for bulk inserts if you forego
> >> the usage of the ORM and use executemany() syntax, which is
> >> documented here:
> >
> >> http://www.sqlalchemy.org/docs/05/sqlexpression.html#executing-multip...
> >
> >> I've googled and searched MySQL's site for "extended insert syntax",
> >> and while there are many mentions of it, there are no definitions or
> >> examples I can find. For example, the word "extended" does not appear
> >> on this page in any form:
> >> http://dev.mysql.com/doc/refman/5.0/en/insert.html
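[To make the executemany() suggestion above concrete, here is a DB-API-level sketch; when SQLAlchemy is given a list of parameter sets, it ultimately delegates to the driver's `cursor.executemany()` call shown here. The stdlib `sqlite3` module stands in for MySQLdb, and the table mirrors the `tbl_name (a,b,c)` example quoted earlier in the thread.]

```python
# Hedged illustration: DB-API executemany(), the mechanism SQLAlchemy's
# bulk-parameter execute() path uses. sqlite3 stands in for MySQLdb.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_name (a INTEGER, b INTEGER, c INTEGER)")

rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# One statement, many parameter sets: the SQL is parsed once, instead
# of sending either one giant multi-row VALUES string or a separate
# INSERT statement per row.
conn.executemany("INSERT INTO tbl_name (a, b, c) VALUES (?, ?, ?)", rows)
conn.commit()

total = conn.execute("SELECT COUNT(*) FROM tbl_name").fetchone()[0]
print(total)  # 3
```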
