yes, the ORM is designed in such a way that it can actually batch INSERT, UPDATE and DELETE statements together. however, as I leanred early on, in both cases, the feature cannot be used:

with an INSERT, the newly saved object must be populated with its primary key value, as well as any other column defaults that executed. with mysql and sqlite, it has to use cursor.lastrowid to get the row which isnt supported with multiple-row inserts. with postgres, It is conceivable that the code could be enhanced to "detect" beforehand if a post-fetch would be needed, but there is usually a singular pre-execution of a sequence so its still usually not going to scale to 3.5 million...this optimization is not high on my list right now.

with an UPDATE/DELETE, the count of rows affected must be checked for concurrency. which again, is only supported with one row at a time.

The ORM isnt really intended for massive bulk updates (no ORM really is...). if you need to insert 3.5 million rows, you definitely should be using straight SQL for that (or a bulk update tool).

On Mar 27, 2006, at 8:05 PM, Michael Carter wrote:

Hello,
I was just working on a script that uses sqlalchemy to insert 3.5 million rows into a database. It reads each line from a text file, then creates an object out of it. The outline of the code was something like this:

count = 0
while (linesAreLeft):
    count +=1
    line = readline
    makeObjFromLine(line)
    if count % 2000 == 0:
        print "committing..."
        objectstore.commit()
        objectstore.clear()
        print "count: %s" % count




I also have echo turned on for the database engine (mysql). The result of all this is that 2000 lines get read, then when commit is called, a series of 2000 insert statements are issued.

comitting...
[engine] insert ...
[engine] insert ...
[engine] insert ...
<2000 times>
count: 2000

I thought that its possible to have a single insert statement that did something like: insert into table(col1,col2,col3) values ( (a,b,c), (d,e,f), (g, h, i) ); When I have some sort of relatedjoin between foo and bar, a bunch of calls like for bar in Bar.select(): foo.bars.append(bar), then i call objectstore.commit, a single insert is used.

Without going into the UOW code I can imagine how a uow could be structured as to make it hard to figure out what can be inserted in a single statement and what can't, but it seems possible. Or perhaps I missed something in the docs. Let me know.

Thanks,
Michael



-------------------------------------------------------
This SF.Net email is sponsored by xPML, a groundbreaking scripting language
that extends applications into web and mobile media. Attend the live webcast
and join the prime developer group breaking into this new coding territory!
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
_______________________________________________
Sqlalchemy-users mailing list
Sqlalchemy-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sqlalchemy-users

Reply via email to