Re: PEP 249 (database api) -- executemany() with iterable?
In message <i96squ$aq...@localhost.localdomain>, Martin Gregorie wrote:

> On Thu, 14 Oct 2010 16:36:34 +1300, Lawrence D'Oliveiro wrote:
>
>> In message <4cb5e659$0$1650$742ec...@news.sonic.net>, John Nagle
>> wrote:
>>
>>> Also note that there are some issues with doing a huge volume of
>>> updates in one MySQL InnoDB transaction. The system has to keep the
>>> data needed to undo the updates, and there's a limit on the amount
>>> of pending transaction history that can be stored.
>>
>> How does “load data” avoid this? Is that not a transaction too?
>
> Not usually. It's faster because there's no journalling overhead.

So what happens if there’s an error part way? Do you end up with some
records inserted and others not?

> The loader takes out an exclusive table lock, dumps the data into the
> table, rebuilds indexes and releases the lock.

That’s not acceptable in a situation where other processes are
concurrently making queries on the table.

>> Seems to me this isn’t going to help, since both old and new tables
>> are on the same disk, after all. And it’s the disk access that’s the
>> bottleneck.
>
> There's a lot of overhead in journalling - much more than in applying
> changes to a table. The before and after images *must* be flushed to
> disk on commit. In UNIX terms fsync() must be called on the journal
> file(s), and this is an expensive operation on all OSes because
> committing a series of small transactions can cause the same disk
> block to be written several times. However, the table pages can
> safely be left in the DBMS cache and flushed as part of normal cache
> operation since, after a crash, the table changes can always be
> recovered from a journal roll-forward. A good DBMS will do that
> automatically when it's restarted.

And how does this address the point of whether to load the new records
into a new table or the same table?
Re: PEP 249 (database api) -- executemany() with iterable?
On Sun, 17 Oct 2010 11:27:17 +1300, Lawrence D'Oliveiro wrote:

> In message <i96squ$aq...@localhost.localdomain>, Martin Gregorie
> wrote:
>
>> On Thu, 14 Oct 2010 16:36:34 +1300, Lawrence D'Oliveiro wrote:
>>
>>> In message <4cb5e659$0$1650$742ec...@news.sonic.net>, John Nagle
>>> wrote:
>>>
>>>> Also note that there are some issues with doing a huge volume of
>>>> updates in one MySQL InnoDB transaction. The system has to keep
>>>> the data needed to undo the updates, and there's a limit on the
>>>> amount of pending transaction history that can be stored.
>>>
>>> How does “load data” avoid this? Is that not a transaction too?
>>
>> Not usually. It's faster because there's no journalling overhead.
>
> So what happens if there’s an error part way? Do you end up with some
> records inserted and others not?

Since the load is usually used to populate an empty table, you simply
empty the table and do it again if it doesn't complete for some reason.
If you're doing a partial load you'd normally dump the table and/or
take a backup beforehand.

>> The loader takes out an exclusive table lock, dumps the data into
>> the table, rebuilds indexes and releases the lock.
>
> That’s not acceptable in a situation where other processes are
> concurrently making queries on the table.

Obviously, but you're unlikely to be running anything while you're
doing a bulk load. If the database is used to support interactive
processing you'd normally only use bulk loading during recovery after
a disk failure or as part of a database migration or restructuring
exercise.

In fact the only case I know of where bulk loads are a normal part of
the data life cycle is for data warehousing, where bulk loads are the
norm: the last month's transactions are removed from the online system
and bulk loaded into the data warehouse for analysis. In this case the
bulk loads are either done overnight or during a weekend. However, any
RDBMS that has been optimised for data warehousing will almost
certainly have the ability to segment its fact table. This has two
benefits:

- you can bulk load a new segment offline and put it online when the
  load is complete. This can be done during normal operation.
- the DBMS can automatically split a query and run parallel copies
  against each segment before combining the result sets for return to
  the program that made the query.

>>> Seems to me this isn’t going to help, since both old and new tables
>>> are on the same disk, after all. And it’s the disk access that’s
>>> the bottleneck.
>>
>> There's a lot of overhead in journalling - much more than in
>> applying changes to a table. The before and after images *must* be
>> flushed to disk on commit. In UNIX terms fsync() must be called on
>> the journal file(s), and this is an expensive operation on all OSes
>> because committing a series of small transactions can cause the same
>> disk block to be written several times. However, the table pages can
>> safely be left in the DBMS cache and flushed as part of normal cache
>> operation since, after a crash, the table changes can always be
>> recovered from a journal roll-forward. A good DBMS will do that
>> automatically when it's restarted.
>
> And how does this address the point of whether to load the new
> records into a new table or the same table?

That's a system design choice. There are a limited set of options and
it's up to the designer to choose one:

- if you want to load data into a new table and rename CURRENT->OLD,
  NEW->CURRENT, you have to design all queries to operate on both OLD
  and CURRENT tables (see the sketch after this post)
- if your DBMS allows table segmentation and gives the ability to
  offline segments, you can bulk load as I already described
- you can stop the system during a quiet time while you do a bulk load
  and/or minimise overheads by using a very large transaction
- you can bite the bullet, accept the performance hit, and simply run
  a relatively slow load process using small transactions during
  normal operation

There are different overheads associated with each choice. A good
system designer will understand them, know their relative weight for
the target hardware and database, and use this information to optimise
the design.

What I said about transactional and journalling overheads was
explanation: unless you know that, you'll never understand why you
should expect bulk loading to be a lot faster than doing it with
transactions designed to logically group related operations or (much,
much worse) to use implied commits. That last really is the worst of
all worlds, since it maximises journalling overhead.

-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |
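Martin's first option can be made transparent to readers with a view
spanning both generations. A minimal, runnable sketch using sqlite3;
the table and view names are illustrative, and the rename mechanics
themselves vary by engine:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders_current (id INTEGER, qty INTEGER)")
    conn.execute("CREATE TABLE orders_old (id INTEGER, qty INTEGER)")

    # Application queries target the view, so a bulk load can fill a
    # fresh table off to the side and the generations can be renamed
    # underneath without breaking readers.
    conn.execute("""
        CREATE VIEW orders_all AS
            SELECT * FROM orders_current
            UNION ALL
            SELECT * FROM orders_old""")

    print conn.execute("SELECT COUNT(*) FROM orders_all").fetchone()[0]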
Re: PEP 249 (database api) -- executemany() with iterable?
In message
<ecdbf6b3-3f99-4b59-b7f8-85bd22f97...@w9g2000prc.googlegroups.com>,
Steve Howell wrote:

> On Oct 13, 8:32 pm, Lawrence D'Oliveiro
> <l...@geek-central.gen.new_zealand> wrote:
>
>> In message
>> <d2451907-c0d2-4571-b3e1-1e4d4f66a...@a7g2000prb.googlegroups.com>,
>> Steve Howell wrote:
>>
>>> Bulk-load strategies usually solve one or more of these problems:
>>>
>>>   network latency
>>
>> That’s not an issue. This is a bulk operation, after all.
>>
>>>   index maintenance during the upload
>>
>> There are usually options to temporarily turn this off.
>>
>>>   parsing of the SQL
>>
>> Not usually a problem, as far as I can tell.
>>
>>>   reallocating storage
>>
>> You mean for the SQL? Again, not usually a problem, as far as I can
>> tell.
>
> If you are avoiding network latency and turning off indexes, then you
> are using some kind of a bulk-load strategy.

I thought we were talking about “load data” versus using simple
variants of “insert”. At least, that’s what I was talking about. What
were you talking about?

> If you are not concerned about parsing costs or storage churn ...

What is this “storage churn” thing, and why are you raising it now?
Re: PEP 249 (database api) -- executemany() with iterable?
Terry Reedy wrote:

> On 10/12/2010 11:10 AM, Roy Smith wrote:
>
>> PEP 249 says about executemany():
>>
>>     Prepare a database operation (query or command) and then execute
>>     it against all parameter sequences or mappings found in the
>>     sequence seq_of_parameters.
>>
>> are there any plans to update the api to allow an iterable instead
>> of a sequence?
>
> That question would best be addressed to the pep author Marc-André
> Lemburg, mal at lemburg.com

Questions about the DB-API should be discussed on the Python DB-SIG
list (put on CC): http://mail.python.org/mailman/listinfo/db-sig

Regarding your question: at the time the PEP was written, Python did
not have iterables. However, even with iterables, please keep in mind
that pushing the data row-per-row over a network does not result in
good performance, so using an iterable will make your update slower.

cursor.executemany() is meant to allow the database module to optimize
sending bulk data to the database, and ideally it will send the whole
sequence to the database in one go.

If you want to efficiently run an update with millions of entries
based on an iterable, it is better to use an intermediate loop which
builds sequences of, say, 1000 rows and then processes those with a
cursor.executemany() call. You will likely also do this in multiple
transactions, to prevent the database from creating a multi-GB
transaction log for the upload.

Another aspect to keep in mind is error reporting. When sending bulk
data to a database, some databases only report an error for the whole
data block, so finding the problem can be troublesome. For that
reason, using smaller blocks is better even when you have the data
available as a real sequence.

Hope that helps,

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Oct 14 2010)
Python/Zope Consulting and Support ...        http://www.egenix.com/
mxODBC.Zope.Database.Adapter ...              http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ...           http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free !

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/
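A minimal sketch of the batching Lemburg describes, written against
any DB-API module; the helper name, the qmark paramstyle and the batch
size of 1000 are illustrative, not part of any API:

    import itertools

    def execute_batched(connection, statement, iterable, batch_size=1000):
        # Consume the iterable in fixed-size chunks so that each
        # executemany() call sees a real, finite sequence.
        cursor = connection.cursor()
        rows = iter(iterable)
        while True:
            batch = list(itertools.islice(rows, batch_size))
            if not batch:
                break
            cursor.executemany(statement, batch)
            # Commit per batch so the transaction log stays small.
            connection.commit()

    # e.g., with Roy's million-item dict and a qmark-style driver:
    # execute_batched(connection,
    #                 "UPDATE foo SET bar = ? WHERE id = ?",
    #                 ((bar, id_) for id_, bar in d.iteritems()))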
Re: PEP 249 (database api) -- executemany() with iterable?
On Thu, 14 Oct 2010 16:36:34 +1300, Lawrence D'Oliveiro wrote:

> In message <4cb5e659$0$1650$742ec...@news.sonic.net>, John Nagle
> wrote:
>
>> Also note that there are some issues with doing a huge volume of
>> updates in one MySQL InnoDB transaction. The system has to keep the
>> data needed to undo the updates, and there's a limit on the amount
>> of pending transaction history that can be stored.
>
> How does “load data” avoid this? Is that not a transaction too?

Not usually. It's faster because there's no journalling overhead. The
loader takes out an exclusive table lock, dumps the data into the
table, rebuilds indexes and releases the lock. I can't comment about
MySQL (don't use it) but this has been the case on the RDBMS databases
I have used.

> Seems to me this isn’t going to help, since both old and new tables
> are on the same disk, after all. And it’s the disk access that’s the
> bottleneck.

There's a lot of overhead in journalling - much more than in applying
changes to a table. The before and after images *must* be flushed to
disk on commit. In UNIX terms fsync() must be called on the journal
file(s), and this is an expensive operation on all OSes because
committing a series of small transactions can cause the same disk
block to be written several times. However, the table pages can safely
be left in the DBMS cache and flushed as part of normal cache
operation since, after a crash, the table changes can always be
recovered from a journal roll-forward. A good DBMS will do that
automatically when it's restarted.

-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |
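The fsync() cost Martin describes is easy to observe from Python. A
small sketch, assuming a POSIX system (the file path and write sizes
are arbitrary), comparing a sync per "commit" against one sync at the
end:

    import os, time

    def timed_writes(path, n_writes, sync_each):
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0644)
        start = time.time()
        for _ in xrange(n_writes):
            os.write(fd, "x" * 100)   # one small "journal record"
            if sync_each:
                os.fsync(fd)          # force to the platter, like a commit
        os.fsync(fd)
        os.close(fd)
        return time.time() - start

    print "fsync per write:", timed_writes("/tmp/jnl_test", 1000, True)
    print "one fsync:      ", timed_writes("/tmp/jnl_test", 1000, False)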
Re: PEP 249 (database api) -- executemany() with iterable?
On Oct 13, 8:32 pm, Lawrence D'Oliveiro
<l...@geek-central.gen.new_zealand> wrote:

> In message
> <d2451907-c0d2-4571-b3e1-1e4d4f66a...@a7g2000prb.googlegroups.com>,
> Steve Howell wrote:
>
>> Bulk-load strategies usually solve one or more of these problems:
>>
>>   network latency
>
> That’s not an issue. This is a bulk operation, after all.
>
>>   index maintenance during the upload
>
> There are usually options to temporarily turn this off.
>
>>   parsing of the SQL
>
> Not usually a problem, as far as I can tell.
>
>>   reallocating storage
>
> You mean for the SQL? Again, not usually a problem, as far as I can
> tell.

If you are avoiding network latency and turning off indexes, then you
are using some kind of a bulk-load strategy.

If you are not concerned about parsing costs or storage churn, then
you are simply evaluating the costs of a non-bulk-oriented strategy
and determining that they are minimal for your needs, which is fine.
Re: PEP 249 (database api) -- executemany() with iterable?
In message <mailman.1711.1287055174.29448.python-l...@python.org>,
M.-A. Lemburg wrote:

> However, even with iterables, please keep in mind that pushing the
> data row-per-row over a network does not result in good performance,
> so using an iterable will make your update slower.
>
> cursor.executemany() is meant to allow the database module to
> optimize sending bulk data to the database, and ideally it will send
> the whole sequence to the database in one go.

You seem to be assuming that using an iterator precludes buffering.
What’s wrong with evaluating the iterator to produce as many records
as the API implementation finds convenient to send at once?
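What Lawrence suggests might look like this inside a DB-API module; a
hypothetical sketch only (the Cursor class, _send_batch() and the
buffer size are invented for illustration, not any real driver's
internals):

    import itertools

    class Cursor(object):
        _BUFFER_ROWS = 1000  # whatever the wire protocol handles well

        def executemany(self, operation, seq_of_parameters):
            # Accept sequences and one-shot iterators alike: evaluate
            # the iterable a convenient number of rows at a time.
            rows = iter(seq_of_parameters)
            while True:
                batch = list(itertools.islice(rows, self._BUFFER_ROWS))
                if not batch:
                    break
                self._send_batch(operation, batch)

        def _send_batch(self, operation, rows):
            raise NotImplementedError  # driver-specific bulk send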
Re: PEP 249 (database api) -- executemany() with iterable?
On 12 Oct, 20:21, J. Gerlach <gerlach_jo...@web.de> wrote:

> On 12.10.2010 17:10, Roy Smith wrote:
>
>> [A]re there any plans to update the api to allow an iterable instead
>> of a sequence?
>
> sqlite3 (standard library, python 2.6.6, Windows 32Bit) does that
> already::
>
>     import sqlite3 as sql
>
>     connection = sql.connect(":memory:")
>     cursor = connection.execute("""
>         CREATE TABLE test (
>             id INTEGER PRIMARY KEY AUTOINCREMENT,
>             text TEXT);""")
>     connection.commit()
>     cursor.executemany(
>         "INSERT INTO test (text) VALUES ( ? );",
>         # A generator expression - delivers one row at a time
>         (("hello nr %03d!" % i,) for i in xrange(100)))
>     connection.commit()
>     cursor.execute("SELECT * FROM test")
>     for id_, text in cursor.fetchall():
>         print text, id_

What happens if you do itertools.repeat(0) instead of xrange(100)?
Re: PEP 249 (database api) -- executemany() with iterable?
On 13.10.2010 14:26, Jon Clements wrote:

> On 12 Oct, 20:21, J. Gerlach <gerlach_jo...@web.de> wrote:
>
>> On 12.10.2010 17:10, Roy Smith wrote:
>>
>>> [A]re there any plans to update the api to allow an iterable
>>> instead of a sequence?
>>
>> [sqlite3 example snipped]
>
> What happens if you do itertools.repeat(0) instead of xrange(100)?

And there I see why I don't want ``executemany()`` to take a generator
-- the never-ending ones will do some damage. But on a similar note:
would an ``execute_n_repeats(statement, iterable, maximum_rows)`` be
better?

Jörg

-- 
Have a nice day!
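A sketch of the ``execute_n_repeats()`` idea, assuming a DB-API
cursor; itertools.islice() caps the iterable, so even
itertools.repeat() stops after maximum_rows rows:

    import itertools

    def execute_n_repeats(cursor, statement, iterable, maximum_rows):
        # islice() turns a possibly-infinite iterable into one that
        # yields at most maximum_rows parameter tuples.
        rows = list(itertools.islice(iterable, maximum_rows))
        cursor.executemany(statement, rows)
        return len(rows)

    # e.g. execute_n_repeats(cursor,
    #                        "INSERT INTO test (text) VALUES (?);",
    #                        itertools.repeat(("hello",)), 100)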
Re: PEP 249 (database api) -- executemany() with iterable?
On Wed, 13 Oct 2010 14:01:39 +1300, Lawrence D'Oliveiro wrote:

> In message <4cb4ba4e$0$1641$742ec...@news.sonic.net>, John Nagle
> wrote:
>
>> In general, if you find yourself making millions of SQL database
>> requests in a loop, you're doing it wrong.
>
> I’ve done this. Not millions, but certainly on the order of tens of
> thousands.

"I've done it" doesn't disprove that it is the wrong thing to do. It
may simply mean that you too are doing it wrong.

I have no opinion either way, I'm just saying.

-- 
Steven
Re: PEP 249 (database api) -- executemany() with iterable?
On 10/12/2010 6:01 PM, Lawrence D'Oliveiro wrote:

> In message <4cb4ba4e$0$1641$742ec...@news.sonic.net>, John Nagle
> wrote:
>
>> In general, if you find yourself making millions of SQL database
>> requests in a loop, you're doing it wrong.
>
> I’ve done this. Not millions, but certainly on the order of tens of
> thousands.

It's a scaling issue. Millions of INSERT or UPDATE requests can take
hours. That's when you need the speedup of bulk loading.

>> Big database loads are usually done by creating a text file with the
>> desired data, then using a LOAD DATA INFILE command.
>
> May not always be flexible enough.

True; you can't do anything with LOAD DATA INFILE but load data. If
you need selects or joins within inserts, you may have to do it the
long way.

>> This (in MySQL) is tens to hundreds of times faster than doing
>> individual INSERT or UPDATE commands.
>
> Why should that be? The database cannot read a text file any faster
> than I can.

Because the indices are updated in one big update, rather than after
each change to the database.

Also note that there are some issues with doing a huge volume of
updates in one MySQL InnoDB transaction. The system has to keep the
data needed to undo the updates, and there's a limit on the amount of
pending transaction history that can be stored.

It's common to load data into a new, empty table, then, once the
reload has succeeded, do a RENAME like CURRENT->OLD, NEW->CURRENT.
Renaming multiple tables is atomic and interlocked with other
operations, so you can replace an entire table on a live server.

I have some bulk databases which are updated from external sources.
The PhishTank database is updated with UPDATE statements every three
hours. But the Open Directory database is updated by downloading a big
gzipped file of XML, creating a new database table, then renaming.
That load takes hours, once a week.

(All this applies to MySQL, and to some extent, Postgres. If you're
using SQLite, it's different. But a million records is big for SQLite,
which is, after all, a "lite" database. At 10,000 records, you don't
have to worry about any of this stuff. At 1,000,000 records, you do.)

John Nagle
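The swap Nagle describes, written out for MySQL; a sketch assuming the
MySQLdb driver and illustrative table/file names (one RENAME TABLE
statement swaps both names atomically):

    import MySQLdb  # assumes the MySQLdb driver; any MySQL DB-API module works

    conn = MySQLdb.connect(db="mydb")  # connection details are illustrative
    cur = conn.cursor()

    # Build and populate the replacement table off to the side.
    cur.execute("CREATE TABLE foo_new LIKE foo")
    cur.execute("LOAD DATA LOCAL INFILE '/tmp/foo.tsv' INTO TABLE foo_new")

    # One atomic swap: readers see either the old table or the new
    # one, never a half-loaded table.
    cur.execute("RENAME TABLE foo TO foo_old, foo_new TO foo")
    cur.execute("DROP TABLE foo_old")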
Re: PEP 249 (database api) -- executemany() with iterable?
In message
<d2451907-c0d2-4571-b3e1-1e4d4f66a...@a7g2000prb.googlegroups.com>,
Steve Howell wrote:

> Bulk-load strategies usually solve one or more of these problems:
>
>   network latency

That’s not an issue. This is a bulk operation, after all.

>   index maintenance during the upload

There are usually options to temporarily turn this off.

>   parsing of the SQL

Not usually a problem, as far as I can tell.

>   reallocating storage

You mean for the SQL? Again, not usually a problem, as far as I can
tell.
Re: PEP 249 (database api) -- executemany() with iterable?
In message <4cb5e659$0$1650$742ec...@news.sonic.net>, John Nagle wrote:

> Also note that there are some issues with doing a huge volume of
> updates in one MySQL InnoDB transaction. The system has to keep the
> data needed to undo the updates, and there's a limit on the amount of
> pending transaction history that can be stored.

How does “load data” avoid this? Is that not a transaction too?

> It's common to load data into a new, empty table, then, once the
> reload has succeeded, do a RENAME like CURRENT->OLD, NEW->CURRENT.
> Renaming multiple tables is atomic and interlocked with other
> operations, so you can replace an entire table on a live server.

Is that to avoid impacting searches on the existing data while loading
the new records?

Seems to me this isn’t going to help, since both old and new tables
are on the same disk, after all. And it’s the disk access that’s the
bottleneck.
Re: PEP 249 (database api) -- executemany() with iterable?
On 12 Oct, 16:10, Roy Smith <r...@panix.com> wrote:

> PEP 249 says about executemany():
>
>     Prepare a database operation (query or command) and then execute
>     it against all parameter sequences or mappings found in the
>     sequence seq_of_parameters.
>
> are there any plans to update the api to allow an iterable instead of
> a sequence?

I'm not understanding (probably me). Example?
Re: PEP 249 (database api) -- executemany() with iterable?
On Oct 12, 1:20 pm, Jon Clements <jon...@googlemail.com> wrote:

> On 12 Oct, 16:10, Roy Smith <r...@panix.com> wrote:
>
>> PEP 249 says about executemany():
>>
>>     Prepare a database operation (query or command) and then execute
>>     it against all parameter sequences or mappings found in the
>>     sequence seq_of_parameters.
>>
>> are there any plans to update the api to allow an iterable instead
>> of a sequence?
>
> I'm not understanding (probably me). Example?

I have a dictionary, d, which has a million items in it, and want to
do something like:

    executemany("update foo set bar = %s where id = %s", d.iteritems())

If executemany accepted an iterable, that would work. But, it only
accepts a sequence, so I need to do:

    executemany("update foo set bar = %s where id = %s", d.items())

which generates a million-item temporary list. Or am I misunderstanding
the PEP?
Re: PEP 249 (database api) -- executemany() with iterable?
On 12 Oct, 18:32, Roy Smith <r...@panix.com> wrote:

> On Oct 12, 1:20 pm, Jon Clements <jon...@googlemail.com> wrote:
>
>> On 12 Oct, 16:10, Roy Smith <r...@panix.com> wrote:
>>
>>> PEP 249 says about executemany():
>>>
>>>     Prepare a database operation (query or command) and then
>>>     execute it against all parameter sequences or mappings found in
>>>     the sequence seq_of_parameters.
>>>
>>> are there any plans to update the api to allow an iterable instead
>>> of a sequence?
>>
>> I'm not understanding (probably me). Example?
>
> I have a dictionary, d, which has a million items in it, and want to
> do something like:
>
>     executemany("update foo set bar = %s where id = %s",
>                 d.iteritems())
>
> If executemany accepted an iterable, that would work. But, it only
> accepts a sequence, so I need to do:
>
>     executemany("update foo set bar = %s where id = %s", d.items())
>
> which generates a million-item temporary list. Or am I
> misunderstanding the PEP?

Interesting, but here's my guess...

Replace d.items() with:

    itertools.repeat(('a', 'b'))

So, if you have a sequence, which has a length and known size, at
least you can have an attempt at the DB operations: whether the
transaction fails or not is another thing... In short, a sequence is
finite, while an iterable may be infinite.

That's just my guess and makes sense to me at least!

Jon.
Re: PEP 249 (database api) -- executemany() with iterable?
On 10/12/2010 11:10 AM, Roy Smith wrote:

> PEP 249 says about executemany():
>
>     Prepare a database operation (query or command) and then execute
>     it against all parameter sequences or mappings found in the
>     sequence seq_of_parameters.
>
> are there any plans to update the api to allow an iterable instead of
> a sequence?

That question would best be addressed to the pep author Marc-André
Lemburg, mal at lemburg.com

-- 
Terry Jan Reedy
Re: PEP 249 (database api) -- executemany() with iterable?
In article
<2eeb1c54-83f5-4375-93fb-478bdbd7e...@j25g2000yqa.googlegroups.com>,
Jon Clements <jon...@googlemail.com> wrote:

> On 12 Oct, 18:32, Roy Smith <r...@panix.com> wrote:
>
>> On Oct 12, 1:20 pm, Jon Clements <jon...@googlemail.com> wrote:
>>
>>> On 12 Oct, 16:10, Roy Smith <r...@panix.com> wrote:
>>>
>>>> PEP 249 says about executemany():
>>>>
>>>>     Prepare a database operation (query or command) and then
>>>>     execute it against all parameter sequences or mappings found
>>>>     in the sequence seq_of_parameters.
>>>>
>>>> are there any plans to update the api to allow an iterable instead
>>>> of a sequence?
>>>
>>> I'm not understanding (probably me). Example?
>>
>> I have a dictionary, d, which has a million items in it, and want to
>> do something like:
>>
>>     executemany("update foo set bar = %s where id = %s",
>>                 d.iteritems())
>>
>> If executemany accepted an iterable, that would work. But, it only
>> accepts a sequence, so I need to do:
>>
>>     executemany("update foo set bar = %s where id = %s", d.items())
>>
>> which generates a million-item temporary list. Or am I
>> misunderstanding the PEP?
>
> Interesting, but here's my guess...
>
> Replace d.items() with:
>
>     itertools.repeat(('a', 'b'))
>
> So, if you have a sequence, which has a length and known size, at
> least you can have an attempt at the DB operations: whether the
> transaction fails or not is another thing... In short, a sequence is
> finite, while an iterable may be infinite.

Also, keep in mind that PEP 249 DB adapter implementations are
typically wrappers around a lower-level client library for a
particular DB implementation, and that most of those client APIs -
written in C - will likely require all of the items to be passed in
one call. If so, the DB adapter would need to immediately evaluate the
iterable and produce a list in memory anyway.

-- 
Ned Deily, n...@acm.org
Re: PEP 249 (database api) -- executemany() with iterable?
On 12 Oct, 18:53, Jon Clements <jon...@googlemail.com> wrote:

> On 12 Oct, 18:32, Roy Smith <r...@panix.com> wrote:
>
>> On Oct 12, 1:20 pm, Jon Clements <jon...@googlemail.com> wrote:
>>
>>> On 12 Oct, 16:10, Roy Smith <r...@panix.com> wrote:
>>>
>>>> PEP 249 says about executemany():
>>>>
>>>>     Prepare a database operation (query or command) and then
>>>>     execute it against all parameter sequences or mappings found
>>>>     in the sequence seq_of_parameters.
>>>>
>>>> are there any plans to update the api to allow an iterable instead
>>>> of a sequence?
>>>
>>> I'm not understanding (probably me). Example?
>>
>> I have a dictionary, d, which has a million items in it, and want to
>> do something like:
>>
>>     executemany("update foo set bar = %s where id = %s",
>>                 d.iteritems())
>>
>> If executemany accepted an iterable, that would work. But, it only
>> accepts a sequence, so I need to do:
>>
>>     executemany("update foo set bar = %s where id = %s", d.items())
>>
>> which generates a million-item temporary list. Or am I
>> misunderstanding the PEP?
>
> Interesting, but here's my guess...
>
> Replace d.items() with:
>
>     itertools.repeat(('a', 'b'))
>
> So, if you have a sequence, which has a length and known size, at
> least you can have an attempt at the DB operations: whether the
> transaction fails or not is another thing... In short, a sequence is
> finite, while an iterable may be infinite.
>
> That's just my guess and makes sense to me at least!
>
> Jon.

Actually, thinking about it some more, I would take the following
approach (this is only loosely to do with the Python DB API, mind
you...):

1) Start a transaction
2) Create a temporary table
3) Bulk insert your million update records to the temp table (from my
   understanding of the PEP, executemany() is allowed to repeatedly
   call execute() unless it can do something cleverer)
4) Execute an update with a from statement joining your main table and
   temp table (pretty sure that's ANSI standard, and DBs should
   support it -- embedded ones may not though, but if you're dealing
   with 1mil records, I'm taking a guess you're not dealing with
   embedded)
5) End the transaction

Far more efficient, as repeated executes of 'update' hit the DB one
row at a time, while a single update with a from clause gives the DB a
chance to optimise it.

Hope that makes sense, lemme know.

Cheers,

Jon.
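A sketch of Jon's five steps, using sqlite3 so it runs self-contained;
the foo/bar/id names follow Roy's example and the temp-table name is
arbitrary. Note that sqlite spells step 4 as a correlated subquery
rather than a vendor UPDATE ... FROM:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, bar TEXT)")
    conn.executemany("INSERT INTO foo VALUES (?, ?)",
                     [(1, "old"), (2, "old"), (3, "old")])

    updates = {1: "new-1", 3: "new-3"}  # stand-in for the million items

    # 1) the transaction is implicit; 2) create a temporary table
    conn.execute("CREATE TEMP TABLE pending (id INTEGER PRIMARY KEY, bar TEXT)")
    # 3) bulk insert the update records
    conn.executemany("INSERT INTO pending (bar, id) VALUES (?, ?)",
                     ((bar, id_) for id_, bar in updates.iteritems()))
    # 4) one set-based update joining main and temp tables
    conn.execute("""
        UPDATE foo
        SET bar = (SELECT bar FROM pending WHERE pending.id = foo.id)
        WHERE id IN (SELECT id FROM pending)""")
    # 5) end the transaction
    conn.commit()

    print conn.execute("SELECT * FROM foo ORDER BY id").fetchall()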
Re: PEP 249 (database api) -- executemany() with iterable?
On Tue, 12 Oct 2010 13:58:37 -0400, Terry Reedy <tjre...@udel.edu>
wrote:

> On 10/12/2010 11:10 AM, Roy Smith wrote:
>
>> PEP 249 says about executemany():
>>
>>     Prepare a database operation (query or command) and then execute
>>     it against all parameter sequences or mappings found in the
>>     sequence seq_of_parameters.
>>
>> are there any plans to update the api to allow an iterable instead
>> of a sequence?
>
> That question would best be addressed to the pep author Marc-André
> Lemburg, mal at lemburg.com

Or, rather, to python-ideas. Python development isn't supposed to
happen in private.

Regards

Antoine.
Re: PEP 249 (database api) -- executemany() with iterable?
On Oct 12, 2010, at 8:35 PM, Jon Clements wrote:

> 4) Execute an update with a from statement joining your main table
>    and temp table (pretty sure that's ANSI standard, and DBs should
>    support it -- embedded ones may not though, but if you're dealing
>    with 1mil records, I'm taking a guess you're not dealing with
>    embedded)

Not ANSI. Perhaps with a subquery, which is ANSI-compliant.
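The two forms side by side, against the same hypothetical foo/pending
tables; the first is the vendor extension Jon had in mind (Postgres
and SQL Server accept it), the second the ANSI-style correlated
subquery:

    # Vendor UPDATE ... FROM extension (not ANSI):
    vendor_sql = """
        UPDATE foo SET bar = pending.bar
        FROM pending
        WHERE foo.id = pending.id"""

    # ANSI-compliant equivalent using a correlated subquery:
    ansi_sql = """
        UPDATE foo
        SET bar = (SELECT bar FROM pending WHERE pending.id = foo.id)
        WHERE id IN (SELECT id FROM pending)"""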
Re: PEP 249 (database api) -- executemany() with iterable?
On 12.10.2010 17:10, Roy Smith wrote:

> [A]re there any plans to update the api to allow an iterable instead
> of a sequence?

sqlite3 (standard library, python 2.6.6, Windows 32Bit) does that
already::

    import sqlite3 as sql

    connection = sql.connect(":memory:")
    cursor = connection.execute("""
        CREATE TABLE test (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            text TEXT);""")
    connection.commit()
    cursor.executemany(
        "INSERT INTO test (text) VALUES ( ? );",
        # A generator expression - delivers one row at a time
        (("hello nr %03d!" % i,) for i in xrange(100)))
    connection.commit()
    cursor.execute("SELECT * FROM test")
    for id_, text in cursor.fetchall():
        print text, id_
Re: PEP 249 (database api) -- executemany() with iterable?
On 10/12/2010 11:35 AM, Jon Clements wrote:

> On 12 Oct, 18:53, Jon Clements <jon...@googlemail.com> wrote:
>
>> On 12 Oct, 18:32, Roy Smith <r...@panix.com> wrote:
>>
>>> On Oct 12, 1:20 pm, Jon Clements <jon...@googlemail.com> wrote:
>>>
>>>> On 12 Oct, 16:10, Roy Smith <r...@panix.com> wrote:
>>>>
>>>>> PEP 249 says about executemany():
>>>>>
>>>>>     Prepare a database operation (query or command) and then
>>>>>     execute it against all parameter sequences or mappings found
>>>>>     in the sequence seq_of_parameters.
>>>>>
>>>>> are there any plans to update the api to allow an iterable
>>>>> instead of a sequence?
>>>>
>>>> I'm not understanding (probably me). Example?
>>>
>>> I have a dictionary, d, which has a million items in it, and want
>>> to do something like:
>>>
>>>     executemany("update foo set bar = %s where id = %s",
>>>                 d.iteritems())
>>>
>>> If executemany accepted an iterable, that would work. But, it only
>>> accepts a sequence, so I need to do:
>>>
>>>     executemany("update foo set bar = %s where id = %s", d.items())
>>>
>>> which generates a million-item temporary list. Or am I
>>> misunderstanding the PEP?
>>
>> Interesting, but here's my guess...
>>
>> Replace d.items() with:
>>
>>     itertools.repeat(('a', 'b'))
>>
>> So, if you have a sequence, which has a length and known size, at
>> least you can have an attempt at the DB operations: whether the
>> transaction fails or not is another thing... In short, a sequence is
>> finite, while an iterable may be infinite.
>>
>> That's just my guess and makes sense to me at least!
>>
>> Jon.
>
> Actually, thinking about it some more, I would take the following
> approach (this is only loosely to do with the Python DB API, mind
> you...):
>
> 1) Start a transaction
> 2) Create a temporary table
> 3) Bulk insert your million update records to the temp table (from
>    my understanding of the PEP, executemany() is allowed to
>    repeatedly call execute() unless it can do something cleverer)
> 4) Execute an update with a from statement joining your main table
>    and temp table (pretty sure that's ANSI standard, and DBs should
>    support it -- embedded ones may not though, but if you're dealing
>    with 1mil records, I'm taking a guess you're not dealing with
>    embedded)
> 5) End the transaction
>
> Far more efficient, as repeated executes of 'update' hit the DB one
> row at a time, while a single update with a from clause gives the DB
> a chance to optimise it.
>
> Hope that makes sense, lemme know.

Right. In general, if you find yourself making millions of SQL
database requests in a loop, you're doing it wrong.

Big database loads are usually done by creating a text file with the
desired data, then using a LOAD DATA INFILE command. This (in MySQL)
is tens to hundreds of times faster than doing individual INSERT or
UPDATE commands.

John Nagle
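A sketch of the LOAD DATA route Nagle describes, assuming the MySQLdb
driver and illustrative file/table names; Python writes the data out
as a tab-separated file and the server ingests it in one pass:

    import csv
    import MySQLdb  # any MySQL DB-API driver would do

    rows = [(1, "alpha"), (2, "beta")]  # stand-in for the real bulk data

    # Dump the data as a tab-separated text file...
    with open("/tmp/bulk.tsv", "wb") as f:
        csv.writer(f, delimiter="\t").writerows(rows)

    # ...and let the server read it in a single bulk pass.
    conn = MySQLdb.connect(db="mydb")  # connection details are illustrative
    conn.cursor().execute(
        "LOAD DATA LOCAL INFILE '/tmp/bulk.tsv' INTO TABLE foo "
        "FIELDS TERMINATED BY '\\t'")
    conn.commit()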
Re: PEP 249 (database api) -- executemany() with iterable?
In message <4cb4ba4e$0$1641$742ec...@news.sonic.net>, John Nagle wrote:

> In general, if you find yourself making millions of SQL database
> requests in a loop, you're doing it wrong.

I’ve done this. Not millions, but certainly on the order of tens of
thousands.

> Big database loads are usually done by creating a text file with the
> desired data, then using a LOAD DATA INFILE command.

May not always be flexible enough.

> This (in MySQL) is tens to hundreds of times faster than doing
> individual INSERT or UPDATE commands.

Why should that be? The database cannot read a text file any faster
than I can.
Re: PEP 249 (database api) -- executemany() with iterable?
On Oct 12, 6:01 pm, Lawrence D'Oliveiro
<l...@geek-central.gen.new_zealand> wrote:

> In message <4cb4ba4e$0$1641$742ec...@news.sonic.net>, John Nagle
> wrote:
>
>> In general, if you find yourself making millions of SQL database
>> requests in a loop, you're doing it wrong.
>
> I’ve done this. Not millions, but certainly on the order of tens of
> thousands.
>
>> Big database loads are usually done by creating a text file with the
>> desired data, then using a LOAD DATA INFILE command.
>
> May not always be flexible enough.
>
>> This (in MySQL) is tens to hundreds of times faster than doing
>> individual INSERT or UPDATE commands.
>
> Why should that be? The database cannot read a text file any faster
> than I can.

Bulk-load strategies usually solve one or more of these problems:

  network latency
  index maintenance during the upload
  parsing of the SQL
  reallocating storage

They usually come at some cost or inconvenience:

  only work locally
  indexes not available during the bulk load
  requires non-SQL data format
  up-front cost to preallocate storage

The exact benefits and costs will vary according to the system, of
course. Also, you might employ multiple strategies; for example, for a
medium-sized problem, there may be no need to sacrifice the
familiarity and convenience of SQL if you can arrange for large
batches of SQL to be submitted from a local client. If indexes are the
problem, and you don't need 24/7 availability, you can drop them and
rebuild them.
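The drop-and-rebuild tactic from the last paragraph, as a small helper
around a DB-API cursor; the index and table names are illustrative,
and DROP INDEX syntax varies by engine (MySQL, for one, wants
DROP INDEX idx_foo_bar ON foo):

    def bulk_load_without_indexes(cursor, load_rows):
        # Dropping the index first means it is rebuilt once at the
        # end instead of being maintained on every inserted row.
        cursor.execute("DROP INDEX idx_foo_bar")
        load_rows(cursor)  # the actual bulk insert, supplied by the caller
        cursor.execute("CREATE INDEX idx_foo_bar ON foo (bar)")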