On 12/16/2015 08:33 AM, [email protected] wrote:
> Hi,
>
> Today we increased the performance of one of our services eightfold by
> changing the line:
>
>     connection.execute(insert_statement, values=[...])
>
> into
>
>     insert_statement = table.insert().values(values)
>     connection.execute(insert_statement)
>
> We are using the db driver psycopg2.
>
> The reason for this behaviour is that in the first case, the executemany
> method of the underlying psycopg2 cursor will be used, which uses a naive
> looping approach instead of a multirow insert
> (http://www.postgresql.org/message-id/ca+mi_8zeeg93nhdxizo0u16y-ygyn+fscxd34nsmktjkzqq...@mail.gmail.com).
> In the second case, the proper SQL (which will be a multirow insert with
> the postgres dialect) will be generated by SQLAlchemy and then sent to
> the execute method of the underlying cursor.
>
> As an application developer you would expect these code snippets to
> have the same performance characteristics, and we were very surprised
> at first by the results. Also, the example in the documentation at
> http://docs.sqlalchemy.org/en/latest/core/tutorial.html#executing-multiple-statements
> generates a multirow insert, which made us believe that is what we
> would get (although it does state clearly that the underlying db
> driver's executemany method will be used).
>
> We think this is an issue, since it is surprising behaviour from an
> application developer's point of view (preferably you should not need to
> know the details of the underlying db driver in the application layer),
> but we don't have a clear solution to the problem. Maybe add a note in
> the executing-multiple-statements section of the documentation warning
> about this behaviour with psycopg2. And/or even log a warning about this
> when using the multiple values API with the psycopg2 driver.
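For reference, a minimal sketch of the two calling styles described above. It uses an in-memory SQLite database and a made-up "things" table purely so the snippet runs as-is; the performance difference in question concerns the psycopg2/PostgreSQL executemany() path, not SQLite.

    # Sketch of the two insert styles; table and column names are hypothetical.
    from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

    engine = create_engine("sqlite://")
    metadata = MetaData()
    things = Table(
        "things", metadata,
        Column("id", Integer, primary_key=True),
        Column("name", String(50)),
    )
    metadata.create_all(engine)

    rows = [{"name": "row %d" % i} for i in range(100)]

    with engine.begin() as conn:
        # Style 1: one INSERT statement plus a list of parameter sets.
        # SQLAlchemy hands this to the DBAPI cursor.executemany(), which
        # psycopg2 (per the discussion in this thread) ran as a naive loop.
        conn.execute(things.insert(), rows)

        # Style 2: embed the rows into the statement itself.  On dialects
        # that support it (e.g. PostgreSQL), SQLAlchemy renders a single
        # multi-row "INSERT ... VALUES (...), (...), ..." string.
        conn.execute(things.insert().values(rows))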
The multiple "values" syntax to my knowledge is not standard SQL (e.g. is not available on any other database) and the bug in psycopg2 is that they do not use prepared statements for executemany(). Looping over the same INSERT statement many times is extremely fast when prepared statements are used. The multiple "values" syntax will also no longer exhibit the great performance you see if you need to insert hundreds of thousands of rows. Postgresql will be tasked with parsing a many-megabyte SQL string on every invocation which is very wasteful both in terms of memory and time. Basically the multiple-values syntax is not at all portable and is very awkward at every level. For the average SQL developer would be extremely surprising if SQLAlchemy's use of normal DBAPI executemany (https://www.python.org/dev/peps/pep-0249/#id18) were arbitrarily circumvented in this very specific case, leading to a new set of behavioral quirks and issues. You should definitely contact the psycopg2 developers and ask them as well why they don't switch to multiple-VALUES within an executemany() as well as why they aren't on prepared statements for this operation (if that's still the case). > > /Viktor > > > > > -- > You received this message because you are subscribed to the Google > Groups "sqlalchemy" group. > To unsubscribe from this group and stop receiving emails from it, send > an email to [email protected] > <mailto:[email protected]>. > To post to this group, send email to [email protected] > <mailto:[email protected]>. > Visit this group at https://groups.google.com/group/sqlalchemy. > For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at https://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.
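At the driver level, the same distinction can be sketched with psycopg2 directly. The DSN and the "things" table below are placeholders, and the snippet only illustrates the two statement shapes being discussed, not psycopg2's internals.

    import psycopg2

    # Placeholder connection string and table; adjust for a real database.
    conn = psycopg2.connect("dbname=test user=postgres")
    cur = conn.cursor()
    rows = [("row %d" % i,) for i in range(1000)]

    # PEP 249 executemany(): one parameterized statement, many parameter sets.
    # Per the discussion above, psycopg2 ran this as a loop of individual
    # INSERTs rather than using a prepared statement or a batched form.
    cur.executemany("INSERT INTO things (name) VALUES (%s)", rows)

    # A hand-rolled multi-row VALUES statement: one large SQL string that the
    # server has to parse in full, roughly what the PostgreSQL dialect renders
    # for insert().values([...]).
    values_sql = ",".join(
        cur.mogrify("(%s)", r).decode("utf-8") for r in rows
    )
    cur.execute("INSERT INTO things (name) VALUES " + values_sql)

    conn.commit()
    cur.close()
    conn.close()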
