Re: [sqlalchemy] Batching INSERT statements

vineet Mon, 09 Oct 2017 22:27:42 -0700

>
> if you're using Postgresql, there's a vastly easier technique to use 
> which is just to pre-fetch from the sequence: 
> identities = [ 
>     val for val, in session.execute( 
>              "select nextval('mytable_seq') from " 
>              "generate_series(1,%s)" % len(my_objects)) 
>     ) 
> ] 
> for ident, obj in zip(identities, my_objects): 
>     obj.pk = ident


Wow, that's a great idea! I got it working for most of our models. I have 
some questions about how inserts for joined-table inheritance tables are 
batched together, but I'll ask them in a separate post since they're 
somewhat unrelated to this.

So the complexity of adding multi-values insert with sequences would 
> benefit an extremely narrow set of use cases, would be very labor 
> intensive to implement and maintain, and is unnecessary for the single 
> target platform in which this case would work. 
>
That makes sense, thanks for the explanation!

 

On Monday, October 9, 2017 at 8:44:51 AM UTC-7, Mike Bayer wrote:
>
> On Mon, Oct 9, 2017 at 4:15 AM,  <[email protected] <javascript:>> 
> wrote: 
> > Hello! I've spent some time looking at SQLAlchemy's ability to batch 
> > inserts, and have a few questions about bulk_save_objects (and flushing 
> in 
> > general). 
> > 
> > Two statements that I think are true: 
> > 
> > Right now, bulk_save_objects does not fetch primary keys for inserted 
> rows 
> > (unless return_defaults is passed in, which mitigates the performance 
> gains 
> > by performing one insert statement per model). 
>
> the point of bulk save is that the objects passed to it are considered 
> as throwaway after the operation, to avoid the overhead of maintaining 
> the objects' persistence state in memory.   Using it with 
> return_defaults is nearly pointless as this will blow away the 
> efficiency gains you might get from the bulk operation, and is there 
> mostly to allow the joined-table inheritance use case to be usable. 
>
>
> > When running db.session.flush(), it looks like SQLAlchemy can batch 
> model 
> > creation into a single multi-value insert statement *only if all of the 
> > primary keys are already defined on the model*. (Verified by looking at 
> > sqlalchemy/lib/sqlalchemy/orm/persistence.py:_emit_insert_statements) 
> > 
> > 
> > Questions: 
> > 
> > Would it be possible for bulk_save_objects to fetch primary keys while 
> still 
> > batching INSERTs into a single query? 
>
> What do you mean "single query", do you mean, executemany()?  the 
> answer is no because DBAPI drivers don't return result sets with 
> executemany().  Do you mean, a single INSERT..VALUES with all the 
> parameters in one statement ?  The answer is no because SQLAlchemy 
> doesn't include multi-values as a transparent option; the DBAPI 
> drivers instead make use of the multi-values syntax within their 
> executemany() implementations, where again, they don't return result 
> sets. 
>
> Right now, only the MySQL drivers do this by default, the psycopg2 
> driver does it with a recently added option that SQLAlchemy does not 
> directly support, however you can set up via a connection event.   For 
> psycopg2 it can vastly speed up inserts as psycopg2's normal 
> executemany() implementation has some performance issues. 
>
>
> (This 
> > would help with e.g. inserting a bunch of rows into a table with an 
> > auto-incrementing primary key). 
>
> if you're using Postgresql, there's a vastly easier technique to use 
> which is just to pre-fetch from the sequence: 
>
> identities = [ 
>     val for val, in session.execute( 
>              "select nextval('mytable_seq') from " 
>              "generate_series(1,%s)" % len(my_objects)) 
>     ) 
> ] 
> for ident, obj in zip(identities, my_objects): 
>     obj.pk = ident 
>
>
> Now you don't need to do RETURNING or anything and the inserts can be 
> at their top efficiency. 
>
>
>
> > 
> > At least in Postgres (haven't verified for other databases), it looks 
> like 
> > one can use RETURNING for inserts with multi-value VALUES clauses. At 
> the 
> > surface, it seems like one could extend the current behavior for a 
> single 
> > row INSERT: 
> > 
> > INSERT INTO table ... VALUES (1) RETURNING id 
> > 
> > to multi-value INSERTS: 
> > 
> > INSERT INTO table ... VALUES (1), (2), ... RETURNING id 
> > 
> > and get all the benefits of the ORM while still batching inserts. 
>
>
> This would be an enormous undertaking to implement, test, and release. 
>   It would have to exist as an entirely additional series of codepaths 
> within persistence.py as multi-valued INSERT is not available on most 
> databases as well as on earlier versions of the databases that do 
> support it.     For all of that complexity, the approach would work on 
> exactly:  Postgresql only - SQLite and MySQL don't support RETURNING, 
> Oracle, SQL Server and others don't support INSERT..VALUES with 
> multiple sets. 
>
> If you are using Postgresql, you can instead pre-fetch the sequence up 
> front.  This can even be integrated into a before_flush() event. 
>
> So the complexity of adding multi-values insert with sequences would 
> benefit an extremely narrow set of use cases, would be very labor 
> intensive to implement and maintain, and is unnecessary for the single 
> target platform in which this case would work. 
>
>
> > 
> > I'm sure this is something that was thought about already, so I'd mostly 
> > love to hear any thoughts about what makes this hard. We have a lot of 
> > complex relationships and joined-table inheritance, so working with 
> > bulk_save_objects has been a bit challenging. From the comments in 
> > sqlalchemy/lib/sqlalchemy/sql/dml.py:return_defaults(), it seems like 
> > SQLAlchemy is already aware that RETURNING for insert statements with 
> > multi-values VALUES clauses is supported, so it's possible there is a 
> reason 
> > that this is hard or can't be done. 
> > 
> > This is also something that, if it just a “missing feature”, I would 
> love to 
> > learn more and see if I could contribute. 
> > 
> > -- 
> > SQLAlchemy - 
> > The Python SQL Toolkit and Object Relational Mapper 
> > 
> > http://www.sqlalchemy.org/ 
> > 
> > To post example code, please provide an MCVE: Minimal, Complete, and 
> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> > description. 
> > --- 
> > You received this message because you are subscribed to the Google 
> Groups 
> > "sqlalchemy" group. 
> > To unsubscribe from this group and stop receiving emails from it, send 
> an 
> > email to [email protected] <javascript:>. 
> > To post to this group, send email to [email protected] 
> <javascript:>. 
> > Visit this group at https://groups.google.com/group/sqlalchemy. 
> > For more options, visit https://groups.google.com/d/optout. 
>

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Batching INSERT statements

Reply via email to