>
> if you're using Postgresql, there's a vastly easier technique to use
> which is just to pre-fetch from the sequence:
> identities = [
> val for val, in session.execute(
> "select nextval('mytable_seq') from "
> "generate_series(1,%s)" % len(my_objects))
> )
> ]
> for ident, obj in zip(identities, my_objects):
> obj.pk = ident
Wow, that's a great idea! I got it working for most of our models. I have
some questions about how inserts for joined-table inheritance tables are
batched together, but I'll ask them in a separate post since they're
somewhat unrelated to this.
So the complexity of adding multi-values insert with sequences would
> benefit an extremely narrow set of use cases, would be very labor
> intensive to implement and maintain, and is unnecessary for the single
> target platform in which this case would work.
>
That makes sense, thanks for the explanation!
On Monday, October 9, 2017 at 8:44:51 AM UTC-7, Mike Bayer wrote:
>
> On Mon, Oct 9, 2017 at 4:15 AM, <[email protected] <javascript:>>
> wrote:
> > Hello! I've spent some time looking at SQLAlchemy's ability to batch
> > inserts, and have a few questions about bulk_save_objects (and flushing
> in
> > general).
> >
> > Two statements that I think are true:
> >
> > Right now, bulk_save_objects does not fetch primary keys for inserted
> rows
> > (unless return_defaults is passed in, which mitigates the performance
> gains
> > by performing one insert statement per model).
>
> the point of bulk save is that the objects passed to it are considered
> as throwaway after the operation, to avoid the overhead of maintaining
> the objects' persistence state in memory. Using it with
> return_defaults is nearly pointless as this will blow away the
> efficiency gains you might get from the bulk operation, and is there
> mostly to allow the joined-table inheritance use case to be usable.
>
>
> > When running db.session.flush(), it looks like SQLAlchemy can batch
> model
> > creation into a single multi-value insert statement *only if all of the
> > primary keys are already defined on the model*. (Verified by looking at
> > sqlalchemy/lib/sqlalchemy/orm/persistence.py:_emit_insert_statements)
> >
> >
> > Questions:
> >
> > Would it be possible for bulk_save_objects to fetch primary keys while
> still
> > batching INSERTs into a single query?
>
> What do you mean "single query", do you mean, executemany()? the
> answer is no because DBAPI drivers don't return result sets with
> executemany(). Do you mean, a single INSERT..VALUES with all the
> parameters in one statement ? The answer is no because SQLAlchemy
> doesn't include multi-values as a transparent option; the DBAPI
> drivers instead make use of the multi-values syntax within their
> executemany() implementations, where again, they don't return result
> sets.
>
> Right now, only the MySQL drivers do this by default, the psycopg2
> driver does it with a recently added option that SQLAlchemy does not
> directly support, however you can set up via a connection event. For
> psycopg2 it can vastly speed up inserts as psycopg2's normal
> executemany() implementation has some performance issues.
>
>
> (This
> > would help with e.g. inserting a bunch of rows into a table with an
> > auto-incrementing primary key).
>
> if you're using Postgresql, there's a vastly easier technique to use
> which is just to pre-fetch from the sequence:
>
> identities = [
> val for val, in session.execute(
> "select nextval('mytable_seq') from "
> "generate_series(1,%s)" % len(my_objects))
> )
> ]
> for ident, obj in zip(identities, my_objects):
> obj.pk = ident
>
>
> Now you don't need to do RETURNING or anything and the inserts can be
> at their top efficiency.
>
>
>
> >
> > At least in Postgres (haven't verified for other databases), it looks
> like
> > one can use RETURNING for inserts with multi-value VALUES clauses. At
> the
> > surface, it seems like one could extend the current behavior for a
> single
> > row INSERT:
> >
> > INSERT INTO table ... VALUES (1) RETURNING id
> >
> > to multi-value INSERTS:
> >
> > INSERT INTO table ... VALUES (1), (2), ... RETURNING id
> >
> > and get all the benefits of the ORM while still batching inserts.
>
>
> This would be an enormous undertaking to implement, test, and release.
> It would have to exist as an entirely additional series of codepaths
> within persistence.py as multi-valued INSERT is not available on most
> databases as well as on earlier versions of the databases that do
> support it. For all of that complexity, the approach would work on
> exactly: Postgresql only - SQLite and MySQL don't support RETURNING,
> Oracle, SQL Server and others don't support INSERT..VALUES with
> multiple sets.
>
> If you are using Postgresql, you can instead pre-fetch the sequence up
> front. This can even be integrated into a before_flush() event.
>
> So the complexity of adding multi-values insert with sequences would
> benefit an extremely narrow set of use cases, would be very labor
> intensive to implement and maintain, and is unnecessary for the single
> target platform in which this case would work.
>
>
> >
> > I'm sure this is something that was thought about already, so I'd mostly
> > love to hear any thoughts about what makes this hard. We have a lot of
> > complex relationships and joined-table inheritance, so working with
> > bulk_save_objects has been a bit challenging. From the comments in
> > sqlalchemy/lib/sqlalchemy/sql/dml.py:return_defaults(), it seems like
> > SQLAlchemy is already aware that RETURNING for insert statements with
> > multi-values VALUES clauses is supported, so it's possible there is a
> reason
> > that this is hard or can't be done.
> >
> > This is also something that, if it just a “missing feature”, I would
> love to
> > learn more and see if I could contribute.
> >
> > --
> > SQLAlchemy -
> > The Python SQL Toolkit and Object Relational Mapper
> >
> > http://www.sqlalchemy.org/
> >
> > To post example code, please provide an MCVE: Minimal, Complete, and
> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> > description.
> > ---
> > You received this message because you are subscribed to the Google
> Groups
> > "sqlalchemy" group.
> > To unsubscribe from this group and stop receiving emails from it, send
> an
> > email to [email protected] <javascript:>.
> > To post to this group, send email to [email protected]
> <javascript:>.
> > Visit this group at https://groups.google.com/group/sqlalchemy.
> > For more options, visit https://groups.google.com/d/optout.
>
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
http://www.sqlalchemy.org/
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable
Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.