I've added https://bitbucket.org/zzzeek/sqlalchemy/issues/4109/support-psycopg2-batch-mode which will have a code review up soon.
On Tue, Oct 10, 2017 at 1:06 PM, Mike Bayer <[email protected]> wrote: > On Tue, Oct 10, 2017 at 1:47 AM, <[email protected]> wrote: >> Hello! SQLAlchemy has the (really useful) behavior of batching together >> inserts when all the primary keys are defined. It looks like it groups >> similar tables together (while maintaining insert order ) to minimize the >> number of Insert statements sent to the database. >> >> I'm unsure what the expected behavior here is for single-table inheritance >> tables. Here's some example models: >> class Base(db.Model): >> __tablename__ = 'base' >> id = db.Column(db.Integer, primary_key=True) >> base_type = db.Column(db.String) >> >> __mapper_args__ = { >> 'polymorphic_on': base_type, >> } >> >> def __init__(self, id_): >> self.id = id_ >> >> >> class SubBase1(Base): >> __mapper_args__ = { >> 'polymorphic_identity': '1', >> } >> >> col = db.Column(db.String) >> >> >> class SubBase2(Base): >> __mapper_args__ = { >> 'polymorphic_identity': '2', >> } >> >> col2 = db.Column(db.String) >> >> >> class OtherModel(db.Model): >> id = db.Column(db.Integer, primary_key=True) >> >> def __init__(self, id_): >> self.id = id_ >> >> >> Base, SubBase, and SubBase2 form a tree, while OtherModel is just another >> model. >> >> When creating alternating SubBase1s and OtherModels, SQLAlchemy batches the >> INSERTs such that only two INSERT statements are sent to the DB: >> (Snippet #1) >> for i in xrange(0, 10, 2): >> db.session.add(SubBase1(i)) >> db.session.add(OtherModel(i + 1)) >> >> db.session.flush() # Performs 2 insert statements, each with 5 elements in >> the VALUES clause >> >> However, when creating alternating SubBase1s and SubBase2s, it actually >> performs 10 separate insert statements: >> (Snippet #2) >> for i in xrange(0, 10, 2): >> db.session.add(SubBase1(i)) >> db.session.add(SubBase2(i + 1)) >> db.session.flush() # Performs 10 insert statements, each with 1 element in >> the VALUES clause >> >> >> It seems like SQLAlchemy maintains the insert_order *across* SubBase1 and >> SubBase2, but also isn't able to do a single INSERT statement with 10 >> elements in the VALUES clause (or 2 INSERT statements with 5 elements each). >> >> Questions: >> 1) Is Snippet #2 the desired behavior? I read through the unit-of-work >> summary at http://www.aosabook.org/en/sqlalchemy.html, but I wasn't totally >> sure what the desired behavior was for single-table inheritance models. > > yes because you'll note because sub1 and sub2 have different columns > mapped, the actual columns in each INSERT statement are different. > At the same time it is maintaining INSERT order into the table > overall, so it has no choice but to break things up in this way. > >> 2) If this is desired behavior, do you have any suggestions of how to >> optimize the number of queries here in e.g. a before_flush hook? Obviously >> one way would be to rewrite our application such that inserts of like tables >> happen together, but if there is a way to do this without rewriting the >> application code, that would be even better. > > if your INSERTs are very simplistic then you can still consider using > the bulk_save option, gathering your classes into lists and then doing > your own bulk operation where you ensure all like-rows are grouped > together ahead of time. > > Otherwise, if you really are trying to thread the needle through "I > want full ORM flush convenience" plus "I need to optimize INSERTs into > batches", you have one more option which is to manipulate the > insert_order on the states like this: > > for i in xrange(0, 10, 2): > s.add(SubBase1(i)) > s.add(SubBase2(i + 1)) > > from sqlalchemy import inspect > > for idx, obj in enumerate( > sorted( > s.new, > key=lambda obj: (obj.__class__, inspect(obj).insert_order) > ) > ): > inspect(obj).insert_order = idx > > > I've not documented "insert_order" as an official thing but I can support > that. > > But also if you're really trying to maximize INSERT performance you > need to use the psycopg2 executemany() extensions: > http://initd.org/psycopg/docs/extras.html#fast-execution-helpers. > These were added specifically from an email thread I started with them > in response to user complaints about performance, but SQLAlchemy has > not yet built these in as features. > > I've done no testing at all of this, however if you have the > resources, you can turn on statement logging on your Postgresql > database which will hopefully show INSERT statements being collapsed > into a single statement with multiple VALUES: > > from psycopg2 import extras > from sqlalchemy import event > > e = create_engine("postgresql://scott:tiger@localhost/test", echo=True) > > @event.listens_for(e, "do_executemany") > def do_executemany(cursor, statement, parameters, context): > context.dialect.supports_sane_multi_rowcount = False > extras.execute_batch(cursor, statement, parameters) > return True > > if you get the time to test this, please let me know as I do want to > add support for "execute_batch" directly into the psycopg2 dialect. > thanks! > > > > > > > One option I've thought of is: >> - Overwrite the insert_order in the before_flush hook to rearrange the >> models in the order I want. (Not sure if this has effects that I'm unaware >> of though) >> >> Thanks! >> >> -- >> SQLAlchemy - >> The Python SQL Toolkit and Object Relational Mapper >> >> http://www.sqlalchemy.org/ >> >> To post example code, please provide an MCVE: Minimal, Complete, and >> Verifiable Example. See http://stackoverflow.com/help/mcve for a full >> description. >> --- >> You received this message because you are subscribed to the Google Groups >> "sqlalchemy" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> To post to this group, send email to [email protected]. >> Visit this group at https://groups.google.com/group/sqlalchemy. >> For more options, visit https://groups.google.com/d/optout. -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at https://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.
