I've added 
https://bitbucket.org/zzzeek/sqlalchemy/issues/4109/support-psycopg2-batch-mode
which will have a code review up soon.

On Tue, Oct 10, 2017 at 1:06 PM, Mike Bayer <[email protected]> wrote:
> On Tue, Oct 10, 2017 at 1:47 AM,  <[email protected]> wrote:
>> Hello! SQLAlchemy has the (really useful) behavior of batching together
>> inserts when all the primary keys are defined. It looks like it groups
>> similar tables together (while maintaining insert order ) to minimize the
>> number of Insert statements sent to the database.
>>
>> I'm unsure what the expected behavior here is for single-table inheritance
>> tables. Here's some example models:
>> class Base(db.Model):
>>     __tablename__ = 'base'
>>     id = db.Column(db.Integer, primary_key=True)
>>     base_type = db.Column(db.String)
>>
>>     __mapper_args__ = {
>>         'polymorphic_on': base_type,
>>     }
>>
>>     def __init__(self, id_):
>>         self.id = id_
>>
>>
>> class SubBase1(Base):
>>     __mapper_args__ = {
>>         'polymorphic_identity': '1',
>>     }
>>
>>     col = db.Column(db.String)
>>
>>
>> class SubBase2(Base):
>>     __mapper_args__ = {
>>         'polymorphic_identity': '2',
>>     }
>>
>>     col2 = db.Column(db.String)
>>
>>
>> class OtherModel(db.Model):
>>     id = db.Column(db.Integer, primary_key=True)
>>
>>     def __init__(self, id_):
>>         self.id = id_
>>
>>
>> Base, SubBase, and SubBase2 form a tree, while OtherModel is just another
>> model.
>>
>> When creating alternating SubBase1s and OtherModels, SQLAlchemy batches the
>> INSERTs such that only two INSERT statements are sent to the DB:
>> (Snippet #1)
>> for i in xrange(0, 10, 2):
>>     db.session.add(SubBase1(i))
>>     db.session.add(OtherModel(i + 1))
>>
>> db.session.flush()  # Performs 2 insert statements, each with 5 elements in
>> the VALUES clause
>>
>> However, when creating alternating SubBase1s and SubBase2s, it actually
>> performs 10 separate insert statements:
>> (Snippet #2)
>> for i in xrange(0, 10, 2):
>>     db.session.add(SubBase1(i))
>>     db.session.add(SubBase2(i + 1))
>> db.session.flush()  # Performs 10 insert statements, each with 1 element in
>> the VALUES clause
>>
>>
>> It seems like SQLAlchemy maintains the insert_order *across* SubBase1 and
>> SubBase2, but also isn't able to do a single INSERT statement with 10
>> elements in the VALUES clause (or 2 INSERT statements with 5 elements each).
>>
>> Questions:
>> 1) Is Snippet #2 the desired behavior? I read through the unit-of-work
>> summary at http://www.aosabook.org/en/sqlalchemy.html, but I wasn't totally
>> sure what the desired behavior was for single-table inheritance models.
>
> yes because you'll note because sub1 and sub2 have different columns
> mapped, the actual columns in each INSERT statement are different.
> At the same time it is maintaining INSERT order into the table
> overall, so it has no choice but to break things up in this way.
>
>> 2) If this is desired behavior, do you have any suggestions of how to
>> optimize the number of queries here in e.g. a before_flush hook? Obviously
>> one way would be to rewrite our application such that inserts of like tables
>> happen together, but if there is a way to do this without rewriting the
>> application code, that would be even better.
>
> if your INSERTs are very simplistic then you can still consider using
> the bulk_save option, gathering your classes into lists and then doing
> your own bulk operation where you ensure all like-rows are grouped
> together ahead of time.
>
> Otherwise, if you really are trying to thread the needle through "I
> want full ORM flush convenience" plus "I need to optimize INSERTs into
> batches", you have one more option which is to manipulate the
> insert_order on the states like this:
>
> for i in xrange(0, 10, 2):
>     s.add(SubBase1(i))
>     s.add(SubBase2(i + 1))
>
> from sqlalchemy import inspect
>
> for idx, obj in enumerate(
>     sorted(
>         s.new,
>         key=lambda obj: (obj.__class__, inspect(obj).insert_order)
>     )
> ):
>     inspect(obj).insert_order = idx
>
>
> I've not documented "insert_order" as an official thing but I can support 
> that.
>
> But also if you're really trying to maximize INSERT performance you
> need to use the psycopg2 executemany() extensions:
> http://initd.org/psycopg/docs/extras.html#fast-execution-helpers.
> These were added specifically from an email thread I started with them
> in response to user complaints about performance, but SQLAlchemy has
> not yet built these in as features.
>
> I've done no testing at all of this, however if you have the
> resources, you can turn on statement logging on your Postgresql
> database which will hopefully show INSERT statements being collapsed
> into a single statement with multiple VALUES:
>
> from psycopg2 import extras
> from sqlalchemy import event
>
> e = create_engine("postgresql://scott:tiger@localhost/test", echo=True)
>
> @event.listens_for(e, "do_executemany")
> def do_executemany(cursor, statement, parameters, context):
>     context.dialect.supports_sane_multi_rowcount = False
>     extras.execute_batch(cursor, statement, parameters)
>     return True
>
> if you get the time to test this, please let me know as I do want to
> add support for "execute_batch" directly into the psycopg2 dialect.
> thanks!
>
>
>
>
>
>
> One option I've thought of is:
>>   - Overwrite the insert_order in the before_flush hook to rearrange the
>> models in the order I want. (Not sure if this has effects that I'm unaware
>> of though)
>>
>> Thanks!
>>
>> --
>> SQLAlchemy -
>> The Python SQL Toolkit and Object Relational Mapper
>>
>> http://www.sqlalchemy.org/
>>
>> To post example code, please provide an MCVE: Minimal, Complete, and
>> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
>> description.
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "sqlalchemy" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/sqlalchemy.
>> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Reply via email to