Re: [sqlalchemy] Bulk Insert Broken for Polymorphism?

Mike Bayer Mon, 29 Feb 2016 15:05:46 -0800


On 02/29/2016 05:38 PM, Alex Hewson wrote:

Hello All,

I'm trying to use the new bulk_save_objects() to improve performance on
bulk inserts, and have run into a problem.  If bulk_save_objects() is
used to save objects of a polymorphic class..

 1. They are created correctly in the DB, with polymorphic type column
    populated correctly
 2. BUT queries for the new objects will return one of incorrect type.
    In my case I'm getting instances of Child1 back when I would expect
    to get a Child2.



turn on echo=True, and you'll see this:


INSERT INTO "Child1" DEFAULT VALUES

2016-02-29 17:48:11,349 INFO sqlalchemy.engine.base.Engine ((), (), (),(), (), (), (), () ... displaying 10 of 1000 total bound parameter sets... (), ())

what you will notice here is that this is the Child1 table receivingentirely empty rows; the primary key values from Entity are nowhere tobe found. SQLite does not enforce foreign keys by default so it's justauto-generating identifiers here, something that wouldn't happen on mostother databases where this column wouldn't work as an autoincrement bydefault. If you run it on Postgresql you get:

sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) null value incolumn "MyId" violates not-null constraint

DETAIL:  Failing row contains (null).

[SQL: 'INSERT INTO "Child1" DEFAULT VALUES'] [parameters: ({}, {}, {},{}, {}, {}, {}, {} ... displaying 10 of 1000 total bound parameter sets... {}, {})]

What's happening here is documented, however the documentation for bulkis a little long and the documentation referring to the use case heremight benefit from a little more boldface and probably should be moreclearly listed as "will not work", instead of a somewhat casual"however". Athttp://docs.sqlalchemy.org/en/rel_1_0/orm/persistence_techniques.html#orm-compatibility(emphasis added):

Multi-table mappings, such as joined-inheritance - **however**, anobject to be inserted across multiple tables either **needs to haveprimary key identifiers fully populated ahead of time**, else theSession.bulk_save_objects.return_defaults flag must be used, which willgreatly reduce the performance benefits


what we mean here is this:

c1_many = [Child1(Id=i+1, MyId=i+1, Content="c1inst_%d"%i) for i inrange(0,1000)]c2_many = [Child2(Id=i+1001, MyId=i+1001, Content="c2inst_%d"%i)for i in range(0,1000)]

In SQLAlchemy 1.1, things are much easier to spot, even if you're usinga non-FK/non-autoincrement enforcing database like SQLite; running thisprogram immediately catches the problem on the Python side:

sqlalchemy.exc.CompileError: Column 'Child1.MyId' is marked as amember of the primary key for table 'Child1', but has no Python-side orserver-side default generator indicated, nor does it indicate'autoincrement=True' or 'nullable=True', and no explicit value ispassed. Primary key columns typically may not store NULL.

This is because 1.1 has changed the logic of the "autoincrement" flagand adds deeper checks for NULL primary key values as described athttp://docs.sqlalchemy.org/en/latest/changelog/migration_11.html#the-autoincrement-directive-is-no-longer-implicitly-enabled-for-a-composite-primary-key-column.

So the mitigation for the fact that your specific test case silentlyfails include:

1. this whole issue only silently passes on SQLite, not on any of thehigher volume databases where you'd want to use bulk operations in thefirst place

2. documentation here should be spruced up to list this practice as a**warning**, including that we should also have a boldface up in theearlier paragraph talking about fetching of inserted primary keys beingdisabled (this is the slowest part of the INSERT so has no place withinbulk inserts, hence you must populate columns dependent on a PK up frontwhich means the PK itself needs to be populated up front in those caseswhere you need it)

3. SQLAlchemy 1.1 won't let these INSERTs without a primary key valuewhen the column is not configured as an "autoincrement" proceed


Thanks for the clear test case here.


The following code demonstrates the problem:

|
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

fromsqlalchemy importcreate_engine
fromsqlalchemy importColumn,Integer,SmallInteger,String,ForeignKey
fromsqlalchemy.orm importsessionmaker
fromsqlalchemy.ext.declarative importdeclarative_base

Base=declarative_base()

classEntity(Base):
   __tablename__ ='Entity'
Id=Column(Integer,primary_key=True,nullable=False)
Content=Column(String)
   _polytype =Column(SmallInteger,nullable=False)

   __mapper_args__ ={
'polymorphic_identity':1,
'polymorphic_on':_polytype
}

classChild1(Entity):
   __tablename__ ='Child1'
MyId=Column(ForeignKey("Entity.Id"),primary_key=True)
   __mapper_args__ ={'polymorphic_identity':11}

classChild2(Entity):
   __tablename__ ='Child2'
MyId=Column(ForeignKey("Entity.Id"),primary_key=True)
   __mapper_args__ ={'polymorphic_identity':12}


if__name__ =='__main__':
# engine = create_engine('sqlite:///:memory:', echo=False)
   engine =create_engine('sqlite:///test.db',echo=False)
Session=sessionmaker(bind=engine)
   sess =Session()
Base.metadata.create_all(engine)
   c1_many =[Child1(Content="c1inst_%d"%i)fori inrange(0,1000)]
   c2_many =[Child2(Content="c2inst_%d"%i)fori inrange(0,1000)]
   sess.bulk_save_objects(c1_many)
   sess.bulk_save_objects(c2_many)
# sess.add_all(c1_many)
# sess.add_all(c2_many)
   sess.flush()
   sess.commit()
forc insess.query(Child1):
assertisinstance(c,Child1)
forc insess.query(Child2):
assertisinstance(c,Child2)

|

All the calls to assert isinstance(c, Child1) complete successfully.
But once we start checking for Child2 - boom, we are still getting back
Child1 instances.

At first I wondered if I was misunderstanding SA's implementation of
polymorphism, so tried inserting rows the traditional way with
sess.add_all().  But that works fine so I think I've exposed a bug in
the new bulk_save_objects() code.

My environment is Python 3.5.1, SQLAlchemy==1.0.12, SQLite 3.8.10.2 on OSX.

--
You received this message because you are subscribed to the Google
Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected]
<mailto:[email protected]>.
To post to this group, send email to [email protected]
<mailto:[email protected]>.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


--
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Bulk Insert Broken for Polymorphism?

Reply via email to