On 02/29/2016 05:38 PM, Alex Hewson wrote:
Hello All,

I'm trying to use the new bulk_save_objects() to improve performance on
bulk inserts, and have run into a problem.  If bulk_save_objects() is
used to save objects of a polymorphic class..

 1. They are created correctly in the DB, with polymorphic type column
    populated correctly
 2. BUT queries for the new objects will return one of incorrect type.
    In my case I'm getting instances of Child1 back when I would expect
    to get a Child2.


turn on echo=True, and you'll see this:


INSERT INTO "Child1" DEFAULT VALUES
2016-02-29 17:48:11,349 INFO sqlalchemy.engine.base.Engine ((), (), (), (), (), (), (), () ... displaying 10 of 1000 total bound parameter sets ... (), ())

what you will notice here is that this is the Child1 table receiving entirely empty rows; the primary key values from Entity are nowhere to be found. SQLite does not enforce foreign keys by default so it's just auto-generating identifiers here, something that wouldn't happen on most other databases where this column wouldn't work as an autoincrement by default. If you run it on Postgresql you get:

sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) null value in column "MyId" violates not-null constraint
DETAIL:  Failing row contains (null).
[SQL: 'INSERT INTO "Child1" DEFAULT VALUES'] [parameters: ({}, {}, {}, {}, {}, {}, {}, {} ... displaying 10 of 1000 total bound parameter sets ... {}, {})]

What's happening here is documented, however the documentation for bulk is a little long and the documentation referring to the use case here might benefit from a little more boldface and probably should be more clearly listed as "will not work", instead of a somewhat casual "however". At http://docs.sqlalchemy.org/en/rel_1_0/orm/persistence_techniques.html#orm-compatibility (emphasis added):

Multi-table mappings, such as joined-inheritance - **however**, an object to be inserted across multiple tables either **needs to have primary key identifiers fully populated ahead of time**, else the Session.bulk_save_objects.return_defaults flag must be used, which will greatly reduce the performance benefits

what we mean here is this:

c1_many = [Child1(Id=i+1, MyId=i+1, Content="c1inst_%d"%i) for i in range(0,1000)] c2_many = [Child2(Id=i+1001, MyId=i+1001, Content="c2inst_%d"%i) for i in range(0,1000)]


In SQLAlchemy 1.1, things are much easier to spot, even if you're using a non-FK/non-autoincrement enforcing database like SQLite; running this program immediately catches the problem on the Python side:

sqlalchemy.exc.CompileError: Column 'Child1.MyId' is marked as a member of the primary key for table 'Child1', but has no Python-side or server-side default generator indicated, nor does it indicate 'autoincrement=True' or 'nullable=True', and no explicit value is passed. Primary key columns typically may not store NULL.

This is because 1.1 has changed the logic of the "autoincrement" flag and adds deeper checks for NULL primary key values as described at http://docs.sqlalchemy.org/en/latest/changelog/migration_11.html#the-autoincrement-directive-is-no-longer-implicitly-enabled-for-a-composite-primary-key-column.


So the mitigation for the fact that your specific test case silently fails include:

1. this whole issue only silently passes on SQLite, not on any of the higher volume databases where you'd want to use bulk operations in the first place

2. documentation here should be spruced up to list this practice as a **warning**, including that we should also have a boldface up in the earlier paragraph talking about fetching of inserted primary keys being disabled (this is the slowest part of the INSERT so has no place within bulk inserts, hence you must populate columns dependent on a PK up front which means the PK itself needs to be populated up front in those cases where you need it)

3. SQLAlchemy 1.1 won't let these INSERTs without a primary key value when the column is not configured as an "autoincrement" proceed

Thanks for the clear test case here.






The following code demonstrates the problem:

|
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

fromsqlalchemy importcreate_engine
fromsqlalchemy importColumn,Integer,SmallInteger,String,ForeignKey
fromsqlalchemy.orm importsessionmaker
fromsqlalchemy.ext.declarative importdeclarative_base

Base=declarative_base()

classEntity(Base):
   __tablename__ ='Entity'
Id=Column(Integer,primary_key=True,nullable=False)
Content=Column(String)
   _polytype =Column(SmallInteger,nullable=False)

   __mapper_args__ ={
'polymorphic_identity':1,
'polymorphic_on':_polytype
}

classChild1(Entity):
   __tablename__ ='Child1'
MyId=Column(ForeignKey("Entity.Id"),primary_key=True)
   __mapper_args__ ={'polymorphic_identity':11}

classChild2(Entity):
   __tablename__ ='Child2'
MyId=Column(ForeignKey("Entity.Id"),primary_key=True)
   __mapper_args__ ={'polymorphic_identity':12}


if__name__ =='__main__':
# engine = create_engine('sqlite:///:memory:', echo=False)
   engine =create_engine('sqlite:///test.db',echo=False)
Session=sessionmaker(bind=engine)
   sess =Session()
Base.metadata.create_all(engine)
   c1_many =[Child1(Content="c1inst_%d"%i)fori inrange(0,1000)]
   c2_many =[Child2(Content="c2inst_%d"%i)fori inrange(0,1000)]
   sess.bulk_save_objects(c1_many)
   sess.bulk_save_objects(c2_many)
# sess.add_all(c1_many)
# sess.add_all(c2_many)
   sess.flush()
   sess.commit()
forc insess.query(Child1):
assertisinstance(c,Child1)
forc insess.query(Child2):
assertisinstance(c,Child2)

|

All the calls to assert isinstance(c, Child1) complete successfully.
But once we start checking for Child2 - boom, we are still getting back
Child1 instances.

At first I wondered if I was misunderstanding SA's implementation of
polymorphism, so tried inserting rows the traditional way with
sess.add_all().  But that works fine so I think I've exposed a bug in
the new bulk_save_objects() code.

My environment is Python 3.5.1, SQLAlchemy==1.0.12, SQLite 3.8.10.2 on OSX.

--
You received this message because you are subscribed to the Google
Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected]
<mailto:[email protected]>.
To post to this group, send email to [email protected]
<mailto:[email protected]>.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Reply via email to