On 02/29/2016 05:38 PM, Alex Hewson wrote:
Hello All,
I'm trying to use the new bulk_save_objects() to improve performance on
bulk inserts, and have run into a problem. If bulk_save_objects() is
used to save objects of a polymorphic class..
1. They are created correctly in the DB, with polymorphic type column
populated correctly
2. BUT queries for the new objects will return one of incorrect type.
In my case I'm getting instances of Child1 back when I would expect
to get a Child2.
turn on echo=True, and you'll see this:
INSERT INTO "Child1" DEFAULT VALUES
2016-02-29 17:48:11,349 INFO sqlalchemy.engine.base.Engine ((), (), (),
(), (), (), (), () ... displaying 10 of 1000 total bound parameter sets
... (), ())
what you will notice here is that this is the Child1 table receiving
entirely empty rows; the primary key values from Entity are nowhere to
be found. SQLite does not enforce foreign keys by default so it's just
auto-generating identifiers here, something that wouldn't happen on most
other databases where this column wouldn't work as an autoincrement by
default. If you run it on Postgresql you get:
sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) null value in
column "MyId" violates not-null constraint
DETAIL: Failing row contains (null).
[SQL: 'INSERT INTO "Child1" DEFAULT VALUES'] [parameters: ({}, {}, {},
{}, {}, {}, {}, {} ... displaying 10 of 1000 total bound parameter sets
... {}, {})]
What's happening here is documented, however the documentation for bulk
is a little long and the documentation referring to the use case here
might benefit from a little more boldface and probably should be more
clearly listed as "will not work", instead of a somewhat casual
"however". At
http://docs.sqlalchemy.org/en/rel_1_0/orm/persistence_techniques.html#orm-compatibility
(emphasis added):
Multi-table mappings, such as joined-inheritance - **however**, an
object to be inserted across multiple tables either **needs to have
primary key identifiers fully populated ahead of time**, else the
Session.bulk_save_objects.return_defaults flag must be used, which will
greatly reduce the performance benefits
what we mean here is this:
c1_many = [Child1(Id=i+1, MyId=i+1, Content="c1inst_%d"%i) for i in
range(0,1000)]
c2_many = [Child2(Id=i+1001, MyId=i+1001, Content="c2inst_%d"%i)
for i in range(0,1000)]
In SQLAlchemy 1.1, things are much easier to spot, even if you're using
a non-FK/non-autoincrement enforcing database like SQLite; running this
program immediately catches the problem on the Python side:
sqlalchemy.exc.CompileError: Column 'Child1.MyId' is marked as a
member of the primary key for table 'Child1', but has no Python-side or
server-side default generator indicated, nor does it indicate
'autoincrement=True' or 'nullable=True', and no explicit value is
passed. Primary key columns typically may not store NULL.
This is because 1.1 has changed the logic of the "autoincrement" flag
and adds deeper checks for NULL primary key values as described at
http://docs.sqlalchemy.org/en/latest/changelog/migration_11.html#the-autoincrement-directive-is-no-longer-implicitly-enabled-for-a-composite-primary-key-column.
So the mitigation for the fact that your specific test case silently
fails include:
1. this whole issue only silently passes on SQLite, not on any of the
higher volume databases where you'd want to use bulk operations in the
first place
2. documentation here should be spruced up to list this practice as a
**warning**, including that we should also have a boldface up in the
earlier paragraph talking about fetching of inserted primary keys being
disabled (this is the slowest part of the INSERT so has no place within
bulk inserts, hence you must populate columns dependent on a PK up front
which means the PK itself needs to be populated up front in those cases
where you need it)
3. SQLAlchemy 1.1 won't let these INSERTs without a primary key value
when the column is not configured as an "autoincrement" proceed
Thanks for the clear test case here.
The following code demonstrates the problem:
|
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
fromsqlalchemy importcreate_engine
fromsqlalchemy importColumn,Integer,SmallInteger,String,ForeignKey
fromsqlalchemy.orm importsessionmaker
fromsqlalchemy.ext.declarative importdeclarative_base
Base=declarative_base()
classEntity(Base):
__tablename__ ='Entity'
Id=Column(Integer,primary_key=True,nullable=False)
Content=Column(String)
_polytype =Column(SmallInteger,nullable=False)
__mapper_args__ ={
'polymorphic_identity':1,
'polymorphic_on':_polytype
}
classChild1(Entity):
__tablename__ ='Child1'
MyId=Column(ForeignKey("Entity.Id"),primary_key=True)
__mapper_args__ ={'polymorphic_identity':11}
classChild2(Entity):
__tablename__ ='Child2'
MyId=Column(ForeignKey("Entity.Id"),primary_key=True)
__mapper_args__ ={'polymorphic_identity':12}
if__name__ =='__main__':
# engine = create_engine('sqlite:///:memory:', echo=False)
engine =create_engine('sqlite:///test.db',echo=False)
Session=sessionmaker(bind=engine)
sess =Session()
Base.metadata.create_all(engine)
c1_many =[Child1(Content="c1inst_%d"%i)fori inrange(0,1000)]
c2_many =[Child2(Content="c2inst_%d"%i)fori inrange(0,1000)]
sess.bulk_save_objects(c1_many)
sess.bulk_save_objects(c2_many)
# sess.add_all(c1_many)
# sess.add_all(c2_many)
sess.flush()
sess.commit()
forc insess.query(Child1):
assertisinstance(c,Child1)
forc insess.query(Child2):
assertisinstance(c,Child2)
|
All the calls to assert isinstance(c, Child1) complete successfully.
But once we start checking for Child2 - boom, we are still getting back
Child1 instances.
At first I wondered if I was misunderstanding SA's implementation of
polymorphism, so tried inserting rows the traditional way with
sess.add_all(). But that works fine so I think I've exposed a bug in
the new bulk_save_objects() code.
My environment is Python 3.5.1, SQLAlchemy==1.0.12, SQLite 3.8.10.2 on OSX.
--
You received this message because you are subscribed to the Google
Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected]
<mailto:[email protected]>.
To post to this group, send email to [email protected]
<mailto:[email protected]>.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.