Re: [sqlalchemy] Application Process Unusable After “… - can't proceed with initialization of other mappers.”

Mike Bayer Fri, 25 May 2018 06:44:19 -0700

Here's a second demo why _mapper_registry exists and has to be global:

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import inspect


Base = declarative_base()


class A(Base):
    __tablename__ = 'a'

    id = Column(Integer, primary_key=True)
    data = Column(String)


class B(Base):
    __tablename__ = 'b'

    id = Column(Integer, primary_key=True)
    aid = Column(ForeignKey('a.id'))
    data = Column(String)
    a = relationship("A", backref="bs")


e = create_engine("sqlite://", echo=True)
Base.metadata.create_all(e)

# let's assume SQLAlchemy did not keep a global list of all mappers.
# let's take away "configure_mappers" and instead only configure mappers
# as the program refers to them.   Patch out configure_mappers to do nothing.
from sqlalchemy.orm import mapperlib
mapperlib.configure_mappers = lambda: None


# here's the specific steps configure_mappers() needs to do for a particular
# mapper.
def configure_mapper(cls):
    a_mapper = inspect(cls).mapper
    a_mapper._post_configure_properties()
    a_mapper._expire_memoizations()
    a_mapper.dispatch.mapper_configured(
        a_mapper, a_mapper.class_)

# about to use "A".  Let's configure:
configure_mapper(A)
a1 = A()

# check the a1.bs collection.   Except.  There isn't one :)  mappers often
# contain instructions to add attributes to other mappers.  if the program
# refers to those dependent mappers first, we need to have configured
# all mappers.   if one of the mappers is failing, that's a bug in the program.
# the program is not expected to be able to run with bad mappings in it.
assert a1.bs == []

# program has failed

# maybe we could use the "Base" here as the collection, not global
_mapper_registry.
# but we support relationships and other linkages between mappers that
have different
# "bases", and declarative is only an extension to the core mapping
system in any case.
# perhaps a "SQLAlchemy 2" can tighten and modernize this system so
that all mappers
# are truly segmented into independent namespaces but this can also
cause more confusion
# too.   Overall, programs just aren't expected to have invalid mappings.


a1bs = a1.bs

# about to use "B". Let's configure:
configure_mapper(B)
a1bs.append(B())



On Fri, May 25, 2018 at 9:24 AM, Mike Bayer <[email protected]> wrote:
> Here's a demo, you want to also call gc_collect() when you dispose of
> your failed mappers:
>
> from sqlalchemy import *
> from sqlalchemy.orm import *
> from sqlalchemy.ext.declarative import declarative_base
> from sqlalchemy import exc
>
> Base = declarative_base()
>
>
> class A(Base):
>     __tablename__ = 'a'
>
>     id = Column(Integer, primary_key=True)
>     data = Column(String)
>     bs = relationship("B")
>
>
> class B(Base):
>     __tablename__ = 'b'
>
>     id = Column(Integer, primary_key=True)
>     aid = Column(ForeignKey('a.id'))
>     data = Column(String)
>
>
> # bad mapper
> class C(Base):
>     __tablename__ = 'c'
>
>     id = Column(Integer, primary_key=True)
>
>     # nonexistent relationship
>     q = relationship("q")
>
> try:
>     configure_mappers()
> except exc.InvalidRequestError as err:
>     print("First exception: %s" % err)
>
>
> class D(Base):
>     __tablename__ = 'd'
>     id = Column(Integer, primary_key=True)
>     aid = Column(ForeignKey('a.id'))
>     a = relationship("A")
>
> # other mappers are blocked
> try:
>     Session().query(D)
> except exc.InvalidRequestError as err:
>     print("Second exception: %s" % err)
>
> # delete offending class
> del C
>
> # garbage collect, as mappings contain cycles
> import gc
> gc.collect()
>
> # mapper is gone
> print(Session().query(D))
> print("it worked!")
>
>
>
>
>
>
> On Fri, May 25, 2018 at 9:16 AM, Mike Bayer <[email protected]> wrote:
>> On Fri, May 25, 2018 at 7:11 AM, Dave Mittner <[email protected]> wrote:
>>>> So that is already a new fact (which i sort of guessed might be in
>>>> play) that this is a multi-tenant system.  How many databases are we
>>>> talking about that are accessed by a single application?   What do the
>>>> tables in these databases look like, are they all the same across all
>>>> DBs (in which case use fixed mappings) or are they totally different?
>>>> if all the DBs have the same table structure then you should use only
>>>> a single table/mapping structure for each DB.
>>>
>>>
>>> There's probably something like 5 or 6 distinct MySQL databases I connect
>>> to, each on a separate host, each unique. That's why I cache the automapped
>>> classes on a per-server-per-host basis..
>>
>> the critical questions are:   1. do each of these databases have the
>> same table structures?    or if not 2. are these *fixed* structures
>> that you could map statically without using reflection?
>>
>>
>>
>>>
>>>> I've never seen that before but I might guess that you have
>>>> multiple threads reflecting tables and creating classes of the
>>>> identical name in different threads at the same time?   You definitely
>>>> can't do that without modifying how the
>>>> sqlalchemy.orm.mapper._mapper_registry works.   You need to either
>>>> ensure these names are unique at all times, and if you expect multiple
>>>> threads to access the same names, you need to use a mutex to prevent
>>>> them from doing so concurrently.
>>>
>>>
>>> I'm not entirely sure I understand what you're getting at here. Before I
>>> even added the caching, I was fully able to automap the same database over
>>> and over again on each new connection without any issue. And it functions
>>> perfectly fine now with multiple threads running multiple connections to the
>>> same database. The only problem I'm having is after an automap fails to
>>> properly read a database structure and map it to classes, which happens only
>>> rarely.
>>
>> This statement does not make sense without more information.   automap
>> either succeeds against a given database, or it fails.  It is not easy
>> to produce a situation where it fails "only rarely".    Things that
>> could cause this are: 1. the database you are automapping against is
>> constantly changing,   2. the mapping is relying upon some subtle hash
>> ordering or 3. automap is being used in a multi-threaded context where
>> more than one automap tries to hit the same tables at the same time.
>> When something happens "only rarely", that points to race conditions
>> or sometimes hash ordering issues.     To rule out #2, you would
>> create a test script that does your automap by itself and try to run
>> automap hundreds of times against the particular database that caused
>> a failure, if under Python 2 you'd want to set PYTHONHASHSEED='random'
>> in the environment.    But the error you showed looks more like a race
>> of two automaps hitting the same table.
>>
>>
>>
>> What you're suggesting seems to indicate I should be having problems
>>> constantly, but I'm not.
>>
>> the prevalence of a race condition is proportional to how unlikely the
>> race is in the first place and how much concurrency is in play on a
>> given basis.  Whether it happens every minute or once a month doesn't
>> really matter.
>>
>>>
>>>> How does your program know exactly how to interact with these automap
>>>> databases if it knows nothing of what tables are present or their
>>>> structure?     If your application *does* know these things, then you
>>>> should tell automap about it.
>>>
>>>
>>> Well, the entire point of using automap is to make the data easily
>>> accessible through the mapped objects, so the higher level code certainly
>>> knows what it's doing. This low-level DB class, however, is only meant to
>>> expedite the connection configuration process. Ostensibly the higher level
>>> code could pass more information into the lower level class in terms of
>>> tables it'll be using and so forth, but that's a pretty expensive way to
>>> bypass this problem.
>>
>> If the schemas of these databases are fixed, it's typical to have them
>> mapped up front.  Since you have a finite, small number of databases,
>> you should look into having them mapped either up front or within a
>> section that is mutexed by a threading.Lock, and in either case you
>> call configure_mappers() as soon as they are all set up.
>>
>>
>>
>>>
>>>> As far as legitimate mappings failing afterwards, that's only if you
>>>> don't get rid of these failed mappers.  If you "del" a mapper that
>>>> failed to configure and make sure it is garbage collected, it will not
>>>> interfere with subsequent mappings.   so you probably want to call
>>>> configure_mappers(), then if it fails, make sure you lose references
>>>> to those mappers that failed.
>>>
>>>
>>> As I indicated in my original post, it's not my code that's holding on to
>>> the fact that a mapping failed. It's being done in the global space within
>>> SQLAlchemy, itself. That's what I find utterly mind boggling. I don't know
>>> why it would hold on to anything in the global space and certainly not
>>> trigger failures on subsequent mapping attempts if a past attempt failed,
>>> but that's exactly what it's doing.
>>>
>>> You can see the code here:
>>> https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/mapper.py#L3017
>>
>> SQLAlchemy does not hold onto mappers that your program does not refer
>> towards.  I can demonstrate this if you like.   The code at 3017 is
>> only invoked if the mapper is still in the _mapper_registry, which is
>> a WeakKeyDictionary.  SQLAlchemy only maintains weak references to a
>> particular mapper.
>>
>> Now, there are a lot of challenges in getting your program to no
>> longer refer to a particular mapper, depending on what you're doing.
>> If other mappers extend from your mapper, that's a strong reference.
>> If other mappers refer to your mapper via relationship(), thats a
>> strong reference.   But here, if you are keeping these per-connection
>> mappings all independent of each other, they should get garbage
>> collected.     But if there's a bug in the weak referencing in
>> SQLAlchemy or there's a particularly difficult mapping pattern in your
>> program you need help releasing, I'd need to see specific scripts that
>> illustrate this.
>>
>>
>>>
>>> On Thu, May 24, 2018 at 8:11 PM, Mike Bayer <[email protected]>
>>> wrote:
>>>>
>>>> On Thu, May 24, 2018 at 10:34 PM, Dave Mittner <[email protected]>
>>>> wrote:
>>>> > Automap was, at the most regular, only ever used upon connection
>>>> > creation to
>>>> > a given database. (obvious implication is that DB changes within a
>>>> > connection might be problematic? not an applicable scenario thus far in
>>>> > our
>>>> > code)
>>>>
>>>> So that is already a new fact (which i sort of guessed might be in
>>>> play) that this is a multi-tenant system.  How many databases are we
>>>> talking about that are accessed by a single application?   What do the
>>>> tables in these databases look like, are they all the same across all
>>>> DBs (in which case use fixed mappings) or are they totally different?
>>>> if all the DBs have the same table structure then you should use only
>>>> a single table/mapping structure for each DB.
>>>>
>>>> If I am reading the error you are getting over at
>>>>
>>>> https://stackoverflow.com/questions/50123090/application-process-unusable-after-cant-proceed-with-initialization-of-o
>>>> correctly, I've never seen that before but I might guess that you have
>>>> multiple threads reflecting tables and creating classes of the
>>>> identical name in different threads at the same time?   You definitely
>>>> can't do that without modifying how the
>>>> sqlalchemy.orm.mapper._mapper_registry works.   You need to either
>>>> ensure these names are unique at all times, and if you expect multiple
>>>> threads to access the same names, you need to use a mutex to prevent
>>>> them from doing so concurrently.
>>>>
>>>> >
>>>> > Here's the gist of my connection initiation code that's run upon
>>>> > instantiation of the DB class.
>>>> > Mind you, a prior coworker wrote this and there's so many ways of
>>>> > interfacing with SQLAlchemy that I'm not even sure where he got it from.
>>>> >
>>>> >>             db_url = engine.url.URL(drivername = self.drivername,
>>>> >>                                     username = self.username,
>>>> >>                                     password = self.password,
>>>> >>                                     host = self.host,
>>>> >>                                     database = self.database,
>>>> >>                                     query = {'charset':'utf8'})
>>>> >>             self.engine =
>>>> >> create_engine(db_url,encoding='utf8',convert_unicode=True)
>>>> >>             self.session = Session(self.engine)
>>>> >>             self.connection = self.engine.connect()
>>>> >>
>>>> >>
>>>> >>
>>>> >>             # Automap functionality has a lot of overhead, so cache the
>>>> >> results on a per-host/database basis
>>>> >>             id = (self.drivername,self.host,self.database)
>>>> >>             if id not in DB.tables:
>>>> >>                 base = automap_base()
>>>> >>                 base.prepare(self.engine, reflect=True,
>>>> >> name_for_scalar_relationship=name_for_scalar_relationship)
>>>> >>                 DB.tables[id] = base.classes
>>>> >>             self.tables = DB.tables[id]
>>>>
>>>>
>>>> How does your program know exactly how to interact with these automap
>>>> databases if it knows nothing of what tables are present or their
>>>> structure?     If your application *does* know these things, then you
>>>> should tell automap about it.   Especially if only care about three
>>>> tables, use metadata.reflect() and pass those names to "only":
>>>>
>>>> http://docs.sqlalchemy.org/en/latest/core/metadata.html?highlight=metadata%20reflect#sqlalchemy.schema.MetaData.reflect.params.only
>>>> .   Further, your application, assuming it isn't just letting users
>>>> query whatever they want, would need to know about specific fields and
>>>> columns on these tables in order to work with queries and mapped
>>>> objects effectively.  You can map these fields and columns up front
>>>> since you know what they are, without using automap.   then you add
>>>> unit tests for these mappings to make sure they work.  that's the
>>>> typical structure of a live production application.
>>>>
>>>> automap probably can use another green warning box at
>>>>
>>>> http://docs.sqlalchemy.org/en/latest/orm/extensions/automap.html#module-sqlalchemy.ext.automap
>>>> but the idea is that it's intended for expedient access to a
>>>> particular database in an "offline", ad-hoc context.   It isn't
>>>> oriented towards high capacity use in a production application against
>>>> databases of unknown structure, because that's not generally useful
>>>> anyway, a high capacity production application would have a more
>>>> formalized notion of its schema.    automap expects to fail against an
>>>> unknown database until it is customized to work around the issues in
>>>> that DB, such as the functions for resolving naming conflicts
>>>>
>>>> (http://docs.sqlalchemy.org/en/latest/orm/extensions/automap.html#handling-simple-naming-conflicts).
>>>>
>>>> As far as legitimate mappings failing afterwards, that's only if you
>>>> don't get rid of these failed mappers.  If you "del" a mapper that
>>>> failed to configure and make sure it is garbage collected, it will not
>>>> interfere with subsequent mappings.   so you probably want to call
>>>> configure_mappers(), then if it fails, make sure you lose references
>>>> to those mappers that failed.
>>>>
>>>>
>>>>
>>>>
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Thu, May 24, 2018 at 5:50 PM, Mike Bayer <[email protected]>
>>>> > wrote:
>>>> >>
>>>> >> On Thu, May 24, 2018 at 5:40 PM, Dave Mittner <[email protected]>
>>>> >> wrote:
>>>> >> > Also posted here:
>>>> >> >
>>>> >> >
>>>> >> > https://stackoverflow.com/questions/50123090/application-process-unusable-after-cant-proceed-with-initialization-of-o
>>>> >> >
>>>> >> >
>>>> >> > I have a multithreaded application that runs various jobs in threads.
>>>> >> > One of
>>>> >> > these jobs goes out to various data sources to query for data. On
>>>> >> > occasion
>>>> >> > the mapping process fails and an exception is thrown.
>>>> >> >
>>>> >> > That on its own isn't a big deal; my system is designed to compensate
>>>> >> > for
>>>> >> > periodically failing jobs.
>>>> >> >
>>>> >> > The problem is that that mapping failure seems to be recorded in a
>>>> >> > global
>>>> >> > space that then prevents all future mapping attempts to be aborted.
>>>> >> > Even
>>>> >> > attempts on completely different threads using completely different
>>>> >> > databases. This renders my entire application effectively broken from
>>>> >> > that
>>>> >> > point on.
>>>> >> >
>>>> >> > After looking in SQLAlchemy's code, mappers are stored in a
>>>> >> > _mapper_registry
>>>> >> > global space variable and once any mapper in the registry errors out,
>>>> >> > any
>>>> >> > attempt to configure a new mapper will fail.
>>>> >> >
>>>> >> > Mapping failures of this nature may be rare -- and indeed it only
>>>> >> > rarely
>>>> >> > happens on the connection I'm having a problem with -- but this
>>>> >> > complete
>>>> >> > locking behavior of all future mapping seems very odd to me. If there
>>>> >> > isn't
>>>> >> > a way around this I might have no choice but to have my process
>>>> >> > completely
>>>> >> > exit when the exception is encountered, even if that means killing
>>>> >> > other
>>>> >> > running threads.
>>>> >>
>>>> >> are you creating mappers on the fly or on a per-request basis?   You'd
>>>> >> want to ideally have mappings created just once at the module import
>>>> >> level.  Then when your application is ready to start up, call
>>>> >> configure_mappers() and everything will be set up.
>>>> >>
>>>> >> if those are not patterns you're able to use, then please provide more
>>>> >> specifics.   from your stack trace on SO, it seems like you are using
>>>> >> automap.   When is that running?  If per request, this very expensive
>>>> >> and will have problems.
>>>> >>
>>>> >> The mapping process *is* guarded by a mutex so it is difficult to
>>>> >> produce an issue with mappings failing - the stack trace you post
>>>> >> almost appears like there is some kind of naming issue happening where
>>>> >> a particular mapper has been garbage collected or something like that
>>>> >> yet still being referred towards by other mappers that are being
>>>> >> configured.     need to see details of how your code works.
>>>> >>
>>>> >>
>>>> >>
>>>> >> >
>>>> >> > Any ideas?
>>>> >> >
>>>> >> > --
>>>> >> > SQLAlchemy -
>>>> >> > The Python SQL Toolkit and Object Relational Mapper
>>>> >> >
>>>> >> > http://www.sqlalchemy.org/
>>>> >> >
>>>> >> > To post example code, please provide an MCVE: Minimal, Complete, and
>>>> >> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full
>>>> >> > description.
>>>> >> > ---
>>>> >> > You received this message because you are subscribed to the Google
>>>> >> > Groups
>>>> >> > "sqlalchemy" group.
>>>> >> > To unsubscribe from this group and stop receiving emails from it,
>>>> >> > send
>>>> >> > an
>>>> >> > email to [email protected].
>>>> >> > To post to this group, send email to [email protected].
>>>> >> > Visit this group at https://groups.google.com/group/sqlalchemy.
>>>> >> > For more options, visit https://groups.google.com/d/optout.
>>>> >>
>>>> >> --
>>>> >> SQLAlchemy -
>>>> >> The Python SQL Toolkit and Object Relational Mapper
>>>> >>
>>>> >> http://www.sqlalchemy.org/
>>>> >>
>>>> >> To post example code, please provide an MCVE: Minimal, Complete, and
>>>> >> Verifiable Example.  See  http://stackoverflow.com/help/mcve for a full
>>>> >> description.
>>>> >> ---
>>>> >> You received this message because you are subscribed to the Google
>>>> >> Groups
>>>> >> "sqlalchemy" group.
>>>> >> To unsubscribe from this group and stop receiving emails from it, send
>>>> >> an
>>>> >> email to [email protected].
>>>> >> To post to this group, send email to [email protected].
>>>> >> Visit this group at https://groups.google.com/group/sqlalchemy.
>>>> >> For more options, visit https://groups.google.com/d/optout.
>>>> >
>>>> >
>>>> > --
>>>> > SQLAlchemy -
>>>> > The Python SQL Toolkit and Object Relational Mapper
>>>> >
>>>> > http://www.sqlalchemy.org/
>>>> >
>>>> > To post example code, please provide an MCVE: Minimal, Complete, and
>>>> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full
>>>> > description.
>>>> > ---
>>>> > You received this message because you are subscribed to the Google
>>>> > Groups
>>>> > "sqlalchemy" group.
>>>> > To unsubscribe from this group and stop receiving emails from it, send
>>>> > an
>>>> > email to [email protected].
>>>> > To post to this group, send email to [email protected].
>>>> > Visit this group at https://groups.google.com/group/sqlalchemy.
>>>> > For more options, visit https://groups.google.com/d/optout.
>>>>
>>>> --
>>>> SQLAlchemy -
>>>> The Python SQL Toolkit and Object Relational Mapper
>>>>
>>>> http://www.sqlalchemy.org/
>>>>
>>>> To post example code, please provide an MCVE: Minimal, Complete, and
>>>> Verifiable Example.  See  http://stackoverflow.com/help/mcve for a full
>>>> description.
>>>> ---
>>>> You received this message because you are subscribed to the Google Groups
>>>> "sqlalchemy" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an
>>>> email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/sqlalchemy.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>>
>>> --
>>> SQLAlchemy -
>>> The Python SQL Toolkit and Object Relational Mapper
>>>
>>> http://www.sqlalchemy.org/
>>>
>>> To post example code, please provide an MCVE: Minimal, Complete, and
>>> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
>>> description.
>>> ---
>>> You received this message because you are subscribed to the Google Groups
>>> "sqlalchemy" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/sqlalchemy.
>>> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Application Process Unusable After “… - can't proceed with initialization of other mappers.”

Reply via email to