Re: [sqlalchemy] Application Process Unusable After “… - can't proceed with initialization of other mappers.”

Mike Bayer Fri, 25 May 2018 06:24:50 -0700

Here's a demo, you want to also call gc_collect() when you dispose of
your failed mappers:


from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import exc

Base = declarative_base()


class A(Base):
    __tablename__ = 'a'

    id = Column(Integer, primary_key=True)
    data = Column(String)
    bs = relationship("B")


class B(Base):
    __tablename__ = 'b'

    id = Column(Integer, primary_key=True)
    aid = Column(ForeignKey('a.id'))
    data = Column(String)


# bad mapper
class C(Base):
    __tablename__ = 'c'

    id = Column(Integer, primary_key=True)

    # nonexistent relationship
    q = relationship("q")

try:
    configure_mappers()
except exc.InvalidRequestError as err:
    print("First exception: %s" % err)


class D(Base):
    __tablename__ = 'd'
    id = Column(Integer, primary_key=True)
    aid = Column(ForeignKey('a.id'))
    a = relationship("A")

# other mappers are blocked
try:
    Session().query(D)
except exc.InvalidRequestError as err:
    print("Second exception: %s" % err)

# delete offending class
del C

# garbage collect, as mappings contain cycles
import gc
gc.collect()

# mapper is gone
print(Session().query(D))
print("it worked!")






On Fri, May 25, 2018 at 9:16 AM, Mike Bayer <[email protected]> wrote:
> On Fri, May 25, 2018 at 7:11 AM, Dave Mittner <[email protected]> wrote:
>>> So that is already a new fact (which i sort of guessed might be in
>>> play) that this is a multi-tenant system.  How many databases are we
>>> talking about that are accessed by a single application?   What do the
>>> tables in these databases look like, are they all the same across all
>>> DBs (in which case use fixed mappings) or are they totally different?
>>> if all the DBs have the same table structure then you should use only
>>> a single table/mapping structure for each DB.
>>
>>
>> There's probably something like 5 or 6 distinct MySQL databases I connect
>> to, each on a separate host, each unique. That's why I cache the automapped
>> classes on a per-server-per-host basis..
>
> the critical questions are:   1. do each of these databases have the
> same table structures?    or if not 2. are these *fixed* structures
> that you could map statically without using reflection?
>
>
>
>>
>>> I've never seen that before but I might guess that you have
>>> multiple threads reflecting tables and creating classes of the
>>> identical name in different threads at the same time?   You definitely
>>> can't do that without modifying how the
>>> sqlalchemy.orm.mapper._mapper_registry works.   You need to either
>>> ensure these names are unique at all times, and if you expect multiple
>>> threads to access the same names, you need to use a mutex to prevent
>>> them from doing so concurrently.
>>
>>
>> I'm not entirely sure I understand what you're getting at here. Before I
>> even added the caching, I was fully able to automap the same database over
>> and over again on each new connection without any issue. And it functions
>> perfectly fine now with multiple threads running multiple connections to the
>> same database. The only problem I'm having is after an automap fails to
>> properly read a database structure and map it to classes, which happens only
>> rarely.
>
> This statement does not make sense without more information.   automap
> either succeeds against a given database, or it fails.  It is not easy
> to produce a situation where it fails "only rarely".    Things that
> could cause this are: 1. the database you are automapping against is
> constantly changing,   2. the mapping is relying upon some subtle hash
> ordering or 3. automap is being used in a multi-threaded context where
> more than one automap tries to hit the same tables at the same time.
> When something happens "only rarely", that points to race conditions
> or sometimes hash ordering issues.     To rule out #2, you would
> create a test script that does your automap by itself and try to run
> automap hundreds of times against the particular database that caused
> a failure, if under Python 2 you'd want to set PYTHONHASHSEED='random'
> in the environment.    But the error you showed looks more like a race
> of two automaps hitting the same table.
>
>
>
> What you're suggesting seems to indicate I should be having problems
>> constantly, but I'm not.
>
> the prevalence of a race condition is proportional to how unlikely the
> race is in the first place and how much concurrency is in play on a
> given basis.  Whether it happens every minute or once a month doesn't
> really matter.
>
>>
>>> How does your program know exactly how to interact with these automap
>>> databases if it knows nothing of what tables are present or their
>>> structure?     If your application *does* know these things, then you
>>> should tell automap about it.
>>
>>
>> Well, the entire point of using automap is to make the data easily
>> accessible through the mapped objects, so the higher level code certainly
>> knows what it's doing. This low-level DB class, however, is only meant to
>> expedite the connection configuration process. Ostensibly the higher level
>> code could pass more information into the lower level class in terms of
>> tables it'll be using and so forth, but that's a pretty expensive way to
>> bypass this problem.
>
> If the schemas of these databases are fixed, it's typical to have them
> mapped up front.  Since you have a finite, small number of databases,
> you should look into having them mapped either up front or within a
> section that is mutexed by a threading.Lock, and in either case you
> call configure_mappers() as soon as they are all set up.
>
>
>
>>
>>> As far as legitimate mappings failing afterwards, that's only if you
>>> don't get rid of these failed mappers.  If you "del" a mapper that
>>> failed to configure and make sure it is garbage collected, it will not
>>> interfere with subsequent mappings.   so you probably want to call
>>> configure_mappers(), then if it fails, make sure you lose references
>>> to those mappers that failed.
>>
>>
>> As I indicated in my original post, it's not my code that's holding on to
>> the fact that a mapping failed. It's being done in the global space within
>> SQLAlchemy, itself. That's what I find utterly mind boggling. I don't know
>> why it would hold on to anything in the global space and certainly not
>> trigger failures on subsequent mapping attempts if a past attempt failed,
>> but that's exactly what it's doing.
>>
>> You can see the code here:
>> https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/mapper.py#L3017
>
> SQLAlchemy does not hold onto mappers that your program does not refer
> towards.  I can demonstrate this if you like.   The code at 3017 is
> only invoked if the mapper is still in the _mapper_registry, which is
> a WeakKeyDictionary.  SQLAlchemy only maintains weak references to a
> particular mapper.
>
> Now, there are a lot of challenges in getting your program to no
> longer refer to a particular mapper, depending on what you're doing.
> If other mappers extend from your mapper, that's a strong reference.
> If other mappers refer to your mapper via relationship(), thats a
> strong reference.   But here, if you are keeping these per-connection
> mappings all independent of each other, they should get garbage
> collected.     But if there's a bug in the weak referencing in
> SQLAlchemy or there's a particularly difficult mapping pattern in your
> program you need help releasing, I'd need to see specific scripts that
> illustrate this.
>
>
>>
>> On Thu, May 24, 2018 at 8:11 PM, Mike Bayer <[email protected]>
>> wrote:
>>>
>>> On Thu, May 24, 2018 at 10:34 PM, Dave Mittner <[email protected]>
>>> wrote:
>>> > Automap was, at the most regular, only ever used upon connection
>>> > creation to
>>> > a given database. (obvious implication is that DB changes within a
>>> > connection might be problematic? not an applicable scenario thus far in
>>> > our
>>> > code)
>>>
>>> So that is already a new fact (which i sort of guessed might be in
>>> play) that this is a multi-tenant system.  How many databases are we
>>> talking about that are accessed by a single application?   What do the
>>> tables in these databases look like, are they all the same across all
>>> DBs (in which case use fixed mappings) or are they totally different?
>>> if all the DBs have the same table structure then you should use only
>>> a single table/mapping structure for each DB.
>>>
>>> If I am reading the error you are getting over at
>>>
>>> https://stackoverflow.com/questions/50123090/application-process-unusable-after-cant-proceed-with-initialization-of-o
>>> correctly, I've never seen that before but I might guess that you have
>>> multiple threads reflecting tables and creating classes of the
>>> identical name in different threads at the same time?   You definitely
>>> can't do that without modifying how the
>>> sqlalchemy.orm.mapper._mapper_registry works.   You need to either
>>> ensure these names are unique at all times, and if you expect multiple
>>> threads to access the same names, you need to use a mutex to prevent
>>> them from doing so concurrently.
>>>
>>> >
>>> > Here's the gist of my connection initiation code that's run upon
>>> > instantiation of the DB class.
>>> > Mind you, a prior coworker wrote this and there's so many ways of
>>> > interfacing with SQLAlchemy that I'm not even sure where he got it from.
>>> >
>>> >>             db_url = engine.url.URL(drivername = self.drivername,
>>> >>                                     username = self.username,
>>> >>                                     password = self.password,
>>> >>                                     host = self.host,
>>> >>                                     database = self.database,
>>> >>                                     query = {'charset':'utf8'})
>>> >>             self.engine =
>>> >> create_engine(db_url,encoding='utf8',convert_unicode=True)
>>> >>             self.session = Session(self.engine)
>>> >>             self.connection = self.engine.connect()
>>> >>
>>> >>
>>> >>
>>> >>             # Automap functionality has a lot of overhead, so cache the
>>> >> results on a per-host/database basis
>>> >>             id = (self.drivername,self.host,self.database)
>>> >>             if id not in DB.tables:
>>> >>                 base = automap_base()
>>> >>                 base.prepare(self.engine, reflect=True,
>>> >> name_for_scalar_relationship=name_for_scalar_relationship)
>>> >>                 DB.tables[id] = base.classes
>>> >>             self.tables = DB.tables[id]
>>>
>>>
>>> How does your program know exactly how to interact with these automap
>>> databases if it knows nothing of what tables are present or their
>>> structure?     If your application *does* know these things, then you
>>> should tell automap about it.   Especially if only care about three
>>> tables, use metadata.reflect() and pass those names to "only":
>>>
>>> http://docs.sqlalchemy.org/en/latest/core/metadata.html?highlight=metadata%20reflect#sqlalchemy.schema.MetaData.reflect.params.only
>>> .   Further, your application, assuming it isn't just letting users
>>> query whatever they want, would need to know about specific fields and
>>> columns on these tables in order to work with queries and mapped
>>> objects effectively.  You can map these fields and columns up front
>>> since you know what they are, without using automap.   then you add
>>> unit tests for these mappings to make sure they work.  that's the
>>> typical structure of a live production application.
>>>
>>> automap probably can use another green warning box at
>>>
>>> http://docs.sqlalchemy.org/en/latest/orm/extensions/automap.html#module-sqlalchemy.ext.automap
>>> but the idea is that it's intended for expedient access to a
>>> particular database in an "offline", ad-hoc context.   It isn't
>>> oriented towards high capacity use in a production application against
>>> databases of unknown structure, because that's not generally useful
>>> anyway, a high capacity production application would have a more
>>> formalized notion of its schema.    automap expects to fail against an
>>> unknown database until it is customized to work around the issues in
>>> that DB, such as the functions for resolving naming conflicts
>>>
>>> (http://docs.sqlalchemy.org/en/latest/orm/extensions/automap.html#handling-simple-naming-conflicts).
>>>
>>> As far as legitimate mappings failing afterwards, that's only if you
>>> don't get rid of these failed mappers.  If you "del" a mapper that
>>> failed to configure and make sure it is garbage collected, it will not
>>> interfere with subsequent mappings.   so you probably want to call
>>> configure_mappers(), then if it fails, make sure you lose references
>>> to those mappers that failed.
>>>
>>>
>>>
>>>
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > On Thu, May 24, 2018 at 5:50 PM, Mike Bayer <[email protected]>
>>> > wrote:
>>> >>
>>> >> On Thu, May 24, 2018 at 5:40 PM, Dave Mittner <[email protected]>
>>> >> wrote:
>>> >> > Also posted here:
>>> >> >
>>> >> >
>>> >> > https://stackoverflow.com/questions/50123090/application-process-unusable-after-cant-proceed-with-initialization-of-o
>>> >> >
>>> >> >
>>> >> > I have a multithreaded application that runs various jobs in threads.
>>> >> > One of
>>> >> > these jobs goes out to various data sources to query for data. On
>>> >> > occasion
>>> >> > the mapping process fails and an exception is thrown.
>>> >> >
>>> >> > That on its own isn't a big deal; my system is designed to compensate
>>> >> > for
>>> >> > periodically failing jobs.
>>> >> >
>>> >> > The problem is that that mapping failure seems to be recorded in a
>>> >> > global
>>> >> > space that then prevents all future mapping attempts to be aborted.
>>> >> > Even
>>> >> > attempts on completely different threads using completely different
>>> >> > databases. This renders my entire application effectively broken from
>>> >> > that
>>> >> > point on.
>>> >> >
>>> >> > After looking in SQLAlchemy's code, mappers are stored in a
>>> >> > _mapper_registry
>>> >> > global space variable and once any mapper in the registry errors out,
>>> >> > any
>>> >> > attempt to configure a new mapper will fail.
>>> >> >
>>> >> > Mapping failures of this nature may be rare -- and indeed it only
>>> >> > rarely
>>> >> > happens on the connection I'm having a problem with -- but this
>>> >> > complete
>>> >> > locking behavior of all future mapping seems very odd to me. If there
>>> >> > isn't
>>> >> > a way around this I might have no choice but to have my process
>>> >> > completely
>>> >> > exit when the exception is encountered, even if that means killing
>>> >> > other
>>> >> > running threads.
>>> >>
>>> >> are you creating mappers on the fly or on a per-request basis?   You'd
>>> >> want to ideally have mappings created just once at the module import
>>> >> level.  Then when your application is ready to start up, call
>>> >> configure_mappers() and everything will be set up.
>>> >>
>>> >> if those are not patterns you're able to use, then please provide more
>>> >> specifics.   from your stack trace on SO, it seems like you are using
>>> >> automap.   When is that running?  If per request, this very expensive
>>> >> and will have problems.
>>> >>
>>> >> The mapping process *is* guarded by a mutex so it is difficult to
>>> >> produce an issue with mappings failing - the stack trace you post
>>> >> almost appears like there is some kind of naming issue happening where
>>> >> a particular mapper has been garbage collected or something like that
>>> >> yet still being referred towards by other mappers that are being
>>> >> configured.     need to see details of how your code works.
>>> >>
>>> >>
>>> >>
>>> >> >
>>> >> > Any ideas?
>>> >> >
>>> >> > --
>>> >> > SQLAlchemy -
>>> >> > The Python SQL Toolkit and Object Relational Mapper
>>> >> >
>>> >> > http://www.sqlalchemy.org/
>>> >> >
>>> >> > To post example code, please provide an MCVE: Minimal, Complete, and
>>> >> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full
>>> >> > description.
>>> >> > ---
>>> >> > You received this message because you are subscribed to the Google
>>> >> > Groups
>>> >> > "sqlalchemy" group.
>>> >> > To unsubscribe from this group and stop receiving emails from it,
>>> >> > send
>>> >> > an
>>> >> > email to [email protected].
>>> >> > To post to this group, send email to [email protected].
>>> >> > Visit this group at https://groups.google.com/group/sqlalchemy.
>>> >> > For more options, visit https://groups.google.com/d/optout.
>>> >>
>>> >> --
>>> >> SQLAlchemy -
>>> >> The Python SQL Toolkit and Object Relational Mapper
>>> >>
>>> >> http://www.sqlalchemy.org/
>>> >>
>>> >> To post example code, please provide an MCVE: Minimal, Complete, and
>>> >> Verifiable Example.  See  http://stackoverflow.com/help/mcve for a full
>>> >> description.
>>> >> ---
>>> >> You received this message because you are subscribed to the Google
>>> >> Groups
>>> >> "sqlalchemy" group.
>>> >> To unsubscribe from this group and stop receiving emails from it, send
>>> >> an
>>> >> email to [email protected].
>>> >> To post to this group, send email to [email protected].
>>> >> Visit this group at https://groups.google.com/group/sqlalchemy.
>>> >> For more options, visit https://groups.google.com/d/optout.
>>> >
>>> >
>>> > --
>>> > SQLAlchemy -
>>> > The Python SQL Toolkit and Object Relational Mapper
>>> >
>>> > http://www.sqlalchemy.org/
>>> >
>>> > To post example code, please provide an MCVE: Minimal, Complete, and
>>> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full
>>> > description.
>>> > ---
>>> > You received this message because you are subscribed to the Google
>>> > Groups
>>> > "sqlalchemy" group.
>>> > To unsubscribe from this group and stop receiving emails from it, send
>>> > an
>>> > email to [email protected].
>>> > To post to this group, send email to [email protected].
>>> > Visit this group at https://groups.google.com/group/sqlalchemy.
>>> > For more options, visit https://groups.google.com/d/optout.
>>>
>>> --
>>> SQLAlchemy -
>>> The Python SQL Toolkit and Object Relational Mapper
>>>
>>> http://www.sqlalchemy.org/
>>>
>>> To post example code, please provide an MCVE: Minimal, Complete, and
>>> Verifiable Example.  See  http://stackoverflow.com/help/mcve for a full
>>> description.
>>> ---
>>> You received this message because you are subscribed to the Google Groups
>>> "sqlalchemy" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/sqlalchemy.
>>> For more options, visit https://groups.google.com/d/optout.
>>
>>
>> --
>> SQLAlchemy -
>> The Python SQL Toolkit and Object Relational Mapper
>>
>> http://www.sqlalchemy.org/
>>
>> To post example code, please provide an MCVE: Minimal, Complete, and
>> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
>> description.
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "sqlalchemy" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/sqlalchemy.
>> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Application Process Unusable After “… - can't proceed with initialization of other mappers.”

Reply via email to