Re: [sqlalchemy] Application Process Unusable After “… - can't proceed with initialization of other mappers.”

Mike Bayer Fri, 25 May 2018 06:16:52 -0700

On Fri, May 25, 2018 at 7:11 AM, Dave Mittner <[email protected]> wrote:
>> So that is already a new fact (which i sort of guessed might be in
>> play) that this is a multi-tenant system.  How many databases are we
>> talking about that are accessed by a single application?   What do the
>> tables in these databases look like, are they all the same across all
>> DBs (in which case use fixed mappings) or are they totally different?
>> if all the DBs have the same table structure then you should use only
>> a single table/mapping structure for each DB.
>
>
> There's probably something like 5 or 6 distinct MySQL databases I connect
> to, each on a separate host, each unique. That's why I cache the automapped
> classes on a per-server-per-host basis..


the critical questions are:   1. do each of these databases have the
same table structures?    or if not 2. are these *fixed* structures
that you could map statically without using reflection?



>
>> I've never seen that before but I might guess that you have
>> multiple threads reflecting tables and creating classes of the
>> identical name in different threads at the same time?   You definitely
>> can't do that without modifying how the
>> sqlalchemy.orm.mapper._mapper_registry works.   You need to either
>> ensure these names are unique at all times, and if you expect multiple
>> threads to access the same names, you need to use a mutex to prevent
>> them from doing so concurrently.
>
>
> I'm not entirely sure I understand what you're getting at here. Before I
> even added the caching, I was fully able to automap the same database over
> and over again on each new connection without any issue. And it functions
> perfectly fine now with multiple threads running multiple connections to the
> same database. The only problem I'm having is after an automap fails to
> properly read a database structure and map it to classes, which happens only
> rarely.

This statement does not make sense without more information.   automap
either succeeds against a given database, or it fails.  It is not easy
to produce a situation where it fails "only rarely".    Things that
could cause this are: 1. the database you are automapping against is
constantly changing,   2. the mapping is relying upon some subtle hash
ordering or 3. automap is being used in a multi-threaded context where
more than one automap tries to hit the same tables at the same time.
When something happens "only rarely", that points to race conditions
or sometimes hash ordering issues.     To rule out #2, you would
create a test script that does your automap by itself and try to run
automap hundreds of times against the particular database that caused
a failure, if under Python 2 you'd want to set PYTHONHASHSEED='random'
in the environment.    But the error you showed looks more like a race
of two automaps hitting the same table.



What you're suggesting seems to indicate I should be having problems
> constantly, but I'm not.

the prevalence of a race condition is proportional to how unlikely the
race is in the first place and how much concurrency is in play on a
given basis.  Whether it happens every minute or once a month doesn't
really matter.

>
>> How does your program know exactly how to interact with these automap
>> databases if it knows nothing of what tables are present or their
>> structure?     If your application *does* know these things, then you
>> should tell automap about it.
>
>
> Well, the entire point of using automap is to make the data easily
> accessible through the mapped objects, so the higher level code certainly
> knows what it's doing. This low-level DB class, however, is only meant to
> expedite the connection configuration process. Ostensibly the higher level
> code could pass more information into the lower level class in terms of
> tables it'll be using and so forth, but that's a pretty expensive way to
> bypass this problem.

If the schemas of these databases are fixed, it's typical to have them
mapped up front.  Since you have a finite, small number of databases,
you should look into having them mapped either up front or within a
section that is mutexed by a threading.Lock, and in either case you
call configure_mappers() as soon as they are all set up.



>
>> As far as legitimate mappings failing afterwards, that's only if you
>> don't get rid of these failed mappers.  If you "del" a mapper that
>> failed to configure and make sure it is garbage collected, it will not
>> interfere with subsequent mappings.   so you probably want to call
>> configure_mappers(), then if it fails, make sure you lose references
>> to those mappers that failed.
>
>
> As I indicated in my original post, it's not my code that's holding on to
> the fact that a mapping failed. It's being done in the global space within
> SQLAlchemy, itself. That's what I find utterly mind boggling. I don't know
> why it would hold on to anything in the global space and certainly not
> trigger failures on subsequent mapping attempts if a past attempt failed,
> but that's exactly what it's doing.
>
> You can see the code here:
> https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/mapper.py#L3017

SQLAlchemy does not hold onto mappers that your program does not refer
towards.  I can demonstrate this if you like.   The code at 3017 is
only invoked if the mapper is still in the _mapper_registry, which is
a WeakKeyDictionary.  SQLAlchemy only maintains weak references to a
particular mapper.

Now, there are a lot of challenges in getting your program to no
longer refer to a particular mapper, depending on what you're doing.
If other mappers extend from your mapper, that's a strong reference.
If other mappers refer to your mapper via relationship(), thats a
strong reference.   But here, if you are keeping these per-connection
mappings all independent of each other, they should get garbage
collected.     But if there's a bug in the weak referencing in
SQLAlchemy or there's a particularly difficult mapping pattern in your
program you need help releasing, I'd need to see specific scripts that
illustrate this.


>
> On Thu, May 24, 2018 at 8:11 PM, Mike Bayer <[email protected]>
> wrote:
>>
>> On Thu, May 24, 2018 at 10:34 PM, Dave Mittner <[email protected]>
>> wrote:
>> > Automap was, at the most regular, only ever used upon connection
>> > creation to
>> > a given database. (obvious implication is that DB changes within a
>> > connection might be problematic? not an applicable scenario thus far in
>> > our
>> > code)
>>
>> So that is already a new fact (which i sort of guessed might be in
>> play) that this is a multi-tenant system.  How many databases are we
>> talking about that are accessed by a single application?   What do the
>> tables in these databases look like, are they all the same across all
>> DBs (in which case use fixed mappings) or are they totally different?
>> if all the DBs have the same table structure then you should use only
>> a single table/mapping structure for each DB.
>>
>> If I am reading the error you are getting over at
>>
>> https://stackoverflow.com/questions/50123090/application-process-unusable-after-cant-proceed-with-initialization-of-o
>> correctly, I've never seen that before but I might guess that you have
>> multiple threads reflecting tables and creating classes of the
>> identical name in different threads at the same time?   You definitely
>> can't do that without modifying how the
>> sqlalchemy.orm.mapper._mapper_registry works.   You need to either
>> ensure these names are unique at all times, and if you expect multiple
>> threads to access the same names, you need to use a mutex to prevent
>> them from doing so concurrently.
>>
>> >
>> > Here's the gist of my connection initiation code that's run upon
>> > instantiation of the DB class.
>> > Mind you, a prior coworker wrote this and there's so many ways of
>> > interfacing with SQLAlchemy that I'm not even sure where he got it from.
>> >
>> >>             db_url = engine.url.URL(drivername = self.drivername,
>> >>                                     username = self.username,
>> >>                                     password = self.password,
>> >>                                     host = self.host,
>> >>                                     database = self.database,
>> >>                                     query = {'charset':'utf8'})
>> >>             self.engine =
>> >> create_engine(db_url,encoding='utf8',convert_unicode=True)
>> >>             self.session = Session(self.engine)
>> >>             self.connection = self.engine.connect()
>> >>
>> >>
>> >>
>> >>             # Automap functionality has a lot of overhead, so cache the
>> >> results on a per-host/database basis
>> >>             id = (self.drivername,self.host,self.database)
>> >>             if id not in DB.tables:
>> >>                 base = automap_base()
>> >>                 base.prepare(self.engine, reflect=True,
>> >> name_for_scalar_relationship=name_for_scalar_relationship)
>> >>                 DB.tables[id] = base.classes
>> >>             self.tables = DB.tables[id]
>>
>>
>> How does your program know exactly how to interact with these automap
>> databases if it knows nothing of what tables are present or their
>> structure?     If your application *does* know these things, then you
>> should tell automap about it.   Especially if only care about three
>> tables, use metadata.reflect() and pass those names to "only":
>>
>> http://docs.sqlalchemy.org/en/latest/core/metadata.html?highlight=metadata%20reflect#sqlalchemy.schema.MetaData.reflect.params.only
>> .   Further, your application, assuming it isn't just letting users
>> query whatever they want, would need to know about specific fields and
>> columns on these tables in order to work with queries and mapped
>> objects effectively.  You can map these fields and columns up front
>> since you know what they are, without using automap.   then you add
>> unit tests for these mappings to make sure they work.  that's the
>> typical structure of a live production application.
>>
>> automap probably can use another green warning box at
>>
>> http://docs.sqlalchemy.org/en/latest/orm/extensions/automap.html#module-sqlalchemy.ext.automap
>> but the idea is that it's intended for expedient access to a
>> particular database in an "offline", ad-hoc context.   It isn't
>> oriented towards high capacity use in a production application against
>> databases of unknown structure, because that's not generally useful
>> anyway, a high capacity production application would have a more
>> formalized notion of its schema.    automap expects to fail against an
>> unknown database until it is customized to work around the issues in
>> that DB, such as the functions for resolving naming conflicts
>>
>> (http://docs.sqlalchemy.org/en/latest/orm/extensions/automap.html#handling-simple-naming-conflicts).
>>
>> As far as legitimate mappings failing afterwards, that's only if you
>> don't get rid of these failed mappers.  If you "del" a mapper that
>> failed to configure and make sure it is garbage collected, it will not
>> interfere with subsequent mappings.   so you probably want to call
>> configure_mappers(), then if it fails, make sure you lose references
>> to those mappers that failed.
>>
>>
>>
>>
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Thu, May 24, 2018 at 5:50 PM, Mike Bayer <[email protected]>
>> > wrote:
>> >>
>> >> On Thu, May 24, 2018 at 5:40 PM, Dave Mittner <[email protected]>
>> >> wrote:
>> >> > Also posted here:
>> >> >
>> >> >
>> >> > https://stackoverflow.com/questions/50123090/application-process-unusable-after-cant-proceed-with-initialization-of-o
>> >> >
>> >> >
>> >> > I have a multithreaded application that runs various jobs in threads.
>> >> > One of
>> >> > these jobs goes out to various data sources to query for data. On
>> >> > occasion
>> >> > the mapping process fails and an exception is thrown.
>> >> >
>> >> > That on its own isn't a big deal; my system is designed to compensate
>> >> > for
>> >> > periodically failing jobs.
>> >> >
>> >> > The problem is that that mapping failure seems to be recorded in a
>> >> > global
>> >> > space that then prevents all future mapping attempts to be aborted.
>> >> > Even
>> >> > attempts on completely different threads using completely different
>> >> > databases. This renders my entire application effectively broken from
>> >> > that
>> >> > point on.
>> >> >
>> >> > After looking in SQLAlchemy's code, mappers are stored in a
>> >> > _mapper_registry
>> >> > global space variable and once any mapper in the registry errors out,
>> >> > any
>> >> > attempt to configure a new mapper will fail.
>> >> >
>> >> > Mapping failures of this nature may be rare -- and indeed it only
>> >> > rarely
>> >> > happens on the connection I'm having a problem with -- but this
>> >> > complete
>> >> > locking behavior of all future mapping seems very odd to me. If there
>> >> > isn't
>> >> > a way around this I might have no choice but to have my process
>> >> > completely
>> >> > exit when the exception is encountered, even if that means killing
>> >> > other
>> >> > running threads.
>> >>
>> >> are you creating mappers on the fly or on a per-request basis?   You'd
>> >> want to ideally have mappings created just once at the module import
>> >> level.  Then when your application is ready to start up, call
>> >> configure_mappers() and everything will be set up.
>> >>
>> >> if those are not patterns you're able to use, then please provide more
>> >> specifics.   from your stack trace on SO, it seems like you are using
>> >> automap.   When is that running?  If per request, this very expensive
>> >> and will have problems.
>> >>
>> >> The mapping process *is* guarded by a mutex so it is difficult to
>> >> produce an issue with mappings failing - the stack trace you post
>> >> almost appears like there is some kind of naming issue happening where
>> >> a particular mapper has been garbage collected or something like that
>> >> yet still being referred towards by other mappers that are being
>> >> configured.     need to see details of how your code works.
>> >>
>> >>
>> >>
>> >> >
>> >> > Any ideas?
>> >> >
>> >> > --
>> >> > SQLAlchemy -
>> >> > The Python SQL Toolkit and Object Relational Mapper
>> >> >
>> >> > http://www.sqlalchemy.org/
>> >> >
>> >> > To post example code, please provide an MCVE: Minimal, Complete, and
>> >> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full
>> >> > description.
>> >> > ---
>> >> > You received this message because you are subscribed to the Google
>> >> > Groups
>> >> > "sqlalchemy" group.
>> >> > To unsubscribe from this group and stop receiving emails from it,
>> >> > send
>> >> > an
>> >> > email to [email protected].
>> >> > To post to this group, send email to [email protected].
>> >> > Visit this group at https://groups.google.com/group/sqlalchemy.
>> >> > For more options, visit https://groups.google.com/d/optout.
>> >>
>> >> --
>> >> SQLAlchemy -
>> >> The Python SQL Toolkit and Object Relational Mapper
>> >>
>> >> http://www.sqlalchemy.org/
>> >>
>> >> To post example code, please provide an MCVE: Minimal, Complete, and
>> >> Verifiable Example.  See  http://stackoverflow.com/help/mcve for a full
>> >> description.
>> >> ---
>> >> You received this message because you are subscribed to the Google
>> >> Groups
>> >> "sqlalchemy" group.
>> >> To unsubscribe from this group and stop receiving emails from it, send
>> >> an
>> >> email to [email protected].
>> >> To post to this group, send email to [email protected].
>> >> Visit this group at https://groups.google.com/group/sqlalchemy.
>> >> For more options, visit https://groups.google.com/d/optout.
>> >
>> >
>> > --
>> > SQLAlchemy -
>> > The Python SQL Toolkit and Object Relational Mapper
>> >
>> > http://www.sqlalchemy.org/
>> >
>> > To post example code, please provide an MCVE: Minimal, Complete, and
>> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full
>> > description.
>> > ---
>> > You received this message because you are subscribed to the Google
>> > Groups
>> > "sqlalchemy" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> > an
>> > email to [email protected].
>> > To post to this group, send email to [email protected].
>> > Visit this group at https://groups.google.com/group/sqlalchemy.
>> > For more options, visit https://groups.google.com/d/optout.
>>
>> --
>> SQLAlchemy -
>> The Python SQL Toolkit and Object Relational Mapper
>>
>> http://www.sqlalchemy.org/
>>
>> To post example code, please provide an MCVE: Minimal, Complete, and
>> Verifiable Example.  See  http://stackoverflow.com/help/mcve for a full
>> description.
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "sqlalchemy" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/sqlalchemy.
>> For more options, visit https://groups.google.com/d/optout.
>
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Application Process Unusable After “… - can't proceed with initialization of other mappers.”

Reply via email to