Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

Антонио Антуан Wed, 22 Nov 2017 01:57:07 -0800

Glad to see that you remember my messages :)

I've dived into `loading` module and I see that currently it is really
complicated to store additional data into pkey for each instance.


Unfortunately, suggested solutions not good for my project.
Also, I think that `shard` meaning in my case is not the same as usual.

I want to describe structure of out project, maybe it can help.

Here is definition of our databases structure:
http://joxi.ru/nAyJVvGiXMv0Dr. <http://joxi.ru/nAyJVvGiXMv0Dr>
We got master db and several geo databases. Catalogs like `users`,
`groups`, `offers` and other are replicating to geo databases, so that data
is always the same.
But also we have tables like `clicks` and `leads`. Each app instance
contains the data about them in database, related to its geo:
europe-instance into europe-db, usa-instance into usa-database and so on.
Periodically master-app pulls clicks and leads to master-database. Synced
objects always have different ids into master- and get-db, so it is ok.

But one time project owner came and said: "I need SAAS".
We see, that in current structure it's very hard (and really ugly) to
implement saas-solution. Amount of `Base*`, `Session*`, `Order*` and other
models will be multiplied with tenants amount.

I discovered that I can override `get_bind` with another logic and it was
great: we can remove several `Base` classes, several Sessions and several
`Orders`.

Mechanism looks like this:
- we use one instance on each geo for all tenants.
- we create separated databases for each tenant: this will be multiplied
with tenants amount: http://joxi.ru/nAyJVvGiXMv0Dr.
<http://joxi.ru/nAyJVvGiXMv0Dr>
- we detect `tenant_id` using `request.host` (we use flask): each domain
binds with particular tenant;
- we store `tenant_id` into global storage.
- we use stored `tenant_id` into `Session.get_bind`:
https://gist.github.com/aCLr/9f578a2eeb225e7d65099ffb49aa8f3a
- into flask `teardown_request` we clear `tenant_id` storage and call
`Session.remove()`
- if we need to read from another get, just write `query =
query.set_shard_id(GEO)` `tenant_id`

For celery we use this:
https://gist.github.com/aCLr/d8c5ac38956947da092375b2f89d7b50
Clear to.

All this leads us only to pros, without cons: any developer has no need to
think about database chosing, just write code like there is only one
database. If you need to read from another geo-database, just call
`query.set_shard(GEO)`, tenant will be appended automatically to it.

Problems begin when we tried to test non-flask and non-celery scripts, like
cron tasks: we may want to query several tenant-databases during one
SQLA-transaction, somethins like in my first example:
https://gist.github.com/aCLr/ff9462b634031ee6bccbead8d913c41f
(`assert_got_correct_objects_with_remove`
and `assert_got_cached_objects_without_remove`). The result you know.


During writing this message, I found out, that we need only one additional
data for primary key: `connection.bind.url`. I see, that SQLA already have
it inside `_instance_processor`, so it exists inside `_instance`. I think,
that `identity_key` should be constructed (in my case) with this code:
https://gist.github.com/aCLr/746f92dedb4d303a49033c0db22beced. Clear, don't
you think so?



вт, 21 нояб. 2017 г. в 19:15, Mike Bayer <[email protected]>:

> I've looked to see how hard it would be to allow "supplemental"
> attributes to form part of the mapper's primary key tuple, and it
> would be pretty hard.   The "easy" part is getting the mapper to set
> itself up with some extra attributes that can deliver some kind of
> supplemental value to the identity key.  the harder part is then in
> loading.py where we get new rows from the DB and need this
> value...which means some whole new kind of system would need to
> deliver this for any arbitrary part of the result set given a mapping
> and the selectable we're looking at (keep in mind a Query can have
> lots of the same mapping in a single row with aliases).   This would
> be very complicated to implement and test.   I am not seeing any quick
> way to suit this use case, which has also not ever been requested
> before.
>
>
>
> On Tue, Nov 21, 2017 at 10:12 AM, Mike Bayer <[email protected]>
> wrote:
> > On Tue, Nov 21, 2017 at 7:39 AM, Антонио Антуан <[email protected]>
> wrote:
> >> Hi guys.
> >>
> >> I got this code example:
> >> https://gist.github.com/aCLr/ff9462b634031ee6bccbead8d913c41f.
> >>
> >> Here I make custom `Session` and custom `Query`. As you see, `Session`
> has
> >> several binds.
> >>
> >> Also, you can see that there are two functions:
> >> `assert_got_correct_objects_with_remove` and
> >> `assert_got_cached_objects_without_remove`.
> >>
> >> The first checks that we got correct results if `Session.remove` called.
> >> The second checks, that we got incorrect results if `Session.remove` not
> >> called.
> >>
> >> I understand, that behavior is correct: we don't remove session - so,
> we got
> >> same result from "cache-like"
> >> `sqlalchemy.orm.loading._instance_processor.session_identity_map`.
> >>
> >> I want to avoid that mechanism and don't want to use
> `session_identity_map`
> >> for different binds. In ideal, bind should be used as part of key for
> >> `session_identity_map`, but I guess, that it is not possible.
> >> Another way, acceptable for me: disable this mechanism. But I do not
> found
> >> ways to achieve this.
> >> And the third option: construct instances manually. Looks like I should
> copy
> >> code from `loading` module and add that method to `CustomSession`:
> >
> >
> > there's really no reason at all to use a "ShardedSession" if you have
> > overlapping primary key spaces from each of your binds.   I'm not sure
> > if I mentioned this at the beginning of the emails regarding this
> > project but I hope that I mentioned just using separate Session
> > objects is vastly simpler for non-intricate sharding cases, such as
> > where you always know which shard you care about and you don't care
> > about any of the others for a certain operation.     The point of
> > ShardedSession is so that objects pulled from multiple databases can
> > be intermingled in the same query and in the same transaction - which
> > by definition means they have unique primary keys.   If that's not
> > what you're doing here I don't see what advantage all this complexity
> > is getting you.
> >
> > If you're still convinced you need to be using a monolithic
> > ShardedSession then there needs to be some kind of translation of data
> > such that the mapper sees unique primary keys across the shards, or
> > unique classes.
> >
> > I've tried to think of ways to do this without too much difficulty but
> > none of them are really worth the complexity and hackiness it would
> > require.   The absolutely quickest and most well-supported, no hacks
> > required way would be to properly create a composite primary key on
> > your classes, where the second column is your shard id:
> >
> > class A(Base):
> >     __tablename__ = 'a'
> >
> >     id = Column(Integer, primary_key=True)
> >     shard_id = Column(Integer, primary_key=True)
> >
> > I tried to see if the "shard_id" column can be some kind of expression
> > that is not a Column on the Table but the mapper() is not set up to
> > support this unless you mapped the whole class to a select()
> > construct, which would make for too-complicated SQL, and you'd still
> > need to intercept this select() using events to put the right shard id
> > in.  Another is to create a custom column that renders in a special
> > way, but again you need to create events to intercept it in every case
> > to put the right shard id in, and/or remove it from things like
> > insert() statements.
> >
> > by far your two best solutions are: 1. use separate Session objects
> > per shard  2. make sure your data actually has shard-specific primary
> > keys
> >
> >
> >
> >
> >
> >
> >
> >>
> >> def instances(self, cursor, __context=None):
> >>     context = __context
> >>     if context is None:
> >>         context = QueryContext(self)
> >>     return self._custom_instances(self, cursor, context)
> >>
> >>
> >>
> >> def custom_instances(query, cursor, context):
> >>      """copied from `loading.instances` code with disabled
> >> `session_identity_map`"""
> >>
> >>
> >>
> >> The third way is the most ugly and I want to avoid it.
> >>
> >> Could you help me with my hard choice and, maybe, suggest any other
> ways and
> >> options? :)
> >>
> >> Thank you.
> >>
> >> --
> >> SQLAlchemy -
> >> The Python SQL Toolkit and Object Relational Mapper
> >>
> >> http://www.sqlalchemy.org/
> >>
> >> To post example code, please provide an MCVE: Minimal, Complete, and
> >> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> >> description.
> >> ---
> >> You received this message because you are subscribed to the Google
> Groups
> >> "sqlalchemy" group.
> >> To unsubscribe from this group and stop receiving emails from it, send
> an
> >> email to [email protected].
> >> To post to this group, send email to [email protected].
> >> Visit this group at https://groups.google.com/group/sqlalchemy.
> >> For more options, visit https://groups.google.com/d/optout.
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example.  See  http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.
>
-- 

Антон

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

Reply via email to