Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

Mike Bayer Wed, 22 Nov 2017 13:21:07 -0800

On Wed, Nov 22, 2017 at 4:56 AM, Антонио Антуан <[email protected]> wrote:
> Glad to see that you remember my messages :)
>
> I've dived into `loading` module and I see that currently it is really
> complicated to store additional data into pkey for each instance.
>
> Unfortunately, suggested solutions not good for my project.
> Also, I think that `shard` meaning in my case is not the same as usual.
>
> I want to describe structure of out project, maybe it can help.
>
> Here is definition of our databases structure:
> http://joxi.ru/nAyJVvGiXMv0Dr.
> We got master db and several geo databases. Catalogs like `users`, `groups`,
> `offers` and other are replicating to geo databases, so that data is always
> the same.
> But also we have tables like `clicks` and `leads`. Each app instance
> contains the data about them in database, related to its geo:
> europe-instance into europe-db, usa-instance into usa-database and so on.
> Periodically master-app pulls clicks and leads to master-database. Synced
> objects always have different ids into master- and get-db, so it is ok.
>
> But one time project owner came and said: "I need SAAS".
> We see, that in current structure it's very hard (and really ugly) to
> implement saas-solution. Amount of `Base*`, `Session*`, `Order*` and other
> models will be multiplied with tenants amount.
>
> I discovered that I can override `get_bind` with another logic and it was
> great: we can remove several `Base` classes, several Sessions and several
> `Orders`.
>
> Mechanism looks like this:
> - we use one instance on each geo for all tenants.
> - we create separated databases for each tenant: this will be multiplied
> with tenants amount: http://joxi.ru/nAyJVvGiXMv0Dr.
> - we detect `tenant_id` using `request.host` (we use flask): each domain
> binds with particular tenant;
> - we store `tenant_id` into global storage.
> - we use stored `tenant_id` into `Session.get_bind`:
> https://gist.github.com/aCLr/9f578a2eeb225e7d65099ffb49aa8f3a
> - into flask `teardown_request` we clear `tenant_id` storage and call
> `Session.remove()`
> - if we need to read from another get, just write `query =
> query.set_shard_id(GEO)` `tenant_id`
>
> For celery we use this:
> https://gist.github.com/aCLr/d8c5ac38956947da092375b2f89d7b50
> Clear to.
>
> All this leads us only to pros, without cons: any developer has no need to
> think about database chosing, just write code like there is only one
> database. If you need to read from another geo-database, just call
> `query.set_shard(GEO)`, tenant will be appended automatically to it.
>
> Problems begin when we tried to test non-flask and non-celery scripts, like
> cron tasks: we may want to query several tenant-databases during one
> SQLA-transaction, somethins like in my first example:
> https://gist.github.com/aCLr/ff9462b634031ee6bccbead8d913c41f
> (`assert_got_correct_objects_with_remove` and
> `assert_got_cached_objects_without_remove`). The result you know.
>
>
> During writing this message, I found out, that we need only one additional
> data for primary key: `connection.bind.url`. I see, that SQLA already have
> it inside `_instance_processor`, so it exists inside `_instance`. I think,
> that `identity_key` should be constructed (in my case) with this code:
> https://gist.github.com/aCLr/746f92dedb4d303a49033c0db22beced. Clear, don't
> you think so?


that's where something needs to happen but SQLAlchemy can't do this in
such a way that is hardcoded to exactly your particular use case.   A
Query can have lots of entities in it, and if you're doing sharding a
single result set can refer to any number of shard identifiers within
not just a single result set but within a single row; they might have
come from dozens of different databases at once.  This could only be
suited with a very open plugin point that is carefully architected,
tested, and documented and I don't have the resources to envision this
for a short-term use case.

> Problems begin when we tried to test non-flask and non-celery scripts, like
> cron tasks: we may want to query several tenant-databases during one
> SQLA-transaction,

I don't understand why you can't use independent sessions for this,
because when you query two different databases, you are using two
independent transactions in any case;  they are only coordinated if
one is using two-phase transactions which I doubt is the case here
(while SQLAlchemy put a lot of work into making that possible, I don't
think anyone has ever used that feature).

Instead of:

Session.query(Foo).set_shard(id)

you say:

sessions.get(shard_id).query(Foo)





>
>
>
> вт, 21 нояб. 2017 г. в 19:15, Mike Bayer <[email protected]>:
>>
>> I've looked to see how hard it would be to allow "supplemental"
>> attributes to form part of the mapper's primary key tuple, and it
>> would be pretty hard.   The "easy" part is getting the mapper to set
>> itself up with some extra attributes that can deliver some kind of
>> supplemental value to the identity key.  the harder part is then in
>> loading.py where we get new rows from the DB and need this
>> value...which means some whole new kind of system would need to
>> deliver this for any arbitrary part of the result set given a mapping
>> and the selectable we're looking at (keep in mind a Query can have
>> lots of the same mapping in a single row with aliases).   This would
>> be very complicated to implement and test.   I am not seeing any quick
>> way to suit this use case, which has also not ever been requested
>> before.
>>
>>
>>
>> On Tue, Nov 21, 2017 at 10:12 AM, Mike Bayer <[email protected]>
>> wrote:
>> > On Tue, Nov 21, 2017 at 7:39 AM, Антонио Антуан <[email protected]>
>> > wrote:
>> >> Hi guys.
>> >>
>> >> I got this code example:
>> >> https://gist.github.com/aCLr/ff9462b634031ee6bccbead8d913c41f.
>> >>
>> >> Here I make custom `Session` and custom `Query`. As you see, `Session`
>> >> has
>> >> several binds.
>> >>
>> >> Also, you can see that there are two functions:
>> >> `assert_got_correct_objects_with_remove` and
>> >> `assert_got_cached_objects_without_remove`.
>> >>
>> >> The first checks that we got correct results if `Session.remove`
>> >> called.
>> >> The second checks, that we got incorrect results if `Session.remove`
>> >> not
>> >> called.
>> >>
>> >> I understand, that behavior is correct: we don't remove session - so,
>> >> we got
>> >> same result from "cache-like"
>> >> `sqlalchemy.orm.loading._instance_processor.session_identity_map`.
>> >>
>> >> I want to avoid that mechanism and don't want to use
>> >> `session_identity_map`
>> >> for different binds. In ideal, bind should be used as part of key for
>> >> `session_identity_map`, but I guess, that it is not possible.
>> >> Another way, acceptable for me: disable this mechanism. But I do not
>> >> found
>> >> ways to achieve this.
>> >> And the third option: construct instances manually. Looks like I should
>> >> copy
>> >> code from `loading` module and add that method to `CustomSession`:
>> >
>> >
>> > there's really no reason at all to use a "ShardedSession" if you have
>> > overlapping primary key spaces from each of your binds.   I'm not sure
>> > if I mentioned this at the beginning of the emails regarding this
>> > project but I hope that I mentioned just using separate Session
>> > objects is vastly simpler for non-intricate sharding cases, such as
>> > where you always know which shard you care about and you don't care
>> > about any of the others for a certain operation.     The point of
>> > ShardedSession is so that objects pulled from multiple databases can
>> > be intermingled in the same query and in the same transaction - which
>> > by definition means they have unique primary keys.   If that's not
>> > what you're doing here I don't see what advantage all this complexity
>> > is getting you.
>> >
>> > If you're still convinced you need to be using a monolithic
>> > ShardedSession then there needs to be some kind of translation of data
>> > such that the mapper sees unique primary keys across the shards, or
>> > unique classes.
>> >
>> > I've tried to think of ways to do this without too much difficulty but
>> > none of them are really worth the complexity and hackiness it would
>> > require.   The absolutely quickest and most well-supported, no hacks
>> > required way would be to properly create a composite primary key on
>> > your classes, where the second column is your shard id:
>> >
>> > class A(Base):
>> >     __tablename__ = 'a'
>> >
>> >     id = Column(Integer, primary_key=True)
>> >     shard_id = Column(Integer, primary_key=True)
>> >
>> > I tried to see if the "shard_id" column can be some kind of expression
>> > that is not a Column on the Table but the mapper() is not set up to
>> > support this unless you mapped the whole class to a select()
>> > construct, which would make for too-complicated SQL, and you'd still
>> > need to intercept this select() using events to put the right shard id
>> > in.  Another is to create a custom column that renders in a special
>> > way, but again you need to create events to intercept it in every case
>> > to put the right shard id in, and/or remove it from things like
>> > insert() statements.
>> >
>> > by far your two best solutions are: 1. use separate Session objects
>> > per shard  2. make sure your data actually has shard-specific primary
>> > keys
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >>
>> >> def instances(self, cursor, __context=None):
>> >>     context = __context
>> >>     if context is None:
>> >>         context = QueryContext(self)
>> >>     return self._custom_instances(self, cursor, context)
>> >>
>> >>
>> >>
>> >> def custom_instances(query, cursor, context):
>> >>      """copied from `loading.instances` code with disabled
>> >> `session_identity_map`"""
>> >>
>> >>
>> >>
>> >> The third way is the most ugly and I want to avoid it.
>> >>
>> >> Could you help me with my hard choice and, maybe, suggest any other
>> >> ways and
>> >> options? :)
>> >>
>> >> Thank you.
>> >>
>> >> --
>> >> SQLAlchemy -
>> >> The Python SQL Toolkit and Object Relational Mapper
>> >>
>> >> http://www.sqlalchemy.org/
>> >>
>> >> To post example code, please provide an MCVE: Minimal, Complete, and
>> >> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
>> >> description.
>> >> ---
>> >> You received this message because you are subscribed to the Google
>> >> Groups
>> >> "sqlalchemy" group.
>> >> To unsubscribe from this group and stop receiving emails from it, send
>> >> an
>> >> email to [email protected].
>> >> To post to this group, send email to [email protected].
>> >> Visit this group at https://groups.google.com/group/sqlalchemy.
>> >> For more options, visit https://groups.google.com/d/optout.
>>
>> --
>> SQLAlchemy -
>> The Python SQL Toolkit and Object Relational Mapper
>>
>> http://www.sqlalchemy.org/
>>
>> To post example code, please provide an MCVE: Minimal, Complete, and
>> Verifiable Example.  See  http://stackoverflow.com/help/mcve for a full
>> description.
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "sqlalchemy" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/sqlalchemy.
>> For more options, visit https://groups.google.com/d/optout.
>
> --
>
> Антон
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

Reply via email to