Glad to see that you remember my messages :) I've dived into `loading` module and I see that currently it is really complicated to store additional data into pkey for each instance.
Unfortunately, suggested solutions not good for my project. Also, I think that `shard` meaning in my case is not the same as usual. I want to describe structure of out project, maybe it can help. Here is definition of our databases structure: http://joxi.ru/nAyJVvGiXMv0Dr. <http://joxi.ru/nAyJVvGiXMv0Dr> We got master db and several geo databases. Catalogs like `users`, `groups`, `offers` and other are replicating to geo databases, so that data is always the same. But also we have tables like `clicks` and `leads`. Each app instance contains the data about them in database, related to its geo: europe-instance into europe-db, usa-instance into usa-database and so on. Periodically master-app pulls clicks and leads to master-database. Synced objects always have different ids into master- and get-db, so it is ok. But one time project owner came and said: "I need SAAS". We see, that in current structure it's very hard (and really ugly) to implement saas-solution. Amount of `Base*`, `Session*`, `Order*` and other models will be multiplied with tenants amount. I discovered that I can override `get_bind` with another logic and it was great: we can remove several `Base` classes, several Sessions and several `Orders`. Mechanism looks like this: - we use one instance on each geo for all tenants. - we create separated databases for each tenant: this will be multiplied with tenants amount: http://joxi.ru/nAyJVvGiXMv0Dr. <http://joxi.ru/nAyJVvGiXMv0Dr> - we detect `tenant_id` using `request.host` (we use flask): each domain binds with particular tenant; - we store `tenant_id` into global storage. - we use stored `tenant_id` into `Session.get_bind`: https://gist.github.com/aCLr/9f578a2eeb225e7d65099ffb49aa8f3a - into flask `teardown_request` we clear `tenant_id` storage and call `Session.remove()` - if we need to read from another get, just write `query = query.set_shard_id(GEO)` `tenant_id` For celery we use this: https://gist.github.com/aCLr/d8c5ac38956947da092375b2f89d7b50 Clear to. All this leads us only to pros, without cons: any developer has no need to think about database chosing, just write code like there is only one database. If you need to read from another geo-database, just call `query.set_shard(GEO)`, tenant will be appended automatically to it. Problems begin when we tried to test non-flask and non-celery scripts, like cron tasks: we may want to query several tenant-databases during one SQLA-transaction, somethins like in my first example: https://gist.github.com/aCLr/ff9462b634031ee6bccbead8d913c41f (`assert_got_correct_objects_with_remove` and `assert_got_cached_objects_without_remove`). The result you know. During writing this message, I found out, that we need only one additional data for primary key: `connection.bind.url`. I see, that SQLA already have it inside `_instance_processor`, so it exists inside `_instance`. I think, that `identity_key` should be constructed (in my case) with this code: https://gist.github.com/aCLr/746f92dedb4d303a49033c0db22beced. Clear, don't you think so? вт, 21 нояб. 2017 г. в 19:15, Mike Bayer <[email protected]>: > I've looked to see how hard it would be to allow "supplemental" > attributes to form part of the mapper's primary key tuple, and it > would be pretty hard. The "easy" part is getting the mapper to set > itself up with some extra attributes that can deliver some kind of > supplemental value to the identity key. the harder part is then in > loading.py where we get new rows from the DB and need this > value...which means some whole new kind of system would need to > deliver this for any arbitrary part of the result set given a mapping > and the selectable we're looking at (keep in mind a Query can have > lots of the same mapping in a single row with aliases). This would > be very complicated to implement and test. I am not seeing any quick > way to suit this use case, which has also not ever been requested > before. > > > > On Tue, Nov 21, 2017 at 10:12 AM, Mike Bayer <[email protected]> > wrote: > > On Tue, Nov 21, 2017 at 7:39 AM, Антонио Антуан <[email protected]> > wrote: > >> Hi guys. > >> > >> I got this code example: > >> https://gist.github.com/aCLr/ff9462b634031ee6bccbead8d913c41f. > >> > >> Here I make custom `Session` and custom `Query`. As you see, `Session` > has > >> several binds. > >> > >> Also, you can see that there are two functions: > >> `assert_got_correct_objects_with_remove` and > >> `assert_got_cached_objects_without_remove`. > >> > >> The first checks that we got correct results if `Session.remove` called. > >> The second checks, that we got incorrect results if `Session.remove` not > >> called. > >> > >> I understand, that behavior is correct: we don't remove session - so, > we got > >> same result from "cache-like" > >> `sqlalchemy.orm.loading._instance_processor.session_identity_map`. > >> > >> I want to avoid that mechanism and don't want to use > `session_identity_map` > >> for different binds. In ideal, bind should be used as part of key for > >> `session_identity_map`, but I guess, that it is not possible. > >> Another way, acceptable for me: disable this mechanism. But I do not > found > >> ways to achieve this. > >> And the third option: construct instances manually. Looks like I should > copy > >> code from `loading` module and add that method to `CustomSession`: > > > > > > there's really no reason at all to use a "ShardedSession" if you have > > overlapping primary key spaces from each of your binds. I'm not sure > > if I mentioned this at the beginning of the emails regarding this > > project but I hope that I mentioned just using separate Session > > objects is vastly simpler for non-intricate sharding cases, such as > > where you always know which shard you care about and you don't care > > about any of the others for a certain operation. The point of > > ShardedSession is so that objects pulled from multiple databases can > > be intermingled in the same query and in the same transaction - which > > by definition means they have unique primary keys. If that's not > > what you're doing here I don't see what advantage all this complexity > > is getting you. > > > > If you're still convinced you need to be using a monolithic > > ShardedSession then there needs to be some kind of translation of data > > such that the mapper sees unique primary keys across the shards, or > > unique classes. > > > > I've tried to think of ways to do this without too much difficulty but > > none of them are really worth the complexity and hackiness it would > > require. The absolutely quickest and most well-supported, no hacks > > required way would be to properly create a composite primary key on > > your classes, where the second column is your shard id: > > > > class A(Base): > > __tablename__ = 'a' > > > > id = Column(Integer, primary_key=True) > > shard_id = Column(Integer, primary_key=True) > > > > I tried to see if the "shard_id" column can be some kind of expression > > that is not a Column on the Table but the mapper() is not set up to > > support this unless you mapped the whole class to a select() > > construct, which would make for too-complicated SQL, and you'd still > > need to intercept this select() using events to put the right shard id > > in. Another is to create a custom column that renders in a special > > way, but again you need to create events to intercept it in every case > > to put the right shard id in, and/or remove it from things like > > insert() statements. > > > > by far your two best solutions are: 1. use separate Session objects > > per shard 2. make sure your data actually has shard-specific primary > > keys > > > > > > > > > > > > > > > >> > >> def instances(self, cursor, __context=None): > >> context = __context > >> if context is None: > >> context = QueryContext(self) > >> return self._custom_instances(self, cursor, context) > >> > >> > >> > >> def custom_instances(query, cursor, context): > >> """copied from `loading.instances` code with disabled > >> `session_identity_map`""" > >> > >> > >> > >> The third way is the most ugly and I want to avoid it. > >> > >> Could you help me with my hard choice and, maybe, suggest any other > ways and > >> options? :) > >> > >> Thank you. > >> > >> -- > >> SQLAlchemy - > >> The Python SQL Toolkit and Object Relational Mapper > >> > >> http://www.sqlalchemy.org/ > >> > >> To post example code, please provide an MCVE: Minimal, Complete, and > >> Verifiable Example. See http://stackoverflow.com/help/mcve for a full > >> description. > >> --- > >> You received this message because you are subscribed to the Google > Groups > >> "sqlalchemy" group. > >> To unsubscribe from this group and stop receiving emails from it, send > an > >> email to [email protected]. > >> To post to this group, send email to [email protected]. > >> Visit this group at https://groups.google.com/group/sqlalchemy. > >> For more options, visit https://groups.google.com/d/optout. > > -- > SQLAlchemy - > The Python SQL Toolkit and Object Relational Mapper > > http://www.sqlalchemy.org/ > > To post example code, please provide an MCVE: Minimal, Complete, and > Verifiable Example. See http://stackoverflow.com/help/mcve for a full > description. > --- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To post to this group, send email to [email protected]. > Visit this group at https://groups.google.com/group/sqlalchemy. > For more options, visit https://groups.google.com/d/optout. > -- Антон -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at https://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.
