On Wed, Nov 22, 2017 at 4:56 AM, Антонио Антуан <[email protected]> wrote: > Glad to see that you remember my messages :) > > I've dived into `loading` module and I see that currently it is really > complicated to store additional data into pkey for each instance. > > Unfortunately, suggested solutions not good for my project. > Also, I think that `shard` meaning in my case is not the same as usual. > > I want to describe structure of out project, maybe it can help. > > Here is definition of our databases structure: > http://joxi.ru/nAyJVvGiXMv0Dr. > We got master db and several geo databases. Catalogs like `users`, `groups`, > `offers` and other are replicating to geo databases, so that data is always > the same. > But also we have tables like `clicks` and `leads`. Each app instance > contains the data about them in database, related to its geo: > europe-instance into europe-db, usa-instance into usa-database and so on. > Periodically master-app pulls clicks and leads to master-database. Synced > objects always have different ids into master- and get-db, so it is ok. > > But one time project owner came and said: "I need SAAS". > We see, that in current structure it's very hard (and really ugly) to > implement saas-solution. Amount of `Base*`, `Session*`, `Order*` and other > models will be multiplied with tenants amount. > > I discovered that I can override `get_bind` with another logic and it was > great: we can remove several `Base` classes, several Sessions and several > `Orders`. > > Mechanism looks like this: > - we use one instance on each geo for all tenants. > - we create separated databases for each tenant: this will be multiplied > with tenants amount: http://joxi.ru/nAyJVvGiXMv0Dr. > - we detect `tenant_id` using `request.host` (we use flask): each domain > binds with particular tenant; > - we store `tenant_id` into global storage. > - we use stored `tenant_id` into `Session.get_bind`: > https://gist.github.com/aCLr/9f578a2eeb225e7d65099ffb49aa8f3a > - into flask `teardown_request` we clear `tenant_id` storage and call > `Session.remove()` > - if we need to read from another get, just write `query = > query.set_shard_id(GEO)` `tenant_id` > > For celery we use this: > https://gist.github.com/aCLr/d8c5ac38956947da092375b2f89d7b50 > Clear to. > > All this leads us only to pros, without cons: any developer has no need to > think about database chosing, just write code like there is only one > database. If you need to read from another geo-database, just call > `query.set_shard(GEO)`, tenant will be appended automatically to it. > > Problems begin when we tried to test non-flask and non-celery scripts, like > cron tasks: we may want to query several tenant-databases during one > SQLA-transaction, somethins like in my first example: > https://gist.github.com/aCLr/ff9462b634031ee6bccbead8d913c41f > (`assert_got_correct_objects_with_remove` and > `assert_got_cached_objects_without_remove`). The result you know. > > > During writing this message, I found out, that we need only one additional > data for primary key: `connection.bind.url`. I see, that SQLA already have > it inside `_instance_processor`, so it exists inside `_instance`. I think, > that `identity_key` should be constructed (in my case) with this code: > https://gist.github.com/aCLr/746f92dedb4d303a49033c0db22beced. Clear, don't > you think so?
that's where something needs to happen but SQLAlchemy can't do this in such a way that is hardcoded to exactly your particular use case. A Query can have lots of entities in it, and if you're doing sharding a single result set can refer to any number of shard identifiers within not just a single result set but within a single row; they might have come from dozens of different databases at once. This could only be suited with a very open plugin point that is carefully architected, tested, and documented and I don't have the resources to envision this for a short-term use case. > Problems begin when we tried to test non-flask and non-celery scripts, like > cron tasks: we may want to query several tenant-databases during one > SQLA-transaction, I don't understand why you can't use independent sessions for this, because when you query two different databases, you are using two independent transactions in any case; they are only coordinated if one is using two-phase transactions which I doubt is the case here (while SQLAlchemy put a lot of work into making that possible, I don't think anyone has ever used that feature). Instead of: Session.query(Foo).set_shard(id) you say: sessions.get(shard_id).query(Foo) > > > > вт, 21 нояб. 2017 г. в 19:15, Mike Bayer <[email protected]>: >> >> I've looked to see how hard it would be to allow "supplemental" >> attributes to form part of the mapper's primary key tuple, and it >> would be pretty hard. The "easy" part is getting the mapper to set >> itself up with some extra attributes that can deliver some kind of >> supplemental value to the identity key. the harder part is then in >> loading.py where we get new rows from the DB and need this >> value...which means some whole new kind of system would need to >> deliver this for any arbitrary part of the result set given a mapping >> and the selectable we're looking at (keep in mind a Query can have >> lots of the same mapping in a single row with aliases). This would >> be very complicated to implement and test. I am not seeing any quick >> way to suit this use case, which has also not ever been requested >> before. >> >> >> >> On Tue, Nov 21, 2017 at 10:12 AM, Mike Bayer <[email protected]> >> wrote: >> > On Tue, Nov 21, 2017 at 7:39 AM, Антонио Антуан <[email protected]> >> > wrote: >> >> Hi guys. >> >> >> >> I got this code example: >> >> https://gist.github.com/aCLr/ff9462b634031ee6bccbead8d913c41f. >> >> >> >> Here I make custom `Session` and custom `Query`. As you see, `Session` >> >> has >> >> several binds. >> >> >> >> Also, you can see that there are two functions: >> >> `assert_got_correct_objects_with_remove` and >> >> `assert_got_cached_objects_without_remove`. >> >> >> >> The first checks that we got correct results if `Session.remove` >> >> called. >> >> The second checks, that we got incorrect results if `Session.remove` >> >> not >> >> called. >> >> >> >> I understand, that behavior is correct: we don't remove session - so, >> >> we got >> >> same result from "cache-like" >> >> `sqlalchemy.orm.loading._instance_processor.session_identity_map`. >> >> >> >> I want to avoid that mechanism and don't want to use >> >> `session_identity_map` >> >> for different binds. In ideal, bind should be used as part of key for >> >> `session_identity_map`, but I guess, that it is not possible. >> >> Another way, acceptable for me: disable this mechanism. But I do not >> >> found >> >> ways to achieve this. >> >> And the third option: construct instances manually. Looks like I should >> >> copy >> >> code from `loading` module and add that method to `CustomSession`: >> > >> > >> > there's really no reason at all to use a "ShardedSession" if you have >> > overlapping primary key spaces from each of your binds. I'm not sure >> > if I mentioned this at the beginning of the emails regarding this >> > project but I hope that I mentioned just using separate Session >> > objects is vastly simpler for non-intricate sharding cases, such as >> > where you always know which shard you care about and you don't care >> > about any of the others for a certain operation. The point of >> > ShardedSession is so that objects pulled from multiple databases can >> > be intermingled in the same query and in the same transaction - which >> > by definition means they have unique primary keys. If that's not >> > what you're doing here I don't see what advantage all this complexity >> > is getting you. >> > >> > If you're still convinced you need to be using a monolithic >> > ShardedSession then there needs to be some kind of translation of data >> > such that the mapper sees unique primary keys across the shards, or >> > unique classes. >> > >> > I've tried to think of ways to do this without too much difficulty but >> > none of them are really worth the complexity and hackiness it would >> > require. The absolutely quickest and most well-supported, no hacks >> > required way would be to properly create a composite primary key on >> > your classes, where the second column is your shard id: >> > >> > class A(Base): >> > __tablename__ = 'a' >> > >> > id = Column(Integer, primary_key=True) >> > shard_id = Column(Integer, primary_key=True) >> > >> > I tried to see if the "shard_id" column can be some kind of expression >> > that is not a Column on the Table but the mapper() is not set up to >> > support this unless you mapped the whole class to a select() >> > construct, which would make for too-complicated SQL, and you'd still >> > need to intercept this select() using events to put the right shard id >> > in. Another is to create a custom column that renders in a special >> > way, but again you need to create events to intercept it in every case >> > to put the right shard id in, and/or remove it from things like >> > insert() statements. >> > >> > by far your two best solutions are: 1. use separate Session objects >> > per shard 2. make sure your data actually has shard-specific primary >> > keys >> > >> > >> > >> > >> > >> > >> > >> >> >> >> def instances(self, cursor, __context=None): >> >> context = __context >> >> if context is None: >> >> context = QueryContext(self) >> >> return self._custom_instances(self, cursor, context) >> >> >> >> >> >> >> >> def custom_instances(query, cursor, context): >> >> """copied from `loading.instances` code with disabled >> >> `session_identity_map`""" >> >> >> >> >> >> >> >> The third way is the most ugly and I want to avoid it. >> >> >> >> Could you help me with my hard choice and, maybe, suggest any other >> >> ways and >> >> options? :) >> >> >> >> Thank you. >> >> >> >> -- >> >> SQLAlchemy - >> >> The Python SQL Toolkit and Object Relational Mapper >> >> >> >> http://www.sqlalchemy.org/ >> >> >> >> To post example code, please provide an MCVE: Minimal, Complete, and >> >> Verifiable Example. See http://stackoverflow.com/help/mcve for a full >> >> description. >> >> --- >> >> You received this message because you are subscribed to the Google >> >> Groups >> >> "sqlalchemy" group. >> >> To unsubscribe from this group and stop receiving emails from it, send >> >> an >> >> email to [email protected]. >> >> To post to this group, send email to [email protected]. >> >> Visit this group at https://groups.google.com/group/sqlalchemy. >> >> For more options, visit https://groups.google.com/d/optout. >> >> -- >> SQLAlchemy - >> The Python SQL Toolkit and Object Relational Mapper >> >> http://www.sqlalchemy.org/ >> >> To post example code, please provide an MCVE: Minimal, Complete, and >> Verifiable Example. See http://stackoverflow.com/help/mcve for a full >> description. >> --- >> You received this message because you are subscribed to the Google Groups >> "sqlalchemy" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected]. >> To post to this group, send email to [email protected]. >> Visit this group at https://groups.google.com/group/sqlalchemy. >> For more options, visit https://groups.google.com/d/optout. > > -- > > Антон > > -- > SQLAlchemy - > The Python SQL Toolkit and Object Relational Mapper > > http://www.sqlalchemy.org/ > > To post example code, please provide an MCVE: Minimal, Complete, and > Verifiable Example. See http://stackoverflow.com/help/mcve for a full > description. > --- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To post to this group, send email to [email protected]. > Visit this group at https://groups.google.com/group/sqlalchemy. > For more options, visit https://groups.google.com/d/optout. -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at https://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.
