Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`
On Thu, Nov 23, 2017 at 6:55 PM, Антонио Антуан wrote:
>
> Thu, 23 Nov 2017 at 20:27, Mike Bayer:
>>
>> On Thu, Nov 23, 2017 at 8:44 AM, Антонио Антуан wrote:
>> >
>> >> A Query can have lots of entities in it, and if you're doing sharding a
>> >> single result set can refer to any number of shard identifiers within
>> >> not just a single result set but within a single row; they might have
>> >> come from dozens of different databases at once
>> >
>> > In my case it is not possible: all entities in query can be gotten only
>> > from one particular shard. We have totally the same database structure
>> > for each shard. The difference is just data stored into database. No
>> > `shard_id` or any other key as part of primary key for any table.
>>
>> so just to note, these aren't "shards", they're tenants. you have a
>> multi-tenant application, which is normally a really easy thing. but
>> you have a few side applications that want to "cheat" and use the
>> per-tenant object model across multiple tenants simultaneously in the
>> scope of a single Session.
>>
>> > If I want to make query
>> > for particular database I always want to retrieve data ONLY from that
>> > database. And even more than that: ONLY one database during one session
>> > transaction (or, in other words, one http-request to our app).
>>
>> if you have one "tenant id" per HTTP request, the standard HTTP
>> request pattern is one Session per request. There's no problem in
>> that case. You mentioned you have some non-flask applications that
>> want to communicate with multiple tenants in one Session.
>
> Yes, you're right. We have some offline actions, when we want to ask each
> tenant about something specific.
> I see, that currently the most safe way is to call `commit`, `rollback`,
> `remove` or `expunge_all` on session instance: all this methods drops
> identity map. Please let me know if I'm wrong.

A couple of things here.
First, you are using ScopedSession, which is essentially a wrapper
around an actual Session. The commit(), rollback() and expunge_all()
methods are proxies that pass directly through to the underlying
Session. I believe commit() and rollback() *expire* instances (so
attributes will be reloaded on the next access), but don't actually
remove them from the identity map (but I could be wrong about this).

remove() is not a Session method though - it tells the ScopedSession to
discard the current Session. A new Session will be created the next
time you call one of the proxied methods.

The default behaviour for ScopedSession is to use thread-locals, so
each thread gets its own Session. However, you can provide your own
scoping function that does whatever you want:

http://docs.sqlalchemy.org/en/latest/orm/contextual.html#using-custom-created-scopes

It sounds like you could pass a "get_current_tenant" function to the
scoped session, along with a custom session_factory, to get the
behaviour you want. Something like this (untested, would definitely
require more care to make it thread-safe and so on):

    import sqlalchemy as sa
    import sqlalchemy.orm as saorm

    class SessionManager(object):
        def __init__(self, tenant_uris):
            self.engines = {}
            self.current_tenant = None
            for name, dburi in tenant_uris.items():
                # One engine per tenant, created from that tenant's URI.
                self.engines[name] = sa.create_engine(dburi)
            self.sessionmaker = saorm.sessionmaker()

        def get_current_tenant(self):
            return self.current_tenant

        def set_current_tenant(self, name):
            self.current_tenant = name

        def create_session(self):
            # Bind the new Session to the current tenant's engine.
            engine = self.engines[self.current_tenant]
            return self.sessionmaker(bind=engine)

    tenant_uris = {
        'one': 'mysql://...',
        'two': 'mysql://...',
    }
    manager = SessionManager(tenant_uris)
    Session = saorm.scoped_session(manager.create_session,
                                   scopefunc=manager.get_current_tenant)
    # Base is your existing declarative base class.
    Base.query = Session.query_property()

As long as you call manager.set_current_tenant whenever you switch to
querying a different tenant, this ought to work.
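The scoping idea described above can be modelled without SQLAlchemy at all. The sketch below is a toy registry, not the library's implementation: `ToyScopedRegistry`, `FakeSession` and the tenant names are hypothetical, and a real scoped session adds method proxying and session cleanup on top of this. It only shows that the value returned by the scope function is the key under which one object per tenant is created and then reused.

```python
class FakeSession(object):
    """Stand-in for a real Session; it only remembers its tenant."""

    def __init__(self, tenant):
        self.tenant = tenant


class ToyScopedRegistry(object):
    """Toy model of a scoped registry: one object per scope key."""

    def __init__(self, createfunc, scopefunc):
        self.createfunc = createfunc
        self.scopefunc = scopefunc
        self.registry = {}

    def __call__(self):
        key = self.scopefunc()
        if key not in self.registry:
            # First use in this scope: build a new object and remember it.
            self.registry[key] = self.createfunc()
        return self.registry[key]

    def remove(self):
        # Discard the object registered for the current scope, if any.
        self.registry.pop(self.scopefunc(), None)


current = {"tenant": "one"}  # hypothetical holder for the "current tenant"

registry = ToyScopedRegistry(
    createfunc=lambda: FakeSession(current["tenant"]),
    scopefunc=lambda: current["tenant"],
)

session_one = registry()
current["tenant"] = "two"
session_two = registry()        # different scope key, so a different object
current["tenant"] = "one"
session_one_again = registry()  # back in the first scope: same object as before
```

Switching the tenant changes the scope key, so each tenant ends up with its own session object and therefore its own identity map, which is the behaviour the scoped-session proposal relies on.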
But note that all of this confusion and complexity stems from using
scoped sessions and Base.query. If you used explicit sessions
everywhere, you would probably find your code less magical and easier
to understand.

Hope that helps,

Simon

--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and
Verifiable Example. See http://stackoverflow.com/help/mcve for a full
description.
---
You received this message because you are subscribed to the Google
Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.
Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`
Thu, 23 Nov 2017 at 20:27, Mike Bayer:

> On Thu, Nov 23, 2017 at 8:44 AM, Антонио Антуан wrote:
> >
> >> A Query can have lots of entities in it, and if you're doing sharding a
> >> single result set can refer to any number of shard identifiers within
> >> not just a single result set but within a single row; they might have
> >> come from dozens of different databases at once
> >
> > In my case it is not possible: all entities in query can be gotten only
> > from one particular shard. We have totally the same database structure
> > for each shard. The difference is just data stored into database. No
> > `shard_id` or any other key as part of primary key for any table.
>
> so just to note, these aren't "shards", they're tenants. you have a
> multi-tenant application, which is normally a really easy thing. but
> you have a few side applications that want to "cheat" and use the
> per-tenant object model across multiple tenants simultaneously in the
> scope of a single Session.
>
> > If I want to make query
> > for particular database I always want to retrieve data ONLY from that
> > database. And even more than that: ONLY one database during one session
> > transaction (or, in other words, one http-request to our app).
>
> if you have one "tenant id" per HTTP request, the standard HTTP
> request pattern is one Session per request. There's no problem in
> that case. You mentioned you have some non-flask applications that
> want to communicate with multiple tenants in one Session.

Yes, you're right. We have some offline actions where we want to ask
each tenant about something specific.
I see that currently the safest way is to call `commit`, `rollback`,
`remove` or `expunge_all` on the session instance: all of these methods
drop the identity map. Please let me know if I'm wrong.
Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`
On Thu, Nov 23, 2017 at 5:27 PM, Mike Bayer wrote:
> On Thu, Nov 23, 2017 at 8:44 AM, Антонио Антуан wrote:
>>
>>> A Query can have lots of entities in it, and if you're doing sharding a
>>> single result set can refer to any number of shard identifiers within
>>> not just a single result set but within a single row; they might have
>>> come from dozens of different databases at once
>>
>> In my case it is not possible: all entities in query can be gotten only from
>> one particular shard. We have totally the same database structure for each
>> shard. The difference is just data stored into database. No `shard_id` or
>> any other key as part of primary key for any table.
>
> so just to note, these aren't "shards", they're tenants. you have a
> multi-tenant application, which is normally a really easy thing. but
> you have a few side applications that want to "cheat" and use the
> per-tenant object model across multiple tenants simultaneously in the
> scope of a single Session.
>
>> If I want to make query
>> for particular database I always want to retrieve data ONLY from that
>> database. And even more than that: ONLY one database during one session
>> transaction (or, in other words, one http-request to our app).
>
> if you have one "tenant id" per HTTP request, the standard HTTP
> request pattern is one Session per request. There's no problem in
> that case. You mentioned you have some non-flask applications that
> want to communicate with multiple tenants in one Session.

OP, can you describe in more detail why these applications need to talk
to multiple tenant databases in a single session? Perhaps there might
be an alternative way to approach that.

Simon
Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`
On Thu, Nov 23, 2017 at 8:44 AM, Антонио Антуан wrote:
>
>> A Query can have lots of entities in it, and if you're doing sharding a
>> single result set can refer to any number of shard identifiers within
>> not just a single result set but within a single row; they might have
>> come from dozens of different databases at once
>
> In my case it is not possible: all entities in query can be gotten only from
> one particular shard. We have totally the same database structure for each
> shard. The difference is just data stored into database. No `shard_id` or
> any other key as part of primary key for any table.

so just to note, these aren't "shards", they're tenants. you have a
multi-tenant application, which is normally a really easy thing. but
you have a few side applications that want to "cheat" and use the
per-tenant object model across multiple tenants simultaneously in the
scope of a single Session.

> If I want to make query
> for particular database I always want to retrieve data ONLY from that
> database. And even more than that: ONLY one database during one session
> transaction (or, in other words, one http-request to our app).

if you have one "tenant id" per HTTP request, the standard HTTP
request pattern is one Session per request. There's no problem in
that case. You mentioned you have some non-flask applications that
want to communicate with multiple tenants in one Session.

>
>> This could only be suited with a very open plugin point that is carefully
>> architected, tested, and documented and I don't have the resources to
>> envision this for a short-term use case.
>
> I've seen a lot of questions (for example in stackoverflow) "how to manage
> several binds with sqlalchemy" and all answers are "Use separated sessions".

it's probably the best answer. feel free to show specifics and I can
determine if their request would fit this hypothetical feature
otherwise.

> I really not understand, what is the problem to implement "multibound"
> session.
The Session has always supported multiple binds. There are two
levels supported. One is per table/mapper:

http://docs.sqlalchemy.org/en/latest/orm/persistence_techniques.html#simple-vertical-partitioning

you could probably adapt your multiple tenants into individual
mappings if there are a limited number, see the approach at
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/EntityName.

The other level is per primary key, that is, each primary key in the
identity map has a different bind. that's horizontal sharding.

You're looking for a new level, which is, multiple binds for *one*
primary key. This is an intricate feature request that is feasible
but not in the short term.

> Don't have ideal vocabulary to explain how our team like it :)
> Currently I see only one problem: loading instances. Of course, after fixing
> other problems may appear...
>
> I can (and want) make it as part of SQLAlchemy library. Fully-tested part,
> of course. If you say that it is bad idea, ok then.

it's not a bad idea. It's just difficult, and I can't do it right now.

> I can make it as a
> plugin, but there is a problem: functions in `loading` module are monolithic
> and it needs some refactor for the plugin. May I suggest refactor as pull
> request?

you can do a pull request but note that the PR process for SQLAlchemy
is not quick. 90% of code-related pull requests I get have no tests,
no documentation, or anything. More elaborate feature requests
typically involve that I end up doing the whole thing myself in any
case, using the submitter's original code as just a sketch, which
means that more involved PRs are usually just another form of feature
request. These PRs are almost always for Core level features as the
Core is easier for outside contributors to work on. ORM-level
contributions are extremely rare these days but of course I welcome
contributors for the ORM.
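The levels of binding discussed in this message can be read as different answers to the question "what goes into the identity-map key?". The following is a toy model rather than SQLAlchemy's loading code; `ToyIdentityMap`, `load` and the tenant names are made up for illustration. It sketches why one map keyed only by class and primary key hands back the first tenant's object for the second tenant's row, and why a key that also carries the tenant (or, equivalently, one map per tenant, as with separate Sessions) does not.

```python
class ToyIdentityMap(object):
    """Toy identity map: returns the cached object when the key is known."""

    def __init__(self, include_tenant):
        self.include_tenant = include_tenant
        self.objects = {}

    def load(self, tenant, cls_name, pk, row):
        # Build the identity key, optionally including the tenant.
        if self.include_tenant:
            key = (tenant, cls_name, pk)
        else:
            key = (cls_name, pk)
        if key not in self.objects:
            # Unknown identity: build a new object from the row.
            self.objects[key] = dict(row)
        # Known identity: the incoming row is ignored, the cached object wins.
        return self.objects[key]


row_one = {"id": 1, "name": "lead from tenant one"}
row_two = {"id": 1, "name": "lead from tenant two"}

# One shared map keyed by (class, pk): the second tenant's row collides.
shared = ToyIdentityMap(include_tenant=False)
first = shared.load("one", "Lead", 1, row_one)
second = shared.load("two", "Lead", 1, row_two)

# Key that also carries the tenant: the two rows stay separate objects.
per_tenant = ToyIdentityMap(include_tenant=True)
a = per_tenant.load("one", "Lead", 1, row_one)
b = per_tenant.load("two", "Lead", 1, row_two)
```

In the shared map, `second` is the same object as `first` and still carries tenant one's data, which mirrors the "query one bind, get an entity from another" symptom reported in this thread.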
> And if so: could it be merged not only for major release but for,
> at least, 1.0.* (yes, in our project we use 1.0.19 :) )?

The 1.0 series is in "maintenance" mode and as soon as 1.2 is released
(which is hopefully by end of year) it will go into "Security" mode.
There are no more 1.0 releases scheduled. It is not reasonable to be
doing new SQLAlchemy-oriented development without first upgrading your
application to the latest release which in this case is 1.1.15.

> I really don't want (of course!) to copy entire `loading` module for
> additional logic of `identitykey` construction. But currently I do not see
> any other way to implement it for my project :(
>
>> when you query two different databases, you are using
>> two independent transactions in any case;
>
> So, what is the difference, if there are two transactions in any case? :)

because you would have two identity maps

>
>> I don't understand why you can't use independent sessions
>
> The first problem is `query` property of `Base` instances. If we use several
> sessions, we
Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`
Note that `Query.get` uses `get_from_identity` instead of `instances`,
with the key constructed in `_get_impl`.
Could you point to other places where the same problem (described in
my first message) can appear?
Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`
> A Query can have lots of entities in it, and if you're doing sharding a
> single result set can refer to any number of shard identifiers within
> not just a single result set but within a single row; they might have
> come from dozens of different databases at once

In my case that is not possible: all entities in a query can come only
from one particular shard. We have exactly the same database structure
for each shard. The only difference is the data stored in each
database. There is no `shard_id` or any other key as part of the
primary key of any table. If I query a particular database, I always
want to retrieve data ONLY from that database. And even more than
that: ONLY one database during one session transaction (or, in other
words, one http-request to our app).

> This could only be suited with a very open plugin point that is carefully
> architected, tested, and documented and I don't have the resources to
> envision this for a short-term use case.

I've seen a lot of questions (for example on stackoverflow) asking "how
to manage several binds with sqlalchemy", and all the answers are "Use
separate sessions". I really do not understand what the problem is with
implementing a "multibound" session. I did it in my project and it's
really beautiful, clear and... I don't have the vocabulary to explain
how much our team likes it :)
Currently I see only one problem: loading instances. Of course, after
fixing it other problems may appear...

I can (and want to) make it part of the SQLAlchemy library. A fully
tested part, of course. If you say that it is a bad idea, ok then. I
can make it a plugin, but there is a problem: the functions in the
`loading` module are monolithic and need some refactoring for the
plugin. May I suggest the refactoring as a pull request? And if so:
could it be merged not only for a major release but also for, at least,
1.0.* (yes, in our project we use 1.0.19 :) )?
I really don't want (of course!) to copy the entire `loading` module
for the additional logic of `identitykey` construction.
But currently I do not see any other way to implement it for my
project :(

> when you query two different databases, you are using
> two independent transactions in any case;

So, what is the difference, if there are two transactions in any
case? :)

> I don't understand why you can't use independent sessions

The first problem is the `query` property of `Base` instances. If we
use several sessions, we need the same number of `Base` classes and,
consequently, the same number of models, don't we?
Another problem for us is the already existing code. We could use a
session registry, but it would take many months to rewrite the entire
project. Another way: add code like this to `Query.__iter__`:
>> self.with_session(sessions.get(self._shard_id))
>> return super(Query, self).__iter__()
But it has no effect on UPDATE and INSERT queries. Also, I'm not sure
that there are no problems with that approach...

I have one more thought. Don't you think that this is some kind of
bug: I query one bind and get an entity from another. Yes, that
behavior is not foreseen by the library. But on the other hand, the
library docs have examples of how to use several binds within one
session. So the problem may happen not only in my case.

Anyway, can my suggestion
(https://gist.github.com/aCLr/746f92dedb4d303a49033c0db22beced) have
any effect on a classic one-bound `Session`? If it can't, then what is
the problem? :)

Excuse me for wasting your time.
And excuse me if my suggestions are idiotic :)

Appreciate your help.

Thu, 23 Nov 2017 at 0:20, Mike Bayer:

> On Wed, Nov 22, 2017 at 4:56 AM, Антонио Антуан wrote:
> > Glad to see that you remember my messages :)
> >
> > I've dived into `loading` module and I see that currently it is really
> > complicated to store additional data into pkey for each instance.
> >
> > Unfortunately, suggested solutions not good for my project.
> > Also, I think that `shard` meaning in my case is not the same as usual.
> >
> > I want to describe structure of our project, maybe it can help.
> >
> > Here is definition of our databases structure:
> > http://joxi.ru/nAyJVvGiXMv0Dr.
> > We got master db and several geo databases. Catalogs like `users`,
> > `groups`, `offers` and other are replicating to geo databases, so that
> > data is always the same.
> > But also we have tables like `clicks` and `leads`. Each app instance
> > contains the data about them in database, related to its geo:
> > europe-instance into europe-db, usa-instance into usa-database and so on.
> > Periodically master-app pulls clicks and leads to master-database. Synced
> > objects always have different ids into master- and geo-db, so it is ok.
> >
> > But one time project owner came and said: "I need SAAS".
> > We see, that in current structure it's very hard (and really ugly) to
> > implement saas-solution. Amount of `Base*`, `Session*`, `Order*` and
> > other models will be multiplied with tenants amount.
> >
> > I discovered that I can