Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

2017-11-23 Thread Simon King
On Thu, Nov 23, 2017 at 6:55 PM, Антонио Антуан  wrote:
>
>
> Thu, Nov 23, 2017 at 20:27, Mike Bayer :
>>
>> On Thu, Nov 23, 2017 at 8:44 AM, Антонио Антуан 
>> wrote:
>> >
>> >> A Query can have lots of entities in it, and if you're doing sharding a
>> >> single result set can refer to any number of shard identifiers within
>> >> not just a single result set but within a single row; they might have
>> >> come from dozens of different databases at once
>> >
>> > In my case it is not possible: all entities in query can be gotten only
>> > from one particular shard. We have totally the same database structure
>> > for each shard. The difference is just data stored into database. No
>> > `shard_id` or any other key as part of primary key for any table.
>>
>>
>> so just to note, these aren't "shards", they're tenants.  you have a
>> multi-tenant application, which is normally a really easy thing.  but
>> you have a few side applications that want to "cheat" and use the
>> per-tenant object model across multiple tenants simultaneously in the
>> scope of a single Session.
>>
>> > If I want to make query
>> > for particular database I always want to retrieve data ONLY from that
>> > database. And even more than that: ONLY one database during one session
>> > transaction (or, in other words, one http-request to our app).
>>
>> if you have one "tenant id" per HTTP request, the standard HTTP
>> request pattern is one Session per request.  There's no problem in
>> that case.  You mentioned you have some non-flask applications that
>> want to communicate with multiple tenants in one Session.
>
>
> Yes, you're right. We have some offline actions where we want to ask each
> tenant about something specific.
> As I see it, the safest way currently is to call `commit`, `rollback`,
> `remove` or `expunge_all` on the session instance: all of these methods
> drop the identity map. Please let me know if I'm wrong.

A couple of things here. First, you are using ScopedSession, which is
essentially a wrapper around an actual Session. The commit(),
rollback() and expunge_all() methods are proxies that pass directly
through to the underlying Session. I believe commit() and rollback()
*expire* instances (so attributes will be reloaded on the next
access), but don't actually remove them from the identity map (but I
could be wrong about this).
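
One way to check this is a small experiment. The sketch below is only an
illustration: the `User` model and the in-memory SQLite engine are made up
for the test and are not part of the application discussed in this thread.
It assumes the default `expire_on_commit=True`.

```python
import sqlalchemy as sa
from sqlalchemy import orm

try:  # SQLAlchemy 1.4+
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class User(Base):  # hypothetical model, for illustration only
    __tablename__ = "users"
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(50))


engine = sa.create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)
session = orm.sessionmaker(bind=engine)()

user = User(id=1, name="alice")
session.add(user)
session.commit()

# After commit() the object is still tracked by the session...
in_map_after_commit = user in session
# ...but its attributes are marked expired.
expired_after_commit = sa.inspect(user).expired
# Touching an attribute triggers a reload from the database.
name_after_reload = user.name

# expunge_all() is what actually empties the identity map.
session.expunge_all()
in_map_after_expunge = user in session

session.close()
```

If the assumption holds, the object survives commit() in an expired state
and only leaves the session after expunge_all().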

remove() is not a Session method though - it tells the ScopedSession
to discard the current Session. A new Session will be created the next
time you call one of the proxied methods.

The default behaviour for ScopedSession is to use thread-locals, so
each thread gets its own Session. However, you can provide your own
scoping function that does whatever you want:

http://docs.sqlalchemy.org/en/latest/orm/contextual.html#using-custom-created-scopes

It sounds like you could pass a "get_current_tenant" function to the
scoped session, along with a custom session_factory, to get the
behaviour you want. Something like this (untested, would definitely
require more care to make it thread-safe and so on):

    import sqlalchemy as sa
    import sqlalchemy.orm as saorm


    class SessionManager(object):
        def __init__(self, tenant_uris):
            self.engines = {}
            self.current_tenant = None
            for name, dburi in tenant_uris.items():
                # One engine per tenant, created from that tenant's URI.
                self.engines[name] = sa.create_engine(dburi)
            self.sessionmaker = saorm.sessionmaker()

        def get_current_tenant(self):
            return self.current_tenant

        def set_current_tenant(self, name):
            self.current_tenant = name

        def create_session(self):
            # Called by scoped_session the first time a scope (tenant) is used.
            engine = self.engines[self.current_tenant]
            return self.sessionmaker(bind=engine)


    tenant_uris = {
        'one': 'mysql://...',
        'two': 'mysql://...',
    }
    manager = SessionManager(tenant_uris)

    Session = saorm.scoped_session(manager.create_session,
                                   scopefunc=manager.get_current_tenant)

    # Base is your application's declarative base.
    # query_property() has to be called; it returns the descriptor.
    Base.query = Session.query_property()


As long as you call manager.set_current_tenant whenever you switch to
querying a different tenant, this ought to work. But note that all of
this confusion and complexity stems from using scoped sessions and
Base.query. If you used explicit sessions everywhere, you would
probably find your code less magical and easier to understand.
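
For comparison, here is a minimal sketch of the explicit-session style for
an offline job that visits each tenant in turn. The tenant names, the
in-memory SQLite URLs and the `tenant_session` helper are all hypothetical
stand-ins, not something from the original project.

```python
from contextlib import contextmanager

import sqlalchemy as sa
from sqlalchemy import orm

# Hypothetical tenants; in-memory SQLite stands in for the real databases.
tenant_uris = {"one": "sqlite://", "two": "sqlite://"}
engines = {name: sa.create_engine(uri) for name, uri in tenant_uris.items()}
make_session = orm.sessionmaker()


@contextmanager
def tenant_session(name):
    """Yield a Session bound to a single tenant and always close it."""
    session = make_session(bind=engines[name])
    try:
        yield session
        session.commit()
    except Exception:
        session.rollback()
        raise
    finally:
        session.close()


results = {}
for name in sorted(engines):
    # Each tenant gets its own Session, so also its own identity map.
    with tenant_session(name) as session:
        results[name] = session.execute(sa.text("SELECT 1")).scalar()
```

Because every tenant gets a fresh Session, objects loaded from one
database can never be confused with objects from another.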

Hope that helps,

Simon

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.


Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

2017-11-23 Thread Антонио Антуан
Thu, Nov 23, 2017 at 20:27, Mike Bayer :

> On Thu, Nov 23, 2017 at 8:44 AM, Антонио Антуан 
> wrote:
> >
> >> A Query can have lots of entities in it, and if you're doing sharding a
> >> single result set can refer to any number of shard identifiers within
> >> not just a single result set but within a single row; they might have
> >> come from dozens of different databases at once
> >
> > In my case it is not possible: all entities in query can be gotten only
> > from one particular shard. We have totally the same database structure
> > for each shard. The difference is just data stored into database. No
> > `shard_id` or any other key as part of primary key for any table.
>
>
> so just to note, these aren't "shards", they're tenants.  you have a
> multi-tenant application, which is normally a really easy thing.  but
> you have a few side applications that want to "cheat" and use the
> per-tenant object model across multiple tenants simultaneously in the
> scope of a single Session.
>
> > If I want to make query
> > for particular database I always want to retrieve data ONLY from that
> > database. And even more than that: ONLY one database during one session
> > transaction (or, in other words, one http-request to our app).
>
> if you have one "tenant id" per HTTP request, the standard HTTP
> request pattern is one Session per request.  There's no problem in
> that case.  You mentioned you have some non-flask applications that
> want to communicate with multiple tenants in one Session.
>

Yes, you're right. We have some offline actions where we want to ask each
tenant about something specific.
As I see it, the safest way currently is to call `commit`, `rollback`,
`remove` or `expunge_all` on the session instance: all of these methods
drop the identity map. Please let me know if I'm wrong.

>
> >
> >> This could only be suited with a very open plugin point that is
> >> carefully architected, tested, and documented and I don't have the
> >> resources to envision this for a short-term use case.
> >
> > I've seen a lot of questions (for example in stackoverflow) "how to
> > manage several binds with sqlalchemy" and all answers are "Use separated
> > sessions".
>
> it's probably the best answer.  feel free to show specifics and I can
> determine if their request would fit this hypothetical feature
> otherwise.
>
> > I really not understand, what is the problem to implement "multibound"
> > session.
>
> The Session has always supported multiple binds.  There are two
> levels supported.  One is per table/mapper:
>
> http://docs.sqlalchemy.org/en/latest/orm/persistence_techniques.html#simple-vertical-partitioning
>
> you could probably adapt your multiple tenants into individual
> mappings if there are a limited number, see the approach at
> https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/EntityName.
>
> The other level is per primary key, that is, each primary key in the
> identity map has a different bind.  that's horizontal sharding.
>
> You're looking for a new level, which is, multiple binds for *one*
> primary key.   This is an intricate feature request that is feasible
> but not in the short term.
>
>
> > Don't have ideal vocabulary to explain how our team like it :)
> > Currently I see only one problem: loading instances. Of course, after
> > fixing other problems may appear...
> >
> > I can (and want) make it as part of SQLAlchemy library. Fully-tested
> > part, of course. If you say that it is bad idea, ok then.
>
> it's not a bad idea.  It's just difficult, and I can't do it right now.
>
>
> > I can make it as a
> > plugin, but there is a problem: functions in `loading` module are
> > monolithic and it needs some refactor for the plugin. May I suggest
> > refactor as pull request?
>
> you can do a pull request but note that the PR process for SQLAlchemy
> is not quick.   90% of code-related pull requests I get have no tests,
> no documentation, or anything.  More elaborate feature requests
> typically involve that I end up doing the whole thing myself in any
> case, using the submitter's original code as just a sketch, which
> means that more involved PRs are usually just another form of feature
> request.  These PRs are almost always for Core level features as the
> Core is easier for outside contributors to work on.   ORM-level
> contributions are extremely rare these days but of course I welcome
> contributors for the ORM.
>
> > And if so: could it be merged not only for major release but for,
> > at least, 1.0.* (yes, in our project we use 1.0.19 :) )?
>
> The 1.0 series is in "maintenance" mode and as soon as 1.2 is released
> (which is hopefully by end of year) it will go into "Security" mode.
> There are no more 1.0 releases scheduled.   It is not reasonable to be
> doing new SQLAlchemy-oriented development without first upgrading your
> application to the latest release which in this case is 1.1.15.
>
>
> > I really don't want 

Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

2017-11-23 Thread Simon King
On Thu, Nov 23, 2017 at 5:27 PM, Mike Bayer  wrote:
> On Thu, Nov 23, 2017 at 8:44 AM, Антонио Антуан  wrote:
>>
>>> A Query can have lots of entities in it, and if you're doing sharding a
>>> single result set can refer to any number of shard identifiers within
>>> not just a single result set but within a single row; they might have
>>> come from dozens of different databases at once
>>
>> In my case it is not possible: all entities in query can be gotten only from
>> one particular shard. We have totally the same database structure for each
>> shard. The difference is just data stored into database. No `shard_id` or
>> any other key as part of primary key for any table.
>
>
> so just to note, these aren't "shards", they're tenants.  you have a
> multi-tenant application, which is normally a really easy thing.  but
> you have a few side applications that want to "cheat" and use the
> per-tenant object model across multiple tenants simultaneously in the
> scope of a single Session.
>
>> If I want to make query
>> for particular database I always want to retrieve data ONLY from that
>> database. And even more than that: ONLY one database during one session
>> transaction (or, in other words, one http-request to our app).
>
> if you have one "tenant id" per HTTP request, the standard HTTP
> request pattern is one Session per request.  There's no problem in
> that case.  You mentioned you have some non-flask applications that
> want to communicate with multiple tenants in one Session.
>

OP, can you describe in more detail why these applications need to
talk to multiple tenant databases in a single session? Perhaps there
might be an alternative way to approach that.

Simon



Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

2017-11-23 Thread Mike Bayer
On Thu, Nov 23, 2017 at 8:44 AM, Антонио Антуан  wrote:
>
>> A Query can have lots of entities in it, and if you're doing sharding a
>> single result set can refer to any number of shard identifiers within
>> not just a single result set but within a single row; they might have
>> come from dozens of different databases at once
>
> In my case it is not possible: all entities in query can be gotten only from
> one particular shard. We have totally the same database structure for each
> shard. The difference is just data stored into database. No `shard_id` or
> any other key as part of primary key for any table.


so just to note, these aren't "shards", they're tenants.  you have a
multi-tenant application, which is normally a really easy thing.  but
you have a few side applications that want to "cheat" and use the
per-tenant object model across multiple tenants simultaneously in the
scope of a single Session.

> If I want to make query
> for particular database I always want to retrieve data ONLY from that
> database. And even more than that: ONLY one database during one session
> transaction (or, in other words, one http-request to our app).

if you have one "tenant id" per HTTP request, the standard HTTP
request pattern is one Session per request.  There's no problem in
that case.  You mentioned you have some non-flask applications that
want to communicate with multiple tenants in one Session.


>
>> This could only be suited with a very open plugin point that is carefully
>> architected, tested, and documented and I don't have the resources to
>> envision this for a short-term use case.
>
> I've seen a lot of questions (for example in stackoverflow) "how to manage
> several binds with sqlalchemy" and all answers are "Use separated sessions".

it's probably the best answer.  feel free to show specifics and I can
determine if their request would fit this hypothetical feature
otherwise.

> I really not understand, what is the problem to implement "multibound"
> session.

The Session has always supported multiple binds.  There are two
levels supported.  One is per table/mapper:
http://docs.sqlalchemy.org/en/latest/orm/persistence_techniques.html#simple-vertical-partitioning

you could probably adapt your multiple tenants into individual
mappings if there are a limited number, see the approach at
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/EntityName.

The other level is per primary key, that is, each primary key in the
identity map has a different bind.  that's horizontal sharding.

You're looking for a new level, which is, multiple binds for *one*
primary key.   This is an intricate feature request that is feasible
but not in the short term.
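
As an illustration of the first level (binds per table/mapper), here is a
minimal sketch. The `User` and `Click` models and the two in-memory SQLite
engines are hypothetical and only loosely modelled on the catalog and
click tables mentioned in this thread.

```python
import sqlalchemy as sa
from sqlalchemy import orm

try:  # SQLAlchemy 1.4+
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class User(Base):  # hypothetical catalog model
    __tablename__ = "users"
    id = sa.Column(sa.Integer, primary_key=True)


class Click(Base):  # hypothetical per-geo model
    __tablename__ = "clicks"
    id = sa.Column(sa.Integer, primary_key=True)


catalog_engine = sa.create_engine("sqlite://")
clicks_engine = sa.create_engine("sqlite://")

# Each mapped class is routed to its own engine inside one Session.
Session = orm.sessionmaker(binds={User: catalog_engine, Click: clicks_engine})
session = Session()

user_bind = session.get_bind(User)
click_bind = session.get_bind(Click)
session.close()
```

This routes by class, so it does not help when the same class has to be
loaded from several databases, which is the case being asked about.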


> Don't have ideal vocabulary to explain how our team like it :)
> Currently I see only one problem: loading instances. Of course, after fixing
> other problems may appear...
>
> I can (and want) make it as part of SQLAlchemy library. Fully-tested part,
> of course. If you say that it is bad idea, ok then.

it's not a bad idea.  It's just difficult, and I can't do it right now.


> I can make it as a
> plugin, but there is a problem: functions in `loading` module are monolithic
> and it needs some refactor for the plugin. May I suggest refactor as pull
> request?

you can do a pull request but note that the PR process for SQLAlchemy
is not quick.   90% of code-related pull requests I get have no tests,
no documentation, or anything.  More elaborate feature requests
typically involve that I end up doing the whole thing myself in any
case, using the submitter's original code as just a sketch, which
means that more involved PRs are usually just another form of feature
request.  These PRs are almost always for Core level features as the
Core is easier for outside contributors to work on.   ORM-level
contributions are extremely rare these days but of course I welcome
contributors for the ORM.

> And if so: could it be merged not only for major release but for,
> at least, 1.0.* (yes, in our project we use 1.0.19 :) )?

The 1.0 series is in "maintenance" mode and as soon as 1.2 is released
(which is hopefully by end of year) it will go into "Security" mode.
There are no more 1.0 releases scheduled.   It is not reasonable to be
doing new SQLAlchemy-oriented development without first upgrading your
application to the latest release which in this case is 1.1.15.


> I really don't want (of course!) to copy entire `loading` module for
> additional logic of `identitykey` construction. But currently I do not see
> any other way to implement it for my project :(
>
>> when you query two different databases, you are using
>> two independent transactions in any case;
>
> So, what is the difference, if there are two transactions in any case? :)

because you would have two identity maps
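
To make that concrete, here is a toy model in plain Python. It is not
SQLAlchemy's implementation; `ToyIdentityMap`, the `User` class and the
tenant rows are invented for the illustration. The only assumption carried
over is that an object is keyed by (class, primary key) and nothing else.

```python
class ToyIdentityMap:
    """Toy model: objects are keyed by (class, primary key) only."""

    def __init__(self):
        self._objects = {}

    def load(self, cls, pk, row):
        key = (cls, pk)
        # An already-known key wins; the incoming row is ignored.
        if key not in self._objects:
            self._objects[key] = cls(**row)
        return self._objects[key]


class User:
    def __init__(self, id, name):
        self.id = id
        self.name = name


# Same primary key in two tenant databases, different data.
tenant_rows = {
    "europe": {"id": 1, "name": "anna"},
    "usa": {"id": 1, "name": "bob"},
}

# One shared map: the second load returns the first tenant's object.
shared = ToyIdentityMap()
first = shared.load(User, 1, tenant_rows["europe"])
second = shared.load(User, 1, tenant_rows["usa"])
shared_collides = first is second
shared_name = second.name

# One map per tenant (one Session per tenant): no collision.
per_tenant = {name: ToyIdentityMap() for name in tenant_rows}
eu_user = per_tenant["europe"].load(User, 1, tenant_rows["europe"])
us_user = per_tenant["usa"].load(User, 1, tenant_rows["usa"])
separate_names = (eu_user.name, us_user.name)
```

With a single map the row from the second tenant is silently replaced by
the object already loaded from the first; with one map per tenant each
database keeps its own object.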

>
>> I don't understand why you can't use independent sessions
>
> The first problem is `query` property of `Base` instances. If we use several
> sessions, we 

Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

2017-11-23 Thread Антонио Антуан
I see that `Query.get` uses `get_from_identity` instead of `instances`, with
the key constructed in `_get_impl`.
Could you point to other places where the same problem (described in my
first message) can appear?

Thu, Nov 23, 2017 at 16:44, Антонио Антуан :

>
> > A Query can have lots of entities in it, and if you're doing sharding a
> > single result set can refer to any number of shard identifiers within
> > not just a single result set but within a single row; they might have
> > come from dozens of different databases at once
>
> In my case it is not possible: all entities in query can be gotten only
> from one particular shard. We have totally the same database structure for
> each shard. The difference is just data stored into database. No `shard_id`
> or any other key as part of primary key for any table. If I want to make
> query for particular database I always want to retrieve data ONLY from that
> database. And even more than that: ONLY one database during one session
> transaction (or, in other words, one http-request to our app).
>
>
> > This could only be suited with a very open plugin point that is carefully
> > architected, tested, and documented and I don't have the resources to
> > envision this for a short-term use case.
>
> I've seen a lot of questions (for example in stackoverflow) "how to manage
> several binds with sqlalchemy" and all answers are "Use separated
> sessions". I really not understand, what is the problem to implement
> "multibound" session. I make it in my project and it's really beautiful,
> clear and... Don't have ideal vocabulary to explain how our team like it :)
> Currently I see only one problem: loading instances. Of course, after
> fixing other problems may appear...
>
> I can (and want) make it as part of SQLAlchemy library. Fully-tested part,
> of course. If you say that it is bad idea, ok then. I can make it as a
> plugin, but there is a problem: functions in `loading` module are
> monolithic and it needs some refactor for the plugin. May I suggest
> refactor as pull request? And if so: could it be merged not only for major
> release but for, at least, 1.0.* (yes, in our project we use 1.0.19 :) )?
> I really don't want (of course!) to copy entire `loading` module for
> additional logic of `identitykey` construction. But currently I do not see
> any other way to implement it for my project :(
>
> > when you query two different databases, you are using
> > two independent transactions in any case;
>
> So, what is the difference, if there are two transactions in any case? :)
>
> > I don't understand why you can't use independent sessions
>
> The first problem is `query` property of `Base` instances. If we use
> several sessions, we need to use the same amount of `Base` classes and,
> consequently, the same amount of models, don't we?
> Another problem for us is already existed code. We can use sessions
> registry, but it take a lot of month to override entire project. Another
> way:  append into `Query.__iter__` such a code:
> >> self.with_session(sessions.get(self._shard_id))
> >> return super(Query, self).__iter__()
> But it has no effect for UPDATE and INSERT queries. Also, I'm not sure
> that there is no problems in that way...
>
>
> I have one more thought. Don't you think that it is some kind of bug: I
> make query for one bind and got entity from another. Yes, that behavior is
> not foreseen by library. But from other point of view, library docs have
> examples how to use several binds within one session. So, problem may
> happens not only in my case.
>
> Anyway, can my suggestion (
> https://gist.github.com/aCLr/746f92dedb4d303a49033c0db22beced) has any
> effect for classic one-bound `Session`? If it can't, so, what's the
> problem? :)
>
>
> Excuse me for wasting your time.
> And excuse me if my suggestions are idiotic :)
>
> Appreciate your help.
>
> Thu, Nov 23, 2017 at 0:20, Mike Bayer :
>
>> On Wed, Nov 22, 2017 at 4:56 AM, Антонио Антуан 
>> wrote:
>> > Glad to see that you remember my messages :)
>> >
>> > I've dived into `loading` module and I see that currently it is really
>> > complicated to store additional data into pkey for each instance.
>> >
>> > Unfortunately, suggested solutions not good for my project.
>> > Also, I think that `shard` meaning in my case is not the same as usual.
>> >
>> > I want to describe structure of our project, maybe it can help.
>> >
>> > Here is definition of our databases structure:
>> > http://joxi.ru/nAyJVvGiXMv0Dr.
>> > We got master db and several geo databases. Catalogs like `users`,
>> > `groups`, `offers` and other are replicating to geo databases, so that
>> > data is always the same.
>> > But also we have tables like `clicks` and `leads`. Each app instance
>> > contains the data about them in database, related to its geo:
>> > europe-instance into europe-db, usa-instance into usa-database and so
>> > on.
>> > Periodically 

Re: [sqlalchemy] Disable `session_identity_map` for `_instance_processor`

2017-11-23 Thread Антонио Антуан
> A Query can have lots of entities in it, and if you're doing sharding a
> single result set can refer to any number of shard identifiers within
> not just a single result set but within a single row; they might have
> come from dozens of different databases at once

In my case that is not possible: all entities in a query can come only
from one particular shard. We have exactly the same database structure for
each shard. The only difference is the data stored in each database. There
is no `shard_id` or any other key as part of the primary key of any table.
If I query a particular database, I always want to retrieve data ONLY from
that database. And even more than that: ONLY one database during one session
transaction (or, in other words, one http-request to our app).

> This could only be suited with a very open plugin point that is carefully
> architected, tested, and documented and I don't have the resources to
> envision this for a short-term use case.

I've seen a lot of questions (for example on Stack Overflow) about "how to
manage several binds with sqlalchemy", and all the answers are "Use separate
sessions". I really don't understand what the problem is with implementing a
"multibound" session. I did it in my project and it's really beautiful,
clear and... I don't have the vocabulary to explain how much our team likes
it :) Currently I see only one problem: loading instances. Of course, after
fixing that, other problems may appear...

I can (and want to) make it part of the SQLAlchemy library. A fully tested
part, of course. If you say that it is a bad idea, ok then. I can make it a
plugin, but there is a problem: the functions in the `loading` module are
monolithic and would need some refactoring for the plugin. May I suggest the
refactoring as a pull request? And if so: could it be merged not only for a
major release but for, at least, 1.0.* (yes, in our project we use 1.0.19 :) )?
I really don't want (of course!) to copy the entire `loading` module just to
add logic to the `identitykey` construction. But currently I do not see
any other way to implement it for my project :(

> when you query two different databases, you are using
> two independent transactions in any case;

So, what is the difference, if there are two transactions in any case? :)

> I don't understand why you can't use independent sessions

The first problem is the `query` property of `Base` instances. If we use
several sessions, we need the same number of `Base` classes and,
consequently, the same number of models, don't we?
Another problem for us is the existing code. We could use a session
registry, but it would take many months to rewrite the entire project.
Another way: add code like this to `Query.__iter__`:
>> self.with_session(sessions.get(self._shard_id))
>> return super(Query, self).__iter__()
But it has no effect on UPDATE and INSERT queries. Also, I'm not sure that
there are no other problems with that approach...


I have one more thought. Don't you think that it is a kind of bug: I
query one bind and get an entity from another. Yes, that behavior is
not foreseen by the library. But on the other hand, the library docs have
examples of how to use several binds within one session. So the problem may
happen not only in my case.

Anyway, can my suggestion (
https://gist.github.com/aCLr/746f92dedb4d303a49033c0db22beced) have any
effect on a classic one-bound `Session`? If it can't, then what's the
problem? :)


Excuse me for wasting your time.
And excuse me if my suggestions are idiotic :)

Appreciate your help.

Thu, Nov 23, 2017 at 0:20, Mike Bayer :

> On Wed, Nov 22, 2017 at 4:56 AM, Антонио Антуан 
> wrote:
> > Glad to see that you remember my messages :)
> >
> > I've dived into `loading` module and I see that currently it is really
> > complicated to store additional data into pkey for each instance.
> >
> > Unfortunately, suggested solutions not good for my project.
> > Also, I think that `shard` meaning in my case is not the same as usual.
> >
> > I want to describe structure of our project, maybe it can help.
> >
> > Here is definition of our databases structure:
> > http://joxi.ru/nAyJVvGiXMv0Dr.
> > We got master db and several geo databases. Catalogs like `users`,
> > `groups`, `offers` and other are replicating to geo databases, so that
> > data is always the same.
> > But also we have tables like `clicks` and `leads`. Each app instance
> > contains the data about them in database, related to its geo:
> > europe-instance into europe-db, usa-instance into usa-database and so on.
> > Periodically master-app pulls clicks and leads to master-database. Synced
> > objects always have different ids into master- and geo-db, so it is ok.
> >
> > But one time project owner came and said: "I need SAAS".
> > We see, that in current structure it's very hard (and really ugly) to
> > implement saas-solution. Amount of `Base*`, `Session*`, `Order*` and
> > other models will be multiplied with tenants amount.
> >
> > I discovered that I can