Re: [sqlalchemy] horizontal sharding and bulk_insert

2017-08-28 Thread Антонио Антуан
Great!
Thank you, Mike.
I really hope I can find time to figure out sqlalchemy source code and 
contribute into it. Currently I have not succeeded in this :(
But the library is amazing :)

среда, 23 августа 2017 г., 16:58:20 UTC+3 пользователь Mike Bayer написал:
>
> On Wed, Aug 23, 2017 at 2:01 AM, Антонио Антуан  > wrote: 
> > 
> > 
>
> > 
> > So, and now we always make queries without shard_id chosing: we always 
> make 
> > queries on 'default' database. One exception: when we run app on master 
> and 
> > read statistics from geo-slaves, we point it explicitly: 
> > "Session.query(...).set_shard_id(NODE_PREFIX)". So, yes, we use sharding 
> > mechanizm only for that. 
> > 
> > My question still the same: can I override "connection_callable" member 
> with 
> > None on my "ShardedSessionWithDefaultBind" class? 
>
> Sure, the ShardedSession is supposed to be more of an example class 
> that you can hack on and connection_callable is a simple None/ 
> not-None thing that is used when you flush() to get a list of 
> connections per row, where each connection is decided based on the 
> sharding conditionals. 
>
>
> > 
> > Also, if you can advise any other mechanism to make things as simple as 
> we 
> > have them now (with horizontal_sharding mechanism), it would be great, 
> our 
> > team will appreciate it :) 
>
> My suggestion was that bulk_insert_mappings() / bulk_update_mappings / 
> bulk_save_objects don't actually make the objects as any part of the 
> session's state, nor does the existing object state of the session 
> have any bearing on the operation of bulk_insert/bulk_update other 
> than being part of the same transaction.   So if you don't want to 
> tinker with the state of your sharded_session, just use a different 
> Session. 
>
> If you have code like this: 
>
>  some_stuff = my_sharded_session.query(Stuff).filter(...).all() 
>
>  my_sharded_session.bulk_save_objects(objects) 
>
> you could change it to this: 
>
>  some_stuff = my_sharded_session.query(Stuff).filter(...).all() 
>
>  ad_hoc_session = 
> Session(bind=my_sharded_session.connection(shard_id='default')) 
>  ad_hoc_session.bulk_save_objects(objects) 
>
> that way you don't have to change the state of my_sharded_session. 
> It probably doesn't matter much either way, e.g. 
> session.connection_callable=None or this, assuming there is no 
> concurrent sharing of the sharded_session instance. 
>
>
>
> > 
> >> 
> >> > 
> >> > If it can help: I ALWAYS perform UPDATE/INSERT/DELETE and most of 
> SELECT 
> >> > queries in only one database. So, I wrote such subclass: 
> >> > 
> >> > 
> >> > class ShardedSessionWithDefaultBind(ShardedSession): 
> >> > def get_bind(self, mapper, shard_id=None, instance=None, 
> >> > clause=None, 
> >> > **kw): 
> >> > if shard_id is None: 
> >> > shard_id = default_shard_id 
> >> > return super(ShardedSessionWithDefaultBind, 
> >> > self).get_bind(mapper, 
> >> > shard_id, instance, clause, **kw) 
> >> > 
> >> > 
> >> > Maybe I can override "__init__" for my class and write 
> >> > "self.connnection_callable = None"? 
> >> > My research of sqlalchemy code didn't make me sure that it is safe 
> >> > enough. 
> >> > But I see, that "connection_callable" used only for checking into 
> >> > "_bulk_insert" and "_bulk_update" functions in 
> >> > sqlalchemy.orm.persistence 
> >> > module. 
> >> > So, if I 100% sure, that I ALWAYS perform INSERT/UPDATE/DELETE 
> queries 
> >> > in 
> >> > only one database, can I make it? 
> >> > 
> >> > -- 
> >> > SQLAlchemy - 
> >> > The Python SQL Toolkit and Object Relational Mapper 
> >> > 
> >> > http://www.sqlalchemy.org/ 
> >> > 
> >> > To post example code, please provide an MCVE: Minimal, Complete, and 
> >> > Verifiable Example. See http://stackoverflow.com/help/mcve for a 
> full 
> >> > description. 
> >> > --- 
> >> > You received this message because you are subscribed to the Google 
> >> > Groups 
> >> > "sqlalchemy" group. 
> >> > To unsubscribe from this group and stop receiving emails from it, 
> send 
> >> > an 
> >> > email to sqlalchemy+...@googlegroups.com. 
> >> > To post to this group, send email to sqlal...@googlegroups.com. 
> >> > Visit this group at https://groups.google.com/group/sqlalchemy. 
> >> > For more options, visit https://groups.google.com/d/optout. 
> > 
> > -- 
> > SQLAlchemy - 
> > The Python SQL Toolkit and Object Relational Mapper 
> > 
> > http://www.sqlalchemy.org/ 
> > 
> > To post example code, please provide an MCVE: Minimal, Complete, and 
> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> > description. 
> > --- 
> > You received this message because you are subscribed to the Google 
> Groups 
> > "sqlalchemy" group. 
> > To unsubscribe from this group and stop receiving emails from it, send 
> an 
> > email to sqlalchemy+...@googlegroups.com . 
> > To post to this group, send email to sqlal...@googlegroups.com 
> . 
> > Visit 

Re: [sqlalchemy] horizontal sharding and bulk_insert

2017-08-23 Thread Mike Bayer
On Wed, Aug 23, 2017 at 2:01 AM, Антонио Антуан  wrote:
>
>

>
> So, and now we always make queries without shard_id chosing: we always make
> queries on 'default' database. One exception: when we run app on master and
> read statistics from geo-slaves, we point it explicitly:
> "Session.query(...).set_shard_id(NODE_PREFIX)". So, yes, we use sharding
> mechanizm only for that.
>
> My question still the same: can I override "connection_callable" member with
> None on my "ShardedSessionWithDefaultBind" class?

Sure, the ShardedSession is supposed to be more of an example class
that you can hack on and connection_callable is a simple None/
not-None thing that is used when you flush() to get a list of
connections per row, where each connection is decided based on the
sharding conditionals.


>
> Also, if you can advise any other mechanism to make things as simple as we
> have them now (with horizontal_sharding mechanism), it would be great, our
> team will appreciate it :)

My suggestion was that bulk_insert_mappings() / bulk_update_mappings /
bulk_save_objects don't actually make the objects as any part of the
session's state, nor does the existing object state of the session
have any bearing on the operation of bulk_insert/bulk_update other
than being part of the same transaction.   So if you don't want to
tinker with the state of your sharded_session, just use a different
Session.

If you have code like this:

 some_stuff = my_sharded_session.query(Stuff).filter(...).all()

 my_sharded_session.bulk_save_objects(objects)

you could change it to this:

 some_stuff = my_sharded_session.query(Stuff).filter(...).all()

 ad_hoc_session =
Session(bind=my_sharded_session.connection(shard_id='default'))
 ad_hoc_session.bulk_save_objects(objects)

that way you don't have to change the state of my_sharded_session.
It probably doesn't matter much either way, e.g.
session.connection_callable=None or this, assuming there is no
concurrent sharing of the sharded_session instance.



>
>>
>> >
>> > If it can help: I ALWAYS perform UPDATE/INSERT/DELETE and most of SELECT
>> > queries in only one database. So, I wrote such subclass:
>> >
>> >
>> > class ShardedSessionWithDefaultBind(ShardedSession):
>> > def get_bind(self, mapper, shard_id=None, instance=None,
>> > clause=None,
>> > **kw):
>> > if shard_id is None:
>> > shard_id = default_shard_id
>> > return super(ShardedSessionWithDefaultBind,
>> > self).get_bind(mapper,
>> > shard_id, instance, clause, **kw)
>> >
>> >
>> > Maybe I can override "__init__" for my class and write
>> > "self.connnection_callable = None"?
>> > My research of sqlalchemy code didn't make me sure that it is safe
>> > enough.
>> > But I see, that "connection_callable" used only for checking into
>> > "_bulk_insert" and "_bulk_update" functions in
>> > sqlalchemy.orm.persistence
>> > module.
>> > So, if I 100% sure, that I ALWAYS perform INSERT/UPDATE/DELETE queries
>> > in
>> > only one database, can I make it?
>> >
>> > --
>> > SQLAlchemy -
>> > The Python SQL Toolkit and Object Relational Mapper
>> >
>> > http://www.sqlalchemy.org/
>> >
>> > To post example code, please provide an MCVE: Minimal, Complete, and
>> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full
>> > description.
>> > ---
>> > You received this message because you are subscribed to the Google
>> > Groups
>> > "sqlalchemy" group.
>> > To unsubscribe from this group and stop receiving emails from it, send
>> > an
>> > email to sqlalchemy+...@googlegroups.com.
>> > To post to this group, send email to sqlal...@googlegroups.com.
>> > Visit this group at https://groups.google.com/group/sqlalchemy.
>> > For more options, visit https://groups.google.com/d/optout.
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send 

Re: [sqlalchemy] horizontal sharding and bulk_insert

2017-08-23 Thread Антонио Антуан


вторник, 22 августа 2017 г., 17:29:41 UTC+3 пользователь Mike Bayer написал:
>
> On Tue, Aug 22, 2017 at 3:43 AM, Антонио Антуан  > wrote: 
> > Hi guys 
> > I tried to implement horizontal sharding in my project. Everything is 
> ok, 
> > except bulk_inserts. 
> > When I run tests, I got this error: 
> >   File "/home/anton/Projects/proj/core/model/messages.py", line 210, in 
> > create 
> > Session.bulk_insert_mappings(MessageEntity, to_commit) 
> >   File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/scoping.py", line 
> 157, 
> > in do 
> > return getattr(self.registry(), name)(*args, **kwargs) 
> >   File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/session.py", line 
> 2345, 
> > in bulk_insert_mappings 
> > mapper, mappings, False, False, return_defaults, False) 
> >   File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/session.py", line 
> 2416, 
> > in _bulk_save_mappings 
> > transaction.rollback(_capture_exception=True) 
> >   File "build/bdist.linux-x86_64/egg/sqlalchemy/util/langhelpers.py", 
> line 
> > 60, in __exit__ 
> > compat.reraise(exc_type, exc_value, exc_tb) 
> >   File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/session.py", line 
> 2411, 
> > in _bulk_save_mappings 
> > mapper, mappings, transaction, isstates, return_defaults) 
> >   File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/persistence.py", 
> line 
> > 35, in _bulk_insert 
> > "connection_callable / per-instance sharding " 
> > NotImplementedError: connection_callable / per-instance sharding not 
> > supported in bulk_insert() 
> > 
> > I do not understand what 'connection_callable' does, but I really need 
> to 
> > implement it. Is there any ways to do it? 
>
> The idea of bulk_insert() is strictly one of performance with some 
> degree of convenience; you don't need it to actually accomplish 
> anything as there are many other ways to achieve what it does.   the 
> "connection_callable" thing is specifically so that an ORM persistence 
> operation can get at the appropriate database connection for a 
> particular INSERT/UPDATE/DELETE, however this adds complexity so is 
> not part of bulk_insert(). 
>
> In the case of application side horizontal sharding, you'd want to 
> take the data you're passing to it and pre-sort it into individual 
> per-shard sets, then use a Session per shard to run the inserts. 
>
> Also, application side horizontal sharding is not a great way to do 
> sharding in any case.  If you're on Postgresql I'd strongly recommend 
> using table inheritance instead. 
>
 
Yes, Postgresql used into a project. But I can't use inheritance, because I 
have several servers, each of them located in separate country. 
Yes, I could use inheritance and FDW, but in that case system become more 
complicated.

When application instance runs on master node - it can read and write to 
master database and can only read from geo-slaves.
When application instance runs on geo-slave - it can read from master, read 
from replicated table on current node and write data (some statistics) to 
some table on current node.
Periodically master read data-tables (statistics) from each geo-slave and 
write new data on master database.


Previously my project used something like that:
class ClusterMeta(object):
_pg_config = {}
_pg_sessions = defaultdict(list)
is_slave = config.get('default', 'is_slave')
prefix = config.get('default', 'prefix')

def _get_pg_conf(self):
for name, conn_string in config['sqlalchemy'].items():
if name.startswith('conn_string_slave_'):
sp = name.replace('conn_string_slave_', '').lower()
self._pg_config[sp] = conn_string
elif name in ['conn_string_master', 'conn_string']:
sp = 'default' if name == 'conn_string' else 'master'
if sp in self._pg_config:
raise ValueError('{} conn_string already 
configured!'.format(sp.upper()))
else:
self._pg_config[sp] = conn_string

def _create_pg_session(self, conn_string):
"""
:rtype: tuple(base, session)
"""
engine = create_engine(conn_string,
   convert_unicode=True)

session = ScopedSession(sessionmaker(bind=engine))
base = declarative_base()
base.query = session.query_property()

return base, session

   def get_session(self, prefix, with_base=False):
if prefix not in self._pg_sessions:
conn_string = self._pg_config.get(prefix)
if not conn_string:
raise ValueError('No conn_string for prefix "%s"' % prefix)
if prefix not in self._pg_sessions:
self._pg_sessions[prefix] = []
s = self._create_pg_session(conn_string)
self._pg_sessions[prefix].insert(0, s)
else:
s = self._pg_sessions[prefix][0]
return s if 

Re: [sqlalchemy] horizontal sharding and bulk_insert

2017-08-22 Thread Mike Bayer
On Tue, Aug 22, 2017 at 3:43 AM, Антонио Антуан  wrote:
> Hi guys
> I tried to implement horizontal sharding in my project. Everything is ok,
> except bulk_inserts.
> When I run tests, I got this error:
>   File "/home/anton/Projects/proj/core/model/messages.py", line 210, in
> create
> Session.bulk_insert_mappings(MessageEntity, to_commit)
>   File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/scoping.py", line 157,
> in do
> return getattr(self.registry(), name)(*args, **kwargs)
>   File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/session.py", line 2345,
> in bulk_insert_mappings
> mapper, mappings, False, False, return_defaults, False)
>   File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/session.py", line 2416,
> in _bulk_save_mappings
> transaction.rollback(_capture_exception=True)
>   File "build/bdist.linux-x86_64/egg/sqlalchemy/util/langhelpers.py", line
> 60, in __exit__
> compat.reraise(exc_type, exc_value, exc_tb)
>   File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/session.py", line 2411,
> in _bulk_save_mappings
> mapper, mappings, transaction, isstates, return_defaults)
>   File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/persistence.py", line
> 35, in _bulk_insert
> "connection_callable / per-instance sharding "
> NotImplementedError: connection_callable / per-instance sharding not
> supported in bulk_insert()
>
> I do not understand what 'connection_callable' does, but I really need to
> implement it. Is there any ways to do it?

The idea of bulk_insert() is strictly one of performance with some
degree of convenience; you don't need it to actually accomplish
anything as there are many other ways to achieve what it does.   the
"connection_callable" thing is specifically so that an ORM persistence
operation can get at the appropriate database connection for a
particular INSERT/UPDATE/DELETE, however this adds complexity so is
not part of bulk_insert().

In the case of application side horizontal sharding, you'd want to
take the data you're passing to it and pre-sort it into individual
per-shard sets, then use a Session per shard to run the inserts.

Also, application side horizontal sharding is not a great way to do
sharding in any case.  If you're on Postgresql I'd strongly recommend
using table inheritance instead.


>
> If it can help: I ALWAYS perform UPDATE/INSERT/DELETE and most of SELECT
> queries in only one database. So, I wrote such subclass:
>
>
> class ShardedSessionWithDefaultBind(ShardedSession):
> def get_bind(self, mapper, shard_id=None, instance=None, clause=None,
> **kw):
> if shard_id is None:
> shard_id = default_shard_id
> return super(ShardedSessionWithDefaultBind, self).get_bind(mapper,
> shard_id, instance, clause, **kw)
>
>
> Maybe I can override "__init__" for my class and write
> "self.connnection_callable = None"?
> My research of sqlalchemy code didn't make me sure that it is safe enough.
> But I see, that "connection_callable" used only for checking into
> "_bulk_insert" and "_bulk_update" functions in sqlalchemy.orm.persistence
> module.
> So, if I 100% sure, that I ALWAYS perform INSERT/UPDATE/DELETE queries in
> only one database, can I make it?
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sqlalchemy+unsubscr...@googlegroups.com.
To post to this group, send email to sqlalchemy@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.