Re: [sqlalchemy] horizontal sharding and bulk_insert
Great! Thank you, Mike. I really hope I can find time to figure out sqlalchemy source code and contribute into it. Currently I have not succeeded in this :( But the library is amazing :) среда, 23 августа 2017 г., 16:58:20 UTC+3 пользователь Mike Bayer написал: > > On Wed, Aug 23, 2017 at 2:01 AM, Антонио Антуан> wrote: > > > > > > > > > So, and now we always make queries without shard_id chosing: we always > make > > queries on 'default' database. One exception: when we run app on master > and > > read statistics from geo-slaves, we point it explicitly: > > "Session.query(...).set_shard_id(NODE_PREFIX)". So, yes, we use sharding > > mechanizm only for that. > > > > My question still the same: can I override "connection_callable" member > with > > None on my "ShardedSessionWithDefaultBind" class? > > Sure, the ShardedSession is supposed to be more of an example class > that you can hack on and connection_callable is a simple None/ > not-None thing that is used when you flush() to get a list of > connections per row, where each connection is decided based on the > sharding conditionals. > > > > > > Also, if you can advise any other mechanism to make things as simple as > we > > have them now (with horizontal_sharding mechanism), it would be great, > our > > team will appreciate it :) > > My suggestion was that bulk_insert_mappings() / bulk_update_mappings / > bulk_save_objects don't actually make the objects as any part of the > session's state, nor does the existing object state of the session > have any bearing on the operation of bulk_insert/bulk_update other > than being part of the same transaction. So if you don't want to > tinker with the state of your sharded_session, just use a different > Session. > > If you have code like this: > > some_stuff = my_sharded_session.query(Stuff).filter(...).all() > > my_sharded_session.bulk_save_objects(objects) > > you could change it to this: > > some_stuff = my_sharded_session.query(Stuff).filter(...).all() > > ad_hoc_session = > Session(bind=my_sharded_session.connection(shard_id='default')) > ad_hoc_session.bulk_save_objects(objects) > > that way you don't have to change the state of my_sharded_session. > It probably doesn't matter much either way, e.g. > session.connection_callable=None or this, assuming there is no > concurrent sharing of the sharded_session instance. > > > > > > >> > >> > > >> > If it can help: I ALWAYS perform UPDATE/INSERT/DELETE and most of > SELECT > >> > queries in only one database. So, I wrote such subclass: > >> > > >> > > >> > class ShardedSessionWithDefaultBind(ShardedSession): > >> > def get_bind(self, mapper, shard_id=None, instance=None, > >> > clause=None, > >> > **kw): > >> > if shard_id is None: > >> > shard_id = default_shard_id > >> > return super(ShardedSessionWithDefaultBind, > >> > self).get_bind(mapper, > >> > shard_id, instance, clause, **kw) > >> > > >> > > >> > Maybe I can override "__init__" for my class and write > >> > "self.connnection_callable = None"? > >> > My research of sqlalchemy code didn't make me sure that it is safe > >> > enough. > >> > But I see, that "connection_callable" used only for checking into > >> > "_bulk_insert" and "_bulk_update" functions in > >> > sqlalchemy.orm.persistence > >> > module. > >> > So, if I 100% sure, that I ALWAYS perform INSERT/UPDATE/DELETE > queries > >> > in > >> > only one database, can I make it? > >> > > >> > -- > >> > SQLAlchemy - > >> > The Python SQL Toolkit and Object Relational Mapper > >> > > >> > http://www.sqlalchemy.org/ > >> > > >> > To post example code, please provide an MCVE: Minimal, Complete, and > >> > Verifiable Example. See http://stackoverflow.com/help/mcve for a > full > >> > description. > >> > --- > >> > You received this message because you are subscribed to the Google > >> > Groups > >> > "sqlalchemy" group. > >> > To unsubscribe from this group and stop receiving emails from it, > send > >> > an > >> > email to sqlalchemy+...@googlegroups.com. > >> > To post to this group, send email to sqlal...@googlegroups.com. > >> > Visit this group at https://groups.google.com/group/sqlalchemy. > >> > For more options, visit https://groups.google.com/d/optout. > > > > -- > > SQLAlchemy - > > The Python SQL Toolkit and Object Relational Mapper > > > > http://www.sqlalchemy.org/ > > > > To post example code, please provide an MCVE: Minimal, Complete, and > > Verifiable Example. See http://stackoverflow.com/help/mcve for a full > > description. > > --- > > You received this message because you are subscribed to the Google > Groups > > "sqlalchemy" group. > > To unsubscribe from this group and stop receiving emails from it, send > an > > email to sqlalchemy+...@googlegroups.com . > > To post to this group, send email to sqlal...@googlegroups.com > . > > Visit
Re: [sqlalchemy] horizontal sharding and bulk_insert
On Wed, Aug 23, 2017 at 2:01 AM, Антонио Антуанwrote: > > > > So, and now we always make queries without shard_id chosing: we always make > queries on 'default' database. One exception: when we run app on master and > read statistics from geo-slaves, we point it explicitly: > "Session.query(...).set_shard_id(NODE_PREFIX)". So, yes, we use sharding > mechanizm only for that. > > My question still the same: can I override "connection_callable" member with > None on my "ShardedSessionWithDefaultBind" class? Sure, the ShardedSession is supposed to be more of an example class that you can hack on and connection_callable is a simple None/ not-None thing that is used when you flush() to get a list of connections per row, where each connection is decided based on the sharding conditionals. > > Also, if you can advise any other mechanism to make things as simple as we > have them now (with horizontal_sharding mechanism), it would be great, our > team will appreciate it :) My suggestion was that bulk_insert_mappings() / bulk_update_mappings / bulk_save_objects don't actually make the objects as any part of the session's state, nor does the existing object state of the session have any bearing on the operation of bulk_insert/bulk_update other than being part of the same transaction. So if you don't want to tinker with the state of your sharded_session, just use a different Session. If you have code like this: some_stuff = my_sharded_session.query(Stuff).filter(...).all() my_sharded_session.bulk_save_objects(objects) you could change it to this: some_stuff = my_sharded_session.query(Stuff).filter(...).all() ad_hoc_session = Session(bind=my_sharded_session.connection(shard_id='default')) ad_hoc_session.bulk_save_objects(objects) that way you don't have to change the state of my_sharded_session. It probably doesn't matter much either way, e.g. session.connection_callable=None or this, assuming there is no concurrent sharing of the sharded_session instance. > >> >> > >> > If it can help: I ALWAYS perform UPDATE/INSERT/DELETE and most of SELECT >> > queries in only one database. So, I wrote such subclass: >> > >> > >> > class ShardedSessionWithDefaultBind(ShardedSession): >> > def get_bind(self, mapper, shard_id=None, instance=None, >> > clause=None, >> > **kw): >> > if shard_id is None: >> > shard_id = default_shard_id >> > return super(ShardedSessionWithDefaultBind, >> > self).get_bind(mapper, >> > shard_id, instance, clause, **kw) >> > >> > >> > Maybe I can override "__init__" for my class and write >> > "self.connnection_callable = None"? >> > My research of sqlalchemy code didn't make me sure that it is safe >> > enough. >> > But I see, that "connection_callable" used only for checking into >> > "_bulk_insert" and "_bulk_update" functions in >> > sqlalchemy.orm.persistence >> > module. >> > So, if I 100% sure, that I ALWAYS perform INSERT/UPDATE/DELETE queries >> > in >> > only one database, can I make it? >> > >> > -- >> > SQLAlchemy - >> > The Python SQL Toolkit and Object Relational Mapper >> > >> > http://www.sqlalchemy.org/ >> > >> > To post example code, please provide an MCVE: Minimal, Complete, and >> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full >> > description. >> > --- >> > You received this message because you are subscribed to the Google >> > Groups >> > "sqlalchemy" group. >> > To unsubscribe from this group and stop receiving emails from it, send >> > an >> > email to sqlalchemy+...@googlegroups.com. >> > To post to this group, send email to sqlal...@googlegroups.com. >> > Visit this group at https://groups.google.com/group/sqlalchemy. >> > For more options, visit https://groups.google.com/d/optout. > > -- > SQLAlchemy - > The Python SQL Toolkit and Object Relational Mapper > > http://www.sqlalchemy.org/ > > To post example code, please provide an MCVE: Minimal, Complete, and > Verifiable Example. See http://stackoverflow.com/help/mcve for a full > description. > --- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to sqlalchemy+unsubscr...@googlegroups.com. > To post to this group, send email to sqlalchemy@googlegroups.com. > Visit this group at https://groups.google.com/group/sqlalchemy. > For more options, visit https://groups.google.com/d/optout. -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send
Re: [sqlalchemy] horizontal sharding and bulk_insert
вторник, 22 августа 2017 г., 17:29:41 UTC+3 пользователь Mike Bayer написал: > > On Tue, Aug 22, 2017 at 3:43 AM, Антонио Антуан> wrote: > > Hi guys > > I tried to implement horizontal sharding in my project. Everything is > ok, > > except bulk_inserts. > > When I run tests, I got this error: > > File "/home/anton/Projects/proj/core/model/messages.py", line 210, in > > create > > Session.bulk_insert_mappings(MessageEntity, to_commit) > > File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/scoping.py", line > 157, > > in do > > return getattr(self.registry(), name)(*args, **kwargs) > > File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/session.py", line > 2345, > > in bulk_insert_mappings > > mapper, mappings, False, False, return_defaults, False) > > File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/session.py", line > 2416, > > in _bulk_save_mappings > > transaction.rollback(_capture_exception=True) > > File "build/bdist.linux-x86_64/egg/sqlalchemy/util/langhelpers.py", > line > > 60, in __exit__ > > compat.reraise(exc_type, exc_value, exc_tb) > > File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/session.py", line > 2411, > > in _bulk_save_mappings > > mapper, mappings, transaction, isstates, return_defaults) > > File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/persistence.py", > line > > 35, in _bulk_insert > > "connection_callable / per-instance sharding " > > NotImplementedError: connection_callable / per-instance sharding not > > supported in bulk_insert() > > > > I do not understand what 'connection_callable' does, but I really need > to > > implement it. Is there any ways to do it? > > The idea of bulk_insert() is strictly one of performance with some > degree of convenience; you don't need it to actually accomplish > anything as there are many other ways to achieve what it does. the > "connection_callable" thing is specifically so that an ORM persistence > operation can get at the appropriate database connection for a > particular INSERT/UPDATE/DELETE, however this adds complexity so is > not part of bulk_insert(). > > In the case of application side horizontal sharding, you'd want to > take the data you're passing to it and pre-sort it into individual > per-shard sets, then use a Session per shard to run the inserts. > > Also, application side horizontal sharding is not a great way to do > sharding in any case. If you're on Postgresql I'd strongly recommend > using table inheritance instead. > Yes, Postgresql used into a project. But I can't use inheritance, because I have several servers, each of them located in separate country. Yes, I could use inheritance and FDW, but in that case system become more complicated. When application instance runs on master node - it can read and write to master database and can only read from geo-slaves. When application instance runs on geo-slave - it can read from master, read from replicated table on current node and write data (some statistics) to some table on current node. Periodically master read data-tables (statistics) from each geo-slave and write new data on master database. Previously my project used something like that: class ClusterMeta(object): _pg_config = {} _pg_sessions = defaultdict(list) is_slave = config.get('default', 'is_slave') prefix = config.get('default', 'prefix') def _get_pg_conf(self): for name, conn_string in config['sqlalchemy'].items(): if name.startswith('conn_string_slave_'): sp = name.replace('conn_string_slave_', '').lower() self._pg_config[sp] = conn_string elif name in ['conn_string_master', 'conn_string']: sp = 'default' if name == 'conn_string' else 'master' if sp in self._pg_config: raise ValueError('{} conn_string already configured!'.format(sp.upper())) else: self._pg_config[sp] = conn_string def _create_pg_session(self, conn_string): """ :rtype: tuple(base, session) """ engine = create_engine(conn_string, convert_unicode=True) session = ScopedSession(sessionmaker(bind=engine)) base = declarative_base() base.query = session.query_property() return base, session def get_session(self, prefix, with_base=False): if prefix not in self._pg_sessions: conn_string = self._pg_config.get(prefix) if not conn_string: raise ValueError('No conn_string for prefix "%s"' % prefix) if prefix not in self._pg_sessions: self._pg_sessions[prefix] = [] s = self._create_pg_session(conn_string) self._pg_sessions[prefix].insert(0, s) else: s = self._pg_sessions[prefix][0] return s if
Re: [sqlalchemy] horizontal sharding and bulk_insert
On Tue, Aug 22, 2017 at 3:43 AM, Антонио Антуанwrote: > Hi guys > I tried to implement horizontal sharding in my project. Everything is ok, > except bulk_inserts. > When I run tests, I got this error: > File "/home/anton/Projects/proj/core/model/messages.py", line 210, in > create > Session.bulk_insert_mappings(MessageEntity, to_commit) > File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/scoping.py", line 157, > in do > return getattr(self.registry(), name)(*args, **kwargs) > File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/session.py", line 2345, > in bulk_insert_mappings > mapper, mappings, False, False, return_defaults, False) > File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/session.py", line 2416, > in _bulk_save_mappings > transaction.rollback(_capture_exception=True) > File "build/bdist.linux-x86_64/egg/sqlalchemy/util/langhelpers.py", line > 60, in __exit__ > compat.reraise(exc_type, exc_value, exc_tb) > File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/session.py", line 2411, > in _bulk_save_mappings > mapper, mappings, transaction, isstates, return_defaults) > File "build/bdist.linux-x86_64/egg/sqlalchemy/orm/persistence.py", line > 35, in _bulk_insert > "connection_callable / per-instance sharding " > NotImplementedError: connection_callable / per-instance sharding not > supported in bulk_insert() > > I do not understand what 'connection_callable' does, but I really need to > implement it. Is there any ways to do it? The idea of bulk_insert() is strictly one of performance with some degree of convenience; you don't need it to actually accomplish anything as there are many other ways to achieve what it does. the "connection_callable" thing is specifically so that an ORM persistence operation can get at the appropriate database connection for a particular INSERT/UPDATE/DELETE, however this adds complexity so is not part of bulk_insert(). In the case of application side horizontal sharding, you'd want to take the data you're passing to it and pre-sort it into individual per-shard sets, then use a Session per shard to run the inserts. Also, application side horizontal sharding is not a great way to do sharding in any case. If you're on Postgresql I'd strongly recommend using table inheritance instead. > > If it can help: I ALWAYS perform UPDATE/INSERT/DELETE and most of SELECT > queries in only one database. So, I wrote such subclass: > > > class ShardedSessionWithDefaultBind(ShardedSession): > def get_bind(self, mapper, shard_id=None, instance=None, clause=None, > **kw): > if shard_id is None: > shard_id = default_shard_id > return super(ShardedSessionWithDefaultBind, self).get_bind(mapper, > shard_id, instance, clause, **kw) > > > Maybe I can override "__init__" for my class and write > "self.connnection_callable = None"? > My research of sqlalchemy code didn't make me sure that it is safe enough. > But I see, that "connection_callable" used only for checking into > "_bulk_insert" and "_bulk_update" functions in sqlalchemy.orm.persistence > module. > So, if I 100% sure, that I ALWAYS perform INSERT/UPDATE/DELETE queries in > only one database, can I make it? > > -- > SQLAlchemy - > The Python SQL Toolkit and Object Relational Mapper > > http://www.sqlalchemy.org/ > > To post example code, please provide an MCVE: Minimal, Complete, and > Verifiable Example. See http://stackoverflow.com/help/mcve for a full > description. > --- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to sqlalchemy+unsubscr...@googlegroups.com. > To post to this group, send email to sqlalchemy@googlegroups.com. > Visit this group at https://groups.google.com/group/sqlalchemy. > For more options, visit https://groups.google.com/d/optout. -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at https://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.