Hi Mike, Thanks for the swift response.
Then I guess the easiest fix for me is to modify the cache key composition to replace the dialect object part with a dialect summary (e.g. type + version) that is the same between all engines. But it requires a guarantee that all compiled SQL queries must not contain database (schema) name. Having a single connection pool that is shared among all shards on a host is cool (i think we have discussed this in the past in another thread), but is it readily available at the moment? Also, if I go with this solution only, it still requires a LRU cache capacity of (number of hosts * number of queries). Thanks, Carson On Tuesday, November 26, 2019 at 1:33:24 AM UTC+8, Mike Bayer wrote: > > > > On Mon, Nov 25, 2019, at 4:40 AM, Carson Ip wrote: > > *Background:* > > I am using a multi-shard MySQL setup (multiple db hosts, each host holding > many databases, a.k.a. "shards"). My Python application is creating engines > to many of these shards. For performance reasons, the application utilizes > bakery and BakedQuery to avoid compiling SQL statements on every ORM call. > > *Issue:* > > I have 100+ BakedQuery and 100+ shards. It appears it needs a bakery > (LRUCache) of size = (N BakedQuery * M shards) to cache all of the queries > for all shards because the cache key for the compiled SQL (not the > BakedQuery) contains the dialect object. Please see _execute_clauseelement > in sqlalchemy.engine.base. > > key = ( > dialect, > elem, > tuple(sorted(keys)), > self.schema_for_object.hash_key, > len(distilled_params) > 1, > ) > compiled_sql = self._execution_options[*"compiled_cache"*].get(key) > *if *compiled_sql *is *None: > compiled_sql = elem.compile( > dialect=dialect, > column_keys=keys, > inline=len(distilled_params) > 1, > schema_translate_map=self.schema_for_object > *if not *self.schema_for_object.is_default > *else *None, > ) > self._execution_options[*"compiled_cache"*][key] = compiled_sql > > > When the cache capacity is small, it keeps evicting cache entries and > compiling queries, using a lot of CPU. Also if it takes N*M for cache > capacity, it is bad for memory. > > *Question:* > > 1. Any good suggestions on fixing the performance and memory issue? e.g. > by sharing the cache key > 2. To share the cache key, do we implement a __eq__ for the Dialect object > or make all the shards (engines) share the same Dialect object? > 3. Is sharing a Dialect object dangerous? I see default_schema_name in the > Dialect object. > 3. This not only affects bakedquery, but also compiled_cache of engines > (non-BakedQuery). Is there a good universal fix for the issue (like Q2, > sharing dialect)? > > > Hi there - > > yeah I dont think this is a case that is anticipated right now by any of > the cache-related systems since "dialect" is part of the key. dialect is > there because it impacts how the SQL might be emitted based on options and > so forth. > > A longer term solution (definitely not for 1.3.x) would be that dialects > produce part of the cache key based on the type of dialect and the server > version info, as well as any options that may affect SQL output. There > is a new cache key mechanism going into 1.4 that will be targeted for > mainstream use by the SQLAlchemy 2.0 series and I've added > https://github.com/sqlalchemy/sqlalchemy/issues/5002 to ensure this > aspect of it is dealt with. > > For now, it is mostly safe to share the dialect object, with the exception > of the "default schema name" which will have significance for table > reflection operation. if you aren't using table reflection then you > could in theory share the dialect. > > I would likely look first to seeing if there is a way to use a single > engine per host, and then to use the schema translation feature so that the > full set of shards per host are available under one engine: > https://docs.sqlalchemy.org/en/13/core/connections.html?highlight=schema_translate_map#schema-translating > > . > > This is what you should likely be doing in any case as it would allow you > to have a single connection pool that is shared for all shards on a host. > > > > > > > > > -- > SQLAlchemy - > The Python SQL Toolkit and Object Relational Mapper > > http://www.sqlalchemy.org/ > > To post example code, please provide an MCVE: Minimal, Complete, and > Verifiable Example. See http://stackoverflow.com/help/mcve for a full > description. > --- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected] <javascript:>. > To view this discussion on the web visit > https://groups.google.com/d/msgid/sqlalchemy/d1e7f3ce-4cd8-4d9d-b774-44175a4e523b%40googlegroups.com > > <https://groups.google.com/d/msgid/sqlalchemy/d1e7f3ce-4cd8-4d9d-b774-44175a4e523b%40googlegroups.com?utm_medium=email&utm_source=footer> > . > > > -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/65159d52-9ea7-4e35-96b8-d0215530939d%40googlegroups.com.
