Re: [sqlalchemy] Performance issues with multi-shard (engines) sharing the same bakery / LRUCache

Carson Ip Tue, 26 Nov 2019 00:02:29 -0800

Hi Mike,

Thanks for the swift response.


Then I guess the easiest fix for me is to modify the cache key composition 
to replace the dialect object with a dialect summary (e.g. type + 
version) that is identical across all engines. But that requires a guarantee 
that no compiled SQL query contains a database (schema) name.
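
A minimal sketch of the idea above, using only the standard library. FakeDialect, dialect_summary and cache_key are hypothetical names made up for illustration; they are not SQLAlchemy APIs, and the real key also carries the schema hash key and the inline flag. The point is only that two distinct dialect objects compare unequal by identity, while a (type, name, version) summary compares equal:

```python
class FakeDialect:
    """Hypothetical stand-in for a dialect object; one instance per engine."""

    def __init__(self, name, server_version_info):
        self.name = name
        self.server_version_info = server_version_info


def dialect_summary(dialect):
    # Assumption: type, name and server version are all that affect SQL output.
    return (type(dialect).__name__, dialect.name, tuple(dialect.server_version_info))


def cache_key(dialect, statement, column_keys, use_summary):
    # With use_summary=False the key holds the dialect object itself,
    # so keys from different engines never compare equal.
    dialect_part = dialect_summary(dialect) if use_summary else dialect
    return (dialect_part, statement, tuple(sorted(column_keys)))


shard_a = FakeDialect("mysql", (5, 7, 28))
shard_b = FakeDialect("mysql", (5, 7, 28))
stmt = "SELECT users.id FROM users WHERE users.id = %s"

per_object_equal = (
    cache_key(shard_a, stmt, ["id"], False) == cache_key(shard_b, stmt, ["id"], False)
)
per_summary_equal = (
    cache_key(shard_a, stmt, ["id"], True) == cache_key(shard_b, stmt, ["id"], True)
)
print(per_object_equal, per_summary_equal)
```

The summary key is only safe under the guarantee mentioned above: if a compiled statement embeds a database (schema) name, two shards would wrongly share one cached entry.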

Having a single connection pool that is shared among all shards on a host 
is cool (I think we have discussed this in the past in another thread), but 
is it readily available at the moment? Also, if I go with this solution 
alone, it still requires an LRU cache capacity of (number of hosts * number 
of queries).
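
To make the capacity argument concrete, here is a rough simulation, assuming a plain LRU and a workload that cycles through every query on every shard. CountingLRU and count_compiles are hypothetical helpers written for this sketch, the host/shard/query counts are made-up numbers, and each cache miss stands in for one SQL compilation:

```python
from collections import OrderedDict


class CountingLRU:
    """Tiny LRU cache that counts misses (one miss = one simulated compile)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.misses = 0

    def get_or_compile(self, key):
        if key in self.data:
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]
        self.misses += 1
        value = "compiled:%r" % (key,)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
        return value


def count_compiles(hosts, shards_per_host, queries, capacity, scope, rounds=3):
    """Run every query on every shard `rounds` times; return the compile count."""
    cache = CountingLRU(capacity)
    for _ in range(rounds):
        for host in range(hosts):
            for shard in range(shards_per_host):
                for query in range(queries):
                    if scope == "shard":    # one dialect object per shard engine
                        part = (host, shard)
                    elif scope == "host":   # one engine (dialect) per host
                        part = host
                    else:                   # dialect summary shared by all engines
                        part = "summary"
                    cache.get_or_compile((part, query))
    return cache.misses


per_shard = count_compiles(4, 25, 100, capacity=400, scope="shard")
per_host = count_compiles(4, 25, 100, capacity=400, scope="host")
shared = count_compiles(4, 25, 100, capacity=400, scope="summary")
print(per_shard, per_host, shared)
```

With these numbers a capacity of 400 is exactly hosts * queries, so the per-host key compiles each statement once per host, while the per-shard key misses on every single access because the working set (10,000 keys) never fits.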

Thanks,
Carson



On Tuesday, November 26, 2019 at 1:33:24 AM UTC+8, Mike Bayer wrote:
>
>
>
> On Mon, Nov 25, 2019, at 4:40 AM, Carson Ip wrote:
>
> *Background:*
>
> I am using a multi-shard MySQL setup (multiple db hosts, each host holding 
> many databases, a.k.a. "shards"). My Python application is creating engines 
> to many of these shards. For performance reasons, the application utilizes 
> bakery and BakedQuery to avoid compiling SQL statements on every ORM call.
>
> *Issue:*
>
> I have 100+ BakedQuery objects and 100+ shards. It appears that the bakery 
> (LRUCache) needs a size of (N BakedQuery * M shards) to cache all of the 
> queries for all shards, because the cache key for the compiled SQL (not the 
> BakedQuery) contains the dialect object. Please see _execute_clauseelement 
> in sqlalchemy.engine.base.
>
> key = (
>     dialect,
>     elem,
>     tuple(sorted(keys)),
>     self.schema_for_object.hash_key,
>     len(distilled_params) > 1,
> )
> compiled_sql = self._execution_options["compiled_cache"].get(key)
> if compiled_sql is None:
>     compiled_sql = elem.compile(
>         dialect=dialect,
>         column_keys=keys,
>         inline=len(distilled_params) > 1,
>         schema_translate_map=self.schema_for_object
>         if not self.schema_for_object.is_default
>         else None,
>     )
>     self._execution_options["compiled_cache"][key] = compiled_sql
>
>
> When the cache capacity is small, the cache keeps evicting entries and 
> recompiling queries, which uses a lot of CPU. And if the cache needs a 
> capacity of N*M, that is bad for memory.
>
> *Question:*
>
> 1. Any good suggestions for fixing the performance and memory issues, e.g. 
> by sharing the cache key?
> 2. To share the cache key, do we implement an __eq__ for the Dialect object 
> or make all the shards (engines) share the same Dialect object?
> 3. Is sharing a Dialect object dangerous? I see default_schema_name in the 
> Dialect object.
> 4. This affects not only BakedQuery but also the compiled_cache of engines 
> (non-BakedQuery). Is there a good universal fix for the issue (like Q2, 
> sharing the dialect)?
>
>
> Hi there -
>
> yeah I don't think this is a case that is anticipated right now by any of 
> the cache-related systems, since "dialect" is part of the key. The dialect 
> is there because it affects how the SQL might be emitted, based on options 
> and so forth.
>
> A longer-term solution (definitely not for 1.3.x) would be for dialects to 
> produce part of the cache key based on the type of dialect and the server 
> version info, as well as any options that may affect SQL output. There 
> is a new cache key mechanism going into 1.4 that will be targeted for 
> mainstream use by the SQLAlchemy 2.0 series, and I've added 
> https://github.com/sqlalchemy/sqlalchemy/issues/5002 to ensure this 
> aspect of it is dealt with.
>
> For now, it is mostly safe to share the dialect object, with the exception 
> of the "default schema name", which has significance for table 
> reflection operations. If you aren't using table reflection then you 
> could in theory share the dialect.
>
> I would likely look first to seeing if there is a way to use a single 
> engine per host, and then to use the schema translation feature so that the 
> full set of shards per host are available under one engine: 
> https://docs.sqlalchemy.org/en/13/core/connections.html?highlight=schema_translate_map#schema-translating
>
> This is what you should likely be doing in any case as it would allow you 
> to have a single connection pool that is shared for all shards on a host.
>
>
>
>
>
>
>
>
> --
> SQLAlchemy - 
> The Python SQL Toolkit and Object Relational Mapper
>  
> http://www.sqlalchemy.org/
>  
> To post example code, please provide an MCVE: Minimal, Complete, and 
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full 
> description.
> --- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sqlalchemy/d1e7f3ce-4cd8-4d9d-b774-44175a4e523b%40googlegroups.com.
>
>
>
