Re: Protect syscache from bloating with negative cache entries
At Mon, 22 Mar 2021 13:12:10 -0400, Bruce Momjian wrote in
> On Thu, Jan 28, 2021 at 05:16:52PM +0900, Kyotaro Horiguchi wrote:
> > At Thu, 28 Jan 2021 16:50:44 +0900 (JST), Kyotaro Horiguchi wrote in
> > > I was going to write in the doc something like "you can inspect memory
> > > consumption by catalog caches using pg_backend_memory_contexts", but
> > > all the memory used by catalog cache is in CacheMemoryContext. Is it
> > > sensible for each catalog cache to have their own contexts?
> >
> > Something like this.
>
> Is this feature not going to make it into PG 14? It first appeared in
> the January, 2017 commitfest:
>
> https://commitfest.postgresql.org/32/931/

Thank you for looking at this. However, I'm afraid you are looking at a patch that is not part of the CF entry "Protect syscache from bloating with negative cache entries". I'd be happy if it were committed: it is intended not only to make pg_get_backend_memory_contexts() show more meaningful information, but also to make it easier to investigate what kind of cache is bloating, and the like.

With the patch, the function shows an individual context information line for each catcache.

> postgres=# select pg_get_backend_memory_contexts();
...
> (catcache,"catcache id 78",CacheMemoryContext,2,8192,1,6152,0,2040)
> (catcache,"catcache id 77",CacheMemoryContext,2,8192,1,6152,0,2040)
> (catcache,"catcache id 76",CacheMemoryContext,2,16384,2,7592,3,8792)

Applying catcachecxt_by_name.patch.txt on top of it changes the output as follows. The names are not familiar to users, but give far clearer information.

> (catcache,USERMAPPINGUSERSERVER,CacheMemoryContext,2,8192,1,6192,0,2000)
> (catcache,USERMAPPINGOID,CacheMemoryContext,2,8192,1,6192,0,2000)
> (catcache,TYPEOID,CacheMemoryContext,2,16384,2,7632,0,8752)

Applying catcachecxt_by_name_id.patch.txt on top of the _by_name patch, the output further changes as follows.
> (catcache,USERMAPPINGUSERSERVER[78],CacheMemoryContext,2,8192,1,6136,0,2056)
> (catcache,USERMAPPINGOID[77],CacheMemoryContext,2,8192,1,6136,0,2056)
> (catcache,TYPEOID[76],CacheMemoryContext,2,16384,2,7592,3,8792)

The number enclosed in brackets is the cache id. It is useless for users but convenient for debugging :p

catcache_individual_mcxt_2.patch.txt: rebased version of the per-catcache context patch.
catcachecxt_by_name.patch.txt: gives a meaningful name to catcache contexts.
catcachecxt_by_name_id.patch.txt: and adds the cache id.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 55c9445898..7d318cf7aa 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -769,6 +769,8 @@ InitCatCache(int id,
 {
 	CatCache   *cp;
 	MemoryContext oldcxt;
+	MemoryContext mycxt;
+	char		name[32];
 	size_t		sz;
 	int			i;
@@ -792,7 +794,12 @@ InitCatCache(int id,
 	if (!CacheMemoryContext)
 		CreateCacheMemoryContext();
-	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+	mycxt = AllocSetContextCreate(CacheMemoryContext, "catcache",
+								  ALLOCSET_DEFAULT_SIZES);
+
+	snprintf(name, sizeof(name), "catcache id %d", id);
+	oldcxt = MemoryContextSwitchTo(mycxt);
+	MemoryContextSetIdentifier(mycxt, (const char *) pstrdup(name));
 	/*
 	 * if first time through, initialize the cache group header
@@ -833,6 +840,7 @@ InitCatCache(int id,
 	cp->cc_nkeys = nkeys;
 	for (i = 0; i < nkeys; ++i)
 		cp->cc_keyno[i] = key[i];
+	cp->cc_mcxt = mycxt;
 	/*
 	 * new cache is initialized as far as we can go for now. print some
@@ -932,12 +940,12 @@ CatalogCacheInitializeCache(CatCache *cache)
 	relation = table_open(cache->cc_reloid, AccessShareLock);
 	/*
-	 * switch to the cache context so our allocations do not vanish at the end
-	 * of a transaction
+	 * switch to our own context under the cache context so our allocations do
+	 * not vanish at the end of a transaction
 	 */
-	Assert(CacheMemoryContext != NULL);
+	Assert(CacheMemoryContext != NULL && cache->cc_mcxt != NULL);
-	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+	oldcxt = MemoryContextSwitchTo(cache->cc_mcxt);
 	/*
 	 * copy the relcache's tuple descriptor to permanent cache storage
@@ -998,7 +1006,7 @@ CatalogCacheInitializeCache(CatCache *cache)
 	 */
 	fmgr_info_cxt(eqfunc,
 				  &cache->cc_skey[i].sk_func,
-				  CacheMemoryContext);
+				  cache->cc_mcxt);
 	/* Initialize sk_attno suitably for HeapKeyTest() and heap scans */
 	cache->cc_skey[i].sk_attno = cache->cc_keyno[i];
@@
Re: Protect syscache from bloating with negative cache entries
On Thu, Jan 28, 2021 at 05:16:52PM +0900, Kyotaro Horiguchi wrote:
> At Thu, 28 Jan 2021 16:50:44 +0900 (JST), Kyotaro Horiguchi wrote in
> > I was going to write in the doc something like "you can inspect memory
> > consumption by catalog caches using pg_backend_memory_contexts", but
> > all the memory used by catalog cache is in CacheMemoryContext. Is it
> > sensible for each catalog cache to have their own contexts?
>
> Something like this.

Is this feature not going to make it into PG 14? It first appeared in the January, 2017 commitfest:

https://commitfest.postgresql.org/32/931/

--
Bruce Momjian  https://momjian.us
EDB            https://enterprisedb.com

If only the physical world exists, free will is an illusion.
Re: Protect syscache from bloating with negative cache entries
At Thu, 28 Jan 2021 16:50:44 +0900 (JST), Kyotaro Horiguchi wrote in
> I was going to write in the doc something like "you can inspect memory
> consumption by catalog caches using pg_backend_memory_contexts", but
> all the memory used by catalog cache is in CacheMemoryContext. Is it
> sensible for each catalog cache to have their own contexts?

Something like this.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index fa2b49c676..cfbb335bb3 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -769,6 +769,8 @@ InitCatCache(int id,
 {
 	CatCache   *cp;
 	MemoryContext oldcxt;
+	MemoryContext mycxt;
+	char		name[32];
 	size_t		sz;
 	int			i;
@@ -792,7 +794,12 @@ InitCatCache(int id,
 	if (!CacheMemoryContext)
 		CreateCacheMemoryContext();
-	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+	mycxt = AllocSetContextCreate(CacheMemoryContext, "catcache",
+								  ALLOCSET_DEFAULT_SIZES);
+
+	snprintf(name, sizeof(name), "catcache id %d", id);
+	oldcxt = MemoryContextSwitchTo(mycxt);
+	MemoryContextSetIdentifier(mycxt, (const char *) pstrdup(name));
 	/*
 	 * if first time through, initialize the cache group header
@@ -833,6 +840,7 @@ InitCatCache(int id,
 	cp->cc_nkeys = nkeys;
 	for (i = 0; i < nkeys; ++i)
 		cp->cc_keyno[i] = key[i];
+	cp->cc_mcxt = mycxt;
 	/*
 	 * new cache is initialized as far as we can go for now. print some
@@ -932,12 +940,12 @@ CatalogCacheInitializeCache(CatCache *cache)
 	relation = table_open(cache->cc_reloid, AccessShareLock);
 	/*
-	 * switch to the cache context so our allocations do not vanish at the end
-	 * of a transaction
+	 * switch to our own context under the cache context so our allocations do
+	 * not vanish at the end of a transaction
 	 */
 	Assert(CacheMemoryContext != NULL);
-	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+	oldcxt = MemoryContextSwitchTo(cache->cc_mcxt);
 	/*
 	 * copy the relcache's tuple descriptor to permanent cache storage
@@ -998,7 +1006,7 @@ CatalogCacheInitializeCache(CatCache *cache)
 	 */
 	fmgr_info_cxt(eqfunc,
 				  &cache->cc_skey[i].sk_func,
-				  CacheMemoryContext);
+				  cache->cc_mcxt);
 	/* Initialize sk_attno suitably for HeapKeyTest() and heap scans */
 	cache->cc_skey[i].sk_attno = cache->cc_keyno[i];
@@ -1697,7 +1705,7 @@ SearchCatCacheList(CatCache *cache,
 	table_close(relation, AccessShareLock);
 	/* Now we can build the CatCList entry. */
-	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+	oldcxt = MemoryContextSwitchTo(cache->cc_mcxt);
 	nmembers = list_length(ctlist);
 	cl = (CatCList *)
 		palloc(offsetof(CatCList, members) + nmembers * sizeof(CatCTup *));
@@ -1830,7 +1838,7 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
 		dtp = ntp;
 	/* Allocate memory for CatCTup and the cached tuple in one go */
-	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+	oldcxt = MemoryContextSwitchTo(cache->cc_mcxt);
 	ct = (CatCTup *) palloc(sizeof(CatCTup) +
 							MAXIMUM_ALIGNOF + dtp->t_len);
@@ -1865,7 +1873,7 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
 	else
 	{
 		Assert(negative);
-		oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+		oldcxt = MemoryContextSwitchTo(cache->cc_mcxt);
 		ct = (CatCTup *) palloc(sizeof(CatCTup));
 	/*
diff --git a/src/include/utils/catcache.h b/src/include/utils/catcache.h
index ddc2762eb3..a32fea2f11 100644
--- a/src/include/utils/catcache.h
+++ b/src/include/utils/catcache.h
@@ -61,6 +61,7 @@ typedef struct catcache
 	slist_node	cc_next;		/* list link */
 	ScanKeyData	cc_skey[CATCACHE_MAXKEYS];	/* precomputed key info for heap
 											 * scans */
+	MemoryContext cc_mcxt;		/* memory context for this cache */
 	/*
 	 * Keep these at the end, so that compiling catcache.c with CATCACHE_STATS
Re: Protect syscache from bloating with negative cache entries
At Wed, 27 Jan 2021 13:11:55 +0200, Heikki Linnakangas wrote in
> On 27/01/2021 03:13, Kyotaro Horiguchi wrote:
> > At Thu, 14 Jan 2021 17:32:27 +0900 (JST), Kyotaro Horiguchi wrote in
> >> The commit 4656e3d668 (debug_invalidate_system_caches_always)
> >> conflicted with this patch. Rebased.
> > At Wed, 27 Jan 2021 10:07:47 +0900 (JST), Kyotaro Horiguchi wrote in
> >> (I found a bug in a benchmark-aid function
> >> (CatalogCacheFlushCatalog2), I repost an updated version soon.)
> > I noticed that a catcachebench-aid function
> > CatalogCacheFlushCatalog2() allocates the bucket array wrongly in the
> > current memory context, which leads to a crash.
> > This is a fixed and rebased version.
>
> Thanks, with the scripts you provided, I was able to run the
> performance tests on my laptop, and got very similar results as you
> did.
>
> The impact of v7-0002-Remove-dead-flag-from-catcache-tuple.patch is
> very small. I think I could see it in the tests, but only barely. And
> the tests did nothing else than do syscache lookups; in any real world
> scenario, it would be lost in noise. I think we can put that aside for
> now, and focus on v6-0001-CatCache-expiration-feature.patch:

I agree with that opinion, though it is a bit disappointing that the long struggle ended up in vain :p

> The pruning is still pretty lethargic:
>
> - Entries created in the same transactions are never pruned away
>
> - The size of the hash table is never shrunk. So even though the patch
>   puts a backstop to the hash table growing indefinitely, if you run one
>   transaction that bloats the cache, it's bloated for the rest of the
>   session.

Right. But more frequent checks impact performance. We could do more aggressive pruning in idle time.

> I think that's OK. We might want to be more aggressive in the future,
> but for now it seems reasonable to lean towards the current behavior
> where nothing is pruned. Although I wonder if we should try to set
> 'catcacheclock' more aggressively. I think we could set it whenever
> GetCurrentTimestamp() is called, for example.

Ah. I hadn't thought in that direction. global_last_acquired_timestamp, or some such?

> Given how unaggressive this mechanism is, I think it should be safe to
> enable it by default. What would be a suitable default for
> catalog_cache_prune_min_age? 30 seconds?

Without detailed thought, that seems a bit too short. The value suggested so far for the variable is 300-600 s. That is, intermittent queries with about 5-10 minute intervals don't lose cache entries. In a bad case, two queries could alternately remove each other's cache entries:

Q1: adds 100 entries
<1 minute passes>
Q2: adds 100 entries, but rehashing is going to happen at 150 entries,
    and the existing 100 entries added by Q1 are removed.
<1 minute passes>
Q1: adds 100 entries, but rehashing is going to happen at 150 entries,
    and the existing 100 entries added by Q2 are removed.

Also, a transaction sequence that persists longer than that many seconds may lose some of its catcache entries.

> Documentation needs to be updated for the new GUC.
>
> Attached is a version with a few little cleanups:
> - use TimestampTz instead of uint64 for the timestamps
> - remove assign_catalog_cache_prune_min_age(). All it did was convert
>   the GUC's value from seconds to microseconds, and stored it in a
>   separate variable. Multiplication is cheap, so we can just do it when
>   we use the GUC's value instead.

Yeah, the latter is a trace of the struggle to cut down CPU cycles on the normal paths. I don't object to doing so.

I found that some comments are apparently stale. cp->cc_oldest_ts is not used anywhere, but it was added for the decision of whether to scan or not. I fixed the following points in the attached:

- Removed some comments that are obvious. ("Timestamp in us")
- Added a cp->cc_oldest_ts check in CatCacheCleanupOldEntries.
- Set the default value for catalog_cache_prune_min_age to 600 s.
- Added a doc entry for the new GUC in the resource/memory section.
- Fixed some code comments.
- Adjusted the pruning criterion from (ct->lastaccess < prune_threshold) to <=.

I was going to write in the doc something like "you can inspect memory consumption by catalog caches using pg_backend_memory_contexts", but all the memory used by catalog caches is in CacheMemoryContext. Is it sensible for each catalog cache to have its own context?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 01d32ffa499ee9185e222fe6fa3d39ad6ac5ff37 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas
Date: Wed, 27 Jan 2021 13:08:08 +0200
Subject: [PATCH v9] CatCache expiration feature

Author: Kyotaro Horiguchi
---
 doc/src/sgml/config.sgml           | 20 +++
 src/backend/access/transam/xact.c  |  3 ++
 src/backend/utils/cache/catcache.c | 85 +-
 src/backend/utils/misc/guc.c       | 12 +
 src/include/utils/catcache.h       | 17 ++
 5 files changed, 136 insertions(+), 1 deletion(-)

diff --git
Re: Protect syscache from bloating with negative cache entries
On 27/01/2021 03:13, Kyotaro Horiguchi wrote:
> At Thu, 14 Jan 2021 17:32:27 +0900 (JST), Kyotaro Horiguchi wrote in
>> The commit 4656e3d668 (debug_invalidate_system_caches_always)
>> conflicted with this patch. Rebased.
>
> At Wed, 27 Jan 2021 10:07:47 +0900 (JST), Kyotaro Horiguchi wrote in
>> (I found a bug in a benchmark-aid function
>> (CatalogCacheFlushCatalog2), I repost an updated version soon.)
>
> I noticed that a catcachebench-aid function
> CatalogCacheFlushCatalog2() allocates the bucket array wrongly in the
> current memory context, which leads to a crash.
> This is a fixed and rebased version.

Thanks, with the scripts you provided, I was able to run the performance tests on my laptop, and got very similar results as you did.

The impact of v7-0002-Remove-dead-flag-from-catcache-tuple.patch is very small. I think I could see it in the tests, but only barely. And the tests did nothing else than do syscache lookups; in any real world scenario, it would be lost in noise. I think we can put that aside for now, and focus on v6-0001-CatCache-expiration-feature.patch:

The pruning is still pretty lethargic:

- Entries created in the same transactions are never pruned away

- The size of the hash table is never shrunk. So even though the patch
  puts a backstop to the hash table growing indefinitely, if you run one
  transaction that bloats the cache, it's bloated for the rest of the
  session.

I think that's OK. We might want to be more aggressive in the future, but for now it seems reasonable to lean towards the current behavior where nothing is pruned. Although I wonder if we should try to set 'catcacheclock' more aggressively. I think we could set it whenever GetCurrentTimestamp() is called, for example.

Given how unaggressive this mechanism is, I think it should be safe to enable it by default. What would be a suitable default for catalog_cache_prune_min_age? 30 seconds?

Documentation needs to be updated for the new GUC.

Attached is a version with a few little cleanups:
- use TimestampTz instead of uint64 for the timestamps
- remove assign_catalog_cache_prune_min_age(). All it did was convert
  the GUC's value from seconds to microseconds, and stored it in a
  separate variable. Multiplication is cheap, so we can just do it when
  we use the GUC's value instead.

- Heikki

>From 73583c2c9cd26bad7c3942eff1ddccb2ebb43222 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas
Date: Wed, 27 Jan 2021 13:08:08 +0200
Subject: [PATCH v8 1/1] CatCache expiration feature

Author: Kyotaro Horiguchi
---
 src/backend/access/transam/xact.c  |  3 ++
 src/backend/utils/cache/catcache.c | 82 +-
 src/backend/utils/misc/guc.c       | 12 +
 src/include/utils/catcache.h       | 17 +++
 4 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd45..86888d24091 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1086,6 +1086,9 @@ static void
 AtStart_Cache(void)
 {
 	AcceptInvalidationMessages();
+
+	if (xactStartTimestamp != 0)
+		SetCatCacheClock(xactStartTimestamp);
 }
 /*
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index fa2b49c676e..e29f825687a 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -38,6 +38,7 @@
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/syscache.h"
+#include "utils/timestamp.h"
 /* #define CACHEDEBUG */	/* turns DEBUG elogs on */
@@ -60,9 +61,18 @@
 #define CACHE_elog(...)
 #endif
+/*
+ * GUC variable to define the minimum age of entries that will be considered
+ * to be evicted in seconds. -1 to disable the feature.
+ */
+int			catalog_cache_prune_min_age = -1;
+
 /* Cache management header --- pointer is NULL until created */
 static CatCacheHeader *CacheHdr = NULL;
+/* Clock for the last accessed time of a catcache entry. */
+TimestampTz	catcacheclock = 0;
+
 static inline HeapTuple SearchCatCacheInternal(CatCache *cache,
 											   int nkeys,
 											   Datum v1, Datum v2,
@@ -74,6 +84,7 @@ static pg_noinline HeapTuple SearchCatCacheMiss(CatCache *cache,
 												Index hashIndex,
 												Datum v1, Datum v2,
 												Datum v3, Datum v4);
+static bool CatCacheCleanupOldEntries(CatCache *cp);
 static uint32 CatalogCacheComputeHashValue(CatCache *cache, int nkeys,
 										   Datum v1, Datum v2,
 										   Datum v3, Datum v4);
@@ -99,7 +110,6 @@ static void CatCacheFreeKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 static void CatCacheCopyKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 							 Datum *srckeys, Datum *dstkeys);
-
 /*
  * internal support functions
  */
@@ -1264,6 +1274,9 @@ SearchCatCacheInternal(CatCache *cache,
 		 */
 		dlist_move_head(bucket, &ct->cache_elem);
+		/* Record the last access timestamp */
+		ct->lastaccess = catcacheclock;
+
 		/*
 		 * If it's a positive entry,
Re: Protect syscache from bloating with negative cache entries
At Thu, 14 Jan 2021 17:32:27 +0900 (JST), Kyotaro Horiguchi wrote in
> The commit 4656e3d668 (debug_invalidate_system_caches_always)
> conflicted with this patch. Rebased.

At Wed, 27 Jan 2021 10:07:47 +0900 (JST), Kyotaro Horiguchi wrote in
> (I found a bug in a benchmark-aid function
> (CatalogCacheFlushCatalog2), I repost an updated version soon.)

I noticed that a catcachebench-aid function CatalogCacheFlushCatalog2() allocates the bucket array wrongly in the current memory context, which leads to a crash. This is a fixed and rebased version.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 5f318170b9c1e0caa1033862261800f06135e5bd Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Wed, 18 Nov 2020 16:54:31 +0900
Subject: [PATCH v7 1/3] CatCache expiration feature

---
 src/backend/access/transam/xact.c  |  3 ++
 src/backend/utils/cache/catcache.c | 87 +-
 src/backend/utils/misc/guc.c       | 12 +
 src/include/utils/catcache.h       | 19 +++
 4 files changed, 120 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..86888d2409 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1086,6 +1086,9 @@ static void
 AtStart_Cache(void)
 {
 	AcceptInvalidationMessages();
+
+	if (xactStartTimestamp != 0)
+		SetCatCacheClock(xactStartTimestamp);
 }
 /*
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index fa2b49c676..644d92dd9a 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -38,6 +38,7 @@
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/syscache.h"
+#include "utils/timestamp.h"
 /* #define CACHEDEBUG */	/* turns DEBUG elogs on */
@@ -60,9 +61,19 @@
 #define CACHE_elog(...)
 #endif
+/*
+ * GUC variable to define the minimum age of entries that will be considered
+ * to be evicted in seconds. -1 to disable the feature.
+ */
+int			catalog_cache_prune_min_age = -1;
+uint64		prune_min_age_us;
+
 /* Cache management header --- pointer is NULL until created */
 static CatCacheHeader *CacheHdr = NULL;
+/* Clock for the last accessed time of a catcache entry. */
+uint64		catcacheclock = 0;
+
 static inline HeapTuple SearchCatCacheInternal(CatCache *cache,
 											   int nkeys,
 											   Datum v1, Datum v2,
@@ -74,6 +85,7 @@ static pg_noinline HeapTuple SearchCatCacheMiss(CatCache *cache,
 												Index hashIndex,
 												Datum v1, Datum v2,
 												Datum v3, Datum v4);
+static bool CatCacheCleanupOldEntries(CatCache *cp);
 static uint32 CatalogCacheComputeHashValue(CatCache *cache, int nkeys,
 										   Datum v1, Datum v2,
 										   Datum v3, Datum v4);
@@ -99,6 +111,15 @@ static void CatCacheFreeKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 static void CatCacheCopyKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 							 Datum *srckeys, Datum *dstkeys);
+/* GUC assign function */
+void
+assign_catalog_cache_prune_min_age(int newval, void *extra)
+{
+	if (newval < 0)
+		prune_min_age_us = UINT64_MAX;
+	else
+		prune_min_age_us = ((uint64) newval) * USECS_PER_SEC;
+}
 /*
  * internal support functions
  */
@@ -1264,6 +1285,9 @@ SearchCatCacheInternal(CatCache *cache,
 		 */
 		dlist_move_head(bucket, &ct->cache_elem);
+		/* Record the last access timestamp */
+		ct->lastaccess = catcacheclock;
+
 		/*
 		 * If it's a positive entry, bump its refcount and return it. If it's
 		 * negative, we can report failure to the caller.
@@ -1425,6 +1449,61 @@ SearchCatCacheMiss(CatCache *cache,
 	return &ct->tuple;
 }
+/*
+ * CatCacheCleanupOldEntries - Remove infrequently-used entries
+ *
+ * Catcache entries happen to be left unused for a long time for several
+ * reasons. Remove such entries to prevent catcache from bloating. It is based
+ * on the similar algorithm with buffer eviction. Entries that are accessed
+ * several times in a certain period live longer than those that have had less
+ * access in the same duration.
+ */
+static bool
+CatCacheCleanupOldEntries(CatCache *cp)
+{
+	int			nremoved = 0;
+	int			i;
+	long		oldest_ts = catcacheclock;
+	uint64		prune_threshold = catcacheclock - prune_min_age_us;
+
+	/* Scan over the whole hash to find entries to remove */
+	for (i = 0 ; i < cp->cc_nbuckets ; i++)
+	{
+		dlist_mutable_iter iter;
+
+		dlist_foreach_modify(iter, &cp->cc_bucket[i])
+		{
+			CatCTup    *ct = dlist_container(CatCTup, cache_elem, iter.cur);
+
+			/* Don't remove referenced entries */
+			if (ct->refcount == 0 &&
+				(ct->c_list == NULL || ct->c_list->refcount == 0))
+			{
+				if (ct->lastaccess < prune_threshold)
+				{
+					CatCacheRemoveCTup(cp, ct);
+					nremoved++;
+
+					/* don't let the removed entry update oldest_ts */
+					continue;
+				}
+			}
+
+			/* update the oldest timestamp if the entry remains alive */
+			if (ct->lastaccess < oldest_ts)
+				oldest_ts = ct->lastaccess;
+		}
+	}
+
Re: Protect syscache from bloating with negative cache entries
At Tue, 26 Jan 2021 11:43:21 +0200, Heikki Linnakangas wrote in
> Hi,
>
> On 19/11/2020 07:25, Kyotaro Horiguchi wrote:
> > Performance measurement on the attached showed better result about
> > searching but maybe worse for cache entry creation. Each time number
> > is the mean of 10 runs.
> >
> > # Catcache (negative) entry creation
> >            : time(ms) (% to master)
> > master     : 3965.61 (100.0)
> > patched-off: 4040.93 (101.9)
> > patched-on : 4032.22 (101.7)
> >
> > # Searching negative cache entries
> > master     : 8173.46 (100.0)
> > patched-off: 7983.43 ( 97.7)
> > patched-on : 8049.88 ( 98.5)
> >
> > # Creation, searching and expiration
> > master     : 6393.23 (100.0)
> > patched-off: 6527.94 (102.1)
> > patched-on : 15880.01 (248.4)
> >
> > That is, catcache searching gets faster by 2-3% but creation gets
> > slower by about 2%. If I moved the condition of 2 further up to
> > CatalogCacheCreateEntry(), that degradation reduced to 0.6%.
> >
> > # Catcache (negative) entry creation
> > master      : 3967.45 (100.0)
> > patched-off : 3990.43 (100.6)
> > patched-on  : 4108.96 (103.6)
> >
> > # Searching negative cache entries
> > master      : 8106.53 (100.0)
> > patched-off : 8036.61 ( 99.1)
> > patched-on  : 8058.18 ( 99.4)
> >
> > # Creation, searching and expiration
> > master      : 6395.00 (100.0)
> > patched-off : 6416.57 (100.3)
> > patched-on  : 15830.91 (247.6)
>
> Can you share the exact script or steps to reproduce these numbers? I
> presume these are from the catcachebench extension, but I can't figure
> out which scenario above corresponds to which catcachebench
> test. Also, catcachebench seems to depend on a bunch of tables being
> created in schema called "test"; what tables did you use for the above
> numbers?

gen_tbl.pl to generate the tables, then run2.sh to run the benchmark, and sumlog.pl to summarize the result of run2.sh:

$ ./gen_tbl.pl | psql postgres
$ ./run2.sh | tee rawresult.txt | ./sumlog.pl

(I found a bug in a benchmark-aid function (CatalogCacheFlushCatalog2), I repost an updated version soon.)

A simple explanation follows, since the scripts are kind of crappy..

run2.sh:
  LOOPS  : # of executions of catcachebench() in a single run
  USES   : take the average of this number of fastest executions in a single run
  BINROOT: common parent directory of target binaries
  DATADIR: data directory (shared by all binaries)
  PREC   : FP format for time and percentage in a result
  TESTS  : comma-separated numbers given to catcachebench

The "run" function spec is

  run "binary-label" <bindir> <A> <B> <C>

where A, B and C are the values for catalog_cache_prune_min_age. "" means no setting (used for the master binary). Currently only C is in effect, but all three should be non-empty strings to make it effective.

The result output is:

 test |   version   |  n  |    r     | stddev
------+-------------+-----+----------+--------
    1 | patched-off | 1/3 | 14211.96 | 261.19

  test   : # of catcachebench(#)
  version: binary label given to the run function
  n      : USES / LOOPS
  r      : result average time of catcachebench() in milliseconds
  stddev : stddev of r

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#! /usr/bin/perl
$collist = "";
foreach $i (0..1000) {
    $collist .= sprintf(", c%05d int", $i);
}
$collist = substr($collist, 2);

printf "drop schema if exists test cascade;\n";
printf "create schema test;\n";
foreach $i (0..2999) {
    printf "create table test.t%04d ($collist);\n", $i;
}

#!/bin/bash
LOOPS=3
USES=1
TESTS=1,2,3
BINROOT=/home/horiguti/bin
DATADIR=/home/horiguti/data/data_work
PREC="numeric(10,2)"

/usr/bin/killall postgres
/usr/bin/sleep 3

run() {
    local BINARY=$1
    local PGCTL=$2/bin/pg_ctl
    local PGSQL=$2/bin/postgres
    local PSQL=$2/bin/psql

    if [ "$3" != "" ]; then
        local SETTING1="set catalog_cache_prune_min_age to \"$3\";"
        local SETTING2="set catalog_cache_prune_min_age to \"$4\";"
        local SETTING3="set catalog_cache_prune_min_age to \"$5\";"
    fi
#    ($PGSQL -D $DATADIR 2>&1 > /dev/null)&
    ($PGSQL -D $DATADIR 2>&1 > /dev/null | /usr/bin/sed -e 's/^/# /')&
    /usr/bin/sleep 3
    ${PSQL} postgres <&1 > /dev/null | /usr/bin/sed -e 's/^/# /'
#    oreport > $BINARY_perf.txt
}

for i in $(seq 0 2); do
    run "patched-off" $BINROOT/pgsql_catexp "-1" "-1" "-1"
    run "patched-on" $BINROOT/pgsql_catexp "0" "0" "0"
    run "master" $BINROOT/pgsql_master_o2 "" "" ""
done

#! /usr/bin/perl
while (<>) {
#    if (/^\s+([0-9])\s*\|\s*(\w+)\s*\|\s*(\S+)\s*\|\s*([\d.]+)\s*\|\s*(\w+)\s*$/) {
    if (/^\s+([0-9])\s*\|\s*(\S+)\s*\|\s*(\S+)\s*\|\s*([\d.]+)\s*\|\s*([\d.]+)\s*$/) {
        $test = $1;
        $bin = $2;
        $time = $4;
        if (defined $sum{$test}{$bin}) {
            $sum{$test}{$bin} +=
Re: Protect syscache from bloating with negative cache entries
Hi,

On 19/11/2020 07:25, Kyotaro Horiguchi wrote:
> Performance measurement on the attached showed better result about
> searching but maybe worse for cache entry creation. Each time number
> is the mean of 10 runs.
>
> # Catcache (negative) entry creation
> : time(ms) (% to master)
> master     : 3965.61 (100.0)
> patched-off: 4040.93 (101.9)
> patched-on : 4032.22 (101.7)
>
> # Searching negative cache entries
> master     : 8173.46 (100.0)
> patched-off: 7983.43 ( 97.7)
> patched-on : 8049.88 ( 98.5)
>
> # Creation, searching and expiration
> master     : 6393.23 (100.0)
> patched-off: 6527.94 (102.1)
> patched-on : 15880.01 (248.4)
>
> That is, catcache searching gets faster by 2-3% but creation gets
> slower by about 2%. If I moved the condition of 2 further up to
> CatalogCacheCreateEntry(), that degradation reduced to 0.6%.
>
> # Catcache (negative) entry creation
> master      : 3967.45 (100.0)
> patched-off : 3990.43 (100.6)
> patched-on  : 4108.96 (103.6)
>
> # Searching negative cache entries
> master      : 8106.53 (100.0)
> patched-off : 8036.61 ( 99.1)
> patched-on  : 8058.18 ( 99.4)
>
> # Creation, searching and expiration
> master      : 6395.00 (100.0)
> patched-off : 6416.57 (100.3)
> patched-on  : 15830.91 (247.6)

Can you share the exact script or steps to reproduce these numbers? I presume these are from the catcachebench extension, but I can't figure out which scenario above corresponds to which catcachebench test. Also, catcachebench seems to depend on a bunch of tables being created in a schema called "test"; what tables did you use for the above numbers?

- Heikki
Re: Protect syscache from bloating with negative cache entries
Hello. The commit 4656e3d668 (debug_invalidate_system_caches_always) conflicted with this patch. Rebased.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From ec069488fd2675369530f3f967f02a7b683f0a7f Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Wed, 18 Nov 2020 16:54:31 +0900
Subject: [PATCH v6 1/3] CatCache expiration feature

---
 src/backend/access/transam/xact.c  |  3 ++
 src/backend/utils/cache/catcache.c | 87 +-
 src/backend/utils/misc/guc.c       | 12 +
 src/include/utils/catcache.h       | 19 +++
 4 files changed, 120 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..86888d2409 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1086,6 +1086,9 @@ static void
 AtStart_Cache(void)
 {
 	AcceptInvalidationMessages();
+
+	if (xactStartTimestamp != 0)
+		SetCatCacheClock(xactStartTimestamp);
 }
 /*
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index fa2b49c676..644d92dd9a 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -38,6 +38,7 @@
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/syscache.h"
+#include "utils/timestamp.h"
 /* #define CACHEDEBUG */	/* turns DEBUG elogs on */
@@ -60,9 +61,19 @@
 #define CACHE_elog(...)
 #endif
+/*
+ * GUC variable to define the minimum age of entries that will be considered
+ * to be evicted in seconds. -1 to disable the feature.
+ */
+int			catalog_cache_prune_min_age = -1;
+uint64		prune_min_age_us;
+
 /* Cache management header --- pointer is NULL until created */
 static CatCacheHeader *CacheHdr = NULL;
+/* Clock for the last accessed time of a catcache entry. */
+uint64		catcacheclock = 0;
+
 static inline HeapTuple SearchCatCacheInternal(CatCache *cache,
 											   int nkeys,
 											   Datum v1, Datum v2,
@@ -74,6 +85,7 @@ static pg_noinline HeapTuple SearchCatCacheMiss(CatCache *cache,
 												Index hashIndex,
 												Datum v1, Datum v2,
 												Datum v3, Datum v4);
+static bool CatCacheCleanupOldEntries(CatCache *cp);
 static uint32 CatalogCacheComputeHashValue(CatCache *cache, int nkeys,
 										   Datum v1, Datum v2,
 										   Datum v3, Datum v4);
@@ -99,6 +111,15 @@ static void CatCacheFreeKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 static void CatCacheCopyKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 							 Datum *srckeys, Datum *dstkeys);
+/* GUC assign function */
+void
+assign_catalog_cache_prune_min_age(int newval, void *extra)
+{
+	if (newval < 0)
+		prune_min_age_us = UINT64_MAX;
+	else
+		prune_min_age_us = ((uint64) newval) * USECS_PER_SEC;
+}
 /*
  * internal support functions
  */
@@ -1264,6 +1285,9 @@ SearchCatCacheInternal(CatCache *cache,
 		 */
 		dlist_move_head(bucket, &ct->cache_elem);
+		/* Record the last access timestamp */
+		ct->lastaccess = catcacheclock;
+
 		/*
 		 * If it's a positive entry, bump its refcount and return it. If it's
 		 * negative, we can report failure to the caller.
@@ -1425,6 +1449,61 @@ SearchCatCacheMiss(CatCache *cache,
 	return &ct->tuple;
 }
+/*
+ * CatCacheCleanupOldEntries - Remove infrequently-used entries
+ *
+ * Catcache entries happen to be left unused for a long time for several
+ * reasons. Remove such entries to prevent catcache from bloating. It is based
+ * on the similar algorithm with buffer eviction. Entries that are accessed
+ * several times in a certain period live longer than those that have had less
+ * access in the same duration.
+ */
+static bool
+CatCacheCleanupOldEntries(CatCache *cp)
+{
+	int			nremoved = 0;
+	int			i;
+	long		oldest_ts = catcacheclock;
+	uint64		prune_threshold = catcacheclock - prune_min_age_us;
+
+	/* Scan over the whole hash to find entries to remove */
+	for (i = 0 ; i < cp->cc_nbuckets ; i++)
+	{
+		dlist_mutable_iter iter;
+
+		dlist_foreach_modify(iter, &cp->cc_bucket[i])
+		{
+			CatCTup    *ct = dlist_container(CatCTup, cache_elem, iter.cur);
+
+			/* Don't remove referenced entries */
+			if (ct->refcount == 0 &&
+				(ct->c_list == NULL || ct->c_list->refcount == 0))
+			{
+				if (ct->lastaccess < prune_threshold)
+				{
+					CatCacheRemoveCTup(cp, ct);
+					nremoved++;
+
+					/* don't let the removed entry update oldest_ts */
+					continue;
+				}
+			}
+
+			/* update the oldest timestamp if the entry remains alive */
+			if (ct->lastaccess < oldest_ts)
+				oldest_ts = ct->lastaccess;
+		}
+	}
+
+	cp->cc_oldest_ts = oldest_ts;
+
+	if (nremoved > 0)
+		elog(DEBUG1, "pruning catalog cache id=%d for %s: removed %d / %d",
+			 cp->id, cp->cc_relname, nremoved, cp->cc_ntup + nremoved);
+
+	return nremoved > 0;
+}
+
 /*
  * ReleaseCatCache
  *
@@ -1888,6 +1967,7 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
 	ct->dead = false;
 	ct->negative = negative;
 	ct->hash_value = hashValue;
+	ct->lastaccess =
Re: Protect syscache from bloating with negative cache entries
Ah. It was obvious from the first. Sorry for the sloppy diagnosis.

At Fri, 20 Nov 2020 16:08:40 +0900 (JST), Kyotaro Horiguchi wrote in
> At Thu, 19 Nov 2020 15:23:05 +0900 (JST), Kyotaro Horiguchi wrote in
> > At Wed, 18 Nov 2020 21:42:02 -0800, Andres Freund wrote in
> > > Hi,
> > >
> > > On 2020-11-19 14:25:36 +0900, Kyotaro Horiguchi wrote:
> > > > # Creation, searching and expiration
> > > > master     :  6393.23 (100.0)
> > > > patched-off:  6527.94 (102.1)
> > > > patched-on : 15880.01 (248.4)
> > >
> > > What's the deal with this massive increase here?

catalog_cache_prune_min_age was set to 0 at the time, so almost all catcache entries are dropped at rehashing time. Most of the difference should be the time to search on the system catalog.

2020-11-20 16:25:25.988 LOG: database system is ready to accept connections
2020-11-20 16:26:48.504 LOG: Catcache reset
2020-11-20 16:26:48.504 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 257: 0.001500 ms
2020-11-20 16:26:48.504 LOG: rehashed catalog cache id 58 for pg_statistic; 257 tups, 256 buckets, 0.020748 ms
2020-11-20 16:26:48.505 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 513: 0.003221 ms
2020-11-20 16:26:48.505 LOG: rehashed catalog cache id 58 for pg_statistic; 513 tups, 512 buckets, 0.006962 ms
2020-11-20 16:26:48.505 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 1025: 0.006744 ms
2020-11-20 16:26:48.505 LOG: rehashed catalog cache id 58 for pg_statistic; 1025 tups, 1024 buckets, 0.009580 ms
2020-11-20 16:26:48.507 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 2049: 0.015683 ms
2020-11-20 16:26:48.507 LOG: rehashed catalog cache id 58 for pg_statistic; 2049 tups, 2048 buckets, 0.041008 ms
2020-11-20 16:26:48.509 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 4097: 0.042438 ms
2020-11-20 16:26:48.509 LOG: rehashed catalog cache id 58 for pg_statistic; 4097 tups, 4096 buckets, 0.077379 ms
2020-11-20 16:26:48.515 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 8193: 0.123798 ms
2020-11-20 16:26:48.515 LOG: rehashed catalog cache id 58 for pg_statistic; 8193 tups, 8192 buckets, 0.198505 ms
2020-11-20 16:26:48.525 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 16385: 0.180831 ms
2020-11-20 16:26:48.526 LOG: rehashed catalog cache id 58 for pg_statistic; 16385 tups, 16384 buckets, 0.361109 ms
2020-11-20 16:26:48.546 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 32769: 0.717899 ms
2020-11-20 16:26:48.547 LOG: rehashed catalog cache id 58 for pg_statistic; 32769 tups, 32768 buckets, 1.443587 ms
2020-11-20 16:26:48.588 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 65537: 1.204804 ms
2020-11-20 16:26:48.591 LOG: rehashed catalog cache id 58 for pg_statistic; 65537 tups, 65536 buckets, 3.069916 ms
2020-11-20 16:26:48.674 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 131073: 2.707709 ms
2020-11-20 16:26:48.681 LOG: rehashed catalog cache id 58 for pg_statistic; 131073 tups, 131072 buckets, 7.127622 ms
2020-11-20 16:26:48.848 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 262145: 5.895630 ms
2020-11-20 16:26:48.862 LOG: rehashed catalog cache id 58 for pg_statistic; 262145 tups, 262144 buckets, 13.433610 ms
2020-11-20 16:26:49.195 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 524289: 12.302632 ms
2020-11-20 16:26:49.223 LOG: rehashed catalog cache id 58 for pg_statistic; 524289 tups, 524288 buckets, 27.710900 ms
2020-11-20 16:26:49.937 LOG: pruning catalog cache id=58 for pg_statistic: removed 1001000 / 1048577: 66.062629 ms
2020-11-20 16:26:51.195 LOG: pruning catalog cache id=58 for pg_statistic: removed 1002001 / 1048577: 65.533468 ms
2020-11-20 16:26:52.413 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 1048577: 25.623740 ms
2020-11-20 16:26:52.468 LOG: rehashed catalog cache id 58 for pg_statistic; 1048577 tups, 1048576 buckets, 54.314825 ms
2020-11-20 16:26:53.898 LOG: pruning catalog cache id=58 for pg_statistic: removed 2000999 / 2097153: 134.530582 ms
2020-11-20 16:26:56.404 LOG: pruning catalog cache id=58 for pg_statistic: removed 1002001 / 2097153: 111.634597 ms
2020-11-20 16:26:57.779 LOG: pruning catalog cache id=58 for pg_statistic: removed 2000999 / 2097153: 134.628430 ms
2020-11-20 16:27:00.389 LOG: pruning catalog cache id=58 for pg_statistic: removed 1002001 / 2097153: 147.221688 ms
2020-11-20 16:27:01.851 LOG: pruning catalog cache id=58 for pg_statistic: removed 2000999 / 2097153: 177.610820 ms

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Protect syscache from bloating with negative cache entries
At Thu, 19 Nov 2020 15:23:05 +0900 (JST), Kyotaro Horiguchi wrote in
> At Wed, 18 Nov 2020 21:42:02 -0800, Andres Freund wrote in
> > Hi,
> >
> > On 2020-11-19 14:25:36 +0900, Kyotaro Horiguchi wrote:
> > > # Creation, searching and expiration
> > > master     :  6393.23 (100.0)
> > > patched-off:  6527.94 (102.1)
> > > patched-on : 15880.01 (248.4)
> >
> > What's the deal with this massive increase here?
>
> CatCacheRemoveCTup(). If I replaced a call to the function in the
> cleanup function with dlist_delete(), the result changes as:
>
> master      : 6372.04 (100.0) (2)
> patched-off : 6464.97 (101.5) (2)
> patched-on  : 5354.42 ( 84.0) (2)
>
> We could boost the expiration if we reuse the "deleted" entry at the
> next entry creation.

That result should be bogus. It forgot to update cc_ntup..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Protect syscache from bloating with negative cache entries
At Wed, 18 Nov 2020 21:42:02 -0800, Andres Freund wrote in
> Hi,
>
> On 2020-11-19 14:25:36 +0900, Kyotaro Horiguchi wrote:
> > # Creation, searching and expiration
> > master     :  6393.23 (100.0)
> > patched-off:  6527.94 (102.1)
> > patched-on : 15880.01 (248.4)
>
> What's the deal with this massive increase here?

CatCacheRemoveCTup(). If I replaced a call to the function in the cleanup function with dlist_delete(), the result changes as:

master      : 6372.04 (100.0) (2)
patched-off : 6464.97 (101.5) (2)
patched-on  : 5354.42 ( 84.0) (2)

We could boost the expiration if we reuse the "deleted" entry at the next entry creation.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Protect syscache from bloating with negative cache entries
Hi,

On 2020-11-19 14:25:36 +0900, Kyotaro Horiguchi wrote:
> # Creation, searching and expiration
> master     :  6393.23 (100.0)
> patched-off:  6527.94 (102.1)
> patched-on : 15880.01 (248.4)

What's the deal with this massive increase here?

Greetings,

Andres Freund
Re: Protect syscache from bloating with negative cache entries
Thank you for the comments.

At Tue, 17 Nov 2020 16:22:54 -0500, Robert Haas wrote in
> On Tue, Nov 17, 2020 at 10:46 AM Heikki Linnakangas wrote:
> > 0.7% degradation is probably acceptable.
>
> I haven't looked at this patch in a while and I'm pleased with the way
> it seems to have been redesigned. It seems relatively simple and
> unlikely to cause big headaches. I would say that 0.7% is probably not
> acceptable on a general workload, but it seems fine on a benchmark

Sorry for the confusing notation: "-0.7% degradation" meant a +0.7% *gain*, which I think is measurement error. However, the next patch makes the catcache apparently *faster*, so the difference doesn't matter much..

> that is specifically designed to be a worst-case for this patch, which
> I gather is what's happening here. I think it would be nice if we
> could enable this feature by default. Does it cause a measurable
> regression on realistic workloads when enabled? I bet a default of 5
> or 10 minutes would help many users.
>
> One idea for improving things might be to move the "return
> immediately" tests in CatCacheCleanupOldEntries() to the caller, and
> only call this function if they indicate that there is some purpose.
> This would avoid the function call overhead when nothing can be done.
> Perhaps the two tests could be combined into one and simplified. Like,
> suppose the code looks (roughly) like this:
>
> if (catcacheclock >= time_at_which_we_can_prune)
>     CatCacheCleanupOldEntries(...);

The compiler removes the call (or inlines the function), but of course we can write it that way, and it shows the condition for calling the function better. The codelet above, however, ignores the return value of CatCacheCleanupOldEntries() itself: the function returns false when all "old" entries have been invalidated or explicitly removed, and we need to expand the hash in that case.
> To make it that simple, we want catcacheclock and
> time_at_which_we_can_prune to be stored as bare uint64 quantities so
> we don't need TimestampDifference(). And we want
> time_at_which_we_can_prune to be set to PG_UINT64_MAX when the feature
> is disabled. But those both seem like pretty achievable things... and
> it seems like the result would probably be faster than what you have
> now.

The time_at_which_we_can_prune is not global but catcache-local, and needs to change at the time catalog_cache_prune_min_age is changed. So the next version does the following:

-	if (CatCacheCleanupOldEntries(cp))
+	if (catcacheclock - cp->cc_oldest_ts > prune_min_age_us &&
+		CatCacheCleanupOldEntries(cp))

On the other hand, CatCacheCleanupOldEntries() can calculate the time_at_which_we_can_prune once at the beginning of the function. That makes the condition in the loop simpler:

-	TimestampDifference(ct->lastaccess, catcacheclock, &age, &us);
-
-	if (age > catalog_cache_prune_min_age)
+	if (ct->lastaccess < prune_threshold)
 	{

> + * per-statement basis and additionaly udpated periodically
>
> two words spelled wrong

Ugh. Fixed. Checked all spellings and found another misspelling.

> +void
> +assign_catalog_cache_prune_min_age(int newval, void *extra)
> +{
> +	catalog_cache_prune_min_age = newval;
> +}
>
> hmm, do we need this?

*That* assignment is actually useless, but the function is kept, and now it maintains the internal version of the GUC parameter (uint64 prune_min_age_us).

> +			/*
> +			 * Entries that are not accessed after the last pruning
> +			 * are removed in that seconds, and their lives are
> +			 * prolonged according to how many times they are accessed
> +			 * up to three times of the duration. We don't try shrink
> +			 * buckets since pruning effectively caps catcache
> +			 * expansion in the long term.
> +			 */
> +			ct->naccess = Min(2, ct->naccess);
>
> The code doesn't match the comment, it seems, because the limit here
> is 2, not 3. I wonder if this does anything anyway.
> My intuition is
> that when a catcache entry gets accessed at all it's probably likely
> to get accessed a bunch of times. If there are any meaningful
> thresholds here I'd expect us to be trying to distinguish things like
> 1000+ accesses vs. 100-1000 vs. 10-100 vs. 1-10. Or maybe we don't
> need to distinguish at all and can just have a single mark bit rather
> than a counter.

Agreed. Since I don't see a clear criterion for the threshold of the counter, I removed naccess and the related lines.

I made the following changes in the attached:

1. Removed naccess and related lines.
2. Moved the precheck condition out of CatCacheCleanupOldEntries() to RehashCatCache().
3. Used uint64 direct comparison instead of TimestampDifference().
4. Removed the CatCTup.dead flag.

Performance measurement on the attached showed a better result for searching, but perhaps a worse one for cache entry creation. Each number is the mean of 10 runs.

# Catcache (negative) entry creation
            : time(ms) (% to master)
master      : 3965.61 (100.0)
Re: Protect syscache from bloating with negative cache entries
At Tue, 17 Nov 2020 17:46:25 +0200, Heikki Linnakangas wrote in
> On 09/11/2020 11:34, Kyotaro Horiguchi wrote:
> > At Fri, 6 Nov 2020 10:42:15 +0200, Heikki Linnakangas wrote in
> > > Do you need the "ntaccess == 2" test? You could always increment the
> > > counter, and in the code that uses ntaccess to decide what to evict,
> > > treat all values >= 2 the same.
> > >
> > > Need to handle integer overflow somehow. Or maybe not: integer
> > > overflow is so infrequent that even if a hot syscache entry gets
> > > evicted prematurely because its ntaccess count wrapped around to 0, it
> > > will happen so rarely that it won't make any difference in practice.
> >
> > That relaxing simplifies the code significantly, but a significant
> > degradation by about 5% still exists.
> >
> > (SearchCatCacheInternal())
> > +	ct->naccess++;
> >!+	ct->lastaccess = catcacheclock;
> >
> > If I removed the second line above, the degradation disappears (-0.7%).
>
> 0.7% degradation is probably acceptable.

Sorry for the confusion: "-0.7% degradation" meant "+0.7% gain".

> > However, I don't find the corresponding numbers in the output
> > of perf. The sum of the numbers for the removed instructions is (0.02
> > + 0.28 = 0.3%). I don't think the degradation as the whole doesn't
> > always reflect to the instruction level profiling, but I'm stuck here,
> > anyway.
>
> Hmm. Some kind of cache miss effect, perhaps? offsetof(CatCTup, tuple) is

Shouldn't it be seen in the perf result?

> exactly 64 bytes currently, so any fields that you add after 'tuple' will go
> on a different cache line. Maybe it would help if you just move the new fields
> before 'tuple'.
>
> Making CatCTup smaller might help. Some ideas/observations:
>
> - The 'ct_magic' field is only used for assertion checks. Could remove it.

Ok, removed.

> - 4 Datums (32 bytes) are allocated for the keys, even though most catcaches
>   have fewer key columns.
> - In the current syscaches, keys[2] and keys[3] are only used to store 32-bit
>   oids or some other smaller fields. Allocating a full 64-bit Datum for them
>   wastes memory.

It seems to be the last resort.

> - You could move the dead flag at the end of the struct or remove it
>   altogether, with the change I mentioned earlier to not keep dead items in
>   the buckets

This seems most promising, so I did this. One annoyance is that we need to know whether a catcache tuple is invalidated or not to judge whether to remove it. I used CatCTup.cache_elem.prev to signal that in the next version.

> - You could steal a few bits for dead/negative flags from some other field. Use
>   special values for tuple.t_len for them or something.

I stole the MSB of refcount as the negative flag, but the bit-masking operations seem to make the function slower: benchmark 2 gets slower by around +2% in total.

> With some of these tricks, you could shrink CatCTup so that the new lastaccess
> and naccess fields would fit in the same cacheline.
>
> That said, I think this is good enough performance-wise as it is. So if we
> want to improve performance in general, that can be a separate patch.

Removing CatCTup.dead increased the performance of catcache search significantly, but catcache entry creation gets slower for uncertain reasons..

(Continues to a reply to Robert's comment)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Protect syscache from bloating with negative cache entries
On Tue, Nov 17, 2020 at 10:46 AM Heikki Linnakangas wrote:
> 0.7% degradation is probably acceptable.

I haven't looked at this patch in a while and I'm pleased with the way it seems to have been redesigned. It seems relatively simple and unlikely to cause big headaches. I would say that 0.7% is probably not acceptable on a general workload, but it seems fine on a benchmark that is specifically designed to be a worst-case for this patch, which I gather is what's happening here. I think it would be nice if we could enable this feature by default. Does it cause a measurable regression on realistic workloads when enabled? I bet a default of 5 or 10 minutes would help many users.

One idea for improving things might be to move the "return immediately" tests in CatCacheCleanupOldEntries() to the caller, and only call this function if they indicate that there is some purpose. This would avoid the function call overhead when nothing can be done. Perhaps the two tests could be combined into one and simplified. Like, suppose the code looks (roughly) like this:

if (catcacheclock >= time_at_which_we_can_prune)
    CatCacheCleanupOldEntries(...);

To make it that simple, we want catcacheclock and time_at_which_we_can_prune to be stored as bare uint64 quantities so we don't need TimestampDifference(). And we want time_at_which_we_can_prune to be set to PG_UINT64_MAX when the feature is disabled. But those both seem like pretty achievable things... and it seems like the result would probably be faster than what you have now.

+ * per-statement basis and additionaly udpated periodically

two words spelled wrong

+void
+assign_catalog_cache_prune_min_age(int newval, void *extra)
+{
+	catalog_cache_prune_min_age = newval;
+}

hmm, do we need this?

+			/*
+			 * Entries that are not accessed after the last pruning
+			 * are removed in that seconds, and their lives are
+			 * prolonged according to how many times they are accessed
+			 * up to three times of the duration. We don't try shrink
+			 * buckets since pruning effectively caps catcache
+			 * expansion in the long term.
+			 */
+			ct->naccess = Min(2, ct->naccess);

The code doesn't match the comment, it seems, because the limit here is 2, not 3. I wonder if this does anything anyway. My intuition is that when a catcache entry gets accessed at all it's probably likely to get accessed a bunch of times. If there are any meaningful thresholds here I'd expect us to be trying to distinguish things like 1000+ accesses vs. 100-1000 vs. 10-100 vs. 1-10. Or maybe we don't need to distinguish at all and can just have a single mark bit rather than a counter.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Protect syscache from bloating with negative cache entries
On 09/11/2020 11:34, Kyotaro Horiguchi wrote:
> At Fri, 6 Nov 2020 10:42:15 +0200, Heikki Linnakangas wrote in
> > Do you need the "ntaccess == 2" test? You could always increment the
> > counter, and in the code that uses ntaccess to decide what to evict,
> > treat all values >= 2 the same.
> >
> > Need to handle integer overflow somehow. Or maybe not: integer
> > overflow is so infrequent that even if a hot syscache entry gets
> > evicted prematurely because its ntaccess count wrapped around to 0, it
> > will happen so rarely that it won't make any difference in practice.
>
> That relaxing simplifies the code significantly, but a significant
> degradation by about 5% still exists.
>
> (SearchCatCacheInternal())
> +	ct->naccess++;
>!+	ct->lastaccess = catcacheclock;
>
> If I removed the second line above, the degradation disappears (-0.7%).

0.7% degradation is probably acceptable.

> However, I don't find the corresponding numbers in the output
> of perf. The sum of the numbers for the removed instructions is (0.02
> + 0.28 = 0.3%). I don't think the degradation as the whole doesn't
> always reflect to the instruction level profiling, but I'm stuck here,
> anyway.

Hmm. Some kind of cache miss effect, perhaps? offsetof(CatCTup, tuple) is exactly 64 bytes currently, so any fields that you add after 'tuple' will go on a different cache line. Maybe it would help if you just move the new fields before 'tuple'.

Making CatCTup smaller might help. Some ideas/observations:

- The 'ct_magic' field is only used for assertion checks. Could remove it.
- 4 Datums (32 bytes) are allocated for the keys, even though most catcaches have fewer key columns.
- In the current syscaches, keys[2] and keys[3] are only used to store 32-bit oids or some other smaller fields. Allocating a full 64-bit Datum for them wastes memory.
- You could move the dead flag to the end of the struct, or remove it altogether with the change I mentioned earlier to not keep dead items in the buckets.
- You could steal a few bits for dead/negative flags from some other field. Use special values for tuple.t_len for them or something.

With some of these tricks, you could shrink CatCTup so that the new lastaccess and naccess fields would fit in the same cacheline.

That said, I think this is good enough performance-wise as it is. So if we want to improve performance in general, that can be a separate patch.

- Heikki
Re: Protect syscache from bloating with negative cache entries
At Fri, 6 Nov 2020 10:42:15 +0200, Heikki Linnakangas wrote in
> Do you need the "ntaccess == 2" test? You could always increment the
> counter, and in the code that uses ntaccess to decide what to evict,
> treat all values >= 2 the same.
>
> Need to handle integer overflow somehow. Or maybe not: integer
> overflow is so infrequent that even if a hot syscache entry gets
> evicted prematurely because its ntaccess count wrapped around to 0, it
> will happen so rarely that it won't make any difference in practice.

That relaxing simplifies the code significantly, but a significant degradation of about 5% still exists.

(SearchCatCacheInternal())
+	ct->naccess++;
!+	ct->lastaccess = catcacheclock;

If I remove the second line above, the degradation disappears (-0.7%). However, I don't find the corresponding numbers in the output of perf: the sum of the numbers for the removed instructions is only 0.02 + 0.28 = 0.3%. The overall degradation does not always show up in instruction-level profiling, but I'm stuck here, anyway.

% samples: master | p2 | patched    (p2 = patched minus "ct->lastaccess = catcacheclock")

 0.47 | 0.27 | 0.17 |        mov    %rbx,0x8(%rbp)
      |      |      |      SearchCatCacheInternal():
      |      |      |        ct->naccess++;
      |      |      |        ct->lastaccess = catcacheclock;
    - |    - | 0.02 | 10f:   mov    catcacheclock,%rax
      |      |      |        ct->naccess++;
    - | 0.96 | 1.00 |        addl   $0x1,0x14(%rbx)
      |      |      |        return NULL;
    - | 0.11 | 0.16 |        xor    %ebp,%ebp
      |      |      |        if (!ct->negative)
 0.27 | 0.30 | 0.03 |        cmpb   $0x0,0x21(%rbx)
      |      |      |        ct->lastaccess = catcacheclock;
    - |      | 0.28 |        mov    %rax,0x18(%rbx)
      |      |      |        if (!ct->negative)
 0.34 | 0.08 | 0.59 |        ↓ jne  149

For your information, the same table for a bit wider range follows.
% samples: master | p2 | patched    (p2 = patched minus "ct->lastaccess = catcacheclock")

      |      |      |      dlist_foreach(iter, bucket)
 6.91 | 7.06 | 5.89 |        mov    0x8(%rbp),%rbx
 0.78 | 0.73 | 0.81 |        test   %rbx,%rbx
      |      |      |        ↓ je   160
      |      |      |        cmp    %rbx,%rbp
 0.46 | 0.52 | 0.39 |        ↓ jne  9d
      |      |      |        ↓ jmpq 160
      |      |      |        nop
 5.68 | 5.54 | 6.03 | 90:    mov    0x8(%rbx),%rbx
 1.44 | 1.42 | 1.43 |        cmp    %rbx,%rbp
      |      |      |        ↓ je   160
      |      |      |      {
      |      |      |        ct = dlist_container(CatCTup, cache_elem, iter.cur);
      |      |      |
      |      |      |        if (ct->dead)
30.36 |30.97 |31.48 | 9d:    cmpb   $0x0,0x20(%rbx)
 2.63 | 2.60 | 2.69 |        ↑ jne  90
      |      |      |          continue; /* ignore dead entries */
      |      |      |
      |      |      |        if (ct->hash_value != hashValue)
 1.41 | 1.37 | 1.35 |        cmp    -0x24(%rbx),%edx
 3.19 | 2.97 | 2.87 |        ↑ jne  90
 7.17 | 5.53 | 6.89 |        mov    %r13,%rsi
 0.02 | 0.04 | 0.04 |        xor    %r12d,%r12d
 3.00 | 2.98 | 2.95 |        ↓ jmp  b5
 0.15 | 0.61 | 0.20 | b0:    mov    0x10(%rsp,%r12,1),%rsi
 6.58 | 5.04 | 5.95 | b5:    mov    %ecx,0xc(%rsp)
      |      |      |      CatalogCacheCompareTuple():
      |      |      |        if (!(cc_fastequal[i]) (cachekeys[i], searchkeys[i]))
 1.51 | 0.92 | 1.66 |        mov    -0x20(%rbx,%r12,1),%rdi
 0.54 | 1.64 | 0.58 |        mov    %edx,0x8(%rsp)
 3.78 | 3.11 | 3.86 |        → callq *0x38(%r14,%r12,1)
 0.43 | 2.30 | 0.34 |        mov    0x8(%rsp),%edx
 0.20 | 0.94 | 0.25 |        mov    0xc(%rsp),%ecx
 0.44 | 0.41 | 0.44 |        test   %al,%al
      |      |      |        ↑ je   90
      |      |      |        for (i = 0; i < nkeys; i++)
 2.28 | 1.07 | 2.26 |        add    $0x8,%r12
 0.08 | 0.23 | 0.07 |        cmp    $0x18,%r12
 0.11 | 0.64 | 0.10 |        ↑ jne  b0
      |      |      |      dlist_move_head():
      |      |      |       */
      |      |      |      static inline void
      |      |      |      dlist_move_head(dlist_head *head, dlist_node *node)
      |      |      |      {
      |      |      |        /* fast path if it's already at the head */
      |      |      |        if (head->head.next == node)
 0.08 | 0.61 | 0.04 |        cmp    0x8(%rbp),%rbx
 0.02 | 0.10 | 0.00 |        ↓ je   10f
      |      |      |          return;
      |      |      |
      |      |      |        dlist_delete(node);
 0.01 | 0.20 | 0.06 |        mov    0x8(%rbx),%rax
      |      |      |      dlist_delete():
      |      |      |        node->prev->next = node->next;
 0.75 | 0.13 | 0.72 |        mov    (%rbx),%rdx
 2.89 | 3.42 | 2.22 |        mov    %rax,0x8(%rdx)
      |      |      |        node->next->prev = node->prev;
 0.01 | 0.09 | 0.00 |        mov
Re: Protect syscache from bloating with negative cache entries
At Mon, 09 Nov 2020 11:13:31 +0900 (JST), Kyotaro Horiguchi wrote in > Now the branch for counter-increment is removed. For similar > branches for counter-decrement side in CatCacheCleanupOldEntries(), > Min() is compiled into cmovbe and a branch was removed. Mmm. Sorry, I sent this by a mistake. Please ignore it. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Protect syscache from bloating with negative cache entries
At Fri, 6 Nov 2020 10:42:15 +0200, Heikki Linnakangas wrote in
> On 06/11/2020 10:24, Kyotaro Horiguchi wrote:
> > Thank you for the comment!
> > First off, I thought that I managed to eliminate the degradation
> > observed on the previous versions, but significant degradation (1.1%
> > slower) is still seen in one case.
>
> One thing to keep in mind with micro-benchmarks like this is that even
> completely unrelated code changes can change the layout of the code in
> memory, which in turn can affect CPU caching effects in surprising
> ways. If you're lucky, you can see 1-5% differences just by adding a
> function that's never called, for example, if it happens to move other
> code in memory so that some hot codepath or struct gets split across
> CPU cache lines. It can be infuriating when benchmarking.

True. I sometimes had to make distclean to stabilize such benchmarks..

> > At Thu, 5 Nov 2020 11:09:09 +0200, Heikki Linnakangas wrote in
> >
> > (A) original test patch
> >
> > I naively thought that the code path is too short to bury the
> > degradation of an additional few instructions. Actually I measured
> > performance again with the same patch set on the current master and
> > had more or less the same result.
> >
> > master 8195.58 ms, patched 8817.40 ms: +10.75%
> >
> > However, I noticed that the additional call was a recursive call, and a
> > jmp inserted for the recursive call seems to take significant
> > time. After avoiding the recursive call, the difference reduced to
> > +0.96% (master 8268.71 ms : patched 8348.30 ms).
> >
> > Just two instructions below are inserted in this case, which looks
> > reasonable.
> >
> >   8720ff <+31>: cmpl  $0x,0x4ba942(%rip)   # 0xd2ca48
> >   872106 <+38>: jl    0x872240             (call to a function)
>
> That's interesting. I think a 1% degradation would be acceptable.
>
> I think we'd like to enable this feature by default though, so the
> performance when it's enabled is also very important.

> > (C) inserting bare counter-update code without a branch
> >
> > > Do we actually need a branch there? If I understand correctly, the
> > > point is to bump up a usage counter on the catcache entry. You could
> > > increment the counter unconditionally, even if the feature is not
> > > used, and avoid the branch that way.
> >
> > That change causes 4.9% degradation, which is worse than having a
> > branch.
> >
> > master 8364.54 ms, patched 8666.86 ms (+4.9%)
> >
> > The additional instructions follow.
> >
> > + 8721ab <+203>: mov    0x30(%rbx),%eax      # %eax = ct->naccess
> > + 8721ae <+206>: mov    $0x2,%edx
> > + 8721b3 <+211>: add    $0x1,%eax            # %eax++
> > + 8721b6 <+214>: cmove  %edx,%eax            # if %eax == 0 then %eax = 2
> > + 8721bf <+223>: mov    %eax,0x30(%rbx)      # ct->naccess = %eax
> > + 8721c2 <+226>: mov    0x4cfe9f(%rip),%rax  # 0xd42068
> > + 8721c9 <+233>: mov    %rax,0x38(%rbx)      # ct->lastaccess = %rax
>
> Do you need the "ntaccess == 2" test? You could always increment the
> counter, and in the code that uses ntaccess to decide what to evict,
> treat all values >= 2 the same.
>
> Need to handle integer overflow somehow. Or maybe not: integer
> overflow is so infrequent that even if a hot syscache entry gets
> evicted prematurely because its ntaccess count wrapped around to 0, it
> will happen so rarely that it won't make any difference in practice.

Agreed. I had prioritized completely avoiding degradation on the normal path, but relaxing that restriction to 1% or so makes the code far simpler and the expiration path significantly faster.

Now the branch for the counter increment is removed. For the similar branches on the counter-decrement side in CatCacheCleanupOldEntries(), Min() is compiled into cmovbe and a branch was removed.
Re: Protect syscache from bloating with negative cache entries
On 06/11/2020 10:24, Kyotaro Horiguchi wrote:
> Thank you for the comment!
> First off, I thought that I managed to eliminate the degradation
> observed on the previous versions, but significant degradation (1.1%
> slower) is still seen in one case.

One thing to keep in mind with micro-benchmarks like this is that even completely unrelated code changes can change the layout of the code in memory, which in turn can affect CPU caching effects in surprising ways. If you're lucky, you can see 1-5% differences just by adding a function that's never called, for example, if it happens to move other code in memory so that some hot codepath or struct gets split across CPU cache lines. It can be infuriating when benchmarking.

> At Thu, 5 Nov 2020 11:09:09 +0200, Heikki Linnakangas wrote in
>
> (A) original test patch
>
> I naively thought that the code path is too short to bury the
> degradation of an additional few instructions. Actually I measured
> performance again with the same patch set on the current master and
> had more or less the same result.
>
> master 8195.58 ms, patched 8817.40 ms: +10.75%
>
> However, I noticed that the additional call was a recursive call, and a
> jmp inserted for the recursive call seems to take significant
> time. After avoiding the recursive call, the difference reduced to
> +0.96% (master 8268.71 ms : patched 8348.30 ms).
>
> Just two instructions below are inserted in this case, which looks
> reasonable.
>
>   8720ff <+31>: cmpl  $0x,0x4ba942(%rip)  # 0xd2ca48
>   872106 <+38>: jl    0x872240            (call to a function)

That's interesting. I think a 1% degradation would be acceptable.

I think we'd like to enable this feature by default though, so the performance when it's enabled is also very important.

> (C) inserting bare counter-update code without a branch
>
> > Do we actually need a branch there? If I understand correctly, the
> > point is to bump up a usage counter on the catcache entry. You could
> > increment the counter unconditionally, even if the feature is not
> > used, and avoid the branch that way.
>
> That change causes 4.9% degradation, which is worse than having a
> branch.
>
> master 8364.54 ms, patched 8666.86 ms (+4.9%)
>
> The additional instructions follow.
>
> + 8721ab <+203>: mov    0x30(%rbx),%eax      # %eax = ct->naccess
> + 8721ae <+206>: mov    $0x2,%edx
> + 8721b3 <+211>: add    $0x1,%eax            # %eax++
> + 8721b6 <+214>: cmove  %edx,%eax            # if %eax == 0 then %eax = 2
> + 8721bf <+223>: mov    %eax,0x30(%rbx)      # ct->naccess = %eax
> + 8721c2 <+226>: mov    0x4cfe9f(%rip),%rax  # 0xd42068
> + 8721c9 <+233>: mov    %rax,0x38(%rbx)      # ct->lastaccess = %rax

Do you need the "ntaccess == 2" test? You could always increment the counter, and in the code that uses ntaccess to decide what to evict, treat all values >= 2 the same.

Need to handle integer overflow somehow. Or maybe not: integer overflow is so infrequent that even if a hot syscache entry gets evicted prematurely because its ntaccess count wrapped around to 0, it will happen so rarely that it won't make any difference in practice.

- Heikki
Re: Protect syscache from bloating with negative cache entries
me> First off, I thought that I managed to eliminate the degradation
me> observed on the previous versions, but significant degradation (1.1%
me> slower) is still seen in one case.

While benchmarking with many patterns, I noticed that calling
CatCacheCleanupOldEntries() slows down catcache search significantly
even if the function does almost nothing. Oddly enough, the
degradation got larger when I removed the counter-updating code from
SearchCatCacheInternal. It seems that RehashCatCache is called far
more frequently than I thought, and CatCacheCleanupOldEntries was
suffering the branch penalty. The degradation vanished with a likely()
attached to the condition. On the contrary, the patched version is now
constantly slightly faster than master.

For now, I measured the patch with the three access patterns the
catcachebench was designed for.

             master       patched-off         patched-on(300s)
  test 1   3898.18ms   3896.11ms (-0.1%)    3889.44ms (- 0.2%)
  test 2   8013.37ms   8098.51ms (+1.1%)    8640.63ms (+ 7.8%)
  test 3   6146.95ms   6147.91ms (+0.0%)   15466   ms (+152  %)

  master     : This patch is not applied.
  patched-off: This patch is applied and catalog_cache_prune_min_age = -1
  patched-on : This patch is applied and catalog_cache_prune_min_age = 0

  test 1: Creates many negative entries in STATRELATTINH
          (expiration doesn't happen)
  test 2: Repeatedly fetches several negative entries many times.
  test 3: test 1 with expiration happening.

The result looks far better, but test 2 still shows a small
degradation... I'll continue investigating it.

regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 9516267f0e2943cf955cbbfe5133c13c36288ee6 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Fri, 6 Nov 2020 17:27:18 +0900
Subject: [PATCH v4] CatCache expiration feature

---
 src/backend/access/transam/xact.c  |   3 +
 src/backend/utils/cache/catcache.c | 125 +
 src/backend/utils/misc/guc.c       |  12 +++
 src/include/utils/catcache.h       |  20 +
 4 files changed, 160 insertions(+)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index af6afcebb1..a246fcc4c0 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1086,6 +1086,9 @@ static void
 AtStart_Cache(void)
 {
 	AcceptInvalidationMessages();
+
+	if (xactStartTimestamp != 0)
+		SetCatCacheClock(xactStartTimestamp);
 }
 
 /*
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 3613ae5f44..f63224bfd5 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -38,6 +38,7 @@
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/syscache.h"
+#include "utils/timestamp.h"
 
 /* #define CACHEDEBUG */	/* turns DEBUG elogs on */
@@ -60,9 +61,18 @@
 #define CACHE_elog(...)
 #endif
 
+/*
+ * GUC variable to define the minimum age of entries that will be considered
+ * to be evicted in seconds. -1 to disable the feature.
+ */
+int			catalog_cache_prune_min_age = -1;
+
 /* Cache management header --- pointer is NULL until created */
 static CatCacheHeader *CacheHdr = NULL;
 
 /* Clock for the last accessed time of a catcache entry.
 */
+TimestampTz catcacheclock = 0;
+
 static inline HeapTuple SearchCatCacheInternal(CatCache *cache,
 					int nkeys,
 					Datum v1, Datum v2,
@@ -74,6 +84,7 @@ static pg_noinline HeapTuple SearchCatCacheMiss(CatCache *cache,
 					Index hashIndex,
 					Datum v1, Datum v2, Datum v3, Datum v4);
+static bool CatCacheCleanupOldEntries(CatCache *cp);
 static uint32 CatalogCacheComputeHashValue(CatCache *cache, int nkeys,
 					Datum v1, Datum v2, Datum v3, Datum v4);
@@ -99,6 +110,12 @@ static void CatCacheFreeKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 static void CatCacheCopyKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 					Datum *srckeys, Datum *dstkeys);
+/* GUC assign function */
+void
+assign_catalog_cache_prune_min_age(int newval, void *extra)
+{
+	catalog_cache_prune_min_age = newval;
+}
 
 /*
  * internal support functions
@@ -863,6 +880,10 @@ RehashCatCache(CatCache *cp)
 	int		newnbuckets;
 	int		i;
 
+	/* try removing old entries before expanding hash */
+	if (CatCacheCleanupOldEntries(cp))
+		return;
+
 	elog(DEBUG1, "rehashing catalog cache id %d for %s; %d tups, %d buckets",
 		 cp->id, cp->cc_relname, cp->cc_ntup, cp->cc_nbuckets);
@@ -1264,6 +1285,20 @@ SearchCatCacheInternal(CatCache *cache,
 	 */
 	dlist_move_head(bucket, &ct->cache_elem);
 
+	/*
+	 * Prolong the life of this entry.  Since we want to run as few
+	 * instructions as possible and want the branch to be predictable for
+	 * performance reasons, we don't put a strict cap on the counter.  All
+	 * numbers above 1 will be regarded as 2 in CatCacheCleanupOldEntries().
+	 */
+	if (unlikely(catalog_cache_prune_min_age >= 0))
+	{
+		ct->naccess++;
+		if (unlikely(ct->naccess
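The unlikely() hint in the hunk above (and the likely() fix mentioned earlier in the thread) is PostgreSQL's wrapper around GCC's __builtin_expect, defined in src/include/c.h. A self-contained sketch of the pattern, with a toy function standing in for the hot path:

```c
/* GCC-style branch-prediction hints, falling back to no-ops elsewhere. */
#if defined(__GNUC__)
#define likely(x)   __builtin_expect((x) != 0, 1)
#define unlikely(x) __builtin_expect((x) != 0, 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/*
 * Toy stand-in for the hot path: the hint tells the compiler the
 * pruning branch is expected to be rare, so the common path stays
 * fall-through and branch-predictor friendly.
 */
static int
bump_if_pruning_enabled(int prune_min_age, int counter)
{
    if (unlikely(prune_min_age >= 0))
        counter++;              /* rare bookkeeping path */
    return counter;
}
```

The hint changes only code layout and predicted direction, not semantics, so the function behaves identically with or without `__GNUC__`.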
Re: Protect syscache from bloating with negative cache entries
Thank you for the comment!

First off, I thought that I managed to eliminate the degradation
observed on the previous versions, but significant degradation (1.1%
slower) is still seen in one case. Anyway, before sending the new
patch, let me just answer the comments.

At Thu, 5 Nov 2020 11:09:09 +0200, Heikki Linnakangas wrote in
> On 19/11/2019 12:48, Kyotaro Horiguchi wrote:
> > 1. Inserting a branch in
> > SearchCatCacheInternal. (CatCache_Pattern_1.patch)
> >
> > This is the most straightforward way to add an alternative feature.
> >
> > pattern 1 | 8459.73 | 28.15  # 9% (>> 1%) slower than 7757.58
> > pattern 1 | 8504.83 | 55.61
> > pattern 1 | 8541.81 | 41.56
> > pattern 1 | 8552.20 | 27.99
> > master    | 7757.58 | 22.65
> > master    | 7801.32 | 20.64
> > master    | 7839.57 | 25.28
> > master    | 7925.30 | 38.84
> >
> > It's so slow that it cannot be used.
>
> This is very surprising. A branch that's never taken ought to be
> predicted by the CPU's branch-predictor, and be very cheap.

(A) original test patch

I naively thought that the code path is too short to bury the
degradation of a few additional instructions. Actually I measured
performance again with the same patch set on the current master and
had more or less the same result.

  master 8195.58ms, patched 8817.40ms: +10.75%

However, I noticed that the additional call was a recursive call, and
a jmp inserted for the recursive call seems to take significant time.
After avoiding the recursive call, the difference reduced to +0.96%
(master 8268.71ms : patched 8348.30ms).

Just the two instructions below are inserted in this case, which looks
reasonable.

  8720ff <+31>: cmpl $0x0,0x4ba942(%rip)   # 0xd2ca48
  872106 <+38>: jl   0x872240              # (call to a function)

(C) inserting bare counter-update code without a branch

> Do we actually need a branch there? If I understand correctly, the
> point is to bump up a usage counter on the catcache entry.
> You could increment the counter unconditionally, even if the feature
> is not used, and avoid the branch that way.

That change causes 4.9% degradation, which is worse than having a
branch.

  master 8364.54ms, patched 8666.86ms (+4.9%)

The additional instructions follow.

  + 8721ab <+203>: mov   0x30(%rbx),%eax      # %eax = ct->naccess
  + 8721ae <+206>: mov   $0x2,%edx
  + 8721b3 <+211>: add   $0x1,%eax            # %eax++
  + 8721b6 <+214>: cmove %edx,%eax            # if %eax == 0 then %eax = 2
  + 8721bf <+223>: mov   %eax,0x30(%rbx)      # ct->naccess = %eax
  + 8721c2 <+226>: mov   0x4cfe9f(%rip),%rax  # 0xd42068
  + 8721c9 <+233>: mov   %rax,0x38(%rbx)      # ct->lastaccess = %rax

(D) naively branching then updating, again.

Come to think of it, I measured the same with a branch again,
specifically (it showed significant degradation before, in my memory):

  dlist_move_head(bucket, &ct->cache_elem);
+ if (catalog_cache_prune_min_age < -1)  /* never true */
+ {
+     (counter update)
+ }

And I had effectively the same numbers from both master and patched.

  master 8066.93ms, patched 8052.37ms (-0.18%)

The above branching inserts the same two instructions as (B), just in
a different place, but the result differs, for a reason uncertain to
me.

  + 8721bb <+203>: cmpl $0x0,0x4bb886(%rip)  #
  + 8721c2 <+210>: jl   0x872208

I'm not sure why, but the patched version beats the master by a small
difference. Anyway this new result suggests that the compiler might
have got smarter than before?

(E) bumping up in ReleaseCatCache() (won't work)

> Another thought is to bump up the usage counter in ReleaseCatCache(),
> and only when the refcount reaches zero. That might be somewhat
> cheaper, if it's a common pattern to acquire additional leases on an
> entry that's already referenced.
>
> Yet another thought is to replace 'refcount' with an 'acquirecount'
> and 'releasecount'. In SearchCatCacheInternal(), increment
> acquirecount, and in ReleaseCatCache, increment releasecount. When
> they are equal, the entry is not in use.
> Now you have a counter that gets incremented on every access, with
> the same number of CPU instructions in the hot paths as we have
> today.

These don't work for negative caches, since the corresponding tuples
are never released.

(F) removing less-significant code.

> Or maybe there are some other ways we could micro-optimize
> SearchCatCacheInternal(), to buy back the slowdown that this feature

Yeah, I thought of that in the beginning. (I removed dlist_move_head()
at the time.) But the most difficult aspect of this approach is that I
cannot tell whether the modification would never cause degradation or
not.

> would add? For example, you could remove the "if (cl->dead)
> continue;" check, if dead entries were kept out of the hash buckets.
> Or maybe the catctup struct could be made slightly smaller somehow,
> so that it would fit more comfortably in a single cache line.

As a trial, I removed that code and added
Re: Protect syscache from bloating with negative cache entries
On 19/11/2019 12:48, Kyotaro Horiguchi wrote:
> 1. Inserting a branch in
> SearchCatCacheInternal. (CatCache_Pattern_1.patch)
>
> This is the most straightforward way to add an alternative feature.
>
> pattern 1 | 8459.73 | 28.15  # 9% (>> 1%) slower than 7757.58
> pattern 1 | 8504.83 | 55.61
> pattern 1 | 8541.81 | 41.56
> pattern 1 | 8552.20 | 27.99
> master    | 7757.58 | 22.65
> master    | 7801.32 | 20.64
> master    | 7839.57 | 25.28
> master    | 7925.30 | 38.84
>
> It's so slow that it cannot be used.

This is very surprising. A branch that's never taken ought to be
predicted by the CPU's branch-predictor, and be very cheap.

Do we actually need a branch there? If I understand correctly, the
point is to bump up a usage counter on the catcache entry. You could
increment the counter unconditionally, even if the feature is not
used, and avoid the branch that way.

Another thought is to bump up the usage counter in ReleaseCatCache(),
and only when the refcount reaches zero. That might be somewhat
cheaper, if it's a common pattern to acquire additional leases on an
entry that's already referenced.

Yet another thought is to replace 'refcount' with an 'acquirecount'
and 'releasecount'. In SearchCatCacheInternal(), increment
acquirecount, and in ReleaseCatCache, increment releasecount. When
they are equal, the entry is not in use. Now you have a counter that
gets incremented on every access, with the same number of CPU
instructions in the hot paths as we have today.

Or maybe there are some other ways we could micro-optimize
SearchCatCacheInternal(), to buy back the slowdown that this feature
would add? For example, you could remove the "if (cl->dead) continue;"
check, if dead entries were kept out of the hash buckets. Or maybe the
catctup struct could be made slightly smaller somehow, so that it
would fit more comfortably in a single cache line.

My point is that I don't think we want to complicate the code much for
this. All the indirection stuff seems over-engineered for this. Let's
find a way to keep it simple.
- Heikki
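The acquirecount/releasecount idea above can be sketched as follows. This is an illustration of the scheme with hypothetical names, not actual PostgreSQL code:

```c
/*
 * Hypothetical two-counter replacement for refcount: the entry is
 * unreferenced exactly when the two counters agree, and acquirecount
 * doubles as an access-frequency statistic at no extra cost in the
 * hot paths.
 */
typedef struct
{
    unsigned long acquirecount;   /* bumped in SearchCatCacheInternal() */
    unsigned long releasecount;   /* bumped in ReleaseCatCache() */
} catctup_sketch;

static void
cat_acquire(catctup_sketch *ct)
{
    ct->acquirecount++;
}

static void
cat_release(catctup_sketch *ct)
{
    ct->releasecount++;
}

/* true while any caller still holds a reference */
static int
cat_in_use(const catctup_sketch *ct)
{
    return ct->acquirecount != ct->releasecount;
}
```

As the thread notes, this scheme doesn't help negative entries, whose tuples are never acquired and released in the first place.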
Re: Protect syscache from bloating with negative cache entries
Hello.

The attached is the version that is compacted from the previous
version.

At Thu, 01 Oct 2020 16:47:18 +0900 (JST), Kyotaro Horiguchi wrote in
> This is the rebased version.

It suddenly occurred to me that constant parameters to inline
functions enable optimization. Up to the previous version I had split
SearchCatCacheInternal() into two almost-identical functions,
SearchCatCacheInternalb() and -e(), but this time I merged the two
functions back into the original function and added a parameter
"do_expire". Then I compared the two derived functions
SearchCatCache1b and SearchCatCache1e at the machine-code level and
confirmed that the expiration-related code is eliminated from the
former.

The lines prefixed by '+' are the instructions corresponding to the
following C code, which are eliminated in SearchCatCache1b.

  2770 :
  2770:   push  %r15
  2772:   push  %r14
  ...
  2849:   mov   %rbp,(%rbx)
  284c:   mov   %rbx,(%rax)
  284f:   mov   %rbx,0x8(%rbp)
+ 2853:   mov   0x30(%rbx),%eax   # %eax = ct->naccess
+ 2856:   mov   $0x2,%edx
+ 285b:   add   $0x1,%eax         # ct->naccess++
+ 285e:   cmove %edx,%eax         # if (ct->naccess == 0) %eax = 2
  2861:   xor   %ebp,%ebp
  2863:   cmpb  $0x0,0x15(%rbx)   # (if (!ct->negative))
+ 2867:   mov   %eax,0x30(%rbx)   # ct->naccess = %eax
+ 286a:   mov   0x0(%rip),%rax    # %rax = catcacheclock
+ 2871:   mov   %rax,0x38(%rbx)   # ct->lastaccess = %rax
  2875:   jne   289a
  2877:   mov   0x0(%rip),%rdi

> if (do_expire)
> {
>     ct->naccess++;
>     if (unlikely(ct->naccess == 0))
>         ct->naccess = 2;
>     ct->lastaccess = catcacheclock;
> }

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 5ac54752a8c4f555b3ccc405c65e74614057bf34 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Fri, 10 Jan 2020 15:02:26 +0900
Subject: [PATCH v3 1/3] base_change

Adds struct members needed by the catcache expiration feature and a
GUC variable that controls the behavior of the feature. No substantial
code is added yet. If the mere existence of these variables causes
degradation, benchmarking after this patch will show it.
---
 src/backend/access/transam/xact.c  |  3 +++
 src/backend/utils/cache/catcache.c | 15 +++
 src/backend/utils/misc/guc.c       | 13 +
 src/include/utils/catcache.h       | 20
 4 files changed, 51 insertions(+)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index af6afcebb1..1040713955 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1085,6 +1085,9 @@ ForceSyncCommit(void)
 static void
 AtStart_Cache(void)
 {
+	if (xactStartTimestamp != 0)
+		SetCatCacheClock(xactStartTimestamp);
+
 	AcceptInvalidationMessages();
 }
 
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 3613ae5f44..c5bad63aac 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -60,9 +60,18 @@
 #define CACHE_elog(...)
 #endif
 
+/*
+ * GUC variable to define the minimum age of entries that will be considered
+ * to be evicted in seconds. -1 to disable the feature.
+ */
+int			catalog_cache_prune_min_age = 300;
+
 /* Cache management header --- pointer is NULL until created */
 static CatCacheHeader *CacheHdr = NULL;
 
 /* Clock for the last accessed time of a catcache entry.
 */
+TimestampTz catcacheclock = 0;
+
 static inline HeapTuple SearchCatCacheInternal(CatCache *cache,
 					int nkeys,
 					Datum v1, Datum v2,
@@ -99,6 +108,12 @@ static void CatCacheFreeKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 static void CatCacheCopyKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 					Datum *srckeys, Datum *dstkeys);
+/* GUC assign function */
+void
+assign_catalog_cache_prune_min_age(int newval, void *extra)
+{
+	catalog_cache_prune_min_age = newval;
+}
 
 /*
  * internal support functions
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a62d64eaa4..628a95376c 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -88,6 +88,8 @@
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
+#include "utils/catcache.h"
+#include "utils/guc_tables.h"
 #include "utils/float.h"
 #include "utils/guc_tables.h"
 #include "utils/memutils.h"
@@ -2358,6 +2360,17 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"catalog_cache_prune_min_age", PGC_USERSET, RESOURCES_MEM,
+			gettext_noop("System catalog cache entries that live unused for longer than this seconds are considered for removal."),
+			gettext_noop("The value of -1 turns off pruning."),
+
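The do_expire specialization described at the top of this message relies on constant propagation into an inline function: each wrapper passes a compile-time constant, so the compiler can eliminate the dead branch from the non-expiring variant, as the disassembly shows for SearchCatCache1b. A toy sketch with hypothetical names:

```c
/* Stand-in for the per-entry bookkeeping the expiring variant does. */
static unsigned long naccess_bumps;

/*
 * One shared body, two entry points.  Because do_expire is a constant
 * at each call site, the optimizer drops the if-block entirely from
 * the non-expiring wrapper.
 */
static inline int
search_internal(int key, int do_expire)
{
    if (do_expire)
        naccess_bumps++;        /* expiration bookkeeping */
    return key * 2;             /* stand-in for the actual hash lookup */
}

static int
search_plain(int key)
{
    return search_internal(key, 0);
}

static int
search_expire(int key)
{
    return search_internal(key, 1);
}
```

Both wrappers return the same lookup result; only the expiring one pays for the bookkeeping, which is the whole point of the merge-plus-parameter refactoring.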
Re: Protect syscache from bloating with negative cache entries
At Thu, 1 Oct 2020 13:37:29 +0900, Michael Paquier wrote in
> On Wed, Jan 22, 2020 at 02:38:19PM +0900, Kyotaro Horiguchi wrote:
> > I changed my mind and attached the benchmark patch as a .txt file,
> > expecting the checkers not to pick it up as a part of the patchset.
> >
> > I have been in precise-performance-measurement mode for a long
> > time, but I think it's settled. I'd like to return to normal mode
> > and explain this patch.
>
> Looking at the CF bot, this patch set does not compile properly.
> Could you look at that?

It is complaining that TimestampDifference is implicitly declared. I'm
not sure of the exact cause, but maybe some refactoring of header file
inclusions caused that.

This is the rebased version. Thanks!

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 07ce4cf303a365c3f3efc3f133551a67e4646875 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Fri, 10 Jan 2020 15:02:26 +0900
Subject: [PATCH v2 1/3] base_change

Adds struct members needed by the catcache expiration feature and a
GUC variable that controls the behavior of the feature. No substantial
code is added yet. If the mere existence of these variables causes
degradation, benchmarking after this patch will show it.
---
 src/backend/access/transam/xact.c  |  3 +++
 src/backend/utils/cache/catcache.c | 15 +++
 src/backend/utils/misc/guc.c       | 13 +
 src/include/utils/catcache.h       | 20
 4 files changed, 51 insertions(+)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index af6afcebb1..1040713955 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1085,6 +1085,9 @@ ForceSyncCommit(void)
 static void
 AtStart_Cache(void)
 {
+	if (xactStartTimestamp != 0)
+		SetCatCacheClock(xactStartTimestamp);
+
 	AcceptInvalidationMessages();
 }
 
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 3613ae5f44..c5bad63aac 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -60,9 +60,18 @@
 #define CACHE_elog(...)
 #endif
 
+/*
+ * GUC variable to define the minimum age of entries that will be considered
+ * to be evicted in seconds. -1 to disable the feature.
+ */
+int			catalog_cache_prune_min_age = 300;
+
 /* Cache management header --- pointer is NULL until created */
 static CatCacheHeader *CacheHdr = NULL;
 
 /* Clock for the last accessed time of a catcache entry.
 */
+TimestampTz catcacheclock = 0;
+
 static inline HeapTuple SearchCatCacheInternal(CatCache *cache,
 					int nkeys,
 					Datum v1, Datum v2,
@@ -99,6 +108,12 @@ static void CatCacheFreeKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 static void CatCacheCopyKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 					Datum *srckeys, Datum *dstkeys);
+/* GUC assign function */
+void
+assign_catalog_cache_prune_min_age(int newval, void *extra)
+{
+	catalog_cache_prune_min_age = newval;
+}
 
 /*
  * internal support functions
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 596bcb7b84..e39c756f80 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -88,6 +88,8 @@
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
+#include "utils/catcache.h"
+#include "utils/guc_tables.h"
 #include "utils/float.h"
 #include "utils/guc_tables.h"
 #include "utils/memutils.h"
@@ -2358,6 +2360,17 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"catalog_cache_prune_min_age", PGC_USERSET, RESOURCES_MEM,
+			gettext_noop("System catalog cache entries that live unused for longer than this seconds are considered for removal."),
+			gettext_noop("The value of -1 turns off pruning."),
+			GUC_UNIT_S
+		},
+		&catalog_cache_prune_min_age,
+		300, -1, INT_MAX,
+		NULL, assign_catalog_cache_prune_min_age, NULL
+	},
+
 	/*
 	 * We use the hopefully-safely-small value of 100kB as the compiled-in
 	 * default for max_stack_depth.
InitializeGUCOptions will increase it if

diff --git a/src/include/utils/catcache.h b/src/include/utils/catcache.h
index f4aa316604..3d3870f05a 100644
--- a/src/include/utils/catcache.h
+++ b/src/include/utils/catcache.h
@@ -22,6 +22,7 @@
 #include "access/htup.h"
 #include "access/skey.h"
+#include "datatype/timestamp.h"
 #include "lib/ilist.h"
 #include "utils/relcache.h"
@@ -61,6 +62,7 @@ typedef struct catcache
 	slist_node	cc_next;		/* list link */
 	ScanKeyData	cc_skey[CATCACHE_MAXKEYS];	/* precomputed key info for heap
 											 * scans */
+	TimestampTz	cc_oldest_ts;	/* timestamp of the oldest tuple in the hash */
 
 	/*
 	 * Keep these at the end, so that compiling catcache.c with CATCACHE_STATS
@@ -119,6 +121,8 @@ typedef struct catctup
 	bool		dead;			/* dead but not yet removed? */
 	bool		negative;		/* negative cache entry? */
 	HeapTupleData tuple;		/* tuple management header */
+	unsigned int naccess;		/* # of access to this
Re: Protect syscache from bloating with negative cache entries
On Wed, Jan 22, 2020 at 02:38:19PM +0900, Kyotaro Horiguchi wrote:
> I changed my mind and attached the benchmark patch as a .txt file,
> expecting the checkers not to pick it up as a part of the patchset.
>
> I have been in precise-performance-measurement mode for a long time,
> but I think it's settled. I'd like to return to normal mode and
> explain this patch.

Looking at the CF bot, this patch set does not compile properly.
Could you look at that?
--
Michael
Re: Protect syscache from bloating with negative cache entries
At Tue, 21 Jan 2020 17:29:47 +0100, Tomas Vondra wrote in
> I see this patch is stuck in WoA since 2019/12/01, although there's a
> new patch version from 2020/01/14. But the patch seems to no longer
> apply, at least according to https://commitfest.cputube.org :-( So at
> this point the status is actually correct.
>
> Not sure about the appveyor build (it seems to be about
> jsonb_set_lax), but on travis it fails like this:
>
> catcache.c:820:1: error: no previous prototype for
> ‘CatalogCacheFlushCatalog2’ [-Werror=missing-prototypes]

I changed my mind and attached the benchmark patch as a .txt file,
expecting the checkers not to pick it up as a part of the patchset.

I have been in precise-performance-measurement mode for a long time,
but I think it's settled. I'd like to return to normal mode and
explain this patch.

=== Motive of the patch

The system cache is a mechanism that accelerates access to the system
catalogs. Basically, entries in the cache are removed via the
invalidation mechanism when the corresponding system catalog entry is
removed. On the other hand, the system cache also holds "negative"
entries that indicate that an object is nonexistent, which accelerates
the response for nonexistent objects. But a negative cache entry has
no chance of removal.

On a long-lived session that accepts a wide variety of queries on many
objects, the system cache holds entries for many objects that are
accessed only once or a few times. Suppose every object is accessed
once per, say, 30 minutes, and the queries don't need to run in a very
short time. Such cache entries are almost useless but occupy a large
amount of memory.

=== Possible solutions

Many caching systems have an expiration mechanism, which removes
"useless" entries to keep the size under a certain limit. The limit is
typically defined by memory usage or expiration time, in a hard or
soft way.
Since we don't implement detailed accounting of memory usage by the
cache for performance reasons, we can use coarse memory accounting or
expiration time. This patch uses expiration time because it can be
determined on a rather clearer basis.

=== Pruning timing

The next point is when to prune cache entries. Apparently it's not
reasonable to do it at every cache access, since pruning takes far
longer than a cache access. The system cache is implemented on a hash.
When there's no room for a new cache entry, the hash doubles in size
and rehashes all entries. If pruning makes some space for the new
entry, rehashing can be avoided, so this patch tries pruning just
before enlarging the hash table.

A system cache could also be shrunk if less than half of its size is
used, but this patch doesn't do that. This is because we cannot
predict whether a system cache that has just shrunk is going to be
enlarged right after, and I don't want to make this patch that
complex.

=== Performance

The pruning mechanism adds several members to the cache entry and
updates them. The system cache is very light-weight machinery, so
inserting even one branch affects performance visibly. So in this
patch, the new stuff is isolated from the existing code path using an
indirect call. After trials on some call-points that could become
indirect calls, I found that SearchCatCache[1-4]() is the only point
that doesn't affect performance. (Please see upthread for details.)
That configuration also allows future implementations of system
caches, such as shared system caches.

The alternative SearchCatCache[1-4] functions get a bit slower because
they maintain an access timestamp and an access counter. In addition,
pruning adds a certain amount of time even if no entries are pruned
off.

=== Pruning criteria

At the pruning time described above, every entry is examined against
the GUC variable catalog_cache_prune_min_age. The pruning mechanism
involves a clock-sweep-like mechanism where an entry lives longer if
it has been accessed.
An entry whose access counter is zero is pruned after
catalog_cache_prune_min_age. Otherwise the entry survives the pruning
round and its counter is decremented. All timestamps used by this
machinery come from "catcacheclock", which is updated at every
transaction start.

=== Concise test

The attached test1.pl can be used to replay the syscache bloat caused
by negative entries. Setting $prune_age to -1 turns pruning off, and
you can see that the backend takes more and more memory without limit
as time proceeds. Setting it to 10 or so, the memory size of the
backend process stops growing at a certain amount.

=== The patch

The following attached files are the patch. They have been separated
for benchmarking reasons, but that also seems to make the patch easier
to read, so I leave it as is. I forgot its correct version through a
long time of benchmarking, so I restart from v1 now.

- v1-0001-base_change.patch
  Adds new members to existing structs and catcacheclock-related code.

- v1-0002-Make-CatCacheSearchN-indirect-functions.patch
  Changes SearchCatCacheN functions to be called by indirect
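The pruning criteria above reduce to a simple per-entry decision at sweep time. A hypothetical miniature, not the patch's actual CatCacheCleanupOldEntries():

```c
/* seconds; -1 disables the feature (mirrors the GUC's meaning) */
static int catalog_cache_prune_min_age = 300;

/* Hypothetical miniature of a catcache entry for this sketch. */
typedef struct
{
    unsigned int naccess;       /* accesses since the last sweep */
    long         lastaccess;    /* stand-in for a TimestampTz */
} entry_sketch;

/*
 * One clock-sweep-like pruning decision: a cold entry (never
 * re-accessed) that is older than the threshold is evicted; a warm
 * entry survives the round but its counter decays, so it will be
 * evicted in a later round unless it is accessed again.
 * Returns 1 if the entry should be removed.
 */
static int
sweep_entry(entry_sketch *e, long now)
{
    long age = now - e->lastaccess;

    if (catalog_cache_prune_min_age < 0 || age < catalog_cache_prune_min_age)
        return 0;               /* feature disabled, or entry too young */
    if (e->naccess == 0)
        return 1;               /* cold and old: evict */
    e->naccess--;               /* warm: survive, but decay */
    return 0;
}
```

Because the search path caps nothing and the sweep treats every counter value above 1 alike, the decay converges in at most two rounds for any entry that stops being accessed.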
Re: Protect syscache from bloating with negative cache entries
On Tue, Jan 21, 2020 at 02:17:53PM -0500, Tom Lane wrote:
> Alvaro Herrera writes:
>> Hmm ... travis is running -Werror? That seems overly strict. I think
>> we shouldn't punt a patch because of that.
>
> Why not? We're not going to allow pushing a patch that throws
> warnings on common compilers. Or if that does happen, some committer
> is going to have to spend time cleaning it up. Better to clean it up
> sooner.
>
> (There is, btw, at least one buildfarm animal using -Werror.)

I agree that it is good to have in Mr Robot. More early detection
means less follow-up cleanup.
--
Michael
Re: Protect syscache from bloating with negative cache entries
Hello.

At Tue, 21 Jan 2020 14:17:53 -0500, Tom Lane wrote in
> Alvaro Herrera writes:
> > On 2020-Jan-21, Tomas Vondra wrote:
> >> Not sure about the appveyor build (it seems to be about
> >> jsonb_set_lax),
>
> FWIW, I think I fixed jsonb_set_lax yesterday, so that problem should
> be gone the next time the cfbot tries this.
>
> >> but on travis it fails like this:
> >> catcache.c:820:1: error: no previous prototype for
> >> ‘CatalogCacheFlushCatalog2’ [-Werror=missing-prototypes]
>
> > Hmm ... travis is running -Werror? That seems overly strict. I
> > think we shouldn't punt a patch because of that.
>
> Why not? We're not going to allow pushing a patch that throws
> warnings on common compilers. Or if that does happen, some committer
> is going to have to spend time cleaning it up. Better to clean it up
> sooner.
>
> (There is, btw, at least one buildfarm animal using -Werror.)

Mmm. The cause of the error is a tentative (or crude, or brute)
benchmarking function provided as an extension, which is not actually
a part of the patch and was included for the reviewers' convenience.
However, I don't want to make it work on the Windows build. If that is
regarded as a reason for being punted, I'll repost a new version
without the benchmark soon.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Protect syscache from bloating with negative cache entries
Alvaro Herrera writes:
> On 2020-Jan-21, Tomas Vondra wrote:
>> Not sure about the appveyor build (it seems to be about
>> jsonb_set_lax),

FWIW, I think I fixed jsonb_set_lax yesterday, so that problem should
be gone the next time the cfbot tries this.

>> but on travis it fails like this:
>> catcache.c:820:1: error: no previous prototype for
>> ‘CatalogCacheFlushCatalog2’ [-Werror=missing-prototypes]

> Hmm ... travis is running -Werror? That seems overly strict. I think
> we shouldn't punt a patch because of that.

Why not? We're not going to allow pushing a patch that throws warnings
on common compilers. Or if that does happen, some committer is going
to have to spend time cleaning it up. Better to clean it up sooner.

(There is, btw, at least one buildfarm animal using -Werror.)

			regards, tom lane
Re: Protect syscache from bloating with negative cache entries
On 2020-Jan-21, Tomas Vondra wrote:
> Not sure about the appveyor build (it seems to be about
> jsonb_set_lax), but on travis it fails like this:
>
> catcache.c:820:1: error: no previous prototype for
> ‘CatalogCacheFlushCatalog2’ [-Werror=missing-prototypes]

Hmm ... travis is running -Werror? That seems overly strict. I think
we shouldn't punt a patch because of that.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Protect syscache from bloating with negative cache entries
Hello Kyotaro-san,

I see this patch is stuck in WoA since 2019/12/01, although there's a
new patch version from 2020/01/14. But the patch seems to no longer
apply, at least according to https://commitfest.cputube.org :-( So at
this point the status is actually correct.

Not sure about the appveyor build (it seems to be about
jsonb_set_lax), but on travis it fails like this:

catcache.c:820:1: error: no previous prototype for
‘CatalogCacheFlushCatalog2’ [-Werror=missing-prototypes]

so I'll leave it in WoA for now.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Protect syscache from bloating with negative cache entries
This is a new, complete, workable patch after a long time of
struggling with benchmarking.

At Tue, 19 Nov 2019 19:48:10 +0900 (JST), Kyotaro Horiguchi wrote in
> I ran the run2.sh script attached, which runs catcachebench2(), which
> asks SearchSysCache3() for cached entries (almost) 24 times per run.
> The number on each output line is the mean of 3 runs, with stddev.
> Lines are in "time" order and edited to fit here. "gen_tbl.pl | psql"
> creates a database for the benchmark. catcachebench2() runs the
> shortest path of the three in the attached benchmark program.
>
> (pg_ctl start)
> $ perl gen_tbl.pl | psql
> ...
> (pg_ctl stop)

I wonder why I took the average of the times instead of choosing the
fastest one. This benchmark is extremely CPU intensive, so the fastest
run reliably represents the performance. I changed the benchmark so
that it shows the time of the fastest run (run4.sh).

Based on the latest result, I used pattern 3 (SearchSyscacheN
indirection, but wrongly referred to as 1 in the last mail) in the
latest version. I took the fastest time among 3 iterations of 5 runs
of both master/patched O2 binaries.

  version | min
  --------+---------
  master  | 7986.65
  patched | 7984.47    = 'indirect' below

I would say this version doesn't get degraded by indirect calls.

So, I applied the other parts of the catcache expiration patch as the
succeeding patches. After that I got a somewhat strange but very
stable result: just adding struct members accelerates the benchmark.
The numbers are the fastest time of 20 runs of the benchmark in 10
iterations.

                  ms
  master      7980.79  # the master with the benchmark extension (0001)
= base        7340.96  # add only struct members and a GUC variable (0002)
  indirect    7998.68  # call SearchCatCacheN indirectly (0003)
= expire-off  7422.30  # CatCache expiration (0004)
                       # (catalog_cache_prune_min_age = -1)
  expire-on   7861.13  # CatCache expiration (catalog_cache_prune_min_age = 0)

The patch accelerates CatCacheSearch for uncertain reasons.
I'm not sure what makes the difference between about 8000 ms and about 7400 ms, though. Building all versions several times and then running the benchmark gave me results with the same tendency. I'll stop this work at this point and continue later. The following files are attached.

0001-catcache-benchmark-extension.patch: benchmark extension used for the benchmarking here. The test tables are generated using gentbl2.pl attached. (perl gentbl2.pl | psql)
0002-base_change.patch: Preliminarily adds some struct members and a GUC variable to see if they cause any degradation.
0003-Make-CatCacheSearchN-indirect-functions.patch: Rewrite to make the CatCacheSearchN functions be called indirectly.
0004-CatCache-expiration-feature.patch: Add the CatCache expiration feature.
gentbl2.pl: A script that emits SQL statements to generate test tables.
run4.sh: The test script I used for the benchmarking here.
build2.sh: A script I used to build the four types of binaries used here.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From dacf4a2ac9eb49099e744ee24066b94e9f78aa61 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu, 14 Nov 2019 19:24:36 +0900
Subject: [PATCH 1/4] catcache benchmark extension

Provides the function catcachebench(bench_no int), which runs a CPU-intensive benchmark on catcache search. The test table is created by a separately provided script.

catcachebench(0): prewarms the catcache with the provided test tables.
catcachebench(1): fetches all attribute stats of all tables. This benchmark loads a vast number of unique entries. Expiration doesn't work since it runs in a transaction.
catcachebench(2): fetches all attribute stats of one table many times. This benchmark repeatedly accesses already-loaded entries. Expiration doesn't work since it runs in a transaction.
catcachebench(3): fetches all attribute stats of all tables four times.
Different from the other modes, this runs expiration by forcibly updating the reference clock variable every 1000 entries. At this point, the variables needed for the expiration feature are not yet added, so SetCatCacheClock is a dummy macro that just expands to its parameter.
---
 contrib/catcachebench/Makefile               |  17 +
 contrib/catcachebench/catcachebench--0.0.sql |  14 +
 contrib/catcachebench/catcachebench.c        | 330 +++
 contrib/catcachebench/catcachebench.control  |   6 +
 src/backend/utils/cache/catcache.c           |  35 ++
 src/backend/utils/cache/syscache.c           |   2 +-
 src/include/utils/catcache.h                 |   3 +
 7 files changed, 406 insertions(+), 1 deletion(-)
 create mode 100644 contrib/catcachebench/Makefile
 create mode 100644 contrib/catcachebench/catcachebench--0.0.sql
 create mode 100644 contrib/catcachebench/catcachebench.c
 create mode 100644 contrib/catcachebench/catcachebench.control
Re: Protect syscache from bloating with negative cache entries
On Tue, Nov 19, 2019 at 07:48:10PM +0900, Kyotaro Horiguchi wrote:
> I'd like to throw in food for discussion on how much SearchSysCacheN
> suffers degradation from some choices on how we can insert code into
> the SearchSysCacheN code path.

Please note that the patch has a warning, causing cfbot-san to complain:

catcache.c:786:1: error: no previous prototype for ‘CatalogCacheFlushCatalog2’ [-Werror=missing-prototypes]
 CatalogCacheFlushCatalog2(Oid catId)
 ^
cc1: all warnings being treated as errors

So this should at least be fixed. For now I have moved it to the next CF, waiting on author.
--
Michael
Re: Protect syscache from bloating with negative cache entries
I'd like to throw in food for discussion on how much SearchSysCacheN suffers degradation from some choices on how we can insert code into the SearchSysCacheN code path.

I ran the run2.sh script attached, which runs catcachebench2(), which asks SearchSysCache3() for cached entries (almost) 24 times per run. The number on each output line is the mean of 3 runs, with stddev. Lines are in "time" order and edited to fit here. "gen_tbl.pl | psql" creates a database for the benchmark. catcachebench2() runs the shortest path of the three in the attached benchmark program.

(pg_ctl start)
$ perl gen_tbl.pl | psql
...
(pg_ctl stop)

0. Baseline (0001-benchmark.patch, 0002-Base-change.patch)

At first, I made two binaries from literally the same source. For the benchmark's sake the source is already modified a bit; specifically it has SetCatCacheClock needed by the benchmark, but it is not actually called in this benchmark.

            | time(ms) | stddev(ms)
not patched |  7750.42 |  23.83   # 0.6% faster than 7775.23
not patched |  7864.73 |  43.21
not patched |  7866.80 | 106.47
not patched |  7952.06 |  63.14
master      |  7775.23 |  35.76
master      |  7870.42 | 120.31
master      |  7876.76 | 109.04
master      |  7963.04 |   9.49

So it seems to me that we cannot tell anything about differences below about 80 ms (about 1%) now.

1. Inserting a branch in SearchCatCacheInternal. (CatCache_Pattern_1.patch)

This is the most straightforward way to add an alternative feature.

pattern 1 | 8459.73 | 28.15   # 9% (>> 1%) slower than 7757.58
pattern 1 | 8504.83 | 55.61
pattern 1 | 8541.81 | 41.56
pattern 1 | 8552.20 | 27.99
master    | 7757.58 | 22.65
master    | 7801.32 | 20.64
master    | 7839.57 | 25.28
master    | 7925.30 | 38.84

It's so slow that it cannot be used.

2. Making SearchCatCacheInternal an indirect function. (CatCache_Pattern_2.patch)

Next, I made the workhorse routine be called indirectly.
The "inline" on the function actually lets the compiler optimize the SearchCatCacheN routines as described in the comment, but the effect doesn't seem so large, at least for this case.

pattern 2 | 7976.22 | 46.12   # 2.6% slower > 1%
pattern 2 | 8103.03 | 51.57
pattern 2 | 8144.97 | 68.46
pattern 2 | 8353.10 | 34.89
master    | 7768.40 | 56.00
master    | 7772.02 | 29.05
master    | 7775.05 | 27.69
master    | 7830.82 | 13.78

3. Making SearchCatCacheN indirect functions. (CatCache_Pattern_3.patch)

As far as gcc/linux/x86 goes, SearchSysCacheN is compiled into the following instructions:

  0x00866c20 <+0>:  movslq %edi,%rdi
  0x00866c23 <+3>:  mov    0xd3da40(,%rdi,8),%rdi
  0x00866c2b <+11>: jmpq   0x856ee0

If we make SearchCatCacheN indirect functions as in the patch, just one instruction changes:

  0x00866c50 <+0>:  movslq %edi,%rdi
  0x00866c53 <+3>:  mov    0xd3da60(,%rdi,8),%rdi
  0x00866c5b <+11>: jmpq   *0x4c0caf(%rip)   # 0xd27910

pattern 3 | 7836.26 |  48.66   # 2% slower > 1%
pattern 3 | 7963.74 |  67.88
pattern 3 | 7966.65 | 101.07
pattern 3 | 8214.57 |  71.93
master    | 7679.74 |  62.20
master    | 7756.14 |  77.19
master    | 7867.14 |  73.33
master    | 7893.97 |  47.67

I expected this to run in almost the same time. I'm not sure if it is the result of the spectre_v2 mitigation, but I show the status of my environment as follows.

# uname -r
4.18.0-80.11.2.el8_0.x86_64
# cat /proc/cpuinfo
...
model name : Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
stepping   : 12
microcode  : 0xae
bugs       : spectre_v1 spectre_v2 spec_store_bypass mds
# cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
Mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: disabled, RSB filling

I am using CentOS 8 and I haven't found a handy (or on-the-fly) way to disable them..
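The indirection being benchmarked above (patterns 2 and 3) boils down to routing the hot call through a function pointer, so an alternative implementation can be swapped in without adding a branch to every call; the cost is the indirect jump itself. A minimal standalone sketch, with hypothetical names rather than the real SearchCatCacheN functions:

```c
#include <assert.h>

/* Two interchangeable implementations of a lookup routine. */
static int
lookup_plain(int key)
{
	return key * 2;
}

static int
lookup_with_expiration(int key)
{
	/* ... would also do expiration bookkeeping here ... */
	return key * 2;
}

/*
 * The hot path always jumps through this pointer.  Enabling the feature
 * swaps the pointer once, instead of inserting a branch into every call;
 * what remains on the hot path is the indirect jump measured above.
 */
static int (*lookup_impl)(int) = lookup_plain;

void
enable_expiration(void)
{
	lookup_impl = lookup_with_expiration;
}

int
do_lookup(int key)
{
	return lookup_impl(key);
}
```

As the retpoline discussion above suggests, the price of that indirect jump depends heavily on the CPU's branch-predictor mitigations, which is why the same pattern can measure differently across machines.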
Attached are:

0001-benchmark.patch          : catcache benchmark extension (and core-side fix)
0002-Base-change.patch        : baseline change in this series of benchmarks
CatCache_Pattern_1.patch      : naive branching
CatCache_Pattern_2.patch      : indirect SearchCatCacheInternal
CatCache_Pattern_3.patch      : indirect SearchCatCacheN

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 245e88e1b43df74273fbaa1b22f4f64621ffe9d5 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu, 14 Nov 2019 19:24:36 +0900
Subject: [PATCH 1/2] benchmark

---
 contrib/catcachebench/Makefile               |  17 +
 contrib/catcachebench/catcachebench--0.0.sql |  14 +
 contrib/catcachebench/catcachebench.c        | 330 +++
 contrib/catcachebench/catcachebench.control  |   6 +
 src/backend/utils/cache/catcache.c           |  33 ++
 src/backend/utils/cache/syscache.c           |   2 +-
 6 files changed, 401 insertions(+), 1 deletion(-)
 create mode 100644 contrib/catcachebench/Makefile
 create mode 100644 contrib/catcachebench/catcachebench--0.0.sql
 create mode 100644 contrib/catcachebench/catcachebench.c
 create mode 100644
Re: Protect syscache from bloating with negative cache entries
Hello,

my_gripe> But, still fluctuates by around 5%..
my_gripe>
my_gripe> If this level of degradation is still not acceptable, that
my_gripe> means that nothing can be inserted in the code path and the new
my_gripe> code path should be isolated from existing code by using indirect
my_gripe> calls.

Finally, after some struggling, I think I have managed to measure the impact on performance precisely and reliably. Starting every build from "make distclean", then removing everything in $TARGET before installation, makes things stable enough. (I don't think that's good, but I didn't investigate the cause..)

I measured time/call by directly calling SearchSysCache3() many times. It showed that the patch causes around 0.1 microseconds of degradation per call. (The function overall took about 6.9 microseconds on average.)

Next, I counted how many times SearchSysCache is called during planning with, as an instance, a query on a partitioned table having 3000 columns and 1000 partitions:

  explain analyze select sum(c) from test.p;

The planner made 6020608 syscache calls while planning and the overall planning time was 8641 ms. (Exec time was 48 ms.) 6020608 times 0.1 us is 602 ms of degradation, so roughly -7% degradation in planning time by estimation. The degradation was caused by really just two successive instructions, "ADD/conditional MOVE (CMOVE)". This leads to the conclusion that the existing code path, as is, doesn't have room for any additional code.

So I sought room for at least one branch, and found it (on gcc 7.3.1/CentOS7/x64). Interestingly, de-inlining SearchCatCacheInternal gave me a performance gain of about 3%. Further inlining of CatalogCacheComputeHashValue() gave another gain of about 3%. I could add a branch in SearchCatCacheInternal within that gain. I also tried indirect calls, but the degradation overwhelmed the gain, so I chose branching rather than indirect calls. I didn't investigate why that happens.

The following is the result.
The binaries are built with the same configuration using -O2.

binary means
  master      : master HEAD.
  patched_off : patched, but pruning disabled (catalog_cache_prune_min_age = -1).
  patched_on  : patched with pruning enabled. ("300s" for 1, "1s" for 2, "0" for 3)

bench:
  1: corresponds to catcachebench(1); fetching STATRELATTINH 3000 * 1000 times,
     generating new cache entries. (Massive cache creation.)
     Pruning doesn't happen while running this.
  2: catcachebench(2); 6 times of cache access on 1000 STATRELATTINH entries.
     (Frequent cache reference.) Pruning doesn't happen while running this.
  3: catcachebench(3); fetching 1000 (tbls) * 3000 (cols) STATRELATTINH
     entries. The catcache clock advances at intervals of 100 (tbls) * 3000
     (cols) accesses, and pruning happens. While running catcachebench(3)
     once, pruning happens 28 times; most of the time 202202 entries are
     removed and the total number of entries was limited to 524289. (The
     system table has 3000 * 1001 = 3003000 tuples.)

iter: number of iterations. Time (ms) and stddev are calculated over the iterations.

binary      | bench | iter |  time ms | stddev
------------+-------+------+----------+--------
master      |   1   |  10  |  8150.30 |  12.96
master      |   2   |  10  |  4002.88 |  16.18
master      |   3   |  10  |  9065.06 |  11.46
------------+-------+------+----------+--------
patched_off |   1   |  10  |  8090.95 |   9.95
patched_off |   2   |  10  |  3984.67 |  12.33
patched_off |   3   |  10  |  9050.46 |   4.64
------------+-------+------+----------+--------
patched_on  |   1   |  10  |  8158.95 |   6.29
patched_on  |   2   |  10  |  4023.72 |  10.41
patched_on  |   3   |  10  | 16532.66 |  18.39

patched_off is slightly faster than master. patched_on is generally a bit slower. Even though patched_on/3 seems to take too long, the extra time comes from increased catalog table access in exchange for memory saving. (That is, it is expected behavior.) I ran it several times and most of them showed the same tendency.

As a side effect, once the branch is added, the shared syscache in a neighbour thread will be able to be inserted together without impact on the existing code path.
===
The benchmark script is used as follows:

- Create many (3000, as an example) tables in the "test" schema. I created a partitioned table with 3000 children.
- The tables have many columns, 1000 for me.
- Run the following commands.

=# select catcachebench(0);                    -- warm up systables
=# set catalog_cache_prune_min_age = any;      -- as required
=# select catcachebench(n);                    -- 3 >= n >= 1, the number of "bench" above

The above result is taken with the following query.

=# select 'patched_on', '3', count(a), avg(a)::numeric(10,2), stddev(a)::numeric(10,2)
   from (select catcachebench(3) from generate_series(1, 10)) as a(a);

The attached patches are:
RE: Protect syscache from bloating with negative cache entries
>From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
>Does this result show that the hard-limit size option with memory accounting
>doesn't harm usual users who disable the hard-limit size option?

Hi, I've implemented relation cache size limitation with an LRU list and built-in memory context size accounting. I'll share some results coupled with a quick recap of the catcache work so that we can resume discussion if needed. Relation cache bloat was also discussed in this thread, but right now it's pending and the catcache feature is not fixed; still, a variety of information could be useful, I believe.

Regarding the catcache, it seems to me that Horiguchi-san's recent posts show pretty detailed stats, including a comparison of the LRU overhead and a full scan of the hash table. According to the results, the LRU overhead seems small, but for simplicity this thread goes without LRU.
https://www.postgresql.org/message-id/20190404.215255.09756748.horiguchi.kyotaro%40lab.ntt.co.jp

When the catcache had a hard limit, there was built-in memory context size accounting machinery. I checked the overhead of the memory accounting: when repeating palloc and pfree of an 800-byte area many times it was 4% slower, while with 32768 bytes there seems to be no overhead.
https://www.postgresql.org/message-id/4E72940DA2BF16479384A86D54D0988A6F44564E%40G01JPEXMBKW04

Regarding the relcache hard limit (relation_cache_max_size), most of the architecture is similar to the catcache one, with an LRU list, except for the memory accounting. Relcaches are managed by an LRU list. To prune the LRU cache, we need to know the overall relcache size, including objects pointed to by a relcache such as 'index info'. So in this patch relcache objects are allocated under RelCacheMemoryContext, which is a child of CacheMemoryContext, and objects pointed to by a relcache are allocated under a child context of RelCacheMemoryContext. In the built-in size accounting, if a memory context is set to collect the "group (family) size", you can get the context size including its children easily.
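The "group (family) size" accounting described above can be modeled in miniature: each context tracks the bytes charged to it, and the group size sums a context together with all of its descendants. This is an illustrative toy (struct and function names are made up), not the actual MemoryContext machinery:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 8

/* Toy context with per-context size accounting and parent/child grouping. */
typedef struct Ctx
{
	size_t		self_size;		/* bytes charged to this context */
	int			nchildren;
	struct Ctx *children[MAX_CHILDREN];
} Ctx;

/* Charge an allocation to a context (a pfree would subtract instead). */
void
ctx_charge(Ctx *c, size_t nbytes)
{
	c->self_size += nbytes;
}

/*
 * "Group (family) size": a context plus everything below it.  This is the
 * number a pruning policy would compare against a configured limit such as
 * relation_cache_max_size.
 */
size_t
ctx_group_size(const Ctx *c)
{
	size_t		total = c->self_size;

	for (int i = 0; i < c->nchildren; i++)
		total += ctx_group_size(c->children[i]);
	return total;
}
```

Keeping the running total in the allocator (as the built-in accounting does) trades a small constant cost per palloc/pfree for an O(1) size query, versus the recursive walk sketched here.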
I ran two experiments:

A) One is pgbench using Tomas's script posted a while ago, which randomly selects 1 from many tables.
https://www.postgresql.org/message-id/4E72940DA2BF16479384A86D54D0988A6F426207%40G01JPEXMBKW04

B) The other checks the memory context accounting overhead using the same method as before.
https://www.postgresql.org/message-id/4E72940DA2BF16479384A86D54D0988A6F44564E%40G01JPEXMBKW04

A) randomly select 1 from many tables

Results are the average of 5 runs each.

number of tables               | 100         | 1000        | 1
-------------------------------+-------------+-------------+-------------
TPS (master)                   | 11105       | 10815       | 8915
TPS (patch; limit feature off) | 11254 (+1%) | 11176 (+3%) | 9242 (+4%)
TPS (patch; limit on with 1MB) | 11317 (+2%) | 10491 (-3%) | 7380 (-17%)

The results are noisy, but it seems the overhead of the LRU list and memory accounting is small when the relcache limit feature is turned off. With the limit feature turned on, it drops 17% after exceeding the limit, which is no surprise.

B) Repeat palloc/pfree

"With group accounting" means accounting for the test context and its child context with the built-in accounting, using palloc_bench_family(); the other uses palloc_bench(). Please see palloc_bench.gz.

[Size=32768, iter=1,000,000]
Master                    | 59.97 ms
Master with group account | 59.57 ms
patched                   | 67.23 ms
patched with family       | 68.81 ms

The overhead seems large in this patch, so this area needs more inspection.

regards,
Takeshi Ideriha

Attachments:
  palloc_bench.gz
  v15-0001-4-Add-dlist_move_tail.patch
  v15-0002-4-ideriha-Memory-consumption-report-reature-of-memorycontext.patch
  v15-0003-4-ideriha-Remove-CatCache-Entries.patch
  v15-0004-4-ideriha-Remove-RelCache-Entries.patch
Re: Protect syscache from bloating with negative cache entries
At Fri, 05 Apr 2019 09:44:07 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20190405.094407.151644324.horiguchi.kyot...@lab.ntt.co.jp>
> By the way, I found the reason of the wrong result of the
> previous benchmark. The test 3_0/1 needs to update catcacheclock
> midst of the loop. I'm going to fix it and rerun it.

I found the cause. CatalogCacheFlushCatalog() doesn't shrink the hash, so no resize happens once it is bloated. I needed another version of the function that resets cc_bucket to the initial size. Using the new debug function, I got better numbers.

I focused on the performance when the feature is disabled. I rechecked it by adding the patch part by part and identified several causes of the degradation. I did:

- Moved SetCatCacheClock() to AtStart_Cache().
- Maybe improved the caller site of CatCacheCleanupOldEntries().

As the result:

binary | test | count |   avg   | stddev | ratio
-------+------+-------+---------+--------+--------
master |  1_1 |     5 | 7104.90 |   4.40 |
master |  2_1 |     5 | 3759.26 |   4.20 |
master |  3_1 |     5 | 7954.05 |   2.15 |
-------+------+-------+---------+--------+--------
Full   |  1_1 |     5 | 7237.20 |   7.98 | 101.87
Full   |  2_1 |     5 | 4050.98 |   8.42 | 107.76
Full   |  3_1 |     5 | 8192.87 |   3.28 | 103.00

But it still fluctuates by around 5%..

If this level of degradation is still not acceptable, that means that nothing can be inserted in the code path, and the new code path should be isolated from existing code by using indirect calls.

regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index bd5024ef00..a9414c0c07 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1067,6 +1067,7 @@ static void
 AtStart_Cache(void)
 {
 	AcceptInvalidationMessages();
+	SetCatCacheClock(GetCurrentStatementStartTimestamp());
 }
 
 /*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1d4f..4d849aeb4c 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -71,6 +71,7 @@
 #include "tcop/pquery.h"
 #include "tcop/tcopprot.h"
 #include "tcop/utility.h"
+#include "utils/catcache.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index d05930bc4c..91814f7210 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -60,9 +60,18 @@
 #define CACHE_elog(...)
 #endif
 
+/*
+ * GUC variable to define the minimum age of entries that will be considered
+ * to be evicted in seconds. -1 to disable the feature.
+ */
+int			catalog_cache_prune_min_age = 0;
+
 /* Cache management header --- pointer is NULL until created */
 static CatCacheHeader *CacheHdr = NULL;
 
+/* Clock for the last accessed time of a catcache entry. */
+TimestampTz catcacheclock = 0;
+
 static inline HeapTuple SearchCatCacheInternal(CatCache *cache,
 					   int nkeys,
 					   Datum v1, Datum v2,
@@ -850,9 +859,83 @@ InitCatCache(int id,
 	 */
 	MemoryContextSwitchTo(oldcxt);
 
+	/* initialize catcache reference clock if haven't done yet */
+	if (catcacheclock == 0)
+		catcacheclock = GetCurrentTimestamp();
+
 	return cp;
 }
 
+/*
+ * CatCacheCleanupOldEntries - Remove infrequently-used entries
+ *
+ * Catcache entries happen to be left unused for a long time for several
+ * reasons. Remove such entries to prevent catcache from bloating.
+ * It is based on the similar algorithm with buffer eviction. Entries that
+ * are accessed several times in a certain period live longer than those
+ * that have had less access in the same duration.
+ */
+static bool
+CatCacheCleanupOldEntries(CatCache *cp)
+{
+	int			nremoved = 0;
+	int			i;
+
+	/* Return immediately if disabled */
+	if (catalog_cache_prune_min_age == 0)
+		return false;
+
+	/* Scan over the whole hash to find entries to remove */
+	for (i = 0; i < cp->cc_nbuckets; i++)
+	{
+		dlist_mutable_iter iter;
+
+		dlist_foreach_modify(iter, &cp->cc_bucket[i])
+		{
+			CatCTup    *ct = dlist_container(CatCTup, cache_elem, iter.cur);
+			long		entry_age;
+			int			us;
+
+			/* Don't remove referenced entries */
+			if (ct->refcount != 0 ||
+				(ct->c_list && ct->c_list->refcount != 0))
+				continue;
+
+			/*
+			 * Calculate the duration from the time of the last access to
+			 * the "current" time. catcacheclock is updated on a
+			 * per-statement basis and additionally updated periodically
+			 * during a long running query.
+			 */
+			TimestampDifference(ct->lastaccess, catcacheclock,
+								&entry_age, &us);
+
+			if (entry_age < catalog_cache_prune_min_age)
+				continue;
+
+			/*
+			 * Entries that are not accessed after the last pruning are
+			 * removed in that seconds, and their lives are prolonged
+			 * according to how many times they are accessed up to three
+			 * times the duration. We don't try shrink
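Stripped of the catcache specifics, the pruning pass in the patch above amounts to a sweep that evicts unreferenced entries whose last access is older than the configured minimum age. A standalone sketch with illustrative names (not the actual CatCTup/dlist structures):

```c
#include <assert.h>

/* A catcache entry reduced to what the pruning decision needs. */
typedef struct Entry
{
	long		lastaccess;		/* "clock" value at the last access */
	int			refcount;		/* referenced entries are never pruned */
	int			live;			/* still present in the cache? */
} Entry;

/*
 * Sweep the entries and evict unreferenced ones whose age relative to the
 * coarse clock meets or exceeds min_age.  Returns the number of evictions,
 * mirroring the nremoved counter in the patch.
 */
int
prune_old_entries(Entry *entries, int n, long clock, long min_age)
{
	int			removed = 0;

	for (int i = 0; i < n; i++)
	{
		if (!entries[i].live || entries[i].refcount != 0)
			continue;			/* already gone, or in use */
		if (clock - entries[i].lastaccess < min_age)
			continue;			/* accessed recently enough */
		entries[i].live = 0;	/* evict */
		removed++;
	}
	return removed;
}
```

The essential property, as in the patch, is that the sweep never touches referenced entries and compares ages against a coarse clock that is updated outside the hot lookup path.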
Re: Protect syscache from bloating with negative cache entries
Thank you for the comment.

At Thu, 4 Apr 2019 15:44:35 -0400, Robert Haas wrote in
> On Thu, Apr 4, 2019 at 8:53 AM Kyotaro HORIGUCHI wrote:
> > So it seems to me that the simplest "Full" version wins. The
> > attached is the rebased version. dlist_move_head(entry) is removed as
> > mentioned above in that patch.
>
> 1. I really don't think this patch has any business changing the
> existing logic. You can't just assume that the dlist_move_head()
> operation is unimportant for performance.

Ok, it doesn't show a significant performance gain, so I removed that.

> 2. This patch still seems to add a new LRU list that has to be
> maintained. That's fairly puzzling. You seem to have concluded that
> the version without the additional LRU wins, but then sent a new copy
> of the version with the LRU version.

Sorry, I attached the wrong one. The attached is the right one, which doesn't add the new dlist.

> 3. I don't think adding an additional call to GetCurrentTimestamp() in
> start_xact_command() is likely to be acceptable. There has got to be
> a way to set this up so that the maximum number of new
> GetCurrentTimestamp() calls is limited to once per N seconds, vs. the
> current implementation that could do it many many many times per
> second.

GetCurrentTimestamp() is called only once, very early in the backend's life, in InitPostgres, not in start_xact_command. What I did in that function is just copy stmtStartTimestamp, not call GetCurrentTimestamp().

> 4. The code in CatalogCacheCreateEntry seems clearly unacceptable. In
> a pathological case where CatCacheCleanupOldEntries removes exactly
> one element per cycle, it could be called on every new catcache
> allocation.

It could be a problem if just one entry were created in a duration longer than catalog_cache_prune_min_age plus the resize interval, or if all candidate entries except one were actually in use at the pruning moment. Is that realistic?

> I think we need to punt this patch to next release.
> We're not converging on anything committable very fast.

Yeah, maybe right. This patch has had several months of silence several times, got comments and was modified to take them in for more than two cycles, and finally got a death sentence (not literally; actually a postponement) very close to the end of this third cycle. I anticipate the same will continue in the next cycle.

By the way, I found the reason for the wrong result of the previous benchmark. The tests 3_0/3_1 need to update catcacheclock in the middle of the loop. I'm going to fix it and rerun.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 596d6b018e1b7ddd5828298bfaba3ee405eb2604 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Fri, 1 Mar 2019 13:32:51 +0900
Subject: [PATCH] Remove entries that haven't been used for a certain time

Catcache entries happen to be left alone for several reasons. It is not
desirable that such useless entries eat up memory. The catcache pruning
feature removes entries that haven't been accessed for a certain time
before enlarging the hash array.
---
 doc/src/sgml/config.sgml                      |  19 +
 src/backend/tcop/postgres.c                   |   2 +
 src/backend/utils/cache/catcache.c            | 103 +-
 src/backend/utils/misc/guc.c                  |  12 +++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/utils/catcache.h                  |  16
 6 files changed, 150 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bc1d0f7bfa..819b252029 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1677,6 +1677,25 @@ include_dir 'conf.d'
 
+
+ catalog_cache_prune_min_age (integer)
+
+  catalog_cache_prune_min_age configuration
+  parameter
+
+
+
+ Specifies the minimum amount of unused time in seconds at which a
+ system catalog cache entry is removed. -1 indicates that this feature
+ is disabled at all. The value defaults to 300 seconds (5
+ minutes). The entries that are not used for the duration
+ can be removed to prevent catalog cache from bloating with useless
+ entries.
+
+
+
+
 
 max_stack_depth (integer)

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1d4f..a0efac86bc 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -71,6 +71,7 @@
 #include "tcop/pquery.h"
 #include "tcop/tcopprot.h"
 #include "tcop/utility.h"
+#include "utils/catcache.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -2577,6 +2578,7 @@ start_xact_command(void)
 	 * not desired, the timeout has to be disabled explicitly.
 	 */
 	enable_statement_timeout();
+	SetCatCacheClock(GetCurrentStatementStartTimestamp());
 }
 
 static void
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index d05930bc4c..03c2d8524c 100644
---
Re: Protect syscache from bloating with negative cache entries
On Thu, Apr 4, 2019 at 8:53 AM Kyotaro HORIGUCHI wrote:
> So it seems to me that the simplest "Full" version wins. The
> attached is rebased version. dlist_move_head(entry) is removed as
> mentioned above in that patch.

1. I really don't think this patch has any business changing the existing logic. You can't just assume that the dlist_move_head() operation is unimportant for performance.

2. This patch still seems to add a new LRU list that has to be maintained. That's fairly puzzling. You seem to have concluded that the version without the additional LRU wins, but then sent a new copy of the version with the LRU version.

3. I don't think adding an additional call to GetCurrentTimestamp() in start_xact_command() is likely to be acceptable. There has got to be a way to set this up so that the maximum number of new GetCurrentTimestamp() calls is limited to once per N seconds, vs. the current implementation that could do it many many many times per second.

4. The code in CatalogCacheCreateEntry seems clearly unacceptable. In a pathological case where CatCacheCleanupOldEntries removes exactly one element per cycle, it could be called on every new catcache allocation.

I think we need to punt this patch to next release. We're not converging on anything committable very fast.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
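One way to address point 3 (serve a cached timestamp, consult the real clock only occasionally) can be sketched as below. All names are hypothetical: real_now() stands in for GetCurrentTimestamp(), and the refresh-every-K-calls policy is merely illustrative of the idea; the suggestion above is to bound refreshes per N seconds, e.g. via a timer.

```c
#include <assert.h>

static long cached_ts;
static int	calls_since_refresh = -1;	/* -1: never refreshed */

/* Counts how often the expensive clock is actually consulted (demo only). */
long		demo_clock_reads;

static long
demo_now(void)
{
	return ++demo_clock_reads;			/* each real read advances by one */
}

/*
 * Serve a cached timestamp, consulting the real clock at most once every
 * `every` calls.  Hot-path callers pay only a counter increment and a
 * branch instead of a full clock read.
 */
long
coarse_timestamp(long (*real_now)(void), int every)
{
	if (calls_since_refresh < 0 || ++calls_since_refresh >= every)
	{
		cached_ts = real_now();
		calls_since_refresh = 0;
	}
	return cached_ts;
}
```

The trade-off is staleness: an entry's recorded last-access time can lag the real clock by up to one refresh interval, which is harmless for an expiration threshold measured in minutes.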
Re: Protect syscache from bloating with negative cache entries
At Mon, 01 Apr 2019 11:05:32 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20190401.110532.102998353.horiguchi.kyot...@lab.ntt.co.jp>
> At Fri, 29 Mar 2019 17:24:40 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI
> wrote in
> <20190329.172440.199616830.horiguchi.kyot...@lab.ntt.co.jp>
> > I ran three artificial test cases. The database is created by
> > gen_tbl.pl. Numbers are the average of the fastest five runs in
> > successive 15 runs.
> >
> > Test cases are listed below.
> >
> > 1_0. About 3,000,000 negative entries are created in the pg_statistic
> >   cache by scanning that many distinct columns. It is 3000 tables
> >   * 1001 columns. Pruning scans happen several times during a run
> >   but no entries are removed. This emulates the bloating phase of
> >   the cache. catalog_cache_prune_min_age is default (300s).
> >   (access_tbl1.pl)
> >
> > 1_1. Same as 1_0 except that catalog_cache_prune_min_age is 0,
> >   which means turning off.
> >
> > 2_0. Repeatedly access 1001 of the 3,000,000 entries 6000
> >   times. This emulates the stable cache case without
> >   pruning. catalog_cache_prune_min_age is default (300s).
> >   (access_tbl2.pl)
> >
> > 2_1. Same as 2_0 except that catalog_cache_prune_min_age is 0,
> >   which means turning off.
> >
> > 3_0. Scan over the 3,000,000 entries twice with prune_age set
> >   to 10s. A run takes about 18 seconds on my box so a fair amount
> >   of old entries are removed. This emulates the stable case with
> >   continuous pruning. (access_tbl3.pl)
> >
> > 3_1. Same as 3_0 except that catalog_cache_prune_min_age is 0,
> >   which means turning off.
..
> I had another.. unstable.. result.

dlist_move_head is used every time an entry is accessed. It moves the accessed element to the top of the bucket, expecting that subsequent accesses become faster, a kind of LRU maintenance. But the mean length of a bucket is 2, so dlist_move_head is more complex than following one step of a link. So I removed it in the pruning patch.
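The dlist_move_head() trick discussed above is plain move-to-front: on a hit, relink the entry at the head of its bucket so hot entries are found after fewer link hops. A sketch on an ordinary doubly linked list (illustrative, not the PostgreSQL dlist API):

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node
{
	int			key;
	struct Node *prev;
	struct Node *next;
} Node;

/*
 * Look up key in the chain starting at *head; on a hit, relink the node
 * at the head (move-to-front) so a subsequent lookup finds it in one step.
 * The list is NULL-terminated in both directions.
 */
Node *
find_and_move_to_front(Node **head, int key)
{
	for (Node *n = *head; n != NULL; n = n->next)
	{
		if (n->key != key)
			continue;
		if (n != *head)
		{
			/* unlink from the current position */
			n->prev->next = n->next;
			if (n->next)
				n->next->prev = n->prev;
			/* relink at the head */
			n->prev = NULL;
			n->next = *head;
			(*head)->prev = n;
			*head = n;
		}
		return n;
	}
	return NULL;
}
```

With a mean chain length of 2, the relinking above costs several pointer stores to save, on average, one pointer hop, which matches the observation that removing it helped.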
I understand I cannot get rid of noise as far as I'm poking the feature from a client via the communication and SQL layers. The attached extension surgically exercises SearchSysCache3(STATRELATTINH) in almost the same pattern as the benchmarks taken last week. I believe that gives far more reliable numbers. But still the numbers fluctuate by up to about 10% every trial, and the difference among the methods is under the fluctuation. I'm tired.. But this still looks somewhat wrong.

ratio in the following table is the percentage relative to master for the same test. master2 is a version with dlist_move_head removed from master.

binary  | test | count |   avg   | stddev | ratio
--------+------+-------+---------+--------+--------
master  |  1_0 |     5 | 7841.42 |   6.91 |
master  |  2_0 |     5 | 3810.10 |   8.51 |
master  |  3_0 |     5 | 7826.17 |  11.98 |
master  |  1_1 |     5 | 7905.73 |   5.69 |
master  |  2_1 |     5 | 3827.15 |   5.55 |
master  |  3_1 |     5 | 7822.67 |  13.75 |
--------+------+-------+---------+--------+--------
master2 |  1_0 |     5 | 7538.05 |  16.65 |  96.13
master2 |  2_0 |     5 | 3927.05 |  11.58 | 103.07
master2 |  3_0 |     5 | 7455.47 |  12.03 |  95.26
master2 |  1_1 |     5 | 7485.60 |   9.38 |  94.69
master2 |  2_1 |     5 | 3870.81 |   5.54 | 101.14
master2 |  3_1 |     5 | 7437.35 |   9.91 |  95.74
--------+------+-------+---------+--------+--------
LRU     |  1_0 |     5 | 7633.57 |   9.00 |  97.35
LRU     |  2_0 |     5 | 4062.43 |   5.90 | 106.62
LRU     |  3_0 |     5 | 8340.51 |   6.12 | 106.57
LRU     |  1_1 |     5 | 7645.87 |  13.29 |  96.71
LRU     |  2_1 |     5 | 4026.60 |   7.56 | 105.21
LRU     |  3_1 |     5 | 8400.10 |  19.07 | 107.38
--------+------+-------+---------+--------+--------
Full    |  1_0 |     5 | 7481.61 |   6.70 |  95.41
Full    |  2_0 |     5 | 4084.46 |  14.50 | 107.20
Full    |  3_0 |     5 | 8166.23 |  14.80 | 104.35
Full    |  1_1 |     5 | 7447.20 |  10.93 |  94.20
Full    |  2_1 |     5 | 4016.88 |   8.53 | 104.96
Full    |  3_1 |     5 | 8258.80 |   7.91 | 105.58
--------+------+-------+---------+--------+--------
FullMod |  1_0 |     5 | 7291.80 |  14.03 |  92.99
FullMod |  2_0 |     5 | 4006.36 |   7.64 | 105.15
FullMod |  3_0 |     5 | 8143.60 |   9.26 | 104.06
FullMod |  1_1 |     5 | 7270.66 |   6.24 |  91.97
FullMod |  2_1 |     5 | 3996.20 |  13.00 | 104.42
FullMod |  3_1 |     5 | 8012.55 |   7.09 | 102.43

So "Full (scan) Mod" wins again, or the difference is
under error. I don't think this level of difference can be a reason to reject this kind of resource-saving mechanism. The LRU version doesn't seem particularly slow, but it also doesn't seem particularly fast for its complexity. The FullMod version doesn't look different either.

So it seems to me that the simplest "Full" version wins. The attached is the rebased version; dlist_move_head(entry) is removed in that patch, as mentioned above.

The third and fourth attachments are a set of scripts I used.
Re: Protect syscache from bloating with negative cache entries
At Fri, 29 Mar 2019 17:24:40 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20190329.172440.199616830.horiguchi.kyot...@lab.ntt.co.jp>
> I ran three artificial test cases. The database is created by
> gen_tbl.pl. Numbers are the average of the fastest five runs in
> successive 15 runs.
>
> Test cases are listed below.
>
> 1_0. About 3,000,000 negative entries are created in the pg_statistic
>   cache by scanning that many distinct columns. It is 3000 tables
>   * 1001 columns. Pruning scans happen several times during a run
>   but no entries are removed. This emulates the bloating phase of
>   the cache. catalog_cache_prune_min_age is default (300s).
>   (access_tbl1.pl)
>
> 1_1. Same as 1_0 except that catalog_cache_prune_min_age is 0,
>   which means turning pruning off.
>
> 2_0. Repeatedly access 1001 of the 3,000,000 entries 6000
>   times. This emulates the stable cache case without
>   pruning. catalog_cache_prune_min_age is default (300s).
>   (access_tbl2.pl)
>
> 2_1. Same as 2_0 except that catalog_cache_prune_min_age is 0,
>   which means turning pruning off.
>
> 3_0. Scan over the 3,000,000 entries twice, setting prune_age
>   to 10s. A run takes about 18 seconds on my box so a fair amount
>   of old entries are removed. This emulates the stable case with
>   continuous pruning. (access_tbl3.pl)
>
> 3_1. Same as 3_0 except that catalog_cache_prune_min_age is 0,
>   which means turning pruning off.
>
> The result follows.
>
>     | master |  LRU   |  Full  |Full-mod|
> ----|--------|--------|--------|--------|
> 1_0 | 17.287 | 17.370 | 17.255 | 16.623 |
> 1_1 | 17.287 | 17.063 | 16.336 | 17.192 |
> 2_0 | 15.695 | 18.769 | 18.563 | 15.527 |
> 2_1 | 15.695 | 18.603 | 18.498 | 18.487 |
> 3_0 | 26.576 | 33.817 | 34.384 | 34.971 |
> 3_1 | 26.576 | 27.462 | 26.202 | 26.368 |
>
> The result of 2_0 and 2_1 seems strange, but I show you the
> numbers as they are at present.
>
> - Full-scan seems to have the smallest impact when turned off.
>
> - Full-scan-mod seems to perform best in total. (as far as
>   Full-mod-2_0 is a wrong value..)
> - LRU doesn't seem to outperform full scanning.

I had another.. unstable.. result.

    | master |  LRU   |  Full  |Full-mod|
----|--------|--------|--------|--------|
1_0 | 16.312 | 16.540 | 16.482 | 16.348 |
1_1 | 16.312 | 16.454 | 16.335 | 16.232 |
2_0 | 16.710 | 16.954 | 17.873 | 17.345 |
2_1 | 16.710 | 17.373 | 18.499 | 17.563 |
3_0 | 25.010 | 33.031 | 33.452 | 33.937 |
3_1 | 25.010 | 24.784 | 24.570 | 25.453 |

Normalizing on master's result and rounding off to 1.0%, it looks as:

    | master | LRU | Full |Full-mod| Test description
----|--------|-----|------|--------|---------------------------------
1_0 |  100   | 101 | 101  |  100   | bloating. pruning enabled.
1_1 |  100   | 101 | 100  |  100   | bloating. pruning disabled.
2_0 |  100   | 101 | 107  |  104   | normal access. pruning enabled.
2_1 |  100   | 104 | 111  |  105   | normal access. pruning disabled.
3_0 |  100   | 132 | 134  |  136   | pruning continuously running.
3_1 |  100   |  99 |  98  |  102   | pruning disabled.

I'm not sure why 2_1 is slower than 2_0, but LRU has the least impact if the numbers are right. I will investigate the strange behavior using a profiler.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Protect syscache from bloating with negative cache entries
Hello. Sorry for being a bit late.

At Wed, 27 Mar 2019 17:30:37 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20190327.173037.40342566.horiguchi.kyot...@lab.ntt.co.jp>
> > I don't see much point in continuing to review this patch at this
> > point. There's been no new version of the patch in 3 weeks, and there
> > is -- in my view at least -- a rather frustrating lack of evidence
> > that the complexity this patch introduces is actually beneficial. No
> > matter how many people +1 the idea of making this more complicated, it
> > can't be justified unless you can provide a test result showing that
> > the additional complexity solves a problem that does not get solved
> > without that complexity. And even then, who is going to commit a
> > patch that uses a design which Tom Lane says was tried before and
> > stunk?
>
> Hmm. Anyway it is hit by a recent commit. I'll post a rebased
> version and a version reverted to do whole-scan. Then I'll take
> numbers as far as I can and will show the result.. tomorrow.

I took performance numbers for master and three versions of the patch: LRU, full-scan, and modified full-scan. I noticed that a useless scan can be skipped in the full-scan version, so I added that last version.

I ran three artificial test cases. The database is created by gen_tbl.pl. Numbers are the average of the fastest five runs in successive 15 runs.

Test cases are listed below.

1_0. About 3,000,000 negative entries are created in the pg_statistic cache by scanning that many distinct columns. It is 3000 tables * 1001 columns. Pruning scans happen several times during a run but no entries are removed. This emulates the bloating phase of the cache. catalog_cache_prune_min_age is default (300s). (access_tbl1.pl)

1_1. Same as 1_0 except that catalog_cache_prune_min_age is 0, which means turning pruning off.

2_0. Repeatedly access 1001 of the 3,000,000 entries 6000 times. This emulates the stable cache case without pruning. catalog_cache_prune_min_age is default (300s).
(access_tbl2.pl)

2_1. Same as 2_0 except that catalog_cache_prune_min_age is 0, which means turning pruning off.

3_0. Scan over the 3,000,000 entries twice, setting prune_age to 10s. A run takes about 18 seconds on my box so a fair amount of old entries are removed. This emulates the stable case with continuous pruning. (access_tbl3.pl)

3_1. Same as 3_0 except that catalog_cache_prune_min_age is 0, which means turning pruning off.

The result follows.

    | master |  LRU   |  Full  |Full-mod|
----|--------|--------|--------|--------|
1_0 | 17.287 | 17.370 | 17.255 | 16.623 |
1_1 | 17.287 | 17.063 | 16.336 | 17.192 |
2_0 | 15.695 | 18.769 | 18.563 | 15.527 |
2_1 | 15.695 | 18.603 | 18.498 | 18.487 |
3_0 | 26.576 | 33.817 | 34.384 | 34.971 |
3_1 | 26.576 | 27.462 | 26.202 | 26.368 |

The result of 2_0 and 2_1 seems strange, but I show you the numbers as they are at present.

- Full-scan seems to have the smallest impact when turned off.

- Full-scan-mod seems to perform best in total. (as far as Full-mod-2_0 is a wrong value..)

- LRU doesn't seem to outperform full scanning.

For your information I measured how long pruning takes:

  LRU:       318318 out of 2097153 entries in  26ms: 0.08us/entry.
  Full-scan: 443443 out of 2097153 entries in 184ms: 0.4us/entry.

LRU is actually fast at removing entries, but the difference seems to be canceled by the complexity of LRU maintenance. As my conclusion, we should go with the Full-scan or Full-scan-mod version. I will conduct a further overnight test and see which is better.

I attached the test script set. It is used in the following manner.

(start server)
# perl gen_tbl.pl | psql postgres
(stop server)
# sh run.sh 30 > log.txt    # 30 is repeat count
# perl process.pl

    | master |  LRU   |  Full  |Full-mod|
----|--------|--------|--------|--------|
1_0 | 16.711 | 17.647 | 16.767 | 17.256 |
...

The attached files are as follows.

LRU version patches:
  LRU-0001-Add-dlist_move_tail.patch
  LRU-0002-Remove-entries-that-haven-t-been-used-for-a-certain-.patch

Full-scan version patch:
  FullScan-0001-Remove-entries-that-haven-t-been-used-for-a-certain-.patch

Full-scan-mod version patch:
  FullScan-mod-0001-Remove-entries-that-haven-t-been-used-for-a-certain-.patch

Test scripts:
  test_script.tar.gz

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 1c397d118a65d6b76282cc904c43ecfe97ee5329 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu, 7 Feb 2019 14:56:07 +0900
Subject: [PATCH 1/2] Add dlist_move_tail

We have dlist_push_head/tail and dlist_move_head but not
dlist_move_tail. Add it.
---
 src/include/lib/ilist.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index b1a5974ee4..659ab1ac87 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move
Re: Protect syscache from bloating with negative cache entries
At Mon, 25 Mar 2019 09:28:57 -0400, Robert Haas wrote in
> On Thu, Mar 7, 2019 at 11:40 PM Ideriha, Takeshi
> wrote:
> > Just to be sure, we introduced the LRU list in this thread to find the
> > entries less than threshold time
> > without scanning the whole hash table. If hash table becomes large without
> > LRU list, scanning time becomes slow.
>
> Hmm. So, it's a trade-off, right? One option is to have an LRU list,
> which imposes a small overhead on every syscache or catcache operation
> to maintain the LRU ordering. The other option is to have no LRU
> list, which imposes a larger overhead every time we clean up the
> syscaches. My bias is toward thinking that the latter is better,
> because:
>
> 1. Not everybody is going to use this feature, and
>
> 2. Syscache cleanup should be something that only happens every so
> many minutes, and probably while the backend is otherwise idle,
> whereas lookups can happen many times per millisecond.
>
> However, perhaps someone will provide some evidence that casts a
> different light on the situation.

That is close to my feeling. When the cache is enlarged, all entries are copied into a new hash twice the size. If some entries have been removed, we don't need to duplicate the whole hash; otherwise we end up doing that extra work. We don't run the pruning scan more frequently than the interval, so it is not a bad bet.

> I don't see much point in continuing to review this patch at this
> point. There's been no new version of the patch in 3 weeks, and there
> is -- in my view at least -- a rather frustrating lack of evidence
> that the complexity this patch introduces is actually beneficial. No
> matter how many people +1 the idea of making this more complicated, it
> can't be justified unless you can provide a test result showing that
> the additional complexity solves a problem that does not get solved
> without that complexity. And even then, who is going to commit a
> patch that uses a design which Tom Lane says was tried before and
> stunk?
Hmm. Anyway, it is hit by a recent commit. I'll post a rebased version and a version reverted to do whole-scan. Then I'll take numbers as far as I can and will show the result.. tomorrow.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Protect syscache from bloating with negative cache entries
On Thu, Mar 7, 2019 at 11:40 PM Ideriha, Takeshi wrote: > Just to be sure, we introduced the LRU list in this thread to find the > entries less than threshold time > without scanning the whole hash table. If hash table becomes large without > LRU list, scanning time becomes slow. Hmm. So, it's a trade-off, right? One option is to have an LRU list, which imposes a small overhead on every syscache or catcache operation to maintain the LRU ordering. The other option is to have no LRU list, which imposes a larger overhead every time we clean up the syscaches. My bias is toward thinking that the latter is better, because: 1. Not everybody is going to use this feature, and 2. Syscache cleanup should be something that only happens every so many minutes, and probably while the backend is otherwise idle, whereas lookups can happen many times per millisecond. However, perhaps someone will provide some evidence that casts a different light on the situation. I don't see much point in continuing to review this patch at this point. There's been no new version of the patch in 3 weeks, and there is -- in my view at least -- a rather frustrating lack of evidence that the complexity this patch introduces is actually beneficial. No matter how many people +1 the idea of making this more complicated, it can't be justified unless you can provide a test result showing that the additional complexity solves a problem that does not get solved without that complexity. And even then, who is going to commit a patch that uses a design which Tom Lane says was tried before and stunk? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
RE: Protect syscache from bloating with negative cache entries
>From: Vladimir Sitnikov [mailto:sitnikov.vladi...@gmail.com]
>
>Robert> This email thread is really short on clear demonstrations that X
>Robert> or Y is useful.
>
>It is useful when the whole database does **not** crash, isn't it?
>
>Case A (== current PostgreSQL mode): syscache grows, then the OOM killer
>chimes in, kills the database process, and it leads to complete cluster
>failure (all other PG processes terminate themselves).
>
>Case B (== limit syscache by 10MiB or whatever, as Tsunakawa, Takayuki
>asks): a single ill-behaved process works a bit slower and/or consumes
>more CPU than the other ones. The whole DB is still alive.
>
>I'm quite sure "case B" is much better for the end users and for the
>database administrators.
>
>So, +1 to Tsunakawa, Takayuki; it would be so great if there was a way
>to limit the memory consumption of a single process (e.g. syscache,
>work_mem, etc, etc).
>
>Robert> However, memory usage is quite unpredictable. It depends on how
>Robert> many backends are active
>
>The number of backends can be limited by ensuring proper limits at the
>application connection pool level and/or pgbouncer and/or things like
>that.
>
>Robert> how many copies of work_mem and/or maintenance_work_mem are in
>Robert> use
>
>There might be other patches to cap the total use of
>work_mem/maintenance_work_mem.
>
>Robert> I don't think we
>Robert> can say that just imposing a limit on the size of the system
>Robert> caches is going to be enough to reliably prevent an out of
>Robert> memory condition
>
>The fewer possibilities there are for OOM, the better. Quite often it is
>much better to fail a single SQL statement than to kill all the DB
>processes.

Yeah, I agree. This limit would be useful for such an extreme situation.

Regards,
Takeshi Ideriha
RE: Protect syscache from bloating with negative cache entries
>From: Robert Haas [mailto:robertmh...@gmail.com]
>On Thu, Mar 7, 2019 at 9:49 AM Tomas Vondra
>wrote:
>> I don't think this shows any regression, but perhaps we should do a
>> microbenchmark isolating the syscache entirely?
>
>Well, if we need the LRU list, then yeah I think a microbenchmark would
>be a good idea to make sure we really understand what the impact of that
>is going to be. But if we don't need it and can just remove it then we
>don't.

Just to be sure: we introduced the LRU list in this thread to find the entries unused for longer than the threshold time without scanning the whole hash table. If the hash table becomes large, scanning the whole of it without an LRU list becomes slow.

Regards,
Takeshi Ideriha
Re: Protect syscache from bloating with negative cache entries
On 3/7/19 4:01 PM, Robert Haas wrote: > On Thu, Mar 7, 2019 at 9:49 AM Tomas Vondra > wrote: >> I don't think this shows any regression, but perhaps we should do a >> microbenchmark isolating the syscache entirely? > > Well, if we need the LRU list, then yeah I think a microbenchmark > would be a good idea to make sure we really understand what the impact > of that is going to be. But if we don't need it and can just remove > it then we don't. > >> What I had in mind is more along these lines: >> >> (a) track number of active syscache entries (increment when adding a new >> one, decrement when evicting one) >> >> (b) track peak number of active syscache entries >> >> (c) after clock-sweep, if (peak > K*active) where K=2 or K=4 or so, do a >> memory context swap, i.e. create a new context, copy active entries over >> and destroy the old one >> >> That would at least free() the memory. Of course, the syscache entries >> may have different sizes, so tracking just numbers of entries is just an >> approximation. But I think it'd be enough. > > Yeah, that could be done. I'm not sure how expensive it would be, and > I'm also not sure how much more effective it would be than what's > currently proposed in terms of actually freeing memory. If you free > enough dead syscache entries, you might manage to give some memory > back to the OS: after all, there may be some locality there. And even > if you don't, you'll at least prevent further growth, which might be > good enough. > I have my doubts about that happening in practice. It might happen for some workloads, but I think the locality is rather unpredictable. > We could consider doing some version of what has been proposed here > and the thing you're proposing here could later be implemented on top > of that. I mean, evicting entries at all is a prerequisite to > copy-and-compact. > Sure. I'm not saying the patch must do this to make it committable. 
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Protect syscache from bloating with negative cache entries
On Thu, Mar 7, 2019 at 9:49 AM Tomas Vondra wrote: > I don't think this shows any regression, but perhaps we should do a > microbenchmark isolating the syscache entirely? Well, if we need the LRU list, then yeah I think a microbenchmark would be a good idea to make sure we really understand what the impact of that is going to be. But if we don't need it and can just remove it then we don't. > What I had in mind is more along these lines: > > (a) track number of active syscache entries (increment when adding a new > one, decrement when evicting one) > > (b) track peak number of active syscache entries > > (c) after clock-sweep, if (peak > K*active) where K=2 or K=4 or so, do a > memory context swap, i.e. create a new context, copy active entries over > and destroy the old one > > That would at least free() the memory. Of course, the syscache entries > may have different sizes, so tracking just numbers of entries is just an > approximation. But I think it'd be enough. Yeah, that could be done. I'm not sure how expensive it would be, and I'm also not sure how much more effective it would be than what's currently proposed in terms of actually freeing memory. If you free enough dead syscache entries, you might manage to give some memory back to the OS: after all, there may be some locality there. And even if you don't, you'll at least prevent further growth, which might be good enough. We could consider doing some version of what has been proposed here and the thing you're proposing here could later be implemented on top of that. I mean, evicting entries at all is a prerequisite to copy-and-compact. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Protect syscache from bloating with negative cache entries
Robert Haas writes:
> On Wed, Mar 6, 2019 at 6:18 PM Tomas Vondra
> wrote:
>> Which part of the LRU approach is supposedly expensive? Updating the
>> lastaccess field or moving the entries to tail? I'd guess it's the
>> latter, so perhaps we can keep some sort of age field, update it less
>> frequently (once per minute?), and do the clock sweep?

> Move to tail (although lastaccess would be expensive too if it
> involves an extra gettimeofday() call).

As I recall, the big problem with the old LRU code was loss of locality of access, in that in addition to the data associated with hot syscache entries, you were necessarily also touching list link fields associated with not-hot entries. That's bad for the CPU cache. A gettimeofday call (or any other kernel call) per syscache access would be a complete disaster.

			regards, tom lane
Re: Protect syscache from bloating with negative cache entries
On 3/7/19 3:34 PM, Robert Haas wrote:
> On Wed, Mar 6, 2019 at 6:18 PM Tomas Vondra
> wrote:
>> I agree clock sweep might be sufficient, although the benchmarks done in
>> this thread so far do not suggest the LRU approach is very expensive.
>
> I'm not sure how thoroughly it's been tested -- has someone
> constructed a benchmark that does a lot of syscache lookups and
> measured how much slower they get with this new code?

What I've done on v13 (and I don't think the results would be that different on the current patch, but I may rerun it if needed) is a test that creates a large number of tables (up to 1M) and then accesses them randomly. I don't know if it matches what you imagine, but see [1]

[1] https://www.postgresql.org/message-id/74386116-0bc5-84f2-e614-0cff19aca2de%402ndquadrant.com

I don't think this shows any regression, but perhaps we should do a microbenchmark isolating the syscache entirely?

>> A simple true/false flag, as proposed by Robert, would mean we can only
>> do the cleanup once per the catalog_cache_prune_min_age interval, so
>> with the default value (5 minutes) the entries might be between 5 and 10
>> minutes old. That's probably acceptable, although for higher values the
>> range gets wider and wider ...
>
> That's true, but I don't know that it matters. I'm not sure there's
> much of a use case for raising this parameter to some larger value,
> but even if there is, is it really worth complicating the mechanism to
> make sure that we throw away entries in a more timely fashion? That's
> not going to be cost-free, either in terms of CPU cycles or in terms
> of code complexity.

True, although it very much depends on how expensive it would be.

> Again, I think our goal should be to add the least mechanism here that
> solves the problem. If we can show that a true/false flag makes poor
> decisions about which entries to evict and a smarter algorithm does
> better, then it's worth considering.
However, my bet is that it makes > no meaningful difference. > True. >> Which part of the LRU approach is supposedly expensive? Updating the >> lastaccess field or moving the entries to tail? I'd guess it's the >> latter, so perhaps we can keep some sort of age field, update it less >> frequently (once per minute?), and do the clock sweep? > > Move to tail (although lastaccess would be expensive if too if it > involves an extra gettimeofday() call). GCLOCK, like we use for > shared_buffers, is a common approximation of LRU which tends to be a > lot less expensive to implement. We could do that here and it might > work well, but I think the question, again, is whether we really need > it. I think our goal here should just be to jettison cache entries > that are clearly worthless. It's expensive enough to reload cache > entries that any kind of aggressive eviction policy is probably a > loser, and if our goal is just to get rid of the stuff that's clearly > not being used, we don't need to be super-accurate about it. > True. >> BTW wasn't one of the cases this thread aimed to improve a session that >> accesses a lot of objects in a short period of time? That balloons the >> syscache, and while this patch evicts the entries from memory, we never >> actually release the memory back (because AllocSet just moves it into >> the freelists) and it's unlikely to get swapped out (because other >> chunks on those memory pages are likely to be still used). I've proposed >> to address that by recreating the context if it gets too bloated, and I >> think Alvaro agreed with that. But I haven't seen any further discussion >> about that. > > That's an interesting point. It seems reasonable to me to just throw > away everything and release all memory if the session has been idle > for a while, but if the session is busy doing stuff, discarding > everything in bulk like that is going to cause latency spikes. 
What I had in mind is more along these lines:

(a) track number of active syscache entries (increment when adding a new one, decrement when evicting one)

(b) track peak number of active syscache entries

(c) after clock-sweep, if (peak > K*active) where K=2 or K=4 or so, do a memory context swap, i.e. create a new context, copy active entries over and destroy the old one

That would at least free() the memory. Of course, the syscache entries may have different sizes, so tracking just numbers of entries is just an approximation. But I think it'd be enough.

regards

--
Tomas Vondra
http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Protect syscache from bloating with negative cache entries
On Wed, Mar 6, 2019 at 6:18 PM Tomas Vondra wrote:
> I agree clock sweep might be sufficient, although the benchmarks done in
> this thread so far do not suggest the LRU approach is very expensive.

I'm not sure how thoroughly it's been tested -- has someone constructed a benchmark that does a lot of syscache lookups and measured how much slower they get with this new code?

> A simple true/false flag, as proposed by Robert, would mean we can only
> do the cleanup once per the catalog_cache_prune_min_age interval, so
> with the default value (5 minutes) the entries might be between 5 and 10
> minutes old. That's probably acceptable, although for higher values the
> range gets wider and wider ...

That's true, but I don't know that it matters. I'm not sure there's much of a use case for raising this parameter to some larger value, but even if there is, is it really worth complicating the mechanism to make sure that we throw away entries in a more timely fashion? That's not going to be cost-free, either in terms of CPU cycles or in terms of code complexity.

Again, I think our goal should be to add the least mechanism here that solves the problem. If we can show that a true/false flag makes poor decisions about which entries to evict and a smarter algorithm does better, then it's worth considering. However, my bet is that it makes no meaningful difference.

> Which part of the LRU approach is supposedly expensive? Updating the
> lastaccess field or moving the entries to tail? I'd guess it's the
> latter, so perhaps we can keep some sort of age field, update it less
> frequently (once per minute?), and do the clock sweep?

Move to tail (although lastaccess would be expensive too if it involves an extra gettimeofday() call). GCLOCK, like we use for shared_buffers, is a common approximation of LRU which tends to be a lot less expensive to implement. We could do that here and it might work well, but I think the question, again, is whether we really need it.
I think our goal here should just be to jettison cache entries that are clearly worthless. It's expensive enough to reload cache entries that any kind of aggressive eviction policy is probably a loser, and if our goal is just to get rid of the stuff that's clearly not being used, we don't need to be super-accurate about it. > BTW wasn't one of the cases this thread aimed to improve a session that > accesses a lot of objects in a short period of time? That balloons the > syscache, and while this patch evicts the entries from memory, we never > actually release the memory back (because AllocSet just moves it into > the freelists) and it's unlikely to get swapped out (because other > chunks on those memory pages are likely to be still used). I've proposed > to address that by recreating the context if it gets too bloated, and I > think Alvaro agreed with that. But I haven't seen any further discussion > about that. That's an interesting point. It seems reasonable to me to just throw away everything and release all memory if the session has been idle for a while, but if the session is busy doing stuff, discarding everything in bulk like that is going to cause latency spikes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Protect syscache from bloating with negative cache entries
On 3/6/19 9:17 PM, Tom Lane wrote: > Robert Haas writes: >> OK, so this is getting simpler, but I'm wondering why we need >> dlist_move_tail() at all. It is a well-known fact that maintaining >> LRU ordering is expensive and it seems to be unnecessary for our >> purposes here. > > Yeah ... LRU maintenance was another thing that used to be in the > catcache logic and was thrown out as far too expensive. Your idea > of just using a clock sweep instead seems plausible. > I agree clock sweep might be sufficient, although the benchmarks done in this thread so far do not suggest the LRU approach is very expensive. A simple true/false flag, as proposed by Robert, would mean we can only do the cleanup once per the catalog_cache_prune_min_age interval, so with the default value (5 minutes) the entries might be between 5 and 10 minutes old. That's probably acceptable, although for higher values the range gets wider and wider ... Which part of the LRU approach is supposedly expensive? Updating the lastaccess field or moving the entries to tail? I'd guess it's the latter, so perhaps we can keep some sort of age field, update it less frequently (once per minute?), and do the clock sweep? BTW wasn't one of the cases this thread aimed to improve a session that accesses a lot of objects in a short period of time? That balloons the syscache, and while this patch evicts the entries from memory, we never actually release the memory back (because AllocSet just moves it into the freelists) and it's unlikely to get swapped out (because other chunks on those memory pages are likely to be still used). I've proposed to address that by recreating the context if it gets too bloated, and I think Alvaro agreed with that. But I haven't seen any further discussion about that. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Protect syscache from bloating with negative cache entries
Robert Haas writes: > OK, so this is getting simpler, but I'm wondering why we need > dlist_move_tail() at all. It is a well-known fact that maintaining > LRU ordering is expensive and it seems to be unnecessary for our > purposes here. Yeah ... LRU maintenance was another thing that used to be in the catcache logic and was thrown out as far too expensive. Your idea of just using a clock sweep instead seems plausible. regards, tom lane
Re: Protect syscache from bloating with negative cache entries
On Fri, Mar 1, 2019 at 3:33 AM Kyotaro HORIGUCHI wrote:
> > > It is artificial (or actually won't be repeatedly executed in a
> > > session) but anyway what can get benefit from
> > > catalog_cache_memory_target would be a kind of extreme.
> >
> > I agree. So then let's not have it.
>
> Ah... Yeah! I see. Andres' concern was that crucial syscache
> entries might be blown away during a long idle time. If that
> happens, it's enough to just turn the feature off in almost all
> such cases.

+1.

> In the attached v18,
>   catalog_cache_memory_target is removed,
>   removed some leftover of removing the hard limit feature,
>   separated catcache clock update during a query into 0003.
>   attached 0004 (monitor part) in order just to see how it is working.
>
> v18-0001-Add-dlist_move_tail:
>   Just adds dlist_move_tail
>
> v18-0002-Remove-entries-that-haven-t-been-used-for-a-certain-:
>   Revised pruning feature.

OK, so this is getting simpler, but I'm wondering why we need dlist_move_tail() at all. It is a well-known fact that maintaining LRU ordering is expensive and it seems to be unnecessary for our purposes here.

Can't CatCacheCleanupOldEntries just use a single-bit flag on the entry? If the flag is set, clear it. If the flag is clear, drop the entry. When an entry is used, set the flag. Then, entries will go away if they are not used between consecutive calls to CatCacheCleanupOldEntries. Sure, that might be slightly less accurate in terms of which entries get thrown away, but I bet it makes no real difference.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
RE: Protect syscache from bloating with negative cache entries
>From: Tsunakawa, Takayuki [mailto:tsunakawa.ta...@jp.fujitsu.com]
>> [Size=800, iter=1,000,000]
>> Master  | 15.763
>> Patched | 16.262 (+3%)
>>
>> [Size=32768, iter=1,000,000]
>> Master  | 61.3076
>> Patched | 62.9566 (+2%)
>
>What's the unit, second or millisecond?

Millisecond.

>Why is the number of digits to the right of the decimal point?
>
>Is the measurement correct? I'm wondering because the difference is
>larger in the latter case. Isn't the accounting processing almost the
>same in both cases?
>* former: 16.262 - 15.763 = 0.499
>* latter: 62.956 - 61.307 = 1.649
>I think the overhead is sufficiently small. It may get even smaller with
>a trivial tweak.
>
>You added the new member usedspace at the end of MemoryContextData. The
>original size of MemoryContextData is 72 bytes, and Intel Xeon's cache
>line is 64 bytes. So, the new member will be on a separate cache line.
>Try putting usedspace before the name member.

OK. I changed the order of the MemoryContextData members to fit usedspace into one cache line.

I disabled the whole catcache eviction mechanism in the patched version and compared it with master, to check that the overhead of memory accounting has become small enough. The settings are almost the same as in the last email. But last time the number of trials was 50, so this time I increased it and tried 5000 times to calculate the average figure (rounded off at the third decimal place).

[Size=800, iter=1,000,000]
Master  | 15.64 ms
Patched | 16.26 ms (+4%)

The difference is 0.62 ms.

[Size=32768, iter=1,000,000]
Master  | 61.39 ms
Patched | 60.99 ms (-1%)

I guess there is around 2% noise, but based on this experiment the overhead seems small. There is still some overhead, but it can be hidden by other costs such as malloc(). Does this result show that the hard-limit size option with memory accounting doesn't harm ordinary users who disable the hard-limit size option?
Regards, Takeshi Ideriha v15-0001-3-Add-dlist_move_tail.patch Description: v15-0001-3-Add-dlist_move_tail.patch binuIZWS0c0ho.bin Description: v15-0002-3-ideriha-Memory-consumption-report-reature-of-memorycontext.patch v15-0003-3-ideriha-Remove-CatCache-Entries.patch Description: v15-0003-3-ideriha-Remove-CatCache-Entries.patch
Re: Protect syscache from bloating with negative cache entries
At Tue, 26 Feb 2019 10:55:18 -0500, Robert Haas wrote in
> On Mon, Feb 25, 2019 at 1:27 AM Kyotaro HORIGUCHI wrote:
> > > I'd like to see some evidence that catalog_cache_memory_target has any
> > > value, vs. just always setting it to zero.
> >
> > It is artificial (or actually won't be repeatedly executed in a
> > session) but anyway what can get benefit from
> > catalog_cache_memory_target would be a kind of extreme.
>
> I agree. So then let's not have it.

Ah... yeah, I see. Andres' concern was that crucial syscache entries might be blown away during a long idle time. If that happens, simply turning the feature off is enough in almost all such cases. Without the feature we no longer need to count memory usage; that stuff has been moved to the monitoring feature, which is out of scope for the current state of this patch.

> We shouldn't add more mechanism here than actually has value. It
> seems pretty clear that keeping cache entries that go unused for long
> periods can't be that important; even if we need them again
> eventually, reloading them every 5 or 10 minutes can't hurt that much.
> On the other hand, I think it's also pretty clear that evicting cache
> entries that are being used frequently will have disastrous effects on
> performance; as I noted in the other email I just sent, consider the
> effects of CLOBBER_CACHE_ALWAYS. No reasonable user is going to want
> to incur a massive slowdown to save a little bit of memory.
>
> I see that *in theory* there is a value to
> catalog_cache_memory_target, because *maybe* there is a workload where
> tuning that GUC will lead to better performance at lower memory usage
> than any competing proposal. But unless we can actually see an
> example of such a workload, which so far I don't, we're adding a knob
> that everybody has to think about how to tune when in fact we have no
> idea how to tune it or whether it even needs to be tuned. That
> doesn't make sense.
> We have to be able to document the parameters we
> have and explain to users how they should be used. And as far as this
> parameter is concerned I think we are not at that point.

In the attached v18,
- catalog_cache_memory_target is removed,
- removed some leftover of removing the hard limit feature,
- separated catcache clock update during a query into 0003,
- attached 0004 (monitor part) just to see how it is working.

v18-0001-Add-dlist_move_tail:
  Just adds dlist_move_tail

v18-0002-Remove-entries-that-haven-t-been-used-for-a-certain-:
  Revised pruning feature.

v18-0003-Asynchronous-update-of-catcache-clock:
  Separated catcache clock update feature.

v18-0004-Syscache-usage-tracking-feature:
  Usage tracking feature.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

>From 54388a7452eda1faadaa108e1bc21d51844f9224 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu, 7 Feb 2019 14:56:07 +0900
Subject: [PATCH 1/6] Add dlist_move_tail

We have dlist_push_head/tail and dlist_move_head but not
dlist_move_tail. Add it.

---
 src/include/lib/ilist.h | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index b1a5974ee4..659ab1ac87 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
-- 
2.16.3

>From c79d5fc86f45e6545cbc257040e46125ffc5cb92 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Fri, 1 Mar 2019 13:32:51 +0900
Subject: [PATCH 2/6] Remove entries that haven't been used for a certain time

Catcache entries happen to be left alone for several reasons. It is
not desirable that such useless entries eat up memory. The catcache
pruning feature removes entries that haven't been accessed for a
certain time before enlarging the hash array.

---
 doc/src/sgml/config.sgml                      |  19
 src/backend/tcop/postgres.c                   |   2 +
 src/backend/utils/cache/catcache.c            | 122 +-
 src/backend/utils/misc/guc.c                  |  12 +++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/utils/catcache.h                  |  18
 6 files changed, 171 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6d42b7afe7..737a156bb4 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1661,6 +1661,25 @@ include_dir
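[Editorial note: the commit message above describes pruning entries by last-access time before enlarging the hash array. A hedged sketch of that idea in plain C, using a hypothetical Entry struct, a singly-linked list, and a global clock counter — the real patch works on the catcache hash buckets and its GUC-driven prune age, not this simplified model:]

```c
#include <stdlib.h>

/* Hypothetical, simplified model of prune-before-enlarge (not catcache.c). */
typedef struct Entry
{
	int		key;
	long	last_access;	/* clock value at last lookup */
	struct Entry *next;
} Entry;

/* Advanced periodically, e.g. once per query, instead of reading the
 * wall clock on every cache access. */
long	cache_clock = 0;

/* Record the current clock on lookup. */
void
entry_access(Entry *e)
{
	e->last_access = cache_clock;
}

/*
 * Drop entries whose last access is older than prune_min_age ticks.
 * Returns the number removed; the caller would enlarge the hash array
 * only if pruning failed to free enough entries.
 */
int
prune_old_entries(Entry **head, long prune_min_age)
{
	int		removed = 0;
	Entry **link = head;

	while (*link != NULL)
	{
		Entry *e = *link;

		if (cache_clock - e->last_access > prune_min_age)
		{
			*link = e->next;	/* unlink and free the stale entry */
			free(e);
			removed++;
		}
		else
			link = &e->next;
	}
	return removed;
}
```

Running the prune pass only at resize time keeps the lookup path untouched; the cost is that memory usage floats above the target between resizes, which is why the thread notes the target "doesn't work strictly".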
Re: Protect syscache from bloating with negative cache entries
Robert> This email thread is really short on clear demonstrations that X or Y
Robert> is useful.

It is useful when the whole database does **not** crash, isn't it?

Case A (== current PostgreSQL mode): the syscache grows, then the OOM killer chimes in, kills the database process, and that leads to a complete cluster failure (all other PG processes terminate themselves).

Case B (== limit the syscache to 10MiB or whatever, as Tsunakawa, Takayuki asks): a single ill-behaved process works a bit slower and/or consumes more CPU than the other ones. The whole DB is still alive.

I'm quite sure "case B" is much better for the end users and for the database administrators. So, +1 to Tsunakawa, Takayuki; it would be so great if there were a way to limit the memory consumption of a single process (e.g. syscache, work_mem, etc.).

Robert> However, memory usage is quite unpredictable. It depends on how many
Robert> backends are active

The number of backends can be limited by ensuring proper limits at the application connection pool level and/or pgbouncer and/or things like that.

Robert> how many copies of work_mem and/or
Robert> maintenance_work_mem are in use

There might be other patches to cap the total use of work_mem/maintenance_work_mem.

Robert> I don't think we
Robert> can say that just imposing a limit on the size of the system caches is
Robert> going to be enough to reliably prevent an out of memory condition

The fewer possibilities there are for OOM, the better. Quite often it is much better to fail a single SQL statement than to kill all the DB processes.

Vladimir
RE: Protect syscache from bloating with negative cache entries
From: Ideriha, Takeshi/出利葉 健
> [Size=800, iter=1,000,000]
> Master |15.763
> Patched|16.262 (+3%)
>
> [Size=32768, iter=1,000,000]
> Master |61.3076
> Patched|62.9566 (+2%)

What's the unit, second or millisecond?

Why does the number of digits to the right of the decimal point differ?

Is the measurement correct? I'm wondering because the difference is larger in the latter case. Isn't the accounting processing almost the same in both cases?

* former: 16.262 - 15.763 = 0.499
* latter: 62.956 - 61.307 = 1.649

> At least compared to previous HashAg version, the overhead is smaller.
> It has some overhead but is increase by 2 or 3% a little bit?

I think the overhead is sufficiently small. It may get even smaller with a trivial tweak.

You added the new member usedspace at the end of MemoryContextData. The original size of MemoryContextData is 72 bytes, and Intel Xeon's cache line is 64 bytes. So, the new member will be on a separate cache line. Try putting usedspace before the name member.

Regards
Takayuki Tsunakawa
Re: Protect syscache from bloating with negative cache entries
On Wed, Feb 27, 2019 at 3:16 AM Ideriha, Takeshi wrote: > I'm afraid I may be quibbling about it. > What about users who understand performance drops but don't want to > add memory or decrease concurrency? > I think that PostgreSQL has a parameter > which most of users don't mind and use is as default > but a few of users want to change it. > In this case as you said, introducing hard limit parameter causes > performance decrease significantly so how about adding detailed caution > to the document like planner cost parameter? There's nothing wrong with a parameter that is useful to some people and harmless to everyone else, but the people who are proposing that parameter still have to demonstrate that it has those properties. This email thread is really short on clear demonstrations that X or Y is useful. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
RE: Protect syscache from bloating with negative cache entries
>From: Tsunakawa, Takayuki [mailto:tsunakawa.ta...@jp.fujitsu.com]
>From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
>> I measured the memory context accounting overhead using Tomas's tool
>> palloc_bench, which he made a while ago in a similar discussion.
>> https://www.postgresql.org/message-id/53f7e83c.3020...@fuzzy.cz
>>
>> This tool is a little bit outdated so I fixed it, but basically I
>> followed his approach.
>> Things I did:
>> - make one MemoryContext
>> - run both palloc() and pfree() for a 32kB area 1,000,000 times
>> - measure the elapsed time
>
>And are you sure you didn't enable assert checking?

Ah, sorry.. I misconfigured it.

>I'm afraid the measurement is not correct. First, the older discussion below shows
>that the accounting overhead is much, much smaller, even with a more complex
>accounting.
>Second, allocation/free of memory > 8 KB calls malloc()/free(). I guess the
>accounting overhead will be more likely to be hidden under the overhead of malloc()
>and free(). What we'd like to know is the overhead when malloc() and free() are not
>called.

Here is the average of 50 measurements: palloc/pfree of an 800-byte area 1,000,000 times, and of a 32kB area 1,000,000 times. Using gdb, I checked that malloc() is not called at size=800.

[Size=800, iter=1,000,000]
Master |15.763
Patched|16.262 (+3%)

[Size=32768, iter=1,000,000]
Master |61.3076
Patched|62.9566 (+2%)

At least compared to the previous HashAgg version, the overhead is smaller. There is still some overhead, but the increase is only 2 or 3%.

Regards,
Takeshi Ideriha
RE: Protect syscache from bloating with negative cache entries
From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
> I measured the memory context accounting overhead using Tomas's tool
> palloc_bench, which he made a while ago in a similar discussion.
> https://www.postgresql.org/message-id/53f7e83c.3020...@fuzzy.cz
>
> This tool is a little bit outdated so I fixed it, but basically I followed his approach.
> Things I did:
> - make one MemoryContext
> - run both palloc() and pfree() for a 32kB area 1,000,000 times
> - measure the elapsed time
>
> The result shows that master is 30 times faster than the patched one.
> So, as Andres mentioned upthread, it seems to have overhead.
>
> [master (without v15 patch)]
> 61.52 ms
> 60.96 ms
> 61.40 ms
> 61.42 ms
> 61.14 ms
>
> [with v15 patch]
> 1838.02 ms
> 1754.84 ms
> 1755.83 ms
> 1789.69 ms
> 1789.44 ms

I'm afraid the measurement is not correct. First, the older discussion below shows that the accounting overhead is much, much smaller, even with a more complex accounting.

9.5: Better memory accounting, towards memory-bounded HashAgg
https://www.postgresql.org/message-id/flat/1407012053.15301.53.camel%40jeff-desktop

Second, allocation/free of memory > 8 KB calls malloc()/free(). I guess the accounting overhead will be more likely to be hidden under the overhead of malloc() and free(). What we'd like to know is the overhead when malloc() and free() are not called.

And are you sure you didn't enable assert checking?

Regards
Takayuki Tsunakawa
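[Editorial note: the cache-line point above can be checked with offsetof(). On a typical LP64 platform, a counter appended after roughly 72 bytes of header starts at or past offset 64 and lands on a second 64-byte cache line, while moving it before the name pointer keeps it within the first line. The structs below are hypothetical stand-ins with the same rough shape, not the real MemoryContextData:]

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout experiment: where does a new 8-byte counter land? */

typedef struct BadOrder
{
	void	   *parent;			/* 8 bytes each on LP64 */
	void	   *firstchild;
	void	   *prevchild;
	void	   *nextchild;
	void	   *methods;
	void	   *ident;
	void	   *reset_cbs;
	char	   *name;
	int64_t		usedspace;		/* appended last: offset 64 on LP64,
								 * i.e. the second cache line */
} BadOrder;

typedef struct GoodOrder
{
	void	   *parent;
	void	   *firstchild;
	void	   *prevchild;
	void	   *nextchild;
	void	   *methods;
	void	   *ident;
	void	   *reset_cbs;
	int64_t		usedspace;		/* moved before name: offset 56 on LP64,
								 * still in the first cache line */
	char	   *name;
} GoodOrder;
```

If the accounting hot path touches parent pointers and usedspace together, the second ordering means one cache line fetched instead of two, which is the effect the reordering in the reply above is after.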
RE: Protect syscache from bloating with negative cache entries
>From: Robert Haas [mailto:robertmh...@gmail.com] > >On Mon, Feb 25, 2019 at 3:50 AM Tsunakawa, Takayuki > wrote: >> How can I make sure that this context won't exceed, say, 10 MB to avoid OOM? > >As Tom has said before and will probably say again, I don't think you actually >want that. >We know that PostgreSQL gets roughly 100x slower with the system caches >disabled >- try running with CLOBBER_CACHE_ALWAYS. If you are accessing the same system >cache entries repeatedly in a loop - which is not at all an unlikely scenario, >just run the >same query or sequence of queries in a loop - and if the number of entries >exceeds >10MB even, perhaps especially, by just a tiny bit, you are going to see a >massive >performance hit. >Maybe it won't be 100x because some more-commonly-used entries will always stay >cached, but it's going to be really big, I think. > >Now you could say - well it's still better than running out of memory. >However, memory usage is quite unpredictable. It depends on how many backends >are active and how many copies of work_mem and/or maintenance_work_mem are in >use, among other things. I don't think we can say that just imposing a limit >on the >size of the system caches is going to be enough to reliably prevent an out of >memory >condition unless the other use of memory on the machine happens to be extremely >stable. >So I think what's going to happen if you try to impose a hard-limit on the >size of the >system cache is that you will cause some workloads to slow down by 3x or more >without actually preventing out of memory conditions. What you need to do is >accept >that system caches need to grow as big as they need to grow, and if that >causes you >to run out of memory, either buy more memory or reduce the number of concurrent >sessions you allow. It would be fine to instead limit the cache memory if >those cache >entries only had a mild effect on performance, but I don't think that's the >case. I'm afraid I may be quibbling about it. 
What about users who understand the performance drop but still don't want to add memory or decrease concurrency? I think PostgreSQL has parameters that most users don't mind and leave at the default, but that a few users want to change. In this case, as you said, introducing a hard-limit parameter can decrease performance significantly, so how about adding a detailed caution to the documentation, as is done for the planner cost parameters?

Regards,
Takeshi Ideriha
RE: Protect syscache from bloating with negative cache entries
>From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
>>>* 0001: add dlist_push_tail() ... as is
>>>* 0002: memory accounting, with correction based on feedback
>>>* 0003: merge the original 0003 and 0005, with correction based on feedback
>>
>>Attached are simpler versions based on Horiguchi-san's v15 patch, which means
>>the cache is pruned by both time and size.
>>(The cleanup function is still complex, but it gets much simpler.)
>
>I don't mean to disregard what Horiguchi-san and others have developed and
>discussed. But I refactored the v15 patch again to reduce its complexity,
>because it seems to me that one of the reasons the prune-by-size feature was
>dropped stems from code complexity.
>
>Another thing is that the memory accounting overhead has been discussed, but
>it hasn't been measured in this thread. So I'd like to measure it.

I measured the memory context accounting overhead using Tomas's tool palloc_bench, which he made a while ago in a similar discussion.
https://www.postgresql.org/message-id/53f7e83c.3020...@fuzzy.cz

This tool is a little bit outdated so I fixed it, but basically I followed his approach.
Things I did:
- make one MemoryContext
- run both palloc() and pfree() for a 32kB area 1,000,000 times
- measure the elapsed time

The result shows that master is 30 times faster than the patched one. So, as Andres mentioned upthread, it seems to have overhead.

[master (without v15 patch)]
61.52 ms
60.96 ms
61.40 ms
61.42 ms
61.14 ms

[with v15 patch]
1838.02 ms
1754.84 ms
1755.83 ms
1789.69 ms
1789.44 ms

Regards,
Takeshi Ideriha
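[Editorial note: palloc_bench itself is not shown in the thread. The methodology can be sketched roughly as below, with plain malloc()/free() standing in for palloc()/pfree() on a MemoryContext, so absolute numbers are not comparable to the figures above — this only illustrates the shape of the measurement:]

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/*
 * Hedged stand-in for the palloc_bench loop: time N iterations of
 * allocating and immediately freeing a fixed-size chunk, and return
 * the elapsed CPU time in milliseconds.
 */
double
bench_alloc_free(size_t size, long iterations)
{
	clock_t		start = clock();

	for (long i = 0; i < iterations; i++)
	{
		void	   *p = malloc(size);

		if (p == NULL)
			abort();			/* out of memory: give up */
		free(p);
	}
	return (double) (clock() - start) * 1000.0 / CLOCKS_PER_SEC;
}
```

Usage mirroring the thread's two cases (800 bytes stays inside the allocator's small-chunk path in PostgreSQL; 32 kB falls through to malloc there, which is exactly the distinction Tsunakawa-san raises):

```c
printf("size=800:   %.3f ms\n", bench_alloc_free(800, 1000000));
printf("size=32768: %.3f ms\n", bench_alloc_free(32768, 1000000));
```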
Re: Protect syscache from bloating with negative cache entries
On Mon, Feb 25, 2019 at 1:27 AM Kyotaro HORIGUCHI wrote: > > I'd like to see some evidence that catalog_cache_memory_target has any > > value, vs. just always setting it to zero. > > It is artificial (or acutually wont't be repeatedly executed in a > session) but anyway what can get benefit from > catalog_cache_memory_target would be a kind of extreme. I agree. So then let's not have it. We shouldn't add more mechanism here than actually has value. It seems pretty clear that keeping cache entries that go unused for long periods can't be that important; even if we need them again eventually, reloading them every 5 or 10 minutes can't hurt that much. On the other hand, I think it's also pretty clear that evicting cache entries that are being used frequently will have disastrous effects on performance; as I noted in the other email I just sent, consider the effects of CLOBBER_CACHE_ALWAYS. No reasonable user is going to want to incur a massive slowdown to save a little bit of memory. I see that *in theory* there is a value to catalog_cache_memory_target, because *maybe* there is a workload where tuning that GUC will lead to better performance at lower memory usage than any competing proposal. But unless we can actually see an example of such a workload, which so far I don't, we're adding a knob that everybody has to think about how to tune when in fact we have no idea how to tune it or whether it even needs to be tuned. That doesn't make sense. We have to be able to document the parameters we have and explain to users how they should be used. And as far as this parameter is concerned I think we are not at that point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Protect syscache from bloating with negative cache entries
On Mon, Feb 25, 2019 at 3:50 AM Tsunakawa, Takayuki wrote: > How can I make sure that this context won't exceed, say, 10 MB to avoid OOM? As Tom has said before and will probably say again, I don't think you actually want that. We know that PostgreSQL gets roughly 100x slower with the system caches disabled - try running with CLOBBER_CACHE_ALWAYS. If you are accessing the same system cache entries repeatedly in a loop - which is not at all an unlikely scenario, just run the same query or sequence of queries in a loop - and if the number of entries exceeds 10MB even, perhaps especially, by just a tiny bit, you are going to see a massive performance hit. Maybe it won't be 100x because some more-commonly-used entries will always stay cached, but it's going to be really big, I think. Now you could say - well it's still better than running out of memory. However, memory usage is quite unpredictable. It depends on how many backends are active and how many copies of work_mem and/or maintenance_work_mem are in use, among other things. I don't think we can say that just imposing a limit on the size of the system caches is going to be enough to reliably prevent an out of memory condition unless the other use of memory on the machine happens to be extremely stable. So I think what's going to happen if you try to impose a hard-limit on the size of the system cache is that you will cause some workloads to slow down by 3x or more without actually preventing out of memory conditions. What you need to do is accept that system caches need to grow as big as they need to grow, and if that causes you to run out of memory, either buy more memory or reduce the number of concurrent sessions you allow. It would be fine to instead limit the cache memory if those cache entries only had a mild effect on performance, but I don't think that's the case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
RE: Protect syscache from bloating with negative cache entries
>>From: Tsunakawa, Takayuki
>>Ideriha-san,
>>Could you try simplifying the v15 patch set to see how simple the code would
>>look or not? That is:
>>
>>* 0001: add dlist_push_tail() ... as is
>>* 0002: memory accounting, with correction based on feedback
>>* 0003: merge the original 0003 and 0005, with correction based on feedback
>
>Attached are simpler versions based on Horiguchi-san's v15 patch, which means
>the cache is pruned by both time and size.
>(The cleanup function is still complex, but it gets much simpler.)

I don't mean to disregard what Horiguchi-san and others have developed and discussed. But I refactored the v15 patch again to reduce its complexity, because it seems to me that one of the reasons the prune-by-size feature was dropped stems from code complexity.

Another thing is that the memory accounting overhead has been discussed, but it hasn't been measured in this thread. So I'd like to measure it.

Regards,
Takeshi Ideriha

v15-0001-2-Add-dlist_move_tail.patch
Description: v15-0001-2-Add-dlist_move_tail.patch

bin7aPjzkMJvE.bin
Description: v15-0002-2-ideriha-Memory-consumption-report-reature-of-memorycontext.patch

v15-0003-2-ideriha-Remove-CatCache-Entries.patch
Description: v15-0003-2-ideriha-Remove-CatCache-Entries.patch
RE: Protect syscache from bloating with negative cache entries
From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> - If you find the process "bloat"s too much and you (intuitively)
> suspect the cause is the system cache, set it to a certain shorter
> value, say 1 minute, and set catalog_cache_memory_target
> to the allowable amount of memory for each process. The memory
> usage will be stable at an (un)certain amount above the target.

Could you guide me on how to tune these parameters in an example scenario? Let me take the original problematic case referenced at the beginning of this thread. That is:

* A PL/pgSQL function that creates a temp table, accesses it, (accesses other non-temp tables), and drops the temp table.
* An application repeatedly begins a transaction, calls the stored function, and commits the transaction.

With the v16 patch applied, and leaving the catalog_cache_xxx parameters set to their defaults, CacheMemoryContext continued to increase as follows:

CacheMemoryContext: 1065016 total in 9 blocks; 104168 free (17 chunks); 960848 used
CacheMemoryContext: 8519736 total in 12 blocks; 3765504 free (19 chunks); 4754232 used
CacheMemoryContext: 25690168 total in 14 blocks; 8372096 free (21 chunks); 17318072 used
CacheMemoryContext: 42991672 total in 16 blocks; 11741024 free (21761 chunks); 31250648 used

How can I make sure that this context won't exceed, say, 10 MB to avoid OOM? I'm afraid that once the catcache hash table becomes large in a short period, eviction would happen less frequently, leading to memory bloat.

Regards
Takayuki Tsunakawa
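[Editorial note: as a side note for readers on current releases — since PostgreSQL 14 the built-in pg_backend_memory_contexts view (independent of the patches in this thread) can show this kind of growth for the current session; column names here are as of PostgreSQL 14, and may differ in later versions:]

```sql
-- Watch catalog cache memory in the current session (PostgreSQL 14+).
-- Run before and after the workload loop to see whether it keeps growing.
SELECT name, ident, total_bytes, used_bytes
FROM pg_backend_memory_contexts
WHERE name = 'CacheMemoryContext' OR parent = 'CacheMemoryContext'
ORDER BY total_bytes DESC;
```

This requires a live server session, so it is shown only as a session fragment; it reports the same numbers as the CacheMemoryContext dump lines above, without attaching a debugger.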
RE: Protect syscache from bloating with negative cache entries
From: Robert Haas [mailto:robertmh...@gmail.com]
> I don't understand the idea that we would add something to PostgreSQL
> without proving that it has value. Sure, other systems have somewhat
> similar systems, and they have knobs to tune them. But, first, we
> don't know that those other systems made all the right decisions, and
> second, even if they are, that doesn't mean that we'll derive similar
> benefits in a system with a completely different code base and many
> other internal differences.

I understand that general idea. So I don't see why the proposed approach, eviction based only on elapsed time and only at hash table expansion, is better given PostgreSQL's code base and other internal differences...

> You need to demonstrate that each and every GUC you propose to add has
> a real, measurable benefit in some plausible scenario. You can't just
> argue that other people have something kinda like this so we should
> have it too. Or, well, you can argue that, but if you do, then -1
> from me.

The benefits of the size limit are:

* Controllable and predictable memory usage. The DBA can be sure that OOM won't happen.
* Smoothed (non-abnormal) transaction response times, due to the elimination of bulk eviction of cache entries.

I'm not sure how to tune catalog_cache_prune_min_age and catalog_cache_memory_target. Let me pick up a test scenario in a later mail in response to Horiguchi-san.

Regards
Takayuki Tsunakawa
Re: Protect syscache from bloating with negative cache entries
At Mon, 25 Feb 2019 15:23:22 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20190225.152322.104148315.horiguchi.kyot...@lab.ntt.co.jp>
> I think the two parameters are to be tuned in the following
> steps.
>
> - If the default setting satisfies you, leave it alone. (as a
> general suggestion)
>
> - If you find your (syscache-sensitive) queries are executed
> with rather long intervals, say 10-30 minutes, and they get
> slower than with shorter intervals, consider increasing
> catalog_cache_prune_min_age to about the query interval. If you
> don't suffer process bloat, that's fine.
>
> - If you find the process "bloat"s too much and you (intuitively)
> suspect the cause is the system cache, set it to a certain shorter
> value, say 1 minute, and set catalog_cache_memory_target
> to the allowable amount of memory for each process. The memory
> usage will be stable at an (un)certain amount above the target.
>
> Or, if you want to determine the setting beforehand with a rather
> strict limit, and if the monitoring feature were a part of this
> patchset, a user could check how much memory is used for the query.
>
> $ perl -e 'print "set track_catalog_cache_usage_interval = 1000;\n"; for (0..) { print "CREATE TABLE foo$_ PARTITION OF foo FOR VALUES WITH (MODULUS 1, REMAINDER $_);\n"; } print "select sum(size) from pg_stat_syscache";' | psql
>
>    sum
> -
>  7088523

It's not substantial, but that number is for catalog_cache_prune_min_age = 300s; with pruning disabled I had 12MB:

perl -e 'print "set catalog_cache_prune_min_age to 0; set track_catalog_cache_usage_interval = 1000;\n"; for (0..) { print "CREATE TABLE foo$_ PARTITION OF foo FOR VALUES WITH (MODULUS 1, REMAINDER $_);\n"; } print "select sum(size) from pg_stat_syscache";' | psql

   sum
--
 12642321

> In this case, set catalog_cache_memory_target to 7MB and
> catalog_cache_prune_min_age to '1min'. Since the target doesn't
> work strictly (checked only at every resizing time), possibly
> you need further tuning.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Protect syscache from bloating with negative cache entries
At Wed, 20 Feb 2019 13:09:08 -0500, Robert Haas wrote in
> On Tue, Feb 19, 2019 at 11:15 PM Kyotaro HORIGUCHI wrote:
> > Difference from v15:
> >
> > Removed AllocSet accounting stuff. We use approximate memory
> > size for catcache.
> >
> > Removed prune-by-number(or size) stuff.
> >
> > Addressing comments from Tsunakawa-san and Ideriha-san.
> >
> > Separated catcache monitoring feature. (Removed from this set)
> > (But it is crucial to check this feature...)
> >
> > Is this small enough?
>
> The commit message in 0002 says 'This also can put a hard limit on the
> number of catcache entries.' but neither of the GUCs that you've
> documented have that effect. Is that a leftover from a previous
> version?

Mmm, right. Thank you for pointing that out, and sorry about it. I fixed it, along with another mistake in the commit message, in my repo. It will appear in the next version.

| Remove entries that haven't been used for a certain time
|
| Catcache entries can be left alone for several reasons. It is not
| desirable that they eat up memory. With this patch, entries that
| haven't been used for a certain time are considered to be removed
| before enlarging the hash array.

> I'd like to see some evidence that catalog_cache_memory_target has any
> value, vs. just always setting it to zero. I came up with the
> following somewhat artificial example that shows that it might have
> value.
>
> rhaas=# create table foo (a int primary key, b text) partition by hash (a);
> [rhaas pgsql]$ perl -e 'for (0..) { print "CREATE TABLE foo$_ PARTITION OF foo FOR VALUES WITH (MODULUS 1, REMAINDER $_);\n"; }' | psql
>
> First execution of 'select * from foo' in a brand new session takes
> about 1.9 seconds; subsequent executions take about 0.7 seconds. So,
> if catalog_cache_memory_target were set to a high enough value to
> allow all of that stuff to remain in cache, we could possibly save
> about 1.2 seconds coming off the blocks after a long idle period.
> That might be enough to justify having the parameter. But I'm not
> quite sure how high the value would need to be set to actually get the
> benefit in a case like that, or what happens if you set it to a value
> that's not quite high enough.

It is artificial (and actually won't be repeatedly executed in a session), but anyway, whatever can get a benefit from catalog_cache_memory_target would be a kind of extreme case.

I think the two parameters are to be tuned in the following steps.

- If the default setting satisfies you, leave it alone. (as a general suggestion)

- If you find your (syscache-sensitive) queries are executed with rather long intervals, say 10-30 minutes, and they get slower than with shorter intervals, consider increasing catalog_cache_prune_min_age to about the query interval. If you don't suffer process bloat, that's fine.

- If you find the process "bloat"s too much and you (intuitively) suspect the cause is the system cache, set it to a certain shorter value, say 1 minute, and set catalog_cache_memory_target to the allowable amount of memory for each process. The memory usage will be stable at an (un)certain amount above the target.

Or, if you want to determine the setting beforehand with a rather strict limit, and if the monitoring feature were a part of this patchset, a user could check how much memory is used for the query.

$ perl -e 'print "set track_catalog_cache_usage_interval = 1000;\n"; for (0..) { print "CREATE TABLE foo$_ PARTITION OF foo FOR VALUES WITH (MODULUS 1, REMAINDER $_);\n"; } print "select sum(size) from pg_stat_syscache";' | psql

   sum
-
 7088523

In this case, set catalog_cache_memory_target to 7MB and catalog_cache_prune_min_age to '1min'. Since the target doesn't work strictly (checked only at every resizing time), possibly you need further tuning.

> that's not quite high enough. I think it might be good to play around
> some more with cases like this, just to get a feeling for how much
> time you can save in exchange for how much memory.
All kinds of tuning are something of that kind, I think.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Protect syscache from bloating with negative cache entries
On Thu, Feb 21, 2019 at 1:38 AM Tsunakawa, Takayuki wrote:
> Why don't we consider this just like the database cache and other DBMS's
> dictionary caches? That is,
>
> * If you want to avoid infinite memory bloat, set the upper limit on size.
>
> * To find a better limit, check the hit ratio with the statistics view (based
> on Horiguchi-san's original 0004 patch, although that needs modification
> anyway)
>
> Why do people try to get away from a familiar idea... Am I missing something?

I don't understand the idea that we would add something to PostgreSQL without proving that it has value. Sure, other systems have somewhat similar systems, and they have knobs to tune them. But, first, we don't know that those other systems made all the right decisions, and second, even if they are, that doesn't mean that we'll derive similar benefits in a system with a completely different code base and many other internal differences.

You need to demonstrate that each and every GUC you propose to add has a real, measurable benefit in some plausible scenario. You can't just argue that other people have something kinda like this so we should have it too. Or, well, you can argue that, but if you do, then -1 from me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
RE: Protect syscache from bloating with negative cache entries
>From: Tsunakawa, Takayuki
>Ideriha-san,
>Could you try simplifying the v15 patch set to see how simple the code would
>look or not? That is:
>
>* 0001: add dlist_push_tail() ... as is
>* 0002: memory accounting, with correction based on feedback
>* 0003: merge the original 0003 and 0005, with correction based on feedback

Attached is a simpler version based on Horiguchi-san's v15 patch, which means the cache is pruned by both time and size. (The cleanup function is still complex, but it gets much simpler.)

Regards,
Takeshi Ideriha

v15-0003-ideriha-Remove-CatCache-Entries.patch
Description: v15-0003-ideriha-Remove-CatCache-Entries.patch

binBLOINvJH0o.bin
Description: v15-0002-ideriha-Memory-consumption-report-reature-of-memorycontext.patch

v15-0001-Add-dlist_move_tail.patch
Description: v15-0001-Add-dlist_move_tail.patch
Re: Protect syscache from bloating with negative cache entries
On Tue, Feb 19, 2019 at 07:08:14AM +, Tsunakawa, Takayuki wrote: > We all have to manage things within resource constraints. The DBA > wants to make sure the server doesn't overuse memory to avoid crash > or slowdown due to swapping. Oracle does it, and another open source > database, MySQL, does it too. PostgreSQL does it with shared_buffers, > wal_buffers, and work_mem (within a single session). Then, I thought > it's natural to do it with catcache/relcache/plancache. I already addressed these questions in an email from Feb 14: https://www.postgresql.org/message-id/20190214154955.gb19...@momjian.us I understand the operational needs of limiting resources in some cases, but there is also the history of OS's using working set to allocate things, which didn't work too well: https://en.wikipedia.org/wiki/Working_set I think we need to address the most pressing problem of unlimited cache size bloat and then take a holistic look at all memory allocation. If we are going to address that in a global way, I don't see the relation cache as the place to start. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
RE: Protect syscache from bloating with negative cache entries
From: Robert Haas [mailto:robertmh...@gmail.com] > That might be enough to justify having the parameter. But I'm not > quite sure how high the value would need to be set to actually get the > benefit in a case like that, or what happens if you set it to a value > that's not quite high enough. I think it might be good to play around > some more with cases like this, just to get a feeling for how much > time you can save in exchange for how much memory. Why don't we consider this just like the database cache and other DBMS's dictionary caches? That is, * If you want to avoid infinite memory bloat, set the upper limit on size. * To find a better limit, check the hit ratio with the statistics view (based on Horiguchi-san's original 0004 patch, although that seems modification anyway) Why do people try to get away from a familiar idea... Am I missing something? Ideriha-san, Could you try simplifying the v15 patch set to see how simple the code would look or not? That is: * 0001: add dlist_push_tail() ... as is * 0002: memory accounting, with correction based on feedback * 0003: merge the original 0003 and 0005, with correction based on feedback Regards Takayuki Tsunakawa
RE: Protect syscache from bloating with negative cache entries
From: Ideriha, Takeshi/出利葉 健
> I checked it with perf record -avg and perf report.
> The following shows top 20 symbols during benchmark including kernel space.
> The main difference between master (unpatched) and patched one seems that
> patched one consumes cpu in catcache-evict-and-refill functions including
> SearchCatCacheMiss(), CatalogCacheCreateEntry(),
> CatCacheCleanupOldEntries().
> So it seems to me that these functions need further inspection
> to suppress the performance decline as much as possible

Thank you. It's good to see the expected functions, rather than strange behavior. The performance drop is natural, just as when the database cache's hit ratio is low. The remedy for performance by the user is also the same as for the database cache -- increase the catalog cache.

Regards
Takayuki Tsunakawa
RE: Protect syscache from bloating with negative cache entries
>From: Tsunakawa, Takayuki
>>From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
>> number of tables | 100         | 1000        | 1
>> ---------------------------------------------------------
>> TPS (master)     | 10966       | 10654       | 9099
>> TPS (patch)      | 11137 (+1%) | 10710 (+0%) | 772 (-91%)
>>
>> It seems that before cache exceeding the limit (no pruning at 100 and
>> 1000), the results are almost same with master but after exceeding the
>> limit (at
>> 1)
>> the decline happens.
>
>How many concurrent clients?

One client (default setting).

>Can you show the perf's call graph sampling profiles of both the unpatched and
>patched version, to confirm that the bottleneck is around catcache eviction
>and refill?

I checked it with perf record -avg and perf report. The following shows the top 20 symbols during the benchmark, including kernel space. The main difference between master (unpatched) and the patched one seems to be that the patched one consumes cpu in catcache-evict-and-refill functions including SearchCatCacheMiss(), CatalogCacheCreateEntry(), CatCacheCleanupOldEntries(). So it seems to me that these functions need further inspection to suppress the performance decline as much as possible.

 (%) master                       | (%) patch
 51.25% cpu_startup_entry         | 51.45% cpu_startup_entry
 51.13% arch_cpu_idle             | 51.19% arch_cpu_idle
 51.13% default_idle              | 51.19% default_idle
 51.13% native_safe_halt          | 50.95% native_safe_halt
 36.27% PostmasterMain            | 46.98% PostmasterMain
 36.27% main                      | 46.98% main
 36.27% __libc_start_main         | 46.98% __libc_start_main
 36.07% ServerLoop                | 46.93% ServerLoop
 35.75% PostgresMain              | 46.89% PostgresMain
 26.03% exec_simple_query         | 45.99% exec_simple_query
 26.00% rest_init                 | 43.40% SearchCatCacheMiss
 26.00% start_kernel              | 42.80% CatalogCacheCreateEntry
 26.00% x86_64_start_reservations | 42.75% CatCacheCleanupOldEntries
 26.00% x86_64_start_kernel       | 27.04% rest_init
 25.26% start_secondary           | 27.04% start_kernel
 10.25% pg_plan_queries           | 27.04% x86_64_start_reservations
 10.17% pg_plan_query             | 27.04% x86_64_start_kernel
 10.16% main                      | 24.42% start_secondary
 10.16% __libc_start_main         | 22.35% pg_analyze_and_rewrite
 10.03% standard_planner          | 22.35% parse_analyze

Regards,
Takeshi Ideriha
Re: Protect syscache from bloating with negative cache entries
On Tue, Feb 19, 2019 at 11:15 PM Kyotaro HORIGUCHI wrote:
> Difference from v15:
>
> Removed AllocSet accounting stuff. We use approximate memory
> size for catcache.
>
> Removed prune-by-number(or size) stuff.
>
> Adressing comments from Tsunakawa-san and Ideriha-san .
>
> Separated catcache monitoring feature. (Removed from this set)
> (But it is crucial to check this feature...)
>
> Is this small enough ?

The commit message in 0002 says 'This also can put a hard limit on the number of catcache entries.' but neither of the GUCs that you've documented have that effect. Is that a leftover from a previous version?

I'd like to see some evidence that catalog_cache_memory_target has any value, vs. just always setting it to zero. I came up with the following somewhat artificial example that shows that it might have value.

rhaas=# create table foo (a int primary key, b text) partition by hash (a);

[rhaas pgsql]$ perl -e 'for (0..) { print "CREATE TABLE foo$_ PARTITION OF foo FOR VALUES WITH (MODULUS 1, REMAINDER $_);\n"; }' | psql

First execution of 'select * from foo' in a brand new session takes about 1.9 seconds; subsequent executions take about 0.7 seconds. So, if catalog_cache_memory_target were set to a high enough value to allow all of that stuff to remain in cache, we could possibly save about 1.2 seconds coming off the blocks after a long idle period. That might be enough to justify having the parameter. But I'm not quite sure how high the value would need to be set to actually get the benefit in a case like that, or what happens if you set it to a value that's not quite high enough. I think it might be good to play around some more with cases like this, just to get a feeling for how much time you can save in exchange for how much memory.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Protect syscache from bloating with negative cache entries
At Thu, 14 Feb 2019 00:40:10 -0800, Andres Freund wrote in <20190214084010.bdn6tmba2j7sz...@alap3.anarazel.de>
> Hi,
>
> On 2019-02-13 15:31:14 +0900, Kyotaro HORIGUCHI wrote:
> > Instead, I added an accounting(?) interface function.
> >
> > | MemoryContextGettConsumption(MemoryContext cxt);
> >
> > The API returns the current consumption in this memory
> > context. This allows "real" memory accounting almost without
> > overhead.
>
> That's definitely *NOT* almost without overhead. This adds additional
> instructions to one postgres' hottest set of codepaths.

I'm not sure how much the two instructions in AllocSetAlloc actually impact, but I agree that it is doubtful that the size-limit feature is worth the possible slowdown to any extent.

# I faintly remember that I tried the same thing before..

> I think you're not working incrementally enough here. I strongly suggest
> solving the negative cache entry problem, and then incrementally go from
> there after that's committed. The likelihood of this patch ever getting
> merged otherwise seems extremely small.

Mmm. Scoping to the negcache problem, my very first patch posted two years ago does that based on invalidation for pg_statistic and pg_class, like I think Tom has suggested somewhere in this thread.

https://www.postgresql.org/message-id/20161219.201505.11562604.horiguchi.kyot...@lab.ntt.co.jp

This is a completely different approach from the current shape and it would be useless after pruning is introduced. So I'd like to go for the generic pruning by age.

Difference from v15:

- Removed AllocSet accounting stuff. We use approximate memory size for catcache.
- Removed prune-by-number (or size) stuff.
- Addressing comments from Tsunakawa-san and Ideriha-san.
- Separated catcache monitoring feature. (Removed from this set) (But it is crucial to check this feature...)

Is this small enough?

regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center

From 191496e02abd4d7b261705e8d2a0ef4aed5827c7 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu, 7 Feb 2019 14:56:07 +0900
Subject: [PATCH 1/2] Add dlist_move_tail

We have dlist_push_head/tail and dlist_move_head but not
dlist_move_tail. Add it.
---
 src/include/lib/ilist.h | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index b1a5974ee4..659ab1ac87 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
--
2.16.3

From 59f53da08abb70398611b33f635b46bda87a7534 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Tue, 16 Oct 2018 13:04:30 +0900
Subject: [PATCH 2/2] Remove entries that haven't been used for a certain time

Catcache entries can be left alone for several reasons. It is not
desirable that they eat up memory. This adds consideration of removal
of entries that haven't been used for a certain time before enlarging
the hash array. This also can put a hard limit on the number of
catcache entries.
---
 doc/src/sgml/config.sgml                      |  40 +
 src/backend/tcop/postgres.c                   |  13 ++
 src/backend/utils/cache/catcache.c            | 243 --
 src/backend/utils/init/globals.c              |   1 +
 src/backend/utils/init/postinit.c             |  11 ++
 src/backend/utils/misc/guc.c                  |  23 +++
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/include/miscadmin.h                       |   1 +
 src/include/utils/catcache.h                  |  43 -
 src/include/utils/timeout.h                   |   1 +
 10 files changed, 364 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8bd57f376b..7a93aef659 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1661,6 +1661,46 @@ include_dir 'conf.d'
 
+
+      catalog_cache_prune_min_age (integer)
+
+      catalog_cache_prune_min_age configuration
+      parameter
+
+
+
+       Specifies the minimum amount of unused time in seconds at which a
+       system catalog cache entry is removed. -1 indicates that this feature
+       is disabled at all. The value defaults to 300 seconds (5
+       minutes). The catalog cache entries that are not used for
+       the duration can be removed to prevent it from
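The dlist_move_tail() added by patch 0001 above is the core of the LRU bookkeeping: touching an entry on every cache hit moves it to the tail, so the head end of the list is always the least recently used and pruning can sweep from there. Below is a stand-alone sketch of the same pointer manipulation on a minimal circular doubly linked list; `node`, `push_tail`, and `move_tail` are invented names standing in for PostgreSQL's ilist.h, shown only to illustrate the operation and its fast path.

```c
/* Toy circular doubly linked list mirroring the dlist_move_tail()
 * patch above. The head node is a sentinel; head->next is the least
 * recently used element, head->prev the most recently used. */
#include <assert.h>
#include <stddef.h>

typedef struct node { struct node *prev, *next; } node;

static void list_init(node *head) { head->prev = head->next = head; }

static void push_tail(node *head, node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void move_tail(node *head, node *n)
{
	/* fast path if it's already at the tail, as in the patch */
	if (head->prev == n)
		return;

	n->prev->next = n->next;	/* unlink from current position */
	n->next->prev = n->prev;
	push_tail(head, n);			/* relink as most recently used */
}
```

The fast path matters because in the steady state the most recently touched entry is often touched again immediately, making the move a no-op.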
RE: Protect syscache from bloating with negative cache entries
From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
> number of tables | 100         | 1000        | 1
> ---------------------------------------------------------
> TPS (master)     | 10966       | 10654       | 9099
> TPS (patch)      | 11137 (+1%) | 10710 (+0%) | 772 (-91%)
>
> It seems that before cache exceeding the limit (no pruning at 100 and 1000),
> the results are almost same with master but after exceeding the limit (at
> 1)
> the decline happens.

How many concurrent clients?

Can you show the perf's call graph sampling profiles of both the unpatched and patched version, to confirm that the bottleneck is around catcache eviction and refill?

Regards
Takayuki Tsunakawa
RE: Protect syscache from bloating with negative cache entries
>From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
>But at the same time, I did some benchmark with only hard limit option enabled and
>time-related option disabled, because the figures of this case are not provided in this
>thread. So let me share it.

I'm sorry, but I'm taking back the result about the patch and correcting it. I configured postgresql (master) with only CFLAGS=-O2, but I misconfigured postgres (patch applied) with --enable-cassert --enable-debug --enable-tap-tests 'CFLAGS=-O0'. These debug options (especially --enable-cassert) caused enormous overhead. (I thought I checked the configure options.. I was maybe tired.) So I changed these to only 'CFLAGS=-O2' and re-measured them.

>I did two experiments. One is to show negative cache bloat is suppressed.
>This thread originated from the issue that negative cache of pg_statistics is bloating as
>creating and dropping temp table is repeatedly executed.
>https://www.postgresql.org/message-id/20161219.201505.11562604.horiguchi.kyotaro%40lab.ntt.co.jp
>Using the script attached to the first email in this thread, I repeated create and drop
>temp table at 1 times.
>(experiment is repeated 5 times. catalog_cache_max_size = 500kB.
> compared master branch and patch with hard memory limit)
>
>Here are TPS and CacheMemoryContext 'used' memory (total - freespace) calculated
>by MemoryContextPrintStats() at 100, 1000, 1 times of create-and-drop
>transaction. The result shows cache bloating is suppressed after exceeding the limit
>(at 1) but tps declines regardless of the limit.
>
>number of tx (create and drop)   | 100    | 1000    | 1
>---------------------------------------------------------
>used CacheMemoryContext (master) | 610296 | 2029256 | 15909024
>used CacheMemoryContext (patch)  | 755176 | 880552  | 880592
>---------------------------------------------------------
>TPS (master) | 414 | 407 | 399
>TPS (patch)  | 242 | 225 | 220

Correct one:

number of tx (create and drop) | 100 | 1000 | 1
-----------------------------------------------
TPS (master) | 414 | 407 | 399
TPS (patch)  | 447 | 415 | 409

The results between master and patch are almost the same.
>Another experiment is using Tomas's script posted a while ago. The scenario is to do
>select 1 from multiple tables randomly (uniform distribution).
>(experiment is repeated 5 times. catalog_cache_max_size = 10MB.
> compared master branch and patch with only hard memory limit enabled)
>
>Before doing the benchmark, I checked pruning happened only at 1 tables using a
>debug option. The result shows degradation regardless of before or after pruning.
>I personally still need the hard size limitation, but I'm surprised that the difference is
>so significant.
>
>number of tables | 100   | 1000  | 1
>--------------------------------------
>TPS (master)     | 10966 | 10654 | 9099
>TPS (patch)      | 4491  | 2099  | 378

Correct one:

number of tables | 100         | 1000        | 1
-------------------------------------------------
TPS (master)     | 10966       | 10654       | 9099
TPS (patch)      | 11137 (+1%) | 10710 (+0%) | 772 (-91%)

It seems that before the cache exceeds the limit (no pruning at 100 and 1000), the results are almost the same as master, but after exceeding the limit (at 1) the decline happens.

Regards,
Takeshi Ideriha
Re: Protect syscache from bloating with negative cache entries
On 2/19/19 12:43 AM, Tsunakawa, Takayuki wrote:
> Hi Horiguchi-san,
>
> I've looked through your patches. This is the first part of my review
> results. Let me post the rest after another work today.
>
> BTW, how about merging 0003 and 0005, and separating and deferring 0004 in
> another thread? That may help to relieve other community members by making
> this patch set not so large and complex.
>
> [Bottleneck investigation]
> Ideriha-san and I are trying to find the bottleneck. My first try shows
> there's little overhead. Here's what I did:
>
> shared_buffers = 1GB
> catalog_cache_prune_min_age = -1
> catalog_cache_max_size = 10MB
>
> $ pgbench -i -s 10
> $ pg_ctl stop and then start
> $ cache all data in shared buffers by running pg_prewarm on branches,
> tellers, accounts, and their indexes
> $ pgbench --select-only -c 1 -T 60
>
> master : 8612 tps
> patched: 8553 tps (-0.7%)
>
> There's little (0.7%) performance overhead with:
> * one additional dlist_move_tail() in every catcache access
> * memory usage accounting in operations other than catcache access (relevant
> catcache entries should be cached in the first pgbench transaction)
>
> I'll check other patterns to find out how big overhead there is.

0.7% may easily be just noise, possibly due to differences in layout of the binary. How many runs? What was the variability of the results between runs? What hardware was this tested on?

FWIW I doubt tests with such a small schema prove anything - the cache/lists are likely tiny. That's why I tested with a much larger number of relations.

regards

--
Tomas Vondra
http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: Protect syscache from bloating with negative cache entries
From: 'Bruce Momjian' [mailto:br...@momjian.us]
> I think, in general, smaller is better, as long as making something
> smaller doesn't remove data that is frequently accessed. Having a timer
> to expire only old entries seems like it accomplished this goal.
>
> Having a minimum size and not taking it to zero size makes sense if we
> know we will need certain entries like pg_class in the next query.
> However, if the session is idle for hours, we should just probably
> remove everything, so maybe the minimum doesn't make sense --- just
> remove everything.

That's another interesting idea. A somewhat relevant feature is Oracle's "ALTER SYSTEM FLUSH SHARED_POOL". It flushes all dictionary cache, library cache, and SQL plan entries. The purpose is different: not to release memory, but to defragment the shared memory.

> I don't think other DBMSs are a good model since they have a reputation
> for requiring a lot of tuning --- tuning that we have often automated.

Yeah, I agree that PostgreSQL is easier to use in many aspects. On the other hand, although I hesitate to say this (please don't get upset...), I feel PostgreSQL is a bit too loose about memory usage. To my memory, PostgreSQL has crashed the OS due to OOM in our user environments:

* Creating and dropping temp tables repeatedly in a stored PL/pgSQL function. This results in infinite CacheMemoryContext bloat. This is referred to at the beginning of this mail thread. Oracle and MySQL can limit the size of the dictionary cache.

* Each pair of SAVEPOINT/RELEASE leaves 8KB of CurTransactionContext. The customer used psqlODBC to run a batch app, which ran millions of SQL statements in a transaction. psqlODBC wraps each SQL statement with SAVEPOINT and RELEASE by default. I guess this is what caused the crash of AWS Aurora on last year's Amazon Prime Day.

* Setting a large value for work_mem, and then running many concurrent large queries. Oracle can limit the total size of all sessions' memory with the PGA_AGGREGATE_TARGET parameter.
We all have to manage things within resource constraints. The DBA wants to make sure the server doesn't overuse memory to avoid crash or slowdown due to swapping. Oracle does it, and another open source database, MySQL, does it too. PostgreSQL does it with shared_buffers, wal_buffers, and work_mem (within a single session). Then, I thought it's natural to do it with catcache/relcache/plancache. Regards Takayuki Tsunakawa
RE: Protect syscache from bloating with negative cache entries
Hi Horiguchi-san,

This is the rest of my review comments.

(5) patch 0003

CatcacheClockTimeoutPending = 0;
+
+	/* Update timetamp then set up the next timeout */
+

false is better than 0, to follow other **Pending variables.

timetamp -> timestamp

(6) patch 0003

GetCatCacheClock() is not used now. Why don't we add it when the need arises?

(7) patch 0003

Why don't we remove the catcache timer (Setup/UpdateCatCacheClockTimer), unless we need it by all means? That simplifies the code. Long-running queries can be thought of as follows:

* A single lengthy SQL statement, e.g. SELECT for reporting/analytics, COPY for data loading, and UPDATE/DELETE for batch processing, should only require a small number of catalog entries during its query analysis/planning. It won't suffer from cache eviction during query execution.

* We do not have to evict cache entries while executing a long-running stored procedure, because its constituent SQL statements may access the same tables. If the stored procedure accesses so many tables that you are worried about catcache memory overuse, then catalog_cache_max_size can be used. Another natural idea would be to update the cache clock when SPI executes each SQL statement.

(8) patch 0003

+	uint64	base_size;
+	uint64	base_size = MemoryContextGetConsumption(CacheMemoryContext);

This may as well also be Size, not uint64.

(9) patch 0003

@@ -1940,7 +2208,7 @@ CatCacheFreeKeys(TupleDesc tupdesc, int nkeys, int *attnos, Datum *keys)
 /*
  * Helper routine that copies the keys in the srckeys array into the dstkeys
  * one, guaranteeing that the datums are fully allocated in the current memory
- * context.
+ * context. Returns allocated memory size.
  */
 static void
 CatCacheCopyKeys(TupleDesc tupdesc, int nkeys, int *attnos,
@@ -1976,7 +2244,6 @@ CatCacheCopyKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 				   att->attbyval, att->attlen);
 	}
-
 }

This change seems to be no longer necessary thanks to the memory accounting.
(10) patch 0004

How about separating this into another thread, so that the rest of the patch set becomes easier to review and commit?

Regarding the design, I'm inclined to avoid each backend writing the file. To simplify the code, I think we can take advantage of the fortunate situation -- the number of backends and catcaches is fixed at server startup. My rough sketch is:

* Allocate an array of statistics entries in shared memory, whose element is (pid or backend id, catcache id or name, hits, misses, ...). The number of array elements is MaxBackends * number of catcaches (some dozens).

* Each backend updates its own entry in the shared memory during query execution.

* The stats collector periodically scans the array and writes it to the stats file.

(11) patch 0005

+dlist_head	cc_lru_list = {0};
+Size		global_size = 0;

It is better to put these in CatCacheHeader. That way, backends that do not access the catcache (archiver, stats collector, etc.) do not have to waste memory for these global variables.

(12) patch 0005

+	else if (catalog_cache_max_size > 0 &&
+			 global_size > catalog_cache_max_size * 1024)
		CatCacheCleanupOldEntries(cache);

On the second line, catalog_cache_max_size should be cast to Size to avoid overflow.

(13) patch 0005

+	gettext_noop("Sets the maximum size of catcache in kilobytes."),

catcache -> catalog cache

(14) patch 0005

+	CatCache   *owner;			/* owner catcache */

CatCTup already has a my_cache member.

(15) patch 0005

	if (nremoved > 0)
		elog(DEBUG1, "pruning catalog cache id=%d for %s: removed %d / %d",
			 cp->id, cp->cc_relname, nremoved, nelems_before);

In the prune-by-size case, this elog doesn't print very meaningful data. How about dividing this function into two, one for prune-by-age and another for prune-by-size? I suppose that would make the functions easier to understand.

Regards
Takayuki Tsunakawa
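The shared-memory layout sketched in item (10) can be illustrated with a toy model: one flat array of MaxBackends x number-of-catcaches rows, where each backend writes only its own row, so no locking is needed for updates. Everything below (type and function names, the fixed sizes) is an assumption made for illustration, not actual PostgreSQL code.

```c
/* Toy model of the per-backend, per-catcache statistics array
 * proposed above. In PostgreSQL this array would be allocated in
 * shared memory at startup; here it is a plain static array. */
#include <assert.h>
#include <stddef.h>

#define MAX_BACKENDS  8			/* stand-in for MaxBackends */
#define NUM_CATCACHES 4			/* number of catcaches, fixed at startup */

typedef struct CatCacheStatsEntry
{
	long	hits;
	long	misses;
} CatCacheStatsEntry;

/* One row per (backend, catcache) pair, zero-initialized. */
static CatCacheStatsEntry stats_array[MAX_BACKENDS * NUM_CATCACHES];

/* Each backend indexes straight to its own row; no other backend
 * ever writes it, which is why unsynchronized updates are safe. */
static CatCacheStatsEntry *
my_stats(int backend_id, int cache_id)
{
	return &stats_array[backend_id * NUM_CATCACHES + cache_id];
}
```

A periodic reader (the stats collector in the sketch above) can scan the whole array without coordinating with writers, at the cost of possibly seeing slightly stale counters.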
RE: Protect syscache from bloating with negative cache entries
Hi Horiguchi-san,

I've looked through your patches. This is the first part of my review results. Let me post the rest after another work today.

BTW, how about merging 0003 and 0005, and separating and deferring 0004 in another thread? That may help to relieve other community members by making this patch set not so large and complex.

[Bottleneck investigation]
Ideriha-san and I are trying to find the bottleneck. My first try shows there's little overhead. Here's what I did:

shared_buffers = 1GB
catalog_cache_prune_min_age = -1
catalog_cache_max_size = 10MB

$ pgbench -i -s 10
$ pg_ctl stop and then start
$ cache all data in shared buffers by running pg_prewarm on branches, tellers, accounts, and their indexes
$ pgbench --select-only -c 1 -T 60

master : 8612 tps
patched: 8553 tps (-0.7%)

There's little (0.7%) performance overhead with:
* one additional dlist_move_tail() in every catcache access
* memory usage accounting in operations other than catcache access (relevant catcache entries should be cached in the first pgbench transaction)

I'll check other patterns to find out how big the overhead is.

[Source code review]
Below are my findings on the patch set v15:

(1) patch 0001

All right.

(2) patch 0002

@@ -87,6 +87,7 @@ typedef struct MemoryContextData
 	const char *name;			/* context name (just for debugging) */
 	const char *ident;			/* context ID if any (just for debugging) */
 	MemoryContextCallback *reset_cbs;	/* list of reset/delete callbacks */
+	uint64		consumption;	/* accumulates consumed memory size */
 } MemoryContextData;

Size is more appropriate as a data type than uint64 because other places use Size for memory size variables.

How about "usedspace" instead of "consumption"? Because that aligns better with the naming used for MemoryContextCounters's member variables, totalspace and freespace.
(3) patch 0002

+			context->consumption += chunk_size;

(and similar sites)

The used space should include the size of the context-type-specific chunk header, so that the count is closer to the actual memory size seen by the user.

Here, let's make consensus on what the used space represents. Is it either of the following?

a) The total space allocated from the OS, i.e., the sum of the malloc()ed regions for a given memory context.

b) The total space of all chunks, including their headers, of a given memory context.

a) is better because that's the actual memory usage from the DBA's standpoint. But a) cannot be used because CacheMemoryContext is used for various things. So we have to compromise on b). Is this OK?

One possible future improvement is to use a separate memory context exclusively for the catcache, which is a child of CacheMemoryContext. That way, we can adopt a).

(4) patch 0002

@@ -614,6 +614,9 @@ AllocSetReset(MemoryContext context)
+	set->header.consumption = 0;

This can be put in MemoryContextResetOnly() instead of context-type-specific reset functions.

Regards
Takayuki Tsunakawa
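Mechanically, the accounting being reviewed here amounts to a counter bumped on every allocation and zeroed on reset. The sketch below is a stand-alone simplification: CountingContext, ctx_alloc, and ctx_reset are invented names, and the real 0002 patch instruments AllocSetAlloc/AllocSetReset instead of wrapping malloc. It also counts only the requested chunk size, i.e. it ignores the chunk-header question raised in (3).

```c
/* Stand-alone model of per-context memory accounting: one counter
 * incremented on the allocation hot path, reset to zero with the
 * context. Illustrative only, not PostgreSQL's mmgr code. */
#include <assert.h>
#include <stdlib.h>

typedef struct CountingContext
{
	size_t	consumption;		/* total chunk bytes handed out */
} CountingContext;

static void *
ctx_alloc(CountingContext *ctx, size_t size)
{
	void   *p = malloc(size);

	if (p != NULL)
		ctx->consumption += size;	/* the one extra hot-path addition */
	return p;
}

static void
ctx_reset(CountingContext *ctx)
{
	/* A real reset would also free the chunks; here we model only the
	 * counter behaviour of AllocSetReset in the patch. */
	ctx->consumption = 0;
}
```

Even this minimal form shows why Andres objected upthread: the increment sits on every allocation, so the cost is paid whether or not any size limit is ever enforced.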
Re: Protect syscache from bloating with negative cache entries
On 2019-Feb-15, Tomas Vondra wrote:
> ISTM there's a couple of ways to deal with that:
>
> 1) Ignore the memory amounts entirely, and do just time-based eviction.
>
> 2) If we want some size thresholds (e.g. to disable eviction for
> backends with small caches etc.) use the number of entries instead. I
> don't think that's particularly worse than specifying size in MB.

Why is there a *need* for size-based eviction? Seems that time-based should be sufficient. Is the proposed approach to avoid eviction at all until the size threshold has been reached? I'm not sure I see the point of that.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: Protect syscache from bloating with negative cache entries
From: Tomas Vondra [mailto:tomas.von...@2ndquadrant.com] > I think "catalog_cache_..." is fine. If we end up with a similar > patchfor relcache, we can probably call it "relation_cache_". Agreed, those are not too long or too short, and they are sufficiently descriptive. Regards Takayuki Tsunakawa
Re: Protect syscache from bloating with negative cache entries
On 2/14/19 4:49 PM, 'Bruce Momjian' wrote: > On Thu, Feb 14, 2019 at 01:31:49AM +, Tsunakawa, Takayuki wrote: >> From: Bruce Momjian [mailto:br...@momjian.us] That being said, having a "minimal size" threshold before starting with the time-based eviction may be a good idea. >>> >>> Agreed. I see the minimal size as a way to keep the systems tables >>> in cache, which we know we will need for the next query. >> >> Isn't it the maximum size, not minimal size? Maximum size allows >> to keep desired amount of system tables in memory as well as to >> control memory consumption to avoid out-of-memory errors (OS crash!). >> I'm wondering why people want to take a different approach to >> catcatch, which is unlike other PostgreSQL memory e.g. shared_buffers, >> temp_buffers, SLRU buffers, work_mem, and other DBMSs. > > Well, that is an _excellent_ question, and one I had to think about. > I think we're talking about two different concepts here: 1) minimal size - We don't do any extra eviction at all, until we reach this cache size. So we don't get any extra overhead from it. If a system does not have issues. 2) maximal size - We ensure the cache size is below this threshold. If there's more data, we evict enough entries to get below it. My proposal is essentially to do just (1), so the cache can grow very large if needed but then it shrinks again after a while. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Protect syscache from bloating with negative cache entries
On 2/14/19 3:46 PM, Bruce Momjian wrote:
> On Thu, Feb 14, 2019 at 12:40:10AM -0800, Andres Freund wrote:
>> Hi,
>>
>> On 2019-02-13 15:31:14 +0900, Kyotaro HORIGUCHI wrote:
>>> Instead, I added an accounting(?) interface function.
>>>
>>> | MemoryContextGettConsumption(MemoryContext cxt);
>>>
>>> The API returns the current consumption in this memory
>>> context. This allows "real" memory accounting almost without
>>> overhead.
>>
>> That's definitely *NOT* almost without overhead. This adds additional
>> instructions to one postgres' hottest set of codepaths.
>>
>> I think you're not working incrementally enough here. I strongly suggest
>> solving the negative cache entry problem, and then incrementally go from
>> there after that's committed. The likelihood of this patch ever getting
>> merged otherwise seems extremely small.
>
> Agreed --- the patch is going in the wrong direction.

I recall endless discussions about memory accounting in the "memory-bounded hash-aggregate" patch a couple of years ago, and the overhead was one of the main issues there. So yeah, trying to solve that problem here is likely to kill this patch (or at least significantly delay it).

ISTM there's a couple of ways to deal with that:

1) Ignore the memory amounts entirely, and do just time-based eviction.

2) If we want some size thresholds (e.g. to disable eviction for backends with small caches etc.) use the number of entries instead. I don't think that's particularly worse than specifying size in MB.

regards

--
Tomas Vondra
http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Protect syscache from bloating with negative cache entries
On 2/13/19 1:23 AM, Tsunakawa, Takayuki wrote:
> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
>> I'm at a loss what to call syscache for users. I think it is "catalog cache". The most basic component is called catcache, which is covered by the syscache layer; both of them are not revealed to users, and it is shown to users as "catalog cache".
>>
>> Do "catalog_cache_prune_min_age", "catalog_cache_memory_target", (if it exists) "catalog_cache_entry_limit" and "catalog_cache_prune_ratio" make sense?
>
> PostgreSQL documentation uses "system catalog" in its table of contents, so syscat_cache_xxx would be a bit more familiar? I'm for either catalog_ or syscat_, but what name shall we use for the relation cache? catcache and relcache have different element sizes and possibly different usage patterns, so they may as well have different parameters, just like MySQL does. If we follow that idea, then the name would be relation_cache_xxx. However, from the user's viewpoint, the relation cache is also created from the system catalog, like pg_class and pg_attribute...

I think "catalog_cache_..." is fine. If we end up with a similar patch for relcache, we can probably call it "relation_cache_...". I'd be OK even with "system_catalog_cache_..." - I don't think it's overly long (better to have a longer but descriptive name), and "syscat" just seems like an unnecessary abbreviation.

regards

-- Tomas Vondra http://www.2ndQuadrant.com
RE: Protect syscache from bloating with negative cache entries
From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
>> About the new global-size based eviction (2), cache entry creation becomes slow after the total size reaches the limit, since every new entry evicts one or more old (= not-recently-used) entries. Because of not needing knobs for each cache, it becomes far more realistic. So I added documentation of "catalog_cache_max_size" in 0005.
>
> Now I'm also trying to benchmark, which will be posted in another email.

According to recent comments by Andres and Bruce, maybe we should address negative cache bloat step by step, for example by reviewing Tom's patch. But at the same time, I did some benchmarking with only the hard limit option enabled and the time-related options disabled, because the figures for this case are not provided in this thread. So let me share them.

I did two experiments. One is to show that negative cache bloat is suppressed. This thread originated from the issue that the negative cache of pg_statistic bloats as creating and dropping a temp table is repeatedly executed. https://www.postgresql.org/message-id/20161219.201505.11562604.horiguchi.kyotaro%40lab.ntt.co.jp

Using the script attached to the first email in this thread, I repeated creating and dropping a temp table 10,000 times. (The experiment is repeated 5 times; catalog_cache_max_size = 500kB; the master branch is compared with the patch with the hard memory limit.)

Here are TPS and CacheMemoryContext 'used' memory (total - freespace), calculated by MemoryContextPrintStats(), at 100, 1000, and 10,000 create-and-drop transactions. The result shows cache bloating is suppressed after exceeding the limit (at 10,000), but tps declines regardless of the limit.
number of tx (create and drop)   |    100 |    1000 |    10000
---------------------------------+--------+---------+---------
used CacheMemoryContext (master) | 610296 | 2029256 | 15909024
used CacheMemoryContext (patch)  | 755176 |  880552 |   880592
---------------------------------+--------+---------+---------
TPS (master)                     |    414 |     407 |      399
TPS (patch)                      |    242 |     225 |      220

The other experiment uses Tomas's script posted a while ago. The scenario is to run 'select 1' against multiple tables chosen randomly (uniform distribution). (The experiment is repeated 5 times; catalog_cache_max_size = 10MB; the master branch is compared with the patch with only the hard memory limit enabled.)

Before doing the benchmark, I checked, using a debug option, that pruning happens only in the 10,000-table case. The result shows degradation regardless of whether pruning has started. I personally still need the hard size limitation, but I'm surprised that the difference is so significant.

number of tables |   100 |  1000 | 10000
-----------------+-------+-------+------
TPS (master)     | 10966 | 10654 |  9099
TPS (patch)      |  4491 |  2099 |   378

Regards, Takeshi Ideriha
Re: Protect syscache from bloating with negative cache entries
On Thu, Feb 14, 2019 at 01:31:49AM +0000, Tsunakawa, Takayuki wrote:
> From: Bruce Momjian [mailto:br...@momjian.us]
> > > That being said, having a "minimal size" threshold before starting with the time-based eviction may be a good idea.
> >
> > Agreed. I see the minimal size as a way to keep the system tables in cache, which we know we will need for the next query.
>
> Isn't it the maximum size, not minimal size? Maximum size allows us to keep the desired amount of system tables in memory as well as to control memory consumption to avoid out-of-memory errors (OS crash!). I'm wondering why people want to take a different approach to catcache, unlike other PostgreSQL memory, e.g. shared_buffers, temp_buffers, SLRU buffers, work_mem, and other DBMSs.

Well, that is an _excellent_ question, and one I had to think about.

I think, in general, smaller is better, as long as making something smaller doesn't remove data that is frequently accessed. Having a timer to expire only old entries seems like it accomplishes this goal. Having a minimum size and not taking it to zero makes sense if we know we will need certain entries like pg_class in the next query. However, if the session is idle for hours, we should probably just remove everything, so maybe the minimum doesn't make sense --- just remove everything.

As for why we don't do this with everything --- we can't do it with shared_buffers since we can't change its size while the server is running. For work_mem, we assume all the work_mem data is for the current query, and therefore frequently accessed. Also, work_mem is not memory we can just free if it is not used, since it contains intermediate results required by the current query. I think temp_buffers, since it can be resized in the session, actually could use a similar minimizing feature, though that would mean it behaves slightly differently from shared_buffers, and it might not be worth it.
Also, I assume the value of temp_buffers was mostly for use by the current query --- yes, it can be used for cross-query caching, but I am not sure if that is its primary purpose. I thought its goal was to prevent shared_buffers from being populated with temporary per-session buffers. I don't think other DBMSs are a good model since they have a reputation for requiring a lot of tuning --- tuning that we have often automated. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Protect syscache from bloating with negative cache entries
On Thu, Feb 14, 2019 at 12:40:10AM -0800, Andres Freund wrote:
> Hi,
>
> On 2019-02-13 15:31:14 +0900, Kyotaro HORIGUCHI wrote:
> > Instead, I added an accounting(?) interface function.
> >
> > | MemoryContextGettConsumption(MemoryContext cxt);
> >
> > The API returns the current consumption in this memory context. This allows "real" memory accounting almost without overhead.
>
> That's definitely *NOT* almost without overhead. This adds additional instructions to one of postgres' hottest sets of codepaths.
>
> I think you're not working incrementally enough here. I strongly suggest solving the negative cache entry problem, and then incrementally going from there after that's committed. The likelihood of this patch ever getting merged otherwise seems extremely small.

Agreed --- the patch is going in the wrong direction.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
Re: Protect syscache from bloating with negative cache entries
Hi,

On 2019-02-13 15:31:14 +0900, Kyotaro HORIGUCHI wrote:
> Instead, I added an accounting(?) interface function.
>
> | MemoryContextGettConsumption(MemoryContext cxt);
>
> The API returns the current consumption in this memory context. This allows "real" memory accounting almost without overhead.

That's definitely *NOT* almost without overhead. This adds additional instructions to one of postgres' hottest sets of codepaths.

I think you're not working incrementally enough here. I strongly suggest solving the negative cache entry problem, and then incrementally going from there after that's committed. The likelihood of this patch ever getting merged otherwise seems extremely small.

Greetings, Andres Freund
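For illustration, the accounting idea under discussion amounts to something like the following sketch. The mem_ctx_t type and helpers are invented for this example and are not PostgreSQL's MemoryContext API; the point of the objection above is that the counter updates land on every allocation and free.

```c
#include <stdlib.h>
#include <stddef.h>

/*
 * Invented sketch of the accounting idea: a per-context byte counter
 * bumped on every allocation and decremented on every free.  The extra
 * add/subtract sits in the allocator hot path, which is the overhead
 * being objected to.  This is not PostgreSQL's MemoryContext API.
 */
typedef struct mem_ctx_t
{
    size_t consumption;     /* running total of live bytes */
} mem_ctx_t;

static void *
ctx_alloc(mem_ctx_t *cxt, size_t sz)
{
    cxt->consumption += sz;     /* extra work on every "palloc" */
    return malloc(sz);
}

static void
ctx_free(mem_ctx_t *cxt, void *ptr, size_t sz)
{
    cxt->consumption -= sz;     /* ...and on every "pfree" */
    free(ptr);
}

/* Rough analogue of the MemoryContextGettConsumption() call quoted above. */
static size_t
ctx_get_consumption(const mem_ctx_t *cxt)
{
    return cxt->consumption;
}
```

The read side (`ctx_get_consumption`) is indeed almost free; the cost is the pair of updates charged to every allocation and deallocation, however cheap each one looks in isolation.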
RE: Protect syscache from bloating with negative cache entries
From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
>
> (2) Another new patch v15-0005 on top of the previous design of the limit-by-number-of-a-cache feature converts it to a limit-by-size-on-all-caches feature, which I think is what Tsunakawa-san wanted.

Yeah, size looks better to me.

> As far as I can see, no significant degradation is found in usual (as long as pruning doesn't happen) code paths.
>
> About the new global-size based eviction (2), cache entry creation becomes slow after the total size reaches the limit, since every new entry evicts one or more old (= not-recently-used) entries. Because of not needing knobs for each cache, it becomes far more realistic. So I added documentation of "catalog_cache_max_size" in 0005.

Now I'm also trying to benchmark, which will be posted in another email. Here are things I noticed:

[1] compiler warning
catcache.c:109:1: warning: missing braces around initializer [-Wmissing-braces]
 dlist_head cc_lru_list = {0};
 ^
catcache.c:109:1: warning: (near initialization for ‘cc_lru_list.head’) [-Wmissing-braces]

[2] catalog_cache_max_size does not appear in postgresql.conf.sample

[3] The global LRU list and global size can be included in CatCacheHeader, which seems to me a good place because this structure contains global cache information regardless of the kind of CatCache.

[4] When applying the patches with git am, there are several warnings about trailing whitespace in v15-0003.

Regards, Takeshi Ideriha
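Review item [1] comes from initializing a struct whose first member is itself a struct: `= {0}` triggers -Wmissing-braces on some compilers, while fully braced initialization does not. A generic illustration (the types below are invented; PostgreSQL's ilist.h provides a DLIST_STATIC_INIT macro for this purpose, if memory serves):

```c
#include <stddef.h>

/*
 * Generic illustration of review item [1]: a doubly-linked list head
 * whose first member is a struct.  "= {0}" can warn under
 * -Wmissing-braces; nested braces initialize the same memory without
 * the warning.  These types are invented for the example.
 */
typedef struct list_node
{
    struct list_node *prev;
    struct list_node *next;
} list_node;

typedef struct list_head
{
    list_node head;
} list_head;

/* Fully braced: initializes head.prev and head.next explicitly. */
static list_head lru_list = {{NULL, NULL}};
```

Both spellings zero-initialize the object; the nested braces only make the structure of the initializer explicit to the compiler's warning machinery.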
RE: Protect syscache from bloating with negative cache entries
From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> It is too complex, as I was afraid. The indirect calls cause significant degradation. (Anyway, the previous code was bogus in that it passes a CACHELINEALIGN'ed pointer to get_chunk_size..)
>
> Instead, I added an accounting(?) interface function.
>
> | MemoryContextGettConsumption(MemoryContext cxt);
>
> The API returns the current consumption in this memory context. This allows "real" memory accounting almost without overhead.

That looks like a great idea! Actually, I was thinking of using MemoryContextStats() or a new lightweight variant of it to get the used amount, but I was afraid it would be too costly to call in catcache code. You are smarter, and I was just stupid.

> (2) Another new patch v15-0005 on top of the previous design of the limit-by-number-of-a-cache feature converts it to a limit-by-size-on-all-caches feature, which I think is what Tsunakawa-san wanted.

Thank you very, very much! I look forward to reviewing v15. I'll be away from the office tomorrow, so I'd like to review it this weekend or at the beginning of next week. I've confirmed and am sure that 0001 can be committed.

> As far as I can see, no significant degradation is found in usual (as long as pruning doesn't happen) code paths.
>
> About the new global-size based eviction (2), cache entry creation becomes slow after the total size reaches the limit, since every new entry evicts one or more old (= not-recently-used) entries. Because of not needing knobs for each cache, it becomes far more realistic. So I added documentation of "catalog_cache_max_size" in 0005.

Could you show us a comparison of before and after the pruning starts, if you already have it? If you lost the data, I'm OK with seeing the data after the code review.

Regards Takayuki Tsunakawa
RE: Protect syscache from bloating with negative cache entries
From: Bruce Momjian [mailto:br...@momjian.us]
> > That being said, having a "minimal size" threshold before starting with the time-based eviction may be a good idea.
>
> Agreed. I see the minimal size as a way to keep the system tables in cache, which we know we will need for the next query.

Isn't it the maximum size, not minimal size? Maximum size allows us to keep the desired amount of system tables in memory as well as to control memory consumption to avoid out-of-memory errors (OS crash!). I'm wondering why people want to take a different approach to catcache, unlike other PostgreSQL memory, e.g. shared_buffers, temp_buffers, SLRU buffers, work_mem, and other DBMSs.

Regards Takayuki Tsunakawa
Re: Protect syscache from bloating with negative cache entries
On Tue, Feb 12, 2019 at 02:53:40AM +0100, Tomas Vondra wrote:
> Right. But the logic behind the time-based approach is that evicting such entries should not cause any issues exactly because they are accessed infrequently. It might incur some latency when we need them for the first time after the eviction, but IMHO that's acceptable (although I see Andres did not like that).
>
> FWIW we might even evict entries after some time passes since inserting them into the cache - that's what memcached et al. do, IIRC. The logic is that frequently accessed entries will get immediately loaded back (thus keeping the cache hit ratio high). But there are reasons why the other DBs do that - like not having any cache invalidation (unlike us).

Agreed. If this fixes 90% of the issues people will have, and it applies to the 99.9% of users who will never tune this, it is a clear win. If we want to add something that requires tuning later, we can consider it once the non-tuning solution is done.

> That being said, having a "minimal size" threshold before starting with the time-based eviction may be a good idea.

Agreed. I see the minimal size as a way to keep the system tables in cache, which we know we will need for the next query.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
Re: Protect syscache from bloating with negative cache entries
At Wed, 13 Feb 2019 02:15:42 +0000, "Tsunakawa, Takayuki" wrote in <0A3221C70F24FB45833433255569204D1FB97CF1@G01JPEXMBYT05>
> From: Tomas Vondra [mailto:tomas.von...@2ndquadrant.com]
> > > I didn't consider planning that happens within a function. If 5min is the default for catalog_cache_prune_min_age, 10% of it (30s) seems enough, and gettimeofday() at such intervals wouldn't affect foreground jobs. I'd choose catalog_c_p_m_age/10 rather than a fixed value of 30s, with 1s as the minimum.
> >
> > Actually, I see CatCacheCleanupOldEntries contains this comment:
> >
> > /*
> >  * Calculate the duration from the time of the last access to the
> >  * "current" time. Since catcacheclock is not advanced within a
> >  * transaction, the entries that are accessed within the current
> >  * transaction won't be pruned.
> >  */
> >
> > which I think is pretty much what I've been saying ... But the question is whether we need to do something about it.
>
> Hmm, I'm surprised at the v14 patch about this. I remember that previous patches renewed the cache clock on every statement, and that is correct. If the cache clock is only updated at the beginning of a transaction, the following TODO item would not be solved:
>
> https://wiki.postgresql.org/wiki/Todo
> "Reduce memory use when analyzing many tables in a single command by making catcache and syscache flushable or bounded."

Sorry, it's just a stale comment. In v15 it is... ouch! still left alone. (Actually CatCacheGetStats doesn't perform pruning.) I'll remove it in the next version. It is called in start_xact_command, which is called per statement and provided with the statement timestamp.

> /*
>  * Calculate the duration from the time of the last access to
>  * the "current" time. catcacheclock is updated on a per-statement
>  * basis and additionally updated periodically during a long
>  * running query.
>  */
> TimestampDifference(ct->lastaccess, catcacheclock, _age, );
In v14 and v15, in addition to that, a timer that fires at intervals of catalog_cache_prune_min_age/10 (30s when the parameter is 5min) updates the catcache clock using gettimeofday(), which in turn is the source of the LRU timestamps.

> Also, Tom mentioned pg_dump in this thread (protect syscache...). pg_dump runs in a single transaction, touching all system catalogs. That may result in OOM, and this patch can rescue it.

So all of the problems will be addressed in v14.

regards. -- Kyotaro Horiguchi NTT Open Source Software Center
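The clock scheme described above can be sketched as follows. The names are illustrative, not the patch's: the cached clock is advanced per statement (or by the periodic timer), and entry ages are computed against it without calling gettimeofday() in the lookup hot path.

```c
/*
 * Illustrative sketch of the catcache clock scheme described above: the
 * clock is advanced once per statement (and by a periodic timer during
 * long-running queries), so computing an entry's age never reads the
 * wall clock in the lookup hot path.  Names are invented here.
 */
static long catcacheclock;          /* cached timestamp, in microseconds */

/* Called per statement (or from the timer) with a fresh timestamp. */
static void
update_catcache_clock(long statement_ts)
{
    catcacheclock = statement_ts;
}

/*
 * Age of an entry relative to the cached clock.  Entries touched after
 * the last clock update get a non-positive age, so they are never
 * pruned within the current statement.
 */
static long
catcache_entry_age(long lastaccess)
{
    return catcacheclock - lastaccess;
}

/* The timer interval suggested above: prune_min_age / 10. */
static long
clock_update_interval(long prune_min_age)
{
    return prune_min_age / 10;
}
```

Because the clock only moves between statements (or at timer ticks), a long transaction still sees its own recently touched entries as age zero, which is exactly the behavior the quoted comment describes.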