Re: Protect syscache from bloating with negative cache entries
At Mon, 22 Mar 2021 13:12:10 -0400, Bruce Momjian wrote in
> On Thu, Jan 28, 2021 at 05:16:52PM +0900, Kyotaro Horiguchi wrote:
> > At Thu, 28 Jan 2021 16:50:44 +0900 (JST), Kyotaro Horiguchi wrote in
> > > I was going to write in the doc something like "you can inspect memory
> > > consumption by catalog caches using pg_backend_memory_contexts", but
> > > all the memory used by catalog cache is in CacheMemoryContext. Is it
> > > sensible for each catalog cache to have their own contexts?
> >
> > Something like this.
>
> Is this feature not going to make it into PG 14? It first appeared in
> the January, 2017 commitfest:
>
> https://commitfest.postgresql.org/32/931/

Thank you for looking at this. However, I'm afraid you are looking at a patch that is not part of the CF entry "Protect syscache from bloating with negative cache entries". I'd be happy if it were committed: it is intended not only to make pg_get_backend_memory_contexts() show more meaningful information, but also to make it easier to investigate what kind of cache is bloating, and the like.

With the patch, the function shows an individual context information line for each catcache.

> postgres=# select pg_get_backend_memory_contexts();
...
> (catcache,"catcache id 78",CacheMemoryContext,2,8192,1,6152,0,2040)
> (catcache,"catcache id 77",CacheMemoryContext,2,8192,1,6152,0,2040)
> (catcache,"catcache id 76",CacheMemoryContext,2,16384,2,7592,3,8792)

Applying catcachecxt_by_name.patch.txt on top of it changes the output as follows. The names are not familiar to users, but give far clearer information.

> (catcache,USERMAPPINGUSERSERVER,CacheMemoryContext,2,8192,1,6192,0,2000)
> (catcache,USERMAPPINGOID,CacheMemoryContext,2,8192,1,6192,0,2000)
> (catcache,TYPEOID,CacheMemoryContext,2,16384,2,7632,0,8752)

Applying catcachecxt_by_name_id.patch.txt on top of the _by_name patch, the output further changes as follows.
> (catcache,USERMAPPINGUSERSERVER[78],CacheMemoryContext,2,8192,1,6136,0,2056)
> (catcache,USERMAPPINGOID[77],CacheMemoryContext,2,8192,1,6136,0,2056)
> (catcache,TYPEOID[76],CacheMemoryContext,2,16384,2,7592,3,8792)

The number enclosed in brackets is the cache id. It is useless for users but convenient for debugging :p

catcache_individual_mcxt_2.patch.txt: rebased version of the per-catcache context patch.
catcachecxt_by_name.patch.txt: gives a meaningful name to catcache contexts.
catcachecxt_by_name_id.patch.txt: and adds the cache id.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 55c9445898..7d318cf7aa 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -769,6 +769,8 @@ InitCatCache(int id,
 {
 	CatCache   *cp;
 	MemoryContext oldcxt;
+	MemoryContext mycxt;
+	char		name[32];
 	size_t		sz;
 	int			i;
@@ -792,7 +794,12 @@ InitCatCache(int id,
 	if (!CacheMemoryContext)
 		CreateCacheMemoryContext();
-	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+	mycxt = AllocSetContextCreate(CacheMemoryContext, "catcache",
+								  ALLOCSET_DEFAULT_SIZES);
+
+	snprintf(name, sizeof(name), "catcache id %d", id);
+	oldcxt = MemoryContextSwitchTo(mycxt);
+	MemoryContextSetIdentifier(mycxt, (const char *) pstrdup(name));
 	/*
 	 * if first time through, initialize the cache group header
@@ -833,6 +840,7 @@ InitCatCache(int id,
 	cp->cc_nkeys = nkeys;
 	for (i = 0; i < nkeys; ++i)
 		cp->cc_keyno[i] = key[i];
+	cp->cc_mcxt = mycxt;
 	/*
 	 * new cache is initialized as far as we can go for now. print some
@@ -932,12 +940,12 @@ CatalogCacheInitializeCache(CatCache *cache)
 	relation = table_open(cache->cc_reloid, AccessShareLock);
 	/*
-	 * switch to the cache context so our allocations do not vanish at the end
-	 * of a transaction
+	 * switch to our own context under the cache context so our allocations do
+	 * not vanish at the end of a transaction
 	 */
-	Assert(CacheMemoryContext != NULL);
+	Assert(CacheMemoryContext != NULL && cache->cc_mcxt != NULL);
-	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+	oldcxt = MemoryContextSwitchTo(cache->cc_mcxt);
 	/*
 	 * copy the relcache's tuple descriptor to permanent cache storage
@@ -998,7 +1006,7 @@ CatalogCacheInitializeCache(CatCache *cache)
 	 */
 	fmgr_info_cxt(eqfunc,
 				  &cache->cc_skey[i].sk_func,
-				  CacheMemoryContext);
+				  cache->cc_mcxt);
 	/* Initialize sk_attno suitably for HeapKeyTest() and heap scans */
 	cache->cc_skey[i].sk_attno = cache->cc_keyno[i];
@@
Re: Protect syscache from bloating with negative cache entries
On Thu, Jan 28, 2021 at 05:16:52PM +0900, Kyotaro Horiguchi wrote:
> At Thu, 28 Jan 2021 16:50:44 +0900 (JST), Kyotaro Horiguchi wrote in
> > I was going to write in the doc something like "you can inspect memory
> > consumption by catalog caches using pg_backend_memory_contexts", but
> > all the memory used by catalog cache is in CacheMemoryContext. Is it
> > sensible for each catalog cache to have their own contexts?
>
> Something like this.

Is this feature not going to make it into PG 14? It first appeared in the January, 2017 commitfest:

https://commitfest.postgresql.org/32/931/

--
Bruce Momjian  https://momjian.us
EDB            https://enterprisedb.com

If only the physical world exists, free will is an illusion.
Re: Protect syscache from bloating with negative cache entries
At Thu, 28 Jan 2021 16:50:44 +0900 (JST), Kyotaro Horiguchi wrote in
> I was going to write in the doc something like "you can inspect memory
> consumption by catalog caches using pg_backend_memory_contexts", but
> all the memory used by catalog cache is in CacheMemoryContext. Is it
> sensible for each catalog cache to have their own contexts?

Something like this.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index fa2b49c676..cfbb335bb3 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -769,6 +769,8 @@ InitCatCache(int id,
 {
 	CatCache   *cp;
 	MemoryContext oldcxt;
+	MemoryContext mycxt;
+	char		name[32];
 	size_t		sz;
 	int			i;
@@ -792,7 +794,12 @@ InitCatCache(int id,
 	if (!CacheMemoryContext)
 		CreateCacheMemoryContext();
-	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+	mycxt = AllocSetContextCreate(CacheMemoryContext, "catcache",
+								  ALLOCSET_DEFAULT_SIZES);
+
+	snprintf(name, sizeof(name), "catcache id %d", id);
+	oldcxt = MemoryContextSwitchTo(mycxt);
+	MemoryContextSetIdentifier(mycxt, (const char *) pstrdup(name));
 	/*
 	 * if first time through, initialize the cache group header
@@ -833,6 +840,7 @@ InitCatCache(int id,
 	cp->cc_nkeys = nkeys;
 	for (i = 0; i < nkeys; ++i)
 		cp->cc_keyno[i] = key[i];
+	cp->cc_mcxt = mycxt;
 	/*
 	 * new cache is initialized as far as we can go for now. print some
@@ -932,12 +940,12 @@ CatalogCacheInitializeCache(CatCache *cache)
 	relation = table_open(cache->cc_reloid, AccessShareLock);
 	/*
-	 * switch to the cache context so our allocations do not vanish at the end
-	 * of a transaction
+	 * switch to our own context under the cache context so our allocations do
+	 * not vanish at the end of a transaction
 	 */
 	Assert(CacheMemoryContext != NULL);
-	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+	oldcxt = MemoryContextSwitchTo(cache->cc_mcxt);
 	/*
 	 * copy the relcache's tuple descriptor to permanent cache storage
@@ -998,7 +1006,7 @@ CatalogCacheInitializeCache(CatCache *cache)
 	 */
 	fmgr_info_cxt(eqfunc,
 				  &cache->cc_skey[i].sk_func,
-				  CacheMemoryContext);
+				  cache->cc_mcxt);
 	/* Initialize sk_attno suitably for HeapKeyTest() and heap scans */
 	cache->cc_skey[i].sk_attno = cache->cc_keyno[i];
@@ -1697,7 +1705,7 @@ SearchCatCacheList(CatCache *cache,
 	table_close(relation, AccessShareLock);
 	/* Now we can build the CatCList entry. */
-	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+	oldcxt = MemoryContextSwitchTo(cache->cc_mcxt);
 	nmembers = list_length(ctlist);
 	cl = (CatCList *)
 		palloc(offsetof(CatCList, members) + nmembers * sizeof(CatCTup *));
@@ -1830,7 +1838,7 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
 		dtp = ntp;
 	/* Allocate memory for CatCTup and the cached tuple in one go */
-	oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+	oldcxt = MemoryContextSwitchTo(cache->cc_mcxt);
 	ct = (CatCTup *) palloc(sizeof(CatCTup) +
 							MAXIMUM_ALIGNOF + dtp->t_len);
@@ -1865,7 +1873,7 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
 	else
 	{
 		Assert(negative);
-		oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
+		oldcxt = MemoryContextSwitchTo(cache->cc_mcxt);
 		ct = (CatCTup *) palloc(sizeof(CatCTup));
 	/*
diff --git a/src/include/utils/catcache.h b/src/include/utils/catcache.h
index ddc2762eb3..a32fea2f11 100644
--- a/src/include/utils/catcache.h
+++ b/src/include/utils/catcache.h
@@ -61,6 +61,7 @@ typedef struct catcache
 	slist_node	cc_next;		/* list link */
 	ScanKeyData	cc_skey[CATCACHE_MAXKEYS];	/* precomputed key info for heap
 											 * scans */
+	MemoryContext cc_mcxt;		/* memory context for this cache */
 	/*
 	 * Keep these at the end, so that compiling catcache.c with CATCACHE_STATS
Re: Protect syscache from bloating with negative cache entries
At Wed, 27 Jan 2021 13:11:55 +0200, Heikki Linnakangas wrote in
> On 27/01/2021 03:13, Kyotaro Horiguchi wrote:
> > At Thu, 14 Jan 2021 17:32:27 +0900 (JST), Kyotaro Horiguchi wrote in
> >> The commit 4656e3d668 (debug_invalidate_system_caches_always)
> >> conflicted with this patch. Rebased.
> > At Wed, 27 Jan 2021 10:07:47 +0900 (JST), Kyotaro Horiguchi wrote in
> >> (I found a bug in a benchmark-aid function
> >> (CatalogCacheFlushCatalog2), I repost an updated version soon.)
> > I noticed that a catcachebench-aid function
> > CatalogCacheFlushCatalog2() allocates the bucket array wrongly in the
> > current memory context, which leads to a crash.
> > This is a fixed and rebased version.
>
> Thanks, with the scripts you provided, I was able to run the
> performance tests on my laptop, and got very similar results as you
> did.
>
> The impact of v7-0002-Remove-dead-flag-from-catcache-tuple.patch is
> very small. I think I could see it in the tests, but only barely. And
> the tests did nothing else than do syscache lookups; in any real world
> scenario, it would be lost in noise. I think we can put that aside for
> now, and focus on v6-0001-CatCache-expiration-feature.patch:

I agree with that opinion, though it is a bit disappointing that the long struggle ended up in vain :p

> The pruning is still pretty lethargic:
>
> - Entries created in the same transactions are never pruned away
>
> - The size of the hash table is never shrunk. So even though the patch
>   puts a backstop to the hash table growing indefinitely, if you run one
>   transaction that bloats the cache, it's bloated for the rest of the
>   session.

Right. But more frequent checks impact performance. We could do more aggressive pruning in idle time.

> I think that's OK. We might want to be more aggressive in the future,
> but for now it seems reasonable to lean towards the current behavior
> where nothing is pruned. Although I wonder if we should try to set
> 'catcacheclock' more aggressively. I think we could set it whenever
> GetCurrentTimestamp() is called, for example.

Ah. I hadn't thought in that direction. global_last_acquired_timestamp, or some such?

> Given how unaggressive this mechanism is, I think it should be safe to
> enable it by default. What would be a suitable default for
> catalog_cache_prune_min_age? 30 seconds?

Without detailed thought, that seems a bit too short. The value suggested so far for the variable is 300-600 s. That is, intermittent queries with about 5-10 minute intervals don't lose cache entries. In a bad case, two queries could alternately remove each other's cache entries:

Q1: adds 100 entries
<1 minute passes>
Q2: adds 100 entries, but rehashing is going to happen at 150 entries,
    and the existing 100 entries added by Q1 are removed.
<1 minute passes>
Q1: adds 100 entries, but rehashing is going to happen at 150 entries,
    and the existing 100 entries added by Q2 are removed.

Also, a transaction sequence that persists longer than that many seconds may lose some of its catcache entries.

> Documentation needs to be updated for the new GUC.
>
> Attached is a version with a few little cleanups:
> - use TimestampTz instead of uint64 for the timestamps
> - remove assign_catalog_cache_prune_min_age(). All it did was convert
>   the GUC's value from seconds to microseconds, and stored it in a
>   separate variable. Multiplication is cheap, so we can just do it when
>   we use the GUC's value instead.

Yeah, the latter is a trace of the struggle to cut down CPU cycles on the normal paths. I don't object to doing so.

I found that some comments are apparently stale. cp->cc_oldest_ts is not used anywhere, but it was added for the decision of whether to scan or not. I fixed the following points in the attached:

- Removed some comments that are obvious. ("Timestamp in us")
- Added a cp->cc_oldest_ts check in CatCacheCleanupOldEntries.
- Set the default value for catalog_cache_prune_min_age to 600 s.
- Added a doc entry for the new GUC in the resource/memory section.
- Fixed some code comments.
- Adjusted the pruning criterion from (ct->lastaccess < prune_threshold) to <=.

I was going to write in the doc something like "you can inspect memory consumption by catalog caches using pg_backend_memory_contexts", but all the memory used by catalog caches is in CacheMemoryContext. Is it sensible for each catalog cache to have its own context?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 01d32ffa499ee9185e222fe6fa3d39ad6ac5ff37 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas
Date: Wed, 27 Jan 2021 13:08:08 +0200
Subject: [PATCH v9] CatCache expiration feature

Author: Kyotaro Horiguchi
---
 doc/src/sgml/config.sgml           | 20 +++
 src/backend/access/transam/xact.c  |  3 ++
 src/backend/utils/cache/catcache.c | 85 +-
 src/backend/utils/misc/guc.c       | 12 +
 src/include/utils/catcache.h       | 17 ++
 5 files changed, 136 insertions(+), 1 deletion(-)

diff --git
Re: Protect syscache from bloating with negative cache entries
On 27/01/2021 03:13, Kyotaro Horiguchi wrote:
> At Thu, 14 Jan 2021 17:32:27 +0900 (JST), Kyotaro Horiguchi wrote in
>> The commit 4656e3d668 (debug_invalidate_system_caches_always)
>> conflicted with this patch. Rebased.
>
> At Wed, 27 Jan 2021 10:07:47 +0900 (JST), Kyotaro Horiguchi wrote in
>> (I found a bug in a benchmark-aid function
>> (CatalogCacheFlushCatalog2), I repost an updated version soon.)
>
> I noticed that a catcachebench-aid function
> CatalogCacheFlushCatalog2() allocates the bucket array wrongly in the
> current memory context, which leads to a crash.
> This is a fixed and rebased version.

Thanks, with the scripts you provided, I was able to run the performance tests on my laptop, and got very similar results as you did.

The impact of v7-0002-Remove-dead-flag-from-catcache-tuple.patch is very small. I think I could see it in the tests, but only barely. And the tests did nothing else than do syscache lookups; in any real world scenario, it would be lost in noise. I think we can put that aside for now, and focus on v6-0001-CatCache-expiration-feature.patch:

The pruning is still pretty lethargic:

- Entries created in the same transactions are never pruned away

- The size of the hash table is never shrunk. So even though the patch
  puts a backstop to the hash table growing indefinitely, if you run one
  transaction that bloats the cache, it's bloated for the rest of the
  session.

I think that's OK. We might want to be more aggressive in the future, but for now it seems reasonable to lean towards the current behavior where nothing is pruned. Although I wonder if we should try to set 'catcacheclock' more aggressively. I think we could set it whenever GetCurrentTimestamp() is called, for example.

Given how unaggressive this mechanism is, I think it should be safe to enable it by default. What would be a suitable default for catalog_cache_prune_min_age? 30 seconds?

Documentation needs to be updated for the new GUC.

Attached is a version with a few little cleanups:
- use TimestampTz instead of uint64 for the timestamps
- remove assign_catalog_cache_prune_min_age(). All it did was convert
  the GUC's value from seconds to microseconds, and stored it in a
  separate variable. Multiplication is cheap, so we can just do it when
  we use the GUC's value instead.

- Heikki

>From 73583c2c9cd26bad7c3942eff1ddccb2ebb43222 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas
Date: Wed, 27 Jan 2021 13:08:08 +0200
Subject: [PATCH v8 1/1] CatCache expiration feature

Author: Kyotaro Horiguchi
---
 src/backend/access/transam/xact.c  |  3 ++
 src/backend/utils/cache/catcache.c | 82 +-
 src/backend/utils/misc/guc.c       | 12 +
 src/include/utils/catcache.h       | 17 +++
 4 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd45..86888d24091 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1086,6 +1086,9 @@ static void
 AtStart_Cache(void)
 {
 	AcceptInvalidationMessages();
+
+	if (xactStartTimestamp != 0)
+		SetCatCacheClock(xactStartTimestamp);
 }
 /*
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index fa2b49c676e..e29f825687a 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -38,6 +38,7 @@
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/syscache.h"
+#include "utils/timestamp.h"
 /* #define CACHEDEBUG */	/* turns DEBUG elogs on */
@@ -60,9 +61,18 @@
 #define CACHE_elog(...)
 #endif
+/*
+ * GUC variable to define the minimum age of entries that will be considered
+ * to be evicted in seconds. -1 to disable the feature.
+ */
+int			catalog_cache_prune_min_age = -1;
+
 /* Cache management header --- pointer is NULL until created */
 static CatCacheHeader *CacheHdr = NULL;
+/* Clock for the last accessed time of a catcache entry. */
+TimestampTz	catcacheclock = 0;
+
 static inline HeapTuple SearchCatCacheInternal(CatCache *cache,
 											   int nkeys,
 											   Datum v1, Datum v2,
@@ -74,6 +84,7 @@ static pg_noinline HeapTuple SearchCatCacheMiss(CatCache *cache,
 												Index hashIndex,
 												Datum v1, Datum v2,
 												Datum v3, Datum v4);
+static bool CatCacheCleanupOldEntries(CatCache *cp);
 static uint32 CatalogCacheComputeHashValue(CatCache *cache, int nkeys,
 										   Datum v1, Datum v2,
 										   Datum v3, Datum v4);
@@ -99,7 +110,6 @@ static void CatCacheFreeKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 static void CatCacheCopyKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 							 Datum *srckeys, Datum *dstkeys);
-
 /*
  * internal support functions
  */
@@ -1264,6 +1274,9 @@ SearchCatCacheInternal(CatCache *cache,
 		 */
 		dlist_move_head(bucket, &ct->cache_elem);
+		/* Record the last access timestamp */
+		ct->lastaccess = catcacheclock;
+
 		/*
 		 * If it's a positive entry,
Re: Protect syscache from bloating with negative cache entries
At Thu, 14 Jan 2021 17:32:27 +0900 (JST), Kyotaro Horiguchi wrote in
> The commit 4656e3d668 (debug_invalidate_system_caches_always)
> conflicted with this patch. Rebased.

At Wed, 27 Jan 2021 10:07:47 +0900 (JST), Kyotaro Horiguchi wrote in
> (I found a bug in a benchmark-aid function
> (CatalogCacheFlushCatalog2), I repost an updated version soon.)

I noticed that a catcachebench-aid function CatalogCacheFlushCatalog2() allocates the bucket array wrongly in the current memory context, which leads to a crash. This is a fixed and rebased version.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 5f318170b9c1e0caa1033862261800f06135e5bd Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Wed, 18 Nov 2020 16:54:31 +0900
Subject: [PATCH v7 1/3] CatCache expiration feature

---
 src/backend/access/transam/xact.c  |  3 ++
 src/backend/utils/cache/catcache.c | 87 +-
 src/backend/utils/misc/guc.c       | 12 +
 src/include/utils/catcache.h       | 19 +++
 4 files changed, 120 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..86888d2409 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1086,6 +1086,9 @@ static void
 AtStart_Cache(void)
 {
 	AcceptInvalidationMessages();
+
+	if (xactStartTimestamp != 0)
+		SetCatCacheClock(xactStartTimestamp);
 }
 /*
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index fa2b49c676..644d92dd9a 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -38,6 +38,7 @@
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/syscache.h"
+#include "utils/timestamp.h"
 /* #define CACHEDEBUG */	/* turns DEBUG elogs on */
@@ -60,9 +61,19 @@
 #define CACHE_elog(...)
 #endif
+/*
+ * GUC variable to define the minimum age of entries that will be considered
+ * to be evicted in seconds. -1 to disable the feature.
+ */
+int			catalog_cache_prune_min_age = -1;
+uint64		prune_min_age_us;
+
 /* Cache management header --- pointer is NULL until created */
 static CatCacheHeader *CacheHdr = NULL;
+/* Clock for the last accessed time of a catcache entry. */
+uint64		catcacheclock = 0;
+
 static inline HeapTuple SearchCatCacheInternal(CatCache *cache,
 											   int nkeys,
 											   Datum v1, Datum v2,
@@ -74,6 +85,7 @@ static pg_noinline HeapTuple SearchCatCacheMiss(CatCache *cache,
 												Index hashIndex,
 												Datum v1, Datum v2,
 												Datum v3, Datum v4);
+static bool CatCacheCleanupOldEntries(CatCache *cp);
 static uint32 CatalogCacheComputeHashValue(CatCache *cache, int nkeys,
 										   Datum v1, Datum v2,
 										   Datum v3, Datum v4);
@@ -99,6 +111,15 @@ static void CatCacheFreeKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 static void CatCacheCopyKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 							 Datum *srckeys, Datum *dstkeys);
+/* GUC assign function */
+void
+assign_catalog_cache_prune_min_age(int newval, void *extra)
+{
+	if (newval < 0)
+		prune_min_age_us = UINT64_MAX;
+	else
+		prune_min_age_us = ((uint64) newval) * USECS_PER_SEC;
+}
 /*
  * internal support functions
  */
@@ -1264,6 +1285,9 @@ SearchCatCacheInternal(CatCache *cache,
 		 */
 		dlist_move_head(bucket, &ct->cache_elem);
+		/* Record the last access timestamp */
+		ct->lastaccess = catcacheclock;
+
 		/*
 		 * If it's a positive entry, bump its refcount and return it. If it's
 		 * negative, we can report failure to the caller.
@@ -1425,6 +1449,61 @@ SearchCatCacheMiss(CatCache *cache,
 	return &ct->tuple;
 }
+/*
+ * CatCacheCleanupOldEntries - Remove infrequently-used entries
+ *
+ * Catcache entries happen to be left unused for a long time for several
+ * reasons. Remove such entries to prevent catcache from bloating. It is based
+ * on the similar algorithm with buffer eviction. Entries that are accessed
+ * several times in a certain period live longer than those that have had less
+ * access in the same duration.
+ */
+static bool
+CatCacheCleanupOldEntries(CatCache *cp)
+{
+	int			nremoved = 0;
+	int			i;
+	long		oldest_ts = catcacheclock;
+	uint64		prune_threshold = catcacheclock - prune_min_age_us;
+
+	/* Scan over the whole hash to find entries to remove */
+	for (i = 0 ; i < cp->cc_nbuckets ; i++)
+	{
+		dlist_mutable_iter iter;
+
+		dlist_foreach_modify(iter, &cp->cc_bucket[i])
+		{
+			CatCTup    *ct = dlist_container(CatCTup, cache_elem, iter.cur);
+
+			/* Don't remove referenced entries */
+			if (ct->refcount == 0 &&
+				(ct->c_list == NULL || ct->c_list->refcount == 0))
+			{
+				if (ct->lastaccess < prune_threshold)
+				{
+					CatCacheRemoveCTup(cp, ct);
+					nremoved++;
+
+					/* don't let the removed entry update oldest_ts */
+					continue;
+				}
+			}
+
+			/* update the oldest timestamp if the entry remains alive */
+			if (ct->lastaccess < oldest_ts)
+				oldest_ts = ct->lastaccess;
+		}
+	}
+
Re: Protect syscache from bloating with negative cache entries
At Tue, 26 Jan 2021 11:43:21 +0200, Heikki Linnakangas wrote in
> Hi,
>
> On 19/11/2020 07:25, Kyotaro Horiguchi wrote:
> > Performance measurement on the attached showed better result about
> > searching but maybe worse for cache entry creation. Each time number
> > is the mean of 10 runs.
> >
> > # Catcache (negative) entry creation
> >            : time(ms) (% to master)
> > master     : 3965.61 (100.0)
> > patched-off: 4040.93 (101.9)
> > patched-on : 4032.22 (101.7)
> >
> > # Searching negative cache entries
> > master     : 8173.46 (100.0)
> > patched-off: 7983.43 ( 97.7)
> > patched-on : 8049.88 ( 98.5)
> >
> > # Creation, searching and expiration
> > master     : 6393.23 (100.0)
> > patched-off: 6527.94 (102.1)
> > patched-on : 15880.01 (248.4)
> >
> > That is, catcache searching gets faster by 2-3% but creation gets
> > slower by about 2%. If I moved the condition of 2 further up to
> > CatalogCacheCreateEntry(), that degradation reduced to 0.6%.
> >
> > # Catcache (negative) entry creation
> > master      : 3967.45 (100.0)
> > patched-off : 3990.43 (100.6)
> > patched-on  : 4108.96 (103.6)
> >
> > # Searching negative cache entries
> > master      : 8106.53 (100.0)
> > patched-off : 8036.61 ( 99.1)
> > patched-on  : 8058.18 ( 99.4)
> >
> > # Creation, searching and expiration
> > master      : 6395.00 (100.0)
> > patched-off : 6416.57 (100.3)
> > patched-on  : 15830.91 (247.6)
>
> Can you share the exact script or steps to reproduce these numbers? I
> presume these are from the catcachebench extension, but I can't figure
> out which scenario above corresponds to which catcachebench
> test. Also, catcachebench seems to depend on a bunch of tables being
> created in schema called "test"; what tables did you use for the above
> numbers?

gen_tbl.pl to generate the tables, then run2.sh to run the benchmark, and sumlog.pl to summarize the result of run2.sh:

$ ./gen_tbl.pl | psql postgres
$ ./run2.sh | tee rawresult.txt | ./sumlog.pl

(I found a bug in a benchmark-aid function (CatalogCacheFlushCatalog2), I repost an updated version soon.)

A simple explanation follows, since the scripts are kind of crappy..

run2.sh:
  LOOPS  : # of executions of catcachebench() in a single run
  USES   : take the average of this number of fastest executions in a single run
  BINROOT: common parent directory of target binaries
  DATADIR: data directory (shared by all binaries)
  PREC   : FP format for time and percentage in a result
  TESTS  : comma-separated numbers given to catcachebench

The "run" function spec is

  run "binary-label" <bindir> <A> <B> <C>

where A, B and C are the values for catalog_cache_prune_min_age. "" means no setting (used for the master binary). Currently only C is in effect, but all three should be non-empty strings to make it effective.

The result output is:

 test |   version   |  n  |    r     | stddev
------+-------------+-----+----------+--------
    1 | patched-off | 1/3 | 14211.96 | 261.19

  test   : # of catcachebench(#)
  version: binary label given to the run function
  n      : USES / LOOPS
  r      : result average time of catcachebench() in milliseconds
  stddev : stddev of r

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#! /usr/bin/perl
$collist = "";
foreach $i (0..1000) {
    $collist .= sprintf(", c%05d int", $i);
}
$collist = substr($collist, 2);

printf "drop schema if exists test cascade;\n";
printf "create schema test;\n";
foreach $i (0..2999) {
    printf "create table test.t%04d ($collist);\n", $i;
}

#!/bin/bash
LOOPS=3
USES=1
TESTS=1,2,3
BINROOT=/home/horiguti/bin
DATADIR=/home/horiguti/data/data_work
PREC="numeric(10,2)"

/usr/bin/killall postgres
/usr/bin/sleep 3

run() {
    local BINARY=$1
    local PGCTL=$2/bin/pg_ctl
    local PGSQL=$2/bin/postgres
    local PSQL=$2/bin/psql

    if [ "$3" != "" ]; then
        local SETTING1="set catalog_cache_prune_min_age to \"$3\";"
        local SETTING2="set catalog_cache_prune_min_age to \"$4\";"
        local SETTING3="set catalog_cache_prune_min_age to \"$5\";"
    fi
#    ($PGSQL -D $DATADIR 2>&1 > /dev/null)&
    ($PGSQL -D $DATADIR 2>&1 > /dev/null | /usr/bin/sed -e 's/^/# /')&
    /usr/bin/sleep 3
    ${PSQL} postgres <&1 > /dev/null | /usr/bin/sed -e 's/^/# /'
#    oreport > $BINARY_perf.txt
}

for i in $(seq 0 2); do
    run "patched-off" $BINROOT/pgsql_catexp "-1" "-1" "-1"
    run "patched-on" $BINROOT/pgsql_catexp "0" "0" "0"
    run "master" $BINROOT/pgsql_master_o2 "" "" ""
done

#! /usr/bin/perl
while (<>) {
#    if (/^\s+([0-9])\s*\|\s*(\w+)\s*\|\s*(\S+)\s*\|\s*([\d.]+)\s*\|\s*(\w+)\s*$/) {
    if (/^\s+([0-9])\s*\|\s*(\S+)\s*\|\s*(\S+)\s*\|\s*([\d.]+)\s*\|\s*([\d.]+)\s*$/) {
        $test = $1;
        $bin = $2;
        $time = $4;
        if (defined $sum{$test}{$bin}) {
            $sum{$test}{$bin} +=
Re: Protect syscache from bloating with negative cache entries
Hi,

On 19/11/2020 07:25, Kyotaro Horiguchi wrote:
> Performance measurement on the attached showed better result about
> searching but maybe worse for cache entry creation. Each time number
> is the mean of 10 runs.
>
> # Catcache (negative) entry creation
> : time(ms) (% to master)
> master     : 3965.61 (100.0)
> patched-off: 4040.93 (101.9)
> patched-on : 4032.22 (101.7)
>
> # Searching negative cache entries
> master     : 8173.46 (100.0)
> patched-off: 7983.43 ( 97.7)
> patched-on : 8049.88 ( 98.5)
>
> # Creation, searching and expiration
> master     : 6393.23 (100.0)
> patched-off: 6527.94 (102.1)
> patched-on : 15880.01 (248.4)
>
> That is, catcache searching gets faster by 2-3% but creation gets
> slower by about 2%. If I moved the condition of 2 further up to
> CatalogCacheCreateEntry(), that degradation reduced to 0.6%.
>
> # Catcache (negative) entry creation
> master      : 3967.45 (100.0)
> patched-off : 3990.43 (100.6)
> patched-on  : 4108.96 (103.6)
>
> # Searching negative cache entries
> master      : 8106.53 (100.0)
> patched-off : 8036.61 ( 99.1)
> patched-on  : 8058.18 ( 99.4)
>
> # Creation, searching and expiration
> master      : 6395.00 (100.0)
> patched-off : 6416.57 (100.3)
> patched-on  : 15830.91 (247.6)

Can you share the exact script or steps to reproduce these numbers? I presume these are from the catcachebench extension, but I can't figure out which scenario above corresponds to which catcachebench test. Also, catcachebench seems to depend on a bunch of tables being created in a schema called "test"; what tables did you use for the above numbers?

- Heikki
Re: Protect syscache from bloating with negative cache entries
Hello. The commit 4656e3d668 (debug_invalidate_system_caches_always) conflicted with this patch. Rebased.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From ec069488fd2675369530f3f967f02a7b683f0a7f Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Wed, 18 Nov 2020 16:54:31 +0900
Subject: [PATCH v6 1/3] CatCache expiration feature

---
 src/backend/access/transam/xact.c  |  3 ++
 src/backend/utils/cache/catcache.c | 87 +-
 src/backend/utils/misc/guc.c       | 12 +
 src/include/utils/catcache.h       | 19 +++
 4 files changed, 120 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a2068e3fd4..86888d2409 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1086,6 +1086,9 @@ static void
 AtStart_Cache(void)
 {
 	AcceptInvalidationMessages();
+
+	if (xactStartTimestamp != 0)
+		SetCatCacheClock(xactStartTimestamp);
 }
 /*
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index fa2b49c676..644d92dd9a 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -38,6 +38,7 @@
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/syscache.h"
+#include "utils/timestamp.h"
 /* #define CACHEDEBUG */	/* turns DEBUG elogs on */
@@ -60,9 +61,19 @@
 #define CACHE_elog(...)
 #endif
+/*
+ * GUC variable to define the minimum age of entries that will be considered
+ * to be evicted in seconds. -1 to disable the feature.
+ */
+int			catalog_cache_prune_min_age = -1;
+uint64		prune_min_age_us;
+
 /* Cache management header --- pointer is NULL until created */
 static CatCacheHeader *CacheHdr = NULL;
+/* Clock for the last accessed time of a catcache entry. */
+uint64		catcacheclock = 0;
+
 static inline HeapTuple SearchCatCacheInternal(CatCache *cache,
 											   int nkeys,
 											   Datum v1, Datum v2,
@@ -74,6 +85,7 @@ static pg_noinline HeapTuple SearchCatCacheMiss(CatCache *cache,
 												Index hashIndex,
 												Datum v1, Datum v2,
 												Datum v3, Datum v4);
+static bool CatCacheCleanupOldEntries(CatCache *cp);
 static uint32 CatalogCacheComputeHashValue(CatCache *cache, int nkeys,
 										   Datum v1, Datum v2,
 										   Datum v3, Datum v4);
@@ -99,6 +111,15 @@ static void CatCacheFreeKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 static void CatCacheCopyKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 							 Datum *srckeys, Datum *dstkeys);
+/* GUC assign function */
+void
+assign_catalog_cache_prune_min_age(int newval, void *extra)
+{
+	if (newval < 0)
+		prune_min_age_us = UINT64_MAX;
+	else
+		prune_min_age_us = ((uint64) newval) * USECS_PER_SEC;
+}
 /*
  * internal support functions
  */
@@ -1264,6 +1285,9 @@ SearchCatCacheInternal(CatCache *cache,
 		 */
 		dlist_move_head(bucket, &ct->cache_elem);
+		/* Record the last access timestamp */
+		ct->lastaccess = catcacheclock;
+
 		/*
 		 * If it's a positive entry, bump its refcount and return it. If it's
 		 * negative, we can report failure to the caller.
@@ -1425,6 +1449,61 @@ SearchCatCacheMiss(CatCache *cache,
 	return &ct->tuple;
 }
+/*
+ * CatCacheCleanupOldEntries - Remove infrequently-used entries
+ *
+ * Catcache entries happen to be left unused for a long time for several
+ * reasons. Remove such entries to prevent catcache from bloating. It is based
+ * on the similar algorithm with buffer eviction. Entries that are accessed
+ * several times in a certain period live longer than those that have had less
+ * access in the same duration.
+ */
+static bool
+CatCacheCleanupOldEntries(CatCache *cp)
+{
+	int			nremoved = 0;
+	int			i;
+	long		oldest_ts = catcacheclock;
+	uint64		prune_threshold = catcacheclock - prune_min_age_us;
+
+	/* Scan over the whole hash to find entries to remove */
+	for (i = 0 ; i < cp->cc_nbuckets ; i++)
+	{
+		dlist_mutable_iter iter;
+
+		dlist_foreach_modify(iter, &cp->cc_bucket[i])
+		{
+			CatCTup    *ct = dlist_container(CatCTup, cache_elem, iter.cur);
+
+			/* Don't remove referenced entries */
+			if (ct->refcount == 0 &&
+				(ct->c_list == NULL || ct->c_list->refcount == 0))
+			{
+				if (ct->lastaccess < prune_threshold)
+				{
+					CatCacheRemoveCTup(cp, ct);
+					nremoved++;
+
+					/* don't let the removed entry update oldest_ts */
+					continue;
+				}
+			}
+
+			/* update the oldest timestamp if the entry remains alive */
+			if (ct->lastaccess < oldest_ts)
+				oldest_ts = ct->lastaccess;
+		}
+	}
+
+	cp->cc_oldest_ts = oldest_ts;
+
+	if (nremoved > 0)
+		elog(DEBUG1, "pruning catalog cache id=%d for %s: removed %d / %d",
+			 cp->id, cp->cc_relname, nremoved, cp->cc_ntup + nremoved);
+
+	return nremoved > 0;
+}
+
 /*
  * ReleaseCatCache
  *
@@ -1888,6 +1967,7 @@ CatalogCacheCreateEntry(CatCache *cache, HeapTuple ntp, Datum *arguments,
 	ct->dead = false;
 	ct->negative = negative;
 	ct->hash_value = hashValue;
+	ct->lastaccess =
Re: Protect syscache from bloating with negative cache entries
Ah. It was obvious from the first. Sorry for the sloppy diagnosis.

At Fri, 20 Nov 2020 16:08:40 +0900 (JST), Kyotaro Horiguchi wrote in
> At Thu, 19 Nov 2020 15:23:05 +0900 (JST), Kyotaro Horiguchi wrote in
> > At Wed, 18 Nov 2020 21:42:02 -0800, Andres Freund wrote in
> > > Hi,
> > >
> > > On 2020-11-19 14:25:36 +0900, Kyotaro Horiguchi wrote:
> > > > # Creation, searching and expiration
> > > > master     :  6393.23 (100.0)
> > > > patched-off:  6527.94 (102.1)
> > > > patched-on : 15880.01 (248.4)
> > >
> > > What's the deal with this massive increase here?

catalog_cache_prune_min_age was set to 0 at the time, so almost all catcache entries are dropped at rehashing time. Most of the difference should be the time to search on the system catalog.

2020-11-20 16:25:25.988 LOG: database system is ready to accept connections
2020-11-20 16:26:48.504 LOG: Catcache reset
2020-11-20 16:26:48.504 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 257: 0.001500 ms
2020-11-20 16:26:48.504 LOG: rehashed catalog cache id 58 for pg_statistic; 257 tups, 256 buckets, 0.020748 ms
2020-11-20 16:26:48.505 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 513: 0.003221 ms
2020-11-20 16:26:48.505 LOG: rehashed catalog cache id 58 for pg_statistic; 513 tups, 512 buckets, 0.006962 ms
2020-11-20 16:26:48.505 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 1025: 0.006744 ms
2020-11-20 16:26:48.505 LOG: rehashed catalog cache id 58 for pg_statistic; 1025 tups, 1024 buckets, 0.009580 ms
2020-11-20 16:26:48.507 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 2049: 0.015683 ms
2020-11-20 16:26:48.507 LOG: rehashed catalog cache id 58 for pg_statistic; 2049 tups, 2048 buckets, 0.041008 ms
2020-11-20 16:26:48.509 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 4097: 0.042438 ms
2020-11-20 16:26:48.509 LOG: rehashed catalog cache id 58 for pg_statistic; 4097 tups, 4096 buckets, 0.077379 ms
2020-11-20 16:26:48.515 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 8193: 0.123798 ms
2020-11-20 16:26:48.515 LOG: rehashed catalog cache id 58 for pg_statistic; 8193 tups, 8192 buckets, 0.198505 ms
2020-11-20 16:26:48.525 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 16385: 0.180831 ms
2020-11-20 16:26:48.526 LOG: rehashed catalog cache id 58 for pg_statistic; 16385 tups, 16384 buckets, 0.361109 ms
2020-11-20 16:26:48.546 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 32769: 0.717899 ms
2020-11-20 16:26:48.547 LOG: rehashed catalog cache id 58 for pg_statistic; 32769 tups, 32768 buckets, 1.443587 ms
2020-11-20 16:26:48.588 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 65537: 1.204804 ms
2020-11-20 16:26:48.591 LOG: rehashed catalog cache id 58 for pg_statistic; 65537 tups, 65536 buckets, 3.069916 ms
2020-11-20 16:26:48.674 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 131073: 2.707709 ms
2020-11-20 16:26:48.681 LOG: rehashed catalog cache id 58 for pg_statistic; 131073 tups, 131072 buckets, 7.127622 ms
2020-11-20 16:26:48.848 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 262145: 5.895630 ms
2020-11-20 16:26:48.862 LOG: rehashed catalog cache id 58 for pg_statistic; 262145 tups, 262144 buckets, 13.433610 ms
2020-11-20 16:26:49.195 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 524289: 12.302632 ms
2020-11-20 16:26:49.223 LOG: rehashed catalog cache id 58 for pg_statistic; 524289 tups, 524288 buckets, 27.710900 ms
2020-11-20 16:26:49.937 LOG: pruning catalog cache id=58 for pg_statistic: removed 1001000 / 1048577: 66.062629 ms
2020-11-20 16:26:51.195 LOG: pruning catalog cache id=58 for pg_statistic: removed 1002001 / 1048577: 65.533468 ms
2020-11-20 16:26:52.413 LOG: pruning catalog cache id=58 for pg_statistic: removed 0 / 1048577: 25.623740 ms
2020-11-20 16:26:52.468 LOG: rehashed catalog cache id 58 for pg_statistic; 1048577 tups, 1048576 buckets, 54.314825 ms
2020-11-20 16:26:53.898 LOG: pruning catalog cache id=58 for pg_statistic: removed 2000999 / 2097153: 134.530582 ms
2020-11-20 16:26:56.404 LOG: pruning catalog cache id=58 for pg_statistic: removed 1002001 / 2097153: 111.634597 ms
2020-11-20 16:26:57.779 LOG: pruning catalog cache id=58 for pg_statistic: removed 2000999 / 2097153: 134.628430 ms
2020-11-20 16:27:00.389 LOG: pruning catalog cache id=58 for pg_statistic: removed 1002001 / 2097153: 147.221688 ms
2020-11-20 16:27:01.851 LOG: pruning catalog cache id=58 for pg_statistic: removed 2000999 / 2097153: 177.610820 ms

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Protect syscache from bloating with negative cache entries
At Thu, 19 Nov 2020 15:23:05 +0900 (JST), Kyotaro Horiguchi wrote in
> At Wed, 18 Nov 2020 21:42:02 -0800, Andres Freund wrote in
> > Hi,
> >
> > On 2020-11-19 14:25:36 +0900, Kyotaro Horiguchi wrote:
> > > # Creation, searching and expiration
> > > master     :  6393.23 (100.0)
> > > patched-off:  6527.94 (102.1)
> > > patched-on : 15880.01 (248.4)
> >
> > What's the deal with this massive increase here?
>
> CatCacheRemoveCTup(). If I replaced a call to the function in the
> cleanup function with dlist_delete(), the result changes as:
>
> master      : 6372.04 (100.0) (2)
> patched-off : 6464.97 (101.5) (2)
> patched-on  : 5354.42 ( 84.0) (2)
>
> We could boost the expiration if we reuse the "deleted" entry at the
> next entry creation.

That result should be bogus. It forgot to update cc_ntup..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Protect syscache from bloating with negative cache entries
At Wed, 18 Nov 2020 21:42:02 -0800, Andres Freund wrote in
> Hi,
>
> On 2020-11-19 14:25:36 +0900, Kyotaro Horiguchi wrote:
> > # Creation, searching and expiration
> > master     :  6393.23 (100.0)
> > patched-off:  6527.94 (102.1)
> > patched-on : 15880.01 (248.4)
>
> What's the deal with this massive increase here?

CatCacheRemoveCTup(). If I replaced a call to the function in the cleanup function with dlist_delete(), the result changes as:

master      : 6372.04 (100.0) (2)
patched-off : 6464.97 (101.5) (2)
patched-on  : 5354.42 ( 84.0) (2)

We could boost the expiration if we reuse the "deleted" entry at the next entry creation.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Protect syscache from bloating with negative cache entries
Hi,

On 2020-11-19 14:25:36 +0900, Kyotaro Horiguchi wrote:
> # Creation, searching and expiration
> master     :  6393.23 (100.0)
> patched-off:  6527.94 (102.1)
> patched-on : 15880.01 (248.4)

What's the deal with this massive increase here?

Greetings,

Andres Freund
Re: Protect syscache from bloating with negative cache entries
Thank you for the comments.

At Tue, 17 Nov 2020 16:22:54 -0500, Robert Haas wrote in
> On Tue, Nov 17, 2020 at 10:46 AM Heikki Linnakangas wrote:
> > 0.7% degradation is probably acceptable.
>
> I haven't looked at this patch in a while and I'm pleased with the way
> it seems to have been redesigned. It seems relatively simple and
> unlikely to cause big headaches. I would say that 0.7% is probably not
> acceptable on a general workload, but it seems fine on a benchmark

Sorry for the confusing notation: "-0.7% degradation" meant a +0.7% *gain*, which I think is measurement error. However, the next patch makes the catcache apparently *faster*, so the difference doesn't matter much..

> that is specifically designed to be a worst-case for this patch, which
> I gather is what's happening here. I think it would be nice if we
> could enable this feature by default. Does it cause a measurable
> regression on realistic workloads when enabled? I bet a default of 5
> or 10 minutes would help many users.
>
> One idea for improving things might be to move the "return
> immediately" tests in CatCacheCleanupOldEntries() to the caller, and
> only call this function if they indicate that there is some purpose.
> This would avoid the function call overhead when nothing can be done.
> Perhaps the two tests could be combined into one and simplified. Like,
> suppose the code looks (roughly) like this:
>
> if (catcacheclock >= time_at_which_we_can_prune)
>     CatCacheCleanupOldEntries(...);

The compiler removes the call (or inlines the function), but of course we can write it that way, and it shows the condition for calling the function better. The codelet above, however, ignores the return value of CatCacheCleanupOldEntries() itself: the function returns false when all "old" entries have been invalidated or explicitly removed, and we need to expand the hash in that case.
> To make it that simple, we want catcacheclock and
> time_at_which_we_can_prune to be stored as bare uint64 quantities so
> we don't need TimestampDifference(). And we want
> time_at_which_we_can_prune to be set to PG_UINT64_MAX when the feature
> is disabled. But those both seem like pretty achievable things... and
> it seems like the result would probably be faster than what you have
> now.

The time_at_which_we_can_prune is not global but catcache-local, and needs to change at the time catalog_cache_prune_min_age is changed. So the next version does the following:

-	if (CatCacheCleanupOldEntries(cp))
+	if (catcacheclock - cp->cc_oldest_ts > prune_min_age_us &&
+		CatCacheCleanupOldEntries(cp))

On the other hand, CatCacheCleanupOldEntries() can calculate the time_at_which_we_can_prune once at the beginning of the function. That makes the condition in the loop simpler:

-	TimestampDifference(ct->lastaccess, catcacheclock, &age, &us);
-
-	if (age > catalog_cache_prune_min_age)
+	if (ct->lastaccess < prune_threshold)
 	{

> + * per-statement basis and additionaly udpated periodically
>
> two words spelled wrong

Ugh. Fixed. Checked all spellings and found another misspelling.

> +void
> +assign_catalog_cache_prune_min_age(int newval, void *extra)
> +{
> +	catalog_cache_prune_min_age = newval;
> +}
>
> hmm, do we need this?

*That* assignment is actually useless, but the function is kept, and now it maintains the internal version of the GUC parameter (uint64 prune_min_age_us).

> +			/*
> +			 * Entries that are not accessed after the last pruning
> +			 * are removed in that seconds, and their lives are
> +			 * prolonged according to how many times they are accessed
> +			 * up to three times of the duration. We don't try shrink
> +			 * buckets since pruning effectively caps catcache
> +			 * expansion in the long term.
> +			 */
> +			ct->naccess = Min(2, ct->naccess);
>
> The code doesn't match the comment, it seems, because the limit here
> is 2, not 3. I wonder if this does anything anyway.
> My intuition is
> that when a catcache entry gets accessed at all it's probably likely
> to get accessed a bunch of times. If there are any meaningful
> thresholds here I'd expect us to be trying to distinguish things like
> 1000+ accesses vs. 100-1000 vs. 10-100 vs. 1-10. Or maybe we don't
> need to distinguish at all and can just have a single mark bit rather
> than a counter.

Agreed. Since I don't see a clear criterion for the threshold of the counter, I removed naccess and the related lines.

I made the following changes in the attached:

1. Removed naccess and related lines.
2. Moved the precheck condition out of CatCacheCleanupOldEntries() to RehashCatCache().
3. Used uint64 direct comparison instead of TimestampDifference().
4. Removed the CatCTup.dead flag.

Performance measurement on the attached showed a better result for searching, but perhaps a worse one for cache entry creation. Each number is the mean of 10 runs.

# Catcache (negative) entry creation
            : time(ms) (% to master)
master      : 3965.61 (100.0)
Re: Protect syscache from bloating with negative cache entries
At Tue, 17 Nov 2020 17:46:25 +0200, Heikki Linnakangas wrote in
> On 09/11/2020 11:34, Kyotaro Horiguchi wrote:
> > At Fri, 6 Nov 2020 10:42:15 +0200, Heikki Linnakangas wrote in
> > > Do you need the "ntaccess == 2" test? You could always increment the
> > > counter, and in the code that uses ntaccess to decide what to evict,
> > > treat all values >= 2 the same.
> > >
> > > Need to handle integer overflow somehow. Or maybe not: integer
> > > overflow is so infrequent that even if a hot syscache entry gets
> > > evicted prematurely because its ntaccess count wrapped around to 0, it
> > > will happen so rarely that it won't make any difference in practice.
> >
> > That relaxing simplifies the code significantly, but a significant
> > degradation by about 5% still exists.
> >
> > (SearchCatCacheInternal())
> > +	ct->naccess++;
> >!+	ct->lastaccess = catcacheclock;
> >
> > If I removed the second line above, the degradation disappears (-0.7%).
>
> 0.7% degradation is probably acceptable.

Sorry for the confusion: "-0.7% degradation" meant "+0.7% gain".

> > However, I don't find the corresponding numbers in the output
> > of perf. The sum of the numbers for the removed instructions is (0.02
> > + 0.28 = 0.3%). I don't think the degradation as the whole doesn't
> > always reflect to the instruction level profiling, but I'm stuck here,
> > anyway.
>
> Hmm. Some kind of cache miss effect, perhaps? offsetof(CatCTup, tuple) is

Shouldn't it be seen in the perf result?

> exactly 64 bytes currently, so any fields that you add after 'tuple' will go
> on a different cache line. Maybe it would help if you just move the new fields
> before 'tuple'.
>
> Making CatCTup smaller might help. Some ideas/observations:
>
> - The 'ct_magic' field is only used for assertion checks. Could remove it.

Ok, removed.

> - 4 Datums (32 bytes) are allocated for the keys, even though most catcaches
>   have fewer key columns.
> - In the current syscaches, keys[2] and keys[3] are only used to store 32-bit
>   oids or some other smaller fields. Allocating a full 64-bit Datum for them
>   wastes memory.

It seems to be the last resort.

> - You could move the dead flag at the end of the struct or remove it
>   altogether, with the change I mentioned earlier to not keep dead items in
>   the buckets

This seems most promising, so I did this. One annoyance is that we need to know whether a catcache tuple is invalidated or not to judge whether to remove it. I used CatCTup.cache_elem.prev to signal that in the next version.

> - You could steal a few bits for dead/negative flags from some other field. Use
>   special values for tuple.t_len for them or something.

I stole the MSB of refcount as the negative flag, but the bit-masking operations seem to make the function slower: benchmark 2 gets slower by around +2% in total.

> With some of these tricks, you could shrink CatCTup so that the new lastaccess
> and naccess fields would fit in the same cacheline.
>
> That said, I think this is good enough performance-wise as it is. So if we
> want to improve performance in general, that can be a separate patch.

Removing CatCTup.dead increased the performance of catcache search significantly, but catcache entry creation gets slower for uncertain reasons..

(Continues to a reply to Robert's comment)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Protect syscache from bloating with negative cache entries
On Tue, Nov 17, 2020 at 10:46 AM Heikki Linnakangas wrote:
> 0.7% degradation is probably acceptable.

I haven't looked at this patch in a while and I'm pleased with the way it seems to have been redesigned. It seems relatively simple and unlikely to cause big headaches. I would say that 0.7% is probably not acceptable on a general workload, but it seems fine on a benchmark that is specifically designed to be a worst-case for this patch, which I gather is what's happening here. I think it would be nice if we could enable this feature by default. Does it cause a measurable regression on realistic workloads when enabled? I bet a default of 5 or 10 minutes would help many users.

One idea for improving things might be to move the "return immediately" tests in CatCacheCleanupOldEntries() to the caller, and only call this function if they indicate that there is some purpose. This would avoid the function call overhead when nothing can be done. Perhaps the two tests could be combined into one and simplified. Like, suppose the code looks (roughly) like this:

if (catcacheclock >= time_at_which_we_can_prune)
    CatCacheCleanupOldEntries(...);

To make it that simple, we want catcacheclock and time_at_which_we_can_prune to be stored as bare uint64 quantities so we don't need TimestampDifference(). And we want time_at_which_we_can_prune to be set to PG_UINT64_MAX when the feature is disabled. But those both seem like pretty achievable things... and it seems like the result would probably be faster than what you have now.

+ * per-statement basis and additionaly udpated periodically

two words spelled wrong

+void
+assign_catalog_cache_prune_min_age(int newval, void *extra)
+{
+	catalog_cache_prune_min_age = newval;
+}

hmm, do we need this?

+			/*
+			 * Entries that are not accessed after the last pruning
+			 * are removed in that seconds, and their lives are
+			 * prolonged according to how many times they are accessed
+			 * up to three times of the duration. We don't try shrink
+			 * buckets since pruning effectively caps catcache
+			 * expansion in the long term.
+			 */
+			ct->naccess = Min(2, ct->naccess);

The code doesn't match the comment, it seems, because the limit here is 2, not 3. I wonder if this does anything anyway. My intuition is that when a catcache entry gets accessed at all it's probably likely to get accessed a bunch of times. If there are any meaningful thresholds here I'd expect us to be trying to distinguish things like 1000+ accesses vs. 100-1000 vs. 10-100 vs. 1-10. Or maybe we don't need to distinguish at all and can just have a single mark bit rather than a counter.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Protect syscache from bloating with negative cache entries
On 09/11/2020 11:34, Kyotaro Horiguchi wrote:
> At Fri, 6 Nov 2020 10:42:15 +0200, Heikki Linnakangas wrote in
> > Do you need the "ntaccess == 2" test? You could always increment the
> > counter, and in the code that uses ntaccess to decide what to evict,
> > treat all values >= 2 the same.
> >
> > Need to handle integer overflow somehow. Or maybe not: integer
> > overflow is so infrequent that even if a hot syscache entry gets
> > evicted prematurely because its ntaccess count wrapped around to 0, it
> > will happen so rarely that it won't make any difference in practice.
>
> That relaxing simplifies the code significantly, but a significant
> degradation by about 5% still exists.
>
> (SearchCatCacheInternal())
> +	ct->naccess++;
>!+	ct->lastaccess = catcacheclock;
>
> If I removed the second line above, the degradation disappears (-0.7%).

0.7% degradation is probably acceptable.

> However, I don't find the corresponding numbers in the output
> of perf. The sum of the numbers for the removed instructions is (0.02
> + 0.28 = 0.3%). I don't think the degradation as the whole doesn't
> always reflect to the instruction level profiling, but I'm stuck here,
> anyway.

Hmm. Some kind of cache miss effect, perhaps? offsetof(CatCTup, tuple) is exactly 64 bytes currently, so any fields that you add after 'tuple' will go on a different cache line. Maybe it would help if you just move the new fields before 'tuple'.

Making CatCTup smaller might help. Some ideas/observations:

- The 'ct_magic' field is only used for assertion checks. Could remove it.
- 4 Datums (32 bytes) are allocated for the keys, even though most catcaches have fewer key columns.
- In the current syscaches, keys[2] and keys[3] are only used to store 32-bit oids or some other smaller fields. Allocating a full 64-bit Datum for them wastes memory.
- You could move the dead flag to the end of the struct, or remove it altogether with the change I mentioned earlier to not keep dead items in the buckets.
- You could steal a few bits for dead/negative flags from some other field. Use special values for tuple.t_len for them or something.

With some of these tricks, you could shrink CatCTup so that the new lastaccess and naccess fields would fit in the same cacheline.

That said, I think this is good enough performance-wise as it is. So if we want to improve performance in general, that can be a separate patch.

- Heikki
Re: Protect syscache from bloating with negative cache entries
At Fri, 6 Nov 2020 10:42:15 +0200, Heikki Linnakangas wrote in
> Do you need the "ntaccess == 2" test? You could always increment the
> counter, and in the code that uses ntaccess to decide what to evict,
> treat all values >= 2 the same.
>
> Need to handle integer overflow somehow. Or maybe not: integer
> overflow is so infrequent that even if a hot syscache entry gets
> evicted prematurely because its ntaccess count wrapped around to 0, it
> will happen so rarely that it won't make any difference in practice.

That relaxing simplifies the code significantly, but a significant degradation of about 5% still exists.

(SearchCatCacheInternal())
+	ct->naccess++;
!+	ct->lastaccess = catcacheclock;

If I remove the second line above, the degradation disappears (-0.7%). However, I don't find the corresponding numbers in the output of perf: the sum of the numbers for the removed instructions is only 0.02 + 0.28 = 0.3%. The overall degradation does not always show up in instruction-level profiling, but I'm stuck here, anyway.

% samples: master | p2 | patched    (p2 = patched minus "ct->lastaccess = catcacheclock")

 0.47 | 0.27 | 0.17 |        mov    %rbx,0x8(%rbp)
      |      |      |      SearchCatCacheInternal():
      |      |      |        ct->naccess++;
      |      |      |        ct->lastaccess = catcacheclock;
    - |    - | 0.02 | 10f:   mov    catcacheclock,%rax
      |      |      |        ct->naccess++;
    - | 0.96 | 1.00 |        addl   $0x1,0x14(%rbx)
      |      |      |        return NULL;
    - | 0.11 | 0.16 |        xor    %ebp,%ebp
      |      |      |        if (!ct->negative)
 0.27 | 0.30 | 0.03 |        cmpb   $0x0,0x21(%rbx)
      |      |      |        ct->lastaccess = catcacheclock;
    - |      | 0.28 |        mov    %rax,0x18(%rbx)
      |      |      |        if (!ct->negative)
 0.34 | 0.08 | 0.59 |        ↓ jne  149

For your information, the same table for a bit wider range follows.
% samples: master | p2 | patched    (p2 = patched minus "ct->lastaccess = catcacheclock")

      |      |      |      dlist_foreach(iter, bucket)
 6.91 | 7.06 | 5.89 |        mov    0x8(%rbp),%rbx
 0.78 | 0.73 | 0.81 |        test   %rbx,%rbx
      |      |      |        ↓ je   160
      |      |      |        cmp    %rbx,%rbp
 0.46 | 0.52 | 0.39 |        ↓ jne  9d
      |      |      |        ↓ jmpq 160
      |      |      |        nop
 5.68 | 5.54 | 6.03 | 90:    mov    0x8(%rbx),%rbx
 1.44 | 1.42 | 1.43 |        cmp    %rbx,%rbp
      |      |      |        ↓ je   160
      |      |      |      {
      |      |      |        ct = dlist_container(CatCTup, cache_elem, iter.cur);
      |      |      |
      |      |      |        if (ct->dead)
30.36 |30.97 |31.48 | 9d:    cmpb   $0x0,0x20(%rbx)
 2.63 | 2.60 | 2.69 |        ↑ jne  90
      |      |      |          continue; /* ignore dead entries */
      |      |      |
      |      |      |        if (ct->hash_value != hashValue)
 1.41 | 1.37 | 1.35 |        cmp    -0x24(%rbx),%edx
 3.19 | 2.97 | 2.87 |        ↑ jne  90
 7.17 | 5.53 | 6.89 |        mov    %r13,%rsi
 0.02 | 0.04 | 0.04 |        xor    %r12d,%r12d
 3.00 | 2.98 | 2.95 |        ↓ jmp  b5
 0.15 | 0.61 | 0.20 | b0:    mov    0x10(%rsp,%r12,1),%rsi
 6.58 | 5.04 | 5.95 | b5:    mov    %ecx,0xc(%rsp)
      |      |      |      CatalogCacheCompareTuple():
      |      |      |        if (!(cc_fastequal[i]) (cachekeys[i], searchkeys[i]))
 1.51 | 0.92 | 1.66 |        mov    -0x20(%rbx,%r12,1),%rdi
 0.54 | 1.64 | 0.58 |        mov    %edx,0x8(%rsp)
 3.78 | 3.11 | 3.86 |        → callq *0x38(%r14,%r12,1)
 0.43 | 2.30 | 0.34 |        mov    0x8(%rsp),%edx
 0.20 | 0.94 | 0.25 |        mov    0xc(%rsp),%ecx
 0.44 | 0.41 | 0.44 |        test   %al,%al
      |      |      |        ↑ je   90
      |      |      |        for (i = 0; i < nkeys; i++)
 2.28 | 1.07 | 2.26 |        add    $0x8,%r12
 0.08 | 0.23 | 0.07 |        cmp    $0x18,%r12
 0.11 | 0.64 | 0.10 |        ↑ jne  b0
      |      |      |      dlist_move_head():
      |      |      |       */
      |      |      |      static inline void
      |      |      |      dlist_move_head(dlist_head *head, dlist_node *node)
      |      |      |      {
      |      |      |        /* fast path if it's already at the head */
      |      |      |        if (head->head.next == node)
 0.08 | 0.61 | 0.04 |        cmp    0x8(%rbp),%rbx
 0.02 | 0.10 | 0.00 |        ↓ je   10f
      |      |      |          return;
      |      |      |
      |      |      |        dlist_delete(node);
 0.01 | 0.20 | 0.06 |        mov    0x8(%rbx),%rax
      |      |      |      dlist_delete():
      |      |      |        node->prev->next = node->next;
 0.75 | 0.13 | 0.72 |        mov    (%rbx),%rdx
 2.89 | 3.42 | 2.22 |        mov    %rax,0x8(%rdx)
      |      |      |        node->next->prev = node->prev;
 0.01 | 0.09 | 0.00 |        mov
Re: Protect syscache from bloating with negative cache entries
At Mon, 09 Nov 2020 11:13:31 +0900 (JST), Kyotaro Horiguchi wrote in > Now the branch for counter-increment is removed. For similar > branches for counter-decrement side in CatCacheCleanupOldEntries(), > Min() is compiled into cmovbe and a branch was removed. Mmm. Sorry, I sent this by a mistake. Please ignore it. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Protect syscache from bloating with negative cache entries
At Fri, 6 Nov 2020 10:42:15 +0200, Heikki Linnakangas wrote in
> On 06/11/2020 10:24, Kyotaro Horiguchi wrote:
> > Thank you for the comment!
> > First off, I thought that I managed to eliminate the degradation
> > observed on the previous versions, but significant degradation (1.1%
> > slower) is still seen in one case.
>
> One thing to keep in mind with micro-benchmarks like this is that even
> completely unrelated code changes can change the layout of the code in
> memory, which in turn can affect CPU caching effects in surprising
> ways. If you're lucky, you can see 1-5% differences just by adding a
> function that's never called, for example, if it happens to move other
> code in memory so that some hot codepath or struct gets split across
> CPU cache lines. It can be infuriating when benchmarking.

True. I sometimes had to make distclean to stabilize such benchmarks..

> > At Thu, 5 Nov 2020 11:09:09 +0200, Heikki Linnakangas wrote in
> >
> > (A) original test patch
> >
> > I naively thought that the code path is too short to bury the
> > degradation of an additional few instructions. Actually I measured
> > performance again with the same patch set on the current master and
> > had more or less the same result.
> >
> > master 8195.58 ms, patched 8817.40 ms: +10.75%
> >
> > However, I noticed that the additional call was a recursive call, and a
> > jmp inserted for the recursive call seems to take significant
> > time. After avoiding the recursive call, the difference reduced to
> > +0.96% (master 8268.71 ms : patched 8348.30 ms).
> >
> > Just two instructions below are inserted in this case, which looks
> > reasonable.
> >
> >   8720ff <+31>: cmpl  $0x,0x4ba942(%rip)   # 0xd2ca48
> >   872106 <+38>: jl    0x872240             (call to a function)
>
> That's interesting. I think a 1% degradation would be acceptable.
>
> I think we'd like to enable this feature by default though, so the
> performance when it's enabled is also very important.

> > (C) inserting bare counter-update code without a branch
> >
> > > Do we actually need a branch there? If I understand correctly, the
> > > point is to bump up a usage counter on the catcache entry. You could
> > > increment the counter unconditionally, even if the feature is not
> > > used, and avoid the branch that way.
> >
> > That change causes 4.9% degradation, which is worse than having a
> > branch.
> >
> > master 8364.54 ms, patched 8666.86 ms (+4.9%)
> >
> > The additional instructions follow.
> >
> > + 8721ab <+203>: mov    0x30(%rbx),%eax      # %eax = ct->naccess
> > + 8721ae <+206>: mov    $0x2,%edx
> > + 8721b3 <+211>: add    $0x1,%eax            # %eax++
> > + 8721b6 <+214>: cmove  %edx,%eax            # if %eax == 0 then %eax = 2
> > + 8721bf <+223>: mov    %eax,0x30(%rbx)      # ct->naccess = %eax
> > + 8721c2 <+226>: mov    0x4cfe9f(%rip),%rax  # 0xd42068
> > + 8721c9 <+233>: mov    %rax,0x38(%rbx)      # ct->lastaccess = %rax
>
> Do you need the "ntaccess == 2" test? You could always increment the
> counter, and in the code that uses ntaccess to decide what to evict,
> treat all values >= 2 the same.
>
> Need to handle integer overflow somehow. Or maybe not: integer
> overflow is so infrequent that even if a hot syscache entry gets
> evicted prematurely because its ntaccess count wrapped around to 0, it
> will happen so rarely that it won't make any difference in practice.

Agreed. I had prioritized completely avoiding degradation on the normal path, but relaxing that restriction to 1% or so makes the code far simpler and the expiration path significantly faster.

Now the branch for the counter increment is removed. For the similar branches on the counter-decrement side in CatCacheCleanupOldEntries(), Min() is compiled into cmovbe and a branch was removed.
Re: Protect syscache from bloating with negative cache entries
On 06/11/2020 10:24, Kyotaro Horiguchi wrote:
> Thank you for the comment!
> First off, I thought that I managed to eliminate the degradation
> observed on the previous versions, but significant degradation (1.1%
> slower) is still seen in one case.

One thing to keep in mind with micro-benchmarks like this is that even completely unrelated code changes can change the layout of the code in memory, which in turn can affect CPU caching effects in surprising ways. If you're lucky, you can see 1-5% differences just by adding a function that's never called, for example, if it happens to move other code in memory so that some hot codepath or struct gets split across CPU cache lines. It can be infuriating when benchmarking.

> At Thu, 5 Nov 2020 11:09:09 +0200, Heikki Linnakangas wrote in
>
> (A) original test patch
>
> I naively thought that the code path is too short to bury the
> degradation of an additional few instructions. Actually I measured
> performance again with the same patch set on the current master and
> had more or less the same result.
>
> master 8195.58 ms, patched 8817.40 ms: +10.75%
>
> However, I noticed that the additional call was a recursive call, and a
> jmp inserted for the recursive call seems to take significant
> time. After avoiding the recursive call, the difference reduced to
> +0.96% (master 8268.71 ms : patched 8348.30 ms).
>
> Just two instructions below are inserted in this case, which looks
> reasonable.
>
>   8720ff <+31>: cmpl  $0x,0x4ba942(%rip)  # 0xd2ca48
>   872106 <+38>: jl    0x872240            (call to a function)

That's interesting. I think a 1% degradation would be acceptable.

I think we'd like to enable this feature by default though, so the performance when it's enabled is also very important.

> (C) inserting bare counter-update code without a branch
>
> > Do we actually need a branch there? If I understand correctly, the
> > point is to bump up a usage counter on the catcache entry. You could
> > increment the counter unconditionally, even if the feature is not
> > used, and avoid the branch that way.
>
> That change causes 4.9% degradation, which is worse than having a
> branch.
>
> master 8364.54 ms, patched 8666.86 ms (+4.9%)
>
> The additional instructions follow.
>
> + 8721ab <+203>: mov    0x30(%rbx),%eax      # %eax = ct->naccess
> + 8721ae <+206>: mov    $0x2,%edx
> + 8721b3 <+211>: add    $0x1,%eax            # %eax++
> + 8721b6 <+214>: cmove  %edx,%eax            # if %eax == 0 then %eax = 2
> + 8721bf <+223>: mov    %eax,0x30(%rbx)      # ct->naccess = %eax
> + 8721c2 <+226>: mov    0x4cfe9f(%rip),%rax  # 0xd42068
> + 8721c9 <+233>: mov    %rax,0x38(%rbx)      # ct->lastaccess = %rax

Do you need the "ntaccess == 2" test? You could always increment the counter, and in the code that uses ntaccess to decide what to evict, treat all values >= 2 the same.

Need to handle integer overflow somehow. Or maybe not: integer overflow is so infrequent that even if a hot syscache entry gets evicted prematurely because its ntaccess count wrapped around to 0, it will happen so rarely that it won't make any difference in practice.

- Heikki
Re: Protect syscache from bloating with negative cache entries
me> First off, I thought that I managed to eliminate the degradation
me> observed on the previous versions, but significant degradation (1.1%
me> slower) is still seen in one case.

While benchmarking with many patterns, I noticed that calling
CatCacheCleanupOldEntries() slows down catcache search significantly
even if the function does almost nothing. Oddly enough, the
degradation got larger when I removed the counter-updating code from
SearchCatCacheInternal. It seems that RehashCatCache is called far
more frequently than I thought, and CatCacheCleanupOldEntries was
suffering the branch penalty. The degradation vanished with a likely()
attached to the condition. On the contrary, the patched version is now
constantly slightly faster than master.

For now, I measured the patch with the three access patterns the
catcachebench was designed for.

             master       patched-off         patched-on(300s)
  test 1   3898.18ms   3896.11ms (-0.1%)    3889.44ms (- 0.2%)
  test 2   8013.37ms   8098.51ms (+1.1%)    8640.63ms (+ 7.8%)
  test 3   6146.95ms   6147.91ms (+0.0%)   15466   ms (+152  %)

  master     : This patch is not applied.
  patched-off: This patch is applied and catalog_cache_prune_min_age = -1
  patched-on : This patch is applied and catalog_cache_prune_min_age = 0

  test 1: Creates many negative entries in STATRELATTINH
          (expiration doesn't happen)
  test 2: Repeatedly fetches several negative entries many times.
  test 3: test 1 with expiration happening.

The result looks far better, but test 2 still shows a small
degradation... I'll continue investigating it.

regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 9516267f0e2943cf955cbbfe5133c13c36288ee6 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Fri, 6 Nov 2020 17:27:18 +0900
Subject: [PATCH v4] CatCache expiration feature

---
 src/backend/access/transam/xact.c  |   3 +
 src/backend/utils/cache/catcache.c | 125 +
 src/backend/utils/misc/guc.c       |  12 +++
 src/include/utils/catcache.h       |  20 +
 4 files changed, 160 insertions(+)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index af6afcebb1..a246fcc4c0 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1086,6 +1086,9 @@ static void
 AtStart_Cache(void)
 {
 	AcceptInvalidationMessages();
+
+	if (xactStartTimestamp != 0)
+		SetCatCacheClock(xactStartTimestamp);
 }
 
 /*
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 3613ae5f44..f63224bfd5 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -38,6 +38,7 @@
 #include "utils/rel.h"
 #include "utils/resowner_private.h"
 #include "utils/syscache.h"
+#include "utils/timestamp.h"
 
 /* #define CACHEDEBUG */	/* turns DEBUG elogs on */
@@ -60,9 +61,18 @@
 #define CACHE_elog(...)
 #endif
 
+/*
+ * GUC variable to define the minimum age of entries that will be considered
+ * to be evicted in seconds. -1 to disable the feature.
+ */
+int			catalog_cache_prune_min_age = -1;
+
 /* Cache management header --- pointer is NULL until created */
 static CatCacheHeader *CacheHdr = NULL;
 
 /* Clock for the last accessed time of a catcache entry.
 */
+TimestampTz catcacheclock = 0;
+
 static inline HeapTuple SearchCatCacheInternal(CatCache *cache,
 					int nkeys,
 					Datum v1, Datum v2,
@@ -74,6 +84,7 @@ static pg_noinline HeapTuple SearchCatCacheMiss(CatCache *cache,
 					Index hashIndex,
 					Datum v1, Datum v2, Datum v3, Datum v4);
+static bool CatCacheCleanupOldEntries(CatCache *cp);
 static uint32 CatalogCacheComputeHashValue(CatCache *cache, int nkeys,
 					Datum v1, Datum v2, Datum v3, Datum v4);
@@ -99,6 +110,12 @@ static void CatCacheFreeKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 static void CatCacheCopyKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 					Datum *srckeys, Datum *dstkeys);
+/* GUC assign function */
+void
+assign_catalog_cache_prune_min_age(int newval, void *extra)
+{
+	catalog_cache_prune_min_age = newval;
+}
 
 /*
  * internal support functions
@@ -863,6 +880,10 @@ RehashCatCache(CatCache *cp)
 	int		newnbuckets;
 	int		i;
 
+	/* try removing old entries before expanding hash */
+	if (CatCacheCleanupOldEntries(cp))
+		return;
+
 	elog(DEBUG1, "rehashing catalog cache id %d for %s; %d tups, %d buckets",
 		 cp->id, cp->cc_relname, cp->cc_ntup, cp->cc_nbuckets);
@@ -1264,6 +1285,20 @@ SearchCatCacheInternal(CatCache *cache,
 	 */
 	dlist_move_head(bucket, &ct->cache_elem);
 
+	/*
+	 * Prolong the life of this entry.  Since we want to run as few
+	 * instructions as possible and want the branch to be predictable for
+	 * performance reasons, we don't put a strict cap on the counter.  All
+	 * numbers above 1 will be regarded as 2 in CatCacheCleanupOldEntries().
+	 */
+	if (unlikely(catalog_cache_prune_min_age >= 0))
+	{
+		ct->naccess++;
+		if (unlikely(ct->naccess
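The unlikely() hint in the hunk above (and the likely() fix mentioned earlier in the thread) is PostgreSQL's wrapper around GCC's __builtin_expect, defined in src/include/c.h. A self-contained sketch of the pattern, with a toy function standing in for the hot path:

```c
/* GCC-style branch-prediction hints, falling back to no-ops elsewhere. */
#if defined(__GNUC__)
#define likely(x)   __builtin_expect((x) != 0, 1)
#define unlikely(x) __builtin_expect((x) != 0, 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/*
 * Toy stand-in for the hot path: the hint tells the compiler the
 * pruning branch is expected to be rare, so the common path stays
 * fall-through and branch-predictor friendly.
 */
static int
bump_if_pruning_enabled(int prune_min_age, int counter)
{
    if (unlikely(prune_min_age >= 0))
        counter++;              /* rare bookkeeping path */
    return counter;
}
```

The hint changes only code layout and predicted direction, not semantics, so the function behaves identically with or without `__GNUC__`.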
Re: Protect syscache from bloating with negative cache entries
Thank you for the comment!

First off, I thought that I managed to eliminate the degradation
observed on the previous versions, but significant degradation (1.1%
slower) is still seen in one case. Anyway, before sending the new
patch, let me just answer the comments.

At Thu, 5 Nov 2020 11:09:09 +0200, Heikki Linnakangas wrote in
> On 19/11/2019 12:48, Kyotaro Horiguchi wrote:
> > 1. Inserting a branch in
> > SearchCatCacheInternal. (CatCache_Pattern_1.patch)
> >
> > This is the most straightforward way to add an alternative feature.
> >
> > pattern 1 | 8459.73 | 28.15  # 9% (>> 1%) slower than 7757.58
> > pattern 1 | 8504.83 | 55.61
> > pattern 1 | 8541.81 | 41.56
> > pattern 1 | 8552.20 | 27.99
> > master    | 7757.58 | 22.65
> > master    | 7801.32 | 20.64
> > master    | 7839.57 | 25.28
> > master    | 7925.30 | 38.84
> >
> > It's so slow that it cannot be used.
>
> This is very surprising. A branch that's never taken ought to be
> predicted by the CPU's branch-predictor, and be very cheap.

(A) original test patch

I naively thought that the code path is too short to bury the
degradation of a few additional instructions. Actually I measured
performance again with the same patch set on the current master and
had more or less the same result.

  master 8195.58ms, patched 8817.40ms: +10.75%

However, I noticed that the additional call was a recursive call, and
a jmp inserted for the recursive call seems to take significant time.
After avoiding the recursive call, the difference reduced to +0.96%
(master 8268.71ms : patched 8348.30ms).

Just the two instructions below are inserted in this case, which looks
reasonable.

  8720ff <+31>: cmpl $0x0,0x4ba942(%rip)   # 0xd2ca48
  872106 <+38>: jl   0x872240              # (call to a function)

(C) inserting bare counter-update code without a branch

> Do we actually need a branch there? If I understand correctly, the
> point is to bump up a usage counter on the catcache entry.
> You could increment the counter unconditionally, even if the feature
> is not used, and avoid the branch that way.

That change causes 4.9% degradation, which is worse than having a
branch.

  master 8364.54ms, patched 8666.86ms (+4.9%)

The additional instructions follow.

  + 8721ab <+203>: mov   0x30(%rbx),%eax      # %eax = ct->naccess
  + 8721ae <+206>: mov   $0x2,%edx
  + 8721b3 <+211>: add   $0x1,%eax            # %eax++
  + 8721b6 <+214>: cmove %edx,%eax            # if %eax == 0 then %eax = 2
  + 8721bf <+223>: mov   %eax,0x30(%rbx)      # ct->naccess = %eax
  + 8721c2 <+226>: mov   0x4cfe9f(%rip),%rax  # 0xd42068
  + 8721c9 <+233>: mov   %rax,0x38(%rbx)      # ct->lastaccess = %rax

(D) naively branching then updating, again.

Come to think of it, I measured the same with a branch again,
specifically (it showed significant degradation before, in my memory):

  dlist_move_head(bucket, &ct->cache_elem);
+ if (catalog_cache_prune_min_age < -1)  /* never true */
+ {
+     (counter update)
+ }

And I had effectively the same numbers from both master and patched.

  master 8066.93ms, patched 8052.37ms (-0.18%)

The above branching inserts the same two instructions as (B), just in
a different place, but the result differs, for a reason uncertain to
me.

  + 8721bb <+203>: cmpl $0x0,0x4bb886(%rip)  #
  + 8721c2 <+210>: jl   0x872208

I'm not sure why, but the patched version beats the master by a small
difference. Anyway this new result suggests that the compiler might
have got smarter than before?

(E) bumping up in ReleaseCatCache() (won't work)

> Another thought is to bump up the usage counter in ReleaseCatCache(),
> and only when the refcount reaches zero. That might be somewhat
> cheaper, if it's a common pattern to acquire additional leases on an
> entry that's already referenced.
>
> Yet another thought is to replace 'refcount' with an 'acquirecount'
> and 'releasecount'. In SearchCatCacheInternal(), increment
> acquirecount, and in ReleaseCatCache, increment releasecount. When
> they are equal, the entry is not in use.
> Now you have a counter that gets incremented on every access, with
> the same number of CPU instructions in the hot paths as we have
> today.

These don't work for negative caches, since the corresponding tuples
are never released.

(F) removing less-significant code.

> Or maybe there are some other ways we could micro-optimize
> SearchCatCacheInternal(), to buy back the slowdown that this feature

Yeah, I thought of that in the beginning. (I removed dlist_move_head()
at the time.) But the most difficult aspect of this approach is that I
cannot tell whether the modification would never cause degradation or
not.

> would add? For example, you could remove the "if (cl->dead)
> continue;" check, if dead entries were kept out of the hash buckets.
> Or maybe the catctup struct could be made slightly smaller somehow,
> so that it would fit more comfortably in a single cache line.

As a trial, I removed that code and added
Re: Protect syscache from bloating with negative cache entries
On 19/11/2019 12:48, Kyotaro Horiguchi wrote:
> 1. Inserting a branch in
> SearchCatCacheInternal. (CatCache_Pattern_1.patch)
>
> This is the most straightforward way to add an alternative feature.
>
> pattern 1 | 8459.73 | 28.15  # 9% (>> 1%) slower than 7757.58
> pattern 1 | 8504.83 | 55.61
> pattern 1 | 8541.81 | 41.56
> pattern 1 | 8552.20 | 27.99
> master    | 7757.58 | 22.65
> master    | 7801.32 | 20.64
> master    | 7839.57 | 25.28
> master    | 7925.30 | 38.84
>
> It's so slow that it cannot be used.

This is very surprising. A branch that's never taken ought to be
predicted by the CPU's branch-predictor, and be very cheap.

Do we actually need a branch there? If I understand correctly, the
point is to bump up a usage counter on the catcache entry. You could
increment the counter unconditionally, even if the feature is not
used, and avoid the branch that way.

Another thought is to bump up the usage counter in ReleaseCatCache(),
and only when the refcount reaches zero. That might be somewhat
cheaper, if it's a common pattern to acquire additional leases on an
entry that's already referenced.

Yet another thought is to replace 'refcount' with an 'acquirecount'
and 'releasecount'. In SearchCatCacheInternal(), increment
acquirecount, and in ReleaseCatCache, increment releasecount. When
they are equal, the entry is not in use. Now you have a counter that
gets incremented on every access, with the same number of CPU
instructions in the hot paths as we have today.

Or maybe there are some other ways we could micro-optimize
SearchCatCacheInternal(), to buy back the slowdown that this feature
would add? For example, you could remove the "if (cl->dead) continue;"
check, if dead entries were kept out of the hash buckets. Or maybe the
catctup struct could be made slightly smaller somehow, so that it
would fit more comfortably in a single cache line.

My point is that I don't think we want to complicate the code much for
this. All the indirection stuff seems over-engineered for this. Let's
find a way to keep it simple.
- Heikki
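The acquirecount/releasecount idea above can be sketched as follows. This is an illustration of the scheme with hypothetical names, not actual PostgreSQL code:

```c
/*
 * Hypothetical two-counter replacement for refcount: the entry is
 * unreferenced exactly when the two counters agree, and acquirecount
 * doubles as an access-frequency statistic at no extra cost in the
 * hot paths.
 */
typedef struct
{
    unsigned long acquirecount;   /* bumped in SearchCatCacheInternal() */
    unsigned long releasecount;   /* bumped in ReleaseCatCache() */
} catctup_sketch;

static void
cat_acquire(catctup_sketch *ct)
{
    ct->acquirecount++;
}

static void
cat_release(catctup_sketch *ct)
{
    ct->releasecount++;
}

/* true while any caller still holds a reference */
static int
cat_in_use(const catctup_sketch *ct)
{
    return ct->acquirecount != ct->releasecount;
}
```

As the thread notes, this scheme doesn't help negative entries, whose tuples are never acquired and released in the first place.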
Re: Protect syscache from bloating with negative cache entries
Hello.

The attached is the version that is compacted from the previous
version.

At Thu, 01 Oct 2020 16:47:18 +0900 (JST), Kyotaro Horiguchi wrote in
> This is the rebased version.

It suddenly occurred to me that constant parameters to inline
functions enable optimization. Up to the previous version I had split
SearchCatCacheInternal() into two almost-identical functions,
SearchCatCacheInternalb() and -e(), but this time I merged the two
functions back into the original function and added a parameter
"do_expire". Then I compared the two derived functions
SearchCatCache1b and SearchCatCache1e at the machine-code level and
confirmed that the expiration-related code is eliminated from the
former.

The lines prefixed by '+' are the instructions corresponding to the
following C code, which are eliminated in SearchCatCache1b.

  2770 :
  2770:   push  %r15
  2772:   push  %r14
  ...
  2849:   mov   %rbp,(%rbx)
  284c:   mov   %rbx,(%rax)
  284f:   mov   %rbx,0x8(%rbp)
+ 2853:   mov   0x30(%rbx),%eax   # %eax = ct->naccess
+ 2856:   mov   $0x2,%edx
+ 285b:   add   $0x1,%eax         # ct->naccess++
+ 285e:   cmove %edx,%eax         # if (ct->naccess == 0) %eax = 2
  2861:   xor   %ebp,%ebp
  2863:   cmpb  $0x0,0x15(%rbx)   # (if (!ct->negative))
+ 2867:   mov   %eax,0x30(%rbx)   # ct->naccess = %eax
+ 286a:   mov   0x0(%rip),%rax    # %rax = catcacheclock
+ 2871:   mov   %rax,0x38(%rbx)   # ct->lastaccess = %rax
  2875:   jne   289a
  2877:   mov   0x0(%rip),%rdi

> if (do_expire)
> {
>     ct->naccess++;
>     if (unlikely(ct->naccess == 0))
>         ct->naccess = 2;
>     ct->lastaccess = catcacheclock;
> }

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 5ac54752a8c4f555b3ccc405c65e74614057bf34 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Fri, 10 Jan 2020 15:02:26 +0900
Subject: [PATCH v3 1/3] base_change

Adds struct members needed by the catcache expiration feature and a
GUC variable that controls the behavior of the feature. No substantial
code is added yet. If the mere existence of these variables causes
degradation, benchmarking after this patch will show it.
---
 src/backend/access/transam/xact.c  |  3 +++
 src/backend/utils/cache/catcache.c | 15 +++
 src/backend/utils/misc/guc.c       | 13 +
 src/include/utils/catcache.h       | 20
 4 files changed, 51 insertions(+)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index af6afcebb1..1040713955 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1085,6 +1085,9 @@ ForceSyncCommit(void)
 static void
 AtStart_Cache(void)
 {
+	if (xactStartTimestamp != 0)
+		SetCatCacheClock(xactStartTimestamp);
+
 	AcceptInvalidationMessages();
 }
 
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 3613ae5f44..c5bad63aac 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -60,9 +60,18 @@
 #define CACHE_elog(...)
 #endif
 
+/*
+ * GUC variable to define the minimum age of entries that will be considered
+ * to be evicted in seconds. -1 to disable the feature.
+ */
+int			catalog_cache_prune_min_age = 300;
+
 /* Cache management header --- pointer is NULL until created */
 static CatCacheHeader *CacheHdr = NULL;
 
 /* Clock for the last accessed time of a catcache entry.
 */
+TimestampTz catcacheclock = 0;
+
 static inline HeapTuple SearchCatCacheInternal(CatCache *cache,
 					int nkeys,
 					Datum v1, Datum v2,
@@ -99,6 +108,12 @@ static void CatCacheFreeKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 static void CatCacheCopyKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 					Datum *srckeys, Datum *dstkeys);
+/* GUC assign function */
+void
+assign_catalog_cache_prune_min_age(int newval, void *extra)
+{
+	catalog_cache_prune_min_age = newval;
+}
 
 /*
  * internal support functions
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a62d64eaa4..628a95376c 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -88,6 +88,8 @@
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
+#include "utils/catcache.h"
+#include "utils/guc_tables.h"
 #include "utils/float.h"
 #include "utils/guc_tables.h"
 #include "utils/memutils.h"
@@ -2358,6 +2360,17 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"catalog_cache_prune_min_age", PGC_USERSET, RESOURCES_MEM,
+			gettext_noop("System catalog cache entries that live unused for longer than this seconds are considered for removal."),
+			gettext_noop("The value of -1 turns off pruning."),
+
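The do_expire specialization described at the top of this message relies on constant propagation into an inline function: each wrapper passes a compile-time constant, so the compiler can eliminate the dead branch from the non-expiring variant, as the disassembly shows for SearchCatCache1b. A toy sketch with hypothetical names:

```c
/* Stand-in for the per-entry bookkeeping the expiring variant does. */
static unsigned long naccess_bumps;

/*
 * One shared body, two entry points.  Because do_expire is a constant
 * at each call site, the optimizer drops the if-block entirely from
 * the non-expiring wrapper.
 */
static inline int
search_internal(int key, int do_expire)
{
    if (do_expire)
        naccess_bumps++;        /* expiration bookkeeping */
    return key * 2;             /* stand-in for the actual hash lookup */
}

static int
search_plain(int key)
{
    return search_internal(key, 0);
}

static int
search_expire(int key)
{
    return search_internal(key, 1);
}
```

Both wrappers return the same lookup result; only the expiring one pays for the bookkeeping, which is the whole point of the merge-plus-parameter refactoring.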
Re: Protect syscache from bloating with negative cache entries
At Thu, 1 Oct 2020 13:37:29 +0900, Michael Paquier wrote in
> On Wed, Jan 22, 2020 at 02:38:19PM +0900, Kyotaro Horiguchi wrote:
> > I changed my mind and attached the benchmark patch as a .txt file,
> > expecting the checkers not to pick it up as a part of the patchset.
> >
> > I have been in precise-performance-measurement mode for a long
> > time, but I think it's settled. I'd like to return to normal mode
> > and explain this patch.
>
> Looking at the CF bot, this patch set does not compile properly.
> Could you look at that?

It is complaining that TimestampDifference is implicitly declared. I'm
not sure of the exact cause, but maybe some refactoring of header file
inclusions caused that.

This is the rebased version. Thanks!

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 07ce4cf303a365c3f3efc3f133551a67e4646875 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Fri, 10 Jan 2020 15:02:26 +0900
Subject: [PATCH v2 1/3] base_change

Adds struct members needed by the catcache expiration feature and a
GUC variable that controls the behavior of the feature. No substantial
code is added yet. If the mere existence of these variables causes
degradation, benchmarking after this patch will show it.
---
 src/backend/access/transam/xact.c  |  3 +++
 src/backend/utils/cache/catcache.c | 15 +++
 src/backend/utils/misc/guc.c       | 13 +
 src/include/utils/catcache.h       | 20
 4 files changed, 51 insertions(+)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index af6afcebb1..1040713955 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1085,6 +1085,9 @@ ForceSyncCommit(void)
 static void
 AtStart_Cache(void)
 {
+	if (xactStartTimestamp != 0)
+		SetCatCacheClock(xactStartTimestamp);
+
 	AcceptInvalidationMessages();
 }
 
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index 3613ae5f44..c5bad63aac 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -60,9 +60,18 @@
 #define CACHE_elog(...)
 #endif
 
+/*
+ * GUC variable to define the minimum age of entries that will be considered
+ * to be evicted in seconds. -1 to disable the feature.
+ */
+int			catalog_cache_prune_min_age = 300;
+
 /* Cache management header --- pointer is NULL until created */
 static CatCacheHeader *CacheHdr = NULL;
 
 /* Clock for the last accessed time of a catcache entry.
 */
+TimestampTz catcacheclock = 0;
+
 static inline HeapTuple SearchCatCacheInternal(CatCache *cache,
 					int nkeys,
 					Datum v1, Datum v2,
@@ -99,6 +108,12 @@ static void CatCacheFreeKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 static void CatCacheCopyKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 					Datum *srckeys, Datum *dstkeys);
+/* GUC assign function */
+void
+assign_catalog_cache_prune_min_age(int newval, void *extra)
+{
+	catalog_cache_prune_min_age = newval;
+}
 
 /*
  * internal support functions
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 596bcb7b84..e39c756f80 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -88,6 +88,8 @@
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/bytea.h"
+#include "utils/catcache.h"
+#include "utils/guc_tables.h"
 #include "utils/float.h"
 #include "utils/guc_tables.h"
 #include "utils/memutils.h"
@@ -2358,6 +2360,17 @@ static struct config_int ConfigureNamesInt[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"catalog_cache_prune_min_age", PGC_USERSET, RESOURCES_MEM,
+			gettext_noop("System catalog cache entries that live unused for longer than this seconds are considered for removal."),
+			gettext_noop("The value of -1 turns off pruning."),
+			GUC_UNIT_S
+		},
+		&catalog_cache_prune_min_age,
+		300, -1, INT_MAX,
+		NULL, assign_catalog_cache_prune_min_age, NULL
+	},
+
 	/*
 	 * We use the hopefully-safely-small value of 100kB as the compiled-in
 	 * default for max_stack_depth.
InitializeGUCOptions will increase it if

diff --git a/src/include/utils/catcache.h b/src/include/utils/catcache.h
index f4aa316604..3d3870f05a 100644
--- a/src/include/utils/catcache.h
+++ b/src/include/utils/catcache.h
@@ -22,6 +22,7 @@
 #include "access/htup.h"
 #include "access/skey.h"
+#include "datatype/timestamp.h"
 #include "lib/ilist.h"
 #include "utils/relcache.h"
@@ -61,6 +62,7 @@ typedef struct catcache
 	slist_node	cc_next;		/* list link */
 	ScanKeyData	cc_skey[CATCACHE_MAXKEYS];	/* precomputed key info for heap
 											 * scans */
+	TimestampTz	cc_oldest_ts;	/* timestamp of the oldest tuple in the hash */
 
 	/*
 	 * Keep these at the end, so that compiling catcache.c with CATCACHE_STATS
@@ -119,6 +121,8 @@ typedef struct catctup
 	bool		dead;			/* dead but not yet removed? */
 	bool		negative;		/* negative cache entry? */
 	HeapTupleData tuple;		/* tuple management header */
+	unsigned int naccess;		/* # of access to this
Re: Protect syscache from bloating with negative cache entries
On Wed, Jan 22, 2020 at 02:38:19PM +0900, Kyotaro Horiguchi wrote:
> I changed my mind and attached the benchmark patch as a .txt file,
> expecting the checkers not to pick it up as a part of the patchset.
>
> I have been in precise-performance-measurement mode for a long time,
> but I think it's settled. I'd like to return to normal mode and
> explain this patch.

Looking at the CF bot, this patch set does not compile properly.
Could you look at that?
--
Michael
Re: Protect syscache from bloating with negative cache entries
At Tue, 21 Jan 2020 17:29:47 +0100, Tomas Vondra wrote in
> I see this patch is stuck in WoA since 2019/12/01, although there's a
> new patch version from 2020/01/14. But the patch seems to no longer
> apply, at least according to https://commitfest.cputube.org :-( So at
> this point the status is actually correct.
>
> Not sure about the appveyor build (it seems to be about
> jsonb_set_lax), but on travis it fails like this:
>
> catcache.c:820:1: error: no previous prototype for
> ‘CatalogCacheFlushCatalog2’ [-Werror=missing-prototypes]

I changed my mind and attached the benchmark patch as a .txt file,
expecting the checkers not to pick it up as a part of the patchset.

I have been in precise-performance-measurement mode for a long time,
but I think it's settled. I'd like to return to normal mode and
explain this patch.

=== Motive of the patch

The system cache is a mechanism that accelerates access to the system
catalogs. Basically, entries in the cache are removed via the
invalidation mechanism when the corresponding system catalog entry is
removed. On the other hand, the system cache also holds "negative"
entries that indicate that an object is nonexistent, which accelerates
the response for nonexistent objects. But a negative cache entry has
no chance of removal.

On a long-lived session that accepts a wide variety of queries on many
objects, the system cache holds entries for many objects that are
accessed only once or a few times. Suppose every object is accessed
once per, say, 30 minutes, and the queries don't need to run in a very
short time. Such cache entries are almost useless but occupy a large
amount of memory.

=== Possible solutions

Many caching systems have an expiration mechanism, which removes
"useless" entries to keep the size under a certain limit. The limit is
typically defined by memory usage or expiration time, in a hard or
soft way.
Since we don't implement detailed accounting of memory usage by the
cache for performance reasons, we can use coarse memory accounting or
expiration time. This patch uses expiration time because it can be
determined on a rather clearer basis.

=== Pruning timing

The next point is when to prune cache entries. Apparently it's not
reasonable to do it at every cache access, since pruning takes far
longer than a cache access. The system cache is implemented on a hash.
When there's no room for a new cache entry, the hash doubles in size
and rehashes all entries. If pruning makes some space for the new
entry, rehashing can be avoided, so this patch tries pruning just
before enlarging the hash table.

A system cache could also be shrunk if less than half of its size is
used, but this patch doesn't do that. This is because we cannot
predict whether a system cache that has just shrunk is going to be
enlarged right after, and I don't want to make this patch that
complex.

=== Performance

The pruning mechanism adds several members to the cache entry and
updates them. The system cache is very light-weight machinery, so
inserting even one branch affects performance visibly. So in this
patch, the new stuff is isolated from the existing code path using an
indirect call. After trials on some call-points that could become
indirect calls, I found that SearchCatCache[1-4]() is the only point
that doesn't affect performance. (Please see upthread for details.)
That configuration also allows future implementations of system
caches, such as shared system caches.

The alternative SearchCatCache[1-4] functions get a bit slower because
they maintain an access timestamp and an access counter. In addition,
pruning adds a certain amount of time even if no entries are pruned
off.

=== Pruning criteria

At the pruning time described above, every entry is examined against
the GUC variable catalog_cache_prune_min_age. The pruning mechanism
involves a clock-sweep-like mechanism where an entry lives longer if
it has been accessed.
An entry whose access counter is zero is pruned after
catalog_cache_prune_min_age. Otherwise the entry survives the pruning
round and its counter is decremented. All timestamps used by this
machinery come from "catcacheclock", which is updated at every
transaction start.

=== Concise test

The attached test1.pl can be used to replay the syscache bloat caused
by negative entries. Setting $prune_age to -1 turns pruning off, and
you can see that the backend takes more and more memory without limit
as time proceeds. Setting it to 10 or so, the memory size of the
backend process stops growing at a certain amount.

=== The patch

The following attached files are the patch. They have been separated
for benchmarking reasons, but that also seems to make the patch easier
to read, so I leave it as is. I forgot its correct version through a
long time of benchmarking, so I restart from v1 now.

- v1-0001-base_change.patch
  Adds new members to existing structs and catcacheclock-related code.

- v1-0002-Make-CatCacheSearchN-indirect-functions.patch
  Changes SearchCatCacheN functions to be called by indirect
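The pruning criteria above reduce to a simple per-entry decision at sweep time. A hypothetical miniature, not the patch's actual CatCacheCleanupOldEntries():

```c
/* seconds; -1 disables the feature (mirrors the GUC's meaning) */
static int catalog_cache_prune_min_age = 300;

/* Hypothetical miniature of a catcache entry for this sketch. */
typedef struct
{
    unsigned int naccess;       /* accesses since the last sweep */
    long         lastaccess;    /* stand-in for a TimestampTz */
} entry_sketch;

/*
 * One clock-sweep-like pruning decision: a cold entry (never
 * re-accessed) that is older than the threshold is evicted; a warm
 * entry survives the round but its counter decays, so it will be
 * evicted in a later round unless it is accessed again.
 * Returns 1 if the entry should be removed.
 */
static int
sweep_entry(entry_sketch *e, long now)
{
    long age = now - e->lastaccess;

    if (catalog_cache_prune_min_age < 0 || age < catalog_cache_prune_min_age)
        return 0;               /* feature disabled, or entry too young */
    if (e->naccess == 0)
        return 1;               /* cold and old: evict */
    e->naccess--;               /* warm: survive, but decay */
    return 0;
}
```

Because the search path caps nothing and the sweep treats every counter value above 1 alike, the decay converges in at most two rounds for any entry that stops being accessed.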
Re: Protect syscache from bloating with negative cache entries
On Tue, Jan 21, 2020 at 02:17:53PM -0500, Tom Lane wrote:
> Alvaro Herrera writes:
>> Hmm ... travis is running -Werror? That seems overly strict. I think
>> we shouldn't punt a patch because of that.
>
> Why not? We're not going to allow pushing a patch that throws
> warnings on common compilers. Or if that does happen, some committer
> is going to have to spend time cleaning it up. Better to clean it up
> sooner.
>
> (There is, btw, at least one buildfarm animal using -Werror.)

I agree that it is good to have in Mr Robot. More early detection
means less follow-up cleanup.
--
Michael
Re: Protect syscache from bloating with negative cache entries
Hello.

At Tue, 21 Jan 2020 14:17:53 -0500, Tom Lane wrote in
> Alvaro Herrera writes:
> > On 2020-Jan-21, Tomas Vondra wrote:
> >> Not sure about the appveyor build (it seems to be about
> >> jsonb_set_lax),
>
> FWIW, I think I fixed jsonb_set_lax yesterday, so that problem should
> be gone the next time the cfbot tries this.
>
> >> but on travis it fails like this:
> >> catcache.c:820:1: error: no previous prototype for
> >> ‘CatalogCacheFlushCatalog2’ [-Werror=missing-prototypes]
>
> > Hmm ... travis is running -Werror? That seems overly strict. I
> > think we shouldn't punt a patch because of that.
>
> Why not? We're not going to allow pushing a patch that throws
> warnings on common compilers. Or if that does happen, some committer
> is going to have to spend time cleaning it up. Better to clean it up
> sooner.
>
> (There is, btw, at least one buildfarm animal using -Werror.)

Mmm. The cause of the error is a tentative (or crude, or brute)
benchmarking function provided as an extension, which is not actually
a part of the patch and was included for the reviewers' convenience.
However, I don't want to make it work on the Windows build. If that is
regarded as a reason for being punted, I'll repost a new version
without the benchmark soon.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Protect syscache from bloating with negative cache entries
Alvaro Herrera writes:
> On 2020-Jan-21, Tomas Vondra wrote:
>> Not sure about the appveyor build (it seems to be about
>> jsonb_set_lax),

FWIW, I think I fixed jsonb_set_lax yesterday, so that problem should
be gone the next time the cfbot tries this.

>> but on travis it fails like this:
>> catcache.c:820:1: error: no previous prototype for
>> ‘CatalogCacheFlushCatalog2’ [-Werror=missing-prototypes]

> Hmm ... travis is running -Werror? That seems overly strict. I think
> we shouldn't punt a patch because of that.

Why not? We're not going to allow pushing a patch that throws warnings
on common compilers. Or if that does happen, some committer is going
to have to spend time cleaning it up. Better to clean it up sooner.

(There is, btw, at least one buildfarm animal using -Werror.)

			regards, tom lane
Re: Protect syscache from bloating with negative cache entries
On 2020-Jan-21, Tomas Vondra wrote:
> Not sure about the appveyor build (it seems to be about
> jsonb_set_lax), but on travis it fails like this:
>
> catcache.c:820:1: error: no previous prototype for
> ‘CatalogCacheFlushCatalog2’ [-Werror=missing-prototypes]

Hmm ... travis is running -Werror? That seems overly strict. I think
we shouldn't punt a patch because of that.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Protect syscache from bloating with negative cache entries
Hello Kyotaro-san,

I see this patch is stuck in WoA since 2019/12/01, although there's a
new patch version from 2020/01/14. But the patch seems to no longer
apply, at least according to https://commitfest.cputube.org :-( So at
this point the status is actually correct.

Not sure about the appveyor build (it seems to be about
jsonb_set_lax), but on travis it fails like this:

catcache.c:820:1: error: no previous prototype for
‘CatalogCacheFlushCatalog2’ [-Werror=missing-prototypes]

so I'll leave it in WoA for now.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Protect syscache from bloating with negative cache entries
This is a new, complete, workable patch after a long time of
struggling with benchmarking.

At Tue, 19 Nov 2019 19:48:10 +0900 (JST), Kyotaro Horiguchi wrote in
> I ran the run2.sh script attached, which runs catcachebench2(), which
> asks SearchSysCache3() for cached entries (almost) 24 times per run.
> The number on each output line is the mean of 3 runs, with stddev.
> Lines are in "time" order and edited to fit here. "gen_tbl.pl | psql"
> creates a database for the benchmark. catcachebench2() runs the
> shortest path of the three in the attached benchmark program.
>
> (pg_ctl start)
> $ perl gen_tbl.pl | psql
> ...
> (pg_ctl stop)

I wonder why I took the average of the times instead of choosing the
fastest one. This benchmark is extremely CPU intensive, so the fastest
run reliably represents the performance. I changed the benchmark so
that it shows the time of the fastest run (run4.sh).

Based on the latest result, I used pattern 3 (SearchSyscacheN
indirection, but wrongly referred to as 1 in the last mail) in the
latest version. I took the fastest time among 3 iterations of 5 runs
of both master/patched O2 binaries.

  version | min
  --------+---------
  master  | 7986.65
  patched | 7984.47    = 'indirect' below

I would say this version doesn't get degraded by indirect calls.

So, I applied the other parts of the catcache expiration patch as the
succeeding patches. After that I got a somewhat strange but very
stable result: just adding struct members accelerates the benchmark.
The numbers are the fastest time of 20 runs of the benchmark in 10
iterations.

                  ms
  master      7980.79  # the master with the benchmark extension (0001)
= base        7340.96  # add only struct members and a GUC variable (0002)
  indirect    7998.68  # call SearchCatCacheN indirectly (0003)
= expire-off  7422.30  # CatCache expiration (0004)
                       # (catalog_cache_prune_min_age = -1)
  expire-on   7861.13  # CatCache expiration (catalog_cache_prune_min_age = 0)

The patch accelerates CatCacheSearch for uncertain reasons.
I'm not sure what makes the difference between about 8000 ms and about 7400 ms, though. Building all versions several times and then running the benchmark gave me results with the same tendency. I'll stop this work at this point and continue later. The following files are attached.

0001-catcache-benchmark-extension.patch: benchmark extension used for the benchmarking here. The test tables are generated using gentbl2.pl attached. (perl gentbl2.pl | psql)
0002-base_change.patch: Preliminarily adds some struct members and a GUC variable to see if they cause any degradation.
0003-Make-CatCacheSearchN-indirect-functions.patch: Rewrite to make the CatCacheSearchN functions be called indirectly.
0004-CatCache-expiration-feature.patch: Add the CatCache expiration feature.
gentbl2.pl: A script that emits SQL statements to generate test tables.
run4.sh: The test script I used for the benchmarking here.
build2.sh: A script I used to build the four types of binaries used here.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From dacf4a2ac9eb49099e744ee24066b94e9f78aa61 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu, 14 Nov 2019 19:24:36 +0900
Subject: [PATCH 1/4] catcache benchmark extension

Provides the function catcachebench(bench_no int), which runs a CPU-intensive benchmark on catcache search. The test table is created by a separately provided script.

catcachebench(0): prewarms the catcache with the provided test tables.
catcachebench(1): fetches all attribute stats of all tables. This benchmark loads a vast number of unique entries. Expiration doesn't work since it runs in a transaction.
catcachebench(2): fetches all attribute stats of one table many times. This benchmark repeatedly accesses already-loaded entries. Expiration doesn't work since it runs in a transaction.
catcachebench(3): fetches all attribute stats of all tables four times.
Different from the other modes, this runs expiration by forcibly updating the reference clock variable every 1000 entries. At this point, the variables needed for the expiration feature are not yet added, so SetCatCacheClock is a dummy macro that just expands to its parameter.
---
 contrib/catcachebench/Makefile               |  17 +
 contrib/catcachebench/catcachebench--0.0.sql |  14 +
 contrib/catcachebench/catcachebench.c        | 330 +++
 contrib/catcachebench/catcachebench.control  |   6 +
 src/backend/utils/cache/catcache.c           |  35 ++
 src/backend/utils/cache/syscache.c           |   2 +-
 src/include/utils/catcache.h                 |   3 +
 7 files changed, 406 insertions(+), 1 deletion(-)
 create mode 100644 contrib/catcachebench/Makefile
 create mode 100644 contrib/catcachebench/catcachebench--0.0.sql
 create mode 100644 contrib/catcachebench/catcachebench.c
 create mode 100644 contrib/catcachebench/catcachebench.control
Re: Protect syscache from bloating with negative cache entries
On Tue, Nov 19, 2019 at 07:48:10PM +0900, Kyotaro Horiguchi wrote:
> I'd like to throw in food for discussion on how much SearchSysCacheN
> suffers degradation from some choices on how we can insert code into
> the SearchSysCacheN code path.

Please note that the patch has a warning, causing cfbot-san to complain:

catcache.c:786:1: error: no previous prototype for ‘CatalogCacheFlushCatalog2’ [-Werror=missing-prototypes]
 CatalogCacheFlushCatalog2(Oid catId)
 ^
cc1: all warnings being treated as errors

So this should at least be fixed. For now I have moved it to the next CF, waiting on author.
--
Michael
Re: Protect syscache from bloating with negative cache entries
I'd like to throw in food for discussion on how much SearchSysCacheN suffers degradation from some choices on how we can insert code into the SearchSysCacheN code path.

I ran the run2.sh script attached, which runs catcachebench2(), which asks SearchSysCache3() for cached entries (almost) 24 times per run. The number on each output line is the mean of 3 runs, with stddev. Lines are in "time" order and edited to fit here. "gen_tbl.pl | psql" creates a database for the benchmark. catcachebench2() runs the shortest path of the three in the attached benchmark program.

(pg_ctl start)
$ perl gen_tbl.pl | psql
...
(pg_ctl stop)

0. Baseline (0001-benchmark.patch, 0002-Base-change.patch)

At first, I made two binaries from literally the same source. For the benchmark's sake the source is already modified a bit; specifically it has SetCatCacheClock needed by the benchmark, but it is not actually called in this benchmark.

            | time(ms) | stddev(ms)
not patched |  7750.42 |  23.83   # 0.6% faster than 7775.23
not patched |  7864.73 |  43.21
not patched |  7866.80 | 106.47
not patched |  7952.06 |  63.14
master      |  7775.23 |  35.76
master      |  7870.42 | 120.31
master      |  7876.76 | 109.04
master      |  7963.04 |   9.49

So it seems to me that we cannot tell anything about differences below about 80 ms (about 1%) now.

1. Inserting a branch in SearchCatCacheInternal. (CatCache_Pattern_1.patch)

This is the most straightforward way to add an alternative feature.

pattern 1 | 8459.73 | 28.15   # 9% (>> 1%) slower than 7757.58
pattern 1 | 8504.83 | 55.61
pattern 1 | 8541.81 | 41.56
pattern 1 | 8552.20 | 27.99
master    | 7757.58 | 22.65
master    | 7801.32 | 20.64
master    | 7839.57 | 25.28
master    | 7925.30 | 38.84

It's so slow that it cannot be used.

2. Making SearchCatCacheInternal an indirect function. (CatCache_Pattern_2.patch)

Next, I made the workhorse routine be called indirectly.
The "inline" on the function actually lets the compiler optimize the SearchCatCacheN routines as described in the comment, but the effect doesn't seem so large, at least for this case.

pattern 2 | 7976.22 | 46.12   # 2.6% slower > 1%
pattern 2 | 8103.03 | 51.57
pattern 2 | 8144.97 | 68.46
pattern 2 | 8353.10 | 34.89
master    | 7768.40 | 56.00
master    | 7772.02 | 29.05
master    | 7775.05 | 27.69
master    | 7830.82 | 13.78

3. Making SearchCatCacheN indirect functions. (CatCache_Pattern_3.patch)

As far as gcc/linux/x86 goes, SearchSysCacheN is compiled into the following instructions:

  0x00866c20 <+0>:  movslq %edi,%rdi
  0x00866c23 <+3>:  mov    0xd3da40(,%rdi,8),%rdi
  0x00866c2b <+11>: jmpq   0x856ee0

If we make SearchCatCacheN indirect functions as in the patch, just one instruction changes:

  0x00866c50 <+0>:  movslq %edi,%rdi
  0x00866c53 <+3>:  mov    0xd3da60(,%rdi,8),%rdi
  0x00866c5b <+11>: jmpq   *0x4c0caf(%rip)   # 0xd27910

pattern 3 | 7836.26 |  48.66   # 2% slower > 1%
pattern 3 | 7963.74 |  67.88
pattern 3 | 7966.65 | 101.07
pattern 3 | 8214.57 |  71.93
master    | 7679.74 |  62.20
master    | 7756.14 |  77.19
master    | 7867.14 |  73.33
master    | 7893.97 |  47.67

I expected this to run in almost the same time. I'm not sure if it is the result of the spectre_v2 mitigation, but I show the status of my environment as follows.

# uname -r
4.18.0-80.11.2.el8_0.x86_64
# cat /proc/cpuinfo
...
model name : Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
stepping   : 12
microcode  : 0xae
bugs       : spectre_v1 spectre_v2 spec_store_bypass mds
# cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
Mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: disabled, RSB filling

I am using CentOS 8 and I haven't found a handy (or on-the-fly) way to disable them..
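The indirection being benchmarked above (patterns 2 and 3) boils down to routing the hot call through a function pointer, so an alternative implementation can be swapped in without adding a branch to every call; the cost is the indirect jump itself. A minimal standalone sketch, with hypothetical names rather than the real SearchCatCacheN functions:

```c
#include <assert.h>

/* Two interchangeable implementations of a lookup routine. */
static int
lookup_plain(int key)
{
	return key * 2;
}

static int
lookup_with_expiration(int key)
{
	/* ... would also do expiration bookkeeping here ... */
	return key * 2;
}

/*
 * The hot path always jumps through this pointer.  Enabling the feature
 * swaps the pointer once, instead of inserting a branch into every call;
 * what remains on the hot path is the indirect jump measured above.
 */
static int (*lookup_impl)(int) = lookup_plain;

void
enable_expiration(void)
{
	lookup_impl = lookup_with_expiration;
}

int
do_lookup(int key)
{
	return lookup_impl(key);
}
```

As the retpoline discussion above suggests, the price of that indirect jump depends heavily on the CPU's branch-predictor mitigations, which is why the same pattern can measure differently across machines.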
Attached are:

0001-benchmark.patch          : catcache benchmark extension (and core-side fix)
0002-Base-change.patch        : baseline change in this series of benchmarks
CatCache_Pattern_1.patch      : naive branching
CatCache_Pattern_2.patch      : indirect SearchCatCacheInternal
CatCache_Pattern_3.patch      : indirect SearchCatCacheN

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 245e88e1b43df74273fbaa1b22f4f64621ffe9d5 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu, 14 Nov 2019 19:24:36 +0900
Subject: [PATCH 1/2] benchmark

---
 contrib/catcachebench/Makefile               |  17 +
 contrib/catcachebench/catcachebench--0.0.sql |  14 +
 contrib/catcachebench/catcachebench.c        | 330 +++
 contrib/catcachebench/catcachebench.control  |   6 +
 src/backend/utils/cache/catcache.c           |  33 ++
 src/backend/utils/cache/syscache.c           |   2 +-
 6 files changed, 401 insertions(+), 1 deletion(-)
 create mode 100644 contrib/catcachebench/Makefile
 create mode 100644 contrib/catcachebench/catcachebench--0.0.sql
 create mode 100644 contrib/catcachebench/catcachebench.c
 create mode 100644
Re: Protect syscache from bloating with negative cache entries
Hello,

my_gripe> But, still fluctuates by around 5%..
my_gripe>
my_gripe> If this level of degradation is still not acceptable, that
my_gripe> means that nothing can be inserted in the code path and the new
my_gripe> code path should be isolated from existing code by using indirect
my_gripe> calls.

Finally, after some struggling, I think I have managed to measure the impact on performance precisely and reliably. Starting every build from "make distclean", then removing everything in $TARGET before installation, makes things stable enough. (I don't think that's good, but I didn't investigate the cause..)

I measured time/call by directly calling SearchSysCache3() many times. It showed that the patch causes around 0.1 microseconds of degradation per call. (The function overall took about 6.9 microseconds on average.)

Next, I counted how many times SearchSysCache is called during planning with, as an instance, a query on a partitioned table having 3000 columns and 1000 partitions:

  explain analyze select sum(c) from test.p;

The planner made 6020608 syscache calls while planning and the overall planning time was 8641 ms. (Exec time was 48 ms.) 6020608 times 0.1 us is 602 ms of degradation, so roughly -7% degradation in planning time by estimation. The degradation was caused by really just two successive instructions, "ADD/conditional MOVE (CMOVE)". This leads to the conclusion that the existing code path, as is, doesn't have room for any additional code.

So I sought room for at least one branch, and found it (on gcc 7.3.1/CentOS7/x64). Interestingly, de-inlining SearchCatCacheInternal gave me a performance gain of about 3%. Further inlining of CatalogCacheComputeHashValue() gave another gain of about 3%. I could add a branch in SearchCatCacheInternal within that gain. I also tried indirect calls, but the degradation overwhelmed the gain, so I chose branching rather than indirect calls. I didn't investigate why that happens.

The following is the result.
The binaries are built with the same configuration using -O2.

binary means
  master      : master HEAD.
  patched_off : patched, but pruning disabled (catalog_cache_prune_min_age = -1).
  patched_on  : patched with pruning enabled. ("300s" for 1, "1s" for 2, "0" for 3)

bench:
  1: corresponds to catcachebench(1); fetching STATRELATTINH 3000 * 1000 times,
     generating new cache entries. (Massive cache creation.)
     Pruning doesn't happen while running this.
  2: catcachebench(2); 6 times of cache access on 1000 STATRELATTINH entries.
     (Frequent cache reference.) Pruning doesn't happen while running this.
  3: catcachebench(3); fetching 1000 (tbls) * 3000 (cols) STATRELATTINH
     entries. The catcache clock advances at intervals of 100 (tbls) * 3000
     (cols) accesses, and pruning happens. While running catcachebench(3)
     once, pruning happens 28 times; most of the time 202202 entries are
     removed and the total number of entries was limited to 524289. (The
     system table has 3000 * 1001 = 3003000 tuples.)

iter: number of iterations. Time (ms) and stddev are calculated over the iterations.

binary      | bench | iter |  time ms | stddev
------------+-------+------+----------+--------
master      |   1   |  10  |  8150.30 |  12.96
master      |   2   |  10  |  4002.88 |  16.18
master      |   3   |  10  |  9065.06 |  11.46
------------+-------+------+----------+--------
patched_off |   1   |  10  |  8090.95 |   9.95
patched_off |   2   |  10  |  3984.67 |  12.33
patched_off |   3   |  10  |  9050.46 |   4.64
------------+-------+------+----------+--------
patched_on  |   1   |  10  |  8158.95 |   6.29
patched_on  |   2   |  10  |  4023.72 |  10.41
patched_on  |   3   |  10  | 16532.66 |  18.39

patched_off is slightly faster than master. patched_on is generally a bit slower. Even though patched_on/3 seems to take too long, the extra time comes from increased catalog table access in exchange for memory saving. (That is, it is expected behavior.) I ran it several times and most of them showed the same tendency.

As a side effect, once the branch is added, the shared syscache in a neighbour thread will be able to be inserted together without impact on the existing code path.
===
The benchmark script is used as follows:

- Create many (3000, as an example) tables in the "test" schema. I created a partitioned table with 3000 children.
- The tables have many columns, 1000 for me.
- Run the following commands.

=# select catcachebench(0);                    -- warm up systables
=# set catalog_cache_prune_min_age = any;      -- as required
=# select catcachebench(n);                    -- 3 >= n >= 1, the number of "bench" above

The above result is taken with the following query.

=# select 'patched_on', '3', count(a), avg(a)::numeric(10,2), stddev(a)::numeric(10,2)
   from (select catcachebench(3) from generate_series(1, 10)) as a(a);

The attached patches are:
RE: Protect syscache from bloating with negative cache entries
>From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
>Does this result show that the hard-limit size option with memory accounting
>doesn't harm usual users who disable the hard-limit size option?

Hi, I've implemented relation cache size limitation with an LRU list and built-in memory context size accounting. I'll share some results coupled with a quick recap of the catcache work so that we can resume discussion if needed. Relation cache bloat was also discussed in this thread, but right now it's pending and the catcache feature is not fixed; still, a variety of information could be useful, I believe.

Regarding the catcache, it seems to me that Horiguchi-san's recent posts show pretty detailed stats, including a comparison of the LRU overhead and a full scan of the hash table. According to the results, the LRU overhead seems small, but for simplicity this thread goes without LRU.
https://www.postgresql.org/message-id/20190404.215255.09756748.horiguchi.kyotaro%40lab.ntt.co.jp

When the catcache had a hard limit, there was built-in memory context size accounting machinery. I checked the overhead of the memory accounting: when repeating palloc and pfree of an 800-byte area many times it was 4% slower, while with 32768 bytes there seems to be no overhead.
https://www.postgresql.org/message-id/4E72940DA2BF16479384A86D54D0988A6F44564E%40G01JPEXMBKW04

Regarding the relcache hard limit (relation_cache_max_size), most of the architecture is similar to the catcache one, with an LRU list, except for the memory accounting. Relcaches are managed by an LRU list. To prune the LRU cache, we need to know the overall relcache size, including objects pointed to by a relcache such as 'index info'. So in this patch relcache objects are allocated under RelCacheMemoryContext, which is a child of CacheMemoryContext, and objects pointed to by a relcache are allocated under a child context of RelCacheMemoryContext. In the built-in size accounting, if a memory context is set to collect the "group (family) size", you can get the context size including its children easily.
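The "group (family) size" accounting described above can be modeled in miniature: each context tracks the bytes charged to it, and the group size sums a context together with all of its descendants. This is an illustrative toy (struct and function names are made up), not the actual MemoryContext machinery:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 8

/* Toy context with per-context size accounting and parent/child grouping. */
typedef struct Ctx
{
	size_t		self_size;		/* bytes charged to this context */
	int			nchildren;
	struct Ctx *children[MAX_CHILDREN];
} Ctx;

/* Charge an allocation to a context (a pfree would subtract instead). */
void
ctx_charge(Ctx *c, size_t nbytes)
{
	c->self_size += nbytes;
}

/*
 * "Group (family) size": a context plus everything below it.  This is the
 * number a pruning policy would compare against a configured limit such as
 * relation_cache_max_size.
 */
size_t
ctx_group_size(const Ctx *c)
{
	size_t		total = c->self_size;

	for (int i = 0; i < c->nchildren; i++)
		total += ctx_group_size(c->children[i]);
	return total;
}
```

Keeping the running total in the allocator (as the built-in accounting does) trades a small constant cost per palloc/pfree for an O(1) size query, versus the recursive walk sketched here.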
I ran two experiments:

A) One is pgbench using Tomas's script posted a while ago, which randomly selects 1 from many tables.
https://www.postgresql.org/message-id/4E72940DA2BF16479384A86D54D0988A6F426207%40G01JPEXMBKW04

B) The other checks the memory context accounting overhead using the same method as before.
https://www.postgresql.org/message-id/4E72940DA2BF16479384A86D54D0988A6F44564E%40G01JPEXMBKW04

A) randomly select 1 from many tables

Results are the average of 5 runs each.

number of tables               | 100         | 1000        | 1
-------------------------------+-------------+-------------+-------------
TPS (master)                   | 11105       | 10815       | 8915
TPS (patch; limit feature off) | 11254 (+1%) | 11176 (+3%) | 9242 (+4%)
TPS (patch; limit on with 1MB) | 11317 (+2%) | 10491 (-3%) | 7380 (-17%)

The results are noisy, but it seems the overhead of the LRU list and memory accounting is small when the relcache limit feature is turned off. With the limit feature turned on, it drops 17% after exceeding the limit, which is no surprise.

B) Repeat palloc/pfree

"With group accounting" means accounting for the test context and its child context with the built-in accounting, using palloc_bench_family(); the other uses palloc_bench(). Please see palloc_bench.gz.

[Size=32768, iter=1,000,000]
Master                    | 59.97 ms
Master with group account | 59.57 ms
patched                   | 67.23 ms
patched with family       | 68.81 ms

The overhead seems large in this patch, so this area needs more inspection.

regards,
Takeshi Ideriha

Attachments:
  palloc_bench.gz
  v15-0001-4-Add-dlist_move_tail.patch
  v15-0002-4-ideriha-Memory-consumption-report-reature-of-memorycontext.patch
  v15-0003-4-ideriha-Remove-CatCache-Entries.patch
  v15-0004-4-ideriha-Remove-RelCache-Entries.patch
Re: Protect syscache from bloating with negative cache entries
At Fri, 05 Apr 2019 09:44:07 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20190405.094407.151644324.horiguchi.kyot...@lab.ntt.co.jp>
> By the way, I found the reason of the wrong result of the
> previous benchmark. The test 3_0/1 needs to update catcacheclock
> midst of the loop. I'm going to fix it and rerun it.

I found the cause. CatalogCacheFlushCatalog() doesn't shrink the hash, so no resize happens once it is bloated. I needed another version of the function that resets cc_bucket to the initial size. Using the new debug function, I got better numbers.

I focused on the performance when the feature is disabled. I rechecked it by adding the patch part by part and identified several causes of the degradation. I did:

- Moved SetCatCacheClock() to AtStart_Cache().
- Maybe improved the caller site of CatCacheCleanupOldEntries().

As the result:

binary | test | count |   avg   | stddev | ratio
-------+------+-------+---------+--------+--------
master |  1_1 |     5 | 7104.90 |   4.40 |
master |  2_1 |     5 | 3759.26 |   4.20 |
master |  3_1 |     5 | 7954.05 |   2.15 |
-------+------+-------+---------+--------+--------
Full   |  1_1 |     5 | 7237.20 |   7.98 | 101.87
Full   |  2_1 |     5 | 4050.98 |   8.42 | 107.76
Full   |  3_1 |     5 | 8192.87 |   3.28 | 103.00

But it still fluctuates by around 5%..

If this level of degradation is still not acceptable, that means that nothing can be inserted in the code path, and the new code path should be isolated from existing code by using indirect calls.

regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index bd5024ef00..a9414c0c07 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -1067,6 +1067,7 @@ static void
 AtStart_Cache(void)
 {
 	AcceptInvalidationMessages();
+	SetCatCacheClock(GetCurrentStatementStartTimestamp());
 }
 
 /*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1d4f..4d849aeb4c 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -71,6 +71,7 @@
 #include "tcop/pquery.h"
 #include "tcop/tcopprot.h"
 #include "tcop/utility.h"
+#include "utils/catcache.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index d05930bc4c..91814f7210 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -60,9 +60,18 @@
 #define CACHE_elog(...)
 #endif
 
+/*
+ * GUC variable to define the minimum age of entries that will be considered
+ * to be evicted in seconds. -1 to disable the feature.
+ */
+int			catalog_cache_prune_min_age = 0;
+
 /* Cache management header --- pointer is NULL until created */
 static CatCacheHeader *CacheHdr = NULL;
 
+/* Clock for the last accessed time of a catcache entry. */
+TimestampTz catcacheclock = 0;
+
 static inline HeapTuple SearchCatCacheInternal(CatCache *cache,
 					   int nkeys,
 					   Datum v1, Datum v2,
@@ -850,9 +859,83 @@ InitCatCache(int id,
 	 */
 	MemoryContextSwitchTo(oldcxt);
 
+	/* initialize catcache reference clock if haven't done yet */
+	if (catcacheclock == 0)
+		catcacheclock = GetCurrentTimestamp();
+
 	return cp;
 }
 
+/*
+ * CatCacheCleanupOldEntries - Remove infrequently-used entries
+ *
+ * Catcache entries happen to be left unused for a long time for several
+ * reasons. Remove such entries to prevent catcache from bloating.
+ * It is based on the similar algorithm with buffer eviction. Entries that
+ * are accessed several times in a certain period live longer than those
+ * that have had less access in the same duration.
+ */
+static bool
+CatCacheCleanupOldEntries(CatCache *cp)
+{
+	int			nremoved = 0;
+	int			i;
+
+	/* Return immediately if disabled */
+	if (catalog_cache_prune_min_age == 0)
+		return false;
+
+	/* Scan over the whole hash to find entries to remove */
+	for (i = 0; i < cp->cc_nbuckets; i++)
+	{
+		dlist_mutable_iter iter;
+
+		dlist_foreach_modify(iter, &cp->cc_bucket[i])
+		{
+			CatCTup    *ct = dlist_container(CatCTup, cache_elem, iter.cur);
+			long		entry_age;
+			int			us;
+
+			/* Don't remove referenced entries */
+			if (ct->refcount != 0 ||
+				(ct->c_list && ct->c_list->refcount != 0))
+				continue;
+
+			/*
+			 * Calculate the duration from the time of the last access to
+			 * the "current" time. catcacheclock is updated on a
+			 * per-statement basis and additionally updated periodically
+			 * during a long running query.
+			 */
+			TimestampDifference(ct->lastaccess, catcacheclock,
+								&entry_age, &us);
+
+			if (entry_age < catalog_cache_prune_min_age)
+				continue;
+
+			/*
+			 * Entries that are not accessed after the last pruning are
+			 * removed in that seconds, and their lives are prolonged
+			 * according to how many times they are accessed up to three
+			 * times the duration. We don't try shrink
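Stripped of the catcache specifics, the pruning pass in the patch above amounts to a sweep that evicts unreferenced entries whose last access is older than the configured minimum age. A standalone sketch with illustrative names (not the actual CatCTup/dlist structures):

```c
#include <assert.h>

/* A catcache entry reduced to what the pruning decision needs. */
typedef struct Entry
{
	long		lastaccess;		/* "clock" value at the last access */
	int			refcount;		/* referenced entries are never pruned */
	int			live;			/* still present in the cache? */
} Entry;

/*
 * Sweep the entries and evict unreferenced ones whose age relative to the
 * coarse clock meets or exceeds min_age.  Returns the number of evictions,
 * mirroring the nremoved counter in the patch.
 */
int
prune_old_entries(Entry *entries, int n, long clock, long min_age)
{
	int			removed = 0;

	for (int i = 0; i < n; i++)
	{
		if (!entries[i].live || entries[i].refcount != 0)
			continue;			/* already gone, or in use */
		if (clock - entries[i].lastaccess < min_age)
			continue;			/* accessed recently enough */
		entries[i].live = 0;	/* evict */
		removed++;
	}
	return removed;
}
```

The essential property, as in the patch, is that the sweep never touches referenced entries and compares ages against a coarse clock that is updated outside the hot lookup path.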
Re: Protect syscache from bloating with negative cache entries
Thank you for the comment.

At Thu, 4 Apr 2019 15:44:35 -0400, Robert Haas wrote in
> On Thu, Apr 4, 2019 at 8:53 AM Kyotaro HORIGUCHI wrote:
> > So it seems to me that the simplest "Full" version wins. The
> > attached is the rebased version. dlist_move_head(entry) is removed as
> > mentioned above in that patch.
>
> 1. I really don't think this patch has any business changing the
> existing logic. You can't just assume that the dlist_move_head()
> operation is unimportant for performance.

Ok, it doesn't show a significant performance gain, so I removed that.

> 2. This patch still seems to add a new LRU list that has to be
> maintained. That's fairly puzzling. You seem to have concluded that
> the version without the additional LRU wins, but then sent a new copy
> of the version with the LRU version.

Sorry, I attached the wrong one. The attached is the right one, which doesn't add the new dlist.

> 3. I don't think adding an additional call to GetCurrentTimestamp() in
> start_xact_command() is likely to be acceptable. There has got to be
> a way to set this up so that the maximum number of new
> GetCurrentTimestamp() calls is limited to once per N seconds, vs. the
> current implementation that could do it many many many times per
> second.

GetCurrentTimestamp() is called only once, very early in the backend's life, in InitPostgres, not in start_xact_command. What I did in that function is just copy stmtStartTimestamp, not call GetCurrentTimestamp().

> 4. The code in CatalogCacheCreateEntry seems clearly unacceptable. In
> a pathological case where CatCacheCleanupOldEntries removes exactly
> one element per cycle, it could be called on every new catcache
> allocation.

It could be a problem if just one entry were created in a duration longer than catalog_cache_prune_min_age plus the resize interval, or if all candidate entries except one were actually in use at the pruning moment. Is that realistic?

> I think we need to punt this patch to next release.
> We're not converging on anything committable very fast.

Yeah, maybe right. This patch has had several months of silence several times, got comments and was modified to take them in for more than two cycles, and finally got a death sentence (not literally; actually a postponement) very close to the end of this third cycle. I anticipate the same will continue in the next cycle.

By the way, I found the reason for the wrong result of the previous benchmark. The tests 3_0/3_1 need to update catcacheclock in the middle of the loop. I'm going to fix it and rerun.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 596d6b018e1b7ddd5828298bfaba3ee405eb2604 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Fri, 1 Mar 2019 13:32:51 +0900
Subject: [PATCH] Remove entries that haven't been used for a certain time

Catcache entries happen to be left alone for several reasons. It is not
desirable that such useless entries eat up memory. The catcache pruning
feature removes entries that haven't been accessed for a certain time
before enlarging the hash array.
---
 doc/src/sgml/config.sgml                      |  19 +
 src/backend/tcop/postgres.c                   |   2 +
 src/backend/utils/cache/catcache.c            | 103 +-
 src/backend/utils/misc/guc.c                  |  12 +++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/utils/catcache.h                  |  16
 6 files changed, 150 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index bc1d0f7bfa..819b252029 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1677,6 +1677,25 @@ include_dir 'conf.d'
 
+
+ catalog_cache_prune_min_age (integer)
+
+  catalog_cache_prune_min_age configuration
+  parameter
+
+
+
+ Specifies the minimum amount of unused time in seconds at which a
+ system catalog cache entry is removed. -1 indicates that this feature
+ is disabled at all. The value defaults to 300 seconds (5
+ minutes). The entries that are not used for the duration
+ can be removed to prevent catalog cache from bloating with useless
+ entries.
+
+
+
+
 
 max_stack_depth (integer)

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 44a59e1d4f..a0efac86bc 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -71,6 +71,7 @@
 #include "tcop/pquery.h"
 #include "tcop/tcopprot.h"
 #include "tcop/utility.h"
+#include "utils/catcache.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
@@ -2577,6 +2578,7 @@ start_xact_command(void)
 	 * not desired, the timeout has to be disabled explicitly.
 	 */
 	enable_statement_timeout();
+	SetCatCacheClock(GetCurrentStatementStartTimestamp());
 }
 
 static void
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index d05930bc4c..03c2d8524c 100644
---
Re: Protect syscache from bloating with negative cache entries
On Thu, Apr 4, 2019 at 8:53 AM Kyotaro HORIGUCHI wrote:
> So it seems to me that the simplest "Full" version wins. The
> attached is rebased version. dlist_move_head(entry) is removed as
> mentioned above in that patch.

1. I really don't think this patch has any business changing the existing logic. You can't just assume that the dlist_move_head() operation is unimportant for performance.

2. This patch still seems to add a new LRU list that has to be maintained. That's fairly puzzling. You seem to have concluded that the version without the additional LRU wins, but then sent a new copy of the version with the LRU version.

3. I don't think adding an additional call to GetCurrentTimestamp() in start_xact_command() is likely to be acceptable. There has got to be a way to set this up so that the maximum number of new GetCurrentTimestamp() calls is limited to once per N seconds, vs. the current implementation that could do it many many many times per second.

4. The code in CatalogCacheCreateEntry seems clearly unacceptable. In a pathological case where CatCacheCleanupOldEntries removes exactly one element per cycle, it could be called on every new catcache allocation.

I think we need to punt this patch to next release. We're not converging on anything committable very fast.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
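One way to address point 3 (serve a cached timestamp, consult the real clock only occasionally) can be sketched as below. All names are hypothetical: real_now() stands in for GetCurrentTimestamp(), and the refresh-every-K-calls policy is merely illustrative of the idea; the suggestion above is to bound refreshes per N seconds, e.g. via a timer.

```c
#include <assert.h>

static long cached_ts;
static int	calls_since_refresh = -1;	/* -1: never refreshed */

/* Counts how often the expensive clock is actually consulted (demo only). */
long		demo_clock_reads;

static long
demo_now(void)
{
	return ++demo_clock_reads;			/* each real read advances by one */
}

/*
 * Serve a cached timestamp, consulting the real clock at most once every
 * `every` calls.  Hot-path callers pay only a counter increment and a
 * branch instead of a full clock read.
 */
long
coarse_timestamp(long (*real_now)(void), int every)
{
	if (calls_since_refresh < 0 || ++calls_since_refresh >= every)
	{
		cached_ts = real_now();
		calls_since_refresh = 0;
	}
	return cached_ts;
}
```

The trade-off is staleness: an entry's recorded last-access time can lag the real clock by up to one refresh interval, which is harmless for an expiration threshold measured in minutes.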
Re: Protect syscache from bloating with negative cache entries
At Mon, 01 Apr 2019 11:05:32 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20190401.110532.102998353.horiguchi.kyot...@lab.ntt.co.jp>
> At Fri, 29 Mar 2019 17:24:40 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI
> wrote in
> <20190329.172440.199616830.horiguchi.kyot...@lab.ntt.co.jp>
> > I ran three artificial test cases. The database is created by
> > gen_tbl.pl. Numbers are the average of the fastest five runs in
> > successive 15 runs.
> >
> > Test cases are listed below.
> >
> > 1_0. About 3,000,000 negative entries are created in the pg_statistic
> >   cache by scanning that many distinct columns. It is 3000 tables
> >   * 1001 columns. Pruning scans happen several times during a run
> >   but no entries are removed. This emulates the bloating phase of
> >   the cache. catalog_cache_prune_min_age is default (300s).
> >   (access_tbl1.pl)
> >
> > 1_1. Same as 1_0 except that catalog_cache_prune_min_age is 0,
> >   which means turning off.
> >
> > 2_0. Repeatedly access 1001 of the 3,000,000 entries 6000
> >   times. This emulates the stable cache case without
> >   pruning. catalog_cache_prune_min_age is default (300s).
> >   (access_tbl2.pl)
> >
> > 2_1. Same as 2_0 except that catalog_cache_prune_min_age is 0,
> >   which means turning off.
> >
> > 3_0. Scan over the 3,000,000 entries twice with prune_age set
> >   to 10s. A run takes about 18 seconds on my box so a fair amount
> >   of old entries are removed. This emulates the stable case with
> >   continuous pruning. (access_tbl3.pl)
> >
> > 3_1. Same as 3_0 except that catalog_cache_prune_min_age is 0,
> >   which means turning off.
..
> I had another.. unstable.. result.

dlist_move_head is used every time an entry is accessed. It moves the accessed element to the top of the bucket, expecting that subsequent accesses become faster, a kind of LRU maintenance. But the mean length of a bucket is 2, so dlist_move_head is more complex than following one step of a link. So I removed it in the pruning patch.
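The dlist_move_head() trick discussed above is plain move-to-front: on a hit, relink the entry at the head of its bucket so hot entries are found after fewer link hops. A sketch on an ordinary doubly linked list (illustrative, not the PostgreSQL dlist API):

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node
{
	int			key;
	struct Node *prev;
	struct Node *next;
} Node;

/*
 * Look up key in the chain starting at *head; on a hit, relink the node
 * at the head (move-to-front) so a subsequent lookup finds it in one step.
 * The list is NULL-terminated in both directions.
 */
Node *
find_and_move_to_front(Node **head, int key)
{
	for (Node *n = *head; n != NULL; n = n->next)
	{
		if (n->key != key)
			continue;
		if (n != *head)
		{
			/* unlink from the current position */
			n->prev->next = n->next;
			if (n->next)
				n->next->prev = n->prev;
			/* relink at the head */
			n->prev = NULL;
			n->next = *head;
			(*head)->prev = n;
			*head = n;
		}
		return n;
	}
	return NULL;
}
```

With a mean chain length of 2, the relinking above costs several pointer stores to save, on average, one pointer hop, which matches the observation that removing it helped.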
I understand I cannot get rid of noise as far as I'm poking the feature from a client via the communication and SQL layers. The attached extension surgically exercises SearchSysCache3(STATRELATTINH) in almost the same pattern as the benchmarks taken last week. I believe that gives far more reliable numbers. But still the numbers fluctuate by up to about 10% every trial, and the difference among the methods is under the fluctuation. I'm tired.. But this still looks somewhat wrong.

ratio in the following table is the percentage relative to master for the same test. master2 is a version with dlist_move_head removed from master.

binary  | test | count |   avg   | stddev | ratio
--------+------+-------+---------+--------+--------
master  |  1_0 |     5 | 7841.42 |   6.91 |
master  |  2_0 |     5 | 3810.10 |   8.51 |
master  |  3_0 |     5 | 7826.17 |  11.98 |
master  |  1_1 |     5 | 7905.73 |   5.69 |
master  |  2_1 |     5 | 3827.15 |   5.55 |
master  |  3_1 |     5 | 7822.67 |  13.75 |
--------+------+-------+---------+--------+--------
master2 |  1_0 |     5 | 7538.05 |  16.65 |  96.13
master2 |  2_0 |     5 | 3927.05 |  11.58 | 103.07
master2 |  3_0 |     5 | 7455.47 |  12.03 |  95.26
master2 |  1_1 |     5 | 7485.60 |   9.38 |  94.69
master2 |  2_1 |     5 | 3870.81 |   5.54 | 101.14
master2 |  3_1 |     5 | 7437.35 |   9.91 |  95.74
--------+------+-------+---------+--------+--------
LRU     |  1_0 |     5 | 7633.57 |   9.00 |  97.35
LRU     |  2_0 |     5 | 4062.43 |   5.90 | 106.62
LRU     |  3_0 |     5 | 8340.51 |   6.12 | 106.57
LRU     |  1_1 |     5 | 7645.87 |  13.29 |  96.71
LRU     |  2_1 |     5 | 4026.60 |   7.56 | 105.21
LRU     |  3_1 |     5 | 8400.10 |  19.07 | 107.38
--------+------+-------+---------+--------+--------
Full    |  1_0 |     5 | 7481.61 |   6.70 |  95.41
Full    |  2_0 |     5 | 4084.46 |  14.50 | 107.20
Full    |  3_0 |     5 | 8166.23 |  14.80 | 104.35
Full    |  1_1 |     5 | 7447.20 |  10.93 |  94.20
Full    |  2_1 |     5 | 4016.88 |   8.53 | 104.96
Full    |  3_1 |     5 | 8258.80 |   7.91 | 105.58
--------+------+-------+---------+--------+--------
FullMod |  1_0 |     5 | 7291.80 |  14.03 |  92.99
FullMod |  2_0 |     5 | 4006.36 |   7.64 | 105.15
FullMod |  3_0 |     5 | 8143.60 |   9.26 | 104.06
FullMod |  1_1 |     5 | 7270.66 |   6.24 |  91.97
FullMod |  2_1 |     5 | 3996.20 |  13.00 | 104.42
FullMod |  3_1 |     5 | 8012.55 |   7.09 | 102.43

So "Full (scan) Mod" wins again, or the difference is
under error. I don't think this level of difference can be a reason to reject this kind of resource-saving mechanism. The LRU version doesn't seem particularly slow, but it also doesn't seem particularly fast for its complexity. The FullMod version doesn't look different either.

So it seems to me that the simplest "Full" version wins. The attached is the rebased version; dlist_move_head(entry) is removed in that patch, as mentioned above.

The third and fourth attachments are a set of scripts I used.
Re: Protect syscache from bloating with negative cache entries
At Fri, 29 Mar 2019 17:24:40 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20190329.172440.199616830.horiguchi.kyot...@lab.ntt.co.jp>
> I ran three artificial test cases. The database is created by
> gen_tbl.pl. Numbers are the average of the fastest five runs in
> successive 15 runs.
>
> Test cases are listed below.
>
> 1_0. About 3,000,000 negative entries are created in the pg_statistic
>   cache by scanning that many distinct columns. It is 3000 tables
>   * 1001 columns. Pruning scans happen several times during a run
>   but no entries are removed. This emulates the bloating phase of
>   the cache. catalog_cache_prune_min_age is default (300s).
>   (access_tbl1.pl)
>
> 1_1. Same as 1_0 except that catalog_cache_prune_min_age is 0,
>   which means turning pruning off.
>
> 2_0. Repeatedly access 1001 of the 3,000,000 entries 6000
>   times. This emulates the stable cache case without
>   pruning. catalog_cache_prune_min_age is default (300s).
>   (access_tbl2.pl)
>
> 2_1. Same as 2_0 except that catalog_cache_prune_min_age is 0,
>   which means turning pruning off.
>
> 3_0. Scan over the 3,000,000 entries twice, setting prune_age
>   to 10s. A run takes about 18 seconds on my box so a fair amount
>   of old entries are removed. This emulates the stable case with
>   continuous pruning. (access_tbl3.pl)
>
> 3_1. Same as 3_0 except that catalog_cache_prune_min_age is 0,
>   which means turning pruning off.
>
> The result follows.
>
>     | master |  LRU   |  Full  |Full-mod|
> ----|--------|--------|--------|--------|
> 1_0 | 17.287 | 17.370 | 17.255 | 16.623 |
> 1_1 | 17.287 | 17.063 | 16.336 | 17.192 |
> 2_0 | 15.695 | 18.769 | 18.563 | 15.527 |
> 2_1 | 15.695 | 18.603 | 18.498 | 18.487 |
> 3_0 | 26.576 | 33.817 | 34.384 | 34.971 |
> 3_1 | 26.576 | 27.462 | 26.202 | 26.368 |
>
> The result of 2_0 and 2_1 seems strange, but I show you the
> numbers as they are at present.
>
> - Full-scan seems to have the smallest impact when turned off.
>
> - Full-scan-mod seems to perform best in total. (as far as
>   Full-mod-2_0 is a wrong value..)
> - LRU doesn't seem to outperform full scanning.

I had another.. unstable.. result.

    | master |  LRU   |  Full  |Full-mod|
----|--------|--------|--------|--------|
1_0 | 16.312 | 16.540 | 16.482 | 16.348 |
1_1 | 16.312 | 16.454 | 16.335 | 16.232 |
2_0 | 16.710 | 16.954 | 17.873 | 17.345 |
2_1 | 16.710 | 17.373 | 18.499 | 17.563 |
3_0 | 25.010 | 33.031 | 33.452 | 33.937 |
3_1 | 25.010 | 24.784 | 24.570 | 25.453 |

Normalizing on master's result and rounding off to 1.0%, it looks as:

    | master | LRU | Full |Full-mod| Test description
----|--------|-----|------|--------|---------------------------------
1_0 |  100   | 101 | 101  |  100   | bloating. pruning enabled.
1_1 |  100   | 101 | 100  |  100   | bloating. pruning disabled.
2_0 |  100   | 101 | 107  |  104   | normal access. pruning enabled.
2_1 |  100   | 104 | 111  |  105   | normal access. pruning disabled.
3_0 |  100   | 132 | 134  |  136   | pruning continuously running.
3_1 |  100   |  99 |  98  |  102   | pruning disabled.

I'm not sure why 2_1 is slower than 2_0, but LRU has the least impact if the numbers are right. I will investigate the strange behavior using a profiler.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Protect syscache from bloating with negative cache entries
Hello. Sorry for being a bit late.

At Wed, 27 Mar 2019 17:30:37 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20190327.173037.40342566.horiguchi.kyot...@lab.ntt.co.jp>
> > I don't see much point in continuing to review this patch at this
> > point. There's been no new version of the patch in 3 weeks, and there
> > is -- in my view at least -- a rather frustrating lack of evidence
> > that the complexity this patch introduces is actually beneficial. No
> > matter how many people +1 the idea of making this more complicated, it
> > can't be justified unless you can provide a test result showing that
> > the additional complexity solves a problem that does not get solved
> > without that complexity. And even then, who is going to commit a
> > patch that uses a design which Tom Lane says was tried before and
> > stunk?
>
> Hmm. Anyway it is hit by a recent commit. I'll post a rebased
> version and a version reverted to do whole-scan. Then I'll take
> numbers as far as I can and will show the result.. tomorrow.

I took performance numbers for master and three versions of the patch: LRU, full-scan, and modified full-scan. I noticed that a useless scan can be skipped in the full-scan version, so I added that last version.

I ran three artificial test cases. The database is created by gen_tbl.pl. Numbers are the average of the fastest five runs in successive 15 runs.

Test cases are listed below.

1_0. About 3,000,000 negative entries are created in the pg_statistic cache by scanning that many distinct columns. It is 3000 tables * 1001 columns. Pruning scans happen several times during a run but no entries are removed. This emulates the bloating phase of the cache. catalog_cache_prune_min_age is default (300s). (access_tbl1.pl)

1_1. Same as 1_0 except that catalog_cache_prune_min_age is 0, which means turning pruning off.

2_0. Repeatedly access 1001 of the 3,000,000 entries 6000 times. This emulates the stable cache case without pruning. catalog_cache_prune_min_age is default (300s).
(access_tbl2.pl)

2_1. Same as 2_0 except that catalog_cache_prune_min_age is 0, which means turning pruning off.

3_0. Scan over the 3,000,000 entries twice, setting prune_age to 10s. A run takes about 18 seconds on my box so a fair amount of old entries are removed. This emulates the stable case with continuous pruning. (access_tbl3.pl)

3_1. Same as 3_0 except that catalog_cache_prune_min_age is 0, which means turning pruning off.

The result follows.

    | master |  LRU   |  Full  |Full-mod|
----|--------|--------|--------|--------|
1_0 | 17.287 | 17.370 | 17.255 | 16.623 |
1_1 | 17.287 | 17.063 | 16.336 | 17.192 |
2_0 | 15.695 | 18.769 | 18.563 | 15.527 |
2_1 | 15.695 | 18.603 | 18.498 | 18.487 |
3_0 | 26.576 | 33.817 | 34.384 | 34.971 |
3_1 | 26.576 | 27.462 | 26.202 | 26.368 |

The result of 2_0 and 2_1 seems strange, but I show you the numbers as they are at present.

- Full-scan seems to have the smallest impact when turned off.

- Full-scan-mod seems to perform best in total. (as far as Full-mod-2_0 is a wrong value..)

- LRU doesn't seem to outperform full scanning.

For your information I measured how long pruning takes:

  LRU:       318318 out of 2097153 entries in  26ms: 0.08us/entry.
  Full-scan: 443443 out of 2097153 entries in 184ms: 0.4us/entry.

LRU is actually fast at removing entries, but the difference seems to be canceled by the complexity of LRU maintenance. As my conclusion, we should go with the Full-scan or Full-scan-mod version. I will conduct a further overnight test and see which is better.

I attached the test script set. It is used in the following manner.

(start server)
# perl gen_tbl.pl | psql postgres
(stop server)
# sh run.sh 30 > log.txt    # 30 is repeat count
# perl process.pl

    | master |  LRU   |  Full  |Full-mod|
----|--------|--------|--------|--------|
1_0 | 16.711 | 17.647 | 16.767 | 17.256 |
...

The attached files are as follows.

LRU version patches:
  LRU-0001-Add-dlist_move_tail.patch
  LRU-0002-Remove-entries-that-haven-t-been-used-for-a-certain-.patch

Full-scan version patch:
  FullScan-0001-Remove-entries-that-haven-t-been-used-for-a-certain-.patch

Full-scan-mod version patch:
  FullScan-mod-0001-Remove-entries-that-haven-t-been-used-for-a-certain-.patch

Test scripts:
  test_script.tar.gz

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 1c397d118a65d6b76282cc904c43ecfe97ee5329 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu, 7 Feb 2019 14:56:07 +0900
Subject: [PATCH 1/2] Add dlist_move_tail

We have dlist_push_head/tail and dlist_move_head but not
dlist_move_tail. Add it.
---
 src/include/lib/ilist.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index b1a5974ee4..659ab1ac87 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move
Re: Protect syscache from bloating with negative cache entries
At Mon, 25 Mar 2019 09:28:57 -0400, Robert Haas wrote in
> On Thu, Mar 7, 2019 at 11:40 PM Ideriha, Takeshi
> wrote:
> > Just to be sure, we introduced the LRU list in this thread to find the
> > entries less than threshold time
> > without scanning the whole hash table. If hash table becomes large without
> > LRU list, scanning time becomes slow.
>
> Hmm. So, it's a trade-off, right? One option is to have an LRU list,
> which imposes a small overhead on every syscache or catcache operation
> to maintain the LRU ordering. The other option is to have no LRU
> list, which imposes a larger overhead every time we clean up the
> syscaches. My bias is toward thinking that the latter is better,
> because:
>
> 1. Not everybody is going to use this feature, and
>
> 2. Syscache cleanup should be something that only happens every so
> many minutes, and probably while the backend is otherwise idle,
> whereas lookups can happen many times per millisecond.
>
> However, perhaps someone will provide some evidence that casts a
> different light on the situation.

That is close to my feeling. When the cache is enlarged, all entries are copied into a new hash twice the size. If some entries have been removed, we don't need to duplicate the whole hash; otherwise we end up doing that extra work. We don't run the pruning scan more frequently than the interval, so it is not a bad bet.

> I don't see much point in continuing to review this patch at this
> point. There's been no new version of the patch in 3 weeks, and there
> is -- in my view at least -- a rather frustrating lack of evidence
> that the complexity this patch introduces is actually beneficial. No
> matter how many people +1 the idea of making this more complicated, it
> can't be justified unless you can provide a test result showing that
> the additional complexity solves a problem that does not get solved
> without that complexity. And even then, who is going to commit a
> patch that uses a design which Tom Lane says was tried before and
> stunk?
Hmm. Anyway, it is hit by a recent commit. I'll post a rebased version and a version reverted to do whole-scan. Then I'll take numbers as far as I can and will show the result.. tomorrow.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Protect syscache from bloating with negative cache entries
On Thu, Mar 7, 2019 at 11:40 PM Ideriha, Takeshi wrote: > Just to be sure, we introduced the LRU list in this thread to find the > entries less than threshold time > without scanning the whole hash table. If hash table becomes large without > LRU list, scanning time becomes slow. Hmm. So, it's a trade-off, right? One option is to have an LRU list, which imposes a small overhead on every syscache or catcache operation to maintain the LRU ordering. The other option is to have no LRU list, which imposes a larger overhead every time we clean up the syscaches. My bias is toward thinking that the latter is better, because: 1. Not everybody is going to use this feature, and 2. Syscache cleanup should be something that only happens every so many minutes, and probably while the backend is otherwise idle, whereas lookups can happen many times per millisecond. However, perhaps someone will provide some evidence that casts a different light on the situation. I don't see much point in continuing to review this patch at this point. There's been no new version of the patch in 3 weeks, and there is -- in my view at least -- a rather frustrating lack of evidence that the complexity this patch introduces is actually beneficial. No matter how many people +1 the idea of making this more complicated, it can't be justified unless you can provide a test result showing that the additional complexity solves a problem that does not get solved without that complexity. And even then, who is going to commit a patch that uses a design which Tom Lane says was tried before and stunk? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
RE: Protect syscache from bloating with negative cache entries
>From: Vladimir Sitnikov [mailto:sitnikov.vladi...@gmail.com]
>
>Robert> This email thread is really short on clear demonstrations that X
>Robert> or Y is useful.
>
>It is useful when the whole database does **not** crash, isn't it?
>
>Case A (== current PostgreSQL mode): syscache grows, then the OOM killer
>chimes in, kills the database process, and it leads to complete cluster
>failure (all other PG processes terminate themselves).
>
>Case B (== limit syscache by 10MiB or whatever, as Tsunakawa, Takayuki
>asks): a single ill-behaved process works a bit slower and/or consumes
>more CPU than the other ones. The whole DB is still alive.
>
>I'm quite sure "case B" is much better for the end users and for the
>database administrators.
>
>So, +1 to Tsunakawa, Takayuki; it would be so great if there was a way
>to limit the memory consumption of a single process (e.g. syscache,
>work_mem, etc, etc).
>
>Robert> However, memory usage is quite unpredictable. It depends on how
>Robert> many backends are active
>
>The number of backends can be limited by ensuring proper limits at the
>application connection pool level and/or pgbouncer and/or things like
>that.
>
>Robert> how many copies of work_mem and/or maintenance_work_mem are in
>Robert> use
>
>There might be other patches to cap the total use of
>work_mem/maintenance_work_mem.
>
>Robert> I don't think we
>Robert> can say that just imposing a limit on the size of the system
>Robert> caches is going to be enough to reliably prevent an out of
>Robert> memory condition
>
>The fewer possibilities there are for OOM, the better. Quite often it is
>much better to fail a single SQL statement than to kill all the DB
>processes.

Yeah, I agree. This limit would be useful for such an extreme situation.

Regards,
Takeshi Ideriha
RE: Protect syscache from bloating with negative cache entries
>From: Robert Haas [mailto:robertmh...@gmail.com]
>On Thu, Mar 7, 2019 at 9:49 AM Tomas Vondra
>wrote:
>> I don't think this shows any regression, but perhaps we should do a
>> microbenchmark isolating the syscache entirely?
>
>Well, if we need the LRU list, then yeah I think a microbenchmark would
>be a good idea to make sure we really understand what the impact of that
>is going to be. But if we don't need it and can just remove it then we
>don't.

Just to be sure: we introduced the LRU list in this thread to find the entries unused for longer than the threshold time without scanning the whole hash table. If the hash table becomes large, scanning the whole of it without an LRU list becomes slow.

Regards,
Takeshi Ideriha
Re: Protect syscache from bloating with negative cache entries
On 3/7/19 4:01 PM, Robert Haas wrote: > On Thu, Mar 7, 2019 at 9:49 AM Tomas Vondra > wrote: >> I don't think this shows any regression, but perhaps we should do a >> microbenchmark isolating the syscache entirely? > > Well, if we need the LRU list, then yeah I think a microbenchmark > would be a good idea to make sure we really understand what the impact > of that is going to be. But if we don't need it and can just remove > it then we don't. > >> What I had in mind is more along these lines: >> >> (a) track number of active syscache entries (increment when adding a new >> one, decrement when evicting one) >> >> (b) track peak number of active syscache entries >> >> (c) after clock-sweep, if (peak > K*active) where K=2 or K=4 or so, do a >> memory context swap, i.e. create a new context, copy active entries over >> and destroy the old one >> >> That would at least free() the memory. Of course, the syscache entries >> may have different sizes, so tracking just numbers of entries is just an >> approximation. But I think it'd be enough. > > Yeah, that could be done. I'm not sure how expensive it would be, and > I'm also not sure how much more effective it would be than what's > currently proposed in terms of actually freeing memory. If you free > enough dead syscache entries, you might manage to give some memory > back to the OS: after all, there may be some locality there. And even > if you don't, you'll at least prevent further growth, which might be > good enough. > I have my doubts about that happening in practice. It might happen for some workloads, but I think the locality is rather unpredictable. > We could consider doing some version of what has been proposed here > and the thing you're proposing here could later be implemented on top > of that. I mean, evicting entries at all is a prerequisite to > copy-and-compact. > Sure. I'm not saying the patch must do this to make it committable. 
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Protect syscache from bloating with negative cache entries
On Thu, Mar 7, 2019 at 9:49 AM Tomas Vondra wrote: > I don't think this shows any regression, but perhaps we should do a > microbenchmark isolating the syscache entirely? Well, if we need the LRU list, then yeah I think a microbenchmark would be a good idea to make sure we really understand what the impact of that is going to be. But if we don't need it and can just remove it then we don't. > What I had in mind is more along these lines: > > (a) track number of active syscache entries (increment when adding a new > one, decrement when evicting one) > > (b) track peak number of active syscache entries > > (c) after clock-sweep, if (peak > K*active) where K=2 or K=4 or so, do a > memory context swap, i.e. create a new context, copy active entries over > and destroy the old one > > That would at least free() the memory. Of course, the syscache entries > may have different sizes, so tracking just numbers of entries is just an > approximation. But I think it'd be enough. Yeah, that could be done. I'm not sure how expensive it would be, and I'm also not sure how much more effective it would be than what's currently proposed in terms of actually freeing memory. If you free enough dead syscache entries, you might manage to give some memory back to the OS: after all, there may be some locality there. And even if you don't, you'll at least prevent further growth, which might be good enough. We could consider doing some version of what has been proposed here and the thing you're proposing here could later be implemented on top of that. I mean, evicting entries at all is a prerequisite to copy-and-compact. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Protect syscache from bloating with negative cache entries
Robert Haas writes:
> On Wed, Mar 6, 2019 at 6:18 PM Tomas Vondra
> wrote:
>> Which part of the LRU approach is supposedly expensive? Updating the
>> lastaccess field or moving the entries to tail? I'd guess it's the
>> latter, so perhaps we can keep some sort of age field, update it less
>> frequently (once per minute?), and do the clock sweep?

> Move to tail (although lastaccess would be expensive too if it
> involves an extra gettimeofday() call).

As I recall, the big problem with the old LRU code was loss of locality of access, in that in addition to the data associated with hot syscache entries, you were necessarily also touching list link fields associated with not-hot entries. That's bad for the CPU cache. A gettimeofday call (or any other kernel call) per syscache access would be a complete disaster.

			regards, tom lane
Re: Protect syscache from bloating with negative cache entries
On 3/7/19 3:34 PM, Robert Haas wrote:
> On Wed, Mar 6, 2019 at 6:18 PM Tomas Vondra
> wrote:
>> I agree clock sweep might be sufficient, although the benchmarks done in
>> this thread so far do not suggest the LRU approach is very expensive.
>
> I'm not sure how thoroughly it's been tested -- has someone
> constructed a benchmark that does a lot of syscache lookups and
> measured how much slower they get with this new code?

What I've done on v13 (and I don't think the results would be that different on the current patch, but I may rerun it if needed) is a test that creates a large number of tables (up to 1M) and then accesses them randomly. I don't know if it matches what you imagine, but see [1]

[1] https://www.postgresql.org/message-id/74386116-0bc5-84f2-e614-0cff19aca2de%402ndquadrant.com

I don't think this shows any regression, but perhaps we should do a microbenchmark isolating the syscache entirely?

>> A simple true/false flag, as proposed by Robert, would mean we can only
>> do the cleanup once per the catalog_cache_prune_min_age interval, so
>> with the default value (5 minutes) the entries might be between 5 and 10
>> minutes old. That's probably acceptable, although for higher values the
>> range gets wider and wider ...
>
> That's true, but I don't know that it matters. I'm not sure there's
> much of a use case for raising this parameter to some larger value,
> but even if there is, is it really worth complicating the mechanism to
> make sure that we throw away entries in a more timely fashion? That's
> not going to be cost-free, either in terms of CPU cycles or in terms
> of code complexity.

True, although it very much depends on how expensive it would be.

> Again, I think our goal should be to add the least mechanism here that
> solves the problem. If we can show that a true/false flag makes poor
> decisions about which entries to evict and a smarter algorithm does
> better, then it's worth considering.
However, my bet is that it makes > no meaningful difference. > True. >> Which part of the LRU approach is supposedly expensive? Updating the >> lastaccess field or moving the entries to tail? I'd guess it's the >> latter, so perhaps we can keep some sort of age field, update it less >> frequently (once per minute?), and do the clock sweep? > > Move to tail (although lastaccess would be expensive if too if it > involves an extra gettimeofday() call). GCLOCK, like we use for > shared_buffers, is a common approximation of LRU which tends to be a > lot less expensive to implement. We could do that here and it might > work well, but I think the question, again, is whether we really need > it. I think our goal here should just be to jettison cache entries > that are clearly worthless. It's expensive enough to reload cache > entries that any kind of aggressive eviction policy is probably a > loser, and if our goal is just to get rid of the stuff that's clearly > not being used, we don't need to be super-accurate about it. > True. >> BTW wasn't one of the cases this thread aimed to improve a session that >> accesses a lot of objects in a short period of time? That balloons the >> syscache, and while this patch evicts the entries from memory, we never >> actually release the memory back (because AllocSet just moves it into >> the freelists) and it's unlikely to get swapped out (because other >> chunks on those memory pages are likely to be still used). I've proposed >> to address that by recreating the context if it gets too bloated, and I >> think Alvaro agreed with that. But I haven't seen any further discussion >> about that. > > That's an interesting point. It seems reasonable to me to just throw > away everything and release all memory if the session has been idle > for a while, but if the session is busy doing stuff, discarding > everything in bulk like that is going to cause latency spikes. 
What I had in mind is more along these lines:

(a) track number of active syscache entries (increment when adding a new one, decrement when evicting one)

(b) track peak number of active syscache entries

(c) after clock-sweep, if (peak > K*active) where K=2 or K=4 or so, do a memory context swap, i.e. create a new context, copy active entries over and destroy the old one

That would at least free() the memory. Of course, the syscache entries may have different sizes, so tracking just numbers of entries is just an approximation. But I think it'd be enough.

regards

--
Tomas Vondra
http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Protect syscache from bloating with negative cache entries
On Wed, Mar 6, 2019 at 6:18 PM Tomas Vondra wrote:
> I agree clock sweep might be sufficient, although the benchmarks done in
> this thread so far do not suggest the LRU approach is very expensive.

I'm not sure how thoroughly it's been tested -- has someone constructed a benchmark that does a lot of syscache lookups and measured how much slower they get with this new code?

> A simple true/false flag, as proposed by Robert, would mean we can only
> do the cleanup once per the catalog_cache_prune_min_age interval, so
> with the default value (5 minutes) the entries might be between 5 and 10
> minutes old. That's probably acceptable, although for higher values the
> range gets wider and wider ...

That's true, but I don't know that it matters. I'm not sure there's much of a use case for raising this parameter to some larger value, but even if there is, is it really worth complicating the mechanism to make sure that we throw away entries in a more timely fashion? That's not going to be cost-free, either in terms of CPU cycles or in terms of code complexity.

Again, I think our goal should be to add the least mechanism here that solves the problem. If we can show that a true/false flag makes poor decisions about which entries to evict and a smarter algorithm does better, then it's worth considering. However, my bet is that it makes no meaningful difference.

> Which part of the LRU approach is supposedly expensive? Updating the
> lastaccess field or moving the entries to tail? I'd guess it's the
> latter, so perhaps we can keep some sort of age field, update it less
> frequently (once per minute?), and do the clock sweep?

Move to tail (although lastaccess would be expensive too if it involves an extra gettimeofday() call). GCLOCK, like we use for shared_buffers, is a common approximation of LRU which tends to be a lot less expensive to implement. We could do that here and it might work well, but I think the question, again, is whether we really need it.
I think our goal here should just be to jettison cache entries that are clearly worthless. It's expensive enough to reload cache entries that any kind of aggressive eviction policy is probably a loser, and if our goal is just to get rid of the stuff that's clearly not being used, we don't need to be super-accurate about it. > BTW wasn't one of the cases this thread aimed to improve a session that > accesses a lot of objects in a short period of time? That balloons the > syscache, and while this patch evicts the entries from memory, we never > actually release the memory back (because AllocSet just moves it into > the freelists) and it's unlikely to get swapped out (because other > chunks on those memory pages are likely to be still used). I've proposed > to address that by recreating the context if it gets too bloated, and I > think Alvaro agreed with that. But I haven't seen any further discussion > about that. That's an interesting point. It seems reasonable to me to just throw away everything and release all memory if the session has been idle for a while, but if the session is busy doing stuff, discarding everything in bulk like that is going to cause latency spikes. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Protect syscache from bloating with negative cache entries
On 3/6/19 9:17 PM, Tom Lane wrote: > Robert Haas writes: >> OK, so this is getting simpler, but I'm wondering why we need >> dlist_move_tail() at all. It is a well-known fact that maintaining >> LRU ordering is expensive and it seems to be unnecessary for our >> purposes here. > > Yeah ... LRU maintenance was another thing that used to be in the > catcache logic and was thrown out as far too expensive. Your idea > of just using a clock sweep instead seems plausible. > I agree clock sweep might be sufficient, although the benchmarks done in this thread so far do not suggest the LRU approach is very expensive. A simple true/false flag, as proposed by Robert, would mean we can only do the cleanup once per the catalog_cache_prune_min_age interval, so with the default value (5 minutes) the entries might be between 5 and 10 minutes old. That's probably acceptable, although for higher values the range gets wider and wider ... Which part of the LRU approach is supposedly expensive? Updating the lastaccess field or moving the entries to tail? I'd guess it's the latter, so perhaps we can keep some sort of age field, update it less frequently (once per minute?), and do the clock sweep? BTW wasn't one of the cases this thread aimed to improve a session that accesses a lot of objects in a short period of time? That balloons the syscache, and while this patch evicts the entries from memory, we never actually release the memory back (because AllocSet just moves it into the freelists) and it's unlikely to get swapped out (because other chunks on those memory pages are likely to be still used). I've proposed to address that by recreating the context if it gets too bloated, and I think Alvaro agreed with that. But I haven't seen any further discussion about that. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Protect syscache from bloating with negative cache entries
Robert Haas writes: > OK, so this is getting simpler, but I'm wondering why we need > dlist_move_tail() at all. It is a well-known fact that maintaining > LRU ordering is expensive and it seems to be unnecessary for our > purposes here. Yeah ... LRU maintenance was another thing that used to be in the catcache logic and was thrown out as far too expensive. Your idea of just using a clock sweep instead seems plausible. regards, tom lane
Re: Protect syscache from bloating with negative cache entries
On Fri, Mar 1, 2019 at 3:33 AM Kyotaro HORIGUCHI wrote:
> > > It is artificial (or actually won't be repeatedly executed in a
> > > session) but anyway what can get benefit from
> > > catalog_cache_memory_target would be a kind of extreme.
> >
> > I agree. So then let's not have it.
>
> Ah... Yeah! I see. Andres' concern was that crucial syscache
> entries might be blown away during a long idle time. If that
> happens, it's enough to just turn the feature off in almost all
> such cases.

+1.

> In the attached v18,
>   catalog_cache_memory_target is removed,
>   removed some leftover of removing the hard limit feature,
>   separated catcache clock update during a query into 0003.
>   attached 0004 (monitor part) in order just to see how it is working.
>
> v18-0001-Add-dlist_move_tail:
>   Just adds dlist_move_tail
>
> v18-0002-Remove-entries-that-haven-t-been-used-for-a-certain-:
>   Revised pruning feature.

OK, so this is getting simpler, but I'm wondering why we need dlist_move_tail() at all. It is a well-known fact that maintaining LRU ordering is expensive and it seems to be unnecessary for our purposes here.

Can't CatCacheCleanupOldEntries just use a single-bit flag on the entry? If the flag is set, clear it. If the flag is clear, drop the entry. When an entry is used, set the flag. Then, entries will go away if they are not used between consecutive calls to CatCacheCleanupOldEntries. Sure, that might be slightly less accurate in terms of which entries get thrown away, but I bet it makes no real difference.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
RE: Protect syscache from bloating with negative cache entries
>From: Tsunakawa, Takayuki [mailto:tsunakawa.ta...@jp.fujitsu.com]
>> [Size=800, iter=1,000,000]
>> Master  | 15.763
>> Patched | 16.262 (+3%)
>>
>> [Size=32768, iter=1,000,000]
>> Master  | 61.3076
>> Patched | 62.9566 (+2%)
>
>What's the unit, second or millisecond?

Millisecond.

>Why is the number of digits to the right of the decimal point?
>
>Is the measurement correct? I'm wondering because the difference is
>larger in the latter case. Isn't the accounting processing almost the
>same in both cases?
>* former: 16.262 - 15.763 = 0.499
>* latter: 62.956 - 61.307 = 1.649
>I think the overhead is sufficiently small. It may get even smaller with
>a trivial tweak.
>
>You added the new member usedspace at the end of MemoryContextData. The
>original size of MemoryContextData is 72 bytes, and Intel Xeon's cache
>line is 64 bytes. So, the new member will be on a separate cache line.
>Try putting usedspace before the name member.

OK. I changed the order of the MemoryContextData members to fit usedspace into one cache line.

I disabled the whole catcache eviction mechanism in the patched version and compared it with master, to check that the overhead of memory accounting has become small enough. The settings are almost the same as in the last email. But last time the number of trials was 50, so this time I increased it and tried 5000 times to calculate the average figure (rounded off at the third decimal place).

[Size=800, iter=1,000,000]
Master  | 15.64 ms
Patched | 16.26 ms (+4%)

The difference is 0.62 ms.

[Size=32768, iter=1,000,000]
Master  | 61.39 ms
Patched | 60.99 ms (-1%)

I guess there is around 2% noise, but based on this experiment the overhead seems small. There is still some overhead, but it can be hidden by other costs such as malloc(). Does this result show that the hard-limit size option with memory accounting doesn't harm ordinary users who disable the hard-limit size option?
Regards, Takeshi Ideriha v15-0001-3-Add-dlist_move_tail.patch Description: v15-0001-3-Add-dlist_move_tail.patch binuIZWS0c0ho.bin Description: v15-0002-3-ideriha-Memory-consumption-report-reature-of-memorycontext.patch v15-0003-3-ideriha-Remove-CatCache-Entries.patch Description: v15-0003-3-ideriha-Remove-CatCache-Entries.patch
Re: Protect syscache from bloating with negative cache entries
At Tue, 26 Feb 2019 10:55:18 -0500, Robert Haas wrote in
> On Mon, Feb 25, 2019 at 1:27 AM Kyotaro HORIGUCHI wrote:
> > > I'd like to see some evidence that catalog_cache_memory_target has any
> > > value, vs. just always setting it to zero.
> >
> > It is artificial (or actually won't be repeatedly executed in a
> > session) but anyway what can get benefit from
> > catalog_cache_memory_target would be a kind of extreme.
>
> I agree. So then let's not have it.

Ah... yeah, I see. Andres' concern was that crucial syscache entries might be blown away during a long idle time. If that happens, simply turning the feature off is enough in almost all such cases. Without the feature we no longer need to count memory usage; that stuff has been moved to the monitoring feature, which is out of scope for the current state of this patch.

> We shouldn't add more mechanism here than actually has value. It
> seems pretty clear that keeping cache entries that go unused for long
> periods can't be that important; even if we need them again
> eventually, reloading them every 5 or 10 minutes can't hurt that much.
> On the other hand, I think it's also pretty clear that evicting cache
> entries that are being used frequently will have disastrous effects on
> performance; as I noted in the other email I just sent, consider the
> effects of CLOBBER_CACHE_ALWAYS. No reasonable user is going to want
> to incur a massive slowdown to save a little bit of memory.
>
> I see that *in theory* there is a value to
> catalog_cache_memory_target, because *maybe* there is a workload where
> tuning that GUC will lead to better performance at lower memory usage
> than any competing proposal. But unless we can actually see an
> example of such a workload, which so far I don't, we're adding a knob
> that everybody has to think about how to tune when in fact we have no
> idea how to tune it or whether it even needs to be tuned. That
> doesn't make sense.
> We have to be able to document the parameters we
> have and explain to users how they should be used. And as far as this
> parameter is concerned I think we are not at that point.

In the attached v18,
- catalog_cache_memory_target is removed,
- removed some leftover of removing the hard limit feature,
- separated catcache clock update during a query into 0003,
- attached 0004 (monitor part) just to see how it is working.

v18-0001-Add-dlist_move_tail:
  Just adds dlist_move_tail

v18-0002-Remove-entries-that-haven-t-been-used-for-a-certain-:
  Revised pruning feature.

v18-0003-Asynchronous-update-of-catcache-clock:
  Separated catcache clock update feature.

v18-0004-Syscache-usage-tracking-feature:
  Usage tracking feature.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

>From 54388a7452eda1faadaa108e1bc21d51844f9224 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu, 7 Feb 2019 14:56:07 +0900
Subject: [PATCH 1/6] Add dlist_move_tail

We have dlist_push_head/tail and dlist_move_head but not
dlist_move_tail. Add it.

---
 src/include/lib/ilist.h | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index b1a5974ee4..659ab1ac87 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
-- 
2.16.3

>From c79d5fc86f45e6545cbc257040e46125ffc5cb92 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Fri, 1 Mar 2019 13:32:51 +0900
Subject: [PATCH 2/6] Remove entries that haven't been used for a certain time

Catcache entries happen to be left alone for several reasons. It is
not desirable that such useless entries eat up memory. The catcache
pruning feature removes entries that haven't been accessed for a
certain time before enlarging the hash array.

---
 doc/src/sgml/config.sgml                      |  19
 src/backend/tcop/postgres.c                   |   2 +
 src/backend/utils/cache/catcache.c            | 122 +-
 src/backend/utils/misc/guc.c                  |  12 +++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/utils/catcache.h                  |  18
 6 files changed, 171 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6d42b7afe7..737a156bb4 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1661,6 +1661,25 @@ include_dir
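[Editorial note: the commit message above describes pruning entries by last-access time before enlarging the hash array. A hedged sketch of that idea in plain C, using a hypothetical Entry struct, a singly-linked list, and a global clock counter — the real patch works on the catcache hash buckets and its GUC-driven prune age, not this simplified model:]

```c
#include <stdlib.h>

/* Hypothetical, simplified model of prune-before-enlarge (not catcache.c). */
typedef struct Entry
{
	int		key;
	long	last_access;	/* clock value at last lookup */
	struct Entry *next;
} Entry;

/* Advanced periodically, e.g. once per query, instead of reading the
 * wall clock on every cache access. */
long	cache_clock = 0;

/* Record the current clock on lookup. */
void
entry_access(Entry *e)
{
	e->last_access = cache_clock;
}

/*
 * Drop entries whose last access is older than prune_min_age ticks.
 * Returns the number removed; the caller would enlarge the hash array
 * only if pruning failed to free enough entries.
 */
int
prune_old_entries(Entry **head, long prune_min_age)
{
	int		removed = 0;
	Entry **link = head;

	while (*link != NULL)
	{
		Entry *e = *link;

		if (cache_clock - e->last_access > prune_min_age)
		{
			*link = e->next;	/* unlink and free the stale entry */
			free(e);
			removed++;
		}
		else
			link = &e->next;
	}
	return removed;
}
```

Running the prune pass only at resize time keeps the lookup path untouched; the cost is that memory usage floats above the target between resizes, which is why the thread notes the target "doesn't work strictly".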
Re: Protect syscache from bloating with negative cache entries
Robert> This email thread is really short on clear demonstrations that X or Y
Robert> is useful.

It is useful when the whole database does **not** crash, isn't it?

Case A (== current PostgreSQL mode): the syscache grows, then the OOM killer chimes in, kills the database process, and that leads to a complete cluster failure (all other PG processes terminate themselves).

Case B (== limit the syscache to 10MiB or whatever, as Tsunakawa, Takayuki asks): a single ill-behaved process works a bit slower and/or consumes more CPU than the other ones. The whole DB is still alive.

I'm quite sure "case B" is much better for the end users and for the database administrators. So, +1 to Tsunakawa, Takayuki; it would be so great if there were a way to limit the memory consumption of a single process (e.g. syscache, work_mem, etc.).

Robert> However, memory usage is quite unpredictable. It depends on how many
Robert> backends are active

The number of backends can be limited by ensuring proper limits at the application connection pool level and/or pgbouncer and/or things like that.

Robert> how many copies of work_mem and/or
Robert> maintenance_work_mem are in use

There might be other patches to cap the total use of work_mem/maintenance_work_mem.

Robert> I don't think we
Robert> can say that just imposing a limit on the size of the system caches is
Robert> going to be enough to reliably prevent an out of memory condition

The fewer possibilities there are for OOM, the better. Quite often it is much better to fail a single SQL statement than to kill all the DB processes.

Vladimir
RE: Protect syscache from bloating with negative cache entries
From: Ideriha, Takeshi/出利葉 健
> [Size=800, iter=1,000,000]
> Master |15.763
> Patched|16.262 (+3%)
>
> [Size=32768, iter=1,000,000]
> Master |61.3076
> Patched|62.9566 (+2%)

What's the unit, second or millisecond?

Why does the number of digits to the right of the decimal point differ?

Is the measurement correct? I'm wondering because the difference is larger in the latter case. Isn't the accounting processing almost the same in both cases?

* former: 16.262 - 15.763 = 0.499
* latter: 62.956 - 61.307 = 1.649

> At least compared to previous HashAg version, the overhead is smaller.
> It has some overhead but is increase by 2 or 3% a little bit?

I think the overhead is sufficiently small. It may get even smaller with a trivial tweak.

You added the new member usedspace at the end of MemoryContextData. The original size of MemoryContextData is 72 bytes, and Intel Xeon's cache line is 64 bytes. So, the new member will be on a separate cache line. Try putting usedspace before the name member.

Regards
Takayuki Tsunakawa
Re: Protect syscache from bloating with negative cache entries
On Wed, Feb 27, 2019 at 3:16 AM Ideriha, Takeshi wrote: > I'm afraid I may be quibbling about it. > What about users who understand performance drops but don't want to > add memory or decrease concurrency? > I think that PostgreSQL has a parameter > which most of users don't mind and use is as default > but a few of users want to change it. > In this case as you said, introducing hard limit parameter causes > performance decrease significantly so how about adding detailed caution > to the document like planner cost parameter? There's nothing wrong with a parameter that is useful to some people and harmless to everyone else, but the people who are proposing that parameter still have to demonstrate that it has those properties. This email thread is really short on clear demonstrations that X or Y is useful. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
RE: Protect syscache from bloating with negative cache entries
>From: Tsunakawa, Takayuki [mailto:tsunakawa.ta...@jp.fujitsu.com]
>From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
>> I measured the memory context accounting overhead using Tomas's tool
>> palloc_bench, which he made a while ago in a similar discussion.
>> https://www.postgresql.org/message-id/53f7e83c.3020...@fuzzy.cz
>>
>> This tool is a little bit outdated so I fixed it, but basically I
>> followed his approach.
>> Things I did:
>> - make one MemoryContext
>> - run both palloc() and pfree() for a 32kB area 1,000,000 times
>> - measure the elapsed time
>
>And are you sure you didn't enable assert checking?

Ah, sorry.. I misconfigured it.

>I'm afraid the measurement is not correct. First, the older discussion below shows
>that the accounting overhead is much, much smaller, even with a more complex
>accounting.
>Second, allocation/free of memory > 8 KB calls malloc()/free(). I guess the
>accounting overhead will be more likely to be hidden under the overhead of malloc()
>and free(). What we'd like to know is the overhead when malloc() and free() are not
>called.

Here is the average of 50 measurements: palloc/pfree of an 800-byte area 1,000,000 times, and of a 32kB area 1,000,000 times. Using gdb, I checked that malloc() is not called at size=800.

[Size=800, iter=1,000,000]
Master |15.763
Patched|16.262 (+3%)

[Size=32768, iter=1,000,000]
Master |61.3076
Patched|62.9566 (+2%)

At least compared to the previous HashAgg version, the overhead is smaller. There is still some overhead, but the increase is only 2 or 3%.

Regards,
Takeshi Ideriha
RE: Protect syscache from bloating with negative cache entries
From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
> I measured the memory context accounting overhead using Tomas's tool
> palloc_bench, which he made a while ago in a similar discussion.
> https://www.postgresql.org/message-id/53f7e83c.3020...@fuzzy.cz
>
> This tool is a little bit outdated so I fixed it, but basically I followed his approach.
> Things I did:
> - make one MemoryContext
> - run both palloc() and pfree() for a 32kB area 1,000,000 times
> - measure the elapsed time
>
> The result shows that master is 30 times faster than the patched one.
> So, as Andres mentioned upthread, it seems to have overhead.
>
> [master (without v15 patch)]
> 61.52 ms
> 60.96 ms
> 61.40 ms
> 61.42 ms
> 61.14 ms
>
> [with v15 patch]
> 1838.02 ms
> 1754.84 ms
> 1755.83 ms
> 1789.69 ms
> 1789.44 ms

I'm afraid the measurement is not correct. First, the older discussion below shows that the accounting overhead is much, much smaller, even with a more complex accounting.

9.5: Better memory accounting, towards memory-bounded HashAgg
https://www.postgresql.org/message-id/flat/1407012053.15301.53.camel%40jeff-desktop

Second, allocation/free of memory > 8 KB calls malloc()/free(). I guess the accounting overhead will be more likely to be hidden under the overhead of malloc() and free(). What we'd like to know is the overhead when malloc() and free() are not called.

And are you sure you didn't enable assert checking?

Regards
Takayuki Tsunakawa
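[Editorial note: the cache-line point above can be checked with offsetof(). On a typical LP64 platform, a counter appended after roughly 72 bytes of header starts at or past offset 64 and lands on a second 64-byte cache line, while moving it before the name pointer keeps it within the first line. The structs below are hypothetical stand-ins with the same rough shape, not the real MemoryContextData:]

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout experiment: where does a new 8-byte counter land? */

typedef struct BadOrder
{
	void	   *parent;			/* 8 bytes each on LP64 */
	void	   *firstchild;
	void	   *prevchild;
	void	   *nextchild;
	void	   *methods;
	void	   *ident;
	void	   *reset_cbs;
	char	   *name;
	int64_t		usedspace;		/* appended last: offset 64 on LP64,
								 * i.e. the second cache line */
} BadOrder;

typedef struct GoodOrder
{
	void	   *parent;
	void	   *firstchild;
	void	   *prevchild;
	void	   *nextchild;
	void	   *methods;
	void	   *ident;
	void	   *reset_cbs;
	int64_t		usedspace;		/* moved before name: offset 56 on LP64,
								 * still in the first cache line */
	char	   *name;
} GoodOrder;
```

If the accounting hot path touches parent pointers and usedspace together, the second ordering means one cache line fetched instead of two, which is the effect the reordering in the reply above is after.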
RE: Protect syscache from bloating with negative cache entries
>From: Robert Haas [mailto:robertmh...@gmail.com] > >On Mon, Feb 25, 2019 at 3:50 AM Tsunakawa, Takayuki > wrote: >> How can I make sure that this context won't exceed, say, 10 MB to avoid OOM? > >As Tom has said before and will probably say again, I don't think you actually >want that. >We know that PostgreSQL gets roughly 100x slower with the system caches >disabled >- try running with CLOBBER_CACHE_ALWAYS. If you are accessing the same system >cache entries repeatedly in a loop - which is not at all an unlikely scenario, >just run the >same query or sequence of queries in a loop - and if the number of entries >exceeds >10MB even, perhaps especially, by just a tiny bit, you are going to see a >massive >performance hit. >Maybe it won't be 100x because some more-commonly-used entries will always stay >cached, but it's going to be really big, I think. > >Now you could say - well it's still better than running out of memory. >However, memory usage is quite unpredictable. It depends on how many backends >are active and how many copies of work_mem and/or maintenance_work_mem are in >use, among other things. I don't think we can say that just imposing a limit >on the >size of the system caches is going to be enough to reliably prevent an out of >memory >condition unless the other use of memory on the machine happens to be extremely >stable. >So I think what's going to happen if you try to impose a hard-limit on the >size of the >system cache is that you will cause some workloads to slow down by 3x or more >without actually preventing out of memory conditions. What you need to do is >accept >that system caches need to grow as big as they need to grow, and if that >causes you >to run out of memory, either buy more memory or reduce the number of concurrent >sessions you allow. It would be fine to instead limit the cache memory if >those cache >entries only had a mild effect on performance, but I don't think that's the >case. I'm afraid I may be quibbling about it. 
What about users who understand the performance drop but still don't want to add memory or decrease concurrency? I think PostgreSQL has parameters that most users don't mind and leave at the default, but that a few users want to change. In this case, as you said, introducing a hard-limit parameter can decrease performance significantly, so how about adding a detailed caution to the documentation, as is done for the planner cost parameters?

Regards,
Takeshi Ideriha
RE: Protect syscache from bloating with negative cache entries
>From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
>>>* 0001: add dlist_push_tail() ... as is
>>>* 0002: memory accounting, with correction based on feedback
>>>* 0003: merge the original 0003 and 0005, with correction based on feedback
>>
>>Attached are simpler versions based on Horiguchi-san's v15 patch, which means
>>the cache is pruned by both time and size.
>>(The cleanup function is still complex, but it gets much simpler.)
>
>I don't mean to disregard what Horiguchi-san and others have developed and
>discussed. But I refactored the v15 patch again to reduce its complexity,
>because it seems to me that one of the reasons the prune-by-size feature was
>dropped stems from code complexity.
>
>Another thing is that the memory accounting overhead has been discussed, but
>it hasn't been measured in this thread. So I'd like to measure it.

I measured the memory context accounting overhead using Tomas's tool palloc_bench, which he made a while ago in a similar discussion.
https://www.postgresql.org/message-id/53f7e83c.3020...@fuzzy.cz

This tool is a little bit outdated so I fixed it, but basically I followed his approach.
Things I did:
- make one MemoryContext
- run both palloc() and pfree() for a 32kB area 1,000,000 times
- measure the elapsed time

The result shows that master is 30 times faster than the patched one. So, as Andres mentioned upthread, it seems to have overhead.

[master (without v15 patch)]
61.52 ms
60.96 ms
61.40 ms
61.42 ms
61.14 ms

[with v15 patch]
1838.02 ms
1754.84 ms
1755.83 ms
1789.69 ms
1789.44 ms

Regards,
Takeshi Ideriha
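[Editorial note: palloc_bench itself is not shown in the thread. The methodology can be sketched roughly as below, with plain malloc()/free() standing in for palloc()/pfree() on a MemoryContext, so absolute numbers are not comparable to the figures above — this only illustrates the shape of the measurement:]

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/*
 * Hedged stand-in for the palloc_bench loop: time N iterations of
 * allocating and immediately freeing a fixed-size chunk, and return
 * the elapsed CPU time in milliseconds.
 */
double
bench_alloc_free(size_t size, long iterations)
{
	clock_t		start = clock();

	for (long i = 0; i < iterations; i++)
	{
		void	   *p = malloc(size);

		if (p == NULL)
			abort();			/* out of memory: give up */
		free(p);
	}
	return (double) (clock() - start) * 1000.0 / CLOCKS_PER_SEC;
}
```

Usage mirroring the thread's two cases (800 bytes stays inside the allocator's small-chunk path in PostgreSQL; 32 kB falls through to malloc there, which is exactly the distinction Tsunakawa-san raises):

```c
printf("size=800:   %.3f ms\n", bench_alloc_free(800, 1000000));
printf("size=32768: %.3f ms\n", bench_alloc_free(32768, 1000000));
```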
Re: Protect syscache from bloating with negative cache entries
On Mon, Feb 25, 2019 at 1:27 AM Kyotaro HORIGUCHI wrote: > > I'd like to see some evidence that catalog_cache_memory_target has any > > value, vs. just always setting it to zero. > > It is artificial (or acutually wont't be repeatedly executed in a > session) but anyway what can get benefit from > catalog_cache_memory_target would be a kind of extreme. I agree. So then let's not have it. We shouldn't add more mechanism here than actually has value. It seems pretty clear that keeping cache entries that go unused for long periods can't be that important; even if we need them again eventually, reloading them every 5 or 10 minutes can't hurt that much. On the other hand, I think it's also pretty clear that evicting cache entries that are being used frequently will have disastrous effects on performance; as I noted in the other email I just sent, consider the effects of CLOBBER_CACHE_ALWAYS. No reasonable user is going to want to incur a massive slowdown to save a little bit of memory. I see that *in theory* there is a value to catalog_cache_memory_target, because *maybe* there is a workload where tuning that GUC will lead to better performance at lower memory usage than any competing proposal. But unless we can actually see an example of such a workload, which so far I don't, we're adding a knob that everybody has to think about how to tune when in fact we have no idea how to tune it or whether it even needs to be tuned. That doesn't make sense. We have to be able to document the parameters we have and explain to users how they should be used. And as far as this parameter is concerned I think we are not at that point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Protect syscache from bloating with negative cache entries
On Mon, Feb 25, 2019 at 3:50 AM Tsunakawa, Takayuki wrote: > How can I make sure that this context won't exceed, say, 10 MB to avoid OOM? As Tom has said before and will probably say again, I don't think you actually want that. We know that PostgreSQL gets roughly 100x slower with the system caches disabled - try running with CLOBBER_CACHE_ALWAYS. If you are accessing the same system cache entries repeatedly in a loop - which is not at all an unlikely scenario, just run the same query or sequence of queries in a loop - and if the number of entries exceeds 10MB even, perhaps especially, by just a tiny bit, you are going to see a massive performance hit. Maybe it won't be 100x because some more-commonly-used entries will always stay cached, but it's going to be really big, I think. Now you could say - well it's still better than running out of memory. However, memory usage is quite unpredictable. It depends on how many backends are active and how many copies of work_mem and/or maintenance_work_mem are in use, among other things. I don't think we can say that just imposing a limit on the size of the system caches is going to be enough to reliably prevent an out of memory condition unless the other use of memory on the machine happens to be extremely stable. So I think what's going to happen if you try to impose a hard-limit on the size of the system cache is that you will cause some workloads to slow down by 3x or more without actually preventing out of memory conditions. What you need to do is accept that system caches need to grow as big as they need to grow, and if that causes you to run out of memory, either buy more memory or reduce the number of concurrent sessions you allow. It would be fine to instead limit the cache memory if those cache entries only had a mild effect on performance, but I don't think that's the case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
RE: Protect syscache from bloating with negative cache entries
>>From: Tsunakawa, Takayuki
>>Ideriha-san,
>>Could you try simplifying the v15 patch set to see how simple the code would
>>look or not? That is:
>>
>>* 0001: add dlist_push_tail() ... as is
>>* 0002: memory accounting, with correction based on feedback
>>* 0003: merge the original 0003 and 0005, with correction based on feedback
>
>Attached are simpler versions based on Horiguchi-san's v15 patch, which means
>the cache is pruned by both time and size.
>(The cleanup function is still complex, but it gets much simpler.)

I don't mean to disregard what Horiguchi-san and others have developed and discussed. But I refactored the v15 patch again to reduce its complexity, because it seems to me that one of the reasons the prune-by-size feature was dropped stems from code complexity.

Another thing is that the memory accounting overhead has been discussed, but it hasn't been measured in this thread. So I'd like to measure it.

Regards,
Takeshi Ideriha

v15-0001-2-Add-dlist_move_tail.patch
Description: v15-0001-2-Add-dlist_move_tail.patch

bin7aPjzkMJvE.bin
Description: v15-0002-2-ideriha-Memory-consumption-report-reature-of-memorycontext.patch

v15-0003-2-ideriha-Remove-CatCache-Entries.patch
Description: v15-0003-2-ideriha-Remove-CatCache-Entries.patch
RE: Protect syscache from bloating with negative cache entries
From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> - If you find the process "bloat"s too much and you (intuitively)
> suspect the cause is the system cache, set it to a certain shorter
> value, say 1 minute, and set catalog_cache_memory_target
> to the allowable amount of memory for each process. The memory
> usage will be stable at an (un)certain amount above the target.

Could you guide me on how to tune these parameters in an example scenario? Let me take the original problematic case referenced at the beginning of this thread. That is:

* A PL/pgSQL function that creates a temp table, accesses it, (accesses other non-temp tables), and drops the temp table.
* An application repeatedly begins a transaction, calls the stored function, and commits the transaction.

With the v16 patch applied, and leaving the catalog_cache_xxx parameters set to their defaults, CacheMemoryContext continued to increase as follows:

CacheMemoryContext: 1065016 total in 9 blocks; 104168 free (17 chunks); 960848 used
CacheMemoryContext: 8519736 total in 12 blocks; 3765504 free (19 chunks); 4754232 used
CacheMemoryContext: 25690168 total in 14 blocks; 8372096 free (21 chunks); 17318072 used
CacheMemoryContext: 42991672 total in 16 blocks; 11741024 free (21761 chunks); 31250648 used

How can I make sure that this context won't exceed, say, 10 MB to avoid OOM? I'm afraid that once the catcache hash table becomes large in a short period, eviction would happen less frequently, leading to memory bloat.

Regards
Takayuki Tsunakawa
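[Editorial note: as a side note for readers on current releases — since PostgreSQL 14 the built-in pg_backend_memory_contexts view (independent of the patches in this thread) can show this kind of growth for the current session; column names here are as of PostgreSQL 14, and may differ in later versions:]

```sql
-- Watch catalog cache memory in the current session (PostgreSQL 14+).
-- Run before and after the workload loop to see whether it keeps growing.
SELECT name, ident, total_bytes, used_bytes
FROM pg_backend_memory_contexts
WHERE name = 'CacheMemoryContext' OR parent = 'CacheMemoryContext'
ORDER BY total_bytes DESC;
```

This requires a live server session, so it is shown only as a session fragment; it reports the same numbers as the CacheMemoryContext dump lines above, without attaching a debugger.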
RE: Protect syscache from bloating with negative cache entries
From: Robert Haas [mailto:robertmh...@gmail.com]
> I don't understand the idea that we would add something to PostgreSQL
> without proving that it has value. Sure, other systems have somewhat
> similar systems, and they have knobs to tune them. But, first, we
> don't know that those other systems made all the right decisions, and
> second, even if they are, that doesn't mean that we'll derive similar
> benefits in a system with a completely different code base and many
> other internal differences.

I understand that general idea. So I don't see why the proposed approach, eviction based only on elapsed time and only at hash table expansion, is better given PostgreSQL's code base and other internal differences...

> You need to demonstrate that each and every GUC you propose to add has
> a real, measurable benefit in some plausible scenario. You can't just
> argue that other people have something kinda like this so we should
> have it too. Or, well, you can argue that, but if you do, then -1
> from me.

The benefits of the size limit are:

* Controllable and predictable memory usage. The DBA can be sure that OOM won't happen.
* Smoothed (non-abnormal) transaction response times, due to the elimination of bulk eviction of cache entries.

I'm not sure how to tune catalog_cache_prune_min_age and catalog_cache_memory_target. Let me pick up a test scenario in a later mail in response to Horiguchi-san.

Regards
Takayuki Tsunakawa
Re: Protect syscache from bloating with negative cache entries
At Mon, 25 Feb 2019 15:23:22 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20190225.152322.104148315.horiguchi.kyot...@lab.ntt.co.jp>
> I think the two parameters are to be tuned in the following
> steps.
>
> - If the default setting satisfies you, leave it alone. (as a
> general suggestion)
>
> - If you find your (syscache-sensitive) queries are executed
> with rather long intervals, say 10-30 minutes, and they get
> slower than with shorter intervals, consider increasing
> catalog_cache_prune_min_age to about the query interval. If you
> don't suffer process bloat, that's fine.
>
> - If you find the process "bloat"s too much and you (intuitively)
> suspect the cause is the system cache, set it to a certain shorter
> value, say 1 minute, and set catalog_cache_memory_target
> to the allowable amount of memory for each process. The memory
> usage will be stable at an (un)certain amount above the target.
>
> Or, if you want to determine the setting beforehand with a rather
> strict limit, and if the monitoring feature were a part of this
> patchset, a user could check how much memory is used for the query.
>
> $ perl -e 'print "set track_catalog_cache_usage_interval = 1000;\n"; for (0..) { print "CREATE TABLE foo$_ PARTITION OF foo FOR VALUES WITH (MODULUS 1, REMAINDER $_);\n"; } print "select sum(size) from pg_stat_syscache";' | psql
>
>    sum
> -
>  7088523

It's not substantial, but that number is for catalog_cache_prune_min_age = 300s; with pruning disabled I had 12MB:

perl -e 'print "set catalog_cache_prune_min_age to 0; set track_catalog_cache_usage_interval = 1000;\n"; for (0..) { print "CREATE TABLE foo$_ PARTITION OF foo FOR VALUES WITH (MODULUS 1, REMAINDER $_);\n"; } print "select sum(size) from pg_stat_syscache";' | psql

   sum
--
 12642321

> In this case, set catalog_cache_memory_target to 7MB and
> catalog_cache_prune_min_age to '1min'. Since the target doesn't
> work strictly (checked only at every resizing time), possibly
> you need further tuning.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Protect syscache from bloating with negative cache entries
At Wed, 20 Feb 2019 13:09:08 -0500, Robert Haas wrote in
> On Tue, Feb 19, 2019 at 11:15 PM Kyotaro HORIGUCHI wrote:
> > Difference from v15:
> >
> > Removed AllocSet accounting stuff. We use approximate memory
> > size for catcache.
> >
> > Removed prune-by-number(or size) stuff.
> >
> > Addressing comments from Tsunakawa-san and Ideriha-san.
> >
> > Separated catcache monitoring feature. (Removed from this set)
> > (But it is crucial to check this feature...)
> >
> > Is this small enough?
>
> The commit message in 0002 says 'This also can put a hard limit on the
> number of catcache entries.' but neither of the GUCs that you've
> documented have that effect. Is that a leftover from a previous
> version?

Mmm, right. Thank you for pointing that out, and sorry about it. I fixed it, along with another mistake in the commit message, in my repo. It will appear in the next version.

| Remove entries that haven't been used for a certain time
|
| Catcache entries can be left alone for several reasons. It is not
| desirable that they eat up memory. With this patch, entries that
| haven't been used for a certain time are considered to be removed
| before enlarging the hash array.

> I'd like to see some evidence that catalog_cache_memory_target has any
> value, vs. just always setting it to zero. I came up with the
> following somewhat artificial example that shows that it might have
> value.
>
> rhaas=# create table foo (a int primary key, b text) partition by hash (a);
> [rhaas pgsql]$ perl -e 'for (0..) { print "CREATE TABLE foo$_ PARTITION OF foo FOR VALUES WITH (MODULUS 1, REMAINDER $_);\n"; }' | psql
>
> First execution of 'select * from foo' in a brand new session takes
> about 1.9 seconds; subsequent executions take about 0.7 seconds. So,
> if catalog_cache_memory_target were set to a high enough value to
> allow all of that stuff to remain in cache, we could possibly save
> about 1.2 seconds coming off the blocks after a long idle period.
> That might be enough to justify having the parameter. But I'm not
> quite sure how high the value would need to be set to actually get the
> benefit in a case like that, or what happens if you set it to a value
> that's not quite high enough.

It is artificial (and actually won't be repeatedly executed in a session), but anyway, whatever can get a benefit from catalog_cache_memory_target would be a kind of extreme case.

I think the two parameters are to be tuned in the following steps.

- If the default setting satisfies you, leave it alone. (as a general suggestion)

- If you find your (syscache-sensitive) queries are executed with rather long intervals, say 10-30 minutes, and they get slower than with shorter intervals, consider increasing catalog_cache_prune_min_age to about the query interval. If you don't suffer process bloat, that's fine.

- If you find the process "bloat"s too much and you (intuitively) suspect the cause is the system cache, set it to a certain shorter value, say 1 minute, and set catalog_cache_memory_target to the allowable amount of memory for each process. The memory usage will be stable at an (un)certain amount above the target.

Or, if you want to determine the setting beforehand with a rather strict limit, and if the monitoring feature were a part of this patchset, a user could check how much memory is used for the query.

$ perl -e 'print "set track_catalog_cache_usage_interval = 1000;\n"; for (0..) { print "CREATE TABLE foo$_ PARTITION OF foo FOR VALUES WITH (MODULUS 1, REMAINDER $_);\n"; } print "select sum(size) from pg_stat_syscache";' | psql

   sum
-
 7088523

In this case, set catalog_cache_memory_target to 7MB and catalog_cache_prune_min_age to '1min'. Since the target doesn't work strictly (checked only at every resizing time), possibly you need further tuning.

> that's not quite high enough. I think it might be good to play around
> some more with cases like this, just to get a feeling for how much
> time you can save in exchange for how much memory.
All kinds of tuning are something of that kind, I think.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Protect syscache from bloating with negative cache entries
On Thu, Feb 21, 2019 at 1:38 AM Tsunakawa, Takayuki wrote:
> Why don't we consider this just like the database cache and other DBMS's
> dictionary caches? That is,
>
> * If you want to avoid infinite memory bloat, set the upper limit on size.
>
> * To find a better limit, check the hit ratio with the statistics view (based
> on Horiguchi-san's original 0004 patch, although that needs modification
> anyway)
>
> Why do people try to get away from a familiar idea... Am I missing something?

I don't understand the idea that we would add something to PostgreSQL without proving that it has value. Sure, other systems have somewhat similar systems, and they have knobs to tune them. But, first, we don't know that those other systems made all the right decisions, and second, even if they are, that doesn't mean that we'll derive similar benefits in a system with a completely different code base and many other internal differences.

You need to demonstrate that each and every GUC you propose to add has a real, measurable benefit in some plausible scenario. You can't just argue that other people have something kinda like this so we should have it too. Or, well, you can argue that, but if you do, then -1 from me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
RE: Protect syscache from bloating with negative cache entries
>From: Tsunakawa, Takayuki
>Ideriha-san,
>Could you try simplifying the v15 patch set to see how simple the code would
>look or not? That is:
>
>* 0001: add dlist_push_tail() ... as is
>* 0002: memory accounting, with correction based on feedback
>* 0003: merge the original 0003 and 0005, with correction based on feedback

Attached is a simpler version based on Horiguchi-san's v15 patch, which means the cache is pruned by both time and size. (The cleanup function is still complex, but it gets much simpler.)

Regards,
Takeshi Ideriha

v15-0003-ideriha-Remove-CatCache-Entries.patch
Description: v15-0003-ideriha-Remove-CatCache-Entries.patch

binBLOINvJH0o.bin
Description: v15-0002-ideriha-Memory-consumption-report-reature-of-memorycontext.patch

v15-0001-Add-dlist_move_tail.patch
Description: v15-0001-Add-dlist_move_tail.patch
Re: Protect syscache from bloating with negative cache entries
On Tue, Feb 19, 2019 at 07:08:14AM +, Tsunakawa, Takayuki wrote: > We all have to manage things within resource constraints. The DBA > wants to make sure the server doesn't overuse memory to avoid crash > or slowdown due to swapping. Oracle does it, and another open source > database, MySQL, does it too. PostgreSQL does it with shared_buffers, > wal_buffers, and work_mem (within a single session). Then, I thought > it's natural to do it with catcache/relcache/plancache. I already addressed these questions in an email from Feb 14: https://www.postgresql.org/message-id/20190214154955.gb19...@momjian.us I understand the operational needs of limiting resources in some cases, but there is also the history of OS's using working set to allocate things, which didn't work too well: https://en.wikipedia.org/wiki/Working_set I think we need to address the most pressing problem of unlimited cache size bloat and then take a holistic look at all memory allocation. If we are going to address that in a global way, I don't see the relation cache as the place to start. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
RE: Protect syscache from bloating with negative cache entries
From: Robert Haas [mailto:robertmh...@gmail.com] > That might be enough to justify having the parameter. But I'm not > quite sure how high the value would need to be set to actually get the > benefit in a case like that, or what happens if you set it to a value > that's not quite high enough. I think it might be good to play around > some more with cases like this, just to get a feeling for how much > time you can save in exchange for how much memory. Why don't we consider this just like the database cache and other DBMS's dictionary caches? That is, * If you want to avoid infinite memory bloat, set the upper limit on size. * To find a better limit, check the hit ratio with the statistics view (based on Horiguchi-san's original 0004 patch, although that seems modification anyway) Why do people try to get away from a familiar idea... Am I missing something? Ideriha-san, Could you try simplifying the v15 patch set to see how simple the code would look or not? That is: * 0001: add dlist_push_tail() ... as is * 0002: memory accounting, with correction based on feedback * 0003: merge the original 0003 and 0005, with correction based on feedback Regards Takayuki Tsunakawa
RE: Protect syscache from bloating with negative cache entries
From: Ideriha, Takeshi/出利葉 健
> I checked it with perf record -avg and perf report.
> The following shows top 20 symbols during benchmark including kernel space.
> The main difference between master (unpatched) and patched one seems that
> patched one consumes cpu in catcache-evict-and-refill functions including
> SearchCatCacheMiss(), CatalogCacheCreateEntry(),
> CatCacheCleanupOldEntries().
> So it seems to me that these functions need further inspection
> to suppress the performance decline as much as possible

Thank you. It's good to see the expected functions, rather than strange behavior. The performance drop is natural, just as when the database cache's hit ratio is low. The remedy for performance by the user is also the same as for the database cache -- increase the catalog cache.

Regards
Takayuki Tsunakawa
RE: Protect syscache from bloating with negative cache entries
>From: Tsunakawa, Takayuki
>>From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
>> number of tables | 100         | 1000        | 1
>> ---------------------------------------------------------
>> TPS (master)     | 10966       | 10654       | 9099
>> TPS (patch)      | 11137 (+1%) | 10710 (+0%) | 772 (-91%)
>>
>> It seems that before cache exceeding the limit (no pruning at 100 and
>> 1000), the results are almost same with master but after exceeding the
>> limit (at
>> 1)
>> the decline happens.
>
>How many concurrent clients?

One client (default setting).

>Can you show the perf's call graph sampling profiles of both the unpatched and
>patched version, to confirm that the bottleneck is around catcache eviction
>and refill?

I checked it with perf record -avg and perf report. The following shows the top 20 symbols during the benchmark, including kernel space. The main difference between master (unpatched) and the patched one seems to be that the patched one consumes cpu in catcache-evict-and-refill functions including SearchCatCacheMiss(), CatalogCacheCreateEntry(), CatCacheCleanupOldEntries(). So it seems to me that these functions need further inspection to suppress the performance decline as much as possible.

 (%) master                       | (%) patch
 51.25% cpu_startup_entry         | 51.45% cpu_startup_entry
 51.13% arch_cpu_idle             | 51.19% arch_cpu_idle
 51.13% default_idle              | 51.19% default_idle
 51.13% native_safe_halt          | 50.95% native_safe_halt
 36.27% PostmasterMain            | 46.98% PostmasterMain
 36.27% main                      | 46.98% main
 36.27% __libc_start_main         | 46.98% __libc_start_main
 36.07% ServerLoop                | 46.93% ServerLoop
 35.75% PostgresMain              | 46.89% PostgresMain
 26.03% exec_simple_query         | 45.99% exec_simple_query
 26.00% rest_init                 | 43.40% SearchCatCacheMiss
 26.00% start_kernel              | 42.80% CatalogCacheCreateEntry
 26.00% x86_64_start_reservations | 42.75% CatCacheCleanupOldEntries
 26.00% x86_64_start_kernel       | 27.04% rest_init
 25.26% start_secondary           | 27.04% start_kernel
 10.25% pg_plan_queries           | 27.04% x86_64_start_reservations
 10.17% pg_plan_query             | 27.04% x86_64_start_kernel
 10.16% main                      | 24.42% start_secondary
 10.16% __libc_start_main         | 22.35% pg_analyze_and_rewrite
 10.03% standard_planner          | 22.35% parse_analyze

Regards,
Takeshi Ideriha
Re: Protect syscache from bloating with negative cache entries
On Tue, Feb 19, 2019 at 11:15 PM Kyotaro HORIGUCHI wrote:
> Difference from v15:
>
> Removed AllocSet accounting stuff. We use approximate memory
> size for catcache.
>
> Removed prune-by-number(or size) stuff.
>
> Adressing comments from Tsunakawa-san and Ideriha-san .
>
> Separated catcache monitoring feature. (Removed from this set)
> (But it is crucial to check this feature...)
>
> Is this small enough ?

The commit message in 0002 says 'This also can put a hard limit on the number of catcache entries.' but neither of the GUCs that you've documented have that effect. Is that a leftover from a previous version?

I'd like to see some evidence that catalog_cache_memory_target has any value, vs. just always setting it to zero. I came up with the following somewhat artificial example that shows that it might have value.

rhaas=# create table foo (a int primary key, b text) partition by hash (a);

[rhaas pgsql]$ perl -e 'for (0..) { print "CREATE TABLE foo$_ PARTITION OF foo FOR VALUES WITH (MODULUS 1, REMAINDER $_);\n"; }' | psql

First execution of 'select * from foo' in a brand new session takes about 1.9 seconds; subsequent executions take about 0.7 seconds. So, if catalog_cache_memory_target were set to a high enough value to allow all of that stuff to remain in cache, we could possibly save about 1.2 seconds coming off the blocks after a long idle period. That might be enough to justify having the parameter. But I'm not quite sure how high the value would need to be set to actually get the benefit in a case like that, or what happens if you set it to a value that's not quite high enough. I think it might be good to play around some more with cases like this, just to get a feeling for how much time you can save in exchange for how much memory.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Protect syscache from bloating with negative cache entries
At Thu, 14 Feb 2019 00:40:10 -0800, Andres Freund wrote in <20190214084010.bdn6tmba2j7sz...@alap3.anarazel.de>
> Hi,
>
> On 2019-02-13 15:31:14 +0900, Kyotaro HORIGUCHI wrote:
> > Instead, I added an accounting(?) interface function.
> >
> > | MemoryContextGettConsumption(MemoryContext cxt);
> >
> > The API returns the current consumption in this memory
> > context. This allows "real" memory accounting almost without
> > overhead.
>
> That's definitely *NOT* almost without overhead. This adds additional
> instructions to one postgres' hottest set of codepaths.

I'm not sure how much the two instructions in AllocSetAlloc actually impact, but I agree that it is doubtful that the size-limit feature is worth the possible slowdown to any extent.

# I faintly remember that I tried the same thing before..

> I think you're not working incrementally enough here. I strongly suggest
> solving the negative cache entry problem, and then incrementally go from
> there after that's committed. The likelihood of this patch ever getting
> merged otherwise seems extremely small.

Mmm. Scoping to the negcache problem, my very first patch posted two years ago does that based on invalidation for pg_statistic and pg_class, like I think Tom has suggested somewhere in this thread.

https://www.postgresql.org/message-id/20161219.201505.11562604.horiguchi.kyot...@lab.ntt.co.jp

This is a completely different approach from the current shape and it would be useless after pruning is introduced. So I'd like to go for the generic pruning by age.

Difference from v15:

- Removed AllocSet accounting stuff. We use approximate memory size for catcache.
- Removed prune-by-number (or size) stuff.
- Addressing comments from Tsunakawa-san and Ideriha-san.
- Separated catcache monitoring feature. (Removed from this set) (But it is crucial to check this feature...)

Is this small enough?

regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center

From 191496e02abd4d7b261705e8d2a0ef4aed5827c7 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Thu, 7 Feb 2019 14:56:07 +0900
Subject: [PATCH 1/2] Add dlist_move_tail

We have dlist_push_head/tail and dlist_move_head but not
dlist_move_tail. Add it.
---
 src/include/lib/ilist.h | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index b1a5974ee4..659ab1ac87 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -394,6 +394,25 @@ dlist_move_head(dlist_head *head, dlist_node *node)
 	dlist_check(head);
 }
 
+/*
+ * Move element from its current position in the list to the tail position in
+ * the same list.
+ *
+ * Undefined behaviour if 'node' is not already part of the list.
+ */
+static inline void
+dlist_move_tail(dlist_head *head, dlist_node *node)
+{
+	/* fast path if it's already at the tail */
+	if (head->head.prev == node)
+		return;
+
+	dlist_delete(node);
+	dlist_push_tail(head, node);
+
+	dlist_check(head);
+}
+
 /*
  * Check whether 'node' has a following node.
  * Caution: unreliable if 'node' is not in the list.
--
2.16.3

From 59f53da08abb70398611b33f635b46bda87a7534 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Tue, 16 Oct 2018 13:04:30 +0900
Subject: [PATCH 2/2] Remove entries that haven't been used for a certain time

Catcache entries can be left alone for several reasons. It is not
desirable that they eat up memory. This adds consideration of removal
of entries that haven't been used for a certain time before enlarging
the hash array. This also can put a hard limit on the number of
catcache entries.
---
 doc/src/sgml/config.sgml                      |  40 +
 src/backend/tcop/postgres.c                   |  13 ++
 src/backend/utils/cache/catcache.c            | 243 --
 src/backend/utils/init/globals.c              |   1 +
 src/backend/utils/init/postinit.c             |  11 ++
 src/backend/utils/misc/guc.c                  |  23 +++
 src/backend/utils/misc/postgresql.conf.sample |   2 +
 src/include/miscadmin.h                       |   1 +
 src/include/utils/catcache.h                  |  43 -
 src/include/utils/timeout.h                   |   1 +
 10 files changed, 364 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8bd57f376b..7a93aef659 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1661,6 +1661,46 @@ include_dir 'conf.d'
 
+
+      catalog_cache_prune_min_age (integer)
+
+      catalog_cache_prune_min_age configuration
+      parameter
+
+
+
+       Specifies the minimum amount of unused time in seconds at which a
+       system catalog cache entry is removed. -1 indicates that this feature
+       is disabled at all. The value defaults to 300 seconds (5
+       minutes). The catalog cache entries that are not used for
+       the duration can be removed to prevent it from
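The dlist_move_tail() added by patch 0001 above is the core of the LRU bookkeeping: touching an entry on every cache hit moves it to the tail, so the head end of the list is always the least recently used and pruning can sweep from there. Below is a stand-alone sketch of the same pointer manipulation on a minimal circular doubly linked list; `node`, `push_tail`, and `move_tail` are invented names standing in for PostgreSQL's ilist.h, shown only to illustrate the operation and its fast path.

```c
/* Toy circular doubly linked list mirroring the dlist_move_tail()
 * patch above. The head node is a sentinel; head->next is the least
 * recently used element, head->prev the most recently used. */
#include <assert.h>
#include <stddef.h>

typedef struct node { struct node *prev, *next; } node;

static void list_init(node *head) { head->prev = head->next = head; }

static void push_tail(node *head, node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void move_tail(node *head, node *n)
{
	/* fast path if it's already at the tail, as in the patch */
	if (head->prev == n)
		return;

	n->prev->next = n->next;	/* unlink from current position */
	n->next->prev = n->prev;
	push_tail(head, n);			/* relink as most recently used */
}
```

The fast path matters because in the steady state the most recently touched entry is often touched again immediately, making the move a no-op.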
RE: Protect syscache from bloating with negative cache entries
From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
> number of tables | 100         | 1000        | 1
> ---------------------------------------------------------
> TPS (master)     | 10966       | 10654       | 9099
> TPS (patch)      | 11137 (+1%) | 10710 (+0%) | 772 (-91%)
>
> It seems that before cache exceeding the limit (no pruning at 100 and 1000),
> the results are almost same with master but after exceeding the limit (at
> 1)
> the decline happens.

How many concurrent clients?

Can you show the perf's call graph sampling profiles of both the unpatched and patched version, to confirm that the bottleneck is around catcache eviction and refill?

Regards
Takayuki Tsunakawa
RE: Protect syscache from bloating with negative cache entries
>From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
>But at the same time, I did some benchmark with only hard limit option enabled and
>time-related option disabled, because the figures of this case are not provided in this
>thread. So let me share it.

I'm sorry, but I'm taking back the result about the patch and correcting it. I configured postgresql (master) with only CFLAGS=-O2, but I misconfigured postgres (patch applied) with --enable-cassert --enable-debug --enable-tap-tests 'CFLAGS=-O0'. These debug options (especially --enable-cassert) caused enormous overhead. (I thought I checked the configure options.. I was maybe tired.) So I changed these to only 'CFLAGS=-O2' and re-measured them.

>I did two experiments. One is to show negative cache bloat is suppressed.
>This thread originated from the issue that negative cache of pg_statistics is bloating as
>creating and dropping temp table is repeatedly executed.
>https://www.postgresql.org/message-id/20161219.201505.11562604.horiguchi.kyotaro%40lab.ntt.co.jp
>Using the script attached to the first email in this thread, I repeated create and drop
>temp table at 1 times.
>(experiment is repeated 5 times. catalog_cache_max_size = 500kB.
> compared master branch and patch with hard memory limit)
>
>Here are TPS and CacheMemoryContext 'used' memory (total - freespace) calculated
>by MemoryContextPrintStats() at 100, 1000, 1 times of create-and-drop
>transaction. The result shows cache bloating is suppressed after exceeding the limit
>(at 1) but tps declines regardless of the limit.
>
>number of tx (create and drop)   | 100    | 1000    | 1
>---------------------------------------------------------
>used CacheMemoryContext (master) | 610296 | 2029256 | 15909024
>used CacheMemoryContext (patch)  | 755176 | 880552  | 880592
>---------------------------------------------------------
>TPS (master) | 414 | 407 | 399
>TPS (patch)  | 242 | 225 | 220

Correct one:

number of tx (create and drop) | 100 | 1000 | 1
-----------------------------------------------
TPS (master) | 414 | 407 | 399
TPS (patch)  | 447 | 415 | 409

The results between master and patch are almost the same.
>Another experiment is using Tomas's script posted a while ago. The scenario is to do
>select 1 from multiple tables randomly (uniform distribution).
>(experiment is repeated 5 times. catalog_cache_max_size = 10MB.
> compared master branch and patch with only hard memory limit enabled)
>
>Before doing the benchmark, I checked pruning happened only at 1 tables using a
>debug option. The result shows degradation regardless of before or after pruning.
>I personally still need the hard size limitation, but I'm surprised that the difference is
>so significant.
>
>number of tables | 100   | 1000  | 1
>--------------------------------------
>TPS (master)     | 10966 | 10654 | 9099
>TPS (patch)      | 4491  | 2099  | 378

Correct one:

number of tables | 100         | 1000        | 1
-------------------------------------------------
TPS (master)     | 10966       | 10654       | 9099
TPS (patch)      | 11137 (+1%) | 10710 (+0%) | 772 (-91%)

It seems that before the cache exceeds the limit (no pruning at 100 and 1000), the results are almost the same as master, but after exceeding the limit (at 1) the decline happens.

Regards,
Takeshi Ideriha
Re: Protect syscache from bloating with negative cache entries
On 2/19/19 12:43 AM, Tsunakawa, Takayuki wrote:
> Hi Horiguchi-san,
>
> I've looked through your patches. This is the first part of my review
> results. Let me post the rest after another work today.
>
> BTW, how about merging 0003 and 0005, and separating and deferring 0004 in
> another thread? That may help to relieve other community members by making
> this patch set not so large and complex.
>
> [Bottleneck investigation]
> Ideriha-san and I are trying to find the bottleneck. My first try shows
> there's little overhead. Here's what I did:
>
> shared_buffers = 1GB
> catalog_cache_prune_min_age = -1
> catalog_cache_max_size = 10MB
>
> $ pgbench -i -s 10
> $ pg_ctl stop and then start
> $ cache all data in shared buffers by running pg_prewarm on branches,
> tellers, accounts, and their indexes
> $ pgbench --select-only -c 1 -T 60
>
> master : 8612 tps
> patched: 8553 tps (-0.7%)
>
> There's little (0.7%) performance overhead with:
> * one additional dlist_move_tail() in every catcache access
> * memory usage accounting in operations other than catcache access (relevant
> catcache entries should be cached in the first pgbench transaction)
>
> I'll check other patterns to find out how big overhead there is.

0.7% may easily be just noise, possibly due to differences in layout of the binary. How many runs? What was the variability of the results between runs? What hardware was this tested on?

FWIW I doubt tests with such a small schema prove anything - the cache/lists are likely tiny. That's why I tested with a much larger number of relations.

regards

--
Tomas Vondra
http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: Protect syscache from bloating with negative cache entries
From: 'Bruce Momjian' [mailto:br...@momjian.us]
> I think, in general, smaller is better, as long as making something
> smaller doesn't remove data that is frequently accessed. Having a timer
> to expire only old entries seems like it accomplished this goal.
>
> Having a minimum size and not taking it to zero size makes sense if we
> know we will need certain entries like pg_class in the next query.
> However, if the session is idle for hours, we should just probably
> remove everything, so maybe the minimum doesn't make sense --- just
> remove everything.

That's another interesting idea. A somewhat relevant feature is Oracle's "ALTER SYSTEM FLUSH SHARED_POOL". It flushes all dictionary cache, library cache, and SQL plan entries. The purpose is different: not to release memory, but to defragment the shared memory.

> I don't think other DBMSs are a good model since they have a reputation
> for requiring a lot of tuning --- tuning that we have often automated.

Yeah, I agree that PostgreSQL is easier to use in many aspects. On the other hand, although I hesitate to say this (please don't get upset...), I feel PostgreSQL is a bit too loose about memory usage. To my memory, PostgreSQL has crashed the OS due to OOM in our user environments:

* Creating and dropping temp tables repeatedly in a stored PL/pgSQL function. This results in infinite CacheMemoryContext bloat. This is referred to at the beginning of this mail thread. Oracle and MySQL can limit the size of the dictionary cache.

* Each pair of SAVEPOINT/RELEASE leaves 8KB of CurTransactionContext. The customer used psqlODBC to run a batch app, which ran millions of SQL statements in a transaction. psqlODBC wraps each SQL statement with SAVEPOINT and RELEASE by default. I guess this is what caused the crash of AWS Aurora on last year's Amazon Prime Day.

* Setting a large value for work_mem, and then running many concurrent large queries. Oracle can limit the total size of all sessions' memory with the PGA_AGGREGATE_TARGET parameter.
We all have to manage things within resource constraints. The DBA wants to make sure the server doesn't overuse memory to avoid crash or slowdown due to swapping. Oracle does it, and another open source database, MySQL, does it too. PostgreSQL does it with shared_buffers, wal_buffers, and work_mem (within a single session). Then, I thought it's natural to do it with catcache/relcache/plancache. Regards Takayuki Tsunakawa
RE: Protect syscache from bloating with negative cache entries
Hi Horiguchi-san,

This is the rest of my review comments.

(5) patch 0003

CatcacheClockTimeoutPending = 0;
+
+	/* Update timetamp then set up the next timeout */
+

false is better than 0, to follow other **Pending variables.

timetamp -> timestamp

(6) patch 0003

GetCatCacheClock() is not used now. Why don't we add it when the need arises?

(7) patch 0003

Why don't we remove the catcache timer (Setup/UpdateCatCacheClockTimer), unless we need it by all means? That simplifies the code. Long-running queries can be thought of as follows:

* A single lengthy SQL statement, e.g. SELECT for reporting/analytics, COPY for data loading, and UPDATE/DELETE for batch processing, should only require a small number of catalog entries during its query analysis/planning. It won't suffer from cache eviction during query execution.

* We do not have to evict cache entries while executing a long-running stored procedure, because its constituent SQL statements may access the same tables. If the stored procedure accesses so many tables that you are worried about catcache memory overuse, then catalog_cache_max_size can be used. Another natural idea would be to update the cache clock when SPI executes each SQL statement.

(8) patch 0003

+	uint64	base_size;
+	uint64	base_size = MemoryContextGetConsumption(CacheMemoryContext);

This may as well also be Size, not uint64.

(9) patch 0003

@@ -1940,7 +2208,7 @@ CatCacheFreeKeys(TupleDesc tupdesc, int nkeys, int *attnos, Datum *keys)
 /*
  * Helper routine that copies the keys in the srckeys array into the dstkeys
  * one, guaranteeing that the datums are fully allocated in the current memory
- * context.
+ * context. Returns allocated memory size.
  */
 static void
 CatCacheCopyKeys(TupleDesc tupdesc, int nkeys, int *attnos,
@@ -1976,7 +2244,6 @@ CatCacheCopyKeys(TupleDesc tupdesc, int nkeys, int *attnos,
 				   att->attbyval, att->attlen);
 	}
-
 }

This change seems to be no longer necessary thanks to the memory accounting.
(10) patch 0004

How about separating this into another thread, so that the rest of the patch set becomes easier to review and commit?

Regarding the design, I'm inclined to avoid each backend writing the file. To simplify the code, I think we can take advantage of the fortunate situation -- the number of backends and catcaches is fixed at server startup. My rough sketch is:

* Allocate an array of statistics entries in shared memory, whose element is (pid or backend id, catcache id or name, hits, misses, ...). The number of array elements is MaxBackends * number of catcaches (some dozens).

* Each backend updates its own entry in the shared memory during query execution.

* The stats collector periodically scans the array and writes it to the stats file.

(11) patch 0005

+dlist_head	cc_lru_list = {0};
+Size		global_size = 0;

It is better to put these in CatCacheHeader. That way, backends that do not access the catcache (archiver, stats collector, etc.) do not have to waste memory for these global variables.

(12) patch 0005

+	else if (catalog_cache_max_size > 0 &&
+			 global_size > catalog_cache_max_size * 1024)
		CatCacheCleanupOldEntries(cache);

On the second line, catalog_cache_max_size should be cast to Size to avoid overflow.

(13) patch 0005

+	gettext_noop("Sets the maximum size of catcache in kilobytes."),

catcache -> catalog cache

(14) patch 0005

+	CatCache   *owner;			/* owner catcache */

CatCTup already has a my_cache member.

(15) patch 0005

	if (nremoved > 0)
		elog(DEBUG1, "pruning catalog cache id=%d for %s: removed %d / %d",
			 cp->id, cp->cc_relname, nremoved, nelems_before);

In the prune-by-size case, this elog doesn't print very meaningful data. How about dividing this function into two, one for prune-by-age and another for prune-by-size? I suppose that would make the functions easier to understand.

Regards
Takayuki Tsunakawa
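The shared-memory layout sketched in item (10) can be illustrated with a toy model: one flat array of MaxBackends x number-of-catcaches rows, where each backend writes only its own row, so no locking is needed for updates. Everything below (type and function names, the fixed sizes) is an assumption made for illustration, not actual PostgreSQL code.

```c
/* Toy model of the per-backend, per-catcache statistics array
 * proposed above. In PostgreSQL this array would be allocated in
 * shared memory at startup; here it is a plain static array. */
#include <assert.h>
#include <stddef.h>

#define MAX_BACKENDS  8			/* stand-in for MaxBackends */
#define NUM_CATCACHES 4			/* number of catcaches, fixed at startup */

typedef struct CatCacheStatsEntry
{
	long	hits;
	long	misses;
} CatCacheStatsEntry;

/* One row per (backend, catcache) pair, zero-initialized. */
static CatCacheStatsEntry stats_array[MAX_BACKENDS * NUM_CATCACHES];

/* Each backend indexes straight to its own row; no other backend
 * ever writes it, which is why unsynchronized updates are safe. */
static CatCacheStatsEntry *
my_stats(int backend_id, int cache_id)
{
	return &stats_array[backend_id * NUM_CATCACHES + cache_id];
}
```

A periodic reader (the stats collector in the sketch above) can scan the whole array without coordinating with writers, at the cost of possibly seeing slightly stale counters.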
RE: Protect syscache from bloating with negative cache entries
Hi Horiguchi-san,

I've looked through your patches. This is the first part of my review results. Let me post the rest after another work today.

BTW, how about merging 0003 and 0005, and separating and deferring 0004 in another thread? That may help to relieve other community members by making this patch set not so large and complex.

[Bottleneck investigation]
Ideriha-san and I are trying to find the bottleneck. My first try shows there's little overhead. Here's what I did:

shared_buffers = 1GB
catalog_cache_prune_min_age = -1
catalog_cache_max_size = 10MB

$ pgbench -i -s 10
$ pg_ctl stop and then start
$ cache all data in shared buffers by running pg_prewarm on branches, tellers, accounts, and their indexes
$ pgbench --select-only -c 1 -T 60

master : 8612 tps
patched: 8553 tps (-0.7%)

There's little (0.7%) performance overhead with:
* one additional dlist_move_tail() in every catcache access
* memory usage accounting in operations other than catcache access (relevant catcache entries should be cached in the first pgbench transaction)

I'll check other patterns to find out how big the overhead is.

[Source code review]
Below are my findings on the patch set v15:

(1) patch 0001

All right.

(2) patch 0002

@@ -87,6 +87,7 @@ typedef struct MemoryContextData
 	const char *name;			/* context name (just for debugging) */
 	const char *ident;			/* context ID if any (just for debugging) */
 	MemoryContextCallback *reset_cbs;	/* list of reset/delete callbacks */
+	uint64		consumption;	/* accumulates consumed memory size */
 } MemoryContextData;

Size is more appropriate as a data type than uint64 because other places use Size for memory size variables.

How about "usedspace" instead of "consumption"? Because that aligns better with the naming used for MemoryContextCounters's member variables, totalspace and freespace.
(3) patch 0002

+			context->consumption += chunk_size;

(and similar sites)

The used space should include the size of the context-type-specific chunk header, so that the count is closer to the actual memory size seen by the user.

Here, let's make consensus on what the used space represents. Is it either of the following?

a) The total space allocated from the OS, i.e., the sum of the malloc()ed regions for a given memory context.

b) The total space of all chunks, including their headers, of a given memory context.

a) is better because that's the actual memory usage from the DBA's standpoint. But a) cannot be used because CacheMemoryContext is used for various things. So we have to compromise on b). Is this OK?

One possible future improvement is to use a separate memory context exclusively for the catcache, which is a child of CacheMemoryContext. That way, we can adopt a).

(4) patch 0002

@@ -614,6 +614,9 @@ AllocSetReset(MemoryContext context)
+	set->header.consumption = 0;

This can be put in MemoryContextResetOnly() instead of context-type-specific reset functions.

Regards
Takayuki Tsunakawa
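Mechanically, the accounting being reviewed here amounts to a counter bumped on every allocation and zeroed on reset. The sketch below is a stand-alone simplification: CountingContext, ctx_alloc, and ctx_reset are invented names, and the real 0002 patch instruments AllocSetAlloc/AllocSetReset instead of wrapping malloc. It also counts only the requested chunk size, i.e. it ignores the chunk-header question raised in (3).

```c
/* Stand-alone model of per-context memory accounting: one counter
 * incremented on the allocation hot path, reset to zero with the
 * context. Illustrative only, not PostgreSQL's mmgr code. */
#include <assert.h>
#include <stdlib.h>

typedef struct CountingContext
{
	size_t	consumption;		/* total chunk bytes handed out */
} CountingContext;

static void *
ctx_alloc(CountingContext *ctx, size_t size)
{
	void   *p = malloc(size);

	if (p != NULL)
		ctx->consumption += size;	/* the one extra hot-path addition */
	return p;
}

static void
ctx_reset(CountingContext *ctx)
{
	/* A real reset would also free the chunks; here we model only the
	 * counter behaviour of AllocSetReset in the patch. */
	ctx->consumption = 0;
}
```

Even this minimal form shows why Andres objected upthread: the increment sits on every allocation, so the cost is paid whether or not any size limit is ever enforced.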
Re: Protect syscache from bloating with negative cache entries
On 2019-Feb-15, Tomas Vondra wrote:
> ISTM there's a couple of ways to deal with that:
>
> 1) Ignore the memory amounts entirely, and do just time-based eviction.
>
> 2) If we want some size thresholds (e.g. to disable eviction for
> backends with small caches etc.) use the number of entries instead. I
> don't think that's particularly worse than specifying size in MB.

Why is there a *need* for size-based eviction? Seems that time-based should be sufficient. Is the proposed approach to avoid eviction at all until the size threshold has been reached? I'm not sure I see the point of that.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
RE: Protect syscache from bloating with negative cache entries
From: Tomas Vondra [mailto:tomas.von...@2ndquadrant.com] > I think "catalog_cache_..." is fine. If we end up with a similar > patchfor relcache, we can probably call it "relation_cache_". Agreed, those are not too long or too short, and they are sufficiently descriptive. Regards Takayuki Tsunakawa
Re: Protect syscache from bloating with negative cache entries
On 2/14/19 4:49 PM, 'Bruce Momjian' wrote: > On Thu, Feb 14, 2019 at 01:31:49AM +, Tsunakawa, Takayuki wrote: >> From: Bruce Momjian [mailto:br...@momjian.us] That being said, having a "minimal size" threshold before starting with the time-based eviction may be a good idea. >>> >>> Agreed. I see the minimal size as a way to keep the systems tables >>> in cache, which we know we will need for the next query. >> >> Isn't it the maximum size, not minimal size? Maximum size allows >> to keep desired amount of system tables in memory as well as to >> control memory consumption to avoid out-of-memory errors (OS crash!). >> I'm wondering why people want to take a different approach to >> catcatch, which is unlike other PostgreSQL memory e.g. shared_buffers, >> temp_buffers, SLRU buffers, work_mem, and other DBMSs. > > Well, that is an _excellent_ question, and one I had to think about. > I think we're talking about two different concepts here: 1) minimal size - We don't do any extra eviction at all, until we reach this cache size. So we don't get any extra overhead from it. If a system does not have issues. 2) maximal size - We ensure the cache size is below this threshold. If there's more data, we evict enough entries to get below it. My proposal is essentially to do just (1), so the cache can grow very large if needed but then it shrinks again after a while. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Protect syscache from bloating with negative cache entries
On 2/14/19 3:46 PM, Bruce Momjian wrote:
> On Thu, Feb 14, 2019 at 12:40:10AM -0800, Andres Freund wrote:
>> Hi,
>>
>> On 2019-02-13 15:31:14 +0900, Kyotaro HORIGUCHI wrote:
>>> Instead, I added an accounting(?) interface function.
>>>
>>> | MemoryContextGettConsumption(MemoryContext cxt);
>>>
>>> The API returns the current consumption in this memory
>>> context. This allows "real" memory accounting almost without
>>> overhead.
>>
>> That's definitely *NOT* almost without overhead. This adds additional
>> instructions to one postgres' hottest set of codepaths.
>>
>> I think you're not working incrementally enough here. I strongly suggest
>> solving the negative cache entry problem, and then incrementally go from
>> there after that's committed. The likelihood of this patch ever getting
>> merged otherwise seems extremely small.
>
> Agreed --- the patch is going in the wrong direction.

I recall endless discussions about memory accounting in the "memory-bounded hash-aggregate" patch a couple of years ago, and the overhead was one of the main issues there. So yeah, trying to solve that problem here is likely to kill this patch (or at least significantly delay it).

ISTM there's a couple of ways to deal with that:

1) Ignore the memory amounts entirely, and do just time-based eviction.

2) If we want some size thresholds (e.g. to disable eviction for backends with small caches etc.) use the number of entries instead. I don't think that's particularly worse than specifying size in MB.

regards

--
Tomas Vondra
http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Protect syscache from bloating with negative cache entries
On 2/13/19 1:23 AM, Tsunakawa, Takayuki wrote:
> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
>> I'm at a loss what to call syscache for users. I think it is "catalog cache". The most basic component is called catcache, which is covered by the syscache layer; both of them are not revealed to users, and it is shown to users as "catalog cache".
>>
>> Do "catalog_cache_prune_min_age", "catalog_cache_memory_target", (if it exists) "catalog_cache_entry_limit" and "catalog_cache_prune_ratio" make sense?
>
> PostgreSQL documentation uses "system catalog" in its table of contents, so syscat_cache_xxx would be a bit more familiar? I'm for either catalog_ or syscat_, but what name shall we use for the relation cache? catcache and relcache have different element sizes and possibly different usage patterns, so they may as well have different parameters, just like MySQL does. If we follow that idea, then the name would be relation_cache_xxx. However, from the user's viewpoint, the relation cache is also created from the system catalog, like pg_class and pg_attribute...

I think "catalog_cache_..." is fine. If we end up with a similar patch for relcache, we can probably call it "relation_cache_...". I'd be OK even with "system_catalog_cache_..." - I don't think it's overly long (better to have a longer but descriptive name), and "syscat" just seems like an unnecessary abbreviation.

regards

-- Tomas Vondra http://www.2ndQuadrant.com
RE: Protect syscache from bloating with negative cache entries
From: Ideriha, Takeshi [mailto:ideriha.take...@jp.fujitsu.com]
>> About the new global-size based eviction (2), cache entry creation becomes slow after the total size reaches the limit, since every new entry evicts one or more old (= not-recently-used) entries. Because of not needing knobs for each cache, it becomes far more realistic. So I added documentation of "catalog_cache_max_size" in 0005.
>
> Now I'm also trying to benchmark, which will be posted in another email.

According to recent comments by Andres and Bruce, maybe we should address negative cache bloat step by step, for example by reviewing Tom's patch. But at the same time, I did some benchmarking with only the hard limit option enabled and the time-related options disabled, because the figures for this case are not provided in this thread. So let me share them.

I did two experiments. One is to show that negative cache bloat is suppressed. This thread originated from the issue that the negative cache of pg_statistic bloats as creating and dropping a temp table is repeatedly executed. https://www.postgresql.org/message-id/20161219.201505.11562604.horiguchi.kyotaro%40lab.ntt.co.jp

Using the script attached to the first email in this thread, I repeated creating and dropping a temp table 10,000 times. (The experiment is repeated 5 times; catalog_cache_max_size = 500kB; the master branch is compared with the patch with the hard memory limit.)

Here are TPS and CacheMemoryContext 'used' memory (total - freespace), calculated by MemoryContextPrintStats(), at 100, 1000, and 10,000 create-and-drop transactions. The result shows cache bloating is suppressed after exceeding the limit (at 10,000), but tps declines regardless of the limit.
number of tx (create and drop)   |    100 |    1000 |    10000
---------------------------------+--------+---------+---------
used CacheMemoryContext (master) | 610296 | 2029256 | 15909024
used CacheMemoryContext (patch)  | 755176 |  880552 |   880592
---------------------------------+--------+---------+---------
TPS (master)                     |    414 |     407 |      399
TPS (patch)                      |    242 |     225 |      220

The other experiment uses Tomas's script posted a while ago. The scenario is to run 'select 1' against multiple tables chosen randomly (uniform distribution). (The experiment is repeated 5 times; catalog_cache_max_size = 10MB; the master branch is compared with the patch with only the hard memory limit enabled.)

Before doing the benchmark, I checked, using a debug option, that pruning happens only in the 10,000-table case. The result shows degradation regardless of whether pruning has started. I personally still need the hard size limitation, but I'm surprised that the difference is so significant.

number of tables |   100 |  1000 | 10000
-----------------+-------+-------+------
TPS (master)     | 10966 | 10654 |  9099
TPS (patch)      |  4491 |  2099 |   378

Regards, Takeshi Ideriha
Re: Protect syscache from bloating with negative cache entries
On Thu, Feb 14, 2019 at 01:31:49AM +0000, Tsunakawa, Takayuki wrote:
> From: Bruce Momjian [mailto:br...@momjian.us]
> > > That being said, having a "minimal size" threshold before starting with the time-based eviction may be a good idea.
> >
> > Agreed. I see the minimal size as a way to keep the system tables in cache, which we know we will need for the next query.
>
> Isn't it the maximum size, not minimal size? Maximum size allows us to keep the desired amount of system tables in memory as well as to control memory consumption to avoid out-of-memory errors (OS crash!). I'm wondering why people want to take a different approach to catcache, unlike other PostgreSQL memory, e.g. shared_buffers, temp_buffers, SLRU buffers, work_mem, and other DBMSs.

Well, that is an _excellent_ question, and one I had to think about.

I think, in general, smaller is better, as long as making something smaller doesn't remove data that is frequently accessed. Having a timer to expire only old entries seems like it accomplishes this goal. Having a minimum size and not taking it to zero makes sense if we know we will need certain entries like pg_class in the next query. However, if the session is idle for hours, we should probably just remove everything, so maybe the minimum doesn't make sense --- just remove everything.

As for why we don't do this with everything --- we can't do it with shared_buffers since we can't change its size while the server is running. For work_mem, we assume all the work_mem data is for the current query, and therefore frequently accessed. Also, work_mem is not memory we can just free if it is not used, since it contains intermediate results required by the current query. I think temp_buffers, since it can be resized in the session, actually could use a similar minimizing feature, though that would mean it behaves slightly differently from shared_buffers, and it might not be worth it.
Also, I assume the value of temp_buffers was mostly for use by the current query --- yes, it can be used for cross-query caching, but I am not sure if that is its primary purpose. I thought its goal was to prevent shared_buffers from being populated with temporary per-session buffers. I don't think other DBMSs are a good model since they have a reputation for requiring a lot of tuning --- tuning that we have often automated. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Protect syscache from bloating with negative cache entries
On Thu, Feb 14, 2019 at 12:40:10AM -0800, Andres Freund wrote:
> Hi,
>
> On 2019-02-13 15:31:14 +0900, Kyotaro HORIGUCHI wrote:
> > Instead, I added an accounting(?) interface function.
> >
> > | MemoryContextGettConsumption(MemoryContext cxt);
> >
> > The API returns the current consumption in this memory context. This allows "real" memory accounting almost without overhead.
>
> That's definitely *NOT* almost without overhead. This adds additional instructions to one of postgres' hottest sets of codepaths.
>
> I think you're not working incrementally enough here. I strongly suggest solving the negative cache entry problem, and then incrementally going from there after that's committed. The likelihood of this patch ever getting merged otherwise seems extremely small.

Agreed --- the patch is going in the wrong direction.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
Re: Protect syscache from bloating with negative cache entries
Hi,

On 2019-02-13 15:31:14 +0900, Kyotaro HORIGUCHI wrote:
> Instead, I added an accounting(?) interface function.
>
> | MemoryContextGettConsumption(MemoryContext cxt);
>
> The API returns the current consumption in this memory context. This allows "real" memory accounting almost without overhead.

That's definitely *NOT* almost without overhead. This adds additional instructions to one of postgres' hottest sets of codepaths.

I think you're not working incrementally enough here. I strongly suggest solving the negative cache entry problem, and then incrementally going from there after that's committed. The likelihood of this patch ever getting merged otherwise seems extremely small.

Greetings, Andres Freund
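For illustration, the accounting idea under discussion amounts to something like the following sketch. The mem_ctx_t type and helpers are invented for this example and are not PostgreSQL's MemoryContext API; the point of the objection above is that the counter updates land on every allocation and free.

```c
#include <stdlib.h>
#include <stddef.h>

/*
 * Invented sketch of the accounting idea: a per-context byte counter
 * bumped on every allocation and decremented on every free.  The extra
 * add/subtract sits in the allocator hot path, which is the overhead
 * being objected to.  This is not PostgreSQL's MemoryContext API.
 */
typedef struct mem_ctx_t
{
    size_t consumption;     /* running total of live bytes */
} mem_ctx_t;

static void *
ctx_alloc(mem_ctx_t *cxt, size_t sz)
{
    cxt->consumption += sz;     /* extra work on every "palloc" */
    return malloc(sz);
}

static void
ctx_free(mem_ctx_t *cxt, void *ptr, size_t sz)
{
    cxt->consumption -= sz;     /* ...and on every "pfree" */
    free(ptr);
}

/* Rough analogue of the MemoryContextGettConsumption() call quoted above. */
static size_t
ctx_get_consumption(const mem_ctx_t *cxt)
{
    return cxt->consumption;
}
```

The read side (`ctx_get_consumption`) is indeed almost free; the cost is the pair of updates charged to every allocation and deallocation, however cheap each one looks in isolation.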
RE: Protect syscache from bloating with negative cache entries
From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
>
> (2) Another new patch v15-0005 on top of the previous design of the limit-by-number-of-a-cache feature converts it to a limit-by-size-on-all-caches feature, which I think is what Tsunakawa-san wanted.

Yeah, size looks better to me.

> As far as I can see, no significant degradation is found in usual (as long as pruning doesn't happen) code paths.
>
> About the new global-size based eviction (2), cache entry creation becomes slow after the total size reaches the limit, since every new entry evicts one or more old (= not-recently-used) entries. Because of not needing knobs for each cache, it becomes far more realistic. So I added documentation of "catalog_cache_max_size" in 0005.

Now I'm also trying to benchmark, which will be posted in another email. Here are things I noticed:

[1] compiler warning
catcache.c:109:1: warning: missing braces around initializer [-Wmissing-braces]
 dlist_head cc_lru_list = {0};
 ^
catcache.c:109:1: warning: (near initialization for ‘cc_lru_list.head’) [-Wmissing-braces]

[2] catalog_cache_max_size does not appear in postgresql.conf.sample

[3] The global LRU list and global size can be included in CatCacheHeader, which seems to me a good place because this structure contains global cache information regardless of the kind of CatCache.

[4] When applying the patches with git am, there are several warnings about trailing whitespace in v15-0003.

Regards, Takeshi Ideriha
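Review item [1] comes from initializing a struct whose first member is itself a struct: `= {0}` triggers -Wmissing-braces on some compilers, while fully braced initialization does not. A generic illustration (the types below are invented; PostgreSQL's ilist.h provides a DLIST_STATIC_INIT macro for this purpose, if memory serves):

```c
#include <stddef.h>

/*
 * Generic illustration of review item [1]: a doubly-linked list head
 * whose first member is a struct.  "= {0}" can warn under
 * -Wmissing-braces; nested braces initialize the same memory without
 * the warning.  These types are invented for the example.
 */
typedef struct list_node
{
    struct list_node *prev;
    struct list_node *next;
} list_node;

typedef struct list_head
{
    list_node head;
} list_head;

/* Fully braced: initializes head.prev and head.next explicitly. */
static list_head lru_list = {{NULL, NULL}};
```

Both spellings zero-initialize the object; the nested braces only make the structure of the initializer explicit to the compiler's warning machinery.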
RE: Protect syscache from bloating with negative cache entries
From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> It is too complex, as I was afraid. The indirect calls cause significant degradation. (Anyway, the previous code was bogus in that it passes a CACHELINEALIGN'ed pointer to get_chunk_size..)
>
> Instead, I added an accounting(?) interface function.
>
> | MemoryContextGettConsumption(MemoryContext cxt);
>
> The API returns the current consumption in this memory context. This allows "real" memory accounting almost without overhead.

That looks like a great idea! Actually, I was thinking of using MemoryContextStats() or a new lightweight variant of it to get the used amount, but I was afraid it would be too costly to call in catcache code. You are smarter, and I was just stupid.

> (2) Another new patch v15-0005 on top of the previous design of the limit-by-number-of-a-cache feature converts it to a limit-by-size-on-all-caches feature, which I think is what Tsunakawa-san wanted.

Thank you very, very much! I look forward to reviewing v15. I'll be away from the office tomorrow, so I'd like to review it this weekend or at the beginning of next week. I've confirmed and am sure that 0001 can be committed.

> As far as I can see, no significant degradation is found in usual (as long as pruning doesn't happen) code paths.
>
> About the new global-size based eviction (2), cache entry creation becomes slow after the total size reaches the limit, since every new entry evicts one or more old (= not-recently-used) entries. Because of not needing knobs for each cache, it becomes far more realistic. So I added documentation of "catalog_cache_max_size" in 0005.

Could you show us a comparison of before and after the pruning starts, if you already have it? If you lost the data, I'm OK with seeing the data after the code review.

Regards Takayuki Tsunakawa
RE: Protect syscache from bloating with negative cache entries
From: Bruce Momjian [mailto:br...@momjian.us]
> > That being said, having a "minimal size" threshold before starting with the time-based eviction may be a good idea.
>
> Agreed. I see the minimal size as a way to keep the system tables in cache, which we know we will need for the next query.

Isn't it the maximum size, not minimal size? Maximum size allows us to keep the desired amount of system tables in memory as well as to control memory consumption to avoid out-of-memory errors (OS crash!). I'm wondering why people want to take a different approach to catcache, unlike other PostgreSQL memory, e.g. shared_buffers, temp_buffers, SLRU buffers, work_mem, and other DBMSs.

Regards Takayuki Tsunakawa
Re: Protect syscache from bloating with negative cache entries
On Tue, Feb 12, 2019 at 02:53:40AM +0100, Tomas Vondra wrote:
> Right. But the logic behind the time-based approach is that evicting such entries should not cause any issues exactly because they are accessed infrequently. It might incur some latency when we need them for the first time after the eviction, but IMHO that's acceptable (although I see Andres did not like that).
>
> FWIW we might even evict entries after some time passes since inserting them into the cache - that's what memcached et al. do, IIRC. The logic is that frequently accessed entries will get immediately loaded back (thus keeping the cache hit ratio high). But there are reasons why the other DBs do that - like not having any cache invalidation (unlike us).

Agreed. If this fixes 90% of the issues people will have, and it applies to the 99.9% of users who will never tune this, it is a clear win. If we want to add something that requires tuning later, we can consider it once the non-tuning solution is done.

> That being said, having a "minimal size" threshold before starting with the time-based eviction may be a good idea.

Agreed. I see the minimal size as a way to keep the system tables in cache, which we know we will need for the next query.

-- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com
Re: Protect syscache from bloating with negative cache entries
At Wed, 13 Feb 2019 02:15:42 +0000, "Tsunakawa, Takayuki" wrote in <0A3221C70F24FB45833433255569204D1FB97CF1@G01JPEXMBYT05>
> From: Tomas Vondra [mailto:tomas.von...@2ndquadrant.com]
> > > I didn't consider planning that happens within a function. If 5min is the default for catalog_cache_prune_min_age, 10% of it (30s) seems enough, and gettimeofday() at such intervals wouldn't affect foreground jobs. I'd choose catalog_c_p_m_age/10 rather than a fixed value of 30s, with 1s as the minimum.
> >
> > Actually, I see CatCacheCleanupOldEntries contains this comment:
> >
> > /*
> >  * Calculate the duration from the time of the last access to the
> >  * "current" time. Since catcacheclock is not advanced within a
> >  * transaction, the entries that are accessed within the current
> >  * transaction won't be pruned.
> >  */
> >
> > which I think is pretty much what I've been saying ... But the question is whether we need to do something about it.
>
> Hmm, I'm surprised at the v14 patch about this. I remember that previous patches renewed the cache clock on every statement, and that is correct. If the cache clock is only updated at the beginning of a transaction, the following TODO item would not be solved:
>
> https://wiki.postgresql.org/wiki/Todo
> "Reduce memory use when analyzing many tables in a single command by making catcache and syscache flushable or bounded."

Sorry, it's just a stale comment. In v15 it is... ouch! still left alone. (Actually CatCacheGetStats doesn't perform pruning.) I'll remove it in the next version. It is called in start_xact_command, which is called per statement and provided with the statement timestamp.

> /*
>  * Calculate the duration from the time of the last access to
>  * the "current" time. catcacheclock is updated on a per-statement
>  * basis and additionally updated periodically during a long
>  * running query.
>  */
> TimestampDifference(ct->lastaccess, catcacheclock, _age, );
In v14 and v15, in addition to that, a timer that fires at intervals of catalog_cache_prune_min_age/10 (30s when the parameter is 5min) updates the catcache clock using gettimeofday(), which in turn is the source of the LRU timestamps.

> Also, Tom mentioned pg_dump in this thread (protect syscache...). pg_dump runs in a single transaction, touching all system catalogs. That may result in OOM, and this patch can rescue it.

So all of the problems will be addressed in v14.

regards. -- Kyotaro Horiguchi NTT Open Source Software Center
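The clock scheme described above can be sketched as follows. The names are illustrative, not the patch's: the cached clock is advanced per statement (or by the periodic timer), and entry ages are computed against it without calling gettimeofday() in the lookup hot path.

```c
/*
 * Illustrative sketch of the catcache clock scheme described above: the
 * clock is advanced once per statement (and by a periodic timer during
 * long-running queries), so computing an entry's age never reads the
 * wall clock in the lookup hot path.  Names are invented here.
 */
static long catcacheclock;          /* cached timestamp, in microseconds */

/* Called per statement (or from the timer) with a fresh timestamp. */
static void
update_catcache_clock(long statement_ts)
{
    catcacheclock = statement_ts;
}

/*
 * Age of an entry relative to the cached clock.  Entries touched after
 * the last clock update get a non-positive age, so they are never
 * pruned within the current statement.
 */
static long
catcache_entry_age(long lastaccess)
{
    return catcacheclock - lastaccess;
}

/* The timer interval suggested above: prune_min_age / 10. */
static long
clock_update_interval(long prune_min_age)
{
    return prune_min_age / 10;
}
```

Because the clock only moves between statements (or at timer ticks), a long transaction still sees its own recently touched entries as age zero, which is exactly the behavior the quoted comment describes.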