Re: [HACKERS] Re: Declarative partitioning optimization for large amount of partitions

2017-03-24 Thread Aleksander Alekseev
Hi Teodor,

Thanks a lot for the review!

> > step 1: In pgstat_report_stat() you remove entries from the hash one by one
> > until they are all removed. Isn't it better to hash_destroy/hash_create, or
> > even to let the hash live in a separate memory context and just reset it?

Agree, fixed.
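
For anyone skimming the thread, here is a condensed sketch of that fix as it
appears in the attached patch below (the types, constants, and calls are the
ones the patch uses; the two helper names are made up for the sketch, which is
an illustration, not a substitute for the diff): the lookup hash gets its own
memory context, so it can be dropped wholesale and rebuilt lazily instead of
having its entries removed one by one.

```
/*
 * Condensed sketch of the attached patch: the lookup hash lives in its own
 * memory context, created on demand the first time a lookup is needed.
 */
static HTAB *
create_tabstat_lookup_hash(void)
{
	HASHCTL			ctl;
	MemoryContext	hash_ctx;

	hash_ctx = AllocSetContextCreate(TopMemoryContext,
									 "PGStatLookupHashTableContext",
									 ALLOCSET_DEFAULT_SIZES);

	memset(&ctl, 0, sizeof(ctl));
	ctl.keysize = sizeof(Oid);
	ctl.entrysize = sizeof(TabStatHashEntry);
	ctl.hcxt = hash_ctx;

	return hash_create("pgstat t_id to tsa_entry lookup hash table",
					   TABSTAT_QUANTUM, &ctl,
					   HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
}

static void
drop_tabstat_lookup_hash(void)
{
	/* in pgstat_report_stat(), once the pending stats have been sent */
	hash_destroy(pgStatTabHash);	/* frees the hash together with its context */
	pgStatTabHash = NULL;			/* rebuilt lazily on the next lookup */
}
```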

> > step 1: Again, in pgstat_report_stat(), all-zero entries aren't deleted from
> > the hash although they will be freed from the point of view of pgStatTabList.

Good point! Fixed.

-- 
Best regards,
Aleksander Alekseev
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b704788eb5..4060f241e2 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -161,6 +161,20 @@ typedef struct TabStatusArray
 static TabStatusArray *pgStatTabList = NULL;
 
 /*
+ * pgStatTabHash entry
+ */
+typedef struct TabStatHashEntry
+{
+	Oid t_id;
+	PgStat_TableStatus* tsa_entry;
+} TabStatHashEntry;
+
+/*
+ * Hash table for O(1) t_id -> tsa_entry lookup
+ */
+static HTAB *pgStatTabHash = NULL;
+
+/*
  * Backends store per-function info that's waiting to be sent to the collector
  * in this hash table (indexed by function OID).
  */
@@ -824,6 +838,12 @@ pgstat_report_stat(bool force)
 	}
 
 	/*
+	 * pgStatTabHash is outdated at this point, so we have to clean it.
+	 */
+	hash_destroy(pgStatTabHash);
+	pgStatTabHash = NULL;
+
+	/*
 	 * Send partial messages.  Make sure that any pending xact commit/abort
 	 * gets counted, even if there are no table stats to send.
 	 */
@@ -1668,59 +1688,87 @@ pgstat_initstats(Relation rel)
 }
 
 /*
+ * Make sure pgStatTabList and pgStatTabHash are initialized.
+ */
+static void
+make_sure_stat_tab_initialized()
+{
+	HASHCTL			ctl;
+	MemoryContext	new_ctx;
+
+	if(!pgStatTabList)
+	{
+		/* This is first time procedure is called */
+		pgStatTabList = (TabStatusArray *) MemoryContextAllocZero(TopMemoryContext,
+sizeof(TabStatusArray));
+	}
+
+	if(pgStatTabHash)
+		return;
+
+	/* Hash table was freed or never existed.  */
+
+	new_ctx = AllocSetContextCreate(
+		TopMemoryContext,
+		"PGStatLookupHashTableContext",
+		ALLOCSET_DEFAULT_SIZES);
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(TabStatHashEntry);
+	ctl.hcxt = new_ctx;
+
+	pgStatTabHash = hash_create("pgstat t_id to tsa_entry lookup hash table",
+		TABSTAT_QUANTUM, &ctl, HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+}
+
+/*
  * get_tabstat_entry - find or create a PgStat_TableStatus entry for rel
  */
 static PgStat_TableStatus *
 get_tabstat_entry(Oid rel_id, bool isshared)
 {
+	TabStatHashEntry* hash_entry;
 	PgStat_TableStatus *entry;
 	TabStatusArray *tsa;
-	TabStatusArray *prev_tsa;
-	int			i;
+	bool found;
+
+	make_sure_stat_tab_initialized();
 
 	/*
-	 * Search the already-used tabstat slots for this relation.
+	 * Find an entry or create a new one.
 	 */
-	prev_tsa = NULL;
-	for (tsa = pgStatTabList; tsa != NULL; prev_tsa = tsa, tsa = tsa->tsa_next)
+	hash_entry = hash_search(pgStatTabHash, &rel_id, HASH_ENTER, &found);
+	if(found)
+		return hash_entry->tsa_entry;
+
+	/*
+	 * `hash_entry` was just created and now we have to fill it.
+	 * First make sure there is free space in the last element of pgStatTabList.
+	 */
+	tsa = pgStatTabList;
+	while(tsa->tsa_used == TABSTAT_QUANTUM)
 	{
-		for (i = 0; i < tsa->tsa_used; i++)
+		if(tsa->tsa_next == NULL)
 		{
-			entry = &tsa->tsa_entries[i];
-			if (entry->t_id == rel_id)
-				return entry;
+			tsa->tsa_next = (TabStatusArray *) MemoryContextAllocZero(TopMemoryContext,
+		sizeof(TabStatusArray));
 		}
 
-		if (tsa->tsa_used < TABSTAT_QUANTUM)
-		{
-			/*
-			 * It must not be present, but we found a free slot instead. Fine,
-			 * let's use this one.  We assume the entry was already zeroed,
-			 * either at creation or after last use.
-			 */
-			entry = &tsa->tsa_entries[tsa->tsa_used++];
-			entry->t_id = rel_id;
-			entry->t_shared = isshared;
-			return entry;
-		}
+		tsa = tsa->tsa_next;
 	}
 
 	/*
-	 * We ran out of tabstat slots, so allocate more.  Be sure they're zeroed.
-	 */
-	tsa = (TabStatusArray *) MemoryContextAllocZero(TopMemoryContext,
-	sizeof(TabStatusArray));
-	if (prev_tsa)
-		prev_tsa->tsa_next = tsa;
-	else
-		pgStatTabList = tsa;
-
-	/*
-	 * Use the first entry of the new TabStatusArray.
+	 * Add an entry.
 	 */
 	entry = &tsa->tsa_entries[tsa->tsa_used++];
 	entry->t_id = rel_id;
 	entry->t_shared = isshared;
+
+	/*
+	 * Add a corresponding entry to pgStatTabHash.
+	 */
+	hash_entry->tsa_entry = entry;
 	return entry;
 }
 
@@ -1732,22 +1780,19 @@ get_tabstat_entry(Oid rel_id, bool isshared)
 PgStat_TableStatus *
 find_tabstat_entry(Oid rel_id)
 {
-	PgStat_TableStatus *entry;
-	TabStatusArray *tsa;
-	int			i;
+	TabStatHashEntry* hash_entry;
 
-	for (tsa = pgStatTabList; tsa != NULL; tsa = tsa->tsa_next)
-	{
-		for (i = 0; i < tsa->tsa_used; i++)
-		{
-			entry = &tsa->tsa_entries[i];
-			if (entry->t_id == rel_id)
-				return entry;
-		}
-	}
+	/*
+	 * There are no entries at all.
+	 */
+	if(!pgStatTabHash)
+		

Re: [HACKERS] Re: Declarative partitioning optimization for large amount of partitions

2017-03-24 Thread Teodor Sigaev

Sorry, 1) and 4) are my fault; the comment in hsearch.h says:
 * ... The hash key
 * is expected to be at the start of the caller's hash entry data structure.

Oops, forgot that.
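
In other words, dynahash itself stores the search key at the start of every
entry and hashes and compares it there on each probe, so the key field is
"used" even though the calling code never reads it back. A minimal sketch of
what that convention means for the TabStatHashEntry from the patch
(illustration only, not part of either patch; the helper name is made up):

```
/* The key must be the first field: dynahash copies it there on HASH_ENTER
 * and hashes/compares it there on every lookup. */
typedef struct TabStatHashEntry
{
	Oid			t_id;			/* hash key, must come first */
	PgStat_TableStatus *tsa_entry;	/* payload: pointer into pgStatTabList */
} TabStatHashEntry;

static PgStat_TableStatus *
lookup_tabstat(HTAB *tab_hash, Oid rel_id)
{
	TabStatHashEntry *he;

	he = (TabStatHashEntry *) hash_search(tab_hash, &rel_id, HASH_FIND, NULL);
	return he ? he->tsa_entry : NULL;
}
```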

Teodor Sigaev wrote:

> > things in order I'm attaching the previous patch as well.
>
> Patches look good, but I have a few notes:
>
> 1. step 1: Why do you need TabStatHashEntry at all? TabStatHashEntry.t_id is
> never read, so the hash entry could be just a pointer to PgStat_TableStatus.
>
> 2. step 1: In pgstat_report_stat() you remove entries from the hash one by one
> until they are all removed. Isn't it better to hash_destroy/hash_create, or
> even to let the hash live in a separate memory context and just reset it?
>
> 3. step 1: Again, in pgstat_report_stat(), all-zero entries aren't deleted from
> the hash although they will be freed from the point of view of pgStatTabList.
>
> 4. step 2: The same as 1) regarding SeenRelsEntry->rel_id, but it isn't even
> initialized anywhere.





--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] Re: Declarative partitioning optimization for large amount of partitions

2017-03-24 Thread Teodor Sigaev

> things in order I'm attaching the previous patch as well.


Patches look good, but I have a few notes:

1. step 1: Why do you need TabStatHashEntry at all? TabStatHashEntry.t_id is
never read, so the hash entry could be just a pointer to PgStat_TableStatus.

2. step 1: In pgstat_report_stat() you remove entries from the hash one by one
until they are all removed. Isn't it better to hash_destroy/hash_create, or
even to let the hash live in a separate memory context and just reset it?

3. step 1: Again, in pgstat_report_stat(), all-zero entries aren't deleted from
the hash although they will be freed from the point of view of pgStatTabList.

4. step 2: The same as 1) regarding SeenRelsEntry->rel_id, but it isn't even
initialized anywhere.




--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] Re: Declarative partitioning optimization for large amount of partitions

2017-03-24 Thread Aleksander Alekseev
Also I would like to share some benchmark results.

Before applying any patches (copied from one of the previous messages):

```
latency average = 1153.495 ms
latency stddev = 154.366 ms
tps = 6.061104 (including connections establishing)
tps = 6.061211 (excluding connections establishing)
```

After applying the first patch (copied as well):

```
latency average = 853.862 ms
latency stddev = 112.270 ms
tps = 8.191861 (including connections establishing)
tps = 8.192028 (excluding connections establishing)
```

After applying both patches:

```
latency average = 533.646 ms
latency stddev = 86.311 ms
tps = 13.110328 (including connections establishing)
tps = 13.110608 (excluding connections establishing)
```

Small number of partitions (2 to be exact), no patches:

```
latency average = 0.928 ms
latency stddev = 0.296 ms
tps = 7535.224897 (including connections establishing)
tps = 7536.145457 (excluding connections establishing)
```

Small number of partitions, both patches applied:

```
latency average = 0.915 ms
latency stddev = 0.269 ms
tps = 7638.344922 (including connections establishing)
tps = 7639.164769 (excluding connections establishing)
```

As you can see, these patches don't make things worse in any regard.

Perf shows that both bottlenecks are gone: before [1] and after [2].

[1] http://afiskon.ru/s/00/2008c4ae66_temp.png
[2] http://afiskon.ru/s/a5/fd81628a3a_temp.png

On Fri, Mar 24, 2017 at 03:17:44PM +0300, Aleksander Alekseev wrote:
> Hi Anastasia,
> 
> Thanks a lot for the review!
> 
> As was mentioned above, there is also a bottleneck in the find_all_inheritors
> procedure. It turned out the problem there is similar and could easily be
> fixed as well. The corresponding patch is attached to this message. To keep
> things in order I'm attaching the previous patch as well.

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] Re: Declarative partitioning optimization for large amount of partitions

2017-03-24 Thread Aleksander Alekseev
Hi Anastasia,

Thanks a lot for the review!

As was mentioned above, there is also a bottleneck in the find_all_inheritors
procedure. It turned out the problem there is similar and could easily be
fixed as well. The corresponding patch is attached to this message. To keep
things in order I'm attaching the previous patch as well.
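
The step-2 patch is only partially visible in this archive, so here is a
rough, hypothetical sketch of the pattern it applies, mirroring the pgstat
fix: keep the set of already-visited relation OIDs in a hash table instead of
scanning a List of OIDs on every visit. SeenRelsEntry is the entry type
mentioned in Teodor's review; the helper below is purely illustrative.

```
/* Illustrative only: an O(1) "have we already visited this rel?" check
 * built on dynahash, replacing a linear scan over a List of OIDs. */
typedef struct SeenRelsEntry
{
	Oid			rel_id;			/* hash key, must come first */
} SeenRelsEntry;

static bool
already_seen(HTAB *seen_rels, Oid rel_id)
{
	bool		found;

	/* HASH_ENTER both tests for membership and records the OID */
	hash_search(seen_rels, &rel_id, HASH_ENTER, &found);
	return found;
}
```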

On Wed, Mar 22, 2017 at 11:53:45AM +, Anastasia Lubennikova wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   tested, passed
> Documentation:        not tested
> 
> The patch looks good to me.
> It applies cleanly, passes all the tests, and eliminates the bottleneck
> described in the first message.
> And as I can see from the thread, there were no objections from others,
> except a few minor comments about code style, which are fixed in the last 
> version of the patch.
> So I mark it "Ready for committer".
> 
> The new status of this patch is: Ready for Committer
> 

-- 
Best regards,
Aleksander Alekseev
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 7cacb1e9b2..a22a5a37c8 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -161,6 +161,20 @@ typedef struct TabStatusArray
 static TabStatusArray *pgStatTabList = NULL;
 
 /*
+ * pgStatTabHash entry
+ */
+typedef struct TabStatHashEntry
+{
+	Oid t_id;
+	PgStat_TableStatus* tsa_entry;
+} TabStatHashEntry;
+
+/*
+ * Hash table for O(1) t_id -> tsa_entry lookup
+ */
+static HTAB *pgStatTabHash = NULL;
+
+/*
  * Backends store per-function info that's waiting to be sent to the collector
  * in this hash table (indexed by function OID).
  */
@@ -815,7 +829,13 @@ pgstat_report_stat(bool force)
 				pgstat_send_tabstat(this_msg);
 				this_msg->m_nentries = 0;
 			}
+
+			/*
+			 * Entry will be zeroed soon so we need to remove it from the lookup table.
+			 */
+			hash_search(pgStatTabHash, &entry->t_id, HASH_REMOVE, NULL);
 		}
+
 		/* zero out TableStatus structs after use */
 		MemSet(tsa->tsa_entries, 0,
 			   tsa->tsa_used * sizeof(PgStat_TableStatus));
@@ -1667,59 +1687,77 @@ pgstat_initstats(Relation rel)
 }
 
 /*
+ * Make sure pgStatTabList and pgStatTabHash are initialized.
+ */
+static void
+make_sure_stat_tab_initialized()
+{
+	HASHCTL ctl;
+
+	if (pgStatTabList)
+		return;
+
+	Assert(!pgStatTabHash);
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(TabStatHashEntry);
+	ctl.hcxt = TopMemoryContext;
+
+	pgStatTabHash = hash_create("pgstat t_id to tsa_entry lookup table",
+		TABSTAT_QUANTUM, &ctl, HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+	pgStatTabList = (TabStatusArray *) MemoryContextAllocZero(TopMemoryContext,
+sizeof(TabStatusArray));
+}
+
+/*
  * get_tabstat_entry - find or create a PgStat_TableStatus entry for rel
  */
 static PgStat_TableStatus *
 get_tabstat_entry(Oid rel_id, bool isshared)
 {
+	TabStatHashEntry* hash_entry;
 	PgStat_TableStatus *entry;
 	TabStatusArray *tsa;
-	TabStatusArray *prev_tsa;
-	int			i;
+	bool found;
+
+	make_sure_stat_tab_initialized();
+
+	/*
+	 * Find an entry or create a new one.
+	 */
+	hash_entry = hash_search(pgStatTabHash, &rel_id, HASH_ENTER, &found);
+	if(found)
+		return hash_entry->tsa_entry;
 
 	/*
-	 * Search the already-used tabstat slots for this relation.
+	 * `hash_entry` was just created and now we have to fill it.
+	 * First make sure there is free space in the last element of pgStatTabList.
 	 */
-	prev_tsa = NULL;
-	for (tsa = pgStatTabList; tsa != NULL; prev_tsa = tsa, tsa = tsa->tsa_next)
+	tsa = pgStatTabList;
+	while(tsa->tsa_used == TABSTAT_QUANTUM)
 	{
-		for (i = 0; i < tsa->tsa_used; i++)
+		if(tsa->tsa_next == NULL)
 		{
-			entry = &tsa->tsa_entries[i];
-			if (entry->t_id == rel_id)
-				return entry;
+			tsa->tsa_next = (TabStatusArray *) MemoryContextAllocZero(TopMemoryContext,
+		sizeof(TabStatusArray));
 		}
 
-		if (tsa->tsa_used < TABSTAT_QUANTUM)
-		{
-			/*
-			 * It must not be present, but we found a free slot instead. Fine,
-			 * let's use this one.  We assume the entry was already zeroed,
-			 * either at creation or after last use.
-			 */
-			entry = &tsa->tsa_entries[tsa->tsa_used++];
-			entry->t_id = rel_id;
-			entry->t_shared = isshared;
-			return entry;
-		}
+		tsa = tsa->tsa_next;
 	}
 
 	/*
-	 * We ran out of tabstat slots, so allocate more.  Be sure they're zeroed.
-	 */
-	tsa = (TabStatusArray *) MemoryContextAllocZero(TopMemoryContext,
-	sizeof(TabStatusArray));
-	if (prev_tsa)
-		prev_tsa->tsa_next = tsa;
-	else
-		pgStatTabList = tsa;
-
-	/*
-	 * Use the first entry of the new TabStatusArray.
+	 * Add an entry.
 	 */
 	entry = &tsa->tsa_entries[tsa->tsa_used++];
 	entry->t_id = rel_id;
 	entry->t_shared = isshared;
+

[HACKERS] Re: Declarative partitioning optimization for large amount of partitions

2017-03-22 Thread Anastasia Lubennikova
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:        not tested

The patch looks good to me.
It applies cleanly, passes all the tests, and eliminates the bottleneck
described in the first message.
And as I can see from the thread, there were no objections from others,
except a few minor comments about code style, which are fixed in the last 
version of the patch.
So I mark it "Ready for committer".

The new status of this patch is: Ready for Committer



[HACKERS] Re: Declarative partitioning optimization for large amount of partitions

2017-03-21 Thread David Steele

On 3/10/17 7:16 AM, Aleksander Alekseev wrote:


> Thanks a lot for the review!


Anyone want to take a crack at reviewing this new version?

Thanks,
--
-David
da...@pgmasters.net

