Re: use ARM intrinsics in pg_lfind32() where available

2022-08-29 Thread John Naylor
On Mon, Aug 29, 2022 at 3:19 PM John Naylor
 wrote:
>
> It turns out MSVC animal drongo doesn't like this cast -- on x86 they
> are the same underlying type. Will look into that as more results come
> in.

Here's the simplest fix I can think of:

/*
 * Exactly like vector8_is_highbit_set except for the input type, so it
 * still looks at each _byte_ separately.
 *
 * XXX x86 uses the same underlying type for vectors with 8-bit, 16-bit,
 * and 32-bit integer elements, but Arm does not, hence the need for a
 * separate function.  We could instead adopt the behavior of Arm's
 * vmaxvq_u32(), i.e. check each 32-bit element, but that would require
 * an additional mask operation on x86.
 */
static inline bool
vector32_is_highbit_set(const Vector32 v)
{
#if defined(USE_NEON)
return vector8_is_highbit_set((Vector8) v);
#else
return vector8_is_highbit_set(v);
#endif
}

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: windows cfbot failing: my_perl

2022-08-26 Thread John Naylor
On Fri, Aug 26, 2022 at 9:27 PM Andres Freund  wrote:
>
> Because perl, extremely unhelpfully, #defines free. Which, not surprisingly,
> causes issues when including system headers referencing free as well.
>
> I don't really see a good solution to this other than hoisting the
> mb/pg_wchar.h include out to before we include all the perl stuff. That does
> fix the issue.

We could also move is_valid_ascii somewhere else. It's only
tangentially related to "wide chars" anyway.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Insertion Sort Improvements

2022-08-26 Thread John Naylor
On Thu, Aug 25, 2022 at 5:55 PM Benjamin Coutu  wrote:
>
> Hello,
>
> Inspired by the recent discussions[1][2] around sort improvements, I took a 
> look around the code and noticed the use of a somewhat naive version of 
> insertion sort within the broader quicksort code.
>
> The current implementation (see sort_template.h) is practically the textbook 
> version of insertion sort:

Right. I think it's worth looking into. When I tested dual-pivot
quicksort, a threshold of 18 seemed best for my inputs, so making
insertion sort more efficient could tilt the balance more in favor of
dual-pivot. (It already seems slightly ahead, but as I mentioned in
the thread you linked, removing branches for null handling would make
it more compelling).

> DO_ASSIGN and DO_COPY macros would have to be declared analogue to DO_SWAP 
> via the template.

I don't think you need these macros, since this optimization is only
convenient if you know the type at compile time. See the attached,
which I had laying around when I was looking at PDQuicksort. I believe
it's similar to what you have in mind. (Ignore the part about
"unguarded", it's irrelevant out of context). Notice that it avoids
unnecessary copies, but has two calls to DO_COMPARE, which is *not*
small for tuples.

> Anyways, there might be some low hanging fruit here. If it turns out to be 
> significantly faster one might even consider increasing the insertion sort 
> threshold from < 7 to < 10 sort elements. But that is a whole other 
> discussion for another day.

I think it belongs around 10 now anyway. I also don't think you'll see
much benefit with your proposal at a threshold of 7 -- I suspect it'd
be more enlightening to test a range of thresholds with and without
the patch, to see how the inflection point shifts. That worked pretty
well when testing dual-pivot.

-- 
John Naylor
EDB: http://www.enterprisedb.com
From fd4bfdbee88478fac32838712c112d91f73c5db5 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Wed, 4 May 2022 23:35:06 +0700
Subject: [PATCH v3 8/8] Use unguarded insertion sort where possible

Since we recurse to the partitions in order, guarding is only
needed at the very beginning of the input. It's not yet clear
if this is a performance win for sort keys with complex
comparators.
---
 src/include/lib/sort_template.h| 58 +-
 src/test/regress/expected/geometry.out |  2 +-
 2 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/src/include/lib/sort_template.h b/src/include/lib/sort_template.h
index 3b3c4bc128..ae76fd39e8 100644
--- a/src/include/lib/sort_template.h
+++ b/src/include/lib/sort_template.h
@@ -79,7 +79,6 @@
  *		https://github.com/orlp/pdqsort
  *
  * WIP: Not yet implemented:
- *  - use unguarded insertion sort where possible
  *  - fall back to heap sort to guard the stack
  *
  * Other techniques used in PDQuicksort, but likely not worth adopting:
@@ -174,6 +173,7 @@ ST_SCOPE void ST_SORT(ST_ELEMENT_TYPE * first, size_t n
 /* sort private helper functions */
 #define ST_INSERTION_SORT ST_MAKE_NAME(ST_SORT, insertion_sort)
 #define ST_INSERTION_SORT_PARTIAL ST_MAKE_NAME(ST_SORT, insertion_sort_partial)
+#define ST_INSERTION_SORT_UNGUARDED ST_MAKE_NAME(ST_SORT, insertion_sort_unguarded)
 #define ST_PARTITION_LEFT ST_MAKE_NAME(ST_SORT, partition_left)
 #define ST_PARTITION_RIGHT ST_MAKE_NAME(ST_SORT, partition_right)
 #define ST_SORT_INTERNAL ST_MAKE_NAME(ST_SORT, internal)
@@ -213,6 +213,12 @@ ST_SCOPE void ST_SORT(ST_ELEMENT_TYPE * first, size_t n
 			ST_SORT_INVOKE_COMPARE		\
 			ST_SORT_INVOKE_ARG)
 
+#define DO_INSERTION_SORT_UNGUARDED(begin_, end_)		\
+	ST_INSERTION_SORT_UNGUARDED((begin_), (end_)		\
+			ST_SORT_INVOKE_ELEMENT_SIZE	\
+			ST_SORT_INVOKE_COMPARE		\
+			ST_SORT_INVOKE_ARG)
+
 #define DO_PARTITION_LEFT(begin_, end_)	\
 	ST_PARTITION_LEFT((begin_), (end_)	\
 			ST_SORT_INVOKE_ELEMENT_SIZE	\
@@ -402,6 +408,48 @@ ST_INSERTION_SORT_PARTIAL(ST_POINTER_TYPE * begin,
 	return true;
 }
 
+static inline void
+ST_INSERTION_SORT_UNGUARDED(ST_POINTER_TYPE * begin,
+  ST_POINTER_TYPE * end
+  ST_SORT_PROTO_ELEMENT_SIZE
+  ST_SORT_PROTO_COMPARE
+  ST_SORT_PROTO_ARG)
+{
+	ST_POINTER_TYPE *cur;
+	ST_POINTER_TYPE *sift;
+	ST_POINTER_TYPE *sift_1;
+
+	if (begin == end)
+		return;
+
+	for (cur = begin + ST_POINTER_STEP; cur < end; cur += ST_POINTER_STEP)
+	{
+#ifndef ST_ELEMENT_TYPE_VOID
+		sift = cur;
+		sift_1 = cur - 1;
+
+		if (DO_COMPARE(sift_1, sift) > 0)
+		{
+			ST_ELEMENT_TYPE tmp;
+			tmp = *sift;
+
+			do
+			{
+				*sift-- = *sift_1;
+				sift_1--; /* Avoid multiple evaluation in DO_COMPARE */
+			} while (DO_COMPARE(sift_1, &tmp) > 0);
+
+			*sift = tmp;
+		}
+#else
+		for (sift = cur, sift_1 = cur - ST_POINTER_STEP;
+			 DO_COMPARE(sift_1, sift) > 0;
+			 sift -= ST_POINTER_STEP, sift_1 -= ST_POINTER_STEP)
+			DO_SWAP(sift, sift_1);
+#endif

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-29 Thread John Naylor
On Mon, Aug 29, 2022 at 12:44 PM Nathan Bossart
 wrote:
> [v6]

Pushed with a couple comment adjustments, let's see what the build
farm thinks...

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Insertion Sort Improvements

2022-08-28 Thread John Naylor
On Fri, Aug 26, 2022 at 9:06 PM Benjamin Coutu  wrote:
>
> Another idea could be to run a binary insertion sort and use a much higher 
> threshold. This could significantly cut down on comparisons (especially with 
> presorted runs, which are quite common in real workloads).

Comparisons that must go to the full tuple are expensive enough that
this idea might have merit in some cases, but that would be a research
project.

> If full binary search turned out to be an issue regarding cache locality, we 
> could do it in smaller chunks,

The main issue with binary search is poor branch prediction. Also, if
large chunks are bad for cache locality, isn't that a strike against
using a "much higher threshold"?

> With less comparisons we should start keeping track of swaps and use that as 
> an efficient way to determine presortedness. We could change the insertion 
> sort threshold to a swap threshold and do insertion sort until we hit the 
> swap threshold. I suspect that would make the current presorted check 
> obsolete (as binary insertion sort without or even with a few swaps should be 
> faster than the current presorted-check).

The thread you linked to discusses partial insertion sort as a
replacement for the pre-sorted check, along with benchmark results and
graphs IIRC. I think it's possibly worth doing, but needs more
investigation to make sure the (few) regressions I saw either: 1. were
just noise or 2. can be ameliorated. As I said in the dual pivot
thread, this would be great for dual pivot since we could reuse
partial insertion sort for choosing the pivots, reducing binary size.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: use ARM intrinsics in pg_lfind32() where available

2022-08-28 Thread John Naylor
On Sun, Aug 28, 2022 at 10:58 AM Nathan Bossart
 wrote:
>
> Here is a new patch set in which I've attempted to address all feedback.

Looks in pretty good shape. Some more comments:

+ uint32 nelem_per_vector = sizeof(Vector32) / sizeof(uint32);
+ uint32 nelem_per_iteration = 4 * nelem_per_vector;

Using local #defines would be my style. I don't have a reason to
object to this way, but adding const makes these vars more clear.
Speaking of const:

- const __m128i tmp1 = _mm_or_si128(result1, result2);
- const __m128i tmp2 = _mm_or_si128(result3, result4);
- const __m128i result = _mm_or_si128(tmp1, tmp2);
+ tmp1 = vector32_or(result1, result2);
+ tmp2 = vector32_or(result3, result4);
+ result = vector32_or(tmp1, tmp2);

Any reason to throw away the const declarations?

+static inline bool
+vector32_is_highbit_set(const Vector32 v)
+{
+#ifdef USE_SSE2
+ return (_mm_movemask_epi8(v) & 0x) != 0;
+#endif
+}

I'm not sure why we need this function -- AFAICS it just adds more
work on x86 for zero benefit. For our present application, can we just
cast to Vector8 (for Arm's sake) and call the 8-bit version?

Aside from that, I plan on rewriting some comments for commit, some of
which pre-date this patch:

- * operations using bitwise operations on unsigned integers.
+ * operations using bitwise operations on unsigned integers.  Note that many
+ * of the functions in this file presently do not have non-SIMD
+ * implementations.

It's unclear to the reader whether this is simply a matter of not
having gotten around to it yet.
I'd like to document what I asserted in this thread, that it's likely
not worthwhile to do anything with a uint64 representing two 32-bit
ints. (It *is* demonstrably worth it for handling 8 byte-values at a
time)

  * Use saturating subtraction to find bytes <= c, which will present as
- * NUL bytes in 'sub'.
+ * NUL bytes.

I'd like to point out that the reason to do it this way is to work
around SIMD architectures' frequent lack of unsigned comparison.

+ * Return the result of subtracting the respective elements of the input
+ * vectors using saturation.

I wonder if we should explain briefly what saturating arithmetic is. I
had never encountered it outside of a SIMD programming context.

--
John Naylor
EDB: http://www.enterprisedb.com




Re: use ARM intrinsics in pg_lfind32() where available

2022-08-29 Thread John Naylor
On Mon, Aug 29, 2022 at 11:25 AM John Naylor
 wrote:
> +static inline bool
> +vector32_is_highbit_set(const Vector32 v)
> +{
> +#ifdef USE_SSE2
> + return (_mm_movemask_epi8(v) & 0x) != 0;
> +#endif
> +}
>
> I'm not sure why we need this function -- AFAICS it just adds more
> work on x86 for zero benefit. For our present application, can we just
> cast to Vector8 (for Arm's sake) and call the 8-bit version?

It turns out MSVC animal drongo doesn't like this cast -- on x86 they
are the same underlying type. Will look into that as more results come
in.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: use ARM intrinsics in pg_lfind32() where available

2022-08-29 Thread John Naylor
On Mon, Aug 29, 2022 at 4:28 PM John Naylor
 wrote:
>
> Here's the simplest fix I can think of:
>
> /*
>  * Exactly like vector8_is_highbit_set except for the input type, so it
>  * still looks at each _byte_ separately.
>  *
>  * XXX x86 uses the same underlying type for vectors with 8-bit, 16-bit,
>  * and 32-bit integer elements, but Arm does not, hence the need for a
>  * separate function.  We could instead adopt the behavior of Arm's
>  * vmaxvq_u32(), i.e. check each 32-bit element, but that would require
>  * an additional mask operation on x86.
>  */
> static inline bool
> vector32_is_highbit_set(const Vector32 v)
> {
> #if defined(USE_NEON)
> return vector8_is_highbit_set((Vector8) v);
> #else
> return vector8_is_highbit_set(v);
> #endif
> }

Bowerbird just reported the same error, so I went ahead and pushed a
fix with this.

--
John Naylor
EDB: http://www.enterprisedb.com




Re: windows cfbot failing: my_perl

2022-08-26 Thread John Naylor
On Sat, Aug 27, 2022 at 4:15 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-08-26 17:05:52 -0400, Andrew Dunstan wrote:
> > On 2022-08-26 Fr 10:47, Andres Freund wrote:
> > > Given the crazy defines of stuff like free, it seems like a good idea to 
> > > have
> > > a rule that no headers should be included after plperl.h with
> > > PG_NEED_PERL_XSUB_H defined.  It's not like there's not other chances of 
> > > of
> > > pulling in malloc.h from within pg_wchar.h somehow.
> > >
> > > It's a bit ugly to have the mb/pg_wchar.h in plperl.h instead of
> > > plperl_helpers.h, but ...
>
> > It's already included directly in plperl.c, so couldn't we just lift it
> > directly into SPI.xs and Util.xs?
>
> I think it'd also be needed in hstore_plperl.c, jsonb_plperl.c. Putting the
> include in plperl.h would keep that aspect transparent, because plperl_utils.h
> includes plperl.h.

Since plperl_helpers.h already includes plperl.h, I'm not sure why
both are included everywhere the former is. If .c/.xs files didn't
include plperl.h directly, we could keep pg_wchar.h in
plperl_helpers.h. Not sure if that's workable or any better...

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: windows cfbot failing: my_perl

2022-08-26 Thread John Naylor
On Sat, Aug 27, 2022 at 10:02 AM Tom Lane  wrote:
>
> John Naylor  writes:
> > On Sat, Aug 27, 2022 at 4:15 AM Andres Freund  wrote:
> >> I think it'd also be needed in hstore_plperl.c, jsonb_plperl.c. Putting the
> >> include in plperl.h would keep that aspect transparent, because 
> >> plperl_utils.h
> >> includes plperl.h.
>
> > Since plperl_helpers.h already includes plperl.h, I'm not sure why
> > both are included everywhere the former is. If .c/.xs files didn't
> > include plperl.h directly, we could keep pg_wchar.h in
> > plperl_helpers.h. Not sure if that's workable or any better...
>
> Maybe we should flush the separate plperl_helpers.h header and just
> put those static-inline functions in plperl.h.

Here's a patch with that idea, not tested on Windows yet.

-- 
John Naylor
EDB: http://www.enterprisedb.com
From b9fba2229b064ab3d7971917cf9bfc1f95bc2d6f Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Sat, 27 Aug 2022 11:17:36 +0700
Subject: [PATCH v1] Be more careful to avoid including system headers after
 perl.h

Commit 121d2d3d70 included simd.h into pg_wchar.h, which led to perl.h's
free() being redefined, at least on Windows. To fix, move the static
inline function definitions from plperl_helpers.h, into plperl.h, where
we already document the necessary inclusion order. Since those functions
were the only reason for the existence of plperl_helpers.h, remove it.

First reported by Justin Pryzby
Diagnosis by Andres Freund, patch by myself per suggestion from Tom Lane
Discussion: https://www.postgresql.org/message-id/20220826115546.GE2342%40telsasoft.com
---
 contrib/hstore_plperl/hstore_plperl.c |   1 -
 contrib/jsonb_plperl/jsonb_plperl.c   |   1 -
 src/pl/plperl/GNUmakefile |   4 +-
 src/pl/plperl/SPI.xs  |   1 -
 src/pl/plperl/Util.xs |   1 -
 src/pl/plperl/plperl.c|   2 -
 src/pl/plperl/plperl.h| 170 -
 src/pl/plperl/plperl_helpers.h| 171 --
 8 files changed, 171 insertions(+), 180 deletions(-)
 delete mode 100644 src/pl/plperl/plperl_helpers.h

diff --git a/contrib/hstore_plperl/hstore_plperl.c b/contrib/hstore_plperl/hstore_plperl.c
index c72785d99e..4a1629cad5 100644
--- a/contrib/hstore_plperl/hstore_plperl.c
+++ b/contrib/hstore_plperl/hstore_plperl.c
@@ -3,7 +3,6 @@
 #include "fmgr.h"
 #include "hstore/hstore.h"
 #include "plperl.h"
-#include "plperl_helpers.h"
 
 PG_MODULE_MAGIC;
 
diff --git a/contrib/jsonb_plperl/jsonb_plperl.c b/contrib/jsonb_plperl/jsonb_plperl.c
index 22e90afe1b..2af1e0c02a 100644
--- a/contrib/jsonb_plperl/jsonb_plperl.c
+++ b/contrib/jsonb_plperl/jsonb_plperl.c
@@ -4,7 +4,6 @@
 
 #include "fmgr.h"
 #include "plperl.h"
-#include "plperl_helpers.h"
 #include "utils/fmgrprotos.h"
 #include "utils/jsonb.h"
 
diff --git a/src/pl/plperl/GNUmakefile b/src/pl/plperl/GNUmakefile
index a2e6410f53..1ebf3c9ba2 100644
--- a/src/pl/plperl/GNUmakefile
+++ b/src/pl/plperl/GNUmakefile
@@ -72,7 +72,7 @@ XSUBPPDIR = $(shell $(PERL) -e 'use List::Util qw(first); print first { -r "$$_/
 
 include $(top_srcdir)/src/Makefile.shlib
 
-plperl.o: perlchunks.h plperl_opmask.h plperl_helpers.h
+plperl.o: perlchunks.h plperl_opmask.h
 
 plperl_opmask.h: plperl_opmask.pl
 	@if [ x"$(perl_privlibexp)" = x"" ]; then echo "configure switch --with-perl was not specified."; exit 1; fi
@@ -103,7 +103,7 @@ uninstall: uninstall-lib uninstall-data
 
 install-data: installdirs
 	$(INSTALL_DATA) $(addprefix $(srcdir)/, $(DATA)) '$(DESTDIR)$(datadir)/extension/'
-	$(INSTALL_DATA) $(srcdir)/plperl.h $(srcdir)/ppport.h $(srcdir)/plperl_helpers.h '$(DESTDIR)$(includedir_server)'
+	$(INSTALL_DATA) $(srcdir)/plperl.h $(srcdir)/ppport.h '$(DESTDIR)$(includedir_server)'
 
 uninstall-data:
 	rm -f $(addprefix '$(DESTDIR)$(datadir)/extension'/, $(notdir $(DATA)))
diff --git a/src/pl/plperl/SPI.xs b/src/pl/plperl/SPI.xs
index b2db3bd694..e81432e634 100644
--- a/src/pl/plperl/SPI.xs
+++ b/src/pl/plperl/SPI.xs
@@ -13,7 +13,6 @@
 /* perl stuff */
 #define PG_NEED_PERL_XSUB_H
 #include "plperl.h"
-#include "plperl_helpers.h"
 
 
 MODULE = PostgreSQL::InServer::SPI PREFIX = spi_
diff --git a/src/pl/plperl/Util.xs b/src/pl/plperl/Util.xs
index 47eba59415..bb4580ebfa 100644
--- a/src/pl/plperl/Util.xs
+++ b/src/pl/plperl/Util.xs
@@ -20,7 +20,6 @@
 /* perl stuff */
 #define PG_NEED_PERL_XSUB_H
 #include "plperl.h"
-#include "plperl_helpers.h"
 
 
 static text *
diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index af354a68cc..5d192a0ce5 100644
--- a/src/pl/plperl/plperl.c
+++ b/src/pl/plperl/plperl.c
@@ -23,7 +23,6 @@
 #include "commands/trigger.h"
 #include "executor/spi.h"
#include "funcapi.h"

Re: use ARM intrinsics in pg_lfind32() where available

2022-08-27 Thread John Naylor
On Sat, Aug 27, 2022 at 1:24 AM Nathan Bossart  wrote:
>
> Here is a rebased patch set that applies to HEAD.

0001:

 #define USE_NO_SIMD
 typedef uint64 Vector8;
+typedef uint64 Vector32;
 #endif

I don't foresee any use of emulating vector registers with uint64 if
they only hold two ints. I wonder if it'd be better if all vector32
functions were guarded with #ifndef USE_NO_SIMD. (I wonder if
declarations without definitions cause warnings...)

+ * NB: This function assumes that each lane in the given vector either has all
+ * bits set or all bits zeroed, as it is mainly intended for use with
+ * operations that produce such vectors (e.g., vector32_eq()).  If this
+ * assumption is not true, this function's behavior is undefined.
+ */

Hmm?

Also, is_highbit_set() already uses the same intrinsic and has the
same intended effect, since we only care about the boolean result.

0002:

-#elif defined(USE_SSE2)
+#elif defined(USE_SSE2) || defined(USE_NEON)

I think we can just say #else.

-#if defined(USE_SSE2)
- __m128i sub;
+#ifndef USE_NO_SIMD
+ Vector8 sub;

+#elif defined(USE_NEON)
+
+ /* use the same approach as the USE_SSE2 block above */
+ sub = vqsubq_u8(v, vector8_broadcast(c));
+ result = vector8_has_zero(sub);

I think we should invent a helper that does saturating subtraction and
call that, inlining the sub var so we don't need to mess with it
further.

Otherwise seems fine.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: windows cfbot failing: my_perl

2022-08-26 Thread John Naylor
On Sat, Aug 27, 2022 at 11:20 AM John Naylor
 wrote:
>
> Here's a patch with that idea, not tested on Windows yet.

Update: I tried taking the CI for a spin, but ran into IT issues with
Github when I tried to push my branch to remote.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: windows cfbot failing: my_perl

2022-08-27 Thread John Naylor
On Sat, Aug 27, 2022 at 2:23 PM Andres Freund  wrote:
>
> Hi,
>
> On 2022-08-27 12:53:24 +0700, John Naylor wrote:
> > Update: I tried taking the CI for a spin, but ran into IT issues with
> > Github when I tried to push my branch to remote.
>
> A github, not a CI issue? Just making sure...

Yeah, I forked PG from the Github page, cloned it locally, applied the
patch and tried to push to origin.

> As a workaround you can just open a CF entry, that'll run the patch soon.

Yeah, I did that after taking a break -- there are compiler warnings
for contrib/sepgsql/label.c where pfree's argument is cast to void *,
so seems unrelated.

> But either way, I ran the patch "manually" in a windows VM that I had running
> anyway. With the meson patchset, but I don't see how it could matter here.
>
> 1/5 postgresql:setup / tmp_install  OK
> 1.30s
> 2/5 postgresql:jsonb_plperl / jsonb_plperl/regress  OK
> 8.30s
> 3/5 postgresql:bool_plperl / bool_plperl/regressOK
> 8.30s
> 4/5 postgresql:hstore_plperl / hstore_plperl/regressOK
> 8.64s
> 5/5 postgresql:plperl / plperl/regress  OK   
> 10.41s
>
> Ok: 5
>
>
> I didn't test other platforms.
>
>
> WRT the patch's commit message: The issue isn't that perl's free() is
> redefined, it's that perl's #define free (which references perl globals!)
> breaks windows' header...

Ah, thanks for that detail and for testing, will push.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: use ARM intrinsics in pg_lfind32() where available

2022-08-29 Thread John Naylor
On Tue, Aug 30, 2022 at 12:17 AM Nathan Bossart
 wrote:
> Thanks!  I've attached a follow-up patch with a couple of small
> suggestions.

Pushed, thanks!

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: use ARM intrinsics in pg_lfind32() where available

2022-08-21 Thread John Naylor
On Sat, Aug 20, 2022 at 5:28 AM Nathan Bossart  wrote:
>
> On Fri, Aug 19, 2022 at 02:26:02PM -0700, Andres Freund wrote:
> > Are you sure there's not an appropriate define for us to use here instead 
> > of a
> > configure test? E.g.
> >
> > echo|cc -dM -P -E -|grep -iE 'arm|aarch'
> > ...
> > #define __AARCH64_SIMD__ 1
> > ...
> > #define __ARM_NEON 1
> > #define __ARM_NEON_FP 0xE
> > #define __ARM_NEON__ 1
> > ..
> >
> > I strikes me as non-scalable to explicitly test all the simd instructions 
> > we'd
> > use.
>
> Thanks for the pointer.  GCC, Clang, and the Arm compiler all seem to
> define __ARM_NEON, so here is a patch that uses that instead.

Is this also ever defined on 32-bit? If so, is it safe, meaning the
compiler will not emit these instructions without additional flags?
I'm wondering if __aarch64__ would be clearer on that, and if we get
windows-on-arm support as has been proposed, could also add _M_ARM64.

I also see #if defined(__aarch64__) || defined(__aarch64) in our
codebase already, but I'm not sure what recognizes the latter.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-21 Thread John Naylor
On Sun, Aug 21, 2022 at 12:47 PM Nathan Bossart
 wrote:
>
> I spent some more time looking at this one, and I had a few ideas that I
> thought I'd share.  0001 is your v6 patch with a few additional changes,
> including simplying the assertions for readability, splitting out the
> Vector type into Vector8 and Vector32 (needed for ARM), and adjusting
> pg_lfind32() to use the new tools in simd.h.  0002 adds ARM versions of
> everything, which obsoletes the other thread I started [0].  This is still
> a little rough around the edges (e.g., this should probably be more than 2
> patches), but I think it helps demonstrate a more comprehensive design than
> what I've proposed in the pg_lfind32-for-ARM thread [0].
>
> Apologies if I'm stepping on your toes a bit here.

Not at all! However, the 32-bit-element changes are irrelevant for
json, and make review more difficult. I would suggest keeping those in
the other thread starting with whatever refactoring is needed. I can
always rebase over that.

Not a full review, but on a brief look:

- I like the idea of simplifying the assertions, but I can't get
behind using platform lfind to do it, since it has a different API,
requires new functions we don't need, and possibly has portability
issues. A simple for-loop is better for assertions.
- A runtime elog is not appropriate for a compile time check -- use
#error instead.

--
John Naylor
EDB: http://www.enterprisedb.com




Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-30 Thread John Naylor
On Tue, Aug 23, 2022 at 1:03 PM John Naylor
 wrote:
>
> LGTM overall. My plan is to split out the json piece, adding tests for
> that, and commit the infrastructure for it fairly soon.

Here's the final piece. I debated how many tests to add and decided it
was probably enough to add one each for checking quotes and
backslashes in the fast path. There is one cosmetic change in the
code: Before, the vectorized less-equal check compared to 0x1F, but
the byte-wise path did so with < 32. I made them both "less-equal 31"
for consistency. I'll commit this by the end of the week unless anyone
has a better idea about testing.

-- 
John Naylor
EDB: http://www.enterprisedb.com
From f1159dcc2044edb107e0dfeae5e8f3c7feb10cd2 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Wed, 31 Aug 2022 10:39:17 +0700
Subject: [PATCH v10] Optimize JSON lexing of long strings

Use optimized linear search when looking ahead for end quotes,
backslashes, and non-printable characters. This results in nearly 40%
faster JSON parsing on x86-64 when most values are long strings, and
all platforms should see some improvement.

Reviewed by Andres Freund and Nathan Bossart
Discussion: https://www.postgresql.org/message-id/CAFBsxsGhaR2KQ5eisaK%3D6Vm60t%3DaxhD8Ckj1qFoCH1pktZi%2B2w%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAFBsxsESLUyJ5spfOSyPrOvKUEYYNqsBosue9SV1j8ecgNXSKA%40mail.gmail.com
---
 src/common/jsonapi.c   | 13 ++---
 src/test/regress/expected/json.out | 13 +
 src/test/regress/sql/json.sql  |  5 +
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index fefd1d24d9..cfc025749c 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -19,6 +19,7 @@
 
 #include "common/jsonapi.h"
 #include "mb/pg_wchar.h"
+#include "port/pg_lfind.h"
 
 #ifndef FRONTEND
 #include "miscadmin.h"
@@ -844,7 +845,7 @@ json_lex_string(JsonLexContext *lex)
 		}
 		else
 		{
-			char	   *p;
+			char	   *p = s;
 
 			if (hi_surrogate != -1)
 return JSON_UNICODE_LOW_SURROGATE;
@@ -853,11 +854,17 @@ json_lex_string(JsonLexContext *lex)
 			 * Skip to the first byte that requires special handling, so we
 			 * can batch calls to appendBinaryStringInfo.
 			 */
-			for (p = s; p < end; p++)
+			while (p < end - sizeof(Vector8) &&
+   !pg_lfind8('\\', (uint8 *) p, sizeof(Vector8)) &&
+   !pg_lfind8('"', (uint8 *) p, sizeof(Vector8)) &&
+   !pg_lfind8_le(31, (uint8 *) p, sizeof(Vector8)))
+p += sizeof(Vector8);
+
+			for (; p < end; p++)
 			{
 if (*p == '\\' || *p == '"')
 	break;
-else if ((unsigned char) *p < 32)
+else if ((unsigned char) *p <= 31)
 {
 	/* Per RFC4627, these characters MUST be escaped. */
 	/*
diff --git a/src/test/regress/expected/json.out b/src/test/regress/expected/json.out
index e9d6e9faf2..cb181226e9 100644
--- a/src/test/regress/expected/json.out
+++ b/src/test/regress/expected/json.out
@@ -42,6 +42,19 @@ LINE 1: SELECT '"\v"'::json;
^
 DETAIL:  Escape sequence "\v" is invalid.
 CONTEXT:  JSON data, line 1: "\v...
+-- Check fast path for longer strings (at least 16 bytes long)
+SELECT ('"'||repeat('.', 12)||'abc"')::json; -- OK
+        json
+--------------------
+ "............abc"
+(1 row)
+
+SELECT ('"'||repeat('.', 12)||'abc\n"')::json; -- OK, legal escapes
+         json
+----------------------
+ "............abc\n"
+(1 row)
+
 -- see json_encoding test for input with unicode escapes
 -- Numbers.
 SELECT '1'::json;-- OK
diff --git a/src/test/regress/sql/json.sql b/src/test/regress/sql/json.sql
index e366c6f51b..589e0cea36 100644
--- a/src/test/regress/sql/json.sql
+++ b/src/test/regress/sql/json.sql
@@ -7,6 +7,11 @@ SELECT '"abc
 def"'::json;	-- ERROR, unescaped newline in string constant
 SELECT '"\n\"\\"'::json;		-- OK, legal escapes
 SELECT '"\v"'::json;			-- ERROR, not a valid JSON escape
+
+-- Check fast path for longer strings (at least 16 bytes long)
+SELECT ('"'||repeat('.', 12)||'abc"')::json; -- OK
+SELECT ('"'||repeat('.', 12)||'abc\n"')::json; -- OK, legal escapes
+
 -- see json_encoding test for input with unicode escapes
 
 -- Numbers.
-- 
2.36.1



Re: broken table formatting in psql

2022-09-01 Thread John Naylor
On Thu, Sep 1, 2022 at 2:13 PM Pavel Stehule  wrote:
> problem is in bad width of invisible char 200E

I removed this comment in bab982161e since it didn't match the code.
I'd be interested to see what happened after v12.

- *   - Other format characters (general category code Cf in the Unicode
- * database) and ZERO WIDTH SPACE (U+200B) have a column width of 0.

UnicodeData.txt has this:

200B;ZERO WIDTH SPACE;Cf;0;BN;N;
200C;ZERO WIDTH NON-JOINER;Cf;0;BN;N;
200D;ZERO WIDTH JOINER;Cf;0;BN;N;
200E;LEFT-TO-RIGHT MARK;Cf;0;L;N;
200F;RIGHT-TO-LEFT MARK;Cf;0;R;N;

So maybe we need to take Cf characters in this file into account, in
addition to Me and Mn (combining characters).

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: [RFC] building postgres with meson - v12

2022-09-02 Thread John Naylor
On Thu, Sep 1, 2022 at 1:12 AM Andres Freund  wrote:
> [v12]

+# Build a small utility static lib for the parser. This makes it easier to not
+# depend on gram.h already having been generated for most of the other code
+# (which depends on generated headers having been generated). The generation
+# of the parser is slow...

It's not obvious whether this is intended to be a Meson-only
optimization or a workaround for something awkward to specify.

+  # FIXME: -output option is only available in perl 5.9.3 - but that's
+  # probably a fine minimum requirement?

Since we've retired some buildfarm animals recently, it seems the
oldest perl there is 5.14? ... which came out in 2011, so it seems
later on we could just set that as the minimum.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: broken table formatting in psql

2022-09-02 Thread John Naylor
On Fri, Sep 2, 2022 at 12:17 PM Kyotaro Horiguchi
 wrote:
>
> At Thu, 01 Sep 2022 18:22:06 +0900 (JST), Kyotaro Horiguchi 
>  wrote in
> > At Thu, 1 Sep 2022 15:00:38 +0700, John Naylor 
> >  wrote in
> > > UnicodeData.txt has this:
> > >
> > > 200B;ZERO WIDTH SPACE;Cf;0;BN;N;
> > > 200C;ZERO WIDTH NON-JOINER;Cf;0;BN;N;
> > > 200D;ZERO WIDTH JOINER;Cf;0;BN;N;
> > > 200E;LEFT-TO-RIGHT MARK;Cf;0;L;N;
> > > 200F;RIGHT-TO-LEFT MARK;Cf;0;R;N;
> > >
> > > So maybe we need to take Cf characters in this file into account, in
> > > addition to Me and Mn (combining characters).
> >
> > Including them into unicode_combining_table.h actually worked, but I'm
> > not sure it is valid to include Cf's among Mn/Me's..

Looking at the definition, Cf means "other, format" category, "Format
character that affects the layout of text or the operation of text
processes, but is not normally rendered". [1]

> UnicodeData.txt
> 174:00AD;SOFT HYPHEN;Cf;0;BN;N;
>
> Soft-hyphen seems like not zero-width.. usually...

I gather it only appears at line breaks, which I doubt we want to handle.

>  0600;ARABIC NUMBER SIGN;Cf;0;AN;N;
> 110BD;KAITHI NUMBER SIGN;Cf;0;L;N;
>
> Mmm. These looks like not zero-width?

There are glyphs, but there is something special about the first one:

select U&'\0600';

Looks like this in psql (substituting 'X' to avoid systemic differences):

+--+
| ?column? |
+--+
| X   |
+--+
(1 row)

Copy from psql to vim or nano:

+--+
| ?column? |
+--+
| X|
+--+
(1 row)

...so it does mess up the border the same way. The second
(U&'\+0110bd') doesn't render for me.

> However, it seems like basically a win if we include "Cf"s to the
> "combining" table?

There seems to be a case for that. If we did include those, we should
rename the table to match.

I found this old document from 2002 on "default ignorable" characters
that normally have no visible glyph:

https://unicode.org/L2/L2002/02368-default-ignorable.html

If there is any doubt about including all of Cf, we could also just
add a branch in wchar.c to hard-code the 200B-200F range.
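That alternative could look something like this (a hedged sketch with an illustrative function name, not the actual wchar.c code; it only covers the zero-width/directional marks discussed above):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: report U+200B..U+200F (ZERO WIDTH SPACE through
 * RIGHT-TO-LEFT MARK) as zero-width, in addition to whatever the
 * combining-character table already matches.
 */
static bool
is_zero_width_format_char(uint32_t ucs)
{
	return ucs >= 0x200B && ucs <= 0x200F;
}
```

The width function would then return 0 when this test succeeds, the same as it does for combining characters.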

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: [PATCH] Optimize json_lex_string by batching character copying

2022-09-01 Thread John Naylor
On Wed, Aug 31, 2022 at 11:17 AM Nathan Bossart
 wrote:
>
> On Wed, Aug 31, 2022 at 10:50:39AM +0700, John Naylor wrote:
> > Here's the final piece. I debated how many tests to add and decided it
> > was probably enough to add one each for checking quotes and
> > backslashes in the fast path. There is one cosmetic change in the
> > code: Before, the vectorized less-equal check compared to 0x1F, but
> > the byte-wise path did so with < 32. I made them both "less-equal 31"
> > for consistency. I'll commit this by the end of the week unless anyone
> > has a better idea about testing.
>
> LGTM

Pushed, thanks for looking!

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: build remaining Flex files standalone

2022-09-02 Thread John Naylor
On Sat, Sep 3, 2022 at 1:29 AM Andres Freund  wrote:
>
> Hi John,
>
> Are you planning to press ahead with these?

I was waiting for feedback on the latest set, so tomorrow I'll see
about the FIXME and remove the leftover bogus include. I was thinking
of applying the guc-file patches separately and then squashing the
rest since they're *mostly* mechanical:

> > Subject: [PATCH v4 01/11] Preparatory refactoring for compiling guc-file.c
> >  standalone
> > Subject: [PATCH v4 02/11] Move private declarations shared between guc.c and
> >  guc-file.l to new header
> > Subject: [PATCH v4 03/11] Build guc-file.c standalone
>
> 01-03 are a bit more complicated, but still look not far off. There's a FIXME
> about failing headercheck.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: warning: comparison of integer expressions of different signedness related to simd.h

2022-09-02 Thread John Naylor
On Sat, Sep 3, 2022 at 12:30 PM Pavel Stehule  wrote:
>
> Hi
>
> I got fresh warnings when I build an extension
>
> In file included from /usr/local/pgsql/master/include/server/mb/pg_wchar.h:22,
>  from src/format.c:17:
> /usr/local/pgsql/master/include/server/port/simd.h: In function ‘vector8_has’:
> /usr/local/pgsql/master/include/server/port/simd.h:168:27: warning: 
> comparison of integer expressions of different signedness: ‘int’ and ‘long 
> unsigned int’ [-Wsign-compare]
>   168 | for (int i = 0; i < sizeof(Vector8); i++)
>   |   ^

"int" should probably be "Size" -- does that remove the warning?
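A standalone illustration of the suggested change (with stand-in typedefs, since this isn't the real simd.h):

```c
#include <assert.h>
#include <stddef.h>

typedef size_t Size;			/* as in PostgreSQL's c.h */
typedef struct { unsigned char v[16]; } Vector8;	/* stand-in type */

static int
count_vector_bytes(void)
{
	int			n = 0;

	/*
	 * sizeof() yields an unsigned type, so an unsigned loop index avoids
	 * the -Wsign-compare warning that an 'int' index provokes.
	 */
	for (Size i = 0; i < sizeof(Vector8); i++)
		n++;
	return n;
}
```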

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: warning: comparison of integer expressions of different signedness related to simd.h

2022-09-03 Thread John Naylor
On Sat, Sep 3, 2022 at 12:57 PM Tom Lane  wrote:
>
> John Naylor  writes:
> > On Sat, Sep 3, 2022 at 12:30 PM Pavel Stehule  
> > wrote:
> >> /usr/local/pgsql/master/include/server/port/simd.h: In function 
> >> ‘vector8_has’:
> >> /usr/local/pgsql/master/include/server/port/simd.h:168:27: warning: 
> >> comparison of integer expressions of different signedness: ‘int’ and ‘long 
> >> unsigned int’ [-Wsign-compare]
> >> 168 | for (int i = 0; i < sizeof(Vector8); i++)
>
> > "int" should probably be "Size" -- does that remove the warning?
>
> Agreed, should be Size or size_t, or else cast the sizeof() result.
> But I wonder why none of the buildfarm is showing such a warning.

If I add -Wsign-compare to CPPFLAGS, I get dozens of warnings all over
the place. It's probably unreasonable for extensions to expect to
compile cleanly with warnings that the core server doesn't use, but
this header is clearly wrong and easy to remedy, so I've pushed a
patch.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: build remaining Flex files standalone

2022-09-03 Thread John Naylor
On Sat, Sep 3, 2022 at 1:29 AM Andres Freund  wrote:

> > Subject: [PATCH v4 01/11] Preparatory refactoring for compiling guc-file.c
> >  standalone
> > Subject: [PATCH v4 02/11] Move private declarations shared between guc.c and
> >  guc-file.l to new header
> > Subject: [PATCH v4 03/11] Build guc-file.c standalone
>
> 01-03 are a bit more complicated, but still look not far off. There's a FIXME
> about failing headercheck.

Fixed by adding utils/guc.h to the new internal header, which now
lives in the same directory as guc.c and guc-file.l, similar to how I
did json path in the last patch. Also removed the bogus include from
v4 to . Pushed 01 and 02 separately, then squashed and pushed the
rest.


--
John Naylor
EDB: http://www.enterprisedb.com




Re: [RFC] building postgres with meson - v12

2022-09-04 Thread John Naylor
On Fri, Sep 2, 2022 at 11:35 PM Andres Freund  wrote:
>
> Hi,
>
> On 2022-09-02 14:17:26 +0700, John Naylor wrote:
> > On Thu, Sep 1, 2022 at 1:12 AM Andres Freund  wrote:
> > > [v12]
> >
> > +# Build a small utility static lib for the parser. This makes it easier to 
> > not
> > +# depend on gram.h already having been generated for most of the other code
> > +# (which depends on generated headers having been generated). The 
> > generation
> > +# of the parser is slow...
> >
> > It's not obvious whether this is intended to be a Meson-only
> > optimization or a workaround for something awkward to specify.
>
> It is an optimization. The parser generation is by far the slowest part of a
> build. If other files can only be compiled once gram.h is generated, there's a
> long initial period where little can happen. So instead of having all .c files
> have a dependency on gram.h having been generated, the above makes only
> scan.c, gram.c compilation depend on gram.h.  It only matters for the first
> compilation, because such dependencies are added as order-only dependencies,
> supplanted by more precise compiler generated dependencies after.

Okay, I think the comment could include some of this info for clarity.

> It's still pretty annoying that so much of the build is initially idle,
> waiting for genbki.pl to finish.
>
> Part of that is due to some ugly dependencies of src/common on backend headers
> that IMO probably shouldn't exist (e.g. src/common/relpath.c includes
> catalog/pg_tablespace_d.h).

Technically, *_d.h headers are not backend, that's why it's safe to
include them anywhere. relpath.c in its current form has to know the
tablespace OIDs, which I guess is what you think is ugly. (I agree
it's not great)

> Looks like it'd not be hard to get at least the
> _shlib version of src/common and libpq build without waiting for that. But for
> all the backend code I don't really see a way, so it'd be nice to make genbki
> faster at some point.

The attached gets me a ~15% reduction in clock time by having
Catalog.pm parse the .dat files in one sweep, when we don't care about
formatting, i.e. most of the time:

master:
User time (seconds): 0.48
Maximum resident set size (kbytes): 36112

patch:
User time (seconds): 0.41
Maximum resident set size (kbytes): 35808

That's pretty simple -- I think going beyond that would require some
perl profiling.

-- 
John Naylor
EDB: http://www.enterprisedb.com
diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index e91a8e10a8..9dd932e30a 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -287,6 +287,8 @@ sub ParseData
 	my $catname = $1;
 	my $data= [];
 
+	if ($preserve_formatting)
+	{
 	# Scan the input file.
 	while (<$ifd>)
 	{
@@ -346,11 +348,24 @@ sub ParseData
 		{
 			push @$data, $hash_ref if !$hash_ref->{autogenerated};
 		}
-		elsif ($preserve_formatting)
+		else
 		{
 			push @$data, $_;
 		}
 	}
+	}
+	else
+	{
+		# When we only care about the contents, it's faster to read and eval
+		# the whole file at once.
+		my $full_file = do { local(@ARGV, $/) = $input_file; <> };
+		eval '$data = ' . $full_file;
+		foreach my $hash_ref (@{$data})
+		{
+			AddDefaultValues($hash_ref, $schema, $catname);
+		}
+	}
+
 	close $ifd;
 
 	# If this is pg_type, auto-generate array types too.


Re: Compilation issue on Solaris.

2022-09-05 Thread John Naylor
On Sun, Jul 10, 2022 at 9:27 PM Tom Lane  wrote:
>
> Thomas Munro  writes:
> > Something bothers me about adding yet more clutter to every compile
> > line for the rest of time to solve a problem that exists only for
> > unpatched systems, and also that it's not even really a Solaris thing,
> > it's a C11 thing.
>
> I tend to agree with this standpoint: if it's only a warning, and
> it only appears in a small range of not-up-to-date Solaris builds,
> then a reasonable approach is "update your system if you don't want
> to see the warning".
>
> A positive argument for doing nothing is that there's room to worry
> whether -D__STDC_WANT_LIB_EXT1__ might have any side-effects we
> *don't* want.

This is still listed in the CF as needing review, so I went and marked
it rejected.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: build remaining Flex files standalone

2022-09-06 Thread John Naylor
On Mon, Sep 5, 2022 at 1:18 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-09-04 12:16:10 +0700, John Naylor wrote:
> > Pushed 01 and 02 separately, then squashed and pushed the rest.
>
> Thanks a lot! It does look a good bit cleaner to me now.
>
> I think, as a followup improvement, we should move gramparse.h to
> src/backend/parser, and stop installing gram.h, gramparse.h. gramparse.h
> already had this note:
>
>  * NOTE: this file is only meant to be included in the core parsing files,
>  * i.e., parser.c, gram.y, and scan.l.
>  * Definitions that are needed outside the core parser should be in parser.h.
>
> What do you think?

+1 for the concept, but haven't looked at the details.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Minimum bison and flex versions

2022-09-06 Thread John Naylor
On Sat, Sep 3, 2022 at 4:04 AM Andres Freund  wrote:
> On 2022-09-02 16:29:22 -0400, Tom Lane wrote:
> > Andres Freund  writes:
> > > On 2022-09-02 15:08:01 -0400, Tom Lane wrote:
> > I'd be content for now to set the minimums at 2.3 and 2.5.35.
>
> +1

Here are autoconf-only patches to that effect.

-- 
John Naylor
EDB: http://www.enterprisedb.com
From 5e07442ebf728c486536038aecf29aefcc892455 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Tue, 6 Sep 2022 11:41:58 +0700
Subject: [PATCH v2 2/2] Bump minimum version of Flex to 2.5.35

Since the retirement of some older buildfarm members, the oldest Flex
that gets regular testing is 2.5.35.

Discussion: https://www.postgresql.org/message-id/1097762.1662145...@sss.pgh.pa.us
---
 config/programs.m4| 10 --
 configure |  6 +++---
 doc/src/sgml/install-windows.sgml |  2 +-
 doc/src/sgml/installation.sgml|  2 +-
 src/backend/utils/misc/guc-file.l |  4 ++--
 src/tools/msvc/pgflex.pl  |  2 +-
 6 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/config/programs.m4 b/config/programs.m4
index bcdfbc3a51..ce83155592 100644
--- a/config/programs.m4
+++ b/config/programs.m4
@@ -65,10 +65,8 @@ AC_SUBST(BISONFLAGS)
 # PGAC_PATH_FLEX
 # --
 # Look for Flex, set the output variable FLEX to its path if found.
-# Reject versions before 2.5.31, as we need a reasonably non-buggy reentrant
-# scanner.  (Note: the well-publicized security problem in 2.5.31 does not
-# affect Postgres, and there are still distros shipping patched 2.5.31,
-# so allow it.)  Also find Flex if its installed under `lex', but do not
+# Reject versions before 2.5.35 (the earliest version in the buildfarm
+# as of 2022). Also find Flex if its installed under `lex', but do not
 # accept other Lex programs.
 
 AC_DEFUN([PGAC_PATH_FLEX],
@@ -92,14 +90,14 @@ else
 echo '%%'  > conftest.l
 if $pgac_candidate -t conftest.l 2>/dev/null | grep FLEX_SCANNER >/dev/null 2>&1; then
   pgac_flex_version=`$pgac_candidate --version 2>/dev/null`
-  if echo "$pgac_flex_version" | sed ['s/[.a-z]/ /g'] | $AWK '{ if ([$]1 == 2 && ([$]2 > 5 || ([$]2 == 5 && [$]3 >= 31))) exit 0; else exit 1;}'
+  if echo "$pgac_flex_version" | sed ['s/[.a-z]/ /g'] | $AWK '{ if ([$]1 == 2 && ([$]2 > 5 || ([$]2 == 5 && [$]3 >= 35))) exit 0; else exit 1;}'
   then
 pgac_cv_path_flex=$pgac_candidate
 break 2
   else
 AC_MSG_WARN([
 *** The installed version of Flex, $pgac_candidate, is too old to use with PostgreSQL.
-*** Flex version 2.5.31 or later is required, but this is $pgac_flex_version.])
+*** Flex version 2.5.35 or later is required, but this is $pgac_flex_version.])
   fi
 fi
   fi
diff --git a/configure b/configure
index 4a725ab205..613c385c48 100755
--- a/configure
+++ b/configure
@@ -10278,17 +10278,17 @@ else
 echo '%%'  > conftest.l
 if $pgac_candidate -t conftest.l 2>/dev/null | grep FLEX_SCANNER >/dev/null 2>&1; then
   pgac_flex_version=`$pgac_candidate --version 2>/dev/null`
-  if echo "$pgac_flex_version" | sed 's/[.a-z]/ /g' | $AWK '{ if ($1 == 2 && ($2 > 5 || ($2 == 5 && $3 >= 31))) exit 0; else exit 1;}'
+  if echo "$pgac_flex_version" | sed 's/[.a-z]/ /g' | $AWK '{ if ($1 == 2 && ($2 > 5 || ($2 == 5 && $3 >= 35))) exit 0; else exit 1;}'
   then
 pgac_cv_path_flex=$pgac_candidate
 break 2
   else
 { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING:
 *** The installed version of Flex, $pgac_candidate, is too old to use with PostgreSQL.
-*** Flex version 2.5.31 or later is required, but this is $pgac_flex_version." >&5
+*** Flex version 2.5.35 or later is required, but this is $pgac_flex_version." >&5
 $as_echo "$as_me: WARNING:
 *** The installed version of Flex, $pgac_candidate, is too old to use with PostgreSQL.
-*** Flex version 2.5.31 or later is required, but this is $pgac_flex_version." >&2;}
+*** Flex version 2.5.35 or later is required, but this is $pgac_flex_version." >&2;}
   fi
 fi
   fi
diff --git a/doc/src/sgml/install-windows.sgml b/doc/src/sgml/install-windows.sgml
index 499fd59bf0..c00ab2b4b2 100644
--- a/doc/src/sgml/install-windows.sgml
+++ b/doc/src/sgml/install-windows.sgml
@@ -220,7 +220,7 @@ $ENV{MSBFLAGS}="/m";
   Bison and Flex are
   required to build from Git, but not required when building from a release
   file. Only Bison versions 2.3 and later
-  will work. Flex must be version 2.5.31 or later.
+  will work. Flex must be version 2.5.35 or later.
  
 
  
diff --git a/doc/src/sgml/installation.sgml 

Re: [RFC] building postgres with meson - v12

2022-09-06 Thread John Naylor
On Mon, Sep 5, 2022 at 4:11 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-09-04 13:12:52 +0700, John Naylor wrote:
> > On Fri, Sep 2, 2022 at 11:35 PM Andres Freund  wrote:
> > >
> > > Hi,
> > >
> > > On 2022-09-02 14:17:26 +0700, John Naylor wrote:
> > > > On Thu, Sep 1, 2022 at 1:12 AM Andres Freund  wrote:
> > > > > [v12]
> > > >
> > > > +# Build a small utility static lib for the parser. This makes it 
> > > > easier to not
> > > > +# depend on gram.h already having been generated for most of the other 
> > > > code
> > > > +# (which depends on generated headers having been generated). The 
> > > > generation
> > > > +# of the parser is slow...
> > > >
> > > > It's not obvious whether this is intended to be a Meson-only
> > > > optimization or a workaround for something awkward to specify.
> > >
> > > It is an optimization. The parser generation is by far the slowest part 
> > > of a
> > > build. If other files can only be compiled once gram.h is generated, 
> > > there's a
> > > long initial period where little can happen. So instead of having all .c 
> > > files
> > > have a dependency on gram.h having been generated, the above makes only
> > > scan.c, gram.c compilation depend on gram.h.  It only matters for the 
> > > first
> > > compilation, because such dependencies are added as order-only 
> > > dependencies,
> > > supplanted by more precise compiler generated dependencies after.
> >
> > Okay, I think the comment could include some of this info for clarity.
>
> Working on that.
>
>
> > > It's still pretty annoying that so much of the build is initially idle,
> > > waiting for genbki.pl to finish.
> > >
> > > Part of that is due to some ugly dependencies of src/common on backend 
> > > headers
> > > that IMO probably shouldn't exist (e.g. src/common/relpath.c includes
> > > catalog/pg_tablespace_d.h).
> >
> > Technically, *_d.h headers are not backend, that's why it's safe to
> > include them anywhere. relpath.c in its current form has to know the
> > tablespace OIDs, which I guess is what you think is ugly. (I agree
> > it's not great)
>
> Yea, I'm not saying it's unsafe in a produces-wrong-results way, just that it
> seems architecturally dubious / circular.
>
>
> > > Looks like it'd not be hard to get at least the
> > > _shlib version of src/common and libpq build without waiting for that. 
> > > But for
> > > all the backend code I don't really see a way, so it'd be nice to make 
> > > genbki
> > > faster at some point.
> >
> > The attached gets me a ~15% reduction in clock time by having
> > Catalog.pm parse the .dat files in one sweep, when we don't care about
> > formatting, i.e. most of the time:
>
> Cool. Seems worthwhile.

Okay, here's a cleaned up version with more idiomatic style and a new
copy of the perlcritic exception.

Note that the indentation hasn't changed. My thought there: perltidy
will be run again next year, at which time it will be part of a listed
whitespace-only commit. Any objections, since that could confuse
someone before then?

-- 
John Naylor
EDB: http://www.enterprisedb.com
diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index e91a8e10a8..a71f5b05a8 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -287,6 +287,8 @@ sub ParseData
 	my $catname = $1;
 	my $data= [];
 
+	if ($preserve_formatting)
+	{
 	# Scan the input file.
 	while (<$ifd>)
 	{
@@ -346,11 +348,25 @@ sub ParseData
 		{
 			push @$data, $hash_ref if !$hash_ref->{autogenerated};
 		}
-		elsif ($preserve_formatting)
+		else
 		{
 			push @$data, $_;
 		}
 	}
+	}
+	else
+	{
+		# When we only care about the contents, it's faster to read and eval
+		# the whole file at once.
+		local $/;
+		my $full_file = <$ifd>;
+		eval '$data = ' . $full_file;## no critic (ProhibitStringyEval)
+		foreach my $hash_ref (@{$data})
+		{
+			AddDefaultValues($hash_ref, $schema, $catname);
+		}
+	}
+
 	close $ifd;
 
 	# If this is pg_type, auto-generate array types too.


Re: use ARM intrinsics in pg_lfind32() where available

2022-08-23 Thread John Naylor
On Tue, Aug 23, 2022 at 4:15 AM Nathan Bossart  wrote:
>
> On Mon, Aug 22, 2022 at 11:50:35AM +0700, John Naylor wrote:

> > Is this also ever defined on 32-bit? If so, is it safe, meaning the
> > compiler will not emit these instructions without additional flags?
> > I'm wondering if  __aarch64__ would be clearer on that, and if we get
> > windows-on-arm support as has been proposed, could also add _M_ARM64.
>
> I haven't been able to enable __ARM_NEON on 32-bit, but if it is somehow
> possible, we should probably add an __aarch64__ check since functions like
> vmaxvq_u32() do not appear to be available on 32-bit.  I have been able to
> compile for __aarch64__ without __ARM_NEON, so it might still be a good
> idea to check for __ARM_NEON.

The important thing is: if we compile with __aarch64__ as a target:
- Will the compiler emit the intended instructions from the intrinsics
without extra flags?
- Can a user on ARM64 ever get a runtime fault if the machine attempts
to execute NEON instructions? "I have been able to compile for
__aarch64__ without __ARM_NEON" doesn't really answer that question --
what exactly did this entail?

> > I also see #if defined(__aarch64__) || defined(__aarch64) in our
> > codebase already, but I'm not sure what recognizes the latter.
>
> I'm not sure what uses the latter, either.

I took a quick look around at Debian code search, *BSD, Apple, and a
few other places, and I can't find it. Then, I looked at the
discussions around commit 5c7603c318872a42e "Add ARM64 (aarch64)
support to s_lock.h", and the proposed patch [1] only had __aarch64__
. When it was committed, the platform was vaporware and I suppose we
included "__aarch64" as a prophylactic measure because no other reason
was given. It doesn't seem to exist anywhere, so unless someone can
demonstrate otherwise, I'm going to rip it out soon.

[1] 
https://www.postgresql.org/message-id/flat/1368448758.23422.12.camel%40t520.redhat.com

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-23 Thread John Naylor
On Wed, Aug 24, 2022 at 12:15 AM Nathan Bossart
 wrote:
>
> On Tue, Aug 23, 2022 at 01:03:03PM +0700, John Naylor wrote:
> > On Tue, Aug 23, 2022 at 10:32 AM Nathan Bossart
> >> Here's a new version of the patch with the 32-bit changes and calls to
> >> lfind() removed.
> >
> > LGTM overall. My plan is to split out the json piece, adding tests for
> > that, and commit the infrastructure for it fairly soon. Possible
> > bikeshedding: Functions like vector8_eq() might be misunderstood as
> > comparing two vectors, but here we are comparing each lane with a
> > scalar. I wonder if vector8_eq_scalar() et al might be more clear.
>
> Good point.  I had used vector32_veq() to denote vector comparison, which
> would extend to something like vector8_seq().  But that doesn't seem
> descriptive enough.  It might be worth considering vector8_contains() or
> vector8_has() as well.  I don't really have an opinion, but if I had to
> pick something, I guess I'd choose vector8_contains().

It seems "scalar" would be a bad choice since it already means
(confusingly) operating on the least significant element of a vector.
I'm thinking of *_has and *_has_le, matching the already existing in
the earlier patch *_has_zero.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: fix typo - empty statement ;;

2022-08-22 Thread John Naylor
On Tue, Aug 23, 2022 at 7:14 AM Peter Smith  wrote:
>
> I noticed an accidental ;;
>
> PSA patch to remove the same.

Pushed.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Considering additional sort specialisation functions for PG16

2022-08-22 Thread John Naylor
On Tue, Aug 23, 2022 at 9:18 AM David Rowley  wrote:
>
> I have the following dimensions in mind for consideration:
>
> 1. Specialisations to handle sorting of non-null datums (eliminates
> checking for nulls in the comparison function)
> 2. Specialisations to handle single column sorts (eliminates
> tiebreaker function call or any checks for existence of tiebreaker)
> 3. ASC sort (No need for if (ssup->ssup_reverse) 
> INVERT_COMPARE_RESULT(compare))
>
> If we did all of the above then we'd end up with 3 * 2 * 2 * 2 = 24
> specialization functions.  That seems a bit excessive. So here I'd
> like to discuss which ones we should add, if any.
>
> I've attached a very basic implementation of #1 which adds 3 new
> functions for sorting non-null datums.

Did you happen to see

https://www.postgresql.org/message-id/CAFBsxsFhq8VUSkUL5YO17cFXbCPwtbbxBu%2Bd9MFrrsssfDXm3Q%40mail.gmail.com

where I experimented with removing all null handling? What I had in
mind was pre-partitioning nulls and non-nulls when populating the
SortTuple array, then calling qsort twice, once with the non-null
partition with comparators that assume non-null, and (only if there
are additional sort keys) once on the null partition. And the
pre-partitioning would take care of nulls first/last upfront. I
haven't looked into the feasibility of this yet, but the good thing
about the concept is that it removes null handling in the comparators
without additional sort specializations.
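A minimal model of the pre-partitioning step (hypothetical toy types; the real SortTuple machinery, abbreviated keys, and nulls first/last handling are more involved):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a SortTuple: just a nullable key. */
typedef struct
{
	int			key;
	bool		isnull;
} ToyTuple;

/*
 * Move all non-null tuples to the front of the array and return how many
 * there are.  The caller could then qsort [0, n_notnull) with a comparator
 * that assumes non-null keys, and order the null partition separately.
 */
static size_t
partition_nulls(ToyTuple *tuples, size_t n)
{
	size_t		n_notnull = 0;

	for (size_t i = 0; i < n; i++)
	{
		if (!tuples[i].isnull)
		{
			ToyTuple	tmp = tuples[n_notnull];

			tuples[n_notnull++] = tuples[i];
			tuples[i] = tmp;
		}
	}
	return n_notnull;
}
```

Note this simple swap scheme is not stable with respect to the nulls; whether that matters depends on how the null partition is sorted afterward.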

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-23 Thread John Naylor
On Tue, Aug 23, 2022 at 10:32 AM Nathan Bossart
 wrote:
>
> On Mon, Aug 22, 2022 at 02:22:29PM -0700, Nathan Bossart wrote:
> > On Mon, Aug 22, 2022 at 09:35:34AM +0700, John Naylor wrote:
> >> Not at all! However, the 32-bit-element changes are irrelevant for
> >> json, and make review more difficult. I would suggest keeping those in
> >> the other thread starting with whatever refactoring is needed. I can
> >> always rebase over that.
> >
> > Yeah, I'll remove those to keep this thread focused.
>
> Here's a new version of the patch with the 32-bit changes and calls to
> lfind() removed.

LGTM overall. My plan is to split out the json piece, adding tests for
that, and commit the infrastructure for it fairly soon. Possible
bikeshedding: Functions like vector8_eq() might be misunderstood as
comparing two vectors, but here we are comparing each lane with a
scalar. I wonder if vector8_eq_scalar() et al might be more clear.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Considering additional sort specialisation functions for PG16

2022-08-23 Thread John Naylor
On Tue, Aug 23, 2022 at 11:24 AM David Rowley  wrote:
>
> On Tue, 23 Aug 2022 at 15:22, John Naylor  
> wrote:
> > Did you happen to see
> >
> > https://www.postgresql.org/message-id/CAFBsxsFhq8VUSkUL5YO17cFXbCPwtbbxBu%2Bd9MFrrsssfDXm3Q%40mail.gmail.com
>
> I missed that.  It looks like a much more promising idea than what I
> came up with. I've not looked at your code yet, but I'm interested and
> will aim to look soon.

Note that I haven't actually implemented this idea yet, just tried to
model the effects by lobotomizing the current comparators. I think
it's worth pursuing and will try to come back to it this cycle, but if
you or anyone else wants to try, that's fine of course.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: use ARM intrinsics in pg_lfind32() where available

2022-08-25 Thread John Naylor
On Thu, Aug 25, 2022 at 11:57 AM Nathan Bossart
 wrote:
>
> On Thu, Aug 25, 2022 at 10:38:34AM +0700, John Naylor wrote:
> > On Thu, Aug 25, 2022 at 1:01 AM Nathan Bossart  
> > wrote:
> >> On Wed, Aug 24, 2022 at 11:07:03AM +0700, John Naylor wrote:
> >> > - Can a user on ARM64 ever get a runtime fault if the machine attempts
> >> > to execute NEON instructions?
> >>
> >> IIUC yes, although I'm not sure how likely it is in practice.
> >
> > Given the quoted part above, it doesn't seem likely, but we should try
> > to find out for sure, because a runtime fault is surely not acceptable
> > even on a toy system.
>
> The ARM literature appears to indicate that Neon support is pretty standard
> on aarch64, and AFAICT it's pretty common to just assume it's available.

This doesn't exactly rise to the level of "find out for sure", so I
went looking myself. This is the language I found [1]:

"Both floating-point and NEON are required in all standard ARMv8
implementations. However, implementations targeting specialized
markets may support the following combinations:

No NEON or floating-point.
Full floating-point and SIMD support with exception trapping.
Full floating-point and SIMD support without exception trapping."

Since we assume floating-point, I see no reason not to assume NEON,
but a case could be made for documenting that we require NEON on
aarch64, in addition to exception trapping (for CRC runtime check) and
floating point on any Arm. Or even just say "standard". I don't
believe anyone will want to run Postgres on specialized hardware
lacking these features, so maybe it's a moot point.

[1] 
https://developer.arm.com/documentation/den0024/a/AArch64-Floating-point-and-NEON

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-25 Thread John Naylor
On Fri, Aug 26, 2022 at 10:14 AM Nathan Bossart
 wrote:
>
> > +test_lfind8_internal(uint8 key)
[...]
> > + elog(ERROR, "pg_lfind8() found nonexistent element <= 
> > '0x%x'", key + 1);
> > +}
>
> nitpick: Shouldn't the elog() calls use "==" instead of "<=" for this one?

Good catch, will fix.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: use SSE2 for is_valid_ascii

2022-08-26 Thread John Naylor
On Fri, Aug 26, 2022 at 10:26 AM Nathan Bossart
 wrote:
>
> On Thu, Aug 25, 2022 at 04:41:53PM +0700, John Naylor wrote:
> > v3 applies on top of the v9 json_lex_string patch in [1] and adds a
> > bit more to that, resulting in a simpler patch that is more amenable
> > to additional SIMD-capable platforms.
>
> LGTM

Thanks for looking, pushed with some rearrangements.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-26 Thread John Naylor
On Thu, Aug 25, 2022 at 1:35 PM John Naylor
 wrote:
>
> I think I'll go ahead and commit 0001 in a couple days pending further 
> comments.

Pushed with Nathan's correction and some cosmetic rearrangements.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-19 Thread John Naylor
On Tue, Aug 16, 2022 at 4:23 AM Nathan Bossart  wrote:
>
> On Mon, Aug 15, 2022 at 08:33:21PM +0700, John Naylor wrote:
> > +#ifdef USE_SSE2
> > + chunk = _mm_loadu_si128((const __m128i *) &s[i]);
> > +#else
> > + memcpy(&chunk, &s[i], sizeof(chunk));
> > +#endif   /* USE_SSE2 */
>
> Perhaps there should be a macro or inline function for loading a vector so
> that these USE_SSE2 checks can be abstracted away, too.

This is done. Also:
- a complete overhaul of the pg_lfind8* tests
- using a typedef for the vector type
- some refactoring, name changes and other cleanups (a few of these
could also be applied to the 32-byte element path, but that is left
for future work)

TODO: json-specific tests of the new path

--
John Naylor
EDB: http://www.enterprisedb.com
diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c
index fefd1d24d9..efcaedd682 100644
--- a/src/common/jsonapi.c
+++ b/src/common/jsonapi.c
@@ -19,6 +19,7 @@
 
 #include "common/jsonapi.h"
 #include "mb/pg_wchar.h"
+#include "port/pg_lfind.h"
 
 #ifndef FRONTEND
 #include "miscadmin.h"
@@ -844,7 +845,7 @@ json_lex_string(JsonLexContext *lex)
 		}
 		else
 		{
-			char	   *p;
+			char	   *p = s;
 
 			if (hi_surrogate != -1)
 return JSON_UNICODE_LOW_SURROGATE;
@@ -853,7 +854,13 @@ json_lex_string(JsonLexContext *lex)
 			 * Skip to the first byte that requires special handling, so we
 			 * can batch calls to appendBinaryStringInfo.
 			 */
-			for (p = s; p < end; p++)
+			while (p < end - sizeof(Vector) &&
+   !pg_lfind8('\\', (uint8 *) p, sizeof(Vector)) &&
+   !pg_lfind8('"', (uint8 *) p, sizeof(Vector)) &&
+   !pg_lfind8_le(0x1F, (uint8 *) p, sizeof(Vector)))
+p += sizeof(Vector);
+
+			for (; p < end; p++)
 			{
 if (*p == '\\' || *p == '"')
 	break;
diff --git a/src/include/port/pg_lfind.h b/src/include/port/pg_lfind.h
index fb125977b2..bb4033c7fc 100644
--- a/src/include/port/pg_lfind.h
+++ b/src/include/port/pg_lfind.h
@@ -1,7 +1,7 @@
 /*-
  *
  * pg_lfind.h
- *	  Optimized linear search routines.
+ *	  Optimized linear search routines using SIMD intrinsics where available
  *
  * Copyright (c) 2022, PostgreSQL Global Development Group
  *
@@ -15,6 +15,68 @@
 
 #include "port/simd.h"
 
+/*
+ * pg_lfind8
+ *
+ * Return true if there is an element in 'base' that equals 'key', otherwise
+ * return false.
+ */
+static inline bool
+pg_lfind8(uint8 key, uint8 *base, uint32 nelem)
+{
+	uint32		i;
+	/* round down to multiple of vector length */
+	uint32		tail_idx = nelem & ~(sizeof(Vector) - 1);
+	Vector		chunk;
+
+	for (i = 0; i < tail_idx; i += sizeof(Vector))
+	{
		vector_load(&chunk, &base[i]);
+		if (vector_eq_byte(chunk, key))
+			return true;
+	}
+
+	/* Process the remaining elements one at a time. */
+	for (; i < nelem; i++)
+	{
+		if (key == base[i])
+			return true;
+	}
+
+	return false;
+}
+
+/*
+ * pg_lfind8_le
+ *
+ * Return true if there is an element in 'base' that is less than or equal to
+ * 'key', otherwise return false.
+ */
+static inline bool
+pg_lfind8_le(uint8 key, uint8 *base, uint32 nelem)
+{
+	uint32		i;
+	/* round down to multiple of vector length */
+	uint32		tail_idx = nelem & ~(sizeof(Vector) - 1);
+	Vector		chunk;
+
+	for (i = 0; i < tail_idx; i += sizeof(Vector))
+	{
+		vector_load(&chunk, &base[i]);
+		if (vector_le_byte(chunk, key))
+			return true;
+	}
+
+	/* Process the remaining elements one at a time. */
+	for (; i < nelem; i++)
+	{
+		if (base[i] <= key)
+			return true;
+	}
+
+	return false;
+}
+
 /*
  * pg_lfind32
  *
@@ -26,7 +88,6 @@ pg_lfind32(uint32 key, uint32 *base, uint32 nelem)
 {
 	uint32		i = 0;
 
-	/* Use SIMD intrinsics where available. */
 #ifdef USE_SSE2
 
 	/*
diff --git a/src/include/port/simd.h b/src/include/port/simd.h
index a571e79f57..56da8e27cd 100644
--- a/src/include/port/simd.h
+++ b/src/include/port/simd.h
@@ -25,6 +25,141 @@
 #if (defined(__x86_64__) || defined(_M_AMD64))
 #include 
 #define USE_SSE2
+typedef __m128i Vector;
+
+#else
+typedef uint64 Vector;
+#endif			/* (defined(__x86_64__) || defined(_M_AMD64)) */
+
+
+static inline void vector_load(Vector * v, const uint8 *s);
+static inline Vector vector_broadcast(const uint8 c);
+static inline bool vector_has_zero(const Vector v);
+static inline bool vector_le_byte(const Vector v, const uint8 c);
+static inline bool vector_eq_byte(const Vector v, const uint8 c);
+
+
+/* load a chunk of memory into a register */
+static inline void
+vector_load(Vector * v, const uint8 *s)
+{
+#ifdef USE_SSE2
+	*v = _mm_loadu_si128((const __m128i *) s);
+#else
+	memcpy(v, s, sizeof(Vector));
+#endif
+}
+
+/* return a vector with all bytes set to c */
+static inline Vector
+vector_broadcast(const uint8 c)
+{
+#ifdef USE_SSE2
+	return _mm_s

Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-09-28 Thread John Naylor
On Wed, Sep 28, 2022 at 1:18 PM John Naylor 
wrote:
> [stuff about size classes]

I kind of buried the lede here on one thing: If we only have 4 kinds
regardless of the number of size classes, we can use 2 bits of the pointer
for dispatch, which would only require 4-byte alignment. That should make
that technique more portable.
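For illustration, here is a minimal sketch of that tagging scheme in portable C (all names here are hypothetical, not from the patch). With four kinds, the tag fits in the two low bits, which are guaranteed to be zero for any pointer with at least 4-byte alignment:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: stash a 2-bit node kind in the low bits of a
 * pointer.  Requires every node allocation to be at least 4-byte
 * aligned, so the two low bits of a real pointer are always zero.
 */
#define KIND_MASK	((uintptr_t) 3)

static inline uintptr_t
tag_pointer(void *p, int kind)
{
	assert(((uintptr_t) p & KIND_MASK) == 0);	/* alignment requirement */
	assert(kind >= 0 && kind <= 3);				/* only 4 kinds fit */
	return (uintptr_t) p | (uintptr_t) kind;
}

static inline int
tagged_kind(uintptr_t tagged)
{
	return (int) (tagged & KIND_MASK);
}

static inline void *
tagged_pointer(uintptr_t tagged)
{
	return (void *) (tagged & ~KIND_MASK);
}
```

Dispatch then becomes a switch on tagged_kind() without having to dereference the node first.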

--
John Naylor
EDB: http://www.enterprisedb.com


Re: [RFC] building postgres with meson - v13

2022-09-27 Thread John Naylor
On Tue, Sep 27, 2022 at 2:06 AM Andres Freund  wrote:
>
> On 2022-09-26 15:18:29 +0700, John Naylor wrote:
> > Either way it doesn't exist on this machine. I was able to get a working
> > build with
> >
> > /usr/local/Homebrew/Library/Homebrew/os/mac/pkgconfig
>
> Hm - what did you need this path for - I don't think that should be
needed.

I just cargo-culted the pattern from Arm (before I figured out it was Arm)
and used the "find" command to look for the directories by name. I tried
again without specifying any of the three directory flags, and I can run
the tests getting:

Ok: 233
Expected Fail:  0
Fail:   0
Unexpected Pass:0
Skipped:2
Timeout:0

...which is fine for me since I don't do much development on MacOS nowadays.

> > 1) /opt/homebrew/ seems to be an "Apple silicon" path?
>
> Yea, it's /usr/local on x86-64, based on what was required to make macos
CI
> work. I updated the wiki page, half-blindly - it'd be nice if you could
> confirm that that works?

Not sure if you intended for me to try the full script in your last
response or just what's in the wiki page, but for the latter (on commit
bed0927aeb0c6), it fails at

[1656/2199] Linking target src/bin/psql/psql
FAILED: src/bin/psql/psql
clang  -o src/bin/psql/psql
src/bin/psql/psql.p/meson-generated_.._psqlscanslash.c.o
src/bin/psql/psql.p/meson-generated_.._sql_help.c.o
src/bin/psql/psql.p/command.c.o src/bin/psql/psql.p/common.c.o
src/bin/psql/psql.p/copy.c.o src/bin/psql/psql.p/crosstabview.c.o
src/bin/psql/psql.p/describe.c.o src/bin/psql/psql.p/help.c.o
src/bin/psql/psql.p/input.c.o src/bin/psql/psql.p/large_obj.c.o
src/bin/psql/psql.p/mainloop.c.o src/bin/psql/psql.p/prompt.c.o
src/bin/psql/psql.p/startup.c.o src/bin/psql/psql.p/stringutils.c.o
src/bin/psql/psql.p/tab-complete.c.o src/bin/psql/psql.p/variables.c.o
-L/usr/local/opt/readline/lib -L/usr/local/opt/gettext/lib
-L/usr/local/opt/zlib/lib -L/usr/local/opt/openssl/lib
-I/usr/local/opt/readline/include -I/usr/local/opt/gettext/include
-I/usr/local/opt/zlib/include -I/usr/local/opt/openssl/include
-Wl,-dead_strip_dylibs -Wl,-headerpad_max_install_names
-Wl,-undefined,error -isysroot
/Library/Developer/CommandLineTools/SDKs/MacOSX11.3.sdk
-Wl,-rpath,@loader_path/../../interfaces/libpq -Wl,-rpath,/usr/local/lib
-Wl,-rpath,/usr/local/Cellar/zstd/1.5.2/lib src/fe_utils/libpgfeutils.a
src/common/libpgcommon.a src/common/libpgcommon_ryu.a
src/common/libpgcommon_config_info.a src/port/libpgport.a
src/port/libpgport_crc.a src/interfaces/libpq/libpq.5.dylib -lm
/usr/local/lib/libintl.dylib -ledit -lz
/usr/local/Cellar/zstd/1.5.2/lib/libzstd.dylib -lz -lz -lz
Undefined symbols for architecture x86_64:
  "_rl_completion_suppress_quote", referenced from:
  _psql_completion in tab-complete.c.o
  _quote_file_name in tab-complete.c.o
  _complete_from_files in tab-complete.c.o
  "_rl_filename_dequoting_function", referenced from:
  _initialize_readline in tab-complete.c.o
  "_rl_filename_quote_characters", referenced from:
  _initialize_readline in tab-complete.c.o
  "_rl_filename_quoting_function", referenced from:
  _initialize_readline in tab-complete.c.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see
invocation)

--
John Naylor
EDB: http://www.enterprisedb.com


Re: fixing typo in comment for restriction_is_or_clause

2022-10-24 Thread John Naylor
On Tue, Oct 25, 2022 at 12:19 AM Zhihong Yu  wrote:
>
> Hi,
> When I was looking at src/backend/optimizer/util/restrictinfo.c, I found
a typo in one of the comments.

Using "t" as an abbreviation for "true" was probably intentional, so not a
typo. There is no doubt what the behavior is.

> I also took the chance to simplify the code a little bit.

It's perfectly clear and simple now, even if it doesn't win at "code golf".

--
John Naylor
EDB: http://www.enterprisedb.com


Re: fixing typo in comment for restriction_is_or_clause

2022-10-25 Thread John Naylor
On Tue, Oct 25, 2022 at 9:48 AM Richard Guo  wrote:
>
>
> On Tue, Oct 25, 2022 at 10:05 AM John Naylor 
wrote:
>>
>> It's perfectly clear and simple now, even if it doesn't win at "code
golf".
>
>
> Agree with your point.  Do you think we can further make the one-line
> function a macro or an inline function in the .h file?  I think this
> function is called quite frequently during planning, so maybe doing that
> would bring a little bit of efficiency.

My point was misunderstood, which is: I don't think we need to do anything
at all here if the goal was purely about aesthetics.

If the goal has now changed to efficiency, I have no opinion about that yet
since no evidence has been presented.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-10-26 Thread John Naylor
remind
us.

+ /* there is no key to delete */
+ if (!rt_node_search_leaf(node, key, RT_ACTION_FIND, NULL))
+   return false;
+
+ /* Update the statistics */
+ tree->num_keys--;
+
+ /*
+  * Delete the key from the leaf node and recursively delete the key in
+  * inner nodes if necessary.
+  */
+ Assert(NODE_IS_LEAF(stack[level]));
+ while (level >= 0)
+ {
+   rt_node    *node = stack[level--];
+
+   if (NODE_IS_LEAF(node))
+ rt_node_search_leaf(node, key, RT_ACTION_DELETE, NULL);
+   else
+ rt_node_search_inner(node, key, RT_ACTION_DELETE, NULL);
+
+   /* If the node didn't become empty, we stop deleting the key */
+   if (!NODE_IS_EMPTY(node))
+ break;
+
+   /* The node became empty */
+   rt_free_node(tree, node);
+ }

Here we call rt_node_search_leaf() twice -- once to check for existence,
and once to delete. All three search calls are inlined, so this wastes
space. Let's try to delete the leaf, return if not found, otherwise handle
the leaf bookkeeping and loop over the inner nodes. This might require
some duplication of code.
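In other words, fold the existence check into the delete call itself, so its return value drives the early return. A toy sketch of the shape I mean (not radix tree code; all names hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy sketch: the delete reports whether the key was present, so no
 * separate existence search is needed beforehand.
 */
#define SET_SIZE 4

typedef struct toy_set
{
	int			items[SET_SIZE];
	int			count;
} toy_set;

static bool
toy_set_delete(toy_set *s, int key)
{
	for (int i = 0; i < s->count; i++)
	{
		if (s->items[i] == key)
		{
			/* found: remove by swapping in the last element */
			s->items[i] = s->items[--s->count];
			return true;
		}
	}
	return false;				/* nothing to delete */
}
```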

+ndoe_inner_128_update(rt_node_inner_128 *node, uint8 chunk, rt_node *child)

Spelling

+static inline void
+chunk_children_array_copy(uint8 *src_chunks, rt_node **src_children,
+ uint8 *dst_chunks, rt_node **dst_children, int count)
+{
+ memcpy(dst_chunks, src_chunks, sizeof(uint8) * count);
+ memcpy(dst_children, src_children, sizeof(rt_node *) * count);
+}

gcc generates better code with something like this (but not hard-coded) at
the top:

if (count > 4)
pg_unreachable();

This would have to change when we implement shrinking of nodes, but might
still be useful.
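To be clear about the mechanism: pg_unreachable() expands to __builtin_unreachable() on gcc/clang, and placing it behind the bounds check promises the compiler the loop trip count never exceeds 4, so it can emit a small fixed-size copy instead of a variable-length loop. A standalone sketch (the macro and function names here are made up):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the hint discussed above.  unreachable_hint()
 * stands in for pg_unreachable(); on gcc/clang it expands to
 * __builtin_unreachable().  The bounds check tells the compiler that
 * count is at most 4.
 */
#if defined(__GNUC__) || defined(__clang__)
#define unreachable_hint()	__builtin_unreachable()
#else
#define unreachable_hint()	((void) 0)
#endif

static inline void
copy_chunks(uint8_t *dst_chunks, const uint8_t *src_chunks, int count)
{
	if (count > 4)
		unreachable_hint();		/* caller guarantees count <= 4 */

	for (int i = 0; i < count; i++)
		dst_chunks[i] = src_chunks[i];
}
```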

+ if (!rt_node_search_leaf(node, key, RT_ACTION_FIND, value_p))
+   return false;
+
+ return true;

Maybe just "return rt_node_search_leaf(...)" ?

-- 
John Naylor
EDB: http://www.enterprisedb.com


Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-10-26 Thread John Naylor
On Thu, Oct 27, 2022 at 9:11 AM Masahiko Sawada 
wrote:
>
> True. I'm going to start with 6 bytes and will consider reducing it to
> 5 bytes.

Okay, let's plan on 6 for now, so we have the worst-case sizes up front. As
discussed, I will attempt the size class decoupling after v8 and see how it
goes.

> Encoding the kind in a pointer tag could be tricky given DSA

If it turns out to be unworkable, that's life. If it's just tricky, that
can certainly be put off for future work. I hope to at least test it out
with local memory.

> support so currently I'm thinking to pack the node kind and node
> capacity classes to uint8.

That won't work, if we need 128 for capacity, leaving no bits left. I want
the capacity to be a number we can directly compare with the count (we
won't ever need to store 256 because that node will never grow). Also,
further to my last message, we need to access the kind quickly, without
more cycles.

> I've made some progress on investigating DSA support. I've written
> draft patch for that and regression tests passed. I'll share it as a
> separate patch for discussion with v8 radix tree patch.

Great!

> While implementing DSA support, I realized that we may not need to use
> pointer tagging to distinguish between backend-local address or
> dsa_pointer. In order to get a backend-local address from dsa_pointer,
> we need to pass dsa_area like:

I was not clear -- when I see how much code changes to accommodate DSA
pointers, I imagine I will pretty much know the places that would be
affected by tagging the pointer with the node kind.

Speaking of tests, there is currently no Meson support, but tests pass
because this library is not used anywhere in the backend yet, and
apparently the CI Meson builds don't know to run the regression test? That
will need to be done too. However, it's okay to keep the benchmarking
module in autoconf, since it won't be committed.

> > +static inline void
> > +chunk_children_array_copy(uint8 *src_chunks, rt_node **src_children,
> > + uint8 *dst_chunks, rt_node **dst_children, int count)
> > +{
> > + memcpy(dst_chunks, src_chunks, sizeof(uint8) * count);
> > + memcpy(dst_children, src_children, sizeof(rt_node *) * count);
> > +}
> >
> > gcc generates better code with something like this (but not hard-coded)
at the top:
> >
> > if (count > 4)
> > pg_unreachable();

Actually it just now occurred to me there's a bigger issue here: *We* know
this code can only get here iff count==4, so why doesn't the compiler know
that? I believe it boils down to

static rt_node_kind_info_elem rt_node_kind_info[RT_NODE_KIND_COUNT] = {

In the assembly, I see it checks if there is room in the node by doing a
runtime lookup in this array, which is not constant. This might not be
important just yet, because I want to base the check on the proposed node
capacity instead, but I mention it as a reminder to us to make sure we take
all opportunities for the compiler to propagate constants.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Documentation for building with meson

2022-10-27 Thread John Naylor
+# Run the main pg_regress and isolation tests
+meson test --suite main

This does not work for me in a fresh install until running

meson test --suite setup

In fact, we see in

https://wiki.postgresql.org/wiki/Meson

meson test --suite setup --suite main

That was just an eyeball check from a naive user -- it would be good to try
running everything documented here.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Reducing the chunk header sizes on all memory context types

2022-10-24 Thread John Naylor
On Thu, Oct 20, 2022 at 1:55 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-10-11 10:21:17 +0700, John Naylor wrote:
> > On Tue, Oct 11, 2022 at 5:31 AM David Rowley 
wrote:
> > >
> > > The proposed patches in [1] do aim to make additional usages of the
> > > slab allocator, and I have a feeling that we'll want to fix the
> > > performance of slab.c before those. Perhaps the Asserts are a better
> > > option if we're to get the proposed radix tree implementation.
> >
> > Going by [1], that use case is not actually a natural fit for slab
because
> > of memory fragmentation.
> >
> > [1]
> >
https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de
>
> Not so sure about that - IIRC I made one slab for each different size
class,
> which seemed to work well and suit slab well?

If that's the case, then great! The linked message didn't give me that
impression, but I won't worry about it.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: [RFC] building postgres with meson - v13

2022-09-20 Thread John Naylor
On Wed, Sep 21, 2022 at 7:11 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-09-19 19:16:30 -0700, Andres Freund wrote:
> > To have some initial "translation" for other developers I've started a
wiki
> > page with a translation table. Still very WIP:
> > https://wiki.postgresql.org/wiki/Meson
> >
> > For now, a bit of polishing aside, I'm just planning to add a minimal
> > explanation of what's happening, and a reference to this thread.
>
> I added installation instructions for meson for a bunch of platforms, but

Small typo: The homebrew section is still labeled with "find MacPorts
libraries".

--
John Naylor
EDB: http://www.enterprisedb.com


Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-09-21 Thread John Naylor
On Tue, Sep 20, 2022 at 3:19 PM Masahiko Sawada 
wrote:
>
> On Fri, Sep 16, 2022 at 4:54 PM John Naylor
>  wrote:

> > Here again, I'd rather put this off and focus on getting the "large
> > details" in good enough shape so we can go towards integrating with
> > vacuum.
>
> Thank you for the comments! These above comments are addressed by
> Nathan in a newly derived thread. I'll work on the patch.

I still seem to be out-voted on when to tackle this particular
optimization, so I've extended the v6 benchmark code with a hackish
function that populates a fixed number of keys, but with different fanouts.
(diff attached as a text file)

I didn't take particular care to make this scientific, but the following
seems pretty reproducible. Note what happens to load and search performance
when node16 has 15 entries versus 16:

 fanout | nkeys  | rt_mem_allocated | rt_load_ms | rt_search_ms
--------+--------+------------------+------------+--------------
     15 | 327680 |          3776512 |         39 |           20
(1 row)
num_keys = 327680, height = 4, n4 = 1, n16 = 23408, n32 = 0, n128 = 0, n256 = 0

 fanout | nkeys  | rt_mem_allocated | rt_load_ms | rt_search_ms
--------+--------+------------------+------------+--------------
     16 | 327680 |          3514368 |         25 |           11
(1 row)
num_keys = 327680, height = 4, n4 = 0, n16 = 21846, n32 = 0, n128 = 0, n256 = 0

In trying to wrap the SIMD code behind layers of abstraction, the latest
patch (and Nathan's cleanup) threw it away in almost all cases. To explain,
we need to talk about how vectorized code deals with the "tail" that is too
small for the register:

1. Use a one-by-one algorithm, like we do for the pg_lfind* variants.
2. Read some junk into the register and mask off false positives from the
result.

There are advantages to both depending on the situation.
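To make the distinction concrete, here is a scalar sketch of strategy #2 (hypothetical names; an 8-byte chunk of match bits stands in for a SIMD compare result). It assumes the caller pads the buffer out to a chunk multiple so the over-read is safe:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Scalar sketch of tail strategy #2 (not patch code): the final partial
 * chunk is processed at full width anyway, and match bits from bytes
 * past 'nelem' are masked off as false positives.  Assumes the caller
 * pads 'base' to a multiple of 8 bytes so the over-read stays within
 * the allocation.
 */
static bool
has_byte_masked(const uint8_t *base, size_t nelem, uint8_t key)
{
	size_t		i = 0;

	while (i < nelem)
	{
		uint64_t	matches = 0;

		/* "compare" the whole 8-byte chunk, one bit per byte */
		for (int j = 0; j < 8; j++)
			if (base[i + j] == key)		/* may read padding bytes */
				matches |= UINT64_C(1) << j;

		/* partial tail: discard matches from beyond the end */
		if (i + 8 > nelem)
			matches &= (UINT64_C(1) << (nelem - i)) - 1;

		if (matches != 0)
			return true;
		i += 8;
	}
	return false;
}
```

The point is that the tail still does full-chunk work, rather than falling back to a byte-at-a-time loop as in strategy #1.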

Patch v5 and earlier used #2. Patch v6 used #1, so if a node16 has 15
elements or less, it will iterate over them one-by-one exactly like a
node4. Only when full with 16 will the vector path be taken. When another
entry is added, the elements are copied to the next bigger node, so there's
a *small* window where it's fast.

In short, this code needs to be lower level so that we still have full
control while being portable. I will work on this, and also the related
code for node dispatch.

Since v6 has some good infrastructure to do low-level benchmarking, I also
want to do some experiments with memory management.

(I have further comments about the code, but I will put that off until
later)

> I'll consider how to integrate with vacuum as the next step. One
> concern for me is how to limit the memory usage to
> maintenance_work_mem. Unlike using a flat array, memory space for
> adding one TID varies depending on the situation. If we want strictly
> not to allow using memory more than maintenance_work_mem, probably we
> need to estimate the memory consumption in a conservative way.

+1

--
John Naylor
EDB: http://www.enterprisedb.com
commit 18407962e96ccec6c9aeeba97412edd762a5a4fe
Author: John Naylor 
Date:   Wed Sep 21 11:44:43 2022 +0700

Add special benchmark function to test effect of fanout

diff --git a/contrib/bench_radix_tree/Makefile 
b/contrib/bench_radix_tree/Makefile
index b8f70e12d1..952bb0ceae 100644
--- a/contrib/bench_radix_tree/Makefile
+++ b/contrib/bench_radix_tree/Makefile
@@ -7,7 +7,7 @@ OBJS = \
 EXTENSION = bench_radix_tree
 DATA = bench_radix_tree--1.0.sql
 
-REGRESS = bench
+REGRESS = bench_fixed_height
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/bench_radix_tree/bench_radix_tree--1.0.sql 
b/contrib/bench_radix_tree/bench_radix_tree--1.0.sql
index 6663abe6a4..f2fee15b17 100644
--- a/contrib/bench_radix_tree/bench_radix_tree--1.0.sql
+++ b/contrib/bench_radix_tree/bench_radix_tree--1.0.sql
@@ -40,3 +40,15 @@ OUT load_ms int8)
 returns record
 as 'MODULE_PATHNAME'
 LANGUAGE C STRICT VOLATILE PARALLEL UNSAFE;
+
+create function bench_fixed_height_search(
+fanout int4,
+OUT fanout int4,
+OUT nkeys int8,
+OUT rt_mem_allocated int8,
+OUT rt_load_ms int8,
+OUT rt_search_ms int8
+)
+returns record
+as 'MODULE_PATHNAME'
+LANGUAGE C STRICT VOLATILE PARALLEL UNSAFE;
diff --git a/contrib/bench_radix_tree/bench_radix_tree.c 
b/contrib/bench_radix_tree/bench_radix_tree.c
index 5806ef7519..0778da2d7b 100644
--- a/contrib/bench_radix_tree/bench_radix_tree.c
+++ b/contrib/bench_radix_tree/bench_radix_tree.c
@@ -13,6 +13,7 @@
 #include "fmgr.h"
 #include "funcapi.h"
 #include "lib/radixtree.h"
+#include 
 #include "miscadmin.h"
 #include "utils/timestamp.h"
 
@@ -24,6 +25,7 @@ PG_MODULE_MAGIC;
 PG_FUNCTION_INFO_V1(bench_seq_search);
 PG_FUNCTION_INFO_V1(bench_shuffle_search);
 PG_FUNCTION_INFO_V1(bench_load_random_int);
+PG_FUNCTION_INFO_V1(bench_fixed_height_search);
 
 static radix_tree *rt = NULL;
 static ItemPointer itemptrs = NULL;

Re: broken table formatting in psql

2022-09-12 Thread John Naylor
On Mon, Sep 12, 2022 at 12:44 PM Pavel Stehule  wrote:
> This is not a critical issue, really.  On second thought, I don't see the 
> point in releasing fresh Postgres with this bug, where there is a known bugfix - 
> and this bugfix should be compatible (at this moment) with 16.

I agree the actual logic/data change is low-risk. The patch renames
two files, which seems a bit much this late in the cycle. Maybe that's
okay, but I'd like someone else to opine before doing so.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: build remaining Flex files standalone

2022-09-12 Thread John Naylor
On Fri, Sep 9, 2022 at 12:18 PM John Naylor
 wrote:
>
> It seems gramparse.h isn't installed now? In any case, here's a patch
> to move gramparse to the backend dir and stop symlinking/ installing
> gram.h.

Looking more closely at src/include/Makefile, this is incorrect -- all
files in SUBDIRS are copied over. So moving gramparse.h to the backend
will automatically not install it. The explicit install rule for
gram.h was for vpath builds.

CI builds fine. For v2 I only adjusted the commit message. I'll push
in a couple days unless there are objections.
--
John Naylor
EDB: http://www.enterprisedb.com
From dcfe0976e6118a2c385b4473ceab76a742a6f1ff Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Fri, 9 Sep 2022 11:57:39 +0700
Subject: [PATCH v2] Move gramparse.h to src/backend/parser

This header is semi-private, being used only in files related to
raw parsing, so move to the backend directory where those files
live. This allows removal of makefile rules that symlink gram.h to
src/include/parser, since gramparse.h can now include gram.h from within
the same directory. While at it, stop installing gram.h.

Per suggestion from Andres Freund and Peter Eisentraut
Discussion: https://www.postgresql.org/message-id/20220904181759.px6uosll6zbxcum5%40awork3.anarazel.de
---
 src/backend/Makefile| 7 +--
 src/backend/parser/gram.y   | 2 +-
 src/{include => backend}/parser/gramparse.h | 4 ++--
 src/backend/parser/parser.c | 2 +-
 src/backend/parser/scan.l   | 2 +-
 src/include/Makefile| 4 ++--
 src/include/parser/.gitignore   | 1 -
 src/tools/msvc/Install.pm   | 4 
 src/tools/pginclude/cpluspluscheck  | 1 -
 src/tools/pginclude/headerscheck| 1 -
 10 files changed, 8 insertions(+), 20 deletions(-)
 rename src/{include => backend}/parser/gramparse.h (97%)
 delete mode 100644 src/include/parser/.gitignore

diff --git a/src/backend/Makefile b/src/backend/Makefile
index d0d34821d5..181c217fae 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -153,12 +153,7 @@ submake-utils-headers:
 
 .PHONY: generated-headers
 
-generated-headers: $(top_builddir)/src/include/parser/gram.h $(top_builddir)/src/include/storage/lwlocknames.h submake-catalog-headers submake-nodes-headers submake-utils-headers
-
-$(top_builddir)/src/include/parser/gram.h: parser/gram.h
-	prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
-	  cd '$(dir $@)' && rm -f $(notdir $@) && \
-	  $(LN_S) "$$prereqdir/$(notdir $<)" .
+generated-headers: $(top_builddir)/src/include/storage/lwlocknames.h submake-catalog-headers submake-nodes-headers submake-utils-headers
 
 $(top_builddir)/src/include/storage/lwlocknames.h: storage/lmgr/lwlocknames.h
 	prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index ea33784316..82f03fc9c9 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -55,9 +55,9 @@
 #include "catalog/pg_trigger.h"
 #include "commands/defrem.h"
 #include "commands/trigger.h"
+#include "gramparse.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
-#include "parser/gramparse.h"
 #include "parser/parser.h"
 #include "storage/lmgr.h"
 #include "utils/date.h"
diff --git a/src/include/parser/gramparse.h b/src/backend/parser/gramparse.h
similarity index 97%
rename from src/include/parser/gramparse.h
rename to src/backend/parser/gramparse.h
index 41b753a96c..c4726c618d 100644
--- a/src/include/parser/gramparse.h
+++ b/src/backend/parser/gramparse.h
@@ -11,7 +11,7 @@
  * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * src/include/parser/gramparse.h
+ * src/backend/parser/gramparse.h
  *
  *-
  */
@@ -26,7 +26,7 @@
  * NB: include gram.h only AFTER including scanner.h, because scanner.h
  * is what #defines YYLTYPE.
  */
-#include "parser/gram.h"
+#include "gram.h"
 
 /*
  * The YY_EXTRA data that a flex scanner allows us to pass around.  Private
diff --git a/src/backend/parser/parser.c b/src/backend/parser/parser.c
index 50227cc098..ef85d3bb68 100644
--- a/src/backend/parser/parser.c
+++ b/src/backend/parser/parser.c
@@ -22,7 +22,7 @@
 #include "postgres.h"
 
 #include "mb/pg_wchar.h"
-#include "parser/gramparse.h"
+#include "gramparse.h"
 #include "parser/parser.h"
 #include "parser/scansup.h"
 
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 882e081aae..db8b0fe8eb 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -36,

Re: minimum perl version

2022-09-13 Thread John Naylor
On Wed, Sep 14, 2022 at 6:47 AM Tom Lane  wrote:
>
> I've just switched longfin to use built-from-source perl 5.14.0.

In that case, here is a quick update with commit message. Not yet any
change for MSVC, but I can put together something later.

Since we're much less willing to support older Windows and Visual
Studio versions, maybe it's low-enough risk to defer the check to the
Meson conversion? I understand our MSVC process will then go away much
more quickly than autoconf...

-- 
John Naylor
EDB: http://www.enterprisedb.com
From d8e241968395c552d667c1eb393cbaea12f0df6c Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Wed, 14 Sep 2022 09:58:13 +0700
Subject: [PATCH v2] Bump minimum Perl version to 5.14

The oldest vendor-shipped Perl in the buildfarm is 5.14.2, which is the
last version that Debian Wheezy shipped. That OS is EOL, but we keep
it running because there is no other convenient way to test certain
non-mainstream 32-bit platforms. There is no bugfix in the .2 release
that is required, and yet it's also not the latest minor release --
that would be 5.14.4. To clarify the situation, we also have arranged the
buildfarm to test 5.14.0. That allows configure scripts and documentation
to state 5.14 without fine print.

Discussion: https://www.postgresql.org/message-id/20220902181553.ev4pgzhubhdkg...@awork3.anarazel.de
---
 config/perl.m4   |  4 ++--
 configure|  6 +++---
 doc/src/sgml/install-windows.sgml|  2 +-
 doc/src/sgml/installation.sgml   |  4 ++--
 src/pl/plperl/plc_perlboot.pl|  1 -
 src/test/perl/PostgreSQL/Test/Cluster.pm |  2 +-
 src/test/perl/README | 10 +++---
 src/tools/msvc/gendef.pl |  1 -
 src/tools/pgindent/pgindent  |  1 -
 9 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/config/perl.m4 b/config/perl.m4
index c9fd91397c..8126e79f67 100644
--- a/config/perl.m4
+++ b/config/perl.m4
@@ -11,11 +11,11 @@ if test "$PERL"; then
   pgac_perl_version=`$PERL -v 2>/dev/null | sed -n ['s/This is perl.*v[a-z ]*\([0-9]\.[0-9][0-9.]*\).*$/\1/p']`
   AC_MSG_NOTICE([using perl $pgac_perl_version])
   if echo "$pgac_perl_version" | sed ['s/[.a-z_]/ /g'] | \
-$AWK '{ if ([$]1 == 5 && ([$]2 > 8 || ($[2] == 8 && [$]3 >= 3))) exit 1; else exit 0;}'
+$AWK '{ if ([$]1 == 5 && ([$]2 >= 14)) exit 1; else exit 0;}'
   then
 AC_MSG_WARN([
 *** The installed version of Perl, $PERL, is too old to use with PostgreSQL.
-*** Perl version 5.8.3 or later is required, but this is $pgac_perl_version.])
+*** Perl version 5.14 or later is required, but this is $pgac_perl_version.])
 PERL=""
   fi
 fi
diff --git a/configure b/configure
index fd2a782454..f325bd85b8 100755
--- a/configure
+++ b/configure
@@ -10491,14 +10491,14 @@ if test "$PERL"; then
   { $as_echo "$as_me:${as_lineno-$LINENO}: using perl $pgac_perl_version" >&5
 $as_echo "$as_me: using perl $pgac_perl_version" >&6;}
   if echo "$pgac_perl_version" | sed 's/[.a-z_]/ /g' | \
-$AWK '{ if ($1 == 5 && ($2 > 8 || ($2 == 8 && $3 >= 3))) exit 1; else exit 0;}'
+$AWK '{ if ($1 == 5 && ($2 >= 14)) exit 1; else exit 0;}'
   then
 { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING:
 *** The installed version of Perl, $PERL, is too old to use with PostgreSQL.
-*** Perl version 5.8.3 or later is required, but this is $pgac_perl_version." >&5
+*** Perl version 5.14 or later is required, but this is $pgac_perl_version." >&5
 $as_echo "$as_me: WARNING:
 *** The installed version of Perl, $PERL, is too old to use with PostgreSQL.
-*** Perl version 5.8.3 or later is required, but this is $pgac_perl_version." >&2;}
+*** Perl version 5.14 or later is required, but this is $pgac_perl_version." >&2;}
 PERL=""
   fi
 fi
diff --git a/doc/src/sgml/install-windows.sgml b/doc/src/sgml/install-windows.sgml
index c00ab2b4b2..29d3294dc8 100644
--- a/doc/src/sgml/install-windows.sgml
+++ b/doc/src/sgml/install-windows.sgml
@@ -190,7 +190,7 @@ $ENV{MSBFLAGS}="/m";
   or Cygwin Perl will not work. It must also be present in the PATH.
   Binaries can be downloaded from
   https://www.activestate.com;>
-  (Note: version 5.8.3 or later is required,
+  (Note: version 5.14 or later is required,
   the free Standard Distribution is sufficient).
  
 
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 7c79608e55..319c7e6966 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -165,7 +165,7 @@ su - postgres
   PL/Perl you need a full
   Perl installation, including the
   libperl library and the header files.
-  The minimum required version is Perl 5.8.3.
+  The minimu

Re: minimum perl version

2022-09-13 Thread John Naylor
On Wed, Sep 14, 2022 at 10:46 AM Tom Lane  wrote:
>
> John Naylor  writes:
> > On Wed, Sep 14, 2022 at 6:47 AM Tom Lane  wrote:
> >> I've just switched longfin to use built-from-source perl 5.14.0.
>
> > In that case, here is a quick update with commit message. Not yet any
> > change for MSVC, but I can put together something later.
>
> Looks reasonable just by eyeball, did not test.
>
> > Since we're much less willing to support older Windows and Visual
> > Studio versions, maybe it's low-enough risk defer the check to the
> > Meson conversion? I understand our MSVC process will then go away much
> > more quickly than autoconf...
>
> Agreed --- the MSVC scripts are on a pretty short leash now.
> Not clear it's worth fixing them for this point.  If we've
> failed to get rid of them by the time v16 release approaches,
> maybe it'd be worth doing something then.

Okay, pushed with no further MSVC changes, after doing a round on CI.

--
John Naylor
EDB: http://www.enterprisedb.com




Re: build remaining Flex files standalone

2022-09-13 Thread John Naylor
On Wed, Sep 14, 2022 at 6:10 AM Andres Freund  wrote:
>
> On 2022-09-12 14:49:50 +0700, John Naylor wrote:
> > CI builds fine. For v2 I only adjusted the commit message. I'll push
> > in a couple days unless there are objections.
>
> Makes sense to me. Thanks for working on it!

This is done.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: failing to build preproc.c on solaris with sun studio

2022-09-13 Thread John Naylor
On Wed, Sep 14, 2022 at 5:24 AM Thomas Munro  wrote:
>
> On Tue, Sep 6, 2022 at 9:34 AM Tom Lane  wrote:
> > Peter Eisentraut  writes:
> > > Why is this being proposed?
> >
> > Andres is annoyed by the long build time of ecpg, which he has to
> > wait for whether he wants to test it or not.  I could imagine that
> > I might disable ecpg testing on my slowest buildfarm animals, too.
>
> This message triggered me to try to teach ccache how to cache
> preproc.y -> preproc.{c,h}, and I got that basically working[1], but
> upstream doesn't want it (yet).  I'll try again if the proposed
> refactoring to allow more kinds of compiler-like-things goes
> somewhere.  I think that started with people's struggles with GCC vs
> MSVC.  Given the simplicity of this case, though, I suppose we could
> have a little not-very-general shell/python/whatever wrapper script --
> just compute a checksum of the input and keep the output files around.

If we're going to go to this length, it seems more straightforward to
just check the .c/.h files into version control, like every other
project that I have such knowledge of.

To be fair, our grammar changes much more often. One other possible
deal-breaker of that is that it makes it more painful for forks to
maintain additional syntax.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: New strategies for freezing, advancing relfrozenxid early

2022-09-14 Thread John Naylor
On Wed, Sep 14, 2022 at 12:53 AM Peter Geoghegan  wrote:

> This is still only scratching the surface of what is possible with
> dead_items. The visibility map snapshot concept can enable a far more
> sophisticated approach to resource management in vacuumlazy.c. It
> could help us to replace a simple array of item pointers (the current
> dead_items array) with a faster and more space-efficient data
> structure. Masahiko Sawada has done a lot of work on this recently, so
> this may interest him.

I don't quite see how it helps "enable" that. It'd be more logical to
me to say the VM snapshot *requires* you to think harder about
resource management, since a palloc'd snapshot should surely be
counted as part of the configured memory cap that admins control.
(Commonly, it'll be less than a few dozen MB, so I'll leave that
aside.) Since Masahiko hasn't (to my knowledge) gone as far as
integrating his ideas into vacuum, I'm not sure if the current state
of affairs has some snag that a snapshot will ease, but if there is,
you haven't described what it is.

I do remember your foreshadowing in the radix tree thread a while
back, and I do think it's an intriguing idea to combine pages-to-scan
and dead TIDs in the same data structure. The devil is in the details,
of course. It's worth looking into.

> VM snapshots could also make it practical for the new data structure
> to spill to disk to avoid multiple index scans/passed by VACUUM.

I'm not sure spilling to disk is solving the right problem (as opposed
to the hash join case, or to the proposed conveyor belt system which
has a broader aim). I've found several times that a customer will ask
if raising maintenance work mem from 1GB to 10GB will make vacuum
faster. Looking at the count of index scans, it's pretty much always
"1", so even if the current approach could scale above 1GB, "no" it
wouldn't help to raise that limit.

Your mileage may vary, of course.

Continuing my customer example, searching the dead TID list faster
*will* make vacuum faster. The proposed tree structure is more memory
efficient, and IIUC could scale beyond 1GB automatically since each
node is a separate allocation, so the answer will be "yes" in the rare
case the current setting is in fact causing multiple index scans.
Furthermore, it doesn't have to anticipate the maximum size, so there
is no up front calculation assuming max-tuples-per-page, so it
automatically uses less memory for less demanding tables.

(But +1 for changing that calculation for as long as we do have the
single array.)

--
John Naylor
EDB: http://www.enterprisedb.com




Re: Inconsistencies in error messages

2022-09-14 Thread John Naylor
On Wed, Sep 14, 2022 at 5:01 PM Ekaterina Kiryanova
 wrote:
>
> Hi,
>
> When translating error messages, Alexander Lakhin
> () noticed some inconsistencies so I prepared a
> small patch to fix those.

+1

This one

- errmsg("background worker \"%s\": background worker without shared
memory access are not supported",
+ errmsg("background worker \"%s\": background workers without shared
memory access are not supported",

is a grammar error so worth backpatching, but the rest are cosmetic.

Will commit this way unless there are objections.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: minimum perl version

2022-09-13 Thread John Naylor
On Sat, Sep 3, 2022 at 2:11 AM Tom Lane  wrote:
>
> Andres Freund  writes:
> > 5.14 would be a trivial lift as far as the buildfarm is concerned.
>
> Yeah, that seems like a reasonable new minimum for Perl.  I might
> see about setting up an animal running 5.14.0, just so we can say
> "5.14" in the docs without fine print.

Until such time as that happens, here is a draft to require 5.14.2.

-- 
John Naylor
EDB: http://www.enterprisedb.com
 config/perl.m4   |  4 ++--
 configure|  6 +++---
 doc/src/sgml/install-windows.sgml|  2 +-
 doc/src/sgml/installation.sgml   |  4 ++--
 src/pl/plperl/plc_perlboot.pl|  1 -
 src/test/perl/PostgreSQL/Test/Cluster.pm |  2 +-
 src/test/perl/README | 10 +++---
 src/tools/msvc/gendef.pl |  1 -
 src/tools/pgindent/pgindent  |  1 -
 9 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/config/perl.m4 b/config/perl.m4
index c9fd91397c..29f54bbb79 100644
--- a/config/perl.m4
+++ b/config/perl.m4
@@ -11,11 +11,11 @@ if test "$PERL"; then
   pgac_perl_version=`$PERL -v 2>/dev/null | sed -n ['s/This is perl.*v[a-z ]*\([0-9]\.[0-9][0-9.]*\).*$/\1/p']`
   AC_MSG_NOTICE([using perl $pgac_perl_version])
   if echo "$pgac_perl_version" | sed ['s/[.a-z_]/ /g'] | \
-$AWK '{ if ([$]1 == 5 && ([$]2 > 8 || ($[2] == 8 && [$]3 >= 3))) exit 1; else exit 0;}'
+$AWK '{ if ([$]1 == 5 && ([$]2 > 14 || ($[2] == 14 && [$]3 >= 2))) exit 1; else exit 0;}'
   then
 AC_MSG_WARN([
 *** The installed version of Perl, $PERL, is too old to use with PostgreSQL.
-*** Perl version 5.8.3 or later is required, but this is $pgac_perl_version.])
+*** Perl version 5.14.2 or later is required, but this is $pgac_perl_version.])
 PERL=""
   fi
 fi
diff --git a/configure b/configure
index fd2a782454..160c181441 100755
--- a/configure
+++ b/configure
@@ -10491,14 +10491,14 @@ if test "$PERL"; then
   { $as_echo "$as_me:${as_lineno-$LINENO}: using perl $pgac_perl_version" >&5
 $as_echo "$as_me: using perl $pgac_perl_version" >&6;}
   if echo "$pgac_perl_version" | sed 's/[.a-z_]/ /g' | \
-$AWK '{ if ($1 == 5 && ($2 > 8 || ($2 == 8 && $3 >= 3))) exit 1; else exit 0;}'
+$AWK '{ if ($1 == 5 && ($2 > 14 || ($2 == 14 && $3 >= 2))) exit 1; else exit 0;}'
   then
 { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING:
 *** The installed version of Perl, $PERL, is too old to use with PostgreSQL.
-*** Perl version 5.8.3 or later is required, but this is $pgac_perl_version." >&5
+*** Perl version 5.14.2 or later is required, but this is $pgac_perl_version." >&5
 $as_echo "$as_me: WARNING:
 *** The installed version of Perl, $PERL, is too old to use with PostgreSQL.
-*** Perl version 5.8.3 or later is required, but this is $pgac_perl_version." >&2;}
+*** Perl version 5.14.2 or later is required, but this is $pgac_perl_version." >&2;}
 PERL=""
   fi
 fi
diff --git a/doc/src/sgml/install-windows.sgml b/doc/src/sgml/install-windows.sgml
index c00ab2b4b2..2e41d75db0 100644
--- a/doc/src/sgml/install-windows.sgml
+++ b/doc/src/sgml/install-windows.sgml
@@ -190,7 +190,7 @@ $ENV{MSBFLAGS}="/m";
   or Cygwin Perl will not work. It must also be present in the PATH.
   Binaries can be downloaded from
   https://www.activestate.com;>
-  (Note: version 5.8.3 or later is required,
+  (Note: version 5.14.2 or later is required,
   the free Standard Distribution is sufficient).
  
 
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 7c79608e55..5d7c573729 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -165,7 +165,7 @@ su - postgres
   PL/Perl you need a full
   Perl installation, including the
   libperl library and the header files.
-  The minimum required version is Perl 5.8.3.
+  The minimum required version is Perl 5.14.2.
   Since PL/Perl will be a shared
   library, the libperl
   libperl library must be a shared library
@@ -325,7 +325,7 @@ su - postgres
perl
   
 
-  Perl 5.8.3 or later is needed to build from a Git checkout,
+  Perl 5.14.2 or later is needed to build from a Git checkout,
   or if you changed the input files for any of the build steps that
   use Perl scripts.  If building on Windows you will need
   Perl in any case.  Perl is
diff --git a/src/pl/plperl/plc_perlboot.pl b/src/pl/plperl/plc_perlboot.pl
index 8fd7f998bc..72cb53f6e3 100644
--- a/src/pl/plperl/plc_perlboot.pl
+++ b/src/pl/plperl/plc_perlboot.pl
@@ -6,7 +6,6 @@
 use strict;
 use warnings;
 
-use 5.008001;
 use vars qw(%_SHARED $_TD);
 

Re: proposal: possibility to read dumped table's name from file

2022-09-13 Thread John Naylor
On Mon, Sep 12, 2022 at 8:10 PM Pavel Stehule  wrote:
>
> On Mon, Sep 12, 2022 at 9:59 AM, Daniel Gustafsson  wrote:
>> I don't think the capabilities of the tool are all that interesting compared to the
>> long term maintainability and readability of the source code.

With make distprep and maintainer-clean, plus separate makefile and MSVC
build logic, all a short time before converting to Meson, I'm not sure that
even the short-term maintainability here is a good trade-off for what
we're getting.

> The parser in bison/flex does the same work and it is true, so code is more 
> readable. Although for this case, a hand-written parser was trivial too.

If the hand-written version is trivial, then we should prefer it.
--
John Naylor
EDB: http://www.enterprisedb.com




Re: broken table formatting in psql

2022-09-13 Thread John Naylor
On Thu, Sep 8, 2022 at 12:39 PM John Naylor
 wrote:
>
> On Fri, Sep 2, 2022 at 3:19 PM Kyotaro Horiguchi
>  wrote:
> >
> > At Fri, 2 Sep 2022 13:43:50 +0700, John Naylor 
> >  wrote in
> > > If there is any doubt about including all of Cf, we could also just
> > > add a branch in wchar.c to hard-code the 200B-200F range.
> >
> > If every way has defects to a similar extent, I think we will choose
> > to use authoritative data at least for the first step. We might want
> > to have additional filtering on it but it would be another issue,
> > maybe.
> >
> > Attached is the first cut of that. (The commit message is not great,
> > though.)
>
> Okay, the patch looks good to me overall.

As discussed, I pushed to master only, with only one additional
comment in the perl script to describe Me/Mn/Cf.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: minimum perl version

2022-09-13 Thread John Naylor
On Tue, Sep 13, 2022 at 5:53 PM John Naylor
 wrote:
>
> Until such time as that happens, here is a draft to require 5.14.2.

As soon as I hit send, it occurred to me that we don't check the perl
version on Windows, since (I seem to recall) 5.8.3 was too old to be
an option on that platform. We'll have to add a new check somewhere.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: broken table formatting in psql

2022-09-07 Thread John Naylor
On Fri, Sep 2, 2022 at 3:19 PM Kyotaro Horiguchi
 wrote:
>
> At Fri, 2 Sep 2022 13:43:50 +0700, John Naylor  
> wrote in
> > If there is any doubt about including all of Cf, we could also just
> > add a branch in wchar.c to hard-code the 200B-200F range.
>
> If every way has defects to a similar extent, I think we will choose
> to use authoritative data at least for the first step. We might want
> to have additional filtering on it but it would be another issue,
> maybe.
>
> Attached is the first cut of that. (The commit message is not great,
> though.)

Okay, the patch looks good to me overall. Comparing releases, some
other ranges were in v11 but left out in v12 with the transition to
using a script:

0x070F
{0x200B, 0x200F}
{0x202A, 0x202E}
{0x206A, 0x206F}
0xFEFF
{0xFFF9, 0xFFFB}

Does anyone want to advocate for backpatching these missing ranges to
v12 and up? v12 still has a table in-line so trivial to remedy, but
v13 and up use a script, so these exceptions would likely have to use
hard-coded branches to keep from bringing in new changes.

If so, does anyone want to advocate for including this patch in v15?
It claims Unicode 14.0.0, and this would make that claim more
technically correct as well as avoiding additional branches.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: [RFC] building postgres with meson - v12

2022-09-08 Thread John Naylor
On Wed, Sep 7, 2022 at 3:36 PM Alvaro Herrera  wrote:
>
> On 2022-Sep-06, John Naylor wrote:
>
> > Note that the indentation hasn't changed. My thought there: perltidy
> > will be run again next year, at which time it will be part of a listed
> > whitespace-only commit. Any objections, since that could confuse
> > someone before then?
>
> I think a good plan is to commit the fix without tidy, then commit the
> tidy separately, then add the latter commit to .git-blame-ignore-revs.
> That avoids leaving the code untidy for a year.

Okay, done that way. I also made sure we got the same info for error
reporting. It's not identical, but arguably better, going from:

Bareword found where operator expected at (eval 4480) line 3, near "'btree' xxx"
(Missing operator before xxx?)
../../../src/include/catalog/pg_amop.dat: error parsing line 20:

to:

Bareword found where operator expected at (eval 12) line 20, near "'btree' xxx"
(Missing operator before xxx?)
error parsing ../../../src/include/catalog/pg_amop.dat

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: build remaining Flex files standalone

2022-09-08 Thread John Naylor
On Wed, Sep 7, 2022 at 4:27 PM Peter Eisentraut
 wrote:
>
> On 04.09.22 20:17, Andres Freund wrote:
> > I think, as a followup improvement, we should move gramparse.h to
> > src/backend/parser, and stop installing gram.h, gramparse.h. gramparse.h
> > already had this note:
> >
> >   * NOTE: this file is only meant to be included in the core parsing files,
> >   * i.e., parser.c, gram.y, and scan.l.
> >   * Definitions that are needed outside the core parser should be in 
> > parser.h.
> >
> > What do you think?
>
> I found in my notes:
>
> * maybe gram.h and gramparse.h should not be installed
>
> So, yeah. ;-)

It seems gramparse.h isn't installed now? In any case, here's a patch
to move gramparse to the backend dir and stop symlinking/installing
gram.h. Confusingly, MSVC didn't seem to copy gram.h to src/include,
so I'm not yet sure how it still managed to build...

-- 
John Naylor
EDB: http://www.enterprisedb.com
From ff89180d7b0fe4c95a470a657ee465c5384d6ecc Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Fri, 9 Sep 2022 11:57:39 +0700
Subject: [PATCH v1] Move gramparse.h to src/backend/parser

This allows removal of makefile rules that symlink gram.h to
src/include/parser. While at it, stop installing gram.h. This
seems unnecessary at present time.

Idea from Andres Freund and Peter Eisentraut
Discussion: https://www.postgresql.org/message-id/20220904181759.px6uosll6zbxcum5%40awork3.anarazel.de
---
 src/backend/Makefile| 7 +--
 src/backend/parser/gram.y   | 2 +-
 src/{include => backend}/parser/gramparse.h | 4 ++--
 src/backend/parser/parser.c | 2 +-
 src/backend/parser/scan.l   | 2 +-
 src/include/Makefile| 4 ++--
 src/include/parser/.gitignore   | 1 -
 src/tools/msvc/Install.pm   | 4 
 src/tools/pginclude/cpluspluscheck  | 1 -
 src/tools/pginclude/headerscheck| 1 -
 10 files changed, 8 insertions(+), 20 deletions(-)
 rename src/{include => backend}/parser/gramparse.h (97%)
 delete mode 100644 src/include/parser/.gitignore

diff --git a/src/backend/Makefile b/src/backend/Makefile
index 5c4772298d..42cc3bda1d 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -153,12 +153,7 @@ submake-utils-headers:
 
 .PHONY: generated-headers
 
-generated-headers: $(top_builddir)/src/include/parser/gram.h $(top_builddir)/src/include/storage/lwlocknames.h submake-catalog-headers submake-nodes-headers submake-utils-headers
-
-$(top_builddir)/src/include/parser/gram.h: parser/gram.h
-	prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
-	  cd '$(dir $@)' && rm -f $(notdir $@) && \
-	  $(LN_S) "$$prereqdir/$(notdir $<)" .
+generated-headers: $(top_builddir)/src/include/storage/lwlocknames.h submake-catalog-headers submake-nodes-headers submake-utils-headers
 
 $(top_builddir)/src/include/storage/lwlocknames.h: storage/lmgr/lwlocknames.h
 	prereqdir=`cd '$(dir $<)' >/dev/null && pwd` && \
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0492ff9a66..668ad3fa8e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -55,9 +55,9 @@
 #include "catalog/pg_trigger.h"
 #include "commands/defrem.h"
 #include "commands/trigger.h"
+#include "gramparse.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
-#include "parser/gramparse.h"
 #include "parser/parser.h"
 #include "storage/lmgr.h"
 #include "utils/date.h"
diff --git a/src/include/parser/gramparse.h b/src/backend/parser/gramparse.h
similarity index 97%
rename from src/include/parser/gramparse.h
rename to src/backend/parser/gramparse.h
index 41b753a96c..c4726c618d 100644
--- a/src/include/parser/gramparse.h
+++ b/src/backend/parser/gramparse.h
@@ -11,7 +11,7 @@
  * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * src/include/parser/gramparse.h
+ * src/backend/parser/gramparse.h
  *
  *-
  */
@@ -26,7 +26,7 @@
  * NB: include gram.h only AFTER including scanner.h, because scanner.h
  * is what #defines YYLTYPE.
  */
-#include "parser/gram.h"
+#include "gram.h"
 
 /*
  * The YY_EXTRA data that a flex scanner allows us to pass around.  Private
diff --git a/src/backend/parser/parser.c b/src/backend/parser/parser.c
index 50227cc098..ef85d3bb68 100644
--- a/src/backend/parser/parser.c
+++ b/src/backend/parser/parser.c
@@ -22,7 +22,7 @@
 #include "postgres.h"
 
 #include "mb/pg_wchar.h"
-#include "parser/gramparse.h"
+#include "gramparse.h"
 #include "parser/parser.h"

Re: Minimum bison and flex versions

2022-09-08 Thread John Naylor
On Fri, Sep 9, 2022 at 12:07 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-09-06 13:32:32 +0700, John Naylor wrote:
> > Here are autoconf-only patches to that effect.
>
> Looks like you did actually include src/tools/msvc as well :)

Ah, in my head I meant "no patches for the Meson branch". :-)

CI fails on MSVC pg_upgrade, but that shows up in some existing CF bot
failures and in any case doesn't have a grammar, so I have pushed,
thanks for looking!

> > Since the retirement of some older buildfarm members, the oldest Bison
> > that gets regular testing is 2.3. MacOS ships that version, and will
> > continue doing so for the forseeable future because of Apple's policy
> > regarding GPLv3. While Mac users could use a package manager to install
> > a newer version, there is no compelling reason to do so at this time.
>
> s/to do so/to force them to do so/?

There are good reasons for a dev to install a newer Bison, like better
diagnostics, so I used this language.

--
John Naylor
EDB: http://www.enterprisedb.com




Re: Cleaning up historical portability baggage

2022-09-15 Thread John Naylor
On Thu, Sep 15, 2022 at 3:11 PM Ibrar Ahmed  wrote:
> The patch does not apply successfully; please rebase the patch.

There's a good reason for that -- the latest one was committed two
weeks ago. The status should still be waiting on author, though,
namely for:

On Fri, Aug 26, 2022 at 5:28 AM Thomas Munro  wrote:
> Remaining things from this thread:
>  * removing --disable-thread-safety
>  * removing those vestigial HAVE_XXX macros (one by one analysis and patches)
>  * making Unix sockets secure for Windows in tests

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Support for Rust

2022-09-19 Thread John Naylor
On Mon, Sep 12, 2022 at 10:29 PM Tom Lane  wrote:
>
> Lev Kokotov  writes:
> > I took a small part of Postgres to get started, so just as a PoC; it
> > compiles and runs though. Larger parts will take more work (deleting
code,
> > not just swapping object files), and more fancy things like PG_TRY() and
> > friends will have to be rewritten, so not a short and easy migration.
>
> Yeah, that's what I thought.  "Allow some parts to be written in
> language X" soon turns into "Rewrite the entire system in language X,
> including fundamental rethinking of memory management, error handling,
> and some other things".  That's pretty much a non-starter.

Added "Rewrite the code in a different language" to "Features We Do Not
Want" section of Wiki, referencing the two threads that came up:

https://wiki.postgresql.org/wiki/Todo#Features_We_Do_Not_Want

-- 
John Naylor
EDB: http://www.enterprisedb.com


Re: Typo in xact.c

2022-09-18 Thread John Naylor
On Fri, Sep 16, 2022 at 10:51 AM John Naylor
 wrote:
>
> On Fri, Sep 16, 2022 at 10:11 AM Kyotaro Horiguchi
>  wrote:
> >
> > The patch seems to me to cover all occurrences of PG_PROC as PGPROC.
>
> +1 since this hinders grep-ability.

Pushed this.

> > I found several uses of PG_PROC as (pg_catalog.)pg_proc, which is
> > quite confusing, too..
>
> It's pretty obvious to me what that refers to in primnodes.h, although
> the capitalization of (some, but not all) catalog names in comments in
> that file is a bit strange. Maybe not worth changing there.

I left this alone. It's not wrong, and I don't think it's confusing in context.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-09-22 Thread John Naylor
On Thu, Sep 22, 2022 at 7:52 PM John Naylor 
wrote:
>
>
> On Thu, Sep 22, 2022 at 1:26 PM Masahiko Sawada 
wrote:
> > Good point. While keeping the chunks in the small nodes in sorted
> > order is useful for visiting all keys in sorted order, additional
> > branches and memmove calls could be slow.
>
> Right, the ordering is a property that some users will need, so best to
keep it. Although the node128 doesn't have that property -- too slow to do
so, I think.

Nevermind, I must have been mixing up keys and values there...

--
John Naylor
EDB: http://www.enterprisedb.com


Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-09-22 Thread John Naylor
On Thu, Sep 22, 2022 at 11:46 AM John Naylor 
wrote:
> One thing I want to try soon is storing fewer than 16/32 etc entries, so
that the whole node fits comfortably inside a power-of-two allocation. That
would allow us to use aset without wasting space for the smaller nodes,
which would be faster and possibly would solve the fragmentation problem
Andres referred to in

>
https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de

While calculating node sizes that fit within a power-of-two size, I noticed
the current base node is a bit wasteful, taking up 8 bytes. The node kind
only has a small number of values, so it doesn't really make sense to use
an enum here in the struct (in fact, Andres' prototype used a uint8 for
node_kind). We could use a bitfield for the count and kind:

uint16 -- kind and count bitfield
uint8 shift;
uint8 chunk;

That's only 4 bytes. Plus, if the kind is ever encoded in a pointer tag,
the bitfield can just go back to being count only.

Here are the v6 node kinds:

node4:   8 +   4 +(4)+   4*8 =   48 bytes
node16:  8 +  16 +  16*8 =  152
node32:  8 +  32 +  32*8 =  296
node128: 8 + 256 + 128/8 + 128*8 = 1304
node256: 8   + 256/8 + 256*8 = 2088

And here are the possible ways we could optimize nodes for space using aset
allocation. Parentheses are padding bytes. Even if my math has mistakes,
the numbers shouldn't be too far off:

node3:   4 +   3 +(1)+   3*8 =   32 bytes
node6:   4 +   6 +(6)+   6*8 =   64
node13:  4 +  13 +(7)+  13*8 =  128
node28:  4 +  28 +  28*8 =  256
node31:  4 + 256 +  32/8 +  31*8 =  512 (XXX not good)
node94:  4 + 256 +  96/8 +  94*8 = 1024
node220: 4 + 256 + 224/8 + 220*8 = 2048
node256: = 4096

The main disadvantage is that node256 would balloon in size.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-09-22 Thread John Naylor
On Thu, Sep 22, 2022 at 1:26 PM Masahiko Sawada 
wrote:
>
> On Thu, Sep 22, 2022 at 1:46 PM John Naylor
>  wrote:
> > While on the subject, I wonder how important it is to keep the chunks
in the small nodes in sorted order. That adds branches and memmove calls,
and is the whole reason for the recent "pg_lfind_ge" function.
>
> Good point. While keeping the chunks in the small nodes in sorted
> order is useful for visiting all keys in sorted order, additional
> branches and memmove calls could be slow.

Right, the ordering is a property that some users will need, so best to
keep it. Although the node128 doesn't have that property -- too slow to do
so, I think.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: [RFC] building postgres with meson - v13

2022-09-26 Thread John Naylor
On Wed, Sep 21, 2022 at 7:11 AM Andres Freund  wrote:
>
> I added installation instructions for meson for a bunch of platforms, but

A couple more things for the wiki:

1) /opt/homebrew/ seems to be an "Apple silicon" path? Either way it
doesn't exist on this machine. I was able to get a working build with

/usr/local/Homebrew/Library/Homebrew/os/mac/pkgconfig

(My homebrew install doesn't seem to have anything relevant for
extra_include_dirs or extra_lib_dirs.)

2) Also, "ninja -v install" has the same line count as "ninja install" --
are there versions that do something different?

--
John Naylor
EDB: http://www.enterprisedb.com


Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-09-16 Thread John Naylor
On Fri, Sep 16, 2022 at 1:01 PM Masahiko Sawada  wrote:
>
> On Mon, Aug 15, 2022 at 10:39 PM John Naylor
>  wrote:
> >
> > bool, both = and <=. Should be pretty close. Also, I believe if you
> > left this for last as a possible refactoring, it might save some work.

v6 demonstrates why this should have been put off towards the end. (more below)

> > In any case, I'll take a look at the latest patch next month.

Since the CF entry said "Needs Review", I began looking at v5 again
this week. Hopefully not too much has changed, but in the future I
strongly recommend setting to "Waiting on Author" if a new version is
forthcoming. I realize many here share updated patches at any time,
but I'd like to discourage the practice especially for large patches.

> I've updated the radix tree patch. It's now separated into two patches.
>
> 0001 patch introduces pg_lsearch8() and pg_lsearch8_ge() (we may find
> better names) that are similar to the pg_lfind8() family but they
> return the index of the key in the vector instead of true/false. The
> patch includes regression tests.

I don't want to do a full review of this just yet, but I'll just point
out some problems from a quick glance.

+/*
+ * Return the index of the first element in the vector that is greater than
+ * or eual to the given scalar. Return sizeof(Vector8) if there is no such
+ * element.

That's a bizarre API to indicate non-existence.

+ *
+ * Note that this function assumes the elements in the vector are sorted.
+ */

That is *completely* unacceptable for a general-purpose function.

+#else /* USE_NO_SIMD */
+ Vector8 r = 0;
+ uint8 *rp = (uint8 *) &r;
+
+ for (Size i = 0; i < sizeof(Vector8); i++)
+ rp[i] = (((const uint8 *) &v1)[i] == ((const uint8 *) &v2)[i]) ? 0xFF : 0;

I don't think we should try to force the non-simd case to adopt the
special semantics of vector comparisons. It's much easier to just use
the same logic as the assert builds.

+#ifdef USE_SSE2
+ return (uint32) _mm_movemask_epi8(v);
+#elif defined(USE_NEON)
+ static const uint8 mask[16] = {
+1 << 0, 1 << 1, 1 << 2, 1 << 3,
+1 << 4, 1 << 5, 1 << 6, 1 << 7,
+1 << 0, 1 << 1, 1 << 2, 1 << 3,
+1 << 4, 1 << 5, 1 << 6, 1 << 7,
+  };
+
+uint8x16_t masked = vandq_u8(vld1q_u8(mask), (uint8x16_t)
vshrq_n_s8(v, 7));
+uint8x16_t maskedhi = vextq_u8(masked, masked, 8);
+
+return (uint32) vaddvq_u16((uint16x8_t) vzip1q_u8(masked, maskedhi));

For Arm, we need to be careful here. This article goes into a lot of
detail for this situation:

https://community.arm.com/arm-community-blogs/b/infrastructure-solutions-blog/posts/porting-x86-vector-bitmask-optimizations-to-arm-neon

Here again, I'd rather put this off and focus on getting the "large
details" in good enough shape so we can got towards integrating with
vacuum.

> In addition to two patches, I've attached the third patch. It's not
> part of radix tree implementation but introduces a contrib module
> bench_radix_tree, a tool for radix tree performance benchmarking. It
> measures loading and lookup performance of both the radix tree and a
> flat array.

Excellent! This was high on my wish list.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Typo in xact.c

2022-09-15 Thread John Naylor
On Fri, Sep 16, 2022 at 10:11 AM Kyotaro Horiguchi
 wrote:
>
> At Thu, 15 Sep 2022 22:38:01 +0800, Japin Li  wrote in
> >
> > Hi hacker,
> >
> > Recently, I find there might be a typo in xact.c comments.  The comments
> > say "PG_PROC", however, it actually means "PGPROC" structure.  Since we
> > have pg_proc catalog, and use PG_PROC to reference the catalog [1], so,
> > we should use PGPROC to reference the structure.  Any thoughts?
> >
> > [1] src/include/nodes/primnodes.h
>
> The patch seems to me to cover all occurrences of PG_PROC as PGPROC.

+1 since this hinders grep-ability.

> I found several uses of PG_PROC as (pg_catalog.)pg_proc, which is
> quite confusing, too..

It's pretty obvious to me what that refers to in primnodes.h, although
the capitalization of (some, but not all) catalog names in comments in
that file is a bit strange. Maybe not worth changing there.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Inconsistencies in error messages

2022-09-15 Thread John Naylor
On Wed, Sep 14, 2022 at 5:25 PM John Naylor
 wrote:
> Will commit this way unless there are objections.

I forgot to mention yesterday, but this is done.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: New strategies for freezing, advancing relfrozenxid early

2022-09-15 Thread John Naylor
On Wed, Sep 14, 2022 at 11:33 PM Peter Geoghegan  wrote:
>
> On Wed, Sep 14, 2022 at 3:18 AM John Naylor

> > Furthermore, it doesn't have to anticipate the maximum size, so there
> > is no up front calculation assuming max-tuples-per-page, so it
> > automatically uses less memory for less demanding tables.
>
> The final number of TIDs doesn't seem like the most interesting
> information that VM snapshots could provide us when it comes to
> building the dead_items TID data structure -- the *distribution* of
> TIDs across heap pages seems much more interesting. The "shape" can be
> known ahead of time, at least to some degree. It can help with
> compression, which will reduce cache misses.

My point here was simply that spilling to disk is an admission of
failure to utilize memory efficiently and thus shouldn't be a selling
point of VM snapshots. Other selling points could still be valid.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: [PATCH] Clarify the comments about varlena header encoding

2022-09-07 Thread John Naylor
On Tue, Sep 6, 2022 at 5:19 PM Aleksander Alekseev
 wrote:
>
> Hi John,
>
> Thanks for the feedback.
>
> > Maybe "00xx 4-byte length word (aligned)," is more clear about
> > what is aligned. Also, adding all those xxx's obscures the point that
> > we only need to examine one byte to figure out what to do next.
>
> IMO "00xx 4-byte length word" is still confusing. One can misread
> this as a 00-xx-xx-xx hex value, where the first byte (not two bits)
> is 00h.

The top of the comment literally says

 * Bit layouts for varlena headers on big-endian machines:

...but maybe we can say at the top that we inspect the first byte to
determine what kind of header it is. Or put the now-standard 0b in
front.

> > The patch has:
> >
> > + * In the third case the va_tag field (see varattrib_1b_e) is used to 
> > discern
> > + * the specific type and length of the pointer datum. On disk the "xxx" 
> > bits
> > + * currently always store sizeof(varatt_external) + 2.
> >
> > ...so not sure where 17 came from.
>
> Right, AFTER applying the patch it's clear that it's actually 18
> bytes.

Okay, I see now that this quote from your first email:

"This is misleading too. The comments above this line say that `struct
varatt_external` is a TOAST pointer. sizeof(varatt_external) = 16,
plus 1 byte equals 17, right? However the documentation [1] claims the
result should be 18:"

...is not your thought, but one of a fictional misled reader. I
actually found this phrase more misleading than the header comments.
:-)

I think the problem is ambiguity about what a "toast pointer" is. This comment:

 * struct varatt_external is a traditional "TOAST pointer", that is, the

has caused people to think a toasted value in the main relation takes
up 16 bytes on disk (sizeof(varatt_external) = 16), when it's actually
18. Is the 16 the "toast pointer" or the 18?

> > - * 1000 1-byte length word, unaligned, TOAST pointer
> > + * 1000 , TOAST pointer (struct varatt_external)
> >
> > This implies that the header is two bytes, which is not accurate. That
> > next byte is a type tag:
> > [...]
> > ...and does not always represent the on-disk length:
>
> Well, the comments don't say what is the header and what is the type
> tag.

Because the comments explain the following macros that read bits in
the *first* byte of a 1- or 4-byte header to determine what kind it
is.

> They merely describe the bit layouts. The patch doesn't seem to
> make things worse in this respect. Do you think we should address this
> too? I suspect that describing the difference between the header and
> the type tag here will create even more confusion.

I said nothing about describing the difference between the header and
type tag. The patch added xxx's for the type tag in a comment about
the header. This is more misleading than what is there now.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: failing to build preproc.c on solaris with sun studio

2022-09-07 Thread John Naylor
On Wed, Sep 7, 2022 at 1:45 PM Peter Eisentraut
 wrote:
>
> On 05.09.22 23:34, Tom Lane wrote:
> > Peter Eisentraut  writes:
> >> Why is this being proposed?
> >
> > Andres is annoyed by the long build time of ecpg, which he has to
> > wait for whether he wants to test it or not.  I could imagine that
> > I might disable ecpg testing on my slowest buildfarm animals, too.
> >
> > I suppose maybe we could compromise on inventing --with-ecpg but
> > having it default to "on", so that you have to take positive
> > action if you don't want it.
>
> We already have "make all" vs. "make world" to build just the important
> stuff versus everything.  And we have "make world-bin" to build,
> approximately, everything except the slow stuff.  Let's try to work
> within the existing mechanisms.  For example, removing ecpg from "make
> all" might be sensible.
>
> (Obviously, "all" is then increasingly becoming a lie.  Maybe a renaming
> like "all" -> "core" and "world" -> "all" could be in order.)
>
> The approach with the make targets is better than a configure option,
> because it allows you to build a narrow set of things during development
> and then build everything for final confirmation, without having to
> re-run configure.  Also, it's less confusing for packagers.

Another point is that the --with-FOO options are intended for building
and linking with external library FOO.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: proposal: possibility to read dumped table's name from file

2022-09-09 Thread John Naylor
On Thu, Sep 8, 2022 at 7:32 PM Daniel Gustafsson  wrote:
> [v3]

Note that the grammar has shift-reduce conflicts. If you run a fairly
recent Bison, you can show them like this:

bison -Wno-deprecated -Wcounterexamples -d -o filterparse.c filterparse.y

filterparse.y: warning: 2 shift/reduce conflicts [-Wconflicts-sr]
filterparse.y: warning: shift/reduce conflict on token C_INCLUDE
[-Wcounterexamples]
  Example: • C_INCLUDE include_object pattern
  Shift derivation
Filters
↳ 3: Filter
 ↳ 4: • C_INCLUDE include_object pattern
  Reduce derivation
Filters
↳ 2: Filters  Filter
 ↳ 1: ε • ↳ 4: C_INCLUDE include_object pattern
filterparse.y: warning: shift/reduce conflict on token C_EXCLUDE
[-Wcounterexamples]
  Example: • C_EXCLUDE exclude_object pattern
  Shift derivation
Filters
↳ 3: Filter
 ↳ 5: • C_EXCLUDE exclude_object pattern
  Reduce derivation
Filters
↳ 2: Filters  Filter
 ↳ 1: ε • ↳ 5: C_EXCLUDE exclude_object pattern

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: broken table formatting in psql

2022-09-11 Thread John Naylor
On Thu, Sep 8, 2022 at 12:51 PM Pavel Stehule  wrote:
>
>
>> Does anyone want to advocate for backpatching these missing ranges to
>> v12 and up? v12 still has a table in-line so trivial to remedy, but
>> v13 and up use a script, so these exceptions would likely have to use
>> hard-coded branches to keep from bringing in new changes.
>>
>> If so, does anyone want to advocate for including this patch in v15?
>> It claims Unicode 14.0.0, and this would make that claim more
>> technically correct as well as avoiding additional branches.
>
>
> I think it can be fixed just in v15 and master.  This issue has no impact on 
> SQL.

Well, if the regressions from v11 are not important enough to
backpatch, there is not as much of a case to backpatch the full fix to
v15 either.
-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: [PATCH] Clarify the comments about varlena header encoding

2022-09-11 Thread John Naylor
On Sun, Sep 11, 2022 at 5:06 PM Aleksander Alekseev
 wrote:
>
> Hi John,
>
> Many thanks for the feedback!
>
> > Or put the now-standard 0b in front.
>
> Good idea.

Now that I look at the results, though, it's distracting and not good
for readability. I'm not actually sure we need to do anything here,
but I am somewhat in favor of putting [un]aligned in parentheses, as
already discussed. Even there, in the first email you said:

> Also one can read this as "aligned, uncompressed
> data" (which again is wrong).

I'm not sure it rises to the level of "wrong", because a blob of bytes
immediately after an aligned uint32 is in fact aligned. The important
thing is: a zero byte is always either a padding byte or part of a
4-byte header, so it's the alignment of the header we really care
about.

> > I think the problem is ambiguity about what a "toast pointer" is. This 
> > comment:
> >
> > * struct varatt_external is a traditional "TOAST pointer", that is, the
>
> Right. The comment for varatt_external says that it IS a TOAST
> pointer.

Well, the word "traditional" is not very informative, but it is there.
And afterwards there is also varatt_indirect, varatt_expanded, and
varattrib_1b_e, which all mention "TOAST pointer".

> However the comments for varlena headers bit layout
> implicitly include it into a TOAST pointer, which contradicts the
> previous comments. I suggest we fix this ambiguity by explicitly
> enumerating the type tag in the comments for varlena headers.

- * 1000 1-byte length word, unaligned, TOAST pointer
+ * 0b1000 1-byte length word (unaligned), type tag, TOAST pointer

This is distracting from the point of this whole comment, which, I
will say again is: How to look at the first byte to determine what
kind of varlena we're looking at. There is no reason to mention the
type tag here, at all.

- * In TOAST pointers the va_tag field (see varattrib_1b_e) is used to discern
- * the specific type and length of the pointer datum.
+ * For the TOAST pointers the type tag (see varattrib_1b_e.va_tag field) is
+ * used to discern the specific type and length of the pointer datum.

I don't think this clarifies anything, it's just a rephrasing.

More broadly, I think the description of varlenas in this header is at
a kind of "local maximum" -- minor adjustments are more likely to make
it worse. To significantly improve clarity might require a larger
rewriting, but I'm not personally interested in taking part in that.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-09-21 Thread John Naylor
On Thu, Sep 22, 2022 at 1:01 AM Nathan Bossart 
wrote:
>
> On Wed, Sep 21, 2022 at 01:17:21PM +0700, John Naylor wrote:
>
> > In short, this code needs to be lower level so that we still have full
> > control while being portable. I will work on this, and also the related
> > code for node dispatch.
>
> Is it possible to use approach #2 here, too?  AFAICT space is allocated for
> all of the chunks, so there wouldn't be any danger in searching all them
> and discarding any results >= node->count.

Sure, the caller could pass the maximum node capacity, and then check if
the returned index is within the range of the node count.

> Granted, we're depending on the
> number of chunks always being a multiple of elements-per-vector in order to
> avoid the tail path, but that seems like a reasonably safe assumption that
> can be covered with comments.

Actually, we don't need to depend on that at all. When I said "junk" above,
that can be any bytes, as long as we're not reading off the end of
allocated memory. We'll never do that here, since the child pointers/values
follow. In that case, the caller can hard-code the size (it would even
happen to work now to multiply rt_node_kind by 16, to be sneaky). One thing
I want to try soon is storing fewer than 16/32 etc entries, so that the
whole node fits comfortably inside a power-of-two allocation. That would
allow us to use aset without wasting space for the smaller nodes, which
would be faster and possibly would solve the fragmentation problem Andres
referred to in

https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de
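A scalar sketch of the idea above (illustrative only; names are hypothetical and the real code would use the vector paths): live chunks occupy the first "count" slots and everything after is allocated-but-junk, so searching the full capacity is safe as long as hits at or beyond the live count are discarded.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative sketch only: search a node's chunk array over its full
 * capacity (a hard-coded constant in the real code, standing in for the
 * vectorized loop), then discard any match beyond the live count.
 */
static bool
node_search_chunks(const uint8_t *chunks, int capacity, int count,
				   uint8_t key, int *index)
{
	for (int i = 0; i < capacity; i++)
	{
		if (chunks[i] == key)
		{
			if (i >= count)
				return false;	/* matched junk past the live entries */
			*index = i;
			return true;
		}
	}
	return false;
}
```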

While on the subject, I wonder how important it is to keep the chunks in
the small nodes in sorted order. That adds branches and memmove calls, and
is the whole reason for the recent "pg_lfind_ge" function.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: [RFC] building postgres with meson - v13

2022-09-29 Thread John Naylor
On Tue, Sep 27, 2022 at 2:41 PM John Naylor 
wrote:
>
> On Tue, Sep 27, 2022 at 2:06 AM Andres Freund  wrote:
> >
> > On 2022-09-26 15:18:29 +0700, John Naylor wrote:

> > Yea, it's /usr/local on x86-64, based on what was required to make macos CI
> > work. I updated the wiki page, half-blindly - it'd be nice if you could
> > confirm that that works?
>
> Not sure if you intended for me to try the full script in your last
> response or just what's in the wiki page, but for the latter (on commit
> bed0927aeb0c6), it fails at
>
> [1656/2199] Linking target src/bin/psql/psql
> FAILED: src/bin/psql/psql

Per off-list discussion with Andres, the linking failure was caused by some
env variables set in my bash profile for the sake of Homebrew. After
removing those, the recipe in the wiki worked fine.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-10-05 Thread John Naylor
On Wed, Oct 5, 2022 at 1:46 PM Masahiko Sawada 
wrote:
>
> On Wed, Sep 28, 2022 at 12:49 PM Masahiko Sawada 
wrote:
> >
> > On Fri, Sep 23, 2022 at 12:11 AM John Naylor
> >  wrote:
> > Yeah, node31 and node256 are bloated.  We probably could use slab for
> > node256 independently. It's worth trying a benchmark to see how it
> > affects the performance and the tree size.

This wasn't the focus of your current email, but while experimenting with
v6 I had another thought about local allocation: If we use the default slab
block size of 8192 bytes, then only 3 chunks of size 2088 can fit, right?
If so, since aset and DSA also waste at least a few hundred bytes, we could
store a useless 256-byte slot array within node256. That way, node128 and
node256 share the same start of pointers/values array, so there would be
one less branch for getting that address. In v6, rt_node_get_values and
rt_node_get_children are not inlined (asde: gcc uses a jump table for 5
kinds but not for 4), but possibly should be, and the smaller the better.

> Regarding DSA support, IIUC we need to use dsa_pointer in inner nodes
> to point to its child nodes, instead of C pointers (ig, backend-local
> address). I'm thinking of a straightforward approach as the first
> step; inner nodes have a union of rt_node* and dsa_pointer and we
> choose either one based on whether the radix tree is shared or not. We
> allocate and free the shared memory for individual nodes by
> dsa_allocate() and dsa_free(), respectively. Therefore we need to get
> a C pointer from dsa_pointer by using dsa_get_address() while
> descending the tree. I'm a bit concerned that calling
> dsa_get_address() for every descent could be performance overhead but
> I'm going to measure it anyway.

Are dsa pointers aligned the same as pointers to locally allocated memory?
Meaning, is the offset portion always a multiple of 4 (or 8)? It seems that
way from a glance, but I can't say for sure. If the lower 2 bits of a DSA
pointer are never set, we can tag them the same way as a regular pointer.
That same technique could help hide the latency of converting the pointer,
by the same way it would hide the latency of loading parts of a node into
CPU registers.
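A minimal sketch of the tagging scheme (names hypothetical): if node allocations are at least 4-byte aligned, the low two bits of a pointer (or of a dsa_pointer's offset) are always zero and can carry a small tag such as the node kind.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch only: stash a 2-bit tag in an aligned pointer. */
#define NODE_PTR_TAG_MASK	((uintptr_t) 0x3)

static inline uintptr_t
node_ptr_tag(void *p, uintptr_t kind)
{
	assert(((uintptr_t) p & NODE_PTR_TAG_MASK) == 0);	/* requires alignment */
	assert(kind <= NODE_PTR_TAG_MASK);
	return (uintptr_t) p | kind;
}

static inline uintptr_t
node_ptr_kind(uintptr_t tagged)
{
	return tagged & NODE_PTR_TAG_MASK;
}

static inline void *
node_ptr_untag(uintptr_t tagged)
{
	return (void *) (tagged & ~NODE_PTR_TAG_MASK);
}
```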

One concern is, handling both local and dsa cases in the same code requires
more (predictable) branches and reduces code density. That might be a
reason in favor of templating to handle each case in its own translation
unit. But that might be overkill.
--
John Naylor
EDB: http://www.enterprisedb.com


Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-10-06 Thread John Naylor
On Thu, Oct 6, 2022 at 2:53 PM Masahiko Sawada 
wrote:
>
> On Wed, Oct 5, 2022 at 6:40 PM John Naylor 
wrote:
> >
> > This wasn't the focus of your current email, but while experimenting
> > with v6 I had another thought about local allocation: If we use the default
> > slab block size of 8192 bytes, then only 3 chunks of size 2088 can fit,
> > right? If so, since aset and DSA also waste at least a few hundred bytes,
> > we could store a useless 256-byte slot array within node256. That way,
> > node128 and node256 share the same start of pointers/values array, so there
> > would be one less branch for getting that address. In v6,
> > rt_node_get_values and rt_node_get_children are not inlined (aside: gcc uses
> > a jump table for 5 kinds but not for 4), but possibly should be, and the
> > smaller the better.
>
> It would be good for performance but I'm a bit concerned that it's
> highly optimized to the design of aset and DSA. Since size 2088 will
> be currently classed as 2616 in DSA, DSA wastes 528 bytes. However, if
> we introduce a new class of 2304 (=2048 + 256) bytes we cannot store a
> useless 256-byte and the assumption will be broken.

A new DSA class is hypothetical. A better argument against my idea is that
SLAB_DEFAULT_BLOCK_SIZE is arbitrary. FWIW, I looked at the prototype just
now and the slab block sizes are:

Max(pg_nextpower2_32((MAXALIGN(inner_class_info[i].size) + 16) * 32), 1024)

...which would be 128kB for nodemax. I'm curious about the difference.

> > One concern is, handling both local and dsa cases in the same code
> > requires more (predictable) branches and reduces code density. That might
> > be a reason in favor of templating to handle each case in its own
> > translation unit.
>
> Right. We also need to support locking for shared radix tree, which
> would require more branches.

Hmm, now it seems we'll likely want to template local vs. shared as a later
step...

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Reducing the chunk header sizes on all memory context types

2022-10-10 Thread John Naylor
On Tue, Oct 11, 2022 at 5:31 AM David Rowley  wrote:
>
> The proposed patches in [1] do aim to make additional usages of the
> slab allocator, and I have a feeling that we'll want to fix the
> performance of slab.c before those. Perhaps the Asserts are a better
> option if we're to get the proposed radix tree implementation.

Going by [1], that use case is not actually a natural fit for slab because
of memory fragmentation. The motivation to use slab there was that the
allocation sizes are just over a power of two, leading to a lot of wasted
space for aset. FWIW, I have proposed in that thread a scheme to squeeze
things into power-of-two sizes without wasting quite as much space. That's
not a done deal, of course, but it could work today without adding memory
management code.

[1]
https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de

--
John Naylor
EDB: http://www.enterprisedb.com


Re: introduce optimized linear search functions that return index of matching element

2022-10-11 Thread John Naylor
On Sat, Sep 17, 2022 at 12:29 PM Nathan Bossart 
wrote:
>
> On Fri, Sep 16, 2022 at 02:54:14PM +0700, John Naylor wrote:
> > v6 demonstrates why this should have been put off towards the end.
> > > (more below)
>
> Since the SIMD code is fresh in my mind, I wanted to offer my review for
> 0001 in the "Improve dead tuple storage for lazy vacuum" thread [0].
> However, I agree with John that the SIMD part of that work should be left
> for the end

As I mentioned in the radix tree thread, I don't believe this level of
abstraction is appropriate for the intended use case. We'll want to
incorporate some of the low-level simd.h improvements later, so you should
get authorship credit for those.  I've marked the entry "returned with
feedback".

--
John Naylor
EDB: http://www.enterprisedb.com


Re: meson PGXS compatibility

2022-10-12 Thread John Naylor
On Wed, Oct 12, 2022 at 12:50 PM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:
>
> On 05.10.22 22:07, Andres Freund wrote:
> > 0001: autoconf: Unify CFLAGS_SSE42 and CFLAGS_ARMV8_CRC32C
>
> I wonder, there has been some work lately to use SIMD and such in other
> places.  Would that affect what kinds of extra CPU-specific compiler
> flags and combinations we might need?

In short, no. The functionality added during this cycle only uses
instructions available by default on the platform. CRC on Arm depends on
armv8-a+crc, and CRC on x86-64 depends on SSE4.2. We can't assume those
currently.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-10-09 Thread John Naylor
of, we can discuss further. I
don't pretend to know the practical consequences of every change I mention.

- If you have started coding the shared memory case, I'd advise to continue
so we can see what that looks like. If that has not gotten beyond the
design stage, I'd like to first see an attempt at tearing down some of the
clumsier abstractions in the current patch.
- As a "smoke test", there should ideally be nothing as general as
rt_node_get_children/values(). We should ideally always know what kind we
are if we found out earlier.
- For distinguishing between linear nodes, perhaps some always-inline
functions can help hide details. But at the same time, trying to treat them
the same is not always worthwhile.
- Start to separate treatment of inner/leaves and see how it goes.
- I firmly believe we only need 4 node *kinds*, and later we can decouple
the size classes as a separate concept. I'm willing to put serious time
into that once the broad details are right. I will also investigate pointer
tagging if we can confirm that can work similarly for dsa pointers.

Regarding size class decoupling, I'll respond to a point made earlier:

On Fri, Sep 30, 2022 at 10:47 PM Masahiko Sawada 
wrote:
> With this idea, we can just repalloc() to grow to the larger size in a
> pair but I'm slightly concerned that the more size class we use, the
> more frequent the node needs to grow.

Well, yes, but that's orthogonal. For example, v6 has 5 node kinds. Imagine
that we have 4 node kinds, but the SIMD node kind used 2 size classes. Then
the nodes would grow at *exactly* the same frequency as they do today. I
listed many ways a size class could fit into a power-of-two (and there are
more), but we have a choice in how many to actually use. It's a trade off
between memory usage and complexity.

> If we want to support node
> shrink, the deletion is also affected.

Not necessarily. We don't have to shrink at the same granularity as
growing. My evidence is simple: we don't shrink at all now. :-)

--
John Naylor
EDB: http://www.enterprisedb.com


Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-10-09 Thread John Naylor
On Mon, Oct 10, 2022 at 12:16 PM John Naylor 
wrote:
> Thanks for that! Now I can show clear results on some aspects in a simple
> way. The attached patches (apply on top of v6)

Forgot the patchset...

--
John Naylor
EDB: http://www.enterprisedb.com


radix-v6-addendum-jcn1.tar.gz
Description: application/gzip


Re: Remove unnecessary commas for goto labels

2022-10-10 Thread John Naylor
On Sun, Oct 9, 2022 at 10:08 AM Julien Rouhaud  wrote:
>
> Hi,
>
> On Sun, Oct 09, 2022 at 07:42:58AM +0800, Japin Li wrote:
> >
> > Hi hackers,
> >
> > I find there are some unnecessary commas for goto lables,
> > attached a patch to remove them.
>
> You mean semi-colon?  +1, and a quick regex later I don't see any other
> occurrence.

Interestingly, I did find that in C, some statement is needed after a
label, even an empty one, otherwise it won't compile. That's not the case
here, though, so pushed.
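For illustration, this minimal function (a made-up example, not from the patch) only compiles because of the empty statement after the label:

```c
/*
 * In C (before C23), a label must be followed by a statement, so a
 * label just before the closing brace needs an empty statement --
 * removing the ";" after "done:" is a compile error.
 */
static void
do_work(int skip, int *steps)
{
	if (skip)
		goto done;
	(*steps)++;
done: ;		/* empty statement: a label cannot directly precede "}" */
}
```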

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Insertion Sort Improvements

2022-09-27 Thread John Naylor
On Tue, Sep 27, 2022 at 4:39 PM Benjamin Coutu  wrote:
>
> Getting back to improvements for small sort runs, it might make sense to
> consider using in-register based sorting via sorting networks for very
> small runs.

> It looks like some of the commercial DBMSs do this very successfully.

"Looks like"?

> They use 4 512bit registers (AVX-512) in this example, which could
> technically store 4 * 4 sort-elements (given that the sorting key is 64 bit
> and the tuple pointer is 64 bit). I wonder whether this could also be done
> with just SSE (instead of AVX), which the project now readily supports
> thanks to your recent efforts?

SortTuples are currently 24 bytes, and supported vector registers are 16
bytes, so not sure how you think that would work.

More broadly, the more invasive a change is, the more risk is involved, and
the more effort to test and evaluate. If you're serious about trying to
improve insertion sort performance, the simple idea we discussed at the
start of the thread is a much more modest step that has a good chance of
justifying the time put into it. That is not to say it's easy, however,
because testing is a non-trivial amount of work.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-09-28 Thread John Naylor
On Wed, Sep 28, 2022 at 10:49 AM Masahiko Sawada 
wrote:

> BTW We need to consider not only aset/slab but also DSA since we
> allocate dead tuple TIDs on DSM in parallel vacuum cases. FYI DSA uses
> the following size classes:
>
> static const uint16 dsa_size_classes[] = {
> [...]

Thanks for that info -- I wasn't familiar with the details of DSA. For the
non-parallel case, I plan to at least benchmark using aset because I gather
it's the most heavily optimized. I'm thinking that will allow other problem
areas to be more prominent. I'll also want to compare total context size
compared to slab to see if possibly less fragmentation makes up for other
wastage.

Along those lines, one thing I've been thinking about is the number of size
classes. There is a tradeoff between memory efficiency and number of
branches when searching/inserting. My current thinking is there is too much
coupling between size class and data type. Each size class currently uses a
different data type and a different algorithm to search and set it, which
in turn requires another branch. We've found that a larger number of size
classes leads to poor branch prediction [1] and (I imagine) code density.

I'm thinking we can use "flexible array members" for the values/pointers,
and keep the rest of the control data in the struct the same. That way, we
never have more than 4 actual "kinds" to code and branch on. As a bonus,
when migrating a node to a larger size class of the same kind, we can
simply repalloc() to the next size. To show what I mean, consider this new
table:

node2:   5 +  6   +(5)+  2*8 =   32 bytes
node6:   5 +  6   +(5)+  6*8 =   64

node12:  5 + 27   + 12*8 =  128
node27:  5 + 27   + 27*8 =  248(->256)

node91:  5 + 256 + 28 +(7)+ 91*8 = 1024
node219: 5 + 256 + 28 +(7)+219*8 = 2048

node256: 5 + 32   +(3)+256*8 = 2088(->4096)

Seven size classes are grouped into the four kinds.

The common base at the front is here 5 bytes because there is a new uint8
field for "capacity", which we can ignore for node256 since we assume we
can always insert/update that node. The control data is the same in each
pair, and so the offset to the pointer/value array is the same. Thus,
migration would look something like:

case FOO_KIND:
if (unlikely(count == capacity))
{
  if (capacity == XYZ) /* for smaller size class of the pair */
  {
;
capacity = next-higher-capacity;
goto do_insert;
  }
  else
;
}
else
{
do_insert:
  <...>;
  break;
}
/* FALLTHROUGH */
...

One disadvantage is that this wastes some space by reserving the full set
of control data in the smaller size class of the pair, but it's usually
small compared to array size. Somewhat unrelated, we could still implement
Andres' idea [1] to dispense with the isset array in inner nodes of the
indirect array type (now node128), since we can just test if the pointer is
null.

[1]
https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de
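A sketch of one "kind" spanning two size classes, with hypothetical names (the children array is omitted for brevity, and realloc stands in for repalloc): because the control data is identical and only the flexible array's capacity differs, growing within the kind is a plain reallocation to the next class size.

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative sketch only: a linear node with a flexible array member. */
typedef struct NodeLinear
{
	uint8_t		kind;
	uint8_t		count;
	uint8_t		capacity;
	uint8_t		chunks[];		/* flexible array, sized by the class */
} NodeLinear;

static NodeLinear *
node_linear_alloc(uint8_t capacity)
{
	NodeLinear *node = calloc(1, sizeof(NodeLinear) + capacity);

	if (node != NULL)
		node->capacity = capacity;
	return node;
}

static NodeLinear *
node_linear_grow(NodeLinear *node, uint8_t new_capacity)
{
	/* control data and existing chunks are preserved by realloc */
	NodeLinear *grown = realloc(node, sizeof(NodeLinear) + new_capacity);

	if (grown != NULL)
		grown->capacity = new_capacity;
	return grown;
}
```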

--
John Naylor
EDB: http://www.enterprisedb.com


Re: [PoC] Improve dead tuple storage for lazy vacuum

2022-10-06 Thread John Naylor
On Fri, Sep 16, 2022 at 1:01 PM Masahiko Sawada 
wrote:
> In addition to two patches, I've attached the third patch. It's not
> part of radix tree implementation but introduces a contrib module
> bench_radix_tree, a tool for radix tree performance benchmarking. It
> measures loading and lookup performance of both the radix tree and a
> flat array.

Hi Masahiko, I've been using these benchmarks, along with my own
variations, to try various things that I've mentioned. I'm long overdue for
an update, but the picture is not yet complete.

For now, I have two questions that I can't figure out on my own:

1. There seems to be some non-obvious limit on the number of keys that are
loaded (or at least what the numbers report). This is independent of the
number of tids per block. Example below:

john=# select * from bench_shuffle_search(0, 8*1000*1000);
NOTICE:  num_keys = 800, height = 3, n4 = 0, n16 = 1, n32 = 0, n128 = 25, n256 = 981
  nkeys  | rt_mem_allocated | array_mem_allocated | rt_load_ms | array_load_ms | rt_search_ms | array_serach_ms
---------+------------------+---------------------+------------+---------------+--------------+-----------------
     800 |        268435456 |                4800 |        661 |            29 |          276 |             389

john=# select * from bench_shuffle_search(0, 9*1000*1000);
NOTICE:  num_keys = 8388608, height = 3, n4 = 0, n16 = 1, n32 = 0, n128 = 262144, n256 = 1028
  nkeys  | rt_mem_allocated | array_mem_allocated | rt_load_ms | array_load_ms | rt_search_ms | array_serach_ms
---------+------------------+---------------------+------------+---------------+--------------+-----------------
 8388608 |        276824064 |                5400 |        718 |            33 |          311 |             446

The array is the right size, but nkeys hasn't kept pace. Can you reproduce
this? Attached is the patch I'm using to show the stats when running the
test. (Side note: The numbers look unfavorable for radix tree because I'm
using 1 tid per block here.)

2. I found that bench_shuffle_search() is much *faster* for traditional
binary search on an array than bench_seq_search(). I've found this to be
true in every case. This seems counterintuitive to me -- any idea why this
is? Example:

john=# select * from bench_seq_search(0, 100);
NOTICE:  num_keys = 100, height = 2, n4 = 0, n16 = 0, n32 = 31251, n128 = 1, n256 = 122
  nkeys  | rt_mem_allocated | array_mem_allocated | rt_load_ms | array_load_ms | rt_search_ms | array_serach_ms
---------+------------------+---------------------+------------+---------------+--------------+-----------------
     100 |         10199040 |               18000 |        168 |           106 |          827 |            3348

john=# select * from bench_shuffle_search(0, 100);
NOTICE:  num_keys = 100, height = 2, n4 = 0, n16 = 0, n32 = 31251, n128 = 1, n256 = 122
  nkeys  | rt_mem_allocated | array_mem_allocated | rt_load_ms | array_load_ms | rt_search_ms | array_serach_ms
---------+------------------+---------------------+------------+---------------+--------------+-----------------
     100 |         10199040 |               18000 |        171 |           107 |          827 |            1400

--
John Naylor
EDB: http://www.enterprisedb.com
From 43a50a385930ee340d0a3b003910c704a0ff342c Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Thu, 6 Oct 2022 09:07:41 +0700
Subject: [PATCH v65 1/5] Turn on per-node counts in benchmark

Also add gitignore, fix whitespace, and change to NOTICE
---
 contrib/bench_radix_tree/.gitignore | 3 +++
 contrib/bench_radix_tree/bench_radix_tree.c | 5 +
 src/backend/lib/radixtree.c | 2 +-
 src/include/lib/radixtree.h | 2 +-
 4 files changed, 10 insertions(+), 2 deletions(-)
 create mode 100644 contrib/bench_radix_tree/.gitignore

diff --git a/contrib/bench_radix_tree/.gitignore b/contrib/bench_radix_tree/.gitignore
new file mode 100644
index 00..8830f5460d
--- /dev/null
+++ b/contrib/bench_radix_tree/.gitignore
@@ -0,0 +1,3 @@
+*data
+log/*
+results/*
diff --git a/contrib/bench_radix_tree/bench_radix_tree.c b/contrib/bench_radix_tree/bench_radix_tree.c
index 5806ef7519..36c5218ae7 100644
--- a/contrib/bench_radix_tree/bench_radix_tree.c
+++ b/contrib/bench_radix_tree/bench_radix_tree.c
@@ -13,6 +13,7 @@
 #include "fmgr.h"
 #include "funcapi.h"
 #include "lib/radixtree.h"
+#include 
 #include "miscadmin.h"
 #include "utils/timestamp.h"
 
@@ -183,6 +184,8 @@ bench_search(FunctionCallInfo fcinfo, bool shuffle)
	TimestampDifference(start_time, end_time, &secs, &usecs);
 	rt_load_ms = secs * 1000 + usecs / 1000;
 
+	rt_stats(rt);
+
 	/* measure the load time of the array */
 	itemptrs = MemoryContextAllocHuge(CurrentMemoryContext,
 	  sizeof(ItemPointerData) * ntids);
@@ -292,6 +295,8 @@ bench_load_random_int(PG_FUNCTION_ARGS)
	TimestampDifference(start_time, end_time, &secs, &usecs);
 	load_time_ms = secs * 1000 + usecs / 1000;
 
+	

Re: fix typos

2022-08-04 Thread John Naylor
On Thu, Aug 4, 2022 at 8:41 PM Tom Lane  wrote:
>
> John Naylor  writes:

> > RepOriginId is a typedef for uint16, so this can't print the wrong answer,
> > but it is inconsistent with other uses. So it seems we don't need to
> > backpatch this one?
>
> Um ... if it's int16, then it can't be an OID, so I'd say this message has
> far worse problems than %d vs %u.  It should not use that terminology.

The catalog has the following. Since it's not a real oid, maybe this column
should be rethought?

CATALOG(pg_replication_origin,6000,ReplicationOriginRelationId)
BKI_SHARED_RELATION
{
/*
* Locally known id that get included into WAL.
*
* This should never leave the system.
*
* Needs to fit into an uint16, so we don't waste too much space in WAL
* records. For this reason we don't use a normal Oid column here, since
* we need to handle allocation of new values manually.
*/
Oid roident;
[...]

--
John Naylor
EDB: http://www.enterprisedb.com


Re: optimize lookups in snapshot [sub]xip arrays

2022-08-04 Thread John Naylor
On Fri, Aug 5, 2022 at 5:15 AM Nathan Bossart 
wrote:
>
> On Thu, Aug 04, 2022 at 02:58:14PM +0700, John Naylor wrote:
> > Were you considering adding the new function to simd.h now that that's
> > committed? It's a bit up in the air what should go in there, but this new
> > function is low-level and generic enough to be a candidate...
>
> I don't have a strong opinion.  I went with a separate file because I
> envisioned a variety of possible linear search functions (e.g., char,
> uint16, uint32), and some might not use SIMD instructions.  Futhermore, it
> seemed less obvious to look in simd.h for linear search functions.

That is a good point. Maybe potential helpers in simd.h should only deal
specifically with vector registers, with its users providing C fallbacks.
I don't have any good ideas of where to put the new function, though.

> > I wonder if the "pg_" prefix is appropriate here, as that is most often
> > used for things that hide specific details *and* where the base name would
> > clash, like OS calls or C library functions. I'm not quite sure where the
> > line is drawn, but I mean that "linearsearch" is a generic algorithm and
> > not a specific API we are implementing, if that makes sense.
>
> Yeah, I was concerned about clashing with lsearch() and lfind().  I will
> drop the prefix.

Hmm, I didn't know about those. lfind() is similar enough that it would
make sense to have pg_lfind32() etc in src/include/port/pg_lsearch.h, at
least for the v4 version that returns the pointer. We already declare
bsearch_arg() in src/include/port.h and that's another kind of array
search. Returning bool is different enough to have a different name.
pg_lfind32_ispresent()?  *_noptr()? Meh.

Having said all that, the man page under BUGS [1] says "The naming is
unfortunate."
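To make the two interface shapes under discussion concrete, here is a minimal portable sketch (plain C, no SIMD; the names are illustrative, not the committed API):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* lfind()-style interface: return a pointer to the first match, or NULL. */
static uint32_t *
lfind32_ptr(uint32_t key, uint32_t *base, size_t nelem)
{
	for (size_t i = 0; i < nelem; i++)
	{
		if (base[i] == key)
			return &base[i];
	}
	return NULL;
}

/* bool-returning variant: only answers "is the key present?" */
static bool
lfind32_ispresent(uint32_t key, const uint32_t *base, size_t nelem)
{
	for (size_t i = 0; i < nelem; i++)
	{
		if (base[i] == key)
			return true;
	}
	return false;
}
```

The pointer-returning form matches lfind()'s contract; the bool form is the one that makes sense for membership tests like snapshot xip lookups, hence the naming question.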

> > Out of curiosity do we know how much we get by loading four registers
> > rather than two?
>
> The small program I've been using for testing takes about 40% more time
> with the two register approach.  The majority of this test involves
> searching for elements that either don't exist in the array or that live
> near the end of the array, so this is probably close to the worst case.

Ok, sounds good.

[1] https://man7.org/linux/man-pages/man3/lsearch.3.html#BUGS
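For reference, the four-register idea looks roughly like this SSE2 sketch (illustrative only, loosely modeled on the approach discussed above; the function name and exact shape are made up here):

```c
#include <emmintrin.h>			/* SSE2 intrinsics */
#include <stdbool.h>
#include <stdint.h>

/* Search a uint32 array using four 16-byte loads per iteration. */
static bool
lfind32_sse2(uint32_t key, const uint32_t *base, uint32_t nelem)
{
	uint32_t	i = 0;

	/* broadcast the key to all four lanes of a vector register */
	const __m128i keys = _mm_set1_epi32((int) key);

	/* 4 registers * 4 lanes = 16 elements per iteration */
	for (; i + 16 <= nelem; i += 16)
	{
		const __m128i v1 = _mm_loadu_si128((const __m128i *) &base[i]);
		const __m128i v2 = _mm_loadu_si128((const __m128i *) &base[i + 4]);
		const __m128i v3 = _mm_loadu_si128((const __m128i *) &base[i + 8]);
		const __m128i v4 = _mm_loadu_si128((const __m128i *) &base[i + 12]);

		/* OR the lane-wise comparison results together... */
		const __m128i cmp =
			_mm_or_si128(_mm_or_si128(_mm_cmpeq_epi32(keys, v1),
									  _mm_cmpeq_epi32(keys, v2)),
						 _mm_or_si128(_mm_cmpeq_epi32(keys, v3),
									  _mm_cmpeq_epi32(keys, v4)));

		/* ...so a single movemask test covers all 16 elements */
		if (_mm_movemask_epi8(cmp) != 0)
			return true;
	}

	/* scalar tail for the remaining 0-15 elements */
	for (; i < nelem; i++)
	{
		if (base[i] == key)
			return true;
	}
	return false;
}
```

Issuing four independent loads and compares per iteration keeps more of the CPU's execution ports busy, which is where the reported 40% difference over the two-register version comes from.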

--
John Naylor
EDB: http://www.enterprisedb.com


use SSE2 for is_valid_ascii

2022-08-10 Thread John Naylor
new thread [was: WIP Patch: Add a function that returns binary JSONB as a bytea]

> I wrote:
> > We can also shave a
> > few percent by having pg_utf8_verifystr use SSE2 for the ascii path. I
> > can look into this.
>
> Here's a patch for that. If the input is mostly ascii, I'd expect that
> part of the flame graph to shrink by 40-50% and give a small boost
> overall.

Here is an updated patch using the new USE_SSE2 symbol. The style is
different from the last one in that each stanza has platform-specific
code. I wanted to try it this way because is_valid_ascii() is already
written in SIMD-ish style using general purpose registers and bit
twiddling, so it seemed natural to see the two side-by-side. Sometimes
they can share the same comment. If we think this is bad for
readability, I can go back to one block each, but that way leads to
duplication of code and it's difficult to see what's different for
each platform, IMO.
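For readers skimming the patch below: the portable path that the SSE2 code sits beside is essentially the classic zero-byte/high-bit bit-twiddling on 8-byte chunks. A simplified standalone sketch of that technique (not the exact wchar.c code):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Returns false if any byte is zero or has its high bit set; len % 8 == 0. */
static bool
ascii_ok_portable(const unsigned char *s, int len)
{
	uint64_t	highbit_cum = 0;
	uint64_t	zero_cum = UINT64_C(0x8080808080808080);
	int			i;

	for (i = 0; i < len; i += 8)
	{
		uint64_t	chunk;

		memcpy(&chunk, s + i, sizeof(chunk));

		/*
		 * Adding 0x7f to each byte sets its high bit unless the byte was
		 * zero, so AND-ing the sum into the accumulator clears a high bit
		 * exactly where a zero byte appeared.  A carry out of a byte can
		 * only happen when some input byte had its high bit set, and such
		 * inputs are rejected by the separate check below anyway.
		 */
		zero_cum &= (chunk + UINT64_C(0x7f7f7f7f7f7f7f7f));

		/* Accumulate any high bits present in the input itself. */
		highbit_cum |= chunk;
	}

	if (highbit_cum & UINT64_C(0x8080808080808080))
		return false;			/* some byte had its high bit set */
	if (zero_cum != UINT64_C(0x8080808080808080))
		return false;			/* some byte was zero */
	return true;
}
```

This is what's meant by "SIMD-ish style using general purpose registers": each 64-bit register is treated as eight byte lanes, so the SSE2 version is a near line-for-line translation to 16-byte lanes.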

--
John Naylor
EDB: http://www.enterprisedb.com
From 69d56a21192ed2f03bc08f078cfff7ba5cb0d80b Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Fri, 5 Aug 2022 13:30:47 +0700
Subject: [PATCH v2] Use SSE2 in is_valid_ascii where available.

This is ~40% faster than portable C on contemporary Intel hardware.
---
 src/common/wchar.c   | 18 +++--
 src/include/mb/pg_wchar.h| 50 ++--
 src/test/regress/expected/conversion.out |  3 +-
 src/test/regress/sql/conversion.sql  |  3 +-
 4 files changed, 57 insertions(+), 17 deletions(-)

diff --git a/src/common/wchar.c b/src/common/wchar.c
index 1e6e198bf2..a305e0e66b 100644
--- a/src/common/wchar.c
+++ b/src/common/wchar.c
@@ -1918,26 +1918,20 @@ pg_utf8_verifystr(const unsigned char *s, int len)
 	const int	orig_len = len;
 	uint32		state = BGN;
 
-/*
- * Sixteen seems to give the best balance of performance across different
- * byte distributions.
- */
-#define STRIDE_LENGTH 16
-
-	if (len >= STRIDE_LENGTH)
+	if (len >= ASCII_CHECK_LEN)
 	{
-		while (len >= STRIDE_LENGTH)
+		while (len >= ASCII_CHECK_LEN)
 		{
 			/*
 			 * If the chunk is all ASCII, we can skip the full UTF-8 check,
 			 * but we must first check for a non-END state, which means the
 			 * previous chunk ended in the middle of a multibyte sequence.
 			 */
-			if (state != END || !is_valid_ascii(s, STRIDE_LENGTH))
-				utf8_advance(s, &state, STRIDE_LENGTH);
+			if (state != END || !is_valid_ascii(s, ASCII_CHECK_LEN))
+				utf8_advance(s, &state, ASCII_CHECK_LEN);
 
-			s += STRIDE_LENGTH;
-			len -= STRIDE_LENGTH;
+			s += ASCII_CHECK_LEN;
+			len -= ASCII_CHECK_LEN;
 		}
 
 		/* The error state persists, so we only need to check for it here. */
diff --git a/src/include/mb/pg_wchar.h b/src/include/mb/pg_wchar.h
index 011b0b3abd..cbb1dfe978 100644
--- a/src/include/mb/pg_wchar.h
+++ b/src/include/mb/pg_wchar.h
@@ -19,6 +19,8 @@
 #ifndef PG_WCHAR_H
 #define PG_WCHAR_H
 
+#include "port/simd.h"
+
 /*
  * The pg_wchar type
  */
@@ -704,25 +706,57 @@ extern WCHAR *pgwin32_message_to_UTF16(const char *str, int len, int *utf16len);
  * Verify a chunk of bytes for valid ASCII.
  *
  * Returns false if the input contains any zero bytes or bytes with the
- * high-bit set. Input len must be a multiple of 8.
+ * high-bit set. Input len must be a multiple of the chunk size (8 or 16).
+ * Since the chunk size is platform-specific, we provide the ASCII_CHECK_LEN
+ * convenience macro for callers to pass for len.
  */
 static inline bool
 is_valid_ascii(const unsigned char *s, int len)
 {
 	const unsigned char *const s_end = s + len;
+
+#ifdef USE_SSE2
+	__m128i		chunk,
+error_cum = _mm_setzero_si128();
+#else
 	uint64		chunk,
 highbit_cum = UINT64CONST(0),
 zero_cum = UINT64CONST(0x8080808080808080);
+#endif
+
+	/*
+	 * With two chunks, gcc can unroll the loop. Even if the compiler can
+	 * unroll a longer loop, it's not worth it because callers might have to
+	 * use a byte-wise algorithm if we return false.
+	 */
+#ifdef USE_SSE2
+#define ASCII_CHECK_LEN (2 * sizeof(__m128i))
+#else
+#define ASCII_CHECK_LEN (2 * sizeof(uint64))
+#endif
 
 	Assert(len % sizeof(chunk) == 0);
 
 	while (s < s_end)
 	{
+#ifdef USE_SSE2
+		chunk = _mm_loadu_si128((const __m128i *) s);
+#else
 		memcpy(&chunk, s, sizeof(chunk));
+#endif
+
+		/* Capture any zero bytes in this chunk. */
+#ifdef USE_SSE2
+
+		/*
+		 * Set all bits in each lane of the error accumulator where input
+		 * bytes are zero.
+		 */
+		error_cum = _mm_or_si128(error_cum,
+ _mm_cmpeq_epi8(chunk, _mm_setzero_si128()));
+#else
 
 		/*
-		 * Capture any zero bytes in this chunk.
-		 *
 		 * First, add 0x7f to each byte. This sets the high bit in each byte,
 		 * unless it was a zero. If any resulting high bits are zero, the
 		 * corresponding high bits in the zero accumulator will be cleared.
@@ -734,13 +768,22 @@ is_valid_ascii(const unsigned char *s, int len)
 		 * because we check for those separately.
 		 */
 		zero

Re: optimize lookups in snapshot [sub]xip arrays

2022-08-09 Thread John Naylor
On Wed, Aug 10, 2022 at 7:13 AM Nathan Bossart  wrote:
>
> On Tue, Aug 09, 2022 at 01:00:37PM -0700, Nathan Bossart wrote:
> > Your adjustments in 0002 seem reasonable to me.  I think it makes sense to
> > ensure there is test coverage for pg_lfind32(), but I don't know if that
> > syscache code is the right choice.  For non-USE_SSE2 builds, it might make
> > these lookups more expensive.

Yeah.

On Wed, Aug 10, 2022 at 9:25 AM Masahiko Sawada  wrote:
> I think that for non-USE_SSE2 builds, there is no additional overhead
> as all assertion-related code in pg_lfind32 depends on USE_SSE2.

Nathan is referring to RelationSupportsSysCache() and
RelationHasSysCache(). They currently use binary search and using
linear search on non-x86-64 platforms is probably slower.

[Nathan again]
> One option might be to create a small test module for pg_lfind32().  Here
> is an example.

LGTM, let's see what the buildfarm thinks of 0001.

--
John Naylor
EDB: http://www.enterprisedb.com



