Hi

As reported in the «effect of JIT tuple deform?» thread, there are for some 
cases slowdowns when using JIT tuple deforming.
I've played with the generated code and with the LLVM optimizer trying to fix 
that issue, here are the results of my experiments, with the corresponding 
patches.

All performance measurements are done following the test from 
https://www.postgresql.org/message-id/CAFj8pRAOcSXNnykfH=M6mNaHo
+g=FaUs=dldzsohdjbkujr...@mail.gmail.com

Base measurements : 

No JIT : 850ms 
JIT without tuple deforming : 820 ms (0.2ms optimizing)
JIT with tuple deforming, no opt : 1650 ms (1.5ms)
JIT with tuple deforming, -O3 : 770 ms (105ms)

1) force a -O1 when deforming

This is by far the best I managed to get. With -O1, the queries are even 
faster than with -O3 since the optimizer is faster, while generating an 
already efficient code.
I have tried adding the right passes to the passmanager, but it looks like the 
interesting ones are not available unless you enable -O1.

JIT with tuple deforming, -O1 : 725 ms (54ms)

2) improve the LLVM IR code

The code generator in llvmjit-deform.c currently rely on the LLVM optimizer to 
do the right thing. For instance, it can generate a lot of empty blocks with 
only a jump. If we don't want to enable the LLVM optimizer for every code, we 
have to get rid of this kind of pattern. The attached patch does that. When 
the optimizer is not used, this gets a few cycles boost, nothing impressive.
I have tried to go closer to the optimized bitcode, but it requires building 
phi nodes manually instead of using alloca, and this isn't enough to bring us 
to the performance level of -O1.

JIT with tuple deforming, no opt : 1560 ms (1.5ms)

3) *experimental* : faster non-NULL handling

Currently, the generated code always look at the tuple header bitfield to 
check each field null-ness, using afterwards an and against the hasnulls bit.
Checking only for hasnulls improves performance when there are mostly null-
less tuples, but taxes the performance when nulls are found.
I have not yet suceeded in implementing it, but I think that using the 
statistics collected for a given table, we could use that when we know that we 
may benefit from it.

JIT with tuple deforming, no opt : 1520 ms (1.5ms)
JIT with tuple deforming, -O1 : 690 ms (54ms)
>From 14a226107f845454676a2e14ae0fb843a5b4f668 Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet <p.p...@pinaraf.info>
Date: Wed, 11 Jul 2018 23:41:59 +0200
Subject: [PATCH 1/2] Check for the hasnulls attribute before checking
 individual fields

---
 src/backend/jit/llvm/llvmjit_deform.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit_deform.c b/src/backend/jit/llvm/llvmjit_deform.c
index 795f67114e..c53855eb63 100644
--- a/src/backend/jit/llvm/llvmjit_deform.c
+++ b/src/backend/jit/llvm/llvmjit_deform.c
@@ -48,6 +48,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 	LLVMBasicBlockRef b_out;
 	LLVMBasicBlockRef b_dead;
 	LLVMBasicBlockRef *attcheckattnoblocks;
+	LLVMBasicBlockRef *attfaststartblocks;
 	LLVMBasicBlockRef *attstartblocks;
 	LLVMBasicBlockRef *attisnullblocks;
 	LLVMBasicBlockRef *attcheckalignblocks;
@@ -145,6 +146,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 	b = LLVMCreateBuilder();
 
 	attcheckattnoblocks = palloc(sizeof(LLVMBasicBlockRef) * natts);
+	attfaststartblocks = palloc(sizeof(LLVMBasicBlockRef) * natts);
 	attstartblocks = palloc(sizeof(LLVMBasicBlockRef) * natts);
 	attisnullblocks = palloc(sizeof(LLVMBasicBlockRef) * natts);
 	attcheckalignblocks = palloc(sizeof(LLVMBasicBlockRef) * natts);
@@ -239,6 +241,8 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 	{
 		attcheckattnoblocks[attnum] =
 			l_bb_append_v(v_deform_fn, "block.attr.%d.attcheckattno", attnum);
+		attfaststartblocks[attnum] =
+			l_bb_append_v(v_deform_fn, "block.attr.%d.faststart", attnum);
 		attstartblocks[attnum] =
 			l_bb_append_v(v_deform_fn, "block.attr.%d.start", attnum);
 		attisnullblocks[attnum] =
@@ -337,7 +341,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 
 		/*
 		 * If this is the first attribute, slot->tts_nvalid was 0. Therefore
-		 * reset offset to 0 to, it be from a previous execution.
+		 * reset offset to 0 too, it could be from a previous execution.
 		 */
 		if (attnum == 0)
 		{
@@ -351,7 +355,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 		 */
 		if (attnum <= guaranteed_column_number)
 		{
-			LLVMBuildBr(b, attstartblocks[attnum]);
+			LLVMBuildBr(b, attfaststartblocks[attnum]);
 		}
 		else
 		{
@@ -361,8 +365,19 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 									 l_attno,
 									 v_maxatt,
 									 "heap_natts");
-			LLVMBuildCondBr(b, v_islast, b_out, attstartblocks[attnum]);
+			LLVMBuildCondBr(b, v_islast, b_out, attfaststartblocks[attnum]);
 		}
+
+		LLVMPositionBuilderAtEnd(b, attfaststartblocks[attnum]);
+
+		/*
+		 * Fast-start block : check if there can be nulls on tuple.
+		 * If not, jump straight to align checks.
+		 */
+		{
+			LLVMBuildCondBr(b, v_hasnulls, attstartblocks[attnum], attcheckalignblocks[attnum]);
+		}
+
 		LLVMPositionBuilderAtEnd(b, attstartblocks[attnum]);
 
 		/*
@@ -375,7 +390,6 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 			LLVMBasicBlockRef b_ifnotnull;
 			LLVMBasicBlockRef b_ifnull;
 			LLVMBasicBlockRef b_next;
-			LLVMValueRef v_attisnull;
 			LLVMValueRef v_nullbyteno;
 			LLVMValueRef v_nullbytemask;
 			LLVMValueRef v_nullbyte;
@@ -399,9 +413,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 									  l_int8_const(0),
 									  "attisnull");
 
-			v_attisnull = LLVMBuildAnd(b, v_hasnulls, v_nullbit, "");
-
-			LLVMBuildCondBr(b, v_attisnull, b_ifnull, b_ifnotnull);
+			LLVMBuildCondBr(b, v_nullbit, b_ifnull, b_ifnotnull);
 
 			LLVMPositionBuilderAtEnd(b, b_ifnull);
 
-- 
2.18.0

>From 4da278ee49b91d34120747c6763c248ad52da7b7 Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet <p.p...@pinaraf.info>
Date: Mon, 2 Jul 2018 13:44:10 +0200
Subject: [PATCH] Introduce opt1 in LLVM/JIT, and force it with deforming

---
 src/backend/jit/llvm/llvmjit.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index 5d0cdab1fc..025319e9c1 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -91,10 +91,12 @@ static const char *llvm_layout = NULL;
 
 
 static LLVMTargetMachineRef llvm_opt0_targetmachine;
+static LLVMTargetMachineRef llvm_opt1_targetmachine;
 static LLVMTargetMachineRef llvm_opt3_targetmachine;
 
 static LLVMTargetRef llvm_targetref;
 static LLVMOrcJITStackRef llvm_opt0_orc;
+static LLVMOrcJITStackRef llvm_opt1_orc;
 static LLVMOrcJITStackRef llvm_opt3_orc;
 
 
@@ -277,6 +279,8 @@ llvm_get_function(LLVMJitContext *context, const char *funcname)
 #if LLVM_VERSION_MAJOR < 5
 	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt0_orc, funcname)))
 		return (void *) (uintptr_t) addr;
+	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt1_orc, funcname)))
+		return (void *) (uintptr_t) addr;
 	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt3_orc, funcname)))
 		return (void *) (uintptr_t) addr;
 #else
@@ -284,6 +288,10 @@ llvm_get_function(LLVMJitContext *context, const char *funcname)
 		elog(ERROR, "failed to look up symbol \"%s\"", funcname);
 	if (addr)
 		return (void *) (uintptr_t) addr;
+	if (LLVMOrcGetSymbolAddress(llvm_opt1_orc, &addr, funcname))
+		elog(ERROR, "failed to look up symbol \"%s\"", funcname);
+	if (addr)
+		return (void *) (uintptr_t) addr;
 	if (LLVMOrcGetSymbolAddress(llvm_opt3_orc, &addr, funcname))
 		elog(ERROR, "failed to look up symbol \"%s\"", funcname);
 	if (addr)
@@ -420,6 +428,8 @@ llvm_optimize_module(LLVMJitContext *context, LLVMModuleRef module)
 
 	if (context->base.flags & PGJIT_OPT3)
 		compile_optlevel = 3;
+	else if (context->base.flags & PGJIT_DEFORM)
+		compile_optlevel = 1;
 	else
 		compile_optlevel = 0;
 
@@ -491,6 +501,8 @@ llvm_compile_module(LLVMJitContext *context)
 
 	if (context->base.flags & PGJIT_OPT3)
 		compile_orc = llvm_opt3_orc;
+	else if (context->base.flags & PGJIT_DEFORM)
+		compile_orc = llvm_opt1_orc;
 	else
 		compile_orc = llvm_opt0_orc;
 
@@ -646,6 +658,11 @@ llvm_session_initialize(void)
 								LLVMCodeGenLevelNone,
 								LLVMRelocDefault,
 								LLVMCodeModelJITDefault);
+	llvm_opt1_targetmachine =
+		LLVMCreateTargetMachine(llvm_targetref, llvm_triple, cpu, features,
+								LLVMCodeGenLevelLess,
+								LLVMRelocDefault,
+								LLVMCodeModelJITDefault);
 	llvm_opt3_targetmachine =
 		LLVMCreateTargetMachine(llvm_targetref, llvm_triple, cpu, features,
 								LLVMCodeGenLevelAggressive,
@@ -661,12 +678,14 @@ llvm_session_initialize(void)
 	LLVMLoadLibraryPermanently(NULL);
 
 	llvm_opt0_orc = LLVMOrcCreateInstance(llvm_opt0_targetmachine);
+	llvm_opt1_orc = LLVMOrcCreateInstance(llvm_opt1_targetmachine);
 	llvm_opt3_orc = LLVMOrcCreateInstance(llvm_opt3_targetmachine);
 
 #if defined(HAVE_DECL_LLVMORCREGISTERGDB) && HAVE_DECL_LLVMORCREGISTERGDB
 	if (jit_debugging_support)
 	{
 		LLVMOrcRegisterGDB(llvm_opt0_orc);
+		LLVMOrcRegisterGDB(llvm_opt1_orc);
 		LLVMOrcRegisterGDB(llvm_opt3_orc);
 	}
 #endif
@@ -674,6 +693,7 @@ llvm_session_initialize(void)
 	if (jit_profiling_support)
 	{
 		LLVMOrcRegisterPerf(llvm_opt0_orc);
+		LLVMOrcRegisterPerf(llvm_opt1_orc);
 		LLVMOrcRegisterPerf(llvm_opt3_orc);
 	}
 #endif
@@ -700,6 +720,16 @@ llvm_shutdown(int code, Datum arg)
 		llvm_opt3_orc = NULL;
 	}
 
+	if (llvm_opt1_orc)
+	{
+#if defined(HAVE_DECL_LLVMORCREGISTERPERF) && HAVE_DECL_LLVMORCREGISTERPERF
+		if (jit_profiling_support)
+			LLVMOrcUnregisterPerf(llvm_opt1_orc);
+#endif
+		LLVMOrcDisposeInstance(llvm_opt1_orc);
+		llvm_opt1_orc = NULL;
+	}
+
 	if (llvm_opt0_orc)
 	{
 #if defined(HAVE_DECL_LLVMORCREGISTERPERF) && HAVE_DECL_LLVMORCREGISTERPERF
-- 
2.18.0

>From c5983755a3d3470a072c8fc6169a98fa79b0b75e Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet <p.p...@pinaraf.info>
Date: Fri, 13 Jul 2018 09:45:44 +0200
Subject: [PATCH] Skip alignment code blocks when they are not needed

---
 src/backend/jit/llvm/llvmjit_deform.c | 57 ++++++++++++++++-----------
 1 file changed, 35 insertions(+), 22 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit_deform.c b/src/backend/jit/llvm/llvmjit_deform.c
index 795f67114e..d042e4fba5 100644
--- a/src/backend/jit/llvm/llvmjit_deform.c
+++ b/src/backend/jit/llvm/llvmjit_deform.c
@@ -331,6 +331,34 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 		LLVMValueRef l_attno = l_int16_const(attnum);
 		LLVMValueRef v_attdatap;
 		LLVMValueRef v_resultp;
+		bool			has_alignment;
+		bool			has_alignment_check;
+		LLVMBasicBlockRef	align_block_target;
+
+		/* determine required alignment */
+		if (att->attalign == 'i')
+			alignto = ALIGNOF_INT;
+		else if (att->attalign == 'c')
+			alignto = 1;
+		else if (att->attalign == 'd')
+			alignto = ALIGNOF_DOUBLE;
+		else if (att->attalign == 's')
+			alignto = ALIGNOF_SHORT;
+		else
+		{
+			elog(ERROR, "unknown alignment");
+			alignto = 0;
+		}
+
+		has_alignment = (alignto > 1 &&
+			(known_alignment < 0 || known_alignment != TYPEALIGN(alignto, known_alignment)));
+		has_alignment_check = has_alignment && (att->attlen == -1);
+		if (has_alignment_check)
+			align_block_target = attcheckalignblocks[attnum];
+		else if (has_alignment)
+			align_block_target = attalignblocks[attnum];
+		else
+			align_block_target = attstoreblocks[attnum];
 
 		/* build block checking whether we did all the necessary attributes */
 		LLVMPositionBuilderAtEnd(b, attcheckattnoblocks[attnum]);
@@ -363,6 +391,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 									 "heap_natts");
 			LLVMBuildCondBr(b, v_islast, b_out, attstartblocks[attnum]);
 		}
+
 		LLVMPositionBuilderAtEnd(b, attstartblocks[attnum]);
 
 		/*
@@ -381,7 +410,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 			LLVMValueRef v_nullbyte;
 			LLVMValueRef v_nullbit;
 
-			b_ifnotnull = attcheckalignblocks[attnum];
+			b_ifnotnull = align_block_target;
 			b_ifnull = attisnullblocks[attnum];
 
 			if (attnum + 1 == natts)
@@ -420,27 +449,12 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 		else
 		{
 			/* nothing to do */
-			LLVMBuildBr(b, attcheckalignblocks[attnum]);
+			LLVMBuildBr(b, align_block_target);
 			LLVMPositionBuilderAtEnd(b, attisnullblocks[attnum]);
-			LLVMBuildBr(b, attcheckalignblocks[attnum]);
+			LLVMBuildBr(b, align_block_target);
 		}
 		LLVMPositionBuilderAtEnd(b, attcheckalignblocks[attnum]);
 
-		/* determine required alignment */
-		if (att->attalign == 'i')
-			alignto = ALIGNOF_INT;
-		else if (att->attalign == 'c')
-			alignto = 1;
-		else if (att->attalign == 'd')
-			alignto = ALIGNOF_DOUBLE;
-		else if (att->attalign == 's')
-			alignto = ALIGNOF_SHORT;
-		else
-		{
-			elog(ERROR, "unknown alignment");
-			alignto = 0;
-		}
-
 		/* ------
 		 * Even if alignment is required, we can skip doing it if provably
 		 * unnecessary:
@@ -450,8 +464,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 		 *   is compatible with current column.
 		 * ------
 		 */
-		if (alignto > 1 &&
-			(known_alignment < 0 || known_alignment != TYPEALIGN(alignto, known_alignment)))
+		if (has_alignment)
 		{
 			/*
 			 * When accessing a varlena field we have to "peek" to see if we
@@ -462,7 +475,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 			 * length word, or the first byte of a correctly aligned 4-byte
 			 * length word; in either case we need not align.
 			 */
-			if (att->attlen == -1)
+			if (has_alignment_check)
 			{
 				LLVMValueRef v_possible_padbyte;
 				LLVMValueRef v_ispad;
@@ -526,7 +539,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 		else
 		{
 			LLVMPositionBuilderAtEnd(b, attcheckalignblocks[attnum]);
-			LLVMBuildBr(b, attalignblocks[attnum]);
+			LLVMBuildBr(b, attstoreblocks[attnum]);
 			LLVMPositionBuilderAtEnd(b, attalignblocks[attnum]);
 			LLVMBuildBr(b, attstoreblocks[attnum]);
 		}
-- 
2.18.0

Reply via email to