Re: Is a modern build system acceptable for older platforms

2018-05-28 Thread Pierre Ducroquet
On Monday, May 28, 2018 4:37:06 AM CEST Yuriy Zhuravlev wrote:
> > Can't see getting rid of those entirely. None of the github style
> > platforms copes with reasonably complex discussions.
> 
> I disagree. A good example of complex discussions on github is the Rust
> language tracker for RFCs:
> https://github.com/rust-lang/rfcs/issues
> and one concrete example: https://github.com/rust-lang/rfcs/issues/2327
> I have no problem with complex discussions on github.

It is indeed hard to follow on github, and it would be even worse with bigger
threads.
Email readers show threads hierarchically: we can see who replied to whom, and
discussions can fork into completely different aspects of an issue without
being mixed together.

> Anyway, it's much better than tons of emails in your mailbox without tags
> and status of discussion.

A github thread does not show what I have read / what I still have to read, does it?





Re: effect of JIT tuple deform?

2018-07-02 Thread Pierre Ducroquet
On Wednesday, June 27, 2018 5:38:31 PM CEST Pavel Stehule wrote:
> 2018-06-27 17:19 GMT+02:00 Tomas Vondra :
> > On 06/26/2018 09:25 PM, Pavel Stehule wrote:
> >> Hi
> >> 
> >> ...
> >> 
> >> So I am able to see the effect of jit_tuple_deforming, and very well, but
> >> only if optimization is active. When optimization is not active, then
> >> jit_tuple_deforming causes a slowdown.
> >> 
> >> So maybe usage of jit_tuple_deforming could be conditioned on
> >> jit_optimization?
> > 
> > Can you share the test case and some detail about the hardware and
> > PostgreSQL configuration?
> 
> I did a very simple test
> 
> 
> 0.
> 
> master branch without asserts, shared_buffers set to 1GB
> tested on a Lenovo T520, 8GB RAM, 8 CPUs, i7
> Fedora 28, gcc  CFLAGS="-ggdb -Og -g3 -fno-omit-frame-pointer"  --with-llvm
> 
> 1.
> 
> select 'create table wt(' || string_agg(format('%I int', 'c' || i),',') ||
> ')' from generate_series(1,100) g(i) \gexec
> 
> 
> 2.
> 
> begin;
> select 'insert into wt values(' || (select
> string_agg((random()*1)::int::text,',') from generate_series(1,j - j +
> 100) g(i)) || ')' from generate_series(1,100) gg(j) \gexec
> insert into wt select * from wt;
> commit;
> 
> 3.
> 
> set max_parallel_workers to 0; -- the effect of JIT will be more visible
> 
> analyze wt;
> \timing
> 
> select sum(c99) from wt;
> 
> I tested some combination of:
> 
> jit: off on
> jit_inline_above_cost: 0, 10
> jit_optimize_above_cost: 0, 10
> jit_tuple_deforming: on, off
> 
> 
> My primitive tests show a nice positive effect of jit_tuple_deforming if
> jit optimization is active. When jit optimization is not active, then
> jit_tuple_deforming caused a slowdown in my test.
> 
> So there is a range of costs between 10 and 50 where
> jit_tuple_deforming didn't work well (without optimization)
> 
> I am limited by the small memory of my notebook - when I created a table
> larger than 3GB, I got IO waits on my encrypted disk, and any effect of
> JIT was eliminated.
> 
> Regards
> 
> Pavel

Hi

I have studied this case a bit, and I also think there is something wrong
here.
Right now, jit_optimize runs a full -O3. It's really expensive, and triggering
it here is not the right solution. In the attached patch, I force -O1 for
tuple deforming. With your test case, on my computer, the results are:
- no jit : 850ms
- jit with tuple deforming without optimizations : 1650 ms (1.5ms spent optimizing)
- jit without tuple deforming : 820ms (0.2ms)
- jit with tuple deforming with optimization (-O3) : 770ms (105ms)
- jit with tuple deforming with patch (-O1) : 725ms (54ms)

I will look at the generated code for tuple deforming, but I think we should
pre-optimize the LLVM bitcode if we do not want to involve the full LLVM
optimization pipeline.
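For reference, here is roughly what a cheap IR-level pipeline looks like
through the LLVM C API. This is a minimal sketch for illustration only (it is
not part of the attached patch, which instead adds an -O1 target machine; it
is similar in spirit to the existing optimization step in llvmjit.c):

/* Sketch: run an -O1 module pass pipeline over "mod", in place. */
#include <llvm-c/Core.h>
#include <llvm-c/Transforms/PassManagerBuilder.h>

static void
optimize_module_o1(LLVMModuleRef mod)
{
	LLVMPassManagerBuilderRef pmb = LLVMPassManagerBuilderCreate();
	LLVMPassManagerRef pm = LLVMCreatePassManager();

	LLVMPassManagerBuilderSetOptLevel(pmb, 1);	/* -O1 rather than -O3 */
	LLVMPassManagerBuilderPopulateModulePassManager(pmb, pm);
	LLVMRunPassManager(pm, mod);	/* mutates the module in place */

	LLVMDisposePassManager(pm);
	LLVMPassManagerBuilderDispose(pmb);
}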

Regards

 Pierre
From c2e70c8fbb7715283d3d53bdf5a70e4db18c99a9 Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet 
Date: Mon, 2 Jul 2018 13:44:10 +0200
Subject: [PATCH] Introduce opt1 in LLVM/JIT, and force it with deforming

---
 src/backend/jit/llvm/llvmjit.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index 5d0cdab1fc..025319e9c1 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -91,10 +91,12 @@ static const char *llvm_layout = NULL;
 
 
 static LLVMTargetMachineRef llvm_opt0_targetmachine;
+static LLVMTargetMachineRef llvm_opt1_targetmachine;
 static LLVMTargetMachineRef llvm_opt3_targetmachine;
 
 static LLVMTargetRef llvm_targetref;
 static LLVMOrcJITStackRef llvm_opt0_orc;
+static LLVMOrcJITStackRef llvm_opt1_orc;
 static LLVMOrcJITStackRef llvm_opt3_orc;
 
 
@@ -277,6 +279,8 @@ llvm_get_function(LLVMJitContext *context, const char *funcname)
 #if LLVM_VERSION_MAJOR < 5
 	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt0_orc, funcname)))
 		return (void *) (uintptr_t) addr;
+	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt1_orc, funcname)))
+		return (void *) (uintptr_t) addr;
 	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt3_orc, funcname)))
 		return (void *) (uintptr_t) addr;
 #else
@@ -284,6 +288,10 @@ llvm_get_function(LLVMJitContext *context, const char *funcname)
 		elog(ERROR, "failed to look up symbol \"%s\"", funcname);
 	if (addr)
 		return (void *) (uintptr_t) addr;
+	if (LLVMOrcGetSymbolAddress(llvm_opt1_orc, &addr, funcname))
+		elog(ERROR, "failed to look up symbol \"%s\"", funcname);
+	if (addr)
+		return (void *) (uintptr_t) addr;
 	if (LLVMOrcGetSymbolAddress(llvm_opt3_orc, &addr, funcname))
 		elog(ERROR, "failed to look up symbol \"%s\"", funcname);
 	if (addr)
@@ -420,6 +428,8 @@ l

Re: [RFC] Add an until-0 loop in psql

2018-04-30 Thread Pierre Ducroquet
On Monday, April 30, 2018 1:01:25 PM CEST Daniel Verite wrote:
>   Corey Huinker wrote:
> > As of v11, DO blocks can do transactions. I think this will meet your
> > needs.
> They do support COMMIT and ROLLBACK in the current
> development tree, but not VACUUM as in Pierre's example.
> 
> postgres=# \echo :SERVER_VERSION_NAME
> 11devel
> 
> postgres=# do ' begin vacuum; end ';
> ERROR:  VACUUM cannot be executed from a function
> CONTEXT:  SQL statement "vacuum"
> PL/pgSQL function inline_code_block line 1 at SQL statement
> 
> 
> Best regards,

Indeed, vacuum is going to be the biggest offender here, sadly.
One could work around this of course (off the top of my head, using NOTIFY to
wake up another client that would launch the required vacuums; see the sketch
below…)
Being able to do transactions in DO blocks is a great new feature of v11 I was
not aware of. But psql saw the addition of \if recently, so why not have loops
in there too? (Something better than this hack of course, it was just a
10-minute hack sprint for a demo.)
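For what it's worth, here is a minimal sketch of such a listener with libpq
(the channel and table names are invented for the example):

/* Sketch: wait for NOTIFY vacuum_request, then run the vacuum outside
 * of any transaction. */
#include <stdio.h>
#include <sys/select.h>
#include <libpq-fe.h>

int
main(void)
{
	PGconn	   *conn = PQconnectdb("");	/* uses the PG* environment variables */
	PGnotify   *notify;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "%s", PQerrorMessage(conn));
		return 1;
	}
	PQclear(PQexec(conn, "LISTEN vacuum_request"));

	for (;;)
	{
		int			sock = PQsocket(conn);
		fd_set		mask;

		FD_ZERO(&mask);
		FD_SET(sock, &mask);
		select(sock + 1, &mask, NULL, NULL, NULL);	/* sleep until traffic */

		PQconsumeInput(conn);
		while ((notify = PQnotifies(conn)) != NULL)
		{
			PQclear(PQexec(conn, "VACUUM big_table"));
			PQfreemem(notify);
		}
	}
}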

Regards

 Pierre





Re: JIT compiling with LLVM v9.0

2018-01-24 Thread Pierre Ducroquet
On Wednesday, January 24, 2018 8:20:38 AM CET Andres Freund wrote:
> As the patchset is large (500kb) and I'm still quickly evolving it, I do
> not yet want to attach it. The git tree is at
>   https://git.postgresql.org/git/users/andresfreund/postgres.git
> in the jit branch
>  
> https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/jit
> 
> to build --with-llvm has to be passed to configure, llvm-config either
> needs to be in PATH or provided with LLVM_CONFIG to make. A c++ compiler
> and clang need to be available under common names or provided via CXX /
> CLANG respectively.
> 
> Regards,
> 
> Andres Freund

Hi

I tried to build on Debian sid, using GCC 7 and LLVM 5. I used the following
to compile, with your branch at 3195c2821d:

$ export LLVM_CONFIG=/usr/bin/llvm-config-5.0
$ ./configure --with-llvm
$ make

And I got the following build error:
llvmjit_wrap.cpp:32:10: fatal error: llvm-c/DebugInfo.h: No such file or 
directory
 #include "llvm-c/DebugInfo.h"
  ^~~~
compilation terminated.

In LLVM 5.0, it looks like DebugInfo.h is not available in llvm-c, only as a
C++ API in llvm/IR/DebugInfo.h.

For 'sport' (I have not played with the LLVM API in more than a year), I
tried to fix it, changing it to the C++ include.

The DebugInfo-related one was easy, only one function was used.
But I still could not build, because the LLVM API changed between 5.0 and 6.0
regarding value info SummaryList.

llvmjit_wrap.cpp: In function 
‘std::unique_ptr<llvm::StringMap<llvm::StringSet<> > > 
llvm_build_inline_plan(llvm::Module*)’:
llvmjit_wrap.cpp:285:48: error: ‘class llvm::GlobalValueSummary’ has no member
named ‘getBaseObject’
fs = llvm::cast<llvm::FunctionSummary>(gvs->getBaseObject());
^

That one was a bit uglier.

I'm not sure how to test everything properly, so the patch is attached for 
both these issues, do as you wish with it… :)

Regards

 Pierre Ducroquet

From fdfea09dd7410d6ed7ad54df1ba3092bd0eecb92 Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet <pina...@pinaraf.info>
Date: Wed, 24 Jan 2018 22:28:34 +0100
Subject: [PATCH] Allow building with LLVM 5.0

---
 src/backend/lib/llvmjit_wrap.cpp | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/src/backend/lib/llvmjit_wrap.cpp b/src/backend/lib/llvmjit_wrap.cpp
index b745aec4fe..7961148a85 100644
--- a/src/backend/lib/llvmjit_wrap.cpp
+++ b/src/backend/lib/llvmjit_wrap.cpp
@@ -29,7 +29,6 @@ extern "C"
 
 #include "llvm-c/Core.h"
 #include "llvm-c/BitReader.h"
-#include "llvm-c/DebugInfo.h"
 
 #include 
 #include 
@@ -50,6 +49,7 @@ extern "C"
 #include "llvm/Analysis/ModuleSummaryAnalysis.h"
 #include "llvm/Bitcode/BitcodeReader.h"
 #include "llvm/IR/CallSite.h"
+#include "llvm/IR/DebugInfo.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/ModuleSummaryIndex.h"
 #include "llvm/Linker/IRMover.h"
@@ -218,6 +218,13 @@ llvm_inline(LLVMModuleRef M)
 	llvm_execute_inline_plan(mod, globalsToInline.get());
 }
 
+
+inline llvm::GlobalValueSummary *GlobalValueSummary__getBaseObject(llvm::GlobalValueSummary *gvs) {
+  if (auto *AS = llvm::dyn_cast<llvm::AliasSummary>(gvs))
+    return &AS->getAliasee();
+  return gvs;
+}
+
 /*
  * Build information necessary for inlining external function references in
  * mod.
@@ -282,7 +289,7 @@ llvm_build_inline_plan(llvm::Module *mod)
 			const llvm::Module *defMod;
 			llvm::Function *funcDef;
 
-			fs = llvm::cast<llvm::FunctionSummary>(gvs->getBaseObject());
+			fs = llvm::cast<llvm::FunctionSummary>(GlobalValueSummary__getBaseObject(gvs.get()));
 			elog(DEBUG2, "func %s might be in %s",
  funcName.data(),
  modPath.data());
@@ -476,7 +483,7 @@ load_module(llvm::StringRef Identifier)
 	 * code. Until that changes, not much point in wasting memory and cycles
 	 * on processing debuginfo.
 	 */
-	LLVMStripModuleDebugInfo(mod);
+	llvm::StripDebugInfo(*llvm::unwrap(mod));
 
 	return std::unique_ptr<llvm::Module>(llvm::unwrap(mod));
 }
-- 
2.15.1



Re: JIT compiling with LLVM v9.0

2018-01-25 Thread Pierre Ducroquet
On Thursday, January 25, 2018 7:38:16 AM CET Andres Freund wrote:
> Hi,
> 
> On 2018-01-24 22:33:30 -0800, Jeff Davis wrote:
> > On Wed, Jan 24, 2018 at 1:35 PM, Pierre Ducroquet <p.p...@pinaraf.info> 
wrote:
> > > In LLVM 5.0, it looks like DebugInfo.h is not available in llvm-c, only
> > > as a C ++ API in llvm/IR/DebugInfo.h.
> > 
> > The LLVM APIs don't seem to be very stable; won't there just be a
> > continuous stream of similar issues?
> 
> There'll be some of that yes. But the entire difference between 5 and
> what will be 6 was not including one header, and not calling one unneded
> function. That doesn't seem like a crazy amount of adaption that needs
> to be done.  From a quick look about porting to 4, it'll be a bit, but
> not much more effort.

I don't know when this would be released, but the minimal supported LLVM
version will have a strong influence on the availability of that feature. If
this JIT compiling were released today with only LLVM 5/6 support, it would be
unusable for most Debian users (llvm-5 is only available in sid). Even llvm 4
is not available in the latest stable.
I'm already trying to build with llvm-4, and I'm going to try further with
llvm 3.9 (Debian Stretch doesn't have anything more recent, and I won't have
something better to play with my data); I'll keep you informed. For sport, I
may also try llvm 3.5 (for Debian Jessie).

 Pierre





Re: JIT compiling with LLVM v9.0

2018-01-28 Thread Pierre Ducroquet
On Thursday, January 25, 2018 8:02:54 AM CET Andres Freund wrote:
> Hi!
> 
> On 2018-01-24 22:51:36 -0800, Jeff Davis wrote:
> > Can we store the bitcode in pg_proc, simplifying deployment and
> > allowing extensions to travel over replication?
> 
> Yes, we could. You'd need to be a bit careful that all the machines have
> similar-ish cpu generations or compile with defensive settings, but that
> seems okay.

Hi

Doing this would 'bind' the database to the LLVM release used. LLVM can, as
far as I know, generate bitcode only for the current version, and will only be
able to read bitcode from previous versions. So you can't have, for instance,
a master server with LLVM 5 and a standby server with LLVM 4.
So maybe PostgreSQL would have to expose which LLVM version is currently used?
Or a major PostgreSQL release could accept only one major LLVM release, as was
suggested in another thread?
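For illustration, exposing the compile-time LLVM version to SQL would only
take a few lines. This is a hypothetical sketch (no such function exists
today); the only solid part is the LLVM_VERSION_STRING macro from
llvm-config.h:

/* Hypothetical sketch: SELECT pg_jit_llvm_version();
 * No such function exists in PostgreSQL today. */
#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"

#include <llvm/Config/llvm-config.h>	/* provides LLVM_VERSION_STRING */

PG_FUNCTION_INFO_V1(pg_jit_llvm_version);

Datum
pg_jit_llvm_version(PG_FUNCTION_ARGS)
{
	PG_RETURN_TEXT_P(cstring_to_text(LLVM_VERSION_STRING));
}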


 Pierre



Re: JIT compiling with LLVM v9.0

2018-01-28 Thread Pierre Ducroquet
On Thursday, January 25, 2018 8:12:42 PM CET Andres Freund wrote:
> Hi,
> 
> On 2018-01-25 10:00:14 +0100, Pierre Ducroquet wrote:
> > I don't know when this would be released,
> 
> August-October range.
> 
> > but the minimal supported LLVM
> > version will have a strong influence on the availability of that feature.
> > If this JIT compiling were released today with only LLVM 5/6 support, it
> > would be unusable for most Debian users (llvm-5 is only available in
> > sid). Even llvm 4 is not available in the latest stable.
> > I'm already trying to build with llvm-4 and I'm going to try further with
> > llvm 3.9 (Debian Stretch doesn't have anything more recent, and I
> > won't have something better to play with my data); I'll keep you
> > informed. For sport, I may also try llvm 3.5 (for Debian Jessie).
> 
> I don't think it's unreasonable to not support super old llvm
> versions. This is a complex feature, and will take some time to
> mature. Supporting too many LLVM versions at the outset will have some
> cost.  Versions before 3.8 would require supporting mcjit rather than
> orc, and I don't think that'd be worth doing.  I think 3.9 might be a
> reasonable baseline...
> 
> Greetings,
> 
> Andres Freund

Hi

I have fixed the build issues with LLVM 3.9 and 4.0. The LLVM documentation is 
really lacking when it comes to porting from version x to x+1.
The only really missing part I found is that in 3.9, GlobalValueSummary has no 
flag showing if it's not EligibleToImport. I am not sure about the 
consequences.
I'm still fixing some runtime issues so I will not bother you with the patch 
right now.
BTW, the makefile for src/backend/lib does not remove the llvmjit_types.bc
file when cleaning, and doesn't seem to install it in the right folder.

Regards

Pierre



Re: JIT compiling with LLVM v9.0

2018-01-29 Thread Pierre Ducroquet
On Monday, January 29, 2018 10:46:13 AM CET Andres Freund wrote:
> Hi,
> 
> On 2018-01-28 23:02:56 +0100, Pierre Ducroquet wrote:
> > I have fixed the build issues with LLVM 3.9 and 4.0. The LLVM
> > documentation is really lacking when it comes to porting from version x
> > to x+1.
> > The only really missing part I found is that in 3.9, GlobalValueSummary
> > has no flag showing if it's not EligibleToImport. I am not sure about the
> > consequences.
> 
> I think that'd not be too bad, it'd just lead to some small increase in
> overhead as more modules would be loaded.
> 
> > BTW, the makefile for src/backend/lib does not remove the llvmjit_types.bc
> > file when cleaning, and doesn't seem to install it in the right folder.
> 
> Hm, both seem to be right here? Note that the llvmjit_types.bc file
> should *not* go into the bitcode/ directory, as it's about syncing types
> not inlining. I've added a comment to that effect.

The file was installed in lib/ while the code expected it in lib/postgresql,
so there was something wrong there.
And deleting the file when cleaning is needed if a different LLVM version is
used at configure time: the file must be generated with a clang release that
is not more recent than the LLVM version linked to PostgreSQL, otherwise the
generated bitcode is not accepted by LLVM.

Regards

 Pierre



Re: JIT compiling with LLVM v9.1

2018-02-02 Thread Pierre Ducroquet
On Monday, January 29, 2018 10:53:50 AM CET Andres Freund wrote:
> Hi,
> 
> On 2018-01-23 23:20:38 -0800, Andres Freund wrote:
> > == Code ==
> > 
> > As the patchset is large (500kb) and I'm still quickly evolving it, I do
> > not yet want to attach it. The git tree is at
> > 
> >   https://git.postgresql.org/git/users/andresfreund/postgres.git
> > 
> > in the jit branch
> > 
> >   https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=s
> >   hortlog;h=refs/heads/jit
> I've just pushed an updated and rebased version of the tree:
> - Split the large "jit infrastructure" commits into a number of smaller
>   commits
> - Split the C++ file
> - Dropped some of the performance stuff done to heaptuple.c - that was
>   mostly to make performance comparisons a bit more interesting, but
>   doesn't seem important enough to deal with.
> - Added a commit renaming datetime.h symbols so they don't conflict with
>   LLVM variables anymore, removing ugly #undef PM/#define PM dance
>   around includes. Will post separately.
> - Reduced the number of pointer constants in the generated LLVM IR, by
>   doing more getelementptr accesses (stem from before the time types
>   were automatically synced)
> - Increased number of comments a bit
> 
> There's a jit-before-rebase-2018-01-29 tag, for the state of the tree
> before the rebase.
> 
> Regards,
> 
> Andres

Hi

I have successfully built the JIT branch against LLVM 4.0.1 on Debian testing.
This is not enough for Debian stable (LLVM 3.9 is the latest available there),
but it's a first step.
I've split the patch into four files. The first three fix the build issues,
the last one fixes a runtime issue.
I think they are small enough not to be a burden for you in your developments.
But if you don't want to carry these ifdefs right now, I can maintain them in
a branch on a personal git and rebase as frequently as I can.

LLVM 3.9 support isn't going to be hard, but I prefer splitting. I also hope
this will help more people test this wonderful toy… :)

Regards

 Pierre


From 770104331a36a8d207053227b850396f1392939a Mon Sep 17 00:00:00 2001
From: Pierre 
Date: Fri, 2 Feb 2018 09:11:55 +0100
Subject: [PATCH 1/4] Add support for LLVM4 in llvmjit.c

---
 src/backend/lib/llvmjit.c | 24 +---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/src/backend/lib/llvmjit.c b/src/backend/lib/llvmjit.c
index 8e5ba94c98..d0c5537610 100644
--- a/src/backend/lib/llvmjit.c
+++ b/src/backend/lib/llvmjit.c
@@ -230,12 +230,19 @@ llvm_get_function(LLVMJitContext *context, const char *funcname)
 
 		addr = 0;
 		if (LLVMOrcGetSymbolAddressIn(handle->stack, &addr, handle->orc_handle, mangled))
-			elog(ERROR, "failed to lookup symbol");
+			elog(ERROR, "failed to lookup symbol %s", mangled);
 		if (addr)
 			return (void *) addr;
 	}
 #endif
 
+#if LLVM_VERSION_MAJOR < 5
+	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt0_orc, mangled)))
+		return (void *) addr;
+	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt3_orc, mangled)))
+		return (void *) addr;
+	elog(ERROR, "failed to lookup symbol %s for %s", mangled, funcname);
+#else
 	if (LLVMOrcGetSymbolAddress(llvm_opt0_orc, &addr, mangled))
 		elog(ERROR, "failed to lookup symbol");
 	if (addr)
@@ -244,7 +251,7 @@ llvm_get_function(LLVMJitContext *context, const char *funcname)
 		elog(ERROR, "failed to lookup symbol");
 	if (addr)
 		return (void *) addr;
-
+#endif
 	elog(ERROR, "failed to JIT: %s", funcname);
 
 	return NULL;
@@ -380,11 +387,21 @@ llvm_compile_module(LLVMJitContext *context)
 	 * faster instruction selection mechanism is used.
 	 */
 	{
-		LLVMSharedModuleRef smod;
 		instr_time tb, ta;
 
 		/* emit the code */
 		INSTR_TIME_SET_CURRENT(ta);
+#if LLVM_VERSION_MAJOR < 5
+		orc_handle = LLVMOrcAddEagerlyCompiledIR(compile_orc, context->module,
+		llvm_resolve_symbol, NULL);
+		if (!orc_handle)
+		{
+			elog(ERROR, "failed to jit module");
+		}
+#else
+		LLVMSharedModuleRef smod;
+
 		smod = LLVMOrcMakeSharedModule(context->module);
 		if (LLVMOrcAddEagerlyCompiledIR(compile_orc, &orc_handle, smod,
 		llvm_resolve_symbol, NULL))
@@ -392,6 +409,7 @@ llvm_compile_module(LLVMJitContext *context)
 			elog(ERROR, "failed to jit module");
 		}
 		LLVMOrcDisposeSharedModuleRef(smod);
+#endif
 		INSTR_TIME_SET_CURRENT(tb);
 		INSTR_TIME_SUBTRACT(tb, ta);
 		ereport(DEBUG1, (errmsg("time to emit: %.3fs",
-- 
2.15.1

From 079ad7087e2ab106c0f04fa9056c93afa9a43b7c Mon Sep 17 00:00:00 2001
From: Pierre 
Date: Fri, 2 Feb 2018 09:13:40 +0100
Subject: [PATCH 2/4] Add LLVM4 support in llvmjit_error.cpp

---
 src/backend/lib/llvmjit_error.cpp | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/backend/lib/llvmjit_error.cpp b/src/backend/lib/llvmjit_error.cpp
index 70cecd114b..04e51b2a31 100644
--- a/src/backend/lib/llvmjit_error.cpp
+++ b/src/backend/lib/llvmjit_error.cpp
@@ -56,7 +56,9 @@ 

Re: JIT compiling with LLVM v9.1

2018-02-02 Thread Pierre Ducroquet
On Friday, February 2, 2018 10:48:16 AM CET Pierre Ducroquet wrote:
> On Monday, January 29, 2018 10:53:50 AM CET Andres Freund wrote:
> > Hi,
> > 
> > On 2018-01-23 23:20:38 -0800, Andres Freund wrote:
> > > == Code ==
> > > 
> > > As the patchset is large (500kb) and I'm still quickly evolving it, I do
> > > not yet want to attach it. The git tree is at
> > > 
> > >   https://git.postgresql.org/git/users/andresfreund/postgres.git
> > > 
> > > in the jit branch
> > > 
> > >   https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=shortlog;h=refs/heads/jit
> > 
> > I've just pushed an updated and rebased version of the tree:
> > - Split the large "jit infrastructure" commits into a number of smaller
> > 
> >   commits
> > 
> > - Split the C++ file
> > - Dropped some of the performance stuff done to heaptuple.c - that was
> > 
> >   mostly to make performance comparisons a bit more interesting, but
> >   doesn't seem important enough to deal with.
> > 
> > - Added a commit renaming datetime.h symbols so they don't conflict with
> > 
> >   LLVM variables anymore, removing ugly #undef PM/#define PM dance
> >   around includes. Will post separately.
> > 
> > - Reduced the number of pointer constants in the generated LLVM IR, by
> > 
> >   doing more getelementptr accesses (stem from before the time types
> >   were automatically synced)
> > 
> > - Increased number of comments a bit
> > 
> > There's a jit-before-rebase-2018-01-29 tag, for the state of the tree
> > before the rebase.
> > 
> > Regards,
> > 
> > Andres
> 
> Hi
> 
> I have successfully built the JIT branch against LLVM 4.0.1 on Debian
> testing. This is not enough for Debian stable (LLVM 3.9 is the latest
> available there), but it's a first step.
> I've split the patch into four files. The first three fix the build issues,
> the last one fixes a runtime issue.
> I think they are small enough not to be a burden for you in your
> developments. But if you don't want to carry these ifdefs right now, I can
> maintain them in a branch on a personal git and rebase as frequently as I
> can.
> 
> LLVM 3.9 support isn't going to be hard, but I prefer splitting. I also hope
> this will help more people test this wonderful toy… :)
> 
> Regards
> 
>  Pierre

For LLVM 3.9, only small changes were needed.
I've attached the patches to this email.
I only did very basic, primitive testing, but it seems to work.
I'll do more testing in the coming days.

 Pierre

From 5ca9594a7f52b7daab8562293010fe8c807107ee Mon Sep 17 00:00:00 2001
From: Pierre <pierre.ducroq...@people-doc.com>
Date: Fri, 2 Feb 2018 11:29:45 +0100
Subject: [PATCH 1/2] Fix building with LLVM 3.9

---
 src/backend/lib/llvmjit_inline.cpp | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/src/backend/lib/llvmjit_inline.cpp b/src/backend/lib/llvmjit_inline.cpp
index 8a747cbfc0..a785261bea 100644
--- a/src/backend/lib/llvmjit_inline.cpp
+++ b/src/backend/lib/llvmjit_inline.cpp
@@ -37,7 +37,12 @@ extern "C"
 #include 
 #include 
 #include 
+#if LLVM_MAJOR_VERSION > 3
 #include <llvm/Bitcode/BitcodeReader.h>
+#else
+#include "llvm/Bitcode/ReaderWriter.h"
+#include "llvm/Support/Error.h"
+#endif
 #include 
 #include 
 #include 
@@ -100,7 +105,12 @@ llvm_inline(LLVMModuleRef M)
 	llvm_execute_inline_plan(mod, globalsToInline.get());
 }
 
-#if LLVM_VERSION_MAJOR < 5
+#if LLVM_VERSION_MAJOR < 4
+bool operator!(const llvm::ValueInfo &vi) {
+	return !(  (vi.Kind == llvm::ValueInfo::VI_GUID && vi.TheValue.Id)
+		|| (vi.Kind == llvm::ValueInfo::VI_Value && vi.TheValue.V));
+}
+#elif LLVM_VERSION_MAJOR < 5
 bool operator!(const llvm::ValueInfo &vi) {
 	return !(  (vi.Kind == llvm::ValueInfo::VI_GUID && vi.TheValue.Id)
 		|| (vi.Kind == llvm::ValueInfo::VI_Value && vi.TheValue.GV));
@@ -188,12 +198,15 @@ llvm_build_inline_plan(llvm::Module *mod)
  funcName.data(),
  modPath.data());
 
+// XXX Missing in LLVM < 4.0 ?
+#if LLVM_VERSION_MAJOR > 3
 			if (gvs->notEligibleToImport())
 			{
 elog(DEBUG1, "uneligible to import %s due to summary",
 	 funcName.data());
 continue;
 			}
+#endif
 
 			if ((int) fs->instCount() > threshold)
 			{
@@ -339,8 +352,10 @@ llvm_execute_inline_plan(llvm::Module *mod, ImportMapTy *globalsToInline)
 
 #if LLVM_VERSION_MAJOR > 4
 #define IRMOVE_PARAMS , /*IsPerformingImport=*/false
-#else
+#elif LLVM_VERSION_MAJOR > 3
 #define IRMOVE_PARAMS , /*LinkModuleInlineAsm=*/false, /*IsPerformingImport=*/false
+#else
+#define IRMOVE_PARAMS
 #endif
 		if (

Re: JIT compiling with LLVM v9.1

2018-02-05 Thread Pierre Ducroquet
On Sunday, February 4, 2018 12:45:50 AM CET Andreas Karlsson wrote:
> On 02/02/2018 10:48 AM, Pierre Ducroquet wrote:
> > I have successfully built the JIT branch against LLVM 4.0.1 on Debian
> > testing. This is not enough for Debian stable (LLVM 3.9 is the latest
> > available there), but it's a first step.
> > I've split the patch into four files. The first three fix the build issues,
> > the last one fixes a runtime issue.
> > I think they are small enough not to be a burden for you in your
> > developments. But if you don't want to carry these ifdefs right now, I can
> > maintain them in a branch on a personal git and rebase as frequently as I
> > can.
> 
> I tested these patches and while the code built for me and passed the
> test suite on Debian testing I have a weird bug where the very first
> query fails to JIT while the rest work as they should. I think I need to
> dig into LLVM's codebase to see what it is, but can you reproduce this
> bug on your machine?
> 
> Code to reproduce:
> 
> SET jit_expressions = true;
> SET jit_above_cost = 0;
> SELECT 1;
> SELECT 1;
> 
> Output:
> 
> postgres=# SELECT 1;
> ERROR:  failed to jit module
> postgres=# SELECT 1;
>   ?column?
> --
>  1
> (1 row)
> 
> Config:
> 
> Version: Your patches applied on top of
> 302b7a284d30fb0e00eb5f0163aa933d4d9bea10
> OS: Debian testing
> llvm/clang: 4.0.1-8
> 
> Andreas


I have fixed the patches; I was wrong in 'guessing' how the API migrated for
one function.
I have rebuilt the whole patch set. It is still based on 302b7a284d and has
been tested with both LLVM 3.9 and 4.0 on Debian testing.

Thanks for your feedback !
From ebf577fc6b1e74ddb2f6b9aef1c2632ab807dd70 Mon Sep 17 00:00:00 2001
From: Pierre <pierre.ducroq...@people-doc.com>
Date: Fri, 2 Feb 2018 09:11:55 +0100
Subject: [PATCH 1/6] Add support for LLVM4 in llvmjit.c

---
 src/backend/lib/llvmjit.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/src/backend/lib/llvmjit.c b/src/backend/lib/llvmjit.c
index 8e5ba94c98..046cd53ef3 100644
--- a/src/backend/lib/llvmjit.c
+++ b/src/backend/lib/llvmjit.c
@@ -230,12 +230,19 @@ llvm_get_function(LLVMJitContext *context, const char *funcname)
 
 		addr = 0;
 		if (LLVMOrcGetSymbolAddressIn(handle->stack, &addr, handle->orc_handle, mangled))
-			elog(ERROR, "failed to lookup symbol");
+			elog(ERROR, "failed to lookup symbol %s", mangled);
 		if (addr)
 			return (void *) addr;
 	}
 #endif
 
+#if LLVM_VERSION_MAJOR < 5
+	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt0_orc, mangled)))
+		return (void *) addr;
+	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt3_orc, mangled)))
+		return (void *) addr;
+	elog(ERROR, "failed to lookup symbol %s for %s", mangled, funcname);
+#else
 	if (LLVMOrcGetSymbolAddress(llvm_opt0_orc, &addr, mangled))
 		elog(ERROR, "failed to lookup symbol");
 	if (addr)
@@ -244,7 +251,7 @@ llvm_get_function(LLVMJitContext *context, const char *funcname)
 		elog(ERROR, "failed to lookup symbol");
 	if (addr)
 		return (void *) addr;
-
+#endif
 	elog(ERROR, "failed to JIT: %s", funcname);
 
 	return NULL;
@@ -380,11 +387,18 @@ llvm_compile_module(LLVMJitContext *context)
 	 * faster instruction selection mechanism is used.
 	 */
 	{
-		LLVMSharedModuleRef smod;
 		instr_time tb, ta;
 
 		/* emit the code */
 		INSTR_TIME_SET_CURRENT(ta);
+#if LLVM_VERSION_MAJOR < 5
+		orc_handle = LLVMOrcAddEagerlyCompiledIR(compile_orc, context->module,
+			 llvm_resolve_symbol, NULL);
+		// It seems there is no error return from that function in LLVM < 5.
+#else
+		LLVMSharedModuleRef smod;
+
 		smod = LLVMOrcMakeSharedModule(context->module);
 		if (LLVMOrcAddEagerlyCompiledIR(compile_orc, &orc_handle, smod,
 		llvm_resolve_symbol, NULL))
@@ -392,6 +406,7 @@ llvm_compile_module(LLVMJitContext *context)
 			elog(ERROR, "failed to jit module");
 		}
 		LLVMOrcDisposeSharedModuleRef(smod);
+#endif
 		INSTR_TIME_SET_CURRENT(tb);
 		INSTR_TIME_SUBTRACT(tb, ta);
 		ereport(DEBUG1, (errmsg("time to emit: %.3fs",
-- 
2.15.1

From f90c156c9b26eda38d3a1312c3de87ffba50340a Mon Sep 17 00:00:00 2001
From: Pierre <pierre.ducroq...@people-doc.com>
Date: Fri, 2 Feb 2018 09:13:40 +0100
Subject: [PATCH 2/6] Add LLVM4 support in llvmjit_error.cpp

---
 src/backend/lib/llvmjit_error.cpp | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/backend/lib/llvmjit_error.cpp b/src/backend/lib/llvmjit_error.cpp
index 70cecd114b..04e51b2a31 100644
--- a/src/backend/lib/llvmjit_error.cpp
+++ b/src/backend/lib/llvmjit_error.cpp
@@ -56,7 +56,9 @@ llvm_enter_fatal_on_oom(void)
 	if (fatal_new_handler_depth == 0)
 	{
 		old_new_handler = std::set_new_handler(fatal_system_new_handler);
+#if LLVM

Re: JIT compiling with LLVM v9.1

2018-02-05 Thread Pierre Ducroquet
On Sunday, February 4, 2018 12:45:50 AM CET Andreas Karlsson wrote:
> On 02/02/2018 10:48 AM, Pierre Ducroquet wrote:
> > I have successfully built the JIT branch against LLVM 4.0.1 on Debian
> > testing. This is not enough for Debian stable (LLVM 3.9 is the latest
> > available there), but it's a first step.
> > I've split the patch into four files. The first three fix the build issues,
> > the last one fixes a runtime issue.
> > I think they are small enough not to be a burden for you in your
> > developments. But if you don't want to carry these ifdefs right now, I can
> > maintain them in a branch on a personal git and rebase as frequently as I
> > can.
> 
> I tested these patches and while the code built for me and passed the
> test suite on Debian testing I have a weird bug where the very first
> query fails to JIT while the rest work as they should. I think I need to
> dig into LLVM's codebase to see what it is, but can you reproduce this
> bug on your machine?
> 
> Code to reproduce:
> 
> SET jit_expressions = true;
> SET jit_above_cost = 0;
> SELECT 1;
> SELECT 1;
> 
> Output:
> 
> postgres=# SELECT 1;
> ERROR:  failed to jit module
> postgres=# SELECT 1;
>   ?column?
> --
>  1
> (1 row)
> 
> Config:
> 
> Version: Your patches applied on top of
> 302b7a284d30fb0e00eb5f0163aa933d4d9bea10
> OS: Debian testing
> llvm/clang: 4.0.1-8
> 
> Andreas

Hi

Indeed, thanks for reporting this. I scripted the testing but failed to see
it; I forgot to set ON_ERROR_STOP.
I will look into this and fix it.

Thanks

 Pierre



Re: JIT compiling with LLVM v10.0

2018-02-07 Thread Pierre Ducroquet
	llvm::install_fatal_error_handler(fatal_llvm_error_handler);
 	}
 	fatal_new_handler_depth++;
@@ -72,7 +74,9 @@ llvm_leave_fatal_on_oom(void)
 	if (fatal_new_handler_depth == 0)
 	{
 		std::set_new_handler(old_new_handler);
+#if LLVM_VERSION_MAJOR > 4
 		llvm::remove_bad_alloc_error_handler();
+#endif
 		llvm::remove_fatal_error_handler();
 	}
 }
@@ -87,7 +91,9 @@ llvm_reset_after_error(void)
 	if (fatal_new_handler_depth != 0)
 	{
 		std::set_new_handler(old_new_handler);
+#if LLVM_VERSION_MAJOR > 4
 		llvm::remove_bad_alloc_error_handler();
+#endif
 		llvm::remove_fatal_error_handler();
 	}
 	fatal_new_handler_depth = 0;
-- 
2.16.1

From 57477cd2005e78b70d8409d422327b23eb71737a Mon Sep 17 00:00:00 2001
From: Pierre <pierre.ducroq...@people-doc.com>
Date: Fri, 2 Feb 2018 09:23:56 +0100
Subject: [PATCH 3/8] Add LLVM4 support in llvmjit_inline.cpp

---
 src/backend/jit/llvm/llvmjit_inline.cpp | 36 +++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit_inline.cpp b/src/backend/jit/llvm/llvmjit_inline.cpp
index 7d0d18b43d..4fa0e5ab64 100644
--- a/src/backend/jit/llvm/llvmjit_inline.cpp
+++ b/src/backend/jit/llvm/llvmjit_inline.cpp
@@ -100,6 +100,13 @@ llvm_inline(LLVMModuleRef M)
 	llvm_execute_inline_plan(mod, globalsToInline.get());
 }
 
+#if LLVM_VERSION_MAJOR < 5
+bool operator!(const llvm::ValueInfo &vi) {
+	return !(  (vi.Kind == llvm::ValueInfo::VI_GUID && vi.TheValue.Id)
+		|| (vi.Kind == llvm::ValueInfo::VI_Value && vi.TheValue.GV));
+}
+#endif
+
 /*
  * Build information necessary for inlining external function references in
  * mod.
@@ -146,7 +153,14 @@ llvm_build_inline_plan(llvm::Module *mod)
 		if (threshold == -1)
 			continue;
 
+#if LLVM_VERSION_MAJOR > 4
 		llvm::ValueInfo funcVI = llvm_index->getValueInfo(funcGUID);
+#else
+		const llvm::const_gvsummary_iterator I = llvm_index->findGlobalValueSummaryList(funcGUID);
+		if (I == llvm_index->end())
+			continue;
+		llvm::ValueInfo funcVI = llvm::ValueInfo(I->first);
+#endif
 
 		/* if index doesn't know function, we don't have a body, continue */
 		if (!funcVI)
@@ -157,7 +171,12 @@ llvm_build_inline_plan(llvm::Module *mod)
 		 * look up module(s), check if function actually is defined (there
 		 * could be hash conflicts).
 		 */
+#if LLVM_VERSION_MAJOR > 4
 		for (const auto &gvs : funcVI.getSummaryList())
+#else
+		auto it_gvs = llvm_index->findGlobalValueSummaryList(funcVI.getGUID());
+		for (const auto &gvs : it_gvs->second)
+#endif
 		{
 			const llvm::FunctionSummary *fs;
 			llvm::StringRef modPath = gvs->modulePath();
@@ -318,9 +337,14 @@ llvm_execute_inline_plan(llvm::Module *mod, ImportMapTy *globalsToInline)
 
 		}
 
+#if LLVM_VERSION_MAJOR > 4
+#define IRMOVE_PARAMS , /*IsPerformingImport=*/false
+#else
+#define IRMOVE_PARAMS , /*LinkModuleInlineAsm=*/false, /*IsPerformingImport=*/false
+#endif
 		if (Mover.move(std::move(importMod), GlobalsToImport.getArrayRef(),
-	   [](llvm::GlobalValue &, llvm::IRMover::ValueAdder) {},
-	   /*IsPerformingImport=*/false))
+	   [](llvm::GlobalValue &, llvm::IRMover::ValueAdder) {}
+	   IRMOVE_PARAMS))
 			elog(ERROR, "function import failed with linker error");
 	}
 }
@@ -619,9 +643,17 @@ llvm_load_index(void)
 elog(ERROR, "failed to open %s: %s", subpath,
 	 EC.message().c_str());
 			llvm::MemoryBufferRef ref(*MBOrErr.get().get());
+#if LLVM_VERSION_MAJOR > 4
 			llvm::Error e = llvm::readModuleSummaryIndex(ref, *index, 0);
 			if (e)
 elog(ERROR, "could not load summary at %s", subpath);
+#else
+			std::unique_ptr<llvm::ModuleSummaryIndex> subindex = std::move(llvm::getModuleSummaryIndex(ref).get());
+			if (!subindex)
+elog(ERROR, "could not load summary at %s", subpath);
+			else
+index->mergeFrom(std::move(subindex), 0);
+#endif
 		}
 	}
 
-- 
2.16.1

From e0b3147fe7b4eec8593b62c78d7491796c9e9a2f Mon Sep 17 00:00:00 2001
From: Pierre <pierre.ducroq...@people-doc.com>
Date: Fri, 2 Feb 2018 10:34:09 +0100
Subject: [PATCH 4/8] Don't emit bitcode depending on an LLVM 5+ function

---
 src/backend/jit/llvm/llvmjit_expr.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index a06319b1b6..4b3c5367e5 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -170,7 +170,11 @@ get_LifetimeEnd(LLVMModuleRef mod)
 	LLVMTypeRef sig;
 	LLVMValueRef fn;
 	LLVMTypeRef param_types[2];
+#if LLVM_VERSION_MAJOR > 4
 	const char *nm = "llvm.lifetime.end.p0i8";
+#else
+	const char *nm = "llvm.lifetime.end";
+#endif
 
 	fn = LLVMGetNamedFunction(mod, nm);
 	if (fn)
-- 
2.16.1

From 1df043c0345f02a58a68bae0c671dbbb4a5d97f1 Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet <pina...@pinaraf.info>
Date: Wed, 7 Feb 2018 20:22:37 +0100
Subject: [PATCH 5/8] Fix warning

---
 src/b

Re: JIT compiling with LLVM v9.1

2018-02-05 Thread Pierre Ducroquet
On Monday, February 5, 2018 10:20:27 PM CET Andreas Karlsson wrote:
> OK that fixed the issue, but you have a typo in your patch set.
> 
> diff --git a/src/backend/lib/llvmjit_inline.cpp
> b/src/backend/lib/llvmjit_inline.cpp
> index a785261bea..51f38e10d2 100644
> --- a/src/backend/lib/llvmjit_inline.cpp
> +++ b/src/backend/lib/llvmjit_inline.cpp
> @@ -37,7 +37,7 @@ extern "C"
>   #include 
>   #include 
>   #include 
> -#if LLVM_MAJOR_VERSION > 3
> +#if LLVM_VERSION_MAJOR > 3
>   #include <llvm/Bitcode/BitcodeReader.h>
>   #else
>   #include "llvm/Bitcode/ReaderWriter.h"

Thanks, it's weird I had no issue with it. I will fix it in the next patch set.

> Also I get some warning. Not sure if they are from your patches or from
> Andres's.
> 
> llvmjit_error.cpp:118:1: warning: unused function
> 'fatal_llvm_new_handler' [-Wunused-function]
> fatal_llvm_new_handler(void *user_data,
> ^
> 1 warning generated.
> llvmjit_inline.cpp:114:6: warning: no previous prototype for function
> 'operator!' [-Wmissing-prototypes]
> bool operator!(const llvm::ValueInfo &vi) {
>   ^
> 1 warning generated.

Both are mine; I knew about the first one, but I did not see the second one. I
will fix them too, thanks for the review!

> psqlscanslash.l: In function ‘psql_scan_slash_option’:
> psqlscanslash.l:550:8: warning: variable ‘lexresult’ set but not used
> [-Wunused-but-set-variable]
>int   final_state;
>  ^

I'm not sure Andres's patches have anything to do with psql; that warning is surprising.




Re: JIT compiling with LLVM v10.1

2018-02-14 Thread Pierre Ducroquet
On Wednesday, February 14, 2018 7:17:10 PM CET Andres Freund wrote:
> Hi,
> 
> On 2018-02-07 06:54:05 -0800, Andres Freund wrote:
> > I've pushed v10.0. The big (and pretty painful to make) change is that
> > now all the LLVM specific code lives in src/backend/jit/llvm, which is
> > built as a shared library which is loaded on demand.
> > 
> > The layout is now as follows:
> > 
> > src/backend/jit/jit.c:
> > Part of JITing always linked into the server. Supports loading the
> > LLVM using JIT library.
> > 
> > src/backend/jit/llvm/
> > 
> > Infrastructure:
> >  llvmjit.c:
> > General code generation and optimization infrastructure
> >  
> >  llvmjit_error.cpp, llvmjit_wrap.cpp:
> > Error / backward compat wrappers
> >  
> >  llvmjit_inline.cpp:
> > Cross module inlining support
> > 
> > Code-Gen:
> >   llvmjit_expr.c
> >   
> > Expression compilation
> >   
> >   llvmjit_deform.c
> >   
> > Deform compilation
> 
> I've pushed a revised version that hopefully should address Jeff's
> wish/need of being able to experiment with this out of core. There's now
> a "jit_provider" PGC_POSTMASTER GUC that's by default set to
> "llvmjit". llvmjit.so is the .so implementing JIT using LLVM. It fills a
> set of callbacks via
> extern void _PG_jit_provider_init(JitProviderCallbacks *cb);
> which can also be implemented by any other potential provider.
> 
> The other two biggest changes are that I've added a README
> https://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;
> f=src/backend/jit/README;hb=jit and that I've revised the configure support
> so it does more error
> checks, and moved it into config/llvm.m4.
> 
> There's a larger smattering of small changes too.
> 
> I'm pretty happy with how the separation of core / shlib looks now. I'm
> planning to work on cleaning and then pushing some of the preliminary
> patches (fixed tupledesc, grouping) over the next few days.
> 
> Greetings,
> 
> Andres Freund

Hi

Here are the LLVM4 and LLVM3.9 compatibility patches.
It built successfully, and I executed some silly queries with JIT forced to
make sure it worked.

 Pierre

From c856a5db2f0ba34ba7c230a65f60277ae0e7347f Mon Sep 17 00:00:00 2001
From: Pierre <pierre.ducroq...@people-doc.com>
Date: Fri, 2 Feb 2018 09:11:55 +0100
Subject: [PATCH 1/8] Add support for LLVM4 in llvmjit.c

Signed-off-by: Pierre Ducroquet <pina...@pinaraf.info>
---
 src/backend/jit/llvm/llvmjit.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index 7a96ece0f7..7557dc9a19 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -222,13 +222,20 @@ llvm_get_function(LLVMJitContext *context, const char *funcname)
 
 		addr = 0;
 		if (LLVMOrcGetSymbolAddressIn(handle->stack, &addr, handle->orc_handle, mangled))
-			elog(ERROR, "failed to lookup symbol");
+			elog(ERROR, "failed to lookup symbol %s", mangled);
 		if (addr)
 			return (void *) addr;
 	}
 
 #else
 
+#if LLVM_VERSION_MAJOR < 5
+	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt0_orc, mangled)))
+		return (void *) addr;
+	if ((addr = LLVMOrcGetSymbolAddress(llvm_opt3_orc, mangled)))
+		return (void *) addr;
+	elog(ERROR, "failed to lookup symbol %s for %s", mangled, funcname);
+#else
 	if (LLVMOrcGetSymbolAddress(llvm_opt0_orc, &addr, mangled))
 		elog(ERROR, "failed to lookup symbol");
 	if (addr)
@@ -237,6 +244,8 @@ llvm_get_function(LLVMJitContext *context, const char *funcname)
 		elog(ERROR, "failed to lookup symbol");
 	if (addr)
 		return (void *) addr;
+#endif // LLVM_VERSION_MAJOR
+
 #endif
 
 	elog(ERROR, "failed to JIT: %s", funcname);
@@ -374,11 +383,18 @@ llvm_compile_module(LLVMJitContext *context)
 	 * faster instruction selection mechanism is used.
 	 */
 	{
-		LLVMSharedModuleRef smod;
 		instr_time tb, ta;
 
 		/* emit the code */
 		INSTR_TIME_SET_CURRENT(ta);
+#if LLVM_VERSION_MAJOR < 5
+		orc_handle = LLVMOrcAddEagerlyCompiledIR(compile_orc, context->module,
+			 llvm_resolve_symbol, NULL);
+		// It seems there is no error return from that function in LLVM < 5.
+#else
+		LLVMSharedModuleRef smod;
+
 		smod = LLVMOrcMakeSharedModule(context->module);
 		if (LLVMOrcAddEagerlyCompiledIR(compile_orc, &orc_handle, smod,
 		llvm_resolve_symbol, NULL))
@@ -386,6 +402,7 @@ llvm_compile_module(LLVMJitContext *context)
 			elog(ERROR, "failed to jit module");
 		}
 		LLVMOrcDisposeSharedModuleRef(smod);
+#endif
 		INSTR_TIME_SET_CURRENT(tb);
 		INSTR_TIME_SUBTRACT(tb, ta);
 		ereport(DEBUG1, (

Re: Query is over 2x slower with jit=on

2018-08-22 Thread Pierre Ducroquet
On Wednesday, August 22, 2018 6:51:55 PM CEST Andres Freund wrote:
> On 2018-08-22 18:39:18 +0200, Andreas Joseph Krogh wrote:
> > Just to be clear; The query really runs slower (wall-clock time), it's not
> > just the timing.
> 
> I bet it's not actually running slower, it "just" takes longer to start
> up due to the JITing in each worker. I suspect what we should do is to
> multiply the cost limits by the number of workers, to model that.  But
> without the fixed instrumentation that's harder to see...

It depends on the query. It has been shown in other threads that a query can
indeed take longer to run because of JITing: if the cost is too low to fire
the LLVM optimizer, the generated code can be so bad that it will be slower
than the non-JIT executor.
Cf for instance a previous discussion here:
http://www.postgresql-archive.org/PATCH-LLVM-tuple-deforming-improvements-td6029385.html

I think it would be interesting to try the query from this thread with a patch
forcing the LLVM codegen to O1 (I found no PassManager there to play with, it
seems to be an off/on/extreme switch; patch
0001-LLVM-Use-the-O1-CodeGen-level.patch from the thread mentioned above).






Re: [PATCH] LLVM tuple deforming improvements

2018-07-13 Thread Pierre Ducroquet
On Friday, July 13, 2018 11:08:45 PM CEST Andres Freund wrote:
> Hi,
> 
> Thanks for looking at this!
> 
> On 2018-07-13 10:20:42 +0200, Pierre Ducroquet wrote:
> > 2) improve the LLVM IR code
> > 
> > The code generator in llvmjit_deform.c currently relies on the LLVM
> > optimizer to do the right thing. For instance, it can generate a lot of
> > empty blocks with only a jump. If we don't want to enable the LLVM
> > optimizer for every piece of code, we have to get rid of this kind of pattern. The
> > attached patch does that. When the optimizer is not used, this gets a few
> > cycles boost, nothing impressive. I have tried to go closer to the
> > optimized bitcode, but it requires building phi nodes manually instead of
> > using alloca, and this isn't enough to bring us to the performance level
> > of -O1.
> 
> Building phi blocks manually is too painful imo. But there are plenty of
> blocks we could easily skip entirely, even without creating phi nodes.
> 
> > From 4da278ee49b91d34120747c6763c248ad52da7b7 Mon Sep 17 00:00:00 2001
> > From: Pierre Ducroquet 
> > Date: Mon, 2 Jul 2018 13:44:10 +0200
> > Subject: [PATCH] Introduce opt1 in LLVM/JIT, and force it with deforming
> 
> I think I'd rather go for more explicit pipelines than defaulting to OX
> pipelines.  This is too late for v11, and I suspect quite strongly that
> we'll end up not relying on any of the default pipelines going forward -
> they're just not targeted at our usecase.  I just didn't go there for
> v11, because I was running out of steam / time.

Hi

After looking at the optimization passes, I noticed that, at least in this
case, most of the benefit does not come from any of the bitcode-level passes.
Using a -O3 pipeline only on the bitcode gives at best a 20% performance
boost.
Using -O0 on the bitcode but changing the codegen opt level from None (O0) to
Less (O1) yields even better performance than a complete O1.
The attached patch alone gives a query time of 650 ms, vs 725 ms for 'full' O1
and 770 ms for O3.

As far as I know (and from quickly looking at the LLVM code, it doesn't look
like this changed recently), the CodeGen part doesn't expose a pass manager,
making it impossible to have our own optimization pipeline there, and we do
not control the code generation directly since we rely on the ORC execution
engine.


Regards

 Pierre
From 8b66f60869e285b6f45f3cb900f8c1df44df15ee Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet 
Date: Sat, 14 Jul 2018 01:51:31 +0200
Subject: [PATCH] LLVM - Use the O1 CodeGen level

---
 src/backend/jit/llvm/llvmjit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
index 5d0cdab1fc..a73619ae2f 100644
--- a/src/backend/jit/llvm/llvmjit.c
+++ b/src/backend/jit/llvm/llvmjit.c
@@ -643,7 +643,7 @@ llvm_session_initialize(void)
 
 	llvm_opt0_targetmachine =
 		LLVMCreateTargetMachine(llvm_targetref, llvm_triple, cpu, features,
-LLVMCodeGenLevelNone,
+LLVMCodeGenLevelLess,
 LLVMRelocDefault,
 LLVMCodeModelJITDefault);
 	llvm_opt3_targetmachine =
-- 
2.18.0



Re: function lca('{}'::ltree[]) caused DB Instance crash

2018-07-13 Thread Pierre Ducroquet
On Friday, July 13, 2018 12:09:20 PM CEST 李海龙 wrote:
> Hi Oleg && pgsql-hackers,
> 
> Please help me check whether this is a bug in ltree?
> 

Hi

There is indeed a bug. The _lca function in _ltree_op.c tries to allocate 0
bytes of memory, doesn't initialize it, and dereferences it in lca_inner.
The attached basic patch fixes it.

Regards

 Pierre
From 4e59747cea428d39c80974c408e95ba86bf63ecc Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet 
Date: Fri, 13 Jul 2018 12:47:36 +0200
Subject: [PATCH] Fix segfault with lca('{}'::ltree[])

---
 contrib/ltree/_ltree_op.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/contrib/ltree/_ltree_op.c b/contrib/ltree/_ltree_op.c
index 9bb6bcaeff..6afcd95cd1 100644
--- a/contrib/ltree/_ltree_op.c
+++ b/contrib/ltree/_ltree_op.c
@@ -297,6 +297,9 @@ _lca(PG_FUNCTION_ARGS)
 	ltree	  **a,
 			   *res;
 
+	if (num == 0)
+		PG_RETURN_NULL();
+
 	if (ARR_NDIM(la) > 1)
 		ereport(ERROR,
 (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
-- 
2.18.0



[PATCH] LLVM tuple deforming improvements

2018-07-13 Thread Pierre Ducroquet
Hi

As reported in the «effect of JIT tuple deform?» thread, JIT tuple deforming
causes slowdowns in some cases.
I've played with the generated code and with the LLVM optimizer trying to fix
that issue; here are the results of my experiments, with the corresponding
patches.

All performance measurements are done following the test from
https://www.postgresql.org/message-id/CAFj8pRAOcSXNnykfH=M6mNaHo+g=FaUs=dldzsohdjbkujr...@mail.gmail.com

Base measurements : 

No JIT : 850ms 
JIT without tuple deforming : 820 ms (0.2ms optimizing)
JIT with tuple deforming, no opt : 1650 ms (1.5ms)
JIT with tuple deforming, -O3 : 770 ms (105ms)

1) force a -O1 when deforming

This is by far the best I managed to get. With -O1, the queries are even
faster than with -O3 since the optimizer is faster, while still generating
efficient code.
I have tried adding the right passes to the pass manager, but it looks like
the interesting ones are not available unless you enable -O1.

JIT with tuple deforming, -O1 : 725 ms (54ms)

2) improve the LLVM IR code

The code generator in llvmjit_deform.c currently relies on the LLVM optimizer
to do the right thing. For instance, it can generate a lot of empty blocks
containing only a jump. If we don't want to enable the LLVM optimizer for
every piece of code, we have to get rid of this kind of pattern. The attached
patch does that (a standalone sketch of the pattern follows the timing below).
When the optimizer is not used, this gets a few cycles boost, nothing
impressive.
I have tried to go closer to the optimized bitcode, but it requires building
phi nodes manually instead of using alloca, and this isn't enough to bring us
to the performance level of -O1.

JIT with tuple deforming, no opt : 1560 ms (1.5ms)
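Here is the standalone sketch mentioned above (illustration only; the real
code generator lives in llvmjit_deform.c). The "forwarder" block contains
nothing but a jump, and without the optimizer that jump survives into the
emitted machine code; the fix is simply to branch straight to "body" and never
create the block:

/* Build with: clang demo.c $(llvm-config --cflags --ldflags --libs core) */
#include <llvm-c/Core.h>

int
main(void)
{
	LLVMModuleRef mod = LLVMModuleCreateWithName("deform_demo");
	LLVMTypeRef fnty = LLVMFunctionType(LLVMInt32Type(), NULL, 0, 0);
	LLVMValueRef fn = LLVMAddFunction(mod, "demo", fnty);
	LLVMBuilderRef b = LLVMCreateBuilder();

	LLVMBasicBlockRef entry = LLVMAppendBasicBlock(fn, "entry");
	LLVMBasicBlockRef forwarder = LLVMAppendBasicBlock(fn, "forwarder");
	LLVMBasicBlockRef body = LLVMAppendBasicBlock(fn, "body");

	/* Wasteful shape: entry -> forwarder -> body, forwarder is empty. */
	LLVMPositionBuilderAtEnd(b, entry);
	LLVMBuildBr(b, forwarder);
	LLVMPositionBuilderAtEnd(b, forwarder);
	LLVMBuildBr(b, body);	/* this jump should have been emitted from entry */

	LLVMPositionBuilderAtEnd(b, body);
	LLVMBuildRet(b, LLVMConstInt(LLVMInt32Type(), 0, 0));

	LLVMDumpModule(mod);	/* the jump-only block is visible in the IR */
	LLVMDisposeBuilder(b);
	LLVMDisposeModule(mod);
	return 0;
}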

3) *experimental* : faster non-NULL handling

Currently, the generated code always looks at the tuple header null bitmap to
check each field's null-ness, and then ANDs the result with the hasnulls bit.
Checking only for hasnulls improves performance when tuples are mostly
null-less, but taxes performance when nulls are found.
I have not yet succeeded in implementing it, but I think that, using the
statistics collected for a given table, we could enable it when we know we
would benefit from it. (A plain-C sketch of the two strategies follows the
timings below.)

JIT with tuple deforming, no opt : 1520 ms (1.5ms)
JIT with tuple deforming, -O1 : 690 ms (54ms)
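Here is the plain-C sketch of the two strategies mentioned above (illustration
only; the JIT emits the equivalent logic as LLVM IR in llvmjit_deform.c, and
the helper names are made up):

#include "postgres.h"
#include "access/htup_details.h"

/* Current shape: consult the null bitmap for every single attribute. */
static bool
attr_is_null(HeapTupleHeader tup, int attnum)	/* attnum is 0-based */
{
	return (tup->t_infomask & HEAP_HASNULL) != 0 &&
		att_isnull(attnum, tup->t_bits);
}

/* Proposed fast path: branch once on hasnulls, then deform the whole
 * tuple without any per-attribute bitmap test when it has no nulls. */
static void
deform_all(HeapTupleHeader tup, int natts)
{
	int			attnum;

	if ((tup->t_infomask & HEAP_HASNULL) == 0)
	{
		for (attnum = 0; attnum < natts; attnum++)
		{
			/* ... deform attribute, known to be non-null ... */
		}
	}
	else
	{
		for (attnum = 0; attnum < natts; attnum++)
		{
			if (att_isnull(attnum, tup->t_bits))
				continue;	/* ... store a NULL datum ... */
			/* ... deform attribute ... */
		}
	}
}
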
From 14a226107f845454676a2e14ae0fb843a5b4f668 Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet 
Date: Wed, 11 Jul 2018 23:41:59 +0200
Subject: [PATCH 1/2] Check for the hasnulls attribute before checking
 individual fields

---
 src/backend/jit/llvm/llvmjit_deform.c | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/src/backend/jit/llvm/llvmjit_deform.c b/src/backend/jit/llvm/llvmjit_deform.c
index 795f67114e..c53855eb63 100644
--- a/src/backend/jit/llvm/llvmjit_deform.c
+++ b/src/backend/jit/llvm/llvmjit_deform.c
@@ -48,6 +48,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 	LLVMBasicBlockRef b_out;
 	LLVMBasicBlockRef b_dead;
 	LLVMBasicBlockRef *attcheckattnoblocks;
+	LLVMBasicBlockRef *attfaststartblocks;
 	LLVMBasicBlockRef *attstartblocks;
 	LLVMBasicBlockRef *attisnullblocks;
 	LLVMBasicBlockRef *attcheckalignblocks;
@@ -145,6 +146,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 	b = LLVMCreateBuilder();
 
 	attcheckattnoblocks = palloc(sizeof(LLVMBasicBlockRef) * natts);
+	attfaststartblocks = palloc(sizeof(LLVMBasicBlockRef) * natts);
 	attstartblocks = palloc(sizeof(LLVMBasicBlockRef) * natts);
 	attisnullblocks = palloc(sizeof(LLVMBasicBlockRef) * natts);
 	attcheckalignblocks = palloc(sizeof(LLVMBasicBlockRef) * natts);
@@ -239,6 +241,8 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 	{
 		attcheckattnoblocks[attnum] =
 			l_bb_append_v(v_deform_fn, "block.attr.%d.attcheckattno", attnum);
+		attfaststartblocks[attnum] =
+			l_bb_append_v(v_deform_fn, "block.attr.%d.faststart", attnum);
 		attstartblocks[attnum] =
 			l_bb_append_v(v_deform_fn, "block.attr.%d.start", attnum);
 		attisnullblocks[attnum] =
@@ -337,7 +341,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 
 		/*
 		 * If this is the first attribute, slot->tts_nvalid was 0. Therefore
-		 * reset offset to 0 to, it be from a previous execution.
+		 * reset offset to 0 too, it could be from a previous execution.
 		 */
 		if (attnum == 0)
 		{
@@ -351,7 +355,7 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 		 */
 		if (attnum <= guaranteed_column_number)
 		{
-			LLVMBuildBr(b, attstartblocks[attnum]);
+			LLVMBuildBr(b, attfaststartblocks[attnum]);
 		}
 		else
 		{
@@ -361,8 +365,19 @@ slot_compile_deform(LLVMJitContext *context, TupleDesc desc, int natts)
 	 l_attno,
 	 v_maxatt,
 	 "heap_natts");
-			LLVMBuildCondBr(b, v_islast, b_out, attstartblocks[attnum]);
+			LLVMBuildCon

Re: Build fails with different versions of clang and LLVM

2018-04-23 Thread Pierre Ducroquet
On Monday, April 23, 2018 10:33:04 AM CEST Heikki Linnakangas wrote:
> Hi!
> 
> I tried compiling with --with-llvm on my laptop, and got this:
> > $ make -s
> > All of PostgreSQL successfully made. Ready to install.
> > $ make -s install
> > Invalid Summary Block: version expected
> > error: can't create ModuleSummaryIndexObjectFile for buffer: Corrupted
> > bitcode #0 0x7fbe98032085
> > llvm::sys::PrintStackTrace(llvm::raw_ostream&)
> > (/usr/lib/llvm-3.9/bin/../lib/libLLVM-3.9.so.1+0x707085) #1
> > 0x7fbe9803023e llvm::sys::RunSignalHandlers()
> > (/usr/lib/llvm-3.9/bin/../lib/libLLVM-3.9.so.1+0x70523e) #2
> > 0x7fbe98030362
> > (/usr/lib/llvm-3.9/bin/../lib/libLLVM-3.9.so.1+0x705362) #3
> > 0x7fbe9771f160 __restore_rt
> > (/lib/x86_64-linux-gnu/libpthread.so.0+0x12160) #4 0x7fbe985cfba4
> > (/usr/lib/llvm-3.9/bin/../lib/libLLVM-3.9.so.1+0xca4ba4) #5
> > 0x7fbe985e3318 llvm::WriteIndexToFile(llvm::ModuleSummaryIndex
> > const&, llvm::raw_ostream&, std::map<std::string, std::map<unsigned long,
> > llvm::GlobalValueSummary*> >*) [allocator and comparator template
> > arguments elided] (/usr/lib/llvm-3.9/bin/../lib/libLLVM-3.9.so.1+0xcb8318) #6
> > 0x55d1120f4ac7 (/usr/lib/llvm-3.9/bin/llvm-lto+0x19ac7)
> > #7 0x55d1120f5f95 (/usr/lib/llvm-3.9/bin/llvm-lto+0x1af95)
> > #8 0x55d1120ea0f0 _init (/usr/lib/llvm-3.9/bin/llvm-lto+0xf0f0)
> > #9 0x7fbe96a96f2a __libc_start_main
> > (/lib/x86_64-linux-gnu/libc.so.6+0x20f2a) #10 0x55d1120ebe8a _init
> > (/usr/lib/llvm-3.9/bin/llvm-lto+0x10e8a) Stack dump:
> > 0.  Program arguments: /usr/lib/llvm-3.9/bin/llvm-lto -thinlto
> > -thinlto-action=thinlink -o postgres.index.bc
> > postgres/access/brin/brin.bc postgres/access/brin/brin_pageops.bc [LOT OF
> > FILES REMOVED TO KEEP THIS SHORT] postgres/utils/fmgrtab.bc
> > postgres/jit/jit.bc Segmentation fault
> > Makefile:246: recipe for target 'install-postgres-bitcode' failed
> > make[2]: *** [install-postgres-bitcode] Error 139
> > Makefile:42: recipe for target 'install-backend-recurse' failed
> > make[1]: *** [install-backend-recurse] Error 2
> > GNUmakefile:11: recipe for target 'install-src-recurse' failed
> > make: *** [install-src-recurse] Error 2
> 
> I think this is because I have a funny combination of clang and LLVM:
> > $ clang --version
> > clang version 3.8.1-24 (tags/RELEASE_381/final)
> > Target: x86_64-pc-linux-gnu
> > Thread model: posix
> > InstalledDir: /usr/bin
> > $ llvm-config --version
> > bash: llvm-config: command not found
> > $ llvm-config-3.9 --version
> > 3.9.1
> 
> So, I have LLVM 3.9 installed, but no clang 3.9. Only clang 3.8.
> 
> The autoconf macro in llvm.m4 says:
> >   # Could check clang version, but it doesn't seem that
> >   # important. Systems with a new enough LLVM version are usually
> >   # going to have a decent clang version too. It's also not entirely
> >   # clear what the minimum version is.
> 
> In light of the above error, I think we should do more. Apparently LLVM
> 3.9 cannot read bitcode files generated by clang 3.8, so we should at
> least check that clang version is 3.9 or above.
> 
> Googling around, the LLVM bitcode format is supposed to be compatible
> across versions, but I'm not sure I trust that. Perhaps we should check
> that the LLVM and clang versions match? Instead of searching for any
> 'clang' program with PGAC_PATH_PROGS, perhaps we should always use
> "`lvm-config-3.9 --bindir`/clang", throwing an error if it doesn't exist.
> 
> BTW, I'm surprised that llvm-lto is invoked in the "make install" stage.
> I would expect it to be done by plain "make" already, and "make install"
> would just copy the files to the right places.

Hi

The bitcode format does not respect strict compatibility rules. LLVM and clang 
versions must match if we don't want any surprises.
See for instance
https://stackoverflow.com/questions/15836430/how-stable-is-the-llvm-assembly-language

In your case, this should have worked, but I am not surprised at all that it 
failed; I had similar issues when testing 3.9/4.0 on the same system. I 
thought the build system already checked for version equality.
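
Until such a check exists, a way to avoid the trap is to pin both tools to 
the same release at configure time, e.g. (using the LLVM_CONFIG and CLANG 
variables that configure honours):

./configure --with-llvm LLVM_CONFIG=/usr/bin/llvm-config-3.9 \
    CLANG=/usr/bin/clang-3.9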


 Pierre





[RFC] Add an until-0 loop in psql

2018-04-24 Thread Pierre Ducroquet
Hi

When running database migrations with .sql files on a live database, it's not 
uncommon to have to run a migration in a loop to prevent a big lock on a 
table.
For instance, if one wants to delete some old data from a big table, one 
would write:

DELETE FROM big_table WHERE id IN (SELECT id FROM big_table WHERE bad = true 
LIMIT 1000);
VACUUM big_table;

Right now, doing this is quite inefficient. We either have to write a script 
in another language, or run psql in a shell loop and wait for the migration to 
stop altering rows.
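
A server-side DO block is not a real alternative either: the whole block runs 
in a single transaction, so there are no intermediate commits and VACUUM 
cannot be executed inside it. A minimal sketch of that dead end, using the 
example above:

DO $$
DECLARE
    affected bigint;
BEGIN
    LOOP
        DELETE FROM big_table WHERE id IN
            (SELECT id FROM big_table WHERE bad = true LIMIT 1000);
        GET DIAGNOSTICS affected = ROW_COUNT;
        EXIT WHEN affected = 0;
        -- no COMMIT and no VACUUM possible here: everything stays in one
        -- transaction, which is exactly what we want to avoid
    END LOOP;
END;
$$;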

The attached **proof of concept** patch (I insist, it's a 15-minute hack 
sprint with no previous knowledge of psql code) implements an 'until-0' loop 
in psql.
The previous migration could be simply written as:

\until-0
BEGIN;
DELETE FROM big_table WHERE id IN (SELECT id FROM big_table WHERE bad = true 
LIMIT 1000);
VACUUM big_table;
COMMIT;
\end-until

And psql will execute it until no rows are affected by the inner queries.

I am willing to write a proper patch for this (I hope the tell/seek approach 
is an acceptable implementation…), but I prefer having some feedback first.

Thanks

 Pierrediff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 4c85f43f09..d706e38ffc 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -401,6 +401,19 @@ exec_command(const char *cmd,
 		status = exec_command_shell_escape(scan_state, active_branch);
 	else if (strcmp(cmd, "?") == 0)
 		status = exec_command_slash_command_help(scan_state, active_branch);
+	else if (strcmp(cmd, "until-0") == 0) {
+		status = PSQL_CMD_SKIP_LINE;
+		pset.is_in_until = ftell(pset.cur_cmd_source);
+		pset.has_affected_rows = false;
+	} else if (strcmp(cmd, "end-until") == 0) {
+		status = PSQL_CMD_SKIP_LINE;
+		if (pset.has_affected_rows) {
+			fseek(pset.cur_cmd_source, pset.is_in_until, SEEK_SET);
+			pset.has_affected_rows = false;
+		} else {
+			pset.is_in_until = 0;
+		}
+	}
 	else
 		status = PSQL_CMD_UNKNOWN;
 
diff --git a/src/bin/psql/mainloop.c b/src/bin/psql/mainloop.c
index c06ce3ca09..869dbf6dcd 100644
--- a/src/bin/psql/mainloop.c
+++ b/src/bin/psql/mainloop.c
@@ -430,6 +430,12 @@ MainLoop(FILE *source)
 {
 	success = SendQuery(query_buf->data);
 	slashCmdStatus = success ? PSQL_CMD_SEND : PSQL_CMD_ERROR;
+	if (success && pset.is_in_until && !pset.has_affected_rows)
+	{
+		const char *row_count = GetVariable(pset.vars, "ROW_COUNT");
+		if (row_count != NULL && strcmp(row_count, "0") != 0)
+			pset.has_affected_rows = true;
+	}
 	pset.stmt_lineno = 1;
 
 	/* transfer query to previous_buf by pointer-swapping */
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index 69e617e6b5..f3f92fe899 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -139,6 +139,9 @@ typedef struct _psqlSettings
 	const char *prompt3;
 	PGVerbosity verbosity;		/* current error verbosity level */
 	PGContextVisibility show_context;	/* current context display level */
+	
+	long		is_in_until;
+	int			has_affected_rows;
 } PsqlSettings;
 
 extern PsqlSettings pset;
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
index be57574cd3..beba8851a3 100644
--- a/src/bin/psql/startup.c
+++ b/src/bin/psql/startup.c
@@ -134,6 +134,8 @@ main(int argc, char *argv[])
 	pset.last_error_result = NULL;
 	pset.cur_cmd_source = stdin;
 	pset.cur_cmd_interactive = false;
+	pset.is_in_until = 0;
+	pset.has_affected_rows = false;
 
 	/* We rely on unmentioned fields of pset.popt to start out 0/false/NULL */
 	pset.popt.topt.format = PRINT_ALIGNED;


Re: JIT compiling with LLVM v12

2018-03-29 Thread Pierre Ducroquet
On Thursday, March 29, 2018 2:39:17 PM CEST Jesper Pedersen wrote:
> Hi Andres,
> 
> On 03/28/2018 05:27 PM, Andres Freund wrote:
> > On 2018-03-27 10:34:26 -0700, Andres Freund wrote:
> >> On 2018-03-27 10:05:47 -0400, Peter Eisentraut wrote:
> >>> On 3/13/18 19:40, Andres Freund wrote:
>  I've pushed a revised and rebased version of my JIT patchset.
> >>> 
> >>> What is the status of this item as far as the commitfest is concerned?
> >> 
> >> 7/10 committed. Inlining, Explain, Docs remain.
> > 
> > I've pushed these three.
> 
> It seems that clang is being picked up as the main compiler in certain
> situations, ala
> 
> ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
> -Wformat-security -fno-strict-aliasing -fwrapv
> -fexcess-precision=standard -g -O0 -fno-omit-frame-pointer
> -I../../../src/include  -D_GNU_SOURCE -I/usr/include/libxml2   -c -o
> auth-scram.o auth-scram.c -MMD -MP -MF .deps/auth-scram.Po
> ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute
> -Wformat-security -fno-strict-aliasing -fwrapv
> -fexcess-precision=standard -g -O0 -fno-omit-frame-pointer
> -I../../../src/include  -D_GNU_SOURCE -I/usr/include/libxml2   -c -o
> be-secure-openssl.o be-secure-openssl.c -MMD -MP -MF
> .deps/be-secure-openssl.Po
> /usr/lib64/ccache/clang -Wno-ignored-attributes -fno-strict-aliasing
> -fwrapv -O2  -I../../../src/include  -D_GNU_SOURCE
> -I/usr/include/libxml2  -flto=thin -emit-llvm -c -o be-fsstubs.bc
> be-fsstubs.c
> /usr/lib64/ccache/clang -Wno-ignored-attributes -fno-strict-aliasing
> -fwrapv -O2  -I../../../src/include  -D_GNU_SOURCE
> -I/usr/include/libxml2  -flto=thin -emit-llvm -c -o namespace.bc namespace.c
> 
> I would expect LLVM to be isolated to the jit/ hierarchy.

Clang is needed to emit the LLVM bitcode required for inlining; the 
"-emit-llvm" flag is used for that. A dual compilation is required for 
inlining to work: one compilation with gcc/clang/msvc/… to build the 
postgresql binary, and one with clang to generate the .bc files for inlining.
It can be surprising, but there is little way around that (or we accept only 
clang to build postgresql, but there would be a riot).




Re: ALTER TABLE does not check for column existence before starting operations

2018-03-02 Thread Pierre Ducroquet
On Friday, March 2, 2018 2:44:16 PM CET David Steele wrote:
> Hi Pierre,
> 
> On 3/2/18 6:36 AM, Pierre Ducroquet wrote:
> > While working on a big table recently, I noticed that ALTER TABLE does not
> > check for column existence in operations like SET NOT NULL before starting
> > working on the table, for instance adding a primary key.
> > It is thus possible, if a typo has been made, to generate a long lock and
> > a
> > lot of WAL that will serve no purpose since the whole transaction will be
> > discarded.
> > 
> > For example :
> > 
> > toto=# alter table test add primary key(i), alter column typo set not
> > null;
> > ERROR:  column "typo" of relation "test" does not exist
> > Time: 10.794 s
> > 
> > The attached patch fixes this behaviour by adding a small check in the
> > first pass of alter table to make sure that a column referenced by an
> > alter command exists first. It also checks if the column is added by
> > another alter sub- command. It does not handle every scenario (dropping a
> > column and then altering it for instance), these are left to the exec
> > code to exclude. The patch has been checked with make check, and I see no
> > documentation change to do since this does not alter any existing
> > documented behaviour.
> This looks like a good idea.  However, the last CF for PG11 is in
> progress so it might be difficult to attract much comment/review right now.
> 
> I recommend entering this patch in the 2018-09 CF so it doesn't get lost.

Hi

Thanks for the answer.
I saw that bug two days ago but I had no time then to write the patch. Had I 
seen that the CF window was that close, I would have hurried up… Heh, this 
will just wait a few months. I will enter it in the 2018-09 CF as soon as it 
opens.

Regards

 Pierre





ALTER TABLE does not check for column existence before starting operations

2018-03-02 Thread Pierre Ducroquet
Hi

While working on a big table recently, I noticed that ALTER TABLE does not 
check for column existence in operations like SET NOT NULL before starting 
to work on the table, for instance when also adding a primary key.
It is thus possible, if a typo has been made, to generate a long lock and a 
lot of WAL that will serve no purpose since the whole transaction will be 
discarded.

For example:

toto=# alter table test add primary key(i), alter column typo set not null;
ERROR:  column "typo" of relation "test" does not exist
Time: 10.794 s


The attached patch fixes this behaviour by adding a small check in the first 
pass of ALTER TABLE to make sure that a column referenced by an alter command 
exists first. It also checks whether the column is added by another alter 
sub-command. It does not handle every scenario (dropping a column and then 
altering it, for instance); these are left to the exec code to exclude.
The patch has been checked with make check, and I see no documentation change 
to make since this does not alter any existing documented behaviour.
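
For instance, the following (hypothetical) command is still only rejected at 
execution time, since the column does exist when the new prep check runs:

ALTER TABLE test DROP COLUMN i, ALTER COLUMN i SET NOT NULL;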


Regards

 Pierre
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 74e020bffc..48efb6ef5b 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -352,6 +352,7 @@ static void ATRewriteTables(AlterTableStmt *parsetree,
 List **wqueue, LOCKMODE lockmode);
 static void ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode);
 static AlteredTableInfo *ATGetQueueEntry(List **wqueue, Relation rel);
+static void ATColumnExists(Relation rel, const char *colName, AlteredTableInfo *tab);
 static void ATSimplePermissions(Relation rel, int allowed_targets);
 static void ATWrongRelkindError(Relation rel, int allowed_targets);
 static void ATSimpleRecursion(List **wqueue, Relation rel,
@@ -3592,6 +3593,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 			 */
 			ATSimplePermissions(rel, ATT_TABLE | ATT_VIEW | ATT_FOREIGN_TABLE);
 			ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode);
+			ATColumnExists(rel, cmd->name, tab);
 			/* No command-specific prep needed */
 			pass = cmd->def ? AT_PASS_ADD_CONSTR : AT_PASS_DROP;
 			break;
@@ -3611,6 +3613,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 			ATSimplePermissions(rel, ATT_TABLE | ATT_FOREIGN_TABLE);
 			ATPrepDropNotNull(rel, recurse, recursing);
 			ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode);
+			ATColumnExists(rel, cmd->name, tab);
 			/* No command-specific prep needed */
 			pass = AT_PASS_DROP;
 			break;
@@ -3618,6 +3621,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 			ATSimplePermissions(rel, ATT_TABLE | ATT_FOREIGN_TABLE);
 			ATPrepSetNotNull(rel, recurse, recursing);
 			ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode);
+			ATColumnExists(rel, cmd->name, tab);
 			/* No command-specific prep needed */
 			pass = AT_PASS_ADD_CONSTR;
 			break;
@@ -3630,12 +3634,14 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 		case AT_SetOptions:		/* ALTER COLUMN SET ( options ) */
 		case AT_ResetOptions:	/* ALTER COLUMN RESET ( options ) */
 			ATSimplePermissions(rel, ATT_TABLE | ATT_MATVIEW | ATT_INDEX | ATT_FOREIGN_TABLE);
+			ATColumnExists(rel, cmd->name, tab);
 			/* This command never recurses */
 			pass = AT_PASS_MISC;
 			break;
 		case AT_SetStorage:		/* ALTER COLUMN SET STORAGE */
 			ATSimplePermissions(rel, ATT_TABLE | ATT_MATVIEW | ATT_FOREIGN_TABLE);
 			ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode);
+			ATColumnExists(rel, cmd->name, tab);
 			/* No command-specific prep needed */
 			pass = AT_PASS_MISC;
 			break;
@@ -3643,6 +3649,10 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 			ATSimplePermissions(rel,
 ATT_TABLE | ATT_COMPOSITE_TYPE | ATT_FOREIGN_TABLE);
 			ATPrepDropColumn(wqueue, rel, recurse, recursing, cmd, lockmode);
+			if (!cmd->missing_ok)
+			{
+ATColumnExists(rel, cmd->name, tab);
+			}
 			/* Recursion occurs during execution phase */
 			pass = AT_PASS_DROP;
 			break;
@@ -4850,6 +4860,54 @@ ATSimplePermissions(Relation rel, int allowed_targets)
 		RelationGetRelationName(rel))));
 }
 
+/*
+ * ATColumnExists
+ *
+ * - Ensure that the targeted column exists
+ */
+static void
+ATColumnExists(Relation rel, const char *colName, AlteredTableInfo *tab)
+{
+	HeapTuple	tuple;
+	List		*subcmds;
+	bool		column_found = false;
+	tuple = SearchSysCacheCopyAttName(RelationGetRelid(rel), colName);
+
+	if (HeapTupleIsValid(tuple))
+	{
+		column_found = true;
+	}
+	else
+	{
+		/* Check that the column is not added in a previous operation */
+		subcmds = tab->subcmds[AT_PASS_ADD_COL];
+		if (subcmds != NULL)
+		{
+			ListCell   *lcmd;
+			ColumnDef  *coldef;
+
+			foreach(lcmd, subcmds)
+			{
+AlterTableCmd *cmd = castNode(AlterTableCmd, lfirst(lcmd));
+coldef = (ColumnDef *) cmd->def;
+if (strcmp(coldef->colname, colName) == 0)
+{
+	column_found = true;
+	break;
+			

Re: Poor plan when using EXISTS in the expression list

2018-10-04 Thread Pierre Ducroquet
On Thursday, October 4, 2018 4:46:26 PM CEST Geoff Winkless wrote:
> On Thu, 4 Oct 2018 at 13:11, Pierre Ducroquet  wrote:
> > Our developpers ORM (Django's) sadly can not use EXISTS in the where
> > clauses
> > without having it in the expression part of the SELECT statement.
> 
> I don't know if this will be helpful to you (and I appreciate there's still
> the underlying PG issue), but there's a suggestion here that you can work
> around this using .extra()
> 
> https://stackoverflow.com/a/38880144/321161

Sure, this helps when you know the trap and don't use the Exists support in 
Django, but this still means any developer using Django may create a query 
that, on small volumes, will be a bit slow, and will blow up on big volumes. 
We sadly cannot monitor every piece of code written by developers or imported 
through dependencies.
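
For the pieces of code we do control, a manual rewrite that avoids the issue 
is to project a constant instead of repeating the EXISTS, since the WHERE 
clause already guarantees it is true for every returned row:

SELECT a.*, true AS b_exists
FROM a
WHERE EXISTS (SELECT 1 FROM b WHERE a_id = a.id);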





Poor plan when using EXISTS in the expression list

2018-10-04 Thread Pierre Ducroquet
Hello

Our developers' ORM (Django) sadly cannot use EXISTS in WHERE clauses 
without also having it in the expression part of the SELECT statement.
I was expecting it to create queries performing a bit worse than queries 
without this useless expression, but it turns out this triggers extremely 
poor planning, with an additional Seq Scan of the table referenced in EXISTS.
Thus the query select a.*, exists (select * from b where a_id = a.id) from a 
where exists (select * from b where a_id = a.id); can be orders of magnitude 
slower than select a.* from a where exists (select * from b where a_id = 
a.id);

This has been reproduced on PostgreSQL 9.6 and 11 beta4.

Example:

test=> create table a (id serial primary key, b text);   
CREATE TABLE

test=> create table b (id serial primary key, a_id integer not null references 
a(id), c text);  
CREATE TABLE

test=> explain select a.* from a where exists (select * from b  where a_id = 
a.id); 
  QUERY PLAN   
---
 Hash Join  (cost=29.50..62.60 rows=635 width=36)
   Hash Cond: (a.id = b.a_id)
   ->  Seq Scan on a  (cost=0.00..22.70 rows=1270 width=36)
   ->  Hash  (cost=27.00..27.00 rows=200 width=4)
 ->  HashAggregate  (cost=25.00..27.00 rows=200 width=4)
   Group Key: b.a_id
   ->  Seq Scan on b  (cost=0.00..22.00 rows=1200 width=4)
(7 rows)

test=> explain select a.*, exists (select * from b where a_id = a.id) from a;
   QUERY PLAN
-
 Seq Scan on a  (cost=0.00..5314.37 rows=1270 width=37)
   SubPlan 1
 ->  Seq Scan on b  (cost=0.00..25.00 rows=6 width=0)
   Filter: (a_id = a.id)
   SubPlan 2
 ->  Seq Scan on b b_1  (cost=0.00..22.00 rows=1200 width=4)
(6 rows)

test=> explain select a.*, exists (select * from b where a_id = a.id)  from a 
where exists (select * from b  where a_id = a.id); 
  QUERY PLAN   
---
 Hash Join  (cost=29.50..2708.43 rows=635 width=37)
   Hash Cond: (a.id = b.a_id)
   ->  Seq Scan on a  (cost=0.00..22.70 rows=1270 width=36)
   ->  Hash  (cost=27.00..27.00 rows=200 width=4)
 ->  HashAggregate  (cost=25.00..27.00 rows=200 width=4)
   Group Key: b.a_id
   ->  Seq Scan on b  (cost=0.00..22.00 rows=1200 width=4)
   SubPlan 1
 ->  Seq Scan on b b_1  (cost=0.00..25.00 rows=6 width=0)
   Filter: (a_id = a.id)
   SubPlan 2
 ->  Seq Scan on b b_2  (cost=0.00..22.00 rows=1200 width=4)
(12 rows)


Thanks

 Pierre






[Patch] Invalid permission check in pg_stats for functional indexes

2019-04-06 Thread Pierre Ducroquet
Hi

When using a functional index on a table, we realized that the permission 
check done in pg_stats was incorrect, preventing valid access to the 
statistics for some users.

How to reproduce:

create table tbl1 (a integer, b integer);
insert into tbl1 select x, x % 50 from generate_series(1, 20) x;
create index on tbl1 using btree ((a % (b + 1)));
analyze ;

create user demo_priv encrypted password 'demo';
revoke ALL on SCHEMA public from PUBLIC ;
grant select on tbl1 to demo_priv;
grant usage on schema public to demo_priv;

And as demo_priv user:

select tablename, attname from pg_stats where tablename like 'tbl1%';

Returns:
 tablename | attname 
---+-
 tbl1  | a
 tbl1  | b
(2 rows)


Expected:
   tablename   | attname 
---+-
 tbl1  | a
 tbl1  | b
 tbl1_expr_idx | expr
(3 rows)


The attached patch fixes this by introducing a second path in the privilege 
check of the pg_stats view.
I have not written a regression test yet, mainly because I'm not 100% certain 
where to write it. Given some hints, I would happily add it to this patch.
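
For the record, here is the kind of test I have in mind (names invented), to 
be placed wherever reviewers deem best:

CREATE TABLE stats_expr_tbl (a int, b int);
INSERT INTO stats_expr_tbl SELECT x, x % 50 FROM generate_series(1, 1000) x;
CREATE INDEX stats_expr_idx ON stats_expr_tbl ((a % (b + 1)));
ANALYZE stats_expr_tbl;
CREATE ROLE regress_stats_user;
GRANT SELECT ON stats_expr_tbl TO regress_stats_user;
SET ROLE regress_stats_user;
-- with the fix, this should also list the stats_expr_idx expression row
SELECT tablename, attname FROM pg_stats
    WHERE tablename LIKE 'stats_expr%' ORDER BY 1, 2;
RESET ROLE;
DROP TABLE stats_expr_tbl;
DROP ROLE regress_stats_user;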

Regards

 Pierre Ducroquet
>From c2f638a9491e103311161208715dfcbcb55a2fbd Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet 
Date: Sat, 6 Apr 2019 13:22:29 +0200
Subject: [PATCH] Use a different permission check path for indexes and
 relations in pg_stats

pg_statistic contains information about both table attributes and functional indexes,
but the permission check done in pg_stats was only valid for table attributes.
This patch thus implements a separate permission check for functional indexes, which
verifies the access for each attribute contained in the index.
---
 src/backend/catalog/system_views.sql | 10 +-
 src/test/regress/expected/rules.out  |  4 +++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 72f786d6f8..a1e8ea969c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -248,7 +248,15 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
  JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
  LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
 WHERE NOT attisdropped
-AND has_column_privilege(c.oid, a.attnum, 'select')
+AND (c.relkind = 'r' AND has_column_privilege(c.oid, a.attnum, 'select'::text)
+OR
+(c.relkind = 'i' AND NOT(EXISTS(
+SELECT 1 FROM pg_depend
+WHERE pg_depend.objid = c.oid
+AND pg_depend.refobjsubid > 0
+AND NOT has_column_privilege(pg_depend.refobjid, pg_depend.refobjsubid::smallint, 'select'::text)))
+)
+)
 AND (c.relrowsecurity = false OR NOT row_security_active(c.oid));
 
 REVOKE ALL on pg_statistic FROM public;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index bf7fca54ee..34953ae4f4 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2262,7 +2262,9 @@ pg_stats| SELECT n.nspname AS schemaname,
  JOIN pg_class c ON ((c.oid = s.starelid)))
  JOIN pg_attribute a ON (((c.oid = a.attrelid) AND (a.attnum = s.staattnum
  LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
-  WHERE ((NOT a.attisdropped) AND has_column_privilege(c.oid, a.attnum, 'select'::text) AND ((c.relrowsecurity = false) OR (NOT row_security_active(c.oid;
+  WHERE ((NOT a.attisdropped) AND (((c.relkind = 'r'::"char") AND has_column_privilege(c.oid, a.attnum, 'select'::text)) OR ((c.relkind = 'i'::"char") AND (NOT (EXISTS ( SELECT 1
+   FROM pg_depend
+  WHERE ((pg_depend.objid = c.oid) AND (pg_depend.refobjsubid > 0) AND (NOT has_column_privilege(pg_depend.refobjid, (pg_depend.refobjsubid)::smallint, 'select'::text AND ((c.relrowsecurity = false) OR (NOT row_security_active(c.oid;
 pg_tables| SELECT n.nspname AS schemaname,
 c.relname AS tablename,
 pg_get_userbyid(c.relowner) AS tableowner,
-- 
2.20.1



Re: Row Level Security − leakproof-ness and performance implications

2019-02-21 Thread Pierre Ducroquet
On Wednesday, February 20, 2019 5:24:17 PM CET Tom Lane wrote:
> Pierre Ducroquet  writes:
> > For simple functions like enum_eq/enum_ne, marking them leakproof is an
> > obvious fix (patch attached to this email, including also textin/textout).
> 
> This is not nearly as "obvious" as you think.  See prior discussions,
> notably
> https://www.postgresql.org/message-id/flat/31042.1546194242%40sss.pgh.pa.us
> 
> Up to now we've taken a very strict definition of what leakproofness
> means; as Noah stated, if a function can throw errors for some inputs,
> it's not considered leakproof, even if those inputs should never be
> encountered in practice.  Most of the things we've been willing to
> mark leakproof are straight-line code that demonstrably can't throw
> any errors at all.
> 
> The previous thread seemed to have consensus that the hazards in
> text_cmp and friends were narrow enough that nobody had a practical
> problem with marking them leakproof --- but we couldn't define an
> objective policy statement that would allow making such decisions,
> so nothing's been changed as yet.  I think it is important to have
> an articulable policy about this, not just a seat-of-the-pants
> conclusion about the risk level in a particular function.
> 
>   regards, tom lane


I understand these decisions, but they make RLS quite fragile, with numerous 
undocumented side effects. In order to spare future users these difficulties, 
I wrote this documentation patch, listing the biggest restrictions I have hit 
with RLS so far.

Regards

 Pierre
>From 050c9777cbe417bd53a17043b24928ba5c037250 Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet 
Date: Thu, 21 Feb 2019 15:46:59 +0100
Subject: [PATCH] Document some of the row-level security limitations

RLS relies a lot on marking functions (and thus operators) LEAKPROOF.
The current policy in PostgreSQL is extremely strict, not allowing
functions that could, for instance, leak the input size through a
malloc() call. While strong on the security side, this policy has
side effects that are not documented currently and make RLS much
harder to implement than simply adding policies on the tables.
---
 doc/src/sgml/ref/create_policy.sgml | 30 -
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/create_policy.sgml b/doc/src/sgml/ref/create_policy.sgml
index 2e1229c4f9..372b2935ea 100644
--- a/doc/src/sgml/ref/create_policy.sgml
+++ b/doc/src/sgml/ref/create_policy.sgml
@@ -595,7 +595,35 @@ AND
user-defined functions which might not be trustworthy.  However,
functions and operators marked by the system (or the system
administrator) as LEAKPROOF may be evaluated before
-   policy expressions, as they are assumed to be trustworthy.
+   policy expressions, as they are assumed to be trustworthy. Please note that
+   marking functions as LEAKPROOF on the default pg_catalog
+   is done very carefully, thus preventing at least the following features from
+   working or from achieving the usually expected level of performance in default
+   settings:
+
+   
+
+ 
+  for the following types, most operators and functions are not marked
+  as LEAKPROOF: arrays, enums, ranges
+ 
+
+
+ 
+  full-text search: operators and functions are not
+  LEAKPROOF,
+ 
+
+
+ 
+  functional indexes on non-leakproof functions are not
+  considered when executing queries and enforcing policies.
+ 
+
+   
+
+   Any query using these features on a table with a policy cannot use 
+   indexes other than the ones required to enforce the policy.
   
 
   
-- 
2.20.1



Re: Row Level Security − leakproof-ness and performance implications

2019-02-19 Thread Pierre Ducroquet
On Wednesday, February 20, 2019 12:43:50 AM CET Laurenz Albe wrote:
> Pierre Ducroquet wrote:
> > In order to increase our security, we have started deploying row-level
> > security in order to add another safety net if any issue was to happen in
> > our applications.
> > After a careful monitoring of our databases, we found out that a lot of
> > queries started to go south, going extremely slow.
> > The root of these slowdowns is that a lot of the PostgreSQL functions are
> > not marked as leakproof, especially the ones used for operators.
> > In current git master, the following query returns 258 functions that are
> > used by operators returning booleans and not marked leakproof:
> > 
> > SELECT proname FROM pg_proc
> > 
> > WHERE exists(select 1 from pg_operator where oprcode::name =
> > proname)
> > AND NOT(proleakproof)
> > AND prorettype = 16;
> > 
> > Among these, if you have for instance a table like:
> > create table demo_enum (
> > 
> > username text,
> > flag my_enum
> > 
> > );
> > 
> > With an index on (username, flag), the index will only be used on username
> > because flag is an enum and the enum_eq function is not marked leakproof.
> > 
> > For simple functions like enum_eq/enum_ne, marking them leakproof is an
> > obvious fix (patch attached to this email, including also textin/textout).
> > And
> The enum_eq part looks totally safe, but the text functions allocate memory,
> so you could create memory pressure, wait for error messages that tell you
> the size of the allocation that failed and this way learn about the data.
> 
> Is this a paranoid idea?

This is not paranoid; it depends on your threat model.
In the model we implemented, the biggest threat we consider from a user's 
point of view is IDOR (Insecure Direct Object Reference): a developer 
forgetting to check that their input is sane and matches the other parts of 
the URL or the current session user. Exploiting the leak you mentioned in 
such a situation is almost impossible, and to be honest an attacker has much 
easier targets by that point…
Maybe leakproof is too simple? Should we be able to specify a 'paranoia 
level' allowing some leaks depending on the threat model?

> > if we had a 'RLS-enabled' context on functions, a way to make a lot of
> > built- in functions leakproof would simply be to prevent them from
> > logging errors containing values.
> > 
> > For others, like arraycontains, it's much trickier :[...]
> 
> I feel a little out of my depth here, so I won't comment.
> 
> > A third issue we noticed is the usage of functional indexes. If you create
> > an index on, for instance, (username, leaky_function(my_field)), and then
> > search on leaky_functon(my_field) = 42, the functional index can be used
> > only if leaky_function is marked leakproof, even if it is never going to
> > be executed on invalid rows thanks to the index. After a quick look at
> > the source code, it also seems complicated to implement since the
> > decision to reject potential dangerous calls to leaky_function is done
> > really early in the process, before the optimizer starts.
> 
> If there is a bitmap index scan, the condition will be rechecked, so the
> function will be executed.

That's exactly my point: using a bitmap index scan would be dangerous and thus 
should not be allowed, but an index scan works fine. Alternatively, the 
recheck on a bitmap scan would have to evaluate the RLS condition before 
re-evaluating the indexed expression.
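
To make the bitmap case concrete, with the functional index from my first 
mail (table name invented):

CREATE INDEX ON t (username, leaky_function(my_field));
-- index scan: the indexed expression is never re-evaluated on heap rows,
-- so rows hidden by the policy never reach leaky_function;
-- bitmap heap scan: the recheck re-evaluates leaky_function(my_field) on
-- heap tuples before the RLS qual filters them, so an error raised there
-- can leak values from rows the user is not allowed to see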

> > I am willing to spend some time on these issues, but I have no specific
> > knowledge of the planner and optimizer, and I fear proper fixes would
> > require much digging there. If you think it would be appropriate for
> > functions like arraycontains or range_contains to require the 'inner'
> > comparison function to be leakproof, or just keep looking at the other
> > functions in pg_proc and leakproof the ones that can be, I would be happy
> > to write the corresponding patches.
> 
> Thanks, and I think that every function that can safely be marked leakproof
> is a progress!

Thank you for your comments






Row Level Security − leakproof-ness and performance implications

2019-02-19 Thread Pierre Ducroquet
Hello

In order to increase our security, we have started deploying row-level 
security, adding another safety net in case an issue happens in our 
applications.
After careful monitoring of our databases, we found out that a lot of 
queries started to go south, becoming extremely slow.
The root of these slowdowns is that a lot of the PostgreSQL functions are not 
marked as leakproof, especially the ones used for operators.
In current git master, the following query returns 258 functions that are used 
by operators returning booleans and not marked leakproof:

SELECT proname FROM pg_proc 
WHERE exists(select 1 from pg_operator where oprcode::name = proname) 
AND NOT(proleakproof) 
AND prorettype = 16;


Among these, if you have for instance a table like:
create table demo_enum (
username text,
flag my_enum
);

With an index on (username, flag), the index will only be used on username 
because flag is an enum and the enum_eq function is not marked leakproof.
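
(For experimentation, a superuser can toggle the flag at run time instead of 
patching pg_proc:

ALTER FUNCTION enum_eq(anyenum, anyenum) LEAKPROOF;

the attached patch simply makes this the default.)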

For simple functions like enum_eq/enum_ne, marking them leakproof is an 
obvious fix (patch attached to this email, including also textin/textout). And 
if we had a 'RLS-enabled' context on functions, a way to make a lot of built-
in functions leakproof would simply be to prevent them from logging errors 
containing values.

For others, like arraycontains, it's much trickier: arraycontains can be 
leakproof only if the comparison function of the inner type is leakproof too. 
This raises some interesting issues, like arraycontains being marked strict 
and parallel safe while nothing makes this mandatory for the inner type.
In order to optimize queries on a table like:
create table demo_array (username text, my_array int[]);
one would need to mark arraycontains leakproof, thus implying that any 
comparison operator must be leakproof too? Another possibility, not that 
complicated, would be to create type-specific entries in pg_proc for each 
type that has a leakproof comparison operator. The complicated path would be 
a 'dynamic' leakproof flag for functions like arraycontains that depends on 
the inner type, but since leakproof-ness is determined at planning time, this 
seems very complicated to implement.

A third issue we noticed is the usage of functional indexes. If you create an 
index on, for instance, (username, leaky_function(my_field)), and then search 
on leaky_function(my_field) = 42, the functional index can be used only if 
leaky_function is marked leakproof, even if it is never going to be executed 
on invalid rows thanks to the index. After a quick look at the source code, it 
also seems complicated to fix, since the decision to reject potentially 
dangerous calls to leaky_function is done really early in the process, before 
the optimizer starts.


I am willing to spend some time on these issues, but I have no specific 
knowledge of the planner and optimizer, and I fear proper fixes would require 
much digging there. If you think it would be appropriate for functions like 
arraycontains or range_contains to require the 'inner' comparison function to 
be leakproof, or just keep looking at the other functions in pg_proc and 
leakproof the ones that can be, I would be happy to write the corresponding 
patches.


Best regards

>From a4dd6eb9a16e7e2d2e44b03c5a04c485841aab25 Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet 
Date: Tue, 19 Feb 2019 17:21:12 +0100
Subject: [PATCH] Mark textin/out and enum_eq/ne as leakproof

---
 src/include/catalog/pg_proc.dat | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index a4e173b484..d4ac402835 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -95,10 +95,10 @@
   prorettype => 'regprocedure', proargtypes => 'text',
   prosrc => 'to_regprocedure' },
 { oid => '46', descr => 'I/O',
-  proname => 'textin', prorettype => 'text', proargtypes => 'cstring',
+  proname => 'textin', proleakproof => 't', prorettype => 'text', proargtypes => 'cstring',
   prosrc => 'textin' },
 { oid => '47', descr => 'I/O',
-  proname => 'textout', prorettype => 'cstring', proargtypes => 'text',
+  proname => 'textout', proleakproof => 't', prorettype => 'cstring', proargtypes => 'text',
   prosrc => 'textout' },
 { oid => '48', descr => 'I/O',
   proname => 'tidin', prorettype => 'tid', proargtypes => 'cstring',
@@ -8339,10 +8339,10 @@
   proname => 'enum_out', provolatile => 's', prorettype => 'cstring',
   proargtypes => 'anyenum', prosrc => 'enum_out' },
 { oid => '3508',
-  proname => 'enum_eq', prorettype => 'bool', proargtypes => 'anyenum anyenum',
+  proname => 'enum_eq', proleakproof => 't', prorettype => 'bool', proargtypes => 'anyenum anyenum',
   prosrc => 'enum_eq' },
 { oid => '3509',
-  proname => 'enum_ne', 

[Patch] Add a reset_computed_values function in pg_stat_statements

2019-09-01 Thread Pierre Ducroquet
Hello

pg_stat_statements is a great tool to track performance issues in live 
databases, especially when adding interfaces like PoWA on top of it.
But so far, tools like PoWA cannot track the min_time, max_time, mean_time 
and sum_var_time of queries: these statistics are accumulated over time, and 
fetching points in time would be of little to no use, especially when looking 
at the impact of a DDL change.
This patch thus introduces a simple pg_stat_statements_reset_computed_values 
function that resets the computed statistics, leaving all other information 
intact, thus allowing the aforementioned scenario.
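
A collector round would then look like this (sketch):

SELECT queryid, calls, total_time, min_time, max_time, mean_time
    FROM pg_stat_statements;   -- snapshot everything
SELECT pg_stat_statements_reset_computed_values();
-- from here on, min/max/mean/stddev only reflect executions that happen
-- after the reset, while calls and total_time keep accumulating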

Regards

 Pierre Ducroquet>From 5e6d16d738e5279968321c5f3695e72ded2432db Mon Sep 17 00:00:00 2001
From: Pierre 
Date: Thu, 14 Feb 2019 14:37:48 +0100
Subject: [PATCH] Add a function to reset the cumulative statistics

pg_stat_statements has two parts : raw statistics that are simple 'stable'
counters, and computed statistics (averages, min time, max time...)

When using pg_stat_statements to find and fix performance issues, being
able to reset the computed statistics can help track the issue and the
impact of the fixes.
This would also make it possible for tools like powa to collect these
statistics too and track them over time by resetting them after each
collection.
---
 .../pg_stat_statements--1.7--1.8.sql  | 11 
 .../pg_stat_statements/pg_stat_statements.c   | 52 +--
 .../pg_stat_statements.control|  2 +-
 3 files changed, 61 insertions(+), 4 deletions(-)
 create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql

diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
new file mode 100644
index 00..7690a9ceba
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -0,0 +1,11 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.8'" to load this file. \quit
+
+CREATE FUNCTION pg_stat_statements_reset_computed_values()
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C PARALLEL SAFE;
+
+
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 221b47298c..3a6c227a80 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -150,6 +150,7 @@ typedef struct Counters
 	double		max_time;		/* maximum execution time in msec */
 	double		mean_time;		/* mean execution time in msec */
 	double		sum_var_time;	/* sum of variances in execution time in msec */
+	int64		computed_calls;		/* # of executions counted in the computed values above */
 	int64		rows;			/* total # of retrieved or affected rows */
 	int64		shared_blks_hit;	/* # of shared buffer hits */
 	int64		shared_blks_read;	/* # of shared disk blocks read */
@@ -289,6 +290,7 @@ static bool pgss_save;			/* whether to save stats across shutdown */
 void		_PG_init(void);
 void		_PG_fini(void);
 
+PG_FUNCTION_INFO_V1(pg_stat_statements_reset_computed_values);
 PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
 PG_FUNCTION_INFO_V1(pg_stat_statements_reset_1_7);
 PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
@@ -1252,8 +1254,9 @@ pgss_store(const char *query, uint64 queryId,
 			e->counters.usage = USAGE_INIT;
 
 		e->counters.calls += 1;
+		e->counters.computed_calls += 1;
 		e->counters.total_time += total_time;
-		if (e->counters.calls == 1)
+		if (e->counters.computed_calls == 1)
 		{
 			e->counters.min_time = total_time;
 			e->counters.max_time = total_time;
@@ -1268,7 +1271,7 @@ pgss_store(const char *query, uint64 queryId,
 			double		old_mean = e->counters.mean_time;
 
 			e->counters.mean_time +=
-(total_time - old_mean) / e->counters.calls;
+(total_time - old_mean) / e->counters.computed_calls;
 			e->counters.sum_var_time +=
 (total_time - old_mean) * (total_time - e->counters.mean_time);
 
@@ -1324,7 +1327,7 @@ pg_stat_statements_reset_1_7(PG_FUNCTION_ARGS)
 }
 
 /*
- * Reset statement statistics.
+ * Reset all statement statistics.
  */
 Datum
 pg_stat_statements_reset(PG_FUNCTION_ARGS)
@@ -1334,6 +1337,49 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
+/*
+ * Reset computed statistics from all statements.
+ */
+Datum
+pg_stat_statements_reset_computed_values(PG_FUNCTION_ARGS)
+{
+	if (!pgss || !pgss_hash)
+		ereport(ERROR,
+(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("pg_stat_statements must be loaded via shared_preload_libraries")));
+	entry_reset_computed();
+	PG_RETURN_VOID();
+}
+
+
+/*
+ * Reset statement computed statistics.
+ * This takes a shared lock only on the hash table, and a lock per entry
+ */
+static void
+entry_reset_computed(void)
+{
+	HASH_SEQ_STATUS 

Re: [Patch] Invalid permission check in pg_stats for functional indexes

2019-09-03 Thread Pierre Ducroquet
On Tuesday, September 3, 2019 12:39:51 PM CEST Kuntal Ghosh wrote:
> Hello Pierre,

Hello Kuntal
> 
> > When using a functional index on a table, we realized that the permission
> > check done in pg_stats was incorrect and thus preventing valid access to
> > the statistics from users.
> > 
> > The attached patch fixes this by introducing a second path in privilege
> > check in pg_stats view.
> 
> The patch doesn't apply on the latest HEAD [1].

My apologies for that. I submitted this patch some time ago but forgot to 
add it to the commit fest. Attached to this mail is a rebased version.

> IIUC, the patch introduces an additional privilege check for the
> underlying objects involved in the expression/functional index. If the
> user has 'select' privileges on all of the columns/objects included in
> the expression/functional index, then it should be visible in pg_stats
> view. I've applied the patch manually and tested the feature. It works
> as expected.

Indeed, you understood correctly. I have not dug around to find out the 
origin of the current situation, but it does not look like intentional 
behaviour, more like a small oversight.

> > I have not written a regression test yet, mainly because I'm not 100%
> > certain where to write it. Given some hints, I would happily add it to
> > this patch.
> Yeah, it'll be good to have some regression tests for the same. I'm
> also not sure which regression file best suites for these tests.



Thank you very much for your review

 Pierre
>From d1d99104f8f157257a30cb9bf9cfea1cc467b1a7 Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet 
Date: Sat, 6 Apr 2019 13:22:29 +0200
Subject: [PATCH] Use a different permission check path for indexes and
 relations in pg_stats

pg_statistic contains information about both table attributes and functional indexes,
but the permission check done in pg_stats was only valid for table attributes.
This patch thus implements a separate permission check for functional indexes, which
verifies the access for each attribute contained in the index.
---
 src/backend/catalog/system_views.sql | 10 +-
 src/test/regress/expected/rules.out  |  4 +++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e395..07bdfc12ac 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -248,7 +248,15 @@ CREATE VIEW pg_stats WITH (security_barrier) AS
  JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
  LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
 WHERE NOT attisdropped
-AND has_column_privilege(c.oid, a.attnum, 'select')
+AND (c.relkind = 'r' AND has_column_privilege(c.oid, a.attnum, 'select'::text)
+OR
+(c.relkind = 'i' AND NOT(EXISTS(
+SELECT 1 FROM pg_depend
+WHERE pg_depend.objid = c.oid
+AND pg_depend.refobjsubid > 0
+AND NOT has_column_privilege(pg_depend.refobjid, pg_depend.refobjsubid::smallint, 'select'::text)))
+)
+)
 AND (c.relrowsecurity = false OR NOT row_security_active(c.oid));
 
 REVOKE ALL on pg_statistic FROM public;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd146..9f5679314a 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -2283,7 +2283,9 @@ pg_stats| SELECT n.nspname AS schemaname,
  JOIN pg_class c ON ((c.oid = s.starelid)))
  JOIN pg_attribute a ON (((c.oid = a.attrelid) AND (a.attnum = s.staattnum
  LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace)))
-  WHERE ((NOT a.attisdropped) AND has_column_privilege(c.oid, a.attnum, 'select'::text) AND ((c.relrowsecurity = false) OR (NOT row_security_active(c.oid;
+  WHERE ((NOT a.attisdropped) AND (((c.relkind = 'r'::"char") AND has_column_privilege(c.oid, a.attnum, 'select'::text)) OR ((c.relkind = 'i'::"char") AND (NOT (EXISTS ( SELECT 1
+   FROM pg_depend
+  WHERE ((pg_depend.objid = c.oid) AND (pg_depend.refobjsubid > 0) AND (NOT has_column_privilege(pg_depend.refobjid, (pg_depend.refobjsubid)::smallint, 'select'::text AND ((c.relrowsecurity = false) OR (NOT row_security_active(c.oid;
 pg_stats_ext| SELECT cn.nspname AS schemaname,
 c.relname AS tablename,
 sn.nspname AS statistics_schemaname,
-- 
2.23.0



Re: [Patch] Add a reset_computed_values function in pg_stat_statements

2019-09-03 Thread Pierre Ducroquet
On Monday, September 2, 2019 3:25:00 AM CEST Euler Taveira wrote:
> Em dom, 1 de set de 2019 às 06:04, Pierre Ducroquet
> 
>  escreveu:
> > Hello
> > 
> > pg_stat_statements is a great tool to track performance issue in live
> > databases, especially when adding interfaces like PoWA on top of it.
> > But so far, tools like PoWA can not track the min_time, max_time,
> > mean_time
> > and sum_var_time of queries : these statistics are cumulated over time,
> > fetching points in time would be of little to no use, especially when
> > looking at the impact of a DDL change.
> > This patch thus introduces a simple
> > pg_stat_statements_reset_computed_values function that reset the computed
> > statistics, leaving all other information alive, thus allowing the
> > aforementioned scenario.
> 

Hello

Thank you for reviewing this patch.

> Pierre, I don't understand why you want to reset part of the statement
> statistics. If you want the minimal query time every week, reset all
> statistics of that statement (v12 or later). Postgres provides
> histogram metrics (min, max, mean, stddev). AFAICS you want a metric
> type called timer (combination of histogram and meter). For example,
> calls, sum, min, max, mean, standard deviation will be calculated each
> hour. If I were to add a new metric to pg_stat_statements, it would be
> last_time. Compare histogram metrics with the last statement time is
> useful to check if a certain optimization was effective.

min/max/mean timings in pg_stat_statements are *computed* fields, not simple 
cumulative ones, hence the distinction I make here. For a tool like PoWA, 
which (to simplify) copies all the statistics every few minutes to do its own 
computation, this makes it impossible to get any sense out of them. The 
facility to reset all statistics is an option, but it feels like a heavy 
solution when only some statistics are different from the others.

> Talking about your patch, math is wrong. If you don't reset
> total_time, all computed values will be incorrect. You are changing
> the actual meaning of those metrics and I think it is unacceptable
> because it will break applications. 

I think you misunderstood the way this patch is written.
total_time, in this function, is not a global statistic: pgss_store receives 
the total time of the current query. When the computed statistics are reset, 
we simply drop them and start fresh from the next execution. There is 
absolutely no compatibility impact here.

> Instead, you should provide (i) new counters or (ii) add GUC to control this
> behavior. It is a trade-off between incompatibility and too many metrics.
> Though I wouldn't like to break compatibility (as I said you can always
> reset all statement statistics). 

As explained, it does not break compatibility, so I don't see a need for a 
new GUC.

> Why don't you provide a pg_stat_statements_reset_computed_values(userid Oid,
> dbid Oid, queryid bigint)?

Because I did not need one, and with pg_stat_statements_reset(userid Oid, dbid 
Oid, queryid bigint) being newly added, I did not think about it.
I can add one in a new version of the patch of course.

> You forgot to provide documentation for the new function.

Indeed, I am not very used to SGML and first wanted a discussion about the 
feature.

> You should revoke pg_stat_statements_reset_computed_values from PUBLIC.

Indeed, I missed that one.

Attached to this mail is a new version fixing the revoke issue, and adding 
documentation.


Regards

 Pierre>From 4e20277aad50e8de322f19d7c18e86dead436e0f Mon Sep 17 00:00:00 2001
From: Pierre 
Date: Thu, 14 Feb 2019 14:37:48 +0100
Subject: [PATCH] Add a function to reset the cumulative statistics

pg_stat_statements has two parts : raw statistics that are simple 'stable'
counters, and computed statistics (averages, min time, max time...)

When using pg_stat_statements to find and fix performance issues, being
able to reset the computed statistics can help track the issue and the
impact of the fixes.
This would also make it possible for tools like powa to collect these
statistics too and track them over time by resetting them after each
collection.
---
 .../pg_stat_statements--1.7--1.8.sql  | 12 +
 .../pg_stat_statements/pg_stat_statements.c   | 51 +--
 .../pg_stat_statements.control|  2 +-
 doc/src/sgml/pgstatstatements.sgml| 25 +
 4 files changed, 86 insertions(+), 4 deletions(-)
 create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql

diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
new file mode 100644
index 00..351187e0a1
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -0,0 +1,12 @@
+/* contrib/pg_

[Patch] optimizer - simplify $VAR1 IS NULL AND $VAR1 IS NOT NULL

2019-11-06 Thread Pierre Ducroquet
Hello

In several queries relying on views, I noticed that the optimizer misses a 
quite simple optimization. My views contain several branches, with 
different paths that are simplified by the caller of the view. This 
simplification is based on whether columns are null or not.

Today, even with a single table, the following (silly) query is not optimized 
away:
SELECT * FROM test WHERE a IS NULL AND a IS NOT NULL;

In more complex cases, it of course isn't any better:
SELECT * FROM (
SELECT a, NULL::integer AS b FROM foo
  UNION ALL
SELECT a, b FROM bar WHERE b IS NOT NULL
) t WHERE a = 1 AND b IS NULL;

The attached patch handles both situations. When flattening and simplifying 
the AND clauses, a list of the NullTests is built, and subsequent NullTests 
are compared to the list. If opposite NullTests on the same variable are 
found, the whole AND is optimized away.
This leads to nice boosts, since instead of keeping 'never executed' branches, 
the optimizer can go even further. Right now, the algorithmic complexity of 
this optimization is not great: it is in O(n²), with n being the number of 
NullTests in a given AND clause. But compared to the possible benefits, and 
the very low risk of n being high enough to have a real planning-time 
impact, I feel this optimization is worth it.
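
With the patch applied, the effect can be checked with EXPLAIN on the UNION 
ALL example above (table definitions invented for the sketch):

CREATE TABLE foo (a integer);
CREATE TABLE bar (a integer, b integer);
EXPLAIN SELECT * FROM (
    SELECT a, NULL::integer AS b FROM foo
  UNION ALL
    SELECT a, b FROM bar WHERE b IS NOT NULL
) t WHERE a = 1 AND b IS NULL;
-- the bar branch combines "b IS NOT NULL" and "b IS NULL", so it should be
-- reduced to a never-executed branch with a one-time false filter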


Regards

 Pierre
>From 0a300b6fdd934daa6fb79e9bc67bdc2adfa3cc72 Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet 
Date: Wed, 6 Nov 2019 18:05:01 +0100
Subject: [PATCH] Simplify AND clauses that have NOT NULL clauses on similar
 variables

It's not uncommon to have views with several branches, with fields that are null depending on the branch.
If the view is queried with an IS NULL or IS NOT NULL on such a field, huge simplification can be done.

This patch thus looks, when flattening and simplifying AND clauses, for opposite null tests on the same variable.
If this happens, the whole AND clause is discarded and simplified into false.
---
 src/backend/optimizer/util/clauses.c | 44 
 1 file changed, 44 insertions(+)

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index a04b62274d..2ba66a62a5 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -3781,6 +3781,7 @@ simplify_and_arguments(List *args,
 {
 	List	   *newargs = NIL;
 	List	   *unprocessed_args;
+	List	   *null_checks = NIL;
 
 	/* See comments in simplify_or_arguments */
 	unprocessed_args = list_copy(args);
@@ -3844,6 +3845,49 @@ simplify_and_arguments(List *args,
 			continue;
 		}
 
+		if (IsA(arg, NullTest))
+		{
+			NullTest	*null_check = (NullTest*) arg;
+
+			/* Only check for $VAR IS NULL, ignore other expressions */
+			if (!IsA(null_check->arg, Var))
+goto next;
+
+			if (null_checks == NIL)
+			{
+null_checks = lappend(null_checks, null_check);
+goto next;
+			}
+			else
+			{
+Var			*null_check_var = (Var *) null_check->arg;
+
+/* Now, we can check every other null check from our list */
+ListCell	*other_null;
+foreach (other_null, null_checks)
+{
+	NullTest	*other_null_check = (NullTest *) lfirst(other_null);
+	Var			*other_null_var = (Var *) other_null_check->arg;
+
+	if ((other_null_check->nulltesttype != null_check->nulltesttype) &&
+		(other_null_var->varno == null_check_var->varno) &&
+		(other_null_var->varattno == null_check_var->varattno))
+	{
+		*forceFalse = true;
+
+		/*
+		* Once we detect a FALSE result we can just exit the loop
+		* immediately.  However, if we ever add a notion of
+		* non-removable functions, we'd need to keep scanning.
+		*/
+		return NIL;
+	}
+}
+null_checks = lappend(null_checks, null_check);
+			}
+		}
+
+next:
 		/* else emit the simplified arg into the result list */
 		newargs = lappend(newargs, arg);
 	}
-- 
2.24.0.rc2



Re: PATCH: add support for IN and @> in functional-dependency statistics use

2020-02-02 Thread Pierre Ducroquet
On Saturday, February 1, 2020 3:24:46 PM CET Tomas Vondra wrote:
> On Sat, Feb 01, 2020 at 08:51:04AM +0100, Pierre Ducroquet wrote:
> >Hello
> >
> >At my current job, we have a lot of multi-tenant databases, thus with
> >tables containing a tenant_id column. Such a column introduces a severe
> >bias in statistics estimation since any other FK in the next columns is
> >very likely to have a functional dependency on the tenant id. We found
> >several queries where this functional dependency messed up the estimations
> >so much the optimizer chose wrong plans.
> >When we tried to use extended statistics with CREATE STATISTIC on
> >tenant_id, other_id, we noticed that the current implementation for
> >detecting functional dependency lacks two features (at least in our use
> >case):
> >- support for IN clauses
> >- support for the array contains operator (that could be considered as a
> >special case of IN)
> 
> Thanks for the patch. I don't think the lack of support for these clause
> types is an oversight - we haven't done them because we were not quite
> sure the functional dependencies can actually apply to them. But maybe
> we can support them, I'm not against that in principle.
> 
> >After digging in the source code, I think the lack of support for IN
> >clauses is an oversight and due to the fact that IN clauses are
> >ScalarArrayOpExpr instead of OpExpr. The attached patch fixes this by
> >simply copying the code- path for OpExpr and changing the type name. It
> >compiles and the results are correct, with a dependency being correctly
> >taken into consideration when estimating rows. If you think such a copy
> >paste is bad and should be factored in another static bool function,
> >please say so and I will happily provide an updated patch.
> 
> Hmmm. Consider a query like this:
> 
>... WHERE tenant_id = 1 AND another_column IN (2,3)
> 
> which kinda contradicts the idea of functional dependencies that knowing
> a value in one column, tells us something about a value in a second
> column. But that assumes a single value, which is not quite true here.
> 
> The above query is essentially the same thing as
> 
>... WHERE (tenant_id=1 AND (another_column=2 OR another_column=3))
> 
> and also
> 
>... WHERE (tenant_id=1 AND another_column=2)
>   OR (tenant_id=1 AND another_column=3)
> 
> at wchich point we could apply functional dependencies - but we'd do it
> once for each AND-clause, and then combine the results to compute
> selectivity for the OR clause.
> 
> But this means that if we apply functional dependencies directly to the
> original clause, it'll be inconsistent. Which seems a bit unfortunate.
> 
> Or do I get this wrong?

In my tests, I've got a table with two columns a and b, generated this way:
CREATE TABLE test (a INT, b INT)
AS SELECT i, i/10 FROM 
  generate_series(1, 10) s(i),
  generate_series(1, 5) x;

With statistics defined on the a, b columns
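
For reference, the statistics object can be created with something like:

CREATE STATISTICS test_dep (dependencies) ON a, b FROM test;
ANALYZE test;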

Here are the estimated selectivity results without any patch:

SELECT * FROM test WHERE a = 1 AND b = 1 : 5
SELECT * FROM test WHERE a = 1 AND (b = 1 OR b = 2) : 1
SELECT * FROM test WHERE (a = 1 AND b = 1) OR (a = 1 AND b = 2) : 1
SELECT * FROM test WHERE a = 1 AND b IN (1, 2) : 1

With the patch, the estimated row count of the last query goes back to 5, 
which is more logical. The other ones don't change.

> BTW the code added in the 0001 patch is the same as for is_opclause, so
> maybe we can simply do
> 
>  if (is_opclause(rinfo->clause) ||
>  IsA(rinfo->clause, ScalarArrayOpExpr))
>  {
>  ...
>  }
> 
> instead of just duplicating the code.

I would love to do that, but ScalarArrayOpExpr and OpExpr are not binary 
compatible for the members used here. In ScalarArrayOpExpr, on AMD64, args is 
at offset 24 and opno at 4, while they are at 32 and 4 in OpExpr. I can work 
around it with this kind of code, but I don't like it much:
List *args;
Oid opno;
if (IsA(rinfo->clause, OpExpr))
{
/* If it's an opclause, we will check for Var = Const or Const = Var. */
OpExpr *expr = (OpExpr *) rinfo->clause;
args = expr->args;
opno = expr->opno;
}
else if (IsA(rinfo->clause, ScalarArrayOpExpr))
{
/* If it's a ScalarArrayOpExpr, check for Var IN Const. */
ScalarArrayOpExpr *expr = (ScalarArrayOpExpr *) rinfo->clause;
args = expr->args;
opno = expr->opno;
}

Or I can rewrite it in C++ to play with templates... :)

> We also need some at least some
> regression tests, testing functional dependencies with this clause type.

Agreed

> >The lack of support for @> operator, on the other hand, seems to be a
> >decisio

PATCH: add support for IN and @> in functional-dependency statistics use

2020-02-01 Thread Pierre Ducroquet
Hello

At my current job, we have a lot of multi-tenant databases, and thus tables
containing a tenant_id column. Such a column introduces a severe bias in
statistics estimation, since any other FK column next to it is very likely to
have a functional dependency on the tenant id. We found several queries where
this functional dependency messed up the estimations so much that the
optimizer chose wrong plans.
When we tried to use extended statistics with CREATE STATISTICS on tenant_id,
other_id, we noticed that the current implementation for detecting functional 
dependency lacks two features (at least in our use case):
- support for IN clauses
- support for the array contains operator (that could be considered as a 
special case of IN)

After digging in the source code, I think the lack of support for IN clauses 
is an oversight and due to the fact that IN clauses are ScalarArrayOpExpr 
instead of OpExpr. The attached patch fixes this by simply copying the code-
path for OpExpr and changing the type name. It compiles and the results are 
correct, with a dependency being correctly taken into consideration when 
estimating rows. If you think such a copy paste is bad and should be factored 
in another static bool function, please say so and I will happily provide an 
updated patch.
The lack of support for the @> operator, on the other hand, seems to be a
decision taken when writing the initial code, but I cannot find any
mathematical or technical reason for it. The current code restricts dependency
calculation to the = operator, obviously because inequality operators are not
going to work... but array containment is just several = operators grouped
together, so the dependency calculation applies the same way. The second patch
refactors the operator check in order to also include array containment.
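
To make the target concrete, the kind of clause the second patch aims at is
(table and column names are hypothetical):

SELECT * FROM documents
 WHERE tenant_id = 42
   AND tag_ids @> ARRAY[1, 2];

which, for the dependency calculation, behaves like an equality clause on
tag_ids for each element of the array.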

I tested the patches on current HEAD, but I can test and provide back-ported 
versions of the patch for other versions if needed (this code path hardly 
changed since its introduction in 10).

Best regards

 Pierre Ducroquet
>From d2b89748e1b520c276875538149eb466e97c21b4 Mon Sep 17 00:00:00 2001
From: Pierre 
Date: Fri, 31 Jan 2020 23:58:35 +0100
Subject: [PATCH 1/2] Add support for IN clauses in dependencies check

---
 src/backend/statistics/dependencies.c | 34 +++
 1 file changed, 34 insertions(+)

diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index e2f6c5bb97..d4844a9ec6 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -801,6 +801,40 @@ dependency_is_compatible_clause(Node *clause, Index relid, AttrNumber *attnum)
 
 		/* OK to proceed with checking "var" */
 	}
+	else if (IsA(rinfo->clause, ScalarArrayOpExpr))
+	{
+		/* If it's a ScalarArrayOpExpr, check for Var IN Const. */
+		ScalarArrayOpExpr	   *expr = (ScalarArrayOpExpr *) rinfo->clause;
+
+		/* Only expressions with two arguments are candidates. */
+		if (list_length(expr->args) != 2)
+			return false;
+
+		/* Make sure non-selected argument is a pseudoconstant. */
+		if (is_pseudo_constant_clause(lsecond(expr->args)))
+			var = linitial(expr->args);
+		else if (is_pseudo_constant_clause(linitial(expr->args)))
+			var = lsecond(expr->args);
+		else
+			return false;
+
+		/*
+		 * If it's not an "=" operator, just ignore the clause, as it's not
+		 * compatible with functional dependencies.
+		 *
+		 * This uses the function for estimating selectivity, not the operator
+		 * directly (a bit awkward, but well ...).
+		 *
+		 * XXX this is pretty dubious; probably it'd be better to check btree
+		 * or hash opclass membership, so as not to be fooled by custom
+		 * selectivity functions, and to be more consistent with decisions
+		 * elsewhere in the planner.
+		 */
+		if (get_oprrest(expr->opno) != F_EQSEL)
+			return false;
+
+		/* OK to proceed with checking "var" */
+	}
 	else if (is_notclause(rinfo->clause))
 	{
 		/*
-- 
2.24.1

>From 02363c52d0d03f58b8e79088a7261a9fc1e3bbb7 Mon Sep 17 00:00:00 2001
From: Pierre 
Date: Fri, 31 Jan 2020 23:58:55 +0100
Subject: [PATCH 2/2] Add support for array contains in dependency check

---
 src/backend/statistics/dependencies.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index d4844a9ec6..c3f37a474b 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -785,7 +785,7 @@ dependency_is_compatible_clause(Node *clause, Index relid, AttrNumber *attnum)
 			return false;
 
 		/*
-		 * If it's not an "=" operator, just ignore the clause, as it's not
+		 * If it's not an "=" or "@>" operator, just ignore the clause, as it's not
 		 * compatible with functional dependencies.
 		 *
 		 * This uses the function for estimating selectivity, not the operator
@@ -796,8 +796,13 

[PATCH] fix a performance issue with multiple logical-decoding walsenders

2019-12-26 Thread Pierre Ducroquet
Hello

Our current setup uses logical replication to build a BI replication server
alongside our primary clusters (running PG 10.10 so far). This implies having
one logical replication slot per database. After some analysis, we identified
two hot spots behind the performance issue described below. Fixing them gave
us a 10-fold improvement in decoding speed.


We noticed that the load on our primary had grown quite a bit since the
introduction of this replication, with spikes in system time whenever a lot
of WAL was being written (for instance when creating GIN indexes).

The first hot spot is PostmasterIsAlive. The implementation reads from a pipe
to know if the postmaster is still alive, but this is very expensive
kernel-wise. When we switched the implementation to something much more
primitive (and probably wrong):
bool PostmasterIsAliveInternal() {
	return getppid() == PostmasterPid;
}
the spikes in system time disappeared.
But after doing that, the CPU time used by our walsenders increased. We
reached a second hot spot, this time in XLogSendLogical, where each walsender
was using 100% of user CPU for minutes. After checking with perf, it appeared
that all the decoders were fighting over GetFlushRecPtr.

On PostgreSQL 12, the call to PostmasterIsAlive is no longer present in 
WalSndLoop (thanks to commit cfdf4dc4f), so only the second hot spot is still 
present, with the same effects.

Attached to this email are two patches. 

The first one, specific to PG 10 for our use case, simplifies the
PostmasterIsAlive function, as described above. I don't know if this 
implementation is valid, but it was needed to uncover the performance issue in 
XLogSendLogical. Would it be possible to remove the systematic call to 
PostmasterIsAlive in WalSndLoop? We are not certain of the behaviour.

The second one was tested on PG 10 and PG 12 (with a 48-line offset). On PG 12
it has the same effect it has on PG 10 with the isAlive patch applied. Instead
of calling GetFlushRecPtr every time, we call it only when we notice we have
reached the value returned by the previous call. This way, when the senders
are busy decoding, we are no longer fighting for a spinlock to read the
FlushRecPtr.

Here are some benchmark results.
On PG 10, to decode our replication stream, we went from 3m 43s to over 5 
minutes after removing the first hot spot, and then down to 22 seconds.
On PG 12, we had to change the benchmark (due to GIN indexes creation being 
more optimized) so we can not compare directly with our previous bench. We 
went from 15m 11s down to 59 seconds.
If needed, we can provide scripts to reproduce this situation. It is quite 
simple: add ~20 walsenders doing logical replication in database A, and then
generate a lot of data in database B. The walsenders will be woken up by the
activity on database B, but since they have nothing to send, they keep
hitting the same locks.
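
A rough sketch of the setup (slot names and the test_decoding plugin are just
examples; each slot needs a client consuming it, e.g. pg_recvlogical --start,
for a walsender to exist):

-- In database A: create 20 logical replication slots
SELECT pg_create_logical_replication_slot('slot_' || i, 'test_decoding')
  FROM generate_series(1, 20) AS i;

-- In database B: generate a lot of WAL that is irrelevant to those decoders
CREATE TABLE junk AS
  SELECT i, md5(i::text) AS filler
    FROM generate_series(1, 10000000) AS i;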

Regards

diff --git a/src/backend/storage/ipc/pmsignal.c b/src/backend/storage/ipc/pmsignal.c
index 85db6b2..7a76687 100644
--- a/src/backend/storage/ipc/pmsignal.c
+++ b/src/backend/storage/ipc/pmsignal.c
@@ -272,6 +272,13 @@ bool
 PostmasterIsAlive(void)
 {
 #ifndef WIN32
+
+	if (getppid() != PostmasterPid)
+		return false;
+	else
+		return true;
+
+/*
 	char		c;
 	ssize_t		rc;
 
@@ -287,6 +294,7 @@ PostmasterIsAlive(void)
 		elog(FATAL, "unexpected data in postmaster death monitoring pipe");
 
 	return false;
+*/
 #else			/* WIN32 */
 	return (WaitForSingleObject(PostmasterHandle, 0) == WAIT_TIMEOUT);
 #endif			/* WIN32 */
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 74330f8..8dd327e 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2765,7 +2765,7 @@ XLogSendLogical(void)
 {
 	XLogRecord *record;
 	char	   *errm;
-	XLogRecPtr	flushPtr;
+	static XLogRecPtr flushPtr = 0;
 
 	/*
 	 * Don't know whether we've caught up yet. We'll set WalSndCaughtUp to
@@ -2782,11 +2782,6 @@ XLogSendLogical(void)
 	if (errm != NULL)
 		elog(ERROR, "%s", errm);
 
-	/*
-	 * We'll use the current flush point to determine whether we've caught up.
-	 */
-	flushPtr = GetFlushRecPtr();
-
 	if (record != NULL)
 	{
 		/*
@@ -2801,7 +2796,14 @@ XLogSendLogical(void)
 
 	/* Set flag if we're caught up. */
 	if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
-		WalSndCaughtUp = true;
+	{
+		/*
+		 * We'll use the current flush point to determine whether we've caught up.
+		 */
+		flushPtr = GetFlushRecPtr();
+		if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
+			WalSndCaughtUp = true;
+	}
 
 	/*
 	 * If we're caught up and have been requested to stop, have WalSndLoop()


Re: [PATCH] fix a performance issue with multiple logical-decoding walsenders

2019-12-30 Thread Pierre Ducroquet
On Sunday, December 29, 2019 1:32:31 PM CET Julien Rouhaud wrote:
> On Sat, Dec 28, 2019 at 1:56 PM Pierre Ducroquet  
wrote:
> > Thank you for your comments.
> > Attached to this email is a patch with better comments regarding the
> > XLogSendLogical change.
> 
> Arguably the first test to compare to InvalidXLogRecPtr is unneeded,
> as any value of EndRecPtr is greater than or equal to that value.  It
> will only save at best 1 GetFlushRecPtr() per walsender process
> lifetime, so I'm not sure it's worth arguing about it.  Other than
> that I still think that it's a straightforward optimization that
> brings nice speedup, and I don't see any problem with this patch.  I
> think that given the time of the year you should create a commitfest
> entry for this patch to make sure it won't be forgotten (and obviously
> I'll mark it as RFC, unless someone objects by then).
> 
> > We've spent quite some time yesterday benching it again, this time with
> > changes that must be fully processed by the decoder. The speed-up is
> > obviously much smaller, we are only ~5% faster than without the patch.
> 
> I'm assuming that it's benchmarking done with multiple logical slots?
> Anyway, a 5% speedup in the case that this patch is not aimed to
> optimize is quite nice!


I've created a commitfest entry for this patch.
https://commitfest.postgresql.org/26/2403/
I would like to know if it would be acceptable to backport this to PostgreSQL
12. I still have to write a clean benchmark for that (our previous benchmarks
are either PG 10 or PG 12 specific), but the change from Thomas Munro that
removed the calls to PostmasterIsAlive is very likely to have the same
side-effect we observed on PG 10 when patching IsAlive, i.e. moving the
pressure from the pipe reads to the PostgreSQL locks between processes. That
change made the whole process slower: the benchmark showed a serious
regression, going from 3m45s to 5m15s to decode the test transactions.

Regards

 Pierre






Re: [PATCH] fix a performance issue with multiple logical-decoding walsenders

2020-01-06 Thread Pierre Ducroquet
On Monday, January 6, 2020 6:57:33 PM CET Tom Lane wrote:
> Pierre Ducroquet  writes:
> > Attached to this email is a patch with better comments regarding the
> > XLogSendLogical change.
> 
> Hi,
>   This patch entirely fails to apply for me (and for the cfbot too).
> It looks like (a) it's missing a final newline and (b) all the tabs
> have been mangled into spaces, and not correctly mangled either.
> I could probably reconstruct a workable patch if I had to, but
> it seems likely that it'd be easier for you to resend it with a
> little more care about attaching an unmodified attachment.
> 
> As for the question of back-patching, it seems to me that it'd
> likely be reasonable to put this into v12, but probably not
> further back.  There will be no interest in back-patching
> commit cfdf4dc4f, and it seems like the argument for this
> patch is relatively weak without that.
> 
>   regards, tom lane

Hi

My deepest apologies for the patch being broken; I messed up when transferring
it between my computers after altering the comments. The verbatim one attached
to this email applies with no issue on current HEAD.
The patch regarding PostmasterIsAlive is completely pointless since v12, where
the function was rewritten, and was included only to help reproduce the issue
on older versions. Back-patching the walsender patch further than v12 would
imply back-patching all the machinery introduced for PostmasterIsAlive
(9f09529952) or another intrusive change there, which is indeed too big a risk.

Regards 

 Pierre
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 6e80e67590..4b57db6fc2 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2774,7 +2774,13 @@ XLogSendLogical(void)
 {
 	XLogRecord *record;
 	char	   *errm;
-	XLogRecPtr	flushPtr;
+
+	/*
+	 * We'll use the current flush point to determine whether we've caught up.
+	 * This variable is static in order to cache it across calls. This caching
+	 * is needed because calling GetFlushRecPtr needs to acquire an expensive lock.
+	 */
+	static XLogRecPtr flushPtr = InvalidXLogRecPtr;
 
 	/*
 	 * Don't know whether we've caught up yet. We'll set WalSndCaughtUp to
@@ -2791,11 +2797,6 @@ XLogSendLogical(void)
 	if (errm != NULL)
 		elog(ERROR, "%s", errm);
 
-	/*
-	 * We'll use the current flush point to determine whether we've caught up.
-	 */
-	flushPtr = GetFlushRecPtr();
-
 	if (record != NULL)
 	{
 		/*
@@ -2808,7 +2809,15 @@ XLogSendLogical(void)
 		sentPtr = logical_decoding_ctx->reader->EndRecPtr;
 	}
 
-	/* Set flag if we're caught up. */
+	/* Initialize flushPtr if needed */
+	if (flushPtr == InvalidXLogRecPtr)
+		flushPtr = GetFlushRecPtr();
+
+	/* If EndRecPtr is past our flushPtr, we must update it to know if we caught up */
+	if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
+		flushPtr = GetFlushRecPtr();
+
+	/* If EndRecPtr is still past our flushPtr, it means we caught up */
 	if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
 		WalSndCaughtUp = true;
 


Re: [PATCH] fix a performance issue with multiple logical-decoding walsenders

2019-12-28 Thread Pierre Ducroquet
On Thursday, December 26, 2019 8:18:46 PM CET Julien Rouhaud wrote:
> Hello Pierre,
> 
> On Thu, Dec 26, 2019 at 5:43 PM Pierre Ducroquet  
wrote:
> > The second one was tested on PG 10 and PG 12 (with 48 lines offset). It
> > has on PG12 the same effect it has on a PG10+isAlive patch. Instead of
> > calling each time GetFlushRecPtr, we call it only if we notice we have
> > reached the value of the previous call. This way, when the senders are
> > busy decoding, we are no longer fighting for a spinlock to read the
> > FlushRecPtr.
> 
> The patch is quite straightforward and looks good to me.
> 
> -XLogRecPtrflushPtr;
> +static XLogRecPtr flushPtr = 0;
> 
> You should use InvalidXLogRecPtr instead though, and maybe adding some
> comments to explain why the static variable is a life changer here.
> 
> > Here are some benchmark results.
> > On PG 10, to decode our replication stream, we went from 3m 43s to over 5
> > minutes after removing the first hot spot, and then down to 22 seconds.
> > On PG 12, we had to change the benchmark (due to GIN indexes creation
> > being
> > more optimized) so we can not compare directly with our previous bench. We
> > went from 15m 11s down to 59 seconds.
> > If needed, we can provide scripts to reproduce this situation. It is quite
> > simple: add ~20 walsenders doing logical replication in database A, and
> > then generate a lot of data in database B. The walsenders will be woken
> > up by the activity on database B, but not sending it thus keeping hitting
> > the same locks.
> 
> Quite impressive speedup!



Hi

Thank you for your comments.
Attached to this email is a patch with better comments regarding the 
XLogSendLogical change.
We've spent quite some time yesterday benching it again, this time with 
changes that must be fully processed by the decoder. The speed-up is obviously 
much smaller, we are only ~5% faster than without the patch.

Regards

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 74330f8c84..fbb30c6847 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2765,7 +2765,13 @@ XLogSendLogical(void)
 {
XLogRecord *record;
char   *errm;
-   XLogRecPtr  flushPtr;
+
+   /*
+* We'll use the current flush point to determine whether we've caught 
up.
+* This variable is static in order to cache it across calls. This 
caching
+* is needed because calling GetFlushRecPtr needs to acquire an 
expensive lock.
+*/
+   static XLogRecPtr flushPtr = InvalidXLogRecPtr;
 
/*
 * Don't know whether we've caught up yet. We'll set WalSndCaughtUp to
@@ -2782,11 +2788,6 @@ XLogSendLogical(void)
if (errm != NULL)
elog(ERROR, "%s", errm);
 
-   /*
-* We'll use the current flush point to determine whether we've caught 
up.
-*/
-   flushPtr = GetFlushRecPtr();
-
if (record != NULL)
{
/*
@@ -2799,7 +2800,16 @@ XLogSendLogical(void)
sentPtr = logical_decoding_ctx->reader->EndRecPtr;
}
 
-   /* Set flag if we're caught up. */
+
+   /* Initialize flushPtr if needed */
+   if (flushPtr == InvalidXLogRecPtr)
+   flushPtr = GetFlushRecPtr();
+
+   /* If EndRecPtr is past our flushPtr, we must update it to know if we 
caught up */
+   if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
+   flushPtr = GetFlushRecPtr();
+
+   /* If EndRecPtr is still past our flushPtr, it means we caught up */
if (logical_decoding_ctx->reader->EndRecPtr >= flushPtr)
WalSndCaughtUp = true;

[PATCH] Remove useless distinct clauses

2020-07-31 Thread Pierre Ducroquet
Hi

In a recent audit, I noticed that application developers have a tendency to
abuse the DISTINCT clause. For instance, they use an ORM and add a DISTINCT at
the top level just because they don't know the cost it has, or because they
don't know that using EXISTS is a better way to express their queries than
doing JOINs (or worse, because they can't do better).

They thus have this kind of queries (considering tbl1 has a PK of course):
SELECT DISTINCT * FROM tbl1;
SELECT DISTINCT * FROM tbl1 ORDER BY a;
SELECT DISTINCT tbl1.* FROM tbl1
JOIN tbl2 ON tbl2.a = tbl1.id;

These can be transformed into:
SELECT * FROM tbl1 ORDER BY *;
SELECT * FROM tbl1 ORDER BY a;
SELECT tbl1.* FROM tbl1 SEMI-JOIN tbl2 ON tbl2.a = tbl1.id ORDER BY tbl1.*;

The attached patch does that.
I added extra safeties in several places just to be sure I don't touch
anything I cannot handle, but I may have been too naive with the
distinct-to-sort transformation.
The cost of this optimization is quite low: for queries that don't have any
distinct, it's just one if. If there is a distinct, we iterate once through
every target, then we fetch the PK and iterate through the DISTINCT clause
fields. If it is possible to optimize, we then iterate through the JOINs.

Any comment on this would be more than welcome!

Regards

 Pierre

>From bc71aea5a0c1911c1f63567c21a870241b76231b Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet 
Date: Fri, 31 Jul 2020 10:28:25 +0200
Subject: [PATCH] Add a new remove_useless_distinct function in planner

This function removes useless distinct that are (sadly) often
added when developers use an ORM.
For instance, we may have the following queries:
SELECT DISTINCT * FROM tbl;
SELECT DISTINCT * FROM tbl ORDER BY b;
SELECT DISTINCT tbl1.* FROM tbl1 JOIN tbl2 ON tbl2.a_id = tbl1.id;
In the first two queries (if tbl has a PK of course), the distinct
is useless, it's at best an implicit order by.
In the third query, the distinct turns the JOIN into a SEMI-JOIN.
All these are thus optimized here.
---
 src/backend/optimizer/plan/planner.c | 132 +++
 1 file changed, 132 insertions(+)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b40a112c25..a88389acb5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -140,6 +140,7 @@ static double preprocess_limit(PlannerInfo *root,
 			   double tuple_fraction,
 			   int64 *offset_est, int64 *count_est);
 static void remove_useless_groupby_columns(PlannerInfo *root);
+static void remove_useless_distinct(PlannerInfo *root);
 static List *preprocess_groupclause(PlannerInfo *root, List *force);
 static List *extract_rollup_sets(List *groupingSets);
 static List *reorder_grouping_sets(List *groupingSets, List *sortclause);
@@ -650,6 +651,12 @@ subquery_planner(PlannerGlobal *glob, Query *parse,
 	 */
 	replace_empty_jointree(parse);
 
+	/*
+	 * If there is a distinct clause, try to remove it
+	 */
+	if (parse->distinctClause)
+		remove_useless_distinct(root);
+
 	/*
 	 * Look for ANY and EXISTS SubLinks in WHERE and JOIN/ON clauses, and try
 	 * to transform them into joins.  Note that this step does not descend
@@ -3207,6 +3214,131 @@ remove_useless_groupby_columns(PlannerInfo *root)
 	}
 }
 
+/*
+ * remove_useless_distinct
+ *		Remove any distinct that is redundant due to being applied on the
+ *		primary key of the table.
+ *
+ * Currently, we only make use of pkey constraints for this, however, we may
+ * want to take this further in the future and also use unique constraints
+ * with NOT NULL columns.
+ */
+static void
+remove_useless_distinct(PlannerInfo *root)
+{
+	Index distincted_table = 0;
+	if (list_length(root->parse->rtable) > 1)
+	{
+		/**
+		 * There is more than one table.
+		 * If no field is selected from another table, this is a 'typical'
+		 * semi-join written using distinct instead of exists.
+		 */
+		ListCell *targetCell;
+		Var *targetVar;
+		foreach (targetCell, root->parse->targetList)
+		{
+			/**
+			 * This only accept Vars. Later, we could go lower and look for Vars 
+			 * inside function parameters for instance.
+			 */
+			TargetEntry *target = (TargetEntry*) lfirst(targetCell);
+			if (!IsA(target->expr, Var))
+			{
+				return;
+			}
+			else
+			{
+				targetVar = (Var*) target->expr;
+				if (distincted_table == 0)
+					distincted_table = targetVar->varno;
+				else if (distincted_table != targetVar->varno)
+					return;
+			}
+		}
+	}
+	else
+	{
+		distincted_table = 1;
+	}
+
+	/**
+	 * We will fetch the PK, check if we have a distinct on it.
+	 */
+	ListCell *distinctItemCell;
+	TargetEntry *entry;
+	Oid pk_oid = InvalidOid;
+	Bitmapset *pk_attnos;
+	RangeTblEntry *tbl;
+	Var *var;
+	SortGroupClause *distinctItem;
+
+	// To get the pk, we need the table
+	tbl = (RangeTblEntry*) list_nth(root->parse->rtable, distincted_table - 1);
+	// Make sure it is a relkind we understand

Re: Improvements in pg_dump/pg_restore toc format and performances

2023-09-19 Thread Pierre Ducroquet
On Monday, September 18, 2023 11:54:42 PM CEST Nathan Bossart wrote:
> On Mon, Sep 18, 2023 at 02:52:47PM -0700, Nathan Bossart wrote:
> > On Thu, Jul 27, 2023 at 10:51:11AM +0200, Pierre Ducroquet wrote:
> >> I ended up writing several patches that shaved some time for pg_restore
> >> -l,
> >> and reduced the toc.dat size.
> > 
> > I've only just started taking a look at these patches, and I intend to do
> > a
> > more thorough review in the hopefully-not-too-distant future.
> 
> Since cfbot is failing on some pg_upgrade and pg_dump tests, I've set this
> to waiting-on-author.

The attached updated patches fix this regression; I'm sorry I missed that.

>From bafc4a3775d732895aa919de21e6f512cc726f49 Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet 
Date: Wed, 26 Jul 2023 21:31:33 +0200
Subject: [PATCH 1/4] drop has oids field instead of having static values

---
 src/bin/pg_dump/pg_backup_archiver.c | 3 +--
 src/bin/pg_dump/pg_backup_archiver.h | 3 ++-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 4d83381d84..f4c782d63d 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -2510,7 +2510,6 @@ WriteToc(ArchiveHandle *AH)
 		WriteStr(AH, te->tablespace);
 		WriteStr(AH, te->tableam);
 		WriteStr(AH, te->owner);
-		WriteStr(AH, "false");
 
 		/* Dump list of dependencies */
 		for (i = 0; i < te->nDeps; i++)
@@ -2618,7 +2617,7 @@ ReadToc(ArchiveHandle *AH)
 		is_supported = true;
 		if (AH->version < K_VERS_1_9)
 			is_supported = false;
-		else
+		else if (AH->version < K_VERS_1_16)
 		{
 			tmp = ReadStr(AH);
 
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index b07673933d..b78e326f73 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -68,10 +68,11 @@
 #define K_VERS_1_15 MAKE_ARCHIVE_VERSION(1, 15, 0)	/* add
 	 * compression_algorithm
 	 * in header */
+#define K_VERS_1_16 MAKE_ARCHIVE_VERSION(1, 16, 0)	/* optimize dump format */
 
 /* Current archive version number (the format we can output) */
 #define K_VERS_MAJOR 1
-#define K_VERS_MINOR 15
+#define K_VERS_MINOR 16
 #define K_VERS_REV 0
 #define K_VERS_SELF MAKE_ARCHIVE_VERSION(K_VERS_MAJOR, K_VERS_MINOR, K_VERS_REV)
 
-- 
2.42.0

>From e8dbb51713daefa1596ce63ea6990d2fd1c4e27f Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet 
Date: Wed, 26 Jul 2023 22:23:28 +0200
Subject: [PATCH 2/4] convert sscanf to strtoul

---
 src/bin/pg_dump/pg_backup_archiver.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index f4c782d63d..f9efb2badf 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -2556,13 +2556,13 @@ ReadToc(ArchiveHandle *AH)
 		if (AH->version >= K_VERS_1_8)
 		{
 			tmp = ReadStr(AH);
-			sscanf(tmp, "%u", &te->catalogId.tableoid);
+			te->catalogId.tableoid = strtoul(tmp, NULL, 10);
 			free(tmp);
 		}
 		else
 			te->catalogId.tableoid = InvalidOid;
 		tmp = ReadStr(AH);
-		sscanf(tmp, "%u", &te->catalogId.oid);
+		te->catalogId.oid = strtoul(tmp, NULL, 10);
 		free(tmp);
 
 		te->tag = ReadStr(AH);
@@ -2646,7 +2646,7 @@ ReadToc(ArchiveHandle *AH)
 					depSize *= 2;
 					deps = (DumpId *) pg_realloc(deps, sizeof(DumpId) * depSize);
 				}
-				sscanf(tmp, "%d", &deps[depIdx]);
+				deps[depIdx] = strtoul(tmp, NULL, 10);
 				free(tmp);
 				depIdx++;
 			}
-- 
2.42.0

>From eb02760df97ab8e95287662a88ac3439fbc0a4fa Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet 
Date: Thu, 27 Jul 2023 10:04:51 +0200
Subject: [PATCH 3/4] store oids as integer instead of string

---
 src/bin/pg_dump/pg_backup_archiver.c | 63 ++--
 1 file changed, 42 insertions(+), 21 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index f9efb2badf..feacc22701 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -2470,7 +2470,6 @@ void
 WriteToc(ArchiveHandle *AH)
 {
 	TocEntry   *te;
-	char		workbuf[32];
 	int			tocCount;
 	int			i;
 
@@ -2494,11 +2493,8 @@ WriteToc(ArchiveHandle *AH)
 		WriteInt(AH, te->dumpId);
 		WriteInt(AH, te->dataDumper ? 1 : 0);
 
-		/* OID is recorded as a string for historical reasons */
-		sprintf(workbuf, "%u", te->catalogId.tableoid);
-		WriteStr(AH, workbuf);
-		sprintf(workbuf, "%u", te->catalogId.oid);
-		WriteStr(AH, workbuf);
+		WriteInt(AH, te->catalogId.tableoid);
+		WriteInt(AH, te->catalogId.oid);
 
 		WriteStr(AH, te->tag);
 		WriteStr(AH, te->desc);
@@ -2514,10 +2510,9 @@ WriteToc(ArchiveHandle *AH)
 		/* Dump list of dependencies */
 		for

Re: Improvements in pg_dump/pg_restore toc format and performances

2023-09-18 Thread Pierre Ducroquet
On Monday, September 18, 2023 11:52:47 PM CEST Nathan Bossart wrote:
> On Thu, Jul 27, 2023 at 10:51:11AM +0200, Pierre Ducroquet wrote:
> > I ended up writing several patches that shaved some time for pg_restore
> > -l,
> > and reduced the toc.dat size.
> 
> I've only just started taking a look at these patches, and I intend to do a
> more thorough review in the hopefully-not-too-distant future.

Thank you very much.

> Since cfbot is failing on some pg_upgrade and pg_dump tests, I've set this
> to waiting-on-author.

I did not notice anything when running meson test -v; I'll look into it
further in the next few days.

> > First patch is "finishing" the job of removing has oids support. When this
> > support was removed, instead of dropping the field from the dumps and
> > increasing the dump versions, the field was kept as is. This field stores
> > a
> > boolean as a string, "true" or "false". This is not free, and requires 10
> > bytes per toc entry.
> 
> This sounds reasonable to me.  I wonder why this wasn't done when WITH OIDS
> was removed in v12.

I suppose it is an oversight, or not wanting to increase the dump version for 
no reason.

> > The second patch removes calls to sscanf and replaces them with strtoul.
> > This was the biggest speedup for pg_restore -l.
> 
> Nice.
> 
> > The third patch changes the dump format further to remove these strtoul
> > calls and store the integers as is instead.
> 
> Do we need to worry about endianness here?

I used the ReadInt/WriteInt functions already defined in pg_dump that take care 
of this issue, so there should be no need to worry.

> > The fourth patch is dirtier and does more changes to the dump format.
> > Instead of storing the owner, tablespace, table access method and schema
> > of each object as a string, pg_dump builds an array of these, stores them
> > at the beginning of the file and replaces the strings with integer fields
> > in the dump. This reduces the file size further, and removes a lot of
> > calls to ReadStr, thus saving quite some time.
> 
> This sounds promising.
> 
> > Patch            Toc size   Dump -s duration   pg_restore -l duration
> > HEAD             214M       23.1s              1.27s
> > #1 (has oid)     210M       22.9s              1.26s
> > #2 (scanf)       210M       22.9s              1.07s
> > #3 (no strtoul)  202M       22.8s              0.94s
> > #4 (string list) 181M       23.1s              0.87s
> 
> At a glance, the size improvements in 0004 look the most interesting to me.

Yes it is, and the speed benefits are interesting too (at least for my use
case).








Re: Improvements in pg_dump/pg_restore toc format and performances

2023-09-19 Thread Pierre Ducroquet
On Monday, September 18, 2023 11:54:42 PM CEST Nathan Bossart wrote:
> On Mon, Sep 18, 2023 at 02:52:47PM -0700, Nathan Bossart wrote:
> > On Thu, Jul 27, 2023 at 10:51:11AM +0200, Pierre Ducroquet wrote:
> >> I ended up writing several patches that shaved some time for pg_restore
> >> -l,
> >> and reduced the toc.dat size.
> > 
> > I've only just started taking a look at these patches, and I intend to do
> > a
> > more thorough review in the hopefully-not-too-distant future.
> 
> Since cfbot is failing on some pg_upgrade and pg_dump tests, I've set this
> to waiting-on-author.

FYI, the failures are related to patch 0004; I said it was dirty, but that was
apparently an understatement. Patches 0001 to 0003 don't exhibit any
regression.






log_line_prefix: make it possible to add the search_path

2022-07-25 Thread Pierre Ducroquet
Hello

I'm working on several databases where schemas are used to differentiate the
tenants.
This is great for performance, but several tools are lacking around this use
case by not showing the schema, one of them being log_line_prefix.
It is possible to work around this using application_name, but a mistake on
the application side would be fatal, while the search_path would still
indicate the real tables used in a query.
The attached patch implements this, using %S. I have not written the
documentation yet, since I'm not sure this would be acceptable as is, or if a
more "generic" method should be used (I thought of %{name} to fetch an
arbitrary GUC, but did not implement it due to a lack of need for that
feature).
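
For illustration, with the patch one could set something like:

log_line_prefix = '%m [%p] %S '

and every log line would then carry the raw value of search_path, for
instance a prefix ending in 'tenant_42, public' (the exact output shape here
is illustrative).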


>From d28ea4452c6e78bf8e67db49d1c72dbf4bd1ca48 Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet 
Date: Mon, 25 Jul 2022 09:33:49 +0200
Subject: [PATCH] log_line_prefix: make it possible to add search_path

This adds support for %S to add the current search_path in log_line_prefix, thus
making it easier to track slow queries on databases with, for instance, one schema
per tenant.
---
 src/backend/utils/error/elog.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 95f32de4e2..41e894bc83 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -68,6 +68,7 @@
 
 #include "access/transam.h"
 #include "access/xact.h"
+#include "catalog/namespace.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -83,7 +84,6 @@
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 
-
 /* In this module, access gettext() via err_gettext() */
 #undef _
 #define _(x) err_gettext(x)
@@ -2791,6 +2791,12 @@ log_status_format(StringInfo buf, const char *format, ErrorData *edata)
 				appendStringInfo(buf, "%lld",
 								 (long long) pgstat_get_my_query_id());
 				break;
+			case 'S':
+				if (padding != 0)
+					appendStringInfo(buf, "%*s", padding, namespace_search_path);
+				else
+					appendStringInfoString(buf, namespace_search_path);
+				break;
 			default:
 				/* format error - ignore it */
 				break;
-- 
2.37.1



Improvements in pg_dump/pg_restore toc format and performances

2023-07-27 Thread Pierre Ducroquet
Hi

Following the thread "Inefficiency in parallel pg_restore with many tables", I 
started digging into why the toc.dat files are that big and where time is spent 
when parsing them.

I ended up writing several patches that shaved some time for pg_restore -l, 
and reduced the toc.dat size.

The first patch "finishes" the job of removing has-oids support. When this
support was removed, instead of dropping the field from the dumps and
increasing the dump version, the field was kept as is. This field stores a
boolean as a string, "true" or "false". This is not free: the string is stored
as a length integer followed by its characters, which costs 10 bytes per toc
entry for "false".

The second patch removes calls to sscanf and replaces them with strtoul. This 
was the biggest speedup for pg_restore -l.

The third patch changes the dump format further to remove these strtoul calls 
and store the integers as is instead.

The fourth patch is dirtier and does more changes to the dump format. Instead 
of storing the owner, tablespace, table access method and schema of each 
object as a string, pg_dump builds an array of these, stores them at the 
beginning of the file and replaces the strings with integer fields in the dump. 
This reduces the file size further, and removes a lot of calls to ReadStr, thus 
saving quite some time.

The toc has 453999 entries.

Patch            Toc size   Dump -s duration   pg_restore -l duration
HEAD             214M       23.1s              1.27s
#1 (has oid)     210M       22.9s              1.26s
#2 (scanf)       210M       22.9s              1.07s
#3 (no strtoul)  202M       22.8s              0.94s
#4 (string list) 181M       23.1s              0.87s

Patch four is likely to require more changes. I don't know PostgreSQL code 
enough to do better than calling pg_malloc/pg_realloc and maintaining a char**
manually, I guess there are structs and functions that do that in a better 
way. And the location of string tables in the file and in the structures is 
probably not acceptable, I suppose these should go to the toc header instead.
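
For reference, the kind of code I mean is roughly the following sketch (not
the actual patch; it assumes the pg_realloc/pg_strdup helpers from
fe_memutils.h and a zero-initialized table):

typedef struct
{
	char	  **values;
	int			nvalues;
	int			allocated;
} StringTable;

/* Return the index of value in the table, appending it if not present. */
static int
string_table_intern(StringTable *st, const char *value)
{
	int			i;

	/* Linear scan: the tables stay small (owners, schemas, ...). */
	for (i = 0; i < st->nvalues; i++)
		if (strcmp(st->values[i], value) == 0)
			return i;

	if (st->nvalues >= st->allocated)
	{
		st->allocated = st->allocated ? st->allocated * 2 : 16;
		st->values = (char **) pg_realloc(st->values,
										  st->allocated * sizeof(char *));
	}
	st->values[st->nvalues] = pg_strdup(value);
	return st->nvalues++;
}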

I still submit these for comments and first review.

Best regards

 Pierre Ducroquet
>From cf5afd22a6e754ccfab0cd4477e378e32baf425a Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet 
Date: Wed, 26 Jul 2023 21:31:33 +0200
Subject: [PATCH 1/4] drop has oids field instead of having static values

---
 src/bin/pg_dump/pg_backup_archiver.c | 3 +--
 src/bin/pg_dump/pg_backup_archiver.h | 3 ++-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 39ebcfec32..2888baa168 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -2506,7 +2506,6 @@ WriteToc(ArchiveHandle *AH)
 		WriteStr(AH, te->tablespace);
 		WriteStr(AH, te->tableam);
 		WriteStr(AH, te->owner);
-		WriteStr(AH, "false");
 
 		/* Dump list of dependencies */
 		for (i = 0; i < te->nDeps; i++)
@@ -2614,7 +2613,7 @@ ReadToc(ArchiveHandle *AH)
 		is_supported = true;
 		if (AH->version < K_VERS_1_9)
 			is_supported = false;
-		else
+		else if (AH->version < K_VERS_1_16)
 		{
 			tmp = ReadStr(AH);
 
diff --git a/src/bin/pg_dump/pg_backup_archiver.h b/src/bin/pg_dump/pg_backup_archiver.h
index 18b38c17ab..930141a506 100644
--- a/src/bin/pg_dump/pg_backup_archiver.h
+++ b/src/bin/pg_dump/pg_backup_archiver.h
@@ -68,10 +68,11 @@
 #define K_VERS_1_15 MAKE_ARCHIVE_VERSION(1, 15, 0)	/* add
 	 * compression_algorithm
 	 * in header */
+#define K_VERS_1_16 MAKE_ARCHIVE_VERSION(1, 16, 0)	/* optimize dump format */
 
 /* Current archive version number (the format we can output) */
 #define K_VERS_MAJOR 1
-#define K_VERS_MINOR 15
+#define K_VERS_MINOR 16
 #define K_VERS_REV 0
 #define K_VERS_SELF MAKE_ARCHIVE_VERSION(K_VERS_MAJOR, K_VERS_MINOR, K_VERS_REV)
 
-- 
2.41.0

>From 62c6bb9bca491586314760edbf8330308e8528c9 Mon Sep 17 00:00:00 2001
From: Pierre Ducroquet 
Date: Wed, 26 Jul 2023 22:23:28 +0200
Subject: [PATCH 2/4] convert sscanf to strtoul

---
 src/bin/pg_dump/pg_backup_archiver.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 2888baa168..0fd6510c10 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -2552,13 +2552,13 @@ ReadToc(ArchiveHandle *AH)
 		if (AH->version >= K_VERS_1_8)
 		{
 			tmp = ReadStr(AH);
-			sscanf(tmp, "%u", &te->catalogId.tableoid);
+			te->catalogId.tableoid = strtoul(tmp, NULL, 10);
 			free(tmp);
 		}
 		else
 			te->catalogId.tableoid = InvalidOid;
 		tmp = ReadStr(AH);
-		sscanf(tmp, "%u", &te->catalogId.oid);
+		te->catalogId.oid = strtoul(tmp, NULL, 10);
 		free(tmp);
 
 		te->tag = ReadStr(AH);
@@ -2642,7 +2642,7 @@ ReadToc(ArchiveHandle *AH)
 					depSize *= 2;
 					deps = (DumpId *) pg_realloc(deps, sizeof(DumpId) * depSize);
 				}
-				sscanf(tmp, "%d", &deps[depIdx]);
+				deps[depIdx] = strtoul(tmp, NULL, 10);

Re: Inefficiency in parallel pg_restore with many tables

2023-07-24 Thread Pierre Ducroquet
On Saturday, July 15, 2023 7:47:12 PM CEST Tom Lane wrote:
> I'm not sure how big a deal this is in practice: in most situations
> the individual jobs are larger than they are in this toy example,
> plus the initial non-parallelizable part of the restore is a bigger
> bottleneck anyway with this many tables.  Still, we do have one
> real-world complaint, so maybe we should look into improving it.

Hi

For what it's worth, at my current job it's kind of a big deal. I was going to
start looking at the bad performance I got from pg_restore on some databases
with over 50k tables (in 200 namespaces) when I found this thread. The dump
weighs in at about 2.8GB; the toc.dat file is 230MB, covering 50,120 tables,
142,069 constraints and 73,669 indexes.

HEAD pg_restore duration: 30 minutes
pg_restore with latest patch from Nathan Bossart: 23 minutes

This is indeed better, but there is still a lot of room for improvement. With
such use cases, I was able to go much faster using the patched pg_restore with
a script that parallelizes over schemas instead of relying on the choices made
by pg_restore (see the sketch after the timings below). It seems the choice of
parallelizing only the data loading loses nice speedup opportunities with a
huge number of objects.

patched pg_restore + parallel restore of schemas: 10 minutes
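
For reference, the per-schema driver was along these lines (a simplified
sketch: database names, dump path and the schema filter are made up):

# Restore the non-data sections once, then one pg_restore per schema.
pg_restore -d target --section=pre-data dump.custom
for n in $(psql -AtX -d target -c "SELECT nspname FROM pg_namespace WHERE nspname NOT LIKE 'pg\_%' AND nspname <> 'information_schema'"); do
    pg_restore -d target -n "$n" --section=data --section=post-data dump.custom &
done
wait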

Anyway, the patch works really well as is, and I will certainly keep testing
future iterations.

Regards

 Pierre