Re: undefined symbol: PQgssEncInUse

2019-05-29 Thread Donald Dong
On May 29, 2019, at 8:23 PM, Paul Guo  wrote:
> Have you used the correct libpq library? If yes, you might want to check the
> build logs and related files to see what is wrong. In my environment, it's
> OK with gssapi either enabled or disabled.

Thank you! Resetting libpq's path fixes it.

Regards,
Donald Dong



Re: Dead stores in src/common/sha2.c

2019-05-29 Thread Heikki Linnakangas

On 29/05/2019 18:47, Hamlin, Garick L wrote:

On Wed, May 29, 2019 at 11:01:05AM -0400, Tom Lane wrote:

At the same time, I'm not sure if we should just write this off as an
ignorable warning.  If the C compiler concludes these are dead stores
it'll probably optimize them away, leading to not accomplishing the
goal of wiping the values.


Yeah, I mean it's odd to put code in to zero/hide state knowing it's
probably optimized out.

We could also take it out, but maybe it does help somewhere?

... or put in a comment that says: This probably gets optimized away, but
we don't consider it much of a risk.


There's a function called explicit_bzero() in glibc, for this purpose. 
See 
https://www.gnu.org/software/libc/manual/html_node/Erasing-Sensitive-Data.html. 
It's not totally portable, but it's also available in some BSDs, at 
least. That documentation mentions the possibility that it might force 
variables to be stored in memory that would've otherwise been kept only 
in registers, but says that in most situations it's nevertheless better 
to use explicit_bzero() than not. So I guess we should use that, if it's 
available.
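
For platforms that lack explicit_bzero(), we would presumably need a fallback
in src/port. As a minimal sketch of one common idiom (just an illustration of
the technique, not a concrete proposal for what we'd commit), calling memset()
through a volatile function pointer keeps the compiler from proving the store
dead and discarding it:

    #include <string.h>

    /*
     * The volatile pointer must be re-read at the call site, so the call
     * cannot be optimized away even if the buffer is dead afterwards.
     */
    static void *(*const volatile memset_vol) (void *, int, size_t) = memset;

    static void
    fallback_explicit_bzero(void *buf, size_t len)
    {
        memset_vol(buf, 0, len);
    }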


- Heikki




Re: undefined symbol: PQgssEncInUse

2019-05-29 Thread Paul Guo
Have you used the correct libpq library? If yes, you might want to check
the build logs and related files to see what is wrong. In my environment,
it's OK with gssapi either enabled or disabled.

On Thu, May 30, 2019 at 9:22 AM Donald Dong  wrote:

> Hi,
>
> After I make temp-install on HEAD with a clean build, I fail to start
> psql (tmp_install/usr/local/pgsql/bin/psql) and get an error message:
>
> ./psql: symbol lookup error: ./psql: undefined symbol: PQgssEncInUse
>
> However, make check and other tests still work. For me, it was fine
> until commit b0b39f72b9904bcb80f97b35837ccff1578aa4b8. I wonder if
> this occurs only for me?
>
> Thank you,
> Donald Dong
>
>
>


undefined symbol: PQgssEncInUse

2019-05-29 Thread Donald Dong
Hi,

After I make temp-install on HEAD with a clean build, I fail to start
psql (tmp_install/usr/local/pgsql/bin/psql) and get an error message:

./psql: symbol lookup error: ./psql: undefined symbol: PQgssEncInUse

However, make check and other tests still work. For me, it was fine
until commit b0b39f72b9904bcb80f97b35837ccff1578aa4b8. I wonder if
this occurs only for me?

Thank you,
Donald Dong




Print baserestrictinfo for varchar fields

2019-05-29 Thread Donald Dong
Hi,

I noticed that debug_print_rel outputs "unknown expr" when the fields
in baserestrictinfo are typed as varchar.

create table tbl_a(id int, info varchar(32));

RELOPTINFO (tbl_a): rows=4 width=86
baserestrictinfo: unknown expr = pattern

My approach is to handle the RelabelType case in print_expr. After
the patch, I get:

RELOPTINFO (tbl_a): rows=4 width=86
baserestrictinfo: tbl_a.info = pattern

I wonder if this is a proper way of fixing it?
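
For reference, the change I have in mind is roughly the following extra branch
in print_expr() (a sketch only; the actual change is in the attached patch):

    else if (IsA(expr, RelabelType))
    {
        /* Binary-compatible cast (e.g. varchar -> text): print the argument. */
        const RelabelType *r = (const RelabelType *) expr;

        print_expr((const Node *) r->arg, rtable);
    }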

Thank you,
Donald Dong



001_print_baserestrictinfo_varchar.patch
Description: Binary data


Re: coverage increase for worker_spi

2019-05-29 Thread Tom Lane
Alvaro Herrera  writes:
> Tom pointed out that coverage for worker_spi is 0%.  For a module that
> only exists to provide coverage, that's pretty stupid.  This patch
> increases coverage to 90.9% line-wise and 100% function-wise, which
> seems like a sufficient starting point.

> How would people feel about me getting this in master at this point in
> the cycle, it being just some test code?  We can easily revert if
> it seems too unstable.

I'm not opposed to adding a new test case at this point in the cycle,
but as written this one seems more or less guaranteed to fail under
load.  You can't just sleep for worker_spi.naptime and expect that
the worker will certainly have run.

Perhaps you could use a plpgsql DO block with a loop to wait up
to X seconds until the expected state appears, for X around 120
to 180 seconds (compare poll_query_until in the TAP tests).

regards, tom lane




coverage increase for worker_spi

2019-05-29 Thread Alvaro Herrera
Tom pointed out that coverage for worker_spi is 0%.  For a module that
only exists to provide coverage, that's pretty stupid.  This patch
increases coverage to 90.9% line-wise and 100% function-wise, which
seems like a sufficient starting point.

How would people feel about me getting this in master at this point in
the cycle, it being just some test code?  We can easily revert if
it seems too unstable.

-- 
Álvaro Herrera
diff --git a/src/test/modules/worker_spi/Makefile b/src/test/modules/worker_spi/Makefile
index 7cdb33c9df..cbf9b2e37f 100644
--- a/src/test/modules/worker_spi/Makefile
+++ b/src/test/modules/worker_spi/Makefile
@@ -6,6 +6,14 @@ EXTENSION = worker_spi
 DATA = worker_spi--1.0.sql
 PGFILEDESC = "worker_spi - background worker example"
 
+REGRESS = worker_spi
+
+# enable our module in shared_preload_libraries for dynamic bgworkers
+REGRESS_OPTS = --temp-config $(top_srcdir)/src/test/modules/worker_spi/dynamic.conf
+
+# Disable installcheck to ensure we cover dynamic bgworkers.
+NO_INSTALLCHECK = 1
+
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/src/test/modules/worker_spi/dynamic.conf b/src/test/modules/worker_spi/dynamic.conf
new file mode 100644
index 00..646885a9c7
--- /dev/null
+++ b/src/test/modules/worker_spi/dynamic.conf
@@ -0,0 +1,2 @@
+worker_spi.naptime = 1
+shared_preload_libraries = worker_spi
diff --git a/src/test/modules/worker_spi/expected/worker_spi.out b/src/test/modules/worker_spi/expected/worker_spi.out
new file mode 100644
index 00..28b2970d01
--- /dev/null
+++ b/src/test/modules/worker_spi/expected/worker_spi.out
@@ -0,0 +1,22 @@
+\c postgres
+CREATE EXTENSION worker_spi;
+SELECT worker_spi_launch(1) IS NOT NULL;
+ ?column? 
+--
+ t
+(1 row)
+
+INSERT INTO schema1.counted VALUES ('total', 0), ('delta', 1);
+SELECT pg_sleep(CASE WHEN count(*) = 0 THEN 0 ELSE current_setting('worker_spi.naptime')::int END)
+  FROM schema1.counted WHERE type = 'delta';
+ pg_sleep 
+--
+ 
+(1 row)
+
+SELECT * FROM schema1.counted;
+ type  | value 
+---+---
+ total | 1
+(1 row)
+
diff --git a/src/test/modules/worker_spi/sql/worker_spi.sql b/src/test/modules/worker_spi/sql/worker_spi.sql
new file mode 100644
index 00..ac454cff57
--- /dev/null
+++ b/src/test/modules/worker_spi/sql/worker_spi.sql
@@ -0,0 +1,7 @@
+\c postgres
+CREATE EXTENSION worker_spi;
+SELECT worker_spi_launch(1) IS NOT NULL;
+INSERT INTO schema1.counted VALUES ('total', 0), ('delta', 1);
+SELECT pg_sleep(CASE WHEN count(*) = 0 THEN 0 ELSE current_setting('worker_spi.naptime')::int END)
+  FROM schema1.counted WHERE type = 'delta';
+SELECT * FROM schema1.counted;
-- 
2.17.1



Re: incorrect xlog.c coverage report

2019-05-29 Thread Alvaro Herrera
On 2018-Nov-22, Michael Paquier wrote:

> On Thu, Nov 22, 2018 at 10:56:39AM +0900, Masahiko Sawada wrote:
> > On Thu, Nov 22, 2018 at 10:43 AM Thomas Munro 
> >  wrote:
> >> Presumably you could add your own call to __gcov_flush() in
> >> quickdie(), so that we get GCOV data but no other atexit()-like stuff.
> >> I see that some people advocate doing that in signal handlers, but I
> >> don't know if it's really safe.  If that is somehow magically OK,
> >> you'd probably also need the chdir() hack from proc_exit() to get
> >> per-pid files.
> > 
> > That's probably a good idea, I'm also not sure if it's really safe
> > though. An alternative approach could be that we can do $node->restart
> > after recovering from $node->teardown_node() to reliably write the gcda file,
> > although it would make the tests hard to read.
> 
> Thanks for looking at the details around that.  I'd much prefer if we
> had a solution like what's outlined here, because we should really try to
> have coverage even for code paths which involve an immediate shutdown
> (mainly for recovery).  Manipulating the tests to get a better coverage
> feels more like a band-aid solution, and does not help folks with custom
> TAP tests in their plugins.

I've just realized that we didn't do anything about this (see line 5380
in https://coverage.postgresql.org/src/backend/access/transam/xlog.c.gcov.html),
and we should.  I still prefer the solution I proposed (which is to
edit the test files to avoid immediate shutdown in certain places), but
I admit that adding __gcov_flush() to quickdie() seems to have gotten
more votes.

Are there objections to doing that now on the master branch?
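
For concreteness, the change would be roughly this, at the bottom of
quickdie() just before _exit(2) (a sketch only; the guard symbol here is made
up, and we would need to settle how an --enable-coverage build advertises
itself to the preprocessor):

    #ifdef COVERAGE_BUILD           /* hypothetical guard for coverage builds */
        {
            extern void __gcov_flush(void);     /* libgcov entry point */

            __gcov_flush();     /* write .gcda data despite skipping atexit() */
        }
    #endif
        _exit(2);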

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Why does SPI_connect change the memory context?

2019-05-29 Thread Tom Lane
Jeff Davis  writes:
> SPI_connect() changes the memory context to a newly-created one, and
> then SPI_finish() restores it. That seems a bit dangerous because the
> caller might not be expecting it. Is there a reason it doesn't just
> change to the new memory context as-needed?

Because the expectation is that palloc inside the SPI procedure will
allocate in a procedure-specific context.  If the caller isn't expecting
that, they haven't read the documentation, specifically

https://www.postgresql.org/docs/devel/spi-memory.html

which says

  
   SPI_connect creates a new memory context and
   makes it current.  SPI_finish restores the
   previous current memory context and destroys the context created by
   SPI_connect.  These actions ensure that
   transient memory allocations made inside your C function are
   reclaimed at C function exit, avoiding memory leakage.
  
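
In other words, the usual pattern in a C function is to remember the caller's
context and explicitly copy anything that must survive SPI_finish back into
it.  A minimal sketch (illustrative only; the function and variable names here
are made up):

    #include "postgres.h"
    #include "fmgr.h"
    #include "executor/spi.h"
    #include "utils/builtins.h"

    PG_FUNCTION_INFO_V1(my_func);

    Datum
    my_func(PG_FUNCTION_ARGS)
    {
        MemoryContext oldcxt = CurrentMemoryContext;
        char         *result;

        if (SPI_connect() != SPI_OK_CONNECT)
            elog(ERROR, "SPI_connect failed");

        /*
         * Any palloc() done from here on goes into the SPI procedure context
         * and is freed by SPI_finish(), so data that must outlive it gets
         * copied back into the caller's context explicitly.
         */
        result = MemoryContextStrdup(oldcxt, "some value");

        SPI_finish();           /* restores the previous memory context */

        PG_RETURN_TEXT_P(cstring_to_text(result));
    }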

regards, tom lane




Re: Rearranging ALTER TABLE to avoid multi-operations bugs

2019-05-29 Thread Tom Lane
Robert Haas  writes:
> From my point of view, the DDL code doesn't do a great job separating
> parsing/parse analysis from optimization/execution.  The ALTER TABLE
> stuff is actually pretty good in this regard.

Meh.  I think a pretty fair characterization of the bug(s) I'm trying to
fix is "we separated parse analysis from execution when we should not
have, because it leads to parse analysis being done against the wrong
database state".  So I'm *very* suspicious of any argument that we should
try to separate them more, let alone that doing so will somehow fix this
set of bugs.

>> Also, recursive ProcessUtility cases exist independently of this issue,
>> in particular in CreateSchemaCommand.  My worry about my patch upthread
>> is not really that it introduces another one, but that it doesn't do
>> anything towards providing a uniform framework/notation for all these
>> cases.

> I'm not really sure I understand this.

Well, I tried to wrap what are currently a random set of ProcessUtility
arguments into one struct to reduce the notational burden.  But as things
are set up, that's specific to the ALTER TABLE case.  I'm feeling like it
should not be, but I'm not very sure where to draw the line between
arguments that should be folded into the struct and ones that shouldn't.

Note that I think there are live bugs in here that are directly traceable
to not having tried to fold those arguments before.  Of the four existing
recursive ProcessUtility calls with context = PROCESS_UTILITY_SUBCOMMAND,
two pass down the outer call's "ParamListInfo params", and two don't ---
how is it not a bug that they don't all behave alike?  And none of the
four pass down the outer call's QueryEnvironment, which seems like even
more of a bug.  So it feels like we ought to have a uniform approach
to what gets passed down during recursion, and enforce it by passing
all such values in a struct rather than as independent arguments.

regards, tom lane




Re: [HACKERS] Runtime Partition Pruning

2019-05-29 Thread Robert Haas
On Wed, May 29, 2019 at 6:02 PM Tom Lane  wrote:
> Well, it *is* a problem.  The whole point of this discussion I think is
> to try to get better information "by default" for routine bug reports.
> So if those come from production servers without debug symbols, which
> I believe will be the usual case, then it seems likely to me that
> libunwind will produce no better results than glibc.  (But perhaps
> I'm wrong about that --- I have not experimented with libunwind.)

Sure, I agree.

> Now it's true that "install debug symbols" is less of an ask than
> "install debug symbols, *and* gdb, and make sure server core dumps are
> enabled, and then go through this arcane manual procedure next time
> you get a core dump".  But we shouldn't fool ourselves that it isn't
> an ask that's going to be hard for people with corporate policies
> against installing extra stuff on production servers.

There may be cases where that is true, but as you say, it's better
than what we have now.  Plus, what exactly is the alternative?  We
could:

- encourage packagers to install debug symbols by default (but they
might not; it might even be against policy), or
- invent our own system for generating backtraces and ignore what the
OS toolchain knows how to do (sounds painfully complex and expensive),
or
- just live with the fact that it's imperfect.

Is there a fourth option?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Rearranging ALTER TABLE to avoid multi-operations bugs

2019-05-29 Thread Robert Haas
On Wed, May 29, 2019 at 5:52 PM Tom Lane  wrote:
> Hm ... I'm not exactly clear on why that would be a superior solution.
> It would imply that standalone CREATE INDEX etc would call into the
> ALTER TABLE framework --- how is that not equally a layering violation?

Well, the framework could be renamed to something more general, I
suppose, but I don't see a *layering* concern.

From my point of view, the DDL code doesn't do a great job separating
parsing/parse analysis from optimization/execution.  The ALTER TABLE
stuff is actually pretty good in this regard.  But when you build
something that is basically a parse tree and pass it to some other
function that thinks that parse tree may well be coming straight from
the user, you are not doing a good job distinguishing between a
statement and an action which that statement may cause to be
performed.

> Also, recursive ProcessUtility cases exist independently of this issue,
> in particular in CreateSchemaCommand.  My worry about my patch upthread
> is not really that it introduces another one, but that it doesn't do
> anything towards providing a uniform framework/notation for all these
> cases.

I'm not really sure I understand this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Runtime Partition Pruning

2019-05-29 Thread Tom Lane
Robert Haas  writes:
> On Fri, May 24, 2019 at 12:09 PM Tom Lane  wrote:
>> Is it actually better?  The basic problem with backtrace() is that it
>> only knows about global functions, and so reports call sites in static
>> functions as if they were in whatever global function physically precedes
>> the static one.  I think doing materially better requires depending on
>> debug symbols, which (at least in the Red Hat world) aren't going to
>> be there in a typical production situation.

> I don't have an opinion on glibc vs. libunwind, but I don't understand
> this argument.  If you are unlucky enough to have a production server
> that is crashing due to some hitherto-unknown bug, and if it's not
> possible to get a good backtrace without installing debugging symbols,
> then you are going to have to pick between (1) installing those
> debugging symbols and (2) getting a poor backtrace.  I don't really
> see that as a problem so much as just the way life is.

Well, it *is* a problem.  The whole point of this discussion I think is
to try to get better information "by default" for routine bug reports.
So if those come from production servers without debug symbols, which
I believe will be the usual case, then it seems likely to me that
libunwind will produce no better results than glibc.  (But perhaps
I'm wrong about that --- I have not experimented with libunwind.)

Now it's true that "install debug symbols" is less of an ask than
"install debug symbols, *and* gdb, and make sure server core dumps are
enabled, and then go through this arcane manual procedure next time
you get a core dump".  But we shouldn't fool ourselves that it isn't
an ask that's going to be hard for people with corporate policies
against installing extra stuff on production servers.

regards, tom lane




Re: Rearranging ALTER TABLE to avoid multi-operations bugs

2019-05-29 Thread Tom Lane
Robert Haas  writes:
> On Sun, May 26, 2019 at 6:24 PM Tom Lane  wrote:
>> Anybody have thoughts about a different way to approach it?

> I mean, in an ideal world, I think we'd never call back out to
> ProcessUtility() from within AlterTable().  That seems like a pretty
> clear layering violation.  I assume the reason we've never tried to do
> better is a lack of round tuits and/or sufficient motivation.

> In terms of what we'd do instead, I suppose we'd try to move as much
> as possible inside the ALTER TABLE framework proper and have
> everything call into that.

Hm ... I'm not exactly clear on why that would be a superior solution.
It would imply that standalone CREATE INDEX etc would call into the
ALTER TABLE framework --- how is that not equally a layering violation?

Also, recursive ProcessUtility cases exist independently of this issue,
in particular in CreateSchemaCommand.  My worry about my patch upthread
is not really that it introduces another one, but that it doesn't do
anything towards providing a uniform framework/notation for all these
cases.

regards, tom lane




Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

2019-05-29 Thread Robert Haas
On Tue, May 28, 2019 at 11:27 AM Antonin Houska  wrote:
> We thought that it's more convenient for the administrator to enter a password. To
> achieve that, we can instead tell the admin how to use the key derivation tool
> (pg_keysetup) and send its output to postgres. So yes, it's possible to always
> receive the key from the "encryption key command". I'll change this.

Sounds good.  I'm not quite sure how this is going to work.  Ideally
you'd like the encryption key command to fetch the key from something
like ssh-agent, or maybe pop up a window on the user's terminal with a
key prompt.  Just reading from stdin and writing to stdout is not
going to be very convenient...

> Maintenance of a long patch series requires additional effort and I was busy
> enough with major code rearrangements so far. I'll continue splitting the
> stuff into more diffs.

Right, I understand.  Whatever can be split off can be reviewed and
committed separately, which will start to ease the burden of
maintaining the patch set.  I hope.  But getting over the initial hump
can be a lot of work.

> > +pg_upgrade. (Therefore the encrypted cluster does not support pg_upgrade 
> > with
> > +the --link option.)
> >
> > That seems pretty unfortunate.  Any way to do better?
>
> The reason is that the initialization vector (IV) for relation page encryption
> is derived from RelFileNode, and both dbNode and relNode can be different in
> the new cluster. Therefore the data of the old cluster need to be decrypted
> and encrypted using the new IV before it's copied to the new cluster. That's
> why we can't use symlink.
>
> As an alternative, I thought of deriving the IV by hashing the
> .. string, but that would mean that the
> "reencryption" is triggered by commands like "ALTER TABLE ... RENAME TO ...",
> "ALTER TABLE ... SET SCHEMA ...", etc.
>
> Currently I'm thinking of making the IV less dependent on RelFileNode
> (e.g. generate a random number for which RelFileNode serves as the seed) and
> storing it in pg_class as a storage parameter. Therefore we wouldn't have to
> pay any extra attention to transfer of the IV to the new cluster. However, the
> IV storage parameter would need special treatment:
>
> 1. The DBA should not be able to remove or change it using the ALTER TABLE
> command, because the only possible outcome is an inability to decrypt the data.
>
> 2. The \d+ command of psql client should not display the IV. Displaying it
> probably wouldn't be a security risk, but as we encrypt the whole instance,
> the IV would appear for every single table and that might be annoying.
>
> What do you think about this approach?

I don't think it's a very good idea.  For one thing, you can't store
the IV for pg_class in pg_class; that has a circularity problem.  For
another thing, I'm not sure we can or should try to do catalog access
from every place where an IV might be needed.  In fact, I'd go so far
as to say that sounds like a recipe for a lot of pain and fragility.

One potential approach is to modify pg_upgrade to preserve the dbnode
and relfilenode.  I don't know of any fundamental reason why such a
change shouldn't be possible, although it is probably a bunch of work,
and there may be problems with which I am not acquainted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Inconsistent error message wording for REINDEX CONCURRENTLY

2019-05-29 Thread Robert Haas
On Mon, May 27, 2019 at 4:02 AM Michael Paquier  wrote:
> On Mon, May 27, 2019 at 12:20:58AM -0400, Alvaro Herrera wrote:
> > I wonder if we really want to abolish all distinction between "cannot do
> > X" and "Y is not supported".  I take the former to mean that the
> > operation is impossible to do for some reason, while the latter means we
> > just haven't implemented it yet and it seems likely to get implemented
> in a reasonable timeframe.  See some excellent commentary about
> > the "can not" wording at
> > https://postgr.es/m/CA+TgmoYS8jKhETyhGYTYMcbvGPwYY=qa6yyp9b47mx7mwee...@mail.gmail.com
>
> Incorrect URL?

That's one of my messages that never made it through to the list.

Try 
http://postgr.es/m/ca+tgmoz0hzulgvlkf_lrtnydijic4nqd-epcdf_ngtmksfn...@mail.gmail.com

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: message style

2019-05-29 Thread Robert Haas
While reading another thread that attempted to link to this email, I
discovered that this email never made it to the list archives.  I am
trying again.

On Tue, Apr 30, 2019 at 12:05 PM Robert Haas  wrote:
>
> On Tue, Apr 30, 2019 at 10:58 AM Alvaro Herrera
>  wrote:
> > My problem here is not really the replacement of the name to %s, but the
> > "may not be" part of it.  We don't use "may not be" anywhere else; most
> > places seem to use "foo cannot be X" and a small number of other places
> > use "foo must not be Y".  I'm not able to judge which of the two is
> > better (so change all messages to use that form), or if there's a
> > semantic difference and if so which one to use in this case.
>
> The message style guidelines specifically discourage the use of "may",
> IMHO for good reason.  "mumble may not be flidgy" could be trying to
> tell you that something is impermissible, as in "the children may not
> run in the house."  But it could also be warning you that something is
> doubtful, as in "the children may not receive Easter candy this year
> because there is a worldwide chocolate shortage."  Sometimes the same
> sentence can be read either way, like "this table may not be
> truncated," which can mean either that TRUNCATE is going to fail if
> run in the future, or that it is unclear whether TRUNCATE was already
> run in the past.
>
> As for "cannot" versus "must not", that is murkier, but it looks to me as
> though we prefer "cannot" to "must not" about 9:1, so most often
> "cannot" is the right thing, but not always.  The distinction seems to
> be that we use "cannot" to talk about things that we are unwilling or
> unable to do in the future, whereas "must not" is used to admonish the
> user about what has already taken place.  Consider:
>
> array must not contain nulls
> header key must not contain newlines
> cast function must not return a set
> interval time zone \"%s\" must not include months or days
> function \"%s\" must not be SECURITY DEFINER
>
> vs.
>
> cannot drop %s because %s requires it
> cannot PREPARE a transaction that has manipulated logical replication workers
> cannot reindex temporary tables of other sessions
> cannot inherit from partitioned table \"%s\"
>
> The first set of messages are essentially complaints about the past.
> The array shouldn't have contained nulls, but it did!  The header key
> should not have contained newlines, but it did!  The cast function
> should not return a set, but it does!  Hence, we are sad and are
> throwing an error.  The second set are statements that we are
> unwilling or unable to proceed, but they don't necessarily carry the
> connotation that there is a problem already in the past.  You've just
> asked for something you are not going to get.
>
> I think that principle still leaves some ambiguity.  For example, you
> could phrase the second of the "cannot" messages as "must not try to
> PREPARE a transaction that has manipulated logical replication
> workers."  That's grammatical and everything, but it sounds a bit
> accusatory, like the user is in trouble or something.  I think that's
> probably why we tend to prefer "cannot" in most cases.  But sometimes
> that would lead to a longer or less informative message.  For example,
> you can't just change
>
> function \"%s\" must not be SECURITY DEFINER
>
> to
>
> function \"%s\" can not be SECURITY DEFINER
>
> ...because the user will rightly respond "well, I guess it can,
> because it is."  We could say
>
> can not execute security definer functions from PL/Tcl
>
> ...but that sucks because we now have no reasonable place to put the
> function name.  We could say
>
> can not execute security definer function \"%s\" from PL/Tcl
>
> ...but that also sucks because now the message only says that this one
> particular security definer function cannot be executed, rather than
> saying that ALL security definer functions cannot be executed.   To
> really get there, you'd have to do something like
>
> function "\%s" cannot be executed by PL/Tcl because it is a security
> definer function
>
> ...which is fine, but kinda long.  On the plus side it's more clear
> about the source of the error (PL/Tcl) than the current message which
> doesn't state that explicitly, so perhaps it's an improvement anyway,
> but the general point is that sometimes I think there is no succinct
> way of expressing the complaint clearly without using "must not".
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company



-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: some questions about fast-path-lock

2019-05-29 Thread Robert Haas
On Mon, May 27, 2019 at 2:01 AM Alex  wrote:
> I got some ideas from the README under storage/lmgr and read some code of
> LockAcquireExtended, but I still have some questions now.
>
> LWLockAcquire(&MyProc->backendLock, LW_EXCLUSIVE);
> if (FastPathStrongRelationLocks->count[fasthashcode] != 0)
> acquired = false;
> else
>  acquired = FastPathGrantRelationLock(locktag->locktag_field2,
> lockmode);
>
> 1.  In the README,   it says:  "A key point of this algorithm is that it must 
> be possible to verify the
> absence of possibly conflicting locks without fighting over a shared LWLock or
> spinlock.  Otherwise, this effort would simply move the contention bottleneck
> from one place to another."
>
> but in the code above there is an LWLockAcquire call.  Actually I can't
> figure out how we could proceed without a lock.

The per-backend lock is not heavily contended, because under normal
circumstances it is only accessed by a single backend.  If there is a
potential lock conflict that must be analyzed then another backend may
acquire it and that might lead to a little bit of contention, but it
happens quite rarely -- so the overall contention is still much less
than if everyone is fighting over the lock manager partition locks.

> 2. Why does MyProc->backendLock work?  It is MyProc, not a global lock.

It's still an LWLock.  Putting it inside of MyProc doesn't make it
magically stop working.  MyProc is in shared memory, not backend-local
memory, if that's what you are confused about.

> 3. For the line "acquired = FastPathGrantRelationLock(locktag->locktag_field2,
> lockmode);", I think it should be replaceable with "acquired = true"
> (but obviously I'm wrong).  I read "FastPathGrantRelationLock" but can't
> understand it.

It can't say 'acquired = true' because each backend can only acquire a
maximum of 16 relation locks via the fast-path mechanism.  If a
process acquires more than 16 relation locks, at least some of them
will have to be acquired without benefit of the fast-path.  This value
could be changed by changing the value of the constant
FP_LOCK_SLOTS_PER_BACKEND, but since we scan the array linearly,
making it too big will lead to other problems.  I don't quite
understand what about FastPathGrantRelationLock you don't understand -
it's a pretty straightforwardly-coded search for either (a) an
existing fastpath slot for the specified relid or failing that (b) an
unused fastpath slot.
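
In outline, and ignoring the real FAST_PATH_* bit-twiddling macros, the search
looks like this (a simplified, self-contained paraphrase -- the names below
are illustrative, not the actual lock.c code):

    #include <stdbool.h>

    #define FP_LOCK_SLOTS_PER_BACKEND 16

    typedef struct FastPathSlots
    {
        unsigned int relid[FP_LOCK_SLOTS_PER_BACKEND];
        unsigned int modebits[FP_LOCK_SLOTS_PER_BACKEND];  /* 0 => slot unused */
    } FastPathSlots;

    static bool
    fastpath_grant(FastPathSlots *fp, unsigned int relid, unsigned int modebit)
    {
        int     f;
        int     unused_slot = FP_LOCK_SLOTS_PER_BACKEND;

        for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
        {
            if (fp->modebits[f] == 0)
                unused_slot = f;                /* remember an empty slot */
            else if (fp->relid[f] == relid)
            {
                fp->modebits[f] |= modebit;     /* existing entry: add this mode */
                return true;
            }
        }
        if (unused_slot < FP_LOCK_SLOTS_PER_BACKEND)
        {
            fp->relid[unused_slot] = relid;     /* claim the remembered empty slot */
            fp->modebits[unused_slot] = modebit;
            return true;
        }
        return false;                           /* array full: take the slow path */
    }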

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Names

2019-05-29 Thread Robert Haas
On Tue, May 28, 2019 at 12:35 AM Sascha Kuhl  wrote:
> Are there teams behind the names or does everybody write with their personal 
> name?

I think if you spend some time reading the mailing list, you'll be
able to figure out the answer to this question and many others you
might have.  People are generally pretty clear in their messages
whether or not they collaborated with others on the work which they
are presenting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Why does SPI_connect change the memory context?

2019-05-29 Thread Jeff Davis
SPI_connect() changes the memory context to a newly-created one, and
then SPI_finish() restores it. That seems a bit dangerous because the
caller might not be expecting it. Is there a reason it doesn't just
change to the new memory context as-needed?

spi.c:161:

   /* ... and switch to procedure's context */
_SPI_current->savedcxt = MemoryContextSwitchTo(_SPI_current->procCxt);

Regards,
Jeff Davis






Re: [HACKERS] Runtime Partition Pruning

2019-05-29 Thread Robert Haas
On Fri, May 24, 2019 at 12:09 PM Tom Lane  wrote:
> Is it actually better?  The basic problem with backtrace() is that it
> only knows about global functions, and so reports call sites in static
> functions as if they were in whatever global function physically precedes
> the static one.  I think doing materially better requires depending on
> debug symbols, which (at least in the Red Hat world) aren't going to
> be there in a typical production situation.

I don't have an opinion on glibc vs. libunwind, but I don't understand
this argument.  If you are unlucky enough to have a production server
that is crashing due to some hitherto-unknown bug, and if it's not
possible to get a good backtrace without installing debugging symbols,
then you are going to have to pick between (1) installing those
debugging symbols and (2) getting a poor backtrace.  I don't really
see that as a problem so much as just the way life is.  You can't
expect to get good debugging output without debugging symbols, just as
you can't expect to get a clean audit without bank statements or a
clear idea of how to find your way to an unknown destination without a
map.  If you don't have the thing that contains the information that
is needed, you can't get the information; c'est la vie.
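
(For reference, the glibc facility being discussed is only a couple of calls;
a minimal sketch, assuming glibc's <execinfo.h> is available:

    #include <execinfo.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void
    print_stack(void)
    {
        void   *frames[32];
        int     nframes = backtrace(frames, 32);
        char  **names = backtrace_symbols(frames, nframes);

        if (names == NULL)
            return;
        for (int i = 0; i < nframes; i++)
            fprintf(stderr, "%s\n", names[i]);
        free(names);
    }

Without debug symbols, backtrace_symbols() can only resolve exported
functions, which is exactly the static-function misattribution Tom describes
above.)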

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Rearranging ALTER TABLE to avoid multi-operations bugs

2019-05-29 Thread Robert Haas
On Sun, May 26, 2019 at 6:24 PM Tom Lane  wrote:
> Anybody have thoughts about a different way to approach it?

I mean, in an ideal world, I think we'd never call back out to
ProcessUtility() from within AlterTable().  That seems like a pretty
clear layering violation.  I assume the reason we've never tried to do
better is a lack of round tuits and/or sufficient motivation.

In terms of what we'd do instead, I suppose we'd try to move as much
as possible inside the ALTER TABLE framework proper and have
everything call into that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Different row estimations on base rels

2019-05-29 Thread Donald Dong
On May 29, 2019, at 1:36 PM, Robert Haas  wrote:
> Well, it's all there in the code.  I believe the issue is that the
> final estimates are based on the number of rows that will be returned
> from the relation, which is often less, and occasionally more, than
> the total of the rows in the relation.  The reason it's often less is
> because there might be a WHERE clause or similar which rules out some
> of the rows.  The reason it might be more is because a nested loop
> could return the same rows multiple times.

Yes, indeed. I was confused, and I guess I could've thought about it a
bit more before posting here. Thank you for answering this
question!

Regards,
Donald Dong




Re: [HACKERS] Unlogged tables cleanup

2019-05-29 Thread Robert Haas
On Sun, May 26, 2019 at 10:11 PM Michael Paquier  wrote:
> If we should do something about such cases, then this brings us back
> to do (INIT | CLEANUP) at the end of recovery anyway?

I believe that could fix problems #1 and #2, but we'd have to think
about what new issues it would introduce.

#3 seems like it needs a different fix.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Different row estimations on base rels

2019-05-29 Thread Robert Haas
On Sun, May 26, 2019 at 1:00 PM Donald Dong  wrote:
> I noticed the estimated rows of the base relations during the join
> search are *very* different from the estimations in the final plan.
>
> Join search (rows of the initial_rels):
> RELOPTINFO (ct): rows=1 width=4
> RELOPTINFO (it): rows=1 width=4
> RELOPTINFO (mc): rows=17567 width=32
> RELOPTINFO (mi_idx): rows=1380035 width=8
> RELOPTINFO (t): rows=2528356 width=25
>
> The final plan:
> Seq Scan on company_type ct
>   (cost=0.00..1.05 rows=1 width=4)
> Seq Scan on info_type it
>   (cost=0.00..2.41 rows=1 width=4)
> Parallel Seq Scan on movie_companies mc
>   (cost=0.00..37814.90 rows=7320 width=32)
> Parallel Seq Scan on movie_info_idx mi_idx
>   (cost=0.00..13685.15 rows=575015 width=8)
> Index Scan using title_pkey on title t
>   (cost=0.43..0.58 rows=1 width=25)
>
> By looking at the joinrel->rows, I would expect relation t to have
> the largest size; however, this is not true at all. I wonder what's
> causing this observation, and how to get estimations close to the
> final plan?

Well, it's all there in the code.  I believe the issue is that the
final estimates are based on the number of rows that will be returned
from the relation, which is often less, and occasionally more, than
the total of the rows in the relation.  The reason it's often less is
because there might be a WHERE clause or similar which rules out some
of the rows.  The reason it might be more is because a nested loop
could return the same rows multiple times.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Pinned files at Windows

2019-05-29 Thread Michael Paquier
On Mon, May 27, 2019 at 05:52:13PM +0300, Konstantin Knizhnik wrote:
> Postgres opens files with the FILE_SHARE_DELETE flag, which makes it
> possible to unlink an opened file.
> But unlike on Unixes, the file is not actually deleted. You can see it using
> the "dir" command.
> And the stat() function also doesn't return an error in this case:
> 
> https://stackoverflow.com/questions/27270374/deletefile-or-unlink-calls-succeed-but-doesnt-remove-file
> 
> So the first check in pgwin32_safestat (r < 0) is not working at all: stat()
> returns 0, but a subsequent call of GetFileAttributesEx
> returns 5 (ERROR_ACCESS_DENIED).

So you would basically hijack the result of GetFileAttributesEx() so
that any error returned by this function is reported as ENOENT.  Why
would that be a sane idea?  What if, say, a permission error or some
other error is legitimate?  If ENOENT is returned instead, as you
propose, the caller would be confused by an incorrect status.

As you mention, what we did as of 9951741 may not be completely right,
and the reason why it was done this way comes from here:
https://www.postgresql.org/message-id/20160712083220.1426.58...@wrigleys.postgresql.org

Could we instead come up with a reliable way to detect if a file is in
a deletion pending state?  Mapping blindly EACCES to ENOENT is not a
solution I think we can rely on (perhaps we could check only after
ERROR_ACCESS_DENIED using GetLastError() and map back to ENOENT in
that case, but this can still be triggered if a virus scanner holds the
file for read, no?).  stat() returning 0 for a file pending for
deletion which will go away physically once the handles still keeping
the file around are closed is not something I would have imagined is
sane, but that's what we need to deal with...  Windows has a long
history of keeping things compatible, sometimes in their own weird
way, and it seems that we have one here so I cannot imagine that this
behavior is going to change.
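
To be concrete about the parenthetical idea above, such a check in
pgwin32_safestat() could look roughly like this (an illustrative fragment
only, assuming the stat'd path is in "path" -- and this is precisely the
blind mapping I am not sure we can rely on):

    WIN32_FILE_ATTRIBUTE_DATA attr;

    if (!GetFileAttributesEx(path, GetFileExInfoStandard, &attr))
    {
        DWORD       err = GetLastError();

        if (err == ERROR_ACCESS_DENIED)
        {
            /* possibly a delete-pending file: report it as already gone */
            errno = ENOENT;
            return -1;
        }
        _dosmaperr(err);        /* normal Windows-to-errno mapping otherwise */
        return -1;
    }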

Looking around, I have found out about NtCreateFile() which could be
able to report a proper pending deletion status, still that's only
available in kernel mode.  Perhaps others have ideas?
--
Michael


signature.asc
Description: PGP signature


Re: Dead encoding conversion functions

2019-05-29 Thread Daniel Gustafsson
> On 29 May 2019, at 15:03, Tom Lane  wrote:
> 
> Pursuant to today's discussion at PGCon about code coverage, I went
> nosing into some of the particularly under-covered subdirectories
> in our tree,

On a similar, but much less important/interesting note.  I fat-fingered when
compiling isolationtester on the plane over here and happened to compile
src/test/examples, and in there testlo.c and testlo64.c have two dead functions
for which the callsites have been commented out since the Postgres95 import
(and now cause a warning).  Is there any (historic?) reason to keep that code?
It also seems kind of broken as it doesn’t really handle the open() call
failure very well.

cheers ./daniel



Dead encoding conversion functions

2019-05-29 Thread Tom Lane
Pursuant to today's discussion at PGCon about code coverage, I went
nosing into some of the particularly under-covered subdirectories
in our tree, and immediately tripped over an interesting factoid:
the ASCII<->MIC and ASCII<->UTF8 encoding conversion functions are
untested ... not because the regression tests don't try, but because
those conversions are unreachable.  pg_do_encoding_conversion() and
its sister functions have hard-wired fast paths for any conversion
in which the source or target encoding is SQL_ASCII, so that an
encoding conversion function declared for such a case will never
be used.

(The coverage results do show ascii_to_utf8 as being covered, but
that's just because alter_table.sql randomly chose to test
ALTER CONVERSION using a user-defined conversion from SQL_ASCII
to UTF8, rather than any other case.  CreateConversionCommand()
will invoke the specified function on an empty string just to see
if it works, so that's where that "coverage" comes from.)

This situation seems kinda silly.  My inclination is to delete
these functions as useless, but I suppose another approach is
to suppress the fast paths if there's a declared conversion function.
(Doing so would likely require added catalog lookups in places we
might not want them...)

If we do delete them as useless, it might also be advisable to change
CreateConversionCommand() to refuse creation of conversions to/from
SQL_ASCII, to prevent future confusion.

Thoughts?

regards, tom lane




Re: Quick doc typo fix

2019-05-29 Thread Guillaume Lelarge
On Wed, May 29, 2019 at 19:52, Michael Paquier wrote:

> On Wed, May 29, 2019 at 07:47:12PM +0200, Guillaume Lelarge wrote:
> > Yeah, I still have quite a lot to process. It might be better to do it
> > all at once.
>
> OK, thanks!  Could you ping me on this thread once you think you are
> done?
>

Sure.


Re: Quick doc typo fix

2019-05-29 Thread Michael Paquier
On Wed, May 29, 2019 at 07:47:12PM +0200, Guillaume Lelarge wrote:
> Yeah, I still have quite a lot to process. It might be better to do it
> all at once.

OK, thanks!  Could you ping me on this thread once you think you are
done?
--
Michael


signature.asc
Description: PGP signature


Re: Indexing - comparison of tree structures

2019-05-29 Thread Michael Paquier
On Tue, May 28, 2019 at 11:37:54AM -0700, Andres Freund wrote:
> 1) Please respect the list style of properly quoting responses inline,
>and only responding to messages that are somewhat related to the
>previous content
> 2) You ask a lot of question, without actually responding to responses
> 3) Please do some of your own research, before asking
>questions. E.g. there's documentation about our btree implementation
>etc in our source tree.

In this case, you may find the various README in the code to be
of interest.  All index access methods are located in
src/backend/access/, and nbtree/README includes documentation for
btree indexes.
--
Michael


signature.asc
Description: PGP signature


Re: Quick doc typo fix

2019-05-29 Thread Guillaume Lelarge
On Wed, May 29, 2019 at 19:45, Michael Paquier wrote:

> On Wed, May 29, 2019 at 05:30:33PM +0200, Guillaume Lelarge wrote:
> > And here is another one. See patch attached.
>
> Are you still going through some parts of the documentation?  Perhaps
> you may notice something else?  I am wondering if it would be better
> to wait a bit more so that we can group all the issues you are finding at
> once.
>

Yeah, I still have quite a lot to process. It might be better to do it
all at once.


Re: Quick doc typo fix

2019-05-29 Thread Michael Paquier
On Wed, May 29, 2019 at 05:30:33PM +0200, Guillaume Lelarge wrote:
> And here is another one. See patch attached.

Are you still going through some parts of the documentation?  Perhaps
you may notice something else?  I am wondering if it would be better
to wait a bit more so that we can group all the issues you are finding at
once.
--
Michael


signature.asc
Description: PGP signature


Re: [HACKERS] Built-in plugin for logical decoding output

2019-05-29 Thread Dave Cramer
Reviving this thread.



On Tue, 26 Sep 2017 at 13:57, Henry  wrote:

> It seems test_decoding.c could be easily changed to support JSON by using
> the built in PostgreSQL functions (json.c composite_to_json) to convert a
> Datum into SQL. Its use of OidOutputFunctionCall could be modified to emit
> arrays and composite types as JSON. This might be enough to enable
> downstream frameworks to parse (without having to code to the terse and
> positional composite structure format).
>
> It could be a minimal change to have in core using the built in JSON
> support with no additional libraries. I have not made changes to this code
> but it seems like it should work.
>
> Thank you,
> Henry
>
> On Tue, Sep 26, 2017 at 9:37 AM Alvaro Hernandez  wrote:
>
>>
>>
>> On 26/09/17 17:50, Craig Ringer wrote:
>>
>> On 26 September 2017 at 22:14, Magnus Hagander 
>> wrote:
>>
>>>
>>>
>>> On Tue, Sep 26, 2017 at 2:16 PM, Alvaro Hernandez 
>>> wrote:
>>>



 But what about earlier versions? Any chance it could be backported
 down to 9.4? If that would be acceptable, I could probably help/do that...
>>>
>>>
>>> The likelihood is zero if you mean backported into core of earlier
>>> versions.
>>>
>>
>> Right. We don't add features to back branches.
>>
>>
>> Yeah, I know the policy. But asking is free ;) and in my opinion this
>> would be a very good reason to have an exception, if there would be a clear
>> desire to have a single, unified, production quality output plugin across
>> all PostgreSQL versions. At least I tried ;)
>>
>>
>>
>>
>>
>>>
>>> If you mean backported as a standalone extension that could be installed
>>> on a previous version, probably. I'm not sure if it relies on any internals
>>> not present before that would make it harder, but it would probably at
>>> least be possible.
>>>
>>>
>> All the pub/sub stuff is new and hooked into syscache etc. So you'd be
>> doing a bunch of emulation/shims using user catalogs. Not impossible, but
>> probably irritating and verbose. And you'd have none of the DDL required to
>> manage it, so you'd need SQL-function equivalents.
>>
>> I suspect you'd be better off tweaking pglogical to speak the same
>> protocol as pg10, since the pgoutput protocol is an evolution of
>> pglogical's protocol. Then using pglogical on older versions.
>>
>>
>>
>> Given all this, if I were doing an app based on logical decoding,
>> I think I would stick to test_decoding for <10
>>
>>
>> Thanks,
>>
>>
>> Álvaro
>>
>>
>> --
>>
>> Alvaro Hernandez
>>
>>
>> ---
>> OnGres
>>
>>

I believe there is a valid reason for providing a reasonably feature-complete
plugin in core. Specifically, in settings such as cloud providers, where the
user does not control what is installed on the server, it would be useful to
have a decent output plugin.

Having had a cursory look at pgoutput, I see no reason why it could not be
used as a general-purpose output plugin.

One thing that would be nice is to remove the requirement for a publication,
as creating a publication on all tables requires a superuser.
I'm also curious why pgoutput does not send attributes in binary? This
seems like a rather small change that should provide some significant
performance benefits.


Dave Cramer

da...@postgresintl.com
www.postgresintl.com


Re: [HACKERS] [PATCH] Generic type subscripting

2019-05-29 Thread Pavel Stehule
On Wed, May 29, 2019 at 17:49, Dmitry Dolgov <9erthali...@gmail.com>
wrote:

> Rebase after pg_indent. Besides, off the list there was a suggestion that
> this
> could be useful to accept more than one data type as a key for
> subscripting.
> E.g. for jsonb it probably makes sense to understand both a simple key
> name and
> jsonpath:
>
> jsonb['a'] and jsonb['$.a']
>
> While implementing it is probably technically relatively straightforward, I
> wonder if there is any opinion about how valuable it could be and what it
> should look like from the syntax point of view (since I believe a user needs
> to specify which type needs to be used).
>

It is a difficult decision - the possibility of using jsonpath looks great,
but the necessity to cast every time is not friendly.

Probably there can be a preferred type when the subscript is of unknown type.
There could be rules similar to those for function parameters.

so jsonb['a'] -- key
jsonb['$.a'] -- key
jsonb['$.a'::jsonpath] -- jsonpath

but it can be a source of bad issues - so I think we don't need this feature
at the moment. It can be implemented later, I think.

Regards

Pavel


Re: Index Skip Scan

2019-05-29 Thread Dmitry Dolgov
> On Sat, May 11, 2019 at 6:35 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> Here is the updated version with the changes I was talking about (mostly about
> readability and code cleanup). I've also added a few tests for cursor
> behaviour.

And one more cosmetic rebase after pg_indent.


v16-0001-Index-skip-scan.patch
Description: Binary data


Re: Dead stores in src/common/sha2.c

2019-05-29 Thread Hamlin, Garick L
On Wed, May 29, 2019 at 11:01:05AM -0400, Tom Lane wrote:
> Michael Paquier  writes:
> > On Wed, May 29, 2019 at 01:24:19PM +, Hamlin, Garick L wrote:
> >> I ran clang checker and noticed these.   It looks like the 
> >> sha2 implementation is trying to zero out state on exit, but
> >> clang checker finds at least 'a' is a dead store.  
> >> 
> >> Should we fix this?
> >> Is something like the attached sensible?
> >> Is there a common/better approach to zero-out in PG ?
> 
> > This code comes from the SHA-2 implementation of OpenBSD, so it is not
> > adapted to directly touch it.  What's the current state of this code
> > in upstream?  Should we perhaps try to sync with the upstream
> > implementation instead?
> > After a quick search I am not seeing that this area has actually
> > changed:
> > http://fxr.watson.org/fxr/source/crypto/sha2.c?v=OPENBSD
> 
> Hm ... plastering "volatile"s all over it isn't good for readability
> or for quality of the generated code.  (In particular, I'm worried
> about this patch causing all those variables to be forced into memory
> instead of registers.)

Yeah, I don't actually think it's a great approach, which is why I
was wondering whether PG had a preferred approach.  I figured it was
the clearest way to start the discussion, especially since I wasn't
sure whether people would want to fix it.

> At the same time, I'm not sure if we should just write this off as an
> ignorable warning.  If the C compiler concludes these are dead stores
> it'll probably optimize them away, leading to not accomplishing the
> goal of wiping the values.

Yeah, I mean it's odd to put code in to zero/hide state knowing it's
probably optimized out.  

We could also take it out, but maybe it does help somewhere?

... or put in a comment that says: This probably gets optimized away, but
we don't consider it much of a risk.

> On the third hand, that goal may not be worth much, particularly not
> if the variables do get kept in registers.

I haven't looked at the asm.  
Maybe they are in registers...

Garick




Re: Quick doc typo fix

2019-05-29 Thread Michael Paquier
On Tue, May 28, 2019 at 09:46:48PM +0200, Guillaume Lelarge wrote:
> Hehe, that was the first thing I wrote :) but went with "table and index"
> as it was also used a bit later in the chapter. Both are fine with me.

Okay, done this way.  Thanks for the report.
--
Michael


signature.asc
Description: PGP signature


Re: Quick doc typo fix

2019-05-29 Thread Guillaume Lelarge
On Tue, May 28, 2019 at 21:46, Guillaume Lelarge
wrote:

> On Tue, May 28, 2019 at 21:28, Michael Paquier
> wrote:
>
>> On Tue, May 28, 2019 at 05:05:10PM +0200, Guillaume Lelarge wrote:
>> >   
>> >> linkend="catalog-pg-am">pg_am
>> > -  index access methods
>> > +  table and index access methods
>> >   
>>
>> Perhaps we could just say "relation" here?  That's the term used in
>> the paragraph describing pg_am.
>>
>
> Hehe, that was the first thing I wrote :) but went with "table and index"
> as it was also used a bit later in the chapter. Both are fine with me.
>
>
And here is another one. See patch attached.


-- 
Guillaume.
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index 2413089ffe..4c34ad9cc9 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -174,7 +174,7 @@
  
 
  
-  COPY's FORCE_QUOTE options is
+  COPY's FORCE_QUOTE option is
   currently not supported by file_fdw.
  
 


Re: docs about FKs referencing partitioned tables

2019-05-29 Thread Michael Paquier
On Wed, May 29, 2019 at 02:33:26PM +0900, Amit Langote wrote:
> But a couple of other points mentioned in 5.11.3.3. Caveats (of 5.11. Table
> Partitioning) are already repeated in 5.10.1. Caveats; for example, note
> the point about VACUUM, ANALYZE, INSERT ON CONFLICT, etc. applying to
> single tables.  So, perhaps it won't hurt to repeat the caveat about
> indexes and foreign keys too.

OK, committed as such.  Your patch linked to the top of the
inheritance section, so I redirected that to the actual section about
caveats for clarity.
--
Michael


signature.asc
Description: PGP signature


Re: accounting for memory used for BufFile during hash joins

2019-05-29 Thread Tomas Vondra

On Tue, May 28, 2019 at 03:40:01PM +0800, Hubert Zhang wrote:

On Sat, May 4, 2019 at 8:34 AM Tomas Vondra 
wrote:

Hi Tomas

I read your second patch which uses overflow buf files to reduce the total
number of batches.
It would solve the hash join OOM problem that you discussed above: 8KB per
batch leads to the batch-bloating problem.

I mentioned in another thread:

https://www.postgresql.org/message-id/flat/CAB0yrekv%3D6_T_eUe2kOEvWUMwufcvfd15SFmCABtYFOkxCFdfA%40mail.gmail.com
There is another hash join OOM problem, caused by disabling batch splitting
too early. PG uses a flag hashtable->growEnable to determine whether to split
batches. Once one split has failed (all the tuples are assigned to only one
of the two resulting batches), the growEnable flag is turned off forever.

This is the opposite side of the batch-bloating problem: there are too few
batches, which makes the in-memory hash table too large to fit into memory.



Yes. There are definitely multiple separate issues in the hash join code,
and the various improvements discussed in this (and other) thread usually
address just a subset of them. We need to figure out how to combine them
or maybe devise some more generic solution.

So I think we need to take a step back, and figure out how to combine
these improvements - otherwise we might commit a fix for one issue, making
it much harder/impossible to improve the other issues.

The other important question is whether we see these cases as outliers
(and the solutions as last-resort-attempt-to-survive kind of fix) or more
widely applicable optimizations. I've seen some interesting speedups with
the overflow-batches patch, but my feeling is we should really treat it as
a last resort to survive.


I had a chat about this with Thomas Munro yesterday. Unfortunately, some
beer was involved but I do vaguely remember he more or less convinced me
the BNL (block nested loop join) might be the right approach here. We
don't have any patch for that yet, though :-(


Here is the tradeoff: each batch takes more than 8KB (8KB makes sense for
performance reasons), the in-memory hash table takes memory as well, and
splitting batches may (but need not) reduce the in-memory hash table size
while introducing more batches (and thus more memory usage, 8KB * #batches).
Can we conclude that splitting would be worthwhile if the following holds:
(the reduced memory of the in-memory hash table) - (8KB * number of new
batches) > 0



Something like that, yes.


So I'm considering combining our patch with your patch to fix the join OOM
problem, no matter whether the OOM is introduced by the memory usage of the
in-memory hash table or by 8KB * number of batches.

nbatch_inmemory in your patch could also be redefined using the rule above.

What's your opinion?



One of the issues with my "overflow batches" patch, pointed out to me by
Thomas yesterday, is that it only works with non-parallel hash join. And
we don't know how to make it work in the parallel mode :-(


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Dead stores in src/common/sha2.c

2019-05-29 Thread Tom Lane
Michael Paquier  writes:
> On Wed, May 29, 2019 at 01:24:19PM +, Hamlin, Garick L wrote:
>> I ran clang checker and noticed these.   It looks like the 
>> sha2 implementation is trying to zero out state on exit, but
>> clang checker finds at least 'a' is a dead store.  
>> 
>> Should we fix this?
>> Is something like the attached sensible?
>> Is there a common/better approach to zero-out in PG ?

> This code comes from the SHA-2 implementation of OpenBSD, so it is not
> adapted to directly touch it.  What's the current state of this code
> in upstream?  Should we perhaps try to sync with the upstream
> implementation instead?
> After a quick search I am not seeing that this area has actually
> changed:
> http://fxr.watson.org/fxr/source/crypto/sha2.c?v=OPENBSD

Hm ... plastering "volatile"s all over it isn't good for readability
or for quality of the generated code.  (In particular, I'm worried
about this patch causing all those variables to be forced into memory
instead of registers.)

At the same time, I'm not sure if we should just write this off as an
ignorable warning.  If the C compiler concludes these are dead stores
it'll probably optimize them away, leading to not accomplishing the
goal of wiping the values.

On the third hand, that goal may not be worth much, particularly not
if the variables do get kept in registers.

regards, tom lane




Re: Dead stores in src/common/sha2.c

2019-05-29 Thread Michael Paquier
On Wed, May 29, 2019 at 01:24:19PM +, Hamlin, Garick L wrote:
> I ran clang checker and noticed these.   It looks like the 
> sha2 implementation is trying to zero out state on exit, but
> clang checker finds at least 'a' is a dead store.  
> 
> Should we fix this?
> Is something like the attached sensible?
> Is there a common/better approach to zero-out in PG ?

This code comes from the SHA-2 implementation of OpenBSD, so it is not
adapted to directly touch it.  What's the current state of this code
in upstream?  Should we perhaps try to sync with the upstream
implementation instead?

After a quick search I am not seeing that this area has actually
changed:
http://fxr.watson.org/fxr/source/crypto/sha2.c?v=OPENBSD
--
Michael


signature.asc
Description: PGP signature


Dead stores in src/common/sha2.c

2019-05-29 Thread Hamlin, Garick L
I ran clang checker and noticed these.   It looks like the 
sha2 implementation is trying to zero out state on exit, but
clang checker finds at least 'a' is a dead store.  

Should we fix this?
Is something like the attached sensible?
Is there a common/better approach to zero-out in PG ?

Garick
diff --git a/src/common/sha2.c b/src/common/sha2.c
index ae0a1a3..9c19ef2 100644
--- a/src/common/sha2.c
+++ b/src/common/sha2.c
@@ -457,7 +457,16 @@ SHA256_Transform(pg_sha256_ctx *context, const uint8 *data)
context->state[7] += h;
 
/* Clean up */
-   a = b = c = d = e = f = g = h = T1 = T2 = 0;
+   *((volatile uint32 *) &a) = 0;
+   *((volatile uint32 *) &b) = 0;
+   *((volatile uint32 *) &c) = 0;
+   *((volatile uint32 *) &d) = 0;
+   *((volatile uint32 *) &e) = 0;
+   *((volatile uint32 *) &f) = 0;
+   *((volatile uint32 *) &g) = 0;
+   *((volatile uint32 *) &h) = 0;
+   *((volatile uint32 *) &T1) = 0;
+   *((volatile uint32 *) &T2) = 0;
 }
 #endif /* SHA2_UNROLL_TRANSFORM */
 
@@ -693,7 +702,15 @@ SHA512_Transform(pg_sha512_ctx *context, const uint8 *data)
context->state[7] += h;
 
/* Clean up */
-   a = b = c = d = e = f = g = h = T1 = 0;
+   *((volatile uint32 *) &a) = 0;
+   *((volatile uint32 *) &b) = 0;
+   *((volatile uint32 *) &c) = 0;
+   *((volatile uint32 *) &d) = 0;
+   *((volatile uint32 *) &e) = 0;
+   *((volatile uint32 *) &f) = 0;
+   *((volatile uint32 *) &g) = 0;
+   *((volatile uint32 *) &h) = 0;
+   *((volatile uint32 *) &T1) = 0;
 }
 #else  /* SHA2_UNROLL_TRANSFORM */
 
@@ -783,7 +800,16 @@ SHA512_Transform(pg_sha512_ctx *context, const uint8 *data)
context->state[7] += h;
 
/* Clean up */
-   a = b = c = d = e = f = g = h = T1 = T2 = 0;
+   *((volatile uint32 *) &a) = 0;
+   *((volatile uint32 *) &b) = 0;
+   *((volatile uint32 *) &c) = 0;
+   *((volatile uint32 *) &d) = 0;
+   *((volatile uint32 *) &e) = 0;
+   *((volatile uint32 *) &f) = 0;
+   *((volatile uint32 *) &g) = 0;
+   *((volatile uint32 *) &h) = 0;
+   *((volatile uint32 *) &T1) = 0;
+   *((volatile uint32 *) &T2) = 0;
 }
 #endif /* SHA2_UNROLL_TRANSFORM */
 


Doc fix on information_schema.views

2019-05-29 Thread Gilles Darold

Hi,


Seems that per the documentation on information_schema.views [1] we do 
not support check_options ("Applies to a feature not available in 
PostgreSQL").



Attached is a patch that fixes this description. As CHECK OPTION has been 
supported since 9.4, the patch can be applied to all versions since 9.4.



[1] https://www.postgresql.org/docs/current/infoschema-views.html

--
Gilles Darold
http://www.darold.net/

diff --git a/doc/src/sgml/information_schema.sgml b/doc/src/sgml/information_schema.sgml
index 234a3bb6d1..d8c7902ff8 100644
--- a/doc/src/sgml/information_schema.sgml
+++ b/doc/src/sgml/information_schema.sgml
@@ -6737,7 +6737,7 @@ ORDER BY c.ordinal_position;
  
   check_option
   character_data
-  Applies to a feature not available in PostgreSQL
+  CASCADED or LOCAL if a check option is defined, NONE if not
  
 
  


Server crash due to assertion failure in CheckOpSlotCompatibility()

2019-05-29 Thread Ashutosh Sharma
Hi All,

I'm getting a server crash when executing the following test-case:

create table t1(a int primary key, b text);
insert into t1 values (1, 'aa'), (2, 'bb'), (3, 'aa'), (4, 'bb');
select a, b, array_agg(a order by a) from t1 group by grouping sets ((a),
(b));

*Backtrace:*
#0  0x7f37d0630277 in raise () from /lib64/libc.so.6
#1  0x7f37d0631968 in abort () from /lib64/libc.so.6
#2  0x00a5685e in ExceptionalCondition (conditionName=0xc29fd0
"!(op->d.fetch.kind == slot->tts_ops)", errorType=0xc29cc1
"FailedAssertion",
fileName=0xc29d09 "execExprInterp.c", lineNumber=1905) at assert.c:54
#3  0x006dfa2b in CheckOpSlotCompatibility (op=0x2e84e38,
slot=0x2e6e268) at execExprInterp.c:1905
#4  0x006dd446 in ExecInterpExpr (state=0x2e84da0,
econtext=0x2e6d8e8, isnull=0x7ffe53cba4af) at execExprInterp.c:439
#5  0x007010e5 in ExecEvalExprSwitchContext (state=0x2e84da0,
econtext=0x2e6d8e8, isNull=0x7ffe53cba4af)
at ../../../src/include/executor/executor.h:307
#6  0x00701be7 in advance_aggregates (aggstate=0x2e6d6b0) at
nodeAgg.c:679
#7  0x00703a5d in agg_retrieve_direct (aggstate=0x2e6d6b0) at
nodeAgg.c:1847
#8  0x007034da in ExecAgg (pstate=0x2e6d6b0) at nodeAgg.c:1572
#9  0x006e797f in ExecProcNode (node=0x2e6d6b0) at
../../../src/include/executor/executor.h:239
#10 0x006ea174 in ExecutePlan (estate=0x2e6d458,
planstate=0x2e6d6b0, use_parallel_mode=false, operation=CMD_SELECT,
sendTuples=true,
numberTuples=0, direction=ForwardScanDirection, dest=0x2e76b30,
execute_once=true) at execMain.c:1648
#11 0x006e7f91 in standard_ExecutorRun (queryDesc=0x2e7b3b8,
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:365
#12 0x006e7dc7 in ExecutorRun (queryDesc=0x2e7b3b8,
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:309
#13 0x008e40c7 in PortalRunSelect (portal=0x2e10bc8, forward=true,
count=0, dest=0x2e76b30) at pquery.c:929
#14 0x008e3d66 in PortalRun (portal=0x2e10bc8,
count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x2e76b30,
altdest=0x2e76b30,
completionTag=0x7ffe53cba850 "") at pquery.c:770

The following Assert statement in *CheckOpSlotCompatibility*() fails.

1905    Assert(op->d.fetch.kind == slot->tts_ops);

And the above assert statement was added by you as part of the following git
commit.

commit 15d8f83128e15de97de61430d0b9569f5ebecc26
Author: Andres Freund 
Date:   Thu Nov 15 22:00:30 2018 -0800

Verify that expected slot types match returned slot types.

This is important so JIT compilation knows what kind of tuple slot the
deforming routine can expect. There's also optimization potential for
expression initialization without JIT compilation. It e.g. seems
plausible to elide EEOP_*_FETCHSOME ops entirely when dealing with
virtual slots.

Author: Andres Freund

*Analysis:*
I did some quick investigation on this and found that when the aggregate is
performed on the first group, i.e. group by 'a', all the input tuples are
fetched from the outer plan and stored into the tuplesort object, and for
the subsequent groups, i.e. from the second group onwards, the tuples stored
in the tuplesort object during the first phase are used. But the tuples
stored in the tuplesort object are actually minimal tuples, whereas a heap
tuple is expected, which results in the assertion failure.

I might be wrong, but it seems to me like the slot fetched from the tuplesort
object needs to be converted to a heap tuple. Actually, the following
lines of code in agg_retrieve_direct() get executed only when we have
crossed a group boundary. I think, at least, the function call to
ExecCopySlotHeapTuple(outerslot); followed by ExecForceStoreHeapTuple();
should always happen irrespective of whether the group boundary is crossed or
not... Sorry if I'm saying something ...

    /*
     * If we are grouping, check whether we've crossed a group
     * boundary.
     */
    if (node->aggstrategy != AGG_PLAIN)
    {
        tmpcontext->ecxt_innertuple = firstSlot;
        if (!ExecQual(aggstate->phase->eqfunctions[node->numCols - 1],
                      tmpcontext))
        {
            aggstate->grp_firstTuple = ExecCopySlotHeapTuple(outerslot);
            break;
        }
    }

-- 
With 

Re: How to know referenced sub-fields of a composite type?

2019-05-29 Thread Haribabu Kommi
On Wed, May 29, 2019 at 4:51 PM Kohei KaiGai  wrote:

> Hi Amit,
>
> 2019年5月29日(水) 13:26 Amit Langote :
> >
> > Kaigai-san,
> >
> > On 2019/05/29 12:13, Kohei KaiGai wrote:
> > > One interesting data type in Apache Arrow is "Struct" data type. It is
> > > equivalent to composite
> > > type in PostgreSQL. The "Struct" type has sub-fields, and individual
> > > sub-fields have its own
> > > values array for each.
> > >
> > > It means we can skip to load the sub-fields unreferenced, if
> > > query-planner can handle
> > > referenced and unreferenced sub-fields correctly.
> > > On the other hands, it looks to me RelOptInfo or other optimizer
> > > related structure don't have
> > > this kind of information. RelOptInfo->attr_needed tells extension
> > > which attributes are referenced
> > > by other relation, however, its granularity is not sufficient for
> sub-fields.
> >
> > Isn't that true for some other cases as well, like when a query accesses
> > only some sub-fields of a json(b) column?  In that case too, planner
> > itself can't optimize away access to other sub-fields.  What it can do
> > though is match a suitable index to the operator used to access the
> > individual sub-fields, so that the index (if one is matched and chosen)
> > can optimize away accessing unnecessary sub-fields.  IOW, it seems to me
> > that the optimizer leaves it up to the indexes (and plan nodes) to
> further
> > optimize access to within a field.  How is this case any different?
> >
> I think it is a little bit different scenario.
> Even if an index on sub-fields can indicate the tuples to be fetched,
> the fetched tuple contains all the sub-fields because heaptuple is
> row-oriented data.
> For example, if WHERE-clause checks a sub-field: "x" then aggregate
> function references other sub-field "y", Scan/Join node has to return
> a tuple that contains both "x" and "y". IndexScan also pops up a tuple
> with a full composite type, so here is no problem if we cannot know
> which sub-fields are referenced in the later stage.
> Maybe, if IndexOnlyScan supports to return a partial composite type,
> it needs similar infrastructure that can be used for a better composite
> type support on columnar storage.
>

There is another related issue, that a columnar store needs the targeted
columns for projection from the scan, which is discussed in the zedstore
thread [1]. Projecting all columns from a columnar store is much more
expensive than from a row store.

[1] -
https://www.postgresql.org/message-id/CALfoeivu-n5o8Juz9wW%2BkTjnis6_%2BrfMf%2BzOTky1LiTVk-ZFjA%40mail.gmail.com


Regards,
Haribabu Kommi
Fujitsu Australia


Re: MSVC Build support with visual studio 2019

2019-05-29 Thread Haribabu Kommi
On Mon, May 27, 2019 at 8:14 PM Juan José Santamaría Flecha <
juanjo.santama...@gmail.com> wrote:

>
> On Thu, May 23, 2019 at 3:44 AM Haribabu Kommi 
> wrote:
>
>>
>> Updated patches are attached for all branches.
>>
>>
> I have gone through all patches and there are a couple of typos:
>

Thanks for the review.


> 1. s/prodcutname/productname/
>
> 1.1 In file: 0001-support-building-with-visual-studio-2019_v9.4.patch
> @@ -97,8 +97,9 @@
>Visual Studio 2013. Building with
>Visual Studio 2015 is supported down to
>Windows Vista and Windows Server 2008.
> -   Building with Visual Studio 2017 is
> supported
> -   down to Windows 7 SP1 and Windows Server
> 2008 R2 SP1.
> +   Building with Visual Studio 2017 and
> +   Visual Studio 2019 are supported down to
>
> 1.2 In file:
> 0001-support-building-with-visual-studio-2019_v10_to_v9.5.patch
> @@ -97,8 +97,9 @@
>Visual Studio 2013. Building with
>Visual Studio 2015 is supported down to
>Windows Vista and Windows Server 2008.
> -   Building with Visual Studio 2017 is
> supported
> -   down to Windows 7 SP1 and Windows Server
> 2008 R2 SP1.
> +   Building with Visual Studio 2017 and
> +   Visual Studio 2019 are supported down to
>

Corrected.


> 2. s/stuido/studio/
>
>  2.1 In file: 0001-support-building-with-visual-studio-2019_v9.4.patch
>  @@ -143,12 +173,12 @@ sub DetermineVisualStudioVersion
>  sub _GetVisualStudioVersion
>  {
>   my ($major, $minor) = @_;
> - # visual 2017 hasn't changed the nmake version to 15, so still using
> the older version for comparison.
> + # The major visual stuido that is suppored has nmake version >= 14.20
> and < 15.
>
>  2.2 In file:
> 0001-support-building-with-visual-studio-2019_v10_to_v9.5.patch
> @@ -132,12 +162,12 @@ sub DetermineVisualStudioVersion
>  sub _GetVisualStudioVersion
>  {
>   my ($major, $minor) = @_;
> - # visual 2017 hasn't changed the nmake version to 15, so still using
> the older version for comparison.
> + # The major visual stuido that is suppored has nmake version >= 14.20
> and < 15.
>
>  2.3 In file:  0001-support-building-with-visual-studio-2019_v11.patch
> @@ -139,12 +165,12 @@ sub _GetVisualStudioVersion
>  {
>   my ($major, $minor) = @_;
>
> - # visual 2017 hasn't changed the nmake version to 15, so still using
> the older version for comparison.
> + # The major visual stuido that is suppored has nmake version >= 14.20
> and < 15.
>
>  2.4 In file:  0001-Support-building-with-visual-studio-2019_HEAD.patch
> @@ -106,17 +132,17 @@ sub _GetVisualStudioVersion
>  {
>   my ($major, $minor) = @_;
>
> - # visual 2017 hasn't changed the nmake version to 15, so still using
> the older version for comparison.
> + # The major visual stuido that is suppored has nmake version >= 14.20
> and < 15.
>

Corrected. The spelling of 'supported' was also wrong.

Updated patches are attached.

Regards,
Haribabu Kommi
Fujitsu Australia


0001-support-building-with-visual-studio-2019_v11_v3.patch
Description: Binary data


0001-support-building-with-visual-studio-2019_v10_to_v9.6_v3.patch
Description: Binary data


0001-support-building-with-visual-studio-2019_v9.4_v3.patch
Description: Binary data


0001-Support-building-with-visual-studio-2019_HEAD_v3.patch
Description: Binary data


Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2019-05-29 Thread Pavan Deolasee
On Tue, 28 May 2019 at 4:36 PM, Andres Freund  wrote:

> Hi,
>
> On 2019-04-07 18:04:27 -0700, Andres Freund wrote:
> > Here's a *prototype* patch for this.  It only implements what I
> > described for heap_multi_insert, not for plain inserts. I wanted to see
> > what others think before investing additional time.
>
> Pavan, are you planning to work on this for v13 CF1? Or have you lost
> interest on the topic?
>

Yes, I plan to work on it.

Thanks,
Pavan
-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: How to know referenced sub-fields of a composite type?

2019-05-29 Thread Kohei KaiGai
Hi Amit,

2019年5月29日(水) 13:26 Amit Langote :
>
> Kaigai-san,
>
> On 2019/05/29 12:13, Kohei KaiGai wrote:
> > One interesting data type in Apache Arrow is "Struct" data type. It is
> > equivalent to composite
> > type in PostgreSQL. The "Struct" type has sub-fields, and individual
> > sub-fields have its own
> > values array for each.
> >
> > It means we can skip to load the sub-fields unreferenced, if
> > query-planner can handle
> > referenced and unreferenced sub-fields correctly.
> > On the other hands, it looks to me RelOptInfo or other optimizer
> > related structure don't have
> > this kind of information. RelOptInfo->attr_needed tells extension
> > which attributes are referenced
> > by other relation, however, its granularity is not sufficient for 
> > sub-fields.
>
> Isn't that true for some other cases as well, like when a query accesses
> only some sub-fields of a json(b) column?  In that case too, planner
> itself can't optimize away access to other sub-fields.  What it can do
> though is match a suitable index to the operator used to access the
> individual sub-fields, so that the index (if one is matched and chosen)
> can optimize away accessing unnecessary sub-fields.  IOW, it seems to me
> that the optimizer leaves it up to the indexes (and plan nodes) to further
> optimize access to within a field.  How is this case any different?
>
I think it is a little bit different scenario.
Even if an index on sub-fields can indicate the tuples to be fetched,
the fetched tuple contains all the sub-fields because heaptuple is
row-oriented data.
For example, if WHERE-clause checks a sub-field: "x" then aggregate
function references other sub-field "y", Scan/Join node has to return
a tuple that contains both "x" and "y". IndexScan also pops up a tuple
with a full composite type, so here is no problem if we cannot know
which sub-fields are referenced in the later stage.
Maybe, if IndexOnlyScan supports to return a partial composite type,
it needs similar infrastructure that can be used for a better composite
type support on columnar storage.

> > Probably, all we can do right now is walk-on the RelOptInfo list to
> > lookup FieldSelect node
> > to see the referenced sub-fields. Do we have a good idea instead of
> > this expensive way?
> > # Right now, PG-Strom loads all the sub-fields of Struct column from
> > arrow_fdw foreign-table
> > # regardless of referenced / unreferenced sub-fields. Just a second best.
>
> I'm missing something, but if PG-Strom/arrow_fdw does look at the
> FieldSelect nodes to see which sub-fields are referenced, why doesn't it
> generate a plan that will only access those sub-fields or why can't it?
>
Likely, it is not a technical problem, just not a smart implementation.
If there is some existing infrastructure we can apply that I missed, it may
be more suitable than query/expression tree walking.
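
For reference, a rough sketch of such a walk (a hypothetical helper, not
existing infrastructure) that collects the fieldnum of every FieldSelect in
an expression tree via the standard expression_tree_walker() recursion:

#include "postgres.h"
#include "nodes/nodeFuncs.h"
#include "nodes/pg_list.h"
#include "nodes/primnodes.h"

static bool
collect_fieldselects(Node *node, List **fieldnums)
{
    if (node == NULL)
        return false;
    if (IsA(node, FieldSelect))
    {
        FieldSelect *fs = (FieldSelect *) node;

        /* remember the referenced sub-field, then keep recursing into fs->arg */
        *fieldnums = lappend_int(*fieldnums, fs->fieldnum);
    }
    return expression_tree_walker(node, collect_fieldselects,
                                  (void *) fieldnums);
}

One would have to run it over the relation's reltarget expressions and the
baserestrictinfo clauses, which is exactly the "expensive way" mentioned
above.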

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei