Re: [HACKERS] Multiple false-positive warnings from Valgrind

2017-03-31 Thread Noah Misch
On Fri, Mar 31, 2017 at 04:40:07PM +0300, Aleksander Alekseev wrote:

> > > And it seems to me that this is caused by the routines of OpenSSL.
> > > When building without --with-openssl, using the fallback
> > > implementations of SHA256 and RAND_bytes I see no warnings generated
> > > by scram_build_verifier... I think it makes most sense to discard that
> > > from the list of open items.
> > 
> > FWIW a document of the function says that,
> > 
> > https://www.openssl.org/docs/man1.0.1/crypto/RAND_bytes.html
> > 
> > > The contents of buf is mixed into the entropy pool before
> > > retrieving the new pseudo-random bytes unless disabled at compile
> > > time (see FAQ).
> > 
> > This isn't saying that RAND_bytes does the same thing, but
> > something similar could be happening there.
> 
> OK, it turned out that the warnings regarding uninitialized values
> disappear after removing --with-openssl. That's a good thing.

Does this remove the noise under --with-openssl?

--- a/src/port/pg_strong_random.c
+++ b/src/port/pg_strong_random.c
@@ -104,7 +104,10 @@ pg_strong_random(void *buf, size_t len)
 */
 #if defined(USE_OPENSSL_RANDOM)
if (RAND_bytes(buf, len) == 1)
+   {
+   VALGRIND_MAKE_MEM_DEFINED(buf, len);
return true;
+   }
return false;
 
/*

> What about all these memory leak reports [1]? If I see them should I just
> ignore them or, if reports look false positive, suggest a patch that
> modifies a Valgrind suppression file? In other words, what is the current
> consensus in the community regarding Valgrind and its reports?

Pass --leak-check=no; PostgreSQL intentionally leaks a lot.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Making clausesel.c Smarter

2017-03-31 Thread Claudio Freire
On Fri, Feb 24, 2017 at 7:00 AM, David Rowley
 wrote:
> I've stumbled over an interesting query out in the wild where the
> query was testing a nullable timestamptz column was earlier than some
> point in time, and also checking the column IS NOT NULL.
>
> The use pattern here was that records which required processing had
> this timestamp set to something other than NULL, a worker would come
> along and process those, and UPDATE the record to NULL to mark the
> fact that it was now processed. So what we are left with was a table
> with a small number of rows with a value in this timestamp column, and
> an ever-growing number of rows with a NULL value.
>
> A highly simplified version of the query was checking for records that
> required processing before some date, say:
>
> SELECT * FROM a WHERE ts < '2017-02-24' AND ts IS NOT NULL;
>
> Of course, the ts IS NOT NULL is not required here, but I can
> understand how someone might make the mistake of adding this. The
> simple solution to the problem was to have that null test removed, but
> that seemingly innocent little mistake caused some pain due to very
> slow running queries which held on to a nested loop plan 33 times
> longer than it should have been doing. Basically what was happening
> here is that clauselist_selectivity() was applying the selectivity
> with the ~0.97 null_fact from pg_statistic over the top of the already
> applied estimate on the number of rows below the constant timestamp.
>
> Since the diagnosis of this problem was not instant, and some amount
> of pain was suffered before the solution was found,  I wondered if it
> might be worth smartening up the planner a bit in this area.
>
> We're already pretty good at handling conditions like: SELECT * FROM a
> WHERE x < 10 and x < 1; where we'll effectively ignore the x < 10
> estimate since x < 1 is more restrictive, so if we just build on that
> ability a bit we could get enough code to cover the above case.
>
> I've attached a draft patch which improves the situation here.

I thought to take a quick look at this patch. I'll probably take a
deeper look later, but some initial comments:

-typedef struct RangeQueryClause
+typedef struct CachedSelectivityClause
 {
-struct RangeQueryClause *next;/* next in linked list */
+struct CachedSelectivityClause *next;/* next in linked list */
 Node   *var;/* The common variable of the clauses */
-boolhave_lobound;/* found a low-bound clause yet? */
-boolhave_hibound;/* found a high-bound clause yet? */
+intselmask;/* Bitmask of which sel types are stored */
 Selectivity lobound;/* Selectivity of a var > something clause */
 Selectivity hibound;/* Selectivity of a var < something clause */
-} RangeQueryClause;


As seems customary in other code, perhaps you should define some
HAS_X() macros for dealing with the selmask.

Something like

#define SELMASK_HAS_LOBOUND(selmask) (((selmask) & CACHESEL_LOBOUND) != 0)
etc..

+static void addCachedSelectivityRangeVar(CachedSelectivityClause **cslist,
+					Node *var, bool varonleft, bool isLTsel, Selectivity s2);

You're changing clause -> var throughout the code when dealing with
nodes, but looking at their use, they hold neither. All those
addCachedSelectivity functions are usually passed expression nodes. If
you're renaming, perhaps you should rename to "expr".


On Fri, Feb 24, 2017 at 7:00 AM, David Rowley
 wrote:
> Now one thing I was unsure of in the patch was this "Other clauses"
> concept that I've come up with. In the patch we have:
>
> default:
>     addCachedSelectivityOtherClause(&cslist, var, s2);
>     break;
>
> I was unsure if this should only apply to F_EQSEL and F_NEQSEL. My
> imagination here won't stretch far enough to imagine a OpExpr which
> passes with a NULL operand. If it did my logic is broken, and if
> that's the case then I think limiting "Others" to mean F_EQSEL and
> F_NEQSEL would be the workaround.

While not specifically applicable only to "Others", something needs
consideration here:

User-defined functions can be nonstrict. An OpExpr involving such a
user-defined function could possibly pass on null.

I would suggest avoiding making the assumption that it doesn't unless
the operator is strict.

One could argue that such an operator should provide its own
selectivity estimator, but the strictness check shouldn't be too
difficult to add, and I think it's a better choice.

So you'd have to check that:

default:
    if (op_strict(expr->opno) && func_strict(expr->opfuncid))
        addCachedSelectivityOtherClause(&cslist, var, s2);
    else
        s1 = s1 * s2;
    break;

So, I went ahead and did that.

Considering this setup:

createdb pgtest
cat 

Re: [HACKERS] Table collision in join.sql and aggregates.sql

2017-03-31 Thread Douglas Doole
D'oh! The "temp" declaration had been removed from our test since we don't
use temp tables. I missed that when applying it to the community code.

You can ignore me now.

On Fri, Mar 31, 2017 at 4:01 PM Tom Lane  wrote:

> Douglas Doole  writes:
> > As we've been merging our code with 9.6, a couple developers have had
> > one-off failures in the join.sql and aggregates.sql test because the
> tables
> > T1, T2 and T3 have the wrong definitions.
>
> > Digging into it, I found that both files create the tables T1, T2, and T3
> > for a short period of time and then drop them. Since the
> parallel_schedule
> > file puts these two files into the same group, they can run concurrently.
> > If it happens that the two files hit the T1, T2, T3 tests at the same
> > time, then you see the table definition problems.
>
> Hmmm ... that would indeed be a problem, except that aggregates.sql's
> tables are temp tables, which should mean that they are in a schema
> different from the one that join.sql is putting its tables in.  Are you
> sure you've identified the issue correctly?  Because if this doesn't
> work, there are an awful lot of similar hazards elsewhere in the
> regression tests.
>
> regards, tom lane
>


Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-31 Thread Petr Jelinek
On 01/04/17 04:19, Andres Freund wrote:
> On 2017-03-31 21:30:17 -0400, Robert Haas wrote:
>> On Fri, Mar 31, 2017 at 9:26 PM, Petr Jelinek
>>  wrote:
 Hmm, I don't know if there's any good reason not to just use strcmp(),
 but sure, OK.  Committed and back-patched.
>>>
>>> Hmm culicidae still fails, this time only in parallel worker code. This
>>> didn't happen on my machine which is strange. Looking at the code, we
>>> are passing the fps->entrypoint as function pointer again so of course
>>> it fails. We have some code to load libraries again but even that gets
>>> initial entrypoint passed as function pointer
>>> (ParallelExtensionTrampoline). I wonder if we'll have to generalize the
>>> InternalBGWorkers even more to some kind of internal function name to
>>> pointer map and add the parallel entry points there as well.
>>
>> Argh, I forgot about that.  I think we need to use something like
>> ParallelExtensionTrampoline all the time, not just for libraries.
>> Since effectively, we've determined that postgres itself has the same
>> problem.
> 
> I wonder if we shouldn't just introduce a KnownFunctionPointer[] map,
> that can be used by various facilities (bgworkers themselves,
> parallelism, and probably more coming).  Otherwise we'll introduce
> mechanisms like 7d8f6986b8 in multiple places.
> 

Yes, that's what I meant by what I said above.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TPC-H Q20 from 1 hour to 19 hours!

2017-03-31 Thread Tomas Vondra



On 04/01/2017 02:57 AM, Robert Haas wrote:

On Fri, Mar 31, 2017 at 7:53 AM, Rafia Sabih
 wrote:

So, it looks like in the problematic area, it is not improving much.
Please find the attached file for the query plan of Q20 with and
without patch. I haven't evaluated the performance of this query with
patch.


Yeah, I don't think this patch is fixing whatever is going wrong with
Q20.  It's just a problem Tomas found while investigating that issue.
Unfortunately, I don't think we know what's really going on here yet.



Yeah, it's just a side-bug. I'll look into the issue again once I get
back home from pgconf.us; it's a little bit difficult to investigate
this over ssh on the in-flight wifi :-/


My hunch is that the two foreign keys interfere in some strange way,
i.e. we should apply just one but end up applying both, or something
like that. Or perhaps it gets confused by the join-to-aggregated-rel?


It's a bit weird, though, because the patch was originally meant to help 
with multi-column keys. It was then extended to also consider 
single-column FKs, but that should really be just 1/ndistinct anyway.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Re: [COMMITTERS] pgsql: Avoid GatherMerge crash when there are no workers.

2017-03-31 Thread Andres Freund
Hi,

On 2017-04-01 01:22:14 +, Robert Haas wrote:
> Avoid GatherMerge crash when there are no workers.

I think the gather merge code needs a bit more test coverage (sorry to
make this a larger theme today).  As shown by
https://coverage.postgresql.org/src/backend/executor/nodeGatherMerge.c.gcov.html
we don't actually merge anything (heap_compare_slots is not exercised).

I btw also wonder if it's good that we have a nearly identical copy of
heap_compare_slots and a bunch of the calling code in both
nodeMergeAppend.c and nodeGatherMerge.c.  On the other hand, it's not
heavily evolving code.

- Andres




Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-31 Thread Andres Freund
On 2017-03-31 21:30:17 -0400, Robert Haas wrote:
> On Fri, Mar 31, 2017 at 9:26 PM, Petr Jelinek
>  wrote:
> >> Hmm, I don't know if there's any good reason not to just use strcmp(),
> >> but sure, OK.  Committed and back-patched.
> >
> > Hmm culicidae still fails, this time only in parallel worker code. This
> > didn't happen on my machine which is strange. Looking at the code, we
> > are passing the fps->entrypoint as function pointer again so of course
> > it fails. We have some code to load libraries again but even that gets
> > initial entrypoint passed as function pointer
> > (ParallelExtensionTrampoline). I wonder if we'll have to generalize the
> > InternalBGWorkers even more to some kind of internal function name to
> > pointer map and add the parallel entry points there as well.
> 
> Argh, I forgot about that.  I think we need to use something like
> ParallelExtensionTrampoline all the time, not just for libraries.
> Since effectively, we've determined that postgres itself has the same
> problem.

I wonder if we shouldn't just introduce a KnownFunctionPointer[] map,
that can be used by various facilities (bgworkers themselves,
parallelism, and probably more coming).  Otherwise we'll introduce
mechanisms like 7d8f6986b8 in multiple places.

- Andres




Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-31 Thread Andres Freund
On 2017-04-01 03:57:05 +0200, Petr Jelinek wrote:
> On 01/04/17 03:44, Andres Freund wrote:
> > On 2017-04-01 03:26:01 +0200, Petr Jelinek wrote:
> >> On 01/04/17 02:53, Robert Haas wrote:
> >>> On Fri, Mar 31, 2017 at 8:32 PM, Petr Jelinek
> >>>  wrote:
>  Right, changed to BGW_MAXLEN.
> >>>
> >>> Hmm, I don't know if there's any good reason not to just use strcmp(),
> >>> but sure, OK.  Committed and back-patched.
> > 
> > Cool!
> > 
> > 
> >> Hmm culicidae still fails, this time only in parallel worker code.
> >> This  didn't happen on my machine which is strange.
> > 
> > Interesting.  Did you reproduce the failure before?  I wonder if the
> > issue is that you don't build a position independent executable?
> > 
> 
> I did reproduce the original issue yes, that's how I tested the patch. I
> just copy-pasted CFLAGS from the buildfarm configuration of culicidae TBH.

That's not necessarily sufficient, unless you also use a sufficiently
new Debian - they changed gcc to enable PIE by default.


Greetings,

Andres Freund




Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-31 Thread Ashutosh Sharma
Hi,

On Sat, Apr 1, 2017 at 7:06 AM, Ashutosh Sharma  wrote:
> On Sat, Apr 1, 2017 at 6:00 AM, Robert Haas  wrote:
>> On Fri, Mar 31, 2017 at 6:09 PM, Ashutosh Sharma  
>> wrote:
>>> Well, that is another option, but the main idea was to be in line with
>>> the btree code.
>>
>> That's not a bad goal in principle, but _bt_killitems() doesn't have
>> any similar argument.
>
> It was there but later got removed with some patch (maybe the patch
> for reducing pinning and buffer content lock for btree scans). The
> following commit made this change.
>
> commit 09cb5c0e7d6fbc9dee26dc429e4fc0f2a88e5272
> Author: Tom Lane 
> Date:   Sun May 7 01:21:30 2006 +
>
>>
>> (Also, speaking of consistency, why did we end up with
>> _hash_kill_items, with an underscore between kill and items, and
>> _bt_killitems, without one?)
>
> That is just to follow the naming convention in hash.h. Please check
> the review comments for this at [1].
>
>>
>>> Moreover, I am not sure if acquiring an lwlock inside
>>> hashendscan (basically the place where we are trying to close
>>> things down) would look good.
>>
>> Well, if that's not a good thing to do, hiding it inside some other
>> function doesn't make it better.
>
> Okay, agreed. I will submit the patch very shortly.

As suggested, I am now acquiring the lock inside the caller function.
Attached is the patch with these changes. Thanks.

>
>
> [1] - 
> https://www.postgresql.org/message-id/cf6abd8a-77b5-f1c7-8e50-5bef461e0522%40redhat.com

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 34cc08f..b835f77 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -476,9 +476,17 @@ hashrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
 	Relation	rel = scan->indexRelation;
 
-	/* Before leaving current page, deal with any killed items */
+	/*
+	 * Before leaving current page, deal with any killed items.
+	 * Also, ensure that we acquire lock on current page before
+	 * calling _hash_kill_items.
+	 */
 	if (so->numKilled > 0)
+	{
+		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
 		_hash_kill_items(scan);
+		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_UNLOCK);
+	}
 
 	_hash_dropscanbuf(rel, so);
 
@@ -507,9 +515,17 @@ hashendscan(IndexScanDesc scan)
 	HashScanOpaque so = (HashScanOpaque) scan->opaque;
 	Relation	rel = scan->indexRelation;
 
-	/* Before leaving current page, deal with any killed items */
+	/*
+	 * Before leaving current page, deal with any killed items.
+	 * Also, ensure that we acquire lock on current page before
+	 * calling _hash_kill_items.
+	 */
 	if (so->numKilled > 0)
+	{
+		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_SHARE);
 		_hash_kill_items(scan);
+		LockBuffer(so->hashso_curbuf, BUFFER_LOCK_UNLOCK);
+	}
 
 	_hash_dropscanbuf(rel, so);
 



Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-31 Thread Petr Jelinek
On 01/04/17 03:44, Andres Freund wrote:
> On 2017-04-01 03:26:01 +0200, Petr Jelinek wrote:
>> On 01/04/17 02:53, Robert Haas wrote:
>>> On Fri, Mar 31, 2017 at 8:32 PM, Petr Jelinek
>>>  wrote:
 Right, changed to BGW_MAXLEN.
>>>
>>> Hmm, I don't know if there's any good reason not to just use strcmp(),
>>> but sure, OK.  Committed and back-patched.
> 
> Cool!
> 
> 
>> Hmm culicidae still fails, this time only in parallel worker code.
>> This  didn't happen on my machine which is strange.
> 
> Interesting.  Did you reproduce the failure before?  I wonder if the
> issue is that you don't build a position independent executable?
> 

I did reproduce the original issue yes, that's how I tested the patch. I
just copy-pasted CFLAGS from the buildfarm configuration of culicidae TBH.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-31 Thread Andres Freund
On 2017-04-01 03:26:01 +0200, Petr Jelinek wrote:
> On 01/04/17 02:53, Robert Haas wrote:
> > On Fri, Mar 31, 2017 at 8:32 PM, Petr Jelinek
> >  wrote:
> >> Right, changed to BGW_MAXLEN.
> > 
> > Hmm, I don't know if there's any good reason not to just use strcmp(),
> > but sure, OK.  Committed and back-patched.

Cool!


> Hmm culicidae still fails, this time only in parallel worker code.
> This  didn't happen on my machine which is strange.

Interesting.  Did you reproduce the failure before?  I wonder if the
issue is that you don't build a position independent executable?

- Andres




Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-31 Thread Ashutosh Sharma
On Sat, Apr 1, 2017 at 6:00 AM, Robert Haas  wrote:
> On Fri, Mar 31, 2017 at 6:09 PM, Ashutosh Sharma  
> wrote:
>> Well, that is another option, but the main idea was to be in line with
>> the btree code.
>
> That's not a bad goal in principle, but _bt_killitems() doesn't have
> any similar argument.

It was there but later got removed with some patch (maybe the patch
for reducing pinning and buffer content lock for btree scans). The
following commit made this change.

commit 09cb5c0e7d6fbc9dee26dc429e4fc0f2a88e5272
Author: Tom Lane 
Date:   Sun May 7 01:21:30 2006 +

>
> (Also, speaking of consistency, why did we end up with
> _hash_kill_items, with an underscore between kill and items, and
> _bt_killitems, without one?)

That is just to follow the naming convention in hash.h. Please check
the review comments for this at [1].

>
>> Moreover, I am not sure if acquiring an lwlock inside
>> hashendscan (basically the place where we are trying to close
>> things down) would look good.
>
> Well, if that's not a good thing to do, hiding it inside some other
> function doesn't make it better.

Okay, agreed. I will submit the patch very shortly.


[1] - 
https://www.postgresql.org/message-id/cf6abd8a-77b5-f1c7-8e50-5bef461e0522%40redhat.com




Re: [HACKERS] [POC] A better way to expand hash indexes.

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 1:15 AM, Mithun Cy  wrote:
> Thanks, I have tried to fix all of the comments.

Thanks.

Hmm, don't the changes to contrib/pageinspect/expected/hash.out
indicate that you've broken something?  The hash index has only 4
buckets, so the new code shouldn't be doing anything differently, but
you've got highmask and lowmask changing for some reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 9:26 PM, Petr Jelinek
 wrote:
>> Hmm, I don't know if there's any good reason not to just use strcmp(),
>> but sure, OK.  Committed and back-patched.
>
> Hmm culicidae still fails, this time only in parallel worker code. This
> didn't happen on my machine which is strange. Looking at the code, we
> are passing the fps->entrypoint as function pointer again so of course
> it fails. We have some code to load libraries again but even that gets
> initial entrypoint passed as function pointer
> (ParallelExtensionTrampoline). I wonder if we'll have to generalize the
> InternalBGWorkers even more to some kind of internal function name to
> pointer map and add the parallel entry points there as well.

Argh, I forgot about that.  I think we need to use something like
ParallelExtensionTrampoline all the time, not just for libraries.
Since effectively, we've determined that postgres itself has the same
problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-31 Thread Petr Jelinek
On 01/04/17 02:53, Robert Haas wrote:
> On Fri, Mar 31, 2017 at 8:32 PM, Petr Jelinek
>  wrote:
>> Right, changed to BGW_MAXLEN.
> 
> Hmm, I don't know if there's any good reason not to just use strcmp(),
> but sure, OK.  Committed and back-patched.
> 

Hmm culicidae still fails, this time only in parallel worker code. This
didn't happen on my machine which is strange. Looking at the code, we
are passing the fps->entrypoint as function pointer again so of course
it fails. We have some code to load libraries again but even that gets
initial entrypoint passed as function pointer
(ParallelExtensionTrampoline). I wonder if we'll have to generalize the
InternalBGWorkers even more to some kind of internal function name to
pointer map and add the parallel entry points there as well.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-31 Thread Robert Haas
On Tue, Mar 28, 2017 at 11:08 PM, Tomas Vondra
 wrote:
> Maybe. It depends on how valuable it is to keep Gather and GatherMerge
> similar - having nreaders in one and not the other seems a bit weird. But
> maybe the refactoring would remove it from both nodes?

Yeah, it appears to be the case that both Gather and GatherMerge
inevitably end up with nworkers_launched == nreaders, which seems
dumb.  If we're going to clean this up, I think we should make them
both match.

> Also, it does not really solve the issue that we're using 'nreaders' or
> 'nworkers_launched' to access an array with one extra element.

I'm not clear that there's any fundamental solution to that problem
other than trying to clarify it through comments.

Meantime, I think it's not good to leave this crashing, so I pushed
Rushabh's v2 patch for the actual crash for now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] TPC-H Q20 from 1 hour to 19 hours!

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 7:53 AM, Rafia Sabih
 wrote:
> So, it looks like in the problematic area, it is not improving much.
> Please find the attached file for the query plan of Q20 with and
> without patch. I haven't evaluated the performance of this query with
> patch.

Yeah, I don't think this patch is fixing whatever is going wrong with
Q20.  It's just a problem Tomas found while investigating that issue.
Unfortunately, I don't think we know what's really going on here yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 8:32 PM, Petr Jelinek
 wrote:
> Right, changed to BGW_MAXLEN.

Hmm, I don't know if there's any good reason not to just use strcmp(),
but sure, OK.  Committed and back-patched.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical replication launcher crash on buildfarm

2017-03-31 Thread Petr Jelinek
On 31/03/17 15:42, Robert Haas wrote:
> On Tue, Mar 28, 2017 at 7:39 PM, Petr Jelinek
>  wrote:
>> Sigh, forgot git add for the docs, so one more try...
> 
> +if (strncmp(worker->bgw_library_name, "postgres", 8) != 0)
> +return NULL;
> 
> I think that's not right.  You don't want to match postgresshdkgjsdglkjs.
>

Right, changed to BGW_MAXLEN.

> Aside from that, these look good to me.
> 

Cool.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 74b9f0d7063071dd443fa820984fba477f4446cc Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Tue, 28 Mar 2017 23:24:34 +0200
Subject: [PATCH] Use library_name/function_name for loading main entry point
 of internal bgworkers

---
 src/backend/access/transam/parallel.c|  7 +--
 src/backend/postmaster/bgworker.c| 76 ++--
 src/include/access/parallel.h|  2 +
 src/test/modules/worker_spi/worker_spi.c |  6 ++-
 4 files changed, 73 insertions(+), 18 deletions(-)

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index cde0ed3..d2bed72 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -109,7 +109,6 @@ static dlist_head pcxt_list = DLIST_STATIC_INIT(pcxt_list);
 /* Private functions. */
 static void HandleParallelMessage(ParallelContext *pcxt, int i, StringInfo msg);
 static void ParallelExtensionTrampoline(dsm_segment *seg, shm_toc *toc);
-static void ParallelWorkerMain(Datum main_arg);
 static void WaitForParallelWorkersToExit(ParallelContext *pcxt);
 
 
@@ -456,7 +455,9 @@ LaunchParallelWorkers(ParallelContext *pcxt)
 		BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
 	worker.bgw_start_time = BgWorkerStart_ConsistentState;
 	worker.bgw_restart_time = BGW_NEVER_RESTART;
-	worker.bgw_main = ParallelWorkerMain;
+	worker.bgw_main = NULL;
+	sprintf(worker.bgw_library_name, "postgres");
+	sprintf(worker.bgw_function_name, "ParallelWorkerMain");
 	worker.bgw_main_arg = UInt32GetDatum(dsm_segment_handle(pcxt->seg));
 	worker.bgw_notify_pid = MyProcPid;
 	memset(_extra, 0, BGW_EXTRALEN);
@@ -928,7 +929,7 @@ AtEOXact_Parallel(bool isCommit)
 /*
  * Main entrypoint for parallel workers.
  */
-static void
+void
 ParallelWorkerMain(Datum main_arg)
 {
 	dsm_segment *seg;
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 52bc4e9..5ea5abf 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -16,6 +16,7 @@
 
 #include "miscadmin.h"
 #include "libpq/pqsignal.h"
+#include "access/parallel.h"
 #include "postmaster/bgworker_internals.h"
 #include "postmaster/postmaster.h"
 #include "storage/barrier.h"
@@ -94,6 +95,25 @@ struct BackgroundWorkerHandle
 static BackgroundWorkerArray *BackgroundWorkerData;
 
 /*
+ * List of internal background workers. These are used for mapping the
+ * function name to actual function when building with EXEC_BACKEND and also
+ * to allow these to be loaded outside of shared_preload_libraries.
+ */
+typedef struct InternalBGWorkerMain
+{
+	char			   *bgw_function_name;
+	bgworker_main_type	bgw_main;
+} InternalBGWorkerMain;
+
+static const InternalBGWorkerMain InternalBGWorkers[] = {
+	{"ParallelWorkerMain", ParallelWorkerMain},
+	/* Dummy entry marking end of the array. */
+	{NULL, NULL}
+};
+
+static bgworker_main_type GetInternalBgWorkerMain(BackgroundWorker *worker);
+
+/*
  * Calculate shared memory needed.
  */
 Size
@@ -695,22 +715,27 @@ StartBackgroundWorker(void)
 #endif
 	}
 
+	/* For internal workers set the entry point to known function address. */
+	entrypt = GetInternalBgWorkerMain(worker);
+
 	/*
-	 * If bgw_main is set, we use that value as the initial entrypoint.
-	 * However, if the library containing the entrypoint wasn't loaded at
-	 * postmaster startup time, passing it as a direct function pointer is not
-	 * possible.  To work around that, we allow callers for whom a function
-	 * pointer is not available to pass a library name (which will be loaded,
-	 * if necessary) and a function name (which will be looked up in the named
+	 * Otherwise, if bgw_main is set, we use that value as the initial
+	 * entrypoint. This does not work well with EXEC_BACKEND outside Windows,
+	 * we keep the logic for backwards compatibility. In other cases use
+	 * the entry point specified by library name (which will be loaded, if
+	 * necessary) and a function name (which will be looked up in the named
 	 * library).
 	 */
-	if (worker->bgw_main != NULL)
-		entrypt = worker->bgw_main;
-	else
-		entrypt = (bgworker_main_type)
-			load_external_function(worker->bgw_library_name,
-   worker->bgw_function_name,
-   true, NULL);
+	if (entrypt == NULL)
+	{
+		if (worker->bgw_main != NULL)
+			entrypt = worker->bgw_main;
+		else
+			entrypt = (bgworker_main_type)
+
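The hunk breaks off before the body of GetInternalBgWorkerMain(), whose declaration it adds above. Judging from the InternalBGWorkers[] array, the lookup is presumably along the lines of the self-contained sketch below. The stub worker type and ParallelWorkerMainStub are placeholders (the real bgworker_main_type takes a Datum, and the real function is declared to take a BackgroundWorker *, not just the name):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PostgreSQL's bgworker_main_type. */
typedef void (*bgworker_main_type)(void *arg);

/* Placeholder for the real ParallelWorkerMain. */
static void
ParallelWorkerMainStub(void *arg)
{
	(void) arg;
}

typedef struct InternalBGWorkerMain
{
	const char		   *bgw_function_name;
	bgworker_main_type	bgw_main;
} InternalBGWorkerMain;

static const InternalBGWorkerMain InternalBGWorkers[] = {
	{"ParallelWorkerMain", ParallelWorkerMainStub},
	/* Dummy entry marking end of the array. */
	{NULL, NULL}
};

/*
 * Return the hard-wired entry point for an internal background worker,
 * or NULL if the requested function name is not an internal worker.
 */
static bgworker_main_type
GetInternalBgWorkerMain(const char *bgw_function_name)
{
	int			i;

	for (i = 0; InternalBGWorkers[i].bgw_function_name != NULL; i++)
	{
		if (strcmp(InternalBGWorkers[i].bgw_function_name,
				   bgw_function_name) == 0)
			return InternalBGWorkers[i].bgw_main;
	}
	return NULL;
}
```

Returning NULL for unknown names lets StartBackgroundWorker() fall through to the bgw_main / load_external_function() path, as in the hunk above.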

Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 6:09 PM, Ashutosh Sharma  wrote:
> Well, that is another option, but the main idea was to be in line with
> the btree code.

That's not a bad goal in principle, but _bt_killitems() doesn't have
any similar argument.

(Also, speaking of consistency, why did we end up with
_hash_kill_items, with an underscore between kill and items, and
_bt_killitems, without one?)

> Moreover, I am not sure if acquiring lwlock inside
> hashendscan (basically the place where we are trying to close down the
> things) would look good.

Well, if that's not a good thing to do, hiding it inside some other
function doesn't make it better.  I think it's fine, though.
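In caller shape, the suggestion amounts to the pattern below. This is a toy sketch only: LockBufferShared, UnlockBuffer, and _hash_kill_items_stub are hypothetical stand-ins for the real buffer-lock API and for the Assert(LWLockHeldByMe(...)) check that fires in MarkBufferDirtyHint:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in lock state for the buffer content lock. */
static bool lock_held = false;
static bool kill_items_saw_lock = false;

static void LockBufferShared(void) { lock_held = true; }
static void UnlockBuffer(void)     { lock_held = false; }

/*
 * Stand-in for _hash_kill_items(); records whether the content lock was
 * held, mirroring the LWLockHeldByMe() assertion in the real code.
 */
static void
_hash_kill_items_stub(void)
{
	kill_items_saw_lock = lock_held;
}

/*
 * Caller-side acquire/release, per the suggestion: hashendscan (or
 * whatever closes the scan) takes the lock around the call rather than
 * passing a "haveLock" flag down into the function.
 */
static void
hashendscan_sketch(void)
{
	LockBufferShared();
	_hash_kill_items_stub();
	UnlockBuffer();
}
```

The design point is simply that since the real function would acquire at its top and release at its bottom anyway, hoisting both into the caller changes nothing except where the code lives.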

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Somebody has not thought through subscription locking considerations

2017-03-31 Thread Petr Jelinek
On 01/04/17 01:57, Petr Jelinek wrote:
> On 01/04/17 01:20, Tom Lane wrote:
>> Petr Jelinek  writes:
>>> But the pg_subscription_rel is also not accessed on heap_open, the
>>> problematic code is called from heap_drop_with_catalog. And VACUUM FULL
>>> pg_class calls heap_drop_with_catalog() when doing the heap swap (and it
>>> goes through performDeletion so through dependency info which is what I
>>> mean by everything else does this).
>>
>> Hmmm.  What the heap_drop_with_catalog call is being applied to is a
>> short-lived table that is not pg_class.  It happens to own the disk
>> storage that used to belong to pg_class, but it is not pg_class;
>> in particular it doesn't have the same OID, and it is not what would
>> be looked in if someone happened to need to fetch a pg_class row
>> at that point.  So there's no circularity involved.
>>
>> On further reflection it seems like you were right, the problem is
>> taking a self-exclusive lock on pg_subscription_rel during low-level
>> catalog operations.  We're going to have to find a way to reduce that
>> lock strength, or we're going to have a lot of deadlock problems.
>>
> 
> Well we have heavy lock because we were worried about failure scenarios
> in our dumb upsert in SetSubscriptionRelState which does cache lookup
> and if it finds something it updates, otherwise inserts. We have similar
> pattern in other places but they are called from DDL statements usually
> so the worst that can happen is DDL fails with weird errors when done
> concurrently with same name so using RowExclusiveLock is okay.
> 
> That being said, looking at use-cases for SetSubscriptionRelState that's
> basically CREATE SUBSCRIPTION, ALTER SUBSCRIPTION REFRESH and tablesync
> worker. So the DDL thing applies to first ones as well and tablesync
> should not be running in case the record does not exist so it's fine if
> it fails. In terms of RemoveSubscriptionRel that's only called from
> heap_drop_with_catalog and tablesync holds relation lock so there is no
> way heap_drop_with_catalog will happen on the same relation. This leads
> me to thinking that RowExclusiveLock is fine for both
> SetSubscriptionRelState and RemoveSubscriptionRel as long as we document
> that callers should be aware that SetSubscriptionRelState has
> concurrency issues and fail on unique index check.
> 

And a simple patch to do so. Peter do you see any problem with doing this?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From a12259efe7b4d0af570e7d2d7cb48b249158a1bd Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Sat, 1 Apr 2017 02:20:16 +0200
Subject: [PATCH] Use weaker locks when updating subscription relation state

---
 src/backend/catalog/pg_subscription.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index e7a1634..b0bd248 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -215,6 +215,9 @@ textarray_to_stringlist(ArrayType *textarray)
 
 /*
  * Set the state of a subscription table.
+ *
+ * The insert or update logic in this function is not concurrency safe so
+ * it may raise an error.
  */
 Oid
 SetSubscriptionRelState(Oid subid, Oid relid, char state,
@@ -227,7 +230,7 @@ SetSubscriptionRelState(Oid subid, Oid relid, char state,
 	Datum		values[Natts_pg_subscription_rel];
 
 	/* Prevent concurrent changes. */
-	rel = heap_open(SubscriptionRelRelationId, ShareRowExclusiveLock);
+	rel = heap_open(SubscriptionRelRelationId, RowExclusiveLock);
 
 	/* Try finding existing mapping. */
 	tup = SearchSysCacheCopy2(SUBSCRIPTIONRELMAP,
@@ -358,8 +361,7 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
 	HeapTuple	tup;
 	int			nkeys = 0;
 
-	/* Prevent concurrent changes (see SetSubscriptionRelState()). */
-	rel = heap_open(SubscriptionRelRelationId, ShareRowExclusiveLock);
+	rel = heap_open(SubscriptionRelRelationId, RowExclusiveLock);
 
 	if (OidIsValid(subid))
 	{
@@ -387,7 +389,7 @@ RemoveSubscriptionRel(Oid subid, Oid relid)
 	}
 	heap_endscan(scan);
 
-	heap_close(rel, ShareRowExclusiveLock);
+	heap_close(rel, RowExclusiveLock);
 }
 
 
-- 
2.7.4




Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 2:14 PM, Mike Palmiotto
 wrote:
> Attached you will find two patches, which were rebased on master as of
> 156d388 (applied with `git am --revert [patch file]`). The first gets
> rid of some pesky compiler warnings and the second implements the
> sepgsql handling of partitioned tables.

0001 has the problem that we have a firm rule against putting any
#includes whatsoever before "postgres.h".  This stdbool issue has come
up before, though, and I fear we're going to need to do something
about it.

0002 looks extremely straightforward, but I wonder if we could get one
of the people on this list who knows about sepgsql to have a look?
(Stephen Frost, Joe Conway, KaiGai Kohei...)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] SortSupport for macaddr type

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 3:03 PM, Brandur Leach  wrote:
> Thank you Peter for the assist here and great sleuthing in
> the disassembly, and thanks Teodor for committing!
>
> Neha pointed out (thanks as well) a couple typos upthread
> that I hadn't gotten around to fixing. I've attached a new
> three line patch to sort those out.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Somebody has not thought through subscription locking considerations

2017-03-31 Thread Petr Jelinek
On 01/04/17 01:57, Petr Jelinek wrote:
> On 01/04/17 01:20, Tom Lane wrote:
>> Petr Jelinek  writes:
>>> But the pg_subscription_rel is also not accessed on heap_open, the
>>> problematic code is called from heap_drop_with_catalog. And VACUUM FULL
>>> pg_class calls heap_drop_with_catalog() when doing the heap swap (and it
>>> goes through performDeletion so through dependency info which is what I
>>> mean by everything else does this).
>>
>> Hmmm.  What the heap_drop_with_catalog call is being applied to is a
>> short-lived table that is not pg_class.  It happens to own the disk
>> storage that used to belong to pg_class, but it is not pg_class;
>> in particular it doesn't have the same OID, and it is not what would
>> be looked in if someone happened to need to fetch a pg_class row
>> at that point.  So there's no circularity involved.
>>
>> On further reflection it seems like you were right, the problem is
>> taking a self-exclusive lock on pg_subscription_rel during low-level
>> catalog operations.  We're going to have to find a way to reduce that
>> lock strength, or we're going to have a lot of deadlock problems.
>>
> 
> Well we have heavy lock because we were worried about failure scenarios
> in our dump upsert in SetSubscriptionRelState which does cache lookup

*dumb (ie, like me ;) )


-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Somebody has not thought through subscription locking considerations

2017-03-31 Thread Petr Jelinek
On 01/04/17 01:20, Tom Lane wrote:
> Petr Jelinek  writes:
>> But the pg_subscription_rel is also not accessed on heap_open, the
>> problematic code is called from heap_drop_with_catalog. And VACUUM FULL
>> pg_class calls heap_drop_with_catalog() when doing the heap swap (and it
>> goes through performDeletion so through dependency info which is what I
>> mean by everything else does this).
> 
> Hmmm.  What the heap_drop_with_catalog call is being applied to is a
> short-lived table that is not pg_class.  It happens to own the disk
> storage that used to belong to pg_class, but it is not pg_class;
> in particular it doesn't have the same OID, and it is not what would
> be looked in if someone happened to need to fetch a pg_class row
> at that point.  So there's no circularity involved.
> 
> On further reflection it seems like you were right, the problem is
> taking a self-exclusive lock on pg_subscription_rel during low-level
> catalog operations.  We're going to have to find a way to reduce that
> lock strength, or we're going to have a lot of deadlock problems.
> 

Well we have heavy lock because we were worried about failure scenarios
in our dump upsert in SetSubscriptionRelState which does cache lookup
and if it finds something it updates, otherwise inserts. We have similar
pattern in other places but they are called from DDL statements usually
so the worst that can happen is DDL fails with weird errors when done
concurrently with same name so using RowExclusiveLock is okay.

That being said, looking at use-cases for SetSubscriptionRelState that's
basically CREATE SUBSCRIPTION, ALTER SUBSCRIPTION REFRESH and tablesync
worker. So the DDL thing applies to first ones as well and tablesync
should not be running in case the record does not exist so it's fine if
it fails. In terms of RemoveSubscriptionRel that's only called from
heap_drop_with_catalog and tablesync holds relation lock so there is no
way heap_drop_with_catalog will happen on the same relation. This leads
me to thinking that RowExclusiveLock is fine for both
SetSubscriptionRelState and RemoveSubscriptionRel as long as we document
that callers should be aware that SetSubscriptionRelState has
concurrency issues and might fail on the unique index check.
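The race in the lookup-then-insert upsert can be sketched abstractly. Everything below (the toy catalog, keys, error codes) is made up for illustration; the point is only that two sessions can both miss the lookup and then collide on the unique index, which is the error callers must tolerate under the weaker lock:

```c
#include <assert.h>

/* Toy in-memory "catalog" protected by a (simulated) unique index. */
#define CATALOG_MAX 16

static int	catalog[CATALOG_MAX];
static int	catalog_len = 0;

/* Returns 1 if key already present, else 0. */
static int
lookup(int key)
{
	for (int i = 0; i < catalog_len; i++)
		if (catalog[i] == key)
			return 1;
	return 0;
}

/* Returns 0 on success, -1 on unique-index violation. */
static int
insert_row(int key)
{
	if (lookup(key))
		return -1;
	catalog[catalog_len++] = key;
	return 0;
}
```

With ShareRowExclusiveLock the lookup and insert were serialized; with RowExclusiveLock two backends can interleave between the lookup and the insert, so the loser sees the unique violation instead.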

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [bug fix] Savepoint-related statements terminates connection

2017-03-31 Thread Michael Paquier
On Sat, Apr 1, 2017 at 1:06 AM, Alvaro Herrera  wrote:
> Ashutosh Bapat wrote:
>> Please add this to the next commitfest.
>
> If this cannot be reproduced in 9.6, then it must be added to the
> Open Items wiki page instead.

The reported behavior can be reproduced on versions further back (I just
tried 9.3 and gave up going older than that). Like Tsunakawa-san, I am surprised to see that an
elog() message is exposed to the user.
-- 
Michael




Re: [HACKERS] WIP: Covering + unique indexes.

2017-03-31 Thread Peter Geoghegan
I had a quick look at this on the flight back from PGConf.US.

On Fri, Mar 31, 2017 at 10:40 AM, Anastasia Lubennikova
 wrote:
> But I haven't considered the possibility of index_form_tuple() failure.
> Fixed in this version of the patch. Now it creates a copy of tupledesc to
> pass it to index_form_tuple.

I think that we need to be 100% sure that index_truncate_tuple() will
not generate an IndexTuple that is larger than the original.
Otherwise, you could violate the "1/3 of page size exceeded" thing. We
need to catch that when the user actually inserts an oversized value.
After that, it's always too late. (See my remarks to Tom on other
thread about this, too.)

> We'd discussed with other reviewers, they suggested index_truncate_tuple()
> instead of index_reform_tuple().
> I think that this name reflects the essence of the function clear enough and
> don't feel like renaming it again.

+1.

Feedback so far:

* index_truncate_tuple() should have as an argument the number of
attributes. No need to "#include utils/rel.h" that way.

* I think that we should store this (the number of attributes), and
use it directly when comparing, per my remarks to Tom over on that
other thread. We should also use the free bit within
IndexTupleData.t_info, to indicate that the IndexTuple was truncated,
just to make it clear to everyone that might care that that's how
these truncated IndexTuples need to be represented.

Doing this would have no real impact on your patch, because for you
this will be 100% redundant. It will help external tools, and perhaps
another, more general suffix truncation patch that comes in the
future. We should try very hard to have a future-proof on-disk
representation. I think that this is quite possible.

* I suggest adding a "can't happen" defensive check + error that
checks that the tuple returned by index_truncate_tuple() is sized <=
the original. This cannot be allowed to ever happen. (Not that I think
it will.)
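The suggested defensive check reduces to a size comparison after truncation. A minimal sketch follows; FakeIndexTuple and check_truncated_size are illustrative stand-ins for IndexTupleData and IndexTupleSize(), which pack the size into t_info:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified tuple header carrying just the size. */
typedef struct
{
	size_t		size;
} FakeIndexTuple;

/*
 * "Can't happen" check after index_truncate_tuple(): the truncated
 * tuple must never be larger than the original, or the 1/3-of-page
 * limit could be violated after the fact.
 */
static int
check_truncated_size(const FakeIndexTuple *orig,
					 const FakeIndexTuple *truncated)
{
	if (truncated->size > orig->size)
		return -1;				/* would be elog(ERROR, ...) in the backend */
	return 0;
}
```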

* I see a small bug. You forgot to teach _bt_findsplitloc() about
truncation. It does this currently, which you did not update:

/*
 * The first item on the right page becomes the high key of the left page;
 * therefore it counts against left space as well as right space.
 */
leftfree -= firstrightitemsz;

I think that this accounting needs to be fixed.

* Not sure about one thing. What's the reason for this change?

> -   /* Log left page */
> -   if (!isleaf)
> -   {
> -   /*
> -* We must also log the left page's high key, because the right
> -* page's leftmost key is suppressed on non-leaf levels.  Show it
> -* as belonging to the left page buffer, so that it is not stored
> -* if XLogInsert decides it needs a full-page image of the left
> -* page.
> -*/
> -   itemid = PageGetItemId(origpage, P_HIKEY);
> -   item = (IndexTuple) PageGetItem(origpage, itemid);
> -   XLogRegisterBufData(0, (char *) item, 
> MAXALIGN(IndexTupleSize(item)));
> -   }
> +   /*
> +* We must also log the left page's high key, because the right
> +* page's leftmost key is suppressed on non-leaf levels.  Show it
> +* as belonging to the left page buffer, so that it is not stored
> +* if XLogInsert decides it needs a full-page image of the left
> +* page.
> +*/
> +   itemid = PageGetItemId(origpage, P_HIKEY);
> +   item = (IndexTuple) PageGetItem(origpage, itemid);
> +   XLogRegisterBufData(0, (char *) item, MAXALIGN(IndexTupleSize(item)));

Is this related to the problem that you mentioned to me that you'd
fixed when we spoke in person earlier today? You said something about
WAL logging, but I don't recall any details. I don't remember seeing
this change in prior versions.

Anyway, whatever the reason for doing this on the leaf level now, the
comments should be updated to explain it.

* Speaking of WAL-logging, I think that there is another bug in
btree_xlog_split(). You didn't change this existing code at all:

/*
 * On leaf level, the high key of the left page is equal to the first key
 * on the right page.
 */
if (isleaf)
{
ItemId  hiItemId = PageGetItemId(rpage, P_FIRSTDATAKEY(ropaque));

left_hikey = PageGetItem(rpage, hiItemId);
left_hikeysz = ItemIdGetLength(hiItemId);
}

It seems like this was missed when you changed WAL-logging, since you
do something for this on the logging side, but not here, on the replay
side. No?

That's all I have for now. Maybe I can look again later, or tomorrow.

-- 
Peter Geoghegan




Re: [HACKERS] Somebody has not thought through subscription locking considerations

2017-03-31 Thread Tom Lane
Petr Jelinek  writes:
> But the pg_subscription_rel is also not accessed on heap_open, the
> problematic code is called from heap_drop_with_catalog. And VACUUM FULL
> pg_class calls heap_drop_with_catalog() when doing the heap swap (and it
> goes through performDeletion so through dependency info which is what I
> mean by everything else does this).

Hmmm.  What the heap_drop_with_catalog call is being applied to is a
short-lived table that is not pg_class.  It happens to own the disk
storage that used to belong to pg_class, but it is not pg_class;
in particular it doesn't have the same OID, and it is not what would
be looked in if someone happened to need to fetch a pg_class row
at that point.  So there's no circularity involved.

However ... by that same token, it ought to be okay to consult
pg_subscription_rel during that drop step.  Indeed, if we put
an IsCatalogRelation check in there, it wouldn't fire, because
the target table has OID 16409 according to your stack trace.
So that line of thought is indeed not very helpful.

On further reflection it seems like you were right, the problem is
taking a self-exclusive lock on pg_subscription_rel during low-level
catalog operations.  We're going to have to find a way to reduce that
lock strength, or we're going to have a lot of deadlock problems.

regards, tom lane




Re: [HACKERS] postgres_fdw IMPORT SCHEMA and partitioned tables

2017-03-31 Thread Michael Paquier
On Sat, Apr 1, 2017 at 4:55 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Fri, Mar 31, 2017 at 3:31 PM, Andres Freund  wrote:
>>> Hm. Wonder if something like that shouldn't be backpatched - because
>>> otherwise using postgres_fdw from an old server against a newer one will
>>> do weird stuff.  I don't know what kind of policy we've committed to
>>> with postgresImportForeignSchema...
>
>> I don't think I'd like to promise that postgres_fdw will always be
>> forward-compatible.  Backward-compatibility is hard enough already.

Thanks for the commit.

> Unless I'm missing something, the behavior will be that an older
> version will simply ignore remote partitioned tables (they will not
> pass the relkind filter in the query).  Seems pretty fail-soft,
> so I think it's fine.

Yeah, I would suggest revisiting that if we get actual complaints, but
I would not push much in favor of it. It's not as if nothing could be
done to improve the user experience.
-- 
Michael




Re: [HACKERS] Table collision in join.sql and aggregates.sql

2017-03-31 Thread Tom Lane
Douglas Doole  writes:
> As we've been merging our code with 9.6, a couple developers have had
> one-off failures in the join.sql and aggregates.sql test because the tables
> T1, T2 and T3 have the wrong definitions.

> Digging into it, I found that both files create the tables T1, T2, and T3
> for a short period of time and then drop them. Since the parallel_schedule
> file puts these two files into the same group, they can run concurrently.
> If it happens that the two files hit the T1, T2, T3 tests at the same
> time, then you see the table definition problems.

Hmmm ... that would indeed be a problem, except that aggregate.sql's
tables are temp tables, which should mean that they are in a schema
different from the one that join.sql is putting its tables in.  Are you
sure you've identified the issue correctly?  Because if this doesn't
work, there are an awful lot of similar hazards elsewhere in the
regression tests.

regards, tom lane




Re: [HACKERS] Somebody has not thought through subscription locking considerations

2017-03-31 Thread Petr Jelinek
On 01/04/17 00:52, Tom Lane wrote:
> Petr Jelinek  writes:
>> On 31/03/17 21:00, Tom Lane wrote:
>>> Looking at dependency info isn't going to fix this, it only moves the
>>> unsafe catalog access somewhere else (ie pg_depend instead of
>>> pg_subscription_rel).  I suspect the only safe solution is doing an
>>> IsCatalogRelation or similar test pretty early in the logical replication
>>> code paths.
> 
>> I don't follow, everything else does dependency info check in this
>> situation, how is this any different?
> 
> What do you mean by "everything else"?  The key point here is that
> access to the bootstrap catalogs like pg_class and pg_attribute can't
> be dependent on accesses to other catalogs, or we get into circularities.
> We certainly aren't trying to look in pg_depend when we do a heap_open.
> 
> (Or, if you tell me that we are now because the logical replication
> patch added it, I'm going to start agitating for reverting the patch
> and sending it back for redesign.)

But the pg_subscription_rel is also not accessed on heap_open, the
problematic code is called from heap_drop_with_catalog. And VACUUM FULL
pg_class calls heap_drop_with_catalog() when doing the heap swap (and it
goes through performDeletion so through dependency info which is what I
mean by everything else does this).

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Somebody has not thought through subscription locking considerations

2017-03-31 Thread Tom Lane
Petr Jelinek  writes:
> On 31/03/17 21:00, Tom Lane wrote:
>> Looking at dependency info isn't going to fix this, it only moves the
>> unsafe catalog access somewhere else (ie pg_depend instead of
>> pg_subscription_rel).  I suspect the only safe solution is doing an
>> IsCatalogRelation or similar test pretty early in the logical replication
>> code paths.

> I don't follow, everything else does dependency info check in this
> situation, how is this any different?

What do you mean by "everything else"?  The key point here is that
access to the bootstrap catalogs like pg_class and pg_attribute can't
be dependent on accesses to other catalogs, or we get into circularities.
We certainly aren't trying to look in pg_depend when we do a heap_open.

(Or, if you tell me that we are now because the logical replication
patch added it, I'm going to start agitating for reverting the patch
and sending it back for redesign.)

regards, tom lane




[HACKERS] Table collision in join.sql and aggregates.sql

2017-03-31 Thread Douglas Doole
As we've been merging our code with 9.6, a couple developers have had
one-off failures in the join.sql and aggregates.sql test because the tables
T1, T2 and T3 have the wrong definitions.

Digging into it, I found that both files create the tables T1, T2, and T3
for a short period of time and then drop them. Since the parallel_schedule
file puts these two files into the same group, they can run concurrently.
If it happens that the two files hit the T1, T2, T3 tests at the same
time, then you see the table definition problems.

I took the easy way of solving this and renamed the tables in
aggregates.sql to AGG1, AGG2, and AGG3. (Picked on aggregates.sql since it
had the T1, T2, T3 tests added in 9.6.)

Doug
- Salesforce
*** a/src/test/regress/expected/aggregates.out
--- b/src/test/regress/expected/aggregates.out
***
*** 957,1023 ****
  LINE 1: select (select max(min(unique1)) from int8_tbl) from tenk1;
  --
  -- Test removal of redundant GROUP BY columns
  --
! create temp table t1 (a int, b int, c int, d int, primary key (a, b));
! create temp table t2 (x int, y int, z int, primary key (x, y));
! create temp table t3 (a int, b int, c int, primary key(a, b) deferrable);
  -- Non-primary-key columns can be removed from GROUP BY
! explain (costs off) select * from t1 group by a,b,c,d;
!   QUERY PLAN  
! --
   HashAggregate
 Group Key: a, b
!->  Seq Scan on t1
  (3 rows)
  
  -- No removal can happen if the complete PK is not present in GROUP BY
! explain (costs off) select a,c from t1 group by a,c,d;
!   QUERY PLAN  
! --
   HashAggregate
 Group Key: a, c, d
!->  Seq Scan on t1
  (3 rows)
  
  -- Test removal across multiple relations
  explain (costs off) select *
! from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
! group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
!   QUERY PLAN   
! ---
   Group
!Group Key: t1.a, t1.b, t2.x, t2.y
 ->  Merge Join
!  Merge Cond: ((t1.a = t2.x) AND (t1.b = t2.y))
!  ->  Index Scan using t1_pkey on t1
!  ->  Index Scan using t2_pkey on t2
  (6 rows)
  
! -- Test case where t1 can be optimized but not t2
! explain (costs off) select t1.*,t2.x,t2.z
! from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
! group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
!   QUERY PLAN   
! ---
   HashAggregate
!Group Key: t1.a, t1.b, t2.x, t2.z
 ->  Merge Join
!  Merge Cond: ((t1.a = t2.x) AND (t1.b = t2.y))
!  ->  Index Scan using t1_pkey on t1
!  ->  Index Scan using t2_pkey on t2
  (6 rows)
  
  -- Cannot optimize when PK is deferrable
! explain (costs off) select * from t3 group by a,b,c;
!   QUERY PLAN  
! --
   HashAggregate
 Group Key: a, b, c
!->  Seq Scan on t3
  (3 rows)
  
! drop table t1;
! drop table t2;
! drop table t3;
  --
  -- Test combinations of DISTINCT and/or ORDER BY
  --
--- 957,1023 ----
  --
  -- Test removal of redundant GROUP BY columns
  --
! create temp table agg1 (a int, b int, c int, d int, primary key (a, b));
! create temp table agg2 (x int, y int, z int, primary key (x, y));
! create temp table agg3 (a int, b int, c int, primary key(a, b) deferrable);
  -- Non-primary-key columns can be removed from GROUP BY
! explain (costs off) select * from agg1 group by a,b,c,d;
!QUERY PLAN   
! 
   HashAggregate
 Group Key: a, b
!->  Seq Scan on agg1
  (3 rows)
  
  -- No removal can happen if the complete PK is not present in GROUP BY
! explain (costs off) select a,c from agg1 group by a,c,d;
!QUERY PLAN   
! 
   HashAggregate
 Group Key: a, c, d
!->  Seq Scan on agg1
  (3 rows)
  
  -- Test removal across multiple relations
  explain (costs off) select *
! from agg1 inner join agg2 on agg1.a = agg2.x and agg1.b = agg2.y
! group by agg1.a,agg1.b,agg1.c,agg1.d,agg2.x,agg2.y,agg2.z;
!   QUERY PLAN   
! ---
   Group
!Group Key: agg1.a, agg1.b, agg2.x, agg2.y
 ->  Merge Join
!  Merge Cond: ((agg1.a = agg2.x) AND (agg1.b = agg2.y))
!  ->  Index Scan using agg1_pkey on agg1
!  ->  Index Scan using agg2_pkey on agg2
  (6 rows)
  
! -- Test case where agg1 can be optimized but not agg2
! explain (costs off) select agg1.*,agg2.x,agg2.z
! from agg1 inner join agg2 on agg1.a = agg2.x and agg1.b = agg2.y
! group by agg1.a,agg1.b,agg1.c,agg1.d,agg2.x,agg2.z;
!   QUERY PLAN   
! ---
   HashAggregate
!Group Key: agg1.a, agg1.b, agg2.x, agg2.z
 ->  Merge Join
!  Merge Cond: ((agg1.a = 

Re: [HACKERS] Somebody has not thought through subscription locking considerations

2017-03-31 Thread Petr Jelinek
On 31/03/17 21:00, Tom Lane wrote:
> Petr Jelinek  writes:
>> On 31/03/17 20:23, Tom Lane wrote:
>>> No, the problematic part is that there is any heap_open happening at
>>> all.  That open could very easily result in a recursive attempt to read
>>> pg_class, for example, which is going to be fatal if we're in the midst
>>> of vacuum full'ing or reindex'ing pg_class.  It's frankly astonishing
>>> to me that this patch seems to have survived testing under
>>> CLOBBER_CACHE_ALWAYS, because it's only the catalog caches that are
>>> preventing such recursive lookups.
> 
>> Hmm okay, so the solution is to either use standard dependency info for
>> this so that it's only called for tables that are actually know to be
>> subscribed or have some exceptions in the current code to call the
>> function only for user catalogs. Any preferences?
> 
> Looking at dependency info isn't going to fix this, it only moves the
> unsafe catalog access somewhere else (ie pg_depend instead of
> pg_subscription_rel).  I suspect the only safe solution is doing an
> IsCatalogRelation or similar test pretty early in the logical replication
> code paths.
> 

I don't follow, everything else does dependency info check in this
situation, how is this any different?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] LWLock optimization for multicore Power machines

2017-03-31 Thread Tom Lane
Alexander Korotkov  writes:
>> It seems that on this platform definition of atomics should be provided by
>> fallback.h.  But it doesn't because I already defined 
>> PG_HAVE_ATOMIC_U32_SUPPORT
>> in arch-ppc.h.  I think in this case we shouldn't provide ppc-specific
>> implementation of pg_atomic_fetch_mask_add_u32().  However, I don't know
>> how to do this assuming arch-ppc.h is included before compiler-specific
>> headers.  Thus, in arch-ppc.h we don't know yet if we would find
>> implementation of atomics for this platform.  One possible solution is to
>> provide assembly implementation for all atomics in arch-ppc.h.

> BTW, implementation for all atomics in arch-ppc.h would be too invasive and
> shouldn't be considered for v10.
> However, I made following workaround: declare pg_atomic_uint32 and
> pg_atomic_fetch_mask_add_u32_impl() only when we know that generic-gcc.h
> would declare gcc-based atomics.

I don't have a well-informed opinion on whether this is a reasonable thing
to do, but I imagine Andres does.

> Could you, please, check it on Apple PPC?

It does compile and pass "make check" on prairiedog.

regards, tom lane




Re: [HACKERS] Partitioning vs ON CONFLICT

2017-03-31 Thread Peter Geoghegan
On Fri, Mar 31, 2017 at 5:44 PM, Robert Haas  wrote:
> And, indeed, you could get an unique constraint or exclusion error
> because of an index on the child even though it's not global to the
> partitioning hierarchy.  So maybe we can support this after all, but
> having messed it up once, I'm inclined to think we should postpone
> this to v11, think it over some more, a

Fine by me.

-- 
Peter Geoghegan




Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-31 Thread Ashutosh Sharma
On Sat, Apr 1, 2017 at 1:08 AM, Robert Haas  wrote:
> On Mon, Mar 27, 2017 at 5:39 AM, Ashutosh Sharma  
> wrote:
>> Thanks for reporting this problem. Could you please let me know for
>> how long you ran sqlsmith to get this crash? However, I have found
>> the reason for this crash. It is basically happening when trying to
>> retrieve tuples using a cursor. The current hash index scan works
>> tuple-at-a-time, which means once it finds a tuple on a page, it
>> releases the lock on the page but keeps a pin on it and finally
>> returns the tuple. When the requested number of tuples has been
>> processed there is no lock on the page that was being scanned, but
>> there is still a pin on it. Finally, when trying to close the cursor
>> at the end of the scan, if any killed tuples have been identified we
>> first try to mark these items as dead with the help of
>> _hash_kill_items(). But, since we only have a pin on this page, the
>> assert check 'LWLockHeldByMe()' fails.
>
> Instead of adding bool haveLock to _hash_kill_items, how about just
> having the caller acquire the lock before calling the function and
> release it afterwards?  Since the acquire is at the beginning of the
> function and the release at the end, there seems to be no point in
> putting the acquire/release inside the function rather than in the
> caller.

Well, that is another option, but the main idea was to stay in line
with the btree code. Moreover, I am not sure if acquiring the lwlock
inside hashendscan (basically the place where we are trying to close
down things) would look good.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Partitioning vs ON CONFLICT

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 5:33 PM, Peter Geoghegan  wrote:
> In my opinion, for the very limited ON CONFLICT DO NOTHING + no
> inference specification case, the implementation should not care about
> the presence or absence of unique indexes within or across partitions.

Hmm.  That's an interesting point.  The documentation says:

ON CONFLICT can be used to specify an alternative action to raising a
unique constraint or exclusion constraint violation error.

And, indeed, you could get a unique constraint or exclusion error
because of an index on the child even though it's not global to the
partitioning hierarchy.  So maybe we can support this after all, but
having messed it up once, I'm inclined to think we should postpone
this to v11, think it over some more, and try to make sure that our
second try doesn't crash...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Documentation improvements for partitioning

2017-03-31 Thread Robert Haas
On Tue, Mar 28, 2017 at 6:35 AM, Amit Langote
 wrote:
> Attached updated patch.

Committed with editing here and there.  I just left out the thing
about UNION ALL views, which seemed to brief a treatment to deserve
its own subsection.  Maybe a longer explanation of that is worthwhile
or maybe it's not, but that can be a separate patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Partitioning vs ON CONFLICT

2017-03-31 Thread Peter Geoghegan
On Fri, Mar 31, 2017 at 4:47 PM, Robert Haas  wrote:
> /*
>  * Open partition indices (remember we do not support ON CONFLICT in
>  * case of partitioned tables, so we do not need support information
>  * for speculative insertion)
>  */
>
> Part of the question here is definitional.  Peter rightly pointed out
> upthread that we support INSERT .. ON CONFLICT in an inheritance
> situation, but that is different, because it infers whether there is a
> conflict in the particular child into which you are trying to insert,
> not whether there is a conflict across the whole hierarchy.

I would say that it doesn't infer anything at all, in the sense that
infer_arbiter_indexes() returns very early. It's then implied that
whatever constraint index OIDs that the executor later happens to find
will have speculative insertions. The optimizer doesn't try to predict
what that will look like within the executor, or even on a foreign
postgres_fdw server in the case of foreign tables. Foreign table
indexes are not known to the local installation, which is one reason
for this. You could INSERT ... ON CONFLICT DO NOTHING (no inference
specification) into an ordinary table with no indexes, and that also
works fine. (It's just silly.)

> More or
> less by definition, trying to insert into the root of the partitioning
> hierarchy is a different beast: it should consider uniqueness across
> the whole hierarchy in determining whether there is a conflict and, as
> Simon pointed out in the second email on the thread, we lack a
> mechanism to do that.

In my opinion, for the very limited ON CONFLICT DO NOTHING + no
inference specification case, the implementation should not care about
the presence or absence of unique indexes within or across partitions.
It might be sloppy for an application developer to do a whole lot of
this, but that's not a judgement I think we can make for them.

I don't feel strongly about this, though.

-- 
Peter Geoghegan




Re: [HACKERS] REFERENCES privilege should not be symmetric (was Re: [GENERAL] Postgres Permissions Article)

2017-03-31 Thread Tom Lane
"David G. Johnston"  writes:
> On Fri, Mar 31, 2017 at 10:40 AM, Tom Lane  wrote:
>> I think the benefit is reduction of user confusion.  Admittedly, since
>> Paul is the first person I can remember ever having complained about it,
>> maybe nobody else is confused.

> ​After going back-and-forth on this (and not being able to independently
> come to the conclusion that what we are adhering to is actually a typo) I'm
> going to toss my +1 in with Robert's.

OK, the consensus seems clearly against changing this in the back branches,
so I'll just fix it in HEAD.

regards, tom lane




Re: [HACKERS] Foreign tables don't enforce the partition constraint

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 5:11 AM, Ashutosh Bapat
 wrote:
> Per https://www.postgresql.org/docs/devel/static/sql-createforeigntable.html,
> constraints on the foreign table should represent a constraint that is
> being enforced by the remote server.

Right.  This is user error.  Having the *local* server try to enforce
the constraint would slow down the system without guaranteeing
anything, because somebody could modify the table on the remote server
directly.

> Similarly, a partition constraint
> should also be enforced at the foreign server. Probably we should
> update documentation of create foreign table to mention this.

That is a good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Partitioning vs ON CONFLICT

2017-03-31 Thread Robert Haas
On Thu, Mar 30, 2017 at 5:54 AM, Amit Langote
 wrote:
> I found out why the crash occurs, but while I was trying to fix it, I
> started growing doubtful about the way this is being handled currently.
>
> Patch to fix the crash would be to pass 'true' instead of 'false' for
> speculative when ExecSetupPartitionTupleRouting() calls ExecOpenIndices()
> on leaf partitions.  That will initialize the information needed when
> ExecInsert() wants to check for conflicts using the constraint-enforcing
> indexes.  If we do initialize the speculative insertion info (which will
> fix the crash), ExecCheckIndexConstraints() will be called on a given leaf
> partition's index to check if there is any conflict.  But since the insert
> was performed on the root table, conflicts should be checked across all
> the partitions, which won't be the case.  Even though the action is
> NOTHING, the check for conflicts still uses only that one leaf partition's
> index, which seems insufficient.
>
> Commit 8355a011a0 enabled specifying ON CONFLICT DO NOTHING when
> inserting into a partitioned root table, but given the above, I think we
> might need to reconsider it.

It seems obvious in retrospect that the commit in question was not
quite up to the mark, considering that it didn't even update the
comment here:

/*
 * Open partition indices (remember we do not support ON CONFLICT in
 * case of partitioned tables, so we do not need support information
 * for speculative insertion)
 */

Part of the question here is definitional.  Peter rightly pointed out
upthread that we support INSERT .. ON CONFLICT in an inheritance
situation, but that is different, because it infers whether there is a
conflict in the particular child into which you are trying to insert,
not whether there is a conflict across the whole hierarchy.  More or
less by definition, trying to insert into the root of the partitioning
hierarchy is a different beast: it should consider uniqueness across
the whole hierarchy in determining whether there is a conflict and, as
Simon pointed out in the second email on the thread, we lack a
mechanism to do that.  If somebody wants to consider only conflicts
within a specific partition, they can use INSERT .. ON CONFLICT with
the name of that partition, and that'll work fine; if they target the
parent, that should really apply globally to the hierarchy, which we
can't support.

So I think you (Amit) are right, and we should revert this commit.  We
can try again to make this work in a future release once we've had a
chance to think about it some more.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Partitioned tables and relfilenode

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 4:33 AM, Kyotaro HORIGUCHI
 wrote:
> I have no more comment on this. Thank you.

I committed this with a few tweaks.  I simplified the wording in the
documentation a bit, removed or adjusted a couple of comments, and
slightly changed the way the logic works in parseRelOptions in a way
that I think is simpler.

Thanks for reviewing, and thanks to Maksim as well, and thanks to Amit
for writing the patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple

2017-03-31 Thread Peter Geoghegan
On Mar 31, 2017 2:17 PM, "Peter Geoghegan"  wrote:

On Fri, Mar 31, 2017 at 2:11 PM, Tom Lane  wrote:
>> The patch does actually store truncated/key-only tuples in the high keys /
>> non-leaf nodes, which don't need the "included" columns.
>
> Hm.  Since index tuples lack any means of indicating the actual number
> of columns included (ie there's no equivalent of the natts field that
> exists in HeapTupleHeaders), I think that this is an unreasonably
> dangerous design.  It'd be better to store nulls for the missing
> fields.  That would force a null bitmap to be included whether or
> not there were nulls in the key columns, but if we're only doing it
> once per page that doesn't seem like much cost.

We're doing it once per page for the leaf page high key, but that's
used as the downlink in the second phase of a B-Tree page split --
it's directly copied. So, including a NULL bitmap would make
Anastasia's patch significantly less useful, since that now has to be
in every internal page, and might imply that you end up with less
fan-in much of the time (e.g., int4 attribute is truncated from the
end relative to leaf IndexTuple format).


BTW, what about the 1/3 of a page restriction on tuple size?

I think that truncation has to strictly guarantee that the resulting tuple
will be no larger than the original. Otherwise, you get into trouble here.
But, that's still not my main objection to using the null bitmap.
--
Peter Geoghegan
(Sent from my phone)


Re: [HACKERS] postgres_fdw IMPORT SCHEMA and partitioned tables

2017-03-31 Thread Tom Lane
Robert Haas  writes:
> On Fri, Mar 31, 2017 at 3:31 PM, Andres Freund  wrote:
>> Hm. Wonder if something like that shouldn't be backpatched - because
>> otherwise using postgres_fdw from an old server against a newer one will
>> do weird stuff.  I don't know what kind of policy we've committed to
>> with postgresImportForeignSchema...

> I don't think I'd like to promise that postgres_fdw will always be
> forward-compatible.  Backward-compatibility is hard enough already.

Unless I'm missing something, the behavior will be that an older
version will simply ignore remote partitioned tables (they will not
pass the relkind filter in the query).  Seems pretty fail-soft,
so I think it's fine.

regards, tom lane




Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple

2017-03-31 Thread Peter Geoghegan
I don't see why you'd necessarily need to know where in the index the tuple
came from under my proposal. Besides, as I already pointed out, we already
hard code minus infinity handling within internal pages during tuple
comparisons.

Anastasia's patch could modify nbtree in exactly the same way as suffix
truncation would. I see no conflict there. Quite the opposite, in fact.

--
Peter Geoghegan
(Sent from my phone)


Re: [HACKERS] [sqlsmith] Failed assertion in _hash_kill_items/MarkBufferDirtyHint

2017-03-31 Thread Robert Haas
On Mon, Mar 27, 2017 at 5:39 AM, Ashutosh Sharma  wrote:
> Thanks for reporting this problem. Could you please let me know for
> how long you ran sqlsmith to get this crash? However, I have found
> the reason for this crash. It is basically happening when trying to
> retrieve tuples using a cursor. The current hash index scan works
> tuple-at-a-time, which means once it finds a tuple on a page, it
> releases the lock on the page but keeps a pin on it and finally
> returns the tuple. When the requested number of tuples has been
> processed there is no lock on the page that was being scanned, but
> there is still a pin on it. Finally, when trying to close the cursor
> at the end of the scan, if any killed tuples have been identified we
> first try to mark these items as dead with the help of
> _hash_kill_items(). But, since we only have a pin on this page, the
> assert check 'LWLockHeldByMe()' fails.

Instead of adding bool haveLock to _hash_kill_items, how about just
having the caller acquire the lock before calling the function and
release it afterwards?  Since the acquire is at the beginning of the
function and the release at the end, there seems to be no point in
putting the acquire/release inside the function rather than in the
caller.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple

2017-03-31 Thread Tom Lane
Peter Geoghegan  writes:
> On Fri, Mar 31, 2017 at 2:32 PM, Tom Lane  wrote:
>> ... I also think we're setting up a situation where tools
>> like amcheck and pg_filedump are going to be unable to cope, because
>> figuring out what a particular tuple contains is going to require context
>> they haven't got.  It just seems way too dangerous.

> Why wouldn't they have the context?

Up to now, we have always had the property that you could decode an index
tuple given only the tuple and its relation's tupdesc (and likewise for
heap tuples).  This patch breaks that: now you need to know where in the
index the tuple came from.  That's a property that I think we will regret
losing.  No doubt we can make it work for some value of "work", but
there's going to be broken tools, and other opportunity costs down the
road.

> There is a free bit within IndexTupleData.t_info that could indicate
> that this is what happened.

Yeah, I was just looking at that, but it's the only bit we've got left.
Don't think we can use it both to deal with Anastasia's patch and your
suffix truncation idea.

regards, tom lane




Re: [HACKERS] postgres_fdw IMPORT SCHEMA and partitioned tables

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 3:31 PM, Andres Freund  wrote:
>> Committed after rewording the documentation.
>
> Hm. Wonder if something like that shouldn't be backpatched - because
> otherwise using postgres_fdw from an old server against a newer one will
> do weird stuff.  I don't know what kind of policy we've committed to
> with postgresImportForeignSchema...

I don't think I'd like to promise that postgres_fdw will always be
forward-compatible.  Backward-compatibility is hard enough already.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] postgres_fdw IMPORT SCHEMA and partitioned tables

2017-03-31 Thread Andres Freund
On 2017-03-31 15:25:19 -0400, Robert Haas wrote:
> On Fri, Mar 31, 2017 at 12:51 AM, Amit Langote
>  wrote:
> > Thanks, no more comments from my side.
> 
> Committed after rewording the documentation.

Hm. Wonder if something like that shouldn't be backpatched - because
otherwise using postgres_fdw from an old server against a newer one will
do weird stuff.  I don't know what kind of policy we've committed to
with postgresImportForeignSchema...

Greetings,

Andres Freund




Re: [HACKERS] brin autosummarization -- autovacuum "work items"

2017-03-31 Thread Robert Haas
On Tue, Mar 21, 2017 at 4:10 PM, Alvaro Herrera
 wrote:
> Well, the number of work items is currently fixed; but if you have many
> BRIN indexes, then you'd overflow (lose requests).  By using DSA I am
> making it easy to patch this afterwards so that an arbitrary number of
> requests can be recorded.

But that could also use an arbitrarily large amount of memory, and any
leaks will be cluster-lifespan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] postgres_fdw IMPORT SCHEMA and partitioned tables

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 12:51 AM, Amit Langote
 wrote:
> Thanks, no more comments from my side.

Committed after rewording the documentation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple

2017-03-31 Thread Peter Geoghegan
On Fri, Mar 31, 2017 at 3:03 PM, Tom Lane  wrote:
> The reason it works fine for heap tuples is that heap tuple headers
> explicitly record the number of attributes in the tuple.  Index
> tuples don't.

Per my previous mail, I think we can change things so that Index
tuples effectively record that in all relevant cases. i.e., in what I
called separator key index tuples -- high key tuples in leaf pages,
and all internal page index tuples (high keys and downlinks/internal
items).

We already store a minus infinity downlink on internal pages, which
doesn't bother diagnostic tools at all, despite being its own special
case without real Datum values.

-- 
Peter Geoghegan




Re: [HACKERS] [PATCH] few fts functions for jsonb

2017-03-31 Thread Oleg Bartunov
On 30 Mar 2017 23:43, "Dmitry Dolgov" <9erthali...@gmail.com> wrote:

On 31 March 2017 at 00:01, Andrew Dunstan 
wrote:
>
> I have just noticed as I was writing/testing the non-existent docs for
> this patch that it doesn't supply variants of to_tsvector that take a
> regconfig as the first argument. Is there a reason for that? Why
> should the json(b) versions be different from the text versions?

No, there is no reason, I just missed that. Here is a new version of the
patch (only the functions part)
to add those variants.


Congratulations on the patch being committed! Who will write the additional
documentation? I think we need to touch the FTS and JSON parts.


Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-31 Thread Jeff Janes
On Thu, Mar 30, 2017 at 4:13 AM, Pavan Deolasee 
wrote:

>
>
> On Thu, Mar 30, 2017 at 3:29 PM, Dilip Kumar 
> wrote:
>
>> On Wed, Mar 29, 2017 at 11:51 AM, Pavan Deolasee
>>  wrote:
>> > Thanks. I think your patch of tracking interesting attributes seems ok
>> too
>> > after the performance issue was addressed. Even though we can still
>> improve
>> > that further, at least Mithun confirmed that there is no significant
>> > regression anymore and in fact for one artificial case, patch does
>> better
>> > than even master.
>>
>> I was trying to compile these patches on latest
>> head(f90d23d0c51895e0d7db7910538e85d3d38691f0) for some testing but I
>> was not able to compile it.
>>
>> make[3]: *** [postgres.bki] Error 1
>>
>
> Looks like OID conflict to me.. Please try rebased set.
>

Broken again on OID conflicts for 3373 to 3375 from the monitoring
permissions commit 25fff40798fc4.

After bumping those, I get these compiler warnings:

heapam.c: In function 'heap_delete':
heapam.c:3298: warning: 'root_offnum' may be used uninitialized in this
function
heapam.c: In function 'heap_update':
heapam.c:4311: warning: 'root_offnum' may be used uninitialized in this
function
heapam.c:4311: note: 'root_offnum' was declared here
heapam.c:3784: warning: 'root_offnum' may be used uninitialized in this
function
heapam.c: In function 'heap_lock_tuple':
heapam.c:5087: warning: 'root_offnum' may be used uninitialized in this
function


And I get a regression test failure, attached.

Cheers,

Jeff


regression.diffs
Description: Binary data



Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple

2017-03-31 Thread Tom Lane
Robert Haas  writes:
> On Fri, Mar 31, 2017 at 2:32 PM, Tom Lane  wrote:
>> It just seems way too dangerous.

> So, we end up with heap tuples with different numbers of attributes
> all the time, whenever you add a column.  It works fine - on decoding,
> the additional columns will be treated as null (unless somebody
> commits Serge Rielau's patch, which regrettably nobody has gotten
> around to reviewing).  Why is this case different?

The reason it works fine for heap tuples is that heap tuple headers
explicitly record the number of attributes in the tuple.  Index
tuples don't.

regards, tom lane




Re: [HACKERS] [PATCH] SortSupport for macaddr type

2017-03-31 Thread Brandur Leach
Hello,

Thank you Peter for the assist here and great sleuthing in
the disassembly, and thanks Teodor for committing!

Neha pointed out (thanks as well) a couple of typos upthread
that I hadn't gotten around to fixing. I've attached a new
three-line patch to sort those out.

Brandur


On Wed, Mar 29, 2017 at 1:31 PM, Teodor Sigaev  wrote:

> Thank you, pushed
>
> Excellent! I've attached a new (and hopefully final)
>> version of the patch.
>>
>
> --
> Teodor Sigaev   E-mail: teo...@sigaev.ru
>WWW:
> http://www.sigaev.ru/
>


0001-Fix-two-typos-introduced-by-macaddr-SortSupport.patch
Description: Binary data



Re: [HACKERS] Somebody has not thought through subscription locking considerations

2017-03-31 Thread Tom Lane
Petr Jelinek  writes:
> On 31/03/17 20:23, Tom Lane wrote:
>> No, the problematic part is that there is any heap_open happening at
>> all.  That open could very easily result in a recursive attempt to read
>> pg_class, for example, which is going to be fatal if we're in the midst
>> of vacuum full'ing or reindex'ing pg_class.  It's frankly astonishing
>> to me that this patch seems to have survived testing under
>> CLOBBER_CACHE_ALWAYS, because it's only the catalog caches that are
>> preventing such recursive lookups.

> Hmm okay, so the solution is to either use standard dependency info for
> this so that it's only called for tables that are actually know to be
> subscribed or have some exceptions in the current code to call the
> function only for user catalogs. Any preferences?

Looking at dependency info isn't going to fix this, it only moves the
unsafe catalog access somewhere else (ie pg_depend instead of
pg_subscription_rel).  I suspect the only safe solution is doing an
IsCatalogRelation or similar test pretty early in the logical replication
code paths.

regards, tom lane




[HACKERS] parallel explain analyze support not exercised

2017-03-31 Thread Andres Freund
Hi,

As visible in [1], the explain analyze codepaths of parallel query aren't
exercised in the tests.  That used to be not entirely trivial if the
output was to be displayed (due to timing), but we should be able to do
that now that we have the SUMMARY option.

E.g.
SET max_parallel_workers = 0;
EXPLAIN (analyze, timing off, summary off, costs off) SELECT * FROM blarg2 
WHERE generate_series < 0;
┌───┐
│QUERY PLAN │
├───┤
│ Gather (actual rows=0 loops=1)│
│   Workers Planned: 10 │
│   Workers Launched: 0 │
│   ->  Parallel Seq Scan on blarg2 (actual rows=0 loops=1) │
│ Filter: (generate_series < 0) │
│ Rows Removed by Filter: 1000  │
└───┘

should be reproducible.  I'd suggest additionally adding one test that
throws the EXPLAIN output away, but actually enables parallelism.

Greetings,

Andres Freund

[1] 
https://coverage.postgresql.org/src/backend/executor/execParallel.c.gcov.html




Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple

2017-03-31 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> It'd be useful for FieldStore - we'd not have to error out anymore if
>> the number of columns changes (which I previously had "solved" by using
>> MaxHeapAttributeNumber sized values/nulls array).

> Ah, I see.  But in that case you're not really truncating the tuple
> --- what you want is to be allowed to pass undersized datum/nulls
> arrays and have the missing columns be taken as nulls.

No, scratch that: you can't use an "extended" heap_form_tuple to solve
that problem, because it's not only the form-tuple end but the
deform-tuple end that needs fullsize arrays.  Otherwise, we'd be losing
trailing-column values in the deform-and-reform cycle.  So I still
don't see that there's a valid application for this feature.

regards, tom lane




Re: [HACKERS] brin autosummarization -- autovacuum "work items"

2017-03-31 Thread Alvaro Herrera
Here's a version of this patch which I consider final.

Thanks for your tips on using DSA.  No crashes now.

I am confused about not needing dsa_attach the second time around.  If I
do that, the dsa handle has been 0x7f'd, which I don't understand since
it is supposed to be allocated in TopMemoryContext.  I didn't dig too
deep to try and find what is causing that behavior.  Once we do, it's
easy to remove the dsa_detach/dsa_attach calls.

I added a new SQL-callable function to invoke summarization of an
individual page range.  That is what I wanted to do in vacuum (rather
than a scan of the complete index), and it seems independently useful.

I also removed the behavior that on index creation the final partial
block range is always summarized.  It's pointless.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/brin.sgml b/doc/src/sgml/brin.sgml
index 5bf11dc..5140a38 100644
--- a/doc/src/sgml/brin.sgml
+++ b/doc/src/sgml/brin.sgml
@@ -74,9 +74,14 @@
tuple; those tuples remain unsummarized until a summarization run is
invoked later, creating initial summaries.
This process can be invoked manually using the
-   brin_summarize_new_values(regclass) function,
-   or automatically when VACUUM processes the table.
+   brin_summarize_range(regclass, bigint) or
+   brin_summarize_new_values(regclass) functions;
+   automatically when VACUUM processes the table;
+   or by automatic summarization executed by autovacuum, as insertions
+   occur.  (This last trigger is disabled by default and can be enabled
+   with the autosummarize parameter.)
   
+
  
 
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6887eab..25c18d1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -19685,6 +19685,13 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
   
   

+brin_summarize_range(index 
regclass, blockNumber 
bigint)
+   
+   integer
+   summarize the page range covering the given block, if not 
already summarized
+  
+  
+   
 gin_clean_pending_list(index 
regclass)

bigint
@@ -19700,7 +19707,8 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 that are not currently summarized by the index; for any such range
 it creates a new summary index tuple by scanning the table pages.
 It returns the number of new page range summaries that were inserted
-into the index.
+into the index.  brin_summarize_range does the same, except
+it only summarizes the range that covers the given block number.

 

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 7163b03..83ee7d3 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -382,7 +382,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS 
] 
 

-BRIN indexes accept a different parameter:
+BRIN indexes accept different parameters:

 

@@ -396,6 +396,16 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS 
] 
 

+
+   
+autosummarize
+
+
+ Defines whether a summarization run is invoked for the previous page
+ range whenever an insertion is detected on the next one.
+
+
+   

   
 
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index b22563b..707d04e 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -26,6 +26,7 @@
 #include "catalog/pg_am.h"
 #include "miscadmin.h"
 #include "pgstat.h"
+#include "postmaster/autovacuum.h"
 #include "storage/bufmgr.h"
 #include "storage/freespace.h"
 #include "utils/builtins.h"
@@ -60,10 +61,12 @@ typedef struct BrinOpaque
BrinDesc   *bo_bdesc;
 } BrinOpaque;
 
+#define BRIN_ALL_BLOCKRANGES   InvalidBlockNumber
+
 static BrinBuildState *initialize_brin_buildstate(Relation idxRel,
   BrinRevmap *revmap, 
BlockNumber pagesPerRange);
 static void terminate_brin_buildstate(BrinBuildState *state);
-static void brinsummarize(Relation index, Relation heapRel,
+static void brinsummarize(Relation index, Relation heapRel, BlockNumber 
pageRange,
  double *numSummarized, double *numExisting);
 static void form_and_insert_tuple(BrinBuildState *state);
 static void union_tuples(BrinDesc *bdesc, BrinMemTuple *a,
@@ -126,8 +129,11 @@ brinhandler(PG_FUNCTION_ARGS)
  * with those of the new tuple.  If the tuple values are not consistent with
  * the summary tuple, we need to update the index tuple.
  *
+ * If autosummarization is enabled, check if we need to summarize the previous
+ * page range.
+ *
  * If the range is not currently summarized (i.e. the revmap returns NULL for
- * it), there's nothing to do.
+ * it), there's nothing to do for this tuple.
  */
 bool
 

Re: [HACKERS] bumping HASH_VERSION to 3

2017-03-31 Thread Tom Lane
Robert Haas  writes:
> Forcing a reindex in v10 kills three birds
> with one stone:

> - No old, not logged, possibly corrupt hash indexes floating around
> after an upgrade to v10.
> - Can remove the backward-compatibility code added by
> 293e24e507838733aba4748b514536af2d39d7f2 instead of keeping it around
> forever.
> - No need to worry about doing an in-place upgrade of the metapage for
> the above-mentioned patch.

> Thoughts?

+1, as long as we're clear on what will happen when pg_upgrade'ing
an installation containing hash indexes.  I think a minimum requirement is
that it succeed and be able to start up, and allow the user to manually
REINDEX such indexes afterwards.  Bonus points for:

1. teaching pg_upgrade to create a script containing the required REINDEX
commands.  (I think it's produced scripts for similar requirements in the
past.)

2. marking the index invalid so that the system would silently ignore it
until it's been reindexed.  I think there might be adequate infrastructure
for that already thanks to REINDEX CONCURRENTLY, and it'd just be a matter
of getting pg_upgrade to hack the indexes' catalog state.  (If not, it's
probably not worth the trouble.)

A variant on that might just be to not transfer over hash indexes,
leaving the user to CREATE them rather than REINDEX them.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] parallel bitmapscan isn't exercised in regression tests

2017-03-31 Thread Andres Freund
Hi,

The parallel code-path isn't actually exercised in the tests added in
[1], as evidenced by [2] (they just explain).  That imo needs to be
fixed.

Greetings,

Andres Freund

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f35742ccb7aa53ee3ed8416bbb378b0c3eeb6bb9
[2] 
https://coverage.postgresql.org/src/backend/executor/nodeBitmapHeapscan.c.gcov.html




Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple

2017-03-31 Thread Peter Geoghegan
On Fri, Mar 31, 2017 at 2:32 PM, Tom Lane  wrote:
> I think you are failing to get the point.  I am not on about whether
> we need a few bytes more or less, I am on about whether the patch
> even works.  As an example, it's quite unclear what the width of
> such an index tuple's nulls bitmap will be if it exists, and there
> certainly will be cases where it must exist because of nulls in the
> keys columns.  I also think we're setting up a situation where tools
> like amcheck and pg_filedump are going to be unable to cope, because
> figuring out what a particular tuple contains is going to require context
> they haven't got.  It just seems way too dangerous.

Why wouldn't they have the context? I think that we can use the page
offset for internal items to indicate the number of attributes that
were truncated in each case. That field is currently unused in all
relevant cases (for "separator" IndexTuples). This wouldn't be the
first time we put a magic value into an item pointer offset. That
detail would be redundant for Anastasia's patch, but we can imagine a
future in which that isn't the case.

There is a free bit within IndexTupleData.t_info that could indicate
that this is what happened. I am not going to comment on how dangerous
any of this may or may not be just yet, but FWIW I think it would be
worth using that bit to allow suffix truncation to work. Suffix
truncation is a widely used technique, that Anastasia's patch happens
to be a special case of. It's a technique that is almost as old as
B-Trees themselves.

The use of a dedicated bit probably wouldn't be necessary, but perhaps
it makes things safer for the patch.

-- 
Peter Geoghegan




Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-03-31 Thread Mike Palmiotto
On Fri, Mar 31, 2017 at 2:14 PM, Mike Palmiotto
 wrote:
> On Mon, Mar 27, 2017 at 12:09 PM, Mike Palmiotto
>  wrote:
>> On Mon, Mar 27, 2017 at 11:46 AM, Robert Haas  wrote:
>>> 
>>>  Note that sepgsql hasn't been updated to work with RLS yet, either,
>>> but we didn't regard that as an open item for RLS, or if we did the
>>> resolution was just to document it.  I am not opposed to giving a
>>> little more time to get this straightened out, but if a patch doesn't
>>> show up fairly soon then I think we should just document that sepgsql
>>> doesn't support partitioned tables in v10.  sepgsql has a fairly
>>> lengthy list of implementation restrictions already, so one more is
>>> not going to kill anybody -- or if it will then that person should
>>> produce a patch soon.
>>
>> Okay, I'll make sure I get something fleshed out today or tomorrow.
>
> Apologies for the delay. I was waffling over whether to reference
> PartitionedRelationId in sepgsql, but ended up deciding to just handle
> RELKIND_PARTITIONED_TABLE and treat the classOid as
> RelationRelationId. Seeing as there is a relid in pg_class which
> corresponds to the partitioned table, this chosen route seemed
> acceptable.

Newest patches remove cruft from said waffling. No need to include
pg_partitioned_table.h if we're not referencing PartitionedRelationId.

>
> Here is a demonstration of the partitioned table working with sepgsql hooks:
> https://gist.github.com/anonymous/b10f476a95ae9cdd39b83ef872d4b1e6
>
> Attached you will find two patches, which were rebased on master as of
> 156d388 (applied with `git am --revert [patch file]`). The first gets
> rid of some pesky compiler warnings and the second implements the
> sepgsql handling of partitioned tables.

That should have read `git am --reject [patch file]`. Apologies for
the inaccuracy.

Thanks,

-- 
Mike Palmiotto
Software Engineer
Crunchy Data Solutions
https://crunchydata.com
From 82ff9a4e18d3baefa5530b8515e46cf8225519de Mon Sep 17 00:00:00 2001
From: Mike Palmiotto 
Date: Tue, 28 Mar 2017 16:44:54 +
Subject: [PATCH 1/2] Silence some sepgsql compiler warnings

selinux/label.h includes stdbool.h, which redefines the bool type and results in
a warning: assignment from incompatible pointer type for sepgsql_fmgr_hook. Move
selinux/label.h above postgres.h, so the bool type is properly defined.

Additionally, sepgsql throws compiler warnings due to possibly uninitialized
tclass in code paths for indexes. Set tclass to a bogus -1 to silence these
warnings.
---
 contrib/sepgsql/label.c| 4 ++--
 contrib/sepgsql/relation.c | 8 ++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c
index 1a8f884..a404cd7 100644
--- a/contrib/sepgsql/label.c
+++ b/contrib/sepgsql/label.c
@@ -8,6 +8,8 @@
  *
  * -
  */
+#include 
+
 #include "postgres.h"
 
 #include "access/heapam.h"
@@ -37,8 +39,6 @@
 
 #include "sepgsql.h"
 
-#include 
-
 /*
  * Saved hook entries (if stacked)
  */
diff --git a/contrib/sepgsql/relation.c b/contrib/sepgsql/relation.c
index ab98a9b..fd6fe8b 100644
--- a/contrib/sepgsql/relation.c
+++ b/contrib/sepgsql/relation.c
@@ -300,8 +300,10 @@ sepgsql_relation_post_create(Oid relOid)
 			tclass = SEPG_CLASS_DB_VIEW;
 			break;
 		case RELKIND_INDEX:
-			/* deal with indexes specially; no need for tclass */
+			/* other indexes are handled specially below; set tclass to -1 to
+			 * silence compiler warning */
 			sepgsql_index_modify(relOid);
+			tclass = -1;
 			goto out;
 		default:
 			/* ignore other relkinds */
@@ -432,7 +434,9 @@ sepgsql_relation_drop(Oid relOid)
 			/* ignore indexes on toast tables */
 			if (get_rel_namespace(relOid) == PG_TOAST_NAMESPACE)
 return;
-			/* other indexes are handled specially below; no need for tclass */
+			/* other indexes are handled specially below; set tclass to -1 to
+			 * silence compiler warning */
+			tclass = -1;
 			break;
 		default:
 			/* ignore other relkinds */
-- 
2.7.4

From b3a44aa37b5724ea88fb0d1110ba18be6618e283 Mon Sep 17 00:00:00 2001
From: Mike Palmiotto 
Date: Wed, 29 Mar 2017 14:59:37 +
Subject: [PATCH 2/2] Add partitioned table support to sepgsql

Account for RELKIND_PARTITIONED_RELATIONS in sepgsql and treat the objects like
regular relations. This allows for proper create/alter/drop hook behavior for
partitioned tables.
---
 contrib/sepgsql/label.c|  3 ++-
 contrib/sepgsql/relation.c | 33 +++--
 2 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c
index a404cd7..88a04a4 100644
--- a/contrib/sepgsql/label.c
+++ b/contrib/sepgsql/label.c
@@ -779,7 +779,8 @@ exec_object_restorecon(struct selabel_handle * sehnd, Oid catalogId)
 			case 

Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 2:32 PM, Tom Lane  wrote:
> Peter Geoghegan  writes:
>> On Fri, Mar 31, 2017 at 2:11 PM, Tom Lane  wrote:
>>> Hm.  Since index tuples lack any means of indicating the actual number
>>> of columns included (ie there's no equivalent of the natts field that
>>> exists in HeapTupleHeaders), I think that this is an unreasonably
>>> dangerous design.  It'd be better to store nulls for the missing
>>> fields.  That would force a null bitmap to be included whether or
>>> not there were nulls in the key columns, but if we're only doing it
>>> once per page that doesn't seem like much cost.
>
>> We're doing it once per page for the leaf page high key, but that's
>> used as the downlink in the second phase of a B-Tree page split --
>> it's directly copied. So, including a NULL bitmap would make
>> Anastasia's patch significantly less useful,
>
> I think you are failing to get the point.  I am not on about whether
> we need a few bytes more or less, I am on about whether the patch
> even works.  As an example, it's quite unclear what the width of
> such an index tuple's nulls bitmap will be if it exists, and there
> certainly will be cases where it must exist because of nulls in the
> keys columns.  I also think we're setting up a situation where tools
> like amcheck and pg_filedump are going to be unable to cope, because
> figuring out what a particular tuple contains is going to require context
> they haven't got.  It just seems way too dangerous.

So, we end up with heap tuples with different numbers of attributes
all the time, whenever you add a column.  It works fine - on decoding,
the additional columns will be treated as null (unless somebody
commits Serge Rielau's patch, which regrettably nobody has gotten
around to reviewing).  Why is this case different?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Somebody has not thought through subscription locking considerations

2017-03-31 Thread Petr Jelinek
On 31/03/17 20:23, Tom Lane wrote:
> Petr Jelinek  writes:
>> On 31/03/17 19:35, Tom Lane wrote:
>>> At the very least, it would be a good idea to exclude the system
>>> catalogs from logical replication, wouldn't it?
> 
>> They are excluded.
> 
> Well, the exclusion is apparently not checked early enough.  I would say
> that touches of system catalogs should never result in any attempts to
> access the logical relation infrastructure, but what we have here is
> such an access.
> 
>> The problematic part is that the pg_subscription_rel manipulation
>> functions acquire ShareRowExclusiveLock and not the usual
>> RowExclusiveLock because there were some worries about concurrency.
> 
> No, the problematic part is that there is any heap_open happening at
> all.  That open could very easily result in a recursive attempt to read
> pg_class, for example, which is going to be fatal if we're in the midst
> of vacuum full'ing or reindex'ing pg_class.  It's frankly astonishing
> to me that this patch seems to have survived testing under
> CLOBBER_CACHE_ALWAYS, because it's only the catalog caches that are
> preventing such recursive lookups.
> 

Hmm okay, so the solution is to either use standard dependency info for
this so that it's only called for tables that are actually known to be
subscribed, or have some exceptions in the current code to call the
function only for user catalogs. Any preferences?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WIP: [[Parallel] Shared] Hash

2017-03-31 Thread Andres Freund
Hi Thomas,

On 2017-03-31 17:53:12 +1300, Thomas Munro wrote:
> Thanks very much to Rafia for testing, and to Andres for his copious
> review feedback.  Here's a new version.  Changes:

I've not looked at that aspect, but one thing I think would be good is
to first add patch that increases coverage of nodeHash[join].c to nearly
100%.  There's currently significant bits of nodeHash.c that aren't
covered (skew optimization, large tuples).

https://coverage.postgresql.org/src/backend/executor/nodeHash.c.gcov.html
https://coverage.postgresql.org/src/backend/executor/nodeHashjoin.c.gcov.html

- Andres




Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple

2017-03-31 Thread Tom Lane
Peter Geoghegan  writes:
> On Fri, Mar 31, 2017 at 2:11 PM, Tom Lane  wrote:
>> Hm.  Since index tuples lack any means of indicating the actual number
>> of columns included (ie there's no equivalent of the natts field that
>> exists in HeapTupleHeaders), I think that this is an unreasonably
>> dangerous design.  It'd be better to store nulls for the missing
>> fields.  That would force a null bitmap to be included whether or
>> not there were nulls in the key columns, but if we're only doing it
>> once per page that doesn't seem like much cost.

> We're doing it once per page for the leaf page high key, but that's
> used as the downlink in the second phase of a B-Tree page split --
> it's directly copied. So, including a NULL bitmap would make
> Anastasia's patch significantly less useful,

I think you are failing to get the point.  I am not on about whether
we need a few bytes more or less, I am on about whether the patch
even works.  As an example, it's quite unclear what the width of
such an index tuple's nulls bitmap will be if it exists, and there
certainly will be cases where it must exist because of nulls in the
keys columns.  I also think we're setting up a situation where tools
like amcheck and pg_filedump are going to be unable to cope, because
figuring out what a particular tuple contains is going to require context
they haven't got.  It just seems way too dangerous.

regards, tom lane




Re: [HACKERS] delta relations in AFTER triggers

2017-03-31 Thread David Fetter
On Fri, Mar 31, 2017 at 12:20:51PM -0500, Kevin Grittner wrote:
> On Thu, Mar 30, 2017 at 11:51 AM, Kevin Grittner  wrote:
> 
> > New version attached.  It needs some of these problem cases added
> > to the testing, and a mention in the docs that only C and plpgsql
> > triggers can use the feature so far.  I'll add those tomorrow.
> 
> Done and attached.
> 
> Now the question is, should it be pushed?

Yes.  Among other things, that'll get it buildfarm tested and give
people interested in other PLs better visibility.

That, and I suspect that people will start using this infrastructure
for some very cool projects.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] WIP: Covering + unique indexes.

2017-03-31 Thread Anastasia Lubennikova

31.03.2017 20:57, Robert Haas:

On Fri, Mar 31, 2017 at 1:40 PM, Anastasia Lubennikova
 wrote:

First of all, I want to thank you and Robert for reviewing this patch.
Your expertise in postgres subsystems is really necessary for features like
this.
I just wonder, why don't you share your thoughts and doubts till the "last
call".

I haven't done any significant technical work other than review
patches in 14 months, and in the last several months I've often worked
10 and 12 hour days to get more review done.

I think at one level you've got a fair complaint here - it's hard to
get things committed, and this patch probably didn't get as much
attention as it deserved.  It's not so easy to know how to fix that.
I'm pretty sure "tell Andres and Robert to work harder" isn't it.



*off-topic*
No complaints from me, I understand how difficult reviewing is and
highly appreciate your work.

The problem is that not all developers are qualified enough to do a review.
I've tried to make a course about postgres internals. Something like 
"Deep dive into postgres codebase for hackers".
And it turned out to be really helpful for new developers. So, I wonder, 
maybe we could write some tips for new reviewers and testers as well.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] bumping HASH_VERSION to 3

2017-03-31 Thread Jesper Pedersen

On 03/31/2017 02:17 PM, Robert Haas wrote:

Starting a new thread about this to get more visibility.

Despite the extensive work that has been done on hash indexes this
release, we have thus far not made any change to the on-disk format
that is not nominally backward-compatible.  Commit
293e24e507838733aba4748b514536af2d39d7f2 did make a change for new
hash indexes, but included backward-compatibility code so that old
indexes would continue to work.  However, I'd like to also commit
Mithun Cy's patch to expand hash indexes more gradually -- latest
version in 
http://postgr.es/m/cad__oujd-ibxm91zcqziayftwqjxnfqgmv361v9zke83s6i...@mail.gmail.com
-- and that's not backward-compatible.

It would be possible to write code to convert the old metapage format
to the new metapage format introduced by that patch, and it wouldn't
be very hard, but I think it would be better to NOT do that, and
instead force everybody upgrading to v10 to rebuild all of their hash
indexes.   If we don't do that, then we'll never know whether
instances of hash index corruption reported against v10 or higher are
caused by defects in the new code, because there's always the chance
that the hash index could have been built on a pre-v10 version, got
corrupted because of the lack of WAL-logging, and then been brought up
to v10+ via pg_upgrade.  Forcing a reindex in v10 kills three birds
with one stone:

- No old, not logged, possibly corrupt hash indexes floating around
after an upgrade to v10.
- Can remove the backward-compatibility code added by
293e24e507838733aba4748b514536af2d39d7f2 instead of keeping it around
forever.
- No need to worry about doing an in-place upgrade of the metapage for
the above-mentioned patch.

Thoughts?



+1

Best regards,
 Jesper





Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-31 Thread Pavan Deolasee
On Fri, Mar 31, 2017 at 11:16 PM, Robert Haas  wrote:

> On Fri, Mar 31, 2017 at 10:24 AM, Pavan Deolasee
>  wrote:
> > Having worked on it for some time now, I can say that WARM uses pretty
> much
> > the same infrastructure that HOT uses for cleanup/pruning tuples from the
> > heap. So the risk of having a bug which can eat your data from the heap
> is
> > very low. Sure, it might mess up with indexes, return duplicate keys, not
> > return a row when it should have. Not saying they are not bad bugs, but
> > probably much less severe than someone removing live rows from the heap.
>
> Yes, that's true.  If there's nothing wrong with the way pruning
> works, then any other problem can be fixed by reindexing, I suppose.
>
>
Yeah, I think so.


> I'm not generally a huge fan of on-off switches for things like this,
> but I know Simon likes them.  I think the question is how much they
> really insulate us from bugs.  For the hash index patch, for example,
> the only way to really get insulation from bugs added in this release
> would be to ship both the old and the new code in separate index AMs
> (hash, hash2).  The code has been restructured so much in the process
> of doing all of this that any other form of on-off switch would be
> pretty hit-or-miss whether it actually provided any protection.
>
> Now, I am less sure about this case, but my guess is that you can't
> really have this be something that can be flipped on and off for a
> table.  Once a table has any WARM updates in it, the code that knows
> how to cope with that has to be enabled, and it'll work as well or
> poorly as it does.


That's correct. Once enabled, we will need to handle the case of two index
pointers pointing to the same root. The only way to get rid of that is
probably to do a complete rewrite/reindex, I suppose. But I was mostly talking
about the immutable flag at table creation time, as rightly guessed.


>   Now, I understand you to be suggesting a flag at
> table-creation time that would, maybe, be immutable after that, but
> even then - are we going to run completely unmodified 9.6 code for
> tables where that's not enabled, and only go through any of the WARM
> logic when it is enabled?  Doesn't sound likely.  The commits already
> made from this patch series certainly affect everybody, and I can't
> see us adding switches that bypass
> ce96ce60ca2293f75f36c3661e4657a3c79ffd61 for example.


I don't think I am going to claim that either. But probably only 5% of the
new code would then be involved. Which is a lot less and a lot more
manageable. Having said that, I think if we at all do this, we should only
do it based on our experiences in the beta cycle, as a last resort. Based
on my own experiences during HOT development, long running pgbench tests,
with several concurrent clients, subjected to multiple AV cycles and
periodic consistency checks, usually brings up issues related to heap
corruption. So my confidence level is relatively high on that part of the
code. That's not to suggest that there can't be any bugs.

Obviously then there are other things such as regression to some workload
or additional work required by vacuum etc. And I think we should address
them and I'm fairly certain we can do that. It may not happen immediately,
but if we provide the right knobs, maybe those who are affected can fall back
to the old behaviour or not use the new code at all while we improve things
for them. Some of these things I could have already implemented, but
without a clear understanding of whether the feature will get in or not,
it's hard to keep putting infinite efforts into the patch. All
non-committers go through that dilemma all the time, I'm sure.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Somebody has not thought through subscription locking considerations

2017-03-31 Thread Tom Lane
Petr Jelinek  writes:
> On 31/03/17 19:35, Tom Lane wrote:
>> At the very least, it would be a good idea to exclude the system
>> catalogs from logical replication, wouldn't it?

> They are excluded.

Well, the exclusion is apparently not checked early enough.  I would say
that touches of system catalogs should never result in any attempts to
access the logical relation infrastructure, but what we have here is
such an access.

> The problematic part is that the pg_subscription_rel manipulation
> functions acquire ShareRowExclusiveLock and not the usual
> RowExclusiveLock because there were some worries about concurrency.

No, the problematic part is that there is any heap_open happening at
all.  That open could very easily result in a recursive attempt to read
pg_class, for example, which is going to be fatal if we're in the midst
of vacuum full'ing or reindex'ing pg_class.  It's frankly astonishing
to me that this patch seems to have survived testing under
CLOBBER_CACHE_ALWAYS, because it's only the catalog caches that are
preventing such recursive lookups.

regards, tom lane




[HACKERS] [BUGS] BUG #14600: Passwords in user mappings leaked by psql \deu+ command

2017-03-31 Thread Feike Steenbergen
Forwarding message from pgsql-bugs for review


Attached a patch which copies the logic from commit
93a6be63a55a8cd0d73b3fa81eb6a46013a3a974.

In the current implementation we only consider privileges of the foreign
server
in determining whether or not to show the user mapping details. This patch
copies the same logic (and documentation) used in commit
93a6be63a55a8cd0d73b3fa81eb6a46013a3a974 to not always show the user mapping
options.

regards,

Feike


user_mappings_leak.patch
Description: Binary data



Re: [HACKERS] bumping HASH_VERSION to 3

2017-03-31 Thread Magnus Hagander
On Fri, Mar 31, 2017 at 8:17 PM, Robert Haas  wrote:

> Starting a new thread about this to get more visibility.
>
> Despite the extensive work that has been done on hash indexes this
> release, we have thus far not made any change to the on-disk format
> that is not nominally backward-compatible.  Commit
> 293e24e507838733aba4748b514536af2d39d7f2 did make a change for new
> hash indexes, but included backward-compatibility code so that old
> indexes would continue to work.  However, I'd like to also commit
> Mithun Cy's patch to expand hash indexes more gradually -- latest
> version in http://postgr.es/m/CAD__OujD-iBxm91ZcqziaYftWqJxnFqgMv361V9
> zke83s6i...@mail.gmail.com
> -- and that's not backward-compatible.
>
> It would be possible to write code to convert the old metapage format
> to the new metapage format introduced by that patch, and it wouldn't
> be very hard, but I think it would be better to NOT do that, and
> instead force everybody upgrading to v10 to rebuild all of their hash
> indexes.   If we don't do that, then we'll never know whether
> instances of hash index corruption reported against v10 or higher are
> caused by defects in the new code, because there's always the chance
> that the hash index could have been built on a pre-v10 version, got
> corrupted because of the lack of WAL-logging, and then been brought up
> to v10+ via pg_upgrade.  Forcing a reindex in v10 kills three birds
> with one stone:
>
> - No old, not logged, possibly corrupt hash indexes floating around
> after an upgrade to v10.
> - Can remove the backward-compatibility code added by
> 293e24e507838733aba4748b514536af2d39d7f2 instead of keeping it around
> forever.
> - No need to worry about doing an in-place upgrade of the metapage for
> the above-mentioned patch.
>
> Thoughts?
>

Given the state of hash indexes in <= 9.6, I think this is a reasonable
tradeoff. Most people won't be using them at all today. Those that do will
have to "pay" with a REINDEX on upgrade. I think the benefits definitely
outweigh the cost.

So +1 for doing it.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] delta relations in AFTER triggers

2017-03-31 Thread Kevin Grittner
On Fri, Mar 31, 2017 at 12:58 PM, Robert Haas  wrote:
> On Fri, Mar 31, 2017 at 1:50 PM, Tom Lane  wrote:

>> I would vote for calling the struct typedef EphemeralNamedRelation and
>> using the abbreviation ENR (capitalized that way, not as Enr) in related
>> function names, where that seemed sensible.  I really do *not* like
>> "Enr" as a global name.
>
> Yeah, I had the same thought about capitalization but wasn't sure if
> it was worth suggesting.  But since Tom did, +1 from me.

Will do.

-- 
Kevin Grittner




Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple

2017-03-31 Thread Peter Geoghegan
On Fri, Mar 31, 2017 at 2:11 PM, Tom Lane  wrote:
>> The patch does actually store truncated/key-only tuples in the hi keys /
>> non-leaf-nodes, which don't need the "included" columns.
>
> Hm.  Since index tuples lack any means of indicating the actual number
> of columns included (ie there's no equivalent of the natts field that
> exists in HeapTupleHeaders), I think that this is an unreasonably
> dangerous design.  It'd be better to store nulls for the missing
> fields.  That would force a null bitmap to be included whether or
> not there were nulls in the key columns, but if we're only doing it
> once per page that doesn't seem like much cost.

We're doing it once per page for the leaf page high key, but that's
used as the downlink in the second phase of a B-Tree page split --
it's directly copied. So, including a NULL bitmap would make
Anastasia's patch significantly less useful, since that now has to be
in every internal page, and might imply that you end up with less
fan-in much of the time (e.g., int4 attribute is truncated from the
end relative to leaf IndexTuple format).

I've implemented a rough prototype of suffix truncation that seems to
work well enough and keeps amcheck happy, and I base my remarks on the
experience of writing that prototype. Using the NULL bitmap this way
was the first thing I tried.
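For anyone following along, the core idea of suffix truncation can be
sketched in a few lines. This is a toy model in Python with keys as
plain tuples, not the prototype's actual C code; a real implementation
works on IndexTuples and per-column comparators:

```python
def truncate_high_key(last_left, first_right):
    """Build a high key for a page split: keep only the leading key
    columns needed to separate the last tuple on the left page from
    the first tuple on the right page; drop the rest."""
    for i, (left_col, right_col) in enumerate(zip(last_left, first_right)):
        if left_col != right_col:
            # Columns 0..i already distinguish the two pages.
            return first_right[:i + 1]
    # All compared columns are equal: no truncation is possible.
    return first_right
```

With a two-column key, splitting between ("apple", 7) and ("banana", 2)
needs only ("banana",) as the high key, which is what improves fan-in.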

-- 
Peter Geoghegan




[HACKERS] bumping HASH_VERSION to 3

2017-03-31 Thread Robert Haas
Starting a new thread about this to get more visibility.

Despite the extensive work that has been done on hash indexes this
release, we have thus far not made any change to the on-disk format
that is not nominally backward-compatible.  Commit
293e24e507838733aba4748b514536af2d39d7f2 did make a change for new
hash indexes, but included backward-compatibility code so that old
indexes would continue to work.  However, I'd like to also commit
Mithun Cy's patch to expand hash indexes more gradually -- latest
version in 
http://postgr.es/m/cad__oujd-ibxm91zcqziayftwqjxnfqgmv361v9zke83s6i...@mail.gmail.com
-- and that's not backward-compatible.

It would be possible to write code to convert the old metapage format
to the new metapage format introduced by that patch, and it wouldn't
be very hard, but I think it would be better to NOT do that, and
instead force everybody upgrading to v10 to rebuild all of their hash
indexes.   If we don't do that, then we'll never know whether
instances of hash index corruption reported against v10 or higher are
caused by defects in the new code, because there's always the chance
that the hash index could have been built on a pre-v10 version, got
corrupted because of the lack of WAL-logging, and then been brought up
to v10+ via pg_upgrade.  Forcing a reindex in v10 kills three birds
with one stone:

- No old, not logged, possibly corrupt hash indexes floating around
after an upgrade to v10.
- Can remove the backward-compatibility code added by
293e24e507838733aba4748b514536af2d39d7f2 instead of keeping it around
forever.
- No need to worry about doing an in-place upgrade of the metapage for
the above-mentioned patch.

Thoughts?
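As background for why gradual expansion is workable at all: hash
indexes place a hash value into a bucket using the classic linear
hashing rule, so buckets can be added incrementally rather than only by
doubling. A minimal Python sketch of that mapping (modelled on the
general linear-hashing algorithm, not taken from Mithun's patch):

```python
def hash_to_bucket(hashval, maxbucket, highmask, lowmask):
    """Map a hash value to a bucket under linear hashing.

    highmask covers the table size being grown toward, lowmask the
    previous power of two.  If masking with highmask lands on a bucket
    that has not been created yet (> maxbucket), fall back to lowmask.
    """
    bucket = hashval & highmask
    if bucket > maxbucket:
        bucket = bucket & lowmask
    return bucket
```

For example, with buckets 0..5 allocated (maxbucket = 5, highmask = 7,
lowmask = 3), a hash value of 6 falls back to bucket 2 because bucket 6
does not exist yet.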

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-03-31 Thread Mike Palmiotto
On Mon, Mar 27, 2017 at 12:09 PM, Mike Palmiotto
 wrote:
> On Mon, Mar 27, 2017 at 11:46 AM, Robert Haas  wrote:
>> 
>>  Note that sepgsql hasn't been updated to work with RLS yet, either,
>> but we didn't regard that as an open item for RLS, or if we did the
>> resolution was just to document it.  I am not opposed to giving a
>> little more time to get this straightened out, but if a patch doesn't
>> show up fairly soon then I think we should just document that sepgsql
>> doesn't support partitioned tables in v10.  sepgsql has a fairly
>> lengthy list of implementation restrictions already, so one more is
>> not going to kill anybody -- or if it will then that person should
>> produce a patch soon.
>
> Okay, I'll make sure I get something fleshed out today or tomorrow.

Apologies for the delay. I was waffling over whether to reference
PartitionedRelationId in sepgsql, but ended up deciding to just handle
RELKIND_PARTITIONED_TABLE and treat the classOid as
RelationRelationId. Seeing as there is a relid in pg_class which
corresponds to the partitioned table, this chosen route seemed
acceptable.

Here is a demonstration of the partitioned table working with sepgsql hooks:
https://gist.github.com/anonymous/b10f476a95ae9cdd39b83ef872d4b1e6

Attached you will find two patches, which were rebased on master as of
156d388 (applied with `git am --revert [patch file]`). The first gets
rid of some pesky compiler warnings and the second implements the
sepgsql handling of partitioned tables.

Thanks,

-- 
Mike Palmiotto
Software Engineer
Crunchy Data Solutions
https://crunchydata.com
From 06463a4545c1cd0a2740e201d06a36b78dc2da8c Mon Sep 17 00:00:00 2001
From: Mike Palmiotto 
Date: Wed, 29 Mar 2017 14:59:37 +
Subject: [PATCH 2/2] Add partitioned table support to sepgsql

Account for RELKIND_PARTITIONED_TABLE in sepgsql and treat the objects like
regular relations. This allows for proper create/alter/drop hook behavior for
partitioned tables.
---
 contrib/sepgsql/hooks.c|  1 +
 contrib/sepgsql/label.c|  4 +++-
 contrib/sepgsql/relation.c | 34 --
 3 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/contrib/sepgsql/hooks.c b/contrib/sepgsql/hooks.c
index 93cc8de..89e71e3 100644
--- a/contrib/sepgsql/hooks.c
+++ b/contrib/sepgsql/hooks.c
@@ -15,6 +15,7 @@
 #include "catalog/pg_class.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_namespace.h"
+#include "catalog/pg_partitioned_table.h"
 #include "catalog/pg_proc.h"
 #include "commands/seclabel.h"
 #include "executor/executor.h"
diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c
index a404cd7..546 100644
--- a/contrib/sepgsql/label.c
+++ b/contrib/sepgsql/label.c
@@ -23,6 +23,7 @@
 #include "catalog/pg_class.h"
 #include "catalog/pg_database.h"
 #include "catalog/pg_namespace.h"
+#include "catalog/pg_partitioned_table.h"
 #include "catalog/pg_proc.h"
 #include "commands/dbcommands.h"
 #include "commands/seclabel.h"
@@ -779,7 +780,8 @@ exec_object_restorecon(struct selabel_handle * sehnd, Oid catalogId)
 			case RelationRelationId:
 relForm = (Form_pg_class) GETSTRUCT(tuple);
 
-if (relForm->relkind == RELKIND_RELATION)
+if (relForm->relkind == RELKIND_RELATION ||
+	relForm->relkind == RELKIND_PARTITIONED_TABLE)
 	objtype = SELABEL_DB_TABLE;
 else if (relForm->relkind == RELKIND_SEQUENCE)
 	objtype = SELABEL_DB_SEQUENCE;
diff --git a/contrib/sepgsql/relation.c b/contrib/sepgsql/relation.c
index fd6fe8b..93022d4 100644
--- a/contrib/sepgsql/relation.c
+++ b/contrib/sepgsql/relation.c
@@ -19,6 +19,7 @@
 #include "catalog/pg_attribute.h"
 #include "catalog/pg_class.h"
 #include "catalog/pg_namespace.h"
+#include "catalog/pg_partitioned_table.h"
 #include "commands/seclabel.h"
 #include "lib/stringinfo.h"
 #include "utils/builtins.h"
@@ -54,12 +55,14 @@ sepgsql_attribute_post_create(Oid relOid, AttrNumber attnum)
 	ObjectAddress object;
 	Form_pg_attribute attForm;
 	StringInfoData audit_name;
+	char		relkind;
 
 	/*
 	 * Only attributes within regular relation have individual security
 	 * labels.
 	 */
-	if (get_rel_relkind(relOid) != RELKIND_RELATION)
+	relkind = get_rel_relkind(relOid);
+	if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE)
 		return;
 
 	/*
@@ -135,8 +138,10 @@ sepgsql_attribute_drop(Oid relOid, AttrNumber attnum)
 {
 	ObjectAddress object;
 	char	   *audit_name;
+	char		relkind;
 
-	if (get_rel_relkind(relOid) != RELKIND_RELATION)
+	relkind = get_rel_relkind(relOid);
+	if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE)
 		return;
 
 	/*
@@ -167,8 +172,11 @@ sepgsql_attribute_relabel(Oid relOid, AttrNumber attnum,
 {
 	ObjectAddress object;
 	char	   *audit_name;
+	char		relkind;
 
-	if (get_rel_relkind(relOid) != RELKIND_RELATION)
+	relkind = get_rel_relkind(relOid);
+
+	if (relkind != RELKIND_RELATION && relkind != 

Re: [HACKERS] Somebody has not thought through subscription locking considerations

2017-03-31 Thread Petr Jelinek
On 31/03/17 19:35, Tom Lane wrote:
> Masahiko Sawada  writes:
>> On Fri, Mar 31, 2017 at 9:53 AM, Petr Jelinek
>>  wrote:
>>> On 30/03/17 07:25, Tom Lane wrote:
 I await with interest an explanation of what "VACUUM FULL pg_class" is
 doing trying to acquire ShareRowExclusiveLock on pg_subscription_rel, not
 to mention why a DROP SEQUENCE is holding some fairly strong lock on that
 relation.
> 
>> VACUUM FULL of any table acquires ShareRowExclusiveLock on
>> pg_subscription_rel because when doDeletion removes old heap the
>> RemoveSubscriptionRel is called in heap_drop_with_catalog.
> 
> This seems entirely horrid: it *guarantees* deadlock possibilities.
> And I wonder what happens when I VACUUM FULL pg_subscription_rel
> itself.
> 
> At the very least, it would be a good idea to exclude the system
> catalogs from logical replication, wouldn't it?
> 

They are excluded. It works the same way for triggers and many other
objects, so I would not say it's horrid.

The problematic part is that the pg_subscription_rel manipulation
functions acquire ShareRowExclusiveLock rather than the usual
RowExclusiveLock, because there were some worries about concurrency. I
think, though, that it's not needed given the access patterns there.
It's only updated by CREATE SUBSCRIPTION/ALTER SUBSCRIPTION REFRESH and
by tablesync, which holds an exclusive lock on the table itself anyway.
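The concurrency question boils down to lock modes that conflict with
themselves. A toy Python model (a hand-copied slice of the conflict
table, not the real lock manager) shows why ShareRowExclusiveLock
invites deadlocks that the usual catalog-update RowExclusiveLock does
not:

```python
# True means the two lock modes conflict with each other (simplified
# slice of PostgreSQL's lock conflict matrix).
CONFLICTS = {
    ("RowExclusiveLock", "RowExclusiveLock"): False,
    ("RowExclusiveLock", "ShareRowExclusiveLock"): True,
    ("ShareRowExclusiveLock", "ShareRowExclusiveLock"): True,
}

def two_backends_can_deadlock(mode):
    """Two backends each take `mode` on a different catalog first and
    then request it on the other.  A guaranteed deadlock cycle needs
    each backend to block the other, i.e. the mode must conflict with
    itself."""
    return CONFLICTS[(mode, mode)]
```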

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple

2017-03-31 Thread Tom Lane
Robert Haas  writes:
> You might want to have a read through that patch.  I think your
> opinion would be helpful in determining how close that patch is to
> being ready to commit (same for WARM).

Well, now that we have an extra week, maybe I'll find the time.

regards, tom lane




Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple

2017-03-31 Thread Tom Lane
Andres Freund  writes:
> On 2017-03-31 13:44:52 -0400, Tom Lane wrote:
>> Andres Freund  writes:
>>> The covering indexes patch [1] really needs a version of
>>> heap_form_tuple/index_form_tuple that allows to specify the number of
>>> columns in the to-be generated tuple.

>> I was thinking about that this morning, and wondering why exactly it
>> would need such a thing.  Certainly we're not going to store such a
>> truncated tuple in the index, so I assume that the purpose here is
>> just to pass around the tuple internally for some reason.

> The patch does actually store truncated/key-only tuples in the hi keys /
> non-leaf-nodes, which don't need the "included" columns.

Hm.  Since index tuples lack any means of indicating the actual number
of columns included (ie there's no equivalent of the natts field that
exists in HeapTupleHeaders), I think that this is an unreasonably
dangerous design.  It'd be better to store nulls for the missing
fields.  That would force a null bitmap to be included whether or
not there were nulls in the key columns, but if we're only doing it
once per page that doesn't seem like much cost.
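To put rough numbers on that cost: on a typical 64-bit build an index
tuple header is 8 bytes, and flagging nulls adds a bitmap of one bit
per attribute before the MAXALIGN'd data area. A quick sketch of the
arithmetic (assuming 8-byte MAXALIGN; this is illustrative Python, not
PostgreSQL source):

```python
MAXALIGN = 8              # assumed alignment on 64-bit builds
INDEX_TUPLE_HEADER = 8    # sizeof(IndexTupleData): item pointer + t_info

def index_data_offset(natts, has_nulls):
    """Offset where tuple data starts: the fixed header, plus a null
    bitmap of one bit per attribute when any null is present, rounded
    up to a MAXALIGN boundary."""
    size = INDEX_TUPLE_HEADER
    if has_nulls:
        size += (natts + 7) // 8            # bitmap bytes
    return -(-size // MAXALIGN) * MAXALIGN  # round up to MAXALIGN
```

For up to 64 attributes the bitmap itself fits in 8 bytes, so after
alignment a nulls-carrying tuple pays 8 extra header bytes: noticeable
per tuple, small once per page.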

>> Um, I didn't notice anyplace where that would have helped, certainly
>> not on the "form tuple" side.  Tuples that don't meet their own tupdesc
>> don't seem to have wide application to me.

> It'd be useful for FieldStore - we'd not have to error out anymore if
> the number of columns changes (which I previously had "solved" by using
> MaxHeapAttributeNumber sized values/nulls array).

Ah, I see.  But in that case you're not really truncating the tuple
--- what you want is to be allowed to pass undersized datum/nulls
arrays and have the missing columns be taken as nulls.

In short, I'd be okay with an extension that allows shortened input
arrays, but I think it needs to produce fully valid tuples with nulls
in the unsupplied columns.  (In the heap case we could indeed achieve
that effect by storing the smaller natts value in the header, but not
in the index case.)
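The extension being described could be modelled like this (a Python
sketch of the values/isnull convention; the real heap_form_tuple is C
and takes a TupleDesc rather than a bare attribute count):

```python
def pad_to_tupdesc(natts, values, isnull):
    """Accept undersized values/isnull arrays and treat the unsupplied
    trailing columns as SQL NULL, producing fully valid inputs for a
    tuple with natts attributes."""
    if not (len(values) == len(isnull) <= natts):
        raise ValueError("arrays must match and not exceed natts")
    missing = natts - len(values)
    return values + [None] * missing, isnull + [True] * missing
```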

regards, tom lane




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-03-31 Thread Masahiko Sawada
On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas  wrote:
> On Wed, Mar 29, 2017 at 2:23 AM, Masahiko Sawada  
> wrote:
>> I was thinking that the status of this patch is still "Needs review"
>> because I sent latest version patch[1].
>
> I think you're right.
>
> I took a look at this today.  I think there is some problem with the
> design of this patch.  I originally proposed a threshold based on the
> percentage of not-all-visible pages on the theory that we'd just skip
> looking at the indexes altogether in that case.  But that's not what
> the patch does: it only avoids the index *cleanup*, not the index
> *vacuum*.

Hmm, I guess there is a misunderstanding on this thread (my English
might have confused you, sorry).

I've been proposing a patch that allows lazy vacuum to skip only the
index cleanup. The problem reported by xu jian[1] is that a second
vacuum freeze takes a long time even if the table has not been modified
since the first vacuum. The reason is that we cannot skip
lazy_cleanup_index even if the table has not changed at all since the
previous vacuum. If the table has garbage, lazy vacuum does
lazy_vacuum_index; on the other hand, if the table has no garbage,
meaning that only insertions were executed or the table was not
modified, lazy vacuum does lazy_cleanup_index instead. Since I knew of
this restriction, I proposed the patch this way. That's why the
proposed GUC parameter name has been "vacuum_cleanup_index_scale_factor".

> And the comments in btvacuumcleanup say this:
>
> /*
>  * If btbulkdelete was called, we need not do anything, just return the
>  * stats from the latest btbulkdelete call.  If it wasn't called, we must
>  * still do a pass over the index, to recycle any newly-recyclable pages
>  * and to obtain index statistics.
>  *
>  * Since we aren't going to actually delete any leaf items, there's no
>  * need to go through all the vacuum-cycle-ID pushups.
>  */

In the current design, we can skip lazy_cleanup_index when the number
of pages that need to be vacuumed is less than the threshold. For
example, after the whole table has been frozen by an aggressive vacuum,
a later vacuum can skip scanning the index if only a few insertions
were executed on that table. But if a transaction created garbage on
the table, this patch cannot prevent vacuuming the index, as you
mentioned.

By skipping index cleanup we could leave recyclable pages that are not
marked as recyclable. But they can be reclaimed by both the next index
vacuum and the next index cleanup, because btvacuumpage calls
RecordFreeIndexPage for recyclable pages. So I think it doesn't become
a problem.
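Put as pseudocode, the decision being proposed reads roughly as follows
(a Python sketch of my reading of the patch;
vacuum_cleanup_index_scale_factor is the GUC from this thread, the
other names are illustrative):

```python
def should_run_index_cleanup(had_dead_tuples, pages_changed,
                             total_pages, cleanup_scale_factor):
    """Decide whether lazy vacuum runs lazy_cleanup_index.

    If bulkdelete already ran (there were dead tuples), btvacuumcleanup
    just returns its stats, so there is nothing extra to skip.
    Otherwise run the cleanup pass only when enough of the table has
    changed since the last vacuum to make refreshed index statistics
    worthwhile."""
    if had_dead_tuples:
        return False  # lazy_vacuum_index already did the work
    return pages_changed > total_pages * cleanup_scale_factor
```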

And the next concern, pointed out by Peter, is that we stash an XID
when a btree page is deleted, which is used to determine when it's
finally safe to recycle the page. I prevent this situation by not
allowing lazy vacuum to skip the index cleanup during an aggressive
vacuum. But since this weakens the advantage of this patch, I'm
considering a better idea.

> So, if I'm reading this correctly, the only time this patch saves
> substantial work - at least in the case of a btree index - is in the
> case where there are no dead tuples at all.  But if we only want to
> avoid the work in that case, then a threshold based on the percentage
> of all-visible pages seems like the wrong thing, because the other
> stuff btvacuumcleanup() is doing doesn't have anything to do with the
> number of all-visible pages.

I thought that if lazy vacuum skipped almost the whole table according
to the visibility map, it means that the state of the table has changed
only a little and we can skip updating the index statistics. Am I
missing something?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




[HACKERS] merge psql ef/ev sf/sv handling functions

2017-03-31 Thread Fabien COELHO


Hello,

While reviewing Corey's \if patch, I complained that there was some amount 
of copy-paste in "psql/command.c".


Here is an attempt at merging some functions which removes 160 lines of 
code.


--
Fabien.

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 94a3cfc..edf875d 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -75,10 +75,8 @@ static backslashResult exec_command_d(PsqlScanState scan_state, bool active_bran
 			   const char *cmd);
 static backslashResult exec_command_edit(PsqlScanState scan_state, bool active_branch,
   PQExpBuffer query_buf, PQExpBuffer previous_buf);
-static backslashResult exec_command_ef(PsqlScanState scan_state, bool active_branch,
-PQExpBuffer query_buf);
-static backslashResult exec_command_ev(PsqlScanState scan_state, bool active_branch,
-PQExpBuffer query_buf);
+static backslashResult exec_command_ef_ev(PsqlScanState scan_state, bool active_branch,
+		  PQExpBuffer query_buf, bool is_func);
 static backslashResult exec_command_echo(PsqlScanState scan_state, bool active_branch,
   const char *cmd);
 static backslashResult exec_command_elif(PsqlScanState scan_state, ConditionalStack cstack,
@@ -118,10 +116,8 @@ static backslashResult exec_command_s(PsqlScanState scan_state, bool active_bran
 static backslashResult exec_command_set(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_setenv(PsqlScanState scan_state, bool active_branch,
 	const char *cmd);
-static backslashResult exec_command_sf(PsqlScanState scan_state, bool active_branch,
-const char *cmd);
-static backslashResult exec_command_sv(PsqlScanState scan_state, bool active_branch,
-const char *cmd);
+static backslashResult exec_command_sf_sv(PsqlScanState scan_state, bool active_branch,
+		  const char *cmd, bool is_func);
 static backslashResult exec_command_t(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_T(PsqlScanState scan_state, bool active_branch);
 static backslashResult exec_command_timing(PsqlScanState scan_state, bool active_branch);
@@ -322,9 +318,9 @@ exec_command(const char *cmd,
 		status = exec_command_edit(scan_state, active_branch,
    query_buf, previous_buf);
 	else if (strcmp(cmd, "ef") == 0)
-		status = exec_command_ef(scan_state, active_branch, query_buf);
+		status = exec_command_ef_ev(scan_state, active_branch, query_buf, true);
 	else if (strcmp(cmd, "ev") == 0)
-		status = exec_command_ev(scan_state, active_branch, query_buf);
+		status = exec_command_ef_ev(scan_state, active_branch, query_buf, false);
 	else if (strcmp(cmd, "echo") == 0 || strcmp(cmd, "qecho") == 0)
 		status = exec_command_echo(scan_state, active_branch, cmd);
 	else if (strcmp(cmd, "elif") == 0)
@@ -380,9 +376,9 @@ exec_command(const char *cmd,
 	else if (strcmp(cmd, "setenv") == 0)
 		status = exec_command_setenv(scan_state, active_branch, cmd);
 	else if (strcmp(cmd, "sf") == 0 || strcmp(cmd, "sf+") == 0)
-		status = exec_command_sf(scan_state, active_branch, cmd);
+		status = exec_command_sf_sv(scan_state, active_branch, cmd, true);
 	else if (strcmp(cmd, "sv") == 0 || strcmp(cmd, "sv+") == 0)
-		status = exec_command_sv(scan_state, active_branch, cmd);
+		status = exec_command_sf_sv(scan_state, active_branch, cmd, false);
 	else if (strcmp(cmd, "t") == 0)
 		status = exec_command_t(scan_state, active_branch);
 	else if (strcmp(cmd, "T") == 0)
@@ -976,28 +972,29 @@ exec_command_edit(PsqlScanState scan_state, bool active_branch,
 }
 
 /*
- * \ef -- edit the named function, or present a blank CREATE FUNCTION
- * template if no argument is given
+ * \ef/\ev -- edit the named function/view, or
+ * present a blank CREATE FUNCTION/VIEW template if no argument is given
  */
 static backslashResult
-exec_command_ef(PsqlScanState scan_state, bool active_branch,
-PQExpBuffer query_buf)
+exec_command_ef_ev(PsqlScanState scan_state, bool active_branch,
+	  PQExpBuffer query_buf, bool is_func)
 {
 	backslashResult status = PSQL_CMD_SKIP_LINE;
 
 	if (active_branch)
 	{
-		char	   *func = psql_scan_slash_option(scan_state,
-  OT_WHOLE_LINE, NULL, true);
+		char	   *stuff = psql_scan_slash_option(scan_state,
+	OT_WHOLE_LINE, NULL, true);
 		int			lineno = -1;
 
-		if (pset.sversion < 80400)
+		if (pset.sversion < (is_func ? 80400 : 70400))
 		{
 			char		sverbuf[32];
 
-			psql_error("The server (version %s) does not support editing function source.\n",
+			psql_error("The server (version %s) does not support editing %s.\n",
 	   formatPGVersionNumber(pset.sversion, false,
-			 sverbuf, sizeof(sverbuf)));
+			 sverbuf, sizeof(sverbuf)),
+	   is_func ? "function source" : "view definitions");
 			status = PSQL_CMD_ERROR;
 		}
 		else if (!query_buf)
@@ -1007,36 +1004,43 @@ exec_command_ef(PsqlScanState scan_state, bool active_branch,
 		}
 		else
 		{
-			Oid			foid = InvalidOid;
+			Oid			stuff_oid = InvalidOid;

Re: [HACKERS] WIP: Covering + unique indexes.

2017-03-31 Thread Anastasia Lubennikova

31.03.2017 20:47, Andres Freund:

Maybe it would be better to modify index_form_tuple() to accept a new
argument with a number of attributes, then you can just Assert that
this number is never higher than the number of attributes in the
TupleDesc.

Good point.
I agree that this function is a bit strange. I have to set
tupdesc->natts to support compatibility with index_form_tuple().
I didn't want to add either a new field to tupledesc or a new
parameter to index_form_tuple(), because they are used widely.


But I hadn't considered the possibility of index_form_tuple() failure.
Fixed in this version of the patch. Now it creates a copy of the
tupledesc to pass to index_form_tuple().

That seems like it'd actually be a noticeable increase in memory
allocator overhead.  I think we should just add (as just proposed in
separate thread), a _extended version of it that allows to specify the
number of columns.


The function is not called that often: only once per page split for
indexes with included columns. That doesn't look like dramatic
overhead, so I decided that a wrapper function would be more
appropriate than refactoring all the index_form_tuple() calls.

But index_form_tuple_extended() looks like a better solution.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] delta relations in AFTER triggers

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 1:50 PM, Tom Lane  wrote:
> Kevin Grittner  writes:
>> On Thu, Mar 23, 2017 at 7:14 PM, Thomas Munro
>>  wrote:
>>> my only other comment would be a bikeshed issue:
>>> Enr isn't a great name for a struct.
>
>> I know, but EphemeralNamedRelation starts to get kinda long,
>> especially when making the normal sorts of concatenations.  I
>> started making the change and balked when I saw things like
>> EphemeralNamedRelationMetadataData and a function named
>> EphemeralNamedRelationMetadataGetTupDesc() in place of
>> EnrmdGetTupDesc().  A 40 character function name makes for a lot of
>> line-wrapping to stay within pgindent limits.  Any suggestions?
>
> I would vote for calling the struct typedef EphemeralNamedRelation and
> using the abbreviation ENR (capitalized that way, not as Enr) in related
> function names, where that seemed sensible.  I really do *not* like
> "Enr" as a global name.

Yeah, I had the same thought about capitalization but wasn't sure if
it was worth suggesting.  But since Tom did, +1 from me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] REFERENCES privilege should not be symmetric (was Re: [GENERAL] Postgres Permissions Article)

2017-03-31 Thread Magnus Hagander
On Fri, Mar 31, 2017 at 7:40 PM, Tom Lane  wrote:

> Robert Haas  writes:
> > On Fri, Mar 31, 2017 at 11:29 AM, Tom Lane  wrote:
> >> The argument for not back-patching a bug fix usually boils down to
> >> fear of breaking existing applications, but it's hard to see how
> >> removal of a permission check could break a working application ---
> >> especially when the permission check is as hard to trigger as this one.
> >> How many table owners ever revoke their own REFERENCES permission?
>
> > Sure, but that argument cuts both ways.  If nobody ever does that, who
> > will be helped by back-patching this?
> > I certainly agree that back-patching this change is pretty low risk.
> > I just don't think it has any real benefits.
>
> I think the benefit is reduction of user confusion.  Admittedly, since
> Paul is the first person I can remember ever having complained about it,
> maybe nobody else is confused.
>

I think we also need to be extra careful about changing *security related*
behavior in back branches, even more so than other behavior. In this case I
think it's quite unlikely that it would hit somebody, but the risk is
there. And people generally auto-upgrade to the latest minor releases,
whereas they at least in theory read the top of the release notes when
doing a major upgrade (ok, most people probably don't, but at least some
do).

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] WIP: Covering + unique indexes.

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 1:40 PM, Anastasia Lubennikova
 wrote:
> First of all, I want to thank you and Robert for reviewing this patch.
> Your expertise in postgres subsystems is really necessary for features like
> this.
> I just wonder why you didn't share your thoughts and doubts until the
> "last call".

I haven't done any significant technical work other than review
patches in 14 months, and in the last several months I've often worked
10 and 12 hour days to get more review done.

I think at one level you've got a fair complaint here - it's hard to
get things committed, and this patch probably didn't get as much
attention as it deserved.  It's not so easy to know how to fix that.
I'm pretty sure "tell Andres and Robert to work harder" isn't it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] REFERENCES privilege should not be symmetric (was Re: [GENERAL] Postgres Permissions Article)

2017-03-31 Thread David G. Johnston
On Fri, Mar 31, 2017 at 10:40 AM, Tom Lane  wrote:

> Robert Haas  writes:
> > On Fri, Mar 31, 2017 at 11:29 AM, Tom Lane  wrote:
> >> The argument for not back-patching a bug fix usually boils down to
> >> fear of breaking existing applications, but it's hard to see how
> >> removal of a permission check could break a working application ---
> >> especially when the permission check is as hard to trigger as this one.
> >> How many table owners ever revoke their own REFERENCES permission?
>
> > Sure, but that argument cuts both ways.  If nobody ever does that, who
> > will be helped by back-patching this?
> > I certainly agree that back-patching this change is pretty low risk.
> > I just don't think it has any real benefits.
>
> I think the benefit is reduction of user confusion.  Admittedly, since
> Paul is the first person I can remember ever having complained about it,
> maybe nobody else is confused.
>

After going back-and-forth on this (and not being able to independently
come to the conclusion that what we are adhering to is actually a typo)
I'm going to toss my +1 in with Robert's.  If anyone actually complains
about the behavior and not just the documentation, we could consider
back-patching if any release before 10.0 is still under support.

There have been non-bug fix improvements to the docs that didn't get
back-patched covering topics more confusing than this.  Expecting those
learning the system to consult the most recent version of the docs is
standard fare here.  From a practical perspective the revised current docs
will be applicable for past versions as long as one doesn't go a get their
REFERENCES permission revoked somehow.  If they do, and wonder why, the
docs and these list will be able to explain it reasonably well.

David J.


Re: [HACKERS] delta relations in AFTER triggers

2017-03-31 Thread Tom Lane
Kevin Grittner  writes:
> On Thu, Mar 23, 2017 at 7:14 PM, Thomas Munro
>  wrote:
>> my only other comment would be a bikeshed issue:
>> Enr isn't a great name for a struct.

> I know, but EphemeralNamedRelation starts to get kinda long,
> especially when making the normal sorts of concatenations.  I
> started making the change and balked when I saw things like
> EphemeralNamedRelationMetadataData and a function named
> EphemeralNamedRelationMetadataGetTupDesc() in place of
> EnrmdGetTupDesc().  A 40 character function name makes for a lot of
> line-wrapping to stay within pgindent limits.  Any suggestions?

I would vote for calling the struct typedef EphemeralNamedRelation and
using the abbreviation ENR (capitalized that way, not as Enr) in related
function names, where that seemed sensible.  I really do *not* like
"Enr" as a global name.

regards, tom lane




Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple

2017-03-31 Thread Andres Freund
On 2017-03-31 13:44:52 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > The covering indexes patch [1] really needs a version of
> > heap_form_tuple/index_form_tuple that allows to specify the number of
> > columns in the to-be generated tuple.
> 
> I was thinking about that this morning, and wondering why exactly it
> would need such a thing.  Certainly we're not going to store such a
> truncated tuple in the index, so I assume that the purpose here is
> just to pass around the tuple internally for some reason.

The patch does actually store truncated/key-only tuples in the hi keys /
non-leaf-nodes, which don't need the "included" columns.


> > Previously the faster expression
> > evaluation stuff could also have benefited form the same for both
> > forming and deforming tuples.
> 
> Um, I didn't notice anyplace where that would have helped, certainly
> not on the "form tuple" side.  Tuples that don't meet their own tupdesc
> don't seem to have wide application to me.

It'd be useful for FieldStore - we'd not have to error out anymore if
the number of columns changes (which I previously had "solved" by using
MaxHeapAttributeNumber sized values/nulls array).

Greetings,

Andres Freund




Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 1:44 PM, Tom Lane  wrote:
> Andres Freund  writes:
>> The covering indexes patch [1] really needs a version of
>> heap_form_tuple/index_form_tuple that allows to specify the number of
>> columns in the to-be generated tuple.
>
> I was thinking about that this morning, and wondering why exactly it
> would need such a thing.  Certainly we're not going to store such a
> truncated tuple in the index, ...

The patch stores it in the index as a high key.

You might want to have a read through that patch.  I think your
opinion would be helpful in determining how close that patch is to
being ready to commit (same for WARM).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WIP: Covering + unique indexes.

2017-03-31 Thread Andres Freund
On 2017-03-31 20:40:59 +0300, Anastasia Lubennikova wrote:
> 30.03.2017 22:11, Andres Freund wrote:
> > Any objection from reviewers to push both patches?
> 
> First of all, I want to thank you and Robert for reviewing this patch.
> Your expertise in postgres subsystems is really necessary for features like
> this.
> I just wonder, why don't you share your thoughts and doubts till the "last
> call".

Because there's a lot of other patches?  I only looked because Teodor
announced he was thinking about committing - I just don't have the
energy to look at all patches before they're ready to commit.
Unfortunately "ready-for-committer" is very frequently not actually that
:(


> > Maybe it would be better to modify index_form_tuple() to accept a new
> > argument with a number of attributes, then you can just Assert that
> > this number is never higher than the number of attributes in the
> > TupleDesc.
> Good point.
> I agree that this function is a bit strange. I have to set
> tupdesc->nattrs to support compatibility with index_form_tuple().
> I didn't want to add neither a new field to tupledesc nor a new
> parameter to index_form_tuple(), because they are used widely.
> 
> 
> But I haven't considered the possibility of index_form_tuple() failure.
> Fixed in this version of the patch. Now it creates a copy of tupledesc to
> pass it to index_form_tuple.

That seems like it'd actually be a noticeable increase in memory
allocator overhead.  I think we should just add (as just proposed in a
separate thread) an _extended version of it that allows specifying the
number of columns.

- Andres




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 10:24 AM, Pavan Deolasee
 wrote:
> Having worked on it for some time now, I can say that WARM uses pretty much
> the same infrastructure that HOT uses for cleanup/pruning tuples from the
> heap. So the risk of having a bug which can eat your data from the heap is
> very low. Sure, it might mess up with indexes, return duplicate keys, not
> return a row when it should have. Not saying they are not bad bugs, but
> probably much less severe than someone removing live rows from the heap.

Yes, that's true.  If there's nothing wrong with the way pruning
works, then any other problem can be fixed by reindexing, I suppose.

> And we can make it a table level property, keep it off by default, turn it
> off on system tables in this release and change the defaults only when we
> get more confidence assuming people use it by explicitly turning it on. Now
> may be that's not the right approach and keeping it off by default will mean
> it receives much less testing than we would like. So we can keep it on in
> the beta cycle and then take a call. I went a good length to make it work on
> system tables because during HOT development, Tom told me that it better
> work for everything or it doesn't work at all. But with WARM it works for
> system tables and I know no known bugs, but if we don't want to risk system
> tables, we might want to turn it off (just prior to release may be).

I'm not generally a huge fan of on-off switches for things like this,
but I know Simon likes them.  I think the question is how much they
really insulate us from bugs.  For the hash index patch, for example,
the only way to really get insulation from bugs added in this release
would be to ship both the old and the new code in separate index AMs
(hash, hash2).  The code has been restructured so much in the process
of doing all of this that any other form of on-off switch would be
pretty hit-or-miss whether it actually provided any protection.

Now, I am less sure about this case, but my guess is that you can't
really have this be something that can be flipped on and off for a
table.  Once a table has any WARM updates in it, the code that knows
how to cope with that has to be enabled, and it'll work as well or
poorly as it does.  Now, I understand you to be suggesting a flag at
table-creation time that would, maybe, be immutable after that, but
even then - are we going to run completely unmodified 9.6 code for
tables where that's not enabled, and only go through any of the WARM
logic when it is enabled?  Doesn't sound likely.  The commits already
made from this patch series certainly affect everybody, and I can't
see us adding switches that bypass
ce96ce60ca2293f75f36c3661e4657a3c79ffd61 for example.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-03-31 Thread Robert Haas
On Thu, Mar 30, 2017 at 10:49 AM, Petr Jelinek
 wrote:
> While reading this thread I am thinking if we could just not do WARM on
> TOAST and compressed values if we know there might be regressions there.
> I mean I've seen the problem WARM tries to solve mostly on timestamp or
> boolean values and sometimes counters so it would still be helpful to
> quite a lot of people even if we didn't do TOAST and compressed values
> in v1. It's not like not doing WARM sometimes is somehow terrible, we'll
> just fall back to current behavior.

Good point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Allow to specify #columns in heap/index_form_tuple

2017-03-31 Thread Tom Lane
Andres Freund  writes:
> The covering indexes patch [1] really needs a version of
> heap_form_tuple/index_form_tuple that allows to specify the number of
> columns in the to-be generated tuple.

I was thinking about that this morning, and wondering why exactly it
would need such a thing.  Certainly we're not going to store such a
truncated tuple in the index, so I assume that the purpose here is
just to pass around the tuple internally for some reason.  What's wrong
with just generating the full tuple, perhaps with nulls for the extra
columns if you're concerned about memory space?

> Previously the faster expression
> evaluation stuff could also have benefited from the same for both
> forming and deforming tuples.

Um, I didn't notice anyplace where that would have helped, certainly
not on the "form tuple" side.  Tuples that don't meet their own tupdesc
don't seem to have wide application to me.

regards, tom lane




Re: [HACKERS] REFERENCES privilege should not be symmetric (was Re: [GENERAL] Postgres Permissions Article)

2017-03-31 Thread Tom Lane
Robert Haas  writes:
> On Fri, Mar 31, 2017 at 11:29 AM, Tom Lane  wrote:
>> The argument for not back-patching a bug fix usually boils down to
>> fear of breaking existing applications, but it's hard to see how
>> removal of a permission check could break a working application ---
>> especially when the permission check is as hard to trigger as this one.
>> How many table owners ever revoke their own REFERENCES permission?

> Sure, but that argument cuts both ways.  If nobody ever does that, who
> will be helped by back-patching this?
> I certainly agree that back-patching this change is pretty low risk.
> I just don't think it has any real benefits.

I think the benefit is reduction of user confusion.  Admittedly, since
Paul is the first person I can remember ever having complained about it,
maybe nobody else is confused.

regards, tom lane




Re: [HACKERS] PATCH: pageinspect / add page_checksum and bt_page_items(bytea)

2017-03-31 Thread Tomas Vondra

On 03/31/2017 06:01 PM, Ashutosh Sharma wrote:
> It seems like the latest patch (v4) shared by Tomas upthread is an
> empty patch. If I am not wrong, please share the correct patch.
> Thanks.



OMG, this is the second time I managed to generate an empty patch. I 
really need to learn not to do that ..


Attached is v5 of the patch, hopefully correct this time ;-)

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index 3d69aa9..bc2d27c 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c
@@ -41,6 +41,7 @@
 
 PG_FUNCTION_INFO_V1(bt_metap);
 PG_FUNCTION_INFO_V1(bt_page_items);
+PG_FUNCTION_INFO_V1(bt_page_items_bytea);
 PG_FUNCTION_INFO_V1(bt_page_stats);
 
 #define IS_INDEX(r) ((r)->rd_rel->relkind == RELKIND_INDEX)
@@ -253,14 +254,62 @@ struct user_args
 	OffsetNumber offset;
 };
 
+/*
+ * bt_page_print_tuples
+ *	form a tuple describing index tuple at a given offset
+ */
+static Datum
+bt_page_print_tuples(FuncCallContext *fctx, Page page, OffsetNumber offset)
+{
+	char	   *values[6];
+	HeapTuple	tuple;
+	ItemId		id;
+	IndexTuple	itup;
+	int			j;
+	int			off;
+	int			dlen;
+	char	   *dump;
+	char	   *ptr;
+
+	id = PageGetItemId(page, offset);
+
+	if (!ItemIdIsValid(id))
+		elog(ERROR, "invalid ItemId");
+
+	itup = (IndexTuple) PageGetItem(page, id);
+
+	j = 0;
+	values[j++] = psprintf("%d", offset);
+	values[j++] = psprintf("(%u,%u)",
+						   ItemPointerGetBlockNumberNoCheck(&itup->t_tid),
+						   ItemPointerGetOffsetNumberNoCheck(&itup->t_tid));
+	values[j++] = psprintf("%d", (int) IndexTupleSize(itup));
+	values[j++] = psprintf("%c", IndexTupleHasNulls(itup) ? 't' : 'f');
+	values[j++] = psprintf("%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
+
+	ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
+	dlen = IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
+	dump = palloc0(dlen * 3 + 1);
+	values[j] = dump;
+	for (off = 0; off < dlen; off++)
+	{
+		if (off > 0)
+			*dump++ = ' ';
+		sprintf(dump, "%02x", *(ptr + off) & 0xff);
+		dump += 2;
+	}
+
+	tuple = BuildTupleFromCStrings(fctx->attinmeta, values);
+
+	return HeapTupleGetDatum(tuple);
+}
+
 Datum
 bt_page_items(PG_FUNCTION_ARGS)
 {
 	text	   *relname = PG_GETARG_TEXT_PP(0);
 	uint32		blkno = PG_GETARG_UINT32(1);
 	Datum		result;
-	char	   *values[6];
-	HeapTuple	tuple;
 	FuncCallContext *fctx;
 	MemoryContext mctx;
 	struct user_args *uargs;
@@ -300,6 +349,11 @@ bt_page_items(PG_FUNCTION_ARGS)
 		if (blkno == 0)
 			elog(ERROR, "block 0 is a meta page");
 
+		if (P_ISMETA(opaque))
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("block is a meta page")));
+
 		CHECK_RELATION_BLOCK_RANGE(rel, blkno);
 
 		buffer = ReadBuffer(rel, blkno);
@@ -345,47 +399,8 @@ bt_page_items(PG_FUNCTION_ARGS)
 
 	if (fctx->call_cntr < fctx->max_calls)
 	{
-		ItemId		id;
-		IndexTuple	itup;
-		int			j;
-		int			off;
-		int			dlen;
-		char	   *dump;
-		char	   *ptr;
-
-		id = PageGetItemId(uargs->page, uargs->offset);
-
-		if (!ItemIdIsValid(id))
-			elog(ERROR, "invalid ItemId");
-
-		itup = (IndexTuple) PageGetItem(uargs->page, id);
-
-		j = 0;
-		values[j++] = psprintf("%d", uargs->offset);
-		values[j++] = psprintf("(%u,%u)",
-							   ItemPointerGetBlockNumberNoCheck(&itup->t_tid),
-							   ItemPointerGetOffsetNumberNoCheck(&itup->t_tid));
-		values[j++] = psprintf("%d", (int) IndexTupleSize(itup));
-		values[j++] = psprintf("%c", IndexTupleHasNulls(itup) ? 't' : 'f');
-		values[j++] = psprintf("%c", IndexTupleHasVarwidths(itup) ? 't' : 'f');
-
-		ptr = (char *) itup + IndexInfoFindDataOffset(itup->t_info);
-		dlen = IndexTupleSize(itup) - IndexInfoFindDataOffset(itup->t_info);
-		dump = palloc0(dlen * 3 + 1);
-		values[j] = dump;
-		for (off = 0; off < dlen; off++)
-		{
-			if (off > 0)
-				*dump++ = ' ';
-			sprintf(dump, "%02x", *(ptr + off) & 0xff);
-			dump += 2;
-		}
-
-		tuple = BuildTupleFromCStrings(fctx->attinmeta, values);
-		result = HeapTupleGetDatum(tuple);
-
-		uargs->offset = uargs->offset + 1;
-
+		result = bt_page_print_tuples(fctx, uargs->page, uargs->offset);
+		uargs->offset++;
 		SRF_RETURN_NEXT(fctx, result);
 	}
 	else
@@ -396,6 +411,90 @@ bt_page_items(PG_FUNCTION_ARGS)
 	}
 }
 
+/*---
+ * bt_page_items_bytea()
+ *
+ * Get IndexTupleData set in a btree page
+ *
+ * Usage: SELECT * FROM bt_page_items(get_raw_page('t1_pkey', 1));
+ *---
+ */
+
+Datum
+bt_page_items_bytea(PG_FUNCTION_ARGS)
+{
+	bytea	   *raw_page = PG_GETARG_BYTEA_P(0);
+	Datum		result;
+	FuncCallContext *fctx;
+	struct user_args *uargs;
+	int			raw_page_size;
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 (errmsg("must be superuser to use pageinspect functions"))));
+
+	if (SRF_IS_FIRSTCALL())
+	{
+		
