Re: PostgreSQL crashes with SIGSEGV

2018-03-28 Thread Peter Geoghegan
On Wed, Mar 28, 2018 at 10:30 AM, Tom Lane  wrote:
> It looks good to me.  The only real objection would be if someone came
> up with a test case proving that there's a significant performance
> degradation from the extra copies.  But given that these are back
> branches, it would take a pretty steep penalty for me to want to take
> the risk of refactoring to avoid that.
>
> I've pushed it with some cosmetic adjustments.

Thank you, Tom.

-- 
Peter Geoghegan



Re: PostgreSQL crashes with SIGSEGV

2018-03-28 Thread Tom Lane
Peter Geoghegan  writes:
> On Mon, Mar 26, 2018 at 5:14 AM, Kyotaro HORIGUCHI
>  wrote:
>> If no one objects, I'll mark this as Ready for Commit in a couple
>> of days.

> Thank you for the review, Horiguchi-san. It's hard to decide how
> important each goal is when coming up with a back-patchable fix like
> this. When the goals are somewhat in competition with each other, a
> second or a third opinion is particularly appreciated.

It looks good to me.  The only real objection would be if someone came
up with a test case proving that there's a significant performance
degradation from the extra copies.  But given that these are back
branches, it would take a pretty steep penalty for me to want to take
the risk of refactoring to avoid that.

I've pushed it with some cosmetic adjustments.

regards, tom lane



Re: PostgreSQL crashes with SIGSEGV

2018-03-27 Thread Peter Geoghegan
On Tue, Mar 27, 2018 at 2:23 AM, Kyotaro HORIGUCHI
 wrote:
>> > If no one objects, I'll mark this as Ready for Commit in a couple
>> > of days.
>>
>> Thank you for the review, Horiguchi-san. It's hard to decide how
>> important each goal is when coming up with a back-patchable fix like
>> this. When the goals are somewhat in competition with each other, a
>> second or a third opinion is particularly appreciated.
>
> Understood. I'll wait for the other opinions.

I wasn't specifically requesting that you not mark the patch as ready
for committer, actually. I was just expressing that your input was
valuable. Sorry for being unclear.

-- 
Peter Geoghegan



Re: PostgreSQL crashes with SIGSEGV

2018-03-27 Thread Kyotaro HORIGUCHI
At Mon, 26 Mar 2018 20:40:51 -0700, Peter Geoghegan  wrote in 

> On Mon, Mar 26, 2018 at 5:14 AM, Kyotaro HORIGUCHI
>  wrote:
> >> Attached patches do it that way. I'm happy with what I came up with,
> >> which is a lot simpler than my first approach. The extra copying seems
> >> likely to be well worth it, since it is fairly isolated in practice,
> >> especially on 9.6. There is no extra copying from v10+, since they
> >> don't need the first fix at all.
> 
> > If no one objects, I'll mark this as Ready for Commit in a couple
> > of days.
> 
> Thank you for the review, Horiguchi-san. It's hard to decide how
> important each goal is when coming up with a back-patchable fix like
> this. When the goals are somewhat in competition with each other, a
> second or a third opinion is particularly appreciated.

Understood. I'll wait for the other opinions.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: PostgreSQL crashes with SIGSEGV

2018-03-26 Thread Peter Geoghegan
On Mon, Mar 26, 2018 at 5:14 AM, Kyotaro HORIGUCHI
 wrote:
>> Attached patches do it that way. I'm happy with what I came up with,
>> which is a lot simpler than my first approach. The extra copying seems
>> likely to be well worth it, since it is fairly isolated in practice,
>> especially on 9.6. There is no extra copying from v10+, since they
>> don't need the first fix at all.

> If no one objects, I'll mark this as Ready for Commit in a couple
> of days.

Thank you for the review, Horiguchi-san. It's hard to decide how
important each goal is when coming up with a back-patchable fix like
this. When the goals are somewhat in competition with each other, a
second or a third opinion is particularly appreciated.

-- 
Peter Geoghegan



Re: PostgreSQL crashes with SIGSEGV

2018-03-26 Thread Kyotaro HORIGUCHI
At Thu, 22 Feb 2018 16:22:47 -0800, Peter Geoghegan  wrote in 

> On Mon, Feb 12, 2018 at 6:15 PM, Peter Geoghegan  wrote:
> > * For 9.5 and 9.6, the approach taken in bugfix commit d8589946d
> > should be taken even further -- we should always copy. Moreover, we
> > should always copy within tuplesort_getdatum(), for the same reasons.
> >
> > * For 9.5, 9.6, 10, and master, we should make sure that
> > tuplesort_getdatum() uses the caller's memory context. The fact that
> > it doesn't already do so seems like a simple oversight. We should do
> > this to be consistent with tuplesort_gettupleslot(). (This isn't
> > critical, but seems like a good idea.)
> >
> > * For 9.5, 9.6, 10, and master, we should adjust some comments from
> > tuplesort_getdatum() callers, so that they no longer say that
> > tuplesort datum tuple memory lives in tuplesort context. That won't be
> > true anymore.
> 
> Attached patches do it that way. I'm happy with what I came up with,
> which is a lot simpler than my first approach. The extra copying seems
> likely to be well worth it, since it is fairly isolated in practice,
> especially on 9.6. There is no extra copying from v10+, since they
> don't need the first fix at all.

FWIW I examined the case by myself. And I confirmed that the
cause is tuple freeing after tuplesort_end conducted by
shouldFree. As a principle, since it is declared that the caller
is responsible to free the result, tuplesort shouldn't free it
out of the caller's control. Even if it is not currently causig
use-after-free problem, it is also possible to happen.

>From this point of view, the patch for 9.5 and 9.6 looks fine and
actually fixes the problem.

For 10+, copying is controlled by the caller side, but only
tuplesort_getdatum() didn't make the copy in the caller
context. It is just an overlook and the fix looks reasonable.

I'm not appropriate for checking comment wording but it makes
sense for me.

If no one objects, I'll mark this as Ready for Commit in a couple
of days.

reagards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: PostgreSQL crashes with SIGSEGV

2018-02-22 Thread Peter Geoghegan
On Mon, Feb 12, 2018 at 6:15 PM, Peter Geoghegan  wrote:
> * For 9.5 and 9.6, the approach taken in bugfix commit d8589946d
> should be taken even further -- we should always copy. Moreover, we
> should always copy within tuplesort_getdatum(), for the same reasons.
>
> * For 9.5, 9.6, 10, and master, we should make sure that
> tuplesort_getdatum() uses the caller's memory context. The fact that
> it doesn't already do so seems like a simple oversight. We should do
> this to be consistent with tuplesort_gettupleslot(). (This isn't
> critical, but seems like a good idea.)
>
> * For 9.5, 9.6, 10, and master, we should adjust some comments from
> tuplesort_getdatum() callers, so that they no longer say that
> tuplesort datum tuple memory lives in tuplesort context. That won't be
> true anymore.

Attached patches do it that way. I'm happy with what I came up with,
which is a lot simpler than my first approach. The extra copying seems
likely to be well worth it, since it is fairly isolated in practice,
especially on 9.6. There is no extra copying from v10+, since they
don't need the first fix at all.

-- 
Peter Geoghegan
From fcbdb2e4fb76a75a0c56ca258f99df888b174dc3 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Thu, 14 Dec 2017 21:22:41 -0800
Subject: [PATCH] Fix double free of tuple with grouping sets/tuplesort.c.

Have tuplesort_gettupleslot() copy tuple contents for slot into caller's
memory context in all cases.  This necessitates an extra copy on 9.5 and
9.6, though only for cases where an explicit copy was not already
required (mostly just TSS_SORTEDONTAPE fetches, though all
TSS_FINALMERGE fetches on 9.5).  Do the same for tuplesort_getdatum().

This fixes a crash that can occur on at least 9.5 and 9.6, where some
grouping sets queries may call tuplesort_end() at node shutdown time,
and free tuple memory that the executor/a tuple slot imagines it is
responsible for.

In the case of all other fetch routines, continue to allocate the memory
in the tuplesort-owned context (only change comments).  This is a little
inconsistent with what we now do for tuplesort_gettupleslot() and
tuplesort_getdatum(), but that's preferable to adding new copy overhead
in the backbranches where it's clearly unnecessary.  These other fetch
routines provide the weakest possible guarantees about tuple memory
lifespan from v10 on, anyway, so this actually seems more consistent
overall.

In passing, change the memory context switch point within
tuplesort_getdatum() to matchtuplesort_gettupleslot(), so that we
actually allocate memory in the right context there (caller's context),
which seems like an independent bug.  It's not clear that this can cause
a crash, but it's still wrong.

Only the latter, less important fix is required on v10 and master,
because in v10 there is already no question of not having to make a copy
in caller's context anyway (see commits e94568ec and 3856cf96).

Author: Peter Geoghegan
Reported-By: Bernd Helmle
Analyzed-By: Peter Geoghegan, Andreas Seltenreich, Bernd Helmle
Discussion: https://www.postgresql.org/message-id/1512661638.9720.34.ca...@oopsware.de
Backpatch: 9.5-, where the issue is known to cause a crash.
---
 src/backend/utils/adt/orderedsetaggs.c | 21 +-
 src/backend/utils/sort/tuplesort.c | 74 +-
 2 files changed, 63 insertions(+), 32 deletions(-)

diff --git a/src/backend/utils/adt/orderedsetaggs.c b/src/backend/utils/adt/orderedsetaggs.c
index fe44d56..4d1e201 100644
--- a/src/backend/utils/adt/orderedsetaggs.c
+++ b/src/backend/utils/adt/orderedsetaggs.c
@@ -457,10 +457,9 @@ percentile_disc_final(PG_FUNCTION_ARGS)
 		elog(ERROR, "missing row in percentile_disc");
 
 	/*
-	 * Note: we *cannot* clean up the tuplesort object here, because the value
-	 * to be returned is allocated inside its sortcontext.  We could use
-	 * datumCopy to copy it out of there, but it doesn't seem worth the
-	 * trouble, since the cleanup callback will clear the tuplesort later.
+	 * Note: we could clean up the tuplesort object here, but it doesn't seem
+	 * worth the trouble, since the cleanup callback will clear the tuplesort
+	 * later.
 	 */
 
 	/* We shouldn't have stored any nulls, but do the right thing anyway */
@@ -575,10 +574,9 @@ percentile_cont_final_common(FunctionCallInfo fcinfo,
 	}
 
 	/*
-	 * Note: we *cannot* clean up the tuplesort object here, because the value
-	 * to be returned may be allocated inside its sortcontext.  We could use
-	 * datumCopy to copy it out of there, but it doesn't seem worth the
-	 * trouble, since the cleanup callback will clear the tuplesort later.
+	 * Note: we could clean up the tuplesort object here, but it doesn't seem
+	 * worth the trouble, since the cleanup callback will clear the tuplesort
+	 * later.
 	 */
 
 	PG_RETURN_DATUM(val);
@@ -1097,10 +1095,9 @@ mode_final(PG_FUNCTION_ARGS)
 		pfree(DatumGetPointer(last_val));
 
 	/*
-	 * Note: we *cannot* clean up the tuplesort object here, because the value
-	 * to be

Re: PostgreSQL crashes with SIGSEGV

2018-02-12 Thread Peter Geoghegan
On Wed, Feb 7, 2018 at 7:02 PM, Peter Geoghegan  wrote:
> On Wed, Feb 7, 2018 at 4:41 PM, Tom Lane  wrote:
>> Peter Geoghegan  writes:
>>> It would be nice to get an opinion on this mode_final() + tuplesort
>>> memory lifetime business from you, Tom.
>>
>> I'm fairly sure that that bit in mode_final() was just a hack to make
>> it work.  If there's a better, more principled way, let's go for it.
>
> This is the more principled way. It is much easier to make every
> single tuplesort caller on every release branch follow this rule (or
> those on 9.5+, at least).

I now think that my approach to fixing 9.6 in
WIP-tuplesort-memcontext-fix.patch is too complicated to justify
backpatching. I had the right idea there, and have no reason to think
it won't work, but it now seems like the complexity simply isn't worth
it. The advantage of WIP-tuplesort-memcontext-fix.patch was that it
avoided an extra copy within tuplesort_gettupleslot() on the earlier
Postgres versions it targeted (the versions where that function does
not have a "copy" argument), by arranging to make sure that low-level
routines have tuplesort caller context passed all the way down.
However, now that I consider the frequency that
WIP-tuplesort-memcontext-fix.patch would avoid such copying given real
9.6 workloads, its approach looks rather unappealing -- we should
instead just do a copy in all cases.

Another way of putting it is that it now seems like the approach taken
in bugfix commit d8589946d should be taken even further for 9.6, so
that we *always* copy for the tuplesort_gettupleslot() caller, rather
than just copying in the most common cases. We'd also sometimes have
to free the redundant memory allocated by tuplesort_gettuple_common()
within tuplesort_gettupleslot() if we went this way -- the should_free
= true case would have tuplesort_gettuple_common() do a pfree() after
copying. Needing a pfree() is a consequence of allocating memory for
caller, and then copying it for caller when we know that we're using
caller's memory context. A bit weird, but certainly very simple.

New plan:

* For 9.5 and 9.6, the approach taken in bugfix commit d8589946d
should be taken even further -- we should always copy. Moreover, we
should always copy within tuplesort_getdatum(), for the same reasons.

* For 9.5, 9.6, 10, and master, we should make sure that
tuplesort_getdatum() uses the caller's memory context. The fact that
it doesn't already do so seems like a simple oversight. We should do
this to be consistent with tuplesort_gettupleslot(). (This isn't
critical, but seems like a good idea.)

* For 9.5, 9.6, 10, and master, we should adjust some comments from
tuplesort_getdatum() callers, so that they no longer say that
tuplesort datum tuple memory lives in tuplesort context. That won't be
true anymore.

Anyone have an opinion on this?

The advantages of this approach are:

- It's far simpler than WIP-tuplesort-memcontext-fix.patch, and can be
applied to 9.5 and 9.6 with only small adjustments.

- It leaves all branches essentially consistent with v10+. v10+ gets
everything right already (except for that one minor
tuplesort_getdatum() + caller context issue), and it seems sensible to
treat v10 as a kind of model to follow here.

There are also some disadvantages for this new plan, though:

- There is a slightly awkward question for tuplesort_getdatum() in
9.6: Is tuplesort_getdatum() *always* explicitly copying an acceptable
overhead, given that tuplesort_getdatum() is not known to cause a
crash? I doubt so myself, since tuplesort_getdatum() *always* copies
on Postgres v10+ anyway, and even on 9.6 copying is already the common
case.

- There is a new overhead in 9.5. As I said, 9.6 mostly already copies
anyway, since it already has d8589946d -- 9.5 never got that commit.
This is very similar to the situation we faced about a year ago with
d8589946d on 9.6, since there isn't going to be much extra copying
than the copying that d8589946d already implies. ISTM that d8589946d
set a precedent that makes the situation that this creates for 9.5
okay today.

-- 
Peter Geoghegan



Re: PostgreSQL crashes with SIGSEGV

2018-02-07 Thread Peter Geoghegan
On Wed, Feb 7, 2018 at 4:41 PM, Tom Lane  wrote:
> Peter Geoghegan  writes:
>> It would be nice to get an opinion on this mode_final() + tuplesort
>> memory lifetime business from you, Tom.
>
> I'm fairly sure that that bit in mode_final() was just a hack to make
> it work.  If there's a better, more principled way, let's go for it.

This is the more principled way. It is much easier to make every
single tuplesort caller on every release branch follow this rule (or
those on 9.5+, at least).

My big concern with making tuplesort_getdatum() deliberately copy into
caller's context is that that could introduce memory leaks in caller
code that evolved in a world where tuplesort_end() frees
pass-by-reference datum memory. Seems like the only risky case is
certain ordered set aggregate code, though. And, it's probably just a
matter of fixing the comments. I'll read through the bug report thread
that led up to October's commit be0ebb65 [1] tomorrow, to make sure of
this.

I just noticed that process_ordered_aggregate_single() says that the
behavior I propose for tuplesort_getdatum() is what it *already*
expects:

/*
 * Note: if input type is pass-by-ref, the datums returned by the sort are
 * freshly palloc'd in the per-query context, so we must be careful to
 * pfree them when they are no longer needed.
 */

This supports the idea that ordered set aggregate code is the odd one
out when it comes to beliefs about tuplesort memory contexts. Even
just among tuplesort_getdatum() callers, though even more so among
tuplesort callers in general. One simple rule for all tuplesort fetch
routines is the high-level goal here.

>> Note that you removed the quoted comment within be0ebb65, back in October.
>
> There were multiple instances of basically that same comment before.
> AFAICS I just consolidated them into one place, in the header comment for
> ordered_set_shutdown.

I see. So, to restate my earlier remarks in terms of the newer
organization: The last line in that header comment will need to be
removed, since it will become false under my scheme. The line is:
"Note that many of the finalfns could *not* free the tuplesort object,
at least not without extra data copying, because what they return is a
pointer to a datum inside the tuplesort object".

[1] 
https://www.postgresql.org/message-id/flat/CAB4ELO5RZhOamuT9Xsf72ozbenDLLXZKSk07FiSVsuJNZB861A%40mail.gmail.com#cab4elo5rzhoamut9xsf72ozbendllxzksk07fisvsujnzb8...@mail.gmail.com
-- 
Peter Geoghegan



Re: PostgreSQL crashes with SIGSEGV

2018-02-07 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Jan 17, 2018 at 2:23 PM, Peter Geoghegan  wrote:
>> A complicating factor for this fix of mine is that mode_final() seems
>> to have its own ideas about tuple memory lifetime, over and above what
>> tuplesort_getdatum() explicitly promises, as can be seen here:
>> /*
>> * Note: we *cannot* clean up the tuplesort object here, because the value
>> * to be returned is allocated inside its sortcontext.  We could use
>> * datumCopy to copy it out of there, but it doesn't seem worth the
>> * trouble, since the cleanup callback will clear the tuplesort later.
>> */

>> ISTM that either grouping sets or mode_final() must necessarily be
>> wrong, because each oversteps, and infers a different contract from
>> tuplesort tuple fetching routines (different assumptions about memory
>> contexts are made in each case). Only one can be right, unless it's
>> okay to have one rule for tuplesort_getdatum() and another for
>> tuplesort_gettupleslot() (which seems questionable to me). I still
>> think that grouping sets is right (and that mode_final() is wrong). Do
>> you?

> It would be nice to get an opinion on this mode_final() + tuplesort
> memory lifetime business from you, Tom.

I'm fairly sure that that bit in mode_final() was just a hack to make
it work.  If there's a better, more principled way, let's go for it.

> Note that you removed the quoted comment within be0ebb65, back in October.

There were multiple instances of basically that same comment before.
AFAICS I just consolidated them into one place, in the header comment for
ordered_set_shutdown.

regards, tom lane



Re: PostgreSQL crashes with SIGSEGV

2018-02-06 Thread Peter Geoghegan
On Tue, Feb 6, 2018 at 5:54 PM, Craig Ringer  wrote:
>> In a quick look at the patches, WIP-kludge-fix.patch seems clearly
>> unacceptable for back-patching because it changes the signature and
>> behavior of ExecResetTupleTable, which external code might well be using.

> Definitely is using, in the case of BDR and pglogical. But we can patch in a
> version check easily enough.

That won't be necessary. The WIP-kludge-fix.patch approach is never
going to be used, and was only really posted for illustrative
purposes.

-- 
Peter Geoghegan



Re: PostgreSQL crashes with SIGSEGV

2018-02-06 Thread Craig Ringer
On 18 January 2018 at 03:23, Tom Lane  wrote:

> Aleksandr Parfenov  writes:
> > The new status of this patch is: Ready for Committer
>
> I don't feel particularly comfortable committing a patch that
> was clearly labeled as a rushed draft by its author.
> Peter, where do you stand on this work?
>
> In a quick look at the patches, WIP-kludge-fix.patch seems clearly
> unacceptable for back-patching because it changes the signature and
> behavior of ExecResetTupleTable, which external code might well be using.
>


Definitely is using, in the case of BDR and pglogical. But we can patch in
a version check easily enough.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: PostgreSQL crashes with SIGSEGV

2018-02-06 Thread Peter Geoghegan
On Wed, Jan 17, 2018 at 2:23 PM, Peter Geoghegan  wrote:
> A complicating factor for this fix of mine is that mode_final() seems
> to have its own ideas about tuple memory lifetime, over and above what
> tuplesort_getdatum() explicitly promises, as can be seen here:
>
> /*
>  * Note: we *cannot* clean up the tuplesort object here, because the value
>  * to be returned is allocated inside its sortcontext.  We could use
>  * datumCopy to copy it out of there, but it doesn't seem worth the
>  * trouble, since the cleanup callback will clear the tuplesort later.
>  */

> ISTM that either grouping sets or mode_final() must necessarily be
> wrong, because each oversteps, and infers a different contract from
> tuplesort tuple fetching routines (different assumptions about memory
> contexts are made in each case). Only one can be right, unless it's
> okay to have one rule for tuplesort_getdatum() and another for
> tuplesort_gettupleslot() (which seems questionable to me). I still
> think that grouping sets is right (and that mode_final() is wrong). Do
> you?

It would be nice to get an opinion on this mode_final() + tuplesort
memory lifetime business from you, Tom.

Note that you removed the quoted comment within be0ebb65, back in October.

-- 
Peter Geoghegan



Re: PostgreSQL crashes with SIGSEGV

2018-01-17 Thread Peter Geoghegan
On Wed, Jan 17, 2018 at 1:00 PM, Tom Lane  wrote:
>> You could make the same objection to changing tuplesort_getdatum()
>> outside of the master branch, though. I think that going back further
>> than that for the (arguably independent) tuplesort_getdatum() subset
>> fix might still be a good idea. I wonder where you stand on this.
>
> I haven't been following the thread very closely, so I don't have an
> opinion on that.

A complicating factor for this fix of mine is that mode_final() seems
to have its own ideas about tuple memory lifetime, over and above what
tuplesort_getdatum() explicitly promises, as can be seen here:

/*
 * Note: we *cannot* clean up the tuplesort object here, because the value
 * to be returned is allocated inside its sortcontext.  We could use
 * datumCopy to copy it out of there, but it doesn't seem worth the
 * trouble, since the cleanup callback will clear the tuplesort later.
 */

My WIP-tuplesort-memcontext-fix.patch fix is premised on the idea that
nodeAgg.c/grouping sets got it right: nodeAgg.c should be able to
continue to assume that in "owning" the memory used for a tuple (in a
table slot), it has it in its own memory context -- otherwise, the
whole tts_shouldFree tuple slot mechanism is prone to double-frees.
This comment directly contradicts/undermines that premise.

ISTM that either grouping sets or mode_final() must necessarily be
wrong, because each oversteps, and infers a different contract from
tuplesort tuple fetching routines (different assumptions about memory
contexts are made in each case). Only one can be right, unless it's
okay to have one rule for tuplesort_getdatum() and another for
tuplesort_gettupleslot() (which seems questionable to me). I still
think that grouping sets is right (and that mode_final() is wrong). Do
you?

-- 
Peter Geoghegan



Re: PostgreSQL crashes with SIGSEGV

2018-01-17 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Jan 17, 2018 at 12:31 PM, Tom Lane  wrote:
>> +1.  If the problem isn't known to be reproducible in those branches,
>> the risk of adding new bugs seems to outweigh any benefit.

> You could make the same objection to changing tuplesort_getdatum()
> outside of the master branch, though. I think that going back further
> than that for the (arguably independent) tuplesort_getdatum() subset
> fix might still be a good idea. I wonder where you stand on this.

I haven't been following the thread very closely, so I don't have an
opinion on that.

regards, tom lane



Re: PostgreSQL crashes with SIGSEGV

2018-01-17 Thread Peter Geoghegan
On Wed, Jan 17, 2018 at 12:31 PM, Tom Lane  wrote:
> Probably not very.  It'd be nice to have it done by the next minor
> releases, ie before 5-Feb ... but given that these bugs are years
> old, missing that deadline would not be catastrophic.

Got it.

>> I'm not sure whether or not we should also apply this
>> still-to-be-written 9.5 patch to 9.4 and 9.3, since those versions
>> don't have grouping sets, and so cannot crash. ISTM that we should
>> leave them alone, since tuplesort has had this problem forever.
>
> +1.  If the problem isn't known to be reproducible in those branches,
> the risk of adding new bugs seems to outweigh any benefit.

You could make the same objection to changing tuplesort_getdatum()
outside of the master branch, though. I think that going back further
than that for the (arguably independent) tuplesort_getdatum() subset
fix might still be a good idea. I wonder where you stand on this.

-- 
Peter Geoghegan



Re: PostgreSQL crashes with SIGSEGV

2018-01-17 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Jan 17, 2018 at 11:23 AM, Tom Lane  wrote:
>> I don't feel particularly comfortable committing a patch that
>> was clearly labeled as a rushed draft by its author.
>> Peter, where do you stand on this work?

> I would like to take another pass over
> WIP-tuplesort-memcontext-fix.patch, to be on the safe side. I'm
> currently up to my neck in parallel CREATE INDEX work, though, and
> would prefer to avoid context switching for a week or two, if
> possible. How time sensitive do you think this is?

Probably not very.  It'd be nice to have it done by the next minor
releases, ie before 5-Feb ... but given that these bugs are years
old, missing that deadline would not be catastrophic.

> I'm not sure whether or not we should also apply this
> still-to-be-written 9.5 patch to 9.4 and 9.3, since those versions
> don't have grouping sets, and so cannot crash. ISTM that we should
> leave them alone, since tuplesort has had this problem forever.

+1.  If the problem isn't known to be reproducible in those branches,
the risk of adding new bugs seems to outweigh any benefit.

regards, tom lane



Re: PostgreSQL crashes with SIGSEGV

2018-01-17 Thread Peter Geoghegan
On Wed, Jan 17, 2018 at 11:23 AM, Tom Lane  wrote:
> Aleksandr Parfenov  writes:
>> The new status of this patch is: Ready for Committer
>
> I don't feel particularly comfortable committing a patch that
> was clearly labeled as a rushed draft by its author.
> Peter, where do you stand on this work?

I would like to take another pass over
WIP-tuplesort-memcontext-fix.patch, to be on the safe side. I'm
currently up to my neck in parallel CREATE INDEX work, though, and
would prefer to avoid context switching for a week or two, if
possible. How time sensitive do you think this is?

I think we'll end up doing this:

* Committing the minimal modifications made (in both WIP patches) to
tuplesort_getdatum() to both v10 and master branches.
tuplesort_getdatum() must follow the example of
tuplesort_gettupleslot() on these branches, since
tuplesort_gettupleslot() already manages to get everything right in
recent releases. (There is no known tuplesort_getdatum() crash on
these versions, but this still seems necessary on general principle.)

* Committing something pretty close to
WIP-tuplesort-memcontext-fix.patch to 9.6.

* Committing another fix to 9.5. This fix will apply the same
principles as WIP-tuplesort-memcontext-fix.patch, but will be simpler
mechanically, since the whole batch memory mechanism added to
tuplesort.c for 9.6 isn't involved.

I'm not sure whether or not we should also apply this
still-to-be-written 9.5 patch to 9.4 and 9.3, since those versions
don't have grouping sets, and so cannot crash. ISTM that we should
leave them alone, since tuplesort has had this problem forever.

Perhaps you should go ahead and commit a patch with just the changes
to tuplesort_getdatum() now. This part seems low risk, and worth doing
in a single, (almost) consistent pass over the back branches. This
part is a trivial backpatch, and could be thought of as an independent
problem. (It will be nice to get v10 and master branches completely
out of the way quickly.)

> In a quick look at the patches, WIP-kludge-fix.patch seems clearly
> unacceptable for back-patching because it changes the signature and
> behavior of ExecResetTupleTable, which external code might well be using.

The signature change isn't required, and it was silly of me to add it.
But I also really dislike the general approach it takes, and mostly
posted it because I thought that it was a useful counterpoint.

-- 
Peter Geoghegan



Re: PostgreSQL crashes with SIGSEGV

2018-01-17 Thread Tom Lane
Aleksandr Parfenov  writes:
> The new status of this patch is: Ready for Committer

I don't feel particularly comfortable committing a patch that
was clearly labeled as a rushed draft by its author.
Peter, where do you stand on this work?

In a quick look at the patches, WIP-kludge-fix.patch seems clearly
unacceptable for back-patching because it changes the signature and
behavior of ExecResetTupleTable, which external code might well be using.

regards, tom lane



Re: PostgreSQL crashes with SIGSEGV

2018-01-15 Thread Aleksandr Parfenov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Hi,

All information is related to WIP-tuplesort-memcontext-fix.patch.
The patch applies and fixes the bug on REL9_6_STABLE and LGTM.

However, REL9_5_STABLE affected as well and the patch doesn't applicable to it.
But it may be another entry because the difference in tuplesort may cause some 
changes in the fix too.

The new status of this patch is: Ready for Committer