subject:"\\\[HACKERS\\\] PoC\\\: Partial sort"

Re: [HACKERS] PoC: Partial sort

2017-01-30 Thread Michael Paquier

On Mon, Dec 5, 2016 at 2:04 PM, Haribabu Kommi  wrote:
>
>
> On Fri, Dec 2, 2016 at 4:05 AM, Robert Haas  wrote:
>>
>> On Tue, Sep 13, 2016 at 4:32 AM, Alexander Korotkov
>>  wrote:
>> > On Fri, Apr 8, 2016 at 10:09 PM, Peter Geoghegan  wrote:
>> >> On Wed, Mar 30, 2016 at 8:02 AM, Alexander Korotkov
>> >>  wrote:
>> >> > Hmm... I'm not completely agree with that. In typical usage partial
>> >> > sort
>> >> > should definitely use quicksort.  However, fallback to other sort
>> >> > methods is
>> >> > very useful.  Decision of partial sort usage is made by planner.  But
>> >> > planner makes mistakes.  For example, our HashAggregate is purely
>> >> > in-memory.
>> >> > In the case of planner mistake it causes OOM.  I met such situation
>> >> > in
>> >> > production and not once.  This is why I'd like partial sort to have
>> >> > graceful
>> >> > degradation for such cases.
>> >>
>> >> I think that this should be moved to the next CF, unless a committer
>> >> wants to pick it up today.
>> >
>> > Patch was rebased to current master.
>>
>> Just a few quick observations on this...
>>
>> It strikes me that the API contract change in ExecMaterializesOutput
>> is pretty undesirable.  I think it would be better to have a new
>> executor node for this node rather than overloading the existing
>> "Sort" node, sharing code where possible of course.  The fact that
>> this would distinguish them more clearly in an EXPLAIN plan seems
>> good, too.  "Partial Sort" is the obvious thing, but there might be
>> even better alternatives -- maybe "Incremental Sort" or something like
>> that?  Because it's not partially sorting the data, it's making data
>> that already has some sort order have a more rigorous sort order.
>>
>> I think that it will probably be pretty common to have queries where
>> the data is sorted by (mostly_unique_col) and we want to get it sorted
>> by (mostly_unique_col, disambiguation_col).  In such cases I wonder if
>> we'll incur a lot of overhead by feeding single tuples to the
>> tuplesort stuff and performing lots of 1-item sorts.  Not sure if that
>> case needs any special optimization.
>>
>> I also think that the "HeapTuple prev" bit in SortState is probably
>> not the right way of doing things.  I think that should use an
>> additional TupleTableSlot rather than a HeapTuple.
>>
>
> The feedback from the reviewer has received at the end of commitfest.
> So Moved to next CF with "waiting on author" status.

This patch is on its 6th commit fest now. As the thread has died and
as feedback has been provided but not answered I am marking it as
returned with feedback.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2016-12-04 Thread Haribabu Kommi

On Fri, Dec 2, 2016 at 4:05 AM, Robert Haas  wrote:

> On Tue, Sep 13, 2016 at 4:32 AM, Alexander Korotkov
>  wrote:
> > On Fri, Apr 8, 2016 at 10:09 PM, Peter Geoghegan  wrote:
> >> On Wed, Mar 30, 2016 at 8:02 AM, Alexander Korotkov
> >>  wrote:
> >> > Hmm... I'm not completely agree with that. In typical usage partial
> sort
> >> > should definitely use quicksort.  However, fallback to other sort
> >> > methods is
> >> > very useful.  Decision of partial sort usage is made by planner.  But
> >> > planner makes mistakes.  For example, our HashAggregate is purely
> >> > in-memory.
> >> > In the case of planner mistake it causes OOM.  I met such situation in
> >> > production and not once.  This is why I'd like partial sort to have
> >> > graceful
> >> > degradation for such cases.
> >>
> >> I think that this should be moved to the next CF, unless a committer
> >> wants to pick it up today.
> >
> > Patch was rebased to current master.
>
> Just a few quick observations on this...
>
> It strikes me that the API contract change in ExecMaterializesOutput
> is pretty undesirable.  I think it would be better to have a new
> executor node for this node rather than overloading the existing
> "Sort" node, sharing code where possible of course.  The fact that
> this would distinguish them more clearly in an EXPLAIN plan seems
> good, too.  "Partial Sort" is the obvious thing, but there might be
> even better alternatives -- maybe "Incremental Sort" or something like
> that?  Because it's not partially sorting the data, it's making data
> that already has some sort order have a more rigorous sort order.
>
> I think that it will probably be pretty common to have queries where
> the data is sorted by (mostly_unique_col) and we want to get it sorted
> by (mostly_unique_col, disambiguation_col).  In such cases I wonder if
> we'll incur a lot of overhead by feeding single tuples to the
> tuplesort stuff and performing lots of 1-item sorts.  Not sure if that
> case needs any special optimization.
>
> I also think that the "HeapTuple prev" bit in SortState is probably
> not the right way of doing things.  I think that should use an
> additional TupleTableSlot rather than a HeapTuple.
>
>
The feedback from the reviewer has received at the end of commitfest.
So Moved to next CF with "waiting on author" status.


Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] PoC: Partial sort

2016-12-01 Thread Peter Geoghegan

On Mon, Nov 21, 2016 at 11:04 PM, Haribabu Kommi
 wrote:
> you assigned as reviewer to the current patch in the 11-2016 commitfest.
> But you haven't shared your review yet in this commitfest on the latest
> patch posted by the author. If you don't have any comments on the patch,
> please move the patch into "ready for committer" state to get committer's
> attention. This will help us in smoother operation of commitfest.

Sorry for the delay on this.

I agree with Robert's remarks today on TupleTableSlot, and would like
to see a revision that does that.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2016-12-01 Thread Robert Haas

On Tue, Sep 13, 2016 at 4:32 AM, Alexander Korotkov
 wrote:
> On Fri, Apr 8, 2016 at 10:09 PM, Peter Geoghegan  wrote:
>> On Wed, Mar 30, 2016 at 8:02 AM, Alexander Korotkov
>>  wrote:
>> > Hmm... I'm not completely agree with that. In typical usage partial sort
>> > should definitely use quicksort.  However, fallback to other sort
>> > methods is
>> > very useful.  Decision of partial sort usage is made by planner.  But
>> > planner makes mistakes.  For example, our HashAggregate is purely
>> > in-memory.
>> > In the case of planner mistake it causes OOM.  I met such situation in
>> > production and not once.  This is why I'd like partial sort to have
>> > graceful
>> > degradation for such cases.
>>
>> I think that this should be moved to the next CF, unless a committer
>> wants to pick it up today.
>
> Patch was rebased to current master.

Just a few quick observations on this...

It strikes me that the API contract change in ExecMaterializesOutput
is pretty undesirable.  I think it would be better to have a new
executor node for this node rather than overloading the existing
"Sort" node, sharing code where possible of course.  The fact that
this would distinguish them more clearly in an EXPLAIN plan seems
good, too.  "Partial Sort" is the obvious thing, but there might be
even better alternatives -- maybe "Incremental Sort" or something like
that?  Because it's not partially sorting the data, it's making data
that already has some sort order have a more rigorous sort order.

I think that it will probably be pretty common to have queries where
the data is sorted by (mostly_unique_col) and we want to get it sorted
by (mostly_unique_col, disambiguation_col).  In such cases I wonder if
we'll incur a lot of overhead by feeding single tuples to the
tuplesort stuff and performing lots of 1-item sorts.  Not sure if that
case needs any special optimization.

I also think that the "HeapTuple prev" bit in SortState is probably
not the right way of doing things.  I think that should use an
additional TupleTableSlot rather than a HeapTuple.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2016-11-21 Thread Haribabu Kommi

Hi Peter,


This is a gentle reminder.

you assigned as reviewer to the current patch in the 11-2016 commitfest.
But you haven't shared your review yet in this commitfest on the latest
patch posted by the author. If you don't have any comments on the patch,
please move the patch into "ready for committer" state to get committer's
attention. This will help us in smoother operation of commitfest.

Please Ignore if you already shared your review.

Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] PoC: Partial sort

2016-10-02 Thread Michael Paquier

On Tue, Sep 13, 2016 at 5:32 PM, Alexander Korotkov
 wrote:
> On Fri, Apr 8, 2016 at 10:09 PM, Peter Geoghegan  wrote:
>>
>> On Wed, Mar 30, 2016 at 8:02 AM, Alexander Korotkov
>>  wrote:
>> > Hmm... I'm not completely agree with that. In typical usage partial sort
>> > should definitely use quicksort.  However, fallback to other sort
>> > methods is
>> > very useful.  Decision of partial sort usage is made by planner.  But
>> > planner makes mistakes.  For example, our HashAggregate is purely
>> > in-memory.
>> > In the case of planner mistake it causes OOM.  I met such situation in
>> > production and not once.  This is why I'd like partial sort to have
>> > graceful
>> > degradation for such cases.
>>
>> I think that this should be moved to the next CF, unless a committer
>> wants to pick it up today.
>
>
> Patch was rebased to current master.

Applies on HEAD at e8bdee27 and passes make-check, now I am seeing
zero documentation so it is a bit hard to see what this patch is
achieving without reading the thread.

$ git diff master --check
src/backend/optimizer/prep/prepunion.c:967: trailing whitespace.
+   cost_sort(_p, root, NIL, 0,
src/backend/utils/sort/tuplesort.c:1244: trailing whitespace.
+ * tuplesort_updatemax

+ * Returns true if the plan node isautomatically materializes its output
Typo here.

Still, this has received to reviews, so moved to next CF.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2016-09-13 Thread Alexander Korotkov

On Fri, Apr 8, 2016 at 10:09 PM, Peter Geoghegan  wrote:

> On Wed, Mar 30, 2016 at 8:02 AM, Alexander Korotkov
>  wrote:
> > Hmm... I'm not completely agree with that. In typical usage partial sort
> > should definitely use quicksort.  However, fallback to other sort
> methods is
> > very useful.  Decision of partial sort usage is made by planner.  But
> > planner makes mistakes.  For example, our HashAggregate is purely
> in-memory.
> > In the case of planner mistake it causes OOM.  I met such situation in
> > production and not once.  This is why I'd like partial sort to have
> graceful
> > degradation for such cases.
>
> I think that this should be moved to the next CF, unless a committer
> wants to pick it up today.
>

Patch was rebased to current master.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


partial-sort-basic-9.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2016-04-08 Thread Peter Geoghegan

On Wed, Mar 30, 2016 at 8:02 AM, Alexander Korotkov
 wrote:
> Hmm... I'm not completely agree with that. In typical usage partial sort
> should definitely use quicksort.  However, fallback to other sort methods is
> very useful.  Decision of partial sort usage is made by planner.  But
> planner makes mistakes.  For example, our HashAggregate is purely in-memory.
> In the case of planner mistake it causes OOM.  I met such situation in
> production and not once.  This is why I'd like partial sort to have graceful
> degradation for such cases.

I think that this should be moved to the next CF, unless a committer
wants to pick it up today.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2016-03-30 Thread Alexander Korotkov

Hi, Peter!

Thank you for review!

On Thu, Mar 24, 2016 at 3:39 AM, Peter Geoghegan  wrote:

> >> Sort Method
> >> 
> >>
> >> Even thought the explain analyze above shows "top-N heapsort" as its
> >> sort method, that isn't really true. I actually ran this through a
> >> debugger, which is why the above plan took so long to execute, in case
> >> you wondered. I saw that in practice the first sort executed for the
> >> first group uses a quicksort, while only the second sort (needed for
> >> the 2 and last group in this example) used a top-N heapsort.
>
> > With partial sort we run multiple sorts in the same node. Ideally, we
> need
> > to provide some aggregated information over runs.
> > This situation looks very similar to subplan which is called multiple
> times.
> > I checked how it works for now.
>
> Noticed this in nodeSort.c:
>
> +   if (node->tuplesortstate != NULL)
> +   {
> +   tuplesort_reset((Tuplesortstate *) node->tuplesortstate);
> +   node->groupsCount++;
> +   }
> +   else
> +   {
> +   /* Support structures for cmpSortSkipCols - already
> sorted columns */
> +   if (skipCols)
> +   prepareSkipCols(plannode, node);
>
> +   /*
> +* Only pass on remaining columns that are unsorted.
> Skip abbreviated
> +* keys usage for partial sort.  We unlikely will have
> huge groups
> +* with partial sort.  Therefore usage of abbreviated
> keys would be
> +* likely a waste of time.
> +*/
> tuplesortstate = tuplesort_begin_heap(tupDesc,
>
> You should comment on which case is which, and put common case (no
> skip cols) first. Similarly, the ExecSort() for(;;) should put the
> common (non-partial) case first, which it does, but then the "first
> tuple in partial sort" case first, then the "second or subsequent
> partial sort" case last.
>

Done.

More comments here, please:
>
> +typedef struct SkipKeyData
> +{
> + FunctionCallInfoData fcinfo;
> + FmgrInfo flinfo;
> + OffsetNumber attno;
> +} SkipKeyData;
>
> (What's SkipKeyData?)
>
> Also want comments for new SortState fields.


Done.


> SortState.prev is a
> palloc()'d copy of tuple, which should be directly noted, as it is for
> similar aggregate cases, etc.
>
> Should you be more aggressive about freeing memory allocated for
> SortState.prev tuples?
>

Fixed.


> The new function cmpSortSkipCols() should say "Special case for
> NULL-vs-NULL, else use standard comparison", or something. "Lets
> pretend NULL is a value for implementation convenience" cases are
> considered the exception, and are always noted as the exception.
>

Comment is added.


> > In the case of subplan explain analyze gives us just information about
> last
> > subplan run. This makes me uneasy. From one side, it's probably OK that
> > partial sort behaves like subplan while showing information just about
> last
> > sort run. From the other side, we need some better solution for that in
> > general case.
>
> I see what you mean, but I wasn't so much complaining about that, as
> complaining about the simple fact that we use a top-N heap sort *at
> all*. This feels like the "limit" case is playing with partial sort
> sub-sorts in a way that it shouldn't.
>
> I see you have code like this to make this work:
>
> +   /*
> +* Adjust bound_Done with number of tuples we've actually sorted.
> +*/
> +   if (node->bounded)
> +   {
> +   if (node->finished)
> +   node->bound_Done = node->bound;
> +   else
> +   node->bound_Done = Min(node->bound,
> node->bound_Done + nTuples);
>
> But, why bother? Why not simply prevent tuplesort.c from ever using
> the top-N heapsort method when it is called from nodeSort.c for a
> partial sort (probably in the planner)?
>
> Why, at a high level, does it make sense to pass down a limit to *any*
> sort operation that makes up a partial sort, even the last? This seems
> to be adding complexity without a benefit. A big advantage of top-N
> heapsorts is that much less memory could be used, but this patch
> already has the memory allocated that belonged to previous performsort
> calls (mostly -- certainly has all those tuplesort.c memtuples
> throughout, a major user of memory overall).  It's not going to be
> very good at preventing work, except almost by accident because we
> happen to have a limit up to just past the beginning of a smaller
> partial sort group. I'd rather use quicksort, which is very versatile.
> Top-N sorts make sense when sorting itself is the bottleneck, which it
> probably won't be for a partial sort (that's the whole point).
>

Hmm... I'm not completely agree with that. In typical usage partial sort
should definitely use quicksort.  However, fallback to other sort methods
is very useful.  Decision of partial sort usage is

Re: [HACKERS] PoC: Partial sort

2016-03-29 Thread Alexander Korotkov

On Tue, Mar 29, 2016 at 4:56 PM, David Steele  wrote:

> On 3/23/16 8:39 PM, Peter Geoghegan wrote:
>
> This looks like an old change you missed:
>>
>> - * compare_path_fractional_costs
>> + * compare_fractional_path_costs
>>
>> All in all, this looks significantly better. Thanks for your work on
>> this. Sorry for the delay in my response, and that my review was
>> relatively rushed, but I'm rather busy at the moment with fighting
>> fires.
>>
>
> It looks like a new patch is required before this can be marked "ready for
> committer".  Will you have that ready soon?
>

Yes, that's it.  I'm working on it now.  I'm going to post it until
tomorrow.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] PoC: Partial sort

2016-03-29 Thread David Steele


Hi Alexander,

On 3/23/16 8:39 PM, Peter Geoghegan wrote:


This looks like an old change you missed:

- * compare_path_fractional_costs
+ * compare_fractional_path_costs

All in all, this looks significantly better. Thanks for your work on
this. Sorry for the delay in my response, and that my review was
relatively rushed, but I'm rather busy at the moment with fighting
fires.


It looks like a new patch is required before this can be marked "ready 
for committer".  Will you have that ready soon?


Thanks,
--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2016-03-23 Thread Peter Geoghegan

Hi,

On Tue, Mar 1, 2016 at 7:06 AM, Alexander Korotkov  wrote:
> I finally went over your review.

I'll respond to your points here. Note that I'm reviewing
"partial-sort-basic-7.patch", which you sent on March 13. I respond
here because this is where you answered my questions (I had no
feedback on "partial-sort-basic-6.patch", which didn't use the new
upper planner pathification stuff, unlike this latest version).

> On Wed, Nov 4, 2015 at 4:44 AM, Peter Geoghegan  wrote:
>>
>> Explain output
>> ---

>> I think it might be a good idea to also have a "Sort Groups: 2" field
>> above. That illustrates that you are in fact performing 2 small sorts
>> per group, which is the reality. As you said, it's good to have this
>> be high, because then the sort operations don't need to do too many
>> comparisons, which could be expensive.
>
>
> I agree with your notes. In the attached version of path explain output was
> revised as you proposed.

Cool.

>> Sort Method
>> 
>>
>> Even thought the explain analyze above shows "top-N heapsort" as its
>> sort method, that isn't really true. I actually ran this through a
>> debugger, which is why the above plan took so long to execute, in case
>> you wondered. I saw that in practice the first sort executed for the
>> first group uses a quicksort, while only the second sort (needed for
>> the 2 and last group in this example) used a top-N heapsort.

> With partial sort we run multiple sorts in the same node. Ideally, we need
> to provide some aggregated information over runs.
> This situation looks very similar to subplan which is called multiple times.
> I checked how it works for now.

Noticed this in nodeSort.c:

+   if (node->tuplesortstate != NULL)
+   {
+   tuplesort_reset((Tuplesortstate *) node->tuplesortstate);
+   node->groupsCount++;
+   }
+   else
+   {
+   /* Support structures for cmpSortSkipCols - already
sorted columns */
+   if (skipCols)
+   prepareSkipCols(plannode, node);

+   /*
+* Only pass on remaining columns that are unsorted.
Skip abbreviated
+* keys usage for partial sort.  We unlikely will have
huge groups
+* with partial sort.  Therefore usage of abbreviated
keys would be
+* likely a waste of time.
+*/
tuplesortstate = tuplesort_begin_heap(tupDesc,

You should comment on which case is which, and put common case (no
skip cols) first. Similarly, the ExecSort() for(;;) should put the
common (non-partial) case first, which it does, but then the "first
tuple in partial sort" case first, then the "second or subsequent
partial sort" case last.

More comments here, please:

+typedef struct SkipKeyData
+{
+ FunctionCallInfoData fcinfo;
+ FmgrInfo flinfo;
+ OffsetNumber attno;
+} SkipKeyData;

(What's SkipKeyData?)

Also want comments for new SortState fields. SortState.prev is a
palloc()'d copy of tuple, which should be directly noted, as it is for
similar aggregate cases, etc.

Should you be more aggressive about freeing memory allocated for
SortState.prev tuples?

The new function cmpSortSkipCols() should say "Special case for
NULL-vs-NULL, else use standard comparison", or something. "Lets
pretend NULL is a value for implementation convenience" cases are
considered the exception, and are always noted as the exception.

> In the case of subplan explain analyze gives us just information about last
> subplan run. This makes me uneasy. From one side, it's probably OK that
> partial sort behaves like subplan while showing information just about last
> sort run. From the other side, we need some better solution for that in
> general case.

I see what you mean, but I wasn't so much complaining about that, as
complaining about the simple fact that we use a top-N heap sort *at
all*. This feels like the "limit" case is playing with partial sort
sub-sorts in a way that it shouldn't.

I see you have code like this to make this work:

+   /*
+* Adjust bound_Done with number of tuples we've actually sorted.
+*/
+   if (node->bounded)
+   {
+   if (node->finished)
+   node->bound_Done = node->bound;
+   else
+   node->bound_Done = Min(node->bound,
node->bound_Done + nTuples);

But, why bother? Why not simply prevent tuplesort.c from ever using
the top-N heapsort method when it is called from nodeSort.c for a
partial sort (probably in the planner)?

Why, at a high level, does it make sense to pass down a limit to *any*
sort operation that makes up a partial sort, even the last? This seems
to be adding complexity without a benefit. A big advantage of top-N
heapsorts is that much less memory could be used, but this patch
already has the memory allocated that belonged to previous performsort
calls (mostly -- certainly

Re: [HACKERS] PoC: Partial sort

2016-03-13 Thread Alexander Korotkov

Hi!

Tom committed upper planner pathification patch.
Partial sort patch rebased to master is attached.  It was quite huge rebase
in planner part of the patch.  But I think now patch becomes better, much
more logical.
It's probably, something was missed after rebase.  I'm going to examine
this path more carefully next week.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


partial-sort-basic-7.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2016-02-15 Thread Peter Geoghegan

On Sun, Jan 31, 2016 at 4:16 AM, Alvaro Herrera
 wrote:
> Great to have a new version -- there seems to be a lot of interest in
> this patch.  I'm moving this one to the next commitfest, thanks.

I am signed up to review this patch.

I was very surprised to see it in "Needs Review" state in the CF app
(Alexander just rebased the patch, and didn't do anything with the CF
app entry). Once again, this seems to have happened just because
Alvaro moved the patch to the next CF.

I've marked it "Waiting on Author" once more. Hopefully the CF app
will be fixed soon, so moving a patch to the next commitfest no longer
clobbers its state.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2016-01-31 Thread Alvaro Herrera

Alexander Korotkov wrote:
 
> I'm sorry that I didn't found time for this yet. I'm certainly planning to
> get back to this in near future. The attached version is just rebased
> without any optimization.

Great to have a new version -- there seems to be a lot of interest in
this patch.  I'm moving this one to the next commitfest, thanks.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2016-01-24 Thread Alexander Korotkov

Hi, Tomas!

On Sat, Jan 23, 2016 at 3:07 PM, Tomas Vondra 
wrote:

> On 10/20/2015 01:17 PM, Alexander Korotkov wrote:
>
>> On Fri, Oct 16, 2015 at 7:11 PM, Alexander Korotkov
>> > wrote:
>>
>> On Sun, Jun 7, 2015 at 11:01 PM, Peter Geoghegan > > wrote:
>>
>> On Sun, Jun 7, 2015 at 8:10 AM, Andreas Karlsson
>> > wrote:
>> > Are you planning to work on this patch for 9.6?
>>
>> FWIW I hope so. It's a nice patch.
>>
>>
>> I'm trying to to whisk dust. Rebased version of patch is attached.
>> This patch isn't passing regression tests because of plan changes.
>> I'm not yet sure about those changes: why they happens and are they
>> really regression?
>> Since I'm not very familiar with planning of INSERT ON CONFLICT and
>> RLS, any help is appreciated.
>>
>>
>> Planner regression is fixed in the attached version of patch. It appears
>> that get_cheapest_fractional_path_for_pathkeys() behaved wrong when no
>> ordering is required.
>>
>>
> Alexander, are you working on this patch? I'd like to look at the patch,
> but the last available version (v4) no longer applies - there's plenty of
> bitrot. Do you plan to send an updated / rebased version?
>

I'm sorry that I didn't found time for this yet. I'm certainly planning to
get back to this in near future. The attached version is just rebased
without any optimization.

The main thing I'm particularly interested in is how much is this coupled
> with the Sort node, and whether it's possible to feed partially sorted
> tuples into other nodes.
>
> I'm particularly thinking about Hash Aggregate, because the partial sort
> allows to keep only the "current group" in a hash table, making it much
> more memory efficient / faster. What do you think?
>

This seems to me very reasonable optimization. And it would be nice to
implement some generalized way of presorted group processing. For instance,
we could have some special node, say "Group Scan" which have 2 children:
source and node which process every group. For "partial sort" the second
node would be Sort node. But it could be Hash Aggregate node as well.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


partial-sort-basic-5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2016-01-24 Thread Alexander Korotkov

Hi!

On Sat, Jan 23, 2016 at 10:07 PM, Peter Geoghegan  wrote:

> On Sat, Jan 23, 2016 at 4:07 AM, Tomas Vondra
>  wrote:
> > The main thing I'm particularly interested in is how much is this coupled
> > with the Sort node, and whether it's possible to feed partially sorted
> > tuples into other nodes.
>
> That's cool, but I'm particularly interested in seeing Alexander get
> back to this because it's an important project on its own. We should
> really have this.
>

Thank you for your review and interest in this patch! I'm sorry for huge
delay I made. I'm going to get back to this soon.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] PoC: Partial sort

2016-01-23 Thread Tomas Vondra

Hi,

On 10/20/2015 01:17 PM, Alexander Korotkov wrote:

On Fri, Oct 16, 2015 at 7:11 PM, Alexander Korotkov
> wrote:

On Sun, Jun 7, 2015 at 11:01 PM, Peter Geoghegan > wrote:

On Sun, Jun 7, 2015 at 8:10 AM, Andreas Karlsson
> wrote:
> Are you planning to work on this patch for 9.6?

FWIW I hope so. It's a nice patch.

I'm trying to to whisk dust. Rebased version of patch is attached.
This patch isn't passing regression tests because of plan changes.
I'm not yet sure about those changes: why they happens and are they
really regression?
Since I'm not very familiar with planning of INSERT ON CONFLICT and
RLS, any help is appreciated.

Planner regression is fixed in the attached version of patch. It appears
that get_cheapest_fractional_path_for_pathkeys() behaved wrong when no
ordering is required.

Alexander, are you working on this patch? I'd like to look at the patch, 
but the last available version (v4) no longer applies - there's plenty 
of bitrot. Do you plan to send an updated / rebased version?

The main thing I'm particularly interested in is how much is this 
coupled with the Sort node, and whether it's possible to feed partially 
sorted tuples into other nodes.

I'm particularly thinking about Hash Aggregate, because the partial sort 
allows to keep only the "current group" in a hash table, making it much 
more memory efficient / faster. What do you think?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2016-01-23 Thread Peter Geoghegan

On Sat, Jan 23, 2016 at 4:07 AM, Tomas Vondra
 wrote:
> The main thing I'm particularly interested in is how much is this coupled
> with the Sort node, and whether it's possible to feed partially sorted
> tuples into other nodes.

That's cool, but I'm particularly interested in seeing Alexander get
back to this because it's an important project on its own. We should
really have this.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2015-11-03 Thread Peter Geoghegan

On Tue, Oct 20, 2015 at 4:17 AM, Alexander Korotkov
 wrote:
> Planner regression is fixed in the attached version of patch. It appears
> that get_cheapest_fractional_path_for_pathkeys() behaved wrong when no
> ordering is required.

I took a look at this. My remarks are not comprehensive, but are just
what I noticed having looked at this for the first time in well over a
year.

Before anything else, I would like to emphasize that I think that this
is pretty important work; it's not just a "nice to have". I'm very
glad you picked it up, because we need it. In the real world, there
will be *lots* of cases that this helps.

Explain output
---

A query like your original test query looks like this for me:

postgres=# explain analyze select * from test order by v1, v2 limit 100;
 QUERY
PLAN

 Limit  (cost=429.54..434.53 rows=100 width=16) (actual
time=15125.819..22414.038 rows=100 loops=1)
   ->  Partial sort  (cost=429.54..50295.52 rows=100 width=16)
(actual time=15125.799..22413.297 rows=100 loops=1)
 Sort Key: v1, v2
 Presorted Key: v1
 Sort Method: top-N heapsort  Memory: 27kB
 ->  Index Scan using test_v1_idx on test
(cost=0.42..47604.43 rows=100 width=16) (actual time=1.663..13.066
rows=151 loops=1)
 Planning time: 0.948 ms
 Execution time: 22414.895 ms
(8 rows)

I thought about it for a while, and decided that you have the basic
shape of the explain output right here. I see where you are going by
having the sort node drive things.

I don't think the node should be called "Partial sort", though. I
think that this is better presented as just a "Sort" node with a
presorted key.

I think it might be a good idea to also have a "Sort Groups: 2" field
above. That illustrates that you are in fact performing 2 small sorts
per group, which is the reality. As you said, it's good to have this
be high, because then the sort operations don't need to do too many
comparisons, which could be expensive.

Sort Method

Even thought the explain analyze above shows "top-N heapsort" as its
sort method, that isn't really true. I actually ran this through a
debugger, which is why the above plan took so long to execute, in case
you wondered. I saw that in practice the first sort executed for the
first group uses a quicksort, while only the second sort (needed for
the 2 and last group in this example) used a top-N heapsort.

Is it really worth using a top-N heapsort to avoid sorting the last
little bit of tuples in the last group? Maybe we should never push
down to an individual sort operation (we have one
tuplesort_performsort() per group) that it should be bounded. Our
quicksort falls back on insertion sort in the event of only having 7
elements (at that level of recursion), so having this almost always
use quicksort may be no bad thing.

Even if you don't like that, the "Sort Method" shown above is just
misleading. I wonder, also, if you need to be more careful about
whether or not "Memory" is really the high watermark, as opposed to
the memory used by the last sort operation of many. There could be
many large tuples in one grouping, for example. Note that the current
code will not show any "Memory" in explain analyze for cases that have
memory freed before execution is done, which this is not consistent
with. Maybe that's not so important. Unsure.

trace_sort output shows that these queries often use a large number of
tiny individual sorts. Maybe that's okay, or maybe we should make it
clearer that the sorts are related. I now use trace_sort a lot.

Abbreviated Keys
---

It could be very bad for performance that the first non-presorted key
uses abbreviated keys. There needs to be a way to tell tuplesort to
not waste its time with them, just as there currently is for bounded
(top-N heapsort) sorts. They're almost certainly the wrong way to go,
unless you have huge groups (where partial sorting is unlikely to win
in the first place).

Other issues in executor

This is sort of an optimizer issue, but code lives in execAmi.c.
Assert is redundant here:

+   case T_Sort:
+   /* We shouldn't reach here without having plan node */
+   Assert(node);
+   /* With skipCols sort node holds only last bucket */
+   if (node && ((Sort *)node)->skipCols == 0)
+   return true;
+   else
+   return false;

I don't like that you've added a Plan node argument to
ExecMaterializesOutput() in this function, too.

There is similar assert/pointer test code within
ExecSupportsBackwardScan() and ExecSupportsMarkRestore(). In general,
I have

Re: [HACKERS] PoC: Partial sort

2015-10-29 Thread Peter Geoghegan

On Tue, Oct 20, 2015 at 4:17 AM, Alexander Korotkov
 wrote:
> Planner regression is fixed in the attached version of patch. It appears
> that get_cheapest_fractional_path_for_pathkeys() behaved wrong when no
> ordering is required.

I don't see an entry in the CF app for this. This seems like something
I should review, though.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2015-10-20 Thread Alexander Korotkov

On Fri, Oct 16, 2015 at 7:11 PM, Alexander Korotkov 
wrote:

> On Sun, Jun 7, 2015 at 11:01 PM, Peter Geoghegan  wrote:
>
>> On Sun, Jun 7, 2015 at 8:10 AM, Andreas Karlsson 
>> wrote:
>> > Are you planning to work on this patch for 9.6?
>>
>> FWIW I hope so. It's a nice patch.
>>
>
> I'm trying to to whisk dust. Rebased version of patch is attached. This
> patch isn't passing regression tests because of plan changes. I'm not yet
> sure about those changes: why they happens and are they really regression?
> Since I'm not very familiar with planning of INSERT ON CONFLICT and RLS,
> any help is appreciated.
>

Planner regression is fixed in the attached version of patch. It appears
that get_cheapest_fractional_path_for_pathkeys() behaved wrong when no
ordering is required.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


partial-sort-basic-4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2015-10-16 Thread Alexander Korotkov

On Sun, Jun 7, 2015 at 11:01 PM, Peter Geoghegan  wrote:

> On Sun, Jun 7, 2015 at 8:10 AM, Andreas Karlsson 
> wrote:
> > Are you planning to work on this patch for 9.6?
>
> FWIW I hope so. It's a nice patch.
>

I'm trying to to whisk dust. Rebased version of patch is attached. This
patch isn't passing regression tests because of plan changes. I'm not yet
sure about those changes: why they happens and are they really regression?
Since I'm not very familiar with planning of INSERT ON CONFLICT and RLS,
any help is appreciated.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

partial-sort-basic-3.patch
Description: Binary data

regression.diffs
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2015-06-07 Thread Peter Geoghegan

On Sun, Jun 7, 2015 at 8:10 AM, Andreas Karlsson andr...@proxel.se wrote:
 Are you planning to work on this patch for 9.6?

FWIW I hope so. It's a nice patch.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2015-06-07 Thread Andreas Karlsson


On 09/15/2014 01:58 PM, Alexander Korotkov wrote:

On Sun, Sep 14, 2014 at 9:32 AM, Peter Geoghegan p...@heroku.com
mailto:p...@heroku.com wrote:

I think we might be better off if a tuplesort function was called
shortly after tuplesort_begin_heap() is called. How top-n heap sorts
work is something that largely lives in tuplesort's head. Today, we
call tuplesort_set_bound() to hint to tuplesort By the way, this is a
top-n heap sort applicable sort. I think that with this patch, we
should then hint (where applicable) by the way, you won't actually be
required to sort those first n indexed attributes; rather, you can
expect to scan those in logical order. You may work the rest out
yourself, and may be clever about exploiting the sorted-ness of the
first few columns. The idea of managing a bunch of tiny sorts from
with ExecSort(), and calling the new function tuplesort_reset() seems
questionable. tuplesortstate is supposed to be private/opaque to
nodeSort.c, and the current design strains that.

I'd like to keep nodeSort.c simple. I think it's pretty clear that the
guts of this do not belong within ExecSort(), in any case. Also, the
additions there should be much better commented, wherever they finally
end up.


As I understand, you propose to incapsulate partial sort algorithm into
tuplesort. However, in this case we anyway need some significant change
of its interface: let tuplesort decide when it's able to return tuple.
Otherwise, we would miss significant part of LIMIT clause optimization.
tuplesort_set_bound() can't solve all the cases. There could be other
planner nodes between the partial sort and LIMIT.


Hi,

Are you planning to work on this patch for 9.6?

I generally agree with Peter that the changes to the sorting probably 
belong in the tuplesort code rather than in the executor. This way it 
should also be theoretically possible to support mark/restore.


Andreas


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-09-15 Thread Alexander Korotkov

On Sun, Sep 14, 2014 at 7:39 AM, Peter Geoghegan p...@heroku.com wrote:

 On Fri, Sep 12, 2014 at 2:19 PM, Alexander Korotkov
 aekorot...@gmail.com wrote:
  Actually, higher cardinality skip columns is better. Sorting of smaller
  groups is faster than sorting larger groups of same size. Also, with
 smaller
  groups you achieve limit more accurate (in average), i.e. sort smaller
  amount of total rows.

 Higher cardinality leading key columns are better for what? Do you
 mean they're better in that those cases are more sympathetic to your
 patch, or better in the general sense that they'll perform better for
 the user? The first example query, that you chose to demonstrate your
 patch had a leading, indexed column (column v1) with much lower
 cardinality/n_distinct than the column that had to be sorted on
 (column v2). That was what my comments were based on.

 When this feature is used, there will often be a significantly lower
 cardinality in the leading, indexed column (as in your example).
 Otherwise, the user might well have been happy to just order on the
 indexed/leading column alone. That isn't definitely true, but it's
 often true.


I just meant higher cardinality is cheaper for do partial sort. We could
have some misunderstood because of notions high and low are relative.
When cardinality is 1 then partial sort seems to be useless. When
cardinality is few then it still could be cheaper to do sequential scan +
sort rather than index scan + partial sort. When cardinality is close to
number of rows then as you mentioned user probably don't need to sort by
rest of columns. At least one exception is when user needs to force
uniqueness of order. So, we need to target something in the middle of this
two corner cases.


  I'm not sure if that's worth it to more or less
  duplicate heap_tuple_attr_equals() to save a mere n expensive
  comparisons, but it's something to think about (actually, there are
  probably less than even n comparisons in practice because there'll be
  a limit).
 
  Not correct. Smaller groups are not OK. Imagine that two representations
 of
  same skip column value exists. Index may return them in any order, even
  change them one by one. In this case sorting on other column never takes
  place, while it should.

 I think I explained this badly - it wouldn't be okay to make the
 grouping smaller, but as you say we could tie-break with a proper
 B-Tree support function 1 comparison to make sure we really had
 reached the end of our grouping.

 FWIW I want all bttextcmp()/varstr_cmp() comparisons to try a memcmp()
 first, so the use of the equality operator with text in mind that you
 mention may soon not be useful at all. The evidence suggests that
 memcmp() is so much cheaper than special preparatory NUL-termination +
 strcoll() that we should always try it first when sorting text, even
 when we have no good reason to think it will work. The memcmp() is
 virtually free. And so, you see why it might be worth thinking about
 this when we already have reasonable confidence that many comparisons
 will indicate that datums are equal. Other datatypes will have
 expensive real comparators, not just text or numeric.


When strings are not equal bttextcmp still needs to use strcoll while
texteq doesn't need that. So, it would be still advantage of using equality
operator over comparison function: equality operator don't have to compare
unequal values. However, real cost of this advantage depends on presorted
columns cardinality as well.

--
With best regards,
Alexander Korotkov.

Re: [HACKERS] PoC: Partial sort

2014-09-15 Thread Alexander Korotkov

On Sun, Sep 14, 2014 at 9:32 AM, Peter Geoghegan p...@heroku.com wrote:

 I think we might be better off if a tuplesort function was called
 shortly after tuplesort_begin_heap() is called. How top-n heap sorts
 work is something that largely lives in tuplesort's head. Today, we
 call tuplesort_set_bound() to hint to tuplesort By the way, this is a
 top-n heap sort applicable sort. I think that with this patch, we
 should then hint (where applicable) by the way, you won't actually be
 required to sort those first n indexed attributes; rather, you can
 expect to scan those in logical order. You may work the rest out
 yourself, and may be clever about exploiting the sorted-ness of the
 first few columns. The idea of managing a bunch of tiny sorts from
 with ExecSort(), and calling the new function tuplesort_reset() seems
 questionable. tuplesortstate is supposed to be private/opaque to
 nodeSort.c, and the current design strains that.

 I'd like to keep nodeSort.c simple. I think it's pretty clear that the
 guts of this do not belong within ExecSort(), in any case. Also, the
 additions there should be much better commented, wherever they finally
 end up.


As I understand, you propose to incapsulate partial sort algorithm into
tuplesort. However, in this case we anyway need some significant change of
its interface: let tuplesort decide when it's able to return tuple.
Otherwise, we would miss significant part of LIMIT clause optimization.
tuplesort_set_bound() can't solve all the cases. There could be other
planner nodes between the partial sort and LIMIT.

--
With best regards,
Alexander Korotkov.

Re: [HACKERS] PoC: Partial sort

2014-09-13 Thread Alexander Korotkov

On Tue, Aug 19, 2014 at 2:02 PM, David Rowley dgrowle...@gmail.com wrote:

 Here's a few notes from reading over the code:

 * pathkeys.c

   EquivalenceMember *member = (EquivalenceMember *)
  lfirst(list_head(key-pk_eclass-ec_members));

 You can use linitial() instead of lfirst(list_head()). The same thing
 occurs in costsize.c


Fixed.


 * pathkeys.c

 The following fragment:

 n = pathkeys_common(root-query_pathkeys, pathkeys);

 if (n != 0)
 {
 /* It's useful ... or at least the first N keys are */
  return n;
 }

 return 0; /* path ordering not useful */
 }

 Could just read:

 /* return the number of path keys in common, or 0 if there are none */
  return pathkeys_common(root-query_pathkeys, pathkeys);


Fixed.


 * execnodes.h

 In struct SortState, some new fields don't have a comment.


Fixed.


 I've also thrown a few different workloads at the patch and I'm very
 impressed with most of the results. Especially when LIMIT is used, however
 I've found a regression case which I thought I should highlight, but for
 now I can't quite see what could be done to fix it.

 create table a (x int not null, y int not null);
 insert into a select x.x,y.y from generate_series(1,100) x(x) cross
 join generate_series(1,10) y(y);

 Patched:
 explain analyze select x,y from a where x+0=1 order by x,y limit 10;
  QUERY PLAN

 
  Limit  (cost=92.42..163.21 rows=10 width=8) (actual
 time=6239.426..6239.429 rows=10 loops=1)
-  Partial sort  (cost=92.42..354064.37 rows=5 width=8) (actual
 time=6239.406..6239.407 rows=10 loops=1)
  Sort Key: x, y
  Presorted Key: x
  Sort Method: quicksort  Memory: 25kB
  -  Index Scan using a_x_idx on a  (cost=0.44..353939.13
 rows=5 width=8) (actual time=0.059..6239.319 rows=10 loops=1)
Filter: ((x + 0) = 1)
Rows Removed by Filter: 990
  Planning time: 0.212 ms
  Execution time: 6239.505 ms
 (10 rows)


 Time: 6241.220 ms

 Unpatched:
 explain analyze select x,y from a where x+0=1 order by x,y limit 10;
  QUERY PLAN

 
  Limit  (cost=195328.26..195328.28 rows=10 width=8) (actual
 time=3077.759..3077.761 rows=10 loops=1)
-  Sort  (cost=195328.26..195453.26 rows=5 width=8) (actual
 time=3077.757..3077.757 rows=10 loops=1)
  Sort Key: x, y
  Sort Method: quicksort  Memory: 25kB
  -  Seq Scan on a  (cost=0.00..194247.77 rows=5 width=8)
 (actual time=0.018..3077.705 rows=10 loops=1)
Filter: ((x + 0) = 1)
Rows Removed by Filter: 990
  Planning time: 0.510 ms
  Execution time: 3077.837 ms
 (9 rows)


 Time: 3080.201 ms

 As you can see, the patched version performs an index scan in order to get
 the partially sorted results, but it does end up quite a bit slower than
 the seqscan/sort that the unpatched master performs. I'm not quite sure how
 realistic the x+0 = 1 WHERE clause is, but perhaps the same would happen if
 something like x+y = 1 was performed too After a bit more analysis on
 this, I see that if I change the 50k estimate to 10 in the debugger that
 the num_groups is properly estimated at 1 and it then performs the seq scan
 instead. So it looks like the costings of the patch are not to blame here.
 (The 50k row estimate comes from rel tuples  / DEFAULT_NUM_DISTINCT)


Yes, the error comes from assumption of 50k row estimate. I've checked
similar example when estimate is fine.

create table b as (select x.x,y.y,x.x z from generate_series(1,100)
x(x) cross join generate_series(1,10) y(y));
create index b_x_idx on b(x);
analyze b;

There is column z which is both not in index and not in order by clause.
If we replace x+0=1 with z=1 optimizer didn't decide to use partial
sort.

explain analyze select x,y,z from b where z=1 order by x,y limit 10;
QUERY PLAN
──
 Limit  (cost=179056.59..179056.61 rows=10 width=12) (actual
time=1072.498..1072.500 rows=10 loops=1)
   -  Sort  (cost=179056.59..179056.63 rows=18 width=12) (actual
time=1072.495..1072.495 rows=10 loops=1)
 Sort Key: x, y
 Sort Method: quicksort  Memory: 25kB
 -  Seq Scan on b  (cost=0.00..179056.21 rows=18 width=12) (actual
time=0.020..1072.454 rows=10 loops=1)
   Filter: (z = 1)
   Rows Removed by Filter: 990
 Planning time: 0.501 ms
 Execution time: 1072.555 ms
(9 rows)

If we event force optimizer to use partial sort then cost estimation will
be fine.

set enable_seqscan = off;
explain

Re: [HACKERS] PoC: Partial sort

2014-09-13 Thread Peter Geoghegan

On Fri, Sep 12, 2014 at 2:19 PM, Alexander Korotkov
aekorot...@gmail.com wrote:
 Actually, higher cardinality skip columns is better. Sorting of smaller
 groups is faster than sorting larger groups of same size. Also, with smaller
 groups you achieve limit more accurate (in average), i.e. sort smaller
 amount of total rows.

Higher cardinality leading key columns are better for what? Do you
mean they're better in that those cases are more sympathetic to your
patch, or better in the general sense that they'll perform better for
the user? The first example query, that you chose to demonstrate your
patch had a leading, indexed column (column v1) with much lower
cardinality/n_distinct than the column that had to be sorted on
(column v2). That was what my comments were based on.

When this feature is used, there will often be a significantly lower
cardinality in the leading, indexed column (as in your example).
Otherwise, the user might well have been happy to just order on the
indexed/leading column alone. That isn't definitely true, but it's
often true.

 I'm not sure if that's worth it to more or less
 duplicate heap_tuple_attr_equals() to save a mere n expensive
 comparisons, but it's something to think about (actually, there are
 probably less than even n comparisons in practice because there'll be
 a limit).

 Not correct. Smaller groups are not OK. Imagine that two representations of
 same skip column value exists. Index may return them in any order, even
 change them one by one. In this case sorting on other column never takes
 place, while it should.

I think I explained this badly - it wouldn't be okay to make the
grouping smaller, but as you say we could tie-break with a proper
B-Tree support function 1 comparison to make sure we really had
reached the end of our grouping.

FWIW I want all bttextcmp()/varstr_cmp() comparisons to try a memcmp()
first, so the use of the equality operator with text in mind that you
mention may soon not be useful at all. The evidence suggests that
memcmp() is so much cheaper than special preparatory NUL-termination +
strcoll() that we should always try it first when sorting text, even
when we have no good reason to think it will work. The memcmp() is
virtually free. And so, you see why it might be worth thinking about
this when we already have reasonable confidence that many comparisons
will indicate that datums are equal. Other datatypes will have
expensive real comparators, not just text or numeric.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-09-13 Thread Peter Geoghegan

Some quick comments on partial-sort-basic-2.patch:

 *** a/src/include/utils/tuplesort.h
 --- b/src/include/utils/tuplesort.h
 ***
 *** 24,29 
 --- 24,30 
   #include executor/tuptable.h
   #include fmgr.h
   #include utils/relcache.h
 + #include utils/sortsupport.h

Why include sortsupport.h here?

I would like to see more comments, especially within ExecSort(). The
structure of that routine is quite unclear.

I don't really like this MakeSortSupportKeys() stuff within ExecSort():

 !   /* Support structures for cmpSortSkipCols - already sorted columns */
 !   if (skipCols)
 !   node-skipKeys = MakeSortSupportKeys(skipCols,
 !plannode-sortColIdx,
 !plannode-sortOperators,
 !plannode-collations,
 !plannode-nullsFirst);

 +   /* Only pass on remaining columns that are unsorted */
 tuplesortstate = tuplesort_begin_heap(tupDesc,
 ! plannode-numCols - skipCols,
 ! 
 (plannode-sortColIdx[skipCols]),
 ! 
 (plannode-sortOperators[skipCols]),
 ! 
 (plannode-collations[skipCols]),
 ! 
 (plannode-nullsFirst[skipCols]),
   work_mem,
   node-randomAccess);

You're calling the new function MakeSortSupportKeys() (which
encapsulates setting up sortsupport state for all attributes) twice;
first, to populate the skip keys (the indexed attribute(s)), and
second, when tuplesort_begin_heap() is called, so that there is state
for unindexed sort groups that must be manually sorted. That doesn't
seem great.

I think we might be better off if a tuplesort function was called
shortly after tuplesort_begin_heap() is called. How top-n heap sorts
work is something that largely lives in tuplesort's head. Today, we
call tuplesort_set_bound() to hint to tuplesort By the way, this is a
top-n heap sort applicable sort. I think that with this patch, we
should then hint (where applicable) by the way, you won't actually be
required to sort those first n indexed attributes; rather, you can
expect to scan those in logical order. You may work the rest out
yourself, and may be clever about exploiting the sorted-ness of the
first few columns. The idea of managing a bunch of tiny sorts from
with ExecSort(), and calling the new function tuplesort_reset() seems
questionable. tuplesortstate is supposed to be private/opaque to
nodeSort.c, and the current design strains that.

I'd like to keep nodeSort.c simple. I think it's pretty clear that the
guts of this do not belong within ExecSort(), in any case. Also, the
additions there should be much better commented, wherever they finally
end up.

In this struct:

 *** a/src/include/nodes/execnodes.h
 --- b/src/include/nodes/execnodes.h
 *** typedef struct SortState
 *** 1670,1678 
 --- 1670,1682 
  boolbounded;/* is the result set bounded? */
  int64   bound;  /* if bounded, how many tuples are needed */
  boolsort_Done;  /* sort completed yet? */
 +   boolfinished;   /* fetching tuples from outer node
 +  is finished ? */
  boolbounded_Done;   /* value of bounded we did the sort with */
  int64   bound_Done; /* value of bound we did the sort with */
  void   *tuplesortstate; /* private state of tuplesort.c */
 +   SortSupport skipKeys;   /* columns already sorted in input */
 +   HeapTuple   prev;   /* previous tuple from outer node */
   } SortState;

I think you should be clearer about the scope and duration of fields
like finished, if this really belongs here. In general, there should
be some high-level comments about how the feature added by the patch
fits together, which there isn't right now.

That's all I have for now.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-09-12 Thread Alexander Korotkov

On Sun, Jul 13, 2014 at 6:45 AM, Peter Geoghegan p...@heroku.com wrote:

 On Mon, Feb 10, 2014 at 10:59 AM, Alexander Korotkov
 aekorot...@gmail.com wrote:
  Done. Patch is splitted.

 I took a quick look at this.

 Have you thought about making your new cmpSortSkipCols() function not
 use real comparisons? Since in the circumstances in which this
 optimization is expected to be effective (e.g. your original example)
 we can also expect a relatively low cardinality for the first n
 indexed attributes (again, as in your original example), in general
 when cmpSortSkipCols() is called there is a high chance that it will
 return true. If any pair of tuples (logically adjacent tuples fed in
 to cmpSortSkipCols() by an index scan in logical order) are not fully
 equal (i.e. their leading, indexed attributes are not equal) then we
 don't care about the details -- we just know that a new sort grouping
 is required.


Actually, higher cardinality skip columns is better. Sorting of smaller
groups is faster than sorting larger groups of same size. Also, with
smaller groups you achieve limit more accurate (in average), i.e. sort
smaller amount of total rows.


 The idea here is that you can get away with simple binary equality
 comparisons, as we do when considering HOT-safety. Of course, you
 might find that two bitwise unequal values are equal according to
 their ordinary B-Tree support function 1 comparator (e.g. two numerics
 that differ only in their display scale). AFAICT this should be okay,
 since that just means that you have smaller sort groupings than
 strictly necessary. I'm not sure if that's worth it to more or less
 duplicate heap_tuple_attr_equals() to save a mere n expensive
 comparisons, but it's something to think about (actually, there are
 probably less than even n comparisons in practice because there'll be
 a limit).


Not correct. Smaller groups are not OK. Imagine that two representations of
same skip column value exists. Index may return them in any order, even
change them one by one. In this case sorting on other column never takes
place, while it should. But some optimizations are still possible:

   1. Use bitwise comparison first, then recheck. But, no guarantees that
   acceleration will be achieved.
   2. Use equality check instead of btree comparison. For text datatype
   it would be rather faster because of no locale-aware comparison.


--
With best regards,
Alexander Korotkov.

Re: [HACKERS] PoC: Partial sort

2014-08-19 Thread David Rowley

On Tue, Feb 11, 2014 at 7:59 AM, Alexander Korotkov aekorot...@gmail.com
wrote:

 Done. Patch is splitted.


I've started to look at this, and for now I'm still finding my way around
the patch, so I'm not quite there yet with understanding everything.
Never-the-less it seems best to post my comments early, so as to help
maintain concurrency between the review and getting the patch into shape.

I've only been looking at partial-sort-basic-1.patch so far;

The patch no longer applies to master, but this was only due to a tab being
replaced by 2 spaces in a pgident run. I've attached an updated patch which
currently applies without any issues.

Here's a few notes from reading over the code:

* pathkeys.c

  EquivalenceMember *member = (EquivalenceMember *)
lfirst(list_head(key-pk_eclass-ec_members));

You can use linitial() instead of lfirst(list_head()). The same thing
occurs in costsize.c

* pathkeys.c

The following fragment:

n = pathkeys_common(root-query_pathkeys, pathkeys);

if (n != 0)
{
/* It's useful ... or at least the first N keys are */
return n;
}

return 0; /* path ordering not useful */
}

Could just read:

/* return the number of path keys in common, or 0 if there are none */
return pathkeys_common(root-query_pathkeys, pathkeys);

* execnodes.h

In struct SortState, some new fields don't have a comment.


I've also thrown a few different workloads at the patch and I'm very
impressed with most of the results. Especially when LIMIT is used, however
I've found a regression case which I thought I should highlight, but for
now I can't quite see what could be done to fix it.

create table a (x int not null, y int not null);
insert into a select x.x,y.y from generate_series(1,100) x(x) cross
join generate_series(1,10) y(y);

Patched:
explain analyze select x,y from a where x+0=1 order by x,y limit 10;
 QUERY PLAN

 Limit  (cost=92.42..163.21 rows=10 width=8) (actual
time=6239.426..6239.429 rows=10 loops=1)
   -  Partial sort  (cost=92.42..354064.37 rows=5 width=8) (actual
time=6239.406..6239.407 rows=10 loops=1)
 Sort Key: x, y
 Presorted Key: x
 Sort Method: quicksort  Memory: 25kB
 -  Index Scan using a_x_idx on a  (cost=0.44..353939.13
rows=5 width=8) (actual time=0.059..6239.319 rows=10 loops=1)
   Filter: ((x + 0) = 1)
   Rows Removed by Filter: 990
 Planning time: 0.212 ms
 Execution time: 6239.505 ms
(10 rows)


Time: 6241.220 ms

Unpatched:
explain analyze select x,y from a where x+0=1 order by x,y limit 10;
 QUERY PLAN

 Limit  (cost=195328.26..195328.28 rows=10 width=8) (actual
time=3077.759..3077.761 rows=10 loops=1)
   -  Sort  (cost=195328.26..195453.26 rows=5 width=8) (actual
time=3077.757..3077.757 rows=10 loops=1)
 Sort Key: x, y
 Sort Method: quicksort  Memory: 25kB
 -  Seq Scan on a  (cost=0.00..194247.77 rows=5 width=8)
(actual time=0.018..3077.705 rows=10 loops=1)
   Filter: ((x + 0) = 1)
   Rows Removed by Filter: 990
 Planning time: 0.510 ms
 Execution time: 3077.837 ms
(9 rows)


Time: 3080.201 ms

As you can see, the patched version performs an index scan in order to get
the partially sorted results, but it does end up quite a bit slower than
the seqscan/sort that the unpatched master performs. I'm not quite sure how
realistic the x+0 = 1 WHERE clause is, but perhaps the same would happen if
something like x+y = 1 was performed too After a bit more analysis on
this, I see that if I change the 50k estimate to 10 in the debugger that
the num_groups is properly estimated at 1 and it then performs the seq scan
instead. So it looks like the costings of the patch are not to blame here.
(The 50k row estimate comes from rel tuples  / DEFAULT_NUM_DISTINCT)

That's all I have at the moment... More to follow soon.

Regards

David Rowley


partial-sort-basic-1_rebased.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-07-12 Thread Peter Geoghegan

On Mon, Feb 10, 2014 at 10:59 AM, Alexander Korotkov
aekorot...@gmail.com wrote:
 Done. Patch is splitted.

I took a quick look at this.

Have you thought about making your new cmpSortSkipCols() function not
use real comparisons? Since in the circumstances in which this
optimization is expected to be effective (e.g. your original example)
we can also expect a relatively low cardinality for the first n
indexed attributes (again, as in your original example), in general
when cmpSortSkipCols() is called there is a high chance that it will
return true. If any pair of tuples (logically adjacent tuples fed in
to cmpSortSkipCols() by an index scan in logical order) are not fully
equal (i.e. their leading, indexed attributes are not equal) then we
don't care about the details -- we just know that a new sort grouping
is required.

The idea here is that you can get away with simple binary equality
comparisons, as we do when considering HOT-safety. Of course, you
might find that two bitwise unequal values are equal according to
their ordinary B-Tree support function 1 comparator (e.g. two numerics
that differ only in their display scale). AFAICT this should be okay,
since that just means that you have smaller sort groupings than
strictly necessary. I'm not sure if that's worth it to more or less
duplicate heap_tuple_attr_equals() to save a mere n expensive
comparisons, but it's something to think about (actually, there are
probably less than even n comparisons in practice because there'll be
a limit).

A similar idea appears in my SortSupport for text (Poor man's
normalized key/strxfrm()) patch. A poor man's key comparison didn't
work out, and there may be further differences that aren't captured in
the special simple key representation, so we need to do a proper
comparison to figure it out for sure. However, within the sortsupport
routine comparator, we know that we're being called in this context,
as a tie-breaker for a poor man's normalized key comparison that
returned 0, and so are optimistic about the two datums being fully
equal. An optimistic memcmp() is attempted before a strcoll() here if
the lengths also match.

I have not actually added special hints so that we're optimistic about
keys being equal in other places (places that have nothing to do with
the general idea of poor man's normalized keys), but that might not be
a bad idea. Actually, it might not be a bad idea to just always have
varstr_cmp() attempt a memcmp() first when two texts have equal
length, no matter how it's called.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-02-24 Thread Robert Haas

On Wed, Feb 19, 2014 at 1:39 PM, Marti Raudsepp ma...@juffo.org wrote:
 On Wed, Feb 12, 2014 at 11:54 PM, Marti Raudsepp ma...@juffo.org wrote:
 With partial-sort-basic-1 and this fix on the same test suite, the
 planner overhead is now a more manageable 0.5% to 1.3%; one test is
 faster by 0.5%.

 Ping, Robert or anyone, does this overhead seem bearable or is that
 still too much?

 Do these numbers look conclusive enough or should I run more tests?

Tom should really be the one to comment on this, I think.  I read
through the patch quickly and it looks much less scary than the early
versions, but it's not obvious to me whether the remaining overhead is
enough to worry about.  I'd need to spend more time studying it to
form a really sound opinion on that topic, and unfortunately I don't
have that time right now.

I think it'd be interesting to try to determine specifically where
that overhead is coming from.  Pick the test case where it's the worst
(1.3%) and do a perf with and without the patch and look at the
difference in the call graph.  It's possible we could have changes on
that order of magnitude just from more or less fortuitous code layout
decisions as code shifts around, but it's also possible that there's a
real effect there we should think harder about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-02-19 Thread Marti Raudsepp

On Wed, Feb 12, 2014 at 11:54 PM, Marti Raudsepp ma...@juffo.org wrote:
 With partial-sort-basic-1 and this fix on the same test suite, the
 planner overhead is now a more manageable 0.5% to 1.3%; one test is
 faster by 0.5%.

Ping, Robert or anyone, does this overhead seem bearable or is that
still too much?

Do these numbers look conclusive enough or should I run more tests?

 I think the 1st patch now has a bug in initial_cost_mergejoin; you
 still pass the presorted_keys argument to cost_sort, making it
 calculate a partial sort cost

Ping, Alexander?

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-02-19 Thread Alexander Korotkov

On Thu, Feb 13, 2014 at 1:54 AM, Marti Raudsepp ma...@juffo.org wrote:

 I think the 1st patch now has a bug in initial_cost_mergejoin; you
 still pass the presorted_keys argument to cost_sort, making it
 calculate a partial sort cost, but generated plans never use partial
 sort. I think 0 should be passed instead. Patch attached, needs to be
 applied on top of partial-sort-basic-1 and then reverse-applied on
 partial-sort-merge-1.


It doesn't look so for me. Merge join doesn't find partial sort especially.
But if path with some presorted pathkeys will be accidentally selected then
partial sort will be used. See create_mergejoin_plan function. So, I think
this cost_sort call is relevant to create_mergejoin_plan. If we don't want
partial sort to be used in such rare cases then we should revert it from
both places. However, I doubt that it does any overhead, so we can leave it
as is.

--
With best regards,
Alexander Korotkov.

Re: [HACKERS] PoC: Partial sort

2014-02-12 Thread Marti Raudsepp

On Mon, Feb 10, 2014 at 8:59 PM, Alexander Korotkov
aekorot...@gmail.com wrote:
 Done. Patch is splitted.

Thanks!

I think the 1st patch now has a bug in initial_cost_mergejoin; you
still pass the presorted_keys argument to cost_sort, making it
calculate a partial sort cost, but generated plans never use partial
sort. I think 0 should be passed instead. Patch attached, needs to be
applied on top of partial-sort-basic-1 and then reverse-applied on
partial-sort-merge-1.

With partial-sort-basic-1 and this fix on the same test suite, the
planner overhead is now a more manageable 0.5% to 1.3%; one test is
faster by 0.5%. Built with asserts disabled, ran on Intel i5-3570K. In
an effort to reduce variance, I locked the server and pgbench to a
single CPU core (taskset -c 3), but there are still noticeable
run-to-run differences, so these numbers are a bit fuzzy. The faster
result is definitely not a fluke, however; it happens every time.

 On Mon, Feb 10, 2014 at 2:33 PM, Marti Raudsepp ma...@juffo.org wrote:
 AFAICT this only happens once per plan and the overhead is O(n) to the
 number of pathkeys?

I was of course wrong about that, it also adds extra overhead when
iterating over the paths list.


Test taskset -c 3 run2.sh from
https://github.com/intgr/benchjunk/tree/master/partial_sort

Overhead percentages (between best of each 3 runs):
join5.sql 0.7
star5.sql 0.8
line5.sql 0.5
lim_join5.sql -0.5
lim_star5.sql 1.3
lim_line5.sql 0.5
limord_join5.sql 0.6
limord_star5.sql 0.5
limord_line5.sql 0.7

Raw results:
git 48870dd
join5.sql tps = 509.328070 (excluding connections establishing)
join5.sql tps = 509.772190 (excluding connections establishing)
join5.sql tps = 510.651517 (excluding connections establishing)
star5.sql tps = 499.208698 (excluding connections establishing)
star5.sql tps = 498.200314 (excluding connections establishing)
star5.sql tps = 496.269315 (excluding connections establishing)
line5.sql tps = 797.968831 (excluding connections establishing)
line5.sql tps = 797.011690 (excluding connections establishing)
line5.sql tps = 796.379258 (excluding connections establishing)
lim_join5.sql tps = 394.946024 (excluding connections establishing)
lim_join5.sql tps = 395.417689 (excluding connections establishing)
lim_join5.sql tps = 395.482958 (excluding connections establishing)
lim_star5.sql tps = 423.434393 (excluding connections establishing)
lim_star5.sql tps = 423.774305 (excluding connections establishing)
lim_star5.sql tps = 424.386099 (excluding connections establishing)
lim_line5.sql tps = 733.007330 (excluding connections establishing)
lim_line5.sql tps = 731.794731 (excluding connections establishing)
lim_line5.sql tps = 732.356280 (excluding connections establishing)
limord_join5.sql tps = 385.317921 (excluding connections establishing)
limord_join5.sql tps = 385.915870 (excluding connections establishing)
limord_join5.sql tps = 384.747848 (excluding connections establishing)
limord_star5.sql tps = 417.992615 (excluding connections establishing)
limord_star5.sql tps = 416.944685 (excluding connections establishing)
limord_star5.sql tps = 418.262647 (excluding connections establishing)
limord_line5.sql tps = 708.979203 (excluding connections establishing)
limord_line5.sql tps = 710.926866 (excluding connections establishing)
limord_line5.sql tps = 710.928907 (excluding connections establishing)

48870dd + partial-sort-basic-1.patch.gz + fix-cost_sort.patch
join5.sql tps = 505.488181 (excluding connections establishing)
join5.sql tps = 507.222759 (excluding connections establishing)
join5.sql tps = 506.549654 (excluding connections establishing)
star5.sql tps = 495.432915 (excluding connections establishing)
star5.sql tps = 494.906793 (excluding connections establishing)
star5.sql tps = 492.623808 (excluding connections establishing)
line5.sql tps = 789.315968 (excluding connections establishing)
line5.sql tps = 793.875456 (excluding connections establishing)
line5.sql tps = 790.545990 (excluding connections establishing)
lim_join5.sql tps = 396.956732 (excluding connections establishing)
lim_join5.sql tps = 397.515213 (excluding connections establishing)
lim_join5.sql tps = 397.578669 (excluding connections establishing)
lim_star5.sql tps = 417.459963 (excluding connections establishing)
lim_star5.sql tps = 418.024803 (excluding connections establishing)
lim_star5.sql tps = 418.830234 (excluding connections establishing)
lim_line5.sql tps = 729.186915 (excluding connections establishing)
lim_line5.sql tps = 726.288788 (excluding connections establishing)
lim_line5.sql tps = 728.123296 (excluding connections establishing)
limord_join5.sql tps = 383.484767 (excluding connections establishing)
limord_join5.sql tps = 383.021960 (excluding connections establishing)
limord_join5.sql tps = 383.722051 (excluding connections establishing)
limord_star5.sql tps = 414.138460 (excluding connections establishing)
limord_star5.sql tps = 414.063766 (excluding connections establishing)
limord_star5.sql

Re: [HACKERS] PoC: Partial sort

2014-02-10 Thread Marti Raudsepp

On Sun, Feb 9, 2014 at 7:37 PM, Alexander Korotkov aekorot...@gmail.com wrote:
 This is not only place that worry me about planning overhead. See
 get_cheapest_fractional_path_for_pathkeys. I had to estimate number of
 groups for each sorting column in order to get right fractional path.

AFAICT this only happens once per plan and the overhead is O(n) to the
number of pathkeys? I can't get worried about that, but I guess it's
better to test anyway.

PS: You didn't answer my questions about splitting the patch. I guess
I'll have to do that anyway to run the tests.

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-02-10 Thread Alexander Korotkov

On Mon, Feb 10, 2014 at 2:33 PM, Marti Raudsepp ma...@juffo.org wrote:

 On Sun, Feb 9, 2014 at 7:37 PM, Alexander Korotkov aekorot...@gmail.com
 wrote:
  This is not only place that worry me about planning overhead. See
  get_cheapest_fractional_path_for_pathkeys. I had to estimate number of
  groups for each sorting column in order to get right fractional path.

 AFAICT this only happens once per plan and the overhead is O(n) to the
 number of pathkeys? I can't get worried about that, but I guess it's
 better to test anyway.

 PS: You didn't answer my questions about splitting the patch. I guess
 I'll have to do that anyway to run the tests.


Done. Patch is splitted.

--
With best regards,
Alexander Korotkov.


partial-sort-basic-1.patch.gz
Description: GNU Zip compressed data


partial-sort-merge-1.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-02-09 Thread Alexander Korotkov

On Thu, Feb 6, 2014 at 12:39 PM, Marti Raudsepp ma...@juffo.org wrote:

 On Thu, Feb 6, 2014 at 5:31 AM, Robert Haas robertmh...@gmail.com wrote:
  Hmm, sounds a little steep.  Why is it so expensive?  I'm probably
  missing something here, because I would have thought that planner
  support for partial sorts would consist mostly of considering the same
  sorts we consider today, but with the costs reduced by the batching.

 I guess it's because the patch undoes some optimizations in the
 mergejoin planner wrt caching merge clauses and adds a whole lot of
 code to find_mergeclauses_for_pathkeys. In other code paths the
 overhead does seem to be negligible.

 Notice the removal of:
 /* Select the right mergeclauses, if we didn't already */
 /*
  * Avoid rebuilding clause list if we already made one;
  * saves memory in big join trees...
  */


This is not only place that worry me about planning overhead.
See get_cheapest_fractional_path_for_pathkeys. I had to estimate number of
groups for each sorting column in order to get right fractional path. For
partial sort path, cost of first batch should be included into initial
cost.
If don't do so, optimizer can pick up strange plans basing on assumption
that it need only few rows from inner node. See an example.

create table test1 as (
select id,
(random()*100)::int as v1,
(random()*1)::int as v2
from generate_series(1,100) id);

create table test2 as (
select id,
(random()*100)::int as v1,
(random()*1)::int as v2
from generate_series(1,100) id);

create index test1_v1_idx on test1 (v1);

Plan without fraction estimation in
get_cheapest_fractional_path_for_pathkeys:

postgres=# explain select * from test1 t1 join test2 t2 on t1.v1 = t2.v1
order by t1.v1, t1.id limit 10;
QUERY PLAN

--
 Limit  (cost=198956893.20..198956913.33 rows=10 width=24)
   -  Partial sort  (cost=198956893.20..19909637942.82 rows=9791031169
width=24)
 Sort Key: t1.v1, t1.id
 Presorted Key: t1.v1
 -  Nested Loop  (cost=0.42..19883065506.84 rows=9791031169
width=24)
   Join Filter: (t1.v1 = t2.v1)
   -  Index Scan using test1_v1_idx on test1 t1
 (cost=0.42..47600.84 rows=100 width=12)
   -  Materialize  (cost=0.00..25289.00 rows=100 width=12)
 -  Seq Scan on test2 t2  (cost=0.00..15406.00
rows=100 width=12)
(9 rows)

Current version of patch:

postgres=# explain select * from test1 t1 join test2 t2 on t1.v1 = t2.v1
order by t1.v1, t1.id limit 10;
QUERY PLAN

--
 Limit  (cost=3699913.43..3699913.60 rows=10 width=24)
   -  Partial sort  (cost=3699913.43..173638549.67 rows=9791031169
width=24)
 Sort Key: t1.v1, t1.id
 Presorted Key: t1.v1
 -  Merge Join  (cost=150444.79..147066113.70 rows=9791031169
width=24)
   Merge Cond: (t1.v1 = t2.v1)
   -  Index Scan using test1_v1_idx on test1 t1
 (cost=0.42..47600.84 rows=100 width=12)
   -  Materialize  (cost=149244.84..154244.84 rows=100
width=12)
 -  Sort  (cost=149244.84..151744.84 rows=100
width=12)
   Sort Key: t2.v1
   -  Seq Scan on test2 t2  (cost=0.00..15406.00
rows=100 width=12)
(11 rows)

I don't compare actual execution times because I didn't wait until first
plan execution ends up :-)
But anyway costs are extraordinary and inner sequential scan of 100
rows is odd.

--
With best regards,
Alexander Korotkov.

Re: [HACKERS] PoC: Partial sort

2014-02-06 Thread Marti Raudsepp

On Thu, Feb 6, 2014 at 5:31 AM, Robert Haas robertmh...@gmail.com wrote:
 Hmm, sounds a little steep.  Why is it so expensive?  I'm probably
 missing something here, because I would have thought that planner
 support for partial sorts would consist mostly of considering the same
 sorts we consider today, but with the costs reduced by the batching.

I guess it's because the patch undoes some optimizations in the
mergejoin planner wrt caching merge clauses and adds a whole lot of
code to find_mergeclauses_for_pathkeys. In other code paths the
overhead does seem to be negligible.

Notice the removal of:
/* Select the right mergeclauses, if we didn't already */
/*
 * Avoid rebuilding clause list if we already made one;
 * saves memory in big join trees...
 */

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-02-06 Thread Robert Haas

On Thu, Feb 6, 2014 at 3:39 AM, Marti Raudsepp ma...@juffo.org wrote:
 On Thu, Feb 6, 2014 at 5:31 AM, Robert Haas robertmh...@gmail.com wrote:
 Hmm, sounds a little steep.  Why is it so expensive?  I'm probably
 missing something here, because I would have thought that planner
 support for partial sorts would consist mostly of considering the same
 sorts we consider today, but with the costs reduced by the batching.

 I guess it's because the patch undoes some optimizations in the
 mergejoin planner wrt caching merge clauses and adds a whole lot of
 code to find_mergeclauses_for_pathkeys. In other code paths the
 overhead does seem to be negligible.

 Notice the removal of:
 /* Select the right mergeclauses, if we didn't already */
 /*
  * Avoid rebuilding clause list if we already made one;
  * saves memory in big join trees...
  */

Yeah, I noticed that.  My feeling is that those optimizations got put
in there because someone found them to be important, so I'm skeptical
about removing them.  It may be that having the capability to do a
partial sort makes it seem worth spending more CPU looking for merge
joins, but I'd vote for making any such change a separate patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-02-06 Thread Marti Raudsepp

On Thu, Feb 6, 2014 at 9:15 PM, Robert Haas robertmh...@gmail.com wrote:
 It may be that having the capability to do a
 partial sort makes it seem worth spending more CPU looking for merge
 joins, but I'd vote for making any such change a separate patch.

Agreed.

Alexander, should I work on splitting up the patch in two, or do you
want to do it yourself?

Should I merge my coding style and enable_partialsort patches while at
it, or do you still have reservations about those?

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-02-06 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Thu, Feb 6, 2014 at 3:39 AM, Marti Raudsepp ma...@juffo.org wrote:
 I guess it's because the patch undoes some optimizations in the
 mergejoin planner wrt caching merge clauses and adds a whole lot of
 code to find_mergeclauses_for_pathkeys. In other code paths the
 overhead does seem to be negligible.

 Yeah, I noticed that.  My feeling is that those optimizations got put
 in there because someone found them to be important, so I'm skeptical
 about removing them.

I put them in, and yeah they are important.  Even with those, and even
with the rather arbitrary heuristic restrictions that joinpath.c puts on
what mergeclause lists to consider, the existing planner spends a whole
lot of effort on mergejoins --- possibly disproportionate to their actual
value.  I think that any patch that removes those optimizations is not
going to fly.  If anything, it'd be better to reduce the number of
mergejoins considered even further, because a lot of the possible plans
are not usefully different.

It's already the case that we expect indxpath.c to predict the useful
orderings (by reference to query_pathkeys and available mergejoin clauses)
and generate suitable paths, rather than trying to identify the orderings
at join time.  Can't that approach be extended to cover this technique?

In any case, the bottom line is that we don't want this patch to cause
the planner to consider large numbers of new but useless sort orderings.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-02-05 Thread Marti Raudsepp

On Tue, Jan 28, 2014 at 7:51 AM, Alexander Korotkov aekorot...@gmail.comwrote:

 On Tue, Jan 28, 2014 at 7:41 AM, Marti Raudsepp ma...@juffo.org wrote:

 But some benchmarks of planning performance are certainly warranted.


 I didn't test it, but I worry that overhead might be high.
 If it's true then it could be like constraint_exclusion option which id
 off by default because of planning overhead.


Sorry I didn't get around to this before.

I ran some synthetic benchmarks with single-column inner joins between 5
tables, with indexes on both joined columns, using only EXPLAIN (so
measuring planning time, not execution) in 9 scenarios to excercise
different code paths. According to these measurements, the overhead ranges
between 1.0 and 4.5% depending on the scenario.


Merge join with partial sort children seems like a fairly obscure use case
(though I'm sure it can help a lot in those cases). The default should
definitely allow partial sort in normal ORDER BY queries. What's under
question here is whether to enable partial sort for mergejoin.

So I see 3 possible resolutions:
1. The overhead is deemed acceptable to enable by default, in which case
we're done here.
2. Add a three-value runtime setting like: enable_partialsort = [ off |
no_mergejoin | on ], defaulting to no_mergejoin (just to get the point
across, clearly we need better naming). This is how constraint_exclusion
works.
3. Remove the partialsort mergejoin code entirely, keeping the rest of the
cases.

What do you think?


All the tests are available here:
https://github.com/intgr/benchjunk/tree/master/partial_sort (using script
run2.sh)

Overhead by test (partial-sort-7.patch.gz):
join5.sql 2.9% (all joins on the same column)
star5.sql 1.7% (star schema kind of join)
line5.sql 1.9% (joins chained to each other)
lim_join5.sql 4.5% (same as above, with LIMIT 1)
lim_star5.sql 2.8%
lim_line5.sql 1.8%
limord_join5.sql 4.3% (same as above, with ORDER BY  LIMIT 1)
limord_star5.sql 3.9%
limord_line5.sql 1.0%

Full data:
PostgreSQL @ git ac8bc3b
join5.sql tps = 499.490173 (excluding connections establishing)
join5.sql tps = 503.756335 (excluding connections establishing)
join5.sql tps = 504.814072 (excluding connections establishing)
star5.sql tps = 492.799230 (excluding connections establishing)
star5.sql tps = 492.570615 (excluding connections establishing)
star5.sql tps = 491.949985 (excluding connections establishing)
line5.sql tps = 773.945050 (excluding connections establishing)
line5.sql tps = 773.858068 (excluding connections establishing)
line5.sql tps = 774.551240 (excluding connections establishing)
lim_join5.sql tps = 392.539745 (excluding connections establishing)
lim_join5.sql tps = 391.867549 (excluding connections establishing)
lim_join5.sql tps = 393.361655 (excluding connections establishing)
lim_star5.sql tps = 418.431804 (excluding connections establishing)
lim_star5.sql tps = 419.258985 (excluding connections establishing)
lim_star5.sql tps = 419.434697 (excluding connections establishing)
lim_line5.sql tps = 713.852506 (excluding connections establishing)
lim_line5.sql tps = 713.636694 (excluding connections establishing)
lim_line5.sql tps = 712.971719 (excluding connections establishing)
limord_join5.sql tps = 381.068465 (excluding connections establishing)
limord_join5.sql tps = 380.379359 (excluding connections establishing)
limord_join5.sql tps = 381.182385 (excluding connections establishing)
limord_star5.sql tps = 412.997935 (excluding connections establishing)
limord_star5.sql tps = 411.401352 (excluding connections establishing)
limord_star5.sql tps = 413.209784 (excluding connections establishing)
limord_line5.sql tps = 688.906406 (excluding connections establishing)
limord_line5.sql tps = 689.445483 (excluding connections establishing)
limord_line5.sql tps = 688.758042 (excluding connections establishing)

partial-sort-7.patch.gz
join5.sql tps = 479.508034 (excluding connections establishing)
join5.sql tps = 488.263674 (excluding connections establishing)
join5.sql tps = 490.127433 (excluding connections establishing)
star5.sql tps = 482.106063 (excluding connections establishing)
star5.sql tps = 484.179687 (excluding connections establishing)
star5.sql tps = 483.027372 (excluding connections establishing)
line5.sql tps = 758.092993 (excluding connections establishing)
line5.sql tps = 759.697814 (excluding connections establishing)
line5.sql tps = 759.792792 (excluding connections establishing)
lim_join5.sql tps = 375.517211 (excluding connections establishing)
lim_join5.sql tps = 375.539109 (excluding connections establishing)
lim_join5.sql tps = 375.841645 (excluding connections establishing)
lim_star5.sql tps = 407.683110 (excluding connections establishing)
lim_star5.sql tps = 407.414409 (excluding connections establishing)
lim_star5.sql tps = 407.526613 (excluding connections establishing)
lim_line5.sql tps = 699.905101 (excluding connections establishing)
lim_line5.sql tps = 700.349675 (excluding

Re: [HACKERS] PoC: Partial sort

2014-02-05 Thread Robert Haas

On Wed, Feb 5, 2014 at 6:58 PM, Marti Raudsepp ma...@juffo.org wrote:
 I ran some synthetic benchmarks with single-column inner joins between 5
 tables, with indexes on both joined columns, using only EXPLAIN (so
 measuring planning time, not execution) in 9 scenarios to excercise
 different code paths. According to these measurements, the overhead ranges
 between 1.0 and 4.5% depending on the scenario.

Hmm, sounds a little steep.  Why is it so expensive?  I'm probably
missing something here, because I would have thought that planner
support for partial sorts would consist mostly of considering the same
sorts we consider today, but with the costs reduced by the batching.
Changing the cost estimation that way can't be that much more
expensive than what we're already doing, so the overhead should be
minimal.  What the patch is actually doing seems to be something quite
a bit more invasive than that, but I'm not sure what it is exactly, or
why.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-01-28 Thread Marti Raudsepp

On Tue, Jan 28, 2014 at 7:51 AM, Alexander Korotkov
aekorot...@gmail.com wrote:
 I didn't test it, but I worry that overhead might be high.
 If it's true then it could be like constraint_exclusion option which id off
 by default because of planning overhead.

I see, that makes sense.

I will try to find the time to run some benchmarks in the coming few days.

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-01-27 Thread Alexander Korotkov

Hi!

On Tue, Jan 21, 2014 at 3:24 AM, Marti Raudsepp ma...@juffo.org wrote:

 On Tue, Jan 14, 2014 at 5:49 PM, Alexander Korotkov
 aekorot...@gmail.com wrote:
 On Tue, Jan 14, 2014 at 12:54 AM, Marti Raudsepp ma...@juffo.org wrote:
  I've been trying it out in a few situations. I implemented a new
  enable_partialsort GUC to make it easier to turn on/off, this way it's
 a lot
  easier to test. The attached patch applies on top of
 partial-sort-5.patch
 
  I though about such option. Generally not because of testing convenience,
  but because of overhead of planning. This way you implement it is quite
  naive :)

 I don't understand. I had another look at this and cost_sort still
 seems like the best place to implement this, since that's where the
 patch decides how many pre-sorted columns to use. Both mergejoin and
 simple order-by plans call into it. If enable_partialsort=false then I
 skip all pre-sorted options except full sort, making cost_sort behave
 pretty much like it did before the patch.

 I could change pathkeys_common to return 0, but that seems like a
 generic function that shouldn't be tied to partialsort. The old code
 paths called pathkeys_contained_in anyway, which has similar
 complexity. (Apart for initial_cost_mergejoin, but that doesn't seem
 special enough to make an exception for).

 Or should I use?:
   enable_partialsort ? pathkeys_common(...) : 0

  For instance, merge join rely on partial sort which will be
  replaced with simple sort.

 Are you saying that enable_partialsort=off should keep
 partialsort-based mergejoins enabled?

 Or are you saying that merge joins shouldn't use simple sort at all?
 But merge join was already able to use full Sort nodes before your
 patch.


Sorry that I didn't explained it. In particular I mean following:
1) With enable_partialsort = off all mergejoin logic should behave as
without partial sort patch.
2) With partial sort patch get_cheapest_fractional_path_for_pathkeys
function is much more expensive to execute. With enable_partialsort = off
it should be as cheap as without partial sort patch.
I'll try to implement this option in this week.
For now, I have attempt to fix extra columns in mergejoin problem. It would
be nice if you test it.

--
With best regards,
Alexander Korotkov.


partial-sort-7.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-01-27 Thread Marti Raudsepp

On Mon, Jan 27, 2014 at 9:26 PM, Alexander Korotkov
aekorot...@gmail.com wrote:
 For now, I have attempt to fix extra columns in mergejoin problem. It would
 be nice if you test it.

Yes, it solves the test cases I was trying with, thanks.

 1) With enable_partialsort = off all mergejoin logic should behave as
 without partial sort patch.
 2) With partial sort patch get_cheapest_fractional_path_for_pathkeys
 function is much more expensive to execute. With enable_partialsort = off it
 should be as cheap as without partial sort patch.

When it comes to planning time, I really don't think you should
bother. The planner enable_* settings are meant for troubleshooting,
debugging and learning about the planner. You should not expect people
to disable them in a production setting. It's not worth complicating
the code for that rare case.

This is stated in the documentation
(http://www.postgresql.org/docs/current/static/runtime-config-query.html)
and repeatedly on the mailing lists.

But some benchmarks of planning performance are certainly warranted.

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-01-27 Thread Alexander Korotkov

On Tue, Jan 28, 2014 at 7:41 AM, Marti Raudsepp ma...@juffo.org wrote:

 On Mon, Jan 27, 2014 at 9:26 PM, Alexander Korotkov
 aekorot...@gmail.com wrote:
  For now, I have attempt to fix extra columns in mergejoin problem. It
 would
  be nice if you test it.

 Yes, it solves the test cases I was trying with, thanks.

  1) With enable_partialsort = off all mergejoin logic should behave as
  without partial sort patch.
  2) With partial sort patch get_cheapest_fractional_path_for_pathkeys
  function is much more expensive to execute. With enable_partialsort =
 off it
  should be as cheap as without partial sort patch.

 When it comes to planning time, I really don't think you should
 bother. The planner enable_* settings are meant for troubleshooting,
 debugging and learning about the planner. You should not expect people
 to disable them in a production setting. It's not worth complicating
 the code for that rare case.

 This is stated in the documentation
 (http://www.postgresql.org/docs/current/static/runtime-config-query.html)
 and repeatedly on the mailing lists.

 But some benchmarks of planning performance are certainly warranted.


I didn't test it, but I worry that overhead might be high.
If it's true then it could be like constraint_exclusion option which id off
by default because of planning overhead.

--
With best regards,
Alexander Korotkov.

Re: [HACKERS] PoC: Partial sort

2014-01-26 Thread Marti Raudsepp

On Mon, Jan 20, 2014 at 2:43 PM, Alexander Korotkov
aekorot...@gmail.com wrote:
 Another changes in this version of patch:
 1) Applied patch to don't compare skipCols in tuplesort by Marti Raudsepp
 2) Adjusting sort bound after processing buckets.

Hi,

Here's a patch with some whitespace and coding style fixes for
partial-sort-6.patch

I tried to understand the mergejoin regression, but this code still
looks like Chinese to me. Can anyone else have a look at it?

Test case: 
http://www.postgresql.org/message-id/cabrt9rdd-p2rlrdhsmq8rcob46k4a5o+bgz_up2brgeeh4r...@mail.gmail.com
Original report:
http://www.postgresql.org/message-id/CABRT9RCLLUyJ=bkeB132aVA_mVNx5==lvvvqmvuqdgufztw...@mail.gmail.com

Regards,
Marti
From a3cedb922c5a12e43ee94b9d6f5a2aefba701708 Mon Sep 17 00:00:00 2001
From: Marti Raudsepp ma...@juffo.org
Date: Sun, 26 Jan 2014 16:25:45 +0200
Subject: [PATCH 1/2] Whitespace  coding style fixes

---
 src/backend/executor/nodeSort.c | 17 +
 src/backend/optimizer/path/costsize.c   |  8 
 src/backend/optimizer/path/pathkeys.c   | 18 +-
 src/backend/optimizer/plan/createplan.c |  2 +-
 src/backend/optimizer/plan/planner.c|  6 +++---
 src/backend/utils/sort/tuplesort.c  |  2 +-
 6 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index f38190d..2e50497 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -27,13 +27,14 @@
 static bool
 cmpSortSkipCols(SortState *node, TupleDesc tupDesc, HeapTuple a, TupleTableSlot *b)
 {
-	int n = ((Sort *)node-ss.ps.plan)-skipCols, i;
+	int			i,
+n = ((Sort *)node-ss.ps.plan)-skipCols;
 
 	for (i = 0; i  n; i++)
 	{
-		Datum datumA, datumB;
-		bool isnullA, isnullB;
-		AttrNumber attno = node-skipKeys[i].ssup_attno;
+		Datum		datumA, datumB;
+		bool		isnullA, isnullB;
+		AttrNumber	attno = node-skipKeys[i].ssup_attno;
 
 		datumA = heap_getattr(a, attno, tupDesc, isnullA);
 		datumB = slot_getattr(b, attno, isnullB);
@@ -147,7 +148,7 @@ ExecSort(SortState *node)
 		tuplesort_set_bound(tuplesortstate, node-bound - node-bound_Done);
 
 	/*
-	 * Put next group of tuples where skipCols sort values are equal to
+	 * Put next group of tuples where skipCols' sort values are equal to
 	 * tuplesort.
 	 */
 	for (;;)
@@ -177,10 +178,10 @@ ExecSort(SortState *node)
 			}
 			else
 			{
-bool cmp;
-cmp = cmpSortSkipCols(node, tupDesc, node-prev, slot);
+bool		equal;
+equal = cmpSortSkipCols(node, tupDesc, node-prev, slot);
 node-prev = ExecCopySlotTuple(slot);
-if (!cmp)
+if (!equal)
 	break;
 			}
 		}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 9e79c6d..3a18632 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1331,13 +1331,13 @@ cost_sort(Path *path, PlannerInfo *root,
 	 */
 	if (presorted_keys  0)
 	{
-		List *groupExprs = NIL;
-		ListCell *l;
-		int i = 0;
+		List	   *groupExprs = NIL;
+		ListCell   *l;
+		int			i = 0;
 
 		foreach(l, pathkeys)
 		{
-			PathKey *key = (PathKey *)lfirst(l);
+			PathKey	   *key = (PathKey *) lfirst(l);
 			EquivalenceMember *member = (EquivalenceMember *)
 lfirst(list_head(key-pk_eclass-ec_members));
 
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 55d8ef4..1e1a09a 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -319,10 +319,9 @@ compare_pathkeys(List *keys1, List *keys2)
 int
 pathkeys_common(List *keys1, List *keys2)
 {
-	int n;
+	int 		n = 0;
 	ListCell   *key1,
 			   *key2;
-	n = 0;
 
 	forboth(key1, keys1, key2, keys2)
 	{
@@ -460,7 +459,7 @@ get_cheapest_fractional_path_for_pathkeys(List *paths,
 	num_groups = (double *)palloc(sizeof(double) * list_length(pathkeys));
 	foreach(l, pathkeys)
 	{
-		PathKey *key = (PathKey *)lfirst(l);
+		PathKey *key = (PathKey *) lfirst(l);
 		EquivalenceMember *member = (EquivalenceMember *)
 			lfirst(list_head(key-pk_eclass-ec_members));
 
@@ -1085,7 +1084,6 @@ find_mergeclauses_for_pathkeys(PlannerInfo *root,
 	List	   *mergeclauses = NIL;
 	ListCell   *i;
 	bool	   *used = (bool *)palloc0(sizeof(bool) * list_length(restrictinfos));
-	int			k;
 	List	   *unusedRestrictinfos = NIL;
 	List	   *usedPathkeys = NIL;
 
@@ -1103,6 +1101,7 @@ find_mergeclauses_for_pathkeys(PlannerInfo *root,
 		EquivalenceClass *pathkey_ec = pathkey-pk_eclass;
 		List	   *matched_restrictinfos = NIL;
 		ListCell   *j;
+		int			k = 0;
 
 		/*--
 		 * A mergejoin clause matches a pathkey if it has the same EC.
@@ -1140,7 +1139,6 @@ find_mergeclauses_for_pathkeys(PlannerInfo *root,
 		 * deal with the case in create_mergejoin_plan().
 		 *--
 		 */
-		k = 0;
 		foreach(j, restrictinfos)
 		{
 			RestrictInfo *rinfo = (RestrictInfo *) lfirst(j);
@@ -1182,7 +1180,9 @@

Re: [HACKERS] PoC: Partial sort

2014-01-20 Thread Alexander Korotkov

On Sun, Jan 19, 2014 at 5:57 AM, Andreas Karlsson andr...@proxel.se wrote:

 On 01/18/2014 08:13 PM, Jeremy Harris wrote:

 On 31/12/13 01:41, Andreas Karlsson wrote:

 On 12/29/2013 08:24 AM, David Rowley wrote:

 If it was possible to devise some way to reuse any
 previous tuplesortstate perhaps just inventing a reset method which
 clears out tuples, then we could see performance exceed the standard
 seqscan - sort. The code the way it is seems to lookup the sort
 functions from the syscache for each group then allocate some sort
 space, so quite a bit of time is also spent in palloc0() and pfree()

 If it was not possible to do this then maybe adding a cost to the number
 of sort groups would be better so that the optimization is skipped if
 there are too many sort groups.


 It should be possible. I have hacked a quick proof of concept for
 reusing the tuplesort state. Can you try it and see if the performance
 regression is fixed by this?

 One thing which have to be fixed with my patch is that we probably want
 to close the tuplesort once we have returned the last tuple from
 ExecSort().

 I have attached my patch and the incremental patch on Alexander's patch.


 How does this work in combination with randomAccess ?


 As far as I can tell randomAccess was broken by the partial sort patch
 even before my change since it would not iterate over multiple tuplesorts
 anyway.

 Alexander: Is this true or am I missing something?


Yes, I decided that Sort node shouldn't provide randomAccess in the case of
skipCols !=0. See assert in the beginning of ExecInitSort. I decided that
it would be better to add explicit materialize node rather than store extra
tuples in tuplesortstate each time.
I also adjusted ExecSupportsMarkRestore, ExecMaterializesOutput and
ExecMaterializesOutput to make planner believe so. I found path-pathtype
to be absolutely never T_Sort. Correct me if I'm wrong.

Another changes in this version of patch:
1) Applied patch to don't compare skipCols in tuplesort by Marti Raudsepp
2) Adjusting sort bound after processing buckets.

--
With best regards,
Alexander Korotkov.


partial-sort-6.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-01-20 Thread Marti Raudsepp

Hi,

On Tue, Jan 14, 2014 at 5:49 PM, Alexander Korotkov
aekorot...@gmail.com wrote:
On Tue, Jan 14, 2014 at 12:54 AM, Marti Raudsepp ma...@juffo.org wrote:
 I've been trying it out in a few situations. I implemented a new
 enable_partialsort GUC to make it easier to turn on/off, this way it's a lot
 easier to test. The attached patch applies on top of partial-sort-5.patch

 I though about such option. Generally not because of testing convenience,
 but because of overhead of planning. This way you implement it is quite
 naive :)

I don't understand. I had another look at this and cost_sort still
seems like the best place to implement this, since that's where the
patch decides how many pre-sorted columns to use. Both mergejoin and
simple order-by plans call into it. If enable_partialsort=false then I
skip all pre-sorted options except full sort, making cost_sort behave
pretty much like it did before the patch.

I could change pathkeys_common to return 0, but that seems like a
generic function that shouldn't be tied to partialsort. The old code
paths called pathkeys_contained_in anyway, which has similar
complexity. (Apart for initial_cost_mergejoin, but that doesn't seem
special enough to make an exception for).

Or should I use?:
  enable_partialsort ? pathkeys_common(...) : 0

 For instance, merge join rely on partial sort which will be
 replaced with simple sort.

Are you saying that enable_partialsort=off should keep
partialsort-based mergejoins enabled?

Or are you saying that merge joins shouldn't use simple sort at all?
But merge join was already able to use full Sort nodes before your
patch.

What am I missing?

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-01-18 Thread Jeremy Harris


On 13/01/14 18:01, Alexander Korotkov wrote:

Thanks. It's included into attached version of patch. As wall as estimation
improvements, more comments and regression tests fix.


Would it be possible to totally separate the two sets of sort-keys,
only giving the non-index set to the tuplesort?  At present tuplesort
will, when it has a group to sort, make wasted compares on the
indexed set of keys before starting on the non-indexed set.
--
Cheers,
  Jeremy



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-01-18 Thread Marti Raudsepp

Hi,

There's another small regression with this patch when used with expensive
comparison functions, such as long text fields.

If we go through all this trouble in cmpSortSkipCols to prove that the
first N sortkeys are equal, it would be nice if Tuplesort could skip their
comparisons entirely; that's another nice optimization this patch can
provide.

I've implemented that in the attached patch, which applies on top of your
partial-sort-5.patch

Should the Sort Key field in EXPLAIN output be changed as well? I'd say
no, I think that makes the partial sort steps harder to read.

Generate test data:
create table longtext as select (select repeat('a', 1000*100)) a,
generate_series(1,1000) i;
create index on longtext(a);

Unpatched (using your original partial-sort-5.patch):
=# explain analyze select * from longtext order by a, i limit 10;
 Limit  (cost=2.34..19.26 rows=10 width=1160) (actual time=13477.739..13477.756
rows=10 loops=1)
   -  Partial sort  (cost=2.34..1694.15 rows=1000 width=1160) (actual time=
13477.737..13477.742 rows=10 loops=1)
 Sort Key: a, i
 Presorted Key: a
 Sort Method: top-N heapsort  Memory: 45kB
 -  Index Scan using longtext_a_idx on longtext  (cost=0.65..1691.65
rows=1000 width=1160) (actual time=0.015..2.364 rows=1000 loops=1)
 Total runtime: 13478.158 ms
(7 rows)

=# set enable_indexscan=off;
=# explain analyze select * from longtext order by a, i limit 10;
 Limit  (cost=198.61..198.63 rows=10 width=1160) (actual
time=6970.439..6970.458 rows=10 loops=1)
   -  Sort  (cost=198.61..201.11 rows=1000 width=1160) (actual
time=6970.438..6970.444 rows=10 loops=1)
 Sort Key: a, i
 Sort Method: top-N heapsort  Memory: 45kB
 -  Seq Scan on longtext  (cost=0.00..177.00 rows=1000 width=1160)
(actual time=0.007..1.763 rows=1000 loops=1)
 Total runtime: 6970.491 ms

Patched:
=# explain analyze select * from longtext order by a, i ;
 Partial sort  (cost=2.34..1694.15 rows=1000 width=1160) (actual
time=0.024..4.603 rows=1000 loops=1)
   Sort Key: a, i
   Presorted Key: a
   Sort Method: quicksort  Memory: 27kB
   -  Index Scan using longtext_a_idx on longtext  (cost=0.65..1691.65
rows=1000 width=1160) (actual time=0.013..2.094 rows=1000 loops=1)
 Total runtime: 5.418 ms

Regards,
Marti
From fbc6c31528018bca64dc54f65e1cd292f8de482a Mon Sep 17 00:00:00 2001
From: Marti Raudsepp ma...@juffo.org
Date: Sat, 18 Jan 2014 19:16:15 +0200
Subject: [PATCH] Batched sort: skip comparisons for known-equal columns

---
 src/backend/executor/nodeSort.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index cf1f79e..5abda1d 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -125,10 +125,10 @@ ExecSort(SortState *node)
 	{
 		tuplesortstate = tuplesort_begin_heap(tupDesc,
 			  plannode-numCols - skipCols,
-			  (plannode-sortColIdx)[skipCols],
-			  plannode-sortOperators,
-			  plannode-collations,
-			  plannode-nullsFirst,
+			  (plannode-sortColIdx[skipCols]),
+			  (plannode-sortOperators[skipCols]),
+			  (plannode-collations[skipCols]),
+			  (plannode-nullsFirst[skipCols]),
 			  work_mem,
 			  node-randomAccess);
 		if (node-bounded)
-- 
1.8.5.3


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-01-18 Thread Marti Raudsepp

Funny, I just wrote a patch to do that some minutes ago (didn't see your
email yet).

http://www.postgresql.org/message-id/CABRT9RCK=wmFUYZdqU_+MOFW5PDevLxJmZ5B=etjjnubvya...@mail.gmail.com

Regards,
Marti



On Sat, Jan 18, 2014 at 7:10 PM, Jeremy Harris j...@wizmail.org wrote:

 On 13/01/14 18:01, Alexander Korotkov wrote:

 Thanks. It's included into attached version of patch. As wall as
 estimation
 improvements, more comments and regression tests fix.


 Would it be possible to totally separate the two sets of sort-keys,
 only giving the non-index set to the tuplesort?  At present tuplesort
 will, when it has a group to sort, make wasted compares on the
 indexed set of keys before starting on the non-indexed set.
 --
 Cheers,
   Jeremy




 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-01-18 Thread Jeremy Harris


On 31/12/13 01:41, Andreas Karlsson wrote:

On 12/29/2013 08:24 AM, David Rowley wrote:

If it was possible to devise some way to reuse any
previous tuplesortstate perhaps just inventing a reset method which
clears out tuples, then we could see performance exceed the standard
seqscan - sort. The code the way it is seems to lookup the sort
functions from the syscache for each group then allocate some sort
space, so quite a bit of time is also spent in palloc0() and pfree()

If it was not possible to do this then maybe adding a cost to the number
of sort groups would be better so that the optimization is skipped if
there are too many sort groups.


It should be possible. I have hacked a quick proof of concept for
reusing the tuplesort state. Can you try it and see if the performance
regression is fixed by this?

One thing which have to be fixed with my patch is that we probably want
to close the tuplesort once we have returned the last tuple from
ExecSort().

I have attached my patch and the incremental patch on Alexander's patch.


How does this work in combination with randomAccess ?
--
Thanks,
   Jeremy



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-01-18 Thread Marti Raudsepp

On Sat, Jan 18, 2014 at 7:22 PM, Marti Raudsepp ma...@juffo.org wrote:
 Total runtime: 5.418 ms

Oops, shouldn't have rushed this. Clearly the timings should have
tipped me off that it's broken. I didn't notice that cmpSortSkipCols
was re-using tuplesort's sortkeys.

Here's a patch that actually works; I added a new skipKeys attribute
to SortState. I had to extract the SortSupport-creation code from
tuplesort_begin_heap to a new function; but that's fine, because it
was already duplicated in ExecInitMergeAppend too.

I reverted the addition of tuplesort_get_sortkeys, which is not needed now.

Now the timings are:
Unpatched partial sort: 13478.158 ms
Full sort: 6802.063 ms
Patched partial sort: 6618.962 ms

Regards,
Marti
From 7d9f34c09e7836504725ff11be7e63a2fc438ae9 Mon Sep 17 00:00:00 2001
From: Marti Raudsepp ma...@juffo.org
Date: Mon, 13 Jan 2014 20:38:45 +0200
Subject: [PATCH] Partial sort: skip comparisons for known-equal columns

---
 src/backend/executor/nodeMergeAppend.c | 18 +-
 src/backend/executor/nodeSort.c| 26 +-
 src/backend/utils/sort/sortsupport.c   | 29 +
 src/backend/utils/sort/tuplesort.c | 31 +--
 src/include/nodes/execnodes.h  |  1 +
 src/include/utils/sortsupport.h|  3 +++
 src/include/utils/tuplesort.h  |  2 --
 7 files changed, 60 insertions(+), 50 deletions(-)

diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 74fa40d..db6ec23 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -126,19 +126,11 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
 	 * initialize sort-key information
 	 */
 	mergestate-ms_nkeys = node-numCols;
-	mergestate-ms_sortkeys = palloc0(sizeof(SortSupportData) * node-numCols);
-
-	for (i = 0; i  node-numCols; i++)
-	{
-		SortSupport sortKey = mergestate-ms_sortkeys + i;
-
-		sortKey-ssup_cxt = CurrentMemoryContext;
-		sortKey-ssup_collation = node-collations[i];
-		sortKey-ssup_nulls_first = node-nullsFirst[i];
-		sortKey-ssup_attno = node-sortColIdx[i];
-
-		PrepareSortSupportFromOrderingOp(node-sortOperators[i], sortKey);
-	}
+	mergestate-ms_sortkeys = MakeSortSupportKeys(mergestate-ms_nkeys,
+  node-sortColIdx,
+  node-sortOperators,
+  node-collations,
+  node-nullsFirst);
 
 	/*
 	 * initialize to show we have not run the subplans yet
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index 55cdb05..7645645 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -28,20 +28,19 @@ static bool
 cmpSortSkipCols(SortState *node, TupleDesc tupDesc, HeapTuple a, TupleTableSlot *b)
 {
 	int n = ((Sort *)node-ss.ps.plan)-skipCols, i;
-	SortSupport sortKeys = tuplesort_get_sortkeys(node-tuplesortstate);
 
 	for (i = 0; i  n; i++)
 	{
 		Datum datumA, datumB;
 		bool isnullA, isnullB;
-		AttrNumber attno = sortKeys[i].ssup_attno;
+		AttrNumber attno = node-skipKeys[i].ssup_attno;
 
 		datumA = heap_getattr(a, attno, tupDesc, isnullA);
 		datumB = slot_getattr(b, attno, isnullB);
 
 		if (ApplySortComparator(datumA, isnullA,
-  datumB, isnullB,
-  sortKeys[i]))
+datumB, isnullB,
+node-skipKeys[i]))
 			return false;
 	}
 	return true;
@@ -123,12 +122,21 @@ ExecSort(SortState *node)
 		tuplesort_reset((Tuplesortstate *) node-tuplesortstate);
 	else
 	{
+		/* Support structures for cmpSortSkipCols - already sorted columns */
+		if (skipCols)
+			node-skipKeys = MakeSortSupportKeys(skipCols,
+ plannode-sortColIdx,
+ plannode-sortOperators,
+ plannode-collations,
+ plannode-nullsFirst);
+
+		/* Only pass on remaining columns that are unsorted */
 		tuplesortstate = tuplesort_begin_heap(tupDesc,
-			  plannode-numCols,
-			  plannode-sortColIdx,
-			  plannode-sortOperators,
-			  plannode-collations,
-			  plannode-nullsFirst,
+			  plannode-numCols - skipCols,
+			  (plannode-sortColIdx[skipCols]),
+			  (plannode-sortOperators[skipCols]),
+			  (plannode-collations[skipCols]),
+			  (plannode-nullsFirst[skipCols]),
 			  work_mem,
 			  node-randomAccess);
 		if (node-bounded)
diff --git a/src/backend/utils/sort/sortsupport.c b/src/backend/utils/sort/sortsupport.c
index 347f448..df82f5f 100644
--- a/src/backend/utils/sort/sortsupport.c
+++ b/src/backend/utils/sort/sortsupport.c
@@ -85,6 +85,35 @@ PrepareSortSupportComparisonShim(Oid cmpFunc, SortSupport ssup)
 }
 
 /*
+ * Build an array of SortSupportData structures from separated arrays.
+ */
+SortSupport
+MakeSortSupportKeys(int nkeys, AttrNumber *attNums,
+	Oid *sortOperators, Oid *sortCollations,
+	bool *nullsFirstFlags)
+{
+	SortSupport

Re: [HACKERS] PoC: Partial sort

2014-01-18 Thread Andreas Karlsson


On 01/18/2014 08:13 PM, Jeremy Harris wrote:

On 31/12/13 01:41, Andreas Karlsson wrote:

On 12/29/2013 08:24 AM, David Rowley wrote:

If it was possible to devise some way to reuse any
previous tuplesortstate perhaps just inventing a reset method which
clears out tuples, then we could see performance exceed the standard
seqscan - sort. The code the way it is seems to lookup the sort
functions from the syscache for each group then allocate some sort
space, so quite a bit of time is also spent in palloc0() and pfree()

If it was not possible to do this then maybe adding a cost to the number
of sort groups would be better so that the optimization is skipped if
there are too many sort groups.


It should be possible. I have hacked a quick proof of concept for
reusing the tuplesort state. Can you try it and see if the performance
regression is fixed by this?

One thing which have to be fixed with my patch is that we probably want
to close the tuplesort once we have returned the last tuple from
ExecSort().

I have attached my patch and the incremental patch on Alexander's patch.


How does this work in combination with randomAccess ?


As far as I can tell randomAccess was broken by the partial sort patch 
even before my change since it would not iterate over multiple 
tuplesorts anyway.


Alexander: Is this true or am I missing something?

--
Andreas Karlsson


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-01-16 Thread Jeremy Harris


On 22/12/13 20:26, Alexander Korotkov wrote:

On Sat, Dec 14, 2013 at 6:30 PM, Jeremy Harris j...@wizmail.org wrote:


On 14/12/13 12:54, Andres Freund wrote:

Is that actually all that beneficial when sorting with a bog standard
qsort() since that doesn't generally benefit from data being pre-sorted?
I think we might need to switch to a different algorithm to really
benefit from mostly pre-sorted input.



Eg:  http://www.postgresql.org/message-id/5291467e.6070...@wizmail.org

Maybe Alexander and I should bash our heads together.



Partial sort patch is mostly optimizer/executor improvement rather than
improvement of sort algorithm itself.


I finally got as far as understanding Alexander's cleverness, and it
does make the performance advantage (on partially-sorted input) of the
merge-sort irrelevant.

There's a slight tradeoff possible between the code complexity of
the chunking code front-ending the sorter and just using the
enhanced sorter.  The chunking does reduce the peak memory usage
quite nicely too.

The implementation of the chunker does O(n) compares using the
keys of the feed-stream index, to identify the chunk boundaries.
Would it be possible to get this information from the Index Scan?
--
Cheers,
   Jeremy



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-01-14 Thread Alexander Korotkov

Hi!

On Tue, Jan 14, 2014 at 12:54 AM, Marti Raudsepp ma...@juffo.org wrote:

 First, thanks a lot for working on this feature. This PostgreSQL
 shortcoming crops up in all the time in web applications that implement
 paging by multiple sorted columns.


Thanks!

I've been trying it out in a few situations. I implemented a new
 enable_partialsort GUC to make it easier to turn on/off, this way it's a
 lot easier to test. The attached patch applies on top of
 partial-sort-5.patch


I though about such option. Generally not because of testing convenience,
but because of overhead of planning. This way you implement it is quite
naive :) For instance, merge join rely on partial sort which will be
replaced with simple sort.


 I will spend more time reviewing the patch, but some of this planner code
 is over my head. If there's any way I can help to make sure this lands in
 the next version, let me know.

 

 The patch performs just as well as I would expect it to:

 marti=# select ac.name, r.name from artist_credit ac join release r on (
 ac.id=r.artist_credit) order by ac.name, r.name limit 1000;
 Time: 9.830 ms
 marti=# set enable_partialsort = off;
 marti=# select ac.name, r.name from artist_credit ac join release r on (
 ac.id=r.artist_credit) order by ac.name, r.name limit 1000;
 Time: 1442.815 ms

 A difference of almost 150x!

 There's a missed opportunity in that the code doesn't consider pushing new
 Sort steps into subplans. For example, if there's no index on
 language(name) then this query cannot take advantage partial sorts:

 marti=# explain select l.name, r.name from language l join release r on (
 l.id=r.language) order by l.name, r.name limit 1000;
  Limit  (cost=123203.20..123205.70 rows=1000 width=32)
-  Sort  (cost=123203.20..126154.27 rows=1180430 width=32)
  Sort Key: l.name, r.name
  -  Hash Join  (cost=229.47..58481.49 rows=1180430 width=32)
Hash Cond: (r.language = l.id)
-  Seq Scan on release r  (cost=0.00..31040.10
 rows=1232610 width=26)
-  Hash  (cost=131.43..131.43 rows=7843 width=14)
  -  Seq Scan on language l  (cost=0.00..131.43
 rows=7843 width=14)

 But because there are only so few languages, it would be a lot faster to
 sort languages in advance and then do partial sort:
  Limit  (rows=1000 width=31)
-  Partial sort  (rows=1180881 width=31)
  Sort Key: l.name, r.name
  Presorted Key: l.name
  -  Nested Loop  (rows=1180881 width=31)
-  Sort  (rows=7843 width=10)
  Sort Key: name
  -  Seq Scan on language  (rows=7843 width=14)
-  Index Scan using release_language_idx on release r
 (rows=11246 width=25)
  Index Cond: (language = l.id)

 Even an explicit sorted CTE cannot take advantage of partial sorts:
 marti=# explain with sorted_lang as (select id, name from language order
 by name)
 marti-# select l.name, r.name from sorted_lang l join release r on 
 (l.id=r.language)
 order by l.name, r.name limit 1000;
  Limit  (cost=3324368.83..3324371.33 rows=1000 width=240)
CTE sorted_lang
  -  Sort  (cost=638.76..658.37 rows=7843 width=14)
Sort Key: language.name
-  Seq Scan on language  (cost=0.00..131.43 rows=7843 width=14)
-  Sort  (cost=3323710.46..3439436.82 rows=46290543 width=240)
  Sort Key: l.name, r.name
  -  Merge Join  (cost=664.62..785649.92 rows=46290543 width=240)
Merge Cond: (r.language = l.id)
-  Index Scan using release_language_idx on release r
 (cost=0.43..87546.06 rows=1232610 width=26)
-  Sort  (cost=664.19..683.80 rows=7843 width=222)
  Sort Key: l.id
  -  CTE Scan on sorted_lang l  (cost=0.00..156.86
 rows=7843 width=222)

 But even with these limitations, this will easily be the killer feature of
 the next release, for me at least.


I see. But I don't think it can be achieved by small changes in planner.
Moreover, I didn't check but I think if you remove ordering by r.name you
will still not get sorting languages in the inner node. So, this problem is
not directly related to partial sort.

--
With best regards,
Alexander Korotkov.

Re: [HACKERS] PoC: Partial sort

2014-01-14 Thread Marti Raudsepp

On Tue, Jan 14, 2014 at 5:49 PM, Alexander Korotkov aekorot...@gmail.com
wrote:
 I implemented a new
 enable_partialsort GUC to make it easier to turn on/off

 I though about such option. Generally not because of testing convenience,
 but because of overhead of planning. This way you implement it is quite
 naive :) For instance, merge join rely on partial sort which will be
 replaced with simple sort.

Oh, this actually highlights a performance regression with the partial sort
patch. I assumed the planner will discard the full sort because of higher
costs, but it looks like the new code always assumes that a Partial sort
will be cheaper than a Join Filter without considering costs. When doing a
join USING (unique_indexed_value, something), the new plan is significantly
worse.

Unpatched:
marti=# explain analyze select * from release a join release b using (id,
name);
 Merge Join  (cost=0.85..179810.75 rows=12 width=158) (actual
time=0.011..1279.596 rows=1232610 loops=1)
   Merge Cond: (a.id = b.id)
   Join Filter: ((a.name)::text = (b.name)::text)
   -  Index Scan using release_id_idx on release a  (cost=0.43..79120.04
rows=1232610 width=92) (actual time=0.005..211.928 rows=1232610 loops=1)
   -  Index Scan using release_id_idx on release b  (cost=0.43..79120.04
rows=1232610 width=92) (actual time=0.004..371.592 rows=1232610 loops=1)
 Total runtime: 1309.049 ms

Patched:
 Merge Join  (cost=0.98..179810.87 rows=12 width=158) (actual
time=0.037..5034.158 rows=1232610 loops=1)
   Merge Cond: ((a.id = b.id) AND ((a.name)::text = (b.name)::text))
   -  Partial sort  (cost=0.49..82201.56 rows=1232610 width=92) (actual
time=0.013..955.938 rows=1232610 loops=1)
 Sort Key: a.id, a.name
 Presorted Key: a.id
 Sort Method: quicksort  Memory: 25kB
 -  Index Scan using release_id_idx on release a
 (cost=0.43..79120.04 rows=1232610 width=92) (actual time=0.007..449.332
rows=1232610 loops=1)
   -  Materialize  (cost=0.49..85283.09 rows=1232610 width=92) (actual
time=0.019..1352.377 rows=1232610 loops=1)
 -  Partial sort  (cost=0.49..82201.56 rows=1232610 width=92)
(actual time=0.018..1223.251 rows=1232610 loops=1)
   Sort Key: b.id, b.name
   Presorted Key: b.id
   Sort Method: quicksort  Memory: 25kB
   -  Index Scan using release_id_idx on release b
 (cost=0.43..79120.04 rows=1232610 width=92) (actual time=0.004..597.258
rows=1232610 loops=1)
 Total runtime: 5166.906 ms


There's another wishlist kind of thing with top-N heapsort bounds; if I
do a query with LIMIT 1000 then every sort batch has Tuplesortstate.bound
set to 1000, but it could be reduced after each batch. If the first batch
is 900 rows then the 2nd batch only needs the top 100 rows at most.

Also, I find the name partial sort a bit confusing; this feature is not
actually sorting *partially*, it's finishing the sort of partially-sorted
data. Perhaps batched sort would explain the feature better? Because it
does the sort in multiple batches instead of all at once. But maybe that's
just me.

Regards,
Marti

Re: [HACKERS] PoC: Partial sort

2014-01-14 Thread Alexander Korotkov

On Tue, Jan 14, 2014 at 11:16 PM, Marti Raudsepp ma...@juffo.org wrote:

 On Tue, Jan 14, 2014 at 5:49 PM, Alexander Korotkov aekorot...@gmail.com
 wrote:
  I implemented a new
  enable_partialsort GUC to make it easier to turn on/off

  I though about such option. Generally not because of testing convenience,
  but because of overhead of planning. This way you implement it is quite
  naive :) For instance, merge join rely on partial sort which will be
  replaced with simple sort.

 Oh, this actually highlights a performance regression with the partial
 sort patch. I assumed the planner will discard the full sort because of
 higher costs, but it looks like the new code always assumes that a Partial
 sort will be cheaper than a Join Filter without considering costs. When
 doing a join USING (unique_indexed_value, something), the new plan is
 significantly worse.

 Unpatched:
 marti=# explain analyze select * from release a join release b using (id,
 name);
  Merge Join  (cost=0.85..179810.75 rows=12 width=158) (actual
 time=0.011..1279.596 rows=1232610 loops=1)
Merge Cond: (a.id = b.id)
Join Filter: ((a.name)::text = (b.name)::text)
-  Index Scan using release_id_idx on release a  (cost=0.43..79120.04
 rows=1232610 width=92) (actual time=0.005..211.928 rows=1232610 loops=1)
-  Index Scan using release_id_idx on release b  (cost=0.43..79120.04
 rows=1232610 width=92) (actual time=0.004..371.592 rows=1232610 loops=1)
  Total runtime: 1309.049 ms

 Patched:
  Merge Join  (cost=0.98..179810.87 rows=12 width=158) (actual
 time=0.037..5034.158 rows=1232610 loops=1)
Merge Cond: ((a.id = b.id) AND ((a.name)::text = (b.name)::text))
-  Partial sort  (cost=0.49..82201.56 rows=1232610 width=92) (actual
 time=0.013..955.938 rows=1232610 loops=1)
  Sort Key: a.id, a.name
  Presorted Key: a.id
  Sort Method: quicksort  Memory: 25kB
  -  Index Scan using release_id_idx on release a
  (cost=0.43..79120.04 rows=1232610 width=92) (actual time=0.007..449.332
 rows=1232610 loops=1)
-  Materialize  (cost=0.49..85283.09 rows=1232610 width=92) (actual
 time=0.019..1352.377 rows=1232610 loops=1)
  -  Partial sort  (cost=0.49..82201.56 rows=1232610 width=92)
 (actual time=0.018..1223.251 rows=1232610 loops=1)
Sort Key: b.id, b.name
Presorted Key: b.id
Sort Method: quicksort  Memory: 25kB
-  Index Scan using release_id_idx on release b
  (cost=0.43..79120.04 rows=1232610 width=92) (actual time=0.004..597.258
 rows=1232610 loops=1)
  Total runtime: 5166.906 ms
 


Interesting. Could you share the dataset?

There's another wishlist kind of thing with top-N heapsort bounds; if I
 do a query with LIMIT 1000 then every sort batch has Tuplesortstate.bound
 set to 1000, but it could be reduced after each batch. If the first batch
 is 900 rows then the 2nd batch only needs the top 100 rows at most.


Right. Just didn't implement it yet.


 Also, I find the name partial sort a bit confusing; this feature is not
 actually sorting *partially*, it's finishing the sort of partially-sorted
 data. Perhaps batched sort would explain the feature better? Because it
 does the sort in multiple batches instead of all at once. But maybe that's
 just me.


I'm not sure. For me batched sort sounds like we're going to sort in
batch something that we sorted separately before. Probably I'm wrong
because I'm far from native english :)

--
With best regards,
Alexander Korotkov.

Re: [HACKERS] PoC: Partial sort

2014-01-14 Thread Marti Raudsepp

On Tue, Jan 14, 2014 at 9:28 PM, Alexander Korotkov aekorot...@gmail.comwrote:

 On Tue, Jan 14, 2014 at 11:16 PM, Marti Raudsepp ma...@juffo.org wrote:

 Oh, this actually highlights a performance regression with the partial
 sort patch.


 Interesting. Could you share the dataset?


It occurs with many datasets if work_mem is sufficiently low (10MB in my
case). Here's a quicker way to reproduce a similar issue:

create table foo as select i, i as j from generate_series(1,1000) i;
create index on foo(i);
explain analyze select * from foo a join foo b using (i, j);

The real data is from the release table from MusicBrainz database dump:
https://musicbrainz.org/doc/MusicBrainz_Database/Download . It's nontrivial
to set up though, so if you still need the real data, I can upload a pgdump
for you.

Regards,
Marti

Re: [HACKERS] PoC: Partial sort

2014-01-13 Thread Alexander Korotkov

On Tue, Dec 31, 2013 at 5:41 AM, Andreas Karlsson andr...@proxel.se wrote:

 On 12/29/2013 08:24 AM, David Rowley wrote:

 If it was possible to devise some way to reuse any
 previous tuplesortstate perhaps just inventing a reset method which
 clears out tuples, then we could see performance exceed the standard
 seqscan - sort. The code the way it is seems to lookup the sort
 functions from the syscache for each group then allocate some sort
 space, so quite a bit of time is also spent in palloc0() and pfree()

 If it was not possible to do this then maybe adding a cost to the number
 of sort groups would be better so that the optimization is skipped if
 there are too many sort groups.


 It should be possible. I have hacked a quick proof of concept for reusing
 the tuplesort state. Can you try it and see if the performance regression
 is fixed by this?

 One thing which have to be fixed with my patch is that we probably want to
 close the tuplesort once we have returned the last tuple from ExecSort().

 I have attached my patch and the incremental patch on Alexander's patch.


Thanks. It's included into attached version of patch. As wall as estimation
improvements, more comments and regression tests fix.

--
With best regards,
Alexander Korotkov.


partial-sort-5.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2014-01-13 Thread Marti Raudsepp

Hi Alexander,

First, thanks a lot for working on this feature. This PostgreSQL
shortcoming crops up in all the time in web applications that implement
paging by multiple sorted columns.

I've been trying it out in a few situations. I implemented a new
enable_partialsort GUC to make it easier to turn on/off, this way it's a
lot easier to test. The attached patch applies on top of
partial-sort-5.patch

I will spend more time reviewing the patch, but some of this planner code
is over my head. If there's any way I can help to make sure this lands in
the next version, let me know.



The patch performs just as well as I would expect it to:

marti=# select ac.name, r.name from artist_credit ac join release r on (
ac.id=r.artist_credit) order by ac.name, r.name limit 1000;
Time: 9.830 ms
marti=# set enable_partialsort = off;
marti=# select ac.name, r.name from artist_credit ac join release r on (
ac.id=r.artist_credit) order by ac.name, r.name limit 1000;
Time: 1442.815 ms

A difference of almost 150x!

There's a missed opportunity in that the code doesn't consider pushing new
Sort steps into subplans. For example, if there's no index on
language(name) then this query cannot take advantage partial sorts:

marti=# explain select l.name, r.name from language l join release r on (
l.id=r.language) order by l.name, r.name limit 1000;
 Limit  (cost=123203.20..123205.70 rows=1000 width=32)
   -  Sort  (cost=123203.20..126154.27 rows=1180430 width=32)
 Sort Key: l.name, r.name
 -  Hash Join  (cost=229.47..58481.49 rows=1180430 width=32)
   Hash Cond: (r.language = l.id)
   -  Seq Scan on release r  (cost=0.00..31040.10 rows=1232610
width=26)
   -  Hash  (cost=131.43..131.43 rows=7843 width=14)
 -  Seq Scan on language l  (cost=0.00..131.43
rows=7843 width=14)

But because there are only so few languages, it would be a lot faster to
sort languages in advance and then do partial sort:
 Limit  (rows=1000 width=31)
   -  Partial sort  (rows=1180881 width=31)
 Sort Key: l.name, r.name
 Presorted Key: l.name
 -  Nested Loop  (rows=1180881 width=31)
   -  Sort  (rows=7843 width=10)
 Sort Key: name
 -  Seq Scan on language  (rows=7843 width=14)
   -  Index Scan using release_language_idx on release r
(rows=11246 width=25)
 Index Cond: (language = l.id)

Even an explicit sorted CTE cannot take advantage of partial sorts:
marti=# explain with sorted_lang as (select id, name from language order by
name)
marti-# select l.name, r.name from sorted_lang l join release r on
(l.id=r.language)
order by l.name, r.name limit 1000;
 Limit  (cost=3324368.83..3324371.33 rows=1000 width=240)
   CTE sorted_lang
 -  Sort  (cost=638.76..658.37 rows=7843 width=14)
   Sort Key: language.name
   -  Seq Scan on language  (cost=0.00..131.43 rows=7843 width=14)
   -  Sort  (cost=3323710.46..3439436.82 rows=46290543 width=240)
 Sort Key: l.name, r.name
 -  Merge Join  (cost=664.62..785649.92 rows=46290543 width=240)
   Merge Cond: (r.language = l.id)
   -  Index Scan using release_language_idx on release r
(cost=0.43..87546.06 rows=1232610 width=26)
   -  Sort  (cost=664.19..683.80 rows=7843 width=222)
 Sort Key: l.id
 -  CTE Scan on sorted_lang l  (cost=0.00..156.86
rows=7843 width=222)

But even with these limitations, this will easily be the killer feature of
the next release, for me at least.

Regards,
Marti


On Mon, Jan 13, 2014 at 8:01 PM, Alexander Korotkov aekorot...@gmail.comwrote:

 On Tue, Dec 31, 2013 at 5:41 AM, Andreas Karlsson andr...@proxel.sewrote:

 On 12/29/2013 08:24 AM, David Rowley wrote:

 If it was possible to devise some way to reuse any
 previous tuplesortstate perhaps just inventing a reset method which
 clears out tuples, then we could see performance exceed the standard
 seqscan - sort. The code the way it is seems to lookup the sort
 functions from the syscache for each group then allocate some sort
 space, so quite a bit of time is also spent in palloc0() and pfree()

 If it was not possible to do this then maybe adding a cost to the number
 of sort groups would be better so that the optimization is skipped if
 there are too many sort groups.


 It should be possible. I have hacked a quick proof of concept for reusing
 the tuplesort state. Can you try it and see if the performance regression
 is fixed by this?

 One thing which have to be fixed with my patch is that we probably want
 to close the tuplesort once we have returned the last tuple from ExecSort().

 I have attached my patch and the incremental patch on Alexander's patch.


 Thanks. It's included into attached version of patch. As wall as
 estimation improvements, more comments and regression tests fix.

 --
 With best regards,
 Alexander Korotkov.


 --
 Sent via

Re: [HACKERS] PoC: Partial sort

2013-12-31 Thread David Rowley

On Tue, Dec 31, 2013 at 2:41 PM, Andreas Karlsson andr...@proxel.se wrote:

 On 12/29/2013 08:24 AM, David Rowley wrote:

 If it was possible to devise some way to reuse any
 previous tuplesortstate perhaps just inventing a reset method which
 clears out tuples, then we could see performance exceed the standard
 seqscan - sort. The code the way it is seems to lookup the sort
 functions from the syscache for each group then allocate some sort
 space, so quite a bit of time is also spent in palloc0() and pfree()

 If it was not possible to do this then maybe adding a cost to the number
 of sort groups would be better so that the optimization is skipped if
 there are too many sort groups.


 It should be possible. I have hacked a quick proof of concept for reusing
 the tuplesort state. Can you try it and see if the performance regression
 is fixed by this?

 One thing which have to be fixed with my patch is that we probably want to
 close the tuplesort once we have returned the last tuple from ExecSort().

 I have attached my patch and the incremental patch on Alexander's patch.


Thanks, the attached is about 5 times faster than it was previously with my
test case upthread.

The times now look like:

No pre-sortable index:
Total runtime: 86.278 ms

With pre-sortable index with partial sorting
Total runtime: 47.500 ms

With the query where there is no index the sort remained in memory.

I spent some time trying to find a case where the partial sort is slower
than the seqscan - sort. The only places partial sort seems slower are
when the number of estimated sort groups are around the crossover point
where the planner would be starting to think about performing a seqscan -
sort instead. I'm thinking right now that it's not worth raising the costs
around this as the partial sort is less likely to become a disk sort than
the full sort is.

I'll keep going with trying to break it.

Regards

David Rowley



 --
 Andreas Karlsson

Re: [HACKERS] PoC: Partial sort

2013-12-31 Thread Alvaro Herrera

David Rowley escribió:

 I was about to test it tonight, but I'm having trouble getting the patch to
 compile... I'm really wondering which compiler you are using as it seems
 you're declaring your variables in some strange places.. See nodeSort.c
 line 101. These variables are declared after there has been an if statement
 in the same scope. That's not valid in C. (The patch did however apply
 without any complaints).

AFAIR C99 allows mixed declarations and code.  Visual Studio only
implements C89 though, which is why it fails to compile there.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2013-12-31 Thread Andreas Karlsson


On 12/28/2013 04:51 PM, Alexander Korotkov wrote:

I've compiled it with clang. Yeah, there was mixed declarations. I've
rechecked it with gcc, now it gives no warnings. I didn't try it with
visual studio, but I hope it will be OK.


I looked at this version of the patch and noticed a possibility for 
improvement. You could decrement the bound for the tuplesort after every 
completed sort. Otherwise the optimizations for small limits wont apply 
to partial sort.


--
Andreas Karlsson


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2013-12-28 Thread David Rowley

On Sat, Dec 28, 2013 at 9:28 PM, Alexander Korotkov aekorot...@gmail.comwrote:

 On Tue, Dec 24, 2013 at 6:02 AM, Andreas Karlsson andr...@proxel.sewrote:
 Attached revision of patch implements partial sort usage in merge joins.



I'm looking forward to doing a bit of testing on this patch. I think it is
a really useful feature to get a bit more out of existing indexes.

I was about to test it tonight, but I'm having trouble getting the patch to
compile... I'm really wondering which compiler you are using as it seems
you're declaring your variables in some strange places.. See nodeSort.c
line 101. These variables are declared after there has been an if statement
in the same scope. That's not valid in C. (The patch did however apply
without any complaints).

Here's a list of the errors I get when compiling with visual studios on
windows.

D:\Postgres\c\pgsql.sln (default target) (1) -
D:\Postgres\c\postgres.vcxproj (default target) (2) -
(ClCompile target) -
  src\backend\executor\nodeSort.c(101): error C2275: 'Sort' : illegal use
of this type as an expression [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(101): error C2065: 'plannode' :
undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(102): error C2275: 'PlanState' : illegal
use of this type as an expression [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(102): error C2065: 'outerNode' :
undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(103): error C2275: 'TupleDesc' : illegal
use of this type as an expression [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(103): error C2146: syntax error : missing
';' before identifier 'tupDesc' [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(103): error C2065: 'tupDesc' : undeclared
identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(120): error C2065: 'outerNode' :
undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(121): error C2065: 'tupDesc' : undeclared
identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(121): error C2065: 'outerNode' :
undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(125): error C2065: 'tupDesc' : undeclared
identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(126): error C2065: 'plannode' :
undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(126): error C2223: left of '-numCols'
must point to struct/union [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(127): error C2065: 'plannode' :
undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(127): error C2223: left of '-sortColIdx'
must point to struct/union [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(128): error C2065: 'plannode' :
undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(128): error C2223: left of
'-sortOperators' must point to struct/union
[D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(129): error C2065: 'plannode' :
undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(129): error C2223: left of '-collations'
must point to struct/union [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(130): error C2065: 'plannode' :
undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(130): error C2223: left of '-nullsFirst'
must point to struct/union [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(132): error C2198: 'tuplesort_begin_heap'
: too few arguments for call [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(143): error C2065: 'outerNode' :
undeclared identifier [D:\Postgres\c\postgres.vcxproj]
  src\backend\executor\nodeSort.c(167): error C2065: 'tupDesc' : undeclared
identifier [D:\Postgres\c\postgres.vcxproj]

13 Warning(s)
24 Error(s)


Regards

David Rowley

Re: [HACKERS] PoC: Partial sort

2013-12-28 Thread David Rowley

On Sun, Dec 29, 2013 at 4:51 AM, Alexander Korotkov aekorot...@gmail.comwrote:

 I've compiled it with clang. Yeah, there was mixed declarations. I've
 rechecked it with gcc, now it gives no warnings. I didn't try it with
 visual studio, but I hope it will be OK.


Thanks for the patch. It now compiles without any problems.
I've been doing a bit of testing with the patch testing a few different
workloads. One thing that I've found is that in my test case when the table
only contains 1 tuple for any given presort columns that the query is
actually slower than when there are say 100 tuples to sort for any given
presort group.

Here is my test case:

DROP TABLE IF EXISTS temperature_readings;

CREATE TABLE temperature_readings (
  readingid SERIAL NOT NULL,
  timestamp TIMESTAMP NOT NULL,
  locationid INT NOT NULL,
  temperature INT NOT NULL,
  PRIMARY KEY (readingid)
);

INSERT INTO temperature_readings (timestamp,locationid,temperature)
SELECT ts.timestamp, loc.locationid, -10 + random() * 40
FROM generate_series('1900-04-01','2000-04-01','1 day'::interval)
ts(timestamp)
CROSS JOIN generate_series(1,1) loc(locationid);

VACUUM ANALYZE temperature_readings;

-- Warm buffers
SELECT AVG(temperature) FROM temperature_readings;

explain (buffers, analyze) select * from temperature_readings order by
timestamp,locationid; -- (seqscan - sort) 70.805ms

-- create an index to allow presorting on timestamp.
CREATE INDEX temperature_readings_timestamp_idx ON temperature_readings
(timestamp);

-- warm index buffers
SELECT COUNT(*) FROM (SELECT timestamp FROM temperature_readings ORDER BY
timestamp) c;

explain (buffers, analyze) select * from temperature_readings order by
timestamp,locationid; -- index scan - partial sort 253.032ms

The first query without the index to presort on runs in 70.805 ms, the 2nd
query uses the index to presort and runs in 253.032 ms.

I ran the code through a performance profiler and found that about 80% of
the time is spent in tuplesort_end and tuplesort_begin_heap.

If it was possible to devise some way to reuse any previous tuplesortstate
perhaps just inventing a reset method which clears out tuples, then we
could see performance exceed the standard seqscan - sort. The code the way
it is seems to lookup the sort functions from the syscache for each group
then allocate some sort space, so quite a bit of time is also spent in
palloc0() and pfree()

If it was not possible to do this then maybe adding a cost to the number of
sort groups would be better so that the optimization is skipped if there
are too many sort groups.

Regards

David Rowley






 --
 With best regards,
 Alexander Korotkov.

Re: [HACKERS] PoC: Partial sort

2013-12-23 Thread Andreas Karlsson


On 12/22/2013 04:38 PM, Alexander Korotkov wrote:

postgres=# explain analyze select * from test order by v1, id limit 10;
   QUERY
PLAN
---
  Limit  (cost=11441.77..11442.18 rows=10 width=12) (actual
time=79.980..79.982 rows=10 loops=1)
-  Partial sort  (cost=11441.77..53140.44 rows=100 width=12)
(actual time=79.978..79.978 rows=10 loops=1)
  Sort Key: v1, id
  Presorted Key: v1
  Sort Method: top-N heapsort  Memory: 25kB
  -  Index Scan using test_v1_idx on test  (cost=0.42..47038.83
rows=100 width=12) (actual time=0.031..38.275 rows=100213 loops=1)
  Total runtime: 81.786 ms
(7 rows)


Have you thought about how do you plan to print which sort method and 
how much memory was used? Several different sort methods may have been 
use in the query. Should the largest amount of memory/disk be printed?



However, work with joins needs more improvements.


That would be really nice to have, but the patch seems useful even 
without the improvements to joins.


--
Andreas Karlsson


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2013-12-22 Thread Martijn van Oosterhout

On Sun, Dec 22, 2013 at 07:38:05PM +0400, Alexander Korotkov wrote:
 Hi!
 
 Next revision. It expected to do better work with optimizer. It introduces
 presorted_keys argument of cost_sort function which represent number of
 keys already sorted in Path. Then this function uses estimate_num_groups to
 estimate number of groups with different values of presorted keys and
 assumes that dataset is uniformly divided by
 groups. get_cheapest_fractional_path_for_pathkeys tries to select the path
 matching most part of path keys.
 You can see it's working pretty good on single table queries.

Nice work! The plans look good and the calculated costs seem sane also.

I suppose the problem with the joins is generating the pathkeys?

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature

Re: [HACKERS] PoC: Partial sort

2013-12-22 Thread Alexander Korotkov

On Sun, Dec 22, 2013 at 8:12 PM, Martijn van Oosterhout
klep...@svana.orgwrote:

 On Sun, Dec 22, 2013 at 07:38:05PM +0400, Alexander Korotkov wrote:
  Hi!
 
  Next revision. It expected to do better work with optimizer. It
 introduces
  presorted_keys argument of cost_sort function which represent number of
  keys already sorted in Path. Then this function uses estimate_num_groups
 to
  estimate number of groups with different values of presorted keys and
  assumes that dataset is uniformly divided by
  groups. get_cheapest_fractional_path_for_pathkeys tries to select the
 path
  matching most part of path keys.
  You can see it's working pretty good on single table queries.

 Nice work! The plans look good and the calculated costs seem sane also.

 I suppose the problem with the joins is generating the pathkeys?


In general, problem is that partial sort is alternative to do less
restrictive merge join and filter it's results. As far as I can see, taking
care about it require some rework of merge optimization. For now, I didn't
get what it's going to look like. I'll try to dig more into details.

--
With best regards,
Alexander Korotkov.

Re: [HACKERS] PoC: Partial sort

2013-12-22 Thread Alexander Korotkov

On Sat, Dec 14, 2013 at 6:30 PM, Jeremy Harris j...@wizmail.org wrote:

 On 14/12/13 12:54, Andres Freund wrote:

 On 2013-12-14 13:59:02 +0400, Alexander Korotkov wrote:

 Currently when we need to get ordered result from table we have to choose
 one of two approaches: get results from index in exact order we need or
 do
 sort of tuples. However, it could be useful to mix both methods: get
 results from index in order which partially meets our requirements and do
 rest of work from heap.


  
 
 --
   Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual
 time=0.097..0.099 rows=10 loops=1)
 -  Sort  (cost=69214.06..71714.06 rows=100 width=16) (actual
 time=0.096..0.097 rows=10 loops=1)
   Sort Key: v1, v2
   Sort Method: top-N heapsort  Memory: 25kB
   -  Index Scan using test_v1_idx on test  (cost=0.42..47604.42
 rows=100 width=16) (actual time=0.017..0.066 rows=56 loops=1)
   Total runtime: 0.125 ms
 (6 rows)


 Is that actually all that beneficial when sorting with a bog standard
 qsort() since that doesn't generally benefit from data being pre-sorted?
 I think we might need to switch to a different algorithm to really
 benefit from mostly pre-sorted input.


 Eg:  http://www.postgresql.org/message-id/5291467e.6070...@wizmail.org

 Maybe Alexander and I should bash our heads together.


Partial sort patch is mostly optimizer/executor improvement rather than
improvement of sort algorithm itself. But I would appreciate using
enchantments of sorting algorithms in my work.

--
With best regards,
Alexander Korotkov.

Re: [HACKERS] PoC: Partial sort

2013-12-18 Thread Alexander Korotkov

On Sat, Dec 14, 2013 at 6:39 PM, Martijn van Oosterhout
klep...@svana.orgwrote:

 On Sat, Dec 14, 2013 at 06:21:18PM +0400, Alexander Korotkov wrote:
   Is that actually all that beneficial when sorting with a bog standard
   qsort() since that doesn't generally benefit from data being
 pre-sorted?
   I think we might need to switch to a different algorithm to really
   benefit from mostly pre-sorted input.
  
 
  In this patch I don't do full sort of dataset. For instance, index
 returns
  data ordered by first column and we need to order them also by second
  column. Then this node sorts groups (assumed to be small) where values of
  the first column are same by value of second column. And with limit
 clause
  only required number of such groups will be processed. But, I don't think
  we should expect pre-sorted values of second column inside a group.

 Nice. I imagine this would be mostly beneficial for fast-start plans,
 since you no longer need to sort the whole table prior to returning the
 first tuple.

 Reduced memory usage might be a factor, especially for large sorts
 where you otherwise might need to spool to disk.

 You can now use an index on (a) to improve sorting for (a,b).

 Cost of sorting n groups of size l goes from O(nl log nl) to just O(nl
 log l), useful for large n.


Agree. Your reasoning looks correct.


 Minor comments:

 I find cmpTuple a bad name. That's what it's doing but perhaps
 cmpSkipColumns would be clearer.

 I think it's worthwhile adding a seperate path for the skipCols = 0
 case, to avoid extra copies.


Thanks. I'll take care about.

--
With best regards,
Alexander Korotkov.

Re: [HACKERS] PoC: Partial sort

2013-12-18 Thread Alexander Korotkov

On Sat, Dec 14, 2013 at 7:04 PM, Andres Freund and...@2ndquadrant.comwrote:

 Hi,

 Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual
time=0.097..0.099 rows=10 loops=1)
   -  Sort  (cost=69214.06..71714.06 rows=100 width=16) (actual
time=0.096..0.097 rows=10 loops=1)
 Sort Key: v1, v2
 Sort Method: top-N heapsort  Memory: 25kB
 -  Index Scan using test_v1_idx on test
  (cost=0.42..47604.42
rows=100 width=16) (actual time=0.017..0.066 rows=56 loops=1)
 Total runtime: 0.125 ms
(6 rows)
  
   Is that actually all that beneficial when sorting with a bog standard
   qsort() since that doesn't generally benefit from data being
 pre-sorted?
   I think we might need to switch to a different algorithm to really
   benefit from mostly pre-sorted input.
  
 
  In this patch I don't do full sort of dataset. For instance, index
 returns
  data ordered by first column and we need to order them also by second
  column.

 Ah, that makes sense.

  But, I don't think we should expect pre-sorted values of second column
  inside a group.

 Yes, if you do it that way, there doesn't seem to any need to assume
 that any more than we usually do.

 I think you should make the explain output reflect the fact that we're
 assuming v1 is presorted and just sorting v2. I'd be happy enough with:
 Sort Key: v1, v2
 Partial Sort: v2
 or even just
 Partial Sort Key: [v1,] v2
 but I am sure others disagree.


Sure, I just didn't change explain output yet. It should look like what you
propose.

   *partial-knn-1.patch*

   Rechecking from the heap means adding a sort node though, which I don't
   see here? Or am I misunderstanding something?

  KNN-GiST contain RB-tree of scanned items. In this patch item is
 rechecked
  inside GiST and reinserted into same RB-tree. It appears to be much
 easier
  implementation for PoC and also looks very efficient. I'm not sure what
 is
  actually right design for it. This is what I like to discuss.

 I don't have enough clue about gist to say wether it's the right design,
 but it doesn't look wrong to my eyes. It'd probably be useful to export
 the knowledge that we are rechecking and how often that happens to the
 outside.
 While I didn't really look into the patch, I noticed in passing that you
 pass a all_dead variable to heap_hot_search_buffer without using the
 result - just pass NULL instead, that performs a bit less work.


Useful notice, thanks.

--
With best regards,
Alexander Korotkov.

Re: [HACKERS] PoC: Partial sort

2013-12-18 Thread Alexander Korotkov

On Sat, Dec 14, 2013 at 11:47 PM, Andreas Karlsson andr...@proxel.sewrote:

 On 12/14/2013 10:59 AM, Alexander Korotkov wrote:

 This patch allows to use index for order-by if order-by clause and index
 has non-empty common prefix. So, index gives right ordering for first n
 order-by columns. In order to provide right order for rest m columns,
 sort node is inserted. This sort node sorts groups of tuples where
 values of first n order-by columns are equal.


 I recently looked at the same problem. I see that you solved the
 rescanning problem by simply forcing the sort to be redone on
 ExecReScanSort if you have done a partial sort.


Naturally, I'm sure I solved it at all :) I just get version of patch
working for very limited use-cases.


 My idea for a solution was to modify tuplesort to allow storing the
 already sorted keys in either memtuples or the sort result file, but
 setting a field so it does not sort thee already sorted tuples again. This
 would allow the rescan to work as it used to, but I am unsure how clean or
 ugly this code would be. Was this something you considered?


I'm not sure. I believe that best answer depends on particular parameter:
how much memory we've for sort, how expensive is underlying node and how it
performs rescan, how big are groups in partial sort.

--
With best regards,
Alexander Korotkov.

Re: [HACKERS] PoC: Partial sort

2013-12-18 Thread Andreas Karlsson


On 12/18/2013 01:02 PM, Alexander Korotkov wrote:

My idea for a solution was to modify tuplesort to allow storing the
already sorted keys in either memtuples or the sort result file, but
setting a field so it does not sort thee already sorted tuples
again. This would allow the rescan to work as it used to, but I am
unsure how clean or ugly this code would be. Was this something you
considered?


I'm not sure. I believe that best answer depends on particular
parameter: how much memory we've for sort, how expensive is underlying
node and how it performs rescan, how big are groups in partial sort.


Yes, if one does not need a rescan your solution will use less memory 
and about the same amount of CPU (if the tuplesort does not spill to 
disk). While if we keep all the already sorted tuples in the tuplesort 
rescans will be cheap but more memory will be used with an increased 
chance of spilling to disk.


--
Andreas Karlsson


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2013-12-14 Thread Andres Freund

Hi,

Cool stuff.

On 2013-12-14 13:59:02 +0400, Alexander Korotkov wrote:
 Currently when we need to get ordered result from table we have to choose
 one of two approaches: get results from index in exact order we need or do
 sort of tuples. However, it could be useful to mix both methods: get
 results from index in order which partially meets our requirements and do
 rest of work from heap.

 --
  Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual
 time=0.097..0.099 rows=10 loops=1)
-  Sort  (cost=69214.06..71714.06 rows=100 width=16) (actual
 time=0.096..0.097 rows=10 loops=1)
  Sort Key: v1, v2
  Sort Method: top-N heapsort  Memory: 25kB
  -  Index Scan using test_v1_idx on test  (cost=0.42..47604.42
 rows=100 width=16) (actual time=0.017..0.066 rows=56 loops=1)
  Total runtime: 0.125 ms
 (6 rows)

Is that actually all that beneficial when sorting with a bog standard
qsort() since that doesn't generally benefit from data being pre-sorted?
I think we might need to switch to a different algorithm to really
benefit from mostly pre-sorted input.

 *partial-knn-1.patch*
 
 KNN-GiST provides ability to get ordered results from index, but this order
 is based only on index information. For instance, GiST index contains
 bounding rectangles for polygons, and we can't get exact distance to
 polygon from index (similar situation is in PostGIS). In attached patch,
 GiST distance method can set recheck flag (similar to consistent method).
 This flag means that distance method returned lower bound of distance and
 we should recheck it from heap.

 See an example.
 
 create table test as (select id, polygon(3+(random()*10)::int,
 circle(point(random(), random()), 0.0003 + random()*0.001)) as p from
 generate_series(1,100) id);
 create index test_idx on test using gist (p);
 
 We can get results ordered by distance from polygon to point.
 
 postgres=# select id, p - point(0.5,0.5) from test order by p -
 point(0.5,0.5) limit 10;

 --
  Limit  (cost=0.29..1.86 rows=10 width=36) (actual time=0.180..0.230
 rows=10 loops=1)
-  Index Scan using test_idx on test  (cost=0.29..157672.29
 rows=100 width=36) (actual time=0.179..0.228 rows=10 loops=1)
  Order By: (p - '(0.5,0.5)'::point)
  Total runtime: 0.305 ms
 (4 rows)

Rechecking from the heap means adding a sort node though, which I don't
see here? Or am I misunderstanding something?
Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2013-12-14 Thread Alexander Korotkov

Hi!

Thanks for feedback!

On Sat, Dec 14, 2013 at 4:54 PM, Andres Freund and...@2ndquadrant.comwrote:

 Hi,

 Cool stuff.

 On 2013-12-14 13:59:02 +0400, Alexander Korotkov wrote:
  Currently when we need to get ordered result from table we have to choose
  one of two approaches: get results from index in exact order we need or
 do
  sort of tuples. However, it could be useful to mix both methods: get
  results from index in order which partially meets our requirements and do
  rest of work from heap.

 
 --
   Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual
  time=0.097..0.099 rows=10 loops=1)
 -  Sort  (cost=69214.06..71714.06 rows=100 width=16) (actual
  time=0.096..0.097 rows=10 loops=1)
   Sort Key: v1, v2
   Sort Method: top-N heapsort  Memory: 25kB
   -  Index Scan using test_v1_idx on test  (cost=0.42..47604.42
  rows=100 width=16) (actual time=0.017..0.066 rows=56 loops=1)
   Total runtime: 0.125 ms
  (6 rows)

 Is that actually all that beneficial when sorting with a bog standard
 qsort() since that doesn't generally benefit from data being pre-sorted?
 I think we might need to switch to a different algorithm to really
 benefit from mostly pre-sorted input.


In this patch I don't do full sort of dataset. For instance, index returns
data ordered by first column and we need to order them also by second
column. Then this node sorts groups (assumed to be small) where values of
the first column are same by value of second column. And with limit clause
only required number of such groups will be processed. But, I don't think
we should expect pre-sorted values of second column inside a group.


  *partial-knn-1.patch*
 
  KNN-GiST provides ability to get ordered results from index, but this
 order
  is based only on index information. For instance, GiST index contains
  bounding rectangles for polygons, and we can't get exact distance to
  polygon from index (similar situation is in PostGIS). In attached patch,
  GiST distance method can set recheck flag (similar to consistent method).
  This flag means that distance method returned lower bound of distance and
  we should recheck it from heap.

  See an example.
 
  create table test as (select id, polygon(3+(random()*10)::int,
  circle(point(random(), random()), 0.0003 + random()*0.001)) as p from
  generate_series(1,100) id);
  create index test_idx on test using gist (p);
 
  We can get results ordered by distance from polygon to point.
 
  postgres=# select id, p - point(0.5,0.5) from test order by p -
  point(0.5,0.5) limit 10;

 
 --
   Limit  (cost=0.29..1.86 rows=10 width=36) (actual time=0.180..0.230
  rows=10 loops=1)
 -  Index Scan using test_idx on test  (cost=0.29..157672.29
  rows=100 width=36) (actual time=0.179..0.228 rows=10 loops=1)
   Order By: (p - '(0.5,0.5)'::point)
   Total runtime: 0.305 ms
  (4 rows)

 Rechecking from the heap means adding a sort node though, which I don't
 see here? Or am I misunderstanding something?


KNN-GiST contain RB-tree of scanned items. In this patch item is rechecked
inside GiST and reinserted into same RB-tree. It appears to be much easier
implementation for PoC and also looks very efficient. I'm not sure what is
actually right design for it. This is what I like to discuss.

--
With best regards,
Alexander Korotkov.

Re: [HACKERS] PoC: Partial sort

2013-12-14 Thread Jeremy Harris


On 14/12/13 12:54, Andres Freund wrote:

On 2013-12-14 13:59:02 +0400, Alexander Korotkov wrote:

Currently when we need to get ordered result from table we have to choose
one of two approaches: get results from index in exact order we need or do
sort of tuples. However, it could be useful to mix both methods: get
results from index in order which partially meets our requirements and do
rest of work from heap.



--
  Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual
time=0.097..0.099 rows=10 loops=1)
-  Sort  (cost=69214.06..71714.06 rows=100 width=16) (actual
time=0.096..0.097 rows=10 loops=1)
  Sort Key: v1, v2
  Sort Method: top-N heapsort  Memory: 25kB
  -  Index Scan using test_v1_idx on test  (cost=0.42..47604.42
rows=100 width=16) (actual time=0.017..0.066 rows=56 loops=1)
  Total runtime: 0.125 ms
(6 rows)


Is that actually all that beneficial when sorting with a bog standard
qsort() since that doesn't generally benefit from data being pre-sorted?
I think we might need to switch to a different algorithm to really
benefit from mostly pre-sorted input.


Eg:  http://www.postgresql.org/message-id/5291467e.6070...@wizmail.org

Maybe Alexander and I should bash our heads together.

--
Cheers,
   Jeremy


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2013-12-14 Thread Martijn van Oosterhout

On Sat, Dec 14, 2013 at 06:21:18PM +0400, Alexander Korotkov wrote:
  Is that actually all that beneficial when sorting with a bog standard
  qsort() since that doesn't generally benefit from data being pre-sorted?
  I think we might need to switch to a different algorithm to really
  benefit from mostly pre-sorted input.
 
 
 In this patch I don't do full sort of dataset. For instance, index returns
 data ordered by first column and we need to order them also by second
 column. Then this node sorts groups (assumed to be small) where values of
 the first column are same by value of second column. And with limit clause
 only required number of such groups will be processed. But, I don't think
 we should expect pre-sorted values of second column inside a group.

Nice. I imagine this would be mostly beneficial for fast-start plans,
since you no longer need to sort the whole table prior to returning the
first tuple.

Reduced memory usage might be a factor, especially for large sorts
where you otherwise might need to spool to disk.

You can now use an index on (a) to improve sorting for (a,b).

Cost of sorting n groups of size l goes from O(nl log nl) to just O(nl
log l), useful for large n.

Minor comments:

I find cmpTuple a bad name. That's what it's doing but perhaps
cmpSkipColumns would be clearer.

I think it's worthwhile adding a seperate path for the skipCols = 0
case, to avoid extra copies.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature

Re: [HACKERS] PoC: Partial sort

2013-12-14 Thread Andres Freund

Hi,

Limit  (cost=69214.06..69214.08 rows=10 width=16) (actual
   time=0.097..0.099 rows=10 loops=1)
  -  Sort  (cost=69214.06..71714.06 rows=100 width=16) (actual
   time=0.096..0.097 rows=10 loops=1)
Sort Key: v1, v2
Sort Method: top-N heapsort  Memory: 25kB
-  Index Scan using test_v1_idx on test  (cost=0.42..47604.42
   rows=100 width=16) (actual time=0.017..0.066 rows=56 loops=1)
Total runtime: 0.125 ms
   (6 rows)
 
  Is that actually all that beneficial when sorting with a bog standard
  qsort() since that doesn't generally benefit from data being pre-sorted?
  I think we might need to switch to a different algorithm to really
  benefit from mostly pre-sorted input.
 
 
 In this patch I don't do full sort of dataset. For instance, index returns
 data ordered by first column and we need to order them also by second
 column.

Ah, that makes sense.

 But, I don't think we should expect pre-sorted values of second column
 inside a group.

Yes, if you do it that way, there doesn't seem to any need to assume
that any more than we usually do.

I think you should make the explain output reflect the fact that we're
assuming v1 is presorted and just sorting v2. I'd be happy enough with:
Sort Key: v1, v2
Partial Sort: v2
or even just
Partial Sort Key: [v1,] v2
but I am sure others disagree.

   *partial-knn-1.patch*

  Rechecking from the heap means adding a sort node though, which I don't
  see here? Or am I misunderstanding something?

 KNN-GiST contain RB-tree of scanned items. In this patch item is rechecked
 inside GiST and reinserted into same RB-tree. It appears to be much easier
 implementation for PoC and also looks very efficient. I'm not sure what is
 actually right design for it. This is what I like to discuss.

I don't have enough clue about gist to say wether it's the right design,
but it doesn't look wrong to my eyes. It'd probably be useful to export
the knowledge that we are rechecking and how often that happens to the
outside.
While I didn't really look into the patch, I noticed in passing that you
pass a all_dead variable to heap_hot_search_buffer without using the
result - just pass NULL instead, that performs a bit less work.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PoC: Partial sort

2013-12-14 Thread Andreas Karlsson


On 12/14/2013 10:59 AM, Alexander Korotkov wrote:

This patch allows to use index for order-by if order-by clause and index
has non-empty common prefix. So, index gives right ordering for first n
order-by columns. In order to provide right order for rest m columns,
sort node is inserted. This sort node sorts groups of tuples where
values of first n order-by columns are equal.


I recently looked at the same problem. I see that you solved the 
rescanning problem by simply forcing the sort to be redone on 
ExecReScanSort if you have done a partial sort.


My idea for a solution was to modify tuplesort to allow storing the 
already sorted keys in either memtuples or the sort result file, but 
setting a field so it does not sort thee already sorted tuples again. 
This would allow the rescan to work as it used to, but I am unsure how 
clean or ugly this code would be. Was this something you considered?


--
Andreas Karlsson


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

85 matches

Mail list logo