Re: [HACKERS] Tuplesort merge pre-reading

2017-04-18 Thread Heikki Linnakangas
On 04/14/2017 08:19 AM, Peter Geoghegan wrote: BTW, I'm skeptical of the idea of Heikki's around killing polyphase merge itself at this point. I think that keeping most tapes active per pass is useful now that our memory accounting involves handing over an even share to each maybe-active tape

Re: [HACKERS] Tuplesort merge pre-reading

2017-04-14 Thread Peter Geoghegan
On Fri, Apr 14, 2017 at 5:57 AM, Robert Haas wrote: > I don't think there's any one fixed answer, because increasing the > number of tapes reduces I/O by adding CPU cost, and visca versa. Sort of, but if you have to merge hundreds of runs (a situation that should be quite

Re: [HACKERS] Tuplesort merge pre-reading

2017-04-14 Thread Robert Haas
On Fri, Apr 14, 2017 at 1:19 AM, Peter Geoghegan wrote: > Interestingly enough, I think that Knuth was pretty much spot on with > his "sweet spot" of 7 tapes, even if you have modern hardware. Commit > df700e6 (where the sweet spot of merge order 7 was no longer always > used) was

Re: [HACKERS] Tuplesort merge pre-reading

2017-04-13 Thread Peter Geoghegan
On Thu, Apr 13, 2017 at 10:19 PM, Peter Geoghegan wrote: > I actually think Heikki's work here would particularly help on > spinning rust, especially when less memory is available. He > specifically justified it on the basis of it resulting in a more > sequential read pattern,

Re: [HACKERS] Tuplesort merge pre-reading

2017-04-13 Thread Peter Geoghegan
On Thu, Apr 13, 2017 at 9:51 PM, Tom Lane wrote: > I'm fairly sure that the point was exactly what it said, ie improve > locality of access within the temp file by sequentially reading as many > tuples in a row as we could, rather than grabbing one here and one there. > > It

Re: [HACKERS] Tuplesort merge pre-reading

2017-04-13 Thread Tom Lane
I wrote: > Heikki Linnakangas writes: >> I'm talking about the code that reads a bunch of from each tape, loading >> them into the memtuples array. That code was added by Tom Lane, back in >> 1999: >> So apparently there was a benefit back then, but is it still worthwhile? >

Re: [HACKERS] Tuplesort merge pre-reading

2017-04-13 Thread Tom Lane
Heikki Linnakangas writes: > I'm talking about the code that reads a bunch of from each tape, loading > them into the memtuples array. That code was added by Tom Lane, back in > 1999: > commit cf627ab41ab9f6038a29ddd04dd0ff0ccdca714e > Author: Tom Lane >

Re: [HACKERS] Tuplesort merge pre-reading

2016-10-03 Thread Peter Geoghegan
On Mon, Oct 3, 2016 at 3:39 AM, Heikki Linnakangas wrote: >> Can't you just use state->tapeRange, and remove the "continue"? I >> recommend referring to "now-exhausted input tapes" here, too. > > > Don't think so. result_tape == tapeRange only when the merge was done in a >

Re: [HACKERS] Tuplesort merge pre-reading

2016-10-03 Thread Heikki Linnakangas
On 09/30/2016 04:08 PM, Peter Geoghegan wrote: On Thu, Sep 29, 2016 at 4:10 PM, Heikki Linnakangas wrote: Bah, I fumbled the initSlabAllocator() function, attached is a fixed version. This looks much better. It's definitely getting close. Thanks for being considerate of my

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-30 Thread Peter Geoghegan
On Thu, Sep 29, 2016 at 4:10 PM, Heikki Linnakangas wrote: > Bah, I fumbled the initSlabAllocator() function, attached is a fixed > version. This looks much better. It's definitely getting close. Thanks for being considerate of my more marginal concerns. More feedback: * Should

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-29 Thread Robert Haas
On Thu, Sep 29, 2016 at 11:38 AM, Peter Geoghegan wrote: > On Thu, Sep 29, 2016 at 2:59 PM, Robert Haas wrote: >>> Maybe that was the wrong choice of words. What I mean is that it seems >>> somewhat unprincipled to give over an equal share of memory to

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-29 Thread Peter Geoghegan
On Thu, Sep 29, 2016 at 2:59 PM, Robert Haas wrote: >> Maybe that was the wrong choice of words. What I mean is that it seems >> somewhat unprincipled to give over an equal share of memory to each >> active-at-least-once tape, ... > > I don't get it. If the memory is being

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-29 Thread Heikki Linnakangas
On 09/29/2016 05:41 PM, Heikki Linnakangas wrote: Here's a new patch version, addressing the points you made. Please have a look! Bah, I fumbled the initSlabAllocator() function, attached is a fixed version. - Heikki >From bd74cb9c32b3073637d6932f3b4552598fcdc92a Mon Sep 17 00:00:00 2001

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-29 Thread Heikki Linnakangas
On 09/29/2016 01:52 PM, Peter Geoghegan wrote: * Variables like maxTapes have a meaning that is directly traceable back to Knuth's description of polyphase merge. I don't think that you should do anything to them, on general principle. Ok. I still think that changing maxTapes would make sense,

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-29 Thread Robert Haas
On Thu, Sep 29, 2016 at 6:52 AM, Peter Geoghegan wrote: >> How is it awkward? > > Maybe that was the wrong choice of words. What I mean is that it seems > somewhat unprincipled to give over an equal share of memory to each > active-at-least-once tape, ... I don't get it. If the

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-29 Thread Peter Geoghegan
On Thu, Sep 29, 2016 at 10:49 AM, Heikki Linnakangas wrote: >> Do I have that right? If so, this seems rather awkward. Hmm. > > > How is it awkward? Maybe that was the wrong choice of words. What I mean is that it seems somewhat unprincipled to give over an equal share of memory

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-29 Thread Heikki Linnakangas
On 09/28/2016 07:20 PM, Peter Geoghegan wrote: On Wed, Sep 28, 2016 at 5:11 PM, Peter Geoghegan wrote: This is why I never pursued batch memory for non-final merges. Isn't that what you're doing here? You're pretty much always setting "state->batchUsed = true". Wait. I guess

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-28 Thread Heikki Linnakangas
On 09/28/2016 07:11 PM, Peter Geoghegan wrote: On Wed, Sep 28, 2016 at 5:04 PM, Heikki Linnakangas wrote: Not sure that I understand. I agree that each merge pass tends to use roughly the same number of tapes, but the distribution of real runs on tapes is quite unbalanced in

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-28 Thread Peter Geoghegan
On Wed, Sep 28, 2016 at 5:11 PM, Peter Geoghegan wrote: > This is why I never pursued batch memory for non-final merges. Isn't > that what you're doing here? You're pretty much always setting > "state->batchUsed = true". Wait. I guess you feel you have to, since it wouldn't be

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-28 Thread Peter Geoghegan
On Wed, Sep 28, 2016 at 5:04 PM, Heikki Linnakangas wrote: >> Not sure that I understand. I agree that each merge pass tends to use >> roughly the same number of tapes, but the distribution of real runs on >> tapes is quite unbalanced in earlier merge passes (due to dummy runs).

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-28 Thread Heikki Linnakangas
On 09/28/2016 06:05 PM, Peter Geoghegan wrote: On Thu, Sep 15, 2016 at 9:51 PM, Heikki Linnakangas wrote: I don't think it makes much difference in practice, because most merge passes use all, or almost all, of the available tapes. BTW, I think the polyphase algorithm prefers

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-28 Thread Peter Geoghegan
On Thu, Sep 15, 2016 at 9:51 PM, Heikki Linnakangas wrote: >> I still don't get why you're doing all of this within mergeruns() (the >> beginning of when we start merging -- we merge all quicksorted runs), >> rather than within beginmerge() (the beginning of one particular merge

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-22 Thread Claudio Freire
On Thu, Sep 22, 2016 at 4:17 AM, Heikki Linnakangas wrote: > On 09/22/2016 03:40 AM, Claudio Freire wrote: >> >> On Tue, Sep 20, 2016 at 3:34 PM, Claudio Freire >> wrote: >>> >>> The results seem all over the map. Some regressions seem significant >>>

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-22 Thread Heikki Linnakangas
On 09/22/2016 03:40 AM, Claudio Freire wrote: On Tue, Sep 20, 2016 at 3:34 PM, Claudio Freire wrote: The results seem all over the map. Some regressions seem significant (both in the amount of performance lost and their significance, since all 4 runs show a similar

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-21 Thread Claudio Freire
On Tue, Sep 20, 2016 at 3:34 PM, Claudio Freire wrote: > On Fri, Sep 9, 2016 at 9:51 PM, Claudio Freire wrote: >> On Fri, Sep 9, 2016 at 8:13 AM, Heikki Linnakangas wrote: >>> >>> Claudio, if you could also repeat the tests you

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-20 Thread Claudio Freire
On Fri, Sep 9, 2016 at 9:51 PM, Claudio Freire wrote: > On Fri, Sep 9, 2016 at 8:13 AM, Heikki Linnakangas wrote: >> >> Claudio, if you could also repeat the tests you ran on Peter's patch set on >> the other thread, with these patches, that'd be nice.

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-17 Thread Peter Geoghegan
On Sat, Sep 17, 2016 at 9:41 AM, Heikki Linnakangas wrote: > Ok. I'll probably read through it myself once more some time next week, and > also have a first look at your actual parallel sorting patch. Have a good > trip! Thanks! It will be good to get away for a while. I'd be

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-17 Thread Heikki Linnakangas
On 09/17/2016 07:27 PM, Peter Geoghegan wrote: On Wed, Sep 14, 2016 at 10:43 AM, Heikki Linnakangas wrote: Addressed all your comments one way or another, new patch attached. So, it's clear that this isn't ready today. As I mentioned, I'm going away for a week now. I ask

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-17 Thread Peter Geoghegan
On Wed, Sep 14, 2016 at 10:43 AM, Heikki Linnakangas wrote: > Addressed all your comments one way or another, new patch attached. So, it's clear that this isn't ready today. As I mentioned, I'm going away for a week now. I ask that you hold off on committing this until I return,

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-15 Thread Peter Geoghegan
On Thu, Sep 15, 2016 at 1:51 PM, Heikki Linnakangas wrote: > BTW, does a 1-way merge make any sense? I was surprised to see this in the > log, even without this patch: > > LOG: finished 1-way merge step: CPU 0.62s/7.22u sec elapsed 8.43 sec > STATEMENT: SELECT COUNT(*) FROM

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-15 Thread Peter Geoghegan
On Thu, Sep 15, 2016 at 1:51 PM, Heikki Linnakangas wrote: >> I still don't get why you're doing all of this within mergeruns() (the >> beginning of when we start merging -- we merge all quicksorted runs), >> rather than within beginmerge() (the beginning of one particular merge

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-15 Thread Heikki Linnakangas
On 09/15/2016 10:12 PM, Peter Geoghegan wrote: On Wed, Sep 14, 2016 at 10:43 AM, Heikki Linnakangas wrote: Spotted another issue with this code just now. Shouldn't it be based on state->tapeRange? You don't want the destTape to get memory, since you don't use batch memory for

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-15 Thread Peter Geoghegan
On Wed, Sep 14, 2016 at 10:43 AM, Heikki Linnakangas wrote: > Addressed all your comments one way or another, new patch attached. Comments > on some specific points below: Cool. My response here is written under time pressure, which is not ideal. I think it's still useful,

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-14 Thread Heikki Linnakangas
Addressed all your comments one way or another, new patch attached. Comments on some specific points below: On 09/12/2016 01:13 AM, Peter Geoghegan wrote: Other things I noticed: * You should probably point out that typically, access to batch memory will still be sequential, despite your

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-12 Thread Peter Geoghegan
On Mon, Sep 12, 2016 at 12:07 PM, Heikki Linnakangas wrote: > Here's a fixed version. I'll go through Peter's comments and address those, > but I don't think there was anything there that should affect performance > much, so I think you can proceed with your benchmarking with

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-12 Thread Heikki Linnakangas
On 09/12/2016 06:47 PM, Claudio Freire wrote: On Mon, Sep 12, 2016 at 12:02 PM, Claudio Freire wrote: On Sun, Sep 11, 2016 at 12:47 PM, Heikki Linnakangas wrote: Here's a new version of these patches, rebased over current master. I squashed the two

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-12 Thread Peter Geoghegan
On Mon, Sep 12, 2016 at 8:47 AM, Claudio Freire wrote: > I spoke too soon, git AM had failed and I didn't notice. I wrote the regression test that causes Postgres to crash with the patch applied. It tests, among other things, that CLUSTER tuples are fixed-up by a routine

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-12 Thread Claudio Freire
On Sun, Sep 11, 2016 at 12:47 PM, Heikki Linnakangas wrote: > Here's a new version of these patches, rebased over current master. I > squashed the two patches into one, there's not much point to keep them > separate. I don't know what was up with the other ones, but this one

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-11 Thread Gavin Flower
On 12/09/16 12:16, Gavin Flower wrote: [...] two blocks would be logically adjacent (which means they are likely to be physically close together, but not guaranteed!). [...] Actual disk layouts are quite complicated, the above is an over simplification, but the message is still valid.

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-11 Thread Gavin Flower
On 12/09/16 10:13, Peter Geoghegan wrote: On Sun, Sep 11, 2016 at 8:47 AM, Heikki Linnakangas wrote: [...] I don't know what the difference is between accessing 10 pages randomly, and accessing a random set of 10 single pages sequentially, in close succession. As Tom would

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-11 Thread Peter Geoghegan
On Sun, Sep 11, 2016 at 3:13 PM, Peter Geoghegan wrote: >> + for (tapenum = 0; tapenum < maxTapes; tapenum++) >> + { >> + LogicalTapeAssignReadBufferSize(state->tapeset, tapenum, >> + (per_tape + (tapenum < cutoff ? 1 : >> 0)) *

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-11 Thread Peter Geoghegan
On Sun, Sep 11, 2016 at 3:13 PM, Peter Geoghegan wrote: > * Doesn't this code need to call MemoryContextAllocHuge() rather than > palloc()?: > >> @@ -709,18 +765,19 @@ LogicalTapeRewind(LogicalTapeSet *lts, int tapenum, >> bool forWrite) >> Assert(lt->frozen); >>

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-11 Thread Peter Geoghegan
On Sun, Sep 11, 2016 at 3:13 PM, Peter Geoghegan wrote: > * Please make this use the ".., + 1023" natural rounding trick that is > used in the similar traces that are removed: > >> +#ifdef TRACE_SORT >> + if (trace_sort) >> + elog(LOG, "using %d kB of memory for read

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-11 Thread Peter Geoghegan
On Sun, Sep 11, 2016 at 8:47 AM, Heikki Linnakangas wrote: > Here's a new version of these patches, rebased over current master. I > squashed the two patches into one, there's not much point to keep them > separate. I think I have my head fully around this now. For some reason,

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-11 Thread Heikki Linnakangas
Here's a new version of these patches, rebased over current master. I squashed the two patches into one, there's not much point to keep them separate. - Heikki >From 6e3813d876cf3efbe5f1b80c45f44ed5494304ab Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date:

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-10 Thread Peter Geoghegan
On Sat, Sep 10, 2016 at 12:04 AM, Heikki Linnakangas wrote: >> Did I misunderstand something? I'm applying the first patch of Peter's >> series (cap number of tapes), then your first one (remove prefetch) >> and second one (use larger read buffers). > > > Yes. I have been testing

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-10 Thread Heikki Linnakangas
On 09/10/2016 04:21 AM, Claudio Freire wrote: On Fri, Sep 9, 2016 at 9:51 PM, Claudio Freire wrote: On Fri, Sep 9, 2016 at 8:13 AM, Heikki Linnakangas wrote: Claudio, if you could also repeat the tests you ran on Peter's patch set on the other

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-09 Thread Claudio Freire
On Fri, Sep 9, 2016 at 9:51 PM, Claudio Freire wrote: > On Fri, Sep 9, 2016 at 8:13 AM, Heikki Linnakangas wrote: >> >> Claudio, if you could also repeat the tests you ran on Peter's patch set on >> the other thread, with these patches, that'd be nice.

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-09 Thread Claudio Freire
On Fri, Sep 9, 2016 at 8:13 AM, Heikki Linnakangas wrote: > > Claudio, if you could also repeat the tests you ran on Peter's patch set on > the other thread, with these patches, that'd be nice. These patches are > effectively a replacement for >

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-09 Thread Peter Geoghegan
On Fri, Sep 9, 2016 at 5:25 AM, Greg Stark wrote: > Wow, this is really cool. We should do something like this for query > execution too. We should certainly do this for tuplestore.c, too. I've been meaning to adopt it to use batch memory. I did look at it briefly, and recall that

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-09 Thread Peter Geoghegan
On Fri, Sep 9, 2016 at 4:55 AM, Heikki Linnakangas wrote: > I'm happy with the amount of testing I've done now, and the results. Does > anyone want to throw out any more test cases where there might be a > regression? If not, let's get these reviewed and committed. I'll try to

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-09 Thread Heikki Linnakangas
On 09/09/2016 03:25 PM, Greg Stark wrote: On Fri, Sep 9, 2016 at 1:01 PM, Heikki Linnakangas wrote: I'm happy with what it looks like. We are in fact getting a more sequential access pattern with these patches, because we're not expanding the pre-read tuples into SortTuples.

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-09 Thread Greg Stark
On Fri, Sep 9, 2016 at 1:01 PM, Heikki Linnakangas wrote: > I'm happy with what it looks like. We are in fact getting a more sequential > access pattern with these patches, because we're not expanding the pre-read > tuples into SortTuples. Keeping densely-packed blocks in memory,

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-09 Thread Heikki Linnakangas
On 09/08/2016 09:59 PM, Heikki Linnakangas wrote: On 09/06/2016 10:26 PM, Peter Geoghegan wrote: On Tue, Sep 6, 2016 at 12:08 PM, Peter Geoghegan wrote: Offhand, I would think that taken together this is very important. I'd certainly want to see cases in the hundreds of

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-08 Thread Heikki Linnakangas
On 09/06/2016 10:26 PM, Peter Geoghegan wrote: On Tue, Sep 6, 2016 at 12:08 PM, Peter Geoghegan wrote: Offhand, I would think that taken together this is very important. I'd certainly want to see cases in the hundreds of megabytes or gigabytes of work_mem alongside your 4MB

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-06 Thread Peter Geoghegan
On Tue, Sep 6, 2016 at 12:08 PM, Peter Geoghegan wrote: > Offhand, I would think that taken together this is very important. I'd > certainly want to see cases in the hundreds of megabytes or gigabytes > of work_mem alongside your 4MB case, even just to be able to talk >

Re: [HACKERS] Tuplesort merge pre-reading

2016-09-06 Thread Peter Geoghegan
On Tue, Sep 6, 2016 at 5:20 AM, Heikki Linnakangas wrote: > I wrote a quick patch to test that, attached. It seems to improve > performance, at least in this small test case: > > create table lotsofints(i integer); > insert into lotsofints select random() * 10.0 from >