Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-11-02 Thread Thomas Munro
On Fri, Nov 3, 2017 at 2:24 PM, Peter Geoghegan wrote: > Thomas Munro wrote: >> That way you don't have to opt in to BufFile's >> double buffering and segmentation schemes just to get shared file >> clean-up, if for some reason you want direct file

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-11-02 Thread Peter Geoghegan
Thomas Munro wrote: I'm going to make an item on my personal TODO list for that. No useful insights on that right now, though. I decided to try that, but it didn't really work: fd.h gets included by front-end code, so I can't very well define a struct and

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-11-02 Thread Thomas Munro
On Wed, Nov 1, 2017 at 2:11 PM, Peter Geoghegan wrote: > On Tue, Oct 31, 2017 at 5:07 PM, Thomas Munro > wrote: >> Another complaint is that perhaps fd.c >> knows too much about buffile.c's business. For example, >> RemovePgTempFilesInDir() knows

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-10-31 Thread Peter Geoghegan
On Tue, Oct 31, 2017 at 5:07 PM, Thomas Munro wrote: > So that's this bit: > > + pg_itoa(worker, filename); > + lts->pfile = BufFileCreateShared(fileset, filename); > > ... and: > > + pg_itoa(i, filename); > + file = BufFileOpenShared(fileset, filename); Right. >

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-10-31 Thread Thomas Munro
On Wed, Nov 1, 2017 at 11:29 AM, Peter Geoghegan wrote: > On Thu, Oct 26, 2017 at 4:22 AM, Rushabh Lathia > wrote: >> Attaching the re based patch according to the v22 parallel-hash patch sets > > I took a quick look at this today, and noticed a few

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-10-31 Thread Peter Geoghegan
On Thu, Oct 26, 2017 at 4:22 AM, Rushabh Lathia wrote: > Attaching the re based patch according to the v22 parallel-hash patch sets I took a quick look at this today, and noticed a few issues: * make_name() is used to name files in sharedtuplestore.c, which is what is

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-09-29 Thread Peter Geoghegan
On Wed, Sep 20, 2017 at 2:32 AM, Rushabh Lathia wrote: > Yes, I haven't touched the randomAccess part yet. My initial goal was > to incorporate the BufFileSet api's here. This is going to need a rebase, due to the commit today to remove replacement selection sort. That

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-09-20 Thread Robert Haas
On Wed, Sep 20, 2017 at 5:32 AM, Rushabh Lathia wrote: > First application for the tuplesort here is CREATE INDEX and that doesn't > need randomAccess. But as you said and in the thread its been discussed, > randomAccess is an important and we should sure put an efforts

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-09-20 Thread Rushabh Lathia
On Wed, Sep 20, 2017 at 5:17 AM, Peter Geoghegan wrote: > On Tue, Sep 19, 2017 at 3:21 AM, Rushabh Lathia > wrote: > > As per the earlier discussion in the thread, I did experiment using > > BufFileSet interface from parallel-hash-v18.patchset. I took

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-09-19 Thread Peter Geoghegan
On Tue, Sep 19, 2017 at 3:21 AM, Rushabh Lathia wrote: > As per the earlier discussion in the thread, I did experiment using > BufFileSet interface from parallel-hash-v18.patchset. I took the reference > of parallel-hash other patches to understand the BufFileSet APIs,

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-08-02 Thread Andres Freund
On 2017-02-10 07:52:57 -0500, Robert Haas wrote: > On Thu, Feb 9, 2017 at 6:38 PM, Thomas Munro > > Up until two minutes ago I assumed that policy would leave only two > > possibilities: you attach to the DSM segment and attach to the > > SharedBufFileManager successfully or you attach to the DSM

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-22 Thread Peter Geoghegan
On Wed, Mar 22, 2017 at 5:44 AM, Robert Haas wrote: >> Actually, that's quite possible with the design I came up with. > > I don't think it is. What sequence of calls do the APIs you've > proposed would accomplish that goal? I don't see anything in this > patch set that

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-22 Thread Robert Haas
On Tue, Mar 21, 2017 at 7:37 PM, Peter Geoghegan wrote: >>> Isn't that an essential part of having a refcount, in general? You >>> were the one that suggested refcounting. >> >> No, quite the opposite. My point in suggesting adding a refcount was >> to avoid needing to have a

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-21 Thread Peter Geoghegan
On Tue, Mar 21, 2017 at 2:49 PM, Thomas Munro wrote: > I'm going to experiment with refactoring the v10 parallel CREATE INDEX > patch to use the SharedBufFileSet interface from > hj-shared-buf-file-v8.patch today and see what problems I run into. I would be happy

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-21 Thread Peter Geoghegan
On Tue, Mar 21, 2017 at 2:03 PM, Robert Haas wrote: > I agree that the extent to which code reuse is possible here is > somewhat unclear, but I am 100% confident that the answer is non-zero. > You and Thomas both need BufFiles that can be shared across multiple > backends

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-21 Thread Thomas Munro
On Wed, Mar 22, 2017 at 10:03 AM, Robert Haas wrote: > On Tue, Mar 21, 2017 at 3:50 PM, Peter Geoghegan wrote: >> I disagree with that. It is a >> trade-off, I suppose. I have now run out of time to work through it >> with you or Thomas, though. > > Bummer.

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 3:50 PM, Peter Geoghegan wrote: > On Tue, Mar 21, 2017 at 12:06 PM, Robert Haas wrote: >> From my point of view, the main point is that having two completely >> separate mechanisms for managing temporary files that need to be >> shared

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-21 Thread Peter Geoghegan
On Tue, Mar 21, 2017 at 12:06 PM, Robert Haas wrote: > From my point of view, the main point is that having two completely > separate mechanisms for managing temporary files that need to be > shared across cooperating workers is not a good decision. That's a > need that's

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-21 Thread Robert Haas
On Tue, Mar 21, 2017 at 2:03 PM, Peter Geoghegan wrote: > I think that since the comment refers to code from before 1999, it can > go. Any separate patch to remove it would have an entirely negative > linediff. It's a good general principle that a patch should do one thing well and

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-21 Thread Peter Geoghegan
On Tue, Mar 21, 2017 at 9:10 AM, Robert Haas wrote: > - * This code is moderately slow (~10% slower) compared to the regular > - * btree (insertion) build code on sorted or well-clustered data. On > - * random data, however, the insertion build code is unusable -- the > -

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-21 Thread Robert Haas
On Sun, Mar 19, 2017 at 9:03 PM, Peter Geoghegan wrote: > On Sun, Mar 12, 2017 at 3:05 PM, Peter Geoghegan wrote: >> I attach my V9 of the patch. I came up some stuff for the design of >> resource management that I think meets every design goal that we have >> for

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-19 Thread Peter Geoghegan
On Sun, Mar 12, 2017 at 3:05 PM, Peter Geoghegan wrote: > I attach my V9 of the patch. I came up some stuff for the design of > resource management that I think meets every design goal that we have > for shared/unified BufFiles: Commit 2609e91fc broke the parallel CREATE INDEX cost

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-12 Thread Peter Geoghegan
On Sun, Mar 12, 2017 at 3:05 PM, Peter Geoghegan wrote: > There is still an open item here, though: The leader-as-worker > Tuplesortstate, a special case, can still leak files. I phrased this badly. What I mean is that there can be instances where temp files are left on disk

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-12 Thread Peter Geoghegan
On Thu, Feb 16, 2017 at 8:45 AM, Peter Geoghegan wrote: >> I do not think there should be any reason why we can't get the >> resource accounting exactly correct here. If a single backend manages >> to remove every temporary file that it creates exactly once (and >> that's currently

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-06 Thread Thomas Munro
On Wed, Mar 1, 2017 at 10:29 PM, Thomas Munro wrote: > I'm testing a patch that lets you set up a fixed sized > SharedBufFileSet object in a DSM segment, with its own refcount for > the reason you explained. It supports a dynamically expandable set of > numbered

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-03-01 Thread Thomas Munro
On Sat, Feb 11, 2017 at 1:52 AM, Robert Haas wrote: > On Thu, Feb 9, 2017 at 6:38 PM, Thomas Munro > wrote: >> Yes, potentially unbounded in rare case. If we plan for N batches, >> and then run out of work_mem because our estimates were just

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-16 Thread Robert Haas
On Thu, Feb 16, 2017 at 11:45 AM, Peter Geoghegan wrote: > Maybe I'm just being pedantic here, since we both actually want the > code to do the same thing. Pedantry from either of us? Nah... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-16 Thread Peter Geoghegan
On Wed, Feb 15, 2017 at 6:05 PM, Thomas Munro wrote: > Here are some results with your latest patch, using the same test as > before but this time with SCALE=100 (= 100,000,000 rows). Cool. > To reduce the number of combinations I did only unique data and built >

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-16 Thread Peter Geoghegan
On Thu, Feb 16, 2017 at 6:28 AM, Robert Haas wrote: > On Thu, Feb 9, 2017 at 7:10 PM, Peter Geoghegan wrote: >> At the risk of stating the obvious, ISTM that the right way to do >> this, at a high level, is to err on the side of unneeded extra >> unlink()

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-16 Thread Robert Haas
On Thu, Feb 9, 2017 at 7:10 PM, Peter Geoghegan wrote: > At the risk of stating the obvious, ISTM that the right way to do > this, at a high level, is to err on the side of unneeded extra > unlink() calls, not leaking files. And, to make the window for problem > ("remaining hole

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-15 Thread Thomas Munro
On Sat, Feb 4, 2017 at 2:45 PM, Peter Geoghegan wrote: > It might just have been that the table was too small to be an > effective target for parallel sequential scan with so many workers, > and so a presorted best case CREATE INDEX, which isn't that different, > also fails to see

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-10 Thread Robert Haas
On Thu, Feb 9, 2017 at 6:38 PM, Thomas Munro wrote: > Here's my thought process... please tell me where I'm going wrong: > > I have been assuming that it's not enough to just deal with this when > the leader detaches on the theory that other participants will always

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-09 Thread Peter Geoghegan
On Thu, Feb 9, 2017 at 2:31 PM, Robert Haas wrote: > You might think about plugging that hole by moving the registry of > on-destroy functions into the segment itself and making it a shared > resource. But ASLR breaks that, especially for loadable modules. You > could try

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-09 Thread Thomas Munro
On Fri, Feb 10, 2017 at 11:31 AM, Robert Haas wrote: > On Thu, Feb 9, 2017 at 5:09 PM, Thomas Munro > wrote: >> I agree that the pinned segment case doesn't matter right now, I just >> wanted to point it out. I like your $10 wrench analogy,

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-09 Thread Robert Haas
On Thu, Feb 9, 2017 at 5:09 PM, Thomas Munro wrote: > I agree that the pinned segment case doesn't matter right now, I just > wanted to point it out. I like your $10 wrench analogy, but maybe it > could be argued that adding a dsm_on_destroy() callback mechanism is

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-09 Thread Thomas Munro
On Fri, Feb 10, 2017 at 9:51 AM, Robert Haas wrote: > On Wed, Feb 8, 2017 at 5:36 AM, Thomas Munro > wrote: >> Thinking a bit harder about this, I suppose there could be a kind of >> object called a SharedBufFileManager [... description of

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-09 Thread Robert Haas
On Wed, Feb 8, 2017 at 5:36 AM, Thomas Munro wrote: > Thinking a bit harder about this, I suppose there could be a kind of > object called a SharedBufFileManager (insert better name) which you > can store in a DSM segment. The leader backend that initialises a DSM

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-08 Thread Thomas Munro
On Wed, Feb 8, 2017 at 8:40 PM, Thomas Munro wrote: > On Tue, Feb 7, 2017 at 5:43 PM, Peter Geoghegan wrote: >> Does anyone have any suggestions on how to tackle this? > > Hmm. One approach might be like this: > > [hand-wavy stuff] Thinking a bit

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-07 Thread Thomas Munro
On Tue, Feb 7, 2017 at 5:43 PM, Peter Geoghegan wrote: > However, there are some specific implementation issues with this that > I didn't quite anticipate. I would like to get feedback on these > issues now, from both Thomas and Robert. The issues relate to how much > the patch can

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-06 Thread Peter Geoghegan
On Mon, Jan 30, 2017 at 9:15 PM, Peter Geoghegan wrote: >> IIUC worker_wait() is only being used to keep the worker around so its >> files aren't deleted. Once buffile cleanup is changed to be >> ref-counted (in an on_dsm_detach hook?) then workers might as well >> exit sooner,

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-03 Thread Peter Geoghegan
On Fri, Feb 3, 2017 at 4:15 PM, Thomas Munro wrote: >> I suspect that this system isn't particularly well balanced for the >> task of benchmarking the patch. You would probably see notably better >> scalability than any you've shown in any test if you could add >>

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-03 Thread Thomas Munro
On Sat, Feb 4, 2017 at 11:58 AM, Peter Geoghegan wrote: > On Fri, Feb 3, 2017 at 5:04 AM, Thomas Munro > wrote: >> 1. 'asc' = pre-sorted data (w = 0 shows time in seconds, other columns >> show speed-up relative to that time): >> >> mem | w = 0 | w

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-03 Thread Peter Geoghegan
On Fri, Feb 3, 2017 at 5:04 AM, Thomas Munro wrote: > 1. 'asc' = pre-sorted data (w = 0 shows time in seconds, other columns > show speed-up relative to that time): > > mem | w = 0 | w = 1 | w = 2 | w = 3 | w = 4 | w = 5 | w = 6 | w = 7 | w = 8 >

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-03 Thread Peter Geoghegan
On Fri, Feb 3, 2017 at 5:04 AM, Thomas Munro wrote: > I applied your v2 patch on top of > 7ac4a389a7dbddaa8b19deb228f0a988e79c5795^ to avoid a conflict. It > still had a couple of harmless conflicts that I was able to deal with > (not code, just some header stuff

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-02-03 Thread Thomas Munro
On Wed, Feb 1, 2017 at 8:46 PM, Peter Geoghegan wrote: > On Tue, Jan 31, 2017 at 11:23 PM, Thomas Munro > wrote: >> So I'm in favour of this patch, which is relatively simple and give us >> faster index builds soon. Eventually we might also be able

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-01-31 Thread Peter Geoghegan
On Tue, Jan 31, 2017 at 11:23 PM, Thomas Munro wrote: > 2. All participants: parallel sequential scan, sort, spool to disk; > barrier; leader: merge spooled tuples and build btree. > > This patch is doing the 2nd thing. My understanding is that some > systems

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-01-31 Thread Thomas Munro
On Wed, Feb 1, 2017 at 5:37 PM, Michael Paquier wrote: > On Tue, Jan 31, 2017 at 2:15 PM, Peter Geoghegan wrote: >> On Mon, Jan 30, 2017 at 8:46 PM, Thomas Munro >> wrote: >>> On Wed, Jan 4, 2017 at 12:53 PM, Peter

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-01-31 Thread Michael Paquier
On Tue, Jan 31, 2017 at 2:15 PM, Peter Geoghegan wrote: > On Mon, Jan 30, 2017 at 8:46 PM, Thomas Munro > wrote: >> On Wed, Jan 4, 2017 at 12:53 PM, Peter Geoghegan wrote: >>> Attached is V7 of the patch. >> >> I am doing some

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-01-31 Thread Robert Haas
On Tue, Jan 31, 2017 at 12:15 AM, Peter Geoghegan wrote: >> Should this 64KB minimum be mentioned in the documentation? > > You mean user-visible documentation, and not just tuplesort.h? I don't > think that that's necessary. That's a ludicrously low amount of memory > for a worker

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-01-30 Thread Peter Geoghegan
On Mon, Jan 30, 2017 at 8:46 PM, Thomas Munro wrote: > On Wed, Jan 4, 2017 at 12:53 PM, Peter Geoghegan wrote: >> Attached is V7 of the patch. > > I am doing some testing. First, some superficial things from first pass: > > [Various minor cosmetic

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-01-30 Thread Thomas Munro
On Wed, Jan 4, 2017 at 12:53 PM, Peter Geoghegan wrote: > Attached is V7 of the patch. I am doing some testing. First, some superficial things from first pass: Still applies with some offsets and one easy-to-fix rejected hunk in nbtree.c (removing some #include directives and

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-01-03 Thread Peter Geoghegan
On Tue, Dec 20, 2016 at 5:14 PM, Peter Geoghegan wrote: >> Imagine a data structure that is stored in dynamic shared memory and >> contains space for a filename, a reference count, and a mutex. Let's >> call this thing a SharedTemporaryFile or something like that. It >> offers

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-12-21 Thread Peter Geoghegan
On Wed, Dec 21, 2016 at 10:21 AM, Peter Geoghegan wrote: > On Wed, Dec 21, 2016 at 6:00 AM, Robert Haas wrote: >> 3. Just live with the waste of space. > > I am loathe to create a special case for the parallel interface too, > but I think it's possible

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-12-21 Thread Peter Geoghegan
On Wed, Dec 21, 2016 at 6:00 AM, Robert Haas wrote: > 3. Just live with the waste of space. I am loathe to create a special case for the parallel interface too, but I think it's possible that *no* caller will ever actually need to live with this restriction at any time in

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-12-21 Thread Robert Haas
On Wed, Dec 21, 2016 at 7:04 AM, Heikki Linnakangas wrote: >> If the worker is always completely finished with the tape before the >> leader touches it, couldn't the leader's LogicalTapeSet just "adopt" >> the tape and overwrite it like any other? > > Currently, the logical tape

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-12-21 Thread Heikki Linnakangas
On 12/21/2016 12:53 AM, Robert Haas wrote: That leaves one problem, though: reusing space in the final merge phase. If the tapes being merged belong to different LogicalTapeSets, and create one new tape to hold the result, the new tape cannot easily reuse the space of the input tapes because

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-12-20 Thread Robert Haas
On Tue, Dec 20, 2016 at 8:14 PM, Peter Geoghegan wrote: > Without meaning to sound glib, unification is the process by which > parallel CREATE INDEX has the leader read temp files from workers > sufficient to complete its final on-the-fly merge. That's not glib, but you can't in

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-12-20 Thread Peter Geoghegan
On Tue, Dec 20, 2016 at 2:53 PM, Robert Haas wrote: >> What I have in mind is something like the attached patch. It refactors >> LogicalTapeRead(), LogicalTapeWrite() etc. functions to take a LogicalTape >> as argument, instead of LogicalTapeSet and tape number.

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-12-20 Thread Robert Haas
On Wed, Sep 21, 2016 at 12:52 PM, Heikki Linnakangas wrote: > I find this unification business really complicated. I think it'd be simpler > to keep the BufFiles and LogicalTapeSets separate, and instead teach > tuplesort.c how to merge tapes that live on different >

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-12-04 Thread Haribabu Kommi
On Mon, Dec 5, 2016 at 7:44 AM, Peter Geoghegan wrote: > On Sat, Dec 3, 2016 at 7:23 PM, Tomas Vondra > wrote: > > I do share your concerns about unpredictable behavior - that's > > particularly worrying for pg_restore, which may be used for time-

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-12-04 Thread Peter Geoghegan
On Sat, Dec 3, 2016 at 7:23 PM, Tomas Vondra wrote: > I do share your concerns about unpredictable behavior - that's > particularly worrying for pg_restore, which may be used for time- > sensitive use cases (DR, migrations between versions), so unpredictable >

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-12-03 Thread Tomas Vondra
On Sat, 2016-12-03 at 18:37 -0800, Peter Geoghegan wrote: > On Sat, Dec 3, 2016 at 5:45 PM, Alvaro Herrera com> wrote: > > > > I don't think a patch must necessarily consider all possible uses > > that > > the new feature may have.  If we introduce parallel index

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-12-03 Thread Peter Geoghegan
On Sat, Dec 3, 2016 at 5:45 PM, Alvaro Herrera wrote: > I don't think a patch must necessarily consider all possible uses that > the new feature may have. If we introduce parallel index creation, > that's great; if pg_restore doesn't start using it right away, that's >

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-12-03 Thread Alvaro Herrera
Peter Geoghegan wrote: > On Mon, Nov 7, 2016 at 8:28 PM, Peter Geoghegan wrote: > > What do we need to teach pg_restore about parallel CREATE INDEX, if > > anything at all? Could this be as simple as a blanket disabling of > > parallelism for CREATE INDEX from pg_restore? Or,

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-12-03 Thread Peter Geoghegan
On Mon, Nov 7, 2016 at 8:28 PM, Peter Geoghegan wrote: > What do we need to teach pg_restore about parallel CREATE INDEX, if > anything at all? Could this be as simple as a blanket disabling of > parallelism for CREATE INDEX from pg_restore? Or, does it need to be > more

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-11-15 Thread Robert Haas
On Wed, Nov 9, 2016 at 10:18 PM, Peter Geoghegan wrote: >> Maybe another way of putting this is that, while there's clearly a >> benefit to having some kind of a cap, it's appropriate to pick a large >> value, such as 500. Having no cap at all risks creating many extra >> tapes

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-11-09 Thread Peter Geoghegan
On Wed, Nov 9, 2016 at 6:57 PM, Robert Haas wrote: > I guess that's possible, but the problem with polyphase merge is that > the increased I/O becomes a pretty significant cost in a hurry. Not if you have a huge RAID array. :-) Obviously I'm not seriously suggesting that

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-11-09 Thread Robert Haas
On Wed, Nov 9, 2016 at 7:54 PM, Peter Geoghegan wrote: >> Now, on the other hand, as far as I can see, the actual amount of >> evidence that 0001 is a good idea which has been presented in this >> forum is pretty near zero. You've argued for it on theoretical >> grounds several

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-11-09 Thread Peter Geoghegan
On Wed, Nov 9, 2016 at 4:54 PM, Peter Geoghegan wrote: > It's more complicated than that. As I said, I think that Knuth > basically had it right with his sweet spot of 7. I think that commit > df700e6b40195d28dc764e0c694ac8cef90d4638 was effective in large part > because a

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-11-09 Thread Peter Geoghegan
On Wed, Nov 9, 2016 at 4:01 PM, Robert Haas wrote: > I gather that 0001, which puts a cap on the number of tapes, is not > actually related to the subject of this thread; it's an independent > change that you think is a good idea. I reviewed the previous > discussion on

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-11-09 Thread Robert Haas
On Mon, Nov 7, 2016 at 11:28 PM, Peter Geoghegan wrote: > I attach V5. I gather that 0001, which puts a cap on the number of tapes, is not actually related to the subject of this thread; it's an independent change that you think is a good idea. I reviewed the previous

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-11-07 Thread Peter Geoghegan
On Mon, Oct 24, 2016 at 6:17 PM, Peter Geoghegan wrote: >> * Cost model. Should probably attempt to guess final index size, and >> derive calculation of number of workers from that. Also, I'm concerned >> that I haven't given enough thought to the low end, where with default >>

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-10-27 Thread Peter Geoghegan
On Wed, Oct 19, 2016 at 11:33 AM, Peter Geoghegan wrote: > I don't think that eager merging will prove all that effective, > however it's implemented. I see a very I/O bound system when parallel > CREATE INDEX merges serially. There is no obvious reason why you'd > have a

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-10-26 Thread Peter Geoghegan
On Mon, Aug 1, 2016 at 3:18 PM, Peter Geoghegan wrote: > Setup: > > CREATE TABLE parallel_sort_test AS > SELECT hashint8(i) randint, > md5(i::text) collate "C" padding1, > md5(i::text || '2') collate "C" padding2 > FROM generate_series(0, 1e9::bigint) i; > >

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-10-24 Thread Peter Geoghegan
On Fri, Oct 7, 2016 at 5:47 PM, Peter Geoghegan wrote: > Work is still needed on: > > * Cost model. Should probably attempt to guess final index size, and > derive calculation of number of workers from that. Also, I'm concerned > that I haven't given enough thought to the low

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-10-21 Thread Amit Kapila
On Fri, Oct 21, 2016 at 4:25 PM, Amit Kapila wrote: > On Tue, Oct 18, 2016 at 3:48 AM, Peter Geoghegan wrote: >> On Mon, Oct 17, 2016 at 5:36 AM, Amit Kapila wrote: >> >> I read the following paragraph from the Volcano paper

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-10-21 Thread Amit Kapila
On Tue, Oct 18, 2016 at 3:48 AM, Peter Geoghegan wrote: > On Mon, Oct 17, 2016 at 5:36 AM, Amit Kapila wrote: > > I read the following paragraph from the Volcano paper just now: > > """ > During implementation and benchmarking of parallel sorting, we

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-10-21 Thread Amit Kapila
On Thu, Oct 20, 2016 at 12:03 AM, Peter Geoghegan wrote: > On Wed, Oct 19, 2016 at 7:39 AM, Robert Haas wrote: >> Gather Merge can't emit a tuple unless it has buffered at least one >> tuple from every producer; otherwise, the next tuple it receives from

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-10-19 Thread Peter Geoghegan
On Wed, Oct 19, 2016 at 7:39 AM, Robert Haas wrote: > Gather Merge can't emit a tuple unless it has buffered at least one > tuple from every producer; otherwise, the next tuple it receives from > one of those producers might proceed whichever tuple it chooses to > emit.

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-10-19 Thread Robert Haas
On Mon, Oct 17, 2016 at 8:36 AM, Amit Kapila wrote: >> This project of mine is about parallelizing tuplesort.c, which isn't >> really what you want for parallel query -- you shouldn't try to scope >> the problem as "make the sort more scalable using parallelism" there. >>

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-10-17 Thread Peter Geoghegan
On Mon, Oct 17, 2016 at 5:36 AM, Amit Kapila wrote: > Okay, but what is the proof or why do you think second is going to > better than first? I don't have proof. It's my opinion that it probably would be, based on partial information, and my intuition. It's hard to prove

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-10-17 Thread Amit Kapila
On Thu, Oct 13, 2016 at 12:35 AM, Peter Geoghegan wrote: > On Wed, Oct 12, 2016 at 11:09 AM, Robert Haas wrote: > >> On the flip side, what if anything can queries hope to get out of >> parallel sort that they can't get out of Gather Merge? One >>

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-10-12 Thread Peter Geoghegan
On Wed, Oct 12, 2016 at 11:09 AM, Robert Haas wrote: > I realize that you are primarily targeting utility commands here, and > that is obviously great, because making index builds faster is very > desirable. However, I'd just like to talk for a minute about how this >

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-10-12 Thread Robert Haas
On Fri, Oct 7, 2016 at 5:47 PM, Peter Geoghegan wrote: > Work is still needed on: > > * Cost model. Should probably attempt to guess final index size, and > derive calculation of number of workers from that. Also, I'm concerned > that I haven't given enough thought to the low

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-10-07 Thread Peter Geoghegan
On Sun, Sep 11, 2016 at 11:05 AM, Peter Geoghegan wrote: > So, while there are still a few loose ends with this revision (it > should still certainly be considered WIP), I wanted to get a revision > out quickly because V1 has been left to bitrot for too long now, and > my

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-09-27 Thread Robert Haas
On Mon, Sep 26, 2016 at 3:40 PM, Peter Geoghegan wrote: > On Mon, Sep 26, 2016 at 6:58 PM, Robert Haas wrote: >>> That requires some kind of mutual exclusion mechanism, like an LWLock. >> >> No, it doesn't. Shared memory queues are single-reader,

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-09-26 Thread Peter Geoghegan
On Mon, Sep 26, 2016 at 6:58 PM, Robert Haas wrote: >> That requires some kind of mutual exclusion mechanism, like an LWLock. > > No, it doesn't. Shared memory queues are single-reader, single-writer. The point is that there is a natural dependency when merging is

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-09-26 Thread Robert Haas
On Sat, Sep 24, 2016 at 9:07 AM, Peter Geoghegan wrote: > On Thu, Sep 22, 2016 at 8:57 PM, Robert Haas wrote: >> On Thu, Sep 22, 2016 at 3:51 PM, Heikki Linnakangas wrote: >>> It'd be good if you could overlap the final merges in the

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-09-24 Thread Peter Geoghegan
On Thu, Sep 22, 2016 at 8:57 PM, Robert Haas wrote: > On Thu, Sep 22, 2016 at 3:51 PM, Heikki Linnakangas wrote: >> It'd be good if you could overlap the final merges in the workers with the >> merge in the leader. ISTM it would be quite straightforward to

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-09-24 Thread Peter Geoghegan
On Wed, Sep 21, 2016 at 5:52 PM, Heikki Linnakangas wrote: > I find this unification business really complicated. I can certainly understand why you would. As I said, it's the most complicated part of the patch, which overall is one of the most ambitious patches I've ever

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-09-22 Thread Robert Haas
On Thu, Sep 22, 2016 at 3:51 PM, Heikki Linnakangas wrote: > It'd be good if you could overlap the final merges in the workers with the > merge in the leader. ISTM it would be quite straightforward to replace the > final tape of each worker with a shared memory queue, so that the

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-09-22 Thread Heikki Linnakangas
On 08/02/2016 01:18 AM, Peter Geoghegan wrote: No merging in parallel -- Currently, merging worker *output* runs may only occur in the leader process. In other words, we always keep n worker processes busy with scanning-and-sorting (and maybe some merging), but then all

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-09-21 Thread Heikki Linnakangas
On 08/02/2016 01:18 AM, Peter Geoghegan wrote: Tape unification Sort operations have a unique identifier, generated before any workers are launched, using a scheme based on the leader's PID, and a unique temp file number. This makes all on-disk state (temp files managed by

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-09-13 Thread Robert Haas
On Sun, Sep 11, 2016 at 2:05 PM, Peter Geoghegan wrote: > On Sun, Sep 11, 2016 at 6:28 AM, Heikki Linnakangas wrote: >> Pushed this "displace root" patch, with some changes: > > Attached is rebased version of the entire patch series, which should > be applied on

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-09-11 Thread Peter Geoghegan
On Sun, Sep 11, 2016 at 6:28 AM, Heikki Linnakangas wrote: > Pushed this "displace root" patch, with some changes: Attached is rebased version of the entire patch series, which should be applied on top of what you pushed to the master branch today. This features a new scheme

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-09-11 Thread Peter Geoghegan
On Sun, Sep 11, 2016 at 6:28 AM, Heikki Linnakangas wrote: > * I renamed "tuplesort_heap_siftup()" to "tuplesort_delete_top()". I realize > that this is controversial, per the discussion on the "Is > tuplesort_heap_siftup() a misnomer?" thread. However, now that we have a new >

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-09-11 Thread Heikki Linnakangas
On 09/10/2016 03:22 AM, Claudio Freire wrote: Overall, however, I believe the patch is in good shape. Only minor form issues need to be changed, the functionality seems both desirable and ready. Pushed this "displace root" patch, with some changes: * I renamed "tuplesort_heap_siftup()" to

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-09-10 Thread Peter Geoghegan
On Fri, Sep 9, 2016 at 5:22 PM, Claudio Freire wrote: > Since it is true that doing so would make it impossible to keep the > asserts about tupindex in tuplesort_heap_root_displace, I guess it > depends on how useful those asserts are (ie: how likely it is that > those

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-09-09 Thread Claudio Freire
... On Fri, Sep 9, 2016 at 9:22 PM, Claudio Freire wrote: > Since it is true that doing so would make it impossible to keep the > asserts about tupindex in tuplesort_heap_root_displace, I guess it > depends on how useful those asserts are (ie: how likely it is that > those

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2016-09-09 Thread Peter Geoghegan
On Wed, Sep 7, 2016 at 2:36 PM, Heikki Linnakangas wrote: > 3. If we do that, we'll still have to reserve the tape buffers for all the > tapes that we use during merge. So after we've built the initial runs, we'll > need to reserve memory for those buffers. That might require

  1   2   >