Re: [HACKERS] Using quicksort for every external sort run

2015-12-12 Thread Peter Geoghegan
On Sat, Dec 12, 2015 at 12:10 AM, Jeff Janes wrote: > I have a question about the terminology used in this patch. What is a > tuple proper? What is it in contradistinction to? I would think that > a tuple which is located in its own palloc'ed space is the "proper" > one,

Re: [HACKERS] Using quicksort for every external sort run

2015-12-11 Thread Greg Stark
On Fri, Dec 11, 2015 at 10:41 PM, Greg Stark wrote: > > Interestingly it looks like we could raise the threshold to switching > to insertion sort. At least on my machine the insertion sort is faster > in real time as well as fewer comparisons up to 9 elements. It's > actually

Re: [HACKERS] Using quicksort for every external sort run

2015-12-11 Thread Peter Geoghegan
On Fri, Dec 11, 2015 at 2:41 PM, Greg Stark wrote: > However the number of comparisons is significantly higher. And in the > non-"abbreviated keys" case where the compare is going to be a > function pointer call the number of comparisons is probably more > important than the actual

Re: [HACKERS] Using quicksort for every external sort run

2015-12-11 Thread Greg Stark
On Wed, Dec 9, 2015 at 2:44 AM, Peter Geoghegan wrote: > On Tue, Dec 8, 2015 at 6:39 PM, Greg Stark wrote: > > I guess you mean insertion sort. What's the theoretical justification > for the change? Well my thinking was that hard coding a series of comparisons

Re: [HACKERS] Using quicksort for every external sort run

2015-12-11 Thread Peter Geoghegan
On Fri, Dec 11, 2015 at 2:52 PM, Greg Stark wrote: > Heh. And if I comment out the presorted check the breakeven point is > *exactly* where the threshold is today at 7 elements -- presumably > because Hoare chose it on purpose. I think it was Sedgewick, but yes. I'd be very

Re: [HACKERS] Using quicksort for every external sort run

2015-12-09 Thread Jeremy Harris
On 09/12/15 00:02, Jeff Janes wrote: > The second one consumes that giant tape run along with 232 small tape > runs. In terms of number of comparisons, binary merge works best when the inputs are of similar length. I'd assume the same goes for n-ary merge, but I don't know if comparison count is

Re: [HACKERS] Using quicksort for every external sort run

2015-12-08 Thread Jeff Janes
On Mon, Dec 7, 2015 at 9:01 AM, Jeff Janes wrote: > > The patched one ends with a 2-way, two sequential 233-way merges, and > a final 233-way merge: > > LOG: finished 2-way merge step: CPU 62.08s/435.70u sec elapsed 587.52 sec > LOG: finished 233-way merge step: CPU

Re: [HACKERS] Using quicksort for every external sort run

2015-12-08 Thread Greg Stark
On Wed, Dec 9, 2015 at 12:02 AM, Jeff Janes wrote: > > > Then in the next (final) merge, it is has to read in this huge > fragmented tape run emulation, generating a lot of random IO to read > it. This seems fairly plausible. Logtape.c is basically implementing a small

Re: [HACKERS] Using quicksort for every external sort run

2015-12-08 Thread Greg Stark
On 9 Dec 2015 02:44, "Peter Geoghegan" wrote: > > I guess you mean insertion sort. What's the theoretical justification > for the change? Er, right. Insertion sort. The sort networks I used here are optimal both in number of comparisons and depth. I suspect modern CPUs actually

Re: [HACKERS] Using quicksort for every external sort run

2015-12-08 Thread Peter Geoghegan
On Tue, Dec 8, 2015 at 6:39 PM, Greg Stark wrote: > Fwiw attached are two patches for perusal. One is a trivial patch to > add the size of the tape to trace_sort output. I guess I'll just apply > that without discussion. +1 > The other replaces the selection sort with an > open

Re: [HACKERS] Using quicksort for every external sort run

2015-12-08 Thread Peter Geoghegan
On Tue, Dec 8, 2015 at 6:44 PM, Peter Geoghegan wrote: > On Tue, Dec 8, 2015 at 6:39 PM, Greg Stark wrote: >> Fwiw attached are two patches for perusal. One is a trivial patch to >> add the size of the tape to trace_sort output. I guess I'll just apply >> that

Re: [HACKERS] Using quicksort for every external sort run

2015-12-08 Thread Peter Geoghegan
On Tue, Dec 8, 2015 at 7:09 PM, Peter Geoghegan wrote: > Why multiply by BLCKSZ here? I ask because LogicalTapeSetBlocks() returns blocks directly, not tapes, and I'd expect the same. Also, the callers seem to expect blocks, not bytes. -- Peter Geoghegan -- Sent via

Re: [HACKERS] Using quicksort for every external sort run

2015-12-07 Thread Jeff Janes
On Wed, Dec 2, 2015 at 10:03 AM, Robert Haas wrote: > > While large sorts are uncommon in queries, they are much more common > in index builds. Therefore, I think we ought to be worrying more > about regressions at 64MB than at 4MB, because we ship with >

Re: [HACKERS] Using quicksort for every external sort run

2015-12-07 Thread Peter Geoghegan
On Mon, Dec 7, 2015 at 9:01 AM, Jeff Janes wrote: > So if this patch with this exact workload just happens to land on a > pre-existing infelicity, how big of a deal is that? It wouldn't be > creating a regression, just shoving the region that experiences the > problem

Re: [HACKERS] Using quicksort for every external sort run

2015-12-07 Thread Greg Stark
​So incidentally I've been running some benchmarks myself. Mostly to understand the current scaling behaviour of sorting to better judge whether Peter's analysis of where the pain points are and why we should not worry about optimizing for the multiple merge pass case were on target. I haven't

Re: [HACKERS] Using quicksort for every external sort run

2015-12-06 Thread Peter Geoghegan
On Sun, Dec 6, 2015 at 3:59 PM, Jeff Janes wrote: > My column has the format of ABC-123-456-789-0 > > The name-space identifier ("ABC-") is the same in 99.99% of the > cases. And to date, as well as in my extrapolation, the first two > digits of the numeric part are

Re: [HACKERS] Using quicksort for every external sort run

2015-12-06 Thread Peter Geoghegan
On Tue, Nov 24, 2015 at 4:33 PM, Peter Geoghegan wrote: > So, the bottom line is: This patch seems very good, is unlikely to > have any notable downside (no case has been shown to be regressed), > but has yet to receive code review. I am working on a new version with > the first

Re: [HACKERS] Using quicksort for every external sort run

2015-12-06 Thread Jeff Janes
On Sun, Nov 29, 2015 at 2:01 AM, Peter Geoghegan wrote: > On Sat, Nov 28, 2015 at 4:05 PM, Peter Geoghegan wrote: >> So there was 27.76 seconds spent copying tuples into local memory >> ahead of the quicksort, 2 minutes 56.68 seconds spent actually >>

Re: [HACKERS] Using quicksort for every external sort run

2015-12-06 Thread Peter Geoghegan
On Tue, Nov 24, 2015 at 4:46 PM, Simon Riggs wrote: >> Parallel sort is very important. Robert, Amit and I had a call about >> this earlier today. We're all in agreement that this should be >> extended in that direction, and have a rough idea about how it ought >> to fit

Re: [HACKERS] Using quicksort for every external sort run

2015-12-02 Thread Robert Haas
On Sat, Nov 28, 2015 at 7:05 PM, Peter Geoghegan wrote: > On Sat, Nov 28, 2015 at 2:04 PM, Jeff Janes wrote: >> For me very large sorts (100,000,000 ints) with work_mem below 4MB do >> better with unpatched than with your patch series, by about 5%. Not a

Re: [HACKERS] Using quicksort for every external sort run

2015-12-02 Thread Peter Geoghegan
On Wed, Dec 2, 2015 at 10:03 AM, Robert Haas wrote: >> I'm not very concerned about a regression that is only seen when >> work_mem is set below the (very conservative) postgresql.conf default >> value of 4MB when sorting 100 million integers. > > Perhaps surprisingly, I

Re: [HACKERS] Using quicksort for every external sort run

2015-11-30 Thread Jeff Janes
On Sat, Nov 28, 2015 at 4:05 PM, Peter Geoghegan wrote: > On Sat, Nov 28, 2015 at 2:04 PM, Jeff Janes wrote: ... >> >> The final merging is intermixed with whatever other work goes on to >> build the actual index files out of the sorted data, so I don't

Re: [HACKERS] Using quicksort for every external sort run

2015-11-30 Thread Jeff Janes
On Sun, Nov 29, 2015 at 8:02 PM, David Fetter wrote: >> >> For me very large sorts (100,000,000 ints) with work_mem below 4MB do >> better with unpatched than with your patch series, by about 5%. Not a >> big deal, but also if it is easy to keep the old behavior then I think >>

Re: [HACKERS] Using quicksort for every external sort run

2015-11-30 Thread Peter Geoghegan
On Mon, Nov 30, 2015 at 9:51 AM, Jeff Janes wrote: >> As I said, it seems a little bit unfair to hand-tune work_mem or >> maintenance_work_mem like that. Who can afford to do that? I think you >> agree that it's untenable to have DBAs allocate work_mem differently >> for

Re: [HACKERS] Using quicksort for every external sort run

2015-11-30 Thread Peter Geoghegan
On Mon, Nov 30, 2015 at 5:12 PM, Greg Stark wrote: > I think the take-away is that this is outside the domain where any > interesting break points occur. I think that these are representative of what people want to do with external sorts. We have already had Jeff look for a

Re: [HACKERS] Using quicksort for every external sort run

2015-11-30 Thread Greg Stark
Hm. Here is a log-log chart of those results (sorry for html mail). I'm not really sure if log-log is the right tool to use for a O(nlog(n)) curve though. I think the take-away is that this is outside the domain where any interesting break points occur. Maybe run more tests on the low end to find

Re: [HACKERS] Using quicksort for every external sort run

2015-11-30 Thread Peter Geoghegan
On Mon, Nov 30, 2015 at 12:29 PM, Peter Geoghegan wrote: >> I'm kind of curious as to why the optimal for the patched code appears >> at 1GB and not lower. If I get a chance to rebuild the test, I will >> look into that more. > > I think that the availability of abbreviated keys

Re: [HACKERS] Using quicksort for every external sort run

2015-11-29 Thread Peter Geoghegan
On Sat, Nov 28, 2015 at 4:05 PM, Peter Geoghegan wrote: > So there was 27.76 seconds spent copying tuples into local memory > ahead of the quicksort, 2 minutes 56.68 seconds spent actually > quicksorting, and a trifling 10.32 seconds actually writing the run! I > bet that the

Re: [HACKERS] Using quicksort for every external sort run

2015-11-29 Thread Peter Geoghegan
On Sun, Nov 29, 2015 at 2:01 AM, Peter Geoghegan wrote: > I think that at least the first several > hundred leading attribute tuples are duplicates. I mean duplicate abbreviated keys. There are 40 distinct keys overall in the first 160 tuples, which is why abbreviation is

Re: [HACKERS] Using quicksort for every external sort run

2015-11-29 Thread David Fetter
On Sat, Nov 28, 2015 at 02:04:16PM -0800, Jeff Janes wrote: > On Wed, Nov 18, 2015 at 3:29 PM, Peter Geoghegan wrote: > > On Wed, Nov 18, 2015 at 10:31 AM, Jeff Janes wrote: > > > >> I agree we don't want to optimize for low memory, but I don't think we >

Re: [HACKERS] Using quicksort for every external sort run

2015-11-28 Thread Jeff Janes
On Thu, Nov 19, 2015 at 12:35 PM, Greg Stark wrote: > On Thu, Nov 19, 2015 at 6:56 PM, Peter Geoghegan wrote: >> Yes, I really do mean it when I say that the DBA is not supposed to >> see this message, no matter how much or how little memory or data is >>

Re: [HACKERS] Using quicksort for every external sort run

2015-11-28 Thread Jeff Janes
On Wed, Nov 18, 2015 at 3:29 PM, Peter Geoghegan wrote: > On Wed, Nov 18, 2015 at 10:31 AM, Jeff Janes wrote: > >> I agree we don't want to optimize for low memory, but I don't think we >> should throw it under the bus, either. Right now we are effectively

Re: [HACKERS] Using quicksort for every external sort run

2015-11-28 Thread Peter Geoghegan
On Sat, Nov 28, 2015 at 2:04 PM, Jeff Janes wrote: > For me very large sorts (100,000,000 ints) with work_mem below 4MB do > better with unpatched than with your patch series, by about 5%. Not a > big deal, but also if it is easy to keep the old behavior then I think > we

Re: [HACKERS] Using quicksort for every external sort run

2015-11-25 Thread Peter Geoghegan
On Wed, Nov 25, 2015 at 4:10 AM, Greg Stark wrote: > That's precisely why it's valuable to see a whole series of data > points rather than just one. Often when you see the shape of the > curve, especially any breaks or changes in the behaviour that helps > understand the

Re: [HACKERS] Using quicksort for every external sort run

2015-11-25 Thread Greg Stark
On Wed, Nov 25, 2015 at 2:31 AM, Peter Geoghegan wrote: > > There already was a test case involving a 1TB/16 billion tuple sort > [1] (well, a 1TB gensort Postgres table [2]). Granted, I don't have a > large number of similar test cases across a variety of scales, but > there are

Re: [HACKERS] Using quicksort for every external sort run

2015-11-24 Thread Peter Geoghegan
On Wed, Nov 18, 2015 at 3:29 PM, Peter Geoghegan wrote: >> Overall this is very nice. Doing some real world index builds of >> short text (~20 bytes ascii) identifiers, I could easily get speed ups >> of 40% with your patch if I followed the philosophy of "give it as >> much

Re: [HACKERS] Using quicksort for every external sort run

2015-11-24 Thread Simon Riggs
On 20 November 2015 at 22:58, Peter Geoghegan wrote: > The numbers speak for themselves here. I just want to be clear about > the disadvantages of what I propose, even if it's well worth it > overall in most (all?) cases. > My feeling is that numbers rarely speak for

Re: [HACKERS] Using quicksort for every external sort run

2015-11-24 Thread Peter Geoghegan
On Tue, Nov 24, 2015 at 3:32 PM, Simon Riggs wrote: > My feeling is that numbers rarely speak for themselves, without LSD. (Which > numbers?) Guffaw. > How are we doing here? Keen to see this work get committed, so we can move > onto parallel sort. What's the summary? I

Re: [HACKERS] Using quicksort for every external sort run

2015-11-24 Thread Simon Riggs
On 25 November 2015 at 00:33, Peter Geoghegan wrote: > Parallel sort is very important. Robert, Amit and I had a call about > this earlier today. We're all in agreement that this should be > extended in that direction, and have a rough idea about how it ought > to fit together

Re: [HACKERS] Using quicksort for every external sort run

2015-11-24 Thread Peter Geoghegan
On Tue, Nov 24, 2015 at 4:46 PM, Simon Riggs wrote: >> I had a debug GUC (like the existing one to disable top-N heapsorts) >> that disabled "quicksort with spillover". That's almost the opposite >> of what you're asking for, though, because that makes us never use a >>

Re: [HACKERS] Using quicksort for every external sort run

2015-11-24 Thread Peter Geoghegan
On Tue, Nov 24, 2015 at 6:31 PM, Peter Geoghegan wrote: > (Note that the time taken to copy tuples comprising the final run is > not displayed or accounted for) I mean, comprising the second last run, the run shown, run 40. -- Peter Geoghegan -- Sent via pgsql-hackers

Re: [HACKERS] Using quicksort for every external sort run

2015-11-24 Thread Peter Geoghegan
On Tue, Nov 24, 2015 at 5:42 PM, Greg Stark wrote: > Actually I kind of agree. What I would like to see is a series of > numbers for increasing sizes of sorts plotted against the same series > for the existing algorithm. Specifically with the sort size varying to > significantly

Re: [HACKERS] Using quicksort for every external sort run

2015-11-24 Thread Greg Stark
On Wed, Nov 25, 2015 at 12:33 AM, Peter Geoghegan wrote: > On Tue, Nov 24, 2015 at 3:32 PM, Simon Riggs wrote: >> My feeling is that numbers rarely speak for themselves, without LSD. (Which >> numbers?) > > Guffaw. Actually I kind of agree. What I would

Re: [HACKERS] Using quicksort for every external sort run

2015-11-22 Thread Peter Geoghegan
On Fri, Nov 20, 2015 at 2:58 PM, Peter Geoghegan wrote: > The numbers speak for themselves here. I just want to be clear about > the disadvantages of what I propose, even if it's well worth it > overall in most (all?) cases. There is a paper called "Critical Evaluation of

Re: [HACKERS] Using quicksort for every external sort run

2015-11-20 Thread Robert Haas
On Thu, Nov 19, 2015 at 5:42 PM, Peter Geoghegan wrote: > I would like more opinions on the multipass_warning message. I can > write a patch that creates a new system view, detailing how sort were > completed, if there is demand. I think a warning message is a terrible idea, and

Re: [HACKERS] Using quicksort for every external sort run

2015-11-20 Thread Robert Haas
On Thu, Nov 19, 2015 at 5:53 PM, Peter Geoghegan wrote: > I'll now talk about my patch series in general -- the actual > consequences of not avoiding a single pass merge phase when the master > branch would have done so. That's what I was asking about. It seemed to me that you

Re: [HACKERS] Using quicksort for every external sort run

2015-11-20 Thread Peter Geoghegan
On Fri, Nov 20, 2015 at 12:50 PM, Robert Haas wrote: > On Thu, Nov 19, 2015 at 5:42 PM, Peter Geoghegan wrote: >> I would like more opinions on the multipass_warning message. I can >> write a patch that creates a new system view, detailing how sort were >>

Re: [HACKERS] Using quicksort for every external sort run

2015-11-20 Thread Peter Geoghegan
On Fri, Nov 20, 2015 at 12:52 PM, Robert Haas wrote: > That's what I was asking about. It seemed to me that you were saying > we could ignore those cases, which doesn't seem to me to be true. I've been around for long enough to know that there are very few cases that can

Re: [HACKERS] Using quicksort for every external sort run

2015-11-19 Thread Peter Geoghegan
On Wed, Nov 18, 2015 at 5:22 PM, Greg Stark wrote: > On Wed, Nov 18, 2015 at 11:29 PM, Peter Geoghegan wrote: >> Other systems expose this explicitly, and, as I said, say in an >> unqualified way that a multi-pass merge should be avoided. Maybe the >> warning

Re: [HACKERS] Using quicksort for every external sort run

2015-11-19 Thread Peter Geoghegan
On Thu, Nov 19, 2015 at 5:32 PM, Greg Stark wrote: > Hm. Have you tested a nearly-sorted input set around 1.5x the size of > work_mem? That should produce a single run using the heap to generate > runs but generate two runs if, AIUI, you're just filling work_mem, > running

Re: [HACKERS] Using quicksort for every external sort run

2015-11-19 Thread Peter Geoghegan
On Thu, Nov 19, 2015 at 2:53 PM, Peter Geoghegan wrote: > The latter 16MB work_mem example query/table is still faster with a > 12MB work_mem than master, even with multiple passes. Quite a bit > faster, in fact: about 37 seconds on master, to about 24.7 seconds > with the

Re: [HACKERS] Using quicksort for every external sort run

2015-11-19 Thread Greg Stark
On Fri, Nov 20, 2015 at 12:54 AM, Peter Geoghegan wrote: > * One run can be created with replacement selection, where a > hyrbid-sort merge strategy needs to create and then merge many runs. > When I started work on this patch, I was pretty sure that case would > be noticeably

Re: [HACKERS] Using quicksort for every external sort run

2015-11-19 Thread Peter Geoghegan
On Wed, Nov 18, 2015 at 6:19 PM, Robert Haas wrote: > On Wed, Nov 18, 2015 at 6:29 PM, Peter Geoghegan wrote: >> In principle, I have no problem with doing that. Through testing, I >> cannot see any actual upside, though. Perhaps I just missed something.

Re: [HACKERS] Using quicksort for every external sort run

2015-11-19 Thread Greg Stark
On Thu, Nov 19, 2015 at 8:35 PM, Greg Stark wrote: > Hm. So a bit of back-of-envelope calculation. If we have want to > buffer at least 1MB for each run -- I think we currently do more > actually -- and say that a 1GB work_mem ought to be enough to run > reasonably (that's per sort

Re: [HACKERS] Using quicksort for every external sort run

2015-11-19 Thread Simon Riggs
On 19 November 2015 at 01:22, Greg Stark wrote: > Perhaps the right thing to do is report a statistic to pg_stats so > DBAs can see how often sorts are in memory, how often they're on disk, > and how often the on disk sort requires n passes. That would put them > in the same

Re: [HACKERS] Using quicksort for every external sort run

2015-11-19 Thread Greg Stark
On Thu, Nov 19, 2015 at 6:56 PM, Peter Geoghegan wrote: > Yes, I really do mean it when I say that the DBA is not supposed to > see this message, no matter how much or how little memory or data is > involved. There is no nuance intended here; it isn't sensible to allow > a

Re: [HACKERS] Using quicksort for every external sort run

2015-11-19 Thread Peter Geoghegan
On Thu, Nov 19, 2015 at 12:35 PM, Greg Stark wrote: > So I think you're kind of right and kind of wrong. The vast majority > of use cases are either sub 1TB or are in work environments designed > specifically for data warehouse queries where a user can obtain much > more memory for

Re: [HACKERS] Using quicksort for every external sort run

2015-11-19 Thread Peter Geoghegan
On Thu, Nov 19, 2015 at 2:35 PM, Robert Haas wrote: > OK, so reversing this analysis, with the default work_mem of 4MB, we'd > need a multi-pass merge for more than 235MB/4 = 58MB of data. That is > very, very far from being a can't-happen scenario, and I would not at >

Re: [HACKERS] Using quicksort for every external sort run

2015-11-19 Thread Robert Haas
On Thu, Nov 19, 2015 at 3:43 PM, Peter Geoghegan wrote: >> I'd be interested in seeing this analysis in some detail. > > Sure. Jeff mentioned 8MB as a work_mem setting, so let's examine a > case where that's the work_mem setting, and see experimentally where > the crossover point

Re: [HACKERS] Using quicksort for every external sort run

2015-11-18 Thread Jeff Janes
On Wed, Aug 19, 2015 at 7:24 PM, Peter Geoghegan wrote: Hi Peter, Your most recent versions of this patch series (not the ones on the email I am replying to) give a compiler warning: tuplesort.c: In function 'mergeruns': tuplesort.c:2741: warning: unused variable 'memNowUsed'

Re: [HACKERS] Using quicksort for every external sort run

2015-11-18 Thread Greg Stark
On Wed, Nov 18, 2015 at 11:29 PM, Peter Geoghegan wrote: > Other systems expose this explicitly, and, as I said, say in an > unqualified way that a multi-pass merge should be avoided. Maybe the > warning isn't the right way of communicating that message to the DBA > in detail,

Re: [HACKERS] Using quicksort for every external sort run

2015-11-18 Thread Robert Haas
On Wed, Nov 18, 2015 at 6:29 PM, Peter Geoghegan wrote: > In principle, I have no problem with doing that. Through testing, I > cannot see any actual upside, though. Perhaps I just missed something. > Even 8MB is enough to avoid the multipass merge in the event of a >

Re: [HACKERS] Using quicksort for every external sort run

2015-11-18 Thread Peter Geoghegan
Hi Jeff, On Wed, Nov 18, 2015 at 10:31 AM, Jeff Janes wrote: > tuplesort.c: In function 'mergeruns': > tuplesort.c:2741: warning: unused variable 'memNowUsed' That was caused by a last-minute change to the mulitpass warning message. I forgot to build at -O2, and missed

Re: [HACKERS] Using quicksort for every external sort run

2015-11-09 Thread Corey Huinker
On Fri, Nov 6, 2015 at 8:08 PM, Peter Geoghegan wrote: > On Wed, Aug 19, 2015 at 7:24 PM, Peter Geoghegan wrote: > > I'll start a new thread for this, since my external sorting patch has > > now evolved well past the original "quicksort with spillover" > >

Re: [HACKERS] Using quicksort for every external sort run

2015-11-06 Thread Peter Geoghegan
On Wed, Aug 19, 2015 at 7:24 PM, Peter Geoghegan wrote: > I'll start a new thread for this, since my external sorting patch has > now evolved well past the original "quicksort with spillover" > idea...although not quite how I anticipated it would. It seems like > I've reached a

Re: [HACKERS] Using quicksort for every external sort run

2015-09-06 Thread Marc Mamin
>I will sketch a simple implementation of parallel sorting based on the >patch series that may be workable, and requires relatively little >implementation effort compare to other ideas that were raised at >various times: Hello, I've only a very superficial understanding on your work, please

Re: [HACKERS] Using quicksort for every external sort run

2015-09-06 Thread Peter Geoghegan
On Sun, Sep 6, 2015 at 1:51 AM, Marc Mamin wrote: > Have you considered performances for cases where multiple CREATE INDEX are > running in parallel? > One of our typical use case are large daily tables (50-300 Mio rows) with up > to 6 index creations > that start

Re: [HACKERS] Using quicksort for every external sort run

2015-08-21 Thread Peter Geoghegan
On Thu, Aug 20, 2015 at 11:56 PM, Simon Riggs si...@2ndquadrant.com wrote: This will give more runs, so merging those needs some thought. It will also give a more predictable number of runs, so we'll be able to predict any merging issues ahead of time. We can more easily find out the min/max

Re: [HACKERS] Using quicksort for every external sort run

2015-08-21 Thread Simon Riggs
On 20 August 2015 at 18:41, Peter Geoghegan p...@heroku.com wrote: On Thu, Aug 20, 2015 at 8:15 AM, Simon Riggs si...@2ndquadrant.com wrote: On 20 August 2015 at 03:24, Peter Geoghegan p...@heroku.com wrote: The patch is ~3.25x faster than master I've tried to read this post twice

Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Peter Geoghegan
On Thu, Aug 20, 2015 at 6:54 AM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark st...@mit.edu writes: Alternately, has anyone tested whether Timsort would work well? I think that was proposed a few years ago and did not look so good in simple testing. I tested it in 2012. I got as far as

Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Peter Geoghegan
On Thu, Aug 20, 2015 at 5:02 PM, Greg Stark st...@mit.edu wrote: I haven't thought through the exponential growth carefully enough to tell if doubling the run size should decrease the number of passes linearly or by a constant number. It seems that with 5 times the data that previously

Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Greg Stark
On Thu, Aug 20, 2015 at 11:16 PM, Peter Geoghegan p...@heroku.com wrote: It could reduce seek time, which might be the dominant cost (but not I/O as such). No I didn't quite follow the argument to completion. Increasing the run size is a win if it reduces the number of passes. In the

Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Peter Geoghegan
On Thu, Aug 20, 2015 at 8:15 AM, Simon Riggs si...@2ndquadrant.com wrote: On 20 August 2015 at 03:24, Peter Geoghegan p...@heroku.com wrote: The patch is ~3.25x faster than master I've tried to read this post twice and both times my work_mem overflowed. ;-) Can you summarize what this

Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Tom Lane
Greg Stark st...@mit.edu writes: Alternately, has anyone tested whether Timsort would work well? I think that was proposed a few years ago and did not look so good in simple testing. regards, tom lane -- Sent via pgsql-hackers mailing list

Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Greg Stark
On Thu, Aug 20, 2015 at 3:24 AM, Peter Geoghegan p...@heroku.com wrote: I believe, in general, that we should consider a multi-pass sort to be a kind of inherently suspect thing these days, in the same way that checkpoints occurring 5 seconds apart are: not actually abnormal, but something

Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Peter Geoghegan
On Thu, Aug 20, 2015 at 12:42 PM, Feng Tian ft...@vitessedata.com wrote: Just a quick anecdotal evidence. I did similar experiment about three years ago. The conclusion was that if you have SSD, just do quick sort and forget the longer runs, but if you are using hard drives, longer runs is

Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Simon Riggs
On 20 August 2015 at 03:24, Peter Geoghegan p...@heroku.com wrote: The patch is ~3.25x faster than master I've tried to read this post twice and both times my work_mem overflowed. ;-) Can you summarize what this patch does? I understand clearly what it doesn't do... -- Simon Riggs

Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Peter Geoghegan
On Thu, Aug 20, 2015 at 1:28 PM, Feng Tian ft...@vitessedata.com wrote: Agree everything in principal,except one thing -- no, random IO on HDD in 2010s (relative to CPU/Memory/SSD), is not any faster than tape in 1970s. :-) Sure. The advantage of replacement selection could be a deciding

Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Feng Tian
On Thu, Aug 20, 2015 at 1:16 PM, Peter Geoghegan p...@heroku.com wrote: On Thu, Aug 20, 2015 at 12:42 PM, Feng Tian ft...@vitessedata.com wrote: Just a quick anecdotal evidence. I did similar experiment about three years ago. The conclusion was that if you have SSD, just do quick sort

Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Feng Tian
On Thu, Aug 20, 2015 at 10:41 AM, Peter Geoghegan p...@heroku.com wrote: On Thu, Aug 20, 2015 at 8:15 AM, Simon Riggs si...@2ndquadrant.com wrote: On 20 August 2015 at 03:24, Peter Geoghegan p...@heroku.com wrote: The patch is ~3.25x faster than master I've tried to read this post

Re: [HACKERS] Using quicksort for every external sort run

2015-08-20 Thread Peter Geoghegan
On Thu, Aug 20, 2015 at 6:05 AM, Greg Stark st...@mit.edu wrote: On Thu, Aug 20, 2015 at 3:24 AM, Peter Geoghegan p...@heroku.com wrote: I believe, in general, that we should consider a multi-pass sort to be a kind of inherently suspect thing these days, in the same way that checkpoints

[HACKERS] Using quicksort for every external sort run

2015-08-19 Thread Peter Geoghegan
I'll start a new thread for this, since my external sorting patch has now evolved well past the original quicksort with spillover idea...although not quite how I anticipated it would. It seems like I've reached a good point to get some feedback. I attach a patch series featuring a new, more

<    1   2