Re: [HACKERS] Using quicksort for every external sort run

2016-04-07 Thread Peter Geoghegan
On Thu, Apr 7, 2016 at 11:10 AM, Robert Haas wrote: > I prefer units of tuples, with the GUC itself therefore being > unitless. I suggest we call the parameter replacement_sort_threshold > and document that (1) the ideal value may depend on the amount of CPU > cache

Re: [HACKERS] Using quicksort for every external sort run

2016-04-07 Thread Peter Geoghegan
On Thu, Apr 7, 2016 at 11:10 AM, Robert Haas wrote: > I prefer units of tuples, with the GUC itself therefore being > unitless. I suggest we call the parameter replacement_sort_threshold > and document that (1) the ideal value may depend on the amount of CPU > cache

Re: [HACKERS] Using quicksort for every external sort run

2016-04-07 Thread Peter Geoghegan
On Thu, Apr 7, 2016 at 11:05 AM, Robert Haas wrote: > I spent some time today reading through the new 0001 and in general I > think it looks pretty good. Cool. > 1. Changing cost_sort to consider disk access as 90% sequential, 10% > random rather than 75% sequential, 25%

Re: [HACKERS] Using quicksort for every external sort run

2016-04-07 Thread Robert Haas
On Thu, Apr 7, 2016 at 1:17 PM, Peter Geoghegan wrote: >> I certainly agree that GUCs that aren't easy to tune are bad. I'm >> wondering whether the fact that this one is hard to tune is something >> that can be fixed. The comments about "padding" - a term I don't >> like,

Re: [HACKERS] Using quicksort for every external sort run

2016-04-07 Thread Robert Haas
On Mon, Mar 21, 2016 at 2:01 AM, Peter Geoghegan wrote: > On Thu, Mar 17, 2016 at 1:13 PM, Robert Haas wrote: >> OK, I have now committed 0001 > > I attach a revision of the external quicksort patch and supplementary > small patches, rebased on top of the

Re: [HACKERS] Using quicksort for every external sort run

2016-04-07 Thread Peter Geoghegan
On Thu, Apr 7, 2016 at 6:55 AM, Robert Haas wrote: >> In reality there will be multiple processes running at the same time (e.g >> backends when running parallel query), significantly reducing the amount of >> cache per process, making the replacement sort inefficient and

Re: [HACKERS] Using quicksort for every external sort run

2016-04-07 Thread Robert Haas
Sorry for not responding to this thread again sooner. I was on vacation Thursday-Sunday, and have been playing catch-up since then. On Sun, Apr 3, 2016 at 8:24 AM, Tomas Vondra wrote: > Secondly, master is faster only if there's enough on-CPU cache for the >

Re: [HACKERS] Using quicksort for every external sort run

2016-04-03 Thread Peter Geoghegan
On Sun, Apr 3, 2016 at 4:08 PM, Greg Stark wrote: >> SELECT * FROM (SELECT * FROM (SELECT a FROM int_test UNION SELECT a >> FROM int_test_padding) gg OFFSET 1e10) ff; > > ISTM OFFSET binds more loosely than UNION so these should be equivalent. Not exactly: postgres=# explain

Re: [HACKERS] Using quicksort for every external sort run

2016-04-03 Thread Greg Stark
On Sun, Apr 3, 2016 at 12:50 AM, Peter Geoghegan wrote: > 1459308434.753 2016-03-30 05:27:14 CEST STATEMENT: SELECT * FROM > (SELECT a FROM int_test UNION SELECT a FROM int_test_padding OFFSET > 1e10) ff; > > I think that this is invalid, because the query was intended as this:

Re: [HACKERS] Using quicksort for every external sort run

2016-04-03 Thread Peter Geoghegan
I just mean that, as you say, trace_sort is described in the documentation. I don't think we'll end up with any kind of cost model here, so where that would need to happen is only an academic matter. The create index parameter would only be an option for the DBA. That's about the only case I can

Re: [HACKERS] Using quicksort for every external sort run

2016-04-03 Thread Tomas Vondra
On 04/03/2016 09:41 PM, Peter Geoghegan wrote: Hi Tomas, ... 3) replacement_sort_mem GUC I'm not quite sure what's the plan with this GUC. It was useful for development, but it seems to me it's pretty difficult to tune it in practice (especially if you don't know the internals, which users

Re: [HACKERS] Using quicksort for every external sort run

2016-04-03 Thread Peter Geoghegan
Hi Tomas, Overall, I agree with your summary. On Sun, Apr 3, 2016 at 5:24 AM, Tomas Vondra wrote: > So, let me sum this up, the way I understand the current status. > > > 1) overall, the patch seems to be a clear performance improvement I think that's clear. There

Re: [HACKERS] Using quicksort for every external sort run

2016-04-03 Thread Tomas Vondra
Hi, So, let me sum this up, the way I understand the current status. 1) overall, the patch seems to be a clear performance improvement There's far more "green" cells than "red" ones in the spreadsheets, and the patch often shaves off 30-75% of the sort duration. Improvements are pretty much

Re: [HACKERS] Using quicksort for every external sort run

2016-04-02 Thread Peter Geoghegan
On Sat, Apr 2, 2016 at 3:22 PM, Peter Geoghegan wrote: > On Sat, Apr 2, 2016 at 3:20 PM, Greg Stark wrote: >> There are also some weird cases in this list where there's a >> significant regression at 32MB but not at 8MB. I would like to see >> 16MB and perhaps

Re: [HACKERS] Using quicksort for every external sort run

2016-04-02 Thread Peter Geoghegan
On Sat, Apr 2, 2016 at 7:31 AM, Tomas Vondra wrote: > So, I do have the results from both machines - I've attached the basic > comparison spreadsheets, the complete summary is available here: > >https://github.com/tvondra/sort-benchmark > > The database log also

Re: [HACKERS] Using quicksort for every external sort run

2016-04-02 Thread Peter Geoghegan
On Sat, Apr 2, 2016 at 3:20 PM, Greg Stark wrote: > These are the averages across all queries across all data sets for the > run-time for the patch versus master (not patched 64 which I think is > the replacement_sort_mem=64MB which appears to not be a win). So even > in the less

Re: [HACKERS] Using quicksort for every external sort run

2016-04-02 Thread Peter Geoghegan
On Sat, Apr 2, 2016 at 3:20 PM, Greg Stark wrote: > There are also some weird cases in this list where there's a > significant regression at 32MB but not at 8MB. I would like to see > 16MB and perhaps 12MB and 24MB. They would help understand if these > are just quirks or there's a

Re: [HACKERS] Using quicksort for every external sort run

2016-04-02 Thread Greg Stark
On Sat, Apr 2, 2016 at 3:31 PM, Tomas Vondra wrote: > So let me be clear: I do think the patch seems to be a significant > performance improvement for most of the queries, and I'm OK with accepting a > few regressions (particularly if we agree those are pathological

Re: [HACKERS] Using quicksort for every external sort run

2016-04-01 Thread Peter Geoghegan
On Thu, Feb 4, 2016 at 3:14 AM, Peter Geoghegan wrote: > Nyberg et al may have said it best in 1994, in the Alphasort Paper [1]: This paper is available from http://www.vldb.org/journal/VLDBJ4/P603.pdf (previously link is now dead) > The paper also has very good analysis of the

Re: [HACKERS] Using quicksort for every external sort run

2016-03-30 Thread Peter Geoghegan
On Wed, Mar 30, 2016 at 4:22 AM, Greg Stark wrote: > I'm sorry I was intending to run those benchmarks again this past week > but haven't gotten around to it. But my plan was to run them on a good > server I borrowed, an i7 with 8MB cache. I can still go ahead with > that but I can

Re: [HACKERS] Using quicksort for every external sort run

2016-03-30 Thread Greg Stark
On Wed, Mar 30, 2016 at 7:23 AM, Peter Geoghegan wrote: > Anyway, what I liked about Greg's approach to finding regressions at > the low end was that when testing, he used the cheapest possible VM > available on Google's cloud platform. When testing the low end, he had > low end

Re: [HACKERS] Using quicksort for every external sort run

2016-03-30 Thread Peter Geoghegan
On Tue, Mar 29, 2016 at 6:02 PM, Tomas Vondra wrote: > That may be easily due to differences between the CPUs and configuration. > For example the Xeon uses a way older CPU with different amounts of CPU > cache, and it's also a multi-socket system. And so on. So,

Re: [HACKERS] Using quicksort for every external sort run

2016-03-29 Thread Peter Geoghegan
On Tue, Mar 29, 2016 at 6:02 PM, Tomas Vondra wrote: > And why not? I mean, why should it be acceptable to slow down? My point was that over 80% of execution time was spent in the HashAggregate, which outputs tuples to the sort. That, and the huge i5/Xeon

Re: [HACKERS] Using quicksort for every external sort run

2016-03-29 Thread Tomas Vondra
Hi, On 03/29/2016 09:43 PM, Peter Geoghegan wrote: On Tue, Mar 29, 2016 at 9:11 AM, Robert Haas wrote: One test that kind of bothers me in particular is the "SELECT DISTINCT a FROM numeric_test ORDER BY a" test on the high_cardinality_random data set. That's a wash at

Re: [HACKERS] Using quicksort for every external sort run

2016-03-29 Thread Peter Geoghegan
On Tue, Mar 29, 2016 at 12:43 PM, Peter Geoghegan wrote: > A complete do-over from Tomas would be best, here. He has already > acknowledged that the i5 CREATE INDEX results were completely invalid. The following analysis is all based on Xeon numbers, which as I've said we should

Re: [HACKERS] Using quicksort for every external sort run

2016-03-29 Thread Peter Geoghegan
On Tue, Mar 29, 2016 at 9:11 AM, Robert Haas wrote: > One test that kind of bothers me in particular is the "SELECT DISTINCT > a FROM numeric_test ORDER BY a" test on the high_cardinality_random > data set. That's a wash at most work_mem values, but at 32MB it's > more

Re: [HACKERS] Using quicksort for every external sort run

2016-03-29 Thread Robert Haas
On Mon, Mar 28, 2016 at 11:18 PM, Peter Geoghegan wrote: > Note that amcheck V2, which I posted just now features tests for > external sorting. The way these work requires discussion. The tests > are motivated in part by the recent strxfrm() debacle, as well as by > the need to

Re: [HACKERS] Using quicksort for every external sort run

2016-03-28 Thread Peter Geoghegan
On Thu, Mar 10, 2016 at 6:54 PM, Peter Geoghegan wrote: > I've used amcheck [2] to test this latest revision -- the tool ought > to not see any problems with any index created with the patch applied. > Reviewers might find it helpful to use amcheck, too. As 9.6 is > stabilized, I

Re: [HACKERS] Using quicksort for every external sort run

2016-03-24 Thread Peter Geoghegan
On Sun, Mar 20, 2016 at 11:01 PM, Peter Geoghegan wrote: > Allowing 0 tuple runs in rare cases seems like the simplest solution. > After all, mergeprereadone() is expressly prepared for 0 tuple runs. > It says "ensure that we have at least one tuple, if any are to be > had".

Re: [HACKERS] Using quicksort for every external sort run

2016-03-23 Thread Peter Geoghegan
On Wed, Mar 23, 2016 at 8:05 PM, Tomas Vondra wrote: > FWIW, maintenance_work_mem was set to 1GB on the i5 machine and 256MB on the > Xeon. Hmm, maybe that's why we see no difference for CREATE INDEX on the i5, > and an improvement on the Xeon. That would explain

Re: [HACKERS] Using quicksort for every external sort run

2016-03-23 Thread Tomas Vondra
Hi, On 03/24/2016 03:00 AM, Peter Geoghegan wrote: On Tue, Mar 22, 2016 at 3:35 PM, Tomas Vondra wrote: I've tested the patch you've sent on 2016/3/11, which I believe is the last version. I haven't tuned the replacement_sort_mem at all. because my understanding

Re: [HACKERS] Using quicksort for every external sort run

2016-03-23 Thread Peter Geoghegan
On Tue, Mar 22, 2016 at 2:27 PM, Tomas Vondra wrote: > For example these two queries got almost 2x as slow for some data sets: > > SELECT a FROM numeric_test UNION SELECT a FROM numeric_test_padding > SELECT a FROM text_test UNION SELECT a FROM text_test_padding > >

Re: [HACKERS] Using quicksort for every external sort run

2016-03-23 Thread Peter Geoghegan
On Tue, Mar 22, 2016 at 3:35 PM, Tomas Vondra wrote: > I've tested the patch you've sent on 2016/3/11, which I believe is the last > version. I haven't tuned the replacement_sort_mem at all. because my > understanding was that it's not a 9.6 material (per your

Re: [HACKERS] Using quicksort for every external sort run

2016-03-22 Thread Tomas Vondra
Hi, On 03/22/2016 11:07 PM, Peter Geoghegan wrote: On Tue, Mar 22, 2016 at 2:27 PM, Tomas Vondra wrote: Each query was executed 5x for each work_mem value (between 8MB and 1GB), and then a median of the runs was computed and that's what's on the "comparison".

Re: [HACKERS] Using quicksort for every external sort run

2016-03-22 Thread Peter Geoghegan
On Tue, Mar 22, 2016 at 2:27 PM, Tomas Vondra wrote: > Each query was executed 5x for each work_mem value (between 8MB and 1GB), > and then a median of the runs was computed and that's what's on the > "comparison". This compares a414d96ad2b without (master) and with

Re: [HACKERS] Using quicksort for every external sort run

2016-03-21 Thread Peter Geoghegan
On Thu, Mar 17, 2016 at 1:13 PM, Robert Haas wrote: > OK, I have now committed 0001 I attach a revision of the external quicksort patch and supplementary small patches, rebased on top of the master branch. Changes: 1. As I mentioned on the "Improve memory management for

Re: [HACKERS] Using quicksort for every external sort run

2016-03-19 Thread Robert Haas
On Thu, Mar 10, 2016 at 9:54 PM, Peter Geoghegan wrote: > 1. Creates a separate memory context for tuplesort's copies of > caller's tuples, which can be reset at key points, avoiding > fragmentation. Every SortTuple.tuple is allocated there (with trivial > exception); *everything

Re: [HACKERS] Using quicksort for every external sort run

2016-03-19 Thread Robert Haas
On Wed, Mar 16, 2016 at 9:42 PM, Peter Geoghegan wrote: > On Wed, Mar 16, 2016 at 3:31 PM, Robert Haas wrote: >> I spent some time looking at this part of the patch yesterday and >> today. > > Thanks for getting back to it. OK, I have now committed 0001,

Re: [HACKERS] Using quicksort for every external sort run

2016-03-19 Thread Peter Geoghegan
On Wed, Mar 16, 2016 at 3:31 PM, Robert Haas wrote: > I spent some time looking at this part of the patch yesterday and > today. Thanks for getting back to it. > - I think that batchmemtuples() is somewhat weird. Normally, > grow_memtuples() doubles the size of the array

Re: [HACKERS] Using quicksort for every external sort run

2016-03-19 Thread Peter Geoghegan
On Wed, Mar 16, 2016 at 6:42 PM, Peter Geoghegan wrote: >> - I think that batchmemtuples() is somewhat weird. Normally, >> grow_memtuples() doubles the size of the array each time it's called. >> So if you somehow called this function when you still had lots of >> memory

Re: [HACKERS] Using quicksort for every external sort run

2016-03-18 Thread Robert Haas
On Wed, Mar 16, 2016 at 9:42 PM, Peter Geoghegan wrote: >> - I haven't yet figured out why we use batching only for the final >> on-the-fly merge pass, instead of doing it for all merges. I expect >> you have a reason. I just don't know what it is. > > The most obvious reason,

Re: [HACKERS] Using quicksort for every external sort run

2016-03-18 Thread Peter Geoghegan
On Thu, Mar 17, 2016 at 1:13 PM, Robert Haas wrote: > OK, I have now committed 0001, and separately, some comment > improvements - or at least, I think they are improvements - based on > this discussion. Thanks! Your changes look good to me. It's always interesting to

Re: [HACKERS] Using quicksort for every external sort run

2016-03-10 Thread Peter Geoghegan
On Sun, Feb 14, 2016 at 8:01 PM, Peter Geoghegan wrote: > The query I'm testing is: "reindex index pgbench_accounts_pkey;" > > Now, with a maintenance_work_mem of 5MB, the most recent revision of > my patch takes about 54.2 seconds to complete this, as compared to > master's 44.4

Re: [HACKERS] Using quicksort for every external sort run

2016-03-10 Thread Peter Geoghegan
On Thu, Mar 10, 2016 at 10:39 AM, Greg Stark wrote: > I want to rerun these on a dedicated machine and with trace_sort > enabled so that we can see how many merge passes were actually > happening and how much I/O was actually happening. Putting the results in context, by keeping

Re: [HACKERS] Using quicksort for every external sort run

2016-03-10 Thread Greg Stark
On Thu, Mar 10, 2016 at 1:40 PM, Tomas Vondra wrote: > I was thinking about running some benchmarks on this patch, but the > thread is pretty huge so I want to make sure I'm not missing something > and this is indeed the most recent version. I also ran some

Re: [HACKERS] Using quicksort for every external sort run

2016-03-10 Thread Peter Geoghegan
On Thu, Mar 10, 2016 at 5:40 AM, Tomas Vondra wrote: > I was thinking about running some benchmarks on this patch, but the > thread is pretty huge so I want to make sure I'm not missing something > and this is indeed the most recent version. Wait 24 - 48 hours,

Re: [HACKERS] Using quicksort for every external sort run

2016-03-10 Thread Tomas Vondra
Hi, On Mon, 2015-12-28 at 15:03 -0800, Peter Geoghegan wrote: > On Fri, Dec 18, 2015 at 11:57 AM, Peter Geoghegan wrote: > > BTW, I'm not necessarily determined to make the new special-purpose > > allocator work exactly as proposed. It seemed useful to prioritize > > simplicity,

Re: [HACKERS] Using quicksort for every external sort run

2016-03-07 Thread Peter Geoghegan
On Mon, Feb 15, 2016 at 3:45 PM, Greg Stark wrote: > I was thinking about this over the past couple weeks. I'm starting to > think the quicksort runs gives at least the beginnings of a way > forward on this front. As I've already pointed out several times, I wrote a tool that

Re: [HACKERS] Using quicksort for every external sort run

2016-02-15 Thread Greg Stark
On Mon, Feb 15, 2016 at 8:43 PM, Jim Nasby wrote: > On 2/7/16 8:57 PM, Peter Geoghegan wrote: >> >> It seems less than ideal that DBAs have to be so >> conservative in sizing work_mem. > > > +10 I was thinking about this over the past couple weeks. I'm starting to

Re: [HACKERS] Using quicksort for every external sort run

2016-02-15 Thread Jim Nasby
On 2/7/16 8:57 PM, Peter Geoghegan wrote: It seems less than ideal that DBAs have to be so conservative in sizing work_mem. +10 -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble!

Re: [HACKERS] Using quicksort for every external sort run

2016-02-14 Thread Peter Geoghegan
On Sun, Feb 7, 2016 at 4:50 PM, Peter Geoghegan wrote: >> I'm not even sure this is necessary. The idea of missing out on >> producing a single sorted run sounds bad but in practice since we >> normally do the final merge on the fly there doesn't seem like there's >> really any

Re: [HACKERS] Using quicksort for every external sort run

2016-02-07 Thread Peter Geoghegan
On Fri, Feb 5, 2016 at 9:31 AM, Robert Haas wrote: > Peter, please weigh in and let me know if I've gotten anything > incorrect here or if you think of other concerns afterwards. Right. Let me give you the executive summary first: I continue to believe, following thinking

Re: [HACKERS] Using quicksort for every external sort run

2016-02-07 Thread Greg Stark
On Sun, Feb 7, 2016 at 8:21 PM, Robert Haas wrote: > On Sun, Feb 7, 2016 at 11:00 AM, Peter Geoghegan wrote: > > It was my understanding, based on your emphasis on producing only a > > single run, as well as your recent remarks on this thread about the > >

Re: [HACKERS] Using quicksort for every external sort run

2016-02-07 Thread Robert Haas
On Sun, Feb 7, 2016 at 11:00 AM, Peter Geoghegan wrote: > Right. Let me give you the executive summary first: I continue to > believe, following thinking about the matter in detail, that this is a > sensible compromise, that weighs everyone's concerns. It is pretty > close to a

Re: [HACKERS] Using quicksort for every external sort run

2016-02-07 Thread Peter Geoghegan
On Sun, Feb 7, 2016 at 4:50 PM, Peter Geoghegan wrote: >> I'm not even sure this is necessary. The idea of missing out on >> producing a single sorted run sounds bad but in practice since we >> normally do the final merge on the fly there doesn't seem like there's >> really any

Re: [HACKERS] Using quicksort for every external sort run

2016-02-07 Thread Peter Geoghegan
On Sun, Feb 7, 2016 at 10:51 AM, Greg Stark wrote: >> > You don't want to change the behavior of the current patch for the >> > second or subsequent run; that should remain a quicksort, pure and >> > simple. Do I have that right? >> >> Yes. > > I'm not even sure this is necessary.

Re: [HACKERS] Using quicksort for every external sort run

2016-02-05 Thread Robert Haas
On Thu, Feb 4, 2016 at 6:14 AM, Peter Geoghegan wrote: > The economics of using 4MB or even 20MB to sort 10GB of data are > already preposterously bad for everyone that runs a database server, > no matter how budget conscious they may be. I can reluctantly accept > that we need

Re: [HACKERS] Using quicksort for every external sort run

2016-02-04 Thread Peter Geoghegan
On Sat, Jan 30, 2016 at 5:29 AM, Robert Haas wrote: >> I meant use "quicksort with spillover" simply because an estimated >> 90%+ of all tuples have already been consumed. Don't consider the >> tuple width, etc. > > Hmm, it's a thought. To be honest, it's a bit annoying

Re: [HACKERS] Using quicksort for every external sort run

2016-02-04 Thread Peter Geoghegan
On Thu, Feb 4, 2016 at 1:46 AM, Peter Geoghegan wrote: > It wasn't my original insight that replacement selection has become > all but obsolete. It took me a while to come around to that point of > view. Nyberg et al may have said it best in 1994, in the Alphasort Paper [1]:

Re: [HACKERS] Using quicksort for every external sort run

2016-01-30 Thread Robert Haas
On Sat, Jan 30, 2016 at 2:25 AM, Peter Geoghegan wrote: > On Fri, Jan 29, 2016 at 2:58 PM, Robert Haas wrote: >> I don't quite know what you mean by these numbers. Add a generic, >> conservative threshold to what? > > I meant use "quicksort with

Re: [HACKERS] Using quicksort for every external sort run

2016-01-29 Thread Mithun Cy
On Fri, Jan 29, 2016 at 5:11 PM, Mithun Cy wrote > > > > >I just ran some tests on above patch. Mainly to compare > >how "longer sort keys" would behave with new(Qsort) and old Algo(RS) for > sorting. > >I have 8GB of ram and ssd storage. > > > *Key length 520* > > > >

Re: [HACKERS] Using quicksort for every external sort run

2016-01-29 Thread Mithun Cy
On Tue, Dec 29, 2015 at 4:33 AM, Peter Geoghegan wrote: >Attached is a revision that significantly overhauls the memory patch, >with several smaller changes. I just ran some tests on above patch. Mainly to compare how "longer sort keys" would behave with new(Qsort) and old

Re: [HACKERS] Using quicksort for every external sort run

2016-01-29 Thread Peter Geoghegan
On Fri, Jan 29, 2016 at 3:41 AM, Mithun Cy wrote: > I just ran some tests on above patch. Mainly to compare > how "longer sort keys" would behave with new(Qsort) and old Algo(RS) for > sorting. > I have 8GB of ram and ssd storage. > > Settings and Results. >

Re: [HACKERS] Using quicksort for every external sort run

2016-01-29 Thread Robert Haas
On Wed, Jan 27, 2016 at 8:20 AM, Peter Geoghegan wrote: > Correct me if I'm wrong, but I think that the only outstanding issue > with all patches posted here so far is the "quicksort with spillover" > cost model. Hopefully this can be cleared up soon. As I've said, I am > very

Re: [HACKERS] Using quicksort for every external sort run

2016-01-29 Thread Peter Geoghegan
On Fri, Jan 29, 2016 at 9:24 AM, Robert Haas wrote: > I feel like this could be data driven. I mean, the cost model is > based mainly on the tuple width and the size of the SortTuple array. > So, it should be possible to tests of both algorithms on 32, 64, 96, > 128, ...

Re: [HACKERS] Using quicksort for every external sort run

2016-01-29 Thread Robert Haas
On Fri, Jan 29, 2016 at 12:46 PM, Peter Geoghegan wrote: > On Fri, Jan 29, 2016 at 9:24 AM, Robert Haas wrote: >> I feel like this could be data driven. I mean, the cost model is >> based mainly on the tuple width and the size of the SortTuple array. >>

Re: [HACKERS] Using quicksort for every external sort run

2016-01-29 Thread Greg Stark
On 29 Jan 2016 11:58 pm, "Robert Haas" wrote: > It > seems pretty easy to construct cases where this technique regresses, > and a large percentage of those cases are precisely those where > replacement selection would have produced a single run, avoiding the > merge step

Re: [HACKERS] Using quicksort for every external sort run

2016-01-29 Thread Greg Stark
On 30 Jan 2016 8:27 am, "Greg Stark" wrote: > > > On 29 Jan 2016 11:58 pm, "Robert Haas" wrote: > > It > > seems pretty easy to construct cases where this technique regresses, > > and a large percentage of those cases are precisely those where > >

Re: [HACKERS] Using quicksort for every external sort run

2016-01-29 Thread Peter Geoghegan
On Fri, Jan 29, 2016 at 2:58 PM, Robert Haas wrote: > I don't quite know what you mean by these numbers. Add a generic, > conservative threshold to what? I meant use "quicksort with spillover" simply because an estimated 90%+ of all tuples have already been consumed.

Re: [HACKERS] Using quicksort for every external sort run

2016-01-27 Thread Peter Geoghegan
On Wed, Dec 23, 2015 at 9:37 AM, Robert Haas wrote: >> But yes, let me concede more clearly: the cost model is based on >> frobbing. But at least it's relatively honest about that, and is >> relatively simple. I think it might be possible to make it simpler, >> but I have a

Re: [HACKERS] Using quicksort for every external sort run

2015-12-28 Thread Peter Geoghegan
On Fri, Dec 18, 2015 at 11:57 AM, Peter Geoghegan wrote: > BTW, I'm not necessarily determined to make the new special-purpose > allocator work exactly as proposed. It seemed useful to prioritize > simplicity, and currently so there is one big "huge palloc()" with > which we blow

Re: [HACKERS] Using quicksort for every external sort run

2015-12-23 Thread Robert Haas
On Tue, Dec 22, 2015 at 8:10 PM, Peter Geoghegan wrote: >> Sure, there are arbitrary numbers all over the code, driven by >> empirical observations about what factors are important to model. But >> this is not that. You don't have a thing called seq_page_cost and a >> thing

Re: [HACKERS] Using quicksort for every external sort run

2015-12-23 Thread Michael Paquier
On Thu, Dec 24, 2015 at 8:44 AM, Peter Geoghegan wrote: > [long blahblah] (Patch moved to next CF, work is going on. Thanks to people here to be active) -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:

Re: [HACKERS] Using quicksort for every external sort run

2015-12-23 Thread Peter Geoghegan
On Wed, Dec 23, 2015 at 1:03 PM, Jeff Janes wrote: > The regression I found when building an index on a column of > 400,000,000 md5(random()::text) with 64MB maintenance_work_mem was not > hard to find at all. I still don't understand what is going on with > it, but it is

Re: [HACKERS] Using quicksort for every external sort run

2015-12-23 Thread Peter Geoghegan
On Wed, Dec 23, 2015 at 1:03 PM, Jeff Janes wrote: > If we do think it is important to almost never cause regressions at > the default maintenance_work_mem (I am agnostic on the importance of > that), then I think we have more work to do here. I just don't know > what that

Re: [HACKERS] Using quicksort for every external sort run

2015-12-23 Thread Peter Geoghegan
On Wed, Dec 23, 2015 at 9:37 AM, Robert Haas wrote: > The point is, nobody can tell WHAT effects this is modeling. > Increasing the tuple size makes the crossover go up. Or down. There are multiple, competing considerations. > This analogy is faulty. It's true that when

Re: [HACKERS] Using quicksort for every external sort run

2015-12-23 Thread Jeff Janes
On Mon, Dec 14, 2015 at 7:22 PM, Peter Geoghegan wrote: > On Mon, Dec 14, 2015 at 6:58 PM, Greg Stark wrote: >> I ran sorts with various parameters on my small NAS server. > > ... > >> without the extra memory optimizations. > > Thanks for taking the time to

Re: [HACKERS] Using quicksort for every external sort run

2015-12-23 Thread Peter Geoghegan
On Wed, Dec 23, 2015 at 1:16 PM, Robert Haas wrote: > On Wed, Dec 23, 2015 at 3:31 PM, Peter Geoghegan wrote: >> On Wed, Dec 23, 2015 at 9:37 AM, Robert Haas wrote: >>> The point is, nobody can tell WHAT effects this is modeling.

Re: [HACKERS] Using quicksort for every external sort run

2015-12-23 Thread Robert Haas
On Wed, Dec 23, 2015 at 3:31 PM, Peter Geoghegan wrote: > On Wed, Dec 23, 2015 at 9:37 AM, Robert Haas wrote: >> The point is, nobody can tell WHAT effects this is modeling. >> Increasing the tuple size makes the crossover go up. Or down. > > There are

Re: [HACKERS] Using quicksort for every external sort run

2015-12-23 Thread Peter Geoghegan
On Wed, Dec 23, 2015 at 7:48 PM, Peter Geoghegan wrote: > I wonder, what's the situation here like with the attached patch > applied on top of what you were testing? I think that we might be > better off with more merge steps when under enormous memory pressure > at the low end,

Re: [HACKERS] Using quicksort for every external sort run

2015-12-22 Thread Robert Haas
On Tue, Dec 22, 2015 at 4:37 PM, Peter Geoghegan wrote: >> This looks like voodoo to me. I assume you tested it and maybe it >> gives correct answers, but it's got to be some kind of world record >> for number of arbitrary constants per SLOC, and there's no real >> justification

Re: [HACKERS] Using quicksort for every external sort run

2015-12-22 Thread Peter Geoghegan
On Tue, Dec 22, 2015 at 2:57 PM, Robert Haas wrote: > On Tue, Dec 22, 2015 at 4:37 PM, Peter Geoghegan wrote: >> That's not fair. DEFAULT_EQ_SEL, DEFAULT_RANGE_INEQ_SEL, and >> DEFAULT_NUM_DISTINCT are each about as arbitrary. We have to do >> something,

Re: [HACKERS] Using quicksort for every external sort run

2015-12-22 Thread Peter Geoghegan
On Tue, Dec 22, 2015 at 9:10 AM, Robert Haas wrote: > So I was looking at the 0001 patch Thanks. I'm going to produce a revision of 0002 shortly, so perhaps hold off on that one. The big change there will be to call grow_memtuples() to allow us to increase the number of

Re: [HACKERS] Using quicksort for every external sort run

2015-12-22 Thread Robert Haas
On Sun, Dec 6, 2015 at 7:25 PM, Peter Geoghegan wrote: > On Tue, Nov 24, 2015 at 4:33 PM, Peter Geoghegan wrote: >> So, the bottom line is: This patch seems very good, is unlikely to >> have any notable downside (no case has been shown to be regressed), >> but

Re: [HACKERS] Using quicksort for every external sort run

2015-12-18 Thread Robert Haas
On Sat, Dec 12, 2015 at 5:28 PM, Peter Geoghegan wrote: > On Sat, Dec 12, 2015 at 12:10 AM, Jeff Janes wrote: >> I have a question about the terminology used in this patch. What is a >> tuple proper? What is it in contradistinction to? I would think that

Re: [HACKERS] Using quicksort for every external sort run

2015-12-18 Thread Peter Geoghegan
On Fri, Dec 18, 2015 at 10:12 AM, Robert Haas wrote: > I don't really like the term "memory pool" either. We're growing a > bunch of little special-purpose allocators all over the code base > because of palloc's somewhat dubious performance and memory usage >

Re: [HACKERS] Using quicksort for every external sort run

2015-12-18 Thread Peter Geoghegan
On Fri, Dec 18, 2015 at 10:12 AM, Robert Haas wrote: > Anyway, I agree with Jeff that this terminology shouldn't creep into > function and structure member names. Okay. > I don't really like the term "memory pool" either. We're growing a > bunch of little special-purpose

Re: [HACKERS] Using quicksort for every external sort run

2015-12-18 Thread Peter Geoghegan
On Fri, Dec 18, 2015 at 12:50 PM, Robert Haas wrote: >> BTW, I'm not necessarily determined to make the new special-purpose >> allocator work exactly as proposed. It seemed useful to prioritize >> simplicity, and currently so there is one big "huge palloc()" with >> which

Re: [HACKERS] Using quicksort for every external sort run

2015-12-18 Thread Robert Haas
On Fri, Dec 18, 2015 at 2:57 PM, Peter Geoghegan wrote: > On Fri, Dec 18, 2015 at 10:12 AM, Robert Haas wrote: >> I don't really like the term "memory pool" either. We're growing a >> bunch of little special-purpose allocators all over the code base >>

Re: [HACKERS] Using quicksort for every external sort run

2015-12-14 Thread Greg Stark
I ran sorts with various parameters on my small NAS server. This is a fairly slow CPU and limited memory machine with lots of disk so I thought it would actually make a good test case for smaller servers. The following is the speedup (for values < 100%) or slowdown (values > 100%) for the first

Re: [HACKERS] Using quicksort for every external sort run

2015-12-14 Thread Peter Geoghegan
On Mon, Dec 14, 2015 at 6:58 PM, Greg Stark wrote: > I ran sorts with various parameters on my small NAS server. ... > without the extra memory optimizations. Thanks for taking the time to benchmark the patch! While I think it's perfectly fair that you didn't apply the final

Re: [HACKERS] Using quicksort for every external sort run

2015-12-14 Thread Peter Geoghegan
On Mon, Dec 14, 2015 at 7:22 PM, Peter Geoghegan wrote: > Thanks for taking the time to benchmark the patch! Also, I should point out that you didn't add work_mem past the point where the master branch will get slower, while the patch continues to get faster. This seems to

Re: [HACKERS] Using quicksort for every external sort run

2015-12-13 Thread Jeff Janes
On Sun, Dec 13, 2015 at 3:40 PM, Peter Geoghegan wrote: > On Sat, Dec 12, 2015 at 4:41 PM, Jeff Janes wrote: > Also, if I am reading this correctly, when we refill a pool from a logical tape we still transform each tuple as it is read from the

Re: [HACKERS] Using quicksort for every external sort run

2015-12-13 Thread Peter Geoghegan
On Sat, Dec 12, 2015 at 4:41 PM, Jeff Janes wrote: > Those usages make sense to me, as they are locally self-contained and > it is clear what they are in contradistinction to. But your usage is > spread throughout (even in function names, not just comments) and > seems to

Re: [HACKERS] Using quicksort for every external sort run

2015-12-13 Thread Jeff Janes
On Tue, Dec 8, 2015 at 6:39 PM, Greg Stark wrote: > On Wed, Dec 9, 2015 at 12:02 AM, Jeff Janes wrote: >> >> >> Then in the next (final) merge, it is has to read in this huge >> fragmented tape run emulation, generating a lot of random IO to read >> it. > >

Re: [HACKERS] Using quicksort for every external sort run

2015-12-13 Thread Peter Geoghegan
On Sun, Dec 13, 2015 at 7:31 PM, Jeff Janes wrote: > The reason for memtuples is to handle random access. Since we are no > longer doing random access, we no longer need it. > > We could free memtuples, re-allocate just enough to form the binary > heap for the N-way merge,

Re: [HACKERS] Using quicksort for every external sort run

2015-12-12 Thread Jeff Janes
On Sun, Dec 6, 2015 at 4:25 PM, Peter Geoghegan wrote: > Maybe we should consider trying to get patch 0002 (the memory > pool/merge patch) committed first, something Greg Stark suggested > privately. That might actually be an easier way of integrating this > work, since it

Re: [HACKERS] Using quicksort for every external sort run

2015-12-12 Thread Ants Aasma
On Sat, Dec 12, 2015 at 12:41 AM, Greg Stark wrote: > On Wed, Dec 9, 2015 at 2:44 AM, Peter Geoghegan wrote: >> On Tue, Dec 8, 2015 at 6:39 PM, Greg Stark wrote: >> >> I guess you mean insertion sort. What's the theoretical justification >> for the

Re: [HACKERS] Using quicksort for every external sort run

2015-12-12 Thread Greg Stark
On Sat, Dec 12, 2015 at 7:42 PM, Ants Aasma wrote: > As the open coding doesn't help with eliminating control flow > dependencies, so my idea is to encode the sort network comparison > order in an array and use that to drive a simple loop. The code size > would be pretty

Re: [HACKERS] Using quicksort for every external sort run

2015-12-12 Thread Jeff Janes
On Sat, Dec 12, 2015 at 2:28 PM, Peter Geoghegan wrote: > On Sat, Dec 12, 2015 at 12:10 AM, Jeff Janes wrote: >> I have a question about the terminology used in this patch. What is a >> tuple proper? What is it in contradistinction to? I would think that

  1   2   >