Re: [HACKERS] Using quicksort for every external sort run

2015-12-09 Thread Jeremy Harris
On 09/12/15 00:02, Jeff Janes wrote: > The second one consumes that giant tape run along with 232 small tape > runs. In terms of number of comparisons, binary merge works best when the inputs are of similar length. I'd assume the same goes for n-ary merge, but I don't know if comparison count is

Re: [HACKERS] [PATCH] Equivalence Class Filters

2015-12-08 Thread Jeremy Harris
On 07/12/15 16:44, Simon Riggs wrote: > There are many optimizations we might adopt, yet planning time is a factor. > It seems simple enough to ignore more complex optimizations if we have > already achieved a threshold cost (say 10). Such a test would add nearly > zero time for the common case.

Re: [HACKERS] Getting sorted data from foreign server

2015-10-08 Thread Jeremy Harris
On 08/10/15 17:09, Robert Haas wrote: > - You consider pushing down ORDER BY if any prefix of the query > pathkeys satisfy is_foreign_expr(), but that doesn't seem right to me. > If the user says SELECT * FROM remotetab ORDER BY a, unsafe(a), > ordering the result set by a doesn't help us much.

Re: [HACKERS] BRIN indexes for MAX, MIN, ORDER BY?

2015-09-29 Thread Jeremy Harris
On 27/09/15 21:58, Gavin Wahl wrote: > Somewhat harder but still possible would be using BRIN indexes to > accelerate ORDER BY. This would require a sorting algorithm that can take > advantage of mostly-sorted inputs. You would sort the page ranges by their > minimum or maximum value, then feed

Re: [HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-08-01 Thread Jeremy Harris
On 31/07/15 18:31, Robert Haas wrote: On Fri, Jul 31, 2015 at 7:21 AM, Jeremy Harris j...@wizmail.org wrote: Heapification is O(n) already, whether siftup (existing) or down. That's not my impression, or what Wikipedia says. Source? Measurements done last year: http://www.postgresql.org

[HACKERS] Re: Using quicksort and a merge step to significantly improve on tuplesort's single run external sort

2015-07-31 Thread Jeremy Harris
On 30/07/15 02:05, Peter Geoghegan wrote: Since heapification is now a big fraction of the total cost of a sort sometimes, even where the heap invariant need not be maintained for any length of time afterwards, it might be worth revisiting the patch to make that an O(n) rather than a O(log n)

Re: [HACKERS] Asynchronous DRAM Self-Refresh

2015-05-23 Thread Jeremy Harris
On 22/05/15 22:30, Josh Berkus wrote: At CoreOS Fest, Intel presented about a technology which they used to improve write times for the nonrelational data store Etcd. It's called Asynchronous DRAM Self-Refresh, or ADR. This is supposedly a feature of all of their chips since E5 which allows

Re: [HACKERS] Asynchronous DRAM Self-Refresh

2015-05-23 Thread Jeremy Harris
On 22/05/15 22:30, Josh Berkus wrote: At CoreOS Fest, Intel presented about a technology which they used to improve write times for the nonrelational data store Etcd. It's called Asynchronous DRAM Self-Refresh, or ADR. This is supposedly a feature of all of their chips since E5 which allows

Re: [HACKERS] Abbreviated keys for text cost model fix

2015-03-03 Thread Jeremy Harris
On 03/03/15 03:08, Arthur Silva wrote: Does it always perform mergesort instead of quicksort when enabled? Yes; there seemed no advantage to any additional complexity. The merge consistently performs fewer comparisons than the quicksort, on random input - and many fewer if there are any sorted

Re: [HACKERS] Abbreviated keys for text cost model fix

2015-02-25 Thread Jeremy Harris
On 25/02/15 00:42, Peter Geoghegan wrote: On Tue, Feb 24, 2015 at 4:32 PM, Jeremy Harris j...@wizmail.org wrote: On 23/02/15 16:40, Tomas Vondra wrote: On 22.2.2015 22:30, Peter Geoghegan wrote: You should try it with the data fully sorted like this, but with one tiny difference: The very

Re: [HACKERS] Abbreviated keys for text cost model fix

2015-02-24 Thread Jeremy Harris
On 23/02/15 16:40, Tomas Vondra wrote: On 22.2.2015 22:30, Peter Geoghegan wrote: You should try it with the data fully sorted like this, but with one tiny difference: The very last tuple is out of order. How does that look? If this case is actually important, a merge-sort can take

Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort

2014-11-14 Thread Jeremy Harris
On 14/11/14 00:46, Simon Riggs wrote: Limit (cost= rows=20 width=175) (actual time= rows=20 loops=1) - Sort (cost= rows=568733 width=175) (actual time= rows=20 loops=1) Sort Method: top-N heapsort Going off on a tangent, when I was playing with a merge-sort

Re: [HACKERS] EXPLAIN ANALYZE output weird for Top-N Sort

2014-11-14 Thread Jeremy Harris
On 14/11/14 14:54, Tom Lane wrote: Jeremy Harris j...@wizmail.org writes: On 14/11/14 00:46, Simon Riggs wrote: Limit (cost= rows=20 width=175) (actual time= rows=20 loops=1) - Sort (cost= rows=568733 width=175) (actual time= rows=20 loops=1) Sort Method: top-N heapsort

Re: [HACKERS] Directory/File Access Permissions for COPY and Generic File Access Functions

2014-10-29 Thread Jeremy Harris
On 29/10/14 16:11, Andres Freund wrote: I do think checking the link count to be 1 is safe though. You will fail against certain styles of online-backup. -- Cheers, Jeremy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:

Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-26 Thread Jeremy Harris
On 26/02/14 03:08, Robert Haas wrote: On Tue, Feb 25, 2014 at 2:55 PM, Jeremy Harris j...@wizmail.org wrote: On 24/02/14 17:38, Robert Haas wrote: On Thu, Feb 20, 2014 at 7:27 PM, Jeremy Harris j...@wizmail.org wrote: Run under cachegrind, it takes about N/10 last-level cache misses, all

Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-25 Thread Jeremy Harris
On 24/02/14 17:38, Robert Haas wrote: On Thu, Feb 20, 2014 at 7:27 PM, Jeremy Harris j...@wizmail.org wrote: Run under cachegrind, it takes about N/10 last-level cache misses, all for the new item being introduced to the heap. The existing code takes none at all. Can you explain this further

Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-20 Thread Jeremy Harris
On 09/02/14 17:11, Jeremy Harris wrote: On 06/02/14 18:21, Jeff Janes wrote: Did you try sorting already-sorted, reverse sorted, or pipe-organ shaped data sets? We will also need to test it on strings. I usually use md5(random()::text) to generate strings for such purposes, at least

Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-07 Thread Jeremy Harris
On 06/02/14 22:12, Jeremy Harris wrote: Did you try sorting already-sorted, reverse sorted, or pipe-organ shaped data sets? Summary (low numbers better): Random ints: 83% compares, level on time. Sorted ints: level compares, 70% time. Reverse-sorted ints: 10% compares, 15

Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-06 Thread Jeremy Harris
On 05/02/14 21:56, Robert Haas wrote: On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris j...@wizmail.org wrote: The attached patch replaces the existing siftup method for heapify with a siftdown method. Tested with random integers it does 18% fewer compares and takes 10% less time for the heapify

Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-06 Thread Jeremy Harris
On 06/02/14 18:21, Jeff Janes wrote: How big of sets were you sorting in each case? Big enough to go external. The timings and compare-counts given are purely for the heapify stage not the total for the sort, so are constrained by the work_mem not by the sort size per se. I'm limited to

Re: [HACKERS] Minor performance improvement in transition to external sort

2014-02-06 Thread Jeremy Harris
On 06/02/14 14:22, Robert Haas wrote: On Thu, Feb 6, 2014 at 4:22 AM, Jeremy Harris j...@wizmail.org wrote: On 05/02/14 21:56, Robert Haas wrote: On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris j...@wizmail.org wrote: The attached patch replaces the existing siftup method for heapify

[HACKERS] Minor performance improvement in transition to external sort

2014-02-04 Thread Jeremy Harris
The attached patch replaces the existing siftup method for heapify with a siftdown method. Tested with random integers it does 18% fewer compares and takes 10% less time for the heapify, over the work_mem range 1024 to 1048576. Both algorithms appear to be O(n) (contradicting Wikipedia's claim

Re: [HACKERS] A better way than tweaking NTUP_PER_BUCKET

2014-01-28 Thread Jeremy Harris
On 27/01/14 18:00, Simon Riggs wrote: On 27 January 2014 17:44, Pavel Stehule pavel.steh...@gmail.com wrote: This topic is interesting - we found very bad performance with hashing large tables with high work_mem. MergeJoin with quicksort was significantly faster. I've seen this also. I

[HACKERS] Questionable coding in orderedsetaggs.c

2014-01-25 Thread Jeremy Harris
In ordered_set_startup() sorts are initialised in non-randomAccess mode (tuplesort_begin_heap() and ~datum(), last argument). The use of tuplesort_skip_tuples() feels very like a random access to me. I think it doesn't fail because the only use (and implementation) is to skip forwards; if

Re: [HACKERS] PoC: Duplicate Tuple Elidation during External Sort for DISTINCT

2014-01-22 Thread Jeremy Harris
On 22/01/14 03:16, Jon Nelson wrote: Greetings -hackers: I have worked up a patch to PostgreSQL which elides tuples during an external sort. The primary use case is when sorted input is being used to feed a DISTINCT operation. The idea is to throw out tuples that compare as identical whenever

Re: [HACKERS] PoC: Duplicate Tuple Elidation during External Sort for DISTINCT

2014-01-22 Thread Jeremy Harris
On 22/01/14 03:53, Tom Lane wrote: Jon Nelson jnelson+pg...@jamponi.net writes: A rough summary of the patch follows: - a GUC variable enables or disables this capability - in nodeAgg.c, eliding duplicate tuples is enabled if the number of distinct columns is equal to the number of sort

Re: [HACKERS] PoC: Partial sort

2014-01-18 Thread Jeremy Harris
On 13/01/14 18:01, Alexander Korotkov wrote: Thanks. It's included into attached version of patch. As wall as estimation improvements, more comments and regression tests fix. Would it be possible to totally separate the two sets of sort-keys, only giving the non-index set to the tuplesort? At

Re: [HACKERS] PoC: Partial sort

2014-01-18 Thread Jeremy Harris
On 31/12/13 01:41, Andreas Karlsson wrote: On 12/29/2013 08:24 AM, David Rowley wrote: If it was possible to devise some way to reuse any previous tuplesortstate perhaps just inventing a reset method which clears out tuples, then we could see performance exceed the standard seqscan - sort. The

Re: [HACKERS] PoC: Partial sort

2014-01-16 Thread Jeremy Harris
On 22/12/13 20:26, Alexander Korotkov wrote: On Sat, Dec 14, 2013 at 6:30 PM, Jeremy Harris j...@wizmail.org wrote: On 14/12/13 12:54, Andres Freund wrote: Is that actually all that beneficial when sorting with a bog standard qsort() since that doesn't generally benefit from data being pre

Re: [Lsf-pc] [HACKERS] Linux kernel impact on PostgreSQL performance

2014-01-16 Thread Jeremy Harris
On 14/01/14 22:23, Dave Chinner wrote: On Tue, Jan 14, 2014 at 11:40:38AM -0800, Kevin Grittner wrote: To quantify that, in a production setting we were seeing pauses of up to two minutes with shared_buffers set to 8GB and default dirty

Re: [HACKERS] PoC: Partial sort

2013-12-14 Thread Jeremy Harris
On 14/12/13 12:54, Andres Freund wrote: On 2013-12-14 13:59:02 +0400, Alexander Korotkov wrote: Currently when we need to get ordered result from table we have to choose one of two approaches: get results from index in exact order we need or do sort of tuples. However, it could be useful to mix

Re: [HACKERS] Dynamic Shared Memory stuff

2013-11-23 Thread Jeremy Harris
On 20/11/13 19:58, Robert Haas wrote: Parallel sort, and then parallel other stuff. Eventually general parallel query. I have recently updated https://wiki.postgresql.org/wiki/Parallel_Sort and you may find that interesting/helpful as a statement of intent. I've been playing with an internal

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Jeremy Harris
On 13/11/13 09:07, Leonardo Francalanci wrote: Problem? having 4 btree indexes on random values (imsi+imei * 2, since we have calling and caller) kills the performance in insertion after a while. Surely there's good correlation between IMSI IMEI, so have a separate table to translate one to

[HACKERS] regression tests

2013-09-06 Thread Jeremy Harris
Hi, I don't see the regression tests running any index-hash operations. What am I missing? -- Cheers, Jeremy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] regression tests

2013-09-06 Thread Jeremy Harris
On 06/09/13 15:44, Robert Haas wrote: On Fri, Sep 6, 2013 at 3:34 AM, Jeremy Harris j...@wizmail.org wrote: I don't see the regression tests running any index-hash operations. What am I missing? What's an index-hash operation? Ones that hit tuplesort_begin_index_hash() -- Cheers

Re: [HACKERS] how to pass data (tuples) to worker processes?

2013-08-07 Thread Jeremy Harris
On 06/08/13 13:59, Robert Haas wrote: My thought is to create a queue abstraction that sits on top of the dynamic shared memory infrastructure, so that you can set aside a portion of your dynamic shared memory segment to use as a ring buffer and send messages back and forth with using some

Re: [HACKERS] Support for REINDEX CONCURRENTLY

2012-10-06 Thread Jeremy Harris
On 10/05/2012 09:03 PM, Tom Lane wrote: Note that allowing subsequent requests to jump the queue would not be a good fix for this; if you do that, it's likely the ex-lock will never be granted, at least not till the next system idle time. Offering that option to the admin sounds like a good

Re: [HACKERS] Memory usage during sorting

2012-03-18 Thread Jeremy Harris
On 2012-03-18 15:25, Tom Lane wrote: Jeff Janesjeff.ja...@gmail.com writes: On Wed, Mar 7, 2012 at 11:55 AM, Robert Haasrobertmh...@gmail.com wrote: The problem there is that none of the files can be deleted until it was entirely read, so you end up with all the data on disk twice. I don't

Re: [HACKERS] random_page_cost vs seq_page_cost

2012-01-05 Thread Jeremy Harris
On 2012-01-05 10:04, Benedikt Grundmann wrote: I have a question of how to benchmark hardware to determine the appropriate ratio of seq_page_cost vs random_page_cost. It'd be really nice if the DBMS measured actual experienced values.. -- Jeremy -- Sent via pgsql-hackers mailing list

Re: [HACKERS] spinlocks on powerpc

2012-01-03 Thread Jeremy Harris
On 2012-01-03 04:44, Robert Haas wrote: On read-only workloads, you get spinlock contention, because everyone who wants a snapshot has to take the LWLock mutex to increment the shared lock count and again (just a moment later) to decrement it. Does the LWLock protect anything but the shared