On 15 February 2012 06:16, Robert Haas robertmh...@gmail.com wrote:
On Fri, Feb 10, 2012 at 10:30 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
[ new patch ]
I spent quite a bit of time looking at this today - the patch
specifically, and the issue of making quicksort fast more generally.
On Wed, Feb 15, 2012 at 8:29 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
Cool. I agree that we should do this. It doesn't need to be justified
as a performance optimisation - it makes sense to refactor in this
way. If that makes things faster, then so much the better.
Well, maybe so, but
On 15 February 2012 15:27, Robert Haas robertmh...@gmail.com wrote:
I am inclined to agree that given that we already use Perl to generate
source code like this, it seems natural that we should prefer to do
that, if only to avoid paranoia about the inclusion of a dial-a-bloat
knob. I am at a
On Fri, Feb 10, 2012 at 10:30 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
[ new patch ]
I spent quite a bit of time looking at this today - the patch
specifically, and the issue of making quicksort fast more generally.
It seemed to me that if we're going to have separate copies of the
On 9 February 2012 14:51, Robert Haas robertmh...@gmail.com wrote:
I'm not sure I entirely follow all this, but I'll look at the code
once you have it.
I have attached a revision of the patch, with the adjustments already
described. Note that I have not made this support btree tuplesorting
yet,
On Tue, Feb 07, 2012 at 09:38:39PM -0500, Robert Haas wrote:
Second, there's a concern about binary bloat: duplicating lots of code
with different comparators inlined generates, well, a lot of machine
code. Of course, an 0.8% increase in the size of the resulting binary
is very unlikely to
On Thu, Feb 9, 2012 at 7:24 AM, Noah Misch n...@leadboat.com wrote:
On Tue, Feb 07, 2012 at 09:38:39PM -0500, Robert Haas wrote:
Second, there's a concern about binary bloat: duplicating lots of code
with different comparators inlined generates, well, a lot of machine
code. Of course, an 0.8%
On 9 February 2012 13:50, Robert Haas robertmh...@gmail.com wrote:
I'm also more than slightly concerned that we are losing sight of the
forest for the trees. I have heard reports that sorting large amounts
of data is many TIMES slower for us than for a certain other database
product. I
On Thu, Feb 9, 2012 at 9:37 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
On 9 February 2012 13:50, Robert Haas robertmh...@gmail.com wrote:
I'm also more than slightly concerned that we are losing sight of the
forest for the trees. I have heard reports that sorting large amounts
of data
On Thu, Feb 09, 2012 at 07:24:49AM -0500, Noah Misch wrote:
This patch has gotten more than its fair share of attention for bloat, and I
think that's mostly because there's a dial-a-bloat-level knob sticking out and
begging to be frobbed.
I already emailed Peter privately stating that he
On 9 February 2012 14:51, Robert Haas robertmh...@gmail.com wrote:
I'm not sure I entirely follow all this, but I'll look at the code
once you have it. Are you saying that all the comparetup_yadda
functions are redundant to each other in the single-key case?
Yes, I am. The main reason that
On Thu, Feb 09, 2012 at 03:36:23PM +, Peter Geoghegan wrote:
On 9 February 2012 14:51, Robert Haas robertmh...@gmail.com wrote:
I'm not sure I entirely follow all this, but I'll look at the code
once you have it. Are you saying that all the comparetup_yadda
functions are redundant to
On 9 February 2012 17:16, Bruce Momjian br...@momjian.us wrote:
Yes, I am. The main reason that the loops exist in those functions
(which is the only way that they substantially differ) is because they
each have to get the other keys through various ways that characterise
the tuple class that
It doesn't necessarily matter if we increase the size of the postgres
binary by 10%, precisely because most of that is not going to be in
play from one instant to the next. I'm thinking, in particular, of
btree index specialisations, where it could make perfect sense to go
crazy. You cannot have a
Peter Geoghegan pe...@2ndquadrant.com writes:
It doesn't necessarily matter if we increase the size of the postgres
binary by 10%, precisely because most of that is not going to be in
play from one instant to the next.
You've heard of swapping, no? Basically what I'm hearing from you is
total
On Wed, Feb 8, 2012 at 8:33 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
It doesn't necessarily matter if we increase the size of the postgres
binary by 10%, precisely because most of that is not going to be in
play from one instant to the next.
As Tom says, that doesn't jibe with my
On Wed, Feb 8, 2012 at 9:51 AM, Tom Lane t...@sss.pgh.pa.us wrote:
IMO this patch is already well past the point of diminishing returns in
value-per-byte-added. I'd like to see it trimmed back to provide a fast
path for just single-column int4/int8/float4/float8 sorts. The other
cases aren't
On Wed, Feb 08, 2012 at 01:33:30PM +, Peter Geoghegan wrote:
It doesn't necessarily matter if we increase the size of the postgres
binary by 10%, precisely because most of that is not going to be in
play from one instant to the next. I'm thinking, in particular, of
btree index
On Wed, Feb 08, 2012 at 10:17:36AM -0500, Robert Haas wrote:
On Wed, Feb 8, 2012 at 9:51 AM, Tom Lane t...@sss.pgh.pa.us wrote:
IMO this patch is already well past the point of diminishing returns in
value-per-byte-added. I'd like to see it trimmed back to provide a fast
path for just
Bruce Momjian br...@momjian.us writes:
Yes, please. That would be a big help. Is there no optimization for
strings? I assume they are sorted a lot.
It seems unlikely that it'd be worth including strings, especially if
your locale is not C. This whole thing only makes sense for datatypes
On Wed, Feb 08, 2012 at 11:35:46AM -0500, Tom Lane wrote:
Bruce Momjian br...@momjian.us writes:
Yes, please. That would be a big help. Is there no optimization for
strings? I assume they are sorted a lot.
It seems unlikely that it'd be worth including strings, especially if
your
On 8 February 2012 15:17, Robert Haas robertmh...@gmail.com wrote:
On Wed, Feb 8, 2012 at 9:51 AM, Tom Lane t...@sss.pgh.pa.us wrote:
IMO this patch is already well past the point of diminishing returns in
value-per-byte-added. I'd like to see it trimmed back to provide a fast
path for just
On Wed, Feb 8, 2012 at 10:59 AM, Bruce Momjian br...@momjian.us wrote:
On Wed, Feb 08, 2012 at 10:17:36AM -0500, Robert Haas wrote:
On Wed, Feb 8, 2012 at 9:51 AM, Tom Lane t...@sss.pgh.pa.us wrote:
IMO this patch is already well past the point of diminishing returns in
value-per-byte-added.
Robert Haas robertmh...@gmail.com writes:
[ lots of numbers ]
... I just can't get excited about that. However, I
find the single-key optimizations much more compelling, for the
reasons stated above, and feel we ought to include those.
This conclusion seems sound to me, for the reasons you
On 8 February 2012 17:58, Robert Haas robertmh...@gmail.com wrote:
It seems clear that the single sort-key optimizations are a much
better value per byte of code than the type-specific optimizations.
Ignoring client overhead, we get between half and two-thirds of the
benefit, and it costs us
On 8 February 2012 18:48, Peter Geoghegan pe...@2ndquadrant.com wrote:
I think that there may be additional benefits from making the
qsort_arg specialisation look less like a C stdlib one, like refining
the swap logic to have compile-time knowledge of the type it is
sorting. I'm thinking that
On Wed, Feb 8, 2012 at 1:48 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
That was clear from an early stage, and is something that I
acknowledged way back in September
OK, so why didn't/don't we do and commit that part first, and then
proceed to argue about the remainder once it's in?
I
On 8 February 2012 23:33, Robert Haas robertmh...@gmail.com wrote:
On Wed, Feb 8, 2012 at 1:48 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
That was clear from an early stage, and is something that I
acknowledged way back in September
OK, so why didn't/don't we do and commit that part
On 2/6/12 3:19 PM, Bruce Momjian wrote:
While we're waiting for anyone else to weigh in with an opinion on the
right place to draw the line here, do you want to post an updated
patch with the changes previously discussed?
Well, I think we have to ask not only how many people are using
Jim Decibel! Nasby wrote:
I agree that it's probably pretty unusual to index floats.
FWIW: Cubes and points are floats, right? So would spatial indexes benefit
from this optimization, or is it only raw floats?
Jay Levitt
On Tue, Feb 7, 2012 at 2:39 PM, Jay Levitt jay.lev...@gmail.com wrote:
Jim Decibel! Nasby wrote:
I agree that it's probably pretty unusual to index floats.
FWIW: Cubes and points are floats, right? So would spatial indexes benefit
from this optimization, or is it only raw floats?
Cubes are
On Tue, Feb 07, 2012 at 09:38:39PM -0500, Robert Haas wrote:
So we need some principled way of deciding how much inlining is
reasonable, because I am 100% certain this is not going to be the last
time someone discovers that a massive exercise in inlining can yield a
nifty performance benefit
On Fri, Jan 27, 2012 at 09:37:37AM -0500, Robert Haas wrote:
On Fri, Jan 27, 2012 at 9:27 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
Well, I don't think it's all that subjective - it's more the case that
it is just difficult, or it gets that way as you consider more
specialisations.
On Mon, Feb 06, 2012 at 04:19:07PM -0500, Bruce Momjian wrote:
Peter Geoghegan obviously has done some serious work in improving
sorting, and worked well with the community process. He has done enough
analysis that I am hard-pressed to see how we would get similar
improvement using a
On 6 February 2012 21:19, Bruce Momjian br...@momjian.us wrote:
Peter Geoghegan obviously has done some serious work in improving
sorting, and worked well with the community process.
Thank you for acknowledging that.
It's unfortunate that C does not support expressing these kinds of
ideas in a
On Mon, Feb 06, 2012 at 10:49:10PM +, Peter Geoghegan wrote:
On 6 February 2012 21:19, Bruce Momjian br...@momjian.us wrote:
Peter Geoghegan obviously has done some serious work in improving
sorting, and worked well with the community process.
Thank you for acknowledging that.
It's
On Mon, Feb 06, 2012 at 06:43:04PM -0500, Bruce Momjian wrote:
On Mon, Feb 06, 2012 at 10:49:10PM +, Peter Geoghegan wrote:
On 6 February 2012 21:19, Bruce Momjian br...@momjian.us wrote:
Peter Geoghegan obviously has done some serious work in improving
sorting, and worked well with
On Wed, Feb 01, 2012 at 04:12:58PM -0600, Jim Nasby wrote:
On Jan 26, 2012, at 9:32 PM, Robert Haas wrote:
But if we want to put it on a diet, the first thing I'd probably be
inclined to lose is the float4 specialization. Some members of the
audience will recall that I take a dim view of
On 31 January 2012 19:47, Robert Haas robertmh...@gmail.com wrote:
On Fri, Jan 27, 2012 at 3:33 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
Patch is attached. I have not changed the duplicate functions. This is
because I concluded that it was the lesser of two evils to have to get
the
On Jan 26, 2012, at 9:32 PM, Robert Haas wrote:
But if we want to put it on a diet, the first thing I'd probably be
inclined to lose is the float4 specialization. Some members of the
audience will recall that I take a dim view of floating point arithmetic
generally, but I'll concede that there
Excerpts from Jim Nasby's message of Wed Feb 01 19:12:58 -0300 2012:
On Jan 26, 2012, at 9:32 PM, Robert Haas wrote:
But if we want to put it on a diet, the first thing I'd probably be
inclined to lose is the float4 specialization. Some members of the
audience will recall that I take a dim
On Fri, Jan 27, 2012 at 3:33 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
Patch is attached. I have not changed the duplicate functions. This is
because I concluded that it was the lesser of two evils to have to get
the template to generate both declarations in the header file, and
On Thu, Jan 26, 2012 at 11:36 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
I'm not surprised that you weren't able to measure a performance
regression from the binary bloat. Any such regression is bound to be
very small and probably quite difficult to notice most of the time;
it's really
Uh, obviously I meant causal relationship and not correlation.
On 27 January 2012 13:37, Robert Haas robertmh...@gmail.com wrote:
I completely agree. So the point is that, when faced with a patch that
adds an atypically large number of CPU instructions, we ought to ask
ourselves whether those
On Fri, Jan 27, 2012 at 9:27 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
Well, I don't think it's all that subjective - it's more the case that
it is just difficult, or it gets that way as you consider more
specialisations.
Sure it's subjective. Two well-meaning people could have
On Thu, Jan 19, 2012 at 1:47 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
Thoughts?
I generated some random data using this query:
create table nodups (g) as select (g%10)*1000+g/10 from
generate_series(1,1) g;
And then used pgbench to repeatedly sort it using this query:
select *
On 26 January 2012 19:45, Robert Haas robertmh...@gmail.com wrote:
The patch came out about 28% faster than master. Admittedly, that's
with no client overhead, but still: not bad.
Thanks. There was a 28% reduction in the time it took to execute the
query, but there would have also been a
On Thu, Jan 26, 2012 at 4:09 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
On 26 January 2012 19:45, Robert Haas robertmh...@gmail.com wrote:
The patch came out about 28% faster than master. Admittedly, that's
with no client overhead, but still: not bad.
Thanks. There was a 28% reduction
Alright, so while I agree with everything you've asked for, the fact
is that there is a controversy in relation to binary bloat, and that's
the blocker here. How can we satisfactorily resolve that, or is that
question adequately addressed by the benchmark that I produced?
What if third party
On Thu, Jan 26, 2012 at 5:10 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
Alright, so while I agree with everything you've asked for, the fact
is that there is a controversy in relation to binary bloat, and that's
the blocker here. How can we satisfactorily resolve that, or is that
On 27 January 2012 03:32, Robert Haas robertmh...@gmail.com wrote:
But if we want to put it on a diet, the first thing I'd probably be
inclined to lose is the float4 specialization. Some members of the
audience will recall that I take a dim view of floating point arithmetic
generally, but I'll
I decided to take advantage of my ongoing access to a benchmarking
server to see how I could get on with a query with an especially large
sort. Now, that box has 16GB of ram, and some people have that much
memory in their laptops these days, so I was somewhat limited in how
far I could push
Obviously, many indexes are unique and thus won't have duplicates at
all. But if someone creates an index and doesn't make it unique, odds
are very high that it has some duplicates. Not sure how many we
typically expect to see, but more than zero...
Peter may not, but I personally admin
On 9 January 2012 19:45, Josh Berkus j...@agliodbs.com wrote:
Obviously, many indexes are unique and thus won't have duplicates at
all. But if someone creates an index and doesn't make it unique, odds
are very high that it has some duplicates. Not sure how many we
typically expect to see,
On 6 January 2012 21:14, Tom Lane t...@sss.pgh.pa.us wrote:
When there are lots of duplicates of a particular indexed value, the
existing code will cause an indexscan to search them in physical order,
whereas if we remove the existing logic it'll be random --- in
particular, accesses to the
On Thu, Jan 5, 2012 at 5:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
There is no compiler anywhere that implements always inline, unless
you are talking about a macro. inline is a hint and nothing more,
and if you think you can force it you are mistaken. So this controversy
is easily resolved:
On 5 January 2012 20:23, Robert Haas robertmh...@gmail.com wrote:
I don't have a problem with the idea of a pg_always_inline, but I'm
wondering what sort of fallback mechanism you propose. It seems to me
that if the performance improvement is conditioned on inlining being done
and we're not
On Fri, Jan 6, 2012 at 12:10 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
As you know, all queries tested have lots and lots of duplicates (a
~1.5GB table that contains the same number of distinct elements as a
~10MB table once did), and removing the duplicate protection for
btrees, on top
On 6 January 2012 17:29, Robert Haas robertmh...@gmail.com wrote:
On Fri, Jan 6, 2012 at 12:10 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
As you know, all queries tested have lots and lots of duplicates (a
~1.5GB table that contains the same number of distinct elements as a
~10MB table
Peter Geoghegan pe...@2ndquadrant.com writes:
I didn't bother isolating that, because it doesn't really make sense
to (not having it is probably only of particular value when doing what
I'm doing anyway, but who knows). Go ahead and commit something to
remove that code (get it in both
On 6 January 2012 18:45, Tom Lane t...@sss.pgh.pa.us wrote:
Peter Geoghegan pe...@2ndquadrant.com writes:
I didn't bother isolating that, because it doesn't really make sense
to (not having it is probably only of particular value when doing what
I'm doing anyway, but who knows). Go ahead and
Peter Geoghegan pe...@2ndquadrant.com writes:
On 6 January 2012 18:45, Tom Lane t...@sss.pgh.pa.us wrote:
Actually, I'm going to object to reverting that commit, as I believe
that having equal-keyed index entries in physical table order may offer
some performance benefits at access time. If
On Fri, Jan 6, 2012 at 4:14 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Admittedly, I don't have any numbers quantifying just how useful that
might be, but on the other hand you've not presented any evidence
justifying removing the behavior either. If we believe your position
that indexes don't
On Thu, Dec 29, 2011 at 9:03 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
3. Resolve two anticipated controversies that are, respectively,
somewhat orthogonal and completely orthogonal to the binary bloat
controversy. The first (controversy A) is that I have added a new
piece of
Robert Haas robertmh...@gmail.com writes:
On Thu, Dec 29, 2011 at 9:03 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
The first (controversy A) is that I have added a new
piece of infrastructure, pg_always_inline, which, as the name
suggests, is a portable way of insisting that a function
On 5 January 2012 22:27, Tom Lane t...@sss.pgh.pa.us wrote:
There is no compiler anywhere that implements always inline, unless
you are talking about a macro. inline is a hint and nothing more,
and if you think you can force it you are mistaken. So this controversy
is easily resolved: we do
On Thu, Dec 29, 2011 at 8:03 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
* A spreadsheet that shows the results of re-running my earlier heap
tuple sorting benchmark with this new patch. The improvement in the
query that orders by 2 columns is all that is pertinent there, when
considering
On 30 December 2011 19:46, Merlin Moncure mmonc...@gmail.com wrote:
On Thu, Dec 29, 2011 at 8:03 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
* A spreadsheet that shows the results of re-running my earlier heap
tuple sorting benchmark with this new patch. The improvement in the
query
On Fri, Dec 30, 2011 at 2:30 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
On 30 December 2011 19:46, Merlin Moncure mmonc...@gmail.com wrote:
On Thu, Dec 29, 2011 at 8:03 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
* A spreadsheet that shows the results of re-running my earlier heap