On Thu, Jun 14, 2012 at 6:30 PM, Peter Geoghegan <pe...@2ndquadrant.com> wrote: >> Here we know that it doesn't matter, so the application of Knuth's first law >> of optimization is appropriate. > > I'm not advocating some Byzantine optimisation, or even something that > could reasonably be described as an optimisation at all here. I'm > questioning why you've unnecessarily complicated the code by having > the buffer size just big enough to fit the biggest value seen so far, > but arbitrarily aligned to a value that is completely irrelevant to > bttextfastcmp_locale(), rather than using simple geometric expansion, > which is more or less the standard way of managing the growth of a > dynamic array.
Well, so... palloc, and most malloc implementations, don't actually just double every time forever. They double up to some relatively small number like 1MB, and then they expand in 1MB chunks. std::vector may well behave as you're describing, but that's doing something completely different. We're not accumulating a string of unknown length into a buffer, with the risk of having to copy the whole buffer every time we reallocate. We know the exact size of the buffer we need for any given string, so the cost of a pfree/palloc is only the cost of releasing and allocating memory; there is no additional copying cost, as there would be for std::vector. Look at it another way. Right now, the worst case number of temporary buffer allocations is about 2N lg N, assuming we do about N lg N comparisons during a sort. If we simply implemented the most naive buffer reallocation algorithm possible, we might in the worst case reallocate each buffer up to N times. That is already better than what we do now by a factor of lg N. If we suppose N is about a million, that's an improvement of 20x *in the worst case* - typically, the improvement will be much larger, because all the strings will be of similar length or we won't hit them in exactly increasing order of length. The algorithm I've actually implemented bounds the worst case 1024 times more tightly - given N strings, we can't need to enlarge the buffer more than N/1024 times. So with N of around a million, this algorithm should eliminate *at least* 99.995% of the pallocs we currently do . How much better does it need to be? > You have to grow the array in some way. The basic approach I've > outlined has something to recommend it - why does it make sense to > align the size of the buffer to TEXTBUFLEN in particular though? There's nothing magic about TEXTBUFLEN as far as that goes. We could round to the nearest multiple of 8kB or whatever if that seemed better. But if we rounded to, say, the nearest 1MB, then someone sorting strings that are 2kB long would use 2MB of memory for these buffers. Considering that we ship with work_mem = 1MB, that could easily end up wasting almost twice work_mem. I don't see how you can view that as a trivial problem. It might be worth sucking that up if it seemed likely to dramatically improve performance, but it doesn't. > It's > quite easy to imagine what you've done here resulting in an excessive > number of allocations (and pfree()s), which *could* be expensive. Actually, I can't imagine that at all, per the above analysis. If I could, I might agree with you. :-) > There is a trade-off between space and time to be made here, but I > don't know why you think that the right choice is to use almost the > smallest possible amount of memory in all cases. Because I think that with the current implementation I can have my cake and eat it, too. I believe I've gotten essentially all the available speedup for essentially no memory wastage. If you can convince me that I've missed something and there is still a meaningful amount of palloc overhead left to be squeezed out, I'm all ears. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers