On 21 June 2012 11:40, Florian Pflug <f...@phlo.org> wrote: > At this point, my theory is that your choice of "random" strings prevents > strxfrm() from ever winning over strcoll(). The reason being that you pick > each letter uniformly distributed from a-z, resulting in a probability of > two string differing in the first of 1 - 1/26 =~ 96%. Thus, even for > extremely complex collation rules, strcoll() probably only needs to compare > a few characters to determine the order. Whereas strxfrm() has to compute > the whole sort key, no matter what.
Good point. > The question is thus, how good a model are your "random" strings for the > input of a typical sorting step in postgres? My guess is, a quite good one > actually, since people probably don't deal with a lot of very similar strings > very often. Which makes we wonder if using strxfrm() during sorting wouldn't > be a net loss, all things considered… Well, I think the answer to that has to be no. I posted a simple postgres text -> strxfrm blob bytea SQL-callable wrapper function a few days ago that clearly wins by quite a bit on some sample queries against the dellstore sample database, which has a representative sample set. Completely uniformly-distributed data actually isn't representative of the real world at all. Normal distributions abound. This C++ program was never intended to justify the general utility of using strxfrm() for sorting, which I don't believe is in question. I just wanted to get some idea about the performance characteristics of using strxfrm() to traverse an index. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers