On Tue, Sep 16, 2014 at 02:57:00PM -0700, Peter Geoghegan wrote:
> On Tue, Sep 16, 2014 at 2:07 PM, Peter Eisentraut <pete...@gmx.net> wrote:
> > Clearly, this is worth documenting, but I don't think we can completely
> > prevent the problem.  There has been talk of a built-in index integrity
> > checking tool.  That would be quite useful.
> 
> We could at least use the GNU facility for versioning collations where
> available, LC_IDENTIFICATION [1]. By not versioning collations, we are
> going against the express advice of the Unicode consortium (they also
> advise to do a strcmp() tie-breaker, something that I think we
> independently discovered in 2005, because of a bug report - this is
> what I like to call "the Hungarian issue". They know what our
> constraints are.). I recognize it's a tricky problem, because of our
> historic dependence on OS collations, but I think we should definitely
> do something. That said, I'm not volunteering for the task, because I
> don't have time. While I'm not sure of what the long term solution
> should be, it *is not* okay that we don't version collations. I think
> that even the best possible B-Tree check tool is not a solution.

Personally, I think we should just support ICU as an option. FreeBSD has
been maintaining an out-of-tree patch for 10 years now, so we know it
works.

The FreeBSD patch is not optimal, though: these days ICU supports UTF-8
directly, so many of the push-ups FreeBSD does are no longer necessary.
ICU is often faster than glibc, and its strxfrm-style sort keys are more
compact [1], which is relevant to the recent optimisation patch.
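
As a rough illustration of what "key size" means here, the sketch below
measures the length of the sort key that the platform's strxfrm produces
for a few strings. It only exercises libc through Python's locale module,
not ICU. The helper name key_lengths and the sample words are made up for
illustration, and the lengths are counted in Python string units rather
than bytes, so treat the numbers as indicative only.

```python
import locale


def key_lengths(words, locale_name="C"):
    """Map each word to the length of its strxfrm sort key under locale_name.

    key_lengths is a hypothetical helper, not part of any library.
    """
    # Calling setlocale with no second argument only queries the setting.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm returns a key whose plain comparison matches strcoll order.
        return {word: len(locale.strxfrm(word)) for word in words}
    finally:
        # Restore the collation setting that was active before the call.
        locale.setlocale(locale.LC_COLLATE, previous)
```

Passing "" as locale_name picks up the collation from the environment
(LANG/LC_*), which is where longer keys than the input typically show up.
The results depend on the platform and the installed locale data.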

Let's solve this problem once and for all.

[1] http://site.icu-project.org/charts/collation-icu4c48-glibc

-- 
Martijn van Oosterhout   <klep...@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer

