Here is where I think the timezone and PostGIS cases are fundamentally different:
I can pretty easily make sure that all my servers run in the same timezone.  
That's just good practice.  I'm also going to install the same version of 
PostGIS everywhere in a cluster.  I'll build PostGIS and its dependencies from 
the exact same source files, regardless of when I build the machine.

Timezone is a user level setting; PostGIS is a user level library used by a 

glibc, however, is a system level library, and text is a core data type.  
Changing glibc to a version that doesn't match the kernel can lead to system 
level instability, broken linkers, etc.  (I know because I tried.)  Here are 
some other subtle problems that fall out:

 * Upgrading glibc, the kernel, and the linker through the package manager in 
order to get security updates can cause the index corruption.
 * A basebackup that is taken in production and placed on a backup server might 
not be valid on that server, or your desktop machine, or on the spare you keep 
to do PITR when someone screws up.
 * Unless you keep _all_ of your clusters on the same OS, machines from your 
database spare pool probably won't be the right OS when you add them to the 
cluster because a member failed.

Keep in mind that by OS I mean CentOS versions here.  (We're running a mix of 
late 5.x and 6.x, because of our numerous issues with the 6.x kernel.)
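
To make the failure mode concrete, here is a toy sketch in Python (not 
PostgreSQL code).  The two key functions, collate_old and collate_new, are 
made-up stand-ins for the sort order before and after a glibc change; 
build_index and index_lookup are hypothetical helpers that mimic a sorted 
index and a binary search over it.  An index built under one ordering can 
silently stop finding rows when it is searched under the other:

```python
# Toy model only: two invented collations and a sorted list as the "index".

def collate_old(s):
    # Assumed "old" rules: plain code-point order (uppercase sorts first).
    return s

def collate_new(s):
    # Assumed "new" rules: case-insensitive order, code-point tie-breaker.
    return (s.lower(), s)

def build_index(values, key):
    # An index is just the values sorted under the collation in effect.
    return sorted(values, key=key)

def index_lookup(index, value, key):
    # Binary search that trusts the index to be ordered by `key`.
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if key(index[mid]) < key(value):
            lo = mid + 1
        else:
            hi = mid
    return lo < len(index) and index[lo] == value

words = ["apple", "Banana", "cherry", "Date", "elder"]

# Built on the primary under the old rules.
index = build_index(words, collate_old)

# Searched with the rules it was built with: every row is found.
print(all(index_lookup(index, w, collate_old) for w in words))

# Same bytes on disk, searched under the new rules: "apple" goes missing.
print(index_lookup(index, "apple", collate_new))

# Rebuilding under the new rules (a REINDEX, in effect) fixes it.
rebuilt = build_index(words, collate_new)
print(all(index_lookup(rebuilt, w, collate_new) for w in words))
```

Nothing in the data changed between the first and second lookup; only the 
comparison rules did, which is why a basebackup moved to a different OS 
version can look fine and still be wrong.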

The problem with LC_IDENTIFICATION is that every machine I have seen reports 
revision "1.0", date "2000-06-24".  It doesn't seem like the versioning is 
being actively maintained.
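
For anyone who wants to compare their own machines, this is roughly how I'd 
do it, as a rough Python sketch.  It assumes glibc's 
`locale -k LC_IDENTIFICATION` prints name="value" lines; the function names 
are mine, and SAMPLE is only an illustration built from the revision and date 
values mentioned above, not captured output:

```python
import subprocess

def parse_locale_keywords(text):
    # Turn lines like  revision="1.0"  into a {"revision": "1.0"} dict.
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # skip anything that is not name=value
        fields[name.strip()] = value.strip().strip('"')
    return fields

def collation_fingerprint(fields):
    # The only version-like fields LC_IDENTIFICATION offers.
    return (fields.get("revision"), fields.get("date"))

def read_lc_identification():
    # Runs the glibc `locale` utility; needs a glibc-based system.
    out = subprocess.run(
        ["locale", "-k", "LC_IDENTIFICATION"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_locale_keywords(out)

# Illustrative input only, using the values reported in this mail.
SAMPLE = 'revision="1.0"\ndate="2000-06-24"\n'

print(collation_fingerprint(parse_locale_keywords(SAMPLE)))
```

If two machines with different glibc versions both produce the same pair, the 
fingerprint is useless for detecting a collation change, which is the problem 
I'm describing.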

I'm with Martijn here, let's go with ICU, if only because it moves sorting to 
a user level library, instead of a system level one.  Martijn, do you have a 
link to the out of tree patch?  If not I'll find it.  I'd like to apply it to 
a branch and start playing with it.

- Matt K

On Sep 17, 2014, at 7:39 AM, Martijn van Oosterhout wrote:

> On Tue, Sep 16, 2014 at 02:57:00PM -0700, Peter Geoghegan wrote:
>> On Tue, Sep 16, 2014 at 2:07 PM, Peter Eisentraut wrote:
>>> Clearly, this is worth documenting, but I don't think we can completely
>>> prevent the problem.  There has been talk of a built-in index integrity
>>> checking tool.  That would be quite useful.
>> We could at least use the GNU facility for versioning collations where
>> available, LC_IDENTIFICATION [1]. By not versioning collations, we are
>> going against the express advice of the Unicode consortium (they also
>> advise to do a strcmp() tie-breaker, something that I think we
>> independently discovered in 2005, because of a bug report - this is
>> what I like to call "the Hungarian issue". They know what our
>> constraints are.). I recognize it's a tricky problem, because of our
>> historic dependence on OS collations, but I think we should definitely
>> do something. That said, I'm not volunteering for the task, because I
>> don't have time. While I'm not sure of what the long term solution
>> should be, it *is not* okay that we don't version collations. I think
>> that even the best possible B-Tree check tool is not a solution.
> Personally I think we should just support ICU as an option. FreeBSD has
> been maintaining an out of tree patch for 10 years now so we know it
> works.
> The FreeBSD patch is not optimal though, these days ICU supports UTF-8
> directly so many of the push-ups FreeBSD does are no longer necessary.
> It is often faster than glibc and the key sizes for strxfrm are more
> compact [1] which is relevant for the recent optimisation patch.
> Let's solve this problem once and for all.
> [1]
> -- 
> Martijn van Oosterhout   <>
>> He who writes carelessly confesses thereby at the very outset that he does
>> not attach much importance to his own thoughts.
>   -- Arthur Schopenhauer
