Re: [HACKERS] Collations and Replication; Next Steps

2014-09-30 Thread Bruce Momjian
On Wed, Sep 17, 2014 at 01:07:56PM +, Matthew Kelly wrote: * Unless you keep _all_ of your clusters on the same OS, machines from your database spare pool probably won't be the right OS when you add them to the cluster because a member failed. There has been discussion about having

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-18 Thread Martijn van Oosterhout
On Thu, Sep 18, 2014 at 01:35:10PM +0900, Tatsuo Ishii wrote: In my understanding PostgreSQL's manual MUST include the ICU license term (this is not a problem). What I am not so sure is, any software uses PostgreSQL also MUST include the ICU license or not. If yes, I think this is surely a

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-18 Thread Martijn van Oosterhout
On Wed, Sep 17, 2014 at 03:57:38PM +0100, Greg Stark wrote: Then there's the concern that ICU is a *huge* dependency. ICU is itself larger than the entire Postgres install. It's a big burden on users to have to install and configure a second collation library in addition to the system library

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-18 Thread Oleg Bartunov
On Thu, Sep 18, 2014 at 3:25 PM, Martijn van Oosterhout klep...@svana.org wrote: On Thu, Sep 18, 2014 at 01:35:10PM +0900, Tatsuo Ishii wrote: In my understanding PostgreSQL's manual MUST include the ICU license term (this is not a problem). What I am not so sure is, any software uses

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-18 Thread Heikki Linnakangas
On 09/18/2014 04:12 PM, Oleg Bartunov wrote: On Thu, Sep 18, 2014 at 3:25 PM, Martijn van Oosterhout klep...@svana.org wrote: On Thu, Sep 18, 2014 at 01:35:10PM +0900, Tatsuo Ishii wrote: In my understanding PostgreSQL's manual MUST include the ICU license term (this is not a problem). What

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-18 Thread Peter Geoghegan
On Thu, Sep 18, 2014 at 6:51 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: The same way it works with libxml, openssl, libreadline and all the other libraries you can build with. I like the comparison with libxml. If we were to adopt ICU, it would be as a core component that makes collation

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Martijn van Oosterhout
On Tue, Sep 16, 2014 at 02:57:00PM -0700, Peter Geoghegan wrote: On Tue, Sep 16, 2014 at 2:07 PM, Peter Eisentraut pete...@gmx.net wrote: Clearly, this is worth documenting, but I don't think we can completely prevent the problem. There has been talk of a built-in index integrity checking

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Matthew Kelly
Here is where I think the timezone and PostGIS cases are fundamentally different: I can pretty easily make sure that all my servers run in the same timezone. That's just good practice. I'm also going to install the same version of PostGIS everywhere in a cluster. I'll build PostGIS and its

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Robert Haas
On Wed, Sep 17, 2014 at 9:07 AM, Matthew Kelly mke...@tripadvisor.com wrote: Here is where I think the timezone and PostGIS cases are fundamentally different: I can pretty easily make sure that all my servers run in the same timezone. That's just good practice. I'm also going to install

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Matthew Kelly
Let me double check that assertion before we go too far with it. Most of the problems I've seen are across 5 and 6 boundaries. I thought I had a case where it was within a minor release but I can't find it right now. I'm going to dig. That being said the sort order changes whether you

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Greg Stark
On Tue, Sep 16, 2014 at 11:41 PM, Peter Geoghegan p...@heroku.com wrote: The timezone case you highlight here seems quite distinct from what Matthew is talking about, because in point of fact the on-disk representation is merely *interpreted* with reference to the timezone database. So, you

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Tatsuo Ishii
Why don't we have our collation data? It seems MySQL has already done this. http://dev.mysql.com/doc/refman/5.0/en/charset-collation-implementations.html I don't think we cannot achieve that because even MySQL accomplishes:-) Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English:

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Greg Stark
On Wed, Sep 17, 2014 at 3:47 PM, Tatsuo Ishii is...@postgresql.org wrote: I don't think we cannot achieve that because even MySQL accomplishes:-) We've always considered it an advantage that we're consistent with the collations in the rest of the system. Generally speaking the fact that Postgres

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Peter Geoghegan
On Wed, Sep 17, 2014 at 6:17 AM, Robert Haas robertmh...@gmail.com wrote: What I find astonishing is that whoever maintains glibc (or the Red Hat packaging for it) thinks it's OK to change the collation order in a minor release. I'd understand changing it between, say, RHEL 6 and RHEL 7. But

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Peter Eisentraut
On 9/17/14 10:47 AM, Tatsuo Ishii wrote: Why don't we have our collation data? It seems MySQL has already done this. Where would you get the source data from? How would you maintain it? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Peter Eisentraut
On 9/16/14 5:57 PM, Peter Geoghegan wrote: On Tue, Sep 16, 2014 at 2:07 PM, Peter Eisentraut pete...@gmx.net wrote: Clearly, this is worth documenting, but I don't think we can completely prevent the problem. There has been talk of a built-in index integrity checking tool. That would be

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Martijn van Oosterhout
On Wed, Sep 17, 2014 at 01:07:56PM +, Matthew Kelly wrote: I'm with Martijn here, let's go ICU, if only because it moves sorting to a user level library, instead of a system level. Martijn do you have a link to the out of tree patch? If not I'll find it. I'd like to apply it to a branch

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Peter Eisentraut
On 9/17/14 9:07 AM, Matthew Kelly wrote: Here is where I think the timezone and PostGIS cases are fundamentally different: I can pretty easily make sure that all my servers run in the same timezone. That's just good practice. I'm also going to install the same version of PostGIS

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Peter Geoghegan
On Wed, Sep 17, 2014 at 11:05 AM, Peter Eisentraut pete...@gmx.net wrote: We could at least use the GNU facility for versioning collations where available, LC_IDENTIFICATION [1]. It looks like the revisions or dates reported by LC_IDENTIFICATION aren't ever updated for most locales. That's

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Peter Eisentraut
On 9/17/14 10:46 AM, Greg Stark wrote: You could have a problem if you have an expression index on (timestamp AT TIME ZONE '...'). I may have the expression slightly wrong but I believe it is possible to write an immutable expression that depends on the tzdata data as long as it doesn't depend

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Peter Eisentraut
On 9/17/14 2:07 PM, Peter Geoghegan wrote: On Wed, Sep 17, 2014 at 11:05 AM, Peter Eisentraut pete...@gmx.net wrote: We could at least use the GNU facility for versioning collations where available, LC_IDENTIFICATION [1]. It looks like the revisions or dates reported by LC_IDENTIFICATION

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Peter Geoghegan
On Wed, Sep 17, 2014 at 11:08 AM, Peter Eisentraut pete...@gmx.net wrote: I also wrote PostGIS dependent libraries, not PostGIS itself. If you are comparing RHEL 5 and 6, as you wrote elsewhere, then some of those will most likely be different. (Heck, glibc could be different. Is glibc

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Peter Geoghegan
On Wed, Sep 17, 2014 at 7:46 AM, Greg Stark st...@mit.edu wrote: You could have a problem if you have an expression index on (timestamp AT TIME ZONE '...'). I may have the expression slightly wrong but I believe it is possible to write an immutable expression that depends on the tzdata data as

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Tatsuo Ishii
On 9/17/14 10:47 AM, Tatsuo Ishii wrote: Why don't we have our collation data? It seems MySQL has already done this. Where would you get the source data from? How would you maintain it? Don't know. However seeing that MySQL manages it, it should be possible for us. Best regards, --

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Tatsuo Ishii
On Wed, Sep 17, 2014 at 3:47 PM, Tatsuo Ishii is...@postgresql.org wrote: I don't think we cannot achieve that because even MySQL accomplishes:-) We've always considered it an advantage that we're consistent with the collations in the rest of the system. Generally speaking the fact that

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Robert Haas
On Wed, Sep 17, 2014 at 10:06 AM, Matthew Kelly mke...@tripadvisor.com wrote: Let me double check that assertion before we go too far with it. Most of the problems I've seen are across 5 and 6 boundaries. I thought I had a case where it was within a minor release but I can't find it right now.

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Peter Geoghegan
On Wed, Sep 17, 2014 at 5:16 PM, Robert Haas robertmh...@gmail.com wrote: Of course, there's also the question of whether ICU would have similar issues. You're assuming that they *don't* whack the collation order around in minor releases, or at least that they do so to some lesser degree than

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Oleg Bartunov
We have used ICU with postgres for many years in our mchar extension, which provides a case-insensitive text data type for a popular Russian financial system. I don't know whether we could ask ICU to give us a special BSD-compatible license?

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Peter Geoghegan
On Wed, Sep 17, 2014 at 9:06 PM, Oleg Bartunov obartu...@gmail.com wrote: We use ICU with postgres for many years in our mchar extension, which provides case-insensitive text data type for popular russian financial system. I don't know if we may ask ICU to give us special BSD-compatible

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Craig Ringer
On 09/17/2014 09:17 PM, Robert Haas wrote: What I find astonishing is that whoever maintains glibc (or the Red Hat packaging for it) thinks it's OK to change the collation order in a minor release. I'd understand changing it between, say, RHEL 6 and RHEL 7. But the idea that minor release,

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Oleg Bartunov
On Thu, Sep 18, 2014 at 1:09 PM, Peter Geoghegan p...@heroku.com wrote: On Wed, Sep 17, 2014 at 9:06 PM, Oleg Bartunov obartu...@gmail.com wrote: We use ICU with postgres for many years in our mchar extension, which provides case-insensitive text data type for popular russian financial

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Tatsuo Ishii
On Wed, Sep 17, 2014 at 9:06 PM, Oleg Bartunov obartu...@gmail.com wrote: We use ICU with postgres for many years in our mchar extension, which provides case-insensitive text data type for popular russian financial system. I don't know if we may ask ICU to give us special BSD-compatible

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-17 Thread Peter Geoghegan
On Wed, Sep 17, 2014 at 9:35 PM, Tatsuo Ishii is...@postgresql.org wrote: In my understanding PostgreSQL's manual MUST include the ICU license term (this is not a problem). What I am not so sure is, any software uses PostgreSQL also MUST include the ICU license or not. If yes, I think this is

[HACKERS] Collations and Replication; Next Steps

2014-09-16 Thread Matthew Kelly
Hello, Last month, I brought up the following issue to the general mailing list about how running streaming replication between machines running different versions of glibc can cause corrupt indexes. http://www.postgresql.org/message-id/ba6132ed-1f6b-4a0b-ac22-81278f5ab...@tripadvisor.com In
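
Editorial illustration (not part of the original message): the failure mode described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not PostgreSQL or glibc code. The two collation functions, `collate_old` and `collate_new`, are hypothetical stand-ins for two library versions that disagree on sort order, and `index_lookup` is a hypothetical helper that mimics a B-tree descent with a binary search.

```python
import bisect

def collate_old(s):
    # Hypothetical "old" rule: plain code-point order, punctuation is significant.
    return s

def collate_new(s):
    # Hypothetical "new" rule: spaces and hyphens are ignored at the first
    # level; the raw string is used only as a tie-breaker.
    return (s.replace("-", "").replace(" ", ""), s)

def index_lookup(sorted_keys, value, collate):
    """Binary search over stored keys, comparing with the given collation."""
    keys = [collate(k) for k in sorted_keys]
    i = bisect.bisect_left(keys, collate(value))
    return i < len(sorted_keys) and sorted_keys[i] == value

rows = ["a-c", "ab", "a b", "abc", "b"]

# The "primary" builds the index under the old rule; the physical order is
# what gets shipped to the standby byte for byte.
index_on_primary = sorted(rows, key=collate_old)

# Same machine, same rule: every row is found.
found_old = [index_lookup(index_on_primary, r, collate_old) for r in rows]

# "Standby" with the new rule: the stored order no longer matches the
# comparator, so the search can stop at the wrong slot and miss a row.
found_new = [index_lookup(index_on_primary, r, collate_new) for r in rows]

print(found_old)
print(found_new)
```

In this sketch the row "ab" is present in the stored data but is not found when the search compares with `collate_new`, while "abc" is still found. That partial, data-dependent breakage is why this kind of mismatch is easy to miss until a query silently returns too few rows.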

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-16 Thread Peter Eisentraut
On 9/16/14 12:06 PM, Matthew Kelly wrote: The second and far more challenging problem is how do we fix this issue? As of our last discussion, Peter Geoghegan revived the proposal of using ICU as an alternative.

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-16 Thread Peter Geoghegan
On Tue, Sep 16, 2014 at 2:07 PM, Peter Eisentraut pete...@gmx.net wrote: Clearly, this is worth documenting, but I don't think we can completely prevent the problem. There has been talk of a built-in index integrity checking tool. That would be quite useful. We could at least use the GNU

Re: [HACKERS] Collations and Replication; Next Steps

2014-09-16 Thread Peter Geoghegan
On Tue, Sep 16, 2014 at 2:07 PM, Peter Eisentraut pete...@gmx.net wrote: It seems to me that this is a more general problem that can affect any data type that relies on anything external. For example, you could probably create a case where indexes are corrupted if you have two different time