subject:"\[HACKERS\] Open issues for collations"

Re: [HACKERS] Open issues for collations

2011-04-18 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 [ assorted comments on original issues ]

I believe that all the collation issues I complained about on 26 March
are now resolved, except for the question of getting some more test
cases, and the question of adding a cares-about-collation flag to
pg_proc.  I've added the latter to the TODO list since everybody agreed
it was too late to consider it for 9.1.

However, I've come across a new issue that maybe requires discussion:
what collation should be associated with a multi-row VALUES in FROM?
For instance, in

SELECT ... FROM
  (VALUES (1, 'foo'), (2, 'bar' COLLATE C)) v(a,b),
  ...

what collation should be imputed to references to v.b?

The way the code currently works is that the first row of the VALUES
list is inspected to determine what collation to report --- so in this
example, you'd get default collation, ignoring the COLLATE clause in the
second row.  There are several problems with this:

1. I think it violates the SQL spec.  SQL:2008 7.3 table value
constructor says that the column types of a VALUES construct are
resolved per the rules of section 9.3, and 9.3 is the one with the
standard verbiage about resolving a common collation, so it's hard
to see how use the first row and ignore the rest satisfies the spec.

2. It doesn't seem to satisfy the POLA --- in the above example,
ignoring the explicit COLLATE clause is rather surprising.  We could
document that that's what it does, perhaps, but if you can attach
COLLATE to any input of an operator or function and get the same
results, it's hard to explain why the same isn't true of a VALUES
column.  Especially when VALUES resolves data types in a much more
symmetrical fashion, eg this works:

  (VALUES (1, '1'), (2, '2'::real)) v(a,b)

3. It's not hard to imagine people thinking they can get row-by-row-
varying collation behavior from something like

SELECT ... FROM
  (VALUES (1, 'foo' COLLATE en_US), (2, 'bar' COLLATE C)) v(a,b),
  ...

So not throwing an error, but rather silently doing something other than
what this SQL seems to say, seems pretty unfriendly to me.

The reason I'm expending so much verbiage on this is that fixing it
seems to require an addition to struct RangeTblEntry, ie, a catversion
bump and forced initdb, so that we can store a list of the resolved
column collations for an RTE_VALUES RTE.  I don't really think there's
much choice though.

Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Open issues for collations

2011-04-18 Thread Peter Eisentraut

On Mon, 2011-04-18 at 10:41 -0400, Tom Lane wrote:
 However, I've come across a new issue that maybe requires discussion:
 what collation should be associated with a multi-row VALUES in FROM?
 For instance, in
 
 SELECT ... FROM
   (VALUES (1, 'foo'), (2, 'bar' COLLATE C)) v(a,b),
   ...
 
 what collation should be imputed to references to v.b?
 
 The way the code currently works is that the first row of the VALUES
 list is inspected to determine what collation to report

Hmm, I do see this on my list of things to address, so yes, it should be
fixed.  I likely got stuck because it's pretty complicated.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Open issues for collations

2011-04-09 Thread Peter Eisentraut

On mån, 2011-03-28 at 20:02 -0400, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
  On lör, 2011-03-26 at 00:36 -0400, Tom Lane wrote:
  * It'd sure be nice if we had some nontrivial test cases that work
  in encodings besides UTF8.  I'm still bothered that the committed
  patch failed to cover single-byte-encoding cases in 
 upper/lower/initcap.
 
  Well, how do we want to maintain these test cases without doing too
  much duplication?  It would be easy to run a small sed script over
  collate.linux.utf8.sql to create, say, a latin1 version out of it.
 
 I tried.  The upper/lower test cases require Turkish characters that
 aren't in Latin1.  I'm not sure if we can readily produce test cases
 that cover both sorting changes and case-folding changes in just one
 single-byte encoding --- anybody?
 
 One thing I noticed but didn't push to committing is that the test
 case has a largely-unnecessary assumption about how the local system's
 locale names spell utf8.  We could eliminate that by having it use
 the trimmed locale names created by initdb.

I see you went for the latter option.  That works pretty well already.
I've also been playing around with separating out the Turkish tests
into a separate file.  That would then probably get the remaining
latin1 file passing, if we also dropped the encoding mention from this
error message:

ERROR:  collation foo for encoding UTF8 does not exist

I had thought hard about this in the past and didn't want to do it, but
since we are now making every effort to effectively hide collations with
the wrong encoding, this would possibly be acceptable.

I'm also seeing promising signs that we might get this test (minus
Turkish, perhaps) passing on Windows.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Open issues for collations

2011-04-09 Thread Tom Lane

Peter Eisentraut pete...@gmx.net writes:
 On mÃ¥n, 2011-03-28 at 20:02 -0400, Tom Lane wrote:
 One thing I noticed but didn't push to committing is that the test
 case has a largely-unnecessary assumption about how the local system's
 locale names spell utf8.  We could eliminate that by having it use
 the trimmed locale names created by initdb.

 I see you went for the latter option.  That works pretty well already.
 I've also been playing around with separating out the Turkish tests
 into a separate file.  That would then probably get the remaining
 latin1 file passing, if we also dropped the encoding mention from this
 error message:

 ERROR:  collation foo for encoding UTF8 does not exist

 I had thought hard about this in the past and didn't want to do it, but
 since we are now making every effort to effectively hide collations with
 the wrong encoding, this would possibly be acceptable.

Not sure.  If we had the test refactored to the point where that was the
only diff you got with a different server encoding, maybe it'd be worth
changing; but right now we're still a long way from there.  I was seeing
this change as mainly targeted towards making the test useful on more
platforms, and since that encoding name is ours and not
platform-specific, it doesn't create any portability issues to show it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Open issues for collations

2011-04-08 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 Reading through this thread...
 On Sat, Mar 26, 2011 at 12:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 ** Selecting a field from a record-returning function's output.
 Currently, we'll use the field's declared collation; except that
 if the field has default collation, we'll replace that with the common
 collation of the function's inputs, if any.  Is either part of that
 sane?  Do we need to make this work for functions invoked with other
 syntax than a plain function call, eg operator or cast syntax?

 There were a couple of different ideas about which way we ought to go
 with this, but I'm happy to defer to what Tom and Martijn hashed out:

 MO That seems all a bit weird. I spent some time reading through the SQL
 MO spec to see if I could came up with a few ideas about what they thought
 MO relevent. I think the gist of it is that I think the result row should
 MO have for each column its declared collation in all cases.

 TL That interpretation would be fine with me.  It would let us get rid of
 TL the special-case code at lines 307-324 of parse_collate.c, which I put
 TL in only because there are cases in the collate.linux.utf8.sql regression
 TL test that fail without it.  But I'm perfectly happy to conclude that
 TL those test cases are mistaken.

 I'm not sure whether that's been done, though, or whether we're even
 going to do it.

I looked a bit more closely at this, and I think I finally get the point
of what those regression test cases involving the dup() function are
about.  Consider a trivial polymorphic function such as

create function dummy(anyelement) returns anyelement as
'select $1' language sql;

When applied to a textual argument, this is a function taking and
returning string, and so collation does (and should, I think) propagate
through it.  Thus in

select dummy(x) from tbl order by 1;

you will get ordering by the declared collation of tbl.x, whatever that
is.  But now consider

create function dup(in anyelement, a out anyelement, b out anyelement)
as 'select $1, $2' language sql;

select dup(x).a from tbl order by 1;

It's not unreasonable to think that this should also order by tbl.x's
collation --- if collation propagates through dummy(), why not through
dup()?  And in fact those regression test cases are expecting that it
does propagate in such a case.

Now the discussion that we had earlier in this thread was implicitly
assuming that we were talking about FieldSelect from a known composite
type.  If dup() were declared to return a named composite type, then
using the collation that is declared for that type's a column seems
reasonable.  But when you're dealing with an anonymous record type,
which is what dup() actually returns here, there is no such declaration;
and what's more, the fact that there's a record type at all is just an
implementation detail to most users.

If we take out the kluge in parse_collate.c's handling of FieldSelects,
then what we will get in this example is ordering by the database
default collation.  We can justify that on a narrow language-lawyering
basis by saying dup() returns a composite type, which has no collation,
therefore collation does not propagate through from its arguments to
any column you might select from its result.  But it's going to feel
a bit surprising to anyone who thinks of this in terms of OUT arguments
rather than an anonymous composite type.

I'm inclined to think that we should take out the kluge and rest on the
language-lawyering viewpoint, because otherwise there are going to be
umpteen other corner cases where people are going to expect collation to
propagate and it's not going to work without very major kluging.

Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Open issues for collations

2011-04-08 Thread Peter Eisentraut

On lör, 2011-03-26 at 00:36 -0400, Tom Lane wrote:
 * Remove initdb's warning about useless locales?  Seems like pointless
 noise, or at least something that can be relegated to debug mode.

Quick question on this:  Should we at least warn if zero suitable
locales were found or some other problem scenario?  Or should we just
wait around and see what actual problems, if any, will be reported?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Open issues for collations

2011-04-08 Thread Tom Lane

Peter Eisentraut pete...@gmx.net writes:
 On lÃ¶r, 2011-03-26 at 00:36 -0400, Tom Lane wrote:
 * Remove initdb's warning about useless locales?  Seems like pointless
 noise, or at least something that can be relegated to debug mode.

 Quick question on this:  Should we at least warn if zero suitable
 locales were found or some other problem scenario?  Or should we just
 wait around and see what actual problems, if any, will be reported?

Well, my opinion is that normal users never see the output of initdb
at all, so I don't think there's that much value in complaining there.
But I don't have a serious objection to complaining if we couldn't find
any usable locales at all, either.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Open issues for collations

2011-04-08 Thread Alvaro Herrera

Excerpts from Tom Lane's message of vie abr 08 15:27:15 -0300 2011:
 Peter Eisentraut pete...@gmx.net writes:
  On lör, 2011-03-26 at 00:36 -0400, Tom Lane wrote:
  * Remove initdb's warning about useless locales?  Seems like pointless
  noise, or at least something that can be relegated to debug mode.
 
  Quick question on this:  Should we at least warn if zero suitable
  locales were found or some other problem scenario?  Or should we just
  wait around and see what actual problems, if any, will be reported?
 
 Well, my opinion is that normal users never see the output of initdb
 at all, so I don't think there's that much value in complaining there.

Those users are not going to have those problems anyway.  The
problematic users are going to be those running on unusual platforms.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Open issues for collations

2011-04-08 Thread Peter Eisentraut

On fre, 2011-04-08 at 16:14 -0300, Alvaro Herrera wrote:
 Excerpts from Tom Lane's message of vie abr 08 15:27:15 -0300 2011:
  Peter Eisentraut pete...@gmx.net writes:
   On lör, 2011-03-26 at 00:36 -0400, Tom Lane wrote:
   * Remove initdb's warning about useless locales?  Seems like pointless
   noise, or at least something that can be relegated to debug mode.
  
   Quick question on this:  Should we at least warn if zero suitable
   locales were found or some other problem scenario?  Or should we just
   wait around and see what actual problems, if any, will be reported?
  
  Well, my opinion is that normal users never see the output of initdb
  at all, so I don't think there's that much value in complaining there.
 
 Those users are not going to have those problems anyway.  The
 problematic users are going to be those running on unusual platforms.

So what is your opinion on the original question?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Open issues for collations

2011-04-08 Thread Alvaro Herrera

Excerpts from Peter Eisentraut's message of sáb abr 09 01:32:28 -0300 2011:
 On fre, 2011-04-08 at 16:14 -0300, Alvaro Herrera wrote:
  Excerpts from Tom Lane's message of vie abr 08 15:27:15 -0300 2011:
   Peter Eisentraut pete...@gmx.net writes:
On lör, 2011-03-26 at 00:36 -0400, Tom Lane wrote:
* Remove initdb's warning about useless locales?  Seems like pointless
noise, or at least something that can be relegated to debug mode.
   
Quick question on this:  Should we at least warn if zero suitable
locales were found or some other problem scenario?  Or should we just
wait around and see what actual problems, if any, will be reported?
   
   Well, my opinion is that normal users never see the output of initdb
   at all, so I don't think there's that much value in complaining there.
  
  Those users are not going to have those problems anyway.  The
  problematic users are going to be those running on unusual platforms.
 
 So what is your opinion on the original question?

I feel throwing a warning is appropriate here, because people for which
problems are most likely are going to see them.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Open issues for collations

2011-04-05 Thread Robert Haas

Reading through this thread...

On Sat, Mar 26, 2011 at 12:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 ** Selecting a field from a record-returning function's output.
 Currently, we'll use the field's declared collation; except that
 if the field has default collation, we'll replace that with the common
 collation of the function's inputs, if any.  Is either part of that
 sane?  Do we need to make this work for functions invoked with other
 syntax than a plain function call, eg operator or cast syntax?

There were a couple of different ideas about which way we ought to go
with this, but I'm happy to defer to what Tom and Martijn hashed out:

MO That seems all a bit weird. I spent some time reading through the SQL
MO spec to see if I could came up with a few ideas about what they thought
MO relevent. I think the gist of it is that I think the result row should
MO have for each column its declared collation in all cases.

TL That interpretation would be fine with me.  It would let us get rid of
TL the special-case code at lines 307-324 of parse_collate.c, which I put
TL in only because there are cases in the collate.linux.utf8.sql regression
TL test that fail without it.  But I'm perfectly happy to conclude that
TL those test cases are mistaken.

I'm not sure whether that's been done, though, or whether we're even
going to do it.

 ** What to do with domains whose declaration includes a COLLATE clause?
 Currently, we'll impute that collation to the result of a cast to the
 domain type --- even if the cast's input expression includes an
 explicit COLLATE clause.  It's not clear that that's per spec.  If it
 is correct, should we behave similarly for functions that are declared
 to return a domain type?  Should it matter if the cast-to-domain is
 explicit or implicit?  Perhaps it'd be best if domain collations only
 mattered for columns declared with that domain type.  Then we'd have
 a general rule that collations only come into play in an expression
 as a result of (a) the declared type of a column reference or (b)
 an explicit COLLATE clause.

I think we had agreement than a cast to a domain type with a collation
should stomp on any existing collation on the contained expression.

 * In plpgsql, is it OK for declared local variables to inherit the
 function's input collation?  Should we provide a COLLATE option in
 variable declarations to let that be overridden?

I think everyone who responded said yes to both questions.

 * RI triggers should insert COLLATE clauses in generated queries to
 satisfy SQL2008 9.13 SR 4a, which says that RI comparisons use the
 referenced column's collation.  Right now you may get either table's
 collation depending on which query type is involved.  I think an obvious
 failure may not be possible so long as equality means the same thing in
 all collations, but it's definitely possible that the planner might
 decide it can't use the referenced column's unique index, which would
 suck for performance.  (Note: this rule seems to prove that the
 committee assumes equality can mean different things in different
 collations, else they'd not have felt the need to specify.)

I tested this and, indeed, if the collations don't match, the index
can't be used.

CREATE TABLE me (x character varying COLLATE en_US);
CREATE TABLE me2 (x character varying COLLATE es_ES);
CREATE TABLE you (x character varying COLLATE es_ES NOT NULL,
PRIMARY KEY (x));
ALTER TABLE me ADD FOREIGN KEY (x) REFERENCES you(x);
ALTER TABLE me2 ADD FOREIGN KEY (x) REFERENCES you(x);
SET enable_seqscan=false;
SET enable_hashjoin=false;

With that setup, this still does a seqscan-and-sort:

EXPLAIN select * from me, you where me.x = you.x;

But this uses the index:

EXPLAIN select * from me2, you where me2.x = you.x;

I found another problem, too:

rhaas=# insert into you values ('1');
INSERT 0 1
rhaas=# insert into me values ('1');
INSERT 0 1
rhaas=# alter table me alter column x set data type varchar collate en_GB;
ERROR:  could not determine which collation to use for string comparison
HINT:  Use the COLLATE clause to set the collation explicitly.
CONTEXT:  SQL statement SELECT fk.x FROM ONLY public.me fk LEFT
OUTER JOIN ONLY public.you pk ON ( pk.x::pg_catalog.text
OPERATOR(pg_catalog.=) fk.x::pg_catalog.text) WHERE pk.x IS NULL
AND (fk.x IS NOT NULL)

 * It'd sure be nice if we had some nontrivial test cases that work in
 encodings besides UTF8.  I'm still bothered that the committed patch
 failed to cover single-byte-encoding cases in upper/lower/initcap.

Seems like no one knows how to do this.

 * Remove initdb's warning about useless locales?  Seems like pointless
 noise, or at least something that can be relegated to debug mode.

Everyone was in favor of this.

 * Is it worth adding a cares-about-collation flag to pg_proc?  Probably
 too late to be worrying about such refinements for 9.1.

Probably too late for this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--

Re: [HACKERS] Open issues for collations

2011-03-29 Thread Martijn van Oosterhout

On Mon, Mar 28, 2011 at 08:02:40PM -0400, Tom Lane wrote:
 One thing I noticed but didn't push to committing is that the test case
 has a largely-unnecessary assumption about how the local system's locale
 names spell utf8.  We could eliminate that by having it use the
 trimmed locale names created by initdb.  I would've made more of a push
 for that if it resulted in a test case that passed on OS X, but it turns
 out that once you get past the locale name spelling, you find out that
 Macs still can't sort UTF8 strings correctly :-(

Yeah, and I don't think it's likely they're ever going to fix it
either. :( On their own website they explain how to do locale based
sorting, and they provide their own interfaces for that which,
unsurprisingly, uses UTF-16 and ICU underneath.

http://developer.apple.com/library/mac/#documentation/CoreFoundation/Conceptual/CFLocales/Articles/CFLocaleConcepts.html

I think the rule that we sort the same as the command-line sort program
is still true however, so I don't think it's surprising as such. Just
unfortunate.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle


signature.asc
Description: Digital signature

Re: [HACKERS] Open issues for collations

2011-03-28 Thread Peter Eisentraut

On lör, 2011-03-26 at 00:36 -0400, Tom Lane wrote:
 * RI triggers should insert COLLATE clauses in generated queries to
 satisfy SQL2008 9.13 SR 4a, which says that RI comparisons use the
 referenced column's collation.  Right now you may get either table's
 collation depending on which query type is involved.  I think an obvious
 failure may not be possible so long as equality means the same thing in
 all collations, but it's definitely possible that the planner might
 decide it can't use the referenced column's unique index, which would
 suck for performance.  (Note: this rule seems to prove that the
 committee assumes equality can mean different things in different
 collations, else they'd not have felt the need to specify.)

Right, but we don't support that yet, so I don't consider that that has
to be addressed right now.  Rather it could go on a list of things to
fix when supporting collations which redefine equality.  The index
mismatch issue is also not urgent.  It's not a regression and it's more
like don't-do-that-then or do-it-differently-then.

 * It'd sure be nice if we had some nontrivial test cases that work in
 encodings besides UTF8.  I'm still bothered that the committed patch
 failed to cover single-byte-encoding cases in upper/lower/initcap.

Well, how do we want to maintain these test cases without doing too much
duplication?  It would be easy to run a small sed script over
collate.linux.utf8.sql to create, say, a latin1 version out of it.
Since it's Linux only, it might be valid to do it that way without
having to make it super-portable in C.

 * Remove initdb's warning about useless locales?  Seems like pointless
 noise, or at least something that can be relegated to debug mode.

Fine with me.

 * Is it worth adding a cares-about-collation flag to pg_proc?  Probably
 too late to be worrying about such refinements for 9.1.

Probably.  It would open up a bunch of new cases to change and
fine-tune.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Open issues for collations

2011-03-28 Thread Tom Lane

Peter Eisentraut pete...@gmx.net writes:
 On lÃ¶r, 2011-03-26 at 00:36 -0400, Tom Lane wrote:
 * It'd sure be nice if we had some nontrivial test cases that work in
 encodings besides UTF8.  I'm still bothered that the committed patch
 failed to cover single-byte-encoding cases in upper/lower/initcap.

 Well, how do we want to maintain these test cases without doing too much
 duplication?  It would be easy to run a small sed script over
 collate.linux.utf8.sql to create, say, a latin1 version out of it.

I tried.  The upper/lower test cases require Turkish characters that
aren't in Latin1.  I'm not sure if we can readily produce test cases
that cover both sorting changes and case-folding changes in just one
single-byte encoding --- anybody?

One thing I noticed but didn't push to committing is that the test case
has a largely-unnecessary assumption about how the local system's locale
names spell utf8.  We could eliminate that by having it use the
trimmed locale names created by initdb.  I would've made more of a push
for that if it resulted in a test case that passed on OS X, but it turns
out that once you get past the locale name spelling, you find out that
Macs still can't sort UTF8 strings correctly :-(

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Open issues for collations

2011-03-28 Thread Alvaro Herrera

Excerpts from Tom Lane's message of lun mar 28 21:02:40 -0300 2011:
 Peter Eisentraut pete...@gmx.net writes:
  On lör, 2011-03-26 at 00:36 -0400, Tom Lane wrote:
  * It'd sure be nice if we had some nontrivial test cases that work in
  encodings besides UTF8.  I'm still bothered that the committed patch
  failed to cover single-byte-encoding cases in upper/lower/initcap.
 
  Well, how do we want to maintain these test cases without doing too much
  duplication?  It would be easy to run a small sed script over
  collate.linux.utf8.sql to create, say, a latin1 version out of it.
 
 I tried.  The upper/lower test cases require Turkish characters that
 aren't in Latin1.  I'm not sure if we can readily produce test cases
 that cover both sorting changes and case-folding changes in just one
 single-byte encoding --- anybody?

ISO-8859-9?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Open issues for collations

2011-03-28 Thread Euler Taveira de Oliveira


Em 28-03-2011 22:27, Alvaro Herrera escreveu:

Excerpts from Tom Lane's message of lun mar 28 21:02:40 -0300 2011:

I tried.  The upper/lower test cases require Turkish characters that
aren't in Latin1.  I'm not sure if we can readily produce test cases
that cover both sorting changes and case-folding changes in just one
single-byte encoding --- anybody?


ISO-8859-9?

I'm afraid we have to map lang to single byte character set. Not all languages 
prefer ISO-8859.



--
  Euler Taveira de Oliveira
  http://www.timbira.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Open issues for collations

2011-03-27 Thread Tom Lane

Martijn van Oosterhout klep...@svana.org writes:
 On Sat, Mar 26, 2011 at 12:36:43AM -0400, Tom Lane wrote:
 ** Selecting a field from a record-returning function's output.
 Currently, we'll use the field's declared collation; except that
 if the field has default collation, we'll replace that with the common
 collation of the function's inputs, if any.  Is either part of that
 sane?  Do we need to make this work for functions invoked with other
 syntax than a plain function call, eg operator or cast syntax?

 That seems all a bit weird. I spent some time reading through the SQL
 spec to see if I could came up with a few ideas about what they thought
 relevent. I think the gist of it is that I think the result row should
 have for each column its declared collation in all cases.

That interpretation would be fine with me.  It would let us get rid of
the special-case code at lines 307-324 of parse_collate.c, which I put
in only because there are cases in the collate.linux.utf8.sql regression
test that fail without it.  But I'm perfectly happy to conclude that
those test cases are mistaken.

 ** What to do with domains whose declaration includes a COLLATE clause?

 The SQL spec considers the collation to be part of the datatype, so if
 you're casting to a domain (or type) you get the collation associated
 with that domain (or type). As per the spec:

 The collation derivation of a declared type with a declared type
 collation that is explicitly or implicitly specified by a data type
 is implicit.

 So the result of a cast would be the collation of the specified
 type/domain with state implicit.

Hm.  That makes sense for explicit CAST syntax, but what about a
function returning a collatable type?  In particular, applying this
rule to the || operator would have us conclude that

x || y COLLATE foo

doesn't have an overall collation of foo, which seems clearly wrong.
But if you claim that it's all based on the result type of the function
then it's hard to avoid that outcome.

 Also, apparently the COLLATE clause as allowed anywhere where a
 datatype is permitted. So you can say:

 CAST( foo AS TEXT COLLATE en_US )

No, you can't; this case is specifically disallowed by SQL:2008 6.12
cast specification syntax rule 1b: data type shall not contain a
collate clause.  Which is a pretty weird thing for them to say if
you believe that collations are nothing more nor less than a data type
property.  It kinda leads me to think that that is not how the committee
sees collations ... although the spec is pretty unhelpful at suggesting
exactly how you should think of them instead.

 Incidently, a function returning a domain seems weird to me. What does
 it mean: (1) the function returns this type, Postgres assumes this is
 true, or (2) function returns something, Postgres does an implicit
 cast?

 In any case, I'd suggest it is treated as being included in the
 resolving of the return collation with the arguments so if the result
 is a domain and you apply the normal rules you get:

 (1) explicit states in the arguments will override it
 (2) if arguments are implicit state and conflict with domain, the
 result is no-collation, otherwise implicitly whatever the domain was
 (3) no arguments have collation, which means you get the domain
 default.

 Which all seems eminently reasonable.

It's reasonable by itself, but it's also rather different from what you
just argued the behavior should be for field selection.  I don't see
the rationale for treating those cases differently.  In fact, given that
Postgres has always treated f(x) and x.f as equivalent notations,
I think there's a pretty strong argument for wanting their effects on
collation choice to be equivalent too.

 * In plpgsql, is it OK for declared local variables to inherit the
 function's input collation?  Should we provide a COLLATE option in
 variable declarations to let that be overridden?  If Oracle understands
 COLLATE, probably we should look at what they do in PL/SQL.

 If COLLATE is allowed anywhere where the datatype is allowed, then the
 COLLATE clause should be permitted there. Otherwise they become the
 specified type with whatever the default is for that type. In
 expressions the coercible-default state will get overridden
 by the IMPLICIT state from the arguments as appropriate.

No, I don't care for that, because then it matters whether an expression
contains any direct references to the input arguments versus containing
just local variables.  In particular, assigning a parameter to a local
variable and then using the local variable instead of the parameter
could change the function's results.  That seems mighty surprising and
bug-prone to me.  So I think it's important that local variables default
to the function's input collation.  I have no problem with adding
explicit COLLATE to the declaration syntax for plpgsql variables,
though.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list

Re: [HACKERS] Open issues for collations

2011-03-27 Thread Martijn van Oosterhout

On Sun, Mar 27, 2011 at 03:14:37PM -0400, Tom Lane wrote:
  So the result of a cast would be the collation of the specified
  type/domain with state implicit.
 
 Hm.  That makes sense for explicit CAST syntax, but what about a
 function returning a collatable type?  In particular, applying this
 rule to the || operator would have us conclude that
 
   x || y COLLATE foo
 
 doesn't have an overall collation of foo, which seems clearly wrong.
 But if you claim that it's all based on the result type of the function
 then it's hard to avoid that outcome.

I don't see this. There's no explicit cast in the above expression. The
return type says something, but that gets back to the question of
whether the return type indicates a cast to that type or not. In any
case, the SQL specifically states what should happen in the above case.

The question started with what happens when the function is declared to
return a domain type. I'm really not sure what to do there. I'd suggest
similar to how the typmod is propegated but I'm afraid that the answer
there is we don't. What I suggested with combining it with the
collations of the input types is about the only concrete suggestion I
have.

  Also, apparently the COLLATE clause as allowed anywhere where a
  datatype is permitted. So you can say:
 
  CAST( foo AS TEXT COLLATE en_US )
 
 No, you can't; this case is specifically disallowed by SQL:2008 6.12

Maybe they just thought it confusing that COLLATE would mean different
things in different contexts and forbade it.

  (1) explicit states in the arguments will override it
  (2) if arguments are implicit state and conflict with domain, the
  result is no-collation, otherwise implicitly whatever the domain was
  (3) no arguments have collation, which means you get the domain
  default.
 
  Which all seems eminently reasonable.
 
 It's reasonable by itself, but it's also rather different from what you
 just argued the behavior should be for field selection.  I don't see
 the rationale for treating those cases differently.  In fact, given that
 Postgres has always treated f(x) and x.f as equivalent notations,
 I think there's a pretty strong argument for wanting their effects on
 collation choice to be equivalent too.

Well, that's a good argument. But I'd suggest that here x.f is not a
column reference, so there's no reason it should be treated like one.
The SQL standard says column references determine collations but for
various strings operations it's the inputs that count. This makes sense
if you consider the return type declaration to be merely an assertion
and not a cast.

snip about default collation on declared variables

 No, I don't care for that, because then it matters whether an expression
 contains any direct references to the input arguments versus containing
 just local variables.  In particular, assigning a parameter to a local
 variable and then using the local variable instead of the parameter
 could change the function's results.  That seems mighty surprising and
 bug-prone to me.  So I think it's important that local variables default
 to the function's input collation.  I have no problem with adding
 explicit COLLATE to the declaration syntax for plpgsql variables,
 though.

Ok, you've convinced me here. You effectively set the default collation
for the period of the function, which I think is perfectly defensible.
We might allow people in the future to override it (a la search_path)
but this is a good start.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle


signature.asc
Description: Digital signature

Re: [HACKERS] Open issues for collations

2011-03-26 Thread Simon Riggs

On Sat, Mar 26, 2011 at 4:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I think some discussion of which of the things on the open
 item lists need to be done before beta might be helpful, and we ought
 to add any items that are not there but are blockers.

 Here's a quick enumeration of some things I think need discussion about
 the collations patch:

 * Are we happy yet with the collation assignment behavior (see
 parse_collate.c)?  A couple of specific subtopics:

 ** Selecting a field from a record-returning function's output.
 Currently, we'll use the field's declared collation; except that
 if the field has default collation, we'll replace that with the common
 collation of the function's inputs, if any.  Is either part of that
 sane?  Do we need to make this work for functions invoked with other
 syntax than a plain function call, eg operator or cast syntax?

 ** What to do with domains whose declaration includes a COLLATE clause?
 Currently, we'll impute that collation to the result of a cast to the
 domain type --- even if the cast's input expression includes an
 explicit COLLATE clause.  It's not clear that that's per spec.  If it
 is correct, should we behave similarly for functions that are declared
 to return a domain type?  Should it matter if the cast-to-domain is
 explicit or implicit?  Perhaps it'd be best if domain collations only
 mattered for columns declared with that domain type.  Then we'd have
 a general rule that collations only come into play in an expression
 as a result of (a) the declared type of a column reference or (b)
 an explicit COLLATE clause.


 * In plpgsql, is it OK for declared local variables to inherit the
 function's input collation?  Should we provide a COLLATE option in
 variable declarations to let that be overridden?  If Oracle understands
 COLLATE, probably we should look at what they do in PL/SQL.

 * RI triggers should insert COLLATE clauses in generated queries to
 satisfy SQL2008 9.13 SR 4a, which says that RI comparisons use the
 referenced column's collation.  Right now you may get either table's
 collation depending on which query type is involved.  I think an obvious
 failure may not be possible so long as equality means the same thing in
 all collations, but it's definitely possible that the planner might
 decide it can't use the referenced column's unique index, which would
 suck for performance.  (Note: this rule seems to prove that the
 committee assumes equality can mean different things in different
 collations, else they'd not have felt the need to specify.)

 * It'd sure be nice if we had some nontrivial test cases that work in
 encodings besides UTF8.  I'm still bothered that the committed patch
 failed to cover single-byte-encoding cases in upper/lower/initcap.

 * Remove initdb's warning about useless locales?  Seems like pointless
 noise, or at least something that can be relegated to debug mode.

 * Is it worth adding a cares-about-collation flag to pg_proc?  Probably
 too late to be worrying about such refinements for 9.1.

 There are a bunch of other minor issues that I'm still working through,
 but these are the ones that seem to merit discussion.

That's a long list and I think it's clear that we won't resolve all of
those issues to everybody's satisfaction in a single release, let
alone in next week or so. We need a way forwards.

What I think we should do is add detailed documentation on how it
works now. There are many people that would love to help, but not
everybody can visualise exactly the points you are making above, I
would confess that I can't. Having docs that clearly explain a neat
new capability and the various possible gotcha/caveats will help
others come up with test cases and ideas.

It seems to me likely that in real usage many of those gotchas will
drop away because they represent unlikely or perverse use cases.

I don't see anything bad in releasing software that has unresolved
questions, as long as those items are clearly flagged up and we
specifically ask for feedback on them. That looks to me to be a
many-eyeballs approach to the problem.
http://en.wikipedia.org/wiki/Linus%27_Law

Tucking our shoelaces into our shoes doesn't mean the loose ends are
resolved fully.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Open issues for collations

2011-03-26 Thread Greg Stark

On Sat, Mar 26, 2011 at 4:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 ** Selecting a field from a record-returning function's output.
 Currently, we'll use the field's declared collation; except that
 if the field has default collation, we'll replace that with the common
 collation of the function's inputs, if any.  Is either part of that
 sane?  Do we need to make this work for functions invoked with other
 syntax than a plain function call, eg operator or cast syntax?

Either of those sounds reasonable but the combination seems weird.
Some example functions might help:

list_words('foo bar bar') - ('foo'), ('bar'), ('baz')
fetch_users_by_lastname('Smith') - (1,'John','Smith','Great
Britain','GB'), (2,'Jürgen','Smith','DE')
fetch_users_by_countrycode('DE') - (2,'Jürgen','Smith','DE')

The first looks like it should definitely inherit. The second the
result set is heterogeneous and inheriting might be the best
compromise but it would produce very strange results for columns like
the country-code which should just use their defined collation of C.
The third case inheriting the country code's collation of C would be
very strange and definitely wrong.

It occurs to me that if we have any inherited cases people might come
to depend on that behaviour and there would be no out for us. Whereas
if we say record return values always use the record type's field's
collations then we could always later add a collation of type
_inherited or _anycollation or some such that indicated that that
column should inherit the arguments' collation and it wouldn't affect
any existing code.



-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Open issues for collations

2011-03-26 Thread Robert Haas

On Mar 26, 2011, at 12:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 ** Selecting a field from a record-returning function's output.
 Currently, we'll use the field's declared collation; except that
 if the field has default collation, we'll replace that with the common
 collation of the function's inputs, if any.  Is either part of that
 sane?  Do we need to make this work for functions invoked with other
 syntax than a plain function call, eg operator or cast syntax?

I am not an expert on this topic in any way. That having been said, the first 
part of that rule seems quite sane. The second part seems less clear, but 
probably also sane.

 ** What to do with domains whose declaration includes a COLLATE clause?
 Currently, we'll impute that collation to the result of a cast to the
 domain type --- even if the cast's input expression includes an
 explicit COLLATE clause.  

I would have thought that an explicit COLLATE clause would trump any action at 
a distance.

 * In plpgsql, is it OK for declared local variables to inherit the
 function's input collation?  Should we provide a COLLATE option in
 variable declarations to let that be overridden?  If Oracle understands
 COLLATE, probably we should look at what they do in PL/SQL.

I don't know what Oracle does, but a collate option in variable declarations 
seems like a very good idea.  Inheriting the input collation if not specified 
seems good too. I also suspect we might need something like COLLATE FROM $1, 
but maybe that's a 9.2 feature.

 * RI triggers should insert COLLATE clauses in generated queries to
 satisfy SQL2008 9.13 SR 4a, which says that RI comparisons use the
 referenced column's collation.  Right now you may get either table's
 collation depending on which query type is involved.  I think an obvious
 failure may not be possible so long as equality means the same thing in
 all collations, but it's definitely possible that the planner might
 decide it can't use the referenced column's unique index, which would
 suck for performance.  (Note: this rule seems to prove that the
 committee assumes equality can mean different things in different
 collations, else they'd not have felt the need to specify.)

No idea what to do about this.

 * It'd sure be nice if we had some nontrivial test cases that work in
 encodings besides UTF8.  I'm still bothered that the committed patch
 failed to cover single-byte-encoding cases in upper/lower/initcap.

Or this.

 * Remove initdb's warning about useless locales?  Seems like pointless
 noise, or at least something that can be relegated to debug mode.

+1.

 * Is it worth adding a cares-about-collation flag to pg_proc?  Probably
 too late to be worrying about such refinements for 9.1.

Depends how much knock-on work it'll create.

...Robert
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Open issues for collations

2011-03-26 Thread Greg Stark

On Sat, Mar 26, 2011 at 3:16 PM, Robert Haas robertmh...@gmail.com wrote:
 ** What to do with domains whose declaration includes a COLLATE clause?
 Currently, we'll impute that collation to the result of a cast to the
 domain type --- even if the cast's input expression includes an
 explicit COLLATE clause.

 I would have thought that an explicit COLLATE clause would trump any action 
 at a distance.


I think an explicit COLLATE *outside* the cast would. But inside the
cast? The question comes down to whether a domain with a collate
clause is explicitly providing a collation or implicitly.

So again, examples:

CREATE DOMAIN name AS text COLLATE english;
CREATE DOMAIN countrycode AS char(2) COLLATE C;


1) SELECT * from users where country = 'DE' order by first_name COLLATE german;

2) SELECT * from users where country = 'DE' order by
(tolower(first_name) COLLATE german)::name;

3) SELECT * from users order by substr(address,1,2)::countrycode COLLATE english

4) SELECT * from users order by (substr(address,1,2) COLLATE
english)::countrycode

The ones with the collation expressions inside the casts seem very
strange and the behaviour following the domain don't seem
unreasonable. The behaviour with the collate clauses outside the cast
should definitely be follow the explicit collate clause.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Open issues for collations

2011-03-26 Thread Robert Haas

On Mar 26, 2011, at 12:34 PM, Greg Stark gsst...@mit.edu wrote:
 The ones with the collation expressions inside the casts seem very
 strange and the behaviour following the domain don't seem
 unreasonable. The behaviour with the collate clauses outside the cast
 should definitely be follow the explicit collate clause.

+1.  That's exactly what I was trying to say; thanks for saying it better.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Open issues for collations

2011-03-26 Thread Martijn van Oosterhout

On Sat, Mar 26, 2011 at 12:36:43AM -0400, Tom Lane wrote:
 ** Selecting a field from a record-returning function's output.
 Currently, we'll use the field's declared collation; except that
 if the field has default collation, we'll replace that with the common
 collation of the function's inputs, if any.  Is either part of that
 sane?  Do we need to make this work for functions invoked with other
 syntax than a plain function call, eg operator or cast syntax?

That seems all a bit weird. I spent some time reading through the SQL
spec to see if I could came up with a few ideas about what they thought
relevent. I think the gist of it is that I think the result row should
have for each column its declared collation in all cases.

Firstly, the SQL doesn't go into the general case where the collate
result of a function is based in it inputs. But in any case, if the
function returns a record, the whole record would have that collation,
which is absurd. I think letting it go to the contained columns is just
weird.

Secondly, I think the derivation algorithm is for determing the
collation of expressions which have no otherwise declared collation.
Anything returning a predefined record type has a predefined collation
and it should be used. If you're in a query referring to rowvar.field
and rowvar has a type, that's what should be used.

(No doubt you have some corner cases in mind?)

As for operators, they should behave like functions wherever possible,
otherwise it's just introducing unnecessary differences.

The cast-case is related to below.

 ** What to do with domains whose declaration includes a COLLATE clause?
 Currently, we'll impute that collation to the result of a cast to the
 domain type --- even if the cast's input expression includes an
 explicit COLLATE clause.  It's not clear that that's per spec.  If it
 is correct, should we behave similarly for functions that are declared
 to return a domain type?  Should it matter if the cast-to-domain is
 explicit or implicit?  Perhaps it'd be best if domain collations only
 mattered for columns declared with that domain type.  Then we'd have
 a general rule that collations only come into play in an expression
 as a result of (a) the declared type of a column reference or (b)
 an explicit COLLATE clause.

The SQL spec considers the collation to be part of the datatype, so if
you're casting to a domain (or type) you get the collation associated
with that domain (or type). As per the spec:

The collation derivation of a declared type with a declared type
collation that is explicitly or implicitly specified by a data type
is implicit.

So the result of a cast would be the collation of the specified
type/domain with state implicit.

Also, apparently the COLLATE clause as allowed anywhere where a
datatype is permitted. So you can say:

CAST( foo AS TEXT COLLATE en_US )

Not sure if that works now. The result would be implicit state, as
opposed to if the COLLATE clause appears elsewhere.

Incidently, a function returning a domain seems weird to me. What does
it mean: (1) the function returns this type, Postgres assumes this is
true, or (2) function returns something, Postgres does an implicit
cast?

In any case, I'd suggest it is treated as being included in the
resolving of the return collation with the arguments so if the result
is a domain and you apply the normal rules you get:

(1) explicit states in the arguments will override it
(2) if arguments are implicit state and conflict with domain, the
result is no-collation, otherwise implicitly whatever the domain was
(3) no arguments have collation, which means you get the domain
default.

Which all seems eminently reasonable.

So I'd agree with your rules, but add a case (c) result of a cast.

 * In plpgsql, is it OK for declared local variables to inherit the
 function's input collation?  Should we provide a COLLATE option in
 variable declarations to let that be overridden?  If Oracle understands
 COLLATE, probably we should look at what they do in PL/SQL.

If COLLATE is allowed anywhere where the datatype is allowed, then the
COLLATE clause should be permitted there. Otherwise they become the
specified type with whatever the default is for that type. In
expressions the coercible-default state will get overridden
by the IMPLICIT state from the arguments as appropriate.

I note I'm using the term coercible default here, because that's what
Transact-SQL calls the state for any variable or value that's not a column
reference. I'm just checking and don't see any support for it in the
SQL standard. While it seemed to me to be extremely useful, since it
allows column references to override literals.

 * RI triggers should insert COLLATE clauses in generated queries to
 satisfy SQL2008 9.13 SR 4a, which says that RI comparisons use the
 referenced column's collation.  Right now you may get either table's
 collation depending on which query type is involved.  I think an obvious
 failure may not be possible so long as

[HACKERS] Open issues for collations

2011-03-25 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 I think some discussion of which of the things on the open
 item lists need to be done before beta might be helpful, and we ought
 to add any items that are not there but are blockers.

Here's a quick enumeration of some things I think need discussion about
the collations patch:

* Are we happy yet with the collation assignment behavior (see
parse_collate.c)?  A couple of specific subtopics:

** Selecting a field from a record-returning function's output.
Currently, we'll use the field's declared collation; except that
if the field has default collation, we'll replace that with the common
collation of the function's inputs, if any.  Is either part of that
sane?  Do we need to make this work for functions invoked with other
syntax than a plain function call, eg operator or cast syntax?

** What to do with domains whose declaration includes a COLLATE clause?
Currently, we'll impute that collation to the result of a cast to the
domain type --- even if the cast's input expression includes an
explicit COLLATE clause.  It's not clear that that's per spec.  If it
is correct, should we behave similarly for functions that are declared
to return a domain type?  Should it matter if the cast-to-domain is
explicit or implicit?  Perhaps it'd be best if domain collations only
mattered for columns declared with that domain type.  Then we'd have
a general rule that collations only come into play in an expression
as a result of (a) the declared type of a column reference or (b)
an explicit COLLATE clause.


* In plpgsql, is it OK for declared local variables to inherit the
function's input collation?  Should we provide a COLLATE option in
variable declarations to let that be overridden?  If Oracle understands
COLLATE, probably we should look at what they do in PL/SQL.

* RI triggers should insert COLLATE clauses in generated queries to
satisfy SQL2008 9.13 SR 4a, which says that RI comparisons use the
referenced column's collation.  Right now you may get either table's
collation depending on which query type is involved.  I think an obvious
failure may not be possible so long as equality means the same thing in
all collations, but it's definitely possible that the planner might
decide it can't use the referenced column's unique index, which would
suck for performance.  (Note: this rule seems to prove that the
committee assumes equality can mean different things in different
collations, else they'd not have felt the need to specify.)

* It'd sure be nice if we had some nontrivial test cases that work in
encodings besides UTF8.  I'm still bothered that the committed patch
failed to cover single-byte-encoding cases in upper/lower/initcap.

* Remove initdb's warning about useless locales?  Seems like pointless
noise, or at least something that can be relegated to debug mode.

* Is it worth adding a cares-about-collation flag to pg_proc?  Probably
too late to be worrying about such refinements for 9.1.

There are a bunch of other minor issues that I'm still working through,
but these are the ones that seem to merit discussion.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

Re: [HACKERS] Open issues for collations

[HACKERS] Open issues for collations

25 matches

Site Navigation

Mail list logo

Footer information