At 09:20 PM 8/24/2004 +0200, Peter Eisentraut wrote:
David Wheeler wrote:
That's not the trouble so much as that the locales can be badly
broken,
If we always followed the principle X could be broken, so let's not use
X, then we would never get anything done. Instead, X is broken, so
fix it.
David Wheeler wrote:
But given what you've said, Tatsuo, it makes me wonder if it's worth
it to use the system locale default when running initdb?
Yes, because that is the locale that the user prefers. If a locale is
broken then you shouldn't set it as system locale in the first place.
--
On Aug 23, 2004, at 10:25 PM, Joel wrote:
If the locale machinery is functioning correctly (and if I understand
correctly), there ought to be a setting that would allow those to
collate to the same point.
Bleh. There must be some distinction between them. It sounds like
querying for synonyms.
On Aug 24, 2004, at 12:20 PM, Peter Eisentraut wrote:
broken, and that they're useless for multilingual use.
I don't agree with that, but perhaps we differ in our interpretation of
multilingual use. If you have special requirements, you can always
turn the locales off.
Well, we're getting beyond
David Wheeler [EMAIL PROTECTED] writes:
Hmm. I tried putting your string into a UNICODE database and I got
ERROR: invalid byte sequence for encoding UNICODE: 0xc7
Really? Curious.
Oh, are you sure that you got my UTF-8 data? Because it came back in
your reply all mangled.
I
On Aug 23, 2004, at 3:46 PM, Markus Bertheau wrote:
The collation rules of your (and my) locale say that these strings are
the same:
[EMAIL PROTECTED] markus]$ cat t
[EMAIL PROTECTED] markus]$ uniq t
[EMAIL PROTECTED] markus]$
Interesting.
Make sure that you have initdb'd the database under
David Wheeler [EMAIL PROTECTED] writes:
But is it possible to store non-UTF-8 data in a UNICODE database?
In theory not ... but I think there was a discussion earlier that
concluded that our check for encoding validity is not airtight ...
regards, tom lane
On Aug 23, 2004, at 3:59 PM, Tom Lane wrote:
But is it possible to store non-UTF-8 data in a UNICODE database?
In theory not ... but I think there was a discussion earlier that
concluded that our check for encoding validity is not airtight ...
Well, if it was mostly right, I wouldn't expect it to
David Wheeler [EMAIL PROTECTED] writes:
Is the encoding check fixed in 8.0beta1?
[ looks back at discussion... ] Actually I misremembered --- the
discussion was about how we would *reject* legal UTF-8 codes that are
more than 2 bytes long. So the code is broken, but not in the direction
that
On Aug 23, 2004, at 4:08 PM, Tom Lane wrote:
[ looks back at discussion... ] Actually I misremembered --- the
discussion was about how we would *reject* legal UTF-8 codes that are
more than 2 bytes long. So the code is broken, but not in the
direction
that would cause your problem. Time for
On Tue, 24 Aug 2004 00:46:50 +0200, Markus Bertheau
[EMAIL PROTECTED] wrote:
On Mon, 23.08.2004, at 23:04, David Wheeler wrote:
On Aug 23, 2004, at 1:58 PM, Ian Barwick wrote:
er, the characters in name don't seem to match the characters in the
query - '' vs. '' - does that have any bearing?
David Wheeler [EMAIL PROTECTED] writes:
Is the problem query using an index? If so, does REINDEX help?
Doesn't look like it:
bric=# reindex index udx_keyword__name;
REINDEX
bric=# select * from keyword where name ='=BA=CF=C7=D1=C0=C7';
id | name | screen_name | sort_name |
On Aug 23, 2004, at 4:35 PM, Tom Lane wrote:
Hmm. I tried putting your string into a UNICODE database and I got
ERROR: invalid byte sequence for encoding UNICODE: 0xc7
Really? Curious.
So there's something funny happening here. What is your
client_encoding
setting?
It's not set. I've had it
On Aug 23, 2004, at 4:34 PM, Ian Barwick wrote:
wild speculation in need of a Korean speaker, but:
[EMAIL PROTECTED]:~/tmp cat j.txt
[EMAIL PROTECTED]:~/tmp uniq j.txt
All but the first and last lines are random Korean (Hangul)
characters. Evidently our respective locales think all
On Aug 23, 2004, at 4:49 PM, David Wheeler wrote:
Hmm. I tried putting your string into a UNICODE database and I got
ERROR: invalid byte sequence for encoding UNICODE: 0xc7
Really? Curious.
Oh, are you sure that you got my UTF-8 data? Because it came back in
your reply all mangled.
Cheers,
On Mon, 23 Aug 2004 16:50:04 -0700, David Wheeler [EMAIL PROTECTED] wrote:
On Aug 23, 2004, at 4:34 PM, Ian Barwick wrote:
wild speculation in need of a Korean speaker, but:
[EMAIL PROTECTED]:~/tmp cat j.txt
[EMAIL PROTECTED]:~/tmp uniq j.txt
All but
On Aug 23, 2004, at 5:07 PM, Ian Barwick wrote:
Does this go away if you change your locale to C?
Yes.
Hallelujah! I'm running initdb again now.
Cheers,
David
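For anyone who wants to see the effect outside the database, here is a minimal sketch of the comparison being discussed. It is only an illustration: it uses Python's `locale` module rather than anything from the thread, and the helper name `collates_equal` is made up. Under the C locale, `strcoll()` should never report two different strings as equal; under a broken multibyte locale the same call may return 0 for distinct Hangul strings, which is the symptom described above.

```python
import locale


def collates_equal(a: str, b: str, locale_name: str = "C") -> bool:
    """Return True if strcoll() treats a and b as equal under locale_name.

    Hypothetical helper for illustration; not part of PostgreSQL.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return locale.strcoll(a, b) == 0
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore it afterwards


if __name__ == "__main__":
    # Two different Hangul syllables: distinct under the C locale.
    print(collates_equal("한", "의"))
    # Identical strings always collate equal.
    print(collates_equal("한", "한"))
```

To test a system locale instead, pass its name as `locale_name`; which names are installed depends on the platform.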
On Mon, 23.08.2004, at 23:04, David Wheeler wrote:
On Aug 23, 2004, at 1:58 PM, Ian Barwick wrote:
er, the characters in name don't seem to match the characters in the
query - '' vs. '' - does that have any bearing?
Yes, it means that = is doing the wrong thing!!
The collation
On Aug 23, 2004, at 5:22 PM, Tatsuo Ishii wrote:
Locales for multibyte encodings are often broken on many platforms. I
see identical things with Japanese on Red Hat. This is one of the
reasons why I tell Japanese PostgreSQL users not to enable locale
during initdb...
Yep, and exporting my data,
Tom Lane wrote:
David Wheeler [EMAIL PROTECTED] writes:
bric=# reindex index udx_keyword__name;
REINDEX
bric=# select * from keyword where name ='=BA=CF=C7=D1=C0=C7';
id | name | screen_name | sort_name | active
----+------+-------------+-----------+--------
1218 |
On Aug 23, 2004, at 6:49 PM, Tim Allen wrote:
One possible clue: your original post in this thread was using
encoding euc-kr, not unicode (utf-8). If your mailer was set to use
that encoding, perhaps your other client software is/was also?
Bah! Stupid Mail.app was trying to be too smart!
Thanks,
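A quick way to check Tim's clue is to test the raw bytes from the query against both encodings. The sketch below is illustrative only: it is written in Python, which nobody in the thread used, and the helper name `is_valid_utf8` is made up. The byte string is the one that appears quoted-printable-encoded in the psql session (=BA=CF=C7=D1=C0=C7).

```python
# Bytes of the query string as shown in the thread: =BA=CF=C7=D1=C0=C7
raw = bytes([0xBA, 0xCF, 0xC7, 0xD1, 0xC0, 0xC7])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # The bytes are not well-formed UTF-8 ...
    print(is_valid_utf8(raw))
    # ... but they do decode as EUC-KR, two bytes per character.
    print(len(raw.decode("euc_kr")))
```

If the first check fails and the second succeeds, the client was most likely sending EUC-KR, so a UNICODE database would be right to reject the string.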
On Tue, 24 Aug 2004 01:34:46 +0200
Ian Barwick [EMAIL PROTECTED] wrote:

...
wild speculation in need of a Korean speaker, but:

[EMAIL PROTECTED]:~/tmp cat j.txt
[Hangul text garbled by the archive's encoding]