On 05/21/2012 06:59 PM, Andrew Sullivan wrote:
On Mon, May 21, 2012 at 02:44:45AM -0700, John R Pierce wrote:
support the bastardized UTF-16 'unicode' implemented by Windows NT
To be fair to Microsoft, while the BOM might be an irritant, they do
use a perfectly legitimate encoding of Unicode.
Vincas Dargis writes:
> Database created using:
> initdb -D ../data -E utf-8 -U postgres
That looks fairly dangerous, as it will absorb the database's locale
settings (particularly LC_CTYPE, which is what you care about for these
operations) from your shell environment. If the environment locale
I forgot to mention that I'm working on Windows XP SP3
Yes, we are using UTF8 encoding and the regexp gives wrong results. It looks like
you replicated that.
2012/5/21 Albe Laurenz :
>
> I tried it with 9.1.3 on Linux:
>
> upper() and lower() work fine, no matter what the
> database encoding is:
>
> test=> SE
Vincas Dargis wrote:
> We have problems (currently using 8.4, but also in the latest 9.1.3) in
> our application with Unicode word characters in Lithuanian ('ąčęėįšųūž'),
> Russian and of course potentially other languages.
>
> For example, regexp_replace('acząčž', E'\\W', '', 'g') removes ąčž.
>
> lower
Sorry, I have to write a "manual" reply since I've messed up my mailing
list settings (got "Partial Digest"...).
John R Pierce wrote:
> Is your database encoding UTF8? Is the language or environment you're using to
> generate strings such as 'acząčž' also UTF8?
Database created using:
initdb
On Mon, May 21, 2012 at 02:44:45AM -0700, John R Pierce wrote:
> support the bastardized UTF-16 'unicode' implemented by Windows NT
To be fair to Microsoft, while the BOM might be an irritant, they do
use a perfectly legitimate encoding of Unicode. There is no Unicode
requirement that code points
On 05/21/12 2:09 AM, Vincas Dargis wrote:
We have problems (currently using 8.4, but also in the latest 9.1.3) in
our application with Unicode word characters in Lithuanian ('ąčęėįšųūž'),
Russian and of course potentially other languages.
For example, regexp_replace('acząčž', E'\\W', '', 'g') removes ąč
Hello,
We have problems (currently using 8.4, but also in the latest 9.1.3) in
our application with Unicode word characters in Lithuanian ('ąčęėįšųūž'),
Russian and of course potentially other languages.
For example, regexp_replace('acząčž', E'\\W', '', 'g') removes ąčž.
lower() and ~* comparison works
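
For anyone trying to picture the symptom outside the database, here is a rough sketch in Python. It is only an analogy and does not use PostgreSQL's regex engine. It shows how the same \W pattern keeps or strips 'ąčž' depending on whether the character classes are Unicode-aware or ASCII-only. The helper names strip_nonword_unicode and strip_nonword_ascii are made up for this illustration.

```python
import re

# 'acząčž' written with escapes so the source file encoding does not matter.
SAMPLE = "acz\u0105\u010d\u017e"


def strip_nonword_unicode(text: str) -> str:
    """Remove non-word characters using Unicode-aware classes (Python 3 default for str)."""
    return re.sub(r"\W", "", text)


def strip_nonword_ascii(text: str) -> str:
    """Remove non-word characters when only [a-zA-Z0-9_] count as word characters."""
    return re.sub(r"\W", "", text, flags=re.ASCII)


if __name__ == "__main__":
    # Unicode-aware matching keeps the Lithuanian letters.
    print(strip_nonword_unicode(SAMPLE))
    # ASCII-only matching treats them as non-word characters and drops them,
    # which resembles the behaviour reported in this thread.
    print(strip_nonword_ascii(SAMPLE))
```

The ASCII-only variant is the one that mirrors the report: the accented letters are classified as non-word characters and removed, while plain 'acz' survives.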