Tatsuo Ishii writes:
> BTW, cases where the same character is assigned different code points are
> pretty common in many character sets (Unicode, for example).
This is widely considered a security bug; read section 10 of RFC 3629 (the
definition of UTF-8), and search the CVE database a bit if you still doubt
it.
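Section 10 of RFC 3629 is exactly about this: a decoder that accepts non-shortest-form ("overlong") sequences gives one character several byte representations, which is how byte-level filters get bypassed. A minimal Python sketch (an editorial illustration, not code from the thread):

```python
# b'\xc0\xaf' is an overlong two-byte encoding of '/' (U+002F).
# A conforming UTF-8 decoder must reject it; accepting it would let
# '/' sneak past byte-level filters (the classic path-traversal CVEs).
overlong = b'\xc0\xaf'
try:
    overlong.decode('utf-8')
    print("accepted (non-conforming decoder)")
except UnicodeDecodeError:
    print("rejected")  # Python's strict decoder rejects the overlong form
```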
> Tatsuo Ishii writes:
>>> MULE is completely evil.
>>> It has N different encodings for the same character,
>
>> What's wrong with that? That is what it aims for in the first place.
>
> It greatly complicates comparisons --- at least, if you'd like to preserve
> the principle that strings that appear the same are equal.
Tatsuo Ishii writes:
>> MULE is completely evil.
>> It has N different encodings for the same character,
> What's wrong with that? That is what it aims for in the first place.
It greatly complicates comparisons --- at least, if you'd like to preserve
the principle that strings that appear the same are equal.
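The comparison problem is not hypothetical; Unicode itself already exhibits it with combining characters. A small Python illustration (using the standard `unicodedata` module, not anything MULE-specific):

```python
import unicodedata

# Two encodings of the visually identical string "é":
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed  = "e\u0301"  # 'e' + U+0301 COMBINING ACUTE ACCENT

# Codepoint-by-codepoint comparison says they differ...
print(precomposed == decomposed)   # False

# ...so equality has to go through normalization first.
nfc = unicodedata.normalize("NFC", decomposed)
print(precomposed == nfc)          # True
```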
> MULE is completely evil.
> It has N different encodings for the same
> character,
What's wrong with that? That is what it aims for in the first place.
> not to mention no support code available.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.c
> Isn't this essentially what the MULE internal encoding is?
No. MULE is not powerful enough, and it is overly complicated, for dealing
with different encodings (character sets).
>> Currently there's no such universal encoding in existence, so I
>> think the only way is to invent it ourselves.
>
> T
Martijn van Oosterhout writes:
> On Tue, Nov 12, 2013 at 03:57:52PM +0900, Tatsuo Ishii wrote:
>> Once we implement the universal encoding, other problem such as
>> "pg_database with multiple encoding problem" can be solved easily.
> Isn't this essentially what the MULE internal encoding is?
MUL
On Tue, Nov 12, 2013 at 03:57:52PM +0900, Tatsuo Ishii wrote:
> I have been thinking about this for years and I think the key idea for
> this is, implementing "universal encoding". The universal encoding
> should have following characteristics to implement N>2 encoding in a
> database.
>
> 1) no l
On 11/12/13, 1:57 AM, Tatsuo Ishii wrote:
> Currently there's no such universal encoding in existence, so I
> think the only way is to invent it ourselves.
I think ISO 2022 is something in that direction, but it's not
ASCII-safe, AFAICT.
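The ASCII-safety concern can be seen directly: ISO-2022-JP represents multibyte characters entirely with 7-bit bytes, so the encoded stream is full of bytes that a naive scanner would mistake for plain ASCII. A quick Python check (an editorial illustration):

```python
text = "日本"  # "Japan" in kanji

encoded = text.encode("iso2022_jp")
# The stream switches charsets with escape sequences (ESC $ B ... ESC ( B),
# and every byte, including the kanji payload, is in the 7-bit/ASCII range.
print(encoded)
print(all(b < 0x80 for b in encoded))  # True: nothing marks these bytes as non-ASCII
```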
--
Sent via pgsql-hackers mailing list (pgsql-hack
> I'd be much more impressed by seeing a road map for how we get to a
> useful amount of added functionality --- which, to my mind, would be
> the ability to support N different encodings in one database, for N>2.
> But even if you think N=2 is sufficient, we haven't got a road map, and
> commandee
"MauMau" writes:
> On the other hand, nchar is an established data type in the SQL standard. I
> think most people will expect to see "nchar" in the output of psql \d and
> pg_dump, as they specified it in the DDL.
This argument seems awfully weak. You've been able to say
create table nt (nf natio
From: "Albe Laurenz"
In a way, it is similar to using the "data type" serial. The column will be
displayed as "integer", and the information that it was a serial can
only be inferred from the DEFAULT value.
It seems that this is working fine and does not cause many problems,
so I don't see why t
MauMau wrote:
> Let me repeat myself: I think the biggest and immediate issue is that
> PostgreSQL does not support national character types, at least officially.
> "Officially" means the description in the manual. So I don't have a strong
> objection against the current (hidden) implementation of nc
From: "Robert Haas"
On Tue, Nov 5, 2013 at 5:15 PM, Peter Eisentraut wrote:
On 11/5/13, 1:04 AM, Arulappan, Arul Shaji wrote:
Implements NCHAR/NVARCHAR as distinct data types, not as synonyms
If, per SQL standard, NCHAR(x) is equivalent to CHAR(x) CHARACTER SET
"cs", then for some "cs", NCHAR(x) must be the same as CHAR(x).
On Tue, Nov 5, 2013 at 5:15 PM, Peter Eisentraut wrote:
> On 11/5/13, 1:04 AM, Arulappan, Arul Shaji wrote:
>> Implements NCHAR/NVARCHAR as distinct data types, not as synonyms
>
> If, per SQL standard, NCHAR(x) is equivalent to CHAR(x) CHARACTER SET
> "cs", then for some "cs", NCHAR(x) must be the same as CHAR(x).
On 11/5/13, 1:04 AM, Arulappan, Arul Shaji wrote:
> Implements NCHAR/NVARCHAR as distinct data types, not as synonyms
If, per SQL standard, NCHAR(x) is equivalent to CHAR(x) CHARACTER SET
"cs", then for some "cs", NCHAR(x) must be the same as CHAR(x).
Therefore, an implementation as separate data
From: "Albe Laurenz"
I looked into the Standard, and it does not have NVARCHAR.
The type is called NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING
or NCHAR VARYING.
Ouch, that's just a mistake in my mail. You are correct.
> I guess that the goal of this patch is to support Oracle syntax.
MauMau wrote:
> From: "Albe Laurenz"
>> If I understood the discussion correctly the use case is that
>> there are advantages to having a database encoding different
>> from UTF-8, but you'd still want some UTF-8 columns.
>>
>> Wouldn't it be a better design to allow specifying the encoding
>> per
From: "Albe Laurenz"
If I understood the discussion correctly the use case is that
there are advantages to having a database encoding different
from UTF-8, but you'd still want some UTF-8 columns.
Wouldn't it be a better design to allow specifying the encoding
per column? That would give you m
Arul Shaji Arulappan wrote:
> Attached is a patch that implements the first set of changes discussed
> in this thread originally. They are:
>
> (i) Implements NCHAR/NVARCHAR as distinct data types, not as synonyms so
> that:
> - psql \d can display the user-specified data types.
> - pg
From: "Greg Stark"
If it's not lossy, then what's the point? From the client's point of view
it'll be functionally equivalent to text.
Sorry, what Tatsuo-san suggested was "same or compatible", not lossy.
I quote the relevant part below. This is enough for the use case I
mentioned
From: "Peter Eisentraut"
On Tue, 2013-09-24 at 21:04 +0900, MauMau wrote:
"4. I guess some users really want to continue to use ShiftJIS or EUC_JP for
database encoding, and use NCHAR for a limited set of columns to store
international text in Unicode:
- to avoid code conversion between the se
On Tue, 2013-09-24 at 21:04 +0900, MauMau wrote:
> "4. I guess some users really want to continue to use ShiftJIS or EUC_JP for
> database encoding, and use NCHAR for a limited set of columns to store
> international text in Unicode:
> - to avoid code conversion between the server and the client fo
From: "Peter Eisentraut"
That assumes that the conversion client encoding -> server encoding ->
NCHAR encoding is not lossy.
Yes, so Tatsuo-san suggested restricting the server encoding <-> NCHAR encoding
combinations to those with lossless conversion.
I thought one main point of this exercise
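The lossiness under discussion is easy to demonstrate outside PostgreSQL: any conversion into an encoding whose repertoire lacks a character fails (or mangles it). A hedged Python sketch, standing in for a client -> server -> NCHAR conversion chain (the sample strings are editorial, not from the thread):

```python
text = "price: €10"  # the euro sign is not in Latin-1 (it is in Latin-9)

# Pretend the server encoding is Latin-1: the client -> server conversion
# is already lossy for this string, so restricting NCHAR to encoding
# combinations with lossless conversions is the only safe rule.
try:
    text.encode("latin-1")
    lossless = True
except UnicodeEncodeError:
    lossless = False
print(lossless)  # False

# A lossless pairing round-trips exactly:
assert "日本語".encode("euc_jp").decode("euc_jp") == "日本語"
```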
From: "Robert Haas"
Sure, it's EnterpriseDB's policy to add features that facilitate
migrations from other databases - particularly Oracle - to our
product, Advanced Server, even if those features don't otherwise add
any value. However, the community is usually reluctant to add such
features to
On 9/23/13 2:53 AM, MauMau wrote:
> Yes, I believe you are right. Regardless of whether we support multiple
> encodings in one database or not, a single client encoding will be
> sufficient for one session. When receiving the "Q" message, the whole
> SQL text is converted from the client encoding
On Fri, Sep 20, 2013 at 8:32 PM, MauMau wrote:
>> I don't think that you'll be able to
>> get consensus around that path on this mailing list.
>> I agree that the fact we have both varchar and text feels like a wart.
>
> Is that right? I don't feel the varchar/text case is a wart. I think text was
>
From: "Tatsuo Ishii"
I don't think the bind placeholder case is a problem. That is processed by
exec_bind_message() in postgres.c. It has enough info about the type
of the placeholder, and I think we can easily deal with NCHAR. The same
can be said for the COPY case.
Yes, I understand now. Agreed. If
>> PostgreSQL has very powerful possibilities for storing any kind of
>> encoding. So maybe it makes sense to add the ENCODING as another column
>> property, the same way a COLLATION was added?
>
> Some other people in this community suggested that. And the SQL standard
> suggests the same.
> I think the point here is that, at least as I understand it, encoding
> conversion and sanitization happens at a very early stage right now,
> when we first receive the input from the client. If the user sends a
> string of bytes as part of a query or bind placeholder that's not
> valid in the da
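That early validation step amounts to a byte-level check: the same byte string can be perfectly valid in one encoding and flatly invalid in another. A small Python illustration (the Shift-JIS bytes are a hypothetical client input chosen for the example):

```python
raw = b"\x93\xfa"  # '日' in Shift-JIS

# Valid if the database encoding is Shift-JIS...
print(raw.decode("shift_jis"))  # 日

# ...but rejected outright if the database encoding is UTF-8:
try:
    raw.decode("utf-8")
except UnicodeDecodeError as e:
    print("invalid in UTF-8:", e.reason)
```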
From: "Robert Haas"
On Thu, Sep 19, 2013 at 7:58 PM, Tatsuo Ishii
wrote:
What about limiting the use of NCHAR to databases which have the same
encoding or a "compatible" encoding (one for which an encoding conversion is
defined)? This way, NCHAR text can be automatically converted from
NCHAR to the database encoding on the server side.
From: "Robert Haas"
I don't think that you'll be able to
get consensus around that path on this mailing list.
I agree that the fact we have both varchar and text feels like a wart.
Is that right? I don't feel the varchar/text case is a wart. I think text was
introduced for a positive reason
From: "Martijn van Oosterhout"
As far as I can tell the whole reason for introducing NCHAR is to
support SHIFT-JIS, there hasn't been call for any other encodings, that
I can remember anyway.
Could you elaborate on this, and point me to some sources?
So rather than this whole NCHAR thing, why n
From: "Valentine Gogichashvili"
the whole NCHAR thing appeared as a hack for systems that did not have Unicode
from the beginning. It would not be needed if all text were magically
stored in UNICODE or UTF from the beginning, and the idea of a character were
the same as the idea of a rune and not
From: "Tatsuo Ishii"
What about limiting the use of NCHAR to databases which have the same
encoding or a "compatible" encoding (one for which an encoding conversion is
defined)? This way, NCHAR text can be automatically converted from
NCHAR to the database encoding on the server side, and thus we can treat
NCHAR
On 9/20/13 2:22 PM, Robert Haas wrote:
>>> I am not keen to introduce support for nchar and nvarchar as
>>> differently-named types with identical semantics.
>>
>> Similar examples already exist:
>>
>> - varchar and text: the only difference is the existence of an explicit
>> length limit
On Thu, Sep 19, 2013 at 6:42 PM, MauMau wrote:
> National character type support may be important to some potential users of
> PostgreSQL and to the popularity of PostgreSQL, not to me. That's why national
> character support is listed in the PostgreSQL TODO wiki. We might be losing
> potential users
On Thu, Sep 19, 2013 at 7:58 PM, Tatsuo Ishii wrote:
> What about limiting the use of NCHAR to databases which have the same
> encoding or a "compatible" encoding (one for which an encoding conversion is
> defined)? This way, NCHAR text can be automatically converted from
> NCHAR to the database encoding in t
On Fri, Sep 20, 2013 at 08:58:53AM +0900, Tatsuo Ishii wrote:
> For example, "CREATE TABLE t1(t NCHAR(10))" will succeed if NCHAR is
> UTF-8 and the database encoding is UTF-8. It will even succeed if NCHAR is
> SHIFT-JIS and the database encoding is UTF-8, because there is a conversion
> between UTF-8 and SHIFT-JIS
Hi,
>> That may be what's important to you, but it's not what's important to
>> me.
>
> National character type support may be important to some potential users
> of PostgreSQL and to the popularity of PostgreSQL, not to me. That's why
> national character support is listed in the PostgreSQL TODO
> On Mon, Sep 16, 2013 at 8:49 AM, MauMau wrote:
>> 2. NCHAR/NVARCHAR columns can be used in non-UTF-8 databases and always
>> contain Unicode data.
> ...
>> 3. Store strings in UTF-16 encoding in NCHAR/NVARCHAR columns.
>> Fixed-width encoding may allow faster string manipulation as described in
From: "Robert Haas"
That may be what's important to you, but it's not what's important to
me.
National character type support may be important to some potential users of
PostgreSQL and to the popularity of PostgreSQL, not to me. That's why national
character support is listed in the PostgreSQL TODO wiki.
On Wed, Sep 18, 2013 at 6:42 PM, MauMau wrote:
>> It seems to me that these two points here are the real core of your
>> proposal. The rest is just syntactic sugar.
>
> No, those are "desirable if possible" features. What's important is to
> declare in the manual that PostgreSQL officially suppo
From: "Tom Lane"
Another point to keep in mind is that UTF16 is not really any easier
to deal with than UTF8, unless you write code that fails to support
characters outside the basic multilingual plane. Which is a restriction
I don't believe we'd accept. But without that restriction, you're st
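Tom's point about the BMP can be checked in a few lines; UTF-16 is only fixed-width if you ignore surrogate pairs. A Python sketch (an editorial illustration):

```python
# Characters outside the Basic Multilingual Plane need a surrogate pair
# in UTF-16, so UTF-16 is a variable-width encoding too.
bmp     = "A"           # U+0041, inside the BMP
non_bmp = "\U0001d11e"  # U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP

print(len(bmp.encode("utf-16-be")))      # 2 bytes: one 16-bit code unit
print(len(non_bmp.encode("utf-16-be")))  # 4 bytes: a surrogate pair

# So "character at index i" still cannot be found by fixed-stride arithmetic.
```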
From: "Robert Haas"
On Mon, Sep 16, 2013 at 8:49 AM, MauMau wrote:
2. NCHAR/NVARCHAR columns can be used in non-UTF-8 databases and always
contain Unicode data.
...
3. Store strings in UTF-16 encoding in NCHAR/NVARCHAR columns.
Fixed-width encoding may allow faster string manipulation as des
On 18.09.2013 16:16, Robert Haas wrote:
On Mon, Sep 16, 2013 at 8:49 AM, MauMau wrote:
2. NCHAR/NVARCHAR columns can be used in non-UTF-8 databases and always
contain Unicode data.
...
3. Store strings in UTF-16 encoding in NCHAR/NVARCHAR columns.
Fixed-width encoding may allow faster string
Robert Haas writes:
> On Mon, Sep 16, 2013 at 8:49 AM, MauMau wrote:
>> 2. NCHAR/NVARCHAR columns can be used in non-UTF-8 databases and always
>> contain Unicode data.
>> ...
>> 3. Store strings in UTF-16 encoding in NCHAR/NVARCHAR columns.
>> Fixed-width encoding may allow faster string manipul
On Mon, Sep 16, 2013 at 8:49 AM, MauMau wrote:
> 2. NCHAR/NVARCHAR columns can be used in non-UTF-8 databases and always
> contain Unicode data.
...
> 3. Store strings in UTF-16 encoding in NCHAR/NVARCHAR columns.
> Fixed-width encoding may allow faster string manipulation as described in
> Oracle
>-Original Message-
>From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers-
>ow...@postgresql.org] On Behalf Of MauMau
>
>Hello,
>
>I think it would be nice for PostgreSQL to support national character types
>largely because it should ease migration from other DBMSs.
>
>[Reasons
Hello,
I think it would be nice for PostgreSQL to support national character types
largely because it should ease migration from other DBMSs.
[Reasons why we need NCHAR]
--
1. Invite users of other DBMSs to PostgreSQL. Oracle, SQL Server, MySQL,
"Boguk, Maksym" writes:
> Hi, my task is implementing ANSI NATIONAL character string types as
> part of PostgreSQL core.
No, that's not a given. You have a problem to solve, i.e. storing some UTF-8
strings in a database that's mostly just 1-byte data. It is not clear
that NATIONAL CHARACTER is the
>> 1) Addition of new string data types NATIONAL CHARACTER and NATIONAL
>> CHARACTER VARYING.
>> These types differ from the char/varchar data types in one important
>> respect: NATIONAL string types always have UTF8 encoding
>> (independent of the database encoding in use).
>I don't like
Heikki Linnakangas writes:
> On 03.09.2013 05:28, Boguk, Maksym wrote:
>> Target usage: ability to store UTF8 national characters in some
>> selected fields inside a single-byte encoded database.
> I think we should take a completely different approach to this. Two
> alternatives spring to mind
On 03.09.2013 05:28, Boguk, Maksym wrote:
Target usage: the ability to store UTF8 national characters in some
selected fields inside a single-byte encoded database.
For example, if I have a ru-RU.koi8r encoded database with mostly Russian
text inside, it would be nice to be able to store Japanese text