Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-11-13 Thread Tom Lane
Tatsuo Ishii writes: > BTW, same characters are assigned different code points are pretty > common in many character sets (Unicode, for example). This is widely considered a security bug; read section 10 in RFC 3629 (the definition of UTF8), and search the CVE database a bit if you still doubt it
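
For readers following the pointer to RFC 3629 section 10, here is a minimal sketch (not from the thread) of the problem it describes: a byte sequence that looks like a second encoding of an existing character. The helper name `is_strict_utf8` is hypothetical. It relies only on Python's built-in strict UTF-8 decoder, which rejects overlong (non-shortest) forms.

```python
def is_strict_utf8(data: bytes) -> bool:
    """Return True only if data is well-formed UTF-8 (shortest forms only)."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# U+002F '/' in its only valid UTF-8 form.
SLASH = b"/"
# Two-byte "overlong" form of the same code point, forbidden by RFC 3629.
OVERLONG_SLASH = b"\xc0\xaf"

if __name__ == "__main__":
    print(is_strict_utf8(SLASH))
    print(is_strict_utf8(OVERLONG_SLASH))
```

A decoder that accepted both sequences would let two different byte strings compare unequal while denoting the same character, which is the comparison hazard raised in the messages below.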

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-11-13 Thread Tatsuo Ishii
> Tatsuo Ishii writes: >>> MULE is completely evil. >>> It has N different encodings for the same character, > >> What's wrong with that? It aims that in the first place. > > It greatly complicates comparisons --- at least, if you'd like to preserve > the principle that strings that appear the s

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-11-13 Thread Tom Lane
Tatsuo Ishii writes: >> MULE is completely evil. >> It has N different encodings for the same character, > What's wrong with that? It aims that in the first place. It greatly complicates comparisons --- at least, if you'd like to preserve the principle that strings that appear the same are equal

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-11-13 Thread Tatsuo Ishii
> MULE is completely evil. > It has N different encodings for the same > character, What's wrong with that? It aims that in the first place. > not to mention no support code available. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.c

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-11-13 Thread Tatsuo Ishii
> Isn't this essentially what the MULE internal encoding is? No. MULE is not powerful enough and overly complicated to deal with different encodings (character sets). >> Currently there's no such an universal encoding in the universe, I >> think the only way is, inventing it by ourselves. > > T

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-11-13 Thread Tom Lane
Martijn van Oosterhout writes: > On Tue, Nov 12, 2013 at 03:57:52PM +0900, Tatsuo Ishii wrote: >> Once we implement the universal encoding, other problem such as >> "pg_database with multiple encoding problem" can be solved easily. > Isn't this essentially what the MULE internal encoding is? MUL

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-11-13 Thread Martijn van Oosterhout
On Tue, Nov 12, 2013 at 03:57:52PM +0900, Tatsuo Ishii wrote: > I have been thinking about this for years and I think the key idea for > this is, implementing "universal encoding". The universal encoding > should have following characteristics to implement N>2 encoding in a > database. > > 1) no l

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-11-13 Thread Peter Eisentraut
On 11/12/13, 1:57 AM, Tatsuo Ishii wrote: > Currently there's no such an universal encoding in the universe, I > think the only way is, inventing it by ourselves. I think ISO 2022 is something in that direction, but it's not ASCII-safe, AFAICT. -- Sent via pgsql-hackers mailing list (pgsql-hack

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-11-11 Thread Tatsuo Ishii
> I'd be much more impressed by seeing a road map for how we get to a > useful amount of added functionality --- which, to my mind, would be > the ability to support N different encodings in one database, for N>2. > But even if you think N=2 is sufficient, we haven't got a road map, and > commandee

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-11-10 Thread Tom Lane
"MauMau" writes: > On the other hand, nchar is an established data type in the SQL standard. I > think most people will expect to get "nchar" as output from psql \d and > pg_dump as they specified in DDL. This argument seems awfully weak. You've been able to say create table nt (nf natio

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-11-10 Thread MauMau
From: "Albe Laurenz" In a way, it is similar to using the "data type" serial. The column will be displayed as "integer", and the information that it was a serial can only be inferred from the DEFAULT value. It seems that this is working fine and does not cause many problems, so I don't see why t

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-11-10 Thread MauMau
From: "Albe Laurenz" In a way, it is similar to using the "data type" serial. The column will be displayed as "integer", and the information that it was a serial can only be inferred from the DEFAULT value. It seems that this is working fine and does not cause many problems, so I don't see why t

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-11-09 Thread Albe Laurenz
MauMau wrote: > Let me repeat myself: I think the biggest and immediate issue is that > PostgreSQL does not support national character types at least officially. > "Officially" means the description in the manual. So I don't have strong > objection against the current (hidden) implementation of nc

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-11-08 Thread MauMau
From: "Robert Haas" On Tue, Nov 5, 2013 at 5:15 PM, Peter Eisentraut wrote: On 11/5/13, 1:04 AM, Arulappan, Arul Shaji wrote: Implements NCHAR/NVARCHAR as distinct data types, not as synonyms If, per SQL standard, NCHAR(x) is equivalent to CHAR(x) CHARACTER SET "cs", then for some "cs", NCH

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-11-08 Thread Robert Haas
On Tue, Nov 5, 2013 at 5:15 PM, Peter Eisentraut wrote: > On 11/5/13, 1:04 AM, Arulappan, Arul Shaji wrote: >> Implements NCHAR/NVARCHAR as distinct data types, not as synonyms > > If, per SQL standard, NCHAR(x) is equivalent to CHAR(x) CHARACTER SET > "cs", then for some "cs", NCHAR(x) must be th

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-11-05 Thread Peter Eisentraut
On 11/5/13, 1:04 AM, Arulappan, Arul Shaji wrote: > Implements NCHAR/NVARCHAR as distinct data types, not as synonyms If, per SQL standard, NCHAR(x) is equivalent to CHAR(x) CHARACTER SET "cs", then for some "cs", NCHAR(x) must be the same as CHAR(x). Therefore, an implementation as separate data

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-11-05 Thread MauMau
From: "Albe Laurenz" I looked into the Standard, and it does not have NVARCHAR. The type is called NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING or NCHAR VARYING. Ouch, that's just a mistake in my mail. You are correct. > I guess that the goal of this patch is to support Oracle syntax.

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-11-05 Thread Albe Laurenz
MauMau wrote: > From: "Albe Laurenz" >> If I understood the discussion correctly the use case is that >> there are advantages to having a database encoding different >> from UTF-8, but you'd still want some UTF-8 columns. >> >> Wouldn't it be a better design to allow specifying the encoding >> per

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-11-05 Thread MauMau
From: "Albe Laurenz" If I understood the discussion correctly the use case is that there are advantages to having a database encoding different from UTF-8, but you'd still want some UTF-8 columns. Wouldn't it be a better design to allow specifying the encoding per column? That would give you m

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-11-05 Thread Albe Laurenz
Arul Shaji Arulappan wrote: > Attached is a patch that implements the first set of changes discussed > in this thread originally. They are: > > (i) Implements NCHAR/NVARCHAR as distinct data types, not as synonyms so > that: > - psql \d can display the user-specified data types. > - pg

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-25 Thread MauMau
From: "Greg Stark" If it's not lossy then what's the point? From the client's point of view it'll be functionally equivalent to text then. Sorry, what Tatsuo san suggested meant was "same or compatible", not lossy. I quote the relevant part below. This is enough for the use case I mentioned

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-25 Thread MauMau
From: "Peter Eisentraut" On Tue, 2013-09-24 at 21:04 +0900, MauMau wrote: "4. I guess some users really want to continue to use ShiftJIS or EUC_JP for database encoding, and use NCHAR for a limited set of columns to store international text in Unicode: - to avoid code conversion between the se

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-24 Thread Peter Eisentraut
On Tue, 2013-09-24 at 21:04 +0900, MauMau wrote: > "4. I guess some users really want to continue to use ShiftJIS or EUC_JP for > database encoding, and use NCHAR for a limited set of columns to store > international text in Unicode: > - to avoid code conversion between the server and the client fo

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-24 Thread MauMau
From: "Peter Eisentraut" That assumes that the conversion client encoding -> server encoding -> NCHAR encoding is not lossy. Yes, so Tatsuo san suggested to restrict server encoding <-> NCHAR encoding combination to those with lossless conversion. I thought one main point of this exercise
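
The "lossless conversion" restriction discussed here can be illustrated with a round-trip check. This is a sketch under stated assumptions, not PostgreSQL's conversion machinery: the helper name `converts_losslessly` is hypothetical, and it uses Python's bundled codecs (`koi8_r`, `shift_jis`), whose tables may differ in detail from the server's own conversion tables.

```python
def converts_losslessly(text: str, encoding: str) -> bool:
    """Return True if text survives an encode/decode round trip in `encoding`.

    A failure to encode means the target character set has no representation
    for at least one character, so the conversion would be lossy.
    """
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    russian = "привет"
    japanese = "日本語"
    # Russian text fits a KOI8-R database; Japanese text does not.
    print(converts_losslessly(russian, "koi8_r"))
    print(converts_losslessly(japanese, "koi8_r"))
    # The same Japanese text is representable in Shift_JIS.
    print(converts_losslessly(japanese, "shift_jis"))
```

A pairing of server encoding and NCHAR encoding would pass such a restriction only if every string that can be stored in one also round-trips through the other.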

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-23 Thread MauMau
From: "Robert Haas" Sure, it's EnterpriseDB's policy to add features that facilitate migrations from other databases - particularly Oracle - to our product, Advanced Server, even if those features don't otherwise add any value. However, the community is usually reluctant to add such features to

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-23 Thread Peter Eisentraut
On 9/23/13 2:53 AM, MauMau wrote: > Yes, I believe you are right. Regardless of whether we support multiple > encodings in one database or not, a single client encoding will be > sufficient for one session. When receiving the "Q" message, the whole > SQL text is converted from the client encoding

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-23 Thread Robert Haas
On Fri, Sep 20, 2013 at 8:32 PM, MauMau wrote: >> I don't think that you'll be able to >> get consensus around that path on this mailing list. >> I agree that the fact we have both varchar and text feels like a wart. > > Is that right? I don't feel varchar/text case is a wart. I think text was >

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-22 Thread MauMau
From: "Tatsuo Ishii" I don't think the bind placeholder is the case. That is processed by exec_bind_message() in postgres.c. It has enough info about the type of the placeholder, and I think we can easily deal with NCHAR. Same thing can be said to COPY case. Yes, I've learned it. Agreed. If

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-22 Thread Valentine Gogichashvili
> > > PostgreSQL has a very powerful possibilities for storing any kind of >> encoding. So maybe it makes sense to add the ENCODING as another column >> property, the same way a COLLATION was added? >> > > Some other people in this community suggested that. And the SQL standard > suggests the sam

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-21 Thread Tatsuo Ishii
> I think the point here is that, at least as I understand it, encoding > conversion and sanitization happens at a very early stage right now, > when we first receive the input from the client. If the user sends a > string of bytes as part of a query or bind placeholder that's not > valid in the da

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-20 Thread MauMau
From: "Robert Haas" On Thu, Sep 19, 2013 at 7:58 PM, Tatsuo Ishii wrote: What about limiting to use NCHAR with a database which has same encoding or "compatible" encoding (on which the encoding conversion is defined)? This way, NCHAR text can be automatically converted from NCHAR to the databa

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-20 Thread MauMau
From: "Robert Haas" I don't think that you'll be able to get consensus around that path on this mailing list. I agree that the fact we have both varchar and text feels like a wart. Is that right? I don't feel varchar/text case is a wart. I think text was introduced for a positive reason

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-20 Thread MauMau
From: "Martijn van Oosterhout" As far as I can tell the whole reason for introducing NCHAR is to support SHIFT-JIS, there hasn't been call for any other encodings, that I can remember anyway. Could you elaborate on this, giving some info sources? So rather than this whole NCHAR thing, why n

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-20 Thread MauMau
From: "Valentine Gogichashvili" the whole NCHAR appeared as hack for the systems, that did not have it from the beginning. It would not be needed, if all the text would be magically stored in UNICODE or UTF from the beginning and idea of character would be the same as an idea of a rune and not

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-20 Thread MauMau
From: "Tatsuo Ishii" What about limiting to use NCHAR with a database which has same encoding or "compatible" encoding (on which the encoding conversion is defined)? This way, NCHAR text can be automatically converted from NCHAR to the database encoding in the server side thus we can treat NCHAR

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-20 Thread Peter Eisentraut
On 9/20/13 2:22 PM, Robert Haas wrote: >>> I am not keen to introduce support for nchar and nvarchar as >>> >> differently-named types with identical semantics. >> > >> > Similar examples already exist: >> > >> > - varchar and text: the only difference is the existence of explicit length >> > limit

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-20 Thread Robert Haas
On Thu, Sep 19, 2013 at 6:42 PM, MauMau wrote: > National character types support may be important to some potential users of > PostgreSQL and the popularity of PostgreSQL, not me. That's why national > character support is listed in the PostgreSQL TODO wiki. We might be losing > potential users

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-20 Thread Robert Haas
On Thu, Sep 19, 2013 at 7:58 PM, Tatsuo Ishii wrote: > What about limiting to use NCHAR with a database which has same > encoding or "compatible" encoding (on which the encoding conversion is > defined)? This way, NCHAR text can be automatically converted from > NCHAR to the database encoding in t

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-20 Thread Martijn van Oosterhout
On Fri, Sep 20, 2013 at 08:58:53AM +0900, Tatsuo Ishii wrote: > For example, "CREATE TABLE t1(t NCHAR(10))" will succeed if NCHAR is > UTF-8 and database encoding is UTF-8. Even succeed if NCHAR is > SHIFT-JIS and database encoding is UTF-8 because there is a conversion > between UTF-8 and SHIFT-JI

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-19 Thread Valentine Gogichashvili
Hi, > That may be what's important to you, but it's not what's important to >> me. >> > > National character types support may be important to some potential users > of PostgreSQL and the popularity of PostgreSQL, not me. That's why > national character support is listed in the PostgreSQL TODO

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-19 Thread Tatsuo Ishii
> On Mon, Sep 16, 2013 at 8:49 AM, MauMau wrote: >> 2. NCHAR/NVARCHAR columns can be used in non-UTF-8 databases and always >> contain Unicode data. > ... >> 3. Store strings in UTF-16 encoding in NCHAR/NVARCHAR columns. >> Fixed-width encoding may allow faster string manipulation as described in

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-19 Thread MauMau
From: "Robert Haas" That may be what's important to you, but it's not what's important to me. National character types support may be important to some potential users of PostgreSQL and the popularity of PostgreSQL, not me. That's why national character support is listed in the PostgreSQL T

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-19 Thread Robert Haas
On Wed, Sep 18, 2013 at 6:42 PM, MauMau wrote: >> It seems to me that these two points here are the real core of your >> proposal. The rest is just syntactic sugar. > > No, those are "desirable if possible" features. What's important is to > declare in the manual that PostgreSQL officially suppo

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-18 Thread MauMau
From: "Tom Lane" Another point to keep in mind is that UTF16 is not really any easier to deal with than UTF8, unless you write code that fails to support characters outside the basic multilingual plane. Which is a restriction I don't believe we'd accept. But without that restriction, you're st
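
The point quoted above, that UTF-16 is only "fixed width" if characters outside the Basic Multilingual Plane are ignored, can be checked directly. A minimal sketch (the helper names are hypothetical and not from the thread) that counts storage units per character:

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to store text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def utf8_bytes(text: str) -> int:
    """Number of bytes needed to store text in UTF-8."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    samples = {
        "ASCII letter": "a",
        "Hiragana (inside the BMP)": "\u3042",
        "CJK Extension B (outside the BMP)": "\U00020BB7",
    }
    for label, ch in samples.items():
        # One character each, but the unit counts differ per encoding.
        print(label, utf16_code_units(ch), utf8_bytes(ch))
```

The last sample needs a surrogate pair, that is, two 16-bit units for one character, so code that indexes UTF-16 strings by unit has the same variable-width problem as UTF-8.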

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-18 Thread MauMau
From: "Robert Haas" On Mon, Sep 16, 2013 at 8:49 AM, MauMau wrote: 2. NCHAR/NVARCHAR columns can be used in non-UTF-8 databases and always contain Unicode data. ... 3. Store strings in UTF-16 encoding in NCHAR/NVARCHAR columns. Fixed-width encoding may allow faster string manipulation as des

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-18 Thread Heikki Linnakangas
On 18.09.2013 16:16, Robert Haas wrote: On Mon, Sep 16, 2013 at 8:49 AM, MauMau wrote: 2. NCHAR/NVARCHAR columns can be used in non-UTF-8 databases and always contain Unicode data. ... 3. Store strings in UTF-16 encoding in NCHAR/NVARCHAR columns. Fixed-width encoding may allow faster string

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-18 Thread Tom Lane
Robert Haas writes: > On Mon, Sep 16, 2013 at 8:49 AM, MauMau wrote: >> 2. NCHAR/NVARCHAR columns can be used in non-UTF-8 databases and always >> contain Unicode data. >> ... >> 3. Store strings in UTF-16 encoding in NCHAR/NVARCHAR columns. >> Fixed-width encoding may allow faster string manipul

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-18 Thread Robert Haas
On Mon, Sep 16, 2013 at 8:49 AM, MauMau wrote: > 2. NCHAR/NVARCHAR columns can be used in non-UTF-8 databases and always > contain Unicode data. ... > 3. Store strings in UTF-16 encoding in NCHAR/NVARCHAR columns. > Fixed-width encoding may allow faster string manipulation as described in > Oracle

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-17 Thread Arulappan, Arul Shaji
>-Original Message- >From: pgsql-hackers-ow...@postgresql.org [mailto:pgsql-hackers- >ow...@postgresql.org] On Behalf Of MauMau > >Hello, > >I think it would be nice for PostgreSQL to support national character types >largely because it should ease migration from other DBMSs. > >[Reasons

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-16 Thread MauMau
Hello, I think it would be nice for PostgreSQL to support national character types largely because it should ease migration from other DBMSs. [Reasons why we need NCHAR] -- 1. Invite users of other DBMSs to PostgreSQL. Oracle, SQL Server, MySQL,

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-04 Thread Tom Lane
"Boguk, Maksym" writes: > Hi, my task is implementing ANSI NATIONAL character string types as > part of PostgreSQL core. No, that's not a given. You have a problem to solve, ie store some UTF8 strings in a database that's mostly just 1-byte data. It is not clear that NATIONAL CHARACTER is the

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-03 Thread Boguk, Maksym
>> 1)Addition of new string data types NATIONAL CHARACTER and NATIONAL >> CHARACTER VARIABLE. >> These types differ from the char/varchar data types in one important >> respect: NATIONAL string types are always have UTF8 encoding even >> (independent from used database encoding). >I don't like

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-03 Thread Tom Lane
Heikki Linnakangas writes: > On 03.09.2013 05:28, Boguk, Maksym wrote: >> Target usage: ability to store UTF8 national characters in some >> selected fields inside a single-byte encoded database. > I think we should take a completely different approach to this. Two > alternatives spring to mind

Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-03 Thread Heikki Linnakangas
On 03.09.2013 05:28, Boguk, Maksym wrote: Target usage: ability to store UTF8 national characters in some selected fields inside a single-byte encoded database. For sample if I have a ru-RU.koi8r encoded database with mostly Russian text inside, it would be nice to be able store an Japanese tex