Re: [HACKERS] Extensions, patch v20 (bitrot fixes)

2010-12-20 Thread Martijn van Oosterhout
On Mon, Dec 20, 2010 at 09:03:56AM +0900, Itagaki Takahiro wrote: On Mon, Dec 20, 2010 at 01:34, Tom Lane t...@sss.pgh.pa.us wrote: I agree that the default encoding is UTF-8, but it should be configurable by the 'encoding' parameter in control files. Why is it necessary to have such a

2010-12-20 Thread David Fetter
On Mon, Dec 20, 2010 at 08:01:42PM +0100, Martijn van Oosterhout wrote: On Mon, Dec 20, 2010 at 09:03:56AM +0900, Itagaki Takahiro wrote: On Mon, Dec 20, 2010 at 01:34, Tom Lane t...@sss.pgh.pa.us wrote: I agree that the default encoding is UTF-8, but it should be configurable by the

2010-12-20 Thread Tom Lane
David Fetter da...@fetter.org writes: On Mon, Dec 20, 2010 at 08:01:42PM +0100, Martijn van Oosterhout wrote: I think you mean Unicode is not a superset of all character sets. I've heard this before but never found what's missing. [citation needed]? Windows-1252, ISO-2022-JP-2 and EUC-TW are

2010-12-20 Thread Kenneth Marshall
On Mon, Dec 20, 2010 at 02:10:39PM -0500, Tom Lane wrote: David Fetter da...@fetter.org writes: On Mon, Dec 20, 2010 at 08:01:42PM +0100, Martijn van Oosterhout wrote: I think you mean Unicode is not a superset of all character sets. I've heard this before but never found what's missing.

2010-12-20 Thread David E. Wheeler
On Dec 20, 2010, at 11:53 AM, Kenneth Marshall wrote: Here is an interesting description of some of the gotchas: http://en.wikipedia.org/wiki/Windows-1252 FWIW, those are gotchas translating between Windows 1252 and Latin-1. Windows 1252's nerbles translate to UTF-8 just fine. David --
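David's point can be checked with a short Python sketch (an illustration, not code from the thread). Windows-1252 assigns printable characters such as curly quotes and the euro sign to bytes in 0x80-0x9F, where ISO-8859-1 (Latin-1) has only C1 control codes. So confusing the two garbles text, but Windows-1252 text itself round-trips through UTF-8. Note that Windows-1252 leaves a few bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) unassigned, and Python's `cp1252` codec rejects them.

```python
# Bytes for the text “Hi” €5 as encoded in Windows-1252.
raw = bytes([0x93, 0x48, 0x69, 0x94, 0x20, 0x80, 0x35])

# Decoding as Windows-1252 yields curly quotes and a euro sign.
as_cp1252 = raw.decode("cp1252")

# Decoding the same bytes as Latin-1 yields invisible C1 control characters.
as_latin1 = raw.decode("latin-1")

print(as_cp1252)
print(repr(as_latin1))

# Windows-1252 -> Unicode -> UTF-8 -> Unicode -> Windows-1252 loses nothing.
utf8 = as_cp1252.encode("utf-8")
assert utf8.decode("utf-8").encode("cp1252") == raw
```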

2010-12-20 Thread Tom Lane
Kenneth Marshall k...@rice.edu writes: On Mon, Dec 20, 2010 at 02:10:39PM -0500, Tom Lane wrote: [citation needed]? Exactly what characters are missing, and why would the Unicode people have chosen to leave them out? It's not like they've not heard of those encodings, I'm sure. Here is an

2010-12-20 Thread Kenneth Marshall
On Mon, Dec 20, 2010 at 03:08:48PM -0500, Tom Lane wrote: Kenneth Marshall k...@rice.edu writes: On Mon, Dec 20, 2010 at 02:10:39PM -0500, Tom Lane wrote: [citation needed]? Exactly what characters are missing, and why would the Unicode people have chosen to leave them out? It's not like

2010-12-20 Thread Nicolas Barbier
2010/12/20 Martijn van Oosterhout klep...@svana.org: On Mon, Dec 20, 2010 at 09:03:56AM +0900, Itagaki Takahiro wrote: UTF-8 is not a superset of all encodings. I think you mean Unicode is not a superset of all character sets. I've heard this before but never found what's missing. [citation

2010-12-20 Thread Martijn van Oosterhout
On Mon, Dec 20, 2010 at 10:15:56PM +0100, Nicolas Barbier wrote: From URL:http://en.wikipedia.org/wiki/Japanese_language_and_computers#Character_encodings: Unicode is supposed to solve all encoding problems in all languages of the world. [..] There are still controversies. For Japanese, the

2010-12-20 Thread Itagaki Takahiro
On Tue, Dec 21, 2010 at 08:04, Martijn van Oosterhout klep...@svana.org wrote: On Mon, Dec 20, 2010 at 10:15:56PM +0100, Nicolas Barbier wrote: From URL:http://en.wikipedia.org/wiki/Japanese_language_and_computers#Character_encodings: ISTM that since all the mapping tables are public it

2010-12-19 Thread Dimitri Fontaine
Hi, Thanks for your review and your time. Trying to answer some of your points there: Robert Haas robertmh...@gmail.com writes: I spent a little time looking at this tonight. I'm going to give you the same general advice that I've given other people who have submitted very large patches of

2010-12-19 Thread Robert Haas
On Sun, Dec 19, 2010 at 5:30 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: I spent a little time looking at this tonight.  I'm going to give you the same general advice that I've given other people who have submitted very large patches of this

2010-12-19 Thread Itagaki Takahiro
- Did we decide to ditch the encoding parameter for extension scripts and mandate UTF-8? No we didn't, we decided that the default encoding is UTF-8 and that any contrib script defaults to UTF-8, so that it's not necessary to care about the 'encoding' parameter in the control file there.
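The rule described here is that scripts default to UTF-8, but an `encoding` parameter in the control file can override that. It can be sketched in Python as a hypothetical illustration, not the patch's actual C code. The `key = 'value'` parsing, the helper names `read_control` and `load_script`, and the use of Python codec names for encodings are all assumptions made for the example. The naive `#` handling would also misparse a `#` inside a quoted value.

```python
def read_control(text: str) -> dict[str, str]:
    """Parse hypothetical key = 'value' lines, skipping blanks and # comments."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # naive: drops everything after '#'
        if not line:
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip().strip("'")
    return params


def load_script(script_bytes: bytes, control_text: str) -> str:
    """Decode a script using the control file's encoding, defaulting to UTF-8."""
    encoding = read_control(control_text).get("encoding", "utf-8")
    return script_bytes.decode(encoding)  # raises UnicodeDecodeError on mismatch
```

For example, a Latin-1 script loads correctly when the control file says `encoding = 'latin1'`. The same bytes fail under the UTF-8 default, because the byte 0xE9 for "é" is not valid UTF-8 on its own. That mismatch is what Tom's "most extension files will probably be pure ASCII anyway" argument bets will rarely happen.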

2010-12-19 Thread Tom Lane
Itagaki Takahiro itagaki.takah...@gmail.com writes: Oh, I wasn't aware that Itagaki-san had objected to Tom's proposal. I agree that the default encoding is UTF-8, but it should be configurable by the 'encoding' parameter in control files. Why is it necessary to have such a parameter at all?

2010-12-19 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes: Why is it necessary to have such a parameter at all? AFAICS it just adds complexity for little if any gain. Most extension files will probably be pure ASCII anyway. Dictionary files are *far* more likely to contain non-ASCII characters. If we've gotten

2010-12-19 Thread Itagaki Takahiro
On Mon, Dec 20, 2010 at 01:34, Tom Lane t...@sss.pgh.pa.us wrote: I agree that the default encoding is UTF-8, but it should be configurable by the 'encoding' parameter in control files. Why is it necessary to have such a parameter at all? UTF-8 is not a superset of all encodings. -- Itagaki