Re: [Firebird-devel] Unicode UTF-16 etc

2013-09-03 Thread Norman Dunbar
Morning all, be gentle with me, I'm not all that good a developer! ;-) Given the problem with varchars being defined in bytes but needing to store chars, how feasible would it be to allow the definition of a column (or variable) in a manner similar to how Oracle does it? create table ...( a_c

Re: [Firebird-devel] Unicode UTF-16 etc

2013-09-02 Thread Ivan Přenosil
> Unfortunately the implementation of UTF-8 in Firebird is annoying > because it reduces the maximum allowed number of characters to 1/4 of > that for single byte character sets making it necessary to switch to > blobs sooner. IIRC, old Interbase versions defined maximum Varchar length in *byte
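
The "1/4" figure quoted above is just worst-case arithmetic: if a column is limited in bytes and each character may need up to 4 bytes, only a quarter as many characters are guaranteed to fit. A minimal sketch of that calculation follows; the name `max_chars` and the byte limit of 32765 are assumptions made for illustration and are not taken from the thread.

```python
def max_chars(byte_limit: int, max_bytes_per_char: int) -> int:
    """Characters guaranteed to fit if every character may need the worst-case width."""
    return byte_limit // max_bytes_per_char


# Assumed byte limit for a VARCHAR column (hypothetical value, not stated in the thread).
BYTE_LIMIT = 32765

for name, width in [("single-byte charset", 1), ("UTF-8 (up to 4 bytes)", 4)]:
    print(f"{name:22} {max_chars(BYTE_LIMIT, width)} characters")
```

With these assumed numbers the guaranteed capacity drops from 32765 characters to 8191, even when the stored text is plain ASCII.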

Re: [Firebird-devel] Unicode UTF-16 etc

2013-09-02 Thread Michal Kubecek
On Mon, Sep 02, 2013 at 03:30:26PM +0200, Stefan Heymann wrote: > >>> I'd prefer to have an option to use UTF-16 (treated as a 2-byte > >>> character set with surrogate pairs) as that will only halve the > >>> maximum allowed number of characters. > > The maximum allowed number of characters in Un

Re: [Firebird-devel] Unicode UTF-16 etc

2013-09-02 Thread Stefan Heymann
>>> I'd prefer to have an option to use UTF-16 (treated as a 2-byte >>> character set with surrogate pairs) as that will only halve the >>> maximum allowed number of characters. The maximum allowed number of characters in Unicode is about 1 Million. Which can be perfectly represented by either UTF

Re: [Firebird-devel] Unicode UTF-16 etc

2013-08-31 Thread Ann Harrison
On Aug 31, 2013, at 4:55 AM, Mark Rotteveel wrote: > On 29-8-2013 17:41, Jim Starkey wrote: >> Paradoxically, Japanese strings tend to be shorter in UTF-8 than 16 bit >> Unicode. The reason is simple: There are enough single byte characters >> -- punctuation, control characters, and digits --

Re: [Firebird-devel] Unicode UTF-16 etc

2013-08-31 Thread Dimitry Sibiryakov
31.08.2013 13:53, Mark Rotteveel wrote: > On 31-8-2013 13:38, Dimitry Sibiryakov wrote: >> 31.08.2013 13:29, Mark Rotteveel wrote: >>> As most languages don't need those surrogate pairs for their >>> codepoints/glyphs, it is easier to consider UTF-16 to be 2 byte. As far >>> as I know this is how m

Re: [Firebird-devel] Unicode UTF-16 etc

2013-08-31 Thread Mark Rotteveel
On 31-8-2013 13:38, Dimitry Sibiryakov wrote: > 31.08.2013 13:29, Mark Rotteveel wrote: >> As most languages don't need those surrogate pairs for their >> codepoints/glyphs, it is easier to consider UTF-16 to be 2 byte. As far >> as I know this is how most UTF-16 implementations handle it. > >

Re: [Firebird-devel] Unicode UTF-16 etc

2013-08-31 Thread Dimitry Sibiryakov
31.08.2013 13:29, Mark Rotteveel wrote: > As most languages don't need those surrogate pairs for their > codepoints/glyphs, it is easier to consider UTF-16 to be 2 byte. As far > as I know this is how most UTF-16 implementations handle it. In this case UTF-16 has no difference from UCS2. --

Re: [Firebird-devel] Unicode UTF-16 etc

2013-08-31 Thread Mark Rotteveel
On 31-8-2013 13:15, Dimitry Sibiryakov wrote: > 31.08.2013 10:55, Mark Rotteveel wrote: >> I'd prefer to have an option to use UTF-16 (treated as a 2-byte >> character set with surrogate pairs) as that will only halve the maximum >> allowed number of characters. > > Nope. If you take into accou

Re: [Firebird-devel] Unicode UTF-16 etc

2013-08-31 Thread Dimitry Sibiryakov
31.08.2013 10:55, Mark Rotteveel wrote: > I'd prefer to have an option to use UTF-16 (treated as a 2-byte > character set with surrogate pairs) as that will only halve the maximum > allowed number of characters. Nope. If you take into account surrogates, UTF-16 will have the same maximum of 4
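
The point about surrogates can be checked directly: a code point outside the Basic Multilingual Plane takes two 16-bit units in UTF-16, so its worst case is 4 bytes, the same as UTF-8. The sketch below is only an illustration using Python's built-in codecs; the helper name `bytes_per_char` and the sample characters are chosen here and do not come from the thread.

```python
def bytes_per_char(ch: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes) for a single code point."""
    # "utf-16-le" is used so that no byte order mark is counted.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


samples = {
    "U+0041 Latin A": "\u0041",
    "U+00E9 e with acute": "\u00e9",
    "U+65E5 CJK ideograph": "\u65e5",
    "U+1F600 outside the BMP": "\U0001F600",
}

for label, ch in samples.items():
    utf8, utf16 = bytes_per_char(ch)
    print(f"{label:26} UTF-8: {utf8}  UTF-16: {utf16}")
```

Only the last sample needs a surrogate pair; the first three fit in a single 16-bit unit.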

Re: [Firebird-devel] Unicode UTF-16 etc

2013-08-31 Thread Mark Rotteveel
On 29-8-2013 17:41, Jim Starkey wrote: > Paradoxically, Japanese strings tend to be shorter in UTF-8 than 16 bit > Unicode. The reason is simple: There are enough single byte characters > -- punctuation, control characters, and digits -- stay as single bytes, > double byte characters are a wash, a

Re: [Firebird-devel] Unicode UTF-16 etc

2013-08-29 Thread Jim Starkey
Paradoxically, Japanese strings tend to be shorter in UTF-8 than 16 bit Unicode. The reason is simple: There are enough single byte characters -- punctuation, control characters, and digits -- stay as single bytes, double byte characters are a wash, and the single byte characters generally bal
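
A small sketch of the mechanism described above: ASCII punctuation and digits cost 1 byte in UTF-8 but 2 in UTF-16, while most Japanese characters cost 3 bytes in UTF-8 and 2 in UTF-16. The sample strings and the helper name `encoded_lengths` are made up for illustration, so this shows how the trade-off works rather than how real Japanese text is distributed.

```python
def encoded_lengths(text: str) -> dict[str, int]:
    """Character count and encoded byte lengths of text in UTF-8 and UTF-16."""
    return {
        "chars": len(text),
        "utf8": len(text.encode("utf-8")),
        "utf16": len(text.encode("utf-16-le")),  # little-endian, no byte order mark
    }


# Hypothetical samples: one mixed with ASCII digits and punctuation, one pure kanji.
mixed = "価格: 1,200円 (税込)"
pure = "日本語"

for label, text in [("mixed", mixed), ("pure", pure)]:
    print(label, encoded_lengths(text))
```

For 1-byte and 3-byte characters only, UTF-8 comes out shorter whenever the ASCII characters outnumber the 3-byte ones: the mixed sample is 25 bytes in UTF-8 against 30 in UTF-16, while the pure kanji sample is 9 against 6.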

Re: [Firebird-devel] Unicode UTF-16 etc

2013-08-09 Thread Mark Rotteveel
On 9-8-2013 01:18, Adriano dos Santos Fernandes wrote: > On 08-08-2013 13:30, Mark Rotteveel wrote: >> Looking in the source of intl_builtin.cpp I noticed that there is >> support for UTF16, UTF32 and UNICODE_UCS2, for UNICODE_UCS2 there is >> also a constant (=8) defined in charsets.h >> >> These

Re: [Firebird-devel] Unicode UTF-16 etc

2013-08-09 Thread Paul Vinkenoog
Adriano wrote: >> Looking in the source of intl_builtin.cpp I noticed that there is >> support for UTF16, UTF32 and UNICODE_UCS2, for UNICODE_UCS2 there is >> also a constant (=8) defined in charsets.h >> >> These definitions are missing from RDB$CHARACTER_SETS. Can these be used >> as a connec

Re: [Firebird-devel] Unicode UTF-16 etc

2013-08-08 Thread Adriano dos Santos Fernandes
On 08-08-2013 13:30, Mark Rotteveel wrote: > Looking in the source of intl_builtin.cpp I noticed that there is > support for UTF16, UTF32 and UNICODE_UCS2, for UNICODE_UCS2 there is > also a constant (=8) defined in charsets.h > > These definitions are missing from RDB$CHARACTER_SETS. Can these

[Firebird-devel] Unicode UTF-16 etc

2013-08-08 Thread Mark Rotteveel
Looking in the source of intl_builtin.cpp I noticed that there is support for UTF16, UTF32 and UNICODE_UCS2, for UNICODE_UCS2 there is also a constant (=8) defined in charsets.h These definitions are missing from RDB$CHARACTER_SETS. Can these be used as a connection or column character set? If