Re: [firebird-support] UTF8 in firebird ?

2012-01-10 Thread Milan Babuskov
Vander Clock Stephane wrote: Let's speak first about UTF8. UTF8 is just a way to encode special characters like è, à, etc. For this, UTF8 uses combinations of chars above ASCII #127. In this way, and not least, UTF8 stays compatible with all software that works with 8-bit strings.
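
To make the point about bytes above #127 concrete, here is a minimal Python sketch (Python's built-in `utf-8` codec is used for illustration only and is not part of the thread). It prints the byte values behind an ASCII letter and the two accented letters mentioned above; the helper name `utf8_bytes` is hypothetical.

```python
# Encode a few sample characters as UTF-8 and inspect the resulting bytes.
samples = ["A", "è", "à"]


def utf8_bytes(ch):
    """Return the UTF-8 byte values of a single character as a list of ints."""
    return list(ch.encode("utf-8"))


for ch in samples:
    # ASCII characters keep their single 7-bit value; accented ones become
    # a sequence of bytes that are all above 127.
    print(ch, [hex(b) for b in utf8_bytes(ch)])
```

Software that only passes 8-bit strings through keeps working because every byte of a multi-byte sequence stays outside the ASCII range.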

Re: [firebird-support] UTF8 in firebird ?

2012-01-10 Thread Ismael L. Donis Garcia
What are the differences between WIN1252 and ISO8859_1? Best regards, Ismael

Re: [firebird-support] UTF8 in firebird ?

2012-01-10 Thread Mark Rotteveel
On Tue, 10 Jan 2012 08:18:00 -0500, Ismael L. Donis Garcia ism...@citricos.co.cu wrote: What are the differences between WIN1252 and ISO8859_1? The encoding is a superset of ISO 8859-1, but differs from the IANA's ISO-8859-1 by using displayable characters rather than control characters in the
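
A small Python sketch of that difference, assuming Python's `cp1252` and `latin-1` codecs are reasonable stand-ins for Firebird's WIN1252 and ISO8859_1 (that mapping is an assumption made here, not something stated in the thread). The helper `differing_bytes` is hypothetical; it lists every byte value the two codecs decode differently.

```python
def differing_bytes():
    """Map each byte value to (latin-1 char, cp1252 char) where the two differ."""
    diffs = {}
    for value in range(256):
        raw = bytes([value])
        latin1 = raw.decode("latin-1")  # every byte maps to U+0000..U+00FF
        try:
            win1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            win1252 = None  # byte is unassigned in Python's cp1252 codec
        if win1252 != latin1:
            diffs[value] = (latin1, win1252)
    return diffs


diffs = differing_bytes()
print(f"bytes that differ: {min(diffs):#x}..{max(diffs):#x}")
print("0x80 decoded as cp1252:", diffs[0x80][1])
```

With these codecs, the differences are confined to the range that ISO 8859-1 reserves for control characters.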

Re: [firebird-support] UTF8 in firebird ?

2012-01-08 Thread Vander Clock Stephane
I don't know, I just tried it, and ASCII seems to accept char 127. I think internally ASCII is based on 8 bits, not 7 ... On 1/8/2012 1:33 AM, Mark Rotteveel wrote: On 7-1-2012 18:29, Vander Clock Stephane wrote: I think you're talking about raw UTF-8 bytes; as others have suggested, you should

Re: [firebird-support] UTF8 in firebird ?

2012-01-07 Thread Mark Rotteveel
On 7-1-2012 0:07, Michael Ludwig wrote: Where do you see that some bytes are forbidden in ISO8859_1? Firebird never complains about it! Then it could be said this is a bug, like here: http://tech.groups.yahoo.com/group/firebird-support/message/112680

Re: [firebird-support] UTF8 in firebird ?

2012-01-07 Thread Mark Rotteveel
On 6-1-2012 11:07, Vander Clock Stephane wrote: Of course I was speaking about code points! I'm not (yet) so crazy as to think I can put all the symbols on earth in 1 byte :) My index works perfectly, my sorting doesn't (of course)! This is why I wrote this paper about UTF8; if not, I will stay with my

Re: [firebird-support] UTF8 in firebird ?

2012-01-07 Thread Mark Rotteveel
On 6-1-2012 10:47, Vander Clock Stephane wrote: Yes, at least some options in the database (or in the create statement) to define the size in bytes of 1 UTF8 char. For example, by default 1 UTF8 char = 4 bytes (like it is now), and I would be able to customize it to be equal to 1 byte. Then

Re: [firebird-support] UTF8 in firebird ?

2012-01-07 Thread Vander Clock Stephane
dear Ann, You've got some choices. You can pick one of the almost OK character sets. You can use UTF8 and not overspecify field lengths and choose field lengths that are likely to compress well with Firebird's RLE when they're empty. Or, you can use the fairly well defined interfaces for

Re: [firebird-support] UTF8 in firebird ?

2012-01-07 Thread Mark Rotteveel
On 7-1-2012 14:36, Michael Ludwig wrote: Isn't it rather that UTF-8 just follows the *Unicode* standard which doesn't make any provisions for codepoints above 1114111 (0x10FFFF) and hence doesn't require UTF-8 to use more than four bytes for encoding? Okay, I took a look at the Unicode 6.0
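
The four-byte ceiling can be checked with a short Python sketch (Python is used for illustration only; `MAX_CODE_POINT` and `utf8_len` are hypothetical names). It encodes the highest code point, 1114111, and shows that Python refuses to build a character beyond it.

```python
MAX_CODE_POINT = 0x10FFFF  # 1114111, the last code point Unicode provides for


def utf8_len(code_point):
    """Number of bytes Python's UTF-8 codec uses for one code point."""
    return len(chr(code_point).encode("utf-8"))


print(MAX_CODE_POINT, "needs", utf8_len(MAX_CODE_POINT), "bytes")

try:
    chr(MAX_CODE_POINT + 1)
except ValueError as exc:
    # Nothing above the Unicode range can be represented as a character.
    print("rejected:", exc)
```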

Re: [firebird-support] UTF8 in firebird ?

2012-01-07 Thread Mark Rotteveel
On 7-1-2012 18:29, Vander Clock Stephane wrote: I think you're talking about raw UTF-8 bytes; as others have suggested, you should be using CHARACTER SET OCTETS. Which means no characters, just bytes (octets). Yes, sorry, I'm confused about character, code point or raw UTF8 byte... actually I

Re: [firebird-support] UTF8 in firebird ?

2012-01-07 Thread Ann Harrison
Dear Geoff, I am far from convinced that your testing reveals real-world differences between the current UTF8 implementation vs any practical alternative (which neither ISO_8859 nor OCTETS represent). Stephane's tests show that when you carry a lot of extra space around in strings, it slows

Re: [firebird-support] UTF8 in firebird ?

2012-01-07 Thread Geoff Worboys
Hi Ann, Ann Harrison wrote: Dear Geoff, I am far from convinced that your testing reveals real-world differences between the current UTF8 implementation vs any practical alternative (which neither ISO_8859 nor OCTETS represent). Stephane's tests show that when you carry a lot of extra

Re: [firebird-support] UTF8 in firebird ?

2012-01-06 Thread Mark Rotteveel
On Fri, 06 Jan 2012 00:42:25 +0400, Vander Clock Stephane svandercl...@yahoo.fr wrote: The longer term solution is for the Firebird project to look at its data representation and find something that works better with UTF8. Yes, at least some options in the database (or in the create

Re: [firebird-support] UTF8 in firebird ?

2012-01-06 Thread Lester Caine
Mark Rotteveel wrote: And you even say yourself that, in the true standard, UTF8 must be encoded in up to 6 chars even! :) That is not what I said. UTF-8 encoding was originally devised to allow for encoding 2^31 - 1 characters using variable length encoding of 1 to 6 bytes (which
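
As a rough illustration of the original 1 to 6 byte design, the Python sketch below computes how many bytes that scheme would have used for a given value. The names `LEGACY_LIMITS` and `legacy_utf8_length` are hypothetical, and the function only reports lengths; it does not produce encoded bytes.

```python
# Largest value (inclusive) that fits in 1, 2, 3, 4, 5 and 6 bytes
# under the original variable-length design.
LEGACY_LIMITS = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]


def legacy_utf8_length(code_point):
    """Byte count the original 1..6 byte scheme would need for code_point."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    for length, limit in enumerate(LEGACY_LIMITS, start=1):
        if code_point <= limit:
            return length
    raise ValueError("value does not fit in 31 bits")


print(legacy_utf8_length(0x10FFFF))   # top of today's Unicode range
print(legacy_utf8_length(2**31 - 1))  # top of the original 31-bit range
```

Everything in today's Unicode range already fits in the first four length classes, so the 5 and 6 byte forms are never needed for it.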

Re: [firebird-support] UTF8 in firebird ?

2012-01-06 Thread Michael Ludwig
Mark Rotteveel wrote on 05.01.2012 at 21:21 (+0100): On Thu, 05 Jan 2012 21:10:15 +0400, Vander Clock Stephane svandercl...@yahoo.fr wrote: Now let's say I target such countries (Portugal, Spain, France, Italy, etc.): what kind of charset will best fit my database? Of course UTF8! But

Re: [firebird-support] UTF8 in firebird ?

2012-01-06 Thread Vander Clock Stephane
Yes, at least some options in the database (or in the create statement) to define the size in bytes of 1 UTF8 char. For example, by default 1 UTF8 char = 4 bytes (like it is now), and I would be able to customize it to be equal to 1 byte. Then it is no longer UTF-8. I think you have a

Re: [firebird-support] UTF8 in firebird ?

2012-01-06 Thread Vander Clock Stephane
Vander Clock Stephane wrote: No, you can store in ISO-8859-1 ALL the UTF8 chars :) No, this is incorrect. What you can store in ISO-8859-1 are all the UTF8 codepoints, not characters. Once you understand the difference you will also understand that to do so means that none of your indexes,

Re: [firebird-support] UTF8 in firebird ?

2012-01-06 Thread Michael Ludwig
Vander Clock Stephane wrote on 06.01.2012 at 14:50 (+0400): No it isn't possible. You could attempt to store unicode codepoints in ISO-8859-1 by inventing your own encoding, Not inventing my own encoding! Simply store in ISO8859_1 the code point (1 UTF8 code point = 1 byte). There's

Re: [firebird-support] UTF8 in firebird ?

2012-01-06 Thread Ann Harrison
Hi Stephane, RLE doesn't work well with large fields that are mostly unused - better than the absence of all compression, but not great when more than 75% of every field is unused. Most applications used seven-bit ASCII when Firebird's compression was developed. Unfortunately, there are

Re: [firebird-support] UTF8 in firebird ?

2012-01-06 Thread Lester Caine
Vander Clock Stephane wrote: Keep UTF8 like it is if you want, but why not add a new charset like UTF8_SVDC that is completely equal to UTF8 except that it considers that when I write varchar(250) it means 250 bytes (or 250 code points if you prefer)? As I have already said ... unicode needs 24

Re: [firebird-support] UTF8 in firebird ?

2012-01-06 Thread Geoff Worboys
Vander Clock Stephane wrote: I don't understand: you spend so much on development to gain speed, you make it so that one can even optimize some stuff like the TcpRemoteBufferSize, and here I give you an option to make your system 2x faster easily, and the answer I get is to wear the cost??

Re: [firebird-support] UTF8 in firebird ?

2012-01-05 Thread Ann Harrison
Stéphane, I want to know if UTF8 is good in Firebird, so I did some tests. Can you give me your opinion? Firebird's compression algorithm was designed before anyone had thought about variable length data encoding, and the combination of fixed length allocations and run-length compression is

Re: [firebird-support] UTF8 in firebird ?

2012-01-05 Thread Ann Harrison
On Thu, Jan 5, 2012 at 3:21 PM, Mark Rotteveel m...@lawinegevaar.nl wrote: Now you will ask me: is there any penalty for this? After all, varchar columns are compressed? Unfortunately, as Ann indicates, the RLE used by Firebird is per byte, and not per character. This means that the
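
To picture what per-byte run-length encoding does to a mostly empty fixed-size field, here is a toy Python sketch. It is not Firebird's actual algorithm or on-disk format: the `(run_length, byte)` pair layout, the 255-byte run limit, the zero padding and the names `rle_encode` and `padded_buffer` are all assumptions made for illustration. Only the 4 bytes per UTF8 character allocation comes from the thread.

```python
def rle_encode(data, max_run=255):
    """Toy byte-wise RLE: collapse repeated bytes into (run_length, byte) pairs."""
    runs = []
    for b in data:
        if runs and runs[-1][1] == b and runs[-1][0] < max_run:
            runs[-1] = (runs[-1][0] + 1, b)  # extend the current run
        else:
            runs.append((1, b))  # start a new run
    return runs


def padded_buffer(text, declared_chars, bytes_per_char=4):
    """Fixed-size buffer for a UTF8 column: text bytes plus zero padding (assumed)."""
    raw = text.encode("utf-8")
    size = declared_chars * bytes_per_char
    if len(raw) > size:
        raise ValueError("text does not fit in the declared column")
    return raw + b"\x00" * (size - len(raw))


buf = padded_buffer("hello", 250)  # a varchar(250) at 4 bytes per char
runs = rle_encode(buf)
print(len(buf), "bytes in the buffer,", len(runs), "runs after RLE")
```

In this toy model the padding shrinks to a handful of runs, but the encoder still has to walk every byte of the oversized buffer, and the run count grows with the declared length rather than with the text actually stored.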

Re: [firebird-support] UTF8 in firebird ?

2012-01-05 Thread Geoff Worboys
Vander Clock Stephane wrote: No, you can store in ISO-8859-1 ALL the UTF8 chars :) No, this is incorrect. What you can store in ISO-8859-1 are all the UTF8 codepoints, not characters. Once you understand the difference you will also understand that to do so means that none of your indexes,
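
The effect of keeping UTF-8 bytes in a single-byte character set can be sketched in Python. This is an analogy only: Python's `latin-1` codec and `str.upper` stand in for an ISO8859_1 column and the database's string functions, which is an assumption, and `as_latin1` is a hypothetical helper.

```python
def as_latin1(text):
    """View the UTF-8 bytes of text the way a single-byte ISO 8859-1 column would."""
    return text.encode("utf-8").decode("latin-1")


stored = as_latin1("é")
print(repr(stored), "length", len(stored))  # one character is seen as two

# Upper-casing works on the two unrelated single-byte characters,
# so the original letter comes back unchanged instead of as "É".
round_trip = stored.upper().encode("latin-1").decode("utf-8")
print(round_trip, "vs", "é".upper())

# One byte per code point only works up to U+00FF.
try:
    "€".encode("latin-1")
except UnicodeEncodeError as exc:
    print("cannot store:", exc)
```

The character set treats each byte as its own character, so length, case conversion and anything else that depends on character boundaries stops matching what the user typed.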