Vander Clock Stephane wrote:
Let's speak first about UTF8.
UTF8 is just a way to encode special characters like è, à, etc.
For this, UTF8 uses combinations of bytes above ASCII #127.
In this way, and this is no small thing, UTF8 stays compatible with all
software that works with 8-bit strings.
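As a small illustration of the point above, the following Python sketch encodes a few characters as UTF-8 and looks at the resulting bytes. It only uses Python's built-in codecs and does not touch Firebird; the sample characters are chosen for illustration.

```python
# Encode a few sample characters as UTF-8 and inspect the bytes.
samples = ["A", "è", "à", "€"]
encoded = {ch: ch.encode("utf-8") for ch in samples}

for ch, raw in encoded.items():
    print(ch, [hex(b) for b in raw])

# Plain 7-bit ASCII text is byte-for-byte identical in UTF-8,
# which is why 8-bit-string software keeps working with it.
ascii_text = "firebird"
ascii_unchanged = ascii_text.encode("utf-8") == ascii_text.encode("ascii")
```

Every byte of a non-ASCII character comes out above 127, while ASCII characters keep their single-byte values.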
What differences are there between WIN1252 and ISO8859_1?
Best Regards
=
|| ISMAEL ||
=
- Original Message -
From: Milan Babuskov
To: firebird-support@yahoogroups.com
Sent: Tuesday, January 10, 2012 5:07 AM
Subject: Re: [firebird-support] UTF8 in firebird
On Tue, 10 Jan 2012 08:18:00 -0500, Ismael L. Donis Garcia
ism...@citricos.co.cu wrote:
What differences are there between WIN1252 and ISO8859_1?
The encoding is a superset of ISO 8859-1, but differs from the IANA's
ISO-8859-1 by using displayable characters rather than control characters
in the
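The difference described in the quoted definition can be seen with Python's `cp1252` and `latin-1` codecs, used here as stand-ins for WIN1252 and ISO8859_1 (an assumption; Firebird's own tables are not consulted). The three byte values are picked as examples from the 0x80-0x9F range.

```python
# Bytes 0x80-0x9F are where the two character sets differ.
raw = bytes([0x80, 0x85, 0x99])

as_win1252 = raw.decode("cp1252")   # printable: euro sign, ellipsis, trademark
as_latin1 = raw.decode("latin-1")   # C1 control characters U+0080, U+0085, U+0099

# Outside that range the two agree, e.g. 0xE8 is 'è' in both.
same_outside_range = bytes([0xE8]).decode("cp1252") == bytes([0xE8]).decode("latin-1")

print(repr(as_win1252), repr(as_latin1), same_outside_range)
```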
I don't know, I just tried and ASCII seems to accept char 127. I think
internally ASCII is based on 8 bits, not 7 ...
On 1/8/2012 1:33 AM, Mark Rotteveel wrote:
On 7-1-2012 18:29, Vander Clock Stephane wrote:
I think you're talking about raw UTF-8 bytes; as others have
suggested, you should
On 7-1-2012 0:07, Michael Ludwig wrote:
Where do you see that some bytes are forbidden in ISO8859_1?
Firebird never complains about it!
Then it could be said this is a bug, like here:
http://tech.groups.yahoo.com/group/firebird-support/message/112680
On 6-1-2012 11:07, Vander Clock Stephane wrote:
Of course I was speaking about code points! I'm not (yet) so crazy as to
think I can put all the symbols on earth in 1 byte :)
My index works perfectly; my sorting does not (of course)!
This is why I wrote this paper about UTF8; if not, I would stay with
my
On 6-1-2012 10:47, Vander Clock Stephane wrote:
yes, at least some options in the database (or in the create statement)
to define the size in bytes of 1 UTF8 char.
For example, by default 1 UTF8 char = 4 bytes (like it is now), and I would
be able to customize it to be equal to 1 byte.
Then
dear Ann,
You've got
some choices. You can pick one of the almost OK character sets. You
can use UTF8 and not overspecify field lengths and choose field
lengths that are likely to compress well with Firebird's RLE when
they're empty. Or, you can use the fairly well defined interfaces for
On 7-1-2012 14:36, Michael Ludwig wrote:
Isn't it rather that UTF-8 just follows the *Unicode* standard which
doesn't make any provisions for codepoints above 1114111 (0x10FFFF) and
hence doesn't require UTF-8 to use more than four bytes for encoding?
Okay, I took a look at the Unicode 6.0
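The four-byte limit mentioned above can be checked directly. This is a quick sketch using Python's built-in UTF-8 encoder; the boundary code points in the dictionary are chosen for illustration.

```python
MAX_CODE_POINT = 0x10FFFF  # 1114111 in decimal, the highest Unicode code point

# The highest code point still fits in four UTF-8 bytes.
top = chr(MAX_CODE_POINT).encode("utf-8")

# Encoded width at the upper boundary of each length class.
widths = {hex(cp): len(chr(cp).encode("utf-8"))
          for cp in (0x7F, 0x7FF, 0xFFFF, 0x10FFFF)}

print(top, widths)
```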
On 7-1-2012 18:29, Vander Clock Stephane wrote:
I think you're talking about raw UTF-8 bytes; as others have
suggested, you should be using CHARACTER SET OCTETS. Which
means no characters, just bytes (octets).
Yes, sorry, I'm confused about character, code point or raw UTF8 byte...
actually I
Dear Geoff,
I am far from convinced that your testing reveals real-world
differences between the current UTF8 implementation vs any
practical alternative (which neither ISO_8859 nor OCTETS
represent).
Stephane's tests show that when you carry a lot of extra space around
in strings, it slows
Hi Ann,
Ann Harrison wrote:
Dear Geoff,
I am far from convinced that your testing reveals real-world
differences between the current UTF8 implementation vs any
practical alternative (which neither ISO_8859 nor OCTETS
represent).
Stephane's tests show that when you carry a lot of extra
On Fri, 06 Jan 2012 00:42:25 +0400, Vander Clock Stephane
svandercl...@yahoo.fr wrote:
The longer term solution is for the Firebird project to look at its
data representation and find something that works better with UTF8.
yes, at least some options in the database (or in the create
Mark Rotteveel wrote:
and as you even say yourself, in the true original standard, UTF8 must
be encoded
in up to 6 chars even! :)
That is not what I said. UTF-8 encoding was originally devised to allow
for encoding 2^31 - 1 characters using variable length encoding of 1 to 6
bytes (which
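To make the 1-to-6-byte range concrete, here is a minimal sketch of the length rule of that original scheme. The function name `original_utf8_length` is hypothetical, and the thresholds are the standard boundaries of the original variable-length layout; it computes lengths only and does not produce encoded bytes.

```python
def original_utf8_length(code_point: int) -> int:
    """Byte count under the original 1-to-6-byte UTF-8 scheme (values up to 2^31 - 1)."""
    if not 0 <= code_point < 2**31:
        raise ValueError("outside the original UTF-8 range")
    # Exclusive upper bound of each length class, for 1 through 6 bytes.
    limits = [0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000]
    for length, limit in enumerate(limits, start=1):
        if code_point < limit:
            return length


# Unicode stops at 0x10FFFF, which lands in the four-byte class.
print(original_utf8_length(0x10FFFF), original_utf8_length(2**31 - 1))
```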
Mark Rotteveel schrieb am 05.01.2012 um 21:21 (+0100):
On Thu, 05 Jan 2012 21:10:15 +0400, Vander Clock Stephane
svandercl...@yahoo.fr wrote:
Now let's say I target such countries (Portugal, Spain, France,
Italy, etc.); what kind of charset will best fit my database?
Of course UTF8! But
yes, at least some options in the database (or in the create statement)
to define the size in bytes of 1 UTF8 char.
For example, by default 1 UTF8 char = 4 bytes (like it is now), and I would
be able to customize it to be equal to 1 byte.
Then it is no longer UTF-8.
I think you have a
Vander Clock Stephane wrote:
No, you can store in ISO-8859-1 ALL the UTF8 chars :)
No this is incorrect. What you can store in ISO-8859-1 are
all the UTF8 codepoints not characters. Once you understand
the difference you will also understand that to do so means
that none of your indexes,
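A rough way to see the problem being described: if UTF-8 bytes are stored in a single-byte column, the engine treats each byte as its own character. The sketch below imitates that with Python's `latin-1` codec standing in for an ISO8859_1 column (an assumption; no Firebird call is made) and shows character length and upper-casing going wrong.

```python
text = "é"
stored = text.encode("utf-8")        # two bytes: 0xC3 0xA9
misread = stored.decode("latin-1")   # what a Latin-1 column believes it holds

# One character is now counted as two.
print(len(text), len(misread))

# Upper-casing the real text gives 'É'; upper-casing the misread
# text leaves the stored bytes unchanged.
upper_ok = text.upper().encode("utf-8")
upper_broken = misread.upper().encode("latin-1")
print(upper_ok, upper_broken)
```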
Vander Clock Stephane schrieb am 06.01.2012 um 14:50 (+0400):
No it isn't possible. You could attempt to store unicode
codepoints in ISO-8859-1 by inventing your own encoding,
I'm not inventing my own encoding! I simply store the code point in
ISO8859_1 (1 UTF8 code point = 1 byte)
There's
Hi Stephane,
RLE doesn't work well with large fields that are mostly unused -
better than the absence of all compression, but not great when more
than 75% of every field is unused. Most applications used seven-bit
ASCII when Firebird's compression was developed.
Unfortunately, there are
Vander Clock Stephane wrote:
Keep UTF8 as it is if you want, but why not add a new charset like
UTF8_SVDC that is completely equal to UTF8 except that it considers that
when I write varchar(250) it means 250 bytes (or 250 code points if you prefer)?
As I have already said ... unicode needs 24
Vander Clock Stephane wrote:
I don't understand: you spend so much on development to gain
speed, you even make it possible to optimize things like
the TcpRemoteBufferSize, and here I give you an option to make
your system 2x faster easily and I get as an answer
"wear the cost"??
Stéphane,
I want to know if UTF8 is good in Firebird, so I did some
tests. Can you give me your opinion?
Firebird's compression algorithm was designed before anyone had
thought about variable length data encoding, and the combination of
fixed length allocations and run-length compression is
On Thu, Jan 5, 2012 at 3:21 PM, Mark Rotteveel m...@lawinegevaar.nl wrote:
Now you will ask me: is there any penalty for this? After all, varchar
columns are compressed?
Unfortunately, as Ann indicates, the RLE used by Firebird is per byte, and
not per character. This means that the
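To give a feel for byte-wise run-length encoding on a mostly empty UTF8 field, here is a generic sketch. It is not Firebird's actual on-disk format: the function `rle_runs`, the 127-byte run cap, and the zero-byte padding are all assumptions made for illustration. The only figure taken from the thread is the 4 bytes reserved per UTF8 character.

```python
def rle_runs(data: bytes, max_run: int = 127) -> list[tuple[int, int]]:
    """Byte-wise run-length encoding as (count, byte) pairs; max_run is an assumed cap."""
    runs = []
    i = 0
    while i < len(data):
        j = i
        # Extend the run while the byte repeats and the cap is not reached.
        while j < len(data) and data[j] == data[i] and j - i < max_run:
            j += 1
        runs.append((j - i, data[i]))
        i = j
    return runs


declared_chars = 250
buffer_size = declared_chars * 4          # 4 bytes reserved per UTF8 character
value = "Lisboa".encode("utf-8")
padded = value + b"\x00" * (buffer_size - len(value))  # assumed zero padding

padding_runs = [run for run in rle_runs(padded) if run[1] == 0]
print(len(padded), len(padding_runs))
```

Under these assumptions the unused part of the buffer needs several runs instead of one, and a buffer sized at 1 byte per character would need fewer.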