Jonas Maebe wrote:
The way to implement stuff like that is to call the appropriate library
functions. It makes no sense to completely re-implement everything in
the RTL.
Of course I have been supposing that (if possible) the RTL would call a
library function. But it makes no difference: If the
On 17 Sep 2009, at 09:33, Michael Schnell wrote:
Of course I have been supposing that (if possible) the RTL would
call a
library function. But it makes no difference: If the algorithm to
normalize the multi-character codes (if this is possible at all)
needs
a table of several GBytes and
Jonas Maebe wrote:
Neither that much space nor that much time is required.
Any pointers regarding a decent estimation ?
As there are billions of possible Unicode characters and most of them
potentially can be alternately depicted by one or multiple
multi-Unicode surrogates, I don't share
On 17 Sep 2009, at 09:55, Michael Schnell wrote:
Jonas Maebe wrote:
Neither that much space nor that much time is required.
Any pointers regarding a decent estimation ?
No.
As there are billions of possible Unicode characters and most of
them
potentially can be alternately depicted
In our previous episode, Michael Schnell said:
Jonas Maebe wrote:
Neither that much space nor that much time is required.
Any pointers regarding a decent estimation ?
http://www.stack.nl/~marcov/unicode.jpg
There is a v5 now, though.
Michael Schnell wrote:
Neither that much space nor that much time is required.
Any pointers regarding a decent estimation ?
Open a document with OpenOffice and search for text. It's as simple as
that. You are seriously exaggerating with the GByte-sized lookup
tables etc...
Maybe you
Luiz Americo Pereira Camara wrote:
Yes. RTLString would be just an alias to UnicodeString in win32 and
UTF8String in unixes
Bad news for Michael. :-) We would have to have serious documentation on
all the string types supported by FPC - I'm losing count! We would
also need a nice big
In our previous episode, Graeme Geldenhuys said:
Yes. RTLString would be just an alias to UnicodeString in win32 and
UTF8String in unixes
Bad news for Michael. :-) We would have to have serious documentation on
all the string types supported by FPC - I'm losing count! We would
also
In our previous episode, Marco van de Voort said:
Java is not portable. The VM hides the platform differences.
Maybe that came out too strong. I don't want endless discussions about this.
The point is that Java has a different philosophy with respect to portability.
Marco van de Voort wrote:
A good place to start:
http://www.stack.nl/~marcov/delphistringtypes.txt
Good god! 16 different types already! It's worse (and a lot more) than I
thought. :-)
Regards,
- Graeme -
--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
In our previous episode, Graeme Geldenhuys said:
A good place to start:
http://www.stack.nl/~marcov/delphistringtypes.txt
Good god! 16 different types already! It's worse (and a lot more) than I
thought. :-)
Not counting aliases and untyped library level usage like Lazarus UTF-8 in
Marco van de Voort wrote:
http://www.stack.nl/~marcov/unicode.jpg
Obviously a huge volume for a huge encoding scheme that really imposes a
huge number of huge problems ;).
-Michael
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
Graeme Geldenhuys wrote:
Open a document with OpenOffice and search for text.
Unicode is all about language independent coding.
Unfortunately I don't have a language independent version of Open
Office, that would be able to do this for -say- ancient Egypt.
-Michael
In our previous episode, Michael Schnell said:
Marco van de Voort wrote:
http://www.stack.nl/~marcov/unicode.jpg
Obviously a huge volume for a huge encoding scheme that really imposes a
huge number of huge problems ;).
No, the description is about one to two hundred pages, the rest are
Michael Schnell wrote:
Marco van de Voort wrote:
like Lazarus UTF-8 in
ansistring.
That already produced a huge confusion and obviously is a way that can't
decently be followed in the future.
fpGUI Toolkit is in the same boat. Using UTF-8 inside AnsiStrings. And
like Lazarus, fpGUI
On 17 Sep 2009, at 13:11, Graeme Geldenhuys wrote:
fpGUI Toolkit is in the same boat. Using UTF-8 inside AnsiStrings. And
like Lazarus, fpGUI also has various UTF-8 friendly RTL String
functions. It works relatively well, as long as the developer knows
that
he/she must rather use the
Michael Schnell wrote:
Unfortunately I don't have a language independent version of Open
Office, that would be able to do this for -say- ancient Egypt.
What I meant was, use OpenOffice to open a document with Unicode text in
it. Make sure that some visual characters are based on
On Thu, Sep 17, 2009 at 8:11 AM, Jonas Maebe jonas.ma...@elis.ugent.be wrote:
Please, not again this discussion. I think we all know what everyone thinks
about that, what the pitfalls are and this will eventually be solved by
D2009-style unicode string support.
AFAIK D2009-style solution is
Graeme Geldenhuys wrote:
What I meant was, use OpenOffice to open a document with Unicode text in
it. Make sure that some visual characters are based on normalized text
and some are based on non-normalized text (two or more characters
forming one visual character, e.g. o + ¨ = ö).
How am I
Michael Schnell wrote:
How am I supposed to input o + ¨ in Open Office in a way that the
program combines them to ö ?
Not that this is the right list for OpenOffice support - but I'll make
it quick. ;-)
I use Dvorak keyboard layout, so it makes such character input a breeze.
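The composed and decomposed spellings discussed in this exchange can also be built directly in code, without any special keyboard layout. The following is a minimal sketch, assuming Free Pascal with the UnicodeString type; the program and variable names are made up for illustration, and the combining mark used is U+0308 (COMBINING DIAERESIS) rather than the spacing ¨ shown in the message.

```pascal
program normdemo;
{$mode objfpc}{$H+}

var
  Composed, Decomposed: UnicodeString;
begin
  // Precomposed form: U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
  Composed := WideChar($00F6);

  // Decomposed form: 'o' (U+006F) followed by U+0308 COMBINING DIAERESIS
  Decomposed := WideChar($006F);
  Decomposed := Decomposed + WideChar($0308);

  // Both render as one visual character, but they hold a different
  // number of UTF-16 code units.
  WriteLn('composed length:   ', Length(Composed));
  WriteLn('decomposed length: ', Length(Decomposed));

  // The two strings differ in length and content, so a plain comparison
  // is expected to report them as different unless both are normalized
  // to the same form first.
  WriteLn('equal without normalization: ', Composed = Decomposed);
end.
```

This is why a text search has to normalize both the pattern and the document (or use a normalization-aware comparison) before it can match the two spellings.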
You can test this also using the TCharacter class
http://wiki.lazarus.freepascal.org/Theodp
procedure TForm1.Button1Click(Sender: TObject);
var a,b,c:String;
begin
a:= UTF8Encode(WideChar($41)+WideChar($030A)); //Decomposed Å
b:= UTF8Encode(WideChar($212B)); //Å Ångström
c:=
On 17 Sep 2009, at 13:34, Felipe Monteiro de Carvalho wrote:
On Thu, Sep 17, 2009 at 8:11 AM, Jonas Maebe jonas.ma...@elis.ugent.be
wrote:
Please, not again this discussion. I think we all know what
everyone thinks
about that, what the pitfalls are and this will eventually be
solved by
Vincent Snijders wrote:
http://www.stack.nl/~marcov/unicode.jpg
The electronic version is also freely available from the unicode.org
website. I can't remember the direct link, but I do have Unicode v5
chapters 1-6 here in pdf format, downloaded from unicode.org
Regards,
- Graeme -
In our previous episode, Graeme Geldenhuys said:
http://www.stack.nl/~marcov/unicode.jpg
The electronic version is also freely available from the unicode.org
website. I can't remember the direct link, but I do have Unicode v5
chapters 1-6 here in pdf format, downloaded from unicode.org
On Thu, Sep 17, 2009 at 9:42 AM, Jonas Maebe jonas.ma...@elis.ugent.be wrote:
It isn't. It is a string type whereby the string encoding is part of the
string information (just like the reference count and length already are
currently).
Ah, that string type. Previous discussions between Lazarus
Felipe Monteiro de Carvalho wrote:
On Thu, Sep 17, 2009 at 9:42 AM, Jonas Maebe jonas.ma...@elis.ugent.be wrote:
It isn't. It is a string type whereby the string encoding is part of the
string information (just like the reference count and length already are
currently).
Ah, that string
Graeme Geldenhuys wrote:
Luiz Americo Pereira Camara wrote:
Yes. RTLString would be just an alias to UnicodeString in win32 and
UTF8String in unixes
Bad news for Michael. :-) We would have to have serious documentation on
all the string types supported by FPC - I'm losing
2009/9/17 Luiz Americo Pereira Camara pascal...@bol.com.br:
RTLString is not meant to be used in client applications. It would be useful
only in functions that interact with system calls like the RTL ones having
two benefits: avoiding extra encoding conversions and the need for
duplicated RTL
Graeme Geldenhuys wrote:
2009/9/17 Luiz Americo Pereira Camara pascal...@bol.com.br:
RTLString is not meant to be used in client applications. It would be useful
only in functions that interact with system calls like the RTL ones having
two benefits: avoiding extra encoding conversions
I suppose converting a combined character into a single character is not
possible as it would need a huge table.
Michael Schnell, I thought you'd know about character.pas
http://wiki.lazarus.freepascal.org/Theodp
It does normalization:
class function Normalize_NFD(AString: UTF8String):
On Tuesday 15 September 2009 15:31:36 Thaddy wrote:
afaik widestrings are reference counted in Delphi. PWideChars not.
According to my experience, the Delphi7/Kylix3 documentation and this article:
http://edn.embarcadero.com/article/21301
WideStrings are now reference counted. In Windows, the
Jonas Maebe wrote:
There are definitions of canonical forms (both composed and
decomposed) of utf strings ...
So unless the rtl automatically offers this, the user is required to
take care of this by hand any time he tries to analyze a string in any way.
Code from hell
-Michael
On 16 Sep 2009, at 11:30, Michael Schnell wrote:
Jonas Maebe wrote:
There are definitions of canonical forms (both composed and
decomposed) of utf strings ...
So unless the rtl automatically offers this, the user is required to
take care of this by hand any time he tries to analyze a
In our previous episode, Thaddy said:
It is. Widestring always worked more or less, on both FPC,Kylix and Delphi.
But the COM backed versions (FPC2.2+ (?) and Delphi) suffered from
performance problems
As I wrote it should be opaque ( = transparent, btw).
At least for windows I overcame
Martin Schreiber wrote:
On Tuesday 15 September 2009 18:04:33 Luiz Americo Pereira Camara wrote:
In my view, to get the FPC Unicode support into a good state it would be
necessary to implement the encoding field in the string type so
converting strings can be done system independently (seems
Hi,
I am reviewing the various errors that I am experiencing with tiOPF and FPC
2.3.1. The last count was 130+ errors! Considering that there were under
10 errors with FPC 2.2.5 and tiOPF, it seems a lot in FPC has changed.
Anyway, there are a few places in tiOPF code that have been IFDEF'ed
UNICODE
On 15 Sep 2009, at 11:32, Graeme Geldenhuys wrote:
As far as I know FPC doesn't have Unicode support like Delphi 2009+
has.
Yet, when I query a WideString property, the RTTI functions now return
tkUString. tkUString is the Delphi Unicode string - but FPC doesn't
support that yet? So why is
Jonas Maebe wrote:
I guess that FPC should simply write tkWString also for this unicode
string type, since that's effectively what it is. Florian?
No. A Unicode string encoded by tkUString is a UTF-16 string.
Graeme Geldenhuys wrote:
Hi,
I am reviewing the various errors that I am experiencing with tiOPF and FPC
2.3.1. The last count was 130+ errors! Considering that there were under
10 errors with FPC 2.2.5 and tiOPF, it seems a lot in FPC has changed.
You should have tested with the unicode
Marco van de Voort wrote:
MSE has no D2009 tested code afaik.
MSE has no unit tests, period!
As far as I know unicodestring support is not at D2009 level, since the
1-byte stuff and the format of internals are still missing/different?
Exactly. Plus, from what I can see from the unit
On Tuesday 15 September 2009 11:49:13 Florian Klaempfl wrote:
Who says that? What is not supported? Which issue report? FPC 2.3.1
claims to support the UnicodeString type fully and it can't be that bad
because e.g. MSE is using it afaik.
Correct, it is enabled for MSEide+MSEgui SVN trunk which
Florian Klaempfl wrote:
You should have tested with the unicode string branch one year ago ;)
I gave up a long time ago testing unstable FPC branches with
production code. Things change too often. I only test with the new FPC
when it is announced that a new version ('fixes' branch is
Graeme Geldenhuys wrote:
Florian Klaempfl wrote:
You should have tested with the unicode string branch one year ago ;)
I gave up a long time ago testing unstable FPC branches with
production code. Things change too often. I only test with the new FPC
when it is announced that a new
Graeme Geldenhuys wrote:
Graeme Geldenhuys wrote:
Who says that? What is not supported?
Let me know how far you get with this example as well.
http://compaspascal.blogspot.com/2008/10/delphi-2009-strings-explained-by.html
Didn't we talk about *unicode* ?
Graeme Geldenhuys wrote:
Marco van de Voort wrote:
MSE has no D2009 tested code afaik.
MSE has no unit tests, period!
As far as I know unicodestring support is not at D2009 level, since the
1-byte stuff and the format of internals are still missing/different?
Exactly.
The
In our previous episode, Florian Klaempfl said:
Exactly.
The 1-byte stuff has nothing to do with unicode but with code page
aware strings.
Doesn't it have certain consequences for unicodestring- ansistring
conversions? Most notably to avoid that if a procedure has ansistring in
its
Graeme Geldenhuys wrote:
MyString := '世界您好';
MyChar := MyString[1];
writeln(MyChar);
end.
Extracting a Char from a UnicodeString? What's that supposed to do?
Micha
In our previous episode, Micha Nelissen said:
Graeme Geldenhuys wrote:
MyString := '世界您好';
MyChar := MyString[1];
writeln(MyChar);
end.
Extracting a Char from a UnicodeString? What's that supposed to do?
CHAR is a 16-bit wchar in D2009. Similarly, pchar is a pointer to a
Marco van de Voort wrote:
In our previous episode, Micha Nelissen said:
Graeme Geldenhuys wrote:
MyString := '世界您好';
MyChar := MyString[1];
writeln(MyChar);
end.
Extracting a Char from a UnicodeString? What's that supposed to do?
CHAR is a 16-bit wchar in D2009. Similarly, pchar is a
Graeme Geldenhuys wrote:
Micha Nelissen wrote:
Extracting a Char from a UnicodeString? What's that supposed to do?
Follow the URL I posted with that example.
I don't claim to know everything regarding Unicode. Florian said FPC
supports Unicode,
I said it supports the
Micha Nelissen wrote:
Graeme Geldenhuys wrote:
MyString := '世界您好';
MyChar := MyString[1];
writeln(MyChar);
end.
Extracting a Char from a UnicodeString? What's that supposed to do?
As I said, it's UCS coding style :)
Graeme Geldenhuys wrote:
Florian Klaempfl wrote:
Then you shouldn't cry if a release candidate breaks your stuff :)
I'm not crying. I go through this process on every new FPC release. It's
part of my job. :-)
We can only fix stuff we know about.
And I only know about things
Florian Klaempfl wrote:
Do you use the cwstrings unit? Did you tell the encoding (UTF-8?) to the
compiler? Did you use the UnicodeChar instead of Char?
Yes to all, and the example still doesn't work.
$ ./test3
ä
ä is not 世 as the
On 15 Sep 2009, at 13:44, Florian Klaempfl wrote:
Graeme Geldenhuys wrote:
Micha Nelissen wrote:
Extracting a Char from a UnicodeString? What's that supposed to do?
Follow the URL I posted with that example.
I don't claim to know everything regarding Unicode. Florian said FPC
In our previous episode, Micha Nelissen said:
MyChar := MyString[1];
writeln(MyChar);
end.
Extracting a Char from a UnicodeString? What's that supposed to do?
CHAR is a 16-bit wchar in D2009. Similarly, pchar is a pointer to a 16-bit
char. (pansichar being the 1-byte one).
..
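With a 16-bit Char, indexing a string yields one UTF-16 code unit, which is a full character only inside the Basic Multilingual Plane. The sketch below illustrates what "scanning" for a real character involves when a surrogate pair is present. It assumes Free Pascal's UnicodeString; the helper name CodePointAt is hypothetical and not part of the RTL.

```pascal
program surrogates;
{$mode objfpc}{$H+}

// Hypothetical helper (not an RTL function): returns the Unicode code
// point that starts at index I, combining a surrogate pair if present.
function CodePointAt(const S: UnicodeString; I: Integer): Cardinal;
var
  Lead, Trail: Cardinal;
begin
  Lead := Ord(S[I]);
  Result := Lead;
  // A lead surrogate ($D800..$DBFF) must be followed by a trail
  // surrogate ($DC00..$DFFF) to form one character.
  if (Lead >= $D800) and (Lead <= $DBFF) and (I < Length(S)) then
  begin
    Trail := Ord(S[I + 1]);
    if (Trail >= $DC00) and (Trail <= $DFFF) then
      Result := $10000 + ((Lead - $D800) shl 10) + (Trail - $DC00);
  end;
end;

var
  S: UnicodeString;
begin
  // U+1D11E (MUSICAL SYMBOL G CLEF) stored as the surrogate pair
  // $D834 $DD1E.
  S := WideChar($D834);
  S := S + WideChar($DD1E);

  WriteLn('code units:  ', Length(S));        // counts 16-bit units
  WriteLn('S[1] alone:  ', Ord(S[1]));        // only the lead surrogate
  WriteLn('code point:  ', CodePointAt(S, 1)); // the combined value
end.
```

So S[n] is cheap because it does not scan, at the price of returning half a character for anything outside the BMP.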
Jonas Maebe wrote:
The problem is that Delphi 2009's unicodestring type is something
completely different.
Correct. I don't know the 'cpstr' branch so I can't comment on that.
Regards,
- Graeme -
--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
On Tuesday 15 September 2009 12:56:08 Graeme Geldenhuys wrote:
Who says that? What is not supported?
This I found myself:
* Unicode+Variants... varUString type.
By Google'ing for some D2009 unicode examples and searching the FPC
source. Here I list a few that I found in under 5 minutes:
Martin Schreiber wrote:
I think there is a misunderstanding. FPC UnicodeString type is identical to
widestring on all platforms except Windows. On Windows widestring is actually a
non-reference-counted OLE string.
The problem is that it's very different from Delphi. Bottom line: not
On 15 Sep 2009, at 13:53, Graeme Geldenhuys wrote:
Florian Klaempfl wrote:
Do you use the cwstrings unit? Did you tell the encoding (UTF-8?)
to the
compiler? Did you use the UnicodeChar instead of Char?
Yes to all, and the example still doesn't work.
$
Jonas Maebe wrote:
On 15 Sep 2009, at 13:53, Graeme Geldenhuys wrote:
Florian Klaempfl wrote:
Do you use the cwstrings unit? Did you tell the encoding (UTF-8?) to the
compiler? Did you use the UnicodeChar instead of Char?
Yes to all, and the example still doesn't work.
Jonas Maebe wrote:
That's because you did not specify the code page of the source code,
in which case it's parsed as CP 8859-1.
Even though my Linux system defaults to UTF-8? Umm
Add {$codepage utf-8} or save
Adding that with test3.pas and then it works.
$
On 15 Sep 2009, at 14:15, Florian Klaempfl wrote:
Jonas Maebe wrote:
ä is not 世 as the website described the result to be.
That's because you did not specify the code page of the source
code, in
which case it's parsed as CP 8859-1. Add {$codepage utf-8} or save
the
file with an
On 15 Sep 2009, at 14:18, Graeme Geldenhuys wrote:
Why doesn't the widestring manager default to the system defaults of
my
platform: UTF-8?
It does default to the system code page. It's the compiler that
doesn't while parsing your source file.
Jonas
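The fix described in this exchange can be put together as one complete program. This is a minimal sketch, not the original test3.pas: the directive, the two assignments and the writeln come from the thread, while the declarations, the conditional use of cwstring and the Length call are assumptions added to make it self-contained.

```pascal
program test3;
{$mode objfpc}{$H+}
// Tell the compiler how the source file itself is encoded. Without
// this, the thread reports that the literal below is parsed as
// ISO 8859-1, whatever the system locale is.
{$codepage utf-8}

{$ifdef unix}
uses
  cwstring;  // installs a widestring manager on Unix-like systems
{$endif}

var
  MyString: UnicodeString;  // assumed declaration
  MyChar: UnicodeChar;      // assumed: UnicodeChar rather than Char
begin
  MyString := '世界您好';
  MyChar := MyString[1];       // first UTF-16 code unit of the string
  WriteLn(Length(MyString));   // number of UTF-16 code units
  WriteLn(MyChar);             // converted to the console encoding
end.
```

The directive only affects how the compiler reads constants in this source file; the run-time conversion to the terminal's encoding is the widestring manager's job, which is the distinction Jonas draws above.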
Graeme Geldenhuys wrote:
Jonas Maebe wrote:
That's because you did not specify the code page of the source code,
in which case it's parsed as CP 8859-1.
Even though my Linux system defaults to UTF-8? Umm
Add {$codepage utf-8} or save
Adding that with test3.pas and then it
Micha Nelissen wrote:
Graeme Geldenhuys wrote:
MyString := '世界您好';
MyChar := MyString[1];
writeln(MyChar);
end.
Extracting a Char from a UnicodeString? What's that supposed to do?
If Char is an 8-bit coded thing, for me this does not make sense
(using the [n] notation to take the
On Tuesday 15 September 2009 14:34:52 Jonas Maebe wrote:
I don't think that the default source code encoding has ever been
changed. And the way to specify it is also quite old already.
Wasn't a string constant containing a character > #127 treated as widestring
earlier?
Martin
Jonas Maebe wrote:
Our current unicodestring is a utf16string or something like that.
I don't claim that this is bad, but it can't be Delphi compatible at all.
-Michael
Martin Schreiber wrote:
On Windows widestring is actually a
non-reference-counted OLE string.
How can decent (and system independent) coding be done with
non-reference-counted (variable length) strings ?
-Michael
In our previous episode, Michael Schnell said:
If we really want a character, MyChar would need to be a 32-Bit thing,
and (in case of UTF, the [n] notation would need to scan the Unicode
byte stream to find it, but I don't know if it's implemented in that way.)
Afaik a character in the unicode
Florian Klaempfl wrote:
Because the source might not be written by you and because we want
consistent behaviour.
OK, thanks.
Does the $codepage only affect the WideString Manager? It doesn't
affect AnsiString (String type) at all?
Regards,
- Graeme -
--
fpGUI Toolkit - a
On 15 Sep 2009, at 14:47, Martin Schreiber wrote:
On Tuesday 15 September 2009 14:34:52 Jonas Maebe wrote:
I don't think that the default source code encoding has ever been
changed. And the way to specify it is also quite old already.
Wasn't a string constant containing a character > #127
On 15 Sep 2009, at 14:58, Graeme Geldenhuys wrote:
Does the $codepage only affect the WideString Manager? It doesn't
affect AnsiString (String type) at all?
It affects how the compiler interprets characters > #127 inside
constant strings appearing in your source code. This can affect both
On 15 Sep 2009, at 14:54, Michael Schnell wrote:
Martin Schreiber wrote:
On Windows widestring is actually a
non-reference-counted OLE string.
How can decent (and system independent) coding be done with
non-reference-counted (variable length) strings ?
Ask Microsoft and Borland. Microsoft
On Tuesday 15 September 2009 15:21:54 Martin Schreiber wrote:
On Tuesday 15 September 2009 15:07:48 Jonas Maebe wrote:
On 15 Sep 2009, at 14:54, Michael Schnell wrote:
Martin Schreiber wrote:
On Windows widestring is actually a
non-reference-counted OLE string.
How can decent (and
In our previous episode, Thaddy said:
afaik widestrings are reference counted in Delphi. PWideChars not.
Yes, but not by Delphi but by COM.
Marco van de Voort wrote:
In our previous episode, Thaddy said:
afaik widestrings are reference counted in Delphi. PWideChars not.
Yes, but not by Delphi but by COM.
On Tuesday 15 September 2009 18:04:33 Luiz Americo Pereira Camara wrote:
In my view, to get the FPC Unicode support into a good state it would be
necessary to implement the encoding field in the string type so
converting strings can be done system independently (seems to be the
case of cpstr