Hello again,
We are seeing more and more hacks being applied to projects trying
to scramble around the missing FPC feature - no built-in Unicode
support.
A simple example in Lazarus: loading a UTF-8 encoded file into a TMemo.
Normally you would write code as follows (for ANSI text):
shorter (and faster) hacky crap:
ls := TStringList.Create;
ls.LoadFromFile('someunicodefile.txt');
Memo.Text := UTF8Encode(ls.Text);
ls.Free
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
Graeme Geldenhuys wrote:
Hello again,
We are seeing more and more hacks being applied to projects trying
to scramble around the missing FPC feature - no built-in Unicode
support.
A simple example in Lazarus: loading a UTF-8 encoded file into a TMemo.
Normally you would write code as
On Thu, 20 Nov 2008, Graeme Geldenhuys wrote:
All that crap just to load a simple text file that contains unicode
content!!! :-( And the other problem is that the hack above assumes
the file's content is UTF-8 encoded. If the content is UTF-16 encoded,
you need yet another hack. :-(
As far
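The UTF-8 versus UTF-16 problem described above is often handled by checking for a byte order mark before decoding. The following is only a minimal sketch of that idea, not anything proposed in the thread: the type TGuessedEncoding and the function GuessEncodingFromBom are hypothetical names, UTF-32 is ignored, and a file without a BOM (plain ANSI or BOM-less UTF-8) cannot be told apart this way.

```pascal
program BomSniff;
{$mode objfpc}{$H+}

type
  // Hypothetical result type for this sketch; not part of the RTL.
  TGuessedEncoding = (geNoBom, geUtf8, geUtf16LE, geUtf16BE);

// Inspects the first bytes of a buffer for a byte order mark.
// Buffers without a BOM report geNoBom.
function GuessEncodingFromBom(const Buf: AnsiString): TGuessedEncoding;
begin
  Result := geNoBom;
  if (Length(Buf) >= 3) and (Buf[1] = #$EF) and (Buf[2] = #$BB)
     and (Buf[3] = #$BF) then
    Result := geUtf8
  else if (Length(Buf) >= 2) and (Buf[1] = #$FF) and (Buf[2] = #$FE) then
    Result := geUtf16LE
  else if (Length(Buf) >= 2) and (Buf[1] = #$FE) and (Buf[2] = #$FF) then
    Result := geUtf16BE;
end;

var
  Sample: AnsiString;
begin
  // Build the sample byte by byte so that no source-encoding
  // conversion is involved: UTF-16LE BOM followed by "H".
  SetLength(Sample, 4);
  Sample[1] := #$FF;
  Sample[2] := #$FE;
  Sample[3] := 'H';
  Sample[4] := #0;
  WriteLn(GuessEncodingFromBom(Sample) = geUtf16LE);
end.
```

In a real loader the buffer would be the first few bytes read from the file or stream, and the geNoBom case would still need a policy (for example, assume the system encoding).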
On Thu, Nov 20, 2008 at 11:12 AM, Florian Klaempfl
[EMAIL PROTECTED] wrote:
Ok, two questions for the example above:
- how do you maintain backward compatibility?
- how do you load a plain old ansi file?
If the file is UTF-8 or ANSI, the above should work. UTF-8 was
designed to be backward
Graeme Geldenhuys wrote:
On Thu, Nov 20, 2008 at 11:12 AM, Florian Klaempfl
[EMAIL PROTECTED] wrote:
Ok, two questions for the example above:
- how do you maintain backward compatibility?
- how do you load a plain old ansi file?
If the file is UTF-8 or ANSI, the above should work. UTF-8
On Thu, 20 Nov 2008, Graeme Geldenhuys wrote:
On Thu, Nov 20, 2008 at 11:12 AM, Florian Klaempfl
[EMAIL PROTECTED] wrote:
Ok, two questions for the example above:
- how do you maintain backward compatibility?
- how do you load a plain old ansi file?
If the file is UTF-8 or ANSI, the
On Thu, Nov 20, 2008 at 11:28 AM, Daniël Mantione
[EMAIL PROTECTED] wrote:
These instructions are highly unproductive. Work on being able to compile
the RTL in either ansi/unicode depending on the platform has started.
Full Unicode support is for FPC 2.4.
Well, that's the first I heard of it.
On Thu, Nov 20, 2008 at 10:06, Graeme Geldenhuys
[EMAIL PROTECTED] wrote:
Unfortunately that doesn't work if the file contains unicode content,
so the following hack is required which is quite nasty:
ls := TStringList.Create;
ls.LoadFromFile('someunicodefile.txt');
for i := 0 to
FPC supports Unicode: in 2.3.x the UnicodeString type is available,
a ref-counted UTF-16 string on all platforms.
Is same used by TStringList ?
I don't think so, otherwise LoadFromFile should need to be aware of
several possible file encodings. And I suppose the utf8-API of the LCL
On Thu, 20 Nov 2008, Graeme Geldenhuys wrote:
On Thu, Nov 20, 2008 at 11:37 AM, Florian Klaempfl
[EMAIL PROTECTED] wrote:
FPC supports Unicode: in 2.3.x the UnicodeString type is available, a
ref-counted UTF-16 string on all platforms.
OK, I'll try to switch fpGUI's TfpgString
Full Unicode support is for FPC 2.4. If you need it today, widestrings
are your best option.
Unfortunately working with WideString in Lazarus is close to impossible
as the LCL API is done with UTF8String and there is no correct automatic
conversion between UTF8String and WideString, as the
On Thu, Nov 20, 2008 at 12:07 PM, Michael Schnell [EMAIL PROTECTED] wrote:
Russian locale requires a 1 byte char.
Hmmm. We did lots of non-Unicode Delphi programs with a Russian ANSI
variant.
Well, I have a Russian user of fpGUI. He noted quite a few issues with
FPC's locale variables and
I started a separate thread for this lazarus part of the unicode talk.
On Thu, Nov 20, 2008 at 7:37 AM, Florian Klaempfl
[EMAIL PROTECTED] wrote:
And that's why I urge all core FPC
developers to try and finalize a Unicode design. Otherwise you leave
it up to developers to keep adding
If you want to help, we need to implement the Delphi 2009 encoding
aware string type, both runtime support as well as the compiler support.
A previous discussion showed that this also breaks a lot of old code and
is not really nice.
So a better concept seems to have a dedicated type for
if a real utf8string would be a solution for Lazarus (I am not saying
it is, but it could be), we need to have a directive to change the
default string into utf8string. To avoid a huge amount of code to need
to be suddenly changed. Then only ansistring needs to be changed.
--
Felipe Monteiro de
On Thu, 20 Nov 2008, Michael Schnell wrote:
If you want to help, we need to implement the Delphi 2009 encoding aware
string type, both runtime support as well as the compiler support.
A previous discussion showed that this also breaks a lot of old code and is
not really nice.
As I
Maybe a real UTF8String?
Does this mean teaching the compiler to tell the type UTF8String from the
type ANSIString and to do the appropriate conversion automatically (and do
the assignment of constants appropriately)?
I suppose this would in fact solve a lot of problems for Lazarus.
If on top of
On Thu, 20 Nov 2008, Bernd Mueller wrote:
Felipe Monteiro de Carvalho wrote:
I would like to hear if others actually have a better proposal for Lazarus.
sorry, I have no idea since I am doing primarily embedded stuff. Speed and
backward compatibility are the most important factors to
But it seems, that not everybody is happy with the current Codegear
Unicode solution:
https://forums.codegear.com/thread.jspa?threadID=7140&tstart=0
This is neither backwards compatible, nor nice, nor fast nor small :(
After reading this thread, I am not sure, if Delphi 2009
On Thu, 20 Nov 2008, Michael Schnell wrote:
The file is assumed to be in system encoding (which can be UTF-8). Support
for reading of other encodings has not been decided on yet and is not
part of the initial plan.
What is system encoding regarding different OS, locale, ... ?
On Thu, Nov 20, 2008 at 12:50 PM, Daniël Mantione
[EMAIL PROTECTED] wrote:
What is system encoding regarding different OS, locale, ... ?
System encoding is the encoding your files are written in when doing a
echo Hello > file.txt.
Good explanation Daniël. :-) I always wonder that same
* Copy, Length, Pos etc...?
Yup.
* What about usage like: SomeString[x] := 'A';
String element based.
This also holds for Copy, Length, Pos, etc.
I think it would be a good idea to provide dedicated functions for the
element based (fast) and the character based (old style
System encoding is the encoding your files are written in when doing a
echo Hello > file.txt.
nice point :)
I suppose with my German WinXP the system encoding is German ANSI
Does it hold only for files ? I suppose WinXP provides an OS API with
WideStrings (supposedly UCS16).
But how do I have
Isn't this the same??
I understand that D2009 uses dynamic code information, while my
suggestion is based on several different (static) types.
I feel that static types are a lot easier to implement and if using them
correctly, the user can tune the program to be as fast as possible or as
UCS16
UTF16 :)
-Michael
On Thu, 20 Nov 2008, Felipe Monteiro de Carvalho wrote:
So, what kind of support could be implemented in Free Pascal to
improve things for Lazarus and its users?
Maybe a real UTF8String?
There will be a real UTF8string, i.e. ansistring with UTF-8 encoding as
part of type information,
On Thu, Nov 20, 2008 at 12:55 PM, Michael Schnell [EMAIL PROTECTED] wrote:
* What about usage like: SomeString[x] := 'A';
String element based.
This also holds for Copy, Length, Pos, etc.
I think it would be a good idea to provide dedicated functions for the
element based (fast) and the
Quoting Felipe Monteiro de Carvalho [EMAIL PROTECTED]:
if a real utf8string would be a solution for Lazarus (I am not saying
it is, but it could be), we need to have a directive to change the
default string into utf8string. To avoid a huge amount of code to need
to be suddenly changed. Then
On Thu, 20 Nov 2008, Michael Schnell wrote:
Isn't this the same??
I understand that D2009 uses dynamic code information, while my suggestion is
based on several different (static) types.
As I understand it is static.
type cp850string=ansistring(CP_850);
Daniël Mantione wrote:
On Thu, 20 Nov 2008, Felipe Monteiro de Carvalho wrote:
So, what kind of support could be implemented in Free Pascal to
improve things for Lazarus and its users?
Maybe a real UTF8String?
There will be a real UTF8string, i.e. ansistring with UTF-8 encoding
as part of
On Thu, 20 Nov 2008, Martin Friebe wrote:
Daniël Mantione wrote:
On Thu, 20 Nov 2008, Felipe Monteiro de Carvalho wrote:
So, what kind of support could be implemented in Free Pascal to
improve things for Lazarus and its users?
Maybe a real UTF8String?
There will be a real UTF8string,
For best backward compatibility, I would say Copy, Length, Pos etc
should work on a character basis by default.
The thing is we can't reasonably provide functions based on what a user
would see as a character because doing so would require huge lookup
tables (one user visible character != one
For best backward compatibility, I would say Copy, Length, Pos etc
should work on a character basis by default.
Agreed.
Then introduce more
optimised versions like ElementCopy, ElementLength, etc... Old
programs will work out of the box, but might experience a minor speed
penalty, until the
The thing is we can't reasonably provide functions based on what a
user would see as a character because doing so would require huge
lookup tables (one user visible character != one code point) so the
best we can do is code point based which isn't really much better for
most tasks than code
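The distinction drawn above between bytes, code points and user-visible characters can be made concrete with a small sketch. It assumes well-formed UTF-8 input and uses a hypothetical helper, Utf8CodePointCount, that is not an RTL function; it counts code points only and does not attempt the lookup-table work needed for user-visible characters.

```pascal
program CodePointCount;
{$mode objfpc}{$H+}

// Counts UTF-8 code points by counting every byte that is not a
// continuation byte (bit pattern 10xxxxxx). Assumes well-formed UTF-8.
function Utf8CodePointCount(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  S: AnsiString;
begin
  // "A" followed by U+0308 (combining diaeresis), built from raw bytes:
  // one user-visible character, two code points, three bytes.
  SetLength(S, 3);
  S[1] := 'A';
  S[2] := #$CC;
  S[3] := #$88;
  WriteLn('bytes:       ', Length(S));
  WriteLn('code points: ', Utf8CodePointCount(S));
end.
```

Length reports the element (byte) count, the helper reports code points, and neither matches the single character a user would see, which is the gap the message above points out.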
type cp850string=ansistring(CP_850);
utf8string=ansistring(CP_UTF8);
Why not use the current locale for this ? Would that be just ANSIString ?
a:=b; {Compiler knows conversion to perform at compile time.
I suppose the conversion function is provided with the locale and this
it
Compiler support for a unicode string is not enough for the LCL.
As long as base classes like TStrings use ansistrings, the LCL must use a
string type, that does no conversion.
Of course you are right that the RTL needs to be made up accordingly.
Maybe TStrings and friends are needed in
Ok, two questions for the example above:
- how do you maintain backward compatibility?
- how do you load a plain old ansi file?
You could alter the LoadFromFile(), LoadFromStream(), SaveToFile(),
SaveToStream() routines like below:
procedure TStringList.LoadFromFile(AFileName: TFilename;
On Thu, Nov 20, 2008 at 1:50 PM, Michael Schnell [EMAIL PROTECTED] wrote:
Compiler support for a unicode string is not enough for the LCL.
As long as base classes like TStrings use ansistrings, the LCL must use a
string type, that does no conversion.
Of course you are right that the RTL
On Thu, Nov 20, 2008 at 1:22 PM, peter green [EMAIL PROTECTED] wrote:
The thing is we can't reasonably provide functions based on what a user
would see as a character because doing so would require huge lookup tables
(one user visible character != one code point) so the best we can do is code
On 20 Nov 2008, at 13:13, Graeme Geldenhuys wrote:
I think basing those functions on code points should suffice. I also
think as soon as strings are assigned or loaded from file, they should
be normalized. So two code points like the A and Umlaut code points
would become one.
How would one
Quoting Graeme Geldenhuys [EMAIL PROTECTED]:
On Thu, Nov 20, 2008 at 1:22 PM, peter green [EMAIL PROTECTED] wrote:
The thing is we can't reasonably provide functions based on what a user
would see as a character because doing so would require huge lookup tables
(one user visible
Graeme Geldenhuys wrote:
Hello again,
We are seeing more and more hacks being applied to projects trying
to scramble around the missing FPC feature - no built-in Unicode
support.
A simple example in Lazarus: loading a UTF-8 encoded file into a TMemo.
Normally you would write code as
Hi,
Is there any list of missing features for UnicodeString in the RTL?
For example:
* I can't seem to find a UnicodeString version of TStrings or TStringList
Any more such cases? I would like to create a RTL UnicodeString
RoadMap, so the missing parts can be known and implemented.
Graeme Geldenhuys wrote:
Hi,
Is there any list of missing features for UnicodeString in the RTL?
For example:
* I can't seem to find a UnicodeString version of TStrings or TStringList
Any more such cases?
No idea, nobody complained so far ;) Just create one.
I would like
Or... it could be implemented using generics, so one can choose:
TStringList<UnicodeString>
TStringList<AnsiString>
TStringList<ShortString>
(sorry for C++ish syntax, but I hope you understand)
On Thu, Nov 20, 2008 at 15:07, Florian Klaempfl [EMAIL PROTECTED] wrote:
Graeme Geldenhuys wrote:
Hi,
On Thu, Nov 20, 2008 at 4:10 PM, Aleksa Todorovic [EMAIL PROTECTED] wrote:
Or... it could be implemented using generics, so one can choose:
TStringList<UnicodeString>
TStringList<AnsiString>
TStringList<ShortString>
(sorry for C++ish syntax, but I hope you understand)
I somehow managed to skip
Graeme Geldenhuys wrote:
On Thu, Nov 20, 2008 at 4:10 PM, Aleksa Todorovic [EMAIL PROTECTED] wrote:
Or... it could be implemented using generics, so one can choose:
TStringList<UnicodeString>
TStringList<AnsiString>
TStringList<ShortString>
(sorry for C++ish syntax, but I hope you understand)
That should be called a conversion, not a hack.
It is the same when you work with an ANSI version of Lazarus/Delphi and
then try to load a Unicode file.
As long as the ANSIString and UTF8String and String types are the same to
the compiler this question does not make too much sense.
Well those all refer to ANSI string types.
What do you mean by this ? These refer to Byte String Types
I was referring to
WideString and UnicodeString
On Thu, Nov 20, 2008 at 4:46 PM, Michael Schnell [EMAIL PROTECTED] wrote:
UTF8 _is_ a Unicode coding and thus UTF8String _should_be_ a Unicode String
type (of course it is not in the current implementation, as the compiler
can't tell it from ANSIString, but that is exactly what we are
I meant that TStringList must not do the converting; string conversion
must happen outside of TStringList (or through special methods added to
it), and without detecting the encoding inside the file in LoadFromFile
or LoadFromStream. Detection may use the Seek function on the stream, and
that breaks loading from a TCP/IP connection or
UnicodeString (the type in FPC 2.3.1) is a UTF-16
type,
I was not aware that there is a type with this name. Why does it exist ?
WideString that is not Unicode does not make much sense.
-Michael
Zaher Dirkey wrote:
I meant that TStringList must not do the converting,
If it's known that a file is in some encoding and the instance of
TStringList uses another one, I suppose LoadFromFile needs to do the
re-encoding appropriately.
-Michael
Graeme Geldenhuys wrote:
Hi
How am I supposed to handle unicode characters for locale variables?
All locale variables like ThousandSeparator are of type Char and there are
no overloaded UnicodeChar versions. This causes problems in Russian
locales as the example below shows.
c :=
On 11/20/08, Florian Klaempfl [EMAIL PROTECTED] wrote:
Well, this is one of the thousands of little problems which need to be
solved ...
OK, I'll add this to the RoadMap wiki page as well...
Regards,
- Graeme -
___
fpGUI - a cross-platform
On 11/20/08, Michael Schnell [EMAIL PROTECTED] wrote:
UnicodeString (the type in FPC 2.3.1) is a UTF-16
type,
I was not aware that there is a type with this name. Why does it exist ?
WideString that is not Unicode does not make much sense.
I'm new to this, but as far as I understand,
Hi,
I have added a Roadmap section in the following wiki page. If you find
anything missing or not 100% implemented, please add it to the wiki
page.
http://wiki.freepascal.org/FPC_Unicode_support#Roadmap_of_RTL_Unicode_support
This applies to FPC 2.3.x
Regards,
- Graeme -