Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Graeme Geldenhuys
On Thu, Sep 25, 2008 at 10:33 PM, Florian Klaempfl
[EMAIL PROTECTED] wrote:

 Who says that? UTF-16 is simply chosen because it has features (supporting
 all characters basically) ANSI doesn't?

Sorry, my message was unclear and I got somewhat mixed up between ANSI
and UTF-8. I meant the encoding of String or UnicodeString being
UTF-16 instead of UTF-8. The CodeGear newsgroups are full of people
saying that UTF-16 was chosen because they could call the 'W' APIs
without needing a conversion.

My question is: has anybody actually measured the speed difference
(actual timing results) between passing a UTF-16 string to the 'W' APIs
and converting UTF-8 -> UTF-16 first and then calling the 'W' APIs? With
today's computers, I can't imagine that such conversions would cause a
significant speed loss. The difference might be milliseconds, but that's
not really a significant speed loss, is it?

So has anybody actually done a timing comparison? Do you have your
test code available? Have you published your results? I'm
interested in seeing timing results on different hardware.
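Nobody in the thread posted numbers. As a starting point, here is a minimal timing sketch, assuming FPC 3.x, where the System unit's UTF8Decode takes a RawByteString and returns a UnicodeString. It times only the UTF-8 -> UTF-16 conversion that would precede a 'W' API call, not the call itself; the sample text, the sizes and the BuildSample helper are arbitrary choices made for this example.

```pascal
program Utf8DecodeTiming;

{$mode objfpc}{$H+}

uses
  SysUtils, DateUtils;

{ Builds a UTF-8 test string by repeating 'Gr', U+00FC, U+00DF, 'e ',
  U+4E16, U+754C, ' ' (9 code points, 15 bytes). The bytes are spelled
  out so the result does not depend on the source file's code page. }
function BuildSample(Repeats: Integer): RawByteString;
const
  Chunk: array[0..14] of Byte = (
    $47, $72, $C3, $BC, $C3, $9F, $65, $20,   { 'Gr', U+00FC, U+00DF, 'e ' }
    $E4, $B8, $96, $E7, $95, $8C, $20);       { U+4E16, U+754C, ' ' }
var
  i: Integer;
begin
  SetLength(Result, Length(Chunk) * Repeats);
  for i := 0 to Repeats - 1 do
    Move(Chunk[0], Result[1 + i * Length(Chunk)], Length(Chunk));
end;

var
  Sample: RawByteString;
  Decoded: UnicodeString;
  StartTime: TDateTime;
  i: Integer;
begin
  Sample := BuildSample(10000);              { 150,000 bytes of UTF-8 }
  StartTime := Now;
  for i := 1 to 1000 do
    Decoded := UTF8Decode(Sample);           { UTF-8 -> UTF-16 }
  WriteLn('1000 conversions of ', Length(Sample), ' bytes: ',
    MilliSecondsBetween(Now, StartTime), ' ms');
  WriteLn('UTF-16 code units per conversion: ', Length(Decoded));
end.
```

Now has coarse resolution on some systems, so the loop count should be large enough that the total is well above a few milliseconds.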

I suppose it would be worth doing timing comparisons for saving text
files as well. After all, 99% of the time, text files are stored in
UTF-8. So in D2009 you would first have to convert UTF-16 to UTF-8 and
then save, and do the opposite when reading, plus check for the byte
order mark (BOM). If you used UTF-8 as the String encoding, no
conversions and no BOM checks would be required.

Regards,
  - Graeme -


___
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Daniël Mantione



On Fri, 26 Sep 2008, Graeme Geldenhuys wrote:


 On Thu, Sep 25, 2008 at 10:33 PM, Florian Klaempfl
 [EMAIL PROTECTED] wrote:

  Who says that? UTF-16 is simply chosen because it has features (supporting
  all characters basically) ANSI doesn't?

 Sorry, my message was unclear and I got somewhat mixed up between ANSI
 and UTF-8. I meant the encoding of String or UnicodeString being
 UTF-16 instead of UTF-8. The CodeGear newsgroups are full of people
 saying that UTF-16 was chosen because they could call the 'W' APIs
 without needing a conversion.

 My question is: has anybody actually measured the speed difference
 (actual timing results) between passing a UTF-16 string to the 'W' APIs
 and converting UTF-8 -> UTF-16 first and then calling the 'W' APIs? With
 today's computers, I can't imagine that such conversions would cause a
 significant speed loss. The difference might be milliseconds, but that's
 not really a significant speed loss, is it?


I think the main speed issue with UTF-8 is the speed of procedures like
val. A val which accepts both Western and Arabic digits would be
significantly more complex and therefore slower in UTF-8 than in UTF-16.
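To illustrate the point, a sketch of the digit test such a val would need; the three scripts covered and the names DigitValue and TryParseDigits are choices made for this example. In UTF-16 every digit below is a single WideChar, whereas in UTF-8 the Arabic-Indic digits take two bytes each (U+0661 is $D9 $A1), so a UTF-8 version would first have to decode a multi-byte sequence before it could classify the character.

```pascal
program MixedDigits;

{$mode objfpc}{$H+}

{ Value of a decimal digit in one of three scripts, or -1:
    U+0030..U+0039  Western (ASCII) digits
    U+0660..U+0669  Arabic-Indic digits
    U+06F0..U+06F9  Extended Arabic-Indic digits (Persian, Urdu)
  Each of these is one UTF-16 code unit, so a single WideChar test is enough. }
function DigitValue(C: WideChar): Integer;
begin
  case Ord(C) of
    $0030..$0039: Result := Ord(C) - $0030;
    $0660..$0669: Result := Ord(C) - $0660;
    $06F0..$06F9: Result := Ord(C) - $06F0;
  else
    Result := -1;
  end;
end;

{ Val-like parser for unsigned decimal numbers written in any mix of the
  scripts above. No sign, hex prefix or overflow handling in this sketch. }
function TryParseDigits(const S: UnicodeString; out Value: Int64): Boolean;
var
  i, D: Integer;
begin
  Value := 0;
  Result := Length(S) > 0;
  for i := 1 to Length(S) do
  begin
    D := DigitValue(S[i]);
    if D < 0 then
      Exit(False);
    Value := Value * 10 + D;
  end;
end;

var
  V: Int64;
begin
  { U+0661 U+0662 U+0663 is "123" in Arabic-Indic digits }
  if TryParseDigits(#$0661#$0662#$0663, V) then
    WriteLn(V);
end.
```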



 I suppose it would be worth doing timing comparisons for saving text
 files as well. After all, 99% of the time, text files are stored in
 UTF-8. So in D2009 you would first have to convert UTF-16 to UTF-8 and
 then save, and do the opposite when reading, plus check for the byte
 order mark (BOM). If you used UTF-8 as the String encoding, no
 conversions and no BOM checks would be required.


For me the speed of input/output is less relevant; it is limited by disk
speed anyway. It's the speed of processing that should be decisive.


Daniël


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Vincent Snijders

Graeme Geldenhuys wrote:

 I suppose it would be worth doing timing comparisons for saving text
 files as well. After all, 99% of the time, text files are stored in
 UTF-8.


Where did you get that number (99%) from? I don't think that is true,
except maybe if you count all ASCII files as UTF-8 too.


Vincent


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Graeme Geldenhuys
On Fri, Sep 26, 2008 at 9:04 AM, Graeme Geldenhuys
[EMAIL PROTECTED] wrote:

 So has anybody actually done a timing comparison? Do you have your
 test code available? Have you published your results? I'm
 interested in seeing timing results on different hardware.


What I'm getting at is this: if FPC implements UnicodeString as UTF-16
rather than UTF-8, it would be nice to base that decision on educated
research and not just on a hunch that UTF-16 will be faster on 1 of the
11 officially supported platforms.


Regards,
  - Graeme -




Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Florian Klaempfl
Graeme Geldenhuys wrote:
 On Thu, Sep 25, 2008 at 10:33 PM, Florian Klaempfl
 [EMAIL PROTECTED] wrote:
  Who says that? UTF-16 is simply chosen because it has features (supporting
  all characters basically) ANSI doesn't?

 Sorry, my message was unclear and I got somewhat mixed up between ANSI
 and UTF-8. I meant the encoding of String or UnicodeString being
 UTF-16 instead of UTF-8. The CodeGear newsgroups are full of people
 saying that UTF-16 was chosen because they could call the 'W' APIs
 without needing a conversion.

 My question is: has anybody actually measured the speed difference
 (actual timing results) between passing a UTF-16 string to the 'W' APIs
 and converting UTF-8 -> UTF-16 first and then calling the 'W' APIs? With
 today's computers, I can't imagine that such conversions would cause a
 significant speed loss. The difference might be milliseconds, but that's
 not really a significant speed loss, is it?

Windows has no UTF-8 string processing routines, so any case conversion,
comparison or whatever needs a UTF-8 -> UTF-16 conversion first.
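For readers who have not written one, here is a hand-rolled sketch of what that UTF-8 -> UTF-16 step involves. On Windows you would normally let MultiByteToWideChar with CP_UTF8 do it, and FPC's own UTF8Decode does the same job portably; the function name Utf8ToUtf16 is made up for this example.

```pascal
program Utf8ToUtf16Demo;

{$mode objfpc}{$H+}

{ Decodes UTF-8 into UTF-16. Code points above U+FFFF become surrogate
  pairs; malformed or truncated sequences become U+FFFD. For brevity this
  sketch does not reject overlong forms or encoded surrogates. }
function Utf8ToUtf16(const S: RawByteString): UnicodeString;
var
  i, Len, Extra, OutPos: Integer;
  CP: LongWord;
  B: Byte;
begin
  Len := Length(S);
  SetLength(Result, Len);  { never needs more code units than input bytes }
  OutPos := 0;
  i := 1;
  while i <= Len do
  begin
    B := Ord(S[i]);
    Inc(i);
    if B < $80 then begin CP := B; Extra := 0; end
    else if (B and $E0) = $C0 then begin CP := B and $1F; Extra := 1; end
    else if (B and $F0) = $E0 then begin CP := B and $0F; Extra := 2; end
    else if (B and $F8) = $F0 then begin CP := B and $07; Extra := 3; end
    else begin CP := $FFFD; Extra := 0; end;  { stray continuation byte etc. }
    while Extra > 0 do
    begin
      if (i > Len) or ((Ord(S[i]) and $C0) <> $80) then
      begin
        CP := $FFFD;  { sequence cut short; the offending byte is re-read }
        Break;
      end;
      CP := (CP shl 6) or (Ord(S[i]) and $3F);
      Inc(i);
      Dec(Extra);
    end;
    if CP > $10FFFF then
      CP := $FFFD;
    if CP > $FFFF then
    begin
      Dec(CP, $10000);  { split into a high and a low surrogate }
      Inc(OutPos); Result[OutPos] := WideChar(Word($D800 + (CP shr 10)));
      Inc(OutPos); Result[OutPos] := WideChar(Word($DC00 + (CP and $3FF)));
    end
    else
    begin
      Inc(OutPos); Result[OutPos] := WideChar(Word(CP));
    end;
  end;
  SetLength(Result, OutPos);
end;

var
  U: UnicodeString;
  i: Integer;
begin
  { 'A', U+00E9, U+4E2D and U+1F600 as UTF-8 bytes }
  U := Utf8ToUtf16('A'#$C3#$A9#$E4#$B8#$AD#$F0#$9F#$98#$80);
  for i := 1 to Length(U) do
    Write(HexStr(Ord(U[i]), 4), ' ');
  WriteLn;
end.
```

The four input characters come out as five UTF-16 code units, because U+1F600 needs a surrogate pair.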


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Aleksa Todorovic
On Fri, Sep 26, 2008 at 09:04, Graeme Geldenhuys
[EMAIL PROTECTED] wrote:
 On Thu, Sep 25, 2008 at 10:33 PM, Florian Klaempfl
 [EMAIL PROTECTED] wrote:


 I suppose it would be worth doing timing comparisons for saving text
 files as well. After all, 99% of the time, text files are stored in
 UTF-8. So in D2009 you would first have to convert UTF-16 to UTF-8 and
 then save, and do the opposite when reading, plus check for the byte
 order mark (BOM). If you used UTF-8 as the String encoding, no
 conversions and no BOM checks would be required.


That is true. But on the other hand, 99% of the time your
application will work with strings in memory, and only 1% of the time will
be spent on I/O. (OK, this is for a normal application; special cases
like databases are special cases anyway.) I don't really think that
file encoding is a strong argument for choosing the internal string
representation. When you read a text file, you will inevitably
parse it in some way, and parsing is a lot slower than simple
character conversion.

I support the decision to use UTF-16 over UTF-8. String processing is
far simpler; it's actually as simple as it should be. Have you
ever done any serious processing using UTF-8? It's not a nightmare, but
it's surely a real pain. There are no such problems with UTF-16: you
don't need to think about encodings and conversions all the time.


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Graeme Geldenhuys
On Fri, Sep 26, 2008 at 9:12 AM, Daniël Mantione
[EMAIL PROTECTED] wrote:

 For me the speed of input/output is less relevant; it is limited by disk
 speed anyway. It's the speed of processing that should be decisive.

That's highly dependent on what your application does! If your
application primarily parses text files, it's relevant. :-)


Regards,
  - Graeme -




Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Florian Klaempfl
Graeme Geldenhuys wrote:
 On Fri, Sep 26, 2008 at 9:04 AM, Graeme Geldenhuys
 [EMAIL PROTECTED] wrote:
  So has anybody actually done a timing comparison? Do you have your
  test code available? Have you published your results? I'm
  interested in seeing timing results on different hardware.


 What I'm getting at is this: if FPC implements UnicodeString as UTF-16
 rather than UTF-8, it would be nice to base that decision on educated
 research and not just on a hunch that UTF-16 will be faster on 1 of the
 11 officially supported platforms.

To be honest, IMO UTF-8 is only a hack to get Unicode onto platforms like
Unix. Further, processing UTF-16 is much easier, faster for a lot of
applications, and more memory efficient for important languages like
Chinese. If UTF-8 were easy to handle, we wouldn't have to convert
everything to UTF-32 on Unix to do case conversions, comparisons etc.
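The per-character arithmetic behind the memory-efficiency point is easy to check. This sketch just applies the UTF-8 and UTF-16 length rules to one code point at a time (the function names are made up for the example); it says nothing about whole documents, where ASCII markup mixed into the text can tip the balance the other way.

```pascal
program EncodedSizes;

{$mode objfpc}{$H+}

{ Bytes needed for one code point in UTF-8. }
function Utf8Bytes(CP: LongWord): Integer;
begin
  if CP < $80 then Result := 1
  else if CP < $800 then Result := 2
  else if CP < $10000 then Result := 3
  else Result := 4;
end;

{ Bytes needed for one code point in UTF-16: one or two 16-bit units. }
function Utf16Bytes(CP: LongWord): Integer;
begin
  if CP < $10000 then Result := 2 else Result := 4;
end;

procedure Show(const Name: string; CP: LongWord);
var
  Digits: Byte;
begin
  if CP > $FFFF then Digits := 5 else Digits := 4;
  WriteLn(Name, ' U+', HexStr(Int64(CP), Digits), ': UTF-8 ', Utf8Bytes(CP),
    ' bytes, UTF-16 ', Utf16Bytes(CP), ' bytes');
end;

begin
  Show('Latin A     ', $0041);
  Show('Cyrillic Zhe', $0416);
  Show('CJK zhong   ', $4E2D);
  Show('Emoji       ', $1F600);
end.
```

Common CJK ideographs sit in U+4E00..U+9FFF, so they take 3 bytes in UTF-8 but 2 in UTF-16; for ASCII the ratio is reversed.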


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Graeme Geldenhuys
On Fri, Sep 26, 2008 at 9:19 AM, Aleksa Todorovic [EMAIL PROTECTED] wrote:
 I support the decision to use UTF-16 over UTF-8. String processing is
 far simpler; it's actually as simple as it should be.

And is that guaranteed to work with surrogate pairs as well? The
problem is that most people assume UTF-16 = UCS-2 and never bother to
check whether surrogate pairs are properly supported, even though most
languages happen to fall within the BMP.
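The UCS-2 assumption shows up as soon as Length is taken to mean "number of characters". A minimal sketch of a surrogate-aware count (CodePointCount is a name made up for this example) makes the difference visible:

```pascal
program Utf16CodePoints;

{$mode objfpc}{$H+}

{ Counts code points in a UTF-16 string. A high surrogate (D800..DBFF)
  directly followed by a low surrogate (DC00..DFFF) is one code point,
  although Length() counts it as two code units. An unpaired surrogate
  is counted as one. }
function CodePointCount(const S: UnicodeString): Integer;
var
  i: Integer;
begin
  Result := 0;
  i := 1;
  while i <= Length(S) do
  begin
    if (Ord(S[i]) >= $D800) and (Ord(S[i]) <= $DBFF) and (i < Length(S))
      and (Ord(S[i + 1]) >= $DC00) and (Ord(S[i + 1]) <= $DFFF) then
      Inc(i, 2)
    else
      Inc(i);
    Inc(Result);
  end;
end;

var
  S: UnicodeString;
begin
  { 'a', U+4E2D and U+1F600; the last is stored as the pair D83D DE00 }
  SetLength(S, 4);
  S[1] := 'a';
  S[2] := WideChar($4E2D);
  S[3] := WideChar($D83D);
  S[4] := WideChar($DE00);
  WriteLn('Length (code units): ', Length(S));          { 4 }
  WriteLn('Code points:         ', CodePointCount(S));  { 3 }
end.
```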

 Have you
 ever done any serious processing using UTF-8? It's not a nightmare, but
 it's surely a real pain. There are no such problems with UTF-16: you
 don't need to think about encodings and conversions all the time.

Well, if you have UTF-8 versions of all the basic string processing
functions like Pos, Length, Copy, Insert etc., you don't have to think
about encodings at all. fpGUI uses UTF-8 internally, and I never have
to think about what encoding I'm working with. I assume the Lazarus LCL
is the same.
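To make that concrete, a sketch of what UTF-8 aware Length and Copy can look like. The names Utf8Length, Utf8ByteIndex and Utf8Copy are made up for this example (modern Lazarus ships similar routines in its LazUTF8 unit), and the code assumes the input is valid UTF-8.

```pascal
program Utf8Helpers;

{$mode objfpc}{$H+}

{ UTF-8 continuation bytes have the bit pattern 10xxxxxx. }
function IsContinuation(C: AnsiChar): Boolean; inline;
begin
  Result := (Ord(C) and $C0) = $80;
end;

{ Length in code points: count every byte that starts a sequence. }
function Utf8Length(const S: RawByteString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if not IsContinuation(S[i]) then
      Inc(Result);
end;

{ Byte position where code point N (1-based) starts,
  or Length(S) + 1 if S has fewer than N code points. }
function Utf8ByteIndex(const S: RawByteString; N: Integer): Integer;
var
  Count: Integer;
begin
  Count := 0;
  Result := 1;
  while Result <= Length(S) do
  begin
    if not IsContinuation(S[Result]) then
    begin
      Inc(Count);
      if Count = N then
        Exit;
    end;
    Inc(Result);
  end;
end;

{ Copy() with Index and Count in code points instead of bytes (Index >= 1). }
function Utf8Copy(const S: RawByteString; Index, Count: Integer): RawByteString;
var
  First, Last: Integer;
begin
  First := Utf8ByteIndex(S, Index);
  Last := Utf8ByteIndex(S, Index + Count);
  Result := Copy(S, First, Last - First);
end;

var
  S: RawByteString;
begin
  S := 'Gr'#$C3#$BC#$C3#$9F'e';  { 'Gr', U+00FC, U+00DF, 'e': 7 bytes, 5 code points }
  WriteLn(Length(S), ' bytes, ', Utf8Length(S), ' code points');
  WriteLn(Length(Utf8Copy(S, 3, 2)), ' bytes in Utf8Copy(S, 3, 2)');
end.
```

Note the cost: Length becomes a scan and every index has to be located by walking the bytes, which is the trade-off being argued about in this thread.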

Regards,
  - Graeme -




Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Florian Klaempfl
Graeme Geldenhuys wrote:
 On Fri, Sep 26, 2008 at 9:27 AM, Florian Klaempfl
 [EMAIL PROTECTED] wrote:
 Being honest, imo UTF-8 is only a hack to get unicode on platforms like
 unix.
 
 I don't know where you get that information, 

Rather simple: initially, in Unicode 1.0, there was only a 16-bit encoding.


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Marco van de Voort
In our previous episode, Graeme Geldenhuys said:
 Yes, I know we have had lengthy discussions about this before.
 Everybody (whoever they might be) keeps saying that UTF-16 was chosen
 for Tiburon's UnicodeString because it brings significant speed gains
 when calling the Windows API based on UTF-16, compared to the ANSI
 APIs. The whole argument is that you wouldn't need constant
 ANSI -> UTF-16 -> ANSI conversions. Now it seems Free Pascal
 developers want to base their design on those results as well (yes,
 plus the whole compatibility thing).

Well, I discussed with Florian and Michael (and Felipe a bit too) what to
do about Unicode, and I'm pretty sure the speed of Windows API calls wasn't
even mentioned.
 
 Marco Cantu, as far as I can see, is the only one that shows a
 comparison and numbers. Surprisingly, the ANSI calls were faster!

Most of this is based on the old NT books by Butler et al., where they say
all NT calls are Unicode and the ASCII ones are wrappers. However, this is
aeons old, and memory constraints are looser now, so maybe there are two
sets now. Nobody knows.

As far as Cantu goes, be very, very careful with benchmarking:

Cantu himself says this is due to repainting. Maybe with CP_UTF8
ansistrings the repainting is slower too. IOW, it is Unicode widget
painting (any encoding) vs. ANSI widget painting.



Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Marco van de Voort
In our previous episode, Aleksa Todorovic said:
  I suppose it would be worth doing timing comparisons for saving text
  files as well. After all, 99% of the time, text files are stored in
  UTF-8. So in D2009 you would first have to convert UTF-16 to UTF-8 and
  then save, and do the opposite when reading, plus check for the byte
  order mark (BOM). If you used UTF-8 as the String encoding, no
  conversions and no BOM checks would be required.

 That is true. But on the other hand, 99% of the time your
 application will work with strings in memory, and only 1% of the time will
 be spent on I/O.

This is not true. Working with database exports (simple transformations,
pump functionality) is quite a normal task for a programmer.

 I support the decision to use UTF-16 over UTF-8. String processing is
 far simpler; it's actually as simple as it should be. Have you
 ever done any serious processing using UTF-8? It's not a nightmare, but
 it's surely a real pain. There are no such problems with UTF-16.

It's no different from UTF-16 if you want to do it properly. In both you
have to look out for surrogates.

Also note that there hasn't been a final decision to go UTF-16 only. The
original idea was to have a multi-encoding string, but that was dropped
when the Tiburon reality crashed in.

Tiburon actually does this too; it also has a way of dealing with UTF-8
automatically.

IMHO any system should allow you to work with strings in the native
encoding, which means UTF-8 on *nix.


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Marco van de Voort
In our previous episode, Florian Klaempfl said:
  On Fri, Sep 26, 2008 at 9:27 AM, Florian Klaempfl
  [EMAIL PROTECTED] wrote:
  Being honest, imo UTF-8 is only a hack to get unicode on platforms like
  unix.
  
  I don't know where you get that information, 
 
 Rather simple: initially, in Unicode 1.0, there was only a 16-bit encoding.

Problem is that UTF-16 is just the same hack. And they couldn't move to
UTF-32 since it is so memory hungry.


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Michael Schnell



 Well, if you have UTF-8 versions of all the basic string processing
 functions like Pos, Length, Copy, Insert etc.

s[i] := 'x'; will be especially funny :).
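What s[i] := 'x' has to turn into with UTF-8: the old and the new character may have different byte lengths, so the rest of the string can move. A minimal sketch, assuming valid UTF-8 input; Utf8SetChar and Utf8ByteIndex are hypothetical names, not RTL routines.

```pascal
program Utf8SetCharDemo;

{$mode objfpc}{$H+}

{ Byte position where code point N (1-based) starts, or Length(S) + 1. }
function Utf8ByteIndex(const S: RawByteString; N: Integer): Integer;
var
  Count: Integer;
begin
  Count := 0;
  Result := 1;
  while Result <= Length(S) do
  begin
    if (Ord(S[Result]) and $C0) <> $80 then  { not a continuation byte }
    begin
      Inc(Count);
      if Count = N then
        Exit;
    end;
    Inc(Result);
  end;
end;

{ UTF-8 counterpart of S[N] := NewChar, where NewChar holds one UTF-8
  encoded code point. Old and new characters may differ in byte length,
  so this is a Delete + Insert that can move the rest of the string. }
procedure Utf8SetChar(var S: RawByteString; N: Integer;
  const NewChar: RawByteString);
var
  First, Next: Integer;
begin
  First := Utf8ByteIndex(S, N);
  if First > Length(S) then
    Exit;  { N is past the end: leave S unchanged }
  Next := Utf8ByteIndex(S, N + 1);
  Delete(S, First, Next - First);
  Insert(NewChar, S, First);
end;

var
  S: RawByteString;
begin
  S := 'Gr'#$C3#$BC'n';  { 'Gr', U+00FC, 'n': 4 code points, 5 bytes }
  { A naive S[3] := 'u' would overwrite only the lead byte $C3 and leave
    the continuation byte $BC behind, which is invalid UTF-8. }
  Utf8SetChar(S, 3, 'u');
  WriteLn(S, ' (', Length(S), ' bytes)');  { Grun (4 bytes) }
end.
```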

-Michael


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Michael Schnell



 It's no different from UTF-16 if you want to do it properly. In both you
 have to look out for surrogates.
  
Isn't the UTF-16 WideString in FPC (and Delphi 200x?) implemented by simply
ignoring the surrogates? (As far as I understand, a WideChar is just 16 bits;
it would need to be 32 bits if surrogates were allowed in WideStrings.)


-Michael


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Graeme Geldenhuys
On Fri, Sep 26, 2008 at 10:43 AM, Michael Schnell [EMAIL PROTECTED] wrote:

 It's no different from UTF-16 if you want to do it properly. In both you
 have to look out for surrogates.


 Isn't the UTF-16 WideString in FPC (and Delphi 200x?) implemented by simply
 ignoring the surrogates?

Let's hope not, because then it would be UCS-2 and NOT UTF-16! As far
as I know, D2009 handles this correctly, but I have no idea how.

 (As far as I understand, a WideChar is just 16 bits; it would need to
 be 32 bits if surrogates were allowed in WideStrings.)

Good question; I have been wondering about this myself. In D2009
SizeOf(Char) = 2, so I have no idea how that works with surrogate
pairs. Can anybody explain this, please?
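The short answer is that a character outside the BMP is not stored in one Char at all: standard UTF-16 splits it into two 16-bit code units, each of which fits in an ordinary 2-byte Char. A sketch of that arithmetic (the procedure names are made up for this example; how D2009 itself implements it is not shown here):

```pascal
program SurrogatePairs;

{$mode objfpc}{$H+}

{ Splits a code point in U+10000..U+10FFFF into a UTF-16 surrogate pair.
  Each half is an ordinary 16-bit WideChar, so the pair occupies two
  consecutive string elements, S[i] and S[i+1]. }
procedure EncodeSurrogates(CP: LongWord; out Hi, Lo: WideChar);
begin
  Dec(CP, $10000);                              { now a 20-bit value }
  Hi := WideChar(Word($D800 + (CP shr 10)));    { top 10 bits }
  Lo := WideChar(Word($DC00 + (CP and $3FF)));  { bottom 10 bits }
end;

{ Inverse of EncodeSurrogates. }
function DecodeSurrogates(Hi, Lo: WideChar): LongWord;
begin
  Result := ((LongWord(Ord(Hi)) - $D800) shl 10)
    + (LongWord(Ord(Lo)) - $DC00) + $10000;
end;

var
  Hi, Lo: WideChar;
begin
  EncodeSurrogates($1D11E, Hi, Lo);  { U+1D11E MUSICAL SYMBOL G CLEF }
  WriteLn('U+1D11E -> ', HexStr(Ord(Hi), 4), ' ', HexStr(Ord(Lo), 4));
  WriteLn('decoded -> U+', HexStr(Int64(DecodeSurrogates(Hi, Lo)), 5));
end.
```

This prints D834 DD1E for the pair and U+1D11E after decoding. The surrogate ranges D800..DFFF are reserved and never used for real characters, which is why a string routine can tell a half-pair from a normal Char.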


Regards,
  - Graeme -




Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Daniël Mantione



On Fri, 26 Sep 2008, Graeme Geldenhuys wrote:


 On Fri, Sep 26, 2008 at 9:12 AM, Daniël Mantione
 [EMAIL PROTECTED] wrote:

  For me the speed of input/output is less relevant; it is limited by disk
  speed anyway. It's the speed of processing that should be decisive.

 That's highly dependent on what your application does! If your
 application primarily parses text files, it's relevant. :-)


Shortstrings and ansistrings won't go away. You'll still be able to code
fast text file parsers. Note that in such cases your application won't be
processing Unicode. Taking the numbers example again: as soon as your
application accepts Arabic numerals everywhere Western numerals are allowed,
you want the parsing to happen in UTF-16.


Daniël


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Marco van de Voort
In our previous episode, Michael Schnell said:
  It's no different from UTF-16 if you want to do it properly. In both you
  have to look out for surrogates.

 Isn't the UTF-16 WideString in FPC (and Delphi 200x?) implemented by simply
 ignoring the surrogates?

No different from UTF-8 in principle. The base routines keep surrogate pairs
intact if you don't use them wrongly.

 (As far as I understand, a WideChar is just 16 bits; it would

And a Char is 8-bit, the granularity of UTF-8 without surrogates. IOW, it is
orthogonal.

 need to be 32 bits if surrogates were allowed in WideStrings.)

No, it doesn't. Windows supports surrogates, and so, AFAIK, does Tiburon. It
is just that they chose the granularity of [] to be the code unit of the
encoding rather than the character.
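What "if you don't use them wrongly" means in practice: [] and Copy work on code units, so code that cuts a string at an arbitrary position has to avoid splitting a pair itself. A minimal sketch (TruncateUtf16 is a hypothetical helper, not an RTL routine); the same idea applies to UTF-8 by backing up over continuation bytes.

```pascal
program SafeTruncate;

{$mode objfpc}{$H+}

function IsHighSurrogate(C: WideChar): Boolean; inline;
begin
  Result := (Ord(C) >= $D800) and (Ord(C) <= $DBFF);
end;

{ Keeps at most MaxUnits UTF-16 code units of S without splitting a
  surrogate pair: if the last unit kept would be a high surrogate, it is
  dropped too, so the pair is removed as a whole. }
function TruncateUtf16(const S: UnicodeString; MaxUnits: Integer): UnicodeString;
begin
  if MaxUnits >= Length(S) then
    Exit(S);
  if (MaxUnits > 0) and IsHighSurrogate(S[MaxUnits]) then
    Dec(MaxUnits);
  Result := Copy(S, 1, MaxUnits);
end;

var
  S: UnicodeString;
begin
  { 'ab', U+1F600 (as D83D DE00), 'c': 5 code units }
  SetLength(S, 5);
  S[1] := 'a';
  S[2] := 'b';
  S[3] := WideChar($D83D);
  S[4] := WideChar($DE00);
  S[5] := 'c';
  WriteLn(Length(Copy(S, 1, 3)));        { 3: ends in a lone high surrogate }
  WriteLn(Length(TruncateUtf16(S, 3)));  { 2: the pair is dropped whole }
end.
```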


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Ivo Steinmann
Graeme Geldenhuys wrote:

 My question is: has anybody actually measured the speed difference
 (actual timing results) between passing a UTF-16 string to the 'W' APIs
 and converting UTF-8 -> UTF-16 first and then calling the 'W' APIs? With
 today's computers, I can't imagine that such conversions would cause a
 significant speed loss. The difference might be milliseconds, but that's
 not really a significant speed loss, is it?


At the core of all Windows NT systems there's the NT API. The normal
WinAPI sits on top of the NT API, and the NT API itself uses UTF-16 as
its string type!

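type
  UNICODE_STRING = record
    Length: USHORT;         { length in bytes, not characters }
    MaximumLength: USHORT;  { buffer size in bytes }
    Buffer: PWSTR;          { UTF-16 code units }
  end;

const
  FileShareMode = FILE_SHARE_READ or FILE_SHARE_WRITE or FILE_SHARE_DELETE;
var
  str: UNICODE_STRING;  { utf16 type from ntapi }
  attr: OBJECT_ATTRIBUTES;
  io: IO_STATUS_BLOCK;
  ntmode: Integer;
  Handle: THandle;      { pointer-sized, so also correct on Win64 }
begin
  { Before this point, str must hold an NT path such as '\??\C:\file.txt'
    (e.g. set up with RtlInitUnicodeString) and ntmode the desired access
    mask, which must include SYNCHRONIZE for synchronous I/O. NtOpenFile
    and the NT types come from an NT API header translation not shown here. }
  attr.Length := SizeOf(attr);
  attr.RootDirectory := 0;
  attr.Attributes := 0;
  attr.ObjectName := @str;
  attr.SecurityDescriptor := nil;
  attr.SecurityQualityOfService := nil;

  NtOpenFile(@Handle, ntmode, @attr, @io, FileShareMode,
    FILE_NON_DIRECTORY_FILE or FILE_SYNCHRONOUS_IO_NONALERT);
end;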



So at its core, WinNT works with UTF-16. All ANSI WinAPI functions map
to these NT calls.

-Ivo Steinmann



Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Marco van de Voort
In our previous episode, Daniël Mantione said:
  That's highly dependent on what your application does! If your
  application primarily parses text files, it's relevant. :-)

 Shortstrings and ansistrings won't go away. You'll still be able to code
 fast text file parsers. Note that in such cases your application won't be
 processing Unicode. Taking the numbers example again: as soon as your
 application accepts Arabic numerals everywhere Western numerals are allowed,
 you want the parsing to happen in UTF-16.

Accepting both Arabic and Westernized Arabic numerals would possibly break a
lot of code anyway, since converting to string and back wouldn't be
reversible. (It actually already isn't with Delphi, I know, due to hex and
padding handling, but this would be an order of magnitude worse.)
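The hex and padding cases are easy to show with the existing Val and Str; this needs nothing Unicode-specific and is only meant to illustrate that string -> number -> string does not give back the original text even today.

```pascal
program ValStrRoundTrip;

{$mode objfpc}{$H+}

var
  V: LongInt;
  Code: Integer;
  S: string;
begin
  Val('$100', V, Code);  { '$' selects hexadecimal: V = 256, Code = 0 }
  Str(V, S);             { S = '256', not '$100' }
  WriteLn('$100 -> ', V, ' -> ', S);

  Val('  42', V, Code);  { leading spaces are skipped: V = 42 }
  Str(V, S);             { S = '42', the padding is gone }
  WriteLn('"  42" -> ', V, ' -> "', S, '"');
end.
```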

You can't separate val from str, and what would str(100,s) do?


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Daniël Mantione



On Fri, 26 Sep 2008, Graeme Geldenhuys wrote:


 On Fri, Sep 26, 2008 at 10:43 AM, Michael Schnell [EMAIL PROTECTED] wrote:

   It's no different from UTF-16 if you want to do it properly. In both you
   have to look out for surrogates.

  Isn't the UTF-16 WideString in FPC (and Delphi 200x?) implemented by simply
  ignoring the surrogates?

 Let's hope not, because then it would be UCS-2 and NOT UTF-16! As far
 as I know, D2009 handles this correctly, but I have no idea how.


Let me put it like this: Someone writing a Russian/Arabic/Japanese spell 
checker does not have to handle surrogates with UTF-16, but he does with 
UTF-8, i.e. UTF-16 is much better for them than UTF-8.


Someone writing a spell checker for old-Egyptian Hieroglyphs will have to
deal with surrogates. For those people UTF-16 has few advantages over
UTF-8 (although in practice it's still a bit easier to handle than UTF-8).


Russian, Arabic and Japanese are languages in daily use on computers;
countless electronic documents in these languages exist. There is huge
interest in software handling them, and therefore they are worth spending
our valuable time on. Egyptian Hieroglyphs are not worth spending our
valuable time on.


Some UTF-16 support should come by default, like UTF-8/UTF-16
conversion. In many situations it will not be necessary to bother with
surrogates at all. In some situations we may just accept patches if
someone is interested.


Daniël


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Ivo Steinmann
Ivo Steinmann wrote:

 At the core of all Windows NT systems there's the NT API. The normal
 WinAPI sits on top of the NT API, and the NT API itself uses UTF-16 as
 its string type!

   

And this is the OBJECT_ATTRIBUTES type:

  OBJECT_ATTRIBUTES = record
    Length: ULONG;
    RootDirectory: HANDLE;
    ObjectName: PUNICODE_STRING;
    Attributes: ULONG;
    SecurityDescriptor: PVOID;        // Points to type SECURITY_DESCRIPTOR
    SecurityQualityOfService: PVOID;  // Points to type SECURITY_QUALITY_OF_SERVICE
  end;


If FPC used the NT API instead of the WinAPI (maybe it does, no idea), it
would be faster, because there's no overhead at all :) at least with the new
UnicodeString type. The NT API is also quite close to the functions you know
as syscalls from Unix.

-Ivo


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Graeme Geldenhuys
On Fri, Sep 26, 2008 at 11:11 AM, Ivo Steinmann [EMAIL PROTECTED] wrote:

 So at its core, WinNT works with UTF-16. All ANSI WinAPI functions map
 to these NT calls.

So then there is already a conversion going on, from the ANSI API to the
UTF-16 API. I still think (and will try to put together a benchmark app
over the weekend) that the cost of converting UTF-8 -> UTF-16 before the
API call is going to be so small that it's hardly worth talking about,
especially with today's CPUs.


Regards,
  - Graeme -




Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Graeme Geldenhuys
On Fri, Sep 26, 2008 at 11:17 AM, Daniël Mantione
[EMAIL PROTECTED] wrote:

 Russian, Arabic, Japanese are languages in daily use on computers, countless
 electronic documents in these languages exist.

And most documents that exist in the world are in UTF-8 format: saved
files, HTML documents, etc. :-)

 In many situations it will not be necessary to bother with
 surrogates at all. In some situations we may just accept patches if someone
 is interested.

Oh? So what now, is FPC only going to implement UCS-2 support (like
MSEgui did)?


Regards,
  - Graeme -




Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Jonas Maebe


On 26 Sep 2008, at 10:43, Michael Schnell wrote:

 Isn't the UTF-16 WideString in FPC (and Delphi 200x?) implemented by
 simply ignoring the surrogates?


At least the Unix widestring manager fully supports surrogates (except
if you use the MSIDE-patched version, where that support has been removed
because it is considered unnecessary overhead).



Jonas



Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Daniël Mantione



On Fri, 26 Sep 2008, Marco van de Voort wrote:


 In our previous episode, Daniël Mantione said:

   That's highly dependent on what your application does! If your
   application primarily parses text files, it's relevant. :-)

  Shortstrings and ansistrings won't go away. You'll still be able to code
  fast text file parsers. Note that in such cases your application won't be
  processing Unicode. Taking the numbers example again: as soon as your
  application accepts Arabic numerals everywhere Western numerals are allowed,
  you want the parsing to happen in UTF-16.

 Accepting both Arabic and Westernized Arabic numerals would possibly break a
 lot of code anyway, since converting to string and back wouldn't be reversible.


It has never been reversible. Think about val('$100',v);


 (It actually already isn't with Delphi, I know, due to hex and padding
 handling, but this would be an order of magnitude worse.)


You want to handle it transparently. Otherwise you end up with a mess where
people need all kinds of ugly case constructs, calling a different val
routine depending on the language the program is displayed in. That way you
will never get good multilingual support.


For many people, Unicode just means "let's go UTF-8". It's far more than
that, and supporting Unicode 100% is next to impossible.



 You can't separate val from str, and what would str(100,s) do?


It could accept an extra optional parameter for the desired script or 
something like that.
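A sketch of that suggestion, assuming the script is passed as an enumerated type; TDigitScript and IntToScriptStr are invented names for illustration, not existing RTL API. With the parameter defaulting to Western digits, str(100, s) keeps its current output unless a script is requested explicitly.

```pascal
program ScriptDigits;

{$mode objfpc}{$H+}

type
  { Hypothetical script selector; not an existing RTL type. }
  TDigitScript = (dsWestern, dsArabicIndic, dsExtendedArabicIndic);

{ Str-like conversion with an optional script parameter. Only
  non-negative numbers; each digit is one UTF-16 code unit. }
function IntToScriptStr(N: QWord; Script: TDigitScript = dsWestern): UnicodeString;
const
  ZeroDigit: array[TDigitScript] of Word = ($0030, $0660, $06F0);
begin
  Result := '';
  repeat
    Result := WideChar(Word(ZeroDigit[Script] + N mod 10)) + Result;
    N := N div 10;
  until N = 0;
end;

var
  S: UnicodeString;
begin
  S := IntToScriptStr(100, dsArabicIndic);  { U+0661 U+0660 U+0660 }
  WriteLn(Length(S), ' code units, first is U+', HexStr(Ord(S[1]), 4));
  WriteLn(IntToScriptStr(100));             { default: Western digits }
end.
```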


Daniël


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Graeme Geldenhuys
On Fri, Sep 26, 2008 at 11:31 AM, Marco van de Voort [EMAIL PROTECTED] wrote:
  Someone writing a spell checker for old-Egyptian Hieroglyphs will have to
  deal with surrogates. For those people UTF-16 has few advantages over
  UTF-8 (although in practice it's still a bit easier to handle than UTF-8).

 IMHO such assumptions can be made for end-user business code (and only if
 the CJK pages above $ are ancient and not in modern use); however, the RTL
 and other libraries should simply be Unicode compliant. Period.

I fully agree. In fact, application developers shouldn't even have to
bother with encoding types etc. All string functions and string
handling in the RTL should take care of that. [I can say that, because
I have no clue how the FPC and RTL internals work. ;-) ]


Regards,
  - Graeme -




Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Daniël Mantione



On Fri, 26 Sep 2008, Marco van de Voort wrote:


In our previous episode, Daniël Mantione said:

as I know D2009 (I think) handles this correctly, but I have no idea
how.


Let me put it like this: Someone writing a Russian/Arabic/Japanese spell
checker does not have to handle surrogates with UTF-16, but he does with
UTF-8, i.e. UTF-16 is much better for them than UTF-8.


Are you sure? There is a CJK plane above $FFFF.


Chinese yes, Japanese is fully BMP.

 Afaik these are non-simplified glyphs used for titles etc. Less than normal
 script, but not that rare.


Someone writing a spell checker for old-Egyptian Hieroglyphs will have to
deal with surrogates. For those people UTF-16 has few advantages over
UTF-8 (although in practice it's still a bit easier to handle than UTF-8).


IMHO such assumptions can be made for end-user business code (and only if the
CJK pages above $FFFF are ancient and not in modern use), however the RTL and
other libraries should simply be Unicode compliant. Period.


Yes.

Daniël


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Marco van de Voort
In our previous episode, Daniël Mantione said:
 
  Accepting both Arabic and Westernized Arabic numerals would possibly break a
  lot of code anyway, since to string and back wouldn't be reversible.
 
 It has never been reversible. Think about val('$100',v);

See one line further down.
 
  actually it already isn't with Delphi, I know, due to hex and padding handling,
  but this would be an order of magnitude worse)
 
 You want to handle it transparently. Otherwise you get a mess where
 people need all kinds of ugly case constructs, having to call a
 different val routine depending on the language the program is shown in.
 That way you will never get good multi-lingual support.

IMHO one should separate GUI val from system val. IMHO it is a presentation
layer problem and should be dealt with there.
 
 For many people Unicode is just "let's go UTF-8". It's far more than that,
 and supporting Unicode 100% is even next to impossible.

Correct, but that is what I'm suggesting. UTF-16 is not a cure all either,
only at a first superficial glance. I'm btw not for UTF-8, but for working
in the native encoding per platform.

  You can't separate val from str, and what would str(100,s) do?
 
 It could accept an extra optional parameter for the desired script or 
 something like that.

If you think that is acceptable, you can also do it for val.


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Martin Schreiber
On Friday 26 September 2008 09.34:44 Graeme Geldenhuys wrote:

 Well if you have Utf-8 versions of all basic string processing
 functions like Pos, Length, Copy, Insert etc you don't have to think
 of encoding or anything. fpGUI uses UTF-8 internally, and I never have
 to think about what encoding I'm working with. I assume Lazarus LCL is
 the same.

It seems you prefer utf-8 over utf-16 for internal string encoding in a GUI 
framework. Why?
I prefer utf-16 over utf-8 for MSEide+MSEgui because *all* current users 
(including the Chinese) can use simple string index to access the characters 
of their used languages and almost nobody can use string index to access 
characters in utf-8.

Martin


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Marco van de Voort
In our previous episode, Martin Schreiber said:
  Well if you have Utf-8 versions of all basic string processing
  functions like Pos, Length, Copy, Insert etc you don't have to think
  of encoding or anything. fpGUI uses UTF-8 internally, and I never have
  to think about what encoding I'm working with. I assume Lazarus LCL is
  the same.
 
 It seems you prefer utf-8 over utf-16 for internal string encoding in a GUI 
 framework. Why?

 I prefer utf-16 over utf-8 for MSEide+MSEgui because *all* current users 
 (including the Chinese) can use simple string index to access the
 characters

See my previous discussion with Daniel. There is a CJK block over $FFFF
(afaik containing non-simplified Chinese). Moreover, with Vista there are no
special fonts or East Asia versions needed anymore to use these.

 of their used languages and almost nobody can use string index to access 
 characters in utf-8.

If you do it right, you can't with UTF-16 either. Moreover, you get a split
between the encoding used for GUI (utf-16, as forced by you), and a system
using UTF-8 on e.g. the free unices.

This was originally the reason for FPC to support at least both encodings:
UTF-8 users can, for those few routines in their business code where they
must hack something character-based together, simply declare those routines
with a forced UTF-16 string type, and the system will autoconvert, without
the entire system having to be UTF-16.
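A minimal Free Pascal sketch of that pattern, assuming the System unit's UTF8Decode/UTF8Encode for the conversion (written out explicitly here instead of relying on automatic conversion). MaskDigits is a hypothetical stand-in for such a character-based routine:

```pascal
program utf16routine;
{$mode objfpc}{$H+}

// Hypothetical character-based routine: it indexes code units directly,
// so it is declared with a UTF-16 (WideString) parameter.
// Surrogate halves are never in '0'..'9', so they pass through unchanged;
// a routine that must treat a pair as one character would need more care.
function MaskDigits(const S: WideString): WideString;
var
  i: Integer;
begin
  Result := S;
  for i := 1 to Length(Result) do
    if (Result[i] >= '0') and (Result[i] <= '9') then
      Result[i] := '#';
end;

var
  W: WideString;
  U: UTF8String;
begin
  // The application keeps its text in UTF-8: here U+00E4 followed by '12'.
  W := WideChar($00E4);
  W := W + '12';
  U := UTF8Encode(W);

  // Convert to UTF-16 only at the boundary of the character-based routine.
  U := UTF8Encode(MaskDigits(UTF8Decode(U)));
  WriteLn(Length(U), ' bytes'); // 2 bytes for U+00E4, then '#', '#'
end.
```

The rest of the program never leaves UTF-8; only the one routine pays for the round trip.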


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Graeme Geldenhuys
On Fri, Sep 26, 2008 at 11:46 AM, Martin Schreiber [EMAIL PROTECTED] wrote:
 It seems you prefer utf-8 over utf-16 for internal string encoding in a GUI
 framework. Why?
 I prefer utf-16 over utf-8 for MSEide+MSEgui because *all* current users
 (including the Chinese) can use simple string index to access the characters
 of their used languages and almost nobody can use string index to access
 characters in utf-8.

In my years of experience, string index access is not a requirement.
In the last four years working on our current project I have still not
had a need for string index access. It's an overrated statement as far
as I'm concerned.

UTF8CharAtByte() or UTF8Copy() if needed is fine for me.  And if you
are parsing a string, it happens sequentially anyway, so it's very
easy to track characters in a utf8 string.
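A minimal sketch of such a sequential scan, assuming valid UTF-8 held in a plain AnsiString; CountUTF8CodePoints is a hypothetical helper, not the fpGUI or LCL routine:

```pascal
program utf8count;
{$mode objfpc}{$H+}

// Counts code points in UTF-8 text with one forward pass over the bytes.
// Each code point starts with exactly one byte that is not a continuation
// byte; continuation bytes have the bit pattern 10xxxxxx ($80..$BF).
// Assumption: the input is valid UTF-8 (nothing is validated here).
function CountUTF8CodePoints(const S: string): SizeInt;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) and $C0) <> $80 then
      Inc(Result);
end;

var
  S: string;
begin
  // 'a', U+00E4 (bytes $C3 $A4), 'b': 4 bytes, 3 code points
  SetLength(S, 4);
  S[1] := 'a';
  S[2] := #$C3;
  S[3] := #$A4;
  S[4] := 'b';
  WriteLn(Length(S), ' bytes, ', CountUTF8CodePoints(S), ' code points');
end.
```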


Regards,
  - Graeme -




Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Ivo Steinmann
Marco van de Voort wrote:
  
   
 For many people Unicode is just "let's go UTF-8". It's far more than that,
 and supporting Unicode 100% is even next to impossible.
 

 Correct, but that is what I'm suggesting. UTF-16 is not a cure all either,
 only at a first superficial glance. I'm btw not for UTF-8, but for working
 in the native encoding per platform.

   
I guess that would be one of the best solutions. Having a system unicode
string type and then some specialized string types.

SysString
UTF8String
UTF16String
UTF32String



Anyway, I still think something like this would be nice ;) I already have
an implementation of such a system. I don't think it's the best
solution (there's no best solution), but it's not a bad one. Let's see
what the next Delphi version brings; my code works like this:

type
  TMapFunction = function(const Dest: Pointer; const Source: Pointer): Integer;

  PEncodings = ^TEncodings;
  TEncodings = record
    signsize: Integer;     // 1, 2 or 4
    encode: TMapFunction;  // encode some ucs32 string to this encoding
    decode: TMapFunction;  // decode this encoding to ucs32 buffer
  end;

const
  MyOwnEncodings: TEncodings = (
    Foo: 
    Bar: 
  );

type
  SysString = UnicodeString[SystemEncoding];
  UTF8String = UnicodeString[UTF-8];
  UTF16String = UnicodeString[UTF-16];
  MyOwnString = UnicodeString[MyOwnEncodings];


Then you can assign all specialized string types to UnicodeString, but
you can't change the encoding of a UnicodeString (either it's not
changeable at all or it's locked):

TUnicodeStringRec = record
  Encoding: PEncodings;
  Locked: Boolean;     // locked encoding: SetEncoding(S, someEncoding) is not possible
  CodeCount: Integer;  // number of signs
  RefCount: Integer;   // reference counter
  Length: Integer;     // number of chars
  FirstChar: Byte/Word/Longword;
end;


Locked is always true after you have assigned a specialized string
to a UnicodeString, e.g.:

S1: UTF8String;
S2: UnicodeString;

S1 := 'foobar';
S2 := S1;
SetEncoding(S2, UTF16);   // -> exception

For fast string processing, it's easy to convert a string to UCS32:

S1: UTF8String;
S2: UCS32String;
P: PUCS32Char;

S2 := S1;
for i := 0 to length(S2) - 1 do
  S2[i] := 'X';
S1 := S2;

or

P := PUCS32Char(S2);
while P^ <> 0 do
begin
  P^ := 'X';
  Inc(P);
end;
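The UCS32 round trip can be approximated with types FPC already has. A minimal sketch, assuming the System unit's UCS4String (one UCS4Char per code point plus a terminating 0 element) and its WideStringToUCS4String/UCS4StringToWideString conversions; ReplaceAllWithX is a hypothetical name, and this is not the proposed UnicodeString[...] syntax:

```pascal
program ucs4demo;
{$mode objfpc}{$H+}

// Replaces every code point of a UTF-8 string with 'X' by working on a
// fixed-width UCS4String, where each array element is one code point.
// FPC's UCS4String carries a terminating 0 element, which is skipped.
function ReplaceAllWithX(const U: UTF8String): UTF8String;
var
  A: UCS4String;
  i: SizeInt;
begin
  A := WideStringToUCS4String(UTF8Decode(U));
  for i := 0 to High(A) - 1 do
    A[i] := Ord('X');
  Result := UTF8Encode(UCS4StringToWideString(A));
end;

var
  W: WideString;
begin
  W := WideChar($00E4);                             // 2 bytes in UTF-8
  WriteLn(Length(ReplaceAllWithX(UTF8Encode(W))));  // 1: a single 'X'
  WriteLn(ReplaceAllWithX('foobar'));               // XXXXXX
end.
```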


-Ivo Steinmann


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Marco van de Voort
In our previous episode, Martin Schreiber said:
 
 Hmm, you should ask the Russian users for example if they prefer MSEgui 
 utf-16 
 internal encoding or Lazarus utf-8.

Users always look short term, and want to change as little as possible. 

This goes both for UTF-16 (with the "is UCS-2" approximation, keeping the old
ways of string indexing) and for UTF-8 (as a superset of ANSI, avoiding
multiple file types (no endianness)).

Note that e.g. source code seems to be moving en masse in the direction of UTF-8
(even Tiburon, which works exclusively on Windows, a UTF-16 platform, saves
source by default as UTF-8 afaik).

Anyway, I think a mix of UTF-8 and UTF-16 is here to stay, so better deal
with it. UTF-8 won't go away as legacy anytime soon.

It's the developers' responsibility to keep an eye on the long-term
direction of a toolchain.


Re[2]: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread JoshyFun
Hello Graeme,

Friday, September 26, 2008, 10:50:43 AM, you wrote:

GG Good question and I have been wondering about this myself.  In D2009
GG SizeOf(Char) = 2, so I have no idea how that works with surrogate
GG pairs. Can anybody explain this please?

I don't know how D2009 and others do it, but I use strings in two
flavors:

raw mode: String[x]:=Y; where you access each char unit directly; no
surrogate pairs or packing information is taken into account.

text mode: Substr, leftstr and the like, where the surrogate pairs are
used and processed.
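A minimal sketch of the two flavors for UTF-16, assuming a WideString holding well-formed UTF-16 and using the low-surrogate range $DC00..$DFFF to recognize the second half of a pair; CountUTF16CodePoints is a hypothetical helper:

```pascal
program utf16modes;
{$mode objfpc}{$H+}

// "Text mode" length: number of code points in a UTF-16 string.
// A low surrogate ($DC00..$DFFF) is the second half of a pair that was
// already counted at its high surrogate, so it is skipped.
// Assumption: well-formed UTF-16 (no unpaired surrogates).
function CountUTF16CodePoints(const S: WideString): SizeInt;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 1 to Length(S) do
    if (Ord(S[i]) < $DC00) or (Ord(S[i]) > $DFFF) then
      Inc(Result);
end;

var
  W: WideString;
begin
  // U+1D11E (musical symbol G clef) is the surrogate pair $D834 $DD1E.
  W := WideChar($D834);
  W := W + WideChar($DD1E) + 'a';
  WriteLn('raw mode, code units: ', Length(W));                // 3
  WriteLn('text mode, code points: ', CountUTF16CodePoints(W)); // 2
end.
```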

-- 
Best regards,
 JoshyFun



Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Marco van de Voort
In our previous episode, Ivo Steinmann said:
  in the native encoding per platform.
 

 I guess that would be one of the best solutions. Having a system unicode
 string type and then some specialized string types.
 
 SysString
 UTF8String
 UTF16String
 UTF32String
 Anyway, I still think something like this would be nice ;)

This originally was the plan. The implementation differed however between
different solutions due to problems with automated conversions.

However, it turned out that Tiburon made a different choice and chose to
keep tunicodestring UTF-16 only, and map UTF-8 onto ansistring (and add
codepage support to ansistring too).

Since the Tiburon system has the most important required properties, I think
it is useless to invent a different solution.

Btw, IMHO working with mixed encodings should be possible without using
procedures like setencoding; that is hidden manual string handling, which
has no place in an automated system.

IOW the system should be declarative.


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Paul Ishenin

Martin Schreiber wrote:
Hmm, you should ask the Russian users for example if they prefer MSEgui utf-16 
internal encoding or Lazarus utf-8.
  
You are mixing things up a bit. People on the Russian forum prefer fewer
bugs, and the UTF-8 implementation in Lazarus brought them a lot of bugs.
That is the difference.


And btw, I've never heard them say that they dislike UTF-8, although they
do have problems, especially with retrieving ANSI data from their databases.


Best regards,
Paul Ishenin.


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Graeme Geldenhuys
On Fri, Sep 26, 2008 at 12:34 PM, Marco van de Voort [EMAIL PROTECTED] wrote:
 I guess that would be one of the best solutions. Having a system unicode
 string type and then some specialized string types.

 SysString
 UTF8String
 UTF16String
 UTF32String
 Anyway, I still think something like this would be nice ;)

 This originally was the plan. The implementation differed however between
 different solutions due to problems with automated conversions.

Taking a step back from Free Pascal and Tiburon... How do other
frameworks handle string encodings etc.? Frameworks like Java, Qt
etc. Can't we learn something from them as well? Both Java and Qt
run on multiple platforms, read/write to files, do string manipulation
etc. I don't know those frameworks well, but they have a huge
developer base and are backed by huge companies (with plenty of developers
working on those frameworks). Plus, they have been supporting Unicode
for ages already! I'm sure we can learn something from their
experience.


Regards,
  - Graeme -




Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Michael Schnell



Is UTF-16 Widestring in FPC (and Delphi 200x?) not done by just ignoring the
surrogates?



Let's hope not,

I don't think full UTF-16 would really be desirable over UCS-2.

Imagine you have a string of some million characters (e.g. a book). All 
functions that need to find the n-th character (like x[n], copy, ...) 
would take forever, as they need to scan the complete string (unless 
WideString uses a rather complex tree-like format).
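To make that cost concrete, a minimal sketch of locating the n-th code point in a UTF-16 WideString, assuming well-formed UTF-16 and no auxiliary index structure; NthCodePointIndex is a hypothetical helper:

```pascal
program utf16nth;
{$mode objfpc}{$H+}

// Returns the 1-based code-unit index at which the N-th code point starts,
// or 0 if S has fewer than N code points. Because surrogate pairs take two
// code units, the position cannot be computed; the loop must walk from the
// start, so the cost grows with N instead of being a single S[N] lookup.
function NthCodePointIndex(const S: WideString; N: SizeInt): SizeInt;
var
  i, Seen: SizeInt;
begin
  Result := 0;
  Seen := 0;
  for i := 1 to Length(S) do
    // Only code units that start a code point are counted
    // (everything except low surrogates $DC00..$DFFF).
    if (Ord(S[i]) < $DC00) or (Ord(S[i]) > $DFFF) then
    begin
      Inc(Seen);
      if Seen = N then
        Exit(i);
    end;
end;

var
  W: WideString;
begin
  // A surrogate pair ($D834 $DD1E) followed by 'a': 'a' is the 2nd code
  // point but sits at code-unit index 3.
  W := WideChar($D834);
  W := W + WideChar($DD1E) + 'a';
  WriteLn(NthCodePointIndex(W, 2));
end.
```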


-Michael


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Michael Schnell


  

need to be 32 bit if surrogates were allowed in Widestrings).


How do you squeeze a value > $FFFF into a 16-bit value?

Can you magically store two bits in a single hardware cell?

-Michael


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Daniël Mantione



On Fri, 26 Sep 2008, Graeme Geldenhuys wrote:



Taking a step back from Free Pascal and Tiburon... How do other
frameworks handle string encodings etc.? Frameworks like Java, Qt
etc. Can't we learn something from them as well? Both Java and Qt
run on multiple platforms, read/write to files, do string manipulation
etc. I don't know those frameworks well, but they have a huge
developer base and are backed by huge companies (with plenty of developers
working on those frameworks). Plus, they have been supporting Unicode
for ages already! I'm sure we can learn something from their
experience.


Both Java and Qt use UTF-16 internally.

Daniël


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Martin Schreiber
On Friday 26 September 2008 12.30:27 Marco van de Voort wrote:
 In our previous episode, Martin Schreiber said:
  Hmm, you should ask the Russian users for example if they prefer MSEgui
  utf-16 internal encoding or Lazarus utf-8.

 Users always look short term, and want to change as little as possible.

 This goes both for UTF-16 (with the "is UCS-2" approximation, keeping the
 old ways of string indexing) and for UTF-8 (as a superset of ANSI, avoiding
 multiple file types (no endianness)).

 Note that e.g. source code seems to be moving en masse in the direction of UTF-8
 (even Tiburon, which works exclusively on Windows, a UTF-16 platform,
 saves source by default as UTF-8 afaik).

So does MSEide: source code is stored in the current locale encoding or UTF-8,
the latter being preferred. MSEgui stores ini files and the like in UTF-8; the
form definition files (*.mfm, the MSEgui equivalent of Delphi *.dfm) are pure
ASCII.
For DB access, MSEgui converts from UTF-8 or the current locale encoding to
UTF-16 while fetching the data from the server and converts back to UTF-8 or
the locale encoding before writing data to the server. There is a switch in the
connection and dataset components to select either UTF-8 or the locale
encoding. Strings in the dataset buffer are stored as variable-length UTF-16
strings.
All this can be done with the currently available standard FPC 2.2.2
widestring facilities. I have no problem if the FPC RTL supports the system
encoding only; MSEgui has the commonly used interface to the filesystem and
other services with widestring parameters. If something is missing, it can be
added to the MSEgui library.
But for the internal character encoding that users must work with, UTF-16 is
better suited than UTF-8. I am happy that FPC will support a reference-counted
widestring type on Windows in future releases.

Martin


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Michael Schnell



How do other
frameworks handle string encodings etc
With .NET/Mono I suppose you don't need to bother. But I suppose this is 
one of the reasons that strings are constants once they are assigned 
some value, and you can't do things like s[n] := 'x'.


-Michael



Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Marco van de Voort
In our previous episode, Michael Schnell said:
  need to be 32 bit if surrogates were allowed in Widestrings).
  
 How do you squeeze a value > $FFFF into a 16-bit value?
 
 Can you magically store two bits in a single hardware cell?

As said before, unicode is more than just expanding the range of characters.
The whole concept of character based parsing must be limited as much as
possible, since aside from encoding related conditions, there are also a lot
of language related issues.

IOW making an app support multiple languages is more than mapping in the
characters.


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Marco van de Voort
In our previous episode, Daniël Mantione said:
  Taking a step back from Free Pascal and Tiburon... How do other
  frameworks handle string encodings etc.? Frameworks like Java, Qt
  etc. Can't we learn something from them as well? Both Java and Qt
  run on multiple platforms, read/write to files, do string manipulation
  etc. I don't know those frameworks well, but they have a huge
  developer base and are backed by huge companies (with plenty of developers
  working on those frameworks). Plus, they have been supporting Unicode
  for ages already! I'm sure we can learn something from their
  experience.
 
 Both Java and Qt use UTF-16 internally.

Afaik Java and .NET (C#) also have the feature that for character-based
access you need to use a different type (a builder type, e.g. StringBuilder).

This means they can have separate internal encodings for the base string
type and for the character-based editing string types.


[fpc-devel] Help on building crosscompiler with fpc 2.3.1

2008-09-26 Thread Lukas Gradl
At the moment, I'm developing on a i386 Linux machine. For some servers 
I need x86_64 binaries, so I have a second machine with x86_64 linux I 
use just for compiling.


It would be great to compile everything on one machine, so I tried to 
build a crosscompiler for x86_64 on my i386 machine. The Wikipage on 
http://wiki.lazarus.freepascal.org/Cross_compiling doesn't work anymore:
make complains about a missing ppcrossx86_64 almost immediately.


Could anyone point me in the right direction on how to build a 
crosscompiler myself? (either on my x86_64 machine or on my i386 machine...)


regards
Lukas


--

--
software security networks
Lukas Gradl fpc#ssn.at
Eduard-Bodem-Gasse 9
A - 6020 Innsbruck
Tel: +43-512-214040-0
Fax: +43-512-214040-21
--


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Michael Schnell
Nonetheless, a type to hold a single character needs to exist. And it
needs to be a 32-bit type if you want to store more than 2^16 different
values (as is possible with UTF-8 and UTF-16, but not with UCS-2).


-Michael


Re: [fpc-devel] Help on building crosscompiler with fpc 2.3.1

2008-09-26 Thread Marco van de Voort
In our previous episode, Lukas Gradl said:
 At the moment, I'm developing on a i386 Linux machine. For some servers 
 I need x86_64 binaries, so I have a second machine with x86_64 linux I 
 use just for compiling.
 
 It would be great to compile everything on one machine, so I tried to 
 build a crosscompiler for x86_64 on my i386 machine. The Wikipage on 
 http://wiki.lazarus.freepascal.org/Cross_compiling doesn't work anymore 
 - make is complaining about a missing ppcrossx86_64 almost immediately.

Please be more detailed. What command do you run in which directory, with
what version as starting compiler?
 
 Could anyone point me to the right direction on how to build a 
 crosscompiler myself? (either on my x86_64 machine or on my i386 machine...)

It should work, more or less; I cross-compiled from linux/x86 to
FreeBSD/x86_64 not that long ago. I do remember a slight deviation (IIRC
something like having to pass the crosscompiler to both the FPC and PP variables).

See also http://www.stack.nl/~marcov/buildfaq.pdf for some background info.

I suspect some of the (albeit minor, but potentially confusing) differences
come from the biarch support that was added some time ago (-32 and -64).
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Sergei Gorelkin

Graeme Geldenhuys wrote:


Has anybody else got sample test code that clearly shows the claimed
significant speed gain in using UTF-16 for Windows API's? If so,
could you please post the code and your comparative results (timing
values)? I think most people's perception was that the ANSI API's would be
slower, but nobody ever really bothered to actually prove that it was.


Such testing is pretty much useless, because the speed of any real 
program depends on what this program is doing.
However, since I've done intensive benchmarking while developing the XML 
parser, here are some results:


Parsing an XML file with 1 million chars, 25000 elements and 18000 text 
nodes is about 10% faster for UTF-16 than for UTF-8 (even though the byte 
count is twice as large).
At the same time, the parsing itself takes only about 30% of total time. 
The rest is spent in the memory manager. The memory manager usage 
pattern also matters: it works faster when you only allocate memory than 
when you allocate, free, then allocate again.


The speed of string conversion itself might be unnoticeable on modern 
CPUs, but remember that each conversion is at least two memory manager 
calls, plus a guarding exception frame.
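For anyone wanting to repeat the measurement Graeme asked for, a minimal timing sketch that isolates only the UTF-8 to UTF-16 conversion step (no Windows API is called). It assumes FPC's UTF8Decode and SysUtils' GetTickCount64; the payload size and iteration count are arbitrary, and no particular result is implied:

```pascal
program convcost;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Converts the same UTF-8 string to UTF-16 N times, as a UTF-8 string
// type would have to before each call to a "W" API, and returns the total
// number of UTF-16 code units produced so the work is observable.
function ConvertManyTimes(const U: UTF8String; N: Integer): SizeInt;
var
  i: Integer;
  W: WideString;
begin
  Result := 0;
  for i := 1 to N do
  begin
    W := UTF8Decode(U);  // allocates a new string (and frees the old one) each time
    Inc(Result, Length(W));
  end;
end;

var
  U: UTF8String;
  Start: QWord;
  Units: SizeInt;
begin
  U := StringOfChar('x', 1000);  // arbitrary 1000-byte ASCII payload
  Start := GetTickCount64;
  Units := ConvertManyTimes(U, 100000);
  WriteLn(Units, ' code units converted in ', GetTickCount64 - Start, ' ms');
end.
```

Because each iteration goes through the memory manager, the timing also reflects the allocator behaviour described above, not just the encoding loop.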



Regards,
Sergei


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Marco van de Voort
In our previous episode, Michael Schnell said:
  Is UTF-16 Widestring in FPC (and Delphi 200x?) not done by just ignoring the
  surrogates?
 
  Let's hope not,
 I don't think full UTF-16 would really be desirable over UCS-2.
 
 Imagine you have a string of some million characters (e.g. a book). All 
 functions that need to find the n-th character (like x[n], copy, ...) 
 would take forever, as they need to scan the complete string (unless 
 WideString uses a rather complex tree-like format).

The solution to that is to isolate such code and treat it differently from
the rest, not to mutilate the Unicode standard.


Re: [fpc-devel] Is calling the Windows Unicode APIs really faster than the ANSI API's?

2008-09-26 Thread Mattias Gaertner
On Fri, 26 Sep 2008 13:20:57 +0200
Michael Schnell [EMAIL PROTECTED] wrote:

 Nonetheless, a type to hold a single character needs to exist. And
 it needs to be a 32-bit type if you want to store more than 2^16
 different values (as is possible with UTF-8 and UTF-16, but not with
 UCS-2).

Some characters are encoded as several Unicode characters. For example,
a German a-umlaut is encoded under Mac OS X HFS as 2 characters =
1+2 bytes in UTF-8 and 2+2 bytes in UTF-16. This is not some Egyptian or
Klingon, but normal German, Finnish, French, etc. A
s[i]:='x' doesn't work in UTF-8, nor UTF-16, nor UTF-32.
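A quick check of those byte counts, assuming the decomposed form is 'a' followed by U+0308 COMBINING DIAERESIS (as HFS+ stores it) and using FPC's UTF8Encode; UTF8ByteCount is a hypothetical helper:

```pascal
program decomposed;
{$mode objfpc}{$H+}

// Number of bytes a UTF-16 string occupies once encoded as UTF-8.
function UTF8ByteCount(const W: WideString): SizeInt;
begin
  Result := Length(UTF8Encode(W));
end;

var
  Composed, Decomposed: WideString;
begin
  // Precomposed form: U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
  Composed := WideChar($00E4);
  // Decomposed form: 'a' + U+0308 COMBINING DIAERESIS
  Decomposed := 'a';
  Decomposed := Decomposed + WideChar($0308);

  WriteLn('composed:   ', Length(Composed), ' UTF-16 units, ',
          UTF8ByteCount(Composed), ' UTF-8 bytes');   // 1 unit, 2 bytes
  WriteLn('decomposed: ', Length(Decomposed), ' UTF-16 units, ',
          UTF8ByteCount(Decomposed), ' UTF-8 bytes'); // 2 units, 3 bytes
end.
```

Two UTF-16 code units are 2+2 bytes and the UTF-8 form is 1+2 bytes, so one visible character spans more than one element in every encoding.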

In short:
A single character for all purposes cannot be defined. Unicode cannot
be handled as an array of characters.

The choice of UTF-8 or UTF-16 depends mostly on the libraries used
and on compatibility. The more Unicode features you want to support, the
less important the encoding becomes.

The encoding can be important for speed:
For example the widestring xml parser is up to 10 times slower than
the ansistring xml parser.

Mattias


[fpc-devel] http..www.stack.nl..~marcov..buildfaq.pdf

2008-09-26 Thread Mehmet Erol Sanliturk






Dear Marco van de Voort,

Your paper

Development Tutorial (a.k.a Build FAQ)
by Marco van de Voort
May 17 2008

at

http://www.stack.nl/~marcov/buildfaq.pdf

is displayed here (Windows XP, Adobe Reader 8.0, version 8.0.0) with very
low contrast between the letter color (black) and the background color
(white).

Its default display size is 79.1%, and at that size it is difficult to read.

To read it, I have to increase the zoom to 150%, which again makes it
difficult to read, because I then have to pan the display window
continuously.

When the zoom is increased to 1200%, every letter appears solid black, but
at the original size the letters are nearly indistinguishable from the
background.

It seems that the font size is small and that the letter strokes are very
thin because of it.

I do not know whether it is easy for you or not, but if possible, would you
increase its contrast and font size?

Thank you very much.

Mehmet Erol Sanliturk





Re: [fpc-devel] Help on building crosscompiler with fpc 2.3.1

2008-09-26 Thread Lukas Gradl

Marco van de Voort wrote:

In our previous episode, Lukas Gradl said:
At the moment, I'm developing on a i386 Linux machine. For some servers 
I need x86_64 binaries, so I have a second machine with x86_64 linux I 
use just for compiling.


It would be great to compile everything on one machine, so I tried to 
build a crosscompiler for x86_64 on my i386 machine. The Wikipage on 
http://wiki.lazarus.freepascal.org/Cross_compiling doesn't work anymore 
- make is complaining about a missing ppcrossx86_64 almost immediately.


Please be more detailed. What command do you run in which directory, with
what version as starting compiler?


I have (to simplify, I use x86_64 to compile for i386, as described in the
wiki):

fpc 2.3.1 built from svn running on a Kubuntu X86_64 machine
(Source got via svn checkout http://svn.freepascal.org/svn/fpc/trunk fpc)

I want:
compiling i386 binaries on that machine

I did (according to http://wiki.lazarus.freepascal.org/Cross_compiling):

created i386-linux-ld and i386-linux-as as described in the Wiki

cd /path-to-fpc-source-root/

make all CPU_TARGET=i386

I get:
Makefile:129: *** Compiler ppcross386 not found.

What I could not find was how to create that ppcross386. I remember it
worked with a previous 2.2.0 install, but since I switched to 2.3.1 I
couldn't get it to work.


regards
Lukas



Re: [fpc-devel] Help on building crosscompiler with fpc 2.3.1

2008-09-26 Thread Terry Kemp
On Fri, 2008-09-26 at 19:09 +0200, Lukas Gradl wrote:
 Marco van de Voort wrote:
  In our previous episode, Lukas Gradl said:
  At the moment, I'm developing on a i386 Linux machine. For some servers 
  I need x86_64 binaries, so I have a second machine with x86_64 linux I 
  use just for compiling.
 
  It would be great to compile everything on one machine, so I tried to 
  build a crosscompiler for x86_64 on my i386 machine. The Wikipage on 
  http://wiki.lazarus.freepascal.org/Cross_compiling doesn't work anymore 
  - make is complaining about a missing ppcrossx86_64 almost immediately.
  
  Please be more detailed. What command do you run in which directory, with
  what version as starting compiler?
 
 I have (for simplifying I use X86_64 to compile i386 as described in the 
 wiki):
 fpc 2.3.1 built from svn running on a Kubuntu X86_64 machine
 (Source got via svn checkout http://svn.freepascal.org/svn/fpc/trunk fpc)
 
 I want:
 compiling i386 binaries on that machine
 
 I did (according to http://wiki.lazarus.freepascal.org/Cross_compiling):
 
 created i386-linux-ld and i386-linux-as as described in the Wiki
 
 cd /path-to-fpc-source-root/
 
 make all CPU_TARGET=i386
 
 I get:
 Makefile:129: *** Compiler ppcross386 not found.
 
 What I could not find was, how to create that ppcross386. I remember it 
 worked with a previous 2.2.0 install, but since I switched to 2.3.1 I 
 couldn't get it to work.
 
 regards
 Lukas
 

For ARM I needed to add FPC=fpc at the end of the command:

make all CPU_TARGET=i386 FPC=[path to fpc executable]   (if fpc is not in your PATH)

It seems it looks for the crosscompiler to exist first? Because if it does,
you don't need this, but it will build a new one anyway. Go figure.

Terry






Re: [fpc-devel] Help on building crosscompiler with fpc 2.3.1

2008-09-26 Thread German Gentile
I have the same problem: Ubuntu 64-bit, and I need to compile for both
i386 and x86_64. Is there a tutorial, or is somebody already doing this
who could give us a little help?

Thanks

Donald shimoda

http://donaldshimoda.blogspot.com

2008/9/26 Lukas Gradl [EMAIL PROTECTED]:
 Marco van de Voort schrieb:

 In our previous episode, Lukas Gradl said:

 At the moment, I'm developing on an i386 Linux machine. For some servers I
 need x86_64 binaries, so I have a second machine with x86_64 linux I use
 just for compiling.

 It would be great to compile everything on one machine, so I tried to
 build a crosscompiler for x86_64 on my i386 machine. The Wikipage on
 http://wiki.lazarus.freepascal.org/Cross_compiling doesn't work anymore -
 make is complaining about a missing ppcrossx86_64 almost immediately.

 Please be more detailed. What command do you run in which directory, with
 what version as starting compiler?

 I have (for simplifying I use X86_64 to compile i386 as described in the
 wiki):
 fpc 2.3.1 built from svn running on a Kubuntu X86_64 machine
 (Source got via svn checkout http://svn.freepascal.org/svn/fpc/trunk fpc)

 I want:
 compiling i386 binaries on that machine

 I did (according to http://wiki.lazarus.freepascal.org/Cross_compiling):

 created i386-linux-ld and i386-linux-as as described in the Wiki

 cd /path-to-fpc-source-root/

 make all CPU_TARGET=i386

 I get:
 Makefile:129: *** Compiler ppcross386 not found.

 What I could not find was, how to create that ppcross386. I remember it
 worked with a previous 2.2.0 install, but since I switched to 2.3.1 I
 couldn't get it to work.

 regards
 Lukas



[fpc-devel] c:\fpc\2.2.2\examples\gtk1 and gtk2 : ReadMe.TXT Additions

2008-09-26 Thread Mehmet Erol Sanliturk



Dear Sirs,


In my previous message

http://www.mail-archive.com/fpc-devel@lists.freepascal.org/msg12236.html


I mentioned the required DLL files.


(A)


The following sample ReadMe.TXT file
may be added to the following directory:


http://svn.freepascal.org/svn/fpc/trunk/packages/gtk1/examples/


ReadMe.TXT

---


The following libraries are needed to run the
example programs supplied in this directory:


On Windows:


libintl-1.dll
libgdk-0.dll
libgtk-0.dll
libglib-2.0-0.dll

gtkgl.dll (required only for gtkgldemo.pp)



On FreeBSD:


On GNU/Linux:


---



(B)


The following sample ReadMe.TXT file
may be added to the following directory:


http://svn.freepascal.org/svn/fpc/trunk/packages/gtk2/examples/


ReadMe.TXT

---


The following libraries are needed to run the
example programs supplied in this directory:


On Windows:

libgdkglext-win32-1.0.0.dll
libglade-2.0-0.dll


On FreeBSD:


On GNU/Linux:


---



(C)


For the program

c:\fpc\2.2.2\examples\gtk2\gtk_demo\gtk_demo.exe

I mentioned the division-by-zero exception.


Paul Ishenin kindly pointed out a known Windows bug in his message:

http://www.mail-archive.com/fpc-devel@lists.freepascal.org/msg12237.html


which says:


 This is a known bug on Windows.

Use the following call in your application initialization section:

Set8087CW($133F);
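To make the effect of Set8087CW($133F) concrete, here is a small Free Pascal sketch that decodes an x87 FPU control word. It assumes the standard Intel x87 layout: bits 0-5 are exception masks (a set bit means the exception is masked, i.e. silently produces Inf/NaN instead of trapping), bits 8-9 select precision and bits 10-11 select rounding. The comparison value $1332 is assumed to be FPC's usual Default8087CW on i386; check it with Get8087CW on your own setup. The names DecodeCW and ExceptionMasked are hypothetical helpers made up for this illustration.

```pascal
program decodecw;

{$mode objfpc}

{ Hypothetical helpers for illustration only.
  Assumed x87 control word layout (Intel):
    bits 0-5  exception masks (1 = masked)
    bits 8-9  precision control (3 = 64-bit mantissa)
    bits 10-11 rounding control (0 = round to nearest) }

const
  MaskNames: array[0..5] of string = (
    'invalid operation', 'denormal operand', 'zero divide',
    'overflow', 'underflow', 'precision');

{ True if the exception controlled by mask bit 'bit' is masked }
function ExceptionMasked(cw: Word; bit: Integer): Boolean;
begin
  Result := (cw and (1 shl bit)) <> 0;
end;

procedure DecodeCW(cw: Word);
var
  bit: Integer;
begin
  WriteLn('Control word $', HexStr(LongInt(cw), 4));
  for bit := 0 to 5 do
    if ExceptionMasked(cw, bit) then
      WriteLn('  ', MaskNames[bit], ': masked')
    else
      WriteLn('  ', MaskNames[bit], ': raises an exception');
  WriteLn('  precision control: ', (cw shr 8) and 3);
  WriteLn('  rounding control:  ', (cw shr 10) and 3);
end;

begin
  DecodeCW($1332); { assumed FPC default: invalid, zero divide, overflow unmasked }
  DecodeCW($133F); { value from this thread: all six exceptions masked }
end.
```

Read this way, $133F differs from $1332 only in the low mask bits: it additionally masks invalid operation, zero divide and overflow, which is why floating point code inside the GTK DLLs no longer triggers a Pascal exception.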


I added the above statement to

c:\fpc\2.2.2\examples\gtk2\gtk_demo\gtk_demo.pas

as follows:

  .
  .
  .

var
 window,
 notebook,
 hbox,
 tree   : PGtkWidget;

begin
{$ifdef win32}
 { Windows only: mask all x87 FPU exceptions before GTK is used }
 Set8087CW($133F);
{$endif}

 current_file := NULL;

  .
  .
  .


With this change the program worked very well.


Adding the following statement


{$ifdef win32}
 Set8087CW($133F);
{$endif}


into


http://svn.freepascal.org/svn/fpc/trunk/packages/gtk2/examples/gtk_demo/gtk_demo.pas


will prevent the division-by-zero exception in Windows applications.
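Note that Set8087CW only changes the x87 control word, and the example above guards it with {$ifdef win32}. On targets where FPC does its floating point arithmetic with SSE instead (for example x86_64, including Win64), that call alone may not be enough. A possible alternative, sketched here under the assumption that your FPC version's Math unit provides GetExceptionMask and SetExceptionMask (current releases do), masks the same exceptions through the Math unit and keeps the old mask so it can be restored. MaskAllFPUExceptions is a hypothetical helper name, and whether to restore the mask at all is an application decision, since GTK callbacks can run at any time while the main loop is active.

```pascal
program maskfpu;

{$mode objfpc}

uses
  Math;

{ Hypothetical helper: masks every FPU exception via the Math unit and
  returns the previous mask so the caller can restore it later. }
function MaskAllFPUExceptions: TFPUExceptionMask;
begin
  Result := SetExceptionMask([exInvalidOp, exDenormalized, exZeroDivide,
                              exOverflow, exUnderflow, exPrecision]);
end;

var
  OldMask: TFPUExceptionMask;
begin
  OldMask := MaskAllFPUExceptions;
  if exZeroDivide in GetExceptionMask then
    WriteLn('zero divide is now masked');

  { ... initialise GTK and run the main loop here ... }

  SetExceptionMask(OldMask); { restore the previous exception mask }
end.
```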




To the directory

http://svn.freepascal.org/svn/fpc/trunk/packages/gtk2/examples/gtk_demo/

the following sample ReadMe.TXT file may be added:


ReadMe.TXT

---


On Windows, the following statement
prevents the division-by-zero exception:


{$ifdef win32}
 Set8087CW($133F);
{$endif}


---



Thank you very much,

Mehmet Erol Sanliturk
