Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Graeme Geldenhuys
On 14/09/2011 03:56, Luiz Americo Pereira Camara wrote:
 
> I propose that the above behavior be implemented as a type named RTLString

The Object Pascal language already has enough damn string types. I
really don't think we should be adding fuel to the fire, by adding yet
more string types!


> So the RTL under unix will have functions compiled with UTF8 strings
> giving no overhead interacting with native API
> The RTL under Windows will have compiled functions with UTF16 strings
> giving no overhead with native API

That's exactly what I said.


> If a program passes a UnicodeString to an RTL function under Windows, no
> conversion is made.
> When this same program is compiled under unix, the UnicodeString should
> be converted to UTF8 automatically using the encoding info of the string

No, why must unix environments take a performance hit?? This is not
needed if UnicodeString is really what the name suggests: any Unicode
string type. The Unicode standard defines UTF-8, UTF-16 and UTF-32, so
UnicodeString should really be able to hold any of those encodings,
living up to its name.

If FPC has true Unicode support, then all functions should work correctly
with just the UnicodeString type. That type's encoding is based on the
native encoding of each platform. NO performance hit required.


I'd even be happier if UnicodeString was dropped too, and String becomes
unicode enabled. One less string type to worry about.

String could be defined as follows... [ignore the syntax]

IFDEF unix
  String = String(utf8);
ENDIF
IFDEF windows
  String = String(utf16);
ENDIF
IFDEF OldDelphi
  String = AnsiString;  // or some String(xxx), if one could be used
ENDIF


Then if you wanted your project to use some other specific encoding,
you can simply define your own string type and use that. The
various string types know what encoding they are in, so auto-conversion
is possible too (with possible data loss in the unicode -> ansi case),
e.g.:
type
  { say I want to use UTF-32 in my apps for some reason }
  TfpgString = String(utf32);

var
   s: String;  //  as defined above - could be utf8, utf16 etc..
   m: TfpgString;
   a: AnsiString;
begin
  m := 'Hello world!';
  s := m;  // automatic conversion happens here
  a := s;  // auto conversion, with data loss (compiler warning)
end;
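
For comparison, here is a minimal sketch of the same idea using the code-page-tagged AnsiString types that, as far as I know, later FPC releases (3.0 and up) provide. TLatin1String is a made-up alias for illustration, and the sketch assumes the cwstring unit is available as the widestring manager on Unix.

```pascal
program TypedStrings;
{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring;  // widestring manager, needed for non-UTF-8 conversions on Unix
{$endif}

type
  // Hypothetical alias; 28591 is the code page number of ISO-8859-1.
  TLatin1String = type AnsiString(28591);

var
  u16: UnicodeString;   // UTF-16
  u8: UTF8String;       // AnsiString tagged with CP_UTF8
  l1: TLatin1String;
begin
  u16 := 'caf';
  u16 := u16 + WideChar($00E9);  // append U+00E9 by code point
  u8 := u16;   // the compiler inserts a UTF-16 -> UTF-8 conversion
  l1 := u16;   // UTF-16 -> ISO-8859-1; lossy for characters outside Latin-1
  WriteLn('UTF-16 code units: ', Length(u16));
  WriteLn('UTF-8 bytes:       ', Length(u8));
  WriteLn('Latin-1 bytes:     ', Length(l1));
end.
```

The compiler is expected to warn about the implicit conversions, which matches the "compiler warning" noted in the example above.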


Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Felipe Monteiro de Carvalho
On Tue, Sep 13, 2011 at 9:23 PM, Michael Van Canneyt
mich...@freepascal.org wrote:
> Current strategy on fpc core seems to be to have 2 RTLs:
>
> One with unicode string, one with ansistring.

Isn't that somewhat nasty for people currently using UTF-8?

I mean, let's say that we can divide everyone using FPC into 3 groups:

1st People using ansi that don't want to change any line of code -
They get a path forward with this proposal, even if temporary (the
Ansi half of the RTL really seems like the definition of deprecated to
me)
2nd People using UTF-8 - They get no love at all and can choose between
using the old RTL with no Unicode, with some tape over the holes,
or migrating to something incompatible.
3rd People that want to use UTF-16 - They get a new RTL to move forward

But what percentage of FPC users, libraries and applications is in each group?

1st I really can't imagine anyone who would want to stay stuck to the
pre-Unicode world forever...
2nd The vast majority of users, libraries and applications through Lazarus
3rd msegui and possibly Delphi 2009+ users

Lazarus is by far the most widely used way to use FPC, so I would guess
that group 2 has more than 75% of all users, and still it gets no
love at all. Which real path forward is provided for these users?

Of course one path is migrating everything, the LCL, the IDE, SynEdit,
all packages, etc, to UTF-16, but that's a huge, immense work with
zero advantages over what we are doing up to now. It's just migrating
for the sake of migrating; who will be motivated to do that? My point is
that it is not very reasonable to migrate so much working code for no
advantage at all, so the Unicode RTL could provide something to ease
interfacing with UTF-8, for example:

* overloaded versions of routines and methods for utf8string
* A TStrings and TStringList for utf8

These would need to be ifdefed so they are not present in the Ansi
RTL. Without even a TStrings for utf-8 one cannot really expect
Lazarus to be able to use the Unicode RTL without doing a full
migration to UTF-16 ...

My final point is just: why not? If code in the RTL could fix things
for Lazarus why impose the need to migrate so much working code?

If the Unicode RTL provides UTF-8 support too then Lazarus projects
could be migrated by just doing 2 things:

1 Change all places which use TStrings and TStringList to
TStringsUTF8 and TStringListUTF8
2 Change all places which add utf-8 to ansi conversions when calling
the RTL so that they do no conversion at all

On the other hand if we have no path forward except for migrating to
UTF-16 I can imagine we will still be talking about how to move
forward 5 years from now...
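
A rough sketch of what such overloaded routines could look like. DemoOpen is a hypothetical stand-in, not an actual RTL function, and the bodies only report which overload was picked instead of touching the file system.

```pascal
program OverloadSketch;
{$mode objfpc}{$H+}

// Hypothetical stand-ins for RTL routines; DemoOpen is not a real RTL function.
function DemoOpen(const FileName: UTF8String): string; overload;
begin
  // On a UTF-8 platform the bytes could be handed to the OS unchanged.
  Result := 'utf8 overload';
end;

function DemoOpen(const FileName: UnicodeString): string; overload;
begin
  // On a UTF-16 platform this would be the native, conversion-free path.
  Result := 'utf16 overload';
end;

var
  a: UTF8String;
  w: UnicodeString;
begin
  a := 'readme.txt';
  w := 'readme.txt';
  WriteLn(DemoOpen(a));  // resolved by the static type of the argument
  WriteLn(DemoOpen(w));
end.
```

Each caller gets the version matching the string type it already uses, so only the non-native overload would have to convert internally.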

-- 
Felipe Monteiro de Carvalho


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Felipe Monteiro de Carvalho
On Wed, Sep 14, 2011 at 5:50 AM, Martin Schreiber mse00...@gmail.com wrote:
> Linux expects an array of bytes in filenames (no encoding, no utf-8) AFAIK.

That's a nice theory, but:

All Linux distributions that I know use utf-8
Android uses utf-8
Meego uses utf-8

So, do you have any concrete example of new releases of Linux using
something different from UTF-8 for filenames?

-- 
Felipe Monteiro de Carvalho


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread michael . vancanneyt



On Wed, 14 Sep 2011, Felipe Monteiro de Carvalho wrote:

> On Tue, Sep 13, 2011 at 9:23 PM, Michael Van Canneyt
> mich...@freepascal.org wrote:
>> One with unicode string, one with ansistring. They will have the same code,
>> but will be compiled twice, each time with a different compiler define to
>> decide which version it must be.
>
> Is this possible in UNIX? I can see that in Windows you can use the
> trick to use W versions which are identical except for the string type
> and drop Windows 9x support, but is this really possible for the UNIX
> syscalls? They expect UTF-8 not UTF-16 which is what UnicodeString
> uses.


And why would this not be possible ?

Michael.


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread michael . vancanneyt



On Wed, 14 Sep 2011, Felipe Monteiro de Carvalho wrote:

> On Tue, Sep 13, 2011 at 9:23 PM, Michael Van Canneyt
> mich...@freepascal.org wrote:
>> Current strategy on fpc core seems to be to have 2 RTLs:
>>
>> One with unicode string, one with ansistring.
>
> Isn't that somewhat nasty for people currently using UTF-8?


No, why do you think so ?

They should use the unicode version. All will work as-is.

Michael.


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Felipe Monteiro de Carvalho
On Wed, Sep 14, 2011 at 8:59 AM,  michael.vancann...@wisa.be wrote:
> No, why do you think so ?

Well, at the very least:

1 All var parameters from the RTL will no longer be directly usable
with UTF-8 strings

http://www.freepascal.org/docs-html/rtl/sysutils/appendstr.html

How can I pass a UTF-8 string to AppendStr in the Unicode RTL?

The Example62 from the docs will no longer compile =D

2 TStrings will be in a different encoding from the rest of the LCL,
this will surely be very nasty.
MyForm.Caption := MyStrings.Strings[9]; and you get an encoding conversion...

Basically it will be a salad of automatic conversions done by the compiler...

3 FileOpen(MyForm.Caption, whatever_mode);
You first get utf-8 -> utf-16 to call FileOpen and then FileOpen does
utf-16 -> utf-8 on UNIXes
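
To make point 1 concrete, a small sketch under the assumption that the RTL routine takes a UnicodeString var parameter. AppendStrU is a hypothetical UTF-16 counterpart of AppendStr, written here only for illustration.

```pascal
program VarParamSketch;
{$mode objfpc}{$H+}

// Hypothetical UTF-16 counterpart of SysUtils.AppendStr, for illustration only.
procedure AppendStrU(var Dest: UnicodeString; const S: UnicodeString);
begin
  Dest := Dest + S;
end;

var
  caption: UTF8String;
  tmp: UnicodeString;
begin
  caption := 'File';
  // AppendStrU(caption, '.txt') is rejected: a var parameter must match the
  // declared type exactly, so no automatic conversion is possible.
  tmp := UTF8Decode(caption);    // UTF-8 -> UTF-16
  AppendStrU(tmp, '.txt');
  caption := UTF8Encode(tmp);    // UTF-16 -> UTF-8
  WriteLn(caption);
end.
```

The temporary variable and the two explicit conversions are exactly the extra work that UTF-8 code would need around every such call.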

-- 
Felipe Monteiro de Carvalho


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Mattias Gaertner
On Wed, 14 Sep 2011 08:50:22 +0200
Felipe Monteiro de Carvalho felipemonteiro.carva...@gmail.com wrote:

> On Wed, Sep 14, 2011 at 5:50 AM, Martin Schreiber mse00...@gmail.com wrote:
>> Linux expects an array of bytes in filenames (no encoding, no utf-8) AFAIK.
>
> That's a nice theory, but:

It's more than theory.
You can use file names under Linux that are not valid UTF-8.
At work I see it every week.
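
As an illustration of such a name, here is a minimal sketch that checks a byte sequence for UTF-8 structure. LooksLikeUtf8 is a hypothetical helper and deliberately simplified: it checks lead and continuation bytes only, not overlong forms or surrogates.

```pascal
program Utf8Check;
{$mode objfpc}{$H+}

// Simplified structural check: each lead byte must be followed by the right
// number of continuation bytes. Overlong forms and surrogates are not rejected.
function LooksLikeUtf8(const B: array of Byte): Boolean;
var
  i, need: Integer;
begin
  Result := False;
  i := 0;
  while i <= High(B) do
  begin
    if B[i] < $80 then need := 0
    else if (B[i] and $E0) = $C0 then need := 1
    else if (B[i] and $F0) = $E0 then need := 2
    else if (B[i] and $F8) = $F0 then need := 3
    else Exit;                     // stray continuation or invalid lead byte
    Inc(i);
    while need > 0 do
    begin
      if (i > High(B)) or ((B[i] and $C0) <> $80) then Exit;
      Inc(i);
      Dec(need);
    end;
  end;
  Result := True;
end;

begin
  // 'caf' + E9: Latin-1 bytes, a legal Linux file name but not UTF-8
  WriteLn(LooksLikeUtf8([$63, $61, $66, $E9]));        // prints FALSE
  // 'caf' + C3 A9: the same name encoded as UTF-8
  WriteLn(LooksLikeUtf8([$63, $61, $66, $C3, $A9]));   // prints TRUE
end.
```

A file routine that only accepts decoded Unicode text has no faithful way to represent the first name, which is the practical concern here.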


> All Linux distributions that I know use utf-8
> Android uses utf-8
> Meego uses utf-8
>
> So, do you have any concrete example of new releases of Linux using
> something different from UTF-8 for filenames?


Mattias
 


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Graeme Geldenhuys
On 13/09/2011 21:23, Michael Van Canneyt wrote:
> Current strategy on fpc core seems to be to have 2 RTLs:
>
> One with unicode string, one with ansistring.

Can you clarify a bit? When you say unicode string, do you mean UTF-16
(Delphi's definition of a unicode string), or do you mean a Unicode
string in the true sense - it can be utf-8 or utf-16 etc depending on
the platform's native encoding?


Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/



Re: [fpc-devel] who can explain why array of const can't be passed to another array of const

2011-09-14 Thread Jonas Maebe


On 14 Sep 2011, at 04:15, Paul Ishenin wrote:

> If I change cdecl to stdcall in g_object_dosomething then it
> compiles with no error.
>
> For me it is strange. Should developer care about internal compiler
> representation of an array of const for different conventions?

It's more that even though both are called array of const, they are
completely different things. They also don't support the same types.

> Imo this is a compiler task.
>
> I've checked the same on delphi XE and there it compiles.
>
> So whether this is 1) a bug 2) unimplemented feature 3) desired
> compiler behavior?

You could say it is an unimplemented feature, but implementing it
would require a lot of assembler code that's different for every
architecture (and in some cases also for different OSes, since not all
OSes use the same ABI and the ABI defines how C varargs must be
passed). It is not a bug since the error message is given on purpose.
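
For contrast, a minimal sketch of the Pascal-side form, where array of const is an open array of TVarRec records and can therefore be forwarded as-is. Inner and Outer are hypothetical names; the cdecl/varargs form is not shown because it follows the C calling convention instead.

```pascal
program ArrayOfConstSketch;
{$mode objfpc}{$H+}

// Without cdecl, "array of const" arrives as an open array of TVarRec,
// so each element carries a type tag that can be inspected.
procedure Inner(const Args: array of const);
var
  i: Integer;
begin
  for i := Low(Args) to High(Args) do
    case Args[i].VType of
      vtInteger:    WriteLn('integer: ', Args[i].VInteger);
      vtAnsiString: WriteLn('ansistring: ', AnsiString(Args[i].VAnsiString));
      vtString:     WriteLn('shortstring: ', Args[i].VString^);
    else
      WriteLn('other, VType = ', Args[i].VType);
    end;
end;

// Forwarding works here because both sides use the same TVarRec layout.
procedure Outer(const Args: array of const);
begin
  Inner(Args);
end;

begin
  Outer([1, 'two', 3]);
end.
```

With cdecl on the callee there is no such array to hand over; the arguments would have to be laid out according to the platform's C varargs ABI, which is the assembler work mentioned above.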



Jonas


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Martin Schreiber

On 14.09.2011 07:50, Felipe Monteiro de Carvalho wrote:

> On Wed, Sep 14, 2011 at 5:50 AM, Martin Schreiber mse00...@gmail.com wrote:
>> Linux expects an array of bytes in filenames (no encoding, no utf-8) AFAIK.
>
> That's a nice theory, but:
>
> All Linux distributions that I know use utf-8
> Android uses utf-8
> Meego uses utf-8
>
> So, do you have any concrete example of new releases of Linux using
> something different from UTF-8 for filenames?

Some Samba shares for example and there still are many old Linux
systems in the wild. Anyway, I simply wanted to point out the fact.


Martin


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Sven Barth

On 14.09.2011 09:08, Martin Schreiber wrote:

> On 14.09.2011 07:50, Felipe Monteiro de Carvalho wrote:
>> On Wed, Sep 14, 2011 at 5:50 AM, Martin Schreiber mse00...@gmail.com
>> wrote:
>>> Linux expects an array of bytes in filenames (no encoding, no utf-8)
>>> AFAIK.
>>
>> That's a nice theory, but:
>>
>> All Linux distributions that I know use utf-8
>> Android uses utf-8
>> Meego uses utf-8
>>
>> So, do you have any concrete example of new releases of Linux using
>> something different from UTF-8 for filenames?
>
> Some Samba shares for example and there still are many old Linux
> systems in the wild. Anyway, I simply wanted to point out the fact.

Another good example: FAT. I have reached the point of avoiding umlauts
and such when I copy files from Linux to FAT or the other way round,
because with the default mount settings they are invalid characters in
one of the two... (and I haven't yet bothered to fiddle around with that ^^)


Regards,
Sven


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread michael . vancanneyt



On Wed, 14 Sep 2011, Felipe Monteiro de Carvalho wrote:

> On Wed, Sep 14, 2011 at 8:59 AM,  michael.vancann...@wisa.be wrote:
>> No, why do you think so ?
>
> Well, at the very least:
>
> 1 All var parameters from the RTL will no longer be directly usable
> with UTF-8 strings
>
> http://www.freepascal.org/docs-html/rtl/sysutils/appendstr.html
>
> How can I pass a UTF-8 string to AppendStr in the Unicode RTL?
>
> The Example62 from the docs will no longer compile =D

That depends on what the compiler will do for you :-)

> 2 TStrings will be in a different encoding from the rest of the LCL,
> this will surely be very nasty.
> MyForm.Caption := MyStrings.Strings[9]; and you get an encoding conversion...

That will always be the case, even if we decided for UTF-16.

> Basically it will be a salad of automatic conversions done by the compiler...

This will be so in each case where different codepages or encodings are used.

> 3 FileOpen(MyForm.Caption, whatever_mode);
> You get first utf-8 -> utf-16 to call FileOpen and then FileOpen does
> utf-16 -> utf-8 on UNIXes

Once more, why do you think so ?

In each case:

1. It will be messy whatever we do.
   Thinking there is an easy migration path is wishful thinking.

2. Backwards compatibility is a big concern.
   Code that compiled and worked should compile and work.

Michael.


Re: [fpc-devel] who can explain why array of const can't be passed to another array of const

2011-09-14 Thread Alexander Klenin
On Wed, Sep 14, 2011 at 19:03, Jonas Maebe jonas.ma...@elis.ugent.be wrote:
> It's more that even though both are called array of const, they are
> completely different things. They also don't support the same types.

Perhaps varargs-compatible parameter type should be called something else then?

-- 
Alexander S. Klenin


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Felipe Monteiro de Carvalho
On Wed, Sep 14, 2011 at 9:45 AM, Mattias Gaertner
nc-gaert...@netcologne.de wrote:
> It's more than theory.
> You can use file names under Linux that are not valid UTF-8.
> At work I see it every week.

In this case then for sure we cannot have file routines only in
UTF-16, because that would make it impossible to identify many files
in Linux...

-- 
Felipe Monteiro de Carvalho


Re: [fpc-devel] who can explain why array of const can't be passed to another array of const

2011-09-14 Thread Jonas Maebe


On 14 Sep 2011, at 10:40, Alexander Klenin wrote:

> On Wed, Sep 14, 2011 at 19:03, Jonas Maebe
> jonas.ma...@elis.ugent.be wrote:
>> It's more that even though both are called array of const, they are
>> completely different things. They also don't support the same types.
>
> Perhaps varargs-compatible parameter type should be called something
> else then?


Both backwards and Delphi compatibility stand in the way of that.


Jonas


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Felipe Monteiro de Carvalho
On Wed, Sep 14, 2011 at 10:46 AM,  michael.vancann...@wisa.be wrote:
>> Can you clarify a bit? When you say unicode string, do you mean UTF-16
>> (Delphi's definition of a unicode string), or do you mean a Unicode
>> string in the true sense - it can be utf-8 or utf-16 etc depending on
>> the platform's native encoding?
>
> This has not yet been decided.

IMHO a platform-dependent string would be the worst solution of all
... far worse than migrating to UTF-16.

It adds a tiny bit of speed while it puts a large development complexity
burden ... I imagine how one would explain that kind of thing to
newbies ...

Just recently I had a student from my university implement a routine
which converts HTML text from utf-8 to braille in utf-8 ... I didn't
have to explain anything and she could implement it without previous
Pascal experience. I wonder what would happen if I had to say: oops, the
main string type is unknown =D To do any operation on it you need to first
convert to something known and then convert back to unknown, while unknown
conversions might take place ...

-- 
Felipe Monteiro de Carvalho


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Marco van de Voort
In our previous episode, Felipe Monteiro de Carvalho said:
 
> Following from a discussion on mac-pascal, I'd like to propose a
> solution for Unicode support.

First of all: dropping backwards compatibility is not going to happen. If we
were planning that, we would have changed everything to something unicode
years ago.

> Function FileOpen (Const FileName : utf8string; Mode : Integer) :
> THandle; overload;
> Function FileOpen (Const FileName : unicodestring; Mode : Integer) :
> THandle; overload;
>
> and similarly for other places and everyone should be happy.

This is not a solution. This is a temporary hack to alleviate some perceived
Lazarus pain, and doesn't fix my main gripe of the manual conversions
everywhere. It is a hack for 0.01% of the unicode problem.

IMHO the objective should be to minimize manual conversions (and with that the
fact that generic code becomes encoding specific).



Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Luiz Americo Pereira Camara

On 14/9/2011 03:40, Graeme Geldenhuys wrote:

> On 14/09/2011 03:56, Luiz Americo Pereira Camara wrote:
>> I propose that the above behavior be implemented as a type named RTLString
>
> The Object Pascal language already has enough damn string types. I
> really don't think we should be adding fuel to the fire, by adding yet
> more string types!

AFAIK RTLString already exists in the cpstr branch. Anyway, it is just an
alias to a real type.

> []
>
> String could be defined as follows... [ignore the syntax]
>
> IFDEF unix
>   String = String(utf8);
> ENDIF
> IFDEF windows
>   String = String(utf16)
> ENDIF
> IFDEF OldDelphi
>   String = AnsiString  //  or if some String(xxx) could be used
> ENDIF

This is not desirable simply because on each platform (windows / unix)
the user code of the same program will have a different encoding,
increasing the possibility of subtle errors. Some functions like string
streaming require the same encoding between platforms; otherwise they will
require code changes to work properly.

Another advantage of using RTLString as I proposed is that Lazarus will
require almost no code change since the encoding of string in LCL will
be the same (UTF8) across platforms. The conversion will take place only
when interacting with the RTL


Luiz


the feature request, that started the discussion [Re: Adding properties into existing stabs/dwarf; gdb readable workaround ? [[Re: [fpc-devel] Status and ideas about debug info (stabs, dwarf / dwar3)

2011-09-14 Thread Martin
Unfortunately, once about 2 mails are exchanged on the subject of what I 
actually tried to talk about, the whole discussion takes off and all 
kind of debugger woes are included


So back again:
I am trying to find out if the below could make a reasonable feature
request (and therefore have a chance of being implemented in FPC).

And if it does => should I put it on mantis?

I believe Joost may actually have started to look at the requirements,
since he enquired about gdb and method execution?


So some points that I would like to know:

1) I believe the general idea of making a
 property Counter: Integer read GetCounter
be encoded as a function of the object (in the same way as GetCounter
already is) is acceptable?

- So field properties return the field
- Getter properties depend on GDB's ability to execute functions.

2) Execution of those properties (getter).
I understand it depends on GDB, and FPC can probably not affect it much.

As far as the dwarf debug info can have an influence (if at all), it
would be nice if execution was NOT automatic.

e.g. NONE of those would execute  (property List: TList read GetList)
  Foo.List
  Foo.List.Counter
The following may or may not:
  Foo.List().Counter

3) Any hint that a symbol is a property, not a field or function
(despite it being encoded as field or function)?
I know there is a desire not to have any hacks/workarounds in FPC, and
I understand the reasons.


Yet, I was hoping: IF available, and effort is minimal, is there any
chance at all?
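
For reference, a minimal sketch of the two kinds of property discussed in points 1) and 2). TFoo, Count and the getter body are made up for illustration; only the Counter/GetCounter pairing comes from the text above.

```pascal
program PropertySketch;
{$mode objfpc}{$H+}

type
  // Hypothetical class mirroring the examples in this mail.
  TFoo = class
  private
    FCount: Integer;
    function GetCounter: Integer;
  public
    // Field-backed: a debugger can read FCount without running any code.
    property Count: Integer read FCount;
    // Getter-backed: reading the value means executing GetCounter.
    property Counter: Integer read GetCounter;
  end;

function TFoo.GetCounter: Integer;
begin
  Result := FCount + 1;  // any side effect here would also run in the debugger
end;

var
  Foo: TFoo;
begin
  Foo := TFoo.Create;
  try
    WriteLn(Foo.Count);    // plain field read
    WriteLn(Foo.Counter);  // calls GetCounter
  finally
    Foo.Free;
  end;
end.
```

In source code both reads look the same; the request is about letting the debug info keep that distinction visible so the front end can decide whether to execute the getter.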


As I said, I don't know if DW_AT_sibling, for example, can be used (I
included the dwarf spec below). It looks to me like it is a hint that
can be used at the desire of the compiler (debug info provider): IF ...
FEELS ... If using this flag does not conflict with, or abuse, the dwarf
specs, then maybe it could be used?


Even if gdb does not show it, it would mean that later means of access
may exist, and the info is there, and an IDE can at least tell this is
a property




from dwarf 3 specs:
In cases where a producer of debugging information feels that it will 
be important for consumers of that information to quickly scan chains 
of sibling entries, while ignoring the children of individual 
siblings, that producer may attach a DW_AT_sibling attribute to any 
debugging information entry. The value of this attribute is a 
reference to the sibling entry of the entry to which the attribute is 
attached



On 12/09/2011 21:13, Martin wrote:

> On 12/09/2011 20:46, Joost van der Sluis wrote:
>> On Mon, 2011-09-12 at 20:31 +0200, Jonas Maebe wrote:
>>> On 12 Sep 2011, at 20:20, Martin wrote:
>>>> Could not properties mapping to a function be implemented the same
>>>> way =>  normal functions are already listed in ptype so
>>>>
>>>>   public
>>>>   property Counter: Integer read GetCounter
>>>> could appear the same as the function GetCounter ?
>>>>
>>>> In that case at least the list of available symbols is complete.
>>>> The only thing that then would need codetools involved was to
>>>> check if the name is a property and not a function/field.
>>>
>>> That may be possible, yes.
>>
>> What is it that we actually need? At the Dwarf-level:
>>
>> Is the information that a property actually has a getter, and the name
>> of that getter enough?
>>
>> Or do we want that when the value of a property is asked, the getter is
>> called automagically? (And that there is some kind of flag that
>> indicates that a getter is being used?) I don't think that we can add a
>> stack-script in the DW_AT_Location that executes the getter. I've looked
>> at DW_OP_call, but that won't help us here.
>>
>> Or, and maybe this is the best solution: some 'opaque' type that returns
>> a reference to something else. Which can be different for reading and
>> writing values...



> There are 2 conflicting desires.
>
> -data-evaluate-expression FooObject.BarObjProp.BarValue
> ptype FooObject  / ptype FooObject.BarObjProp
>
> The first only works (at present) if it is a field, not a getter
> function. IMHO that is ok.
>
> While a lot of people do want code execution for properties, there must
> be a means of control (in the front end, e.g. lazarus). Even if that was
> enabled by default.
> That means, I would like that gdb does *not* automatically call the
> function.
>
> So for data evaluation we are fine.
> If it is a function, the expression fails, and the IDE needs to look
> into it.
>
> Well, having said that: if the function was only called when brackets
> are supplied, maybe.
>
> -data-evaluate-expression FooObject.BarObjProp().BarValue
>
> But it is not a must. I am not even sure if desirable.
>
> the 2nd issue is knowledge that
> a) there is something in the object under the name of the property
> b) this something happens to be a property
>
> a) is already fulfilled if it is a field-property. Hence I asked if
> functions could be added the same way.
>
> -data-evaluate-expression FooObject.GetCounter
> currently gets no value
> -data-evaluate-expression FooObject.Counter
> gives an error, no symbol
>
> if Counter could be the same as GetCounter (making it

Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Luiz Americo Pereira Camara

On 14/9/2011 03:48, Felipe Monteiro de Carvalho wrote:

> [..]
>
> Of course one path is migrating everything, the LCL, the IDE, SynEdit,
> all packages, etc, to UTF-16, but that's a huge, immense work with
> zero advantages over what we are doing up to now, it's just migrate to
> migrate, who will be motivated to do that? My point is that it is not
> very reasonable to migrate so much working code for no advantage at
> all, so the Unicode RTL could provide something to ease interfacing
> with UTF-8, for example:
>
> * overloaded versions of routines and methods for utf8string
> * A TStrings and TStringList for utf8



Using the approach I described (RTLString) in the other mail, this (massive
LCL code change) is not required. Probably only the load-from-file
functions, like those of TStrings etc., would need changes.


Lazarus/LCL could stay as is (UTF8) and would work as today:

Under unix: no conversion is done since the LCL and RTL encodings are
the same

Under Windows: conversion UTF8 -> RTLString (UTF16) is done once


> These would need to be ifdefed so they are not present in the Ansi
> RTL. Without even a TStrings for utf-8 one cannot really expect
> Lazarus to be able to use the Unicode RTL without doing a full
> migration to UTF-16 ...
>
> My final point is just: why not? If code in the RTL could fix things
> for Lazarus why impose the need to migrate so much working code?



Because if someone for some reason, like porting Delphi code, stays with
a UTF16 string, under windows, when using RTL functions TWO conversions
will be made:


User Code (UTF16) -> RTL (UTF8) -> WINAPI (UTF16)

Always using the same encoding in RTL and Native API will keep the
maximum number of conversions at 1
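
The two-conversion path can be sketched as follows. The variable names are made up to mirror the three layers in the diagram; UTF8Encode and UTF8Decode stand in for whatever conversions the RTL would do internally.

```pascal
program RoundTripSketch;
{$mode objfpc}{$H+}

var
  userText, apiText: UnicodeString;  // UTF-16 on both ends
  rtlText: UTF8String;               // the UTF-8 layer in the middle
begin
  userText := 'caption';
  userText := userText + WideChar($00E9);  // one non-ASCII character, U+00E9

  rtlText := UTF8Encode(userText);   // conversion 1: UTF-16 -> UTF-8
  apiText := UTF8Decode(rtlText);    // conversion 2: UTF-8 -> UTF-16

  WriteLn('lossless: ', apiText = userText);
  WriteLn('UTF-16 units: ', Length(userText), ', UTF-8 bytes: ', Length(rtlText));
end.
```

For valid text nothing is lost, so the cost is purely the two transcoding passes and the intermediate allocation on every call.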


Luiz





Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Flávio Etrusco
On Wed, Sep 14, 2011 at 6:04 AM, Felipe Monteiro de Carvalho
felipemonteiro.carva...@gmail.com wrote:
> On Wed, Sep 14, 2011 at 10:46 AM,  michael.vancann...@wisa.be wrote:
>>> Can you clarify a bit? When you say unicode string, do you mean UTF-16
>>> (Delphi's definition of a unicode string), or do you mean a Unicode
>>> string in the true sense - it can be utf-8 or utf-16 etc depending on
>>> the platform's native encoding?
>>
>> This has not yet been decided.
>
> IMHO a platform-dependent string would be the worst solution of all
> ... far worse than migrating to UTF-16.
>
> It adds a tiny bit of speed while it puts a large development complexity
> burden ... I imagine how one would explain that kind of thing to
> newbies ...


Why would the internal encoding of an RTLString/UnicodeString have to
affect how you program if the RTL/API is done right?

-Flávio


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Felipe Monteiro de Carvalho
On Wed, Sep 14, 2011 at 11:32 AM, Luiz Americo Pereira Camara
luiz...@oi.com.br wrote:
> Because if someone for some reason, like porting Delphi code, stays with a
> UTF16 string, under windows, when using RTL functions TWO conversions will
> be made:
>
> User Code (UTF16) -> RTL (UTF8) -> WINAPI (UTF16)

This would not happen because I proposed to have 2 versions of the
routines in the RTL. Not 1 UTF-8 version. There would be both UTF-8
and UTF-16 versions and one would naturally use the one which matches
his preferred encoding ... and the RTL would only convert the
non-native version.

-- 
Felipe Monteiro de Carvalho


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Mattias Gaertner

 


On 14 September 2011 at 10:51, Felipe Monteiro de Carvalho
felipemonteiro.carva...@gmail.com wrote:

> On Wed, Sep 14, 2011 at 9:45 AM, Mattias Gaertner
> nc-gaert...@netcologne.de wrote:
>> It's more than theory.
>> You can use file names under Linux that are not valid UTF-8.
>> At work I see it every week.
>
> In this case then for sure we cannot have file routines only in
> UTF-16, because that would make it impossible to identify many files
> in Linux...

Well, many is a bit exaggerated. It does not happen on most Linux systems.
And, yes, UTF-16 is not enough under Linux. 
But this is nothing new. This was explained several times on this list. See the
many threads about unicode strings. 
 
Mattias


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Graeme Geldenhuys
On 14/09/2011 11:19, Luiz Americo Pereira Camara wrote:
> This is not desirable simply because at each platform (windows / unix)
> the user code of the same program will have a different encoding
> increasing the possibility of subtle errors.

Why? Not every program is a text manipulation program or text parser.
Most programs simply assign one string to another.

eg:

   Button1.Caption := 'Click me';
   lMyString := Button1.Caption;


Under unix systems 'Click me', Button1.Caption and lMyString will be
UTF-8 encoded. Under Windows 'Click me', Button1.Caption and lMyString
will be UTF-16 encoded.

When Lazarus saves this information in a .lfm file, it will be stored as
UTF-8 irrespective of the platform. This is normal behaviour on all
platforms already, and already done in Lazarus too.

As for streaming, the same applies as for saving to file. UTF-8 is
ideally suited for (and was designed for simplifying) streaming, hence
the W3C promotes the usage of UTF-8 in HTML, XML etc.


 Another advantage of using RTLString as i proposed is that Lazarus will 
 require almost no code change since the encoding of string in LCL will 
 be the same (UTF8) across platforms.

Lazarus, like fpGUI will have to decide what they want to do. Stick to
having UTF-8 forced on all platforms, or use a native encoding on each
platform. Currently UTF-8 was choosen in both project because it is so
compatible (think easy here) with AnsiString - so least amount of work
was required and it was pretty efficient because most programs already
used AnsiString.

If I were to change fpGUI to use a native encoding on each platform, I
would simply change my definition of TfpgString as described in a
similar example before. All string manipulation inside fpGUI (and LCL)
should already have adhered to the rule that 1 byte <> 1 character, so
the rest of the framework should continue to work as normal. In the case
of fpGUI, I would also be able to get rid of all the UTF8Copy(),
UTF8Length() calls and simply use the RTL Copy() and Length() functions
again - after all, they were only introduced because FPC's RTL lacked
Unicode (any encoding) support.
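
A minimal sketch of the per-platform type definition described above. The
mapping shown here (UnicodeString on Windows, AnsiString holding UTF-8
bytes elsewhere) is an assumption made for illustration, not fpGUI's actual
declaration of TfpgString.

```pascal
program NativeStringSketch;
{$mode objfpc}{$H+}

type
{$IFDEF WINDOWS}
  // Assumption: 16-bit code units (UTF-16) on Windows.
  TfpgString = UnicodeString;
{$ELSE}
  // Assumption: 8-bit code units carrying UTF-8 on other platforms.
  TfpgString = AnsiString;
{$ENDIF}

var
  Caption, Copied: TfpgString;
begin
  Caption := 'Click me';
  // Plain assignment needs no knowledge of the encoding in use.
  Copied := Caption;
  WriteLn('length: ', Length(Copied),
          ', bytes per code unit: ', SizeOf(Copied[1]));
end.
```

Code that only assigns and passes strings around compiles unchanged
against either definition; only code that indexes into the string has to
care which branch was taken.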


Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/



Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Marco van de Voort
In our previous episode, Felipe Monteiro de Carvalho said:
 
 * Make file-handling routines which take filenames as parameters from
 the RTL modular so that the LCL can implement them with UTF-8 support.
 This plus a UTF-8 widestring manager and the Ansi RTL can be fully
 UTF-8.

I'm not as opposed to this as to the other. At least the interfaces stay the
same.

But again, that is no unicode solution, just minor damage control to make
the current situation bearable. 


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Felipe Monteiro de Carvalho
On Wed, Sep 14, 2011 at 11:53 AM, Marco van de Voort mar...@stack.nl wrote:
 * Make file-handling routines which take filenames as parameters from
 the RTL modular so that the LCL can implement them with UTF-8 support.
 This plus a UTF-8 widestring manager and the Ansi RTL can be fully
 UTF-8.

 I'm not as opposed to this as to the other. At least the interfaces stay the
 same.

Yes, but this solution would only work in the Ansi RTL ...

And why would the interfaces change in the other proposal? It is only
1 more overloaded option for the routines, it does not change anything
which will use UnicodeString.
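
A small sketch of what "one more overloaded option" could look like. The
routine name DescribeName is hypothetical; the thread does not show the
real RTL declarations.

```pascal
program OverloadSketch;
{$mode objfpc}{$H+}

// Hypothetical stand-in for an RTL routine offered in two encodings.
function DescribeName(const Name: UTF8String): string; overload;
begin
  Result := 'UTF-8 version, ' + Chr(Ord('0') + SizeOf(Name[1])) + ' byte units';
end;

function DescribeName(const Name: UnicodeString): string; overload;
begin
  Result := 'UTF-16 version, ' + Chr(Ord('0') + SizeOf(Name[1])) + ' byte units';
end;

var
  U8: UTF8String;
  U16: UnicodeString;
begin
  U8 := 'file.txt';
  U16 := 'file.txt';
  // The compiler selects the overload from the declared type of the
  // argument, so existing UnicodeString callers are not affected.
  WriteLn(DescribeName(U8));
  WriteLn(DescribeName(U16));
end.
```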

-- 
Felipe Monteiro de Carvalho


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Graeme Geldenhuys
On 14/09/2011 11:04, Felipe Monteiro de Carvalho wrote:
 
 IMHO a platform-dependent string would be the worst solution of all
 ... far worse than migrating to UTF-16.

I don't see why. Use the RTL functions to manipulate your text strings.
Both the strings and the RTL functions will use the same encoding on
each platform - so no problems, no conversions.

If you really needed to know the encoding, the RTL could include a
helper function to tell you the encoding of any string (just like Delphi
2009+ has).


 Just recently I had a student from my university implement a routine
 which converts HTML text from utf-8 to braille in utf-8 ... I didn't

Again, no problem. The HTML should have specified the encoding it is in.
Normally that would be UTF-8. So under Linux, MacOSX etc it will already
be in the native encoding. Under Windows, text files are normally stored
in UTF-8 as well, even though UTF-16 is the encoding of the native
Windows API. So when loading the file you can compare the HTML file
encoding to the current RTL encoding and do a conversion if needed (same
as is required in Delphi).

As for the text-to-braille functionality, that is outside the scope of
the FPC and RTL. But common sense should prevail, use RTL string
functions to implement your conversion - don't assume 1 byte = 1
character. A unicode aware string iterator could be implemented to help
you step through the characters one at a time. Such a string iterator
could even become part of the RTL as it will probably be used often for
many parsers.
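
As a rough sketch of such an iterator for the UTF-8 case: the helper names
below are hypothetical and not part of the RTL, and the code assumes
well-formed input (it does not validate continuation bytes).

```pascal
program Utf8IterSketch;
{$mode objfpc}{$H+}

// Hypothetical helper: number of bytes in the UTF-8 sequence that starts
// with lead byte B. Unexpected bytes count as 1 so the caller always
// makes progress.
function Utf8SeqLen(B: Byte): Integer;
begin
  if B < $80 then Result := 1
  else if (B and $E0) = $C0 then Result := 2
  else if (B and $F0) = $E0 then Result := 3
  else if (B and $F8) = $F0 then Result := 4
  else Result := 1;
end;

// Steps through S one encoded character at a time and counts them.
function Utf8CodePointCount(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    Inc(I, Utf8SeqLen(Ord(S[I])));
    Inc(Result);
  end;
end;

var
  S: AnsiString;
begin
  // 'a', U+00E4 (2 bytes) and U+20AC (3 bytes), written as explicit bytes
  // so the example does not depend on the source file encoding.
  S := 'a' + #$C3#$A4 + #$E2#$82#$AC;
  WriteLn('bytes: ', Length(S));
  WriteLn('code points: ', Utf8CodePointCount(S));
end.
```

For this input the sketch should report 6 bytes and 3 code points, which
is the gap between Length() and the number of characters that the "1 byte
= 1 character" assumption hides.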


Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/



Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Marco van de Voort
In our previous episode, Felipe Monteiro de Carvalho said:
 And why would the interfaces change in the other proposal?

 It is only
 1 more overloaded option for the routines,

Which is just 1 more interface change. And for something that is a temporary
workaround.

That is what I like about Mattias' proposal: it is mostly hidden in the
implementation, and the declaration and setting of the manager can have
a lot of platform and deprecated directives around it to make it clear
that it won't last, and it is not just for lazarus. I assume it is
windows only.

But I really do wonder if this is necessary, since it will not make the
2.6 cycle anyway, and I hope 2.8 can be really unicode.



Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Marco van de Voort
In our previous episode, Martin Schreiber said:
  Is this possible in UNIX? I can see that in Windows you can use the
  trick to use W versions which are identical except for the string type
  and drop Windows 9x support, but is this really possible for the UNIX
  syscalls? They expect UTF-8 not UTF-16 which is what UnicodeString
  uses.
 
 Linux expects an array of bytes in filenames (no encoding, no utf-8) AFAIK.

It is a bit agnostic yes. But does that really matter if all other programs
write filenames in utf8 encoding? It might as well be specified to be utf-8
then, there is no difference in approach.


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Felipe Monteiro de Carvalho
On Wed, Sep 14, 2011 at 12:03 PM, Marco van de Voort mar...@stack.nl wrote:
 Is this possible in UNIX? I can see that in Windows you can use the
 trick to use W versions which are identical except for the string type
 and drop Windows 9x support, but is this really possible for the UNIX
 syscalls? They expect UTF-8 not UTF-16 which is what UnicodeString
 uses.

 Afaik QT and many other higher level libs always use UTF-16. MSE does too. 
 Might
 also be useful for the JVM port.

I think I wasn't clear enough. I wanted to say that I don't see how
you can have both an Ansi and a UTF-16 RTL in UNIXes with the same
codebase, without ifdefs. I think this is not possible, and one of the
previous messages seemed to indicate that the RTL would be able to use
the same codebase regardless of the output version (ansi vs utf-16),
so without ifdefs.

 But it will be beneficial to everybody, and it is clear to everybody how
 something should behave, so there will be no endless bickering over details
 and workarounds like this thread. It is a structured approach.

Well, there is still uncertainty over the question brought up by
Graeme: Always UTF-16 or the unknown string type?

 BTW: I explained all this to you, including the not dropping legacy, over some
 Chinese food a few months ago. Don't you remember?

Yes, I remember, but the way you spoke about it, it sounded something
like a proposal, not 100% sure it would end up like this =) Now it is
really put as the way forward...

-- 
Felipe Monteiro de Carvalho


Re: [fpc-devel] bounty: FPC based debugger

2011-09-14 Thread Michael Schnell

On 09/13/2011 04:52 PM, Hans-Peter Diettrich wrote:


It's not the CPU, it's more the MMU which can help in finding changed
(global) variables.

AFAIK, the MMU cannot work on byte addresses but only on much bigger
blocks of data. So it does not seem to help with finding a write access
to a dedicated variable.

Moreover, the MMU programming and interrupts will be consumed by the OS
and a user space program can't even see them.


-Michael


Re: [fpc-devel] bounty: FPC based debugger

2011-09-14 Thread Michael Schnell

On 09/13/2011 02:53 PM, Joost van der Sluis wrote:

You do know that GDB does have a Pascal extension, right?

IMHO, if we really can work with the gdb team on feeding the necessary
Object-Pascal specific add-ons into gdb, creating a new debugger from
scratch does not make any sense at all.


-Michael


Re: [fpc-devel] bounty: FPC based debugger

2011-09-14 Thread Michael Schnell

On 09/13/2011 04:59 PM, Hans-Peter Diettrich wrote:


IMO you're addressing the wrong audience. Most things, beyond 
breakpoint handling, stepping and memory read/writes, can be done 
outside the debugger. Such external code is not bound to debugger 
support, and can use language specific information (RTTI...).



If this is true, why discuss replacing gdb?

-Michael


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Michael Schnell

On 09/14/2011 08:50 AM, Felipe Monteiro de Carvalho wrote:


All Linux distributions that I know use utf-8
Android uses utf-8
Meego uses utf-8

AFAIK, the EXT file system does not care about the encoding of the
file-name byte arrays. Only 0x00 (end of name) and '/' are interpreted.


-Michael


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Michael Schnell

On 09/14/2011 10:51 AM, Felipe Monteiro de Carvalho wrote:

In this case then for sure we cannot only have file routines only in
UTF-16, because that would make it impossible to identify many files
in Linux...

Who says that file names are supposed to be human readable and thus
written in some character encoding?


AFAIK:

With EXT they are just streams of up to 255 bytes (with 0x00 and '/'
disallowed)


With old style FAT they are just arrays of 11 bytes (maybe with 0x00,
'.' and '\' disallowed), with lower-case and upper-case ASCII characters
treated as identical


With long filename FAT I fear it's quite complicated (e.g. short and 
long file name of a file need to be recognized as identical). But no 
unicode here.


-Michael



Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Michael Schnell

On 09/14/2011 11:05 AM, Marco van de Voort wrote:

First and for all. Backwards compat dropping is not going to happen.


It already has and supposedly can't be avoided. Take a look at what
Lazarus was forced to make out of the identity of ANSIString and
UTF8String seemingly forced by FPC, e.g.:


Old programs that assume local ANSI 8-bit text retrieved from LCL GUI
components don't work when compiled with the new version (e.g. if doing
myChar := myString[3]; )


Doing My16BitString := 'my constant text containing umlauts as äöü';
provides an erroneous result.
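
A minimal sketch of the indexing problem, assuming the string holds UTF-8
as it does when it comes from the LCL. The bytes are spelled out
explicitly so the result does not depend on the source file encoding.

```pascal
program IndexSketch;
{$mode objfpc}{$H+}

var
  S: AnsiString;
begin
  // UTF-8 bytes for the three umlauts a, o, u (U+00E4, U+00F6, U+00FC).
  S := #$C3#$A4#$C3#$B6#$C3#$BC;
  // Length counts bytes, so three characters report a length of six.
  WriteLn('Length(S) = ', Length(S));
  // S[3] is not the third character but the lead byte of the second one.
  WriteLn('Ord(S[3]) = $', HexStr(Ord(S[3]), 2));
end.
```

Code written for a single-byte ANSI codepage would expect S[3] to be the
third umlaut; with UTF-8 it receives the byte $C3 instead.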


-Michael


Re: [fpc-devel] bounty: FPC based debugger

2011-09-14 Thread Sven Barth

On 14.09.2011 12:44, Michael Schnell wrote:

On 09/13/2011 04:52 PM, Hans-Peter Diettrich wrote:


It's not the CPU, it's more the MMU which can help in finding changed
(global) variables.

AFAIK, the MMU can not work in byte addresses but just with much bigger
blocks of data. So it does not seem to help with finding a write access
to a dedicated variable.

Moreover the MMU programming and interrupts will be consumed by the OS
and a user space program can't even see it.


But the debugger can ask the OS to write protect a page or to enable a 
page guard (which triggers on write access) and then the corresponding 
signal/exception can be caught. This reduces the checks necessary from 
the complete process memory down to only the page size.


Note: I don't know whether it's implemented like that in any debugger, 
this is just a theory of mine.


Regards,
Sven



Re: [fpc-devel] bounty: FPC based debugger

2011-09-14 Thread Michael Schnell

On 09/14/2011 01:58 PM, Sven Barth wrote:


But the debugger can ask the OS to write protect a page or to enable a 
page guard (which triggers on write access) and then the corresponding 
signal/exception can be caught. This reduces the checks necessary 
from the complete process memory down to only the page size.


Do you think this is possible without rewriting the OS (for all 
supported OSes)?


-Michael


Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Hans-Peter Diettrich

Felipe Monteiro de Carvalho schrieb:

On Tue, Sep 13, 2011 at 9:23 PM, Michael Van Canneyt
mich...@freepascal.org wrote:

One with unicode string, one with ansistring. They will have the same code,
but will be compiled twice, each time with a different compiler define to
decide which version it must be.


Is this possible in UNIX? I can see that in Windows you can use the
trick to use W versions which are identical except for the string type
and drop Windows 9x support, but is this really possible for the UNIX
syscalls? They expect UTF-8 not UTF-16 which is what UnicodeString
uses.


A few topics:

The NT WinAPI (not 9x) *implements* everything in the Wide (UTF-16) 
routines, the Ansi versions do the string *conversion* before calling 
the Wide version. Unix API (most probably - dunno) has no such dual 
interface with internal conversion.


The NT filesystems store names in UTF-16, while Unix filesystems store 
UTF-8. This means that access to an NTFS or FAT32 drive under Unix will 
require a string conversion, in the filesystem handler.


On Windows, Ansi means any (byte-char) encoding, with different 
(national) codepages on every machine. This can cause trouble to Ansi 
applications (using Ansi strings), when filenames do not convert 
losslessly into that codepage. Unix IMO uses UTF-8 as the Ansi encoding, 
eliminating possible losses, and that's why FPC also prefers UTF-8 encoding.



But let's not forget the user!

Many users still want simple string handling, with direct mapping
between logical and physical chars (SBCS). This is not possible at all
with UTF-8, while UTF-16 works fine with the BMP, at least. This desire
for simple string handling suggests the use of UTF-16 for Unicode
strings in *user* code.
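
A short sketch of where the BMP caveat shows up in practice. It builds a
UnicodeString containing one character outside the BMP; the code point
chosen (U+1D11E) is only an example.

```pascal
program SurrogateSketch;
{$mode objfpc}{$H+}

var
  S: UnicodeString;
begin
  // 'A' followed by U+1D11E, stored as the surrogate pair D834 DD1E.
  SetLength(S, 3);
  S[1] := 'A';
  S[2] := WideChar($D834);
  S[3] := WideChar($DD1E);
  // Two logical characters, but Length counts 16-bit code units.
  WriteLn('code units: ', Length(S));
  // S[2] on its own is half a character: a high surrogate.
  WriteLn('S[2] is a high surrogate: ',
          (Ord(S[2]) >= $D800) and (Ord(S[2]) <= $DBFF));
end.
```

Inside the BMP, index and character position coincide, which is the
simple case described above; the mapping breaks only for strings like this
one.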


WRT the latter argument, FPC IMO should follow the Delphi implementation 
of Unicode strings as UTF-16. This choice is independent from the 
(platform dependent) RTL conventions, but it affects the standard 
components (string lists...) in the FCL, and the other components in the 
LCL. Here again the average user will prefer UTF-16 component libraries, 
compatible with his own code, while more experienced users may be 
happier with the current UTF-8 libraries.


English (ASCII) users also may prefer UTF-8, as long as they do not have 
to (or want to) deal with strings in foreign languages. Once they have 
to face the existence of non-ASCII strings in their applications, they 
will most probably prefer switching to UTF-16, with few changes to their 
existing codebase and coding habits(!). Really *processing* Unicode 
text, with all its bells and whistles, is so complicated that it should 
be left to dedicated software and libraries, while typical application 
code will ignore everything beyond char level.



IMO the number of required conversions is of little importance to the
runtime behaviour of an application. File access is always expensive, so
that a single conversion into the platform specific filename
representation is not perceptible at all. The same goes for GUI
components, which typically store all strings twice: once for their own
(and application) use, and another copy in the widgets. Here again
transfers of strings between widgets and components are rare, with
negligible slowdown from any conversions during message handling.
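
A sketch of the single boundary conversion meant here, using the RTL's
UTF8Encode and UTF8Decode. Treating the UTF-8 form as "the platform
filename representation" is an assumption that fits typical Unix systems.

```pascal
program BoundarySketch;
{$mode objfpc}{$H+}

var
  UserName, RoundTrip: UnicodeString;
  PlatformName: UTF8String;
begin
  // User code holds UTF-16: 'a' followed by the euro sign U+20AC.
  SetLength(UserName, 2);
  UserName[1] := 'a';
  UserName[2] := WideChar($20AC);

  // One conversion on the way to the platform API ...
  PlatformName := UTF8Encode(UserName);
  // ... and one on the way back, e.g. for a directory listing.
  RoundTrip := UTF8Decode(PlatformName);

  WriteLn('UTF-16 code units: ', Length(UserName));
  WriteLn('UTF-8 bytes: ', Length(PlatformName));
  WriteLn('lossless round trip: ', RoundTrip = UserName);
end.
```

Each conversion is linear in the length of the name, which is small
compared with the cost of the file access that follows it.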


More important IMO is the external storage of Unicode, where I see no 
reasonable way around UTF-8, considering codepage dependencies and 
UTF-16 byte-order problems.


Another note: a set of char is quite incompatible with Unicode/UTF-16.
This should be taken into account with *every* introduction of a
Unicode string type.
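
To illustrate the incompatibility: a Pascal set can hold at most 256
elements, so it fits an 8-bit char but not a 16-bit one. The helper
IsAsciiLetter below is a hypothetical replacement pattern, not an RTL
function.

```pascal
program CharSetSketch;
{$mode objfpc}{$H+}

const
  // Fine for 8-bit characters: the base type has 256 values.
  Letters: set of AnsiChar = ['A'..'Z', 'a'..'z'];

// Hypothetical helper: the same test for a 16-bit character, written as
// a range check because WideChar has too many values for a set.
function IsAsciiLetter(C: WideChar): Boolean;
begin
  case Ord(C) of
    Ord('A')..Ord('Z'), Ord('a')..Ord('z'): Result := True;
  else
    Result := False;
  end;
end;

var
  W: WideChar;
begin
  WriteLn('x' in Letters);
  W := 'x';
  WriteLn(IsAsciiLetter(W));
  W := WideChar($20AC);   // euro sign, outside the ASCII range
  WriteLn(IsAsciiLetter(W));
end.
```

Existing code of the form "if c in [...]" therefore needs a rewrite of
this kind wherever c becomes a 16-bit character.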


DoDi



Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Hans-Peter Diettrich

Graeme Geldenhuys schrieb:


If FPC has true unicode support, then all functions should work correct
with just the UnicodeString type. That type's encoding is based on the
native encoding of each platform. NO performance hit required.


Can you specify, *which* strings ever *require* platform specific 
encoding? Beyond filenames and environment strings?


UI strings (GUI, console) are more tightly bound to user code and
component/widgetset libraries than to a platform API (see my other
comment).


DoDi



Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Hans-Peter Diettrich

Michael Schnell schrieb:

On 09/14/2011 11:05 AM, Marco van de Voort wrote:

First and for all. Backwards compat dropping is not going to happen.


It already has and supposedly can't be avoided. Take a look at what 
Lazarus was forced to make out of the identity of ANSIString and 
UTF8String seemingly forced by FPC. e.g.:


Old programs assuming local ANSI 8 bit code retrieved from LCL GUI 
components, compiled with the new version don't work (e.g. if doing 
myChar := myString[3]; )


How many bytes must a char have if it is to store any (logical)
character? Unicode users have no use for a char type, instead
they have to use substrings for every logical character. A Unicode BMP
user could be happy with a 2-byte char, of course, at his own (low) risk.


DoDi



Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Hans-Peter Diettrich

Graeme Geldenhuys schrieb:


As for the text-to-braille functionality, that is outside the scope of
the FPC and RTL. But common sense should prevail, use RTL string
functions to implement your conversion - don't assume 1 byte = 1
character. A unicode aware string iterator could be implemented to help
you step through the characters one at a time. Such a string iterator
could even become part of the RTL as it will probably be used often for
many parsers.


How many users will have to deal with chars outside the Unicode BMP?
IMO UTF-16 can make 99% of the (current) users happy with simple string 
handling, while the rest would prefer UTF-32 instead of UTF-8, because 
outside the BMP UTF-8 is a waste of space, and lacks indexed char access 
in any case.


DoDi



Re: [fpc-devel] bounty: FPC based debugger

2011-09-14 Thread Hans-Peter Diettrich

Sven Barth schrieb:

But the debugger can ask the OS to write protect a page or to enable a 
page guard (which triggers on write access) and then the corresponding 
signal/exception can be caught. This reduces the checks necessary from 
the complete process memory down to only the page size.


Note: I don't know whether it's implemented like that in any debugger, 
this is just a theory of mine.


Every (reasonable) OS provides such features in its debug API. Available 
support depends on the actual hardware, of course.


DoDi



Re: [fpc-devel] bounty: FPC based debugger

2011-09-14 Thread dmitry boyarintsev
On Wed, Sep 14, 2011 at 6:48 AM, Michael Schnell mschn...@lumino.de wrote:
 IMHO, if we really can work with the gdb team on feeding the necessary
 Object-Pascal specific add-ons into gdb, creating a new debugger from
 scratch does not make any sense at all.

That's true. The only thing that concerns me is that there's not
really a standard in GDB (I may be wrong), but I've seen a lot of
issues in Lazarus gdb-support because of the different builds of GDB
used.

Also, IIRC, Apple forked gdb (as well as other gnu-tools) to make it
usable for their own needs (iDevice debugging support).
I'm not sure if the latest changes of gdb are there in the Apple gdb;
I would assume they are.

thanks,
Dmitry


Re: [fpc-devel] bounty: FPC based debugger

2011-09-14 Thread Sven Barth

On 14.09.2011 14:53, Michael Schnell wrote:

On 09/14/2011 01:58 PM, Sven Barth wrote:


But the debugger can ask the OS to write protect a page or to enable a
page guard (which triggers on write access) and then the corresponding
signal/exception can be caught. This reduces the checks necessary
from the complete process memory down to only the page size.


Do you think this is possible without rewriting the OS (for all
supported OSes)


At least Windows allows the use of page guards... I don't know about
Linux though.


Regards,
Sven

@Michael: Sorry, this mail wasn't meant to be private, but Reply to 
list put your mail address into the to field instead of the list's 
address.



Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Luiz Americo Pereira Camara

On 14/9/2011 06:41, Felipe Monteiro de Carvalho wrote:

On Wed, Sep 14, 2011 at 11:32 AM, Luiz Americo Pereira Camara
luiz...@oi.com.br  wrote:

Because if someone for some reason, like porting Delphi code, stays with a
UTF16 string, under windows, when using RTL functions TWO conversions will
be made:

User Code (UTF16) -> RTL (UTF8) -> WINAPI (UTF16)

This would not happen because I proposed to have 2 versions of the
routines in the RTL. Not 1 UTF-8 version. There would be both UTF-8
and UTF-16 versions and one would naturally use the one which matches
his preferred encoding ... and the RTL would only convert the
non-native version


OK. The drawback is an increase in the size of executables (which are
already big).


Luiz




Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Luiz Americo Pereira Camara

On 14/9/2011 06:48, Graeme Geldenhuys wrote:

On 14/09/2011 11:19, Luiz Americo Pereira Camara wrote:

This is not desirable simply because at each platform (windows / unix)
the user code of the same program will have a different encoding
increasing the possibility of subtle errors.

Why? Not every program is a text manipulation program or text parser.
Most programs simply assign one string to another.

eg:

Button1.Caption := 'Click me';
lMyString := Button1.Caption;


Given that Button1.Caption will be different under windows and unix,
even if the compiler provides automatic conversion, at least some
changes will be required to the default classes that handle things like
(de)serialization etc, and the places where these methods are used
must be checked.


Moreover, having different encodings on different platforms will give no
gain to libraries like LCL/Lazarus, as stated by DoDi.


All in all my proposal is similar to yours. The only difference is
that I suggest it be used only in the RTL by default, but nothing stops
users like you from using it in a broader scope.


The other difference is the name, which I don't care about (it can be
xString, MultiString, FPCString). I just think that using UnicodeString
for an encoding that varies per platform will lose Delphi compatibility
for no good reason, and moreover there will be floods of bug reports
asking why Delphi code does not work this way and asking for a fix.


Luiz







Re: [fpc-devel] Unicode support (yet again)

2011-09-14 Thread Martin Schreiber
On Wednesday 14 September 2011 17:02:14 Hans-Peter Diettrich wrote:
 Felipe Monteiro de Carvalho schrieb:
  On Tue, Sep 13, 2011 at 9:23 PM, Michael Van Canneyt
  
  mich...@freepascal.org wrote:
  One with unicode string, one with ansistring. They will have the same
  code, but will be compiled twice, each time with a different compiler
  define to decide which version it must be.
  
  Is this possible in UNIX? I can see that in Windows you can use the
  trick to use W versions which are identical except for the string type
  and drop Windows 9x support, but is this really possible for the UNIX
  syscalls? They expect UTF-8 not UTF-16 which is what UnicodeString
  uses.
 
 A few topics:
 
[...]

Agreed. And that is how it is done in MSEgui:

- On the user side all string handling uses type msestring, which is
defined as the existing Free Pascal 16-bit UnicodeString.
- The MSEgui widgetset works with UnicodeString too.
- For file and directory access MSEgui has a set of functions which convert
between the system encoding and type filenamety, which is defined as the
existing Free Pascal 16-bit UnicodeString.
- MSEgui has its own 16-bit functions and classes for lists, maps, sorting
and the like.
- Text files are stored in utf-8 by default.

From my point of view there is no need for a complicated encoding-aware
unicode string type which is possibly slower, needs more memory and
introduces new bugs.
  
Martin