Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-29 Thread Marco van de Voort
On Sat, Dec 28, 2013 at 10:39:30AM -0200, Marcos Douglas wrote:
  I understand. But if the major companies prefer to use C# or Java
  instead Delphi well, they not care about Delphi compatibilities. If
  they care, why they would be leaving Delphi?
 
  If they leave Delphi compatibility, they normally don't go for a
  marginal oss compiler.
 
 So you're saying that FPC cannot survive without Delphi?

Survive is a big word and very black and white.

Not sustainable on the current level with a serious hope of continued growth? 
Probably.

--
___
Lazarus mailing list
Lazarus@lists.lazarus.freepascal.org
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus


Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-29 Thread Marco van de Voort
On Sat, Dec 28, 2013 at 02:09:58PM +0100, Florian Klämpfl wrote:
  marginal oss compiler.
  
  So you're saying that FPC cannot survive without Delphi?
 
 Define survive. But I'am saying indeed that FPC's usage would drop
 significantly if Delphi wouldn't be around anymore. A few years it might
 increase because people would use FPC to rescue old sources but after

IMHO that effect has always been exaggerated. If it already happens, it
doesn't really lead to growth in FPC/Lazarus, only people using the finished
product in silence.



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-28 Thread Marcos Douglas
On Fri, Dec 27, 2013 at 9:07 PM, Graeme Geldenhuys
gra...@geldenhuys.co.uk wrote:
 On 2013-12-27 22:49, Marco van de Voort wrote:

 Just look on e.g. the forums. All people are asking about Delphi packages.

 And once those Delphi packages are ported to Free Pascal, nobody needs
 Delphi any more. ;-)

Touché.

 Point is if you make conversion harder PEOPLE WILL NOT EVEN TRY!

 Converting to Free Pascal and Lazarus will *always* be easier than
 rewriting everything in C# or Java - no matter how many

I totally agree.

 incompatibilities Free Pascal might have with Delphi. The language still
 stays a lot more similar than the alternative. Yet, looking at the
 current employment market, it seems most companies opted to rewrite
 their Delphi projects in C# and Java - so they took the even harder
 route! Why? Probably due to more innovation happening in those other
 languages.

...most companies opted to rewrite their Delphi projects in C# and Java...

I agree. I see this here in Brazil too.
So, if companies prefer to rewrite everything in another language,
this is further proof that people do not want compatibility with
Delphi (so much).

Regards,
Marcos Douglas



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-28 Thread Graeme Geldenhuys
On 2013-12-27 23:55, Marco van de Voort wrote:
 Somehow there is still more Delphi use than Lazarus, so I'll bounce back
 the statistics request.

Umm, I never quoted any stats.

Free Pascal and Lazarus projects are like Linux - there is NO reliable
way of tracking usage. So at best, any usage claims or stats are just a
best guess, nothing more.



Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-28 Thread Florian Klämpfl
On 28.12.2013 11:01, Marcos Douglas wrote:
 incompatibilities Free Pascal might have with Delphi. The language still
 stays a lot more similar than the alternative. Yet, looking at the
 current employment market, it seems most companies opted to rewrite
 their Delphi projects in C# and Java - so they took the even harder
 route! Why? Probably due to more innovation happening in those other
 languages.

I'm pretty sure this is not the case. It is a story of “No one ever got
fired for buying IBM.”

 
 ...most companies opted to rewrite their Delphi projects in C# and Java...
 
 I agree. I see this here in Brazil too.
 So, if the companies prefer to rewrite everything to another language,
 this is another prove that people do not want compatibility with
 Delphi (so much).

And you think they would switch instead to some marginal OSS language
which is compatible with nothing and which nobody knows? C# and Java are
used because they provide a huge user base (user in the sense of
programmers knowing it) and are developed by huge companies, so people
expect their code base to have a future.



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-28 Thread Florian Klämpfl
On 28.12.2013 13:02, Marcos Douglas wrote:
 On Sat, Dec 28, 2013 at 9:41 AM, Florian Klämpfl flor...@freepascal.org 
 wrote:
 Am 28.12.2013 11:01, schrieb Marcos Douglas:

 [...]

 So, if the companies prefer to rewrite everything to another language,
 this is another prove that people do not want compatibility with
 Delphi (so much).

 And you think they would switch instead to some marginal OSS language
 which is compatible to nothing and nobody knows? C# and Java are used
 because they provide a huge user base (user in the sense of programmers
 knowing it) and being developed by huge companies so people expect their
 code base has a future.
 
 I understand. But if the major companies prefer to use C# or Java
 instead Delphi well, they not care about Delphi compatibilities. If
 they care, why they would be leaving Delphi?

If they leave Delphi compatibility, they normally don't go for a
marginal oss compiler.



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-28 Thread Marcos Douglas
On Sat, Dec 28, 2013 at 10:19 AM, Florian Klämpfl
flor...@freepascal.org wrote:
 Am 28.12.2013 13:02, schrieb Marcos Douglas:
 On Sat, Dec 28, 2013 at 9:41 AM, Florian Klämpfl flor...@freepascal.org 
 wrote:
 Am 28.12.2013 11:01, schrieb Marcos Douglas:

 [...]

 So, if the companies prefer to rewrite everything to another language,
 this is another prove that people do not want compatibility with
 Delphi (so much).

 And you think they would switch instead to some marginal OSS language
 which is compatible to nothing and nobody knows? C# and Java are used
 because they provide a huge user base (user in the sense of programmers
 knowing it) and being developed by huge companies so people expect their
 code base has a future.

 I understand. But if the major companies prefer to use C# or Java
 instead Delphi well, they not care about Delphi compatibilities. If
 they care, why they would be leaving Delphi?

 If they leave Delphi compatibility, they normally don't go for a
 marginal oss compiler.

So you're saying that FPC cannot survive without Delphi?


Marcos Douglas



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-28 Thread Marcos Douglas
On Sat, Dec 28, 2013 at 10:37 AM, Jürgen Hestermann
juergen.hesterm...@gmx.de wrote:
 Am 2013-12-28 13:19, schrieb Florian Klämpfl:
 I understand. But if the major companies prefer to use C# or Java
 instead Delphi well, they not care about Delphi compatibilities. If
 they care, why they would be leaving Delphi?
 If they leave Delphi compatibility, they normally don't go for a
 marginal oss compiler.

 The question is:
 Why did they use Delphi before at all?

 If the reason was that Delphi was a very common and widespread programming
 environment, then it is understandable behaviour to move to the next
 mainstream environment as soon as budget and time allow.
 Such people would never care about FPC/Lazarus (even if it were fully
 Delphi compatible).
 They would never think about using it.
 So making FPC/Lazarus compatible would not hold any user of this group.

 If the reason was that they like Pascal as an easy to learn and
 maintain language, then they will invest in migration even
 if not all parts are identical to Delphi.
 Just the opposite:
 They may like that not all misconceptions are repeated in
 FPC/Lazarus and they may like that it is open source.

+1

Marcos Douglas



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-28 Thread Florian Klämpfl
On 28.12.2013 13:39, Marcos Douglas wrote:
 If they leave Delphi compatibility, they normally don't go for a
 marginal oss compiler.
 
 So you're saying that FPC cannot survive without Delphi?

Define survive. But I am indeed saying that FPC's usage would drop
significantly if Delphi weren't around anymore. For a few years it might
increase because people would use FPC to rescue old sources, but after
that FPC's usage would probably decay significantly.



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-28 Thread Marcos Douglas
On Sat, Dec 28, 2013 at 11:09 AM, Florian Klämpfl
flor...@freepascal.org wrote:
 Am 28.12.2013 13:39, schrieb Marcos Douglas:
 If they leave Delphi compatibility, they normally don't go for a
 marginal oss compiler.

 So you're saying that FPC cannot survive without Delphi?

 Define survive.

To remain alive or in existence.

 But I'am saying indeed that FPC's usage would drop
 significantly if Delphi wouldn't be around anymore. A few years it might
 increase because people would use FPC to rescue old sources but after
 that FPC's usage would probably decay significantly.

Well, this is very frustrating... and even more so because you, the FPC
main developer, wrote it. :-(

Regards,
Marcos Douglas



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-28 Thread Marius
Flávio Etrusco wrote:

 Yes, but we all know you are a special case :-)

 Point is if you make conversion harder PEOPLE WILL NOT EVEN TRY!

I tend to agree with Graeme on this one.

-Flávio

My opinion as well, and it's already hard at this moment, as generics and
string/utf8 are the first problems people will encounter while moving
XE* code to FPC.




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-28 Thread Flávio Etrusco
On Sat, Dec 28, 2013 at 2:35 PM, Marius fpclaza...@home.nl wrote:
 Flávio Etrusco wrote:

 Yes, but we all know you are a special case :-)

 Point is if you make conversion harder PEOPLE WILL NOT EVEN TRY!

I tend to agree with Graeme on this one.

-Flávio

 My opinion as well, and its already hard at this moment as generics and
 string/utf8 are the first problems people will encounter while moving
 XE* code to FPC.

Oops, I misquoted.
Graeme actually replied to that opinion with this:
Converting to Free Pascal and Lazarus will *always* be easier than
rewriting everything in C# or Java - no matter how many
incompatibilities Free Pascal might have with Delphi. The language still
stays a lot more similar than the alternative. Yet, looking at the
current employment market, it seems most companies opted to rewrite
their Delphi projects in C# and Java - so they took the even harder
route! Why? Probably due to more innovation happening in those other
languages.

IOW I think that for some years already, innovation would be a much
better selling point for Free Pascal rather than Delphi compatibility.

-Flávio



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Hans-Peter Diettrich

Sven Barth wrote:

On 26.12.2013 17:02, Sven Barth wrote:

On 26.12.2013 12:30, Hans-Peter Diettrich drdiettri...@aol.com wrote:
 
  Sven Barth wrote:
 
  On 26.12.2013 02:19, Hans-Peter Diettrich drdiettri...@aol.com wrote:
  Please specify AnsiString, of which encoding?
 
  When I concat an AnsiString and an UTF8String and assign it to an
OEMString
o := a + u;
  then I get these warnings in XE:
 
  [DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from
'AnsiString' to 'string'
  [DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from
'UTF8String' to 'string'
  [DCC Warning] ConcTest.dpr(20): W1058 Implicit string cast with
potential data loss from 'string' to 'OEMString'
 
  I cannot see the system codepage used here.

Try to make o of type RawByteString. And maybe also use more than two
strings.


As I already stated: RawByteString is not for application use!


Ok, I didn't remember the situation correctly. When searching for Jonas' 
mail I mentioned below I also found this which I was referring to:


=== quote of Jonas begin ===

var
  mypath: utf8string;
  sr: tsearchrec;
begin
  { assign some utf-8 string to mypath }
  if findfirst(mypath+allfilesmask,faAnyFile,sr)=0 then
    begin
      ...
    end;
end


Delphi has no problem with this code, because all strings are upgraded 
to UnicodeString.


If the DefaultSystemCodePage is something different from UTF-8, the 
result of mypath+allfilesmask will be downgraded to 
DefaultSystemCodePage because the string constant allfilesmask is 
encoded using that code page.


Delphi has no rule of downgrading.

When mypath+allfilesmask is assigned to a variable, the result has the 
correct encoding, not necessarily CP_ACP.



This is because the rule is followed that concatenating ansistrings 
with different encodings results in an ansistring with the encoding of 
the destination ansistring, and the destination ansistring is a 
rawbytestring here (the first argument of findfirst), in which case the 
ansi encoding is used.


Again: RawByteString is a mess, should be used with care.

The first argument of FindFirst (file mask) certainly *can not* be a 
RawByteString.



=== quote of Jonas end ===
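
Which code page a mixed concatenation ends up in can be checked directly rather than argued about. The following is only a minimal sketch, assuming an FPC version with code-page-aware AnsiStrings (2.7.1 trunk or later) and a source file saved as UTF-8; the program and variable names are made up for illustration. It prints the code page of each operand and of the result for two different destination types, without claiming what the numbers will be on a given system.

```pascal
program ConcatCodePage;
{$mode objfpc}{$H+}
{$codepage utf8}   // assumption: this source file is saved as UTF-8

{$IFDEF UNIX}
uses
  cwstring;        // widestring manager, needed for real conversions on Unix
{$ENDIF}

var
  u: UTF8String;
  a: AnsiString;      // AnsiString with the default (CP_ACP) code page
  r: RawByteString;   // no fixed code page of its own
  s: UTF8String;
begin
  u := 'äöü';
  a := '*';

  r := u + a;   // destination is a RawByteString
  s := u + a;   // destination is a UTF8String

  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
  WriteLn('u:          ', StringCodePage(u));
  WriteLn('a:          ', StringCodePage(a));
  WriteLn('r := u + a: ', StringCodePage(r));
  WriteLn('s := u + a: ', StringCodePage(s));
end.
```

Comparing the last two lines on a system whose DefaultSystemCodePage is not UTF-8 shows whether the RawByteString destination really triggers the downgrade described in the quote.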


 
 
  What I want to point out are the string function overloads, where
Delphi supplies only string (UTF-16) and RawByteString arguments, and
AnsiString(CP_ACP) in unit AnsiStrings. FPC could add UTF8String
overloads and use these when dealing with AnsiStrings of an encoding
different from CP_ACP.

That was already discussed some time ago between devs and was deemed not
useable by Jonas. I'll try to find his mail with his explanation.


=== quote of Jonas begin ===

Adding explicitly named UTF-8 versions of routines with constant or value
rawbytestring arguments (FindFirstUTF8 etc) with UTF8String arguments and
that internally simply call through to the rawbytestring versions could
perhaps be useful.  Interestingly, Lazarus users probably won't suffer
from this particular problem as they already use such routines from the
LCL, and those routines can simply be adapted by simply removing all the
UTF8ToSys calls (they will keep working in their current state though,
they simply keep suffering from the same data loss issues they had
before).

=== quote of Jonas end ===


I see no argument for or against UTF-8 overloads here.


Please note that Jonas states that differently named overloads would be 
needed. Equally named UTF8String overloads won't necessarily work 
correctly.


You see the need for making RawByteString a compiler magic? :-]
It should be used only as the last resort, when no other string type 
matches a given string encoding.


As for FindFirst, a choice of encoding for the mask string exists only on 
Windows, depending on the use of the A or W API. Other targets have a 
dedicated encoding for filenames, which should be used in all file and 
directory functions. Even on Windows only the W API should be used 
nowadays; the A API (as used in older Delphi versions) was only for 
support of legacy Win9x systems, where not all W subroutine versions were 
available.


DoDi




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Hans-Peter Diettrich

Juha Manninen wrote:

It happened again. The word Unicode was mentioned and the result is
an endless debate of how it should be done. Now > 100 messages and
counting ...


Now that we are in pre-release of strings with Encoding, the debate 
enters a very new round.



I personally don't care much what the default encoding will be, but I
wonder how easy it will be to use UTF-8 for my employer's code.
The situation with FPC will be better than with Delphi because FPC
does not convert automatically to default encoding ALWAYS. It only
converts when the conversion is needed.
For example TStringList can be used for UTF8Strings and it does not
trigger automatic conversion.
Isn't it so? Please correct me if I still got it wrong.


That's the old state, where strings have no stored Encoding. As soon as 
AnsiStrings have an encoding, the default encoding becomes important for 
the reduction of automatic conversions. When the RTL is converted to 
UTF-16, you'll have to accept either this new default encoding, or any 
number of automatic conversions between Ansi and UnicodeStrings.



It means UTF-8 with FPC will be easier than UTF-8 with Delphi, even if
UTF-16 was the default.


Delphi suffers from the use of CP_ACP, which was the only supported 
encoding before, and still is the only explicitly supported encoding 
when the AnsiStrings unit is used. In Lazarus we had the same 
one-encoding-only philosophy, except that here the default string type is 
UTF-8. With the encoded AnsiStrings the problem of other encodings and 
automatic conversion arises. Delphi solved most problems by changing 
string to UTF-16, so that only the forced use of AnsiString will ever 
result in automatic conversions due to different string encodings.


In FPC/Lazarus the situation is somewhat different, because now the 
default string type could be UTF-8, UTF-16 or even CP_ACP, with a number 
of users voting for each of them. Technically the simplest solution 
would be to keep the de-facto standard UTF-8, as assumed by Lazarus. But 
when string becomes UTF-16, as in recent Delphi versions, Lazarus and 
the LCL will need heavy refactoring. That's the top discussion topic 
right now.


DoDi




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Hans-Peter Diettrich

Marco van de Voort wrote:

On Thu, Dec 26, 2013 at 12:28:54AM +0100, Hans-Peter Diettrich wrote:

is dangerous if they are not all the same encoding. If there is any
mismatch, it will be converted down to default encoding.
Then the implementation is wrong. 


Wrong according to you.


Wrong, or better *broken*, with regards to expected results.


Not wrong according to defined Microsoft
applications.


Where do you see Microsoft applications using Ansi strings, nowadays?



This way of top-down thinking will turn FPC into a Java, where you are
lugging along an own platform-within-an-platform everywhere.


That's what FPC and Lazarus do already: they assume a UTF-8 
environment, till now. That's okay for all targets except Windows, where 
a UTF-8/16 conversion is required on the app-WinAPI boundary.
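
The conversion on that boundary can be sketched as follows. This is only an illustration under stated assumptions: the program name, variable names and the file name are hypothetical, and no actual Windows routine is called, so that the sketch compiles on any target. It uses the RTL functions UTF8Decode and UTF8Encode for the two directions.

```pascal
program Utf8Boundary;
{$mode objfpc}{$H+}

var
  Utf8Name: UTF8String;
  WideName: UnicodeString;
begin
  Utf8Name := 'report.txt';   // application side: UTF-8 (hypothetical name)

  // Going out: UTF-8 -> UTF-16, the form a Windows W routine expects.
  WideName := UTF8Decode(Utf8Name);
  // PWideChar(WideName) is what would be handed to the W routine here.

  // Coming back: UTF-16 -> UTF-8 for the rest of the application.
  Utf8Name := UTF8Encode(WideName);

  WriteLn(Length(Utf8Name), ' bytes / ', Length(WideName), ' UTF-16 code units');
end.
```

On targets whose file APIs already take UTF-8, both conversions simply drop out, which is the asymmetry described above.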


DoDi




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Marco van de Voort
On Thu, Dec 26, 2013 at 04:48:34PM +0200, Juha Manninen wrote:
 It happened again. The word Unicode was mentioned and the result is
 an endless debate of how it should be done. Now > 100 messages and
 counting ...

This is because nothing definitive has been chosen yet, after 4+ years of
discussion (this started in April 2009, when the first D2009 details were
leaked).

It was a situation I hoped to avoid by going for two encodings per target
directly. Or at least try.



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Marco van de Voort
On Thu, Dec 26, 2013 at 11:53:38PM -0200, Marcos Douglas wrote:
  If you totally drop Delphi compatibility you can do whatever you want. But
  IMHO that is more something for the Graeme's and Martin (MSEGUI's) of the
  world, not Lazarus.
 
 Ok... but if FPC, on Windows, will be UTF-16 and Lazarus continues
 using UTF-8 what is the difference?

Well, currently, Lazarus has no other choice, since the unicode FPC is not
ready. (only up to classes level).

This alone means that the current uncertainty will persist at least several
years.

 This approach is not like Delphi. It has the RTL and VCL using the
 same encode... FPC RTL and LCL will continue fighting!  :(

I always considered the UTF8 choice of Lazarus a temporary solution till FPC
caught up with Delphi.

The current situation really worries me, since at work I invested in
FPC/Lazarus on the assumption that compatibility would increase, not
decrease.




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Graeme Geldenhuys
On 2013-12-27 09:16, Hans-Peter Diettrich wrote:
 But 
 when string becomes UTF-16, as in recent Delphi versions, Lazarus and 
 the LCL deserves heavy refactoring. That's the top discussion topic 
 right now.

Personally I think FPC and Lazarus should get rid of string
altogether! It should be a user definable type that can be defined per
project.

eg:
Projects could do the following

  type
  {$IFDEF WINDOWS}
    UnicodeString = UTF16String;
  {$ENDIF}
  {$IFDEF UNIX}
    UnicodeString = UTF8String;
  {$ENDIF}

    String = UnicodeString;
    // or for backwards compatibility with old projects:
    //   String = AnsiString;

or they could simply say they prefer to work with a specific encoding,
so use UTF16String or UTF8String directly. Thus no alias type needed.
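
Something close to this can already be approximated in a project without compiler changes, by aliasing a project-specific type instead of String itself. The following is a minimal sketch, not the proposal as such: TAppString and the program name are made-up names, and the existing UnicodeString type stands in for the hypothetical UTF16String.

```pascal
program AliasDemo;
{$mode objfpc}{$H+}

type
{$IFDEF WINDOWS}
  TAppString = UnicodeString;   // UTF-16, matches the W API
{$ELSE}
  TAppString = UTF8String;      // UTF-8 on the other targets
{$ENDIF}

var
  s: TAppString;
begin
  s := 'hello';
  // Shows which flavour this build picked: 2 for UTF-16, 1 for UTF-8.
  WriteLn(SizeOf(s[1]), ' byte(s) per code unit');
end.
```

The limitation is that the RTL and LCL routines keep their own string types, so conversions still happen wherever TAppString differs from what a routine expects.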

Also the very broken logic of UnicodeString = UTF16String should
disappear. Unicode Standard <> UTF-16!!!  That is just some sh*t
Microsoft came up with and Delphi followed suit!  It's just WRONG. The
Unicode Standard consists of multiple encodings, not just one.

But that's just my 2c worth - and arguments like these have been raised
before.


Regards,
  - Graeme -



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Jürgen Hestermann

On 2013-12-27 12:21, Marco van de Voort wrote:
 The current situation really worries me, since at work I invested in
 FPC/Lazarus in the assumption that compatibility would increase, not decrease.

I think that's the root cause of the discussion:

Some want to make FPC/Lazarus into a (possibly exact) clone of
Delphi (which means following every sh*t that is and will be put into this
product), and others (like me) hope for a more Pascal-like programming
environment which at least avoids future obscurities (and maybe even removes
existing ones) that crept into Pascal with Borland/Embarcadero.

The ease of use was the reason for the success of Turbo Pascal, but
meanwhile this goal has been put aside and it is becoming a more
C-like environment (with lots of ugly hacks..).

On the other hand I understand the demand for compatibility if there is a lot of
(Delphi) code that needs to be reused and cannot be changed easily.





Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Graeme Geldenhuys
On 2013-12-27 17:11, Jürgen Hestermann wrote:
 Some want to make FPC/Lazarus into a (possibly exact) clone of
 Delphi (which means to follow every sh*t that is and will be put into this 
 product)
 and others (like me) hope for a more Pascal like programming environment
 which at least avoids future (maybe even removes existing) obscurity
 crept into Pascal with Borland/Embarcadero.

Yeah, and I wonder what the plans are for the Free Pascal project, now
that Embarcadero is changing the compiler and language - specifically
for mobile platforms. EMBT is introducing even more sh*t - as you put
it. ;-)  Enhancements for Desktop or Web development are getting very
little attention from EMBT these days.


Regards,
  - Graeme -



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Graeme Geldenhuys
On 2013-12-27 13:42, Marco van de Voort wrote:
 
 Please read 4 years of discussion backlog. It is not all language, there is 
 something called
 libraries, and they are generally installed precompiled.

I don't know of a single ISV that ships precompiled *.ppu files for FPC
or Lazarus. They all include source code (yes even most Delphi ISV's do
this now) that can be compiled with a project.


Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

--
___
Lazarus mailing list
Lazarus@lists.lazarus.freepascal.org
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus


Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Reinier Olislagers
On 27/12/2013 18:34, Graeme Geldenhuys wrote:
 On 2013-12-27 13:42, Marco van de Voort wrote:

 Please read 4 years of discussion backlog. It is not all language, there is 
 something called
 libraries, and they are generally installed precompiled.
 
 I don't know of a single ISV that ships precompiled *.ppu files for FPC
 or Lazarus. They all include source code (yes even most Delphi ISV's do
 this now) that can be compiled with a project.

raudus, see e.g.
http://forum.lazarus.freepascal.org/index.php/topic,22059.0.html




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Marco van de Voort
On Fri, Dec 27, 2013 at 05:34:33PM +, Graeme Geldenhuys wrote:
  Please read 4 years of discussion backlog. It is not all language, there is 
  something called
  libraries, and they are generally installed precompiled.
 
 I don't know of a single ISV that ships precompiled *.ppu files for FPC
 or Lazarus.

What about FPC itself? Binary downloads outnumber source downloads 20-30 to 1.




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Marco van de Voort
On Fri, Dec 27, 2013 at 06:11:31PM +0100, Jürgen Hestermann wrote:
 
 I think that's the root cause of the discussion:
 
 Some want to make FPC/Lazarus into a (possibly exact) clone of
 Delphi (which means to follow every sh*t that is and will be put into this 
 product)

I always find it amusing that when you talk about compatibility, its
proponents are always described as brainless drones that can't think for
themselves.

Please come back after you fixed 700 bugs.



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Marcos Douglas
On Fri, Dec 27, 2013 at 3:38 PM, Graeme Geldenhuys
gra...@geldenhuys.co.uk wrote:
 On 2013-12-27 17:11, Jürgen Hestermann wrote:
 Some want to make FPC/Lazarus into a (possibly exact) clone of
 Delphi (which means to follow every sh*t that is and will be put into this 
 product)
 and others (like me) hope for a more Pascal like programming environment
 which at least avoids future (maybe even removes existing) obscurity
 crept into Pascal with Borland/Embarcadero.

 Yeah, and I wonder what is the plans for the Free Pascal project, now
 that Embarcadero is changing the compiler and language - specifically
 for mobile platforms. EMBT are introducing even more sh*t - as you put
 it. ;-)  Enhancements for Desktop or Web development are getting very
 little attention these days by EMBT.

Hmm...
Maybe Martin Schreiber saw a dark future and has already taken the
initiative to keep the true living legacy.  :-)

Marcos Douglas



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Marco van de Voort
On Fri, Dec 27, 2013 at 06:39:38PM -0200, Marcos Douglas wrote:
 If we continue to follow Delphi, means that we are always one step behind.

If we stop following Delphi, we are multiple steps behind.

Most of the antis only agree on being anti; they only have simplistic
top-down proposals and talk about an elusive own way.

Fact is that the extensions of FPC are used much less than the Delphi
compatibility aspect.



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Graeme Geldenhuys
On 2013-12-27 21:12, Marco van de Voort wrote:
 Fact is that the extensions of FPC are used much less than the Delphi
 compatibility aspect.

And would you mind sharing how you came to that conclusion? Can you
share the data on that research?

From my personal experience of using FPC since 2004-2005, there is NO
need to have support for two compilers (Delphi & FPC) in a single
project. Free Pascal is more than capable enough to stand on its own
feet. I say this as someone who has personally ported large Delphi and
Kylix projects not only to Free Pascal and Lazarus's LCL, but also to
fpGUI - a completely VCL incompatible UI toolkit. I have done this for
multiple projects, frameworks and GUI widgets. My conclusion after all
this... A conversion is NOT THAT HARD, and it's a great time to review
old code too. So it has a triple positive: moving to a real
cross-platform compiler, a real cross-platform toolkit (be that LCL or
fpGUI) and being able to review and improve old code and designs (your
second attempt at software is ALWAYS better than your first).

Many projects have moved away from Delphi in the last few years - mostly
to other languages like C# or Java. That requires a total rewrite -
which is infinitely more work than moving to Free Pascal.

Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Marco van de Voort
On Fri, Dec 27, 2013 at 10:43:16PM +, Graeme Geldenhuys wrote:
  Fact is that the extensions of FPC are used much less than the Delphi
  compatibility aspect.
 
 And would you mind sharing how you came to that conclusion? Can you
 share the data on that research?

Just look at e.g. the forums. All people are asking about Delphi packages.
 
 From my personal experience of using FPC since 2004-2005, there is NO
 need to have support for two compilers (Delphi  FPC) in a single
 project

Yes, but we all know you are a special case :-)

Point is if you make conversion harder PEOPLE WILL NOT EVEN TRY!



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Juha Manninen
On Friday, 27 December 2013, Marcos Douglas wrote:

 I think Lazarus team did not think the same.


 Lazarus team has not thought it much. The question is not acute yet and we
already have a working Unicode system in LCL. Thinking of it now when


Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Graeme Geldenhuys
On 2013-12-27 22:49, Marco van de Voort wrote:
 
 Just look on e.g. the forums. All people are asking about Delphi packages.

And once those Delphi packages are ported to Free Pascal, nobody needs
Delphi any more. ;-)


 Point is if you make conversion harder PEOPLE WILL NOT EVEN TRY!

Converting to Free Pascal and Lazarus will *always* be easier than
rewriting everything in C# or Java - no matter how many
incompatibilities Free Pascal might have with Delphi. The language still
stays a lot more similar than the alternative. Yet, looking at the
current employment market, it seems most companies opted to rewrite
their Delphi projects in C# and Java - so they took the even harder
route! Why? Probably due to more innovation happening in those other
languages.


Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Juha Manninen
The Lazarus team has not thought about it much. The question is not acute yet and we
already have a working Unicode system in LCL. Thinking about it now, while FPC's
behavior is not decided, would be a waste of time.
My comment about UTF-8 was based on poor knowledge and maybe wishful
thinking.
Some other Lazarus developers have more knowledge of the issue but they are
wise enough not to enter this discussion.

Let's wait till the FPC developers get their work done and then let's discuss
LCL.
If you need a UTF-8 solution right now, it is possible with LCL.
It is also good to read the old mail threads because the same things get
repeated again and again.

Juha

P.S.
Learning to type with iPad. Sent an unfinished text.


Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Hans-Peter Diettrich

Marco van de Voort wrote:

On Fri, Dec 27, 2013 at 06:39:38PM -0200, Marcos Douglas wrote:

If we continue to follow Delphi, means that we are always one step behind.


If we stop following delphi, we are multiple steps behind.


FPC/Lazarus always was in front of Delphi. It has had Unicode support
(UTF-8) for a long time, and supports multiple widgetsets, platforms and
machines. Even the help system is better and more user friendly, as
are the editing helpers.


The compatibility problems are self-made, IMO. Compatibility with all
versions of a continuously moving target is near impossible, or at least
not feasible with the available manpower.


Now that Delphi introduced something really useful (encoded strings and 
automatic conversion), the new Unicode support should be integrated into 
FPC and Lazarus. When this works for the AnsiString version (UTF-8), 
somewhat compatible with D7, a UnicodeString (UTF-16) version can be 
considered, compatible with some newer Delphi version.


DoDi




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Marco van de Voort
On Fri, Dec 27, 2013 at 11:07:02PM +, Graeme Geldenhuys wrote:
  Just look on e.g. the forums. All people are asking about Delphi packages.
 
 And once those Delphi packages are ported to Free Pascal, nobody needs
 Delphi any more. ;-)

Somehow there is still more Delphi use than Lazarus, so I'll bounce back
the statistics request.
 
  Point is if you make conversion harder PEOPLE WILL NOT EVEN TRY!
 
 Converting to Free Pascal and Lazarus will *always* be easier that
 rewriting everything in C# or Java - no matter how many
 incompatibilities Free Pascal might have with Delphi. The language still
 stays a lot more similar than the alternative. Yet, looking at the
 current employment market, it seems most companies opted to rewrite
 there Delphi projects in C# and Java - so they took the even harder
 route! Why? Probably due to more innovation happing in those other
 languages.

Not really, it is simply vendors pushing it, and bundling it with their
products, giving the SDKs preferential treatment. Therefore I don't really
think it is sane to compare FPC (or even Delphi) to C# and Java.

Worse, one of the motivators, web frameworks, often needs server-side support,
and getting into the ISPs' portfolios is hard, especially as a native language.

But more importantly, whichever way you turn it, there are still way
more new users coming from Delphi than from other sources (and then I'm
already generous, since those other sources also include other Pascals).

And I see the numbers of _knowledgeable_ users from old Delphi decreasing,
and the more able people also working with /new/ Delphi (and e.g. testing
Lazarus to see if they can get a subset running on some other target).

This trend will only increase, and follows the same pattern as the TP to
Delphi migration years ago. A definitive switch is still some time
off, since most OSS projects and vendors still support D7 (support before
that is getting scarce).

But that is now. Decisions for FPC/Lazarus NG will only come to fruition in
2-3 years.  I think the D7 installed base will erode, but only very slowly. 
But the /activities/ of that installed base, and their investments in new
(D7 level) code will erode quicker.  Again this prediction is based on the
same pattern as with TP.



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-27 Thread Marco van de Voort
On Sat, Dec 28, 2013 at 12:42:22AM +0100, Hans-Peter Diettrich wrote:
  If we continue to follow Delphi, means that we are always one step behind.
  
  If we stop following delphi, we are multiple steps behind.
 
 FPC/Lazarus always was in front of Delphi. It had Unicode support 
 (UTF-8) since long, supports multiple widgetsets, platforms and 
 machines.

It did not. It was half a solution, like the well-known TNT components.

 Even the help system and is better and more user friendly, as 
 well as are the editing helpers.

There is some potential there, but the content is still subpar compared to Delphi,
especially the Lazarus/LCL part. Way subpar.
 
 The compatibility problems are selfmade, IMO. Compatibility with all 
 versions of a continuously moving target is near impossible, at least 
 not feasable with the available manpower.

I don't see this at all. Yes, it is hard. Yes, it will be at a distance. But
I don't see impossible. Also, major change is fairly rare.

But the Unicode change has been a done deal for 6 versions. This is not
about the bleeding edge, this is about planning steps that Embarcadero
brought to production nearly 5 years ago, and which affect many levels of
the code (more so than later additions, with the dotted change being
debatable).
 
 Now that Delphi introduced something really useful (encoded strings and 
 automatic conversion), the new Unicode support should be integrated into 
 FPC and Lazarus. When this works for the AnsiString version (UTF-8), 
 somewhat compatible with D7, a UnicodeString (UTF-16) version can be 
 considered, compatible with some newer Delphi version.

A utf8 ansistring version will by definition not be compatible.



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-26 Thread Sven Barth
On 26.12.2013 02:19, Hans-Peter Diettrich drdiettri...@aol.com wrote:

 Sven Barth wrote:


 If in 2.6.2 your three strings contain text of different encodings then
the resulting string might be garbage from the user's POV.
 In trunk the encoding is part of each string and if they differ then
each strings will be converted to the default string encoding (defined by a
global variable inside unit System) and thus the string might still be
valid.


 If so, this flaw should be fixed immediately. Delphi uses lossless
conversions, i.e. an up-cast to Unicode.

No it does not. If the variables you concatenate are AnsiString and the
variable or parameter you pass them to is AnsiString as well (AFAIK it even
needs to be RawByteString) then the strings are converted to the system
encoding before they are concatenated and passed. This is implemented
in a Delphi-compatible way in FPC.

 Such problems can be avoided by making RawByteString a compiler magic,
that enforces a Unciode conversion whenever AnsiStrings of a different
dynamic encoding have to be combined.

RawByteString is already as magical as it gets and is exactly what's on the
tin: a raw byte string. No automatic conversions ever. This is a type that
is needed for implementing String handling in the RTL, so overloading it with
another meaning will only result in problems.
If you want UTF-8 encoded strings then use UTF8String. Period.


 Furthermore the use of UTF-8 will allow for lossless conversions of
AnsiStrings of any encoding, with the result still being an AnsiString.
Here Delphi has the problem that a RawByteString result type requires a
conversion of an intermediate Unicode string (UTF-16) into an
AnsiString(CP_ACP), with possible losses. This is not required when FPC
treats UTF-8 as a fully supported encoding, in addition to CP_ACP - it also
were a strong argument for using UTF-8 for UnicodeString, *instead* of
UTF-16. The related functions already exist in the FPC libraries, they only
have to take precedence over CP_ACP (if different). Then additional
UTF-8/16 conversions are required only on Windows, when calling external
(API...) functions which expect/return WideStrings.

UnicodeString is *defined* as 2-Byte character reference counted string.
There will be no change there.

Regards,
Sven


Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-26 Thread Jürgen Hestermann

On 2013-12-25 19:50, Marco van de Voort wrote:
 In short, I don't think fighting the native encoding of an target is worth
 the shallow appeal of the one encoding rules all principle. That is mostly
 pushed by people that don't even use windows, and thus won't feel the pain.

This is not true!
I am programming for Windows exclusively (currently) and still want UTF8
everywhere.
UTF8 is the most useful encoding *and* Lazarus uses it *and* it is used in many
other situations.
And I don't want to think about encodings all the time.
I want a single string type in my programs.


Therefore I now need to write my own Windows interface unit because FPC does not
provide Unicode file API functions.
If the Windows unit of FPC migrates to Unicode soon I still cannot use it
because it would use the foolish UTF16 string type which I still need to
convert to UTF8.
What an incredibly short-sighted decision.
The unique opportunity to establish a single Unicode string type encoding (UTF8)
within the whole programming environment Free Pascal/Lazarus has been missed.



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-26 Thread Hans-Peter Diettrich

Sven Barth wrote:
On 26.12.2013 02:19, Hans-Peter Diettrich drdiettri...@aol.com wrote:

  Sven Barth wrote:
 
 
  If in 2.6.2 your three strings contain text of different encodings 
then the resulting string might be garbage from the user's POV.
  In trunk the encoding is part of each string and if they differ then 
each strings will be converted to the default string encoding (defined 
by a global variable inside unit System) and thus the string might still 
be valid.

 
 
  If so, this flaw should be fixed immediately. Delphi uses lossless 
conversions, i.e. an up-cast to Unicode.


No it does not. If the variables you concatenate are AnsiString and the 
variable or parameter you pass them to is AnsiString as well (AFAIK it 
even needs to be RawByteString) then the strings are converted to the 
system encoding before they are concatenated and passed. This is 
implemented Delphi compatible in FPC.


Please specify AnsiString, of which encoding?

When I concat an AnsiString and an UTF8String and assign it to an OEMString
  o := a + u;
then I get these warnings in XE:

[DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from 'AnsiString' to 'string'
[DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from 'UTF8String' to 'string'
[DCC Warning] ConcTest.dpr(20): W1058 Implicit string cast with potential data loss from 'string' to 'OEMString'


I cannot see the system codepage used here.


What I want to point out are the string function overloads, where Delphi 
supplies only string (UTF-16) and RawByteString arguments, and 
AnsiString(CP_ACP) in unit AnsiStrings. FPC could add UTF8String 
overloads and use these when dealing with AnsiStrings of an encoding 
different from CP_ACP.


  Such problems can be avoided by making RawByteString a compiler 
magic, that enforces a Unciode conversion whenever AnsiStrings of a 
different dynamic encoding have to be combined.


RawByteString is already as magical as it gets and exactly is what's on 
the tin: a raw byte string. No automatic conversions ever. This is a 
type that is needed for implementing String handling in RTL so 
overloading it with another meaning will only result on problems.

If you want UTF-8 encoded strings then use UTF8String. Period.


Please understand that the use of RawByteString in Delphi can lead to 
strings with wrong encoding. This type should not be available for 
declaring variables, only for parameters and function results. This 
restriction requires compiler magic.



UnicodeString is *defined* as 2-Byte character reference counted string. 
There will be no change there.


Sorry, I meant the generic string type.

DoDi




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-26 Thread Marco van de Voort
On Wed, Dec 25, 2013 at 10:43:24PM +0100, Jy V wrote:
 Sorry Marco,

No problem.
 
 On Wed, Dec 25, 2013 at 6:15 PM, Marco van de Voort mar...@stack.nl wrote:
 
  There is no utf8 on Windows. One can try to mess with the defaultcodepage,
  but that will probably only force a different kind of problems.
 
 
 I cannot let you answer alone and make you appear as the only knowledgeable
 reference for this important subject,
 it looks like defining default code page 65001 for Windows make it perfect
 fit to handle UTF-8
 http://msdn.microsoft.com/en-us/library/dd317756%28VS.85%29.aspx

As the others already said, that is mostly a target for conversions. (You
still need from/to utf8 conversion if you use utf8 documents, thus the
conversion routines support utf8.)

That is something else than the Windows APIs (and stuff like MSXML, ADO etc.)
supporting utf8.

utf8 proponents often then say "then just set utf8 as default encoding", but
some limited experiments from me created more problems than they solved. IOW
it is always suggested as the solution, but few if any people did anything
substantial with it. I know the command shell can crash if you chcp 65001.

It is mostly a suggestion to end debate and get their way.

I'm going to study the links posted in this thread (from the Microsoft guy)
to get more info myself too.



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-26 Thread Juha Manninen
It happened again. The word Unicode was mentioned and the result is
an endless debate about how it should be done. Now > 100 messages and
counting ...

I personally don't care much what the default encoding will be, but I
wonder how easy it will be to use UTF-8 for my employer's code.
The situation with FPC will be better than with Delphi because FPC
does not convert automatically to default encoding ALWAYS. It only
converts when the conversion is needed.
For example TStringList can be used for UTF8Strings and it does not
trigger automatic conversion.
Isn't it so? Please correct me if I still got it wrong.

It means UTF-8 with FPC will be easier than UTF-8 with Delphi, even if
UTF-16 was the default.

Juha



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-26 Thread Marco van de Voort
On Thu, Dec 26, 2013 at 12:28:54AM +0100, Hans-Peter Diettrich wrote:
  is dangerous if they are not all the same encoding. If there is any
  mismatch, it will be converted down to default encoding.
 
 Then the implementation is wrong. 

Wrong according to you. Not wrong according to defined Microsoft
applications.

This way of top-down thinking will turn FPC into a Java, where you are
lugging along your own platform-within-a-platform everywhere.

IMHO this is not desirable.

  There is no utf8 on Windows.
 
 Yep, that's why the Unicode (W) API should be used. No problem with
 UTF-8 strings there :-)

If you totally drop Delphi compatibility you can do whatever you want. But
IMHO that is more something for the Graeme's and Martin (MSEGUI's) of the
world, not Lazarus.



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-26 Thread Sven Barth
On 26.12.2013 12:30, Hans-Peter Diettrich drdiettri...@aol.com wrote:

 Sven Barth wrote:

 On 26.12.2013 02:19, Hans-Peter Diettrich drdiettri...@aol.com wrote:
 Please specify AnsiString, of which encoding?

 When I concat an AnsiString and an UTF8String and assign it to an
OEMString
   o := a + u;
 then I get these warnings in XE:

 [DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from
'AnsiString' to 'string'
 [DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from
'UTF8String' to 'string'
 [DCC Warning] ConcTest.dpr(20): W1058 Implicit string cast with potential
data loss from 'string' to 'OEMString'

 I cannot see the system codepage used here.

Try to make o of type RawByteString. And maybe also use more than two
strings.



 What I want to point out are the string function overloads, where Delphi
supplies only string (UTF-16) and RawByteString arguments, and
AnsiString(CP_ACP) in unit AnsiStrings. FPC could add UTF8String overloads
and use these when dealing with AnsiStrings of an encoding different from
CP_ACP.

That was already discussed some time ago between devs and was deemed not
usable by Jonas. I'll try to find his mail with his explanation.

Regards,
Sven


Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-26 Thread Sven Barth

On 26.12.2013 17:02, Sven Barth wrote:

On 26.12.2013 12:30, Hans-Peter Diettrich drdiettri...@aol.com wrote:

  Sven Barth wrote:

  On 26.12.2013 02:19, Hans-Peter Diettrich drdiettri...@aol.com wrote:
  Please specify AnsiString, of which encoding?
 
  When I concat an AnsiString and an UTF8String and assign it to an
OEMString
o := a + u;
  then I get these warnings in XE:
 
  [DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from
'AnsiString' to 'string'
  [DCC Warning] ConcTest.dpr(20): W1057 Implicit string cast from
'UTF8String' to 'string'
  [DCC Warning] ConcTest.dpr(20): W1058 Implicit string cast with
potential data loss from 'string' to 'OEMString'
 
  I cannot see the system codepage used here.

Try to make o of type RawByteString. And maybe also use more than two
strings.



Ok, I didn't remember the situation correctly. When searching for Jonas' 
mail I mentioned below I also found this which I was referring to:


=== quote of Jonas begin ===

var
  mypath: utf8string;
  sr: tsearchrec;
begin
  { assign some utf-8 string to mypath }
  if findfirst(mypath+allfilesmask,faAnyFile,sr)=0 then
    begin
      ...
    end;
end

If the DefaultSystemCodePage is something different from UTF-8, the
result of mypath+allfilesmask will be downgraded to
DefaultSystemCodePage because the string constant allfilesmask is
encoded using that code page. This is because the rule is followed that
concatenating ansistrings with different encodings results in an
ansistring with the encoding of the destination ansistring, and the
destination ansistring is a rawbytestring here (the first argument of
findfirst), in which case the ansi encoding is used.


=== quote of Jonas end ===
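
A small probe can make the rule Jonas describes visible. This is only a
sketch, assuming FPC 3.0 or later, where StringCodePage and
DefaultSystemCodePage are available in unit System; the program and
variable names are made up for illustration, and the numbers printed
depend on the platform and compiler settings:

```pascal
program ConcatProbe;
{$mode objfpc}{$H+}
var
  u: UTF8String;
  a: AnsiString;
  r: RawByteString;
begin
  u := 'abc';   // declared code page of the variable: UTF-8
  a := 'def';   // declared code page of the variable: CP_ACP (system default)
  // The destination is a RawByteString, which is the case Jonas describes.
  r := u + a;
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
  WriteLn('u: ', StringCodePage(u));
  WriteLn('a: ', StringCodePage(a));
  WriteLn('r: ', StringCodePage(r));
end.
```

If r reports the system code page rather than 65001 on a system whose
DefaultSystemCodePage is not UTF-8, the downgrade described in the quote
has taken place.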


 
 
  What I want to point out are the string function overloads, where
Delphi supplies only string (UTF-16) and RawByteString arguments, and
AnsiString(CP_ACP) in unit AnsiStrings. FPC could add UTF8String
overloads and use these when dealing with AnsiStrings of an encoding
different from CP_ACP.

That was already discussed some time ago between devs and was deemed not
useable by Jonas. I'll try to find his mail with his explanation.


=== quote of Jonas begin ===

Adding explicitly named UTF-8 versions of routines with constant or value
rawbytestring arguments (FindFirstUTF8 etc) with UTF8String arguments and
that internally simply call through to the rawbytestring versions could
perhaps be useful.  Interestingly, Lazarus users probably won't suffer
from this particular problem as they already use such routines from the
LCL, and those routines can simply be adapted by simply removing all the
UTF8ToSys calls (they will keep working in their current state though,
they simply keep suffering from the same data loss issues they had
before).

=== quote of Jonas end ===

Please note that Jonas states that differently named overloads would be
needed. Equally named UTF8String overloads won't necessarily work correctly.
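
A minimal sketch of such a differently named wrapper, assuming FPC 3.0 or
later. FindFirstUTF8 is the name Jonas uses as an example; the body shown
here is hypothetical and simply calls through to the RawByteString version
of FindFirst from SysUtils, as he describes:

```pascal
program NamedUtf8Wrapper;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Hypothetical wrapper. The parameter is declared as UTF8String, so under
// the concatenation rule quoted above an expression built at the call site
// has a UTF-8 destination instead of a RawByteString one.
function FindFirstUTF8(const Path: UTF8String; Attr: LongInt;
  out Rslt: TSearchRec): LongInt;
begin
  // Call through to the existing RawByteString version.
  Result := FindFirst(Path, Attr, Rslt);
end;

var
  sr: TSearchRec;
  dir: UTF8String;
begin
  dir := '.' + DirectorySeparator;
  if FindFirstUTF8(dir + AllFilesMask, faAnyFile, sr) = 0 then
  begin
    WriteLn(sr.Name);
    FindClose(sr);
  end;
end.
```

The distinct name avoids relying on overload resolution between
UTF8String and RawByteString parameters, which is the part said above
not to work reliably.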


Regards,
Sven



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-26 Thread Marcos Douglas
On Thu, Dec 26, 2013 at 1:04 PM, Marco van de Voort mar...@stack.nl wrote:
 On Thu, Dec 26, 2013 at 12:28:54AM +0100, Hans-Peter Diettrich wrote:
  is dangerous if they are not all the same encoding. If there is any
  mismatch, it will be converted down to default encoding.

 Then the implementation is wrong.

 Wrong according to you. Not wrong according to defined Microsoft
 applications.

 This way of top-down thinking will turn FPC into a Java, where you are
 lugging along an own platform-within-an-platform everywhere.

 IMHO this is not desirable.

  There is no utf8 on Windows.

 Yep, that's why the Unicode (W) API should be used. No problem with
 UTF-8 strings there :-)

 If you totally drop Delphi compatibility you can do whatever you want. But
 IMHO that is more something for the Graeme's and Martin (MSEGUI's) of the
 world, not Lazarus.

Ok... but if FPC, on Windows, will be UTF-16 and Lazarus continues
using UTF-8, what is the difference?
This approach is not like Delphi. It has the RTL and VCL using the
same encoding... FPC RTL and LCL will continue fighting!  :(

Marcos Douglas



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-25 Thread Jürgen Hestermann

On 2013-12-25 01:36, Hans-Peter Diettrich wrote:
 Whenever the encoding matters, most users and applications are best off
 with their regional Ansi encoding - all used characters are single bytes.

You forget that using ANSI API functions on Windows not only has the drawback
that you cannot access all files (those with Unicode characters in their names)
but also that there is the limit of 255 characters for the path length
(while Unicode API functions allow up to 32k characters).
So you run into problems in 2 cases:

1.) if strings (file names) contain non-ANSI Unicode characters
2.) if paths are longer than 255 characters

Do you really advise people nowadays to restrict their programs so far by using
ANSI API functions?
I wouldn't. I was always wondering why so many programs fail with these 2
limitations on Windows after an alternative has been available for such a long time.
Now you want to extend this time by yet another generation of programmers.
That's not good. Hopefully not too many programmers follow this road...

 UTF-16 extends the range of languages whose characters can be assumed to have
 a fixed size,

That's not true.
You still cannot rely on a fixed number of bytes per character in UTF16
either.
Also, UTF8 would not have any BOM problem while UTF16 and UTF32 have.
So UTF16 has all drawbacks of all encodings but no benefit (except that this
awful decision is used by Windows).
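
The point that UTF-16 is not fixed-width can be checked with a short
program. This is a sketch, assuming an FPC version with UnicodeString
support; the program name and the character chosen are only examples:

```pascal
program Utf16Width;
{$mode objfpc}{$H+}
var
  s: UnicodeString;
  u8: UTF8String;
begin
  // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the Basic Multilingual
  // Plane, so UTF-16 encodes it as a surrogate pair: two 16-bit code units.
  s := WideChar($D834);
  s := s + WideChar($DD1E);
  // Length counts code units, not characters.
  WriteLn('UTF-16 code units: ', Length(s));
  // The same single character takes four bytes in UTF-8.
  u8 := UTF8Encode(s);
  WriteLn('UTF-8 bytes: ', Length(u8));
end.
```

Code that indexes a UnicodeString per element therefore has the same
"one character may span several elements" issue as code that indexes a
UTF-8 string per byte; it just shows up less often.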




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-25 Thread Graeme Geldenhuys
On 2013-12-24 17:13, Jürgen Hestermann wrote:
 All units used should use the same string encoding IMO.
 But which?

UTF-8 of course!  It's the newest Unicode encoding that overcomes all
problems found in other encodings. It is also the only Unicode encoding
that is backwards compatible with ASCII - hence the W3C and the rest of
the Internet etc standardised on it. It is also future proof and can
(again) be extended to full (4 byte range) or to using 5 or 6 byte code
points [1]. Performance wise, it is also NOT any slower than any of the
other Unicode encodings.

Probably the only reason UTF-16 is still being used is because of
Windows - which used to use UCS2, and moving to UTF-16 was easier at the
time (and I don't think UTF-8 existed at that point).



[1] A couple years back they limited the range of UTF-8 so that it stays
compatible for now with the limited range of UTF-16. But the UTF-8
encoding can actually go all the way to 6 bytes per code point, which is
an absolutely massive range.


Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-25 Thread Graeme Geldenhuys
On 2013-12-25 10:05, Graeme Geldenhuys wrote:
 But which?
 
 UTF-8 of course!  It's the newest Unicode encoding that overcomes all
 problems found in other encodings.


This guy explains it very well.

  https://www.youtube.com/watch?v=MijmeoH9LT4


Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-25 Thread Graeme Geldenhuys
On 2013-12-25 10:03, Jürgen Hestermann wrote:
 So UTF16 has all drawbacks of all encodings but no benefit (except
 that this awful decision is used by Windows).

+1


Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-25 Thread Hans-Peter Diettrich

Jürgen Hestermann wrote:

On 2013-12-25 01:36, Hans-Peter Diettrich wrote:
  Whenever the encoding matters, most users and applications are best off
  with their regional Ansi encoding - all used characters are single
  bytes.


You forget that using ANSI API functions on Windows not only has the
drawback that you cannot access all files (which have unicode characters in them)
but also that there is the limit of 255 characters for the path length
(while unicode API functions allow up to 32k characters).


For that purpose (file names) I vote for a dedicated string type that
matches the target platform requirements. Then the user does not have to look
at filenames on a per-character basis.



Do you realy advice people nowadays to restrict their programs so far by 
using ANSI API functions?


How many users have to use API functions, which are bound to a single 
platform? And which of these do not understand how to handle strings of 
whatever encoding?


DoDi




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-25 Thread Martin Schreiber
On Wednesday 25 December 2013 11:03:55 Jürgen Hestermann wrote:

 So UTF16 has all drawbacks of all encodings but no benefit
 (except that this awfull decision is used by Windows).

This is not true. Every time someone claims this nonsense I need to comment,
but I will not argue again. ;-)

Martin



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-25 Thread Marcos Douglas
On Tue, Dec 24, 2013 at 7:08 PM, Sven Barth pascaldra...@googlemail.com wrote:
 On 24.12.2013 15:34, Marcos Douglas m...@delfire.net wrote:


 Sorry if I say something crazy, but what do you think to use UTF-16 on
 {mode delphi} and UTF-8 in {mode fpc}?

 That is already the case with mode delphiunicode. But the big problem are
 classes and their inheritance. Take TStringList for example. Let's assume
 it's declared with String=AnsiString and you override it in a unit with
 String=UnicodeString then you'll get problems with overloads/overrides,
 because UnicodeString <> AnsiString.

Hmm, you're right. Understood.


 The mode concept is all good and well, but here it breaks down... :(

So, if {mode} continues to be the way, I think it should be applied at the
platform level, not per unit.
Even if the programmer can change it, he would change it for all code.
On second thought, this should be set at the compiler level, not the
source level.

Regards,
Marcos Douglas



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-25 Thread Marco van de Voort
On Tue, Dec 24, 2013 at 12:33:41PM -0200, Marcos Douglas wrote:
  But the prime point is that IMHO an utf8 Windows is insane, and it should be
  possible to port modern Delphi VCL apps at least to Windows. Preferably to
  all.
 
 Sorry if I say something crazy, but what do you think to use UTF-16 on
 {mode delphi} and UTF-8 in {mode fpc}?

This is not possible. The precompiled files remain the same.



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-25 Thread Marco van de Voort
On Wed, Dec 25, 2013 at 09:57:05AM -0200, Marcos Douglas wrote:
 
  The mode concept is all good and well, but here it breaks down... :(
 
 So, if the {mode} continue to be a way, I think it should be used on
 platform level, not per unit level.

That was the original proposal from me. Add encoding to the target. (so
i386-linux-utf8) and make a distro per target. Call them appropriately.

Encoding wise there are three options:

- ansi
- utf8
- utf16

but not all options are relevant for all targets. E.g. most *nix are utf8,
so it wouldn't make sense to make an ansi port. Windows does not support
utf8, so only ansi and utf16 would make sense.

Since that typically means two per target, it was suggested to combine them
using the dotted unit functionality.

Keeping them in the same distribution at least gives hope of sharing
encoding-agnostic units, but that requires compiler extensions that nobody
has started on. (I personally don't see the benefit in this.)

 Even if the programmer can to change this, he will change in all code.
 Thinking better, this is to be used on compiler level, not source
 level.

Something like that, though not at the compiler level but via the unit
directory on a per-project basis. Or via a dotted unit prefix in the dotted
alternative.



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-25 Thread Marco van de Voort
On Tue, Dec 24, 2013 at 12:22:41PM -0200, Marcos Douglas wrote:
  IMO the biggest group are old fashioned Delphi (D7) users, which want their
  existing Ansi/VCL code base supported *without* complications and
  incompatibilities introduced by the newer Delphi versions. The subject of
  this thread clearly indicates that UTF-8 is *not* a solution for this group
  of users.
 
 I started this thread. My problem isn't to use UTF-8 on Windows... my
 problem is use different encodings on the same code, ie, RTL and LCL.

Yes. But the choice of UTF8, and the legacy concerns that come with it, are
a matter for Lazarus, and Lazarus alone.
 
 Use functions, always, to convert string between RTL and LCL and
 vice-versa IHMO is wrong because the final code is confusing. In a
 huge application you still need to think here is UTF-8 or
 ANSI/UTF-16?

There are many scenarios up in the sky, and nothing is 100% certain, but it
would at least be significantly better. It is already significantly better
in trunk.

The only problem on Windows is that you must only pass a string with a
clearly defined encoding to an RTL function.

So

 assignfile(f,s+s2+s3);  

is dangerous if the operands do not all have the same encoding. If there is
any mismatch, the result is converted down to the default encoding.

The behaviour is defined, but somewhat special.
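
To make "converted down" concrete, here is a minimal sketch in Python rather than Pascal. It is only an illustration: Python's codecs stand in for the RTL's code page conversion, cp1252 is an assumed default ANSI code page, and `convert_down` is a hypothetical helper, not an FPC routine.

```python
# Illustration only: cp1252 is an assumed Windows default ("ANSI") code page.
DEFAULT_CODEPAGE = "cp1252"


def convert_down(text: str, codepage: str = DEFAULT_CODEPAGE) -> bytes:
    """Convert text to a 1-byte code page, replacing unmappable characters."""
    return text.encode(codepage, errors="replace")


directory = "C:\\data\\"  # every character exists in cp1252
name = "отчёт.txt"        # Cyrillic: fine in UTF-8, not representable in cp1252

# The combined path is forced into the default code page, as in the
# mismatch case described above.
converted = convert_down(directory + name)
readable = converted.decode(DEFAULT_CODEPAGE)
print(readable)  # C:\data\?????.txt
```

The five Cyrillic letters come back as question marks, so the resulting name no longer refers to the intended file.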

  That's my conclusion as well. But is that new audience worth to abandon the
  entire existing Lazarus audience?
 
 Of course nobody will abandon the entire existing Lazarus audience. If
 the RTL will be UTF-16, UTF-32, whatever the Lazarus will continues --
 I think -- working using UTF-8.

There is no utf8 on Windows. One can try to mess with the default code page,
but that will probably only cause a different kind of problem.

On Windows there is only ansi or utf16, or handling it manually.



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-25 Thread Marcos Douglas
On Wed, Dec 25, 2013 at 3:08 PM, Marco van de Voort mar...@stack.nl wrote:
 On Wed, Dec 25, 2013 at 09:57:05AM -0200, Marcos Douglas wrote:
 
  The mode concept is all good and well, but here it breaks down... :(

 So, if the {mode} continue to be a way, I think it should be used on
 platform level, not per unit level.

 That was the original proposal from me. Add encoding to the target. (so
 i386-linux-utf8) and make a distro per target. Call them appropriately.

 Encoding wise there are three options:

 - ansi
 - utf8
 - utf16

 but not all options are relevant for all targets. E.g. most *nix are utf8,
 so it wouldn't make sense to make an ansi port. Windows does not support
 utf8, so only ansi and utf16 would make sense.

Makes sense.

 Since that means typically two per target, it was suggested to combine this
 using dotted unit functionality.

I did not understand this... dotted unit functionality?

 Keeping it in the same distribution at least gives hope of keeping encoding
 agnostic units shared, but that required compiler extensions nobody started.
 (I personally don't see the benefit in this)

... you mean 3 compiled units (ansi, utf8 and utf16) using dotted unit
names functionality?

 Even if the programmer can to change this, he will change in all code.
 Thinking better, this is to be used on compiler level, not source
 level.

 Something like that, but not compiler, but unit directory on a per project
 basis. Or dotted unit prefix in the dotted alternative.

Maybe I did not understand:
Using directories or per-project settings we have the same problem as with
the {mode} directive, i.e. TStringList could be compiled using UTF-16
by default; the programmer inherits from this class and compiles his own
directory or project using UTF-8... and something will break.

Marcos Douglas



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-25 Thread Marcos Douglas
On Wed, Dec 25, 2013 at 3:15 PM, Marco van de Voort mar...@stack.nl wrote:
 On Tue, Dec 24, 2013 at 12:22:41PM -0200, Marcos Douglas wrote:
  IMO the biggest group are old fashioned Delphi (D7) users, which want their
  existing Ansi/VCL code base supported *without* complications and
  incompatibilities introduced by the newer Delphi versions. The subject of
  this thread clearly indicates that UTF-8 is *not* a solution for this group
  of users.

 I started this thread. My problem isn't to use UTF-8 on Windows... my
 problem is use different encodings on the same code, ie, RTL and LCL.

 Yes. But the selection of UTF8, and the legacy concerns with that are for
 Lazarus, and lazarus alone.

 Use functions, always, to convert string between RTL and LCL and
 vice-versa IHMO is wrong because the final code is confusing. In a
 huge application you still need to think here is UTF-8 or
 ANSI/UTF-16?

 There are many scenarios up in the sky, and nothing is 100% certain, but it
 would at least be significantly better. It is already significantly better
 in trunk.

When you say it is better in trunk, is that only in the FPC context, or are
there improvements for Lazarus users too?

 The only problem on Windows is that you must only pass a string with a very
 clear encoding to a RTL function.

 so

  assignfile(f,s+s2+s3);

 is dangerous if they are not all the same encoding. If there is any
 mismatch, it will be converted down to default encoding.

Yes, but where is the difference between 2.6.2 and trunk in that case?

 It is defined, but somewhat special.

  That's my conclusion as well. But is that new audience worth to abandon the
  entire existing Lazarus audience?

 Of course nobody will abandon the entire existing Lazarus audience. If
 the RTL will be UTF-16, UTF-32, whatever the Lazarus will continues --
 I think -- working using UTF-8.

 There is no utf8 on Windows. One can try to mess with the defaultcodepage,
 but that will probably only force a different kind of problems.

 On Windows there is only ansi or utf16, or keeping it manual.

You're right.
But if we imagine a perfect world in which FPC and Lazarus use the same
encoding -- it doesn't matter whether UTF-8 or UTF-16 -- everything would
work. Do you agree?
So, if the chosen encoding were UTF-8 for everything, the RTL would only
need to convert strings -- on Windows -- before calling API functions. The
same would apply on Linux (or any platform that uses UTF-8) if the chosen
encoding were UTF-16.

Is my thinking correct?


Marcos Douglas



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-25 Thread Marco van de Voort
On Wed, Dec 25, 2013 at 04:34:40PM -0200, Marcos Douglas wrote:
  There are many scenarios up in the sky, and nothing is 100% certain, but it
  would at least be significantly better. It is already significantly better
  in trunk.
 
 When you say that is better in trunk is only on FPC context or there
 are improvements for Lazarus users too?

Functions like mkdir/findfirst/assign etc are encoding safe. The only
exception is the concatenation problem below.
 
  The only problem on Windows is that you must only pass a string with a very
  clear encoding to a RTL function.
 
  so
 
   assignfile(f,s+s2+s3);
 
  is dangerous if they are not all the same encoding. If there is any
  mismatch, it will be converted down to default encoding.

 Yes but where is the difference between 2.6.2 and trunk, in that case?

1. UTF16 works fine 
2. You can actually pass utf8 to functions, as long as you are careful with
concatenations.

  There is no utf8 on Windows. One can try to mess with the defaultcodepage,
  but that will probably only force a different kind of problems.
 
  On Windows there is only ansi or utf16, or keeping it manual.
 
 You're right.
 But if we imagine a perfect world that FPC and Lazarus use the same
 encode -- doesn't matter if is UTF-8 or UTF-16 -- everything would
 work. Do you agree?

The point (as shown by the above problem) is that the choice must align
with what the OS offers, because otherwise you are an island yet again.

E.g. the Windows unit (also in trunk) will only work in ansi or utf16.

 So, if the encode chosen was UTF-8 for all, RTL only needs to decode
 strings -- on Windows -- before to call API functions.  

The Windows unit is not wrapped, and the only unicode available on Windows
is UTF16. And the windows target converts mixes of 1-byte strings (say
ansi+utf8) to the default encoding (ansi).

One can attempt to fix that by messing with Windows encoding settings, but
the effect of doing that for large applications is unknown.

Another possibility is to use only our own Unicode routines (linking the
tables into each binary). But that, again, could lead to strange artefacts.

 The same on Linux (whatever platforms that uses UTF-8) if the encode chosen
 was UTF-16.

Yes. I don't think that is a good default choice either. But at least it has
some merits for modern Delphi compat.
 
 My thinking is correct?

Oversimplified. The RTL will never abstract everything, and there is the
issue of the default OS encoding.

In short, I don't think fighting the native encoding of a target is worth
the shallow appeal of the "one encoding rules all" principle. That is mostly
pushed by people that don't even use Windows, and thus won't feel the pain.



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-25 Thread Sven Barth
On 25.12.2013 19:19, Marcos Douglas m...@delfire.net wrote:
  Since that means typically two per target, it was suggested to combine this
  using dotted unit functionality.

 I did not understand this... dotted unit functionality?

Delphi XE 2 with the introduction of FireMonkey switched from normal unit
names for the RTL to dotted ones (aka unit namespaces). E.g. SysUtils and
Classes became System.SysUtils and System.Classes respectively, the Windows
units moved into a Windows namespace (AFAIK) and Forms became VCL.Forms.
Now the XE2 IDE and command line compiler also provide the possibility to
specify multiple default namespaces (e.g. a VCL application would have
System and VCL) to ensure backwards compatibility.

Now the idea is to have dotted units in FPC where String=UnicodeString and
the legacy non-dotted ones where String=AnsiString. That only leaves out
Delphi 2009 and XE compatibility (which uses non-dotted UnicodeString
units), but that's a small price IMHO. Also there are a few further
problems that need to be tackled with that approach.

Regards,
Sven


Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-25 Thread Jy V
Sorry Marco,

On Wed, Dec 25, 2013 at 6:15 PM, Marco van de Voort mar...@stack.nl wrote:

 There is no utf8 on Windows. One can try to mess with the defaultcodepage,
 but that will probably only force a different kind of problems.


I cannot let your answer stand alone, making you appear as the only
knowledgeable reference on this important subject.
It looks like defining default code page 65001 for Windows makes it a perfect
fit for handling UTF-8:
http://msdn.microsoft.com/en-us/library/dd317756%28VS.85%29.aspx

Jerome.


Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-25 Thread Sven Barth
On 25.12.2013 19:35, Marcos Douglas m...@delfire.net wrote:
  The only problem on Windows is that you must only pass a string with a very
  clear encoding to a RTL function.
 
  so
 
   assignfile(f,s+s2+s3);
 
  is dangerous if they are not all the same encoding. If there is any
  mismatch, it will be converted down to default encoding.

 Yes but where is the difference between 2.6.2 and trunk, in that case?

If in 2.6.2 your three strings contain text in different encodings, then the
resulting string might be garbage from the user's POV.
In trunk the encoding is part of each string, and if the encodings differ,
each string is converted to the default string encoding (defined by a
global variable inside unit System), so the result might still be valid.
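
The difference can be sketched roughly as follows. This is Python, not FPC code: `TaggedString` is a hypothetical stand-in for a byte string that carries its code page, and cp1252 is an assumed default encoding.

```python
from dataclasses import dataclass

DEFAULT_CODEPAGE = "cp1252"  # assumed default; a global setting in this sketch


@dataclass
class TaggedString:
    """Hypothetical stand-in for a byte string that knows its code page."""
    data: bytes
    codepage: str

    def __add__(self, other: "TaggedString") -> "TaggedString":
        if self.codepage == other.codepage:
            return TaggedString(self.data + other.data, self.codepage)
        # Encodings differ: convert both operands to the default code page.
        left = self.data.decode(self.codepage).encode(DEFAULT_CODEPAGE, "replace")
        right = other.data.decode(other.codepage).encode(DEFAULT_CODEPAGE, "replace")
        return TaggedString(left + right, DEFAULT_CODEPAGE)


ansi = "Größe".encode("cp1252")
utf8 = "-é.txt".encode("utf-8")

# Untagged (the 2.6.2-like case): bytes are glued together and later read
# as if they were all in one code page.
untagged = (ansi + utf8).decode("cp1252")

# Tagged (the trunk-like case): each operand is converted before joining.
tagged = TaggedString(ansi, "cp1252") + TaggedString(utf8, "utf-8")

print(untagged)                            # Größe-Ã©.txt
print(tagged.data.decode(tagged.codepage))  # Größe-é.txt
```

The tagged result is only valid because every character here exists in the assumed default code page; characters outside it would still be lost in the conversion.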

Regards,
Sven


Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-25 Thread Craig Peterson
On Dec 25, 2013, at 3:43 PM, Jy V jyv...@gmail.com wrote:
 I cannot let you answer alone and make you appear as the only knowledgeable 
 reference for this important subject,
 it looks like defining default code page 65001 for Windows make it perfect 
 fit to handle UTF-8
 http://msdn.microsoft.com/en-us/library/dd317756%28VS.85%29.aspx

Windows doesn't support using UTF-8 as the default system code page and never 
will. Michael Kaplan, from Microsoft, has talked about it a number of times in 
his blog. The original site is unfortunately offline, but it's available 
through the Internet Archive at 
https://web.archive.org/web/20120414160234/http://blogs.msdn.com/b/michkap/archive/2006/10/11/816996.aspx
https://web.archive.org/web/20110108050100/http://blogs.msdn.com/b/michkap/archive/2006/07/04/656051.aspx

The short answer is that all of the selectable ANSI codepages have at most 2 
bytes, and UTF-8 can have up to 4, which would require auditing/updating huge 
amounts of code.
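
The byte counts are easy to check. The sketch below uses Python's codecs purely as an illustration, with cp932 (the Japanese Windows code page) chosen as an example of a double-byte ANSI code page; the sample characters are arbitrary.

```python
samples = {
    "ASCII letter": "A",
    "Latin letter with diaeresis": "ü",
    "CJK ideograph": "日",
    "Emoji outside the BMP": "😀",
}

# UTF-8 needs between 1 and 4 bytes per character.
utf8_lengths = {label: len(ch.encode("utf-8")) for label, ch in samples.items()}
for label, size in utf8_lengths.items():
    print(f"{label}: {size} byte(s) in UTF-8")

# A double-byte code page such as cp932 never needs more than 2 bytes.
cp932_length = len("日".encode("cp932"))
print(f"CJK ideograph: {cp932_length} byte(s) in cp932")
```

Code written on the assumption that a character never takes more than two bytes would therefore mishandle the last two samples if the code page were UTF-8.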

-- 
Craig Peterson
Scooter Software




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-25 Thread Sven Barth
On 25.12.2013 22:44, Jy V jyv...@gmail.com wrote:

 Sorry Marco,

 On Wed, Dec 25, 2013 at 6:15 PM, Marco van de Voort mar...@stack.nl wrote:

  There is no utf8 on Windows. One can try to mess with the defaultcodepage,
  but that will probably only force a different kind of problems.

 I cannot let you answer alone and make you appear as the only knowledgeable
 reference for this important subject,
 it looks like defining default code page 65001 for Windows make it perfect
 fit to handle UTF-8
 http://msdn.microsoft.com/en-us/library/dd317756%28VS.85%29.aspx

The Windows API *A functions use the system's (or more precisely user's)
code page set through the selected locale to convert ANSI to Unicode. So
even if you could set the system's codepage to UTF-8 (which you can not)
you'd need the user to change his/her codepage to UTF-8 which is a definite
no-go.

Regards,
Sven


Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-25 Thread Marcos Douglas
On Wed, Dec 25, 2013 at 7:41 PM, Sven Barth pascaldra...@googlemail.com wrote:
 On 25.12.2013 19:19, Marcos Douglas m...@delfire.net wrote:
  Since that means typically two per target, it was suggested to combine
  this
  using dotted unit functionality.

 I did not understand this... dotted unit functionality?

 Delphi XE 2 with the introduction of FireMonkey switched from normal unit
 names for the RTL to dotted ones (aka unit namespaces). E.g. SysUtils and
 Classes became System.SysUtils and System.Classes respectively, the Windows
 units moved into a Windows namespace (AFAIK) and Forms became VCL.Forms.
 Now the XE2 IDE and command line compiler also provide the possibility to
 specify multiple default namespaces (e.g. a VCL application would have
 System and VCL) to ensure backwards compatibility.

Simple explanation, I understood, thank you.

So is the new Delphi namespace virtual (e.g. there is no VCL.Forms.pas
file, only a Forms.pas), or do they have two options, two files?

If it is virtual and can be changed on the compiler command line, it looks
like an idea I had (posted on fpc-list) about using namespaces to allow
two units with the same name in the same project. ;-)

 Now the idea is to have dotted units in FPC where String=UnicodeString and
 the legacy non-dotted ones where String=AnsiString. That only leaves out
 Delphi 2009 and XE compatibility (which uses non-dotted UnicodeString
 units), but that's a small price IMHO. Also there are a few further problems
 that need to be tackled with that approach.

I see.
Well, it seems that the way has already been decided and is in development.

Thanks again for the brief update on what has been implemented in trunk.

Regards,
Marcos Douglas



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-25 Thread Hans-Peter Diettrich

Marco van de Voort wrote:

 The only problem on Windows is that you must only pass a string with a very
 clear encoding to a RTL function.

 so

  assignfile(f,s+s2+s3);

 is dangerous if they are not all the same encoding. If there is any
 mismatch, it will be converted down to default encoding.

Then the implementation is wrong. All string conversion is done via UTF
(lossless), the result can be either UTF-8 (FPC) or UTF-16 (Delphi). The
final conversion depends on the target, i.e. the declaration of AssignFile.

 There is no utf8 on Windows.

Yep, that's why the Unicode (W) API should be used. No problem with
UTF-8 strings there :-)

DoDi





Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-25 Thread Hans-Peter Diettrich

Sven Barth wrote:

 If in 2.6.2 your three strings contain text of different encodings then
 the resulting string might be garbage from the user's POV.
 In trunk the encoding is part of each string and if they differ then
 each string will be converted to the default string encoding (defined
 by a global variable inside unit System) and thus the string might still
 be valid.


If so, this flaw should be fixed immediately. Delphi uses lossless 
conversions, i.e. an up-cast to Unicode.


BTW the use of RawByteString variables or parameters *in Delphi* can 
result in stored strings of an encoding that doesn't match the 
declaration of the target variable. This in turn can confuse the 
compiler, when two such strings of the same declared (static) encoding, 
but of different actual (dynamic) encoding, are simply appended without 
further checks/conversions.


Such problems can be avoided by making RawByteString a compiler magic
that enforces a Unicode conversion whenever AnsiStrings of a different
dynamic encoding have to be combined.


Furthermore, the use of UTF-8 will allow for lossless conversions of
AnsiStrings of any encoding, with the result still being an AnsiString.
Here Delphi has the problem that a RawByteString result type requires a
conversion of an intermediate Unicode string (UTF-16) into an
AnsiString(CP_ACP), with possible losses. This is not required when FPC
treats UTF-8 as a fully supported encoding, in addition to CP_ACP - it
would also be a strong argument for using UTF-8 for UnicodeString,
*instead* of UTF-16. The related functions already exist in the FPC
libraries, they only have to take precedence over CP_ACP (if different).
Then additional UTF-8/16 conversions are required only on Windows, when
calling external (API...) functions which expect/return WideStrings.



Conclusion:

FPC can treat RawByteString as *the one and only* string type of a 
variable dynamic encoding. Procedures accepting RawByteString arguments 
either retain the dynamic encoding of these strings, or convert 
parameters of different encoding into UTF-8. A conversion back to a 
different encoding may be required *only* when a RawByteString is 
assigned to a variable or parameter in another subroutine call.


There remains one problem with empty strings, whose declared encoding 
cannot be determined at runtime in the Delphi model, because empty 
strings are represented by Nil pointers. I can imagine two workarounds, 
to add an Encoding field to every string variable, or to make empty 
strings point to a string constant of their static encoding.



Alternatively, typed AnsiStrings and RawByteString can be dropped, so
that every AnsiString variable or parameter can have any dynamic
encoding (equivalent to RawByteString), with the favorite encoding being
UTF-8. This would allow Lazarus and other existing code to be kept
unmodified; any necessary string conversions can be inserted by the
compiler, and the obsolete UTF8... functions can be dropped.


DoDi




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-24 Thread Jürgen Hestermann

On 2013-12-23 23:08, Marco van de Voort wrote:
 But if I have to chose to kill one, it is utf8. It is the lesser used choice
 for unicode strings INSIDE APPLICATIONS.  Yes, UTF8 is dominant in documents,
 but not in APIs.

But in APIs it would not matter much to convert (in general the time for
conversion is negligible compared to the time that is needed for the rest
around the API call).

I have written a file manager for Windows that can log and store millions of
files in memory.
It uses the (UTF16) unicode API from Windows and converts the file names to
UTF8 internally.
There is another file manager that uses UTF16 internally and can also log
millions of files.
When logging the same source I can't see any difference in performance (even
when logging multiple times so that everything is cached!) although I have to
convert and the other one does not.
But the memory footprints are very different.


 UTF16 is the most horrible decision (all bad things combined).
 For what? Most of the sentiments I hear are echoed discussions on the web
 that are mostly about document encodings, NOT application internal
 encodings.

IMO this decision is based on the assumption that one encoding is chosen for
everything.
So the same encoding is used *everywhere* as much as possible.
Then UTF8 is the best solution.
Why use UTF16/32? They cannot be treated the same as ancient ANSI strings
either.
So what would be the reason behind it? Just wasting memory?


 UTF8 has the lowest memory demand
 Not according to 1 billion Chinese.

How many of the strings stored and processed on a Chinese computer are in
the Chinese language?
A lot of the strings are still in English (HTML etc.).
So for Asian countries the real memory demand is a mix and is not so easy to
determine.
In most western countries UTF8 definitely uses less memory.
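
A quick way to see the trade-off is to measure a few strings. The following Python sketch is only an illustration; the sample strings are made up, and `encoded_sizes` is a hypothetical helper.

```python
def encoded_sizes(text: str) -> dict:
    """Return the character count and the storage needed in UTF-8 and UTF-16."""
    return {
        "chars": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(text.encode("utf-16-le")),
    }


samples = {
    "markup (ASCII)": "<title>File manager</title>",
    "Chinese": "文件管理器",
    "mixed": "<title>文件管理器</title>",
    "emoji": "😀",
}

for label, text in samples.items():
    print(label, encoded_sizes(text))
```

The ASCII markup takes 27 bytes in UTF-8 against 54 in UTF-16, the Chinese text 15 against 10, and the mixed string 30 against 40. The emoji takes 4 bytes in both, since UTF-16 needs a surrogate pair there and is a variable-length encoding too.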


 On the other hand, adapting the string encoding for each
 Widgetset/OS would be a can of worms IMO.
 If you feel that way, I think Delphi compatibility should prevail.

Why?
Free Pascal/Lazarus should fledge and not repeat all the bad decisions of
Borland/Embarcadero/...


 Note that the language support for utf8 breaks down when you pass e.g. a
 string to rawbytestring on Windows. (because it is converted to the
 default 1-byte encoding, which is not utf8 in general).

I am not sure what you are talking about here.
For Windows I would use the unicode (UTF16) API interface exclusively and
convert it to UTF8 internally. From then on, everything should be UTF8.


 As said, UTF8 on Windows is a crutch, and attempts to workaround that moves
 Lazarus in the direction of portability to everything as long as it is
 unix philosophies, a la Cygwin.

For me the decision of which Unicode encoding should be used is primarily
OS-independent.
Just do the conversion once at the API interface level, but then use
internally whatever was decided to be the best (UTF8 IMO). Conversions seem
to be unavoidable anyway.
So it is just a decision where and when they take place.
And the API level is a good place IMO.
And when other OSes use the same encoding it is even better, but that is not
the reason to choose one or the other.


 A lot of additional knowledge about strings is put on the programmer
 because handling of strings has to be done differently depending on OS.

No! That's just the aim: if *all* Free Pascal/Lazarus programmers can rely on
having UTF8 in all cases, then you only need to handle UTF8 strings.
No IFDEFS to handle UTF16 on Windows and UTF8 on Linux.
The same code just works on *all* platforms!


 Constructs that happen to work with Linux will fail on Windows.
 Because on Windows the default 1-byte encoding is not UTF8.

The ANSI interface should not be used anymore. It is obsolete and only needed
for ancient OSes like DOS. But programmers should not be encouraged to use it
on modern platforms. Just use UTF8 *everywhere*. That should be the aim IMO.




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-24 Thread Marco van de Voort
On Tue, Dec 24, 2013 at 06:18:41AM +0100, Hans-Peter Diettrich wrote:
  
  Not necessarily. Supporting both on both platforms is a sane reason too.
  
  One can't ditch utf16 because of Delphi compatibility. It will be hard to
  ditch utf8 because of old Lazarus compatibility.
 
 In the meantime we have 2 Delphi compiler/RTL versions:
 - Ansi (Win32)
 - Unicode (UTF-16, multi target)
 and 4 GUI versions
 - VCL Win32
 - CLX
 - VCL.NET
 - FireMonkey
 summing up to 8 versions in theory, and 3 versions in practice.

The older delphi compilers are unsupported. We never supported anything but
VCL 32/64, so this list seems artificially inflated to me.

 So what does Delphi compatible mean *really*?

The same as it always has. VCL, and language level at a distance. The rest
is irrelevant.
 
 The FPC compiler supports multiple targets, and most probably can be 
 managed to support both string types using the *same code base* 
 (maintenance issue!). 

Yes.

 IMO this  does *not* apply to the libraries (RTL 
 and LCL) 

RTL is less of a problem than one might think. The problem mostly only comes
in at the classes level.

 and existing applications, where Lazarus counts as the most
 important and prominent application.

Existing Lazarus applications are toast anyway, without changes.

 We can be happy to have one single LCL and IDE version, which is already
 incompatible due to the use of UTF-8 strings instead of Ansi.  Multiple
 versions, for compatibility with the other Delphi combinations, are beyond
 *development capacities*.

Then drop the old stuff, and simply go for full compatibility. Anything else
will only cause the loss of all OSS Delphi projects (and even the commercial
ones that support Lazarus).

And people like me that are torn between both systems.

 This sheds a very different light on Delphi compatibility, meaning that 
 a Unicode LCL and IDE can *not* be supported in parallel to the existing 
 UTF-8 implementation.

There is no existing UTF8 implementation that can be continued as is anyway.

 Dumping UTF-8 would discontinue support for the *entire* range of
 *existing* LCL applications, i.e. lose all the
 current Lazarus users :-(
 
 So what should be the intended *audience* for a future Lazarus version?

 IMO the biggest group are old fashioned Delphi (D7) users, which want 
 their existing Ansi/VCL code base supported *without* complications and 
 incompatibilities introduced by the newer Delphi versions. The subject 
 of this thread clearly indicates that UTF-8 is *not* a solution for this 
 group of users.

It was like that two years ago. But I see more and more people migrate to
the unicode versions, and updating packages. The D7 base is eroding, and
worse, many of its users are mostly hedging bets to keep their codebases
running. Not to make new code. (and we need people that DO things)

It's like with Turbo Pascal in the (1.0.x) past. Yes, the numbers are huge,
but all they say is they want something 100% compatible to effortlessly keep
their codebases running.  But when the time comes to actually _invest_ in
the code again, they pick something that is at least halfway modern.  And
all you are stuck with is oldtimers and l33t tinkerers.

That is the curse of supporting legacy targets, you can't do that forever
without making yourself irrelevant.

Keep in mind that any Lazarus solution in production use based on 2.8.x is
years away. The current activity levels in that group will be even less. Our
decisions must be aimed not at the situation now, but good for at least 5
years.

 Another important user group is targeting mobiles, where time will tell 
 whether FM will ever succeed, or shares the fate of Kylix or VCL.NET. 

Everywhere I see FM (Mobile plugin) buyers, I see existing Delphi users
hoping for an easy conversion to mobile and a quick buck to tide them over
the crisis.  Not real go-getters that really go for mobile.

That makes me think this is not sustainable.

But Embarcadero is said to use it heavily internally, so they won't quickly
kill it off, and I assume a certain kind of customer will adopt it.

But IMHO for us it is irrelevant.

 IMO these should be happy already with fpGUI or mseGUI, no need to raise 
 another competitor in this area.

I don't really see any adoption there. Those teams and offerings are again
an order of magnitude smaller than Lazarus, and for most of those users switching from
Embarcadero to Lazarus is already the biggest step they are willing to make.

  But if I have to choose to kill one, it is utf8. It is the lesser used choice
  for unicode strings INSIDE APPLICATIONS.  Yes, UTF8 is dominant in documents,
  but not in APIs.
 
 That's my conclusion as well. But is that new audience worth to abandon 
 the entire existing Lazarus audience?

I myself hope for the two tracks way. It satisfies multiple demands, and the
extra work is offset by less rewriting from current Delphi sources and less
discussion.

But the prime point is that IMHO an utf8 Windows is 

Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-24 Thread Marcos Douglas
On Tue, Dec 24, 2013 at 3:18 AM, Hans-Peter Diettrich
drdiettri...@aol.com wrote:
 Marco van de Voort schrieb:

 On Mon, Dec 23, 2013 at 06:52:21PM +0100, Jürgen Hestermann wrote:

 Am 2013-12-23 11:32, schrieb Marco van de Voort:
   So I would say UTF16, and maybe, if there is demand, some can get utf8
 :-)

 The question is:
 Should FPC and LCL use a fixed encoding for all platforms
 or should the encoding be adapted for each WidgetSet/OS?


 Not necessarily. Supporting both on both platforms is a sane reason too.

 One can't ditch utf16 because of Delphi compatibility. It will be hard to
 ditch utf8 because of old Lazarus compatibility.


 In the meantime we have 2 Delphi compiler/RTL versions:
 - Ansi (Win32)
 - Unicode (UTF-16, multi target)
 and 4 GUI versions
 - VCL Win32
 - CLX
 - VCL.NET
 - FireMonkey
 summing up to 8 versions in theory, and 3 versions in practice.

 So what does Delphi compatible mean *really*?

 The FPC compiler supports multiple targets, and most probably can be managed
 to support both string types using the *same code base* (maintenance
 issue!). IMO this  does *not* apply to the libraries (RTL and LCL) and
 existing applications, where Lazarus counts as the most important and
 prominent application. We can be happy to have one single LCL and IDE
 version, which is already incompatible due to the use of UTF-8 strings
 instead of Ansi. Multiple versions, for compatibility with the other Delphi
 combinations, are beyond *development capacities*.

 This sheds a very different light on Delphi compatibility, meaning that a
 Unicode LCL and IDE can *not* be supported in parallel to the existing UTF-8
 implementation. Dumping UTF-8 would discontinue support for the *entire*
 range of *existing* LCL applications, i.e. lose all the current Lazarus
 users :-(

 So what should be the intended *audience* for a future Lazarus version?

 IMO the biggest group are old fashioned Delphi (D7) users, which want their
 existing Ansi/VCL code base supported *without* complications and
 incompatibilities introduced by the newer Delphi versions. The subject of
 this thread clearly indicates that UTF-8 is *not* a solution for this group
 of users.

I started this thread. My problem isn't using UTF-8 on Windows... my
problem is using different encodings in the same code, i.e., RTL and LCL.

Always having to use functions to convert strings between RTL and LCL and
vice versa is IMHO wrong because the final code is confusing. In a
huge application you still need to think: is this UTF-8 or
ANSI/UTF-16 here?

 Another important user group is targeting mobiles, where time will tell
 whether FM will ever succeed, or shares the fate of Kylix or VCL.NET. IMO
 these should be happy already with fpGUI or mseGUI, no need to raise another
 competitor in this area.



 But if I have to choose to kill one, it is utf8. It is the lesser used choice
 for unicode strings INSIDE APPLICATIONS.  Yes, UTF8 is dominant in documents,
 but not in APIs.


 That's my conclusion as well. But is that new audience worth to abandon the
 entire existing Lazarus audience?

Of course nobody will abandon the entire existing Lazarus audience. If
the RTL becomes UTF-16, UTF-32, or whatever, Lazarus will continue --
I think -- working with UTF-8.

Marcos Douglas



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-24 Thread Marcos Douglas
On Tue, Dec 24, 2013 at 12:19 PM, Marco van de Voort mar...@stack.nl wrote:
 On Tue, Dec 24, 2013 at 06:18:41AM +0100, Hans-Peter Diettrich wrote:
 
  Not necessarily. Supporting both on both platforms is a sane reason too.
 
  One can't ditch utf16 because of Delphi compatibility. It will be hard to
  ditch utf8 because of old Lazarus compatibility.

 In the meantime we have 2 Delphi compiler/RTL versions:
 - Ansi (Win32)
 - Unicode (UTF-16, multi target)
 and 4 GUI versions
 - VCL Win32
 - CLX
 - VCL.NET
 - FireMonkey
 summing up to 8 versions in theory, and 3 versions in practice.

 The older delphi compilers are unsupported. We never supported anything but
 VCL 32/64, so this list seems artificially inflated to me.

 So what does Delphi compatible mean *really*?

 The same as it always has. VCL, and language level at a distance. The rest
 is irrelevant.

 The FPC compiler supports multiple targets, and most probably can be
 managed to support both string types using the *same code base*
 (maintenance issue!).

 Yes.

 IMO this  does *not* apply to the libraries (RTL
 and LCL)

 RTL is less of a problem than one might think. The problem mostly only comes
 in at the classes level.

and existing applications, where Lazarus counts as the most
 important and prominent application.

 Existing Lazarus applications are toast anyway, without changes.

 We can be happy to have one single LCL and IDE version, which is already
 incompatible due to the use of UTF-8 strings instead of Ansi.  Multiple
 versions, for compatibility with the other Delphi combinations, are beyond
 *development capacities*.

 Then drop the old stuff, and simply go for full compatibility. Anything else
 will only cause the loss of all OSS Delphi projects (and even the commercial
 ones that support Lazarus).

 And people like me that are torn between both systems.

 This sheds a very different light on Delphi compatibility, meaning that
 a Unicode LCL and IDE can *not* be supported in parallel to the existing
 UTF-8 implementation.

 There is no existing UTF8 implementation that can be continued as is anyway.

 Dumping UTF-8 would discontinue support for the *entire* range of
 *existing* LCL applications, i.e. lose all the
 current Lazarus users :-(

 So what should be the intended *audience* for a future Lazarus version?

 IMO the biggest group are old fashioned Delphi (D7) users, which want
 their existing Ansi/VCL code base supported *without* complications and
 incompatibilities introduced by the newer Delphi versions. The subject
 of this thread clearly indicates that UTF-8 is *not* a solution for this
 group of users.

 It was like that two years ago. But I see more and more people migrate to
 the unicode versions, and updating packages. The D7 base is eroding, and
 worse, many of its users are mostly hedging bets to keep their codebases
 running. Not to make new code. (and we need people that DO things)

 It's like with Turbo Pascal in the (1.0.x) past. Yes, the numbers are huge,
 but all they say is they want something 100% compatible to effortlessly keep
 their codebases running.  But when the time comes to actually _invest_ in
 the code again, they pick something that is at least halfway modern.  And
 all you are stuck with is oldtimers and l33t tinkerers.

 That is the curse of supporting legacy targets, you can't do that forever
 without making yourself irrelevant.

 Keep in mind that any Lazarus solution in production use based on 2.8.x is
 years away. The current activity levels in that group will be even less. Our
 decisions must be aimed not at the situation now, but good for at least 5
 years.

 Another important user group is targeting mobiles, where time will tell
 whether FM will ever succeed, or shares the fate of Kylix or VCL.NET.

 Everywhere I see FM (Mobile plugin) buyers, I see existing Delphi users
 hoping for an easy conversion to mobile and a quick buck to tide them over
 the crisis.  Not real go-getters that really go for mobile.

 That makes me think this is not sustainable.

 But Embarcadero is said to use it heavily internally, so they won't quickly
 kill it off, and I assume a certain kind of customer will adopt it.

 But IMHO for us it is irrelevant

 IMO these should be happy already with fpGUI or mseGUI, no need to raise
 another competitor in this area.

 I don't really see any adoption there. Those teams and offerings are again
 an order of magnitude smaller than Lazarus, and for most of those users switching from
 Embarcadero to Lazarus is already the biggest step they are willing to make.

  But if I have to choose to kill one, it is utf8. It is the lesser used choice
  for unicode strings INSIDE APPLICATIONS.  Yes, UTF8 is dominant in documents,
  but not in APIs.

 That's my conclusion as well. But is that new audience worth to abandon
 the entire existing Lazarus audience?

 I myself hope for the two tracks way. It satisfies multiple demands, and the
 extra work is offset by less rewriting from 

Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-24 Thread Jürgen Hestermann

Am 24.12.2013 15:22, schrieb Marcos Douglas:
 Use functions, always, to convert string between RTL and LCL and
 vice-versa IHMO is wrong because the final code is confusing. In a
 huge application you still need to think here is UTF-8 or
 ANSI/UTF-16?

That's true.
It's a pain to pay attention to this.
All units used should use the same string encoding IMO.
But which?
I think that's the discussion in  this thread.



 If the RTL will be UTF-16, UTF-32, whatever the Lazarus will continues --
 I think -- working using UTF-8.

But that would be a real pain.
In a program it should be possible to use strings
without the need to convert back and forth between encodings.
So all strings from/to FPC and LCL routines should have the same encoding.




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-24 Thread Marcos Douglas
On Tue, Dec 24, 2013 at 3:13 PM, Jürgen Hestermann
juergen.hesterm...@gmx.de wrote:
 Am 24.12.2013 15:22, schrieb Marcos Douglas:

 Use functions, always, to convert string between RTL and LCL and
 vice-versa IHMO is wrong because the final code is confusing. In a
 huge application you still need to think here is UTF-8 or
 ANSI/UTF-16?

 That's true.
 It's a pain to pay attention to this.

Someone agreed! :-)

 All units used should use the same string encoding IMO.
 But which?
 I think that's the discussion in  this thread.

Yes, this is the major problem... ;-)

 If the RTL will be UTF-16, UTF-32, whatever the Lazarus will continues --
 I think -- working using UTF-8.

 But that would be a real pain.

Would not.. IS a real pain today.

 In a program it should be possible to use strings
 without the need to convert back and forth between encodings.
 So all strings from/to FPC and LCL routines should have the same encoding.

This will depend only on the FPC team...

When I created this thread I was looking for a way to only minimize
this problem but...


Marcos Douglas



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-24 Thread Sven Barth
Am 24.12.2013 15:34 schrieb Marcos Douglas m...@delfire.net:
 Sorry if I say something crazy, but what do you think to use UTF-16 on
 {mode delphi} and UTF-8 in {mode fpc}?

That is already the case with mode delphiunicode. But the big problem is
classes and their inheritance. Take TStringList for example. Let's assume
it's declared with String=AnsiString and you override it in a unit with
String=UnicodeString; then you'll get problems with overloads/overrides,
because UnicodeString <> AnsiString.

The mode concept is all good and well, but here it breaks down... :(

Regards,
Sven


Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-24 Thread Hans-Peter Diettrich

Jürgen Hestermann schrieb:

The ANSI interface should not be used anymore. It is obsolete and only needed
for ancient OSes like DOS. But programmers should not be encouraged to use it
on modern platforms. Just use UTF8 *everywhere*. That should be the aim IMO.


Whenever the encoding matters, most users and applications are best off 
with their regional Ansi encoding - all used characters are single 
bytes. UTF-16 extends the range of languages whose characters can be 
assumed to have a fixed size, i.e. all character sets in the BMP. Such 
fixed-size characters IMO are on the top of the wishlist of most users, 
so that none of them ever will be happy with UTF-8. Certainly UTF-8 was 
the best choice when Delphi (and FPC) did not have native UTF-16 
strings, but when we have Unicode strings, now or soon, it should be 
dropped.
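
To make the BMP remark concrete, here is a small sketch. It is written in Python only because its codecs are easy to inspect; it is not FPC code, and `unit_counts` is a made-up helper name. It counts how many storage units one character needs in UTF-8 and UTF-16, once for a character inside the BMP and once for one outside it, where UTF-16 needs a surrogate pair.

```python
def unit_counts(text: str) -> dict:
    """Count the storage units needed for `text` in UTF-8 and UTF-16."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        # UTF-16 code units are 2 bytes each; "-le" avoids counting a BOM.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }

bmp = unit_counts("\u00e4")          # U+00E4 (a-umlaut), inside the BMP
astral = unit_counts("\U0001D11E")   # U+1D11E (musical G clef), outside the BMP

print(bmp)     # {'code_points': 1, 'utf8_bytes': 2, 'utf16_units': 1}
print(astral)  # {'code_points': 1, 'utf8_bytes': 4, 'utf16_units': 2}
```

So the fixed-size assumption for UTF-16 holds only as long as the text stays inside the BMP.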


DoDi




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-24 Thread Hans-Peter Diettrich

Marcos Douglas schrieb:

On Tue, Dec 24, 2013 at 3:18 AM, Hans-Peter Diettrich
drdiettri...@aol.com wrote:



I started this thread. My problem isn't to use UTF-8 on Windows... my
problem is use different encodings on the same code, ie, RTL  LCL.


This mix would cause problems, of course.


Use functions, always, to convert string between RTL and LCL and
vice-versa IHMO is wrong because the final code is confusing. In a
huge application you still need to think here is UTF-8 or
ANSI/UTF-16?


The simplest (feasible) solution IMO is the adaptation of (OS...) string
types behind the scenes, i.e. inside the RTL and widgetsets. Then you can
have any common encoding in the application and library API, while
encoding-dependent code is encapsulated in lower level functions
receiving explicit (Unicode, UTF8String...) string types, so that the
compiler can insert required conversions. Such explicit parameter types
would also be required for legacy code, where a specific encoding is
assumed. I'm not sure how this conversion process can be automated or
supported, perhaps removing/renaming the traditional UTF8... functions
would help in spotting the procedures that require special attention.


The number of automatic conversions can be reduced in the next step, by 
e.g. adding overrides, or conditional code, for both string types one by 
one, as time permits.
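
The encapsulation idea can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the RTL or LCL design: `to_backend`, `from_backend` and the `BACKENDS` table are hypothetical names, and the two encodings are simply the ones this thread associates with GTK (UTF-8) and the Windows wide API (UTF-16). Application code keeps one string type, and only the lowest-level boundary functions know which byte layout a backend wants.

```python
# Hypothetical table: which encoding each backend expects at its boundary.
BACKENDS = {"gtk": "utf-8", "win32": "utf-16-le"}

def to_backend(text: str, backend: str) -> bytes:
    """Convert application text to the byte layout the backend expects."""
    return text.encode(BACKENDS[backend])

def from_backend(raw: bytes, backend: str) -> str:
    """Convert backend bytes back to the application's single string type."""
    return raw.decode(BACKENDS[backend])

caption = "Gr\u00f6\u00dfe"  # "Groesse" spelled with o-umlaut and sharp s

# Application code never mentions an encoding; only the boundary does.
sizes = {name: len(to_backend(caption, name)) for name in BACKENDS}
round_trips = {
    name: from_backend(to_backend(caption, name), name) for name in BACKENDS
}

print(sizes)  # {'gtk': 7, 'win32': 10}
```

The conversions still happen, but they are confined to two functions instead of being scattered through application code.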


DoDi




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-23 Thread Marco van de Voort
On Sun, Dec 22, 2013 at 11:52:04PM +0100, Hans-Peter Diettrich wrote:
  Keeping UTF8 on Windows makes a majority platform seem only half supported.
  Not good either. Worse, it is Delphi incompatible.
 
 You favor a special FPC and Lazarus for Windows, in addition to the 
 UTF-8 version for all other platforms?

IMHO the utf8 is not a done deal, and Delphi compatibility requires at least
also UTF16 on other platforms.

QT is utf16, and so is Cocoa. Only GTK is utf8

So I would say UTF16, and maybe, if there is demand, some can get utf8 :-)





Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-23 Thread Hans-Peter Diettrich

Juha Manninen schrieb:


However using a new Unicode-Delphi would cause
many problems because all VCL functions and classes, including
TStringList, expect UTF-16 string. When using UTF8String, the compiler
converts between encodings all the time.


Then you can give your favorite string type a unique name, and set it to 
whatever is best in your favorite environment.


DoDi




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-23 Thread Juha Manninen
On Mon, Dec 23, 2013 at 1:58 PM, Hans-Peter Diettrich
drdiettri...@aol.com wrote:
 Then you can give your favorite string type a unique name, and set it to
 whatever is best in your favorite environment.

The favorite string type in this case would be UTF8String. It already
has a name. Please see what I was writing earlier.

Juha



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-23 Thread Jürgen Hestermann

Am 2013-12-23 11:32, schrieb Marco van de Voort:
 So I would say UTF16, and maybe, if there is demand, some can get utf8 :-)

The question is:
Should FPC and LCL use a fixed encoding for all platforms
or should the encoding be adapted for each WidgetSet/OS?


If it should be the same for all platforms then it should be UTF8 IMO.
UTF16 is the most horrible decision (all bad things combined).
UTF32 would at least have the advantage of fixed character size
but pays this with *a lot* of memory consumption.
UTF8 has the lowest memory demand (in general) and a good
backward compatibility.


On the other hand, adapting the string encoding for each
Widgetset/OS would be a can of worms IMO.
A lot of additional knowledge about strings is put on the programmer
because handling of strings has to be done differently depending on OS.
That would be a hazardous decision and would only be of use if programs
are exclusively written for one OS only.
But FPC/Lazarus is meant to be portable so this should not be done.




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-23 Thread Marcos Douglas
On Mon, Dec 23, 2013 at 8:38 AM, Marco van de Voort mar...@stack.nl wrote:
 On Sun, Dec 22, 2013 at 05:06:27PM -0200, Marcos Douglas wrote:
  FPC 2.7.x can compile the windows unit in unicode (UTF16) mode. Most system and
  sysutils file related routines are already unicode (UTF16 with Rawbytestring
  overload).

 So FPC 2.7.x can compile the windows unit in unicode (UTF16) mode. But
 how it will work with Lazarus that uses UTF-8?

 Not without conversions. UTF8 on Windows IMHO _NEVER_ was a good idea.

 Lazarus will not to
 change to UTF-16 -- only for Windows -- then everything will stay the
 same to Windows programmers?

 I think it is too early to say what will happen. One way or the other.
 Everybody is still searching, and the current 2.6.x based UTF8 support will
 need an overhaul anyway for 2.8.x.

 I think 2.8.x will be a transition version anyway, and a definitive unicode
 solution will only come in the major release after that.

Ok, thanks for the explanation.

Marcos Douglas



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-23 Thread Marco van de Voort
On Mon, Dec 23, 2013 at 06:52:21PM +0100, Jürgen Hestermann wrote:
 Am 2013-12-23 11:32, schrieb Marco van de Voort:
   So I would say UTF16, and maybe, if there is demand, some can get utf8 :-)
 
 The question is:
 Should FPC and LCL use a fixed encoding for all platforms
 or should the encoding be adapted for each WidgetSet/OS?

Not necessarily. Supporting both on both platforms is a sane reason too.

One can't ditch utf16 because of Delphi compatibility. It will be hard to
ditch utf8 because of old Lazarus compatibility.

But if I have to choose to kill one, it is utf8. It is the lesser used choice
for unicode strings INSIDE APPLICATIONS.  Yes, UTF8 is dominant in documents,
but not in APIs.

 If it should be the same for all platforms then it should be UTF8 IMO.
 UTF16 is the most horrible decision (all bad things combined).

For what? Most of the sentiments I hear are echoed discussions on the web
that are mostly about document encodings, NOT application internal
encodings.

However we

 UTF32 would at least have the advantage of fixed character size
 but pays this with *a lot* of memory consumption.

(it is not fixed character, but fixed codepoint)
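
A quick way to see the difference is the Python sketch below (an illustration only, nothing FPC-specific). The same visible character can be one code point or two, so even UTF-32 does not give one unit per character.

```python
import unicodedata

decomposed = "e\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E9

# UTF-32 uses exactly 4 bytes per code point ("-le" avoids counting a BOM).
cp_decomposed = len(decomposed.encode("utf-32-le")) // 4
cp_composed = len(composed.encode("utf-32-le")) // 4

print(cp_decomposed, cp_composed)  # 2 1
```

Both strings render as the same accented letter, yet they occupy a different number of UTF-32 units.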

 UTF8 has the lowest memory demand

Not according to 1 billion Chinese.
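
For the record, the arithmetic behind that remark, as a Python sketch (`encoded_sizes` is a made-up helper and the sample strings are arbitrary): BMP CJK ideographs take 3 bytes each in UTF-8 but 2 in UTF-16, while ASCII text is the other way around.

```python
def encoded_sizes(text: str) -> dict:
    """Byte counts of `text` in UTF-8 and UTF-16 (without BOM)."""
    return {
        "utf8": len(text.encode("utf-8")),
        "utf16": len(text.encode("utf-16-le")),
    }

# Five CJK ideographs from the BMP (U+4E2D U+6587 U+5B57 U+7B26 U+4E32).
chinese = encoded_sizes("\u4e2d\u6587\u5b57\u7b26\u4e32")
english = encoded_sizes("string")

print(chinese)  # {'utf8': 15, 'utf16': 10}
print(english)  # {'utf8': 6, 'utf16': 12}
```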

 (in general) and a good backward compatibility.

Hardly. Only for western languages, and even there conversions often go
wrong. That's why the whole BOM kludge became so important.

 On the other hand, adapting the string encoding for each
 Widgetset/OS would be a can of worms IMO.

If you feel that way, I think Delphi compatibility should prevail. Old
Lazarus code needs to be modified anyway. 

Note that the language support for utf8 breaks down when you pass e.g. a
string to rawbytestring on Windows. (because it is converted to the
default 1-byte encoding, which is not utf8 in general).
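
What such a detour through a 1-byte code page does to the data can be sketched like this. It is a Python illustration that uses cp1252 as a stand-in for "the default 1-byte encoding"; which code page a given Windows system really uses, and how FPC performs the conversion, is not modeled here.

```python
text = "Gr\u00fc\u00dfe \u4f60\u597d"  # German "Gruesse" with umlaut and sharp s, plus two Chinese characters

# Converting to a 1-byte code page: anything it cannot represent is lost.
ansi = text.encode("cp1252", errors="replace")
round_trip = ansi.decode("cp1252")

# The opposite mistake: UTF-8 bytes interpreted as if they were cp1252.
mojibake = "\u00e9".encode("utf-8").decode("cp1252")

print(ansi)        # b'Gr\xfc\xdfe ??'
print(round_trip)  # the German part survives, the Chinese part is now '??'
print(mojibake)    # two characters (U+00C3, U+00A9) instead of one e-acute
```

The German characters survive because cp1252 happens to contain them; anything outside the code page is gone for good.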

As said, UTF8 on Windows is a crutch, and attempts to work around that move
Lazarus in the direction of "portable to everything, as long as it follows
Unix philosophies", a la Cygwin.

IMHO a bad direction. FPC has in general avoided having an outright
preference and IMHO should continue to do so.

 A lot of additional knowledge about strings is put on the programmer
 because handling of strings has to be done differently depending on OS.

It will anyway, even with utf8. Constructs that happen to work with Linux
will fail on Windows. Because on Windows the default 1-byte encoding is not
UTF8.

Moreover, I think people step over the Delphi compatibility card too easily.
Way, way, way too easily.

 But FPC/Lazarus is meant to be portable so this should not be done.

FPC/Lazarus is supposed to be portable, not an emulated Unix on everything.
Using other systems default encoding is emulation, and not portability.



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-23 Thread Hans-Peter Diettrich

Marco van de Voort schrieb:

On Mon, Dec 23, 2013 at 06:52:21PM +0100, Jürgen Hestermann wrote:

Am 2013-12-23 11:32, schrieb Marco van de Voort:
  So I would say UTF16, and maybe, if there is demand, some can get utf8 :-)

The question is:
Should FPC and LCL use a fixed encoding for all platforms
or should the encoding be adapted for each WidgetSet/OS?


Not necessarily. Supporting both on both platforms is a sane reason too.

One can't ditch utf16 because of Delphi compatibility. It will be hard to
ditch utf8 because of old Lazarus compatibility.


In the meantime we have 2 Delphi compiler/RTL versions:
- Ansi (Win32)
- Unicode (UTF-16, multi target)
and 4 GUI versions
- VCL Win32
- CLX
- VCL.NET
- FireMonkey
summing up to 8 versions in theory, and 3 versions in practice.

So what does Delphi compatible mean *really*?

The FPC compiler supports multiple targets, and most probably can be 
managed to support both string types using the *same code base* 
(maintenance issue!). IMO this  does *not* apply to the libraries (RTL 
and LCL) and existing applications, where Lazarus counts as the most 
important and prominent application. We can be happy to have one single 
LCL and IDE version, which is already incompatible due to the use of 
UTF-8 strings instead of Ansi. Multiple versions, for compatibility with 
the other Delphi combinations, are beyond *development capacities*.


This sheds a very different light on Delphi compatibility, meaning that 
a Unicode LCL and IDE can *not* be supported in parallel to the existing 
UTF-8 implementation. Dumping UTF-8 would discontinue support for the
*entire* range of *existing* LCL applications, i.e. lose all the
current Lazarus users :-(


So what should be the intended *audience* for a future Lazarus version?

IMO the biggest group are old fashioned Delphi (D7) users, which want 
their existing Ansi/VCL code base supported *without* complications and 
incompatibilities introduced by the newer Delphi versions. The subject 
of this thread clearly indicates that UTF-8 is *not* a solution for this 
group of users.


Another important user group is targeting mobiles, where time will tell 
whether FM will ever succeed, or shares the fate of Kylix or VCL.NET. 
IMO these should be happy already with fpGUI or mseGUI, no need to raise 
another competitor in this area.




But if I have to choose to kill one, it is utf8. It is the lesser used choice
for unicode strings INSIDE APPLICATIONS.  Yes, UTF8 is dominant in documents,
but not in APIs.


That's my conclusion as well. But is that new audience worth abandoning
the entire existing Lazarus audience?


DoDi




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-22 Thread Juha Manninen
On Sat, Dec 21, 2013 at 5:41 PM, Marcos Douglas m...@delfire.net wrote:
 LCL (and VCL) typically use events, like TNotifyEvent. They are
 basically just call-back functions.
 Oh, not same. I use a lot Events -- no only Form or GUI components --
 in my core codes but PostMessage is very different, eg., you call a
 PostMessage, show a Modal Form and the process will start after; the
 task code is not inside the instance of the Form and the Form knows
 nothing about the task.

Ok, true.
Some of the Windows messages are ported to be cross-platform. I have
used the OnIdle handler and sometimes threads when I want the action to
happen later.

Juha



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-22 Thread Juha Manninen
On Sat, Dec 21, 2013 at 7:55 PM, Jürgen Hestermann
juergen.hesterm...@gmx.de wrote:
 The bottom line is: Use only UTF-16 with Delphi and it works very well.
 I would not like Lazarus to do the same.
 UTF16 is the worst of all possible unicode encodings.

I believe LCL will continue to use UTF-8. Nobody knows yet how many
changes are needed later with new FPC versions but no worries, that
question is not acute now.

Juha



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-22 Thread Marcos Douglas
On Sun, Dec 22, 2013 at 7:06 AM, Juha Manninen
juha.mannine...@gmail.com wrote:
 On Sat, Dec 21, 2013 at 5:41 PM, Marcos Douglas m...@delfire.net wrote:
 LCL (and VCL) typically use events, like TNotifyEvent. They are
 basically just call-back functions.
 Oh, not same. I use a lot Events -- no only Form or GUI components --
 in my core codes but PostMessage is very different, eg., you call a
 PostMessage, show a Modal Form and the process will start after; the
 task code is not inside the instance of the Form and the Form knows
 nothing about the task.

 Ok, true.
 Some of the Windows message are ported to be cross-platform. I have
 used OnIdle handler and sometimes threads when I want the action to
 happen later.

I use threads too, but I like to make things as simple as possible and
threads can be hard sometimes. Using PostMessage is very easy and
simple.

Marcos Douglas



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-22 Thread Marco van de Voort
On Sun, Dec 15, 2013 at 06:13:32PM +0100, Reinier Olislagers wrote:
  FPC's context.
  These components do not use Lazarus' routines and that is the BIG
  problem. I need to remember in pass only ANSI strings for these
  components as remember to convert the component's output string
  results to use in Lazarus.
 
 Why not just include a project reference to LCLBase (IIRC that should be
 enough) and just always use the LCL units until FPC catches up?

FPC 2.7.x can compile the windows unit in unicode (UTF16) mode. Most system and 
sysutils file related routines are already unicode (UTF16 with Rawbytestring
overload).




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-22 Thread Marco van de Voort
On Wed, Dec 18, 2013 at 12:03:56PM +0100, Hans-Peter Diettrich wrote:
  Apart from that there's not much else you can do except contribute
  patches to help unicode-ise the FPC RTL...
 
 The new AnsiStrings (with Encoding and automatic conversion) should be
 sufficient, Unicode is not required. In fact a move to a Unicode RTL 
 would require that either Lazarus is converted, too, or that 2 RTL 
 flavors (Ansi and Unicode) must be supported. Not a good idea, IMO.

Keeping UTF8 on Windows makes a majority platform seem only half supported.
Not good either. Worse, it is Delphi incompatible.




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-22 Thread Marcos Douglas
On Sun, Dec 22, 2013 at 4:56 PM, Marco van de Voort mar...@stack.nl wrote:
 On Sun, Dec 15, 2013 at 06:13:32PM +0100, Reinier Olislagers wrote:
  FPC's context.
  These components do not use Lazarus' routines, and that is the BIG
  problem. I need to remember to pass only ANSI strings to these
  components, and to convert the components' output strings
  before using them in Lazarus.

 Why not just include a project reference to LCLBase (IIRC that should be
 enough) and just always use the LCL units until FPC catches up?

 FPC 2.7.x can compile the Windows unit in Unicode (UTF-16) mode. Most
 file-related routines in the system and sysutils units are already Unicode
 (UTF-16, with a RawByteString overload).

So FPC 2.7.x can compile the Windows unit in Unicode (UTF-16) mode. But
how will it work with Lazarus, which uses UTF-8? Lazarus will not
change to UTF-16 only for Windows, so will everything stay the
same for Windows programmers?

Thanks,
Marcos Douglas



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-22 Thread Hans-Peter Diettrich

Marco van de Voort wrote:


Keeping UTF8 on Windows makes a majority platform seem only half supported.
Not good either. Worse, it is Delphi incompatible.


You favor a special FPC and Lazarus for Windows, in addition to the 
UTF-8 version for all other platforms?


DoDi




Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-21 Thread Marcos Douglas
On Sat, Dec 21, 2013 at 5:56 AM, Juha Manninen
juha.mannine...@gmail.com wrote:
 On Sat, Dec 21, 2013 at 3:08 AM, Marcos Douglas m...@delfire.net wrote:
 I didn't understand. If I have a TStringList instance, on Windows, I
 need to convert Text property to ANSI. But some components, e.g.
 TMemo, do these conversions automatically, but this is different.

 TMemo is a GUI component.

I know, of course... :)

 Then the string encoding matters and it must
 be converted to the native widgetset encoding. Still, the conversion
 is automatic. You don't need to care about it if you work with LCL
 components only and not with WinAPI directly.

Yes, and that is why I wrote that some components, e.g. TMemo, "do these
conversions automatically, but this is different".

 TStringList is not a GUI component. It can be used for example in an
 embedded Linux program with no GUI.

Yes again. I use TStrings a lot to transfer information in many
cases, with no GUI involved.

 It does not need to know the encoding (except for sorting maybe). With
 FPC no automatic conversions happen.

And that is one of the problems, because I need to convert the Text
property to the right encoding.

 In Delphi things are different. The auto-conversion happens ALWAYS
 when assigning between eg. UTF-8 and UTF-16. It has nothing to do with
 WinAPI, or any other widgetset API.
 Native string is UTF-16. If you have
   var MyUTF8Str: UTF8String;
   ...
   StringList.Add(MyUTF8Str);    // triggers conversion
   MyUTF8Str := StringList[0];   // triggers conversion again

 The amazing thing is that such code works. Delphi does a good job in
 converting the strings.

That's it!
I think you are talking about the new versions of Delphi, right? I have
always read that the Unicode implementation in the new versions of
Delphi is wrong, breaks things, etc., but you are presenting a different view.
These conversions, IMHO, could be automatic, as in Delphi, when
I use the correct string type, in this case UTF8String. Then I could
write my packages using only UTF8String or only UnicodeString (UTF-16) in all
arguments, and the compiler would convert for me. What is wrong with that
approach?

 It is also reasonably fast, but still not acceptable in a speed
 critical code. This was the problem in my employer's code base. We are
 thinking how to use UTF-8 for the core program without triggering many
 auto-conversions. One choice is to dump Delphi and use only FPC. Now
 the code still works with both.

If you do not want automatic conversions, use the RawByteString type.
Delphi does not do conversions in that case, right?
Thank you, I'm learning.

Marcos Douglas

 P.S.
   I am still wondering why you are so fond of WinAPI while you have a
 nice cross-platform API available.

Fond? Of course not! I use WinAPI when I need to, or when I don't know
another way to do the same thing cross-platform. I'm a classic
Delphi programmer. I still use Delphi today (I stopped at version 7), but
for all new projects I use Lazarus, and MSEgui a little.
For example, I use PostMessage, SendMessage, PeekMessage... a lot. Are
these cross-platform? If not, how can I do the same?



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-21 Thread Juha Manninen
On Sat, Dec 21, 2013 at 4:33 PM, Marcos Douglas m...@delfire.net wrote:
 That's it!
 I think you are talking about the new versions of Delphi, right? I have
 always read that the Unicode implementation in the new versions of
 Delphi is wrong, breaks things, etc., but you are presenting a different view.

Yes, Delphi 2009+. Delphi 2009 is soon 5 years old, not really new any more.
IMO it does a good job for such a fundamental change in string type.
Only code that relies on sizeof(char) = 1 does not work. It includes
streaming strings, file I/O or I/O with some outside devices, using
Length(Str) as a parameter for GetMem(), Move() etc.
Most clean code works amazingly well, if you are ok with using
UTF-16 everywhere.
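
As an illustration of the sizeof(char) = 1 pitfall, here is a minimal Free
Pascal sketch. It is not taken from any code discussed in this thread, and
the program and variable names are made up. It sizes a buffer in bytes
rather than in characters, so it stays correct when a character occupies
two bytes:

```pascal
program CharSizeDemo;
{$mode objfpc}{$H+}

var
  U: UnicodeString;      // UTF-16 string: one code unit = 2 bytes
  Buf: Pointer;
  ByteCount: SizeInt;
begin
  U := 'abc';
  // Length() counts UTF-16 code units, not bytes, so multiply by the
  // element size before handing the value to GetMem() or Move().
  ByteCount := Length(U) * SizeOf(WideChar);
  GetMem(Buf, ByteCount);
  try
    Move(U[1], Buf^, ByteCount);
    WriteLn('code units: ', Length(U), ', bytes: ', ByteCount);
  finally
    FreeMem(Buf);
  end;
end.
```

Code that passes Length(U) alone to GetMem() or Move() would copy only
half of the string here, which is the kind of breakage described above.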


 These conversions, IMHO, could be automatic, as in Delphi, when
 I use the correct string type, in this case UTF8String. Then I could
 write my packages using only UTF8String or only UnicodeString (UTF-16) in all
 arguments, and the compiler would convert for me. What is wrong with that
 approach?

Nothing wrong I guess. I hope it will be possible with FPC. Still,
let's not speculate more, we already have such mail threads in fpc-dev
list that continued for months.


 If you do not want automatic conversions, use the RawByteString type.
 Delphi does not do conversions in that case, right?
 Thank you, I'm learning.

You can bypass the conversion sometimes by using RawByteString but it
would be rather hackish. Remember, all VCL classes and string
functions expect UTF-16. I don't want to try what happens if you pass
them a UTF-8 encoded string using some hack.
The bottom line is: Use only UTF-16 with Delphi and it works very well.


 For example, I use PostMessage, SendMessage, PeekMessage... a lot. Are
 these cross-platform? If not, how can I do the same?

LCL (and VCL) typically use events, like TNotifyEvent. They are
basically just call-back functions.

Juha



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-21 Thread Marcos Douglas
On Sat, Dec 21, 2013 at 1:18 PM, Juha Manninen
juha.mannine...@gmail.com wrote:
 On Sat, Dec 21, 2013 at 4:33 PM, Marcos Douglas m...@delfire.net wrote:
 That's it!
 I think you are talking about the new versions of Delphi, right? I have
 always read that the Unicode implementation in the new versions of
 Delphi is wrong, breaks things, etc., but you are presenting a different view.

 Yes, Delphi 2009+. Delphi 2009 is soon 5 years old, not really new any more.
 IMO it does a good job for such a fundamental change in string type.
 Only code that relies on sizeof(char) = 1 does not work. It includes
 streaming strings, file I/O or I/O with some outside devices, using
 Length(Str) as a parameter for GetMem(), Move() etc.
 Most clean code works amazingly well, if you are ok with using
 UTF-16 everywhere.


 These conversions, IMHO, could be automatic, as in Delphi, when
 I use the correct string type, in this case UTF8String. Then I could
 write my packages using only UTF8String or only UnicodeString (UTF-16) in all
 arguments, and the compiler would convert for me. What is wrong with that
 approach?

 Nothing wrong I guess. I hope it will be possible with FPC. Still,
 let's not speculate more, we already have such mail threads in fpc-dev
 list that continued for months.


 If you do not want automatic conversions, use the RawByteString type.
 Delphi does not do conversions in that case, right?
 Thank you, I'm learning.

 You can bypass the conversion sometimes by using RawByteString but it
 would be rather hackish. Remember, all VCL classes and string
 functions expect UTF-16. I don't want to try what happens if you pass
 them a UTF-8 encoded string using some hack.
 The bottom line is: Use only UTF-16 with Delphi and it works very well.


I have always said here, on the FPC/Lazarus lists, that FPC should never
follow Delphi, but you're making me change my mind about the Unicode
implementation.
OK, no more speculation about how the next FPC will work with Unicode.

 For example, I use PostMessage, SendMessage, PeekMessage... a lot. Are
 these cross-platform? If not, how can I do the same?

 LCL (and VCL) typically use events, like TNotifyEvent. They are
 basically just call-back functions.

Oh, that is not the same. I use events a lot in my core code, not only
with forms or GUI components, but PostMessage is very different: e.g., you call
PostMessage, show a modal form, and the process starts afterwards; the
task code is not inside the form instance, and the form knows
nothing about the task.
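
A possible cross-platform counterpart to this PostMessage pattern is the
LCL's Application.QueueAsyncCall, which defers a method call until the
application processes its message queue. The following is only a sketch,
assuming an LCL project (LCLBase dependency); TTaskRunner and its Run
method are hypothetical names, and the exact moment the queued call fires
relative to the form becoming visible may vary by widgetset:

```pascal
program AsyncCallDemo;
{$mode objfpc}{$H+}

uses
  Interfaces,  // selects the LCL widgetset
  Forms, Dialogs;

type
  // Hypothetical task holder; it knows nothing about the form.
  TTaskRunner = class
    procedure Run(Data: PtrInt);
  end;

procedure TTaskRunner.Run(Data: PtrInt);
begin
  // Placeholder for the real work.
  ShowMessage('Deferred task running');
end;

var
  Runner: TTaskRunner;
  Form: TForm;
begin
  Application.Initialize;
  Runner := TTaskRunner.Create;
  try
    Form := TForm.CreateNew(Application);
    // Queue the call first; it is executed later from the message loop.
    Application.QueueAsyncCall(@Runner.Run, 0);
    // The modal loop pumps messages, so the queued task can start
    // while the form is on screen.
    Form.ShowModal;
  finally
    Runner.Free;
  end;
end.
```

As in the PostMessage version, the task code lives outside the form, and
the form holds no reference to it.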


Marcos Douglas



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-21 Thread Jürgen Hestermann


On 2013-12-21 16:18, Juha Manninen wrote:

The bottom line is: Use only UTF-16 with Delphi and it works very well.


I would not like Lazarus to do the same.
UTF16 is the worst of all possible unicode encodings.





Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-20 Thread Michael Schnell

On 12/20/2013 07:22 AM, Juha Manninen wrote:
When Unicode is mentioned, usually people start to argue about how 
it SHOULD be done.


:-) :-) :-)

Especially because the big boss Delphi does not provide a really good
model to go for. Delphi string handling was designed with UTF-16 in mind
(using another encoding results in bad performance and other problems) and
with no regard for portability at all.


And in spite of that there still are some souls who claim Unicode
support is not a complicated thing :-(


-Michael



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-20 Thread Jy V
On Wed, Dec 18, 2013 at 6:48 AM, Juha Manninen
juha.mannine...@gmail.com wrote:
 What's more, UTF-16 is confusing because it has variations. It is all
 well explained here:
   http://www.utf8everywhere.org/

my experience at: http://www.utf8bootcamp.org/


Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-20 Thread Marcos Douglas
On Fri, Dec 20, 2013 at 3:19 AM, Jürgen Hestermann
juergen.hesterm...@gmx.de wrote:
 On 2013-12-19 23:04, Marcos Douglas wrote:

 Well, the same problem...
 If there is no solution (for now), I prefer using SysToUTF8/UTF8ToSys
 because it is simpler than using the WideString API and converting to
 UnicodeString, UTF8Decode, etc. Don't you think?

 As Bart already mentioned, the ANSI (SYS) interface does *not* support
 Unicode.
 Also, you will not be able to access long paths (longer than MAX_PATH,
 260 characters) when using ANSI API functions.
 Therefore the [W]ide (Unicode) character API functions are a must.

So, these limitations exist in Lazarus too, right?

Marcos Douglas



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-20 Thread Mattias Gaertner
On Fri, 20 Dec 2013 17:55:48 -0200
Marcos Douglas m...@delfire.net wrote:

 On Fri, Dec 20, 2013 at 3:19 AM, Jürgen Hestermann
 juergen.hesterm...@gmx.de wrote:
  On 2013-12-19 23:04, Marcos Douglas wrote:
 
  Well, the same problem...
  If there is no solution (for now), I prefer using SysToUTF8/UTF8ToSys
  because it is simpler than using the WideString API and converting to
  UnicodeString, UTF8Decode, etc. Don't you think?
 
  As Bart already mentioned, the ANSI (SYS) interface does *not* support
  Unicode.
  Also, you will not be able to access long paths (longer than MAX_PATH,
  260 characters) when using ANSI API functions.
  Therefore the [W]ide (Unicode) character API functions are a must.
 
 So, these limitations exist in Lazarus too, right?

The file functions with UTF8 in their name internally use the W functions
under Windows.
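
A minimal usage sketch of such functions, assuming a Lazarus version that
declares them in the LazFileUtils unit of the LazUtils package (older
releases declared them in FileUtil); the file name is a made-up
placeholder:

```pascal
program Utf8FileDemo;
{$mode objfpc}{$H+}

uses
  LazFileUtils;  // assumption: FileExistsUTF8/FileSizeUtf8 are declared here

var
  FileName: string;  // holds UTF-8 text, as is usual in Lazarus code
begin
  // Hypothetical file name; any UTF-8 encoded path is passed the same way.
  FileName := 'example.txt';
  if FileExistsUTF8(FileName) then
    WriteLn('Size in bytes: ', FileSizeUtf8(FileName))
  else
    WriteLn('Not found: ', FileName);
end.
```

The caller never converts to ANSI or UTF-16 itself; the conversion stays
inside the UTF8 functions.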

Mattias



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-20 Thread Marcos Douglas
On Fri, Dec 20, 2013 at 4:22 AM, Juha Manninen
juha.mannine...@gmail.com wrote:
 On Fri, Dec 20, 2013 at 2:47 AM, Marcos Douglas m...@delfire.net wrote:
 Not _only_ to UTF-16? It will depend on the OS?

 No, FPC string will know its encoding and the conversion is made to
 any encoding but only when needed.
 Let's not go deeper into this subject here. The details of future FPC
 are still open and they are not yet documented.
 When Unicode is mentioned, usually people start to argue about how
 it SHOULD be done.
 You can search in fpc-pascal and fpc-dev histories for that.

Ok, you're right.

 For now (2.6.2) it works OK only for AnsiString... I'm talking about
 coding the TStringList class to work with UTF-8 without changing the
 string type of its arguments.

 Again no. TStringList in 2.6.2 works ok for UTF-8 encoded strings, too.
 The same is true for future FPC versions because they are not
 hard-coded for UTF-16 (as Delphi is).

I didn't understand. If I have a TStringList instance, on Windows, I
need to convert Text property to ANSI. But some components, e.g.
TMemo, do these conversions automatically, but this is different.

 With Delphi you would need to copy the whole class, name it
 TUtf8StringList, and replace string with UTF8String.
 This new class must NOT inherit from Classes.TStringList.

 The same here... I think.

 No no no :)
But you are talking about making a new StringList... that is not the proposal.  ;-)

Marcos Douglas



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-20 Thread Marcos Douglas
On Fri, Dec 20, 2013 at 7:16 AM, Michael Schnell mschn...@lumino.de wrote:
 On 12/20/2013 12:46 AM, Juha Manninen wrote:

 Yes, Delphi does that. Future FPC versions will do automatic conversion,
 too, but not only to UTF-16.


 It's a long and winding debate whether or not this is a good idea from a
 technical POV, but as Delphi does this, FPC seems to need to follow.

 In fact there are decent positive aspects.

 But it obviously is a negative aspect if TStringList and such functions are
 implemented using a fixed encoding scheme forcing conversions to and fro
 when e.g. using TStringList as an intermediate store. Here, a generic
 implementation (which Delphi does not provide) would be good. IMHO this is
 doable without losing Delphi compatibility or performance.

+1

That's what I was talking about in my previous mail: using TStringList as an
intermediate store.

Marcos Douglas



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-20 Thread Marcos Douglas
On Fri, Dec 20, 2013 at 8:44 PM, Mattias Gaertner
nc-gaert...@netcologne.de wrote:
 On Fri, 20 Dec 2013 17:55:48 -0200
 Marcos Douglas m...@delfire.net wrote:

 On Fri, Dec 20, 2013 at 3:19 AM, Jürgen Hestermann
 juergen.hesterm...@gmx.de wrote:
  On 2013-12-19 23:04, Marcos Douglas wrote:
 
  Well, the same problem...
  If there is no solution (for now), I prefer using SysToUTF8/UTF8ToSys
  because it is simpler than using the WideString API and converting to
  UnicodeString, UTF8Decode, etc. Don't you think?
 
  As Bart already mentioned, the ANSI (SYS) interface does *not* support
  Unicode.
  Also, you will not be able to access long paths (longer than MAX_PATH,
  260 characters) when using ANSI API functions.
  Therefore the [W]ide (Unicode) character API functions are a must.

 So, these limitations exist in Lazarus too, right?

 The file functions with UTF8 in name use internally the W functions
 under Windows.

I didn't know, thanks.

Marcos Douglas



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-20 Thread Juha Manninen
On Sat, Dec 21, 2013 at 3:08 AM, Marcos Douglas m...@delfire.net wrote:
 I didn't understand. If I have a TStringList instance, on Windows, I
 need to convert Text property to ANSI. But some components, e.g.
 TMemo, do these conversions automatically, but this is different.

TMemo is a GUI component. Then the string encoding matters and it must
be converted to the native widgetset encoding. Still, the conversion
is automatic. You don't need to care about it if you work with LCL
components only and not with WinAPI directly.

TStringList is not a GUI component. It can be used for example in an
embedded Linux program with no GUI.
It does not need to know the encoding (except for sorting maybe). With
FPC no automatic conversions happen.

In Delphi things are different. The auto-conversion happens ALWAYS
when assigning between eg. UTF-8 and UTF-16. It has nothing to do with
WinAPI, or any other widgetset API.
Native string is UTF-16. If you have
  var MyUTF8Str: UTF8String;
  ...
  StringList.Add(MyUTF8Str);    // triggers conversion
  MyUTF8Str := StringList[0];   // triggers conversion again

The amazing thing is that such code works. Delphi does a good job in
converting the strings.
It is also reasonably fast, but still not acceptable in a speed
critical code. This was the problem in my employer's code base. We are
thinking how to use UTF-8 for the core program without triggering many
auto-conversions. One choice is to dump Delphi and use only FPC. Now
the code still works with both.

Juha

P.S.
  I am still wondering why you are so fond of WinAPI while you have a
nice cross-platform API available.



Re: [Lazarus] Lazarus (UTF8) and Windows: SysToUTF8, UTF8ToSys... Is there a better solution?

2013-12-19 Thread Jürgen Hestermann

On 2013-12-18 02:16, Marcos Douglas wrote:
 On Tue, Dec 17, 2013 at 4:15 PM, Jürgen Hestermann
 juergen.hesterm...@gmx.de wrote:
 I am just writing a file manager for Windows (hopefully can port it to Linux
 later)
 and I don't see any performance problems by using UTF8 in my program while
 the API is UTF16.
 Most (if not all) things that I do with files take much longer than the
 string conversion so it does not matter much.
 Ok. But how do you work, using SysToUTF8 / UTF8ToSys?

I use the following:

---
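var X,Path : UTF8String;
    FW : Win32_Find_DataW;
    H : THandle;   // search handle returned by FindFirstFileW

// Convert the UTF-8 path to UTF-16 for the wide API call
H := FindFirstFileW(pwidechar(UTF8Decode(WinAPIPathName(Path))),FW);
...
// Convert the UTF-16 file name found back to UTF-8
X := UTF8Encode(UnicodeString(FW.cFileName));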

where WinAPIPathName just prepends the \\?\ prefix to the pathname to
overcome the MAX_PATH (260 character) length limitation.
Path is the UTF-8 string for the file search, and X holds the file name(s)
found, encoded as UTF-8.
When I later need an API call, I convert back:

---
... 
Windows.DeleteFileW(pwidechar(UTF8Decode(WinAPIPathName(AppendDir(Pfad,X)))));
---
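
The WinAPIPathName helper itself is not shown in this mail. A hypothetical
stand-in, written only from the description above (it prepends \\?\ unless
the prefix is already there), might look like this; the real routine may
well differ:

```pascal
program LongPathDemo;
{$mode objfpc}{$H+}

// Hypothetical reconstruction of the helper described above;
// not the author's actual code.
function WinAPIPathName(const Path: UTF8String): UTF8String;
const
  LongPrefix = '\\?\';
begin
  // Do not add the prefix twice.
  if Copy(Path, 1, Length(LongPrefix)) = LongPrefix then
    Result := Path
  else
    Result := LongPrefix + Path;
end;

begin
  WriteLn(WinAPIPathName('C:\Temp\file.txt'));
end.
```

Note that the \\?\ prefix only works with absolute paths and with the wide
(W) API functions, and UNC paths need the \\?\UNC\ form instead, which this
sketch does not handle.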








