Re: [Lazarus] String vs WideString
On Tue, 15 Aug 2017 21:22:10 +0200, Luca Olivetti via Lazarus wrote:

> (I remarked the "if" because I don't know if that's the case, according
> to Bo Berglund's experience it is)

Just to expand on my "experience" and the reason I posted:

My work on converting the old program started a couple of years ago, when I went from Delphi 2007 (pre-Unicode) to Delphi XE5 because we wanted the GUI to be translatable to non-Western languages. But then all the communications functions (and there are many in this utility application) broke, because they used strings as containers for the inherently binary serial data.

So I followed advice on the Embarcadero forum to switch to AnsiString, because that was really what the old string type was an alias for. I had no great insight into the inner workings of the string handling functions, but I "knew" that AnsiString was a 1-byte-per-element container and (Unicode) string was now a 2-byte-per-element container. The fact that the code could alter the content of the AnsiString did not dawn on me at all. And the comm functions worked fine after the change (I tested a lot, but of course only on my English Win7 computer).

Then some time ago there was a report of a failure of the new program version that only happened in Korea, China and Thailand. In the log files there was a very strange entry about finding an illegal command byte when sending a command to the equipment. It never triggered when I debugged the problem; for me and my colleagues it worked flawlessly. So I had to add more logging, and found that the problem arose when the outgoing command was built. A certain 1-byte command was then expanded to 2 bytes with the wrong first byte!

The commands in the protocol are the first byte of the data of a telegram, and they are in the range $C0..$E9. When one of these (I no longer remember exactly which one) was used in an assignment to the AnsiString buffer, it was converted to $3F + something that was never logged, and the operation failed because the equipment could not decode the command.

So I asked again on the forum and was steered towards RawByteString, because presumably that container would disallow conversions. And when I changed this and sent a new version to the distributor in Korea, the problem was seemingly gone.

Based on this experience I wanted to alert the OP to the fact that using AnsiString instead of string is not a cure-all for binary data; you need to fix the codepage too, which is what RawByteString does for you.

But I have now moved on and replaced all comm-related containers with TBytes, including modifying the serial component we have used (with some help from Remy Lebeau).

--
Bo Berglund
Developer in Sweden

--
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus
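A minimal sketch of the TBytes approach Bo ends up with, assuming a hypothetical telegram layout of one command byte followed by payload bytes (the name BuildTelegram and the payload values are illustrative, not taken from his code). Because the buffer is an array of Byte, no codepage is attached and no conversion can take place:

```pascal
program telegram;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Hypothetical layout: command byte first, then the payload bytes.
function BuildTelegram(Cmd: Byte; const Payload: array of Byte): TBytes;
var
  i: Integer;
begin
  SetLength(Result, 1 + Length(Payload));
  Result[0] := Cmd;                 // stored as-is, no codepage involved
  for i := 0 to High(Payload) do
    Result[1 + i] := Payload[i];
end;

var
  T: TBytes;
  i: Integer;
begin
  // $C0 is the lower end of the command range mentioned in the thread.
  T := BuildTelegram($C0, [$01, $02]);
  // Dump the telegram as hex so the bytes can be checked in a log.
  for i := 0 to High(T) do
    Write(IntToHex(T[i], 2), ' ');
  WriteLn;
end.
```

The hex dump is the kind of logging that would have shown the $3F substitution directly, had the original buffer been logged byte by byte.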
Re: [Lazarus] String vs WideString
On 2017-08-15 20:22, Luca Olivetti via Lazarus wrote:

> Wait a minute, why "abuse"? After all, before code aware strings, an
> ansistring could store any kind of arbitrary data with no problem and no
> conversion, and made it extremely easy

Just listen to what you are saying. A string type, and you want to store all kinds of non-string-related data in that type. How is that not "abuse"???

Use a TBytes, TStream or other binary byte-based storage mechanism. A string type was definitely not the right choice.

Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key: http://tinyurl.com/graeme-pgp
Re: [Lazarus] String vs WideString
On 15/08/17 at 22:08, Luca Olivetti wrote:

> On 15/08/17 at 21:38, Ondrej Pokorny via Lazarus wrote:
>> On 15.08.2017 21:34, Mattias Gaertner via Lazarus wrote:
>>> On Tue, 15 Aug 2017 21:22:10 +0200 Luca Olivetti via Lazarus wrote:
>>>> [...]
>>>> *If* code that worked before (and dare I say without abusing the
>>>> language) suddenly breaks, the bug is in the compiler and not in the
>>>> library.
>>>
>>> ... unless of course the incompatibility is deliberate and documented.
>>> In this case it is.
>>
>> Furthermore, if you use(d) strings for binary data, just replace old
>> string for AnsiString/RawByteString (and Char for AnsiChar, PChar for
>> PAnsiChar) and you are good to go. Annoying but no big deal.
>
> If that's all it's OK then, thank you.

Sorry for the direct reply, it was meant for the list.

Bye
--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010)  Fax +34 93 5883007
Re: [Lazarus] String vs WideString
On 15/08/17 at 21:38, Ondrej Pokorny via Lazarus wrote:

> On 15.08.2017 21:34, Mattias Gaertner via Lazarus wrote:
>> On Tue, 15 Aug 2017 21:22:10 +0200 Luca Olivetti via Lazarus wrote:
>>> [...]
>>> *If* code that worked before (and dare I say without abusing the
>>> language) suddenly breaks, the bug is in the compiler and not in the
>>> library.
>>
>> ... unless of course the incompatibility is deliberate and documented.
>> In this case it is.
>
> Furthermore, if you use(d) strings for binary data, just replace old
> string for AnsiString/RawByteString (and Char for AnsiChar, PChar for
> PAnsiChar) and you are good to go. Annoying but no big deal.

If that's all it's OK then, thank you.

Bye
--
Luca Olivetti
Re: [Lazarus] The new kid is growing up fast
Too bad that Eugene didn't decide to improve Lazarus Cocoa bindings :)

Ondrej
Re: [Lazarus] String vs WideString
On 15.08.2017 21:34, Mattias Gaertner via Lazarus wrote:

> On Tue, 15 Aug 2017 21:22:10 +0200 Luca Olivetti via Lazarus wrote:
>> [...]
>> *If* code that worked before (and dare I say without abusing the
>> language) suddenly breaks, the bug is in the compiler and not in the
>> library.
>
> ... unless of course the incompatibility is deliberate and documented.
> In this case it is.

Furthermore, if you use(d) strings for binary data, just replace old string for AnsiString/RawByteString (and Char for AnsiChar, PChar for PAnsiChar) and you are good to go. Annoying but no big deal.

Ondrej
Re: [Lazarus] String vs WideString
On Tue, 15 Aug 2017 21:22:10 +0200 Luca Olivetti via Lazarus wrote:

> [...]
> *If* code that worked before (and dare I say without abusing the
> language) suddenly breaks, the bug is in the compiler and not in the
> library.

... unless of course the incompatibility is deliberate and documented. In this case it is.

Mattias
Re: [Lazarus] String vs WideString
On 15/08/17 at 21:14, Graeme Geldenhuys via Lazarus wrote:

> On 2017-08-15 18:29, Luca Olivetti via Lazarus wrote:
>> but for 3rd party libraries/components (e.g. synapse comes to mind
>
> Then better start filing bug reports to all those 3rd party libraries
> and components - they have been abusing the system and will silently
> fail. Not to mention that FPC is almost at v3.0.4 and the new string
> changes were introduced in v3.0.0 already.

Wait a minute, why "abuse"? After all, before code aware strings, an ansistring could store any kind of arbitrary data with no problem and no conversion, and made it extremely easy to, e.g., add bytes to a buffer or find and extract data from the same buffer.

*If* code that worked before (and dare I say without abusing the language) suddenly breaks, the bug is in the compiler and not in the library.

(I remarked the "if" because I don't know if that's the case; according to Bo Berglund's experience it is.)

Bye
--
Luca Olivetti
Re: [Lazarus] String vs WideString
On 2017-08-15 18:29, Luca Olivetti via Lazarus wrote:

> but for 3rd party libraries/components (e.g. synapse comes to mind

Then better start filing bug reports to all those 3rd party libraries and components - they have been abusing the system and will silently fail. Not to mention that FPC is almost at v3.0.4 and the new string changes were introduced in v3.0.0 already.

Regards,
  Graeme
Re: [Lazarus] String vs WideString
On 08/15/2017 05:25 AM, Michael Van Canneyt via Lazarus wrote:

> As it is now, FPC offers a way out for all cases:
> WideString/UnicodeString for those that want 2-byte characters.

what if 3 and 4 byte characters are required? will they also work in UnicodeStrings?

i'm looking at this from a linux POV but have been trying to come from the very old school DOS TP stuff using codepages... especially needing to be able to read codepage strings and properly convert all their characters to UTF-8... converting back would be a huge help, too... even with the possible loss of characters requiring replacing them with "?" or something to hold their place and show they didn't convert... that or even leaving them in their 2, 3 or 4 byte form and let those using older codepage stuff see them raw...

--
NOTE: No off-list assistance is given without prior approval.
      *Please keep mailing list traffic on the list unless*
      *a signed and pre-paid contract is in effect with us.*
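A hedged sketch touching both questions above, assuming FPC 3.0 or later. It labels a byte as CP437 with SetCodePage and then converts it to UTF-8, and it shows that a code point outside the Basic Multilingual Plane occupies two 2-byte elements (a surrogate pair) in a UnicodeString. The sample byte and code point are illustrative choices; on Unix the cwstring unit is assumed to be needed so that a conversion backend is installed:

```pascal
program cpconv;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cwstring,{$endif}  // installs a widestring manager on Unix
  SysUtils;
var
  Raw: RawByteString;
  U: UnicodeString;
begin
  // One byte as it might be read from an old DOS file:
  // $81 is "u with diaeresis" in CP437.
  SetLength(Raw, 1);
  Raw[1] := AnsiChar($81);
  SetCodePage(Raw, 437, False);     // label the bytes as CP437, no conversion
  SetCodePage(Raw, CP_UTF8, True);  // convert the content to UTF-8
  WriteLn('UTF-8 byte count: ', Length(Raw));

  // U+1F600 lies outside the BMP, so UTF-16 stores it as a surrogate pair.
  SetLength(U, 2);
  U[1] := WideChar($D83D);          // high surrogate
  U[2] := WideChar($DE00);          // low surrogate
  WriteLn('UTF-16 code units: ', Length(U));
end.
```

Converting back is the same call with the target codepage swapped; what happens to characters that do not exist in the target codepage depends on the conversion backend, so it is worth testing on the platform in question.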
Re: [Lazarus] String vs WideString
On 2017-08-15 10:52, Michael Van Canneyt via Lazarus wrote:

> The only 'problem' is that TStrings uses a single-byte string.

Why can't that be changed to a UnicodeString or UTF8String - after all, the Unicode standard is meant to support all languages. I would have thought that would be an obvious move for a Unicode-aware RTL.

TStrings could also be extended (if it hasn't been already) to keep track of what encoding is read in from file, and what encoding it should produce when lines are extracted - in case those two encodings are not the same.

Regards,
  Graeme
Re: [Lazarus] String vs WideString
On Tue, 15 Aug 2017 16:44:30 +0200 Michael Schnell via Lazarus wrote:

> On 15.08.2017 14:53, Mattias Gaertner via Lazarus wrote:
> > Do you mean a 'char' is a string in your proposal?
> Nope. In my proposal there would be Chars for any statically encoded
> String Type, hence 1, 2, 4, and 8 byte wide. (As regarding statically
> encoded string (and char) brands, it's just an extension of the existing
> paradigm.

8 bytes?

Do you propose a string without the array operator [] ?

Mattias
Re: [Lazarus] String vs WideString
On 15.08.2017 14:53, Mattias Gaertner via Lazarus wrote:

> Do you mean a 'char' is a string in your proposal?

Nope. In my proposal there would be Chars for any statically encoded String type, hence 1, 2, 4, and 8 bytes wide. (Regarding statically encoded string (and char) brands, it's just an extension of the existing paradigm.)

I did not think about the necessity to also have a dynamically encoded Char type. If it is needed, it would (like a string) need the additional fields for encoding number and bytes_per_char, and the appropriate compiler magic to handle them (working like a one-element string).

-Michael
Re: [Lazarus] String vs WideString
On Tue, 15 Aug 2017, Mattias Gaertner via Lazarus wrote:

> On Tue, 15 Aug 2017 14:26:34 +0200 Michael Schnell via Lazarus wrote:
>> On 15.08.2017 11:15, Tony Whyman via Lazarus wrote:
>>> Why shouldn't there be a single char type that intuitively represents
>>> a single character regardless of how many bytes are used to represent it.
>>
>> I suppose by "char" you mean "single printable thingy"; with Unicode it's
>> rather debatable what such a thingy is.
>>
>> Hence a Unicode single char would need to be just a Unicode string.
>
> Do you mean a 'char' is a string in your proposal?

That would be a neat recursive definition :)

Michael.
Re: [Lazarus] String vs WideString
On Tue, 15 Aug 2017 14:26:34 +0200 Michael Schnell via Lazarus wrote:

> On 15.08.2017 11:15, Tony Whyman via Lazarus wrote:
> > Why shouldn't there be a single char type that intuitively represents
> > a single character regardless of how many bytes are used to represent it.
>
> I suppose by "char" you mean "single printable thingy" with Unicode it's
> rather debatable what such a thingy is.
>
> Hence a Unicode singe char would need to be just be a Unicode string.

Do you mean a 'char' is a string in your proposal?

Mattias
Re: [Lazarus] String vs WideString
On 15.08.2017 11:15, Tony Whyman via Lazarus wrote:

> 3. The problem with string handling today is that it is not based on a consistent approach to the character type. If you clean up character handling then the model for string handling should become obvious. A string is after all no more than a container for a character array and which should be constrained to have the same character encoding. A string should intuitively represent a string of text regardless of how many bytes are used to represent each character and with dynamic attributes to tell you how it is encoded.
>
> 4. FPC should clean up Delphi's mess for it. If a unified string type follows a consistent model then it should be possible to make all Delphi string types synonyms. You will need to allow exceptions for legacy programs that insist on manipulating the bytes themselves - but that is not rocket science. There is also the issue of the Windows API and its insistence on Wide Strings - but isn't that why calling conventions such as cdecl and stdcall exist - to tell the compiler when it needs to reformat the call for a given API convention.

see -> http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support

-Michael
Re: [Lazarus] String vs WideString
On 15.08.2017 11:15, Tony Whyman via Lazarus wrote:

> Why shouldn't there be a single char type that intuitively represents a single character regardless of how many bytes are used to represent it.

I suppose by "char" you mean "single printable thingy"; with Unicode it's rather debatable what such a thingy is.

Hence a Unicode single char would need to be just a Unicode string.

-Michael
Re: [Lazarus] String vs WideString
On 8/15/17, Tony Whyman via Lazarus wrote:

> 2. Clean up the char type.
>
> Why shouldn't there be a single char
> type that intuitively represents a single character regardless of
> how many bytes are used to represent it.

You would have to define what "a single character" means in the first place. This is especially important when it involves precomposed characters and combining characters.

Bart
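To make the point about precomposed and combining characters concrete, here is a small sketch (the variable names are illustrative). The same visible "e with acute accent" is stored once as the single code point U+00E9 and once as U+0065 followed by the combining accent U+0301:

```pascal
program combining;
{$mode objfpc}{$H+}
var
  Precomposed, Combined: UnicodeString;
begin
  SetLength(Precomposed, 1);
  Precomposed[1] := WideChar($00E9);  // U+00E9 LATIN SMALL LETTER E WITH ACUTE

  SetLength(Combined, 2);
  Combined[1] := WideChar($0065);     // U+0065 LATIN SMALL LETTER E
  Combined[2] := WideChar($0301);     // U+0301 COMBINING ACUTE ACCENT

  WriteLn(Length(Precomposed));       // 1 element
  WriteLn(Length(Combined));          // 2 elements
end.
```

Both strings render as one character on screen, yet they differ in length and in content, so a "single char" type would have to decide which of the two it represents, or normalize first.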
Re: [Lazarus] The new kid is growing up fast
On 15.08.2017 13:19, Graeme Geldenhuys via Lazarus wrote:

> Just wanted to show you guys something.

Great. CrossVCL seems to make it easy to port Delphi VCL applications to Mac and Linux.

How does it compare to Lazarus?

-Michael
[Lazarus] The new kid is growing up fast
Hi guys,

Just wanted to show you guys something. The new kid on the block is growing up very fast: CrossVCL.

https://www.youtube.com/watch?v=_lr_BQlXvkk

I believe the programmer is the ex-FMX (FireMonkey) developer that was let go by Embarcadero, and he is hitting back with a vengeance. The CrossVCL project has grown from nothing to something in an extremely short time. Speaking as a toolkit designer myself, that is very impressive to see.

Regards,
  Graeme
Re: [Lazarus] String vs WideString
On Tue, 15 Aug 2017, Michael Schnell via Lazarus wrote:

> On 15.08.2017 12:15, Michael Van Canneyt via Lazarus wrote:
>> What does S[2] mean in your proposal ? Is it 1, 2, 4 or even 8 bytes ?
>
> Regarding the users' appreciation, the S[x] notation is decently
> incompatible between the different string types and compiler versions.

Of course not. It's 1 byte for ansistring, 2 bytes for widestring.

The point is that the compiler knows how many bytes it is based on the declaration of S. In your proposal, it is dynamic, if I understand it correctly.

> There were hundreds of complaints in all the appropriate forums and
> mailing lists.

Complaints about what exactly ?

> So not much additional harm can be done, anyway. I suggest that it
> should be according to the character_size definition stored in S, and the
> operation c := S[x] should transfer the appropriate count of bits,
> provided the type of c allows for taking them.

As far as I understand your proposal, this currently cannot be done ? The compiler needs to know the S[X] size at compile time.

Michael.
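The statement that the element size follows from the declaration of S can be checked with a short sketch (the variable names are illustrative). SizeOf is resolved by the compiler from the declared type, so no information stored in the string at run time is consulted:

```pascal
program elemsize;
{$mode objfpc}{$H+}
var
  A: AnsiString;
  U: UnicodeString;
begin
  A := 'abc';
  U := 'abc';
  WriteLn(SizeOf(A[2]));  // 1: an AnsiString element is an AnsiChar
  WriteLn(SizeOf(U[2]));  // 2: a UnicodeString element is a WideChar
end.
```

A string type whose element size is only known at run time could not give a fixed answer here, which is the objection raised above.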
Re: [Lazarus] String vs WideString
On 15.08.2017 12:15, Michael Van Canneyt via Lazarus wrote:

> What does S[2] mean in your proposal ? Is it 1, 2, 4 or even 8 bytes ?

Regarding the users' appreciation, the S[x] notation is decently incompatible between the different string types and compiler versions. There were hundreds of complaints in all the appropriate forums and mailing lists. So not much additional harm can be done, anyway.

I suggest that it should be according to the character_size definition stored in S, and the operation c := S[x] should transfer the appropriate count of bits, provided the type of c allows for taking them. This seems to be compatible with the current implementation of any 1-byte brand and UTF-16.

-Michael
Re: [Lazarus] String vs WideString
On 15.08.2017 12:11, Mattias Gaertner via Lazarus wrote:

> It does not explain what the characters of DynamicString are, does it?

I don't understand what you are asking. The element size and encoding of a Dynamic String ("CP_ANY" in the document) are not predefined, but depend on the content:

http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support
-> Defining String variables and String types:

    CP_ANY = $FF00 // ElementSize dynamically assigned
                   // fully dynamical String for intermediate storing string content
                   // just assigned to the Type or variable, never used in the
                   // "Encoding" field in the string header.

Hence it stores the "branding" when it is assigned to from a string with a fixed branding (such as CP_UTF8), and the content is auto-converted if necessary when assigning from CP_ANY to a fixed branded string variable.

If (in your example) the data is read from a file, a StringList based on CP_ANY strings would keep the encoding/char_size of the data as it is in the file (it would need to somehow get to know the presumed encoding of the file, anyway) and store that information in the EncodingBrandNumber and ElementSize fields (which exist in any "NewString" variable, anyway) of each string read.

If the user assigns an element of the stringlist to a variable with a fixed branding (such as CP_UTF8), the content obviously is auto-converted if necessary, as usual.

In fact I suppose that the current implementation of TStringList does not use new strings to store the data on the heap, but I never said that trying to implement such an idea would not require a lot of work.

-Michael
Re: [Lazarus] String vs WideString
On Tue, 15 Aug 2017, Mattias Gaertner via Lazarus wrote:

> On Tue, 15 Aug 2017 12:02:28 +0200 Michael Schnell via Lazarus wrote:
>> On 15.08.2017 11:52, Michael Van Canneyt via Lazarus wrote:
>>> This cannot be solved properly except by duplicating the classes unit.
>>
>> Sorry to disagree, but IMHO this can only be solved properly by defining
>> an additional fully dynamically encoded string type and use same for
>> TStrings (see ->
>> http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support )
>
> It does not explain what the characters of DynamicString are, does it?

I was just going to write that. The problem of the element size is circumvented by simply not digging into it.

What does S[2] mean in your proposal ? Is it 1, 2, 4 or even 8 bytes ?

Michael.
Re: [Lazarus] String vs WideString
On Tue, 15 Aug 2017 12:02:28 +0200 Michael Schnell via Lazarus wrote:

> On 15.08.2017 11:52, Michael Van Canneyt via Lazarus wrote:
> > This cannot be solved properly except by duplicating the classes unit.
>
> Sorry to disagree, but IMHO this can only be solved properly by defining
> an additional fully dynamically encoded string type and use same for
> TStrings (see ->
> http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support )

It does not explain what the characters of DynamicString are, does it?

Mattias
Re: [Lazarus] String vs WideString
On 15.08.2017 11:52, Michael Van Canneyt via Lazarus wrote:

> This cannot be solved properly except by duplicating the classes unit.

Sorry to disagree, but IMHO this can only be solved properly by defining an additional fully dynamically encoded string type and using it for TStrings (see -> http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support ).

But I am perfectly aware that implementing this would be a huge effort (see other mail here), and nobody is entitled to ask for this. (I wrote the article just to elaborate on what was discussed in the fpc mailing list at that time.)

-Michael
Re: [Lazarus] String vs WideString
On 15.08.2017 11:15, Tony Whyman via Lazarus wrote:

> In this case, I would argue that both are true.

And the culprit obviously is Embarcadero and not the fpc or the Lazarus team, who did their best to produce a compatible implementation that is really workable on the multiple supported platforms (which E$ did not feel necessary when they released the encoding-aware strings).

Maybe a better solution can be found, but who would want to nudge the fpc / Lazarus developers to invest a huge amount of time to create it and then make sure it is decently tested and stable ?

-Michael
Re: [Lazarus] String vs WideString
On Tue, 15 Aug 2017, Michael Schnell via Lazarus wrote:

> On 15.08.2017 11:25, Michael Van Canneyt via Lazarus wrote:
>> WideString/UnicodeString for those that want 2-byte characters.
>> A codepage-aware single-byte string for those that want 1-byte characters.
>> The shortstring is even still available.
>
> IM (often stated) O, this does not help as long as TStrings does not
> support, without forced auto-conversion, the string type the user is
> inclined to choose.

Please check TStrings in trunk. This exists.

    procedure LoadFromFile(const FileName: string; AEncoding: TEncoding); overload; virtual;
    procedure LoadFromStream(Stream: TStream; AEncoding: TEncoding); overload; virtual;

The only 'problem' is that TStrings uses a single-byte string.

This cannot be solved properly except by duplicating the classes unit.

Michael.
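A usage sketch for the LoadFromFile overload quoted above, assuming an FPC version whose Classes unit already ships it (trunk at the time of the thread). The program name and the choice of TEncoding.UTF8 are illustrative; the file name is taken from the command line:

```pascal
program loadenc;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cwstring,{$endif}  // conversion backend on Unix
  Classes, SysUtils;
var
  L: TStringList;
begin
  if ParamCount < 1 then
  begin
    WriteLn('usage: loadenc FILE');
    Halt(1);
  end;
  L := TStringList.Create;
  try
    // The encoding argument states how the bytes in the file are encoded.
    L.LoadFromFile(ParamStr(1), TEncoding.UTF8);
    WriteLn(L.Count, ' lines read');
  finally
    L.Free;
  end;
end.
```

The lines end up in the single-byte string type that TStrings uses internally, which is the limitation discussed in this message.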
Re: [Lazarus] String vs WideString
On 15.08.2017 11:25, Michael Van Canneyt via Lazarus wrote:

> WideString/UnicodeString for those that want 2-byte characters.
> A codepage-aware single-byte string for those that want 1-byte characters.
> The shortstring is even still available.

IM (often stated) O, this does not help as long as TStrings does not support, without forced auto-conversion, the string type the user is inclined to choose.

This obviously requires an (additional) fully dynamic string brand. This (again obviously) is not the "Embarcadero way", but supposedly does not necessarily lead to incompatibility regarding the user code.

-Michael
Re: [Lazarus] String vs WideString
On Tue, 15 Aug 2017, Mattias Gaertner via Lazarus wrote:

> On Mon, 14 Aug 2017 18:47:58 +0200 Sven Barth via Lazarus wrote:
>> [...]
>> The main problem of such a dynamic type would be the inability to do fast
>> indexing as the compiler would need to insert runtime checks for the size
>> of a character. I had already thought the same, but then had to discard the
>> idea due to this.
>
> IMHO the main problem of adding a new string type is
> https://xkcd.com/927/

Exactly. I don't think we should add even more.

As it is now, FPC offers a way out for all cases:

WideString/UnicodeString for those that want 2-byte characters.
A codepage-aware single-byte string for those that want 1-byte characters.
The shortstring is even still available.

Attempting to store binary data in a string is not advisable. Dynamic arrays, TBytes and - in the worst case - TBytesStream are powerful enough to cover most use-cases in this area.

Michael.
Re: [Lazarus] String vs WideString
You can count me as a "like" on that one.

On 15/08/17 10:13, Mattias Gaertner via Lazarus wrote:

> IMHO the main problem of adding a new string type is
> https://xkcd.com/927/
Re: [Lazarus] String vs WideString
On 14/08/17 22:01, Juha Manninen via Lazarus wrote:

> Tony Whyman, this issue has been discussed again and again for the past 10+ years first in FPC mailing lists and then in Lazarus lists. The current Unicode support in Lazarus works f***ing well and is amazingly compatible with Delphi. WinAPI parameters may require an explicit temporary UnicodeString variable but even then the code is compatible with Delphi.
>
> Tony Whyman, Marcos Douglas and Michael Schnell, please study the facts. For starters, this is about the current Unicode support in Lazarus:
> http://wiki.freepascal.org/Unicode_Support_in_Lazarus
> I think the dynamic encoding and automatic conversion now work perfectly well. If you have a piece of code where it does not work, please ask for detailed info.

If a topic keeps on being discussed after 10+ years of argument, the reason is usually either (a) the problem and its solution have not been documented properly, or (b) the outcome is an unsatisfactory compromise. In this case, I would argue that both are true.

I went back and read the wiki article you mentioned and was none the wiser as to why the current mess exists. Is it really no more than because Delphi continues to screw up in this area, so must FPC? The body of the article appears to be a set of notes - not necessarily wrong in themselves, but lacking the background and context needed to explain why it is like it is.

This problem will keep coming up until it is fixed properly and, by that, I mean that the solution is consistent, intuitively understandable and well documented. Windows eccentricities also need to be kept to Windows.

Here is my wish list:

1. Stop using the term "Unicode". It is too ambiguous. It is used both as an all-embracing term for multi-byte encoding and as a synonym for UTF16, and that is really too confusing. The problem is made worse by having UnicodeString as a two-byte-wide string type in both FPC and Delphi.

2. Clean up the char type. When Wirth created the "char" type in Pascal it was a simple ASCII or EBCDIC character. There are now seven different char types (including type equivalence) with no guidelines on when each is applicable. This is too many. Why shouldn't there be a single char type that intuitively represents a single character regardless of how many bytes are used to represent it. Yes, in a world where we have to live with UTF8, UTF16, UTF32, legacy code pages and Chinese variations on UTF8, that means that dynamic attributes have to be included in the type. But isn't that the only way to have consistent and intuitive character handling?

3. The problem with string handling today is that it is not based on a consistent approach to the character type. If you clean up character handling then the model for string handling should become obvious. A string is after all no more than a container for a character array, which should be constrained to have the same character encoding. A string should intuitively represent a string of text regardless of how many bytes are used to represent each character, with dynamic attributes to tell you how it is encoded.

4. FPC should clean up Delphi's mess for it. If a unified string type follows a consistent model then it should be possible to make all Delphi string types synonyms. You will need to allow exceptions for legacy programs that insist on manipulating the bytes themselves - but that is not rocket science. There is also the issue of the Windows API and its insistence on Wide Strings - but isn't that why calling conventions such as cdecl and stdcall exist - to tell the compiler when it needs to reformat the call for a given API convention.

Tony Whyman
Re: [Lazarus] String vs WideString
On Mon, 14 Aug 2017 18:47:58 +0200 Sven Barth via Lazarus wrote:

> [...]
> The main problem of such a dynamic type would be the inability to do fast
> indexing as the compiler would need to insert runtime checks for the size
> of a character. I had already thought the same, but then had to discard the
> idea due to this.

IMHO the main problem of adding a new string type is
https://xkcd.com/927/

Mattias
Re: [Lazarus] String vs WideString
On Sat, 12 Aug 2017 17:56:58 -0300 "Marcos Douglas B. Santos via Lazarus" wrote:

> [...]
> > Which one? Do you mean Windows CP-1252?
>
> Yes...
> But would it make any difference?

Just

> >> [...]
> >> Warning: Implicit string type conversion from "AnsiString" to "WideString"
> >
> > Explicit type cast:
> >
> > Lib.SetLicense(
> >   WideString(IniFile.ReadString('TheLib', 'license', ''))
> > );
>
> Wow... everywhere? :(

You could instead define an overloaded Lib.SetLicense(AnsiString).

Or you could disable this hint altogether for your project (not recommended). Select the message in the Messages window. Right click and click on add -vm

Mattias
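A sketch of the overload suggestion, assuming the library object can be extended or wrapped. TLib below is a hypothetical stand-in for the third-party "Lib" in the thread, and its WideString method only prints the length. The AnsiString overload keeps the one explicit conversion in a single place, so the call sites compile without the implicit-conversion warning:

```pascal
program overloaddemo;
{$mode objfpc}{$H+}
type
  // Hypothetical stand-in for the third-party object discussed above.
  TLib = class
  public
    procedure SetLicense(const S: WideString); overload;
    procedure SetLicense(const S: AnsiString); overload;
  end;

procedure TLib.SetLicense(const S: WideString);
begin
  WriteLn('license has ', Length(S), ' UTF-16 code units');
end;

procedure TLib.SetLicense(const S: AnsiString);
begin
  // The explicit cast lives here instead of at every call site.
  SetLicense(WideString(S));
end;

var
  Lib: TLib;
  Key: AnsiString;
begin
  Lib := TLib.Create;
  try
    Key := 'ABC-123';        // illustrative value
    Lib.SetLicense(Key);     // resolves to the AnsiString overload
  finally
    Lib.Free;
  end;
end.
```

If the library class cannot be modified, a small wrapper procedure with the same body serves the same purpose.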
Re: [Lazarus] String vs WideString
On 14/08/17 17:47, Sven Barth via Lazarus wrote:

> The main problem of such a dynamic type would be the inability to do fast indexing as the compiler would need to insert runtime checks for the size of a character. I had already thought the same, but then had to discard the idea due to this.

Is this really a big problem?

It is not as if it would be necessary to do a table lookup every time you index a string, as the indexing method could be an attribute of the string and updated with the character encoding attribute. Is it really that complicated for the compiler to generate code that jumps to an indexing method depending upon a data attribute?

Is your problem really more about the result type as, depending on the character width, the result could be an AnsiChar or WideChar or a UTF8 character for which I don't believe there is a defined char type (other than an arguable mis-use of UCS4Char)?

I can accept that a clean-up of this area would also have to extend to the char types - but I would also argue that that is well overdue. On a quick count, I found 7 different char types in the system unit.
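To illustrate what the runtime check under discussion could look like, here is a sketch with a hypothetical record TDynStr that carries its element size as data. Neither the type nor ElementAt exists in FPC; they only model the idea, and the result type is widened to LongWord because the element width is not known when the code is compiled:

```pascal
program dynindex;
{$mode objfpc}{$H+}
type
  // Hypothetical layout, not an existing FPC string type.
  TDynStr = record
    ElementSize: Byte;     // 1, 2 or 4 bytes per element
    Data: array of Byte;   // raw element storage
  end;

// Zero-based element access; the case statement is the runtime check.
function ElementAt(const S: TDynStr; Index: SizeInt): LongWord;
begin
  case S.ElementSize of
    1: Result := S.Data[Index];
    2: Result := PWord(@S.Data[Index * 2])^;
    4: Result := PLongWord(@S.Data[Index * 4])^;
  else
    Result := 0;           // unknown element size
  end;
end;

var
  S: TDynStr;
begin
  S.ElementSize := 2;
  SetLength(S.Data, 4);
  S.Data[0] := $41; S.Data[1] := $00;  // first 2-byte element
  S.Data[2] := $42; S.Data[3] := $00;  // second 2-byte element
  WriteLn(ElementAt(S, 1));
end.
```

With AnsiString or UnicodeString the compiler emits a fixed-width load for S[i]; here every access first branches on ElementSize. Whether that cost matters in practice is exactly what the two posters disagree about.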
Re: [Lazarus] String vs WideString
On 14.08.2017 18:47, Sven Barth via Lazarus wrote:

> The main problem of such a dynamic type would be the inability to do fast indexing as the compiler would need to insert runtime checks for the size of a character.

What "indexing" are you thinking of ? Could you give an example where such a difference is supposed to become important ?

(As you know I wrote a paper where I claimed the contrary. I'd like to revise it if necessary.)

-Michael
Re: [Lazarus] String vs WideString
On 14.08.2017 18:49, Sven Barth via Lazarus wrote:

> Because the crowd demanding Delphi compatibility is larger than the crowd demanding exact terminology.

... or even a revised concept avoiding the junk presented by Embarcadero :(

But obviously the fpc team has no choice.

-Michael