Re: [fpc-devel] Unit for handling UTF-8 strings
On 04/08/2013 07:02 PM, Mattias Gaertner wrote:
> I guess, you mean encoded string types.

AFAIK, you can just create string variables of the appropriate coding type and an assignment will do auto-conversion.

-Michael
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
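A minimal sketch of the assignment-based auto-conversion mentioned above. It assumes FPC 3.0 or later, where AnsiString types carry a code page, and it assumes that on Unix a widestring manager such as cwstring is needed for the conversion to actually take place. The UTF-8 bytes are built by hand so that the result does not depend on the encoding of the source file.

```pascal
program AutoConvDemo;
{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring; // assumption: installs a widestring manager on Unix
{$endif}

var
  U8: UTF8String;
  U16: UnicodeString;
begin
  // UTF-8 encoding of U+00E9 (two bytes: C3 A9).
  SetLength(U8, 2);
  U8[1] := #$C3;
  U8[2] := #$A9;

  // Assignment between differently encoded string types: the compiler
  // inserts the UTF-8 -> UTF-16 conversion at this point.
  U16 := U8;

  WriteLn('UTF-8 bytes:       ', Length(U8));
  WriteLn('UTF-16 code units: ', Length(U16));
end.
```

With a working conversion the second count should be smaller than the first, since the two UTF-8 bytes collapse into one UTF-16 code unit.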
Re: [fpc-devel] Unit for handling UTF-8 strings
On Tue, 09 Apr 2013 08:24:11 +0200 Michael Schnell mschn...@lumino.de wrote:
> On 04/08/2013 07:02 PM, Mattias Gaertner wrote:
>> I guess, you mean encoded string types.
>
> AFAIK, you can just create string variables of the appropriate coding type and an assignment will do auto-conversion.

Yes. But how do you examine the characters?
If I understand Michael right, there will be some implicit functions for that. I wonder how they work.

Mattias
Re: [fpc-devel] Unit for handling UTF-8 strings
On Tue, 9 Apr 2013, Mattias Gaertner wrote:
> On Tue, 09 Apr 2013 08:24:11 +0200 Michael Schnell mschn...@lumino.de wrote:
>> On 04/08/2013 07:02 PM, Mattias Gaertner wrote:
>>> I guess, you mean encoded string types.
>>
>> AFAIK, you can just create string variables of the appropriate coding type and an assignment will do auto-conversion.
>
> Yes. But how do you examine the characters?
> If I understand Michael right, there will be some implicit functions for that. I wonder how they work.

See the character unit:

  // flat functions
  function ConvertFromUtf32(AChar : UCS4Char) : UnicodeString;
  function ConvertToUtf32(const AString : UnicodeString; AIndex : Integer) : UCS4Char; overload;
  function ConvertToUtf32(const AString : UnicodeString; AIndex : Integer; out ACharLength : Integer) : UCS4Char; overload;
  function ConvertToUtf32(const AHighSurrogate, ALowSurrogate : UnicodeChar) : UCS4Char; overload;
  function GetNumericValue(AChar : UnicodeChar) : Double; overload;
  function GetNumericValue(const AString : UnicodeString; AIndex : Integer) : Double; overload;
  function GetUnicodeCategory(AChar : UnicodeChar) : TUnicodeCategory; overload;
  function GetUnicodeCategory(const AString : UnicodeString; AIndex : Integer) : TUnicodeCategory; overload;
  function IsControl(AChar : UnicodeChar) : Boolean; overload;
  function IsControl(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
  function IsDigit(AChar : UnicodeChar) : Boolean; overload;
  function IsDigit(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
  function IsSurrogate(AChar : UnicodeChar) : Boolean; overload;
  function IsSurrogate(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
  function IsHighSurrogate(AChar : UnicodeChar) : Boolean; overload;
  function IsHighSurrogate(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
  function IsLowSurrogate(AChar : UnicodeChar) : Boolean; overload;
  function IsLowSurrogate(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
  function IsSurrogatePair(const AHighSurrogate, ALowSurrogate : UnicodeChar) : Boolean; overload;
  function IsSurrogatePair(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
  function IsLetter(AChar : UnicodeChar) : Boolean; overload;
  function IsLetter(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
  function IsLetterOrDigit(AChar : UnicodeChar) : Boolean; overload;
  function IsLetterOrDigit(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
  function IsLower(AChar : UnicodeChar) : Boolean; overload;
  function IsLower(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
  function IsNumber(AChar : UnicodeChar) : Boolean; overload;
  function IsNumber(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
  function IsPunctuation(AChar : UnicodeChar) : Boolean; overload;
  function IsPunctuation(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
  function IsSeparator(AChar : UnicodeChar) : Boolean; overload;
  function IsSeparator(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
  function IsSymbol(AChar : UnicodeChar) : Boolean; overload;
  function IsSymbol(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
  function IsUpper(AChar : UnicodeChar) : Boolean; overload;
  function IsUpper(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
  function IsWhiteSpace(AChar : UnicodeChar) : Boolean; overload;
  function IsWhiteSpace(const AString : UnicodeString; AIndex : Integer) : Boolean; overload;
  function ToLower(AChar : UnicodeChar) : UnicodeChar; overload;
  function ToLower(const AString : UnicodeString) : UnicodeString; overload;
  function ToUpper(AChar : UnicodeChar) : UnicodeChar; overload;
  function ToUpper(const AString : UnicodeString) : UnicodeString; overload;
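As an illustration of how the listed functions could be used to examine characters, here is a small sketch that walks a UnicodeString one code point at a time. It assumes the character unit exposes the flat functions exactly as declared in the listing above and that AIndex is 1-based; the test string is built by hand from 'a' and the surrogate pair for U+1F600.

```pascal
program CodePointWalk;
{$mode objfpc}{$H+}

uses
  character;

var
  S: UnicodeString;
  I, Len: Integer;
  CP: UCS4Char;
begin
  // 'a' followed by U+1F600 stored as a surrogate pair (D83D DE00).
  SetLength(S, 3);
  S[1] := 'a';
  S[2] := UnicodeChar($D83D);
  S[3] := UnicodeChar($DE00);

  I := 1;
  while I <= Length(S) do
  begin
    // ACharLength (Len) reports how many UTF-16 code units were consumed.
    CP := ConvertToUtf32(S, I, Len);
    WriteLn('U+', HexStr(LongInt(CP), 6), ' uses ', Len, ' code unit(s)');
    Inc(I, Len);
  end;
end.
```

Advancing by the reported length instead of by 1 is what keeps the loop from splitting a surrogate pair.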
Re: [fpc-devel] Unit for handling UTF-8 strings
On Tue, 9 Apr 2013 08:55:15 +0200 (CEST) Michael Van Canneyt mich...@freepascal.org wrote:
> On Tue, 9 Apr 2013, Mattias Gaertner wrote:
>> On Tue, 09 Apr 2013 08:24:11 +0200 Michael Schnell mschn...@lumino.de wrote:
>>> On 04/08/2013 07:02 PM, Mattias Gaertner wrote:
>>>> I guess, you mean encoded string types.
>>>
>>> AFAIK, you can just create string variables of the appropriate coding type and an assignment will do auto-conversion.
>>
>> Yes. But how do you examine the characters?
>> If I understand Michael right, there will be some implicit functions for that. I wonder how they work.
>
> See the character unit:

Nice!
Why do you call them implicit calls?
Will there be UTF-8 functions too or do you have to convert to UnicodeString?
Will there be PUnicodeChar functions too?

> // flat functions
> function ConvertFromUtf32(AChar : UCS4Char) : UnicodeString;
> function ConvertToUtf32(const AString : UnicodeString; AIndex : Integer) : UCS4Char; overload;
> function ConvertToUtf32(const AString : UnicodeString; AIndex : Integer; out ACharLength : Integer) : UCS4Char; overload;
> function ConvertToUtf32(const AHighSurrogate, ALowSurrogate : UnicodeChar) : UCS4Char; overload;
> [...]

Mattias
Re: [fpc-devel] Unit for handling UTF-8 strings
09.04.2013 15:13, Mattias Gaertner wrote:
> Will there be UTF-8 functions too or do you have to convert to UnicodeString?

At the moment TCharacter contains the methods that Delphi's TCharacter has. If there is demand we will add UTF8 overloads.

> Will there be PUnicodeChar functions too?
>
> // flat functions
> function ConvertFromUtf32(AChar : UCS4Char) : UnicodeString;
> function ConvertToUtf32(const AString : UnicodeString; AIndex : Integer) : UCS4Char; overload;
> function ConvertToUtf32(const AString : UnicodeString; AIndex : Integer; out ACharLength : Integer) : UCS4Char; overload;
> function ConvertToUtf32(const AHighSurrogate, ALowSurrogate : UnicodeChar) : UCS4Char; overload;

Best regards,
Paul Ishenin
Re: [fpc-devel] Unit for handling UTF-8 strings
On 04/09/2013 08:49 AM, Mattias Gaertner wrote:
> But how do you examine the characters?

Even defining what a character is, is extremely problematic with any use of Unicode: a printable character can be assembled from several of the (more than a million) Unicode code points; a single code point is represented by 1, 2, 3, or 4 bytes in UTF-8 and by 2 or 4 bytes in UTF-16; and for UTF-16 the order of those bytes depends on the CPU architecture and/or on the file the string is imported from and the way it is imported.

This of course is not a problem introduced by fpc, but the perfectly normal complexity of Unicode.

> If I understand Michael right, there will be some implicit functions for that. I wonder how they work.

This is what Delphi compatibility dictated. (You might read the Delphi XE docs on how to write Unicode-enabled Delphi source.)

I do hope fpc avoids some of the quirks Delphi introduces and offers some useful additional features, e.g. dedicated string types such as unencoded (raw, never auto-converted) Byte, Word and DWord strings, and a flexible encoded string type that inherits the encoding scheme from the source string when doing an assignment or when being passed as a function parameter, doing auto-conversion whenever dynamically necessary.

-Michael
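To make the size question concrete, the following sketch computes how many bytes UTF-8 and how many 16-bit code units UTF-16 need for a given code point. The function names Utf8Size and Utf16Units are made up for this illustration and are not RTL functions. The last two lines of output belong to 'e' followed by the combining acute accent U+0301, which a reader sees as a single character although it consists of two code points.

```pascal
program EncodedSizes;
{$mode objfpc}{$H+}

// Number of bytes UTF-8 needs for one code point (0..$10FFFF).
function Utf8Size(CP: Cardinal): Integer;
begin
  if CP < $80 then Result := 1
  else if CP < $800 then Result := 2
  else if CP < $10000 then Result := 3
  else Result := 4;
end;

// Number of 16-bit code units UTF-16 needs for one code point.
function Utf16Units(CP: Cardinal): Integer;
begin
  if CP < $10000 then Result := 1
  else Result := 2; // surrogate pair
end;

procedure Show(CP: Cardinal);
begin
  WriteLn('U+', HexStr(LongInt(CP), 6), ': ', Utf8Size(CP),
    ' UTF-8 byte(s), ', Utf16Units(CP), ' UTF-16 unit(s)');
end;

begin
  Show($41);     // 1 byte, 1 unit
  Show($E9);     // 2 bytes, 1 unit
  Show($20AC);   // 3 bytes, 1 unit
  Show($1F600);  // 4 bytes, 2 units
  Show($65);     // 'e' ...
  Show($301);    // ... plus combining acute: one visible character
end.
```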
Re: [fpc-devel] Unit for handling UTF-8 strings
Paul Ishenin i...@kmiac.ru wrote on 9 April 2013 at 09:20:
> 09.04.2013 15:13, Mattias Gaertner wrote:
>> Will there be UTF-8 functions too or do you have to convert to UnicodeString?
>
> At the moment TCharacter contains methods which delphi TCharacter has. If there is demand we will add UTF8 overloads.

Demand+=1

>> Will there be PUnicodeChar functions too?

Well?

Mattias
Re: [fpc-devel] Unit for handling UTF-8 strings
On 09.04.2013 10:30, Mattias Gaertner wrote:
> Paul Ishenin i...@kmiac.ru wrote on 9 April 2013 at 09:20:
>> 09.04.2013 15:13, Mattias Gaertner wrote:
>>> Will there be UTF-8 functions too or do you have to convert to UnicodeString?
>>
>> At the moment TCharacter contains methods which delphi TCharacter has. If there is demand we will add UTF8 overloads.
>
> Demand+=1

(1,8) Error: Illegal expression
(1,9) Error: Illegal expression
(1,9) Fatal: Syntax error, ; expected but ordinal const found

(Sorry, had to be said :P )

Regards,
Sven
Re: [fpc-devel] Unit for handling UTF-8 strings
09.04.2013 17:10, Sven Barth wrote:
>> Demand+=1
>
> (1,8) Error: Illegal expression
> (1,9) Error: Illegal expression
> (1,9) Fatal: Syntax error, ; expected but ordinal const found
>
> (Sorry, had to be said :P )

Also, the Patches variable seems to be equal to zero. And assigning Demand without assigning Patches has almost no effect :)

Best regards,
Paul Ishenin
Re: [fpc-devel] Unit for handling UTF-8 strings
Paul Ishenin i...@kmiac.ru wrote on 9 April 2013 at 11:23:
> 09.04.2013 17:10, Sven Barth wrote:
>>> Demand+=1
>>
>> (1,8) Error: Illegal expression
>> (1,9) Error: Illegal expression
>> (1,9) Fatal: Syntax error, ; expected but ordinal const found
>>
>> (Sorry, had to be said :P )
>
> Also, Patches variable seems to be equal to zero. And assigning Demand without assigning Patches has almost no effect :)

Creating a patch is not hard. The lazutf8 unit already contains the code. But I have no idea what the interface should look like.
TCharacter is a Delphi class and Delphi does not have UTF-8 functions.
Michael wrote that these functions are implicit, so maybe these functions need to fit some form?
In other words: are there any suggestions or recommendations for what the UTF-8 functions should look like?

Mattias
Re: [fpc-devel] Unit for handling UTF-8 strings
09.04.2013 18:09, Mattias Gaertner wrote:
> Creating a patch is not hard. The lazutf8 already contains the code. But I have no idea how the the interface should look like.
> TCharacter is a Delphi class and Delphi does not have UTF-8 functions.
> Michael wrote that these functions are implicit, so maybe these functions need to fit some form?
> In other words: Are there any suggestions, recommendations how the UTF-8 functions should look like?

Let's see.

The next function should stay as is. The compiler will add the required implicit conversion when you assign the result to a UTF8String variable.

  function ConvertFromUtf32(AChar : UCS4Char) : UnicodeString;

Here you can add UTF8String overloads if needed:

  function ConvertToUtf32(const AString : UnicodeString; AIndex : Integer) : UCS4Char; overload;
  function ConvertToUtf32(const AString : UnicodeString; AIndex : Integer; out ACharLength : Integer) : UCS4Char; overload;

At the same time, even without UTF8 overloads the compiler will insert an implicit conversion from UTF8String to UnicodeString when you pass it to those functions. So UTF8 overloads can only increase speed by removing one implicit conversion.

Best regards,
Paul Ishenin
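A sketch of what such a UTF8String overload might look like. This is a hypothetical function written for illustration, not code from the character unit or from lazutf8; it mirrors the three-parameter ConvertToUtf32 signature but counts ACharLength in bytes. Malformed or truncated input is answered with U+FFFD and a length of 1, which is an arbitrary choice made here, and overlong forms are not rejected.

```pascal
program Utf8OverloadSketch;
{$mode objfpc}{$H+}

// Hypothetical UTF-8 counterpart of ConvertToUtf32: decodes the code
// point that starts at byte AIndex (1-based) and reports its length in
// bytes. Assumes 1 <= AIndex <= Length(AString).
function ConvertToUtf32(const AString: UTF8String; AIndex: Integer;
  out ACharLength: Integer): UCS4Char;
var
  B: Byte;
  K: Integer;
begin
  B := Ord(AString[AIndex]);
  if B < $80 then begin ACharLength := 1; Result := B; end
  else if (B and $E0) = $C0 then begin ACharLength := 2; Result := B and $1F; end
  else if (B and $F0) = $E0 then begin ACharLength := 3; Result := B and $0F; end
  else if (B and $F8) = $F0 then begin ACharLength := 4; Result := B and $07; end
  else begin ACharLength := 1; Exit($FFFD); end; // stray continuation byte

  // Sequence runs past the end of the string.
  if AIndex + ACharLength - 1 > Length(AString) then
  begin
    ACharLength := 1;
    Exit($FFFD);
  end;

  // Fold in the six payload bits of each continuation byte.
  for K := 1 to ACharLength - 1 do
  begin
    B := Ord(AString[AIndex + K]);
    if (B and $C0) <> $80 then
    begin
      ACharLength := 1;
      Exit($FFFD);
    end;
    Result := (Result shl 6) or (B and $3F);
  end;
end;

var
  S: UTF8String;
  Len: Integer;
  CP: UCS4Char;
begin
  SetLength(S, 2);
  S[1] := #$C3;
  S[2] := #$A9;
  CP := ConvertToUtf32(S, 1, Len);
  WriteLn('U+', HexStr(LongInt(CP), 4), ', ', Len, ' byte(s)'); // U+00E9, 2 byte(s)
end.
```

Working directly on the UTF-8 bytes is what saves the implicit conversion to UnicodeString mentioned in the message.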
Re: [fpc-devel] Unit for handling UTF-8 strings
On Sun, 7 Apr 2013 20:18:51 +0200 (CEST) Michael Van Canneyt mich...@freepascal.org wrote:
> [...]
> FPC is preparing for a complete unicode solution, with proper language support.

Great. I guess, you mean encoded string types.
But even then, FPC should contain UTF-8 and UTF-16 functions.

> At best, these units are a temporary solution.

You might be right about the UTF-8 classes like TFileStreamUTF8 (I really hope). But I doubt that about basic UTF-8/16 functions.

Mattias
Re: [fpc-devel] Unit for handling UTF-8 strings
On Mon, 8 Apr 2013, Mattias Gaertner wrote:
> On Sun, 7 Apr 2013 20:18:51 +0200 (CEST) Michael Van Canneyt mich...@freepascal.org wrote:
>> [...]
>> FPC is preparing for a complete unicode solution, with proper language support.
>
> Great. I guess, you mean encoded string types.
> But even then, FPC should contain UTF-8 and UTF-16 functions.

Why? The necessary functionality will be implicit in the various calls.

>> At best, these units are a temporary solution.
>
> You might be right about the UTF-8 classes like TFileStreamUTF8 (I really hope). But I doubt that about basic UTF-8/16 functions.

Such as?

Michael.
Re: [fpc-devel] Unit for handling UTF-8 strings
On Mon, 8 Apr 2013 23:13:17 +0200 (CEST) Michael Van Canneyt mich...@freepascal.org wrote:
> On Mon, 8 Apr 2013, Mattias Gaertner wrote:
>> On Sun, 7 Apr 2013 20:18:51 +0200 (CEST) Michael Van Canneyt mich...@freepascal.org wrote:
>>> [...]
>>> FPC is preparing for a complete unicode solution, with proper language support.
>>
>> Great. I guess, you mean encoded string types.
>> But even then, FPC should contain UTF-8 and UTF-16 functions.
>
> Why ? The necessary functionality will be implicit in the various calls.

What calls?

>>> At best, these units are a temporary solution.
>>
>> You might be right about the UTF-8 classes like TFileStreamUTF8 (I really hope). But I doubt that about basic UTF-8/16 functions.
>
> Such as ?

Functions like determining the number of bytes of a UTF-8 character or checking whether it is a valid character.

Mattias
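The two basic functions named above can be sketched in a few lines. Utf8SeqLen and IsValidUtf8 are names invented for this example and are not existing RTL or lazutf8 functions. The validity check follows the well-formed byte ranges of the Unicode standard: it rejects stray continuation bytes, truncated sequences, overlong forms, encoded surrogates and values above U+10FFFF.

```pascal
program Utf8Basics;
{$mode objfpc}{$H+}

// Length in bytes of the sequence introduced by Lead,
// or 0 if Lead cannot start a well-formed sequence.
function Utf8SeqLen(Lead: Byte): Integer;
begin
  case Lead of
    $00..$7F: Result := 1;
    $C2..$DF: Result := 2;
    $E0..$EF: Result := 3;
    $F0..$F4: Result := 4;
  else
    Result := 0; // continuation byte, C0/C1 or F5..FF
  end;
end;

// True if S consists only of well-formed UTF-8 sequences.
function IsValidUtf8(const S: RawByteString): Boolean;
var
  I, N, K: Integer;
  B1, B2: Byte;
begin
  Result := False;
  I := 1;
  while I <= Length(S) do
  begin
    B1 := Ord(S[I]);
    N := Utf8SeqLen(B1);
    if (N = 0) or (I + N - 1 > Length(S)) then Exit;
    for K := 1 to N - 1 do
      if (Ord(S[I + K]) and $C0) <> $80 then Exit;
    if N >= 3 then
    begin
      B2 := Ord(S[I + 1]);
      if ((B1 = $E0) and (B2 < $A0)) or    // overlong 3-byte form
         ((B1 = $ED) and (B2 > $9F)) or    // UTF-16 surrogate range
         ((B1 = $F0) and (B2 < $90)) or    // overlong 4-byte form
         ((B1 = $F4) and (B2 > $8F)) then  // above U+10FFFF
        Exit;
    end;
    Inc(I, N);
  end;
  Result := True;
end;

begin
  WriteLn(Utf8SeqLen($C3));             // 2
  WriteLn(IsValidUtf8(#$C3#$A9));       // TRUE
  WriteLn(IsValidUtf8(#$C3));           // FALSE (truncated)
  WriteLn(IsValidUtf8(#$ED#$A0#$80));   // FALSE (encoded surrogate)
end.
```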
Re: [fpc-devel] Unit for handling UTF-8 strings
On 07 Apr 2013, at 13:35, Kostas Michalopoulos wrote:
> But still no UTF8 in FPC, despite all the different implementations floating around out there and despite UTF8 being the most important Unicode encoding (being used by practically anything that doesn't falsely believe that 16bit integers are enough).

Why is it that no debate about unicode can ever be held without adding flamebait about either UTF-8 or UTF-16? Please don't react to the above part of that message.

Jonas
FPC mailing lists admin
Re: [fpc-devel] Unit for handling UTF-8 strings
On Sun, 7 Apr 2013 13:35:40 +0200 Kostas Michalopoulos badsectorac...@gmail.com wrote:
> [...] I looked around in FPC 2.6.2's units and found nothing beyond utf8encode/decode (which in linux requires a C widestring manager that i'd like to avoid... and doesn't really help in all cases since Unicode can exceed the 16bit range).

It does not require a widestring manager.

> Searching in Google i found a discussion from 2007 which basically concluded with "yeah, it is a nice feature, has some warts, but people need it" but didn't go anywhere
>
> http://free-pascal-general.1045716.n5.nabble.com/UTF-8-versions-of-Copy-and-Length-td2814536.html
>
> and the apparent lack of a UTF8 unit in FPC six years later (even for basic stuff like copy, length, etc) means that that unit never came to exist.
>
> So, what is going on with that? Graeme mentioned that he already had some code and knew some other library that provided a more complete solution

See for example the Lazarus lazutf8 unit.

> that could be imported in FPC and even another guy had yet another library. But still no UTF8 in FPC, despite all the different implementations floating around out there and despite UTF8 being the most important Unicode encoding (being used by practically anything that doesn't falsely believe that 16bit integers are enough).

AFAIK there are no UTF16 functions either.
Lazarus provides a lazutf16 unit too.

> Personally i coded yet another unit, which you can find here:
>
> http://pastebin.com/cJ2TvRdZ
>
> Of course my code is most likely slow and there might be some bugs there - i only did some testing with Greek characters which seem to work fine, but nothing like Chinese or the new emoticon stuff which is regularly added in Unicode.

I agree, your unit is most likely slow.

[...]

Mattias
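For the "basic stuff like copy, length" mentioned in the quoted message, a code-point-based Length and Copy can be sketched as follows. The names Utf8Len, Utf8BytePos and Utf8Sub are invented for this example; this is not the lazutf8 implementation and it is written for clarity, not speed. It assumes the input is already valid UTF-8 and counts code points, not user-perceived characters.

```pascal
program Utf8LenCopy;
{$mode objfpc}{$H+}

// Number of code points: every byte that is not a continuation
// byte (10xxxxxx) starts a new code point.
function Utf8Len(const S: RawByteString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

// Byte index of the N-th code point (1-based), or Length(S) + 1
// if S has fewer than N code points.
function Utf8BytePos(const S: RawByteString; N: Integer): Integer;
var
  I, Seen: Integer;
begin
  Seen := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
    begin
      Inc(Seen);
      if Seen = N then
        Exit(I);
    end;
  Result := Length(S) + 1;
end;

// Like Copy, but Start and Count are measured in code points.
function Utf8Sub(const S: RawByteString; Start, Count: Integer): RawByteString;
var
  First, Last: Integer;
begin
  if (Start < 1) or (Count <= 0) then
    Exit('');
  First := Utf8BytePos(S, Start);
  Last := Utf8BytePos(S, Start + Count);
  Result := Copy(S, First, Last - First);
end;

var
  S: RawByteString;
begin
  S := 'a'#$C3#$A9'b';                // "a", U+00E9, "b": 4 bytes
  WriteLn(Length(S));                 // 4
  WriteLn(Utf8Len(S));                // 3
  WriteLn(Length(Utf8Sub(S, 2, 1)));  // 2 (the two bytes of U+00E9)
end.
```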
Re: [fpc-devel] Unit for handling UTF-8 strings
On Sun, Apr 7, 2013 at 5:59 PM, Mattias Gaertner nc-gaert...@netcologne.de wrote:
> On Sun, 7 Apr 2013 13:35:40 +0200 Kostas Michalopoulos badsectorac...@gmail.com wrote:
>> [...] I looked around in FPC 2.6.2's units and found nothing beyond utf8encode/decode (which in linux requires a C widestring manager that i'd like to avoid... and doesn't really help in all cases since Unicode can exceed the 16bit range).
>
> It does not require a widestring manager.

The documentation says otherwise:

http://www.freepascal.org/docs-html/rtl/system/utf8encode.html

"For this function to work, a widestring manager must be installed."

> See for example the Lazarus lazutf8 unit.
>
> AFAIK there are no UTF16 functions either.
> Lazarus provides a lazutf16 unit too.

Is there a reason for those to be in Lazarus only? Can they be moved to FPC?
Re: [fpc-devel] Unit for handling UTF-8 strings
On Sun, 7 Apr 2013, Kostas Michalopoulos wrote:
> On Sun, Apr 7, 2013 at 5:59 PM, Mattias Gaertner nc-gaert...@netcologne.de wrote:
>> On Sun, 7 Apr 2013 13:35:40 +0200 Kostas Michalopoulos badsectorac...@gmail.com wrote:
>>> [...] I looked around in FPC 2.6.2's units and found nothing beyond utf8encode/decode (which in linux requires a C widestring manager that i'd like to avoid... and doesn't really help in all cases since Unicode can exceed the 16bit range).
>>
>> It does not require a widestring manager.
>
> The documentation says otherwise:
>
> http://www.freepascal.org/docs-html/rtl/system/utf8encode.html
>
> "For this function to work, a widestring manager must be installed."
>
>> See for example the Lazarus lazutf8 unit.
>>
>> AFAIK there are no UTF16 functions either.
>> Lazarus provides a lazutf16 unit too.
>
> Is there a reason for those to be in Lazarus only? Can they be moved to FPC?

FPC is preparing for a complete unicode solution, with proper language support.

At best, these units are a temporary solution.

Michael.