Re: [Lazarus] UTF-8 string recognition

2010-03-03 Thread JoshyFun
Hello Lazarus-List,
Wednesday, March 3, 2010, 12:24:35 AM, you wrote:
RH> Please check the function I use to check for UTF-8 strings. Hope it is helpful.
RH> function IsUTF8(UnknownStr:string):boolean;
Well, there are a lot of UTF-8 strings that do not pass your checks ;) If you remove low ASCII control chars

Re: [Lazarus] UTF-8 string recognition

2010-03-03 Thread Robin Hoo
Hi, JoshyFun
Thanks for pointing out the bug in my code; yes, you are right. I forgot to put a check before every inc(i,k) and continue; there should be a guard statement *if i>length(UnknownStr) then exit(false);*
2010/3/3 JoshyFun <joshy...@gmail.com>
> Hello Lazarus-List,
> Wednesday,

Re: [Lazarus] UTF-8 string recognition

2010-03-03 Thread JoshyFun
Hello Lazarus-List,
Wednesday, March 3, 2010, 2:05:48 PM, you wrote:
RH> Hi, JoshyFun
RH> Thanks for pointing out the bug in my code; yes, you are right. I forgot to
RH> put a check before every inc(i,k) and continue; there should be a
RH> guard statement *if i>length(UnknownStr) then

Re: [Lazarus] UTF-8 string recognition

2010-03-03 Thread Hans-Peter Diettrich
JoshyFun wrote:
> RH> Please check the function I use to check for UTF-8 strings. Hope it is helpful.
> RH> function IsUTF8(UnknownStr:string):boolean;
> Well, there are a lot of UTF-8 strings that do not pass your checks ;) If you remove low ASCII control chars, what happens with UTF-8 control chars?
> RH> var
> RH>

Re: [Lazarus] UTF-8 string recognition

2010-03-02 Thread Robin Hoo
Hi, Antonio
Please check the function I use to check for UTF-8 strings. Hope it is helpful.
function IsUTF8(UnknownStr:string):boolean;
var i:Integer;
begin
  if length(UnknownStr)=0 then exit(true);
  i:=1;
  while i<length(UnknownStr) do
  begin
    // ASCII
    if (UnknownStr[i]

Re: [Lazarus] UTF-8 string recognition

2010-03-02 Thread Mattias Gaertner
On Wed, 3 Mar 2010 07:24:35 +0800 Robin Hoo <robin.hoo...@gmail.com> wrote:
> Hi, Antonio
> Please check the function I use to check for UTF-8 strings. Hope it is helpful.
You combine an IsText check (no special characters in #0-#31) with IsUTF8 - good idea.
> function IsUTF8(UnknownStr:string):boolean;
Maybe
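The "IsText" half of that combination (no special characters in #0-#31) can be sketched on its own, separate from any UTF-8 validation. The following is only an illustration, not the code from the thread: the name LooksLikeText is hypothetical, and letting tab, LF and CR through is an assumption about what a text check usually wants.

```pascal
program istextdemo;
{$mode objfpc}{$H+}

// Hypothetical helper (not from the thread): returns True when S contains
// no control characters in #0..#31, apart from tab, LF and CR.
// Allowing those three characters is an assumption made for this sketch.
function LooksLikeText(const S: AnsiString): Boolean;
var
  i: Integer;
begin
  Result := False;
  for i := 1 to Length(S) do
    if (S[i] < #32) and not (S[i] in [#9, #10, #13]) then
      Exit;  // found a control character that plain text should not contain
  Result := True;
end;

begin
  WriteLn(LooksLikeText('line one'#13#10'line two'));  // TRUE
  WriteLn(LooksLikeText('binary'#0'data'));            // FALSE
end.
```

Keeping this test separate from the UTF-8 test makes it possible to report "not text" and "not UTF-8" as two different results.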

Re: [Lazarus] UTF-8 string recognition

2010-03-01 Thread Santiago A.
Not every ANSI string is a valid UTF-8 string, but every UTF-8 string is a valid ANSI string. So you can be sure when a string is not valid UTF-8; if it is valid, it could be either a UTF-8 or an ANSI string. You can make a guess: for Latin alphabets (western or eastern), if it is a valid UTF8

Re: [Lazarus] UTF-8 string recognition

2010-02-28 Thread Mattias Gaertner
On Sun, 28 Feb 2010 00:46:12 -0300 Antônio <antoniog12...@gmail.com> wrote:
> Is there no way to determine whether a string is in ANSI format or not?
In general: no, because every 8-bit combination is valid.
In a specific case: not every combination makes sense, so you can write heuristics. You

Re: [Lazarus] UTF-8 string recognition

2010-02-28 Thread Antônio
I am referring to a text that the user loads into a SynEdit and on which I need to do some string manipulation.
Antônio
2010/2/28 Mattias Gaertner <nc-gaert...@netcologne.de>:
> On Sun, 28 Feb 2010 00:46:12 -0300 Antônio <antoniog12...@gmail.com> wrote:
>> Is there no way to determine whether a string is in

Re: [Lazarus] UTF-8 string recognition

2010-02-28 Thread Mattias Gaertner
On Sun, 28 Feb 2010 06:28:38 -0300 Antônio <antoniog12...@gmail.com> wrote:
> I am referring to a text that the user loads into a SynEdit and on which I need to do some string manipulation.
Ah, so not a string, but a whole text. See here

Re: [Lazarus] UTF-8 string recognition

2010-02-27 Thread Antônio
Mainly from ANSI, but whatever.
Antonio
2010/2/27 Bart <bartjun...@gmail.com>:
> Hi Antônio,
> On 2/27/10, Antônio <antoniog12...@gmail.com> wrote:
>> How to determine whether a string is UTF-8 or not?
> Distinguish UTF-8 from what? ANSI, UTF16 or whatever?
> Bart

Re: [Lazarus] UTF-8 string recognition

2010-02-27 Thread JoshyFun
Hello Lazarus-List,
Saturday, February 27, 2010, 5:43:24 PM, you wrote:
A> Mainly from ANSI, but whatever.
You cannot determine whether a string is UTF-8, only whether it is a well-formed UTF-8 sequence. In other words, you cannot detect that a string IS UTF-8, only if it does not have UTF8
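A well-formedness check of the kind described here can be sketched as follows. This is not the IsUTF8 function posted in the thread; the name IsWellFormedUTF8 is hypothetical and the byte ranges follow the UTF-8 definition in RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF). The sketch assumes the source file has no {$codepage} directive, so the #$.. constants are stored as raw bytes. A True result only means the bytes are valid UTF-8, not that the author intended UTF-8.

```pascal
program utf8check;
{$mode objfpc}{$H+}

// Hypothetical helper (not from the thread): True when S is a well-formed
// UTF-8 byte sequence according to the ranges in RFC 3629.
function IsWellFormedUTF8(const S: AnsiString): Boolean;
var
  i, k, n, Extra: Integer;
  b, MinB, MaxB: Byte;
begin
  Result := False;
  n := Length(S);
  i := 1;
  while i <= n do
  begin
    b := Ord(S[i]);
    // Allowed range for the first continuation byte; narrowed for some leads.
    MinB := $80;
    MaxB := $BF;
    case b of
      $00..$7F: Extra := 0;                         // ASCII
      $C2..$DF: Extra := 1;                         // 2-byte sequence
      $E0: begin Extra := 2; MinB := $A0; end;      // excludes overlong forms
      $E1..$EC, $EE..$EF: Extra := 2;               // 3-byte sequence
      $ED: begin Extra := 2; MaxB := $9F; end;      // excludes surrogates
      $F0: begin Extra := 3; MinB := $90; end;      // excludes overlong forms
      $F1..$F3: Extra := 3;                         // 4-byte sequence
      $F4: begin Extra := 3; MaxB := $8F; end;      // stays within U+10FFFF
    else
      Exit;  // $80..$C1 and $F5..$FF can never start a sequence
    end;
    // This bounds check must come before reading the continuation bytes.
    if i + Extra > n then
      Exit;
    for k := 1 to Extra do
    begin
      b := Ord(S[i + k]);
      if (b < MinB) or (b > MaxB) then
        Exit;
      // Only the first continuation byte has a narrowed range.
      MinB := $80;
      MaxB := $BF;
    end;
    Inc(i, Extra + 1);
  end;
  Result := True;
end;

begin
  WriteLn(IsWellFormedUTF8('plain ASCII'));   // TRUE
  WriteLn(IsWellFormedUTF8('caf'#$C3#$A9));   // TRUE: "é" encoded as UTF-8
  WriteLn(IsWellFormedUTF8('caf'#$E9));       // FALSE: "é" encoded as Latin-1
  WriteLn(IsWellFormedUTF8(#$C0#$AF));        // FALSE: overlong encoding
end.
```

Pure ASCII input passes as well, so a caller who needs to tell ASCII apart from "real" UTF-8 would have to check separately for bytes above $7F.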

Re: [Lazarus] UTF-8 string recognition

2010-02-27 Thread Antônio
Is there no way to determine whether a string is in ANSI format or not?
Antônio
2010/2/27 JoshyFun <joshy...@gmail.com>:
> Hello Lazarus-List,
> Saturday, February 27, 2010, 5:43:24 PM, you wrote:
> A> Mainly from ANSI, but whatever.
> You can not determine if a string is an UTF8 one, only if it is

[Lazarus] UTF-8 string recognition

2010-02-26 Thread Antônio
How to determine whether a string is UTF-8 or not?