Re: [fpc-pascal] Parse unicode scalar

2023-07-02 Thread Hairy Pixels via fpc-pascal
> On Jul 3, 2023, at 12:04 PM, Mattias Gaertner via fpc-pascal > wrote: > > No, the header of a codepoint to figure out the length. so the smallest character UTF-8 can represent is 2 bytes? 1 for the header and 1 for the character? ASCII #100 is the same character in UTF-8 but it needs a

Re: [fpc-pascal] Parse unicode scalar

2023-07-02 Thread Mattias Gaertner via fpc-pascal
On Mon, 3 Jul 2023 11:58:33 +0700 Hairy Pixels via fpc-pascal wrote: > > On Jul 3, 2023, at 11:43 AM, Mattias Gaertner via fpc-pascal > > wrote: > > > > There is a header byte. > > > > It depends, if you want to check for invalid UTF-8 sequences. > > > > From LazUTF8: > > > > function UTF8Co

Re: [fpc-pascal] Parse unicode scalar

2023-07-02 Thread Hairy Pixels via fpc-pascal
> On Jul 3, 2023, at 11:36 AM, Mattias Gaertner via fpc-pascal > wrote: > > Useless array of. > And it does not return the bytecount. it's an open array so what's the problem? > >> var >> i: Integer; >> byteCount: Integer; >> begin >> // Number of bytes required to represent the Unicode

Re: [fpc-pascal] Parse unicode scalar

2023-07-02 Thread Hairy Pixels via fpc-pascal
> On Jul 3, 2023, at 11:43 AM, Mattias Gaertner via fpc-pascal > wrote: > > There is a header byte. > > It depends, if you want to check for invalid UTF-8 sequences. > > From LazUTF8: > > function UTF8CodepointSizeFast(p: PChar): integer; > begin > case p^ of >#0..#191 : Result := 1
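Editor's sketch of the "header byte" idea from this message: in UTF-8 the first byte of a sequence announces the sequence length. This is not the LazUTF8 routine quoted above (that preview is cut off); `LeadByteLen` is a hypothetical name, and unlike a "fast" variant it reports 0 for bytes that cannot start a sequence.

```pascal
program LeadByteSize;
{$mode objfpc}{$H+}

// Hypothetical helper, not the LazUTF8 function: returns the sequence
// length announced by a UTF-8 lead byte, or 0 if the byte cannot start
// a sequence.
function LeadByteLen(b: Byte): Integer;
begin
  case b of
    $00..$7F: Result := 1;  // 0xxxxxxx: ASCII, one byte, no separate header
    $C2..$DF: Result := 2;  // 110xxxxx
    $E0..$EF: Result := 3;  // 1110xxxx
    $F0..$F4: Result := 4;  // 11110xxx
  else
    Result := 0;            // continuation byte ($80..$BF) or invalid lead
  end;
end;

begin
  WriteLn(LeadByteLen(Ord('d')));  // ASCII #100
  WriteLn(LeadByteLen($F0));       // first byte of U+1F496 in UTF-8
  WriteLn(LeadByteLen($9F));       // a continuation byte
end.
```

Under these assumptions the program prints 1, 4 and 0: an ASCII character stays a single byte, and the length information of longer sequences lives in the high bits of the first byte.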

Re: [fpc-pascal] Parse unicode scalar

2023-07-02 Thread Mattias Gaertner via fpc-pascal
On Mon, 3 Jul 2023 08:29:11 +0700 Hairy Pixels via fpc-pascal wrote: > > On Jul 2, 2023, at 11:16 PM, Jer Haan wrote: > > > > This table is copied from Wikipedia. Hope it’s useful > > for you. If you improve the code pls let me know. > > This is perfect, thanks! Much more complicated than I th

Re: [fpc-pascal] Parse unicode scalar

2023-07-02 Thread Mattias Gaertner via fpc-pascal
On Mon, 3 Jul 2023 09:34:10 +0700 Hairy Pixels via fpc-pascal wrote: >[...] > Ok today I just tried to ask ChatGPT and got an answer. I must have > asked the wrong thing yesterday but it got it right today (with one > syntax error using an inline "var" in the code section for some > reason). >

Re: [fpc-pascal] Parse unicode scalar

2023-07-02 Thread Hairy Pixels via fpc-pascal
> On Jul 3, 2023, at 12:20 AM, Nikolay Nikolov via fpc-pascal > wrote: > > There's no such thing as "unicode scalar" in Unicode terminology: > > https://unicode.org/glossary/ I got it from here https://docs.swift.org/swift-book/documentation/the-swift-programming-language/stringsandcharact

Re: [fpc-pascal] Parse unicode scalar

2023-07-02 Thread Hairy Pixels via fpc-pascal
> On Jul 2, 2023, at 11:16 PM, Jer Haan wrote: > > This table is copied from Wikipedia. Hope it’s useful for you. > If you improve the code pls let me know. > This is perfect, thanks! Much more complicated than I thought. I'm curious now, if you were going the other direction and parsing a s

Re: [fpc-pascal] Parse unicode scalar

2023-07-02 Thread Nikolay Nikolov via fpc-pascal
On 7/2/23 20:38, Martin Frb via fpc-pascal wrote: On 02/07/2023 19:20, Nikolay Nikolov via fpc-pascal wrote: On 7/2/23 16:30, Hairy Pixels via fpc-pascal wrote: I'm interested in parsing unicode scalars (I think they're called) to byte sized values but I'm not sure where to start. First thing

Re: [fpc-pascal] Parse unicode scalar

2023-07-02 Thread Martin Frb via fpc-pascal
On 02/07/2023 19:20, Nikolay Nikolov via fpc-pascal wrote: On 7/2/23 16:30, Hairy Pixels via fpc-pascal wrote: I'm interested in parsing unicode scalars (I think they're called) to byte sized values but I'm not sure where to start. First thing I did was choose the unicode scalar U+1F496 (💖).

Re: [fpc-pascal] Parse unicode scalar

2023-07-02 Thread Nikolay Nikolov via fpc-pascal
On 7/2/23 16:30, Hairy Pixels via fpc-pascal wrote: I'm interested in parsing unicode scalars (I think they're called) to byte sized values but I'm not sure where to start. First thing I did was choose the unicode scalar U+1F496 (💖). There's no such thing as "unicode scalar" in Unicode termin

[fpc-pascal] Parse unicode scalar

2023-07-02 Thread Hairy Pixels via fpc-pascal
I'm interested in parsing unicode scalars (I think they're called) to byte sized values but I'm not sure where to start. First thing I did was choose the unicode scalar U+1F496 (💖). Next I cheated and asked ChatGPT. :) Amazingly from my question it was able to tell me the scalar is comprised of t
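
Editor's sketch for the question above: one way to turn a scalar value such as U+1F496 into its UTF-8 bytes with Free Pascal. `EncodeUTF8` is a hypothetical helper written for illustration, not code from this thread or from any FPC/Lazarus unit; it assumes only `TBytes` and `IntToHex` from `SysUtils`.

```pascal
program EncodeScalar;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: encodes one Unicode scalar value as UTF-8.
// Returns an empty array for surrogates (U+D800..U+DFFF) and for
// values above U+10FFFF, which are not scalar values.
function EncodeUTF8(cp: Cardinal): TBytes;
begin
  Result := nil;
  if cp <= $7F then
  begin
    SetLength(Result, 1);
    Result[0] := Byte(cp);
  end
  else if cp <= $7FF then
  begin
    SetLength(Result, 2);
    Result[0] := Byte($C0 or (cp shr 6));
    Result[1] := Byte($80 or (cp and $3F));
  end
  else if (cp >= $D800) and (cp <= $DFFF) then
    Exit
  else if cp <= $FFFF then
  begin
    SetLength(Result, 3);
    Result[0] := Byte($E0 or (cp shr 12));
    Result[1] := Byte($80 or ((cp shr 6) and $3F));
    Result[2] := Byte($80 or (cp and $3F));
  end
  else if cp <= $10FFFF then
  begin
    SetLength(Result, 4);
    Result[0] := Byte($F0 or (cp shr 18));
    Result[1] := Byte($80 or ((cp shr 12) and $3F));
    Result[2] := Byte($80 or ((cp shr 6) and $3F));
    Result[3] := Byte($80 or (cp and $3F));
  end;
end;

var
  b: Byte;
begin
  // Print each byte of U+1F496 as two hex digits.
  for b in EncodeUTF8($1F496) do
    Write(IntToHex(b, 2), ' ');
  WriteLn;
end.
```

For U+1F496 this yields the four bytes F0 9F 92 96: the first byte carries the length marker plus the top three bits, and each following byte carries six more bits behind a `10` prefix.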