Re: DMD: invalid UTF character `\U0000d800`
On Monday, 9 November 2020 at 16:39:49 UTC, Boris Carvajal wrote:
> There's also: dchar(0xd800)

Thanks
Re: DMD: invalid UTF character `\U0000d800`
On Sunday, 8 November 2020 at 10:47:34 UTC, Per Nordlöw wrote:
> Can I just do, for instance, cast(dchar)0xd800 for `\Ud800` to accomplish this?

There's also: dchar(0xd800)
Re: DMD: invalid UTF character `\U0000d800`
On 2020-11-08 13:39, Kagamin wrote:
> Surrogate pairs are used in rules because Java strings are UTF-16 encoded; it doesn't make much sense for other encodings.

D supports the UTF-16 encoding as well. The compiler doesn't accept the surrogate pairs even for UTF-16 strings.

--
/Jacob Carlborg
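To illustrate the point about UTF-16 strings, here is a minimal sketch. It assumes a supplementary code point (U+1F600, chosen only as an example) written with a `\U` escape in a `wstring` literal; the name `smiley` is hypothetical. The compiler produces the surrogate code units itself, while writing them directly as escapes is what the thread reports as rejected.

```d
// Hypothetical example value: U+1F600 lies outside the Basic Multilingual
// Plane, so in UTF-16 it is stored as a surrogate pair.
enum wstring smiley = "\U0001F600"w;

void main()
{
    // The literal holds two UTF-16 code units: a high and a low surrogate.
    assert(smiley.length == 2);
    assert(smiley[0] == 0xD83D); // high surrogate, in D800..DBFF
    assert(smiley[1] == 0xDE00); // low surrogate, in DC00..DFFF

    // Writing the surrogates as escapes is what the thread reports as an
    // "invalid UTF character" error, so it is left commented out:
    // wstring bad = "\uD83D\uDE00"w;
}
```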
Re: DMD: invalid UTF character `\U0000d800`
On 11/8/20 5:47 AM, Per Nordlöw wrote:
> On Saturday, 7 November 2020 at 17:49:54 UTC, Jacob Carlborg wrote:
>> [1] https://en.wikipedia.org/wiki/UTF-16#U+D800_to_U+DFFF
>
> Thanks!
>
> I'm only using these UTF characters to create ranges that source code characters are checked against during parsing. Therefore I would like to just convert these to a `dchar` for now using a `cast`.
>
> Can I just do, for instance, cast(dchar)0xd800 for `\Ud800` to accomplish this?

Yes, use the cast. It should work. It's just the D grammar that is stopping you; a dchar is just an integer under the hood, so the cast should be fine.

-Steve
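A minimal sketch of the cast approach for the use case described above, range bounds that parsed characters are checked against. The constant names and the `inRange` helper are hypothetical and not taken from the original `CtoLexer_parser.d`.

```d
// Surrogate bounds built with a cast, because the corresponding character
// escapes are rejected by the compiler. All names here are hypothetical.
enum dchar highSurrogateFirst = cast(dchar) 0xD800;
enum dchar highSurrogateLast  = cast(dchar) 0xDBFF;
enum dchar lowSurrogateFirst  = cast(dchar) 0xDC00;
enum dchar lowSurrogateLast   = cast(dchar) 0xDFFF;

/// True if `c` lies in the inclusive range [first, last].
bool inRange(dchar c, dchar first, dchar last) pure nothrow @nogc @safe
{
    return first <= c && c <= last;
}

void main()
{
    // A dchar is a 32-bit value, so the cast is a plain integer conversion.
    static assert(dchar.sizeof == 4);
    assert(cast(uint) highSurrogateFirst == 0xD800);

    assert(inRange(cast(dchar) 0xD900, highSurrogateFirst, highSurrogateLast));
    assert(!inRange('a', lowSurrogateFirst, lowSurrogateLast));
}
```

Note that the cast only sidesteps the compile-time check on literals; the resulting values are still not valid Unicode scalar values, so they are suitable as comparison bounds but not for encoding into a string.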
Re: DMD: invalid UTF character `\U0000d800`
On Sunday, 8 November 2020 at 10:47:34 UTC, Per Nordlöw wrote:
> dchar

Surrogate pairs are used in rules because Java strings are UTF-16 encoded; it doesn't make much sense for other encodings.
Re: DMD: invalid UTF character `\U0000d800`
On Sunday, 8 November 2020 at 10:47:34 UTC, Per Nordlöw wrote:
> cast(dchar)0xd800

To clarify,

    enum dch1 = cast(dchar)0xa0a0;
    enum dch2 = '\ua0a0';
    assert(dch1 == dch2);

works. Can I use the first variant if I want to postpone these encoding questions for now?
Re: DMD: invalid UTF character `\U0000d800`
On Saturday, 7 November 2020 at 17:49:54 UTC, Jacob Carlborg wrote:
> [1] https://en.wikipedia.org/wiki/UTF-16#U+D800_to_U+DFFF

Thanks!

I'm only using these UTF characters to create ranges that source code characters are checked against during parsing. Therefore I would like to just convert these to a `dchar` for now using a `cast`.

Can I just do, for instance, cast(dchar)0xd800 for `\Ud800` to accomplish this?
Re: DMD: invalid UTF character `\U0000d800`
On Saturday, 7 November 2020 at 16:12:06 UTC, Per Nordlöw wrote:
> CtoLexer_parser.d 665 57 error invalid UTF character \Ud800
> CtoLexer_parser.d 665 67 error invalid UTF character \Udbff
> CtoLexer_parser.d 666 28 error invalid UTF character \Ud800
> CtoLexer_parser.d 666 38 error invalid UTF character \Udbff
> CtoLexer_parser.d 666 53 error invalid UTF character \Udc00
> CtoLexer_parser.d 666 63 error invalid UTF character \Udfff
>
> Doesn't DMD support these Unicodes yet?

They're not valid: "The Unicode standard permanently reserves these code point values for UTF-16 encoding of the high and low surrogates, and they will never be assigned a character, so there should be no reason to encode them. The official Unicode standard says that no UTF forms, including UTF-16, can encode these code points" [1].

"... the standard states that such arrangements should be treated as encoding errors" [1].

Perhaps they need to be combined with other code points to form a valid character.

[1] https://en.wikipedia.org/wiki/UTF-16#U+D800_to_U+DFFF

--
/Jacob Carlborg
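As a small sketch of the reserved range described above: Phobos' `std.utf.isValidDchar` reports the surrogate code points U+D800 to U+DFFF as invalid, which matches the compiler's diagnostics. The `isSurrogate` helper is hypothetical and only spells out the range check.

```d
import std.utf : isValidDchar;

/// Hypothetical helper: true for the reserved surrogate range U+D800..U+DFFF.
bool isSurrogate(dchar c) pure nothrow @nogc @safe
{
    return c >= 0xD800 && c <= 0xDFFF;
}

void main()
{
    // The code points from the error messages are all surrogates...
    assert(isSurrogate(cast(dchar) 0xD800));
    assert(isSurrogate(cast(dchar) 0xDBFF));
    assert(isSurrogate(cast(dchar) 0xDC00));
    assert(isSurrogate(cast(dchar) 0xDFFF));

    // ...and Phobos does not consider them valid characters.
    assert(!isValidDchar(cast(dchar) 0xD800));
    assert(!isValidDchar(cast(dchar) 0xDFFF));

    // Neighbouring code points outside the range are fine.
    assert(isValidDchar(cast(dchar) 0xD7FF));
    assert(isValidDchar(cast(dchar) 0xE000));
}
```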