On Saturday 25 April 2020 21:51:41 CEST Joseph Brenner wrote:
> > Yary has an issue posted regarding 'display-width' of UTF-16 encoded
> > strings:
> > https://github.com/rakudo/rakudo/issues/3461
> >
> > I know it might be far-fetched, but what if your UTF-8 issue and
> > Yary's UTF-16 issue
On Monday 27 April 2020 09:49:20 CEST Joseph Brenner wrote:
> After you do a .readchars, what point in the file would you expect to
> be "current"? I would expect it would be the point right after the
> last char read. Instead that's true if you're reading ascii
> characters but not unicode
After you do a .readchars, what point in the file would you expect to
be "current"? I would expect it would be the point right after the
last char read. Instead that's true if you're reading ascii
characters but not unicode characters up above the ascii range, in
which case the "current" point
I decided to open an issue for this one. Even if there's no practical
fix for the behavior of readchars, I'd think this odd meaning of the
"current" point in the file would need to be better documented:
https://github.com/rakudo/rakudo/issues/3646
I simplified the test I've been using:
use
> Yary has an issue posted regarding 'display-width' of UTF-16 encoded strings:
> https://github.com/rakudo/rakudo/issues/3461
> I know it might be far-fetched, but what if your UTF-8 issue and
> Yary's UTF-16 issue were related
Well, an issue with handling combining characters could easily
Hi Joe,
I was able to run the code you posted and reproduced the exact same
result (Rakudo version 2020.02.1..1 built on MoarVM version
2020.02.1 implementing Raku 6.d). I tried playing with file encodings a bit
(e.g. UTF8-C8), but I didn't see any improvement.
Yary has an issue posted
I was just posting that.
On 4/24/20, Elizabeth Mattijsen wrote:
>
>
>> On 24 Apr 2020, at 22:03, Joseph Brenner wrote:
>>
>> Thanks, yes I understand unicode and utf-8 reasonably well.
>>
>>> So Rakudo has to read the next codepoint to make sure that it isn't a
>>> combining codepoint.
>>
>>>
Another version of my test code, checking .tell throughout:
use v6;
use Test;
my $tmpdir = IO::Spec::Unix.tmpdir;
my $file = "$tmpdir/scratch_file.txt";
my $unichar_str = "\x[1200]\x[2D80]\x[4DFC]\x[AAAA]\x[2CA4]\x[2C8E]"; # ሀⶀ䷼ꪪⲤⲎ
my $ascii_str = "ABCDEFGHI";
> On 24 Apr 2020, at 22:03, Joseph Brenner wrote:
>
> Thanks, yes I understand unicode and utf-8 reasonably well.
>
>> So Rakudo has to read the next codepoint to make sure that it isn't a
>> combining codepoint.
>
>> It is probably faking up the reads to look right when reading ASCII, but
Thanks, yes I understand unicode and utf-8 reasonably well.
> So Rakudo has to read the next codepoint to make sure that it isn't a
> combining codepoint.
> It is probably faking up the reads to look right when reading ASCII, but
> failing to do that for wider codepoints.
I think it'd be the
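
The look-ahead described in the quoted text can be illustrated at the string level. This is only a minimal sketch: the string "x" followed by U+0301 (COMBINING ACUTE ACCENT) is an example chosen here, not something from the thread. It shows that a single character (grapheme) in Raku may be made of several codepoints, so a reader that has decoded "x" cannot know the character is complete until it has seen what follows.

```raku
use v6;

# Hypothetical example string: "x" plus U+0301 COMBINING ACUTE ACCENT.
# There is no precomposed form of this pair, so it stays two codepoints.
my $str = "x\x[301]";

say $str.chars;                  # number of graphemes: 1
say $str.codes;                  # number of codepoints: 2
say $str.encode('utf8').bytes;   # number of UTF-8 bytes: 3
```

With a file handle, the same ambiguity means the decoder may have consumed bytes beyond the last character it returned, which would fit the file position described in this thread.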
In UTF-8, characters can be 1 to 4 bytes long.
UTF-8 was designed so that 7-bit ASCII is a subset of it.
Any 8-bit byte that has its most significant bit set cannot be ASCII.
So multi-byte codepoints have the most significant bit set for all of the
bytes.
The first byte can tell you the number of
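
The byte layout described above can be sketched in a few lines of Raku. The helper name utf8-seq-length and the sample characters are made up for this illustration; the bit patterns are the standard UTF-8 leading-byte prefixes.

```raku
use v6;

# Hypothetical helper: length in bytes of a UTF-8 sequence, judged from
# its first byte alone. Returns 0 for a byte that cannot start a sequence
# (a continuation byte 10xxxxxx, or 11111xxx).
sub utf8-seq-length(Int $lead --> Int) {
    return 1 if ($lead +& 0b1000_0000) == 0b0000_0000;  # 0xxxxxxx (ASCII)
    return 2 if ($lead +& 0b1110_0000) == 0b1100_0000;  # 110xxxxx
    return 3 if ($lead +& 0b1111_0000) == 0b1110_0000;  # 1110xxxx
    return 4 if ($lead +& 0b1111_1000) == 0b1111_0000;  # 11110xxx
    return 0;
}

# Sample characters chosen for illustration: 1-, 2-, 3- and 4-byte cases.
for "A", "\x[E9]", "\x[1200]", "\x[1F600]" -> $char {
    my $bytes = $char.encode('utf8');
    say sprintf("U+%04X: lead byte 0x%02X, %d byte(s)",
                $char.ord, $bytes[0], utf8-seq-length($bytes[0]));
}
```

Every byte after the first in a multi-byte sequence has the form 10xxxxxx, which is why the helper reports 0 for such a byte.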