Re: "ICU - International Components for Unicode"

Patrick R. Michaud Fri, 25 Sep 2020 11:16:44 -0700

On Fri, Sep 25, 2020 at 12:37:49PM +0200, Elizabeth Mattijsen wrote:
> > On 25 Sep 2020, at 04:25, Brad Gilbert <b2gi...@gmail.com> wrote:
> > Rakudo does not use ICU
> > 
> > It used to though.
> > 
> > Rakudo used to run on Parrot.
> > Parrot used ICU for its Unicode features.
> 
> I do remember that in the Parrot days, any non-ASCII character in 
> any string, would have a significant negative effect on grammar parsing.  
> This was usually not that visible when trying to run a script, but the 
> time needed to compile the core setting (which already took a few minutes 
> then) rose (probably exponentially) to: well, I don't know.


Part of this is because Parrot/ICU was using UTF-8 and/or UTF-16 to
encode non-ASCII strings.  As a result, indexing into a string often
became an O(n) operation instead of O(1).  For short strings this was
no problem, but for long strings (such as the core setting) it was
really painful.
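
To illustrate the difference, here is a minimal sketch in Python.  It is
not Parrot or ICU code; the function names are made up for this example.
Finding the i-th code point in UTF-8 bytes means walking from the start,
while a fixed-width array can be indexed directly.

```python
def utf8_char_at(data: bytes, index: int) -> str:
    """Return the code point at `index` by scanning UTF-8 bytes: O(n)."""
    if index < 0:
        raise IndexError(index)
    pos = 0   # current byte offset
    seen = 0  # code points passed so far
    while pos < len(data):
        lead = data[pos]
        # The lead byte tells us how many bytes this code point occupies.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        else:
            width = 4
        if seen == index:
            return data[pos:pos + width].decode("utf-8")
        pos += width
        seen += 1
    raise IndexError(index)


def fixed_width_char_at(codepoints: list, index: int) -> str:
    """Return the code point at `index` from a fixed-width array: O(1)."""
    return chr(codepoints[index])


text = "naïve ☃ text"
utf8 = text.encode("utf-8")          # variable width: 1 to 4 bytes each
fixed = [ord(ch) for ch in text]     # one integer per code point
```

Both functions return the same character for the same index; only the
cost of getting there differs, and that cost grows with the length of
the string in the UTF-8 case.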

We did work on some ways in Parrot/NQP to reduce the amount of string
scanning involved, such as caching certain index-points in the string, 
but it was always a bit of a hack.  Switching to a fixed-width encoding
(NFG, which MoarVM implements) was definitely the correct path to take
there.
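
For the curious, the index-point caching idea looks roughly like the
sketch below.  This is a hypothetical Python illustration, not the
actual Parrot/NQP code: the class name and the `stride` parameter are
assumptions made for the example.  It records the byte offset of every
`stride`-th code point, so a lookup scans at most `stride - 1` code
points instead of the whole string.

```python
class CheckpointedUtf8:
    """UTF-8 bytes plus cached byte offsets (hypothetical sketch)."""

    def __init__(self, data: bytes, stride: int = 64):
        self.data = data
        self.stride = stride
        self.offsets = []  # offsets[k] = byte offset of code point k * stride
        count = 0
        for pos, byte in enumerate(data):
            # Any byte that is not 10xxxxxx starts a new code point.
            if byte & 0xC0 != 0x80:
                if count % stride == 0:
                    self.offsets.append(pos)
                count += 1
        self.length = count

    def char_at(self, index: int) -> str:
        if not 0 <= index < self.length:
            raise IndexError(index)
        # Jump to the nearest checkpoint at or before the target...
        pos = self.offsets[index // self.stride]
        # ...then scan forward over the remaining code points.
        for _ in range(index % self.stride):
            pos += 1
            while self.data[pos] & 0xC0 == 0x80:
                pos += 1
        end = pos + 1
        while end < len(self.data) and self.data[end] & 0xC0 == 0x80:
            end += 1
        return self.data[pos:end].decode("utf-8")


sample = "añb☃c😀d" * 50
indexed = CheckpointedUtf8(sample.encode("utf-8"), stride=8)
```

The trade-off is extra memory and bookkeeping for the offset table, and
the table has to be rebuilt whenever the string changes, which is part
of why a fixed-width representation is the cleaner answer.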

Pm
