On Fri, Sep 25, 2020 at 12:37:49PM +0200, Elizabeth Mattijsen wrote:
>
> On 25 Sep 2020, at 04:25, Brad Gilbert <b2gi...@gmail.com> wrote:
> > Rakudo does not use ICU
> >
> > It used to though.
> >
> > Rakudo used to run on Parrot.
> > Parrot used ICU for its Unicode features.
>
> I do remember that in the Parrot days, any non-ASCII character in
> any string, would have a significant negative effect on grammar parsing.
> This was usually not that visible when trying to run a script, but the
> time needed to compile the core setting (which already took a few minutes
> then) rose (probably exponentially) to: well, I don't know.
Part of this is because Parrot/ICU used UTF-8 and/or UTF-16 to encode non-ASCII strings. Both are variable-width encodings, so finding the n-th character means scanning forward from the start of the string. As a result, indexing into a string often became an O(n) operation instead of O(1). For short strings that was no problem; for long strings (such as the core setting) it was really painful.

We did work on some ways in Parrot/NQP to reduce the amount of string scanning involved, such as caching certain index points in the string, but it was always a bit of a hack. Switching to a fixed-width encoding (NFG, Normal Form Grapheme, which MoarVM implements) was definitely the correct path to take there.

Pm
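To make the indexing cost concrete, here is a minimal Python sketch. It is illustrative only: Python and every name in it (`utf8_char_len`, `byte_offset_scan`, `CachedIndex`, `to_fixed_width`, the `stride` parameter) are hypothetical choices, not code from Parrot, ICU, NQP or MoarVM. It shows three things: locating code point n in UTF-8 bytes by walking forward (O(n)), a simple "cache certain index points" variant that remembers the byte offset of every `stride`-th code point so a lookup only scans from the nearest mark, and a fixed-width representation where element n is a direct array access. Note that the fixed-width part works per code point, whereas NFG assigns one slot per grapheme.

```python
def utf8_char_len(lead):
    """Number of bytes in a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("invalid UTF-8 lead byte")


def byte_offset_scan(buf, n, start_char=0, start_byte=0):
    """Byte offset of code point n, found by walking forward: O(n)."""
    char, pos = start_char, start_byte
    while char < n:
        pos += utf8_char_len(buf[pos])  # skip one whole encoded character
        char += 1
    return pos


class CachedIndex:
    """Hypothetical index-point cache: remember every stride-th offset."""

    def __init__(self, buf, stride=64):
        self.buf = buf
        self.stride = stride
        self.marks = [0]  # marks[k] = byte offset of code point k * stride
        char, pos = 0, 0
        while pos < len(buf):
            pos += utf8_char_len(buf[pos])
            char += 1
            if char % stride == 0:
                self.marks.append(pos)

    def byte_offset(self, n):
        # Jump to the nearest cached mark, then scan at most stride - 1 chars.
        k = min(n // self.stride, len(self.marks) - 1)
        return byte_offset_scan(self.buf, n, k * self.stride, self.marks[k])


def to_fixed_width(text):
    """One integer slot per code point, so element n is found in O(1)."""
    return [ord(c) for c in text]
```

For example, with `buf = "naïve café".encode("utf-8")`, `byte_offset_scan(buf, 8)` has to decode the seven preceding characters before it can report an offset, while `to_fixed_width("naïve café")[8]` is a single list lookup. The cache only shrinks the scan to at most `stride - 1` steps and costs a full pass to build, which fits the description of it as a workaround rather than a fix.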