Hello all,

As you probably know by now, I'm really fired up about making sure that Parrot's string handling works well and supports a good range of character sets and encodings. To this end, I sketched out PDD28 with Allison and a cast of hundreds, and now I'm working on implementing it.
It's going to involve ripping out most of the current strings support in Parrot and replacing it with something that does two things. At a core level, it abstracts away access to strings, so that the whole encoding/charset/normalization palaver is hidden from ordinary string producers and consumers. It also provides good support for handling and converting between the various string formats in existence.

Getting this right is a Hard Problem, so my plan for ensuring a well-working proof of concept is to sketch it all out as a prototype in a higher-level language first. I've been implementing it in my current favourite high-level language, Perl 6. Another benefit is that having it all in Perl 6 makes it a little more accessible for people who want to look in and see how it all works.

I've created a branch in the repository called "strings", and there's a new scratch directory in there called "pseudocode". (It was going to contain pseudocode, but on the Perl-is-executable-pseudocode principle...)

Right now there is an implementation of Parrot strings which supports a few basic features, and some not-so-basic ones. The latest commit I made allows you to take a string in UTF8, convert it to a ParrotNative encoded string in NFG, and then convert it back to UTF8 again. This means we can read and write UTF8 and NFG. I've implemented about a quarter of the Parrot strings API.

If you have a spare minute or two, please have a look at strings/pseudocode and let me know what you think. If you can write Perl 6 (or even Perl 5) and read PDD28, then I'm especially looking forward to receiving more tests, awkward corner cases, attempts to break the code and so on. I'd like to know that our algorithms are robust. Feel free to check in or send me failing tests so I can make them work.

Other general thoughts, encouragements or comments would also be welcome!

Thanks,
Simon
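
P.S. For anyone who wants a feel for the UTF8 -> NFG -> UTF8 round trip mentioned above without checking out the branch, here is a minimal sketch in Python. It is not the Perl 6 code in strings/pseudocode. The names GraphemeTable, utf8_to_nfg and nfg_to_utf8 are invented for illustration, and two things are simplifying assumptions of mine rather than anything taken from PDD28: negative integers serve as ids for graphemes with no single codepoint, and a grapheme is treated as one character plus any following combining marks.

```python
import unicodedata


class GraphemeTable:
    """Hypothetical table of multi-codepoint graphemes, keyed by negative ids."""

    def __init__(self):
        self._ids = {}    # combining sequence -> synthetic id
        self._seqs = []   # position (-id - 1) -> combining sequence

    def intern(self, seq):
        # Allocate a new negative id the first time a sequence is seen.
        if seq not in self._ids:
            self._seqs.append(seq)
            self._ids[seq] = -len(self._seqs)
        return self._ids[seq]

    def expand(self, cell):
        return self._seqs[-cell - 1]


def _to_cell(cluster, table):
    # A lone codepoint stores itself; longer clusters get a synthetic id.
    return ord(cluster) if len(cluster) == 1 else table.intern(cluster)


def utf8_to_nfg(data, table):
    """Decode UTF-8 bytes into a list of integers, one per grapheme."""
    text = unicodedata.normalize("NFC", data.decode("utf-8"))
    cells, cluster = [], ""
    for ch in text:
        # Assumption: a character with combining class 0 starts a new grapheme.
        if cluster and unicodedata.combining(ch) == 0:
            cells.append(_to_cell(cluster, table))
            cluster = ""
        cluster += ch
    if cluster:
        cells.append(_to_cell(cluster, table))
    return cells


def nfg_to_utf8(cells, table):
    """Expand each grapheme cell back to codepoints and encode as UTF-8."""
    parts = [chr(c) if c >= 0 else table.expand(c) for c in cells]
    return "".join(parts).encode("utf-8")


if __name__ == "__main__":
    table = GraphemeTable()
    # 'q' + dot below + dot above has no precomposed form in Unicode.
    source = "aq\u0323\u0307b".encode("utf-8")
    cells = utf8_to_nfg(source, table)
    assert len(cells) == 3                      # one integer per grapheme
    assert nfg_to_utf8(cells, table) == source  # round trip is lossless here
```

Note that the output is always NFC-normalized, so the round trip only reproduces the input byte-for-byte when the input was already in NFC. That is exactly the kind of corner case I'd like tests for.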