Over the past week, I've been using my scant bits of nighttime coding to
cobble together a pseudocode version of what I think the URI module should
look like. There's already one available as example code, but it doesn't
actually implement either the URI or IRI spec correctly. Instead, this
approach uses a pluggable grammar so that you can write:

  my URI $uri .= new( get_url_from_user(), :spec<IRI> )

which would parse the given URL using the RFC3987 IRI grammar. By default,
it will use RFC3986 to parse URIs, which does not implement the UCS
extensions. It can even handle the "legacy" RFC2396 and regex-based RFC3986
variations.
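
To make the pluggable idea concrete, here is a minimal sketch of the
dispatch, not the code linked below. The grammar names, the %grammar-for
table and parse-uri are all made up for illustration, and the rules are toy
stand-ins for the real RFC productions. It assumes a Rakudo where
Grammar.parse accepts :rule.

```raku
# Toy stand-in for the RFC3986 grammar (hypothetical, heavily simplified).
grammar RFC3986ish {
    token URI_reference { <scheme> ':' <rest> }
    token scheme        { <[A..Za..z]> <[A..Za..z0..9\+\.\-]>* }
    token rest          { <[\x21..\x7E]>* }          # printable ASCII only
}

# Toy stand-in for the RFC3987 grammar: same shape, wider character set.
grammar RFC3987ish is RFC3986ish {
    token rest { [ <[\x21..\x7E]> | <[\xA0..\xD7FF]> ]* }
}

# Hypothetical lookup table from spec name to grammar type object.
my %grammar-for = URI => RFC3986ish, IRI => RFC3987ish;

# Hypothetical helper: pick the grammar by name and parse the whole string.
sub parse-uri(Str $text, Str :$spec = 'URI') {
    # Type objects are undefined, so test for the key rather than the value.
    die "unknown spec: $spec" unless %grammar-for{$spec}:exists;
    %grammar-for{$spec}.parse($text, :rule<URI_reference>);
}

say so parse-uri('http://example.com/');
say so parse-uri("http://example.com/\xE9");
say so parse-uri("http://example.com/\xE9", :spec<IRI>);
```

The point of the table is only that adding another spec means adding one
grammar and one entry; nothing in the caller changes.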

Here's the code:

https://docs.google.com/leaf?id=0B41eVYcoggK7YjdkMzVjODctMTAxMi00ZGE0LWE1OTAtZTg1MTY0Njk5YjY4&hl=en

So, my questions are:

* Is this code doing anything that is explicitly not Perl 6ish?
* Is this style of pluggable grammar the correct approach?
* Should I hold off until R* to even try to convert this into working code?
* What's the best way to write tests for this and package it?
* Am I correct in assuming that <...> in a regex is intended to allow the
creation of interface roles for grammars?
* I guessed wildly at how I should be invoking the match against a saved
"token" reference:
        if $s ~~ m/^ <.$.spec.gtype.URI_reference> $/ {
  is that correct?
* Are implementations going to be OK with massive character classes like:
<+[\xA0 .. \xD7FF] + [\xF900 .. \xFDCF] + [\xFDF0 .. \xFFEF] +
  [\x10000 .. \x1FFFD] + [\x20000 .. \x2FFFD] +
  [\x30000 .. \x3FFFD] + [\x40000 .. \x4FFFD] +
  [\x50000 .. \x5FFFD] + [\x60000 .. \x6FFFD] +
  [\x70000 .. \x7FFFD] + [\x80000 .. \x8FFFD] +
  [\x90000 .. \x9FFFD] + [\xA0000 .. \xAFFFD] +
  [\xB0000 .. \xBFFFD] + [\xC0000 .. \xCFFFD] +
  [\xD0000 .. \xDFFFD] + [\xE1000 .. \xEFFFD]>
(from the IRI specification)
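
On the last two questions, a small sketch of what I would expect to work,
under some assumptions: the grammar IRI-chars and the token ucs-run are
invented names, the class is cut down to four of the ranges above, and the
saved reference is just a variable holding the grammar type. It sidesteps
the <.$.spec...> syntax entirely by calling .parse with :rule on the saved
grammar, which is not necessarily how the final module should do it.

```raku
grammar IRI-chars {
    # Abbreviated ucschar: first three BMP ranges plus one astral range.
    token ucschar {
        <+[\xA0 .. \xD7FF] + [\xF900 .. \xFDCF] + [\xFDF0 .. \xFFEF] + [\x10000 .. \x1FFFD]>
    }
    token ucs-run { <ucschar>+ }
}

# A saved grammar reference, standing in for $.spec.gtype.
my $gtype = IRI-chars;

# Name the rule to start from instead of interpolating it into a regex.
say so $gtype.parse("\x[E9]\x[4E2D]", :rule<ucs-run>);   # both in \xA0 .. \xD7FF
say so $gtype.parse("abc",            :rule<ucs-run>);   # plain ASCII, outside every range
say so $gtype.parse("\x[1F600]",      :rule<ucs-run>);   # inside \x10000 .. \x1FFFD
```

Since .parse anchors at both ends of the string, the explicit ^ and $ from
my guess above are not needed with this form.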

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs
