Fwd: URI replacement pseudocode

Aaron Sherman Mon, 17 May 2010 12:01:48 -0700

Oops, took this off-list by accident.

---------- Forwarded message ----------
From: ajs <a...@ajs.com>
Date: Mon, May 17, 2010 at 2:59 PM
Subject: Re: URI replacement pseudocode
To: Moritz Lenz <mor...@faui2k3.org>

Thank you for your responses!

On Mon, May 17, 2010 at 1:37 PM, Moritz Lenz <mor...@faui2k3.org> wrote:

> Aaron Sherman wrote:
> > Here's the code:
> >
> >
> > https://docs.google.com/leaf?id=0B41eVYcoggK7YjdkMzVjODctMTAxMi00ZGE0LWE1OTAtZTg1MTY0Njk5YjY4&hl=en
>
> I think your code would benefit greatly from actually being run
> with Rakudo (or at least the parts that are already implemented), as well
> as from a version control system.
>

(Re: storage. Yes, I intend to get this into something; not sure what yet.
Git is preferred, I presume?)

I had a hard time even getting basic code working like:

  token foo { blah }
  if "blah" ~~ m/<foo>/ { say "blah!" }

(See my question to the list, last week)

so I really didn't want to venture into trying to get this working, but
yeah, now that it's done I'll see how Rakudo chokes on it.
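For comparison, a minimal sketch of that test for a current Raku compiler (an assumption; Rakudo as of 2010 may well behave differently). Outside a grammar, a plain `token` declaration is method-scoped, so the sketch declares it with `my`, which makes it a lexical regex that `<foo>` can find:

```raku
# `my` makes the token a lexical regex, so <foo> below can find it by name.
my token foo { blah }

if "blah" ~~ m/<foo>/ { say "blah!" }    # prints "blah!"
```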

>
> > So, my questions are:
> >
> > * Is this code doing anything that is explicitly not Perl 6ish?
>
> Some things I've noticed:
> * you put lots of subs into roles - you probably meant methods
>

Well... that's a fair question. I wasn't too clear on what being a method of
a grammar means. Should I be calling these as class methods?
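A grammar is a class whose tokens and rules are methods, so an ordinary `method` in a grammar (or in a role it does) behaves like any other method: if it uses no per-parse state, it can be called on the grammar's type object, much like a class method. A minimal sketch, assuming current Raku; `Demo::Grammar` and `default-port` are hypothetical names, not part of the URI code:

```raku
grammar Demo::Grammar {
    token TOP    { <scheme> ':' .* }
    token scheme { <[a..z]> <[a..z 0..9 \+ \. \-]>* }

    # An ordinary helper method. It uses no per-parse state, so it can be
    # called on the type object, much like a class method.
    method default-port(Str $scheme) {
        %( http => 80, https => 443 ){$scheme} // Int
    }
}

say Demo::Grammar.default-port('http');              # 80
my $m = Demo::Grammar.parse('http://example.com/');
say ~$m<scheme>;                                       # http
```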

> * Don't inherit from roles, implement them with 'does'
>

I did that, didn't I? Did I typo something?

   grammar URI::rfc2396 does URI::Grammarish ...

> * the grammars contain a mixture of tokens for parsing and of
> methods/subs for data extraction; yet Perl 6 offers a nice way to
> separate the two, in the form of action/reduction methods; your code
> might benefit from them.
>

Do you have a pointer for some discussion of this? I'd love to pursue it.
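A minimal sketch of that split, assuming current Raku; `Tiny::URI` and `Tiny::URI::Actions` are hypothetical and far simpler than RFC 2396. The grammar only parses; an action object whose method names match token names builds the result with `make`, and the caller reads it back with `.made`:

```raku
# Parsing only: the grammar says nothing about what to build.
grammar Tiny::URI {
    token TOP       { <scheme> ':' <hier-part> }
    token scheme    { <[a..z A..Z]> <[a..z A..Z 0..9 \+ \. \-]>* }
    token hier-part { \S* }
}

# Data extraction: one action method per token whose result we care about.
class Tiny::URI::Actions {
    method TOP($/) {
        make %( scheme => $<scheme>.made, rest => ~$<hier-part> );
    }
    method scheme($/) { make $/.Str.lc }
}

my $match = Tiny::URI.parse('HTTP://example.com/', :actions(Tiny::URI::Actions.new));
say $match.made<scheme>;    # http
say $match.made<rest>;      # //example.com/
```

Because the actions live in a separate object, a caller could pass a different action class to the same grammar to build a different result.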

> * class URI::GrammarType seems not very extensible... maybe keep a hash
> of URI names that map to URIs, which can be extended by method calls?
>

The idea that I was working with was that you would provide the grammar
itself when you wanted to do something custom, and the string names were
just a convenience for the default cases.  So, for example:

  my URI $privatewww .= new("ajs://perl**6", :spec(::MyURI::Spec));

Where MyURI::Spec could be any grammar that implements the URI::Grammarish
interface (see grammar interface discussion, below). I can look into
extending it with string names as well, though.
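One hypothetical way to get both: accept a grammar type object directly, and also look string names up in a registry that callers can extend. The names `Spec::Registry`, `register`, `resolve`, `Demo::Http`, and `My::Spec` are illustrative only, and current Raku is assumed:

```raku
grammar Demo::Http { token TOP { 'http:' \S* } }

class Spec::Registry {
    # Class-wide table of short names -> grammar type objects.
    my %known = http => Demo::Http;

    # Callers can add their own names at run time.
    method register(Str $name, Grammar:U $grammar) { %known{$name} = $grammar }

    # A grammar type object is passed straight through; a string is looked up.
    multi method resolve(Grammar:U $grammar) { $grammar }
    multi method resolve(Str:D $name) {
        %known{$name} // die "Unknown URI spec '$name'";
    }
}

grammar My::Spec { token TOP { 'ajs:' \S* } }
Spec::Registry.register('ajs', My::Spec);

say Spec::Registry.resolve('http').^name;                       # Demo::Http
say Spec::Registry.resolve(My::Spec).^name;                     # My::Spec
say so Spec::Registry.resolve('ajs').parse('ajs://perl**6');    # True
```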

> > * Is this style of pluggable grammar the correct approach?
>
> Looks good, from a first glance.
>

Thanks!

>
> > * Should I hold off until R* to even try to convert this into working
> > code?
>
> No need for that. The support for grammars and roles is pretty good;
> character classes and match objects are still a bit unstable/whacky.
>

Is there any collected wisdom available on this? I'd love to not run around
chasing my own tail trying to figure out why something doesn't work.

> > * Am I correct in assuming that <...> in a regex is intended to allow the
> > creation of interface roles for grammars?
>
> You lost me here. <identifier(...)> calls a named rule (with arguments).
> Could you rephrase your question?

Sure.

All S05 says is "The <...>, <???>, and <!!!> special tokens have the same
"not-defined-yet" meanings within regexes that the bare ellipses have in
ordinary code." That doesn't tell me a lot, but it seems to imply that:

role blah { token bletch { <...> } }

is roughly analogous to:

role blah { method bletch {...} }

that is to say, the role should have an interface which, when applied to a
grammar, would assert the presence of a bletch token. Am I reading too much
into this? If yes, is there a way to assert role-based interfaces on
grammars? The main reason I wanted this was for the very parametric grammar
selection we were talking about, above, where the given block says:

given $type {
    when .does(URI::Grammarish) { $.gtype = $_ }
}

I'm assuming, of course, that I can make such assertions about a grammar in
the same way that I would make them about a class. Is this true? Have I
identified an interface token/rule correctly given that that was my goal?
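For comparison, in current Raku (an assumption; this may differ from what S05 intended in 2010) an interface role for grammars is usually written with stub methods, whose body is `...`. Composing the role into a grammar fails at compile time unless the grammar supplies a method of each required name, and tokens qualify because they are methods. `.does` then works on the grammar's type object, as in the `given` block. A sketch with hypothetical names:

```raku
# The interface: any grammar doing this role must supply both names.
role Grammarish {
    method URI_reference { ... }
    method scheme        { ... }
}

# Tokens are methods, so they satisfy the stubs.
grammar Demo::Spec does Grammarish {
    token URI_reference { <scheme> ':' \S* }
    token scheme        { <[a..z]>+ }
}

grammar Plain { token TOP { .* } }

say Demo::Spec.does(Grammarish);                              # True
say Plain.does(Grammarish);                                   # False
say so Demo::Spec.parse('http://x', :rule<URI_reference>);    # True
```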

> > * I guessed wildly at how I should be invoking the match against a saved
> > "token" reference:
> >         if $s ~~ m/^ <.$.spec.gtype.URI_reference> $/ {
> >   is that correct?
>
> probably just $s ~~ /^ $regex $/;
>

But what should $regex contain? I have a $.spec which contains a reference
to the URI::GrammarType object whose $.gtype identifies the grammar I should
be using. That grammar is guaranteed to have a URI_reference rule, so the
variable is:

  $.spec.gtype.URI_reference

Should that be:

  if $s ~~ m/^ <$.spec.gtype.URI_reference> $/ { ... }

then? Do I need to copy that into a lexical to avoid confusing the rule
parser? I think I'll do that, just to avoid all the confusion anyway.
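Another option, sketched for current Raku with hypothetical names: rather than interpolating the token into a regex, call the grammar's `parse` method with `:rule` naming the starting token. `parse` only succeeds if that token consumes the whole string, which gives the effect of the `^ ... $` anchors, and a plain variable holding the type object stands in for `$.spec.gtype`:

```raku
grammar Demo::RFC {
    token URI_reference { <scheme> ':' \S+ }
    token scheme        { <[a..z A..Z]> <[a..z A..Z 0..9 \+ \. \-]>* }
}

# Stand-in for $.spec.gtype: a grammar type object held in a variable.
my $gtype = Demo::RFC;

for 'http://example.com/', 'not a uri' -> $s {
    # Start at URI_reference; parse fails unless the whole string matches.
    if $gtype.parse($s, :rule<URI_reference>) -> $m {
        say "$s: scheme is $m<scheme>";
    }
    else {
        say "$s: no match";
    }
}
```

If matching only a prefix is wanted, `subparse` takes the same arguments without requiring the whole string to match.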

> > * Are implementations going to be OK with massive character classes like:
> > <+[\xA0 .. \xD7FF] + [\xF900 .. \xFDCF] + [\xFDF0 .. \xFFEF] +
> >   [\x10000 .. \x1FFFD] + [\x20000 .. \x2FFFD] +
> >   [\x30000 .. \x3FFFD] + [\x40000 .. \x4FFFD] +
> >   [\x50000 .. \x5FFFD] + [\x60000 .. \x6FFFD] +
> >   [\x70000 .. \x7FFFD] + [\x80000 .. \x8FFFD] +
> >   [\x90000 .. \x9FFFD] + [\xA0000 .. \xAFFFD] +
> >   [\xB0000 .. \xBFFFD] + [\xC0000 .. \xCFFFD] +
> >   [\xD0000 .. \xDFFFD] + [\xE1000 .. \xEFFFD]>
> > (from the IRI specification)
>
> Funny thing, why does it exclude the FFFE and FFFF codepoints?
> Anyway, I can't answer that question.
>

FFFE is the byte-swapped form of FEFF, the byte order mark, so the pair is
used to detect byte ordering and really shouldn't be part of a URI (my take
is that URIs should exist in a context in which byte ordering is already
settled).

The Unicode spec says that FFFF is guaranteed not to be a valid Unicode
character, but does not explain why. [http://unicode.org/charts/PDF/UFFF0.pdf]
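As a sketch of the same set in current Raku syntax (assumed, not tested against 2010-era Rakudo), the ranges can all go in one bracketed character class instead of a chain of `+` unions. The token name follows RFC 3987's `ucschar` production. The gaps between ranges also leave out the noncharacters, such as U+FDD0..U+FDEF and the last two code points of each plane (U+FFFE, U+FFFF, U+1FFFE, and so on):

```raku
# RFC 3987 "ucschar". Whitespace inside <[ ]> is ignored, so the ranges
# can be laid out one group per line.
my token ucschar {
    <[
        \xA0..\xD7FF     \xF900..\xFDCF   \xFDF0..\xFFEF
        \x10000..\x1FFFD \x20000..\x2FFFD \x30000..\x3FFFD
        \x40000..\x4FFFD \x50000..\x5FFFD \x60000..\x6FFFD
        \x70000..\x7FFFD \x80000..\x8FFFD \x90000..\x9FFFD
        \xA0000..\xAFFFD \xB0000..\xBFFFD \xC0000..\xCFFFD
        \xD0000..\xDFFFD \xE1000..\xEFFFD
    ]>
}

for "\xE9", "\xFFFE", "\x10FFFF", "a" -> $c {
    say $c.ord.base(16), ': ', so $c ~~ /^ <ucschar> $/;
}
# Prints:
# E9: True
# FFFE: False
# 10FFFF: False
# 61: False
```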

-- 
Aaron Sherman
Email or GTalk: a...@ajs.com
http://www.ajs.com/~ajs
