Hello,

I am very interested in chromatic's pheme language. I have been reading through
the code and looking at your TODO list. I thought I would tackle some of the
easier issues to get a handle on PIR and help out a bit.

Questions:

1. Are you targeting R5RS or R6RS? I think R6RS would be a better fit for Parrot
   myself. In particular, its spec for (library foo), i.e. namespaces, would
   help pheme integrate better with Parrot and other languages.
 
I decided to start with something easy: whitespace.

I looked up R6RS, which has a nice BNF grammar that makes a useful starting point.
I came up with the rules below:

rule ignore { [ <comment> | <delimiter> ]* }

token comment { \; \N* <eol> }

rule delimiter { <blank> | <eol> }
token blank { <[\ \t]>+ }
token eol { \r?\n }

I know almost nothing about PGE yet (I am reading the docs), but basically
what I would like to do is build tokens out of tokens. Ideally I would like
to make both "ignore" and "delimiter" tokens, not rules.

This is mostly a writing convenience. In a difficult sed script I once wrote
to convert a batch of broken C++ declarations, I stored regex building blocks
in shell variables and then assembled them into higher-level expressions with
basic string interpolation.
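
The building-block approach can be sketched outside PGE, here in Python, by
keeping each piece as a regex string and interpolating it into larger
patterns, much like the shell variables in the sed script. The names BLANK,
EOL, COMMENT, DELIMITER, IGNORE and skip_ignorable are hypothetical and only
mirror the tokens in this message; the sketch says nothing about how PGE
itself composes tokens.

```python
import re

# Hypothetical building blocks mirroring the PGE tokens in this message.
# Each one is a plain regex *string*, so it can be interpolated into bigger ones.
BLANK = r"[ \t]+"
EOL = r"\r?\n"                        # LF, or CR LF as used on Windows
COMMENT = rf";[^\r\n]*{EOL}"          # ';' up to and including the line end
DELIMITER = rf"(?:{BLANK}|{EOL})"
IGNORE = rf"(?:{COMMENT}|{DELIMITER})*"

IGNORE_RE = re.compile(IGNORE)


def skip_ignorable(text: str, pos: int = 0) -> int:
    """Return the index of the first character at or after `pos` that is
    neither whitespace nor part of a comment."""
    return IGNORE_RE.match(text, pos).end()


source = "  ; a comment\r\n\t(define x 1)"
print(skip_ignorable(source))  # -> 16, the index of "("
```

Like the comment token, this COMMENT requires a line end, so a comment on the
last line of a file with no trailing newline is not skipped.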

Questions:

1. Can tokens be used to build other tokens? In Perl 5 I would get this sort of
   functionality by compiling a regex with string interpolation. If so, is there
   a name for this feature in PGE?

2. token eol { \r?\n }

   This is pretty clearly for handling Windows line terminators. This is the sort
   of thing that should be pushed down into Parrot: a special builtin "eol" or
   "end-of-line" token could remove this boilerplate from individual language
   grammars. Is this worth an RT ticket? Something like this would definitely fit
   the "conservation of cruft" principle.

3. Is there a tool for pretty-printing an AST dump? I am thinking of dumping
   the AST, laying it out with a classic tree-drawing algorithm, and drawing
   the tree as SVG. Something like that could probably be done easily in
   Perl 5. Does such a tool already exist?

4. How do you debug the AST? Are there any recommended tools?
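
For question 2 above, a small Python check (not PGE) of which line
terminators each candidate pattern consumes as a single end-of-line. Windows
ends lines with CR LF, so the optional \r has to come before the \n. The
third pattern is only a guess at what a builtin "eol" token might accept; it
is not an existing Parrot or PGE feature.

```python
import re

# Candidate end-of-line patterns (regex source strings).
CANDIDATES = [r"\n\r?", r"\r?\n", r"\r\n|\n|\r"]

# Conventional line terminators: Windows uses CR followed by LF.
TERMINATORS = {"unix": "\n", "windows": "\r\n", "classic mac": "\r"}


def accepted(pattern: str) -> list:
    """Names of the terminators that `pattern` consumes as exactly one line end."""
    regex = re.compile(pattern)
    return [name for name, term in TERMINATORS.items() if regex.fullmatch(term)]


for pattern in CANDIDATES:
    print(f"{pattern:12} accepts: {', '.join(accepted(pattern))}")
```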
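
For question 3 above, a minimal sketch of the tree-drawing idea, written in
Python rather than Perl 5. It assumes the AST dump has already been converted
into nested (label, children) tuples (a hypothetical shape, not PGE's actual
dump format), and it uses a simple layout (leaves in successive columns, each
parent centered over its children) rather than a full tidy-tree algorithm
such as Reingold-Tilford. The function names are hypothetical.

```python
from xml.sax.saxutils import escape

X_STEP, Y_STEP, MARGIN = 80, 60, 30  # pixel spacing (arbitrary choices)


def layout(node, depth, next_slot, nodes, edges):
    """Place `node` and its subtree and return the node's x coordinate.

    Leaves take successive x slots from left to right; each parent is centered
    over its first and last child. Appends (x, y, label) to `nodes` and
    ((x1, y1), (x2, y2)) parent-to-child segments to `edges`.
    """
    label, children = node
    y = depth * Y_STEP
    if children:
        child_xs = [layout(c, depth + 1, next_slot, nodes, edges) for c in children]
        x = (child_xs[0] + child_xs[-1]) / 2
        for cx in child_xs:
            edges.append(((x, y), (cx, y + Y_STEP)))
    else:
        x = next_slot[0] * X_STEP
        next_slot[0] += 1
    nodes.append((x, y, label))
    return x


def ast_to_svg(root):
    """Render a (label, children) tree as a standalone SVG document string."""
    nodes, edges = [], []
    layout(root, 0, [0], nodes, edges)
    width = max(x for x, _, _ in nodes) + 2 * MARGIN
    height = max(y for _, y, _ in nodes) + 2 * MARGIN
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for (x1, y1), (x2, y2) in edges:
        parts.append(f'<line x1="{x1 + MARGIN}" y1="{y1 + MARGIN}" '
                     f'x2="{x2 + MARGIN}" y2="{y2 + MARGIN}" stroke="black"/>')
    for x, y, label in nodes:
        parts.append(f'<text x="{x + MARGIN}" y="{y + MARGIN}" '
                     f'text-anchor="middle">{escape(label)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical AST for (define x 1), already converted to nested tuples.
ast = ("list", [("symbol: define", []), ("symbol: x", []), ("integer: 1", [])])
svg = ast_to_svg(ast)
```

Writing svg to a file and opening it in a browser shows the tree. Because
every subtree owns a contiguous range of leaf columns, nodes at the same depth
never overlap, though long labels can still collide.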

Atom handling:

I noticed that the atom handling looks very alpha. It looks like you want to
distinguish between symbols ("foo", "foo-bar") and literal values ("#t",
"#f"). This is really nasty to do at the lexical level.

A nicer way to do this would be to form a string token like this:

token string { [ <!reserved> \S ]+ }

token reserved {
  # r5 reserved
  # future reserved
  <[
    \( \) \# '

    \[ \] \{ \}
  ]>
}
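
A rough Python analogue of the string/reserved split (not PGE), useful for
checking what such a lexer produces: each reserved character becomes its own
token, and every other maximal run of non-whitespace characters becomes a
string. RESERVED, TOKEN_RE and lex are hypothetical names.

```python
import re

# The characters listed in the `reserved` token above: ( ) # ' [ ] { }
RESERVED = r"()#'\[\]{}"

# Either one reserved character, or a maximal run of characters that are
# neither reserved nor whitespace (the analogue of `[ <!reserved> \S ]+`).
# finditer() skips the whitespace between matches.
TOKEN_RE = re.compile(rf"([{RESERVED}])|([^{RESERVED}\s]+)")


def lex(text: str) -> list:
    """Split Scheme-like source into ('reserved', ch) and ('string', run) pairs.

    Assumes ';' comments were already removed by the `ignore` rule."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        reserved, string = m.groups()
        tokens.append(("reserved", reserved) if reserved else ("string", string))
    return tokens


print(lex("(define foo-bar '(1 2))"))
```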

At this point most languages are going to need a post-lexical analysis of the
string to distinguish literal values from symbols. A syntax like this would be
nice:

token truth:string {
  \# <[tf]>
}

token integral:string {
  \d+
}

token symbol:string fallback

This syntax would indicate that once a string token has been lexed, it is
analyzed again by a regex and converted to a truth value, an integral value,
or, if all else fails, a symbol.
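
The two-stage idea can be prototyped outside PGE. This Python sketch (the
RULES table and classify are hypothetical names, not proposed PGE syntax)
runs each lexed string through an ordered list of full-match patterns and
falls back to a symbol when none applies.

```python
import re

# Ordered post-lexical rules: (kind, pattern, converter). The first pattern that
# matches the whole lexed string wins; these mirror the proposed
# `truth:string` and `integral:string` tokens.
RULES = [
    ("truth", re.compile(r"#[tf]"), lambda s: s == "#t"),
    ("integral", re.compile(r"\d+"), int),
]


def classify(atom: str) -> tuple:
    """Re-analyse an already-lexed string as a truth value, an integer, or,
    if nothing else matches, a symbol (the proposed `symbol:string fallback`)."""
    for kind, pattern, convert in RULES:
        if pattern.fullmatch(atom):
            return kind, convert(atom)
    return "symbol", atom


for atom in ["#t", "42", "foo-bar"]:
    print(atom, "->", classify(atom))
```

Keeping the rules in an ordered table makes the precedence explicit: a string
such as "#x" matches neither rule and therefore comes back as a symbol.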

If this is not already implemented, I would like to open a TODO RT ticket
for it.

Thanks for any comments/suggestions.

Cheers,
Mike Mattie - [EMAIL PROTECTED]
