On 11/14/2013 12:22 PM, Jelle Feringa wrote:
> My question is what is the right way to go about this?
> Here we have an example of a procedure defined in RAPID.

The example seems to be missing, but in general you don't start with the parser. You start with the scanner, which identifies the individual words (tokens) that you need to recognize.

> PROC top_front( string strNoStepIn )
>      ! procedure block
>      MoveL ...;
> ENDPROC

becomes a sequence of tokens, one per line. I added empty lines and // text to clarify what you are reading. Token names are written in all uppercase.

PROC
IDENTIFIER(top_front)
PARENTHESIS_OPEN
STRING    // if "string" is not a built-in, it would become an IDENTIFIER
IDENTIFIER(strNoStepIn)
PARENTHESIS_CLOSE

// Assuming ! means 'comment', skipped it.

MoveL

// skipped some

SEMICOLON
ENDPROC

You break your input text down into these small elementary words with the scanner. I didn't do it here, but it's often useful to add a suffix or prefix to keywords (I use ...KW, e.g. PROCKW) and to other tokens (I use ...TK). It makes the parser rules below more readable, and it avoids name conflicts between closely related tokens, like the keyword string denoting a type and a literal string like "abcd".
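To make the scanner step concrete, here is a minimal sketch in plain Python using only the `re` module (not PLY's lexer). The token names follow the list above. The keyword table, the case-insensitive keyword lookup, the comment rule for `!`, and the `pTarget` argument standing in for the elided MoveL arguments are all assumptions for illustration, not taken from the RAPID specification.

```python
import re

# Assumed keyword table; a real RAPID scanner needs many more entries.
KEYWORDS = {"proc": "PROC", "endproc": "ENDPROC", "string": "STRING"}

# Order matters: earlier patterns win when several could match.
TOKEN_SPEC = [
    ("COMMENT", r"![^\n]*"),            # assumed: '!' comments run to end of line
    ("STRING_LITERAL", r'"[^"\n]*"'),   # simplified: no escaped quotes
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("PARENTHESIS_OPEN", r"\("),
    ("PARENTHESIS_CLOSE", r"\)"),
    ("COMMA", r","),
    ("SEMICOLON", r";"),
    ("WHITESPACE", r"\s+"),
    ("UNKNOWN", r"."),                  # anything this sketch does not know about
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token_name, matched_text) pairs."""
    tokens = []
    for match in MASTER.finditer(text):
        kind, value = match.lastgroup, match.group()
        if kind in ("COMMENT", "WHITESPACE"):
            continue  # the parser never sees these
        if kind == "IDENTIFIER":
            # Keywords are identifiers that appear in the keyword table
            # (lookup is case-insensitive by assumption).
            kind = KEYWORDS.get(value.lower(), "IDENTIFIER")
        tokens.append((kind, value))
    return tokens


# 'pTarget' is a made-up argument replacing the elided MoveL arguments.
source = """PROC top_front( string strNoStepIn )
    ! procedure block
    MoveL pTarget;
ENDPROC"""

for kind, value in tokenize(source):
    print(kind, value)
```

In this sketch MoveL comes out as an IDENTIFIER, because it is not in the keyword table. Whether instructions deserve their own tokens is a design choice you make in the scanner.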



The parser takes this stream of tokens, and reconstructs the parts you want to keep together, with grammar rules, like

Procedure : PROC IDENTIFIER PARENTHESIS_OPEN FormalParameters PARENTHESIS_CLOSE Statements ENDPROC ;
Procedure : PROC IDENTIFIER PARENTHESIS_OPEN PARENTHESIS_CLOSE Statements ENDPROC ;

A "Procedure" is a thing that starts with the keyword PROC and ends with the keyword ENDPROC. There are 2 variants, one with and one without FormalParameters.

FormalParameters : FormalParameter
                 | FormalParameters COMMA FormalParameter
                 ;

FormalParameter : Type IDENTIFIER ;

Type : STRING
     | ...
     ;

FormalParameters is one or more FormalParameter, separated by COMMA. Each FormalParameter is a Type followed by an IDENTIFIER.
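As a rough illustration of what these rules mean, here is a hand-written recursive-descent sketch in plain Python. This is not what PLY does (PLY builds an LALR parser from the rules themselves); it only shows how the token stream gets regrouped. Tokens are assumed to be (name, text) pairs, the left-recursive FormalParameters rule becomes a loop, and because the rules for Statements are not given above, the body is simply collected as raw tokens.

```python
def parse_procedure(tokens):
    """Parse one Procedure from a list of (token_name, text) pairs."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {kind}, found {found}")
        value = tokens[pos][1]
        pos += 1
        return value

    # Procedure : PROC IDENTIFIER PARENTHESIS_OPEN ... PARENTHESIS_CLOSE ...
    expect("PROC")
    name = expect("IDENTIFIER")
    expect("PARENTHESIS_OPEN")

    params = []
    if peek() != "PARENTHESIS_CLOSE":  # the variant with FormalParameters
        while True:
            # FormalParameter : Type IDENTIFIER ;  (only Type : STRING here)
            ptype = expect("STRING")
            pname = expect("IDENTIFIER")
            params.append((ptype, pname))
            if peek() != "COMMA":
                break
            expect("COMMA")
    expect("PARENTHESIS_CLOSE")

    # Statements: placeholder, keep the raw tokens up to ENDPROC.
    body = []
    while peek() not in ("ENDPROC", None):
        body.append(tokens[pos])
        pos += 1
    expect("ENDPROC")
    return {"name": name, "params": params, "body": body}


# Token stream for the example procedure, written out by hand.
example = [
    ("PROC", "PROC"), ("IDENTIFIER", "top_front"),
    ("PARENTHESIS_OPEN", "("), ("STRING", "string"),
    ("IDENTIFIER", "strNoStepIn"), ("PARENTHESIS_CLOSE", ")"),
    ("IDENTIFIER", "MoveL"), ("SEMICOLON", ";"),
    ("ENDPROC", "ENDPROC"),
]
print(parse_procedure(example))
```

With PLY you would write the grammar rules as docstrings of p_... functions instead, and the generated parser takes care of the bookkeeping that `peek` and `expect` do by hand here.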

> Intuitively I would write a regex that matches the
> name of the procedure, its argument and the procedure block.

In general, regex is not powerful enough to handle programming languages. 
Consider the case

string x = "endproc";

in the middle of a proc. Good luck detecting the right 'endproc' word. Similar cases exist when a user comments out a part of a proc.

You may get it working for a set of cases, but handling every case that the RAPID compiler accepts is probably impossible.
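A small sketch of the failure, using Python's `re` module. The regex is just one plausible attempt at "match name, arguments and block", and the source text (including the `DoSomething` instruction) is made up for illustration. Case-insensitive matching is an assumption about RAPID keywords; with case-sensitive matching a literal "ENDPROC" causes the same problem.

```python
import re

# Made-up RAPID-like source with the word "endproc" inside a string literal.
source = """PROC demo()
    string x = "endproc";
    DoSomething;
ENDPROC"""

# Naive attempt: name, arguments, then everything up to the first ENDPROC.
naive = re.compile(
    r"PROC\s+(\w+)\s*\((.*?)\)(.*?)ENDPROC",
    re.IGNORECASE | re.DOTALL,
)

match = naive.search(source)
name, args, body = match.groups()

print("name:", name)
print("body:", repr(body))
# The match stops at the "endproc" inside the string literal, so the
# captured body is cut off before DoSomething and the real ENDPROC.
print("body is complete:", "DoSomething" in body)
```

A scanner does not have this problem, because the whole literal is recognized as a single string token before anything gets a chance to look for the ENDPROC keyword.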

> Since this pattern is so present in the language, I'd like to get it
> right and in a p(l)ythonic manner. Thing is that I'm too new to the
> parsing to really see that.

The pattern is not really special, { .. } or BEGIN .. END are mostly the same thing, although they group different things.

Good luck with your parsing adventure,
Albert
