Pedro Lopes wrote:
> Hi, I have a difficult(?) problem to solve in PLY. I'm trying to parse
> a
> little language that allows statements to be broken across several
> lines
> by "escaping" the newline with \. This wouldn't be unusual, except in
> this case the break is allowed anywhere, even in the middle of a
> token.

In my view, the simplest solution would be to drop that last requirement and only
accept a restricted subset of such specifications as input.

> Here is a real example, note how the "chrU42" identifier is split
> across
> 2 lines:
> 
>   test4 = chrt34||chrh35||chre36||chrF38||chrO39||chrR40||chrM41||chr\
>   U42||chrL43||chrA44||chrP46||chrA47||chrR48||chrS49||chrE50||chrR51\
>   &&y>0.03&&y<0.07

I see, but I am not convinced of the need.

> Now, this would be easy to do by preprocessing the input to PLY with
> a
> regex, but I would rather do it the lexer. Problem is, I can't figure
> out how. Ignored characters in the lexer aren't really ignored
> because
> they still act as token delimiters, so that doesn't work. Ideas?

You'd have to allow the escape to appear anywhere between two characters of a 
token, i.e. a number token [0-9]+ would become (\\\n)*[0-9]((\\\n)*[0-9])* where 
"(\\\n)" is the escape sequence. (Note the trailing *, not +, so that a single 
digit still matches.)
I suspect you also have to think carefully about whitespace to prevent 
ambiguity problems.

While this will probably work, it is ugly to say the least.
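To make the idea concrete, here is a minimal sketch using plain `re` (the same pattern would go into a PLY token rule); the sample input is illustrative:

```python
import re

# A number token [0-9]+ extended so that the escape sequence "\<newline>"
# may appear before the first digit and between any two digits.
# The trailing '*' matters: with '+' the pattern would demand two digits.
NUMBER = re.compile(r'(\\\n)*[0-9]((\\\n)*[0-9])*')

text = '12\\\n34'                 # the number 1234 split across two lines
m = NUMBER.match(text)
token = m.group(0).replace('\\\n', '')  # strip the line continuations
print(token)                      # -> 1234
```

In PLY you would put the same regex on a `t_NUMBER` function and do the `replace` on `t.value` inside it.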




Aside from the pre-processing you already suggested, you could also write 
a custom scanner yourself. It is not terribly difficult, and it gives you the 
freedom to handle everything correctly (e.g. the position information of tokens 
is going to be a challenge with the disappearing \n characters).
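The position bookkeeping that a custom scanner would need can be sketched as follows; all names here are illustrative, and the idea is simply to delete the "\<newline>" pairs while remembering each surviving character's original (line, column):

```python
def unescape_with_positions(text):
    """Return (clean_text, positions) where positions[i] is the
    (line, col) of clean_text[i] in the original input."""
    clean, positions = [], []
    line, col = 1, 0
    i = 0
    while i < len(text):
        # Drop an escaped newline entirely, but advance the line counter
        # so later characters still report their true source position.
        if text[i] == '\\' and i + 1 < len(text) and text[i + 1] == '\n':
            i += 2
            line += 1
            col = 0
            continue
        clean.append(text[i])
        positions.append((line, col))
        if text[i] == '\n':
            line += 1
            col = 0
        else:
            col += 1
        i += 1
    return ''.join(clean), positions

src = 'chr\\\nU42'                 # the identifier chrU42 split mid-token
clean, pos = unescape_with_positions(src)
print(clean)                       # -> chrU42
print(pos[3])                      # -> (2, 0): the 'U' came from line 2, column 0
```

A scanner (or a pre-pass feeding PLY) built this way can report error and token positions in terms of the original file, which is exactly the part that gets lost with naive pre-processing.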


Albert

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"ply-hack" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/ply-hack?hl=en
-~----------~----~----~----~------~----~------~--~---