The idea is to introduce a kind of soft skip for tokens. Ragg would accept 
them in the grammar but leave them out of the syntax object for the production.

I've discovered that it's really hard to parse syslog messages properly, 
especially since I want to accept both the traditional and the new format.

To be able to write a sensible grammar, I must include whitespace in the token 
stream, along with a couple of other bytes that delimit parts of the message.

header: "<" NUM ">" timestamp SP hostname SP app-name SP [procid] SP [msgid]
date: (STR SP NUM SP) | NUM "-" NUM "-" NUM

Also, I must stitch strings back together, since the delimiting tokens are 
valid inside some parts of the message:

hostname: (STR|'<'|'>'|':'|'='|'['|']'|'.'|'-'|'+'|'T'|'Z')+
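To show what I mean by stitching, here is a rough sketch in Python (not ragg) 
of the step I currently do by hand on a hostname subtree. The nested-list tree 
shape and the name stitch are made up for illustration; they are not what ragg 
actually produces.

```python
def stitch(node):
    """Join the token values of one production into a single string.

    `node` is assumed to be a list whose first element is the rule name
    and whose remaining elements are token values (all strings).
    """
    rule, *parts = node
    return [rule, "".join(parts)]


# A hostname that the tokenizer split at the "." delimiter tokens.
pieces = ["hostname", "mail", ".", "example", ".", "org"]
stitched = stitch(pieces)
```

With that input, stitched holds the rule name followed by the hostname as one 
string instead of five separate tokens.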

It would be nice to keep these extraneous tokens out of the syntax objects for 
the productions. One possibility might be to put an escape character in front 
of them. I'll use @ here:

header: @"<" NUM @">" timestamp @SP hostname @SP app-name @SP [procid] @SP [msgid]
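To make the intended result concrete, here is a small Python sketch (again not 
ragg) of the filtering I would like the parser to do for me. Everything in it 
is an assumption for illustration: the nested-list tree shape, the SOFT_SKIP 
table, and the name drop_soft_skipped. It also only approximates the @ marker, 
since it drops tokens per production by value rather than by position in the 
rule.

```python
# Hypothetical table: for each rule name, the token values marked with @.
SOFT_SKIP = {"header": {"<", ">", " "}}


def drop_soft_skipped(tree, soft_skip):
    """Return a copy of `tree` without the soft-skipped leaf tokens.

    A tree is a list [rule-name, child, ...]; anything else is a token value.
    A token is dropped only if its value is listed for the enclosing rule.
    """
    if not isinstance(tree, list):
        return tree
    rule, *children = tree
    unwanted = soft_skip.get(rule, set())
    kept = [
        drop_soft_skipped(child, soft_skip)
        for child in children
        if isinstance(child, list) or child not in unwanted
    ]
    return [rule, *kept]


tree = [
    "header", "<", 13, ">",
    ["timestamp", "Oct", " ", 11, " "], " ",
    ["hostname", "<", "box", ">"], " ",
    ["app-name", "sshd"],
]
cleaned = drop_soft_skipped(tree, SOFT_SKIP)
```

The delimiters directly under header disappear, while the "<" and ">" inside 
hostname and the spaces inside timestamp stay, because those productions did 
not mark them as soft-skipped.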

For the space case, a #:soft-skip keyword in (token …) would also work: the 
token would still be matched by the grammar but always purged from the result.

What do you think about this? I don't know that much about parsing yet, so if 
there's another way to handle this, I'd be interested to hear it.

Lo
____________________
  Racket Users list:
  http://lists.racket-lang.org/users
