Hi,
I am trying to implement a state-based recursive descent SIP
parser, using re2c for the lexer and a hand-coded parser.
I am having problems parsing the absolute URI, the "Accept"
header, and the generic param.
1. Absolute URI:
absoluteURI = scheme ":" ( hier-part / opaque-part )
hier-part = ( net-path / abs-path ) [ "?" query ]
net-path = "//" authority [ abs-path ]
abs-path = "/" path-segments
opaque-part = uric-no-slash *uric
uric = reserved / unreserved / escaped
uric-no-slash = unreserved / escaped / ";" / "?" / ":" / "@" /
"&" / "=" / "+" / "$" / ","
path-segments = segment *( "/" segment )
segment = *pchar *( ";" param )
param = *pchar
pchar = unreserved / escaped / ":" / "@" / "&" / "=" /
"+" / "$" / ","
scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
/*Problem*/
authority = srvr / reg-name
srvr = [ [ userinfo "@" ] hostport ]
reg-name = 1*( unreserved / escaped / "$" / ","
/ ";" / ":" / "@" / "&" / "=" / "+" )
query = *uric
Here the problem is with authority.
I have srvr and reg-name in the same lexer state, and the lexer
returns reg-name in almost all cases. I have no way of
differentiating between srvr and reg-name, so I can't even put
them in two different states.
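To make the overlap concrete, here is a minimal Python sketch (used
only for brevity; it is not the re2c code). It assumes that
unreserved and escaped are defined as in RFC 2396, since they are not
quoted above, and the names UNRESERVED, ESCAPED, REG_NAME and
matches_reg_name are made up for this illustration. It shows that
strings shaped like [ userinfo "@" ] hostport are also valid
reg-names, so a character-level rule cannot separate the two.

```python
import re

# Assumption: unreserved = alphanum / mark and escaped = "%" HEX HEX,
# as in RFC 2396; the post does not quote these rules.
UNRESERVED = r"[A-Za-z0-9\-_.!~*'()]"
ESCAPED = r"%[0-9A-Fa-f]{2}"

# reg-name = 1*( unreserved / escaped / "$" / "," / ";" / ":" / "@" / "&" / "=" / "+" )
REG_NAME = re.compile(rf"(?:{UNRESERVED}|{ESCAPED}|[$,;:@&=+])+")


def matches_reg_name(text: str) -> bool:
    """Return True if the whole string is a valid reg-name."""
    return REG_NAME.fullmatch(text) is not None


# Each sample is something one would expect to be parsed as srvr.
for sample in ["example.com", "example.com:5060", "alice@example.com:5060"]:
    print(sample, matches_reg_name(sample))
```

If this sketch is right, the distinction between srvr and reg-name
cannot come from the character set alone and has to be made some
other way.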
2. Accept Header:
Accept = "Accept" HCOLON
( accept-range *(COMMA accept-range) )
accept-range = media-range *(SEMI accept-param)
media-range = ( "*/*"
/ ( m-type SLASH "*" )
/ ( m-type SLASH m-subtype )
) *( SEMI m-parameter )
accept-param = ("q" EQUAL qvalue) / generic-param
qvalue = ( "0" [ "." 0*3DIGIT ] )
/ ( "1" [ "." 0*3("0") ] )
The problem here is that I can't decide what to parse after a
SEMI in the accept-range. It could start either an m-parameter
belonging to the media-range or an accept-param belonging to
the accept-range.
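The ambiguity can be shown with a small Python sketch (again only an
illustration, not the re2c code). It assumes the RFC 3261 rules
m-parameter = m-attribute EQUAL m-value, m-attribute = token and
m-value = token / quoted-string, which are not quoted above. It also
simplifies quoted-string, ignores optional whitespace around EQUAL,
and leaves host out of gen-value. The names TOKEN, QUOTED,
M_PARAMETER, GENERIC_PARAM and candidates are hypothetical.

```python
import re

# Assumption: token as defined in RFC 3261; quoted-string is simplified.
TOKEN = r"[A-Za-z0-9\-.!%*_+`'~]+"
QUOTED = r'"(?:[^"\\]|\\.)*"'

# Assumption: m-parameter = token EQUAL ( token / quoted-string ).
M_PARAMETER = re.compile(rf"{TOKEN}=(?:{TOKEN}|{QUOTED})")
# generic-param = token [ EQUAL gen-value ], with host left out of gen-value.
GENERIC_PARAM = re.compile(rf"{TOKEN}(?:=(?:{TOKEN}|{QUOTED}))?")


def candidates(param: str) -> list[str]:
    """List every rule that the text after a SEMI could match in full."""
    names = []
    if M_PARAMETER.fullmatch(param):
        names.append("m-parameter")
    if GENERIC_PARAM.fullmatch(param):
        names.append("generic-param")
    return names


for sample in ["level=1", "q=0.5", "flag"]:
    print(sample, candidates(sample))
```

Under these assumptions any name=value pair fits both rules, so the
text after the SEMI does not by itself say which rule it belongs to.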
3. Generic Param:
generic-param = token [ EQUAL gen-value ]
gen-value = token / host / quoted-string
Here the problem is that the character set of token is a superset
of the character set of host. If I define TOKEN first in the
lexer, I get a TOKEN every time and HOST is never returned.
If I define HOST first, I get a HOST in most cases, and a TOKEN
only in the few cases where the input contains a character that
is not allowed in HOST.
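The behaviour described above can be reproduced with a small Python
sketch. It assumes the lexer picks the longest match and breaks ties
in favour of the rule defined first, which is how I understand re2c
to behave. HOST is simplified to the hostname form only (no IPv4 or
IPv6 addresses), token follows RFC 3261, and the names TOKEN, HOST
and lex are hypothetical.

```python
import re

# Assumption: token as defined in RFC 3261.
TOKEN = re.compile(r"[A-Za-z0-9\-.!%*_+`'~]+")
# Assumption: simplified hostname = *( domainlabel "." ) toplabel [ "." ].
HOST = re.compile(
    r"(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)*"
    r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.?"
)


def lex(text, rules):
    """Return the rule name with the longest match at the start of text.

    On a tie the earlier rule wins, because a later rule only replaces
    the current best when its match is strictly longer.
    """
    best_name, best_len = None, 0
    for name, pattern in rules:
        match = pattern.match(text)
        if match and match.end() > best_len:
            best_name, best_len = name, match.end()
    return best_name


token_first = [("TOKEN", TOKEN), ("HOST", HOST)]
host_first = [("HOST", HOST), ("TOKEN", TOKEN)]

for sample in ["example.com", "my_host"]:
    print(sample, lex(sample, token_first), lex(sample, host_first))
```

In this sketch a plain hostname always ties between the two rules, so
the result depends only on rule order, and TOKEN wins outright only
when the input contains a character such as "_" that HOST rejects.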
Could somebody please suggest a solution for these problems?
ciao,
Akshat
_______________________________________________
Sip-implementors mailing list
[EMAIL PROTECTED]
http://lists.cs.columbia.edu/mailman/listinfo/sip-implementors