I'm using Ragel to build an HTTP parser.
One requirement is that the parser handles every HTTP header, since the
headers are used by another module.
For some common headers, the header value is parsed directly by Ragel.
For all other headers, the parser only marks the header name and value.
Here is my Ragel script:
message_header = (
    ("Content-Length"i ":" LWS* digit+ $on_content_length)
  | ("Transfer-Encoding"i ":" LWS*
        ( "Chunked"i %{ request->transfer_encoding = HTTP_TE_CHUNKED; }
        | any* >start_TE %finish_TE ))
  | ("Connection"i ":" LWS*
        ( "Keep-alive"i %{ request->connection = HTTP_CONNECTION_KEEP_ALIVE; }
        | "Close"i %{ request->connection = HTTP_CONNECTION_CLOSE; } ))
  | ("Host"i ":" LWS* field_content
        >start_host %finish_host %/break_host)
  | ("Accept"i ":" LWS* field_content
        >start_accept %finish_accept %/break_accept)
  | ("Accept-Charset"i ":" LWS* field_content
        >start_accept_charset %finish_accept_charset %/break_accept_charset)
  | ("Accept-Encoding"i ":" LWS* field_content
        >start_accept_encoding %finish_accept_encoding %/break_accept_encoding)
  | ("Accept-Language"i ":" LWS* field_content
        >start_accept_language %finish_accept_language %/break_accept_language)
  | ("User-Agent"i ":" LWS* field_content
        >start_user_agent %finish_user_agent %/break_user_agent)
  | ("Referer"i ":" LWS* field_content
        >start_referer %finish_referer %/break_referer)
  | ("X-Forward-For"i ":" LWS* field_content
        >start_x_forward_for %finish_x_forward_for %/break_x_forward_for)
  | ((token) >start_header_name %finish_header_name %/break_header_name
        ":" LWS* field_content %header_end)
) :> CRLF;
There is nondeterminism here between token and the common header names,
because each common name is also a valid token.
For example, when a Host header value finishes, both the Host-specific
action finish_host and the generic token-branch action header_end are
triggered.
How can I resolve this kind of problem?
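One direction that might work, as an untested sketch only: use Ragel's
set-difference operator (-) to remove the explicitly handled names from
token, so the catch-all branch can no longer accept "Host" and the others.
The names known_header_name and other_header_name below are hypothetical
and do not appear in the script above.

```ragel
# Hypothetical helper: the header names that already have a dedicated branch.
known_header_name =
    "Content-Length"i | "Transfer-Encoding"i | "Connection"i
  | "Host"i | "Accept"i | "Accept-Charset"i | "Accept-Encoding"i
  | "Accept-Language"i | "User-Agent"i | "Referer"i | "X-Forward-For"i;

# Set difference: every token except the names listed above.
# Longer names such as "Hostname" are still accepted here.
other_header_name = token - known_header_name;

# The catch-all alternative of message_header would then become:
#   | (other_header_name >start_header_name %finish_header_name
#         %/break_header_name
#         ":" LWS* field_content %header_end)
```

With this change finish_header_name and header_end should no longer fire
for a known header, because the generic branch never reaches the ":" after
a name like "Host". The entering action start_header_name would probably
still run on the first character of every header, since the branches only
diverge later, so it would have to be harmless (for example, only recording
a start pointer).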
Thanks.
Hongbin.