It is indeed a common problem, and it is purposefully left in the hands of the programmer. There are a few reasons for this:

- There is no general solution that doesn't involve memory allocation.
- There are many different ways to approach memory allocation.
- There are many use cases where the input is always in one buffer.

When I need a general solution I use buffers that grow automatically, up to a limit. See the DSNPd source for an example. I don't think the buffer class there actually enforces a limit, but it should.

http://svn.complang.org/choicesocial/trunk/dsnpd/parser.rl
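
For illustration only, here is a minimal sketch in C of a buffer that grows automatically up to a limit. It is not taken from the DSNPd source; the names GrowBuf, growbuf_init, growbuf_append and growbuf_free are hypothetical, and the starting capacity of 64 bytes is an arbitrary assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; the names are illustrative, not DSNPd's. */
typedef struct {
    char  *data;
    size_t len;    /* bytes in use */
    size_t cap;    /* bytes allocated */
    size_t limit;  /* hard upper bound on cap */
} GrowBuf;

static void growbuf_init(GrowBuf *b, size_t limit)
{
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
    b->limit = limit;
}

/* Append n bytes. Returns 0 on success, -1 if the limit would be
 * exceeded or the allocation fails. On failure the buffer is unchanged. */
static int growbuf_append(GrowBuf *b, const char *src, size_t n)
{
    if (n == 0)
        return 0;
    if (n > b->limit - b->len)      /* len never exceeds limit */
        return -1;

    size_t need = b->len + n;
    if (need > b->cap) {
        size_t newcap = b->cap ? b->cap : 64;   /* assumed initial size */
        while (newcap < need) {
            if (newcap > b->limit / 2) {        /* doubling would pass the limit */
                newcap = b->limit;
                break;
            }
            newcap *= 2;
        }
        if (newcap > b->limit)
            newcap = b->limit;

        char *p = realloc(b->data, newcap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = newcap;
    }

    memcpy(b->data + b->len, src, n);
    b->len = need;
    return 0;
}

static void growbuf_free(GrowBuf *b)
{
    free(b->data);
    growbuf_init(b, b->limit);
}
```

The idea is that an action marking the start of a token resets len to zero, each chunk's bytes for that token are appended, and a failed append is treated as a parse error so that a hostile peer cannot make the buffer grow without bound.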

There are other approaches though. See the Ragel manual (5.9) for a short discussion.
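
As a rough sketch of one allocation-free alternative (not necessarily the one the manual describes): keep a fixed-size buffer and, before reading the next chunk, slide the bytes of the unfinished token to the front. The Window type, its 4096-byte size, and the mark field are all assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size input window. `mark` is the offset where the
 * current, not yet finished, token began; it equals `len` when no token
 * is pending. */
typedef struct {
    char   data[4096];  /* assumed size */
    size_t len;         /* valid bytes in data */
    size_t mark;        /* start of the unfinished token */
} Window;

/* Move the pending bytes to the start of the buffer and return how much
 * free space is left for the next read. A return value of 0 means a
 * single token fills the whole window; the caller should treat that as
 * an error. */
static size_t window_compact(Window *w)
{
    size_t pending = w->len - w->mark;
    memmove(w->data, w->data + w->mark, pending);
    w->mark = 0;
    w->len = pending;
    return sizeof w->data - w->len;
}
```

After compacting, the next read would go to data + len, and any pointers the parser kept into the old data must be adjusted by the same shift.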

Regards,
 Adrian

On 11-01-20 03:26 PM, Benjamin van der Veen wrote:
Hello,

I am using Ragel to make an HTTP parser. Feel free to tell me this is a 
terrible idea. ;)

It seems to me that a common problem faced by users of Ragel is that they do 
not know in advance where (with respect to the grammar being parsed) the 
boundaries of buffers that they feed the parser are going to be. For example, I 
can easily make a Ragel grammar which will parse the following using only 
entering and leaving actions:

"GET /foo HTTP/1.1\r\nBar: Baz\r\n\r\n"

However the parser breaks if I feed it the same data across multiple buffers 
(as would be the case when reading chunks of data from a network socket):

"GE"
"T /f"
"oo HTTP/1.1\r"
"\nBar: Baz\r\n\r\n"

I found that this can be mitigated by using EOF-leaving actions 
(%/some_action) and always setting eof to pe to cause the EOF-leaving actions 
to occur. However I'm finding that it isn't consistent and leads to unexpected 
behavior in some cases. Note that I am using the regular expression syntax, not 
the state chart syntax.

What is the recommended approach to this problem? My intuition is that a 
properly-specified state machine should work regardless of how data is fed to 
it and Ragel should make this opaque to the user—it seems to me that processing 
data across multiple buffers would be a very common problem that Ragel would 
solve for the user, but I may be mistaken.

In general I'm rather confused about how EOF actions are handled and when 
entering or leaving actions are treated as EOF actions. I've pored over the 
manual but I feel like it's all predicated on some knowledge that I don't have 
and am unsure where to look to find. In particular the first two paragraphs of 
section 3.1.4 (Leaving Actions) are almost completely opaque to me.

Cheers!
Benjamin van der Veen
_______________________________________________
ragel-users mailing list
[email protected]
http://www.complang.org/mailman/listinfo/ragel-users

