It is indeed a common problem, and it is purposefully left in the hands of
the programmer. This is due to a few factors:
- There is no general solution that doesn't involve memory allocation.
- There are many different ways to approach memory allocation.
- There are many use cases in which the input is always in one buffer.
When I need a general solution, I use automatically growing buffers (up to a
limit). See the DSNPd source for an example. I don't think that buffer
class currently enforces a size limit, but it should.
http://svn.complang.org/choicesocial/trunk/dsnpd/parser.rl
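A minimal sketch of the growing-buffer idea, assuming C and plain
malloc/realloc. The names (struct growbuf, growbuf_append and so on) are
hypothetical and are not taken from the DSNPd source. Each chunk read from
the socket is appended, and the append fails once the configured limit would
be exceeded, so a peer cannot force unbounded allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable input buffer with a hard upper limit. */
struct growbuf {
    char *data;
    size_t len;    /* bytes currently stored */
    size_t cap;    /* bytes allocated */
    size_t limit;  /* most bytes we are willing to hold */
};

static void growbuf_init(struct growbuf *b, size_t limit)
{
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
    b->limit = limit;
}

/* Append n bytes, doubling the allocation as needed.
   Returns 0 on success, -1 if the limit would be exceeded or realloc fails. */
static int growbuf_append(struct growbuf *b, const char *src, size_t n)
{
    assert(b->len <= b->cap && b->len <= b->limit);
    if (n == 0)
        return 0;
    if (n > b->limit - b->len)
        return -1;                          /* refuse to grow past the limit */
    size_t need = b->len + n;
    if (need > b->cap) {
        size_t newcap = b->cap ? b->cap : 64;
        while (newcap < need && newcap <= b->limit / 2)
            newcap *= 2;
        if (newcap < need || newcap > b->limit)
            newcap = b->limit;              /* never allocate beyond the limit */
        char *p = realloc(b->data, newcap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = newcap;
    }
    memcpy(b->data + b->len, src, n);
    b->len = need;
    return 0;
}

static void growbuf_free(struct growbuf *b)
{
    free(b->data);
    growbuf_init(b, b->limit);
}
```

Since realloc may move the data, any positions the parser's actions want to
remember are safer stored as offsets into the buffer than as raw pointers.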
There are other approaches, though. See section 5.9 of the Ragel manual for a
short discussion.
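One approach along those lines, sketched under the assumption of a single
fixed-size buffer: before reading more input, move the unfinished token to
the front of the buffer so that a pointer to its start (ts in a Ragel scanner;
tok_start in this hypothetical helper, keep_partial_token) stays valid, then
read the new data after it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: before refilling buf (which holds len valid bytes),
   move the unfinished token starting at *tok_start to the front of buf.
   Returns how many bytes were kept; new input should be read after them.
   A NULL *tok_start means no token is in progress, so nothing is kept. */
static size_t keep_partial_token(char *buf, size_t len, const char **tok_start)
{
    if (*tok_start == NULL)
        return 0;
    assert(*tok_start >= buf && *tok_start <= buf + len);
    size_t keep = (size_t)(buf + len - *tok_start);
    memmove(buf, *tok_start, keep);   /* source and destination may overlap */
    *tok_start = buf;                 /* the token now starts at the front */
    return keep;
}
```

Any other pointers into the old data (p or te, for example) would have to be
shifted back by the same distance. If the kept bytes already fill the whole
buffer, the token is too long for it, and the caller has to either grow the
buffer or report an error.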
Regards,
Adrian
On 11-01-20 03:26 PM, Benjamin van der Veen wrote:
Hello,
I am using Ragel to make an HTTP parser. Feel free to tell me this is a
terrible idea. ;)
It seems to me that a common problem faced by Ragel users is that they cannot
know in advance where, relative to the grammar being parsed, the boundaries of
the buffers they feed to the parser will fall. For example, I can easily write
a Ragel grammar that parses the following using only entering and leaving
actions:
"GET /foo HTTP/1.1\r\nBar: Baz\r\n\r\n"
However, the parser breaks if I feed it the same data split across multiple
buffers (as happens when reading chunks of data from a network socket):
"GE"
"T /f"
"oo HTTP/1.1\r"
"\nBar: Baz\r\n\r\n"
I found that this can be mitigated by using EOF-leaving actions
(%/some_action) and always setting eof to pe, which causes the EOF-leaving
actions to run. However, I'm finding that this isn't consistent and leads to
unexpected behavior in some cases. Note that I am using the regular expression
syntax, not the state chart syntax.
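A rough illustration of why setting eof to pe on every buffer can misbehave,
using a hypothetical hand-written stand-in rather than real Ragel output
(struct num_parser, num_exec and on_number are made up for this sketch). It
models a number token whose leaving action fires either on the next non-digit
or when p reaches eof, loosely mirroring the p == eof test in generated code.
Passing eof = pe for every chunk tells the machine that the input ends at each
chunk boundary, so "12" followed by "3" is reported as two numbers. Here eof is
set only on the final chunk and is NULL otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a number token with a leaving action that
   runs on the next non-digit or at end of input. */
struct num_parser {
    int in_number;      /* 1 while inside a run of digits; survives between chunks */
    long value;         /* digits accumulated so far */
    long found[8];      /* numbers reported by the leaving action */
    int nfound;
};

static void on_number(struct num_parser *ps)    /* the "leaving action" */
{
    if (ps->nfound < 8)
        ps->found[ps->nfound++] = ps->value;
    ps->in_number = 0;
    ps->value = 0;
}

/* eof should be NULL for every chunk except the last, where eof == pe. */
static void num_exec(struct num_parser *ps, const char *p, const char *pe,
                     const char *eof)
{
    assert(p <= pe);
    for (; p < pe; p++) {
        if (*p >= '0' && *p <= '9') {
            ps->in_number = 1;
            ps->value = ps->value * 10 + (*p - '0');
        } else if (ps->in_number) {
            on_number(ps);          /* leaving on a real character */
        }
    }
    if (p == eof && ps->in_number)
        on_number(ps);              /* leaving at end of input: the EOF action */
}
```

Feeding "12" with eof = NULL and then "3" with eof = pe reports a single
number, 123. Passing eof = pe on both calls reports 12 and 3 instead.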
What is the recommended approach to this problem? My intuition is that a
properly specified state machine should work regardless of how data is fed to
it, and that Ragel should hide this from the user. It seems to me that
processing data across multiple buffers would be a very common problem that
Ragel would solve for the user, but I may be mistaken.
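A rough illustration of which part survives a buffer boundary, again using a
hypothetical hand-written stand-in rather than real Ragel output (struct
req_parser, req_exec and save_method are made up). The current state (cs)
carries over between calls on its own, but a pointer saved by an entering
action points into a chunk the caller may then overwrite. This sketch copies
the partial token out whenever a chunk ends mid-token. That is one common
workaround, not necessarily the recommended one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for generated parser code: it only recognizes the
   request method (upper-case letters ending in a space). */
enum { S_START, S_METHOD, S_DONE, S_ERROR };

struct req_parser {
    int cs;             /* current state; survives between chunks */
    char method[16];    /* token text, accumulated across chunks */
    size_t method_len;
};

static void req_init(struct req_parser *ps)
{
    ps->cs = S_START;
    ps->method_len = 0;
    ps->method[0] = '\0';
}

/* Append [from, to) to the method buffer; returns 0 if it would overflow. */
static int save_method(struct req_parser *ps, const char *from, const char *to)
{
    assert(from <= to);
    size_t n = (size_t)(to - from);
    if (ps->method_len + n >= sizeof ps->method)
        return 0;
    memcpy(ps->method + ps->method_len, from, n);
    ps->method_len += n;
    ps->method[ps->method_len] = '\0';
    return 1;
}

static void req_exec(struct req_parser *ps, const char *p, const char *pe)
{
    /* If the previous chunk ended inside the method, the token resumes at p. */
    const char *mark = (ps->cs == S_METHOD) ? p : NULL;

    for (; p < pe && ps->cs != S_DONE && ps->cs != S_ERROR; p++) {
        char c = *p;
        if (ps->cs == S_START) {
            if (c >= 'A' && c <= 'Z') {
                mark = p;               /* "entering action": remember token start */
                ps->cs = S_METHOD;
            } else {
                ps->cs = S_ERROR;
            }
        } else if (c == ' ') {          /* here ps->cs == S_METHOD */
            /* "leaving action": the token ends here */
            ps->cs = save_method(ps, mark, p) ? S_DONE : S_ERROR;
        } else if (c < 'A' || c > 'Z') {
            ps->cs = S_ERROR;
        }
    }

    /* Chunk ended mid-token: mark points into a buffer the caller may reuse,
       so copy the partial token out now. */
    if (ps->cs == S_METHOD && !save_method(ps, mark, pe))
        ps->cs = S_ERROR;
}
```

Feeding the four chunks from the example one at a time produces the same
method, GET, as feeding the whole request in one buffer.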
In general, I'm rather confused about how EOF actions are handled and when
entering or leaving actions are treated as EOF actions. I've pored over the
manual, but I feel like it's all predicated on knowledge that I don't have
and don't know where to find. In particular, the first two paragraphs of
section 3.1.4 (Leaving Actions) are almost completely opaque to me.
Cheers!
Benjamin van der Veen
_______________________________________________
ragel-users mailing list
[email protected]
http://www.complang.org/mailman/listinfo/ragel-users