On Sat, Jun 16, 2012 at 10:16 AM, Adrian Thurston <[email protected]> wrote:
> I'm in the midst of altering ragel internals to bring the implementation of
> conditions right into the core data structure. In previous versions it could
> be described as an add-on. The benefits will be fewer "alphabet space
> exhausted" exceptions when developing grammars. Also, the ragel internals
> will become easier to understand.
>
> Next I want to add in some language constructs that support the common usage
> patterns that have emerged for parsing binary protocols using ragel's
> conditions. They shows up for me in my work, where I use Ragel for binary
> protocols.
>
> There are different ways to approach value-specified repetition. Some ways
> are more awkward than others. Some work well if there is at least one item.
> Others work when you can have a zero-length list of items.

What I'd really love to see is an Erlang-style binary parser in Ragel.

Readable intro:
http://pupeno.com/2006/10/24/erlang-the-language-for-network-programming-issue-2-binary-pattern-matching/

The formal version:
http://www.erlang.org/doc/programming_examples/bit_syntax.html
http://www.erlang.org/doc/efficiency_guide/binaryhandling.html

Example from the syntax link above, parsing an IP header:

===========
Example 4: The following is a more elaborate example of matching, where
Dgram is bound to the consecutive bytes of an IP datagram of IP protocol
version 4, and where we want to extract the header and the data of the
datagram:

    -define(IP_VERSION, 4).
    -define(IP_MIN_HDR_LEN, 5).

    DgramSize = byte_size(Dgram),
    case Dgram of
        <<?IP_VERSION:4, HLen:4, SrvcType:8, TotLen:16,
          ID:16, Flgs:3, FragOff:13,
          TTL:8, Proto:8, HdrChkSum:16,
          SrcIP:32,
          DestIP:32, RestDgram/binary>> when HLen>=5, 4*HLen=<DgramSize ->
            OptsLen = 4*(HLen - ?IP_MIN_HDR_LEN),
            <<Opts:OptsLen/binary,Data/binary>> = RestDgram,
        ...
    end.

Here the segment corresponding to the Opts variable has a type modifier
specifying that Opts should bind to a binary. All other variables have the
default type equal to unsigned integer.

An IP datagram header is of variable length, and its length - measured in
the number of 32-bit words - is given in the segment corresponding to HLen,
the minimum value of which is 5. It is the segment corresponding to Opts
that is variable: if HLen is equal to 5, Opts will be an empty binary.

The tail variables RestDgram and Data bind to binaries, as all tail
variables do. Both may bind to empty binaries.

If the first 4-bits segment of Dgram is not equal to 4, or if HLen is less
than 5, or if the size of Dgram is less than 4*HLen, the match of Dgram
fails.
===========

The oversimplified way to look at it: given a binary string X, break X down
into fields of size Y:

IP_VERSION gets the first four bits of X.
HLen gets the next four bits of X.
SrvcType gets the next eight bits of X.

And on it goes. There is of course more to it to get true binary list
comprehension.
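
To make the field-by-field split concrete, here is a rough Python sketch of the same IPv4 match. It is hand-written, not something Ragel generates. The function name parse_ipv4 and the dictionary keys are made up for illustration; only the field widths and the three failure conditions are taken from the Erlang example above.

```python
import struct

IP_VERSION = 4
IP_MIN_HDR_LEN = 5  # header length in 32-bit words


def parse_ipv4(dgram: bytes) -> dict:
    """Split an IPv4 datagram into header fields, options and data.

    Hypothetical helper: raises ValueError in the cases where the
    Erlang pattern would simply fail to match.
    """
    if len(dgram) < 4 * IP_MIN_HDR_LEN:
        raise ValueError("datagram shorter than the fixed 20-byte header")

    # Fixed part of the header, big-endian (network byte order).
    (ver_hlen, srvc_type, tot_len, ident, flags_frag,
     ttl, proto, hdr_chksum, src_ip, dest_ip) = struct.unpack(
        "!BBHHHBBHII", dgram[:20])

    # Sub-byte fields: the first four bits, then the next four bits.
    version = ver_hlen >> 4
    hlen = ver_hlen & 0x0F
    # Three bits of flags, then thirteen bits of fragment offset.
    flags = flags_frag >> 13
    frag_off = flags_frag & 0x1FFF

    # Equivalent of the Erlang pattern constant and the `when` guard.
    if version != IP_VERSION:
        raise ValueError("not an IPv4 datagram")
    if hlen < IP_MIN_HDR_LEN or 4 * hlen > len(dgram):
        raise ValueError("bad header length")

    # Value-specified length: HLen decides how many option bytes follow.
    opts = dgram[20:4 * hlen]
    data = dgram[4 * hlen:]

    return {
        "hlen": hlen, "srvc_type": srvc_type, "tot_len": tot_len,
        "id": ident, "flags": flags, "frag_off": frag_off,
        "ttl": ttl, "proto": proto, "hdr_chksum": hdr_chksum,
        "src_ip": src_ip, "dest_ip": dest_ip,
        "opts": opts, "data": data,
    }
```

The point of the comparison is the opts slice: its length comes from a value parsed earlier in the same input, which is the "value-specified repetition" case, and when HLen is 5 the slice is simply empty.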
