On Sat, Jun 16, 2012 at 10:16 AM, Adrian Thurston <[email protected]> wrote:
> I'm in the midst of altering ragel internals to bring the implementation of
> conditions right into the core data structure. In previous versions it could
> be described as an add-on. The benefits will be fewer "alphabet space
> exhausted" exceptions when developing grammars. Also, the ragel internals
> will become easier to understand.
>
> Next I want to add in some language constructs that support the common usage
> patterns that have emerged for parsing binary protocols using ragel's
> conditions. They show up for me in my work, where I use Ragel for binary
> protocols.
>
> There are different ways to approach value-specified repetition. Some ways
> are more awkward than others. Some work well if there is at least one item.
> Others work when you can have a zero-length list of items.

What I'd really love to see is an Erlang-style binary parser in Ragel:

Readable intro:
http://pupeno.com/2006/10/24/erlang-the-language-for-network-programming-issue-2-binary-pattern-matching/

The formal version:

http://www.erlang.org/doc/programming_examples/bit_syntax.html

http://www.erlang.org/doc/efficiency_guide/binaryhandling.html

Example from the syntax link above, parsing IP header:

===========
Example 4: The following is a more elaborate example of matching,
where Dgram is bound to the consecutive bytes of an IP datagram of IP
protocol version 4, and where we want to extract the header and the
data of the datagram:
-define(IP_VERSION, 4).
-define(IP_MIN_HDR_LEN, 5).

DgramSize = byte_size(Dgram),
case Dgram of
    <<?IP_VERSION:4, HLen:4, SrvcType:8, TotLen:16,
      ID:16, Flgs:3, FragOff:13,
      TTL:8, Proto:8, HdrChkSum:16,
      SrcIP:32,
      DestIP:32, RestDgram/binary>> when HLen>=5, 4*HLen=<DgramSize ->
        OptsLen = 4*(HLen - ?IP_MIN_HDR_LEN),
        <<Opts:OptsLen/binary,Data/binary>> = RestDgram,
    ...
end.

Here the segment corresponding to the Opts variable has a type
modifier specifying that Opts should bind to a binary. All other
variables have the default type equal to unsigned integer.

An IP datagram header is of variable length, and its length - measured
in the number of 32-bit words - is given in the segment corresponding
to HLen, the minimum value of which is 5. It is the segment
corresponding to Opts that is variable: if HLen is equal to 5, Opts
will be an empty binary.

The tail variables RestDgram and Data bind to binaries, as all tail
variables do. Both may bind to empty binaries.

If the first 4-bits segment of Dgram is not equal to 4, or if HLen is
less than 5, or if the size of Dgram is less than 4*HLen, the match of
Dgram fails.

===========
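For comparison, here is a rough sketch of what that Erlang match does,
written out by hand in Python. This is only an illustration and not
anything Ragel or Erlang provides: the function name match_ipv4 and the
dictionary layout are my own assumptions, and returning None stands in
for a failed match.

```python
IP_VERSION = 4
IP_MIN_HDR_LEN = 5  # header length in 32-bit words


def match_ipv4(dgram: bytes):
    """Hypothetical helper: mimic the Erlang pattern, return None on failure."""
    # The fixed part of the pattern consumes 160 bits (20 bytes).
    if len(dgram) < 4 * IP_MIN_HDR_LEN:
        return None
    version = dgram[0] >> 4      # first 4 bits
    hlen = dgram[0] & 0x0F       # next 4 bits
    # Same conditions as the literal version segment and the guard.
    if version != IP_VERSION or hlen < IP_MIN_HDR_LEN or 4 * hlen > len(dgram):
        return None
    flags_frag = int.from_bytes(dgram[6:8], "big")
    opts_len = 4 * (hlen - IP_MIN_HDR_LEN)
    rest = dgram[20:]            # RestDgram
    return {
        "HLen": hlen,
        "SrvcType": dgram[1],
        "TotLen": int.from_bytes(dgram[2:4], "big"),
        "ID": int.from_bytes(dgram[4:6], "big"),
        "Flgs": flags_frag >> 13,        # top 3 bits
        "FragOff": flags_frag & 0x1FFF,  # low 13 bits
        "TTL": dgram[8],
        "Proto": dgram[9],
        "HdrChkSum": int.from_bytes(dgram[10:12], "big"),
        "SrcIP": int.from_bytes(dgram[12:16], "big"),
        "DestIP": int.from_bytes(dgram[16:20], "big"),
        "Opts": rest[:opts_len],         # empty when HLen == 5
        "Data": rest[opts_len:],
    }
```

All the shifting, masking and offset bookkeeping here is exactly what
the bit syntax lets you leave out.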
An oversimplified way to look at it: given a binary string X, break X
down into consecutive fields, each with a stated width in bits:

IP_VERSION must match the first four bits of X.
HLen gets the next four bits of X.
SrvcType gets the next eight bits of X, and so on.
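
To make that breakdown concrete, here is a small Python sketch that
slices a byte string into unsigned integers from a list of
(name, width-in-bits) pairs, most significant bit first. The name
split_bits and its return shape are made up for illustration; this is
not an existing Ragel or Erlang API.

```python
def split_bits(data: bytes, widths):
    """Hypothetical helper: split data into fields of the given bit widths.

    Returns (fields, bits_consumed), where fields maps each name to an
    unsigned integer read MSB-first from data.
    """
    total = len(data) * 8
    value = int.from_bytes(data, "big")
    fields = {}
    offset = 0
    for name, width in widths:
        if offset + width > total:
            raise ValueError("not enough bits for field " + name)
        # Shift the field down to bit 0, then mask off everything above it.
        shift = total - offset - width
        fields[name] = (value >> shift) & ((1 << width) - 1)
        offset += width
    return fields, offset


fields, used = split_bits(
    bytes([0x45, 0x00]),
    [("version", 4), ("hlen", 4), ("srvc_type", 8)],
)
```

For the two bytes 0x45 0x00 this gives version 4, hlen 5 and
srvc_type 0, using all 16 bits.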

There is of course more to it to get true binary list comprehension.

_______________________________________________
ragel-users mailing list
[email protected]
http://www.complang.org/mailman/listinfo/ragel-users
