Hi Folks,

I think DFDL is awesome. Think about it: DFDL is a standard language for 
describing (describe, not parse) just about any data format. Again, I emphasize 
that it's not about how to parse the data format, it's about describing the 
data format. Given a description, a DFDL processor can figure out how to parse
instances of the data format. Wow!

But there's another reason that DFDL is awesome: it forces you to be very 
precise in your description. It forces you to think very logically. It forces 
you to understand the implications of your description decisions. Let me give
you an example of the latter.

I am dealing with a data format that consists of a sequence of lines. Here's a 
sample instance:

John Doe
OPER/XRAY//
Sally Smith

The first and last lines are just strings. Not interesting. The second line is 
the interesting one. Here's another instance:

John Doe
EXER/TANGO//
Sally Smith

As you can see, the second line starts with either OPER or EXER and terminates 
with //. The second line is also optional: it is an OPER line, an EXER line, or
absent altogether. That leads one to this description:

choice
      OPER (optional)
      EXER (optional)

However, DFDL doesn't allow branches of a choice to be optional. So, the 
correct description is:

choice
      sequence
            OPER (optional)
      sequence
            EXER (optional)

Slick, eh?

But not correct.

Let's think about this. Suppose the input is this:

John Doe
EXER/TANGO//
Sally Smith

While processing the second line, you would think that the DFDL processor would 
find that the first branch of the choice (the OPER branch) doesn't match and 
therefore the processor would process the line using the second branch. Ha! Not 
correct!
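To make the surprise reproducible, here is a toy Python model of the description
above. It is a hypothetical sketch, not Daffodil or any real DFDL processor API:
the helpers `line_element`, `optional`, `sequence`, and `choice` are made up for
illustration, and the model assumes a choice commits to the first branch that
parses without failing. Fed the EXER input, it resolves the choice without
consuming the EXER line.

```python
import re

# Hypothetical toy model, not a real DFDL processor or API.
# A parser takes (lines, pos) and returns (value, new_pos) on success,
# or None on failure.

def line_element(prefix):
    """Mandatory element: one line starting with prefix + '/' and ending with '//'."""
    pattern = re.compile(re.escape(prefix) + r"/.*//$")

    def parse(lines, pos):
        if pos < len(lines) and pattern.match(lines[pos]):
            return lines[pos], pos + 1
        return None

    return parse


def optional(parser):
    """Zero or one occurrence: a non-match counts as a successful parse of
    zero occurrences that consumes nothing."""
    def parse(lines, pos):
        result = parser(lines, pos)
        return result if result is not None else (None, pos)

    return parse


def sequence(*parsers):
    """Run members in order; the sequence fails if any member fails."""
    def parse(lines, pos):
        values = []
        for p in parsers:
            result = p(lines, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos

    return parse


def choice(*branches):
    """Try branches in order and commit to the first one that succeeds;
    later branches are never tried."""
    def parse(lines, pos):
        for branch in branches:
            result = branch(lines, pos)
            if result is not None:
                return result
        return None

    return parse


# The description from the post: each branch is a sequence holding an optional element.
broken = choice(sequence(optional(line_element("OPER"))),
                sequence(optional(line_element("EXER"))))

lines = ["John Doe", "EXER/TANGO//", "Sally Smith"]
print(broken(lines, 1))  # ([None], 1): OPER branch matched zero lines
```

Note that the position stays at 1: the choice reports success, yet nothing was
consumed, so the EXER line is left for whatever follows the choice.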

The first branch is optional. That is key! Since the second line doesn't start
with OPER, the DFDL processor thinks, "Oh, there must be no occurrences of the
OPER line." Zero occurrences is a perfectly valid match, so the first branch
succeeds, the choice is satisfied, and the EXER branch is never tried. The
processor moves on to the description following the choice, with the EXER line
still unconsumed. Do you see it? Do you see the problem? I hope so. This is
wicked cool. As I worked through this example, it forced me to think very, very
clearly about the implications of an optional OPER line. So, what's the
solution? Make OPER and EXER mandatory:

choice
      sequence
            OPER (mandatory)
      sequence
            EXER (mandatory)

But now the second line is mandatory: if it is absent, both branches fail, and
so does the choice. So, place the choice inside an optional wrapper element:

OPER-EXER-wrapper (optional)
      choice
            sequence
                  OPER (mandatory)
            sequence
                  EXER (mandatory)

Now, with this input:

John Doe
EXER/TANGO//
Sally Smith

The processor tries the first branch of the choice, which fails, so it tries the
second branch, which succeeds.

With this input:

John Doe
Sally Smith

The processor tries the first branch of the choice, which fails, then tries the
second branch, which also fails. So the optional wrapper element has zero
occurrences, and the processor moves on to the Sally Smith line.
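The same kind of toy model, applied to the corrected description, reproduces
both walkthroughs above. As before, this is a hypothetical sketch that assumes a
choice commits to its first successful branch; it is not a real DFDL processor,
and the made-up helpers are repeated so the block runs on its own.

```python
import re

# Hypothetical toy model, not a real DFDL processor or API.
# A parser takes (lines, pos) and returns (value, new_pos) on success,
# or None on failure.

def line_element(prefix):
    """Mandatory element: one line starting with prefix + '/' and ending with '//'."""
    pattern = re.compile(re.escape(prefix) + r"/.*//$")

    def parse(lines, pos):
        if pos < len(lines) and pattern.match(lines[pos]):
            return lines[pos], pos + 1
        return None

    return parse


def optional(parser):
    """Zero or one occurrence: a non-match is a successful zero-occurrence parse."""
    def parse(lines, pos):
        result = parser(lines, pos)
        return result if result is not None else (None, pos)

    return parse


def sequence(*parsers):
    """Run members in order; the sequence fails if any member fails."""
    def parse(lines, pos):
        values = []
        for p in parsers:
            result = p(lines, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos

    return parse


def choice(*branches):
    """Try branches in order and commit to the first one that succeeds.
    If every branch fails, the choice itself fails."""
    def parse(lines, pos):
        for branch in branches:
            result = branch(lines, pos)
            if result is not None:
                return result
        return None

    return parse


# Mandatory OPER and EXER branches, with the whole choice in an optional wrapper.
wrapper = optional(choice(sequence(line_element("OPER")),
                          sequence(line_element("EXER"))))

print(wrapper(["John Doe", "EXER/TANGO//", "Sally Smith"], 1))  # (['EXER/TANGO//'], 2)
print(wrapper(["John Doe", "Sally Smith"], 1))                  # (None, 1)
```

The difference from the broken version is where the "zero occurrences" escape
hatch lives: on the wrapper, outside the choice, so it only applies after every
branch has been tried and failed.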

This blows my mind. I feel like this example alone boosted my IQ by 10 points. 

/Roger
