Hi Folks,

A couple of weeks ago Mike Beckerle pointed out that many data formats contain 
things like this:

A number, N
N occurrences of something

For example, 3 followed by the names of three students:

3
John Doe
Sally Smith
Judy Jones

How should that be parsed? Using the DFDL properties occursCount, 
occursCountKind="expression", and hiddenGroup, you can parse the input and 
ensure that exactly three student names are consumed. The output is this XML:

<Students>
    <name>John Doe</name>
    <name>Sally Smith</name>
    <name>Judy Jones</name>
</Students>

But is it really the job of the parser to "ensure that exactly three student 
names are consumed"?

I raised this question with the compiler experts on the compilers Usenet list. 
Here's what one person wrote:

> I would contend that in your example the /syntax/ of lists is really a number 
> followed by zero or more strings (number string*), and that verifying the 
> string count is semantics, not syntax.  I believe that, whenever possible, 
> semantics are best left until after parsing is finished.

In other words, keep your DFDL schema simple: forget 
occursCountKind="expression" and hiddenGroup; just parse the number and the 
following strings. The output should be this:

<number>3</number>
<Students>
    <name>John Doe</name>
    <name>Sally Smith</name>
    <name>Judy Jones</name>
</Students>
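
To make the "number string*" idea concrete, here is a rough sketch in Python 
rather than DFDL. It is only an illustration: the function name parse_minimal 
and the one-item-per-line input layout are my assumptions. The point is that 
the parser reads the number and then every following string, without comparing 
the two.

```python
def parse_minimal(text):
    """Parse 'number string*' and do NOT check the count.

    parse_minimal is a hypothetical helper; it assumes one item per line.
    """
    # Ignore blank lines and surrounding whitespace.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    number = int(lines[0])   # the leading count, kept as ordinary data
    names = lines[1:]        # zero or more strings, however many there are
    return number, names


number, names = parse_minimal("3\nJohn Doe\nSally Smith\nJudy Jones\n")
```

An input whose count disagrees with the number of names still parses; the 
mismatch is left for a later step to notice.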

If you need to "ensure that there are 3 student names" you can do that check 
*after* parsing.
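
As a sketch of what such an after-parsing check might look like, here is a 
small Python function using the standard xml.etree.ElementTree module. The 
function name count_matches and the wrapping <root> element are my own 
assumptions; the wrapper is there only because the output above has two 
top-level elements, which an XML parser will not accept as a single document.

```python
import xml.etree.ElementTree as ET


def count_matches(xml_fragment):
    """Return True if <number> equals the number of <name> elements.

    count_matches is a hypothetical helper, not part of any DFDL tool.
    """
    # Wrap the fragment in an assumed <root> so it is a well-formed document.
    root = ET.fromstring("<root>" + xml_fragment + "</root>")
    declared = int(root.findtext("number"))
    actual = len(root.findall("./Students/name"))
    return declared == actual


parsed_output = """
<number>3</number>
<Students>
    <name>John Doe</name>
    <name>Sally Smith</name>
    <name>Judy Jones</name>
</Students>
"""

ok = count_matches(parsed_output)
```

The same check could just as well be written in Schematron, XSLT, or whatever 
consumes the parser's output; the schema itself stays simple.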

This is the Minimalist DFDL philosophy.

/Roger

