Hi Folks,

A couple of weeks ago Mike Beckerle pointed out that many data formats contain things like this:

    A number, N
    N occurrences of something

For example, 3 followed by the names of three students:

    3
    John Doe
    Sally Smith
    Judy Jones

How should that be parsed?

Using the DFDL occursCount and occursCountKind="expression" and hiddenGroup you can parse the input to ensure that exactly three student names are consumed. The output is this XML:

    <Students>
        <name>John Doe</name>
        <name>Sally Smith</name>
        <name>Judy Jones</name>
    </Students>

But is it really the job of the parser to "ensure that exactly three student names are consumed"?

I raised this question to the compiler experts on the compilers Usenet list. Here's what one person wrote:

> I would contend that in your example the /syntax/ of lists is really a
> number followed by zero or more strings (number string*), and that
> verifying the string count is semantics, not syntax. I believe that,
> whenever possible, semantics are best left until after parsing is
> finished.

In other words, keep your DFDL schema simple: forget occursCountKind="expression" and hiddenGroup; just parse the number and the following strings. The output should be this:

    <number>3</number>
    <Students>
        <name>John Doe</name>
        <name>Sally Smith</name>
        <name>Judy Jones</name>
    </Students>

If you need to "ensure that there are 3 student names" you can do that check *after* parsing.

This is the Minimalist DFDL philosophy.

/Roger
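
For illustration, the check done *after* parsing might look something like the Python sketch below. This is not part of DFDL and is only one possible approach. It assumes the parser output has been wrapped in a hypothetical <Data> root element (the <number> and <Students> elements need a common parent to form one well-formed XML document), and the function name check_student_count is made up for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: the parsed output is wrapped in a hypothetical <Data> root
# element so that it is a single well-formed XML document.
PARSED = """
<Data>
    <number>3</number>
    <Students>
        <name>John Doe</name>
        <name>Sally Smith</name>
        <name>Judy Jones</name>
    </Students>
</Data>
"""


def check_student_count(xml_text):
    """Return (declared, actual): the count stated in <number> and the
    number of <name> elements actually present under <Students>."""
    root = ET.fromstring(xml_text.strip())
    declared = int(root.findtext("number"))
    actual = len(root.findall("./Students/name"))
    return declared, actual


declared, actual = check_student_count(PARSED)
if declared != actual:
    print(f"count mismatch: declared {declared}, found {actual}")
else:
    print(f"ok: {actual} student names")
```

The parser's job stays purely syntactic (a number followed by zero or more names), and the semantic rule "the count must match" lives in a separate step that can report a mismatch however the application sees fit.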