I think the key to understanding what's going on is that when a
lengthPattern fails to match anything, it is not considered an error.
Instead, the element is considered to have zero length.
So say we had a line like this:
"Howdy Joe"
The "Hello|Greeting" regex will not match, so the length of the label
element is considered zero, i.e.:
<label></label>
We don't consume any data, and there is no error--we will continue
parsing the next element.
The whitespace regex wouldn't match either (the next character is an
'H', not whitespace), so we would get another zero length element:
<whitespace></whitespace>
Again without consuming data.
We would then try the name regex, and that regex would match "Howdy", so
we would get:
<name>Howdy</name>
Which is clearly wrong. Things have really gone off the rails.
I could continue, but I think you see the point and how this no-match
behavior of lengthKind="pattern" can lead to issues.
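To spell it out a bit further, here is a rough sketch of the elements we
would build along the way for that line. I worked this out by hand, so it
is not actual Daffodil output, and it assumes that "Howdy Joe" is followed
by a single newline and that each regex has to match starting at the
current position in the data:

```xml
<greetings>
  <greeting>
    <label></label>            <!-- "Hello|Greeting" does not match at "Howdy Joe" -->
    <whitespace></whitespace>  <!-- "[ \t]+" does not match at "H" -->
    <name>Howdy</name>         <!-- "[a-zA-Z]+" matches "Howdy" -->
    <newline></newline>        <!-- "\n" does not match at " Joe" -->
  </greeting>
  <greeting>
    <label></label>            <!-- "Hello|Greeting" does not match at " Joe" -->
    <whitespace> </whitespace> <!-- "[ \t]+" matches the single space -->
    <name>Joe</name>           <!-- "[a-zA-Z]+" matches "Joe" -->
    <newline>&#xA;</newline>   <!-- "\n" matches the assumed line ending -->
  </greeting>
</greetings>
```

So a single input line turns into two bogus greetings, and no error is
reported for either of them.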
So it's really important to be careful when using lengthKind="pattern",
and to consider what happens if the pattern doesn't match. It's not an
error like most people expect.
As to how this causes the "no forward progress" error: Daffodil
successfully parses the Hello and Greeting lines as you would expect
and reaches the end of the data. But because maxOccurs="unbounded",
Daffodil still tries to parse more greetings. We've reached the end of
the data, so none of the regexes are going to match. And since a failed
match is not an error, all the elements will just be zero length. So
after we parse all the data, we end up parsing another <greeting>
element that is completely empty, e.g.:
<greeting>
<label></label>
<whitespace></whitespace>
<name></name>
<newline></newline>
</greeting>
So we were able to successfully parse a complete greeting element, but
consumed no actual data. And in theory, because the <greeting> element
is an unbounded array, we could just keep doing this over and over again
forever. That would leave us stuck in an infinite loop, parsing
greeting elements without consuming any data. To avoid this, Daffodil
detects the lack of progress and fails with a "no forward progress" error.
The solution for cases like these is usually to add assertions to each
element requiring them to not have zero length. So if the regex doesn't
match, then we create an error. For example:
<xs:element name="label" type="xs:string"
            dfdl:lengthPattern="Hello|Greeting">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:assert test="{ . ne '' }" />
    </xs:appinfo>
  </xs:annotation>
</xs:element>
With this assertion on each element, a failure to match the regex
creates a parse error instead of silently producing a zero length
element, and you get the normal backtracking behavior.
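Applied to your schema, the sequence inside <greeting> might look
something like the sketch below. I haven't run this; it just repeats the
assertion above on each of your four elements and leaves the patterns as
you wrote them:

```xml
<xs:sequence>
  <xs:element name="label" type="xs:string"
              dfdl:lengthPattern="Hello|Greeting">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <!-- fail the parse if the regex matched nothing -->
        <dfdl:assert test="{ . ne '' }" />
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
  <xs:element name="whitespace" type="xs:string"
              dfdl:lengthPattern="[ \t]+">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:assert test="{ . ne '' }" />
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
  <xs:element name="name" type="xs:string"
              dfdl:lengthPattern="[a-zA-Z]+">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:assert test="{ . ne '' }" />
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
  <xs:element name="newline" type="xs:string"
              dfdl:lengthPattern="\n">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:assert test="{ . ne '' }" />
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:sequence>
```

With that in place, the attempt to parse one more greeting at the end of
the data should fail on the label assertion, and the array should simply
end there instead of producing an empty greeting.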
I want those extra credit points, so to your question about removing the
dfdl:lengthKind="delimited", one suggestion would be to make the default
lengthKind="implicit" in the dfdl:format. Then your <greetings> and
<greeting> elements will have implicit length (i.e. their length will be
the length of their children) and you won't need to specify the
lengthKind attribute for those elements. Then create a special simple
type for pattern length strings, e.g.
<xs:simpleType name="patternString" dfdl:lengthKind="pattern">
  <xs:restriction base="xs:string" />
</xs:simpleType>
And then use this type for your string elements that need to be pattern
length, e.g.:
<xs:element name="foo" type="patternString" dfdl:lengthPattern="..." />
So any element of type patternString will already have the
lengthKind="pattern" property set, and you only need to specify the
lengthPattern property.
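Putting those pieces together, the relevant parts of your schema might end
up looking roughly like this. Again, this is an untested sketch. It
assumes everything else in your dfdl:format stays as it is, and it leaves
out the non-empty assertions discussed earlier to keep it short (you would
still want them):

```xml
<!-- In dfdl:format, change only the default length kind:
     lengthKind="implicit"   (was lengthKind="pattern") -->

<xs:simpleType name="patternString" dfdl:lengthKind="pattern">
  <xs:restriction base="xs:string" />
</xs:simpleType>

<!-- No dfdl:lengthKind on greetings or greeting: they pick up the
     implicit default from dfdl:format -->
<xs:element name="greetings">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="greeting" maxOccurs="unbounded"
                  dfdl:occursCountKind="implicit">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="label" type="patternString"
                        dfdl:lengthPattern="Hello|Greeting" />
            <xs:element name="whitespace" type="patternString"
                        dfdl:lengthPattern="[ \t]+" />
            <xs:element name="name" type="patternString"
                        dfdl:lengthPattern="[a-zA-Z]+" />
            <xs:element name="newline" type="patternString"
                        dfdl:lengthPattern="\n" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```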
- Steve
On 11/3/21 9:43 AM, Roger L Costello wrote:
Hi Folks,
My input consists of a series of greetings on new lines. Each greeting contains
a label, whitespace, name, newline. Here is a sample input:
Hello Roger
Greeting Sally
The label is either Hello or Greeting
The name is lower and uppercase letters
Below is my DFDL schema. When I run it I get the (unhelpful) error message "Parse
error: no forward progress" (what does that mean?)
My schema uses regexes to specify the tokens (label, whitespace, name, newline). I know
that there are other ways to design the schema, but I want to do it this way. Can you
tell me why I am getting that error message, and how to fix it, please? Extra credit if
you can tell me how to get rid of those two dfdl:lengthKind="delimited"
attributes.
<xs:element name="greetings" dfdl:lengthKind="delimited">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="greeting" maxOccurs="unbounded" dfdl:lengthKind="delimited"
                  dfdl:occursCountKind="implicit">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="label" type="xs:string"
                        dfdl:lengthPattern="Hello|Greeting"/>
            <xs:element name="whitespace" type="xs:string"
                        dfdl:lengthPattern="[ \t]+"/>
            <xs:element name="name" type="xs:string"
                        dfdl:lengthPattern="[a-zA-Z]+"/>
            <xs:element name="newline" type="xs:string"
                        dfdl:lengthPattern="\n"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Here is the complete schema:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format
        textBidi="no"
        separatorSuppressionPolicy="trailingEmpty"
        floating="no"
        encodingErrorPolicy="replace"
        outputNewLine="%CR;%LF;"
        leadingSkip="0"
        trailingSkip="0"
        alignment="1"
        alignmentUnits="bytes"
        textPadKind="none"
        textTrimKind="none"
        truncateSpecifiedLengthString="no"
        escapeSchemeRef=""
        representation="text"
        encoding="ASCII"
        separator = ""
        initiator = ""
        terminator = ""
        ignoreCase = "yes"
        sequenceKind="ordered"
        initiatedContent="no"
        fillByte="%SP;"
        lengthUnits="characters"
        lengthKind="pattern"
      />
    </xs:appinfo>
  </xs:annotation>
  <xs:element name="greetings" dfdl:lengthKind="delimited">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="greeting" maxOccurs="unbounded"
                    dfdl:lengthKind="delimited" dfdl:occursCountKind="implicit">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="label" type="xs:string"
                          dfdl:lengthPattern="Hello|Greeting"/>
              <xs:element name="whitespace" type="xs:string"
                          dfdl:lengthPattern="[ \t]+"/>
              <xs:element name="name" type="xs:string"
                          dfdl:lengthPattern="[a-zA-Z]+"/>
              <xs:element name="newline" type="xs:string"
                          dfdl:lengthPattern="\n"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>