I think the key to understanding what's going on is that when a lengthPattern fails to match anything, it is not considered an error. Instead, the element is considered to have zero length.

So say we had a line like this:

  "Howdy Joe"

The "Hello|Greeting" regex will not match, so the length of the label element is considered zero, i.e.:

  <label></label>

We don't consume any data, and there is no error--we will continue parsing the next element.

The whitespace regex wouldn't match either (the next character is an 'H', not whitespace), and so we would get another zero length element:

  <whitespace></whitespace>

Again without consuming data.

We would then try the name regex, and that regex would match "Howdy", so we would get:

  <name>Howdy</name>

Which is clearly wrong. Things have really gone off the rails.

I could continue, but I think you see the point and how this no-match behavior of lengthKind="pattern" can lead to issues.
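
For completeness, here is roughly where that walk-through ends up if you carry it through the rest of the line. This is a hand-derived sketch that follows the no-match rule described above, not actual Daffodil output, and it assumes the "Howdy Joe" line ends with a newline:

```xml
<!-- First pass: only the name regex matches, and it consumes "Howdy" -->
<greeting>
  <label></label>
  <whitespace></whitespace>
  <name>Howdy</name>
  <!-- the next character is a space, so "\n" does not match: zero length -->
  <newline></newline>
</greeting>
<!-- Second pass: parsing resumes at " Joe" followed by the newline -->
<greeting>
  <!-- the next character is a space, so "Hello|Greeting" does not match -->
  <label></label>
  <whitespace> </whitespace>
  <name>Joe</name>
  <newline>&#xA;</newline>
</greeting>
```

So one bad line would quietly turn into two greeting elements with empty labels, and no error would be reported for either of them.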

So it's really important to be very careful when using lengthKind pattern, and to consider what happens if the pattern doesn't match--it's not an error like most people expect.


As to how this causes the "no forward progress" error: Daffodil successfully parses the Hello and Greeting lines as you would expect, and reaches the end of the data. But because maxOccurs="unbounded", Daffodil is still going to try to parse more greetings. We've reached the end of the data, so none of those regexes are going to match. And since a failed match is not an error, all the elements will just be zero length. So after we parse all the data, we end up parsing another <greeting> element that is completely empty, e.g.:

  <greeting>
    <label></label>
    <whitespace></whitespace>
    <name></name>
    <newline></newline>
  </greeting>

So we were able to successfully parse a complete greeting element, but consumed no actual data. And in theory, because the <greeting> element is an unbounded array, we could just keep doing this over and over again forever. But that means we're stuck in an infinite loop of parsing greeting elements, but consuming no data. To avoid this infinite loop, Daffodil detects this and errors with a message about "no forward progress".

The solution for cases like these is usually to add assertions to each element requiring them to not have zero length. So if the regex doesn't match, then we create an error. For example:

  <xs:element name="label" type="xs:string" dfdl:lengthPattern="Hello|Greeting">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:assert test="{ . ne '' }" />
      </xs:appinfo>
    </xs:annotation>
  </xs:element>

With this assertion on each element, a failed regex match becomes a parse error instead of a zero-length element that consumes no data, and Daffodil then follows its normal backtracking behavior.
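
Applied to all four elements of the greeting sequence, the same assertion might look like the sketch below. It only repeats the label assertion on the other three elements from the original schema; I have not run this exact fragment, so treat it as a starting point:

```xml
<xs:sequence>
  <xs:element name="label" type="xs:string" dfdl:lengthPattern="Hello|Greeting">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <!-- fail the parse if the regex matched nothing -->
        <dfdl:assert test="{ . ne '' }" />
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
  <xs:element name="whitespace" type="xs:string" dfdl:lengthPattern="[ \t]+">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:assert test="{ . ne '' }" />
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
  <xs:element name="name" type="xs:string" dfdl:lengthPattern="[a-zA-Z]+">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:assert test="{ . ne '' }" />
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
  <xs:element name="newline" type="xs:string" dfdl:lengthPattern="\n">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:assert test="{ . ne '' }" />
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:sequence>
```

With this in place, the attempt to parse one more greeting at the end of the data fails on the label assertion rather than producing an empty greeting.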


I want those extra credit points, so to your question about removing the dfdl:lengthKind="delimited", one suggestion would be to make the default lengthKind="implicit" in the dfdl:format. Then your <greetings> and <greeting> elements will have implicit length (i.e. their length will be the length of their children) and you won't need to specify the lengthKind attribute for those elements. Then create a special simple type for pattern length strings, e.g.

  <xs:simpleType name="patternString" dfdl:lengthKind="pattern">
    <xs:restriction base="xs:string" />
  </xs:simpleType>

And then use this type for your string elements that need to be pattern length, e.g.:

  <xs:element name="foo" type="patternString" dfdl:lengthPattern="..." />

So any element that is a patternString type will already have lengthKind="pattern" attribute set, and you only need to specify the lengthPattern attribute.
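
Putting that together, the element declarations from your schema might end up looking something like the sketch below. It assumes the lengthKind in your dfdl:format has been changed from "pattern" to "implicit" and that everything else in the format stays as it was; I have not run this exact fragment:

```xml
<!-- Assumption: the dfdl:format default is now lengthKind="implicit" -->
<xs:simpleType name="patternString" dfdl:lengthKind="pattern">
  <xs:restriction base="xs:string" />
</xs:simpleType>

<!-- greetings and greeting pick up implicit length from the format default -->
<xs:element name="greetings">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="greeting" maxOccurs="unbounded"
          dfdl:occursCountKind="implicit">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="label" type="patternString"
                dfdl:lengthPattern="Hello|Greeting" />
            <xs:element name="whitespace" type="patternString"
                dfdl:lengthPattern="[ \t]+" />
            <xs:element name="name" type="patternString"
                dfdl:lengthPattern="[a-zA-Z]+" />
            <xs:element name="newline" type="patternString"
                dfdl:lengthPattern="\n" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

The zero-length assertion discussed earlier is left out here only to keep the sketch short; each of the four string elements would still need it.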

- Steve




On 11/3/21 9:43 AM, Roger L Costello wrote:
Hi Folks,

My input consists of a series of greetings on new lines. Each greeting contains 
a label, whitespace, name, newline. Here is a sample input:

Hello Roger
Greeting Sally

The label is either Hello or Greeting
The name is lower and uppercase letters

Below is my DFDL schema. When I run it I get the (unhelpful) error message "Parse 
error: no forward progress" (what does that mean?)

My schema uses regexes to specify the tokens (label, whitespace, name, newline). I know 
that there are other ways to design the schema, but I want to do it this way. Can you 
tell me why I am getting that error message, and how to fix it, please? Extra credit if 
you can tell me how to get rid of those two dfdl:lengthKind="delimited" 
attributes.

<xs:element name="greetings" dfdl:lengthKind="delimited">
     <xs:complexType>
         <xs:sequence>
             <xs:element name="greeting" maxOccurs="unbounded" dfdl:lengthKind="delimited" 
dfdl:occursCountKind="implicit">
                 <xs:complexType>
                     <xs:sequence>
                         <xs:element name="label" type="xs:string"
                             dfdl:lengthPattern="Hello|Greeting"/>
                         <xs:element name="whitespace" type="xs:string"
                             dfdl:lengthPattern="[ \t]+"/>
                         <xs:element name="name" type="xs:string"
                             dfdl:lengthPattern="[a-zA-Z]+"/>
                         <xs:element name="newline" type="xs:string"
                             dfdl:lengthPattern="\n"/>
                     </xs:sequence>
                 </xs:complexType>
             </xs:element>
         </xs:sequence>
     </xs:complexType>
</xs:element>

Here is the complete schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
     xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
<xs:annotation>
         <xs:appinfo source="http://www.ogf.org/dfdl/">
             <dfdl:format
                 textBidi="no"
                 separatorSuppressionPolicy="trailingEmpty"
                 floating="no"
                 encodingErrorPolicy="replace"
                 outputNewLine="%CR;%LF;"
                 leadingSkip="0"
                 trailingSkip="0"
                 alignment="1"
                 alignmentUnits="bytes"
                 textPadKind="none"
                 textTrimKind="none"
                 truncateSpecifiedLengthString="no"
                 escapeSchemeRef=""
                 representation="text"
                 encoding="ASCII"
                 separator = ""
                 initiator = ""
                 terminator = ""
                 ignoreCase = "yes"
                 sequenceKind="ordered"
                 initiatedContent="no"
                 fillByte="%SP;"
                 lengthUnits="characters"
                 lengthKind="pattern"
             />
         </xs:appinfo>
     </xs:annotation>
<xs:element name="greetings" dfdl:lengthKind="delimited">
         <xs:complexType>
             <xs:sequence>
                 <xs:element name="greeting" maxOccurs="unbounded" 
dfdl:lengthKind="delimited" dfdl:occursCountKind="implicit">
                     <xs:complexType>
                         <xs:sequence>
                             <xs:element name="label" type="xs:string"
                                 dfdl:lengthPattern="Hello|Greeting"/>
                             <xs:element name="whitespace" type="xs:string"
                                 dfdl:lengthPattern="[ \t]+"/>
                             <xs:element name="name" type="xs:string"
                                 dfdl:lengthPattern="[a-zA-Z]+"/>
                             <xs:element name="newline" type="xs:string"
                                 dfdl:lengthPattern="\n"/>
                         </xs:sequence>
                     </xs:complexType>
                 </xs:element>
             </xs:sequence>
         </xs:complexType>
     </xs:element>
</xs:schema>

