Hi Mike,

I read the web page you referenced. I don’t see where it says that the order of 
regex choice alternatives matters. Would you quote the sentence that says that, 
please?

/Roger

From: Mike Beckerle <mbecke...@apache.org>
Sent: Wednesday, April 6, 2022 1:48 PM
To: users@daffodil.apache.org
Subject: [EXT] Re: Bug in Daffodil

This is standard regex behavior. The order of the regex choice alternatives 
matters very much. Regex authors must order the alternatives so that the 
longest match is attempted first.

See: https://www.regular-expressions.info/alternation.html
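
The effect can be reproduced outside Daffodil. Below is a minimal sketch, 
assuming Python's re module as a stand-in for the regex engine (Daffodil itself 
runs on the JVM, so this only illustrates ordered alternation in general). The 
string "TAS//" is assumed to be the data remaining when the FreeText pattern is 
applied, and the variable names are illustrative.

```python
import re

# Assumed remaining data once the initiator, TextIndicator, and the
# separator before FreeText have been consumed.
data = "TAS//"

# The two orderings of the FreeText alternatives from the schemas below.
short_first = re.compile(r"[A-Z]|([A-Z][/A-Z]*[A-Z])")
long_first = re.compile(r"([A-Z][/A-Z]*[A-Z])|[A-Z]")

# match() anchors at the start of the data and tries the alternatives
# from left to right, keeping the first one that succeeds.
short_result = short_first.match(data).group(0)
long_result = long_first.match(data).group(0)

print(short_result)  # T
print(long_result)   # TAS
```

With the short alternative first, only "T" is consumed and "AS//" is left 
behind, which is consistent with the left over data error.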

This is one of the reasons DFDL delimiters don't let you just write a regex.

For delimiter matching, DFDL insists on the longest match, whereas standard 
regex alternation is "always eager": it takes the first alternative that 
matches.
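
To make the contrast concrete, here is a rough sketch of longest-match 
selection over a set of alternatives, written in Python. It is not Daffodil's 
implementation; longest_match is a hypothetical helper, and it only compares 
each alternative's own preferred match at the start of the data.

```python
import re

def longest_match(alternatives, data):
    """Try every alternative at the start of data and keep the longest hit."""
    best = None
    for pattern in alternatives:
        m = re.match(pattern, data)
        if m and (best is None or len(m.group(0)) > len(best)):
            best = m.group(0)
    return best

alternatives = [r"[A-Z]", r"[A-Z][/A-Z]*[A-Z]"]

# The result no longer depends on the order of the alternatives.
forward = longest_match(alternatives, "TAS//")
backward = longest_match(list(reversed(alternatives)), "TAS//")

print(forward)   # TAS
print(backward)  # TAS
```

An ordinary regex alternation does not do this comparison, which is why the 
author has to put the longer alternative first.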

On Wed, Apr 6, 2022 at 1:34 PM Roger L Costello <coste...@mitre.org> wrote:
With this input:

GENTEXT/FOO/TAS//

The following DFDL generates the dreaded “Left over data” error:

<xs:element name="GeneralTextInfo" minOccurs="0" dfdl:initiator="GENTEXT"
            dfdl:terminator="//">
    <xs:complexType>
        <xs:sequence dfdl:separator="/" dfdl:separatorPosition="prefix">
            <xs:element name="TextIndicator" minOccurs="0" nillable="true"
                        type="non-zero-length-string"
                        dfdl:lengthPattern="[A-Z ]+"/>
            <xs:element name="FreeText" minOccurs="0" nillable="true"
                        type="non-zero-length-string"
                        dfdl:lengthPattern="[A-Z]|([A-Z][/A-Z]*[A-Z])"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>

If I reverse the order of the alternatives in the FreeText regex:

<xs:element name="GeneralTextInfo" minOccurs="0" dfdl:initiator="GENTEXT"
            dfdl:terminator="//">
    <xs:complexType>
        <xs:sequence dfdl:separator="/" dfdl:separatorPosition="prefix">
            <xs:element name="TextIndicator" minOccurs="0" nillable="true"
                        type="non-zero-length-string"
                        dfdl:lengthPattern="[A-Z ]+"/>
            <xs:element name="FreeText" minOccurs="0" nillable="true"
                        type="non-zero-length-string"
                        dfdl:lengthPattern="([A-Z][/A-Z]*[A-Z])|[A-Z]"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>

Then the error goes away.

This seems like a bug in Daffodil. The order in which a regex OR clause is 
expressed should not matter.

/Roger
