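This is standard regex behavior: the order of the alternatives in a regex
alternation matters very much. The engine tries them left to right and takes
the first one that matches, so regex authors must arrange the alternatives so
that the longest matches are attempted first.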

See: https://www.regular-expressions.info/alternation.html

This is one of the reasons DFDL delimiters don't let you just write a
regex.

For delimiter matching, DFDL insists on the longest match, whereas standard
regex alternation is simply "always eager": it takes the first alternative
that matches, even if a later one would match more.
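
To make the difference concrete, here is a minimal sketch in Python. Python's
re module is only a stand-in here for the regex engine Daffodil uses; both are
backtracking engines that try alternatives left to right. It assumes that
"TAS//" is what remains for FreeText once "GENTEXT/FOO/" has been consumed,
and the helper names (matched_prefix, longest_match) are made up for
illustration, not anything from Daffodil.

```python
import re

# Assumed remainder of the input when FreeText is parsed.
data = "TAS//"

# The two lengthPattern regexes from the original message.
short_first = re.compile(r"[A-Z]|([A-Z][/A-Z]*[A-Z])")
long_first = re.compile(r"([A-Z][/A-Z]*[A-Z])|[A-Z]")


def matched_prefix(pattern, text):
    """Return what the pattern matches at the start of text, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None


def longest_match(alternatives, text):
    """Hypothetical longest-match rule: try every alternative, keep the longest."""
    matches = [m.group(0) for alt in alternatives if (m := re.match(alt, text))]
    return max(matches, key=len, default=None)


short_result = matched_prefix(short_first, data)
long_result = matched_prefix(long_first, data)
longest_result = longest_match([r"[A-Z]", r"[A-Z][/A-Z]*[A-Z]"], data)

print(short_result)    # T
print(long_result)     # TAS
print(longest_result)  # TAS
```

With the short alternative first, only "T" is consumed, so "AS//" is left
where the "//" terminator is expected, which is consistent with the "Left over
data" error. The longest_match helper shows the kind of order-independent
result that a longest-match rule gives.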





On Wed, Apr 6, 2022 at 1:34 PM Roger L Costello <coste...@mitre.org> wrote:

> With this input:
>
>
>
> GENTEXT/FOO/TAS//
>
>
>
> The following DFDL generates the dreaded “Left over data” error:
>
>
>
> <xs:element name="GeneralTextInfo" minOccurs="0" dfdl:initiator="GENTEXT"
> dfdl:terminator="//">
>     <xs:complexType>
>         <xs:sequence dfdl:separator="/" dfdl:separatorPosition="prefix">
>             <xs:element name="TextIndicator" minOccurs="0" nillable="true"
> type="non-zero-length-string" dfdl:lengthPattern="[A-Z ]+"/>
>             <xs:element name="FreeText" minOccurs="0" nillable="true"
> type="non-zero-length-string"
> dfdl:lengthPattern="[A-Z]|([A-Z][/A-Z]*[A-Z])"/>
>         </xs:sequence>
>     </xs:complexType>
> </xs:element>
>
>
>
> If I reverse the regex for FreeText:
>
>
>
> <xs:element name="GeneralTextInfo" minOccurs="0" dfdl:initiator="GENTEXT"
> dfdl:terminator="//">
>     <xs:complexType>
>         <xs:sequence dfdl:separator="/" dfdl:separatorPosition="prefix">
>             <xs:element name="TextIndicator" minOccurs="0" nillable="true"
> type="non-zero-length-string" dfdl:lengthPattern="[A-Z ]+"/>
>             <xs:element name="FreeText" minOccurs="0" nillable="true"
> type="non-zero-length-string"
> dfdl:lengthPattern="([A-Z][/A-Z]*[A-Z])|[A-Z]"/>
>         </xs:sequence>
>     </xs:complexType>
> </xs:element>
>
>
>
> Then the error goes away.
>
>
>
> This seems like a bug in Daffodil. The order in which a regex OR clause is
> expressed should not matter.
>
>
>
> /Roger
>
