On that page, paragraph 4 under the heading "Remember That the Regex Engine is Eager" reads:
"I already explained that the regex engine is eager <https://www.regular-expressions.info/engine.html>. It stops searching as soon as it finds a valid match. The consequence is that in certain situations, the order of the alternatives matters."

It then goes on to explain, using "Get|GetValue" as the example, why the order matters, and that the second alternative is tried only if the first one fails.

________________________________
From: Roger L Costello <coste...@mitre.org>
Sent: Wednesday, April 6, 2022 1:54 PM
To: users@daffodil.apache.org <users@daffodil.apache.org>; Beckerle, Mike <mbecke...@owlcyberdefense.com>
Subject: Re: Bug in Daffodil

Hi Mike,

I read the web page you referenced. I don’t see where it says that the order of regex choice alternatives matters. Would you quote the sentence that says that, please?

/Roger

From: Mike Beckerle <mbecke...@apache.org>
Sent: Wednesday, April 6, 2022 1:48 PM
To: users@daffodil.apache.org
Subject: [EXT] Re: Bug in Daffodil

This is standard regex behavior. The order of the regex choice alternatives matters very much. Regex authors must order the alternatives so that the longest matches are attempted first.

See: https://www.regular-expressions.info/alternation.html

This is one of the reasons DFDL delimiters don't let you just write a regex. For delimiter matching, DFDL insists on the longest match, whereas standard regex behavior is simply "always eager".
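The "Get|GetValue" case mentioned above can be reproduced in a few lines. This is only a sketch in Python, whose `re` module is used here as a stand-in for any backtracking regex engine that tries alternatives left to right; the helper name `first_match` is made up for the illustration and is not part of Daffodil or DFDL.

```python
import re

def first_match(pattern, text):
    """Return what the pattern matches at the start of text, or None."""
    m = re.match(pattern, text)
    return m.group(0) if m else None

# The engine is eager: the first alternative that succeeds wins,
# even though a longer match is available via the second alternative.
eager = first_match(r"Get|GetValue", "GetValue")

# Putting the longer alternative first lets it be attempted first.
reordered = first_match(r"GetValue|Get", "GetValue")

print(eager)      # Get
print(reordered)  # GetValue
```

A longest-match rule, such as the one described above for DFDL delimiters, would pick "GetValue" regardless of the order in which the alternatives are written.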
On Wed, Apr 6, 2022 at 1:34 PM Roger L Costello <coste...@mitre.org> wrote:

With this input:

    GENTEXT/FOO/TAS//

The following DFDL generates the dreaded “Left over data” error:

    <xs:element name="GeneralTextInfo" minOccurs="0"
                dfdl:initiator="GENTEXT" dfdl:terminator="//">
      <xs:complexType>
        <xs:sequence dfdl:separator="/" dfdl:separatorPosition="prefix">
          <xs:element name="TextIndicator" minOccurs="0" nillable="true"
                      type="non-zero-length-string"
                      dfdl:lengthPattern="[A-Z ]+"/>
          <xs:element name="FreeText" minOccurs="0" nillable="true"
                      type="non-zero-length-string"
                      dfdl:lengthPattern="[A-Z]|([A-Z][/A-Z]*[A-Z])"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

If I reverse the regex for FreeText:

    <xs:element name="GeneralTextInfo" minOccurs="0"
                dfdl:initiator="GENTEXT" dfdl:terminator="//">
      <xs:complexType>
        <xs:sequence dfdl:separator="/" dfdl:separatorPosition="prefix">
          <xs:element name="TextIndicator" minOccurs="0" nillable="true"
                      type="non-zero-length-string"
                      dfdl:lengthPattern="[A-Z ]+"/>
          <xs:element name="FreeText" minOccurs="0" nillable="true"
                      type="non-zero-length-string"
                      dfdl:lengthPattern="([A-Z][/A-Z]*[A-Z])|[A-Z]"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

Then the error goes away. This seems like a bug in Daffodil. The order in which a regex OR clause is expressed should not matter.

/Roger
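The difference between the two FreeText patterns can be checked outside Daffodil. The sketch below uses Python's `re` module as a stand-in and assumes that the engine behind dfdl:lengthPattern follows the same left-to-right alternation rule, as the replies in this thread indicate. The string "TAS//" is an assumption about what is left of the input once the initiator, the two prefix separators and TextIndicator ("FOO") have been consumed. The names `REMAINING`, `SHORT_FIRST`, `LONG_FIRST` and `consumed` are invented for the illustration.

```python
import re

# Assumed remainder of "GENTEXT/FOO/TAS//" when FreeText is parsed.
REMAINING = "TAS//"

SHORT_FIRST = r"[A-Z]|([A-Z][/A-Z]*[A-Z])"   # original FreeText pattern
LONG_FIRST = r"([A-Z][/A-Z]*[A-Z])|[A-Z]"    # reversed FreeText pattern

def consumed(pattern, data):
    """Return (matched text, data left over) for a match anchored at the start."""
    m = re.match(pattern, data)
    value = m.group(0) if m else ""
    return value, data[len(value):]

short_first_result = consumed(SHORT_FIRST, REMAINING)
long_first_result = consumed(LONG_FIRST, REMAINING)

print(short_first_result)  # ('T', 'AS//')
print(long_first_result)   # ('TAS', '//')
```

With the single-letter alternative first, only "T" is taken, so "AS//" sits where the "//" terminator is expected, which is consistent with the “Left over data” error. With the longer alternative first, "TAS" is taken and "//" remains for the terminator.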