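This is standard regex behavior. The order of the alternatives in a regex matters a great deal: the engine tries them left to right and stops at the first one that matches. Regex authors must therefore order the alternatives so that the longest match is attempted first.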
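As a quick illustration, here is a minimal sketch in Python. It assumes that Python's re module treats alternation the same way as the regex engine Daffodil uses for these patterns, and that "TAS//" is the data remaining when FreeText is parsed. The names remaining, short_first, long_first, and matched_text are made up for the example.

```python
import re

# Data remaining when FreeText is parsed from GENTEXT/FOO/TAS//
# (assumption: the initiator, TextIndicator "FOO", and both separators
# have already been consumed at this point).
remaining = "TAS//"

# The two FreeText patterns from the schemas, differing only in alternative order.
short_first = re.compile(r"[A-Z]|([A-Z][/A-Z]*[A-Z])")
long_first = re.compile(r"([A-Z][/A-Z]*[A-Z])|[A-Z]")


def matched_text(pattern, data):
    """Return the text matched at the start of data, or None if nothing matches."""
    m = pattern.match(data)
    return m.group(0) if m else None


print(matched_text(short_first, remaining))  # T
print(matched_text(long_first, remaining))   # TAS
```

With the single-letter alternative first, the match stops at "T", so "AS" is still sitting in front of the "//" terminator. That is consistent with the left-over-data error going away once the alternatives are reversed.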
See: https://www.regular-expressions.info/alternation.html

This is one of the reasons DFDL delimiters don't let you just write a regex. For delimiter matching, DFDL insists on the longest match, whereas standard regex alternation is eager: it takes the first alternative that matches.

On Wed, Apr 6, 2022 at 1:34 PM Roger L Costello <coste...@mitre.org> wrote:

> With this input:
>
> GENTEXT/FOO/TAS//
>
> The following DFDL generates the dreaded “Left over data” error:
>
> <xs:element name="GeneralTextInfo" minOccurs="0" dfdl:initiator="GENTEXT"
>             dfdl:terminator="//">
>     <xs:complexType>
>         <xs:sequence dfdl:separator="/" dfdl:separatorPosition="prefix">
>             <xs:element name="TextIndicator" minOccurs="0" nillable="true"
>                         type="non-zero-length-string" dfdl:lengthPattern="[A-Z ]+"/>
>             <xs:element name="FreeText" minOccurs="0" nillable="true"
>                         type="non-zero-length-string"
>                         dfdl:lengthPattern="[A-Z]|([A-Z][/A-Z]*[A-Z])"/>
>         </xs:sequence>
>     </xs:complexType>
> </xs:element>
>
> If I reverse the regex for FreeText:
>
> <xs:element name="GeneralTextInfo" minOccurs="0" dfdl:initiator="GENTEXT"
>             dfdl:terminator="//">
>     <xs:complexType>
>         <xs:sequence dfdl:separator="/" dfdl:separatorPosition="prefix">
>             <xs:element name="TextIndicator" minOccurs="0" nillable="true"
>                         type="non-zero-length-string" dfdl:lengthPattern="[A-Z ]+"/>
>             <xs:element name="FreeText" minOccurs="0" nillable="true"
>                         type="non-zero-length-string"
>                         dfdl:lengthPattern="([A-Z][/A-Z]*[A-Z])|[A-Z]"/>
>         </xs:sequence>
>     </xs:complexType>
> </xs:element>
>
> Then the error goes away.
>
> This seems like a bug in Daffodil. The order in which a regex OR clause is
> expressed should not matter.
>
> /Roger