Thanks Mike.

Okay, the alternatives in a pattern facet must be sorted longest-to-shortest.

What if the alternatives are expressed in an enumeration facet, e.g.,

<simpleType>
    <restriction base="string">
        <enumeration value="abc"/>
        <enumeration value="abcd"/>
    </restriction>
</simpleType>

Do I also need to sort the enumeration values in longest-to-shortest order?

/Roger

From: Mike Beckerle <mbecke...@apache.org>
Sent: Friday, August 5, 2022 5:50 PM
To: users@daffodil.apache.org
Subject: [EXT] Re: Do I need to sort the xs:pattern regex alternatives 
longest-to-shortest?

Yes you do. All the regex engines I know are greedy.

Besides regexes just being fussy, this is the main reason DFDL has a delimiter 
language that is its own thing: the delimiters are specified in 
different places, not all together as in a regex. Hence the user has no 
opportunity to sort longest to shortest, so DFDL matches all the 
possible delimiters that can appear at a point, with the longest match preferred.
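
To illustrate the "longest match preferred" idea, here is a minimal Python sketch. It is not Daffodil's implementation; the function name longest_delimiter and the sample delimiters are hypothetical, chosen only to show that the order in which the candidates are listed does not matter when the longest one wins.

```python
def longest_delimiter(data, pos, delimiters):
    """Return the longest candidate delimiter found at data[pos:], or None.

    Hypothetical helper for illustration only; not how Daffodil is implemented.
    """
    matches = [d for d in delimiters if data.startswith(d, pos)]
    return max(matches, key=len) if matches else None


# Both "|" and "||" match at position 1; the longer one is chosen
# regardless of the order in which the candidates are listed.
short_listed_first = longest_delimiter("a||b", 1, ["|", "||"])
long_listed_first = longest_delimiter("a||b", 1, ["||", "|"])
no_match = longest_delimiter("a||b", 0, ["|", "||"])
```

Contrast this with regex alternation, where the listing order decides which alternative is tried first.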



On Fri, Aug 5, 2022, 1:54 PM Roger L Costello <coste...@mitre.org> wrote:
Hi Folks,

Recall that when using dfdl:lengthPattern you must specify its regex 
alternatives longest-to-shortest. For example, if you specify this:

dfdl:lengthPattern="abc|abcd"

then you will get a "left over data" error message.
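
The effect can be reproduced outside Daffodil. The sketch below uses Python's re module as a stand-in, on the assumption that it treats ordered alternation the same way as the regex engine Daffodil uses; the helper name consumed_prefix is made up for this illustration.

```python
import re


def consumed_prefix(pattern, data):
    """Return the text the regex consumes from the start of data ("" if none)."""
    m = re.match(pattern, data)
    return m.group(0) if m else ""


data = "abcd"

# Shorter alternative listed first: the engine stops at "abc" and "d" is left over.
short_first = consumed_prefix(r"abc|abcd", data)
left_over_short_first = data[len(short_first):]

# Longer alternative listed first: all of the data is consumed.
long_first = consumed_prefix(r"abcd|abc", data)
left_over_long_first = data[len(long_first):]

print(repr(short_first), "left over:", repr(left_over_short_first))
print(repr(long_first), "left over:", repr(left_over_long_first))
```

The first alternative that matches wins, so with "abc|abcd" the trailing "d" is never consumed.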

So you must sort the alternatives in longest-to-shortest order. That is a 
hassle.
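
If the alternatives are plain literal strings, the sorting can be automated. This is a small Python sketch under that assumption (it does not handle alternatives that are themselves regexes); sort_alternatives and build_pattern are hypothetical helper names, not part of Daffodil or DFDL.

```python
import re


def sort_alternatives(alternatives):
    """Order literal alternatives longest-to-shortest."""
    return sorted(alternatives, key=len, reverse=True)


def build_pattern(alternatives):
    """Join escaped literal alternatives into a regex, longest first."""
    return "|".join(re.escape(a) for a in sort_alternatives(alternatives))


pattern = build_pattern(["abc", "abcd"])
print(pattern)
```

The resulting string could then be pasted into dfdl:lengthPattern by hand or by a schema-generation script.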

The "-V limited" option changes things. It enables me to abandon 
dfdl:lengthPattern and instead use the XSD pattern facet:

<simpleType>
    <restriction base="string">
        <pattern value="abc|abcd"/>
    </restriction>
</simpleType>

Question: Do I need to sort the pattern facet alternatives in 
longest-to-shortest order? I am hoping the answer is "no".

/Roger
