Thank you Steve. Terrific explanation. I tried the approach you described - dfdl:lengthKind="pattern" dfdl:lengthPattern="ABC|AB|AC|A" - and it worked great.
I also tried using enumeration facets coupled with dfdl:checkConstraints within a dfdl:assert:

    <xs:element name="item1">
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
          <dfdl:assert test="{ dfdl:checkConstraints(.) }"
                       message="The value of item1 is not one of the allowable values" />
        </xs:appinfo>
      </xs:annotation>
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="A" />
          <xs:enumeration value="ABC" />
          <xs:enumeration value="AB" />
          <xs:enumeration value="AC" />
        </xs:restriction>
      </xs:simpleType>
    </xs:element>

But that did not work. Why does that not work?

/Roger

-----Original Message-----
From: Steve Lawrence <slawre...@apache.org>
Sent: Monday, July 12, 2021 2:39 PM
To: users@daffodil.apache.org
Subject: [EXT] Re: How to specify data with two fields, no delimiter, variable length?

In cases like these, you need to use dfdl:lengthKind="pattern" and a regular expression to define the length of the first item. There are many possible regexes, depending on what kinds of infosets you want to allow.

For example, one approach for the first item is a very strict regex that matches exactly one of the four values, e.g.

    <xs:element name="item" type="xs:string"
                dfdl:lengthKind="pattern"
                dfdl:lengthPattern="ABC|AB|AC|A" />

With this approach, the item will get a non-zero length if it is one of those values. Otherwise the item will be the empty string. If you don't want the empty string to be allowed, you need to add an assert that the length is greater than zero. Also note that the order of the alternatives in the regex matters: they are listed so that the longest possibility is matched first.

On the other end of the spectrum, you could instead model the first item to match as many non-digits as possible:

    <xs:element name="item" type="xs:string"
                dfdl:lengthKind="pattern"
                dfdl:lengthPattern="[^0-9]*" />

This will match any of the four allowed values, but will also match anything else up to the first digit. So this could potentially produce infosets with an item value of XYZ, for example.
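The behavior of the two patterns above can be tried outside DFDL. The following is a minimal sketch, assuming Python's `re` module behaves like the DFDL regex engine for these simple patterns (ordered alternation, greedy quantifiers). The names `SHORT_FIRST` and `matched_prefix` are hypothetical and appear only for illustration; `SHORT_FIRST` is a deliberately misordered variant that is not part of the thread.

```python
import re

# Python's re module is only a stand-in here for the DFDL regex engine.
STRICT = re.compile(r"ABC|AB|AC|A")       # the four allowed values, longest first
SHORT_FIRST = re.compile(r"A|AB|AC|ABC")  # hypothetical reordering, shortest first
LOOSE = re.compile(r"[^0-9]*")            # as many non-digits as possible


def matched_prefix(pattern, data):
    """Return the text matched at the start of data, or '' if nothing matches."""
    m = pattern.match(data)
    return m.group(0) if m else ""


if __name__ == "__main__":
    for data in ("A250", "ABC12", "XYZ9"):
        results = [matched_prefix(p, data) for p in (STRICT, SHORT_FIRST, LOOSE)]
        print(data, results)
```

For the input ABC12, the strict pattern matches ABC while the shortest-first variant stops at A, which is why the alternatives are ordered longest first. For XYZ9, the strict pattern matches nothing (an empty item), whereas the loose pattern accepts XYZ.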
In some cases, you might actually want this: the data can be considered "well-formed" but not "valid". You still get an infoset; it's just not "valid". In the first case, by contrast, you could only ever get a valid infoset.

You'll probably also need to use a regex length for matching the numeric item if there's no delimiter after the number. So putting it together, and using the second approach for both items, you might do something like this:

    <xs:sequence>
      <xs:element name="item1" type="xs:string"
                  dfdl:lengthKind="pattern"
                  dfdl:lengthPattern="[^0-9]*" />
      <xs:element name="item2" type="xs:int"
                  dfdl:lengthKind="pattern"
                  dfdl:lengthPattern="[0-9]*" />
    </xs:sequence>

So the first item is a string that parses as many non-digits as possible, and the second is an int that parses as many digits as possible.

Note that this approach probably should have limits on the regex length in case the data is bad or malformed. For example, if the data didn't contain any numbers, then item1 would consume the entire data. So instead of *, you might want to use something like "{0,10}" for both regexes.

- Steve

On 7/12/21 2:05 PM, Roger L Costello wrote:
> Hi Folks,
>
> I have a data field composed of two items.
>
> The values for the first item can be enumerated:
>
> A
> ABC
> AB
> AC
>
> The value of the second item is any integer 0-999
>
> So, here is a sample data field:
>
> A250
>
> How do I parse that using DFDL? I reckon I'm stuck.
>
> /Roger
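The suggestion in Steve's reply to bound both patterns with "{0,10}" can be sketched as follows. This is only an illustration, assuming Python's `re` module as a stand-in for the DFDL regex engine; `split_field` is a hypothetical helper that consumes item1 and then item2 from the start of the data, roughly as a sequence of two pattern-length elements would.

```python
import re

# Bounded versions of the two patterns: at most 10 characters each.
ITEM1 = re.compile(r"[^0-9]{0,10}")  # bounded run of non-digits
ITEM2 = re.compile(r"[0-9]{0,10}")   # bounded run of digits


def split_field(data):
    """Return (item1, item2, rest) by matching item1 and then item2 in order."""
    m1 = ITEM1.match(data)            # always matches, possibly zero-length
    m2 = ITEM2.match(data, m1.end())  # continue where item1 stopped
    return m1.group(0), m2.group(0), data[m2.end():]


if __name__ == "__main__":
    print(split_field("A250"))
    print(split_field("ABCDEFGHIJKLMNOP"))  # no digits at all
```

With the bound in place, data that contains no digits no longer ends up entirely in item1: only the first ten characters are consumed and the remainder is left over, so the problem surfaces early instead of item1 swallowing the whole input.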