Thanks Steve. So now we have two solutions: Mike's, which uses dfdl:checkConstraints(.), and Steve's, which switches the order of the choice branches and uses fn:nilled(.).
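For concreteness, here is a minimal, untested sketch of Steve's approach as described in this thread: the nillable Origin_ branch comes first and carries the fn:nilled(.) assert, so a non-nil value fails that branch and Daffodil backtracks to Origin. It assumes the fn prefix is bound to http://www.w3.org/2005/xpath-functions on xs:schema (the schema quoted in this thread does not declare it), and the comment inside Origin is a placeholder for the unchanged sequence.

```xml
<!-- Sketch only: the nillable branch is tried first.
     Assumes xmlns:fn is declared on the enclosing xs:schema. -->
<xs:choice dfdl:choiceLengthKind="implicit">
  <xs:element name="Origin_" type="xs:string" nillable="true"
              dfdl:nilKind="literalValue" dfdl:nilValue="-">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <!-- Fails this branch unless the field was the nil value "-",
             so the parser backtracks and tries Origin. -->
        <dfdl:assert>{ fn:nilled(.) }</dfdl:assert>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
  <xs:element name="Origin">
    <xs:complexType>
      <!-- Placeholder: the seven-part xs:sequence from the original
           schema goes here unchanged. -->
    </xs:complexType>
  </xs:element>
</xs:choice>
```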
I am writing this stuff up. Which solution should I recommend? I prefer Steve's solution since it is simpler and easier to describe. Thoughts?

/Roger

-----Original Message-----
From: Steve Lawrence <slawre...@apache.org>
Sent: Thursday, August 25, 2022 8:54 AM
To: users@daffodil.apache.org
Subject: [EXT] Re: Daffodil does not correctly parse variable length, nillable elements with complexType

Ah yeah, you're right. If Origin_ were first, you would also need an assert that Origin_ must be nilled to cause it to backtrack and try the other branch if it wasn't the nil value, e.g.:

<xs:element name="Origin_" ... >
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:assert>{ fn:nilled(.) }</dfdl:assert>
    </xs:appinfo>
  </xs:annotation>
</xs:element>

Note that this is probably a good idea to add regardless of order. Otherwise, if there was a parse error in the Origin element (which would occur with Mike's changes on invalid data) then Origin_ would accept any string, but it is intended to only parse a nil element.

On 8/25/22 7:47 AM, Roger L Costello wrote:
> Thanks Mike and Steve!
>
> I need to study carefully what Mike said.
>
> Steve, I think your suggestion is not correct. I did as you suggested and reversed the order of the branches in the choice. Now, with this input:
>
> John Doe/2006N-05912E/Sally Smith
>
> I get this XML:
>
> <Test>
>   <A>John Doe</A>
>   <Origin_>2006N-05912E</Origin_>
>   <B>Sally Smith</B>
> </Test>
>
> which is not correct.
>
> I conclude that switching the order of the branches in the choice is not correct. Do you concur?
>
> /Roger
>
> -----Original Message-----
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Thursday, August 25, 2022 7:38 AM
> To: users@daffodil.apache.org
> Subject: [EXT] Re: Daffodil does not correctly parse variable length, nillable elements with complexType
>
> Another option, put your nillable type as the first branch in the choice.
> This way Daffodil will attempt to parse the nillable type first, and will only attempt to parse the complex Origin if that fails.
>
> You'll still likely want the validation that Mike suggests so that when something fails it fails immediately instead of happily continuing off the rails.
>
>
> On 8/25/22 7:30 AM, Mike Beckerle wrote:
>> I think I know what is happening.
>>
>> In the battle of delimiters vs. nested explicit length, explicit wins.
>>
>> So if you have abc/-/cef
>>
>> but after parsing abc then finding the separator /, the next field is latitudeDegrees with explicit length 2, that "wins" and "-/" are the characters of that string.
>>
>> Validation will then issue a validation warning because Daffodil's "limited" validation is done as the elements are parsed.
>>
>> This does not cause backtracking, it's just a "warning" that the seemingly well-formed data is invalid.
>>
>> Then latitudeMinutes is parsed, and that uses the ever problematic lengthKind pattern, which succeeds, with a zero-length string, which then also causes a validation error.
>>
>> Again, this validation error occurs because the now zero-length string doesn't look like the digits you expect.
>>
>> Then it parses the hyphen element, which is just a string of length 1,
>>
>> .... I'll stop here because things are clearly off the rails.
>>
>> Here's my suggestion for how to fix this and get Daffodil to magically do what you want, which is to pay attention to the facets.
>>
>> <!-- vString = 'validated string'. Facets are checked while parsing. -->
>> <simpleType name="vString">
>>   <annotation><appinfo source="http://www.ogf.org/dfdl/">
>>     <dfdl:assert message="Invalid value">{ dfdl:checkConstraints(.) }</dfdl:assert>
>>   </appinfo></annotation>
>>   <restriction base="xs:string"/>
>> </simpleType>
>>
>> Define all your strings with vString as your type, and it should behave much more like you expect.
>>
>> Now normally I tell people not to call checkConstraints(.) on everything because it fails to distinguish well-formed data from invalid data, and often one wants the parse to succeed even if the data is invalid.
>>
>> In your case things are different. You have not provided enough information in the DFDL properties to parse this data. The facets are necessary information to successfully parse it.
>>
>> You will want to complement vString with use of discriminators. For example I think your schema should have a discriminator after the latitudeDegrees element because if you successfully parse that element, backtracking to the nilled case no longer makes sense.
>>
>>
>> On Thu, Aug 25, 2022 at 7:01 AM Roger L Costello <coste...@mitre.org> wrote:
>>
>>     Hi Folks,
>>
>>     Here are two sample inputs:
>>
>>     John Doe/2006N-05912E/Sally Smith
>>     John Doe/-/Sally Smith
>>
>>     It is the field in the middle that is of interest.
>>
>>     The field is a composite field, i.e., it consists of a series of parts: lat degrees, lat minutes, lat hemisphere, hyphen, long degrees, long minutes, long hemisphere. No separator between the parts.
>>
>>     The field is nillable and the hyphen is the nil value.
>>
>>     The first input shown above succeeds, the second fails to parse.
>>
>>     What we have here is a variable length, nillable element with a complexType and the nil value is not %ES;. As we have determined in previous posts, Daffodil does not support this. So, the workaround is to place the element in a choice, where the first branch of the choice is the element minus the nillable stuff and the second branch is a plain string element that is nillable. Well, I implemented that and Daffodil complains:
>>
>>     [error] Parse Error: Failed to parse infix separator.
>>     Cause: Parse Error: Separator '/' not found
>>
>>     When I use the -V limited parse option I get a completely different set of error messages, e.g.:
>>
>>     [error] Validation Error: LatitudeMinutes failed facet checks due to: facet pattern(s):
>>     [0-9]{2}|[0-9]{2}\.[0-9]{1}|[0-9]{2}\.[0-9]{2}|[0-9]{2}\.[0-9]{3}|[0-9]{2}\.[0-9]{4}
>>
>>     Am I doing something wrong in my DFDL schema (shown below) or is this a bug in Daffodil? /Roger
>>
>>     <?xml version="1.0" encoding="UTF-8"?>
>>     <xs:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
>>                xmlns:xs="http://www.w3.org/2001/XMLSchema">
>>       <xs:annotation>
>>         <xs:appinfo source="http://www.ogf.org/dfdl/">
>>           <dfdl:format
>>             alignment="1"
>>             alignmentUnits="bytes"
>>             emptyValueDelimiterPolicy="none"
>>             encoding="ASCII"
>>             encodingErrorPolicy="replace"
>>             escapeSchemeRef=""
>>             fillByte="%SP;"
>>             floating="no"
>>             ignoreCase="yes"
>>             initiatedContent="no"
>>             initiator=""
>>             leadingSkip="0"
>>             lengthKind="delimited"
>>             lengthUnits="characters"
>>             nilValueDelimiterPolicy="none"
>>             occursCountKind="implicit"
>>             outputNewLine="%CR;%LF;"
>>             representation="text"
>>             separator=""
>>             separatorSuppressionPolicy="anyEmpty"
>>             sequenceKind="ordered"
>>             textBidi="no"
>>             textPadKind="none"
>>             textTrimKind="none"
>>             trailingSkip="0"
>>             truncateSpecifiedLengthString="no"
>>             terminator=""
>>             textNumberRep="standard"
>>             textStandardBase="10"
>>             textStandardZeroRep="0"
>>             textNumberRounding="pattern"
>>             textStandardExponentRep="E"
>>             textNumberCheckPolicy="strict"/>
>>         </xs:appinfo>
>>       </xs:annotation>
>>       <xs:element name="Test">
>>         <xs:complexType>
>>           <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
>>             <xs:element name="A" type="xs:string"/>
>>             <xs:choice dfdl:choiceLengthKind="implicit">
>>               <xs:element name="Origin">
>>                 <xs:complexType>
>>                   <xs:sequence dfdl:separator="">
>>                     <xs:element name="LatitudeDegrees" dfdl:lengthKind="explicit" dfdl:length="2">
>>                       <xs:simpleType>
>>                         <xs:restriction base="xs:string">
>>                           <xs:pattern value="[0-9]{2}"/>
>>                         </xs:restriction>
>>                       </xs:simpleType>
>>                     </xs:element>
>>                     <xs:element name="LatitudeMinutes" dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=(N|S))">
>>                       <xs:simpleType>
>>                         <xs:restriction base="xs:string">
>>                           <xs:pattern value="[0-9]{2}"/>
>>                           <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
>>                           <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
>>                           <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
>>                           <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
>>                         </xs:restriction>
>>                       </xs:simpleType>
>>                     </xs:element>
>>                     <xs:element name="LatitudeHemisphere" dfdl:lengthKind="explicit" dfdl:length="1">
>>                       <xs:simpleType>
>>                         <xs:restriction base="xs:string">
>>                           <xs:enumeration value="N"/>
>>                           <xs:enumeration value="S"/>
>>                         </xs:restriction>
>>                       </xs:simpleType>
>>                     </xs:element>
>>                     <xs:element name="Hyphen" dfdl:lengthKind="explicit" dfdl:length="1">
>>                       <xs:simpleType>
>>                         <xs:restriction base="xs:string">
>>                           <xs:enumeration value="-"/>
>>                         </xs:restriction>
>>                       </xs:simpleType>
>>                     </xs:element>
>>                     <xs:element name="LongitudeDegrees" dfdl:lengthKind="explicit" dfdl:length="3">
>>                       <xs:simpleType>
>>                         <xs:restriction base="xs:string">
>>                           <xs:pattern value="[0-9]{3}"/>
>>                         </xs:restriction>
>>                       </xs:simpleType>
>>                     </xs:element>
>>                     <xs:element name="LongitudeMinutes" dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=(E|W))">
>>                       <xs:simpleType>
>>                         <xs:restriction base="xs:string">
>>                           <xs:pattern value="[0-9]{2}"/>
>>                           <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
>>                           <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
>>                           <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
>>                           <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
>>                         </xs:restriction>
>>                       </xs:simpleType>
>>                     </xs:element>
>>                     <xs:element name="LongitudeHemisphere">
>>                       <xs:simpleType>
>>                         <xs:restriction base="xs:string">
>>                           <xs:enumeration value="E"/>
>>                           <xs:enumeration value="W"/>
>>                         </xs:restriction>
>>                       </xs:simpleType>
>>                     </xs:element>
>>                   </xs:sequence>
>>                 </xs:complexType>
>>               </xs:element>
>>               <xs:element name="Origin_" type="xs:string" nillable="true" dfdl:nilKind="literalValue" dfdl:nilValue="-"/>
>>             </xs:choice>
>>             <xs:element name="B" type="xs:string"/>
>>           </xs:sequence>
>>         </xs:complexType>
>>       </xs:element>
>>     </xs:schema>