Thanks Steve. 

So now we have two solutions: the solution that Mike identified using 
dfdl:checkConstraints(.), and the solution that Steve identified switching the 
order of the choice branches and using fn:nilled(.).

I am writing this stuff up. Which solution should I recommend? I prefer Steve's 
solution since it is simpler and easier to describe. Thoughts?

/Roger

-----Original Message-----
From: Steve Lawrence <slawre...@apache.org> 
Sent: Thursday, August 25, 2022 8:54 AM
To: users@daffodil.apache.org
Subject: [EXT] Re: Daffodil does not correctly parse variable length, nillable 
elements with complexType

Ah yeah, you're right. If Origin_ were first, you would also need an 
assert that Origin_ must be nilled to cause it to backtrack and try the 
other branch if it wasn't the nil value, e.g.:

   <xs:element name="Origin_" ... >
     <xs:annotation>
       <xs:appinfo source="http://www.ogf.org/dfdl/">
         <dfdl:assert>{ fn:nilled(.) }</dfdl:assert>
       </xs:appinfo>
     </xs:annotation>
   </xs:element>

Note that this is probably a good idea to add regardless of order. 
Otherwise, if there was a parse error in the Origin element (which would 
occur with Mike's changes on invalid data) then Origin_ would accept any 
string, but it is intended to only parse a nil element.
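
Putting the two pieces together, a minimal sketch of the reordered choice
might look like the following. It assumes the element names and nil
properties from Roger's original schema, and it stands in a comment for the
body of Origin, so it is a fragment rather than a complete schema. It has not
been run against Daffodil here.

   <xs:choice dfdl:choiceLengthKind="implicit">
     <!-- Tried first: succeeds only when the field is the nil value "-" -->
     <xs:element name="Origin_" type="xs:string" nillable="true"
                 dfdl:nilKind="literalValue" dfdl:nilValue="-">
       <xs:annotation>
         <xs:appinfo source="http://www.ogf.org/dfdl/">
           <dfdl:assert>{ fn:nilled(.) }</dfdl:assert>
         </xs:appinfo>
       </xs:annotation>
     </xs:element>
     <!-- Tried only if the assert above fails and the parser backtracks -->
     <xs:element name="Origin">
       <!-- complexType with LatitudeDegrees through LongitudeHemisphere,
            unchanged from the original schema -->
     </xs:element>
   </xs:choice>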



On 8/25/22 7:47 AM, Roger L Costello wrote:
> Thanks Mike and Steve!
> 
> I need to study carefully what Mike said.
> 
> Steve, I think your suggestion is not correct. I did as you suggested and 
> reversed the order of the branches in the choice. Now, with this input:
> 
> John Doe/2006N-05912E/Sally Smith
> 
> I get this XML:
> 
> <Test>
>    <A>John Doe</A>
>    <Origin_>2006N-05912E</Origin_>
>    <B>Sally Smith</B>
> </Test>
> 
> which is not correct.
> 
> I conclude that switching the order of the branches in the choice is not 
> correct. Do you concur?
> 
> /Roger
> 
> -----Original Message-----
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Thursday, August 25, 2022 7:38 AM
> To: users@daffodil.apache.org
> Subject: [EXT] Re: Daffodil does not correctly parse variable length, 
> nillable elements with complexType
> 
> Another option, put your nillable type as the first branch in the
> choice. This way Daffodil will attempt to parse the nillable type first,
> and will only attempt to parse the complex Origin if that fails.
> 
> You'll still likely want the validation that Mike suggested so that
> when something fails it fails immediately instead of happily continuing
> off the rails.
> 
> 
> On 8/25/22 7:30 AM, Mike Beckerle wrote:
>> I think I know what is happening.
>>
>> In the battle of delimiters vs. nested explicit length, explicit wins.
>>
>> So if you have abc/-/cef
>>
>> but after parsing abc then finding the separator /, the next field is
>> latitudeDegrees with explicit length 2, that "wins" and "-/" are the
>> characters of that string.
>>
>> Validation will then issue a validation warning because Daffodil's "limited"
>> validation is done as the elements are parsed.
>>
>> This does not cause backtracking, it's just a "warning" that the seemingly
>> well-formed data is invalid.
>>
>> Then latitudeMinutes is parsed, and that uses the ever problematic lengthKind
>> pattern, which succeeds, with a zero-length string, which then also causes a
>> validation error.
>>
>> Again, this is a validation error because the now zero-length string
>> doesn't look like the digits you expect.
>>
>> Then it parses the hyphen element, which is just a string of length 1,
>>
>> .... I'll stop here because things are clearly off the rails.
>>
>> Here's my suggestion for how to fix this and get Daffodil to magically do
>> what you want, which is to pay attention to the facets.
>>
>> <!-- vString = 'validated string'. Facets are checked while parsing. -->
>> <simpleType name="vString">
>>       <annotation><appinfo source="http://www.ogf.org/dfdl/">
>>           <dfdl:assert message="Invalid value">{ dfdl:checkConstraints(.) }</dfdl:assert>
>>       </appinfo></annotation>
>>       <restriction base="xs:string"/>
>> </simpleType>
>>
>> Define all your strings with vString as your type, and it should behave much
>> more like you expect.
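>>
>> As a minimal sketch of what that could look like for one field, assuming
>> vString is declared in the same no-namespace schema and that the assert on
>> the base type is applied to elements of the derived type:
>>
>> <xs:element name="LatitudeDegrees" dfdl:lengthKind="explicit" dfdl:length="2">
>>     <xs:simpleType>
>>         <!-- base changed from xs:string to vString; the facet is unchanged -->
>>         <xs:restriction base="vString">
>>             <xs:pattern value="[0-9]{2}"/>
>>         </xs:restriction>
>>     </xs:simpleType>
>> </xs:element>
>>
>> With the input "-/" this element would then fail its pattern facet as a
>> parse error rather than a validation warning, so the choice can backtrack.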
>>
>> Now normally I tell people not to call checkConstraints(.) on everything
>> because it fails to distinguish well-formed data from invalid data, and
>> often one wants the parse to succeed even if the data is invalid.
>>
>> In your case things are different. You have not provided enough information
>> in the DFDL properties to parse this data. The facets are necessary
>> information to successfully parse it.
>>
>> You will want to complement vString with use of discriminators. For example I
>> think your schema should have a discriminator after the latitudeDegrees
>> element because if you successfully parse that element, backtracking to the
>> nilled case no longer makes sense.
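>>
>> One possible reading of that suggestion, sketched under the assumption that a
>> single discriminator on LatitudeDegrees both checks the facet and resolves
>> the choice. As I understand the DFDL spec, an assert and a discriminator
>> cannot both apply to the same component, so this version keeps xs:string as
>> the base rather than vString:
>>
>> <xs:element name="LatitudeDegrees" dfdl:lengthKind="explicit" dfdl:length="2">
>>     <xs:annotation>
>>         <xs:appinfo source="http://www.ogf.org/dfdl/">
>>             <!-- false: parse error, so the parser backtracks to the nil branch;
>>                  true: the parser commits to the Origin branch -->
>>             <dfdl:discriminator message="Invalid value">{ dfdl:checkConstraints(.) }</dfdl:discriminator>
>>         </xs:appinfo>
>>     </xs:annotation>
>>     <xs:simpleType>
>>         <xs:restriction base="xs:string">
>>             <xs:pattern value="[0-9]{2}"/>
>>         </xs:restriction>
>>     </xs:simpleType>
>> </xs:element>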
>>
>>
>>
>>
>> On Thu, Aug 25, 2022 at 7:01 AM Roger L Costello <coste...@mitre.org> wrote:
>>
>>       Hi Folks,
>>
>>       Here are two sample inputs:
>>
>>       John Doe/2006N-05912E/Sally Smith
>>       John Doe/-/Sally Smith
>>
>>       It is the field in the middle that is of interest.
>>
>>       The field is a composite field, i.e., it consists of a series of parts:
>>       lat degrees, lat minutes, lat hemisphere, hyphen, long degrees, long
>>       minutes, long hemisphere. No separator between the parts.
>>
>>       The field is nillable and the hyphen is the nil value.
>>
>>       The first input shown above succeeds, the second fails to parse.
>>
>>       What we have here is a variable length, nillable element with a
>>       complexType and the nil value is not %ES;. As we have determined in
>>       previous posts, Daffodil does not support this. So, the workaround is to
>>       place the element in a choice, where the first branch of the choice is
>>       the element minus the nillable stuff and the second branch is a plain
>>       string element that is nillable. Well, I implemented that and Daffodil
>>       complains:
>>
>>       [error] Parse Error: Failed to parse infix separator. Cause: Parse Error: Separator '/' not found
>>
>>       When I use the -V limited parse option I get a completely different set
>>       of error messages, e.g.:
>>
>>       [error] Validation Error: LatitudeMinutes failed facet checks due to: facet pattern(s):
>>       [0-9]{2}|[0-9]{2}\.[0-9]{1}|[0-9]{2}\.[0-9]{2}|[0-9]{2}\.[0-9]{3}|[0-9]{2}\.[0-9]{4}
>>
>>       Am I doing something wrong in my DFDL schema (shown below) or is this a
>>       bug in Daffodil?  /Roger
>>
>>       <?xml version="1.0" encoding="UTF-8"?>
>>       <xs:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
>>                             xmlns:xs="http://www.w3.org/2001/XMLSchema">
>>            <xs:annotation>
>>                <xs:appinfo source="http://www.ogf.org/dfdl/">
>>                    <dfdl:format
>>                        alignment="1"
>>                        alignmentUnits="bytes"
>>                        emptyValueDelimiterPolicy="none"
>>                        encoding="ASCII"
>>                        encodingErrorPolicy="replace"
>>                        escapeSchemeRef=""
>>                        fillByte="%SP;"
>>                        floating="no"
>>                        ignoreCase="yes"
>>                        initiatedContent="no"
>>                        initiator=""
>>                        leadingSkip="0"
>>                        lengthKind="delimited"
>>                        lengthUnits="characters"
>>                        nilValueDelimiterPolicy="none"
>>                        occursCountKind="implicit"
>>                        outputNewLine="%CR;%LF;"
>>                        representation="text"
>>                        separator=""
>>                        separatorSuppressionPolicy="anyEmpty"
>>                        sequenceKind="ordered"
>>                        textBidi="no"
>>                        textPadKind="none"
>>                        textTrimKind="none"
>>                        trailingSkip="0"
>>                        truncateSpecifiedLengthString="no"
>>                        terminator=""
>>                        textNumberRep="standard"
>>                        textStandardBase="10"
>>                        textStandardZeroRep="0"
>>                        textNumberRounding="pattern"
>>                        textStandardExponentRep="E"
>>                        textNumberCheckPolicy="strict"/>
>>                </xs:appinfo>
>>            </xs:annotation>
>>            <xs:element name="Test">
>>                <xs:complexType>
>>                    <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
>>                        <xs:element name="A" type="xs:string"/>
>>                        <xs:choice dfdl:choiceLengthKind="implicit">
>>                            <xs:element name="Origin">
>>                                <xs:complexType>
>>                                    <xs:sequence dfdl:separator="">
>>                                        <xs:element name="LatitudeDegrees" dfdl:lengthKind="explicit" dfdl:length="2">
>>                                            <xs:simpleType>
>>                                                <xs:restriction base="xs:string">
>>                                                    <xs:pattern value="[0-9]{2}"/>
>>                                                </xs:restriction>
>>                                            </xs:simpleType>
>>                                        </xs:element>
>>                                        <xs:element name="LatitudeMinutes" dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=(N|S))">
>>                                            <xs:simpleType>
>>                                                <xs:restriction base="xs:string">
>>                                                    <xs:pattern value="[0-9]{2}"/>
>>                                                    <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
>>                                                    <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
>>                                                    <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
>>                                                    <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
>>                                                </xs:restriction>
>>                                            </xs:simpleType>
>>                                        </xs:element>
>>                                        <xs:element name="LatitudeHemisphere" dfdl:lengthKind="explicit" dfdl:length="1">
>>                                            <xs:simpleType>
>>                                                <xs:restriction base="xs:string">
>>                                                    <xs:enumeration value="N"/>
>>                                                    <xs:enumeration value="S"/>
>>                                                </xs:restriction>
>>                                            </xs:simpleType>
>>                                        </xs:element>
>>                                        <xs:element name="Hyphen" dfdl:lengthKind="explicit" dfdl:length="1">
>>                                            <xs:simpleType>
>>                                                <xs:restriction base="xs:string">
>>                                                    <xs:enumeration value="-"/>
>>                                                </xs:restriction>
>>                                            </xs:simpleType>
>>                                        </xs:element>
>>                                        <xs:element name="LongitudeDegrees" dfdl:lengthKind="explicit" dfdl:length="3">
>>                                            <xs:simpleType>
>>                                                <xs:restriction base="xs:string">
>>                                                    <xs:pattern value="[0-9]{3}"/>
>>                                                </xs:restriction>
>>                                            </xs:simpleType>
>>                                        </xs:element>
>>                                        <xs:element name="LongitudeMinutes" dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=(E|W))">
>>                                            <xs:simpleType>
>>                                                <xs:restriction base="xs:string">
>>                                                    <xs:pattern value="[0-9]{2}"/>
>>                                                    <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
>>                                                    <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
>>                                                    <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
>>                                                    <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
>>                                                </xs:restriction>
>>                                            </xs:simpleType>
>>                                        </xs:element>
>>                                        <xs:element name="LongitudeHemisphere">
>>                                            <xs:simpleType>
>>                                                <xs:restriction base="xs:string">
>>                                                    <xs:enumeration value="E"/>
>>                                                    <xs:enumeration value="W"/>
>>                                                </xs:restriction>
>>                                            </xs:simpleType>
>>                                        </xs:element>
>>                                    </xs:sequence>
>>                                </xs:complexType>
>>                            </xs:element>
>>                            <xs:element name="Origin_" type="xs:string" nillable="true" dfdl:nilKind="literalValue" dfdl:nilValue="-"/>
>>                        </xs:choice>
>>                        <xs:element name="B" type="xs:string"/>
>>                    </xs:sequence>
>>                </xs:complexType>
>>            </xs:element>
>>       </xs:schema>
>>
