Ah yeah, you're right. If Origin_ comes first, you also need an assert that Origin_ is nilled, so that anything other than the nil value causes Daffodil to backtrack and try the other branch, e.g.:

  <xs:element name="Origin_" ... >
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:assert>{ fn:nilled(.) }</dfdl:assert>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>

Note that this is probably a good idea to add regardless of order. Otherwise, if there is a parse error in the Origin element (which would happen on invalid data with Mike's changes), Origin_ would accept any string, even though it is intended to parse only the nil value.
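
For context, here is a minimal sketch of how the whole choice might look with Origin_ first and the assert in place. The Origin branch is abbreviated to a comment and stands for the complex element from the original schema. I have not run this exact fragment, so treat it as an illustration rather than a tested schema.

```xml
<xs:choice dfdl:choiceLengthKind="implicit">
  <!-- Nil branch first: it should only succeed when the field is exactly "-" -->
  <xs:element name="Origin_" type="xs:string" nillable="true"
              dfdl:nilKind="literalValue" dfdl:nilValue="-">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <!-- A non-nil string fails this assert, so the parser backtracks
             and tries the next branch of the choice -->
        <dfdl:assert>{ fn:nilled(.) }</dfdl:assert>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
  <!-- Second branch: the complex Origin element -->
  <xs:element name="Origin">
    <!-- complexType exactly as in the original schema -->
  </xs:element>
</xs:choice>
```

With the input 2006N-05912E the delimited string is not the nil value, so the assert fails and the Origin branch gets its turn. With the input - the element is nilled and the choice is done.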



On 8/25/22 7:47 AM, Roger L Costello wrote:
Thanks Mike and Steve!

I need to study carefully what Mike said.

Steve, I think your suggestion is not correct. I did as you suggested and 
reversed the order of the branches in the choice. Now, with this input:

John Doe/2006N-05912E/Sally Smith

I get this XML:

<Test>
   <A>John Doe</A>
   <Origin_>2006N-05912E</Origin_>
   <B>Sally Smith</B>
</Test>

which is not correct.

I conclude that switching the order of the branches in the choice is not 
correct. Do you concur?

/Roger

-----Original Message-----
From: Steve Lawrence <slawre...@apache.org>
Sent: Thursday, August 25, 2022 7:38 AM
To: users@daffodil.apache.org
Subject: [EXT] Re: Daffodil does not correctly parse variable length, nillable 
elements with complexType

Another option: put your nillable element as the first branch in the
choice. This way Daffodil will attempt to parse the nillable element
first, and will only attempt to parse the complex Origin if that fails.

You'll still likely want the validation that Mike suggests so that
when something fails it fails immediately instead of happily continuing
off the rails.


On 8/25/22 7:30 AM, Mike Beckerle wrote:
I think I know what is happening.

In the battle of delimiters vs. nested explicit length, explicit wins.

So if you have abc/-/cef

but after parsing abc then finding the separator /, the next field is
latitudeDegrees with explicit length 2, that "wins" and "-/" are the characters
of that string.

Validation will then issue a validation warning because Daffodil's "limited"
validation is done as the elements are parsed.

This does not cause backtracking, it's just a "warning" that the seemingly
well-formed data is invalid.

Then latitudeMinutes is parsed, and that uses the ever problematic lengthKind
pattern, which succeeds, with a zero-length string, which then also causes a
validation error.

This is again a validation error, because this now zero-length string
doesn't look like the digits you expect.

Then it parses the hyphen element, which is just a string of length 1...

I'll stop here because things are clearly off the rails.

Here's my suggestion for how to fix this and get Daffodil to magically do what
you want, which is to pay attention to the facets.

<!-- vString = 'validated string'. Facets are checked while parsing. -->
<simpleType name="vString">
  <annotation><appinfo source="http://www.ogf.org/dfdl/">
    <dfdl:assert message="Invalid value">{ dfdl:checkConstraints(.) }</dfdl:assert>
  </appinfo></annotation>
  <restriction base="xs:string"/>
</simpleType>

Define all your strings with vString as your type, and it should behave much
more like you expect.
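
As a rough sketch of what that could look like for one field, the element below derives its type from vString by restriction, so the facets stay on the element's type and the assert that comes with vString checks them during the parse. This assumes vString is declared in the same schema, that the schema has no target namespace (as in the one posted), and that the assert on the base type is combined into the derived type. I have not tested this fragment.

```xml
<xs:element name="LatitudeHemisphere"
            dfdl:lengthKind="explicit" dfdl:length="1">
  <xs:simpleType>
    <!-- Restricting vString keeps the dfdl:assert and adds the facets it checks -->
    <xs:restriction base="vString">
      <xs:enumeration value="N"/>
      <xs:enumeration value="S"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

The intent is that a character other than N or S in that position becomes a parse error, which can trigger backtracking, rather than only a validation warning.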

Now normally I tell people not to call checkConstraints(.) on everything because
it fails to distinguish well-formed data from invalid data, and often one wants
the parse to succeed even if the data is invalid.

In your case things are different. You have not provided enough information in
the DFDL properties to parse this data. The facets are necessary information to
successfully parse it.

You will want to complement vString with use of discriminators. For example I
think your schema should have a discriminator after the latitudeDegrees element
because if you successfully parse that element, backtracking to the nilled case
no longer makes sense.
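
A sketch of that discriminator, under some assumptions: it sits directly on LatitudeDegrees, it reuses dfdl:checkConstraints(.) as its test, and the message text is made up for illustration. As far as I recall, DFDL does not allow an assert and a discriminator on the same component, so this element keeps its inline restriction of xs:string rather than using vString. Untested.

```xml
<xs:element name="LatitudeDegrees"
            dfdl:lengthKind="explicit" dfdl:length="2">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- True: the choice is resolved to the Origin branch, so later
           failures no longer backtrack to the nilled case.
           False: processing error, and the next branch is tried. -->
      <dfdl:discriminator message="LatitudeDegrees is not two digits">{ dfdl:checkConstraints(.) }</dfdl:discriminator>
    </xs:appinfo>
  </xs:annotation>
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{2}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

For abc/-/cef the two characters "-/" fail the pattern, the discriminator is false, and the parser moves on to the nillable branch. For data starting with two digits the discriminator is true, and any later problem in Origin is reported as an error in Origin instead of being retried as a nil.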




On Thu, Aug 25, 2022 at 7:01 AM Roger L Costello <coste...@mitre.org> wrote:

      Hi Folks,

      Here are two sample inputs:

      John Doe/2006N-05912E/Sally Smith
      John Doe/-/Sally Smith

      It is the field in the middle that is of interest.

      The field is a composite field, i.e., it consists of a series of parts:
      lat degrees, lat minutes, lat hemisphere, hyphen, long degrees, long
      minutes, long hemisphere. No separator between the parts.

      The field is nillable and the hyphen is the nil value.

      The first input shown above succeeds, the second fails to parse.

      What we have here is a variable length, nillable element with a
      complexType and the nil value is not %ES;. As we have determined in previous posts,
      Daffodil does not support this. So, the workaround is to place the element
      in a choice, where the first branch of the choice is the element minus the
      nillable stuff and the second branch is a plain string element that is
      nillable. Well, I implemented that and Daffodil complains:

      [error] Parse Error: Failed to parse infix separator. Cause: Parse Error:
      Separator '/' not found

      When I use the -V limited parse option I get a completely different set of
      error messages, e.g.:

      [error] Validation Error: LatitudeMinutes failed facet checks due to:
      facet pattern(s):
      [0-9]{2}|[0-9]{2}\.[0-9]{1}|[0-9]{2}\.[0-9]{2}|[0-9]{2}\.[0-9]{3}|[0-9]{2}\.[0-9]{4}

      Am I doing something wrong in my DFDL schema (shown below) or is this a
      bug in Daffodil?  /Roger

      <?xml version="1.0" encoding="UTF-8"?>
      <xs:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
                            xmlns:xs="http://www.w3.org/2001/XMLSchema">
           <xs:annotation>
               <xs:appinfo source="http://www.ogf.org/dfdl/">
                   <dfdl:format
                       alignment="1"
                       alignmentUnits="bytes"
                       emptyValueDelimiterPolicy="none"
                       encoding="ASCII"
                       encodingErrorPolicy="replace"
                       escapeSchemeRef=""
                       fillByte="%SP;"
                       floating="no"
                       ignoreCase="yes"
                       initiatedContent="no"
                       initiator=""
                       leadingSkip="0"
                       lengthKind="delimited"
                       lengthUnits="characters"
                       nilValueDelimiterPolicy="none"
                       occursCountKind="implicit"
                       outputNewLine="%CR;%LF;"
                       representation="text"
                       separator=""
                       separatorSuppressionPolicy="anyEmpty"
                       sequenceKind="ordered"
                       textBidi="no"
                       textPadKind="none"
                       textTrimKind="none"
                       trailingSkip="0"
                       truncateSpecifiedLengthString="no"
                       terminator=""
                       textNumberRep="standard"
                       textStandardBase="10"
                       textStandardZeroRep="0"
                       textNumberRounding="pattern"
                       textStandardExponentRep="E"
                       textNumberCheckPolicy="strict"/>
               </xs:appinfo>
           </xs:annotation>
           <xs:element name="Test">
               <xs:complexType>
                   <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
                       <xs:element name="A" type="xs:string"/>
                       <xs:choice dfdl:choiceLengthKind="implicit">
                           <xs:element name="Origin">
                               <xs:complexType>
                                   <xs:sequence dfdl:separator="">
                                       <xs:element name="LatitudeDegrees" dfdl:lengthKind="explicit" dfdl:length="2">
                                           <xs:simpleType>
                                               <xs:restriction base="xs:string">
                                                   <xs:pattern value="[0-9]{2}"/>
                                               </xs:restriction>
                                           </xs:simpleType>
                                       </xs:element>
                                       <xs:element name="LatitudeMinutes" dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=(N|S))">
                                           <xs:simpleType>
                                               <xs:restriction base="xs:string">
                                                   <xs:pattern value="[0-9]{2}"/>
                                                   <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
                                                   <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
                                                   <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
                                                   <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
                                               </xs:restriction>
                                           </xs:simpleType>
                                       </xs:element>
                                       <xs:element name="LatitudeHemisphere" dfdl:lengthKind="explicit" dfdl:length="1">
                                           <xs:simpleType>
                                               <xs:restriction base="xs:string">
                                                   <xs:enumeration value="N"/>
                                                   <xs:enumeration value="S"/>
                                               </xs:restriction>
                                           </xs:simpleType>
                                       </xs:element>
                                       <xs:element name="Hyphen" dfdl:lengthKind="explicit" dfdl:length="1">
                                           <xs:simpleType>
                                               <xs:restriction base="xs:string">
                                                   <xs:enumeration value="-"/>
                                               </xs:restriction>
                                           </xs:simpleType>
                                       </xs:element>
                                       <xs:element name="LongitudeDegrees" dfdl:lengthKind="explicit" dfdl:length="3">
                                           <xs:simpleType>
                                               <xs:restriction base="xs:string">
                                                   <xs:pattern value="[0-9]{3}"/>
                                               </xs:restriction>
                                           </xs:simpleType>
                                       </xs:element>
                                       <xs:element name="LongitudeMinutes" dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=(E|W))">
                                           <xs:simpleType>
                                               <xs:restriction base="xs:string">
                                                   <xs:pattern value="[0-9]{2}"/>
                                                   <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
                                                   <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
                                                   <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
                                                   <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
                                               </xs:restriction>
                                           </xs:simpleType>
                                       </xs:element>
                                       <xs:element name="LongitudeHemisphere">
                                           <xs:simpleType>
                                               <xs:restriction base="xs:string">
                                                   <xs:enumeration value="E"/>
                                                   <xs:enumeration value="W"/>
                                               </xs:restriction>
                                           </xs:simpleType>
                                       </xs:element>
                                   </xs:sequence>
                               </xs:complexType>
                           </xs:element>
                           <xs:element name="Origin_" type="xs:string" nillable="true" dfdl:nilKind="literalValue" dfdl:nilValue="-"/>
                       </xs:choice>
                       <xs:element name="B" type="xs:string"/>
                   </xs:sequence>
               </xs:complexType>
           </xs:element>
      </xs:schema>

