Well, your first field is fixed length 2, so the boundary there is not a
problem.

The second field ends where the E/W appears. I would do this with lengthKind
'pattern' and a lookahead: ".*?(?=(E|W))". Note that your digits play no
role in this; they stay in the pattern facet.

This regex looks for (but does not include) the start of what comes next.
In my experience this is by far the most common idiom for lengthKind
'pattern'. The length pattern does not care what the value looks like at
all; it only uses lookahead to find where the value must end.
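
As a rough, untested sketch of that idiom for your second field (assuming
the xs: and dfdl: prefixes and the default format from your schema, and
keeping your original pattern facets unchanged):

```xml
<!-- Sketch only: the length pattern decides where the field ends;
     the pattern facets still validate what the field contains. -->
<xs:element name="LatitudeMinutes"
            dfdl:lengthKind="pattern"
            dfdl:lengthPattern=".*?(?=(E|W))">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{2}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>
```

The lazy .*? stops at the first E or W, and because the lookahead consumes
nothing, that character is left in the data for the next element. The
facets can be listed in any order, since they are not used to find the
length.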

The third field is followed by a hyphen, so that's a terminator.

The fourth is fixed length 3.

The fifth is delimited: it ends at the slash that follows the sequence.
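
Putting the five fields together, a sketch of the whole sequence might look
like this. It is untested and rests on some assumptions: the element names
come from your list, the facets are left out for brevity, the hyphen is
consumed as the terminator of Hemisphere rather than modeled as its own
element, and everything not shown is inherited from your default format
(lengthKind="delimited", lengthUnits="characters").

```xml
<xs:element name="Origin">
    <xs:complexType>
        <xs:sequence dfdl:separator="">
            <!-- 1: fixed length 2 -->
            <xs:element name="LatitudeDegrees" type="xs:string"
                        dfdl:lengthKind="explicit" dfdl:length="2"/>
            <!-- 2: ends just before the E or W that follows -->
            <xs:element name="LatitudeMinutes" type="xs:string"
                        dfdl:lengthKind="pattern"
                        dfdl:lengthPattern=".*?(?=(E|W))"/>
            <!-- 3: delimited, terminated by the hyphen -->
            <xs:element name="Hemisphere" type="xs:string"
                        dfdl:terminator="-"/>
            <!-- 4: fixed length 3 -->
            <xs:element name="LongitudeDegrees" type="xs:string"
                        dfdl:lengthKind="explicit" dfdl:length="3"/>
            <!-- 5: delimited, ends at the enclosing "/" separator -->
            <xs:element name="LongitudeMinutes" type="xs:string"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>
```

Only the second field needs a length pattern; the others get their
boundaries from a fixed length or from a delimiter that is already in the
data.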

On Thu, Aug 18, 2022 at 12:55 PM Roger L Costello <coste...@mitre.org>
wrote:

>
>    - Is there a delimiter after the longitude like the "/"?
>
>
>
> No. Here’s the actual sequence of elements along with their pattern facet
> regex or enumeration values (no separator between them):
>
>
>
> LatitudeDegrees    [0-9]{2}
> LatitudeMinutes    [0-9]{2}
>                    [0-9]{2}\.[0-9]{1}
>                    [0-9]{2}\.[0-9]{2}
>                    [0-9]{2}\.[0-9]{3}
>                    [0-9]{2}\.[0-9]{4}
> Hemisphere         E
>                    W
> Hyphen             -
> LongitudeDegrees   [0-9]{3}
> LongitudeMinutes   [0-9]{2}
>                    [0-9]{2}\.[0-9]{1}
>                    [0-9]{2}\.[0-9]{2}
>                    [0-9]{2}\.[0-9]{3}
>                    [0-9]{2}\.[0-9]{4}
>
>
>
> Following that sequence is a slash delimiter.
>
>
>
> This discussion has helped me to clarify the modeling problem:
>
>
>
> How to create DFDL for a data field consisting of a series of parts (with
> no separator between the parts) and the parts may or may not be of variable
> length (the parts are non-nillable)?
>
>
>
> How do you answer that?
>
>
>
> /Roger
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Thursday, August 18, 2022 12:43 PM
> *To:* users@daffodil.apache.org
> *Subject:* [EXT] Re: Bug in Daffodil
>
>
>
> Is there a delimiter after the longitude like the "/"?
>
> If so then it is only the latitude field that is fixed length.
>
>
>
> On Thu, Aug 18, 2022 at 12:40 PM Roger L Costello <coste...@mitre.org>
> wrote:
>
> > You'll need to use lengthKind="pattern" in this case.
>
> Ugh!
>
> I thought, with the use of -V limited, I had finally gotten rid of
> lengthKind="pattern". Now, with what you're telling me, I find myself back
> to the old tedious, error-prone task of ordering regexes
> longest-to-shortest. In fact, writing a program that examines an arbitrary
> set of pattern facet regexes to order them longest-to-shortest is going to
> be extremely difficult or even impossible. This is horrible!
>
> Is there no other solution?
>
> /Roger
>
> -----Original Message-----
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Thursday, August 18, 2022 12:24 PM
> To: users@daffodil.apache.org
> Subject: [EXT] Re: Bug in Daffodil
>
> You'll need to use lengthKind="pattern" in this case. You could combine
> your pattern restrictions in to a big regex of alternatives, or you
> could do something a little less verbose like this:
>
> <xs:element name="LatitudeMinutes" dfdl:lengthKind="pattern"
> dfdl:lengthPattern="[0-9]{2}(\.[0-9]{1,4})?" />
>
> Matches the same thing, but is a bit more compact. The same pattern
> could be used for the restriction.
>
> Alternatively, if you wanted to differentiate between well-formed/valid
> (i.e. different length pattern than restriction pattern), you could even
> do something like this:
>
> <xs:element name="LatitudeMinutes" dfdl:lengthKind="pattern"
> dfdl:lengthPattern="[0-9]+(\.[0-9]+)?" />
>
> So parsing would accept any decimal number with optional decimal digits,
> and then validation could restrict this to the appropriate number of
> digits using the existing facets. Note that treating it as an xs:decimal
> instead of xs:string might give you more control (e.g. value must be >=
> 0 and < 60).
>
> The Hemisphere element would have an explicit length of 1, e.g.
>
> <xs:element name="Hemisphere" dfdl:lengthKind="explicit" dfdl:length="1">
>
> On 8/18/22 12:03 PM, Roger L Costello wrote:
> > Thanks Steve. Unfortunately, specifying an explicit length on each
> > element is not going to work. The second element - LatitudeMinutes - can
> > actually be 2, 4, 5, 6, or 7 characters in length:
> >
> > <xs:element name="LatitudeMinutes">
> >      <xs:simpleType>
> >          <xs:restriction base="xs:string">
> >              <xs:pattern value="[0-9]{2}"/>
> >              <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
> >              <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
> >              <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
> >              <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
> >          </xs:restriction>
> >      </xs:simpleType>
> > </xs:element>
> >
> > And after it are more elements. For example, following it is this element
> >
> > <xs:element name="Hemisphere">
> >      <xs:simpleType>
> >          <xs:restriction base="xs:string">
> >              <xs:enumeration value="N"/>
> >              <xs:enumeration value="S"/>
> >          </xs:restriction>
> >      </xs:simpleType>
> > </xs:element>
> >
> > How to handle this situation?
> >
> > /Roger
> >
> > -----Original Message-----
> > From: Steve Lawrence <slawre...@apache.org>
> > Sent: Thursday, August 18, 2022 11:54 AM
> > To: users@daffodil.apache.org
> > Subject: [EXT] Re: Bug in Daffodil
> >
> > You haven't specified a length of the LatitudeDegrees (or
> > LatitudeMinutes). So the lengthKind is just delimited and so will end up
> > delimited by the nearest enclosing delimiter, which is the /. So
> > LatitudeDegrees is parsed as "2006", and things go off the rails.
> >
> > Instead, you want your LatitudeDegrees/Minutes elements to have
> > lengthKind="explicit" with length="2", e.g.:
> >
> > <xs:element name="Origin">
> >       <xs:complexType>
> >           <xs:sequence>
> >               <xs:element name="LatitudeDegrees"
> > dfdl:lengthKind="explicit" dfdl:length="2">
> >                   <xs:simpleType>
> >                       <xs:restriction base="xs:string">
> >                           <xs:pattern value="[0-9]{2}"/>
> >                       </xs:restriction>
> >                   </xs:simpleType>
> >               </xs:element>
> >               <xs:element name="LatitudeMinutes"
> > dfdl:lengthKind="explicit" dfdl:length="2">
> >                   <xs:simpleType>
> >                       <xs:restriction base="xs:string">
> >                           <xs:pattern value="[0-9]{2}"/>
> >                       </xs:restriction>
> >                   </xs:simpleType>
> >               </xs:element>
> >           </xs:sequence>
> >       </xs:complexType>
> > </xs:element>
> >
> >
> >
> > On 8/18/22 11:08 AM, Roger L Costello wrote:
> >> Hi Folks,
> >>
> >> Daffodil is unable to parse DFDL schemas containing two consecutive
> >> element declarations, each with a simpleType which has a facet.
> >>
> >> With this input:
> >>
> >> John Doe/2006/Sally Smith
> >>
> >> The part of interest is the middle part - 2006 - which consists of two
> >> subparts: 20 (LatitudeDegrees) and 06 (LatitudeMinutes). Each subpart is
> >> constrained via XSD facets.
> >>
> >> I get this error message when I parse using Daffodil version 3.2.1
> >> (using the -V limited option):
> >>
> >> [error] Validation Error: LatitudeMinutes failed facet checks due to:
> >> facet enumeration(s): 06
> >>
> >> Below is my DFDL schema.
> >>
> >> I believe this is a bug, yes? Is there a workaround?
> >>
> >> <xs:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
> >>            xmlns:xs="http://www.w3.org/2001/XMLSchema">
> >>       <xs:annotation xmlns:f="function"
> >>                      xmlns:fn="http://www.w3.org/2005/xpath-functions"
> >>                      xmlns:regex="regex-functions">
> >>           <xs:appinfo source="http://www.ogf.org/dfdl/">
> >>               <dfdl:format alignment="1"
> >>                   alignmentUnits="bytes"
> >>                   emptyValueDelimiterPolicy="none"
> >>                   encoding="ASCII"
> >>                   encodingErrorPolicy="replace"
> >>                   escapeSchemeRef=""
> >>                   fillByte="%SP;"
> >>                   floating="no"
> >>                   ignoreCase="yes"
> >>                   initiatedContent="no"
> >>                   initiator=""
> >>                   leadingSkip="0"
> >>                   lengthKind="delimited"
> >>                   lengthUnits="characters"
> >>                   nilValueDelimiterPolicy="none"
> >>                   occursCountKind="implicit"
> >>                   outputNewLine="%CR;%LF;"
> >>                   representation="text"
> >>                   separator=""
> >>                   separatorSuppressionPolicy="anyEmpty"
> >>                   sequenceKind="ordered"
> >>                   textBidi="no"
> >>                   textPadKind="none"
> >>                   textTrimKind="none"
> >>                   trailingSkip="0"
> >>                   truncateSpecifiedLengthString="no"
> >>                   terminator=""
> >>                   textNumberRep="standard"
> >>                   textStandardBase="10"
> >>                   textStandardZeroRep="0"
> >>                   textNumberRounding="pattern"
> >>                   textStandardExponentRep="E"
> >>                   textNumberCheckPolicy="strict"/>
> >>           </xs:appinfo>
> >>       </xs:annotation>
> >>       <xs:element name="Test">
> >>           <xs:complexType>
> >>               <xs:sequence dfdl:separator="/"
> >>                   dfdl:separatorPosition="infix">
> >>                   <xs:element name="A" type="xs:string" />
> >>                   <xs:element name="Origin">
> >>                       <xs:complexType>
> >>                           <xs:sequence dfdl:separator="">
> >>                               <xs:element name="LatitudeDegrees">
> >>                                   <xs:simpleType>
> >>                                       <xs:restriction base="xs:string">
> >>                                           <xs:pattern value="[0-9]{2}"/>
> >>                                       </xs:restriction>
> >>                                   </xs:simpleType>
> >>                               </xs:element>
> >>                               <xs:element name="LatitudeMinutes">
> >>                                   <xs:simpleType>
> >>                                       <xs:restriction base="xs:string">
> >>                                           <!--<xs:pattern value="[0-9]{2}"/>-->  <!-- This also fails -->
> >>                                           <xs:enumeration value="06" />
> >>                                       </xs:restriction>
> >>                                   </xs:simpleType>
> >>                               </xs:element>
> >>                           </xs:sequence>
> >>                       </xs:complexType>
> >>                   </xs:element>
> >>                   <xs:element name="B" type="xs:string" />
> >>               </xs:sequence>
> >>           </xs:complexType>
> >>       </xs:element>
> >> </xs:schema>
> >>
> >>
>
>
