Re: optional int and unparse formatting

Theodore Toth Thu, 07 Oct 2021 03:35:06 -0700

I'm still struggling with optional subelements at the end of an
element, this time for a complex type. The approach that worked for a
simpleType doesn't work for a complex type; I'm getting "[error]
Parse Error: Terminator '%NL;%WSP*;' not found". I'm not sure yet, but
a newline might not be a valid terminator for an OTH-GOLD message line
:(
Also, how would you specify an optional literal like '//' at the end of
an element when there can be other optional subelements separated by '/'
prior to it?


On Sat, Sep 18, 2021 at 4:12 AM Beckerle, Mike
<[email protected]> wrote:
>
> Sorry for the late response on this. Turns out Outlook 365 was spam filtering
> some Apache emails. It's a known issue with Microsoft's spam filters.
>
> The sequence wrapped around elem5 doesn't need a dfdl:separator because the 
> elem5 has maxOccurs 1, so there will never be enough things to separate.
>
> Otherwise yeah, this looks like what I was suggesting.
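
A minimal, untested sketch of that change, assuming everything else in the schema quoted below stays as it is: the inner sequence keeps its dfdl:terminator, which is what absorbs the final '/', and simply drops dfdl:separator. The '/' in front of elem5 still comes from the enclosing sequence's separator. dfdl:separatorSuppressionPolicy is carried over unchanged from the original.

```xml
<!-- Inner sequence around elem5 with dfdl:separator removed.
     With maxOccurs="1" there is never a second item to separate. -->
<xs:sequence dfdl:terminator="/"
             dfdl:separatorSuppressionPolicy="anyEmpty">
  <xs:element name="elem5" minOccurs="0" maxOccurs="1"
              dfdl:textNumberPattern="000">
    <xs:simpleType>
      <xs:restriction base="xs:int">
        <xs:minInclusive value="1"/>
        <xs:maxInclusive value="999"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:sequence>
```
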
>
> I agree that the DFDL spec is quite painful in numerous areas. Unfortunately 
> I have to take the blame for some of that. Someday I hope some sections will 
> get refactored and rewritten.
>
>
> ________________________________
> From: Theodore Toth <[email protected]>
> Sent: Tuesday, August 31, 2021 12:21 AM
> To: [email protected] <[email protected]>
> Subject: Re: optional int and unparse formatting
>
> The following worked for me although I don't know if it's the 'right'
> way to do it. Reading the spec can give you a headache.
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema
>     xmlns:xs="http://www.w3.org/2001/XMLSchema"
>     xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
>
>   <xs:include schemaLocation="default-dfdl-properties/defaults.dfdl.xsd" />
>   <xs:annotation>
>     <xs:appinfo source="http://www.ogf.org/dfdl/">
>       <dfdl:format ref="default-dfdl-properties" />
>     </xs:appinfo>
>   </xs:annotation>
>
>   <xs:element name="FOO"
>               dfdl:initiator="FOO/"
>               dfdl:lengthKind="implicit"
>               dfdl:terminator="%NL;%WSP*;">
>
>     <xs:complexType>
>       <xs:sequence dfdl:sequenceKind="ordered"
>                    dfdl:separator="/"
>                    dfdl:separatorPosition="infix">
>
>         <xs:element name="elem1">
>           <xs:simpleType>
>             <xs:restriction base="xs:string">
>               <xs:minLength value="1"/>
>               <xs:maxLength value="14"/>
>               <xs:pattern value="[A-Z0-9,:%#*\- ]+"/>
>             </xs:restriction>
>           </xs:simpleType>
>         </xs:element>
>
>         <xs:element name="elem2">
>           <xs:simpleType>
>             <xs:restriction base="xs:string">
>               <xs:pattern value="CAT|DOG|HORSE"/>
>             </xs:restriction>
>           </xs:simpleType>
>         </xs:element>
>
>         <xs:element name="elem3" dfdl:textNumberPattern="#0000">
>           <xs:simpleType>
>             <xs:restriction base="xs:int">
>               <xs:minInclusive value="1"/>
>               <xs:maxInclusive value="99999"/>
>             </xs:restriction>
>           </xs:simpleType>
>         </xs:element>
>
>         <xs:element name="elem4" minOccurs="0" maxOccurs="1">
>           <xs:simpleType>
>             <xs:restriction base="xs:string">
>               <xs:minLength value="1"/>
>               <xs:maxLength value="20"/>
>             </xs:restriction>
>           </xs:simpleType>
>         </xs:element>
>
>         <xs:sequence dfdl:separator="/" dfdl:terminator="/"
>                      dfdl:separatorSuppressionPolicy="anyEmpty">
>           <xs:element name="elem5" minOccurs="0" maxOccurs="1"
>                       dfdl:textNumberPattern="000">
>             <xs:simpleType>
>               <xs:restriction base="xs:int">
>                 <xs:minInclusive value="1"/>
>                 <xs:maxInclusive value="999"/>
>               </xs:restriction>
>             </xs:simpleType>
>           </xs:element>
>         </xs:sequence>
>
>       </xs:sequence>
>     </xs:complexType>
>   </xs:element>
>
> </xs:schema>
>
> On Tue, Aug 31, 2021 at 9:31 AM Theodore Toth
> <[email protected]> wrote:
> >
> > Thanks for the response.
> >
> > On Tue, Aug 31, 2021 at 12:49 AM Beckerle, Mike
> > <[email protected]> wrote:
> > >
> > > Good question.
> > >
> > > I think what is happening is this. elem5 fails to parse because it is an 
> > > empty string, but then the parse backtracks, and here's the trick: that 
> > > means it is putting back the separator before this array/optional 
> > > element. Then your schema has nothing to absorb the final separator.
> > >
> > > Your schema has expressed an optional element, but what you want is a 
> > > required separator, then an optional element after it.
> > >
> > > I think wrapping an xs:sequence around elem5 will fix this.
> >
> > So the required separator goes on the sequence?
> >
> > >
> > > To be sure, I need to see the occursCountKind property, lengthKind 
> > > property, etc. Basically I need to be able to reproduce your run.
> > > I would need your default-dfdl-properties/defaults.dfdl.xsd file.
> > >
> > Here's my defaults that I pulled from the DFDL-part1 presentation:
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> >
> > <schema xmlns="http://www.w3.org/2001/XMLSchema"
> >         xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
> >         xmlns:xs="http://www.w3.org/2001/XMLSchema">
> >
> >   <xs:annotation>
> >     <xs:appinfo source="http://www.ogf.org/dfdl/">
> >       <dfdl:defineFormat name="default-dfdl-properties">
> >         <dfdl:format
> >             alignment="1"
> >             alignmentUnits="bytes"
> >             binaryFloatRep="ieee"
> >             binaryNumberRep="binary"
> >             bitOrder="mostSignificantBitFirst"
> >             byteOrder="bigEndian"
> >             calendarPatternKind="implicit"
> >             documentFinalTerminatorCanBeMissing="yes"
> >             emptyValueDelimiterPolicy="none"
> >             encoding="ISO-8859-1"
> >             encodingErrorPolicy="replace"
> >             escapeSchemeRef=""
> >             fillByte="f"
> >             floating="no"
> >             ignoreCase="no"
> >             initiator=""
> >             initiatedContent="no"
> >             leadingSkip="0"
> >             lengthKind="delimited"
> >             lengthUnits="characters"
> >             nilKind="literalValue"
> >             nilValueDelimiterPolicy="none"
> >             occursCountKind="implicit"
> >             outputNewLine="%CR;%LF;"
> >             representation="text"
> >             separator=""
> >             separatorPosition="infix"
> >             separatorSuppressionPolicy="never"
> >             sequenceKind="ordered"
> >             terminator=""
> >             textBidi="no"
> >             textNumberCheckPolicy="strict"
> >             textNumberPattern="#,##0.###;-#,##0.###"
> >             textNumberRep="standard"
> >             textNumberRounding="explicit"
> >             textNumberRoundingIncrement="0"
> >             textNumberRoundingMode="roundUnnecessary"
> >             textOutputMinLength="0"
> >             textPadKind="none"
> >             textStandardBase="10"
> >             textStandardExponentRep="E"
> >             textStandardInfinityRep="Inf"
> >             textStandardNaNRep="NaN"
> >             textStandardZeroRep="0"
> >             textStandardDecimalSeparator="."
> >             textStandardGroupingSeparator=","
> >             textTrimKind="none"
> >             trailingSkip="0"
> >             truncateSpecifiedLengthString="no"
> >             utf16Width="fixed"/>
> >       </dfdl:defineFormat>
> >     </xs:appinfo>
> >   </xs:annotation>
> > </schema>
> >
> >
> > > W.r.t. your 0001 issue...
> > >
> > > The ability to control text number formats like leading zeros, is by way 
> > > of the dfdl:textNumberPattern property. I think you want different values 
> > > for this property for your two integer-type elements if they are supposed 
> > > to have different numbers of digits, as evidenced by their max values of 
> > > 999 and 99999.
> > >
> > > However, your request that 0001 be preserved is not consistent with
> > > either 999 or 99999 as max values. So I'm not sure what you are trying
> > > to achieve in this format.
> >
> > Just trying to teach an old dog some new tricks.
> >
> > >
> > > DFDL does not "remember how the integer was presented". It parses it 
> > > according to rules, creates an xs:int in the infoset, and at that point 
> > > the leading zero information is gone. It then unparses according to 
> > > rules. If you want 0001 to parse and unparse as 0001, you want 
> > > dfdl:textNumberPattern="#0000". That will give you at least 4 digits, with a
> > > fifth if needed, and never fewer than 4.
> > >
> > > But in this case, if you are first parsing, then unparsing data, then 
> > > incoming "01" will also unparse as "0001". Using 
> > > dfdl:textNumberPattern="#0000" means "canonical form for this data is at 
> > > least 4 digits". If you parse the data using dfdl:lengthKind='delimited', 
> > > then your schema has expressed "tolerate any number of digits, but always 
> > > canonicalize to at least 4 digits".
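
As a rough illustration of that description (a sketch, not tested against Daffodil), here is elem3 with the pattern applied and the round trips described above noted as comments. dfdl:lengthKind is written out on the element only for clarity; in this thread it is inherited from the default format.

```xml
<!-- Data on input -> xs:int in the infoset -> data on unparse, per the
     explanation above:
       0001 -> 1 -> 0001
       01   -> 1 -> 0001   (leading zeros are not remembered) -->
<xs:element name="elem3"
            dfdl:lengthKind="delimited"
            dfdl:textNumberPattern="#0000">
  <xs:simpleType>
    <xs:restriction base="xs:int">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="99999"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```
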
> >
> > I'll play with this.
> >
> > >
> > > If you want the text of these numbers preserved, not canonicalized, and 
> > > your application does both parse and unparse, like data security apps 
> > > often do, then you need to use strings, not numbers.
> >
> > If I were to use strings how would I then validate that the value was
> > in some range?
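
One possible approach, offered as an untested sketch rather than something confirmed in this thread: keep the element as xs:string and express the range lexically with facets. Here maxLength plus a digits-only pattern that requires at least one non-zero digit stands in for the range 1 to 99999 while leaving leading zeros intact. Facets are validation constraints, so this assumes validation is enabled when parsing.

```xml
<xs:element name="elem3">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <!-- At most 5 characters, so the numeric value cannot exceed 99999 -->
      <xs:maxLength value="5"/>
      <!-- Digits only, with at least one non-zero digit, so "0000" is rejected
           while "0001" is accepted as written -->
      <xs:pattern value="0*[1-9][0-9]*"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

Ranges whose bounds do not line up with a digit count would need a more elaborate pattern.
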
> >
> > >
> > > Note, however, that preserving leading/trailing non-numerically 
> > > significant zeros is a security hole - they can be used to carry covert 
> > > channel data.
> > > Canonicalization of data is fundamentally more secure.
> > >
> > > The usual reason people want preservation of data exactly, character for 
> > > character, is to make test/QA easier. That's ok so long as you get that 
> > > there is a loss of some data security when non-information-carrying 
> > > things like leading/trailing zeros are preserved.
> > >
> > >
> > >
> > > ________________________________
> > > From: Theodore Toth <[email protected]>
> > > Sent: Sunday, August 29, 2021 2:45 AM
> > > To: [email protected] <[email protected]>
> > > Subject: optional int and unparse formatting
> > >
> > > I just started looking at daffodil and have a few questions about my
> > > first experiment:
> > > Here's my dfdl:
> > >
> > > <?xml version="1.0" encoding="UTF-8"?>
> > > <xs:schema
> > > >     xmlns:xs="http://www.w3.org/2001/XMLSchema"
> > > >     xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
> > > >
> > > >   <xs:include schemaLocation="default-dfdl-properties/defaults.dfdl.xsd" />
> > > >   <xs:annotation>
> > > >     <xs:appinfo source="http://www.ogf.org/dfdl/">
> > >       <dfdl:format ref="default-dfdl-properties" />
> > >     </xs:appinfo>
> > >   </xs:annotation>
> > >
> > >   <xs:element name="FOO"
> > >               dfdl:initiator="FOO/"
> > >               dfdl:lengthKind="implicit">
> > > <!--
> > >               dfdl:terminator="//%NL;%WSP*;">
> > > -->
> > >     <xs:complexType>
> > >       <xs:sequence dfdl:sequenceKind="ordered"
> > >                    dfdl:separator="/"
> > >                    dfdl:separatorPosition="infix">
> > >
> > >         <xs:element name="elem1">
> > >           <xs:simpleType>
> > >             <xs:restriction base="xs:string">
> > >               <xs:minLength value="1"/>
> > >               <xs:maxLength value="14"/>
> > >             </xs:restriction>
> > >           </xs:simpleType>
> > >         </xs:element>
> > >
> > >         <xs:element name="elem2">
> > >           <xs:simpleType>
> > >             <xs:restriction base="xs:string">
> > >               <xs:pattern value="CAT|DOG|HORSE"/>
> > >             </xs:restriction>
> > >           </xs:simpleType>
> > >         </xs:element>
> > >
> > >         <xs:element name="elem3">
> > >           <xs:simpleType>
> > >             <xs:restriction base="xs:int">
> > >               <xs:minInclusive value="1"/>
> > >               <xs:maxInclusive value="99999"/>
> > >             </xs:restriction>
> > >           </xs:simpleType>
> > >         </xs:element>
> > >
> > >         <xs:element name="elem4" minOccurs="0" maxOccurs="1">
> > >           <xs:simpleType>
> > >             <xs:restriction base="xs:string">
> > >               <xs:minLength value="1"/>
> > >               <xs:maxLength value="20"/>
> > >             </xs:restriction>
> > >           </xs:simpleType>
> > >         </xs:element>
> > >
> > >         <xs:element name="elem5" minOccurs="0" maxOccurs="1">
> > >           <xs:simpleType>
> > >             <xs:restriction base="xs:int">
> > >               <xs:minInclusive value="1"/>
> > >               <xs:maxInclusive value="999"/>
> > >             </xs:restriction>
> > >           </xs:simpleType>
> > >         </xs:element>
> > >       </xs:sequence>
> > >     </xs:complexType>
> > >   </xs:element>
> > >
> > > </xs:schema>
> > >
> > > Here's some test data:
> > > FOO/GONE FISHIN/DOG/0001///
> > >
> > > The parse fails with:
> > > [error] Parse Error: Unable to parse xs:int from empty string
> > > Schema context: elem5 Location line 59 column 10 in
> > > file:/home/tedx/dfdl-test/test.dfdl.xsd
> > > Data location was preceding byte 26
> > >
> > > Why does it fail when elem5 has minOccurs="0"? elem5 is optional.
> > >
> > > Then if I put a 0 before the last slash it generates:
> > > <?xml version="1.0" encoding="UTF-8"?>
> > > <FOO>
> > >   <elem1>GONE FISHIN</elem1>
> > >   <elem2>DOG</elem2>
> > >   <elem3>1</elem3>
> > >   <elem4></elem4>
> > >   <elem5>0</elem5>
> > > </FOO>
> > >
> > > and when I unparse it generates:
> > > FOO/GONE FISHIN/DOG/1//0
> > >
> > > but I'd like it to output 0001 for elem3, how do I do that?
> > >
> > > Ted
