Re: optional int and unparse formatting

Theodore Toth Thu, 07 Oct 2021 17:08:36 -0700

Yes I'm looking at the DI2E USMTF ATO/ACO schemas, thanks.
Unfortunately OTH-G doesn't define optional-value settings or an 'end
of set' marker the way USMTF does, which is what I'm struggling with :(


Ted

On Thu, Oct 7, 2021 at 9:58 PM Mike Beckerle <mbecke...@apache.org> wrote:
>
> Ted,
>
> If you have access to the DI2E.net system, then this USMTF DFDL schema
> (partial, mostly just ATO) may help you, as OTH-G has similarities.
>
> https://bitbucket.di2e.net/projects/DFDL/repos/usmtf/browse
>
> If you don't have that access, then please get in contact privately and we'll 
> arrange to get you a copy by other means.
>
> Of possible interest: I am currently adding features to Daffodil that will
> support OTH-G style check digits, i.e., verifying them and computing them
> on unparse.
> This will come out in release 3.2.0 later this year.
>
> -mikeb
>
>
>
>
> On Thu, Oct 7, 2021 at 6:35 AM Theodore Toth <ted.toth....@sage.northcom.mil> 
> wrote:
>>
>> I'm still struggling with optional subelements at the end of an
>> element, this time for a complex type. The approach that worked for a
>> simpleType doesn't work for a complex type. I'm getting "[error]
>> Parse Error: Terminator '%NL;%WSP*;' not found". I'm not sure yet, but
>> a newline might not be a valid terminator for an OTH-GOLD message line
>> :(
>> Also, how would you specify an optional literal like '//' at the end of
>> an element when there can be other optional subelements separated by '/'
>> prior to it?
>>
>> On Sat, Sep 18, 2021 at 4:12 AM Beckerle, Mike
>> <mbecke...@owlcyberdefense.com> wrote:
>> >
>> > Sorry for the late response on this. Turns out Outlook 365 was spam
>> > filtering some Apache emails. It's a known issue with Microsoft's spam
>> > filters.
>> >
>> > The sequence wrapped around elem5 doesn't need a dfdl:separator because
>> > elem5 has maxOccurs="1", so there is never more than one item to
>> > separate.
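>> >
>> > As a sketch of that simplification (untested, and assuming the included
>> > defaults file supplies separator="" so the inner sequence has none):
>> >
>> >         <!-- no dfdl:separator here: elem5 occurs at most once -->
>> >         <xs:sequence dfdl:terminator="/">
>> >           <xs:element name="elem5" minOccurs="0" maxOccurs="1"
>> >                       dfdl:textNumberPattern="000">
>> >             <xs:simpleType>
>> >               <xs:restriction base="xs:int">
>> >                 <xs:minInclusive value="1"/>
>> >                 <xs:maxInclusive value="999"/>
>> >               </xs:restriction>
>> >             </xs:simpleType>
>> >           </xs:element>
>> >         </xs:sequence>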
>> >
>> > Otherwise yeah, this looks like what I was suggesting.
>> >
>> > I agree that the DFDL spec is quite painful in numerous areas. 
>> > Unfortunately I have to take the blame for some of that. Someday I hope 
>> > some sections will get refactored and rewritten.
>> >
>> >
>> > ________________________________
>> > From: Theodore Toth <ted.toth....@sage.northcom.mil>
>> > Sent: Tuesday, August 31, 2021 12:21 AM
>> > To: users@daffodil.apache.org <users@daffodil.apache.org>
>> > Subject: Re: optional int and unparse formatting
>> >
>> > The following worked for me although I don't know if it's the 'right'
>> > way to do it. Reading the spec can give you a headache.
>> >
>> > <?xml version="1.0" encoding="UTF-8"?>
>> > <xs:schema
>> >     xmlns:xs="http://www.w3.org/2001/XMLSchema"
>> >     xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
>> >
>> >   <xs:include schemaLocation="default-dfdl-properties/defaults.dfdl.xsd" />
>> >   <xs:annotation>
>> >     <xs:appinfo source="http://www.ogf.org/dfdl/">
>> >       <dfdl:format ref="default-dfdl-properties" />
>> >     </xs:appinfo>
>> >   </xs:annotation>
>> >
>> >   <xs:element name="FOO"
>> >               dfdl:initiator="FOO/"
>> >               dfdl:lengthKind="implicit"
>> >               dfdl:terminator="%NL;%WSP*;">
>> >
>> >     <xs:complexType>
>> >       <xs:sequence dfdl:sequenceKind="ordered"
>> >                    dfdl:separator="/"
>> >                    dfdl:separatorPosition="infix">
>> >
>> >         <xs:element name="elem1">
>> >           <xs:simpleType>
>> >             <xs:restriction base="xs:string">
>> >               <xs:minLength value="1"/>
>> >               <xs:maxLength value="14"/>
>> >               <xs:pattern value="[A-Z0-9,:%#*\- ]+"/>
>> >             </xs:restriction>
>> >           </xs:simpleType>
>> >         </xs:element>
>> >
>> >         <xs:element name="elem2">
>> >           <xs:simpleType>
>> >             <xs:restriction base="xs:string">
>> >               <xs:pattern value="CAT|DOG|HORSE"/>
>> >             </xs:restriction>
>> >           </xs:simpleType>
>> >         </xs:element>
>> >
>> >         <xs:element name="elem3" dfdl:textNumberPattern="#0000">
>> >           <xs:simpleType>
>> >             <xs:restriction base="xs:int">
>> >               <xs:minInclusive value="1"/>
>> >               <xs:maxInclusive value="99999"/>
>> >             </xs:restriction>
>> >           </xs:simpleType>
>> >         </xs:element>
>> >
>> >         <xs:element name="elem4" minOccurs="0" maxOccurs="1">
>> >           <xs:simpleType>
>> >             <xs:restriction base="xs:string">
>> >               <xs:minLength value="1"/>
>> >               <xs:maxLength value="20"/>
>> >             </xs:restriction>
>> >           </xs:simpleType>
>> >         </xs:element>
>> >
>> >         <xs:sequence dfdl:separator="/" dfdl:terminator="/"
>> >                      dfdl:separatorSuppressionPolicy="anyEmpty">
>> >           <xs:element name="elem5" minOccurs="0" maxOccurs="1"
>> >                       dfdl:textNumberPattern="000">
>> >             <xs:simpleType>
>> >               <xs:restriction base="xs:int">
>> >                 <xs:minInclusive value="1"/>
>> >                 <xs:maxInclusive value="999"/>
>> >               </xs:restriction>
>> >             </xs:simpleType>
>> >           </xs:element>
>> >         </xs:sequence>
>> >
>> >       </xs:sequence>
>> >     </xs:complexType>
>> >   </xs:element>
>> >
>> > </xs:schema>
>> >
>> > On Tue, Aug 31, 2021 at 9:31 AM Theodore Toth
>> > <ted.toth....@sage.northcom.mil> wrote:
>> > >
>> > > Thanks for the response.
>> > >
>> > > On Tue, Aug 31, 2021 at 12:49 AM Beckerle, Mike
>> > > <mbecke...@owlcyberdefense.com> wrote:
>> > > >
>> > > > Good question.
>> > > >
>> > > > I think what is happening is this. elem5 fails to parse because it is 
>> > > > an empty string, but then the parse backtracks, and here's the trick: 
>> > > > that means it is putting back the separator before this array/optional 
>> > > > element. Then your schema has nothing to absorb the final separator.
>> > > >
>> > > > Your schema has expressed an optional element, but what you want is a 
>> > > > required separator, then an optional element after it.
>> > > >
>> > > > I think wrapping an xs:sequence around elem5 will fix this.
>> > >
>> > > So the required separator goes on the sequence?
>> > >
>> > > >
>> > > > To be sure, I need to see the occursCountKind property, lengthKind 
>> > > > property, etc. Basically I need to be able to reproduce your run.
>> > > > I would need your default-dfdl-properties/defaults.dfdl.xsd file.
>> > > >
>> > > Here's my defaults that I pulled from the DFDL-part1 presentation:
>> > >
>> > > <?xml version="1.0" encoding="UTF-8"?>
>> > >
>> > > <schema xmlns="http://www.w3.org/2001/XMLSchema"
>> > >         xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
>> > >         xmlns:xs="http://www.w3.org/2001/XMLSchema">
>> > >
>> > >   <xs:annotation>
>> > >     <xs:appinfo source="http://www.ogf.org/dfdl/">
>> > >       <dfdl:defineFormat name="default-dfdl-properties">
>> > >         <dfdl:format
>> > >             alignment="1"
>> > >             alignmentUnits="bytes"
>> > >             binaryFloatRep="ieee"
>> > >             binaryNumberRep="binary"
>> > >             bitOrder="mostSignificantBitFirst"
>> > >             byteOrder="bigEndian"
>> > >             calendarPatternKind="implicit"
>> > >             documentFinalTerminatorCanBeMissing="yes"
>> > >             emptyValueDelimiterPolicy="none"
>> > >             encoding="ISO-8859-1"
>> > >             encodingErrorPolicy="replace"
>> > >             escapeSchemeRef=""
>> > >             fillByte="f"
>> > >             floating="no"
>> > >             ignoreCase="no"
>> > >             initiator=""
>> > >             initiatedContent="no"
>> > >             leadingSkip="0"
>> > >             lengthKind="delimited"
>> > >             lengthUnits="characters"
>> > >             nilKind="literalValue"
>> > >             nilValueDelimiterPolicy="none"
>> > >             occursCountKind="implicit"
>> > >             outputNewLine="%CR;%LF;"
>> > >             representation="text"
>> > >             separator=""
>> > >             separatorPosition="infix"
>> > >             separatorSuppressionPolicy="never"
>> > >             sequenceKind="ordered"
>> > >             terminator=""
>> > >             textBidi="no"
>> > >             textNumberCheckPolicy="strict"
>> > >             textNumberPattern="#,##0.###;-#,##0.###"
>> > >             textNumberRep="standard"
>> > >             textNumberRounding="explicit"
>> > >             textNumberRoundingIncrement="0"
>> > >             textNumberRoundingMode="roundUnnecessary"
>> > >             textOutputMinLength="0"
>> > >             textPadKind="none"
>> > >             textStandardBase="10"
>> > >             textStandardExponentRep="E"
>> > >             textStandardInfinityRep="Inf"
>> > >             textStandardNaNRep="NaN"
>> > >             textStandardZeroRep="0"
>> > >             textStandardDecimalSeparator="."
>> > >             textStandardGroupingSeparator=","
>> > >             textTrimKind="none"
>> > >             trailingSkip="0"
>> > >             truncateSpecifiedLengthString="no"
>> > >             utf16Width="fixed"/>
>> > >           </dfdl:defineFormat>
>> > >         </xs:appinfo>
>> > >       </xs:annotation>
>> > >     </schema>
>> > >
>> > >
>> > > > w.r.t your 0001 issue....
>> > > >
>> > > > Text number formats, such as leading zeros, are controlled by the
>> > > > dfdl:textNumberPattern property. I think you want different
>> > > > values for this property for your two integer-type elements if they
>> > > > are supposed to have different numbers of digits, as evidenced by
>> > > > their max values of 999 and 99999.
>> > > >
>> > > > However, your request that 0001 be preserved is not consistent with
>> > > > either 999 or 99999 as max values. So I'm not sure what you are
>> > > > trying to achieve in this format.
>> > >
>> > > Just trying to teach an old dog some new tricks.
>> > >
>> > > >
>> > > > DFDL does not "remember how the integer was presented". It parses it 
>> > > > according to rules, creates an xs:int in the infoset, and at that 
>> > > > point the leading zero information is gone. It then unparses according 
>> > > > to rules. If you want 0001 to parse and unparse as 0001, you want
>> > > > dfdl:textNumberPattern="#0000". That will always produce at least
>> > > > 4 digits, with a fifth added when the value needs it.
>> > > >
>> > > > But in this case, if you are first parsing, then unparsing data, then 
>> > > > incoming "01" will also unparse as "0001". Using 
>> > > > dfdl:textNumberPattern="#0000" means "canonical form for this data is 
>> > > > at least 4 digits". If you parse the data using 
>> > > > dfdl:lengthKind='delimited', then your schema has expressed "tolerate 
>> > > > any number of digits, but always canonicalize to at least 4 digits".
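>> > > >
>> > > > A small illustration of that canonicalization (a sketch only; the
>> > > > comments restate the behavior described above, they are not
>> > > > captured output, and the element is trimmed to the relevant
>> > > > properties):
>> > > >
>> > > >   <xs:element name="elem3" type="xs:int"
>> > > >               dfdl:lengthKind="delimited"
>> > > >               dfdl:textNumberPattern="#0000"/>
>> > > >   <!-- "0001" parses to 1, which unparses as "0001" -->
>> > > >   <!-- "01" parses to 1, which also unparses as "0001" -->
>> > > >   <!-- "12345" parses to 12345, which unparses as "12345" -->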
>> > >
>> > > I'll play with this.
>> > >
>> > > >
>> > > > If you want the text of these numbers preserved, not canonicalized, 
>> > > > and your application does both parse and unparse, like data security 
>> > > > apps often do, then you need to use strings, not numbers.
>> > >
>> > > If I were to use strings how would I then validate that the value was
>> > > in some range?
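>> > >
>> > > (One possible approach, offered as an untested sketch and not
>> > > something confirmed in this thread: keep the element an xs:string,
>> > > constrain its text with a pattern facet, and check the numeric range
>> > > with a dfdl:assert. The 4-to-5-digit pattern and the message text
>> > > are assumptions. A failed assert is by default a parse error rather
>> > > than a validation error.)
>> > >
>> > >   <xs:element name="elem3">
>> > >     <xs:annotation>
>> > >       <xs:appinfo source="http://www.ogf.org/dfdl/">
>> > >         <!-- convert the preserved text to a number only for the check -->
>> > >         <dfdl:assert test="{ xs:int(.) ge 1 and xs:int(.) le 99999 }"
>> > >                      message="elem3 out of range 1..99999"/>
>> > >       </xs:appinfo>
>> > >     </xs:annotation>
>> > >     <xs:simpleType>
>> > >       <xs:restriction base="xs:string">
>> > >         <!-- digits only, so text such as 0001 is kept as written -->
>> > >         <xs:pattern value="[0-9]{4,5}"/>
>> > >       </xs:restriction>
>> > >     </xs:simpleType>
>> > >   </xs:element>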
>> > >
>> > > >
>> > > > Note, however, that preserving leading/trailing non-numerically 
>> > > > significant zeros is a security hole - they can be used to carry 
>> > > > covert channel data.
>> > > > Canonicalization of data is fundamentally more secure.
>> > > >
>> > > > The usual reason people want preservation of data exactly, character 
>> > > > for character, is to make test/QA easier. That's ok so long as you get 
>> > > > that there is a loss of some data security when 
>> > > > non-information-carrying things like leading/trailing zeros are 
>> > > > preserved.
>> > > >
>> > > >
>> > > >
>> > > > ________________________________
>> > > > From: Theodore Toth <ted.toth....@sage.northcom.mil>
>> > > > Sent: Sunday, August 29, 2021 2:45 AM
>> > > > To: users@daffodil.apache.org <users@daffodil.apache.org>
>> > > > Subject: optional int and unparse formatting
>> > > >
>> > > > I just started looking at daffodil and have a few questions about my
>> > > > first experiment:
>> > > > Here's my dfdl:
>> > > >
>> > > > <?xml version="1.0" encoding="UTF-8"?>
>> > > > <xs:schema
>> > > >     xmlns:xs="http://www.w3.org/2001/XMLSchema"
>> > > >     xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
>> > > >
>> > > >   <xs:include schemaLocation="default-dfdl-properties/defaults.dfdl.xsd" />
>> > > >   <xs:annotation>
>> > > >     <xs:appinfo source="http://www.ogf.org/dfdl/">
>> > > >       <dfdl:format ref="default-dfdl-properties" />
>> > > >     </xs:appinfo>
>> > > >   </xs:annotation>
>> > > >
>> > > >   <xs:element name="FOO"
>> > > >               dfdl:initiator="FOO/"
>> > > >               dfdl:lengthKind="implicit">
>> > > > <!--
>> > > >               dfdl:terminator="//%NL;%WSP*;">
>> > > > -->
>> > > >     <xs:complexType>
>> > > >       <xs:sequence dfdl:sequenceKind="ordered"
>> > > >                    dfdl:separator="/"
>> > > >                    dfdl:separatorPosition="infix">
>> > > >
>> > > >         <xs:element name="elem1">
>> > > >           <xs:simpleType>
>> > > >             <xs:restriction base="xs:string">
>> > > >               <xs:minLength value="1"/>
>> > > >               <xs:maxLength value="14"/>
>> > > >             </xs:restriction>
>> > > >           </xs:simpleType>
>> > > >         </xs:element>
>> > > >
>> > > >         <xs:element name="elem2">
>> > > >           <xs:simpleType>
>> > > >             <xs:restriction base="xs:string">
>> > > >               <xs:pattern value="CAT|DOG|HORSE"/>
>> > > >             </xs:restriction>
>> > > >           </xs:simpleType>
>> > > >         </xs:element>
>> > > >
>> > > >         <xs:element name="elem3">
>> > > >           <xs:simpleType>
>> > > >             <xs:restriction base="xs:int">
>> > > >               <xs:minInclusive value="1"/>
>> > > >               <xs:maxInclusive value="99999"/>
>> > > >             </xs:restriction>
>> > > >           </xs:simpleType>
>> > > >         </xs:element>
>> > > >
>> > > >         <xs:element name="elem4" minOccurs="0" maxOccurs="1">
>> > > >           <xs:simpleType>
>> > > >             <xs:restriction base="xs:string">
>> > > >               <xs:minLength value="1"/>
>> > > >               <xs:maxLength value="20"/>
>> > > >             </xs:restriction>
>> > > >           </xs:simpleType>
>> > > >         </xs:element>
>> > > >
>> > > >         <xs:element name="elem5" minOccurs="0" maxOccurs="1">
>> > > >           <xs:simpleType>
>> > > >             <xs:restriction base="xs:int">
>> > > >               <xs:minInclusive value="1"/>
>> > > >               <xs:maxInclusive value="999"/>
>> > > >             </xs:restriction>
>> > > >           </xs:simpleType>
>> > > >         </xs:element>
>> > > >       </xs:sequence>
>> > > >     </xs:complexType>
>> > > >   </xs:element>
>> > > >
>> > > > </xs:schema>
>> > > >
>> > > > Here's some test data:
>> > > > FOO/GONE FISHIN/DOG/0001///
>> > > >
>> > > > The parse fails with:
>> > > > [error] Parse Error: Unable to parse xs:int from empty string
>> > > > Schema context: elem5 Location line 59 column 10 in
>> > > > file:/home/tedx/dfdl-test/test.dfdl.xsd
>> > > > Data location was preceding byte 26
>> > > >
>> > > > Why does it fail when elem5 has minOccurs="0"? elem5 is optional.
>> > > >
>> > > > Then if I put a 0 before the last slash it generates:
>> > > > <?xml version="1.0" encoding="UTF-8"?>
>> > > > <FOO>
>> > > >   <elem1>GONE FISHIN</elem1>
>> > > >   <elem2>DOG</elem2>
>> > > >   <elem3>1</elem3>
>> > > >   <elem4></elem4>
>> > > >   <elem5>0</elem5>
>> > > > </FOO>
>> > > >
>> > > > and when I unparse it generates:
>> > > > FOO/GONE FISHIN/DOG/1//0
>> > > >
>> > > > but I'd like it to output 0001 for elem3, how do I do that?
>> > > >
>> > > > Ted
