Re: optional int and unparse formatting

Theodore Toth Mon, 30 Aug 2021 17:31:46 -0700

Thanks for the response.

On Tue, Aug 31, 2021 at 12:49 AM Beckerle, Mike
<mbecke...@owlcyberdefense.com> wrote:
>
> Good question.
>
> I think what is happening is this. elem5 fails to parse because it is an 
> empty string, but then the parse backtracks, and here's the trick: that means 
> it is putting back the separator before this array/optional element. Then 
> your schema has nothing to absorb the final separator.
>
> Your schema has expressed an optional element, but what you want is a 
> required separator, then an optional element after it.
>
> I think wrapping an xs:sequence around elem5 will fix this.


So the required separator goes on the sequence?
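
If I'm reading the suggestion right, the change would look roughly like the
fragment below. This is an untested sketch of my understanding, not something
confirmed in this thread: it replaces the elem5 declaration inside the
existing "/"-separated sequence, and it assumes the unnamed inner sequence
picks up separator="" from the defaults.

```xml
<!-- Sketch only: last child of the "/"-separated sequence in FOO.
     The inner sequence is a required child, so the outer separator
     should always be consumed before it; elem5 inside stays optional. -->
<xs:sequence>
  <xs:element name="elem5" minOccurs="0" maxOccurs="1">
    <xs:simpleType>
      <xs:restriction base="xs:int">
        <xs:minInclusive value="1"/>
        <xs:maxInclusive value="999"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:sequence>
```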

>
> To be sure, I need to see the occursCountKind property, lengthKind property, 
> etc. Basically I need to be able to reproduce your run.
> I would need your default-dfdl-properties/defaults.dfdl.xsd file.
>
Here are the defaults I pulled from the DFDL-part1 presentation:

<?xml version="1.0" encoding="UTF-8"?>

<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
        xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:defineFormat name="default-dfdl-properties">
        <dfdl:format
            alignment="1"
            alignmentUnits="bytes"
            binaryFloatRep="ieee"
            binaryNumberRep="binary"
            bitOrder="mostSignificantBitFirst"
            byteOrder="bigEndian"
            calendarPatternKind="implicit"
            documentFinalTerminatorCanBeMissing="yes"
            emptyValueDelimiterPolicy="none"
            encoding="ISO-8859-1"
            encodingErrorPolicy="replace"
            escapeSchemeRef=""
            fillByte="f"
            floating="no"
            ignoreCase="no"
            initiator=""
            initiatedContent="no"
            leadingSkip="0"
            lengthKind="delimited"
            lengthUnits="characters"
            nilKind="literalValue"
            nilValueDelimiterPolicy="none"
            occursCountKind="implicit"
            outputNewLine="%CR;%LF;"
            representation="text"
            separator=""
            separatorPosition="infix"
            separatorSuppressionPolicy="never"
            sequenceKind="ordered"
            terminator=""
            textBidi="no"
            textNumberCheckPolicy="strict"
            textNumberPattern="#,##0.###;-#,##0.###"
            textNumberRep="standard"
            textNumberRounding="explicit"
            textNumberRoundingIncrement="0"
            textNumberRoundingMode="roundUnnecessary"
            textOutputMinLength="0"
            textPadKind="none"
            textStandardBase="10"
            textStandardExponentRep="E"
            textStandardInfinityRep="Inf"
            textStandardNaNRep="NaN"
            textStandardZeroRep="0"
            textStandardDecimalSeparator="."
            textStandardGroupingSeparator=","
            textTrimKind="none"
            trailingSkip="0"
            truncateSpecifiedLengthString="no"
            utf16Width="fixed"/>
      </dfdl:defineFormat>
    </xs:appinfo>
  </xs:annotation>
</schema>


> w.r.t your 0001 issue....
>
> The ability to control text number formats like leading zeros, is by way of 
> the dfdl:textNumberPattern property. I think you want different values for 
> this property for your two integer-type elements if they are supposed to have 
> different numbers of digits, as evidenced by their max values of 999 and 
> 99999.
>
> However, your request that 0001 be preserved is not consistent with either 
> 999 or 99999 as max values. So I'm not sure what you are trying to achieve 
> in this format.

Just trying to teach an old dog some new tricks.

>
> DFDL does not "remember how the integer was presented". It parses it 
> according to rules, creates an xs:int in the infoset, and at that point the 
> leading zero information is gone. It then unparses according to rules. If you 
> want 0001 to parse and unparse as 0001, you want 
> dfdl:textNumberPattern="#0000". That will always produce at least 4 digits, 
> plus a fifth if the value needs it.
>
> But in this case, if you are first parsing, then unparsing data, then 
> incoming "01" will also unparse as "0001". Using 
> dfdl:textNumberPattern="#0000" means "canonical form for this data is at 
> least 4 digits". If you parse the data using dfdl:lengthKind='delimited', 
> then your schema has expressed "tolerate any number of digits, but always 
> canonicalize to at least 4 digits".

I'll play with this.
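
For my own notes, this is how I understand the pattern would be applied to
elem3. It is a sketch I have not run yet; the only change from my original
declaration is the dfdl:textNumberPattern attribute, which overrides the
pattern inherited from the defaults for this one element.

```xml
<!-- Sketch only: elem3 with a per-element number pattern.
     "#0000" is the value suggested above: unparse with at least 4 digits. -->
<xs:element name="elem3" dfdl:textNumberPattern="#0000">
  <xs:simpleType>
    <xs:restriction base="xs:int">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="99999"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

With this, I would expect both "01" and "0001" in the input to come back out
as "0001" on unparse, as described above.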

>
> If you want the text of these numbers preserved, not canonicalized, and your 
> application does both parse and unparse, like data security apps often do, 
> then you need to use strings, not numbers.

If I were to use strings how would I then validate that the value was
in some range?
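
The only thing I can think of is a pattern facet, like the one I already use
on elem2. The fragment below is a guess, not something anyone has recommended
here: the regular expression is my own attempt at "1 to 99999 with any number
of leading zeros", and it only has an effect when validation is turned on.

```xml
<!-- Sketch only: elem3 kept as text so leading zeros survive a round trip.
     The pattern is an assumption: optional leading zeros, then a value
     from 1 to 99999 (a non-zero digit followed by up to 4 more digits). -->
<xs:element name="elem3">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="0*[1-9][0-9]{0,4}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

That gets clumsy quickly for ranges that don't line up with digit counts,
which is why I'm asking whether there is a better way.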

>
> Note, however, that preserving leading/trailing non-numerically significant 
> zeros is a security hole - they can be used to carry covert channel data.
> Canonicalization of data is fundamentally more secure.
>
> The usual reason people want preservation of data exactly, character for 
> character, is to make test/QA easier. That's ok so long as you get that there 
> is a loss of some data security when non-information-carrying things like 
> leading/trailing zeros are preserved.
>
>
>
> ________________________________
> From: Theodore Toth <ted.toth....@sage.northcom.mil>
> Sent: Sunday, August 29, 2021 2:45 AM
> To: users@daffodil.apache.org <users@daffodil.apache.org>
> Subject: optional int and unparse formatting
>
> I just started looking at daffodil and have a few questions about my
> first experiment:
> Here's my dfdl:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema
>     xmlns:xs="http://www.w3.org/2001/XMLSchema"
>     xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
>
>   <xs:include schemaLocation="default-dfdl-properties/defaults.dfdl.xsd" />
>   <xs:annotation>
>     <xs:appinfo source="http://www.ogf.org/dfdl/">
>       <dfdl:format ref="default-dfdl-properties" />
>     </xs:appinfo>
>   </xs:annotation>
>
>   <xs:element name="FOO"
>               dfdl:initiator="FOO/"
>               dfdl:lengthKind="implicit">
> <!--
>               dfdl:terminator="//%NL;%WSP*;">
> -->
>     <xs:complexType>
>       <xs:sequence dfdl:sequenceKind="ordered"
>                    dfdl:separator="/"
>                    dfdl:separatorPosition="infix">
>
>         <xs:element name="elem1">
>           <xs:simpleType>
>             <xs:restriction base="xs:string">
>               <xs:minLength value="1"/>
>               <xs:maxLength value="14"/>
>             </xs:restriction>
>           </xs:simpleType>
>         </xs:element>
>
>         <xs:element name="elem2">
>           <xs:simpleType>
>             <xs:restriction base="xs:string">
>               <xs:pattern value="CAT|DOG|HORSE"/>
>             </xs:restriction>
>           </xs:simpleType>
>         </xs:element>
>
>         <xs:element name="elem3">
>           <xs:simpleType>
>             <xs:restriction base="xs:int">
>               <xs:minInclusive value="1"/>
>               <xs:maxInclusive value="99999"/>
>             </xs:restriction>
>           </xs:simpleType>
>         </xs:element>
>
>         <xs:element name="elem4" minOccurs="0" maxOccurs="1">
>           <xs:simpleType>
>             <xs:restriction base="xs:string">
>               <xs:minLength value="1"/>
>               <xs:maxLength value="20"/>
>             </xs:restriction>
>           </xs:simpleType>
>         </xs:element>
>
>         <xs:element name="elem5" minOccurs="0" maxOccurs="1">
>           <xs:simpleType>
>             <xs:restriction base="xs:int">
>               <xs:minInclusive value="1"/>
>               <xs:maxInclusive value="999"/>
>             </xs:restriction>
>           </xs:simpleType>
>         </xs:element>
>       </xs:sequence>
>     </xs:complexType>
>   </xs:element>
>
> </xs:schema>
>
> Here's some test data:
> FOO/GONE FISHIN/DOG/0001///
>
> The parse fails with:
> [error] Parse Error: Unable to parse xs:int from empty string
> Schema context: elem5 Location line 59 column 10 in
> file:/home/tedx/dfdl-test/test.dfdl.xsd
> Data location was preceding byte 26
>
> Why does it fail when elem5 has minOccurs="0"? elem5 is optional.
>
> Then if I put a 0 before the last slash it generates:
> <?xml version="1.0" encoding="UTF-8"?>
> <FOO>
>   <elem1>GONE FISHIN</elem1>
>   <elem2>DOG</elem2>
>   <elem3>1</elem3>
>   <elem4></elem4>
>   <elem5>0</elem5>
> </FOO>
>
> and when I unparse it generates:
> FOO/GONE FISHIN/DOG/1//0
>
> but I'd like it to output 0001 for elem3, how do I do that?
>
> Ted
