Thanks for the response.

On Tue, Aug 31, 2021 at 12:49 AM Beckerle, Mike <mbecke...@owlcyberdefense.com> wrote:
>
> Good question.
>
> I think what is happening is this. elem5 fails to parse because it is an
> empty string, but then the parse backtracks, and here's the trick: that means
> it is putting back the separator before this array/optional element. Then
> your schema has nothing to absorb the final separator.
>
> Your schema has expressed an optional element, but what you want is a
> required separator, then an optional element after it.
>
> I think wrapping an xs:sequence around elem5 will fix this.

So the required separator goes on the sequence?

>
> To be sure, I need to see the occursCountKind property, lengthKind property,
> etc. Basically I need to be able to reproduce your run.
> I would need your default-dfdl-properties/defaults.dfdl.xsd file.
>

Here are my defaults, which I pulled from the DFDL-part1 presentation:

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
        xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:defineFormat name="default-dfdl-properties">
        <dfdl:format
          alignment="1"
          alignmentUnits="bytes"
          binaryFloatRep="ieee"
          binaryNumberRep="binary"
          bitOrder="mostSignificantBitFirst"
          byteOrder="bigEndian"
          calendarPatternKind="implicit"
          documentFinalTerminatorCanBeMissing="yes"
          emptyValueDelimiterPolicy="none"
          encoding="ISO-8859-1"
          encodingErrorPolicy="replace"
          escapeSchemeRef=""
          fillByte="f"
          floating="no"
          ignoreCase="no"
          initiator=""
          initiatedContent="no"
          leadingSkip="0"
          lengthKind="delimited"
          lengthUnits="characters"
          nilKind="literalValue"
          nilValueDelimiterPolicy="none"
          occursCountKind="implicit"
          outputNewLine="%CR;%LF;"
          representation="text"
          separator=""
          separatorPosition="infix"
          separatorSuppressionPolicy="never"
          sequenceKind="ordered"
          terminator=""
          textBidi="no"
          textNumberCheckPolicy="strict"
          textNumberPattern="#,##0.###;-#,##0.###"
          textNumberRep="standard"
          textNumberRounding="explicit"
          textNumberRoundingIncrement="0"
          textNumberRoundingMode="roundUnnecessary"
          textOutputMinLength="0"
          textPadKind="none"
          textStandardBase="10"
          textStandardExponentRep="E"
          textStandardInfinityRep="Inf"
          textStandardNaNRep="NaN"
          textStandardZeroRep="0"
          textStandardDecimalSeparator="."
          textStandardGroupingSeparator=","
          textTrimKind="none"
          trailingSkip="0"
          truncateSpecifiedLengthString="no"
          utf16Width="fixed"/>
      </dfdl:defineFormat>
    </xs:appinfo>
  </xs:annotation>
</schema>

> w.r.t your 0001 issue....
>
> Text number formats, such as leading zeros, are controlled by the
> dfdl:textNumberPattern property. I think you want different values for
> this property for your two integer-type elements if they are supposed to have
> different numbers of digits, as evidenced by their max values of 999 and
> 99999.
>
> However, your request that 0001 be preserved is not consistent with either
> 999 or 99999 as max values. So I'm not sure what you are trying to achieve
> in this format.

Just trying to teach an old dog some new tricks.

>
> DFDL does not "remember how the integer was presented". It parses it
> according to rules, creates an xs:int in the infoset, and at that point the
> leading zero information is gone. It then unparses according to rules. If you
> want 0001 to parse and unparse as 0001, you want
> dfdl:textNumberPattern="#0000". That will give you 4 digits, optionally a
> fifth if needed, but will always produce at least 4.
>
> But in this case, if you are first parsing, then unparsing data, then
> incoming "01" will also unparse as "0001". Using
> dfdl:textNumberPattern="#0000" means "canonical form for this data is at
> least 4 digits". If you parse the data using dfdl:lengthKind='delimited',
> then your schema has expressed "tolerate any number of digits, but always
> canonicalize to at least 4 digits".

I'll play with this.

>
> If you want the text of these numbers preserved, not canonicalized, and your
> application does both parse and unparse, like data security apps often do,
> then you need to use strings, not numbers.

If I were to use strings, how would I then validate that the value was
in some range?

>
> Note, however, that preserving leading/trailing non-numerically significant
> zeros is a security hole - they can be used to carry covert channel data.
> Canonicalization of data is fundamentally more secure.
>
> The usual reason people want preservation of data exactly, character for
> character, is to make test/QA easier.
> That's ok so long as you get that there
> is a loss of some data security when non-information-carrying things like
> leading/trailing zeros are preserved.
>
>
>
> ________________________________
> From: Theodore Toth <ted.toth....@sage.northcom.mil>
> Sent: Sunday, August 29, 2021 2:45 AM
> To: users@daffodil.apache.org <users@daffodil.apache.org>
> Subject: optional int and unparse formatting
>
> I just started looking at daffodil and have a few questions about my
> first experiment:
> Here's my dfdl:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema
>     xmlns:xs="http://www.w3.org/2001/XMLSchema"
>     xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
>
>   <xs:include schemaLocation="default-dfdl-properties/defaults.dfdl.xsd" />
>   <xs:annotation>
>     <xs:appinfo source="http://www.ogf.org/dfdl/">
>       <dfdl:format ref="default-dfdl-properties" />
>     </xs:appinfo>
>   </xs:annotation>
>
>   <xs:element name="FOO"
>       dfdl:initiator="FOO/"
>       dfdl:lengthKind="implicit">
>     <!--
>       dfdl:terminator="//%NL;%WSP*;">
>     -->
>     <xs:complexType>
>       <xs:sequence dfdl:sequenceKind="ordered"
>           dfdl:separator="/"
>           dfdl:separatorPosition="infix">
>
>         <xs:element name="elem1">
>           <xs:simpleType>
>             <xs:restriction base="xs:string">
>               <xs:minLength value="1"/>
>               <xs:maxLength value="14"/>
>             </xs:restriction>
>           </xs:simpleType>
>         </xs:element>
>
>         <xs:element name="elem2">
>           <xs:simpleType>
>             <xs:restriction base="xs:string">
>               <xs:pattern value="CAT|DOG|HORSE"/>
>             </xs:restriction>
>           </xs:simpleType>
>         </xs:element>
>
>         <xs:element name="elem3">
>           <xs:simpleType>
>             <xs:restriction base="xs:int">
>               <xs:minInclusive value="1"/>
>               <xs:maxInclusive value="99999"/>
>             </xs:restriction>
>           </xs:simpleType>
>         </xs:element>
>
>         <xs:element name="elem4" minOccurs="0" maxOccurs="1">
>           <xs:simpleType>
>             <xs:restriction base="xs:string">
>               <xs:minLength value="1"/>
>               <xs:maxLength value="20"/>
>             </xs:restriction>
>           </xs:simpleType>
>         </xs:element>
>
>         <xs:element name="elem5" minOccurs="0" maxOccurs="1">
>           <xs:simpleType>
>             <xs:restriction base="xs:int">
>               <xs:minInclusive value="1"/>
>               <xs:maxInclusive value="999"/>
>             </xs:restriction>
>           </xs:simpleType>
>         </xs:element>
>       </xs:sequence>
>     </xs:complexType>
>   </xs:element>
>
> </xs:schema>
>
> Here's some test data:
> FOO/GONE FISHIN/DOG/0001///
>
> The parse fails with:
> [error] Parse Error: Unable to parse xs:int from empty string
> Schema context: elem5 Location line 59 column 10 in
> file:/home/tedx/dfdl-test/test.dfdl.xsd
> Data location was preceding byte 26
>
> Why does it fail when elem5 has minOccurs="0"?
>
> Then if I put a 0 before the last slash it generates:
> <?xml version="1.0" encoding="UTF-8"?>
> <FOO>
>   <elem1>GONE FISHIN</elem1>
>   <elem2>DOG</elem2>
>   <elem3>1</elem3>
>   <elem4></elem4>
>   <elem5>0</elem5>
> </FOO>
>
> and when I unparse it generates:
> FOO/GONE FISHIN/DOG/1//0
>
> but I'd like it to output 0001 for elem3, how do I do that?
>
> Ted
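
For reference, the two suggestions in Mike's reply could be combined roughly as in the fragment below. This is an untested sketch, not something confirmed in the thread. The pattern "#0000" is taken directly from the reply. The placement of the wrapper xs:sequence, with the "/" separator left on the outer sequence, is only one reading of "wrapping an xs:sequence around elem5" and is an assumption. The fragment belongs inside the xs:schema from the original message, which declares the xs: and dfdl: prefixes.

```xml
<!-- Untested sketch; goes inside the original xs:schema element. -->
<xs:element name="FOO" dfdl:initiator="FOO/" dfdl:lengthKind="implicit">
  <xs:complexType>
    <xs:sequence dfdl:sequenceKind="ordered"
                 dfdl:separator="/"
                 dfdl:separatorPosition="infix">

      <!-- elem1 and elem2: unchanged from the original schema -->

      <!-- Pattern from the reply: unparse with at least four digits,
           so the value 1 is written as 0001. -->
      <xs:element name="elem3" dfdl:textNumberPattern="#0000">
        <xs:simpleType>
          <xs:restriction base="xs:int">
            <xs:minInclusive value="1"/>
            <xs:maxInclusive value="99999"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>

      <!-- elem4: unchanged from the original schema -->

      <!-- Assumption: the wrapper sequence is a required child of the
           outer sequence, so the "/" before it is required, and the
           optional elem5 sits inside the wrapper. -->
      <xs:sequence>
        <xs:element name="elem5" minOccurs="0" maxOccurs="1">
          <xs:simpleType>
            <xs:restriction base="xs:int">
              <xs:minInclusive value="1"/>
              <xs:maxInclusive value="999"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>

    </xs:sequence>
  </xs:complexType>
</xs:element>
```

As the reply points out, a pattern like this canonicalizes rather than preserves: an incoming "01" would also unparse as "0001". elem5 would need its own dfdl:textNumberPattern if it should be padded to a different width.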