Ted,

If you have access to the DI2E.net system, then this USMTF DFDL schema (partial; mostly just ATO) may help you, as OTH-G has similarities.
https://bitbucket.di2e.net/projects/DFDL/repos/usmtf/browse

If you don't have that access, then please get in contact privately and we'll arrange to get you a copy by other means.

Of possible interest: I am currently adding features to Daffodil that will support OTH-G style check-digits, i.e., verifying them, and computing them on unparse. This will come out in release 3.2.0 later this year.

-mikeb

On Thu, Oct 7, 2021 at 6:35 AM Theodore Toth <ted.toth....@sage.northcom.mil> wrote:
> I'm still struggling with optional subelements at the end of an
> element, this time for a complex type; the approach that worked for a
> simpleType doesn't work for a complex type. I'm getting "[error]
> Parse Error: Terminator '%NL;%WSP*;' not found". I'm not sure yet, but
> a newline might not be a valid terminator for an OTH-GOLD message line
> :(
> Also, how would you specify an optional literal like '//' at the end of
> an element when there can be other optional subelements separated by '/'
> prior to it?
>
> On Sat, Sep 18, 2021 at 4:12 AM Beckerle, Mike
> <mbecke...@owlcyberdefense.com> wrote:
> >
> > Sorry for the late response on this. It turns out Outlook 365 was spam filtering some Apache emails. It's a known issue with Microsoft's spam filters.
> >
> > The sequence wrapped around elem5 doesn't need a dfdl:separator because elem5 has maxOccurs 1, so there will never be enough things to separate.
> >
> > Otherwise yeah, this looks like what I was suggesting.
> >
> > I agree that the DFDL spec is quite painful in numerous areas. Unfortunately I have to take the blame for some of that. Someday I hope some sections will get refactored and rewritten.
> >
> >
> > ________________________________
> > From: Theodore Toth <ted.toth....@sage.northcom.mil>
> > Sent: Tuesday, August 31, 2021 12:21 AM
> > To: users@daffodil.apache.org <users@daffodil.apache.org>
> > Subject: Re: optional int and unparse formatting
> >
> > The following worked for me although I don't know if it's the 'right'
> > way to do it. Reading the spec can give you a headache.
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <xs:schema
> >   xmlns:xs="http://www.w3.org/2001/XMLSchema"
> >   xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
> >
> >   <xs:include schemaLocation="default-dfdl-properties/defaults.dfdl.xsd" />
> >   <xs:annotation>
> >     <xs:appinfo source="http://www.ogf.org/dfdl/">
> >       <dfdl:format ref="default-dfdl-properties" />
> >     </xs:appinfo>
> >   </xs:annotation>
> >
> >   <xs:element name="FOO"
> >     dfdl:initiator="FOO/"
> >     dfdl:lengthKind="implicit"
> >     dfdl:terminator="%NL;%WSP*;">
> >
> >     <xs:complexType>
> >       <xs:sequence dfdl:sequenceKind="ordered"
> >         dfdl:separator="/"
> >         dfdl:separatorPosition="infix">
> >
> >         <xs:element name="elem1">
> >           <xs:simpleType>
> >             <xs:restriction base="xs:string">
> >               <xs:minLength value="1"/>
> >               <xs:maxLength value="14"/>
> >               <xs:pattern value="[A-Z0-9,:%#*\- ]+"/>
> >             </xs:restriction>
> >           </xs:simpleType>
> >         </xs:element>
> >
> >         <xs:element name="elem2">
> >           <xs:simpleType>
> >             <xs:restriction base="xs:string">
> >               <xs:pattern value="CAT|DOG|HORSE"/>
> >             </xs:restriction>
> >           </xs:simpleType>
> >         </xs:element>
> >
> >         <xs:element name="elem3" dfdl:textNumberPattern="#0000">
> >           <xs:simpleType>
> >             <xs:restriction base="xs:int">
> >               <xs:minInclusive value="1"/>
> >               <xs:maxInclusive value="99999"/>
> >             </xs:restriction>
> >           </xs:simpleType>
> >         </xs:element>
> >
> >         <xs:element name="elem4" minOccurs="0" maxOccurs="1">
> >           <xs:simpleType>
> >             <xs:restriction base="xs:string">
> >               <xs:minLength value="1"/>
> >               <xs:maxLength value="20"/>
> >             </xs:restriction>
> >           </xs:simpleType>
> >         </xs:element>
> >
> >         <xs:sequence dfdl:separator="/" dfdl:terminator="/"
> >           dfdl:separatorSuppressionPolicy="anyEmpty">
> >           <xs:element name="elem5" minOccurs="0" maxOccurs="1"
> >             dfdl:textNumberPattern="000">
> >             <xs:simpleType>
> >               <xs:restriction base="xs:int">
> >                 <xs:minInclusive value="1"/>
> >                 <xs:maxInclusive value="999"/>
> >               </xs:restriction>
> >             </xs:simpleType>
> >           </xs:element>
> >         </xs:sequence>
> >
> >       </xs:sequence>
> >     </xs:complexType>
> >   </xs:element>
> >
> > </xs:schema>
> >
> > On Tue, Aug 31, 2021 at 9:31 AM Theodore Toth
> > <ted.toth....@sage.northcom.mil> wrote:
> > >
> > > Thanks for the response.
> > >
> > > On Tue, Aug 31, 2021 at 12:49 AM Beckerle, Mike
> > > <mbecke...@owlcyberdefense.com> wrote:
> > > >
> > > > Good question.
> > > >
> > > > I think what is happening is this. elem5 fails to parse because it is an empty string, but then the parse backtracks, and here's the trick: that means it is putting back the separator before this array/optional element. Then your schema has nothing to absorb the final separator.
> > > >
> > > > Your schema has expressed an optional element, but what you want is a required separator, then an optional element after it.
> > > >
> > > > I think wrapping an xs:sequence around elem5 will fix this.
> > >
> > > So the required separator goes on the sequence?
> > >
> > > >
> > > > To be sure, I need to see the occursCountKind property, lengthKind property, etc. Basically I need to be able to reproduce your run.
> > > > I would need your default-dfdl-properties/defaults.dfdl.xsd file.
> > > >
> > > Here are my defaults, which I pulled from the DFDL-part1 presentation:
> > >
> > > <?xml version="1.0" encoding="UTF-8"?>
> > >
> > > <schema xmlns="http://www.w3.org/2001/XMLSchema"
> > >   xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
> > >   xmlns:xs="http://www.w3.org/2001/XMLSchema">
> > >
> > >   <xs:annotation>
> > >     <xs:appinfo source="http://www.ogf.org/dfdl/">
> > >       <dfdl:defineFormat name="default-dfdl-properties">
> > >         <dfdl:format
> > >           alignment="1"
> > >           alignmentUnits="bytes"
> > >           binaryFloatRep="ieee"
> > >           binaryNumberRep="binary"
> > >           bitOrder="mostSignificantBitFirst"
> > >           byteOrder="bigEndian"
> > >           calendarPatternKind="implicit"
> > >           documentFinalTerminatorCanBeMissing="yes"
> > >           emptyValueDelimiterPolicy="none"
> > >           encoding="ISO-8859-1"
> > >           encodingErrorPolicy="replace"
> > >           escapeSchemeRef=""
> > >           fillByte="f"
> > >           floating="no"
> > >           ignoreCase="no"
> > >           initiator=""
> > >           initiatedContent="no"
> > >           leadingSkip="0"
> > >           lengthKind="delimited"
> > >           lengthUnits="characters"
> > >           nilKind="literalValue"
> > >           nilValueDelimiterPolicy="none"
> > >           occursCountKind="implicit"
> > >           outputNewLine="%CR;%LF;"
> > >           representation="text"
> > >           separator=""
> > >           separatorPosition="infix"
> > >           separatorSuppressionPolicy="never"
> > >           sequenceKind="ordered"
> > >           terminator=""
> > >           textBidi="no"
> > >           textNumberCheckPolicy="strict"
> > >           textNumberPattern="#,##0.###;-#,##0.###"
> > >           textNumberRep="standard"
> > >           textNumberRounding="explicit"
> > >           textNumberRoundingIncrement="0"
> > >           textNumberRoundingMode="roundUnnecessary"
> > >           textOutputMinLength="0"
> > >           textPadKind="none"
> > >           textStandardBase="10"
> > >           textStandardExponentRep="E"
> > >           textStandardInfinityRep="Inf"
> > >           textStandardNaNRep="NaN"
> > >           textStandardZeroRep="0"
> > >           textStandardDecimalSeparator="."
> > >           textStandardGroupingSeparator=","
> > >           textTrimKind="none"
> > >           trailingSkip="0"
> > >           truncateSpecifiedLengthString="no"
> > >           utf16Width="fixed"/>
> > >       </dfdl:defineFormat>
> > >     </xs:appinfo>
> > >   </xs:annotation>
> > > </schema>
> > >
> > >
> > > > W.r.t. your 0001 issue...
> > > >
> > > > Text number formats, such as leading zeros, are controlled by the dfdl:textNumberPattern property. I think you want different values for this property for your two integer-type elements if they are supposed to have different numbers of digits, as evidenced by their max values of 999 and 99999.
> > > >
> > > > However, your request that 0001 be preserved is not consistent with either 999 or 99999 as max values. So I'm not sure what you are trying to achieve in this format.
> > >
> > > Just trying to teach an old dog some new tricks.
> > >
> > > >
> > > > DFDL does not "remember how the integer was presented". It parses it according to rules, creates an xs:int in the infoset, and at that point the leading zero information is gone. It then unparses according to rules. If you want 0001 to parse and unparse as 0001, you want dfdl:textNumberPattern="#0000". That will always produce at least 4 digits, with a fifth only if needed.
> > > >
> > > > But in this case, if you are first parsing, then unparsing data, then incoming "01" will also unparse as "0001". Using dfdl:textNumberPattern="#0000" means "canonical form for this data is at least 4 digits". If you parse the data using dfdl:lengthKind='delimited', then your schema has expressed "tolerate any number of digits, but always canonicalize to at least 4 digits".
> > >
> > > I'll play with this.
> > >
> > > >
> > > > If you want the text of these numbers preserved, not canonicalized, and your application does both parse and unparse, like data security apps often do, then you need to use strings, not numbers.
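One way to approach the string option, sketched here as an assumption rather than something proposed in this thread: declare the element as xs:string so the digits survive a parse/unparse round trip, and constrain the text with facets. The pattern below is hypothetical; together with maxLength it accepts one to five digits with at least one nonzero digit, which corresponds to the 1..99999 range used for elem3. Facets are only checked when validation is enabled (or through a dfdl:assert that calls dfdl:checkConstraints), so treat this as a starting point, not a definitive answer.

```xml
<!-- Hypothetical variant of elem3: the text is kept exactly, so "0001" stays "0001". -->
<!-- Assumes the xs prefix and the default format used elsewhere in this thread. -->
<xs:element name="elem3">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <!-- At most five characters in total. -->
      <xs:maxLength value="5"/>
      <!-- Digits only, with at least one nonzero digit, so "0" and "0000" are rejected. -->
      <xs:pattern value="0{0,4}[1-9][0-9]{0,4}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

The trade-off is the one described above: the infoset then carries a string rather than an xs:int, and non-significant leading zeros are preserved instead of canonicalized.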
> > >
> > > If I were to use strings how would I then validate that the value was
> > > in some range?
> > >
> > > >
> > > > Note, however, that preserving leading/trailing non-numerically significant zeros is a security hole - they can be used to carry covert channel data.
> > > > Canonicalization of data is fundamentally more secure.
> > > >
> > > > The usual reason people want preservation of data exactly, character for character, is to make test/QA easier. That's ok so long as you get that there is a loss of some data security when non-information-carrying things like leading/trailing zeros are preserved.
> > > >
> > > >
> > > > ________________________________
> > > > From: Theodore Toth <ted.toth....@sage.northcom.mil>
> > > > Sent: Sunday, August 29, 2021 2:45 AM
> > > > To: users@daffodil.apache.org <users@daffodil.apache.org>
> > > > Subject: optional int and unparse formatting
> > > >
> > > > I just started looking at daffodil and have a few questions about my
> > > > first experiment:
> > > > Here's my dfdl:
> > > >
> > > > <?xml version="1.0" encoding="UTF-8"?>
> > > > <xs:schema
> > > >   xmlns:xs="http://www.w3.org/2001/XMLSchema"
> > > >   xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
> > > >
> > > >   <xs:include schemaLocation="default-dfdl-properties/defaults.dfdl.xsd" />
> > > >   <xs:annotation>
> > > >     <xs:appinfo source="http://www.ogf.org/dfdl/">
> > > >       <dfdl:format ref="default-dfdl-properties" />
> > > >     </xs:appinfo>
> > > >   </xs:annotation>
> > > >
> > > >   <xs:element name="FOO"
> > > >     dfdl:initiator="FOO/"
> > > >     dfdl:lengthKind="implicit">
> > > >     <!--
> > > >     dfdl:terminator="//%NL;%WSP*;">
> > > >     -->
> > > >     <xs:complexType>
> > > >       <xs:sequence dfdl:sequenceKind="ordered"
> > > >         dfdl:separator="/"
> > > >         dfdl:separatorPosition="infix">
> > > >
> > > >         <xs:element name="elem1">
> > > >           <xs:simpleType>
> > > >             <xs:restriction base="xs:string">
> > > >               <xs:minLength value="1"/>
> > > >               <xs:maxLength value="14"/>
> > > >             </xs:restriction>
> > > >           </xs:simpleType>
> > > >         </xs:element>
> > > >
> > > >         <xs:element name="elem2">
> > > >           <xs:simpleType>
> > > >             <xs:restriction base="xs:string">
> > > >               <xs:pattern value="CAT|DOG|HORSE"/>
> > > >             </xs:restriction>
> > > >           </xs:simpleType>
> > > >         </xs:element>
> > > >
> > > >         <xs:element name="elem3">
> > > >           <xs:simpleType>
> > > >             <xs:restriction base="xs:int">
> > > >               <xs:minInclusive value="1"/>
> > > >               <xs:maxInclusive value="99999"/>
> > > >             </xs:restriction>
> > > >           </xs:simpleType>
> > > >         </xs:element>
> > > >
> > > >         <xs:element name="elem4" minOccurs="0" maxOccurs="1">
> > > >           <xs:simpleType>
> > > >             <xs:restriction base="xs:string">
> > > >               <xs:minLength value="1"/>
> > > >               <xs:maxLength value="20"/>
> > > >             </xs:restriction>
> > > >           </xs:simpleType>
> > > >         </xs:element>
> > > >
> > > >         <xs:element name="elem5" minOccurs="0" maxOccurs="1">
> > > >           <xs:simpleType>
> > > >             <xs:restriction base="xs:int">
> > > >               <xs:minInclusive value="1"/>
> > > >               <xs:maxInclusive value="999"/>
> > > >             </xs:restriction>
> > > >           </xs:simpleType>
> > > >         </xs:element>
> > > >       </xs:sequence>
> > > >     </xs:complexType>
> > > >   </xs:element>
> > > >
> > > > </xs:schema>
> > > >
> > > > Here's some test data:
> > > > FOO/GONE FISHIN/DOG/0001///
> > > >
> > > > The parse fails with:
> > > > [error] Parse Error: Unable to parse xs:int from empty string
> > > > Schema context: elem5 Location line 59 column 10 in
> > > > file:/home/tedx/dfdl-test/test.dfdl.xsd
> > > > Data location was preceding byte 26
> > > >
> > > > Why does it fail when elem5 has minOccurs="0"? elem5 is optional.
> > > >
> > > > Then if I put a 0 before the last slash it generates:
> > > > <?xml version="1.0" encoding="UTF-8"?>
> > > > <FOO>
> > > >   <elem1>GONE FISHIN</elem1>
> > > >   <elem2>DOG</elem2>
> > > >   <elem3>1</elem3>
> > > >   <elem4></elem4>
> > > >   <elem5>0</elem5>
> > > > </FOO>
> > > >
> > > > and when I unparse it generates:
> > > > FOO/GONE FISHIN/DOG/1//0
> > > >
> > > > but I'd like it to output 0001 for elem3, how do I do that?
> > > >
> > > > Ted
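
For reference, a minimal sketch of the answer given in the replies above: setting dfdl:textNumberPattern="#0000" on elem3 makes the unparser write at least four digits, so an infoset value of 1 is written as 0001. The fragment assumes the xs and dfdl namespace prefixes and the default format from the schemas quoted in this thread; it is an illustration, not a complete schema.

```xml
<!-- Sketch only: elem3 from the original schema with a text number pattern added. -->
<!-- "#0000" means at least four digits on unparse, with a fifth digit when needed. -->
<xs:element name="elem3" dfdl:textNumberPattern="#0000">
  <xs:simpleType>
    <xs:restriction base="xs:int">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="99999"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

Note that this canonicalizes rather than preserves: input such as 01 would also unparse as 0001.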