Yes, I'm looking at the DI2E USMTF ATO/ACO schemas, thanks. Unfortunately, OTH-G doesn't define optional-value settings and 'end of set' the way USMTF does, which is what I'm struggling with :(
Ted

On Thu, Oct 7, 2021 at 9:58 PM Mike Beckerle <mbecke...@apache.org> wrote:
>
> Ted,
>
> If you have access to the DI2E.net system, then this USMTF DFDL schema
> (partial. Mostly just ATO) may help you as OTH-G has similarities.
>
> https://bitbucket.di2e.net/projects/DFDL/repos/usmtf/browse
>
> If you don't have that access, then please get in contact privately and we'll
> arrange to get you a copy by other means.
>
> Of possible interest: I am currently adding features to Daffodil that will
> support OTH-G style check-digits i.e., verifying them, computing them on
> unparse.
> This will come out in release 3.2.0 later this year.
>
> -mikeb
>
>
> On Thu, Oct 7, 2021 at 6:35 AM Theodore Toth <ted.toth....@sage.northcom.mil>
> wrote:
>>
>> I'm still struggling with optional subelements at the end of an
>> element this time for a complex type, the approach that worked for a
>> simpleType doesn't work for a complex type. I'm getting "[error]
>> Parse Error: Terminator '%NL;%WSP*;' not found". I'm not sure yet but
>> a newline might not be a valid terminator for a OTH-GOLD message line
>> :(
>> Also how would you specify an optional literal like '//' at the end of
>> an element when there can be other option subelements separated by '/'
>> prior to it?
>>
>> On Sat, Sep 18, 2021 at 4:12 AM Beckerle, Mike
>> <mbecke...@owlcyberdefense.com> wrote:
>> >
>> > Sorry for the late response on this. Turns out outlook 365 was spam
>> > filtering some apache emails. It's a known issue with microsoft's spam
>> > filters.
>> >
>> > The sequence wrapped around elem5 doesn't need a dfdl:separator because
>> > the elem5 has maxOccurs 1, so there will never be enough things to
>> > separate.
>> >
>> > Otherwise yeah, this looks like what I was suggesting.
>> >
>> > I agree that the DFDL spec is quite painful in numerous areas.
>> > Unfortunately I have to take the blame for some of that. Someday I hope
>> > some sections will get refactored and rewritten.
>> >
>> >
>> > ________________________________
>> > From: Theodore Toth <ted.toth....@sage.northcom.mil>
>> > Sent: Tuesday, August 31, 2021 12:21 AM
>> > To: users@daffodil.apache.org <users@daffodil.apache.org>
>> > Subject: Re: optional int and unparse formatting
>> >
>> > The following worked for me although I don't know if it's the 'right'
>> > way to do it. Reading the spec can give you a headache.
>> >
>> > <?xml version="1.0" encoding="UTF-8"?>
>> > <xs:schema
>> >   xmlns:xs="http://www.w3.org/2001/XMLSchema"
>> >   xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
>> >
>> >   <xs:include schemaLocation="default-dfdl-properties/defaults.dfdl.xsd" />
>> >   <xs:annotation>
>> >     <xs:appinfo source="http://www.ogf.org/dfdl/">
>> >       <dfdl:format ref="default-dfdl-properties" />
>> >     </xs:appinfo>
>> >   </xs:annotation>
>> >
>> >   <xs:element name="FOO"
>> >     dfdl:initiator="FOO/"
>> >     dfdl:lengthKind="implicit"
>> >     dfdl:terminator="%NL;%WSP*;">
>> >
>> >     <xs:complexType>
>> >       <xs:sequence dfdl:sequenceKind="ordered"
>> >         dfdl:separator="/"
>> >         dfdl:separatorPosition="infix">
>> >
>> >         <xs:element name="elem1">
>> >           <xs:simpleType>
>> >             <xs:restriction base="xs:string">
>> >               <xs:minLength value="1"/>
>> >               <xs:maxLength value="14"/>
>> >               <xs:pattern value="[A-Z0-9,:%#*\- ]+"/>
>> >             </xs:restriction>
>> >           </xs:simpleType>
>> >         </xs:element>
>> >
>> >         <xs:element name="elem2">
>> >           <xs:simpleType>
>> >             <xs:restriction base="xs:string">
>> >               <xs:pattern value="CAT|DOG|HORSE"/>
>> >             </xs:restriction>
>> >           </xs:simpleType>
>> >         </xs:element>
>> >
>> >         <xs:element name="elem3" dfdl:textNumberPattern="#0000">
>> >           <xs:simpleType>
>> >             <xs:restriction base="xs:int">
>> >               <xs:minInclusive value="1"/>
>> >               <xs:maxInclusive value="99999"/>
>> >             </xs:restriction>
>> >           </xs:simpleType>
>> >         </xs:element>
>> >
>> >         <xs:element name="elem4" minOccurs="0" maxOccurs="1">
>> >           <xs:simpleType>
>> >             <xs:restriction base="xs:string">
>> >               <xs:minLength value="1"/>
>> >               <xs:maxLength value="20"/>
>> >             </xs:restriction>
>> >           </xs:simpleType>
>> >         </xs:element>
>> >
>> >         <xs:sequence dfdl:separator="/" dfdl:terminator="/"
>> >           dfdl:separatorSuppressionPolicy="anyEmpty">
>> >           <xs:element name="elem5" minOccurs="0" maxOccurs="1"
>> >             dfdl:textNumberPattern="000">
>> >             <xs:simpleType>
>> >               <xs:restriction base="xs:int">
>> >                 <xs:minInclusive value="1"/>
>> >                 <xs:maxInclusive value="999"/>
>> >               </xs:restriction>
>> >             </xs:simpleType>
>> >           </xs:element>
>> >         </xs:sequence>
>> >
>> >       </xs:sequence>
>> >     </xs:complexType>
>> >   </xs:element>
>> >
>> > </xs:schema>
>> >
>> > On Tue, Aug 31, 2021 at 9:31 AM Theodore Toth
>> > <ted.toth....@sage.northcom.mil> wrote:
>> > >
>> > > Thanks for the response.
>> > >
>> > > On Tue, Aug 31, 2021 at 12:49 AM Beckerle, Mike
>> > > <mbecke...@owlcyberdefense.com> wrote:
>> > > >
>> > > > Good question.
>> > > >
>> > > > I think what is happening is this. elem5 fails to parse because it is
>> > > > an empty string, but then the parse backtracks, and here's the trick:
>> > > > that means it is putting back the separator before this array/optional
>> > > > element. Then your schema has nothing to absorb the final separator.
>> > > >
>> > > > Your schema has expressed an optional element, but what you want is a
>> > > > required separator, then an optional element after it.
>> > > >
>> > > > I think wrapping an xs:sequence around elem5 will fix this.
>> > >
>> > > So the required separator goes on the sequence?
>> > >
>> > > >
>> > > > To be sure, I need to see the occursCountKind property, lengthKind
>> > > > property, etc. Basically I need to be able to reproduce your run.
>> > > > I would need your default-dfdl-properties/defaults.dfdl.xsd file.
>> > > >
>> > > Here's my defaults that I pulled from the DFDL-part1 presentation:
>> > >
>> > > <?xml version="1.0" encoding="UTF-8"?>
>> > >
>> > > <schema xmlns="http://www.w3.org/2001/XMLSchema"
>> > >   xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
>> > >   xmlns:xs="http://www.w3.org/2001/XMLSchema">
>> > >
>> > >   <xs:annotation>
>> > >     <xs:appinfo source="http://www.ogf.org/dfdl/">
>> > >       <dfdl:defineFormat name="default-dfdl-properties">
>> > >         <dfdl:format
>> > >           alignment="1"
>> > >           alignmentUnits="bytes"
>> > >           binaryFloatRep="ieee"
>> > >           binaryNumberRep="binary"
>> > >           bitOrder="mostSignificantBitFirst"
>> > >           byteOrder="bigEndian"
>> > >           calendarPatternKind="implicit"
>> > >           documentFinalTerminatorCanBeMissing="yes"
>> > >           emptyValueDelimiterPolicy="none"
>> > >           encoding="ISO-8859-1"
>> > >           encodingErrorPolicy="replace"
>> > >           escapeSchemeRef=""
>> > >           fillByte="f"
>> > >           floating="no"
>> > >           ignoreCase="no"
>> > >           initiator=""
>> > >           initiatedContent="no"
>> > >           leadingSkip="0"
>> > >           lengthKind="delimited"
>> > >           lengthUnits="characters"
>> > >           nilKind="literalValue"
>> > >           nilValueDelimiterPolicy="none"
>> > >           occursCountKind="implicit"
>> > >           outputNewLine="%CR;%LF;"
>> > >           representation="text"
>> > >           separator=""
>> > >           separatorPosition="infix"
>> > >           separatorSuppressionPolicy="never"
>> > >           sequenceKind="ordered"
>> > >           terminator=""
>> > >           textBidi="no"
>> > >           textNumberCheckPolicy="strict"
>> > >           textNumberPattern="#,##0.###;-#,##0.###"
>> > >           textNumberRep="standard"
>> > >           textNumberRounding="explicit"
>> > >           textNumberRoundingIncrement="0"
>> > >           textNumberRoundingMode="roundUnnecessary"
>> > >           textOutputMinLength="0"
>> > >           textPadKind="none"
>> > >           textStandardBase="10"
>> > >           textStandardExponentRep="E"
>> > >           textStandardInfinityRep="Inf"
>> > >           textStandardNaNRep="NaN"
>> > >           textStandardZeroRep="0"
>> > >           textStandardDecimalSeparator="."
>> > >           textStandardGroupingSeparator=","
>> > >           textTrimKind="none"
>> > >           trailingSkip="0"
>> > >           truncateSpecifiedLengthString="no"
>> > >           utf16Width="fixed"/>
>> > >       </dfdl:defineFormat>
>> > >     </xs:appinfo>
>> > >   </xs:annotation>
>> > > </schema>
>> > >
>> > >
>> > > > w.r.t your 0001 issue....
>> > > >
>> > > > The ability to control text number formats like leading zeros, is by
>> > > > way of the dfdl:textNumberPattern property. I think you want different
>> > > > values for this property for your two integer-type elements if they
>> > > > are supposed to have different numbers of digits, as evidenced by
>> > > > their max values of 999 and 99999.
>> > > >
>> > > > However, your request that 0001 be preserved is not consistent with
>> > > > either 999 nor 99999 as max values. So I'm not sure what you are
>> > > > trying to achieve in this format.
>> > >
>> > > Just trying to teach an old dog some new tricks.
>> > >
>> > > >
>> > > > DFDL does not "remember how the integer was presented". It parses it
>> > > > according to rules, creates an xs:int in the infoset, and at that
>> > > > point the leading zero information is gone. It then unparses according
>> > > > to rules. If you want 0001 to parse and unparse as 0001, you want
>> > > > dfdl:textNumberPattern="#0000". That will give you 4 digits,
>> > > > optionally a fifth if needed, but will always produce 4.
>> > > >
>> > > > But in this case, if you are first parsing, then unparsing data, then
>> > > > incoming "01" will also unparse as "0001". Using
>> > > > dfdl:textNumberPattern="#0000" means "canonical form for this data is
>> > > > at least 4 digits". If you parse the data using
>> > > > dfdl:lengthKind='delimited', then your schema has expressed "tolerate
>> > > > any number of digits, but always canonicalize to at least 4 digits".
>> > >
>> > > I'll play with this.
>> > >
>> > > >
>> > > > If you want the text of these numbers preserved, not canonicalized,
>> > > > and your application does both parse and unparse, like data security
>> > > > apps often do, then you need to use strings, not numbers.
>> > >
>> > > If I were to use strings how would I then validate that the value was
>> > > in some range?
>> > >
>> > > >
>> > > > Note, however, that preserving leading/trailing non-numerically
>> > > > significant zeros is a security hole - they can be used to carry
>> > > > covert channel data.
>> > > > Canonicalization of data is fundamentally more secure.
>> > > >
>> > > > The usual reason people want preservation of data exactly, character
>> > > > for character, is to make test/QA easier. That's ok so long as you get
>> > > > that there is a loss of some data security when
>> > > > non-information-carrying things like leading/trailing zeros are
>> > > > preserved.
>> > > >
>> > > >
>> > > > ________________________________
>> > > > From: Theodore Toth <ted.toth....@sage.northcom.mil>
>> > > > Sent: Sunday, August 29, 2021 2:45 AM
>> > > > To: users@daffodil.apache.org <users@daffodil.apache.org>
>> > > > Subject: optional int and unparse formatting
>> > > >
>> > > > I just started looking at daffodil and have a few questions about my
>> > > > first experiment:
>> > > > Here's my dfdl:
>> > > >
>> > > > <?xml version="1.0" encoding="UTF-8"?>
>> > > > <xs:schema
>> > > >   xmlns:xs="http://www.w3.org/2001/XMLSchema"
>> > > >   xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
>> > > >
>> > > >   <xs:include
>> > > >     schemaLocation="default-dfdl-properties/defaults.dfdl.xsd" />
>> > > >   <xs:annotation>
>> > > >     <xs:appinfo source="http://www.ogf.org/dfdl/">
>> > > >       <dfdl:format ref="default-dfdl-properties" />
>> > > >     </xs:appinfo>
>> > > >   </xs:annotation>
>> > > >
>> > > >   <xs:element name="FOO"
>> > > >     dfdl:initiator="FOO/"
>> > > >     dfdl:lengthKind="implicit">
>> > > >     <!--
>> > > >     dfdl:terminator="//%NL;%WSP*;">
>> > > >     -->
>> > > >     <xs:complexType>
>> > > >       <xs:sequence dfdl:sequenceKind="ordered"
>> > > >         dfdl:separator="/"
>> > > >         dfdl:separatorPosition="infix">
>> > > >
>> > > >         <xs:element name="elem1">
>> > > >           <xs:simpleType>
>> > > >             <xs:restriction base="xs:string">
>> > > >               <xs:minLength value="1"/>
>> > > >               <xs:maxLength value="14"/>
>> > > >             </xs:restriction>
>> > > >           </xs:simpleType>
>> > > >         </xs:element>
>> > > >
>> > > >         <xs:element name="elem2">
>> > > >           <xs:simpleType>
>> > > >             <xs:restriction base="xs:string">
>> > > >               <xs:pattern value="CAT|DOG|HORSE"/>
>> > > >             </xs:restriction>
>> > > >           </xs:simpleType>
>> > > >         </xs:element>
>> > > >
>> > > >         <xs:element name="elem3">
>> > > >           <xs:simpleType>
>> > > >             <xs:restriction base="xs:int">
>> > > >               <xs:minInclusive value="1"/>
>> > > >               <xs:maxInclusive value="99999"/>
>> > > >             </xs:restriction>
>> > > >           </xs:simpleType>
>> > > >         </xs:element>
>> > > >
>> > > >         <xs:element name="elem4" minOccurs="0" maxOccurs="1">
>> > > >           <xs:simpleType>
>> > > >             <xs:restriction base="xs:string">
>> > > >               <xs:minLength value="1"/>
>> > > >               <xs:maxLength value="20"/>
>> > > >             </xs:restriction>
>> > > >           </xs:simpleType>
>> > > >         </xs:element>
>> > > >
>> > > >         <xs:element name="elem5" minOccurs="0" maxOccurs="1">
>> > > >           <xs:simpleType>
>> > > >             <xs:restriction base="xs:int">
>> > > >               <xs:minInclusive value="1"/>
>> > > >               <xs:maxInclusive value="999"/>
>> > > >             </xs:restriction>
>> > > >           </xs:simpleType>
>> > > >         </xs:element>
>> > > >       </xs:sequence>
>> > > >     </xs:complexType>
>> > > >   </xs:element>
>> > > >
>> > > > </xs:schema>
>> > > >
>> > > > Here's some test data:
>> > > > FOO/GONE FISHIN/DOG/0001///
>> > > >
>> > > > The parse fails with:
>> > > > [error] Parse Error: Unable to parse xs:int from empty string
>> > > > Schema context: elem5 Location line 59 column 10 in
>> > > > file:/home/tedx/dfdl-test/test.dfdl.xsd
>> > > > Data location was preceding byte 26
>> > > >
>> > > > Why does it fail when elem5 has minOccurs="0"?
>> > > > elem5 is optional.
>> > > >
>> > > > Then if I put a 0 before the last slash it generates:
>> > > > <?xml version="1.0" encoding="UTF-8"?>
>> > > > <FOO>
>> > > >   <elem1>GONE FISHIN</elem1>
>> > > >   <elem2>DOG</elem2>
>> > > >   <elem3>1</elem3>
>> > > >   <elem4></elem4>
>> > > >   <elem5>0</elem5>
>> > > > </FOO>
>> > > >
>> > > > and when I unparse it generates:
>> > > > FOO/GONE FISHIN/DOG/1//0
>> > > >
>> > > > but I'd like it to output 0001 for elem3, how do I do that?
>> > > >
>> > > > Ted
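
A follow-up note on the question in the thread that went unanswered (how to range-check a value that is kept as a string): one possible approach, offered as a sketch rather than a tested answer, is to declare elem3 as xs:string and express the 1..99999 range with XSD facets alone. The 4-to-5 character length below is an assumption chosen to mirror the "#0000" pattern discussed in the thread, and the facets only take effect when validation is enabled; they are not expected to cause a parse error on their own.

```xml
<!-- Sketch only: elem3 kept as text so that "0001" round-trips unchanged.
     The length limits are an assumption based on the "#0000" pattern. -->
<xs:element name="elem3">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <!-- at least 4 characters, at most 5 -->
      <xs:minLength value="4"/>
      <xs:maxLength value="5"/>
      <!-- digits only, with at least one non-zero digit, so the
           numeric value falls in 1..99999 (0000 and 00000 are rejected) -->
      <xs:pattern value="[0-9]*[1-9][0-9]*"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

The trade-off Mike describes still applies: leading zeros preserved this way carry no information and can act as a covert channel, so the xs:int version with dfdl:textNumberPattern remains the canonicalizing option.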