Good question. Here is what I think is happening. elem5 fails to parse because its content is an empty string, so the parser backtracks. Here's the trick: backtracking also puts back the separator that precedes this array/optional element. Your schema then has nothing left to absorb the final separator.
Your schema expresses an optional element, but what you want is a required separator followed by an optional element. I think wrapping an xs:sequence around elem5 will fix this.

To be sure, I need to see the occursCountKind property, the lengthKind property, etc. Basically, I need to be able to reproduce your run, so I would need your default-dfdl-properties/defaults.dfdl.xsd file.

With respect to your 0001 issue: text number formats, including leading zeros, are controlled by the dfdl:textNumberPattern property. I think you want different values of this property for your two integer-type elements if they are supposed to have different numbers of digits, as suggested by their max values of 999 and 99999. However, your request that 0001 be preserved is consistent with neither 999 nor 99999 as a max value, so I'm not sure what you are trying to achieve in this format.

DFDL does not "remember how the integer was presented". It parses the text according to rules, creates an xs:int in the infoset, and at that point the leading-zero information is gone. It then unparses according to rules.

If you want 0001 to parse and unparse as 0001, you want dfdl:textNumberPattern="#0000". That always produces at least 4 digits, with a fifth if the value needs it. But if you first parse and then unparse data, an incoming "01" will also unparse as "0001". Using dfdl:textNumberPattern="#0000" means "the canonical form for this data is at least 4 digits". If you parse the data using dfdl:lengthKind='delimited', then your schema expresses "tolerate any number of digits, but always canonicalize to at least 4 digits".

If you want the text of these numbers preserved, not canonicalized, and your application does both parse and unparse, as data security apps often do, then you need to use strings, not numbers. Note, however, that preserving leading/trailing zeros that are not numerically significant is a security hole: they can be used to carry covert channel data.
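Pulling the two schema suggestions above together, here is a rough, untested sketch of what the FOO sequence might look like. It assumes the included defaults file leaves dfdl:textNumberRep at "standard", and that an empty separator is the right choice for the nested sequence; both are guesses, since the defaults file is not shown. Whether the nesting actually clears the parse error also depends on properties such as occursCountKind that come from that file.

```xml
<!-- Untested sketch: elem1, elem2 and elem4 stay exactly as in the original
     schema and are only marked by placeholder comments here. -->
<xs:sequence xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
             dfdl:sequenceKind="ordered" dfdl:separator="/"
             dfdl:separatorPosition="infix">

  <!-- elem1 and elem2 unchanged -->

  <!-- "#0000" makes at least 4 digits the canonical form on unparse -->
  <xs:element name="elem3" dfdl:textNumberPattern="#0000">
    <xs:simpleType>
      <xs:restriction base="xs:int">
        <xs:minInclusive value="1"/>
        <xs:maxInclusive value="99999"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

  <!-- elem4 unchanged -->

  <!-- The nested sequence is a required child of the outer sequence, so the
       separator in front of it is expected; elem5 stays optional inside it.
       Assumption: the nested sequence itself should have no separator. -->
  <xs:sequence dfdl:separator="">
    <xs:element name="elem5" minOccurs="0" maxOccurs="1">
      <xs:simpleType>
        <xs:restriction base="xs:int">
          <xs:minInclusive value="1"/>
          <xs:maxInclusive value="999"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>
  </xs:sequence>
</xs:sequence>
```

With this pattern on elem3, incoming "0001" and "01" both land in the infoset as 1 and both unparse as "0001", which is the canonicalizing behavior described above rather than character-for-character preservation.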
Canonicalization of data is fundamentally more secure. The usual reason people want preservation of data exactly, character for character, is to make test/QA easier. That's ok so long as you get that there is a loss of some data security when non-information-carrying things like leading/trailing zeros are preserved.

________________________________
From: Theodore Toth <ted.toth....@sage.northcom.mil>
Sent: Sunday, August 29, 2021 2:45 AM
To: users@daffodil.apache.org <users@daffodil.apache.org>
Subject: optional int and unparse formatting

I just started looking at daffodil and have a few questions about my first experiment:

Here's my dfdl:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
  <xs:include schemaLocation="default-dfdl-properties/defaults.dfdl.xsd" />
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format ref="default-dfdl-properties" />
    </xs:appinfo>
  </xs:annotation>
  <xs:element name="FOO" dfdl:initiator="FOO/" dfdl:lengthKind="implicit">
    <!-- dfdl:terminator="//%NL;%WSP*;"> -->
    <xs:complexType>
      <xs:sequence dfdl:sequenceKind="ordered" dfdl:separator="/" dfdl:separatorPosition="infix">
        <xs:element name="elem1">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:minLength value="1"/>
              <xs:maxLength value="14"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <xs:element name="elem2">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="CAT|DOG|HORSE"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <xs:element name="elem3">
          <xs:simpleType>
            <xs:restriction base="xs:int">
              <xs:minInclusive value="1"/>
              <xs:maxInclusive value="99999"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <xs:element name="elem4" minOccurs="0" maxOccurs="1">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:minLength value="1"/>
              <xs:maxLength value="20"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <xs:element name="elem5" minOccurs="0" maxOccurs="1">
          <xs:simpleType>
            <xs:restriction base="xs:int">
              <xs:minInclusive value="1"/>
              <xs:maxInclusive value="999"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Here's some test data:

FOO/GONE FISHIN/DOG/0001///

The parse fails with:

[error] Parse Error: Unable to parse xs:int from empty string
Schema context: elem5 Location line 59 column 10 in file:/home/tedx/dfdl-test/test.dfdl.xsd
Data location was preceding byte 26

Why does it fail when elem5 has minOccurs="0"? elem5 is optional.

Then if I put a 0 before the last slash it generates:

<?xml version="1.0" encoding="UTF-8"?>
<FOO>
  <elem1>GONE FISHIN</elem1>
  <elem2>DOG</elem2>
  <elem3>1</elem3>
  <elem4></elem4>
  <elem5>0</elem5>
</FOO>

and when I unparse it generates:

FOO/GONE FISHIN/DOG/1//0

but I'd like it to output 0001 for elem3, how do I do that?

Ted