Hi Folks,

Please let me know of anything that is unclear.

/Roger

--------------------------------------------------------------------------------------

4. Fixed length, nillable, composite, choice

A composite field is one that is composed of parts. There is no separator between the parts. The parts may be fixed length or variable length. The parts are non-nillable, although the composite field itself may be nillable.

This section deals with a nillable field whose value is a choice between two composite fields, where the composite fields contain parts that are fixed length.

We will create a DFDL schema for a field containing the date that a book was published. I named the field "PublicationDate." There are two ways to express the publication date:

1. 4-digit year followed by a 3-letter month
2. 3-letter month followed by a 4-digit year

Here is a sample value for the first way:

    2022SEP

Here is a sample value for the second way:

    SEP2022

In both cases, the field is composite with two parts. The field has a length of 7. If no data is available, then the field will contain a hyphen.

Field Requirements:

>> Fixed length (7)
>> Nillable, hyphen is the nil value, the hyphen may be positioned anywhere within the 7-character field
>> Choice of values
>> Each choice is composite, each choice has 2 parts

PublicationDate has a complexType and its value may be nil. Recall from section 2 that a complexType element with a nillable value is a problem. The workaround is to put a wrapper element around PublicationDate.

The wrapper element (PublicationDateWrapper) has a choice of two branches: a simpleType element (PublicationDate_), used when the input contains the nil value, and the PublicationDate element:

    <xs:element name="PublicationDateWrapper">
      <xs:complexType>
        <xs:choice>
          <xs:element name="PublicationDate_" type="xs:string" nillable="true" />
          <xs:element name="PublicationDate">
            <!-- choice of Year, Month or Month, Year -->
          </xs:element>
        </xs:choice>
      </xs:complexType>
    </xs:element>

Here is an XML Schema declaration of PublicationDate, sans any DFDL properties (note the field name, PublicationDate, its two choices, and the part names within each choice):

    <xs:element name="PublicationDate">
      <xs:complexType>
        <xs:choice>
          <xs:element name="YearMonth"> <!-- branch #1 -->
            <xs:complexType>
              <xs:sequence>
                <xs:element name="Year">
                  <xs:simpleType>
                    <xs:restriction base="xs:string">
                      <xs:pattern value="[0-9]{4}"/>
                    </xs:restriction>
                  </xs:simpleType>
                </xs:element>
                <xs:element name="Month">
                  <xs:simpleType>
                    <xs:restriction base="xs:string">
                      <xs:enumeration value="JAN"/>
                      <xs:enumeration value="FEB"/>
                      <xs:enumeration value="MAR"/>
                      <xs:enumeration value="APR"/>
                      <xs:enumeration value="MAY"/>
                      <xs:enumeration value="JUN"/>
                      <xs:enumeration value="JUL"/>
                      <xs:enumeration value="AUG"/>
                      <xs:enumeration value="SEP"/>
                      <xs:enumeration value="OCT"/>
                      <xs:enumeration value="NOV"/>
                      <xs:enumeration value="DEC"/>
                    </xs:restriction>
                  </xs:simpleType>
                </xs:element>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
          <xs:element name="MonthYear"> <!-- branch #2 -->
            <xs:complexType>
              <xs:sequence>
                <xs:element name="Month">
                  <xs:simpleType>
                    <xs:restriction base="xs:string">
                      <xs:enumeration value="JAN"/>
                      <xs:enumeration value="FEB"/>
                      <xs:enumeration value="MAR"/>
                      <xs:enumeration value="APR"/>
                      <xs:enumeration value="MAY"/>
                      <xs:enumeration value="JUN"/>
                      <xs:enumeration value="JUL"/>
                      <xs:enumeration value="AUG"/>
                      <xs:enumeration value="SEP"/>
                      <xs:enumeration value="OCT"/>
                      <xs:enumeration value="NOV"/>
                      <xs:enumeration value="DEC"/>
                    </xs:restriction>
                  </xs:simpleType>
                </xs:element>
                <xs:element name="Year">
                  <xs:simpleType>
                    <xs:restriction base="xs:string">
                      <xs:pattern value="[0-9]{4}"/>
                    </xs:restriction>
                  </xs:simpleType>
                </xs:element>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:choice>
      </xs:complexType>
    </xs:element>

Each branch has two parts, and they are fixed length. Add to them these two DFDL properties:

    dfdl:lengthKind="explicit"
    dfdl:length="__"

Consider how this input:

    SEP2022

is parsed. The first part (SEP) is the month. The parse starts down the first branch and immediately fails, since SEP does not satisfy the facets of the first element (Year). An error is thrown and parsing halts. The parser does not back up and try the other branch.

The solution is to add checkConstraints() in the declaration of Year:

    <xs:element name="Year">
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
          <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
        </xs:appinfo>
      </xs:annotation>
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:pattern value="[0-9]{4}"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>

The checkConstraints() tells the parser to validate the data against the facets and, if validation fails, to back up and try the other choice branch.

Here's the DFDL schema with the DFDL properties added (dfdl:lengthKind and dfdl:length on the parts, plus the dfdl:assert on Year):

    <xs:element name="PublicationDate">
      <xs:complexType>
        <xs:choice>
          <xs:element name="YearMonth"> <!-- branch #1 -->
            <xs:complexType>
              <xs:sequence>
                <xs:element name="Year" dfdl:lengthKind="explicit" dfdl:length="4">
                  <xs:annotation>
                    <xs:appinfo source="http://www.ogf.org/dfdl/">
                      <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
                    </xs:appinfo>
                  </xs:annotation>
                  <xs:simpleType>
                    <xs:restriction base="xs:string">
                      <xs:pattern value="[0-9]{4}"/>
                    </xs:restriction>
                  </xs:simpleType>
                </xs:element>
                <xs:element name="Month" dfdl:lengthKind="explicit" dfdl:length="3">
                  <xs:simpleType>
                    <xs:restriction base="xs:string">
                      <xs:enumeration value="JAN"/>
                      <xs:enumeration value="FEB"/>
                      ...
                    </xs:restriction>
                  </xs:simpleType>
                </xs:element>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
          <xs:element name="MonthYear"> <!-- branch #2 -->
            <xs:complexType>
              <xs:sequence>
                <xs:element name="Month" dfdl:lengthKind="explicit" dfdl:length="3">
                  <xs:simpleType>
                    <xs:restriction base="xs:string">
                      <xs:enumeration value="JAN"/>
                      <xs:enumeration value="FEB"/>
                      ...
                    </xs:restriction>
                  </xs:simpleType>
                </xs:element>
                <xs:element name="Year" dfdl:lengthKind="explicit" dfdl:length="4">
                  <xs:simpleType>
                    <xs:restriction base="xs:string">
                      <xs:pattern value="[0-9]{4}"/>
                    </xs:restriction>
                  </xs:simpleType>
                </xs:element>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:choice>
      </xs:complexType>
    </xs:element>

Notice that the last part of the second branch (Year) has no DFDL added. This is because I am assuming that it is followed by the delimiter for the PublicationDate field.

The wrapper element and its child nil element are exactly analogous to those shown in section 2.

    <xs:element name="PublicationDateWrapper">
      <xs:complexType>
        <xs:choice dfdl:choiceLengthKind="implicit">
          <xs:element name="PublicationDate_" type="xs:string" nillable="true"
                      dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;-%WSP*;">
            <xs:annotation>
              <xs:appinfo source="http://www.ogf.org/dfdl/">
                <dfdl:assert>{ fn:nilled(.) }</dfdl:assert>
              </xs:appinfo>
            </xs:annotation>
          </xs:element>
          <!-- see PublicationDate above -->
        </xs:choice>
      </xs:complexType>
    </xs:element>

One last (important) point: when parsing input with Daffodil, use the -V limited option. The option instructs Daffodil to validate each part of the composite fields against the XSD facets.

With this erroneous input value:

    2022xxx

Daffodil gives this very helpful error message on parsing:

    [error] Validation Error: Month failed facet checks due to: facet enumeration(s): JAN|FEB|...

If you don't use the -V limited option, then Daffodil won't validate the parts against the XSD facets. Consequently, Daffodil will not report any errors with the above erroneous input. Why? Because if we ignore the facets in this element declaration:

    <xs:element name="Month" dfdl:lengthKind="explicit" dfdl:length="3">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="JAN"/>
          <xs:enumeration value="FEB"/>
          ...
        </xs:restriction>
      </xs:simpleType>
    </xs:element>

then it is simply saying that the input is any text of length 3, and "xxx" certainly fits that specification.
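
To make the choice branches concrete, here is a sketch of the XML infosets one would expect from the schema in this section for the two sample inputs and for a nil input. It is an illustration under assumptions, not captured Daffodil output: it presumes that PublicationDateWrapper is the root element, that no target namespace is in play, and that PublicationDate_ is given a fixed length of 7 as in section 2 (those properties are not shown above). The three fragments are separate documents, shown together only for comparison.

```xml
<!-- Input: 2022SEP (branch #1, YearMonth) -->
<PublicationDateWrapper>
  <PublicationDate>
    <YearMonth>
      <Year>2022</Year>
      <Month>SEP</Month>
    </YearMonth>
  </PublicationDate>
</PublicationDateWrapper>

<!-- Input: SEP2022 (the assert on Year fails in branch #1, so branch #2, MonthYear, is taken) -->
<PublicationDateWrapper>
  <PublicationDate>
    <MonthYear>
      <Month>SEP</Month>
      <Year>2022</Year>
    </MonthYear>
  </PublicationDate>
</PublicationDateWrapper>

<!-- Input: a hyphen padded with whitespace to 7 characters (nil branch) -->
<PublicationDateWrapper>
  <PublicationDate_ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
</PublicationDateWrapper>
```

The element names YearMonth and MonthYear in the output record which textual form the input used, which is what lets the same value be written back out in its original form.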