Hi Folks,

Please let me know of anything that is unclear.  /Roger
--------------------------------------------------------------------------------------
4. Fixed length, nillable, composite, choice

A composite field is one that is composed of parts. There is no separator 
between the parts. The parts may be fixed length or variable length. The parts 
are non-nillable, although the composite field itself may be nillable.

This section deals with a nillable field whose value is a choice between two 
composite fields, each of which contains fixed-length parts.

We will create a DFDL schema for a field containing the date that a book was 
published. I named the field "PublicationDate". There are two ways to express 
the publication date:

  1.  4-digit year followed by a 3-letter month
  2.  3-letter month followed by a 4-digit year
Here is a sample value for the first way:
2022SEP
Here is a sample value for the second way:
SEP2022
In both cases, the field is composite with two parts. The field has a length of 
7.
If no data is available, then the field will contain a hyphen.
Field Requirements:
>>  Fixed length (7)
>>  Nillable: hyphen is the nil value, and it may be positioned anywhere within the 7-character field
>>  Choice of values
>>  Each choice is composite, each choice has 2 parts
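
The requirements above can be modelled in a few lines of Python. This is only an illustrative sketch and not part of the DFDL schema: the function name classify_publication_date is hypothetical, and the sketch assumes that a nil field is a single hyphen padded with spaces to 7 characters.

```python
import re

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")
MONTH_RE = "|".join(MONTHS)
FIELD_LENGTH = 7  # fixed length of the PublicationDate field


def classify_publication_date(field):
    """Return (kind, parts) for a 7-character PublicationDate field."""
    if len(field) != FIELD_LENGTH:
        raise ValueError("field must be exactly 7 characters")
    # Nil: one hyphen anywhere in the field (assumption: padded with spaces).
    if re.fullmatch(r" *- *", field):
        return ("nil", {})
    # Choice 1: 4-digit year followed by a 3-letter month.
    m = re.fullmatch(rf"(?P<Year>[0-9]{{4}})(?P<Month>{MONTH_RE})", field)
    if m:
        return ("YearMonth", m.groupdict())
    # Choice 2: 3-letter month followed by a 4-digit year.
    m = re.fullmatch(rf"(?P<Month>{MONTH_RE})(?P<Year>[0-9]{{4}})", field)
    if m:
        return ("MonthYear", m.groupdict())
    raise ValueError(f"not a valid PublicationDate: {field!r}")
```

For example, classify_publication_date("2022SEP") selects the YearMonth choice, classify_publication_date("SEP2022") selects MonthYear, and a field such as "   -   " is reported as nil.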

PublicationDate has a complexType and its value may be nil. Recall from section 
2 that a complexType element with a nillable value is a problem. The workaround 
is to put a wrapper element around PublicationDate. The wrapper element 
(PublicationDateWrapper) contains a choice with two branches: a simpleType 
element (PublicationDate_), used when the input contains the nil value, and the 
PublicationDate element itself:
<xs:element name="PublicationDateWrapper">
    <xs:complexType>
        <xs:choice>
            <xs:element name="PublicationDate_" type="xs:string"
                        nillable="true" />
            <xs:element name="PublicationDate">
                <!-- choice of Year, Month or Month, Year -->
            </xs:element>
        </xs:choice>
    </xs:complexType>
</xs:element>
Here is an XML Schema declaration of PublicationDate, sans any DFDL properties. 
It shows the field name (PublicationDate), its two choices, and the part names 
for each choice:
<xs:element name="PublicationDate">
    <xs:complexType>
        <xs:choice>
            <xs:element name="YearMonth">   <!-- branch #1 -->
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="Year">
                            <xs:simpleType>
                                <xs:restriction base="xs:string">
                                    <xs:pattern value="[0-9]{4}"/>
                                </xs:restriction>
                            </xs:simpleType>
                        </xs:element>
                        <xs:element name="Month">
                            <xs:simpleType>
                                <xs:restriction base="xs:string">
                                    <xs:enumeration value="JAN"/>
                                    <xs:enumeration value="FEB"/>
                                    <xs:enumeration value="MAR"/>
                                    <xs:enumeration value="APR"/>
                                    <xs:enumeration value="MAY"/>
                                    <xs:enumeration value="JUN"/>
                                    <xs:enumeration value="JUL"/>
                                    <xs:enumeration value="AUG"/>
                                    <xs:enumeration value="SEP"/>
                                    <xs:enumeration value="OCT"/>
                                    <xs:enumeration value="NOV"/>
                                    <xs:enumeration value="DEC"/>
                                </xs:restriction>
                            </xs:simpleType>
                        </xs:element>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
            <xs:element name="MonthYear">   <!-- branch #2 -->
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="Month">
                            <xs:simpleType>
                                <xs:restriction base="xs:string">
                                    <xs:enumeration value="JAN"/>
                                    <xs:enumeration value="FEB"/>
                                    <xs:enumeration value="MAR"/>
                                    <xs:enumeration value="APR"/>
                                    <xs:enumeration value="MAY"/>
                                    <xs:enumeration value="JUN"/>
                                    <xs:enumeration value="JUL"/>
                                    <xs:enumeration value="AUG"/>
                                    <xs:enumeration value="SEP"/>
                                    <xs:enumeration value="OCT"/>
                                    <xs:enumeration value="NOV"/>
                                    <xs:enumeration value="DEC"/>
                                </xs:restriction>
                            </xs:simpleType>
                        </xs:element>
                        <xs:element name="Year">
                            <xs:simpleType>
                                <xs:restriction base="xs:string">
                                    <xs:pattern value="[0-9]{4}"/>
                                </xs:restriction>
                            </xs:simpleType>
                        </xs:element>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
        </xs:choice>
    </xs:complexType>
</xs:element>
Each branch has two parts, and they are fixed length. Add these two DFDL 
properties to each part, with a length of 4 for Year and 3 for Month:
dfdl:lengthKind="explicit"
dfdl:length="__"
Consider how this input:
SEP2022
is parsed. The first part (SEP) is the month. The parse starts down the first 
branch, where the first four characters (SEP2) do not satisfy the facets of the 
first element (Year). An error is reported, but the parser does not back up and 
try the other branch, because a facet violation by itself is a validation 
error, not a parse error.
The solution is to add a dfdl:assert that calls dfdl:checkConstraints() to the 
declaration of Year in the first branch:
<xs:element name="Year">
    <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
        </xs:appinfo>
    </xs:annotation>
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{4}"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>
The assert turns a facet violation into a parse error: checkConstraints() 
validates the data against the facets, and if validation fails the parser backs 
up and tries the other choice branch.
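
The effect of the assert on the choice can be modelled in Python. This is a deliberately simplified sketch, not how Daffodil is implemented: the names ParseError, fixed, year_month, month_year and parse_choice are all hypothetical, and the model assumes that without the assert the facets are simply not consulted while parsing.

```python
import re


class ParseError(Exception):
    """Models a parse error, which makes a choice try its next branch."""


def fixed(data, pos, length, pattern=None):
    """Take `length` characters at `pos`; optionally assert a pattern."""
    text = data[pos:pos + length]
    if len(text) < length:
        raise ParseError("not enough data")
    if pattern is not None and not re.fullmatch(pattern, text):
        raise ParseError(f"assertion failed on {text!r}")
    return text, pos + length


def year_month(data, assert_year):
    # Branch #1: Year (length 4), then Month (length 3).
    year, pos = fixed(data, 0, 4, r"[0-9]{4}" if assert_year else None)
    month, pos = fixed(data, pos, 3)
    return {"YearMonth": {"Year": year, "Month": month}}


def month_year(data):
    # Branch #2: Month (length 3), then Year (length 4).
    month, pos = fixed(data, 0, 3)
    year, pos = fixed(data, pos, 4)
    return {"MonthYear": {"Month": month, "Year": year}}


def parse_choice(data, assert_year=True):
    """Try each branch in order; a ParseError moves on to the next one."""
    branches = (lambda d: year_month(d, assert_year), month_year)
    for branch in branches:
        try:
            return branch(data)
        except ParseError:
            continue  # back up and try the next branch
    raise ParseError("no choice branch matched")
```

With the assertion enabled, parse_choice("SEP2022") abandons the first branch and returns the MonthYear result. With assert_year=False the model never leaves the first branch, so the same input is split as Year "SEP2" and Month "022".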
Here's the DFDL schema with the DFDL properties added:
<xs:element name="PublicationDate">
    <xs:complexType>
        <xs:choice>
            <xs:element name="YearMonth">   <!-- branch #1 -->
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="Year"
                                    dfdl:lengthKind="explicit"
                                    dfdl:length="4">
                            <xs:annotation>
                                <xs:appinfo source="http://www.ogf.org/dfdl/">
                                    <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
                                </xs:appinfo>
                            </xs:annotation>
                            <xs:simpleType>
                                <xs:restriction base="xs:string">
                                    <xs:pattern value="[0-9]{4}"/>
                                </xs:restriction>
                            </xs:simpleType>
                        </xs:element>
                        <xs:element name="Month"
                                               dfdl:lengthKind="explicit"
                                               dfdl:length="3">
                            <xs:simpleType>
                                <xs:restriction base="xs:string">
                                    <xs:enumeration value="JAN"/>
                                    <xs:enumeration value="FEB"/>
                                    ...
                                </xs:restriction>
                            </xs:simpleType>
                        </xs:element>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
            <xs:element name="MonthYear">   <!-- branch #2 -->
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="Month"
                                              dfdl:lengthKind="explicit"
                                               dfdl:length="3">
                            <xs:simpleType>
                                <xs:restriction base="xs:string">
                                    <xs:enumeration value="JAN"/>
                                    <xs:enumeration value="FEB"/>
                                    ...
                                </xs:restriction>
                            </xs:simpleType>
                        </xs:element>
                        <xs:element name="Year"
                                               dfdl:lengthKind="explicit"
                                               dfdl:length="4">
                            <xs:simpleType>
                                <xs:restriction base="xs:string">
                                    <xs:pattern value="[0-9]{4}"/>
                                </xs:restriction>
                            </xs:simpleType>
                        </xs:element>
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
        </xs:choice>
    </xs:complexType>
</xs:element>
Notice that only the first part of the first branch (Year) carries a 
dfdl:assert. Every part, including the last part of the second branch (Year), 
has an explicit length because the PublicationDate field is fixed length.
The wrapper element and its child nil element are exactly analogous to those 
shown in section 2:
<xs:element name="PublicationDateWrapper">
    <xs:complexType>
        <xs:choice dfdl:choiceLengthKind="implicit">
            <xs:element name="PublicationDate_" type="xs:string" nillable="true"
                                   dfdl:nilKind="literalValue"
                                   dfdl:nilValue="%WSP*;-%WSP*;">
                <xs:annotation>
                    <xs:appinfo source="http://www.ogf.org/dfdl/">
                        <dfdl:assert>{ fn:nilled(.) }</dfdl:assert>
                    </xs:appinfo>
                </xs:annotation>
            </xs:element>
            <!-- see PublicationDate above -->
        </xs:choice>
    </xs:complexType>
</xs:element>
One last (important) point: When parsing input with Daffodil use the -V limited 
option. The option instructs Daffodil to validate each part of the composite 
fields against the XSD facets. With this erroneous input value:
2022xxx

Daffodil gives this very helpful error message on parsing:

[error] Validation Error: Month failed facet checks due to: facet 
enumeration(s): JAN|FEB|...

If you don't use the -V limited option, then Daffodil won't validate the parts 
against the XSD facets. Consequently, Daffodil will not report any errors with 
the above erroneous input. Why? Because if we ignore the facets in this element 
declaration:
<xs:element name="Month"
                       dfdl:lengthKind="explicit"
                       dfdl:length="3">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:enumeration value="JAN"/>
            <xs:enumeration value="FEB"/>
            ...
        </xs:restriction>
    </xs:simpleType>
</xs:element>
then it is simply saying that the input is any text of length 3, and "xxx" 
certainly fits that specification.
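
The difference between parsing and validating can be sketched in Python. This is an illustrative model only: parse_month and validate_month are hypothetical names, and the error text is made up for the sketch rather than copied from Daffodil.

```python
MONTHS = {"JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"}


def parse_month(data):
    """Parse step: take any 3 characters; the facets are ignored."""
    if len(data) < 3:
        raise ValueError("not enough data for Month")
    return data[:3]


def validate_month(value):
    """Validation step: check the enumeration facet, return error messages."""
    if value not in MONTHS:
        return [f"Month failed facet checks: {value!r} is not an enumerated month"]
    return []
```

Here parse_month("xxx") succeeds and returns "xxx". The problem only surfaces when validate_month("xxx") is also run, which is the role that validation plays in this model.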

