Hi Folks,

Please let me know of anything that is unclear.

/Roger

--------------------------------------------------------------------------------------

16. Variable length, not nillable, composite, choice

A composite field is one that is composed of parts. There is no separator between the parts. The parts may be fixed length or variable length. The parts are non-nillable, although the composite field itself may be nillable.

This section deals with a non-nillable field whose value is a choice between two composite fields, where the composite fields contain parts that are variable length.

We will create a DFDL schema for a field containing the course and speed of a group of hikers. I named the field "CourseAndSpeed."

There are two ways to express the hikers' course and speed:

1. Course in degrees followed by measured speed (in knots)
2. Course in degrees followed by perceived speed (fast, medium, slow, or zero)

Here is a sample value for the first way:

    330T5KNOTS

Read that as: The hikers are travelling in the direction 330 degrees True at a speed of 5 knots.

Alternatively, the field may be populated with a value like this:

    330TSLOW

Read that as: The hikers are travelling in the direction 330 degrees True at a slow speed.

In both cases, the field holds a single value.

In the first case, there are four parts:

    The first part (330) is the course. It has variable length, 1-3 digits.
    The second part (T) is the angular measurement. It has fixed length, 1 character.
    The third part (5) is a number representing the speed. It has variable length: 1-3 digits, optionally followed by a decimal point and 1-2 digits.
    The fourth part (KNOTS) is the speed units. It has fixed length, 5 characters.

In the second case, there are three parts:

    The first part (330) is the course. It has variable length, 1-3 digits.
    The second part (T) is the angular measurement. It has fixed length, 1 character.
    The third part (SLOW) is an enumeration value (values are FAST, MEDIUM, SLOW, and ZERO). It has variable length, 4-6 characters.

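As a rough illustration of the two forms described above, here is a small sketch that splits a value into its parts. It is plain Python regex, not DFDL, and it says nothing about how a DFDL processor works internally. The names split_course_and_speed, MEASURED, and PERCEIVED are made up for this sketch, and the measured-speed expression follows the prose description (1-3 digits, optionally followed by a decimal point and 1-2 digits).

```python
import re

# Parts shared by both forms: course (1-3 digits) and angular measurement (T).
COURSE = r"(?P<course>[0-9]{1,3})(?P<angular>T)"

# Form 1: measured speed (assumed: 1-3 digits, optional 1-2 digit fraction), then KNOTS.
MEASURED = re.compile(
    COURSE + r"(?P<speed>[0-9]{1,3}(?:\.[0-9]{1,2})?)(?P<unit>KNOTS)"
)

# Form 2: perceived speed, one of the four enumeration values.
PERCEIVED = re.compile(COURSE + r"(?P<speed>FAST|MEDIUM|SLOW|ZERO)")


def split_course_and_speed(value):
    """Return (form name, dict of parts) for a CourseAndSpeed value."""
    forms = (
        ("CourseAndMeasuredSpeed", MEASURED),
        ("CourseAndPerceivedSpeed", PERCEIVED),
    )
    # Try the first form; if the whole value does not fit, try the second.
    for name, pattern in forms:
        match = pattern.fullmatch(value)
        if match:
            return name, match.groupdict()
    raise ValueError(f"not a CourseAndSpeed value: {value!r}")


if __name__ == "__main__":
    print(split_course_and_speed("330T5KNOTS"))
    print(split_course_and_speed("330TSLOW"))
```

The first call reports four parts and the second reports three, matching the two cases listed above.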
Field Requirements:

>> Choice of values
>> Each choice has variable length
>> Each choice is composite; the first choice has 4 parts, the second choice has 3 parts

Here is an XML Schema declaration of CourseAndSpeed, sans any DFDL properties:

<xs:element name="CourseAndSpeed">
  <xs:complexType>
    <xs:choice>
      <xs:element name="CourseAndMeasuredSpeed"> <!-- first choice -->
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Course">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="[0-9]{1,3}"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
            <xs:element name="AngularMeasurement">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="T"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
            <xs:element name="Speed">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="[0-9]{1,3}"/>
                  <xs:pattern value="[0-9]?\.[0-9]{1,2}"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
            <xs:element name="SpeedUnit">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="KNOTS"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="CourseAndPerceivedSpeed"> <!-- second branch -->
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Course">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="[0-9]{1,3}"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
            <xs:element name="AngularMeasurement">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="T"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
            <xs:element name="Speed">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="FAST"/>
                  <xs:enumeration value="MEDIUM"/>
                  <xs:enumeration value="SLOW"/>
                  <xs:enumeration value="ZERO"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:choice>
  </xs:complexType>
</xs:element>

In the first choice, the following parts have fixed length: AngularMeasurement and SpeedUnit. These parts have variable length: Course and Speed.

In the second choice, only one part has fixed length: AngularMeasurement. The other parts have variable length: Course and Speed.

For the fixed length parts, add two DFDL properties:

    dfdl:lengthKind="explicit"
    dfdl:length="__"

For example, AngularMeasurement has a fixed length of 1. Here is its declaration, with the DFDL properties added:

<xs:element name="AngularMeasurement" dfdl:lengthKind="explicit" dfdl:length="1">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="T"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

Use the same strategy for the other fixed fields.

Course is variable length. The part that follows it (AngularMeasurement) has a fixed length (its value is T). To declare Course, add these two DFDL properties:

    dfdl:lengthKind="pattern"
    dfdl:lengthPattern="regex"

For the regex use a lookahead pattern. Here is Course, extended with the DFDL properties:

<xs:element name="Course" dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=(T))">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{1,3}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

Read that as: the content of Course is the text up to, but not including, T.

Use the same regex lookahead strategy for Speed.

Thus far our description of how to deal with the fixed and variable parts of a composite field is identical to the description in section 11. For this section, however, there is a challenge that is not present in section 11. Consider how this input:

    330TSLOW

will be parsed. The first part (330) is a course. Since the first branch has a Course element, the parser starts down that path. The parser successfully processes 330 and then it successfully processes T (AngularMeasurement).
The next element declaration is Speed, and its DFDL properties say that its value is "any text up to but not including KNOTS." Well, SLOW fits that description, so the parser assumes that the value of Speed is SLOW. Next the parser gets to the element declaration for SpeedUnit and suddenly realizes that there is no data available to populate it. An error is thrown and parsing halts. The parser does not back up and try the other branch.

The solution is to add checkConstraints() in the declaration of SpeedUnit:

<xs:element name="SpeedUnit">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
    </xs:appinfo>
  </xs:annotation>
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:enumeration value="KNOTS"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

The checkConstraints() call tells the parser to validate the data against the facets and, if validation fails, to back up and try the other choice branch.

Here's the DFDL schema with the DFDL properties added:

<xs:element name="CourseAndSpeed">
  <xs:complexType>
    <xs:choice dfdl:choiceLengthKind="implicit">
      <xs:element name="CourseAndMeasuredSpeed"> <!-- first choice -->
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Course" dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=(T))">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="[0-9]{1,3}"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
            <xs:element name="AngularMeasurement" dfdl:lengthKind="explicit" dfdl:length="1">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="T"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
            <xs:element name="Speed" dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=(KNOTS))">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="[0-9]{1,3}"/>
                  <xs:pattern value="[0-9]?\.[0-9]{1,2}"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
            <xs:element name="SpeedUnit">
              <xs:annotation>
                <xs:appinfo
                    source="http://www.ogf.org/dfdl/">
                  <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
                </xs:appinfo>
              </xs:annotation>
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="KNOTS"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="CourseAndPerceivedSpeed"> <!-- second branch -->
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Course" dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=(T))">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="[0-9]{1,3}"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
            <xs:element name="AngularMeasurement" dfdl:lengthKind="explicit" dfdl:length="1">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="T"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
            <xs:element name="Speed">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="FAST"/>
                  <xs:enumeration value="MEDIUM"/>
                  <xs:enumeration value="SLOW"/>
                  <xs:enumeration value="ZERO"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:choice>
  </xs:complexType>
</xs:element>

Notice that the last part of the second branch (Speed) has no DFDL properties added. This is because I am assuming that it is followed by the delimiter for the CourseAndSpeed field.

One last (important) point: When parsing input with Daffodil, use the -V limited option. The option instructs Daffodil to validate each part of the composite fields against the XSD facets. With this erroneous input value:

    xxxTSLOW

Daffodil gives this very helpful error message on parsing:

    [error] Validation Error: Course failed facet checks due to: facet pattern(s): [0-9]{1,3}

If you don't use the -V limited option, then Daffodil won't validate the parts against the XSD facets. Consequently, Daffodil will not report any errors with the above erroneous input. Why?
Because if we ignore the facets in this element declaration:

<xs:element name="Course" dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=(T))">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{1,3}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

then it is simply saying that the input is any text up to, but not including, T, and "xxx" certainly fits that specification.
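
To make that last point concrete, here is a small sketch of the difference between the length pattern and the facet. It uses Python's re module as a stand-in for the DFDL regex engine, which is an assumption (the two expressions used here behave the same way in both), and the function names scan_course and passes_facet are made up for this sketch.

```python
import re

LENGTH_PATTERN = re.compile(r".*?(?=(T))")  # the dfdl:lengthPattern on Course
FACET_PATTERN = re.compile(r"[0-9]{1,3}")   # the xs:pattern facet on Course


def scan_course(data):
    """Return the text the length pattern selects at the start of data."""
    match = LENGTH_PATTERN.match(data)
    return match.group(0) if match else ""


def passes_facet(value):
    """An XSD pattern must match the whole value, hence fullmatch."""
    return FACET_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for data in ("330TSLOW", "xxxTSLOW"):
        course = scan_course(data)
        print(data, "->", repr(course), "facet ok:", passes_facet(course))
```

The length pattern happily selects "xxx" as the content of Course; only the facet check rejects it. That is why the facets have to be checked somewhere, either by validation or by a checkConstraints() assert.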