Hi Folks,

I am jumping around in my writeups. As always, please let me know of anything that is unclear.

/Roger

--------------------------------------------------------------------------------------

11. Variable length, nillable, composite, no choice

A composite field is one that is composed of parts. There is no separator between the parts. The parts may be fixed length or variable length. The parts are non-nillable, although the composite field itself may be nillable. This section deals with a nillable composite field that contains variable-length parts.

We will create a DFDL schema for a "Location" field that holds a latitude and a longitude joined by a dash. The dash is not a separator in the DFDL sense; it is treated as a part in its own right. Here is a sample value:

    2006N-05912E

That is one value with 7 parts:

  1. The first two digits (20) represent the latitude in degrees.
  2. The next two digits (06) represent the latitude in minutes.
  3. The N indicates the latitude's hemisphere.
  4. The dash ( - ) separates the latitude values from the longitude values that follow.
  5. The 059 represents the longitude in degrees.
  6. The 12 represents the longitude in minutes.
  7. The E represents the longitude's hemisphere.

In other words, the location is latitude 20 degrees, 6 minutes North, longitude 59 degrees, 12 minutes East.

Both the latitude minutes and the longitude minutes are variable length: each is expressed either as a two-digit integer or as a decimal value. If a decimal, there may be 1-4 digits to the right of the decimal point. Here are Location values whose minute parts have decimal values:

    4221.6N-71003.5W
    4221.63N-71003.57W
    4221.630N-71003.576W
    4221.6300N-71003.5760W

Here is one more example of a valid Location value:

    -

That value (a lone hyphen) means: no data was available to populate the field.

To re-emphasize, Location is a variable length, nillable, composite field.

Here is an XML Schema declaration of Location, sans any DFDL properties:

    <xs:element name="Location" nillable="true">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="LatitudeDegrees">
                    <xs:simpleType>
                        <xs:restriction base="xs:string">
                            <xs:pattern value="[0-9]{2}" />
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>
                <xs:element name="LatitudeMinutes">
                    <xs:simpleType>
                        <xs:restriction base="xs:string">
                            <xs:pattern value="[0-9]{2}" />
                            <xs:pattern value="[0-9]{2}\.[0-9]{1}" />
                            <xs:pattern value="[0-9]{2}\.[0-9]{2}" />
                            <xs:pattern value="[0-9]{2}\.[0-9]{3}" />
                            <xs:pattern value="[0-9]{2}\.[0-9]{4}" />
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>
                <xs:element name="LatitudeHemisphere">
                    <xs:simpleType>
                        <xs:restriction base="xs:string">
                            <xs:enumeration value="N" />
                            <xs:enumeration value="S" />
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>
                <xs:element name="Hyphen">
                    <xs:simpleType>
                        <xs:restriction base="xs:string">
                            <xs:enumeration value="-" />
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>
                <xs:element name="LongitudeDegrees">
                    <xs:simpleType>
                        <xs:restriction base="xs:string">
                            <xs:pattern value="[0-9]{3}" />
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>
                <xs:element name="LongitudeMinutes">
                    <xs:simpleType>
                        <xs:restriction base="xs:string">
                            <xs:pattern value="[0-9]{2}" />
                            <xs:pattern value="[0-9]{2}\.[0-9]{1}" />
                            <xs:pattern value="[0-9]{2}\.[0-9]{2}" />
                            <xs:pattern value="[0-9]{2}\.[0-9]{3}" />
                            <xs:pattern value="[0-9]{2}\.[0-9]{4}" />
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>
                <xs:element name="LongitudeHemisphere">
                    <xs:simpleType>
                        <xs:restriction base="xs:string">
                            <xs:enumeration value="E" />
                            <xs:enumeration value="W" />
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

These parts have fixed length: LatitudeDegrees, LatitudeHemisphere, Hyphen, LongitudeDegrees, and LongitudeHemisphere.

These parts have variable length: LatitudeMinutes and LongitudeMinutes.

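As an optional simplification of the declaration above, the five pattern facets on a minutes part could be collapsed into one. The following is only a sketch of an equivalent restriction, shown for LatitudeMinutes; it accepts the same strings as the five separate patterns (two digits, optionally followed by a decimal point and 1-4 digits). The rest of this writeup keeps the five-pattern form.

```xml
<xs:element name="LatitudeMinutes">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <!-- Two digits, optionally followed by "." and 1 to 4 more digits -->
            <xs:pattern value="[0-9]{2}(\.[0-9]{1,4})?" />
        </xs:restriction>
    </xs:simpleType>
</xs:element>
```

Multiple pattern facets in one restriction are alternatives, so either form validates values such as 06, 21.6, and 21.6300.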
For the fixed length parts, add these two DFDL properties, filling in the blank with the part's length:

    dfdl:lengthKind="explicit"
    dfdl:length="__"

For example, LatitudeDegrees has a fixed length of 2. Here is its declaration, with the DFDL properties added:

    <xs:element name="LatitudeDegrees"
                dfdl:lengthKind="explicit"
                dfdl:length="2">
        <xs:simpleType>
            <xs:restriction base="xs:string">
                <xs:pattern value="[0-9]{2}" />
            </xs:restriction>
        </xs:simpleType>
    </xs:element>

Use the same strategy for the other fixed length parts.

LatitudeMinutes is variable length. The part that follows it (LatitudeHemisphere) has a fixed length (its value is either N or S). To declare LatitudeMinutes, add these two DFDL properties:

    dfdl:lengthKind="pattern"
    dfdl:lengthPattern="regex"

In the regex, use a lookahead so that the match stops just before the hemisphere letter without consuming it. Here is LatitudeMinutes, extended with the DFDL properties:

    <xs:element name="LatitudeMinutes"
                dfdl:lengthKind="pattern"
                dfdl:lengthPattern=".*?(?=(N|S))">
        <xs:simpleType>
            <xs:restriction base="xs:string">
                <xs:pattern value="[0-9]{2}"/>
                <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
                <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
                <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
                <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
            </xs:restriction>
        </xs:simpleType>
    </xs:element>

Read that as: the content of LatitudeMinutes is the text up to, but not including, N or S.

Use the same regex lookahead strategy for LongitudeMinutes, looking ahead for E or W instead.

As I stated earlier, Location is nillable, with a hyphen as the nil value. Further, Location has a complexType. That is a problem. See section 2 for a complete discussion of the problem with nillable complexTypes and how to deal with it.

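Returning briefly to the length pattern for LatitudeMinutes: the lookahead regex takes whatever text precedes the next N or S and leaves it to the pattern facets to reject bad values at validation time. A possible alternative, sketched here under the assumption that the DFDL processor matches dfdl:lengthPattern starting at the current position in the data (this variant is my own illustration, not part of the original writeup, and has not been tested against any particular processor), is to have the length pattern describe the minutes text itself:

```xml
<!-- Sketch: the length pattern matches the minutes text directly
     (two digits, optionally "." and 1-4 digits) instead of looking
     ahead for the hemisphere letter. -->
<xs:element name="LatitudeMinutes"
            dfdl:lengthKind="pattern"
            dfdl:lengthPattern="[0-9]{2}(\.[0-9]{1,4})?">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{2}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>
```

The trade-off is that this form no longer depends on which part comes next, but it duplicates the format rule in two places (the length pattern and the facets). The lookahead form keeps the rule in the facets only.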
Here's the DFDL schema for the Location field:

    <xs:element name="Location">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="LatitudeDegrees"
                            dfdl:lengthKind="explicit"
                            dfdl:length="2">
                    <xs:simpleType>
                        <xs:restriction base="xs:string">
                            <xs:pattern value="[0-9]{2}" />
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>
                <xs:element name="LatitudeMinutes"
                            dfdl:lengthKind="pattern"
                            dfdl:lengthPattern=".*?(?=(N|S))">
                    <xs:simpleType>
                        <xs:restriction base="xs:string">
                            <xs:pattern value="[0-9]{2}" />
                            <xs:pattern value="[0-9]{2}\.[0-9]{1}" />
                            <xs:pattern value="[0-9]{2}\.[0-9]{2}" />
                            <xs:pattern value="[0-9]{2}\.[0-9]{3}" />
                            <xs:pattern value="[0-9]{2}\.[0-9]{4}" />
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>
                <xs:element name="LatitudeHemisphere"
                            dfdl:lengthKind="explicit"
                            dfdl:length="1">
                    <xs:simpleType>
                        <xs:restriction base="xs:string">
                            <xs:enumeration value="N" />
                            <xs:enumeration value="S" />
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>
                <xs:element name="Hyphen"
                            dfdl:lengthKind="explicit"
                            dfdl:length="1">
                    <xs:simpleType>
                        <xs:restriction base="xs:string">
                            <xs:enumeration value="-" />
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>
                <xs:element name="LongitudeDegrees"
                            dfdl:lengthKind="explicit"
                            dfdl:length="3">
                    <xs:simpleType>
                        <xs:restriction base="xs:string">
                            <xs:pattern value="[0-9]{3}" />
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>
                <xs:element name="LongitudeMinutes"
                            dfdl:lengthKind="pattern"
                            dfdl:lengthPattern=".*?(?=(E|W))">
                    <xs:simpleType>
                        <xs:restriction base="xs:string">
                            <xs:pattern value="[0-9]{2}" />
                            <xs:pattern value="[0-9]{2}\.[0-9]{1}" />
                            <xs:pattern value="[0-9]{2}\.[0-9]{2}" />
                            <xs:pattern value="[0-9]{2}\.[0-9]{3}" />
                            <xs:pattern value="[0-9]{2}\.[0-9]{4}" />
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>
                <xs:element name="LongitudeHemisphere">
                    <xs:simpleType>
                        <xs:restriction base="xs:string">
                            <xs:enumeration value="E" />
                            <xs:enumeration value="W" />
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

Notice that the last part (LongitudeHemisphere) has no DFDL properties added. This is because I am assuming that it is followed by the delimiter for the Location field.
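
To make the effect of the schema concrete, here is a sketch of the XML one would expect a DFDL parser to produce for the sample value 2006N-05912E. It assumes the schema has no target namespace and that the delimiter following Location is configured elsewhere; the exact serialization may differ between DFDL processors.

```xml
<!-- Expected parse result for the input text: 2006N-05912E -->
<Location>
    <LatitudeDegrees>20</LatitudeDegrees>
    <LatitudeMinutes>06</LatitudeMinutes>
    <LatitudeHemisphere>N</LatitudeHemisphere>
    <Hyphen>-</Hyphen>
    <LongitudeDegrees>059</LongitudeDegrees>
    <LongitudeMinutes>12</LongitudeMinutes>
    <LongitudeHemisphere>E</LongitudeHemisphere>
</Location>
```

Because every part is declared as xs:string, leading zeros such as those in 06 and 059 are preserved as-is.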