Hi Folks,

A lot of complexity got replaced with simplicity, thanks to Mike and Steve.
Here's the updated information. Lots of changes. If you find any errors, let me know.

/Roger

------------------------------------------

Daffodil now supports the -V limited option.

The -V limited option is a game changer. It totally changes the strategy for creating DFDL schemas. You use fewer DFDL properties and more XSD facets. This is huge!

That said, what I am about to describe may or may not fit your DFDL work. For my work, there already exists an XML Schema (XSD). [If your work doesn't already have an XSD, then create one!] The XSD is scaffolding and all I must do is add the appropriate DFDL properties to the scaffolding.

All the leaf elements in the XSD are of type xs:string and are constrained using pattern or enumeration facets. Some data fields are nillable and so their corresponding XSD element declarations have nillable="true". Others are non-nillable. Some data fields have fixed length. Others have variable length. This message shows how to add appropriate DFDL properties to each type of leaf element.

Before doing so, however, let's see how the -V limited option changes DFDL schema development.

Prior to the availability of the -V limited option I was using dfdl:lengthPattern="regex" to specify leaf elements. As a result, I had to:

* Convert each enumeration list in the XSD to a regex, where the enumeration values became regex alternatives. Then I would sort the alternatives longest-to-shortest. For fixed fields I would pad the alternatives that weren't of the required length. And then I would set the sorted, padded regex as the value of dfdl:lengthPattern. Now, with the -V limited option, I leave the enumeration list as it is. I ditched dfdl:lengthPattern. It's not needed anymore.

* Convert pattern facets in the XSD to a single regex containing alternatives. Then sort the alternatives longest-to-shortest. For fixed fields I would pad the alternatives that weren't of the required length. And then I would set the sorted, padded regex as the value of dfdl:lengthPattern. With the -V limited option, I no longer process the pattern facet; I use it as is.

The -V limited option means greater use of XSD facets and less need for DFDL properties. It means less processing: no more converting enumeration values into regex alternatives, no more converting pattern facets into regex alternatives, no more sorting regex alternatives in longest-to-shortest order, and for fixed fields no more padding alternatives.

Here is the desired parsing behavior:

* If data is well-formed and valid, I want parsing to produce XML and display no errors.
* If data is well-formed but not valid, I want parsing to produce XML and display errors.
* If data is not well-formed, I want parsing to not produce XML and display errors.

I use the Daffodil -V limited option, as it results in the desired parsing behavior.

As I said above, in my XSD the leaf elements are nillable or not, fixed length or not. In other words, there are four types of data fields:

1. Data field is fixed length, nillable

The following element declaration shows how to specify fixed length, nillable fields.

Field specification:
>> Fixed length (3)
>> Nillable, hyphen is the nil value, the hyphen may be positioned anywhere within the 3-character field
>> Values must be left-justified
>> Values shorter than 3 characters must be padded with spaces

<xs:element name="RunwayStatus"
            nillable="true"
            dfdl:nilKind="literalValue"
            dfdl:nilValue="%WSP*;-"
            dfdl:lengthKind="explicit"
            dfdl:length="3"
            dfdl:textTrimKind="padChar"
            dfdl:textPadKind="padChar"
            dfdl:textStringPadCharacter="%SP;"
            dfdl:textStringJustification="left">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:enumeration value="FLT"/>
            <xs:enumeration value="GVL"/>
            <xs:enumeration value="BRK"/>
            <xs:enumeration value="GDD"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>

In this case all the enumeration values are of the required length (3).
Suppose some were shorter, would you need to pad them with spaces? No, there is no need to pad enumeration values. The combination of dfdl:length="3" and dfdl:textStringPadCharacter="%SP;" means that parsing will check that the input field has length 3 and, if it contains a value shorter than 3, that the value is padded on the right with spaces. The dfdl:textStringJustification="left" property specifies that values must be left-justified. This means that this input is okay:

    .../AB /...

but this is not:

    .../ AB/...

If there is no input data available to populate the field, a hyphen is to be inserted. In other words, hyphen is the nil value. Of course, even with a nil value the field is still required to have length 3, so the hyphen must be padded with spaces. dfdl:nilValue="%WSP*;-" specifies that the hyphen may be positioned anywhere within the 3-character field.

Let's see how a DFDL processor parses the element. With the following input (note the spaces around the hyphen):

    .../ - /...

parsing produces this output:

    <RunwayStatus xsi:nil="true"></RunwayStatus>

and unparsing produces this output:

    .../-  /...

Notice that unparsing moves the hyphen to the left side of the field.

With this input:

    .../FLT/...

parsing produces this output:

    <RunwayStatus>FLT</RunwayStatus>

and unparsing produces this output:

    .../FLT/...

If a pattern facet had been used instead of the enumeration facet:

    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="FLT|GVL|BRK|GDD" />
        </xs:restriction>
    </xs:simpleType>

everything works the same. That is, the same set of DFDL properties is used.

2. Data field is fixed length, non-nillable

The following element declaration shows how to specify fixed length, non-nillable fields.
Field specification:
>> Fixed length (6)
>> Values must be left-justified
>> Values shorter than 6 characters must be padded with spaces

<xs:element name="TimeLabel"
            dfdl:lengthKind="explicit"
            dfdl:length="6"
            dfdl:textTrimKind="padChar"
            dfdl:textPadKind="padChar"
            dfdl:textStringPadCharacter="%SP;"
            dfdl:textStringJustification="left">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:enumeration value="JUPT"/>
            <xs:enumeration value="VENUSS"/>
            <xs:enumeration value="MARSSS"/>
            <xs:enumeration value="SUNNYY"/>
            <xs:enumeration value="EAR"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>

Notice that some of the enumeration values have a length less than the required length (6). For example, EAR has a length of only 3. Does that mean we need to pad those values with length less than 6? No, there is no need to pad any enumeration value. The combination of dfdl:length="6" and dfdl:textStringPadCharacter="%SP;" means that parsing will check that the input field has length 6 and, if it contains a value shorter than 6, that the value is padded on the right with spaces. The dfdl:textStringJustification="left" property specifies that values must be left-justified. In other words, this input is okay:

    .../EAR   /...

but this is not:

    .../   EAR/...

Let's see how a DFDL processor parses the element. With the following input (notice the value is only 4 characters, so it is padded with 2 spaces):

    .../JUPT  /...

parsing produces this output:

    <TimeLabel>JUPT</TimeLabel>

and unparsing produces this output:

    .../JUPT  /...

In our example, the enumeration facet is used. If a pattern facet had been used instead of the enumeration facet:

    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="JUPT|VENUSS|MARSSS|SUNNYY|EAR" />
        </xs:restriction>
    </xs:simpleType>

everything works the same. That is, the same set of DFDL properties is used.

3. Data field is variable length, nillable

The following element declaration shows how to specify variable length, nillable fields.

Field specification:
>> Variable length (2-20 characters)
>> Nillable, hyphen is the nil value, if a hyphen is present, it is the only character in the field

<xs:element name="MessageID"
            nillable="true"
            dfdl:nilKind="literalValue"
            dfdl:nilValue="-">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="[A-Z0-9 ]{2,20}"></xs:pattern>
        </xs:restriction>
    </xs:simpleType>
</xs:element>

Let's see how a DFDL processor parses the element. With this input:

    .../-/...

parsing produces this output:

    <MessageID xsi:nil="true"></MessageID>

and unparsing produces this output:

    .../-/...

With this input:

    .../XRAY/...

parsing produces this output:

    <MessageID>XRAY</MessageID>

and unparsing produces this output:

    .../XRAY/...

4. Data field is variable length, non-nillable

The following element declaration shows how to specify variable length, non-nillable fields.

Field specification:
>> Variable length (1-7 characters)

<xs:element name="MessageNumber">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="[A-Z0-9 ]{1,7}" />
        </xs:restriction>
    </xs:simpleType>
</xs:element>

Let's see how a DFDL processor parses the element. With this input:

    .../BRAVO/...

parsing produces this output:

    <MessageNumber>BRAVO</MessageNumber>

and unparsing produces this output:

    .../BRAVO/...

The following table shows how to assign XSD and DFDL properties. The nil values and length values shown in the table are from the above examples. Obviously, for your data you need to replace them with your values.
Properties to add onto the XSD element declaration, by type of data field:

                              Fixed length,  Fixed length,  Variable length,  Variable length,
Property                      nillable       non-nillable   nillable          non-nillable
----------------------------  -------------  -------------  ----------------  ----------------
nillable                      true           n/a            true              n/a
dfdl:nilKind                  literalValue   n/a            literalValue      n/a
dfdl:nilValue                 %WSP*;-        n/a            -                 n/a
dfdl:lengthKind               explicit       explicit       delimited         delimited
dfdl:length                   3              6              n/a               n/a
dfdl:textTrimKind             padChar        padChar        n/a               n/a
dfdl:textPadKind              padChar        padChar        n/a               n/a
dfdl:textStringPadCharacter   %SP;           %SP;           n/a               n/a
dfdl:textStringJustification  left           left           n/a               n/a

It should be possible to convert this table into a form that can be used to automate the adding of DFDL properties onto element declarations.
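
As a rough illustration of that last idea, here is a minimal Python sketch that encodes the table as data and applies it to an xs:element. It is only a sketch, not a tested tool. The function names (dfdl_properties, annotate) and their parameters are made up for this example, and it assumes the caller already knows, for each leaf element, whether the field is fixed length (and how long) and what its nil value is, since the XSD by itself does not say. It follows the table literally, so variable length fields get dfdl:lengthKind="delimited" set on the element itself.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
DFDL = "http://www.ogf.org/dfdl/dfdl-1.0/"
ET.register_namespace("xs", XS)
ET.register_namespace("dfdl", DFDL)

# Properties shared by every fixed length field in the table.
FIXED_LENGTH_PROPERTIES = {
    "lengthKind": "explicit",
    "textTrimKind": "padChar",
    "textPadKind": "padChar",
    "textStringPadCharacter": "%SP;",
    "textStringJustification": "left",
}


def dfdl_properties(fixed_length=None, nil_value=None):
    """Return (xsd_attrs, dfdl_attrs) for one leaf element, following the table.

    fixed_length: field length for a fixed length field, None if variable.
    nil_value:    DFDL nil literal for a nillable field, None if non-nillable.
    """
    xsd_attrs, dfdl_attrs = {}, {}
    if nil_value is not None:
        xsd_attrs["nillable"] = "true"
        dfdl_attrs["nilKind"] = "literalValue"
        dfdl_attrs["nilValue"] = nil_value
    if fixed_length is not None:
        dfdl_attrs.update(FIXED_LENGTH_PROPERTIES)
        dfdl_attrs["length"] = str(fixed_length)
    else:
        dfdl_attrs["lengthKind"] = "delimited"
    return xsd_attrs, dfdl_attrs


def annotate(element, fixed_length=None, nil_value=None):
    """Set the table's properties as attributes on an xs:element, in place."""
    xsd_attrs, dfdl_attrs = dfdl_properties(fixed_length, nil_value)
    for name, value in xsd_attrs.items():
        element.set(name, value)
    for name, value in dfdl_attrs.items():
        # ElementTree spells a namespaced attribute as "{namespace}localname".
        element.set(f"{{{DFDL}}}{name}", value)
    return element


if __name__ == "__main__":
    # The RunwayStatus field from example 1: fixed length 3, nillable.
    elem = ET.fromstring(f'<xs:element xmlns:xs="{XS}" name="RunwayStatus"/>')
    annotate(elem, fixed_length=3, nil_value="%WSP*;-")
    print(ET.tostring(elem, encoding="unicode"))
```

To process a whole schema, one could walk every xs:element that has a simple type and call annotate with per-element settings read from a side file. Where those settings come from is left open here.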