Hi Folks,

Daffodil now supports the -V limited option. The -V limited option is a game changer. It totally changes the strategy for creating DFDL schemas: you use fewer DFDL properties and more XSD facets. This is huge!
That said, what I am about to describe may or may not fit your DFDL work. For my work, an XML Schema (XSD) already exists. (If your work doesn't already have an XSD, then create one!) The XSD is scaffolding, and all I must do is add the appropriate DFDL properties to it.

All the leaf elements in the XSD are of type xs:string. They are constrained using pattern or enumeration facets. Some data fields are nillable, and their corresponding XSD element declarations have nillable="true". Others are non-nillable. Some data fields have fixed length. Others have variable length.

This message describes how to add appropriate DFDL properties to the leaf elements. Before doing so, however, let's see how the -V limited option changes DFDL schema development.

Prior to the availability of the -V limited option I was using dfdl:lengthPattern="regex" to specify leaf elements. As a result, I had to:

* Convert each enumeration list to a regex where the enumeration values are alternatives, sort the alternatives longest-to-shortest, and then use that sorted regex as the value of dfdl:lengthPattern. With the -V limited option I simply leave the enumeration list as it is. I ditched dfdl:lengthPattern. It's not needed anymore. Now I use the XSD pattern and enumeration facets. To repeat what I said earlier: use fewer DFDL properties, more XSD facets.

* Convert pattern facets to a single regex containing alternatives, sort the alternatives longest-to-shortest, and then use that sorted regex as the value of dfdl:lengthPattern. With the -V limited option, when necessary, I sort the alternatives in the pattern facet longest-to-shortest but otherwise leave it alone.

In a nutshell, the -V limited option enables greater use of the XSD facets and reduces the need for DFDL properties.

Here is the Desired Parsing Behavior:

If data is well-formed and valid, I want parsing to produce XML and display no errors.
If data is well-formed but not valid, I want parsing to produce XML and display errors.

If data is not well-formed, I want parsing to not produce XML and display errors.

I use the Daffodil -V limited option, as it results in the desired parsing behavior.

As I said above, leaf elements are nillable or not, fixed length or not. So there are four possible kinds of leaf element:

1. Leaf element is fixed length, nillable

The following element declaration shows how to declare fixed length, nillable elements.

    <xs:element name="RunwayStatus"
                nillable="true"
                dfdl:nilKind="literalValue"
                dfdl:nilValue="-"
                dfdl:lengthKind="explicit"
                dfdl:length="3"
                dfdl:textTrimKind="padChar"
                dfdl:textPadKind="padChar"
                dfdl:textStringPadCharacter="%SP;"
                dfdl:textStringJustification="center">
        <xs:simpleType>
            <xs:restriction base="xs:string">
                <xs:enumeration value="FLT"/>
                <xs:enumeration value="GVL"/>
                <xs:enumeration value="BRK"/>
                <xs:enumeration value="GDD"/>
            </xs:restriction>
        </xs:simpleType>
    </xs:element>

In this case all the enumeration values are of the required length (3). Suppose some were shorter: would you need to pad them with spaces? No. The enumeration values stay as they are. Of course, the data value in the input field must be padded with spaces so that it spans the required length.

If there is no data available for the field, a hyphen is to be inserted into the field. The field is still required to have the fixed length, so the hyphen is padded with spaces.

The example shows the element using the enumeration facet. If the element instead used the pattern facet and its value had regex alternatives, then you would need to sort the alternatives longest-to-shortest.

Let's see how Daffodil processes the element. With the following input (notice the spaces around the hyphen):

    .../ - /...

parsing produces this output:

    <RunwayStatus xsi:nil="true"></RunwayStatus>

and unparsing produces this output:

    .../ - /...

With this input:

    .../FLT/...

parsing produces this output:

    <RunwayStatus>FLT</RunwayStatus>

and unparsing produces this output:

    .../FLT/...

In the example all enumeration values are of the required length, but suppose there is a value (say, AB) that is shorter. Notice the use of dfdl:textStringJustification="center", which is fine for the nil value (hyphen) but not for AB, which should be left justified. As the schema is currently written, the input could contain this (AB is right justified):

    .../ AB/...

which is incorrect. So there are conflicting requirements: the nil value needs dfdl:textStringJustification="center" whereas non-nil values need dfdl:textStringJustification="left". What to do about this? [Awaiting response from Mike and/or Steve]

2. Leaf element is fixed length, non-nillable

The following element declaration shows how to declare fixed length, non-nillable elements.

    <xs:element name="TimeLabel"
                dfdl:lengthKind="explicit"
                dfdl:length="6"
                dfdl:textTrimKind="padChar"
                dfdl:textPadKind="padChar"
                dfdl:textStringPadCharacter="%SP;"
                dfdl:textStringJustification="left">
        <xs:simpleType>
            <xs:restriction base="xs:string">
                <xs:enumeration value="JUPT"/>
                <xs:enumeration value="VENUSS"/>
                <xs:enumeration value="MARSSS"/>
                <xs:enumeration value="SUNNYY"/>
                <xs:enumeration value="EAR"/>
            </xs:restriction>
        </xs:simpleType>
    </xs:element>

Notice that some of the enumeration values are shorter than the required length (6). For example, EAR has a length of only 3. Does that mean we need to modify the enumeration values, padding those shorter than 6? No, there is no need to ensure that each enumeration value has the required length. The dfdl:textStringPadCharacter="%SP;" property ensures that each value will be padded. Of course, in the input a data value that is shorter must be padded with spaces.

Let's see how Daffodil processes the element. With the following input (notice that the data is less than 6 characters, so it is padded with spaces):

    .../JUPT  /...

parsing produces this output:

    <TimeLabel>JUPT</TimeLabel>

and unparsing produces this output:

    .../JUPT  /...

In our example, the enumeration facet is used. If instead the pattern facet had been used:

    <xs:pattern value="JUPT|VENUSS|...|EAR" />

then the alternatives would have to be sorted longest-to-shortest. With the enumeration facet, you do not have to sort the values.

3. Leaf element is variable length, nillable

The following element declaration shows how to declare variable length, nillable elements.

    <xs:element name="MessageID"
                nillable="true"
                dfdl:nilKind="literalValue"
                dfdl:nilValue="-">
        <xs:simpleType>
            <xs:restriction base="xs:string">
                <xs:pattern value="[A-Z0-9 ]{2,20}"></xs:pattern>
            </xs:restriction>
        </xs:simpleType>
    </xs:element>

Let's see how Daffodil processes the element. With this input:

    .../-/...

parsing produces this output:

    <MessageID xsi:nil="true"></MessageID>

and unparsing produces this output:

    .../-/...

With this input:

    .../XRAY/...

parsing produces this output:

    <MessageID>XRAY</MessageID>

and unparsing produces this output:

    .../XRAY/...

4. Leaf element is variable length, non-nillable

The following element declaration shows how to declare variable length, non-nillable elements.

    <xs:element name="MessageNumber">
        <xs:simpleType>
            <xs:restriction base="xs:string">
                <xs:pattern value="[A-Z0-9 ]{1,7}" />
            </xs:restriction>
        </xs:simpleType>
    </xs:element>

Let's see how Daffodil processes the element. With this input:

    .../BRAVO/...

parsing produces this output:

    <MessageNumber>BRAVO</MessageNumber>

and unparsing produces this output:

    .../BRAVO/...

The following table shows how to set the XSD and DFDL properties. $NV (Nil Value) denotes the nil value. $FL (Field Length) denotes the required field length. Obviously, for your data replace $NV and $FL with your values.

XSD and DFDL properties to be used with the element declaration. Each pair of columns covers one kind of data field; within a pair, pick the column for the XSD facet that specifies the field (pattern facet or enumeration facet).

                              Fixed length,             Fixed length,             Variable length,          Variable length,
                              nillable                  non-nillable              nillable                  non-nillable
Property                      pattern      enumeration  pattern      enumeration  pattern      enumeration  pattern      enumeration

nillable                      true         true         n/a          n/a          true         true         n/a          n/a
dfdl:nilKind                  literalValue literalValue n/a          n/a          literalValue literalValue n/a          n/a
dfdl:nilValue                 $NV          $NV          n/a          n/a          $NV          $NV          n/a          n/a
dfdl:lengthKind               explicit     explicit     explicit     explicit     delimited    delimited    delimited    delimited
dfdl:length                   $FL          $FL          $FL          $FL          n/a          n/a          n/a          n/a
dfdl:textTrimKind             padChar      padChar      padChar      padChar      n/a          n/a          n/a          n/a
dfdl:textPadKind              padChar      padChar      padChar      padChar      n/a          n/a          n/a          n/a
dfdl:textStringPadCharacter   %SP;         %SP;         %SP;         %SP;         n/a          n/a          n/a          n/a
dfdl:textStringJustification  center       center       left         left         n/a          n/a          n/a          n/a
Sort alternatives in
longest-to-shortest order?    yes          no           yes          no           yes          no           yes          no

It should be possible to convert this table into a form that can be used to automate adding DFDL properties to element declarations.
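
As a rough illustration of that last idea, here is a minimal Python sketch of the table as code. It is only a sketch under stated assumptions, not a tested tool: the function names (dfdl_properties, sort_alternatives, annotate) and the sample declaration are hypothetical, the alternative sorter only handles flat patterns such as A|BB|CCC and leaves anything with groups, character classes or escapes untouched, and the center/left justification simply mirrors the table, so the open justification question above still applies.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
DFDL = "http://www.ogf.org/dfdl/dfdl-1.0/"
ET.register_namespace("xs", XS)
ET.register_namespace("dfdl", DFDL)


def dfdl_properties(fixed_length, nillable, nil_value=None, field_length=None):
    """Return the table's DFDL properties for one kind of leaf element."""
    if nillable and nil_value is None:
        raise ValueError("nillable fields need a nil value ($NV)")
    if fixed_length and field_length is None:
        raise ValueError("fixed length fields need a length ($FL)")
    props = {}
    if nillable:
        props["nilKind"] = "literalValue"
        props["nilValue"] = nil_value
    if fixed_length:
        props["lengthKind"] = "explicit"
        props["length"] = str(field_length)
        props["textTrimKind"] = "padChar"
        props["textPadKind"] = "padChar"
        props["textStringPadCharacter"] = "%SP;"
        # Mirrors the table: center for nillable fields, left otherwise.
        props["textStringJustification"] = "center" if nillable else "left"
    else:
        props["lengthKind"] = "delimited"
    return props


def sort_alternatives(pattern):
    """Sort '|' alternatives longest-to-shortest; flat patterns only."""
    if "|" not in pattern or any(ch in pattern for ch in "()[]\\"):
        return pattern  # nothing to sort, or too complex for this naive split
    return "|".join(sorted(pattern.split("|"), key=len, reverse=True))


def annotate(element, fixed_length, nil_value=None, field_length=None):
    """Add the table's DFDL attributes to one xs:element declaration."""
    nillable = element.get("nillable") == "true"
    props = dfdl_properties(fixed_length, nillable, nil_value, field_length)
    for name, value in props.items():
        element.set(f"{{{DFDL}}}{name}", value)
    # Enumeration facets are left alone; only pattern facets get sorted.
    for facet in element.iter(f"{{{XS}}}pattern"):
        facet.set("value", sort_alternatives(facet.get("value")))
    return element


# Hypothetical declaration, modeled on TimeLabel but using a pattern facet.
decl = ET.fromstring(
    '<xs:element xmlns:xs="http://www.w3.org/2001/XMLSchema" name="TimeLabel">'
    '<xs:simpleType><xs:restriction base="xs:string">'
    '<xs:pattern value="EAR|JUPT|VENUSS"/>'
    '</xs:restriction></xs:simpleType></xs:element>'
)
annotate(decl, fixed_length=True, field_length=6)
print(ET.tostring(decl, encoding="unicode"))
```

Whether a field is fixed length, and what its $NV and $FL are, cannot be read from the XSD alone, so in this sketch they are passed in by the caller; nillability is taken from the declaration's own nillable attribute.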