Hi Folks,

Daffodil now supports the -V limited option. The -V limited option is a game 
changer. It totally changes the strategy for creating DFDL schemas. You use 
fewer DFDL properties and more XSD facets. This is huge!

That said, what I am about to describe may or may not fit your DFDL work.

For my work, there already exists an XML Schema (XSD). (If your work doesn't 
already have an XSD, then create one!) The XSD is scaffolding and all I must do 
is add the appropriate DFDL properties to the scaffolding. All the leaf 
elements in the XSD are of type xs:string. They are constrained using pattern 
or enumeration facets. Some data fields are nillable and their corresponding 
XSD element declarations have nillable="true". Others are non-nillable. Some 
data fields have fixed length. Others have variable length. This message 
describes how to add appropriate DFDL properties to the leaf elements.

Before doing so, however, let's see how the -V limited option changes DFDL 
schema development. Prior to the availability of the -V limited option I was 
using dfdl:lengthPattern="regex" to specify leaf elements. As a result, I had 
to:

  *   Convert each enumeration list to a regex where the enumeration values are 
alternatives, sort the alternatives longest-to-shortest, and then use that 
sorted regex as the value of dfdl:lengthPattern. With the -V limited option I 
simply leave the enumeration list as it is. I ditched dfdl:lengthPattern. It's 
not needed anymore. Now I use the XSD pattern and enumeration facets. To repeat 
what I said earlier, use fewer DFDL properties, more XSD facets.
  *   Convert pattern facets to a single regex containing alternatives, sort 
the alternatives longest-to-shortest, and then use that sorted regex as the 
value of dfdl:lengthPattern. With the -V limited option, when necessary, I sort 
the alternatives in the pattern facet longest-to-shortest but otherwise leave 
it alone.
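
The longest-to-shortest sort described in the two bullets above can be sketched 
in Python. This is only an illustration, not part of Daffodil: the helper name 
to_sorted_alternation is made up, and Python's re module stands in for the 
regex engine behind dfdl:lengthPattern, on the assumption that it also tries 
alternatives left to right.

```python
import re

def to_sorted_alternation(values):
    """Join enumeration values into one regex, longest alternative first."""
    # sorted() is stable, so values of equal length keep their original order.
    ordered = sorted(values, key=len, reverse=True)
    return "|".join(re.escape(v) for v in ordered)

# Why the order matters when the regex is matched at the start of the data:
# alternatives are tried left to right, so a short alternative that is a
# prefix of a longer one wins if it is listed first.
unsorted_match = re.match("AB|ABC", "ABC").group()
sorted_match = re.match(to_sorted_alternation(["AB", "ABC"]), "ABC").group()
```

With the unsorted regex only AB is consumed from the data ABC; with the sorted 
regex the whole of ABC is consumed.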

In a nutshell, the -V limited option enables greater use of the XSD facets and 
less need for the DFDL properties.

Here is the Desired Parsing Behavior: If data is well-formed and valid, I want 
parsing to produce XML and display no errors. If data is well-formed but not 
valid, I want parsing to produce XML and display errors. If data is not 
well-formed, I want parsing to not produce XML and display errors.

I use the Daffodil -V limited option, as it results in the desired parsing 
behavior.
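
For anyone scripting their runs, here is a minimal sketch of building the 
command line from Python. The file names are hypothetical, it assumes the 
daffodil launcher is on your PATH, and the options shown (-s, -V, -o) should be 
checked against "daffodil parse --help" for your version.

```python
def daffodil_parse_command(schema, infile, outfile):
    """Build the argument list for a Daffodil parse with limited validation."""
    return [
        "daffodil", "parse",
        "-s", schema,      # DFDL schema to use
        "-V", "limited",   # the validation mode discussed in this message
        "-o", outfile,     # file that receives the XML output
        infile,            # data file to parse
    ]

# Hypothetical file names, for illustration only.
cmd = daffodil_parse_command("schema.dfdl.xsd", "input.txt", "output.xml")
# To actually run it: import subprocess; subprocess.run(cmd)
```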

As I said above, leaf elements are nillable or not, fixed length or not. So 
there are four possible kinds of leaf element:

1. Leaf element is fixed length, nillable

The following element declaration shows how to declare fixed length, nillable 
elements.

<xs:element name="RunwayStatus"
              nillable="true"
                dfdl:nilKind="literalValue"
              dfdl:nilValue="-"
                 dfdl:lengthKind="explicit"
              dfdl:length="3"
           dfdl:textTrimKind="padChar"
            dfdl:textPadKind="padChar"
      dfdl:textStringPadCharacter="%SP;"
    dfdl:textStringJustification="center">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:enumeration value="FLT"/>
            <xs:enumeration value="GVL"/>
            <xs:enumeration value="BRK"/>
            <xs:enumeration value="GDD"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>

In this case all the enumeration values are of the required length (3). Suppose 
some were shorter, would you need to pad them with spaces? No. The enumeration 
values stay as they are. Of course the data value in the input field must be 
padded with spaces so that it spans the required length.

If there is no data available for the field, a hyphen is to be inserted into 
the field. The field is still required to have the fixed length, so the hyphen 
is padded with spaces.

The example shows the element using the enumeration facet. If the element 
instead used the pattern facet and its value had regex alternatives, then you 
would need to sort the alternatives longest-to-shortest.

Let's see how Daffodil processes the element. With the following input (notice 
the spaces around the hyphen):

.../ - /...

parsing produces this output:

<RunwayStatus xsi:nil="true"></RunwayStatus>

and unparsing produces this output:

.../ - /...

With this input:

.../FLT/...

parsing produces this output:

<RunwayStatus>FLT</RunwayStatus>

and unparsing produces this output:

.../FLT/...

In the example all enumeration values are of the required length, but suppose 
there is a value (say, AB) that is shorter. Notice the use of 
dfdl:textStringJustification="center" which is fine for the nillable value 
(hyphen) but not for AB which should be left justified. As the schema is 
currently written, the input could contain this (AB is right justified):

.../ AB/...

which is incorrect. So there are conflicting requirements: the nillable value 
needs dfdl:textStringJustification="center" whereas non-nillable values need 
dfdl:textStringJustification="left". What to do about this? [Awaiting response 
from Mike and/or Steve]
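
The conflict can be seen in a simplified Python model of trimming. This is a 
sketch of the idea only, not Daffodil's implementation: the trim function is 
made up, and it assumes that left justification trims padding on the right, 
right justification trims on the left, and center trims both sides.

```python
def trim(field, justification, pad=" "):
    """Simplified model of dfdl:textTrimKind="padChar" when parsing."""
    if justification == "left":      # data on the left, padding on the right
        return field.rstrip(pad)
    if justification == "right":     # data on the right, padding on the left
        return field.lstrip(pad)
    if justification == "center":    # padding may be on both sides
        return field.strip(pad)
    raise ValueError(f"unknown justification: {justification}")

# With "center", both placements of AB in a 3-character field trim to AB,
# so in this model the right-justified form is not rejected.
center_results = [trim(f, "center") for f in ("AB ", " AB")]

# With "left", the centered hyphen keeps its leading space and no longer
# equals the nil value "-".
left_nil = trim(" - ", "left")
```

In this model neither setting handles both the centered hyphen and a 
left-justified AB the way the data format requires.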

2. Leaf element is fixed length, non-nillable

The following element declaration shows how to declare fixed length, 
non-nillable elements.

<xs:element name="TimeLabel"
                 dfdl:lengthKind="explicit"
              dfdl:length="6"
           dfdl:textTrimKind="padChar"
            dfdl:textPadKind="padChar"
      dfdl:textStringPadCharacter="%SP;"
    dfdl:textStringJustification="left">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:enumeration value="JUPT"/>
            <xs:enumeration value="VENUSS"/>
            <xs:enumeration value="MARSSS"/>
            <xs:enumeration value="SUNNYY"/>
            <xs:enumeration value="EAR"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>

Notice that some of the enumeration values are shorter than the required 
length (6). For example, EAR has a length of only 3. Does that mean we need to 
modify the enumeration values, padding those shorter than 6? No, there is no 
need for each enumeration value to have the required length. On parsing, 
dfdl:textTrimKind="padChar" trims the pad characters from the data value; on 
unparsing, dfdl:textPadKind="padChar" pads it back out to the required length. 
dfdl:textStringPadCharacter="%SP;" says the pad character is a space. Of 
course, in the input a data value that is shorter must be padded with spaces.
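
A small Python model shows why the enumeration values can stay unpadded. It 
assumes the facet check is applied to the value after the padding has been 
trimmed; the function names are made up for illustration and this is not how 
Daffodil is implemented internally.

```python
TIME_LABELS = {"JUPT", "VENUSS", "MARSSS", "SUNNYY", "EAR"}
FIELD_LENGTH = 6

def parse_field(raw):
    """Trim right-hand padding from a left-justified fixed-length field."""
    if len(raw) != FIELD_LENGTH:
        raise ValueError(f"expected {FIELD_LENGTH} characters, got {len(raw)}")
    return raw.rstrip(" ")

def unparse_field(value):
    """Pad a value on the right with spaces up to the fixed length."""
    if len(value) > FIELD_LENGTH:
        raise ValueError(f"value longer than {FIELD_LENGTH} characters")
    return value.ljust(FIELD_LENGTH, " ")

value = parse_field("EAR   ")       # padded in the data
is_valid = value in TIME_LABELS     # checked against the unpadded enumeration
round_trip = unparse_field(value)   # padded again on the way out
```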

Let's see how Daffodil processes the element. With the following input (notice 
that the data is less than 6 characters, so it is padded with spaces):

.../JUPT  /...

parsing produces this output:

<TimeLabel>JUPT</TimeLabel>

and unparsing produces this output:

.../JUPT  /...

In our example, the enumeration facet is used. If instead the pattern facet had 
been used:

<xs:pattern value="JUPT|VENUSS|...|EAR" />

then the alternatives would have to be sorted longest-to-shortest.

With the enumeration facet, you do not have to sort the values.

3. Leaf element is variable length, nillable

The following element declaration shows how to declare variable length, 
nillable elements.

<xs:element name="MessageID"
              nillable="true"
                dfdl:nilKind="literalValue"
              dfdl:nilValue="-">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="[A-Z0-9 ]{2,20}"></xs:pattern>
        </xs:restriction>
    </xs:simpleType>
</xs:element>

Let's see how Daffodil processes the element. With this input:

.../-/...

parsing produces this output:

<MessageID xsi:nil="true"></MessageID>

and unparsing produces this output:

.../-/...

With this input:

.../XRAY/...

parsing produces this output:

<MessageID>XRAY</MessageID>

and unparsing produces this output:

.../XRAY/...
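
The two cases above, plus the well-formed-but-invalid case, can be modelled in 
a few lines of Python. This is a sketch under assumptions: the classify 
function is hypothetical, Python's re module stands in for XSD regular 
expressions, and fullmatch() is used because an XSD pattern facet must match 
the entire value.

```python
import re

NIL_VALUE = "-"
MESSAGE_ID = re.compile(r"[A-Z0-9 ]{2,20}")

def classify(field):
    """Return 'nil', 'valid', or 'invalid' for a delimited MessageID field."""
    if field == NIL_VALUE:
        return "nil"
    # An XSD pattern facet must match the whole value, hence fullmatch().
    if MESSAGE_ID.fullmatch(field):
        return "valid"
    return "invalid"
```

The "invalid" outcome corresponds to data that is well-formed but not valid: 
per the desired parsing behavior, XML is still produced and an error is 
displayed.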


4. Leaf element is variable length, non-nillable

The following element declaration shows how to declare variable length, 
non-nillable elements.

<xs:element name="MessageNumber">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="[A-Z0-9 ]{1,7}" />
        </xs:restriction>
    </xs:simpleType>
</xs:element>

Let's see how Daffodil processes the element. With this input:

.../BRAVO/...

parsing produces this output:

<MessageNumber>BRAVO</MessageNumber>

and unparsing produces this output:

.../BRAVO/...

The following table shows how to set the XSD and DFDL properties. $NV (Nil 
Value) denotes the nil value. $FL (Field Length) denotes the required field 
length. Obviously for your data replace $NV and $FL with your values.

XSD and DFDL properties to be used with the element declaration:

Data field with:              | Fixed length,               | Fixed length,               | Variable length,            | Variable length,            |
                              | nillable                    | non-nillable                | nillable                    | non-nillable                |
------------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+
In XSD the field is           | pattern      | enumeration  | pattern      | enumeration  | pattern      | enumeration  | pattern      | enumeration  |
specified by:                 | facet        | facet        | facet        | facet        | facet        | facet        | facet        | facet        |
------------------------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
nillable                      | true         | true         | n/a          | n/a          | true         | true         | n/a          | n/a          |
dfdl:nilKind                  | literalValue | literalValue | n/a          | n/a          | literalValue | literalValue | n/a          | n/a          |
dfdl:nilValue                 | $NV          | $NV          | n/a          | n/a          | $NV          | $NV          | n/a          | n/a          |
dfdl:lengthKind               | explicit     | explicit     | explicit     | explicit     | delimited    | delimited    | delimited    | delimited    |
dfdl:length                   | $FL          | $FL          | $FL          | $FL          | n/a          | n/a          | n/a          | n/a          |
dfdl:textTrimKind             | padChar      | padChar      | padChar      | padChar      | n/a          | n/a          | n/a          | n/a          |
dfdl:textPadKind              | padChar      | padChar      | padChar      | padChar      | n/a          | n/a          | n/a          | n/a          |
dfdl:textStringPadCharacter   | %SP;         | %SP;         | %SP;         | %SP;         | n/a          | n/a          | n/a          | n/a          |
dfdl:textStringJustification  | center       | center       | left         | left         | n/a          | n/a          | n/a          | n/a          |
Sort alternatives in          | yes          | no           | yes          | no           | yes          | no           | yes          | no           |
longest-to-shortest order?    |              |              |              |              |              |              |              |              |

It should be possible to convert this table into a form that can be used to 
automate the adding of DFDL properties onto element declarations.
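
One possible shape for that automation, sketched in Python with the standard 
xml.etree.ElementTree module. The function names, the default nil value, and 
the way the table is encoded are all assumptions made for illustration; the 
"sort alternatives" row is not handled here because it concerns the facet 
content rather than an attribute.

```python
import xml.etree.ElementTree as ET

DFDL_NS = "http://www.ogf.org/dfdl/dfdl-1.0/"
XS_NS = "http://www.w3.org/2001/XMLSchema"

def dfdl_properties(fixed_length, nillable, nil_value="-", field_length=None):
    """Return (plain XSD attributes, DFDL properties) for one leaf element."""
    if fixed_length and field_length is None:
        raise ValueError("a fixed-length field needs field_length ($FL)")
    xsd_attrs, dfdl_props = {}, {}
    if nillable:
        xsd_attrs["nillable"] = "true"
        dfdl_props["nilKind"] = "literalValue"
        dfdl_props["nilValue"] = nil_value          # $NV in the table
    if fixed_length:
        dfdl_props.update({
            "lengthKind": "explicit",
            "length": str(field_length),            # $FL in the table
            "textTrimKind": "padChar",
            "textPadKind": "padChar",
            "textStringPadCharacter": "%SP;",
            "textStringJustification": "center" if nillable else "left",
        })
    else:
        dfdl_props["lengthKind"] = "delimited"
    return xsd_attrs, dfdl_props

def annotate(element, fixed_length, nillable, nil_value="-", field_length=None):
    """Set the table's attributes on an xs:element and return it."""
    xsd_attrs, dfdl_props = dfdl_properties(
        fixed_length, nillable, nil_value, field_length)
    for name, value in xsd_attrs.items():
        element.set(name, value)
    for name, value in dfdl_props.items():
        # ElementTree addresses namespaced attributes as "{namespace}name".
        element.set(f"{{{DFDL_NS}}}{name}", value)
    return element

ET.register_namespace("dfdl", DFDL_NS)
ET.register_namespace("xs", XS_NS)
runway = annotate(ET.Element(f"{{{XS_NS}}}element", {"name": "RunwayStatus"}),
                  fixed_length=True, nillable=True, field_length=3)
```

Calling ET.tostring(runway) then serializes the element with its dfdl: 
attributes, which can be compared by eye against the RunwayStatus declaration 
in section 1.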
