Thanks, Mike. You make a great point about creating XSDs that use a minimal 
subset of XSD features. I added this to my writeup:

I suggest creating a "very simple XSD" that sticks with a minimal subset of XSD 
features. Why? Because there is a lot of XSD stuff that DFDL doesn't have:

* No attributes
* Only elements can be arrays or optional, that is, have maxOccurs > 1 or 
minOccurs < 1.
* Subset of the XSD simple types, subset of facets depending on that type
* No "all" groups
* No wildcards
* No complex type derivation
* No substitution groups
* No list types
* No key/unique constraints
* Restrictions on union types - all must have same base type
* Daffodil (not DFDL) restricts use of multiple child element declarations 
having the same name.
* .... there are a few more, but that's most of it.

I am creating DFDL schemas for a military data format called USMTF. XSDs 
already exist for the USMTF messages. Fortunately, the people who 
created the USMTF schemas used this minimal subset of XSD features. As a 
result, I am able to immediately use the XSDs as scaffolding onto which DFDL 
properties are added.

From: Mike Beckerle <mbecke...@apache.org>
Sent: Wednesday, August 10, 2022 3:18 PM
To: users@daffodil.apache.org
Subject: [EXT] Re: [UPDATED] *STOP* your DFDL development and READ THIS!

This is really great, Roger. You've zeroed in on a set of very workable design 
patterns that are great for MTF and all its variations.

What I particularly like is the separation of well-formed data (getting where 
and how long the element is) from validity (getting values right).

At the top you say "If you don't have an XSD create one".

I suggest including advice to create a "very simple XSD" that sticks with a 
minimal subset of XSD features.

Because there's lots of XSD stuff DFDL doesn't have:

* No attributes
* Only elements can be arrays or optional, that is, have maxOccurs > 1 or 
minOccurs < 1.
* Subset of the XSD simple types, subset of facets depending on that type
* No "all" groups
* No wildcards
* No complex type derivation
* No substitution groups
* No list types
* No key/unique constraints
* Restrictions on union types - all must have same base type
* Daffodil (not DFDL) restricts use of multiple child element declarations 
having the same name.
* .... there are a few more, but that's most of it.




On Wed, Aug 10, 2022 at 2:30 PM Roger L Costello <coste...@mitre.org> wrote:
Hi Folks,

A lot of complexity got replaced with simplicity, thanks to Mike and Steve.

Here’s the updated information. Lots of changes. If you find any errors, let me 
know.  /Roger
------------------------------------------
Daffodil now supports the -V limited option. The -V limited option is a game 
changer. It totally changes the strategy for creating DFDL schemas. You use 
fewer DFDL properties and more XSD facets. This is huge!

That said, what I am about to describe may or may not fit your DFDL work.

For my work, there already exists an XML Schema (XSD). [If your work doesn’t 
already have an XSD, then create one!] The XSD is scaffolding and all I must do 
is add the appropriate DFDL properties to the scaffolding. All the leaf 
elements in the XSD are of type xs:string and are constrained using pattern or 
enumeration facets. Some data fields are nillable and so their corresponding 
XSD element declarations have nillable="true". Others are non-nillable. Some 
data fields have fixed length. Others have variable length. This message shows 
how to add appropriate DFDL properties to each type of leaf element.

Before doing so, however, let’s see how the -V limited option changes DFDL 
schema development. Prior to the availability of the -V limited option I was 
using dfdl:lengthPattern="regex" to specify leaf elements. As a result, I had 
to:

  *   Convert each enumeration list in the XSD to a regex, where the 
enumeration values became regex alternatives. Then I would sort the 
alternatives longest-to-shortest. For fixed fields I would pad the alternatives 
that weren’t of the required length. And then I would set the sorted, padded 
regex as the value of dfdl:lengthPattern. Now, with the -V limited option I 
leave the enumeration list as it is. I ditched dfdl:lengthPattern. It’s not 
needed anymore.
  *   Convert pattern facets in the XSD to a single regex containing 
alternatives. Then sort the alternatives longest-to-shortest. For fixed fields 
I would pad the alternatives that weren’t of the required length. And then I 
would set the sorted, padded regex as the value of dfdl:lengthPattern. With the 
-V limited option, I no longer process the pattern facet, I use it as is.

The -V limited option means greater use of XSD facets and less need for DFDL 
properties. It means less processing: no more converting enumeration values 
into regex alternatives, no more converting pattern facets into regex 
alternatives, no more sorting regex alternatives in longest-to-shortest order, 
and for fixed fields no more padding alternatives.
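
For anyone who never used the old approach, here is a minimal Python sketch of 
the conversion that the -V limited option makes unnecessary. The helper name 
enumeration_to_length_pattern and its parameters are hypothetical, chosen only 
for illustration, and Python's re syntax stands in for whatever regex dialect 
your DFDL processor uses. It is not how Daffodil works internally.

```python
import re


def enumeration_to_length_pattern(values, fixed_length=None, pad=" "):
    """Turn enumeration values into one regex of alternatives (old workflow).

    Hypothetical helper for illustration only; not a Daffodil API.
    """
    if fixed_length is not None:
        # Fixed fields: right-pad alternatives shorter than the field length.
        values = [v.ljust(fixed_length, pad) for v in values]
    # Longest first, so a short alternative never wins over a longer one
    # that starts with the same characters.
    ordered = sorted(values, key=len, reverse=True)
    # re.escape guards against values that contain regex metacharacters.
    return "|".join(re.escape(v) for v in ordered)


# Variable-length field: alternatives are only sorted.
print(enumeration_to_length_pattern(["AB", "ABC"]))  # ABC|AB

# Fixed-length (6) field: shorter values are padded before matching.
pattern = enumeration_to_length_pattern(["JUPT", "VENUSS", "EAR"], fixed_length=6)
print(bool(re.fullmatch(pattern, "EAR   ")))  # True
print(bool(re.fullmatch(pattern, "EAR")))     # False
```

With -V limited, none of this is needed: the enumeration and pattern facets 
stay in the XSD exactly as written.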

Here is the Desired Parsing Behavior: If data is well-formed and valid, I want 
parsing to produce XML and display no errors. If data is well-formed but not 
valid, I want parsing to produce XML and display errors. If data is not 
well-formed, I want parsing to not produce XML and display errors.

I use the Daffodil -V limited option, as it results in the desired parsing 
behavior.
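
The three cases can be written down as a small decision function. This is only 
a sketch of the behavior I want, not of Daffodil's implementation; the names 
Outcome and desired_outcome are made up for this example.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    produces_xml: bool
    reports_errors: bool


def desired_outcome(well_formed: bool, valid: bool) -> Outcome:
    """Desired parsing behavior for one input (hypothetical helper)."""
    if not well_formed:
        # Data that is not well-formed cannot be valid, so `valid` is ignored.
        return Outcome(produces_xml=False, reports_errors=True)
    # Well-formed data always yields XML; errors appear only when invalid.
    return Outcome(produces_xml=True, reports_errors=not valid)


for well_formed, valid in [(True, True), (True, False), (False, False)]:
    print(well_formed, valid, desired_outcome(well_formed, valid))
```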

As I said above, in my XSD the leaf elements are nillable or not, fixed length 
or not. In other words, there are four types of data fields:

1. Data field is fixed length, nillable

The following element declaration shows how to specify fixed length, nillable 
fields.

Field specification:
>>  Fixed length (3)
>>  Nillable, hyphen is the nil value, the hyphen may be positioned anywhere 
>> within the 3-character field
>>  Values must be left-justified
>>  Values shorter than 3 characters must be padded with spaces

<xs:element name="RunwayStatus"
            nillable="true"
            dfdl:nilKind="literalValue"
            dfdl:nilValue="%WSP*;-"
            dfdl:lengthKind="explicit"
            dfdl:length="3"
            dfdl:textTrimKind="padChar"
            dfdl:textPadKind="padChar"
            dfdl:textStringPadCharacter="%SP;"
            dfdl:textStringJustification="left">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:enumeration value="FLT"/>
            <xs:enumeration value="GVL"/>
            <xs:enumeration value="BRK"/>
            <xs:enumeration value="GDD"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>

In this case all the enumeration values are of the required length (3). Suppose 
some were shorter; would you need to pad them with spaces? No, there is no need 
to pad enumeration values. The combination of dfdl:length="3" and 
dfdl:textStringPadCharacter="%SP;" means that parsing will check that the input 
field has length 3 and, if it contains a value shorter than 3, that the value is 
padded on the right with spaces. The dfdl:textStringJustification="left" 
property specifies that values must be left-justified. This means the following 
input is okay:

…/AB /…

but this is not:

…/ AB/…

If there is no input data available to populate the field, a hyphen is to be 
inserted. In other words, hyphen is the nil value. Of course, even with a nil 
value the field is still required to have length 3, so the hyphen must be 
padded with spaces. dfdl:nilValue="%WSP*;-" specifies that the hyphen may be 
positioned anywhere within the 3-character field.

Let’s see how a DFDL processor parses the element. With the following input 
(note the spaces around the hyphen):

…/ - /…

parsing produces this output:

<RunwayStatus xsi:nil="true"></RunwayStatus>

and unparsing produces this output:

…/-  /…

Notice that unparsing results in moving the hyphen to the left side of the 
field.

With this input:

…/FLT/…

parsing produces this output:

<RunwayStatus>FLT</RunwayStatus>

and unparsing produces this output:

…/FLT/…
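
The round trips above can be mimicked with a few lines of Python. This is a 
simplified model of the behavior just described, not Daffodil's code: the 
function names are hypothetical, the regex \s*- is only a rough stand-in for 
dfdl:nilValue="%WSP*;-", and enumeration checking (validation) is left out.

```python
import re
from typing import Optional

FIELD_LENGTH = 3   # dfdl:length="3"
PAD = " "          # dfdl:textStringPadCharacter="%SP;"
# Rough stand-in for "%WSP*;-": optional whitespace followed by a hyphen.
NIL_RE = re.compile(r"\s*-")


def parse_field(raw: str) -> Optional[str]:
    """Return None for a nil field, else the value with right padding removed."""
    if len(raw) != FIELD_LENGTH:
        raise ValueError(f"expected {FIELD_LENGTH} characters, got {len(raw)}")
    trimmed = raw.rstrip(PAD)  # left-justified, so padding sits on the right
    if NIL_RE.fullmatch(trimmed):
        return None
    return trimmed


def unparse_field(value: Optional[str]) -> str:
    """Write the value (or the nil hyphen) left-justified and space-padded."""
    text = "-" if value is None else value
    if len(text) > FIELD_LENGTH:
        raise ValueError(f"value longer than {FIELD_LENGTH} characters")
    return text.ljust(FIELD_LENGTH, PAD)


print(repr(unparse_field(parse_field(" - "))))  # '-  '
print(repr(unparse_field(parse_field("FLT"))))  # 'FLT'
```

Note that in this model a right-justified value such as " AB" keeps its leading 
space after parsing, which is why it would then fail the facet check.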

If a pattern facet had been used instead of the enumeration facet:

<xs:simpleType>
    <xs:restriction base="xs:string">
        <xs:pattern value="FLT|GVL|BRK|GDD" />
    </xs:restriction>
</xs:simpleType>

everything works the same. That is, the same set of DFDL properties are used.

2. Data field is fixed length, non-nillable

The following element declaration shows how to specify fixed length, 
non-nillable fields.

Field specification:
>>  Fixed length (6)
>>  Values must be left-justified
>>  Values shorter than 6 characters must be padded with spaces

<xs:element name="TimeLabel"
            dfdl:lengthKind="explicit"
            dfdl:length="6"
            dfdl:textTrimKind="padChar"
            dfdl:textPadKind="padChar"
            dfdl:textStringPadCharacter="%SP;"
            dfdl:textStringJustification="left">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:enumeration value="JUPT"/>
            <xs:enumeration value="VENUSS"/>
            <xs:enumeration value="MARSSS"/>
            <xs:enumeration value="SUNNYY"/>
            <xs:enumeration value="EAR"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>

Notice that some of the enumeration values have a length less than the required 
length (6). For example, EAR has a length of only 3. Does that mean we need to 
pad those values with length less than 6? No, there is no need to pad any 
enumeration value. The combination of dfdl:length="6" and 
dfdl:textStringPadCharacter="%SP;" means that parsing will check the input 
field to see that it has length 6 and if it contains a value that is shorter 
than 6, check that it is padded on the right with spaces. The 
dfdl:textStringJustification="left" property specifies that values must be 
left-justified. In other words, this input is okay:

…/EAR   /…

but this is not:

…/   EAR/…

Let’s see how a DFDL processor parses the element. With the following input 
(notice the value has only 4 characters, so it is padded with 2 spaces):

…/JUPT  /…

parsing produces this output:

<TimeLabel>JUPT</TimeLabel>

and unparsing produces this output:

…/JUPT  /…

In our example, the enumeration facet is used. If a pattern facet had been used 
instead of the enumeration facet:

<xs:simpleType>
    <xs:restriction base="xs:string">
        <xs:pattern value="JUPT|VENUSS|MARSSS|SUNNYY|EAR" />
    </xs:restriction>
</xs:simpleType>

everything works the same. That is, the same set of DFDL properties are used.

3. Data field is variable length, nillable

The following element declaration shows how to specify variable length, 
nillable fields.

Field specification:
>>  Variable length (2-20 characters)
>>  Nillable, hyphen is the nil value, if a hyphen is present, it is the only 
>> character in the field

<xs:element name="MessageID"
            nillable="true"
            dfdl:nilKind="literalValue"
            dfdl:nilValue="-">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="[A-Z0-9 ]{2,20}"></xs:pattern>
        </xs:restriction>
    </xs:simpleType>
</xs:element>

Let’s see how a DFDL processor parses the element. With this input:

…/-/…

parsing produces this output:

<MessageID xsi:nil="true"></MessageID>

and unparsing produces this output:

…/-/…

With this input:

…/XRAY/…

parsing produces this output:

<MessageID>XRAY</MessageID>

and unparsing produces this output:

…/XRAY/…


4. Data field is variable length, non-nillable

The following element declaration shows how to specify variable length, 
non-nillable fields.

Field specification:
>>  Variable length (1-7 characters)

<xs:element name="MessageNumber">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="[A-Z0-9 ]{1,7}" />
        </xs:restriction>
    </xs:simpleType>
</xs:element>

Let’s see how a DFDL processor parses the element. With this input:

…/BRAVO/…

parsing produces this output:

<MessageNumber>BRAVO</MessageNumber>

and unparsing produces this output:

…/BRAVO/…

The following table shows how to assign XSD and DFDL properties. The nil values 
and length values shown in the table are from the above examples. Obviously for 
your data you need to replace them with your values.

Properties to add onto the XSD element declaration

                              | Data field:   | Data field:   | Data field:      | Data field:
                              | fixed length, | fixed length, | variable length, | variable length,
Property                      | nillable      | non-nillable  | nillable         | non-nillable
------------------------------+---------------+---------------+------------------+-----------------
nillable                      | true          | n/a           | true             | n/a
dfdl:nilKind                  | literalValue  | n/a           | literalValue     | n/a
dfdl:nilValue                 | %WSP*;-       | n/a           | -                | n/a
dfdl:lengthKind               | explicit      | explicit      | delimited        | delimited
dfdl:length                   | 3             | 6             | n/a              | n/a
dfdl:textTrimKind             | padChar       | padChar       | n/a              | n/a
dfdl:textPadKind              | padChar       | padChar       | n/a              | n/a
dfdl:textStringPadCharacter   | %SP;          | %SP;          | n/a              | n/a
dfdl:textStringJustification  | left          | left          | n/a              | n/a

It should be possible to convert this table into a form that can be used to 
automate the adding of DFDL properties onto element declarations.
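
As a starting point, here is a rough Python sketch of what that automation 
might look like, using the standard xml.etree.ElementTree module. The function 
names and the fixed_length / nil_value parameters are my own invention for this 
example, and I am assuming space padding and left justification as in the 
examples above. Treat it as a sketch, not a finished tool.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
DFDL = "http://www.ogf.org/dfdl/dfdl-1.0/"
ET.register_namespace("xs", XS)
ET.register_namespace("dfdl", DFDL)


def dfdl_properties(fixed_length=None, nil_value=None):
    """Look up the table row values for one kind of leaf element.

    fixed_length: int for fixed-length fields, None for variable length.
    nil_value: the dfdl:nilValue string for nillable fields, else None.
    Hypothetical helper; the values mirror the table above.
    """
    props = {}
    if nil_value is not None:
        props["nillable"] = "true"
        props[f"{{{DFDL}}}nilKind"] = "literalValue"
        props[f"{{{DFDL}}}nilValue"] = nil_value
    if fixed_length is not None:
        props[f"{{{DFDL}}}lengthKind"] = "explicit"
        props[f"{{{DFDL}}}length"] = str(fixed_length)
        props[f"{{{DFDL}}}textTrimKind"] = "padChar"
        props[f"{{{DFDL}}}textPadKind"] = "padChar"
        props[f"{{{DFDL}}}textStringPadCharacter"] = "%SP;"
        props[f"{{{DFDL}}}textStringJustification"] = "left"
    else:
        props[f"{{{DFDL}}}lengthKind"] = "delimited"
    return props


def add_dfdl_properties(element, fixed_length=None, nil_value=None):
    """Set the looked-up properties as attributes on an xs:element."""
    for name, value in dfdl_properties(fixed_length, nil_value).items():
        element.set(name, value)
    return element


# Usage: decorate a bare element declaration as a fixed-length, nillable field.
elem = ET.fromstring(f'<xs:element xmlns:xs="{XS}" name="RunwayStatus"/>')
add_dfdl_properties(elem, fixed_length=3, nil_value="%WSP*;-")
print(ET.tostring(elem, encoding="unicode"))
```

How you decide which fields are fixed length or nillable (the two arguments) 
still has to come from your data format's specification.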
