[Definition] Composite field: a field that is composed of parts. Parts may be 
of fixed or variable length. There is no separator between the parts. The parts 
are non-nillable.


  *   Extend Daffodil with a “composite field” property.

Done!

I spent a couple hours this morning creating a tool that implements a composite 
field capability. Turns out, it was easier than I thought.

My tool is a preprocessor. I named it compositepp (composite preprocessor).

I created a DFDL extension property named ‘composite’. In the example that I 
shared yesterday, the ‘Origin’ element is a composite field. Here’s how to 
indicate in a DFDL schema that a field is composite:

<xs:element name="Origin" dfdlx:composite="true"> … </xs:element>

My tool is run from a command shell like this:

cat origin.dfdlx.xsd | compositepp > origin.dfdl.xsd
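
The source of compositepp is not included in this message. As a rough illustration only, the rule suggested by the example in this message is: a fixed-length part gets an explicit length, and a variable-length part gets a lengthPattern that stops just before the enumerated values of the part that follows it. A minimal Python sketch of that rule is below. The function name dfdl_properties and the (name, fixed length, enumeration values) tuples are hypothetical, and the sketch assumes every variable-length part is immediately followed by a part with enumerated values.

```python
import re


def dfdl_properties(parts):
    """Choose DFDL length properties for the parts of one composite field.

    parts is a list of (name, fixed_length_or_None, enumeration_values)
    tuples; this shape is an assumption made for the sketch.
    """
    plan = []
    for i, (name, fixed_len, _values) in enumerate(parts):
        if fixed_len is not None:
            # Fixed-length part: an explicit length is enough.
            props = {
                "dfdl:lengthKind": "explicit",
                "dfdl:length": str(fixed_len),
            }
        else:
            # Variable-length part: consume lazily up to (but not including)
            # one of the enumerated values of the next part.
            if i + 1 >= len(parts) or not parts[i + 1][2]:
                raise ValueError(
                    "cannot bound variable-length part " + name
                )
            alternatives = "|".join(re.escape(v) for v in parts[i + 1][2])
            props = {
                "dfdl:lengthKind": "pattern",
                "dfdl:lengthPattern": ".*?(?=(" + alternatives + "))",
            }
        plan.append((name, props))
    return plan
```

Called with the first three parts of Origin, that is [("LatitudeDegrees", 2, []), ("LatitudeMinutes", None, []), ("LatitudeHemisphere", 1, ["N", "S"])], the sketch assigns LatitudeMinutes the lengthPattern .*?(?=(N|S)).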

For example, my tool converts a DFDL schema containing this composite field:

<xs:element name="Origin" dfdlx:composite="true">
    <xs:complexType>
        <xs:sequence dfdl:separator="">
            <xs:element name="LatitudeDegrees">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{2}"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LatitudeMinutes">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{2}"/>
                        <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
                        <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
                        <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
                        <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LatitudeHemisphere">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:enumeration value="N"/>
                        <xs:enumeration value="S"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="Hyphen">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:enumeration value="-"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LongitudeDegrees">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{3}"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LongitudeMinutes">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{2}"/>
                        <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
                        <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
                        <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
                        <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LongitudeHemisphere">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:enumeration value="E"/>
                        <xs:enumeration value="W"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
</xs:element>

to this:

<xs:element name="Origin">
    <xs:complexType>
        <xs:sequence dfdl:separator="">
            <xs:element name="LatitudeDegrees"
                                  dfdl:lengthKind="explicit"
                                  dfdl:length="2">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{2}"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LatitudeMinutes"
                                  dfdl:lengthKind="pattern"
                                  dfdl:lengthPattern=".*?(?=(N|S))">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{2}"/>
                        <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
                        <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
                        <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
                        <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LatitudeHemisphere"
                                  dfdl:lengthKind="explicit"
                                  dfdl:length="1">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:enumeration value="N"/>
                        <xs:enumeration value="S"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="Hyphen"
                                  dfdl:lengthKind="explicit"
                                  dfdl:length="1">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:enumeration value="-"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LongitudeDegrees"
                                  dfdl:lengthKind="explicit"
                                  dfdl:length="3">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{3}"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LongitudeMinutes"
                                  dfdl:lengthKind="pattern"
                                  dfdl:lengthPattern=".*?(?=(E|W))">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:pattern value="[0-9]{2}"/>
                        <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
                        <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
                        <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
                        <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
            <xs:element name="LongitudeHemisphere">
                <xs:simpleType>
                    <xs:restriction base="xs:string">
                        <xs:enumeration value="E"/>
                        <xs:enumeration value="W"/>
                    </xs:restriction>
                </xs:simpleType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
</xs:element>
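
To see how the generated length properties carve up a value, the following Python sketch walks a sample string through the same steps. The sample value 4130.25N-07215.5W and the function name split_origin are assumptions made for illustration. Python's re module stands in for the DFDL regular-expression engine, which should behave the same way for these simple patterns. Because the listing above gives LongitudeHemisphere no length properties, the sketch simply lets it take whatever data remains.

```python
import re

# One entry per element of the generated schema:
#   int  -> dfdl:lengthKind="explicit" with that dfdl:length
#   str  -> dfdl:lengthKind="pattern" with that dfdl:lengthPattern
#   None -> no length properties in the listing; take the remaining data
#           (an assumption of this sketch)
STEPS = [
    ("LatitudeDegrees", 2),
    ("LatitudeMinutes", r".*?(?=(N|S))"),
    ("LatitudeHemisphere", 1),
    ("Hyphen", 1),
    ("LongitudeDegrees", 3),
    ("LongitudeMinutes", r".*?(?=(E|W))"),
    ("LongitudeHemisphere", None),
]


def split_origin(data):
    """Split one Origin value into its parts, left to right."""
    parts = {}
    pos = 0
    for name, rule in STEPS:
        if rule is None:
            end = len(data)
        elif isinstance(rule, int):
            end = pos + rule
            if end > len(data):
                raise ValueError("not enough data for " + name)
        else:
            # The lazy .*? grows one character at a time until the
            # lookahead sees a hemisphere letter, which is not consumed.
            match = re.compile(rule).match(data, pos)
            if match is None:
                raise ValueError("no match for " + name)
            end = match.end()
        parts[name] = data[pos:end]
        pos = end
    return parts
```

For the sample value, split_origin("4130.25N-07215.5W") yields LatitudeDegrees "41", LatitudeMinutes "30.25", LatitudeHemisphere "N", Hyphen "-", LongitudeDegrees "072", LongitudeMinutes "15.5" and LongitudeHemisphere "W".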

/Roger

From: Mike Beckerle <mbecke...@apache.org>
Sent: Thursday, August 18, 2022 5:33 PM
To: users@daffodil.apache.org
Subject: [EXT] Re: Request new option in Daffodil

Great idea. I actually think adding new lengthKinds is an important direction 
for Daffodil experimental features.

In fact, I created a JIRA ticket for just such a thing the other day:
https://issues.apache.org/jira/browse/DAFFODIL-2722
This is a much smaller change than you are proposing, however.

But wait: there is also this ticket,
https://issues.apache.org/jira/browse/DAFFODIL-2692, which covers a length
kind "valuePattern" and is directly related to your current discussion.

I added a link to this email thread to that ticket already, because I think 
this discussion is a reinvention of the ideas in that valuePattern ticket in a 
way that would work better, so these new ideas should subsume that ticket. So, 
conveniently, we already have a ticket for this :-)

I do have to say giving this priority is tough for the existing developers 
working on daffodil-library itself. We have an enthusiastic sub-group working 
on a graphical debug/IDE for daffodil which is awesome. But for the basic 
library, we're a small handful of people. There are plenty of JIRA tickets where 
users have no workaround at all for how to parse their data due to either 
missing DFDL features we haven't done yet, or major bugs in them.

So "next release"... probably not. There's a lot of pressure for a next 
release super soon, meaning as soon as one feature, EXI support, is done.

That said, contributions from new developers reflect their 
personal/organizational priorities. This is all open-source after all. So find 
someone to give it the priority you want, and ... voila, it has that priority. 
Magic.

I think we have plenty of XML-centric users and some release soon on our 
roadmap could have a theme of catering to the needs of the XML-enthusiast 
crowd. If we articulate that roadmap release and give it a target such as rel 
3.5.0 or 3.6.0 and have it contain XML-centric features like this as the main 
theme, that could attract developers who are in the more-XML-features camp to 
help implement it.

I would suggest this as the feature theme of the release:

* https://issues.apache.org/jira/browse/DAFFODIL-2636
* https://issues.apache.org/jira/browse/DAFFODIL-2692
* https://issues.apache.org/jira/browse/DAFFODIL-2722 (this one is pretty small 
work)
* possibly 
https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Extend+DFDL+with+XML+Attribute+Support


On Thu, Aug 18, 2022 at 3:15 PM Roger L Costello 
<coste...@mitre.org> wrote:
Hi Folks,

I request a new option be added to Daffodil. I don't have a name for the 
option, but here's the intent of the option:

        When an element in the schema has a simpleType
        that contains facets, use those facets to specify
        the content of a data field.

For example, with this element declaration:

<xs:element name="LatitudeDegrees">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{2}" />
        </xs:restriction>
    </xs:simpleType>
</xs:element>

The pattern facet specifies that the field is two digit characters.

Today, I can't do that. Instead, I have to add two DFDL properties to specify 
that the field's length is two:

<xs:element name="LatitudeDegrees"
                       dfdl:lengthKind="explicit"
                       dfdl:length="2">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{2}" />
        </xs:restriction>
    </xs:simpleType>
</xs:element>

There should be no reason for having to add those DFDL properties. The XSD 
pattern facet already tells you that the length is two.
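
As a rough illustration of that point, a fixed length can be read off a simple pattern facet mechanically. The Python sketch below is not part of Daffodil: fixed_length and shared_length are hypothetical names, and only a small subset of XSD regex syntax is handled (literal characters, escapes such as \., simple character classes, and an exact {n} repeat). Anything else is conservatively reported as "not fixed" by returning None.

```python
import re

# One atom (character class, escaped character, or plain literal),
# optionally followed by an exact repeat count such as {2}.
_ATOM = re.compile(r"(\[[^\]]+\]|\\.|[^\\\[\](){}|*+?.^$])(?:\{(\d+)\})?")


def fixed_length(pattern):
    """Return the length implied by one pattern facet, or None if the
    pattern is not obviously fixed-length."""
    pos = 0
    total = 0
    while pos < len(pattern):
        match = _ATOM.match(pattern, pos)
        if match is None:
            # Unsupported syntax (e.g. +, *, {2,4}, groups): give up.
            return None
        total += int(match.group(2)) if match.group(2) else 1
        pos = match.end()
    return total


def shared_length(patterns):
    """Return the single fixed length shared by all pattern facets of a
    simpleType, or None if they differ or any of them is not fixed."""
    lengths = {fixed_length(p) for p in patterns}
    if len(lengths) == 1:
        return lengths.pop()
    return None
```

With this sketch, fixed_length("[0-9]{2}") returns 2, while a simpleType with several patterns of different lengths (like the minutes fields elsewhere in this thread) comes back as None from shared_length, signalling a variable-length part.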

Now, you might argue: "What's so hard about adding those two properties?"

You are correct for this specific instance. It's easy if we are building the 
DFDL schema manually, hardcoding every XSD element declaration.

But if we want to write a program that can input arbitrary XSD and 
automatically apply the appropriate DFDL properties, then things aren't so 
easy. Case in point:

Write a program that inputs an arbitrary sequence of XSD element declarations. 
The sequence of elements represents the parts of one data field. Each part may 
be of fixed or variable length. There is no separator between the parts. The 
parts are non-nillable. The program must output the element declarations with 
the appropriate DFDL properties added.

It is probably impossible to write such a program with today's DFDL. Or at 
least, very difficult.

If DFDL leveraged the XSD facets, then that would greatly simplify the DFDL 
schema. And, it would enable programs to be written to automate the production 
of DFDL schemas.

I recommend including such an option in the next release of Daffodil.

I thought that the -V limited option was doing what I describe above. Sadly, I 
realized today that it doesn't.

/Roger
