[Definition] Composite field: a field that is composed of parts. Parts may be of fixed or variable length. There is no separator between the parts. The parts are non-nillable.
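To make the definition concrete, here is a minimal Python sketch that splits one composite value into its parts, using the parts of the Origin field described in the message below. The sample value `4807.5N-12215.25W` and the function name `split_origin` are illustrative assumptions, not taken from the thread, and the regular expression is a hand-written condensation of the XSD facets rather than anything Daffodil produces.

```python
import re

# One named group per part, in order, with no separator between parts.
# The minutes parts are variable length: two digits, optionally followed
# by a decimal point and one to four more digits.
ORIGIN = re.compile(
    r"(?P<LatitudeDegrees>[0-9]{2})"
    r"(?P<LatitudeMinutes>[0-9]{2}(?:\.[0-9]{1,4})?)"
    r"(?P<LatitudeHemisphere>[NS])"
    r"(?P<Hyphen>-)"
    r"(?P<LongitudeDegrees>[0-9]{3})"
    r"(?P<LongitudeMinutes>[0-9]{2}(?:\.[0-9]{1,4})?)"
    r"(?P<LongitudeHemisphere>[EW])"
)


def split_origin(text):
    """Split a composite Origin value into a dict of its parts."""
    match = ORIGIN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid Origin value: {text!r}")
    return match.groupdict()


# Hypothetical sample value, chosen only for illustration.
parts = split_origin("4807.5N-12215.25W")
print(parts)
```

The variable-length parts can be delimited only because the part that follows each of them (a hemisphere letter) cannot appear inside it.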
* Extend Daffodil with a “composite field” property.

Done! I spent a couple hours this morning creating a tool that implements a composite field capability. Turns out, it was easier than I thought.

My tool is a preprocessor. I named it compositepp (composite preprocessor). I created a DFDL extension property named ‘composite’. In the example that I shared yesterday, the ‘Origin’ element is a composite field. Here’s how to indicate in a DFDL schema that a field is composite:

<xs:element name="Origin" dfdlx:composite="true">
  …
</xs:element>

My tool is run from a command shell like this:

    cat origin.dfdlx.xsd | compositepp > origin.dfdl.xsd

For example, my tool converts a DFDL schema containing this composite field:

<xs:element name="Origin" dfdlx:composite="true">
  <xs:complexType>
    <xs:sequence dfdl:separator="">
      <xs:element name="LatitudeDegrees">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{2}"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="LatitudeMinutes">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{2}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="LatitudeHemisphere">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="N"/>
            <xs:enumeration value="S"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="Hyphen">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="-"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="LongitudeDegrees">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{3}"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="LongitudeMinutes">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{2}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="LongitudeHemisphere">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="E"/>
            <xs:enumeration value="W"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

to this:

<xs:element name="Origin">
  <xs:complexType>
    <xs:sequence dfdl:separator="">
      <xs:element name="LatitudeDegrees" dfdl:lengthKind="explicit" dfdl:length="2">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{2}"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="LatitudeMinutes" dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=(N|S))">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{2}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="LatitudeHemisphere" dfdl:lengthKind="explicit" dfdl:length="1">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="N"/>
            <xs:enumeration value="S"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="Hyphen" dfdl:lengthKind="explicit" dfdl:length="1">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="-"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="LongitudeDegrees" dfdl:lengthKind="explicit" dfdl:length="3">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{3}"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="LongitudeMinutes" dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=(E|W))">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{2}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{1}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{2}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{3}"/>
            <xs:pattern value="[0-9]{2}\.[0-9]{4}"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
      <xs:element name="LongitudeHemisphere">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="E"/>
            <xs:enumeration value="W"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

/Roger

From: Mike Beckerle <mbecke...@apache.org>
Sent: Thursday, August 18, 2022 5:33 PM
To: users@daffodil.apache.org
Subject: [EXT] Re: Request new option in Daffodil

Great idea.

I actually think adding new lengthKinds is an important direction for Daffodil experimental features. In fact I just created a JIRA ticket for such the other day: https://issues.apache.org/jira/browse/DAFFODIL-2722

This is a much smaller change than you are proposing however.

But...wait...There is also this ticket https://issues.apache.org/jira/browse/DAFFODIL-2692 which is length kind "valuePattern" which is directly related to your current discussion. I added a link to this email thread to that ticket already, because I think this discussion is a reinvention of the ideas in that valuePattern ticket in a way that would work better, so these new ideas should subsume that ticket.

So, conveniently, we already have a ticket for this :-)

I do have to say giving this priority is tough for the existing developers working on daffodil-library itself. We have an enthusiastic sub-group working on a graphical debug/IDE for daffodil which is awesome. But for the basic library, we're a small handful of people. There's plenty of JIRA tickets where users have no workaround at all for how to parse their data due to either missing DFDL features we haven't done yet, or major bugs in them.

So "next release".... probably not. There's a lot of pressure for a next release super soon meaning, as soon as one feature: EXI support, is done.

That said, contributions from new developers reflect their personal/organizational priorities. This is all open-source after all. So find someone to give it the priority you want, and ... voila, it has that priority. Magic.

I think we have plenty of XML-centric users and some release soon on our roadmap could have a theme of catering to the needs of the XML-enthusiast crowd. If we articulate that roadmap release and give it a target such as rel 3.5.0 or 3.6.0 and have it contain XML-centric features like this as the main theme, that could attract developers who are in the more-XML-features camp to help implement it.

I would suggest this as the feature theme of the release:

* https://issues.apache.org/jira/browse/DAFFODIL-2636
* https://issues.apache.org/jira/browse/DAFFODIL-2692
* https://issues.apache.org/jira/browse/DAFFODIL-2722 (this one is pretty small work)
* possibly https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Extend+DFDL+with+XML+Attribute+Support

On Thu, Aug 18, 2022 at 3:15 PM Roger L Costello <coste...@mitre.org> wrote:

Hi Folks,

I request a new option be added to Daffodil. I don't have a name for the option, but here's the intent of the option: When an element in the schema has a simpleType that contains facets, use those facets to specify the content of a data field. For example, with this element declaration:

<xs:element name="LatitudeDegrees">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{2}" />
    </xs:restriction>
  </xs:simpleType>
</xs:element>

The pattern facet specifies that the field is two digit characters. Today, I can't do that.
Instead, I have to add two DFDL properties to specify that the field's length is two:

<xs:element name="LatitudeDegrees" dfdl:lengthKind="explicit" dfdl:length="2">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{2}" />
    </xs:restriction>
  </xs:simpleType>
</xs:element>

There should be no reason for having to add those DFDL properties. The XSD pattern facet already tells you that the length is two.

Now, you might argue: "What's so hard about adding those two properties?" You are correct for this specific instance. It's easy if we are building the DFDL schema manually, hardcoding every XSD element declaration. But if we want to write a program that can input arbitrary XSD and automatically apply the appropriate DFDL properties, then things aren't so easy.

Case in point: Write a program that inputs an arbitrary sequence of XSD element declarations. The sequence of elements represents the parts of one data field. Each part may be of fixed or variable length. There is no separator between the parts. The parts are non-nillable. The program must output the element declarations with the appropriate DFDL properties added. It is probably impossible to write such a program with today's DFDL. Or at least, very difficult.

If DFDL leveraged the XSD facets, then that would greatly simplify the DFDL schema. And, it would enable programs to be written to automate the production of DFDL schemas.

I recommend including such an option in the next release of Daffodil.

I thought that the -V limited option was doing what I describe above. Sadly, I realized today that it doesn't.

/Roger
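
For the narrow case shown in this thread, the "case in point" program can be sketched in a few lines of Python. This is not the compositepp implementation, which the thread does not show; the function names (`facet_values`, `fixed_length`, `annotate`) are hypothetical, and the rules are assumptions inferred from the Origin example: a part is fixed length if its enumeration values all have the same length or if it has a single pattern of the form `[...]{n}`, and a variable-length part is ended by a lookahead on the enumeration values of the part that follows it. Arbitrary XSD patterns are well beyond what this handles.

```python
import re
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
DFDL = "http://www.ogf.org/dfdl/dfdl-1.0/"
ET.register_namespace("xs", XS)
ET.register_namespace("dfdl", DFDL)

# Recognises only the simplest fixed-width patterns, e.g. "[0-9]{2}".
FIXED_PATTERN = re.compile(r"\[[^\]]+\]\{(\d+)\}")


def facet_values(element, facet):
    """Return the value attributes of the named facet under an element."""
    return [f.get("value") for f in element.iter(f"{{{XS}}}{facet}")]


def fixed_length(element):
    """Return the part's length if its facets pin it down, else None."""
    enums = facet_values(element, "enumeration")
    if enums:
        lengths = {len(v) for v in enums}
        return lengths.pop() if len(lengths) == 1 else None
    patterns = facet_values(element, "pattern")
    if len(patterns) == 1:
        match = FIXED_PATTERN.fullmatch(patterns[0])
        if match:
            return int(match.group(1))
    return None


def annotate(sequence):
    """Add DFDL length properties, in place, to each part in an xs:sequence."""
    parts = sequence.findall(f"{{{XS}}}element")
    for i, part in enumerate(parts):
        n = fixed_length(part)
        if n is not None:
            part.set(f"{{{DFDL}}}lengthKind", "explicit")
            part.set(f"{{{DFDL}}}length", str(n))
            continue
        # Variable length: stop just before an enumerated value of the next part.
        nxt = parts[i + 1] if i + 1 < len(parts) else None
        stops = facet_values(nxt, "enumeration") if nxt is not None else []
        if not stops:
            raise ValueError(f"cannot infer the length of {part.get('name')}")
        alternatives = "|".join(re.escape(s) for s in stops)
        part.set(f"{{{DFDL}}}lengthKind", "pattern")
        part.set(f"{{{DFDL}}}lengthPattern", f".*?(?=({alternatives}))")


# A cut-down version of the latitude half of the Origin field.
SAMPLE = """
<xs:sequence xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" dfdl:separator="">
  <xs:element name="LatitudeDegrees"><xs:simpleType><xs:restriction base="xs:string">
    <xs:pattern value="[0-9]{2}"/>
  </xs:restriction></xs:simpleType></xs:element>
  <xs:element name="LatitudeMinutes"><xs:simpleType><xs:restriction base="xs:string">
    <xs:pattern value="[0-9]{2}"/>
    <xs:pattern value="[0-9]{2}\\.[0-9]{1}"/>
  </xs:restriction></xs:simpleType></xs:element>
  <xs:element name="LatitudeHemisphere"><xs:simpleType><xs:restriction base="xs:string">
    <xs:enumeration value="N"/>
    <xs:enumeration value="S"/>
  </xs:restriction></xs:simpleType></xs:element>
</xs:sequence>
"""

sequence = ET.fromstring(SAMPLE.strip())
annotate(sequence)
print(ET.tostring(sequence, encoding="unicode"))
```

Unlike the converted schema shown earlier in the thread, this sketch annotates every fixed-length part, including the last one. It raises an error as soon as a variable-length part is followed by anything other than an enumerated part, which is the difficulty the original request describes.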