[Apologies for the delayed response; a hiccup with our GitLab version control is now fixed. All related work should now be publicly visible and usable.]
Mike, your capability request below sounds like an excellent match for our "DFDL Attribution" project, which took a pipeline approach to this long-standing challenge.

* Data Format Description Language (DFDL) "Attribution" Project
* https://gitlab.nps.edu/Savage/robodata/-/blob/master/DFDL/attribution/README.md
* This project is working to show that additional DFDL support for XML attributes is feasible by using a "pipeline" approach to processing.
* Good initial progress has been made that allows use of an attribute-aware XML schema. Pre- and post-processing XSLT stylesheets can convert XML documents and schemas into equivalent element-only form that DFDL can use to parse/unparse data documents.
* https://gitlab.nps.edu/Savage/robodata/-/raw/master/DFDL/attribution/images/DfdlXmlElementAttributeTransformations.png

Because XSLT preprocessing and postprocessing handle both the intermediate and destination XML documents and their XSD schemas, this approach can be used with any DFDL processor, either by tool builders or by data engineers.

The essence of this approach is that XML attributes are converted into unique XML child elements. This enables a DFDL parser to remain element-aware and attribute-unaware.

Here are some illustrations of the correspondences we chose. They are pretty unambiguous: each child-element name simply prefixes an underscore to the attribute name.

* https://gitlab.nps.edu/Savage/robodata/-/blob/master/DFDL/attribution/images/DocumentTransformationsSchemaView.png
* https://gitlab.nps.edu/Savage/robodata/-/blob/master/DFDL/attribution/images/DocumentTransformationsTreeView.png
* https://gitlab.nps.edu/Savage/robodata/-/blob/master/DFDL/attribution/images/DocumentTransformationsXmlView.png

We have only tested it for one or two cases, but have designed it to be general. The primary test case shown in the preceding diagrams is quite general. Further testing is welcome. There are plenty of achievable TODO items in the README page.
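To make the underscore convention concrete, here is a minimal Python sketch of the round trip. It is only an illustration and is not the project's XSLT stylesheets: the function names, the sample "segment" document, and the rule that every child whose name starts with an underscore was once an attribute are assumptions made for this example. A source vocabulary that already uses leading underscores in element names would need a different marker.

```python
import xml.etree.ElementTree as ET

PREFIX = "_"  # assumption: mirrors the underscore convention described above


def attributes_to_elements(elem):
    """Return an element-only copy: each attribute becomes a child named _<attribute>."""
    out = ET.Element(elem.tag)
    out.text = elem.text
    # Attribute children come first, in the original attribute order.
    for name, value in elem.attrib.items():
        ET.SubElement(out, PREFIX + name).text = value
    # Then convert the real children recursively, keeping trailing text.
    for sub in elem:
        converted = attributes_to_elements(sub)
        converted.tail = sub.tail
        out.append(converted)
    return out


def elements_to_attributes(elem):
    """Inverse step: fold every _<name> child back into an attribute."""
    out = ET.Element(elem.tag)
    out.text = elem.text
    for sub in elem:
        if sub.tag.startswith(PREFIX):
            out.set(sub.tag[len(PREFIX):], sub.text or "")
        else:
            restored = elements_to_attributes(sub)
            restored.tail = sub.tail
            out.append(restored)
    return out


if __name__ == "__main__":
    # Hypothetical sample document, not taken from the project.
    source = ET.fromstring('<segment id="GS" version="1"><field>abc</field></segment>')
    element_only = attributes_to_elements(source)
    print(ET.tostring(element_only, encoding="unicode"))
    print(ET.tostring(elements_to_attributes(element_only), encoding="unicode"))
```

The element-only form is what a DFDL processor would see; the inverse function restores the attribute form afterwards. The same mapping has to be applied to the schema as well, which this sketch does not attempt.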
(deep breath) Here is why such a capability is really important. DFDL offers magnificent capabilities. However, you cannot go from (some arbitrary dataset) to (some existing XML format). This is a major limitation on DFDL utility for mapping arbitrary regular data into widely used XML data forms.

(second deep breath) If DFDL pipelines can achieve full support for conversions to and from XML, they can also take full advantage of Efficient XML Interchange (EXI) compression. The EXI Recommendation algorithms have been experimentally shown to meet or beat any other general compression scheme (for example ZIP and GZIP), and further offer significantly faster (and more computationally efficient) data decompression.

I personally believe that adding full DFDL support for mapping to/from XML can eliminate a major barrier inhibiting more widespread DFDL employment.

All questions and collaborative efforts welcome. Very respectfully yours.

all the best, Don
--
Don Brutzman Naval Postgraduate School, Code USW/Br brutz...@nps.edu
Watkins 270, MOVES Institute, Monterey CA 93943-5000 USA +1.831.656.2149
X3D graphics, virtual worlds, navy robotics https://faculty.nps.edu/brutzman
________________________________
From: Mike Beckerle <mbecke...@apache.org>
Sent: Thursday, October 31, 2024 6:30 AM
To: Brutzman, Donald (Don) (CIV) <brutz...@nps.edu>
Cc: Claude Mamo <claude.m...@gmail.com>; Roger L Costello <coste...@mitre.org>; Norbraten, Terry (CIV) <tdnor...@nps.edu>; Blais, Curtis (Curt) (CIV) <clbl...@nps.edu>; users@daffodil.apache.org <users@daffodil.apache.org>
Subject: Re: Proposal: Extensible DFDL

Besides converting the XML instances to/from an attribute-centric form, there is the need for an XML Schema that describes that form. Converting a DFDL schema, or element-oriented XSD, into one which describes the attribute-oriented variant is non-trivial in the general case.

Has anyone worked on tooling for that?

On Wed, Oct 30, 2024 at 6:22 PM Brutzman, Donald (Don) (CIV) <brutz...@nps.edu> wrote:

[cc: Curt]

Thanks for the updated status.

Of relevant note is that our NPS team came up with a round-trip approach that converts arbitrary element-attribute XML to corresponding element-element XML, then back again. XSLT is used in each direction. Online at

* https://gitlab.nps.edu/Savage/robodata/-/tree/master/DFDL/attribution
* https://gitlab.nps.edu/Savage/robodata/-/blob/master/DFDL/attribution/README.md

However... I am surprised to see that this is not publicly accessible. My mistake - apologies for the inconvenience, we will carefully work towards releasing it.

* Terry, can we please review together (there are some other things available in parent robodata project) to ensure that we can indeed go fully public with the project.
* Attached please find advance copies of some of the screenshots.

A summary description appears on pages 3..6 of the following white paper.

* Data Strategy for Unmanned Systems: Field Experimentation (FX), Simulation and Analysis
* https://nps.edu/web/now/data-strategy-for-autonomous-systems
* https://nps.edu/documents/151816058/0/DataStrategyUnmannedSystemsTechnicalMemorandum2023January25.pdf

As before: regardless of how complex the implementation of a DFDL processor might be, if this stylesheet is indeed general, then it might serve as a DFDL preprocessor/postprocessor for handling attribute-aware DFDL schemas.

Some additional thoughts:

* Perhaps DFDL parsing/unparsing of XML of a source document that includes attributes might provide another angle on this problem.
* You won't catch me using ChatGPT but adding descriptions within DFDL schema might further encourage automated translation.

Mike and Roger, if a meeting discussing this topic might help, I can be available during second half of November.

Very respectfully yours.

all the best, Don
________________________________
From: Mike Beckerle <mbecke...@apache.org>
Sent: Wednesday, October 30, 2024 7:08 AM
To: users@daffodil.apache.org <users@daffodil.apache.org>
Cc: Roger L Costello <coste...@mitre.org>; Norbraten, Terry (CIV) <tdnor...@nps.edu>; Brutzman, Donald (Don) (CIV) <brutz...@nps.edu>
Subject: Re: Proposal: Extensible DFDL

That proposal for XML attributes in DFDL has not been prototyped. I believe it is not ready - still only half baked. E.g., the implications of XML attributes' whitespace collapsing behavior are very problematic when using an XML attribute to logically represent data that physically does not conform. XML attributes are entirely unable to represent data that contains, for example, multiple adjacent space characters, or line-endings. If whitespace is significant, attributes won't work.

Today there is XSLT and AI. E.g., ChatGPT seems to be able to write XSLT very well from XML snippets and a description or example of what you want out of the transformation. The whole burden of having to write symmetric transforms - one for parsing, the inverse for unparsing - is eliminated when ChatGPT writes them both for you.

On Wed, Oct 30, 2024 at 4:58 AM Claude Mamo <claude.m...@gmail.com> wrote:

Was there movement on creating attributes from DFDL? I found this https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Extend+DFDL+with+XML+Attribute+Support but does someone know whether this will be available anytime soon?

A bit of context.
I have a scenario in EDI X12 where (1) the DFDL schema is very generic and (2) the segment ID needs to be an attribute in the parent element so that the XPath selectors in Smooks don't easily break when routing segments. Unfortunately, due to the streaming nature of Smooks, I can't use something like this for the selector: /interchange/segment[segmentId/text() = "GS"]. The workaround so far is to use indexes (e.g., /interchange/segment[2]) but this is bad for various reasons.

Thanks,

Claude

On Thu, Nov 2, 2023 at 5:10 PM Brutzman, Donald (Don) (CIV) <brutz...@nps.edu> wrote:

I think that the single most significant and powerful extension capability for DFDL would be to support attributes.

XML Schema is highly extensible already and widely deployed. JSON Schema is pretty consistent and has similar expressive power - if ever finally standardized and consistently supported in tools, it might further broaden the available information-architecture infrastructure for many applications and much of the Web.

The ability to align DFDL directly with any XML Schema, to support consistent mappings of diverse datasets with coherent data models, would be a major increase in DFDL capability.

p.s. long-held opinion: skateboards and attributes are not a crime… 8)

all the best, Don

From: Mike Beckerle <mbecke...@apache.org>
Sent: Thursday, November 2, 2023 8:27 AM
To: users@daffodil.apache.org
Subject: Re: Proposal: Extensible DFDL

I think extensibility would be great for DFDL.

The DFDL workgroup punted on this as there was no such thing as an extensible format description language to generalize into a standard.
We realized that unparsing was already breaking a lot of new ground, but it was a must-have feature. So we had to draw a line somewhere on the number of untested new concepts in DFDL or it would never get done. It took 20 years as is to become standardized.

Some format description languages may have been implemented this extensible way, but that was not a visible user feature in any one that I ever saw.

As a research effort this is a good idea. Daffodil is available for use in prototyping if that's useful, and if it turns out to be valuable it could be proposed for inclusion in DFDL in the future.

Some years ago I suggested this to someone as a thesis topic for a CS PhD project, but to my knowledge it didn't go anywhere.

On Thu, Nov 2, 2023 at 10:20 AM Roger L Costello <coste...@mitre.org> wrote:

Hi Folks,

Consider this input containing a date time value:

    20230926T124800Z

We can design the DFDL, using the xs:dateTime datatype and associated DFDL calendar properties, so that parsing produces this XML:

    <DateTimeIso>2023-09-26T12:48:00+00:00</DateTimeIso>

That is beautiful XML - concise and precise.

Next, consider input containing a lat/long value:

    2006N-05912E

It would be excellent if we could design the DFDL so that parsing produces this:

    <OriginOfBearing>20°06′N 059°12′E</OriginOfBearing>

That is also beautiful XML.

In fact, it is possible to achieve this! By hiding the input and then performing a bunch of transformations using dfdl:inputValueCalc. However, that's a terrible approach because, as Mike Beckerle often says, "DFDL is not a transformation language!"

If only we had a latlong datatype and associated DFDL latlong properties ..... If only we could extend DFDL .......

How about making DFDL extensible? How about allowing users of DFDL to create their own datatypes (actually, XSD already allows this) and allow users to create their own DFDL properties for the user-defined datatype?
That is, how about turning DFDL into extensible DFDL?

Thoughts?

/Roger
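
For readers who want to see the two conversions from Roger's message spelled out, here is a small Python sketch of them as an external step outside DFDL, in the spirit of the pipeline idea discussed earlier in the thread. It is not DFDL and does not describe how Daffodil works. The function names are hypothetical, and the lat/long layout (two-digit degrees and minutes, a hemisphere letter, a hyphen, then three-digit degrees and two-digit minutes with a hemisphere letter) is an assumption inferred from the single example 2006N-05912E.

```python
import re
from datetime import datetime, timezone

# Assumed layout, inferred from one example: ddmm[N|S]-dddmm[E|W]
_LATLONG = re.compile(r"(\d{2})(\d{2})([NS])-(\d{3})(\d{2})([EW])")

DEGREE = "\u00b0"  # degree sign
PRIME = "\u2032"   # prime, used here for arc minutes


def compact_datetime_to_iso(text):
    """Convert a value like 20230926T124800Z to ISO 8601 with a UTC offset."""
    parsed = datetime.strptime(text, "%Y%m%dT%H%M%SZ")
    # The trailing Z means UTC, so attach that time zone explicitly.
    return parsed.replace(tzinfo=timezone.utc).isoformat()


def compact_latlong_to_display(text):
    """Convert a value like 2006N-05912E to degrees/minutes display form."""
    match = _LATLONG.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized lat/long value: {text!r}")
    lat_deg, lat_min, lat_hem, lon_deg, lon_min, lon_hem = match.groups()
    return (
        f"{lat_deg}{DEGREE}{lat_min}{PRIME}{lat_hem} "
        f"{lon_deg}{DEGREE}{lon_min}{PRIME}{lon_hem}"
    )


if __name__ == "__main__":
    print(compact_datetime_to_iso("20230926T124800Z"))
    print(compact_latlong_to_display("2006N-05912E"))
```

A step like this only covers the parse direction; unparsing would need the inverse function, which is the symmetry burden mentioned earlier in the thread.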