If I understand this correctly, you are solving the primary issue, namely that
attribute declarations are not in the right lexical position in the XSD,
by essentially requiring the schema author to add element tiers so that the
declaration order matches the physical order.

So for example: I have a record of data that looks like: (a, b, c), d, e,
(f, g, h)
I want a vector v1 of elements containing a, b, and c,
then I want two attributes to hold d and e.
Then I want a vector v2 of elements containing f, g, and h.

Here's what I can't achieve, because the declarations for the d and e
attributes would come after the element declarations for v2:

<record d="d" e="e">
  <v1>a</v1><v1>b</v1><v1>c</v1>
  <v2>f</v2><v2>g</v2><v2>h</v2>
</record>

But I can achieve this by introducing an element that holds the attributes d
and e at the physical location between the two data vectors:

<record>
  <v1>a</v1><v1>b</v1><v1>c</v1>
  <attributes d="d" e="e"/>
  <v2>f</v2><v2>g</v2><v2>h</v2>
</record>

If this is acceptable then this is well worth considering further.
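
To make the ordering point concrete, here is a small Python sketch. It is
illustrative only: it uses the standard-library ElementTree parser, not
Daffodil or any DFDL API, and the helper name fields_in_document_order is made
up for this example. It walks each of the two documents above from start to
end and lists the field values in the order they are encountered. Note that
XML itself attaches no meaning to the order of attributes within one start
tag; the sketch simply relies on the parser reporting them as written.

```python
import xml.etree.ElementTree as ET

# The element-tier form: d and e sit between the two vectors.
ATTRIBUTE_TIER = """\
<record>
  <v1>a</v1><v1>b</v1><v1>c</v1>
  <attributes d="d" e="e"/>
  <v2>f</v2><v2>g</v2><v2>h</v2>
</record>"""

# The form that cannot match the physical order: d and e on <record>.
ATTRIBUTES_ON_RECORD = """\
<record d="d" e="e">
  <v1>a</v1><v1>b</v1><v1>c</v1>
  <v2>f</v2><v2>g</v2><v2>h</v2>
</record>"""


def fields_in_document_order(xml_text):
    """Return field values in the order a start-to-end walk meets them."""
    values = []
    for element in ET.fromstring(xml_text).iter():
        # Attributes live on the start tag, so they are always met
        # before any of that element's children.
        values.extend(element.attrib.values())
        if element.text and element.text.strip():
            values.append(element.text.strip())
    return values


tiered = fields_in_document_order(ATTRIBUTE_TIER)
hoisted = fields_in_document_order(ATTRIBUTES_ON_RECORD)
print(tiered)
print(hoisted)
```

Only the element-tier form yields the values in the physical order
(a, b, c), d, e, (f, g, h); with the attributes on <record>, d and e come
first.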

Using XML attributes would place some constraints on the physical data that
you can choose to put into attributes. E.g., the data can't contain leading or
trailing whitespace that needs to be preserved, or adjacent whitespace
inside it that needs to be preserved, because of XML attribute whitespace
collapsing. Either that, or some sort of non-ordinary escaping may be required,
such as using DFDL character entities.
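
A quick way to see part of the whitespace problem is the Python sketch below.
It uses only the standard-library ElementTree parser, which is non-validating,
so it shows just the baseline attribute-value normalization that every XML
parser applies: literal tabs and line endings become spaces. Further collapsing
of leading, trailing, or adjacent spaces depends on the declared attribute
type (for example xs:token) and is not shown here. The escaped variant uses
ordinary XML character references, not DFDL character entities; it is included
only to illustrate that some form of escaping is needed.

```python
import xml.etree.ElementTree as ET

# A literal tab and newline inside an attribute value are each
# normalized to a single space by the parser.
literal = ET.fromstring('<attributes d="x\ty\nz"/>').get("d")

# The same characters written as character references survive parsing.
escaped = ET.fromstring('<attributes d="x&#9;y&#10;z"/>').get("d")

# Element content keeps the tab and newline as they were written.
content = ET.fromstring("<d>x\ty\nz</d>").text

print(repr(literal), repr(escaped), repr(content))
```

So data with significant line endings or tabs can round-trip through element
content unchanged, but through an attribute only if it is escaped on the way
out.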

There is also the issue of the implied sequence surrounding attribute
declarations. In my little example above, the d and e fields are comma
separated, but there would be no sequence group around their declarations. This
can be finessed by letting dfdl:separator and the related properties that we
normally place on xs:sequence also be placed on xs:complexType, where such
properties would apply to both the sequence that is its model group and any
declared attributes.

I expect these sorts of issues could be worked out.

re: EXI

Daffodil supports EXI today. There is no need to further transform
DFDL's output XML Infosets to take advantage of EXI's density.



On Thu, Nov 7, 2024 at 9:00 AM Brutzman, Donald (Don) (CIV) <
brutz...@nps.edu> wrote:

> [Apologies for delayed response, hiccup with our gitlab version control
> now fixed.  All related work should now be publicly visible and usable.]
>
> Mike, your capability request below sounds like an excellent match for the
> capabilities of our "DFDL Attribution" project, which took a pipeline
> approach to this long-standing challenge.
>
>    - Data Format Description Language (DFDL) "Attribution" Project
>    - https://gitlab.nps.edu/Savage/robodata/-/blob/master/DFDL/attribution/README.md
>    - This project is working to show that additional DFDL support for XML
>    attributes is feasible by using a "pipeline" approach to processing.
>    - Good initial progress has been made that allows use of an
>    attribute-aware XML schema. Pre- and post-processing XSLT stylesheets can
>    convert XML documents and schemas into equivalent element-only form that
>    DFDL can use to parse/unparse data documents.
>    - https://gitlab.nps.edu/Savage/robodata/-/raw/master/DFDL/attribution/images/DfdlXmlElementAttributeTransformations.png
>
> XSLT preprocessing and postprocessing of both intermediate and destination
> XML documents and XML XSD schema means that this approach can be used with
> any DFDL processor, either by tool builders or by data engineers.
>
> The essence of this approach is that XML attributes are converted into
> unique XML child elements.  This enables a DFDL parser to remain
> element-aware and attribute-unaware.   Here are some illustrations for the
> correspondences we chose... Pretty unambiguous, child-element names simply
> prefix an underscore to attribute name.
>
>    - https://gitlab.nps.edu/Savage/robodata/-/blob/master/DFDL/attribution/images/DocumentTransformationsSchemaView.png
>    - https://gitlab.nps.edu/Savage/robodata/-/blob/master/DFDL/attribution/images/DocumentTransformationsTreeView.png
>    - https://gitlab.nps.edu/Savage/robodata/-/blob/master/DFDL/attribution/images/DocumentTransformationsXmlView.png
>
> We have only tested it for one or two cases, but have designed it to be
> general.  The primary test case shown in the preceding diagrams is quite
> general.
>
> Further testing welcome.  There are plenty of achievable TODO items in the
> README page.
>
> (deep breath) Here is why such a capability is really important.  DFDL
> offers magnificent capabilities.  However you cannot go from (some
> arbitrary dataset) to (some existing XML format).  This is a major
> limitation on DFDL utility for mapping arbitrary regular data into widely
> used XML data forms.
>
> (second deep breath)  If DFDL pipelines can achieve full support for
> conversions to and from XML, they can also take full advantage of Efficient
> XML Interchange (EXI) compression.  The EXI Recommendation algorithms have
> been experimentally shown to meet or beat any other general compression
> scheme (for example ZIP and GZIP), and further offer significantly faster
> (and computationally efficient) data decompression.
>
> I personally believe that adding full support for DFDL for mapping to/from
> XML can eliminate a major barrier inhibiting more widespread DFDL
> employment.
>
> All questions and collaborative efforts welcome.   Very respectfully yours.
>
>
> all the best, Don
>
> --
>
> Don Brutzman  Naval Postgraduate School, Code USW/Br
> brutz...@nps.edu
>
> Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA
> +1.831.656.2149
>
> X3D graphics, virtual worlds, navy robotics
> https://faculty.nps.edu/brutzman
>
>
>
> ------------------------------
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Thursday, October 31, 2024 6:30 AM
> *To:* Brutzman, Donald (Don) (CIV) <brutz...@nps.edu>
> *Cc:* Claude Mamo <claude.m...@gmail.com>; Roger L Costello <
> coste...@mitre.org>; Norbraten, Terry (CIV) <tdnor...@nps.edu>; Blais,
> Curtis (Curt) (CIV) <clbl...@nps.edu>; users@daffodil.apache.org <
> users@daffodil.apache.org>
> *Subject:* Re: Proposal: Extensible DFDL
>
> Beside converting the XML instances to/from an attribute-centric form,
> there is the need for an XML Schema that describes that form.
>
> Converting a DFDL schema, or element-oriented XSD, into one which
> describes the attribute-oriented variant is non-trivial in the general
> case.
>
> Has anyone worked on tooling for that?
>
>
>
> On Wed, Oct 30, 2024 at 6:22 PM Brutzman, Donald (Don) (CIV) <
> brutz...@nps.edu> wrote:
>
> [cc: Curt]
>
> Thanks for updated status.
>
> Of relevant note is that our NPS team came up with a round-trip approach
> that converts arbitrary element-attribute XML to corresponding
> element-element XML, then back again.  XSLT is used in each direction.
> Online at
>
>    - https://gitlab.nps.edu/Savage/robodata/-/tree/master/DFDL/attribution
>
>    - https://gitlab.nps.edu/Savage/robodata/-/blob/master/DFDL/attribution/README.md
>
> However... am surprised to see that this is not public access.  My mistake
> - apologies for the inconvenience, we will carefully work towards releasing
> it.
>
>    - Terry, can we please review together (there are some other things
>    available in parent robodata project) to ensure that we can indeed go fully
>    public with the project.
>    - Attached please find advance copies of some of the screenshots.
>
> Summary description appears in pages 3..6 of the following white paper.
>
>    - Data Strategy for Unmanned Systems: Field Experimentation (FX),
>    Simulation and Analysis
>    - https://nps.edu/web/now/data-strategy-for-autonomous-systems
>    - https://nps.edu/documents/151816058/0/DataStrategyUnmannedSystemsTechnicalMemorandum2023January25.pdf
>
> As before: regardless of how complex the implementation of a DFDL
> processor might be, if this stylesheet is indeed general, then it might
> serve as a DFDL preprocessor/postprocessor for handling attribute-aware
> DFDL schema.
>
> Some additional thoughts:
>
>    - Perhaps DFDL parsing/unparsing of XML of a source document that
>    includes attributes might provide another angle on this problem.
>    - You won't catch me using ChatGPT but adding descriptions within DFDL
>    schema might further encourage automated translation.
>
> Mike and Roger, if a meeting discussing this topic might help, I can be
> available during second half of November.
>
> Very respectfully yours.
>
>
> all the best, Don
>
>
>
> ------------------------------
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Wednesday, October 30, 2024 7:08 AM
> *To:* users@daffodil.apache.org <users@daffodil.apache.org>
> *Cc:* Roger L Costello <coste...@mitre.org>; Norbraten, Terry (CIV) <
> tdnor...@nps.edu>; Brutzman, Donald (Don) (CIV) <brutz...@nps.edu>
> *Subject:* Re: Proposal: Extensible DFDL
>
> That proposal for XML attributes in DFDL has not been prototyped.
>
> I believe it is not ready - still only half baked. E.g., the implications
> of XML attributes' whitespace collapsing behavior are very problematic when
> using an XML attribute to logically represent data that physically does not
> conform. XML attributes are entirely unable to represent data that
> contains, for example, multiple adjacent space characters, or line-endings.
> If whitespace is significant, attributes won't work.
>
> Today there is XSLT and AI. E.g., chatGPT seems to be able to write XSLT
> very well from XML snippets and a description or example of what you want
> out of the transformation. The whole burden of having to write symmetric
> transforms - one for parsing, the inverse for unparsing, is eliminated when
> chatGPT writes them both for you.
>
> On Wed, Oct 30, 2024 at 4:58 AM Claude Mamo <claude.m...@gmail.com> wrote:
>
> Was there movement on creating attributes from DFDL? I found this
> https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Extend+DFDL+with+XML+Attribute+Support
> but does someone know whether this will be available anytime soon?
>
> A bit of context. I have a scenario in EDI X12 where (1) the DFDL schema
> is very generic and (2) the segment ID needs to be an attribute in the
> parent element so that the XPath selectors in Smooks don't easily break
> when routing segments. Unfortunately, due to the streaming nature of
> Smooks, I can't use something like this for the selector:
> */interchange/segment[segmentId/text() = "GS"]*. The workaround so far is
> to use indexes (e.g., */interchange/segment[2]*) but this is bad for
> various reasons.
>
> Thanks,
>
> Claude
>
>
> On Thu, Nov 2, 2023 at 5:10 PM Brutzman, Donald (Don) (CIV) <
> brutz...@nps.edu> wrote:
>
> I think that the single most significant and powerful extension capability
> for DFDL would be to support attributes.
>
>
>
> XML Schema is highly extensible already and widely deployed.  JSON schema
> is pretty consistent and has similar expressive power - if ever finally
> standardized and consistently supported in tools, it might further broaden
> the available information-architecture infrastructure for many applications
> and much of the Web.
>
>
>
> The ability to align DFDL directly with any XML Schema, to support
> consistent mappings of diverse datasets with coherent data models, would be
> a major increase in DFDL capability.
>
>
>
> p.s. long-held opinion:  skateboards and attributes are not a crime…  8)
>
>
>
> all the best, Don
>
>
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Thursday, November 2, 2023 8:27 AM
> *To:* users@daffodil.apache.org
> *Subject:* Re: Proposal: Extensible DFDL
>
>
>
> I think extensibility would be great for DFDL.
>
>
>
> The DFDL workgroup punted on this as there was no such thing as an
> extensible format description language to generalize into a standard.
>
> We realized that unparsing was already breaking a lot of new ground, but
> it was a must-have feature.
>
>
>
> So we had to draw a line somewhere on the number of untested new concepts
> in DFDL or it would never get done. It took 20 years as is to become
> standardized.
>
>
>
> Some format description languages may have been implemented this
> extensible way, but that was not a visible user feature in any one that I
> ever saw.
>
>
>
> As a research effort this is a good idea. Daffodil is available for use in
> prototyping if that's useful, and if it turns out to be valuable it could
> be proposed for inclusion in DFDL in the future.
>
>
>
> Some years ago I suggested this to someone as a thesis topic for a CS PhD
> project, but to my knowledge it didn't go anywhere.
>
>
>
>
>
> On Thu, Nov 2, 2023 at 10:20 AM Roger L Costello <coste...@mitre.org>
> wrote:
>
> Hi Folks,
>
> Consider this input containing a date time value:
>
> 20230926T124800Z
>
> We can design the DFDL, using the xs:dateTime datatype and associated DFDL
> calendar properties, so that parsing produces this XML:
>
> <DateTimeIso>2023-09-26T12:48:00+00:00</DateTimeIso>
>
> That is beautiful XML - concise and precise.
>
> Next, consider input containing a lat/long value:
>
> 2006N-05912E
>
> It would be excellent if we could design the DFDL so that parsing produces
> this:
>
> <OriginOfBearing>20°06′N 059°12′E</OriginOfBearing>
>
> That is also beautiful XML.
>
> In fact, it is possible to achieve this! By hiding the input and then
> performing a bunch of transformations using dfdl:inputValueCalc.
>
> However, that's a terrible approach because, as Mike Beckerle often says,
> "DFDL is not a transformation language!"
>
> If only we had a latlong datatype and associated DFDL latlong properties
> .....
>
> If only we could extend DFDL .......
>
> How about making DFDL extensible? How about allowing users of DFDL to
> create their own datatypes (actually, XSD already allows this) and allow
> users to create their own DFDL properties for the user-defined datatype?
>
> That is, how about turning DFDL into extensible DFDL?
>
> Thoughts?
>
> /Roger
>
>
