Tom,
Any data format that becomes popular ends up encapsulated and carried around inside other data formats. The data business has a long history of this. For example, you want to aggregate several different pieces of data about a particular event into a common data structure, but you are constrained in the format you must use for that aggregate.

So, generically:

* You are using Daffodil to convert data from a native (e.g., binary) format into format X.
* Typically format X is textual, but not necessarily.
* The native data also contains data that is already in format X.
* Many use cases will want the result to be format X, not format X with embedded, escaped pieces of format X.

Hence merging the translated pieces with the encapsulated pieces is a natural need.

Format X could be XML, JSON, EXI (binary XML), S-expressions, SISL, or other things.

Daffodil has a built-in validation module, and when format X is XML that module cannot use the DFDL schema to validate the embedded XML. That is a corner case specific to XML.

If this really became important, we could add a feature that lets validation use an XML schema other than the DFDL schema. This is already needed if you want validation to include things like key/unique constraints, which are not allowed to appear in a DFDL schema.

The feature is also almost there already: if you use Schematron validation, it can take a separate Schematron .sch file for the validation rules. So making the regular Xerces XML validator able to take a different XML schema seems like a small thing.

So I think this is a good generic capability to add to Daffodil. We just need a motivated contributor to create it 🙂 (always recruiting new developers!)
-mike beckerle

________________________________
From: Steve Lawrence <slawre...@apache.org>
Sent: Wednesday, September 22, 2021 12:21 PM
To: users@daffodil.apache.org <users@daffodil.apache.org>
Subject: Re: Parsing formats with embedded XML -- recursion and/or layering required?

We actually recently added a feature that was intended to solve just this problem of including XML payloads in the resulting infoset as XML rather than as a string. It does, however, require a custom InfosetInputter and InfosetOutputter, and those have not been written yet. The proposal is here:

https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Runtime+Properties

The idea is that your payload element is just a normal xs:string, and you annotate it with a custom runtime property like treatStringAsXML=true. Then you can write a custom InfosetOutputter that uses this annotation and outputs the string as XML during parse, and a custom InfosetInputter that converts that XML back to a string during unparse.

The Example Implementation section of the proposal discusses this exact use case and gives an idea of how one might implement the custom InfosetInputter/Outputter. That example uses Scala XML Nodes for simplicity, but it could be done with the standard text inputters/outputters as well.

One thing to point out, though, is that to Daffodil and its internals this payload element is still a string. Daffodil has no knowledge of what the InfosetInputter/Outputters are doing, so it cannot reference the XML payload in DFDL expressions or validate the XML against a schema. For validation, you would need to pipe the resulting infoset to some other tool, using a modified schema that does not treat this payload as a string.

Since this is the second time I've come across this requirement, it might be worth considering whether this will become a more common technique, and whether we should add some built-in mechanism to DFDL, one that would work with both DFDL expressions and validation...
- Steve

On 9/22/21 11:58 AM, Ballard, Tom - US wrote:
> All,
>
> I have a complex data format I am trying to implement a DFDL schema for, but
> don’t believe it’s possible without support for either recursion decomposition
> and/or layering. The format in question has a subset of messages which consist
> of a binary “header” followed by an XML payload. The messages begin with a
> handful of binary metadata fields, followed by a binary length field, and then
> an XML payload (which is the length indicated in length field). In some cases
> there may be binary data subsequent to the XML payload as well. I assume I can
> pull the XML payload in as an opaque string blob, but the problem is I also need
> to validate that XML against a schema.
>
> I know recursion and layering are on the project wish list, but is there a way
> to accomplish full parsing and validation of “hybrid” messages like I described
> possible without them?
>
> V/R,
>
> Tom Ballard
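
To make the idea in this thread concrete, here is a minimal standalone sketch in Python. It is not Daffodil code and does not use the InfosetInputter/InfosetOutputter API. The header layout (two 16-bit fields and a 32-bit length, big-endian), the element names, and the function names are all assumptions made up for illustration; the real format in Tom's question is not specified. The sketch parses a hybrid message into an infoset-like tree with the payload as a plain string, then swaps that string for real XML (the parse direction) and back again (the unparse direction).

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical layout (an assumption, not the real format from this thread):
#   uint16 message type, uint16 flags, uint32 payload length (big-endian),
#   then `length` bytes of UTF-8 XML, then optional trailing binary data.
HEADER = struct.Struct(">HHI")


def parse_message(data: bytes) -> ET.Element:
    """Build an infoset-like tree in which the payload is an ordinary string."""
    msg_type, flags, length = HEADER.unpack_from(data, 0)
    start = HEADER.size
    end = start + length
    if end > len(data):
        raise ValueError("payload length exceeds message size")
    root = ET.Element("message")
    ET.SubElement(root, "type").text = str(msg_type)
    ET.SubElement(root, "flags").text = str(flags)
    ET.SubElement(root, "payload").text = data[start:end].decode("utf-8")
    ET.SubElement(root, "trailer").text = data[end:].hex()
    return root


def embed_payload(root: ET.Element) -> None:
    """Parse direction: replace the payload string with the XML it contains."""
    payload = root.find("payload")
    child = ET.fromstring(payload.text)
    payload.text = None
    payload.append(child)


def unembed_payload(root: ET.Element) -> None:
    """Unparse direction: turn the embedded XML element back into a string."""
    payload = root.find("payload")
    child = payload[0]
    payload.remove(child)
    payload.text = ET.tostring(child, encoding="unicode")


xml_payload = b'<event id="7"><name>start</name></event>'
raw = HEADER.pack(1, 0, len(xml_payload)) + xml_payload + b"\xca\xfe"

tree = parse_message(raw)
escaped = ET.tostring(tree, encoding="unicode")  # payload is escaped text
embed_payload(tree)
merged = ET.tostring(tree, encoding="unicode")   # payload is real XML
print(escaped)
print(merged)
```

The merged form is what a separate XML validator, run with a schema that describes the payload as XML rather than as a string, could check. Note that re-serializing the embedded XML is not guaranteed to reproduce the original bytes (whitespace, attribute quoting, or an XML declaration may differ), so on unparse the length field would have to be recomputed from the regenerated string.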