Many thanks for the detailed answers; they really help a lot. I'll start experimenting with Apache Daffodil and ask follow-up questions in new, specific threads. Much obliged.
On Mon, 2023-03-13 at 08:28 -0400, Steve Lawrence wrote:
> Here are some highish-level answers. If you need more details on
> anything, let us know.
>
> 1. Yep, we call this feature "layers". You can create a custom layer
> plugin that receives data (as defined by the DFDL schema); your layer
> code transforms (e.g. uncompresses) and outputs that data, and then
> Daffodil parses the outputted data as defined by the DFDL schema.
>
> Here are implementations of the layers included with Daffodil for
> gzip, base64, line folding, and byte swapping:
>
> https://github.com/apache/daffodil/tree/main/daffodil-runtime1-layers/src/main/scala/org/apache/daffodil/layers/runtime1
>
> And they are pluggable using Java service loaders, e.g.:
>
> https://github.com/apache/daffodil/blob/main/daffodil-runtime1-layers/src/main/resources/META-INF/services/org.apache.daffodil.runtime1.layers.LayerCompiler
>
> So you can create the layer outside of Daffodil, create a jar with
> the right services file, put it on the classpath, and Daffodil will
> be able to find and use it.
>
> And here is the design proposal of the feature with more details and
> links to related design pages:
>
> https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Dynamically+loading+Layer+Transformations
>
> 2. I don't think we have any documentation, but we have a number of
> examples of how to define custom charsets. For example, here's a
> fairly small IBM037 charset that we include in Daffodil, which is
> just a lookup table:
>
> https://github.com/apache/daffodil/blob/main/daffodil-io/src/main/scala/org/apache/daffodil/io/processors/charset/IBM037.scala
>
> You essentially just need to implement BitsCharsetDefinition, which
> returns a "BitsCharset" that can create a BitsCharsetEncoder/Decoder.
> Depending on the complexity of your charset, you may be able to use
> existing base classes (e.g. BitsCharsetJava) that do a lot of the
> heavy lifting.
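To make the "just a lookup table" idea in answer 2 concrete, here is a minimal, library-independent sketch in Java. It does not implement Daffodil's BitsCharsetDefinition or BitsCharset, and the byte-to-character mapping is invented purely for illustration (it is not IBM037). The class name LookupCharsetSketch and its methods are hypothetical.

```java
import java.util.Arrays;

/**
 * Toy single-byte charset backed by a 256-entry lookup table.
 * Hypothetical illustration only: the mapping is made up, and nothing here
 * uses or mirrors Daffodil's charset API.
 */
final class LookupCharsetSketch {
    // Unicode replacement character, used for bytes with no mapping.
    private static final char REPLACEMENT = '\uFFFD';

    // Decode table: index is the unsigned byte value, entry is the character.
    private static final char[] DECODE = new char[256];

    static {
        Arrays.fill(DECODE, REPLACEMENT);
        DECODE[0x01] = 'A';
        DECODE[0x02] = 'B';
        DECODE[0x03] = 'C';
        DECODE[0x10] = ' ';
    }

    /** Decodes each byte by indexing into the table. */
    static String decode(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length);
        for (byte b : data) {
            sb.append(DECODE[b & 0xFF]); // mask to get the unsigned value
        }
        return sb.toString();
    }

    /** Encodes by reverse lookup; rejects characters that have no mapping. */
    static byte[] encode(String text) {
        byte[] out = new byte[text.length()];
        for (int i = 0; i < text.length(); i++) {
            out[i] = encodeChar(text.charAt(i));
        }
        return out;
    }

    private static byte encodeChar(char c) {
        if (c != REPLACEMENT) {
            for (int i = 0; i < DECODE.length; i++) {
                if (DECODE[i] == c) {
                    return (byte) i;
                }
            }
        }
        throw new IllegalArgumentException("unmappable character: " + c);
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {0x01, 0x02, 0x10, 0x03}));
    }
}
```

In a real Daffodil charset the table would presumably sit behind the interfaces named above; the linked IBM037.scala is the authoritative example.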
>
> Note that these are also loaded using Java service loaders, e.g.:
>
> https://github.com/apache/daffodil/blob/main/daffodil-io/src/main/resources/META-INF/services/org.apache.daffodil.io.processors.charset.BitsCharsetDefinition
>
> 3. Not at the moment. If you wanted only a subset of fields, you
> would need to post-process the fields and extract the parts you need
> yourself. Languages like XSLT/XQuery could probably do this without
> too much effort.
>
> Another alternative would be to create a custom InfosetOutputter that
> would ignore infoset events that you don't care about and keep those
> you do. You could use your own logic for how you determine which
> fields are important, or you could also use dfdlx:runtimeProperties
> to annotate the schema and have your custom InfosetOutputter use
> those. Here's the design information on runtime properties:
>
> https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Runtime+Properties
>
> Here's a small example of a custom InfosetOutputter we use for
> testing, which just captures all events and stores them in a list.
> You could imagine doing some sort of filtering, capturing only the
> fields you want, and outputting to a custom data structure instead
> of XML, for example.
>
> https://github.com/apache/daffodil/blob/main/daffodil-japi/src/test/java/org/apache/daffodil/example/TestInfosetOutputter.java
>
> 4. I haven't personally done a lot of DFDL schema generation, though
> I know other Daffodil devs have; they may be able to chime in with
> helpful tips. But I don't think it's anything unique really. I think
> mostly what they do is get a machine-readable specification of the
> data format, load that into some model, and then iterate over the
> model and output strings to a file. We're very familiar with Scala,
> so we tend to write DFDL schema generators in that, which is also
> nice since it has language support for XML. So XML templates are
> sort of built into the language.
> But any template language would probably work fine.
>
> - Steve
>
> On 2023-03-13 06:36 AM, Roded Bahat wrote:
> > Hi all,
> > I'm looking into integrating Apache Daffodil into our product and
> > have several questions for which I could not find answers in the
> > documentation or issues.
> >
> > 1. Is it currently possible to extend Daffodil with custom types?
> > For example, could I create a custom field type for a field
> > compressed with a custom compression and have Daffodil call my own
> > code for further parsing of the original field value?
> > 2. The DFDL spec states that additional implementation-defined
> > encoding names can be defined. How would a custom encoding be
> > defined in the DFDL specification?
> > 3. Is it currently possible to parse an input stream but output
> > only a set of fields from the specification? For example, could an
> > XPath be specified to determine which nodes in the specification
> > Daffodil will output?
> > 4. Is there a recommended way of dynamically creating a DFDL
> > specification XSD? Or should I just use general tooling?
> >
> > Any pointers and help would be much appreciated.
> > Thanks!
> >
> > Roded
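
As a rough illustration of the approach described in answer 4 (load the format description into a model, iterate over the model, and output strings), here is a small Java sketch. The Field model, the class name DfdlSchemaSketch, and the choice of fixed-length string fields are assumptions made for the example. The output is only a skeleton: a usable schema would also need a dfdl:format annotation (or an included base format) to supply the remaining DFDL properties.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Generates a skeletal DFDL schema from a tiny in-memory model.
 * Hypothetical sketch: the model and names are invented for illustration.
 */
final class DfdlSchemaSketch {

    /** One fixed-length text field; length is in the format's dfdl:lengthUnits. */
    static final class Field {
        final String name;
        final int length;

        Field(String name, int length) {
            if (length <= 0) {
                throw new IllegalArgumentException("length must be positive");
            }
            this.name = checkName(name);
            this.length = length;
        }
    }

    /** Accepts only simple names so they can be emitted without escaping. */
    static String checkName(String name) {
        if (!name.matches("[A-Za-z_][A-Za-z0-9_.-]*")) {
            throw new IllegalArgumentException("not a simple XML name: " + name);
        }
        return name;
    }

    /** Walks the model and emits one xs:element per field inside a sequence. */
    static String generate(String recordName, List<Field> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\"\n");
        sb.append("           xmlns:dfdl=\"http://www.ogf.org/dfdl/dfdl-1.0/\">\n");
        sb.append("  <xs:element name=\"").append(checkName(recordName)).append("\">\n");
        sb.append("    <xs:complexType>\n");
        sb.append("      <xs:sequence>\n");
        for (Field f : fields) {
            sb.append("        <xs:element name=\"").append(f.name)
              .append("\" type=\"xs:string\" dfdl:lengthKind=\"explicit\" dfdl:length=\"")
              .append(f.length).append("\"/>\n");
        }
        sb.append("      </xs:sequence>\n");
        sb.append("    </xs:complexType>\n");
        sb.append("  </xs:element>\n");
        sb.append("</xs:schema>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Field> model = Arrays.asList(new Field("id", 4), new Field("name", 16));
        System.out.print(generate("record", model));
    }
}
```

Plain string building keeps the sketch dependency-free; as the thread notes, any template language would probably do the same job.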