Many thanks for the detailed answers; these really help a lot.
I'll start experimenting with Apache Daffodil and ask follow-up
questions in new, specific threads.
Much obliged.

On Mon, 2023-03-13 at 08:28 -0400, Steve Lawrence wrote:
> Here are some high-level answers. If you need more details on
> anything, let us know.
> 
> 1. Yep, we call this feature "layers". You can create a custom layer 
> plugin that receives data (as defined by the DFDL schema), your layer
> code transforms (e.g. uncompresses) and outputs that data, and then 
> Daffodil parses the outputted data as defined by the DFDL schema.
> 
> Here are implementations of the layers included with Daffodil for
> gzip, 
> base64, line folding, and byte swapping:
> 
> https://github.com/apache/daffodil/tree/main/daffodil-runtime1-layers/src/main/scala/org/apache/daffodil/layers/runtime1
> 
> And they are pluggable using Java service loaders, e.g.:
> 
> https://github.com/apache/daffodil/blob/main/daffodil-runtime1-layers/src/main/resources/META-INF/services/org.apache.daffodil.runtime1.layers.LayerCompiler
> 
> So you can create the layer outside of Daffodil, create a jar with
> the 
> right services file, put it on the classpath and Daffodil will be
> able 
> to find and use it.
> 
> And here is the design proposal of the feature with more details and 
> links to related design pages:
> 
> https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Dynamically+loading+Layer+Transformations
> 
> 
> 2. I don't think we have any documentation, but we have a number of 
> examples of how to define custom charsets. For example, here's a
> fairly small IBM037 charset that we include in Daffodil, which is
> just a lookup table:
> 
> https://github.com/apache/daffodil/blob/main/daffodil-io/src/main/scala/org/apache/daffodil/io/processors/charset/IBM037.scala
> 
> You essentially just need to implement BitsCharsetDefinition, which 
> returns a "BitsCharset" that can create a BitsCharsetEncoder/Decoder. 
> Depending on the complexity of your charset, you may be able to use
> existing base classes (e.g. BitsCharsetJava) that do a lot of the
> heavy lifting.
> 
> Note that these are also loaded using Java service loaders, e.g.:
> 
> https://github.com/apache/daffodil/blob/main/daffodil-io/src/main/resources/META-INF/services/org.apache.daffodil.io.processors.charset.BitsCharsetDefinition
> 
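To illustrate what "just a lookup table" means for a single-byte charset, here is a minimal sketch in plain Java. It does not use Daffodil's BitsCharsetDefinition or BitsCharset classes; the class name LookupCharsetSketch and its decode method are hypothetical, and the table fills in only a handful of IBM037 code points (space, A-I, 0-9), with everything else mapped to U+FFFD.

```java
import java.util.Arrays;

// Hypothetical sketch: NOT Daffodil's charset API, only the table-lookup idea.
class LookupCharsetSketch {
    // One entry per possible byte value. Only a few IBM037 code points are
    // filled in here; a real table would define all 256.
    private static final char[] TABLE = new char[256];

    static {
        Arrays.fill(TABLE, '\uFFFD');                 // unmapped -> replacement char
        TABLE[0x40] = ' ';                            // EBCDIC space
        for (int i = 0; i < 9; i++) {
            TABLE[0xC1 + i] = (char) ('A' + i);       // 0xC1..0xC9 -> A..I
        }
        for (int i = 0; i < 10; i++) {
            TABLE[0xF0 + i] = (char) ('0' + i);       // 0xF0..0xF9 -> 0..9
        }
    }

    // Decode by indexing the table with each unsigned byte value.
    static String decode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            sb.append(TABLE[b & 0xFF]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xC8, (byte) 0xC9, 0x40, (byte) 0xF4, (byte) 0xF2};
        System.out.println(decode(data)); // prints "HI 42"
    }
}
```

An encoder would be the inverse mapping; a charset whose code units are not 8 bits wide would need more than a byte-indexed array.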
> 
> 3. Not at the moment. If you wanted only a subset of fields, you
> would 
> need to post process the fields and extract what parts you need 
> yourself. Languages like XSLT/XQuery could probably do this without
> too 
> much effort.
> 
> Another alternative would be to create a custom InfosetOutputter that
> would ignore infoset events that you don't care about and keep those
> you 
> do. You could use your own logic for how you determine which fields
> are 
> important, or you could also use dfdlx:runtimeProperties to annotate
> the 
> schema and have your custom InfosetOutputter use those. Here's the 
> design information on runtime properties:
> 
> https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Runtime+Properties
> 
> Here's a small example of a custom InfosetOutputter we use for
> testing, which just captures all events and stores them in a list.
> You could imagine doing some sort of filtering, capturing only the
> fields you want and outputting to a custom data structure instead of
> XML, for example.
> 
> https://github.com/apache/daffodil/blob/main/daffodil-japi/src/test/java/org/apache/daffodil/example/TestInfosetOutputter.java
> 
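The filtering idea above can be sketched in isolation. This is a minimal sketch in plain Java, not Daffodil's InfosetOutputter API: the class FilteringCollector and its startComplex, endComplex, and simple callbacks are hypothetical stand-ins for infoset events, and the set of wanted paths is an assumed way to choose fields (it could equally come from schema annotations).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for an event-driven outputter: it keeps only the
// simple elements whose path is in a caller-supplied set and ignores the rest.
class FilteringCollector {
    private final Set<String> wantedPaths;
    private final Deque<String> stack = new ArrayDeque<>();        // current element path
    private final Map<String, String> kept = new LinkedHashMap<>(); // path -> value

    FilteringCollector(Set<String> wantedPaths) {
        this.wantedPaths = wantedPaths;
    }

    void startComplex(String name) { stack.addLast(name); }

    void endComplex() { stack.removeLast(); }

    void simple(String name, String value) {
        stack.addLast(name);
        String path = "/" + String.join("/", stack); // root-to-leaf order
        stack.removeLast();
        if (wantedPaths.contains(path)) {
            kept.put(path, value); // simplification: a repeated path overwrites
        }
    }

    Map<String, String> result() { return kept; }

    public static void main(String[] args) {
        FilteringCollector c =
            new FilteringCollector(Set.of("/msg/header/id", "/msg/body/temp"));
        c.startComplex("msg");
        c.startComplex("header");
        c.simple("id", "42");
        c.simple("checksum", "ab12"); // not wanted, dropped
        c.endComplex();
        c.startComplex("body");
        c.simple("temp", "21.5");
        c.endComplex();
        c.endComplex();
        System.out.println(c.result()); // prints {/msg/header/id=42, /msg/body/temp=21.5}
    }
}
```

Note that the whole input is still parsed; only the output is reduced. Array elements would need a list-valued result rather than the single-value map used here.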
> 
> 4. I haven't personally done a lot of DFDL schema generation, though
> I 
> know other Daffodil devs have, they may be able to chime in on
> helpful 
> tips. But I don't think it's anything unique really. I think mostly
> what 
> they do is get a machine readable specification of the data format,
> load 
> that into some model and then iterate over the model and output
> strings 
> to file. We're very familiar with Scala so we tend to write DFDL
> schema 
> generators in that, which is also nice since it has language support
> for 
> XML. So XML templates are sort of built into the language. But any 
> template language would probably work fine.
> 
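The "load a model, iterate, output strings" approach can be sketched as follows. This is a minimal sketch in plain Java rather than Scala; the Field model and the generate method are hypothetical, field names are assumed to be valid XML names (no escaping is done), and the output omits the dfdl:format defaults that a complete DFDL schema would need before Daffodil could compile it.

```java
import java.util.List;

// Hypothetical generator: walks a tiny in-memory model of fixed-length
// fields and emits schema text by string templating.
class SchemaGenSketch {
    // One fixed-length field from a machine-readable format description.
    static final class Field {
        final String name;
        final String xsType;
        final int length;

        Field(String name, String xsType, int length) {
            this.name = name;
            this.xsType = xsType;
            this.length = length;
        }
    }

    static String generate(String rootName, List<Field> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\"\n");
        sb.append("           xmlns:dfdl=\"http://www.ogf.org/dfdl/dfdl-1.0/\">\n");
        sb.append(String.format("  <xs:element name=\"%s\">%n", rootName));
        sb.append("    <xs:complexType>\n      <xs:sequence>\n");
        for (Field f : fields) {
            // One element declaration per model entry.
            sb.append(String.format(
                "        <xs:element name=\"%s\" type=\"%s\" "
                    + "dfdl:lengthKind=\"explicit\" dfdl:length=\"%d\"/>%n",
                f.name, f.xsType, f.length));
        }
        sb.append("      </xs:sequence>\n    </xs:complexType>\n");
        sb.append("  </xs:element>\n</xs:schema>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Field> model = List.of(
            new Field("id", "xs:int", 4),
            new Field("label", "xs:string", 10));
        System.out.print(generate("record", model));
    }
}
```

A template engine or an XML library would do the same job with less risk of malformed output; plain string building is used here only to keep the sketch self-contained.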
> - Steve
> 
> 
> 
> On 2023-03-13 06:36 AM, Roded Bahat wrote:
> > Hi all,
> > I'm looking into integrating Apache Daffodil into our product and
> > have 
> > several questions for which I could not find answers in the 
> > documentation or issues.
> > 
> > 1. Is it currently possible to extend Daffodil with custom types?
> > For 
> > example, could I create a custom field type for a field compressed
> > with 
> > a custom compression and have Daffodil call my own code for further
> > parsing of the original field value?
> > 2. The DFDL spec states that additional implementation-defined
> > encoding 
> > names can be defined. How would a custom encoding be defined in the
> > DFDL 
> > specification?
> > 3. Is it currently possible to parse an input stream but output only
> > a 
> > set of fields from the specification? For example, could an XPath be
> > specified to determine which nodes in the specification Daffodil
> > will 
> > output?
> > 4. Is there a recommended way of dynamically creating a DFDL 
> > specification XSD? or should I just use general tooling?
> > 
> > Any pointers and help would be much appreciated.
> > Thanks!
> > 
> > Roded
> 
