Hi Roded,

> 3. Is it currently possible to parse an input stream but output only a set
> of fields from the specification? For example, could an XPath be specified
> to determine which nodes in the specification Daffodil will output?
>

It's possible with Smooks's DFDL cartridge
<https://github.com/smooks/smooks-dfdl-cartridge>. You can select which
elements you want to handle from the stream that Daffodil produces using
Smooks's XPath-like language and then efficiently process the selected
elements in whichever way you want.
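
To illustrate the general idea of selecting only some elements from the
infoset stream, here is a minimal Python sketch. It does not use the Smooks
or Daffodil APIs. It assumes the infoset is already available as XML, and
the element names (record, header, id, payload, name) are made up for the
example. It walks the XML incrementally and keeps only the elements whose
names are in a wanted set:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical XML infoset standing in for what a parse might produce.
INFOSET = b"""<record>
  <header><id>42</id><flags>0</flags></header>
  <payload><name>alpha</name><size>7</size></payload>
  <payload><name>beta</name><size>9</size></payload>
</record>"""


def select_elements(stream, wanted):
    """Yield (tag, text) for each element whose tag is in `wanted`."""
    # "end" events fire once an element has been read completely.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag in wanted:
            yield elem.tag, (elem.text or "").strip()
        # Drop the element's content once handled to reduce memory use.
        elem.clear()


selected = list(select_elements(io.BytesIO(INFOSET), {"id", "name"}))
print(selected)
```

With the sample input above, this keeps the id and both name elements and
skips everything else. A real selector language is of course richer than a
set of names, so treat this only as a picture of the filtering step.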

> 4. Is there a recommended way of dynamically creating a DFDL specification
> XSD? or should I just use general tooling?
>

In the past, I've used Mustache <https://github.com/spullara/mustache.java>
and XSLT to generate DFDL schemas. It worked well for me.
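
As a rough picture of the template approach, here is a small Python sketch
that uses the standard library's string.Template in place of Mustache or
XSLT. The field model (name, type, length) and the element names are
hypothetical, and the output is only a fragment: a schema Daffodil can
compile also needs a complete dfdl:format annotation, which I've left out to
keep the example short.

```python
from string import Template
from xml.sax.saxutils import quoteattr

# Skeleton of the schema; $root and $fields are filled in per format.
SCHEMA = Template("""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
  <xs:element name=$root>
    <xs:complexType>
      <xs:sequence>
$fields
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
""")

# One fixed-length field; quoteattr() supplies the quotes and escaping.
FIELD = Template('        <xs:element name=$name type=$type '
                 'dfdl:lengthKind="explicit" dfdl:length=$length/>')


def generate_schema(root, fields):
    """Render a schema fragment from (name, type, length) tuples."""
    rendered = "\n".join(
        FIELD.substitute(
            name=quoteattr(name),
            type=quoteattr(xsd_type),
            length=quoteattr(str(length)),
        )
        for name, xsd_type, length in fields
    )
    return SCHEMA.substitute(root=quoteattr(root), fields=rendered)


schema = generate_schema("record", [("id", "xs:int", 4), ("name", "xs:string", 8)])
print(schema)
```

The point is the shape of the workflow: load the format description into a
simple model, then iterate over it and fill in a template. Escaping the
attribute values matters as soon as names come from an external source.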

Claude

On Mon, Mar 13, 2023 at 1:28 PM Steve Lawrence <slawre...@apache.org> wrote:

> Here are some high-level answers. If you need more details on anything
> let us know.
>
> 1. Yep, we call this feature "layers". You can create a custom layer
> plugin that receives data (as defined by the DFDL schema), your layer
> code transforms (e.g. uncompresses) and outputs that data, and then
> Daffodil parses the outputted data as defined by the DFDL schema.
>
> Here are implementations of the layers included with Daffodil for gzip,
> base64, line folding, and byte swapping:
>
>
> https://github.com/apache/daffodil/tree/main/daffodil-runtime1-layers/src/main/scala/org/apache/daffodil/layers/runtime1
>
> And they are pluggable using Java service loaders, e.g.:
>
>
> https://github.com/apache/daffodil/blob/main/daffodil-runtime1-layers/src/main/resources/META-INF/services/org.apache.daffodil.runtime1.layers.LayerCompiler
>
> So you can create the layer outside of Daffodil, create a jar with the
> right services file, put it on the classpath and Daffodil will be able
> to find and use it.
>
> And here is the design proposal of the feature with more details and
> links to related design pages:
>
>
> https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Dynamically+loading+Layer+Transformations
>
>
> 2. I don't think we have any documentation, but we have a number of
> examples of how to define custom charsets. For example, here's a fairly
> small IBM037 charset that we include in Daffodil which is just a lookup
> table:
>
>
> https://github.com/apache/daffodil/blob/main/daffodil-io/src/main/scala/org/apache/daffodil/io/processors/charset/IBM037.scala
>
> You essentially just need to implement BitsCharsetDefinition which
> returns a "BitsCharset" that can create a BitsCharsetEncoder/Decoder.
> Depending on the complexity of your charset, you may be able to use
> existing base classes (e.g. BitsCharsetJava) that do a lot of the heavy
> lifting.
>
> Note that these are also loaded using Java service loaders, e.g.:
>
>
> https://github.com/apache/daffodil/blob/main/daffodil-io/src/main/resources/META-INF/services/org.apache.daffodil.io.processors.charset.BitsCharsetDefinition
>
>
> 3. Not at the moment. If you wanted only a subset of fields, you would
> need to post process the fields and extract what parts you need
> yourself. Languages like XSLT/XQuery could probably do this without too
> much effort.
>
> Another alternative would be to create a custom InfosetOutputter that
> would ignore infoset events that you don't care about and keep those you
> do. You could use your own logic for how you determine which fields are
> important, or you could also use dfdlx:runtimeProperties to annotate the
> schema and have your custom InfosetOutputter use those. Here's the
> design information on runtime properties:
>
>
> https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Runtime+Properties
>
> Here's a small example of a custom InfosetOutputter we use for testing,
> which just captures all events and stores them in a list. You could
> imagine doing some sort of filtering, capturing only the fields you
> want and outputting to a custom data structure instead of XML, for example.
>
>
> https://github.com/apache/daffodil/blob/main/daffodil-japi/src/test/java/org/apache/daffodil/example/TestInfosetOutputter.java
>
>
> 4. I haven't personally done a lot of DFDL schema generation, though I
> know other Daffodil devs have, and they may be able to chime in with
> helpful tips. But I don't think it's anything unique really. I think mostly what
> they do is get a machine readable specification of the data format, load
> that into some model and then iterate over the model and output strings
> to file. We're very familiar with Scala so we tend to write DFDL schema
> generators in that, which is also nice since it has language support for
> XML. So XML templates are sort of built into the language. But any
> template language would probably work fine.
>
> - Steve
>
>
>
> On 2023-03-13 06:36 AM, Roded Bahat wrote:
> > Hi all,
> > I'm looking into integrating Apache Daffodil into our product and have
> > several questions for which I could not find answers in the
> > documentation or issues.
> >
> > 1. Is it currently possible to extend Daffodil with custom types? For
> > example, could I create a custom field type for a field compressed with
> > a custom compression and have Daffodil call my own code for further
> > parsing of the original field value?
> > 2. The DFDL spec states that additional implementation-defined encoding
> > names can be defined. How would a custom encoding be defined in the DFDL
> > specification?
> > 3. Is it currently possible to parse an input stream but output only a
> > set of fields from the specification? For example, could an XPath be
> > specified to determine which nodes in the specification Daffodil will
> > output?
> > 4. Is there a recommended way of dynamically creating a DFDL
> > specification XSD? or should I just use general tooling?
> >
> > Any pointers and help would be much appreciated.
> > Thanks!
> >
> > Roded
>
>
