Traditionally the way to do this is with a two-pass approach. Have a schema that
parses just the headers and treats the payloads as hexBinary/blobs. After
parsing, extract the hexBinary/blob payloads, concatenate them together, and
then parse that result with a different schema.
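As a rough illustration (the packet/header element names, the ex:headerType, and
the payloadLength field are all made up here), the first-pass schema might look
something like:

<element name="file">
  <complexType>
    <sequence>
      <element name="packet" maxOccurs="unbounded">
        <complexType>
          <sequence>
            <element name="header" type="ex:headerType" />
            <element name="payloadFragment" type="xs:hexBinary"
              dfdl:lengthKind="explicit" dfdl:lengthUnits="bytes"
              dfdl:length="{ ../header/payloadLength }" />
          </sequence>
        </complexType>
      </element>
    </sequence>
  </complexType>
</element>

Application code then pulls the payloadFragment values out of the infoset,
concatenates the bytes, and hands the result to the second schema.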
A possible alternative (though I don't think this has been done before) that
would do it in one pass might be to use a custom Layer. The new Layer would read
the Headers and Extended Headers and pass through only the payloads, so that
Daffodil only ever sees the reassembled payload. Your schema then just becomes
something like this:
<element name="file">
<complexType>
<sequence>
<sequence dfdl:layer="PayloadRessembler">
<element ref="ex:payload" />
</sequence>
</sequence>
</complexType>
</element>
And your "payload" element can assume the it's just parsing the assembled
payload.
This would mean that parsing of the Header and Extended Header is done in code
in the Layer and wouldn't even be part of the infoset, which isn't necessarily
ideal (often the point of Daffodil is to avoid code specific to one format), but
with small enough headers it's maybe not a big deal.
And on unparsing, the layer would have to recreate the Headers/Extended Headers
and split the payload back into packets.
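For the parse direction, the reassembly logic inside such a layer could be a
stream wrapper along these lines. This is only a sketch: it leaves out the
Daffodil Layer API itself (whose hooks differ between releases) and assumes a
made-up packet layout of a 2-byte big-endian payload length followed by the
payload bytes, so the class name and header format are purely illustrative.

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/* Hypothetical parse-side wrapper: strips packet headers and passes
 * through only the payload bytes, so the consumer sees one contiguous
 * reassembled payload. */
public class PayloadReassemblerStream extends InputStream {
    private final DataInputStream in;
    private int remainingInPacket = 0; // payload bytes left in the current packet
    private boolean done = false;

    public PayloadReassemblerStream(InputStream underlying) {
        this.in = new DataInputStream(underlying);
    }

    @Override
    public int read() throws IOException {
        while (remainingInPacket == 0 && !done) {
            try {
                // Decode the next header; here that is just a payload length,
                // but real Header/Extended Header parsing would happen here.
                remainingInPacket = in.readUnsignedShort();
            } catch (EOFException e) {
                done = true; // no more packets
            }
        }
        if (done) return -1;
        remainingInPacket--;
        return in.read(); // hand one payload byte through to Daffodil
    }
}

The unparse side would be the mirror image: an output wrapper that buffers the
payload bytes Daffodil writes, chops them into packet-sized chunks, and emits a
freshly built Header/Extended Header in front of each chunk.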
On 2024-04-10 02:07 PM, Larry Barber wrote:
Does anyone know of a way to handle data that is split into separate pieces?
I can parse the payload normally, but due to variable length fields, etc., it
can span multiple packets – as shown in the diagram below.
I can’t think of a way to allow parsing of the first packet to complete without
all of the data being present, and then continuing the parse in the second (or
third, fourth, etc.) packet(s).