I don't believe either recursion or offset-oriented formats (TIFF is an 
example) can be handled by something as simple as adding another DFDL parse pass.


Adding recursion to DFDL would make the language Turing-complete. Using two 
passes would not, and neither would any *finite* number of passes fixed a 
priori (that is, not determined from the data).


It's harder for me to argue why offset-oriented formats are a problem; I just 
can't see how two passes are relevant. In TIFF there's a header you can parse. 
After the header, whether the next bytes are unused space or part of the data 
depends on an offset in the header. You interpret that offset, seek to that 
location, which could be much later in the data, interpret that part of the 
format, follow an offset from it (which can point to much earlier in the 
data), and so on, without limit. Random access is at the core of the format. 
There's no way to describe the data without following the offsets around the 
data one by one. That is a loop doing one "pass" of DFDL parsing per 
iteration, nothing like just 2 passes or any number of passes fixed a priori.
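
To make that loop concrete, here is a rough Python sketch (plain Python, not 
DFDL, and not how Daffodil works). It only follows the chain of TIFF image file 
directories (IFDs): read the header, seek to the first IFD offset, skip its 
12-byte entries, read the next-IFD offset, and repeat until that offset is 0. 
The names walk_ifd_offsets, _ifd and SAMPLE are made up for illustration, and 
SAMPLE is a synthetic buffer laid out for this example, not a real image. The 
first IFD is deliberately placed after the second one, so the walk goes forward 
and then backward in the data.

```python
import struct
from typing import List


def walk_ifd_offsets(data: bytes) -> List[int]:
    """Return the IFD offsets in the order the offset chain visits them."""
    order = data[:2]
    if order == b"II":
        endian = "<"   # little-endian
    elif order == b"MM":
        endian = ">"   # big-endian
    else:
        raise ValueError("not a TIFF byte-order mark")

    # Header: 2-byte order mark, 2-byte magic (42), 4-byte offset of first IFD.
    magic, offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("bad TIFF magic number")

    visited: List[int] = []
    # One iteration per IFD; the iteration count is decided by the data.
    while offset != 0:
        if offset in visited:
            raise ValueError("IFD chain loops back on itself")
        visited.append(offset)
        (count,) = struct.unpack_from(endian + "H", data, offset)
        # Each entry is 12 bytes; the next-IFD offset follows the entries.
        (offset,) = struct.unpack_from(endian + "I", data, offset + 2 + 12 * count)
    return visited


def _ifd(next_offset: int) -> bytes:
    """Hypothetical helper: a little-endian IFD with one dummy entry."""
    entry = struct.pack("<HHII", 256, 3, 1, 1)  # tag, type, count, value
    return struct.pack("<H", 1) + entry + struct.pack("<I", next_offset)


# Layout: header at 0, second IFD at 8, 6 unused bytes, first IFD at 32.
SAMPLE = struct.pack("<2sHI", b"II", 42, 32) + _ifd(0) + b"\x00" * 6 + _ifd(8)

if __name__ == "__main__":
    print(walk_ifd_offsets(SAMPLE))  # [32, 8]
```

Nothing in the header says how many times the while loop will run, and nothing 
but following the offsets reveals that bytes 26-31 are unused. A real TIFF 
reader would also follow the value offsets inside each entry, which this sketch 
skips.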


...mike beckerle

Tresys






________________________________
From: Costello, Roger L. <[email protected]>
Sent: Friday, January 25, 2019 1:58:11 PM
To: [email protected]
Subject: RE: [EXT] Re: Assertion: DFDL can parse/unparse every data format ... 
do you agree?

Thanks Steve.

Can't "offset formats" be handled in DFDL by using 2 passes?

I wonder if "recursive formats" can be handled using 2 passes?

/Roger

-----Original Message-----
From: Steve Lawrence <[email protected]>
Sent: Friday, January 25, 2019 1:41 PM
To: [email protected]; Costello, Roger L. <[email protected]>
Subject: [EXT] Re: Assertion: DFDL can parse/unparse every data format ... do 
you agree?

There are definitely formats that DFDL cannot parse. Two general categories 
immediately come to my mind:

1) Recursive formats. DFDL v1.0 does not allow recursion in schemas, and so any 
formats that are recursive cannot be modeled completely in DFDL.
For example, people have tried to model JSON in the past. It can be modeled, 
but only to a fixed, arbitrarily chosen depth.

2) Offset formats. Some formats specify offsets into data to determine where 
the next chunk of data is, sort of akin to random data access. An example of 
that is the TIFF format. DFDL has no way to support these types of formats.

It's very possible that these two features will be added in future versions of 
DFDL and Daffodil, but as of today they are not supported.

DFDL v1.0 also does not support layering (e.g. compressed data, line folding), 
but Daffodil has implemented an extension to support those kinds of features.

- Steve

On 1/25/19 9:23 AM, Costello, Roger L. wrote:
> Hello DFDL community,
>
> I realize that there are some data formats that require two passes.
>
> Assertion: DFDL can parse and unparse every data format. Most formats can be 
> parsed in a single pass, the remaining can be parsed in two passes.
>
> Do you agree with this assertion?
>
> /Roger
>
