Interesting question.

If you have legacy or pre-existing data formats, then the use case for DFDL
is clear.

So I read your question as really asking: What are the use cases for DFDL in
"new" applications?

I think new applications that invent file formats may end up using DFDL if
the application authors are too lazy to use, say, XML as the file format. If
they just do whatever is easiest to write out from their favorite programming
language, then they're going to get an ad-hoc file format, and if some
*other* software later wants to read that file, then DFDL is a tool of choice.

But it is preferable if new applications that invent file formats do so 
purposefully and use a standard text-oriented representation like XML. (Could 
be JSON too, but lack of a schema language for JSON makes it far less desirable 
IMHO.)

The exception is when speed or space concerns make the overhead of XML too
high.

There is an environmental argument against using XML. Consider all the CPU
cycles wasted around the world dealing with XML's verbose and redundant
structure. Given that computers use lots of energy, the "Carbon Footprint" of
XML on a global scale is something to think about. Makes me wish EXI would
catch on more. I also wish XML would just allow a non-verbose close tag like
<foo>value</> where the end tag doesn't have to repeat the name from the open
tag. This would bring XML's overhead much closer to that of JSON or Lisp
S-expressions. But I digress.
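
To put rough numbers on the close-tag point, here is a minimal Python sketch
that serializes one small record three ways and prints the size of each. The
record, its field names, and the to_xml helper are made up for illustration,
and the "</>" form is the hypothetical syntax wished for above, not valid XML.

```python
import json

# Hypothetical sample record; the field names are illustrative only.
record = {"name": "widget", "price": "9.99", "qty": "3"}


def to_xml(rec, root="item", short_close=False):
    """Serialize a flat dict as XML text.

    With short_close=True every end tag is written as "</>". That is the
    hypothetical syntax discussed above and is NOT well-formed XML.
    """
    def close(tag):
        return "</>" if short_close else f"</{tag}>"

    body = "".join(f"<{key}>{value}{close(key)}" for key, value in rec.items())
    return f"<{root}>{body}{close(root)}"


xml_text = to_xml(record)
short_text = to_xml(record, short_close=True)
json_text = json.dumps(record, separators=(",", ":"))  # compact JSON

for label, text in [("XML", xml_text),
                    ("XML with </>", short_text),
                    ("JSON", json_text)]:
    print(f"{label:14} {len(text.encode('utf-8'))} bytes")
```

For a record this small the repeated end-tag names account for a noticeable
share of the XML text. The exact ratio depends on how long the tag names are
compared to the values.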

But ignoring all that, there are cases where an expensive data format like
XML just won't allow you to achieve the goal of your software. I know of two
cases where something like XML is unacceptable and one might prefer a dense
binary data format. One is cutting-edge supercomputing applications, where
every bit counts in space and speed if the application is going to work at
all. The other is ultra-low-power computing, where every bit counts because
just compressing and decompressing the data consumes too much battery power.

But even then, a standard binary format like EXI (binary XML - same infoset as
XML, just a denser binary representation) may be preferable to an ad-hoc file
format with a DFDL schema.

Lastly another use case I've found for DFDL is what I call "CSV-like" data 
files.

These arise when human beings edit data files by hand. I have a lot of
experience with "CSV" data files that aren't at all as well behaved as true
CSV data files are supposed to be. Given a spreadsheet program like MS-Excel,
people will create a spreadsheet document with all sorts of headers and
sections on a sheet. Then they'll export that sheet as "CSV" and claim the
file is CSV data.

These sorts of "CSV-like" files are often full of inconsistencies. Empty cells
are sometimes empty strings, sometimes all-whitespace strings, and sometimes
various markers like "--" or "N/A" or "none".

A DFDL schema can be written that handles all these human inconsistencies,
skipping section headers and standardizing "--", "N/A", etc. The result is a
well-behaved XML data set from an inconsistent human-edited CSV-like data
file.
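
To make the kind of cleanup concrete, here is a rough Python sketch of the
same normalization. It is not DFDL and not how Daffodil does it, just a plain
illustration of the rules such a schema would encode. The sample data, the
column names, the marker list, and the rule for spotting a section header are
all assumptions invented for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed set of "this cell is really empty" markers, compared in lowercase.
MISSING_MARKERS = {"", "--", "n/a", "none"}

# Invented sample of a human-edited sheet exported as "CSV".
SAMPLE = """Quarterly Report,,
,,
name,price,notes
widget,9.99,--
gadget,12.00,N/A
Section B,,
gizmo,4.50,none
doohickey,   ,fragile
"""


def normalize_cell(cell):
    """Return the trimmed cell text, or None if it is a missing-value marker."""
    text = cell.strip()
    return None if text.lower() in MISSING_MARKERS else text


def csv_like_to_xml(text, columns):
    """Build a <records> element from CSV-like text, skipping non-data rows."""
    root = ET.Element("records")
    header = [name.lower() for name in columns]
    for row in csv.reader(io.StringIO(text)):
        cells = [normalize_cell(c) for c in row]
        # Assumption: a data row has one cell per column and a first cell.
        if len(cells) != len(columns) or cells[0] is None:
            continue
        # Assumption: only the first cell filled in means a title or
        # section header, not data.
        if all(c is None for c in cells[1:]):
            continue
        # Skip the column-header row itself.
        if [c.lower() if c else c for c in cells] == header:
            continue
        record = ET.SubElement(root, "record")
        for name, value in zip(columns, cells):
            if value is not None:  # missing cells are simply omitted
                ET.SubElement(record, name).text = value
    return root


root = csv_like_to_xml(SAMPLE, ["name", "price", "notes"])
print(ET.tostring(root, encoding="unicode"))
```

Here missing cells are dropped from the output. Emitting them as nil elements
instead would be an equally reasonable choice.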

-mike beckerle
Tresys Technology



________________________________
From: Costello, Roger L. <coste...@mitre.org>
Sent: Tuesday, February 19, 2019 12:16:09 PM
To: users@daffodil.apache.org
Subject: With the tremendous agility that DFDL provides, what is the role of 
XML? What is the role of binary?

Hello DFDL community,

DFDL gives us tremendous agility - we can quickly and easily transform binary 
to XML and XML to binary.

Binary, with its conciseness, is beautiful for moving data.

XML, with its vast tool suite, is beautiful for processing data.

What do you see as the role of XML? The role of binary?

Use binary when moving data, use XML when processing data?

Most images (JPEG, GIF, PNG, etc.) are binary and are processed in their binary 
form. So XML isn't necessarily the ideal form for processing data.

I am eager to hear your thoughts/opinions/comments on this subject.

/Roger
