Hello DFDL community,
A few weeks ago, while I was chatting with Mike, he tossed out an exciting challenge.
He said, "We need a unified treatment of no data." I'd like to take up Mike's
challenge, with your help.
As you know, DFDL was created to describe file formats. Formats of all kinds,
both text formats and binary formats. Mike once speculated that there are
probably tens of thousands of file formats. I will go further and speculate
that every one of them probably has the notion of "no data." Mike's challenge
is to characterize that notion. A good characterization would be very useful,
given that the notion is so broadly used.
I would like the characterization, at least initially, to be independent of
DFDL. So, what are the ways that file formats throughout human history have
expressed "no data"?
Below is a start at a characterization. Before I go too far, I wanted to check
with you to see if I am on the right track.
A unified treatment of "no data"
* A region of a file may be treated as having no data in it. There are two
  possible reasons why a region has no data:
    * No data was available when the file was created. We say the region has
      a nil (or null) value.
        * Example: suppose a region of a file represents a person's middle
          name, and when the file was created there was no information
          available about the person's middle name. The person has a middle
          name, but it was not known when the file was created. So, when the
          file was created, the region was given a nil middle name.
    * Data is available and the data is the empty data. We say the region
      has an empty value.
        * Example: again, suppose a region of a file represents a person's
          middle name, but this time the person does not have a middle name.
          When the file is created, the region is given an empty middle name.
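
To make the nil versus empty distinction concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the Person record, its field names, and the choice of None for nil and "" for empty are hypothetical and are not taken from any particular file format or from DFDL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    # Hypothetical record; the field names are assumptions for illustration.
    first_name: str
    # None -> nil:   the middle name was not known when the record was created
    # ""   -> empty: the person is known to have no middle name
    middle_name: Optional[str]
    last_name: str


def describe_middle_name(person: Person) -> str:
    """Classify the middle-name region as 'nil', 'empty', or 'value'."""
    if person.middle_name is None:
        return "nil"
    if person.middle_name == "":
        return "empty"
    return "value"
```

The point of the sketch is that nil and empty must stay distinguishable: a representation that collapses both into the same thing loses the difference between "not known" and "known to be absent."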
Two ways for a file to denote that a region has a nil value
* In-band nil: a symbol is inserted into the region to indicate nil. Thus,
  a part of the region's value space is reserved for indicating nil.
    * Example: the string "N/A" is inserted into the region to indicate that
      data for the person's middle name is Not Available.
* Out-of-band nil: a symbol, separate from the region, is used to indicate
  that the region has a nil value.
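
The two approaches can be sketched in Python as follows. This is a rough illustration, not a description of any real format: the "N/A" marker comes from the example above, while the separate flag field and its "1"/"0" encoding are assumptions I made up for the out-of-band case.

```python
from typing import Optional

NIL_MARKER = "N/A"  # in-band symbol, taken from the example above


def parse_in_band(region: str) -> Optional[str]:
    """In-band nil: the reserved marker inside the region means nil (None)."""
    return None if region == NIL_MARKER else region


def parse_out_of_band(nil_flag: str, region: str) -> Optional[str]:
    """Out-of-band nil: a separate flag decides whether the region is nil.

    The flag encoding is hypothetical: "1" means nil, anything else means
    the region's content is the value. When the flag says nil, the region's
    content is ignored.
    """
    return None if nil_flag == "1" else region
```

With the in-band approach, "N/A" can never be a legitimate middle name, because that part of the value space is reserved. With the out-of-band approach, parse_out_of_band("0", "N/A") yields the ordinary string "N/A", so the region keeps its full value space. In both cases an empty region ("") remains distinct from nil.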
.......
Well, what do you think? Am I on the right track? Any criticisms or edits, big
or small, are welcome. I want to get this right.
/Roger