Hello DFDL community,

A few weeks ago, while I was chatting with Mike, he tossed out an exciting
challenge. He said, "We need a unified treatment of no data." I'd like to take
up Mike's challenge, with your help.

As you know, DFDL was created to describe file formats: formats of all kinds,
both text and binary. Mike once speculated that there are probably tens of
thousands of file formats. I will go further and speculate that every one of
them probably has some notion of "no data." Mike's challenge is to characterize
that notion. Given how broadly the notion is used, a good characterization would
be very useful.

I would like the characterization, at least initially, to be independent of 
DFDL. So, what are the ways that file formats throughout human history have 
expressed "no data"?

Below is a start at a characterization. Before I go too far, I wanted to check
with you to see if I am on the right track.

A unified treatment of "no data"

  *   A region of a file may be treated as having no data in it. There are two 
possible reasons why a region has no data:
     *   No data was available when the file was created. We say the region has 
a nil (or null) value.
        *   Example: suppose a region of a file represents a person's middle
name. The person has a middle name, but no information about it was available
when the file was created. So the region was given a nil middle name.
     *   Data is available, and that data is empty. We say the region has an
empty value.
        *   Example: again, suppose a region of a file represents a person's
middle name, but this time the person does not have a middle name. When the file
is created, the region is given an empty middle name.
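
To make the nil/empty distinction concrete, here is a minimal Python sketch. It
assumes an in-memory model in which Python's `None` stands for nil and the empty
string stands for empty; the names `Person` and `describe_middle_name` are
hypothetical and are not part of DFDL or of any file format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    first_name: str
    # Assumed encoding for this sketch only:
    #   None -> nil: the person has a middle name, but it was unknown
    #           when the file was created
    #   ""   -> empty: the person is known to have no middle name
    middle_name: Optional[str]


def describe_middle_name(p: Person) -> str:
    # Check nil first: None and "" must never be treated as the same thing.
    if p.middle_name is None:
        return "nil (not known when the file was created)"
    if p.middle_name == "":
        return "empty (the person has no middle name)"
    return f"value: {p.middle_name}"
```

The point of the sketch is only that nil and empty need two distinguishable
representations; collapsing both into one loses exactly the distinction above.
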

Two ways for a file to denote that a region has a nil value

  *   In-band nil: a symbol is inserted into the region to indicate nil. Thus, 
a part of the region's value space is reserved for indicating nil.
     *   Example: the string "N/A" is inserted into the region to indicate that 
data for the person's middle name is Not Available.
  *   Out-of-band nil: a symbol, separate from the region, is used to indicate 
that the region has a nil value.
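
The two nil representations can be contrasted in a similarly hedged Python
sketch. The reserved symbol "N/A" comes from the in-band example; the separate
one-character flag used for the out-of-band case ("Y" meaning nil) is a purely
hypothetical layout chosen for illustration, as are the function names.

```python
from typing import Optional

# In-band: a symbol reserved out of the region's own value space.
NIL_SYMBOL = "N/A"


def decode_in_band(region: str) -> Optional[str]:
    # The region itself carries the nil marker, so a real middle name
    # can never be the literal text "N/A".
    return None if region == NIL_SYMBOL else region


def decode_out_of_band(nil_flag: str, region: str) -> Optional[str]:
    # Hypothetical layout: a flag stored outside the region says whether
    # the region is nil; when it is "Y" the region's contents are ignored.
    return None if nil_flag == "Y" else region
```

The sketch exposes the trade-off: the in-band scheme gives up part of the
region's value space, while the out-of-band scheme keeps the whole value space
(here, a middle name could literally be "N/A") at the cost of an extra field. In
both schemes an empty region still decodes as empty, not nil.
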
.......
Well, what do you think? Am I on the right track? Any criticisms or edits, big
or small, are welcome. I want to get this right.
/Roger
