The list you gave, which seems derived from DFDL properties, is a good start.
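To make a few items on that list concrete (initiators, separators, terminators, empty fields, escaping characters, a length-of-data indicator, byte order), here is a rough Python sketch. The record syntax is entirely hypothetical: a record starts with "[", uses "," between fields, ends with "]", and uses "\" as the escape character. A second hypothetical field is a 2-byte unsigned length prefix followed by that many bytes. The function names are mine, not from DFDL or Daffodil.

```python
import struct


def parse_delimited(text: str, initiator: str = "[", separator: str = ",",
                    terminator: str = "]", escape: str = "\\") -> list:
    """Parse one hypothetical record like '[a,b\\,c,]' into its fields.

    Separator, terminator, and escape are assumed to be single characters.
    Empty fields are kept as ''. Text after the terminator is ignored here.
    """
    if not text.startswith(initiator):
        raise ValueError("missing initiator")
    fields, current, i = [], [], len(initiator)
    while i < len(text):
        ch = text[i]
        if ch == escape:
            if i + 1 >= len(text):
                raise ValueError("dangling escape character")
            current.append(text[i + 1])  # take the next character literally
            i += 2
            continue
        if ch == separator:
            fields.append("".join(current))
            current = []
        elif ch == terminator:
            fields.append("".join(current))
            return fields
        else:
            current.append(ch)
        i += 1
    raise ValueError("missing terminator")


def parse_length_prefixed(data: bytes, byteorder: str = "big") -> tuple:
    """Read a 2-byte unsigned length (units: bytes), then that many bytes.

    Returns (payload, remaining_bytes). The byte order of the length field is
    a parameter because the same layout can appear in either order.
    """
    fmt = ">H" if byteorder == "big" else "<H"
    (length,) = struct.unpack_from(fmt, data, 0)
    if 2 + length > len(data):
        raise ValueError("length indicator exceeds available data")
    return data[2:2 + length], data[2 + length:]
```

Even this toy shows that the features interact. The escape character has to be checked before the separator and terminator, and the length indicator is meaningless unless its byte order and units are also specified.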


I suggest that both kinds of encodings are relevant topics: character-set
encodings, and binary-to-text encodings (aka "ascii armoring") like base64,
ascii85 (personal favorite!), base100, etc. Coping with compression,
encryption, signatures, and checksums is also a relevant topic.
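Here is a small Python sketch of why these layers matter to a format description. The layout is hypothetical: the payload is zlib-compressed, a 4-byte big-endian CRC-32 of the uncompressed payload is appended, and the result is ascii-armored with either base64 or ascii85. A description language has to say which layers exist and in what order they are undone. `pack` and `unpack` are illustrative names, not any real library's API.

```python
import base64
import struct
import zlib

# Hypothetical armoring schemes, keyed by name (Python stdlib codecs).
ARMOR = {
    "base64": (base64.b64encode, lambda s: base64.b64decode(s, validate=True)),
    "ascii85": (base64.a85encode, base64.a85decode),
}


def pack(payload: bytes, scheme: str = "base64") -> bytes:
    """Build armor( zlib(payload) + crc32(payload) ) for a hypothetical format."""
    compressed = zlib.compress(payload)
    trailer = struct.pack(">I", zlib.crc32(payload))  # checksum of the uncompressed data
    encode, _ = ARMOR[scheme]
    return encode(compressed + trailer)


def unpack(armored: bytes, scheme: str = "base64") -> bytes:
    """Undo each layer in reverse order, then verify the checksum."""
    _, decode = ARMOR[scheme]
    raw = decode(armored)                        # layer 1: ascii armoring
    if len(raw) < 4:
        raise ValueError("too short to hold a CRC-32 trailer")
    compressed, trailer = raw[:-4], raw[-4:]
    payload = zlib.decompress(compressed)        # layer 2: compression
    (expected,) = struct.unpack(">I", trailer)   # layer 3: checksum
    if zlib.crc32(payload) != expected:
        raise ValueError("CRC-32 mismatch")
    return payload
```

Note that the checksum can only be verified after decompression. That is the kind of ordering constraint a description of such a format would need to express.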


The big open research topic is "extensibility". There are many format
description systems in our industry, and there is the DFDL standard. None of
them is extensible. And there are different kinds of extensibility, with
differing degrees of difficulty.


The most basic kind is just composition of schemas. E.g., NITF is an image file
format that is a container for many different image file types. How do we
describe NITF so that its description does not have to directly include the
descriptions of all the types it can contain?
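One way to picture composition is a container parser that knows only its own framing and hands each embedded payload to a separately registered parser. The Python sketch below uses a made-up container of (4-byte tag, 4-byte big-endian length, payload) entries. It is not NITF's actual layout, and `register`, `REGISTRY`, and the tags are hypothetical.

```python
import struct
from typing import Callable, Dict, List, Tuple

Parser = Callable[[bytes], object]

# Hypothetical registry: 4-byte type tag -> parser for that embedded format.
REGISTRY: Dict[bytes, Parser] = {}


def register(tag: bytes):
    """Decorator that plugs a sub-format parser in without editing the container."""
    def wrap(fn: Parser) -> Parser:
        REGISTRY[tag] = fn
        return fn
    return wrap


def parse_container(data: bytes) -> List[Tuple[bytes, object]]:
    """Parse a hypothetical container of (tag, big-endian length, payload) entries."""
    entries = []
    pos = 0
    while pos < len(data):
        if pos + 8 > len(data):
            raise ValueError("truncated entry header")
        tag = data[pos:pos + 4]
        (length,) = struct.unpack(">I", data[pos + 4:pos + 8])
        start, end = pos + 8, pos + 8 + length
        if end > len(data):
            raise ValueError("truncated payload")
        payload = data[start:end]
        parser = REGISTRY.get(tag)
        # Sub-formats with no registered parser are kept as opaque bytes.
        entries.append((tag, parser(payload) if parser else payload))
        pos = end
    return entries


# A sub-format description defined apart from the container description.
@register(b"TEXT")
def parse_text(payload: bytes) -> str:
    return payload.decode("utf-8")
```

The container description never mentions `TEXT`. The hard part for a declarative language is expressing this kind of late binding, which is trivial in code.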


A harder kind of extensibility is adding new fundamental data types, e.g.,
quad-precision floating point numbers. XSD has no type for these, so DFDL
itself, being built on XSD, has no type for them either.
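For a sense of what a user-supplied primitive type would have to provide, here is a Python sketch that decodes 16 bytes of IEEE 754 binary128 (1 sign bit, 15 exponent bits with bias 16383, 112 fraction bits) into an exact `fractions.Fraction`. The function name and the choice of `Fraction` as the logical value are my assumptions. A real extension mechanism would also need the unparse direction and a way to declare the type in the schema.

```python
from fractions import Fraction

EXP_BITS = 15
FRAC_BITS = 112
BIAS = (1 << (EXP_BITS - 1)) - 1  # 16383


def decode_binary128(data: bytes, byteorder: str = "big") -> Fraction:
    """Decode IEEE 754 binary128 bytes into an exact Fraction.

    Infinities and NaNs raise ValueError because Fraction cannot hold them.
    Negative zero decodes to plain 0.
    """
    if len(data) != 16:
        raise ValueError("binary128 needs exactly 16 bytes")
    bits = int.from_bytes(data, byteorder)
    sign = -1 if bits >> 127 else 1
    exponent = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    fraction = bits & ((1 << FRAC_BITS) - 1)
    if exponent == (1 << EXP_BITS) - 1:
        raise ValueError("infinity or NaN")
    if exponent == 0:
        # Subnormal or zero: no implicit leading 1, effective exponent 1 - BIAS.
        significand, unbiased = fraction, 1 - BIAS
    else:
        # Normal: implicit leading 1 above the 112 stored fraction bits.
        significand, unbiased = (1 << FRAC_BITS) | fraction, exponent - BIAS
    # value = significand * 2**(unbiased - FRAC_BITS)
    return sign * Fraction(significand) * Fraction(2) ** (unbiased - FRAC_BITS)
```

For example, 1.0 is the bytes `3F FF` followed by 14 zero bytes in big-endian order.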


See https://cwiki.apache.org/confluence/display/DAFFODIL/DFDL+2.0+Wish+List
for a few more ideas along the lines of the above.

________________________________
From: Costello, Roger L. <[email protected]>
Sent: Wednesday, August 7, 2019 7:47:09 AM
To: [email protected] <[email protected]>
Subject: Is there a list of data format features?

Hello DFDL community,

Is there a list of formatting features used by the world's data formats? I 
imagine a list that is independent of how to parse the features, independent of 
syntax. It would be just a raw list of features.

I imagine something such as the following (incomplete) list.

------------------------------------------------------
List of Features used to Format Data
------------------------------------------------------
- separators
- position of separator
- initiators
- terminators
- length-of-data indicator
- units for the length-of-data indicator
- end-of-file indicator
- empty fields
- optional fields
- mandatory fields
- repeating fields
- when-to-stop-repeating indicator
- sequence
- choice
- default values
- fixed value
- no-data-available indicator
- datatypes
- text encoding
- byte order
- bit order
- newline indicator
- data follows a pattern
- whitespace
- escaping characters
- escaping blocks of data
- nested data
- recursive data
- boolean data representation
- various ways to represent unsigned integers
