The list you gave, which seems derived from DFDL properties, is a good start.
I suggest that character-set encodings, and also binary-to-text encodings (aka "ASCII armoring") such as base64, ascii85 (personal favorite!), base100, etc., are relevant topics. Coping with compression, encryption, signatures, and checksums is relevant too.

The big open research topic is "extensibility". There are many format description systems in our industry, and there is the DFDL standard. None of them are extensible. And there are different kinds of extensibility, with different degrees of difficulty.

The most basic is just composition of schemas. E.g., NITF is an image file format that is a container for many different image file types. How do we describe NITF so that the description does not have to directly include the descriptions of all the types it can contain?

Another, harder kind of extensibility is new fundamental data types, e.g., quad-precision floating point numbers. There is no type for these in XSD, hence DFDL itself, being built on XSD, has no type for them.

See https://cwiki.apache.org/confluence/display/DAFFODIL/DFDL+2.0+Wish+List for a few more ideas along the lines of the above.

________________________________
From: Costello, Roger L. <[email protected]>
Sent: Wednesday, August 7, 2019 7:47:09 AM
To: [email protected] <[email protected]>
Subject: Is there a list of data format features?

Hello DFDL community,

Is there a list of formatting features used by the world's data formats? I imagine a list that is independent of how to parse the features, independent of syntax. It would be just a raw list of features.

I imagine something such as the following (incomplete) list.
------------------------------------------------------
List of Features used to Format Data
------------------------------------------------------
- separators
- position of separator
- initiators
- terminators
- length-of-data indicator
- units for the length-of-data indicator
- end-of-file indicator
- empty fields
- optional fields
- mandatory fields
- repeating fields
- when-to-stop-repeating indicator
- sequence
- choice
- default values
- fixed value
- no-data-available indicator
- datatypes
- text encoding
- byte order
- bit order
- newline indicator
- data follows a pattern
- whitespace
- escaping characters
- escaping blocks of data
- nested data
- recursive data
- boolean data representation
- various ways to represent unsigned integers
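
As a rough illustration of how a few of the listed features combine in one format, here is a minimal Python sketch. The record layout is entirely hypothetical and not taken from any real format or from DFDL: a `REC:` initiator, a 2-byte unsigned length-of-data indicator (units: bytes, big-endian by default), a UTF-8 payload, and a `;` terminator. The names `parse_record` and `parse_all` are likewise made up for this example.

```python
import struct
from typing import List, Tuple

INITIATOR = b"REC:"   # hypothetical initiator
TERMINATOR = b";"     # hypothetical terminator


def parse_record(data: bytes, byte_order: str = ">") -> Tuple[str, bytes]:
    """Parse one record and return (decoded text, remaining bytes)."""
    if not data.startswith(INITIATOR):
        raise ValueError("missing initiator")
    pos = len(INITIATOR)

    # Length-of-data indicator: unsigned 16-bit integer counting bytes.
    # byte_order is a struct prefix: ">" big-endian, "<" little-endian.
    (length,) = struct.unpack_from(byte_order + "H", data, pos)
    pos += 2

    payload = data[pos:pos + length]
    if len(payload) != length:
        raise ValueError("payload shorter than its length indicator")
    pos += length

    if data[pos:pos + len(TERMINATOR)] != TERMINATOR:
        raise ValueError("missing terminator")
    pos += len(TERMINATOR)

    # Text encoding: this sketch assumes UTF-8.
    return payload.decode("utf-8"), data[pos:]


def parse_all(data: bytes) -> List[str]:
    """Repeating fields: keep parsing records until the input is exhausted."""
    records = []
    while data:
        text, data = parse_record(data)
        records.append(text)
    return records


# Two records; the second has a zero-length payload (an empty field).
sample = (
    INITIATOR + struct.pack(">H", 5) + b"hello" + TERMINATOR
    + INITIATOR + struct.pack(">H", 0) + TERMINATOR
)
print(parse_all(sample))
```

Even this toy layout touches initiators, terminators, a length indicator and its units, byte order, text encoding, empty fields, and a when-to-stop-repeating rule (here, end of input).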
