Interesting. The fact that they call it safe "Documents" suggests that they are worried about the file formats people typically use/create.
Note that DFDL cannot express most such formats due to not having recursion. This was 100% intentional. It keeps DFDL not-turing-complete, which is theoretically good from a safety/security point of view. However, people have requested adding recursion to DFDL as a feature one can turn on/off so that one could describe "document" style formats using DFDL. There's real hard work to changing daffodil to allow that. We'd like to have it as an experimental DFDL feature someday. ________________________________ From: Costello, Roger L. <[email protected]> Sent: Wednesday, September 4, 2019 12:48 PM To: [email protected] <[email protected]> Subject: Are you familiar with Safe Documents (SafeDocs)? Today, code for input data validation is typically written manually in an ad-hoc manner. For commonly-used electronic data formats, input validation is, at a minimum, a problem of scale whereby specifications of these formats comprise hundreds to thousands of pages. Input validation thus translates to thousands or more conditions to be checked against the input data before the data can be safely processed. Manually writing the code to parse and validate input, and then manually auditing whether that code implements all the necessary checks completely and correctly, does not scale. Moreover, manual parser coding and auditing typically fails even for electronic data formats specifically designed to be easier to perform such tasks, e.g., JSON and XML. A variety of critical vulnerabilities have been found in major parser implementations for these formats. Widely deployed mitigations against crafted input attacks include (a) trying to prevent the flow of untrusted data to vulnerable software; and (b) testing software with randomized inputs to find and patch flaws that could be triggered by maliciously created inputs. Unfortunately, neither of these approaches offer security assurance guarantees. Mitigations for preventing the flow of untrusted data to vulnerable software, which can be implemented via network or host-based measures such as firewalls, application proxies, antivirus scanners, etc., neither remove the underlying vulnerability from the target, nor encode complete knowledge of document or message format internals. Attacker bypasses of such mitigations exploit incompleteness of the mitigations' understanding of the data format to exploit the still-vulnerable targets. The Safe Documents (SafeDocs) program will develop novel verified programming methodologies for building high assurance parsers for extant electronic data formats, and novel methodologies for comprehending, simplifying, and reducing these formats to their safe, unambiguous, verification-friendly subsets (“safe sub-setting”). SafeDocs will address the ambiguity and complexity obstacles that hinder the application of verified programming posed by extant electronic data formats. SafeDocs’ multi-pronged approach will combine: * extracting the extant formats’ de facto syntax (including any non-compliant syntax deliberately accepted and substantially used in the wild); * identifying a syntactically simpler subset of this syntax that yields itself to use in verified programming while preserving the format's essential functionality; and * creating software construction kits for building secure, verified parsers for this syntactically simpler subset, and high-assurance translators for converting extant instances of the format to this subset. The parser construction kits developed by SafeDocs will be usable by industry programmers who understand the syntax of electronic data formats but lack the theoretical background in verified programming. These tools will enable developers to construct verifiable parsers for new electronic data formats as well as extant ones. The tools will guide the syntactic design of new formats, by making verification-friendly format syntax easy to express, and vice versa. https://www.darpa.mil/program/safe-documents
