On Fri, Jan 5, 2024 at 12:22 PM Roger L Costello <coste...@mitre.org> wrote:
> [True/False] DFDL is intended for specifying the form of the input, e.g.,
> “this field is a two text digit unsigned integer.”

True.

> [True/False] DFDL is not a validation language, e.g., don’t use DFDL to
> specify that the field’s value must be between 00 and 44.

False. We use XSD facets for this in DFDL v1.0, but if one invented a non-XSD-based syntax for DFDL, it would include a way to specify these same constraints. Such an implementation would still want a way to disable enforcement of the validation checks, and a validation error would still not be a parse error that causes backtracking or failure to parse.

> [True/False] If you want to validate the input data, do it in a downstream
> activity, after DFDL parsing.

False. The line between well-formed and valid is sometimes grey. One person's well-formedness is another's validity. Consider the original JPEG DFDL schema you created years ago. The DFDL part of that would accept line noise and say it was well-formed, because it just treated the file as an unstructured bag of JPEG segments, one of which is a BLOB segment. The Schematron rules that enforced the segment ordering and nesting structure were needed, not for data validation, but to tell you whether the data was even well-formed. So in that case, the Schematron processing was really part of the parsing process.

False. For performance reasons, it is pretty important to handle the data once if possible. Why not check the range of a value at the point where you create that value? To do otherwise means another whole pass over the data.

True. Some kinds of validity checks cannot be done as part of the natural traversal of the data done by a parser/unparser, e.g., verifying that a key field is unique.

> [True/False] Design DFDL to be liberal in what input it accepts. Accept
> input as long as it is well-formed, e.g., accept 87 because it is a two
> text digit unsigned integer.

False - a DFDL schema should reject malformed data. It should NOT reject invalid data.

True - in this case you are separating well-formedness from validity. The judgement here is: if the value is 87, which is out of range, it's still of interest to examine the data. Maybe the spec is wrong and the de facto data in the world doesn't respect the range bound. You want to see this, so you have to complete parsing the data, or you can't even observe this phenomenon.
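
To make the well-formed vs. valid distinction concrete, here is a rough sketch in Python (not DFDL, and not how any DFDL processor is actually implemented). The names ParseError, ParseResult, and parse_two_digit_field are all made up for illustration. It assumes the 00 to 44 range from the question. Malformed input fails the parse; out-of-range input still parses and just gets a validation diagnostic, which can be switched off. The range check happens at the point where the value is created, so there is no second pass over the data.

```python
from dataclasses import dataclass, field
from typing import List


class ParseError(Exception):
    """Hypothetical error: the input is malformed (not well-formed)."""


@dataclass
class ParseResult:
    """Hypothetical result: the parsed value plus any validation diagnostics."""
    value: int
    validation_errors: List[str] = field(default_factory=list)


def parse_two_digit_field(text: str, validate: bool = True,
                          lo: int = 0, hi: int = 44) -> ParseResult:
    # Well-formedness: the field must be exactly two ASCII text digits.
    # Failing this is a parse error; in a real parser it could trigger
    # backtracking or cause the whole parse to fail.
    if len(text) != 2 or not (text.isascii() and text.isdigit()):
        raise ParseError(f"expected two text digits, got {text!r}")

    result = ParseResult(value=int(text))

    # Validity: checked in the same step that creates the value, but a
    # failure is only recorded. It never turns into a parse error.
    if validate and not (lo <= result.value <= hi):
        result.validation_errors.append(
            f"value {result.value} is outside the range {lo}..{hi}"
        )
    return result


if __name__ == "__main__":
    for sample in ["07", "87", "8x"]:
        try:
            print(sample, parse_two_digit_field(sample))
        except ParseError as err:
            print(sample, "parse error:", err)
```

With this shape, "87" still comes out the other end where you can look at it, while "8x" does not. A check like key uniqueness would not fit here, since it needs to see all the values, so that kind of thing stays a separate pass.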