Re: Don't use DFDL to do validation ... do you agree?

Beckerle, Mike Mon, 15 Apr 2019 06:37:20 -0700

One further thought I should have started with in my reply.


It is still important to keep "well formed" separate from "valid". You often 
want to accept well-formed data and produce a parse of it. You don't want to 
make validity a criterion for creating an infoset at all.


I think there is this hierarchy:

malformed, well-formed, valid, correct


Obviously you can't parse malformed data so you don't get an infoset.

The next three are all about infosets.


The difference between valid and correct is that validity checks can't (and 
often don't) check everything. The ultimate test of correctness of data is 
whether it is in fact fit for its purpose.


________________________________
From: Beckerle, Mike
Sent: Monday, April 15, 2019 9:30:09 AM
To: [email protected]
Subject: Re: Don't use DFDL to do validation ... do you agree?


I don't agree.


DFDL has a dfdl:assert with a recoverable flag. (Daffodil doesn't implement 
this yet, but it would be easy to add.)


This can be used to emit warnings. A dfdl:assert of this type can be used as an 
explicit validation check much like a schematron rule.
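
As a rough sketch of what such a check might look like, assuming the 
"recoverable flag" is the DFDL failureType property set to "recoverableError", 
and using a hypothetical element name, length kind, and message:

```xml
<!-- Sketch only: element name, lengthKind, and message are hypothetical. -->
<xs:element name="value" type="xs:string"
            dfdl:lengthKind="delimited">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- recoverableError: record a diagnostic, but do not fail the
           parse or trigger backtracking -->
      <dfdl:assert message="empty value"
                   failureType="recoverableError">
        {fn:string-length(.) gt 0}
      </dfdl:assert>
    </xs:appinfo>
  </xs:annotation>
</xs:element>
```

With the default failure type, the same assert would instead raise a 
processing error and cause the parser to backtrack.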


In addition Daffodil has DFDL schema validation built in. If invoked with the 
validation flag it will do validation of XSD facets and min/max occurs on the 
fly as it parses, and accumulate these validation warnings. This is MUCH more 
efficient than running a separate XSD validation.
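
For instance, a rule such as "the value must not be empty" could be stated as 
an XSD facet rather than as an assert. This is only a sketch with a 
hypothetical element name and length kind; with validation turned on, a 
zero-length value would be expected to show up as a validation diagnostic 
while the parse itself still produces an infoset:

```xml
<!-- Sketch only: element name and lengthKind are hypothetical. -->
<xs:element name="value" dfdl:lengthKind="delimited">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <!-- Checked by validation; does not affect how the data is parsed -->
      <xs:minLength value="1"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```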


The only thing you don't get from this is checking of key and unique 
constraints. Those you'd have to put in an XML Schema and check with a regular 
XSD validator.


Well, one other thing too. DFDL's expressions are a little less expressive than 
full XPath (and some of the more powerful DFDL expression features aren't 
implemented in Daffodil), so the asserts can't be quite as expressive as full 
XPath. But most of it is there.


At the API level one can then check the diagnostics output by the parse to see 
if there were any validation errors or recoverable asserts. If so, one could 
indicate that the data is invalid.



________________________________
From: Costello, Roger L. <[email protected]>
Sent: Monday, April 15, 2019 7:30:13 AM
To: [email protected]
Subject: Don't use DFDL to do validation ... do you agree?


Hello DFDL community,



Do you agree with the following?



DFDL is a parsing language, not a validation language.



While it is possible to do validation in DFDL, it is not recommended. That is, 
it is possible to design a DFDL schema to validate data as the data is being 
parsed, but that is not recommended.



Instead, use DFDL just for parsing. Once input data has been converted to XML, 
bring to bear all the XML tools to process the XML, including validation tools 
such as XML Schema, Schematron, and/or RelaxNG. That is, do validation on the 
XML, not on the native file format.



But you might argue: Hold on there, in the most recent discussion wasn’t 
validation performed in the DFDL schema:



<xs:sequence>
    <xs:element name="value" type="xs:string"
           dfdl:lengthKind="pattern"
           dfdl:lengthPattern=".*?(?=(\x0D-|\x0D\x0A-|-|\)$))">
        <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/">
                <!-- Isn’t the following validation? -->
                <dfdl:assert message="empty value">
                    {fn:string-length(.) gt 0}
                </dfdl:assert>
            </xs:appinfo>
        </xs:annotation>
    </xs:element>
    <xs:choice>
        ...
    </xs:choice>
</xs:sequence>



That validates the length of the input is greater than zero, yes?



Yes. However, it is being used strictly for signaling to Daffodil when to 
abandon this sequence, back up, and proceed down the next path. In other words, 
it is being strictly used as a parsing device, not a validation device. [Am I 
expressing this correctly? Is there a better, richer, more correct way to 
express this?]



/Roger
