Daffodil from the CLI only validates if you request it.

If you want validation behavior you should enable the CLI -V option.

-V limited does facet checks and cardinality (minOccurs/maxOccurs) checks
only, and is faster. -V on feeds the XML to Xerces for full validation, but
is slower.

I suggest -V limited.

You can also arrange (from the command line) for Daffodil to run schematron
rules that are embedded in the DFDL schema.


On Wed, Aug 3, 2022 at 8:02 AM Roger L Costello <coste...@mitre.org> wrote:

> Mike wrote:
>
>
>
>    - It is very often useful to be able to parse data and get a
>    well-formed infoset, even if it is invalid.
>
>
>
> I agree, provided Daffodil generates a warning message for invalid data.
> But my experiments reveal that Daffodil silently accepts invalid data. I
> don’t want that behavior.
>
>
>
> I did some experimenting and here’s what I learned about Daffodil’s
> behavior:
>
>
>
> Scenario: The field is fixed length, 3 characters wide. The legal values
> for the field are: ABC or DEF. Daffodil validation flag _*not*_ set.
>
>
>
> *With checkConstraints present:*
>
>
>
> If field value = ABC, then Daffodil silently processes the input without
> error. *Desired behavior.*
>
>
>
> If field value = XYZ (erroneous value, but proper length), then Daffodil
> displays an error message and does not generate XML output. *Desired
> behavior.*
>
>
>
> If field value = AB (not proper length), then Daffodil displays an error
> message and does not generate XML output. *Desired behavior.*
>
>
>
> *Without checkConstraints present:*
>
>
>
> If field value = ABC, then Daffodil silently processes the input without
> error. *Desired behavior.*
>
>
>
> If field value = XYZ (erroneous value, but proper length), then Daffodil
> silently processes the input without error and generates XML containing
> invalid data. *Undesirable behavior. I want Daffodil to warn me that the
> data has the correct length but the value is not valid.*
>
>
>
> If field value = AB (not proper length), then Daffodil displays an error
> message and does not generate XML output. *Desired behavior.*
>
>
>
> Thoughts?
>
>
>
> /Roger
>
>
>
>
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Friday, July 29, 2022 4:32 PM
> *To:* users@daffodil.apache.org
> *Subject:* [EXT] Re: Conflicting requirements: a data format field needs
> both lengthKind="explicit" and lengthKind="pattern"
>
>
>
> Looks good, Roger, except I would not *necessarily* suggest the
> checkConstraints thing, as it is very cyberian in orientation.
>
>
>
> It is very often useful to be able to parse data and get a well-formed
> infoset, even if it is invalid.
>
>
>
> You can have your cake and eat it too, by defining a DFDL variable named
> forceValidityWhenParsing of type xs:boolean. Then your assert can be:
>
>
>
> if ($forceValidityWhenParsing) then dfdl:checkConstraints(.) else fn:true()
>
>
>
> So you can turn on/off this enforcement.
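
A minimal, untested sketch of how that variable and assert might fit together
is below. The variable name comes from the suggestion above; external="true",
defaultValue="false", and the schema wrapper are assumptions made for
illustration, and the Foo element is the 10-character one from later in this
thread. The dfdl:format with the remaining required properties is left out.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:fn="http://www.w3.org/2005/xpath-functions">

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- Assumption: a dfdl:format supplying the other required
           properties (encoding, representation, and so on) is defined
           here or imported; it is left out of this sketch. -->
      <!-- Assumption: the variable is external and defaults to false,
           so the facet check is skipped unless it is switched on. -->
      <dfdl:defineVariable name="forceValidityWhenParsing"
                           type="xs:boolean"
                           external="true"
                           defaultValue="false"/>
    </xs:appinfo>
  </xs:annotation>

  <xs:element name="Foo"
              dfdl:lengthKind="explicit"
              dfdl:length="10"
              dfdl:textTrimKind="padChar"
              dfdl:textPadKind="padChar"
              dfdl:textStringPadCharacter="%SP;"
              dfdl:textStringJustification="center">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <!-- The assert only calls checkConstraints when the variable is
             true; otherwise it always passes and parsing continues. -->
        <dfdl:assert message="Validation of Foo failed"
                     test="{ if ($forceValidityWhenParsing) then dfdl:checkConstraints(.) else fn:true() }"/>
      </xs:appinfo>
    </xs:annotation>
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="[A-Z]{10}|-"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

</xs:schema>
```

With the variable declared external, I believe the CLI can set it at parse time
with the -D option (for example -D forceValidityWhenParsing=true), so the same
schema can either enforce the pattern or just produce a well-formed infoset.
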
>
>
>
> On Fri, Jul 29, 2022 at 2:39 PM Roger L Costello <coste...@mitre.org>
> wrote:
>
> Thanks for the excellent explanation Mike!
>
>
>
> I did a writeup of the problem and solution. If my writeup has any errors,
> please let me know. See below.  /Roger
>
>
>
> *Problem Statement*: A field in a data format has a fixed width. Let’s
> say the width is 10 characters and we name the field “Foo.” If no data is
> available for the field, it must be populated with a single hyphen,
> surrounded by spaces. The hyphen may be in any position within the field.
> If data is available, it must conform to this regular expression:
> [A-Z]{10}; that is, the data must consist of exactly 10 uppercase letters.
>
>
>
> First, I show the wrong approach to this problem. Then I show the right
> approach.
>
>
>
> We need to specify that the field is populated with a hyphen when no data
> is available; nillable and nilValue are used for this:
>
>
>
> nillable=true
> nilValue='-'
>
>
>
> The field's content is specified using a regex:
>
>
>
> lengthKind=pattern
>
> lengthPattern=[A-Z]{10}
>
>
>
> The regex in that lengthPattern doesn't consider the hyphen. So we update
> the regex to allow for the hyphen:
>
>
>
> lengthPattern=[A-Z]{10}|[ ]*-[ ]*
>
>
>
> However, the right-hand side of that regex (which deals with the hyphen)
> doesn't constrain the length of the field. Recall the hyphen may be
> positioned anywhere within the 10-character field. Writing a regex that
> specifies all possible positions of the hyphen, while ensuring the field is
> 10 characters, is not reasonable.
>
>
>
> So we explicitly specify the length:
>
>
>
> lengthKind=explicit
>
> length=10
>
>
>
> But now we have conflicting requirements:
>
>
>
> 1. lengthKind=pattern for the regex
>
> 2. lengthKind=explicit for the field length
>
>
>
> That's a problem. It's not legal: an element cannot specify two different
> lengthKind values.
>
>
>
> Now for the correct solution.
>
>
>
> First, discard nillable and nilValue; the allowed field values, including
> the hyphen, are specified by a regex.
>
>
>
> Explicitly specify the length of the field:
>
>
>
> lengthKind=explicit
>
> length=10
>
>
>
> When the field contains a hyphen, it is surrounded by spaces. Direct the
> parser to trim the surrounding spaces:
>
>
>
> textTrimKind=padChar
> textStringPadCharacter='%SP;'
> textStringJustification=center
>
>
>
> Note that the parser ensures the field is 10 characters prior to
> performing the trim operation.
>
>
>
> We no longer need to be concerned with the surrounding spaces, so the
> regex is simplified:
>
>
>
> [A-Z]{10}|-
>
>
>
> As seen earlier we cannot specify the regex using lengthKind=pattern.
> Instead, use the XSD pattern facet:
>
>
>
> <simpleType>
>
>     <restriction base="xs:string">
>
>        <pattern value="[A-Z]{10}|-"/>
>
>     </restriction>
>
> </simpleType>
>
>
>
> Unless the parser is run in validation mode, XSD facets are not enforced.
> So force the parser to check the XSD pattern facet by preceding the
> simpleType with an annotation that contains checkConstraints:
>
>
>
> <xs:annotation>
>
>     <xs:appinfo source="http://www.ogf.org/dfdl/">
>
>         <dfdl:assert test="{ dfdl:checkConstraints(.) }"
>
>             message="Validation of Foo failed" />
>
>     </xs:appinfo>
>
> </xs:annotation>
>
> <xs:simpleType>
>
>     <xs:restriction base="xs:string">
>
>         <xs:pattern value="[A-Z]{10}|-"/>
>
>    </xs:restriction>
>
> </xs:simpleType>
>
>
>
> Putting it all together, here is how to declare the Foo element:
>
>
>
> <xs:element name="Foo"
>
>                        dfdl:lengthKind="explicit"
>
>                        dfdl:length="10"
>
>                        dfdl:textTrimKind="padChar"
>
>                        dfdl:textPadKind="padChar"
>
>                        dfdl:textStringPadCharacter="%SP;"
>
>                        dfdl:textStringJustification="center">
>
>             <xs:annotation>
>
>                 <xs:appinfo source="http://www.ogf.org/dfdl/">
>
>                     <dfdl:assert test="{ dfdl:checkConstraints(.) }"
>
>                                           message="Validation of Foo failed" />
>
>                 </xs:appinfo>
>
>             </xs:annotation>
>
>             <xs:simpleType>
>
>                 <xs:restriction base="xs:string">
>
>                     <xs:pattern value="[A-Z]{10}|-"/>
>
>                 </xs:restriction>
>
>             </xs:simpleType>
> </xs:element>
>
>
>
>
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Thursday, July 28, 2022 4:36 PM
> *To:* users@daffodil.apache.org
> *Subject:* [EXT] Re: Conflicting requirements: a data format field needs
> both lengthKind="explicit" and lengthKind="pattern"
>
>
>
> You said the length is 100, so that is what the length should be, with
> lengthKind 'explicit'.
>
>
>
> What about using your regex but via a pattern facet?
>
>
>
> <element name="Foo" dfdl:lengthKind='explicit' dfdl:length='100'>
>
>   <simpleType>
>
>     <restriction base="xs:string">
>
>        <pattern value="[A-Z]{100}|[ ]*-[ ]*"/>
>
>     </restriction>
>
>   </simpleType>
>
> </element>
>
>
>
> You should be able to trim spaces as well from this so that you will get
> either 100 characters of A-Z or a single "-" character as the string's
> actual length.
>
>
>
> Note that in this case your regex is simpler. The two "[ ]*" are gone
> because the spaces will be trimmed from both ends of the string.
>
>
>
> <element name="Foo" dfdl:lengthKind='explicit' dfdl:length='100'
>
>    dfdl:textTrimKind='padChar'
>
>    dfdl:textStringPadCharacter='%SP;'
>
>    dfdl:textPadKind='padChar'
>
>    dfdl:textStringJustification="center">
>
>   <simpleType>
>
>     <restriction base="xs:string">
>
>        <pattern value="[A-Z]{100}|-"/>
>
>     </restriction>
>
>   </simpleType>
>
> </element>
>
>
>
> I did not run this DFDL, but this sort of thing is typical of fixed length
> data.
>
>
>
> On Thu, Jul 28, 2022 at 8:52 AM Roger L Costello <coste...@mitre.org>
> wrote:
>
> Hi Folks,
>
> The text data format that I am writing a DFDL schema for has a field
> (let's name it "Foo") with a fixed width. Let's say the width is 100
> characters. The content of the field is uppercase letters. If there is no
> data available to populate the field, it must be populated with a single
> hyphen (surrounded by spaces to ensure the field has a width of 100). The
> hyphen may be in any position within the field. For reasons I will not
> share, I must specify the field's content using a regex:
>
> lengthKind=pattern
> lengthPattern=[A-Z]{100}
>
> However, that lengthPattern doesn't take into account the hyphen that is
> needed when there is no data. So I updated the regex like this:
>
> lengthPattern=[A-Z]{100}|[ ]*-[ ]*
>
> However, the right-hand side of that regex (which deals with the hyphen)
> doesn't constrain the length of the field. Recall the hyphen may be
> positioned anywhere within the 100 character field. Writing a regex that
> specifies all possible positions of the hyphen, while ensuring the field is
> 100 characters, is not reasonable.
>
> So it would seem that I need to specify length=100 on the element
> declaration:
>
> lengthKind=explicit
> length=100
>
> But now I have conflicting requirements:
>
> 1. The element declaration needs to specify lengthKind=pattern for the
> regex
>
> 2. The element declaration needs to specify lengthKind=explicit for the
> field length
>
> That's a problem. That's not legal.
>
> In other words, I need this illegal DFDL:
>
> <xs:element name="Foo"
>         nillable="true"
>         dfdl:nilValue="-"
>         dfdl:lengthKind="explicit"
>         dfdl:length="100"
>         dfdl:lengthUnits="characters"
>         dfdl:lengthKind="pattern"
>         dfdl:lengthPattern="[A-Z]{100}|[ ]*-[ ]*">
>    <xs:simpleType>
>         <xs:annotation>
>             <xs:appinfo source="http://www.ogf.org/dfdl/">
>                 <dfdl:assert test="{ (fn:nilled(.)) or (. ne '') }"/>
>             </xs:appinfo>
>         </xs:annotation>
>         <xs:restriction base="xs:string"/>
>     </xs:simpleType>
> </xs:element>
>
> Is there a solution to this problem? If not, is there a workaround?
>
> /Roger
>
>
