Daffodil from the CLI only validates if you request it. If you want validation behavior you should enable the CLI -V option.
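
As a rough sketch, assuming the daffodil launcher is on your PATH, a small shell wrapper around the parse command with -V might look like the following. The function name and the file names are placeholders made up for illustration; they are not part of Daffodil.

```shell
# Hypothetical helper (not part of Daffodil): parse an input file with a
# DFDL schema, passing the requested validation mode to the CLI via -V.
parse_validated() {
  mode="$1"     # validation mode to pass to -V, e.g. "limited" or "on"
  schema="$2"   # path to the DFDL schema (placeholder name below)
  input="$3"    # path to the data file to parse (placeholder name below)
  daffodil parse -s "$schema" -V "$mode" "$input"
}

# Example call, with placeholder file names:
#   parse_validated limited foo.dfdl.xsd input.dat
```
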
-V limited does facet checks and cardinality (minOccurs/maxOccurs) checks only, and is faster.
-V on feeds the XML to Xerces for full validation, but is slower.

I suggest -V limited.

You can also arrange (from the command line) for Daffodil to run Schematron rules that are embedded in the DFDL schema.

On Wed, Aug 3, 2022 at 8:02 AM Roger L Costello <coste...@mitre.org> wrote:

> Mike wrote:
>
> - It is very often useful to be able to parse data and get a well-formed infoset, even if it is invalid.
>
> I agree, provided Daffodil generates a warning message for invalid data. But my experiments reveal that Daffodil silently accepts invalid data. I don’t want that behavior.
>
> I did some experimenting and here’s what I learned about Daffodil’s behavior:
>
> Scenario: The field is fixed length, 3 characters wide. The legal values for the field are: ABC or DEF. Daffodil validation flag _*not*_ set.
>
> *With checkConstraints present:*
>
> If field value = ABC, then Daffodil silently processes the input without error. *Desired behavior.*
>
> If field value = XYZ (erroneous value, but proper length), then Daffodil displays an error message and does not generate XML output. *Desired behavior.*
>
> If field value = AB (not proper length), then Daffodil displays an error message and does not generate XML output. *Desired behavior.*
>
> *Without checkConstraints present:*
>
> If field value = ABC, then Daffodil silently processes the input without error. *Desired behavior.*
>
> If field value = XYZ (erroneous value, but proper length), then Daffodil silently processes the input without error and generates XML containing invalid data. *Undesirable behavior. I want Daffodil to warn me that the data has the correct length but the value is not valid.*
>
> If field value = AB (not proper length), then Daffodil displays an error message and does not generate XML output. *Desired behavior.*
>
> Thoughts?
>
> /Roger
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Friday, July 29, 2022 4:32 PM
> *To:* users@daffodil.apache.org
> *Subject:* [EXT] Re: Conflicting requirements: a data format field needs both lengthKind="explicit" and lengthKind="pattern"
>
> Looks good Roger, except I would not *necessarily* suggest the checkConstraints thing, as it is very cyberian in orientation.
>
> It is very often useful to be able to parse data and get a well-formed infoset, even if it is invalid.
>
> You can have your cake and eat it too, by defining a DFDL variable named forceValidityWhenParsing of type xs:boolean. Then your assert can be:
>
>     if ($forceValidityWhenParsing) then dfdl:checkConstraints(.) else fn:true()
>
> So you can turn on/off this enforcement.
>
> On Fri, Jul 29, 2022 at 2:39 PM Roger L Costello <coste...@mitre.org> wrote:
>
> Thanks for the excellent explanation Mike!
>
> I did a writeup of the problem and solution. If my writeup has any errors, please let me know. See below. /Roger
>
> *Problem Statement*: A field in a data format has a fixed width. Let’s say the width is 10 characters and we name the field “Foo.” If no data is available for the field, it must be populated with a single hyphen, surrounded by spaces. The hyphen may be in any position within the field. If data is available, it must conform to this regular expression: [A-Z]{10}; that is, the data must consist of exactly 10 uppercase letters.
>
> First, I show the wrong approach to this problem. Then I show the right approach.
>
> We need to specify that the field is populated with a hyphen when no data is available; nillable and nilValue are used for this:
>
>     nillable=true
>     nilValue='-'
>
> The field's content is specified using a regex:
>
>     lengthKind=pattern
>     lengthPattern=[A-Z]{10}
>
> The regex in that lengthPattern doesn't consider the hyphen.
> So we update the regex to allow for the hyphen:
>
>     lengthPattern=[A-Z]{10}|[ ]*-[ ]*
>
> However, the right-hand side of that regex (which deals with the hyphen) doesn't constrain the length of the field. Recall the hyphen may be positioned anywhere within the 10-character field. Writing a regex that specifies all possible positions of the hyphen, while ensuring the field is 10 characters, is not reasonable.
>
> So we explicitly specify the length:
>
>     lengthKind=explicit
>     length=10
>
> But now we have conflicting requirements:
>
> 1. lengthKind=pattern for the regex
> 2. lengthKind=explicit for the field length
>
> That's a problem. It's not legal. An element can’t specify two different lengthKind values.
>
> Now for the correct solution.
>
> First, discard nillable and nilValue; the allowed field values, including the hyphen, are specified by a regex.
>
> Explicitly specify the length of the field:
>
>     lengthKind=explicit
>     length=10
>
> When the field contains a hyphen, it is surrounded by spaces. Direct the parser to trim the surrounding spaces:
>
>     textTrimKind=padChar
>     textStringPadCharacter='%SP;'
>     textStringJustification=center
>
> Note that the parser ensures the field is 10 characters prior to performing the trim operation.
>
> We no longer need to be concerned with the surrounding spaces, so the regex is simplified:
>
>     [A-Z]{10}|-
>
> As seen earlier we cannot specify the regex using lengthKind=pattern. Instead, use the XSD pattern facet:
>
>     <simpleType>
>       <restriction base="xs:string">
>         <pattern value="[A-Z]{10}|-"/>
>       </restriction>
>     </simpleType>
>
> Unless the parser is run in validation mode, XSD facets are not enforced.
> So force the parser to check the XSD pattern facet by preceding the simpleType with an annotation that contains checkConstraints:
>
>     <xs:annotation>
>       <xs:appinfo source="http://www.ogf.org/dfdl/">
>         <dfdl:assert test="{ dfdl:checkConstraints(.) }"
>                      message="Validation of Foo failed" />
>       </xs:appinfo>
>     </xs:annotation>
>     <xs:simpleType>
>       <xs:restriction base="xs:string">
>         <xs:pattern value="[A-Z]{10}|-"/>
>       </xs:restriction>
>     </xs:simpleType>
>
> Putting it all together, here is how to declare the Foo element:
>
>     <xs:element name="Foo"
>                 dfdl:lengthKind="explicit"
>                 dfdl:length="10"
>                 dfdl:textTrimKind="padChar"
>                 dfdl:textPadKind="padChar"
>                 dfdl:textStringPadCharacter="%SP;"
>                 dfdl:textStringJustification="center">
>       <xs:annotation>
>         <xs:appinfo source="http://www.ogf.org/dfdl/">
>           <dfdl:assert test="{ dfdl:checkConstraints(.) }"
>                        message="Validation of Foo failed" />
>         </xs:appinfo>
>       </xs:annotation>
>       <xs:simpleType>
>         <xs:restriction base="xs:string">
>           <xs:pattern value="[A-Z]{10}|-"/>
>         </xs:restriction>
>       </xs:simpleType>
>     </xs:element>
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Thursday, July 28, 2022 4:36 PM
> *To:* users@daffodil.apache.org
> *Subject:* [EXT] Re: Conflicting requirements: a data format field needs both lengthKind="explicit" and lengthKind="pattern"
>
> You said the length is 100, so that's what's going to want to be the lengthKind 'explicit' length.
>
> What about using your regex but via a pattern facet?
>
>     <element name="Foo" dfdl:lengthKind='explicit' dfdl:length='100'>
>       <simpleType>
>         <restriction base="xs:string">
>           <pattern value="[A-Z]{100}|[ ]*-[ ]*"/>
>         </restriction>
>       </simpleType>
>     </element>
>
> You should be able to trim spaces as well from this so that you will get either 100 characters of A-Z or a single "-" character as the string's actual length.
>
> Note that in this case your regex is simpler. The two "[ ]*" are gone because the spaces will be trimmed from both ends of the string.
>
>     <element name="Foo" dfdl:lengthKind='explicit' dfdl:length='100'
>              dfdl:textTrimKind='padChar'
>              dfdl:textStringPadCharacter='%SP;'
>              dfdl:textPadKind='padChar'
>              dfdl:textStringJustification="center">
>       <simpleType>
>         <restriction base="xs:string">
>           <pattern value="[A-Z]{100}|-"/>
>         </restriction>
>       </simpleType>
>     </element>
>
> I did not run this DFDL, but this sort of thing is typical of fixed length data.
>
> On Thu, Jul 28, 2022 at 8:52 AM Roger L Costello <coste...@mitre.org> wrote:
>
> Hi Folks,
>
> The text data format that I am writing a DFDL schema for has a field (let's name it "Foo") with a fixed width. Let's say the width is 100 characters. The content of the field is uppercase letters. If there is no data available to populate the field, it must be populated with a single hyphen (surrounded by spaces to ensure the field has a width of 100). The hyphen may be in any position within the field. For reasons I will not share, I must specify the field's content using a regex:
>
>     lengthKind=pattern
>     lengthPattern=[A-Z]{100}
>
> However, that lengthPattern doesn't take into account the hyphen that is needed when there is no data. So I updated the regex like this:
>
>     lengthPattern=[A-Z]{100}|[ ]*-[ ]*
>
> However, the right-hand side of that regex (which deals with the hyphen) doesn't constrain the length of the field. Recall the hyphen may be positioned anywhere within the 100 character field. Writing a regex that specifies all possible positions of the hyphen, while ensuring the field is 100 characters, is not reasonable.
>
> So it would seem that I need to specify length=100 on the element declaration:
>
>     lengthKind=explicit
>     length=100
>
> But now I have conflicting requirements:
>
> 1. The element declaration needs to specify lengthKind=pattern for the regex
> 2. The element declaration needs to specify lengthKind=explicit for the field length
>
> That's a problem. That's not legal.
>
> In other words, I need this illegal DFDL:
>
>     <xs:element name="Foo"
>                 nillable="true"
>                 dfdl:nilValue="-"
>                 dfdl:lengthKind="explicit"
>                 dfdl:length="100"
>                 dfdl:lengthUnits="characters"
>                 dfdl:lengthKind="pattern"
>                 dfdl:lengthPattern="[A-Z]{100}|[ ]*-[ ]*">
>       <xs:simpleType>
>         <xs:annotation>
>           <xs:appinfo source="http://www.ogf.org/dfdl/">
>             <dfdl:assert test="{ (fn:nilled(.)) or (. ne '') }"/>
>           </xs:appinfo>
>         </xs:annotation>
>         <xs:restriction base="xs:string"/>
>       </xs:simpleType>
>     </xs:element>
>
> Is there a solution to this problem? If not, is there a workaround?
>
> /Roger