Thanks for the excellent explanation, Mike! I did a writeup of the problem and solution. If my writeup has any errors, please let me know. See below. /Roger
Problem Statement: A field in a data format has a fixed width. Let’s say the width is 10 characters and we name the field “Foo.” If no data is available for the field, it must be populated with a single hyphen, surrounded by spaces. The hyphen may be in any position within the field. If data is available, it must conform to this regular expression: [A-Z]{10}; that is, the data must consist of exactly 10 uppercase letters.

First, I show the wrong approach to this problem. Then I show the right approach.

We need to specify that the field is populated with a hyphen when no data is available; nillable and nilValue are used for this:

    nillable=true
    nilValue='-'

The field's content is specified using a regex:

    lengthKind=pattern
    lengthPattern=[A-Z]{10}

The regex in that lengthPattern doesn't consider the hyphen. So we update the regex to allow for the hyphen:

    lengthPattern=[A-Z]{10}|[ ]*-[ ]*

However, the right-hand side of that regex (which deals with the hyphen) doesn't constrain the length of the field. Recall the hyphen may be positioned anywhere within the 10-character field. Writing a regex that specifies all possible positions of the hyphen, while ensuring the field is 10 characters, is not reasonable. So we explicitly specify the length:

    lengthKind=explicit
    length=10

But now we have conflicting requirements:

1. lengthKind=pattern for the regex
2. lengthKind=explicit for the field length

That's a problem. It's not legal: an element cannot have two different lengthKind values.

Now for the correct solution.

First, discard nillable and nilValue; the allowed field values, including the hyphen, are specified by a regex.

Explicitly specify the length of the field:

    lengthKind=explicit
    length=10

When the field contains a hyphen, it is surrounded by spaces.
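To make the two points so far concrete, here is a small Python sketch. It is only an illustration, not DFDL and not how Daffodil works internally: the function names `loose_accepts` and `parse_foo` are made up for this example, and Python's `re.fullmatch` stands in for the regex matching. It shows that the alternation `[A-Z]{10}|[ ]*-[ ]*` accepts hyphen fields of any width, whereas reading exactly 10 characters first and then stripping the surrounding spaces leaves a value that a much simpler pattern can check.

```python
import re

FIELD_WIDTH = 10  # width of the "Foo" field used in the writeup

# The alternation tried in the lengthPattern approach.
LOOSE_PATTERN = re.compile(r"[A-Z]{10}|[ ]*-[ ]*")

# A simpler pattern that is sufficient once surrounding spaces are stripped.
TRIMMED_PATTERN = re.compile(r"[A-Z]{10}|-")


def loose_accepts(raw: str) -> bool:
    """Return True if the whole string matches the loose alternation.

    Illustrates that the hyphen branch puts no bound on the field width.
    """
    return LOOSE_PATTERN.fullmatch(raw) is not None


def parse_foo(raw: str):
    """Emulate a fixed-width read followed by a strip and a pattern check.

    Returns the stripped value when the field is valid, otherwise None.
    (Hypothetical helper; a real DFDL parser reports a parse error instead.)
    """
    # Step 1: the field must be exactly FIELD_WIDTH characters wide.
    if len(raw) != FIELD_WIDTH:
        return None
    # Step 2: remove the pad character (space) from both ends.
    value = raw.strip(" ")
    # Step 3: check the stripped value against the simple pattern.
    if TRIMMED_PATTERN.fullmatch(value) is None:
        return None
    return value
```

For example, `loose_accepts("  -  ")` is true even though that string is only 5 characters wide, while `parse_foo` rejects it on length and accepts a 10-character field with the hyphen in any position.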
Direct the parser to trim the surrounding spaces:

    textTrimKind=padChar
    textStringPadCharacter='%SP;'
    textStringJustification=center

Note that the parser ensures the field is 10 characters prior to performing the trim operation.

We no longer need to be concerned with the surrounding spaces, so the regex is simplified:

    [A-Z]{10}|-

As seen earlier, we cannot specify the regex using lengthKind=pattern, because lengthKind is already set to explicit. Instead, use the XSD pattern facet:

    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="[A-Z]{10}|-"/>
        </xs:restriction>
    </xs:simpleType>

Unless the parser is run in validation mode, XSD facets are not enforced. So force the parser to check the XSD pattern facet by preceding the simpleType with an annotation that contains a dfdl:assert calling dfdl:checkConstraints:

    <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:assert test="{ dfdl:checkConstraints(.) }"
                         message="Validation of Foo failed" />
        </xs:appinfo>
    </xs:annotation>
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:pattern value="[A-Z]{10}|-"/>
        </xs:restriction>
    </xs:simpleType>

Putting it all together, here is how to declare the Foo element:

    <xs:element name="Foo"
                dfdl:lengthKind="explicit"
                dfdl:length="10"
                dfdl:textTrimKind="padChar"
                dfdl:textPadKind="padChar"
                dfdl:textStringPadCharacter="%SP;"
                dfdl:textStringJustification="center">
        <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/">
                <dfdl:assert test="{ dfdl:checkConstraints(.) }"
                             message="Validation of Foo failed" />
            </xs:appinfo>
        </xs:annotation>
        <xs:simpleType>
            <xs:restriction base="xs:string">
                <xs:pattern value="[A-Z]{10}|-"/>
            </xs:restriction>
        </xs:simpleType>
    </xs:element>

From: Mike Beckerle <mbecke...@apache.org>
Sent: Thursday, July 28, 2022 4:36 PM
To: users@daffodil.apache.org
Subject: [EXT] Re: Conflicting requirements: a data format field needs both lengthKind="explicit" and lengthKind="pattern"

You said the length is 100, so that is what the lengthKind 'explicit' length should be.
What about using your regex but via a pattern facet?

    <element name="Foo" dfdl:lengthKind='explicit' dfdl:length='100'>
        <simpleType>
            <restriction base="xs:string">
                <pattern value="[A-Z]{100}|[ ]*-[ ]*"/>
            </restriction>
        </simpleType>
    </element>

You should also be able to trim spaces from this, so that you will get either 100 characters of A-Z or a single "-" character as the string's actual length. Note that in this case your regex is simpler. The two "[ ]*" are gone because the spaces will be trimmed from both ends of the string.

    <element name="Foo"
             dfdl:lengthKind='explicit'
             dfdl:length='100'
             dfdl:textTrimKind='padChar'
             dfdl:textStringPadCharacter='%SP;'
             dfdl:textPadKind='padChar'
             dfdl:textStringJustification="center">
        <simpleType>
            <restriction base="xs:string">
                <pattern value="[A-Z]{100}|-"/>
            </restriction>
        </simpleType>
    </element>

I did not run this DFDL, but this sort of thing is typical of fixed length data.

On Thu, Jul 28, 2022 at 8:52 AM Roger L Costello <coste...@mitre.org> wrote:

Hi Folks,

The text data format that I am writing a DFDL schema for has a field (let's name it "Foo") with a fixed width. Let's say the width is 100 characters. The content of the field is uppercase letters. If there is no data available to populate the field, it must be populated with a single hyphen (surrounded by spaces to ensure the field has a width of 100). The hyphen may be in any position within the field.

For reasons I will not share, I must specify the field's content using a regex:

    lengthKind=pattern
    lengthPattern=[A-Z]{100}

However, that lengthPattern doesn't take into account the hyphen that is needed when there is no data. So I updated the regex like this:

    lengthPattern=[A-Z]{100}|[ ]*-[ ]*

However, the right-hand side of that regex (which deals with the hyphen) doesn't constrain the length of the field. Recall the hyphen may be positioned anywhere within the 100-character field.
Writing a regex that specifies all possible positions of the hyphen, while ensuring the field is 100 characters, is not reasonable. So it would seem that I need to specify length=100 on the element declaration:

    lengthKind=explicit
    length=100

But now I have conflicting requirements:

1. The element declaration needs to specify lengthKind=pattern for the regex
2. The element declaration needs to specify lengthKind=explicit for the field length

That's a problem. That's not legal. In other words, I need this illegal DFDL:

    <xs:element name="Foo"
                nillable="true"
                dfdl:nilValue="-"
                dfdl:lengthKind="explicit"
                dfdl:length="100"
                dfdl:lengthUnits="characters"
                dfdl:lengthKind="pattern"
                dfdl:lengthPattern="[A-Z]{100}|[ ]*-[ ]*">
        <xs:simpleType>
            <xs:annotation>
                <xs:appinfo source="http://www.ogf.org/dfdl/">
                    <dfdl:assert test="{ (fn:nilled(.)) or (. ne '') }"/>
                </xs:appinfo>
            </xs:annotation>
            <xs:restriction base="xs:string"/>
        </xs:simpleType>
    </xs:element>

Is there a solution to this problem? If not, is there a workaround?

/Roger