Re: "Doesn't that lengthPattern mean, 'The allowable values for this element are foo, bar, or dash'?"
No. The length of the match of the lengthPattern isolates the content region for this element in the data grammar. No match means length 0. That is, the dfdl:lengthPattern property is about determining the length of the representation of the element. It is only about the length.

The dfdl:lengthPattern is NOT, in general, a statement about the value. Coincidentally, if the type is string, the lengthPattern regex may happen to overlap with the logical values or literal nil values that the strings must contain. But the best way to think about lengthPattern is to ignore the value itself and use lookahead/lookbehind regex features to find out what must terminate the data, i.e., what must appear after it. That's the primary intended use case for lengthKind 'pattern': not to recognize valid allowed data, but to scan past it for things that indicate where it ends.

Determining length is a key concept in DFDL. You can do pretty much nothing until you determine length. You haven't isolated what data you are even talking about until length determination is over. Then you have to determine the difference between content and value regions within the data (typically due to padding), and then whether it is the nil, empty, or normal representation. Then, if it is the normal representation, you can start talking about what regex the value must match if it is a string (via the regular XSD pattern facet, which is about the string value, now isolated from the data stream), what calendar pattern it must match if it is a date/time, what boolean value it converts to by way of the textBooleanXYZZY properties, etc.

Determining length is also key to understanding the difference between "well formed" data and "valid" data. A string is well formed if it can be isolated properly from the data stream, i.e., we can determine which characters/bytes of the data stream *should* be the data and talk about how that data is invalid.
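
To make the two phases concrete, here is a minimal Python sketch (not DFDL, and not a description of how Daffodil is implemented). It assumes the foo/bar field from this thread, terminated by "//", with "-" as the literal nil value. The names content_region and classify are made up for illustration. A lookahead finds where the field ends; only after that is the isolated string checked against the allowed values.

```python
import re

# Length determination: scan lazily up to (not including) the "//" terminator.
# The lookahead states what must come AFTER the field; it says nothing about
# which values are allowed.
LENGTH_PATTERN = re.compile(r".*?(?=//)", re.DOTALL)

# Value validation: plays the role of an XSD pattern facet in this sketch.
VALUE_FACET = re.compile(r"foo|bar")

# Assumed literal nil value, as in the thread.
NIL_VALUE = "-"


def content_region(data: str) -> str:
    """Return the isolated field text; no match means length 0."""
    m = LENGTH_PATTERN.match(data)
    return m.group(0) if m else ""


def classify(content: str) -> str:
    """Decide nil / valid / invalid for an already-isolated field."""
    if content == NIL_VALUE:
        return "nil"
    return "valid" if VALUE_FACET.fullmatch(content) else "invalid"
```

With these definitions, the input "baz//" is isolated as "baz" and then classified as invalid, so it is well formed but not valid. A length pattern of "foo|bar" could not have isolated it at all.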
If we can't even figure out which characters/bytes of the data stream should be considered to be the element in question, that's what we mean by "malformed" data.

To get a good format description in DFDL, dfdl:lengthKind 'pattern' must be used carefully and minimally.

A format description language that handles textual data format description as a BNF grammar with interspersed regular expressions is a potentially useful concept. DFDL is *not* that language.

In DFDL, the lengthKind 'pattern' was added as a *hack* to cope with things we couldn't come up with any better way to handle. It is intended to be a last resort for formats that are otherwise impossible to model.

One example is USMTF, where "//" is a terminator, except that since the internet came around we must now allow content like "http://some.domain.foo/url/syntax", which contains a "//". Hence, lengthKind pattern can be used to end a field at a "//" that is not preceded by ":", using the look-ahead and negative look-behind regex features.

That's what lengthKind pattern is for. Not for recognizing allowed string values. XSD pattern facets are for recognizing allowed string values.

-mikeb

On Wed, Apr 27, 2022 at 6:10 PM Roger L Costello <coste...@mitre.org> wrote:

> Hi Steve,
>
> > dfdl:lengthPattern="foo|bar|-"
>
> That's really interesting. In my data format, the dash is to be used only
> to indicate there is no data available. Doesn't that lengthPattern mean,
> "The allowable values for this element are foo, bar, or dash"? If I use
> that lengthPattern, is there any reason to use nillable="true" and
> dfdl:nilValue="-"?
>
> /Roger
>
> -----Original Message-----
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Wednesday, April 27, 2022 3:04 PM
> To: users@daffodil.apache.org
> Subject: [EXT] Re: Bug in Daffodil?
>
> Your pattern length must include something that matches the nil content
> as well, otherwise Daffodil doesn't actually know how long your nil
> content is.
> So your pattern needs to look something like this:
>
> dfdl:lengthPattern="foo|bar|-"
>
> Additionally, because the "A" element could be nilled, you also need to
> update your assertion. This is because when an element is nilled it
> doesn't actually have a value, so accessing the value to compare it to
> the empty string will cause an SDE. Instead, your assertion wants to be
> something like this:
>
> <dfdl:assert test="{ fn:nilled(.) or . ne '' }"/>
>
> This asserts that either your element is nilled or its value is not the
> empty string.
>
> - Steve
>
> On 4/27/22 2:11 PM, Roger L Costello wrote:
> > Hi Folks,
> >
> > My input consists of one field terminated by //
> >
> > The value of the field is either foo or bar.
> >
> > Here is a sample input:
> >
> > foo//
> >
> > My DFDL schema works fine with that input.
> >
> > The field is nillable and the nilValue is a hyphen. Here is a valid input:
> >
> > -//
> >
> > My DFDL schema fails with that input.
> >
> > I specify the field using dfdl:lengthKind="pattern" and dfdl:lengthPattern="foo|bar"
> >
> > Below is my DFDL schema. Am I doing something wrong or is this a bug in
> > Daffodil? If so, is there a workaround?
> > /Roger
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
> >            xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
> >            elementFormDefault="qualified">
> >     <xs:annotation>
> >         <xs:appinfo source="http://www.ogf.org/dfdl/">
> >             <dfdl:format
> >                 alignment="1"
> >                 alignmentUnits="bytes"
> >                 emptyValueDelimiterPolicy="none"
> >                 encoding="ASCII"
> >                 encodingErrorPolicy="replace"
> >                 escapeSchemeRef=""
> >                 fillByte="%SP;"
> >                 floating="no"
> >                 ignoreCase = "yes"
> >                 initiatedContent="no"
> >                 initiator = ""
> >                 leadingSkip="0"
> >                 lengthKind = "delimited"
> >                 lengthUnits="characters"
> >                 nilKind="literalValue"
> >                 nilValue="-"
> >                 nilValueDelimiterPolicy="none"
> >                 occursCountKind="implicit"
> >                 outputNewLine="%CR;%LF;"
> >                 representation="text"
> >                 separator=""
> >                 separatorSuppressionPolicy="anyEmpty"
> >                 sequenceKind="ordered"
> >                 textBidi="no"
> >                 textPadKind="none"
> >                 textTrimKind="none"
> >                 trailingSkip="0"
> >                 truncateSpecifiedLengthString="no"
> >                 terminator = ""
> >                 textNumberRep="standard"
> >                 textStandardBase="10"
> >                 textStandardZeroRep="0"
> >                 textNumberRounding="pattern"
> >                 textStandardExponentRep="E"
> >                 textNumberCheckPolicy="strict"
> >             />
> >         </xs:appinfo>
> >     </xs:annotation>
> >
> >     <xs:element name="Test" dfdl:terminator="//">
> >         <xs:complexType>
> >             <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
> >                 <xs:element name="A" type="non-zero-length-string" nillable="true"
> >                             dfdl:lengthPattern="foo|bar" dfdl:nilValue="-" />
> >             </xs:sequence>
> >         </xs:complexType>
> >     </xs:element>
> >
> >     <xs:simpleType name="non-zero-length-string" dfdl:lengthKind="pattern">
> >         <xs:annotation>
> >             <xs:appinfo source="http://www.ogf.org/dfdl/">
> >                 <dfdl:assert test="{ . ne '' }"/>
> >             </xs:appinfo>
> >         </xs:annotation>
> >         <xs:restriction base="xs:string"/>
> >     </xs:simpleType>
> >
> > </xs:schema>
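
A footnote on the USMTF case mentioned at the top of this message: the look-ahead plus negative look-behind idea can be sketched in Python as below. The regex and the function name field_length are illustrative assumptions, not taken from any real USMTF schema. If a regex like this were placed in an XML attribute such as dfdl:lengthPattern, the "<" would have to be written as "&lt;".

```python
import re

# Lazily consume characters until the next "//" that is NOT preceded by ":".
# (?=...) is the look-ahead; (?<!:) is the negative look-behind.
END_OF_FIELD = re.compile(r".*?(?=(?<!:)//)", re.DOTALL)


def field_length(data: str) -> int:
    """Length of the field at the start of data; 0 when nothing matches."""
    m = END_OF_FIELD.match(data)
    return len(m.group(0)) if m else 0
```

For a field holding "http://some.domain.foo/url/syntax" followed by the "//" terminator, the "//" inside the URL is skipped because it follows ":", and the measured length covers the whole URL.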