I think I understand this. I believe it is an interaction between the trimming of pad characters and your nil literals.
Try

    dfdl:nilValue="- %SP;- %SP;%SP;-"

i.e., remove any %SP; to the right of the hyphen.

My theory is that the string, e.g., "-  ", is getting trimmed of the spaces
on the right due to textTrimKind='padChar'. Hence "-  " becomes "-", which
doesn't match any of the nilValues.

The reason this works with %WSP*; is that the entity can match zero
characters. Hence it still matches even if the padding is trimmed away.

On Thu, Jan 4, 2024 at 9:58 AM Roger L Costello <coste...@mitre.org> wrote:

> Hi Mike,
>
> I created a simple DFDL schema which illustrates the problem with
> dfdl:nilValue="-%SP;%SP; %SP;-%SP; %SP;%SP;-".
>
> Here is the input (I checked, there are no tabs in the input):
>
> .../ABC/...
> .../-  /...
> .../ - /...
> .../  -/...
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
>            xmlns:xs="http://www.w3.org/2001/XMLSchema">
>
>   <xs:include
>       schemaLocation="../default-dfdl-properties/defaults.dfdl.xsd" />
>
>   <xs:annotation>
>     <xs:appinfo source="http://www.ogf.org/dfdl/">
>       <dfdl:format ref="default-dfdl-properties" />
>     </xs:appinfo>
>   </xs:annotation>
>
>   <xs:element name="Test">
>     <xs:complexType>
>       <xs:sequence dfdl:separator="%NL;"
>                    dfdl:separatorPosition="infix">
>         <xs:element name="Line" maxOccurs="unbounded">
>           <xs:complexType>
>             <xs:sequence dfdl:separator="/"
>                          dfdl:separatorPosition="infix">
>               <xs:element name="A" type="xs:string" />
>               <xs:element ref="Foo"/>
>               <xs:element name="B" type="xs:string" />
>             </xs:sequence>
>           </xs:complexType>
>         </xs:element>
>       </xs:sequence>
>     </xs:complexType>
>   </xs:element>
>
>   <xs:element name="Foo"
>       type="Foo_simpleType"
>       nillable="true"
>       dfdl:nilKind="literalValue"
>       dfdl:nilValue="-%SP;%SP; %SP;-%SP; %SP;%SP;-"
>       dfdl:lengthKind="explicit"
>       dfdl:length="3"
>       dfdl:textTrimKind="padChar"
>       dfdl:textPadKind="padChar"
>       dfdl:textStringPadCharacter="%SP;"
>       dfdl:textStringJustification="left"/>
>
>   <xs:simpleType name="Foo_simpleType">
>     <xs:restriction
>         base="validString">
>       <xs:pattern value="ABC|DEF|GHI" />
>     </xs:restriction>
>   </xs:simpleType>
>
>   <xs:simpleType name="validString">
>     <xs:annotation>
>       <xs:appinfo source="http://www.ogf.org/dfdl/">
>         <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
>       </xs:appinfo>
>     </xs:annotation>
>     <xs:restriction base="xs:string"/>
>   </xs:simpleType>
>
> </xs:schema>
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Thursday, January 4, 2024 8:53 AM
> *To:* users@daffodil.apache.org
> *Subject:* [EXT] Re: How to specify that the nilValue can occur anywhere
> within a fixed field?
>
> My guess is that one of the whitespace characters is a tab, not a space
> or two spaces. So your nilValue doesn't match. That causes a subsequent
> parse error, and it backtracks, and your schema then succeeds, without
> consuming all the data.
>
> Your schema likely could be improved by adding discriminators. That's a
> common need when the "left over data" issue is reported. Your schema is
> currently happy to successfully complete parsing, but not consuming all
> the data. If your schema is for a file format where there is a
> requirement that it consume all the data, then discriminators should
> ensure all the data is consumed or a parse error occurs.
>
> I have found this discriminator useful:
>
>   <dfdl:discriminator testKind="pattern" testPattern="[\s\S]"/>
>
> This is true if the regex matches the front of the data stream at that
> point, which means "there is at least one character/byte of anything at
> all", i.e., there is more data to be had.
>
> For example, suppose you have a file that is an array of records. If
> there is more data, it must be a record.
> Ending the array before all the data is consumed because attempting to
> parse another record fails is not acceptable. So putting this
> discriminator on that record array element decl ensures this. You will
> never get 'left over data' because the schema isn't allowed to succeed
> if there is data remaining.
>
> I like to wrap this discriminator in a group decl to make it
> self-documenting:
>
>   <group name="discriminator_hasAnyData">
>     <sequence>
>       <annotation><appinfo source="http://www.ogf.org/dfdl/">
>         <dfdl:discriminator testKind="pattern" testPattern="[\s\S]"/>
>       </appinfo></annotation>
>     </sequence>
>   </group>
>
> Then a group reference to this is a compact one-liner, not 5 or 7 lines
> of sequence and annotation.
>
> On Thu, Jan 4, 2024 at 7:51 AM Roger L Costello <coste...@mitre.org>
> wrote:
>
> Hi Mike,
>
> To allow a hyphen to occur anywhere within a 3-character field I
> specified this:
>
>   dfdl:nilValue="-%SP;%SP; %SP;-%SP; %SP;%SP;-"
>
> But that failed with the dreaded “Left over data” error message.
>
> Conversely, both of these succeeded:
>
>   dfdl:nilValue="%WSP*;-%WSP*;"
>   dfdl:nilValue="%WSP*;-"
>
> Why is that?
>
> /Roger
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Tuesday, January 2, 2024 11:58 AM
> *To:* users@daffodil.apache.org
> *Subject:* [EXT] Re: How to specify that the nilValue can occur anywhere
> within a fixed field?
>
> Tricky!
>
> For strings we typically justify left, meaning we trim padding
> characters on the right, i.e., textStringJustification="left".
>
> That means if your data is "-  " or " - ", then the spaces on the right
> side are trimmed away before comparison against the "%WSP*;-" nilValue
> is done.
>
> However, for numbers we typically justify right, meaning we trim on the
> left, i.e., textNumberJustification="right".
>
> In that case "-  " or " - " would not be trimmed on the right side, but
> on the left, leaving them with spaces after the hyphen, so "%WSP*;-"
> won't match them.
>
> So, the rationale for suggesting "%WSP*;-%WSP*;", i.e., with WSP* on
> both sides, is so that your nilValue matching conventions are
> insensitive to type and to whether you use text justification of left
> or right.
>
> On Fri, Dec 22, 2023 at 8:01 AM Roger L Costello <coste...@mitre.org>
> wrote:
>
> Hi Folks,
>
> I have a fixed-length field (3) that has hyphen as the nilValue. The
> hyphen can be positioned anywhere in the field, e.g.,
>
>   .../-  /...
>   .../ - /...
>   .../  -/...
>
> What is the right way to specify the nilValue? I specified it this way:
>
>   dfdl:nilValue="%WSP*;-"
>
> and it seems to work just fine.
>
> But I was told, “that only allows whitespace before the hyphen; it
> should be specified this way:”
>
>   dfdl:nilValue="%WSP*;-%WSP*;"
>
> What is the correct way?
>
> /Roger
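
For reference, the suggestion at the top of this thread can be sketched as a change to the Foo declaration from the quoted schema. This is an untested sketch that rests on the theory stated in the reply (right-side padding is trimmed before the nil comparison); it is not a confirmed fix. The fragment assumes it sits inside the same xs:schema, so the xs: and dfdl: prefixes and Foo_simpleType are already declared there.

```xml
<!-- Sketch only, untested. Same Foo declaration as in the quoted schema,
     with one change: the nilValue literals carry no %SP; to the right of
     the hyphen. Assumption: with textStringJustification="left" and
     textTrimKind="padChar", trailing spaces are trimmed before the nil
     comparison, so a 3-character field would be compared as one of:
       "-"          hyphen first, two trailing spaces trimmed
       "%SP;-"      hyphen in the middle, one trailing space trimmed
       "%SP;%SP;-"  hyphen last, nothing to trim -->
<xs:element name="Foo"
    type="Foo_simpleType"
    nillable="true"
    dfdl:nilKind="literalValue"
    dfdl:nilValue="- %SP;- %SP;%SP;-"
    dfdl:lengthKind="explicit"
    dfdl:length="3"
    dfdl:textTrimKind="padChar"
    dfdl:textPadKind="padChar"
    dfdl:textStringPadCharacter="%SP;"
    dfdl:textStringJustification="left"/>
```

The "%WSP*;-%WSP*;" form discussed earlier in the thread remains the more tolerant alternative, since %WSP*; can match zero characters and so does not depend on which side is trimmed.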