Re: How to specify that the nilValue can occur anywhere within a fixed field?

Mike Beckerle Thu, 04 Jan 2024 07:15:11 -0800

I think I understand this.

I claim this is an interaction of trimming of pad chars with your nil
literals.


Try dfdl:nilValue="- %SP;- %SP;%SP;-", i.e., remove any %SP; on the right of
the hyphen.

My theory is that the string, e.g., "-  ", is getting trimmed of the spaces on
the right due to textTrimKind='padChar'. Hence "-  " becomes "-", which
doesn't match any of the nilValues.

The reason this works with %WSP*; is that the entity can match zero characters.
Hence it still matches even if the padding is trimmed away.
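
As a sketch of the suggestion above (untested, and assuming trimming really
does happen before the nil comparison, as theorized), here is the Foo
declaration from the quoted schema with only dfdl:nilValue changed:

```xml
<!-- Sketch only: Foo as in the quoted schema, with just dfdl:nilValue changed.
     Assumed behavior: with left justification, pad spaces are trimmed on the
     right before the nil comparison, so "-  " becomes "-", " - " becomes " -",
     and "  -" is unchanged. Each trimmed form is one of the three literals. -->
<xs:element name="Foo"
            type="Foo_simpleType"
            nillable="true"
            dfdl:nilKind="literalValue"
            dfdl:nilValue="- %SP;- %SP;%SP;-"
            dfdl:lengthKind="explicit"
            dfdl:length="3"
            dfdl:textTrimKind="padChar"
            dfdl:textPadKind="padChar"
            dfdl:textStringPadCharacter="%SP;"
            dfdl:textStringJustification="left"/>
```

The three literals are "-", "%SP;-", and "%SP;%SP;-", one per possible hyphen
position after the right-hand padding is gone.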

On Thu, Jan 4, 2024 at 9:58 AM Roger L Costello <coste...@mitre.org> wrote:

> Hi Mike,
>
>
>
> I created a simple DFDL schema which illustrates the problem with
> dfdl:nilValue="-%SP;%SP; %SP;-%SP; %SP;%SP;-".
>
>
>
> Here is the input (I checked, there are no tabs in the input):
>
>
>
> .../ABC/...
> .../-  /...
> .../ - /...
> .../  -/...
>
>
>
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
>            xmlns:xs="http://www.w3.org/2001/XMLSchema">
>
>     <xs:include schemaLocation=
> "../default-dfdl-properties/defaults.dfdl.xsd" />
>
>     <xs:annotation>
>         <xs:appinfo source="http://www.ogf.org/dfdl/">
>             <dfdl:format ref="default-dfdl-properties" />
>         </xs:appinfo>
>     </xs:annotation>
>
>     <xs:element name="Test">
>         <xs:complexType>
>             <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition=
> "infix">
>                 <xs:element name="Line" maxOccurs="unbounded">
>                     <xs:complexType>
>                         <xs:sequence dfdl:separator="/"
> dfdl:separatorPosition="infix">
>                             <xs:element name="A" type="xs:string" />
>                             <xs:element ref="Foo"/>
>                             <xs:element name="B" type="xs:string" />
>                         </xs:sequence>
>                     </xs:complexType>
>                 </xs:element>
>             </xs:sequence>
>         </xs:complexType>
>     </xs:element>
>
>     <xs:element name="Foo"
>                 type="Foo_simpleType"
>                 nillable="true"
>                 dfdl:nilKind="literalValue"
>                 dfdl:nilValue="-%SP;%SP; %SP;-%SP; %SP;%SP;-"
>                 dfdl:lengthKind="explicit"
>                 dfdl:length="3"
>                 dfdl:textTrimKind="padChar"
>                 dfdl:textPadKind="padChar"
>                 dfdl:textStringPadCharacter="%SP;"
>                 dfdl:textStringJustification="left"/>
>
>     <xs:simpleType name="Foo_simpleType">
>         <xs:restriction base="validString">
>             <xs:pattern value="ABC|DEF|GHI" />
>         </xs:restriction>
>     </xs:simpleType>
>
>     <xs:simpleType name="validString">
>         <xs:annotation>
>             <xs:appinfo source="http://www.ogf.org/dfdl/">
>                 <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
>             </xs:appinfo>
>         </xs:annotation>
>         <xs:restriction base="xs:string"/>
>     </xs:simpleType>
>
>
> </xs:schema>
>
>
>
>
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Thursday, January 4, 2024 8:53 AM
> *To:* users@daffodil.apache.org
> *Subject:* [EXT] Re: How to specify that the nilValue can occur anywhere
> within a fixed field?
>
>
>
> My guess is that one of the whitespace characters is a tab, not a space or
> two spaces. So your nilValue doesn't match. That causes a subsequent parse
> error, and it backtracks, and your schema then succeeds, without consuming
> all the data.
>
>
>
> Your schema likely could be improved by adding discriminators. That's a
> common need when the "left over data" issue is reported. Your schema is
> currently happy to complete parsing successfully without consuming all the
> data. If your schema is for a file format where there is a requirement that
> it consume all the data, then discriminators should ensure that either all
> the data is consumed or a parse error occurs.
>
>
>
> I have found this discriminator useful:
>
>
>
> <dfdl:discriminator testKind="pattern" testPattern="[\s\S]"/>
>
>
>
> This is true if the regex matches the front of the data stream at that
> point, which means "there is at least one character/byte of anything at
> all", i.e., there is more data to be had.
>
>
>
> For example, suppose you have a file that is an array of records. If there is
> more data, it must be a record. Ending the array before all the data is
> consumed, because attempting to parse another record fails, is not
> acceptable. Putting this discriminator on that record array element decl
> ensures this. You will never get 'left over data' because the schema isn't
> allowed to succeed if there is data remaining.
>
>
>
> I like to wrap this discriminator in a group decl to make it
> self-documenting:
>
>
>
> <group name="discriminator_hasAnyData">
>
>   <sequence>
>
>       <annotation><appinfo source="http://www.ogf.org/dfdl/">
>
>           <dfdl:discriminator testKind="pattern" testPattern="[\s\S]"/>
>
>        </appinfo></annotation>
>
>    </sequence>
>
> </group>
>
>
>
> Then a group reference to this is a compact one-liner, not 5 or 7 lines of
> sequence and annotation.
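>
> A rough sketch of that one-liner in use follows. The element names
> "Records", "Record", and "field1" are hypothetical, and the xs: prefix is
> assumed to be bound as in Roger's schema, with the group declared in the
> same schema:
>
> ```xml
> <xs:element name="Records">
>   <xs:complexType>
>     <xs:sequence>
>       <xs:element name="Record" maxOccurs="unbounded">
>         <xs:complexType>
>           <xs:sequence>
>             <!-- If any data remains, commit to parsing another Record. -->
>             <xs:group ref="discriminator_hasAnyData"/>
>             <xs:element name="field1" type="xs:string"/>
>           </xs:sequence>
>         </xs:complexType>
>       </xs:element>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
> ```
>
> The intent is that once the discriminator succeeds, a failure later in that
> Record is reported as a parse error instead of quietly ending the array.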
>
>
>
>
>
> On Thu, Jan 4, 2024 at 7:51 AM Roger L Costello <coste...@mitre.org>
> wrote:
>
> Hi Mike,
>
>
>
> To allow a hyphen to occur anywhere within a 3-character field I specified
> this:
>
>
>
> dfdl:nilValue="-%SP;%SP; %SP;-%SP; %SP;%SP;-"
>
>
>
> But that failed with the dreaded “Left over data” error message.
>
>
>
> Conversely, both of these succeeded:
>
>
>
> dfdl:nilValue="%WSP*;-%WSP*;"
>
> dfdl:nilValue="%WSP*;-"
>
>
>
> Why is that?
>
>
>
> /Roger
>
>
>
>
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Tuesday, January 2, 2024 11:58 AM
> *To:* users@daffodil.apache.org
> *Subject:* [EXT] Re: How to specify that the nilValue can occur anywhere
> within a fixed field?
>
>
>
> Tricky!
>
>
>
> For strings we typically justify left, meaning we trim padding characters
> on the right, i.e., textStringJustification="left".
>
>
>
> That means if your data is "-  " or " - ", then the spaces on the right
> side are trimmed away before comparison against the "%WSP*;-" nilValue is
> done.
>
>
>
> However, for numbers we typically justify right, meaning we trim on the
> left, i.e., textNumberJustification="right".
>
>
>
> In that case "-  " or " - " would not be trimmed on the right side, but on
> the left, leaving them with spaces after the hyphen, so "%WSP*;-" won't
> match them.
>
>
>
> So, the rationale for suggesting "%WSP*;-%WSP*;", i.e., with WSP* on both
> sides, is so that your nilValue matching conventions are insensitive to
> type and to whether you use text justification of left or right.
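>
> To illustrate, here is a minimal sketch of a right-justified string field
> that would still match. The element name "Field" is hypothetical, and
> properties not shown are assumed to come from a default format:
>
> ```xml
> <!-- Sketch only. Assumed behavior: with right justification, pad spaces are
>      trimmed on the left, so "-  " stays "-  ", " - " becomes "- ", and
>      "  -" becomes "-". %WSP*; on both sides matches all three results. -->
> <xs:element name="Field"
>             type="xs:string"
>             nillable="true"
>             dfdl:nilKind="literalValue"
>             dfdl:nilValue="%WSP*;-%WSP*;"
>             dfdl:lengthKind="explicit"
>             dfdl:length="3"
>             dfdl:textTrimKind="padChar"
>             dfdl:textPadKind="padChar"
>             dfdl:textStringPadCharacter="%SP;"
>             dfdl:textStringJustification="right"/>
> ```
>
> Switching the justification back to "left" should not require any change to
> the nilValue.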
>
>
>
>
>
> On Fri, Dec 22, 2023 at 8:01 AM Roger L Costello <coste...@mitre.org>
> wrote:
>
> Hi Folks,
>
>
>
> I have a fixed-length field (3) that has hyphen as the nilValue. The
> hyphen can be positioned anywhere in the field, e.g.,
>
>
>
> .../-  /...
>
> .../ - /...
>
> .../  -/...
>
>
>
> What is the right way to specify the nilValue? I specified it this way:
>
>
>
> dfdl:nilValue="%WSP*;-"
>
>
>
> and it seems to work just fine.
>
>
>
> But I was told that this only allows whitespace before the hyphen, and that
> it should be specified this way:
>
>
>
> dfdl:nilValue="%WSP*;-%WSP*;"
>
>
>
> What is the correct way?
>
>
>
> /Roger
>
>
>
>
>
>
