Re: How to specify that the nilValue can occur anywhere within a fixed field?

Mike Beckerle Thu, 04 Jan 2024 08:24:54 -0800

Yeah, it's this kind of thing that makes me wish we left nillable out of
DFDL entirely. It causes nothing but a snarl of subtle interactions of
features.
But nil-related features existed in various data-description systems back
in the day, so we had to drag it in.


That said, you have pad/trim, nilValues, fixed length, and delimiters all
interacting here. It's got some real complexity.

You could consider modeling this sort of data representation as an optional
element, not using nillable at all.

ex:

<choice>
   <sequence dfdl:initiator="-%SP;%SP; %SP;-%SP; %SP;%SP;-"/>
   <element .... />
</choice>

With this model, if the hyphen indicator is present, the element doesn't
exist in the infoset at all.
This may be preferable.
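
A slightly fuller sketch of how that choice might replace the nillable Foo
element in the schema quoted below. This assumes the xs: prefix and
Foo_simpleType from that schema, and I haven't run it through Daffodil, so
treat it as an illustration. Unparsing behavior in particular would need
checking.

```xml
<xs:choice>
  <!-- Branch 1: one of the three 3-character hyphen markers.
       An empty sequence consumes the marker and adds nothing to the infoset. -->
  <xs:sequence dfdl:initiator="-%SP;%SP; %SP;-%SP; %SP;%SP;-"/>
  <!-- Branch 2: a real value. No nillable or nilValue properties needed. -->
  <xs:element name="Foo" type="Foo_simpleType"
              dfdl:lengthKind="explicit"
              dfdl:length="3"
              dfdl:textTrimKind="padChar"
              dfdl:textPadKind="padChar"
              dfdl:textStringPadCharacter="%SP;"
              dfdl:textStringJustification="left"/>
</xs:choice>
```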

On Thu, Jan 4, 2024 at 10:22 AM Roger L Costello <coste...@mitre.org> wrote:

>
>    - try dfdl:nilValue="- %SP;- %SP;%SP;-"
>
>
>
> That works.
>
>
>
> Ugh! That is awful.
>
>
>
> /Roger
>
>
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Thursday, January 4, 2024 10:15 AM
> *To:* users@daffodil.apache.org
> *Subject:* [EXT] Re: How to specify that the nilValue can occur anywhere
> within a fixed field?
>
>
>
> I think I understand this.
>
>
>
> I claim this is an interaction of trimming of pad chars with your nil
> literals.
>
>
>
> try dfdl:nilValue="- %SP;- %SP;%SP;-" i.e., remove any %SP; on the right
> of the hyphen.
>
>
>
> My theory is that the string, e.g., "-  ", is getting trimmed of the spaces
> on the right due to textTrimKind='padChar'. Hence "-  " becomes "-", which
> doesn't match any of the nilValues.
>
>
>
> The reason this works with %WSP*; is that the entity can match zero
> characters. Hence, it still matches even if the padding is trimmed away.
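>
> A sketch of the Foo declaration with that change applied, assuming the
> rest of the schema quoted below stays as it is. Only the nilValue differs
> from the original:
>
> ```xml
> <!-- Each literal is what remains after trailing pad spaces are trimmed:
>      "-  " becomes "-", " - " becomes " -", and "  -" is unchanged. -->
> <xs:element name="Foo"
>             type="Foo_simpleType"
>             nillable="true"
>             dfdl:nilKind="literalValue"
>             dfdl:nilValue="- %SP;- %SP;%SP;-"
>             dfdl:lengthKind="explicit"
>             dfdl:length="3"
>             dfdl:textTrimKind="padChar"
>             dfdl:textPadKind="padChar"
>             dfdl:textStringPadCharacter="%SP;"
>             dfdl:textStringJustification="left"/>
> ```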
>
>
>
> On Thu, Jan 4, 2024 at 9:58 AM Roger L Costello <coste...@mitre.org>
> wrote:
>
> Hi Mike,
>
>
>
> I created a simple DFDL schema which illustrates the problem with
> dfdl:nilValue="-%SP;%SP; %SP;-%SP; %SP;%SP;-".
>
>
>
> Here is the input (I checked, there are no tabs in the input):
>
>
>
> .../ABC/...
> .../-  /...
> .../ - /...
> .../  -/...
>
>
>
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
>            xmlns:xs="http://www.w3.org/2001/XMLSchema">
>
>     <xs:include schemaLocation=
> "../default-dfdl-properties/defaults.dfdl.xsd" />
>
>     <xs:annotation>
>         <xs:appinfo source="http://www.ogf.org/dfdl/">
>             <dfdl:format ref="default-dfdl-properties" />
>         </xs:appinfo>
>     </xs:annotation>
>
>     <xs:element name="Test">
>         <xs:complexType>
>             <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition=
> "infix">
>                 <xs:element name="Line" maxOccurs="unbounded">
>                     <xs:complexType>
>                         <xs:sequence dfdl:separator="/"
> dfdl:separatorPosition="infix">
>                             <xs:element name="A" type="xs:string" />
>                             <xs:element ref="Foo"/>
>                             <xs:element name="B" type="xs:string" />
>                         </xs:sequence>
>                     </xs:complexType>
>                 </xs:element>
>             </xs:sequence>
>         </xs:complexType>
>     </xs:element>
>
>     <xs:element name="Foo"
>                 type="Foo_simpleType"
>                 nillable="true"
>                 dfdl:nilKind="literalValue"
>                 dfdl:nilValue="-%SP;%SP; %SP;-%SP; %SP;%SP;-"
>                 dfdl:lengthKind="explicit"
>                 dfdl:length="3"
>                 dfdl:textTrimKind="padChar"
>                 dfdl:textPadKind="padChar"
>                 dfdl:textStringPadCharacter="%SP;"
>                 dfdl:textStringJustification="left"/>
>
>     <xs:simpleType name="Foo_simpleType">
>         <xs:restriction base="validString">
>             <xs:pattern value="ABC|DEF|GHI" />
>         </xs:restriction>
>     </xs:simpleType>
>
>     <xs:simpleType name="validString">
>         <xs:annotation>
>             <xs:appinfo source="http://www.ogf.org/dfdl/">
>                 <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
>             </xs:appinfo>
>         </xs:annotation>
>         <xs:restriction base="xs:string"/>
>     </xs:simpleType>
>
>
> </xs:schema>
>
>
>
>
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Thursday, January 4, 2024 8:53 AM
> *To:* users@daffodil.apache.org
> *Subject:* [EXT] Re: How to specify that the nilValue can occur anywhere
> within a fixed field?
>
>
>
> My guess is that one of the whitespace characters is a tab, not a space or
> two spaces. So your nilValue doesn't match. That causes a subsequent parse
> error, and it backtracks, and your schema then succeeds, without consuming
> all the data.
>
>
>
> Your schema likely could be improved by adding discriminators. That's a
> common need when the "left over data" issue is reported. Your schema is
> currently happy to successfully complete parsing, but not consuming all the
> data. If your schema is for a file format where there is a requirement that
> it consume all the data, then discriminators should ensure all the data is
> consumed or a parse error occurs.
>
>
>
> I have found this discriminator useful:
>
>
>
> <dfdl:discriminator testKind="pattern" testPattern="[\s\S]"/>
>
>
>
> This is true if the regex matches the front of the data stream at that
> point, which means "there is at least one character/byte of anything at
> all", i.e., there is more data to be had.
>
>
>
> For example, suppose you have a file that is an array of records. If there
> is more data, it must be a record. Ending the array before all the data is
> consumed, because attempting to parse another record fails, is not
> acceptable. Putting this discriminator on that record array element decl
> ensures this. You will never get 'left over data' because the schema isn't
> allowed to succeed if there is data remaining.
>
>
>
> I like to wrap this discriminator in a group decl to make it
> self-documenting:
>
>
>
> <group name="discriminator_hasAnyData">
>
>   <sequence>
>
>       <annotation><appinfo source="http://www.ogf.org/dfdl/">
>
>           <dfdl:discriminator testKind="pattern" testPattern="[\s\S]"/>
>
>        </appinfo></annotation>
>
>    </sequence>
>
> </group>
>
>
>
> Then a group reference to this is a compact one-liner, not 5 or 7 lines of
> sequence and annotation.
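>
> For instance, a record array using the group might look like the sketch
> below. The names "record" and "field1" are made up for illustration, and
> this assumes the xs: prefix is bound to the XML Schema namespace and the
> group lives in a schema with no target namespace (otherwise the ref needs
> a prefix):
>
> ```xml
> <xs:element name="record" maxOccurs="unbounded">
>   <xs:complexType>
>     <xs:sequence>
>       <!-- True only if at least one more character of data exists.
>            Once true, a failure later in this record is a parse error
>            rather than a quiet end of the array. -->
>       <xs:group ref="discriminator_hasAnyData"/>
>       <xs:element name="field1" type="xs:string"/>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
> ```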
>
>
>
>
>
> On Thu, Jan 4, 2024 at 7:51 AM Roger L Costello <coste...@mitre.org>
> wrote:
>
> Hi Mike,
>
>
>
> To allow a hyphen to occur anywhere within a 3-character field I specified
> this:
>
>
>
> dfdl:nilValue="-%SP;%SP; %SP;-%SP; %SP;%SP;-"
>
>
>
> But that failed with the dreaded “Left over data” error message.
>
>
>
> Conversely, both of these succeeded:
>
>
>
> dfdl:nilValue="%WSP*;-%WSP*;"
>
> dfdl:nilValue="%WSP*;-"
>
>
>
> Why is that?
>
>
>
> /Roger
>
>
>
>
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Tuesday, January 2, 2024 11:58 AM
> *To:* users@daffodil.apache.org
> *Subject:* [EXT] Re: How to specify that the nilValue can occur anywhere
> within a fixed field?
>
>
>
> Tricky!
>
>
>
> For strings we typically justify left, meaning we trim padding characters
> on the right, i.e., textStringJustification="left".
>
>
>
> That means if your data is "-  " or " - ", then the spaces on the right
> side are trimmed away before comparison against the "%WSP*;-" nilValue is
> done.
>
>
>
> However, for numbers we typically justify right, meaning we trim on the
> left, i.e., textNumberJustification="right".
>
>
>
> In that case "-  " or " - " would not be trimmed on the right side, but on
> the left, leaving them with spaces after the hyphen, so "%WSP*;-" won't
> match them.
>
>
>
> So, the rationale for suggesting "%WSP*;-%WSP*;", i.e., with WSP* on both
> sides, is so that your nilValue matching conventions are insensitive to
> type and to whether you use text justification of left or right.
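>
> To illustrate with a right-justified number field, here is a sketch. The
> element name "Count" is hypothetical, and it assumes the remaining text
> number properties come from the default format:
>
> ```xml
> <!-- With right justification, pad characters are trimmed on the left,
>      so "-  " is unchanged and " - " becomes "- ". The trailing %WSP*;
>      absorbs whatever spaces remain after the hyphen. -->
> <xs:element name="Count"
>             type="xs:int"
>             nillable="true"
>             dfdl:nilKind="literalValue"
>             dfdl:nilValue="%WSP*;-%WSP*;"
>             dfdl:lengthKind="explicit"
>             dfdl:length="3"
>             dfdl:textTrimKind="padChar"
>             dfdl:textPadKind="padChar"
>             dfdl:textNumberPadCharacter="%SP;"
>             dfdl:textNumberJustification="right"/>
> ```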
>
>
>
>
>
> On Fri, Dec 22, 2023 at 8:01 AM Roger L Costello <coste...@mitre.org>
> wrote:
>
> Hi Folks,
>
>
>
> I have a fixed-length field (3) that has hyphen as the nilValue. The
> hyphen can be positioned anywhere in the field, e.g.,
>
>
>
> .../-  /...
>
> .../ - /...
>
> .../  -/...
>
>
>
> What is the right way to specify the nilValue? I specified it this way:
>
>
>
> dfdl:nilValue="%WSP*;-"
>
>
>
> and it seems to work just fine.
>
>
>
> But I was told, “that only allows whitespace before the hyphen; it should
> be specified this way:”
>
>
>
> dfdl:nilValue="%WSP*;-%WSP*;"
>
>
>
> What is the correct way?
>
>
>
> /Roger
>
>
>
>
>
>
