Re: How to specify that the nilValue can occur anywhere within a fixed field?

Mike Beckerle Thu, 04 Jan 2024 08:41:05 -0800

The discriminator helps because the message now tells you exactly why it was
unable to consume all the data.


Without the discriminator, the check-constraints failed, and that caused it
to backtrack and *correctly* complete without consuming all the data.

You didn't intend this, but without the discriminator your schema means
that it is correct for it to do exactly that. Nothing in your schema
without the discriminator indicates anything is wrong.
If you were parsing data from a TCP network stream, this in fact may be the
behavior you want. In your case it's a file of data, and you want the
schema to mean that it must consume everything, so you need to add that
discriminator.

With the discriminator, now your schema means what you intended, which is
that ALL the data must parse against your Line structure.

The fact that Foo failed its checkConstraints(.) assertion means you now
know that one of these things is the case at Line 2:
(1) your facets on Foo are wrong (fix the schema facets)
(2) the test data is incorrect and doesn't match the facets (fix the test
data)
(3) the parsing that creates the value for Foo, which is then checked
against the facets, is wrong (fix the schema so that you get the right
value for Foo)

But it has to be one of those. You will know exactly which if you turn on
the trace feature (daffodil -t) and see what happens when it populates the
<Foo> element.
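
If the trace points to (3), the earlier messages in this thread suggest a
likely candidate, though that is an assumption until the trace confirms it.
With textStringJustification="left", the trailing spaces of "-  " are
trimmed before the nilValue comparison, leaving "-". That matches none of
the three fixed-position literals in your nilValue, so Foo is not nil, its
value is "-", and that fails the ABC|DEF|GHI pattern. A sketch of the Foo
declaration using the whitespace-insensitive nilValue discussed earlier,
with every other property copied unchanged from your schema:

    <!-- Sketch only: assumes case (3) is the cause. The single change
         from the posted schema is the dfdl:nilValue. -->
    <xs:element name="Foo"
                type="Foo_simpleType"
                nillable="true"
                dfdl:nilKind="literalValue"
                dfdl:nilValue="%WSP*;-%WSP*;"
                dfdl:lengthKind="explicit"
                dfdl:length="3"
                dfdl:textTrimKind="padChar"
                dfdl:textPadKind="padChar"
                dfdl:textStringPadCharacter="%SP;"
                dfdl:textStringJustification="left"/>

Keeping the discriminator in place while you try this means any remaining
mismatch still surfaces as a parse error rather than as left over data.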


On Thu, Jan 4, 2024 at 11:16 AM Roger L Costello <coste...@mitre.org> wrote:

> Hi Mike,
>
>
>
> Doing as you recommend:
>
>
>
>     <xs:element name="Test">
>         <xs:complexType>
>             <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition=
> "infix">
>                 <xs:element name="Line" maxOccurs="unbounded">
>                     <xs:complexType>
>                         <xs:sequence>
>                             <xs:group ref="discriminator_hasAnyData"/>
>                             <xs:sequence dfdl:separator="/"
> dfdl:separatorPosition="infix">
>                                 <xs:element name="A" type="xs:string" />
>                                 <xs:element ref="Foo"/>
>                                 <xs:element name="B" type="xs:string" />
>                             </xs:sequence>
>                         </xs:sequence>
>                     </xs:complexType>
>                 </xs:element>
>             </xs:sequence>
>         </xs:complexType>
>     </xs:element>
>
>
>
> I now get this error message:
>
>
>
> *Parse Error: Failed to populate Line[2]. Cause: Parse Error: Assertion
> failed: Assertion expression failed: { dfdl:checkConstraints(.) }*
>
> *Schema context: element reference {}Foo Location line 21 column 34 in
> test.dfdl.xsd*
>
>
>
> Commenting out the group ref:
>
>
>
>     <xs:element name="Test">
>         <xs:complexType>
>             <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition=
> "infix">
>                 <xs:element name="Line" maxOccurs="unbounded">
>                     <xs:complexType>
>                         <xs:sequence>
>                             <!--<xs:group
> ref="discriminator_hasAnyData"/>-->
>                             <xs:sequence dfdl:separator="/"
> dfdl:separatorPosition="infix">
>                                 <xs:element name="A" type="xs:string" />
>                                 <xs:element ref="Foo"/>
>                                 <xs:element name="B" type="xs:string" />
>                             </xs:sequence>
>                         </xs:sequence>
>                     </xs:complexType>
>                 </xs:element>
>             </xs:sequence>
>         </xs:complexType>
>     </xs:element>
>
>
>
> I get this error message:
>
>
>
> *Left over data. Consumed 88 bit(s) with at least 312 bit(s) remaining.*
>
>
>
> How has the discriminator approach helped?
>
>
>
> /Roger
>
>
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Thursday, January 4, 2024 10:25 AM
> *To:* users@daffodil.apache.org
> *Subject:* [EXT] Re: How to specify that the nilValue can occur anywhere
> within a fixed field?
>
>
>
> Right, so you can't put it inside a sequence that has a separator, because
> that will then require another separator.
>
> DFDL doesn't know that your group can never contain any syntax or
> elements. It assumes a group ref means another term so another separator.
>
>
>
> Here's where I would suggest putting it:
>
>
>
>                 <xs:element name="Line" maxOccurs="unbounded">
>                     <xs:complexType>
>
>                         <xs:sequence> <!-- new sequence without any
> separator -->
>
>                              <!-- In this format, if any data is
> available, then a line element exists! -->
>
>                              <xs:group ref="discriminator_hasAnyData"/>
>
>                              <!-- and that means this separated sequence
> must appear -->
>                             <xs:sequence dfdl:separator="/"
>  dfdl:separatorPosition="infix">
>                                 <xs:element name="A" type="xs:string" />
>                                 <xs:element ref="Foo"/>
>                                 <xs:element name="B" type="xs:string" />
>                             </xs:sequence>
>
>                       </xs:sequence><!-- end new sequence without any
> separator -->
>                     </xs:complexType>
>                 </xs:element>
>
>
>
> On Thu, Jan 4, 2024 at 10:11 AM Roger L Costello <coste...@mitre.org>
> wrote:
>
> Hi Mike,
>
>
>
> I am trying to get your discriminator suggestion working. I added it to
> the schema that is getting “left over data”. Where do I put the group ref?
> I tried it in several locations but wherever I put it, I got this error
> message:
>
>
>
> *Parse Error: Failed to populate Line[1]. Cause: Parse Error: Failed to
> parse infix separator. Cause: Parse Error: Separator '/' not found*
>
>
>
> Here’s my input:
>
>
>
> .../ABC/...
> .../-  /...
> .../ - /...
> .../  -/...
>
>
>
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
>            xmlns:xs="http://www.w3.org/2001/XMLSchema">
>
>     <xs:include schemaLocation=
> "../default-dfdl-properties/defaults.dfdl.xsd" />
>
>     <xs:annotation>
>         <xs:appinfo source="http://www.ogf.org/dfdl/">
>             <dfdl:format ref="default-dfdl-properties" />
>         </xs:appinfo>
>     </xs:annotation>
>
>     <xs:element name="Test">
>         <xs:complexType>
>             <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition=
> "infix">
>                 <xs:element name="Line" maxOccurs="unbounded">
>                     <xs:complexType>
>                         <xs:sequence dfdl:separator="/"
> dfdl:separatorPosition="infix">
>                             <xs:element name="A" type="xs:string" />
>                             <xs:group ref="discriminator_hasAnyData"/>
>                             <xs:element ref="Foo"/>
>                             <xs:element name="B" type="xs:string" />
>                         </xs:sequence>
>                     </xs:complexType>
>                 </xs:element>
>             </xs:sequence>
>         </xs:complexType>
>     </xs:element>
>
>     <xs:element name="Foo"
>                 type="Foo_simpleType"
>                 nillable="true"
>                 dfdl:nilKind="literalValue"
>                 dfdl:nilValue="-%SP;%SP; %SP;-%SP; %SP;%SP;-"
>                 dfdl:lengthKind="explicit"
>                 dfdl:length="3"
>                 dfdl:textTrimKind="padChar"
>                 dfdl:textPadKind="padChar"
>                 dfdl:textStringPadCharacter="%SP;"
>                 dfdl:textStringJustification="left"/>
>
>     <xs:simpleType name="Foo_simpleType">
>         <xs:restriction base="validString">
>             <xs:pattern value="ABC|DEF|GHI" />
>         </xs:restriction>
>     </xs:simpleType>
>
>     <xs:simpleType name="validString">
>         <xs:annotation>
>             <xs:appinfo source="http://www.ogf.org/dfdl/">
>                 <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
>             </xs:appinfo>
>         </xs:annotation>
>         <xs:restriction base="xs:string"/>
>     </xs:simpleType>
>
>     <xs:group name="discriminator_hasAnyData">
>         <xs:sequence>
>             <xs:annotation>
>                 <xs:appinfo source="http://www.ogf.org/dfdl/">
>                     <dfdl:discriminator testKind="pattern" testPattern=
> "[\s\S]"/>
>                 </xs:appinfo>
>            </xs:annotation>
>        </xs:sequence>
>     </xs:group>
>
> </xs:schema>
>
>
>
>
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Thursday, January 4, 2024 8:53 AM
> *To:* users@daffodil.apache.org
> *Subject:* [EXT] Re: How to specify that the nilValue can occur anywhere
> within a fixed field?
>
>
>
> My guess is that one of the whitespace characters is a tab, not a space or
> two spaces. So your nilValue doesn't match. That causes a subsequent parse
> error, and it backtracks, and your schema then succeeds, without consuming
> all the data.
>
>
>
> Your schema likely could be improved by adding discriminators. That's a
> common need when the "left over data" issue is reported. Your schema is
> currently happy to successfully complete parsing, but not consuming all the
> data. If your schema is for a file format where there is a requirement that
> it consume all the data, then discriminators should ensure all the data is
> consumed or a parse error occurs.
>
>
>
> I have found this discriminator useful:
>
>
>
> <dfdl:discriminator testKind="pattern" testPattern="[\s\S]"/>
>
>
>
> This is true if the regex matches the front of the data stream at that
> point, which means "there is at least one character/byte of anything at
> all", i.e., there is more data to be had.
>
>
>
> For example, suppose you have a file that is an array of records. If there
> is more data, it must be a record. Ending the array before all the data is
> consumed, just because an attempt to parse another record fails, is not
> acceptable. Putting this discriminator on that record array element
> declaration ensures this. You will never get 'left over data' because the
> schema isn't allowed to succeed if there is data remaining.
>
>
>
> I like to wrap this discriminator in a group declaration to make it
> self-documenting:
>
>
>
> <group name="discriminator_hasAnyData">
>
>   <sequence>
>
>       <annotation><appinfo source="http://www.ogf.org/dfdl/">
>
>           <dfdl:discriminator testKind="pattern" testPattern="[\s\S]"/>
>
>        </appinfo></annotation>
>
>    </sequence>
>
> </group>
>
>
>
> Then a group reference to this is a compact one-liner, not 5 or 7 lines of
> sequence and annotation.
>
>
>
>
>
> On Thu, Jan 4, 2024 at 7:51 AM Roger L Costello <coste...@mitre.org>
> wrote:
>
> Hi Mike,
>
>
>
> To allow a hyphen to occur anywhere within a 3-character field I specified
> this:
>
>
>
> dfdl:nilValue="-%SP;%SP; %SP;-%SP; %SP;%SP;-"
>
>
>
> But that failed with the dreaded “Left over data” error message.
>
>
>
> Conversely, both of these succeeded:
>
>
>
> dfdl:nilValue="%WSP*;-%WSP*;"
>
> dfdl:nilValue="%WSP*;-"
>
>
>
> Why is that?
>
>
>
> /Roger
>
>
>
>
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Tuesday, January 2, 2024 11:58 AM
> *To:* users@daffodil.apache.org
> *Subject:* [EXT] Re: How to specify that the nilValue can occur anywhere
> within a fixed field?
>
>
>
> Tricky!
>
>
>
> For strings we typically justify left, meaning we trim padding characters
> on the right, i.e., textStringJustification="left".
>
>
>
> That means if your data is "-  " or " - ", then the spaces on the right
> side are trimmed away before comparison against the "%WSP*;-" nilValue is
> done.
>
>
>
> However, for numbers we typically justify right, meaning we trim on the
> left, i.e., textNumberJustification="right".
>
>
>
> In that case "-  " or " - " would not be trimmed on the right side, but on
> the left, leaving them with spaces after the hyphen, so "%WSP*;-" won't
> match them.
>
>
>
> So, the rationale for suggesting "%WSP*;-%WSP*;", i.e., with WSP* on both
> sides, is that your nilValue matching is then insensitive to the type and
> to whether you use text justification of left or right.
>
>
>
>
>
> On Fri, Dec 22, 2023 at 8:01 AM Roger L Costello <coste...@mitre.org>
> wrote:
>
> Hi Folks,
>
>
>
> I have a fixed-length field (3) that has hyphen as the nilValue. The
> hyphen can be positioned anywhere in the field, e.g.,
>
>
>
> .../-  /...
>
> .../ - /...
>
> .../  -/...
>
>
>
> What is the right way to specify the nilValue? I specified it this way:
>
>
>
> dfdl:nilValue="%WSP*;-"
>
>
>
> and it seems to work just fine.
>
>
>
> But I was told that this "only allows whitespace before the hyphen" and
> that it should be specified this way:
>
>
>
> dfdl:nilValue="%WSP*;-%WSP*;"
>
>
>
> What is the correct way?
>
>
>
> /Roger
>
>
>
>
>
>
