Re: How to specify that the nilValue can occur anywhere within a fixed field?

Roger L Costello Thu, 04 Jan 2024 08:15:41 -0800

Hi Mike,

Doing as you recommend:


    <xs:element name="Test">
        <xs:complexType>
            <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
                <xs:element name="Line" maxOccurs="unbounded">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:group ref="discriminator_hasAnyData"/>
                            <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
                                <xs:element name="A" type="xs:string" />
                                <xs:element ref="Foo"/>
                                <xs:element name="B" type="xs:string" />
                            </xs:sequence>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

I now get this error message:

Parse Error: Failed to populate Line[2]. Cause: Parse Error: Assertion failed: 
Assertion expression failed: { dfdl:checkConstraints(.) }
Schema context: element reference {}Foo Location line 21 column 34 in 
test.dfdl.xsd

Commenting out the group ref:

    <xs:element name="Test">
        <xs:complexType>
            <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
                <xs:element name="Line" maxOccurs="unbounded">
                    <xs:complexType>
                        <xs:sequence>
                            <!--<xs:group ref="discriminator_hasAnyData"/>-->
                            <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
                                <xs:element name="A" type="xs:string" />
                                <xs:element ref="Foo"/>
                                <xs:element name="B" type="xs:string" />
                            </xs:sequence>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

I get this error message:

Left over data. Consumed 88 bit(s) with at least 312 bit(s) remaining.

How has the discriminator approach helped?

/Roger

From: Mike Beckerle <mbecke...@apache.org>
Sent: Thursday, January 4, 2024 10:25 AM
To: users@daffodil.apache.org
Subject: [EXT] Re: How to specify that the nilValue can occur anywhere within a fixed field?

Right, so you can't put it inside a sequence that has a separator, because that 
will then require another separator.
DFDL doesn't know that your group can never contain any syntax or elements. It 
assumes a group ref means another term so another separator.

Here's where I would suggest putting it:

                <xs:element name="Line" maxOccurs="unbounded">
                    <xs:complexType>
                        <xs:sequence> <!-- new sequence without any separator -->
                            <!-- In this format, if any data is available, then a line element exists! -->
                            <xs:group ref="discriminator_hasAnyData"/>
                            <!-- and that means this separated sequence must appear -->
                            <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
                                <xs:element name="A" type="xs:string" />
                                <xs:element ref="Foo"/>
                                <xs:element name="B" type="xs:string" />
                            </xs:sequence>
                        </xs:sequence> <!-- end new sequence without any separator -->
                    </xs:complexType>
                </xs:element>

On Thu, Jan 4, 2024 at 10:11 AM Roger L Costello <coste...@mitre.org> wrote:
Hi Mike,

I am trying to get your discriminator suggestion working. I added it to the 
schema that is getting “left over data”. Where do I put the group ref? I tried 
it in several locations but wherever I put it, I got this error message:

Parse Error: Failed to populate Line[1]. Cause: Parse Error: Failed to parse 
infix separator. Cause: Parse Error: Separator '/' not found

Here’s my input:

.../ABC/...
.../-  /...
.../ - /...
.../  -/...


<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:include schemaLocation="../default-dfdl-properties/defaults.dfdl.xsd" />

    <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:format ref="default-dfdl-properties" />
        </xs:appinfo>
    </xs:annotation>

    <xs:element name="Test">
        <xs:complexType>
            <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
                <xs:element name="Line" maxOccurs="unbounded">
                    <xs:complexType>
                        <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
                            <xs:element name="A" type="xs:string" />
                            <xs:group ref="discriminator_hasAnyData"/>
                            <xs:element ref="Foo"/>
                            <xs:element name="B" type="xs:string" />
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

    <xs:element name="Foo"
                type="Foo_simpleType"
                nillable="true"
                dfdl:nilKind="literalValue"
                dfdl:nilValue="-%SP;%SP; %SP;-%SP; %SP;%SP;-"
                dfdl:lengthKind="explicit"
                dfdl:length="3"
                dfdl:textTrimKind="padChar"
                dfdl:textPadKind="padChar"
                dfdl:textStringPadCharacter="%SP;"
                dfdl:textStringJustification="left"/>

    <xs:simpleType name="Foo_simpleType">
        <xs:restriction base="validString">
            <xs:pattern value="ABC|DEF|GHI" />
        </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="validString">
        <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/">
                <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
            </xs:appinfo>
        </xs:annotation>
        <xs:restriction base="xs:string"/>
    </xs:simpleType>

    <xs:group name="discriminator_hasAnyData">
        <xs:sequence>
            <xs:annotation>
                <xs:appinfo source="http://www.ogf.org/dfdl/">
                    <dfdl:discriminator testKind="pattern" testPattern="[\s\S]"/>
                </xs:appinfo>
            </xs:annotation>
        </xs:sequence>
    </xs:group>

</xs:schema>


From: Mike Beckerle <mbecke...@apache.org>
Sent: Thursday, January 4, 2024 8:53 AM
To: users@daffodil.apache.org
Subject: [EXT] Re: How to specify that the nilValue can occur anywhere within a fixed field?

My guess is that one of the whitespace characters is a tab, not a space or two 
spaces. So your nilValue doesn't match. That causes a subsequent parse error, 
and it backtracks, and your schema then succeeds, without consuming all the 
data.

Your schema likely could be improved by adding discriminators. That's a common 
need when the "left over data" issue is reported. Your schema is currently 
happy to successfully complete parsing, but not consuming all the data. If your 
schema is for a file format where there is a requirement that it consume all 
the data, then discriminators should ensure all the data is consumed or a parse 
error occurs.

I have found this discriminator useful:

<dfdl:discriminator testKind="pattern" testPattern="[\s\S]"/>

This is true if the regex matches the front of the data stream at that point,
which means "there is at least one character/byte of anything at all", i.e.,
there is more data to be had.

For example, suppose you have a file that is an array of records. If there is
more data, it must be a record. Ending the array before all the data is
consumed, just because an attempt to parse another record failed, is not
acceptable. Putting this discriminator on that record array element decl
ensures this. You will never get 'left over data', because the schema isn't
allowed to succeed if there is data remaining.
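A minimal sketch of that placement, assuming a hypothetical newline-separated file of records (the names File, Record, Field1 and Field2 are illustrative only) and assuming the other required DFDL properties come from an included default format. As I understand the DFDL rules, a pattern discriminator is evaluated before the element it annotates is parsed, so each Record occurrence is committed as soon as any data remains:

```xml
<!-- Hypothetical names; remaining DFDL properties assumed from a default format -->
<xs:element name="File">
  <xs:complexType>
    <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
      <xs:element name="Record" maxOccurs="unbounded">
        <xs:annotation>
          <xs:appinfo source="http://www.ogf.org/dfdl/">
            <!-- if at least one character remains, this Record occurrence
                 must parse; a failure inside it is no longer treated as
                 "end of array" -->
            <dfdl:discriminator testKind="pattern" testPattern="[\s\S]"/>
          </xs:appinfo>
        </xs:annotation>
        <xs:complexType>
          <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
            <xs:element name="Field1" type="xs:string"/>
            <xs:element name="Field2" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

With this in place, a malformed record should produce a parse error pointing at that record instead of ending the array early and reporting left over data.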

I like to wrap this discriminator in a group decl to make it self documenting:

<group name="discriminator_hasAnyData">
  <sequence>
    <annotation><appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:discriminator testKind="pattern" testPattern="[\s\S]"/>
    </appinfo></annotation>
  </sequence>
</group>

Then a group reference to this is a compact one-liner, not 5 or 7 lines of 
sequence and annotation.


On Thu, Jan 4, 2024 at 7:51 AM Roger L Costello <coste...@mitre.org> wrote:
Hi Mike,

To allow a hyphen to occur anywhere within a 3-character field I specified this:

dfdl:nilValue="-%SP;%SP; %SP;-%SP; %SP;%SP;-"

But that failed with the dreaded “Left over data” error message.

Conversely, both of these succeeded:

dfdl:nilValue="%WSP*;-%WSP*;"
dfdl:nilValue="%WSP*;-"

Why is that?

/Roger
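One possible explanation, offered as a hedged reading rather than a diagnosis: per the trimming rule in the Jan 2 reply below, with dfdl:textTrimKind="padChar" and dfdl:textStringJustification="left" the trailing pad spaces are removed before the nil comparison. Then "-  " arrives as "-" and " - " as " -", and only "  -" still matches one of the three literals above. If that is what happens, an explicit list would need to enumerate the trimmed forms, as in this sketch of the Foo declaration (unverified against Daffodil):

```xml
<!-- Sketch only: the nilValue list holds the field contents as they look
     AFTER left-justified trimming has removed trailing spaces:
     "-", " -", and "  -" -->
<xs:element name="Foo"
            type="Foo_simpleType"
            nillable="true"
            dfdl:nilKind="literalValue"
            dfdl:nilValue="- %SP;- %SP;%SP;-"
            dfdl:lengthKind="explicit"
            dfdl:length="3"
            dfdl:textTrimKind="padChar"
            dfdl:textPadKind="padChar"
            dfdl:textStringPadCharacter="%SP;"
            dfdl:textStringJustification="left"/>
```

The nilValue "%WSP*;-%WSP*;" covers the same cases more compactly and also tolerates a tab on either side of the hyphen.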


From: Mike Beckerle <mbecke...@apache.org>
Sent: Tuesday, January 2, 2024 11:58 AM
To: users@daffodil.apache.org
Subject: [EXT] Re: How to specify that the nilValue can occur anywhere within a fixed field?

Tricky!

For strings we typically justify left, meaning we trim padding characters on 
the right, i.e., textStringJustification="left".

That means if your data is "-  " or " - ", then the spaces on the right side 
are trimmed away before comparison against the "%WSP*;-" nilValue is done.

However, for numbers we typically justify right, meaning we trim on the left,
i.e., textNumberJustification="right".

In that case "-  " or " - " would not be trimmed on the right side, but on the 
left, leaving them with spaces after the hyphen, so "%WSP*;-" won't match them.

So, the rationale for suggesting "%WSP*;-%WSP*;", i.e., with WSP* on both sides,
is so that your nilValue matching conventions are insensitive to type and to
whether you use text justification of left or right.
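To illustrate, here is a hedged sketch of the same nilValue on a left-justified string field and on a right-justified number field. The element names are hypothetical, and the other required properties (text number representation, nil value delimiter policy, and so on) are assumed to come from an included default format. Whichever side the padding is trimmed from, any whitespace left over is absorbed by one of the two %WSP*; entries:

```xml
<!-- Hypothetical element names; other required properties assumed
     to come from an included default format -->

<!-- String: left-justified, so pad spaces are trimmed on the right.
     " - " becomes " -", which the leading %WSP*; absorbs. -->
<xs:element name="StrField" type="xs:string" nillable="true"
            dfdl:nilKind="literalValue"
            dfdl:nilValue="%WSP*;-%WSP*;"
            dfdl:lengthKind="explicit" dfdl:length="3"
            dfdl:textTrimKind="padChar"
            dfdl:textPadKind="padChar"
            dfdl:textStringPadCharacter="%SP;"
            dfdl:textStringJustification="left"/>

<!-- Number: right-justified, so pad spaces are trimmed on the left.
     " - " becomes "- ", which the trailing %WSP*; absorbs. -->
<xs:element name="NumField" type="xs:int" nillable="true"
            dfdl:nilKind="literalValue"
            dfdl:nilValue="%WSP*;-%WSP*;"
            dfdl:lengthKind="explicit" dfdl:length="3"
            dfdl:textTrimKind="padChar"
            dfdl:textPadKind="padChar"
            dfdl:textNumberPadCharacter="%SP;"
            dfdl:textNumberJustification="right"/>
```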


On Fri, Dec 22, 2023 at 8:01 AM Roger L Costello <coste...@mitre.org> wrote:

Hi Folks,

I have a fixed-length field (3) that has a hyphen as the nilValue. The hyphen
can be positioned anywhere in the field, e.g.,

.../-  /...
.../ - /...
.../  -/...

What is the right way to specify the nilValue? I specified it this way:

dfdl:nilValue="%WSP*;-"

and it seems to work just fine.

But I was told, “that only allows whitespace before the hyphen; it should be
specified this way:”

dfdl:nilValue="%WSP*;-%WSP*;"

What is the correct way?

/Roger
