Another thing that causes the dreaded “left over data” error message.

I have input containing this field:

TYPE:TEL

That is, the field is initiated by TYPE:

The field has a choice of values: either a string of 2-20 uppercase letters, or 
a string of 1-56 uppercase letters initiated by TYPE:

Here’s the DFDL schema I used:

<xs:choice dfdl:choiceLengthKind="implicit">
    <xs:element name="Identifier" type="non-zero-length-string"
                dfdl:lengthPattern="[A-Z]{2,20}"/>
    <xs:element name="Description" type="non-zero-length-string"
                dfdl:lengthPattern="[A-Z]{1,56}" dfdl:initiator="TYPE:"/>
</xs:choice>

With that choice and the above input, Daffodil doesn’t process the field and 
reports left over data. As best I can tell, Daffodil uses the first branch of 
the choice, notices that the regex doesn’t contain a colon, and then gives up. 
I think.

If I reverse the element declarations, then Daffodil successfully processes the 
input.
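
For reference, the reversed ordering would look like the sketch below. Only the 
order of the two branches changes; everything else is copied from the schema 
above. One observation that may be relevant: on the input TYPE:TEL, the pattern 
[A-Z]{2,20} can match the four letters TYPE by itself, so with the original 
ordering the Identifier branch may succeed without ever reaching the colon.

```xml
<xs:choice dfdl:choiceLengthKind="implicit">
    <!-- Initiated branch first: the initiator "TYPE:" has to be found
         before [A-Z]{1,56} is applied to what follows it -->
    <xs:element name="Description" type="non-zero-length-string"
                dfdl:lengthPattern="[A-Z]{1,56}" dfdl:initiator="TYPE:"/>
    <!-- Uninitiated branch second: tried only if the branch above fails -->
    <xs:element name="Identifier" type="non-zero-length-string"
                dfdl:lengthPattern="[A-Z]{2,20}"/>
</xs:choice>
```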

I guess that I really don’t understand why one works while the other doesn’t. 
Would you explain why Daffodil reports left over data with the first but not 
the second, please?

For completeness, here is the simpleType:

<xs:simpleType name="non-zero-length-string" dfdl:lengthKind="pattern">
    <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:assert test="{ fn:nilled(.) or . ne '' }"/>
        </xs:appinfo>
    </xs:annotation>
    <xs:restriction base="xs:string"/>
</xs:simpleType>

/Roger
From: Mike Beckerle <mbecke...@apache.org>
Sent: Tuesday, May 3, 2022 6:32 PM
To: users@daffodil.apache.org
Subject: [EXT] Re: Catalog the causes of the dreaded “left over data” error 
message

Here is a trick used in one schema I've seen:

<xs:group name="requireNoDataLeft">
  <xs:sequence>
    <xs:element name="data" type="tns:tIntField" dfdl:length="1" minOccurs="0"/>
    <xs:sequence>
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
          <dfdl:assert test="{ fn:not(fn:exists(data)) }"
                       message="Data found where none was expected." />
        </xs:appinfo>
      </xs:annotation>
    </xs:sequence>
  </xs:sequence>
</xs:group>

So a group reference to "requireNoDataLeft" states "There cannot be any more 
data available."

This mostly is for the case where there is a surrounding "box" of data such as 
an element with lengthKind 'explicit' and you expect the described contents to 
use up everything in that box.

So if your first choice branch ends with a group ref to "requireNoDataLeft" 
then it must consume all available data, and will fail (and backtrack the 
choice to the next one) if there is data available after it.
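
A minimal sketch of that arrangement might look like the following. The element 
names shortForm, longForm, digits, and year, as well as the patterns, are 
hypothetical and only illustrate where the group reference goes; the tns prefix 
is assumed to be the namespace in which requireNoDataLeft is declared.

```xml
<xs:choice>
  <!-- Shorter branch: listed first, but guarded by the group reference -->
  <xs:element name="shortForm">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="digits" type="xs:string"
                    dfdl:lengthKind="pattern" dfdl:lengthPattern="[0-9]{6}"/>
        <!-- The assert in this group fails if any data follows, which
             fails this branch and lets the choice try the next one -->
        <xs:group ref="tns:requireNoDataLeft"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!-- Longer branch: same prefix as shortForm plus more data -->
  <xs:element name="longForm">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="digits" type="xs:string"
                    dfdl:lengthKind="pattern" dfdl:lengthPattern="[0-9]{6}"/>
        <xs:element name="year" type="xs:string"
                    dfdl:lengthKind="pattern" dfdl:lengthPattern="[0-9]{4}"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:choice>
```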

On Tue, May 3, 2022 at 1:52 PM Roger L Costello <coste...@mitre.org> wrote:
The “left over data” error occurs when there is a choice where the first branch 
matches the same data as the second branch and the second branch matches a bit 
more. Input data that matches the second branch fails because the first branch 
parses the input and then stops and reports left over data. See example below.

Is there a workaround? (without manually shuffling the order of the branches in 
the choice)

<xs:choice>
    <xs:element name="MilitaryDayTime">
        <xs:complexType>
            <xs:sequence dfdl:separator="">
                <xs:element name="Day" type="non-zero-length-string"
                            dfdl:lengthPattern="[0-9]{2}"/>
                <xs:element name="HourTime" type="non-zero-length-string"
                            dfdl:lengthPattern="[0-9]{2}"/>
                <xs:element name="MinuteTime" type="non-zero-length-string"
                            dfdl:lengthPattern="[0-9]{2}"/>
                <xs:element name="TimeZone" type="non-zero-length-string"
                            dfdl:lengthPattern="..."/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:element name="DateTimeGroup">
        <xs:complexType>
            <xs:sequence dfdl:separator="">
                <xs:element name="Day" type="non-zero-length-string"
                            dfdl:lengthPattern="[0-9]{2}"/>
                <xs:element name="HourTime" type="non-zero-length-string"
                            dfdl:lengthPattern="[0-9]{2}"/>
                <xs:element name="MinuteTime" type="non-zero-length-string"
                            dfdl:lengthPattern="[0-9]{2}"/>
                <xs:element name="TimeZone" type="non-zero-length-string"
                            dfdl:lengthPattern="..."/>
                <xs:element name="MonthName" type="non-zero-length-string"
                            dfdl:lengthPattern="…"/>
                <xs:element name="Year" type="non-zero-length-string"
                            dfdl:lengthPattern="[0-9]{4}"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:choice>


From: Mike Beckerle <mbecke...@apache.org>
Sent: Monday, May 2, 2022 10:02 AM
To: users@daffodil.apache.org
Subject: [EXT] Re: Catalog the causes of the dreaded “left over data” error 
message

I first encountered left-over-data with a dead-simple file format. Just a top 
level element named "records" with a minOccurs="0" maxOccurs="unbounded" array 
of elements named "record".

Due to minOccurs="0" such a schema is very happy to "successfully" parse zero 
records, and tell you the entire file contents are "left over data".

I learned one often wants to have minOccurs="1" to force it to at least be 
successful on one record.
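
A rough sketch of such a schema, with minOccurs="1", could look like this. The 
type name tns:recordType is hypothetical, and dfdl:occursCountKind="implicit" is 
an assumption here: it is the setting under which minOccurs is taken into 
account while parsing rather than only during validation.

```xml
<xs:element name="records">
  <xs:complexType>
    <xs:sequence>
      <!-- minOccurs="1": a parse that finds zero records now fails
           instead of "succeeding" with the whole file left over.
           tns:recordType is a hypothetical type for one record. -->
      <xs:element name="record" type="tns:recordType"
                  minOccurs="1" maxOccurs="unbounded"
                  dfdl:occursCountKind="implicit"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```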



On Fri, Apr 15, 2022 at 9:48 AM Roger L Costello <coste...@mitre.org> wrote:
Hi Folks,

Have you encountered the “left over data” error message? If you’ve worked with 
Daffodil for more than 5 minutes, you undoubtedly have.

The problem with that error message is it gives you absolutely no clue what’s 
causing the problem.

Perhaps if we start cataloging the things that triggered the error message, 
then the Daffodil team will be able to provide better diagnostics. Here’s my 
contribution to said catalog.

-----------------------

In recent weeks I have encountered the dreaded “left over data” error message 
twice. After enormous effort I was able to figure out what the problems were in 
my DFDL schema. First I need to describe my DFDL schema.

My DFDL schema consists of a series of element declarations and within each 
element are declarations of subelements:

A
    A.1
    A.2
    …
B
    B.1
    B.2
    …
…

Each subelement is of type string and uses a regex to describe the subelement’s 
data (i.e., the subelements use dfdl:lengthKind="pattern" and 
dfdl:lengthPattern="regex").

The first time that I got the “left over data” error message, the cause was 
this bug in my DFDL schema: a dfdl:lengthPattern listed the regex alternatives 
in the wrong order (shortest to longest instead of longest to shortest). The 
error message said that Daffodil stopped consuming input at element G. The 
element actually containing the misordered regex was element G.2 (Daffodil 
stopped consuming input pretty near the problem).
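
To illustrate the ordering problem with a made-up case (the element names and 
the patterns AB and ABC are hypothetical, not taken from my schema): regex 
alternation tries the alternatives left to right and takes the first one that 
matches, so a shorter alternative listed first can hide a longer one.

```xml
<!-- Wrong order: on the input "ABC" the first alternative "AB" matches,
     so only two characters are consumed and "C" is left behind -->
<xs:element name="wrongOrder" type="xs:string"
            dfdl:lengthKind="pattern" dfdl:lengthPattern="AB|ABC"/>

<!-- Longest alternative first: on the input "ABC" all three characters
     are consumed; "AB" is still accepted when the input is only "AB" -->
<xs:element name="rightOrder" type="xs:string"
            dfdl:lengthKind="pattern" dfdl:lengthPattern="ABC|AB"/>
```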

After I fixed that bug I immediately got another “left over data” error at 
element J. After much more effort I found the bug: a regex erroneously had 
spaces in it. In this case, the error message said that Daffodil stopped 
consuming input at element J. The actual element containing the regex with 
spaces was element K.5 (Daffodil stopped consuming input pretty far from the 
problem)
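
As a made-up illustration of the stray-space problem (the element names and 
patterns below are hypothetical): a space inside a regex is a literal 
character, so the pattern only matches input that contains a space at that 
position.

```xml
<!-- Stray space between the two parts: matches "AB 12" but not "AB12" -->
<xs:element name="withSpace" type="xs:string"
            dfdl:lengthKind="pattern" dfdl:lengthPattern="[A-Z]{2} [0-9]{2}"/>

<!-- Space removed: matches "AB12" -->
<xs:element name="withoutSpace" type="xs:string"
            dfdl:lengthKind="pattern" dfdl:lengthPattern="[A-Z]{2}[0-9]{2}"/>
```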

/Roger
