100ff ... Huh?

Beckerle, Mike Tue, 25 Jun 2019 13:25:21 -0700

It is perhaps worth reviewing the Data Syntax Grammar for DFDL from Section 
9.2 of the spec.



A simple element such as your test1 which is a non-nillable integer has these 
grammar clauses:


SimpleNormalRep = LeftFraming PrefixLength SimpleContent RightFraming



LeftFraming = LeadingAlignment Initiator

RightFraming = Terminator TrailingAlignment



PrefixLength = SimpleContent | PrefixPrefixLength SimpleContent

PrefixPrefixLength = SimpleContent


SimpleContent =   LeftPadding [ NilLogicalValue | SimpleValue ]  RightPadOrFill


LeadingAlignment = LeadingSkip AlignmentFill

TrailingAlignment = TrailingSkip

RightPadOrFill = RightPadding | RightFill | RightPadding RightFill


If I substitute all of those into the SimpleNormalRep clause, simplifying by 
assuming the PrefixLength is zero bits (because you are not using 
lengthKind='prefixed') and dropping NilLogicalValue (because the element is 
not nillable), I get:


LeadingSkip AlignmentFill Initiator LeftPadding SimpleValue RightPadding 
RightFill Terminator TrailingSkip
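
For anyone who wants to check the substitution mechanically, here is a rough Python sketch. It is only an illustration, not anything from Daffodil or the spec: the GRAMMAR table and the expand function are made up for this example, and the table already bakes in the simplifications (empty PrefixLength, no NilLogicalValue, RightPadOrFill taken in its most general form).

```python
# Productions from Section 9.2 as quoted above, with these assumptions applied:
# - PrefixLength is empty (lengthKind is not 'prefixed')
# - NilLogicalValue is dropped (the element is not nillable)
# - RightPadOrFill uses its most general alternative: RightPadding RightFill
GRAMMAR = {
    "SimpleNormalRep": ["LeftFraming", "PrefixLength", "SimpleContent", "RightFraming"],
    "LeftFraming": ["LeadingAlignment", "Initiator"],
    "RightFraming": ["Terminator", "TrailingAlignment"],
    "PrefixLength": [],
    "SimpleContent": ["LeftPadding", "SimpleValue", "RightPadOrFill"],
    "LeadingAlignment": ["LeadingSkip", "AlignmentFill"],
    "TrailingAlignment": ["TrailingSkip"],
    "RightPadOrFill": ["RightPadding", "RightFill"],
}


def expand(symbol):
    """Recursively replace a nonterminal by its production; terminals pass through."""
    if symbol not in GRAMMAR:
        return [symbol]
    terminals = []
    for part in GRAMMAR[symbol]:
        terminals.extend(expand(part))
    return terminals


print(" ".join(expand("SimpleNormalRep")))
```

Running it prints the same nine terminals, in the same order, as the line above.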


Each of those grammar terminals has properties which control whether it has 
anything in it or is zero length.


Now I think we easily see your format doesn't have properties that would cause 
any of these framing regions:

LeadingSkip AlignmentFill Initiator Terminator TrailingSkip

to have any bits in them.


That leaves what we call the SimpleContent per the production above.

LeftPadding SimpleValue RightPadding RightFill


You have padding turned off, so we're down to

SimpleValue RightFill


Your SimpleValue for integer is controlled by the dfdl:textNumberPattern. 
Logical value 100 with pattern #,### will be "100"


So you have "100" followed by RightFill, which contains the fill byte repeated 
until the length of the element is reached. Your dfdl:fillByte is "f", so you 
get "100ff" because the length of the element is 5 (characters or bytes... not 
sure what your dfdl:lengthUnits are, but in ASCII encoding they're equivalent).
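
To make the arithmetic concrete, here is a minimal Python sketch of that last step. It is not how Daffodil implements unparsing; the function name and parameters are hypothetical, and it assumes a single-byte encoding such as ASCII, no padding, and zero-length framing regions.

```python
def unparse_fixed_length(simple_value, length, fill_byte="f"):
    """Return SimpleValue followed by RightFill for an explicit-length text element.

    Hypothetical helper: assumes one byte per character, padding turned off,
    and that all framing regions are zero length.
    """
    if len(simple_value) > length:
        raise ValueError("SimpleValue does not fit in the explicit length")
    # RightFill is the fill byte repeated until the element length is reached.
    right_fill = fill_byte * (length - len(simple_value))
    return simple_value + right_fill


print(unparse_fixed_length("100", 5))    # test1: pattern #,### gives "100"
print(unparse_fixed_length("0,100", 5))  # test2: pattern 0,000 gives "0,100"
```

With test1 the value is 3 characters, so 2 fill bytes are appended; with test2 the value already occupies all 5 characters, so RightFill is empty.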


That's a long-winded way to come to the same conclusion Steve L. did, but it 
provides a methodical way to narrow down why "ff" showed up there. It had to be 
RightPadding, RightFill, Terminator, or TrailingSkip, since those are the only 
things after the SimpleValue.

________________________________
From: Costello, Roger L. <[email protected]>
Sent: Tuesday, June 25, 2019 1:32:20 PM
To: [email protected]
Subject: 0,100 --> parse --> 100 --> 100ff ... Huh?


Hello DFDL community,



My input file has this:



0,100

0,100



My DFDL schema is this:



<xs:element name="input">
    <xs:complexType>
        <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
            <xs:element name="test1" type="xs:unsignedInt"
                dfdl:length="5" dfdl:lengthKind="explicit"
                dfdl:textNumberCheckPolicy="strict"
                dfdl:textNumberPattern="#,###" />
            <xs:element name="test2" type="xs:unsignedInt"
                dfdl:length="5" dfdl:lengthKind="explicit"
                dfdl:textNumberCheckPolicy="strict"
                dfdl:textNumberPattern="0,000" />
        </xs:sequence>
    </xs:complexType>
</xs:element>



The output of parsing is this:



<input>
  <test1>100</test1>
  <test2>100</test2>
</input>



The output of unparsing is this:



100ff
0,100



Huh?



Why am I getting 100ff?



I think the lesson learned is never use the pound (#) symbol in 
dfdl:textNumberPattern. Do you agree?



/Roger
