The DFDL schema below works fine. However, if I remove 
dfdl:textStandardDecimalSeparator, I get an error. Why? The 
dfdl:textNumberPattern does not contain "a decimal separator symbol ("."), or 
the E or @ symbols", so I shouldn't need it, right?  /Roger


<xs:element name="input">
    <xs:complexType>
        <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
            <xs:element name="Country" type="xs:string" />
            <xs:element name="NumberOfStudents" type="xs:integer"
                                dfdl:textNumberPattern="###,###;-#"
                                dfdl:textNumberRep="standard"
                                dfdl:textNumberCheckPolicy="strict"
                                dfdl:textStandardDecimalSeparator="{
                                    if (../Country eq 'US') then '.'
                                    else if (../Country eq 'FR') then ','
                                    else if (../Country eq 'UK') then '.'
                                    else '.'
                                }"
                                dfdl:textStandardGroupingSeparator="{
                                    if (../Country eq 'US') then ','
                                    else if (../Country eq 'FR') then '.'
                                    else if (../Country eq 'UK') then '%SP;'
                                    else ','
                                }"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>
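
A possible reading of the error, going by Steve Lawrence's earlier reply quoted further down in this thread: Daffodil (as of August 2019) also asked for dfdl:textStandardDecimalSeparator when the pattern contains a comma, because of how the pattern is handed to ICU. Assuming that still holds, and assuming the separators do not need to vary by country, constant values may be enough. The fragment below is an untested sketch under those assumptions, not a confirmed fix; the constant values are illustrative.

```xml
<!-- Sketch only: the pattern still contains a comma, so (per the thread)
     a decimal separator is supplied even though the type is xs:integer.
     The constant separators are assumptions; they replace the
     country-dependent expressions in the schema above. -->
<xs:element name="NumberOfStudents" type="xs:integer"
            dfdl:textNumberPattern="###,###;-#"
            dfdl:textNumberRep="standard"
            dfdl:textNumberCheckPolicy="strict"
            dfdl:textStandardDecimalSeparator="."
            dfdl:textStandardGroupingSeparator=","/>
```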





-----Original Message-----
From: Steve Lawrence <slawre...@apache.org>
Sent: Wednesday, August 14, 2019 8:40 AM
To: users@daffodil.apache.org
Subject: [EXT] Re: How to model a fixed-length integer that may be padded with 
space on the left?



We use the ICU4J library for handling text to number conversion, and it has a 
parameter for strict vs lax parsing--we just set that flag based on the value 
of dfdl:textNumberCheckPolicy. Unfortunately, ICU4J's implementation of strict 
vs lax doesn't seem to exactly match DFDL's description (the plus sign is one 
example, but I believe there are others). We'd likely have to completely 
implement number parsing ourselves to match the DFDL spec exactly, but that 
would be a large effort and really isn't a high priority.



That said, I believe the original intention of DFDL was to match ICU's 
behavior, so this may have just been an oversight.
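
As a sketch of the workaround discussed in this thread: with dfdl:textNumberCheckPolicy set to "lax", Daffodil's ICU-based parsing was reported to accept a leading plus sign. The element name is borrowed from the schema at the top of this message. The pattern "#0" is an illustrative choice, and the fragment assumes every other required text-number property comes from a dfdl:format already in scope.

```xml
<!-- Sketch: per the thread, with "lax" both "12" and "+12" were reported
     to parse. This is Daffodil/ICU behavior, not something the DFDL
     specification text quoted below promises. -->
<xs:element name="NumberOfStudents" type="xs:integer"
            dfdl:textNumberRep="standard"
            dfdl:textNumberPattern="#0"
            dfdl:textNumberCheckPolicy="lax"/>
```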





On 8/14/19 8:32 AM, Costello, Roger L. wrote:

> Thank you Steve!
>
> You wrote:
>
>  > If you set textNumberCheckPolicy="lax", then
>  > we do ignore leading plus signs in the data
>
> The DFDL specification doesn't seem to say that a leading plus sign is
> ignored. Here's what it says:
>
> If 'lax' and dfdl:textNumberRep is 'standard' then grouping separators
> are ignored, leading and trailing whitespace is ignored, leading
> zeros are ignored and quoted characters may be omitted.
>
> Nothing about ignoring plus signs in that.
>
> Is "ignoring leading plus sign" a Daffodil-specific feature?
>
> /Roger
>
> -----Original Message-----
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Wednesday, August 14, 2019 7:46 AM
> To: users@daffodil.apache.org
> Subject: [EXT] Re: How to model a fixed-length integer that may be
> padded with space on the left?
>
> That is a function of the value of the dfdl:textNumberPattern property,
> which actually describes two subpatterns for the format of text
> numbers--one for positive values and one for negative values. The full
> syntax is
>
>    dfdl:textNumberPattern="positivePattern;negativePattern"
>
> For example, you'll sometimes see formats where the negative number is
> wrapped in parentheses instead of prefixed with a minus sign, so the
> property would look something like this:
>
>    dfdl:textNumberPattern="#,##0.###;(#,##0.###)"
>
> If the semicolon/negativePattern is not provided in the pattern it is
> assumed that negativePattern is the same as positivePattern but with a
> minus sign prefix.
>
> If you require a positive sign at the beginning of the number, you
> want a pattern that's something like this:
>
>    dfdl:textNumberPattern="+#,##0.###;-#,##0.###"
>
> Note how the positive subpattern has a plus sign prefix and the
> negative subpattern has a minus sign prefix.
>
> However, this now requires that positive numbers always have a plus
> sign, so your "12" will fail to parse. Unfortunately, there's no way
> in the pattern syntax to make the plus sign prefix optional.
>
> But if you set dfdl:textNumberCheckPolicy="lax", then we do ignore
> leading plus signs in the data, and whatever pattern you're currently
> using should work for both "12" and "+12".
>
> On 8/14/19 7:19 AM, Costello, Roger L. wrote:
>
>  > I did some further testing after fixing the errors.
>  >
>  > Now the DFDL schema processes this input perfectly:
>  >
>  >                  12
>  >
>  > And it processes this input perfectly:
>  >
>  >                  -12
>  >
>  > But it gives an error with this input:
>  >
>  >                  +12
>  >
>  > Here's the error message:
>  >
>  > *[error] Parse Error: Convert to Unlimited Size Integer (for
>  > xs:integer): Unable to parse '+12' (using up all characters).*
>  >
>  > Why do I get that error? It's illegal to use the plus sign with numbers?
>  >
>  > /Roger
>  >
>  > -----Original Message-----
>  > From: Steve Lawrence <slawre...@apache.org>
>  > Sent: Tuesday, August 13, 2019 12:06 PM
>  > To: users@daffodil.apache.org
>  > Subject: [EXT] Re: How to model a fixed-length integer that may be
>  > padded with space on the left?
>  >
>  > Yep, sounds correct to me.
>  >
>  > - Steve
>  >
>  > On 8/13/19 12:03 PM, Costello, Roger L. wrote:
>  >
>  >  > Steve, one more thing, please. I'd like for you to confirm my
>  >  > "lesson learned."
>  >  >
>  >  > Lesson Learned:
>  >  >
>  >  > If the value of textNumberPattern contains a decimal point, comma,
>  >  > or exponent character, then you must define
>  >  > textStandardDecimalSeparator.
>  >  >
>  >  > Correct?
>  >  >
>  >  > /Roger
>  >  >
>  >  > -----Original Message-----
>  >  > From: Costello, Roger L. <coste...@mitre.org>
>  >  > Sent: Tuesday, August 13, 2019 11:46 AM
>  >  > To: users@daffodil.apache.org
>  >  > Subject: Re: How to model a fixed-length integer that may be padded
>  >  > with space on the left?
>  >  >
>  >  >> For the first error, I'd guess that your dfdl:textNumberPattern has a
>  >  >> decimal character in it.
>  >  >
>  >  > Ah! Yes it does:
>  >  >
>  >  > textNumberPattern="#,##0.###;-#,##0.###"
>  >  >
>  >  >> This makes me think the parsed string is actually "12<CRLF>"
>  >  >
>  >  > Ah! Once again, you are spot on. I had a cr and when I removed it,
>  >  > the error went away.
>  >  >
>  >  > Amazing piece of detective work Steve! Thank you!
>  >  >
>  >  > /Roger
>  >  >
>  >  > -----Original Message-----
>  >  > From: Steve Lawrence <slawre...@apache.org>
>  >  > Sent: Tuesday, August 13, 2019 11:29 AM
>  >  > To: users@daffodil.apache.org
>  >  > Subject: [EXT] Re: How to model a fixed-length integer that may be
>  >  > padded with space on the left?
>  >  >
>  >  > For the first error, I'd guess that your dfdl:textNumberPattern has
>  >  > a decimal character in it. Or due to a bug in our number parsing
>  >  > library (ICU), we also require the property if there's a comma or
>  >  > exponent character in the pattern. If the pattern has any of these
>  >  > characters, the property must be provided so that ICU knows how to
>  >  > consume decimals/groups if it comes across one when parsing the
>  >  > number, even though they might not be used in this particular case
>  >  > since it's an integer.
>  >  >
>  >  > For the second error, that means that ICU was unable to convert the
>  >  > string to a number based on the dfdl:textNumberPattern and other
>  >  > textNumber properties. Common causes of this are that the
>  >  > textNumberPattern doesn't allow the string, or that one of the other
>  >  > text number properties isn't set right. However, in this specific
>  >  > case, the error message makes it look like there's a newline after
>  >  > the 12. This makes me think the parsed string is actually "12<CRLF>"
>  >  > or something similar, which will fail to parse if
>  >  > dfdl:textNumberCheckPolicy="strict". So you need to either 1)
>  >  > configure your schema to consume this trailing NL via a
>  >  > terminator/separator/padding/etc. or 2) set
>  >  > dfdl:textNumberCheckPolicy="lax", which tells Daffodil to strip off
>  >  > leading/trailing whitespace/newlines (among other things).
>  >  >
>  >  > On 8/13/19 10:10 AM, Costello, Roger L. wrote:
>  >  >
>  >  >> Thank you Steve. Truly outstanding response.
>  >  >>
>  >  >> I have a few follow-up questions, please.
>  >  >>
>  >  >> I have this ultra-simple DFDL schema:
>  >  >>
>  >  >> <xs:element name="input">
>  >  >>     <xs:complexType>
>  >  >>         <xs:sequence>
>  >  >>             <xs:element name="NumberOfStudents" type="xs:integer" />
>  >  >>         </xs:sequence>
>  >  >>     </xs:complexType>
>  >  >> </xs:element>
>  >  >>
>  >  >> My input file (input.txt) contains this:
>  >  >>
>  >  >> 12
>  >  >>
>  >  >> When I run Daffodil, I get these 2 errors:
>  >  >>
>  >  >> [error] Schema Definition Error: Property
>  >  >> textStandardDecimalSeparator is not defined.
>  >  >>
>  >  >> [error] Parse Error: Convert to Unlimited Size Integer (for
>  >  >> xs:integer): Unable to parse '12 ' (using up all characters).
>  >  >>
>  >  >> Why do I need to define the decimal point symbol? After all, the
>  >  >> datatype is xs:integer, not xs:decimal.
>  >  >>
>  >  >> For the second error message, I have no clue what it's saying.
>  >  >> What is it saying, please?
>  >  >>
>  >  >> /Roger
>  >  >>
>  >  >> -----Original Message-----
>  >  >> From: Steve Lawrence <slawre...@apache.org>
>  >  >> Sent: Monday, August 12, 2019 12:56 PM
>  >  >> To: users@daffodil.apache.org
>  >  >> Subject: [EXT] Re: How to model a fixed-length integer that may be
>  >  >> padded with space on the left?
>  >  >>
>  >  >> The two properties aren't used unless dfdl:textTrimKind is set to
>  >  >> "padChar". Setting that should give you the behavior you expect.
>  >  >> You'll also probably want to set textPadKind="padChar", which will
>  >  >> add pad characters if needed during unparse.
>  >  >>
>  >  >> Agreed that it would be nice if our errors about unused properties
>  >  >> could explain why, but that's a more difficult problem to solve than
>  >  >> just saying when properties aren't used. Especially since sometimes
>  >  >> it requires the combination of various properties for a property to
>  >  >> be set.
>  >  >>
>  >  >> - Steve
>  >  >>
>  >  >> On 8/12/19 12:47 PM, Costello, Roger L. wrote:
>  >  >>
>  >  >>> Hello DFDL community,
>  >  >>>
>  >  >>> My input contains a Length field that must be of length 4. Here is
>  >  >>> a sample input:
>  >  >>>
>  >  >>> .../ 101/...
>  >  >>>
>  >  >>> There is a space prior to 101, although it might be hard to see it.
>  >  >>> So that field is of length 4.
>  >  >>>
>  >  >>> The Length field could be nil; a dash is the nil value.
>  >  >>>
>  >  >>> I figured this is the way to declare the Length element:
>  >  >>>
>  >  >>> <xs:element name="Length"
>  >  >>>      nillable="true"
>  >  >>>      type="xs:int"
>  >  >>>      dfdl:lengthKind="explicit" dfdl:length="4"
>  >  >>>      dfdl:lengthUnits="characters"
>  >  >>>      dfdl:nilValue="%WSP*;-%WSP*;"
>  >  >>>      dfdl:textNumberPadCharacter="%SP;"
>  >  >>>      dfdl:textNumberJustification="right"/>
>  >  >>>
>  >  >>> But Daffodil gives these warning messages:
>  >  >>>
>  >  >>> *Warning: DFDL property was ignored:
>  >  >>> textNumberJustification="right"*
>  >  >>>
>  >  >>> *Warning: DFDL property was ignored:
>  >  >>> textNumberPadCharacter="%SP;"*
>  >  >>>
>  >  >>> How come I get those warnings?
>  >  >>>
>  >  >>> Anyway, I removed those two properties and then Daffodil simply
>  >  >>> refused to parse the Length field. How come? What is the right
>  >  >>> way to do this?
>  >  >>>
>  >  >>> /Roger
>  >  >>>
>  >  >>> P.S. It would be nice if Daffodil, when outputting a warning
>  >  >>> message, gave a brief explanation of why. For example, why is
>  >  >>> textNumberJustification="right" ignored?
