It's required because of a bug in ICU4J. It's been a while, but I believe the issue was that ICU4J would take into account decimal and grouping separators even if the pattern did not contain one, or at the very least it could get confused. Additionally, ICU sets default separators based on the locale which can cause problems.
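
To make the locale-default point concrete, here is a minimal sketch. It uses the JDK's own java.text.DecimalFormat as a stand-in, not ICU4J and not Daffodil's actual code, so ICU4J may well behave differently in the details. The class and method names are made up for illustration. It parses the same text once with the locale's default separators and once with both separators set explicitly.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical demo class; uses the JDK formatter as a stand-in for ICU4J.
public class SeparatorDemo {

    // Parse using whatever decimal/grouping separators the locale defaults to.
    static Number parseWithLocaleDefaults(String pattern, String text, Locale locale) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(locale);
        return parse(new DecimalFormat(pattern, symbols), text);
    }

    // Parse with both separators set explicitly, overriding the locale defaults.
    static Number parseWithExplicitSeparators(String pattern, String text, Locale locale,
                                              char decimal, char grouping) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(locale);
        symbols.setDecimalSeparator(decimal);
        symbols.setGroupingSeparator(grouping);
        return parse(new DecimalFormat(pattern, symbols), text);
    }

    // Wrap the checked ParseException so callers stay simple.
    private static Number parse(DecimalFormat format, String text) {
        try {
            return format.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unable to parse '" + text + "'", e);
        }
    }

    public static void main(String[] args) {
        // The pattern has no decimal part, yet the French default decimal
        // separator (a comma) is still honoured while parsing.
        System.out.println(parseWithLocaleDefaults("###,###", "123,123", Locale.FRANCE));
        // With both separators pinned down, the comma can only mean grouping.
        System.out.println(parseWithExplicitSeparators("###,###", "123,123", Locale.FRANCE, '.', ','));
    }
}
```

With the JDK formatter, the first call should treat the comma as a decimal separator (giving 123.123) and the second as a grouping separator (giving 123123). That is the same kind of ambiguity that requiring both separators is meant to avoid.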
As an example, let's say we have the following:

  <xs:element name="number" type="xs:double"
    dfdl:textNumberPattern="###,###"
    dfdl:textStandardGroupingSeparator="," />

The decimal separator is not provided, but that should be fine since there's no decimal in the pattern. However, if we are in a FR locale, ICU will assume that the decimal separator is a comma. Now both the grouping separator and the decimal separator are commas. ICU still seems to take the decimal separator into account, and incorrectly parses "123,123" as the double value 123.123 instead of the integer 123123. I'm not sure if this is the exact issue, but it was something similar to that.

Because of this, it's easiest and safest to just always require certain separators, even though technically they shouldn't be needed.

On 8/14/19 2:17 PM, Costello, Roger L. wrote:
> The below DFDL schema works fine. However, if I remove dfdl:textStandardDecimalSeparator then I get an error. Why? The dfdl:textNumberPattern does not contain “a decimal separator symbol ("."), or the E or @ symbols” so I shouldn’t need it, right? /Roger
>
> <xs:element name="input">
>   <xs:complexType>
>     <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
>       <xs:element name="Country" type="xs:string"/>
>       <xs:element name="NumberOfStudents" type="xs:integer"
>         dfdl:textNumberPattern="###,###;-#"
>         dfdl:textNumberRep="standard"
>         dfdl:textNumberCheckPolicy="strict"
>         dfdl:textStandardDecimalSeparator="{
>           if (../Country eq 'US') then '.'
>           else if (../Country eq 'FR') then ','
>           else if (../Country eq 'UK') then '.'
>           else '.'
>         }"
>         dfdl:textStandardGroupingSeparator="{
>           if (../Country eq 'US') then ','
>           else if (../Country eq 'FR') then '.'
>           else if (../Country eq 'UK') then '%SP;'
>           else ','
>         }"/>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
>
> -----Original Message-----
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Wednesday, August 14, 2019 8:40 AM
> To: users@daffodil.apache.org
> Subject: [EXT] Re: How to model a fixed-length integer that may be padded with space on the left?
>
> We use the ICU4J library for handling text to number conversion, and it has a parameter for strict vs lax parsing--we just set that flag based on the value of dfdl:textNumberCheckPolicy. Unfortunately, ICU4J's implementation of strict vs lax doesn't seem to exactly match DFDL's description (the plus sign is one example, but I believe there are others). We'd likely have to completely implement number parsing ourselves to match the DFDL spec exactly, but that would be a large effort and really isn't a high priority.
>
> That said, I believe the original intention of DFDL was to match ICU's behavior, so this may have just been an oversight.
>
> On 8/14/19 8:32 AM, Costello, Roger L. wrote:
> > Thank you Steve!
> >
> > You wrote:
> >
> > > If you set textNumberCheckPolicy="lax", then
> > > we do ignore leading plus signs in the data
> >
> > The DFDL specification doesn't seem to say that a leading plus sign is ignored. Here's what it says:
> >
> >     If 'lax' and dfdl:textNumberRep is 'standard' then grouping separators are ignored, leading and trailing whitespace is ignored, leading zeros are ignored and quoted characters may be omitted.
> >
> > Nothing about ignoring plus signs in that.
> >
> > Is “ignoring leading plus sign” a Daffodil-specific feature?
> >
> > /Roger
> >
> > -----Original Message-----
> > From: Steve Lawrence <slawre...@apache.org>
> > Sent: Wednesday, August 14, 2019 7:46 AM
> > To: users@daffodil.apache.org
> > Subject: [EXT] Re: How to model a fixed-length integer that may be padded with space on the left?
> >
> > That is a function of the value of the dfdl:textNumberPattern property, which actually describes two subpatterns for the format of text numbers--one for positive values and one for negative values. The full syntax is
> >
> >     dfdl:textNumberPattern="positivePattern;negativePattern"
> >
> > For example, you'll sometimes see formats where the negative number is wrapped in parentheses instead of prefixed with a minus sign, so the property would look something like this:
> >
> >     dfdl:textNumberPattern="#,##0.###;(#,##0.###)"
> >
> > If the semicolon/negativePattern is not provided in the pattern, it is assumed that negativePattern is the same as positivePattern but with a minus sign prefix.
> >
> > If you require a positive sign at the beginning of the number, you want a pattern that's something like this:
> >
> >     dfdl:textNumberPattern="+#,##0.###;-#,##0.###"
> >
> > Note how the positive subpattern has a plus sign prefix and the negative subpattern has a minus sign prefix.
> >
> > However, this now requires that positive numbers always have a plus sign, so your "12" will fail to parse. Unfortunately, there's no way in the pattern syntax to make the plus sign prefix optional.
> >
> > But if you set dfdl:textNumberCheckPolicy="lax", then we do ignore leading plus signs in the data, and whatever pattern you're currently using should work for both "12" and "+12".
> >
> > On 8/14/19 7:19 AM, Costello, Roger L. wrote:
> > > I did some further testing after fixing the errors.
> > >
> > > Now the DFDL schema processes this input perfectly:
> > >
> > >     12
> > >
> > > And it processes this input perfectly:
> > >
> > >     -12
> > >
> > > But it gives an error with this input:
> > >
> > >     +12
> > >
> > > Here's the error message:
> > >
> > > *[error] Parse Error: Convert to Unlimited Size Integer (for xs:integer): Unable to parse '+12' (using up all characters).*
> > >
> > > Why do I get that error? It's illegal to use the plus sign with numbers?
> > >
> > > /Roger
> > >
> > > -----Original Message-----
> > > From: Steve Lawrence <slawre...@apache.org>
> > > Sent: Tuesday, August 13, 2019 12:06 PM
> > > To: users@daffodil.apache.org
> > > Subject: [EXT] Re: How to model a fixed-length integer that may be padded with space on the left?
> > >
> > > Yep, sounds correct to me.
> > >
> > > - Steve
> > >
> > > On 8/13/19 12:03 PM, Costello, Roger L. wrote:
> > > > Steve, one more thing, please. I'd like for you to confirm my "lesson learned."
> > > >
> > > > Lesson Learned:
> > > >
> > > > If the value of textNumberPattern contains a decimal point, comma, or exponent character, then you must define textStandardDecimalSeparator.
> > > >
> > > > Correct?
> > > >
> > > > /Roger
> > > >
> > > > -----Original Message-----
> > > > From: Costello, Roger L.
> > > > <coste...@mitre.org>
> > > > Sent: Tuesday, August 13, 2019 11:46 AM
> > > > To: users@daffodil.apache.org
> > > > Subject: Re: How to model a fixed-length integer that may be padded with space on the left?
> > > >
> > > > > For the first error, I'd guess that your dfdl:textNumberPattern has a
> > > > > decimal character in it.
> > > >
> > > > Ah! Yes it does:
> > > >
> > > >     textNumberPattern="#,##0.###;-#,##0.###"
> > > >
> > > > > This makes me think the parsed string is actually "12<CRLF>"
> > > >
> > > > Ah! Once again, you are spot on. I had a cr and when I removed it, the error went away.
> > > >
> > > > Amazing piece of detective work Steve! Thank you!
> > > >
> > > > /Roger
> > > >
> > > > -----Original Message-----
> > > > From: Steve Lawrence <slawre...@apache.org>
> > > > Sent: Tuesday, August 13, 2019 11:29 AM
> > > > To: users@daffodil.apache.org
> > > > Subject: [EXT] Re: How to model a fixed-length integer that may be padded with space on the left?
> > > >
> > > > For the first error, I'd guess that your dfdl:textNumberPattern has a decimal character in it. Or due to a bug in our number parsing library (ICU), we also require the property if there's a comma or exponent characters in the pattern. If the pattern has any of these characters, the property must be provided so that ICU knows how to consume decimals/groups if it comes across one when parsing the number. Even though they might not be used for this particular case since it's an integer.
> > > >
> > > > For the second error, that means that ICU was unable to convert the string to a number based on the dfdl:textNumberPattern and other textNumber properties. Common causes of this are the textNumberPattern doesn't allow the string, or one of the other text number properties isn't set right. However, in this specific case, the error message makes it look like there's a newline after the 12. This makes me think the parsed string is actually "12<CRLF>" or something similar, which will fail to parse if dfdl:textNumberCheckPolicy="strict". So you need to either 1) configure your schema to consume this trailing NL via a terminator/separator/padding/etc. or 2) set dfdl:textNumberCheckPolicy="lax", which tells Daffodil to strip off leading/trailing whitespace/newlines (among other things).
> > > >
> > > > On 8/13/19 10:10 AM, Costello, Roger L. wrote:
> > > > > Thank you Steve. Truly outstanding response.
> > > > >
> > > > > I have a few follow-up questions, please.
> > > > >
> > > > > I have this ultra-simple DFDL schema:
> > > > >
> > > > > <xs:element name="input">
> > > > >   <xs:complexType>
> > > > >     <xs:sequence>
> > > > >       <xs:element name="NumberOfStudents" type="xs:integer" />
> > > > >     </xs:sequence>
> > > > >   </xs:complexType>
> > > > > </xs:element>
> > > > >
> > > > > My input file (input.txt) contains this:
> > > > >
> > > > >     12
> > > > >
> > > > > When I run Daffodil, I get these 2 errors:
> > > > >
> > > > > [error] Schema Definition Error: Property textStandardDecimalSeparator is not defined.
> > > > >
> > > > > [error] Parse Error: Convert to Unlimited Size Integer (for xs:integer): Unable to parse '12 ' (using up all characters).
> > > > >
> > > > > Why do I need to define the decimal point symbol? After all, the datatype is xs:integer, not xs:decimal.
> > > > >
> > > > > For the second error message, I have no clue what it's saying. What is it saying, please?
> > > > >
> > > > > /Roger
> > > > >
> > > > > -----Original Message-----
> > > > > From: Steve Lawrence <slawre...@apache.org>
> > > > > Sent: Monday, August 12, 2019 12:56 PM
> > > > > To: users@daffodil.apache.org
> > > > > Subject: [EXT] Re: How to model a fixed-length integer that may be padded with space on the left?
> > > > >
> > > > > The two properties aren't used unless dfdl:textTrimKind is set to "padChar". Setting that should give you the behavior you expect. You'll also probably want to set textPadKind="padChar", which will add pad characters if needed during unparse.
> > > > >
> > > > > Agreed that it would be nice if our errors about unused properties could explain why, but that's a more difficult problem to solve than just saying when properties aren't used. Especially since sometimes it requires the combination of various properties for a property to be set.
> > > > >
> > > > > - Steve
> > > > >
> > > > > On 8/12/19 12:47 PM, Costello, Roger L. wrote:
> > > > > > Hello DFDL community,
> > > > > >
> > > > > > My input contains a Length field that must be of length 4. Here is a sample input:
> > > > > >
> > > > > >     .../ 101/...
> > > > > >
> > > > > > There is a space prior to 101, although it might be hard to see it. So that field is of length 4.
> > > > > >
> > > > > > The Length field could be nil; a dash is the nil value.
> > > > > >
> > > > > > I figured this is the way to declare the Length element:
> > > > > >
> > > > > > <xs:element name="Length"
> > > > > >     nillable="true"
> > > > > >     type="xs:int"
> > > > > >     dfdl:lengthKind="explicit" dfdl:length="4"
> > > > > >     dfdl:lengthUnits="characters"
> > > > > >     dfdl:nilValue="%WSP*;-%WSP*;"
> > > > > >     dfdl:textNumberPadCharacter="%SP;"
> > > > > >     dfdl:textNumberJustification="right"/>
> > > > > >
> > > > > > But Daffodil gives these warning messages:
> > > > > >
> > > > > > *Warning: DFDL property was ignored: textNumberJustification="right"*
> > > > > >
> > > > > > *Warning: DFDL property was ignored: textNumberPadCharacter="%SP;"*
> > > > > >
> > > > > > How come I get those warnings?
> > > > > >
> > > > > > Anyway, I removed those two properties and then Daffodil simply refused to parse the Length field. How come? What is the right way to do this?
> > > > > >
> > > > > > /Roger
> > > > > >
> > > > > > P.S. It would be nice if Daffodil, when outputting a warning message, gave a brief explanation of why.
> > > > > > For example, why is textNumberJustification="right" ignored?