The plot thickens slightly. I raised this with the DFDL Workgroup, and Steve Hanson verified that IBM DFDL (Java version) implements textNumberCheckPolicy="lax" as currently described in the DFDL spec: if the textNumberPattern specifies a "+", the plus sign is required in the data, even for "lax".
Interestingly, he also says they are "just using ICU". So it seems there must be some initialization/configuration of ICU that affects whether or not ICU enforces the "+" for lax.

There is general agreement that "lax" should mean the parser tolerates a + sign, but does not require one. There is also general agreement that the DFDL spec intended "lax" to match the behavior of the ICU library's "lax" setting.

________________________________
From: Steve Lawrence <slawre...@apache.org>
Sent: Wednesday, August 14, 2019 8:40 AM
To: users@daffodil.apache.org <users@daffodil.apache.org>
Subject: Re: How to model a fixed-length integer that may be padded with space on the left?

We use the ICU4J library for handling text to number conversion, and it has a parameter for strict vs lax parsing--we just set that flag based on the value of dfdl:textNumberCheckPolicy. Unfortunately, ICU4J's implementation of strict vs lax doesn't seem to exactly match DFDL's description (the plus sign is one example, but I believe there are others). We'd likely have to completely implement number parsing ourselves to match the DFDL spec exactly, but that would be a large effort and really isn't a high priority. That said, I believe the original intention of DFDL was to match ICU's behavior, so this may have just been an oversight.

On 8/14/19 8:32 AM, Costello, Roger L. wrote:
> Thank you Steve!
>
> You wrote:
>
> > > If you set textNumberCheckPolicy="lax", then
> > > we do ignore leading plus signs in the data
>
> The DFDL specification doesn't seem to say that a leading plus sign is ignored.
> Here's what it says:
>
> If 'lax' and dfdl:textNumberRep is 'standard' then grouping separators are
> ignored, leading and trailing whitespace is ignored, leading zeros are ignored
> and quoted characters may be omitted.
>
> Nothing about ignoring plus signs in that.
>
> Is “ignoring leading plus sign” a Daffodil-specific feature?
>
> /Roger
>
> -----Original Message-----
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Wednesday, August 14, 2019 7:46 AM
> To: users@daffodil.apache.org
> Subject: [EXT] Re: How to model a fixed-length integer that may be padded with space on the left?
>
> That is a function of the value of the dfdl:textNumberPattern property, which
> actually describes two subpatterns for the format of text numbers--one for
> positive values and one for negative values. The full syntax is
>
>   dfdl:textNumberPattern="positivePattern;negativePattern"
>
> For example, you'll sometimes see formats where the negative number is wrapped
> in parentheses instead of prefixed with a minus sign, so the property would
> look something like this:
>
>   dfdl:textNumberPattern="#,##0.###;(#,##0.###)"
>
> If the semicolon/negativePattern is not provided in the pattern it is assumed
> that negativePattern is the same as positivePattern but with a minus sign
> prefix.
>
> If you require a positive sign at the beginning of the number, you want a
> pattern that's something like this:
>
>   dfdl:textNumberPattern="+#,##0.###;-#,##0.###"
>
> Note how the positive subpattern has a plus sign prefix and the negative
> subpattern has a minus sign prefix.
>
> However, this now requires that positive numbers always have a plus sign, so
> your "12" will fail to parse. Unfortunately, there's no way in the pattern
> syntax to make the plus sign prefix optional.
>
> But if you set dfdl:textNumberCheckPolicy="lax", then we do ignore leading plus
> signs in the data, and whatever pattern you're currently using should work for
> both "12" and "+12".
>
> On 8/14/19 7:19 AM, Costello, Roger L. wrote:
> > > I did some further testing after fixing the errors.
> > >
> > > Now the DFDL schema processes this input perfectly:
> > >
> > > 12
> > >
> > > And it processes this input perfectly:
> > >
> > > -12
> > >
> > > But it gives an error with this input:
> > >
> > > +12
> > >
> > > Here's the error message:
> > >
> > > *[error] Parse Error: Convert to Unlimited Size Integer (for xs:integer): Unable to parse '+12' (using up all characters).*
> > >
> > > Why do I get that error? Is it illegal to use the plus sign with numbers?
> > >
> > > /Roger
> > >
> > > -----Original Message-----
> > > From: Steve Lawrence <slawre...@apache.org>
> > > Sent: Tuesday, August 13, 2019 12:06 PM
> > > To: users@daffodil.apache.org
> > > Subject: [EXT] Re: How to model a fixed-length integer that may be padded with space on the left?
> > >
> > > Yep, sounds correct to me.
> > >
> > > - Steve
> > >
> > > On 8/13/19 12:03 PM, Costello, Roger L. wrote:
> > > > Steve, one more thing, please. I'd like for you to confirm my "lesson learned."
> > > >
> > > > Lesson Learned:
> > > >
> > > > If the value of textNumberPattern contains a decimal point, comma, or exponent character, then you must define textStandardDecimalSeparator.
> > > >
> > > > Correct?
> > > >
> > > > /Roger
> > > >
> > > > -----Original Message-----
> > > > From: Costello, Roger L. <coste...@mitre.org>
> > > > Sent: Tuesday, August 13, 2019 11:46 AM
> > > > To: users@daffodil.apache.org
> > > > Subject: Re: How to model a fixed-length integer that may be padded with space on the left?
> > > >
> > > >> For the first error, I'd guess that your dfdl:textNumberPattern has a
> > > >> decimal character in it.
> > > >
> > > > Ah!
> > > > Yes it does:
> > > >
> > > >   textNumberPattern="#,##0.###;-#,##0.###"
> > > >
> > > >> This makes me think the parsed string is actually "12<CRLF>"
> > > >
> > > > Ah! Once again, you are spot on. I had a CR and when I removed it, the error went away.
> > > >
> > > > Amazing piece of detective work Steve! Thank you!
> > > >
> > > > /Roger
> > > >
> > > > -----Original Message-----
> > > > From: Steve Lawrence <slawre...@apache.org>
> > > > Sent: Tuesday, August 13, 2019 11:29 AM
> > > > To: users@daffodil.apache.org
> > > > Subject: [EXT] Re: How to model a fixed-length integer that may be padded with space on the left?
> > > >
> > > > For the first error, I'd guess that your dfdl:textNumberPattern has a decimal character in it. Or, due to a bug in our number parsing library (ICU), we also require the property if there's a comma or exponent character in the pattern. If the pattern has any of these characters, the property must be provided so that ICU knows how to consume decimals/groups if it comes across one when parsing the number. This holds even though they might not be used in this particular case, since it's an integer.
> > > >
> > > > For the second error, that means that ICU was unable to convert the string to a number based on the dfdl:textNumberPattern and other textNumber properties. Common causes of this are that the textNumberPattern doesn't allow the string, or that one of the other text number properties isn't set right. However, in this specific case, the error message makes it look like there's a newline after the 12.
> > > > This makes me think the parsed string is actually "12<CRLF>" or something similar, which will fail to parse if dfdl:textNumberCheckPolicy="strict". So you need to either 1) configure your schema to consume this trailing NL via a terminator/separator/padding/etc. or 2) set dfdl:textNumberCheckPolicy="lax", which tells Daffodil to strip off leading/trailing whitespace/newlines (among other things).
> > > >
> > > > On 8/13/19 10:10 AM, Costello, Roger L. wrote:
> > > >> Thank you Steve. Truly outstanding response.
> > > >>
> > > >> I have a few follow-up questions, please.
> > > >>
> > > >> I have this ultra-simple DFDL schema:
> > > >>
> > > >> <xs:element name="input">
> > > >>   <xs:complexType>
> > > >>     <xs:sequence>
> > > >>       <xs:element name="NumberOfStudents" type="xs:integer" />
> > > >>     </xs:sequence>
> > > >>   </xs:complexType>
> > > >> </xs:element>
> > > >>
> > > >> My input file (input.txt) contains this:
> > > >>
> > > >> 12
> > > >>
> > > >> When I run Daffodil, I get these 2 errors:
> > > >>
> > > >> [error] Schema Definition Error: Property textStandardDecimalSeparator is not defined.
> > > >>
> > > >> [error] Parse Error: Convert to Unlimited Size Integer (for xs:integer): Unable to parse '12 ' (using up all characters).
> > > >>
> > > >> Why do I need to define the decimal point symbol? After all, the datatype is xs:integer, not xs:decimal.
> > > >>
> > > >> For the second error message, I have no clue what it's saying. What is it saying, please?
> > > >>
> > > >> /Roger
> > > >>
> > > >> -----Original Message-----
> > > >> From: Steve Lawrence <slawre...@apache.org>
> > > >> Sent: Monday, August 12, 2019 12:56 PM
> > > >> To: users@daffodil.apache.org
> > > >> Subject: [EXT] Re: How to model a fixed-length integer that may be padded with space on the left?
> > > >>
> > > >> The two properties aren't used unless dfdl:textTrimKind is set to "padChar". Setting that should give you the behavior you expect. You'll also probably want to set textPadKind="padChar", which will add pad characters if needed during unparse.
> > > >>
> > > >> Agreed that it would be nice if our errors about unused properties could explain why, but that's a more difficult problem to solve than just saying when properties aren't used. Especially since sometimes it requires the combination of various properties for a property to be set.
> > > >>
> > > >> - Steve
> > > >>
> > > >> On 8/12/19 12:47 PM, Costello, Roger L. wrote:
> > > >>> Hello DFDL community,
> > > >>>
> > > >>> My input contains a Length field that must be of length 4. Here is a sample input:
> > > >>>
> > > >>> .../ 101/...
> > > >>>
> > > >>> There is a space prior to 101, although it might be hard to see it. So that field is of length 4.
> > > >>>
> > > >>> The Length field could be nil; a dash is the nil value.
> > > >>>
> > > >>> I figured this is the way to declare the Length element:
> > > >>>
> > > >>> <xs:element name="Length"
> > > >>>             nillable="true"
> > > >>>             type="xs:int"
> > > >>>             dfdl:lengthKind="explicit" dfdl:length="4"
> > > >>>             dfdl:lengthUnits="characters"
> > > >>>             dfdl:nilValue="%WSP*;-%WSP*;"
> > > >>>             dfdl:textNumberPadCharacter="%SP;"
> > > >>>             dfdl:textNumberJustification="right"/>
> > > >>>
> > > >>> But Daffodil gives these warning messages:
> > > >>>
> > > >>> *Warning: DFDL property was ignored: textNumberJustification="right"*
> > > >>>
> > > >>> *Warning: DFDL property was ignored: textNumberPadCharacter="%SP;"*
> > > >>>
> > > >>> How come I get those warnings?
> > > >>>
> > > >>> Anyway, I removed those two properties and then Daffodil simply refused to parse the Length field. How come? What is the right way to do this?
> > > >>>
> > > >>> /Roger
> > > >>>
> > > >>> P.S. It would be nice if Daffodil, when outputting a warning message, gave a brief explanation of why. For example, why is textNumberJustification="right" ignored?
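
For anyone skimming the thread from the top, here is a rough sketch of the original Length element with the 8/12 advice applied: dfdl:textTrimKind="padChar" so that the pad character and justification are actually used on parse, and dfdl:textPadKind="padChar" so that padding is added on unparse. This is untested and assumes every other required property (dfdl:nilKind, dfdl:representation, encoding, and so on) comes from a default dfdl:format that was never shown in the thread.

```xml
<!-- Sketch only, not run against Daffodil. The two *Kind properties are the
     additions suggested in the reply of 8/12; everything else is the element
     as originally posted. Remaining properties are assumed to be supplied by
     a default dfdl:format elsewhere in the schema. -->
<xs:element name="Length"
            nillable="true"
            type="xs:int"
            dfdl:lengthKind="explicit"
            dfdl:length="4"
            dfdl:lengthUnits="characters"
            dfdl:nilValue="%WSP*;-%WSP*;"
            dfdl:textTrimKind="padChar"
            dfdl:textPadKind="padChar"
            dfdl:textNumberPadCharacter="%SP;"
            dfdl:textNumberJustification="right"/>
```

If that works as described in the reply, the two "DFDL property was ignored" warnings should go away, and a field such as " 101" should have its leading space trimmed before the number conversion.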