The DFDL schema below works fine. However, if I remove dfdl:textStandardDecimalSeparator, I get an error. Why? The dfdl:textNumberPattern does not contain "a decimal separator symbol ("."), or the E or @ symbols", so I shouldn't need it, right?

/Roger
<xs:element name="input">
  <xs:complexType>
    <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
      <xs:element name="Country" type="xs:string" />
      <xs:element name="NumberOfStudents" type="xs:integer"
                  dfdl:textNumberPattern="###,###;-#"
                  dfdl:textNumberRep="standard"
                  dfdl:textNumberCheckPolicy="strict"
                  dfdl:textStandardDecimalSeparator="{ if (../Country eq 'US') then '.' else if (../Country eq 'FR') then ',' else if (../Country eq 'UK') then '.' else '.' }"
                  dfdl:textStandardGroupingSeparator="{ if (../Country eq 'US') then ',' else if (../Country eq 'FR') then '.' else if (../Country eq 'UK') then '%SP;' else ',' }"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

-----Original Message-----
From: Steve Lawrence <slawre...@apache.org>
Sent: Wednesday, August 14, 2019 8:40 AM
To: users@daffodil.apache.org
Subject: [EXT] Re: How to model a fixed-length integer that may be padded with space on the left?

We use the ICU4J library for handling text to number conversion, and it
has a parameter for strict vs lax parsing--we just set that flag based
on the value of dfdl:textNumberCheckPolicy. Unfortunately, ICU4J's
implementation of strict vs lax doesn't seem to exactly match DFDL's
description (the plus sign is one example, but I believe there are
others). We'd likely have to completely implement number parsing
ourselves to match the DFDL spec exactly, but that would be a large
effort and really isn't a high priority.

That said, I believe the original intention of DFDL was to match ICU's
behavior, so this may have just been an oversight.

On 8/14/19 8:32 AM, Costello, Roger L. wrote:
> Thank you Steve!
>
> You wrote:
>
> > If you set textNumberCheckPolicy="lax", then
> > we do ignore leading plus signs in the data
>
> The DFDL specification doesn't seem to say that a leading plus sign is
> ignored.
>
> Here's what it says:
>
>   If 'lax' and dfdl:textNumberRep is 'standard' then grouping separators
>   are ignored, leading and trailing whitespace is ignored, leading
>   zeros are ignored and quoted characters may be omitted.
>
> Nothing about ignoring plus signs in that.
>
> Is "ignoring leading plus sign" a Daffodil-specific feature?
>
> /Roger
>
> -----Original Message-----
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Wednesday, August 14, 2019 7:46 AM
> To: users@daffodil.apache.org
> Subject: [EXT] Re: How to model a fixed-length integer that may be padded with space on the left?
>
> That is a function of the value of the dfdl:textNumberPattern property,
> which actually describes two subpatterns for the format of text
> numbers--one for positive values and one for negative values. The full
> syntax is
>
>   dfdl:textNumberPattern="positivePattern;negativePattern"
>
> For example, you'll sometimes see formats where the negative number is
> wrapped in parentheses instead of prefixed with a minus sign, so the
> property would look something like this:
>
>   dfdl:textNumberPattern="#,##0.###;(#,##0.###)"
>
> If the semicolon/negativePattern is not provided in the pattern, it is
> assumed that negativePattern is the same as positivePattern but with a
> minus sign prefix.
>
> If you require a positive sign at the beginning of the number, you
> want a pattern that's something like this:
>
>   dfdl:textNumberPattern="+#,##0.###;-#,##0.###"
>
> Note how the positive subpattern has a plus sign prefix and the
> negative subpattern has a minus sign prefix.
>
> However, this now requires that positive numbers always have a plus
> sign, so your "12" will fail to parse. Unfortunately, there's no way
> in the pattern syntax to make the plus sign prefix optional.
>
> But if you set dfdl:textNumberCheckPolicy="lax", then we do ignore
> leading plus signs in the data, and whatever pattern you're currently
> using should work for both "12" and "+12".
>
> On 8/14/19 7:19 AM, Costello, Roger L. wrote:
> > I did some further testing after fixing the errors.
> >
> > Now the DFDL schema processes this input perfectly:
> >
> >   12
> >
> > And it processes this input perfectly:
> >
> >   -12
> >
> > But it gives an error with this input:
> >
> >   +12
> >
> > Here's the error message:
> >
> > *[error] Parse Error: Convert to Unlimited Size Integer (for
> > xs:integer): Unable to parse '+12' (using up all characters).*
> >
> > Why do I get that error? Is it illegal to use the plus sign with numbers?
> >
> > /Roger
> >
> > -----Original Message-----
> > From: Steve Lawrence <slawre...@apache.org>
> > Sent: Tuesday, August 13, 2019 12:06 PM
> > To: users@daffodil.apache.org
> > Subject: [EXT] Re: How to model a fixed-length integer that may be padded with space on the left?
> >
> > Yep, sounds correct to me.
> >
> > - Steve
> >
> > On 8/13/19 12:03 PM, Costello, Roger L. wrote:
> > > Steve, one more thing, please. I'd like for you to confirm my
> > > "lesson learned."
> > >
> > > Lesson Learned:
> > >
> > > If the value of textNumberPattern contains a decimal point, comma,
> > > or exponent character, then you must define
> > > textStandardDecimalSeparator.
> > >
> > > Correct?
> > >
> > > /Roger
> > >
> > > -----Original Message-----
> > > From: Costello, Roger L. <coste...@mitre.org>
> > > Sent: Tuesday, August 13, 2019 11:46 AM
> > > To: users@daffodil.apache.org
> > > Subject: Re: How to model a fixed-length integer that may be padded with space on the left?
> > >
> > > > For the first error, I'd guess that your dfdl:textNumberPattern
> > > > has a decimal character in it.
> > >
> > > Ah! Yes it does:
> > >
> > >   textNumberPattern="#,##0.###;-#,##0.###"
> > >
> > > > This makes me think the parsed string is actually "12<CRLF>"
> > >
> > > Ah! Once again, you are spot on. I had a CR and when I removed it,
> > > the error went away.
> > >
> > > Amazing piece of detective work Steve! Thank you!
> > >
> > > /Roger
> > >
> > > -----Original Message-----
> > > From: Steve Lawrence <slawre...@apache.org>
> > > Sent: Tuesday, August 13, 2019 11:29 AM
> > > To: users@daffodil.apache.org
> > > Subject: [EXT] Re: How to model a fixed-length integer that may be padded with space on the left?
> > >
> > > For the first error, I'd guess that your dfdl:textNumberPattern has
> > > a decimal character in it. Also, due to a bug in our number parsing
> > > library (ICU), we require the property if there are comma or
> > > exponent characters in the pattern. If the pattern has any of these
> > > characters, the property must be provided so that ICU knows how to
> > > consume decimals/groups if it comes across one when parsing the
> > > number, even though they might not be used in this particular case
> > > since it's an integer.
> > >
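To make the rule in the reply above concrete, here is a minimal sketch of an integer element whose pattern contains a grouping comma and which therefore declares both separators explicitly. The pattern "#,##0" and the separator values are illustrative assumptions, not taken from the thread, and the fragment assumes the xs and dfdl prefixes are bound and that every other required property comes from a default format defined elsewhere in the schema.

```xml
<!-- Sketch only: the pattern and the separator values are illustrative. -->
<!-- The comma in the pattern is what makes the separator properties required. -->
<xs:element name="NumberOfStudents" type="xs:integer"
            dfdl:textNumberRep="standard"
            dfdl:textNumberPattern="#,##0"
            dfdl:textNumberCheckPolicy="strict"
            dfdl:textStandardDecimalSeparator="."
            dfdl:textStandardGroupingSeparator=","/>
```

Going by the same reply, a pattern with no comma, decimal point, or exponent character (for example "#0") should not trigger the requirement, though that has not been verified here.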
> > > For the second error, that means that ICU was unable to convert the
> > > string to a number based on the dfdl:textNumberPattern and other
> > > textNumber properties. Common causes of this are that the
> > > textNumberPattern doesn't allow the string, or that one of the other
> > > text number properties isn't set right. However, in this specific
> > > case, the error message makes it look like there's a newline after
> > > the 12. This makes me think the parsed string is actually "12<CRLF>"
> > > or something similar, which will fail to parse if
> > > dfdl:textNumberCheckPolicy="strict". So you need to either 1)
> > > configure your schema to consume this trailing NL via a
> > > terminator/separator/padding/etc. or 2) set
> > > dfdl:textNumberCheckPolicy="lax", which tells Daffodil to strip off
> > > leading/trailing whitespace/newlines (among other things).
> > >
> > > On 8/13/19 10:10 AM, Costello, Roger L. wrote:
> > > > Thank you Steve. Truly outstanding response.
> > > >
> > > > I have a few follow-up questions, please.
> > > >
> > > > I have this ultra-simple DFDL schema:
> > > >
> > > > <xs:element name="input">
> > > >   <xs:complexType>
> > > >     <xs:sequence>
> > > >       <xs:element name="NumberOfStudents" type="xs:integer" />
> > > >     </xs:sequence>
> > > >   </xs:complexType>
> > > > </xs:element>
> > > >
> > > > My input file (input.txt) contains this:
> > > >
> > > >   12
> > > >
> > > > When I run Daffodil, I get these 2 errors:
> > > >
> > > > [error] Schema Definition Error: Property
> > > > textStandardDecimalSeparator is not defined.
> > > >
> > > > [error] Parse Error: Convert to Unlimited Size Integer (for
> > > > xs:integer): Unable to parse '12 ' (using up all characters).
> > > >
> > > > Why do I need to define the decimal point symbol? After all, the
> > > > datatype is xs:integer, not xs:decimal.
> > > >
> > > > For the second error message, I have no clue what it's saying.
> > > > What is it saying, please?
> > > >
> > > > /Roger
> > > >
> > > > -----Original Message-----
> > > > From: Steve Lawrence <slawre...@apache.org>
> > > > Sent: Monday, August 12, 2019 12:56 PM
> > > > To: users@daffodil.apache.org
> > > > Subject: [EXT] Re: How to model a fixed-length integer that may be padded with space on the left?
> > > >
> > > > The two properties aren't used unless dfdl:textTrimKind is set to
> > > > "padChar". Setting that should give you the behavior you expect.
> > > > You'll also probably want to set textPadKind="padChar", which will
> > > > add pad characters if needed during unparse.
> > > >
> > > > Agreed that it would be nice if our errors about unused properties
> > > > could explain why, but that's a more difficult problem to solve
> > > > than just saying when properties aren't used. Especially since
> > > > sometimes it requires the combination of various properties for a
> > > > property to be set.
> > > >
> > > > - Steve
> > > >
> > > > On 8/12/19 12:47 PM, Costello, Roger L. wrote:
> > > > > Hello DFDL community,
> > > > >
> > > > > My input contains a Length field that must be of length 4. Here
> > > > > is a sample input:
> > > > >
> > > > >   .../ 101/...
> > > > >
> > > > > There is a space prior to 101, although it might be hard to see it.
> > > > > So that field is of length 4.
> > > > >
> > > > > The Length field could be nil; a dash is the nil value.
> > > > >
> > > > > I figured this is the way to declare the Length element:
> > > > >
> > > > > <xs:element name="Length"
> > > > >             nillable="true"
> > > > >             type="xs:int"
> > > > >             dfdl:lengthKind="explicit" dfdl:length="4"
> > > > >             dfdl:lengthUnits="characters"
> > > > >             dfdl:nilValue="%WSP*;-%WSP*;"
> > > > >             dfdl:textNumberPadCharacter="%SP;"
> > > > >             dfdl:textNumberJustification="right"/>
> > > > >
> > > > > But Daffodil gives these warning messages:
> > > > >
> > > > > *Warning: DFDL property was ignored: textNumberJustification="right"*
> > > > >
> > > > > *Warning: DFDL property was ignored: textNumberPadCharacter="%SP;"*
> > > > >
> > > > > How come I get those warnings?
> > > > >
> > > > > Anyway, I removed those two properties and then Daffodil simply
> > > > > refused to parse the Length field. How come? What is the right
> > > > > way to do this?
> > > > >
> > > > > /Roger
> > > > >
> > > > > P.S. It would be nice if Daffodil, when outputting a warning
> > > > > message, gave a brief explanation of why. For example, why is
> > > > > textNumberJustification="right" ignored?
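
Combining the advice from the August 12, 12:56 PM reply with the Length declaration in the original question gives roughly the following sketch. Only dfdl:textTrimKind and dfdl:textPadKind are added; everything else is the original declaration. It assumes the xs and dfdl prefixes are bound and that the remaining properties come from a default format, and it has not been run against Daffodil.

```xml
<!-- Sketch only: the original Length declaration plus the two properties -->
<!-- suggested in the reply (textTrimKind and textPadKind). -->
<xs:element name="Length"
            nillable="true"
            type="xs:int"
            dfdl:lengthKind="explicit" dfdl:length="4"
            dfdl:lengthUnits="characters"
            dfdl:nilValue="%WSP*;-%WSP*;"
            dfdl:textTrimKind="padChar"
            dfdl:textPadKind="padChar"
            dfdl:textNumberPadCharacter="%SP;"
            dfdl:textNumberJustification="right"/>
```

With textTrimKind set to "padChar", the pad character and justification properties are no longer unused, so the two "property was ignored" warnings should not apply to this declaration.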