Thanks, Steve. What you say makes perfect sense. It is regrettable that it is called a Social Security **Number**, given that an SSN is not a number.
/Roger

From: Steve Lawrence <slawre...@apache.org>
Sent: Monday, November 27, 2023 8:22 AM
To: users@daffodil.apache.org
Subject: [EXT] Re: Is there a way to represent Social Security Numbers as unsigned integers in DFDL?

SSN's are probably best represented as an xs:string, with a pattern facet to validate they are the correct format, e.g.

  <element name="SSN" dfdl:length="11" ...>
    <simpleType>
      <restriction base="xs:string">
        <pattern value="\d{3}-\d{2}-\d{4}" />
      </restriction>
    </simpleType>
  </element>

It also looks like there are a number of resources online that have more complex regexes for more accurate SSN validation if needed.

In addition to the unparse issue you've found, another reason for using a string instead of a number is that some SSNs start with zero. Leading zeros do not appear in the infoset for numbers, so these SSNs would look like they don't have the required 9 digits in the infoset. That's not a huge deal, but could potentially be confusing. Although SSN's kind of look like numbers, it's probably best to not treat them that way.

If you really don't want the hyphens, you could model an SSN with its individual parts, e.g. three elements with lengths 3, 2, and 4 for the separate parts, all with an infix hyphen separator, but I'm not sure that really gains much except added complexity. And that still has the leading zero issue, so the fields probably still want to be strings anyways.

On 2023-11-26 05:27 AM, Roger L Costello wrote:
> It appears the answer is: No.
>
> You might think that the SSN pattern could be expressed this way:
>
> dfdl:textNumberPattern="###,##,####"
> dfdl:textStandardGroupingSeparator="-"
>
> That is, a SSN is a group of 3 digits followed by a group of 2 digits
> followed by a group of 4 digits, where the groups are separated by dashes.
>
> However, that doesn't work. Here's why.
>
> From the DFDL specification:
> -----------------------------------------------------
> If a dfdl:textNumberPattern contains multiple grouping separators, the
> interval between the last one and the end of the integer defines the primary
> grouping size, and the interval between the last two defines the secondary
> grouping size. All others are ignored, so "###,##,####" == "##,##,####"
> -----------------------------------------------------
>
> Thus, this SSN:
>
> 123-45-6789
>
> is parsed to this:
>
> <SSN>123456789</SSN>
>
> and unparsed to this:
>
> 1-23-45-6789
>
> Eek! The parse is what is desired, unparse is not.
>
> Here's the DFDL that I used:
>
> <xs:element name="ssn" type="xs:unsignedInt"
>     dfdl:textNumberRep="standard"
>     dfdl:textNumberCheckPolicy="lax"
>     dfdl:textNumberPattern="###,##,####"
>     dfdl:textStandardGroupingSeparator="-"
>     dfdl:textStandardDecimalSeparator="."
>     dfdl:textStandardBase="10"
>     dfdl:textNumberRounding="pattern"
> />
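
The behaviours discussed in this thread can be reproduced outside of Daffodil with a few lines of Python. This is only a rough sketch, not Daffodil's implementation. The helper names `format_grouped` and `is_ssn` are made up for illustration, and the primary/secondary grouping logic is my reading of the specification text quoted above (a primary group of 4, then secondary groups of 2 for everything to the left).

```python
import re

# Same expression as the pattern facet in the reply above. XSD patterns are
# implicitly anchored, so fullmatch() is used below to mimic that.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def format_grouped(value: int, primary: int = 4, secondary: int = 2, sep: str = "-") -> str:
    """Hypothetical helper: format an integer with primary/secondary grouping.

    Mirrors how "###,##,####" is interpreted: the rightmost group has the
    primary size and every group to its left has the secondary size.
    """
    digits = str(value)
    if len(digits) <= primary:
        return digits
    groups = [digits[-primary:]]          # rightmost group: primary size
    rest = digits[:-primary]
    while rest:                           # remaining groups: secondary size
        groups.insert(0, rest[-secondary:])
        rest = rest[:-secondary]
    return sep.join(groups)


def is_ssn(text: str) -> bool:
    """Hypothetical helper: True if text looks like ddd-dd-dddd."""
    return SSN_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    # Unparse problem: the leading group of 3 is not preserved.
    print(format_grouped(123456789))

    # Leading-zero problem: a numeric type drops the zero, leaving 8 digits.
    print(len(str(int("012345678"))))

    # Treating the SSN as a string keeps both the hyphens and the zero.
    print(is_ssn("012-34-5678"), is_ssn("12345678"))
```

Keeping the value as a string and validating it with the pattern sidesteps both problems, which is the approach recommended in the reply.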