Thanks Steve. What you say makes perfect sense.

It is regrettable that it is called Social Security **Number** given that an SSN
is not a number.

/Roger

From: Steve Lawrence <slawre...@apache.org>
Sent: Monday, November 27, 2023 8:22 AM
To: users@daffodil.apache.org
Subject: [EXT] Re: Is there a way to represent Social Security Numbers as 
unsigned integers in DFDL?

SSNs are probably best represented as an xs:string, with a pattern facet to
validate that they are in the correct format, e.g.

   <element name="SSN" dfdl:length="11" ...>
     <simpleType>
       <restriction base="xs:string">
         <pattern value="\d{3}-\d{2}-\d{4}" />
       </restriction>
     </simpleType>
   </element>
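
XSD pattern facets are implicitly anchored: the regular expression must match
the entire value, not just part of it. A quick way to see what the facet above
accepts is to approximate it in Python with re.fullmatch. This is only a
sketch: Python's re is not the XSD regex dialect, although this particular
pattern uses only \d, literal hyphens, and {n} quantifiers, which both support.

```python
import re

# The same pattern used in the xs:pattern facet above.
SSN_PATTERN = r"\d{3}-\d{2}-\d{4}"


def matches_xsd_pattern(value: str) -> bool:
    """Approximate XSD pattern-facet semantics: the regex must match the
    whole value, as if it were wrapped in ^...$."""
    return re.fullmatch(SSN_PATTERN, value) is not None


for candidate in ["123-45-6789", "012-34-5678", "123456789", "123-45-67890", "x123-45-6789"]:
    print(f"{candidate!r}: {matches_xsd_pattern(candidate)}")
```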



It also looks like there are a number of resources online with more complex
regexes for more accurate SSN validation, if needed.
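
As one hypothetical example of a stricter check, the sketch below assumes the
commonly documented rules that no SSN has area number 000, 666, or 900 to 999,
group number 00, or serial number 0000. It is written without lookahead, which
XSD regular expressions do not support, so the same string should in principle
be usable in an xs:pattern facet (untested with Daffodil). Passing it only means
the value is structurally plausible, not that the number was ever issued.

```python
import re

# Hypothetical stricter pattern, lookahead-free so it stays within XSD regex syntax.
#   area   (first 3 digits):  001-899, excluding 666
#   group  (middle 2 digits): 01-99
#   serial (last 4 digits):   0001-9999
AREA = r"(00[1-9]|0[1-9][0-9]|[1-5][0-9]{2}|6[0-5][0-9]|66[0-57-9]|6[7-9][0-9]|[78][0-9]{2})"
GROUP = r"(0[1-9]|[1-9][0-9])"
SERIAL = r"(000[1-9]|00[1-9][0-9]|0[1-9][0-9]{2}|[1-9][0-9]{3})"
STRICT_SSN_PATTERN = f"{AREA}-{GROUP}-{SERIAL}"


def is_plausible_ssn(value: str) -> bool:
    # fullmatch mirrors the implicit anchoring of XSD pattern facets.
    return re.fullmatch(STRICT_SSN_PATTERN, value) is not None


print(STRICT_SSN_PATTERN)
```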



In addition to the unparse issue you've found, another reason to use a
string instead of a number is that some SSNs start with zero. Leading
zeros do not appear in the infoset for numbers, so these SSNs would look
like they don't have the required 9 digits in the infoset. That's not a
huge deal, but it could be confusing.
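
The leading-zero problem shows up with any integer conversion. This Python
sketch (an illustration, not Daffodil itself) uses a made-up SSN with the
hyphens removed:

```python
ssn_text = "012345678"       # made-up 9-digit SSN (hyphens removed) starting with zero

as_number = int(ssn_text)    # a numeric value has no notion of leading zeros
print(as_number)             # 12345678
print(len(str(as_number)))   # 8 (the leading zero is gone)

# Getting the original back requires knowing the fixed width out of band.
restored = str(as_number).zfill(9)
print(restored)              # 012345678
```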



Although SSNs kind of look like numbers, it's probably best not to treat
them that way.



If you really don't want the hyphens, you could model an SSN as its
individual parts, e.g. three elements with lengths 3, 2, and 4 for the
separate parts, with an infix hyphen separator, but I'm not sure that
really gains much except added complexity. And that still has the
leading zero issue, so the fields probably still want to be strings anyway.
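
The parse/unparse round trip such a three-part model performs can be sketched
in Python. This only illustrates the behavior and is not Daffodil code; the
names SSNParts, parse_ssn, and unparse_ssn are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical fixed lengths and infix separator mirroring the three-element model.
LENGTHS = (3, 2, 4)
SEPARATOR = "-"
ASCII_DIGITS = set("0123456789")


@dataclass
class SSNParts:
    # Each part stays a string so leading zeros survive parse and unparse.
    area: str    # 3 characters
    group: str   # 2 characters
    serial: str  # 4 characters


def parse_ssn(text: str) -> SSNParts:
    """Split on the infix separator and check each fixed-length part."""
    parts = text.split(SEPARATOR)
    if len(parts) != len(LENGTHS):
        raise ValueError(f"expected {len(LENGTHS)} parts, got {len(parts)}: {text!r}")
    for part, length in zip(parts, LENGTHS):
        if len(part) != length or not set(part) <= ASCII_DIGITS:
            raise ValueError(f"bad part {part!r}: expected {length} digits")
    return SSNParts(*parts)


def unparse_ssn(ssn: SSNParts) -> str:
    """An infix separator goes between parts, never before the first or after the last."""
    return SEPARATOR.join([ssn.area, ssn.group, ssn.serial])


parts = parse_ssn("012-34-5678")
print(parts)               # SSNParts(area='012', group='34', serial='5678')
print(unparse_ssn(parts))  # 012-34-5678
```

Keeping every part as a string is what preserves the leading zero in "012".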

On 2023-11-26 05:27 AM, Roger L Costello wrote:
> It appears the answer is: No.
>
> You might think that the SSN pattern could be expressed this way:
>
> dfdl:textNumberPattern="###,##,####"
> dfdl:textStandardGroupingSeparator="-"
>
> That is, an SSN is a group of 3 digits followed by a group of 2 digits
> followed by a group of 4 digits, where the groups are separated by dashes.
>
> However, that doesn't work. Here's why.
>
> From the DFDL specification:
> -----------------------------------------------------
> If a dfdl:textNumberPattern contains multiple grouping separators, the
> interval between the last one and the end of the integer defines the primary
> grouping size, and the interval between the last two defines the secondary
> grouping size. All others are ignored, so "###,##,####" == "##,##,####"
> -----------------------------------------------------
>
> Thus, this SSN:
>
> 123-45-6789
>
> is parsed to this:
>
> <SSN>123456789</SSN>
>
> and unparsed to this:
>
> 1-23-45-6789
>
> Eek! The parse is what is desired; the unparse is not.
>
> Here's the DFDL that I used:
>
> <xs:element name="ssn" type="xs:unsignedInt"
>      dfdl:textNumberRep="standard"
>      dfdl:textNumberCheckPolicy="lax"
>      dfdl:textNumberPattern="###,##,####"
>      dfdl:textStandardGroupingSeparator="-"
>      dfdl:textStandardDecimalSeparator="."
>      dfdl:textStandardBase="10"
>      dfdl:textNumberRounding="pattern"
> />
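
The primary/secondary grouping rule in the quoted spec excerpt can be simulated
in a few lines of Python, which reproduces the 1-23-45-6789 unparse result. This
is a hypothetical sketch of the rule as quoted, not ICU's or Daffodil's actual
implementation, and it only handles integer patterns made of '#' and ','.

```python
def grouping_sizes(pattern: str) -> tuple[int, int]:
    """Return (primary, secondary) grouping sizes for an integer pattern such as
    '###,##,####'. Only the last two grouping separators matter; with a single
    separator, the secondary size equals the primary size."""
    groups = pattern.split(",")
    if len(groups) < 2:
        raise ValueError("pattern has no grouping separator")
    primary = len(groups[-1])
    secondary = len(groups[-2]) if len(groups) > 2 else primary
    if primary == 0 or secondary == 0:
        raise ValueError("empty grouping interval")
    return primary, secondary


def group_digits(digits: str, pattern: str, sep: str) -> str:
    """Unparse side: the rightmost group gets the primary size and every
    group to its left gets the secondary size."""
    primary, secondary = grouping_sizes(pattern)
    groups = [digits[-primary:]]
    rest = digits[:-primary]
    while rest:
        groups.append(rest[-secondary:])
        rest = rest[:-secondary]
    return sep.join(reversed(groups))


def strip_grouping(text: str, sep: str) -> str:
    """Parse side (lax): grouping separators are simply removed."""
    return text.replace(sep, "")


ssn = strip_grouping("123-45-6789", "-")
print(ssn)                                    # 123456789
print(grouping_sizes("###,##,####"))          # (4, 2)
print(grouping_sizes("##,##,####"))           # (4, 2)
print(group_digits(ssn, "###,##,####", "-"))  # 1-23-45-6789
```

Because the leading "###" group is ignored, the pattern cannot describe a
3-2-4 layout: everything left of the last four digits is grouped in twos.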

