SSNs are probably best represented as an xs:string, with a pattern facet to validate that they are in the correct format, e.g.

  <element name="SSN" dfdl:length="11" ...>
    <simpleType>
      <restriction base="xs:string">
        <pattern value="\d{3}-\d{2}-\d{4}" />
      </restriction>
    </simpleType>
  </element>

It also looks like there are a number of resources online with more complex regexes for more accurate SSN validation, if needed.
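As one example of a stricter pattern, here is a sketch of a drop-in replacement for the restriction above. It assumes the commonly cited rule that the SSA never issues area numbers 000, 666, or 900-999, group number 00, or serial number 0000. It uses [0-9] rather than \d because \d in XML Schema regexes matches any Unicode digit, not just ASCII 0-9:

```xml
<restriction base="xs:string">
  <!-- Area 001-899 except 666, then group 01-99, then serial 0001-9999.
       XML Schema patterns are implicitly anchored to the whole value. -->
  <pattern value="(00[1-9]|0[1-9][0-9]|[1-5][0-9]{2}|6[0-57-9][0-9]|66[0-57-9]|[78][0-9]{2})-(0[1-9]|[1-9][0-9])-(000[1-9]|00[1-9][0-9]|0[1-9][0-9]{2}|[1-9][0-9]{3})" />
</restriction>
```

This rejects values like 000-12-3456 or 666-12-3456, but it still accepts many numbers that were never actually issued, so it is only a format check.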

In addition to the unparse issue you've found, another reason to use a string instead of a number is that some SSNs start with zero. Leading zeros do not appear in the infoset for numbers, so these SSNs would appear to have fewer than the required 9 digits in the infoset. That's not a huge deal, but it could be confusing.
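To make the leading-zero point concrete, this is roughly what the infoset would contain for the input 012-34-5678 under each modeling choice (an illustrative fragment, not actual tool output):

```xml
<!-- Modeled as xs:string: all 11 characters, including the leading zero, are kept -->
<SSN>012-34-5678</SSN>

<!-- Modeled as xs:unsignedInt (e.g. with the textNumberPattern approach quoted below):
     the value is a number, so the leading zero disappears and only 8 digits remain -->
<SSN>12345678</SSN>
```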

Although SSNs kind of look like numbers, it's probably best not to treat them that way.

If you really don't want the hyphens in the infoset, you could model an SSN as its individual parts, e.g. three elements with lengths 3, 2, and 4 in a sequence with an infix hyphen separator, but I'm not sure that really gains much except added complexity. That approach still has the leading-zero issue, so the fields probably still want to be strings anyway.
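A minimal sketch of that split model, assuming the remaining DFDL properties (encoding, separator suppression, and so on) come from the schema's dfdl:format defaults. The element names area, group, and serial are hypothetical, chosen just for illustration:

```xml
<element name="SSN" dfdl:lengthKind="implicit">
  <complexType>
    <!-- The hyphen appears only between the parts (infix), so it is
         consumed on parse and written back out on unparse. -->
    <sequence dfdl:separator="-" dfdl:separatorPosition="infix">
      <!-- Hypothetical part names; strings keep any leading zeros. -->
      <element name="area" type="xs:string"
               dfdl:lengthKind="explicit" dfdl:length="3" />
      <element name="group" type="xs:string"
               dfdl:lengthKind="explicit" dfdl:length="2" />
      <element name="serial" type="xs:string"
               dfdl:lengthKind="explicit" dfdl:length="4" />
    </sequence>
  </complexType>
</element>
```

For 012-34-5678 this would give an infoset along the lines of `<SSN><area>012</area><group>34</group><serial>5678</serial></SSN>`. The leading zero survives only because the parts are strings, and each part would still need its own pattern facet to check that it is all digits.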



On 2023-11-26 05:27 AM, Roger L Costello wrote:
It appears the answer is: No.

You might think that the SSN pattern could be expressed this way:

dfdl:textNumberPattern="###,##,####"
dfdl:textStandardGroupingSeparator="-"

That is, an SSN is a group of 3 digits followed by a group of 2 digits followed
by a group of 4 digits, where the groups are separated by dashes.

However, that doesn't work. Here's why.

From the DFDL specification:
-----------------------------------------------------
If a dfdl:textNumberPattern contains multiple grouping separators, the interval between the last 
one and the end of the integer defines the primary grouping size, and the interval between the last 
two defines the secondary grouping size. All others are ignored, so "###,##,####" == 
"##,##,####"
-----------------------------------------------------

Thus, this SSN:

123-45-6789

is parsed to this:

<SSN>123456789</SSN>

and unparsed to this:

1-23-45-6789

Eek! The parse is what is desired; the unparse is not.

Here's the DFDL that I used:

<xs:element name="ssn" type="xs:unsignedInt"
     dfdl:textNumberRep="standard"
     dfdl:textNumberCheckPolicy="lax"
     dfdl:textNumberPattern="###,##,####"
     dfdl:textStandardGroupingSeparator="-"
     dfdl:textStandardDecimalSeparator="."
     dfdl:textStandardBase="10"
     dfdl:textNumberRounding="pattern"
/>
