SSNs are probably best represented as an xs:string, with a pattern
facet to validate that they are in the correct format, e.g.
  <xs:element name="SSN" dfdl:length="11" ...>
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="\d{3}-\d{2}-\d{4}" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
There also appear to be a number of resources online with more complex
regexes for more accurate SSN validation, if needed.
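As a rough sketch of what a stricter pattern might look like, the
restriction below assumes the commonly cited rules that the area number
is never 000, 666, or 900-999, the group number is never 00, and the
serial number is never 0000. Those rules are an assumption on my part,
not something DFDL defines, so check them against your own requirements.
XSD regexes have no lookahead, so the exclusions are spelled out as
alternations, and the value must stay on one line because whitespace
inside a pattern is significant:

```xml
<xs:simpleType>
  <xs:restriction base="xs:string">
    <!-- area:   001-665, 667-899 (assumed rule: not 000, 666, or 9xx) -->
    <!-- group:  01-99            (assumed rule: not 00)               -->
    <!-- serial: 0001-9999        (assumed rule: not 0000)             -->
    <xs:pattern value="(00[1-9]|0[1-9]\d|[1-578]\d{2}|6[0-57-9]\d|66[0-57-9])-(0[1-9]|[1-9]\d)-(000[1-9]|00[1-9]\d|0[1-9]\d{2}|[1-9]\d{3})" />
  </xs:restriction>
</xs:simpleType>
```

Like the simpler pattern above, this is only checked when validation is
enabled; it does not change how the 11 characters are parsed.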
In addition to the unparse issue you've found, another reason to use a
string instead of a number is that some SSNs start with zero. Leading
zeros do not appear in the infoset for numbers, so these SSNs would look
like they don't have the required 9 digits in the infoset. That's not a
huge deal, but it could be confusing.
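To illustrate the leading zero issue, here is roughly what I would
expect the two infosets to look like for a made-up SSN of 012-34-5678.
The input value is hypothetical, and the first fragment assumes the
xs:unsignedInt schema from the quoted message:

```xml
<!-- Hypothetical input data: 012-34-5678 -->

<!-- Modeled as xs:unsignedInt: the leading zero is dropped, leaving 8 digits -->
<SSN>12345678</SSN>

<!-- Modeled as xs:string with dfdl:length="11": the text is kept as-is -->
<SSN>012-34-5678</SSN>
```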
Although SSNs kind of look like numbers, it's probably best not to treat
them that way.
If you really don't want the hyphens, you could model an SSN as its
individual parts, e.g. three elements with lengths 3, 2, and 4, all in a
sequence with an infix hyphen separator, but I'm not sure that really
gains much except added complexity. And that still has the leading zero
issue, so the fields probably still want to be strings anyway.
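A minimal sketch of that three-part model might look like the following.
The child element names (area, group, serial) are made up for
illustration, and it assumes the remaining DFDL properties (encoding,
lengthUnits, and so on) come from a default format defined elsewhere in
the schema:

```xml
<xs:element name="SSN">
  <xs:complexType>
    <!-- The hyphen is consumed as an infix separator, so it never appears in the infoset -->
    <xs:sequence dfdl:separator="-" dfdl:separatorPosition="infix">
      <!-- Strings rather than numbers, so leading zeros are preserved -->
      <xs:element name="area" type="xs:string"
                  dfdl:lengthKind="explicit" dfdl:length="3" />
      <xs:element name="group" type="xs:string"
                  dfdl:lengthKind="explicit" dfdl:length="2" />
      <xs:element name="serial" type="xs:string"
                  dfdl:lengthKind="explicit" dfdl:length="4" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

On unparse the separator is written back between the three children, so
the hyphens return in the data even though they are absent from the
infoset.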
On 2023-11-26 05:27 AM, Roger L Costello wrote:
It appears the answer is: No.
You might think that the SSN pattern could be expressed this way:
dfdl:textNumberPattern="###,##,####"
dfdl:textStandardGroupingSeparator="-"
That is, an SSN is a group of 3 digits followed by a group of 2 digits followed
by a group of 4 digits, where the groups are separated by dashes.
However, that doesn't work. Here's why.
From the DFDL specification:
-----------------------------------------------------
If a dfdl:textNumberPattern contains multiple grouping separators, the interval between the last
one and the end of the integer defines the primary grouping size, and the interval between the last
two defines the secondary grouping size. All others are ignored, so "###,##,####" ==
"##,##,####"
-----------------------------------------------------
Thus, this SSN:
123-45-6789
is parsed to this:
<SSN>123456789</SSN>
and unparsed to this:
1-23-45-6789
Eek! The parse is what is desired, unparse is not.
Here's the DFDL that I used:
  <xs:element name="ssn" type="xs:unsignedInt"
      dfdl:textNumberRep="standard"
      dfdl:textNumberCheckPolicy="lax"
      dfdl:textNumberPattern="###,##,####"
      dfdl:textStandardGroupingSeparator="-"
      dfdl:textStandardDecimalSeparator="."
      dfdl:textStandardBase="10"
      dfdl:textNumberRounding="pattern"
  />