You said the length is 100, so that is what dfdl:length should be, with
dfdl:lengthKind 'explicit'.

What about using your regex but via a pattern facet?

<element name="Foo" dfdl:lengthKind='explicit' dfdl:length='100'>
  <simpleType>
    <restriction base="xs:string">
       <pattern value="[A-Z]{100}|[ ]*-[ ]*"/>
    </restriction>
  </simpleType>
</element>

You should also be able to trim the padding spaces, so that the string's
actual value is either 100 characters of A-Z or a single "-" character.

Note that in this case your regex is simpler. The two "[ ]*" are gone
because the spaces will be trimmed from both ends of the string.

<element name="Foo" dfdl:lengthKind='explicit' dfdl:length='100'
   dfdl:textTrimKind='padChar'
   dfdl:textStringPadCharacter='%SP;'
   dfdl:textPadKind='padChar'
   dfdl:textStringJustification="center">
  <simpleType>
    <restriction base="xs:string">
       <pattern value="[A-Z]{100}|-"/>
    </restriction>
  </simpleType>
</element>

I did not run this DFDL, but this sort of thing is typical of fixed length
data.
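
To sanity-check the trim-then-validate idea outside of any DFDL processor,
here is a minimal Python sketch. It is not Daffodil and does not model DFDL
precisely; the names FIELD_WIDTH, FOO_PATTERN and parse_foo are hypothetical,
made up for this illustration. It assumes the field is exactly 100 characters,
that spaces are trimmed from both ends, and that the trimmed value must match
the whole pattern (XSD pattern facets are implicitly anchored).

```python
import re

# Hypothetical names for illustration only; not part of any DFDL library.
FIELD_WIDTH = 100
FOO_PATTERN = re.compile(r"[A-Z]{100}|-")


def parse_foo(raw: str) -> str:
    """Take a fixed-width field, trim the space padding, check the pattern."""
    if len(raw) != FIELD_WIDTH:
        raise ValueError(f"expected {FIELD_WIDTH} characters, got {len(raw)}")
    # Trim pad characters (spaces) from both ends, as with center justification.
    value = raw.strip(" ")
    # fullmatch mirrors the implicit anchoring of an XSD pattern facet.
    if FOO_PATTERN.fullmatch(value) is None:
        raise ValueError(f"value {value!r} does not match the pattern")
    return value


if __name__ == "__main__":
    # A hyphen anywhere in the field trims down to a single "-".
    print(parse_foo(" " * 40 + "-" + " " * 59))
    # A full field of uppercase letters is returned unchanged.
    print(len(parse_foo("A" * 100)))
```

The point of the sketch is only that, once the length is fixed at 100 and the
padding is trimmed, the regex no longer has to account for where the hyphen
sits in the field.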

On Thu, Jul 28, 2022 at 8:52 AM Roger L Costello <coste...@mitre.org> wrote:

> Hi Folks,
>
> The text data format that I am writing a DFDL schema for has a field
> (let's name it "Foo") with a fixed width. Let's say the width is 100
> characters. The content of the field is uppercase letters. If there is no
> data available to populate the field, it must be populated with a single
> hyphen (surrounded by spaces to ensure the field has a width of 100). The
> hyphen may be in any position within the field. For reasons I will not
> share, I must specify the field's content using a regex:
>
> lengthKind=pattern
> lengthPattern=[A-Z]{100}
>
> However, that lengthPattern doesn't take into account the hyphen that is
> needed when there is no data. So I updated the regex like this:
>
> lengthPattern=[A-Z]{100}|[ ]*-[ ]*
>
> However, the right-hand side of that regex (which deals with the hyphen)
> doesn't constrain the length of the field. Recall the hyphen may be
> positioned anywhere within the 100 character field. Writing a regex that
> specifies all possible positions of the hyphen, while ensuring the field is
> 100 characters, is not reasonable.
>
> So it would seem that I need to specify length=100 on the element
> declaration:
>
> lengthKind=explicit
> length=100
>
> But now I have conflicting requirements:
>
> 1. The element declaration needs to specify lengthKind=pattern for the
> regex
>
> 2. The element declaration needs to specify lengthKind=explicit for the
> field length
>
> That's a problem. That's not legal.
>
> In other words, I need this illegal DFDL:
>
> <xs:element name="Foo"
>         nillable="true"
>         dfdl:nilValue="-"
>         dfdl:lengthKind="explicit"
>         dfdl:length="100"
>         dfdl:lengthUnits="characters"
>         dfdl:lengthKind="pattern"
>         dfdl:lengthPattern="[A-Z]{100}|[ ]*-[ ]*">
>    <xs:simpleType>
>         <xs:annotation>
>             <xs:appinfo source="http://www.ogf.org/dfdl/">
>                 <dfdl:assert test="{ (fn:nilled(.)) or (. ne '') }"/>
>             </xs:appinfo>
>         </xs:annotation>
>         <xs:restriction base="xs:string"/>
>     </xs:simpleType>
> </xs:element>
>
> Is there a solution to this problem? If not, is there a workaround?
>
> /Roger
>
