I'm not sure I love this possible solution, and it doesn't scale very well, but what about something like this:

  <element name="field" type="xs:string" nillable="true"
    dfdl:lengthKind="explicit"
    dfdl:length="3"
    dfdl:textStringJustification="left"
    dfdl:textTrimKind="padChar"
    dfdl:textPadKind="padChar"
    dfdl:textStringPadCharacter="%SP;"
    dfdl:nilKind="literalValue"
    dfdl:nilValue="- %SP;- %SP;%SP;-" />

So the field is left-justified and right-padded with spaces. Leading spaces are not trimmed, so a field like " A " will show up in the infoset with the leading space and fail validation. The nilValue lists every placement of the nil character within the field: the hyphen preceded by zero, one, or two spaces (trailing spaces are trimmed as padding).
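
To make that concrete, here is a rough, untested TDML sketch in the style of Mike's tests quoted below. The test name, the model name "s", and the ex prefix are assumptions on my part, and I'm guessing twoPass is needed because unparsing should write the first nilValue ("-") and then pad on the right, which would not reproduce the original " - ":

    <!-- Untested sketch; name, model="s", and the ex prefix are assumed. -->
    <parserTestCase name="field_nil_centered" root="field" model="s" roundTrip="twoPass">
      <document> - </document>
      <infoset>
        <dfdlInfoset>
          <ex:field xsi:nil="true"/>
        </dfdlInfoset>
      </infoset>
    </parserTestCase>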

Like I said, this doesn't scale because you need N nilValues for a string of length N. And it doesn't scale at all for delimited-length fields where you don't know the length of the field, unless you just add a bunch of nilValues up to some size.
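
For example, assuming the same properties as above on a 5-character field (the name "field5" is made up for illustration, and I haven't run this), the list would grow to five entries, one per possible hyphen position:

  <!-- Untested sketch: same approach as above, widened to length 5. -->
  <element name="field5" type="xs:string" nillable="true"
    dfdl:lengthKind="explicit"
    dfdl:length="5"
    dfdl:textStringJustification="left"
    dfdl:textTrimKind="padChar"
    dfdl:textPadKind="padChar"
    dfdl:textStringPadCharacter="%SP;"
    dfdl:nilKind="literalValue"
    dfdl:nilValue="- %SP;- %SP;%SP;- %SP;%SP;%SP;- %SP;%SP;%SP;%SP;-" />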

If we had something like %SP*; (similar to how we have %WSP*;), then the nilValue could just be "%SP*;-" and this would scale without issue, and work for both fixed-length and delimited-length fields. I believe %SP*; has come up in the past, so this might be another argument to add it.
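
Purely as a sketch of what that might look like, and to be clear %SP*; is hypothetical here (it is not something you can use today, so this is not expected to compile):

  <!-- Hypothetical: %SP*; is a proposed entity, not an existing one. -->
  <element name="field" type="xs:string" nillable="true"
    dfdl:lengthKind="explicit"
    dfdl:length="3"
    dfdl:textStringJustification="left"
    dfdl:textTrimKind="padChar"
    dfdl:textPadKind="padChar"
    dfdl:textStringPadCharacter="%SP;"
    dfdl:nilKind="literalValue"
    dfdl:nilValue="%SP*;-" />

The only change from the schema above is the single nilValue entry, which would no longer depend on the field length.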


On 8/10/22 7:54 AM, Roger L Costello wrote:
Thanks Mike. I implemented your approach. It fails to detect invalid input. Let
me explain.

Input specifications:

   * Fixed length field (3)
   * Nillable, hyphen is the nil value, the hyphen may be anywhere within the 3
     character field
   * Values must be left-justified

Here are examples of valid inputs:

…/AB /…

…/ABC/…

…/-  /…

.../ - /…

…/  -/…

Your solution permits this input (I tested it, Daffodil gives no error or 
warning):

…/ AB/…

Notice that the value is right-justified. That is invalid.

/Roger


*From:* Mike Beckerle <mbecke...@apache.org>
*Sent:* Monday, August 8, 2022 3:58 PM
*To:* users@daffodil.apache.org
*Subject:* [EXT] Re: Conflicting requirements: fixed length field, nillable,
some enumeration values shorter than the required length

So I think your requirements are this:

* fixed length 5

* the hyphen nil indicator may have spaces around it

* canonical form is left justified for "-" or any value.

This is the best I could do. I had to surround the nillable element with another
element so as to get left-justification by way of filling of the unused region
of a complex type, with fillByte which is %SP;.

If you want center justified hyphens for the nil case and left-justified strings
for the value case, then I think it's not possible to model this without using
separate elements for the nil and value. (That solution not shown here.)

<element name="Foo"
  dfdl:length="5"
  dfdl:lengthKind="explicit"
  dfdl:terminator="/"
  dfdl:fillByte="%SP;">
  <!--
    The above achieves canonical unparse
    as left-justified fixed length because
    the fillByte will be used to fill unused
    space on the right.

    This only works for fixed length left-justified data.
    If this was right-justified, this trick would not work.
    -->
  <complexType>
    <sequence>
      <!--
      The below achieves trimming of spaces either side,
      but only when parsing. Nothing is added when unparsing.
      -->
      <element name="value" nillable="true"
        dfdl:nilValue="-"
        dfdl:lengthKind="delimited"
        dfdl:textStringJustification="center"
        dfdl:textTrimKind="padChar"
        dfdl:textPadKind="none">
        <simpleType>
          <restriction base="xs:string">
            <enumeration value="AB"/>
            <enumeration value="ABC"/>
          </restriction>
        </simpleType>
      </element>
    </sequence>
  </complexType>
</element>

The TDML file I created for this has these tests in it showing that this works:

    <parserTestCase name="foo1" root="Foo" model="s" roundTrip="onePass">
      <document>-    /</document>
      <infoset>
        <dfdlInfoset>
          <ex:Foo xmlns=""><value xsi:nil="true"/></ex:Foo>
        </dfdlInfoset>
      </infoset>
    </parserTestCase>

    <parserTestCase name="foo2" root="Foo" model="s" roundTrip="twoPass">
      <document> -   /</document>
      <infoset>
        <dfdlInfoset>
          <ex:Foo xmlns=""><value xsi:nil="true"/></ex:Foo>
        </dfdlInfoset>
      </infoset>
    </parserTestCase>

    <parserTestCase name="foo3" root="Foo" model="s" roundTrip="twoPass">
      <document> AB  /</document>
      <infoset>
        <dfdlInfoset>
          <ex:Foo xmlns=""><value>AB</value></ex:Foo>
        </dfdlInfoset>
      </infoset>
    </parserTestCase>

    <parserTestCase name="foo4" root="Foo" model="s" roundTrip="onePass">
      <document>AB   /</document>
      <infoset>
        <dfdlInfoset>
          <ex:Foo xmlns=""><value>AB</value></ex:Foo>
        </dfdlInfoset>
      </infoset>
    </parserTestCase>

On Mon, Aug 8, 2022 at 10:22 AM Roger L Costello <coste...@mitre.org
<mailto:coste...@mitre.org>> wrote:

     Hi Mike,

     I gave your suggested approach a try. It failed.

     With this input:

     …/AB /…

     it works.

     With this input:

     …/ - /…

     it fails, producing this error:

     [error] Validation Error: Foo failed facet checks due to: facet
     enumeration(s): AB|ABC

     Further, even if the approach were to work with this example where the
     field length is 3, it would be an untenable approach for longer fixed
     fields. For example, if the field length was 10, then the nilValue would
     need something like 10-factorial whitespace-separated values.

     Do you have another suggested approach?

     /Roger

     *From:* Mike Beckerle <mbecke...@apache.org <mailto:mbecke...@apache.org>>
     *Sent:* Monday, August 8, 2022 9:38 AM
     *To:* users@daffodil.apache.org <mailto:users@daffodil.apache.org>
     *Subject:* [EXT] Re: Conflicting requirements: fixed length field,
     nillable, some enumeration values shorter than the required length

     I would try making the nilValue "%SP;-%SP; -". That is two separate
     possibilities for nilValue, one is space-hyphen-space, the other just
     hyphen. (It's a whitespace-separated list of nil values tokens.)

     The first one will be used for unparsing. Both will be tried for parsing.

     That along with justification left might work.

     On Mon, Aug 8, 2022 at 8:01 AM Roger L Costello <coste...@mitre.org
     <mailto:coste...@mitre.org>> wrote:

         Hi Folks,

         I have an input field that is fixed length (3). If there is no data,
         the field is to be populated with a hyphen (of course, it must be
         padded with spaces to the required length). The schema has a
         simpleType with enumeration facets. Some enumeration values are less
         than the required length.

         Here's how I specify the field:

         <xs:element name="Foo"
              nillable="true"
              dfdl:nilKind="literalValue"
              dfdl:nilValue="-"
              dfdl:lengthKind="explicit"
              dfdl:length="3"
              dfdl:textTrimKind="padChar"
              dfdl:textPadKind="padChar"
              dfdl:textStringPadCharacter="%SP;"
              dfdl:textStringJustification="center">
              <xs:simpleType>
                  <xs:restriction base="xs:string">
                      <xs:enumeration value="AB"/>
                      <xs:enumeration value="ABC"/>
                  </xs:restriction>
              </xs:simpleType>
         </xs:element>

         Notice dfdl:textStringJustification="center" which is fine for the
         nillable value (hyphen) but not for a regular value such as AB which
         should be left justified. As the schema is, the input could contain
         this (assume slash separators):

         .../ AB/...

         which is incorrect.

         So, there are conflicting requirements: the nillable value needs
         dfdl:textStringJustification="center" whereas the normal values need
         dfdl:textStringJustification="left". What to do about this?

         /Roger

