Ok, if you need to use lengthKind="pattern" you can do the following:

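<xs:element name="RunwayIdentifier"
    dfdl:lengthKind="pattern"
    dfdl:lengthPattern="[0-9]{2,2}(L|R){0,1}"
    dfdl:textPadKind="padChar"
    dfdl:textStringJustification="left"
    dfdl:textStringPadCharacter="%SP;">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <!-- minLength/maxLength are facets, not attributes of xs:element -->
      <xs:minLength value="3" />
      <xs:maxLength value="3" />
    </xs:restriction>
  </xs:simpleType>
</xs:element>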

Note that I dropped the optional space from the regex. This should produce the
result you are looking for. According to the spec, when unparsing, DFDL will pad
the output to the given xs:minLength if necessary.

Josh Adams
________________________________
From: Roger L Costello <coste...@mitre.org>
Sent: Thursday, July 29, 2021 3:21 PM
To: users@daffodil.apache.org <users@daffodil.apache.org>
Subject: Re: I am confused about what Daffodil does, in a fixed field, with the 
data following the data that matches a regex


Hello Josh,

No doubt you are correct that a better solution is to do as you describe.

However, for reasons that I cannot reveal, I must use regexes. Thus my question.

/Roger



From: Adams, Joshua <jad...@owlcyberdefense.com>
Sent: Thursday, July 29, 2021 3:14 PM
To: users@daffodil.apache.org
Subject: [EXT] Re: I am confused about what Daffodil does, in a fixed field, 
with the data following the data that matches a regex



Hello Roger,

In this case, where there are always going to be exactly 3 characters, your best
bet would be to use lengthKind="explicit" and specify the appropriate padding:



<xs:element name="RunwayIdentifier" type="xs:string"
    dfdl:lengthKind="explicit"
    dfdl:length="3"
    dfdl:textTrimKind="padChar"
    dfdl:textPadKind="padChar"
    dfdl:textStringJustification="left"
    dfdl:textStringPadCharacter="%SP;" />



When parsing, DFDL will generate an infoset without the extra space, but when 
DFDL unparses it will know to add in the space if necessary.
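
To make the round trip concrete, here is a minimal sketch in Python rather than DFDL. It only mimics the trim-on-parse and pad-on-unparse behaviour for a 3-character, left-justified, space-padded field. The names parse_field and unparse_field are hypothetical, and this is not a description of how Daffodil is implemented.

```python
FIELD_LENGTH = 3   # the field is exactly 3 characters wide
PAD_CHAR = " "     # assumed pad character (%SP; in the schema)

def parse_field(raw: str) -> str:
    """Take one fixed-width field and drop trailing pad characters."""
    if len(raw) != FIELD_LENGTH:
        raise ValueError(f"expected {FIELD_LENGTH} characters, got {len(raw)}")
    return raw.rstrip(PAD_CHAR)

def unparse_field(value: str) -> str:
    """Left-justify the value and pad it back out to the fixed width."""
    if len(value) > FIELD_LENGTH:
        raise ValueError(f"value longer than {FIELD_LENGTH} characters")
    return value.ljust(FIELD_LENGTH, PAD_CHAR)

# Show raw field -> infoset value -> re-serialized field.
for raw in ("23L", "23 "):
    value = parse_field(raw)
    print(repr(raw), "->", repr(value), "->", repr(unparse_field(value)))
```

The point of the sketch is that the space is not lost: it is simply not part of the logical value, and it is regenerated from the fixed width when writing the data back out.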



If you want to further validate the data, you could add an assert to validate 
against the pattern:



<xs:element name="RunwayIdentifier" type="xs:string"
    dfdl:lengthKind="explicit"
    dfdl:length="3"
    dfdl:textTrimKind="padChar"
    dfdl:textPadKind="padChar"
    dfdl:textStringJustification="left"
    dfdl:textStringPadCharacter="%SP;">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:assert testKind="pattern" testPattern="[0-9]{2,2}(L|R){0,1}[ ]{0,1}" />
    </xs:appinfo>
  </xs:annotation>
</xs:element>
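
As a rough illustration of what such a check accepts, the following Python sketch applies the intended pattern to a few 3-character fields. The helper name is_valid_runway_field is hypothetical, and requiring a full match of the whole field is my assumption. As I understand the spec, a DFDL pattern assert only requires a non-empty match starting at the element's position, so it may accept inputs this sketch rejects (for example "23X").

```python
import re

# The pattern the assert is meant to apply: two digits, optional L or R,
# optional trailing space.
RUNWAY_PATTERN = re.compile(r"[0-9]{2,2}(L|R){0,1}[ ]{0,1}")

def is_valid_runway_field(raw: str) -> bool:
    """True if the whole 3-character field matches the pattern."""
    return len(raw) == 3 and RUNWAY_PATTERN.fullmatch(raw) is not None

# Print each candidate field alongside the result of the check.
for raw in ("23L", "23 ", "2AL", "23X"):
    print(repr(raw), is_valid_runway_field(raw))
```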



Josh Adams

________________________________

From: Roger L Costello <coste...@mitre.org>
Sent: Thursday, July 29, 2021 2:31 PM
To: users@daffodil.apache.org <users@daffodil.apache.org>
Subject: I am confused about what Daffodil does, in a fixed field, with the 
data following the data that matches a regex



Hi Folks,

I have a data item that identifies an airport runway, e.g. 23L (runway 23 left).

The data format specifies that the length of the field for the data item is 
exactly 3 characters.

A runway identifier does not need to have the indication of whether it is left 
or right; it's acceptable to just provide a two digit identifier, e.g., 23. So, 
L (and R) is optional.

Here's the DFDL Schema for the data item:

<xs:element name="RunwayIdentifier" type="xs:string"
    dfdl:lengthKind="pattern"
    dfdl:lengthPattern="[0-9]{2,2}(L|R){0,1}[ ]{0,1}"/>

Notice that the end of the regex says that there is an optional space (which is 
needed in the event that only the two digit identifier is used, without L or R).

If this is the input:

/23L/

(Slashes separate the data format's fields.)

Then this is the generated XML:

<RunwayIdentifier>23L</RunwayIdentifier>

If this is the input:

/23 /

(One space after the two digit runway identifier.)

Then this is the generated XML:

<RunwayIdentifier>23 </RunwayIdentifier>

Notice the space after the two digit runway identifier.

Interestingly, I have found that in the regex I can omit [ ]{0,1}

If the schema has this:

<xs:element name="RunwayIdentifier" type="xs:string"
    dfdl:lengthKind="pattern"
    dfdl:lengthPattern="[0-9]{2,2}(L|R){0,1}"/>

Notice that I omitted [ ]{0,1}

Then with this input:

/23 /

(One space after the two digit runway identifier.)

This is the generated XML:

<RunwayIdentifier>23</RunwayIdentifier>

Notice that there is no space after the two digit runway identifier.

Observe the different outputs:

<RunwayIdentifier>23 </RunwayIdentifier>
<RunwayIdentifier>23</RunwayIdentifier>

I get the former when the regex specifies [ ]{0,1} and I get the latter when I 
omit it.
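
The difference between the two patterns can be reproduced with any regex engine. Here is a small sketch using Python's re module (not Daffodil, which runs on the JVM), on the assumption that these simple patterns behave the same way there. The helper name matched_text is hypothetical.

```python
import re

WITH_SPACE = re.compile(r"[0-9]{2,2}(L|R){0,1}[ ]{0,1}")
WITHOUT_SPACE = re.compile(r"[0-9]{2,2}(L|R){0,1}")

def matched_text(pattern, data: str) -> str:
    """Return the text the pattern matches at the start of data."""
    m = pattern.match(data)
    return m.group(0) if m else ""

# Field content followed by the "/" that separates fields.
for data in ("23L/", "23 /"):
    print(repr(data),
          repr(matched_text(WITH_SPACE, data)),
          repr(matched_text(WITHOUT_SPACE, data)))
```

For "23 /" the first pattern matches three characters (including the space) and the second matches only two, which lines up with the two XML outputs above.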

I don't understand what's happening here. The field has a fixed length - 3 
characters. In the second case, the regex specifies two digits followed 
optionally by L or R and it does not specify an optional space. But the input 
has a space. Apparently Daffodil gobbles up the input per the regex pattern 
(i.e., it gobbles up 23) and then it does what with the space? Discard it? How 
can Daffodil simply discard data? I'm confused.

/Roger
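
One way to frame the question is to look at what a regex match leaves behind. The sketch below uses Python's re module, not Daffodil, and the helper name split_at_match is hypothetical. It shows only that the match for the shorter pattern ends before the space, so the space is still in the input afterwards. It does not show what Daffodil then does with that space, which depends on the rest of the schema.

```python
import re

PATTERN = re.compile(r"[0-9]{2,2}(L|R){0,1}")

def split_at_match(data: str):
    """Return (matched text, remaining unconsumed text)."""
    m = PATTERN.match(data)
    if m is None:
        return "", data
    return data[:m.end()], data[m.end():]

# The field content "23 " followed by the "/" field separator.
matched, remaining = split_at_match("23 /")
print(repr(matched), repr(remaining))
```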
