Re: How to describe a data format with fields that are fixed length unless there are no following fields then they are variable length?

Mike Beckerle Sat, 30 Sep 2023 07:18:40 -0700

What you want here is lookahead.

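In DFDL, lookahead is done with a dfdl:assert that has testKind="pattern" and a regular expression in testPattern. The regex is matched against the data at the current position without consuming any of it; if it doesn't match, the assert fails.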


That lets you define a single field as either fixed or variable length,
depending on whether the upcoming data looks like 3 characters followed by
another delimiter.

Each field can be defined this way on its own, isolated from the next, so
the whole thing doesn't turn into a big coupled mess where every field has to
be combined with the field after it.
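For your four-field case (A/B/C/D), a rough sketch of that isolation is below. It is untested and assumes the dfdl:format defaults and the xs:/dfdl: namespace prefixes from your schema. B's fixed length of 3 comes from your example; C's fixed length of 5 and the element name Test4 are made-up placeholders. B and C each carry their own lookahead choice, so another field would add one more self-contained declaration rather than another layer of permutations.

```xml
<!-- Hypothetical four-field layout A/B/C/D (untested sketch).
     Assumes the dfdl:format from the question is in scope.
     C's length of 5 is a placeholder; substitute the real one. -->
<xs:element name="Test4">
  <xs:complexType>
    <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
      <xs:element name="A" type="xs:string"/>

      <!-- B: fixed length 3 only if 3 characters and a "/" come next.
           \r\n is used because \R is not accepted inside a character class. -->
      <xs:element name="B" minOccurs="0" dfdl:lengthKind="implicit">
        <xs:complexType>
          <xs:choice dfdl:choiceLengthKind="implicit">
            <xs:sequence>
              <xs:sequence>
                <xs:annotation><xs:appinfo source="http://www.ogf.org/dfdl/">
                  <dfdl:assert testKind="pattern" testPattern="[^/\r\n]{3}/"/>
                </xs:appinfo></xs:annotation>
              </xs:sequence>
              <xs:element name="len" type="xs:unsignedInt" dfdl:inputValueCalc="{ 3 }"/>
              <xs:element name="str" type="xs:string"
                          dfdl:lengthKind="explicit" dfdl:length="{ ../len }"
                          dfdl:textTrimKind="padChar" dfdl:textPadKind="padChar"
                          dfdl:textStringPadCharacter="%SP;"
                          dfdl:textStringJustification="left"/>
            </xs:sequence>
            <!-- no "/" after B: nothing follows, so B is variable length -->
            <xs:element name="str" type="xs:string"/>
          </xs:choice>
        </xs:complexType>
      </xs:element>

      <!-- C: same shape as B; only the lookahead count and len differ -->
      <xs:element name="C" minOccurs="0" dfdl:lengthKind="implicit">
        <xs:complexType>
          <xs:choice dfdl:choiceLengthKind="implicit">
            <xs:sequence>
              <xs:sequence>
                <xs:annotation><xs:appinfo source="http://www.ogf.org/dfdl/">
                  <dfdl:assert testKind="pattern" testPattern="[^/\r\n]{5}/"/>
                </xs:appinfo></xs:annotation>
              </xs:sequence>
              <xs:element name="len" type="xs:unsignedInt" dfdl:inputValueCalc="{ 5 }"/>
              <xs:element name="str" type="xs:string"
                          dfdl:lengthKind="explicit" dfdl:length="{ ../len }"
                          dfdl:textTrimKind="padChar" dfdl:textPadKind="padChar"
                          dfdl:textStringPadCharacter="%SP;"
                          dfdl:textStringJustification="left"/>
            </xs:sequence>
            <xs:element name="str" type="xs:string"/>
          </xs:choice>
        </xs:complexType>
      </xs:element>

      <!-- D is always the last field, so it is always variable length -->
      <xs:element name="D" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```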

It's not perfect, because you are expressing the separator in two places:
as the sequence separator and in the lookahead regex. On the other hand, it
expresses exactly the way you described the problem: "it's fixed length if
it's followed by a next field".

So something like this would go inside a sequence separated by "/":

<xs:element name="b">
   <xs:complexType>
      <xs:choice dfdl:choiceLengthKind="implicit">
         <xs:sequence>
            <xs:sequence>
               <xs:annotation><xs:appinfo source="http://www.ogf.org/dfdl/">
                  <!-- look ahead for 3 non-slash, non-line-ending characters,
                       then a slash. \R is not allowed inside a character
                       class, so \r\n is used instead. -->
                  <dfdl:assert testKind="pattern" testPattern="[^/\r\n]{3}/" />
               </xs:appinfo></xs:annotation>
            </xs:sequence>
            <!-- the len element is here so the two choice branches don't both
                 start with an element named str, which XSD's Unique Particle
                 Attribution (UPA) rule forbids. -->
            <xs:element name="len" type="xs:unsignedInt" dfdl:inputValueCalc="{ 3 }"/>
            <xs:element name="str" type="xs:string"
                        dfdl:lengthKind="explicit" dfdl:length="{ ../len }"/>
            <!-- you could add space pad/trim to str if you want it left justified -->
         </xs:sequence>
         <xs:element name="str" type="xs:string" dfdl:lengthKind="delimited"/>
      </xs:choice>
   </xs:complexType>
</xs:element>
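To show where b sits in the whole record, here is an untested sketch of the three-field format from your question. It assumes b is declared globally (directly under xs:schema, with no targetNamespace) and that the dfdl:format from your schema is in scope; the dfdl:lengthKind on the reference is my assumption, added because b is a complex element.

```xml
<xs:element name="Test">
  <xs:complexType>
    <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
      <!-- A: any string, delimited by the "/" separator -->
      <xs:element name="A" type="xs:string"/>
      <!-- b decides for itself, via its lookahead, whether it is fixed or
           variable length; implicit length because it is a complex element -->
      <xs:element ref="b" dfdl:lengthKind="implicit"/>
      <!-- C is optional; when it is absent there is no trailing "/" -->
      <xs:element name="C" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

With your samples, Hello/X  /Comment and Hello/XYZ/Comment should take b's fixed-length branch, while Hello/X should fall through to the delimited branch with no C. For "X" to come out without its two trailing spaces, the fixed-length str still needs the pad/trim properties from your B-fixed-length element.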


On Thu, Sep 28, 2023 at 9:09 AM Roger L Costello <coste...@mitre.org> wrote:

> My input is a single line consisting of three fields separated by slashes.
> The first field (A) can contain any string. The second field (B) has a
> fixed length (3); if the data does not fill the allotted 3 characters, then
> the data is left-aligned and padded with spaces on the right. The third
> field (C) can contain any string. Here is a sample input:
>
> Hello/X  /Comment
>
> Notice the two padding spaces following X.
>
> Here is another sample input:
>
> Hello/XYZ/Comment
>
> That is all very straightforward and easily described in DFDL.
>
> Now for the complexity …
>
> The third field (C) is optional. If there is no data for the third field,
> then the data in the second field (B) does not need to be padded. So here
> is a valid input:
>
> Hello/X
>
> There is no padding following X (nor is there a slash separator).
>
> So, the second field (B) has a fixed length only if there is a third field
> (C).
>
> I created a DFDL schema which seems to correctly express this data format.
> See below. The approach I use is a choice for the second field:
>
> <xs:choice>
>     <xs:sequence>
>         element declaration for fixed-length B
>         element declaration for C
>     </xs:sequence>
>     element declaration for variable-length B
> </xs:choice>
>
> Eek! I don't think that approach is scalable.
>
> Suppose instead of 3 fields, there are 4 fields: A, B, C, D. Suppose B, C,
> and D are optional, and B and C are fixed length unless no fields follow
> them, in which case they are variable length. The choice approach quickly
> becomes untenable, as all permutations must be described. Is there a better
> approach to this problem?
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
>            xmlns:xs="http://www.w3.org/2001/XMLSchema"
>            xmlns:fn="http://www.w3.org/2005/xpath-functions">
>     <xs:annotation>
>         <xs:appinfo source="http://www.ogf.org/dfdl/">
>             <dfdl:format alignment="1"
>                 alignmentUnits="bytes"
>                 emptyValueDelimiterPolicy="none"
>                 encoding="ASCII"
>                 encodingErrorPolicy="replace"
>                 escapeSchemeRef=""
>                 fillByte="%SP;"
>                 floating="no"
>                 ignoreCase="yes"
>                 initiatedContent="no"
>                 initiator=""
>                 leadingSkip="0"
>                 lengthKind="delimited"
>                 lengthUnits="characters"
>                 nilValueDelimiterPolicy="none"
>                 occursCountKind="implicit"
>                 outputNewLine="%CR;%LF;"
>                 representation="text"
>                 separator=""
>                 separatorSuppressionPolicy="anyEmpty"
>                 sequenceKind="ordered"
>                 textBidi="no"
>                 textPadKind="none"
>                 textTrimKind="none"
>                 trailingSkip="0"
>                 truncateSpecifiedLengthString="no"
>                 terminator=""
>                 textNumberRep="standard"
>                 textStandardBase="10"
>                 textStandardZeroRep="0"
>                 textNumberRounding="pattern"
>                 textStandardExponentRep="E"
>                 textNumberCheckPolicy="strict"/>
>         </xs:appinfo>
>     </xs:annotation>
>     <xs:element name="Test">
>         <xs:complexType>
>             <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
>                 <xs:element name="A" type="xs:string" />
>                 <xs:choice dfdl:choiceLengthKind="implicit">
>                     <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
>                         <xs:element name="B-fixed-length"
>                                     dfdl:lengthKind="explicit"
>                                     dfdl:length="3"
>                                     dfdl:textTrimKind="padChar"
>                                     dfdl:textPadKind="padChar"
>                                     dfdl:textStringPadCharacter="%SP;"
>                                     dfdl:textStringJustification="left">
>                             <xs:simpleType>
>                                 <xs:restriction base="validString">
>                                     <xs:enumeration value="X"/>
>                                     <xs:enumeration value="XY"/>
>                                     <xs:enumeration value="XYZ"/>
>                                 </xs:restriction>
>                             </xs:simpleType>
>                         </xs:element>
>                         <xs:element name="C" type="xs:string"/>
>                     </xs:sequence>
>                     <xs:element name="B-variable-length">
>                         <xs:simpleType>
>                             <xs:restriction base="validString">
>                                 <xs:enumeration value="X"/>
>                                 <xs:enumeration value="XY"/>
>                                 <xs:enumeration value="XYZ"/>
>                             </xs:restriction>
>                         </xs:simpleType>
>                     </xs:element>
>                 </xs:choice>
>             </xs:sequence>
>         </xs:complexType>
>     </xs:element>
>
>     <xs:simpleType name="validString">
>         <xs:annotation>
>             <xs:appinfo source="http://www.ogf.org/dfdl/">
>                 <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
>             </xs:appinfo>
>         </xs:annotation>
>         <xs:restriction base="xs:string"/>
>     </xs:simpleType>
>
> </xs:schema>
