How to describe a data format with fields that are fixed length unless there are no following fields then they are variable length?

Roger L Costello Thu, 28 Sep 2023 06:09:12 -0700

My input is a single line consisting of three fields separated by slashes. The 
first field (A) can contain any string. The second field (B) has a fixed length 
(3); if the data does not consume the allotted 3 spaces, then the data is 
left-aligned and padded with spaces on the right. The third field (C) can 
contain any string. Here is a sample input:


Hello/X  /Comment

Notice the two padding spaces following X.

Here is another sample input:

Hello/XYZ/Comment

That is all very straightforward and easily described in DFDL.

Now for the complexity ...

The third field (C) is optional. If there is no data for the third field, then 
the data in the second field (B) does not need to be padded. So here is a valid 
input:

Hello/X

There is no padding following X, nor is there a trailing slash separator.

So, the second field (B) has a fixed length only if there is a third field (C).
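To pin down the rule outside of DFDL, here is a minimal Python sketch of how the three sample inputs would be read. The name parse_record and the constant FIXED_B_LENGTH are made up for illustration. It assumes A and B never contain a slash, and, like the schema below, it does not trim B when there is no third field.

```python
FIXED_B_LENGTH = 3  # width of field B when field C follows


def parse_record(line):
    """Parse 'A/B' or 'A/B(padded to 3)/C' into a dict of fields."""
    a, sep, rest = line.partition("/")
    if not sep:
        raise ValueError("expected a '/' after field A")

    if "/" not in rest:
        # No third field: B is variable length (1 to 3 characters).
        if not 1 <= len(rest) <= FIXED_B_LENGTH:
            raise ValueError("field B must be 1 to 3 characters")
        return {"A": a, "B": rest, "C": None}

    # Third field present: B must fill exactly 3 characters before the slash.
    b_raw, c = rest[:FIXED_B_LENGTH], rest[FIXED_B_LENGTH + 1:]
    if "/" in b_raw or rest[FIXED_B_LENGTH] != "/":
        raise ValueError("field B must be padded to 3 characters before '/'")
    # Padding is on the right, so only trailing spaces are removed.
    return {"A": a, "B": b_raw.rstrip(" "), "C": c}
```

With this sketch, "Hello/X  /Comment" and "Hello/X" both yield B = "X", while "Hello/X/Comment" is rejected because B is not padded even though a third field follows.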

I created a DFDL schema which seems to correctly express this data format. See 
below. The approach I use is a choice for the second field:

<choice>
    <sequence>
        element declaration for fixed length B
        element declaration for C
    </sequence>
    element declaration for variable length B
</choice>

Eek! I don't think that approach is scalable.

Suppose instead of 3 fields, there are 4 fields: A, B, C, D. Suppose B, C, D 
are optional, and B and C are fixed length unless no field follows them, in 
which case they are variable length. The choice approach quickly becomes 
untenable because every combination of present and absent fields must be 
described. Is there a better approach to this problem?
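For comparison, the four-field rule stays short when written procedurally: any field that is followed by another field must have its full width, and only the last field present may be shorter. The Python sketch below illustrates that rule; it is not a DFDL answer. The name parse_fields and the widths (3, 3) for B and C are assumptions, since the width of C is not specified above, and the sketch assumes no field contains a slash.

```python
def parse_fields(line, widths=(3, 3)):
    """Parse A/B/C/D, where B and C have the given (assumed) fixed widths."""
    parts = line.split("/")
    max_fields = len(widths) + 2  # A, the fixed-width fields, and the final field
    if len(parts) > max_fields:
        raise ValueError("too many fields")

    values = [parts[0]]  # field A is always variable length
    for i, part in enumerate(parts[1:], start=1):
        is_last = i == len(parts) - 1
        if i <= len(widths):
            width = widths[i - 1]
            if is_last:
                # Nothing follows this field, so padding is not required.
                if len(part) > width:
                    raise ValueError(f"field {i + 1} is longer than {width}")
            else:
                # Another field follows, so this one must be full width.
                if len(part) != width:
                    raise ValueError(f"field {i + 1} must be {width} characters")
                part = part.rstrip(" ")
        values.append(part)
    return values
```

Under these assumptions, "Hello/X", "Hello/X  /YZ" and "Hello/X  /Y  /Comment" are all accepted, and "Hello/X/Y" is rejected because B is followed by C without being padded.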

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
                    xmlns:xs="http://www.w3.org/2001/XMLSchema"
                    xmlns:fn="http://www.w3.org/2005/xpath-functions">
    <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:format alignment="1"
                alignmentUnits="bytes"
                emptyValueDelimiterPolicy="none"
                encoding="ASCII"
                encodingErrorPolicy="replace"
                escapeSchemeRef=""
                fillByte="%SP;"
                floating="no"
                ignoreCase="yes"
                initiatedContent="no"
                initiator=""
                leadingSkip="0"
                lengthKind="delimited"
                lengthUnits="characters"
                nilValueDelimiterPolicy="none"
                occursCountKind="implicit"
                outputNewLine="%CR;%LF;"
                representation="text"
                separator=""
                separatorSuppressionPolicy="anyEmpty"
                sequenceKind="ordered"
                textBidi="no"
                textPadKind="none"
                textTrimKind="none"
                trailingSkip="0"
                truncateSpecifiedLengthString="no"
                terminator=""
                textNumberRep="standard"
                textStandardBase="10"
                textStandardZeroRep="0"
                textNumberRounding="pattern"
                textStandardExponentRep="E"
                textNumberCheckPolicy="strict"/>
        </xs:appinfo>
    </xs:annotation>
    <xs:element name="Test">
        <xs:complexType>
            <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
                <xs:element name="A" type="xs:string" />
                <xs:choice dfdl:choiceLengthKind="implicit">
                    <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix">
                        <xs:element name="B-fixed-length"
                                            dfdl:lengthKind="explicit"
                                            dfdl:length="3"
                                            dfdl:textTrimKind="padChar"
                                            dfdl:textPadKind="padChar"
                                            dfdl:textStringPadCharacter="%SP;"
                                            dfdl:textStringJustification="left">
                            <xs:simpleType>
                                <xs:restriction base="validString">
                                    <xs:enumeration value="X"/>
                                    <xs:enumeration value="XY"/>
                                    <xs:enumeration value="XYZ"/>
                                </xs:restriction>
                            </xs:simpleType>
                        </xs:element>
                        <xs:element name="C" type="xs:string"/>
                    </xs:sequence>
                    <xs:element name="B-variable-length">
                        <xs:simpleType>
                            <xs:restriction base="validString">
                                <xs:enumeration value="X"/>
                                <xs:enumeration value="XY"/>
                                <xs:enumeration value="XYZ"/>
                            </xs:restriction>
                        </xs:simpleType>
                    </xs:element>
                </xs:choice>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

    <xs:simpleType name="validString">
        <xs:annotation>
            <xs:appinfo source="http://www.ogf.org/dfdl/">
                <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
            </xs:appinfo>
        </xs:annotation>
        <xs:restriction base="xs:string"/>
    </xs:simpleType>

</xs:schema>
