Thank you, Mike - outstanding information!

Recall my data format:

                Label: Message

I have 3 perspectives on that format:

Perspective #1: There is a sequence of two strings, separated by a colon.

Perspective #2: There is a label terminated by a colon, followed by a message.

Perspective #3: There is a label, a colon, and a message.

Below I show the element declarations for the 3 perspectives. I am wondering 
whether I have inlined the correct amount of DFDL stuff, per best practice.

Perspective #1: There is a sequence of two strings, separated by a colon.

<xs:element name="really-simple-format">
    <xs:complexType>
        <xs:sequence dfdl:separator=":" dfdl:separatorPosition="infix">
            <xs:element name="label" type="xs:string" />
            <xs:element name="message" type="xs:string" />
        </xs:sequence>
    </xs:complexType>
</xs:element>

Perspective #2: There is a label terminated by a colon, followed by a message.

<xs:element name="really-simple-format">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="label" type="xs:string" dfdl:terminator=":" />
            <xs:element name="message" type="xs:string" />
        </xs:sequence>
    </xs:complexType>
</xs:element>

Perspective #3: There is a label, a colon, and a message.

<xs:element name="really-simple-format">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="label" type="xs:string"
                                dfdl:lengthUnits="characters"
                                dfdl:lengthKind="pattern"
                                dfdl:lengthPattern="[\x0D-\xFF]+?(?=[:])" />
            <xs:element name="colon" type="xs:string"
                                dfdl:lengthUnits="characters"
                                dfdl:lengthKind="explicit"
                                dfdl:length="1" />
            <xs:element name="message" type="xs:string" />
        </xs:sequence>
    </xs:complexType>
</xs:element>

For schemas 1 and 2, should I inline more DFDL stuff? For schema 3, do I have 
too much DFDL stuff?

/Roger

From: Mike Beckerle <[email protected]>
Sent: Wednesday, November 14, 2018 4:03 PM
To: [email protected]
Subject: Re: Best practice: all inline? no inline? mix?


Yes, of course the mix is best.

So why? Well, not all properties work the same way.

Delimiters are almost always going to be locally expressed on a model-group or 
element.

The escape schemes used for delimiters are almost always going to be defined 
centrally, but where they are used varies with the delimiter. For example, a 
comma-separated sequence might allow a comma embedded in an element value to be 
escaped, while the CRLF at the end of a line might not be allowed to appear in 
the data at all, i.e., it cannot be escaped. So the escape scheme is defined 
centrally but applied locally.
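
A minimal sketch of "defined centrally, applied locally", assuming a comma-separated record terminated by CRLF and a top-level dfdl:format like the ones in this thread (lengthKind="delimited", representation="text"). The names quotes, record, name, and id are hypothetical, the tns prefix is assumed to be bound to the schema's target namespace, and the escape-block settings are one plausible choice rather than anything prescribed here; a given DFDL processor may want additional escape-scheme properties.

```xml
<!-- Defined once, centrally, in the schema's top-level annotation -->
<xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
        <!-- Hypothetical CSV-style quoting: "..." protects embedded commas -->
        <dfdl:defineEscapeScheme name="quotes">
            <dfdl:escapeScheme escapeKind="escapeBlock"
                               escapeBlockStart='"'
                               escapeBlockEnd='"'
                               escapeEscapeCharacter='"'
                               extraEscapedCharacters=""
                               generateEscapeBlock="whenNeeded" />
        </dfdl:defineEscapeScheme>
    </xs:appinfo>
</xs:annotation>

<!-- Applied locally, element by element -->
<xs:element name="record">
    <xs:complexType>
        <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"
                     dfdl:terminator="%CR;%LF;">
            <!-- May contain a comma, provided the value is quoted -->
            <xs:element name="name" type="xs:string"
                        dfdl:escapeSchemeRef="tns:quotes" />
            <!-- No escape scheme: no delimiter can appear in this value -->
            <xs:element name="id" type="xs:string"
                        dfdl:escapeSchemeRef="" />
        </xs:sequence>
    </xs:complexType>
</xs:element>
```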

Policy properties like encodingErrorPolicy almost always want to be at top 
level.

LengthKind is a property that is quite problematic. I kind of wish we had 
distinguished complexType lengthKind from simpleType lengthKind, because a very 
common situation in binary data wants all complex types to have 
dfdl:lengthKind="implicit" but simple types to have 
dfdl:lengthKind="explicit", and only one of those can be expressed as the 
default at top level. So for binary data, often every single complex type 
element has a dfdl:ref="..." that refers to a named format that has 
dfdl:lengthKind="implicit". Maybe. That might not be necessary if the simple 
types all have type definitions that all say lengthKind="explicit" and are 
heavily reused.
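
A sketch of the binary-data arrangement just described, under stated assumptions: the top-level format (not shown) sets representation="binary" and defaults dfdl:lengthKind to "explicit", a named format complexImplicit overrides that for complex elements, and a reusable simple type carries its own length. The names complexImplicit, uint16, header, messageType, and messageLength are hypothetical, and tns is assumed to be bound to the target namespace.

```xml
<xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
        <!-- Named format that complex elements opt into via dfdl:ref -->
        <dfdl:defineFormat name="complexImplicit">
            <dfdl:format lengthKind="implicit" />
        </dfdl:defineFormat>
    </xs:appinfo>
</xs:annotation>

<!-- Reusable simple type: its length travels with the type definition -->
<xs:simpleType name="uint16"
               dfdl:lengthKind="explicit"
               dfdl:length="16"
               dfdl:lengthUnits="bits">
    <xs:restriction base="xs:unsignedShort" />
</xs:simpleType>

<!-- Each complex element repeats the dfdl:ref -->
<xs:element name="header" dfdl:ref="tns:complexImplicit">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="messageType" type="tns:uint16" />
            <xs:element name="messageLength" type="tns:uint16" />
        </xs:sequence>
    </xs:complexType>
</xs:element>
```

If, as suggested above, every simple type carries dfdl:lengthKind="explicit" itself, the top-level default could be "implicit" instead, and the dfdl:ref on each complex element could be dropped.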

Some formats, not all, are reasonably well behaved.

They use mostly similar properties throughout: e.g., a single text encoding, 
a single byte order and bit order, etc.

However, there are plenty of cases where quite diverse data is simply 
juxtaposed in a data format, so that the characteristics of the data change 
wildly. A very common idiom is "envelope and payload", where an envelope, or 
header, format is used to augment a payload that is in a quite different, and 
perhaps harder to access, data format. The envelope or header is often 
byte-aligned, byte-oriented, and well-behaved, i.e., "easy" data, encapsulating 
a payload that is bit-oriented and bit-aligned, perhaps with a different bit 
order or byte order, different numeric representations, etc. NACT headers 
before Link16 payloads are a good example of this. MIL-STD-2045 headers before 
USMTF payloads are sort of the opposite example: that header format is 
bit-packed binary and quite complex/challenging, while USMTF is textual and, 
relatively speaking, easier.
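
One way to keep an envelope and a payload with very different characteristics apart, sketched under assumptions: in DFDL, properties placed on an element are not inherited by that element's children (defaults come from the schema document's top-level dfdl:format), so a common approach is to give the payload its own schema document whose top-level dfdl:format references a different named format. The file name, namespace URI, and field below are hypothetical, and the remaining required format properties are omitted.

```xml
<!-- payload.dfdl.xsd (hypothetical): these defaults apply only in this file -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:pay="urn:example:payload"
           targetNamespace="urn:example:payload">
    <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:defineFormat name="payloadFormat">
                <!-- Bit-packed, least-significant-bit-first payload -->
                <dfdl:format representation="binary"
                             bitOrder="leastSignificantBitFirst"
                             byteOrder="littleEndian"
                             alignment="1"
                             alignmentUnits="bits" />
            </dfdl:defineFormat>
            <dfdl:format ref="pay:payloadFormat" />
        </xs:appinfo>
    </xs:annotation>

    <xs:complexType name="payloadType">
        <xs:sequence>
            <!-- A 3-bit field that need not start on a byte boundary -->
            <xs:element name="flags" type="xs:unsignedByte"
                        dfdl:lengthKind="explicit"
                        dfdl:length="3"
                        dfdl:lengthUnits="bits" />
        </xs:sequence>
    </xs:complexType>
</xs:schema>
```

The envelope's schema document would then xs:import this one and declare its payload element with this type, while keeping its own byte-oriented defaults.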

DFDL has some properties that you aren't even allowed to put in top-level 
scope, because that would *never* make sense. E.g., dfdl:inputValueCalc.
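
To illustrate with the label/message format from this thread: dfdl:inputValueCalc goes on one specific element, because it computes that element's value from other data. A minimal sketch, adding a hypothetical labelLength element and assuming the fn prefix is bound to the XPath functions namespace (http://www.w3.org/2005/xpath-functions).

```xml
<xs:element name="really-simple-format">
    <xs:complexType>
        <xs:sequence dfdl:separator=":" dfdl:separatorPosition="infix">
            <xs:element name="label" type="xs:string" />
            <xs:element name="message" type="xs:string" />
            <!-- Hypothetical computed element: not read from the data -->
            <xs:element name="labelLength" type="xs:int"
                        dfdl:inputValueCalc="{ fn:string-length(../label) }" />
        </xs:sequence>
    </xs:complexType>
</xs:element>

<!-- Not allowed: dfdl:inputValueCalc cannot be a top-level default
<dfdl:format inputValueCalc="{ fn:string-length(../label) }" />
-->
```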

A practice I consider valuable is that the top-level <dfdl:format.../> 
annotation of a DFDL schema file should always consist of exactly and only a 
reference to a named format.

E.g.,

<xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">

        <dfdl:defineFormat name="xyzFormat2">
            <!-- all the 'top-level' basic properties for this format -->
            <dfdl:format ... />
        </dfdl:defineFormat>

        <dfdl:format ref="tns:xyzFormat2"/> <!-- use the format -->

    </xs:appinfo>
</xs:annotation>

That way you can reuse the format in another file that extends the schema, 
easily build variations of it, etc.
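
For example, a variation can be built by referencing the base named format and overriding only what differs. A sketch, assuming xyzFormat2 is defined as above and tns is bound to the target namespace; the name xyzFormat2LittleEndian and the overridden property are hypothetical.

```xml
<xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
        <!-- Same as xyzFormat2 except for byte order -->
        <dfdl:defineFormat name="xyzFormat2LittleEndian">
            <dfdl:format ref="tns:xyzFormat2" byteOrder="littleEndian" />
        </dfdl:defineFormat>

        <!-- Use the variation as this schema document's defaults -->
        <dfdl:format ref="tns:xyzFormat2LittleEndian" />
    </xs:appinfo>
</xs:annotation>
```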

Another good practice is to put the basic format definition, as above, in its 
own DFDL schema file that is imported by the DFDL schema files that actually 
define types, elements, and groups.
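
A sketch of that layout, with hypothetical file names and namespace URI. The format file holds only the named format; policy properties such as encodingErrorPolicy, which the message above suggests belong at top level, sit naturally here. The file that defines elements imports it and makes it its top-level default. The property values are placeholders, and the remaining required properties are omitted.

```xml
<!-- xyzFormat.dfdl.xsd (hypothetical): only the named format -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           targetNamespace="urn:example:xyzFormat">
    <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:defineFormat name="xyzFormat2">
                <!-- Remaining top-level properties omitted for brevity -->
                <dfdl:format representation="text"
                             encoding="UTF-8"
                             encodingErrorPolicy="error" />
            </dfdl:defineFormat>
        </xs:appinfo>
    </xs:annotation>
</xs:schema>

<!-- xyz.dfdl.xsd (hypothetical): elements, types, and groups -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:fmt="urn:example:xyzFormat">
    <xs:import namespace="urn:example:xyzFormat"
               schemaLocation="xyzFormat.dfdl.xsd" />
    <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:format ref="fmt:xyzFormat2" />
        </xs:appinfo>
    </xs:annotation>

    <!-- element, type, and group definitions go here -->
</xs:schema>
```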

________________________________
From: Costello, Roger L. <[email protected]>
Sent: Wednesday, November 14, 2018 2:33:43 PM
To: [email protected]
Subject: Best practice: all inline? no inline? mix?

Hello DFDL Community,

I have a simple data format:

        Label: Message

Here is a sample input:

        Dear Sir: Thank you for your response.

I will take this perspective:

        The data format consists of a series of
        two strings (label and message),
        separated by a colon.

Here is the XML that I wish to produce:

<really-simple-format>
      <label>Dear Sir</label>
      <message>Thank you for your response.</message>
</really-simple-format>

I have identified 3 approaches to design the DFDL schema:

1. Inline the DFDL stuff with the XML schema stuff.

2. Don't inline any DFDL stuff; instead, put all the DFDL stuff at the top, 
inside xs:annotation.

3. Inline some DFDL stuff, put some DFDL stuff at the top, inside xs:annotation.

Below I show the 3 approaches. Which is best practice? I know you will say the 
third approach (mix approach) is best practice. Okay, then, which DFDL stuff 
should be inlined and which should be put at the top, inside xs:annotation? 
What is the rationale for how you divvy up the DFDL stuff between inline and at 
the top? Do you agree with how I divvied up the DFDL stuff? Would you put more 
stuff inline? If so, what other stuff would you put inline?

------------------------------------
    All Inline Approach
------------------------------------

<xs:element name="really-simple-format"
    dfdl:alignment="implicit"
    dfdl:alignmentUnits="bytes"
    dfdl:encoding="UTF-8"
    dfdl:escapeSchemeRef=""
    dfdl:ignoreCase="no"
    dfdl:initiator=""
    dfdl:leadingSkip="0"
    dfdl:lengthKind="delimited"
    dfdl:outputNewLine="%CR;%LF;"
    dfdl:representation="text"
    dfdl:terminator=""
    dfdl:textPadKind="none"
    dfdl:textTrimKind="none"
    dfdl:trailingSkip="0"
    dfdl:truncateSpecifiedLengthString="no"
    >
    <xs:complexType>
        <xs:sequence
            dfdl:alignment="implicit"
            dfdl:alignmentUnits="bytes"
            dfdl:encoding="UTF-8"
            dfdl:ignoreCase="no"
            dfdl:initiatedContent="no"
            dfdl:initiator=""
            dfdl:leadingSkip="0"
            dfdl:lengthKind="delimited"
            dfdl:outputNewLine="%CR;%LF;"
            dfdl:separator=":"
            dfdl:separatorPosition="infix"
            dfdl:separatorSuppressionPolicy="never"
            dfdl:sequenceKind="ordered"
            dfdl:terminator=""
            dfdl:trailingSkip="0"
            >
            <xs:element name="label" type="xs:string"
                dfdl:alignment="implicit"
                dfdl:alignmentUnits="bytes"
                dfdl:encoding="UTF-8"
                dfdl:escapeSchemeRef=""
                dfdl:ignoreCase="no"
                dfdl:initiator=""
                dfdl:leadingSkip="0"
                dfdl:lengthKind="delimited"
                dfdl:outputNewLine="%CR;%LF;"
                dfdl:representation="text"
                dfdl:terminator=""
                dfdl:textPadKind="none"
                dfdl:textTrimKind="none"
                dfdl:trailingSkip="0"
                dfdl:truncateSpecifiedLengthString="no"
            />
            <xs:element name="message" type="xs:string"
                dfdl:alignment="implicit"
                dfdl:alignmentUnits="bytes"
                dfdl:encoding="UTF-8"
                dfdl:escapeSchemeRef=""
                dfdl:ignoreCase="no"
                dfdl:initiator=""
                dfdl:leadingSkip="0"
                dfdl:lengthKind="delimited"
                dfdl:outputNewLine="%CR;%LF;"
                dfdl:representation="text"
                dfdl:terminator=""
                dfdl:textPadKind="none"
                dfdl:textTrimKind="none"
                dfdl:trailingSkip="0"
                dfdl:truncateSpecifiedLengthString="no"
            />
        </xs:sequence>
    </xs:complexType>
</xs:element>

------------------------------------
    No Inline Approach
------------------------------------

<xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:format
            alignment="1"
            alignmentUnits="bytes"
            binaryFloatRep="ieee"
            binaryNumberRep="binary"
            bitOrder="mostSignificantBitFirst"
            byteOrder="bigEndian"
            calendarPatternKind="implicit"
            documentFinalTerminatorCanBeMissing="yes"
            emptyValueDelimiterPolicy="none"
            encoding="ISO-8859-1"
            encodingErrorPolicy="replace"
            escapeSchemeRef=""
            fillByte="f"
            floating="no"
            ignoreCase="no"
            initiator=""
            initiatedContent="no"
            leadingSkip="0"
            lengthKind="delimited"
            lengthUnits="bits"
            nilKind="literalValue"
            nilValueDelimiterPolicy="none"
            occursCountKind="implicit"
            outputNewLine="%CR;%LF;"
            representation="text"
            separator=":"
            separatorPosition="infix"
            separatorSuppressionPolicy="never"
            sequenceKind="ordered"
            terminator=""
            textBidi="no"
            textNumberCheckPolicy="strict"
            textNumberPattern="#,##0.###;-#,##0.###"
            textNumberRep="standard"
            textNumberRounding="explicit"
            textNumberRoundingIncrement="0"
            textNumberRoundingMode="roundUnnecessary"
            textOutputMinLength="0"
            textPadKind="none"
            textStandardBase="10"
            textStandardExponentRep="E"
            textStandardInfinityRep="Inf"
            textStandardNaNRep="NaN"
            textStandardZeroRep="0"
            textStandardGroupingSeparator=","
            textTrimKind="none"
            trailingSkip="0"
            truncateSpecifiedLengthString="no"
            utf16Width="fixed"
        />
    </xs:appinfo>
</xs:annotation>

<xs:element name="really-simple-format">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="label" type="xs:string" />
            <xs:element name="message" type="xs:string" />
        </xs:sequence>
    </xs:complexType>
</xs:element>

------------------------------------
    Mix Approach
------------------------------------

<xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:format
            alignment="1"
            alignmentUnits="bytes"
            binaryFloatRep="ieee"
            binaryNumberRep="binary"
            bitOrder="mostSignificantBitFirst"
            byteOrder="bigEndian"
            calendarPatternKind="implicit"
            documentFinalTerminatorCanBeMissing="yes"
            emptyValueDelimiterPolicy="none"
            encoding="ISO-8859-1"
            encodingErrorPolicy="replace"
            escapeSchemeRef=""
            fillByte="f"
            floating="no"
            ignoreCase="no"
            initiator=""
            initiatedContent="no"
            leadingSkip="0"
            lengthKind="delimited"
            lengthUnits="bits"
            nilKind="literalValue"
            nilValueDelimiterPolicy="none"
            occursCountKind="implicit"
            outputNewLine="%CR;%LF;"
            representation="text"
            separatorSuppressionPolicy="never"
            sequenceKind="ordered"
            terminator=""
            textBidi="no"
            textNumberCheckPolicy="strict"
            textNumberPattern="#,##0.###;-#,##0.###"
            textNumberRep="standard"
            textNumberRounding="explicit"
            textNumberRoundingIncrement="0"
            textNumberRoundingMode="roundUnnecessary"
            textOutputMinLength="0"
            textPadKind="none"
            textStandardBase="10"
            textStandardExponentRep="E"
            textStandardInfinityRep="Inf"
            textStandardNaNRep="NaN"
            textStandardZeroRep="0"
            textStandardGroupingSeparator=","
            textTrimKind="none"
            trailingSkip="0"
            truncateSpecifiedLengthString="no"
            utf16Width="fixed"
        />
    </xs:appinfo>
</xs:annotation>

<xs:element name="really-simple-format">
    <xs:complexType>
        <xs:sequence dfdl:separator=":" dfdl:separatorPosition="infix">
            <xs:element name="label" type="xs:string" />
            <xs:element name="message" type="xs:string" />
        </xs:sequence>
    </xs:complexType>
</xs:element>
