Re: Warning: Counterintuitive placement detected

Mike Beckerle Tue, 25 Jun 2024 19:45:59 -0700

Well, here's what ChatGPT4.0 said when I asked "Explain the importance of
discriminators in DFDL":
Importance of Discriminators in DFDL

   1. *Differentiating Choices*:
      - *Handling Multiple Possibilities*: In many data formats, a single
        field can represent different types of information based on context.
        For example, a field could be an integer in one context and a string
        in another. Discriminators help in distinguishing between these
        possibilities.
      - *Choice Elements*: DFDL supports choice elements, where a
        particular element can be one of several types. Discriminators are
        used to determine which type is currently being parsed or serialized.

   2. *Ensuring Accurate Parsing*:
      - *Type Safety*: Discriminators ensure that the data is correctly
        interpreted according to its actual type. Without discriminators, the
        parser might misinterpret the data, leading to errors and incorrect
        processing.
      - *Validation*: They help in validating the data against the expected
        format, ensuring that the data conforms to the defined structure.

   3. *Improving Readability and Maintenance*:
      - *Explicit Conditions*: Discriminators make the conditions under which
        different data structures are used explicit. This improves the
        readability of the DFDL schema and makes it easier to maintain and
        understand.
      - *Documentation*: They act as documentation within the schema,
        clearly indicating the criteria for selecting different data
        structures.

   4. *Efficiency in Processing*:
      - *Optimized Parsing*: By using discriminators, the parser can quickly
        determine the correct type of data to parse, making the parsing
        process more efficient.
      - *Reduced Ambiguity*: Discriminators reduce ambiguity in the data
        format, leading to fewer errors and less need for complex error
        handling mechanisms.

I particularly appreciated its point about readability and maintenance.
Its use of the term "validation" is not consistent with XML/DFDL usage,
however.

ChatGPT4 then proceeded to hallucinate some things, including some
imaginary incorrect DFDL that doesn't actually illustrate the important
cases for discriminators.

A discriminator provides more-specific information than just an assertion.
The notion of discriminators implies something about exclusion of other
possibilities.

Toy examples aren't terribly informative. I would refer you to
https://github.com/DFDLSchemas/IBM4690-TLOG/tree/daffodil which is the TLOG
(think "cash register data") for numerous examples of discriminators in
use. Many of them are things one might have expressed using
choiceDispatchKey/choiceBranchKey, but some are not.

To me the best argument for discriminators is that format specifications
often contain prose that maps directly to the DFDL concept of
discriminators.

Consider a message format where a couple of fields described in the spec
seem to be in every message.
If the spec just says that a particular message has some field with value A
and some other field with value B, that's an assertion. It's one-sided
because nothing says that no other message shares those same field values.
You'd have to study all the other message definitions to know.

But if the specification says (or implies, perhaps by calling the fields
"type" and "sub-type") that no other message can have those same field
values, that's a discriminator. It goes beyond an assertion by giving you
not just positive value information but also exclusion information about
those fields and values.

A discriminator also has a completeness aspect. If I have two fields A and
B, which will have values 5 and 6 respectively, then the following
conceptual assertions are equivalent:

    // two assertions in a row are equivalent to an "AND" of the two
    assert(A = 5); assert(B = 6)
    assert(A = 5 && B = 6)

If I tell you A = 5 and B = 6 form a discriminator, then the two fields
(and values) are NOT individually discriminators. Only both taken together
are a discriminator. So these are equivalent:

    assert(A = 5); discriminator(B = 6)
    assert(A = 5); assert(B = 6); discriminator(true)
    discriminator(A = 5 && B = 6)
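
As a rough check of that equivalence, here is a small Python model. It is
only a sketch: run() and forms() are made-up names, not DFDL or Daffodil
API, and it assumes the simplified rule that statements are evaluated in
order, the first false one fails the branch, and a discriminator that
evaluates to true marks the choice as resolved.

```python
from itertools import product

def run(statements):
    """Evaluate (kind, value) statements in order.

    Returns (succeeded, resolved). 'resolved' becomes True only once a
    discriminator has evaluated to true (simplified model, an assumption).
    """
    resolved = False
    for kind, value in statements:
        if not value:
            return (False, resolved)   # first false statement fails the branch
        if kind == "discriminator":
            resolved = True
    return (True, resolved)

def forms(a, b):
    """The three conceptual forms from the text, for field values a and b."""
    return [
        [("assert", a == 5), ("discriminator", b == 6)],
        [("assert", a == 5), ("assert", b == 6), ("discriminator", True)],
        [("discriminator", a == 5 and b == 6)],
    ]

# For every combination of matching / non-matching values, all three
# forms give the same (succeeded, resolved) outcome.
for a, b in product((5, 0), (6, 0)):
    outcomes = {run(form) for form in forms(a, b)}
    assert len(outcomes) == 1
```

In this model the choice is resolved only when both A = 5 and B = 6 hold,
which is the sense in which neither field is a discriminator by itself.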

Toy examples aren't very informative, but here's a DFDL schema that
produces different parse results if I use an assertion vs discriminator.
Assume dfdl:lengthKind="delimited".

<choice>
     <choice>
            <sequence dfdl:separator=";">
              <element name="tag" type="xs:string">
                   <annotation><appinfo source="http://www.ogf.org/dfdl/">
                      <dfdl:assert>{ . eq "A" }</dfdl:assert>
                   </appinfo></annotation>
              </element>
              <element name="num" type="xs:int"/>
            </sequence>
            ... more sequences here for tag B, tag C, etc....
            <sequence dfdl:separator=";">
               <element name="unrecognizedTag" type="xs:string"/>
               <element name="data" type="xs:string"/>
            </sequence>

     </choice>
     <element name="malformed" type="xs:string"/>
</choice>

If the input data is "A;5" this schema produces <tag>A</tag><num>5</num>

With the assertion as written, if the input data is "A;X" this schema
produces
<unrecognizedTag>A</unrecognizedTag><data>X</data>
This is arguably incorrect because the tag A was in fact recognized. The
failure occurred after that in the subsequent field 'num'.

If I change the dfdl:assert(s) to discriminators, this schema produces
<malformed>A;X</malformed>.
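
To make the backtracking difference concrete, here is a toy Python model of
the schema above. It is a sketch under stated assumptions, not how Daffodil
is implemented: Choice, tagged, unrecognized, malformed and build are all
hypothetical names, only the tag "A" branch is modeled, and a true
discriminator is modeled as setting a "resolved" flag on the nearest
enclosing choice so that a later failure is not retried on other branches.

```python
class ParseError(Exception):
    """A parse failure that an enclosing choice may recover from."""

class Choice:
    """Tries branches in order; a branch may mark this choice as resolved."""
    def __init__(self, *branches):
        self.branches = branches

    def parse(self, data):
        for branch in self.branches:
            state = {"resolved": False}
            try:
                return branch(data, state)
            except ParseError:
                if state["resolved"]:
                    raise  # discriminator was true: do not try later branches
        raise ParseError("all branches failed")

def tagged(expected, use_discriminator):
    """Branch for a known tag: a tag string, ';', then an integer."""
    def branch(data, state):
        tag, _, rest = data.partition(";")
        if tag != expected:
            # the assert (or discriminator) evaluates to false
            raise ParseError(f"tag is not {expected!r}")
        if use_discriminator:
            state["resolved"] = True
        try:
            return {"tag": tag, "num": int(rest)}
        except ValueError:
            raise ParseError(f"{rest!r} is not an integer") from None
    return branch

def unrecognized(data, state):
    """Fallback branch: any tag followed by any string."""
    tag, sep, rest = data.partition(";")
    if not sep:
        raise ParseError("separator not found")
    return {"unrecognizedTag": tag, "data": rest}

def malformed(data, state):
    """Last resort of the outer choice: take everything as one string."""
    return {"malformed": data}

def build(use_discriminator):
    inner = Choice(tagged("A", use_discriminator), unrecognized)
    # The inner choice is the first branch of the outer choice. A
    # discriminator inside it resolves only the inner choice.
    return Choice(lambda data, state: inner.parse(data), malformed)

print(build(False).parse("A;5"))
print(build(False).parse("A;X"))
print(build(True).parse("A;X"))
```

With use_discriminator=False, "A;X" falls through to the unrecognizedTag
branch. With use_discriminator=True the inner choice fails as a whole, and
the outer choice then moves on to its next branch, malformed.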


On Fri, Jun 14, 2024 at 3:41 AM Roger L Costello <coste...@mitre.org> wrote:

> Mike wrote:
>
>
>
>    - Your second assert should be a discriminator.
>
>
>
> Mike, you are saying that instead of this:
>
>
>
> <xs:element name='D' type="unsignedint4">
>     <xs:annotation>
>         <xs:appinfo source="http://www.ogf.org/dfdl/">
>             <dfdl:assert>{(. eq 9) or (. eq 10)}</dfdl:assert>
>         </xs:appinfo>
>     </xs:annotation>
> </xs:element>
>
>
>
> I should use this:
>
>
>
> <xs:element name='D' type="unsignedint4">
>     <xs:annotation>
>         <xs:appinfo source="http://www.ogf.org/dfdl/">
>             <dfdl:discriminator test="{(. eq 9) or (. eq 10)}" />
>         </xs:appinfo>
>     </xs:annotation>
> </xs:element>
>
>
>
> Would you explain the difference between those two, please?
>
>
>
> Would you provide an example of where the former would behave incorrectly
> whereas the latter would behave correctly, please?
>
>
>
> /Roger
>
>
>
>
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Thursday, June 13, 2024 2:30 PM
> *To:* users@daffodil.apache.org
> *Subject:* [EXT] Re: Warning: Counterintuitive placement detected
>
>
>
> Minor suggestion.
>
>
>
> Your second assert should be a discriminator.
>
>
>
> This would be a more precise expression of exactly the criteria for
> choosing this sequence as the data format.
>
>
>
> DFDL can use that to improve diagnostic behavior if both "assertions"
> hold, but something is wrong with the data for elements B, C, and D.
>
>
>
> There is even a small possibility of an incorrect false-positive parse
> happening that the discriminator rules out.
>
>
>
>
>
> On Thu, Jun 13, 2024 at 7:52 AM Roger L Costello <coste...@mitre.org>
> wrote:
>
> I found a solution that seems to work. See below. Any suggestions for
> improving it is welcome.
>
> <xs:element name="test">
>     <xs:complexType>
>         <xs:sequence>
>             <!--
>                 The decimal value of the first bit of the input should be
>                 wrapped in an <A> tag if the decimal value of the next 4 bits
>                 equals 7 and the decimal value of the 4 bits after that equals 9,
>                 or the decimal value of the next 4 bits equals 7 and the decimal
>                 value of the 4 bits after that equals 10, otherwise the first bit
>                 of the input should be wrapped in a <B> tag.
>
>                 The below is not looking ahead. Assume the input is ACD. If C is not
>                 equal to 7 and D is not equal to 9 or 10, then we realize that
>                 we made an error, back up, and try the next branch of the choice.
>             -->
>             <xs:choice>
>                 <xs:sequence>
>                     <xs:element name='A' type="unsignedint1"/>
>                     <xs:element name='C' type="unsignedint4">
>                         <xs:annotation>
>                             <xs:appinfo source="http://www.ogf.org/dfdl/">
>                                 <dfdl:assert>{. eq 7}</dfdl:assert>
>                             </xs:appinfo>
>                         </xs:annotation>
>                     </xs:element>
>                     <xs:element name='D' type="unsignedint4">
>                         <xs:annotation>
>                             <xs:appinfo source="http://www.ogf.org/dfdl/">
>                                 <dfdl:assert>{(. eq 9) or (. eq 10)}</dfdl:assert>
>                             </xs:appinfo>
>                         </xs:annotation>
>                     </xs:element>
>                 </xs:sequence>
>                 <xs:sequence>
>                     <xs:element name='B' type="unsignedint1"/>
>                     <xs:element name='C' type="unsignedint4"/>
>                     <xs:element name='D' type="unsignedint4"/>
>                 </xs:sequence>
>             </xs:choice>
>         </xs:sequence>
>     </xs:complexType>
> </xs:element>
>
> -----Original Message-----
> From: Roger L Costello <coste...@mitre.org>
> Sent: Thursday, June 13, 2024 7:10 AM
> To: users@daffodil.apache.org
> Subject: Warning: Counterintuitive placement detected
>
> Hi Folks,
>
> I need my DFDL to look-ahead:
>
> The decimal value of the first bit of the input should be wrapped in an
> <A> tag if the decimal value of the next 4 bits equals 7 and the decimal
> value of the 4 bits after that equals 9, or the decimal value of the next 4
> bits equals 7 and the decimal value of the 4 bits after that equals 10,
> otherwise the first bit of the input should be wrapped in a <B> tag.
>
> Below is my attempt at implementing this. Daffodil gives this warning
> message:
>
> Schema Definition Warning: Counterintuitive placement detected. Wrap the
> discriminator or assert in an empty sequence to evaluate before the
> contents. (id: discouragedDiscriminatorPlacement)
>
> Here is my DFDL:
>
> <xs:element name="test">
>     <xs:complexType>
>         <xs:sequence>
>             <xs:choice>
>                 <xs:sequence>
>                     <xs:annotation>
>                         <xs:appinfo source="http://www.ogf.org/dfdl/">
>                             <dfdl:discriminator testKind="pattern"
> testPattern="(.x7x9)|(.x7xA)" />
>                         </xs:appinfo>
>                     </xs:annotation>
>                     <xs:element name='A' type="unsignedint1"/>
>                 </xs:sequence>
>                 <xs:sequence>
>                     <xs:element name='B' type="unsignedint1"/>
>                 </xs:sequence>
>             </xs:choice>
>             <xs:element name='C' type="unsignedint4"/>
>             <xs:element name='D' type="unsignedint4"/>
>         </xs:sequence>
>     </xs:complexType>
> </xs:element>
>
>
