Well, here's what ChatGPT 4.0 said when I asked "Explain the importance of discriminators in DFDL":

Importance of Discriminators in DFDL

1. *Differentiating Choices*:
   - *Handling Multiple Possibilities*: In many data formats, a single field can represent different types of information based on context. For example, a field could be an integer in one context and a string in another. Discriminators help in distinguishing between these possibilities.
   - *Choice Elements*: DFDL supports choice elements, where a particular element can be one of several types. Discriminators are used to determine which type is currently being parsed or serialized.
2. *Ensuring Accurate Parsing*:
   - *Type Safety*: Discriminators ensure that the data is correctly interpreted according to its actual type. Without discriminators, the parser might misinterpret the data, leading to errors and incorrect processing.
   - *Validation*: They help in validating the data against the expected format, ensuring that the data conforms to the defined structure.
3. *Improving Readability and Maintenance*:
   - *Explicit Conditions*: Discriminators make the conditions under which different data structures are used explicit. This improves the readability of the DFDL schema and makes it easier to maintain and understand.
   - *Documentation*: They act as documentation within the schema, clearly indicating the criteria for selecting different data structures.
4. *Efficiency in Processing*:
   - *Optimized Parsing*: By using discriminators, the parser can quickly determine the correct type of data to parse, making the parsing process more efficient.
   - *Reduced Ambiguity*: Discriminators reduce ambiguity in the data format, leading to fewer errors and less need for complex error handling mechanisms.

I particularly appreciated its point about readability and maintenance. Its use of the term "validation" is not consistent with XML/DFDL usage, however.

ChatGPT4 then proceeded to hallucinate some things, including some imaginary incorrect DFDL that doesn't actually illustrate the important cases for discriminators.

A discriminator provides more-specific information than just an assertion. The notion of discriminators implies something about exclusion of other possibilities.

Toy examples aren't terribly informative. I would refer you to https://github.com/DFDLSchemas/IBM4690-TLOG/tree/daffodil which is the TLOG (think "cash register data") schema, for numerous examples of discriminators in use. Many of them are things one might have expressed using choiceDispatchKey/choiceBranchKey, but some are not.

To me the best argument for discriminators is that format specifications often have prose that maps directly to the DFDL concept of discriminators.

Consider a message format where there are a couple of fields described (in the spec) that seem to be in every message. If the spec just says that a particular message has some field with value A and some other field with value B, that's an assertion. It's one-sided, because nothing said that no other message shares those same field values. You'd have to study all the other message definitions to know.

But if the specification says (or implies, perhaps by calling the fields "type" and "sub-type") that no other message can have those same field values, that's a discriminator. It goes beyond an assertion by giving you not just positive value information but also exclusion information about those fields and values.

A discriminator also has a completeness aspect. If I have two fields A and B, which will have values 5 and 6 respectively, then the following conceptual assertions are equivalent:

    assert(A = 5) ; assert(B = 6)   // two assertions in a row are equivalent to an "AND" of the two
    assert(A = 5 && B = 6)

If I tell you A = 5 and B = 6 form a discriminator, then the two fields (and values) are NOT individually discriminators. Only both taken together are a discriminator.
So these are equivalent:

    assert(A = 5); discriminator(B = 6)
    assert(A = 5); assert(B = 6); discriminator(true);
    discriminator(A = 5 && B = 6)

Toy examples aren't very informative, but here's a DFDL schema that produces different parse results if I use an assertion vs. a discriminator. Assume dfdl:lengthKind="delimited".

<choice>
  <choice>
    <sequence dfdl:separator=";">
      <element name="tag" type="xs:string">
        <annotation><appinfo source="http://www.ogf.org/dfdl/">
          <dfdl:assert>{ . eq "A" }</dfdl:assert>
        </appinfo></annotation>
      </element>
      <element name="num" type="xs:int"/>
    </sequence>
    ... more sequences here for tag B, tag C, etc....
    <sequence dfdl:separator=";">
      <element name="unrecognizedTag" type="xs:string"/>
      <element name="data" type="xs:string"/>
    </sequence>
  </choice>
  <element name="malformed" type="xs:string"/>
</choice>

If the input data is "A;5" this schema produces

    <tag>A</tag><num>5</num>

With the assertion as written, if the input data is "A;X" this schema produces

    <unrecognizedTag>A</unrecognizedTag><data>X</data>

This is arguably incorrect because the tag A was in fact recognized. The failure occurred after that, in the subsequent field 'num'.

If I change the dfdl:assert(s) to discriminators, this schema produces <malformed>A;X</malformed>. Once the discriminator on 'tag' succeeds, the inner choice is committed to that branch, so the failure on 'num' fails the whole inner choice and the outer choice moves on to 'malformed'.

On Fri, Jun 14, 2024 at 3:41 AM Roger L Costello <coste...@mitre.org> wrote:

> Mike wrote:
>
> - Your second assert should be a discriminator.
>
> Mike, you are saying that instead of this:
>
> <xs:element name='D' type="unsignedint4">
>   <xs:annotation>
>     <xs:appinfo source="http://www.ogf.org/dfdl/">
>       <dfdl:assert>{(. eq 9) or (. eq 10)}</dfdl:assert>
>     </xs:appinfo>
>   </xs:annotation>
> </xs:element>
>
> I should use this:
>
> <xs:element name='D' type="unsignedint4">
>   <xs:annotation>
>     <xs:appinfo source="http://www.ogf.org/dfdl/">
>       <dfdl:discriminator test="{(. eq 9) or (. eq 10)}" />
>     </xs:appinfo>
>   </xs:annotation>
> </xs:element>
>
> Would you explain the difference between those two, please?
>
> Would you provide an example of where the former would behave incorrectly
> whereas the latter would behave correctly, please?
>
> /Roger
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Thursday, June 13, 2024 2:30 PM
> *To:* users@daffodil.apache.org
> *Subject:* [EXT] Re: Warning: Counterintuitive placement detected
>
> Minor suggestion.
>
> Your second assert should be a discriminator.
>
> This would be a more precise expression of exactly the criteria for
> choosing this sequence as the data format.
>
> DFDL can use that to improve diagnostic behavior if both "assertions"
> hold, but something is wrong with the data for elements B, C, and D.
>
> There is even a small possibility of an incorrect false-positive parse
> happening that the discriminator rules out.
>
> On Thu, Jun 13, 2024 at 7:52 AM Roger L Costello <coste...@mitre.org>
> wrote:
>
> I found a solution that seems to work. See below. Any suggestions for
> improving it is welcome.
>
> <xs:element name="test">
>   <xs:complexType>
>     <xs:sequence>
>       <!--
>         The decimal value of the first bit of the input should be
>         wrapped in an <A> tag if the decimal value of the next 4 bits
>         equals 7 and the decimal value of the 4 bits after that equals 9,
>         or the decimal value of the next 4 bits equals 7 and the decimal
>         value of the 4 bits after that equals 10, otherwise the first bit
>         of the input should be wrapped in a <B> tag.
>
>         The below is not looking ahead. Assume the input is ACD.
>         If C is not equal to 7 and D is not equal to 9 or 10, then we
>         realize that we made an error, back up, and try the next branch
>         of the choice.
>       -->
>       <xs:choice>
>         <xs:sequence>
>           <xs:element name='A' type="unsignedint1"/>
>           <xs:element name='C' type="unsignedint4">
>             <xs:annotation>
>               <xs:appinfo source="http://www.ogf.org/dfdl/">
>                 <dfdl:assert>{. eq 7}</dfdl:assert>
>               </xs:appinfo>
>             </xs:annotation>
>           </xs:element>
>           <xs:element name='D' type="unsignedint4">
>             <xs:annotation>
>               <xs:appinfo source="http://www.ogf.org/dfdl/">
>                 <dfdl:assert>{(. eq 9) or (. eq 10)}</dfdl:assert>
>               </xs:appinfo>
>             </xs:annotation>
>           </xs:element>
>         </xs:sequence>
>         <xs:sequence>
>           <xs:element name='B' type="unsignedint1"/>
>           <xs:element name='C' type="unsignedint4"/>
>           <xs:element name='D' type="unsignedint4"/>
>         </xs:sequence>
>       </xs:choice>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
>
> -----Original Message-----
> From: Roger L Costello <coste...@mitre.org>
> Sent: Thursday, June 13, 2024 7:10 AM
> To: users@daffodil.apache.org
> Subject: Warning: Counterintuitive placement detected
>
> Hi Folks,
>
> I need my DFDL to look-ahead:
>
> The decimal value of the first bit of the input should be wrapped in an
> <A> tag if the decimal value of the next 4 bits equals 7 and the decimal
> value of the 4 bits after that equals 9, or the decimal value of the next 4
> bits equals 7 and the decimal value of the 4 bits after that equals 10,
> otherwise the first bit of the input should be wrapped in a <B> tag.
>
> Below is my attempt at implementing this. Daffodil gives this warning
> message:
>
> Schema Definition Warning: Counterintuitive placement detected. Wrap the
> discriminator or assert in an empty sequence to evaluate before the
> contents. (id: discouragedDiscriminatorPlacement)
>
> Here is my DFDL:
>
> <xs:element name="test">
>   <xs:complexType>
>     <xs:sequence>
>       <xs:choice>
>         <xs:sequence>
>           <xs:annotation>
>             <xs:appinfo source="http://www.ogf.org/dfdl/">
>               <dfdl:discriminator testKind="pattern"
>                                   testPattern="(.x7x9)|(.x7xA)" />
>             </xs:appinfo>
>           </xs:annotation>
>           <xs:element name='A' type="unsignedint1"/>
>         </xs:sequence>
>         <xs:sequence>
>           <xs:element name='B' type="unsignedint1"/>
>         </xs:sequence>
>       </xs:choice>
>       <xs:element name='C' type="unsignedint4"/>
>       <xs:element name='D' type="unsignedint4"/>
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
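
For reference, the toy schema near the top of this message is only shown with dfdl:assert. Below is a rough sketch of how its tag-A branch might look with the assertion changed to a discriminator. It keeps the toy example's shorthand (unprefixed XSD element names, dfdl:lengthKind="delimited" assumed) and uses the test attribute form of dfdl:discriminator that appears in Roger's quoted message. It is a fragment for illustration, not a complete or tested schema.

```xml
<sequence dfdl:separator=";">
  <element name="tag" type="xs:string">
    <annotation><appinfo source="http://www.ogf.org/dfdl/">
      <!-- When this test is true, the enclosing choice is resolved to this
           branch; a later failure (for example on 'num') does not backtrack
           to the sibling branches of that choice. -->
      <dfdl:discriminator test="{ . eq 'A' }"/>
    </appinfo></annotation>
  </element>
  <element name="num" type="xs:int"/>
</sequence>
```

The branches for tag B, tag C, and so on would presumably change the same way, each with its own tag value in the test.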