Re: Design DFDL Schemas to Reject Malformed Data, Not Just Accept Correct Data [Was: DFDL can increase your IQ by 10 points!]

Mike Beckerle Wed, 27 Sep 2023 06:45:08 -0700

The only correction is to use a simpler test pattern:

testPattern="(OPER)|(EXER)"


because you don't care what's after OPER or EXER. Just those characters are
enough for you to decide the optional element DOES exist. I claim that's
what the format usually means/intends when it describes data as having
unique initiator strings like this. You look for those characters and only
those to decide.
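
For illustration, here is a minimal sketch of the whole wrapper element with that simpler discriminator. It is hypothetical, not a schema from this thread. It assumes the xs: and dfdl: prefixes are bound, a textual dfdl:format default (encoding and so on) is in scope, and OPER and EXER lines look like OPER/XRAY// and EXER/TANGO//. The OPER/ and EXER/ initiators and the // terminator are inferred from those sample lines.

```xml
<xs:element name="OPER-EXER-wrapper" minOccurs="0">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- Pattern discriminator: matched against the data where the wrapper
           starts, before the wrapper is parsed. Only the leading characters
           matter. If they are OPER or EXER, the wrapper is known to be
           present, so a defect later in the line is reported as a parse
           failure instead of being absorbed as "wrapper absent". -->
      <dfdl:discriminator testKind="pattern" testPattern="(OPER)|(EXER)"/>
    </xs:appinfo>
  </xs:annotation>
  <xs:complexType>
    <xs:choice>
      <!-- Assumed framing, inferred from lines like OPER/XRAY// -->
      <xs:element name="OPER" type="xs:string" dfdl:lengthKind="delimited"
                  dfdl:initiator="OPER/" dfdl:terminator="//"/>
      <xs:element name="EXER" type="xs:string" dfdl:lengthKind="delimited"
                  dfdl:initiator="EXER/" dfdl:terminator="//"/>
    </xs:choice>
  </xs:complexType>
</xs:element>
```

If the next line starts with neither OPER nor EXER, the discriminator is false and that attempt fails. The optional wrapper then simply has zero occurrences.
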


On Wed, Sep 27, 2023 at 8:53 AM Roger L Costello <coste...@mitre.org> wrote:

> Mike Beckerle wrote:
>
>
>
>    - Design the DFDL schema to reject malformed data, not just accept
>    correct data.
>
>
>
> Oh, oh, yea!
>
>
>
> I like it!
>
>
>
> Not sure how to do that, however. Would you help me work through this,
> please?
>
>
>
> Mike points out, with this input:
>
>
>
>    foobar
>    OPER/something not allowed//
>    barfoo
>
>    Parsing the OPER line will fail, but then it will try parsing it as
>    an EXER line, which will also fail, so it will leave the whole wrapper
>    element out, and it will continue to try to parse the OPER line instead of
>    failing.
>
>
>
> Is this the behavior we desire:
>
>
>
> If an input line starts (is initiated by) OPER, then process the rest of
> the input line using the DFDL description of OPER. If, during the
> processing of the OPER field, an error arises, then the parser should
> display an error message, abandon the input line, proceed to the next input
> line and the element following the wrapper element.
>
>
>
> Is that the behavior we desire?
>
>
>
> Mike said that the solution is to:
>
>
>
>    - Use dfdl:discriminator with testKind='pattern'
>
>
>
> I don’t think that I’ve ever used that combination, so I did some
> experimenting.
>
>
>
> Suppose the legal value for the field following EXER is TANGO (all
> uppercase) and the legal value for the field following OPER is XRAY (all
> uppercase).
>
>
>
> Is this how to declare the wrapper element:
>
>
>
> <xs:element name="OPER-EXER-wrapper" minOccurs="0">
>     <xs:annotation>
>         <xs:appinfo source="http://www.ogf.org/dfdl/">
>             <dfdl:discriminator testKind="pattern"
>                 testPattern="(OPER/XRAY)|(EXER/TANGO)|"/>
>         </xs:appinfo>
>     </xs:annotation>
>     <xs:complexType>
>         <!-- OPER and EXER declarations -->
>     </xs:complexType>
> </xs:element>
>
>
>
> Is that correct?
>
>
>
> This is great stuff. Once I grok this, my IQ will have increased another
> 10 points.
>
>
>
> /Roger
>
>
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Tuesday, September 26, 2023 4:49 PM
> *To:* users@daffodil.apache.org
> *Subject:* [EXT] Re: DFDL can increase your IQ by 10 points!
>
>
>
>
> There is another detail which will further improve your schema.
>
>
>
> What if the data contains an OPER line, but there is some defect in the
> rest of that line after the OPER characters?
>
>
>
> foobar
>
> OPER/something not allowed
>
> barfoo
>
>
>
> Parsing the OPER line will fail, but then it will try parsing it as an
> EXER line, which will also fail, so it will leave the whole wrapper element
> out and continue, trying to parse the OPER line as whatever follows the
> wrapper, instead of failing. Your optional element gave it a way to
> suppress the error and parse differently.
>
>
>
> If the schema after this OPER/EXER element is, say, just a string, then
> "OPER/something not allowed" will be taken as the value of that string, and
> ... it's possible the parse will succeed and just produce an infoset that
> is perfectly valid according to the schema, but clearly the schema is
> allowing a solution we want to disallow.
>
>
>
> The fix here is that your optionality needs a discriminator. The
> discriminator you need on the optional element checks only that the data
> starts with OPER or EXER (use dfdl:discriminator with testKind='pattern').
>
>
>
> This issue is a matter of precision. It's the difference between:
>
>    1. It's either a fully correct OPER line, or a fully correct EXER
>    line, or it isn't present.
>    2. It's either a line that starts with OPER or a line that starts with
>    EXER or it isn't present.
>
> Getting that distinction right is what it means to design the schema to
> properly reject malformed data, not just accept correct data.
>
>
>
> Notice that (1) above allows a faulty OPER or EXER line to be successfully
> parsed as "it isn't present". The decision really should NOT depend on
> anything more than the OPER or EXER characters being there.
>
>
>
> I find it hard to remember to do this. But most decisions in the schema
> need discriminators. I have to revisit every decision point in the schema
> one by one to make sure there are discriminators everywhere there can be.
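
Applied to the choice inside the wrapper, revisiting each decision point might look like the hypothetical fragment below. The OPER/ and EXER/ initiators, the // terminator, and the "only XRAY / only TANGO" rule are assumptions taken from the example data in this thread; prefix bindings and format defaults are assumed to be in scope. Each branch is a sequence carrying a pattern discriminator, so once OPER is seen the parser commits to the OPER branch and does not fall through to EXER. A dfdl:assert on the element then rejects a bad value at parse time.

```xml
<xs:choice>
  <xs:sequence>
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <!-- Decision point: commit to this branch as soon as OPER is seen -->
        <dfdl:discriminator testKind="pattern" testPattern="OPER"/>
      </xs:appinfo>
    </xs:annotation>
    <xs:element name="OPER" type="xs:string" dfdl:lengthKind="delimited"
                dfdl:initiator="OPER/" dfdl:terminator="//">
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
          <!-- Assumed rule from the example: the only legal value is XRAY -->
          <dfdl:assert test="{ . eq 'XRAY' }"
                       message="OPER value must be XRAY"/>
        </xs:appinfo>
      </xs:annotation>
    </xs:element>
  </xs:sequence>
  <xs:sequence>
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <!-- Decision point: commit to this branch as soon as EXER is seen -->
        <dfdl:discriminator testKind="pattern" testPattern="EXER"/>
      </xs:appinfo>
    </xs:annotation>
    <xs:element name="EXER" type="xs:string" dfdl:lengthKind="delimited"
                dfdl:initiator="EXER/" dfdl:terminator="//">
      <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
          <!-- Assumed rule from the example: the only legal value is TANGO -->
          <dfdl:assert test="{ . eq 'TANGO' }"
                       message="EXER value must be TANGO"/>
        </xs:appinfo>
      </xs:annotation>
    </xs:element>
  </xs:sequence>
</xs:choice>
```

The discriminators sit on the branch sequences and the asserts on the elements, so no single component carries both kinds of statement.
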
>
>
>
> On Tue, Sep 26, 2023 at 10:13 AM Roger L Costello <coste...@mitre.org>
> wrote:
>
> Hi Folks,
>
> I think DFDL is awesome. Think about it: DFDL is a standard language for
> describing (describe, not parse) just about any data format. Again, I
> emphasize that it's not about how to parse the data format, it's about
> describing the data format. Given a description, a DFDL processor can figure
> out how to parse instances of the data format. Wow!
>
> But there's another reason that DFDL is awesome: it forces you to be very
> precise in your description. It forces you to think very logically. It
> forces you to understand the implications of your description decisions.
> Let me give you an example of the latter.
>
> I am dealing with a data format that consists of a sequence of lines.
> Here's a sample instance:
>
> John Doe
> OPER/XRAY//
> Sally Smith
>
> The first and last lines are just strings. Not interesting. The second
> line is the interesting one. Here's another instance:
>
> John Doe
> EXER/TANGO//
> Sally Smith
>
> As you can see, the second line starts with either OPER or EXER and
> terminates with //. The second line is also optional. That is, the second
> line is either OPER, EXER, or neither. That leads one to this description:
>
> choice
>       OPER (optional)
>       EXER (optional)
>
> However, DFDL doesn't allow branches of a choice to be optional. So, the
> correct description is:
>
> choice
>       sequence
>             OPER (optional)
>       sequence
>             EXER (optional)
>
> Slick, aye?
>
> But not correct.
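
Written as a DFDL fragment, the choice-of-sequences design above might look roughly like this. It is a sketch under assumptions: the OPER/ and EXER/ initiators and the // terminator are inferred from the sample lines, and prefix bindings plus format defaults are assumed to be in scope. The comments trace the failure described next.

```xml
<xs:choice>
  <xs:sequence>
    <!-- OPER is optional. On an EXER line, zero occurrences of OPER is a
         successful parse, so this branch succeeds with nothing in it and
         the choice is satisfied... -->
    <xs:element name="OPER" type="xs:string" minOccurs="0"
                dfdl:lengthKind="delimited"
                dfdl:initiator="OPER/" dfdl:terminator="//"/>
  </xs:sequence>
  <xs:sequence>
    <!-- ...so this branch is never tried, even when the line is EXER/TANGO// -->
    <xs:element name="EXER" type="xs:string" minOccurs="0"
                dfdl:lengthKind="delimited"
                dfdl:initiator="EXER/" dfdl:terminator="//"/>
  </xs:sequence>
</xs:choice>
```
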
>
> Let's think about this. Suppose the input is this:
>
> John Doe
> EXER/TANGO//
> Sally Smith
>
> While processing the second line, you would think that the DFDL processor
> would find that the first branch of the choice (the OPER branch) doesn't
> match and therefore the processor would process the line using the second
> branch. Ha! Not correct!
>
> The first branch is optional. That is key! Since the second line doesn't
> start with OPER, the DFDL processor thinks, "Oh, there must be no
> occurrences of the OPER line." So, the processor moves on to the
> description following the choice. Do you see it? Do you see the problem? I
> hope so. This is wicked cool. As I worked through this example, it forced
> me to think very, very clearly about the implication of an optional OPER
> line. So, what's the solution? Make OPER and EXER mandatory:
>
>  choice
>       sequence
>             OPER (mandatory)
>       sequence
>             EXER (mandatory)
>
> And, place the choice inside an optional wrapper element:
>
> OPER-EXER-wrapper (optional)
>       choice
>             sequence
>                   OPER (mandatory)
>             sequence
>                   EXER (mandatory)
>
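
A minimal sketch of that wrapper-plus-mandatory-branches structure in DFDL, again assuming OPER/ and EXER/ initiators, a // terminator, and prefix bindings and format defaults in scope. It deliberately has no discriminators yet: the choice and the optional wrapper decide purely by whether a whole branch parses successfully.

```xml
<xs:element name="OPER-EXER-wrapper" minOccurs="0">
  <xs:complexType>
    <xs:choice>
      <xs:sequence>
        <!-- Mandatory inside its branch: a non-OPER line makes this branch fail -->
        <xs:element name="OPER" type="xs:string" dfdl:lengthKind="delimited"
                    dfdl:initiator="OPER/" dfdl:terminator="//"/>
      </xs:sequence>
      <xs:sequence>
        <!-- Tried only after the OPER branch has failed -->
        <xs:element name="EXER" type="xs:string" dfdl:lengthKind="delimited"
                    dfdl:initiator="EXER/" dfdl:terminator="//"/>
      </xs:sequence>
    </xs:choice>
  </xs:complexType>
</xs:element>
```

Without a discriminator, an OPER line with a defect after the OPER characters makes both branches fail, and the wrapper is quietly treated as absent rather than reported as malformed.
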
> Now, with this input:
>
> John Doe
> EXER/TANGO//
> Sally Smith
>
> The processor will try the first branch of the choice, which fails, so it
> tries the second branch, which succeeds.
>
> With this input:
>
> John Doe
> Sally Smith
>
> The processor will try the first branch of the choice, which fails, then the
> second branch, which also fails, so the wrapper element is absent.
>
> This blows my mind. I feel like this example alone boosted my IQ by 10
> points.
>
> /Roger
>
>
