Re: DFDL can increase your IQ by 10 points!

Mike Beckerle Tue, 26 Sep 2023 13:51:19 -0700

There is another detail which will further improve your schema.

What if the data contains an OPER line, but after the OPER characters there
is some defect in the rest of the line? For example:

foobar
OPER/something not allowed
barfoo

Parsing the OPER line will fail. The processor will then try parsing it as
an EXER line, which will also fail, so it will leave the whole wrapper
element out and carry on parsing that same line with whatever comes next in
the schema, instead of reporting an error. Your optional element gave it a
way to suppress the error and parse differently.

If what follows this OPER/EXER element in the schema is, say, just a
string, then "OPER/something not allowed" will be taken as the value of
that string, and it's possible the parse will succeed and produce an
infoset that is perfectly valid according to the schema. Clearly the schema
is allowing a solution we want to disallow.

The fix here is that your optionality needs a discriminator. The
discriminator you need on the optional element checks only that the data
starts with OPER or EXER (use dfdl:discriminator with testKind='pattern').
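
As a rough, untested sketch of what that could look like on the wrapper
element from the quoted message: the element names come from that message,
while the xs: and dfdl: prefix bindings, the global OPER and EXER element
declarations, and the exact pattern are assumptions made for illustration.

```xml
<!-- Sketch only. Assumes xs: and dfdl: are bound to the XML Schema and DFDL
     namespaces, and that OPER and EXER are declared elsewhere as global
     elements. -->
<xs:element name="OPER-EXER-wrapper" minOccurs="0">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- Once the data at this position starts with OPER or EXER, the
           processor is committed to the wrapper being present. A later
           failure inside it is then a parse error, not "absent". -->
      <dfdl:discriminator testKind="pattern" testPattern="OPER|EXER"/>
    </xs:appinfo>
  </xs:annotation>
  <xs:complexType>
    <xs:choice>
      <xs:sequence>
        <xs:element ref="OPER"/>
      </xs:sequence>
      <xs:sequence>
        <xs:element ref="EXER"/>
      </xs:sequence>
    </xs:choice>
  </xs:complexType>
</xs:element>
```

With something like this in place, "OPER/something not allowed" should make
the parse fail rather than fall through to the string that follows, while a
line starting with anything else still leaves the wrapper out.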

This issue is a matter of precision. It's the difference between:

   1. It's either a fully correct OPER line, or a fully correct EXER line,
   or it isn't present.
   2. It's either a line that starts with OPER or a line that starts with
   EXER or it isn't present.

That distinction is about designing the schema to properly reject malformed
data, not just accept correct data.

Notice that (1) above allows faulty OPER or EXER lines to be parsed, without
any error, as "it isn't present". The decision really should NOT depend on
anything more than the OPER or EXER characters being there.

I find it hard to remember to do this. But most decisions in the schema
need discriminators. I have to revisit every decision point in the schema
one by one to make sure there are discriminators everywhere there can be.

On Tue, Sep 26, 2023 at 10:13 AM Roger L Costello <coste...@mitre.org>
wrote:

> Hi Folks,
>
> I think DFDL is awesome. Think about it: DFDL is a standard language for
> describing (describe, not parse) just about any data format. Again, I
> emphasize that it's not about how to parse the data format, it's about
> describing the data format. Given a description a DFDL processor can figure
> out how to parse instances of the data format. Wow!
>
> But there's another reason that DFDL is awesome: it forces you to be very
> precise in your description. It forces you to think very logically. It
> forces you to understand the implications of your description decisions.
> Let me give you an example of the latter.
>
> I am dealing with a data format that consists of a sequence of lines.
> Here's a sample instance:
>
> John Doe
> OPER/XRAY//
> Sally Smith
>
> The first and last lines are just strings. Not interesting. The second
> line is the interesting one. Here's another instance:
>
> John Doe
> EXER/TANGO//
> Sally Smith
>
> As you can see, the second line starts with either OPER or EXER and
> terminates with //. The second line is also optional. That is, the second
> line is either OPER, EXER, or neither. That leads one to this description:
>
> choice
>       OPER (optional)
>       EXER (optional)
>
> However, DFDL doesn't allow branches of a choice to be optional. So, the
> correct description is:
>
> choice
>       sequence
>             OPER (optional)
>       sequence
>             EXER (optional)
>
> Slick, aye?
>
> But not correct.
>
> Let's think about this. Suppose the input is this:
>
> John Doe
> EXER/TANGO//
> Sally Smith
>
> While processing the second line, you would think that the DFDL processor
> would find that the first branch of the choice (the OPER branch) doesn't
> match and therefore the processor would process the line using the second
> branch. Ha! Not correct!
>
> The first branch is optional. That is key! Since the second line doesn't
> start with OPER, the DFDL processor thinks, "Oh, there must be no
> occurrences of the OPER line." So, the processor moves on to the
> description following the choice. Do you see it? Do you see the problem? I
> hope so. This is wicked cool. As I worked through this example, it forced
> me to think very, very clearly about the implication of an optional OPER
> line. So, what's the solution? Make OPER and EXER mandatory:
>
>  choice
>       sequence
>             OPER (mandatory)
>       sequence
>             EXER (mandatory)
>
> And, place the choice inside an optional wrapper element:
>
> OPER-EXER-wrapper (optional)
>       choice
>             sequence
>                   OPER (mandatory)
>             sequence
>                   EXER (mandatory)
>
> Now, with this input:
>
> John Doe
> EXER/TANGO//
> Sally Smith
>
> The processor will try the first branch of the choice, it fails, so it
> tries the second branch and succeeds.
>
> With this input:
>
> John Doe
> Sally Smith
>
> The processor will try the first branch of the choice, it fails, try the
> second branch, it fails, so there is no value for the wrapper element.
>
> This blows my mind. I feel like this example alone boosted my IQ by 10
> points.
>
> /Roger
>
>
