The only correction is to use a simpler test pattern: testPattern="(OPER)|(EXER)"
because you don't care what's after OPER or EXER. Just those characters
are enough for you to decide the optional element DOES exist. I claim
that's what the format usually means/intends when it describes data as
having unique initiator strings like this. You look for those characters
and only those to decide.

On Wed, Sep 27, 2023 at 8:53 AM Roger L Costello <coste...@mitre.org> wrote:

> Mike Beckerle wrote:
>
> - Design the DFDL schema to reject malformed data, not just accept
>   correct data.
>
> Oh, oh, yea!
>
> I like it!
>
> Not sure how to do that, however. Would you help me work through this,
> please?
>
> Mike points out, with this input:
>
>     Foobar
>     OPER/something not allowed//
>     Barfoo
>
> - Parsing the OPER line will fail, but then it will try parsing it as
>   an EXER line, which will also fail, so it will leave the whole wrapper
>   element out, and it will continue to try to parse the OPER line instead
>   of failing.
>
> Is this the behavior we desire:
>
> If an input line starts (is initiated by) OPER, then process the rest of
> the input line using the DFDL description of OPER. If, during the
> processing of the OPER field, an error arises, then the parser should
> display an error message, abandon the input line, proceed to the next
> input line and the element following the wrapper element.
>
> Is that the behavior we desire?
>
> Mike said that the solution is to:
>
> - Use dfdl:discriminator with testKind='pattern'
>
> I don't think that I've ever used that combination, so I did some
> experimenting.
>
> Suppose the legal value for the field following EXER is TANGO (all
> uppercase) and the legal value for the field following OPER is XRAY (all
> uppercase).
>
> Is this how to declare the wrapper element:
>
>     <xs:element name="OPER-EXER-wrapper" minOccurs="0">
>         <xs:annotation>
>             <xs:appinfo source="http://www.ogf.org/dfdl/">
>                 <dfdl:discriminator testKind="pattern"
>                     testPattern="(OPER/XRAY)|(EXER/TANGO)|"/>
>             </xs:appinfo>
>         </xs:annotation>
>         <xs:complexType>
>             <!-- OPER and EXER declarations -->
>         </xs:complexType>
>     </xs:element>
>
> Is that correct?
>
> This is great stuff. Once I grok this, my IQ will have increased another
> 10 points.
>
> /Roger
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Tuesday, September 26, 2023 4:49 PM
> *To:* users@daffodil.apache.org
> *Subject:* [EXT] Re: DFDL can increase your IQ by 10 points!
>
> There is another detail which will further improve your schema.
>
> What if the data contains an OPER line, but after the OPER characters
> there is some defect in the data of the OPER line?
>
>     foobar
>     OPER/something not allowed
>     barfoo
>
> Parsing the OPER line will fail, but then it will try parsing it as an
> EXER line, which will also fail, so it will leave the whole wrapper
> element out, and it will continue to try to parse the OPER line instead
> of failing. Your optional element gave it a way to suppress the error and
> parse differently.
>
> If the schema after this OPER/EXER element is, say, just a string, then
> "OPER/something not allowed" will be taken as the value of that string,
> and ... it's possible the parse will succeed and just produce an infoset
> that is perfectly valid according to the schema, but clearly the schema
> is allowing a solution we want to disallow.
>
> The fix here is your optionality needs a discriminator. The discriminator
> on the optional element you need checks that the data starts with OPER or
> EXER only.
>
> (use dfdl:discriminator with testKind='pattern').
>
> This issue is a matter of precision. It's the difference between:
>
> 1. It's either a fully correct OPER line, or a fully correct EXER
>    line, or it isn't present.
> 2. It's either a line that starts with OPER or a line that starts with
>    EXER or it isn't present.
>
> That distinction is designing the schema to properly reject malformed
> data, not just accept correct data.
>
> See in (1) above, it allows for faulty OPER or EXER lines to be correctly
> parsed as "it isn't present". The decision really should NOT depend on
> any more than the OPER or EXER characters being there.
>
> I find it hard to remember to do this. But most decisions in the schema
> need discriminators. I have to revisit every decision point in the schema
> one by one to make sure there are discriminators everywhere there can be.
>
> On Tue, Sep 26, 2023 at 10:13 AM Roger L Costello <coste...@mitre.org>
> wrote:
>
> Hi Folks,
>
> I think DFDL is awesome. Think about it: DFDL is a standard language for
> describing (describe, not parse) just about any data format. Again, I
> emphasize that it's not about how to parse the data format, it's about
> describing the data format. Given a description a DFDL processor can
> figure out how to parse instances of the data format. Wow!
>
> But there's another reason that DFDL is awesome: it forces you to be very
> precise in your description. It forces you to think very logically. It
> forces you to understand the implications of your description decisions.
> Let me give you an example of the latter.
>
> I am dealing with a data format that consists of a sequence of lines.
> Here's a sample instance:
>
>     John Doe
>     OPER/XRAY//
>     Sally Smith
>
> The first and last lines are just strings. Not interesting. The second
> line is the interesting one. Here's another instance:
>
>     John Doe
>     EXER/TANGO//
>     Sally Smith
>
> As you can see, the second line starts with either OPER or EXER and
> terminates with //. The second line is also optional.
> That is, the second line is either OPER, EXER, or neither. That leads
> one to this description:
>
>     choice
>         OPER (optional)
>         EXER (optional)
>
> However, DFDL doesn't allow branches of a choice to be optional. So, the
> correct description is:
>
>     choice
>         sequence
>             OPER (optional)
>         sequence
>             EXER (optional)
>
> Slick, aye?
>
> But not correct.
>
> Let's think about this. Suppose the input is this:
>
>     John Doe
>     EXER/TANGO//
>     Sally Smith
>
> While processing the second line, you would think that the DFDL processor
> would find that the first branch of the choice (the OPER branch) doesn't
> match and therefore the processor would process the line using the second
> branch. Ha! Not correct!
>
> The first branch is optional. That is key! Since the second line doesn't
> start with OPER, the DFDL processor thinks, "Oh, there must be no
> occurrences of the OPER line." So, the processor moves on to the
> description following the choice. Do you see it? Do you see the problem?
> I hope so. This is wicked cool. As I worked through this example, it
> forced me to think very, very clearly about the implication of an
> optional OPER line. So, what's the solution? Make OPER and EXER
> mandatory:
>
>     choice
>         sequence
>             OPER (mandatory)
>         sequence
>             EXER (mandatory)
>
> And, place the choice inside an optional wrapper element:
>
>     OPER-EXER-wrapper (optional)
>         choice
>             sequence
>                 OPER (mandatory)
>             sequence
>                 EXER (mandatory)
>
> Now, with this input:
>
>     John Doe
>     EXER/TANGO//
>     Sally Smith
>
> The processor will try the first branch of the choice, it fails, so it
> tries the second branch and succeeds.
>
> With this input:
>
>     John Doe
>     Sally Smith
>
> The processor will try the first branch of the choice, it fails, try the
> second branch, it fails, so there is no value for the wrapper element.
>
> This blows my mind. I feel like this example alone boosted my IQ by 10
> points.
>
> /Roger
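
Putting the pieces of this thread together, the wrapper element might be sketched roughly as follows. It combines the optional wrapper and the choice of mandatory branches from the original message with the simpler test pattern from the reply at the top of the thread. The child element declarations are assumptions for illustration only and do not come from the thread: the xs:string type and the dfdl:initiator and dfdl:terminator values are hypothetical, properties such as dfdl:lengthKind and dfdl:occursCountKind are assumed to be supplied by a default format defined elsewhere, and the restriction of the field to XRAY or TANGO is left out.

```xml
<xs:element name="OPER-EXER-wrapper" minOccurs="0">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- Decide presence from the leading characters only. -->
      <dfdl:discriminator testKind="pattern" testPattern="(OPER)|(EXER)"/>
    </xs:appinfo>
  </xs:annotation>
  <xs:complexType>
    <xs:choice>
      <xs:sequence>
        <!-- Hypothetical declaration: mandatory OPER line. -->
        <xs:element name="OPER" type="xs:string"
                    dfdl:initiator="OPER/" dfdl:terminator="//"/>
      </xs:sequence>
      <xs:sequence>
        <!-- Hypothetical declaration: mandatory EXER line. -->
        <xs:element name="EXER" type="xs:string"
                    dfdl:initiator="EXER/" dfdl:terminator="//"/>
      </xs:sequence>
    </xs:choice>
  </xs:complexType>
</xs:element>
```

Under these assumptions, a line that begins with OPER or EXER commits the parser to the wrapper element, so a defect later in that line should surface as a parse error instead of being quietly treated as "the element isn't present".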