> When you say "occurrence constraint errors are not recoverable", I'm not
> sure I understand what you mean. If something is minOccurs="1"
> maxOccurs="1" i.e., a scalar element, then yes, not finding it is a parse
> error. But for all other combinations of min/max occurs, the behavior
> depends on dfdl:occursCountKind.

I had a misconception about how min/maxOccurs behave in DFDL. The
occursCountKind attribute is new to me, but now I've realised that I can
ditch this strict vs lax schema approach. A lot of my problems can be solved
simply by changing occursCountKind from "implicit" to "parsed" for the
EDIFACT segments (the DFDL schema was based on
https://github.com/DFDLSchemas/EDIFACT).

Cheers!

Claude

On Thu, Aug 24, 2023 at 5:24 PM Mike Beckerle <mbecke...@apache.org> wrote:

> Relaxing the min/maxOccurs seems problematic to me. Lots of things parse
> up to a maximum by forward speculation, but stop when maxOccurs is reached.
> (This is what dfdl:occursCountKind="implicit" does)
>
> For optional elements (minOccurs 0, maxOccurs 1), this behavior is
> particularly important.
>
> When you say "occurrence constraint errors are not recoverable", I'm not
> sure I understand what you mean. If something is minOccurs="1"
> maxOccurs="1" i.e., a scalar element, then yes, not finding it is a parse
> error. But for all other combinations of min/max occurs, the behavior
> depends on dfdl:occursCountKind.
>
> If you just put back the original min/max occurs, what exactly is
> happening to make you think you need to relax those?
>
> A dfdl:assert statement of kind 'recoverableError' generates a warning aka
> validation error, and doesn't interact with parser-behavior (i.e.,
> backtracking) at all.
>
> Are you using these 'recoverableError' asserts for your enhanced
> validation rules?
>
>
> On Sat, Aug 19, 2023 at 7:17 AM Claude Mamo <claude.m...@gmail.com> wrote:
>
>> As suggested, I'm attempting to validate the structure of the EDIFACT
>> document with DFDL assertions instead of Schematron. One thing I observed
>> is that I need to relax *maxOccurs* (i.e., unbounded) and *minOccurs*
>> (i.e., 0) otherwise the assertion rules won't be evaluated since occurrence
>> constraint errors are not recoverable (I imagine that it would be the same
>> case for Schematron).
>> However, when I relax the constraints, the parsed
>> structure changes from:
>>
>> <SegGrp-3>
>>   <RFF-18660>
>>     <C506>
>>       <E1153>VA</E1153>
>>       <E1154>UK19430839</E1154>
>>     </C506>
>>   </RFF-18660>
>>   <RFF-18660>
>>     <C506>
>>       <E1153>ADE</E1153>
>>       <E1154>00000767</E1154>
>>     </C506>
>>   </RFF-18660>
>> </SegGrp-3>
>>
>> to
>>
>> <SegGrp-3>
>>   <RFF>
>>     <C506>
>>       <E1153>VA</E1153>
>>       <E1154>UK19430839</E1154>
>>     </C506>
>>   </RFF>
>> </SegGrp-3>
>> <SegGrp-3>
>>   <RFF>
>>     <C506>
>>       <E1153>ADE</E1153>
>>       <E1154>00000767</E1154>
>>     </C506>
>>   </RFF>
>> </SegGrp-3>
>>
>> I'd rather avoid making breaking changes to the structure so I decided to
>> have two flavours of EDIFACT messages: strict and lax. A choice element
>> first attempts to parse the message using the strict schema and then falls
>> back to the lax schema if parsing on the strict one fails.
>>
>> ...
>> ...
>> <xsd:sequence dfdl:choiceBranchKey="INVOIC">
>>   <xsd:choice>
>>     <xsd:sequence>
>>       <xsd:element ref="D03B:INVOIC"/>
>>     </xsd:sequence>
>>     <xsd:sequence>
>>       <xsd:element ref="D03B:Bad-INVOIC"/>
>>     </xsd:sequence>
>>   </xsd:choice>
>> </xsd:sequence>
>> ...
>> ...
>>
>> The recoverable assertions are all defined within the *Bad-INVOIC* type
>> and, where possible, the occurrence constraints are relaxed within this
>> element type. Does what I wrote make sense, or do you think there might
>> be a better way to implement this?
>>
>> Claude
>>
>> On Sun, Aug 13, 2023 at 12:31 PM Claude Mamo <claude.m...@gmail.com>
>> wrote:
>>
>>>> Schematron is really only needed for really rich validation rules that
>>>> use the tree-walking capabilities of XPath to scrutinize elements wherever
>>>> they appear in the infoset tree.
>>>>
>>>
>>> I'll give it a try with dfdl:assert and see how it goes.
>>>
>>> Thanks for all the feedback!
>>>
>>> Claude
>>>
>>> On Mon, Jul 24, 2023 at 11:35 PM Mike Beckerle <mbecke...@apache.org>
>>> wrote:
>>>
>>>> Something to consider:
>>>>
>>>> I think many useful validation checks can be expressed in DFDL's
>>>> expression language using the dfdl:assert statement with
>>>> failureType='recoverableError'.
>>>>
>>>> The sort of constraints that say if this element exists then that can't
>>>> exist, or if this has a specific value then that must exist... those sorts
>>>> of things can usually be expressed.
>>>>
>>>> Those are run in an incremental/streaming fashion as the parser
>>>> traverses the data based on the schema.
>>>>
>>>> Recoverable errors from Daffodil are the same as validation errors from
>>>> Daffodil's internal "limited" evaluation. They don't guide the parse (don't
>>>> cause backtracking), but come out as diagnostic warnings.
>>>>
>>>> Schematron is really only needed for really rich validation rules that
>>>> use the tree-walking capabilities of XPath to scrutinize elements wherever
>>>> they appear in the infoset tree.
>>>>
>>>>
>>>> On Mon, Jul 24, 2023 at 7:47 AM Steve Lawrence <slawre...@apache.org>
>>>> wrote:
>>>>
>>>>> This is correct. The way daffodil currently implements full validation
>>>>> (xerces) and custom validation (e.g. schematron) is pretty inefficient.
>>>>> We create two infosets: one the kind that the user passed to the parse
>>>>> function, and one that is text XML written to a ByteArrayOutputStream in
>>>>> memory that is used internally for the validation once the parse is
>>>>> completed. We do not currently stream validation.
>>>>>
>>>>> If you wanted streaming, you would probably need to create a custom
>>>>> InfosetOutputter, or maybe use the SAXInfosetOutputter with an XMLReader
>>>>> that chains/tees SAX events to custom schematron validation.
>>>>>
>>>>> - Steve
>>>>>
>>>>> On 2023-07-22 03:29 AM, Claude Mamo wrote:
>>>>> > Spotted this code so presumably it's not streaming when custom or full
>>>>> > validation is in force:
>>>>> > https://github.com/apache/daffodil/blob/main/daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/DataProcessor.scala#L345-L356
>>>>> >
>>>>> > Claude
>>>>> >
>>>>> > On Sat, Jul 22, 2023 at 8:07 AM Claude Mamo <claude.m...@gmail.com> wrote:
>>>>> >
>>>>> > Hello Daffodil team,
>>>>> >
>>>>> > I'm looking into adding support for Schematron validation since we
>>>>> > have had many Smooks developers asking for better validation of
>>>>> > EDIFACT documents. One question I have is whether Schematron
>>>>> > validation is applied in a streaming fashion. I mean, does Daffodil
>>>>> > load the whole infoset into memory before applying the Schematron
>>>>> > rules or is Schematron validating on the fly while accumulating any
>>>>> > state that is required to be able to evaluate the rules?
>>>>> >
>>>>> > Thanks,
>>>>> >
>>>>> > Claude
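
For readers landing on this thread later, here is a rough sketch of the two ideas discussed above: switching dfdl:occursCountKind from "implicit" to "parsed", and attaching a dfdl:assert with failureType="recoverableError". It is only an illustration. The type names (SegGrp3Type, RFFType), the occurrence bounds, and the assertion itself are assumptions made up for this example and are not taken from the DFDLSchemas/EDIFACT schema. The fragment also omits the dfdl:format defaults and the RFFType definition that a working schema would need.

```xml
<!-- Sketch only. SegGrp3Type, RFFType, the occurrence bounds, and the
     assertion are hypothetical and chosen purely for illustration. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
            xmlns:fn="http://www.w3.org/2005/xpath-functions">

  <xsd:complexType name="SegGrp3Type">
    <xsd:sequence>
      <!-- With "implicit" the parser stops speculating once maxOccurs is
           reached. With "parsed" it keeps parsing occurrences for as long
           as they parse, and minOccurs/maxOccurs are left to validation. -->
      <xsd:element name="RFF" type="RFFType"
                   minOccurs="0" maxOccurs="1"
                   dfdl:occursCountKind="parsed">
        <xsd:annotation>
          <xsd:appinfo source="http://www.ogf.org/dfdl/">
            <!-- A recoverable error is reported as a diagnostic and does
                 not cause backtracking. -->
            <dfdl:assert failureType="recoverableError"
                         message="E1153 must not be empty"
                         test="{ fn:string-length(./C506/E1153) gt 0 }"/>
          </xsd:appinfo>
        </xsd:annotation>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>
```

Under these assumptions the original minOccurs/maxOccurs values can stay in the schema, so the infoset shape does not change, and occurrences beyond maxOccurs would surface as validation errors rather than stopping the parse.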