Relaxing minOccurs/maxOccurs seems problematic to me. Lots of arrays are parsed by forward speculation up to a maximum, and parsing of the array stops once maxOccurs is reached. (This is what dfdl:occursCountKind="implicit" does.)
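As a minimal sketch of the occursCountKind="implicit" behavior described above: the fragment below is not taken from the EDIFACT schemas. The element name RFF, its xsd:string type, and the bounds 0..9 are assumptions made up for illustration, and the xsd: and dfdl: prefixes are assumed to be bound to the XML Schema and DFDL 1.0 namespaces with the usual format properties in scope.

```xml
<!-- Hypothetical fragment: element name, type, and bounds are illustrative only. -->
<xsd:sequence>
  <!-- occursCountKind="implicit": occurrences of RFF are parsed one at a
       time by speculation. The array ends at the first occurrence that
       fails to parse, and no more than maxOccurs (9 here) are attempted.
       With minOccurs="0", finding no occurrence at all is not an error. -->
  <xsd:element name="RFF" type="xsd:string"
               minOccurs="0" maxOccurs="9"
               dfdl:occursCountKind="implicit"/>
</xsd:sequence>
```

With bounds like these left in place, the bounds themselves tell the parser where the array ends, which is why changing them can change the shape of the parsed infoset.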
For optional elements (minOccurs 0, maxOccurs 1), this behavior is particularly important.

When you say "occurrence constraint errors are not recoverable", I'm not sure I understand what you mean. If something is minOccurs="1" maxOccurs="1", i.e., a scalar element, then yes, not finding it is a parse error. But for all other combinations of min/max occurs, the behavior depends on dfdl:occursCountKind.

If you just put back the original min/max occurs, what exactly is happening to make you think you need to relax those?

A dfdl:assert statement with failureType 'recoverableError' generates a warning, aka a validation error, and doesn't interact with parser behavior (i.e., backtracking) at all.

Are you using these 'recoverableError' asserts for your enhanced validation rules?

On Sat, Aug 19, 2023 at 7:17 AM Claude Mamo <claude.m...@gmail.com> wrote:

> As suggested, I'm attempting to validate the structure of the EDIFACT
> document with DFDL assertions instead of Schematron. One thing I observed
> is that I need to relax *maxOccurs* (i.e., unbounded) and *minOccurs*
> (i.e., 0), otherwise the assertion rules won't be evaluated since occurrence
> constraint errors are not recoverable (I imagine that it would be the same
> case for Schematron). However, when I relax the constraints, the parsed
> structure changes from:
>
> <SegGrp-3>
>   <RFF-18660>
>     <C506>
>       <E1153>VA</E1153>
>       <E1154>UK19430839</E1154>
>     </C506>
>   </RFF-18660>
>   <RFF-18660>
>     <C506>
>       <E1153>ADE</E1153>
>       <E1154>00000767</E1154>
>     </C506>
>   </RFF-18660>
> </SegGrp-3>
>
> to
>
> <SegGrp-3>
>   <RFF>
>     <C506>
>       <E1153>VA</E1153>
>       <E1154>UK19430839</E1154>
>     </C506>
>   </RFF>
> </SegGrp-3>
> <SegGrp-3>
>   <RFF>
>     <C506>
>       <E1153>ADE</E1153>
>       <E1154>00000767</E1154>
>     </C506>
>   </RFF>
> </SegGrp-3>
>
> I'd rather avoid making breaking changes to the structure so I decided to
> have two flavours of EDIFACT messages: strict and lax.
> A choice element first attempts to parse the message using the strict
> schema and then falls back to the lax schema if parsing on the strict one
> fails.
>
> ...
> ...
> <xsd:sequence dfdl:choiceBranchKey="INVOIC">
>   <xsd:choice>
>     <xsd:sequence>
>       <xsd:element ref="D03B:INVOIC"/>
>     </xsd:sequence>
>     <xsd:sequence>
>       <xsd:element ref="D03B:Bad-INVOIC"/>
>     </xsd:sequence>
>   </xsd:choice>
> </xsd:sequence>
> ...
> ...
>
> The recoverable assertions are all defined within the *Bad-INVOIC* type
> and, where possible, the occurrence constraints are relaxed within this
> element type. Does what I wrote make sense, or do you think there might
> be a better way to implement this?
>
> Claude
>
> On Sun, Aug 13, 2023 at 12:31 PM Claude Mamo <claude.m...@gmail.com>
> wrote:
>
>>> Schematron is really only needed for really rich validation rules that
>>> use the tree-walking capabilities of XPath to scrutinize elements wherever
>>> they appear in the infoset tree.
>>>
>>
>> I'll give it a try with dfdl:assert and see how it goes.
>>
>> Thanks for all the feedback!
>>
>> Claude
>>
>> On Mon, Jul 24, 2023 at 11:35 PM Mike Beckerle <mbecke...@apache.org>
>> wrote:
>>
>>> Something to consider:
>>>
>>> I think many useful validation checks can be expressed in DFDL's
>>> expression language using the dfdl:assert statement with
>>> failureType='recoverableError'.
>>>
>>> The sort of constraints that say if this element exists then that can't
>>> exist, or if this has a specific value then that must exist... those sorts
>>> of things can usually be expressed.
>>>
>>> Those are run in an incremental/streaming fashion as the parser
>>> traverses the data based on the schema.
>>>
>>> Recoverable errors from Daffodil are the same as validation errors from
>>> Daffodil's internal "limited" validation. They don't guide the parse (don't
>>> cause backtracking), but come out as diagnostic warnings.
>>>
>>> Schematron is really only needed for really rich validation rules that
>>> use the tree-walking capabilities of XPath to scrutinize elements wherever
>>> they appear in the infoset tree.
>>>
>>> On Mon, Jul 24, 2023 at 7:47 AM Steve Lawrence <slawre...@apache.org>
>>> wrote:
>>>
>>>> This is correct. The way daffodil currently implements full validation
>>>> (xerces) and custom validation (e.g. schematron) is pretty inefficient.
>>>> We create two infosets: one of the kind that the user passed to the parse
>>>> function, and one that is text XML written to a ByteArrayOutputStream in
>>>> memory that is used internally for the validation once the parse is
>>>> completed. We do not currently stream validation.
>>>>
>>>> If you wanted streaming, you would probably need to create a custom
>>>> InfosetOutputter, or maybe use the SAXInfosetOutputter with an XMLReader
>>>> that chains/tees SAX events to custom schematron validation.
>>>>
>>>> - Steve
>>>>
>>>> On 2023-07-22 03:29 AM, Claude Mamo wrote:
>>>> > Spotted this code so presumably it's not streaming when custom or full
>>>> > validation is in force:
>>>> > https://github.com/apache/daffodil/blob/main/daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/DataProcessor.scala#L345-L356
>>>> >
>>>> > Claude
>>>> >
>>>> > On Sat, Jul 22, 2023 at 8:07 AM Claude Mamo <claude.m...@gmail.com> wrote:
>>>> >
>>>> > Hello Daffodil team,
>>>> >
>>>> > I'm looking into adding support for Schematron validation since we
>>>> > have had many Smooks developers asking for better validation of
>>>> > EDIFACT documents. One question I have is whether Schematron
>>>> > validation is applied in a streaming fashion.
>>>> > I mean, does Daffodil
>>>> > load the whole infoset into memory before applying the Schematron
>>>> > rules or is Schematron validating on the fly while accumulating any
>>>> > state that is required to be able to evaluate the rules?
>>>> >
>>>> > Thanks,
>>>> >
>>>> > Claude
>>>> >
>>>>