As suggested, I'm attempting to validate the structure of the EDIFACT document with DFDL assertions instead of Schematron. One thing I observed is that I need to relax *maxOccurs* (i.e., unbounded) and *minOccurs* (i.e., 0); otherwise the assertion rules won't be evaluated, since occurrence constraint errors are not recoverable (I imagine it would be the same for Schematron). However, when I relax the constraints, the parsed structure changes from:

<SegGrp-3>
  <RFF-18660>
    <C506>
      <E1153>VA</E1153>
      <E1154>UK19430839</E1154>
    </C506>
  </RFF-18660>
  <RFF-18660>
    <C506>
      <E1153>ADE</E1153>
      <E1154>00000767</E1154>
    </C506>
  </RFF-18660>
</SegGrp-3>

to

<SegGrp-3>
  <RFF>
    <C506>
      <E1153>VA</E1153>
      <E1154>UK19430839</E1154>
    </C506>
  </RFF>
</SegGrp-3>
<SegGrp-3>
  <RFF>
    <C506>
      <E1153>ADE</E1153>
      <E1154>00000767</E1154>
    </C506>
  </RFF>
</SegGrp-3>

I'd rather avoid making breaking changes to the structure, so I decided to have two flavours of EDIFACT messages: strict and lax. A choice element first attempts to parse the message using the strict schema and then falls back to the lax schema if parsing with the strict one fails.

...
...
<xsd:sequence dfdl:choiceBranchKey="INVOIC">
  <xsd:choice>
    <xsd:sequence>
      <xsd:element ref="D03B:INVOIC"/>
    </xsd:sequence>
    <xsd:sequence>
      <xsd:element ref="D03B:Bad-INVOIC"/>
    </xsd:sequence>
  </xsd:choice>
</xsd:sequence>
...
...

The recoverable assertions are all defined within the *Bad-INVOIC* type and, where possible, the occurrence constraints are relaxed within this element type. Does what I wrote make sense, or do you think there might be a better way to implement this?

Claude

On Sun, Aug 13, 2023 at 12:31 PM Claude Mamo <claude.m...@gmail.com> wrote:

>> Schematron is really only needed for really rich validation rules that use
>> the tree-walking capabilities of XPath to scrutinize elements wherever they
>> appear in the infoset tree.
>>
>
> I'll give it a try with dfdl:assert and see how it goes.
>
> Thanks for all the feedback!
>
> Claude
>
> On Mon, Jul 24, 2023 at 11:35 PM Mike Beckerle <mbecke...@apache.org>
> wrote:
>
>> Something to consider:
>>
>> I think many useful validation checks can be expressed in DFDL's
>> expression language using the dfdl:assert statement with
>> failureType='recoverableError'.
>>
>> The sort of constraints that say if this element exists then that can't
>> exist, or if this has a specific value then that must exist... those sorts
>> of things can usually be expressed.
>>
>> Those are run in an incremental/streaming fashion as the parser traverses
>> the data based on the schema.
>>
>> Recoverable errors from Daffodil are the same as validation errors from
>> Daffodil's internal "limited" evaluation. They don't guide the parse (don't
>> cause backtracking), but come out as diagnostic warnings.
>>
>> Schematron is really only needed for really rich validation rules that
>> use the tree-walking capabilities of XPath to scrutinize elements wherever
>> they appear in the infoset tree.
>>
>> On Mon, Jul 24, 2023 at 7:47 AM Steve Lawrence <slawre...@apache.org>
>> wrote:
>>
>>> This is correct. The way daffodil currently implements full validation
>>> (xerces) and custom validation (e.g. schematron) is pretty inefficient.
>>> We create two infosets: one of the kind that the user passed to the parse
>>> function, and one that is text XML written to a ByteArrayOutputStream in
>>> memory that is used internally for the validation once the parse is
>>> completed. We do not currently stream validation.
>>>
>>> If you wanted streaming, you would probably need to create a custom
>>> InfosetOutputter, or maybe use the SAXInfosetOutputter with an XMLReader
>>> that chains/tees SAX events to custom schematron validation.
>>>
>>> - Steve
>>>
>>> On 2023-07-22 03:29 AM, Claude Mamo wrote:
>>> > Spotted this code so presumably it's not streaming when custom or full
>>> > validation is in force:
>>> > https://github.com/apache/daffodil/blob/main/daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/DataProcessor.scala#L345-L356
>>> >
>>> > Claude
>>> >
>>> > On Sat, Jul 22, 2023 at 8:07 AM Claude Mamo <claude.m...@gmail.com>
>>> > wrote:
>>> >
>>> >     Hello Daffodil team,
>>> >
>>> >     I'm looking into adding support for Schematron validation since we
>>> >     have had many Smooks developers asking for better validation of
>>> >     EDIFACT documents. One question I have is whether Schematron
>>> >     validation is applied in a streaming fashion. I mean, does Daffodil
>>> >     load the whole infoset into memory before applying the Schematron
>>> >     rules or is Schematron validating on the fly while accumulating any
>>> >     state that is required to be able to evaluate the rules?
>>> >
>>> >     Thanks,
>>> >
>>> >     Claude
>>> >
>>>
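
For reference, the dfdl:assert approach discussed in this thread could look roughly like the fragment below. It is an untested sketch, not the actual Smooks EDIFACT schema: the type name LaxMessageType, the simplified xsd:string content of SegGrp-3, the message text, and the limit of 99 are all hypothetical, and the dfdl:format properties a real schema needs are omitted. It assumes an expression assert placed on a sequence is evaluated once the sequence content has been parsed.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
            xmlns:fn="http://www.w3.org/2005/xpath-functions">

  <!-- Hypothetical lax type: names and the limit of 99 are illustrative only -->
  <xsd:complexType name="LaxMessageType">
    <xsd:sequence>
      <xsd:annotation>
        <xsd:appinfo source="http://www.ogf.org/dfdl/">
          <!-- recoverableError: reported as a diagnostic, does not
               cause backtracking or stop the parse -->
          <dfdl:assert failureType="recoverableError"
                       message="SegGrp-3 must occur at most 99 times"
                       test="{ fn:count(SegGrp-3) le 99 }"/>
        </xsd:appinfo>
      </xsd:annotation>
      <!-- Occurrence constraints relaxed so the parse can reach the assert -->
      <xsd:element name="SegGrp-3" type="xsd:string"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>
```

The idea is that the occurrence limit moves out of maxOccurs, where a violation is not recoverable, and into an assertion whose failure is only reported as a diagnostic.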