Something to consider:

I think many useful validation checks can be expressed in DFDL's expression
language using the dfdl:assert annotation with
failureType='recoverableError'.

Constraints that say "if this element exists, then that one can't exist",
or "if this element has a specific value, then that one must exist", can
usually be expressed this way.

Those asserts are evaluated incrementally, in a streaming fashion, as the
parser traverses the data based on the schema.

Recoverable errors from Daffodil are the same as validation errors from
Daffodil's internal "limited" validation. They don't guide the parse (they
don't cause backtracking), but are reported as diagnostic warnings.

Schematron is really only needed for rich validation rules that use the
tree-walking capabilities of XPath to scrutinize elements wherever they
appear in the infoset tree.
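
To see why the first kind of rule streams well: a check like "if <a>
appears inside a record, <b> must not" only needs a pair of flags per open
record, never the whole tree. The sketch below is plain Java over JDK SAX
events, not DFDL and not Daffodil's API; the element names record, a and b
and the class CoOccurrenceCheck are hypothetical. In a DFDL schema the same
rule would be a dfdl:assert. Here violations are collected and the parse
keeps going, much like a recoverable error.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Streaming check of a hypothetical rule: inside each <record>, the
 * elements <a> and <b> must not both appear. Only one small state object
 * per currently open <record> is kept, so memory does not grow with the
 * number of records processed. (Illustration only, not part of Daffodil.)
 */
public class CoOccurrenceCheck extends DefaultHandler {

    /** What has been seen so far inside one open <record>. */
    private static final class RecordState {
        final int index;
        boolean sawA;
        boolean sawB;
        RecordState(int index) { this.index = index; }
    }

    private final Deque<RecordState> open = new ArrayDeque<>();
    private final List<String> violations = new ArrayList<>();
    private int recordsSeen = 0;

    // The default (non-namespace-aware) SAX parser fills in qName, so match on it.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("record")) {
            open.push(new RecordState(++recordsSeen));
        } else if (qName.equals("a") && !open.isEmpty()) {
            open.peek().sawA = true;
        } else if (qName.equals("b") && !open.isEmpty()) {
            open.peek().sawB = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("record")) {
            RecordState s = open.pop();
            if (s.sawA && s.sawB) {
                // Record the violation and keep going, like a recoverable error.
                violations.add("record " + s.index + ": <a> and <b> both present");
            }
        }
    }

    public List<String> violations() {
        return violations;
    }

    /** Parses an XML string in one streaming pass and returns the violations found. */
    public static List<String> check(String xml) throws Exception {
        CoOccurrenceCheck handler = new CoOccurrenceCheck();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.violations();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><record><a/></record><record><a/><b/></record></doc>";
        System.out.println(check(xml)); // prints [record 2: <a> and <b> both present]
    }
}
```

A rule that has to look at arbitrary other parts of the tree has no such
bounded state, which is where Schematron earns its keep.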

On Mon, Jul 24, 2023 at 7:47 AM Steve Lawrence <slawre...@apache.org> wrote:

> This is correct. The way Daffodil currently implements full validation
> (Xerces) and custom validation (e.g. Schematron) is pretty inefficient.
> We create two infosets: one of the kind that the user passed to the parse
> function, and one that is text XML written to a ByteArrayOutputStream in
> memory, which is used internally for the validation once the parse has
> completed. We do not currently stream validation.
>
> If you wanted streaming, you would probably need to create a custom
> InfosetOutputter, or maybe use the SAXInfosetOutputter with an XMLReader
> that chains/tees SAX events to custom Schematron validation.
>
> - Steve
>
> On 2023-07-22 03:29 AM, Claude Mamo wrote:
> > I spotted this code, so presumably validation is not streamed when
> > custom or full validation is in force:
> >
> https://github.com/apache/daffodil/blob/main/daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/DataProcessor.scala#L345-L356
> >
> > Claude
> >
> > On Sat, Jul 22, 2023 at 8:07 AM Claude Mamo <claude.m...@gmail.com> wrote:
> >
> >     Hello Daffodil team,
> >
> >     I'm looking into adding support for Schematron validation, since
> >     many Smooks developers have asked us for better validation of
> >     EDIFACT documents. One question I have is whether Schematron
> >     validation is applied in a streaming fashion. In other words, does
> >     Daffodil load the whole infoset into memory before applying the
> >     Schematron rules, or does Schematron validate on the fly,
> >     accumulating whatever state is required to evaluate the rules?
> >
> >     Thanks,
> >
> >     Claude
> >
>
>
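
A minimal sketch of the tee approach Steve describes, assuming only the
standard org.xml.sax interfaces: a ContentHandler that forwards each event
to two handlers, so a validator can run alongside the normal consumer while
the parse proceeds. TeeContentHandler, ElementCounter and countBoth are
hypothetical names, and the JDK's built-in SAX parser stands in for
Daffodil's XMLReader so the sketch runs on its own. With Daffodil, the
reader would come from its SAX API instead (DataProcessor.newXMLReaderInstance()
in the Java API, if memory serves; check the docs for your version), and the
second handler would be the custom Schematron-style checker.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical helper (not part of Daffodil): forwards every SAX event to
 * two handlers, first the primary consumer, then a secondary validator.
 * Neither handler needs the whole document in memory.
 */
public class TeeContentHandler implements ContentHandler {
    private final ContentHandler primary;
    private final ContentHandler secondary;

    public TeeContentHandler(ContentHandler primary, ContentHandler secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    @Override public void setDocumentLocator(Locator locator) {
        primary.setDocumentLocator(locator); secondary.setDocumentLocator(locator);
    }
    @Override public void startDocument() throws SAXException {
        primary.startDocument(); secondary.startDocument();
    }
    @Override public void endDocument() throws SAXException {
        primary.endDocument(); secondary.endDocument();
    }
    @Override public void startPrefixMapping(String prefix, String uri) throws SAXException {
        primary.startPrefixMapping(prefix, uri); secondary.startPrefixMapping(prefix, uri);
    }
    @Override public void endPrefixMapping(String prefix) throws SAXException {
        primary.endPrefixMapping(prefix); secondary.endPrefixMapping(prefix);
    }
    @Override public void startElement(String uri, String local, String qName, Attributes atts) throws SAXException {
        primary.startElement(uri, local, qName, atts); secondary.startElement(uri, local, qName, atts);
    }
    @Override public void endElement(String uri, String local, String qName) throws SAXException {
        primary.endElement(uri, local, qName); secondary.endElement(uri, local, qName);
    }
    @Override public void characters(char[] ch, int start, int length) throws SAXException {
        primary.characters(ch, start, length); secondary.characters(ch, start, length);
    }
    @Override public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        primary.ignorableWhitespace(ch, start, length); secondary.ignorableWhitespace(ch, start, length);
    }
    @Override public void processingInstruction(String target, String data) throws SAXException {
        primary.processingInstruction(target, data); secondary.processingInstruction(target, data);
    }
    @Override public void skippedEntity(String name) throws SAXException {
        primary.skippedEntity(name); secondary.skippedEntity(name);
    }

    /** Hypothetical stand-in handler that just counts start-element events. */
    static final class ElementCounter extends DefaultHandler {
        int count;
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            count++;
        }
    }

    /** Parses xml once, teeing the events to two counters; returns both counts. */
    public static int[] countBoth(String xml) throws Exception {
        ElementCounter consumer = new ElementCounter();  // e.g. the application's own consumer
        ElementCounter validator = new ElementCounter(); // e.g. a streaming rule checker
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new TeeContentHandler(consumer, validator));
        reader.parse(new InputSource(new StringReader(xml)));
        return new int[] {consumer.count, validator.count};
    }

    public static void main(String[] args) throws Exception {
        int[] counts = countBoth("<doc><a/><b/></doc>");
        System.out.println(counts[0] + " " + counts[1]); // prints 3 3
    }
}
```

Because each event is forwarded as soon as it arrives, the validator sees
the infoset incrementally; any state it needs for its rules (open-element
flags, counters) is state it has to keep itself.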
