I want to add one important thought to this.

You don't want your <invalid>...</invalid> element to actually be
considered "valid" by the XML Schema (which is the DFDL schema).

If you just construct an <invalid> .... hexBinary here </invalid> element
when parsing fails, e.g., as the final branch of an xs:choice, then that
element will itself be VALID XML, and validation will not flag the bad data.

So I find it helpful to define "always invalid" types, and use those for
these sorts of error-tolerating catch-alls.

One way is like this:

   <complexType name="alwaysInvalid">
     <sequence>
       <element name="hex" maxOccurs="0"
                dfdl:occursCountKind='parsed'>
         <simpleType>
           <restriction base="xs:hexBinary">
             <length value="0"/>
           </restriction>
         </simpleType>
       </element>
     </sequence>
   </complexType>

The idea here is that maxOccurs="0" means that if there is an instance of
this hex element at all, it is invalid.
The length facet of 0 additionally insists that the hex element, if it
exists, must be empty, so an instance containing even 1 byte is invalid.
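
A minimal sketch of how such a type might be wired in as the catch-all,
reusing the choice from Roger's original post. The ex: prefix, and the
assumption that the hex element picks up its length properties from a
default format, are illustrative and not from a tested schema:

   <choice>
     <!-- first branch: succeeds only for well-formed, valid input -->
     <sequence dfdl:terminator="%NL;">
       <element name="valid" type="xs:string"
                dfdl:lengthKind="pattern"
                dfdl:lengthPattern="[0-9][a-zA-Z]"/>
     </sequence>
     <!-- final branch: meant to capture the bad data as hex; its type
          is designed so that it never validates -->
     <element name="invalid" type="ex:alwaysInvalid"/>
   </choice>

With something like this, bad input would still parse into
<invalid><hex>...</hex></invalid>, but a validator should reject that
element because the hex child is present at all.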

The challenge with this is that different XML tools tolerate maxOccurs="0"
to different degrees. I believe it is technically allowed by XSD, but I
recall tools other than Xerces (esp. user interfaces associated with
writing XSD) giving me a hard time about it.  Your mileage may vary.

On Fri, Sep 8, 2023 at 8:29 AM Steve Lawrence <slawre...@apache.org> wrote:

> If you want different element names, there's really no way to avoid a
> choice. You can get fancy with a hidden group, restriction, choice
> dispatch and checkConstraints, and that avoids parsing the same field
> twice, but it's very ugly and I wouldn't recommend it. I've put an
> example of what this might look like at the end of this email.
>
> In general, I would recommend not trying to have separate valid/invalid
> elements, but instead parse the field and rely on restrictions for
> validation. For example:
>
>    <element name="field" dfdl:lengthKind="delimited"
>             dfdl:terminator="%NL;">
>      <simpleType>
>        <restriction base="xs:string">
>          <pattern value="[0-9][a-zA-Z]" />
>        </restriction>
>      </simpleType>
>    </element>
>
> This works very similarly, but now field is used for the well-formed
> content, regardless of whether it's valid or not, and the restriction
> will be used to validate it, either with Daffodil's internal "limited"
> validation, "full" Xerces validation, or external validation. An added
> benefit is that it doesn't have to parse the data twice for invalid data.
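>
> As a rough illustration (not actual tool output), and assuming the
> element sits directly in the infoset with no namespace prefix, both
> inputs from the original question would parse, and only validation
> tells them apart:
>
>    <!-- input "1H": parses and satisfies the pattern facet -->
>    <field>1H</field>
>
>    <!-- input "1H23": still parses, but fails the pattern facet,
>         so validation should report an error -->
>    <field>1H23</field>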
>
>
> Below is the approach mentioned at the top. Note that it uses the same
> technique as above, but it hides the field element and uses choice
> dispatch, inputValueCalc, and outputValueCalc to create the
> invalid/valid elements.
>
>    <group name="hiddenField">
>      <sequence>
>        <element name="field" dfdl:terminator="%NL;"
>          dfdl:outputValueCalc="{ if (fn:exists(../valid)) then ../valid
>                                  else ../invalid }">
>          <simpleType>
>            <restriction base="xs:string">
>              <pattern value="[0-9][a-zA-Z]" />
>            </restriction>
>          </simpleType>
>        </element>
>      </sequence>
>    </group>
>
>    <element name="root">
>      <complexType>
>        <sequence>
>          <sequence dfdl:hiddenGroupRef="ex:hiddenField" />
>          <choice dfdl:choiceDispatchKey="{
>              xs:string(dfdl:checkConstraints(./field)) }">
>            <sequence dfdl:choiceBranchKey="true">
>              <element name="valid" type="xs:string"
>                dfdl:inputValueCalc="{ ../field }" />
>            </sequence>
>            <sequence dfdl:choiceBranchKey="false">
>              <element name="invalid" type="xs:string"
>                dfdl:inputValueCalc="{ ../field }" />
>            </sequence>
>          </choice>
>        </sequence>
>      </complexType>
>    </element>
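>
> As a hedged sketch of the intended result (not actual Daffodil output;
> namespace prefixes omitted), the hidden field never appears in the
> infoset, and the dispatch key picks the visible element:
>
>    <!-- input "1H": checkConstraints is true -->
>    <root><valid>1H</valid></root>
>
>    <!-- input "1H23": checkConstraints is false -->
>    <root><invalid>1H23</invalid></root>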
>
>
> On 2023-09-07 05:04 PM, Roger L Costello wrote:
> > Hi Folks,
> >
> > Good input contains a digit followed by a letter, e.g., this is good
> > input: 1H
> >
> > Anything else is bad input, e.g., this is bad input: 1H23
> >
> > If the input is good, I want to put the input into a <valid> element,
> > e.g.,
> >
> > <valid>1H</valid>
> >
> > If the input is bad, I want to put the input into an <invalid> element,
> > e.g.,
> >
> > <invalid>1H23</invalid>
> >
> > This DFDL seems to work:
> >
> > <xs:choice>
> >   <xs:sequence dfdl:terminator="%NL;">
> >     <xs:element name="valid" type="xs:string" dfdl:lengthKind="pattern"
> >                 dfdl:lengthPattern="[0-9][a-zA-Z]"/>
> >   </xs:sequence>
> >   <xs:element name="invalid" type="xs:string"/>
> > </xs:choice>
> >
> > But that doesn’t seem like a good solution. Is there a better way to
> > solve this problem?
> >
> > /Roger
> >
>
>
