So our warning is not correct here. It doesn't distinguish
discriminators/asserts with testKind 'pattern' from those with testKind
'expression'.

The DFDL spec, Section 9.5 "Evaluation Order for Statement Annotations",
says that statement annotations on a sequence execute in this order:

1. dfdl:discriminator or dfdl:assert(s) with testKind 'pattern' (parsing
only)
2. dfdl:newVariableInstance(s) - in lexical order, innermost schema
component first
3. dfdl:setVariable(s) - in lexical order, innermost schema component first
4. dfdl:sequence or dfdl:choice or dfdl:group following property scoping
rules and evaluating any property expressions (corresponds to
ComplexContent grammar region)
5. dfdl:discriminator or dfdl:assert(s) with testKind 'expression' (parsing
only)

Step 4 is the parsing of the contents of the sequence.
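
To make that ordering concrete, here is a small sketch of a sequence carrying
each kind of statement annotation. I have not run it. The names hdr and
ex:seen are made up for illustration, ex:seen is assumed to be an xs:int
variable declared elsewhere with dfdl:defineVariable, and the data is assumed
to be text so the pattern can match characters. The comments give the step
number from the list above.

```xml
<xs:sequence>
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- step 1: testKind 'pattern', matched against the data before hdr is parsed -->
      <dfdl:assert testKind="pattern" testPattern="7" />
      <!-- step 2: hypothetical variable ex:seen, assumed defined via dfdl:defineVariable -->
      <dfdl:newVariableInstance ref="ex:seen" />
      <!-- step 3: set the new instance of the hypothetical variable -->
      <dfdl:setVariable ref="ex:seen" value="{ 1 }" />
      <!-- step 5: testKind 'expression', evaluated after hdr has been parsed -->
      <dfdl:assert testKind="expression" test="{ ./hdr eq '7' }" />
    </xs:appinfo>
  </xs:annotation>
  <!-- step 4: the sequence content -->
  <xs:element name="hdr" type="xs:string"
              dfdl:lengthKind="explicit" dfdl:length="1" />
</xs:sequence>
```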

Steps 1, 2, and 3 happen before the sequence contents are parsed. Only step
5 happens after. So only a testKind 'expression' discriminator/assert,
written at the top of a sequence that has some content besides just the
annotation, has the counterintuitive placement issue: the sequence body is
parsed first, and only then is the discriminator/assert that appears
lexically before it evaluated.
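
As an illustration of that one counterintuitive case, here is an untested
sketch. All the names in it (recA, kind, other, ex:recType, ex:otherType) are
made up, and ex:recType is assumed to have a child element named kind.

```xml
<xs:choice>
  <xs:sequence>
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <!-- testKind 'expression': step 5, so this is evaluated only
             after recA below has been parsed -->
        <dfdl:discriminator testKind="expression"
                            test="{ ./recA/kind eq 'A' }" />
      </xs:appinfo>
    </xs:annotation>
    <!-- parsed first, even though it appears after the discriminator -->
    <xs:element name="recA" type="ex:recType" />
  </xs:sequence>
  <xs:element name="other" type="ex:otherType" />
</xs:choice>
```

The discriminator is written first but runs last, which is why it can refer
to ./recA/kind at all.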

There was some rationale for this odd rule that I can't crisply recall,
having to do with schema authors being able to write discriminators without
having to worry about whether they are forward referencing or not.

But a pattern discriminator should not be triggering the warning. I'll
submit a bug report on this.



On Thu, Jun 13, 2024 at 8:12 AM Steve Lawrence <slawre...@apache.org> wrote:

> Discriminators/asserts annotated on sequences are evaluated at the end of
> the sequence after the content has been evaluated. This is usually not
> what is intended so Daffodil outputs this warning. To get it to evaluate
> before the content (like what you intend in this case) you need to wrap
> the discriminator/assert in a sequence, for example:
>
>    <xs:sequence>
>      <xs:sequence>
>        <xs:annotation>
>          <xs:appinfo source="http://www.ogf.org/dfdl/">
>            <dfdl:discriminator ... />
>          </xs:appinfo>
>        </xs:annotation>
>      </xs:sequence>
>      <xs:element name='A' type="unsignedint1"/>
>    </xs:sequence>
>
> That should resolve the warning.
>
> Also, note that testKind="pattern" is character based, so it might not
> work as expected since your data is bit-based. In order for a pattern
> discriminator to work you need to decode 4-bit characters (since both
> fields you care about are 4 bits) after skipping the first bit. We have
> the X-DFDL-HEX-MSBF encoding that can decode 4-bit characters to a hex
> string, but there is no way to skip the first bit. You could move the
> discriminator sequence to after the A element, which works since at that
> point A is already parsed and the first bit is skipped, e.g.
>
>    <xs:sequence>
>      <xs:element name='A' type="unsignedint1"/>
>      <xs:sequence dfdl:encoding="X-DFDL-HEX-MSBF">
>        <xs:annotation>
>          <xs:appinfo source="http://www.ogf.org/dfdl/">
>            <dfdl:discriminator testKind="pattern" testPattern="(79)|(7A)" />
>          </xs:appinfo>
>        </xs:annotation>
>      </xs:sequence>
>    </xs:sequence>
>
> But that feels a bit complex and unclear to me.
>
> I would suggest instead using the dfdlx:lookAhead extension function,
> which was added exactly for this kind of thing. For example, your
> discriminator would look something like this:
>
>    <dfdl:discriminator testKind="expression" test="{
>      ((dfdlx:lookAhead(1, 4) eq 7) and (dfdlx:lookAhead(5, 4) eq 9)) or
>      ((dfdlx:lookAhead(1, 4) eq 7) and (dfdlx:lookAhead(5, 4) eq 10))
>    }" />
>
> It's a bit longer, but I think the intention is much more clear. It's also
> probably going to be significantly more efficient since it doesn't have to
> decode characters and then run a regex on them.
>
>
> On 2024-06-13 07:09 AM, Roger L Costello wrote:
> > Hi Folks,
> >
> > I need my DFDL to look-ahead:
> >
> > The decimal value of the first bit of the input should be wrapped in an
> > <A> tag if the decimal value of the next 4 bits equals 7 and the decimal
> > value of the 4 bits after that equals 9, or the decimal value of the
> > next 4 bits equals 7 and the decimal value of the 4 bits after that
> > equals 10, otherwise the first bit of the input should be wrapped in a
> > <B> tag.
> >
> > Below is my attempt at implementing this. Daffodil gives this warning
> > message:
> >
> > Schema Definition Warning: Counterintuitive placement detected. Wrap the
> > discriminator or assert in an empty sequence to evaluate before the
> > contents. (id: discouragedDiscriminatorPlacement)
> >
> > Here is my DFDL:
> >
> > <xs:element name="test">
> >      <xs:complexType>
> >          <xs:sequence>
> >              <xs:choice>
> >                  <xs:sequence>
> >                      <xs:annotation>
> >                          <xs:appinfo source="http://www.ogf.org/dfdl/">
> >                              <dfdl:discriminator testKind="pattern" testPattern="(.x7x9)|(.x7xA)" />
> >                          </xs:appinfo>
> >                      </xs:annotation>
> >                      <xs:element name='A' type="unsignedint1"/>
> >                  </xs:sequence>
> >                  <xs:sequence>
> >                      <xs:element name='B' type="unsignedint1"/>
> >                  </xs:sequence>
> >              </xs:choice>
> >              <xs:element name='C' type="unsignedint4"/>
> >              <xs:element name='D' type="unsignedint4"/>
> >          </xs:sequence>
> >      </xs:complexType>
> > </xs:element>
>
>
