DFDL properties eliminate the ambiguity for parsing.

But don't forget about unparsing. You start from XML and need to be able to
choose the right parts of the DFDL schema with which to unparse it. That
choice has to be unambiguous without doing a bunch of look-ahead, and the
only thing you have is the XML.

So "turning off" UPA errors isn't possible.

In general, the fix for UPA errors is "add more elements", in various idioms.
Generally each branch of a choice has to be a unique element, or begin with
a unique element.

In your example, I noticed that the "D" and "E" branches are identical.
You can avoid that duplication by using dfdl:choiceBranchKey="D E", which
means "D" or "E" (the value is a whitespace-separated list).
So in this tiny case you can avoid the UPA error by consolidation. But I
expect this is a stripped-down example, and in the real world you
would have different content in the D and E branches.
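A minimal sketch of that consolidation, reusing the structure from your message (fixedLength_string is your type; its definition isn't shown in the thread). Keeping the single-branch choice rather than dropping it means the dispatch still rejects any Section_Code other than D or E, since a dispatch key that matches no branch is a processing error.

```xml
<xs:element name="record">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="Section_Code" type="fixedLength_string" dfdl:length="1"/>
            <xs:choice dfdl:choiceDispatchKey="{Section_Code}">
                <!-- One branch serves both keys, so there is only one
                     Subsection_Code declaration and no UPA conflict. -->
                <xs:sequence dfdl:choiceBranchKey="D E">
                    <xs:element name="Subsection_Code" type="fixedLength_string" dfdl:length="1"/>
                </xs:sequence>
            </xs:choice>
        </xs:sequence>
    </xs:complexType>
</xs:element>
```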

I'm going to assume your letters D and E are the syntax, but that they
have some conceptual/logical meaning beyond just the letter, which I
call D_conceptName and E_conceptName below.

*Idea 1: Marker elements*

<xs:element name="record">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="Section_Code" type="fixedLength_string" dfdl:length="1"/>
            <xs:choice dfdl:choiceDispatchKey="{Section_Code}">
                <xs:sequence dfdl:choiceBranchKey="D">
                    <!-- just a marker element -->
                    <xs:element name="D_conceptName" type="emptyString"/>
                    <xs:element name="Subsection_Code" type="fixedLength_string" dfdl:length="1"/>
                </xs:sequence>
                <xs:sequence dfdl:choiceBranchKey="E">
                    <!-- just a marker element -->
                    <xs:element name="E_conceptName" type="emptyString"/>
                    <xs:element name="Subsection_Code" type="fixedLength_string" dfdl:length="1"/>
                </xs:sequence>
            </xs:choice>
        </xs:sequence>
    </xs:complexType>
</xs:element>

You define the emptyString type to consume no data and create an empty
marker element. That makes the XML unambiguous and preserves the
polymorphic path "record/Subsection_Code" regardless of whether the
D or E branch was parsed.
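One plausible way to define the two types, as a sketch under assumptions: neither definition appears in the thread, and this assumes an enclosing dfdl:format (for example, one based on DFDLGeneralFormat) supplies representation, encoding, lengthUnits, and the other required text properties. It is untested against your particular format settings.

```xml
<!-- Hypothetical: a string that always has zero length, so parsing
     consumes no data but still creates the (empty) marker element. -->
<xs:simpleType name="emptyString"
               dfdl:lengthKind="explicit" dfdl:length="0">
    <xs:restriction base="xs:string"/>
</xs:simpleType>

<!-- Hypothetical: explicit length, with the actual length supplied on
     each element (dfdl:length="1" in the examples above). -->
<xs:simpleType name="fixedLength_string"
               dfdl:lengthKind="explicit">
    <xs:restriction base="xs:string"/>
</xs:simpleType>
```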

This works in Cyberia in that parsing will populate the empty D or E
marker element for you.

If someone is creating this data from whole cloth, then they have to know
to insert a <D_conceptName/> or <E_conceptName/> marker element before the
Subsection_Code to tell an XML reader which branch of the DFDL schema to
use. Viewed from a purely XML perspective, not a DFDL one, this marker
element thing is a hack.
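For instance, a hypothetical infoset for a D record that someone builds by hand and then unparses would need to look something like this (the Subsection_Code value "1" is made up, and namespaces are omitted):

```xml
<record>
    <Section_Code>D</Section_Code>
    <!-- marker: its name alone selects the D branch when unparsing -->
    <D_conceptName/>
    <Subsection_Code>1</Subsection_Code>
</record>
```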



*Idea 2: Still choice-by-dispatch, but with distinguished elements*

This will work and definitely be fast:

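<xs:element name="record">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="Section_Code" type="fixedLength_string" dfdl:length="1"/>
            <xs:choice dfdl:choiceDispatchKey="{Section_Code}">
                <xs:element name="D_conceptName" dfdl:choiceBranchKey="D">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element name="Subsection_Code" type="fixedLength_string" dfdl:length="1"/>
                            <!-- ... rest of the D branch contents -->
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
                <xs:element name="E_conceptName" dfdl:choiceBranchKey="E">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element name="Subsection_Code" type="fixedLength_string" dfdl:length="1"/>
                            <!-- ... rest of the E branch contents -->
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:choice>
        </xs:sequence>
    </xs:complexType>
</xs:element>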

It does, however, lose some polymorphism. For example, you now have two
different children of your record element, so the path to the
Subsection_Code field is either "record/D_conceptName/Subsection_Code" or
"record/E_conceptName/Subsection_Code". This doesn't matter once you have
created XML or EXI, because there the XPath "record/*/Subsection_Code" works.
But DFDL expressions (in DFDL v1.0) have no wildcard feature like that.
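One possible workaround, sketched here as an assumption rather than anything proposed in this thread, is a computed element whose dfdl:inputValueCalc expression picks whichever branch element exists. The name any_Subsection_Code is hypothetical, and the paths assume the Idea 2 layout, where D_conceptName and E_conceptName are children of record. Because an inputValueCalc element consumes no data when parsing and writes nothing when unparsing, it doesn't change the data format.

```xml
<!-- Hypothetical element: place inside record's xs:sequence, right after
     the xs:choice. Its value is computed from whichever branch exists,
     giving one fixed path, record/any_Subsection_Code. -->
<xs:element name="any_Subsection_Code" type="xs:string"
    dfdl:inputValueCalc="{ if (fn:exists(../D_conceptName))
                           then ../D_conceptName/Subsection_Code
                           else ../E_conceptName/Subsection_Code }"/>
```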

*Idea 3: Initiated Content*

This is the solution I really like, but I don't think it will be fast
enough today.

DFDL really wants your D and E to be syntax features of the data, not
values in the data.

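<xs:element name="record">
    <xs:complexType>
        <xs:choice dfdl:initiatedContent="yes">
            <xs:element name="D_conceptName" dfdl:initiator="D">
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="Subsection_Code" type="fixedLength_string" dfdl:length="1"/>
                        <!-- ... rest of the contents of element D -->
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
            <xs:element name="E_conceptName" dfdl:initiator="E">
                <xs:complexType>
                    <xs:sequence>
                        <xs:element name="Subsection_Code" type="fixedLength_string" dfdl:length="1"/>
                        <!-- ... rest of the contents of element E -->
                    </xs:sequence>
                </xs:complexType>
            </xs:element>
        </xs:choice>
    </xs:complexType>
</xs:element>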
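To illustrate what changes in the infoset, here is a hypothetical result of parsing the two characters "D1", assuming the D branch holds a one-character Subsection_Code element (the data is made up, and namespaces are omitted). There is no Section_Code element: the "D" is consumed as the initiator, and when unparsing, the element name alone picks the branch and the unparser writes the initiator back out.

```xml
<record>
    <D_conceptName>
        <Subsection_Code>1</Subsection_Code>
    </D_conceptName>
</record>
```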

In theory this can go just as fast as your choiceDispatchKey solution if
all the initiators are strings of the same length, with no character class
entities like %WSP*; or anything similar.

But I think Daffodil does not have this optimization today.


On Wed, Jan 10, 2024 at 1:10 PM Roger L Costello <coste...@mitre.org> wrote:

> Hi Folks,
>
> The Unique Particle Attribution (UPA) error is driving me crazy.
>
> UPA doesn’t make sense in a DFDL schema because the DFDL properties
> eliminate the ambiguity.
>
> Here’s a schema that produces a UPA error:
>
> <xs:element name="record">
>     <xs:complexType>
>         <xs:sequence>
>             <xs:element name="Section_Code" type="fixedLength_string" dfdl:length="1"/>
>             <xs:choice dfdl:choiceDispatchKey="{Section_Code}">
>                 <xs:sequence dfdl:choiceBranchKey="D">
>                     <xs:element name="Subsection_Code" type="fixedLength_string" dfdl:length="1"/>
>                 </xs:sequence>
>                 <xs:sequence dfdl:choiceBranchKey="E">
>                     <xs:element name="Subsection_Code" type="fixedLength_string" dfdl:length="1"/>
>                 </xs:sequence>
>             </xs:choice>
>         </xs:sequence>
>     </xs:complexType>
> </xs:element>
>
> The UPA error occurs because there are two Subsection_Code elements within
> the record element. If this were a plain XML Schema, then I could understand
> the UPA error, because an XSD validator wouldn’t know whether to use the
> first or the second Subsection_Code element to validate XML. But this isn’t
> a plain XML Schema; it is a DFDL schema, and the choiceBranchKey lets us
> know that the first Subsection_Code should be used when Section_Code =
> D and the second should be used when Section_Code = E.
>
> Is there any way to disable UPA errors?
>
> /Roger
