Re: How to specify that the nilValue can occur anywhere within a fixed field?

Mike Beckerle Thu, 04 Jan 2024 07:25:11 -0800

Right, so you can't put it inside a sequence that has a separator, because
that will then require another separator.
DFDL doesn't know that your group can never contain any syntax or elements.
It assumes a group ref means another term so another separator.


Here's where I would suggest putting it:

                <xs:element name="Line" maxOccurs="unbounded">
                    <xs:complexType>
                        <xs:sequence> <!-- new sequence without any
separator -->
                             <!-- In this format, if any data is available,
then a line element exists! -->
                             <xs:group ref="discriminator_hasAnyData"/>
                             <!-- and that means this separated sequence
must appear -->
                            <xs:sequence dfdl:separator="/"
 dfdl:separatorPosition="infix">
                                <xs:element name="A" type="xs:string" />
                                <xs:element ref="Foo"/>
                                <xs:element name="B" type="xs:string" />
                            </xs:sequence>
                      </xs:sequence><!-- end new sequence without any
separator -->
                    </xs:complexType>
                </xs:element>

On Thu, Jan 4, 2024 at 10:11 AM Roger L Costello <coste...@mitre.org> wrote:

> Hi Mike,
>
>
>
> I am trying to get your discriminator suggestion working. I added it to
> the schema that is getting “left over data”. Where do I put the group ref?
> I tried it in several locations but wherever I put it, I got this error
> message:
>
>
>
> *Parse Error: Failed to populate Line[1]. Cause: Parse Error: Failed to
> parse infix separator. Cause: Parse Error: Separator '/' not found*
>
>
>
> Here’s my input:
>
>
>
> .../ABC/...
> .../-  /...
> .../ - /...
> .../  -/...
>
>
>
>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema xmlns:dfdl=http://www.ogf.org/dfdl/dfdl-1.0/ xmlns:xs=
> http://www.w3.org/2001/XMLSchema>
>
>     <xs:include schemaLocation=
> "../default-dfdl-properties/defaults.dfdl.xsd" />
>
>     <xs:annotation>
>         <xs:appinfo source=http://www.ogf.org/dfdl/>
>             <dfdl:format ref="default-dfdl-properties" />
>         </xs:appinfo>
>     </xs:annotation>
>
>     <xs:element name="Test">
>         <xs:complexType>
>             <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition=
> "infix">
>                 <xs:element name="Line" maxOccurs="unbounded">
>                     <xs:complexType>
>                         <xs:sequence dfdl:separator="/"
> dfdl:separatorPosition="infix">
>                             <xs:element name="A" type="xs:string" />
>                             <xs:group ref="discriminator_hasAnyData"/>
>                             <xs:element ref="Foo"/>
>                             <xs:element name="B" type="xs:string" />
>                         </xs:sequence>
>                     </xs:complexType>
>                 </xs:element>
>             </xs:sequence>
>         </xs:complexType>
>     </xs:element>
>
>     <xs:element name="Foo"
>                 type="Foo_simpleType"
>                 nillable="true"
>                 dfdl:nilKind="literalValue"
>                 dfdl:nilValue="-%SP;%SP; %SP;-%SP; %SP;%SP;-"
>                 dfdl:lengthKind="explicit"
>                 dfdl:length="3"
>                 dfdl:textTrimKind="padChar"
>                 dfdl:textPadKind="padChar"
>                 dfdl:textStringPadCharacter="%SP;"
>                 dfdl:textStringJustification="left"/>
>
>     <xs:simpleType name="Foo_simpleType">
>         <xs:restriction base="validString">
>             <xs:pattern value="ABC|DEF|GHI" />
>         </xs:restriction>
>     </xs:simpleType>
>
>     <xs:simpleType name="validString">
>         <xs:annotation>
>             <xs:appinfo source=http://www.ogf.org/dfdl/>
>                 <dfdl:assert>{ dfdl:checkConstraints(.) }</dfdl:assert>
>             </xs:appinfo>
>         </xs:annotation>
>         <xs:restriction base="xs:string"/>
>     </xs:simpleType>
>
>     <xs:group name="discriminator_hasAnyData">
>         <xs:sequence>
>             <xs:annotation>
>                 <xs:appinfo source=http://www.ogf.org/dfdl/>
>                     <dfdl:discriminator testKind="pattern" testPattern=
> "[\s\S]"/>
>                 </xs:appinfo>
>            </xs:annotation>
>        </xs:sequence>
>     </xs:group>
>
> </xs:schema>
>
>
>
>
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Thursday, January 4, 2024 8:53 AM
> *To:* users@daffodil.apache.org
> *Subject:* [EXT] Re: How to specify that the nilValue can occur anywhere
> within a fixed field?
>
>
>
> My guess is that one of the whitespace characters is a tab, not a space or
> two spaces. So your nilValue doesn't match. That causes a subsequent parse
> error, and it backtracks, and your schema then succeeds, without consuming
> all the data. Your
>
> ZjQcmQRYFpfptBannerEnd
>
> My guess is that one of the whitespace characters is a tab, not a space or
> two spaces. So your nilValue doesn't match. That causes a subsequent parse
> error, and it backtracks, and your schema then succeeds, without consuming
> all the data.
>
>
>
> Your schema likely could be improved by adding discriminators. That's a
> common need when the "left over data" issue is reported. Your schema is
> currently happy to successfully complete parsing, but not consuming all the
> data. If your schema is for a file format where there is a requirement that
> it consume all the data, then discriminators should ensure all the data is
> consumed or a parse error occurs.
>
>
>
> I have found this discriminator useful:
>
>
>
> <dfdl:discriminator testKind="pattern" testPattern="[\s\S]"/>
>
>
>
> This is true if the regex matches the front of the data stream at that
> point, which means "there is at least one character/byte of anything at
> all. I.e., there is more data to be had.
>
>
>
> For example if you have a file that is an array of records. So if there is
> more data, it must be a record. Ending the array before all the data is
> consumed because attempting to parse another record fails is not
> acceptable. So putting this discriminator on that record array element decl
> insures this. You will never get 'left over data' because the schema isn't
> allowed to succeed if there is data remaining.
>
>
>
> I like to wrap this discriminator in a group decl to make it self
> documenting:
>
>
>
> <group name="discriminator_hasAnyData">
>
>   <sequence>
>
>       <annotation><appinfo source="http://www.ogf.org/dfdl/";>
>
>           <dfdl:discriminator testKind="pattern" testPattern="[\s\S]"/>
>
>        </appinfo></annotation>
>
>    </sequence>
>
> </group>
>
>
>
> Then a group reference to this is a compact one-liner, not 5 or 7 lines of
> sequence and annotation.
>
>
>
>
>
> On Thu, Jan 4, 2024 at 7:51 AM Roger L Costello <coste...@mitre.org>
> wrote:
>
> Hi Mike,
>
>
>
> To allow a hyphen to occur anywhere within a 3-character field I specified
> this:
>
>
>
> dfdl:nilValue="-%SP;%SP; %SP;-%SP; %SP;%SP;-"
>
>
>
> But that failed with the dreaded “Left over data” error message.
>
>
>
> Conversely, both of these succeeded:
>
>
>
> dfdl:nilValue="%WSP*;-%WSP*;"
>
> dfdl:nilValue="%WSP*;-"
>
>
>
> Why is that?
>
>
>
> /Roger
>
>
>
>
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Tuesday, January 2, 2024 11:58 AM
> *To:* users@daffodil.apache.org
> *Subject:* [EXT] Re: How to specify that the nilValue can occur anywhere
> within a fixed field?
>
>
>
> Tricky! For strings we typically justify left, meaning we trim padding
> characters on the right, i. e. , textStringJustification="left". That means
> if your data is "- " or " - ", then the spaces on the right side
>
> Tricky!
>
>
>
> For strings we typically justify left, meaning we trim padding characters
> on the right, i.e., textStringJustification="left".
>
>
>
> That means if your data is "-  " or " - ", then the spaces on the right
> side are trimmed away before comparison against the "%WSP*;-" nilValue is
> done.
>
>
>
> However, for numbers we typically justify right, meaning we trim on the
> left, ie., textNumberJustification="right".
>
>
>
> In that case "-  " or " - " would not be trimmed on the right side, but on
> the left, leaving them with spaces after the hyphen, so "%WSP*;-" won't
> match them.
>
>
>
> So, the rationale for suggesting "%WSP*;-%WSP*;" i.e., with WSP* on both
> sides, is so that your nilValue matching conventions are  insensitive to
> type and to whether you use text justification of left or right.
>
>
>
>
>
> On Fri, Dec 22, 2023 at 8:01 AM Roger L Costello <coste...@mitre.org>
> wrote:
>
> Hi Folks,
>
>
>
> I have a fixed-length field (3) that has hyphen as the nilValue. The
> hyphen can be positioned anywhere in the field, e.g.,
>
>
>
> .../-  /...
>
> .../ - /...
>
> .../  -/...
>
>
>
> What is the right way to specify the nilValue? I specified it this way:
>
>
>
> dfdl:nilValue="%WSP*;-"
>
>
>
> and it seems to work just fine.
>
>
>
> But I was told, “that only allows whitespace before the hyphen; it should
> be specified this way:
>
>
>
> dfdl:nilValue="%WSP*;-%WSP*;"
>
>
>
> What is the correct way?
>
>
>
> /Roger
>
>
>
>
>
>

Re: How to specify that the nilValue can occur anywhere within a fixed field?

Reply via email to