RE: Parsing/unparsing with optional characters

Larry Barber Mon, 12 Feb 2024 08:59:29 -0800

Thanks for the quick and helpful response Steve!
I didn't realize that each choice could have multiple things in the 
choiceBranchKey.

-----Original Message-----
From: Steve Lawrence <slawre...@apache.org> 
Sent: Monday, February 12, 2024 11:48 AM
To: users@daffodil.apache.org
Subject: Re: Parsing/unparsing with optional characters

Are these keywords something like initiators/tags? So your data is something 
like this?

   KEYWORD=foo

If so, a common way to model that is something like this:

   <choice>
     <element name="keyword" type="xs:string"
       dfdl:initiator="KEYWORD= KEYW=" ... />
     <!-- other tagged elements with different initiators -->
   </choice>

Which as you mention always unparses to KEYWORD=, regardless of the original 
initiator found in the data.
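
As a rough illustration, assuming the input is either KEYWORD=foo or KEYW=foo, 
both would parse to the same infoset, something like:

   <keyword>foo</keyword>

The initiator is not part of the infoset, so on unparse the first initiator in 
the list is written and both inputs come back as KEYWORD=foo.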

If you need exact round tripping, you need to model the structure as data so 
the initiators end up in the infoset and can be unparsed exactly. You can still 
have a choice, but you would use choice dispatch instead of initiated content. 
So for example, your schema might become something like this:

   <element name="tag" dfdl:lengthKind="delimited" dfdl:terminator="=">
     <simpleType>
       <restriction base="xs:string">
         <enumeration value="KEYW" />
         <enumeration value="KEYWORD" />
         <!-- all valid tags -->
       </restriction>
     </simpleType>
   </element>
   <choice dfdl:choiceDispatchKey="{ ./tag }">
      <element name="keyword" dfdl:choiceBranchKey="KEYW KEYWORD"
        type="xs:string" ... />
      <!-- other elements with different choiceBranchKeys -->
   </choice>

This approach can also be more efficient, since choice dispatch avoids parsing 
the same text multiple times to find the correct initiator.
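
As a sketch of what that gives you, assuming the input is KEYW=foo and the 
rest of the schema is as outlined above, the infoset would look something like:

   <tag>KEYW</tag>
   <keyword>foo</keyword>

Because the tag is now ordinary data in the infoset, it unparses as KEYW 
followed by the "=" terminator and then foo, so the original KEYW=foo is 
preserved. An input of KEYWORD=foo would carry <tag>KEYWORD</tag> instead and 
round trip the same way.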

On 2024-02-12 11:27 AM, Larry Barber wrote:
> Does anyone have suggestions on how to structure a schema to parse a file 
> where both “KEYW” and “KEYWORD” are valid and considered identical and 
> that will return with the original version when unparsed?
>
