Hi Mike,

My schema is at the end of this message.

I did some testing (Daffodil, version 3.0) and here’s what I found.


Input: michael, james,,rogers,888-888-8888,777-777-7777,,,,

When I use separatorSuppressionPolicy="never"

Parsing yields:


<file>
  <given-name>michael</given-name>
  <given-name> james</given-name>
  <given-name></given-name>
  <surname>rogers</surname>
  <phone>888-888-8888</phone>
  <phone>777-777-7777</phone>
  <phone></phone>
  <phone></phone>
  <phone></phone>
  <phone></phone>
</file>

Unparsing yields:

michael, james,,rogers,888-888-8888,777-777-7777,,,,

When I use separatorSuppressionPolicy="anyEmpty"

Parsing yields:

<file>
  <given-name>michael</given-name>
  <given-name> james</given-name>
  <surname>rogers</surname>
  <phone>888-888-8888</phone>
  <phone>777-777-7777</phone>
</file>

Unparsing yields:

michael, james,rogers,888-888-8888,777-777-7777

When I use separatorSuppressionPolicy="trailingEmpty"

Parsing yields:

<file>
  <given-name>michael</given-name>
  <given-name> james</given-name>
  <surname>rogers</surname>
  <phone>888-888-8888</phone>
  <phone>777-777-7777</phone>
</file>

Unparsing yields:

michael, james,,rogers,888-888-8888,777-777-7777

Next, instead of 1-3 given-names and 1-6 phones, I changed the schema to 3-3
given-names and 6-6 phones (i.e., exactly 3 given-names and exactly 6 phones).

When I use separatorSuppressionPolicy="never"

Parsing yields:

<file>
  <given-name>michael</given-name>
  <given-name> james</given-name>
  <given-name></given-name>
  <surname>rogers</surname>
  <phone>888-888-8888</phone>
  <phone>777-777-7777</phone>
  <phone></phone>
  <phone></phone>
  <phone></phone>
  <phone></phone>
</file>

Unparsing yields:

michael, james,,rogers,888-888-8888,777-777-7777,,,,

When I use separatorSuppressionPolicy="anyEmpty"

Parsing yields:

<file>
  <given-name>michael</given-name>
  <given-name> james</given-name>
  <given-name></given-name>
  <surname>rogers</surname>
  <phone>888-888-8888</phone>
  <phone>777-777-7777</phone>
  <phone></phone>
  <phone></phone>
  <phone></phone>
  <phone></phone>
</file>

Unparsing yields:

michael, james,rogers,888-888-8888,777-777-7777

When I use separatorSuppressionPolicy="trailingEmpty"

Parsing yields:

<file>
  <given-name>michael</given-name>
  <given-name> james</given-name>
  <given-name></given-name>
  <surname>rogers</surname>
  <phone>888-888-8888</phone>
  <phone>777-777-7777</phone>
  <phone></phone>
  <phone></phone>
  <phone></phone>
  <phone></phone>
</file>

Unparsing yields:

michael, james,,rogers,888-888-8888,777-777-7777
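
The unparse results above can be mimicked with a toy model. This is only a
sketch of the outputs I observed, not how Daffodil implements the property.
The function name unparse_fields and the flat list of ten positional fields
are assumptions made for illustration.

```python
# Toy model of the unparsed text observed in the experiments above.
# Assumption: the record is a flat list of 10 positional string fields,
# where "" stands for an empty position.
FIELDS = ["michael", " james", "", "rogers",
          "888-888-8888", "777-777-7777", "", "", "", ""]


def unparse_fields(fields, policy, sep=","):
    """Join fields according to a simplified separatorSuppressionPolicy."""
    if policy == "never":
        kept = list(fields)                      # keep every position
    elif policy == "anyEmpty":
        kept = [f for f in fields if f != ""]    # drop every empty position
    elif policy == "trailingEmpty":
        kept = list(fields)
        while kept and kept[-1] == "":           # drop only the trailing empties
            kept.pop()
    else:
        raise ValueError(f"unsupported policy: {policy}")
    return sep.join(kept)


if __name__ == "__main__":
    for policy in ("never", "anyEmpty", "trailingEmpty"):
        print(policy, "->", unparse_fields(FIELDS, policy))
```

The three policies differ only in which empty positions survive the join,
which is the pattern in the three unparse results.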

Here is the schema:

<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format
        alignment="1"
        alignmentUnits="bytes"
        binaryFloatRep="ieee"
        binaryNumberCheckPolicy="lax"
        binaryNumberRep="binary"
        binaryCalendarEpoch="1970-01-01T00:00:00"
        bitOrder="mostSignificantBitFirst"
        byteOrder="bigEndian"
        calendarCenturyStart="53"
        calendarCheckPolicy="strict"
        calendarDaysInFirstWeek="4"
        calendarFirstDayOfWeek="Sunday"
        calendarLanguage="en"
        calendarObserveDST="yes"
        calendarPatternKind="implicit"
        calendarTimeZone=""
        choiceLengthKind="implicit"
        decimalSigned="yes"
        documentFinalTerminatorCanBeMissing="no"
        emptyValueDelimiterPolicy="both"
        encodingErrorPolicy="replace"
        encoding="US-ASCII"
        escapeSchemeRef=""
        fillByte="%#r20;"
        floating="no"
        ignoreCase="no"
        initiatedContent="no"
        initiator=""
        leadingSkip="0"
        lengthUnits="bytes"
        occursCountKind="implicit"
        outputNewLine="%LF;"
        representation="text"
        separator=""
        separatorPosition="infix"
        sequenceKind="ordered"
        terminator=""
        textBidi="no"
        textBooleanPadCharacter="%SP;"
        textCalendarJustification="left"
        textCalendarPadCharacter="%SP;"
        textNumberCheckPolicy="lax"
        textNumberJustification="right"
        textNumberPadCharacter="%SP;"
        textNumberPattern="#,##0.###;-#,##0.###"
        textNumberRep="standard"
        textNumberRounding="explicit"
        textNumberRoundingIncrement="0"
        textNumberRoundingMode="roundHalfEven"
        textOutputMinLength="0"
        textPadKind="none"
        textStandardBase="10"
        textStandardDecimalSeparator="."
        textStandardExponentRep="E"
        textStandardGroupingSeparator=","
        textStandardInfinityRep="Inf"
        textStandardNaNRep="NaN"
        textStandardZeroRep="0"
        textStringJustification="left"
        textStringPadCharacter="%SP;"
        textTrimKind="none"
        trailingSkip="0"
        truncateSpecifiedLengthString="no"
        utf16Width="fixed"

        lengthKind="delimited"

      />
    </xs:appinfo>
  </xs:annotation>

  <!--<xs:element name="file">
    <xs:complexType>
      <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"
        dfdl:separatorSuppressionPolicy="trailingEmpty">
        <xs:element name="given-name" type="xs:string" maxOccurs="3" />
        <xs:element name="surname" type="xs:string" />
        <xs:element name="phone" type="xs:string" maxOccurs="6" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>-->

  <xs:element name="file">
    <xs:complexType>
      <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"
        dfdl:separatorSuppressionPolicy="trailingEmpty">
        <xs:element name="given-name" type="xs:string" minOccurs="3" maxOccurs="3" />
        <xs:element name="surname" type="xs:string" />
        <xs:element name="phone" type="xs:string" minOccurs="6" maxOccurs="6" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>

From: Beckerle, Mike <mbecke...@owlcyberdefense.com>
Sent: Wednesday, April 21, 2021 9:24 AM
To: users@daffodil.apache.org
Subject: [EXT] Re: Is separatorSuppressionPolicy=never meaningless?

Roger, please send the whole schema. I'll figure out why my intuition about 
this is totally off.

This does depend on assumptions like dfdl:occursCountKind='implicit'.

I believe it should not be putting those empty elements into the infoset.

So it could either be a bug, or there's some property missing/wrong that I 
can't guess right off.
________________________________
From: Roger L Costello <coste...@mitre.org>
Sent: Wednesday, April 21, 2021 9:16 AM
To: users@daffodil.apache.org
Subject: Re: Is separatorSuppressionPolicy=never meaningless?


Hi Mike,



I ran your (fantastic!) example:



<sequence dfdl:separator="," dfdl:separatorSuppressionPolicy="never">

  <element name="givenName" type="xs:string" minOccurs="0" maxOccurs="3"/>

  <element name="surname" type="xs:string" minOccurs="0"/>

  <element name="ph" type="xs:string" minOccurs="0" maxOccurs="6"/>

</sequence>



You said that parsing this input:



michael, james,,rogers,888-888-8888,777-777-7777,,,,



would produce this XML:



<givenName>michael</givenName>

<givenName>james</givenName>

<surname>rogers</surname>

<ph>888-888-8888</ph>

<ph>777-777-7777</ph>



I didn’t get that XML; instead, I got this XML:



<file>

  <given-name>michael</given-name>

  <given-name> james</given-name>

  <given-name></given-name>

  <surname>rogers</surname>

  <phone>888-888-8888</phone>

  <phone>777-777-7777</phone>

  <phone></phone>

  <phone></phone>

  <phone></phone>

  <phone></phone>

</file>



/Roger



From: Beckerle, Mike <mbecke...@owlcyberdefense.com>
Sent: Tuesday, April 20, 2021 12:13 PM
To: users@daffodil.apache.org
Subject: [EXT] Re: Is separatorSuppressionPolicy=never meaningless?



minOccurs/maxOccurs are logical constructs.



Saying that the representation requires or allows separators that don't
correlate with minOccurs/maxOccurs may seem strange, but many legacy data
formats are rigid about allocating space. For example, they allow for, say, 10
things, and if you aren't using all 10, you leave some of them empty to
indicate that you are using only some of the available 10.



Example:



<sequence dfdl:separator="," dfdl:separatorSuppressionPolicy="never">

  <element name="givenName" type="xs:string" minOccurs="0" maxOccurs="3"/>

  <element name="surname" type="xs:string" minOccurs="0"/>

  <element name="ph" type="xs:string" minOccurs="0" maxOccurs="6"/>

</sequence>



This format means there are 10 locations separated by 9 commas.

Whether something is a givenName, surname, or phone number is just determined 
positionally by counting the separators as the parse passes them.



Well-formed instance:



"michael, james,,rogers,888-888-8888,777-777-7777,,,,"



infoset:



<givenName>michael</givenName>

<givenName>james</givenName>

<surname>rogers</surname>

<ph>888-888-8888</ph>

<ph>777-777-7777</ph>



Well-formed instance:



"madonna,,,,,,,,,"



infoset:



<givenName>madonna</givenName>



Both the above examples would unparse to exactly the input data.
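
The positional reading described above can be sketched in Python. This is a
minimal illustration of the intended behavior, not Daffodil's parser, and it
does not reproduce the Daffodil 3.0 output reported later in this thread. The
names LAYOUT, parse_positional and unparse_positional are hypothetical. No
trimming is applied, so " james" keeps its leading space.

```python
# Sketch of positional parsing with separatorSuppressionPolicy="never".
# Assumption: LAYOUT mirrors the example sequence as (element name, maxOccurs).
LAYOUT = [("givenName", 3), ("surname", 1), ("ph", 6)]


def parse_positional(text, sep=","):
    """Assign each separated position to an element by counting positions."""
    fields = text.split(sep)
    total = sum(count for _, count in LAYOUT)
    if len(fields) != total:
        raise ValueError(f"expected {total} positions, found {len(fields)}")
    infoset, pos = [], 0
    for name, count in LAYOUT:
        for value in fields[pos:pos + count]:
            if value != "":                 # an empty position means "absent"
                infoset.append((name, value))
        pos += count
    return infoset


def unparse_positional(infoset, sep=","):
    """Write every position, padding absent occurrences with empty strings.

    Assumes the occurrences of each element fill its leading slots.
    """
    fields = []
    for name, count in LAYOUT:
        values = [v for n, v in infoset if n == name]
        fields.extend(values + [""] * (count - len(values)))
    return sep.join(fields)


if __name__ == "__main__":
    data = "michael, james,,rogers,888-888-8888,777-777-7777,,,,"
    print(parse_positional(data))
    print(unparse_positional(parse_positional(data)) == data)
```

Under these assumptions both sample instances round-trip to exactly the input
text, because the absent positions are regenerated from the layout.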



To me this makes perfect sense both representationally, and in the infoset.







________________________________

From: Roger L Costello <coste...@mitre.org>
Sent: Tuesday, April 20, 2021 11:08 AM
To: users@daffodil.apache.org
Subject: Re: Is separatorSuppressionPolicy=never meaningless?



Hi Mike,



Thank you for the explanation. Yes, I see that is how Daffodil behaves.



But, but, but, …



Does it make sense? If a data format specifies that instances contain 1 to 5 
string values separated by forward slashes, then any of these instances should 
be valid:



              a

              a/b

              a/b/c

              a/b/c/d

              a/b/c/d/e



But you are saying that only the last instance is valid when 
separatorSuppressionPolicy=never is also specified. You are saying that 
instances must always contain 5 values (a zero-length string is a value):



              a////

              a/b///

              a/b/c//

              a/b/c/d/

              a/b/c/d/e
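
The second list can be generated mechanically. The following Python sketch
assumes the reading discussed in this thread, in which every instance carries
maxOccurs positions and unused positions are left empty. The function name
render_never is hypothetical.

```python
# Sketch: text for 1 to max_occurs values when separators are never suppressed.
def render_never(values, max_occurs=5, sep="/"):
    """Pad the values with empty positions up to max_occurs, then join."""
    if not 1 <= len(values) <= max_occurs:
        raise ValueError("number of values is outside 1..max_occurs")
    padded = list(values) + [""] * (max_occurs - len(values))
    return sep.join(padded)


if __name__ == "__main__":
    letters = ["a", "b", "c", "d", "e"]
    for n in range(1, 6):
        print(render_never(letters[:n]))
```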



To my mind, the constraints form a logical inconsistency. The constraints



              minOccurs=1

              maxOccurs=5



specify that instances contain 1 to 5 values.



The constraint



              separatorSuppressionPolicy=never



specifies instances must contain exactly 5 values.



Therefore, the constraints form a logical inconsistency, don’t they?



/Roger



From: Beckerle, Mike <mbecke...@owlcyberdefense.com>
Sent: Tuesday, April 20, 2021 10:34 AM
To: users@daffodil.apache.org
Subject: [EXT] Re: Is separatorSuppressionPolicy=never meaningless?



The separatorSuppressionPolicy 'never', used with a variable-length array,
means that there will always be separators for maxOccurs items. That is, the
separators are never suppressed even for optional item occurrences that are 
absent.



So CSV-style data with separatorSuppressionPolicy 'never' and minOccurs 0, 
maxOccurs 10 always requires 9 separators.



E.g.,



a/b/c///////



always 9 (for infix separator). Never any fewer, never any additional.



maxOccurs="unbounded" is not allowed with separatorSuppressionPolicy 'never'.
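
The separator count rule above can be expressed as a small check. This is a
sketch under simplifying assumptions: an infix separator, no escaped
separators inside the data, and None standing in for maxOccurs="unbounded".
The function names are hypothetical.

```python
# Sketch: with policy 'never' and an infix separator, the text always
# carries exactly maxOccurs - 1 separators.
def required_separators(max_occurs):
    """Return the number of infix separators required for max_occurs items."""
    if max_occurs is None:   # None models maxOccurs="unbounded"
        raise ValueError("unbounded maxOccurs is not allowed with 'never'")
    return max_occurs - 1


def matches_never_policy(text, max_occurs, sep="/"):
    """True when text has exactly the required number of separators."""
    return text.count(sep) == required_separators(max_occurs)


if __name__ == "__main__":
    print(matches_never_policy("a/b/c///////", 10))  # 9 separators present
    print(matches_never_policy("a/b/c", 10))         # only 2 separators
```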











________________________________

From: Roger L Costello <coste...@mitre.org>
Sent: Tuesday, April 20, 2021 9:44 AM
To: users@daffodil.apache.org
Subject: Is separatorSuppressionPolicy=never meaningless?



Hi Folks,

separatorSuppressionPolicy=never means separators are never omitted.

I have convinced myself that there are no instances that would ever raise an
error due to separatorSuppressionPolicy=never.

Case #1: Suppose the schema specifies that instances must contain exactly 3 
string data items, separated by forward slashes. There is no data for the 3rd 
data item. Then instances must look like this:

        a/b/

The instance cannot omit the last separator because the schema specifies 
exactly 3 data items. So, separatorSuppressionPolicy=never has no effect in 
this case.

Case #2: Suppose the schema specifies that instances contain 1 to 3 string data 
items, separated by forward slashes. There is no data for the 3rd data item. 
Then this is a valid instance:

        a/b

Since there may be fewer than 3 data items, there are no omitted separators in
the instance. Again, separatorSuppressionPolicy=never has no effect in this 
case.

I think those are the only two cases possible. In both cases 
separatorSuppressionPolicy=never has no effect. I conclude that 
separatorSuppressionPolicy=never is meaningless. I look forward to being proven 
wrong.

/Roger
