Hi Mike,

I ran your (fantastic!) example:

<sequence dfdl:separator="," dfdl:separatorSuppressionPolicy="never">
  <element name="givenName" type="xs:string" minOccurs="0" maxOccurs="3"/>
  <element name="surname" type="xs:string" minOccurs="0"/>
  <element name="ph" type="xs:string" minOccurs="0" maxOccurs="6"/>
</sequence>

You said that parsing this input:

michael, james,,rogers,888-888-8888,777-777-7777,,,,

would produce this XML:

<givenName>michael</givenName>
<givenName>james</givenName>
<surname>rogers</surname>
<ph>888-888-8888</ph>
<ph>777-777-7777</ph>

I didn't get that XML; instead, I got this XML:

<file>
  <given-name>michael</given-name>
  <given-name> james</given-name>
  <given-name></given-name>
  <surname>rogers</surname>
  <phone>888-888-8888</phone>
  <phone>777-777-7777</phone>
  <phone></phone>
  <phone></phone>
  <phone></phone>
  <phone></phone>
</file>

/Roger

From: Beckerle, Mike <mbecke...@owlcyberdefense.com>
Sent: Tuesday, April 20, 2021 12:13 PM
To: users@daffodil.apache.org
Subject: [EXT] Re: Is separatorSuppressionPolicy=never meaningless?

minOccurs/maxOccurs are logical constructs.

Saying that the representation requires or allows separators that don't
correlate with minOccurs/maxOccurs is strange perhaps, but lots of legacy data
formats are rigid about allocating things. E.g., they allow for, say, 10
things, and if you aren't using all 10, you leave some of them empty to
indicate that you are using only some of the available 10.

Example:

<sequence dfdl:separator="," dfdl:separatorSuppressionPolicy="never">
  <element name="givenName" type="xs:string" minOccurs="0" maxOccurs="3"/>
  <element name="surname" type="xs:string" minOccurs="0"/>
  <element name="ph" type="xs:string" minOccurs="0" maxOccurs="6"/>
</sequence>

This format means there are 10 locations separated by 9 commas.
Whether something is a givenName, surname, or phone number is just determined 
positionally by counting the separators as the parse passes them.
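
The positional rule can be sketched in a few lines of Python. This is only a
model of the idea, not how Daffodil works internally. The slot layout comes
from the schema above; the names SLOTS and parse_positional are made up for
illustration, and treating an empty slot as an absent occurrence is an
assumption.

```python
# Illustrative model only; this is not how Daffodil is implemented.
# Slot layout taken from the schema: (element name, maxOccurs).
SLOTS = [("givenName", 3), ("surname", 1), ("ph", 6)]


def parse_positional(data, sep=","):
    """Map each separated position to an element name by counting slots."""
    fields = data.split(sep)
    total = sum(count for _, count in SLOTS)
    # With separatorSuppressionPolicy="never" every slot is present,
    # so the number of separators is always total - 1.
    if len(fields) != total:
        raise ValueError(
            f"expected {total - 1} separators, got {len(fields) - 1}"
        )
    infoset = []
    pos = 0
    for name, count in SLOTS:
        for value in fields[pos:pos + count]:
            # Assumption: an empty slot means the occurrence is absent.
            if value != "":
                infoset.append((name, value))
        pos += count
    return infoset


print(parse_positional("madonna,,,,,,,,,"))
```

The sketch does no whitespace trimming and no escaping, so a value such as
" james" keeps its leading space.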

Well-formed instance:

"michael, james,,rogers,888-888-8888,777-777-7777,,,,"

infoset:

<givenName>michael</givenName>
<givenName>james</givenName>
<surname>rogers</surname>
<ph>888-888-8888</ph>
<ph>777-777-7777</ph>

Well-formed instance:

"madonna,,,,,,,,,"

infoset:

<givenName>madonna</givenName>

Both the above examples would unparse to exactly the input data.
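
A rough Python sketch of that unparse direction, again only a model and not
Daffodil's implementation. The names SLOTS and unparse_positional are
hypothetical, and the sketch assumes that absent occurrences always trail the
present ones within each array.

```python
# Illustrative model only; this is not how Daffodil is implemented.
# Slot layout taken from the schema: (element name, maxOccurs).
SLOTS = [("givenName", 3), ("surname", 1), ("ph", 6)]


def unparse_positional(infoset, sep=","):
    """Write every slot, padding absent occurrences with empty strings."""
    fields = []
    for name, count in SLOTS:
        values = [v for n, v in infoset if n == name]
        if len(values) > count:
            raise ValueError(f"too many occurrences of {name}")
        # Pad up to maxOccurs so that no separator is ever suppressed.
        fields.extend(values + [""] * (count - len(values)))
    return sep.join(fields)


# Prints "madonna" followed by nine commas.
print(unparse_positional([("givenName", "madonna")]))
```

Whatever the infoset holds, the output always has 10 slots and 9 commas.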

To me this makes perfect sense both representationally and in the infoset.



________________________________
From: Roger L Costello <coste...@mitre.org>
Sent: Tuesday, April 20, 2021 11:08 AM
To: users@daffodil.apache.org
Subject: Re: Is separatorSuppressionPolicy=never meaningless?


Hi Mike,



Thank you for the explanation. Yes, I see that is how Daffodil behaves.



But, but, but, ...



Does it make sense? If a data format specifies that instances contain 1 to 5 
string values separated by forward slashes, then any of these instances should 
be valid:



              a

              a/b

              a/b/c

              a/b/c/d

              a/b/c/d/e



But you are saying that only the last instance is valid when 
separatorSuppressionPolicy=never is also specified. You are saying that 
instances must always contain 5 values (a zero-length string is a value):



              a////

              a/b///

              a/b/c//

              a/b/c/d/

              a/b/c/d/e



To my mind, the constraints form a logical inconsistency. The constraints



              minOccurs=1

              maxOccurs=5



specify that instances contain 1 to 5 values.



The constraint



              separatorSuppressionPolicy=never



specifies instances must contain exactly 5 values.



Therefore, the constraints form a logical inconsistency, don't they?



/Roger



From: Beckerle, Mike <mbecke...@owlcyberdefense.com>
Sent: Tuesday, April 20, 2021 10:34 AM
To: users@daffodil.apache.org
Subject: [EXT] Re: Is separatorSuppressionPolicy=never meaningless?



The separatorSuppressionPolicy 'never', used with a variable-length array,
means that there will always be separators for maxOccurs items. That is, the
separators are never suppressed, even for optional item occurrences that are
absent.



So CSV-style data with separatorSuppressionPolicy 'never' and minOccurs 0, 
maxOccurs 10 always requires 9 separators.



E.g.,



a/b/c///////



Always 9 separators (for an infix separator). Never any fewer, never any more.
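
As a quick check of that count, here is a minimal Python sketch. The function
name has_required_separators is made up for illustration, and it assumes an
infix separator, no escaping, and values that never contain the separator.

```python
def has_required_separators(data, sep, max_occurs):
    """True when data holds exactly max_occurs - 1 infix separators."""
    return data.count(sep) == max_occurs - 1


print(has_required_separators("a/b/c///////", "/", 10))  # True: 9 separators
print(has_required_separators("a/b/c", "/", 10))         # False: only 2
```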



maxOccurs="unbounded" is not allowed with separatorSuppressionPolicy 'never'.

________________________________

From: Roger L Costello <coste...@mitre.org>
Sent: Tuesday, April 20, 2021 9:44 AM
To: users@daffodil.apache.org
Subject: Is separatorSuppressionPolicy=never meaningless?



Hi Folks,

separatorSuppressionPolicy=never means separators are never omitted.

I have convinced myself that there are no instances that would ever raise an 
error due to separatorSuppressionPolicy=never.

Case #1: Suppose the schema specifies that instances must contain exactly 3 
string data items, separated by forward slashes. There is no data for the 3rd 
data item. Then instances must look like this:

        a/b/

The instance cannot omit the last separator because the schema specifies 
exactly 3 data items. So, separatorSuppressionPolicy=never has no effect in 
this case.

Case #2: Suppose the schema specifies that instances contain 1 to 3 string data 
items, separated by forward slashes. There is no data for the 3rd data item. 
Then this is a valid instance:

        a/b

Since there may be fewer than 3 data items, there are no omitted separators in 
the instance. Again, separatorSuppressionPolicy=never has no effect in this 
case.

I think those are the only two cases possible. In both cases 
separatorSuppressionPolicy=never has no effect. I conclude that 
separatorSuppressionPolicy=never is meaningless. I look forward to being proven 
wrong.

/Roger
