Hi Mike,

My schema is at the end of this message.

I did some testing (Daffodil version 3.0), and here’s what I found.

Input:

  michael, james,,rogers,888-888-8888,777-777-7777,,,,

When I use separatorSuppressionPolicy="never":

Parsing yields:

  <file>
    <given-name>michael</given-name>
    <given-name> james</given-name>
    <given-name></given-name>
    <surname>rogers</surname>
    <phone>888-888-8888</phone>
    <phone>777-777-7777</phone>
    <phone></phone>
    <phone></phone>
    <phone></phone>
    <phone></phone>
  </file>

Unparsing yields:

  michael, james,,rogers,888-888-8888,777-777-7777,,,,

When I use separatorSuppressionPolicy="anyEmpty":

Parsing yields:

  <file>
    <given-name>michael</given-name>
    <given-name> james</given-name>
    <surname>rogers</surname>
    <phone>888-888-8888</phone>
    <phone>777-777-7777</phone>
  </file>

Unparsing yields:

  michael, james,rogers,888-888-8888,777-777-7777

When I use separatorSuppressionPolicy="trailingEmpty":

Parsing yields:

  <file>
    <given-name>michael</given-name>
    <given-name> james</given-name>
    <surname>rogers</surname>
    <phone>888-888-8888</phone>
    <phone>777-777-7777</phone>
  </file>

Unparsing yields:

  michael, james,,rogers,888-888-8888,777-777-7777

Next, I changed the schema from 1-3 given-names and 1-6 phones to 3-3 given-names and 6-6 phones (i.e., exactly 3 given-names and exactly 6 phones).
When I use separatorSuppressionPolicy="never":

Parsing yields:

  <file>
    <given-name>michael</given-name>
    <given-name> james</given-name>
    <given-name></given-name>
    <surname>rogers</surname>
    <phone>888-888-8888</phone>
    <phone>777-777-7777</phone>
    <phone></phone>
    <phone></phone>
    <phone></phone>
    <phone></phone>
  </file>

Unparsing yields:

  michael, james,,rogers,888-888-8888,777-777-7777,,,,

When I use separatorSuppressionPolicy="anyEmpty":

Parsing yields:

  <file>
    <given-name>michael</given-name>
    <given-name> james</given-name>
    <given-name></given-name>
    <surname>rogers</surname>
    <phone>888-888-8888</phone>
    <phone>777-777-7777</phone>
    <phone></phone>
    <phone></phone>
    <phone></phone>
    <phone></phone>
  </file>

Unparsing yields:

  michael, james,rogers,888-888-8888,777-777-7777

When I use separatorSuppressionPolicy="trailingEmpty":

Parsing yields:

  <file>
    <given-name>michael</given-name>
    <given-name> james</given-name>
    <given-name></given-name>
    <surname>rogers</surname>
    <phone>888-888-8888</phone>
    <phone>777-777-7777</phone>
    <phone></phone>
    <phone></phone>
    <phone></phone>
    <phone></phone>
  </file>

Unparsing yields:

  michael, james,,rogers,888-888-8888,777-777-7777

Here is the schema:

  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:format alignment="1" alignmentUnits="bytes" binaryFloatRep="ieee"
          binaryNumberCheckPolicy="lax" binaryNumberRep="binary"
          binaryCalendarEpoch="1970-01-01T00:00:00" bitOrder="mostSignificantBitFirst"
          byteOrder="bigEndian" calendarCenturyStart="53" calendarCheckPolicy="strict"
          calendarDaysInFirstWeek="4" calendarFirstDayOfWeek="Sunday" calendarLanguage="en"
          calendarObserveDST="yes" calendarPatternKind="implicit" calendarTimeZone=""
          choiceLengthKind="implicit" decimalSigned="yes"
          documentFinalTerminatorCanBeMissing="no" emptyValueDelimiterPolicy="both"
          encodingErrorPolicy="replace" encoding="US-ASCII" escapeSchemeRef=""
          fillByte="%#r20;"
          floating="no" ignoreCase="no" initiatedContent="no" initiator=""
          leadingSkip="0" lengthUnits="bytes" occursCountKind="implicit"
          outputNewLine="%LF;" representation="text" separator=""
          separatorPosition="infix" sequenceKind="ordered" terminator=""
          textBidi="no" textBooleanPadCharacter="%SP;"
          textCalendarJustification="left" textCalendarPadCharacter="%SP;"
          textNumberCheckPolicy="lax" textNumberJustification="right"
          textNumberPadCharacter="%SP;" textNumberPattern="#,##0.###;-#,##0.###"
          textNumberRep="standard" textNumberRounding="explicit"
          textNumberRoundingIncrement="0" textNumberRoundingMode="roundHalfEven"
          textOutputMinLength="0" textPadKind="none" textStandardBase="10"
          textStandardDecimalSeparator="." textStandardExponentRep="E"
          textStandardGroupingSeparator="," textStandardInfinityRep="Inf"
          textStandardNaNRep="NaN" textStandardZeroRep="0"
          textStringJustification="left" textStringPadCharacter="%SP;"
          textTrimKind="none" trailingSkip="0" truncateSpecifiedLengthString="no"
          utf16Width="fixed" lengthKind="delimited" />
      </xs:appinfo>
    </xs:annotation>

    <!--<xs:element name="file">
      <xs:complexType>
        <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"
                     dfdl:separatorSuppressionPolicy="trailingEmpty">
          <xs:element name="given-name" type="xs:string" maxOccurs="3" />
          <xs:element name="surname" type="xs:string" />
          <xs:element name="phone" type="xs:string" maxOccurs="6" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>-->

    <xs:element name="file">
      <xs:complexType>
        <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix"
                     dfdl:separatorSuppressionPolicy="trailingEmpty">
          <xs:element name="given-name" type="xs:string" minOccurs="3" maxOccurs="3" />
          <xs:element name="surname" type="xs:string" />
          <xs:element name="phone" type="xs:string" minOccurs="6" maxOccurs="6" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:schema>

From: Beckerle, Mike <mbecke...@owlcyberdefense.com>
Sent: Wednesday, April 21, 2021 9:24 AM
To: users@daffodil.apache.org
Subject: [EXT] Re: Is separatorSuppressionPolicy=never meaningless?

Roger, please send the whole schema. I'll figure out why my intuition about this is totally off.

This does depend on assumptions like dfdl:occursCountKind='implicit'.

I believe it should not be putting those empty elements into the infoset. So it could either be a bug, or there's some property missing/wrong that I can't guess right off.

________________________________
From: Roger L Costello <coste...@mitre.org>
Sent: Wednesday, April 21, 2021 9:16 AM
To: users@daffodil.apache.org
Subject: Re: Is separatorSuppressionPolicy=never meaningless?

Hi Mike,

I ran your (fantastic!) example:

  <sequence dfdl:separator="," dfdl:separatorSuppressionPolicy="never">
    <element name="givenName" type="xs:string" minOccurs="0" maxOccurs="3"/>
    <element name="surname" type="xs:string" minOccurs="0"/>
    <element name="ph" type="xs:string" minOccurs="0" maxOccurs="6"/>
  </sequence>

You said that parsing this input:

  michael, james,,rogers,888-888-8888,777-777-7777,,,,

would produce this XML:

  <givenName>michael</givenName>
  <givenName>james</givenName>
  <surname>rogers</surname>
  <ph>888-888-8888</ph>
  <ph>777-777-7777</ph>

I didn’t get that XML; instead, I got this XML:

  <file>
    <given-name>michael</given-name>
    <given-name> james</given-name>
    <given-name></given-name>
    <surname>rogers</surname>
    <phone>888-888-8888</phone>
    <phone>777-777-7777</phone>
    <phone></phone>
    <phone></phone>
    <phone></phone>
    <phone></phone>
  </file>

/Roger

From: Beckerle, Mike <mbecke...@owlcyberdefense.com>
Sent: Tuesday, April 20, 2021 12:13 PM
To: users@daffodil.apache.org
Subject: [EXT] Re: Is separatorSuppressionPolicy=never meaningless?

minOccurs/maxOccurs are logical constructs.
Saying the representation requires or allows separators that don't correlate with minOccurs/maxOccurs is strange perhaps, but lots of legacy data formats are rigid about allocating things. E.g., they allow for, say, 10 things, and if you aren't using all 10, you leave some of them empty as the way of indicating you are using only some of the available 10.

Example:

  <sequence dfdl:separator="," dfdl:separatorSuppressionPolicy="never">
    <element name="givenName" type="xs:string" minOccurs="0" maxOccurs="3"/>
    <element name="surname" type="xs:string" minOccurs="0"/>
    <element name="ph" type="xs:string" minOccurs="0" maxOccurs="6"/>
  </sequence>

This format means there are 10 locations (3 + 1 + 6) separated by 9 commas. Whether something is a givenName, surname, or phone number is determined purely by position, by counting the separators as the parse passes them.

Well-formed instance: "michael, james,,rogers,888-888-8888,777-777-7777,,,,"

infoset:

  <givenName>michael</givenName>
  <givenName>james</givenName>
  <surname>rogers</surname>
  <ph>888-888-8888</ph>
  <ph>777-777-7777</ph>

Well-formed instance: "madonna,,,,,,,,,"

infoset:

  <givenName>madonna</givenName>

Both the above examples would unparse to exactly the input data. To me this makes perfect sense, both representationally and in the infoset.

________________________________
From: Roger L Costello <coste...@mitre.org>
Sent: Tuesday, April 20, 2021 11:08 AM
To: users@daffodil.apache.org
Subject: Re: Is separatorSuppressionPolicy=never meaningless?

Hi Mike,

Thank you for the explanation. Yes, I see that is how Daffodil behaves. But, but, but, … Does it make sense?
If a data format specifies that instances contain 1 to 5 string values separated by forward slashes, then any of these instances should be valid:

  a
  a/b
  a/b/c
  a/b/c/d
  a/b/c/d/e

But you are saying that only the last instance is valid when separatorSuppressionPolicy=never is also specified. You are saying that instances must always contain 5 values (a zero-length string is a value):

  a////
  a/b///
  a/b/c//
  a/b/c/d/
  a/b/c/d/e

To my mind, the constraints form a logical inconsistency. The constraints minOccurs=1 maxOccurs=5 specify that instances contain 1 to 5 values. The constraint separatorSuppressionPolicy=never specifies that instances must contain exactly 5 values. Therefore, the constraints form a logical inconsistency, don’t they?

/Roger

From: Beckerle, Mike <mbecke...@owlcyberdefense.com>
Sent: Tuesday, April 20, 2021 10:34 AM
To: users@daffodil.apache.org
Subject: [EXT] Re: Is separatorSuppressionPolicy=never meaningless?

The separatorSuppressionPolicy 'never' used with a variable-length array means that there will always be separators for maxOccurs items. That is, the separators are never suppressed, even for optional item occurrences that are absent.

So CSV-style data with separatorSuppressionPolicy 'never' and minOccurs 0, maxOccurs 10 always requires 9 separators. E.g., a/b/c/////// always has 9 (for an infix separator). Never any fewer, never any additional.

maxOccurs="unbounded" is not allowed with separatorSuppressionPolicy 'never'.

________________________________
From: Roger L Costello <coste...@mitre.org>
Sent: Tuesday, April 20, 2021 9:44 AM
To: users@daffodil.apache.org
Subject: Is separatorSuppressionPolicy=never meaningless?

Hi Folks,

separatorSuppressionPolicy=never means separators are never omitted.
I have convinced myself that there are no instances that would ever raise an error due to separatorSuppressionPolicy=never.

Case #1: Suppose the schema specifies that instances must contain exactly 3 string data items, separated by forward slashes. There is no data for the 3rd data item. Then instances must look like this:

  a/b/

The instance cannot omit the last separator because the schema specifies exactly 3 data items. So, separatorSuppressionPolicy=never has no effect in this case.

Case #2: Suppose the schema specifies that instances contain 1 to 3 string data items, separated by forward slashes. There is no data for the 3rd data item. Then this is a valid instance:

  a/b

Since there may be fewer than 3 data items, no separators are omitted from the instance. Again, separatorSuppressionPolicy=never has no effect in this case.

I think those are the only two cases possible. In both cases separatorSuppressionPolicy=never has no effect. I conclude that separatorSuppressionPolicy=never is meaningless.

I look forward to being proven wrong.

/Roger
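Roger's two cases can be written as DFDL element declarations, which makes the point of disagreement concrete. This is only a sketch under stated assumptions: the element names `case1`, `case2`, and `item` are hypothetical, and the dfdl:format annotation from Roger's schema earlier in this thread (representation="text", lengthKind="delimited", occursCountKind="implicit") is assumed and omitted for brevity. The comments describe separator counts as Mike explains them in his replies, not verified Daffodil output; Roger's own Daffodil 3.0 tests in this thread did not always match that description.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
  <!-- Assumed: the same dfdl:format annotation as in Roger's schema goes here
       (representation="text", lengthKind="delimited", occursCountKind="implicit"). -->

  <!-- Case #1 (hypothetical names): exactly 3 items. With no data for the
       3rd item, the instance is a/b/ : 2 infix separators for 3 positions. -->
  <xs:element name="case1">
    <xs:complexType>
      <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix"
                   dfdl:separatorSuppressionPolicy="never">
        <xs:element name="item" type="xs:string" minOccurs="3" maxOccurs="3"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Case #2: 1 to 3 items. Per Mike's description of "never", separators
       for all maxOccurs positions are still required, so with no data for the
       3rd item the instance is a/b/ rather than a/b. The shorter a/b is what
       a trailing policy such as "trailingEmpty" is meant to allow. -->
  <xs:element name="case2">
    <xs:complexType>
      <xs:sequence dfdl:separator="/" dfdl:separatorPosition="infix"
                   dfdl:separatorSuppressionPolicy="never">
        <xs:element name="item" type="xs:string" minOccurs="1" maxOccurs="3"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Under this reading, Case #2 is where 'never' has an observable effect: it requires a/b/ and does not accept a/b, which is the behavior Mike describes in his 10:34 reply.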