Mike wrote:
I suggest adding this:
<xs:choice>
  <xs:sequence dfdl:initiator="%NL;" />
  <xs:sequence />
</xs:choice>
at the end of the schema, after the repeating row element.
This will absorb and discard any final newline.
Oh! That is a wicked cool idea! I gave it a try. Daffodil doesn't seem to like it:
[warning] Schema Definition Warning: Multiple choice branches are associated with the end of element {}csv.
Note that elements with dfdl:outputValueCalc cannot be used to distinguish choice branches.
Note that choice branches with entirely optional content are not allowed.
What does that message mean? How do I fix it?
/Roger
From: Beckerle, Mike <[email protected]>
Sent: Sunday, November 10, 2019 7:56 AM
To: [email protected]
Subject: [EXT] Re: Is it okay to officially publish a DFDL schema that produces warnings on valid input data?
I would avoid this.
One thing you need to take a position on is whether, on unparsing, you generate
this final newline, omit it, or try to preserve whatever the file had
originally.
Choosing to always generate it, or to always omit it, is canonicalization.
I suggest adding this:
<xs:choice>
  <xs:sequence dfdl:initiator="%NL;" />
  <xs:sequence />
</xs:choice>
at the end of the schema, after the repeating row element.
This will absorb and discard any final newline.
If you want to preserve the final newline, then you have to model it as data:
change the first branch of the choice above into an element named
'finalNewLine' with the same initiator, type string, and explicit length 0.
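A sketch of that element-based first branch follows. It assumes the enclosing schema binds the xs and dfdl prefixes and that its dfdl:format supplies the remaining required properties (encoding, lengthUnits, outputNewLine, and so on). The name finalNewLine comes from the suggestion above; dfdl:emptyValueDelimiterPolicy="initiator" is an added assumption, not part of the suggestion, meant to keep the initiator mandatory even though the element's value is always empty. Because the first branch now produces an element in the infoset, the unparser presumably has something to tell it apart from the empty branch, which may also address the "Multiple choice branches" warning quoted earlier in this thread. This has not been checked against a specific Daffodil release.

```xml
<!-- Goes at the end of the schema, after the repeating row element. -->
<xs:choice>
  <!-- Branch 1: a final newline is present. The %NL; initiator consumes it
       on parse, and the zero-length string records it in the infoset as an
       empty finalNewLine element, so unparsing can write the newline back.
       emptyValueDelimiterPolicy is an assumed addition that keeps the
       initiator required even though the value is always empty. -->
  <xs:element name="finalNewLine" type="xs:string"
              dfdl:initiator="%NL;"
              dfdl:lengthKind="explicit"
              dfdl:length="0"
              dfdl:emptyValueDelimiterPolicy="initiator"/>
  <!-- Branch 2: no final newline; nothing is consumed or produced. -->
  <xs:sequence/>
</xs:choice>
```

On unparse, an infoset containing finalNewLine should write one line break (as chosen by dfdl:outputNewLine), and an infoset without it should write none, so the original file's choice is preserved rather than canonicalized.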
________________________________
From: Costello, Roger L. <[email protected]>
Sent: Saturday, November 9, 2019 8:05:19 AM
To: [email protected]
Subject: Is it okay to officially publish a DFDL schema that produces warnings on valid input data?
Hi Folks,
Suppose you are creating the official, standard DFDL schema for a data format.
Would you be okay with officially releasing a schema that generates warnings on
data that is valid?
Here's an example. The RFC for CSV (RFC 4180) says that CSV files consist of
records separated by newlines (CRLF line breaks, in the RFC's terms). Each
record consists of fields separated by commas. The last record may or may not
end with a newline.
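To make that structure concrete, here is a stripped-down DFDL fragment for it. It is a sketch, not an official CSV schema: the element names row and field are hypothetical (csv matches the element named in the warnings in this thread), quoted fields and any header line are ignored, the xs and dfdl prefixes are assumed bound, and the usual dfdl:format defaults are assumed to come from elsewhere in the schema. %NL; is DFDL's newline entity, which on parse matches CRLF as well as a lone LF or CR.

```xml
<xs:element name="csv">
  <xs:complexType>
    <!-- Records are separated (not terminated) by newlines. -->
    <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="infix">
      <xs:element name="row" maxOccurs="unbounded">
        <xs:complexType>
          <!-- Fields within a record are separated by commas. -->
          <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix">
            <!-- Each field runs until the next comma or newline. -->
            <xs:element name="field" type="xs:string"
                        maxOccurs="unbounded"
                        dfdl:lengthKind="delimited"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

Because dfdl:separatorPosition is infix, a newline is expected only between records; a newline after the final record belongs to no record, which is the situation the "Left over data" warning below is about.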
Suppose the last record of a CSV file has a newline. My DFDL schema generates
this warning:
[warning] Left over data. Consumed 1680 bit(s) with at least 16 bit(s) remaining.
I am thinking that that warning is okay. Why? Because when the last record has
a newline, then the file really does have left over data - the newline on the
last record. So, a warning is not unreasonable.
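The numbers in that warning fit this reading. Assuming a single-byte character encoding (for example US-ASCII, or UTF-8 with only ASCII content) and RFC 4180's CRLF line break:

```latex
\[
\frac{1680\ \text{bits}}{8\ \text{bits/byte}} = 210\ \text{bytes consumed},
\qquad
\frac{16\ \text{bits}}{8\ \text{bits/byte}} = 2\ \text{bytes remaining} = \mathrm{CR} + \mathrm{LF}
\]
```

So the leftover data is consistent with a single trailing CRLF after the last record.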
Well, that's what I think. I might be thinking wrongly. What do you think?
Would you ever officially release a DFDL schema that generates warnings on
valid input data?
/Roger