Mike wrote:

I suggest adding this

<choice>
  <sequence dfdl:initiator="%NL;" />
  <sequence />
</choice>

At the end of the schema after the repeating row element.

This will absorb and discard any final newline.

Oh! That is a wicked cool idea! I gave it a try. Daffodil doesn't seem to like 
it:

[warning] Schema Definition Warning: Multiple choice branches are associated 
with the end of element {}csv.
Note that elements with dfdl:outputValueCalc cannot be used to distinguish 
choice branches.
Note that choice branches with entirely optional content are not allowed.

What does that message mean, and how do I fix it?

/Roger


From: Beckerle, Mike <[email protected]>
Sent: Sunday, November 10, 2019 7:56 AM
To: [email protected]
Subject: [EXT] Re: Is it okay to officially publish a DFDL schema that produces 
warnings on valid input data?

I would avoid publishing a schema that produces warnings on valid data.

One thing you need to take a position on is whether, on unparsing, you always 
generate this final newline, never generate it, or try to preserve whatever 
the file originally had.

Choosing to always generate it, or always omit it, is canonicalization.

I suggest adding this

<choice>
  <sequence dfdl:initiator="%NL;" />
  <sequence />
</choice>

At the end of the schema after the repeating row element.

This will absorb and discard any final newline.
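
As a rough sketch of the placement (the surrounding csv element layout is an 
assumption, and the comment stands in for the existing row declaration, which 
is not shown here):

<element name="csv">
  <complexType>
    <sequence>
      <!-- existing repeating row element goes here (not shown) -->
      <choice>
        <!-- branch 1: consume a trailing newline, if present -->
        <sequence dfdl:initiator="%NL;" />
        <!-- branch 2: nothing left to consume -->
        <sequence />
      </choice>
    </sequence>
  </complexType>
</element>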

If you want to preserve the final newline, then you have to model it as data. 
Change the first branch of the choice above into an element named 
'finalNewLine' that keeps the initiator and has type string with explicit 
length 0.
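
A minimal sketch of that variant, assuming the xs prefix is bound to the XML 
Schema namespace and that %NL; is the initiator meant here:

<choice>
  <!-- branch 1: a trailing newline becomes an empty element in the infoset -->
  <element name="finalNewLine" type="xs:string"
           dfdl:initiator="%NL;"
           dfdl:lengthKind="explicit" dfdl:length="0" />
  <!-- branch 2: no trailing newline in the data -->
  <sequence />
</choice>

The intent is that when finalNewLine is present in the infoset, unparsing 
writes the newline back out, and when it is absent, nothing is written.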


________________________________
From: Costello, Roger L. <[email protected]>
Sent: Saturday, November 9, 2019 8:05:19 AM
To: [email protected]
Subject: Is it okay to officially publish a DFDL schema that produces warnings 
on valid input data?


Hi Folks,



Suppose you are creating the official, standard DFDL schema for a data format. 
Would you be okay with officially releasing a schema that generates warnings on 
data that is valid?



Here's an example. The RFC for CSV (RFC 4180) says that CSV files consist of 
records separated by newlines. Each record consists of fields separated by 
commas. The last record may or may not end with a newline.



Suppose the last record of a CSV file ends with a newline. My DFDL schema 
generates this warning:



[warning] Left over data. Consumed 1680 bit(s) with at least 16 bit(s) 
remaining.



I think that warning is okay. Why? Because when the last record ends with a 
newline, the file really does have left over data: the newline on the last 
record. So, a warning is not unreasonable.



Well, that's what I think. I might be thinking wrongly. What do you think? 
Would you ever officially release a DFDL schema that generates warnings on 
valid input data?



/Roger
