For reference, here is the ticket to add a way to disable this behavior:
https://issues.apache.org/jira/browse/DAFFODIL-1559
On 2025-06-02 03:03 PM, Mike Beckerle wrote:
Parsing with iso-8859-1 preserves all bytes from native form into the DFDL
Infoset.
But ... then Daffodil is projecting the DFDL infoset into XML.
It is this XML conversion step that is causing the problem.
XML reading does not preserve CRLFs. On input, XML readers convert CRLF to LF,
and a standalone CR to LF as well.
Your data has CR CR LF, so that becomes two LFs.
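To make the effect concrete, here is a minimal Python sketch of that line-ending normalization applied to a string decoded as ISO-8859-1. It is not Daffodil code; the function name and the sample bytes are made up for illustration, and the replacement rule simply mirrors the CRLF-to-LF and CR-to-LF behavior described above.

```python
def xml_normalize_line_endings(text):
    """Mimic XML end-of-line handling: CRLF -> LF, then any remaining CR -> LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Hypothetical sample data containing the byte sequence 0D 0D 0A.
original = b"abc\r\r\ndef"

# ISO-8859-1 maps every byte to one character, so nothing is lost here.
infoset_string = original.decode("iso-8859-1")

# The loss happens when the string passes through XML.
after_xml = xml_normalize_line_endings(infoset_string)
unparsed = after_xml.encode("iso-8859-1")

print(original.hex(" "))  # 61 62 63 0d 0d 0a 64 65 66
print(unparsed.hex(" "))  # 61 62 63 0a 0a 64 65 66
```

The CR LF pair collapses to a single LF (one 0D "dropped") and the lone CR becomes LF (one 0D "converted to 0A").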
(This is one of several reasons why, in hindsight, XML isn't a very good data
language. I.e., it's not just that it is verbose!)
Unlike the illegal XML characters, which we have no choice but to remap into
the Unicode Private Use Area (PUA), as detailed under the heading "XML Illegal
Characters" at https://daffodil.apache.org/infoset/, CR isn't technically an
"illegal character" in XML data, so Daffodil really does need a "preserveCR"
flag of some kind.
The workaround I have used and suggested in the past is to model a string which
can contain CR as an array of strings separated by CR.
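As a rough analogy for why this workaround helps, the Python sketch below splits the data on CR so that no individual string contains a CR, passes each piece through the same line-ending normalization an XML reader would apply, and rejoins the pieces with CR. This is not Daffodil code and not a DFDL schema; the helper names and sample bytes are assumptions for illustration. In an actual schema the CR would presumably be declared as the separator of the array (for example with the DFDL %CR; entity), and how empty items between adjacent CRs are handled is not covered here.

```python
def xml_normalize_line_endings(text):
    """Mimic XML end-of-line handling: CRLF -> LF, then any remaining CR -> LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def split_on_cr(data):
    """Model the data as a list of strings separated by CR (0x0D)."""
    return data.decode("iso-8859-1").split("\r")


def join_with_cr(items):
    """Rebuild the bytes by writing CR between the items."""
    return "\r".join(items).encode("iso-8859-1")


# Hypothetical sample data containing the byte sequence 0D 0D 0A.
original = b"abc\r\r\ndef"

items = split_on_cr(original)
# No item contains a CR, so XML normalization leaves each one unchanged.
after_xml = [xml_normalize_line_endings(item) for item in items]
restored = join_with_cr(after_xml)

print(items)                 # ['abc', '', '\ndef']
print(restored == original)  # True
```

Because the CRs live in the separators rather than in the string values, they never pass through the XML text and are written back out unchanged.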
On Mon, Jun 2, 2025 at 2:29 PM Mark Kozak <mark.ko...@adeptus-cs.com> wrote:
Hello folks,

Section 11.2.3 of the documentation says that if I use the ISO-8859-1
encoding, all bytes will be preserved.
So I have a simple text file that has the following text, represented as
hex:

Using the following schema, I get the expected XML on parse:

<element name="file">
  <complexType>
    <sequence>
      <element name="file_string" type="xs:string"
               dfdl:lengthKind="delimited" dfdl:encoding="ISO-8859-1"/>
    </sequence>
  </complexType>
</element>

But when unparsing, one 0D is dropped, and one is converted to 0A as shown
below:

What am I missing to actually preserve all bytes?

Thanks,
Mark

Mark Kozak
Director of Engineering
Adeptus Cyber Solutions
Adeptus-CS.com