Hmmm. The data is fixed length, so %NL; and the dfdl:outputNewLine property aren't involved.
XML doesn't preserve CR naturally. It converts CRLF or a lone CR into LF. That's standard XML parser behavior, nothing to do with DFDL.

Daffodil's XML conversion from the DFDL infoset preserves CR by remapping it into the Unicode Private Use Area (PUA), which it does, roughly, by adding 0xE000 to the character code. This happens for all the C0 control characters, so starting at byte 11 you have 00 0C 0C 0D 01, and all 5 of those get remapped, to the characters E000, E00C, E00C, E00D, E001. That explains all the EE 80 80 (UTF-8 for E000) bytes you see in the XML text, which is UTF-8. (See the section "XML Illegal Characters" in https://daffodil.apache.org/infoset/) This is inverted on output, so you should get back 00 0C 0C 0D 01.

But in the UTF-8 you don't have EE 80 8D, which would be E00D; you have just 0A. Was this string of test data held in an XML file before the test? Somehow that 0D got converted to 0A before Daffodil ever saw the byte, because Daffodil would have created E00D from it, and the bytes would be EE 80 8D in the UTF-8.

On Mon, May 20, 2024 at 2:35 PM Larry Barber <larry.bar...@nteligen.com> wrote:
> I have a string parsing issue that I’ve replicated in a very simple schema
> (attached) that is basically is nothing but the a 16 character string:
>
> <xs:element name="TEST">
>   <xs:complexType>
>     <xs:sequence>
>       <xs:element name="name" type="xs:string"
>           dfdl:lengthUnits="bytes" dfdl:lengthKind="explicit"
>           dfdl:length="16" dfdl:encoding="US-ASCII" dfdl:alignmentUnits="bytes" />
>     </xs:sequence>
>   </xs:complexType>
> </xs:element>
>
> The input looks like this:
>
> After parsing & unparsing, my output file looks like this:
>
> The <CR> at location 0x0E has been transformed into a <LF>!
>
> The infoset produced by the parse shows strangeness that I would not
> expect:
>
> I’ve tried a variety of dfdl:encoding settings and get the same results
> with US_ASCII, ASCII, and ISO-8859-1.
>
> Maybe, it’s somehow related to the outputNewLine="%CR;%LF;" or some other
> obscure string setting that I’ve missed?
>
> Daffodil version 3.7.0
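To make the byte arithmetic above concrete, here is a small Python sketch. It is not Daffodil code: remap_to_pua and unmap_from_pua are hypothetical helpers that approximate the PUA remapping, and the exact set of remapped characters (here, all C0 controls except TAB and LF) is an assumption you should check against the Daffodil infoset page. The sketch shows the UTF-8 bytes for the remapped 00 0C 0C 0D 01, that the remapped CR survives an XML round trip, and that a raw CR comes back from a standard XML parser (Python's ElementTree, which uses expat) as LF.

```python
import xml.etree.ElementTree as ET

# Offset into the Unicode Private Use Area (PUA).
PUA_OFFSET = 0xE000


def remap_to_pua(text: str) -> str:
    """Hypothetical helper: shift C0 controls into the PUA before writing XML.

    Assumption for this sketch: every C0 control except TAB (0x09) and
    LF (0x0A) is remapped, including CR (0x0D).
    """
    return "".join(
        chr(ord(ch) + PUA_OFFSET) if ord(ch) < 0x20 and ch not in "\t\n" else ch
        for ch in text
    )


def unmap_from_pua(text: str) -> str:
    """Hypothetical inverse of remap_to_pua for the characters it produces."""
    return "".join(
        chr(ord(ch) - PUA_OFFSET)
        if PUA_OFFSET <= ord(ch) < PUA_OFFSET + 0x20 and ch not in "\ue009\ue00a"
        else ch
        for ch in text
    )


# The five bytes starting at byte 11 of the test data.
raw = bytes([0x00, 0x0C, 0x0C, 0x0D, 0x01])
mapped = remap_to_pua(raw.decode("ascii"))
print(mapped.encode("utf-8").hex(" "))
# ee 80 80 ee 80 8c ee 80 8c ee 80 8d ee 80 81

# The remapped CR (U+E00D) is a legal XML character, so it survives parsing
# and maps back to 0D.
parsed = ET.fromstring(("<name>" + mapped + "</name>").encode("utf-8")).text
assert unmap_from_pua(parsed).encode("ascii") == raw

# A raw CR does not survive: the parser normalizes both CRLF and lone CR to LF.
print(repr(ET.fromstring(b"<name>A\rB\r\nC</name>").text))
# 'A\nB\nC'
```

The remapped CR appears in the UTF-8 as EE 80 8D, so a bare 0A at that position in the XML text means the 0D was already gone before Daffodil parsed the data.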