Hello DFDL community,
I have a binary file that contains, among other things, this text:
Nova Scotia / Nouvelle-Écosse
The corresponding bytes, in hex, are:
4E 6F 76 61 20 53 63 6F 74 69 61 20 2F 20 4E 6F 75 76 65 6C 6C 65 2D C3 89 63
6F 73 73 65 20 ...
I used this element declaration in my DFDL schema to parse that binary:
<xs:element name="NAME"
    type="xs:string"
    dfdl:length="93"
    dfdl:lengthKind="explicit"
    dfdl:lengthUnits="characters"
    dfdl:textTrimKind="padChar"
    dfdl:textStringPadCharacter="%SP;"
    dfdl:textStringJustification="center"/>
Surprisingly, during parsing Daffodil modified the text to this:
Nova Scotia / Nouvelle-Ã?cosse
with these corresponding bytes, in hex:
4E 6F 76 61 20 53 63 6F 74 69 61 20 2F 20 4E 6F 75 76 65 6C 6C 65 2D C3 3F 63
6F 73 73 65 20 ...
One byte changed: the sequence C3 89 (original) became C3 3F (after parsing).
C3 89 is the UTF-8 encoding of É (U+00C9), whereas C3 3F is not a valid UTF-8
byte sequence.
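
To double-check my reading of those bytes, here is a small Python sketch. It is not part of my DFDL schema and says nothing about what Daffodil does internally; the variable names are purely illustrative. It decodes the original bytes once as UTF-8 and once as ISO-8859-1, and then checks whether the altered pair C3 3F can be decoded as UTF-8 at all.

```python
# The bytes from the original hex dump, up to the end of "Écosse"
# (the trailing padding is omitted).
raw = bytes.fromhex(
    "4E 6F 76 61 20 53 63 6F 74 69 61 20 2F 20 "
    "4E 6F 75 76 65 6C 6C 65 2D C3 89 63 6F 73 73 65"
)

# Decoded as UTF-8, the pair C3 89 becomes the single character U+00C9.
as_utf8 = raw.decode("utf-8")

# Decoded as ISO-8859-1, every byte becomes its own character, so the same
# pair becomes two characters: U+00C3 and the C1 control character U+0089.
as_latin1 = raw.decode("iso-8859-1")

# The altered pair C3 3F: C3 announces a two-byte UTF-8 sequence, but 3F is
# not a continuation byte, so decoding fails.
try:
    bytes.fromhex("C3 3F").decode("utf-8")
    altered_is_valid_utf8 = True
except UnicodeDecodeError:
    altered_is_valid_utf8 = False

print(repr(as_utf8), len(as_utf8))
print(repr(as_latin1), len(as_latin1))
print(altered_is_valid_utf8)
```

Note that the same 30 bytes come out as 29 characters under UTF-8 but 30 characters under ISO-8859-1. I do not know whether that difference matters to my problem, but my schema does give the length in characters.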
Why did Daffodil change the binary?
One other piece of the puzzle: in my DFDL schema I specify
encoding="ISO-8859-1". For a reason I do not understand, when I specify
encoding="utf-8" I get an error message on parse.
Please help!
/Roger