Hello DFDL community,

I have a binary file that contains, among other things, this text:

Nova Scotia / Nouvelle-Écosse

The corresponding bytes, in hex, are:

4E 6F 76 61 20 53 63 6F 74 69 61 20 2F 20 4E 6F 75 76 65 6C 6C 65 2D C3 89 63 
6F 73 73 65 20 ...

I used this element declaration in my DFDL schema to parse that binary:

<xs:element name="NAME"
            type="xs:string"
            dfdl:length="93"
            dfdl:lengthKind="explicit"
            dfdl:lengthUnits="characters"
            dfdl:textTrimKind="padChar"
            dfdl:textStringPadCharacter="%SP;"
            dfdl:textStringJustification="center"/>

Surprisingly, during parsing Daffodil modified the text to this:

Nova Scotia / Nouvelle-Ã?cosse

With these corresponding bytes, in hex:

4E 6F 76 61 20 53 63 6F 74 69 61 20 2F 20 4E 6F 75 76 65 6C 6C 65 2D C3 3F 63 
6F 73 73 65 20 ...

The two bytes for É changed: from C3 89 (original) to C3 3F (after parsing).

Hex C3 89 is the UTF-8 encoding of É (U+00C9), whereas C3 3F is not a valid
UTF-8 byte sequence.
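
To double-check the byte-level claims above, here is a small Python sketch. It runs outside Daffodil and says nothing about what Daffodil does internally; it only shows how the same two bytes read under UTF-8 versus ISO-8859-1. The helper name is_valid_utf8 is made up for this illustration.

```python
# Illustration only: plain Python, not Daffodil.
original = bytes.fromhex("C3 89")     # bytes in the input file
after_parse = bytes.fromhex("C3 3F")  # bytes seen after parsing

# Read as UTF-8, the two bytes form one character: U+00C9 (É).
as_utf8 = original.decode("utf-8")

# Read as ISO-8859-1, each byte is its own character:
# U+00C3 (Ã) followed by the control character U+0089.
as_latin1 = original.decode("iso-8859-1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(repr(as_utf8), repr(as_latin1))
print(is_valid_utf8(original), is_valid_utf8(after_parse))
# Read as ISO-8859-1, the post-parse bytes are "Ã" followed by "?".
print(repr(after_parse.decode("iso-8859-1")))
```

So the "Ã" in the output is what byte C3 looks like when treated as a single ISO-8859-1 character, and 3F is the ASCII question mark.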

Why did Daffodil change the binary?

One other piece of the puzzle: in my DFDL schema I specify 
encoding="ISO-8859-1". For a reason I do not understand, when I specify 
encoding="utf-8" I get an error message on parse.
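
One thing that differs between the two encodings, and may or may not be related to the error (I have not included the message, so this is only a guess): with dfdl:lengthUnits="characters", the number of bytes covered by a fixed character count depends on the encoding. A minimal Python sketch, again outside Daffodil, using only the bytes shown above (without the trailing space and the elided remainder):

```python
# Illustration only: plain Python, not Daffodil.
# The visible part of the field, copied from the hex dump in this post.
visible = bytes.fromhex(
    "4E 6F 76 61 20 53 63 6F 74 69 61 20 2F 20 "
    "4E 6F 75 76 65 6C 6C 65 2D C3 89 63 6F 73 73 65"
)

byte_count = len(visible)

# ISO-8859-1 is single-byte: one character per byte.
latin1_chars = len(visible.decode("iso-8859-1"))

# UTF-8 is variable-width: C3 89 collapses into one character.
utf8_text = visible.decode("utf-8")
utf8_chars = len(utf8_text)

print(byte_count, latin1_chars, utf8_chars)
print(utf8_text)
```

In other words, 93 characters of ISO-8859-1 is always 93 bytes, while 93 characters of UTF-8 can span more than 93 bytes when accented characters are present.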

Please help!

/Roger
