Setup: FPC trunk and Lazarus on WinXP; the source file is saved as UTF-8 with $codepage UTF-8, and DefaultSystemCodePage is 1252.

In this setup AnsiString variables can contain either UTF-8 or cp1252 strings (inconsistent), but that's an already known problem :-(

Now I have found another bug with AnsiString(1252), which IMO should behave like AnsiString(CP_ACP). Unfortunately it does not: assigning the same literals to both kinds of variables leads to different strings:

program AnsiCpTest; // program name added only to make the snippet compile
{$codepage UTF8}
type
  WinAnsiString = type AnsiString(1252);
const
  cACP: AnsiString = 'ä';    // encoded UTF-8 = 'ä'
  cWin: WinAnsiString = 'ä'; // encoded 1252 = 'ä?'
var
  strA: AnsiString;
  strW: WinAnsiString;
begin
  strA := 'ä'; // encoded UTF-8 = 'ä'
  strW := 'ä'; // encoded 1252 = 'ä?'
  WriteLn('equal ', strA = strW); // FALSE!
  strW := cACP; // 1252 'ä' okay
  strA := cWin; // 1252 'ä?' wrong as above
end.

It looks to me as if the cp1252 strings (both const and var) are converted from a single UTF-16 char by turning its 2 bytes into 2 chars: the first char is the letter, and the second is the UTF-16 high byte (0), which ends up as '?' (#63).
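
One way to check this guess is to dump the raw bytes and the dynamic code page of both strings. The following is only a minimal sketch, not a verified reproduction: the program name and the DumpBytes helper are made up for illustration, and it assumes the source file is saved as UTF-8 and the compiler is a trunk version that supports AnsiString(1252). StringCodePage, HexStr and RawByteString come from the System unit.

```pascal
program DumpEncoding; // hypothetical name, for illustration only
{$mode objfpc}{$H+}
{$codepage UTF8}      // assumes this file is saved as UTF-8

type
  WinAnsiString = type AnsiString(1252);

// Hypothetical helper: prints the dynamic code page, the length in bytes
// and each byte in hex. The RawByteString parameter accepts any AnsiString
// type without converting its contents.
procedure DumpBytes(const Name: string; const S: RawByteString);
var
  i: Integer;
begin
  Write(Name, ': cp=', StringCodePage(S), ' len=', Length(S), ' bytes=');
  for i := 1 to Length(S) do
    Write(HexStr(Ord(S[i]), 2), ' ');
  WriteLn;
end;

var
  strA: AnsiString;
  strW: WinAnsiString;
begin
  strA := 'ä';
  strW := 'ä';
  DumpBytes('strA', strA);
  DumpBytes('strW', strW);
end.
```

If the guess is right, strW should show a length of 2 with a second byte of 3F ('?'), while a correct cp1252 conversion would give a single byte E4.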

Longer literals, like 'äöü', are converted properly, but to encoding UTF-8 for AnsiString and encoding 1252 for WinAnsiString.

Should I submit a bug report?

DoDi

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
