Re: [fpc-pascal] Read lines into UnicodeString variable from UCS2 (UTF-16) encoded text file
On 2019-09-04 13:39, LacaK wrote:

>> You may be able to improve on this using system.BlockRead.
>
> Probably yes, but then I must read into a local buffer and examine the
> buffer for CR/LF, return from my function UCS2ReadLn() only the portion
> of the string up to CR/LF, and return the rest of the string on the next
> call to my function (so I must keep the unprocessed part in a global
> buffer).
>
>> Also, you are assuming low order byte first which may not be portable.
>
> Yes. In my case LE is sufficient, as long as I check for the presence of
> the BOM $FF$FE.

Just as a comment: a contribution allowing ReadLn to read UTF-16 files (preferably complete from a functional point of view, especially without shortcuts like handling only UCS2 instead of complete Unicode) would obviously be welcome.

Tomas
___
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] Read lines into UnicodeString variable from UCS2 (UTF-16) encoded text file
> You may be able to improve on this using system.BlockRead.

Probably yes, but then I must read into a local buffer and examine the buffer for CR/LF, return from my function UCS2ReadLn() only the portion of the string up to CR/LF, and return the rest of the string on the next call to my function (so I must keep the unprocessed part in a global buffer).

> Also, you are assuming low order byte first which may not be portable.

Yes. In my case LE is sufficient, as long as I check for the presence of the BOM $FF$FE.

L.
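The buffering scheme described above (read a block, scan it for the line end, hand back the part before it, keep the rest for the next call) could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: the record TUtf16LineReader and the routines OpenReader, FillPending, FindLF and ReadLine are hypothetical names, the unprocessed part is kept in the record rather than in a global buffer, and a file without a BOM is assumed to be little-endian. Because whole 16-bit code units are copied unchanged, surrogate pairs pass through intact.

```pascal
{$mode objfpc}{$h+}
program BufferedUtf16Lines;

type
  // Hypothetical helper record, not part of the RTL.
  TUtf16LineReader = record
    F: file;                 // untyped file, opened with record size 1
    Pending: UnicodeString;  // code units read but not yet returned
    BigEndian: Boolean;      // True when the file starts with $FE $FF
    AtEof: Boolean;          // True once BlockRead came back short
  end;

procedure OpenReader(out R: TUtf16LineReader; const FileName: string);
var
  Bom: array[0..1] of Byte;
  Got: LongInt;
begin
  Assign(R.F, FileName);
  Reset(R.F, 1);
  R.Pending := '';
  R.AtEof := False;
  R.BigEndian := False;      // assumption: no BOM means little-endian
  Got := 0;
  BlockRead(R.F, Bom, 2, Got);
  if (Got = 2) and (Bom[0] = $FE) and (Bom[1] = $FF) then
    R.BigEndian := True
  else if not ((Got = 2) and (Bom[0] = $FF) and (Bom[1] = $FE)) then
    Seek(R.F, 0);            // no BOM: rewind so no data is skipped
end;

// Read one block and append it to Pending as UTF-16 code units.
procedure FillPending(var R: TUtf16LineReader);
var
  Buf: array[0..4095] of Byte;
  Got, I, Base: LongInt;
begin
  Got := 0;
  BlockRead(R.F, Buf, SizeOf(Buf), Got);
  if Got < SizeOf(Buf) then
    R.AtEof := True;
  Base := Length(R.Pending);
  SetLength(R.Pending, Base + Got div 2);  // a trailing odd byte is dropped
  for I := 0 to Got div 2 - 1 do
    if R.BigEndian then
      R.Pending[Base + I + 1] :=
        WideChar(Word((Word(Buf[2 * I]) shl 8) or Buf[2 * I + 1]))
    else
      R.Pending[Base + I + 1] :=
        WideChar(Word((Word(Buf[2 * I + 1]) shl 8) or Buf[2 * I]));
end;

// Index of the first LF in S, or 0 when there is none.
function FindLF(const S: UnicodeString): SizeInt;
var
  I: SizeInt;
begin
  for I := 1 to Length(S) do
    if S[I] = WideChar($000A) then
      Exit(I);
  Result := 0;
end;

// Returns False when no further line is available.
function ReadLine(var R: TUtf16LineReader; out S: UnicodeString): Boolean;
var
  P: SizeInt;
begin
  P := FindLF(R.Pending);
  while (P = 0) and not R.AtEof do
  begin
    FillPending(R);
    P := FindLF(R.Pending);
  end;
  if P > 0 then
  begin
    S := Copy(R.Pending, 1, P - 1);
    Delete(R.Pending, 1, P);   // keep the rest for the next call
  end
  else
  begin
    S := R.Pending;            // last line without a trailing LF
    R.Pending := '';
  end;
  if (Length(S) > 0) and (S[Length(S)] = WideChar($000D)) then
    SetLength(S, Length(S) - 1);  // strip the CR of a CR/LF pair
  Result := (P > 0) or (Length(S) > 0);
end;

const
  // BOM, 'A', CR, LF, 'B' as UTF-16LE bytes (demo data only)
  Demo: array[0..9] of Byte = ($FF, $FE, $41, $00, $0D, $00, $0A, $00, $42, $00);
var
  Tmp: file;
  Reader: TUtf16LineReader;
  Line: UnicodeString;
begin
  Assign(Tmp, 'utf16-demo.txt');
  Rewrite(Tmp, 1);
  BlockWrite(Tmp, Demo, SizeOf(Demo));
  Close(Tmp);

  OpenReader(Reader, 'utf16-demo.txt');
  while ReadLine(Reader, Line) do
    WriteLn('US = ', Line);
  Close(Reader.F);
end.
```

The sketch rescans the pending buffer from the start after every refill, which is simple but wasteful for very long lines; remembering the scan position would avoid that.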
Re: [fpc-pascal] Read lines into UnicodeString variable from UCS2 (UTF-16) encoded text file
You may be able to improve on this using system.BlockRead.

Also, you are assuming low order byte first which may not be portable.

On 04/09/2019 11:14, LacaK wrote:
> Nice! Thank you very much.
>
> As an alternative for F:TextFile I am using:
>
> procedure UCS2ReadLn(var F: TextFile; out s: String);
> var
>   c: record
>     case boolean of
>       false: (a: array[0..1] of AnsiChar);
>       true : (w: WideChar);
>   end;
> begin
>   s:='';
>   while not Eof(F) do
>   begin
>     System.Read(F,c.a[0]);
>     System.Read(F,c.a[1]);
>     if c.w in [#10,#13] then
>       if s = '' then {begin of line} else break {end of line}
>     else
>       s := s + c.w;
>   end;
> end;
>
> which also works for me, but I would like to have a better solution.
> I will try LoadFromFile with TEncoding once FPC 3.2 is out.
>
> -L.
Re: [fpc-pascal] Read lines into UnicodeString variable from UCS2 (UTF-16) encoded text file
Nice! Thank you very much.

As an alternative for F:TextFile I am using:

procedure UCS2ReadLn(var F: TextFile; out s: String);
var
  c: record
    case boolean of
      false: (a: array[0..1] of AnsiChar);
      true : (w: WideChar);
  end;
begin
  s:='';
  while not Eof(F) do
  begin
    System.Read(F,c.a[0]);
    System.Read(F,c.a[1]);
    if c.w in [#10,#13] then
      if s = '' then {begin of line} else break {end of line}
    else
      s := s + c.w;
  end;
end;

which also works for me, but I would like to have a better solution.
I will try LoadFromFile with TEncoding once FPC 3.2 is out.

-L.

> Stupid and lazy workaround, probably not suitable for larger files.
>
> {$mode objfpc}
> {$h+}
> uses
>   sysutils;
>
> type
>   TUCS2TextFile = file of WideChar;
>
> procedure ReadLine(var F: TUCS2TextFile; out S: UnicodeString);
> var
>   WC: WideChar;
> begin
>   //Assume file is opened for read
>   S := '';
>   while not Eof(F) do
>   begin
>     Read(F, WC);
>     if WC = WideChar(#$000A) then
>       exit
>     else
>       if (WC <> WideChar(#$000D)) and (WC<>WideChar(#$FEFF {Unicode LE BOM})) then S := S + WC;
>   end;
> end;
>
> var
>   UFile: TUCS2TextFile;
>   US: UnicodeString;
> begin
>   AssignFile(UFile, 'ucs2.txt');
>   Reset(Ufile);
>   while not Eof(UFile) do
>   begin
>     ReadLine(UFile, US);
>     writeln('US = ',US);
>   end;
>   CloseFile(UFile);
> end.
>
> Outputs
> US = Line1
> US = Line2
> US = Line3
>
> which is correct for my test file (Unicode LE encoding created with Notepad).
>
> --
> Bart
Re: [fpc-pascal] Read lines into UnicodeString variable from UCS2 (UTF-16) encoded text file
>> is there any smart way how to read string data line by line from UCS2
>> encoded text files (lines delimited by $0A00).
>
> So, some LoadFromFile with a stream is no option for you?

It should be an option, but AFAIK LoadFromFile with an optional TEncoding is not part of FPC 3.0.4. It is only in the upcoming 3.2.0 ...

>> I wonder if Delphi supports ReadLn() for UTF-16 encoded text files ...?
>
> From what I gather from the Embarcadero wiki and Google searches, it does not.
> I only have D7, so I cannot test that myself though.
> Seems you need to use LoadFromFile with a TEncoding specified, see:
> http://docwiki.embarcadero.com/RADStudio/Tokyo/en/Using_TEncoding_for_Unicode_Files

Yes, that was my impression also ...
I was wondering if there is another way. Ideally using ReadLn(), so that I can open a TextFile, read the first 2-3 bytes (the BOM) to detect which encoding the file has (UTF-8 or UTF-16), and then use ReadLn with either an AnsiString (UTF-8 case) or a UnicodeString (UTF-16 case).

L.

> Bart
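The BOM check mentioned above (read the first 2-3 bytes, then decide between UTF-8 and UTF-16) might look something like the following. It is a minimal sketch, not code from the thread: TBomKind and DetectBomKind are hypothetical names, the file is opened as an untyped file rather than a TextFile so the raw bytes can be inspected, and a file without a BOM is simply reported as such instead of being guessed at.

```pascal
{$mode objfpc}{$h+}
program DetectBom;

type
  // Hypothetical result type for this sketch.
  TBomKind = (bkNone, bkUtf8, bkUtf16LE, bkUtf16BE);

const
  BomNames: array[TBomKind] of string =
    ('no BOM', 'UTF-8', 'UTF-16 LE', 'UTF-16 BE');

// Inspect up to the first three bytes of the file for a byte order mark.
function DetectBomKind(const FileName: string): TBomKind;
var
  F: file;
  B: array[0..2] of Byte;
  Got: LongInt;
begin
  Result := bkNone;
  Assign(F, FileName);
  Reset(F, 1);                       // record size 1: count in bytes
  Got := 0;
  BlockRead(F, B, SizeOf(B), Got);   // Got may be < 3 for tiny files
  Close(F);
  if (Got >= 3) and (B[0] = $EF) and (B[1] = $BB) and (B[2] = $BF) then
    Result := bkUtf8
  else if (Got >= 2) and (B[0] = $FF) and (B[1] = $FE) then
    Result := bkUtf16LE
  else if (Got >= 2) and (B[0] = $FE) and (B[1] = $FF) then
    Result := bkUtf16BE;
end;

begin
  if ParamCount < 1 then
    WriteLn('usage: detectbom <file>')
  else
    WriteLn(ParamStr(1), ': ', BomNames[DetectBomKind(ParamStr(1))]);
end.
```

Depending on the result, the caller could then reopen the file as a TextFile and use ReadLn for the UTF-8 case, or fall back to one of the UTF-16 readers from this thread.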
Re: [fpc-pascal] Read lines into UnicodeString variable from UCS2 (UTF-16) encoded text file
Stupid and lazy workaround, probably not suitable for larger files.

{$mode objfpc}
{$h+}
uses
  sysutils;

type
  TUCS2TextFile = file of WideChar;

procedure ReadLine(var F: TUCS2TextFile; out S: UnicodeString);
var
  WC: WideChar;
begin
  //Assume file is opened for read
  S := '';
  while not Eof(F) do
  begin
    Read(F, WC);
    if WC = WideChar(#$000A) then
      exit
    else
      if (WC <> WideChar(#$000D)) and (WC<>WideChar(#$FEFF {Unicode LE BOM})) then S := S + WC;
  end;
end;

var
  UFile: TUCS2TextFile;
  US: UnicodeString;
begin
  AssignFile(UFile, 'ucs2.txt');
  Reset(Ufile);
  while not Eof(UFile) do
  begin
    ReadLine(UFile, US);
    writeln('US = ',US);
  end;
  CloseFile(UFile);
end.

Outputs
US = Line1
US = Line2
US = Line3

which is correct for my test file (Unicode LE encoding created with Notepad).

--
Bart
Re: [fpc-pascal] Read lines into UnicodeString variable from UCS2 (UTF-16) encoded text file
On Wed, Sep 4, 2019 at 7:46 AM LacaK wrote:

> is there any smart way how to read string data line by line from UCS2
> encoded text files (lines delimited by $0A00).

So, some LoadFromFile with a stream is no option for you?

> I wonder if Delphi supports ReadLn() for UTF-16 encoded text files ...?

From what I gather from the Embarcadero wiki and Google searches, it does not.
I only have D7, so I cannot test that myself though.
Seems you need to use LoadFromFile with a TEncoding specified, see:
http://docwiki.embarcadero.com/RADStudio/Tokyo/en/Using_TEncoding_for_Unicode_Files

--
Bart
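For reference, the LoadFromFile route could look roughly like the sketch below. It assumes an FPC version whose TStrings.LoadFromFile accepts a TEncoding argument (3.2.0 or later according to this thread) and reuses the file name 'ucs2.txt' from the test program elsewhere in the thread. Note that TStringList stores its lines as AnsiString, so the UTF-16 text is converted on load; the cwstring unit is included on Unix on the assumption that a widestring manager is needed for that conversion to handle non-ASCII characters.

```pascal
{$mode objfpc}{$h+}
program LoadUtf16;

uses
  {$ifdef unix}cwstring,{$endif}  // assumption: needed for conversions on Unix
  SysUtils, Classes;

var
  Lines: TStringList;
  I: Integer;
begin
  Lines := TStringList.Create;
  try
    // TEncoding.Unicode is UTF-16 little-endian.
    Lines.LoadFromFile('ucs2.txt', TEncoding.Unicode);
    for I := 0 to Lines.Count - 1 do
      WriteLn('US = ', Lines[I]);
  finally
    Lines.Free;
  end;
end.
```

This loads the whole file into memory at once, so it trades the line-by-line behaviour of ReadLn for simplicity.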