Re: [fpc-pascal] Read lines into UnicodeString variable from UCS2 (UTF-16) encoded text file

2019-09-04, Tomas Hajny

On 2019-09-04 13:39, LacaK wrote:

>> You may be able to improve on this using system.BlockRead.
> Probably yes, but then I must read into a local buffer and scan the buffer
> for CR/LF.
>
> And return from my function UCS2ReadLn() only the portion of the string up
> to CR/LF, and return the rest of the string on the next call to my function
> (so I must keep the unprocessed part in a global buffer).
>
>> Also, you are assuming low-order byte first, which may not be portable.
>
> Yes, in my case LE is sufficient as long as I check for the presence of the
> BOM $FF$FE.


Just as a comment: a contribution allowing ReadLn to read UTF-16 files
(preferably complete from a functional point of view, in particular without
shortcuts like handling only UCS-2 instead of complete Unicode) would
obviously be welcome.
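The UCS-2 vs. complete Unicode distinction matters because UTF-16 stores code points above U+FFFF as surrogate pairs: a high surrogate ($D800..$DBFF) followed by a low surrogate ($DC00..$DFFF). The sketch below illustrates this; the function names CodePointCount and CombineSurrogates are made up for the example and are not RTL routines. Because CR ($000D) and LF ($000A) lie outside the surrogate range, splitting lines on those code units never cuts a pair in half, as long as the code units are kept together in a UnicodeString.

```pascal
{$mode objfpc}{$H+}
uses
  SysUtils;  { IntToHex }

{ Hypothetical helper: counts code points in a UTF-16 string, treating a
  valid high/low surrogate pair as one code point. An unpaired surrogate
  is counted on its own. }
function CodePointCount(const S: UnicodeString): Integer;
var
  I: Integer;
  W: Word;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    W := Ord(S[I]);
    if (W >= $D800) and (W <= $DBFF) and (I < Length(S))
       and (Ord(S[I + 1]) >= $DC00) and (Ord(S[I + 1]) <= $DFFF) then
      Inc(I, 2)    { high + low surrogate: one code point above U+FFFF }
    else
      Inc(I);      { BMP character, or an unpaired surrogate }
    Inc(Result);
  end;
end;

{ Hypothetical helper: combines a surrogate pair into its code point. }
function CombineSurrogates(HighSur, LowSur: WideChar): LongInt;
begin
  Result := $10000 + ((Ord(HighSur) - $D800) shl 10) + (Ord(LowSur) - $DC00);
end;

var
  S: UnicodeString;
begin
  { 'A' followed by U+1F600, which UTF-16 stores as $D83D $DE00 }
  S := UnicodeString('A') + WideChar($D83D) + WideChar($DE00);
  WriteLn('code units:  ', Length(S));                         { 3 }
  WriteLn('code points: ', CodePointCount(S));                 { 2 }
  WriteLn('U+', IntToHex(CombineSurrogates(S[2], S[3]), 5));   { U+1F600 }
end.
```

A UCS-2-only reader would see three independent characters here; a complete UTF-16 reader has to keep the pair intact.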


Tomas
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal


Re: [fpc-pascal] Read lines into UnicodeString variable from UCS2 (UTF-16) encoded text file

2019-09-04, LacaK



> You may be able to improve on this using system.BlockRead.
Probably yes, but then I must read into a local buffer and scan the buffer
for CR/LF.

And return from my function UCS2ReadLn() only the portion of the string up
to CR/LF, and return the rest of the string on the next call to my function
(so I must keep the unprocessed part in a global buffer).
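The BlockRead plan described here (read a block, scan it for CR/LF, keep the unprocessed rest for the next call) might look roughly like the following. This is a minimal sketch, not code from the thread. It assumes UTF-16LE input, and all names (TUTF16Reader, OpenUTF16, NextUnit, ReadUTF16Line, WriteUTF16LE) are hypothetical. Instead of a global buffer, the leftover bytes live in a record passed to every call. Each code unit is assembled low byte first with shifts, so the result does not depend on the host's byte order.

```pascal
{$mode objfpc}{$H+}

type
  { Hypothetical reader state, not part of the RTL. }
  TUTF16Reader = record
    F: File;                         { untyped file opened with record size 1 }
    Buf: array[0..4095] of Byte;     { raw bytes filled by BlockRead }
    BufLen, BufPos: Integer;         { number of valid bytes / next unread byte }
    AtStart: Boolean;                { True until the first code unit is read }
  end;

procedure OpenUTF16(var R: TUTF16Reader; const FileName: string);
begin
  AssignFile(R.F, FileName);
  Reset(R.F, 1);                     { record size 1: BlockRead counts bytes }
  R.BufLen := 0;
  R.BufPos := 0;
  R.AtStart := True;
end;

{ Returns the next code unit, refilling the buffer when fewer than two
  unread bytes remain. Bytes are combined explicitly, low byte first. }
function NextUnit(var R: TUTF16Reader; out W: Word): Boolean;
var
  Rest: Integer;
  Got: LongInt;
begin
  if R.BufPos + 1 >= R.BufLen then
  begin
    Rest := R.BufLen - R.BufPos;     { 0 or 1 unprocessed byte }
    if Rest = 1 then
      R.Buf[0] := R.Buf[R.BufPos];   { carry it over into the refilled buffer }
    BlockRead(R.F, R.Buf[Rest], SizeOf(R.Buf) - Rest, Got);
    R.BufLen := Rest + Got;
    R.BufPos := 0;
    if R.BufLen < 2 then
      Exit(False);                   { end of file; an odd final byte is ignored }
  end;
  W := R.Buf[R.BufPos] or (Word(R.Buf[R.BufPos + 1]) shl 8);
  Inc(R.BufPos, 2);
  Result := True;
end;

{ Reads one line into S. CRLF, LF and a lone CR all end a line, empty lines
  come back as '', and a BOM at the very start of the file is skipped.
  Returns False once the file is exhausted and nothing was read. }
function ReadUTF16Line(var R: TUTF16Reader; out S: UnicodeString): Boolean;
var
  W: Word;
  Len: Integer;
begin
  S := '';
  Len := 0;
  Result := False;
  while NextUnit(R, W) do
  begin
    if R.AtStart then
    begin
      R.AtStart := False;
      if W = $FEFF then
        Continue;                    { skip the byte order mark }
    end;
    Result := True;
    case W of
      $000A:
        Break;                       { LF ends the line }
      $000D:
        begin                        { CR ends the line; swallow a following LF }
          if NextUnit(R, W) and (W <> $000A) then
            Dec(R.BufPos, 2);        { not an LF: push that unit back }
          Break;
        end;
    else
      begin
        if Len = Length(S) then
          SetLength(S, 2 * Len + 16);  { grow geometrically, not per character }
        Inc(Len);
        S[Len] := WideChar(W);
      end;
    end;
  end;
  SetLength(S, Len);
end;

{ Demo helper: writes S to a file as UTF-16LE, low byte first. }
procedure WriteUTF16LE(const FileName: string; const S: UnicodeString);
var
  F: File;
  I: Integer;
  B: array[0..1] of Byte;
begin
  AssignFile(F, FileName);
  Rewrite(F, 1);
  for I := 1 to Length(S) do
  begin
    B[0] := Ord(S[I]) and $FF;
    B[1] := Ord(S[I]) shr 8;
    BlockWrite(F, B, 2);
  end;
  CloseFile(F);
end;

var
  R: TUTF16Reader;
  Line: UnicodeString;
begin
  { Sample: BOM, a CRLF line, an empty line, an LF line, no final newline }
  WriteUTF16LE('sample16.txt',
    WideChar($FEFF) + UnicodeString('Line1'#13#10#13#10'Line2'#10'Line3'));
  OpenUTF16(R, 'sample16.txt');
  while ReadUTF16Line(R, Line) do
    WriteLn('[', Line, ']');         { [Line1], [], [Line2], [Line3], one per line }
  CloseFile(R.F);
end.
```

Surrogate pairs pass through unchanged, since the reader only reacts to the code units $000A, $000D and a leading $FEFF.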




> Also, you are assuming low-order byte first, which may not be portable.


Yes, in my case LE is sufficient as long as I check for the presence of the BOM $FF$FE.

L.





Re: [fpc-pascal] Read lines into UnicodeString variable from UCS2 (UTF-16) encoded text file

2019-09-04, Tony Whyman

You may be able to improve on this using system.BlockRead.

Also, you are assuming low-order byte first, which may not be portable.
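The portability point can be illustrated with a small sketch (UnitFromLE and UnitFromBE are made-up helper names). Building each code unit from its two bytes with shifts gives the same value on any CPU, whereas the variant-record overlay and a `file of WideChar` both use the host's byte order. The LF written as $0A00 elsewhere in the thread is the byte pair 0A 00, i.e. code unit $000A when read low byte first. When a Word has already been read raw, FPC's LEtoN from the System unit converts it from little-endian to host order.

```pascal
{$mode objfpc}{$H+}

{ Hypothetical helpers: build a UTF-16 code unit from two bytes in file
  order, independent of the byte order of the machine running the code. }
function UnitFromLE(B0, B1: Byte): WideChar;
begin
  Result := WideChar(B0 or (Word(B1) shl 8));   { low byte stored first }
end;

function UnitFromBE(B0, B1: Byte): WideChar;
begin
  Result := WideChar((Word(B0) shl 8) or B1);   { high byte stored first }
end;

const
  Raw: array[0..1] of Byte = ($0A, $00);        { LF as stored in UTF-16LE }
var
  W: Word;
begin
  WriteLn(Ord(UnitFromLE(Raw[0], Raw[1])));     { 10 }
  WriteLn(Ord(UnitFromBE(Raw[0], Raw[1])));     { 2560, i.e. $0A00 }
  { Alternative: copy the raw bytes into a Word and let LEtoN fix the order;
    it is a no-op on little-endian CPUs and swaps on big-endian ones. }
  Move(Raw[0], W, SizeOf(W));
  WriteLn(LEtoN(W));                            { 10 }
end.
```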



Re: [fpc-pascal] Read lines into UnicodeString variable from UCS2 (UTF-16) encoded text file

2019-09-04, LacaK

Nice! Thank you very much.

As an alternative for F:TextFile I am using:

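{ s is a UnicodeString: appending c.w to an AnsiString would convert each
  character to the ANSI code page and could lose characters. }
procedure UCS2ReadLn(var F: TextFile; out s: UnicodeString);
var
  { Two raw bytes overlaid on one WideChar: this only decodes UTF-16LE
    correctly when the host CPU is little-endian as well. }
  c: record
    case Boolean of
      False: (a: array[0..1] of AnsiChar);
      True:  (w: WideChar);
  end;
begin
  s := '';
  while not Eof(F) do
  begin
    System.Read(F, c.a[0]);
    System.Read(F, c.a[1]);
    { Explicit comparisons instead of "c.w in [#10, #13]": Pascal sets only
      cover ordinals 0..255, so a set test is not a safe fit for a WideChar. }
    if (c.w = #10) or (c.w = #13) then
    begin
      if s <> '' then
        Break;  { end of line }
      { s is still empty: a CR/LF left over from the previous line is
        skipped, and so are genuinely empty lines. }
    end
    else
      s := s + c.w;  { note: a leading BOM ($FEFF) ends up in the first line }
  end;
end;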

which also works for me, but I would like to have a better solution. I
will try LoadFromFile with TEncoding once FPC 3.2 is out.


-L.




Re: [fpc-pascal] Read lines into UnicodeString variable from UCS2 (UTF-16) encoded text file

2019-09-04, LacaK

>> is there any smart way to read string data line by line from UCS2
>> encoded text files (lines delimited by $0A00)?
>
> So, some LoadFromFile with a stream is no option for you?

It should be an option, but AFAIK LoadFromFile with an optional TEncoding
is not part of FPC 3.0.4.

It is only in the upcoming 3.2.0.
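With FPC 3.2 or later, the LoadFromFile route might look like the sketch below. It assumes the TStrings.LoadFromFile(FileName, Encoding) overload and TEncoding.Unicode (UTF-16LE) that FPC 3.2 provides, mirroring the Delphi approach on the Embarcadero page. The file name demo16.txt is made up, and the demo file is written in host byte order, so it is only UTF-16LE on a little-endian machine. As far as I know, TStringList holds AnsiString lines in this mode, so the decoded text is converted to the default system code page; characters that code page cannot represent may be lost unless it is UTF-8, as is usual in Lazarus programs.

```pascal
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

var
  FS: TFileStream;
  U: UnicodeString;
  SL: TStringList;
  I: Integer;
begin
  { Demo input: BOM + two CRLF-terminated lines. WriteBuffer stores the
    code units in host byte order, so this yields UTF-16LE only on a
    little-endian machine such as x86 (assumption of this sketch). }
  U := WideChar($FEFF) + UnicodeString('Line1'#13#10'Line2'#13#10);
  FS := TFileStream.Create('demo16.txt', fmCreate);
  try
    FS.WriteBuffer(U[1], Length(U) * SizeOf(WideChar));
  finally
    FS.Free;
  end;

  SL := TStringList.Create;
  try
    { FPC 3.2+: decode the file explicitly as UTF-16LE. TEncoding.Unicode
      is a shared instance and must not be freed. }
    SL.LoadFromFile('demo16.txt', TEncoding.Unicode);
    for I := 0 to SL.Count - 1 do
      WriteLn('US = ', SL[I]);
  finally
    SL.Free;
  end;
end.
```

The whole file is loaded into memory, which is the trade-off compared with reading line by line.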





>> I wonder if Delphi supports ReadLn() for UTF-16 encoded text files...?
>
> From what I gather from the Embarcadero wiki and Google searches it does not.
> I only have D7 so I cannot test that myself, though.
>
> Seems you need to use LoadFromFile with a TEncoding specified, see:
> http://docwiki.embarcadero.com/RADStudio/Tokyo/en/Using_TEncoding_for_Unicode_Files

Yes, that was my impression too... I was wondering if there is another way
(ideally using ReadLn(), so I can open a TextFile, read the first 2-3 bytes
(BOM) to detect which encoding the file has (UTF-8 or UTF-16), and then use
ReadLn with either an AnsiString (UTF-8 case) or a UnicodeString (UTF-16
case)).
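The BOM check described here could be sketched as follows. TBOMKind, DetectBOM and DetectFileBOM are hypothetical names, and only the three BOMs relevant to this thread (UTF-8 EF BB BF, UTF-16LE FF FE, UTF-16BE FE FF) are recognised; a UTF-32LE BOM (FF FE 00 00) would be misreported as UTF-16LE.

```pascal
{$mode objfpc}{$H+}

type
  TBOMKind = (bomNone, bomUTF8, bomUTF16LE, bomUTF16BE);

const
  BOMName: array[TBOMKind] of string = ('none', 'UTF-8', 'UTF-16LE', 'UTF-16BE');

{ Hypothetical helper: classifies the first bytes of a file by its BOM. }
function DetectBOM(const B: array of Byte): TBOMKind;
begin
  if (Length(B) >= 3) and (B[0] = $EF) and (B[1] = $BB) and (B[2] = $BF) then
    Result := bomUTF8
  else if (Length(B) >= 2) and (B[0] = $FF) and (B[1] = $FE) then
    Result := bomUTF16LE
  else if (Length(B) >= 2) and (B[0] = $FE) and (B[1] = $FF) then
    Result := bomUTF16BE
  else
    Result := bomNone;
end;

{ Reads at most the first 3 bytes of a file and classifies them. Missing
  bytes stay zero, and a zero byte cannot match any of the BOMs tested. }
function DetectFileBOM(const FileName: string): TBOMKind;
var
  F: File;
  Head: array[0..2] of Byte;
  Got: LongInt;
begin
  FillChar(Head, SizeOf(Head), 0);
  AssignFile(F, FileName);
  Reset(F, 1);
  try
    BlockRead(F, Head, SizeOf(Head), Got);   { Got < 3 for very short files }
  finally
    CloseFile(F);
  end;
  Result := DetectBOM(Head);
end;

const
  Sample: array[0..3] of Byte = ($FF, $FE, $4C, $00);  { BOM + 'L' in UTF-16LE }
begin
  WriteLn(BOMName[DetectBOM(Sample)]);                  { UTF-16LE }
  if ParamCount > 0 then
    WriteLn(ParamStr(1), ': ', BOMName[DetectFileBOM(ParamStr(1))]);
end.
```

In the UTF-8 branch the three BOM bytes would still have to be consumed before the first ReadLn (for example with three Read calls into a Char variable); otherwise they end up at the start of the first line.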


L.





Re: [fpc-pascal] Read lines into UnicodeString variable from UCS2 (UTF-16) encoded text file

2019-09-04, Bart

Stupid and lazy workaround, probably not suitable for larger files.

{$mode objfpc}
{$h+}
uses
  sysutils;

type
  { Reads the file as 2-byte WideChars in host byte order, so on a
    little-endian CPU this matches UTF-16LE (Notepad's "Unicode"). }
  TUCS2TextFile = file of WideChar;

procedure ReadLine(var F: TUCS2TextFile; out S: UnicodeString);
var
  WC: WideChar;
begin
  // Assume file is opened for read
  S := '';
  while not Eof(F) do
  begin
    Read(F, WC);
    if WC = WideChar(#$000A) then
      exit  // LF ends the line
    else
      if (WC <> WideChar(#$000D)) and (WC <> WideChar(#$FEFF {Unicode LE BOM})) then
        S := S + WC;  // drop CR and the BOM, keep everything else
  end;
end;

var
  UFile: TUCS2TextFile;
  US: UnicodeString;
begin
  AssignFile(UFile, 'ucs2.txt');
  Reset(UFile);
  while not Eof(UFile) do
  begin
    ReadLine(UFile, US);
    writeln('US = ', US);
  end;
  CloseFile(UFile);
end.

Output:
US = Line1
US = Line2
US = Line3
which is correct for my test file (Unicode LE encoding created with Notepad).

--
Bart


Re: [fpc-pascal] Read lines into UnicodeString variable from UCS2 (UTF-16) encoded text file

2019-09-04, Bart
On Wed, Sep 4, 2019 at 7:46 AM LacaK wrote:

> is there any smart way to read string data line by line from UCS2
> encoded text files (lines delimited by $0A00)?

So, some LoadFromFile with a stream is no option for you?

> I wonder if Delphi supports ReadLn() for UTF-16 encoded text files ...?

From what I gather from the Embarcadero wiki and Google searches it does not.
I only have D7 so I cannot test that myself, though.

Seems you need to use LoadFromFile with a TEncoding specified, see:
http://docwiki.embarcadero.com/RADStudio/Tokyo/en/Using_TEncoding_for_Unicode_Files

Bart
-- 
Bart