Re: How to know whether a file's encoding is ansi or utf8?

2014-07-24 Thread Kagamin via Digitalmars-d-learn
I first try to load the file as UTF-8 (or the first 8 KB or so of 
it) with encoding exceptions turned on. If I catch an exception, 
I reload it as ANSI; otherwise I assume it's valid UTF-8.
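
A minimal sketch of that approach, not a definitive implementation. The helper names isValidUtf8, dropPartialTail and looksLikeUtf8 are made up for illustration; std.file.read and std.utf.validate are the Phobos calls doing the actual work. Since only a prefix of the file is checked, a multi-byte sequence cut off at the 8 KB boundary is trimmed before validating, otherwise it would raise a false alarm.

```d
import std.file : read;
import std.utf : validate, UTFException;

/// True if `data` is well-formed UTF-8 (hypothetical helper).
bool isValidUtf8(const(ubyte)[] data)
{
    try
    {
        validate(cast(const(char)[]) data); // throws UTFException on bad input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

/// Drops a multi-byte sequence that was cut off at the end of a prefix.
const(ubyte)[] dropPartialTail(const(ubyte)[] data)
{
    // Walk back over at most 3 bytes looking for a lead byte.
    for (size_t back = 1; back <= 3 && back <= data.length; ++back)
    {
        immutable ubyte b = data[$ - back];
        if (b < 0x80)
            break; // ASCII byte: nothing is cut off
        if (b >= 0xC0)
        {
            // Lead byte: length the sequence should have.
            immutable size_t need = b >= 0xF0 ? 4 : (b >= 0xE0 ? 3 : 2);
            return back < need ? data[0 .. $ - back] : data;
        }
        // 0x80 .. 0xBF is a continuation byte: keep walking back.
    }
    return data;
}

/// Validates only the first `limit` bytes of the file.
bool looksLikeUtf8(string fileName, size_t limit = 8 * 1024)
{
    auto head = cast(const(ubyte)[]) read(fileName, limit);
    // Assumption: a full-length read means the file may continue past `limit`.
    if (head.length == limit)
        head = dropPartialTail(head);
    return isValidUtf8(head);
}

void main(string[] args)
{
    import std.stdio : writeln;

    if (args.length < 2)
    {
        writeln("usage: pass a file name as the first argument");
        return;
    }
    writeln(looksLikeUtf8(args[1]) ? "assume UTF-8" : "fall back to ANSI");
}
```

Pure ASCII text passes the check too, which is harmless because ASCII decodes the same way under UTF-8 and the usual ANSI code pages.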


Re: How to know whether a file's encoding is ansi or utf8?

2014-07-22 Thread Sam Hu via Digitalmars-d-learn

On Tuesday, 22 July 2014 at 09:50:00 UTC, Sam Hu wrote:

Greetings!

As the subject says, how can I know whether a file is encoded in 
UTF-8 or ANSI?


Thanks for the help in advance.

Regards,
Sam


Sorry, I mean by code. For example, when I read a file's content 
and print it to a text control in a GUI, or to the console, I need 
to proceed differently depending on the file's encoding.


Re: How to know whether a file's encoding is ansi or utf8?

2014-07-22 Thread Alexandre via Digitalmars-d-learn

Read the BOM ?

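module main;

import std.stdio;

enum Encoding
{
    UTF7,
    UTF8,
    UTF32,
    Unicode,          // UTF-16LE
    BigEndianUnicode, // UTF-16BE
    ASCII             // returned when no BOM is recognized
}

Encoding GetFileEncoding(string fileName)
{
    import std.file : read;

    // Read at most the first 4 bytes. A shorter file yields a shorter
    // array, so every check tests the length before indexing.
    auto bom = cast(ubyte[]) read(fileName, 4);

    if (bom.length >= 3 && bom[0] == 0x2b && bom[1] == 0x2f && bom[2] == 0x76)
        return Encoding.UTF7;
    if (bom.length >= 3 && bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf)
        return Encoding.UTF8;
    if (bom.length >= 2 && bom[0] == 0xff && bom[1] == 0xfe)
        return Encoding.Unicode; // UTF-16LE
    if (bom.length >= 2 && bom[0] == 0xfe && bom[1] == 0xff)
        return Encoding.BigEndianUnicode; // UTF-16BE
    if (bom.length >= 4 && bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe
        && bom[3] == 0xff)
        return Encoding.UTF32; // UTF-32BE

    // No BOM found; this alone does not prove the file is ASCII.
    return Encoding.ASCII;
}

void main(string[] args)
{
    if (GetFileEncoding("test.txt") == Encoding.UTF8)
        writeln("The file is UTF8");
    else
        writeln("File is not UTF8 :(");
}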




Re: How to know whether a file's encoding is ansi or utf8?

2014-07-22 Thread Sam Hu via Digitalmars-d-learn

On Tuesday, 22 July 2014 at 11:59:34 UTC, Alexandre wrote:

Read the BOM ?


Thanks. This is exactly what I want at this moment.


Re: How to know whether a file's encoding is ansi or utf8?

2014-07-22 Thread FreeSlave via Digitalmars-d-learn
Note that BOMs are optional and may not be present in a Unicode 
file. Also, the presence of leading bytes that look like a BOM does 
not necessarily mean that the file is encoded in some kind of Unicode.
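
One way to act on this, sketched under assumptions: treat a BOM only as a hint and still validate the bytes, so that UTF-8 without a BOM is recognized and a stray EF BB BF in front of non-UTF-8 data is not trusted. The names Guess, hasPrefix and guessEncoding are hypothetical; std.utf.validate is the real Phobos call. The result remains a guess, since bytes such as FF FE 00 00 can be read either as a UTF-32LE BOM or as a UTF-16LE BOM followed by U+0000.

```d
import std.utf : validate, UTFException;

// Hypothetical result type for this sketch.
enum Guess { utf8WithBom, utf8NoBom, utf16le, utf16be, utf32le, utf32be, unknown8bit }

/// True if `data` begins with the bytes in `prefix`.
bool hasPrefix(const(ubyte)[] data, const(ubyte)[] prefix)
{
    return data.length >= prefix.length && data[0 .. prefix.length] == prefix;
}

Guess guessEncoding(const(ubyte)[] data)
{
    // Test the 4-byte UTF-32 BOMs before the 2-byte UTF-16 ones,
    // because FF FE 00 00 also starts with the UTF-16LE BOM FF FE.
    if (hasPrefix(data, [0xFF, 0xFE, 0x00, 0x00])) return Guess.utf32le;
    if (hasPrefix(data, [0x00, 0x00, 0xFE, 0xFF])) return Guess.utf32be;
    if (hasPrefix(data, [0xFF, 0xFE])) return Guess.utf16le;
    if (hasPrefix(data, [0xFE, 0xFF])) return Guess.utf16be;

    immutable bool bom8 = hasPrefix(data, [0xEF, 0xBB, 0xBF]);
    try
    {
        // With or without a BOM, the content has to be well-formed UTF-8.
        validate(cast(const(char)[]) data);
    }
    catch (UTFException)
    {
        // Not UTF-8: some 8-bit ("ANSI") code page, which one is unknown.
        return Guess.unknown8bit;
    }
    return bom8 ? Guess.utf8WithBom : Guess.utf8NoBom;
}

void main()
{
    import std.stdio : writeln;

    ubyte[] sample = [0x68, 0xC3, 0xA9]; // "hé" encoded as UTF-8, no BOM
    writeln(guessEncoding(sample));
}
```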


Re: How to know whether a file's encoding is ansi or utf8?

2014-07-22 Thread Alexandre via Digitalmars-d-learn

http://www.architectshack.com/TextFileEncodingDetector.ashx

On Tuesday, 22 July 2014 at 15:53:23 UTC, FreeSlave wrote:
Note that BOMs are optional and may not be present in a Unicode 
file. Also, the presence of leading bytes that look like a BOM does 
not necessarily mean that the file is encoded in some kind of Unicode.



There are several difficulties in this case ...