Re: [PHP] Re: Detecting The Encoding Of A Text File

2009-11-26 Thread Nitsan Bin-Nun
Someone has already suggested it, but I haven't tried it yet.

The thing is that right now it contains Hebrew, but tomorrow this file could
be in German or some other language with accented characters.
I'm trying to create a function that detects the encoding and converts
the text to UTF-8.

(I don't have much experience with encodings. :( )
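The bytes of a single-byte file are usually equally valid as ISO-8859-8 (Hebrew) and as Windows-1252 (German), so a function like this can recognise UTF-8 (and a UTF-8 byte order mark) fairly reliably, but for everything else it has to be told which legacy encoding to assume. A minimal sketch under that assumption; to_utf8() and its $fallback parameter are hypothetical names, and it needs the mbstring extension:

```php
<?php
// Hypothetical helper: return $bytes as UTF-8.
// Assumption: the caller supplies the legacy single-byte encoding to fall back on
// (e.g. 'ISO-8859-8' for Hebrew, 'Windows-1252' for German), because those
// encodings cannot be told apart reliably from the bytes alone.
function to_utf8(string $bytes, string $fallback): string
{
    // A UTF-8 byte order mark declares the encoding; strip it and return the rest.
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return substr($bytes, 3);
    }
    // Valid UTF-8 (which includes plain ASCII) is returned unchanged.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    // Otherwise decode using the caller-supplied legacy encoding.
    return mb_convert_encoding($bytes, 'UTF-8', $fallback);
}

echo to_utf8("\xF9\xEC\xE5\xED", 'ISO-8859-8'), "\n"; // שלום ("shalom")
echo to_utf8("Gr\xFC\xDFe", 'Windows-1252'), "\n";    // Grüße
```

The fallback is a parameter rather than a guess on purpose: if the application knows which language a file is in, it effectively knows the legacy encoding too.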

2009/11/26 Nisse Engström news.nospam.0ixbt...@luden.se

 On Thu, 26 Nov 2009 06:55:31 +0200, Nitsan Bin-Nun wrote:

  Hi,
 
  I have been trying for the last couple of hours to determine the
  encoding of a text file (a .txt file on Windows).
 
  I have this code:
 
  $contents = file_get_contents($config['txt_dir'] . $file);
  // Candidate encodings as one comma-separated string (Windows-1255 left out).
  $encoding = mb_detect_encoding($contents,
      'UTF-8,ISO-8859-1,WINDOWS-1252'); //,Windows-1255

  echo '||encoding:' . $encoding . '||';

  if ($encoding == 'UTF-8')
  {
      $utfcontents = $contents;
  }
  else if ($encoding == 'ISO-8859-1')
  {
      $utfcontents = utf8_encode($contents);
  }
  // Note: $utfcontents stays undefined if WINDOWS-1252 (or nothing) is detected.

  var_dump($utfcontents);
 
  The $encoding is ISO-8859-1; the text file contains Hebrew characters,
  and I'm converting it to UTF-8.
 
  The above code outputs gibberish. It seems to have converted the text
  in some way, but not in the way it should have.

 If you know that the file contains Hebrew, maybe you should
 try converting from ISO-8859-8?


 /Nisse

 --
 PHP General Mailing List (http://www.php.net/)
 To unsubscribe, visit: http://www.php.net/unsub.php
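Why the code above produces gibberish: every byte value is a valid ISO-8859-1 character, so a Hebrew file that is not valid UTF-8 can come back from mb_detect_encoding() as ISO-8859-1, and utf8_encode() then turns each byte into the Latin-1 character with the same value. A minimal sketch, assuming the file really is ISO-8859-8 as Nisse suggests, using the Hebrew letter alef (byte 0xE0) and the word "shalom" as test input. It calls mb_convert_encoding() rather than utf8_encode(), which is deprecated as of PHP 8.2:

```php
<?php
// Byte 0xE0 is the Hebrew letter alef in ISO-8859-8 (and in Windows-1255).
$alef = "\xE0";

// Read as ISO-8859-1 (what utf8_encode() assumes), 0xE0 becomes "à" (U+00E0).
echo bin2hex(mb_convert_encoding($alef, 'UTF-8', 'ISO-8859-1')), "\n"; // c3a0

// Read as ISO-8859-8, the same byte becomes alef (U+05D0).
echo bin2hex(mb_convert_encoding($alef, 'UTF-8', 'ISO-8859-8')), "\n"; // d790

// "shalom" in ISO-8859-8 is not valid UTF-8 but is valid ISO-8859-1,
// so strict detection can only report the Latin-1 candidate.
$shalom = "\xF9\xEC\xE5\xED";
var_dump(mb_detect_encoding($shalom, 'UTF-8,ISO-8859-1', true)); // string(10) "ISO-8859-1"
echo mb_convert_encoding($shalom, 'UTF-8', 'ISO-8859-8'), "\n";  // שלום
```

The detection result is therefore "the first plausible candidate", not proof of the encoding: the same bytes are valid in several single-byte encodings, so the right one has to come from knowledge outside the bytes.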




Re: [PHP] Re: Detecting The Encoding Of A Text File

2009-11-26 Thread דניאל דנון
If Windows Notepad can detect the encoding, there must be a way to do it
yourself.

Maybe try reading the file's headers; I think they should also contain the
encoding of the file...





-- 
Use ROT26 for best security


Re: [PHP] Re: Detecting The Encoding Of A Text File

2009-11-26 Thread Ashley Sheridan
On Thu, 2009-11-26 at 15:39 +0200, דניאל דנון wrote:

 If Windows Notepad can detect the encoding, there must be a way to do it
 yourself.
 
 Maybe try reading the file's headers; I think they should also contain the
 encoding of the file...
 


A plain text file wouldn't have headers like that, would it? At least,
not in the sense that an image file or an office document file has a
header.
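The closest thing a plain text file has to a header is an optional byte order mark (BOM) at the very start, which Notepad writes when saving as "Unicode" (UTF-16) and, at least in the Windows versions of that era, as UTF-8. Single-byte encodings such as ISO-8859-8 or Windows-1252 have no BOM, so its absence says nothing about which of them a file uses. A minimal sketch; bom_encoding() is a hypothetical name:

```php
<?php
// Hypothetical helper: report the encoding announced by a byte order mark,
// or null if the data does not start with one of the BOMs checked here.
// UTF-32 is not handled: a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE.
function bom_encoding(string $bytes): ?string
{
    $boms = [
        "\xEF\xBB\xBF" => 'UTF-8',
        "\xFF\xFE"     => 'UTF-16LE',
        "\xFE\xFF"     => 'UTF-16BE',
    ];
    foreach ($boms as $bom => $encoding) {
        if (strncmp($bytes, $bom, strlen($bom)) === 0) {
            return $encoding;
        }
    }
    return null;
}

var_dump(bom_encoding("\xEF\xBB\xBFhello")); // string(5) "UTF-8"
var_dump(bom_encoding("\xF9\xEC\xE5\xED"));  // NULL
```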

Thanks,
Ash
http://www.ashleysheridan.co.uk




Re: [PHP] Re: Detecting The Encoding Of A Text File

2009-11-26 Thread Nisse Engström
On Thu, 26 Nov 2009 15:39:04 +0200, דניאל דנון wrote:

 If Windows Notepad can detect the encoding, there must be a way to do it
 yourself.
 
 Maybe try reading the file's headers; I think they should also contain the
 encoding of the file...

Plain text files don't have any headers. Perhaps Notepad uses
heuristics, e.g. examining the distribution of characters to
determine a probable encoding.

[Quick Google...]

http://blogs.msdn.com/oldnewthing/archive/2007/04/17/2158334.aspx
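The heuristic idea can be illustrated with a deliberately crude toy: decode the bytes with each candidate encoding and keep the one that produces the most characters of the script you expect. It only works if you already know the language (passed in here as a PCRE Unicode script name such as 'Hebrew' or 'Latin'), which is exactly the information the file itself does not carry. guess_encoding() is a hypothetical name, and this is not a reconstruction of whatever Notepad does:

```php
<?php
// Toy heuristic (hypothetical helper): decode $bytes with each candidate
// encoding and keep the one yielding the most characters of $script.
function guess_encoding(string $bytes, array $candidates, string $script): string
{
    $best = $candidates[0];
    $bestScore = -1;
    foreach ($candidates as $encoding) {
        $text = mb_convert_encoding($bytes, 'UTF-8', $encoding);
        // Count characters belonging to the expected Unicode script.
        $score = preg_match_all('/\p{' . $script . '}/u', $text);
        if ($score > $bestScore) { // ties keep the earlier candidate
            $best = $encoding;
            $bestScore = $score;
        }
    }
    return $best;
}

$candidates = ['Windows-1252', 'ISO-8859-8'];
echo guess_encoding("\xF9\xEC\xE5\xED", $candidates, 'Hebrew'), "\n"; // ISO-8859-8
echo guess_encoding("Gr\xFC\xDFe", $candidates, 'Latin'), "\n";      // Windows-1252
```

Real detectors use much larger statistical models, but they share the same limitation: the answer is a probability, not a fact recorded in the file.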


/Nisse




Re: [PHP] Re: Detecting The Encoding Of A Text File

2009-11-26 Thread דניאל דנון
I was thinking that if Notepad can open it correctly, it must have headers,
but the link you gave clarifies that. My bad.




