Re: [PHP] Re: Detecting The Encoding Of A Text File
Someone has already suggested it, but I haven't tried it yet. The thing is that right now the file contains Hebrew, but tomorrow it will be in German or any other accented language. I'm trying to create a function that detects the encoding and converts the text to UTF-8. (I don't have much experience with encodings.. :( )

2009/11/26 Nisse Engström news.nospam.0ixbt...@luden.se

> On Thu, 26 Nov 2009 06:55:31 +0200, Nitsan Bin-Nun wrote:
>
> > Hi, I have been trying for the last couple of hours to determine the encoding of a text file (.txt on Windows). I have this code:
> >
> >     $contents = file_get_contents($config['txt_dir'] . $file);
> >     $encoding = mb_detect_encoding($contents, "UTF-8,ISO-8859-1,WINDOWS-1252"); //,Windows-1255
> >     echo "||encoding:" . $encoding . "||";
> >     if ($encoding == 'UTF-8') {
> >         $utfcontents = $contents;
> >     } else if ($encoding == 'ISO-8859-1') {
> >         $utfcontents = utf8_encode($contents);
> >     }
> >     var_dump($utfcontents);
> >
> > The $encoding is ISO-8859-1, the text file contains Hebrew characters, and then I'm converting it to UTF-8. The above code is outputting gibberish; it seems to have converted the text in some way, but not in the way it should have.
>
> If you know that the file contains Hebrew, maybe you should try converting from ISO-8859-8?
>
> /Nisse
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
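On the gibberish output described in the message above: utf8_encode() only converts from ISO-8859-1, and since any byte string is acceptable as ISO-8859-1, detection against that candidate list will tend to report ISO-8859-1 for Hebrew bytes as well. Below is a minimal sketch of the suggestion to convert from ISO-8859-8, assuming the mbstring and iconv extensions are available. The function name to_utf8 and its $fallback parameter are hypothetical and not from the thread; the caller still has to know (or guess) the legacy encoding.

```php
<?php
// Hypothetical helper (not from the thread): returns $bytes as UTF-8.
// $fallback is an assumption: the single-byte encoding the caller expects
// when the input is not already valid UTF-8 (e.g. 'ISO-8859-8' for Hebrew).
function to_utf8(string $bytes, string $fallback): string
{
    // UTF-8 has a strict byte structure, so text in a legacy single-byte
    // encoding that contains non-ASCII bytes rarely passes this check.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    // iconv() returns false when the conversion fails, for example when
    // a byte has no mapping in $fallback.
    $converted = iconv($fallback, 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $fallback to UTF-8");
    }
    return $converted;
}

// "\xF9\xEC\xE5\xED" is the Hebrew word "shalom" encoded in ISO-8859-8.
echo to_utf8("\xF9\xEC\xE5\xED", 'ISO-8859-8'), "\n";
```

For German text the same call would be made with 'ISO-8859-1' or 'Windows-1252' as the fallback; choosing between the fallbacks is the part that cannot be done reliably from the bytes alone.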
Re: [PHP] Re: Detecting The Encoding Of A Text File
If Windows Notepad can detect the encoding, there must be a way to do it yourself. Maybe try to get the file's headers; I think they should also contain the encoding of the file...

2009/11/26 Nitsan Bin-Nun nitsa...@gmail.com

> Someone has already suggested it, but I haven't tried it yet. The thing is that right now the file contains Hebrew, but tomorrow it will be in German or any other accented language. I'm trying to create a function that detects the encoding and converts the text to UTF-8. (I don't have much experience with encodings.. :( )

--
Use ROT26 for best security
Re: [PHP] Re: Detecting The Encoding Of A Text File
On Thu, 2009-11-26 at 15:39 +0200, דניאל דנון wrote:

> If Windows Notepad can detect the encoding, there must be a way to do it yourself. Maybe try to get the file's headers; I think they should also contain the encoding of the file...

A plain text file wouldn't have headers like that, would it? At least, not in the sense that an image file has a header, or an office document file has a header.

Thanks,
Ash
http://www.ashleysheridan.co.uk
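The closest thing a plain text file can have to a header is an optional byte-order mark (BOM) at the very start of the file. A minimal sketch of checking for one follows; the function name detect_bom is hypothetical. Files saved in a legacy single-byte encoding, such as the Hebrew file discussed in this thread, carry no BOM, so the function returns null for them.

```php
<?php
// Hypothetical helper: returns the encoding implied by a leading
// byte-order mark, or null when the data does not start with one.
function detect_bom(string $bytes): ?string
{
    $boms = [
        'UTF-8'    => "\xEF\xBB\xBF",
        'UTF-16LE' => "\xFF\xFE",
        'UTF-16BE' => "\xFE\xFF",
    ];
    foreach ($boms as $encoding => $bom) {
        // strncmp() is binary-safe; compare only the first strlen($bom) bytes.
        if (strncmp($bytes, $bom, strlen($bom)) === 0) {
            return $encoding;
        }
    }
    return null;
}

// Usage: only the first few bytes of the file are needed.
// $encoding = detect_bom(file_get_contents($path, false, null, 0, 3));
```

A null result says nothing about the encoding; it only means the file has to be examined some other way.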
Re: [PHP] Re: Detecting The Encoding Of A Text File
On Thu, 26 Nov 2009 15:39:04 +0200, דניאל דנון wrote:

> If Windows Notepad can detect the encoding, there must be a way to do it yourself. Maybe try to get the file's headers; I think they should also contain the encoding of the file...

Plain text files don't have any headers. Perhaps Notepad uses heuristics, e.g. examining the distribution of characters to determine a probable encoding.

[Quick Google...] http://blogs.msdn.com/oldnewthing/archive/2007/04/17/2158334.aspx

/Nisse
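To illustrate what "examining the distribution of characters" could look like, here is a deliberately crude sketch. It is not what Notepad does, and the function name guess_script_family and the decision rule are assumptions made for illustration. The idea: in accented Latin text (German, say) non-ASCII bytes are a small minority among ASCII letters, while in Hebrew text stored in a single-byte encoding nearly every letter is a non-ASCII byte.

```php
<?php
// Hypothetical heuristic for text that is NOT valid UTF-8 and is assumed
// to be in some single-byte encoding. Returns 'ascii', 'latin' or
// 'non-latin'. The rule (more high bytes than ASCII letters) is arbitrary.
function guess_script_family(string $bytes): string
{
    // Without the /u modifier, PCRE matches individual bytes.
    $high  = preg_match_all('/[\x80-\xFF]/', $bytes); // non-ASCII bytes
    $ascii = preg_match_all('/[A-Za-z]/', $bytes);    // ASCII letters
    if ($high === 0) {
        return 'ascii';
    }
    return $high > $ascii ? 'non-latin' : 'latin';
}
```

A caller might map 'latin' to ISO-8859-1 or Windows-1252 and 'non-latin' to whichever encoding is expected for the site's audience (ISO-8859-8 for Hebrew, for instance). Short or mixed-language inputs will be misclassified, so this is a guess to be confirmed, not a detection.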
Re: [PHP] Re: Detecting The Encoding Of A Text File
I was thinking that if Notepad can open the file correctly, it must have headers, but the link you gave clarifies that. My bad.

2009/11/26 Nisse Engström news.nospam.0ixbt...@luden.se

> Plain text files don't have any headers. Perhaps Notepad uses heuristics, e.g. examining the distribution of characters to determine a probable encoding.
>
> [Quick Google...] http://blogs.msdn.com/oldnewthing/archive/2007/04/17/2158334.aspx
>
> /Nisse

--
Use ROT26 for best security