Handling multiple character sets within the same file is still a problem.
Sometimes the agent encounters a multi-language file. At times the file
apparently uses overlapping character sets. Character sets such as CP1252
and ISO-8859-1 are used (and browsers tolerate it, so the source
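
To illustrate the CP1252 / ISO-8859-1 overlap mentioned above, here is a minimal Python sketch. The two encodings agree on every byte except 0x80-0x9F, so one rough heuristic is to check for bytes in that range. The helper names (has_c1_bytes, decode_western) are hypothetical, and the sketch assumes the input is already known to be a single-byte Western encoding rather than UTF-8 or something else.

```python
# Bytes 0x80-0x9F are where CP1252 and ISO-8859-1 disagree:
# ISO-8859-1 maps them to C1 control characters, while CP1252 maps most
# of them to printable characters such as curly quotes and the euro sign.
C1_RANGE = range(0x80, 0xA0)


def has_c1_bytes(data: bytes) -> bool:
    """Return True if any byte falls in the 0x80-0x9F range."""
    return any(b in C1_RANGE for b in data)


def decode_western(data: bytes) -> str:
    """Hypothetical helper: choose CP1252 when C1-range bytes are present.

    Assumption: the input is a single-byte Western encoding, not UTF-8.
    """
    if has_c1_bytes(data):
        # errors="replace" handles the five bytes that Python's cp1252
        # codec leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).
        return data.decode("cp1252", errors="replace")
    # Without C1-range bytes the two encodings decode identically.
    return data.decode("iso-8859-1")
```

A page labelled ISO-8859-1 that contains b"\x93quoted\x94" would come out with curly quotes under this heuristic instead of invisible control characters. It is only a sketch of one possible check, not a full character set detector.
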
At 06:43 PM 4/5/2002 -0800, you wrote:
I'm working on a multi-language spider, and I've come to a point where I'm
not sure what assumption to make.
BIG SNIP
The solution to your problem is to use a language identifier.
A language identifier is capable of recognizing not only what
language it