[Robots] Re: Multilanguage robots

2002-04-08 Thread thomas.kay
Handling multiple character sets within the same file is still a problem. Sometimes the agent encounters a multi-language file. At times the file apparently uses overlapping character sets. Character sets like CP1252 and ISO-8859-1 are used (and browsers tolerate it, so the source
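
To illustrate the CP1252 / ISO-8859-1 overlap mentioned above, here is a minimal sketch of one common heuristic, not anything taken from this thread. The two character sets agree everywhere except bytes 0x80-0x9F, which are rarely used control codes in ISO-8859-1 but mostly printable punctuation (curly quotes, the euro sign) in CP1252. The function names `guess_latin_encoding` and `decode_latin` are made up for the example.

```python
def guess_latin_encoding(data: bytes) -> str:
    """Guess between CP1252 and ISO-8859-1 for a single-byte Western page.

    Assumption: bytes 0x80-0x9F in ordinary web text are CP1252
    punctuation, not ISO-8859-1 C1 control codes.
    """
    if any(0x80 <= b <= 0x9F for b in data):
        return "cp1252"
    return "iso-8859-1"


def decode_latin(data: bytes) -> str:
    """Decode with the guessed encoding.

    A few byte values (for example 0x81) are undefined in CP1252, so
    errors="replace" is used to avoid raising on them.
    """
    return data.decode(guess_latin_encoding(data), errors="replace")


if __name__ == "__main__":
    print(guess_latin_encoding(b"caf\xe9"))    # no bytes in 0x80-0x9F
    print(guess_latin_encoding(b"\x93hi\x94"))  # curly quotes in CP1252
    print(decode_latin(b"\x93hi\x94"))
```

This only separates these two encodings from each other; it says nothing about files that mix unrelated character sets, which is the harder case raised in the post.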

[Robots] Re: Multilanguage robots

2002-04-06 Thread Art Pollard
At 06:43 PM 4/5/2002 -0800, you wrote: I'm working on a multi-language spider, and I've come to a point where I'm not sure what assumption to make. [BIG SNIP] The solution to your problem is to use a language identifier. A language identifier is capable of recognizing not only what language it
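
As a rough illustration of what a language identifier does, here is a toy sketch that scores text against small stopword lists. It is not the identifier referred to in the post; real identifiers typically rely on much larger statistical models. The word lists and the name `identify_language` are assumptions made up for this example.

```python
# Tiny, hand-picked stopword lists (illustrative only, far from complete).
STOPWORDS = {
    "english": {"the", "and", "of", "to", "is", "in"},
    "german": {"der", "die", "und", "das", "ist", "nicht"},
    "french": {"le", "la", "et", "les", "des", "est"},
}


def identify_language(text):
    """Return the language whose stopwords occur most often, or None."""
    words = text.lower().split()
    # Count how many words of the text appear in each language's list.
    scores = {
        lang: sum(1 for w in words if w in stopwords)
        for lang, stopwords in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(identify_language("the cat is in the house"))
    print(identify_language("der Hund ist nicht das Problem"))
    print(identify_language("le chat et les chiens"))
```

A spider could run something like this on already-decoded text; it does not address detecting the character set itself.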