Nick Lothian wrote:

You may also be able to extract some useful information from the character
encoding (available in the Content-Type header - see
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.17).

Obviously this won't always be useful, but encodings like Shift-JIS are
pretty good indicators of the language (Japanese in that case).


Good point, I need to dig more into that.
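
A rough sketch of the idea, to make it concrete: pull the charset parameter
out of a Content-Type header value and look it up in a small table of
encodings that are tied to one language. This is illustration only and not
Nutch code (Nutch is Java; Python is used here just for brevity). The table
CHARSET_LANGUAGE_HINTS and the function names are hypothetical, and the
mapping is an assumption that is far from complete. Encodings such as UTF-8
or ISO-8859-1 deliberately give no hint.

```python
from email.message import Message

# Hypothetical, deliberately incomplete table: charset name -> ISO 639-1 code.
# Only encodings that are strongly tied to a single language are listed.
CHARSET_LANGUAGE_HINTS = {
    "shift-jis": "ja",
    "euc-jp": "ja",
    "iso-2022-jp": "ja",
    "euc-kr": "ko",
    "gb2312": "zh",
    "big5": "zh",
    "koi8-r": "ru",
}


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # The standard library handles parameter parsing and quoted values.
    return msg.get_content_charset()


def language_hint(header_value):
    """Guess a language from the charset alone; None means 'no useful hint'."""
    charset = charset_from_content_type(header_value)
    if charset is None:
        return None
    # Treat "Shift_JIS" and "Shift-JIS" as the same name.
    normalized = charset.replace("_", "-")
    return CHARSET_LANGUAGE_HINTS.get(normalized)


if __name__ == "__main__":
    for value in ("text/html; charset=Shift_JIS",
                  "text/html; charset=utf-8",
                  "text/html"):
        print(value, "->", language_hint(value))
```

A hint like this is probably best treated as one weak signal among others
(the page content itself, any declared language) rather than as a final
answer.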

--
Sami Siren


_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
