> Sometimes web pages do not identify the encoding the page is in.  In
> these cases, the client has to "guess" the encoding.  Nutch currently
> does not have a guessing algorithm, so if it encounters one of these
> pages, it just decodes the page using the
> parser.character.encoding.default parameter.
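As a rough illustration of the fallback behaviour described in the quoted text, here is a minimal Java sketch. It is not Nutch's actual code. If the page declares a charset the JVM recognises, that charset is used. Otherwise the bytes are decoded with a fixed default and no guessing is attempted. The class name, the method name, and the choice of ISO-8859-1 as the default are assumptions. The default stands in for whatever parser.character.encoding.default is set to.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

public class EncodingFallback {
    // Hypothetical stand-in for the configured parser.character.encoding.default value.
    static final Charset DEFAULT_CHARSET = StandardCharsets.ISO_8859_1;

    // Returns the declared charset if it is legal and supported, else the default.
    static Charset resolve(String declared) {
        if (declared == null) {
            return DEFAULT_CHARSET;
        }
        try {
            return Charset.isSupported(declared) ? Charset.forName(declared) : DEFAULT_CHARSET;
        } catch (IllegalCharsetNameException e) {
            return DEFAULT_CHARSET; // malformed name in the page: ignore it
        }
    }

    // Decodes raw page bytes; no content sniffing is done, mirroring the behaviour above.
    static String decode(byte[] content, String declared) {
        return new String(content, resolve(declared));
    }

    public static void main(String[] args) {
        byte[] utf8Bytes = "perch\u00e8".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(utf8Bytes, "UTF-8")); // correct text
        System.out.println(decode(utf8Bytes, null));    // UTF-8 bytes read as Latin-1: mojibake
    }
}
```

When a UTF-8 page carries no declaration, the two bytes of the accented letter are decoded as two separate Latin-1 characters. The default therefore only helps when it happens to match the page's real encoding.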

Hmm, we have checked that search.jsp has pageEncoding set to UTF-8, and we
have also set parser.character.encoding to UTF-8.

For example, when we search for the string 'perchè', we get this URL:

http://localhost:8080/search.jsp?query=perch%C3%A8

i.e. two URL-encoded bytes... however, it should be %E8, the 'è'.
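For reference, %C3%A8 and %E8 are both valid percent-encodings of 'è', each under a different charset. %C3%A8 is the two-byte UTF-8 form, and %E8 is the single-byte ISO-8859-1 (Latin-1) form. The small Java sketch below shows both encodings, and also what happens when the UTF-8 form is decoded as Latin-1. It assumes Java 10+ for the Charset overloads of URLEncoder/URLDecoder. The class and method names are hypothetical.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryEncodingDemo {
    // "perch" followed by U+00E8 (e with grave accent)
    static final String QUERY = "perch\u00e8";

    // Percent-encodes the query using the given charset.
    static String encode(Charset cs) {
        return URLEncoder.encode(QUERY, cs);
    }

    // Percent-decodes a string using the given charset.
    static String decode(String encoded, Charset cs) {
        return URLDecoder.decode(encoded, cs);
    }

    public static void main(String[] args) {
        String asUtf8 = encode(StandardCharsets.UTF_8);
        String asLatin1 = encode(StandardCharsets.ISO_8859_1);
        System.out.println(asUtf8);   // perch%C3%A8 (two bytes: 0xC3 0xA8)
        System.out.println(asLatin1); // perch%E8    (one byte: 0xE8)

        // Decoding the UTF-8 form with the matching charset restores the query;
        // decoding it as Latin-1 turns the two bytes into two wrong characters.
        System.out.println(decode(asUtf8, StandardCharsets.UTF_8).equals(QUERY));      // true
        System.out.println(decode(asUtf8, StandardCharsets.ISO_8859_1).equals(QUERY)); // false
    }
}
```

Because search.jsp is served as UTF-8, a browser would be expected to submit the query as %C3%A8. Whether the text then comes out right presumably depends on which charset the server side uses when it decodes the query string.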

Thanks for your support.

ciao,
KTeam


_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
