> Sometimes web pages do not identify the encoding the page is in. In
> these cases, the client has to "guess" the encoding. Nutch currently
> does not have a guessing algorithm, so if it encounters one of these
> pages, it just decodes the page using the
> parser.character.encoding.default parameter.

Hmm, we have checked that search.jsp has pageEncoding set to UTF-8, and
we have then set parser.character.encoding to UTF-8 as well.

For example, when searching for the string 'perchè' we get this URL:

http://localhost:8080/search.jsp?query=perch%C3%A8

i.e. two URL-encoded bytes for the last character... however, it should
be %E8, the 'è'.

Thanks for your support.

Ciao,
KTeam

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
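
For reference, the two forms seen above are the same character percent-encoded under two different charsets: %C3%A8 is 'è' (U+00E8) as two UTF-8 bytes, and %E8 is the same character as one ISO-8859-1 byte. A minimal sketch that reproduces both, assuming Java 10 or later; the class name QueryEncodingDemo and its encode helper are made up for illustration and are not part of Nutch:

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not Nutch code.
final class QueryEncodingDemo {

    // Percent-encodes a query string using the given charset.
    static String encode(String query, Charset charset) {
        return URLEncoder.encode(query, charset);
    }

    public static void main(String[] args) {
        // "perch" followed by U+00E8 (e with grave accent)
        String query = "perch\u00E8";

        // UTF-8 encodes U+00E8 as the two bytes C3 A8
        System.out.println(encode(query, StandardCharsets.UTF_8));      // prints perch%C3%A8

        // ISO-8859-1 encodes U+00E8 as the single byte E8
        System.out.println(encode(query, StandardCharsets.ISO_8859_1)); // prints perch%E8
    }
}
```

Which form appears in the URL therefore depends on the charset used to encode the query, and the receiving side has to decode with the same charset to get 'è' back.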
