> Sometimes web pages do not identify the encoding the page is in. In
> these cases, the client has to "guess" the encoding. Nutch currently
> does not have a guessing algorithm, so if it encounters one of these
> pages, it just decodes the page using the
> parser.character.encoding.default parameter.

Hmm, we have checked that search.jsp has pageEncoding set to UTF-8, and we have also set parser.character.encoding.default to UTF-8.

For example, when searching for the string 'perchè', we get this in the URL:

http://localhost:8080/search.jsp?query=perch%C3%A8

i.e. two URL-encoded bytes for a single character... however, it should be %E8, the 'è'.

Thanks for your support.

Ciao,
KTeam
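
For reference, here is a minimal Java sketch of where the two forms come from: the same 'è' percent-encoded once as UTF-8 and once as ISO-8859-1. The class name UrlEncodingDemo and the helper encode are made up for illustration and are not part of Nutch or of search.jsp. It only assumes the standard java.net.URLEncoder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class, not part of Nutch.
public class UrlEncodingDemo {

    // Percent-encodes text using the named charset.
    // Wraps the checked exception so callers stay simple.
    static String encode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // \u00e8 is 'è'; the escape avoids depending on the source file encoding.
        String query = "perch\u00e8";

        // UTF-8 represents 'è' as two bytes, 0xC3 0xA8.
        System.out.println(encode(query, "UTF-8"));       // perch%C3%A8

        // ISO-8859-1 represents 'è' as one byte, 0xE8.
        System.out.println(encode(query, "ISO-8859-1"));  // perch%E8
    }
}
```

So %C3%A8 is what a UTF-8 encoded form submission would be expected to produce, while %E8 would correspond to an ISO-8859-1 encoded one.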
