Re: Charset encoding

k-team Wed, 18 May 2005 06:48:15 -0700

> Sometimes web pages do not identify the encoding the page is in.  In
> these cases, the client has to "guess" the encoding.  Nutch currently
> does not have a guessing algorithm, so if it encounters one of these
> pages, it just decodes the page using the
> parser.character.encoding.default parameter.
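
The fallback described in the quoted text can be sketched roughly as follows. This is not Nutch code: `decode_page` and `DEFAULT_ENCODING` are hypothetical names, and windows-1252 is only an assumed value for the default parameter.

```python
from typing import Optional

# Assumed stand-in for the configured parser.character.encoding.default value.
DEFAULT_ENCODING = "windows-1252"


def decode_page(raw: bytes, declared: Optional[str] = None,
                default: str = DEFAULT_ENCODING) -> str:
    """Decode with the charset the page declares, else fall back to the default."""
    charset = declared or default
    return raw.decode(charset, errors="replace")


raw = "perchè".encode("utf-8")              # the page bytes are really UTF-8
good = decode_page(raw, declared="utf-8")   # charset declared: decoded correctly
bad = decode_page(raw)                      # nothing declared: default is used
```

When the default does not match the real encoding of the page, the text is silently mis-decoded rather than rejected.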


Hmm, we have checked that search.jsp has pageEncoding set to UTF-8, and
we have also set parser.character.encoding to UTF-8.

For example, when searching for the string 'perchè' we get this URL:

http://localhost:8080/search.jsp?query=perch%C3%A8

i.e. two URL-encoded bytes... however it should be %E8, the 'è'.
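
A small sketch of where the two forms come from, using Python's standard urllib rather than anything from Nutch or the JSP layer. It only illustrates that the percent-encoding of 'è' depends on the charset used to turn the character into bytes; the variable names are made up for the example.

```python
from urllib.parse import quote, unquote

word = "perchè"

# UTF-8 represents 'è' as two bytes (0xC3 0xA8), so two escapes appear.
utf8_form = quote(word, encoding="utf-8")

# ISO-8859-1 represents 'è' as the single byte 0xE8.
latin1_form = quote(word, encoding="iso-8859-1")

# Decoding must use the same charset that was used for encoding.
round_trip = unquote(utf8_form, encoding="utf-8")

# Reading the ISO-8859-1 form as UTF-8 yields the replacement character.
mismatch = unquote(latin1_form, encoding="utf-8")
```

So %C3%A8 is what a UTF-8 form submission is expected to produce, while %E8 would correspond to an ISO-8859-1 submission.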

thanks for your support

ciao,
KTeam
