Note: setting parser.character.encoding.default to UTF-8 does not fix the problem either.

Many thanks,
Markus
 
-----Original message-----
> From:Markus Jelsma <[email protected]>
> Sent: Thursday 26th October 2017 17:33
> To: User <[email protected]>
> Subject: Wrong encoding
> 
> Hello, 
> 
> According to parsechecker, this URL has Content-Type=text/html;
> charset=windows-1252, which is incorrect. The metadata also contains
> Content-Type=text/html; charset=utf-8, which matches the HTML: I do see
> <meta charset="utf-8"> there. This is Nutch 1.14-SNAPSHOT.
> 
> In any case, the extracted text is badly garbled: not all, but most
> accented characters are unreadable.
> 
> I have no idea what causes this. Do you?
> 
> Many thanks,
> Markus
> 
> https://www.aarstiderne.com/frugt-groent-og-mere/mixkasser
> 

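The symptom in the quoted message (ASCII text intact, most accented characters unreadable) is what you would expect if UTF-8 bytes were decoded as windows-1252. The minimal Python sketch below only illustrates that effect on Danish letters; it does not use Nutch, and the function name mis_decode and the sample string are hypothetical, chosen for illustration.

```python
def mis_decode(text: str) -> str:
    """Hypothetical helper: encode text as UTF-8, then decode it as windows-1252."""
    raw = text.encode("utf-8")  # bytes as a UTF-8 page would serve them
    # Read the bytes with the wrong charset; errors="replace" avoids crashing
    # on the few byte values that windows-1252 leaves undefined.
    return raw.decode("cp1252", errors="replace")


sample = "frugt, grønt og æbler på vej"  # hypothetical sample text
garbled = mis_decode(sample)
print(garbled)

# When no byte was replaced, the damage is reversible: re-encode as
# windows-1252 to get the original bytes back, then decode them as UTF-8.
print(garbled.encode("cp1252").decode("utf-8"))
```

Each of æ, ø and å is two bytes in UTF-8, so each one turns into two windows-1252 characters, while plain ASCII letters come through unchanged.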