Hello, 

I have a URL for which parsechecker reports 
Content-Type=text/html; charset=windows-1252, which is incorrect. The metadata 
also contains Content-Type=text/html; charset=utf-8, which matches what I find 
in the HTML, where I at least see <meta charset="utf-8">. This is Nutch 1.14-SNAPSHOT.

In any case, the extracted text is badly garbled: not all, but most 
accented characters are unreadable.
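
To illustrate the kind of garbling I mean: it looks like what you would get if 
UTF-8 bytes were decoded as windows-1252. That is only my assumption about the 
cause, not something I have confirmed in Nutch. A minimal Python sketch (not 
Nutch code; the sample word and the helper names garble/repair are made up for 
illustration):

```python
# Illustration only, not Nutch code: what happens when UTF-8 bytes
# are decoded with the wrong charset (windows-1252, "cp1252" in Python).


def garble(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode the bytes as windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Reverse the mistake: re-encode as windows-1252, decode as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    sample = "gr\u00f8nt"      # hypothetical sample word with one accented letter
    broken = garble(sample)    # the accented letter turns into two characters
    print(broken)
    print(repair(broken))      # round-trips back to the original word
```

Plain ASCII letters survive this mistake unchanged, only the accented ones 
break, which would fit what I am seeing.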

I have no idea what causes this. Do you have any?

Many thanks,
Markus

https://www.aarstiderne.com/frugt-groent-og-mere/mixkasser
