Hello, We have a problem with Tika, encoding and pages on this website: https://www.aarstiderne.com/frugt-groent-og-mere/mixkasser
Using Nutch and Tika 1.12, but also using Tika 1.16, we found out that the regular HTML parser does a fine job, but our TikaParser has a tough job dealing with this HTML. For some reason Tika thinks Content-Encoding=windows-1252 is what this webpage says it is, instead the page identifies itself properly as UTF-8. Of all websites we index, this is so far the only one giving trouble indexing accents, getting fÃ¥ instead of a regular få. Any tips to spare? Many many thanks! Markus
