Increase buffer size for meta tag sniffing
------------------------------------------
                 Key: TIKA-357
                 URL: https://issues.apache.org/jira/browse/TIKA-357
             Project: Tika
          Issue Type: Improvement
    Affects Versions: 0.6
            Reporter: Ken Krugler
            Assignee: Ken Krugler
            Priority: Minor
             Fix For: 0.6
         Attachments: makler.html

Some web pages (such as makler.su; see the attached makler.html) have a lot of script data before the body of the HTML. When this happens, the sniffing code fails to find the charset declared in the meta tag, because it currently sniffs only the first 4K. Bumping the buffer to 8K would cover all of the cases that I (Ken) have seen during a test crawl.
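To illustrate why the window size matters, here is a minimal sketch of meta-tag charset sniffing with a configurable buffer. It is not Tika's actual code: the class `MetaCharsetSniffer`, the method `sniff`, the constant `SNIFF_BUFFER_SIZE`, and the simplified regex are all hypothetical. It peeks at the first N bytes with mark/reset, decodes them as ISO-8859-1 so that ASCII markup survives whatever the real encoding is, and looks for a `charset=` inside a `<meta>` tag. If the meta tag starts beyond N bytes (for example after a long script block), nothing is found.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaCharsetSniffer {

    // Hypothetical constant: the 8K window proposed in this issue (was 4K).
    static final int SNIFF_BUFFER_SIZE = 8 * 1024;

    // Simplified (hypothetical) pattern that matches charset=... inside a meta tag, e.g.
    // <meta http-equiv="Content-Type" content="text/html; charset=windows-1251">
    // or <meta charset="utf-8">.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta\\s[^>]*charset\\s*=\\s*[\"']?([\\w.:-]+)",
            Pattern.CASE_INSENSITIVE);

    /**
     * Peeks at up to bufferSize bytes and returns the charset declared in a
     * meta tag, or null if none is found in that window. The stream must
     * support mark/reset; it is reset so the caller can re-read the bytes.
     */
    static String sniff(InputStream in, int bufferSize) throws IOException {
        in.mark(bufferSize);
        byte[] buf = new byte[bufferSize];
        int total = 0;
        try {
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (total < bufferSize) {
                int n = in.read(buf, total, bufferSize - total);
                if (n == -1) {
                    break;
                }
                total += n;
            }
        } finally {
            in.reset();
        }
        // ISO-8859-1 maps each byte to exactly one char, so the ASCII
        // markup is readable regardless of the page's real encoding.
        String head = new String(buf, 0, total, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        return m.find() ? m.group(1) : null;
    }
}
```

With a page that has about 5K of script before its meta tag, `sniff(in, 4096)` returns null while `sniff(in, MetaCharsetSniffer.SNIFF_BUFFER_SIZE)` finds the charset. A larger window costs only a bigger one-time read buffer, which is why a fixed bump like 4K to 8K is a cheap fix.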