Increase buffer size for meta tag sniffing
------------------------------------------

                 Key: TIKA-357
                 URL: https://issues.apache.org/jira/browse/TIKA-357
             Project: Tika
          Issue Type: Improvement
    Affects Versions: 0.6
            Reporter: Ken Krugler
            Assignee: Ken Krugler
            Priority: Minor
             Fix For: 0.6
         Attachments: makler.html

Some web pages (such as makler.su, see attached) have lots of script data 
before the body of the HTML.

When this happens, the sniffing code fails to find the charset info in the meta 
tag, because it currently examines only the first 4K of the document.

Bumping it to 8K would cover all of the cases that I (Ken) have seen during a 
test crawl.
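
To illustrate the failure mode, here is a minimal sketch of bounded meta tag 
sniffing. It is not Tika's actual code: the class name MetaCharsetSniffer, the 
sniff method, its parameters, and the regular expression are all hypothetical 
and chosen only for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Tika.
final class MetaCharsetSniffer {

    // Simplified pattern (an assumption): a charset=... value inside a <meta> tag.
    private static final Pattern META_CHARSET = Pattern.compile(
            "(?is)<meta[^>]*?charset\\s*=\\s*[\"']?([a-z0-9_.:-]+)");

    /**
     * Looks for a charset declaration in the first sniffLimit bytes of content.
     * Returns the declared charset name, or null if none is found in that window.
     */
    static String sniff(byte[] content, int sniffLimit) {
        int n = Math.min(content.length, sniffLimit);
        // ISO-8859-1 maps each byte to one char, so the ASCII markup survives
        // decoding regardless of the page's real encoding.
        String head = new String(content, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        return m.find() ? m.group(1) : null;
    }
}
```

With a page that places roughly 5000 bytes of script ahead of its meta tag, a 
call with a limit of 4096 finds nothing, while a limit of 8192 reaches the 
declaration.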

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.