[ https://issues.apache.org/jira/browse/TIKA-154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting resolved TIKA-154.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.3
         Assignee: Jukka Zitting

In revision 735193 I implemented the plain text detection mechanism described in section 4 of the content type sniffing draft [1] that I mentioned earlier on the mailing list. This seems to work quite well, and finally allows us to detect plain text documents that have no file name or type hints. :-)

Resolving as Fixed.

[1] http://webblaze.cs.berkeley.edu/2009/mime-sniff/mime-sniff.txt

> Better detection of plain text versus binary formats with a text header
> -----------------------------------------------------------------------
>
>                 Key: TIKA-154
>                 URL: https://issues.apache.org/jira/browse/TIKA-154
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>            Priority: Minor
>             Fix For: 0.3
>
>
> Antoni Mylka noted on the mailing list:
> Many binary formats begin with magic byte sequences composed of ASCII
> characters, e.g.
> ZIP files begin with PK
> PDFs begin with %PDF-
> CHM help files begin with ITSF
> etc.
> Tika should do a better job of detecting such cases.
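
For readers following along, the plain text check referred to in the resolution comment can be sketched roughly as follows. This is only an illustration and not the code committed in revision 735193. It assumes the draft's rule is that a byte prefix counts as text when it contains none of the control bytes 0x00-0x08, 0x0B, 0x0E-0x1A or 0x1C-0x1F. The class name PlainTextSniffer, the method names and the 512-byte limit are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Tika's actual implementation.
class PlainTextSniffer {

    // Assumed limit on how many leading bytes are inspected.
    private static final int MAX_BYTES = 512;

    // Assumed rule: control bytes other than TAB (0x09), LF (0x0A),
    // FF (0x0C), CR (0x0D) and ESC (0x1B) mark the data as binary.
    static boolean isBinaryDataByte(int b) {
        return (b >= 0x00 && b <= 0x08)
                || b == 0x0B
                || (b >= 0x0E && b <= 0x1A)
                || (b >= 0x1C && b <= 0x1F);
    }

    // Returns true when none of the inspected leading bytes is a binary data byte.
    // An empty prefix is reported as text, which is a choice made for this sketch.
    static boolean looksLikePlainText(byte[] prefix) {
        int n = Math.min(prefix.length, MAX_BYTES);
        for (int i = 0; i < n; i++) {
            if (isBinaryDataByte(prefix[i] & 0xFF)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] text = "Hello, Tika!\n".getBytes(StandardCharsets.US_ASCII);
        byte[] zip = {'P', 'K', 0x03, 0x04, 0x14, 0x00};
        System.out.println(looksLikePlainText(text)); // prints "true"
        System.out.println(looksLikePlainText(zip));  // prints "false"
    }
}
```

A check like this only separates text from binary once magic patterns have been tried. A header such as %PDF- consists entirely of printable ASCII and would pass this test on its own, which is the situation described in the issue.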