Chris -

I agree.  I now see the wisdom of making application/octet-stream the
default MIME type, possibly with a way to override it.

In addition, though, I think we should consider what Tika should do with
such a byte stream.  One option is to run it through the Unix strings
utility to extract ASCII text.  Another is to fail the parse so that the
user can be notified that Tika could not find a definitely suitable parser.
A third is to parse it as an empty string (if, for example, the text is
known to be in Chinese, so that the output of strings would be meaningless
garbage).  In the future, a user who considers it important enough could
write a custom parser for application/octet-stream and plug it into Tika.
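
To make the three options concrete, here is a rough sketch of how such a
fallback policy might look.  None of this is existing Tika API: the class
OctetStreamFallback, the Policy enum, and both method names are made up for
illustration, and the 4-character minimum run length is just an assumption
borrowed from the usual default of strings.

```java
// Hypothetical sketch only; these names are not part of Tika.
final class OctetStreamFallback {

    // The three behaviours discussed above for an unrecognized byte stream.
    enum Policy { STRINGS, FAIL, EMPTY }

    // Roughly what strings does: keep runs of printable ASCII that are at
    // least minLen characters long, one run per output line.
    static String extractAsciiRuns(byte[] data, int minLen) {
        StringBuilder out = new StringBuilder();
        StringBuilder run = new StringBuilder();
        for (byte b : data) {
            if (b >= 0x20 && b <= 0x7e) {
                run.append((char) b);
            } else {
                flush(run, out, minLen);
            }
        }
        flush(run, out, minLen);
        return out.toString();
    }

    // Emit the current run if it is long enough, then reset it.
    private static void flush(StringBuilder run, StringBuilder out, int minLen) {
        if (run.length() >= minLen) {
            out.append(run).append('\n');
        }
        run.setLength(0);
    }

    // Apply the chosen policy to an application/octet-stream payload.
    static String handle(byte[] data, Policy policy) {
        switch (policy) {
            case STRINGS:
                return extractAsciiRuns(data, 4); // assumed minimum run length
            case EMPTY:
                return "";
            case FAIL:
            default:
                throw new IllegalStateException(
                    "no suitable parser for application/octet-stream");
        }
    }
}
```

With something like this, the choice between the options becomes a matter
of configuration, and a custom parser plugged in later would simply take
over from the fallback.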

- Keith


Chris Mattmann wrote:
> 
> Hi Folks,
> 
>  Thinking this through more, it probably makes a lot of sense for the
> Default MIME TYPE in Tika to be application/octet-stream.
> 

-- 
View this message in context: 
http://www.nabble.com/Default-MIME-Type--tf4609978.html#a13185693
Sent from the Apache Tika - Development mailing list archive at Nabble.com.
