Hi Keith,

 The default MIME type in Tika is application/octet-stream. It is set when
the MIME type can't be determined by any of the three main means (URL
resolution, extension resolution, or magic characters). This logic is in the
MimeTypes.java file within the mime package. No parser gets called because
no parser is registered to handle that MIME type.
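
 To make that fallback concrete, here is a rough sketch of the idea. It is
not the actual MimeTypes.java code: the class name, the method names, and
the two lookup tables are all made up for illustration, and it only covers
the extension and magic-character steps.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: NOT Tika's MimeTypes class. All names are hypothetical.
class MimeFallbackSketch {
    static final String DEFAULT_TYPE = "application/octet-stream";

    // Hypothetical extension table: file extension -> MIME type.
    static final Map<String, String> BY_EXTENSION = new HashMap<>();
    // Hypothetical parser registry: MIME type -> parser name.
    static final Map<String, String> PARSERS = new HashMap<>();

    static {
        BY_EXTENSION.put("txt", "text/plain");
        BY_EXTENSION.put("pdf", "application/pdf");
        PARSERS.put("text/plain", "TextParser");
        PARSERS.put("application/pdf", "PdfParser");
    }

    static String detect(String fileName, byte[] header) {
        // 1. Extension resolution.
        int dot = fileName.lastIndexOf('.');
        if (dot >= 0 && dot < fileName.length() - 1) {
            String type = BY_EXTENSION.get(fileName.substring(dot + 1).toLowerCase());
            if (type != null) {
                return type;
            }
        }
        // 2. Magic characters: a PDF file starts with "%PDF".
        if (header.length >= 4 && header[0] == '%' && header[1] == 'P'
                && header[2] == 'D' && header[3] == 'F') {
            return "application/pdf";
        }
        // 3. Nothing matched, so fall back to the default type.
        return DEFAULT_TYPE;
    }

    // Returns null when no parser is registered for the given type.
    static String parserFor(String mimeType) {
        return PARSERS.get(mimeType);
    }

    public static void main(String[] args) {
        // A shell script with no extension and no recognized magic bytes.
        String type = detect("build-script", "#!/bin/sh\n".getBytes());
        System.out.println(type + " -> " + parserFor(type));
    }
}
```

 In this sketch a shell script with no extension falls through both checks,
comes back as application/octet-stream, and the registry lookup returns
null, which mirrors the behavior you saw.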

 Are you suggesting that there is another, more sensible default?

Thanks!

Cheers,
  Chris



On 10/11/07 2:06 PM, "Keith R. Bennett" <[EMAIL PROTECTED]> wrote:

> 
> All -
> 
> I tested Tika with a bunch of miscellaneous text files (shell scripts,
> etc.), and found that an unknown (or nonexistent) extension results in the
> failure to get a parser using ParseUtils.getParser(URL, TikaConfig).  I
> think that means that a MIME type could not be determined from the URL.
> Should an unknown file type default to text/plain and use the text parser?
> 
> Also, I believe there was code added to determine the MIME type from the
> stream of bytes itself, wasn't there?  How would that be used?
> 
> Thanks,
> - Keith

______________________________________________
Chris Mattmann, Ph.D.
[EMAIL PROTECTED]
Cognizant Development Engineer
Early Detection Research Network Project

_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.

