Hi Keith,

The default MIME type in Tika is application/octet-stream. It gets set when the MIME type can't be determined by any of the three main means (URL resolution, extension resolution, or magic chars). This logic is in the MimeTypes.java file within the mime package. No parser gets called because there is no parser registered to handle that MIME type.
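To make the fallback concrete, here is a minimal Java sketch of the behaviour described above. It is not Tika's code: the class name, the method names, and the two lookup tables are hypothetical and exist only for illustration. URL resolution is left out, so only the extension and magic-byte steps are shown.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: none of these names come from Tika's actual API.
class MimeFallbackSketch {
    // Type reported when no detection step produces an answer.
    static final String DEFAULT_TYPE = "application/octet-stream";

    // Illustrative tables only; a real registry would be far larger.
    static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static final Map<String, String> PARSERS = new HashMap<>();
    static {
        BY_EXTENSION.put("txt", "text/plain");
        BY_EXTENSION.put("pdf", "application/pdf");
        PARSERS.put("text/plain", "TextParser");
        PARSERS.put("application/pdf", "PdfParser");
    }

    // Step 1: look at the file extension, if there is one.
    static String byExtension(String name) {
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            return null; // no extension at all
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.get(ext); // null for an unknown extension
    }

    // Step 2: look at the leading bytes ("magic chars").
    static String byMagic(byte[] head) {
        if (head.length >= 4 && head[0] == '%' && head[1] == 'P'
                && head[2] == 'D' && head[3] == 'F') {
            return "application/pdf";
        }
        return null;
    }

    // Try each step in turn; fall back to the default when all fail.
    static String detect(String name, byte[] head) {
        String type = byExtension(name);
        if (type == null) {
            type = byMagic(head);
        }
        return type != null ? type : DEFAULT_TYPE;
    }

    // Returns null when no parser is registered for the type.
    static String parserFor(String mimeType) {
        return PARSERS.get(mimeType);
    }
}
```

In this sketch a shell script named "deploy" with no recognizable leading bytes resolves to application/octet-stream, and parserFor then returns null for that type, which mirrors the missing-parser situation reported in this thread.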
Are you suggesting that there is another, more sensible default?

Thanks!

Cheers,
Chris

On 10/11/07 2:06 PM, "Keith R. Bennett" <[EMAIL PROTECTED]> wrote:

>
> All -
>
> I tested Tika with a bunch of miscellaneous text files (shell scripts,
> etc.), and found that an unknown (or nonexistent) extension results in the
> failure to get a parser using ParseUtils.getParser(URL, TikaConfig). I
> think that means that a MIME type could not be determined from the URL.
> Should an unknown file type default to text/plain and use the text parser?
>
> Also, I believe there was code added to determine the MIME type from the
> stream of bytes itself, wasn't there? How would that be used?
>
> Thanks,
> - Keith

______________________________________________
Chris Mattmann, Ph.D.
[EMAIL PROTECTED]
Cognizant Development Engineer
Early Detection Research Network Project
_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop: 171-246
_______________________________________________________

Disclaimer: The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.
