Thanks for the suggestion. That solves the problem up to a point. I ran some more tests, this time with the MS Office file extensions removed. I get the same inconsistent results as before, even when I wrap the stream in a TikaInputStream. TikaInputStream probably just adds some metadata so that the file extension is included in the detection.
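To illustrate why detection from a bare stream with no file name can stop at the container level, here is a minimal sketch using only the JDK. It is not Tika's detector code. The class name MagicSniff and its sniff methods are made up for this example, and the assumption is that a magic-only check sees just the leading bytes. Legacy .doc/.xls/.ppt files all start with the same OLE2 signature, and .docx/.xlsx/.pptx files all start with the ZIP signature, so telling them apart requires reading inside the container.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Illustration only: a hand-rolled magic-byte check, NOT Tika's implementation.
// MagicSniff and sniff(...) are hypothetical names used for this sketch.
final class MagicSniff {
    // OLE2 compound document signature, shared by legacy .doc/.xls/.ppt files.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature; .docx/.xlsx/.pptx are ZIP archives.
    private static final byte[] ZIP = { 'P', 'K', 0x03, 0x04 };

    // Classify by leading bytes only; this can never tell doc from xls.
    static String sniff(byte[] head) {
        if (startsWith(head, OLE2)) return "OLE2 container";
        if (startsWith(head, ZIP)) return "ZIP container";
        return "unknown";
    }

    // Peek at the head of a stream, then rewind so a parser can still read it.
    static String sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(OLE2.length);
        byte[] head = new byte[OLE2.length];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) break;          // stream shorter than the signature
            n += r;
        }
        in.reset();                    // leave the stream where we found it
        return sniff(Arrays.copyOf(head, n));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Fake file heads: only the signature bytes matter for this check.
        byte[] docHead = Arrays.copyOf(OLE2, 16);
        byte[] docxHead = Arrays.copyOf(ZIP, 16);
        System.out.println(sniff(new ByteArrayInputStream(docHead)));
        System.out.println(sniff(new ByteArrayInputStream(docxHead)));
    }
}
```

The mark/reset step mirrors the requirement in the quoted advice below: the detector has to read from the stream and still leave it available for the parser.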
On Sun, Sep 23, 2012 at 8:37 PM, Nick Burch <[email protected]> wrote:

> On Sun, 23 Sep 2012, naskoo wrote:
>
>> But If Tika's detect(InputStream is) method is used the picture is not the
>> same.
>> The results are:
>> doc - "application/x-tika-msoffice"
>> docx - "application/x-tika-ooxml"
>>
>
> Try wrapping your InputStream as a TikaInputStream - for full container
> detection Tika needs to be able to read the whole file, but still have it
> available for the parser
>
> Nick
>
