Thanks for the suggestion. It solves the problem to some extent.
I ran some more tests, this time with the MS file extensions removed.
I get the same inconsistent results as before, even when I use
TikaInputStream as a wrapper.
Probably TikaInputStream just adds some metadata so that the file
extension is included in the detection.
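
To illustrate why detection from the leading bytes alone may stop at the
container type, here is a rough stdlib-only sketch. It is not Tika's code;
the class name ContainerSniffer, the method sniff and the labels it returns
are made up for this example. The only facts it relies on are the two
well-known signatures: legacy Office files (doc, xls, ppt) all start with
the same OLE2 header, and docx/xlsx/pptx are ZIP archives.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Tika: classifies a stream by its first bytes only.
class ContainerSniffer {
    // OLE2 compound document signature, shared by legacy .doc, .xls and .ppt files.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature, shared by .docx, .xlsx, .pptx and plain .zip files.
    private static final byte[] ZIP = { 'P', 'K', 0x03, 0x04 };

    // Returns "ole2", "zip" or "unknown" (labels invented for this sketch).
    // The stream is reset afterwards so a later parser can still read it from the start.
    static String sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(OLE2.length);
        byte[] head = new byte[OLE2.length];
        int total = 0;
        while (total < head.length) {
            int n = in.read(head, total, head.length - total);
            if (n < 0) {
                break; // stream ended before 8 bytes were available
            }
            total += n;
        }
        in.reset();
        if (startsWith(head, total, OLE2)) {
            return "ole2";
        }
        if (startsWith(head, total, ZIP)) {
            return "zip";
        }
        return "unknown";
    }

    // True if the first 'available' bytes of 'data' begin with 'prefix'.
    private static boolean startsWith(byte[] data, int available, byte[] prefix) {
        if (available < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With a check like this, a doc and an xls are indistinguishable, and so are
a docx and an xlsx. Telling them apart means opening the container and
looking at its entries, which needs more than the first few bytes of the
stream.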

On Sun, Sep 23, 2012 at 8:37 PM, Nick Burch <[email protected]> wrote:

> On Sun, 23 Sep 2012, naskoo wrote:
>
>> But If Tika's detect(InputStream is) method is used the picture is not the
>> same.
>> The results are:
>> doc - "application/x-tika-msoffice"
>> docx - "application/x-tika-ooxml"
>>
>
> Try wrapping your InputStream as a TikaInputStream - for full container
> detection Tika needs to be able to read the whole file, but still have it
> available for the parser
>
> Nick
>
