Can you add the content-disposition header to pass the filename as a hint? See: https://cwiki.apache.org/confluence/display/TIKA/TikaServer
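
For reference, here is a minimal sketch of what that request could look like, written in Python using only the standard library. The helper name `build_unpack_request`, the base URL `http://localhost:9998`, and the file name `slides.pptx` are assumptions for illustration, and the sketch assumes a simple file name with no spaces or quotes. It only builds the request; nothing is sent.

```python
import urllib.request


def build_unpack_request(data: bytes, filename: str,
                         base_url: str = "http://localhost:9998") -> urllib.request.Request:
    """Build (but do not send) a PUT request to /unpack/all with a filename hint.

    Hypothetical helper: base_url is assumed to point at a running tika-server.
    """
    # Keep the sketch simple: refuse names that would need quoting or escaping.
    if not filename or any(c in filename for c in '";\\\r\n\t '):
        raise ValueError("filename must be a simple name without spaces or quotes")

    headers = {
        # The file name (and its extension) is passed as a hint for type detection.
        "Content-Disposition": f"attachment; filename={filename}",
    }
    return urllib.request.Request(
        url=f"{base_url}/unpack/all",
        data=data,
        headers=headers,
        method="PUT",
    )


if __name__ == "__main__":
    # Placeholder bytes; in practice this would be the content of the .pptx file.
    req = build_unpack_request(b"PK\x03\x04", "slides.pptx")
    print(req.get_method(), req.full_url)
    print(req.header_items())
```

Sending it would be a matter of passing the request to `urllib.request.urlopen` against a running server. The header value itself is not Python-specific, so the same `Content-Disposition` string should be usable from a C# HTTP client as well.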
It still feels wrong to me that Tika isn't correctly identifying pptx without the filename hint. I'll take a look tomorrow.

On Sun, Feb 5, 2023 at 10:27 AM שי ברק <[email protected]> wrote:
>
> I have a PowerPoint document that I pass to the /unpack/all endpoint via Postman.
> I've noticed that Postman automatically adds a Content-Type key to the request
> header, which eventually helps Tika detect the type of the document, and the
> response I get back is a proper one (including the text, the metadata and
> the images within the presentation).
> However, when I do the same thing from my C# project, but without adding the
> Content-Type header, I get a different response from Tika,
> which is:
> _rels, docProps and ppt folders.
> I also get an empty text file back.
> While inspecting the metadata file, it seems the wrong parsers have been used.
> How can I fix this?
> Note that I can't add the Content-Type header in my code.
>
>
