Hi Tilman! Thanks for the quick reply. According to my tests, the issue is fixed for docx, odt, pptx, and xlsx, but it is still happening for the doc, ppt, and xls extensions. I will test further and let you know if I find anything, but hopefully that can point you in the right direction.
Regards,
Alvaro

On Mon, Jun 23, 2025 at 11:11 AM Tilman Hausherr <thaush...@t-online.de> wrote:

> Hi,
>
> Please test with the unreleased 3.2.1:
>
> https://dist.apache.org/repos/dist/dev/tika/3.2.1/
>
> https://repository.apache.org/content/repositories/orgapachetika-1115/org/apache/tika
>
> Tilman
>
> On 6/23/2025 11:01 AM, Alvaro Nogueira via user wrote:
>
> ---------- Forwarded message ---------
> From: Alvaro Nogueira <alvaro.nogue...@flywire.com>
> Date: Mon, Jun 23, 2025 at 10:54 AM
> Subject: InputStream consumed by Tika.detect
> To: <user-subscr...@tika.apache.org>
>
> Hello,
> We've been using Tika version 3.1.0 to successfully detect MimeTypes from
> files before uploading them to our S3.
> However, after v3.2.0 upgrade, we've noticed that the original inputStream
> is being consumed entirely for certain file extensions.
> The affected extensions seem to be all for Microsoft files, pointing us to
> the POIFSContainerDetector, which was actually changed for this release.
> This is the list of extensions we've tested with errors: doc, docx, odt,
> ppt, pptx, xls, xlsx
> And these ones work as before: bmp, csv, gif, jpeg, jpg, pdf, png, rtf,
> svg, txt
>
> Here's some code to reproduce the issue:
>
> import java.io.IOException;
> import java.io.InputStream;
>
> import org.apache.tika.Tika;
> import org.apache.tika.io.TikaInputStream;
> import org.apache.tika.metadata.Metadata;
> import org.springframework.core.io.ClassPathResource;
>
> class TikaBugReport {
>
>     // affected extensions: doc, docx, odt, ppt, pptx, xls, xlsx
>     public static void main(String[] args) throws IOException {
>         String fileName = "Test.docx";
>         InputStream inputStream = new ClassPathResource(fileName).getInputStream();
>         checkFileMime(inputStream, fileName);
>     }
>
>     public static void checkFileMime(InputStream inputStream, String fileName) {
>         try {
>             Tika tika = new Tika();
>             System.out.println("InputStream available bytes before processing: " + inputStream.available());
>             System.out.println("InputStream supports mark: " + inputStream.markSupported());
>
>             Metadata metadata = new Metadata();
>
>             TikaInputStream tikaInputStream = TikaInputStream.get(inputStream);
>             System.out.println("Original InputStream available bytes after TikaInputStream.get(): " + inputStream.available());
>
>             String mimeType = tika.detect(tikaInputStream, metadata);
>
>             // Debug: Check state after detection
>             System.out.println("Original InputStream available bytes after tika.detect(): " + inputStream.available());
>             System.out.println("TikaInputStream available bytes after tika.detect(): " + tikaInputStream.available());
>             if (inputStream.available() == 0) {
>                 throw new IllegalStateException("InputStream is empty after TikaInputStream creation");
>             }
>
>         } catch (Exception e) {
>             System.out.printf("Mime check exception for file '%s': [%s]%n", fileName, e.getMessage());
>         }
>     }
> }
>
> --
> Thank you and regards,
>
> Álvaro Nogueira
> Senior Software Engineer
>
> Disclaimer for electronic communications
> <https://www.flywire.com/legal/disclaimer-for-electronic-communications>
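
Until the behaviour is settled for doc, ppt and xls, one possible interim workaround is to never hand the upload stream to the detector at all. The following is only a minimal sketch using plain JDK classes, not something taken from Tika: `StreamGuardSketch`, `Guarded`, `detectWithoutConsuming` and the `greedyDetector` lambda are hypothetical names, and the `Function<InputStream, String>` parameter merely stands in for whatever detection call is used. It assumes Java 9+ (for `readAllBytes`) and files small enough to hold in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

public class StreamGuardSketch {

    /** Detection result plus a fresh, unread stream over the same bytes. */
    public static final class Guarded {
        public final String mimeType;
        public final InputStream freshStream;

        Guarded(String mimeType, InputStream freshStream) {
            this.mimeType = mimeType;
            this.freshStream = freshStream;
        }
    }

    /**
     * Buffers the whole input once, lets the detector read its own copy,
     * and returns a second copy that is guaranteed to be unread.
     * The detector is a hypothetical stand-in for the real detection call.
     */
    public static Guarded detectWithoutConsuming(
            InputStream original,
            Function<InputStream, String> detector) throws IOException {
        byte[] bytes = original.readAllBytes();
        String mimeType = detector.apply(new ByteArrayInputStream(bytes));
        return new Guarded(mimeType, new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "example payload".getBytes(StandardCharsets.UTF_8);

        // Hypothetical detector that drains its input, mimicking the reported symptom.
        Function<InputStream, String> greedyDetector = in -> {
            try {
                in.readAllBytes();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return "application/octet-stream";
        };

        Guarded result = detectWithoutConsuming(
                new ByteArrayInputStream(payload), greedyDetector);

        System.out.println("Detected type: " + result.mimeType);
        System.out.println("Bytes still available for upload: "
                + result.freshStream.available());
    }
}
```

With Tika, the lambda would presumably wrap the detection call and translate its checked `IOException`, as the stand-in does here. The trade-off is memory: the whole file is buffered, so this is only reasonable for modest upload sizes.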