I wanted to post a few findings. This only seems to happen when calling
the Tika.detect variant that accepts an InputStream, and only when that
InputStream is created via getClass().getResourceAsStream(...) for a
zip-compressed resource. The strange part is that it doesn't fail for
all zip resources, only certain ones. For example, I tested with the
"test-documents/testZipEntryNameCharsetShiftSJIS.zip" file included
under the tika parser test resources, and it works fine. The zip file
I'm using locally contains some proprietary content, so I can't pass it
along in its current form for testing, but I will see if there is a way
to strip it down.
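
For reference, here is a JDK-only sketch of how a BufferedInputStream can
end up throwing the "Resetting to invalid mark" exception seen in the stack
trace quoted below. It is not Tika's code, and I am not claiming it is the
cause of this issue. The class and method names (InvalidMarkDemo,
resetAfterReading) and the buffer sizes are made up for illustration. The
idea is simply that reading past the mark's read limit, far enough to force
the buffer to be refilled, invalidates the mark, so a later reset() fails.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// JDK-only illustration; no Tika classes are involved.
class InvalidMarkDemo {

    /**
     * Marks a BufferedInputStream, reads bytesToRead bytes one at a time,
     * then calls reset(). Returns the IOException message if reset() fails,
     * or null if reset() succeeds.
     */
    static String resetAfterReading(int bufferSize, int readLimit, int bytesToRead) {
        byte[] data = new byte[64]; // arbitrary content, only the length matters
        try (InputStream in =
                 new BufferedInputStream(new ByteArrayInputStream(data), bufferSize)) {
            in.mark(readLimit);
            for (int i = 0; i < bytesToRead; i++) {
                in.read();
            }
            in.reset();
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Small read limit, read well past it and past the buffer size.
        System.out.println(resetAfterReading(8, 4, 16));
        // Same stream settings, but stay within the read limit.
        System.out.println(resetAfterReading(8, 4, 2));
        // Read limit large enough to cover everything that is read.
        System.out.println(resetAfterReading(8, 64, 16));
    }
}
```

Only the first call should report the exception message. The other two stay
within the read limit, so reset() succeeds and they return null.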

On Thu, May 29, 2025 at 3:52 PM Tim Allison <talli...@apache.org> wrote:
>
> I opened this: https://issues.apache.org/jira/browse/TIKA-4424
>
>
>
> If anyone wants to join the conversation there and doesn't have a JIRA 
> account, please request one.
>
>
>
> On 2025/05/29 16:11:25 Tim Allison wrote:
>
> > It has been released. I'm running into a challenge trying to create
>
> > the javadocs [0][1] which are required for the web page update, which
>
> > is required for the announcement. I'm not thrilled with where we are
>
> > on this. :/
>
> >
>
> > Any help would be appreciated. I'm sorry this got messed up. Once we
>
> > figure out the javadocs issue, I'll make the announcement and then
>
> > look into the detector regression.
>
> >
>
> > As for email... not clear to me.
>
> >
>
> > [0] https://lists.apache.org/thread/p4pwxcz9tt1wqn82wb4sx6drsrrtpofk
>
> > [1] https://lists.apache.org/thread/n7mjkfcsdf4lvwjtp1l3c77fs0yb5q0h
>
> >
>
> > On Thu, May 29, 2025 at 11:57 AM Joe Wicentowski <joe...@gmail.com> wrote:
>
> > >
>
> > > Hi all,
>
> > >
>
> > > Has 3.2.0 been released? I ask for a few reasons.
>
> > >
>
> > > First, the Tika homepage and download page don't show a 3.2.0 release and 
> > > instead show 3.1.0 as the current release:
>
> > >
>
> > >   https://tika.apache.org/
>
> > >   https://tika.apache.org/download.html
>
> > >
>
> > > However, there are indications that 3.2.0 is out, including the fact that 
> > > Homebrew automatically picked up 3.2.0:
>
> > >
>
> > >   https://github.com/Homebrew/homebrew-core/pull/224764
>
> > >
>
> > > And similarly, Craig's note below says:
>
> > >
>
> > > > ... my dependency management tool made me aware that 3.2.0 was 
> > > > available ...
>
> > >
>
> > > And other projects have picked up the release via Dependabot:
>
> > >
>
> > >   https://github.com/eXist-db/exist/pull/5754
>
> > >
>
> > > I checked here but didn't see a release announcement.
>
> > >
>
> > > On the possibly separate topic of the mailing list, could it be that 
> > > there is an issue with the mailing list? I didn't receive Craig's note 
> > > directly (only quoted in Pontus's post). I don't see Craig's post in the 
> > > list archives either:
>
> > >
>
> > >   https://lists.apache.org/list.html?user@tika.apache.org
>
> > >
>
> > > I hope this info helps! I'd be happy to provide any info if needed.
>
> > >
>
> > > Joe
>
> > >
>
> > > On Wed, May 28, 2025 at 6:39 PM Pontus Amberg <pontus.amb...@gmail.com> 
> > > wrote:
>
> > >>
>
> > >> I have also encountered the same issue in a simple test that tries to 
> > >> identify an
>
> > >> "application/vnd.google-earth.kmz" file. I can work around the "invalid 
> > >> mark" problem
>
> > >> by wrapping the InputStream used in Tika.detect(InputStream stream, 
> > >> String name)
>
> > >> with a TikaInputStream. Sadly, I still have problems with 3.2.0: 
> > >> the test now
>
> > >> fails because the "application/vnd.google-earth.kmz" file is detected as a
>
> > >> plain "application/zip".
>
> > >>
>
> > >> Reverting back to 3.1.0 makes the detection work with a plain InputStream
>
> > >>
>
> > >> /Pontus
>
> > >>
>
> > >> On 2025/05/28 16:33:23 Craig Muchinsky via user wrote:
>
> > >> > I tested using the release, my dependency management tool made me aware
>
> > >> > that 3.2.0 was available so I decided to kick the tires and ran into 
> > >> > this
>
> > >> > issue. I will have to spend some time on a reproduction scenario
>
> > >> >
>
> > >> > On Wed, May 28, 2025 at 1:33 AM Tilman Hausherr <th...@t-online.de>
>
> > >> > wrote:
>
> > >> >
>
> > >> > >
>
> > >> > > Did you test with the release or with the candidate or with an 
> > >> > > earlier
>
> > >> > >
>
> > >> > > build? A bug like you mentioned was fixed just a few days ago. Please
>
> > >> > >
>
> > >> > > share the file and some minimal code.
>
> > >> > >
>
> > >> > > Tilman
>
> > >> > >
>
> > >> > >
>
> > >> > >
>
> > >> > > On 5/28/2025 2:08 AM, Craig Muchinsky via user wrote:
>
> > >> > >
>
> > >> > > > After upgrading to tika 3.2.0, I started seeing the following
>
> > >> > >
>
> > >> > > > exception when attempting to detect the mime type for a given file.
>
> > >> > >
>
> > >> > > > I'm wondering if something in the way input streams are handled has
>
> > >> > >
>
> > >> > > > changed, or if this might be a regression?
>
> > >> > >
>
> > >> > > >
>
> > >> > >
>
> > >> > > > Caused by: java.io.IOException: Resetting to invalid mark
> > >> > > >     at java.base@21.0.7/java.io.BufferedInputStream.implReset(BufferedInputStream.java:583)
> > >> > > >     at java.base@21.0.7/java.io.BufferedInputStream.reset(BufferedInputStream.java:569)
> > >> > > >     at app//org.apache.tika.io.BoundedInputStream.reset(BoundedInputStream.java:115)
> > >> > > >     at app//org.apache.tika.detect.zip.DefaultZipContainerDetector.detectStreaming(DefaultZipContainerDetector.java:279)
> > >> > > >     at app//org.apache.tika.detect.zip.DefaultZipContainerDetector.detect(DefaultZipContainerDetector.java:192)
> > >> > > >     at app//org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
> > >> > > >     at app//org.apache.tika.Tika.detect(Tika.java:160)
> > >> > > >     at app//org.apache.tika.Tika.detect(Tika.java:185)
>
> > >> > >
>
> > >> > >
>
> > >> > >
>
> > >> > >
>
> > >> > >
>
> > >> >
>
> >
