The rfc822 changes are nearly all for the good. This was the only new "bad" detection that I could find: https://corpora.tika.apache.org/base/docs/commoncrawl3/SZ/SZQNRMB4XAYR6N6ULYJVULN3FS32P4E4
The robots.txt changes, though, are nearly all bad. We should revert that change.

On Sat, Oct 14, 2023 at 7:16 AM Tim Allison <[email protected]> wrote:
> Looks like we have a bunch of new
> "org.apache.poi.util.RecordFormatException: Tried to allocate an array of
> length 10,xxx,xxx, but the maximum length for this record type is
> 10,000,000." triggered by:
> org.apache.poi.hslf.usermodel.HSLFSlideShowImpl.readPictures ... I'm not
> sure why the regression tests didn't pick this up.
>
> The changes in rfc822 detection have also had some effects. The few
> handfuls that I've reviewed are actually positive changes. I'll review
> systematically on Monday.
>
> On Sat, Oct 14, 2023 at 6:35 AM Tim Allison <[email protected]> wrote:
>
>> Reports are here:
>> https://corpora.tika.apache.org/base/reports/tika-2.9.1-reports.tgz
>>
>> I haven't had a chance to look at them yet. :( Will take a look early
>> Monday (ET).
>>
>> On Wed, Oct 11, 2023 at 10:24 AM Tim Allison <[email protected]> wrote:
>>
>>> Unless there are objections, I'll kick off the 2.9.1 regression tests
>>> shortly. I just cherry-picked TIKA-4153 into 2.x...will be interesting to
>>> see how that works.
>>>
>>> Best,
>>>
>>> Tim
>>>
>>> On Tue, Oct 10, 2023 at 1:37 PM Tim Allison <[email protected]> wrote:
>>>
>>>> All,
>>>> Nandita's email didn't go through for some reason.
>>>> Seems reasonable to kick off a 2.9.1 release cycle? What do you
>>>> think?
>>>>
>>>> Best,
>>>>
>>>> Tim
>>>>
>>>> *From:* Nandita Mohan
>>>> *Sent:* Monday, October 9, 2023 3:41 PM
>>>> *To:* [email protected]
>>>> *Subject:* Requesting Tika Server release: commons-compress
>>>> vulnerability
>>>>
>>>> Hi there,
>>>>
>>>> I work on a service which needs to upgrade our images due to this
>>>> vulnerability in Apache *commons-compress*: Apache Commons Compress
>>>> denial of service vulnerability · CVE-2023-42503 · GitHub Advisory Database
>>>> <https://github.com/advisories/GHSA-cgwf-w82q-5jrr>
>>>>
>>>> This is due to use of Tika Server 2.9.0 (Apache Tika – Apache Tika 1.27
>>>> <https://tika.apache.org/2.9.0/index.html>), which has
>>>> commons-compress as a dependency. I saw that Tim Allison recently updated
>>>> this* commons-compress* version in the Github mirror repo: TIKA-4123
>>>> -- general updates for 3.0.0-BETA -- upgrade commons-compress ·
>>>> apache/tika@3c88246 (github.com)
>>>> <https://github.com/apache/tika/commit/3c882460838c818ab2aff310d1fba9a084fe4800>
>>>>
>>>> We would greatly appreciate if this could be released to tika-server
>>>> package in the next week , so we can update our images soon from this
>>>> vulnerability.
>>>>
>>>> Thanks,
>>>>
>>>> Nandita Mohan
