Re: [PR] Bump org.testcontainers:testcontainers-bom from 1.19.1 to 1.19.2 [tika]
THausherr merged PR #1449: URL: https://github.com/apache/tika/pull/1449 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] Bump org.testcontainers:testcontainers-bom from 1.19.1 to 1.19.2 [tika]
dependabot[bot] opened a new pull request, #1449:
URL: https://github.com/apache/tika/pull/1449

Bumps org.testcontainers:testcontainers-bom (https://github.com/testcontainers/testcontainers-java) from 1.19.1 to 1.19.2.

Release notes
Sourced from org.testcontainers:testcontainers-bom's releases (https://github.com/testcontainers/testcontainers-java/releases).

1.19.2
Testcontainers for Java 1.19.2

Core
- Add shutdownHook to send sigterm to ryuk (#7717) @eddumelendez
- Deprecate file/volume mapping APIs (#7652) @eddumelendez
- Container definition API (#7714) @eddumelendez
- Enable HTTP and HTTPS on native for HttpWaitStrategy (#7790) @eddumelendez
- Resolve strategy to detect the remote docker socket (#7727) @eddumelendez

Modules
- New Oracle Free module (testcontainers/testcontainers-java#7749) @gvenzl

Elasticsearch
- Support Elasticsearch image from DockerHub (#) @eddumelendez

JDBC
- Fix SQL parser (#7646) @inponomarev

K3S
- Fix K3S start command (#7677) @tgeens

Kafka
- Create KafkaContainerDef (#7748) @eddumelendez
- Add examples enabling SASL with JAAS (#7763) @eddumelendez

LocalStack
- Fix default credentials (#7718) @fokion

YugabyteDB
- Improve SQL wait strategy (#7784) @HarshDaryani896

What's Changed

Documentation
- Introducing Oracle Free module (#7749) @gvenzl
- Update PR template with more specific wording (#7751) @gvenzl
- Fix small typo in new Podman docs (#7722) @TheHaf
- Deprecate file/volume mapping APIs (#7652) @eddumelendez
- Fix link to Toxiproxy docs from Kafka docs (#7684) @alex-sherwin
- Fix documentation for BigQuery in gcloud module (#7681) @zanmagerl
- Update Docker requirements page to be more container runtime agnostic (#7655) @kiview

Dependency updates
- Combined dependencies PR (#7810) @eddumelendez
- Combined dependencies PR (#7809) @eddumelendez
- Combined dependencies PR (#7807) @eddumelendez
- Update docker-java version to 3.3.4 (#7730) @eddumelendez
- Update kubernetes client version to 19.0.0 (#7716) https://
Re: [PR] Bump test.containers.version from 1.19.0 to 1.19.1 [tika]
THausherr merged PR #1379: URL: https://github.com/apache/tika/pull/1379
[PR] Bump test.containers.version from 1.19.0 to 1.19.1 [tika]
dependabot[bot] opened a new pull request, #1379:
URL: https://github.com/apache/tika/pull/1379

Bumps `test.containers.version` from 1.19.0 to 1.19.1.

Updates `org.testcontainers:testcontainers-bom` from 1.19.0 to 1.19.1

Release notes
Sourced from org.testcontainers:testcontainers-bom's releases (https://github.com/testcontainers/testcontainers-java/releases).

1.19.1
Testcontainers for Java 1.19.1

Core
- Allow to define a custom ImagePullPolicy via configuration (#7520) @eddumelendez
- Override ChainedImageNameSubstitutor toString (#7522) @eddumelendez
- Log image pull and container startup time independently (#7455) @eddumelendez

Modules
- New MinIO module (https://java.testcontainers.org/modules/minio/) (#7440) @frozenwizard

Redpanda
- Additional listener should inherit the configured authentication method (#7594) @lburgazzoli

What's Changed
- Migrate examples to junit5 (#7417) @samed-bicer

☠️ Deprecations
- Deprecate CLI utility methods in RabbitMQ module (#7588) @eddumelendez
- Deprecate withSecretInVault (#7576) @eddumelendez

Documentation
- Proposing Update to index.md - Env Settings for Rancher Desktop (#7591) @sunilarjun
- Add docs for copyFile API (#4661) @kiview
- Add section for dependency upgrades in PR template (#7577) @eddumelendez
- [Docs] GCloud: Add BigQuery Client creation (#7528) @fabriciorby
- Add docs to run Testcontainers using Podman (#7447) @eddumelendez

Dependency updates
- Combined dependencies PR (#7587) @eddumelendez
- Update guava version to 32.1.2-jre (#7534) @eddumelendez
- Combined dependencies PR (#7584) @eddumelendez
- Combined dependencies PR (#7519) @eddumelendez
- Combined dependencies PR (#7500) @eddumelendez
- Combined dependencies PR (#7496) @eddumelendez
- Combined dependencies PR (#7494) @eddumelendez

Commits
- dd1427e Add maven central badge in readme file
- 4c83a54 Deprecate CLI utility methods in RabbitMQ module (#7588)
- 4296b5b Additional listeners should inherit the configured authentication method (#7594)
- https://github.com/testcon
[jira] [Updated] (TIKA-3128) MOV file produces RuntimeException with 1.24.1, used to work with earlier version 1.19.1
[ https://issues.apache.org/jira/browse/TIKA-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sameer Apte updated TIKA-3128:
------------------------------
    Summary: MOV file produces RuntimeException with 1.24.1, used to work with earlier version 1.19.1  (was: MOV file produces RuntimeException with 1.24.1, used to work with earlier version)

> MOV file produces RuntimeException with 1.24.1, used to work with earlier version 1.19.1
> -----------------------------------------------------------------------------------------
>
>                 Key: TIKA-3128
>                 URL: https://issues.apache.org/jira/browse/TIKA-3128
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.24.1
>            Reporter: Sameer Apte
>            Priority: Major
>         Attachments: HDSIT_157516.mov
>
>
> Attached _mov_ file produces _RuntimeException_ when parsed with *tika v1.24.1*
> The same _mov_ file can be parsed without any issues with *tika v1.19.1*
> *Tika 1.19.1 stand alone app _SUCCESSFUL_ run*
> {code:java}
> [sapte@sapte-dt tikatest]$ java -jar tika-app-1.19.1.jar -m HDSIT_157516.mov
> Jun 18, 2020 11:25:00 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
> WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
> for optional dependencies.
> Jun 18, 2020 11:25:00 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
> WARNING: org.xerial's sqlite-jdbc is not loaded.
> Please provide the jar on your classpath to parse sqlite files.
> See tika-parsers/pom.xml for the correct version.
> Content-Length: 51066400
> Content-Type: application/mp4
> Creation-Date: 2015-05-18T16:23:25Z
> Last-Modified: 2015-05-18T16:31:09Z
> Last-Save-Date: 2015-05-18T16:31:09Z
> X-Parsed-By: org.apache.tika.parser.DefaultParser
> X-Parsed-By: org.apache.tika.parser.mp4.MP4Parser
> date: 2015-05-18T16:31:09Z
> dcterms:created: 2015-05-18T16:23:25Z
> dcterms:modified: 2015-05-18T16:31:09Z
> meta:creation-date: 2015-05-18T16:23:25Z
> meta:save-date: 2015-05-18T16:31:09Z
> modified: 2015-05-18T16:31:09Z
> resourceName: HDSIT_157516.mov
> tiff:ImageLength: 1080
> tiff:ImageWidth: 1920
> xmpDM:audioSampleRate: 3
> xmpDM:duration: 125.99
> {code}
> *Tika 1.24.1 standalone app _RUNTIMEEXCEPTION_ run*
> {code:java}
> [sapte@sapte-dt tikatest]$ java -jar tika-app-1.24.1.jar -m HDSIT_157516.mov
> Jun 18, 2020 11:24:50 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
> WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
> for optional dependencies.
> Jun 18, 2020 11:24:50 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
> WARNING: org.xerial's sqlite-jdbc is not loaded.
> Please provide the jar on your classpath to parse sqlite files.
> See tika-parsers/pom.xml for the correct version.
> Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.mp4.MP4Parser@23348b5d
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
>     at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
>     at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
> Caused by: java.lang.RuntimeException: box size of zero means 'till end of file. That is not yet supported
>     at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:90)
>     at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
>     at org.mp4parser.boxes.sampleentry.VisualSampleEntry.parse(VisualSampleEntry.java:195)
>     at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
>     at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
>     at org.mp4parser.boxes.iso14496.part12.SampleDescriptionBox.parse(SampleDescriptionBox.java:91)
>     at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
>     at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
>     at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
>     at org.mp4parser.AbstractBoxParser.parseBox(A
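The "box size of zero" message in the trace refers to the header that starts every box (atom) in QuickTime/MP4 files: a 32-bit big-endian size followed by a four-character type. A size of 1 means the real size is stored as a 64-bit value after the type, and a size of 0 means the box runs to the end of the file. The Java sketch below only illustrates that header rule. `BoxHeaderSketch`, `boxType` and `boxEnd` are hypothetical names, this is not the mp4parser or Tika code, and treating "end of file" as the end of the supplied buffer is an assumption made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Tika or mp4parser.
final class BoxHeaderSketch {

    /** Four-character box type stored right after the 32-bit size field. */
    static String boxType(ByteBuffer data, int offset) {
        byte[] type = new byte[4];
        for (int i = 0; i < 4; i++) {
            type[i] = data.get(offset + 4 + i);
        }
        return new String(type, StandardCharsets.ISO_8859_1);
    }

    /** Offset one past the last byte of the box that starts at offset. */
    static long boxEnd(ByteBuffer data, int offset) {
        // ByteBuffer reads big-endian by default, matching the file format.
        long size = data.getInt(offset) & 0xFFFFFFFFL;
        if (size == 0) {
            // Size 0: the box extends to the end of the data
            // (assumed here to be the end of the buffer).
            return data.limit();
        }
        if (size == 1) {
            // Size 1: the real size is a 64-bit value after the type field.
            return offset + data.getLong(offset + 8);
        }
        return offset + size;
    }

    public static void main(String[] args) {
        // Two boxes: a 16-byte "ftyp" box, then an "mdat" box with size 0.
        ByteBuffer data = ByteBuffer.allocate(28);
        data.putInt(16).put("ftyp".getBytes(StandardCharsets.ISO_8859_1)).put(new byte[8]);
        data.putInt(0).put("mdat".getBytes(StandardCharsets.ISO_8859_1)).put(new byte[4]);
        System.out.println(boxType(data, 16) + " ends at " + boxEnd(data, 16));
    }
}
```

A parser that rejects size 0 outright, as the exception text suggests, would fail on any file containing such a box, whereas the sketch resolves it against the enclosing data length.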
[jira] [Resolved] (TIKA-2855) pdfbox version used by both Apache Tika 1.19.1 and 1.20 is vulnerable
[ https://issues.apache.org/jira/browse/TIKA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-2855.
-------------------------------
    Resolution: Duplicate

Thank you!

> pdfbox version used by both Apache Tika 1.19.1 and 1.20 is vulnerable
> ---------------------------------------------------------------------
>
>                 Key: TIKA-2855
>                 URL: https://issues.apache.org/jira/browse/TIKA-2855
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.19.1
>            Reporter: Abhijit Rajwade
>            Priority: Major
>
> As per Sonatype Nexus Auditor, pdfbox versions up to 2.0.14 are vulnerable to
> "CVE-2019-0228: possible XML External Entity (XXE) attack".
> Recommended fix is to upgrade to pdfbox version 2.0.15
> Refer following pdfbox issue
> https://issues.apache.org/jira/browse/PDFBOX-4505
> which is fixed on version 2.0.15
> Can you please upgrade Apache Tika to use pdfbox 2.0.15?
> Following are details from the Sonatype Nexus scan report
> Issue: CVE-2019-0228
> Severity: Sonatype CVSS 3.0: 7.3
> Weakness: Sonatype CWE: 611
> Source: National Vulnerability Database
> Categories: Data
> Description from CVE: apache pdfbox - XML External Entity (XXE)
> Root Cause: pdfbox-2.0.12.jar : ( , 2.0.15)
> Advisories:
> Project: https://github.com/apache/pdfbox-docs/commit/b7869c3e4c62c5d...
> Project: https://issues.apache.org/jira/browse/PDFBOX-4505
> Third Party: https://bugzilla.redhat.com/show_bug.cgi?id=1699740
> CVSS Details:
> Sonatype CVSS 3.0: 7.3
> CVSS Vector: CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:L

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (TIKA-2855) pdfbox version used by both Apache Tika 1.19.1 and 1.20 is vulnerable
Abhijit Rajwade created TIKA-2855:
-------------------------------------

             Summary: pdfbox version used by both Apache Tika 1.19.1 and 1.20 is vulnerable
                 Key: TIKA-2855
                 URL: https://issues.apache.org/jira/browse/TIKA-2855
             Project: Tika
          Issue Type: Bug
          Components: core
    Affects Versions: 1.19.1
            Reporter: Abhijit Rajwade

As per Sonatype Nexus Auditor, pdfbox versions up to 2.0.14 are vulnerable to
"CVE-2019-0228: possible XML External Entity (XXE) attack".
Recommended fix is to upgrade to pdfbox version 2.0.15
Refer following pdfbox issue
https://issues.apache.org/jira/browse/PDFBOX-4505
which is fixed on version 2.0.15
Can you please upgrade Apache Tika to use pdfbox 2.0.15?
Following are details from the Sonatype Nexus scan report
Issue: CVE-2019-0228
Severity: Sonatype CVSS 3.0: 7.3
Weakness: Sonatype CWE: 611
Source: National Vulnerability Database
Categories: Data
Description from CVE: apache pdfbox - XML External Entity (XXE)
Root Cause: pdfbox-2.0.12.jar : ( , 2.0.15)
Advisories:
Project: https://github.com/apache/pdfbox-docs/commit/b7869c3e4c62c5d...
Project: https://issues.apache.org/jira/browse/PDFBOX-4505
Third Party: https://bugzilla.redhat.com/show_bug.cgi?id=1699740
CVSS Details:
Sonatype CVSS 3.0: 7.3
CVSS Vector: CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:L
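For readers unfamiliar with the XXE class of issue (CWE-611) named in the report, the following is a minimal sketch of a commonly used JAXP hardening pattern in Java. It is not the PDFBox 2.0.15 fix and not Tika code; `XxeHardeningSketch`, `hardenedFactory` and `parses` are hypothetical names, and the `disallow-doctype-decl` feature URI is assumed to be supported by the XML parser in use (it is specific to Xerces-derived parsers such as the one bundled with the JDK).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of PDFBox or Tika.
final class XxeHardeningSketch {

    /** Builds a factory that refuses any document carrying a DOCTYPE. */
    static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Assumption: a Xerces-derived parser that understands this feature.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory;
    }

    /** Returns true if the hardened parser accepts the document. */
    static boolean parses(String xml) {
        try {
            hardenedFactory().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String plain = "<a/>";
        String withEntity =
                "<!DOCTYPE a [<!ENTITY x SYSTEM \"file:///etc/passwd\">]><a>&x;</a>";
        System.out.println("plain document accepted: " + parses(plain));
        System.out.println("external entity accepted: " + parses(withEntity));
    }
}
```

Rejecting the DOCTYPE outright is the bluntest option; it is chosen here because it blocks external entities without depending on further parser settings.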
[ANNOUNCE] Apache Tika 1.19.1 released
The Apache Tika project is pleased to announce the release of Apache Tika 1.19.1. The release contents have been pushed out to the main Apache release site and to the Maven Central sync, so the releases should be available as soon as the mirrors get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.

Apache Tika 1.19.1 contains two critical bug fixes to the MP3Parser and the handling of SAX parsing. Details can be found in the changes file:
https://www.apache.org/dist/tika/CHANGES-1.19.1.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use using Maven 2 from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors. When downloading from a mirror site, please remember to verify the downloads using signatures found on the Apache site:
https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community
[RESULT][VOTE] Release Apache Tika 1.19.1 Candidate #2
The vote has passed.

+1 from
Tim Allison
Dave Meikle
Oleg Tikhonov
Thejan Wijesinghe

Thank you, all!

Cheers,
Tim

On Tue, Oct 9, 2018 at 1:11 PM Thejan Wijesinghe wrote:
>
> All tests passed for me on my linux, +1 from me.
>
> On Tue, Oct 9, 2018 at 10:37 PM Tim Allison wrote:
>
> > No problem at all. Thank you, Oleg!
> > On Tue, Oct 9, 2018 at 12:33 PM Oleg Tikhonov wrote:
> > >
> > > sorry.
> > > +1
> > >
> > > On Tue, Oct 9, 2018 at 7:26 PM Tim Allison wrote:
> > >
> > > > Thank you, Dave!
> > > >
> > > > Fellow devs, would anyone else have a chance to vote? We need a third
> > > > for the release. Thank you!
> > > > On Mon, Oct 8, 2018 at 4:36 AM wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > On Thu, 4 Oct 2018 at 23:03, Tim Allison wrote:
> > > > >>
> > > > >> A candidate for the Tika 1.19.1 release is available at:
> > > > >> https://dist.apache.org/repos/dist/dev/tika/
> > > > >>
> > > > >> The release candidate is a zip archive of the sources in:
> > > > >> https://github.com/apache/tika/tree/1.19.1-rc2/
> > > > >>
> > > > >> The SHA-512 checksum of the archive is
> > > > >> 4f89216eb3332288c4839139e4af78395fefb3c03be4a6d41a8c9ffadebf69e1732afced25e7fe3c563fb6ce95726a89bd9924c69ddab8e6875a45eec1564fcb
> > > > >>
> > > > >> In addition, a staged maven repository is available here:
> > > > >> https://repository.apache.org/content/repositories/orgapachetika-1045/org/apache/tika
> > > > >>
> > > > >> Please vote on releasing this package as Apache Tika 1.19.1.
> > > > >>
> > > > >> The vote is open for the next 72 hours and passes if a majority of at
> > > > >> least three +1 Tika PMC votes are cast.
> > > > >>
> > > > >> [ ] +1 Release this package as Apache Tika 1.19.1
> > > > >> [ ] -1 Do not release this package because...
> > > > >
> > > > >
> > > > > +1 from me.
> > > > >
> > > > > Thanks for rolling the release Tim!
> > > > >
> > > > > Cheers,
> > > > > Dave
Re: [VOTE] Release Apache Tika 1.19.1 Candidate #2
All tests passed for me on my linux, +1 from me.

On Tue, Oct 9, 2018 at 10:37 PM Tim Allison wrote:

> No problem at all. Thank you, Oleg!
> On Tue, Oct 9, 2018 at 12:33 PM Oleg Tikhonov wrote:
> >
> > sorry.
> > +1
> >
> > On Tue, Oct 9, 2018 at 7:26 PM Tim Allison wrote:
> >
> > > Thank you, Dave!
> > >
> > > Fellow devs, would anyone else have a chance to vote? We need a third
> > > for the release. Thank you!
> > > On Mon, Oct 8, 2018 at 4:36 AM wrote:
> > > >
> > > > Hello,
> > > >
> > > > On Thu, 4 Oct 2018 at 23:03, Tim Allison wrote:
> > > >>
> > > >> A candidate for the Tika 1.19.1 release is available at:
> > > >> https://dist.apache.org/repos/dist/dev/tika/
> > > >>
> > > >> The release candidate is a zip archive of the sources in:
> > > >> https://github.com/apache/tika/tree/1.19.1-rc2/
> > > >>
> > > >> The SHA-512 checksum of the archive is
> > > >> 4f89216eb3332288c4839139e4af78395fefb3c03be4a6d41a8c9ffadebf69e1732afced25e7fe3c563fb6ce95726a89bd9924c69ddab8e6875a45eec1564fcb
> > > >>
> > > >> In addition, a staged maven repository is available here:
> > > >> https://repository.apache.org/content/repositories/orgapachetika-1045/org/apache/tika
> > > >>
> > > >> Please vote on releasing this package as Apache Tika 1.19.1.
> > > >>
> > > >> The vote is open for the next 72 hours and passes if a majority of at
> > > >> least three +1 Tika PMC votes are cast.
> > > >>
> > > >> [ ] +1 Release this package as Apache Tika 1.19.1
> > > >> [ ] -1 Do not release this package because...
> > > >
> > > >
> > > > +1 from me.
> > > >
> > > > Thanks for rolling the release Tim!
> > > >
> > > > Cheers,
> > > > Dave
Re: [VOTE] Release Apache Tika 1.19.1 Candidate #2
No problem at all. Thank you, Oleg!

On Tue, Oct 9, 2018 at 12:33 PM Oleg Tikhonov wrote:
>
> sorry.
> +1
>
> On Tue, Oct 9, 2018 at 7:26 PM Tim Allison wrote:
>
> > Thank you, Dave!
> >
> > Fellow devs, would anyone else have a chance to vote? We need a third
> > for the release. Thank you!
> > On Mon, Oct 8, 2018 at 4:36 AM wrote:
> > >
> > > Hello,
> > >
> > > On Thu, 4 Oct 2018 at 23:03, Tim Allison wrote:
> > >>
> > >> A candidate for the Tika 1.19.1 release is available at:
> > >> https://dist.apache.org/repos/dist/dev/tika/
> > >>
> > >> The release candidate is a zip archive of the sources in:
> > >> https://github.com/apache/tika/tree/1.19.1-rc2/
> > >>
> > >> The SHA-512 checksum of the archive is
> > >> 4f89216eb3332288c4839139e4af78395fefb3c03be4a6d41a8c9ffadebf69e1732afced25e7fe3c563fb6ce95726a89bd9924c69ddab8e6875a45eec1564fcb
> > >>
> > >> In addition, a staged maven repository is available here:
> > >> https://repository.apache.org/content/repositories/orgapachetika-1045/org/apache/tika
> > >>
> > >> Please vote on releasing this package as Apache Tika 1.19.1.
> > >>
> > >> The vote is open for the next 72 hours and passes if a majority of at
> > >> least three +1 Tika PMC votes are cast.
> > >>
> > >> [ ] +1 Release this package as Apache Tika 1.19.1
> > >> [ ] -1 Do not release this package because...
> > >
> > >
> > > +1 from me.
> > >
> > > Thanks for rolling the release Tim!
> > >
> > > Cheers,
> > > Dave
Re: [VOTE] Release Apache Tika 1.19.1 Candidate #2
sorry.
+1

On Tue, Oct 9, 2018 at 7:26 PM Tim Allison wrote:

> Thank you, Dave!
>
> Fellow devs, would anyone else have a chance to vote? We need a third
> for the release. Thank you!
> On Mon, Oct 8, 2018 at 4:36 AM wrote:
> >
> > Hello,
> >
> > On Thu, 4 Oct 2018 at 23:03, Tim Allison wrote:
> >>
> >> A candidate for the Tika 1.19.1 release is available at:
> >> https://dist.apache.org/repos/dist/dev/tika/
> >>
> >> The release candidate is a zip archive of the sources in:
> >> https://github.com/apache/tika/tree/1.19.1-rc2/
> >>
> >> The SHA-512 checksum of the archive is
> >> 4f89216eb3332288c4839139e4af78395fefb3c03be4a6d41a8c9ffadebf69e1732afced25e7fe3c563fb6ce95726a89bd9924c69ddab8e6875a45eec1564fcb
> >>
> >> In addition, a staged maven repository is available here:
> >> https://repository.apache.org/content/repositories/orgapachetika-1045/org/apache/tika
> >>
> >> Please vote on releasing this package as Apache Tika 1.19.1.
> >>
> >> The vote is open for the next 72 hours and passes if a majority of at
> >> least three +1 Tika PMC votes are cast.
> >>
> >> [ ] +1 Release this package as Apache Tika 1.19.1
> >> [ ] -1 Do not release this package because...
> >
> >
> > +1 from me.
> >
> > Thanks for rolling the release Tim!
> >
> > Cheers,
> > Dave
Re: [VOTE] Release Apache Tika 1.19.1 Candidate #2
Thank you, Dave!

Fellow devs, would anyone else have a chance to vote? We need a third
for the release. Thank you!

On Mon, Oct 8, 2018 at 4:36 AM wrote:
>
> Hello,
>
> On Thu, 4 Oct 2018 at 23:03, Tim Allison wrote:
>>
>> A candidate for the Tika 1.19.1 release is available at:
>> https://dist.apache.org/repos/dist/dev/tika/
>>
>> The release candidate is a zip archive of the sources in:
>> https://github.com/apache/tika/tree/1.19.1-rc2/
>>
>> The SHA-512 checksum of the archive is
>> 4f89216eb3332288c4839139e4af78395fefb3c03be4a6d41a8c9ffadebf69e1732afced25e7fe3c563fb6ce95726a89bd9924c69ddab8e6875a45eec1564fcb
>>
>> In addition, a staged maven repository is available here:
>> https://repository.apache.org/content/repositories/orgapachetika-1045/org/apache/tika
>>
>> Please vote on releasing this package as Apache Tika 1.19.1.
>>
>> The vote is open for the next 72 hours and passes if a majority of at
>> least three +1 Tika PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Tika 1.19.1
>> [ ] -1 Do not release this package because...
>
>
> +1 from me.
>
> Thanks for rolling the release Tim!
>
> Cheers,
> Dave
Re: [VOTE] Release Apache Tika 1.19.1 Candidate #2
I’ll review tonight!

Sent from my iPhone

> On Oct 8, 2018, at 6:40 PM, Tim Allison wrote:
>
> Third +1?
>
>> On Thu, Oct 4, 2018 at 6:03 PM Tim Allison wrote:
>>
>> A candidate for the Tika 1.19.1 release is available at:
>> https://dist.apache.org/repos/dist/dev/tika/
>>
>> The release candidate is a zip archive of the sources in:
>> https://github.com/apache/tika/tree/1.19.1-rc2/
>>
>> The SHA-512 checksum of the archive is
>> 4f89216eb3332288c4839139e4af78395fefb3c03be4a6d41a8c9ffadebf69e1732afced25e7fe3c563fb6ce95726a89bd9924c69ddab8e6875a45eec1564fcb
>>
>> In addition, a staged maven repository is available here:
>> https://repository.apache.org/content/repositories/orgapachetika-1045/org/apache/tika
>>
>> Please vote on releasing this package as Apache Tika 1.19.1.
>>
>> The vote is open for the next 72 hours and passes if a majority of at
>> least three +1 Tika PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Tika 1.19.1
>> [ ] -1 Do not release this package because...
>>
>> Here's my +1.
>>
>> Cheers,
>>
>> Tim
Re: [VOTE] Release Apache Tika 1.19.1 Candidate #2
Third +1?

On Thu, Oct 4, 2018 at 6:03 PM Tim Allison wrote:

> A candidate for the Tika 1.19.1 release is available at:
> https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
> https://github.com/apache/tika/tree/1.19.1-rc2/
>
> The SHA-512 checksum of the archive is
> 4f89216eb3332288c4839139e4af78395fefb3c03be4a6d41a8c9ffadebf69e1732afced25e7fe3c563fb6ce95726a89bd9924c69ddab8e6875a45eec1564fcb
>
> In addition, a staged maven repository is available here:
> https://repository.apache.org/content/repositories/orgapachetika-1045/org/apache/tika
>
> Please vote on releasing this package as Apache Tika 1.19.1.
>
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.19.1
> [ ] -1 Do not release this package because...
>
> Here's my +1.
>
> Cheers,
>
> Tim
Re: [VOTE] Release Apache Tika 1.19.1 Candidate #2
Hello,

On Thu, 4 Oct 2018 at 23:03, Tim Allison wrote:

> A candidate for the Tika 1.19.1 release is available at:
> https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
> https://github.com/apache/tika/tree/1.19.1-rc2/
>
> The SHA-512 checksum of the archive is
> 4f89216eb3332288c4839139e4af78395fefb3c03be4a6d41a8c9ffadebf69e1732afced25e7fe3c563fb6ce95726a89bd9924c69ddab8e6875a45eec1564fcb
>
> In addition, a staged maven repository is available here:
> https://repository.apache.org/content/repositories/orgapachetika-1045/org/apache/tika
>
> Please vote on releasing this package as Apache Tika 1.19.1.
>
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.19.1
> [ ] -1 Do not release this package because...
>

+1 from me.

Thanks for rolling the release Tim!

Cheers,
Dave
[VOTE] Release Apache Tika 1.19.1 Candidate #2
A candidate for the Tika 1.19.1 release is available at:
https://dist.apache.org/repos/dist/dev/tika/

The release candidate is a zip archive of the sources in:
https://github.com/apache/tika/tree/1.19.1-rc2/

The SHA-512 checksum of the archive is
4f89216eb3332288c4839139e4af78395fefb3c03be4a6d41a8c9ffadebf69e1732afced25e7fe3c563fb6ce95726a89bd9924c69ddab8e6875a45eec1564fcb

In addition, a staged maven repository is available here:
https://repository.apache.org/content/repositories/orgapachetika-1045/org/apache/tika

Please vote on releasing this package as Apache Tika 1.19.1.

The vote is open for the next 72 hours and passes if a majority of at
least three +1 Tika PMC votes are cast.

[ ] +1 Release this package as Apache Tika 1.19.1
[ ] -1 Do not release this package because...

Here's my +1.

Cheers,

Tim
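A voter who wants to compare a downloaded archive against the SHA-512 value quoted in the vote email could do so with a few lines of Java. This is a minimal sketch using only `java.security.MessageDigest`; `Sha512Check` and `sha512Hex` are hypothetical names, the archive path and expected digest are passed on the command line, and it does not replace checking the release signatures.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of the Tika release tooling.
final class Sha512Check {

    /** Returns the lowercase hex SHA-512 digest of the given bytes. */
    static String sha512Hex(byte[] content) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest(content)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Sha512Check.java <archive> <expected-sha512-hex>");
            return;
        }
        // Reads the whole archive into memory, which is acceptable for a
        // source zip; a streaming loop would suit much larger files.
        byte[] content = Files.readAllBytes(Paths.get(args[0]));
        boolean ok = sha512Hex(content).equalsIgnoreCase(args[1].trim());
        System.out.println(ok ? "checksum OK" : "checksum MISMATCH");
    }
}
```

The comparison ignores case and surrounding whitespace because published digests are often copied from an email or a `.sha512` file with inconsistent formatting.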
[CANCEL][VOTE] Release Apache Tika 1.19.1 Candidate #1
All,

The release process for PDFBox 2.0.12 should start today[1]. That has some important updates. Let's cancel RC1, and I'll roll RC2 as soon as PDFBox 2.0.12 hits maven. I've already run regression tests on PDFs with 2.0.12-SNAPSHOT, so we'll be good to move quickly.

This is my official -1 on RC1.

Best,

Tim

[1] https://lists.apache.org/thread.html/3992edce61194ca79b37044d5f43597f1e240a1d022bbc3569e3a517@%3Cdev.pdfbox.apache.org%3E

On Thu, Sep 27, 2018 at 7:48 AM wrote:
>
>
> On Wed, 26 Sep 2018 at 20:20, Tim Allison wrote:
>>
>> A candidate for the Tika 1.19.1 release is available at:
>> https://dist.apache.org/repos/dist/dev/tika/
>>
>> The release candidate is a zip archive of the sources in:
>> https://github.com/apache/tika/tree/1.19.1-rc1/
>>
>> The SHA-512 checksum of the archive is
>> 88c79c106d78983effc9b41147b46b3722cb7afb8c847d340d3504f56488b8a7267fd634efe638afd2a2c52419fe6b84249ac6e641d5c8c5e6e4795f004b9a45
>>
>> In addition, a staged maven repository is available here:
>> https://repository.apache.org/content/repositories/orgapachetika-1044/org/apache/tika
>>
>> Please vote on releasing this package as Apache Tika 1.19.1.
>>
>> The vote is open for the next 72 hours and passes if a majority of at
>> least three +1 Tika PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Tika 1.19.1
>> [ ] -1 Do not release this package because...
>
>
> +1 - Checksum OK, Signatures OK (although need to get you some trust, Tim)
> and test results looked good.
>
> I noticed a minor issue on a clean Ubuntu 18.04 with the Python rotation
> script when I didn't have python-tk installed the rotation script fails and
> thus the build. I've got a patch for the check so it looks for this but
> don't think it is worth stopping this RC for, so will fire it in JIRA.
>
> Thanks for rolling this RC.
>
> Cheers,
> Dave
Re: [VOTE] Release Apache Tika 1.19.1 Candidate #1
On Wed, 26 Sep 2018 at 20:20, Tim Allison wrote:
> A candidate for the Tika 1.19.1 release is available at:
> https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
> https://github.com/apache/tika/tree/1.19.1-rc1/
>
> The SHA-512 checksum of the archive is
> 88c79c106d78983effc9b41147b46b3722cb7afb8c847d340d3504f56488b8a7267fd634efe638afd2a2c52419fe6b84249ac6e641d5c8c5e6e4795f004b9a45
>
> In addition, a staged maven repository is available here:
> https://repository.apache.org/content/repositories/orgapachetika-1044/org/apache/tika
>
> Please vote on releasing this package as Apache Tika 1.19.1.
>
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.19.1
> [ ] -1 Do not release this package because...

+1 - Checksum OK, Signatures OK (although need to get you some trust, Tim) and test results looked good.

I noticed a minor issue on a clean Ubuntu 18.04 with the Python rotation script when I didn't have python-tk installed the rotation script fails and thus the build. I've got a patch for the check so it looks for this but don't think it is worth stopping this RC for, so will fire it in JIRA.

Thanks for rolling this RC.

Cheers,
Dave
Re: [VOTE] Release Apache Tika 1.19.1 Candidate #1
Side note: I didn't run the full regression testing, but I did run against the ~10k mp3 files in our corpus and found the same 4 exceptions.

On Wed, Sep 26, 2018 at 3:20 PM Tim Allison wrote:
> A candidate for the Tika 1.19.1 release is available at:
> https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
> https://github.com/apache/tika/tree/1.19.1-rc1/
>
> The SHA-512 checksum of the archive is
> 88c79c106d78983effc9b41147b46b3722cb7afb8c847d340d3504f56488b8a7267fd634efe638afd2a2c52419fe6b84249ac6e641d5c8c5e6e4795f004b9a45
>
> In addition, a staged maven repository is available here:
> https://repository.apache.org/content/repositories/orgapachetika-1044/org/apache/tika
>
> Please vote on releasing this package as Apache Tika 1.19.1.
>
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.19.1
> [ ] -1 Do not release this package because...
>
> Here's my +1.
>
> Cheers,
>
> Tim
[VOTE] Release Apache Tika 1.19.1 Candidate #1
A candidate for the Tika 1.19.1 release is available at:
https://dist.apache.org/repos/dist/dev/tika/

The release candidate is a zip archive of the sources in:
https://github.com/apache/tika/tree/1.19.1-rc1/

The SHA-512 checksum of the archive is
88c79c106d78983effc9b41147b46b3722cb7afb8c847d340d3504f56488b8a7267fd634efe638afd2a2c52419fe6b84249ac6e641d5c8c5e6e4795f004b9a45

In addition, a staged maven repository is available here:
https://repository.apache.org/content/repositories/orgapachetika-1044/org/apache/tika

Please vote on releasing this package as Apache Tika 1.19.1.
The vote is open for the next 72 hours and passes if a majority of at least three +1 Tika PMC votes are cast.

[ ] +1 Release this package as Apache Tika 1.19.1
[ ] -1 Do not release this package because...

Here's my +1.

Cheers,

Tim
Re: 1.19.1?
Sounds great!

From: Tim Allison
Reply-To: "dev@tika.apache.org"
Date: Tuesday, September 25, 2018 at 9:40 AM
To: "dev@tika.apache.org"
Subject: Re: 1.19.1?

Given the mp3 issue and some other items, let's go with 1.19.1 rc1 today or tomorrow?

On Mon, Sep 24, 2018 at 3:07 PM Nick Burch wrote:

On Mon, 24 Sep 2018, Tim Allison wrote:
> Aside from the problem with users and non-standard XML parsers, were
> there any other show-stoppers in POI 4.0.0? Is there a reason to wait
> for POI 4.0.1?

I think, in terms of Tika affecting bugs, it was the xml parser stuff, and commons compress missing from the pom.

Nick
Re: 1.19.1?
Given the mp3 issue and some other items, let's go with 1.19.1 rc1 today or tomorrow?

On Mon, Sep 24, 2018 at 3:07 PM Nick Burch wrote:
> On Mon, 24 Sep 2018, Tim Allison wrote:
> > Aside from the problem with users and non-standard XML parsers, were
> > there any other show-stoppers in POI 4.0.0? Is there a reason to wait
> > for POI 4.0.1?
>
> I think, in terms of Tika affecting bugs, it was the xml parser stuff, and
> commons compress missing from the pom.
>
> Nick
Re: 1.19.1?
On Mon, 24 Sep 2018, Tim Allison wrote:
> Aside from the problem with users and non-standard XML parsers, were
> there any other show-stoppers in POI 4.0.0? Is there a reason to wait
> for POI 4.0.1?

I think, in terms of Tika affecting bugs, it was the xml parser stuff, and commons compress missing from the pom.

Nick
Re: 1.19.1?
Nick,

Aside from the problem with users and non-standard XML parsers, were there any other show-stoppers in POI 4.0.0? Is there a reason to wait for POI 4.0.1?

On Fri, Sep 21, 2018 at 12:48 PM Chris Mattmann wrote:
> Let’s roll it….
>
> From: Tim Allison
> Reply-To: "dev@tika.apache.org"
> Date: Wednesday, September 19, 2018 at 12:14 PM
> To: "dev@tika.apache.org"
> Subject: 1.19.1?
>
> The mp3 regression is bad. In hindsight, the Tika-eval reports were fairly
> clear on this but I did some self-hand-waving to excuse away the
> numbers...I shouldn’t have.
>
> I want to add some new reports to tika-eval so that this never happens
> again.
>
> How long should we wait for 1.19.1 or 1.20?
>
> Best,
>
> Tim
>
> On Wed, Sep 19, 2018 at 2:29 PM Hudson (JIRA) wrote:
>
> [ https://issues.apache.org/jira/browse/TIKA-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621008#comment-16621008 ]
>
> Hudson commented on TIKA-2730:
> --
>
> SUCCESS: Integrated in Jenkins build tika-branch-1x #94 (See [ https://builds.apache.org/job/tika-branch-1x/94/])
> TIKA-2730 -- allow last frame to be truncated w/o throwing an EOF (tallison: [ https://github.com/apache/tika/commit/80cfd6d4a4270f8f3697c6dc083b3dedfc36c86a ])
> * (edit) tika-parsers/src/main/java/org/apache/tika/parser/mp3/MpegStream.java
> * (edit) tika-parsers/src/test/java/org/apache/tika/parser/mp3/Mp3ParserTest.java
> * (add) tika-parsers/src/test/resources/test-documents/testMP3i18n_truncated.mp3
> * (edit) tika-parsers/src/main/java/org/apache/tika/parser/mp3/Mp3Parser.java
>
> > parseToString fails for a simple mp3
> >
> > Key: TIKA-2730
> > URL: https://issues.apache.org/jira/browse/TIKA-2730
> > Project: Tika
> > Issue Type: Bug
> > Affects Versions: 1.19
> > Reporter: Boris Petrov
> > Assignee: Tim Allison
> > Priority: Major
> > Fix For: 2.0.0, 1.20
> > Attachments: demo.mp3
> >
> > This is a regression from 1.18. I've attached the mp3 that fails. The exception I get is:
> > {noformat}
> > org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.mp3.Mp3Parser@cefe6c6
> > at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
> > at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> > at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> > at org.apache.tika.Tika.parseToString(Tika.java:527)
> > at com.company.TextExtractor.getText(TextExtractor.java:39)
> > Caused by:
> > java.io.EOFException: EOF: tried to skip 361 but could only skip 247
> > at org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:166)
> > at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:204)
> > at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
> > at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> > ... 5 more{noformat}
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
Re: 1.19.1?
Let’s roll it….

From: Tim Allison
Reply-To: "dev@tika.apache.org"
Date: Wednesday, September 19, 2018 at 12:14 PM
To: "dev@tika.apache.org"
Subject: 1.19.1?

The mp3 regression is bad. In hindsight, the Tika-eval reports were fairly clear on this but I did some self-hand-waving to excuse away the numbers...I shouldn’t have.

I want to add some new reports to tika-eval so that this never happens again.

How long should we wait for 1.19.1 or 1.20?

Best,

Tim

On Wed, Sep 19, 2018 at 2:29 PM Hudson (JIRA) wrote:

[ https://issues.apache.org/jira/browse/TIKA-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621008#comment-16621008 ]

Hudson commented on TIKA-2730:
--

SUCCESS: Integrated in Jenkins build tika-branch-1x #94 (See [ https://builds.apache.org/job/tika-branch-1x/94/])
TIKA-2730 -- allow last frame to be truncated w/o throwing an EOF (tallison: [ https://github.com/apache/tika/commit/80cfd6d4a4270f8f3697c6dc083b3dedfc36c86a ])
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/mp3/MpegStream.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/mp3/Mp3ParserTest.java
* (add) tika-parsers/src/test/resources/test-documents/testMP3i18n_truncated.mp3
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/mp3/Mp3Parser.java

> parseToString fails for a simple mp3
>
> Key: TIKA-2730
> URL: https://issues.apache.org/jira/browse/TIKA-2730
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.19
> Reporter: Boris Petrov
> Assignee: Tim Allison
> Priority: Major
> Fix For: 2.0.0, 1.20
> Attachments: demo.mp3
>
> This is a regression from 1.18. I've attached the mp3 that fails. The exception I get is:
> {noformat}
> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.mp3.Mp3Parser@cefe6c6
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> at org.apache.tika.Tika.parseToString(Tika.java:527)
> at com.company.TextExtractor.getText(TextExtractor.java:39)
> Caused by:
> java.io.EOFException: EOF: tried to skip 361 but could only skip 247
> at org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:166)
> at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:204)
> at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ... 5 more{noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
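The EOFException in the trace comes from a skip that insists on the full frame length (361 bytes requested, 247 available). One way to tolerate a truncated final frame is to report how many bytes were actually skipped and let the caller decide, instead of throwing. The sketch below only illustrates that idea; it is not the code from the TIKA-2730 commit, and the names TolerantSkip and skipUpTo are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration, not the actual MpegStream change.
public class TolerantSkip {

    /**
     * Tries to skip {@code count} bytes and returns the number actually skipped.
     * The result is smaller than {@code count} only if the stream ended early.
     */
    static long skipUpTo(InputStream in, long count) throws IOException {
        long skipped = 0;
        while (skipped < count) {
            long n = in.skip(count - skipped);
            if (n > 0) {
                skipped += n;
                continue;
            }
            // skip() may return 0 without being at end of stream,
            // so probe with a single-byte read before giving up.
            if (in.read() == -1) {
                break;
            }
            skipped++;
        }
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        // Simulate the case from the stack trace: the frame header asks for
        // 361 bytes, but only 247 bytes remain in the file.
        InputStream in = new ByteArrayInputStream(new byte[247]);
        long skipped = skipUpTo(in, 361);
        // prints "requested 361, skipped 247"
        System.out.println("requested 361, skipped " + skipped);
    }
}
```

A caller can then treat a short skip on the last frame as the end of the audio data, while still handling a short skip elsewhere in the file as an error if it chooses to.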
Re: 1.19.1?
Y, and I think I duplicated that bug when I copied/pasted from POI to Tika, so that's a good reminder to fix that in Tika asap as well as potentially wait for POI 4.0.1.

Thank you!

On Wed, Sep 19, 2018 at 4:53 PM Nick Burch wrote:
> On Wed, 19 Sep 2018, Tim Allison wrote:
> > The mp3 regression is bad. In hindsight, the Tika-eval reports were
> > fairly clear on this but I did some self-hand-waving to excuse away the
> > numbers...I shouldn’t have.
> >
> > I want to add some new reports to tika-eval so that this never happens
> > again.
> >
> > How long should we wait for 1.19.1 or 1.20?
>
> There's a POI xml bug on certain older platforms (POI tries too hard to
> lock down the xml settings even if the xml parser doesn't do that...),
> maybe worth trying to get a POI 4.0.1 out, then do a Tika 1.19.1 or 1.20
> (depending on how many other bugs we spot in the POI wait!)
>
> Nick
Re: 1.19.1?
On Wed, 19 Sep 2018, Tim Allison wrote:
> The mp3 regression is bad. In hindsight, the Tika-eval reports were
> fairly clear on this but I did some self-hand-waving to excuse away the
> numbers...I shouldn’t have.
>
> I want to add some new reports to tika-eval so that this never happens
> again.
>
> How long should we wait for 1.19.1 or 1.20?

There's a POI xml bug on certain older platforms (POI tries too hard to lock down the xml settings even if the xml parser doesn't do that...), maybe worth trying to get a POI 4.0.1 out, then do a Tika 1.19.1 or 1.20 (depending on how many other bugs we spot in the POI wait!)

Nick
1.19.1?
The mp3 regression is bad. In hindsight, the Tika-eval reports were fairly clear on this but I did some self-hand-waving to excuse away the numbers...I shouldn’t have.

I want to add some new reports to tika-eval so that this never happens again.

How long should we wait for 1.19.1 or 1.20?

Best,

Tim

On Wed, Sep 19, 2018 at 2:29 PM Hudson (JIRA) wrote:
> [ https://issues.apache.org/jira/browse/TIKA-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621008#comment-16621008 ]
>
> Hudson commented on TIKA-2730:
> --
>
> SUCCESS: Integrated in Jenkins build tika-branch-1x #94 (See [ https://builds.apache.org/job/tika-branch-1x/94/])
> TIKA-2730 -- allow last frame to be truncated w/o throwing an EOF (tallison: [ https://github.com/apache/tika/commit/80cfd6d4a4270f8f3697c6dc083b3dedfc36c86a ])
> * (edit) tika-parsers/src/main/java/org/apache/tika/parser/mp3/MpegStream.java
> * (edit) tika-parsers/src/test/java/org/apache/tika/parser/mp3/Mp3ParserTest.java
> * (add) tika-parsers/src/test/resources/test-documents/testMP3i18n_truncated.mp3
> * (edit) tika-parsers/src/main/java/org/apache/tika/parser/mp3/Mp3Parser.java
>
> > parseToString fails for a simple mp3
> >
> > Key: TIKA-2730
> > URL: https://issues.apache.org/jira/browse/TIKA-2730
> > Project: Tika
> > Issue Type: Bug
> > Affects Versions: 1.19
> > Reporter: Boris Petrov
> > Assignee: Tim Allison
> > Priority: Major
> > Fix For: 2.0.0, 1.20
> > Attachments: demo.mp3
> >
> > This is a regression from 1.18. I've attached the mp3 that fails. The exception I get is:
> > {noformat}
> > org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.mp3.Mp3Parser@cefe6c6
> > at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
> > at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> > at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> > at org.apache.tika.Tika.parseToString(Tika.java:527)
> > at com.company.TextExtractor.getText(TextExtractor.java:39)
> > Caused by:
> > java.io.EOFException: EOF: tried to skip 361 but could only skip 247
> > at org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:166)
> > at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:204)
> > at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
> > at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> > ... 5 more{noformat}
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)