Re: [PR] Bump org.testcontainers:testcontainers-bom from 1.19.1 to 1.19.2 [tika]

2023-11-14 Thread via GitHub


THausherr merged PR #1449:
URL: https://github.com/apache/tika/pull/1449


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] Bump org.testcontainers:testcontainers-bom from 1.19.1 to 1.19.2 [tika]

2023-11-14 Thread via GitHub


dependabot[bot] opened a new pull request, #1449:
URL: https://github.com/apache/tika/pull/1449

   Bumps [org.testcontainers:testcontainers-bom](https://github.com/testcontainers/testcontainers-java) from 1.19.1 to 1.19.2.
   
   Release notes
   Sourced from [org.testcontainers:testcontainers-bom's releases](https://github.com/testcontainers/testcontainers-java/releases).
   
   1.19.2
   Testcontainers for Java 1.19.2
   Core
   
   Add shutdownHook to send sigterm to ryuk ([#7717](https://redirect.github.com/testcontainers/testcontainers-java/issues/7717)) [@eddumelendez](https://github.com/eddumelendez)
   Deprecate file/volume mapping APIs ([#7652](https://redirect.github.com/testcontainers/testcontainers-java/issues/7652)) [@eddumelendez](https://github.com/eddumelendez)
   Container definition API ([#7714](https://redirect.github.com/testcontainers/testcontainers-java/issues/7714)) [@eddumelendez](https://github.com/eddumelendez)
   Enable HTTP and HTTPS on native for HttpWaitStrategy ([#7790](https://redirect.github.com/testcontainers/testcontainers-java/issues/7790)) [@eddumelendez](https://github.com/eddumelendez)
   Resolve strategy to detect the remote docker socket ([#7727](https://redirect.github.com/testcontainers/testcontainers-java/issues/7727)) [@eddumelendez](https://github.com/eddumelendez)
   
   Modules
   
   New Oracle Free module ([testcontainers/testcontainers-java#7749](https://redirect.github.com/testcontainers/testcontainers-java/pull/7749)) [@gvenzl](https://github.com/gvenzl)
   
   Elasticsearch
   
   Support Elasticsearch image from DockerHub ([#](https://redirect.github.com/testcontainers/testcontainers-java/issues/)) [@eddumelendez](https://github.com/eddumelendez)
   
   JDBC
   
   Fix SQL parser ([#7646](https://redirect.github.com/testcontainers/testcontainers-java/issues/7646)) [@inponomarev](https://github.com/inponomarev)
   
   K3S
   
   Fix K3S start command ([#7677](https://redirect.github.com/testcontainers/testcontainers-java/issues/7677)) [@tgeens](https://github.com/tgeens)
   
   Kafka
   
   Create KafkaContainerDef ([#7748](https://redirect.github.com/testcontainers/testcontainers-java/issues/7748)) [@eddumelendez](https://github.com/eddumelendez)
   Add examples enabling SASL with JAAS ([#7763](https://redirect.github.com/testcontainers/testcontainers-java/issues/7763)) [@eddumelendez](https://github.com/eddumelendez)
   
   LocalStack
   
   Fix default credentials ([#7718](https://redirect.github.com/testcontainers/testcontainers-java/issues/7718)) [@fokion](https://github.com/fokion)
   
   YugabyteDB
   
   Improve SQL wait strategy ([#7784](https://redirect.github.com/testcontainers/testcontainers-java/issues/7784)) [@HarshDaryani896](https://github.com/HarshDaryani896)
   
   What's Changed
   Documentation
   
   Introducing Oracle Free module ([#7749](https://redirect.github.com/testcontainers/testcontainers-java/issues/7749)) [@gvenzl](https://github.com/gvenzl)
   Update PR template with more specific wording ([#7751](https://redirect.github.com/testcontainers/testcontainers-java/issues/7751)) [@gvenzl](https://github.com/gvenzl)
   Fix small typo in new Podman docs ([#7722](https://redirect.github.com/testcontainers/testcontainers-java/issues/7722)) [@TheHaf](https://github.com/TheHaf)
   Deprecate file/volume mapping APIs ([#7652](https://redirect.github.com/testcontainers/testcontainers-java/issues/7652)) [@eddumelendez](https://github.com/eddumelendez)
   Fix link to Toxiproxy docs from Kafka docs ([#7684](https://redirect.github.com/testcontainers/testcontainers-java/issues/7684)) [@alex-sherwin](https://github.com/alex-sherwin)
   Fix documentation for BigQuery in gcloud module ([#7681](https://redirect.github.com/testcontainers/testcontainers-java/issues/7681)) [@zanmagerl](https://github.com/zanmagerl)
   Update Docker requirements page to be more container runtime agnostic ([#7655](https://redirect.github.com/testcontainers/testcontainers-java/issues/7655)) [@kiview](https://github.com/kiview)
   
   Dependency updates
   
   Combined dependencies PR ([#7810](https://redirect.github.com/testcontainers/testcontainers-java/issues/7810)) [@eddumelendez](https://github.com/eddumelendez)
   Combined dependencies PR ([#7809](https://redirect.github.com/testcontainers/testcontainers-java/issues/7809)) [@eddumelendez](https://github.com/eddumelendez)
   Combined dependencies PR ([#7807](https://redirect.github.com/testcontainers/testcontainers-java/issues/7807)) [@eddumelendez](https://github.com/eddumelendez)
   Update docker-java version to 3.3.4 ([#7730](https://redirect.github.com/testcontainers/testcontainers-java/issues/7730)) [@eddumelendez](https://github.com/eddumelendez)
   Update kubernetes client version to 19.0.0 ([#7716](https://redirect.github.com/testcontainers/testcontainers-java/issues/7716))
 https://

Re: [PR] Bump test.containers.version from 1.19.0 to 1.19.1 [tika]

2023-10-03 Thread via GitHub


THausherr merged PR #1379:
URL: https://github.com/apache/tika/pull/1379





[PR] Bump test.containers.version from 1.19.0 to 1.19.1 [tika]

2023-10-02 Thread via GitHub


dependabot[bot] opened a new pull request, #1379:
URL: https://github.com/apache/tika/pull/1379

   Bumps `test.containers.version` from 1.19.0 to 1.19.1.
   Updates `org.testcontainers:testcontainers-bom` from 1.19.0 to 1.19.1
   
   Release notes
   Sourced from [org.testcontainers:testcontainers-bom's releases](https://github.com/testcontainers/testcontainers-java/releases).
   
   1.19.1
   Testcontainers for Java 1.19.1
   Core
   
   Allow to define a custom ImagePullPolicy via configuration ([#7520](https://redirect.github.com/testcontainers/testcontainers-java/issues/7520)) [@eddumelendez](https://github.com/eddumelendez)
   Override ChainedImageNameSubstitutor toString ([#7522](https://redirect.github.com/testcontainers/testcontainers-java/issues/7522)) [@eddumelendez](https://github.com/eddumelendez)
   Log image pull and container startup time independently ([#7455](https://redirect.github.com/testcontainers/testcontainers-java/issues/7455)) [@eddumelendez](https://github.com/eddumelendez)
   
   Modules
   
   New [MinIO](https://java.testcontainers.org/modules/minio/) module ([#7440](https://redirect.github.com/testcontainers/testcontainers-java/issues/7440)) [@frozenwizard](https://github.com/frozenwizard)
   
   Redpanda
   
   Additional listener should inherit the configured authentication method ([#7594](https://redirect.github.com/testcontainers/testcontainers-java/issues/7594)) [@lburgazzoli](https://github.com/lburgazzoli)
   
   What's Changed
   
   Migrate examples to junit5 ([#7417](https://redirect.github.com/testcontainers/testcontainers-java/issues/7417)) [@samed-bicer](https://github.com/samed-bicer)
   
   ☠️ Deprecations
   
   Deprecate CLI utility methods in RabbitMQ module ([#7588](https://redirect.github.com/testcontainers/testcontainers-java/issues/7588)) [@eddumelendez](https://github.com/eddumelendez)
   Deprecate withSecretInVault ([#7576](https://redirect.github.com/testcontainers/testcontainers-java/issues/7576)) [@eddumelendez](https://github.com/eddumelendez)
   
   Documentation
   
   Proposing Update to index.md - Env Settings for Rancher Desktop ([#7591](https://redirect.github.com/testcontainers/testcontainers-java/issues/7591)) [@sunilarjun](https://github.com/sunilarjun)
   Add docs for copyFile API ([#4661](https://redirect.github.com/testcontainers/testcontainers-java/issues/4661)) [@kiview](https://github.com/kiview)
   Add section for dependency upgrades in PR template ([#7577](https://redirect.github.com/testcontainers/testcontainers-java/issues/7577)) [@eddumelendez](https://github.com/eddumelendez)
   [Docs] GCloud: Add BigQuery Client creation ([#7528](https://redirect.github.com/testcontainers/testcontainers-java/issues/7528)) [@fabriciorby](https://github.com/fabriciorby)
   Add docs to run Testcontainers using Podman ([#7447](https://redirect.github.com/testcontainers/testcontainers-java/issues/7447)) [@eddumelendez](https://github.com/eddumelendez)
   
   Dependency updates
   
   Combined dependencies PR ([#7587](https://redirect.github.com/testcontainers/testcontainers-java/issues/7587)) [@eddumelendez](https://github.com/eddumelendez)
   Update guava version to 32.1.2-jre ([#7534](https://redirect.github.com/testcontainers/testcontainers-java/issues/7534)) [@eddumelendez](https://github.com/eddumelendez)
   Combined dependencies PR ([#7584](https://redirect.github.com/testcontainers/testcontainers-java/issues/7584)) [@eddumelendez](https://github.com/eddumelendez)
   Combined dependencies PR ([#7519](https://redirect.github.com/testcontainers/testcontainers-java/issues/7519)) [@eddumelendez](https://github.com/eddumelendez)
   Combined dependencies PR ([#7500](https://redirect.github.com/testcontainers/testcontainers-java/issues/7500)) [@eddumelendez](https://github.com/eddumelendez)
   Combined dependencies PR ([#7496](https://redirect.github.com/testcontainers/testcontainers-java/issues/7496)) [@eddumelendez](https://github.com/eddumelendez)
   Combined dependencies PR ([#7494](https://redirect.github.com/testcontainers/testcontainers-java/issues/7494)) [@eddumelendez](https://github.com/eddumelendez)
   
   
   
   
   
   Commits
   
   [dd1427e](https://github.com/testcontainers/testcontainers-java/commit/dd1427ebd30bbaba7f32184f1376b7c21e725ab5) Add maven central badge in readme file
   [4c83a54](https://github.com/testcontainers/testcontainers-java/commit/4c83a54331a1de3c4c5a71a6a1e2b617cd704dcb) Deprecate CLI utility methods in RabbitMQ module ([#7588](https://redirect.github.com/testcontainers/testcontainers-java/issues/7588))
   [4296b5b](https://github.com/testcontainers/testcontainers-java/commit/4296b5beb6cb7e072f63a91b1b70f352e80f3ad9) Additional listeners should inherit the configured authentication method ([#7594](https://redirect.github.com/testcontainers/testcontainers-java/issues/7594))
   https://github.com/testcon

[jira] [Updated] (TIKA-3128) MOV file produces RuntimeException with 1.24.1, used to work with earlier version 1.19.1

2020-07-07 Thread Sameer Apte (Jira)


 [ https://issues.apache.org/jira/browse/TIKA-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sameer Apte updated TIKA-3128:
------------------------------
Summary: MOV file produces RuntimeException with 1.24.1, used to work with 
earlier version 1.19.1  (was: MOV file produces RuntimeException with 1.24.1, 
used to work with earlier version)

> MOV file produces RuntimeException with 1.24.1, used to work with earlier version 1.19.1
> ----------------------------------------------------------------------------------------
>
>                 Key: TIKA-3128
>                 URL: https://issues.apache.org/jira/browse/TIKA-3128
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.24.1
>            Reporter: Sameer Apte
>            Priority: Major
>         Attachments: HDSIT_157516.mov
>
>
> Attached _mov_ file produces _RuntimeException_ when parsed with *tika 
> v1.24.1*
> The same _mov_ file can be parsed without any issues with *tika v1.19.1*
>  *Tika 1.19.1 standalone app _SUCCESSFUL_ run*
> {code:java}
> [sapte@sapte-dt tikatest]$ java -jar tika-app-1.19.1.jar -m HDSIT_157516.mov
> Jun 18, 2020 11:25:00 AM org.apache.tika.config.InitializableProblemHandler$3 
> handleInitializableProblem
> WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
> for optional dependencies.
> Jun 18, 2020 11:25:00 AM org.apache.tika.config.InitializableProblemHandler$3 
> handleInitializableProblem
> WARNING: org.xerial's sqlite-jdbc is not loaded.
> Please provide the jar on your classpath to parse sqlite files.
> See tika-parsers/pom.xml for the correct version.
> Content-Length: 51066400
> Content-Type: application/mp4
> Creation-Date: 2015-05-18T16:23:25Z
> Last-Modified: 2015-05-18T16:31:09Z
> Last-Save-Date: 2015-05-18T16:31:09Z
> X-Parsed-By: org.apache.tika.parser.DefaultParser
> X-Parsed-By: org.apache.tika.parser.mp4.MP4Parser
> date: 2015-05-18T16:31:09Z
> dcterms:created: 2015-05-18T16:23:25Z
> dcterms:modified: 2015-05-18T16:31:09Z
> meta:creation-date: 2015-05-18T16:23:25Z
> meta:save-date: 2015-05-18T16:31:09Z
> modified: 2015-05-18T16:31:09Z
> resourceName: HDSIT_157516.mov
> tiff:ImageLength: 1080
> tiff:ImageWidth: 1920
> xmpDM:audioSampleRate: 3
> xmpDM:duration: 125.99
>  {code}
> *Tika 1.24.1 standalone app _RUNTIMEEXCEPTION_ run*
> {code:java}
> [sapte@sapte-dt tikatest]$ java -jar tika-app-1.24.1.jar -m HDSIT_157516.mov
> Jun 18, 2020 11:24:50 AM org.apache.tika.config.InitializableProblemHandler$3 
> handleInitializableProblem
> WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
> for optional dependencies.
> Jun 18, 2020 11:24:50 AM org.apache.tika.config.InitializableProblemHandler$3 
> handleInitializableProblem
> WARNING: org.xerial's sqlite-jdbc is not loaded.
> Please provide the jar on your classpath to parse sqlite files.
> See tika-parsers/pom.xml for the correct version.
> Exception in thread "main" org.apache.tika.exception.TikaException: 
> Unexpected RuntimeException from org.apache.tika.parser.mp4.MP4Parser@23348b5d
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>   at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
>   at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
>   at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
> Caused by: java.lang.RuntimeException: box size of zero means 'till end of 
> file. That is not yet supported
>   at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:90)
>   at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
>   at 
> org.mp4parser.boxes.sampleentry.VisualSampleEntry.parse(VisualSampleEntry.java:195)
>   at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
>   at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
>   at 
> org.mp4parser.boxes.iso14496.part12.SampleDescriptionBox.parse(SampleDescriptionBox.java:91)
>   at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
>   at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
>   at 
> org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
>   at org.mp4parser.AbstractBoxParser.parseBox(A
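
The "box size of zero" message in the trace above echoes a rule of the ISO base media file format (ISO/IEC 14496-12), which MP4 files follow: every box starts with a 32-bit size and a four-character type, a size of 1 means a 64-bit size follows the type, and a size of 0 means the box extends to the end of the file. The sketch below shows how a reader might honour that rule. It is an illustration in Python using only the standard library, not the code of Tika or mp4parser; `read_box_header` and its `file_size` parameter are hypothetical names.

```python
import io
import struct


def read_box_header(stream, file_size):
    """Read one box header at the current offset.

    Returns (box_type, payload_size), or None when fewer than 8 bytes remain.
    """
    start = stream.tell()
    header = stream.read(8)
    if len(header) < 8:
        return None
    # 32-bit big-endian size followed by a four-character type code.
    size, box_type = struct.unpack(">I4s", header)
    header_len = 8
    if size == 1:
        # A 64-bit "largesize" field follows the type.
        (size,) = struct.unpack(">Q", stream.read(8))
        header_len = 16
    elif size == 0:
        # The box runs from its start offset to the end of the file.
        size = file_size - start
    return box_type.decode("latin-1"), size - header_len


if __name__ == "__main__":
    # A made-up 18-byte file: one zero-sized 'mdat' box with a 10-byte payload.
    data = struct.pack(">I4s", 0, b"mdat") + b"x" * 10
    print(read_box_header(io.BytesIO(data), len(data)))
```

Whether the file attached to the issue really contains such a box, or only a terminator that looks like one, cannot be told from the trace alone.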

[jira] [Resolved] (TIKA-2855) pdfbox version used by both Apache Tika 1.19.1 and 1.20 is vulnerable

2019-04-19 Thread Tim Allison (JIRA)


 [ https://issues.apache.org/jira/browse/TIKA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-2855.
-------------------------------
    Resolution: Duplicate

Thank you!

> pdfbox version used by both Apache Tika 1.19.1 and 1.20 is vulnerable
> ---------------------------------------------------------------------
>
>                 Key: TIKA-2855
>                 URL: https://issues.apache.org/jira/browse/TIKA-2855
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.19.1
>            Reporter: Abhijit Rajwade
>            Priority: Major
>
> As per Sonatype Nexus Auditor, pdfbox versions up to 2.0.14 are vulnerable to
> "CVE-2019-0228: possible XML External Entity (XXE) attack".
> The recommended fix is to upgrade to pdfbox version 2.0.15.
> Refer to the following pdfbox issue 
>   https://issues.apache.org/jira/browse/PDFBOX-4505 
> which is fixed in version 2.0.15.
> Can you please upgrade Apache Tika to use pdfbox 2.0.15?
> Following are details from the Sonatype Nexus scan report
> Issue: CVE-2019-0228 
> Severity: Sonatype CVSS 3.0: 7.3 
> Weakness: Sonatype CWE: 611 
> Source: National Vulnerability Database 
> Categories: Data 
> Description from CVE: apache pdfbox - XML External Entity (XXE) 
> Root Cause: pdfbox-2.0.12.jar : ( , 2.0.15) 
> Advisories:
> Project: https://github.com/apache/pdfbox-docs/commit/b7869c3e4c62c5d...
> Project: https://issues.apache.org/jira/browse/PDFBOX-4505
> Third Party: https://bugzilla.redhat.com/show_bug.cgi?id=1699740 
> CVSS Details:
> Sonatype CVSS 3.0: 7.3
> CVSS Vector: CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:L 
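
The `Root Cause` line in the quoted scan report gives the affected range as `( , 2.0.15)`, read here as an open interval: every pdfbox version below 2.0.15, with 2.0.15 itself excluded. A minimal sketch of that check in Python, assuming plain dotted numeric versions without qualifiers such as `-SNAPSHOT`; `parse_version` and `is_affected` are hypothetical helper names, not part of any scanner's API.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '2.0.12' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def is_affected(version, fixed_in="2.0.15"):
    """Return True when version lies in the open range ( , fixed_in)."""
    return parse_version(version) < parse_version(fixed_in)


if __name__ == "__main__":
    for candidate in ("2.0.9", "2.0.12", "2.0.14", "2.0.15"):
        print(candidate, is_affected(candidate))
```

Comparing integer tuples rather than raw strings matters here: as strings, "2.0.9" would sort after "2.0.15".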



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (TIKA-2855) pdfbox version used by both Apache Tika 1.19.1 and 1.20 is vulnerable

2019-04-18 Thread Abhijit Rajwade (JIRA)
Abhijit Rajwade created TIKA-2855:
----------------------------------

             Summary: pdfbox version used by both Apache Tika 1.19.1 and 1.20 is vulnerable
                 Key: TIKA-2855
                 URL: https://issues.apache.org/jira/browse/TIKA-2855
             Project: Tika
          Issue Type: Bug
          Components: core
    Affects Versions: 1.19.1
            Reporter: Abhijit Rajwade


As per Sonatype Nexus Auditor, pdfbox versions up to 2.0.14 are vulnerable to
"CVE-2019-0228: possible XML External Entity (XXE) attack".

The recommended fix is to upgrade to pdfbox version 2.0.15.
Refer to the following pdfbox issue 
  https://issues.apache.org/jira/browse/PDFBOX-4505 
which is fixed in version 2.0.15.

Can you please upgrade Apache Tika to use pdfbox 2.0.15?

Following are details from the Sonatype Nexus scan report

Issue: CVE-2019-0228 
Severity: Sonatype CVSS 3.0: 7.3 
Weakness: Sonatype CWE: 611 
Source: National Vulnerability Database 
Categories: Data 

Description from CVE: apache pdfbox - XML External Entity (XXE) 
Root Cause: pdfbox-2.0.12.jar : ( , 2.0.15) 
Advisories:
Project: https://github.com/apache/pdfbox-docs/commit/b7869c3e4c62c5d...
Project: https://issues.apache.org/jira/browse/PDFBOX-4505
Third Party: https://bugzilla.redhat.com/show_bug.cgi?id=1699740 
CVSS Details:
Sonatype CVSS 3.0: 7.3
CVSS Vector: CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:L 
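
The 7.3 score and the vector string in the report are two views of the same data: the base score can be recomputed from the vector. The sketch below does that in Python with the metric weights and formula of the CVSS v3.0 specification. It is a simplification under stated assumptions: it handles only vectors with Scope Unchanged (`S:U`), as in this report, and the function names `roundup` and `base_score` are illustrative.

```python
import math

# Metric weights from the CVSS v3.0 specification (Scope: Unchanged only).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place (naive floating-point version)."""
    return math.ceil(value * 10) / 10


def base_score(vector):
    """Compute the base score of a CVSS:3.0 vector with unchanged scope."""
    prefix, *pairs = vector.split("/")
    if prefix != "CVSS:3.0":
        raise ValueError("expected a CVSS:3.0 vector")
    metrics = dict(pair.split(":") for pair in pairs)
    if metrics.get("S") != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    # Impact sub-score from confidentiality, integrity and availability.
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.0/AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:L"))
```

For the vector in the report the impact term is about 3.37 and the exploitability term about 3.89; their sum, about 7.26, rounds up to the reported 7.3.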






[ANNOUNCE] Apache Tika 1.19.1 released

2018-10-09 Thread Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 1.19.1. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync, so the releases
should be available as soon as the mirrors get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.19.1 contains two critical bug fixes to the
MP3Parser and the handling of SAX parsing.  Details can be found in
the changes file:
https://www.apache.org/dist/tika/CHANGES-1.19.1.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use using Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.

When downloading from a mirror site, please remember to verify the
downloads using signatures found on the Apache site:
https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[RESULT][VOTE] Release Apache Tika 1.19.1 Candidate #2

2018-10-09 Thread Tim Allison
The vote has passed.

+1 from
Tim Allison
Dave Meikle
Oleg Tikhonov
Thejan Wijesinghe


Thank you, all!

Cheers,

   Tim

On Tue, Oct 9, 2018 at 1:11 PM Thejan Wijesinghe wrote:
>
> All tests passed for me on my linux, +1 from me.
>
> On Tue, Oct 9, 2018 at 10:37 PM Tim Allison  wrote:
>
> > No problem at all.  Thank you, Oleg!
> > On Tue, Oct 9, 2018 at 12:33 PM Oleg Tikhonov wrote:
> > >
> > > sorry.
> > > +1
> > >
> > > On Tue, Oct 9, 2018 at 7:26 PM Tim Allison  wrote:
> > >
> > > > Thank you, Dave!
> > > >
> > > > Fellow devs, would anyone else have a chance to vote?  We need a third
> > > > for the release.  Thank you!
> > > > On Mon, Oct 8, 2018 at 4:36 AM  wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > On Thu, 4 Oct 2018 at 23:03, Tim Allison wrote:
> > > > >>
> > > > >> A candidate for the Tika 1.19.1 release is available at:
> > > > >>   https://dist.apache.org/repos/dist/dev/tika/
> > > > >>
> > > > >> The release candidate is a zip archive of the sources in:
> > > > >>   https://github.com/apache/tika/tree/1.19.1-rc2/
> > > > >>
> > > > >> The SHA-512 checksum of the archive is
> > > > >>
> > > > >> 4f89216eb3332288c4839139e4af78395fefb3c03be4a6d41a8c9ffadebf69e1732afced25e7fe3c563fb6ce95726a89bd9924c69ddab8e6875a45eec1564fcb
> > > > >>
> > > > >> In addition, a staged maven repository is available here:
> > > > >>
> > > > >> https://repository.apache.org/content/repositories/orgapachetika-1045/org/apache/tika
> > > > >>
> > > > >> Please vote on releasing this package as Apache Tika 1.19.1.
> > > > >>
> > > > >> The vote is open for the next 72 hours and passes if a majority of at
> > > > >> least three +1 Tika PMC votes are cast.
> > > > >>
> > > > >> [ ] +1 Release this package as Apache Tika 1.19.1
> > > > >> [ ] -1 Do not release this package because...
> > > > >
> > > > >
> > > > > +1 from me.
> > > > >
> > > > > Thanks for rolling the release Tim!
> > > > >
> > > > > Cheers,
> > > > > Dave
> > > >
> >
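
The result above follows from the rule quoted in the thread: the vote "passes if a majority of at least three +1 Tika PMC votes are cast". One reading of that rule, sketched in Python: at least three +1 votes, and more +1 than -1 votes. The function name `vote_passes` is hypothetical, and the sketch assumes every counted vote is a binding PMC vote, which the mails do not state for each voter.

```python
def vote_passes(plus_ones, minus_ones, minimum_plus_ones=3):
    """Apply one reading of the release rule quoted in the vote mail.

    The vote passes with at least `minimum_plus_ones` +1 votes and
    strictly more +1 than -1 votes.
    """
    return plus_ones >= minimum_plus_ones and plus_ones > minus_ones


if __name__ == "__main__":
    # Candidate #2: four +1 votes are listed in the result mail, no -1 votes.
    print(vote_passes(plus_ones=4, minus_ones=0))
```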


Re: [VOTE] Release Apache Tika 1.19.1 Candidate #2

2018-10-09 Thread Thejan Wijesinghe
All tests passed for me on my linux, +1 from me.

On Tue, Oct 9, 2018 at 10:37 PM Tim Allison  wrote:

> No problem at all.  Thank you, Oleg!
> On Tue, Oct 9, 2018 at 12:33 PM Oleg Tikhonov wrote:
> >
> > sorry.
> > +1
> >
> > On Tue, Oct 9, 2018 at 7:26 PM Tim Allison  wrote:
> >
> > > Thank you, Dave!
> > >
> > > Fellow devs, would anyone else have a chance to vote?  We need a third
> > > for the release.  Thank you!
> > > On Mon, Oct 8, 2018 at 4:36 AM  wrote:
> > > >
> > > > Hello,
> > > >
> > > > On Thu, 4 Oct 2018 at 23:03, Tim Allison wrote:
> > > >>
> > > >> A candidate for the Tika 1.19.1 release is available at:
> > > >>   https://dist.apache.org/repos/dist/dev/tika/
> > > >>
> > > >> The release candidate is a zip archive of the sources in:
> > > >>   https://github.com/apache/tika/tree/1.19.1-rc2/
> > > >>
> > > >> The SHA-512 checksum of the archive is
> > > >>
> > > >> 4f89216eb3332288c4839139e4af78395fefb3c03be4a6d41a8c9ffadebf69e1732afced25e7fe3c563fb6ce95726a89bd9924c69ddab8e6875a45eec1564fcb
> > > >>
> > > >> In addition, a staged maven repository is available here:
> > > >>
> > > >> https://repository.apache.org/content/repositories/orgapachetika-1045/org/apache/tika
> > > >>
> > > >> Please vote on releasing this package as Apache Tika 1.19.1.
> > > >>
> > > >> The vote is open for the next 72 hours and passes if a majority of at
> > > >> least three +1 Tika PMC votes are cast.
> > > >>
> > > >> [ ] +1 Release this package as Apache Tika 1.19.1
> > > >> [ ] -1 Do not release this package because...
> > > >
> > > >
> > > > +1 from me.
> > > >
> > > > Thanks for rolling the release Tim!
> > > >
> > > > Cheers,
> > > > Dave
> > >
>


Re: [VOTE] Release Apache Tika 1.19.1 Candidate #2

2018-10-09 Thread Tim Allison
No problem at all.  Thank you, Oleg!
On Tue, Oct 9, 2018 at 12:33 PM Oleg Tikhonov  wrote:
>
> sorry.
> +1
>
> On Tue, Oct 9, 2018 at 7:26 PM Tim Allison  wrote:
>
> > Thank you, Dave!
> >
> > Fellow devs, would anyone else have a chance to vote?  We need a third
> > for the release.  Thank you!
> > On Mon, Oct 8, 2018 at 4:36 AM  wrote:
> > >
> > > Hello,
> > >
> > > On Thu, 4 Oct 2018 at 23:03, Tim Allison  wrote:
> > >>
> > >> A candidate for the Tika 1.19.1 release is available at:
> > >>   https://dist.apache.org/repos/dist/dev/tika/
> > >>
> > >> The release candidate is a zip archive of the sources in:
> > >>   https://github.com/apache/tika/tree/1.19.1-rc2/
> > >>
> > >> The SHA-512 checksum of the archive is
> > >>
> > >> 4f89216eb3332288c4839139e4af78395fefb3c03be4a6d41a8c9ffadebf69e1732afced25e7fe3c563fb6ce95726a89bd9924c69ddab8e6875a45eec1564fcb
> > >>
> > >> In addition, a staged maven repository is available here:
> > >>
> > >> https://repository.apache.org/content/repositories/orgapachetika-1045/org/apache/tika
> > >>
> > >> Please vote on releasing this package as Apache Tika 1.19.1.
> > >>
> > >> The vote is open for the next 72 hours and passes if a majority of at
> > >> least three +1 Tika PMC votes are cast.
> > >>
> > >> [ ] +1 Release this package as Apache Tika 1.19.1
> > >> [ ] -1 Do not release this package because...
> > >
> > >
> > > +1 from me.
> > >
> > > Thanks for rolling the release Tim!
> > >
> > > Cheers,
> > > Dave
> >


Re: [VOTE] Release Apache Tika 1.19.1 Candidate #2

2018-10-09 Thread Oleg Tikhonov
sorry.
+1

On Tue, Oct 9, 2018 at 7:26 PM Tim Allison  wrote:

> Thank you, Dave!
>
> Fellow devs, would anyone else have a chance to vote?  We need a third
> for the release.  Thank you!
> On Mon, Oct 8, 2018 at 4:36 AM  wrote:
> >
> > Hello,
> >
> > On Thu, 4 Oct 2018 at 23:03, Tim Allison  wrote:
> >>
> >> A candidate for the Tika 1.19.1 release is available at:
> >>   https://dist.apache.org/repos/dist/dev/tika/
> >>
> >> The release candidate is a zip archive of the sources in:
> >>   https://github.com/apache/tika/tree/1.19.1-rc2/
> >>
> >> The SHA-512 checksum of the archive is
> >>
> >> 4f89216eb3332288c4839139e4af78395fefb3c03be4a6d41a8c9ffadebf69e1732afced25e7fe3c563fb6ce95726a89bd9924c69ddab8e6875a45eec1564fcb
> >>
> >> In addition, a staged maven repository is available here:
> >>
> >> https://repository.apache.org/content/repositories/orgapachetika-1045/org/apache/tika
> >>
> >> Please vote on releasing this package as Apache Tika 1.19.1.
> >>
> >> The vote is open for the next 72 hours and passes if a majority of at
> >> least three +1 Tika PMC votes are cast.
> >>
> >> [ ] +1 Release this package as Apache Tika 1.19.1
> >> [ ] -1 Do not release this package because...
> >
> >
> > +1 from me.
> >
> > Thanks for rolling the release Tim!
> >
> > Cheers,
> > Dave
>


Re: [VOTE] Release Apache Tika 1.19.1 Candidate #2

2018-10-09 Thread Tim Allison
Thank you, Dave!

Fellow devs, would anyone else have a chance to vote?  We need a third
for the release.  Thank you!
On Mon, Oct 8, 2018 at 4:36 AM  wrote:
>
> Hello,
>
> On Thu, 4 Oct 2018 at 23:03, Tim Allison  wrote:
>>
>> A candidate for the Tika 1.19.1 release is available at:
>>   https://dist.apache.org/repos/dist/dev/tika/
>>
>> The release candidate is a zip archive of the sources in:
>>   https://github.com/apache/tika/tree/1.19.1-rc2/
>>
>> The SHA-512 checksum of the archive is
>>   
>> 4f89216eb3332288c4839139e4af78395fefb3c03be4a6d41a8c9ffadebf69e1732afced25e7fe3c563fb6ce95726a89bd9924c69ddab8e6875a45eec1564fcb
>>
>> In addition, a staged maven repository is available here:
>>   
>> https://repository.apache.org/content/repositories/orgapachetika-1045/org/apache/tika
>>
>> Please vote on releasing this package as Apache Tika 1.19.1.
>>
>> The vote is open for the next 72 hours and passes if a majority of at
>> least three +1 Tika PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Tika 1.19.1
>> [ ] -1 Do not release this package because...
>
>
> +1 from me.
>
> Thanks for rolling the release Tim!
>
> Cheers,
> Dave


Re: [VOTE] Release Apache Tika 1.19.1 Candidate #2

2018-10-08 Thread Mattmann, Chris A (1761)
I’ll review tonight!

Sent from my iPhone

> On Oct 8, 2018, at 6:40 PM, Tim Allison  wrote:
> 
> Third +1?
> 
>> On Thu, Oct 4, 2018 at 6:03 PM Tim Allison  wrote:
>> 
>> A candidate for the Tika 1.19.1 release is available at:
>>  https://dist.apache.org/repos/dist/dev/tika/
>> 
>> The release candidate is a zip archive of the sources in:
>>  https://github.com/apache/tika/tree/1.19.1-rc2/
>> 
>> The SHA-512 checksum of the archive is
>> 
>> 4f89216eb3332288c4839139e4af78395fefb3c03be4a6d41a8c9ffadebf69e1732afced25e7fe3c563fb6ce95726a89bd9924c69ddab8e6875a45eec1564fcb
>> 
>> In addition, a staged maven repository is available here:
>> 
>> https://repository.apache.org/content/repositories/orgapachetika-1045/org/apache/tika
>> 
>> Please vote on releasing this package as Apache Tika 1.19.1.
>> 
>> The vote is open for the next 72 hours and passes if a majority of at
>> least three +1 Tika PMC votes are cast.
>> 
>> [ ] +1 Release this package as Apache Tika 1.19.1
>> [ ] -1 Do not release this package because...
>> 
>> Here's my +1.
>> 
>> Cheers,
>> 
>>  Tim
>> 


Re: [VOTE] Release Apache Tika 1.19.1 Candidate #2

2018-10-08 Thread Tim Allison
Third +1?

On Thu, Oct 4, 2018 at 6:03 PM Tim Allison  wrote:

> A candidate for the Tika 1.19.1 release is available at:
>   https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
>   https://github.com/apache/tika/tree/1.19.1-rc2/
>
> The SHA-512 checksum of the archive is
>
> 4f89216eb3332288c4839139e4af78395fefb3c03be4a6d41a8c9ffadebf69e1732afced25e7fe3c563fb6ce95726a89bd9924c69ddab8e6875a45eec1564fcb
>
> In addition, a staged maven repository is available here:
>
> https://repository.apache.org/content/repositories/orgapachetika-1045/org/apache/tika
>
> Please vote on releasing this package as Apache Tika 1.19.1.
>
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.19.1
> [ ] -1 Do not release this package because...
>
> Here's my +1.
>
> Cheers,
>
>   Tim
>


Re: [VOTE] Release Apache Tika 1.19.1 Candidate #2

2018-10-08 Thread loompa
Hello,

On Thu, 4 Oct 2018 at 23:03, Tim Allison  wrote:

> A candidate for the Tika 1.19.1 release is available at:
>   https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
>   https://github.com/apache/tika/tree/1.19.1-rc2/
>
> The SHA-512 checksum of the archive is
>
> 4f89216eb3332288c4839139e4af78395fefb3c03be4a6d41a8c9ffadebf69e1732afced25e7fe3c563fb6ce95726a89bd9924c69ddab8e6875a45eec1564fcb
>
> In addition, a staged maven repository is available here:
>
> https://repository.apache.org/content/repositories/orgapachetika-1045/org/apache/tika
>
> Please vote on releasing this package as Apache Tika 1.19.1.
>
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.19.1
> [ ] -1 Do not release this package because...
>

+1 from me.

Thanks for rolling the release Tim!

Cheers,
Dave


[VOTE] Release Apache Tika 1.19.1 Candidate #2

2018-10-04 Thread Tim Allison
A candidate for the Tika 1.19.1 release is available at:
  https://dist.apache.org/repos/dist/dev/tika/

The release candidate is a zip archive of the sources in:
  https://github.com/apache/tika/tree/1.19.1-rc2/

The SHA-512 checksum of the archive is
  4f89216eb3332288c4839139e4af78395fefb3c03be4a6d41a8c9ffadebf69e1732afced25e7fe3c563fb6ce95726a89bd9924c69ddab8e6875a45eec1564fcb

In addition, a staged maven repository is available here:
  https://repository.apache.org/content/repositories/orgapachetika-1045/org/apache/tika

Please vote on releasing this package as Apache Tika 1.19.1.

The vote is open for the next 72 hours and passes if a majority of at
least three +1 Tika PMC votes are cast.

[ ] +1 Release this package as Apache Tika 1.19.1
[ ] -1 Do not release this package because...

Here's my +1.

Cheers,

  Tim
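
For anyone checking the candidate locally, here is a minimal sketch of comparing a downloaded archive against the SHA-512 value quoted above. It uses only the JDK; the class and method names (`Sha512Check`, `sha512Hex`, `matches`) are made up for illustration, and the archive path is whatever you saved the zip as.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Tika or the release tooling.
final class Sha512Check {

    /** Streams the input and returns its SHA-512 digest as lowercase hex. */
    static String sha512Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Returns true if the file's digest equals the expected hex string (case-insensitive). */
    static boolean matches(Path archive, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        try (InputStream in = Files.newInputStream(archive)) {
            return sha512Hex(in).equalsIgnoreCase(expectedHex.trim());
        }
    }
}
```

Calling `Sha512Check.matches(pathToZip, "4f89216e...")` with the full value from the vote mail should return true for an intact download. This only covers the checksum; verifying the signature is a separate step.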


[CANCEL][VOTE] Release Apache Tika 1.19.1 Candidate #1

2018-10-01 Thread Tim Allison
All,
  The release process for PDFBox 2.0.12 should start today[1].  That
has some important updates.  Let's cancel RC1, and I'll roll RC2 as
soon as PDFBox 2.0.12 hits maven.  I've already run regression tests
on PDFs with 2.0.12-SNAPSHOT, so we'll be good to move quickly.

  This is my official -1 on RC1.

   Best,

 Tim


[1] https://lists.apache.org/thread.html/3992edce61194ca79b37044d5f43597f1e240a1d022bbc3569e3a517@%3Cdev.pdfbox.apache.org%3E

On Thu, Sep 27, 2018 at 7:48 AM  wrote:
>
>
> On Wed, 26 Sep 2018 at 20:20, Tim Allison  wrote:
>>
>> A candidate for the Tika 1.19.1 release is available at:
>>   https://dist.apache.org/repos/dist/dev/tika/
>>
>> The release candidate is a zip archive of the sources in:
>>   https://github.com/apache/tika/tree/1.19.1-rc1/
>>
>> The SHA-512 checksum of the archive is
>>   
>> 88c79c106d78983effc9b41147b46b3722cb7afb8c847d340d3504f56488b8a7267fd634efe638afd2a2c52419fe6b84249ac6e641d5c8c5e6e4795f004b9a45
>>
>> In addition, a staged maven repository is available here:
>>   
>> https://repository.apache.org/content/repositories/orgapachetika-1044/org/apache/tika
>>
>> Please vote on releasing this package as Apache Tika 1.19.1.
>>
>> The vote is open for the next 72 hours and passes if a majority of at
>> least three +1 Tika PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Tika 1.19.1
>> [ ] -1 Do not release this package because...
>
>
> +1 - Checksum OK, Signatures OK (although need to get you some trust, Tim) 
> and test results looked good.
>
> I noticed a minor issue on a clean Ubuntu 18.04 with the Python rotation
> script: when python-tk is not installed, the rotation script fails, and
> thus so does the build.  I've got a patch for the check so that it looks
> for this, but I don't think it is worth stopping this RC for, so I will
> file it in JIRA.
>
> Thanks for rolling this RC.
>
> Cheers,
> Dave


Re: [VOTE] Release Apache Tika 1.19.1 Candidate #1

2018-09-27 Thread loompa
On Wed, 26 Sep 2018 at 20:20, Tim Allison  wrote:

> A candidate for the Tika 1.19.1 release is available at:
>   https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
>   https://github.com/apache/tika/tree/1.19.1-rc1/
>
> The SHA-512 checksum of the archive is
>
> 88c79c106d78983effc9b41147b46b3722cb7afb8c847d340d3504f56488b8a7267fd634efe638afd2a2c52419fe6b84249ac6e641d5c8c5e6e4795f004b9a45
>
> In addition, a staged maven repository is available here:
>
> https://repository.apache.org/content/repositories/orgapachetika-1044/org/apache/tika
>
> Please vote on releasing this package as Apache Tika 1.19.1.
>
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.19.1
> [ ] -1 Do not release this package because...
>

+1 - Checksum OK, Signatures OK (although need to get you some trust, Tim)
and test results looked good.

I noticed a minor issue on a clean Ubuntu 18.04 with the Python rotation
script: when python-tk is not installed, the rotation script fails, and
thus so does the build.  I've got a patch for the check so that it looks
for this, but I don't think it is worth stopping this RC for, so I will
file it in JIRA.

Thanks for rolling this RC.

Cheers,
Dave
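
To illustrate the kind of check described above (this is not the actual patch), here is a rough sketch of probing whether a Python interpreter can import a given module before relying on a script that needs it. The class name `PythonModuleCheck`, the 10-second timeout, and the module name passed in are all assumptions; the thread does not say which module the rotation script imports. On Ubuntu 18.04 the python-tk package provides `Tkinter` for Python 2.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not Tika's actual check.
final class PythonModuleCheck {

    /**
     * Runs "<pythonExe> -c 'import <module>'" and reports whether it exited with 0.
     * Returns false if the interpreter is missing, the import fails, or the
     * process does not finish within 10 seconds.
     */
    static boolean canImport(String pythonExe, String module) {
        ProcessBuilder pb = new ProcessBuilder(pythonExe, "-c", "import " + module);
        pb.redirectErrorStream(true);
        // Discard output so the child cannot block on a full pipe (Java 9+).
        pb.redirectOutput(ProcessBuilder.Redirect.DISCARD);
        try {
            Process p = pb.start();
            if (!p.waitFor(10, TimeUnit.SECONDS)) {
                p.destroyForcibly();
                return false;
            }
            return p.exitValue() == 0;
        } catch (IOException e) {
            // Interpreter not found or not executable.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A caller could use `PythonModuleCheck.canImport("python", "Tkinter")` to decide whether to skip the rotation step instead of failing the build.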


Re: [VOTE] Release Apache Tika 1.19.1 Candidate #1

2018-09-26 Thread Tim Allison
Side note: I didn't run the full regression testing, but I did run
against the ~10k mp3 files in our corpus and found the same 4
exceptions.
On Wed, Sep 26, 2018 at 3:20 PM Tim Allison  wrote:
>
> A candidate for the Tika 1.19.1 release is available at:
>   https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
>   https://github.com/apache/tika/tree/1.19.1-rc1/
>
> The SHA-512 checksum of the archive is
>   
> 88c79c106d78983effc9b41147b46b3722cb7afb8c847d340d3504f56488b8a7267fd634efe638afd2a2c52419fe6b84249ac6e641d5c8c5e6e4795f004b9a45
>
> In addition, a staged maven repository is available here:
>   
> https://repository.apache.org/content/repositories/orgapachetika-1044/org/apache/tika
>
> Please vote on releasing this package as Apache Tika 1.19.1.
>
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.19.1
> [ ] -1 Do not release this package because...
>
> Here's my +1.
>
> Cheers,
>
>   Tim


[VOTE] Release Apache Tika 1.19.1 Candidate #1

2018-09-26 Thread Tim Allison
A candidate for the Tika 1.19.1 release is available at:
  https://dist.apache.org/repos/dist/dev/tika/

The release candidate is a zip archive of the sources in:
  https://github.com/apache/tika/tree/1.19.1-rc1/

The SHA-512 checksum of the archive is
  
88c79c106d78983effc9b41147b46b3722cb7afb8c847d340d3504f56488b8a7267fd634efe638afd2a2c52419fe6b84249ac6e641d5c8c5e6e4795f004b9a45

In addition, a staged maven repository is available here:
  
https://repository.apache.org/content/repositories/orgapachetika-1044/org/apache/tika

Please vote on releasing this package as Apache Tika 1.19.1.

The vote is open for the next 72 hours and passes if a majority of at
least three +1 Tika PMC votes are cast.

[ ] +1 Release this package as Apache Tika 1.19.1
[ ] -1 Do not release this package because...

Here's my +1.

Cheers,

  Tim
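
As a small illustration of the pass condition stated in the vote mail, here is a sketch that reads "a majority of at least three +1 Tika PMC votes" as: at least three binding +1 votes, and more binding +1 than binding -1 votes. That reading, and the name `ReleaseVote`, are assumptions for the example; non-binding votes are simply not counted.

```java
// Hypothetical tally helper; the rule encoded here is one reading of the vote mail.
final class ReleaseVote {

    /**
     * @param bindingPlusOnes  number of +1 votes from PMC members
     * @param bindingMinusOnes number of -1 votes from PMC members
     * @return true if there are at least three +1 votes and they outnumber the -1 votes
     */
    static boolean passes(int bindingPlusOnes, int bindingMinusOnes) {
        if (bindingPlusOnes < 0 || bindingMinusOnes < 0) {
            throw new IllegalArgumentException("vote counts must be non-negative");
        }
        return bindingPlusOnes >= 3 && bindingPlusOnes > bindingMinusOnes;
    }
}
```

Under this reading, three +1 and no -1 passes, two +1 and no -1 does not, and three +1 against three -1 does not.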


Re: 1.19.1?

2018-09-25 Thread Chris Mattmann
Sounds great!

From: Tim Allison 
Reply-To: "dev@tika.apache.org" 
Date: Tuesday, September 25, 2018 at 9:40 AM
To: "dev@tika.apache.org" 
Subject: Re: 1.19.1?

Given the mp3 issue and some other items, let's go with 1.19.1 rc1
today or tomorrow?

On Mon, Sep 24, 2018 at 3:07 PM Nick Burch  wrote:

On Mon, 24 Sep 2018, Tim Allison wrote:
> Aside from the problem with users and non-standard XML parsers, were
> there any other show-stoppers in POI 4.0.0?  Is there a reason to wait
> for POI 4.0.1?

I think, in terms of Tika affecting bugs, it was the xml parser stuff, and
commons compress missing from the pom.

Nick


Re: 1.19.1?

2018-09-25 Thread Tim Allison
Given the mp3 issue and some other items, let's go with 1.19.1 rc1
today or tomorrow?
On Mon, Sep 24, 2018 at 3:07 PM Nick Burch  wrote:
>
> On Mon, 24 Sep 2018, Tim Allison wrote:
> > Aside from the problem with users and non-standard XML parsers, were
> > there any other show-stoppers in POI 4.0.0?  Is there a reason to wait
> > for POI 4.0.1?
>
> I think, in terms of Tika affecting bugs, it was the xml parser stuff, and
> commons compress missing from the pom.
>
> Nick


Re: 1.19.1?

2018-09-24 Thread Nick Burch

On Mon, 24 Sep 2018, Tim Allison wrote:
> Aside from the problem with users and non-standard XML parsers, were
> there any other show-stoppers in POI 4.0.0?  Is there a reason to wait
> for POI 4.0.1?

I think, in terms of Tika affecting bugs, it was the xml parser stuff, and 
commons compress missing from the pom.


Nick


Re: 1.19.1?

2018-09-24 Thread Tim Allison
Nick,
  Aside from the problem with users and non-standard XML parsers, were
there any other show-stoppers in POI 4.0.0?  Is there a reason to wait
for POI 4.0.1?
On Fri, Sep 21, 2018 at 12:48 PM Chris Mattmann  wrote:
>
> Let’s roll it….
>
> From: Tim Allison 
> Reply-To: "dev@tika.apache.org" 
> Date: Wednesday, September 19, 2018 at 12:14 PM
> To: "dev@tika.apache.org" 
> Subject: 1.19.1?
>
> The mp3 regression is bad. In hindsight, the Tika-eval reports were fairly
> clear on this but I did some self-hand-waving to excuse away the
> numbers...I shouldn’t have.
>
> I want to add some new reports to tika-eval so that this never happens
> again.
>
> How long should we wait for 1.19.1 or 1.20?
>
> Best,
>
> Tim
>
> On Wed, Sep 19, 2018 at 2:29 PM Hudson (JIRA)  wrote:
>
> [ https://issues.apache.org/jira/browse/TIKA-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621008#comment-16621008 ]
>
> Hudson commented on TIKA-2730:
> --
>
> SUCCESS: Integrated in Jenkins build tika-branch-1x #94 (See [https://builds.apache.org/job/tika-branch-1x/94/])
> TIKA-2730 -- allow last frame to be truncated w/o throwing an EOF
> (tallison: [https://github.com/apache/tika/commit/80cfd6d4a4270f8f3697c6dc083b3dedfc36c86a])
> * (edit) tika-parsers/src/main/java/org/apache/tika/parser/mp3/MpegStream.java
> * (edit) tika-parsers/src/test/java/org/apache/tika/parser/mp3/Mp3ParserTest.java
> * (add) tika-parsers/src/test/resources/test-documents/testMP3i18n_truncated.mp3
> * (edit) tika-parsers/src/main/java/org/apache/tika/parser/mp3/Mp3Parser.java
>
> > parseToString fails for a simple mp3
> >
> > Key: TIKA-2730
> > URL: https://issues.apache.org/jira/browse/TIKA-2730
> > Project: Tika
> > Issue Type: Bug
> > Affects Versions: 1.19
> > Reporter: Boris Petrov
> > Assignee: Tim Allison
> > Priority: Major
> > Fix For: 2.0.0, 1.20
> >
> > Attachments: demo.mp3
> >
> > This is a regression from 1.18. I've attached the mp3 that fails. The exception I get is:
> > {noformat}
> > org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.mp3.Mp3Parser@cefe6c6
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> >     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> >     at org.apache.tika.Tika.parseToString(Tika.java:527)
> >     at com.company.TextExtractor.getText(TextExtractor.java:39)
> > Caused by: java.io.EOFException: EOF: tried to skip 361 but could only skip 247
> >     at org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:166)
> >     at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:204)
> >     at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> >     ... 5 more{noformat}
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>


Re: 1.19.1?

2018-09-21 Thread Chris Mattmann
Let’s roll it….

From: Tim Allison 
Reply-To: "dev@tika.apache.org" 
Date: Wednesday, September 19, 2018 at 12:14 PM
To: "dev@tika.apache.org" 
Subject: 1.19.1?

The mp3 regression is bad. In hindsight, the Tika-eval reports were fairly
clear on this but I did some self-hand-waving to excuse away the
numbers...I shouldn’t have.

I want to add some new reports to tika-eval so that this never happens
again.

How long should we wait for 1.19.1 or 1.20?

Best,

Tim

On Wed, Sep 19, 2018 at 2:29 PM Hudson (JIRA)  wrote:

[ https://issues.apache.org/jira/browse/TIKA-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621008#comment-16621008 ]

Hudson commented on TIKA-2730:
--

SUCCESS: Integrated in Jenkins build tika-branch-1x #94 (See [https://builds.apache.org/job/tika-branch-1x/94/])
TIKA-2730 -- allow last frame to be truncated w/o throwing an EOF
(tallison: [https://github.com/apache/tika/commit/80cfd6d4a4270f8f3697c6dc083b3dedfc36c86a])
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/mp3/MpegStream.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/mp3/Mp3ParserTest.java
* (add) tika-parsers/src/test/resources/test-documents/testMP3i18n_truncated.mp3
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/mp3/Mp3Parser.java

> parseToString fails for a simple mp3
>
> Key: TIKA-2730
> URL: https://issues.apache.org/jira/browse/TIKA-2730
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.19
> Reporter: Boris Petrov
> Assignee: Tim Allison
> Priority: Major
> Fix For: 2.0.0, 1.20
>
> Attachments: demo.mp3
>
> This is a regression from 1.18. I've attached the mp3 that fails. The exception I get is:
> {noformat}
> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.mp3.Mp3Parser@cefe6c6
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.tika.Tika.parseToString(Tika.java:527)
>     at com.company.TextExtractor.getText(TextExtractor.java:39)
> Caused by: java.io.EOFException: EOF: tried to skip 361 but could only skip 247
>     at org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:166)
>     at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:204)
>     at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     ... 5 more{noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: 1.19.1?

2018-09-19 Thread Tim Allison
Yes, and I think I duplicated that bug when I copied/pasted from POI to
Tika, so that's a good reminder to fix that in Tika asap as well as
potentially wait for POI 4.0.1.  Thank you!
On Wed, Sep 19, 2018 at 4:53 PM Nick Burch  wrote:
>
> On Wed, 19 Sep 2018, Tim Allison wrote:
> > The mp3 regression is bad. In hindsight, the Tika-eval reports were
> > fairly clear on this but I did some self-hand-waving to excuse away the
> > numbers...I shouldn’t have.
> >
> > I want to add some new reports to tika-eval so that this never happens
> > again.
> >
> > How long should we wait for 1.19.1 or 1.20?
>
> There's a POI xml bug on certain older platforms (POI tries too hard to
> lock down the xml settings even if the xml parser doesn't do that...),
> maybe worth trying to get a POI 4.0.1 out, then do a Tika 1.19.1 or 1.20
> (depending on how many other bugs we spot in the POI wait!)
>
> Nick


Re: 1.19.1?

2018-09-19 Thread Nick Burch

On Wed, 19 Sep 2018, Tim Allison wrote:
> The mp3 regression is bad. In hindsight, the Tika-eval reports were
> fairly clear on this but I did some self-hand-waving to excuse away the
> numbers...I shouldn’t have.
>
> I want to add some new reports to tika-eval so that this never happens
> again.
>
> How long should we wait for 1.19.1 or 1.20?

There's a POI xml bug on certain older platforms (POI tries too hard to 
lock down the xml settings even if the xml parser doesn't do that...), 
maybe worth trying to get a POI 4.0.1 out, then do a Tika 1.19.1 or 1.20 
(depending on how many other bugs we spot in the POI wait!)


Nick
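
To make the failure mode above concrete, here is a rough sketch of a more tolerant approach: try to set each hardening feature on the JAXP factory, and carry on if the underlying parser does not support it. This is not POI's or Tika's actual code; the class name `LenientXmlHardening` and the choice to also catch `AbstractMethodError` (for very old parser implementations that predate `setFeature`) are assumptions for the example.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical helper for illustration only.
final class LenientXmlHardening {

    /**
     * Tries to set a feature on the factory.
     *
     * @return true if the parser accepted the feature, false if it is
     *         unsupported or the implementation lacks setFeature entirely
     */
    static boolean trySetFeature(DocumentBuilderFactory factory, String feature, boolean value) {
        try {
            factory.setFeature(feature, value);
            return true;
        } catch (ParserConfigurationException | AbstractMethodError e) {
            // The parser on the classpath cannot honour this setting;
            // the caller decides whether to log, continue, or fail.
            return false;
        }
    }

    /** Builds a factory with secure processing requested on a best-effort basis. */
    static DocumentBuilderFactory newBestEffortFactory() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        trySetFeature(factory, XMLConstants.FEATURE_SECURE_PROCESSING, true);
        return factory;
    }
}
```

Whether silently continuing is acceptable depends on the deployment; a caller that needs the protection could treat a false return as a hard error instead.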


1.19.1?

2018-09-19 Thread Tim Allison
The mp3 regression is bad. In hindsight, the Tika-eval reports were fairly
clear on this but I did some self-hand-waving to excuse away the
numbers...I shouldn’t have.

I want to add some new reports to tika-eval so that this never happens
again.

How long should we wait for 1.19.1 or 1.20?

Best,

Tim

On Wed, Sep 19, 2018 at 2:29 PM Hudson (JIRA)  wrote:

>
> [ https://issues.apache.org/jira/browse/TIKA-2730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621008#comment-16621008 ]
>
> Hudson commented on TIKA-2730:
> --
>
> SUCCESS: Integrated in Jenkins build tika-branch-1x #94 (See [https://builds.apache.org/job/tika-branch-1x/94/])
> TIKA-2730 -- allow last frame to be truncated w/o throwing an EOF
> (tallison: [https://github.com/apache/tika/commit/80cfd6d4a4270f8f3697c6dc083b3dedfc36c86a])
> * (edit) tika-parsers/src/main/java/org/apache/tika/parser/mp3/MpegStream.java
> * (edit) tika-parsers/src/test/java/org/apache/tika/parser/mp3/Mp3ParserTest.java
> * (add) tika-parsers/src/test/resources/test-documents/testMP3i18n_truncated.mp3
> * (edit) tika-parsers/src/main/java/org/apache/tika/parser/mp3/Mp3Parser.java
>
> > parseToString fails for a simple mp3
> >
> > Key: TIKA-2730
> > URL: https://issues.apache.org/jira/browse/TIKA-2730
> > Project: Tika
> > Issue Type: Bug
> > Affects Versions: 1.19
> > Reporter: Boris Petrov
> > Assignee: Tim Allison
> > Priority: Major
> > Fix For: 2.0.0, 1.20
> >
> > Attachments: demo.mp3
> >
> > This is a regression from 1.18. I've attached the mp3 that fails. The exception I get is:
> > {noformat}
> > org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.mp3.Mp3Parser@cefe6c6
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> >     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> >     at org.apache.tika.Tika.parseToString(Tika.java:527)
> >     at com.company.TextExtractor.getText(TextExtractor.java:39)
> > Caused by: java.io.EOFException: EOF: tried to skip 361 but could only skip 247
> >     at org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:166)
> >     at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:204)
> >     at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> >     ... 5 more{noformat}
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>
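
For readers following the TIKA-2730 trace quoted above, here is a simplified sketch of the idea in the commit message ("allow last frame to be truncated w/o throwing an EOF"). It is not the code from `MpegStream.java`; the class `FrameSkipper` and its signature are invented for the example. Instead of throwing `EOFException` when fewer bytes remain than the frame header promised, it reports the truncation to the caller.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration; the real fix lives in Tika's MpegStream.
final class FrameSkipper {

    /**
     * Skips up to frameLength bytes of frame data.
     *
     * @return true if the whole frame was skipped, false if the stream
     *         ended first (i.e. the last frame is truncated)
     */
    static boolean skipFrame(InputStream in, long frameLength) throws IOException {
        long remaining = frameLength;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip() may return 0 without being at end of stream,
                // so probe with a single read before giving up.
                if (in.read() == -1) {
                    return false; // truncated frame: stop quietly, no EOFException
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
        return true;
    }
}
```

With the numbers from the trace, a stream holding 247 bytes and a frame length of 361 would make `skipFrame` return false, and the caller can then stop iterating over frames and keep whatever metadata it has already collected.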