[jira] [Commented] (TIKA-3323) FileCommandDetectorTest inconsistent results depending on platform
    [ https://issues.apache.org/jira/browse/TIKA-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306537#comment-17306537 ]

Andrew Pavlin commented on TIKA-3323:
-------------------------------------

I hope you noticed that additional changes would be needed. In my original proposed fix, I forgot to repeat the fixes in lines 57-60.

> FileCommandDetectorTest inconsistent results depending on platform
> ------------------------------------------------------------------
>
>                 Key: TIKA-3323
>                 URL: https://issues.apache.org/jira/browse/TIKA-3323
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.25
>            Reporter: Andrew Pavlin
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 1.26
>
>
> The unit test for org.apache.tika.detect.FileCommandDetector fails on some
> platforms due to inconsistent return values from the operating system's
> "file --mime" command. For example, on Fedora Core 32 and Mint 19, the test
> case returns "text/xml". However, on Oracle Enterprise Linux 6, the test case
> returns "application/xml", which is equally valid but causes the unit test
> case to fail. When the unit test case fails, it is impossible to build Tika.
> The unit test program should be fixed to accept either answer so that all
> dialects of the file command work successfully.
> A proposed patch is included below:
> {code}
> diff --git a/tika-1.25/tika-core/src/test/java/org/apache/tika/detect/FileCommandDetectorTest.java b/tika-1.25/tika-core/src/test/java/org/apache/tika/detect/FileCommandDetectorTest.java
> index 21a24ab..1911e05 100644
> --- a/tika-1.25/tika-core/src/test/java/org/apache/tika/detect/FileCommandDetectorTest.java
> +++ b/tika-1.25/tika-core/src/test/java/org/apache/tika/detect/FileCommandDetectorTest.java
> @@ -44,9 +44,11 @@ public class FileCommandDetectorTest {
>          assumeTrue(FileCommandDetector.checkHasFile());
>
>          try (InputStream is = getClass().getResourceAsStream("/test-documents/basic_embedded.xml")) {
> -            assertEquals(MediaType.text("xml"), DETECTOR.detect(is, new Metadata()));
> +            MediaType answer = DETECTOR.detect(is, new Metadata());
> +            assert(MediaType.text("xml").equals(answer) || MediaType.application("xml").equals(answer));
>              //make sure that the detector is resetting the stream
> -            assertEquals(MediaType.text("xml"), DETECTOR.detect(is, new Metadata()));
> +            answer = DETECTOR.detect(is, new Metadata());
> +            assert(MediaType.text("xml").equals(answer) || MediaType.application("xml").equals(answer));
>          }
>
>          //now try with TikaInputStream
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
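The "accept either answer" idea in the issue above can be sketched without any Tika classes. This is only an illustration under stated assumptions: the class name `XmlMediaTypeCheck` and the methods `baseType` and `isAcceptedXmlType` are hypothetical and not part of Tika, and the check works on plain MIME strings rather than on Tika's `MediaType` objects. It uses an explicit boolean check instead of the Java `assert` statement, because `assert` is only evaluated when the JVM runs with assertions enabled (`-ea`).

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Tika: decides whether a detected MIME
// string is one of the XML types that the file command may report.
class XmlMediaTypeCheck {

    // Both values are mentioned in the issue as valid answers for the same file.
    private static final Set<String> ACCEPTED_XML_TYPES =
            Set.of("text/xml", "application/xml");

    // Strips any parameters (for example "; charset=us-ascii") and normalizes case.
    static String baseType(String mime) {
        int semi = mime.indexOf(';');
        String base = semi >= 0 ? mime.substring(0, semi) : mime;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // Returns true when the detected type is any accepted XML type.
    static boolean isAcceptedXmlType(String detected) {
        return detected != null && ACCEPTED_XML_TYPES.contains(baseType(detected));
    }

    public static void main(String[] args) {
        System.out.println(isAcceptedXmlType("text/xml"));
        System.out.println(isAcceptedXmlType("application/xml; charset=us-ascii"));
        System.out.println(isAcceptedXmlType("text/plain"));
    }
}
```

Putting the check in one helper means the same condition can be reused for every detection call in a test, which would avoid the situation described in the comment where the fix was applied to one block but not repeated in a later one.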
[jira] [Created] (TIKA-3323) FileCommandDetectorTest inconsistent results depending on platform
Andrew Pavlin created TIKA-3323:
-----------------------------------

             Summary: FileCommandDetectorTest inconsistent results depending on platform
                 Key: TIKA-3323
                 URL: https://issues.apache.org/jira/browse/TIKA-3323
             Project: Tika
          Issue Type: Bug
          Components: core
    Affects Versions: 1.25
            Reporter: Andrew Pavlin


The unit test for org.apache.tika.detect.FileCommandDetector fails on some platforms due to inconsistent return values from the operating system's "file --mime" command. For example, on Fedora Core 32 and Mint 19, the test case returns "text/xml". However, on Oracle Enterprise Linux 6, the test case returns "application/xml", which is equally valid but causes the unit test case to fail. When the unit test case fails, it is impossible to build Tika. The unit test program should be fixed to accept either answer so that all dialects of the file command work successfully.

A proposed patch is included below:

diff --git a/tika-1.25/tika-core/src/test/java/org/apache/tika/detect/FileCommandDetectorTest.java b/tika-1.25/tika-core/src/test/java/org/apache/tika/detect/FileCommandDetectorTest.java
index 21a24ab..1911e05 100644
--- a/tika-1.25/tika-core/src/test/java/org/apache/tika/detect/FileCommandDetectorTest.java
+++ b/tika-1.25/tika-core/src/test/java/org/apache/tika/detect/FileCommandDetectorTest.java
@@ -44,9 +44,11 @@ public class FileCommandDetectorTest {
         assumeTrue(FileCommandDetector.checkHasFile());

         try (InputStream is = getClass().getResourceAsStream("/test-documents/basic_embedded.xml")) {
-            assertEquals(MediaType.text("xml"), DETECTOR.detect(is, new Metadata()));
+            MediaType answer = DETECTOR.detect(is, new Metadata());
+            assert(MediaType.text("xml").equals(answer) || MediaType.application("xml").equals(answer));
             //make sure that the detector is resetting the stream
-            assertEquals(MediaType.text("xml"), DETECTOR.detect(is, new Metadata()));
+            answer = DETECTOR.detect(is, new Metadata());
+            assert(MediaType.text("xml").equals(answer) || MediaType.application("xml").equals(answer));
         }

         //now try with TikaInputStream
[jira] [Commented] (TIKA-2854) upgrade out-of-date dependencies with outstanding CVEs
    [ https://issues.apache.org/jira/browse/TIKA-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824212#comment-16824212 ]

Andrew Pavlin commented on TIKA-2854:
-------------------------------------

Regarding your question on the ucar versions, I got those version numbers from the ucar.edu website. I am not sure why (at the time of my bug submission) they listed different version numbers for the different software modules, but even their example POM files use a common version number for netcdf, cdm, and grib. I may also have been reading an incorrect page, as they now say the current version is consistently 4.6.13 as of today.

> upgrade out-of-date dependencies with outstanding CVEs
> ------------------------------------------------------
>
>                 Key: TIKA-2854
>                 URL: https://issues.apache.org/jira/browse/TIKA-2854
>             Project: Tika
>          Issue Type: Bug
>          Components: languageidentifier, parser
>    Affects Versions: 1.20
>            Reporter: Andrew Pavlin
>            Priority: Major
>
> Besides the libraries reported in TIKA-2801 and TIKA-2835, the following 4th
> party dependencies are out-of-date and should be upgraded to the latest
> versions. The first three have outstanding CVEs which would be resolved by
> using the newer versions of those dependencies.
> jackson-databind (is 2.9.7, should be 2.9.8)
> guava (is 17.0, should be 27.0)
> sqlite-jdbc (is 3.25.2, should be 3.27.2.1)
> No current CVEs but still out-of-date:
> Apache commons-codec (is 1.11, should be 1.12)
> Apache CXF (is 3.2.7, should be 3.3.1)
> Apache httpcomponents (is 4.5.6, should be 4.5.8)
> Apache james mime4j (is 0.8.2, should be 0.8.3)
> Apache opennlp-tools (is 1.9.0, should be 1.9.1)
> parso (is 2.0.10, should be 2.0.11)
> jackson-annotations
> jackson-core
> jackcess (is 2.1.12, should be 3.0.0)
> jackcess-encrypt (is 2.1.4, should be 3.0.0)
> org.osgi.compendium (is 4.0.0, should be 5.0.0)
> org.osgi.core (is 4.0.0, should be 6.0.0)
> junrar (is 2.0.0, should be 4.0.0)
> java-libpst (is 0.8.1, should be 0.9.3)
> jna (is 5.1.0, should be 5.2.0)
> Bouncy Castle bcprov and bcmail (is 1.60, should be 1.61)
> slf4j-log4j12 (is 1.7.25, should be 1.7.26)
> UCAR cdm (is 4.5.5, should be 5.0.0)
> UCAR grib (is 4.5.5, should be 8.0.0)
> UCAR httpservices (is 4.5.5, should be 4.6.7)
> UCAR netcdf4 (incorrectly labeled as 4.5.5, should be 4.3.22)
> bndlib (is 1.50.0, should be 4.2.0)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (TIKA-2854) upgrade out-of-date dependencies with outstanding CVEs
    [ https://issues.apache.org/jira/browse/TIKA-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819398#comment-16819398 ]

Andrew Pavlin commented on TIKA-2854:
-------------------------------------

Correction: bndlib 4.2.0 isn't fully in Maven yet, so bndlib 3.5.0 would be acceptable (still better than the antique version in use now).
[jira] [Created] (TIKA-2854) upgrade out-of-date dependencies with outstanding CVEs
Andrew Pavlin created TIKA-2854:
-----------------------------------

             Summary: upgrade out-of-date dependencies with outstanding CVEs
                 Key: TIKA-2854
                 URL: https://issues.apache.org/jira/browse/TIKA-2854
             Project: Tika
          Issue Type: Bug
          Components: languageidentifier, parser
    Affects Versions: 1.20
            Reporter: Andrew Pavlin


Besides the libraries reported in TIKA-2801 and TIKA-2835, the following 4th party dependencies are out-of-date and should be upgraded to the latest versions. The first three have outstanding CVEs which would be resolved by using the newer versions of those dependencies.

jackson-databind (is 2.9.7, should be 2.9.8)
guava (is 17.0, should be 27.0)
sqlite-jdbc (is 3.25.2, should be 3.27.2.1)

No current CVEs but still out-of-date:

Apache commons-codec (is 1.11, should be 1.12)
Apache CXF (is 3.2.7, should be 3.3.1)
Apache httpcomponents (is 4.5.6, should be 4.5.8)
Apache james mime4j (is 0.8.2, should be 0.8.3)
Apache opennlp-tools (is 1.9.0, should be 1.9.1)
parso (is 2.0.10, should be 2.0.11)
jackson-annotations
jackson-core
jackcess (is 2.1.12, should be 3.0.0)
jackcess-encrypt (is 2.1.4, should be 3.0.0)
org.osgi.compendium (is 4.0.0, should be 5.0.0)
org.osgi.core (is 4.0.0, should be 6.0.0)
junrar (is 2.0.0, should be 4.0.0)
java-libpst (is 0.8.1, should be 0.9.3)
jna (is 5.1.0, should be 5.2.0)
Bouncy Castle bcprov and bcmail (is 1.60, should be 1.61)
slf4j-log4j12 (is 1.7.25, should be 1.7.26)
UCAR cdm (is 4.5.5, should be 5.0.0)
UCAR grib (is 4.5.5, should be 8.0.0)
UCAR httpservices (is 4.5.5, should be 4.6.7)
UCAR netcdf4 (incorrectly labeled as 4.5.5, should be 4.3.22)
bndlib (is 1.50.0, should be 4.2.0)
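The "is X, should be Y" pairs above are dotted version numbers with differing numbers of components (for example 3.25.2 against 3.27.2.1). As a rough illustration of how such pairs can be compared, here is a minimal sketch that compares versions component by component as integers. The class name `DottedVersion` is hypothetical, and the sketch assumes purely numeric components; it does not handle qualifiers such as `-SNAPSHOT` and is not how Maven itself orders versions.

```java
// Hypothetical helper for illustration only: compares purely numeric
// dotted version strings such as "2.0.10" and "2.0.11".
class DottedVersion {

    // Returns a negative number if a is older than b, zero if they are
    // equal, and a positive number if a is newer than b. Missing trailing
    // components are treated as 0, so "3.27.2" equals "3.27.2.0".
    static int compare(String a, String b) {
        String[] partsA = a.split("\\.");
        String[] partsB = b.split("\\.");
        int length = Math.max(partsA.length, partsB.length);
        for (int i = 0; i < length; i++) {
            int x = i < partsA.length ? Integer.parseInt(partsA[i]) : 0;
            int y = i < partsB.length ? Integer.parseInt(partsB[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Pairs taken from the list in the issue description.
        System.out.println(compare("2.0.10", "2.0.11") < 0);
        System.out.println(compare("3.25.2", "3.27.2.1") < 0);
        // The netcdf4 entry goes the other way: 4.5.5 is numerically newer than 4.3.22.
        System.out.println(compare("4.5.5", "4.3.22") > 0);
    }
}
```

Comparing components as integers rather than as text matters for entries like parso, where the last component has two digits.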
[jira] [Commented] (TIKA-2577) Sonatype Nexus Auditor is reporting that the Bouncy castle version used by Tika 1.17 is vulnerable
    [ https://issues.apache.org/jira/browse/TIKA-2577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653994#comment-16653994 ]

Andrew Pavlin commented on TIKA-2577:
-------------------------------------

I have to agree with the comment. The next build should include the latest BouncyCastle release, so as to avoid CVE issues. After all, just because Tika isn't using the vulnerable parts of BouncyCastle doesn't mean other parts of the application using Tika couldn't call the defective BouncyCastle code.

> Sonatype Nexus Auditor is reporting that the Bouncy castle version used by Tika 1.17 is vulnerable
> --------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2577
>                 URL: https://issues.apache.org/jira/browse/TIKA-2577
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.17
>            Reporter: Abhijit Rajwade
>            Priority: Major
>
> Sonatype Nexus Auditor is reporting that the Bouncy Castle version used by
> Tika 1.17 (tika-app-1.17.jar) is vulnerable.
> Here are the details of CVE-2016-1000341.
>
> *Explanation*
> {{BouncyCastle}} is vulnerable to a Timing Attack. The
> {{generateSignature()}} function in the {{DSASigner.java}} file allows the
> per message key (the {{k}} value in the DSA algorithm) to be predictable
> while generating DSA signatures. A remote attacker can exploit this
> vulnerability to determine the {{k}} value by closely observing the timings
> for the generation of signatures, allowing the attacker to deduce the
> signer's private key.
> Detection
> The application is vulnerable by using this component.
>
> *Recommendation*
> We recommend upgrading to a version of this component that is not vulnerable
> to this specific issue.
> Categories
> Data
>
> *Root Cause*
> tika-app-1.17.jar *<=* DSASigner.class : (, 1.56)
> tika-app-1.17.jar *<=* DSASigner.class : (,1.56)
> Advisories
> Third Party: [https://rdist.root.org/2010/11/19/dsa-requirements-for-rando...|https://rdist.root.org/2010/11/19/dsa-requirements-for-random-k-value/]
> Project: [https://www.bouncycastle.org/releasenotes.html]
>
> *Resolution*
> Refer to [https://www.bouncycastle.org/releasenotes.html]
> You can see that Bouncy Castle version 1.56 fixes CVE-2016-1000341.
> Recommend that Apache Tika upgrade Bouncy Castle to version 1.56 or later.
> --- Abhijit Rajwade
[jira] [Created] (TIKA-2595) If source build creates 6 jars, why aren't all 6 binaries available for download?
Andrew Pavlin created TIKA-2595:
-----------------------------------

             Summary: If source build creates 6 jars, why aren't all 6 binaries available for download?
                 Key: TIKA-2595
                 URL: https://issues.apache.org/jira/browse/TIKA-2595
             Project: Tika
          Issue Type: Bug
          Components: packaging
    Affects Versions: 1.17
            Reporter: Andrew Pavlin


My company would like to use Tika, but some of the libraries Tika includes may not be compatible with our licensing. We notice that the source build creates 6 different binary JAR files, but only 3 of them are available for download (the 2 largest ones, with the most extra baggage we don't need, and a test tool we don't need either). For various legal reasons, we can't create our own JAR files from source; we have to use the officially built ones from Apache.

Could the smaller binary JAR files (tika-core and tika-parsers) be made available for download?