[jira] [Commented] (TIKA-3323) FileCommandDetectorTest inconsistent results depending on platform

2021-03-22 Thread Andrew Pavlin (Jira)


[ https://issues.apache.org/jira/browse/TIKA-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306537#comment-17306537 ]

Andrew Pavlin commented on TIKA-3323:
-

I hope you noticed additional changes would be needed. In my original proposed 
fix, I forgot to repeat the fixes in lines 57-60.

> FileCommandDetectorTest inconsistent results depending on platform
> --
>
>                Key: TIKA-3323
>                URL: https://issues.apache.org/jira/browse/TIKA-3323
>            Project: Tika
>         Issue Type: Bug
>         Components: core
>   Affects Versions: 1.25
>           Reporter: Andrew Pavlin
>           Assignee: Tim Allison
>           Priority: Major
>            Fix For: 1.26
>
>
> The unit test for org.apache.tika.detect.FileCommandDetector fails on some 
> platforms due to inconsistent return values from the operating system's "file 
> --mime" command. For example, on Fedora Core 32 and Mint 19, the test case 
> returns "text/xml". However, on Oracle Enterprise Linux 6, the test case 
> returns "application/xml", which is equally valid but causes the unit test 
> case to fail. When the unit test case fails, it is impossible to build Tika.
> The unit test program should be fixed to accept either answer so all dialects 
> of the file command work successfully. A proposed patch is included below:
> {code}
> diff --git a/tika-1.25/tika-core/src/test/java/org/apache/tika/detect/FileCommandDetectorTest.java b/tika-1.25/tika-core/src/test/java/org/apache/tika/detect/FileCommandDetectorTest.java
> index 21a24ab..1911e05 100644
> --- a/tika-1.25/tika-core/src/test/java/org/apache/tika/detect/FileCommandDetectorTest.java
> +++ b/tika-1.25/tika-core/src/test/java/org/apache/tika/detect/FileCommandDetectorTest.java
> @@ -44,9 +44,11 @@ public class FileCommandDetectorTest {
>          assumeTrue(FileCommandDetector.checkHasFile());
>  
>          try (InputStream is = getClass().getResourceAsStream("/test-documents/basic_embedded.xml")) {
> -            assertEquals(MediaType.text("xml"), DETECTOR.detect(is, new Metadata()));
> +            MediaType answer = DETECTOR.detect(is, new Metadata());
> +            assert(MediaType.text("xml").equals(answer) || MediaType.application("xml").equals(answer));
>              //make sure that the detector is resetting the stream
> -            assertEquals(MediaType.text("xml"), DETECTOR.detect(is, new Metadata()));
> +            answer = DETECTOR.detect(is, new Metadata());
> +            assert(MediaType.text("xml").equals(answer) || MediaType.application("xml").equals(answer));
>          }
>  
>          //now try with TikaInputStream
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (TIKA-3323) FileCommandDetectorTest inconsistent results depending on platform

2021-03-15 Thread Andrew Pavlin (Jira)
Andrew Pavlin created TIKA-3323:
---

             Summary: FileCommandDetectorTest inconsistent results depending on platform
                 Key: TIKA-3323
                 URL: https://issues.apache.org/jira/browse/TIKA-3323
             Project: Tika
          Issue Type: Bug
          Components: core
    Affects Versions: 1.25
            Reporter: Andrew Pavlin


The unit test for org.apache.tika.detect.FileCommandDetector fails on some 
platforms due to inconsistent return values from the operating system's "file 
--mime" command. For example, on Fedora Core 32 and Mint 19, the test case 
returns "text/xml". However, on Oracle Enterprise Linux 6, the test case 
returns "application/xml", which is equally valid but causes the unit test case 
to fail. When the unit test case fails, it is impossible to build Tika.

The unit test program should be fixed to accept either answer so all dialects 
of the file command work successfully. A proposed patch is included below:

diff --git a/tika-1.25/tika-core/src/test/java/org/apache/tika/detect/FileCommandDetectorTest.java b/tika-1.25/tika-core/src/test/java/org/apache/tika/detect/FileCommandDetectorTest.java
index 21a24ab..1911e05 100644
--- a/tika-1.25/tika-core/src/test/java/org/apache/tika/detect/FileCommandDetectorTest.java
+++ b/tika-1.25/tika-core/src/test/java/org/apache/tika/detect/FileCommandDetectorTest.java
@@ -44,9 +44,11 @@ public class FileCommandDetectorTest {
         assumeTrue(FileCommandDetector.checkHasFile());
 
         try (InputStream is = getClass().getResourceAsStream("/test-documents/basic_embedded.xml")) {
-            assertEquals(MediaType.text("xml"), DETECTOR.detect(is, new Metadata()));
+            MediaType answer = DETECTOR.detect(is, new Metadata());
+            assert(MediaType.text("xml").equals(answer) || MediaType.application("xml").equals(answer));
             //make sure that the detector is resetting the stream
-            assertEquals(MediaType.text("xml"), DETECTOR.detect(is, new Metadata()));
+            answer = DETECTOR.detect(is, new Metadata());
+            assert(MediaType.text("xml").equals(answer) || MediaType.application("xml").equals(answer));
         }
 
         //now try with TikaInputStream
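
The idea behind the patch (accept either XML media type, whichever the local "file --mime" reports) can also be sketched independently of Tika. The helper below is hypothetical: the class name AcceptedXmlTypes and the method isAcceptedXml are illustrative only and are not part of Tika or of the proposed patch. It assumes the detected type is available in string form, such as "text/xml" or "application/xml; charset=us-ascii".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical helper (not part of Tika): decides whether a detected MIME
 * type is one of the XML types that different "file --mime" builds report.
 */
final class AcceptedXmlTypes {

    // Both values are valid answers for an XML document.
    private static final Set<String> ACCEPTED =
            new HashSet<>(Arrays.asList("text/xml", "application/xml"));

    private AcceptedXmlTypes() {
        // utility class, no instances
    }

    /**
     * Returns true if the base type (ignoring parameters such as charset,
     * surrounding whitespace, and letter case) is an accepted XML type.
     */
    static boolean isAcceptedXml(String detected) {
        if (detected == null) {
            return false;
        }
        int semicolon = detected.indexOf(';');
        String base = semicolon >= 0 ? detected.substring(0, semicolon) : detected;
        return ACCEPTED.contains(base.trim().toLowerCase(Locale.ROOT));
    }
}
```

In the unit test, each detection (including the TikaInputStream case further down in the test) could then be checked with JUnit's assertTrue on the string form of the returned MediaType, instead of the Java assert keyword. That way the check does not depend on JVM assertions being enabled, and the list of accepted types lives in one place.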





[jira] [Commented] (TIKA-2854) upgrade out-of-date dependencies with outstanding CVEs

2019-04-23 Thread Andrew Pavlin (JIRA)


[ https://issues.apache.org/jira/browse/TIKA-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824212#comment-16824212 ]

Andrew Pavlin commented on TIKA-2854:
-

Regarding your question on the ucar versions, I got those version numbers from 
the ucar.edu website. Not sure why (at the time of my bug submittal) they 
listed different version numbers for the different software modules, but even 
their examples for POM files show using common version numbers for netcdf, cdm, 
grib. I may also have been reading an incorrect page, as they now say the 
current version is consistently 4.6.13 as of today.

> upgrade out-of-date dependencies with outstanding CVEs
> --
>
>                Key: TIKA-2854
>                URL: https://issues.apache.org/jira/browse/TIKA-2854
>            Project: Tika
>         Issue Type: Bug
>         Components: languageidentifier, parser
>   Affects Versions: 1.20
>           Reporter: Andrew Pavlin
>           Priority: Major
>
> Besides the libraries reported in TIKA-2801 and TIKA-2835, the following 4th 
> party dependencies are out-of-date and should be upgraded to the latest 
> versions. The first three have outstanding CVEs which would be resolved by 
> using the newer versions of those dependencies.
> jackson-databind (is 2.9.7, should be 2.9.8)
> guava (is 17.0, should be 27.0)
> sqlite-jdbc (is 3.25.2, should be 3.27.2.1)
> No current CVEs but still out-of-date:
> Apache commons-codec (is 1.11, should be 1.12)
> Apache CXF (is 3.2.7, should be 3.3.1)
> Apache httpcomponents (is 4.5.6, should be 4.5.8)
> Apache james mime4j (is 0.8.2, should be 0.8.3)
> Apache opennlp-tools (is 1.9.0, should be 1.9.1)
> parso (is 2.0.10, should be 2.0.11)
> jackson-annotations
> jackson-core
> jackcess (is 2.1.12, should be 3.0.0)
> jackcess-encrypt (is 2.1.4, should be 3.0.0)
> org.osgi.compendium (is 4.0.0, should be 5.0.0)
> org.osgi.core (is 4.0.0, should be 6.0.0)
> junrar (is 2.0.0, should be 4.0.0)
> java-libpst (is 0.8.1, should be 0.9.3)
> jna (is 5.1.0, should be 5.2.0)
> Bouncy Castle bcprov and bcmail (is 1.60, should be 1.61)
> slf4j-log4j12 (is 1.7.25, should be 1.7.26)
> UCAR cdm (is 4.5.5, should be 5.0.0)
> UCAR grib (is 4.5.5, should be 8.0.0)
> UCAR httpservices (is 4.5.5, should be 4.6.7)
> UCAR netcdf4 (incorrectly labeled as 4.5.5, should be 4.3.22)
> bndlib (is 1.50.0, should be 4.2.0)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TIKA-2854) upgrade out-of-date dependencies with outstanding CVEs

2019-04-16 Thread Andrew Pavlin (JIRA)


[ https://issues.apache.org/jira/browse/TIKA-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819398#comment-16819398 ]

Andrew Pavlin commented on TIKA-2854:
-

Correction: bndlib 4.2.0 isn't fully in Maven yet, so bndlib 3.5.0 would be 
acceptable (still better than the antique version in use now).






[jira] [Created] (TIKA-2854) upgrade out-of-date dependencies with outstanding CVEs

2019-04-16 Thread Andrew Pavlin (JIRA)
Andrew Pavlin created TIKA-2854:
---

             Summary: upgrade out-of-date dependencies with outstanding CVEs
                 Key: TIKA-2854
                 URL: https://issues.apache.org/jira/browse/TIKA-2854
             Project: Tika
          Issue Type: Bug
          Components: languageidentifier, parser
    Affects Versions: 1.20
            Reporter: Andrew Pavlin


Besides the libraries reported in TIKA-2801 and TIKA-2835, the following 4th 
party dependencies are out-of-date and should be upgraded to the latest 
versions. The first three have outstanding CVEs which would be resolved by 
using the newer versions of those dependencies.

jackson-databind (is 2.9.7, should be 2.9.8)

guava (is 17.0, should be 27.0)

sqlite-jdbc (is 3.25.2, should be 3.27.2.1)

No current CVEs but still out-of-date:

Apache commons-codec (is 1.11, should be 1.12)

Apache CXF (is 3.2.7, should be 3.3.1)

Apache httpcomponents (is 4.5.6, should be 4.5.8)

Apache james mime4j (is 0.8.2, should be 0.8.3)

Apache opennlp-tools (is 1.9.0, should be 1.9.1)

parso (is 2.0.10, should be 2.0.11)

jackson-annotations

jackson-core

jackcess (is 2.1.12, should be 3.0.0)

jackcess-encrypt (is 2.1.4, should be 3.0.0)

org.osgi.compendium (is 4.0.0, should be 5.0.0)

org.osgi.core (is 4.0.0, should be 6.0.0)

junrar (is 2.0.0, should be 4.0.0)

java-libpst (is 0.8.1, should be 0.9.3)

jna (is 5.1.0, should be 5.2.0)

Bouncy Castle bcprov and bcmail (is 1.60, should be 1.61)

slf4j-log4j12 (is 1.7.25, should be 1.7.26)

UCAR cdm (is 4.5.5, should be 5.0.0)

UCAR grib (is 4.5.5, should be 8.0.0)

UCAR httpservices (is 4.5.5, should be 4.6.7)

UCAR netcdf4 (incorrectly labeled as 4.5.5, should be 4.3.22)

bndlib (is 1.50.0, should be 4.2.0)





[jira] [Commented] (TIKA-2577) Sonatype Nexus Auditor is reporting that the Bouncy castle version used by Tika 1.17 is vulnerable

2018-10-17 Thread Andrew Pavlin (JIRA)


[ https://issues.apache.org/jira/browse/TIKA-2577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653994#comment-16653994 ]

Andrew Pavlin commented on TIKA-2577:
-

I have to agree with the comment. Next build should include the latest 
BouncyCastle release, so as to avoid CVE issues. After all, just because Tika 
isn't using the vulnerable parts of BouncyCastle doesn't mean other parts of 
the application using Tika couldn't call the defective BouncyCastle code.

> Sonatype Nexus Auditor is reporting that the Bouncy castle version used by 
> Tika 1.17 is vulnerable
> --
>
>                Key: TIKA-2577
>                URL: https://issues.apache.org/jira/browse/TIKA-2577
>            Project: Tika
>         Issue Type: Bug
>   Affects Versions: 1.17
>           Reporter: Abhijit Rajwade
>           Priority: Major
>
> Sonatype Nexus Auditor is reporting that the Bouncy castle version used by 
> Tika 1.17 (tika-app-1.17.jar) is vulnerable.
> Here are the details of CVE-2016-1000341.
>  
> *Explanation*
> {{BouncyCastle}} is vulnerable to a Timing Attack. The 
> {{generateSignature()}} function in the {{DSASigner.java}} file allows the 
> per message key (the {{k}} value in the DSA algorithm) to be predictable 
> while generating DSA signatures. A remote attacker can exploit this 
> vulnerability to determine the {{k}} value by closely observing the timings 
> for the generation of signatures, allowing the attacker to deduce the 
> signer's private key.
> Detection
> The application is vulnerable by using this component.
>  
> *Recommendation*
> We recommend upgrading to a version of this component that is not vulnerable 
> to this specific issue.
> Categories
> Data
>  
> *Root Cause*
> tika-app-1.17.jar *<=* DSASigner.class : (, 1.56)
> tika-app-1.17.jar *<=* DSASigner.class : (,1.56)
> Advisories
> Third Party: [https://rdist.root.org/2010/11/19/dsa-requirements-for-random-k-value/]
> Project: [https://www.bouncycastle.org/releasenotes.html]
>  
> *Resolution*
> Refer to [https://www.bouncycastle.org/releasenotes.html]
> You can see that Bouncy Castle version 1.56 fixes CVE-2016-1000341.
> Recommend that Apache Tika upgrade Bouncy Castle to version 1.56 or later.
> --- Abhijit Rajwade
>  





[jira] [Created] (TIKA-2595) If source build creates 6 jars, why aren't all 6 binaries available for download?

2018-03-01 Thread Andrew Pavlin (JIRA)
Andrew Pavlin created TIKA-2595:
---

             Summary: If source build creates 6 jars, why aren't all 6 binaries available for download?
                 Key: TIKA-2595
                 URL: https://issues.apache.org/jira/browse/TIKA-2595
             Project: Tika
          Issue Type: Bug
          Components: packaging
    Affects Versions: 1.17
            Reporter: Andrew Pavlin


My company would like to use Tika, but some of the libraries Tika includes may 
not be compatible with our licensing. We notice that the source build creates 6 
different binary JAR files, but only 3 of them are available for download (the 
2 largest ones with the most extra baggage we don't need, and a test tool we 
don't need either). For various legal reasons, we can't create our own JAR 
files from source; we have to use the officially built ones from Apache. Could 
the smaller binary JAR files (tika-core and tika-parsers) be made available for 
download?


