[jira] [Commented] (TIKA-2676) After switching to TIKA 1.18 from 1.17 started to get exception

2018-09-19 Thread Slava G (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16620204#comment-16620204
 ] 

Slava G commented on TIKA-2676:
---

I wish I knew :) but I'm 99% sure it's coming from that area. I'll change the 
log configuration to get more information.

> After switching to TIKA 1.18 from 1.17 started to get exception
> ---
>
> Key: TIKA-2676
> URL: https://issues.apache.org/jira/browse/TIKA-2676
> Project: Tika
>  Issue Type: Bug
> Environment: CentOS 7 running on an Amazon EC2 I3.Xlarge instance, with Java 8 
> update 60.
>Reporter: Slava G
>Priority: Major
>
> I recently switched from TIKA 1.17 to TIKA 1.18 (I'm using Tika to parse 
> emails), and I started to get these exceptions:
> IllegalArgumentException: failed to parse:
>   at java.lang.IllegalArgumentException: failed to parse:
>   at java.awt.datatransfer.DataFlavor.<init>(DataFlavor.java:435)
>   at 
> javax.activation.ActivationDataFlavor.<init>(ActivationDataFlavor.java:81)
> I'm using AutoDetectParser.
> The DataFlavor constructor throws this exception when it catches a 
> MimeTypeParseException during initialization because it can't recognize the 
> MIME type, and indeed the mimeType printed in the log is not printable.
> This started to happen in the production environment. When I tried to 
> reproduce it on my workstation, it was not reproducible. Switching back to 
> TIKA 1.17 solved the issue. 
>  
> Thanks
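The exception chain above can be reproduced with the JDK alone. The sketch below is a minimal illustration, assuming (as the report suggests) that the string reaching `DataFlavor` is not a valid `type/subtype` MIME type; `MimeGuard`, `visible` and `isParseable` are hypothetical names, not part of Tika or the JavaBeans Activation Framework. `new DataFlavor(String, String)` wraps the internal `MimeTypeParseException` in an `IllegalArgumentException` whose message is `failed to parse:` followed by the raw input, so a value made only of control characters yields exactly the "unprintable" log line described. (The older description further down logs `failed to parse:#012`; `#012` looks like syslog's octal escape for a newline, which is why `"\n"` is one of the samples.)

```java
import java.awt.datatransfer.DataFlavor;

// Hypothetical diagnostic helper; not part of Tika or javax.activation.
public class MimeGuard {

    // Replaces control characters (0x00-0x1F and 0x7F) with a visible
    // four-digit hex escape so the raw value can be logged on one line.
    static String visible(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x20 || c == 0x7f) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Returns true if the JDK's DataFlavor accepts the string as a MIME type.
    static boolean isParseable(String mimeType) {
        try {
            new DataFlavor(mimeType, null);
            return true;
        } catch (IllegalArgumentException e) {
            // Among other cases, DataFlavor rethrows a MimeTypeParseException
            // as IllegalArgumentException("failed to parse:" + mimeType).
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {"message/rfc822", "\n", ""};
        for (String s : samples) {
            System.out.println("\"" + visible(s) + "\" parseable=" + isParseable(s));
        }
    }
}
```

Logging `visible(mimeType)` in a catch block, rather than the full stack trace, would be one low-volume way to capture the offending value in production.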



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (TIKA-2676) After switching to TIKA 1.18 from 1.17 started to get exception

2018-09-18 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619766#comment-16619766
 ] 

Tim Allison edited comment on TIKA-2676 at 9/18/18 9:39 PM:


Then how are you getting that exception?   :D  I don't think that could come 
from our code...I could be wrong


was (Author: talli...@mitre.org):
Then how are you getting that exception?! :P :D



[jira] [Commented] (TIKA-2676) After switching to TIKA 1.18 from 1.17 started to get exception

2018-09-18 Thread Slava G (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619727#comment-16619727
 ] 

Slava G commented on TIKA-2676:
---

Well, as far as I know, I'm not using ActivationDataFlavor directly in any part 
of this flow.



[jira] [Commented] (TIKA-2676) After switching to TIKA 1.18 from 1.17 started to get exception

2018-09-18 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619717#comment-16619717
 ] 

Tim Allison commented on TIKA-2676:
---

This is helpful...what are you using from that parse to initialize 
ActivationDataFlavor?



[jira] [Commented] (TIKA-2676) After switching to TIKA 1.18 from 1.17 started to get exception

2018-09-18 Thread Slava G (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619714#comment-16619714
 ] 

Slava G commented on TIKA-2676:
---

Well, the flow is as follows: we parse the MimeMessage and take just the body as 
a string. We convert it to a byte[] and then create a TikaInputStream using 
TikaInputStream.get(byte[]).

Then we call detect() on AutoDetectParser.getDetector() with the 
TikaInputStream and a null content type.

Once we have the content type, we call AutoDetectParser.parse() with the 
TikaInputStream and that content type.

 

As for 1.19, I'll give it a try :)

 

Thanks



[jira] [Commented] (TIKA-2676) After switching to TIKA 1.18 from 1.17 started to get exception

2018-09-18 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619701#comment-16619701
 ] 

Tim Allison commented on TIKA-2676:
---

{quote}We're passing null as a content type, in this case. All other cases 
where we do pass content type were working fine. I don't get your question 
{quote}
It looks from your initial description like the error is happening during the 
initialization of ActivationDataFlavor.  The MimeTypeParseException will occur 
when the mimetype that is passed in is invalid.  So, my question is: are you 
running Tika against a file and then trying to create an ActivationDataFlavor 
with a mime-type that you've gotten from Tika?  I'm trying to understand your 
workflow.
{quote}I'll ask from dev team to upgrade to 1.18 and run with extended logging
{quote}
1.19 is just out.  Give that a try. :D
{quote}Question are AutoDetectParser and other TIKA parsers in general are 
thread-safe ? as it could be that we have about 15 threads that each of then 
will have its own Parser
{quote}
Yes, absolutely.  If you find that they are not, let us know!!!  Our regression 
tests run multi-threaded on a single parser.



[jira] [Commented] (TIKA-2676) After switching to TIKA 1.18 from 1.17 started to get exception

2018-09-18 Thread Slava G (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619646#comment-16619646
 ] 

Slava G commented on TIKA-2676:
---

We're passing null as the content type in this case. All other cases where we 
do pass a content type were working fine. I don't understand your question: "Do 
you have a sense of what part of the output you're getting from Tika during the 
initialization of DataFlavor?"

I'll ask the dev team to upgrade to 1.18 and run with extended logging; 
hopefully I'll be able to provide more information soon. 

One question: are AutoDetectParser and TIKA parsers in general thread-safe? We 
may have about 15 threads, each of which will have its own Parser.

Thanks for your help, I really appreciate it!!!

Slava.

 



[jira] [Commented] (TIKA-2676) After switching to TIKA 1.18 from 1.17 started to get exception

2018-09-18 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619581#comment-16619581
 ] 

Tim Allison commented on TIKA-2676:
---

{quote}"email parsing" that's _some_ information at least. 
{quote}
Sorry, I should have read the original issue more closely...  I don't see 
anywhere in our code or mime4j that uses 
java.awt.datatransfer.MimeTypeParseException.  Do you have a sense of what part 
of the output you're getting from Tika during the initialization of DataFlavor? 
 Are you passing in, e.g., Metadata.CONTENT_TYPE, and that is somehow 
corrupted?  



[jira] [Commented] (TIKA-2676) After switching to TIKA 1.18 from 1.17 started to get exception

2018-09-18 Thread Slava G (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619160#comment-16619160
 ] 

Slava G commented on TIKA-2676:
---

I wish it were that simple, but the only version we have is mime4j 0.8.1.

And maybe I misled you: we take the MIME parts from emails and pass only strings 
to TIKA for parsing, so it's kind of "fake" mail parsing.

But maybe another binary is causing problems here; I still can't figure out 
which one is the troublemaker.



[jira] [Commented] (TIKA-2676) After switching to TIKA 1.18 from 1.17 started to get exception

2018-09-18 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16619122#comment-16619122
 ] 

Tim Allison commented on TIKA-2676:
---

From [~slavago]:
{quote}The problems is that issue not always reproducible, I can't point on 
some content or anything else.

Seems that it's somehow related to jars conflicts and order of class loading, 
but don't know what exactly, in version 1.17 everything works fine, in the 1.18 
we started to fail on many emails parsings. 
{quote}
"email parsing" that's _some_ information at least.  Once, when I upgraded Tika 
for Solr, I forgot to upgrade mime4j, and there was a binary incompatibility: 
SOLR-11622.  Perhaps you have an old version of mime4j somewhere on your 
classpath?
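One way to test the classpath-conflict theory is to ask the class loader for every copy of a class file it can see. The sketch below is a minimal, JDK-only illustration; `ClasspathProbe` and its methods are hypothetical, and the default resource (`javax/activation/ActivationDataFlavor.class`, taken from the stack trace) is only an example, since a mime4j class could be checked the same way. More than one URL for the same resource means duplicate copies, and which one wins depends on class-loading order; `loadedFrom` reports where an already-loaded class actually came from.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper; not part of Tika.
public class ClasspathProbe {

    // Every URL at which the resource is visible to the loader, in the
    // loader's search order. More than one entry means duplicate copies.
    static List<URL> locations(String resourcePath, ClassLoader loader) {
        try {
            return Collections.list(loader.getResources(resourcePath));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Where a loaded class actually came from; null for classes from the
    // bootstrap loader, which have no CodeSource.
    static URL loadedFrom(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return src == null ? null : src.getLocation();
    }

    public static void main(String[] args) {
        ClassLoader loader = ClasspathProbe.class.getClassLoader();
        // Resource paths use '/' separators and end in ".class".
        String[] resources = args.length > 0 ? args
                : new String[] {"javax/activation/ActivationDataFlavor.class"};
        for (String r : resources) {
            List<URL> urls = locations(r, loader);
            System.out.println(r + " found " + urls.size() + " time(s)");
            for (URL u : urls) {
                System.out.println("  " + u);
            }
        }
    }
}
```

On Java 8, which the report uses, `javax.activation` also ships inside the JDK itself, so an additional hit from an application jar would be worth a closer look.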



[jira] [Commented] (TIKA-2716) Sonatype Nexus auditor is reporting that spring framework version used by Tika 1.18 is vulnerable

2018-09-04 Thread Konstantin Gribov (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603335#comment-16603335
 ] 

Konstantin Gribov commented on TIKA-2716:
-

Won't Fix because {{spring-*}} is excluded from dependency tree now (see 
TIKA-2721)

> Sonatype Nexus auditor is reporting that spring framework version used by 
> Tika 1.18 is vulnerable
> -
>
> Key: TIKA-2716
> URL: https://issues.apache.org/jira/browse/TIKA-2716
> Project: Tika
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.18
>Reporter: Abhijit Rajwade
>Assignee: Konstantin Gribov
>Priority: Major
> Fix For: 2.0, 1.19
>
>
> Sonatype Nexus auditor is reporting that spring framework version used by 
> Apache Tika 1.18 is vulnerable. The recommendation is to upgrade to a 
> non-vulnerable version of Spring framework - 4.3.15/later or 5.0.5/later
>  
> Refer to the following details:
>  
> Issue 
> [CVE-2018-1270|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2018-1270]
>  
> Source National Vulnerability Database
>  
> Severity
> CVE CVSS 3.0: 9.8
> CVE CVSS 2.0: 7.5
> Sonatype CVSS 3.0: 9.8
>  
> Weakness
> CVE CWE: [358|https://cwe.mitre.org/data/definitions/358.html]
>  
> Description from CVE
> Spring Framework, versions 5.0 prior to 5.0.5 and versions 4.3 prior to 
> 4.3.15 and older unsupported versions, allow applications to expose STOMP 
> over WebSocket endpoints with a simple, in-memory STOMP broker through the 
> spring-messaging module. A malicious user (or attacker) can craft a message 
> to the broker that can lead to a remote code execution attack.
> Explanation
> The Spring Framework {{spring-messaging}} module is vulnerable to Remote Code 
> Execution (RCE). The {{getMethods()}} method in the 
> {{ReflectiveMethodResolver}} class, the {{canWrite}} method in the 
> {{ReflectivePropertyAccessor}} class, and the {{filterSubscriptions()}} 
> method in the {{DefaultSubscriptionRegistry}} class do not properly restrict 
> SpEL expression evaluation. A remote attacker can exploit this vulnerability 
> by crafting a request to an exposed STOMP endpoint and injecting a malicious 
> payload into the {{selector}} header. The application would then execute the 
> payload via a call to {{expression.getValue()}} whenever a new message is 
> sent to the broker.
>  
> Detection
> The application is vulnerable by using this component.
>  
> Recommendation
> We recommend upgrading to a version of this component that is not vulnerable 
> to this specific issue.
> Categories
> Data
> Root Cause
> tika-app-1.18.jar *<=* ReflectivePropertyAccessor.class : [3.0.0.RELEASE , 
> 4.3.15.RELEASE)
> tika-app-1.18.jar *<=* ReflectiveMethodResolver.class : [3.0.0.RELEASE , 
> 4.3.15.RELEASE)
>  
> Advisories
> Attack: [http://www.polaris-lab.com/index.php/archives/501/]
> Attack: 
> [https://chybeta.github.io/2018/04/07/spring-messaging-Remote...|https://chybeta.github.io/2018/04/07/spring-messaging-Remote-Code-Execution-%E5%88%86%E6%9E%90-%E3%80%90CVE-2018-1270%E3%80%91/]
> Project: [https://jira.spring.io/browse/SPR-16588]
>  





[jira] [Closed] (TIKA-2716) Sonatype Nexus auditor is reporting that spring framework version used by Tika 1.18 is vulnerable

2018-09-04 Thread Konstantin Gribov (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Gribov closed TIKA-2716.
---
   Resolution: Won't Fix
 Assignee: Konstantin Gribov
Fix Version/s: 1.19
   2.0



[jira] [Created] (TIKA-2716) Sonatype Nexus auditor is reporting that spring framework version used by Tika 1.18 is vulnerable

2018-08-22 Thread Abhijit Rajwade (JIRA)
Abhijit Rajwade created TIKA-2716:
-

 Summary: Sonatype Nexus auditor is reporting that spring framework 
version used by Tika 1.18 is vulnerable
 Key: TIKA-2716
 URL: https://issues.apache.org/jira/browse/TIKA-2716
 Project: Tika
  Issue Type: Bug
  Components: core
Affects Versions: 1.18
Reporter: Abhijit Rajwade


Sonatype Nexus auditor is reporting that spring framework version used by 
Apache Tika 1.18 is vulnerable. The recommendation is to upgrade to a 
non-vulnerable version of Spring framework - 4.3.15/later or 5.0.5/later
 
Refer to the following details:
 
Issue 
[CVE-2018-1270|http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2018-1270]
 
Source National Vulnerability Database
 
Severity
CVE CVSS 3.0: 9.8
CVE CVSS 2.0: 7.5
Sonatype CVSS 3.0: 9.8
 
Weakness
CVE CWE: [358|https://cwe.mitre.org/data/definitions/358.html]
 
Description from CVE
Spring Framework, versions 5.0 prior to 5.0.5 and versions 4.3 prior to 4.3.15 
and older unsupported versions, allow applications to expose STOMP over 
WebSocket endpoints with a simple, in-memory STOMP broker through the 
spring-messaging module. A malicious user (or attacker) can craft a message to 
the broker that can lead to a remote code execution attack.
Explanation
The Spring Framework {{spring-messaging}} module is vulnerable to Remote Code 
Execution (RCE). The {{getMethods()}} method in the 
{{ReflectiveMethodResolver}} class, the {{canWrite}} method in the 
{{ReflectivePropertyAccessor}} class, and the {{filterSubscriptions()}} method 
in the {{DefaultSubscriptionRegistry}} class do not properly restrict SpEL 
expression evaluation. A remote attacker can exploit this vulnerability by 
crafting a request to an exposed STOMP endpoint and injecting a malicious 
payload into the {{selector}} header. The application would then execute the 
payload via a call to {{expression.getValue()}} whenever a new message is sent 
to the broker.
 
Detection
The application is vulnerable by using this component.
 
Recommendation
We recommend upgrading to a version of this component that is not vulnerable to 
this specific issue.
Categories
Data
Root Cause
tika-app-1.18.jar *<=* ReflectivePropertyAccessor.class : [3.0.0.RELEASE , 
4.3.15.RELEASE)
tika-app-1.18.jar *<=* ReflectiveMethodResolver.class : [3.0.0.RELEASE , 
4.3.15.RELEASE)
 
Advisories
Attack: [http://www.polaris-lab.com/index.php/archives/501/]
Attack: 
[https://chybeta.github.io/2018/04/07/spring-messaging-Remote...|https://chybeta.github.io/2018/04/07/spring-messaging-Remote-Code-Execution-%E5%88%86%E6%9E%90-%E3%80%90CVE-2018-1270%E3%80%91/]
Project: [https://jira.spring.io/browse/SPR-16588]
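The Root Cause entries use Maven-style interval notation: `[3.0.0.RELEASE , 4.3.15.RELEASE)` covers every version from 3.0.0 inclusive up to, but not including, 4.3.15. A minimal sketch of that membership check follows, assuming plain dotted numeric versions and simply ignoring qualifiers such as `RELEASE`; `VersionRange` is a hypothetical helper, not Maven's or Sonatype's implementation (Maven's own `ComparableVersion` handles qualifiers far more carefully).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: checks membership in a half-open range [low, high).
public class VersionRange {

    // Parses "4.3.15.RELEASE" into [4, 3, 15]; non-numeric parts are ignored.
    static List<Integer> parse(String version) {
        List<Integer> parts = new ArrayList<>();
        for (String p : version.split("\\.")) {
            if (p.matches("\\d+")) {
                parts.add(Integer.parseInt(p));
            }
        }
        return parts;
    }

    // Compares numerically, treating missing trailing parts as 0 (5.0 == 5.0.0).
    static int compare(String a, String b) {
        List<Integer> x = parse(a), y = parse(b);
        for (int i = 0; i < Math.max(x.size(), y.size()); i++) {
            int xi = i < x.size() ? x.get(i) : 0;
            int yi = i < y.size() ? y.get(i) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // True if low <= version < high, i.e. version lies in [low, high).
    static boolean inRange(String version, String low, String high) {
        return compare(version, low) >= 0 && compare(version, high) < 0;
    }

    public static void main(String[] args) {
        String low = "3.0.0.RELEASE", high = "4.3.15.RELEASE";
        for (String v : new String[] {"4.3.14.RELEASE", "4.3.15.RELEASE", "5.0.5.RELEASE"}) {
            System.out.println(v + " in range: " + inRange(v, low, high));
        }
    }
}
```

The CVE text also names 5.0.x releases before 5.0.5, which this single range does not cover; checking those would need a second range, `[5.0.0, 5.0.5)`.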
 





[jira] [Updated] (TIKA-2676) After switching to TIKA 1.18 from 1.17 started to get exception

2018-06-20 Thread Slava G (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slava G updated TIKA-2676:
--
Summary: After switching to TIKA 1.18 from 1.17 started to get exception  
(was: After switching to TIKA 1.18 from !.17 started to get exception)



[jira] [Commented] (TIKA-2676) After switching to TIKA 1.18 from !.17 started to get exception

2018-06-20 Thread Slava G (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518427#comment-16518427
 ] 

Slava G commented on TIKA-2676:
---

Hi Tim,

I would like to share more of the stack trace, but here's the problem: if I turn 
on printing of the entire stack trace, it will blow up my log and kill the 
production environment, since this exception is thrown in massive numbers, so 
this is all I have. Our production environment is currently running TIKA 1.17 
to avoid the issue, and it's not reproducible on dev workstations; even in the 
AWS cloud it's almost impossible to reproduce. I know it sounds very strange, 
sorry about that, but this is the current situation. Our dev team is trying to 
identify and reproduce the issue, but it could take some time.

Thanks.

> After switching to TIKA 1.18 from !.17 started to get exception
> ---
>
> Key: TIKA-2676
> URL: https://issues.apache.org/jira/browse/TIKA-2676
> Project: Tika
>  Issue Type: Bug
> Environment: CentOS 7 running on Amazon EC2 I3.Xlarge. With JAVA 8 
> update 60.
>Reporter: Slava G
>Priority: Major
>
> I recently switched from TIKA 1.17 to TIKA 1.18 (I'm using tika to parse 
> emails).
> And I started to get exceptions:
> IllegalArgumentException: failed to parse:
>   at java.lang.IllegalArgumentException: failed to parse:
>   at java.awt.datatransfer.DataFlavor.<init>(DataFlavor.java:435)
>   at javax.activation.ActivationDataFlavor.<init>(ActivationDataFlavor.java:81)
> I'm using AutoDetectParser.
> The DataFlavor constructor throws an exception when it catches a 
> MimeTypeParseException during initialization because it can't recognize the 
> MIME type, and indeed the mimeType printed in the log is unprintable.
> This started to happen in the production environment. When I tried to 
> reproduce it on my workstation it was not reproducible; switching back to 
> TIKA 1.17 solved the issue.
>  
> Thanks





[jira] [Updated] (TIKA-2676) After switching to TIKA 1.18 from 1.17 started to get exception

2018-06-20 Thread Slava G (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Slava G updated TIKA-2676:
--
Description: 
I recently switched from TIKA 1.17 to TIKA 1.18 (I'm using tika to parse 
emails).

And I started to get exceptions:

IllegalArgumentException: failed to parse:
  at java.lang.IllegalArgumentException: failed to parse:
  at java.awt.datatransfer.DataFlavor.<init>(DataFlavor.java:435)
  at javax.activation.ActivationDataFlavor.<init>(ActivationDataFlavor.java:81)

I'm using AutoDetectParser.

The DataFlavor constructor throws an exception when it catches a 
MimeTypeParseException during initialization because it can't recognize the 
MIME type, and indeed the mimeType printed in the log is unprintable.

This started to happen in the production environment. When I tried to 
reproduce it on my workstation it was not reproducible; switching back to 
TIKA 1.17 solved the issue.

 

Thanks

  was:
I recently switched from TIKA 1.17 to TIKA 1.18 (I'm using tika to parse 
emails).

And I started to get exceptions:

IllegalArgumentException: failed to parse:#012
 at java.lang.IllegalArgumentException: failed to parse:
 at java.awt.datatransfer.DataFlavor.<init>(DataFlavor.java:435)
 at javax.activation.ActivationDataFlavor.<init>(ActivationDataFlavor.java:81)

I'm using AutoDetectParser.

The DataFlavor constructor throws an exception when it catches a 
MimeTypeParseException during initialization because it can't recognize the 
MIME type, and indeed the mimeType printed in the log is unprintable.

This started to happen in the production environment. When I tried to 
reproduce it on my workstation it was not reproducible; switching back to 
TIKA 1.17 solved the issue.

 

Thanks


> After switching to TIKA 1.18 from 1.17 started to get exception
> ---
>
> Key: TIKA-2676
> URL: https://issues.apache.org/jira/browse/TIKA-2676
> Project: Tika
>  Issue Type: Bug
> Environment: CentOS 7 running on Amazon EC2 I3.Xlarge. With JAVA 8 
> update 60.
>Reporter: Slava G
>Priority: Major
>
> I recently switched from TIKA 1.17 to TIKA 1.18 (I'm using tika to parse 
> emails).
> And I started to get exceptions:
> IllegalArgumentException: failed to parse:
>   at java.lang.IllegalArgumentException: failed to parse:
>   at java.awt.datatransfer.DataFlavor.<init>(DataFlavor.java:435)
>   at javax.activation.ActivationDataFlavor.<init>(ActivationDataFlavor.java:81)
> I'm using AutoDetectParser.
> The DataFlavor constructor throws an exception when it catches a 
> MimeTypeParseException during initialization because it can't recognize the 
> MIME type, and indeed the mimeType printed in the log is unprintable.
> This started to happen in the production environment. When I tried to 
> reproduce it on my workstation it was not reproducible; switching back to 
> TIKA 1.17 solved the issue.
>  
> Thanks





[jira] [Commented] (TIKA-2676) After switching to TIKA 1.18 from 1.17 started to get exception

2018-06-20 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518406#comment-16518406
 ] 

Tim Allison commented on TIKA-2676:
---

Thank you for raising this.  Can you share more of the stacktrace?  The only 
place we use DataFlavor is in our gui.
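When only a partial trace can be captured, the most useful frame is usually the first one outside the JDK, because it shows which code called `DataFlavor`. Below is a minimal sketch. It assumes that frames from `java.`, `javax.`, `sun.`, `com.sun.` and `jdk.` packages count as platform code. `CallerFinder` is a hypothetical helper, not part of Tika.

```java
import java.util.Optional;

/**
 * Hypothetical sketch: returns the first stack frame that does not belong to a
 * "platform" package, i.e. the application or library code that called into
 * the JDK. The prefix list is an assumption; extend it as needed.
 */
final class CallerFinder {

    private static final String[] PLATFORM_PREFIXES = {
        "java.", "javax.", "sun.", "com.sun.", "jdk."
    };

    private CallerFinder() {}

    /** First frame of t whose class is not in a platform package, if any. */
    static Optional<StackTraceElement> firstNonPlatformFrame(Throwable t) {
        for (StackTraceElement frame : t.getStackTrace()) {
            if (!isPlatform(frame.getClassName())) {
                return Optional.of(frame);
            }
        }
        return Optional.empty();
    }

    private static boolean isPlatform(String className) {
        for (String prefix : PLATFORM_PREFIXES) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Logging just this one frame per failure might show whether the call originates in Tika or in another library on the classpath, without the volume of a full trace.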

> After switching to TIKA 1.18 from 1.17 started to get exception
> ---
>
> Key: TIKA-2676
> URL: https://issues.apache.org/jira/browse/TIKA-2676
> Project: Tika
>  Issue Type: Bug
> Environment: CentOS 7 running on Amazon EC2 I3.Xlarge. With JAVA 8 
> update 60.
>Reporter: Slava G
>Priority: Major
>
> I recently switched from TIKA 1.17 to TIKA 1.18 (I'm using tika to parse 
> emails).
> And I started to get exceptions:
> IllegalArgumentException: failed to parse:#012
>  at java.lang.IllegalArgumentException: failed to parse:
>  at java.awt.datatransfer.DataFlavor.<init>(DataFlavor.java:435)
>  at javax.activation.ActivationDataFlavor.<init>(ActivationDataFlavor.java:81)
> I'm using AutoDetectParser.
> The DataFlavor constructor throws an exception when it catches a 
> MimeTypeParseException during initialization because it can't recognize the 
> MIME type, and indeed the mimeType printed in the log is unprintable.
> This started to happen in the production environment. When I tried to 
> reproduce it on my workstation it was not reproducible; switching back to 
> TIKA 1.17 solved the issue.
>  
> Thanks





[jira] [Created] (TIKA-2676) After switching to TIKA 1.18 from 1.17 started to get exception

2018-06-20 Thread Slava G (JIRA)
Slava G created TIKA-2676:
-

 Summary: After switching to TIKA 1.18 from 1.17 started to get exception
 Key: TIKA-2676
 URL: https://issues.apache.org/jira/browse/TIKA-2676
 Project: Tika
  Issue Type: Bug
 Environment: CentOS 7 running on Amazon EC2 I3.Xlarge. With JAVA 8 
update 60.
Reporter: Slava G


I recently switched from TIKA 1.17 to TIKA 1.18 (I'm using tika to parse 
emails).

And I started to get exceptions:

IllegalArgumentException: failed to parse:#012
 at java.lang.IllegalArgumentException: failed to parse:
 at java.awt.datatransfer.DataFlavor.<init>(DataFlavor.java:435)
 at javax.activation.ActivationDataFlavor.<init>(ActivationDataFlavor.java:81)

I'm using AutoDetectParser.

The DataFlavor constructor throws an exception when it catches a 
MimeTypeParseException during initialization because it can't recognize the 
MIME type, and indeed the mimeType printed in the log is unprintable.

This started to happen in the production environment. When I tried to 
reproduce it on my workstation it was not reproducible; switching back to 
TIKA 1.17 solved the issue.

 

Thanks
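The failure described above can be sketched with the JDK alone. The sketch assumes the JDK behaviour seen in this trace: `DataFlavor(String mimeType, String humanPresentableName)` wraps a `MimeTypeParseException` in `IllegalArgumentException("failed to parse:" + mimeType)`. `FlavorGuard` is a hypothetical caller-side guard, not Tika code. `javax.activation` is left out because newer JDKs no longer bundle it. The newline input only stands in for the unprintable value from the log; the real value is unknown.

```java
import java.awt.datatransfer.DataFlavor;
import java.util.Optional;

/**
 * Hypothetical guard: lets a caller skip an unparseable MIME string instead of
 * failing with IllegalArgumentException("failed to parse:...").
 */
final class FlavorGuard {

    private FlavorGuard() {}

    /** Returns a DataFlavor for mimeType, or empty if the JDK rejects the string. */
    static Optional<DataFlavor> tryCreate(String mimeType) {
        if (mimeType == null) {
            return Optional.empty();
        }
        try {
            // A null human-presentable name is derived from the MIME type.
            return Optional.of(new DataFlavor(mimeType, null));
        } catch (IllegalArgumentException e) {
            // Covers both "failed to parse:..." and an unloadable class= parameter.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCreate("text/plain").isPresent()); // true
        System.out.println(tryCreate("\n").isPresent());          // false: no '/' or ';' in the string
    }
}
```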





Fwd: [ANNOUNCE] Apache Tika 1.18 released

2018-04-25 Thread Tim Allison
The Apache Tika project is pleased to announce the release of Apache Tika
1.18.

The release contents have been pushed out to the main Apache release site
and to the Maven Central sync, so the releases should be available as soon
as the mirrors get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.18 contains a number of improvements as well as security and
bug fixes.

Details can be found in the changes file:

http://www.apache.org/dist/tika/CHANGES-1.18.txt

Apache Tika is available on the download page: http://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2 from
the Central Repository: http://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.

When downloading from a mirror site, please remember to verify the
downloads using signatures found on the Apache site:
http://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
http://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[RESULT] [VOTE] Release Apache Tika 1.18 Candidate #3

2018-04-24 Thread talli...@apache.org
The vote passed with +1 from:

Tim Allison
Dave Meikle
Oleg Tikhonov

and no -1 votes.

Many thanks to all!

I'll run the dist/announce process tomorrow.

Cheers,

          Tim


From: David Meikle [mailto:loo...@gmail.com] 
Sent: Tuesday, April 24, 2018 1:27 PM
To: dev@tika.apache.org
Subject: Re: [VOTE] Release Apache Tika 1.18 Candidate #3

Hi Tim

It looks like something to do with my Tesseract setup on my Mac

It is all working perfectly on my Windows and Linux machines, so +1 from me on 
the release too.

Cheers,
Dave

On 24 April 2018 at 13:29, Allison, Timothy B. <talli...@mitre.org> wrote:

>  
> Hi Dave,
> 
> Let us know what you find.  I had a successful build w and w/o tesseract on 
> both Windows and Linux.
> 
> I did get a failed build if I wasn't connected to the internet because the 
> URL exception in DL4JInceptionV3NetTest didn't have the expected message.  
> I've since fixed this in master and branch_1x.
> 
> If that's worth a respin, I can do that.
> 
> If you find something else (problems w diff version of tesseract, diff os, 
> diff java version, etc), let us know.
> 
> Cheers,
> 
>            Tim
> 
>  
>  
> 
> -Original Message-
> From: David Meikle [mailto:loo...@gmail.com] 
> Sent: Monday, April 23, 2018 7:07 PM
> To: dev@tika.apache.org
> Subject: Re: [VOTE] Release Apache Tika 1.18 Candidate #3
> 
> Hey Tim,
> 
> Just started looking at this and got an error with Tesseract enabled. Will 
> try to see if it is localised to me or not.
> 
> Cheers,
> Dave
> 
> On 22 April 2018 at 13:29, Oleg Tikhonov <o...@apache.org> wrote:
> 
>> Sorry for the noise.
>>
>> tar'ed
>>
>> On Sun, Apr 22, 2018 at 3:07 PM, Oleg Tikhonov <o...@apache.org> wrote:
>>
>>> My bad. This one, hopefully ...
>>>
>>>
>>> On Sun, Apr 22, 2018 at 3:01 PM, Oleg Tikhonov <o...@apache.org> wrote:
>>>
>>>> Hi,
>>>> thanks a lot.
>>>> [x] +1 Release this package as Apache Tika 1.18
>>>>
>>>> Even did a security scan:
>>>> mvn org.owasp:dependency-check-maven:3.1.2:check
>>>>
>>>> Report is attached.
>>>>
>>>> Best regards,
>>>> Oleg
>>>>
>>>>
>>>> On Sat, Apr 21, 2018 at 12:54 AM, talli...@apache.org < 
>>>> talli...@apache.org> wrote:
>>>>
>>>>> All,
>>>>> A candidate for the Tika 1.18 release is available at:
>>>>> https://dist.apache.org/repos/dist/dev/tika/
>>>>> The release candidate is a zip archive of the sources in:
>>>>> https://github.com/apache/tika/tree/1.18-rc3
>>>>> The SHA-512 checksum of the archive is    f69ee27b31cf7bcb1eaf114b93c23d
>>>>> d85b974356cc7e6e265b1c9366a11d711a3341e690f5b452a3e8b0c5cc6f
>>>>> 5839db01b3ef6ec3a2a29ffcd332ff7a63dcf3.
>>>>> In addition, a staged maven repository is available here:
>>>>>  https://repository.apache.org/content/repositories/orgapachetika-1033
>>>>> Please vote on releasing this package as Apache Tika 1.18. The
>>>>> vote is open for the next 72 hours and passes if a majority of
>>>>> at least three +1 Tika PMC votes are cast.
>>>>> [ ] +1 Release this package as Apache Tika 1.18
>>>>> [ ] -1 Do not release this package because...
>>>>> +1 from me; third time's the charm...
>>>>> Cheers,
>>>>>             Tim
>>>>
>>>>
>>>>
>>>
>>
> 
> 
> 


 






Re: [VOTE] Release Apache Tika 1.18 Candidate #3

2018-04-24 Thread David Meikle
Hi Tim

It looks like something to do with my Tesseract setup on my Mac


It is all working perfectly on my Windows and Linux machines, so +1 from me
on the release too.

Cheers,
Dave


On 24 April 2018 at 13:29, Allison, Timothy B. <talli...@mitre.org> wrote:

> Hi Dave,
>
> Let us know what you find.  I had a successful build w and w/o tesseract
> on both Windows and Linux.
>
> I did get a failed build if I wasn't connected to the internet because the
> URL exception in DL4JInceptionV3NetTest didn't have the expected message.
> I've since fixed this in master and branch_1x.
>
> If that's worth a respin, I can do that.
>
> If you find something else (problems w diff version of tesseract, diff os,
> diff java version, etc), let us know.
>
> Cheers,
>
>Tim
>
> -Original Message-
> From: David Meikle [mailto:loo...@gmail.com]
> Sent: Monday, April 23, 2018 7:07 PM
> To: dev@tika.apache.org
> Subject: Re: [VOTE] Release Apache Tika 1.18 Candidate #3
>
> Hey Tim,
>
> Just started looking at this and got an error with Tesseract enabled. Will
> try to see if it is localised to me or not.
>
> Cheers,
> Dave
>
> On 22 April 2018 at 13:29, Oleg Tikhonov <o...@apache.org> wrote:
>
> > Sorry for the noise.
> >
> > tar'ed
> >
> > On Sun, Apr 22, 2018 at 3:07 PM, Oleg Tikhonov <o...@apache.org> wrote:
> >
> >> My bad. This one, hopefully ...
> >>
> >>
> >> On Sun, Apr 22, 2018 at 3:01 PM, Oleg Tikhonov <o...@apache.org> wrote:
> >>
> >>> Hi,
> >>> thanks a lot.
> >>> [x] +1 Release this package as Apache Tika 1.18
> >>>
> >>> Even did a security scan:
> >>> mvn org.owasp:dependency-check-maven:3.1.2:check
> >>>
> >>> Report is attached.
> >>>
> >>> Best regards,
> >>> Oleg
> >>>
> >>>
> >>> On Sat, Apr 21, 2018 at 12:54 AM, talli...@apache.org <
> >>> talli...@apache.org> wrote:
> >>>
> >>>> All,
> >>>> A candidate for the Tika 1.18 release is available at:
> >>>> https://dist.apache.org/repos/dist/dev/tika/
> >>>> The release candidate is a zip archive of the sources in:
> >>>> https://github.com/apache/tika/tree/1.18-rc3
> >>>> The SHA-512 checksum of the archive is f69ee27b31cf7bcb1eaf114b93c23d
> >>>> d85b974356cc7e6e265b1c9366a11d711a3341e690f5b452a3e8b0c5cc6f
> >>>> 5839db01b3ef6ec3a2a29ffcd332ff7a63dcf3.
> >>>> In addition, a staged maven repository is available here:
> >>>> https://repository.apache.org/content/repositories/orgapachetika-1033
> >>>> Please vote on releasing this package as Apache Tika 1.18. The
> >>>> vote is open for the next 72 hours and passes if a majority of
> >>>> at least three +1 Tika PMC votes are cast.
> >>>> [ ] +1 Release this package as Apache Tika 1.18
> >>>> [ ] -1 Do not release this package because...
> >>>> +1 from me; third time's the charm...
> >>>> Cheers,
> >>>> Tim
> >>>
> >>>
> >>>
> >>
> >
>


RE: [VOTE] Release Apache Tika 1.18 Candidate #3

2018-04-24 Thread Allison, Timothy B.
Hi Dave,

Let us know what you find.  I had a successful build w and w/o tesseract on 
both Windows and Linux.

I did get a failed build if I wasn't connected to the internet because the URL 
exception in DL4JInceptionV3NetTest didn't have the expected message.  I've 
since fixed this in master and branch_1x.

If that's worth a respin, I can do that.

If you find something else (problems w diff version of tesseract, diff os, diff 
java version, etc), let us know.

Cheers,

   Tim

-Original Message-
From: David Meikle [mailto:loo...@gmail.com] 
Sent: Monday, April 23, 2018 7:07 PM
To: dev@tika.apache.org
Subject: Re: [VOTE] Release Apache Tika 1.18 Candidate #3

Hey Tim,

Just started looking at this and got an error with Tesseract enabled. Will try 
to see if it is localised to me or not.

Cheers,
Dave

On 22 April 2018 at 13:29, Oleg Tikhonov <o...@apache.org> wrote:

> Sorry for the noise.
>
> tar'ed
>
> On Sun, Apr 22, 2018 at 3:07 PM, Oleg Tikhonov <o...@apache.org> wrote:
>
>> My bad. This one, hopefully ...
>>
>>
>> On Sun, Apr 22, 2018 at 3:01 PM, Oleg Tikhonov <o...@apache.org> wrote:
>>
>>> Hi,
>>> thanks a lot.
>>> [x] +1 Release this package as Apache Tika 1.18
>>>
>>> Even did a security scan:
>>> mvn org.owasp:dependency-check-maven:3.1.2:check
>>>
>>> Report is attached.
>>>
>>> Best regards,
>>> Oleg
>>>
>>>
>>> On Sat, Apr 21, 2018 at 12:54 AM, talli...@apache.org < 
>>> talli...@apache.org> wrote:
>>>
>>>> All,
>>>> A candidate for the Tika 1.18 release is available at:
>>>> https://dist.apache.org/repos/dist/dev/tika/
>>>> The release candidate is a zip archive of the sources in:
>>>> https://github.com/apache/tika/tree/1.18-rc3
>>>> The SHA-512 checksum of the archive is f69ee27b31cf7bcb1eaf114b93c23d
>>>> d85b974356cc7e6e265b1c9366a11d711a3341e690f5b452a3e8b0c5cc6f
>>>> 5839db01b3ef6ec3a2a29ffcd332ff7a63dcf3.
>>>> In addition, a staged maven repository is available here:
>>>> https://repository.apache.org/content/repositories/orgapachetika-1033
>>>> Please vote on releasing this package as Apache Tika 1.18. The
>>>> vote is open for the next 72 hours and passes if a majority of
>>>> at least three +1 Tika PMC votes are cast.
>>>> [ ] +1 Release this package as Apache Tika 1.18
>>>> [ ] -1 Do not release this package because...
>>>> +1 from me; third time's the charm...
>>>> Cheers,
>>>> Tim
>>>
>>>
>>>
>>
>


Re: [VOTE] Release Apache Tika 1.18 Candidate #3

2018-04-23 Thread David Meikle
Hey Tim,

Just started looking at this and got an error with Tesseract enabled. Will
try to see if it is localised to me or not.

Cheers,
Dave

On 22 April 2018 at 13:29, Oleg Tikhonov <o...@apache.org> wrote:

> Sorry for the noise.
>
> tar'ed
>
> On Sun, Apr 22, 2018 at 3:07 PM, Oleg Tikhonov <o...@apache.org> wrote:
>
>> My bad. This one, hopefully ...
>>
>>
>> On Sun, Apr 22, 2018 at 3:01 PM, Oleg Tikhonov <o...@apache.org> wrote:
>>
>>> Hi,
>>> thanks a lot.
>>> [x] +1 Release this package as Apache Tika 1.18
>>>
>>> Even did a security scan:
>>> mvn org.owasp:dependency-check-maven:3.1.2:check
>>>
>>> Report is attached.
>>>
>>> Best regards,
>>> Oleg
>>>
>>>
>>> On Sat, Apr 21, 2018 at 12:54 AM, talli...@apache.org <
>>> talli...@apache.org> wrote:
>>>
>>>> All,
>>>> A candidate for the Tika 1.18 release is available at:
>>>> https://dist.apache.org/repos/dist/dev/tika/
>>>> The release candidate is a zip archive of the sources in:
>>>> https://github.com/apache/tika/tree/1.18-rc3
>>>> The SHA-512 checksum of the archive is f69ee27b31cf7bcb1eaf114b93c23d
>>>> d85b974356cc7e6e265b1c9366a11d711a3341e690f5b452a3e8b0c5cc6f
>>>> 5839db01b3ef6ec3a2a29ffcd332ff7a63dcf3.
>>>> In addition, a staged maven repository is available here:
>>>> https://repository.apache.org/content/repositories/orgapachetika-1033
>>>> Please vote on releasing this package as Apache Tika 1.18. The vote is
>>>> open for the next 72 hours and passes if a majority of at least three +1
>>>> Tika PMC votes are cast.
>>>> [ ] +1 Release this package as Apache Tika 1.18
>>>> [ ] -1 Do not release this package because...
>>>> +1 from me; third time's the charm...
>>>> Cheers,
>>>> Tim
>>>
>>>
>>>
>>
>


Re: [VOTE] Release Apache Tika 1.18 Candidate #3

2018-04-22 Thread Oleg Tikhonov
Hi,
thanks a lot.
[x] +1 Release this package as Apache Tika 1.18

Even did a security scan:
mvn org.owasp:dependency-check-maven:3.1.2:check

Report is attached.

Best regards,
Oleg


On Sat, Apr 21, 2018 at 12:54 AM, talli...@apache.org <talli...@apache.org>
wrote:

> All,
> A candidate for the Tika 1.18 release is available at:
> https://dist.apache.org/repos/dist/dev/tika/
> The release candidate is a zip archive of the sources in:
> https://github.com/apache/tika/tree/1.18-rc3
> The SHA-512 checksum of the archive is f69ee27b31cf7bcb1eaf114b93c23d
> d85b974356cc7e6e265b1c9366a11d711a3341e690f5b452a3e8b0c5cc6f
> 5839db01b3ef6ec3a2a29ffcd332ff7a63dcf3.
> In addition, a staged maven repository is available here:
> https://repository.apache.org/content/repositories/orgapachetika-1033
> Please vote on releasing this package as Apache Tika 1.18. The vote is open
> for the next 72 hours and passes if a majority of at least three +1 Tika PMC
> votes are cast.
> [ ] +1 Release this package as Apache Tika 1.18
> [ ] -1 Do not release this package because...
> +1 from me; third time's the charm...
> Cheers,
> Tim


[VOTE] Release Apache Tika 1.18 Candidate #3

2018-04-20 Thread talli...@apache.org
All,
A candidate for the Tika 1.18 release is available at:    
https://dist.apache.org/repos/dist/dev/tika/
The release candidate is a zip archive of the sources in:    
https://github.com/apache/tika/tree/1.18-rc3
The SHA-512 checksum of the archive is    
f69ee27b31cf7bcb1eaf114b93c23dd85b974356cc7e6e265b1c9366a11d711a3341e690f5b452a3e8b0c5cc6f5839db01b3ef6ec3a2a29ffcd332ff7a63dcf3.
In addition, a staged maven repository is available here:    
https://repository.apache.org/content/repositories/orgapachetika-1033
Please vote on releasing this package as Apache Tika 1.18. The vote is open for 
the next 72 hours and passes if a majority of at least three +1 Tika PMC votes 
are cast.
[ ] +1 Release this package as Apache Tika 1.18
[ ] -1 Do not release this package because...
+1 from me; third time's the charm...
Cheers,
            Tim 
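The archive can be checked against the SHA-512 value above with `MessageDigest` from the JDK. Below is a minimal sketch; `Sha512Check` is a hypothetical name. It streams the file through SHA-512 and compares the result with a value pasted from this email, ignoring case, line wraps and the trailing period. A matching checksum only shows that the download is intact. It does not replace verifying the signature against the KEYS file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

/** Sketch: compare a release archive's SHA-512 with the value published in a vote email. */
final class Sha512Check {

    private Sha512Check() {}

    /** Lowercase hex SHA-512 of an in-memory byte array. */
    static String sha512Hex(byte[] data) throws NoSuchAlgorithmException {
        return toHex(MessageDigest.getInstance("SHA-512").digest(data));
    }

    /** Lowercase hex SHA-512 of a file, streamed so large archives need not fit in memory. */
    static String sha512Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return toHex(md.digest());
    }

    private static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Compares against a published value, ignoring case, line wraps and a trailing period. */
    static boolean matches(String actualHex, String published) {
        String cleaned = published.replaceAll("\\s+", "")
                .replaceAll("\\.$", "")
                .toLowerCase(Locale.ROOT);
        return actualHex.toLowerCase(Locale.ROOT).equals(cleaned);
    }

    public static void main(String[] args) throws Exception {
        // Usage: java Sha512Check <archive> <published-sha512>
        String actual = sha512Hex(Paths.get(args[0]));
        System.out.println(matches(actual, args[1]) ? "OK" : "MISMATCH: " + actual);
    }
}
```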

Re: [VOTE][CANCELLED] Release Apache Tika 1.18 Candidate #2

2018-04-20 Thread talli...@apache.org
 Cancelling vote because of TIKA-2634.
Will respin RC3 shortly.
On Monday, April 16, 2018, 9:32:06 PM EDT, Tim Allison 
<talli...@apache.org> wrote:  
 
 A candidate for the Tika 1.18 release is available at:  
https://dist.apache.org/repos/dist/dev/tika/
The release candidate is a zip archive of the sources in:  
https://github.com/apache/tika/tree/1.18-rc2/
The SHA-512 checksum of the archive is  
254421677bf152591dd142bad2abdd35fc60f3029a91b210944759c77ce385fb3172e6a8d65dc357a0bc590c432dff5bbf8781db6539aa3054f44aed16fc63d7.
In addition, a staged maven repository is available here:  
https://repository.apache.org/content/repositories/orgapachetika-1032     
Please vote on releasing this package as Apache Tika 1.18. The vote is open for 
the next 72 hours and passes if a majority of at least three +1 Tika PMC votes 
are cast.
[ ] +1 Release this package as Apache Tika 1.18
[ ] -1 Do not release this package because...
Here's my +1
On behalf of the Apache Tika team,
         Tim Allison
  

[VOTE] Release Apache Tika 1.18 Candidate #2

2018-04-16 Thread Tim Allison
A candidate for the Tika 1.18 release is available at:  
https://dist.apache.org/repos/dist/dev/tika/
The release candidate is a zip archive of the sources in:  
https://github.com/apache/tika/tree/1.18-rc2/
The SHA-512 checksum of the archive is  
254421677bf152591dd142bad2abdd35fc60f3029a91b210944759c77ce385fb3172e6a8d65dc357a0bc590c432dff5bbf8781db6539aa3054f44aed16fc63d7.
In addition, a staged maven repository is available here:  
https://repository.apache.org/content/repositories/orgapachetika-1032     
Please vote on releasing this package as Apache Tika 1.18. The vote is open for 
the next 72 hours and passes if a majority of at least three +1 Tika PMC votes 
are cast.
[ ] +1 Release this package as Apache Tika 1.18
[ ] -1 Do not release this package because...
Here's my +1
On behalf of the Apache Tika team,
         Tim Allison


Re: [VOTE] Release Apache Tika 1.18 Candidate #1

2018-04-11 Thread Oleg Tikhonov
[+] Release this package as Apache Tika 1.18

[INFO] Apache Tika parent . SUCCESS [ 12.379 s]
[INFO] Apache Tika core ... SUCCESS [ 55.650 s]
[INFO] Apache Tika parsers  SUCCESS [05:55 min]
[INFO] Apache Tika XMP  SUCCESS [ 7.254 s]
[INFO] Apache Tika serialization .. SUCCESS [ 3.857 s]
[INFO] Apache Tika batch .. SUCCESS [02:13 min]
[INFO] Apache Tika language detection . SUCCESS [ 8.152 s]
[INFO] Apache Tika application  SUCCESS [01:13 min]
[INFO] Apache Tika OSGi bundle  SUCCESS [ 57.625 s]
[INFO] Apache Tika translate .. SUCCESS [ 8.393 s]
[INFO] Apache Tika server . SUCCESS [01:05 min]
[INFO] Apache Tika examples ... SUCCESS [ 19.053 s]
[INFO] Apache Tika Java-7 Components .. SUCCESS [ 5.646 s]
[INFO] Apache Tika eval ... SUCCESS [ 44.564 s]
[INFO] Apache Tika Deep Learning (powered by DL4J)  SUCCESS [07:45 min]
[INFO] Apache Tika Natural Language Processing  SUCCESS [01:47 min]
[INFO] Apache Tika  SUCCESS [ 0.145 s]
[INFO] 

[INFO] BUILD SUCCESS

CentOS 7.3. Did only basic stuff.

I've seen that we have a Docker image build script. Is there any
documentation?
I will dig into it...
Thanks a lot,
Oleg

On Tue, Apr 10, 2018 at 3:36 PM, Tim Allison <talli...@apache.org> wrote:

> A candidate for the Tika 1.18 release is available at:
>   https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
>   https://github.com/apache/tika/tree/1.18-rc1/
>
> The SHA-512 checksum of the archive is
>   7f2e76e2973c9a0c3ba572afa74686ff95f0628136940b592c61d3639fe8
> 123f977fe321693a6c02a650172f3ef442e7a3adfa93d81d1d770233e47d8911b79e.
>
> In addition, a staged maven repository is available here:
>   https://repository.apache.org/content/repositories/orgapachetika-1031
>
>
>
> Please vote on releasing this package as Apache Tika 1.18.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.18
> [ ] -1 Do not release this package because...
>
> Here's my +1
>
> On behalf of the Apache Tika team,
>
>  Tim
>


[VOTE] Release Apache Tika 1.18 Candidate #1

2018-04-10 Thread Tim Allison
A candidate for the Tika 1.18 release is available at:
  https://dist.apache.org/repos/dist/dev/tika/

The release candidate is a zip archive of the sources in:
  https://github.com/apache/tika/tree/1.18-rc1/

The SHA-512 checksum of the archive is
  
7f2e76e2973c9a0c3ba572afa74686ff95f0628136940b592c61d3639fe8123f977fe321693a6c02a650172f3ef442e7a3adfa93d81d1d770233e47d8911b79e.

In addition, a staged maven repository is available here:
  https://repository.apache.org/content/repositories/orgapachetika-1031
  
   

Please vote on releasing this package as Apache Tika 1.18.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 Tika PMC votes are cast.

[ ] +1 Release this package as Apache Tika 1.18
[ ] -1 Do not release this package because...

Here's my +1

On behalf of the Apache Tika team,

         Tim


RE: Tika 1.18?

2018-03-12 Thread Nick Burch

On Mon, 12 Mar 2018, Allison, Timothy B. wrote:
Anyone have anything they'd like to get in before I run the regression 
tests?  I can certainly put it off a few days.


I've made some progress on the metadata-only fallback/merge multiple 
parser work from https://wiki.apache.org/tika/CompositeParserDiscussion, 
but it's some way off finished yet. I don't think I can cause any 
regressions though! It can also wait for 1.19 if I don't get it stable in 
time to come off a branch.


Nick


RE: Tika 1.18?

2018-03-12 Thread Allison, Timothy B.
I'm working with PDFBox on regression tests for 2.0.9 now.  I'll probably kick 
off our own preliminary full corpus regression tests shortly... 
~2018-03-12T20:00 UTC 

Anyone have anything they'd like to get in before I run the regression tests?  
I can certainly put it off a few days.

Cheers,

 Tim

-Original Message-
From: Chris Mattmann [mailto:mattm...@apache.org] 
Sent: Wednesday, March 7, 2018 4:57 PM
To: dev@tika.apache.org
Subject: Re: Tika 1.18?

Sounds good to me, thanks Tim. Happy to line it up with PDFBox 2.0.9.



Re: Tika 1.18?

2018-03-07 Thread Chris Mattmann
Sounds good to me, thanks Tim. Happy to line it up with PDFBox 2.0.9.


On 3/7/18, 1:16 PM, "Allison, Timothy B."  wrote:

All,

  I think I've made the updates that I wanted to make sure got into 1.18.  
It looks like PDFBox is going to start their release cycle shortly.  Should we 
wait for PDFBox 2.0.9?

  That may add a week or two to our release, although, frankly, it might 
not.  We can start running the regression tests March 9(ish) and see if 
anything dire appears...

  Cheers,

  Tim






RE: Tika 1.18?

2018-03-07 Thread Allison, Timothy B.
All,

  I think I've made the updates that I wanted to make sure got into 1.18.  It 
looks like PDFBox is going to start their release cycle shortly.  Should we 
wait for PDFBox 2.0.9?

  That may add a week or two to our release, although, frankly, it might not.  
We can start running the regression tests March 9(ish) and see if anything dire 
appears...

  Cheers,

  Tim



Re: Tika 1.18?

2018-03-07 Thread Luís Filipe Nassif
I thought about logging any custom-mimetype override that gets applied, so the
user will be warned about it. Maybe we could additionally create a specific
attribute in the mimetype definition XML to configure that it must override the
default one instead of aborting. As for multiple conflicting custom mimes from
different (external) projects, Tika currently aborts, and that is already a
problem now.

So I think it needs additional discussion and should not be addressed in
the next release. I will copy/paste this discussion into the JIRA issue.

But I would like to see the detection of MTS videos fixed, although it conflicts
with another existing mime glob. Is there any workaround for this specific case?
If yes, I can open a different ticket.



On 2 Mar 2018 18:23, "Nick Burch" wrote:

On Fri, 2 Mar 2018, Luís Filipe Nassif wrote:

> If I make no progress on TIKA-1466 until 3/9, you can start the release
> process without it. But do you devs agree with the proposed change: allow
> overriding of glob patterns in custom-mimetypes.xml?
>

What happens if you have two different custom files which both claim the
same glob?

We have historically been a bit stricter about built-in types overriding,
in part to avoid people doing silly things by mistake, and in part to push
people a bit more towards contributing fixes/enhancements for built-in
types. I think the latter is less of a thing today, as we've a lot more
covered as standard, so it's just the former we need to worry about.

How do we help people know when they have conflicting overrides (possibly
from different projects), help them sensibly merge or turn off Tika
provided magic+definitions, and to alert them to when their copied +
customised version probably wants updating following a tika upgrade giving
a newer definition? Do a better job of those than we currently do now, then
I'm very happy to +1 it :)

Nick


Re: Tika 1.18?

2018-03-02 Thread Nick Burch

On Fri, 2 Mar 2018, Luís Filipe Nassif wrote:

If I make no progress on TIKA-1466 until 3/9, you can start the release
process without it. But do you devs agree with the proposed change: allow
overriding of glob patterns in custom-mimetypes.xml?


What happens if you have two different custom files which both claim the 
same glob?


We have historically been a bit stricter about built-in types overriding, 
in part to avoid people doing silly things by mistake, and in part to push 
people a bit more towards contributing fixes/enhancements for built-in 
types. I think the latter is less of a thing today, as we've a lot more 
covered as standard, so it's just the former we need to worry about.


How do we help people know when they have conflicting overrides (possibly 
from different projects), help them sensibly merge or turn off Tika 
provided magic+definitions, and to alert them to when their copied + 
customised version probably wants updating following a tika upgrade giving 
a newer definition? Do a better job of those than we currently do now, 
then I'm very happy to +1 it :)


Nick

RE: Tika 1.18?

2018-03-02 Thread Allison, Timothy B.
> But do you devs agree with the proposed change: allow overriding of glob 
> patterns in custom-mimetypes.xml?

+1 from me

From: Luís Filipe Nassif [mailto:lfcnas...@gmail.com]
Sent: Friday, March 2, 2018 8:21 AM
To: Allison, Timothy B. <talli...@mitre.org>
Cc: dev@tika.apache.org
Subject: Re: Tika 1.18?

If I make no progress on TIKA-1466 until 3/9, you can start the release process 
without it. But do you devs agree with the proposed change: allow overriding of 
glob patterns in custom-mimetypes.xml?



RE: Tika 1.18?

2018-03-02 Thread Allison, Timothy B.
TIKA-2591 and TIKA-2568
+1

TIKA-1466 -- how long will it take, do you think?  This seems potentially 
non-trivial...

-Original Message-
From: Luís Filipe Nassif [mailto:lfcnas...@gmail.com] 
Sent: Thursday, March 1, 2018 5:41 PM
To: dev@tika.apache.org
Subject: Re: Tika 1.18?

I think we should work around TIKA-2591, and I would like to work on TIKA-1466 
(what do you think?) and fix TIKA-2568.

Cheers,
Luis


Re: Tika 1.18?

2018-03-01 Thread Luís Filipe Nassif
I think we should work around TIKA-2591, and I would like to work
on TIKA-1466 (what do you think?) and fix TIKA-2568.

Cheers,
Luis



2018-03-01 13:24 GMT-03:00 Chris Mattmann :

> Same: makes perfect sense to me, and let's do it. I just (finally) updated
> Tika Python downstream to be based on Tika 1.16; I guess I should get it
> based on 1.17 soon too:
>
> https://github.com/chrismattmann/tika-python/blob/master/tika/__init__.py#L17
>
> Cheers,
> Chris
>
> On 3/1/18, 5:16 AM, "Nick Burch"  wrote:
>
> On Thu, 1 Mar 2018, Allison, Timothy B. wrote:
> > There have been some important bug fixes, a few new capabilities, and
> > the upgrading of dependencies because of CVEs.  There are a bunch of
> > mime tickets from Andreas Meier that I’d like to get into 1.18.  Is
> > there anything else that is critical?
>
> I've had a busy few weeks, so haven't yet had a chance to try out my
> proposed multi-parser stuff for 2.x. I'll hopefully take a look next
> week,
> assuming even the fastest review cycle and everyone loving it, I can't
> see
> us being ready to all sign-off on those "2.x breaking changes" until
> probably April.
>
> Given that, doing an interim 1.x release soon makes sense to me!
>
> Nick
>
>
>


Re: Tika 1.18?

2018-03-01 Thread Chris Mattmann
Same: makes perfect sense to me, and let's do it. I just (finally) updated Tika 
Python downstream to be based on Tika 1.16, and I guess I should get it based on 
1.17 soon too:

https://github.com/chrismattmann/tika-python/blob/master/tika/__init__.py#L17

Cheers,
Chris



Re: Tika 1.18?

2018-03-01 Thread Nick Burch

On Thu, 1 Mar 2018, Allison, Timothy B. wrote:
> There have been some important bug fixes, a few new capabilities, and 
> the upgrading of dependencies because of CVEs.  There are a bunch of 
> mime tickets from Andreas Meier that I’d like to get into 1.18.  Is 
> there anything else that is critical?


I've had a busy few weeks, so haven't yet had a chance to try out my 
proposed multi-parser stuff for 2.x. I'll hopefully take a look next week. 
Assuming even the fastest review cycle and everyone loving it, I can't see 
us all being ready to sign off on those "2.x breaking changes" until 
probably April.


Given that, doing an interim 1.x release soon makes sense to me!

Nick

Tika 1.18?

2018-03-01 Thread Allison, Timothy B.
All,
There have been some important bug fixes, a few new capabilities, and the 
upgrading of dependencies because of CVEs.  There are a bunch of mime tickets 
from Andreas Meier that I’d like to get into 1.18.  Is there anything else that 
is critical?
Schedule-wise, I propose getting changes in by, say, next Friday (3/9), 
regression tests the next week, and RC1 the following week[0]?
WDYT?

Cheers,

Tim

[0] week = “open source week” which can be significantly longer than a calendar 
week when surprises emerge. 

Timothy B. Allison, Ph.D.
Principal Artificial Intelligence Engineer
T835/Human Language Technology
The MITRE Corporation
7515 Colshire Drive, McLean, VA  22102
703-983-2473 (phone); 703-983-1379 (fax)