Re: [EXTERNAL] Docker image along with 1.23?

2019-11-20 Thread Chris Mattmann
Yeah, producing the actual image is tricky, and my recommendation is for Tika 
to stay out of the business of that. Leave it to LogicalSpark or others to do 
this. It’s tricky with licenses, and I doubt ASF will ever develop an optimal 
solution to this due to the nature of its core mission, as Nick stated.

From: Eric Pugh 
Reply-To: "dev@tika.apache.org" 
Date: Wednesday, November 20, 2019 at 6:02 PM
To: "dev@tika.apache.org" 
Cc: "Allison, Timothy B (US 1760-Affiliate)" 
Subject: Re: [EXTERNAL] Docker image along with 1.23?

I was thinking more of producing the actual image, so that others don’t have to 
go through the pain of compiling an image. Having the Dockerfile made 
available as well does give a nice recipe for modifying the “official” image. 
I recently tested Tesseract 3 with the latest Tika, and I did it by tweaking 
the existing Dockerfile that LogicalSpark has published.

I don’t know how other projects at ASF handle the image publishing.

___

Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com  | 
My Free/Busy

Co-Author: Apache Solr Enterprise Search Server, 3rd Ed

This e-mail and all contents, including attachments, is considered to be 
Company Confidential unless explicitly stated otherwise, regardless of whether 
attachments are marked as such.



Re: [EXTERNAL] Docker image along with 1.23?

2019-11-20 Thread Eric Pugh
I was thinking more of producing the actual image, so that others don’t have to 
go through the pain of compiling an image.   Having the Dockerfile made 
available as well does give a nice recipe for modifying the “official” image.   
I recently tested Tesseract 3 with the latest Tika, and I did it by tweaking 
the existing Dockerfile that LogicalSpark has published.

I don’t know how other projects at ASF handle the image publishing.




> On Nov 20, 2019, at 7:02 PM, Chris Mattmann  wrote:
> Nick, TBH, I don’t get it. If we ship the “Dockerfile”, we are simply shipping 
> a text file: code, under a license. If we create a “docker image” and then 
> publish it to the ASF hub, then I agree with you.
> 
> My suggestion, and my interpretation of Tim’s, is to ship a standard 
> “Dockerfile”. Do you agree with this? It should be air covered (as former 
> VP, Legal, at least it would have been with me).
> 
> Cheers,
> 
> Chris




[jira] [Commented] (TIKA-2993) tika-server's /rmeta endpoint shouldn't throw an exception for a parse exception

2019-11-20 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978831#comment-16978831
 ] 

Hudson commented on TIKA-2993:
--

SUCCESS: Integrated in Jenkins build tika-branch-1x #277 (See 
[https://builds.apache.org/job/tika-branch-1x/277/])
TIKA-2993 -- tika-server's /rmeta endpoint shouldn't throw an exception 
(tallison: 
[https://github.com/apache/tika/commit/afab2a92f91ada54602b028d30979e048bdfbda2])
* (edit) 
tika-server/src/main/java/org/apache/tika/server/resource/RecursiveMetadataResource.java
* (edit) 
tika-server/src/test/java/org/apache/tika/server/RecursiveMetadataResourceTest.java
* (edit) tika-server/src/test/java/org/apache/tika/server/StackTraceOffTest.java
* (edit) tika-server/src/test/java/org/apache/tika/server/StackTraceTest.java


> tika-server's /rmeta endpoint shouldn't throw an exception for a parse 
> exception
> 
>
> Key: TIKA-2993
> URL: https://issues.apache.org/jira/browse/TIKA-2993
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Major
> Fix For: 1.23
>
>
> tika-server's /rmeta endpoint should catch exceptions and report them in the 
> returned metadata list as happens with tika-app in batch mode.
> This includes parse exceptions, null-byte exceptions and encrypted document 
> exceptions.
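The behavior requested above can be sketched in plain Java without Tika on the classpath. This is a minimal sketch under stated assumptions, not the change made in RecursiveMetadataResource: RecursiveParserSketch, RmetaSketch and the metadata key X-EXAMPLE:exception are hypothetical names chosen for illustration. Whatever metadata was gathered before the failure is kept, and the exception is attached to the container entry instead of failing the whole request.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a recursive parser: it appends one metadata map per
// document, the container first and then any embedded documents.
interface RecursiveParserSketch {
    void parse(byte[] input, List<Map<String, String>> metadataList) throws Exception;
}

final class RmetaSketch {
    // Hypothetical key for illustration only; not necessarily the key Tika uses.
    static final String EXCEPTION_KEY = "X-EXAMPLE:exception";

    private RmetaSketch() {}

    // Returns the metadata list even when parsing fails: the exception's stack
    // trace is stored on the container entry instead of failing the request.
    static List<Map<String, String>> rmeta(RecursiveParserSketch parser, byte[] input) {
        List<Map<String, String>> metadataList = new ArrayList<>();
        try {
            parser.parse(input, metadataList);
        } catch (Exception e) {
            if (metadataList.isEmpty()) {
                // Nothing was recorded before the failure, so create the container entry.
                metadataList.add(new LinkedHashMap<>());
            }
            metadataList.get(0).put(EXCEPTION_KEY, stackTrace(e));
        }
        return metadataList;
    }

    private static String stackTrace(Throwable t) {
        StringWriter out = new StringWriter();
        t.printStackTrace(new PrintWriter(out));
        return out.toString();
    }
}
```

Catching Exception broadly is a simplification here; the issue itself names parse exceptions, null-byte exceptions and encrypted document exceptions.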



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (TIKA-2994) ExceptionUtils should let TikaException subclasses through

2019-11-20 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978832#comment-16978832
 ] 

Hudson commented on TIKA-2994:
--

SUCCESS: Integrated in Jenkins build tika-branch-1x #277 (See 
[https://builds.apache.org/job/tika-branch-1x/277/])
TIKA-2994 -- ExceptionUtils should not extract cause from subclasses of 
(tallison: 
[https://github.com/apache/tika/commit/ff93d145d5e176047b5609233d97a5c78804b795])
* (edit) 
tika-server/src/test/java/org/apache/tika/server/RecursiveMetadataResourceTest.java
* (edit) tika-core/src/main/java/org/apache/tika/utils/ExceptionUtils.java


> ExceptionUtils should let TikaException subclasses through
> --
>
> Key: TIKA-2994
> URL: https://issues.apache.org/jira/browse/TIKA-2994
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Minor
> Fix For: 1.23
>
>
> ExceptionUtils is currently pulling the cause from a class that is an 
> instance of TikaException.  This hides the commonality of subclasses of 
> TikaException, e.g. EncryptedDocumentException. 





[jira] [Commented] (TIKA-2979) tika-server shouldn't throw an exception for a non-supported format

2019-11-20 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978830#comment-16978830
 ] 

Hudson commented on TIKA-2979:
--

SUCCESS: Integrated in Jenkins build tika-branch-1x #277 (See 
[https://builds.apache.org/job/tika-branch-1x/277/])
TIKA-2979 -- tika-server shouldn't throw an exception for a file format 
(tallison: 
[https://github.com/apache/tika/commit/f216d84a9a6430fe370233a406054d157dc5fedb])
* (edit) tika-server/src/test/java/org/apache/tika/server/StackTraceTest.java
* (edit) 
tika-server/src/test/java/org/apache/tika/server/UnpackerResourceTest.java
* (edit) CHANGES.txt
* (edit) 
tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java
* (edit) tika-server/src/test/java/org/apache/tika/server/StackTraceOffTest.java


> tika-server shouldn't throw an exception for a non-supported format
> ---
>
> Key: TIKA-2979
> URL: https://issues.apache.org/jira/browse/TIKA-2979
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Minor
> Fix For: 1.23
>
>
> tika-server throws an UnsupportedMediaTypeException if there is no parser for 
> a given format.  This prevents users from getting the mime type and any 
> digests that were computed on the stream.  Further this is different behavior 
> than tika-app where we rely on the EmptyParser to do nothing.
> Is there a need for this exception?
> Would it be too great a change to turn off this behavior?  If it would be, 
> any recommendations for the commandline switch to turn it off?
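A rough sketch of the alternative the issue describes: fall back to a do-nothing parser, in the spirit of EmptyParser, so the response still carries the detected type and a digest. Everything here is hypothetical (FallbackSketch, the BiConsumer-based parser shape and the X-EXAMPLE:* keys); only MessageDigest and the SHA-256 algorithm name are standard Java.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

final class FallbackSketch {
    // A "parser" here is just: read the bytes, add to the metadata map (hypothetical shape).
    // NO_OP plays the role the issue gives EmptyParser: it adds nothing.
    static final BiConsumer<byte[], Map<String, String>> NO_OP = (input, metadata) -> { };

    private final Map<String, BiConsumer<byte[], Map<String, String>>> parsers = new HashMap<>();

    void register(String mimeType, BiConsumer<byte[], Map<String, String>> parser) {
        parsers.put(mimeType, parser);
    }

    // Never throws for an unsupported type: the detected type and digest are
    // recorded first, then the registered parser (or the no-op fallback) runs.
    Map<String, String> handle(byte[] input, String detectedType) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("Content-Type", detectedType);
        metadata.put("X-EXAMPLE:digest:SHA256", sha256Hex(input)); // hypothetical key
        parsers.getOrDefault(detectedType, NO_OP).accept(input, metadata);
        return metadata;
    }

    static String sha256Hex(byte[] data) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-256").digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

Recording the type and digest before choosing a parser is what keeps that information available even when no real parser exists for the format.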





Re: [EXTERNAL] Re: Docker image along with 1.23?

2019-11-20 Thread Chris Mattmann
Nick, TBH, I don’t get it. If we ship the “Dockerfile”, we are simply shipping 
a text file: code, under a license. If we create a “docker image” and then 
publish it to the ASF hub, then I agree with you.

My suggestion, and my interpretation of Tim’s, is to ship a standard 
“Dockerfile”. Do you agree with this? It should be air covered (as former 
VP, Legal, at least it would have been with me).

Cheers,

Chris

From: Nick Burch 
Reply-To: "dev@tika.apache.org" 
Date: Wednesday, November 20, 2019 at 3:57 PM
To: "Allison, Timothy B (US 1760-Affiliate)" 
Cc: "" 
Subject: [EXTERNAL] Re: Docker image along with 1.23?

On Wed, 20 Nov 2019, Tim Allison wrote:

> Eric Pugh recently asked on another channel if we had any plans to
> release an official docker image for 1.23.

Depending on what we put in the container, we do need to be a little 
careful. There are "platform dependencies" under non-compatible licenses 
that we can optionally use if people have installed them, which we 
ourselves can't directly ship under ASF rules. (Tesseract is fine as 
that's Apache Licensed; Java itself is trickier, see the NetBeans 
discussions on legal-discuss@ and LEGAL jira.)

Shipping an official docker container with the Tika Server on it seems to 
me to be a helpful step for users, but we just need to make sure we're 
following ASF policies. (The Apache Software Foundation mission is to 
"provide software for the public good", but source code is the main focus 
for the mission; binaries are trickier!)

Nick



Re: Docker image along with 1.23?

2019-11-20 Thread Nick Burch

On Wed, 20 Nov 2019, Tim Allison wrote:

> Eric Pugh recently asked on another channel if we had any plans to
> release an official docker image for 1.23.

Depending on what we put in the container, we do need to be a little 
careful. There are "platform dependencies" under non-compatible licenses 
that we can optionally use if people have installed them, which we 
ourselves can't directly ship under ASF rules. (Tesseract is fine as 
that's Apache Licensed; Java itself is trickier, see the NetBeans 
discussions on legal-discuss@ and LEGAL jira.)

Shipping an official docker container with the Tika Server on it seems to 
me to be a helpful step for users, but we just need to make sure we're 
following ASF policies. (The Apache Software Foundation mission is to 
"provide software for the public good", but source code is the main focus 
for the mission; binaries are trickier!)

Nick


Re: [EXTERNAL] Docker image along with 1.23?

2019-11-20 Thread Mattmann, Chris A (US 1760)
Sure, let's do that. I also have a set of tika-dockers in USCDataScience that 
are useful for the ML/deep learning stuff.



Chris Mattmann, Ph.D.
Deputy Chief Technology & Innovation Officer
17x   |   Office of the Chief Information Officer, Chief Technology and 
Innovation Office (1760)

JPL   |   jpl.nasa.gov
4800 Oak Grove Dr, Mail Stop 171-377
Pasadena, California 91109
O 818-354-8810   |   M 626-755-6564


From: Tim Allison 
Reply-To: "dev@tika.apache.org" , "Allison, Timothy B (US 
1760-Affiliate)" 
Date: Wednesday, November 20, 2019 at 1:20 PM
To: "" 
Subject: [EXTERNAL] Docker image along with 1.23?

All,
  Eric Pugh recently asked on another channel if we had any plans to
release an official docker image for 1.23.  IIRC Dave had that up and
running, but I couldn't get it to work as part of the release
process because I was behind a proxy or on Windows or something.  My dev
environment is now different, and I _should_ be able to get it to work.
  Do we want to try to release an official Docker image as part of the 1.23
release?

   Cheers,

   Tim



[jira] [Resolved] (TIKA-2994) ExceptionUtils should let TikaException subclasses through

2019-11-20 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-2994.
---
Fix Version/s: 1.23
   Resolution: Fixed

> ExceptionUtils should let TikaException subclasses through
> --
>
> Key: TIKA-2994
> URL: https://issues.apache.org/jira/browse/TIKA-2994
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Minor
> Fix For: 1.23
>
>
> ExceptionUtils is currently pulling the cause from a class that is an 
> instance of TikaException.  This hides the commonality of subclasses of 
> TikaException, e.g. EncryptedDocumentException. 





[jira] [Resolved] (TIKA-2993) tika-server's /rmeta endpoint shouldn't throw an exception for a parse exception

2019-11-20 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-2993.
---
Fix Version/s: 1.23
   Resolution: Fixed

> tika-server's /rmeta endpoint shouldn't throw an exception for a parse 
> exception
> 
>
> Key: TIKA-2993
> URL: https://issues.apache.org/jira/browse/TIKA-2993
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Major
> Fix For: 1.23
>
>
> tika-server's /rmeta endpoint should catch exceptions and report them in the 
> returned metadata list as happens with tika-app in batch mode.
> This includes parse exceptions, null-byte exceptions and encrypted document 
> exceptions.





[jira] [Commented] (TIKA-2994) ExceptionUtils should let TikaException subclasses through

2019-11-20 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978810#comment-16978810
 ] 

Hudson commented on TIKA-2994:
--

UNSTABLE: Integrated in Jenkins build Tika-trunk #1745 (See 
[https://builds.apache.org/job/Tika-trunk/1745/])
TIKA-2994 -- ExceptionUtils should not extract cause from subclasses of 
(tallison: 
[https://github.com/apache/tika/commit/7063b7d1b79495a3c390f06eee1ba451155257ed])
* (edit) 
tika-server/src/test/java/org/apache/tika/server/RecursiveMetadataResourceTest.java
* (edit) tika-core/src/main/java/org/apache/tika/utils/ExceptionUtils.java


> ExceptionUtils should let TikaException subclasses through
> --
>
> Key: TIKA-2994
> URL: https://issues.apache.org/jira/browse/TIKA-2994
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Minor
>
> ExceptionUtils is currently pulling the cause from a class that is an 
> instance of TikaException.  This hides the commonality of subclasses of 
> TikaException, e.g. EncryptedDocumentException. 





[jira] [Commented] (TIKA-2993) tika-server's /rmeta endpoint shouldn't throw an exception for a parse exception

2019-11-20 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978809#comment-16978809
 ] 

Hudson commented on TIKA-2993:
--

UNSTABLE: Integrated in Jenkins build Tika-trunk #1745 (See 
[https://builds.apache.org/job/Tika-trunk/1745/])
TIKA-2993 -- tika-server's /rmeta endpoint shouldn't throw an exception 
(tallison: 
[https://github.com/apache/tika/commit/80d96a4ae4207324c17e3195f7dd6552ae235152])
* (edit) 
tika-server/src/test/java/org/apache/tika/server/RecursiveMetadataResourceTest.java
* (edit) tika-server/src/test/java/org/apache/tika/server/StackTraceOffTest.java
* (edit) tika-server/src/test/java/org/apache/tika/server/StackTraceTest.java
* (edit) 
tika-server/src/main/java/org/apache/tika/server/resource/RecursiveMetadataResource.java


> tika-server's /rmeta endpoint shouldn't throw an exception for a parse 
> exception
> 
>
> Key: TIKA-2993
> URL: https://issues.apache.org/jira/browse/TIKA-2993
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Major
>
> tika-server's /rmeta endpoint should catch exceptions and report them in the 
> returned metadata list as happens with tika-app in batch mode.
> This includes parse exceptions, null-byte exceptions and encrypted document 
> exceptions.





[jira] [Resolved] (TIKA-2979) tika-server shouldn't throw an exception for a non-supported format

2019-11-20 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-2979.
---
Fix Version/s: 1.23
   Resolution: Fixed

> tika-server shouldn't throw an exception for a non-supported format
> ---
>
> Key: TIKA-2979
> URL: https://issues.apache.org/jira/browse/TIKA-2979
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Minor
> Fix For: 1.23
>
>
> tika-server throws an UnsupportedMediaTypeException if there is no parser for 
> a given format.  This prevents users from getting the mime type and any 
> digests that were computed on the stream.  Further this is different behavior 
> than tika-app where we rely on the EmptyParser to do nothing.
> Is there a need for this exception?
> Would it be too great a change to turn off this behavior?  If it would be, 
> any recommendations for the commandline switch to turn it off?





[jira] [Created] (TIKA-2994) ExceptionUtils should let TikaException subclasses through

2019-11-20 Thread Tim Allison (Jira)
Tim Allison created TIKA-2994:
-

 Summary: ExceptionUtils should let TikaException subclasses through
 Key: TIKA-2994
 URL: https://issues.apache.org/jira/browse/TIKA-2994
 Project: Tika
  Issue Type: Improvement
Reporter: Tim Allison


ExceptionUtils is currently pulling the cause from a class that is an instance 
of TikaException.  This hides the commonality of subclasses of TikaException, 
e.g. EncryptedDocumentException. 
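One way to express the rule in this issue's title is to unwrap the cause only when the exception is exactly the base class. The sketch below is an assumption about the approach, not the committed ExceptionUtils code; SketchTikaException and SketchEncryptedDocumentException are hypothetical stand-ins for TikaException and EncryptedDocumentException so that the example runs without Tika.

```java
// Hypothetical stand-in for org.apache.tika.exception.TikaException.
class SketchTikaException extends Exception {
    SketchTikaException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical stand-in for a more specific subclass such as EncryptedDocumentException.
class SketchEncryptedDocumentException extends SketchTikaException {
    SketchEncryptedDocumentException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class UnwrapSketch {
    private UnwrapSketch() {}

    // Unwraps the cause only for the exact base class (a generic wrapper);
    // subclasses pass through so callers can still tell, e.g., that a document was encrypted.
    static Throwable reportable(Throwable t) {
        if (t.getClass() == SketchTikaException.class && t.getCause() != null) {
            return t.getCause();
        }
        return t;
    }
}
```

The exact-class comparison, rather than instanceof, is what lets the subclasses through unchanged.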





Docker image along with 1.23?

2019-11-20 Thread Tim Allison
All,
  Eric Pugh recently asked on another channel if we had any plans to
release an official docker image for 1.23.  IIRC Dave had that up and
running, but I couldn't get it to work as part of the release
process because I was behind a proxy or on Windows or something.  My dev
environment is now different, and I _should_ be able to get it to work.
  Do we want to try to release an official Docker image as part of the 1.23
release?

   Cheers,

   Tim


[jira] [Commented] (TIKA-2979) tika-server shouldn't throw an exception for a non-supported format

2019-11-20 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-2979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978735#comment-16978735
 ] 

Hudson commented on TIKA-2979:
--

UNSTABLE: Integrated in Jenkins build Tika-trunk #1744 (See 
[https://builds.apache.org/job/Tika-trunk/1744/])
TIKA-2979 -- tika-server shouldn't throw an exception for a file format 
(tallison: 
[https://github.com/apache/tika/commit/833630effeefdd60c2319b2f51acb8baf3830027])
* (edit) tika-server/src/test/java/org/apache/tika/server/StackTraceTest.java
* (edit) tika-server/src/test/java/org/apache/tika/server/StackTraceOffTest.java
* (edit) 
tika-server/src/test/java/org/apache/tika/server/UnpackerResourceTest.java
* (edit) CHANGES.txt
* (edit) 
tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java


> tika-server shouldn't throw an exception for a non-supported format
> ---
>
> Key: TIKA-2979
> URL: https://issues.apache.org/jira/browse/TIKA-2979
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Minor
>
> tika-server throws an UnsupportedMediaTypeException if there is no parser for 
> a given format.  This prevents users from getting the mime type and any 
> digests that were computed on the stream.  Further this is different behavior 
> than tika-app where we rely on the EmptyParser to do nothing.
> Is there a need for this exception?
> Would it be too great a change to turn off this behavior?  If it would be, 
> any recommendations for the commandline switch to turn it off?





[jira] [Updated] (TIKA-2993) tika-server's /rmeta endpoint shouldn't throw an exception for a parse exception

2019-11-20 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-2993:
--
Description: 
tika-server's /rmeta endpoint should catch exceptions and report them in the 
returned metadata list as happens with tika-app in batch mode.

This includes parse exceptions, null-byte exceptions and encrypted document 
exceptions.

  was:
tika-server's /rmeta endpoint should catch exceptions and report them in the 
returned metadata list as happens with tika-app in batch mode.



> tika-server's /rmeta endpoint shouldn't throw an exception for a parse 
> exception
> 
>
> Key: TIKA-2993
> URL: https://issues.apache.org/jira/browse/TIKA-2993
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Major
>
> tika-server's /rmeta endpoint should catch exceptions and report them in the 
> returned metadata list as happens with tika-app in batch mode.
> This includes parse exceptions, null-byte exceptions and encrypted document 
> exceptions.





[jira] [Updated] (TIKA-2993) tika-server's /rmeta endpoint shouldn't throw an exception for a parse exception

2019-11-20 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-2993:
--
Summary: tika-server's /rmeta endpoint shouldn't throw an exception for a 
parse exception  (was: tika-server's /rmeta endpoint shouldn't throw an 
exception when stacktrace is turned on)

> tika-server's /rmeta endpoint shouldn't throw an exception for a parse 
> exception
> 
>
> Key: TIKA-2993
> URL: https://issues.apache.org/jira/browse/TIKA-2993
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Major
>
> tika-server's /rmeta endpoint should catch exceptions and report them in the 
> returned metadata list as happens with tika-app in batch mode.





[jira] [Created] (TIKA-2993) tika-server's /rmeta endpoint shouldn't throw an exception when stacktrace is turned on

2019-11-20 Thread Tim Allison (Jira)
Tim Allison created TIKA-2993:
-

 Summary: tika-server's /rmeta endpoint shouldn't throw an 
exception when stacktrace is turned on
 Key: TIKA-2993
 URL: https://issues.apache.org/jira/browse/TIKA-2993
 Project: Tika
  Issue Type: Improvement
Reporter: Tim Allison


tika-server's /rmeta endpoint should catch exceptions and report them in the 
returned metadata list as happens with tika-app in batch mode.






Re: [EXTERNAL] Tika 1.23?

2019-11-20 Thread Eric Pugh
+1 from contributor

On Wed, Nov 20, 2019 at 12:09 PM Chris Mattmann  wrote:

> +1 ship it


Re: [EXTERNAL] Tika 1.23?

2019-11-20 Thread Chris Mattmann
+1 ship it

From: Tim Allison 
Reply-To: "dev@tika.apache.org" , "Allison, Timothy B (US 1760-Affiliate)" 
Date: Wednesday, November 20, 2019 at 9:07 AM
To: "" 
Subject: [EXTERNAL] Tika 1.23?

All,

  I've abandoned hope of getting the contenthandler factory configuration
stuff into 1.23.  We've added some new mime types, upgraded POI and made a
number of other useful changes.

  WDYT about kicking off regression tests shortly?  Any blockers?

  Best,

Tim


[jira] [Commented] (TIKA-2992) java.lang.UnsupportedOperationException: This feature requires ASM7 in Tika 1.21

2019-11-20 Thread Nick Burch (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-2992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978265#comment-16978265
 ] 

Nick Burch commented on TIKA-2992:
--

Most likely you have an older version of ASM on your classpath which is being 
used instead of the one that comes via Tika Parsers.

Can you check, and remove the older one if so? Apache POI has a sample code 
snippet to see where a class came from at http://poi.apache.org/help/faq.html; 
you could do the same for an ASM class to check for that if you're struggling 
to hunt down the jar.
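A minimal sketch of the check suggested above, assuming you can run a little Java on the affected classpath. WhichJar is a hypothetical helper; it asks the class loader for every copy of a class file via ClassLoader.getResources, so two ASM jars would show up as two URLs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

final class WhichJar {
    private WhichJar() {}

    // Lists every classpath location that provides the named class, e.g.
    // "org.objectweb.asm.ClassVisitor". More than one entry points to competing jars.
    static List<URL> locations(String className, ClassLoader loader) {
        String resource = className.replace('.', '/') + ".class";
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, pass "org.objectweb.asm.ClassVisitor" (from the stack trace) together with Thread.currentThread().getContextClassLoader(). To see only the copy that was actually loaded, ClassVisitor.class.getProtectionDomain().getCodeSource().getLocation() is another option, though getCodeSource() can return null for JDK classes.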

> java.lang.UnsupportedOperationException: This feature requires ASM7 in Tika 
> 1.21
> -
>
> Key: TIKA-2992
> URL: https://issues.apache.org/jira/browse/TIKA-2992
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.21
>Reporter: Arvind Jain
>Priority: Major
>
> We are using the Tika Java library to parse a bunch of documents (various 
> formats). We are seeing the exception below regularly in our logs on certain 
> documents. Any suggestions on how to fix it would be really useful. On initial 
> investigation it looks like it's a bug with mismatched ASM between 
> XHTMLClassVisitor and the tika-parsers pom.
>
> Failed to parse the document. org.apache.tika.exception.TikaException: Failed to parse a Java class
> at org.apache.tika.parser.asm.XHTMLClassVisitor.parse (XHTMLClassVisitor.java:66)
> at org.apache.tika.parser.asm.ClassParser.parse (ClassParser.java:51)
> at org.apache.tika.parser.CompositeParser.parse (CompositeParser.java:280)
> at org.apache.tika.parser.CompositeParser.parse (CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse (AutoDetectParser.java:143)
> at com.askscio.beam.docbuilder.processor.parsers.GenericParser.parse (GenericParser.java:55)
>
> Caused by: java.lang.UnsupportedOperationException: This feature requires ASM7
> at org.objectweb.asm.ClassVisitor.visitNestMember (ClassVisitor.java:236)
> at org.objectweb.asm.ClassReader.accept (ClassReader.java:660)
> at org.objectweb.asm.ClassReader.accept (ClassReader.java:400)
> at org.apache.tika.parser.asm.XHTMLClassVisitor.parse (XHTMLClassVisitor.java:61)


