[jira] [Updated] (TIKA-2365) Signer's Information doesn't match issue

2017-05-16 Thread Mujahid Ateeb Khan (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mujahid Ateeb Khan updated TIKA-2365: - Description: I'm working with Birt and tika, birt uses a jar called org.apache.batik.pdf

[jira] [Comment Edited] (TIKA-2362) Skipping Header and Footer data from documents

2017-05-16 Thread Mujahid Ateeb Khan (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013488#comment-16013488 ] Mujahid Ateeb Khan edited comment on TIKA-2362 at 5/17/17 4:17 AM: --- Yes I

[jira] [Commented] (TIKA-2362) Skipping Header and Footer data from documents

2017-05-16 Thread Mujahid Ateeb Khan (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013488#comment-16013488 ] Mujahid Ateeb Khan commented on TIKA-2362: -- Yes I tried that method using XHTML handler but some

[jira] [Commented] (TIKA-2360) Handle SentimentParser resource failure more robustly

2017-05-16 Thread Thamme Gowda (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013472#comment-16013472 ] Thamme Gowda commented on TIKA-2360: Sorry, I am late to the discussion. 1. (y) to turn it OFF. I had

[jira] [Commented] (TIKA-2360) Handle SentimentParser resource failure more robustly

2017-05-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013087#comment-16013087 ] Tim Allison commented on TIKA-2360: --- > Thanks Tim, appreciate it. Of course! I'm sorry for moving out on

[jira] [Commented] (TIKA-2360) Handle SentimentParser resource failure more robustly

2017-05-16 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013042#comment-16013042 ] Chris A. Mattmann commented on TIKA-2360: - Thanks Tim, appreciate it. I think at the end of the day

[jira] [Commented] (TIKA-2367) Avoid npe in wmf

2017-05-16 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013002#comment-16013002 ] Hudson commented on TIKA-2367: -- FAILURE: Integrated in Jenkins build Tika-trunk #1268 (See

RE: Tika 1.15

2017-05-16 Thread Allison, Timothy B.
I reran the eval with some updates, including rc1 of PDFBox 2.0.6, which is now integrated. http://162.242.228.174/reports/reports_tika_20170515.tar.gz I need to do some more digging on attachments -- hit max limit. The decrease in attachments from the few docs I reviewed is explained by

[jira] [Resolved] (TIKA-2367) Avoid npe in wmf

2017-05-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2367. --- Resolution: Fixed Fix Version/s: 1.15 > Avoid npe in wmf > > >

[jira] [Reopened] (TIKA-2360) Handle SentimentParser resource failure more robustly

2017-05-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reopened TIKA-2360: --- Reopening to discuss > Handle SentimentParser resource failure more robustly >

[jira] [Commented] (TIKA-2360) Handle SentimentParser resource failure more robustly

2017-05-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012940#comment-16012940 ] Tim Allison commented on TIKA-2360: --- Doh. Sorry. Should I revert anything? > Handle SentimentParser

[jira] [Resolved] (TIKA-2364) Clean up printstacktrace

2017-05-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2364. --- Resolution: Fixed Left a few in CLIs and in tika-core > Clean up printstacktrace >

[jira] [Updated] (TIKA-2364) Clean up printstacktrace

2017-05-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2364: -- Fix Version/s: 1.15 > Clean up printstacktrace > > > Key:

[jira] [Commented] (TIKA-2360) Handle SentimentParser resource failure more robustly

2017-05-16 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012930#comment-16012930 ] Chris A. Mattmann commented on TIKA-2360: - Tim, I didn't get a chance at all to comment on this

[jira] [Created] (TIKA-2367) Avoid npe in wmf

2017-05-16 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2367: - Summary: Avoid npe in wmf Key: TIKA-2367 URL: https://issues.apache.org/jira/browse/TIKA-2367 Project: Tika Issue Type: Improvement Reporter: Tim

Re: Tika talk next week - help needed!

2017-05-16 Thread Bob Paulin
Quick slide on camel-tika. https://docs.google.com/presentation/d/1OUORiDwB4d0FkLZ0HIlQDLE30vvTniawdyzhQmLj1xE/edit?usp=sharing On 5/16/2017 10:31 AM, Nick Burch wrote: > On Tue, 16 May 2017, Eric Pugh wrote: >> It was great to read through >>

Re: Tika talk next week - help needed!

2017-05-16 Thread Nick Burch
On Tue, 16 May 2017, Eric Pugh wrote: It was great to read through http://events.linuxfoundation.org/sites/events/files/slides/WhatsNewWithApacheTika_1.pdf… Wow there is a lot in Tika. And I think that might be the one challenge with the talk structure, there is SOO much information. The

Re: Tika talk next week - help needed!

2017-05-16 Thread Eric Pugh
Nick, It was great to read through http://events.linuxfoundation.org/sites/events/files/slides/WhatsNewWithApacheTika_1.pdf… Wow there is a lot in Tika. And I think that might be the one challenge with the talk structure, there is SOO much information. I think I’d like to see “How does

[jira] [Commented] (TIKA-2364) Clean up printstacktrace

2017-05-16 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012555#comment-16012555 ] Hudson commented on TIKA-2364: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1267 (See

[jira] [Commented] (TIKA-2360) Handle SentimentParser resource failure more robustly

2017-05-16 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012557#comment-16012557 ] Hudson commented on TIKA-2360: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1267 (See

Re: Tika talk next week - help needed!

2017-05-16 Thread Thamme Gowda
Nick, Here are some pointers: 1. Image recognition using Tensorflow: https://wiki.apache.org/tika/TikaAndVision; Link to Paper: https://memex.jpl.nasa.gov/MFSEC17.pdf 2. Image Recognition using Deeplearning4j - https://wiki.apache.org/tika/TikaAndVisionDL4J 3. Sentiment Analysis using OpenNLP:

[jira] [Created] (TIKA-2366) Add image cropping functionality to TesseractOCRParser

2017-05-16 Thread Zachary Lee Jones (JIRA)
Zachary Lee Jones created TIKA-2366: --- Summary: Add image cropping functionality to TesseractOCRParser Key: TIKA-2366 URL: https://issues.apache.org/jira/browse/TIKA-2366 Project: Tika

Re: Tika talk next week - help needed!

2017-05-16 Thread Chris Mattmann
Yep, literally take a look at the Tika wiki – there are examples aplenty and even screenshots. Further, if you look at the MEMEX site under our new publications section, there are a few examples (like the ICMR paper on forensics) that show it in action.

Re: Tika talk next week - help needed!

2017-05-16 Thread Konstantin Gribov
IIRC, image and video labeling basic support was added (Chris & Thamme could you elaborate on that, please), TSD (TIKA-2309, time stamped data envelope format) support, slf4j migration (ongoing on 2.x branch). Tue, 16 May 2017 at 16:06, Allison, Timothy B. : > Doh! Sorry

RE: Tika talk next week - help needed!

2017-05-16 Thread Allison, Timothy B.
Doh! Sorry for the delay...might add configuration of EncodingDetectors, but that's probably too far into the weeds? -Original Message- From: Nick Burch [mailto:n...@apache.org] Sent: Sunday, May 14, 2017 11:34 AM To: dev@tika.apache.org Subject: Tika talk next week - help needed! Hi

[jira] [Updated] (TIKA-2365) Signer's Information doesn't match issue

2017-05-16 Thread Mujahid Ateeb Khan (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mujahid Ateeb Khan updated TIKA-2365: - Description: I'm working with Birt and tika birt uses a jar called org.apache.batik.pdf it

[jira] [Created] (TIKA-2365) Signer's Information doesn't match issue

2017-05-16 Thread Mujahid Ateeb Khan (JIRA)
Mujahid Ateeb Khan created TIKA-2365: Summary: Signer's Information doesn't match issue Key: TIKA-2365 URL: https://issues.apache.org/jira/browse/TIKA-2365 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-2360) Handle SentimentParser resource failure more robustly

2017-05-16 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012279#comment-16012279 ] Hudson commented on TIKA-2360: -- UNSTABLE: Integrated in Jenkins build Tika-trunk #1266 (See

[jira] [Commented] (TIKA-2363) Skip image recognition test if network call fails

2017-05-16 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012278#comment-16012278 ] Hudson commented on TIKA-2363: -- UNSTABLE: Integrated in Jenkins build Tika-trunk #1266 (See

[jira] [Commented] (TIKA-2362) Skipping Header and Footer data from documents

2017-05-16 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012245#comment-16012245 ] Thejan Wijesinghe commented on TIKA-2362: - Can't we use regular expressions to detect headers &

[jira] [Commented] (TIKA-2362) Skipping Header and Footer data from documents

2017-05-16 Thread Mujahid Ateeb Khan (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012244#comment-16012244 ] Mujahid Ateeb Khan commented on TIKA-2362: -- Is there any alternate way to skip headers and footers

[jira] [Created] (TIKA-2364) Clean up printstacktrace

2017-05-16 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2364: - Summary: Clean up printstacktrace Key: TIKA-2364 URL: https://issues.apache.org/jira/browse/TIKA-2364 Project: Tika Issue Type: Improvement Reporter:

[jira] [Commented] (TIKA-2362) Skipping Header and Footer data from documents

2017-05-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012238#comment-16012238 ] Tim Allison commented on TIKA-2362: --- There isn't, and it shouldn't be hard to add. Prob won't make it

[jira] [Comment Edited] (TIKA-2360) Handle SentimentParser resource failure more robustly

2017-05-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012235#comment-16012235 ] Tim Allison edited comment on TIKA-2360 at 5/16/17 12:20 PM: - I removed the

[jira] [Updated] (TIKA-2360) Handle SentimentParser resource failure more robustly

2017-05-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2360: -- Fix Version/s: 1.15 > Handle SentimentParser resource failure more robustly >

[jira] [Resolved] (TIKA-2360) Handle SentimentParser resource failure more robustly

2017-05-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2360. --- Resolution: Fixed I removed the SentimentParser from SPI, removed glob detection for .sent, and made

[jira] [Created] (TIKA-2363) Skip image recognition test if network call fails

2017-05-16 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2363: - Summary: Skip image recognition test if network call fails Key: TIKA-2363 URL: https://issues.apache.org/jira/browse/TIKA-2363 Project: Tika Issue Type:

[jira] [Commented] (TIKA-2361) Upgrade to PDFBox 2.0.6

2017-05-16 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16012202#comment-16012202 ] Hudson commented on TIKA-2361: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1265 (See

[jira] [Created] (TIKA-2362) Skipping Header and Footer data from documents

2017-05-16 Thread Mujahid Ateeb Khan (JIRA)
Mujahid Ateeb Khan created TIKA-2362: Summary: Skipping Header and Footer data from documents Key: TIKA-2362 URL: https://issues.apache.org/jira/browse/TIKA-2362 Project: Tika Issue

[jira] [Resolved] (TIKA-2361) Upgrade to PDFBox 2.0.6

2017-05-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2361. --- Resolution: Fixed Fix Version/s: 1.15 > Upgrade to PDFBox 2.0.6 > --- > >

[jira] [Commented] (TIKA-2298) To improve object recognition parser so that it may work without external RESTful service setup

2017-05-16 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16011799#comment-16011799 ] ASF GitHub Bot commented on TIKA-2298: -- asmehra95 commented on issue #159: Creation of TIKA-2298