[jira] [Commented] (TIKA-2334) Upgrade SQLite to 3.16.1

2017-04-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975950#comment-15975950 ] Hudson commented on TIKA-2334: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1243 (See

[jira] [Commented] (TIKA-2331) Upgrade RTFParser to allow configuration of max bytes per embedded object

2017-04-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975914#comment-15975914 ] Hudson commented on TIKA-2331: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1242 (See

[jira] [Commented] (TIKA-2330) Prevent preventable OOM in CompressorInputStream

2017-04-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975913#comment-15975913 ] Hudson commented on TIKA-2330: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1242 (See

[jira] [Updated] (TIKA-2334) Upgrade SQLite to 3.16.1

2017-04-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2334: -- Fix Version/s: 1.15 > Upgrade SQLite to 3.16.1 > > > Key:

[jira] [Resolved] (TIKA-2334) Upgrade SQLite to 3.16.1

2017-04-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2334. --- Resolution: Fixed > Upgrade SQLite to 3.16.1 > > > Key:

[jira] [Created] (TIKA-2334) Upgrade SQLite to 3.16.1

2017-04-19 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2334: - Summary: Upgrade SQLite to 3.16.1 Key: TIKA-2334 URL: https://issues.apache.org/jira/browse/TIKA-2334 Project: Tika Issue Type: Improvement Reporter:

[jira] [Resolved] (TIKA-2331) Upgrade RTFParser to allow configuration of max bytes per embedded object

2017-04-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2331. --- Resolution: Fixed Fix Version/s: 1.15 > Upgrade RTFParser to allow configuration of max bytes

[jira] [Resolved] (TIKA-2330) Prevent preventable OOM in CompressorInputStream

2017-04-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2330. --- Resolution: Fixed Fix Version/s: 1.15 Didn't touch 2.0 because we should have an updated

[jira] [Resolved] (TIKA-2333) Upgrade commons-compress to 1.13

2017-04-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2333. --- Resolution: Fixed Fix Version/s: 1.15 > Upgrade commons-compress to 1.13 >

[jira] [Created] (TIKA-2333) Upgrade commons-compress to 1.13

2017-04-19 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2333: - Summary: Upgrade commons-compress to 1.13 Key: TIKA-2333 URL: https://issues.apache.org/jira/browse/TIKA-2333 Project: Tika Issue Type: Sub-task

[jira] [Created] (TIKA-2332) Output SNOMED codes for CUIs in CTAKES output?

2017-04-19 Thread Dillon Welch (JIRA)
Dillon Welch created TIKA-2332: -- Summary: Output SNOMED codes for CUIs in CTAKES output? Key: TIKA-2332 URL: https://issues.apache.org/jira/browse/TIKA-2332 Project: Tika Issue Type: New

Re: Regarding Image Captioning in Tika for Image MIME Types

2017-04-19 Thread Kranthi Kiran G V
Hello mentors, I have released a trained model of the neural image captioning system, im2txt. It can be found here: https://github.com/KranthiGV/Pretrained-Show-and-Tell-model I am hopeful it will benefit both the research community and Apache Tika's community for image captioning. Have

[jira] [Created] (TIKA-2331) Upgrade RTFParser to allow configuration of max bytes per embedded object

2017-04-19 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2331: - Summary: Upgrade RTFParser to allow configuration of max bytes per embedded object Key: TIKA-2331 URL: https://issues.apache.org/jira/browse/TIKA-2331 Project: Tika

[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-19 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975470#comment-15975470 ] ASF GitHub Bot commented on TIKA-2306: -- KranthiGV commented on issue #163: TIKA-2306: Update Inception

[jira] [Commented] (TIKA-1631) OutOfMemoryException in ZipContainerDetector

2017-04-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975460#comment-15975460 ] Tim Allison commented on TIKA-1631: --- This is now a subset of TIKA-2330. > OutOfMemoryException in

[jira] [Created] (TIKA-2330) Prevent preventable OOM in CompressorInputStream

2017-04-19 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2330: - Summary: Prevent preventable OOM in CompressorInputStream Key: TIKA-2330 URL: https://issues.apache.org/jira/browse/TIKA-2330 Project: Tika Issue Type:

[jira] [Commented] (TIKA-2328) HtmlParser fails when DOCTYPE has unbalanced quotes

2017-04-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975418#comment-15975418 ] Tim Allison commented on TIKA-2328: --- Y, thank you for this data point! My initial comparisons suggested

[jira] [Commented] (TIKA-2328) HtmlParser fails when DOCTYPE has unbalanced quotes

2017-04-19 Thread Shai Erera (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975402#comment-15975402 ] Shai Erera commented on TIKA-2328: -- Thanks [~talli...@apache.org]. I did look at that issue before, so

[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-19 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975376#comment-15975376 ] ASF GitHub Bot commented on TIKA-2306: -- thammegowda commented on issue #163: TIKA-2306: Update

[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-19 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975255#comment-15975255 ] ASF GitHub Bot commented on TIKA-2306: -- tballison commented on issue #163: TIKA-2306: Update Inception

[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-19 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975246#comment-15975246 ] ASF GitHub Bot commented on TIKA-2306: -- thammegowda commented on issue #163: TIKA-2306: Update

[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-19 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975243#comment-15975243 ] ASF GitHub Bot commented on TIKA-2306: -- tballison commented on issue #163: TIKA-2306: Update Inception

[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-19 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975241#comment-15975241 ] ASF GitHub Bot commented on TIKA-2306: -- tballison commented on issue #163: TIKA-2306: Update Inception

[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-19 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975147#comment-15975147 ] ASF GitHub Bot commented on TIKA-2306: -- thammegowda commented on issue #163: TIKA-2306: Update

[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-19 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975139#comment-15975139 ] ASF GitHub Bot commented on TIKA-2306: -- thammegowda commented on issue #163: TIKA-2306: Update

[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-19 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975132#comment-15975132 ] ASF GitHub Bot commented on TIKA-2306: -- tballison commented on issue #163: TIKA-2306: Update Inception

[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-19 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975128#comment-15975128 ] ASF GitHub Bot commented on TIKA-2306: -- tballison commented on issue #163: TIKA-2306: Update Inception

[jira] [Commented] (TIKA-1631) OutOfMemoryException in ZipContainerDetector

2017-04-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975126#comment-15975126 ] Tim Allison commented on TIKA-1631: --- bq. And after that, maybe I could contribute a new ForkParser

[jira] [Commented] (TIKA-2306) Update Inception v3 to Inception v4 in Object recognition parser

2017-04-19 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975112#comment-15975112 ] ASF GitHub Bot commented on TIKA-2306: -- thammegowda commented on issue #163: TIKA-2306: Update

Re: Improving Tika OCR

2017-04-19 Thread Thamme Gowda
Hi Kranthi, Thanks for updating us. I believe that in the long run both of these models may co-exist (Tesseract for flatbed scanner images with perfect lighting conditions, VGG models for natural images taken by cellphone/digital cameras with weird orientations and lighting conditions). I

[jira] [Commented] (TIKA-1631) OutOfMemoryException in ZipContainerDetector

2017-04-19 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975097#comment-15975097 ] Luis Filipe Nassif commented on TIKA-1631: -- {quote} If at all possible, consider running Tika

[jira] [Commented] (TIKA-1631) OutOfMemoryException in ZipContainerDetector

2017-04-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15975033#comment-15975033 ] Tim Allison commented on TIKA-1631: --- bq. Out of curiosity, a user reported an OOME that stopped our

[jira] [Comment Edited] (TIKA-1631) OutOfMemoryException in ZipContainerDetector

2017-04-19 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974956#comment-15974956 ] Luis Filipe Nassif edited comment on TIKA-1631 at 4/19/17 4:07 PM: --- As we

[jira] [Commented] (TIKA-2329) Upgrade to POI 3.16-final

2017-04-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974973#comment-15974973 ] Hudson commented on TIKA-2329: -- SUCCESS: Integrated in Jenkins build tika-2.x-windows #198 (See

[jira] [Commented] (TIKA-1195) XLSB support

2017-04-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974972#comment-15974972 ] Hudson commented on TIKA-1195: -- SUCCESS: Integrated in Jenkins build tika-2.x-windows #198 (See

[jira] [Commented] (TIKA-2329) Upgrade to POI 3.16-final

2017-04-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974964#comment-15974964 ] Hudson commented on TIKA-2329: -- SUCCESS: Integrated in Jenkins build tika-2.x #244 (See

[jira] [Commented] (TIKA-1631) OutOfMemoryException in ZipContainerDetector

2017-04-19 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974956#comment-15974956 ] Luis Filipe Nassif commented on TIKA-1631: -- As we have already copied some code from Compress and

[jira] [Resolved] (TIKA-1195) XLSB support

2017-04-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1195. --- Resolution: Fixed Only 3.5 years later... > XLSB support > > > Key:

[jira] [Updated] (TIKA-1195) XLSB support

2017-04-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1195: -- Fix Version/s: 1.15 2.0 > XLSB support > > > Key:

[jira] [Resolved] (TIKA-2329) Upgrade to POI 3.16-final

2017-04-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2329. --- Resolution: Fixed Fix Version/s: 1.15 2.0 > Upgrade to POI 3.16-final >

[jira] [Commented] (TIKA-2329) Upgrade to POI 3.16-final

2017-04-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974824#comment-15974824 ] Hudson commented on TIKA-2329: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1240 (See

[jira] [Commented] (TIKA-1195) XLSB support

2017-04-19 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974823#comment-15974823 ] Hudson commented on TIKA-1195: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1240 (See

[jira] [Commented] (TIKA-1631) OutOfMemoryException in ZipContainerDetector

2017-04-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974755#comment-15974755 ] Tim Allison commented on TIKA-1631: --- bq. But can we fix at Tika side without Compress help? We can do

[jira] [Commented] (TIKA-1631) OutOfMemoryException in ZipContainerDetector

2017-04-19 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974666#comment-15974666 ] Luis Filipe Nassif commented on TIKA-1631: -- Hi [~talli...@apache.org], currently no. We have hit

Re: Improving Tika OCR

2017-04-19 Thread Kranthi Kiran G V
Hello community, I have successfully tested Tesseract 4.0 on various images of different sizes, orientations and lighting conditions. In the next few days, I will publish the results on a blog for you to look at. Although I'm able to reliably measure the clock time, accuracy, etc., I am

[jira] [Created] (TIKA-2329) Upgrade to POI 3.16-final

2017-04-19 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2329: - Summary: Upgrade to POI 3.16-final Key: TIKA-2329 URL: https://issues.apache.org/jira/browse/TIKA-2329 Project: Tika Issue Type: Improvement Reporter:

[jira] [Comment Edited] (TIKA-1631) OutOfMemoryException in ZipContainerDetector

2017-04-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974533#comment-15974533 ] Tim Allison edited comment on TIKA-1631 at 4/19/17 12:28 PM: - [~lfcnassif], I'd

[jira] [Commented] (TIKA-2328) HtmlParser fails when DOCTYPE has unbalanced quotes

2017-04-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974545#comment-15974545 ] Tim Allison commented on TIKA-2328: --- [~shaie] thank you for raising this. Y, I don't think there's much

[jira] [Commented] (TIKA-1631) OutOfMemoryException in ZipContainerDetector

2017-04-19 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974533#comment-15974533 ] Tim Allison commented on TIKA-1631: --- [~lfcnassif], I'd like to add a temporary hack for OOM protections

Re: 1.15?

2017-04-19 Thread Luís Filipe Nassif
+1 from me, there are so many fixes and improvements! Best, Luis On 18 Apr 2017 03:13, "Oleg Tikhonov" wrote: > +1 for the release. > > On Mon, Apr 17, 2017 at 8:39 PM, David Meikle wrote: > > > +1 from me too. > > > > Cheers, > > Dave > > > > On 13