[GitHub] tika pull request #132: Remove unused variable

2016-09-16 Thread haisi
GitHub user haisi opened a pull request: https://github.com/apache/tika/pull/132 Remove unused variable You can merge this pull request into a Git repository by running: $ git pull https://github.com/haisi/tika master Alternatively you can review and apply these changes as

[GitHub] tika pull request #132: Remove unused variable

2016-09-16 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/tika/pull/132

RE: [GitHub] tika pull request #132: Remove unused variable

2016-09-16 Thread Allison, Timothy B.
Thank you haisi! -----Original Message----- From: asfgit [mailto:g...@git.apache.org] Sent: Friday, September 16, 2016 6:59 AM To: dev@tika.apache.org Subject: [GitHub] tika pull request #132: Remove unused variable Github user asfgit closed the pull request at:

RE: Query on correct use of 'fileUrl' in TikaJAXRS Server to extract document at remote url - my request is not working

2016-09-16 Thread Allison, Timothy B.
I may have time to do this. I would definitely have time to review a patch ;). Please open a ticket on our Jira. -----Original Message----- From: John Dougrez-Lewis [mailto:jle...@lightblue.com] Sent: Friday, September 16, 2016 1:39 AM To: dev@tika.apache.org Subject: RE: Query on correct use

[jira] [Commented] (TIKA-2058) Memory Leak in Tika version 1.13 when parsing millions of files

2016-09-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496190#comment-15496190 ] Tim Allison commented on TIKA-2058: --- [~comcortim], how'd the re-run go with the updated POI? > Memory

tika-2.x - Build # 143 - Failure

2016-09-16 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-2.x (build #143) Status: Failure Check console output at https://builds.apache.org/job/tika-2.x/143/ to view the results.

[jira] [Commented] (TIKA-2058) Memory Leak in Tika version 1.13 when parsing millions of files

2016-09-16 Thread Tim Barrett (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496357#comment-15496357 ] Tim Barrett commented on TIKA-2058: --- I have that large input set available for the foreseeable future, so

[jira] [Commented] (TIKA-2058) Memory Leak in Tika version 1.13 when parsing millions of files

2016-09-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496351#comment-15496351 ] Tim Allison commented on TIKA-2058: --- Great!!! Thank you, [~lfcnassif], for figuring this out! Y, I'd

[jira] [Commented] (TIKA-2058) Memory Leak in Tika version 1.13 when parsing millions of files

2016-09-16 Thread Tim Barrett (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496297#comment-15496297 ] Tim Barrett commented on TIKA-2058: --- It completed an hour ago without any OOM problems, so very good.

tika-2.x-windows - Build # 47 - Still Failing

2016-09-16 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-2.x-windows (build #47) Status: Still Failing Check console output at https://builds.apache.org/job/tika-2.x-windows/47/ to view the results.

Re: PDF with embedded attachments and Tika 2.0 modularity

2016-09-16 Thread Sergey Beryozkin
Hi Bob Thanks for all the info, much appreciated. I agree it makes sense to start with the multimedia bundle to increase the format coverage. I'll keep experimenting. The demo is quite basic as far as Tika is concerned - I've only tried PDF/ODT/ODP files and looks like they are really simple

[jira] [Closed] (TIKA-2080) PDFParser tika-parsers-1.13.jar not parsing Japanese and Chinese Characters correctly

2016-09-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison closed TIKA-2080. - Resolution: Not A Problem > PDFParser tika-parsers-1.13.jar not parsing Japanese and Chinese Characters >

Re: PDF with embedded attachments and Tika 2.0 modularity

2016-09-16 Thread Bob Paulin
Hi Sergey, On 9/15/2016 3:33 PM, Sergey Beryozkin wrote: Hi Bob, Tim, All, On 15/09/16 18:06, Bob Paulin wrote: Hi Sergey, I definitely get the challenges. In fact recently we merged the PDF module into the Multimedia module due to the tight coupling around the TesseractOCR[1] [2]. We

[jira] [Commented] (TIKA-2058) Memory Leak in Tika version 1.13 when parsing millions of files

2016-09-16 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496844#comment-15496844 ] Tim Allison commented on TIKA-2058: --- Great! Please let us know what else you find. As a [public service

[jira] [Commented] (TIKA-2058) Memory Leak in Tika version 1.13 when parsing millions of files

2016-09-16 Thread Tim Barrett (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15496959#comment-15496959 ] Tim Barrett commented on TIKA-2058: --- Thanks for being super responsive. I especially appreciated the fact