[jira] [Commented] (TIKA-2520) OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request

2018-05-24 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490028#comment-16490028 ] Hudson commented on TIKA-2520: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1491 (See

[jira] [Commented] (TIKA-2520) OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request

2018-05-24 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489993#comment-16489993 ] Hudson commented on TIKA-2520: -- UNSTABLE: Integrated in Jenkins build tika-2.x-windows #257 (See

[jira] [Commented] (TIKA-2520) OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request

2018-05-24 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489927#comment-16489927 ] ASF GitHub Bot commented on TIKA-2520: -- chrismattmann commented on issue #237: TIKA-2520 optimize

[jira] [Commented] (TIKA-2520) OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request

2018-05-24 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489926#comment-16489926 ] Chris A. Mattmann commented on TIKA-2520: - Integrated into 2.x master too: {noformat} [INFO]

[jira] [Commented] (TIKA-2520) OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request

2018-05-24 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489911#comment-16489911 ] Hudson commented on TIKA-2520: -- SUCCESS: Integrated in Jenkins build tika-branch-1x #30 (See

[jira] [Comment Edited] (TIKA-2520) OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request

2018-05-24 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489790#comment-16489790 ] Chris A. Mattmann edited comment on TIKA-2520 at 5/24/18 8:56 PM: --

[jira] [Resolved] (TIKA-2520) OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request

2018-05-24 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann resolved TIKA-2520. - Resolution: Fixed Fix Version/s: 1.19 {noformat} nonas:tika2.0.0 mattmann$ git

[jira] [Assigned] (TIKA-2520) OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request

2018-05-24 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned TIKA-2520: --- Assignee: Chris A. Mattmann > OptimaizeLangDetector#loadModels() should not be called

[jira] [Commented] (TIKA-2520) OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request

2018-05-24 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489788#comment-16489788 ] ASF GitHub Bot commented on TIKA-2520: -- chrismattmann closed pull request #237: TIKA-2520 optimize

[jira] [Commented] (TIKA-2520) OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request

2018-05-24 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489789#comment-16489789 ] ASF GitHub Bot commented on TIKA-2520: -- chrismattmann commented on issue #237: TIKA-2520 optimize

[jira] [Commented] (TIKA-2520) OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request

2018-05-24 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489780#comment-16489780 ] ASF GitHub Bot commented on TIKA-2520: -- kkrugler commented on issue #237: TIKA-2520 optimize

[jira] [Commented] (TIKA-2520) OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request

2018-05-24 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489750#comment-16489750 ] ASF GitHub Bot commented on TIKA-2520: -- mbaechler commented on issue #237: TIKA-2520 optimize

[jira] [Commented] (TIKA-2520) OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request

2018-05-24 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489731#comment-16489731 ] ASF GitHub Bot commented on TIKA-2520: -- chrismattmann commented on issue #237: TIKA-2520 optimize

[jira] [Created] (TIKA-2651) tika-translate jar contains duplicate classes from tika-core jar

2018-05-24 Thread Ewan Mellor (JIRA)
Ewan Mellor created TIKA-2651: - Summary: tika-translate jar contains duplicate classes from tika-core jar Key: TIKA-2651 URL: https://issues.apache.org/jira/browse/TIKA-2651 Project: Tika Issue

[jira] [Commented] (TIKA-2520) OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request

2018-05-24 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489662#comment-16489662 ] ASF GitHub Bot commented on TIKA-2520: -- kkrugler commented on issue #237: TIKA-2520 optimize

[jira] [Commented] (TIKA-2650) Soft-hyphen is not extracted properly

2018-05-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489606#comment-16489606 ] Tim Allison commented on TIKA-2650: --- Can you share with us exactly where the soft-hyphen isn't working?

Re: Branch_1x build broke?

2018-05-24 Thread Tim Allison
Y, you're probably running a different version of tesseract than I was running and getting different (worse) text out during ocr. I guess we could add an or 'dehaystack'? On Thu, May 24, 2018 at 12:09 PM, Chris Mattmann wrote: > Tim, > Are you seeing this?

Re: Branch_1x build broke?

2018-05-24 Thread Chris Mattmann
Thanks Dave, yes I have tesseract enabled and this is on my Mac Book. Thanks for looking into it Dave… Cheers, Chris From: "loo...@gmail.com" Reply-To: "dev@tika.apache.org" Date: Thursday, May 24, 2018 at 11:34 AM To: "dev@tika.apache.org"

Re: Branch_1x build broke?

2018-05-24 Thread loompa
Hey Chris, This is happening to me with Tesseract enabled but only on my MacBook. Are you running this on OSX? Been trying to get some time to dig into it as it works perfectly on my Windows and Linux setups. Cheers, Dave On Thu, 24 May 2018, 17:09 Chris Mattmann,

[jira] [Commented] (TIKA-2520) OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request

2018-05-24 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489455#comment-16489455 ] ASF GitHub Bot commented on TIKA-2520: -- kkrugler commented on issue #237: TIKA-2520 optimize

[jira] [Commented] (TIKA-2520) OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request

2018-05-24 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489453#comment-16489453 ] ASF GitHub Bot commented on TIKA-2520: -- kkrugler commented on issue #237: TIKA-2520 optimize

Branch_1x build broke?

2018-05-24 Thread Chris Mattmann
Tim, Are you seeing this? Results : Failed tests: PDFParserTest.testEmbeddedDocsWithOCROnly:1250->TikaTest.assertContains:103 pdf_haystack not found in: <html xmlns="http://www.w3.org/1999/xhtml">

[jira] [Commented] (TIKA-2520) OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request

2018-05-24 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489288#comment-16489288 ] ASF GitHub Bot commented on TIKA-2520: -- mbaechler commented on issue #237: TIKA-2520 optimize

[jira] [Commented] (TIKA-2520) OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request

2018-05-24 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489274#comment-16489274 ] ASF GitHub Bot commented on TIKA-2520: -- chrismattmann commented on issue #237: TIKA-2520 optimize

[jira] [Commented] (TIKA-2520) OptimaizeLangDetector#loadModels() should not be called for every single langdetect HTTP request

2018-05-24 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489163#comment-16489163 ] ASF GitHub Bot commented on TIKA-2520: -- mbaechler opened a new pull request #237: TIKA-2520 optimize

[jira] [Created] (TIKA-2650) Soft-hyphen is not extracted properly

2018-05-24 Thread Saurabh Patil (JIRA)
Saurabh Patil created TIKA-2650: --- Summary: Soft-hyphen is not extracted properly Key: TIKA-2650 URL: https://issues.apache.org/jira/browse/TIKA-2650 Project: Tika Issue Type: Bug

[jira] [Commented] (TIKA-2100) Html Parser does not keep the html tag attributes

2018-05-24 Thread Gerard Bouchar (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489082#comment-16489082 ] Gerard Bouchar commented on TIKA-2100: -- Is there something that could be done to fix this ? > Html

[jira] [Commented] (TIKA-2648) mime detection based on resource name detects resources as "text/x-php" instead of "text/html"

2018-05-24 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489027#comment-16489027 ] Sebastian Nagel commented on TIKA-2648: --- Thanks, [~gbouchar]! That needs to be reviewed by the Tika

[jira] [Comment Edited] (TIKA-2648) mime detection based on resource name detects resources as "text/x-php" instead of "text/html"

2018-05-24 Thread Gerard Bouchar (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487578#comment-16487578 ] Gerard Bouchar edited comment on TIKA-2648 at 5/24/18 12:20 PM:

[jira] [Closed] (TIKA-2649) Tika server starts at localhost:9998 and I'm unable to make REST calls to it from other servers

2018-05-24 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison closed TIKA-2649. - Resolution: Not A Problem Please ask "use" questions at u...@tika.apache.org. You may want to check out