[jira] [Commented] (TIKA-93) OCR support

2014-09-19 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140649#comment-14140649 ] Grant Ingersoll commented on TIKA-93: Very cool! Thanks for following through on this!

[jira] [Moved] (TIKA-2573) Map Data Extraction/OCR

2018-02-09 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll moved SOLR-11963 to TIKA-2573: Security: (was: Public) Workflow: classic default workflow (was:

[jira] [Commented] (TIKA-93) OCR support

2014-02-07 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894779#comment-13894779 ] Grant Ingersoll commented on TIKA-93: - I'm noodling around with producing a patch for thi

[jira] [Commented] (TIKA-93) OCR support

2014-02-07 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895054#comment-13895054 ] Grant Ingersoll commented on TIKA-93: - bq. Is anyone aware of anything in PDFBox that al

[jira] [Commented] (TIKA-93) OCR support

2014-02-07 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895080#comment-13895080 ] Grant Ingersoll commented on TIKA-93: - I thought about the Parser approach, but it doesn'

[jira] [Commented] (TIKA-93) OCR support

2014-02-07 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895205#comment-13895205 ] Grant Ingersoll commented on TIKA-93: - Chris, are Parsers composable? If it is a Parser,

[jira] [Commented] (TIKA-93) OCR support

2014-02-07 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895241#comment-13895241 ] Grant Ingersoll commented on TIKA-93: - Food for thought: We introduce OCRParser that ext

[jira] [Commented] (TIKA-93) OCR support

2014-02-07 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895276#comment-13895276 ] Grant Ingersoll commented on TIKA-93: - Well, Tesseract is out, at least as far as using T

[jira] [Commented] (TIKA-93) OCR support

2014-02-08 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895514#comment-13895514 ] Grant Ingersoll commented on TIKA-93: - It can, via some ancient JavaIO stuff, which, in s

[jira] [Updated] (TIKA-93) OCR support

2014-02-08 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated TIKA-93: Attachment: TIKA-93.patch Here is a _very_ early stage patch that creates a JavaOCR parser. It is not

[jira] [Updated] (TIKA-93) OCR support

2014-02-08 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated TIKA-93: Attachment: TIKA-93.patch Tests for the JavaOCRParser. Next step is to start integrating into various

[jira] [Updated] (TIKA-93) OCR support

2014-02-08 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated TIKA-93: Attachment: TIKA-93.patch This shows what I am thinking for integration with PDFParser. Not sure if i

[jira] [Commented] (TIKA-93) OCR support

2014-02-08 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895718#comment-13895718 ] Grant Ingersoll commented on TIKA-93: - bq. what is the dependency on jacoco in tika-paren

[jira] [Commented] (TIKA-93) OCR support

2014-02-09 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895886#comment-13895886 ] Grant Ingersoll commented on TIKA-93: - FYI: http://roncemer.com/software-development/java

[jira] [Commented] (TIKA-93) OCR support

2014-02-09 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895887#comment-13895887 ] Grant Ingersoll commented on TIKA-93: - Not sure I am happy w/ the changes here yet, esp.

[jira] [Updated] (TIKA-93) OCR support

2014-02-09 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated TIKA-93: Attachment: testOCR.pptx testOCR.pdf testOCR.docx TIKA-

[jira] [Commented] (TIKA-93) OCR support

2014-02-10 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896506#comment-13896506 ] Grant Ingersoll commented on TIKA-93: - bq. changed to the Tesseract exec approach Can yo

[jira] [Commented] (TIKA-93) OCR support

2014-02-17 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903202#comment-13903202 ] Grant Ingersoll commented on TIKA-93: - Frank, no, 1.5 is due out soon (already?) and this

[jira] Created: (TIKA-433) Tika + Hadoop

2010-05-25 Thread Grant Ingersoll (JIRA)
Tika + Hadoop - Key: TIKA-433 URL: https://issues.apache.org/jira/browse/TIKA-433 Project: Tika Issue Type: New Feature Components: general Reporter: Grant Ingersoll Priority: Minor Would be gr

[jira] Commented: (TIKA-433) Tika + Hadoop

2010-05-26 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871616#action_12871616 ] Grant Ingersoll commented on TIKA-433: -- Does that mean you are going to extract it from

[jira] Commented: (TIKA-433) Tika + Hadoop

2010-05-26 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871720#action_12871720 ] Grant Ingersoll commented on TIKA-433: -- I think it makes sense as a Tika contrib, but th

[jira] Commented: (TIKA-488) Add alternative search provider on site

2010-08-20 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900744#action_12900744 ] Grant Ingersoll commented on TIKA-488: -- Yeah, I'd agree, a year is a bit much. I think

[jira] Commented: (TIKA-433) Tika + Hadoop

2010-10-03 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917390#action_12917390 ] Grant Ingersoll commented on TIKA-433: -- I've taken this offline and am going to put it u

[jira] Created: (TIKA-554) ParseUtils.getStringContent needs an option to set the write limit that can be passed into the BodyContentHandler

2010-11-18 Thread Grant Ingersoll (JIRA)
ParseUtils.getStringContent needs an option to set the write limit that can be passed into the BodyContentHandler - Key: TIKA-554 URL: https://issues.a

[jira] Updated: (TIKA-554) ParseUtils.getStringContent needs an option to set the write limit that can be passed into the BodyContentHandler

2010-11-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated TIKA-554: - Attachment: TIKA-554.patch No tests, but it is pretty straightforward. > ParseUtils.getStringConte

[jira] Created: (TIKA-568) Language Detection isReasonablyCertain() hides valuable information

2010-12-05 Thread Grant Ingersoll (JIRA)
Language Detection isReasonablyCertain() hides valuable information --- Key: TIKA-568 URL: https://issues.apache.org/jira/browse/TIKA-568 Project: Tika Issue Type: Improvement

[jira] Updated: (TIKA-568) Language Detection isReasonablyCertain() hides valuable information

2010-12-05 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated TIKA-568: - Attachment: TIKA-568.patch Adds a getDistance method > Language Detection isReasonablyCertain() hi