[
https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140649#comment-14140649
]
Grant Ingersoll commented on TIKA-93:
-
Very cool! Thanks for following through on this!
[
https://issues.apache.org/jira/browse/TIKA-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grant Ingersoll moved SOLR-11963 to TIKA-2573:
--
Security: (was: Public)
Workflow: classic default workflow (was:
[
https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894779#comment-13894779
]
Grant Ingersoll commented on TIKA-93:
-
I'm noodling around with producing a patch for thi
[
https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895054#comment-13895054
]
Grant Ingersoll commented on TIKA-93:
-
bq. Is anyone aware of anything in PDFBox that al
[
https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895080#comment-13895080
]
Grant Ingersoll commented on TIKA-93:
-
I thought about the Parser approach, but it doesn'
[
https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895205#comment-13895205
]
Grant Ingersoll commented on TIKA-93:
-
Chris, are Parsers composable? If it is a Parser,
[
https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895241#comment-13895241
]
Grant Ingersoll commented on TIKA-93:
-
Food for thought:
We introduce OCRParser that ext
[
https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895276#comment-13895276
]
Grant Ingersoll commented on TIKA-93:
-
Well, Tesseract is out, at least as far as using T
[
https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895514#comment-13895514
]
Grant Ingersoll commented on TIKA-93:
-
It can, via some ancient JavaIO stuff, which, in s
[
https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grant Ingersoll updated TIKA-93:
Attachment: TIKA-93.patch
Here is a _very_ early stage patch that creates a JavaOCR parser. It is not
[
https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grant Ingersoll updated TIKA-93:
Attachment: TIKA-93.patch
Tests for the JavaOCRParser. Next step is to start integrating into various
[
https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grant Ingersoll updated TIKA-93:
Attachment: TIKA-93.patch
This shows what I am thinking for integration with PDFParser. Not sure if i
[
https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895718#comment-13895718
]
Grant Ingersoll commented on TIKA-93:
-
bq. what is the dependency on jacoco in tika-paren
[
https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895886#comment-13895886
]
Grant Ingersoll commented on TIKA-93:
-
FYI: http://roncemer.com/software-development/java
[
https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895887#comment-13895887
]
Grant Ingersoll commented on TIKA-93:
-
Not sure I am happy w/ the changes here yet, esp.
[
https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grant Ingersoll updated TIKA-93:
Attachment: testOCR.pptx
testOCR.pdf
testOCR.docx
TIKA-
[
https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896506#comment-13896506
]
Grant Ingersoll commented on TIKA-93:
-
bq. changed to the Tesseract exec approach
Can yo
[
https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903202#comment-13903202
]
Grant Ingersoll commented on TIKA-93:
-
Frank, no, 1.5 is due out soon (already?) and this
Tika + Hadoop
-
Key: TIKA-433
URL: https://issues.apache.org/jira/browse/TIKA-433
Project: Tika
Issue Type: New Feature
Components: general
Reporter: Grant Ingersoll
Priority: Minor
Would be gr
[
https://issues.apache.org/jira/browse/TIKA-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871616#action_12871616
]
Grant Ingersoll commented on TIKA-433:
--
Does that mean you are going to extract it from
[
https://issues.apache.org/jira/browse/TIKA-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871720#action_12871720
]
Grant Ingersoll commented on TIKA-433:
--
I think it makes sense as a Tika contrib, but th
[
https://issues.apache.org/jira/browse/TIKA-488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900744#action_12900744
]
Grant Ingersoll commented on TIKA-488:
--
Yeah, I'd agree, a year is a bit much. I think
[
https://issues.apache.org/jira/browse/TIKA-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917390#action_12917390
]
Grant Ingersoll commented on TIKA-433:
--
I've taken this offline and am going to put it u
ParseUtils.getStringContent needs an option to set the write limit that can be
passed into the BodyContentHandler
-
Key: TIKA-554
URL: https://issues.a
[
https://issues.apache.org/jira/browse/TIKA-554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grant Ingersoll updated TIKA-554:
-
Attachment: TIKA-554.patch
No tests, but it is pretty straightforward.
> ParseUtils.getStringConte
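For readers unfamiliar with the request in TIKA-554: the write limit caps how many characters of extracted text are kept. The sketch below only illustrates that idea with plain Java and does not use Tika. The class `LimitedTextBuffer` and its methods are hypothetical names invented for this example, and treating a negative limit as "unlimited" is an assumption, not something stated in the issue.

```java
/**
 * Hypothetical illustration only (not Tika code): collects text up to a
 * caller-supplied write limit and records whether anything was dropped.
 */
final class LimitedTextBuffer {
    private final StringBuilder text = new StringBuilder();
    private final int writeLimit;   // assumption: a negative value means "no limit"
    private boolean truncated = false;

    LimitedTextBuffer(int writeLimit) {
        this.writeLimit = writeLimit;
    }

    /** Appends as much of the chunk as still fits under the limit. */
    void append(String chunk) {
        if (writeLimit < 0) {
            text.append(chunk);
            return;
        }
        int remaining = writeLimit - text.length();
        if (chunk.length() <= remaining) {
            text.append(chunk);
        } else {
            // Keep only the part that fits and remember that text was cut off.
            text.append(chunk, 0, remaining);
            truncated = true;
        }
    }

    boolean isTruncated() {
        return truncated;
    }

    @Override
    public String toString() {
        return text.toString();
    }
}
```

A caller that wants at most five characters would construct `new LimitedTextBuffer(5)` and can check `isTruncated()` afterwards to learn whether the content was cut short.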
Language Detection isReasonablyCertain() hides valuable information
---
Key: TIKA-568
URL: https://issues.apache.org/jira/browse/TIKA-568
Project: Tika
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/TIKA-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grant Ingersoll updated TIKA-568:
-
Attachment: TIKA-568.patch
Adds a getDistance method
> Language Detection isReasonablyCertain() hi
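To illustrate the complaint in TIKA-568, here is a minimal plain-Java sketch of why a boolean certainty check can hide information that a raw distance exposes. It is not Tika code: the class `LanguageGuess` and the threshold value are assumptions made up for this example. Only the method names `isReasonablyCertain()` and `getDistance` come from the issue text above.

```java
/**
 * Hypothetical illustration only (not Tika code): a language guess with a
 * distance score, where a smaller distance means a closer match.
 */
final class LanguageGuess {
    // Assumed cut-off for this sketch; the real value is not given in the issue.
    static final double CERTAINTY_LIMIT = 0.022;

    private final String language;
    private final double distance;

    LanguageGuess(String language, double distance) {
        this.language = language;
        this.distance = distance;
    }

    String getLanguage() {
        return language;
    }

    /** Collapses the score to yes/no, so a near miss looks like a wild guess. */
    boolean isReasonablyCertain() {
        return distance < CERTAINTY_LIMIT;
    }

    /** Exposes the raw score so callers can apply their own threshold. */
    double getDistance() {
        return distance;
    }
}
```

With this sketch, guesses at distance 0.03 and 0.5 both report `isReasonablyCertain() == false`, yet `getDistance()` shows the first is far closer to the cut-off than the second.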