[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832676#comment-16832676
]
Hudson commented on TIKA-2749:
--
SUCCESS: Integrated in Jenkins build tika-branch-1x #185 (See
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832670#comment-16832670
]
Hudson commented on TIKA-2749:
--
SUCCESS: Integrated in Jenkins build Tika-trunk #1654 (See
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832646#comment-16832646
]
Hudson commented on TIKA-2749:
--
UNSTABLE: Integrated in Jenkins build tika-2.x-windows #409 (See
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832550#comment-16832550
]
Tim Allison commented on TIKA-2749:
---
As a first step for 1.21, I've added "AUTO" as a new OCRStrategy
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811963#comment-16811963
]
Ross Johnson commented on TIKA-2749:
[~talli...@apache.org]
First, to make sure we're on the same
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810256#comment-16810256
]
Tim Allison commented on TIKA-2749:
---
[~rossj], this is very helpful...any recs on how to detect "not a
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810172#comment-16810172
]
Ross Johnson commented on TIKA-2749:
OCRing the inlined images directly can be tricky, in my
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810071#comment-16810071
]
Tim Allison commented on TIKA-2749:
---
Thank you, [~tilman]. Fixed.
> OCR on PDFs should "just work" out
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810059#comment-16810059
]
Tilman Hausherr commented on TIKA-2749:
---
You probably mean "vector graphics".
> OCR on PDFs should
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809855#comment-16809855
]
Tim Allison commented on TIKA-2749:
---
There are several reasons why one might want to run OCR on a PDF
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808791#comment-16808791
]
Tim Allison commented on TIKA-2749:
---
Thank you, [~tilman]!
> OCR on PDFs should "just work" out of the
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808400#comment-16808400
]
Tilman Hausherr commented on TIKA-2749:
---
See the accepted answer here:
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807661#comment-16807661
]
Tim Allison commented on TIKA-2749:
---
A recent question on the user list has me returning to something I
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733501#comment-16733501
]
Tim Allison commented on TIKA-2749:
---
On caching, that would be neat, but I worry that that will be
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732043#comment-16732043
]
Markus Mandalka commented on TIKA-2749:
---
Another nice thing would be to cache OCR results of images
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732035#comment-16732035
]
Markus Mandalka commented on TIKA-2749:
---
Some ideas/experience/wishes from my side for development
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696034#comment-16696034
]
Luis Filipe Nassif commented on TIKA-2749:
--
I don't do that. I thought you questioned if doing
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695964#comment-16695964
]
Rick Leir commented on TIKA-2749:
-
Luis, Tesseract accepts TIFF and JPEG, so why convert it to a PDF?
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695351#comment-16695351
]
Luis Filipe Nassif commented on TIKA-2749:
--
Hi [~rleir]. Sorry, I meant our main goal when OCRing
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695001#comment-16695001
]
Tim Allison commented on TIKA-2749:
---
bq. Note: I have no need for OCR recently, so this is just talk
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694939#comment-16694939
]
Rick Leir commented on TIKA-2749:
-
Hi Tim [~talli...@apache.org]
Yes, the "just work" goal is great.
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694911#comment-16694911
]
Rick Leir commented on TIKA-2749:
-
Hi Luis [~lfcnassif]
Your main goal is "to OCR scanned docs". Can I
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640724#comment-16640724
]
Luis Filipe Nassif commented on TIKA-2749:
--
Hi [~talli...@apache.org],
Yes, currently we run OCR
[
https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16638183#comment-16638183
]
Tim Allison commented on TIKA-2749:
---
The two basic options (see our [wiki on OCR and
24 matches