[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2019-05-03 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832676#comment-16832676 ] Hudson commented on TIKA-2749: -- SUCCESS: Integrated in Jenkins build tika-branch-1x #185 (See

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2019-05-03 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832670#comment-16832670 ] Hudson commented on TIKA-2749: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1654 (See

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2019-05-03 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832646#comment-16832646 ] Hudson commented on TIKA-2749: -- UNSTABLE: Integrated in Jenkins build tika-2.x-windows #409 (See

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2019-05-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832550#comment-16832550 ] Tim Allison commented on TIKA-2749: --- As a first step for 1.21, I've added "AUTO" as a new OCRStrategy

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2019-04-07 Thread Ross Johnson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811963#comment-16811963 ] Ross Johnson commented on TIKA-2749: [~talli...@apache.org] First, to make sure we're on the same

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2019-04-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810256#comment-16810256 ] Tim Allison commented on TIKA-2749: --- [~rossj], this is very helpful...any recs on how to detect "not a

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2019-04-04 Thread Ross Johnson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810172#comment-16810172 ] Ross Johnson commented on TIKA-2749: OCRing the inlined images directly can be tricky, in my

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2019-04-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810071#comment-16810071 ] Tim Allison commented on TIKA-2749: --- Thank you, [~tilman]. Fixed.

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2019-04-04 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810059#comment-16810059 ] Tilman Hausherr commented on TIKA-2749: --- You probably mean "vector graphics".

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2019-04-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809855#comment-16809855 ] Tim Allison commented on TIKA-2749: --- There are several reasons why one might want to run OCR on a PDF

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2019-04-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808791#comment-16808791 ] Tim Allison commented on TIKA-2749: --- Thank you, [~tilman]!

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2019-04-03 Thread Tilman Hausherr (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808400#comment-16808400 ] Tilman Hausherr commented on TIKA-2749: --- See the accepted answer here:

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2019-04-02 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807661#comment-16807661 ] Tim Allison commented on TIKA-2749: --- A recent question on the user list has me returning to something I

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2019-01-03 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733501#comment-16733501 ] Tim Allison commented on TIKA-2749: --- On caching, that would be neat, but I worry that that will be

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2019-01-02 Thread Markus Mandalka (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732043#comment-16732043 ] Markus Mandalka commented on TIKA-2749: --- Another nice thing would be to cache OCR results of images

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2019-01-02 Thread Markus Mandalka (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732035#comment-16732035 ] Markus Mandalka commented on TIKA-2749: --- Some ideas/experience/wishes from my side for development

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2018-11-22 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16696034#comment-16696034 ] Luis Filipe Nassif commented on TIKA-2749: -- I don't do that. I thought you questioned if doing

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2018-11-22 Thread Rick Leir (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695964#comment-16695964 ] Rick Leir commented on TIKA-2749: - Luis, Tesseract accepts TIFF and JPEG, so why convert it to a PDF?

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2018-11-21 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695351#comment-16695351 ] Luis Filipe Nassif commented on TIKA-2749: -- Hi [~rleir]. Sorry, I meant our main goal when OCRing

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2018-11-21 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695001#comment-16695001 ] Tim Allison commented on TIKA-2749: --- bq. Note: I have no need for OCR recently, so this is just talk

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2018-11-21 Thread Rick Leir (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694939#comment-16694939 ] Rick Leir commented on TIKA-2749: - Hi Tim [~talli...@apache.org] Yes, the "just work" goal is great.

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2018-11-21 Thread Rick Leir (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694911#comment-16694911 ] Rick Leir commented on TIKA-2749: - Hi Luis [~lfcnassif] Your main goal is "to ocr scanned docs". Can I

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2018-10-06 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640724#comment-16640724 ] Luis Filipe Nassif commented on TIKA-2749: -- Hi [~talli...@apache.org], Yes, currently we run OCR

[jira] [Commented] (TIKA-2749) OCR on PDFs should "just work" out of the box

2018-10-04 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16638183#comment-16638183 ] Tim Allison commented on TIKA-2749: --- The two basic options (see our [wiki on OCR and