Re: [tesseract-ocr] Tesseract 4 for old languages

2018-06-12 Thread ShreeDevi Kumar
Please also see http://doc-creator.labri.fr/ which makes it easy to create synthetic data similar to manuscript pages.

Re: [tesseract-ocr] Tesseract 4 for old languages

2018-06-12 Thread ShreeDevi Kumar
Please see the project https://github.com/OCR-D/ocrd-train which supports training Tesseract if you provide line images and matching ground truth text.

Re: [tesseract-ocr] Tesseract 4 for old languages

2018-06-12 Thread jbcamps
Same question here. I see that the documentation on training Tesseract 4 makes some reference to manuscripts: "As with base Tesseract, there is a choice between rendering synthetic training data from fonts, or labeling some pre-existing images (like ancient manuscripts for example)." So, if

[tesseract-ocr] Tesseract OCR quality issues with python

2018-06-12 Thread Vidur Malhotra
Hi, I tried running Tesseract OCR on the same image using the two approaches below: 1. Command line (tesseract version 3.05.01) tesseract image.jpg out.txt 2. Using pytesseract in Python (pytesseract version 0.2.2) import PIL from PIL import Image import pytesseract text =

Re: [tesseract-ocr] Image DPI restriction

2018-06-12 Thread zbgns
The actual DPI is unknown, as it depends on various factors (among others, the physical dimensions of the photographed object and the distance the picture was taken from). The easiest way to establish the real DPI is to take a photo of a ruler and count the number of pixels along a 1-inch distance. As an example, there is approximately
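
The ruler method described above amounts to dividing a pixel distance by the physical distance it covers. A minimal sketch in Python, assuming you have already read off the pixel coordinates of two ruler marks from the photo (the function name `estimate_dpi` and the sample coordinates are hypothetical, not from the thread):

```python
import math


def estimate_dpi(mark_a, mark_b, inches=1.0):
    """Estimate DPI from two ruler marks given as (x, y) pixel coordinates.

    `inches` is the physical distance between the two marks on the ruler.
    """
    if inches <= 0:
        raise ValueError("inches must be positive")
    # Euclidean pixel distance, so a slightly tilted ruler still works.
    pixel_distance = math.hypot(mark_b[0] - mark_a[0], mark_b[1] - mark_a[1])
    return pixel_distance / inches


# Hypothetical reading: the 0-inch and 1-inch marks are 300 pixels apart.
print(estimate_dpi((100, 200), (400, 200)))
```

Measuring over several inches and passing the larger `inches` value should reduce the error introduced by picking the marks by eye.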

Re: [tesseract-ocr] Re: use multi threads in tesseract

2018-06-12 Thread ShreeDevi Kumar
Thank you for the info. The following link also has helpful info. https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.2/com.ibm.xlc131.aix.doc/compiler_ref/omp_thread_limit.html
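
The linked page documents the OpenMP `OMP_THREAD_LIMIT` environment variable. As a rough sketch, assuming the tesseract binary in use was built with OpenMP and that the variable is set before the process starts, a Python wrapper could pass it through the child environment (the helper names `thread_limited_env` and `run_with_thread_limit` are hypothetical):

```python
import os
import subprocess


def thread_limited_env(limit):
    """Return a copy of the current environment with OMP_THREAD_LIMIT set."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    env = dict(os.environ)
    env["OMP_THREAD_LIMIT"] = str(limit)
    return env


def run_with_thread_limit(cmd, limit):
    """Run `cmd` (a list of arguments) with the OpenMP thread limit applied."""
    return subprocess.run(
        cmd, env=thread_limited_env(limit), capture_output=True, text=True
    )


# Example call, assuming tesseract is on PATH and image.jpg exists:
# run_with_thread_limit(["tesseract", "image.jpg", "out"], 1)
```

Whether a limit helps depends on the workload; this only shows how the variable could be set per process rather than globally.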

[tesseract-ocr] Using tesseract to extract text from License/Voter ID/PAN Card

2018-06-12 Thread Vidur Malhotra
Has anybody developed a solution on the same?