Re: [tesseract-ocr] does it make sense to train existing languages? how to fix repeatedly wrong letters?

2018-04-02 Thread ShreeDevi Kumar
My suggestion would be to post-process the OCR output. On Mon 2 Apr, 2018, 6:09 PM JP T wrote: > Hi > > I don't really have an understanding of the consequences of training. > > My problem: > I've got tons of pages with a special format. ("one place study"
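
A minimal sketch of what such post-processing could look like in Python, for the case described in the original post where the "oo" marker at the start of a line is misrecognized. The preview is cut off before it says what tesseract produces instead, so the misreadings listed here (`00`, `0o`, `o0`, `OO`) are purely hypothetical; the real list would have to come from looking at actual output.

```python
import re

# Hypothetical misreadings of the "oo" marker; replace these with the
# variants that actually show up in your OCR output.
MISREAD_OO = re.compile(r"^(?:00|0o|o0|OO)(?=\s)", re.MULTILINE)


def fix_line_start_marker(text: str) -> str:
    """Rewrite a misread marker to "oo", but only at the start of a line."""
    return MISREAD_OO.sub("oo", text)


if __name__ == "__main__":
    sample = "00 12.3.1850 Anna\nborn 1800 in town"
    print(fix_line_start_marker(sample))
```

Anchoring the pattern to the start of a line keeps legitimate digits elsewhere (years, house numbers) untouched, which matters for this kind of genealogical data.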

[tesseract-ocr] In the script data directory, is the script data for English Latin.traineddata?

2018-04-02 Thread notoriousterran
Hi, in the script data directory (tess_best/script), is the script data for English Latin.traineddata? Waiting for an answer. Thank you

Re: [tesseract-ocr] Extracting pristine rasterized text

2018-04-02 Thread ShreeDevi Kumar
Thank you for the detailed info. My suggestion is to try recognition with eng.traineddata from the tessdata_fast repository with --oem 1. On Tue 3 Apr, 2018, 3:13 AM Patrick Ramsey, wrote: > Answers below inline. And thank you very much for your help :) > > |PTR
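
For reference, a sketch of how that suggestion could be scripted from Python. It only builds the command line; the file names and the tessdata path are placeholders (assumptions), and it assumes `eng.traineddata` from tessdata_fast has been downloaded into that directory.

```python
import shlex


def build_tesseract_cmd(image, out_base, tessdata_dir, lang="eng", oem=1):
    """Build a tesseract 4 command line: input and output base first, options after."""
    return [
        "tesseract", image, out_base,
        "--tessdata-dir", tessdata_dir,  # directory holding eng.traineddata (placeholder path)
        "-l", lang,
        "--oem", str(oem),               # 1 = LSTM engine only
    ]


cmd = build_tesseract_cmd("page.png", "page", "/path/to/tessdata_fast")

if __name__ == "__main__":
    print(shlex.join(cmd))
```

With tesseract installed, the list can be passed to `subprocess.run(cmd, check=True)`.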

Re: [tesseract-ocr] Extracting pristine rasterized text

2018-04-02 Thread Patrick Ramsey
Answers below inline. And thank you very much for your help :) |PTR On Friday, March 30, 2018 at 2:00:18 AM UTC-7, shree wrote: > > Please check GitHub/issues for similar reports and suggestions. > > Also specify, > Which version/commit of tesseract 4 > commit hash:

Re: [tesseract-ocr] [4.0.0-beta.1] read_params_file: parameter not found: PNG

2018-04-02 Thread Zdenko Podobny
The aim is to have a tool that is easily portable with minimal dependencies. IMO it is standard on Linux/Unix-like systems to use the --help option for an explanation of usage. Zdenko 2018-04-02 14:38 GMT+02:00 JP T: > Well, the problem is error handling. > If tesseract would have

[tesseract-ocr] does it make sense to train existing languages? how to fix repeatedly wrong letters?

2018-04-02 Thread JP T
Hi, I don't really have an understanding of the consequences of training. My problem: I've got tons of pages with a special format ("one place study" about the historic inhabitants of a town). tesseract repeatedly fails on a few special words: oo (oh-oh) at start of line for "wedding" is often

Re: [tesseract-ocr] [4.0.0-beta.1] read_params_file: parameter not found: PNG

2018-04-02 Thread JP T
Well, the problem is error handling. If tesseract had given a meaningful error message... This is about basic parameter handling, nothing sophisticated. On Monday, 2 April 2018 09:02:02 UTC+2, zdenop wrote: > > ... and it was exactly the same in tesseract 3.0x as in 4.0

[tesseract-ocr] Where is /path/to/eng.user-words?

2018-04-02 Thread 이경준
Hi, I was referring to this page. I cannot find (lang).user-words. How can I find it? Tesseract config files consist of lines with variable-value pairs (space separated). The variables are documented as flags in the source code like the following one in tesseractclass.h:
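
To illustrate the "variable-value pairs (space separated)" format quoted above, here is a small Python sketch that reads such a config into a dict. The two sample variables, `load_system_dawg` and `user_words_suffix`, are chosen for illustration; skipping lines that start with `#` is an assumption about how comments are handled.

```python
def parse_tesseract_config(text):
    """Parse 'variable value' lines into a dict (sketch, not tesseract's own parser)."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # assumption: '#' lines are comments
            continue
        parts = line.split(None, 1)           # variable name, then the rest as the value
        params[parts[0]] = parts[1] if len(parts) > 1 else ""
    return params


sample = "load_system_dawg F\nuser_words_suffix user-words\n"

if __name__ == "__main__":
    print(parse_tesseract_config(sample))
```

As far as I understand it, a `user_words_suffix user-words` entry makes tesseract look for a word list named (lang).user-words in the tessdata directory, so that file is something you create yourself; treat that as an assumption to verify against your version.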

[tesseract-ocr] Re: When tesseract(3.04) makes a box, is there a way to control it if it is made more than the number of letters?

2018-04-02 Thread notoriousterran
Correction: "The original image contains eight characters, but tesseract(3.04) has nine boxes." should read "The original image contains eight characters, but tesseract(3.04) makes nine boxes." ($ tesseract (lang).(fontname).exp(num).tif tesseract (lang).(fontname).exp(num) -l lang batch.nochop makebox) On 2 April 2018

[tesseract-ocr] When tesseract(3.04) makes a box, is there a way to control it if it is made more than the number of letters?

2018-04-02 Thread notoriousterran
Hi When tesseract(3.04) makes a box, is there a way to control it if it is made more than the number of letters? The original image contains eight characters, but tesseract(3.04) has nine boxes. So I only put 8 boxes of file information into the box file, but A showed 9 characters in the

Re: [tesseract-ocr] [4.0.0-beta.1] read_params_file: parameter not found: PNG

2018-04-02 Thread Zdenko Podobny
... and it was exactly the same in tesseract 3.0x as in 4.0 Zdenko 2018-04-02 0:14 GMT+02:00 JP T: > Solved: > must be *tesseract infile outfile options* instead of standard unix *program options infile outfile*. > On Sun 1 Apr, 2018, 7:25 PM JP T,