Re: [tesseract-ocr] Spanish text better processed in eng than in spa

2017-08-29 Thread ShreeDevi Kumar
I have opened this as an issue at https://github.com/tesseract-ocr/tessdata/issues/77 You can provide additional feedback there. @theraysmith is doing the training at Google. The examples you provide will be helpful to him and improve future training. ShreeDevi

Re: [tesseract-ocr] Spanish text better processed in eng than in spa

2017-08-29 Thread valentin . depablo
The spa and Latin models in the best folders are more or less equivalent; there is no significant difference, and although there are several failures, they are quite reasonable. The ones that provide really bad output are the official ones that are automatically installed. Do you need help training the data? (is a

Re: [tesseract-ocr] Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.

2017-08-29 Thread ShreeDevi Kumar
Also see https://github.com/tesseract-ocr/tesseract/issues/221 On 29-Aug-2017 3:26 PM, "ShreeDevi Kumar" wrote: > Check where the osd.traineddata and eng.traineddata are installed. > Download other trained data to the same directory. > > On Linux, it is usually

Re: [tesseract-ocr] Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.

2017-08-29 Thread ShreeDevi Kumar
Check where the osd.traineddata and eng.traineddata are installed. Download other trained data to the same directory. On Linux, it is usually /usr/share/tessdata On 29-Aug-2017 1:58 PM, "vikram charan" wrote: > Hello, > I'm working on project which base on scan many kind of
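A minimal sketch of the check described above, assuming a Python interpreter is available. The helper names `candidate_dirs` and `find_traineddata` are hypothetical and not part of Tesseract; the script only lists `*.traineddata` files with the standard library. `/usr/local/share/tessdata` is an assumed extra location for source builds, and because the meaning of TESSDATA_PREFIX has differed between Tesseract versions, both the variable's value and its `tessdata` subdirectory are checked.

```python
import os
from pathlib import Path


def candidate_dirs(env=None):
    """Return directories worth checking for traineddata files (assumed list)."""
    env = os.environ if env is None else env
    # Common Linux locations; the second one is an assumption for source builds.
    dirs = ["/usr/share/tessdata", "/usr/local/share/tessdata"]
    prefix = env.get("TESSDATA_PREFIX")
    if prefix:
        # Check both the prefix itself and its "tessdata" subdirectory.
        dirs.insert(0, os.path.join(prefix, "tessdata"))
        dirs.insert(0, prefix)
    return dirs


def find_traineddata(candidates):
    """Map each existing directory to the sorted language codes found in it."""
    suffix = ".traineddata"
    found = {}
    for directory in candidates:
        path = Path(directory)
        if not path.is_dir():
            continue
        langs = sorted(p.name[: -len(suffix)] for p in path.glob("*" + suffix))
        if langs:
            found[str(path)] = langs
    return found


if __name__ == "__main__":
    for directory, langs in find_traineddata(candidate_dirs()).items():
        print(directory, "->", ", ".join(langs))
```

If `eng` and `osd` show up under one directory, that is the directory other downloaded traineddata files should go into.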

[tesseract-ocr] Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.

2017-08-29 Thread vikram charan
Hello, I'm working on a project based on scanning many kinds of documents (for example images that contain text, files, inquiry forms, documents, etc.). I'm using the Tesseract library to scan these documents. As mentioned on GitHub, I followed all the steps to set up Tesseract. I drag and drop tessdata folder

Re: [tesseract-ocr] tesseract is not working for straightforward image

2017-08-29 Thread ShreeDevi Kumar
Take a look at the "improve quality" page in the wiki. On 28-Aug-2017 6:16 PM, "Lada Tylich" wrote: > Hi, > I am confused that for the attached image it gives with parameter *-psm 7* result *88C.* It should detect such a picture, I guess. > Am I missing something? > >

Re: [tesseract-ocr] Tesseract OCR 4.0.0 Alpha how to train a new font

2017-08-29 Thread ShreeDevi Kumar
Try first with best/Latin.traineddata, which should handle text with diacritics. --- >>Pango suggested font Gandhari Unicode. Use "Gandhari Unicode" within quotes as the font name. >>ERROR: Could not find training text file /usr/local/share/tessdata//eng/eng.training_text give script_dir

[tesseract-ocr] Tesseract OCR 4.0.0 Alpha how to train a new font

2017-08-29 Thread Anand Akella
Hi, I'm new to Tesseract and have a PDF file with diacritical marks. I tried to run Tesseract 4.0.0 with language eng. I see that it is not able to recognize the text with diacritical marks. I found a font that can handle diacritical marks. Gandhari Unicode 5.1