[tesseract-ocr] Re: Own, Custom tessdata files for training

2016-06-13 Thread Quan Nguyen
Images that appear readable to human eyes may not be readable to computers, so image processing is most likely required before the OCR step. Sure, you can use jTessBoxEditor to train for your language. The generated .traineddata will be placed in a tessdata folder and you can use the *Validate

[tesseract-ocr] Re: OCR of screen images is poor

2016-06-13 Thread Dave Faliskie
Hi Sach, I am having a very similar problem. Did you have any luck getting a full screenshot to OCR at close to 100%? On Thursday, June 9, 2016 at 10:38:26 AM UTC-4, Sach wrote: > Expected the OCR of a screenshot to be 100%. Please see the attached PNG image. Most of the labels are not

[tesseract-ocr] Re: invalid floating point operation when calling TessBaseAPIAnalyseLayout

2016-06-13 Thread Tom Morris
On Monday, June 13, 2016 at 10:22:32 AM UTC-4, Matthias Schneider wrote: > I'm using the latest dev version, 3.05.00dev, and I used peirick/leptonica (https://github.com/peirick/leptonica) to build libtesseract.dll and liblept.dll with Visual Studio 2015. However, the resulting DLLs I'm using

Re: [tesseract-ocr] Re: Do we have Sanskrit training images and box files online?

2016-06-13 Thread ShreeDevi Kumar
If you look at the readme files in the different subdirectories starting with OCR under https://github.com/Shreeshrii/imagessan/tree/master you will see results for character-level and word-level accuracy. Depending on the font, character-level accuracy is around 80% and word-level accuracy is around 60%. I have

[tesseract-ocr] Re: Tesseract FONT for OCRA Standard

2016-06-13 Thread Pierre-Luc Pineault
I haven't trained this font and I haven't encountered the same problem as you. This might mean that you haven't put your trained data file in the correct directory. If you have installed Tesseract for Windows, you will have to put the file in that directory. Tesseract-Ocr uses some Environment

Re: [tesseract-ocr] Re: Do we have Sanskrit training images and box files online?

2016-06-13 Thread rohit saluja
Thanks again for replying. I will surely check them out. My experience is that OCR on Sanskrit data with hin.traineddata gives better results than with san.traineddata. I do not know why; is it due to cube mode or Devanagari preprocessing (segmentation, I guess) in Devanagari? I wonder why such