[tesseract-ocr] How to read French text using tesseract ?

2016-03-09 Thread Pixxe
Hi all, I would like to use tesseract to extract French text, and I hope it is possible to do it with the existing tesseract and the available French dictionary. *For English:* tesseract::TessBaseAPI *tess = new tesseract::TessBaseAPI(); if (tess->Init(NULL, "eng")) { fprintf(stderr, "Could
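
For what it is worth, tesseract selects its language by a three-letter code, and French is `fra` (it needs `fra.traineddata` in the tessdata directory). In the C++ fragment above, the equivalent change would presumably be passing "fra" instead of "eng" to Init. Below is a minimal sketch of the same idea through the command-line tool. The helper name `build_ocr_command` and the file names are hypothetical, chosen only for illustration; the `-l` option itself is the tool's language switch.

```python
import subprocess


def build_ocr_command(image_path, output_base, languages=("fra",)):
    """Build the argument list for the tesseract CLI.

    `languages` holds traineddata codes; several codes are joined
    with '+', e.g. 'fra+eng'. This helper is a hypothetical wrapper,
    not part of tesseract itself.
    """
    lang_arg = "+".join(languages)
    return ["tesseract", image_path, output_base, "-l", lang_arg]


def run_ocr(image_path, output_base, languages=("fra",)):
    # Runs the command; requires tesseract and the matching
    # traineddata files to be installed. Text goes to <output_base>.txt.
    subprocess.run(build_ocr_command(image_path, output_base, languages), check=True)


if __name__ == "__main__":
    # Only prints the command, so it works without tesseract installed.
    print(" ".join(build_ocr_command("page.png", "page_out")))
```

Calling `run_ocr("page.png", "page_out")` would write the recognized text to `page_out.txt`, assuming the French traineddata is installed.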

Re: [tesseract-ocr] how to compile tesseract on msys2/mingw?

2016-03-09 Thread Sriranga(83yrsold)
Shree, I have installed tesseract-ocr dev version 3.5.0 on Ubuntu 15.10. I presume that the 3.5.00 dev version is newer than 3.04.01 and that no changes have been made to the existing traineddata files generated by Ray. With blessings, sriranga(83+) On Thu, Mar 10, 2016 at 9:43 AM, ShreeDevi

Re: [tesseract-ocr] how to compile tesseract on msys2/mingw?

2016-03-09 Thread ShreeDevi Kumar
Simon, You can try the attached pkgbuild for building the latest source from git. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sun, Mar 6, 2016 at 12:19 AM, Simon Eigeldinger wrote: >

Re: [tesseract-ocr] Dropped characters from perfect image

2016-03-09 Thread 'John Taves' via tesseract-ocr
I am using the C# API with whatever page segmentation mode it defaults to. Which tess variable[1] should I play with? [1] http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version jt On Wednesday, March 9, 2016 at 8:44:02 AM UTC-8, zdenop wrote: > > What page segmentation method[1] you used?

Re: [tesseract-ocr] Dropped characters from perfect image

2016-03-09 Thread zdenko podobny
What page segmentation method[1] did you use? [1] https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#page-segmentation-method Zdenko On Wed, Mar 9, 2016 at 5:14 PM, 'John Taves' via tesseract-ocr < tesseract-ocr@googlegroups.com> wrote: > I am trying to recognize a flawless image. I
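
As a rough illustration of what choosing a page segmentation method looks like from the command line: the mode is a small integer, for example 3 for fully automatic segmentation (the default), 6 for a single uniform block of text, and 7 for a single text line. The sketch below assumes the `-psm` spelling used by the 3.x releases current in this thread; later releases spell it `--psm`. The helper name `build_psm_command` and its `legacy_flag` parameter are hypothetical.

```python
def build_psm_command(image_path, output_base, psm, legacy_flag=True):
    """Build a tesseract CLI argument list with an explicit
    page segmentation mode.

    legacy_flag=True uses '-psm' (3.x releases); False uses '--psm'
    (later releases). This helper is a hypothetical wrapper.
    """
    flag = "-psm" if legacy_flag else "--psm"
    return ["tesseract", image_path, output_base, flag, str(psm)]


if __name__ == "__main__":
    # Treat the image as a single uniform block of text (mode 6).
    print(" ".join(build_psm_command("clean.png", "clean_out", 6)))
```

Trying a few modes on the same image is a cheap way to check whether dropped characters come from layout analysis rather than from character recognition.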

[tesseract-ocr] Dropped characters from perfect image

2016-03-09 Thread 'John Taves' via tesseract-ocr
I am trying to recognize a flawless image. I created the image from a PDF that is all vector, not image. It has no noise, no skew, and flawless characters at any DPI that I want. The recognition from Tesseract sucks. Generally the problem is dropped characters. It seems to randomly ignore

[tesseract-ocr] Re: Training CMC7 Font

2016-03-09 Thread Roger
Yes, that's what I'm doing. After I reduced the image size and increased the image contrast and brightness, tesseract was able to recognize about 5 characters. But it is still hard to recognize the whole string. Does anyone have another approach I could try? Thank you. On Friday, March 4, 2016 at

[tesseract-ocr] Re: Training Tesseract: unicharset extractor producing "Bad properties"

2016-03-09 Thread Meltem Çetiner
Hi, I'm trying to train as well and I have the same problem. I got this result: "P 5 0,255,0,255,0,32767,0,32767,0,32767 NULL 54 0 0 # # P [50 ]A A 5 0,255,0,255,0,32767,0,32767,0,32767 NULL 38 0 0 # # A [41 ]A S 5 0,255,0,255,0,32767,0,32767,0,32767 NULL 53 0 0 # # S [53 ]A" I have the

[tesseract-ocr] Re: OCR Recognition for Underlined text

2016-03-09 Thread Gunasekaran Velu
Hi Tom, any update regarding the underlined text problem? Regards, Guna On Monday, March 7, 2016 at 6:08:03 AM UTC+5:30, Gunasekaran Velu wrote: > > Hi > > I just created an image of my own in Paint and sent it to you. > > Now I have attached the real document (cropped from the full image due to > confidential