[tesseract-ocr] Fine Tuning all Fonts List

2017-06-19 Thread Ibr
Hi, engtrain and engeval use almost the same command, but for eval you specify the font with the --fontlist argument, while for train you define the fonts in language-specific.sh. I ran both commands and noticed that they produce the same result files, except in engtrain case ther

Re: [tesseract-ocr] unicharset_extractor extracting zero values

2017-06-19 Thread ShreeDevi Kumar
Do you have the common and Latin unicharset files in your langdata directory? See https://github.com/tesseract-ocr/langdata. Try to build the latest 3.05.01 version. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Mon, Jun 19, 2

[tesseract-ocr] Recognite source code

2017-06-19 Thread 'Phillipp Ohlandt' via tesseract-ocr
Hello, does anyone have experience with running Tesseract on an image of source code? I get really poor results. I already opened an issue on GitHub, where you can see the source image and the result: https://github.com/tesseract-ocr/tesseract/issues/997

Re: [tesseract-ocr] unicharset_extractor extracting zero values

2017-06-19 Thread David Barishev
Thanks for the reply. If you mean whether I have the latin and common unicharset files in the tessdata directory (/usr/share/tesseract-ocr/tessdata), I have downloaded them and placed them in that directory and I am still getting the same behavior. I have also tried doing it from my Windows machine, which has 3.

Re: [tesseract-ocr] unicharset_extractor extracting zero values

2017-06-19 Thread ShreeDevi Kumar
Where do you have your source files for English langdata? If they are in a directory such as ../langdata/eng/, then put common.unicharset, latin.unicharset, font_properties, etc. in ../langdata. ShreeDevi भजन - कीर्तन - आरती @ http://
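
A quick way to check the layout described above is to list which of the shared files are missing from the parent langdata directory. The following is only a minimal sketch: the helper name missing_langdata_files is hypothetical, the three file names are taken from the message, and the comparison is case-insensitive on the assumption that a downloaded copy may be capitalized differently (for example Latin.unicharset).

```python
from pathlib import Path

# File names as given in the message; treated here as assumptions about the layout.
REQUIRED = ("common.unicharset", "latin.unicharset", "font_properties")


def missing_langdata_files(langdata_dir, required=REQUIRED):
    """Return the required names not found directly inside langdata_dir.

    Names are compared case-insensitively (an assumption, see the lead-in).
    """
    root = Path(langdata_dir)
    if not root.is_dir():
        # Nothing can be present if the directory itself does not exist.
        return list(required)
    present = {p.name.lower() for p in root.iterdir() if p.is_file()}
    return [name for name in required if name.lower() not in present]


if __name__ == "__main__":
    # "../langdata" is the example path from the message.
    print(missing_langdata_files("../langdata"))
```

An empty list means all three files sit next to the eng/ subdirectory, which is the arrangement the message suggests.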

Re: [tesseract-ocr] unicharset_extractor extracting zero values

2017-06-19 Thread ShreeDevi Kumar
You could also try running training on your Windows PC with 3.05.01 using tesstrain.sh and Git for Windows, which will provide you a shell for running bash scripts. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On M

Re: [tesseract-ocr] unicharset_extractor extracting zero values

2017-06-19 Thread ShreeDevi Kumar
I would also suggest that you add spaces between words in your input text. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Mon, Jun 19, 2017 at 9:19 PM, ShreeDevi Kumar wrote: > You could also try running training on

Re: [tesseract-ocr] unicharset_extractor extracting zero values

2017-06-19 Thread David Barishev
Hey, I am now trying to build Tesseract from source. After building Leptonica, I couldn't build Tesseract; it fails with this error: /bin/bash ../libtool --tag=CXX --mode=link g++ -g -O2 -std=c++11 -o tesseract tesseract-tesseractmain.o libtesseract.la -lrt -lpthread libtool: link: g++ -g -O2 -

[tesseract-ocr] error building 3.05.01

2017-06-19 Thread ShreeDevi Kumar
Sorry, I haven't built 3.05.01. Hope others can help. On Tue, Jun 20, 2017 at 2:32 AM, David Barishev wrote: > hey, i try to build tesseract from source now, and after i have > built Leptonica, i couldn't build tesseract with this error : > > /bin/bash ../libtool --tag=CXX --mode=link g++

[tesseract-ocr] Re: unicharset_extractor extracting zero values

2017-06-19 Thread shree
See https://github.com/tesseract-ocr/tesseract/issues/318 regarding the unicharset format. I was able to do regular Tesseract training (not LSTM) using Tesseract 4.00.00 from GitHub master and create a new unicharset and traineddata with your box/tiff pair. The output on the same tiff file

[tesseract-ocr] How to improve the recognition of receipt (text not in words dictionary)

2017-06-19 Thread Laura
Hi, I’m new to Tesseract. I’m trying to recognize receipts. Since much of the text on receipts is not made up of dictionary words, I disabled the dictionaries. That increased the recognition rate, but it is still low. I’d like to create my own dictionary from the product catalog. Is there someone who can

[tesseract-ocr] Tesseract 4.00.00alpha Windows doesn't find image files

2017-06-19 Thread J. Karjalainen
Hi! I need your help please! I'm trying to run Tesseract 4.00.00alpha (used the installer) on Win7 SP1 32-bit and it doesn't find any files, even when they are in the same folder as tesseract.exe. I always get: Error in fopenReadStream: file not found C:\Program Files\Tesseract-OCR>tesser
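
One way to rule out a working-directory or file-name mix-up when chasing a "file not found" error like the one above is to resolve the image to an absolute path and confirm that it exists before calling tesseract. This is only a sketch under stated assumptions: the helper name build_tesseract_command and the example file names are hypothetical, and the cause of the error in this report is not known from the message. The command form used is the basic "tesseract imagename outputbase" invocation.

```python
import os


def build_tesseract_command(image, output_base, exe="tesseract"):
    """Build an argument list for `tesseract imagename outputbase`.

    The image path is made absolute and checked first, so a wrong name or
    working directory is reported here rather than by the OCR engine.
    """
    image_path = os.path.abspath(image)
    if not os.path.isfile(image_path):
        raise FileNotFoundError("image not found: " + image_path)
    return [exe, image_path, output_base]


if __name__ == "__main__":
    # "scan.png" and "result" are placeholder names, not from the message.
    try:
        print(build_tesseract_command("scan.png", "result"))
    except FileNotFoundError as err:
        print(err)
```

The returned list can be passed to subprocess.run from the standard library; building it separately keeps the path check usable even on a machine where tesseract is not installed.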