inconsistent results from tesseract when the same TessBaseAPI object is used for decoding multiple images

2012-11-15 Thread newtotesseract
Hi friends, I am using a static TessBaseAPI object in my application. This object is initialized at application startup, when it reads and processes the training data. The application then processes multiple scanned images through the TESS_API TessBaseAPI::ProcessPages() function, using

Re: Confidence in HOCR file

2012-11-15 Thread zdenko podobny
On Thu, Nov 15, 2012 at 10:15 AM, José Luis Rey jluis...@gmail.com wrote: Thanks very much for your responses, zdenop. I'm not used to developing in open source projects like this; perhaps you can help me understand. For example, if I implement a feature to add character rectconfidence to the

Re: Confidence in HOCR file

2012-11-15 Thread José Luis Rey
Thanks very much for your responses, zdenop. I'm not used to developing in open source projects like this; perhaps you can help me understand. For example, if I implement a feature to add character rectconfidence to the hocr output, how is this translated to the main project (if it is good enough

ocr of image fails

2012-11-15 Thread sascha4j
Hi, I am trying to OCR some scanned text with tesseract-ocr. For some images the result is quite good, but for this one (see attached file) the result is poor. Any hints why, and what I could do to get a better result? I use tesseract 3.02 with the German language. Greetings, sascha4j

Re: ocr of image fails

2012-11-15 Thread sascha4j
After converting the image with ImageMagick the result is better. Not 100%, but nearly. The options for ImageMagick were: convert -colorspace gray -resize 200% -unsharp 0x8+1.5+0.05 On Thursday, 15 November 2012 10:26:21 UTC+1, sascha4j wrote: Hi, I am trying to OCR some scanned text
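
The ImageMagick options quoted in this message could be scripted roughly as follows. This is only a sketch: the helper names and the file names `scan.png` and `scan-prepared.png` are hypothetical, and placing the input file before the options is an assumption, since the message lists the options without file arguments.

```python
import shutil
import subprocess


def build_convert_command(src, dst):
    """Return the ImageMagick argument list for the preprocessing step.

    The three options are the ones quoted in the message; the position
    of the input and output file names is an assumption.
    """
    return [
        "convert",
        str(src),
        "-colorspace", "gray",       # drop colour information
        "-resize", "200%",           # double the size so the text is larger
        "-unsharp", "0x8+1.5+0.05",  # sharpen the enlarged image
        str(dst),
    ]


def preprocess(src, dst):
    """Run the conversion; return False if no `convert` binary is on PATH."""
    if shutil.which("convert") is None:
        return False
    subprocess.run(build_convert_command(src, dst), check=True)
    return True


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed.
    print(" ".join(build_convert_command("scan.png", "scan-prepared.png")))
```

The prepared image would then be passed to tesseract in place of the original scan.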

Re: ocr of image fails

2012-11-15 Thread Sven Pedersen
Yes, I think the text size (x-height) was too small. Also, the English language data may be trained with more fonts, given that Google created it. --Sven On Thu, Nov 15, 2012 at 6:43 AM, sascha4j sascha.j...@gmx.net wrote: After converting the image with ImageMagick the result is better. Not

Re: Having traindata files uncombined

2012-11-15 Thread Zdenko Podobný
Can you please use version 3.02 instead of 3.01 and post the exact error message? It is possible to copy text from the Windows console: select the relevant text/lines while holding the left mouse button, then right-click outside the selected text but inside the console window - the highlight will

Re: Can I configure Tesseract to *always* match a dictionary word?

2012-11-15 Thread Zdenko Podobný
Regarding user_patterns_suffix, have a look at the tesseract manual page [1]. I am not sure whether it is possible to force tesseract to choose OCR output from the dictionary (I never tried it ;-) ). But you can increase dictionary strength with the variables language_model_penalty_non_freq_dict_word and

Re: Word Search Using Tessnet

2012-11-15 Thread zdenko podobny
On Fri, Nov 9, 2012 at 1:43 PM, Troy Frazier troypow...@gmail.com wrote: Is it possible to search an image for a particular word using the Tessnet wrapper? I see that it is possible to limit your scan to certain characters, but what I would like to do is to input a word and have all

Re: Word Search Using Tessnet

2012-11-15 Thread Sven Pedersen
There is a newer wrapper for the 3.x versions: http://code.google.com/p/tesseractdotnet/w/list I think it was made by the developer of VietOCR. --Sven On Thu, Nov 15, 2012 at 5:06 PM, zdenko podobny zde...@gmail.com wrote: On Fri, Nov 9, 2012 at 1:43 PM, Troy Frazier troypow...@gmail.com wrote:

How to build the tesseract 3.02.02 project in Eclipse at Ubuntu?

2012-11-15 Thread Linda Li
I want to build the tesseract 3.02.02 project so that I can modify some code to tune it for a specific task. Version: tesseract 3.02.02, Ubuntu 12.04, Eclipse Juno. I put tesseract into the Eclipse project. Include directories: /usr/local/include /usr/local /usr/include/leptonica and all

Problem with ViewerDebugging with tesseract 3.02.02

2012-11-15 Thread Linda Li
Version: tesseract 3.02.02, Ubuntu 12.04, Eclipse Juno. I am trying to use ViewerDebugging. Following the instructions in http://code.google.com/p/tesseract-ocr/wiki/ViewerDebugging I installed javac, downloaded piccolo-1.2.jar and piccolox-1.2.jar, and ran make ScrollView.jar. Then I used export to set the

Re: inconsistent results from tesseract when the same TessBaseAPI object is used for decoding multiple images

2012-11-15 Thread newtotesseract
Hi Dmitri, How do we clear the adaptive classifier? Could you please tell me which API or function clears the adaptive classifier? Best Regards, - ganesh On Friday, November 16, 2012 3:39:22 AM UTC+8, Dmitri Silaev wrote: Sriranga, Everything you can specify on the command line can be seen