Hi,

Glad you've made some progress toward your goal.
As for parameters that can influence speed vs. accuracy, there are many. Just to name a few:

classify_class_pruner_threshold
classify_class_pruner_multiplier
classify_cp_cutoff_strength
classify_integer_matcher_multiplier

These relate to the pruner and matcher (low-level glyph matching). There are also many parameters for segmentation, font detection, etc., and they sometimes make a big difference in favor of speed. However, I doubt you can find anything about them in the Wiki...

Warm regards,
Dmitri Silaev

On Fri, Mar 25, 2011 at 3:24 PM, Adetokunbo Bamidele <[email protected]> wrote:

Hi,

Now that you understand what my final goal is :-), this is probably the easiest way forward for now. I will look into using .box files to generate the coordinates and wingrep to produce the XML output format I need. Thanks for your advice.

So far the transition looks favourable for such basic use of functions from the Leptonica library, but we'll see.

If I want to look into tuning Tesseract for both recognition accuracy and speed, can you point me to a useful article on the wiki, if you're familiar with one?

Best regards
Ade.

From: Dmitri Silaev
Sent: 25 March 2011 10:22
To: Adetokunbo Bamidele
Subject: Re: tesseract api, how do you get the bbox co-ordinates in commandline using the exe in win32

Well, now that I know what you need, I can suggest that you use .box file generation. Not only can it be used as a step in the Tesseract training procedure, but it can also serve as a simple means to obtain the bounding-box (BB) coordinates of recognized blobs. IMO, for your needs it's the easiest way. Refer to http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Make_Box_Files and the surrounding sections for more details. If you need the output in a specific format, you can use e.g. wingrep or the like.

Not so long ago Tesseract didn't use Leptonica at all.
For the moment, Tesseract mainly uses Leptonica's basic functions, so I wouldn't say "Tesseract uses Leptonica for this purpose"; I'd say "Tesseract is at the beginning of its transition to Leptonica".

HTH

Warm regards,
Dmitri Silaev

On Fri, Mar 25, 2011 at 12:15 PM, Adetokunbo Bamidele <[email protected]> wrote:
> Please can you expand on each of your proposals?
>
> The end goal, which you might find interesting, is to benchmark the text
> localisation and segmentation capability against a dataset. The aim is to
> fully understand how good the detector in Tesseract is.
>
> I understand that Tesseract uses Leptonica for this purpose.
>
> So I am looking for the shortest path to my goal.
>
> From: Dmitri Silaev
> Sent: 25 March 2011 06:19
> To: [email protected]
> Cc: BYTEFX
> Subject: Re: tesseract api, how do you get the bbox co-ordinates in
> commandline using the exe in win32
>
> Well, I still don't get your final goal; it might be easier to
> suggest something if I knew what you are trying to achieve.
> However, if you decide to dive into programming, a better way of
> getting rects is to use the ResultIterator/PageIterator
> interface.
>
> Also, it may help to know that generating .box files also
> gives you rect coords... You can count lines,
> calculate widths and heights... There are a number of text file
> processing utilities... Well, you probably know what to do.
>
> Btw, examining the control paths used to generate .box files is a good
> starting point for writing your own blob rect dumper with minimal effort.
>
> Warm regards,
> Dmitri Silaev
>
> On Thu, Mar 24, 2011 at 6:48 PM, BYTEFX <[email protected]> wrote:
>> Hi, thanks for the pointer. I have set this up, but it does not answer
>> my questions.
>>
>> Let me explain:
>>
>> For a sample image sent to Tesseract for processing, the "GetRegions"
>> function can be called from the API.
>> I can figure this out and print out the results from within Tesseract,
>> but I imagined this must have been done elsewhere in the forum!
>>
>> On Mar 24, 3:11 pm, Dmitri Silaev <[email protected]> wrote:
>>> I'm not sure if it's exactly what you want, but first you can try
>>> creating a config file with the following line inside:
>>>
>>> textord_oldbl_debug T
>>>
>>> Since the output can be quite long and might not fit into the console
>>> window, you can also specify:
>>>
>>> debug_file tesseract.log
>>>
>>> I suspect any other debug info is only accessible from the ScrollView
>>> facility. You can read the Wiki and search this forum for "ScrollView"
>>> to find out more. If you are on Windows, you can use my article
>>> at http://rdaemons.blogspot.com/2011/02/tesseract-ocr-setting-up-interac...
>>> to make your ScrollView installation process quicker.
>>>
>>> Warm regards,
>>> Dmitri Silaev
>>>
>>> On Thu, Mar 24, 2011 at 2:41 PM, BYTEFX <[email protected]> wrote:
>>> > Hi,
>>> >
>>> > I am interested in understanding the internal detector of Tesseract
>>> > 3.00 in win32.
>>> >
>>> > How do I go about printing the detected Rects from Tesseract to the
>>> > Windows console? Is there a command-line argument for this, or where
>>> > in the code (baseapi) can I start to get this information?
>>> >
>>> > Basically I need the output format [x, y, width, height] and the count
>>> > of Rects identified by Tesseract.
>>> >
>>> > best
>>> >
>>> > bytefx.
>>> >
>>> > --
>>> > You received this message because you are subscribed to the Google Groups
>>> > "tesseract-ocr" group.
>>> > To post to this group, send email to [email protected].
>>> > To unsubscribe from this group, send email to
>>> > [email protected].
>>> > For more options, visit this group
>>> > at http://groups.google.com/group/tesseract-ocr?hl=en.
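As an alternative to wingrep, the .box output suggested in this thread can be post-processed with a short script that yields the [x, y, width, height] rects and count BYTEFX asked for, plus an XML file like the one Ade mentions. The following is a minimal Python sketch, assuming the Tesseract 3.0x box format of one glyph per line as `<char> <left> <bottom> <right> <top> [<page>]`, with coordinates measured from the bottom-left corner of the image. The `Rect` class, the function names, and the `<boxes>`/`<box>` XML layout are hypothetical choices for illustration, not anything Tesseract defines.

```python
"""Convert Tesseract .box lines into [x, y, width, height] rects and XML.

Assumes the Tesseract 3.0x box format: one glyph per line as
    <char> <left> <bottom> <right> <top> [<page>]
with coordinates measured from the bottom-left corner of the image.
The Rect class, function names and XML layout are hypothetical.
"""
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Rect:
    char: str
    x: int       # left edge, top-left origin
    y: int       # top edge, top-left origin
    width: int
    height: int
    page: int


def parse_box_lines(lines, image_height):
    """Turn box-file lines into Rects, flipping to a top-left origin."""
    rects = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        left, bottom, right, top = (int(v) for v in fields[1:5])
        page = int(fields[5]) if len(fields) > 5 else 0  # page column is optional here
        rects.append(Rect(
            char=fields[0],
            x=left,
            y=image_height - top,  # box files measure y from the bottom edge
            width=right - left,
            height=top - bottom,
            page=page,
        ))
    return rects


def rects_to_xml(rects):
    """Serialize rects into a simple, hypothetical XML layout."""
    root = ET.Element("boxes", count=str(len(rects)))
    for r in rects:
        ET.SubElement(root, "box", char=r.char, page=str(r.page),
                      x=str(r.x), y=str(r.y),
                      width=str(r.width), height=str(r.height))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = ["H 10 80 30 110 0", "i 32 80 38 112 0"]
    rects = parse_box_lines(sample, image_height=200)
    print(len(rects))
    for r in rects:
        print([r.x, r.y, r.width, r.height])
    print(rects_to_xml(rects))
```

To use it on a real file, read the lines with `open(path, encoding="utf-8")` and pass the image height in pixels, since the box file itself does not record it. Keeping the parsing separate from the XML step makes it easy to swap in whatever output format the benchmark needs.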

