Hi,

Glad you've made some progress with your goal.

As for parameters that influence the speed vs. accuracy trade-off, there
are many. To name just a few:

classify_class_pruner_threshold
classify_class_pruner_multiplier
classify_cp_cutoff_strength
classify_integer_matcher_multiplier

These relate to the pruner and matcher (low-level glyph matching).
There are also many parameters for segmentation, font detection, etc.,
and they too can sometimes make a big difference in speed. However, I
doubt you can find anything about this in the Wiki...
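
One way to experiment with the parameters listed above is to generate a config file per trial and time each run. The following is only a minimal sketch: it assumes the plain "name value" line format used by the config lines quoted later in this thread, the helper name make_config is hypothetical, and the numbers are placeholders chosen for illustration, not defaults or recommended settings.

```python
def make_config(params):
    """Render parameters as 'name value' lines, one parameter per line."""
    return "".join(f"{name} {value}\n" for name, value in params.items())


# Placeholder values for illustration only; they are NOT Tesseract defaults
# and NOT tuning recommendations. Replace them with the values under test.
trial = {
    "classify_class_pruner_threshold": 200,
    "classify_class_pruner_multiplier": 20,
    "classify_cp_cutoff_strength": 5,
    "classify_integer_matcher_multiplier": 12,
}

config_text = make_config(trial)
print(config_text, end="")
```

Saving the printed text to a file and passing that file to Tesseract as a config file would let you compare run time and output between trials, changing one parameter at a time.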

Warm regards,
Dmitri Silaev






On Fri, Mar 25, 2011 at 3:24 PM, Adetokunbo Bamidele <[email protected]> wrote:

Hi,

Now that you understand what my final goal is, :-)
This is probably the easiest way forward for now.

I will look into using .box files to generate the coords, and
wingrep to produce the XML output format I need.
Thanks for your advice.

So far the transition looks favourable for such basic use of functions
from leptonica library but we'll see.

If I want to look into tuning Tesseract for both recognition accuracy
and speed, can you point me to a useful article on the wiki, if you're
familiar with one?

Best regards

Ade.

From: Dmitri Silaev
Sent: 25 March 2011 10:22
To: Adetokunbo Bamidele
Subject: Re: tesseract api, how do you get the bbox co-ordinates in
commandline using the exe in win32

Well, now that I know what you need, I can suggest that you use .box
file generation. Not only can it be used as a step in the Tesseract
training procedure, it also serves as a simple means of obtaining the
BB coordinates of recognized blobs. IMO it's the easiest way for your
needs. Refer to
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Make_Box_Files
and the surrounding sections for more details.

If you need to get output in a specific format you can use e.g.
wingrep or the like.
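
As an alternative to a grep-style tool, a short script can turn .box lines into [x, y, width, height] rectangles, a count, and a simple XML dump. This is a sketch under stated assumptions: each .box line is taken to be "glyph left bottom right top" with an optional trailing page number, with coordinates measured from the bottom-left corner of the image; the function names, the XML element names and the sample lines are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_box_lines(lines, image_height):
    """Convert .box lines to (glyph, x, y, width, height), top-left origin."""
    rects = []
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        glyph = parts[0]
        left, bottom, right, top = (int(v) for v in parts[1:5])
        # Assumed .box convention: y grows upward from the bottom of the
        # image, so flip it to get a rectangle anchored at the top-left.
        rects.append((glyph, left, image_height - top,
                      right - left, top - bottom))
    return rects


def rects_to_xml(rects):
    """Serialize rectangles to a small XML string (hypothetical schema)."""
    root = ET.Element("rects", count=str(len(rects)))
    for glyph, x, y, w, h in rects:
        ET.SubElement(root, "rect", glyph=glyph, x=str(x), y=str(y),
                      width=str(w), height=str(h))
    return ET.tostring(root, encoding="unicode")


# Made-up sample lines standing in for the contents of a real .box file.
sample = ["T 10 70 30 95 0", "e 32 70 45 88 0"]
rects = parse_box_lines(sample, image_height=100)
print(len(rects), rects)
print(rects_to_xml(rects))
```

For a real file, read its lines with open(path, encoding="utf-8") and pass the height of the source image in pixels.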

Not so long ago Tesseract didn't use Leptonica at all. For the moment,
Tesseract mainly uses Leptonica's basic functions, so I wouldn't say
"Tesseract uses Leptonica for this purpose"; I'd say "Tesseract is at
the beginning of its transition to Leptonica".

HTH

Warm regards,
Dmitri Silaev





On Fri, Mar 25, 2011 at 12:15 PM, Adetokunbo Bamidele
<[email protected]> wrote:
> Pls can you expand on each of your proposals.
>
> The end goal, which you might find interesting, is to benchmark the text
> localisation and segmentation capability against a dataset. The aim is to
> fully understand how good the detector in tesseract is.
>
> I understand that tesseract uses leptonica for this purpose.
>
> So I am looking for the shortest path to my goal.
>
> From: Dmitri Silaev
> Sent: 25 March 2011 06:19
> To: [email protected]
> Cc: BYTEFX
> Subject: Re: tesseract api, how do you get the bbox co-ordinates in
> commandline using the exe in win32
>
> Well, I still don't get your final goal; it would be easier to
> suggest something if I knew what you are trying to achieve.
> However, if you decide to dive into programming, a better way of
> getting rects is to use the ResultIterator/PageIterator
> interface.
>
> Also, you can benefit from knowing that generating .box files also
> provides you with rect coords... You can count lines,
> calculate widths and heights... There are a number of text file
> processing utilities... Well, you probably know what to do.
>
> Btw, examining the control paths used to generate .box files is a good
> starting point for devising your own blob rect dumper with minimal effort.
>
> Warm regards,
> Dmitri Silaev
>
>
>
>
>
> On Thu, Mar 24, 2011 at 6:48 PM, BYTEFX <[email protected]> wrote:
>> hi thanks for the pointer. i have set this up but this does not answer
>> my questions.
>>
>> let me explain:
>>
>> for a sample image sent to tesseract for processing, "GetRegions"
>> function can be called from the api.
>> i can figure this out and print out the results from within tesseract
>> but i imagined this must have been done elsewhere in the forum !
>>
>> On Mar 24, 3:11 pm, Dmitri Silaev <[email protected]> wrote:
>>> I'm not sure if it's exactly what you want, but at first you can try
>>> to create a config file with the following line inside:
>>>
>>> textord_oldbl_debug             T
>>>
>>> Since the output can be quite long and might not fit into the console
>>> window, you can also specify:
>>>
>>> debug_file tesseract.log
>>>
>>> I suspect any other debug info is only accessible from the ScrollView
>>> facility. You can read the Wiki and search this forum for "ScrollView"
>>> to find out more. If you are on Windows, you can use my article at
>>> http://rdaemons.blogspot.com/2011/02/tesseract-ocr-setting-up-interac...
>>> to make your ScrollView installation process quicker.
>>>
>>> Warm regards,
>>> Dmitri Silaev
>>>
>>>
>>>
>>> On Thu, Mar 24, 2011 at 2:41 PM, BYTEFX <[email protected]> wrote:
>>> > Hi,
>>>
>>> > i am interested in understanding the internal detector of tesseract
>>> > 3.00 in win32.
>>>
>>> > How do i go about printing out to winconsole the detected Rects from
>>> > tesseract.
>>> > is there a commandline arg for this, or where in the code (baseapi)
>>> > can i start from to get this information.
>>>
>>> > Basically need the output format, [x,y,width & height], and count of
>>> > Rect's identified from tesseract.
>>>
>>> > best
>>>
>>> > bytefx.
>>>
>>> > --
>>> > You received this message because you are subscribed to the Google Groups 
>>> > "tesseract-ocr" group.
>>> > To post to this group, send email to [email protected].
>>> > To unsubscribe from this group, send email to 
>>> > [email protected].
>>> > For more options, visit this group at
>>> > http://groups.google.com/group/tesseract-ocr?hl=en.
>>
>
