Well, WinGrep is not a must-have. I only mentioned it as one
example. After all, it's shareware ))

You just need a program capable of processing text files and doing
some basic operations on the words or numbers within a text line.
There are plenty of such programs on the Internet; you'll probably be
able to find one that is free and easy to use.
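
As a rough illustration of the kind of text processing meant here, below
is a minimal Python sketch (not part of Tesseract) that merges the
per-character lines of a .box file into one rectangle for the whole word.
It assumes each line has the form "char left bottom right top page", with
the coordinate origin at the bottom-left corner of the image. The function
names and the image height of 400 used at the end are hypothetical and
chosen only for this example.

```python
from typing import Iterable, List, Tuple

# One parsed box line: (char, left, bottom, right, top).
Box = Tuple[str, int, int, int, int]

# Character boxes quoted from the .box file in this thread.
SAMPLE = """\
P 28 297 102 382 0
r 101 298 148 357 0
i 150 298 178 382 0
n 184 299 244 358 0
t 245 297 285 370 0
M 288 298 383 379 0
e 387 296 446 355 0
d 448 295 507 376 0
i 511 299 540 378 0
a 543 298 601 359 0
"""


def parse_box_lines(lines: Iterable[str]) -> List[Box]:
    """Parse 'char left bottom right top page' lines, skipping short ones."""
    boxes: List[Box] = []
    for line in lines:
        parts = line.split()
        if len(parts) < 5:
            continue  # blank or malformed line
        left, bottom, right, top = (int(p) for p in parts[1:5])
        boxes.append((parts[0], left, bottom, right, top))
    return boxes


def union_rect(boxes: List[Box]) -> Tuple[int, int, int, int]:
    """Smallest (left, bottom, right, top) rectangle enclosing all boxes."""
    return (
        min(b[1] for b in boxes),
        min(b[2] for b in boxes),
        max(b[3] for b in boxes),
        max(b[4] for b in boxes),
    )


def to_xywh(rect: Tuple[int, int, int, int], image_height: int) -> Tuple[int, int, int, int]:
    """Convert a bottom-left-origin rect to [x, y, width, height] with a
    top-left origin. image_height must be supplied by the caller."""
    left, bottom, right, top = rect
    return (left, image_height - top, right - left, top - bottom)


if __name__ == "__main__":
    boxes = parse_box_lines(SAMPLE.splitlines())
    word = "".join(b[0] for b in boxes)
    rect = union_rect(boxes)
    print(word, rect)
    # 400 is a made-up image height, used only to show the conversion.
    print(to_xywh(rect, image_height=400))
```

For the ten lines above, the enclosing rectangle is left 28, bottom 295,
right 601, top 382, i.e. 573 wide and 87 high. Splitting a longer .box
file into words would need an extra rule (for example a horizontal gap
threshold), which this sketch does not attempt.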

HTH

Warm regards,
Dmitri Silaev





On Fri, Mar 25, 2011 at 7:31 PM, BYTEFX <[email protected]> wrote:
> hi,
>
> i have created the box file. it's quite useful to see each character
> box dimensions.
> it's almost what i need, it's just that each text line is a single
> character in the box file.
>
> P 28 297 102 382 0
> r 101 298 148 357 0
> i 150 298 178 382 0
> n 184 299 244 358 0
> t 245 297 285 370 0
> M 288 298 383 379 0
> e 387 296 446 355 0
> d 448 295 507 376 0
> i 511 299 540 378 0
> a 543 298 601 359 0
>
> i would like a rect co-ordinate for the whole word "PrintMedia", as
> tesseract would receive the input.
> also, what does this format mean: the first text line is
> 28 297 102 382 0  == xx xx xx xx xx.
>
> i'm not familiar with wingrep :-(
>
> thanks for your help in advance. almost there :-)
>
> Best Regards
> Bytefx
>
> On Mar 25, 10:26 am, Dmitri Silaev <[email protected]> wrote:
>> Well, now that I know what you need, I can suggest that you use .box
>> file generation. Not only can it be used as a step in the Tesseract
>> training procedure, it also serves as a simple means of obtaining the BB
>> coordinates of recognized blobs. Imo it's the easiest way for your
>> needs. Refer to
>> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Make_B...
>> and around for more details.
>>
>> If you need to get output in a specific format you can use e.g.
>> wingrep or the like.
>>
>> Not so long ago Tesseract didn't use Leptonica at all. For the moment,
>> Tesseract mainly uses Leptonica's basic functions, so I wouldn't say
>> "Tesseract uses Leptonica for this purpose", I'd say "Tesseract is at
>> the beginning of its transition to Leptonica".
>>
>> HTH
>>
>> Warm regards,
>> Dmitri Silaev
>>
>> On Fri, Mar 25, 2011 at 12:15 PM, Adetokunbo Bamidele
>> <[email protected]> wrote:
>> > Pls can you expand on each of your proposals.
>>
>> > The end goal, which you might find interesting, is to benchmark the text
>> > localisation and segmentation capability against a dataset. The aim is to
>> > fully understand how good the detector in tesseract is.
>>
>> > I understand that tesseract uses leptonica for this purpose.
>>
>> > So I am looking for the shortest path to my goal.
>> > From: Dmitri Silaev
>> > Sent: 25 March 2011 06:19
>> > To: [email protected]
>> > Cc: BYTEFX
>> > Subject: Re: tesseract api, how do you get the bbox co-ordinates in
>> > commandline using the exe in win32
>>
>> > Well, I still don't get your final goal; it would be easier to
>> > suggest something if I knew what you are trying to achieve.
>> > However, if you decide to dive into programming, a better way of
>> > getting rects is using the ResultIterator/PageIterator
>> > interface.
>>
>> > Also, it may help to know that generating .box files also
>> > provides you with rect coords... You can count lines,
>> > calculate widths and heights... There are a number of text file
>> > processing utilities... Well, you probably know what to do.
>>
>> > Btw, examining the code paths used to generate .box files is a good
>> > starting point for devising your own blob rect dumper with minimal effort.
>>
>> > Warm regards,
>> > Dmitri Silaev
>>
>> > On Thu, Mar 24, 2011 at 6:48 PM, BYTEFX <[email protected]> wrote:
>> >> hi thanks for the pointer. i have set this up but this does not answer
>> >> my questions.
>>
>> >> let me explain:
>>
>> >> for a sample image sent to tesseract for processing, "GetRegions"
>> >> function can be called from the api.
>> >> i can figure this out and print out the results from within tesseract
>> >> but i imagined this must have been done elsewhere in the forum !
>>
>> >> On Mar 24, 3:11 pm, Dmitri Silaev <[email protected]> wrote:
>> >>> I'm not sure if it's exactly what you want, but at first you can try
>> >>> to create a config file with the following line inside:
>>
>> >>> textord_oldbl_debug             T
>>
>> >>> Since the output can be quite long and might not fit into the console
>> >>> window, you can also specify:
>>
>> >>> debug_file tesseract.log
>>
>> >>> I suspect any other debug info is only accessible from the ScrollView
>> >>> facility. You can read the Wiki and search this forum for "ScrollView"
>> >>> to find out more. If you are on Windows, you can use my article at
>> >>> http://rdaemons.blogspot.com/2011/02/tesseract-ocr-setting-up-interac...
>> >>> to make your ScrollView installation process quicker.
>>
>> >>> Warm regards,
>> >>> Dmitri Silaev
>>
>> >>> On Thu, Mar 24, 2011 at 2:41 PM, BYTEFX <[email protected]> wrote:
>> >>> > Hi,
>>
>> >>> > i am interested in understanding the internal detector of tesseract
>> >>> > 3.00 in win32.
>>
>> >>> > How do i go about printing out to winconsole the detected Rects from
>> >>> > tesseract?
>> >>> > is there a commandline arg for this, or where in the code (baseapi)
>> >>> > can i start from to get this information?
>>
>> >>> > Basically need the output format, [x,y,width & height], and count of
>> >>> > Rect's identified from tesseract.
>>
>> >>> > best
>>
>> >>> > bytefx.
>>
>> >>> > --
>> >>> > You received this message because you are subscribed to the Google 
>> >>> > Groups "tesseract-ocr" group.
>> >>> > To post to this group, send email to [email protected].
>> >>> > To unsubscribe from this group, send email to 
>> >>> > [email protected].
>> >>> > For more options, visit this group at
>> >>> > http://groups.google.com/group/tesseract-ocr?hl=en.
>>
>>

