Great, I use it too; it's one of the best-known free text editors ))
However, it can't do large-scale automated text file processing, and I
think that is what you need to achieve your goal...
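
To make the idea concrete, here is a minimal sketch in Python (my own
choice of language, nothing in this thread depends on it). It reads the
box lines quoted below, where each line is
<char> <left> <bottom> <right> <top> <page> with the origin at the
bottom-left corner of the image, and merges them into one rect for the
whole word. The function names and the image height of 400 are
hypothetical, used only for illustration; the real image height has to
come from your own image.

```python
from collections import namedtuple

# One character entry from a Tesseract .box file.
Box = namedtuple("Box", "char left bottom right top page")

# The ten lines from the box file quoted in this thread.
SAMPLE = """\
P 28 297 102 382 0
r 101 298 148 357 0
i 150 298 178 382 0
n 184 299 244 358 0
t 245 297 285 370 0
M 288 298 383 379 0
e 387 296 446 355 0
d 448 295 507 376 0
i 511 299 540 378 0
a 543 298 601 359 0
"""


def parse_box_line(line):
    """Parse '<char> <left> <bottom> <right> <top> <page>' into a Box."""
    char, left, bottom, right, top, page = line.split()
    return Box(char, int(left), int(bottom), int(right), int(top), int(page))


def group_rect(boxes):
    """Smallest (left, bottom, right, top) rect enclosing all boxes."""
    return (
        min(b.left for b in boxes),
        min(b.bottom for b in boxes),
        max(b.right for b in boxes),
        max(b.top for b in boxes),
    )


def to_xywh(rect, image_height):
    """Convert a bottom-left-origin rect to [x, y, width, height]
    with a top-left origin. image_height is an assumed input."""
    left, bottom, right, top = rect
    return [left, image_height - top, right - left, top - bottom]


boxes = [parse_box_line(line) for line in SAMPLE.splitlines() if line.strip()]
word = "".join(b.char for b in boxes)
rect = group_rect(boxes)

print(word, rect)          # PrintMedia (28, 295, 601, 382)
print(to_xywh(rect, 400))  # [28, 18, 573, 87] for an assumed height of 400
```

This treats every line it is given as part of the same word. Splitting a
whole page into words would need some extra rule of your own, for
example a threshold on the horizontal gap between neighbouring boxes.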




On Sun, Mar 27, 2011 at 12:25 AM, Adetokunbo Bamidele
<[email protected]> wrote:
> Thanks. I use notepad++. :-)
>
> -----Original Message-----
> From: Dmitri Silaev
> Sent: 26 March 2011 20:57
> To: [email protected]; BYTEFX
> Subject: Re: tesseract api, how do you get the bbox co-ordinates in
> commandline using the exe in win32
>
> Well, wingrep is not a must-have. I just mentioned it as an
> example. After all, it's shareware ))
>
> You need a program that is just capable of processing text files and
> doing some basic operations with words or numbers within a text line.
> There's a vast number of such programs on the Internet; you'll probably
> be able to find one that is free and easy to use.
>
> HTH
>
> Warm regards,
> Dmitri Silaev
>
>
>
>
>
> On Fri, Mar 25, 2011 at 7:31 PM, BYTEFX <[email protected]> wrote:
>> hi,
>>
>> i have created the box file. it's quite useful to see each character's
>> box dimensions.
>> it's almost what i need, it's just that each text line is a single
>> character in the box file.
>>
>> P 28 297 102 382 0
>> r 101 298 148 357 0
>> i 150 298 178 382 0
>> n 184 299 244 358 0
>> t 245 297 285 370 0
>> M 288 298 383 379 0
>> e 387 296 446 355 0
>> d 448 295 507 376 0
>> i 511 299 540 378 0
>> a 543 298 601 359 0
>>
>> i would like the rect co-ordinates for a whole word such as "PrintMedia",
>> as tesseract would receive the input.
>> also what does this format mean: first text line 28 297 102 382 0  ==
>> xx xx xx xx xx.
>>
>> i'm not familiar with wingrep :-(
>>
>> thanks for your help in advance. almost there :-)
>>
>> Best Regards
>> Bytefx
>>
>> On Mar 25, 10:26 am, Dmitri Silaev <[email protected]> wrote:
>>> Well, now that I know what you need, I can suggest that you use .box
>>> file generation. Not only can it be used as a step in the Tesseract
>>> training procedure, it can also serve as a simple means to obtain BB
>>> coordinates of recognized blobs. Imo it's the easiest way for your
>>> needs. Refer to
>>> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Make_B...
>>> and around for more details.
>>>
>>> If you need to get output in a specific format you can use e.g.
>>> wingrep or the like.
>>>
>>> Not so long ago Tesseract didn't use Leptonica at all. For the moment,
>>> Tesseract mainly uses Leptonica's basic functions, so I wouldn't say
>>> "Tesseract uses Leptonica for this purpose", I'd say "Tesseract is in
>>> the beginning of its transition to Leptonica".
>>>
>>> HTH
>>>
>>> Warm regards,
>>> Dmitri Silaev
>>>
>>> On Fri, Mar 25, 2011 at 12:15 PM, Adetokunbo Bamidele
>>> <[email protected]> wrote:
>>> > Pls can you expand on each of your proposals.
>>>
>>> > The end goal, which you might find interesting, is to benchmark the text
>>> > localisation and segmentation capability against a dataset. The aim is to
>>> > fully understand how good the detector in tesseract is.
>>>
>>> > I understand that tesseract uses leptonica for this purpose.
>>>
>>> > So I am looking for the shortest path to my goal.
>>> > From: Dmitri Silaev
>>> > Sent: 25 March 2011 06:19
>>> > To: [email protected]
>>> > Cc: BYTEFX
>>> > Subject: Re: tesseract api, how do you get the bbox co-ordinates in
>>> > commandline using the exe in win32
>>> > Well, I still don't get your final goal; it would be easier to
>>> > suggest something if I knew what you are trying to achieve.
>>> > However, if you decide to dive into programming, a better way of
>>> > getting rects is to use the ResultIterator/PageIterator
>>> > interface.
>>>
>>> > It may also help to know that generating .box files gives you
>>> > rect coords as well... You can count lines,
>>> > calculate widths and heights... There are a number of text file
>>> > processing utilities... Well, probably you know what to do.
>>>
>>> > Btw, examining the code paths used to generate .box files is a good
>>> > starting point for devising your own blob rect dumper with minimal effort.
>>>
>>> > Warm regards,
>>> > Dmitri Silaev
>>>
>>> > On Thu, Mar 24, 2011 at 6:48 PM, BYTEFX <[email protected]> wrote:
>>> >> hi thanks for the pointer. i have set this up but this does not answer
>>> >> my questions.
>>>
>>> >> let me explain:
>>>
>>> >> for a sample image sent to tesseract for processing, "GetRegions"
>>> >> function can be called from the api.
>>> >> i can figure this out and print out the results from within tesseract
>>> >> but i imagined this must have been done elsewhere in the forum !
>>>
>>> >> On Mar 24, 3:11 pm, Dmitri Silaev <[email protected]> wrote:
>>> >>> I'm not sure if it's exactly what you want, but at first you can try
>>> >>> to create a config file with the following line inside:
>>>
>>> >>> textord_oldbl_debug             T
>>>
>>> >>> Since the output can be quite long and might not fit into the console
>>> >>> window, you can also specify:
>>>
>>> >>> debug_file tesseract.log
>>>
>>> >>> I suspect any other debug info is only accessible from the ScrollView
>>> >>> facility. You can read the Wiki and search this forum for "ScrollView"
>>> >>> to find out more. If you are on Windows, you can use my article at
>>> >>> http://rdaemons.blogspot.com/2011/02/tesseract-ocr-setting-up-interac...
>>> >>> to make your ScrollView installation process quicker.
>>>
>>> >>> Warm regards,
>>> >>> Dmitri Silaev
>>>
>>> >>> On Thu, Mar 24, 2011 at 2:41 PM, BYTEFX <[email protected]> wrote:
>>> >>> > Hi,
>>>
>>> >>> > i am interested in understanding the internal detector of tesseract
>>> >>> > 3.00 in win32.
>>>
>>> >>> > How do i go about printing out to winconsole the detected Rects from
>>> >>> > tesseract.
>>> >>> > is there a commandline arg for this, or where in the code (baseapi)
>>> >>> > can i start to get this information?
>>>
>>> >>> > Basically need the output format, [x,y,width & height], and count of
>>> >>> > Rect's identified from tesseract.
>>>
>>> >>> > best
>>>
>>> >>> > bytefx.
>>>
>>> >>> > --
>>> >>> > You received this message because you are subscribed to the Google 
>>> >>> > Groups "tesseract-ocr" group.
>>> >>> > To post to this group, send email to [email protected].
>>> >>> > To unsubscribe from this group, send email to 
>>> >>> > [email protected].
>>> >>> > For more options, visit this group at
>>> >>> > http://groups.google.com/group/tesseract-ocr?hl=en.
>>>
>
