Great, I use it too, it's one of the best-known free text editors )) However, it isn't capable of massive automated text file processing, and I think that is what you need to achieve your goal...
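For what it's worth, here is a minimal sketch of that kind of processing in Python (my choice of language, nothing in Tesseract requires it). It assumes each .box line has the form `<char> <left> <bottom> <right> <top> <page>`, with coordinates measured from the bottom-left corner of the image, and it merges neighbouring character boxes into one word rect whenever the horizontal gap between them is at most `max_gap` pixels. The function names and the `max_gap` default of 10 are made up for illustration; the .box file itself carries no word boundaries, so the grouping is only a heuristic.

```python
def parse_box_line(line):
    """Split one .box line into (symbol, left, bottom, right, top, page)."""
    symbol, left, bottom, right, top, page = line.strip().rsplit(maxsplit=5)
    return symbol, int(left), int(bottom), int(right), int(top), int(page)


def group_words(lines, max_gap=10):
    """Merge consecutive character boxes into word rects.

    Returns a list of (text, x, y, width, height) tuples, with x and y
    measured from the bottom-left corner, as in the .box file.
    max_gap is a hypothetical threshold, tune it for your images.
    """
    words = []
    current = None  # [text, left, bottom, right, top, page]
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        symbol, left, bottom, right, top, page = parse_box_line(line)
        starts_new_word = (
            current is None
            or page != current[5]
            or left - current[3] > max_gap  # horizontal gap too wide
            or left < current[1]            # jumped back: new text line
        )
        if starts_new_word:
            if current is not None:
                words.append(current)
            current = [symbol, left, bottom, right, top, page]
        else:
            # Grow the current word rect to the union of both boxes.
            current[0] += symbol
            current[2] = min(current[2], bottom)
            current[3] = max(current[3], right)
            current[4] = max(current[4], top)
    if current is not None:
        words.append(current)
    return [(text, l, b, r - l, t - b) for text, l, b, r, t, _page in words]


if __name__ == "__main__":
    sample = """P 28 297 102 382 0
r 101 298 148 357 0
i 150 298 178 382 0
n 184 299 244 358 0
t 245 297 285 370 0
M 288 298 383 379 0
e 387 296 446 355 0
d 448 295 507 376 0
i 511 299 540 378 0
a 543 298 601 359 0"""
    rects = group_words(sample.splitlines())
    print("count:", len(rects))
    for text, x, y, w, h in rects:
        print(text, [x, y, w, h])
```

On the ten lines quoted below this gives a single rect: PrintMedia at x=28, y=295, width=573, height=87. Note that y is counted from the bottom edge here; for a top-left origin you would use the image height minus `top` instead, and the image height has to come from the image itself.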
On Sun, Mar 27, 2011 at 12:25 AM, Adetokunbo Bamidele <[email protected]> wrote:
> Thanks. I use notepad++. :-)
>
> -----Original Message-----
> From: Dmitri Silaev
> Sent: 26 March 2011 20:57
> To: [email protected]; BYTEFX
> Subject: Re: tesseract api, how do you get the bbox co-ordinates in
> commandline using the exe in win32
>
> Well, wingrep is not a must have. I just mentioned it to name
> anything. After all, it's shareware ))
>
> You need a program that is just capable of processing text files and
> doing some basic operations with words or numbers within a text line.
> There's a vast number of such programs on the Internet, you'll probably
> be able to find one that is free and easy to use.
>
> HTH
>
> Warm regards,
> Dmitri Silaev
>
> On Fri, Mar 25, 2011 at 7:31 PM, BYTEFX <[email protected]> wrote:
>> hi,
>>
>> i have created the box file. it's quite useful to see each character
>> box dimensions.
>> it's almost what i need, it's just that each text line is a single
>> character in the box file.
>>
>> P 28 297 102 382 0
>> r 101 298 148 357 0
>> i 150 298 178 382 0
>> n 184 299 244 358 0
>> t 245 297 285 370 0
>> M 288 298 383 379 0
>> e 387 296 446 355 0
>> d 448 295 507 376 0
>> i 511 299 540 378 0
>> a 543 298 601 359 0
>>
>> i would like a rect co-ordinate for the whole word "PrintMedia" as
>> tesseract would receive the input.
>> also what does this format mean: first text line 28 297 102 382 0 ==
>> xx xx xx xx xx.
>>
>> i'm not familiar with wingrep :-(
>>
>> thanks for your help in advance. almost there :-)
>>
>> Best Regards
>> Bytefx
>>
>> On Mar 25, 10:26 am, Dmitri Silaev <[email protected]> wrote:
>>> Well, now that I know what you need, I can suggest that you use .box
>>> file generation. Not only can it be used as a step in the Tesseract
>>> training procedure, it can also serve as a simple means to obtain BB
>>> coordinates of recognized blobs. Imo for your needs it's the easiest
>>> way.
>>> Refer to http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Make_B...
>>> and around for more details.
>>>
>>> If you need to get output in a specific format you can use e.g.
>>> wingrep or the like.
>>>
>>> Not so long ago Tesseract didn't use Leptonica at all. For the moment,
>>> Tesseract mainly uses Leptonica's basic functions, so I wouldn't say
>>> "Tesseract uses Leptonica for this purpose", I'd say "Tesseract is in
>>> the beginning of its transition to Leptonica".
>>>
>>> HTH
>>>
>>> Warm regards,
>>> Dmitri Silaev
>>>
>>> On Fri, Mar 25, 2011 at 12:15 PM, Adetokunbo Bamidele
>>> <[email protected]> wrote:
>>> > Pls can you expand on each of your proposals.
>>> >
>>> > The end goal which you might find interesting is to benchmark the text
>>> > localisation and segmentation capability against a dataset. The aim is
>>> > to fully understand how good the detector in tesseract is.
>>> >
>>> > I understand that tesseract uses leptonica for this purpose.
>>> >
>>> > So I am looking for the shortest path to my goal.
>>> >
>>> > From: Dmitri Silaev
>>> > Sent: 25 March 2011 06:19
>>> > To: [email protected]
>>> > Cc: BYTEFX
>>> > Subject: Re: tesseract api, how do you get the bbox co-ordinates in
>>> > commandline using the exe in win32
>>> >
>>> > Well, I still don't get your final goal, maybe it would be easier to
>>> > suggest something if I knew what you are trying to achieve.
>>> > However if you decide to dive into programming, a better way of
>>> > getting rects is using the ResultIterator/PageIterator
>>> > interface.
>>> >
>>> > Also you can benefit from knowing that generation of .box files also
>>> > provides you with rect coords... You can count lines,
>>> > calculate widths and heights... There are a number of text file
>>> > processing utilities... Well, probably you know what to do.
>>> >
>>> > Btw examining the control paths used to generate .box files is a good
>>> > starting point to devise your own blob rect dumper with minimal effort.
>>> >
>>> > Warm regards,
>>> > Dmitri Silaev
>>> >
>>> > On Thu, Mar 24, 2011 at 6:48 PM, BYTEFX <[email protected]> wrote:
>>> >> hi thanks for the pointer. i have set this up but this does not answer
>>> >> my questions.
>>> >>
>>> >> let me explain:
>>> >>
>>> >> for a sample image sent to tesseract for processing, "GetRegions"
>>> >> function can be called from the api.
>>> >> i can figure this out and print out the results from within tesseract
>>> >> but i imagined this must have been done elsewhere in the forum !
>>> >>
>>> >> On Mar 24, 3:11 pm, Dmitri Silaev <[email protected]> wrote:
>>> >>> I'm not sure if it's exactly what you want, but at first you can try
>>> >>> to create a config file with the following line inside:
>>> >>>
>>> >>> textord_oldbl_debug T
>>> >>>
>>> >>> Since the output can be quite long and might not fit into the console
>>> >>> window, you can also specify:
>>> >>>
>>> >>> debug_file tesseract.log
>>> >>>
>>> >>> I suspect any other debug info is only accessible from the ScrollView
>>> >>> facility. You can read the Wiki and search this forum for "ScrollView"
>>> >>> to find out more. If you are on Windows, you can use my article
>>> >>> at http://rdaemons.blogspot.com/2011/02/tesseract-ocr-setting-up-interac...
>>> >>> to make your ScrollView installation process quicker.
>>> >>>
>>> >>> Warm regards,
>>> >>> Dmitri Silaev
>>> >>>
>>> >>> On Thu, Mar 24, 2011 at 2:41 PM, BYTEFX <[email protected]> wrote:
>>> >>> > Hi,
>>> >>> >
>>> >>> > i am interested in understanding the internal detector of tesseract
>>> >>> > 3.00 in win32.
>>> >>> >
>>> >>> > How do i go about printing out to winconsole the detected Rects from
>>> >>> > tesseract?
>>> >>> > is there a commandline arg for this, or where in the code (baseapi)
>>> >>> > can i start from to get this information?
>>> >>> >
>>> >>> > Basically need the output format, [x,y,width & height], and count of
>>> >>> > Rect's identified from tesseract.
>>> >>> >
>>> >>> > best
>>> >>> >
>>> >>> > bytefx.
>>> >>> >
>>> >>> > --
>>> >>> > You received this message because you are subscribed to the Google
>>> >>> > Groups "tesseract-ocr" group.
>>> >>> > To post to this group, send email to [email protected].
>>> >>> > To unsubscribe from this group, send email to
>>> >>> > [email protected].
>>> >>> > For more options, visit this group
>>> >>> > at http://groups.google.com/group/tesseract-ocr?hl=en.

