You can compute the page bounding box as the union of the bounding boxes of the text lines. That will probably give you a fairly reasonable page bounding box for most pages.
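As a rough illustration of that idea, here is a minimal Python sketch. It assumes you have already obtained one rectangle per text line from the layout analysis step, as (x0, y0, x1, y1) pixel coordinates. The function name union_bbox and the sample coordinates are made up for the example; this is not an OCRopus API.

```python
from typing import Iterable, Tuple

# Assumed box format: (x0, y0, x1, y1) in pixels, with x0 <= x1 and y0 <= y1.
Box = Tuple[int, int, int, int]


def union_bbox(line_boxes: Iterable[Box]) -> Box:
    """Return the smallest rectangle that contains every text-line box."""
    boxes = list(line_boxes)
    if not boxes:
        raise ValueError("no text lines given")
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[2] for b in boxes)
    y1 = max(b[3] for b in boxes)
    return (x0, y0, x1, y1)


# Hypothetical line boxes for one scanned page.
lines = [(120, 80, 900, 110), (118, 120, 905, 150), (125, 160, 640, 190)]
page_box = union_bbox(lines)
print(page_box)
```

A table of such boxes, one per page, could then be passed to a separate cropping or shifting step. A single stray "line" detected in scanner noise near the page edge would enlarge the box, so filtering out very small or isolated boxes first may be worthwhile.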
We have separately developed another page bounding box detector that we will be incorporating into OCRopus over the next few months; that detector detects the page boundary directly.

Tom

On Fri, Oct 31, 2008 at 02:54, jimfunderburk <[EMAIL PROTECTED]> wrote:
>
> I am a potential ocropus user. Based on a lecture by Breuel at a
> Sanskrit symposium in May 2008, and from what I've seen in ocropus
> wiki, I suspect that ocropus can solve the problem described below.
> But for me it is a non-trivial task to get a ubuntu computer, install
> ocropus, etc. etc., so I am hoping that the experts of this group will
> be able to say "Sure, ocropus can do that!", before I proceed further.
>
> The project is to look up a word in scans of the pages of the Wilson
> Sanskrit dictionary, and highlight on the scanned image of the
> relevant page the part pertaining to the word.
>
> You can see the current state of this for the Wilson dictionary at
> http://www.sanskrit-lexicon.uni-koeln.de/scans/WILScan/web/index.php
> If you enter 'azva', the page for this word is retrieved, and the
> part of the page containing the word is emphasized.
> For this word, 'azva', the process is quite satisfactory.
> However, if you try the word 'rAma' or 'sItA', for instance, you see
> that the region highlighted is not quite right.
> The main problem is that the position of the page within the whole
> scanned image varies, due in part to vagaries of
> the scanning process.
>
> Here is where I thought OCROPUS might come in usefully: to
> determine the pixel coordinates of the 'bounding rectangle' of
> the text. A table of such information for each page could be fed
> into some other program, possibly such as ImageMagick,
> to automate the 'normalization' of the image within the page.
>
> Thanks for any suggestions.
