You can compute the page bounding box as the union of the bounding boxes
of the text lines. That will probably give you a reasonable page bounding
box for most pages.
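
As a rough illustration of that idea, here is a minimal Python sketch. It
assumes you already have one box per text line as (x0, y0, x1, y1) pixel
coordinates; that tuple layout and the function name union_bbox are
assumptions made for this example, not the OCRopus output format or API.

```python
from typing import Iterable, Tuple

# Assumed box layout for this sketch: (x0, y0, x1, y1) in pixels,
# with (x0, y0) the top-left and (x1, y1) the bottom-right corner.
Box = Tuple[int, int, int, int]


def union_bbox(line_boxes: Iterable[Box]) -> Box:
    """Return the smallest box that encloses every text-line box."""
    boxes = list(line_boxes)
    if not boxes:
        raise ValueError("no text lines found on this page")
    x0 = min(b[0] for b in boxes)  # leftmost edge of any line
    y0 = min(b[1] for b in boxes)  # topmost edge of any line
    x1 = max(b[2] for b in boxes)  # rightmost edge of any line
    y1 = max(b[3] for b in boxes)  # bottommost edge of any line
    return (x0, y0, x1, y1)
```

Running this once per page would give the table of text rectangles that
could then be handed to a separate cropping or alignment step.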

We have separately developed another page bounding box detector that we will
be incorporating into OCRopus over the next few months; that detector
detects the page boundary directly.

Tom

On Fri, Oct 31, 2008 at 02:54, jimfunderburk
<[EMAIL PROTECTED]> wrote:

>
> I am a potential ocropus user.  Based on a lecture by Breuel at a
> Sanskrit symposium in May 2008, and from what I've seen in ocropus
> wiki, I suspect that ocropus can solve the problem described below.
> But for me it is a non-trivial task to get an Ubuntu computer, install
> ocropus, etc. etc., so I am hoping that the experts of this group will
> be able to say "Sure, ocropus can do that!", before I proceed further.
>
> The project is to look up a word in scans of the pages of the Wilson
> Sanskrit dictionary, and highlight on the scanned image of the
> relevant page the part pertaining to the word.
>
> You can see the current state of this for the Wilson dictionary at
> http://www.sanskrit-lexicon.uni-koeln.de/scans/WILScan/web/index.php
> If you enter 'azva', the page for this word is retrieved, and the
> part of the page containing the word is emphasized.
> For this word, 'azva', the process is quite satisfactory.
> However, if you try the word 'rAma' or 'sItA', for instance, you see
> that the region highlighted is not quite right.
> The main problem is that the position of the page within the whole
> scanned image varies, due in part to vagaries of
> the scanning process.
>
> Here is where I thought OCRopus might come in useful: to
> determine the pixel coordinates of the 'bounding rectangle' of
> the text. A table of such information for each page could be fed
> into some other program, possibly ImageMagick,
> to automate the 'normalization' of the image within the page.
>
>  Thanks for any suggestions.
