Tom,

Has the bounding box detector been added to OCRopus? I downloaded the 
latest version but couldn't get the word coordinates, so I would 
appreciate your help.

Thanks in advance

Natraj 

On Wednesday, November 5, 2008 2:18:24 AM UTC+5:30, Tom wrote:
>
> You can compute the bounding box as the bounding box of the text lines.  
> That will probably give you a fairly reasonable page bounding box for most 
> pages.
>
> We have separately developed another page bounding box detector that we 
> will be incorporating into OCRopus over the next few months; that detector 
> detects the page boundary directly.
>
> Tom
>
> On Fri, Oct 31, 2008 at 02:54, jimfunderburk <[email protected]> wrote:
>
>>
>> I am a potential ocropus user.  Based on a lecture by Breuel at a
>> Sanskrit symposium in May 2008, and from what I've seen in ocropus
>> wiki, I suspect that ocropus can solve the problem described below.
>> But for me it is a non-trivial task to get an Ubuntu computer, install
>> ocropus, etc. etc., so I am hoping that the experts of this group will
>> be able to say "Sure, ocropus can do that!", before I proceed further.
>>
>> The project is to look up a word in scans of the pages of the Wilson
>> Sanskrit dictionary, and highlight on the scanned image of the
>> relevant page the part pertaining to the word.
>>
>> You can see the current state of this for the Wilson dictionary at
>>  http://www.sanskrit-lexicon.uni-koeln.de/scans/WILScan/web/index.php
>>  If you enter 'azva', the page for this word is retrieved, and the
>> part of the page containing the word is emphasized.
>>  For this word, 'azva', the process is quite satisfactory.
>>  However, if you try the word 'rAma' or 'sItA', for instance, you see
>> that the region highlighted is not quite right.
>>  The main problem is that the position of the page within the whole
>> scanned image varies, due in part to vagaries of
>>  the scanning process.
>>
>>  Here is where I thought OCRopus might be useful: to
>> determine the pixel coordinates of the 'bounding rectangle' of
>>  the text.  A table of such information for each page could be fed
>> into some other program, such as ImageMagick,
>>  to automate the 'normalization' of the image within the page.
>>
>>  Thanks for any suggestions.
>>
>>
>>
>
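
The suggestion quoted above (take the page bounding box to be the bounding box of the text lines) can be sketched roughly as follows. This is only an illustration, not OCRopus code: it assumes the text-line rectangles are already available as pixel coordinates from whatever layout analysis step was run, and the names Box, union_box and crop_geometry are made up for this example. The coordinates in the sample list are invented.

```python
from typing import Iterable, NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates (hypothetical helper type).

    (x0, y0) is the top-left corner; (x1, y1) is the bottom-right corner.
    """
    x0: int
    y0: int
    x1: int
    y1: int


def union_box(line_boxes: Iterable[Box]) -> Box:
    """Return the smallest rectangle that contains every text-line box."""
    boxes = list(line_boxes)
    if not boxes:
        raise ValueError("need at least one text-line box")
    return Box(
        min(b.x0 for b in boxes),
        min(b.y0 for b in boxes),
        max(b.x1 for b in boxes),
        max(b.y1 for b in boxes),
    )


def crop_geometry(box: Box) -> str:
    """Format a box as a WxH+X+Y geometry string (the form ImageMagick accepts)."""
    width = box.x1 - box.x0
    height = box.y1 - box.y0
    return f"{width}x{height}+{box.x0}+{box.y0}"


# Invented text-line boxes for one scanned page (assumption, not real data).
lines = [
    Box(120, 80, 900, 110),
    Box(118, 120, 905, 150),
    Box(125, 160, 640, 190),
]

page_box = union_box(lines)
print(page_box)
print(crop_geometry(page_box))
```

A table of one geometry string per page could then be passed to an external tool, for example as the argument of ImageMagick's -crop option, to shift every page to a common origin. Whether the text-line union is a good enough estimate of the page box will depend on the scans; stray marks detected as lines would enlarge it.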
