I am a potential OCRopus user. Based on a lecture by Breuel at a Sanskrit symposium in May 2008, and on what I have seen in the OCRopus wiki, I suspect that OCRopus can solve the problem described below. For me, though, it is a non-trivial task to get an Ubuntu computer, install OCRopus, and so on, so I am hoping that the experts of this group will be able to say "Sure, OCRopus can do that!" before I proceed further.

The project is to look up a word in scans of the pages of the Wilson Sanskrit dictionary, and to highlight, on the scanned image of the relevant page, the part pertaining to that word. You can see the current state of this for the Wilson dictionary at http://www.sanskrit-lexicon.uni-koeln.de/scans/WILScan/web/index.php

If you enter 'azva', the page for this word is retrieved, and the part of the page containing the word is emphasized. For 'azva' the result is quite satisfactory. However, if you try 'rAma' or 'sItA', for instance, you will see that the highlighted region is not quite right.

The main problem is that the position of the printed page within the whole scanned image varies from scan to scan, due in part to the vagaries of the scanning process. Here is where I thought OCRopus might be useful: to determine the pixel coordinates of the 'bounding rectangle' of the text on each page. A table of this information for every page could then be fed into some other program, possibly ImageMagick, to automate the 'normalization' of the page's position within the image.

Thanks for any suggestions.
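
As a rough illustration of the 'bounding rectangle' idea (this is not OCRopus code, and not how the current site works), here is a minimal Python sketch. It assumes a page has already been loaded as rows of grayscale values from 0 (black) to 255 (white). The function names, the threshold of 128, and the `min_dark` noise parameter are all hypothetical choices made for the example.

```python
def text_bounding_box(pixels, threshold=128, min_dark=1):
    """Return (left, top, right, bottom) of the dark region, or None.

    pixels: list of rows, each a list of grayscale values (0 = black, 255 = white).
    A row or column counts as text when it holds at least `min_dark`
    pixels darker than `threshold`; raising `min_dark` ignores specks.
    """
    if not pixels or not pixels[0]:
        return None
    height, width = len(pixels), len(pixels[0])
    # Count dark pixels in every row and every column.
    row_counts = [sum(1 for v in row if v < threshold) for row in pixels]
    col_counts = [
        sum(1 for y in range(height) if pixels[y][x] < threshold)
        for x in range(width)
    ]
    rows = [y for y, c in enumerate(row_counts) if c >= min_dark]
    cols = [x for x, c in enumerate(col_counts) if c >= min_dark]
    if not rows or not cols:
        return None  # blank page
    return (cols[0], rows[0], cols[-1], rows[-1])


def offset_to_reference(box, reference_box):
    """Shift (dx, dy) that moves the top-left corner of `box` onto `reference_box`."""
    return (reference_box[0] - box[0], reference_box[1] - box[1])


# Synthetic 10x8 "scan": white background with a dark block at x=3..6, y=2..5.
page = [[255] * 10 for _ in range(8)]
for y in range(2, 6):
    for x in range(3, 7):
        page[y][x] = 0

box = text_bounding_box(page)
print(box)                                     # (3, 2, 6, 5)
print(offset_to_reference(box, (1, 1, 4, 4)))  # (-2, -1)
```

Running this over every scanned page would give the per-page table of coordinates mentioned above, and the (dx, dy) offsets are the kind of values a separate image tool could use to shift each page into a common position. Real scans have dark borders, specks, and skew, which this sketch does not handle.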
