On 2011-12-30 12:12, Lars Aronsson wrote:
> On 12/30/2011 08:51 PM, Edward Betts wrote:
>> On 2011-12-30 11:33, Lars Aronsson wrote:
>>> Sorry to focus on the bugs. This is worth more work. What if
>>> the OCR software made mistakes in segmentation, could the
>>> proofreader correct this by drawing text boxes manually?
>>
>> This is a good question. The OCR can identify images. I think it might
>> have the box around the word already, but think it is an image. I
>> should add this to the display somehow with a button to switch it from
>> being an image to being a word. That way we already know the coordinates.
>
> You might look at Wikimapia for ideas for the user interface,
> http://www.wikimapia.org/#lat=37.7781096&lon=-122.5062275&z=13&l=0&m=b
>
> Wikimapia is just a Google map backdrop with user-editable
> polygons on top.
>
> There you can zoom in to details and zoom out, and "edit"
> the map to add or modify polygons, which are then tagged
> with names and attributes (= transcribed OCR text). With
> the zoom out, you could see all pages side by side, so the
> item overview (page viewer) and correction mode (leaf view)
> would be one and the same.
>
> These polygons could be individual words or entire text columns,
> or illustrations. If one polygon was "tagged as illustration" by
> the OCR software, a user could correct that. Or merely
> adjust the layout of the polygon.

I'm familiar with Wikimapia. Something like this could work.

> On this page,
> http://edwardbetts.com/correct/leaf/tillhstgenomry00lang/8
> I found some words were split by OCR and should be joined,
> e.g. RIK+ENAS = RIKENAS and Len+NAR+T = Lennart. These are
> probably words that weren't in the OCR software's dictionary.
> So "join words" is one operation I'd like to see.

Agreed. It should be easy to maintain the coordinates when merging words.

-- 
Edward.
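
To illustrate the "join words" operation discussed above, here is a minimal sketch of how coordinates might be kept when merging fragments. It is not the code behind the correction tool. It assumes each OCR word carries an axis-aligned box stored as (left, top, right, bottom) in pixels, and the names Box, Word and join_words are hypothetical. The merged word gets the concatenated text and the smallest box that covers every fragment.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Box:
    # Assumed layout: pixel coordinates with the origin at the top-left.
    left: int
    top: int
    right: int
    bottom: int


@dataclass(frozen=True)
class Word:
    text: str
    box: Box


def join_words(fragments: Iterable[Word]) -> Word:
    """Merge OCR fragments into one word (hypothetical helper).

    Fragments are ordered left to right, their text is concatenated,
    and the result gets the union of their bounding boxes.
    """
    ordered = sorted(fragments, key=lambda w: w.box.left)
    if not ordered:
        raise ValueError("need at least one fragment to join")
    return Word(
        text="".join(w.text for w in ordered),
        box=Box(
            left=min(w.box.left for w in ordered),
            top=min(w.box.top for w in ordered),
            right=max(w.box.right for w in ordered),
            bottom=max(w.box.bottom for w in ordered),
        ),
    )


# Made-up coordinates for the Len+NAR+T example from the thread.
parts = [
    Word("Len", Box(10, 20, 40, 42)),
    Word("NAR", Box(42, 21, 78, 41)),
    Word("T", Box(80, 20, 95, 42)),
]
joined = join_words(parts)
```

Joining only concatenates the text as the OCR read it, so the fragments above become "LenNART"; fixing the case to "Lennart" would still be a normal text correction by the proofreader. Sorting by the left edge assumes the fragments sit on one line of left-to-right text.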
