On 2011-12-30 12:12, Lars Aronsson wrote:
> On 12/30/2011 08:51 PM, Edward Betts wrote:
>> On 2011-12-30 11:33, Lars Aronsson wrote:
>>> Sorry to focus on the bugs. This is worth more work. What if
>>> the OCR software made mistakes in segmentation, could the
>>> proofreader correct this by drawing text boxes manually?
>>
>> This is a good question. The OCR can identify images. I think it might
>> already have a box around the word, but have classified it as an image. I
>> should add this to the display somehow, with a button to switch it from
>> being an image to being a word. That way we already know the coordinates.
>
> You might look at Wikimapia for ideas for the user interface,
> http://www.wikimapia.org/#lat=37.7781096&lon=-122.5062275&z=13&l=0&m=b
>
> Wikimapia is just a Google map backdrop with user-editable
> polygons on top.
>
> There you can zoom in to details and zoom out, and "edit"
> the map to add or modify polygons, which are then tagged
> with names and attributes (= transcribed OCR text). With
> the zoom out, you could see all pages side by side, so the
> item overview (page viewer) and correction mode (leaf view)
> would be one and the same.
>
> These polygons could be individual words or entire text columns,
> or illustrations. If one polygon was "tagged as illustration" by
> the OCR software, a user could correct that. Or merely
> adjust the layout of the polygon.

I'm familiar with Wikimapia. Something like this could work.

> On this page,
> http://edwardbetts.com/correct/leaf/tillhstgenomry00lang/8
> I found some words were split by OCR and should be joined,
> e.g. RIK+ENAS = RIKENAS and Len+NAR+T = Lennart. These are
> probably words that weren't in the OCR software's dictionary.
> So "join words" is one operation I'd like to see.

Agreed. It should be easy to maintain the coordinates when merging words.
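
A rough sketch of what I mean, assuming each word is stored with a
left/top/right/bottom box in pixels. The WordBox class, the join_words
function and the coordinates are made up for illustration; this is not the
actual code behind the correction tool. It also assumes the fragments sit on
one line and read left to right.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordBox:
    """One OCR word with its bounding box (hypothetical structure)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def join_words(parts: List[WordBox]) -> WordBox:
    """Merge word fragments into one word covering all of their boxes."""
    if not parts:
        raise ValueError("need at least one word box to join")
    # Assumption: fragments are on the same line, so left-to-right order
    # is also reading order.
    ordered = sorted(parts, key=lambda b: b.left)
    return WordBox(
        text="".join(b.text for b in ordered),
        left=min(b.left for b in ordered),
        top=min(b.top for b in ordered),
        right=max(b.right for b in ordered),
        bottom=max(b.bottom for b in ordered),
    )


# Invented coordinates for the RIK+ENAS example.
rik = WordBox("RIK", 100, 50, 160, 80)
enas = WordBox("ENAS", 165, 52, 250, 81)
merged = join_words([rik, enas])
```

The merged box is just the smallest rectangle that contains all the
fragments, so no coordinates are lost. Joining only concatenates the text:
Len+NAR+T would come out as "LenNART", and the proofreader would still need
to fix the case.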

-- 
Edward.
