Perhaps there's a misunderstanding: I mentioned abbyy.xml, but I have no plan to import it as it is. abbyy.xml is simply a surprisingly rich data container from which we can extract anything useful to speed up proofreading (and formatting); nothing more than this.
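As a minimal sketch of what "extracting" could mean in practice, the Python below pulls every OCR line and its bounding box out of an abbyy.xml file. It assumes ABBYY-style markup in which each `<line>` element carries `l`, `t`, `r`, `b` pixel coordinates and its characters sit in descendant `<charParams>` elements. The `LineBox` type and the `read_lines` and `local_name` helpers are hypothetical names, not part of any existing Wikisource tool. Tags are matched by local name, so whatever XML namespace the file declares does not matter.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class LineBox:
    """Bounding box and text of one OCR line (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int
    text: str


def local_name(tag: str) -> str:
    # Strip a namespace such as "{http://...}line" down to "line".
    return tag.rsplit("}", 1)[-1]


def read_lines(xml_text: str) -> list[LineBox]:
    """Collect every <line> element with its l/t/r/b attributes.

    Assumption: ABBYY-style markup where <line> holds the coordinates
    and the recognized characters are in descendant <charParams> elements.
    """
    root = ET.fromstring(xml_text)
    boxes = []
    for elem in root.iter():
        if local_name(elem.tag) != "line":
            continue
        # Join the characters of this line in document order.
        chars = [c.text or "" for c in elem.iter()
                 if local_name(c.tag) == "charParams"]
        boxes.append(LineBox(
            left=int(elem.get("l")),
            top=int(elem.get("t")),
            right=int(elem.get("r")),
            bottom=int(elem.get("b")),
            text="".join(chars),
        ))
    return boxes
```

The result is a flat list of plain-text lines plus their boxes, which is the kind of raw material the heuristics below (font size, centering, columns, margins) would start from; the coordinates never need to reach the wiki text itself.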
Just an example: the vertical DjVu coordinates of lines can be used to estimate font size; their horizontal coordinates can be used to recognize centered text; paragraph splitting is valuable; columns can be recognized, and so can margins; with some effort, poems could probably be detected too. Far from simply importing coordinates, the point is to make the best use of them; no data, no information.

Alex

2013/7/17 Lars Aronsson <[email protected]>

> On 07/17/2013 12:57 PM, Alex Brollo wrote:
>
>> FineReader OCR stores an incredibly detailed information in [...]
>> abbyy.xml
>>
>
> In the other end, Wikisource is a wiki that edits wiki text.
> Sure, you could insert the XML there and let users
> edit the XML, but that would scare more users away
> and allow for more mistakes.
>
> For example, if proofreading Hamlet,
>
>    To be or not to bc, that is the question,
>
> anybody can easily spot "bc" and correct that.
> In the XML version,
>
>    <word x=1 y=1>To</word>
>    <word x=5 y=1>be</word>
>    <word x=8 y=1>or</word>
>
> someone might think that "or" should be a little more
> to the right, so one user inserts a space between the
> tag "<word x=8 y=1>" and "or", while another user
> adjusts the tag to "<word x=9 y=1>". All the tags
> make it harder to spot the OCR error "bc".
>
> Even if you replace manual XML editing with some
> graphic tool, you get the same ambiguity between
> adding whitespace and adjusting coordinates.
>
> This is a nightmare that we avoid by throwing away
> all the coordinates and just proofreading the plain text.
> It is not the perfect system, it's a compromise, in
> order to get some useful work done.
>
> --
> Lars Aronsson ([email protected])
> Project Runeberg - free Nordic literature - http://runeberg.org/
>
> _______________________________________________
> Wikisource-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikisource-l
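The first two heuristics Alex lists (font size from the vertical coordinates of lines, centered text from the horizontal ones) could be sketched roughly as below, without any coordinates ever being shown to proofreaders. This is an illustrative sketch, not an existing tool: the `Line` type, the 1.3 height ratio for flagging a possible heading, and the 20-pixel tolerance are all hypothetical values that would need tuning on real scans. Line heights use `abs()` because the DjVu text layer measures y from the bottom of the page, while image-style coordinates measure it from the top.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Line:
    """Bounding box of one OCR line in page pixels (hypothetical structure)."""
    left: int
    top: int
    right: int
    bottom: int


def relative_heights(lines: list[Line]) -> list[float]:
    """Each line's height divided by the page's median line height.

    abs() keeps this valid whether y grows downward or upward.
    """
    heights = [abs(ln.bottom - ln.top) for ln in lines]
    typical = median(heights)
    return [h / typical for h in heights]


def is_centered(line: Line, text_left: int, text_right: int,
                tolerance: int = 20) -> bool:
    """Roughly equal gaps on both sides, and not a full-width line."""
    left_gap = line.left - text_left
    right_gap = text_right - line.right
    return left_gap > tolerance and abs(left_gap - right_gap) <= tolerance


def describe(lines: list[Line],
             heading_ratio: float = 1.3) -> list[tuple[str, bool]]:
    """Label each line 'heading?' or 'body' and say whether it looks centered."""
    # The text block's margins are estimated from the outermost line edges.
    text_left = min(ln.left for ln in lines)
    text_right = max(ln.right for ln in lines)
    return [
        ("heading?" if ratio >= heading_ratio else "body",
         is_centered(ln, text_left, text_right))
        for ln, ratio in zip(lines, relative_heights(lines))
    ]
```

For example, on a page whose first line is taller than the others and has equal gaps on both sides, `describe` flags it as a probable centered heading, while ordinary full-width lines come out as uncentered body text. A proofreading tool could then offer the matching wiki formatting as a suggestion, leaving the plain text as the thing users actually edit.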
