I like, and have studied as deeply as I can, the DjVu file structure and the DjVuLibre routines; dealing with Wikisource needs, I appreciate PDF files too, but I like them less because of their complexity. The proofread procedure is currently based on DjVu or PDF files, but I see that another approach could be used, relying only on simpler routines.
The proofreading procedure needs two inputs: 1. a set of good images of the page scans; 2. a good mapped file of the text content, matched with the images. For the "mapped text" there are two alternatives, hOCR and XML; either can be used to extract "unmapped raw text" when needed, on the server but also locally in the browser with jQuery. If the hOCR/XML of the page text could be accessed quickly and simply from nsPage, I see interesting opportunities, e.g. generalized highlighting of selected text on the nsPage image in both view and edit mode; formatting suggestions from heuristic analysis of word coordinates; reorganization of high-level text structures, such as a wrong column layout.
Alex brollo (it.wikisource)
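To make the "mapped text" idea concrete, here is a minimal sketch of how raw text and word coordinates could be pulled out of an hOCR fragment. It is written in Python with the standard library only, rather than jQuery, so that it is self-contained. It assumes the usual hOCR convention of `ocrx_word` spans carrying a `bbox x0 y0 x1 y1` entry in their `title` attribute. The names `HocrWords`, `parse_hocr`, `raw_text`, `find_boxes` and the `SAMPLE` fragment are hypothetical, invented for this illustration, and are not part of any existing Wikisource or DjVuLibre routine.

```python
import re
from html.parser import HTMLParser

# hOCR stores coordinates in the title attribute, e.g. "bbox 100 200 180 230"
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWords(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every ocrx_word span.

    Assumption: word spans do not contain nested <span> elements.
    """

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def parse_hocr(hocr):
    """Return the list of (word, bbox) pairs found in an hOCR string."""
    parser = HocrWords()
    parser.feed(hocr)
    parser.close()
    return parser.words


def raw_text(words):
    """The "unmapped raw text": words joined by spaces, coordinates dropped."""
    return " ".join(text for text, _ in words)


def find_boxes(words, needle):
    """Boxes of every word equal to needle, e.g. to highlight it on the scan."""
    return [bbox for text, bbox in words if text == needle]


# Hypothetical one-line page, made up for this example.
SAMPLE = """
<div class='ocr_page' title='bbox 0 0 1000 1500'>
 <span class='ocr_line' title='bbox 100 200 420 230'>
  <span class='ocrx_word' title='bbox 100 200 180 230'>Proofread</span>
  <span class='ocrx_word' title='bbox 200 200 300 230'>this</span>
  <span class='ocrx_word' title='bbox 320 200 420 230'>page</span>
 </span>
</div>
"""

if __name__ == "__main__":
    words = parse_hocr(SAMPLE)
    print(raw_text(words))
    print(find_boxes(words, "this"))
```

The same word list serves both purposes mentioned above: dropping the boxes gives the raw text, while keeping them gives the coordinates needed to highlight a selected word on the page image or to run layout heuristics.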
_______________________________________________ Wikisource-l mailing list Wikisourceemail@example.com https://lists.wikimedia.org/mailman/listinfo/wikisource-l