I like DjVu file structure and the DjVuLibre routines, and I have studied
them as deeply as I can. For Wikisource's needs I appreciate PDF files too,
but I like them less because of their complexity. The proofreading procedure
is presently based on DjVu or PDF files, but I think another approach could
be used, one that relies only on simpler routines.

The proofreading procedure needs two inputs:
1. a set of good images of the page scans;
2. a good mapped file of the text content, matched with the images.

As for "mapped text", there are two alternatives, hOCR and XML. Both can be
used to extract "unmapped raw text" when needed, at server level but also
locally with jQuery. If the hOCR/XML of the page text could be accessed
quickly and simply from nsPage, I see interesting opportunities, e.g.:
generalized highlighting of selected text on the nsPage image, in both view
and edit mode; formatting suggestions from heuristic analysis of word
coordinates; and a different organization of high-level text structures,
such as correcting a wrong column layout.
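To make the idea concrete, here is a minimal sketch (in Python rather than
jQuery, purely for illustration) of the two uses mentioned above: extracting
unmapped raw text from hOCR, and finding the word boxes to highlight for a
selected phrase. It assumes Tesseract-style hOCR, where lines carry the class
"ocr_line", words carry the class "ocrx_word", and each word's coordinates
sit in its title attribute as "bbox x0 y0 x1 y1". All function names are
hypothetical; nothing here is an existing Wikisource or ProofreadPage API.

```python
from html.parser import HTMLParser


def parse_bbox(title):
    """Read 'bbox x0 y0 x1 y1' from an hOCR title attribute (None if absent)."""
    for part in title.split(";"):
        fields = part.strip().split()
        if len(fields) == 5 and fields[0] == "bbox":
            return tuple(int(v) for v in fields[1:])
    return None


class HocrWordParser(HTMLParser):
    """Collect (line_number, text, bbox) for every hOCR 'ocrx_word' element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._line = 0         # incremented at each 'ocr_line' element
        self._in_word = False  # True while reading inside an 'ocrx_word'
        self._depth = 0        # nested markup inside a word, e.g. <strong>
        self._bbox = None
        self._chars = []

    def handle_starttag(self, tag, attrs):
        if self._in_word:
            self._depth += 1
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "ocr_line" in classes:
            self._line += 1
        elif "ocrx_word" in classes:
            self._in_word, self._depth, self._chars = True, 0, []
            self._bbox = parse_bbox(attrs.get("title") or "")

    def handle_endtag(self, tag):
        if not self._in_word:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        text = "".join(self._chars).strip()
        self.words.append((self._line, text, self._bbox))
        self._in_word = False

    def handle_data(self, data):
        if self._in_word:
            self._chars.append(data)


def parse_hocr(markup):
    parser = HocrWordParser()
    parser.feed(markup)
    parser.close()
    return parser.words


def raw_text(words):
    """Unmapped raw text: words joined by spaces, one output line per hOCR line."""
    lines = {}
    for line, text, _ in words:
        lines.setdefault(line, []).append(text)
    return "\n".join(" ".join(lines[k]) for k in sorted(lines))


def find_phrase_boxes(words, phrase):
    """Bboxes of the first run of consecutive words equal to the phrase."""
    target = phrase.split()
    texts = [text for _, text, _ in words]
    for i in range(len(texts) - len(target) + 1):
        if target and texts[i:i + len(target)] == target:
            return [bbox for _, _, bbox in words[i:i + len(target)]]
    return []


def scale_box(bbox, factor):
    """Map scan coordinates to the size at which the image is displayed."""
    return tuple(round(v * factor) for v in bbox)


SAMPLE = """<div class='ocr_page' title='bbox 0 0 1000 1400'>
<span class='ocr_line' title='bbox 100 100 600 140'>
<span class='ocrx_word' title='bbox 100 100 220 140; x_wconf 93'>Proofread</span>
<span class='ocrx_word' title='bbox 240 100 360 140; x_wconf 95'>this</span>
</span>
<span class='ocr_line' title='bbox 100 160 600 200'>
<span class='ocrx_word' title='bbox 100 160 200 200'><strong>page</strong></span>
</span>
</div>"""

if __name__ == "__main__":
    words = parse_hocr(SAMPLE)
    print(raw_text(words))
    print([scale_box(b, 0.5) for b in find_phrase_boxes(words, "this page")])
```

On the sample page this prints "Proofread this" and "page" on two lines,
then the two boxes of "this page" scaled to half size, which is what a
highlighting overlay on a reduced nsPage image would need. The same word
coordinates are the raw material for the heuristic formatting suggestions
(e.g. indentation or column detection), though that part is not sketched here.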

Alex brollo (it.wikisource)
Wikisource-l mailing list
