That's really promising! Thank you for sharing this.
A.

On Oct 17, 2017 00:11, "Alex Brollo" <[email protected]> wrote:

> Here:
> Pagina:D'Ayala_-_Dizionario_militare_francese_italiano.djvu/46
> <https://it.wikisource.org/wiki/Pagina:D%27Ayala_-_Dizionario_militare_francese_italiano.djvu/46>
> and on the immediately preceding and following pages, both the text and
> some formatting come from the Internet Archive file
> bub_gb_lvzoCyRdzsoC_abbyy.gz
> <https://archive.org/download/bub_gb_lvzoCyRdzsoC/bub_gb_lvzoCyRdzsoC_abbyy.gz>
> (in previous pages only some templates have been added and a little bit
> of regex manipulation has been done)
>
> Internet Archive _abbyy.gz files are gzipped, enormous XML files in which
> every detail of the FineReader OCR output is exported. Even though they
> are enormous and terribly complex, they can be parsed and any detail (a
> little bit painfully...) can be used; at present, only bold, italic,
> smallcaps and paragraphs have been explored and translated into wiki code
> by a pretty simple Python script.
>
> Alex
>
> _______________________________________________
> Wikisource-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikisource-l
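For anyone who wants to try the approach Alex describes, here is a minimal sketch of how such a conversion might look. It is not Alex's script, which is not shown in this thread. It assumes the ABBYY XML nests `par` > `line` > `formatting` > `charParams`, and that `formatting` carries `bold`, `italic` and `smallcaps` attributes set to `"true"`; please check this against a real file. The `{{Sc|...}}` template and the function names (`wrap`, `par_to_wiki`, `abbyy_to_wiki`, `convert_file`) are illustrative choices only.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Drop the XML namespace, e.g. '{ns}par' -> 'par'."""
    return tag.rsplit("}", 1)[-1]


def wrap(text, attrs):
    """Wrap one formatting run in wiki markup, keeping outer spaces outside."""
    core = text.strip()
    if not core:
        return text
    lead = text[: len(text) - len(text.lstrip())]
    trail = text[len(text.rstrip()):]
    if attrs.get("smallcaps") == "true":   # assumed attribute name
        core = "{{Sc|" + core + "}}"       # assumed wiki template
    if attrs.get("italic") == "true":
        core = "''" + core + "''"
    if attrs.get("bold") == "true":
        core = "'''" + core + "'''"
    return lead + core + trail


def par_to_wiki(par):
    """Convert one <par> element into a single line of wiki text."""
    lines = []
    for line in par:
        if local(line.tag) != "line":
            continue
        runs = []
        for fmt in line:
            if local(fmt.tag) != "formatting":
                continue
            # Each character is assumed to sit in its own <charParams>.
            text = "".join(c.text or "" for c in fmt
                           if local(c.tag) == "charParams")
            runs.append(wrap(text, fmt.attrib))
        lines.append("".join(runs))
    return " ".join(lines)


def abbyy_to_wiki(stream):
    """Stream through the XML so the whole file never sits in memory."""
    paragraphs = []
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "par":
            paragraphs.append(par_to_wiki(elem))
            elem.clear()  # free the paragraph once converted
        elif name == "page":
            elem.clear()
    return "\n\n".join(p for p in paragraphs if p.strip())


def convert_file(path):
    """Read a gzipped _abbyy.gz file directly, without unpacking it first."""
    with gzip.open(path, "rb") as stream:
        return abbyy_to_wiki(stream)


# Tiny hand-made sample in the assumed structure (not real OCR output).
SAMPLE = b"""<document><page><block><text><par><line>
<formatting bold="true"><charParams>A</charParams><charParams>b</charParams></formatting>
<formatting><charParams> </charParams><charParams>x</charParams></formatting>
</line></par></text></block></page></document>"""

print(abbyy_to_wiki(io.BytesIO(SAMPLE)))  # prints '''Ab''' x
```

Streaming with `iterparse` and clearing each paragraph after use seems a sensible choice given how large these files are. Calling `convert_file("bub_gb_lvzoCyRdzsoC_abbyy.gz")` on a downloaded copy should return one blank-line-separated block of wiki text per OCR paragraph, provided the structural assumptions above hold.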
