That's really promising!

Thank you for sharing this.

   A.

On Oct 17, 2017 00:11, "Alex Brollo" <[email protected]> wrote:

> Here:
> Pagina:D'Ayala_-_Dizionario_militare_francese_italiano.djvu/46
> <https://it.wikisource.org/wiki/Pagina:D%27Ayala_-_Dizionario_militare_francese_italiano.djvu/46>
> and on the pages immediately before and after it, both the text and some
> formatting come from the Internet Archive file bub_gb_lvzoCyRdzsoC_abbyy.gz
> <https://archive.org/download/bub_gb_lvzoCyRdzsoC/bub_gb_lvzoCyRdzsoC_abbyy.gz>
> (in the previous pages only some templates have been added and a little bit
> of regex manipulation has been done).
>
> Internet Archive _abbyy.gz files are gzipped, enormous XML files to which
> every detail of the FineReader OCR output is exported. Even though they are
> enormous and terribly complex, they can be parsed, and any detail can be
> used (a little bit painfully...). So far only bold, italic, small caps and
> paragraphs have been explored and translated into wiki code by some pretty
> simple Python code.
>
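
For readers who want to try the idea, here is a minimal sketch of that kind of conversion. It is not Alex's script. The element names (par, line, formatting, charParams), the bold/italic/smallcaps attributes and their "1"/"true" values are assumptions about the ABBYY XML layout, the namespace in the sample is a placeholder, and the {{Sc|...}} small-caps template is likewise an assumption about the target wiki.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified sample. The namespace is a placeholder and
# the element/attribute names are assumptions about the ABBYY XML layout.
SAMPLE = (
    '<document xmlns="urn:example:abbyy">'
    '<page><block><text>'
    '<par><line>'
    '<formatting bold="1"><charParams>A</charParams><charParams>B</charParams></formatting>'
    '<formatting><charParams> </charParams><charParams>c</charParams></formatting>'
    '</line><line>'
    '<formatting italic="true"><charParams>d</charParams></formatting>'
    '</line></par>'
    '<par><line>'
    '<formatting smallcaps="1"><charParams>E</charParams></formatting>'
    '</line></par>'
    '</text></block></page></document>'
)


def local(tag):
    """Return the tag name without any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def is_on(element, name):
    """Treat '1' and 'true' as a set flag (assumed boolean encodings)."""
    return element.get(name, "").lower() in ("1", "true")


def run_to_wiki(fmt):
    """Convert one formatting run to wiki markup."""
    text = "".join(c.text or "" for c in fmt if local(c.tag) == "charParams")
    if not text.strip():
        return text  # do not wrap whitespace-only runs in markup
    if is_on(fmt, "smallcaps"):
        text = "{{Sc|" + text + "}}"  # assumed small-caps template name
    if is_on(fmt, "italic"):
        text = "''" + text + "''"
    if is_on(fmt, "bold"):
        text = "'''" + text + "'''"
    return text


def abbyy_to_wiki(root):
    """Join lines with spaces and separate paragraphs with a blank line."""
    paragraphs = []
    for par in root.iter():
        if local(par.tag) != "par":
            continue
        lines = []
        for line in par:
            if local(line.tag) != "line":
                continue
            runs = [run_to_wiki(f) for f in line if local(f.tag) == "formatting"]
            lines.append("".join(runs))
        paragraphs.append(" ".join(lines))
    return "\n\n".join(p for p in paragraphs if p)


print(abbyy_to_wiki(ET.fromstring(SAMPLE)))
```

For a real download, something like gzip.open(path, "rb") passed to ET.parse(...).getroot() should give the root element. Given how large these files are, a streaming parse with ET.iterparse is probably the safer choice.
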
> Alex
>
>
>
> _______________________________________________
> Wikisource-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikisource-l
>
>
