@Anika: glad to hear that you like "Visualizzatore" and that you discovered the search function. That is perhaps the most useful trick, together with the OCR preview for "red" pages, which lets you refine a shared, book-specific regex set.
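Here is a rough illustration of what applying a book-specific regex set can look like. It is a minimal Python sketch, assuming the set is an ordered list of (pattern, replacement) pairs applied in sequence to a page's OCR text. The rules and the name `apply_regex_set` are hypothetical examples, not the set actually used on it.wikisource.

```python
import re

# Hypothetical book-specific regex set: ordered (pattern, replacement) pairs.
# These entries are illustrative only; a real set grows as OCR errors are spotted.
BOOK_REGEXES = [
    (r"(\w)-\n(\w)", r"\1\2"),  # rejoin words hyphenated across line breaks
    (r"[ \t]+$", ""),           # drop trailing blanks at line ends
    (r" {2,}", " "),            # collapse runs of spaces
]


def apply_regex_set(text, rules=BOOK_REGEXES):
    """Apply each (pattern, replacement) rule in order and return the result."""
    for pattern, replacement in rules:
        # MULTILINE so that "$" matches at the end of every line, not just the text
        text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
    return text
```

Order matters: each rule sees the output of the previous one, so a shared set can be refined by appending or reordering rules after looking at the OCR preview.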
Alex

2017-10-16 20:09 GMT+02:00 Anika Born <[email protected]>:

> as aubrey: Thank you very much!
>
> I shared this news at the Scriptorium of de.ws.
>
> I also used the opportunity to tell them about your "Visualizzatore".
> This is so cool!!!! (especially the search function)
>
> And because I had some time (and the best things come in threes) I invited
> them to your it.WikiCon in Trento
> (https://meta.wikimedia.org/wiki/ItWikiCon/2017/Proposte#Wikisource). Have fun there! My best wishes
> to the organizers. I co-organized it three times in a row for the
> all-German community....
>
> https://de.wikisource.org/wiki/Wikisource:Skriptorium#Italien:_17._bis_19._November_WikiCon_in_Trient
>
> Anika
>
> 2017-10-16 19:35 GMT+02:00 Andrea Zanni <[email protected]>:
>
>> Thanks Alex!
>> I really hope this is a direction other developers will follow:
>> being able to harness the full potential of structured data from OCR
>> software is absolutely crucial for Wikisource.
>> We could actually automate *a lot* of the formatting work now done by
>> volunteers, and their time could still be spent formatting, proofreading
>> and validating, but with much more power than before.
>> IMO, it changes a lot if a book is ~50% formatted by a machine: we could
>> do many more books in less time.
>> Go Alex!
>>
>> Aubrey
>>
>> On Mon, Oct 16, 2017 at 5:42 PM, Asaf Bartov <[email protected]>
>> wrote:
>>
>>> That's really promising!
>>>
>>> Thank you for sharing this.
>>>
>>> A.
>>>
>>> On Oct 17, 2017 00:11, "Alex Brollo" <[email protected]> wrote:
>>>
>>>> Here:
>>>> Pagina:D'Ayala_-_Dizionario_militare_francese_italiano.djvu/46
>>>> <https://it.wikisource.org/wiki/Pagina:D%27Ayala_-_Dizionario_militare_francese_italiano.djvu/46>
>>>> and on the immediately preceding and following pages, both the text and some
>>>> formatting come from the Internet Archive file bub_gb_lvzoCyRdzsoC_abbyy.gz
>>>> <https://archive.org/download/bub_gb_lvzoCyRdzsoC/bub_gb_lvzoCyRdzsoC_abbyy.gz>
>>>> (on the preceding pages, only some templates have been added and a little
>>>> regex manipulation has been done).
>>>>
>>>> Internet Archive _abbyy.gz files are gzipped, enormous XML files into which
>>>> every detail of the FineReader OCR output is exported. Even though they are
>>>> enormous and terribly complex, they can be parsed and any detail can be used
>>>> (a little painfully...). So far, only bold, italic, small caps and
>>>> paragraphs have been explored, translated into wiki code by fairly
>>>> simple Python code.
>>>>
>>>> Alex
>>>>
>>>> _______________________________________________
>>>> Wikisource-l mailing list
>>>> [email protected]
>>>> https://lists.wikimedia.org/mailman/listinfo/wikisource-l
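The kind of conversion Alex describes (bold, italic, small caps and paragraphs from an _abbyy.gz file into wiki code) could be sketched as follows. This is a minimal Python sketch, not Alex's actual code. It assumes a FineReader-style nesting of page > block > text > par > line > formatting > charParams, one character per `charParams`, and boolean `bold`/`italic`/`smallcaps` attributes on `formatting`. All function names and the `{{Sc|...}}` small-caps template are hypothetical choices for illustration.

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Assumed on/off values for flag attributes (XML booleans may be "1" or "true").
TRUE_VALUES = {"1", "true"}


def local(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def is_on(elem, attr):
    """True if a boolean attribute such as bold="true" is set on elem."""
    return elem.get(attr, "").lower() in TRUE_VALUES


def formatting_to_wiki(fmt):
    """Turn one <formatting> run into wikitext, one character per <charParams>."""
    text = "".join(c.text or "" for c in fmt if local(c.tag) == "charParams")
    if not text.strip():
        return text  # whitespace-only runs need no markup
    if is_on(fmt, "smallcaps"):
        text = "{{Sc|" + text + "}}"  # hypothetical small-caps template
    if is_on(fmt, "italic"):
        text = "''" + text + "''"
    if is_on(fmt, "bold"):
        text = "'''" + text + "'''"
    return text


def pages_to_wiki(source):
    """Yield the wikitext of each <page>, streaming so huge files fit in memory."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue
        paragraphs = []
        for par in elem.iter():
            if local(par.tag) != "par":
                continue
            lines = [
                "".join(formatting_to_wiki(f) for f in line if local(f.tag) == "formatting")
                for line in par
                if local(line.tag) == "line"
            ]
            paragraphs.append("\n".join(lines))
        # Paragraphs become blank-line-separated wiki paragraphs.
        yield "\n\n".join(p for p in paragraphs if p.strip())
        elem.clear()  # release the finished page's subtree


def wiki_pages_from_gz(path):
    """Stream an Internet Archive *_abbyy.gz file page by page."""
    with gzip.open(path, "rb") as f:
        yield from pages_to_wiki(f)


def wiki_pages_from_bytes(data):
    """Convenience wrapper for trying the converter on a small XML snippet."""
    return list(pages_to_wiki(io.BytesIO(data)))
```

For example, `for n, wikitext in enumerate(wiki_pages_from_gz("bub_gb_lvzoCyRdzsoC_abbyy.gz"), start=1): ...` would walk the file page by page, assuming the ABBYY page order matches the DjVu page order. Streaming with `iterparse` and clearing each finished page is what keeps memory bounded on such enormous files.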
