There's also this new Phab task, which is looking at a more limited
first step:

Investigation: Could we build a Tool Labs project to generate Djvu files
for WikiSource
https://phabricator.wikimedia.org/T154538

On Tue, 3 Jan 2017, at 07:46 AM, Alex Brollo wrote:
> You can see a great advantage of djvu files over pdf files in the
> current file list of any IA item. You can see that IA removed the djvu
> files, but it still builds and publishes the _djvu.xml file. Why? I
> presume that IA uses that file to "map words" in its book viewer,
> since it has a good text structure while being *pretty simple*. It can
> be translated into hOCR, and after editing its text nodes the edited
> text can be loaded back into the djvu file. Itsource is testing, on
> some texts, tricks to mass-fix the djvu text layer (removing scannos
> etc.) *before* uploading it to Commons.
>
> It's a pity IMHO that this magic book format has been disregarded. Its
> structure is *open* just as the pdf structure is *closed*.
>
> Alex
>
> 2017-01-03 0:19 GMT+01:00 Sam Wilson <[email protected]>:
>> I wonder if, rather than creating a new IA item, we should just
>> link the original IA item to the DjVu on Commons (via a review)? Or
>> is there a discoverability benefit to be had by having the DjVu
>> also on IA?
>>
>> On Tue, 3 Jan 2017, at 07:07 AM, Sam Wilson wrote:
>>> Good idea. I guess it's not ideal to end up with two items, but at
>>> least the 2nd will be updateable from our end.
>>>
>>> It looks like we can add HTML links to IA reviews too, which is
>>> nice: https://archive.org/details/spinoza_etica_paravia
>>>
>>> On Mon, 2 Jan 2017, at 11:52 PM, Alex Brollo wrote:
>>>> Done :-)
>>>>
>>>> Alex
>>>>
>>>> 2017-01-02 16:49 GMT+01:00 Alex Brollo <[email protected]>:
>>>>> Please take a look at
>>>>> https://archive.org/details/spinoza_etica_paravia_djvu, this is
>>>>> precisely a djvu-only item that I uploaded some days ago. I asked
>>>>> for permission to create "djvu-only items" on the IA forum and I
>>>>> got it; this is the first item I created. As you see there's some
>>>>> "implicit convention" too: the name of the item is the original
>>>>> one + a _djvu suffix (it has been derived from
>>>>> https://archive.org/details/spinoza_etica_paravia), and the
>>>>> metadata are the same, except for a standard warning "Derived
>>>>> from files into L'Etica[1]" in the description field.
>>>>>
>>>>> So far I did not do the last step, i.e. adding a "backlink" from
>>>>> the original item to the derived one.
>>>>>
>>>>> internetarchive.py makes it possible to automate the whole job
>>>>> (downloading the metadata of the source item, building the new
>>>>> item name, adding the warning to the description field, and
>>>>> uploading the new item).

Links:

  1. https://archive.org/details/spinoza_etica_paravia
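
For what it's worth, the automation Alex describes in the quoted message (fetch the source item's metadata, build the new item name, add the warning to the description, upload) might look something like the sketch below. This is only a rough illustration, not Alex's actual script: the helper names, the list of copied metadata fields, and the exact wording of the warning are my assumptions. The only library calls used are `get_item` and `upload` from the `internetarchive` Python package, and the upload step needs IA credentials to be configured already.

```python
"""Rough sketch: derive a "djvu-only" IA item from an existing item."""

DJVU_SUFFIX = "_djvu"

# Descriptive fields to carry over to the derived item. This whitelist is an
# assumption; fields that IA maintains itself (uploader, dates, ...) are skipped.
COPIED_FIELDS = ("title", "creator", "date", "language", "subject",
                 "publisher", "mediatype", "description")


def derived_identifier(source_id: str) -> str:
    """Name of the derived item: the original identifier plus a _djvu suffix."""
    return source_id + DJVU_SUFFIX


def derived_metadata(source_id: str, source_metadata: dict) -> dict:
    """Copy the descriptive metadata and prepend a 'Derived from' warning."""
    metadata = {k: source_metadata[k] for k in COPIED_FIELDS if k in source_metadata}

    title = source_metadata.get("title", source_id)
    # Hypothetical wording; the real items use their own standard text.
    warning = (
        f'Derived from files in <a href="https://archive.org/details/{source_id}">'
        f"{title}</a>."
    )

    original = metadata.get("description", "")
    if isinstance(original, list):  # IA metadata values can be lists
        original = "\n".join(original)

    if original:
        metadata["description"] = warning + "\n\n" + original
    else:
        metadata["description"] = warning
    return metadata


def upload_derived_item(source_id: str, djvu_path: str):
    """Create the derived item on IA (needs network access and credentials)."""
    # Imported here so the helpers above can be used without the library.
    from internetarchive import get_item, upload

    source = get_item(source_id)
    return upload(
        derived_identifier(source_id),
        files=[djvu_path],
        metadata=derived_metadata(source_id, source.metadata),
    )
```

Calling `upload_derived_item("spinoza_etica_paravia", "spinoza_etica_paravia.djvu")` would then create the `spinoza_etica_paravia_djvu` item (the local file name here is made up). The backlink from the original item would still need to be added separately, e.g. via a review.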
_______________________________________________
Wikisource-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikisource-l
