Hi everyone,
please let me revive this thread. There is an ongoing discussion on it.source about the new Internet Archive policy, because it is becoming a *quality problem* for the community.

You can see for yourself here:
this is a detail[1] from a pdf[2] taken from Archive;
this is the detail[3] from a djvu (handmade by the user Alex).
Please look at the pictures to understand the problem :-)
The compression of the IA pdf is unfortunately too high, and the OCR is not that good either.

We probably can't ask IA to change its mind and go back to making djvus, but there are other, more technical ways. I'd like this to be a problem we solve together, maybe directly in the magnificent "IA Upload" tool.
Wikisource prides itself on quality, so it's right to demand good scans.
What I fear is that bigger communities will have expert users who make their own djvus, while smaller ones will have to keep the PDFs uploaded from IA...

Do you have any solutions? Is your community worried about this?

Thanks

Aubrey

[1] https://it.wikisource.org/wiki/File:Tarchetti_pdf.png
[2] https://commons.wikimedia.org/w/index.php?title=File%3ATarchetti_-_Paolina.pdf&page=4
[3] https://it.wikisource.org/wiki/File:Tarchetti_pdf.png

On Mon, Apr 18, 2016 at 3:12 PM, Alex Brollo <[email protected]> wrote:

> Can someone "ping" Phe & Tpt into this talk?
>
> Alex
>
> 2016-04-18 10:51 GMT+02:00 Andrea Zanni <[email protected]>:
>
>> I think that the crucial issue here is: will the ia-upload tool run?
>> https://tools.wmflabs.org/ia-upload/commons/init
>>
>> Aubrey
>>
>> On Fri, Apr 15, 2016 at 8:29 PM, Alex Brollo <[email protected]> wrote:
>>
>>> Again, just to explain: pdf2djvu output of an IA pdf is a perfect djvu,
>>> with its regular OCR mapped layer, so nothing changes but the need of
>>> running a very simple command:
>>>
>>> pdf2djvu namefile.pdf -o namefile.djvu
>>>
>>> Alex
>>>
>>> 2016-04-15 10:01 GMT+02:00 Andrea Zanni <[email protected]>:
>>>
>>>> Yes, this is why I cited it: if we can manage to use it for Wikisource
>>>> importing, we could be safe :-)
>>>>
>>>> Aubrey
>>>>
>>>> On Fri, Apr 15, 2016 at 9:41 AM, Federico Leva (Nemo) <[email protected]> wrote:
>>>>
>>>>> Andrea Zanni, 15/04/2016 09:03:
>>>>>
>>>>>> I remember Alex Brollo was working with the djvu_xml layer
>>>>>
>>>>> The XML output from ABBYY is still being published, AFAIK.
>>>>>
>>>>> Nemo
_______________________________________________ Wikisource-l mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/wikisource-l
