Nemo, try an "autopsy" of the cited IA PDF with pdfimages (xpdf), which
extracts the raw images embedded in each PDF page. You'll find that each
page is exotically segmented into a full-color background, a strange image,
and an inverted version of a thresholded image (used as a mask, I presume).
Just by negating the last one you get a decent, light BW image of the page.
From it I was able to build a decent BW djvu file:
https://it.wikisource.org/wiki/File:Paolina.djvu , but it.source users
didn't like the idea:
https://it.wikisource.org/wiki/Wikisource:Bar#Pensiero_in_libert.C3.A0_sulle_immagini_delle_pagine
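
To make the "negate the mask" step concrete, here is a minimal sketch in
Python. It assumes the mask was saved by pdfimages as a raw PBM (P4) file
whose header has no comment lines; the function name invert_pbm is only
illustrative and is not part of any tool mentioned here.

```python
import re


def invert_pbm(data: bytes) -> bytes:
    """Return a raw PBM (P4) bitmap with every pixel negated.

    Assumption: the header is the plain "P4 <width> <height>" form,
    without '#' comment lines.
    """
    m = re.match(rb"P4\s+(\d+)\s+(\d+)\s", data)
    if m is None:
        raise ValueError("not a raw PBM (P4) file")
    width, height = int(m.group(1)), int(m.group(2))

    # Each row is padded to a whole number of bytes.
    row_bytes = (width + 7) // 8
    pixels = data[m.end():m.end() + row_bytes * height]
    if len(pixels) != row_bytes * height:
        raise ValueError("truncated pixel data")

    # Flip every bit, then clear the padding bits at the end of each row.
    out = bytearray(b ^ 0xFF for b in pixels)
    pad = row_bytes * 8 - width
    if pad:
        last_mask = (0xFF << pad) & 0xFF
        for row in range(height):
            out[row * row_bytes + row_bytes - 1] &= last_mask

    return data[:m.end()] + bytes(out)
```

The mask files themselves can be extracted with something like
"pdfimages file.pdf prefix"; the exact output names depend on the
pdfimages version, so check what it writes before feeding a file to the
function.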

I presume that this complex structure is somewhat similar to the
background/foreground segmentation used in djvu files, and that the
artifacts are similar too.

So, the pdf images are not merely "compressed": they are deeply processed
and segmented images.

Anyway: the IA image viewer doesn't use the pdf (or the djvu) at all; it
uses jpg images generated from the jp2 files. So, if you need a djvu that
matches, in its details, what you see in the IA viewer, you have to
download and process the jp2 images to build a decent djvu file.
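
As a rough sketch of that jp2-to-djvu path, the Python below only builds
the list of commands; it does not run them. It assumes the jp2 pages are
already downloaded and unpacked, that opj_decompress (OpenJPEG) and c44 and
djvm (DjVuLibre) are installed, and that 300 dpi is a suitable value; the
function name djvu_build_plan and the file names are hypothetical.

```python
from pathlib import Path


def djvu_build_plan(jp2_files, out_djvu, dpi=300):
    """Return the commands that would turn jp2 pages into one djvu file.

    Assumptions: one jp2 file per page, file names sort in page order,
    and dpi=300 is only a placeholder for the real scan resolution.
    """
    commands = []
    pages = []
    for jp2 in sorted(Path(p) for p in jp2_files):
        ppm = jp2.with_suffix(".ppm")
        page = jp2.with_suffix(".djvu")
        # Decode the JPEG 2000 page to an uncompressed PNM image.
        commands.append(["opj_decompress", "-i", str(jp2), "-o", str(ppm)])
        # Encode the page as a single-page (IW44) djvu file.
        commands.append(["c44", "-dpi", str(dpi), str(ppm), str(page)])
        pages.append(str(page))
    # Bundle the single-page files into one multi-page document.
    commands.append(["djvm", "-c", str(out_djvu), *pages])
    return commands
```

Each command could then be run with subprocess.run(cmd, check=True). For
pages that are pure text, a thresholded bitmap encoded with cjb2 instead of
c44 would give a much smaller file, at the cost of the details visible in
the IA viewer.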

Is any of this complex IA image processing path documented anywhere?
I reached my conclusions simply by trial and error, from a "necropsy" of
the IA files.

Alex

2016-05-12 20:10 GMT+02:00 Federico Leva (Nemo) <[email protected]>:

> Andrea Zanni, 12/05/2016 19:38:
>
>> [1] https://it.wikisource.org/wiki/File:Tarchetti_pdf.png
>> [2]
>>
>> https://commons.wikimedia.org/w/index.php?title=File%3ATarchetti_-_Paolina.pdf&page=4
>> [3] https://it.wikisource.org/wiki/File:Tarchetti_pdf.png
>>
>
> That was meant to be
> https://it.wikisource.org/wiki/File:Tarchetti_alex_djvu.png
>
> I don't think this has anything to do with DjVu or PDF, the problem is
> very clear just by looking at
> https://archive.org/download/digitami_LO10534041 : the JP2 conversion
> compressed the images 30 times, the PDF compression 5 more times.
>
> The first step in such cases, as documented in
> https://en.wikisource.org/wiki/Help:DjVu_files#The_Internet_Archive , is
> to add/increase the fixed-ppi field. I don't understand what was used in
> https://catalogd.archive.org/log/487271468
>
>
> Nemo
>
> _______________________________________________
> Wikisource-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikisource-l
>