Simply, from a practical point of view, my suggestion is: don't try to get a good DjVu from the IA PDF; use the _jp2.zip images instead (after conversion to JPG the images are very good), and the result will be much better - almost as good as the images in the IA viewer, which uses those same images.
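For what it's worth, a minimal sketch of that workflow in Python. To stay self-contained it only *plans* one ImageMagick `convert` call per page of the _jp2.zip (rather than running them); if ImageMagick is installed, the commands can be executed with subprocess. The file names are placeholders, not any real item's.

```python
import zipfile
from pathlib import Path

def plan_jp2_conversion(zip_path, out_dir):
    """Return one ImageMagick `convert` command per .jp2 page in the archive."""
    commands = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in sorted(zf.namelist()):
            if name.lower().endswith(".jp2"):
                # One JPG per page, named after the JP2 member.
                jpg = str(Path(out_dir) / (Path(name).stem + ".jpg"))
                commands.append(["convert", name, jpg])
    return commands
```

The resulting JPGs can then be fed to the usual DjVu tools (e.g. c44/djvm) without going through the over-compressed PDF at all.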
Alex

2016-05-13 10:06 GMT+02:00 Federico Leva (Nemo) <[email protected]>:

> Alex Brollo, 13/05/2016 09:02:
>
>> I presume that this complex structure is somewhat similar of djvu
>> background/foreground segmentation into djvu files, and artifacts are
>> similar.
>
> Sure.
>
>> So, pdf images are not only "compressed", but deeply processed and
>> segmented images.
>
> ...which is what I call "compression". I still recommend to try and
> increase the fixed-ppi parameter in such a case of excessive compression.
>
> I also still need an answer to https://it.wikisource.org/?diff=1733473
>
>> Is something of this complex IA image processing path documented
>> anywhere?
>
> What do you mean? Are you asking about details of their derivation plan
> for books? What we know has been summarised over time at
> https://en.wikisource.org/wiki/Help:DjVu_files#The_Internet_Archive , as
> always. As the help page IIRC states, the best way to understand what's
> going on is to check the item history and read the derive.php log, like
> https://catalogd.archive.org/log/487271468 which I linked.
>
> The main difference compared to the past is, I think, that they're no
> longer creating the luratech b/w PDF, probably because the "normal" PDF now
> manages to compress enough. They may have not realised that the single PDF
> they now produce is too compressed for illustrations and for cases where
> the original JP2 is too small.
>
> Nemo
>
> _______________________________________________
> Wikisource-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikisource-l
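As a small aid for the "check the item history and the derive log" step: a hedged sketch that, given an archive.org item identifier, builds the URLs one would check by hand. The metadata-endpoint and download-URL patterns are the standard archive.org ones; the derive.php log lives on catalogd under a per-item task number, so it cannot be guessed from the identifier and is not constructed here. The identifier used is a placeholder.

```python
def ia_item_urls(identifier):
    """Build the archive.org URLs worth checking for a scanned book item."""
    base = "https://archive.org"
    return {
        # JSON listing of all files and metadata for the item.
        "metadata": f"{base}/metadata/{identifier}",
        # The original JPEG 2000 page scans, as suggested above.
        "jp2_zip": f"{base}/download/{identifier}/{identifier}_jp2.zip",
        # Item history, where the derive tasks (and their logs) are listed.
        "history": f"{base}/history/{identifier}",
    }
```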
