Re: [Wikisource-l] Djvu format fate

2018-04-06 Thread Alex Brollo
There's an interesting suggestion on the web about the difference between djvu
and pdf. The question was: how can I get hOCR from the hidden text layer of a pdf
file? The reply: convert the pdf to djvu, then everything will be simple (more or
less). This comes from the fact that everything inside a djvu file is open and
"simply" accessible, while everything inside a pdf is difficult and obscure.
Djvu is wiki, pdf isn't. I don't know of any other open format that implements
a searchable hidden text layer underlying the page image.
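
To illustrate how accessible the djvu text layer is, here is a minimal Python
sketch. It assumes the DjVuLibre tool djvused is installed and that the pdf has
already been converted to djvu (for instance with a converter such as
pdf2djvu). The function names and the sample dump are made up for illustration.
The sketch only pulls word boxes out of the "print-txt" dump, and building real
hOCR from those boxes would be a further step.

```python
import re
import subprocess

# One (word xmin ymin xmax ymax "text") entry of the text-layer dump.
WORD_RE = re.compile(
    r'\(word\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"((?:[^"\\]|\\.)*)"\s*\)'
)


def dump_text_layer(djvu_path, page):
    """Return the raw text-layer dump of one page (needs djvused on PATH)."""
    result = subprocess.run(
        ["djvused", "-e", f"select {page}; print-txt", djvu_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def extract_words(dump):
    """Return (xmin, ymin, xmax, ymax, text) tuples for every word entry.

    Backslash escapes inside the quoted text are left undecoded.
    """
    return [
        (int(xmin), int(ymin), int(xmax), int(ymax), text)
        for xmin, ymin, xmax, ymax, text in WORD_RE.findall(dump)
    ]


# Made-up sample in the shape of a text-layer dump, not taken from a real file.
SAMPLE_DUMP = """(page 0 0 2550 3300
 (line 300 3000 900 3060
  (word 300 3000 520 3060 "Djvu")
  (word 540 3000 620 3060 "is")
  (word 640 3000 900 3060 "wiki")))"""

if __name__ == "__main__":
    for word in extract_words(SAMPLE_DUMP):
        print(word)
```

With a real file, the dump would come from dump_text_layer("book.djvu", 1)
instead of the sample string ("book.djvu" is a placeholder name).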

But as a first step, the incredible opportunities of djvu should be *actively
explored and used*! If you use a car simply as a hen-house, never driving
it, then in your opinion any standard hen-house is just as good, or
better.

Alex





2018-04-06 15:45 GMT+02:00 Federico Leva (Nemo) :

> Peter Meyer, 06/04/2018 14:59:
>
>> Could we distill these issues online on a wiki page somewhere?   Or is it
>> already done?
>> (1) what are the significant differences between pdf and djvu (or some
>> new version of djvu that we could imagine coming up with)
>>
>
> I agree this is important to outline. For instance, is there some
> Wikisource where PDF files are actively discouraged in favour of DjVu, and
> for what reasons?
>
> Which DjVu features do we dream of using within 5 years, which PDF doesn't
> provide? Do we want a system where libraries can feed us with DjVu files,
> the proofread text gets ingested back to the DjVu file and libraries can
> reuse it? Do we want to use some of the low level features of the text
> layer to widely deploy some dark magic, such as the captcha-based
> proofreading we talked about many times or some other interaction between
> MediaWiki and the scans? What "market" is there for such features?
>
> DjVu became our favourite format back at the time when the upload size
> limit was around 10 MiB, if I remember correctly, and compression was the
> most important factor. I often find myself explaining why it's such a
> useful format, but in the end if someone asks me "so, is it fine to just
> upload a PDF at Wikisource?" I have a hard time giving an answer other than
> "sure, don't worry, it will be the same".
>
> Federico
___
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l


Re: [Wikisource-l] Djvu format fate

2018-04-06 Thread Federico Leva (Nemo)

Peter Meyer, 06/04/2018 14:59:
> Could we distill these issues online on a wiki page somewhere?   Or is
> it already done?
> (1) what are the significant differences between pdf and djvu (or some
> new version of djvu that we could imagine coming up with)


I agree this is important to outline. For instance, is there some 
Wikisource where PDF files are actively discouraged in favour of DjVu, 
and for what reasons?


Which DjVu features do we dream of using within 5 years, which PDF doesn't 
provide? Do we want a system where libraries can feed us with DjVu 
files, the proofread text gets ingested back to the DjVu file and 
libraries can reuse it? Do we want to use some of the low level features 
of the text layer to widely deploy some dark magic, such as the 
captcha-based proofreading we talked about many times or some other 
interaction between MediaWiki and the scans? What "market" is there for 
such features?


DjVu became our favourite format back at the time when the upload size 
limit was around 10 MiB, if I remember correctly, and compression was 
the most important factor. I often find myself explaining why it's such 
a useful format, but in the end if someone asks me "so, is it fine to 
just upload a PDF at Wikisource?" I have a hard time giving an answer 
other than "sure, don't worry, it will be the same".


Federico



Re: [Wikisource-l] Djvu format fate

2018-04-06 Thread Aleksey Chalabyan
What do you think about approaching ABBYY and explaining that this is
really important to us? Simply keeping a feature is usually not a big issue.
Can we ask the Foundation to approach them?

2018-04-06 15:59 GMT+04:00 Peter Meyer :

> I have heard about this issue many times, and it always seemed important,
> but also seems too hard to understand and act on.  We could at least
> discuss possibilities online and at upcoming conferences.
>
> Could we distill these issues online on a wiki page somewhere?   Or is it
> already done?
> (1) what are the significant differences between pdf and djvu (or some new
> version of djvu that we could imagine coming up with)
> (2) what partner organizations or developers would want to help or work on
> it?  Is Internet Archive, for example, able and willing to work together on
> it, if we said specifically that this interests us?   Can they help at
> least describe the difficulty of the things we’re talking about?
> Wikisource user group, some chapters perhaps.   It is hard for a network of
> nonprofits to take the lead here, maybe, compared to a focused company .  .
> . but maybe possible.
>
> On Apr 6, 2018, at 10:06 AM, Asaf Bartov  wrote:
>
> Perhaps.  But the case would need to be made (in more detail, and more
> compellingly) to the people who could decide that, and none of them are on
> this list.  You'd probably have to convince managers of technical teams at
> WMF if there is to be a chance of prioritizing it.
>
>A.
>
> On Fri, Apr 6, 2018 at 1:33 AM Alex Brollo  wrote:
>
>> As you know, the djvu format is an excellent and open format, but its fate is
>> uncertain since it is overwhelmed by pdf, surely excellent, but closed and
>> very difficult to manage.
>>
>> Even if only a small part of djvu's features is used by mediawiki, djvu is a
>> necessary tool for wikisource work.
>>
>> Unfortunately, too little work is being done on djvu, and I see that ABBYY
>> recently discontinued support for djvu as an output format in its OCR engines.
>> This is probably why Internet Archive stopped producing djvu files.
>>
>> Is it possible to encourage MediaWiki to devote sufficient energy to
>> saving the djvu format from its fate?
>>
>> Alex brollo


Re: [Wikisource-l] Djvu format fate

2018-04-06 Thread Peter Meyer
I have heard about this issue many times, and it always seemed important, but 
also seems too hard to understand and act on.  We could at least discuss 
possibilities online and at upcoming conferences.

Could we distill these issues online on a wiki page somewhere?   Or is it 
already done?
(1) what are the significant differences between pdf and djvu (or some new 
version of djvu that we could imagine coming up with)
(2) what partner organizations or developers would want to help or work on it?  
Is Internet Archive, for example, able and willing to work together on it, if 
we said specifically that this interests us?   Can they help at least describe 
the difficulty of the things we’re talking about?   Wikisource user group, some 
chapters perhaps.   It is hard for a network of nonprofits to take the lead 
here, maybe, compared to a focused company .  . . but maybe possible.

> On Apr 6, 2018, at 10:06 AM, Asaf Bartov  wrote:
> 
> Perhaps.  But the case would need to be made (in more detail, and more 
> compellingly) to the people who could decide that, and none of them are on 
> this list.  You'd probably have to convince managers of technical teams at 
> WMF if there is to be a chance of prioritizing it.
> 
>A.
> 
> On Fri, Apr 6, 2018 at 1:33 AM Alex Brollo wrote:
> As you know, the djvu format is an excellent and open format, but its fate is
> uncertain since it is overwhelmed by pdf, surely excellent, but closed and
> very difficult to manage.
>
> Even if only a small part of djvu's features is used by mediawiki, djvu is a
> necessary tool for wikisource work.
>
> Unfortunately, too little work is being done on djvu, and I see that ABBYY
> recently discontinued support for djvu as an output format in its OCR engines.
> This is probably why Internet Archive stopped producing djvu files.
>
> Is it possible to encourage MediaWiki to devote sufficient energy to saving
> the djvu format from its fate?
> 
> Alex brollo


Re: [Wikisource-l] Djvu format fate

2018-04-06 Thread Asaf Bartov
Perhaps.  But the case would need to be made (in more detail, and more
compellingly) to the people who could decide that, and none of them are on
this list.  You'd probably have to convince managers of technical teams at
WMF if there is to be a chance of prioritizing it.

   A.

On Fri, Apr 6, 2018 at 1:33 AM Alex Brollo  wrote:

> As you know, the djvu format is an excellent and open format, but its fate is
> uncertain since it is overwhelmed by pdf, surely excellent, but closed and
> very difficult to manage.
>
> Even if only a small part of djvu's features is used by mediawiki, djvu is a
> necessary tool for wikisource work.
>
> Unfortunately, too little work is being done on djvu, and I see that ABBYY
> recently discontinued support for djvu as an output format in its OCR engines.
> This is probably why Internet Archive stopped producing djvu files.
>
> Is it possible to encourage MediaWiki to devote sufficient energy to
> saving the djvu format from its fate?
>
> Alex brollo