On 2016-02-06 20:32, Dave Wade wrote:


-----Original Message-----
From: Simh [mailto:simh-boun...@trailing-edge.com] On Behalf Of Paul Koning
Sent: 06 February 2016 19:01
To: Timothe Litt <l...@ieee.org>
Cc: simh@trailing-edge.com
Subject: Re: [Simh] OSs with accessible documentation


On Feb 5, 2016, at 6:10 PM, Timothe Litt <l...@ieee.org> wrote:

Some of the PDFs on bitsavers are searchable.  It would be a good
project to OCR the rest into searchable PDFs, since that also means
the text can be extracted.  OCR is getting good enough (finally) that
it's feasible.  I'm sure they'd be accepted back into bitsavers;
searchable is good for everyone.

Some disapprove of OCR for reasons I don't really understand.

It depends on how you build the PDF. If you replace the images with the OCR'd
text, which seems to be the default, then you introduce errors.
If you leave the images in place and put the text behind them, I can't
see what the problem is.

For me personally, I would like to have two copies of the documentation. One would be pure plain text, with no preservation of the scan. Images in the documentation need to be preserved, but nothing else. The full scanned originals can then go in a separate file for those who actually want them.

The reason is that working with a 50 MB PDF file is horrible. PDFs do not cope well with huge amounts of data on each page. They get slow, eat resources, and become almost unusable as reading material.

I want manuals to use them, not to just "preserve" them.

        Johnny

--
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: b...@softjar.se             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol
_______________________________________________
Simh mailing list
Simh@trailing-edge.com
http://mailman.trailing-edge.com/mailman/listinfo/simh
