Re: Scanning docs for bitsavers

2019-12-04 Thread Christian Corti via cctalk
Dear Mister Noname, On Tue, 3 Dec 2019, it was written That's _LOSSY_ JBIG2. YOU DON'T HAVE TO USE LOSSY MODE! Don't shout!! And for the topic: you don't have to use JBIG2. Space isn't really an issue today for scanned bilevel documents, so you can just stick with TIFF G4 or PNG.

Re: Scanning docs for bitsavers

2019-12-03 Thread Antonio Carlini via cctalk
On 03/12/2019 20:22, Fred Cisin via cctalk wrote: Watch out.  PDF with OCR can show you a clear and crisp  [possibly wrong] interpretation of the scan, not what the actual scan looked like. The OCR may well say "0" where the printing says "8" but what your eyes will see will be the

Re: Scanning docs for bitsavers

2019-12-03 Thread Fred Cisin via cctalk
On Tue, 3 Dec 2019, Paul Koning via cctalk wrote: The trouble (for both of these) is that many of the users don't know the limitations and blindly use the wrong tools. "To the man who has a hammer, the whole world looks like a thumb." (which is an indictment about misuse, not an indictment of

Re: Scanning docs for bitsavers

2019-12-03 Thread Fred Cisin via cctalk
> JBIG2 .. introduces so many actual factual errors (typically > substituted letters and numbers) On Tue, 3 Dec 2019, Noel Chiappa via cctalk wrote: It's probably worth noting that there are often errors _in the original documents_, too - so even a perfect image doesn't guarantee no

Re: Scanning docs for bitsavers

2019-12-03 Thread Eric Smith via cctalk
On Tue, Dec 3, 2019 at 10:59 AM Paul Berger via cctalk < cctalk@classiccmp.org> wrote: > Is there any way to know what compression was used in a pdf file? > There's not necessarily only one. Every object in a PDF file can have its own selection of compression algorithm. I don't know of any
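
Since each stream object in a PDF names its own filter, one rough way to see which compression schemes a file uses is to tally the /Filter entries in its raw bytes. The following is only a minimal sketch, not a tool mentioned in the thread: the function name `count_filters` is made up for illustration, it matches text patterns rather than parsing the PDF properly, and it will miss dictionaries stored inside compressed object streams (and could in principle miscount if binary stream data happens to contain the same byte sequence). The filter names it would typically report for scanned pages are standard PDF names: CCITTFaxDecode (Group 3/4), JBIG2Decode, DCTDecode (JPEG), JPXDecode (JPEG 2000), and FlateDecode.

```python
import re
from collections import Counter

# Matches "/Filter /Name" as well as "/Filter [/Name1 /Name2]" in raw PDF bytes.
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
NAME_RE = re.compile(rb"/([A-Za-z0-9]+)")


def count_filters(pdf_bytes: bytes) -> Counter:
    """Tally the stream filter names that appear in uncompressed dictionaries.

    Hypothetical helper: a text-level scan, not a real PDF parser.
    """
    counts = Counter()
    for match in FILTER_RE.finditer(pdf_bytes):
        # A filter entry is either a single name or an array of names.
        for name in NAME_RE.findall(match.group(1)):
            counts[name.decode("ascii")] += 1
    return counts
```

To try it on a file, read the file in binary mode and pass the bytes in, for example `count_filters(open("manual.pdf", "rb").read())`, where `manual.pdf` is a placeholder name. A result dominated by CCITTFaxDecode would suggest G4 page images, while any JBIG2Decode entries would be the ones to look at more closely given the concerns raised in this thread.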

Re: Scanning docs for bitsavers

2019-12-03 Thread Paul Koning via cctalk
> On Dec 3, 2019, at 12:59 PM, Paul Berger via cctalk > wrote: > > ... > Would TIFF G4 still be preferable to JPEG2000? It would seem I can control > the compression used by selecting the pdf compatibility level. JPEG2000 apparently has a lossless mode (says Wikipedia). If so, it would be

Re: Scanning docs for bitsavers

2019-12-03 Thread Paul Berger via cctalk
On 2019-12-02 4:57 p.m., Eric Smith via cctalk wrote: On Tue, Nov 26, 2019 at 8:51 PM Jay Jaeger via cctalk wrote: When I corresponded with Al Kossow about format several years ago, he indicated that CCITT Group 4 lossless compression was their standard. There are newer bilevel encodings

Re: Scanning docs for bitsavers

2019-12-03 Thread Grant Taylor via cctalk
On 12/3/19 10:30 AM, Eric Smith via cctalk wrote: PDF was never _intended_ for documents that should undergo any further processing. Okay. Fair rebuttal. The few things that have been hacked onto it for interactive use are actually the worst thing about PDF. My opinion Okay. I don't

Re: Scanning docs for bitsavers

2019-12-03 Thread Paul Koning via cctalk
> On Dec 2, 2019, at 11:12 PM, Grant Taylor via cctalk > wrote: > > On 12/2/19 9:06 PM, Grant Taylor via cctalk wrote: >> In my opinion, PDFs are the last place that computer usable data goes. >> Because getting anything out of a PDF as a data source is next to impossible. >> Sure, you, a

Re: Scanning docs for bitsavers

2019-12-03 Thread Eric Smith via cctalk
On Mon, Dec 2, 2019 at 9:06 PM Grant Taylor via cctalk < cctalk@classiccmp.org> wrote: > My problem with PDFs starts where most people stop using them. > > Take the average PDF of text, try to copy and paste the text into a text > file. (That may work.) > Sure. Now try the same thing with a

Re: Scanning docs for bitsavers

2019-12-03 Thread Eric Smith via cctalk
On Tue, Dec 3, 2019 at 1:50 AM Christian Corti via cctalk < cctalk@classiccmp.org> wrote: > *NEVER* use JBIG2! I hope you know about the Xerox JBIG2 bug (e.g. making > That's _LOSSY_ JBIG2. YOU DON'T HAVE TO USE LOSSY MODE!

Re: Scanning docs for bitsavers

2019-12-03 Thread Eric Smith via cctalk
On Mon, Dec 2, 2019 at 7:08 PM Grant Taylor via cctalk < cctalk@classiccmp.org> wrote: > I *HATE* doing anything with PDFs other than reading them. PDF was never _intended_ for documents that should undergo any further processing. The few things that have been hacked onto it for interactive use

Re: Scanning docs for bitsavers

2019-12-03 Thread Eric Smith via cctalk
On Mon, Dec 2, 2019 at 5:34 PM Guy Dunphy via cctalk wrote: > Mentioning JBIG2 (or any of its predecessors) without noting that it is > completely unacceptable as a scanned document compression scheme, > demonstrates > a lack of awareness of the defects it introduces in encoded documents. >

Re: Scanning docs for bitsavers

2019-12-03 Thread Noel Chiappa via cctalk
> From: Guy Dunphy > JBIG2 .. introduces so many actual factual errors (typically > substituted letters and numbers) It's probably worth noting that there are often errors _in the original documents_, too - so even a perfect image doesn't guarantee no errors. The most recent one (of

Re: Scanning docs for bitsavers

2019-12-03 Thread Guy Dunphy via cctalk
At 01:20 AM 3/12/2019 -0200, you wrote: >I cannot understand your problems with PDF files. >I've created lots and lots of PDFs, with treated and untreated scanned >material. All of them are very readable and in use for years. Of course, >garbage in, garbage out. I take the utmost care in my scans

Re: Scanning docs for bitsavers

2019-12-03 Thread ED SHARPE via cctalk
Actually, we scan to PDF with back OCR, also text, also TIFF, also JPEG. With the slooowww HP 11x17 scan/fax/print thing I can scan the entire document, then save 1, save 2, save 3, save 4 without rescanning each time. Ed at SMECC In a message dated 12/3/2019 2:16:01 AM US Mountain Standard Time,

Re: Scanning docs for bitsavers

2019-12-03 Thread ED SHARPE via cctalk
Very nice file. Yep, we prefer PDF with OCR back stuff. Ed, smecc.org In a message dated 12/2/2019 8:20:36 PM US Mountain Standard Time, cctalk@classiccmp.org writes: I cannot understand your problems with PDF files. I've created lots and lots of PDFs, with treated and untreated scanned

Re: Scanning docs for bitsavers

2019-12-03 Thread Jan-Benedict Glaw via cctalk
Hi! On Tue, 2019-12-03 11:34:06 +1100, Guy Dunphy via cctalk wrote: > At 01:57 PM 2/12/2019 -0700, you wrote: > >On Tue, Nov 26, 2019 at 8:51 PM Jay Jaeger via cctalk > >wrote: > > > > > When I corresponded with Al Kossow about format several years ago, he > > > indicated that CCITT Group 4

Re: Scanning docs for bitsavers

2019-12-03 Thread Christian Corti via cctalk
On Mon, 2 Dec 2019, Eric Smith wrote: There are newer bilevel encodings that are somewhat more efficient than G4 (ITU-T T.6), such as JBIG (T.82) and JBIG2 (T.88), but they are not as widely supported, and AFAIK JBIG2 is still patent encumbered. As a result, *NEVER* use JBIG2! I hope you know

Re: Scanning docs for bitsavers

2019-12-02 Thread Grant Taylor via cctalk
On 12/2/19 9:06 PM, Grant Taylor via cctalk wrote: In my opinion, PDFs are the last place that computer usable data goes. Because getting anything out of a PDF as a data source is next to impossible. Sure, you, a human, can read it and consume the data. Try importing a simple table from a

Re: Scanning docs for bitsavers

2019-12-02 Thread Grant Taylor via cctalk
On 12/2/19 8:20 PM, Alexandre Souza via cctalk wrote: I cannot understand your problems with PDF files. My problem with PDFs starts where most people stop using them. Take the average PDF of text, try to copy and paste the text into a text file. (That may work.) Now try to edit a piece of

Re: Scanning docs for bitsavers

2019-12-02 Thread Alexandre Souza via cctalk
I cannot understand your problems with PDF files. I've created lots and lots of PDFs, with treated and untreated scanned material. All of them are very readable and in use for years. Of course, garbage in, garbage out. I take the utmost care in my scans to have good enough source files, so I can

Re: Scanning docs for bitsavers

2019-12-02 Thread Grant Taylor via cctalk
On 12/2/19 5:34 PM, Guy Dunphy via cctalk wrote: Interesting comments Guy. I'm completely naive when it comes to scanning things for preservation. Your comments do pass my naive understanding. But PDF literally cannot be used as a wrapper for the results, since it doesn't incorporate the

Re: Scanning docs for bitsavers

2019-12-02 Thread Guy Dunphy via cctalk
At 01:57 PM 2/12/2019 -0700, you wrote: >On Tue, Nov 26, 2019 at 8:51 PM Jay Jaeger via cctalk >wrote: > >> When I corresponded with Al Kossow about format several years ago, he >> indicated that CCITT Group 4 lossless compression was their standard. >> > >There are newer bilevel encodings that

Re: Scanning docs for bitsavers

2019-12-02 Thread Eric Smith via cctalk
On Tue, Nov 26, 2019 at 8:51 PM Jay Jaeger via cctalk wrote: > When I corresponded with Al Kossow about format several years ago, he > indicated that CCITT Group 4 lossless compression was their standard. > There are newer bilevel encodings that are somewhat more efficient than G4 (ITU-T T.6),

Re: Scanning docs for bitsavers

2019-11-27 Thread Alexandre Souza via cctalk
>My recommendation: use a proper multi-function copier (the big copiers) >that can also scan to network. I currently use our big Konica-Minolta I've got a Lexmark X646E full duplex printing/scanner. I'm still learning how to use it at its max, but I believe I'll scan TONS of documents I have

Re: Scanning docs for bitsavers

2019-11-27 Thread Jason T via cctalk
On Wed, Nov 27, 2019 at 2:01 PM Paul Koning wrote: > Another problem with bilevel scans is that, on some machines at least, they > can be very noisy. That's what I saw on the copier/scanner at the office. > For good scans I use gray scale scanning, with post-processing if desired to >

Re: Scanning docs for bitsavers

2019-11-27 Thread Christian Corti via cctalk
On Wed, 27 Nov 2019, Paul Koning wrote: On Nov 27, 2019, at 2:56 PM, Jason T via cctalk wrote: On Wed, Nov 27, 2019 at 10:12 AM Christian Corti via cctalk wrote: My recommendation: use a proper multi-function copier (the big copiers) that can also scan to network. I currently use our big

Re: Scanning docs for bitsavers

2019-11-27 Thread Jason T via cctalk
On Wed, Nov 27, 2019 at 10:12 AM Christian Corti via cctalk wrote: > My recommendation: use a proper multi-function copier (the big copiers) > that can also scan to network. I currently use our big Konica-Minolta > bizhub 754. Although it's a b/w copier, it can also scan in color. This These are

Re: Scanning docs for bitsavers

2019-11-27 Thread Christian Corti via cctalk
On Wed, 27 Nov 2019, mloe...@cpumagic.scol.pa.us wrote: On Wed, 27 Nov 2019, Noel Chiappa via cctalk wrote: That's what I use too; it has tons of useful features, including being able to drive my single-sided page-feed scanner and being able to number the even-sided pages correctly. The one I

Re: Scanning docs for bitsavers

2019-11-27 Thread Mike Loewen via cctalk
On Wed, 27 Nov 2019, Noel Chiappa via cctalk wrote: > From: Jay Jaeger > CCITT Group 4 lossless compression That's very good indeed. I scan text pages in B+W at slightly less resolution (engineering prints I do higher, they need it), but compressed they turn out to be ~50KB per page, or

Re: Scanning docs for bitsavers

2019-11-27 Thread Noel Chiappa via cctalk
> From: Jay Jaeger > CCITT Group 4 lossless compression That's very good indeed. I scan text pages in B+W at slightly less resolution (engineering prints I do higher, they need it), but compressed they turn out to be ~50KB per page, or less - for long documents (e.g. the DOS-11 System

Re: Scanning docs for bitsavers

2019-11-26 Thread Dennis Boone via cctalk
> As far as multi-page documents, it seems as if my scanner (or its > software) only does uncompressed TIFF. At bitsaver's recommended 400 > dpi, that means about 4M per page. If you're on unix of some sort, the libtiff tools can convert these uncompressed images to G4. The command you'd use
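
How large an uncompressed page comes out depends on the page dimensions and the bit depth the scanner writes, neither of which the quoted post states. As a rough sketch of the arithmetic only, the helper below (a made-up name, `raw_page_bytes`) computes the raw image size for an assumed US Letter page (8.5 x 11 inches) at the 400 dpi mentioned above, once as bilevel and once as 8-bit grayscale. It ignores TIFF header overhead and assumes each row is padded to a whole byte.

```python
import math


def raw_page_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Size in bytes of one uncompressed page image, rows padded to whole bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = math.ceil(width_px * bits_per_pixel / 8)
    return bytes_per_row * height_px


# Assumed page size: US Letter, 8.5 x 11 inches, at 400 dpi.
for label, bpp in (("bilevel", 1), ("8-bit gray", 8)):
    size = raw_page_bytes(8.5, 11, 400, bpp)
    print(f"{label}: {size / 1e6:.2f} MB")
```

Under those assumptions the bilevel page is about 1.87 MB and the grayscale page about 14.96 MB before compression, so the exact per-page figure a given scanner produces will vary with its settings. Either way, the gap to the roughly 50 KB per G4-compressed page reported elsewhere in this thread is large.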

Re: Scanning docs for bitsavers

2019-11-26 Thread Al Kossow via cctalk
On 11/26/19 7:10 PM, Alexandre Souza wrote: > Al, is there a "standard" you would recommend us mere mortals to scan and > archive docs? I've moved to 600dpi bi-tonal tiffs for all new text work since that is the maximum resolution my Panasonic KV-S3065 scanner supports. I use a flatbed at

Re: Scanning docs for bitsavers

2019-11-26 Thread Jay Jaeger via cctalk
On 11/26/2019 8:52 PM, Alan Perry via cctalk wrote: > > I am going through stuff in my office and found that I have some SCSI > device docs that aren't on bitsavers. As far as multi-page documents, it > seems as if my scanner (or its software) only does uncompressed TIFF. At > bitsaver's

Re: Scanning docs for bitsavers

2019-11-26 Thread Alan Perry via cctalk
On 11/26/19 7:05 PM, Chuck Guzis via cctalk wrote: On 11/26/19 6:52 PM, Alan Perry via cctalk wrote: I am going through stuff in my office and found that I have some SCSI device docs that aren't on bitsavers. As far as multi-page documents, it seems as if my scanner (or its software) only

Re: Scanning docs for bitsavers

2019-11-26 Thread Alexandre Souza via cctalk
Al, is there a "standard" you would recommend us mere mortals to scan and archive docs? ---8<---Cut here---8<--- http://www.tabajara-labs.blogspot.com http://www.tabalabs.com.br ---8<---Cut here---8<--- On Wed, Nov 27, 2019 at 01:07, Al Kossow via cctalk < cctalk@classiccmp.org>

Re: Scanning docs for bitsavers

2019-11-26 Thread Al Kossow via cctalk
you can ftp the uncompressed files to me and I'll take care of the conversions On 11/26/19 6:52 PM, Alan Perry via cctalk wrote: > > I am going through stuff in my office and found that I have some SCSI device > docs that aren't on bitsavers. As far as > multi-page documents, it seems as if my

Re: Scanning docs for bitsavers

2019-11-26 Thread Chuck Guzis via cctalk
On 11/26/19 6:52 PM, Alan Perry via cctalk wrote: > > I am going through stuff in my office and found that I have some SCSI > device docs that aren't on bitsavers. As far as multi-page documents, it > seems as if my scanner (or its software) only does uncompressed TIFF. At > bitsaver's