Re: Scanning docs for bitsavers

2019-12-04 Thread Christian Corti via cctalk

Dear Mister Noname, On Tue, 3 Dec 2019, it was written That's _LOSSY_ JBIG2. YOU DON'T HAVE TO USE LOSSY MODE! Don't shout!! And for the topic: you don't have to use JBIG2. Space isn't really an issue today for scanned bilevel documents, so you can just stick with TIFF G4 or PNG.

Re: Scanning docs for bitsavers

2019-12-03 Thread Antonio Carlini via cctalk

On 03/12/2019 20:22, Fred Cisin via cctalk wrote: Watch out. PDF with OCR can show you a clear and crisp [possibly wrong] interpretation of the scan, not what the actual scan looked like. The OCR may well say "0" where the printing says "8" but what your eyes will see will be the

Re: Scanning docs for bitsavers

2019-12-03 Thread Fred Cisin via cctalk

On Tue, 3 Dec 2019, Paul Koning via cctalk wrote: The trouble (for both of these) is that many of the users don't know the limitations and blindly use the wrong tools. "To the man who has a hammer, the whole world looks like a thumb." (which is an indictment about misuse, not an indictment of

Re: Scanning docs for bitsavers

2019-12-03 Thread Fred Cisin via cctalk

> JBIG2 .. introduces so many actual factual errors (typically > substituted letters and numbers) On Tue, 3 Dec 2019, Noel Chiappa via cctalk wrote: It's probably worth noting that there are often errors _in the original documents_, too - so even a perfect image doesn't guarantee no

Re: Scanning docs for bitsavers

2019-12-03 Thread Eric Smith via cctalk

On Tue, Dec 3, 2019 at 10:59 AM Paul Berger via cctalk < cctalk@classiccmp.org> wrote: > Is there any way to know what compression was used in a pdf file? > There's not necessarily only one. Every object in a PDF file can have its own selection of compression algorithm. I don't know of any
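
A rough way to see this per-object variation is to count the `/Filter` names that appear in a PDF's raw bytes. The sketch below is only an illustration and is not a tool mentioned in the thread: the function name `count_filters` is made up here, and it is a crude byte scan rather than a PDF parser, so it can miscount (for example if the byte pattern happens to occur inside compressed stream data). The names it looks for are the standard PDF stream filters, such as `CCITTFaxDecode` (Group 3/4 fax), `JBIG2Decode`, `JPXDecode` (JPEG 2000), `DCTDecode` (JPEG) and `FlateDecode`.

```python
import re
import sys
from collections import Counter

# Matches "/Filter /Name" as well as "/Filter [/Name1 /Name2]" in raw PDF bytes.
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
NAME_RE = re.compile(rb"/([A-Za-z0-9]+)")


def count_filters(pdf_bytes: bytes) -> Counter:
    """Count filter names seen in /Filter entries (hypothetical helper)."""
    counts = Counter()
    for match in FILTER_RE.finditer(pdf_bytes):
        # A /Filter value is either one name or an array of names.
        for name in NAME_RE.findall(match.group(1)):
            counts[name.decode("ascii")] += 1
    return counts


if __name__ == "__main__":
    # Usage: python count_filters.py scan.pdf
    with open(sys.argv[1], "rb") as handle:
        for name, total in count_filters(handle.read()).most_common():
            print(f"{total:6d}  {name}")
```

A count like this only names the filters. It does not say whether a `JBIG2Decode` stream was encoded in lossy or lossless mode, and for `CCITTFaxDecode` the Group 3 versus Group 4 distinction lives in the stream's `/DecodeParms` (`/K` less than zero means Group 4), which this sketch does not read.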

Re: Scanning docs for bitsavers

2019-12-03 Thread Paul Koning via cctalk

> On Dec 3, 2019, at 12:59 PM, Paul Berger via cctalk > wrote: > > ... > Would TIFF G4 still be preferable to JPEG2000? It would seem I can control > the compression used by selecting the pdf compatibility level. JPEG2000 apparently has a lossless mode (says Wikipedia). If so, it would be

Re: Scanning docs for bitsavers

2019-12-03 Thread Paul Berger via cctalk

On 2019-12-02 4:57 p.m., Eric Smith via cctalk wrote: On Tue, Nov 26, 2019 at 8:51 PM Jay Jaeger via cctalk wrote: When I corresponded with Al Kossow about format several years ago, he indicated that CCITT Group 4 lossless compression was their standard. There are newer bilevel encodings

Re: Scanning docs for bitsavers

2019-12-03 Thread Grant Taylor via cctalk

On 12/3/19 10:30 AM, Eric Smith via cctalk wrote: PDF was never _intended_ for documents that should undergo any further processing. Okay. Fair rebuttal. The few things that have been hacked onto it for interactive use are actually the worst thing about PDF. My opinion Okay. I don't

Re: Scanning docs for bitsavers

2019-12-03 Thread Paul Koning via cctalk

> On Dec 2, 2019, at 11:12 PM, Grant Taylor via cctalk > wrote: > > On 12/2/19 9:06 PM, Grant Taylor via cctalk wrote: >> In my opinion, PDFs are the last place that computer usable data goes. >> Because getting anything out of a PDF as a data source is next to impossible. >> Sure, you, a

Re: Scanning docs for bitsavers

2019-12-03 Thread Eric Smith via cctalk

On Mon, Dec 2, 2019 at 9:06 PM Grant Taylor via cctalk <cctalk@classiccmp.org> wrote: > My problem with PDFs starts where most people stop using them. > > Take the average PDF of text, try to copy and paste the text into a text > file. (That may work.) > Sure. Now try the same thing with a

Re: Scanning docs for bitsavers

2019-12-03 Thread Eric Smith via cctalk

On Tue, Dec 3, 2019 at 1:50 AM Christian Corti via cctalk <cctalk@classiccmp.org> wrote: > *NEVER* use JBIG2! I hope you know about the Xerox JBIG2 bug (e.g. making > That's _LOSSY_ JBIG2. YOU DON'T HAVE TO USE LOSSY MODE!

Re: Scanning docs for bitsavers

2019-12-03 Thread Eric Smith via cctalk

On Mon, Dec 2, 2019 at 7:08 PM Grant Taylor via cctalk < cctalk@classiccmp.org> wrote: > I *HATE* doing anything with PDFs other than reading them. PDF was never _intended_ for documents that should undergo any further processing. The few things that have been hacked onto it for interactive use

Re: Scanning docs for bitsavers

2019-12-03 Thread Eric Smith via cctalk

On Mon, Dec 2, 2019 at 5:34 PM Guy Dunphy via cctalk wrote: > Mentioning JBIG2 (or any of its predecessors) without noting that it is > completely unacceptable as a scanned document compression scheme, > demonstrates > a lack of awareness of the defects it introduces in encoded documents. >

Re: Scanning docs for bitsavers

2019-12-03 Thread Noel Chiappa via cctalk

> From: Guy Dunphy > JBIG2 .. introduces so many actual factual errors (typically > substituted letters and numbers) It's probably worth noting that there are often errors _in the original documents_, too - so even a perfect image doesn't guarantee no errors. The most recent one (of

Re: Scanning docs for bitsavers

2019-12-03 Thread Guy Dunphy via cctalk

At 01:20 AM 3/12/2019 -0200, you wrote: >I cannot understand your problems with PDF files. >I've created lots and lots of PDFs, with treated and untreated scanned >material. All of them are very readable and in use for years. Of course, >garbage in, garbage out. I take the utmost care in my scans

Re: Scanning docs for bitsavers

2019-12-03 Thread ED SHARPE via cctalk

actually we scan to pdf with back ocr also text also tiff also jpeg with the slooowww hp 11x17 scan fax print thing i can scan entire document then save 1 save 2 save 3 save 4 without rescanning each time ed at smecc In a message dated 12/3/2019 2:16:01 AM US Mountain Standard Time,

Re: Scanning docs for bitsavers

2019-12-03 Thread ED SHARPE via cctalk

very nice file yep, we prefer pdf with ocr back stuff ed smecc.org In a message dated 12/2/2019 8:20:36 PM US Mountain Standard Time, cctalk@classiccmp.org writes: I cannot understand your problems with PDF files. I've created lots and lots of PDFs, with treated and untreated scanned

Re: Scanning docs for bitsavers

2019-12-03 Thread Jan-Benedict Glaw via cctalk

Hi! On Tue, 2019-12-03 11:34:06 +1100, Guy Dunphy via cctalk wrote: > At 01:57 PM 2/12/2019 -0700, you wrote: > >On Tue, Nov 26, 2019 at 8:51 PM Jay Jaeger via cctalk > >wrote: > > > > > When I corresponded with Al Kossow about format several years ago, he > > > indicated that CCITT Group 4

Re: Scanning docs for bitsavers

2019-12-03 Thread Christian Corti via cctalk

On Mon, 2 Dec 2019, Eric Smith wrote: There are newer bilevel encodings that are somewhat more efficient than G4 (ITU-T T.6), such as JBIG (T.82) and JBIG2 (T.88), but they are not as widely supported, and AFAIK JBIG2 is still patent encumbered. As a result, *NEVER* use JBIG2! I hope you know

Re: Scanning docs for bitsavers

2019-12-02 Thread Grant Taylor via cctalk

On 12/2/19 9:06 PM, Grant Taylor via cctalk wrote: In my opinion, PDFs are the last place that computer usable data goes. Because getting anything out of a PDF as a data source is next to impossible. Sure, you, a human, can read it and consume the data. Try importing a simple table from a

Re: Scanning docs for bitsavers

2019-12-02 Thread Grant Taylor via cctalk

On 12/2/19 8:20 PM, Alexandre Souza via cctalk wrote: I cannot understand your problems with PDF files. My problem with PDFs starts where most people stop using them. Take the average PDF of text, try to copy and paste the text into a text file. (That may work.) Now try to edit a piece of

Re: Scanning docs for bitsavers

2019-12-02 Thread Alexandre Souza via cctalk

I cannot understand your problems with PDF files. I've created lots and lots of PDFs, with treated and untreated scanned material. All of them are very readable and in use for years. Of course, garbage in, garbage out. I take the utmost care in my scans to have good enough source files, so I can

Re: Scanning docs for bitsavers

2019-12-02 Thread Grant Taylor via cctalk

On 12/2/19 5:34 PM, Guy Dunphy via cctalk wrote: Interesting comments Guy. I'm completely naive when it comes to scanning things for preservation. Your comments do pass my naive understanding. But PDF literally cannot be used as a wrapper for the results, since it doesn't incorporate the

Re: Scanning docs for bitsavers

2019-12-02 Thread Guy Dunphy via cctalk

At 01:57 PM 2/12/2019 -0700, you wrote: >On Tue, Nov 26, 2019 at 8:51 PM Jay Jaeger via cctalk >wrote: > >> When I corresponded with Al Kossow about format several years ago, he >> indicated that CCITT Group 4 lossless compression was their standard. >> > >There are newer bilevel encodings that

Re: Scanning docs for bitsavers

2019-12-02 Thread Eric Smith via cctalk

On Tue, Nov 26, 2019 at 8:51 PM Jay Jaeger via cctalk wrote: > When I corresponded with Al Kossow about format several years ago, he > indicated that CCITT Group 4 lossless compression was their standard. > There are newer bilevel encodings that are somewhat more efficient than G4 (ITU-T T.6),

Re: Scanning docs for bitsavers

2019-11-27 Thread Alexandre Souza via cctalk

>My recommendation: use a proper multi-function copier (the big copiers) >that can also scan to network. I currently use our big Konica-Minolta I've got a Lexmark X646E full duplex printing/scanner. I'm still learning how to use it at its max, but I believe I'll scan TONS of documents I have

Re: Scanning docs for bitsavers

2019-11-27 Thread Jason T via cctalk

On Wed, Nov 27, 2019 at 2:01 PM Paul Koning wrote: > Another problem with bilevel scans is that, on some machines at least, they > can be very noisy. That's what I saw on the copier/scanner at the office. > For good scans I use gray scale scanning, with post-processing if desired to >

Re: Scanning docs for bitsavers

2019-11-27 Thread Christian Corti via cctalk

On Wed, 27 Nov 2019, Paul Koning wrote: On Nov 27, 2019, at 2:56 PM, Jason T via cctalk wrote: On Wed, Nov 27, 2019 at 10:12 AM Christian Corti via cctalk wrote: My recommendation: use a proper multi-function copier (the big copiers) that can also scan to network. I currently use our big

Re: Scanning docs for bitsavers

2019-11-27 Thread Jason T via cctalk

On Wed, Nov 27, 2019 at 10:12 AM Christian Corti via cctalk wrote: > My recommendation: use a proper multi-function copier (the big copiers) > that can also scan to network. I currently use our big Konica-Minolta > bizhub 754. Although it's a b/w copier, it can also scan in color. This These are

Re: Scanning docs for bitsavers

2019-11-27 Thread Christian Corti via cctalk

On Wed, 27 Nov 2019, mloe...@cpumagic.scol.pa.us wrote: On Wed, 27 Nov 2019, Noel Chiappa via cctalk wrote: That's what I use too; it has tons of useful features, including being able to drive my single-sided page-feed scanner and being able to number the even-sided pages correctly. The one I

Re: Scanning docs for bitsavers

2019-11-27 Thread Mike Loewen via cctalk

On Wed, 27 Nov 2019, Noel Chiappa via cctalk wrote: > From: Jay Jaeger > CCITT Group 4 lossless compression That's very good indeed. I scan text pages in B+W at slightly less resolution (engineering prints I do higher, they need it), but compressed they turn out to be ~50KB per page, or

Re: Scanning docs for bitsavers

2019-11-27 Thread Noel Chiappa via cctalk

> From: Jay Jaeger > CCITT Group 4 lossless compression That's very good indeed. I scan text pages in B+W at slightly less resolution (engineering prints I do higher, they need it), but compressed they turn out to be ~50KB per page, or less - for long documents (e.g. the DOS-11 System

Re: Scanning docs for bitsavers

2019-11-26 Thread Dennis Boone via cctalk

> As far as multi-page documents, it seems as if my scanner (or its > software) only does uncompressed TIFF. At bitsaver's recommended 400 > dpi, that means about 4M per page. If you're on unix of some sort, the libtiff tools can convert these uncompressed images to G4. The command you'd use

Re: Scanning docs for bitsavers

2019-11-26 Thread Al Kossow via cctalk

On 11/26/19 7:10 PM, Alexandre Souza wrote: > Al, is there a "standard" you would recommend us mere mortals to scan and > archive docs? I've moved to 600dpi bi-tonal tiffs for all new text work since that is the maximum resolution my Panasonic KV-S3065 scanner supports. I use a flatbed at

Re: Scanning docs for bitsavers

2019-11-26 Thread Jay Jaeger via cctalk

On 11/26/2019 8:52 PM, Alan Perry via cctalk wrote: > > I am going through stuff in my office and found that I have some SCSI > device docs that aren't on bitsavers. As far as multi-page documents, it > seems as if my scanner (or its software) only does uncompressed TIFF. At > bitsaver's

Re: Scanning docs for bitsavers

2019-11-26 Thread Alan Perry via cctalk

On 11/26/19 7:05 PM, Chuck Guzis via cctalk wrote: On 11/26/19 6:52 PM, Alan Perry via cctalk wrote: I am going through stuff in my office and found that I have some SCSI device docs that aren't on bitsavers. As far as multi-page documents, it seems as if my scanner (or its software) only

Re: Scanning docs for bitsavers

2019-11-26 Thread Alexandre Souza via cctalk

Al, is there a "standard" you would recommend us mere mortals to scan and archive docs? ---8<---Cut here---8<--- http://www.tabajara-labs.blogspot.com http://www.tabalabs.com.br ---8<---Cut here---8<--- On Wed, Nov 27, 2019 at 01:07, Al Kossow via cctalk <cctalk@classiccmp.org>

Re: Scanning docs for bitsavers

2019-11-26 Thread Al Kossow via cctalk

you can ftp the uncompressed files to me and I'll take care of the conversions On 11/26/19 6:52 PM, Alan Perry via cctalk wrote: > > I am going through stuff in my office and found that I have some SCSI device > docs that aren't on bitsavers. As far as > multi-page documents, it seems as if my

Re: Scanning docs for bitsavers

2019-11-26 Thread Chuck Guzis via cctalk

On 11/26/19 6:52 PM, Alan Perry via cctalk wrote: > > I am going through stuff in my office and found that I have some SCSI > device docs that aren't on bitsavers. As far as multi-page documents, it > seems as if my scanner (or its software) only does uncompressed TIFF. At > bitsaver's

39 matches
