Re: [SWCollect] Computist
Jim Leonard wrote:

> Dan Chisarick wrote:
>
> > JPEG: I guess I was lazy because the scanner supports JPEG, TIFF, PCT and PSD as native outputs. I can scan each page w/its own settings (color vs B&W pretty much) in TIFF format and post-process the scans into PNG.

One more thing I wanted to add: Don't be afraid of B&W Lineart. For a full page of B&W text/code with no photos, it really is the best option. Try a scan of Computist at 600 DPI at your scanner's Line Art setting (NO dithering) for a text page as a test.

--
Jim Leonard ([EMAIL PROTECTED])
World's largest electronic gaming project: http://www.MobyGames.com/
A delicious slice of the demoscene: http://www.MindCandyDVD.com/
Various oldskool PC rants and ramblings: http://www.oldskool.org/

--
This message was sent to you because you are currently subscribed to the swcollect mailing list. To unsubscribe, send mail to [EMAIL PROTECTED] with a subject of 'unsubscribe swcollect'
Archives are available at: http://www.mail-archive.com/[EMAIL PROTECTED]/
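For a rough sense of why 1-bit line art at 600 DPI is less extravagant than it sounds, here is a back-of-envelope sketch. It assumes an 8.5 x 11 inch page (the thread never states the page size), `raw_bytes` is a hypothetical helper, and the figures are uncompressed bitmap sizes before PNG gets to work on them.

```python
def raw_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed bitmap size in bytes, with each row padded to a whole byte."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


# Assumed page size in inches; not taken from the thread.
PAGE = (8.5, 11.0)

# 8-bit grayscale at 150 DPI versus 1-bit line art at 600 DPI.
gray_150 = raw_bytes(*PAGE, dpi=150, bits_per_pixel=8)
lineart_600 = raw_bytes(*PAGE, dpi=600, bits_per_pixel=1)

print(f"150 DPI, 8-bit gray: {gray_150:,} bytes")
print(f"600 DPI, 1-bit line art: {lineart_600:,} bytes")
print(f"ratio: {lineart_600 / gray_150:.2f}x")
```

Under those assumptions the line-art page carries 16 times as many pixels for roughly twice the raw bytes, and long runs of pure black or pure white are the kind of data a lossless compressor handles well.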
Re: [SWCollect] Computist
Dan Chisarick wrote:

> JPEG: I guess I was lazy because the scanner supports JPEG, TIFF, PCT and PSD as native outputs. I can scan each page w/its own settings (color vs B&W pretty much) in TIFF format and post-process the scans into PNG.

JPEG is fantastic if there are any full-page ads or photos or something -- you scan those at 300 DPI with some sort of descreening option turned on and it turns out great, then JPEG turns it into a manageable size. But for just B&W text, it's the last thing you want to use.

> PDF: I was going for the 'book binding' wrapper rather than having a ZIP of loose pages, but when I look at even my original scans... they're only 17MB for the individual pages, but the resulting PDF is 43! I'm willing to bet that there is some optimization I can do to make the PDF files smaller w/o compromising the quality, but that's my last priority. I was also thinking annotations, the 'searchable image' option, etc.

If you're going to use PDF, you should OCR the images to turn stuff into usable text. Adobe Acrobat's Capture has a checkbox'd option to do OCR but *leave* the image in place so that any bad OCR won't mangle the displayed results (but the text will still be hidden and searchable). That is probably the only reason to use PDF. Otherwise, as you found out, it's better to just archive a series of image files, properly numbered.

--
Jim Leonard ([EMAIL PROTECTED])
World's largest electronic gaming project: http://www.MobyGames.com/
A delicious slice of the demoscene: http://www.MindCandyDVD.com/
Various oldskool PC rants and ramblings: http://www.oldskool.org/
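One way to read "properly numbered" is zero-padded page numbers, so that a plain alphabetical sort matches page order. A minimal sketch follows; the `computist_061_p007.png` naming scheme and the `page_filename` helper are made up for illustration and are not something anyone in the thread specified.

```python
def page_filename(issue, page, ext="png"):
    """Build a zero-padded name so alphabetical order equals page order."""
    return f"computist_{issue:03d}_p{page:03d}.{ext}"


# Unpadded names sort badly: page 10 lands before page 2.
unpadded = sorted(f"page{p}.png" for p in (1, 2, 10))

# Padded names keep their numeric order under a plain string sort.
padded = sorted(page_filename(61, p) for p in (1, 2, 10))

print(unpadded)
print(padded)
```

Three digits leaves room for issues and pages past 99 without renaming anything later.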
Re: [SWCollect] Computist
Ok, lots of stuff to try it seems :)

JPEG: I guess I was lazy because the scanner supports JPEG, TIFF, PCT and PSD as native outputs. I can scan each page w/its own settings (color vs B&W pretty much) in TIFF format and post-process the scans into PNG.

PDF: I was going for the 'book binding' wrapper rather than having a ZIP of loose pages, but when I look at even my original scans... they're only 17MB for the individual pages, but the resulting PDF is 43! I'm willing to bet that there is some optimization I can do to make the PDF files smaller w/o compromising the quality, but that's my last priority. I was also thinking annotations, the 'searchable image' option, etc.

BTW, scans used to make the PDF: http://homepage.mac.com/chisarickd/061.zip

I'll try some of these out and see what sort of improvements I get. Thanks.

On Apr 12, 2004, at 3:27 AM, Jim Leonard wrote:

> Dan Chisarick wrote:
>
> > Any feedback appreciated before I make 93+ more mistakes... (remaining issues).
>
> While I'd like to say you should run it through OCR, I can see from the content (which kicks f**king ass, btw) that OCR would most likely murder it. But 43MB per issue is nuts. So, my suggestions are:
>
> - Don't use JPEG for 8-bit B&W text. JPEG was architected for continuous-tone images, not harshly-contrasting edges (like text). Use something lossless (preferably PNG) for text. Don't believe me? As proof, I used Acrobat on your PDF to extract the source JPGs for page 10 (there are two that make up the page) and I combined them in Photoshop, then saved out to a grayscale (8-bit) PNG. Total size of source JPGs was 781K, but the PNG as saved from Photoshop was 465K. For extra crunching, I let PNGGauntlet chew on that file for about 10 minutes and it got it down to 316K. (Since PNGGauntlet can batch files overnight, making the time it takes a non-issue, I usually include it in all of my processes.)
>
> - Scan at 600 DPI halftone (that's 2-color B&W) for text-only pages without color. Not only will you completely eliminate "bleed" from the other side of the page, but it will compress better than anything else. You're archiving text; at that high a resolution (600 DPI), you don't need anti-aliased edges. Again, as an example, I scanned a text-only page without color or photos at 600 DPI and the resulting PNG saved out of Photoshop was 363K. Running through PNGGauntlet for 12 minutes shaved it down to 270K. That's four times your previous scanning resolution at 1/3rd the filesize (and it's perfectly clean and readable).
>
> - Don't deliver the images in a PDF wrapper. I love PDF, but it's meant for text mixed with images, not just images. Try just a .zip (with no compression of course) with all the images.
>
> BTW, if you would like the exact images I scanned, I still have them on the hard drive -- I'm not just making numbers up, you can see the test files for yourself. Just tell me where to email them.
>
> --
> Jim Leonard ([EMAIL PROTECTED]) http://www.oldskool.org/
> Want to help an ambitious games project? http://www.mobygames.com/
> Or check out some trippy MindCandy at http://www.mindcandydvd.com/
Re: [SWCollect] Computist
Dan Chisarick wrote:

> Any feedback appreciated before I make 93+ more mistakes... (remaining issues).

While I'd like to say you should run it through OCR, I can see from the content (which kicks f**king ass, btw) that OCR would most likely murder it. But 43MB per issue is nuts. So, my suggestions are:

- Don't use JPEG for 8-bit B&W text. JPEG was architected for continuous-tone images, not harshly-contrasting edges (like text). Use something lossless (preferably PNG) for text. Don't believe me? As proof, I used Acrobat on your PDF to extract the source JPGs for page 10 (there are two that make up the page) and I combined them in Photoshop, then saved out to a grayscale (8-bit) PNG. Total size of source JPGs was 781K, but the PNG as saved from Photoshop was 465K. For extra crunching, I let PNGGauntlet chew on that file for about 10 minutes and it got it down to 316K. (Since PNGGauntlet can batch files overnight, making the time it takes a non-issue, I usually include it in all of my processes.)

- Scan at 600 DPI halftone (that's 2-color B&W) for text-only pages without color. Not only will you completely eliminate "bleed" from the other side of the page, but it will compress better than anything else. You're archiving text; at that high a resolution (600 DPI), you don't need anti-aliased edges. Again, as an example, I scanned a text-only page without color or photos at 600 DPI and the resulting PNG saved out of Photoshop was 363K. Running through PNGGauntlet for 12 minutes shaved it down to 270K. That's four times your previous scanning resolution at 1/3rd the filesize (and it's perfectly clean and readable).

- Don't deliver the images in a PDF wrapper. I love PDF, but it's meant for text mixed with images, not just images. Try just a .zip (with no compression of course) with all the images.

BTW, if you would like the exact images I scanned, I still have them on the hard drive -- I'm not just making numbers up, you can see the test files for yourself. Just tell me where to email them.

--
Jim Leonard ([EMAIL PROTECTED]) http://www.oldskool.org/
Want to help an ambitious games project? http://www.mobygames.com/
Or check out some trippy MindCandy at http://www.mindcandydvd.com/
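The ".zip with no compression" suggestion can be sketched with Python's standard `zipfile` module. This is only an illustration, not a tool anyone in the thread used: the `bundle_pages` helper, the directory layout and the `*.png` pattern are assumptions. Storing rather than deflating makes sense because PNG data is already compressed, so a second pass gains next to nothing.

```python
import zipfile
from pathlib import Path


def bundle_pages(image_dir, archive_path, pattern="*.png"):
    """Store every matching page image in a zip without recompressing it.

    Returns the archived file names in the order they were written.
    """
    pages = sorted(Path(image_dir).glob(pattern))
    # ZIP_STORED writes the bytes as-is; the zip is only a wrapper.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for page in pages:
            # arcname drops the directory part so the archive holds loose pages.
            zf.write(page, arcname=page.name)
    return [page.name for page in pages]
```

Something like `bundle_pages("scans/061", "Computist_61.zip")` would then give the "book binding" wrapper without the PDF overhead, provided the page files are named so that they sort in page order.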
[SWCollect] Computist
So after cleaning up from fixing a few stray disk errors on my laptop (soft errors, backup, reformat, restore, life is good), getting pasted by GNU Chess (a vintage game in every respect) in the afternoon, I contemplate how I can keep that momentum going.

So I tried scanning in one of my Computist issues. There's a good pile of them here: http://computist.textfiles.com

So I pick one that's not already there (I'll offer the scans to the site maintainer later). My favorite issue #61. The only issue that has one of my articles in it (look for Dan Halfwit on page 7. Trust me that it's me. No one else would want to take credit for writing like that.)

An hour of messing w/scanner and compression settings yields the following:

150 DPI
JPEG 50% compression
Slight boost to brightness (eliminates shadows, better compression)

Fine. Now getting the pages to lay flush on the scanner bed is nigh impossible, so I do the unpleasant: I pop the staples out of the binding. It had the added plus of not scratching the scanner's glass surface. Being practical, I'd rather get good scans and leave the mags in acid-free bags for the rest of eternity than have lousy scans and keep the staples intact.

So it takes what seems like forever, but here's the entire issue as a PDF: http://homepage.mac.com/chisarickd/Computist_61.pdf

Beware, it's almost 43MB. I considered at one point scanning at a higher resolution/lower compression, but the things are huge the way they are. I'm not really up to 1 CD per issue.

Any feedback appreciated before I make 93+ more mistakes... (remaining issues).

Dan

PS - Someday, I swear, I will scan in the entire set of documentation for Origin's OMEGA. I have over 10 copies of it, and I have redundant manuals out the wazoo explicitly for the purpose of sacrificing one of them for archiving.