Re: [SWCollect] Computist

2004-04-12 Thread Jim Leonard
Jim Leonard wrote:

Dan Chisarick wrote:

> JPEG: I guess I was lazy because the scanner supports JPEG, TIFF, PCT 
> and PSD as native outputs.  I can scan each page w/its own settings 
> (color vs B&W pretty much) in TIFF format and post-process the scans 
> into PNG.

One more thing I wanted to add:  Don't be afraid of B&W line art.  For a 
full page of B&W text/code with no photos, it really is the best option. 
Try a scan of Computist at 600 DPI at your scanner's Line Art setting 
(NO dithering) for a text page as a test.
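
For anyone post-processing a grayscale scan instead of using the scanner's own setting, line art with no dithering amounts to a plain per-pixel threshold. A minimal Python sketch of that idea follows; the function name, the cutoff of 128, and the white-is-1 bit packing are assumptions chosen for illustration, not what any particular scanner driver does.

```python
def to_line_art(gray_rows, threshold=128):
    """Threshold 8-bit grayscale rows to 1-bit rows, packed 8 pixels per byte.

    A pixel at or above `threshold` becomes white (bit 1); anything darker
    becomes black (bit 0). There is no dithering: every pixel is decided
    on its own, with no error spread to its neighbours.
    """
    packed_rows = []
    for row in gray_rows:
        # One byte holds 8 pixels, most significant bit first.
        packed = bytearray((len(row) + 7) // 8)
        for x, value in enumerate(row):
            if value >= threshold:
                packed[x // 8] |= 0x80 >> (x % 8)
        packed_rows.append(bytes(packed))
    return packed_rows


# Hypothetical 8-pixel row: paper white, light gray show-through, dark ink.
row = [255, 200, 30, 30, 200, 255, 255, 255]
print(to_line_art([row]))
```

In that sample row the light-gray 200s land on the white side and the dark 30s become black, so the whole row packs into the single byte 0xCF.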
--
Jim Leonard ([EMAIL PROTECTED])
World's largest electronic gaming project: http://www.MobyGames.com/
A delicious slice of the demoscene:        http://www.MindCandyDVD.com/
Various oldskool PC rants and ramblings:   http://www.oldskool.org/

--
This message was sent to you because you are currently subscribed to
the swcollect mailing list.  To unsubscribe, send mail to 
[EMAIL PROTECTED] with a subject of 'unsubscribe swcollect'
Archives are available at: http://www.mail-archive.com/[EMAIL PROTECTED]/



Re: [SWCollect] Computist

2004-04-12 Thread Jim Leonard
Dan Chisarick wrote:

> JPEG: I guess I was lazy because the scanner supports JPEG, TIFF, PCT 
> and PSD as native outputs.  I can scan each page w/its own settings 
> (color vs B&W pretty much) in TIFF format and post-process the scans 
> into PNG.

JPEG is fantastic if there are any full-page ads or photos or something 
-- you scan those at 300 DPI with some sort of descreening option turned 
on and it turns out great, then JPEG turns it into a manageable size. 
But for just B&W text, it's the last thing you want to use.
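
The per-page choice described here can be written down as a small lookup. This is only a sketch: the dictionary keys and the function name are made up for illustration, the 600 DPI figure for text pages comes from elsewhere in this thread, and real scanner software will name these options differently.

```python
# Hypothetical per-page scan settings, following the advice in this thread.
SCAN_SETTINGS = {
    # Full-page ads or photos: continuous tone, so descreen and use JPEG.
    "photo_or_ad": {"dpi": 300, "descreen": True, "format": "jpg"},
    # Plain B&W text or code listings: lossless output, no descreening.
    "bw_text": {"dpi": 600, "descreen": False, "format": "png"},
}


def settings_for(has_photos_or_ads):
    """Pick the settings entry for a page based on its content."""
    key = "photo_or_ad" if has_photos_or_ads else "bw_text"
    return SCAN_SETTINGS[key]


print(settings_for(True))
print(settings_for(False))
```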

> PDF: I was going for the 'book binding' wrapper rather than having a ZIP 
> of loose pages, but when I look at even my original scans... they're 
> only 17MB for the individual pages, but the resulting PDF is 43!  I'm 
> willing to bet that there is some optimization I can do to make the PDF 
> files smaller w/o compromising the quality, but that's my last 
> priority.  I was also thinking annotations, the 'searchable image' 
> option, etc.

If you're going to use PDF, you should OCR the images to turn stuff into 
usable text.  Adobe Acrobat's Capture has a checkbox'd option to do OCR 
but *leave* the image in place so that any bad OCR won't mangle the 
displayed results (but the text will still be hidden and searchable). 
That is probably the only reason to use PDF.  Otherwise, as you found 
out, it's better to just archive a series of image files, properly numbered.
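
One way to read "properly numbered" is zero-padded file names, so that a plain alphabetical sort puts the pages in reading order. A small Python sketch under that assumption; the naming pattern and the function name are hypothetical, not a convention anyone in this thread has agreed on.

```python
def page_filename(issue, page, ext="png"):
    """Build a zero-padded file name such as 'computist_061_p007.png'."""
    return f"computist_{issue:03d}_p{page:03d}.{ext}"


# Pages listed out of order on purpose; sorting the names restores page order.
names = [page_filename(61, page) for page in (10, 2, 1)]
print(sorted(names))
```

Without the padding, "p10" would sort ahead of "p2" in most file listings.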
--
Jim Leonard ([EMAIL PROTECTED])
World's largest electronic gaming project: http://www.MobyGames.com/
A delicious slice of the demoscene:        http://www.MindCandyDVD.com/
Various oldskool PC rants and ramblings:   http://www.oldskool.org/




Re: [SWCollect] Computist

2004-04-12 Thread Dan Chisarick
Ok, lots of stuff to try it seems :)

JPEG: I guess I was lazy because the scanner supports JPEG, TIFF, PCT 
and PSD as native outputs.  I can scan each page w/its own settings 
(color vs B&W pretty much) in TIFF format and post-process the scans 
into PNG.

PDF: I was going for the 'book binding' wrapper rather than having a 
ZIP of loose pages, but when I look at even my original scans... 
they're only 17MB for the individual pages, but the resulting PDF is 
43!  I'm willing to bet that there is some optimization I can do to 
make the PDF files smaller w/o compromising the quality, but that's my 
last priority.  I was also thinking annotations, the 'searchable image' 
option, etc.

BTW, scans used to make the PDF:  
http://homepage.mac.com/chisarickd/061.zip

I'll try some of these out and see what sort of improvements I get.  
Thanks.




Re: [SWCollect] Computist

2004-04-12 Thread Jim Leonard
Dan Chisarick wrote:
> Any feedback appreciated before I make 93+ more mistakes...
> (remaining issues).

While I'd like to say you should run it through OCR, I can see from the 
content (which kicks f**king ass, btw) that OCR would most likely murder it. 
But 43MB per issue is nuts.  So, my suggestions are:

- Don't use JPEG for 8-bit B&W text.  JPEG was architected for continuous-tone 
images, not harshly-contrasting edges (like text).  Use something lossless 
(preferably PNG) for text.  Don't believe me?  As proof, I used Acrobat on 
your PDF to extract the source JPGs for page 10 (there are two that make up 
the page) and I combined them in Photoshop, then saved out to a grayscale 
(8-bit) PNG.  Total size of source JPGs was 781K, but the PNG as saved from 
Photoshop was 465K.  For extra crunching, I let PNGGauntlet chew on that file 
for about 10 minutes and it got it down to 316K.  (Since PNGGauntlet can batch 
files overnight, making the time it takes a non-issue, I usually include it in 
all of my processes.)

- Scan at 600 DPI in 2-color B&W (1-bit line art, no dithering) for text-only 
pages without color.  Not only will you completely eliminate "bleed" from the 
other side of the page, but it will compress better than anything else.  
You're archiving text; at that high a resolution (600 DPI), you don't need 
anti-aliased edges.  Again, as an example, I scanned a text-only page without 
color or photos at 600 DPI and the resulting PNG saved out of Photoshop was 
363K.  Running through PNGGauntlet for 12 minutes shaved it down to 270K.  
That's four times your previous scanning resolution at roughly 1/3rd the 
filesize (and it's perfectly clean and readable).

- Don't deliver the images in a PDF wrapper.  I love PDF, but it's meant for 
text mixed with images, not just images.  Try just a .zip (with no compression 
of course) with all the images.
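
The "no compression" point follows from PNG data already being deflate-compressed, so a second pass in the ZIP gains very little. A minimal sketch of bundling pages that way with Python's standard zipfile module; the function name and the file names in the usage note are assumptions for illustration.

```python
import os
import zipfile


def bundle_pages(zip_path, image_paths):
    """Write the given image files into a ZIP archive without recompressing them."""
    # ZIP_STORED copies each file's bytes into the archive as-is.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for path in sorted(image_paths):
            # Store under the bare file name so the archive has no directory prefix.
            archive.write(path, arcname=os.path.basename(path))
    return zip_path
```

Calling it would look something like bundle_pages("computist_061.zip", ["p001.png", "p002.png"]), with whatever page files actually exist on disk.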

BTW, if you would like the exact images I scanned, I still have them on the 
hard drive -- I'm not just making numbers up, you can see the test files for 
yourself.  Just tell me where to email them.
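
The figures quoted in this message are easy to sanity-check. A short Python sketch using only the sizes stated above; the helper name is made up, and since the 270K line-art page and the 781K JPEG page are different pages, the last ratio is only a rough comparison.

```python
def percent_saved(before_kb, after_kb):
    """Return how much smaller `after_kb` is than `before_kb`, in percent."""
    return 100.0 * (before_kb - after_kb) / before_kb


# Sizes quoted in the message, in kilobytes.
jpeg_page10 = 781        # two source JPEGs for page 10, combined
png_photoshop = 465      # same page saved as 8-bit grayscale PNG
png_optimized = 316      # after PNGGauntlet
lineart_photoshop = 363  # 600 DPI 2-color test page, PNG from Photoshop
lineart_optimized = 270  # after PNGGauntlet

print(round(percent_saved(jpeg_page10, png_photoshop)))          # about 40
print(round(percent_saved(jpeg_page10, png_optimized)))          # about 60
print(round(percent_saved(lineart_photoshop, lineart_optimized)))  # about 26
print(600 / 150)                                                 # 4.0x the linear resolution
print(round(lineart_optimized / jpeg_page10, 2))                 # 0.35, roughly one third
```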
--
Jim Leonard ([EMAIL PROTECTED])          http://www.oldskool.org/
Want to help an ambitious games project? http://www.mobygames.com/
Or check out some trippy MindCandy at    http://www.mindcandydvd.com/




[SWCollect] Computist

2004-04-11 Thread Dan Chisarick
So after cleaning up from fixing a few stray disk errors on my laptop 
(soft errors, backup, reformat, restore, life is good) and getting pasted 
by GNU Chess (a vintage game in every respect) in the afternoon, I 
contemplated how I could keep that momentum going.  So I tried scanning in 
one of my Computist issues.  There's a good pile of them here:

http://computist.textfiles.com

So I pick one that's not already there (I'll offer the scans to the 
site maintainer later): my favorite, issue #61, the only issue that 
has one of my articles in it (look for Dan Halfwit on page 7.  Trust me 
that it's me.  No one else would want to take credit for writing like 
that.)  An hour of messing w/scanner and compression settings yields 
the following:

150 DPI
JPEG 50% compression
Slight boost to brightness (eliminates shadows, better compression)
Fine.  Now getting the pages to lay flush on the scanner bed is nigh 
impossible, so I do the unpleasant: I pop the staples out of the 
binding.  It had the added plus of not scratching the scanner's glass 
surface.  Being practical, I'd rather get good scans and leave the mags 
in acid-free bags for the rest of eternity than have lousy scans and 
keep the staples intact.  So it takes what seems like forever, but 
here's the entire issue as a PDF:

http://homepage.mac.com/chisarickd/Computist_61.pdf

Beware, it's almost 43MB.  I considered at one point scanning at a 
higher resolution/lower compression, but the things are huge the way 
they are.  I'm not really up to 1 CD per issue.  Any feedback 
appreciated before I make 93+ more mistakes... (remaining issues).

Dan

PS - Someday, I swear, I will scan in the entire set of documentation 
for Origin's OMEGA.  I have over 10 copies of it, and I have redundant 
manuals out the wazoo explicitly for the purpose of sacrificing one of 
them for archiving.
