Almost every wide printer had adjustable widths. If this is a problem,
try a different printer. Non-standard lengths may be more of an issue.
If you shoot an entire page at a time, line spacing becomes a problem for
your OCR software.
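A rough sketch of handling line spacing on the OCR side. It assumes you already have baseline positions for the text lines on a captured page and know the image resolution on the paper (both assumptions; `estimate_lpi` is a hypothetical helper). It takes the typical gap between baselines and picks the nearest common pitch:

```python
from statistics import median

# Vertical pitches commonly found on line-printer listings (lines per inch).
COMMON_LPI = (6, 8)

def estimate_lpi(baselines_px, dpi):
    """Guess the vertical pitch of one captured page.

    baselines_px: y positions (pixels) of detected text baselines, top to bottom.
    dpi: pixels per inch on the paper, assumed known from calibration.
    Returns whichever entry of COMMON_LPI is closest to the measured pitch.
    """
    gaps = [b - a for a, b in zip(baselines_px, baselines_px[1:])]
    # An occasional blank line doubles one gap; the median ignores it
    # where a mean would be pulled upward.
    pitch_px = median(gaps)
    measured_lpi = dpi / pitch_px
    return min(COMMON_LPI, key=lambda lpi: abs(lpi - measured_lpi))

print(estimate_lpi([100, 150, 200, 300, 350], dpi=300))  # 6
```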
With tractor feed, page length variability should not be an issue - every
page will have the same number of holes. I almost always ran whole boxes
of paper without adjusting top of form. On the big printers with dual
tractors I could start a new box without adjusting top of form. Some
printers may not repeatably feed exactly an integer number of holes each
time -- I didn't experience this.
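The arithmetic behind "every page will have the same number of holes": pin-feed holes are on a 1/2 inch pitch, so an 11 inch form is exactly 22 holes. If the printer advances a whole number of holes each time, the top of page N sits at an exact multiple of that, with no accumulated error. A minimal sketch using exact fractions (the function names are illustrative):

```python
from fractions import Fraction

HOLE_PITCH_IN = Fraction(1, 2)  # standard pin-feed hole spacing, 1/2 inch

def holes_per_page(page_length_in):
    """Tractor holes per form, computed exactly (no rounding)."""
    holes = Fraction(page_length_in) / HOLE_PITCH_IN
    if holes.denominator != 1:
        raise ValueError("form length is not a whole number of hole pitches")
    return int(holes)

def page_top_position(page_index, page_length_in):
    """Distance (inches) from the first top of form to the top of page N.

    Assumes the printer always advances a whole number of holes, so the
    result is an exact multiple of the hole pitch and cannot drift.
    """
    return page_index * holes_per_page(page_length_in) * HOLE_PITCH_IN

print(holes_per_page(11))  # 22
```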
Friction feed is another matter, and accumulated form feed errors will
present exactly the problems you describe. Many interesting listings
probably don't have tractor holes.
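By contrast, friction feed has nothing to re-register against, so a tiny per-page error adds up. A sketch assuming a purely systematic slip with hypothetical numbers (real slip also has a random part); the function names are illustrative:

```python
import math

def cumulative_drift(pages, slip_per_page_in):
    """Top-of-form offset (inches) after each of `pages` form feeds,
    assuming every feed is off by the same amount."""
    return [slip_per_page_in * (n + 1) for n in range(pages)]

def pages_until_off_by(slip_per_page_in, tolerance_in):
    """First page whose accumulated offset exceeds tolerance_in."""
    return math.floor(tolerance_in / slip_per_page_in) + 1

# 0.01 inch per 11 inch page is under 0.1% slip, yet the print is a full
# inch off by page 100, and more than half a 6 LPI line (1/12 inch) off
# by page 9.
print(cumulative_drift(100, 0.01)[-1])
print(pages_until_off_by(0.01, 1 / 12))
```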
Problems feeding and stacking will prevent this from being an unattended
operation. Perfs will be weak - both the perfs between the page and the
tractor holes and the perfs between pages. A Data Products B1200 would
sometimes break perfs on new paper. Some consumer model dot-matrix
printers had gentle form feeds. Having a manual feed mode, requiring a
button push for each page, would probably be a good idea.
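A manual feed mode might look like the loop below. It is only a sketch: `advance_page`, `capture_image` and `confirm` are hypothetical callbacks standing in for whatever drives the printer, triggers the camera, and reads the operator's button.

```python
def capture_listing(advance_page, capture_image, confirm, max_pages):
    """Capture one image per page, waiting for a button push before each.

    advance_page(): hypothetical; feeds the printer one form.
    capture_image(): hypothetical; triggers the camera, returns the image.
    confirm(page): hypothetical; blocks until the operator responds and
        returns False to end the run (torn perf, jam, end of listing).
    """
    images = []
    for page in range(max_pages):
        # The operator checks that the page is hanging properly first.
        if not confirm(page):
            break
        images.append(capture_image())
        advance_page()  # exactly one form feed per button push
    return images
```

Keeping the operator in the loop trades throughput for a chance to catch a broken perf before the next slew makes it worse.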
Careful camera setup up front would be required. Proper camera selection and
positioning should minimize keystone, pincushion, barrel and other distortion.
Even lighting would be required - much as would be used with a copy
stand. I am somewhat more worried that the paper would be hanging
loosely from the printer, not sandwiched between glass. The camera will
need a small enough aperture to keep the entire sheet in focus while the
paper does whatever the heck it wants to. It gets into the basic
principle that you have to get the analog part right if you want to
successfully digitize.
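To size the aperture, the standard thin-lens depth-of-field approximation is a reasonable starting point. In this sketch the circle of confusion value is an assumption you would tune to what your OCR can tolerate, and `depth_of_field` is an illustrative name:

```python
import math

def depth_of_field(focal_mm, f_number, subject_mm, coc_mm=0.03):
    """Near and far limits of acceptable focus, in mm (thin-lens approximation).

    coc_mm is the circle of confusion on the sensor. 0.03 mm is a common
    full-frame figure; what OCR can tolerate is an assumption to tune.
    """
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near = (subject_mm * (hyperfocal - focal_mm)
            / (hyperfocal + subject_mm - 2 * focal_mm))
    if subject_mm >= hyperfocal:
        far = math.inf  # everything beyond the near limit is acceptably sharp
    else:
        far = subject_mm * (hyperfocal - focal_mm) / (hyperfocal - subject_mm)
    return near, far

for n in (2.8, 8):
    near, far = depth_of_field(50, n, 1000)
    print(f"f/{n}: {near:.0f} to {far:.0f} mm")
```

Under those assumptions, a 50 mm lens focused at 1 m holds roughly 92 to 110 cm in focus at f/8, against roughly 97 to 103 cm at f/2.8. A stopped-down lens also needs more light, which makes the even lighting harder to get.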
I probably did understate the effort. There would be work to do, but no
part of it seems to be intractable to me. If worst came to worst and you
had to ditch the printer's control electronics and drive the feed
steppers directly, you would still be ahead by not having to build a paper
transport from scratch.
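If it does come to driving the feed steppers directly, each job's vertical pitch and form length have to become whole step counts. A sketch assuming a hypothetical 1/48 inch per step (check the real figure for your mechanism); `steps_for` and `feed_plan` are illustrative names:

```python
from fractions import Fraction

# Hypothetical feed mechanism: one stepper step advances the paper 1/48 inch.
STEP_IN = Fraction(1, 48)

def steps_for(distance_in):
    """Whole stepper steps needed to move the paper distance_in inches."""
    steps = Fraction(distance_in) / STEP_IN
    if steps.denominator != 1:
        raise ValueError(f"{distance_in} in is not a whole number of steps")
    return int(steps)

def feed_plan(lpi, form_length_in):
    """Steps per line and per form for one job's settings."""
    return {
        "steps_per_line": steps_for(Fraction(1, lpi)),
        "steps_per_form": steps_for(form_length_in),
    }

print(feed_plan(6, 11))
print(feed_plan(8, Fraction(17, 2)))
```

With that step size, 6 and 8 LPI both divide evenly but 5 LPI would not, which is one concrete way a given mechanism ends up not supporting every pitch.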
I remember wanting one of those ThunderScan devices. In this case I
think that approach would cause more problems than it would solve.
OCRing the result, however you collect the images, is likely the hardest
part anyway.
On 02/11/2018 03:10 PM, Timothe Litt wrote:
It's not that simple. You need to deal with at least 2 common
vertical pitches (6 & 8 LPI), and a number of page lengths (and
widths). These need to be set up per job; not all printers support all
these. Plus, there is misalignment (as Al noted, printing that crosses the
perforations at the bottom of a page is quite common). The OP mentioned that his
listings have a hard crease; this will cause (at least) feed and
stacking problems. Form feed causes a high-speed slew; this becomes
less reliable as the distance moved increases. You're proposing to advance
an entire page at a time, which means that the paper will jump off the
tractors frequently.[1] Old paper is fragile. Over hundreds of pages,
dimensions may not be stable; it was not uncommon to have to re-adjust
TOF after a while. There's a fair bit of error detection and recovery
to work out.
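One piece of that error detection might be a per-page registration check. This sketch assumes something upstream has measured how far the first print line landed from where it was expected; the thresholds and names are hypothetical choices, not anything from an existing tool:

```python
from enum import Enum

class Action(Enum):
    ACCEPT = "accept"
    REALIGN = "realign"  # pause so the operator can re-adjust TOF
    STOP = "stop"        # paper probably skipped or left the tractors

def check_registration(offset_in, lpi, jam_threshold_in=0.5):
    """Decide what to do with a page from its measured top-of-form error.

    offset_in: inches between where the first print line was expected
        and where it was found in the image.
    Assumed thresholds: half a line pitch triggers a realign; a full
        hole pitch (1/2 inch) suggests the pins skipped a hole.
    """
    err = abs(offset_in)
    if err >= jam_threshold_in:
        return Action.STOP
    if err > 1 / (2 * lpi):
        return Action.REALIGN
    return Action.ACCEPT
```

The same offset can be fine at 6 LPI and a realign at 8 LPI, which is another reason the pitch has to be known per job.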
Lighting is an issue, as is compensating for keystoning and other
misalignments. Most cameras don't have a standard remote trigger
interface - one of the pointers I provided loads modified firmware
into cameras from one manufacturer to make this work. If you look at
digital camera reviews, you'll see that the lenses have varying
degrees of artifacts, especially at the edges. So you need to find
and zoom to an area that's relatively "flat" and doesn't need a lot of
correction. Depth of field will keep a swaying page in focus, but it
won't stop the apparent font size from changing as the paper moves
forward and back. If you stop that, you simplify the OCR, and you don't
need as much depth of field.
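Keystoning, at least, is a projective distortion, so it can be undone with a 3x3 homography fitted to the four page corners found in the photo. Pincushion and barrel distortion are radial and need a separate lens model; this sketch does not address them. It is pure Python with illustrative function names; a real pipeline would more likely use an image library's perspective routines (for example OpenCV's `getPerspectiveTransform` and `warpPerspective`):

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with
    partial pivoting. Adequate for the 8x8 system below."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def homography(src, dst):
    """3x3 matrix mapping four src (x, y) points onto four dst (u, v) points.

    Fixes the bottom-right entry at 1 and solves for the other eight from
    u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v.
    """
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]

def warp_point(h, x, y):
    """Map one photo coordinate into corrected page coordinates."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

# Hypothetical corners of a keystoned page in the photo (clockwise from
# top left), mapped onto an upright 1500 x 1100 pixel rectangle.
photo = [(120, 80), (1880, 40), (1960, 1420), (60, 1380)]
page = [(0, 0), (1500, 0), (1500, 1100), (0, 1100)]
H = homography(photo, page)
print([tuple(round(c, 3) for c in warp_point(H, *p)) for p in photo])
```

To warp a whole image you would typically fit the inverse mapping (page to photo) and sample the photo at each output pixel, so every output pixel gets a value.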
There are many backgrounds that need to be subtracted for OCR to
work. (Printer paper was notorious for institutional logos, as well
as bars and other aids to human readers.) Then there are the other
issues mentioned in my earlier note.
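For green-bar paper specifically, one cheap trick is to binarize on the green channel. This is a color dropout rather than a general background subtraction: the green bars reflect green light almost as well as blank paper, while black print stays dark. Logos or bars in other colors would need other handling. A sketch, with an illustrative function name and a threshold that is only a starting guess:

```python
def drop_out_green(pixels, threshold=128):
    """Binarize an RGB image for OCR, suppressing green-bar shading.

    pixels: rows of (r, g, b) tuples with 0-255 values.
    Green pre-printing looks nearly as bright as blank paper in the green
    channel, while black print stays dark, so thresholding that channel
    alone drops the bars. `threshold` is an assumed value to tune.
    Returns rows of booleans, True where the pixel is taken to be print.
    """
    return [[g < threshold for (_r, g, _b) in row] for row in pixels]

row = [(255, 255, 255), (200, 240, 200), (30, 30, 30)]  # paper, bar, print
print(drop_out_green([row]))
```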
It seems simple, but it is a P.roject. That's a capital P. With a lot
of roject to work out.
It's worthwhile, but it's not simple. It's a pretty interesting
hardware (and software) project. I don't mean to discourage anyone
who wants to work on it - but you need to go in with eyes open, or
you'll end up very, very frustrated.
ThunderScan tried to scan line by line and retrieve grayscale; the
challenge was piecing the adjacent lines together with pixel accuracy.
The focal distance was constant because the camera was on a carriage.
The idea here is to capture a page per frame, so the registration
problems are quite different. One could try the ThunderScan approach;
it would trade one set of problems, or "challenges and opportunities",
for another.
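The ThunderScan-style alternative shifts the hard part to registering adjacent strips. A common approach (not necessarily what ThunderScan itself did) is to pick the offset that maximizes the cross-correlation of the overlapping pixels. A 1-D sketch with an illustrative function name:

```python
def best_shift(a, b, max_shift):
    """Offset (pixels) that best aligns scan line b to scan line a.

    A positive result means b's content sits that many pixels to the
    left of the matching content in a. Each candidate is scored by the
    mean product of overlapping samples after removing each line's mean
    (a simple cross-correlation). Keep max_shift small relative to the
    line length so the overlaps stay large enough to be meaningful.
    """
    def centered(xs):
        mean = sum(xs) / len(xs)
        return [x - mean for x in xs]

    a, b = centered(a), centered(b)
    best, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        pairs = [(a[i], b[i - shift]) for i in range(len(a))
                 if 0 <= i - shift < len(b)]
        if not pairs:
            continue
        score = sum(x * y for x, y in pairs) / len(pairs)
        if score > best_score:
            best, best_score = shift, score
    return best

print(best_shift([0, 0, 0, 9, 0, 0, 0, 0], [0, 9, 0, 0, 0, 0, 0, 0], 3))
```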
_______________________________________________
Simh mailing list
Simh@trailing-edge.com
http://mailman.trailing-edge.com/mailman/listinfo/simh