Almost every wide printer had adjustable widths. If this is a problem, try a different printer. Non-standard lengths may be more of an issue.

If you shoot an entire page at a time, line spacing becomes a problem for your OCR software rather than for the printer.

With tractor feed, page length variability should not be an issue: every page will have the same number of holes. I almost always ran whole boxes of paper without adjusting top of form. On the big printers with dual tractors I could start a new box without adjusting top of form. Some printers may not repeatably feed exactly an integer number of holes each time, though I didn't experience this.
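
To put rough numbers on that, here is a small sketch of the arithmetic. It relies on the standard 1/2 inch pin-feed hole spacing; the 11 inch page length in the usage lines is only an assumed example, and the 6 and 8 LPI pitches are the ones from the quoted message below. The function names are mine, purely for illustration.

```python
from fractions import Fraction

HOLE_PITCH_IN = Fraction(1, 2)  # standard pin-feed hole spacing, in inches


def holes_per_page(page_length_in):
    """Return the number of tractor holes per page; raise if it is not a whole number."""
    holes = Fraction(page_length_in) / HOLE_PITCH_IN
    if holes.denominator != 1:
        raise ValueError(f"{page_length_in} in is not a whole number of holes")
    return int(holes)


def lines_per_page(page_length_in, lpi):
    """Return the number of line positions on a page at a given vertical pitch."""
    return int(Fraction(page_length_in) * lpi)


# Assumed example: an 11 inch form.
print(holes_per_page(11))     # 22
print(lines_per_page(11, 6))  # 66
print(lines_per_page(11, 8))  # 88
```

As long as the form length is a whole number of holes, the tractors give you the same top of form on every page, whatever the line pitch.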

Friction feed is another matter, and accumulated form feed errors will present exactly the problems you describe. Many interesting listings probably don't have tractor holes.
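
A back-of-the-envelope sketch of how quickly that accumulates, assuming (pessimistically) that every form feed is off by the same amount in the same direction. The 1/100 inch per page figure is an invented illustration, not a measurement of any printer, and the function name is hypothetical.

```python
import math
from fractions import Fraction

ONE_LINE_AT_6_LPI = Fraction(1, 6)  # inches; tolerance of one text line


def pages_until_drift(error_per_page_in, tolerance_in=ONE_LINE_AT_6_LPI):
    """Pages that can be fed before the accumulated error exceeds the tolerance.

    Assumes the same signed error on every form feed (no cancellation).
    """
    error = abs(Fraction(error_per_page_in))
    if error == 0:
        return math.inf
    return math.floor(Fraction(tolerance_in) / error)


# Assumed example: 1/100 inch of slip per page.
print(pages_until_drift(Fraction(1, 100)))  # 16
```

With real friction feed the errors will be partly random, so this is only a worst-case bound, but it shows why a listing of hundreds of pages needs the top of form re-detected from the images rather than trusted.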

Problems feeding and stacking will prevent this from being an unattended operation. Perfs will be weak - both the perfs between the page and the tractor holes and the perfs between pages. A Data Products B1200 would sometimes break perfs on new paper. Some consumer model dot-matrix printers had gentle form feeds. Having a manual feed mode, requiring a button push for each page, would probably be a good idea.
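
A minimal sketch of what such a manual feed mode might look like in software. Everything here is hypothetical: `feed_page` and `take_photo` stand in for whatever actually drives the printer and triggers the camera, and the prompt text is just an example.

```python
def capture_listing(page_count, feed_page, take_photo, confirm=input):
    """Photograph and feed pages one at a time, pausing for the operator each page.

    feed_page()      -- hypothetical callable that advances the paper one form
    take_photo(page) -- hypothetical callable that returns an image for that page
    confirm(prompt)  -- asks the operator; defaults to the built-in input()
    """
    images = []
    for page in range(1, page_count + 1):
        answer = confirm(f"Page {page}: paper OK? [Enter = shoot, q = quit] ")
        if answer.strip().lower() == "q":
            break  # operator saw a jam, a torn perf, or a bad stack
        images.append(take_photo(page))
        feed_page()
    return images
```

The point of passing the feed and camera functions in is that the loop can be tried out with stand-ins long before any hardware is wired up.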

Careful camera setup up front would be required. Proper camera selection and position should minimize keystone, pincushion, barrel, and similar distortion. Even lighting would be required, much as would be used with a copy stand. I am somewhat more worried that the paper would be hanging loosely from the printer, not sandwiched between glass. The camera will need a small enough aperture to keep the entire sheet in focus while the paper does whatever the heck it wants to. It gets into the basic principle that you have to get the analog part right if you want to successfully digitize.
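
For a feel of how small "small enough" is, the usual thin-lens depth-of-field approximation can be sketched as below. The focal length, distance, and circle of confusion values are assumptions picked for illustration; plug in the numbers for whatever camera is actually used.

```python
import math


def depth_of_field(focal_mm, f_number, subject_mm, coc_mm=0.02):
    """Return (near, far) limits of acceptable focus in mm.

    Uses the standard hyperfocal-distance approximation.
    coc_mm is the circle of confusion; 0.02 mm is an assumed value.
    """
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near = subject_mm * (hyperfocal - focal_mm) / (hyperfocal + subject_mm - 2 * focal_mm)
    if subject_mm >= hyperfocal:
        far = math.inf
    else:
        far = subject_mm * (hyperfocal - focal_mm) / (hyperfocal - subject_mm)
    return near, far


# Assumed setup: 35 mm lens focused on paper 1 m away.
for n in (4, 8, 16):
    near, far = depth_of_field(35, n, 1000)
    print(f"f/{n}: sharp from {near:.0f} mm to {far:.0f} mm")
```

Stopping down widens the zone the paper can wander in, at the cost of needing more light or a longer exposure, which is one more reason the lighting has to be right.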

I probably did understate the effort. There would be work to do, but no part of it seems to be intractable to me. If worst came to worst and you had to ditch the printer's control electronics and drive the feed steppers directly, you would still be ahead by not having to build a paper transport from scratch.

I remember wanting one of those ThunderScan devices. In this case, I think that approach would cause more problems than it would solve.

OCRing the result, however you collect the images, is likely the hardest part anyway.



On 02/11/2018 03:10 PM, Timothe Litt wrote:
It's not that simple.  You need to deal with at least 2 common vertical pitches (6 & 8 LPI), and a number of page lengths (and widths).  These need to be set up per job; not all printers support all these.  Plus, misalignment (as Al noted, crossing the perforations at the bottom of a page is quite common).  The OP mentioned that his listings have a hard crease; this will cause (at least) feed and stacking problems.  Form feed causes a high-speed slew; this becomes less reliable as the distance moved increases.  You're proposing an entire page at a time - which means that the paper will jump off the tractors frequently.[1] Old paper is fragile.  Over hundreds of pages, dimensions may not be stable; it was not uncommon to have to re-adjust TOF after a while.  There's a fair bit of error detection and recovery to work out.

Lighting is an issue, as is compensating for keystoning and other misalignments.  Most cameras don't have a standard remote trigger interface - one of the pointers I provided loads modified firmware into cameras from one manufacturer to make this work.  If you look at digital camera reviews, you'll see that the lenses have varying degrees of artifacts, especially at the edges.  So you need to find and zoom to an area that's relatively "flat" & doesn't need a lot of correction.  While depth of field will help, it also will result in apparent font size changes as paper sways forward and back.  If you stop that, you simplify the OCR - and don't need as much depth of field.
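
The size of that font-size effect can be estimated with a simple pinhole-camera model, where image size is inversely proportional to distance. This is only a sketch under that assumption; the 1 m working distance and 20 mm of sway are made-up example figures, and the function name is hypothetical.

```python
def apparent_scale(nominal_mm, sway_mm):
    """Relative image size when the paper moves sway_mm away from (+) or toward (-) the camera.

    Pinhole model: image size is proportional to 1 / distance.
    """
    distance = nominal_mm + sway_mm
    if distance <= 0:
        raise ValueError("paper cannot be at or behind the camera")
    return nominal_mm / distance


# Assumed example: camera 1 m from the paper, paper swaying 20 mm either way.
toward = apparent_scale(1000, -20)
away = apparent_scale(1000, 20)
print(f"toward camera: {100 * (toward - 1):+.1f}% size change")
print(f"away from camera: {100 * (away - 1):+.1f}% size change")
```

A couple of percent sounds small, but across a full page of text it moves the bottom lines by a noticeable fraction of a line, so the OCR cannot assume a fixed line grid.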

There are many backgrounds that need to be subtracted for OCR to work.  (Printer paper was notorious for institutional logos, as well as bars and other aids to human readers.)  Then there are the other issues mentioned in my earlier note.

It seems simple, but it is a P.roject.  That's a capital P. With a lot of roject to work out.

It's worthwhile, but it's not simple.  It's a pretty interesting hardware (and software) project.  I don't mean to discourage anyone who wants to work on it - but you need to go in with eyes open, or you'll end up very, very frustrated.

Thunderscan tried to scan line by line & retrieve grayscale; the challenges were piecing together the adjacent lines with pixel resolution.   The focal distance was constant because the camera was on a carriage.  The idea here is to capture a page per frame.  So the registration problems are quite different.  One could try the thunderscan approach; it would trade one set of problems xxx "challenges and opportunities" for another.


