Re: OCR old software listing

2019-01-02 Thread Toby Thain via cctalk
On 2019-01-02 7:22 AM, Steve Malikoff via cctalk wrote:
> I timed myself how long it would take to clean up Mattis' supplied image so it might
> be able to be OCR'd more accurately. Using Paint.NET it took me 23 minutes to get to
> the following:
> http://web.aanet.com.au/~malikoff/pdp11/dvY9

Re: OCR old software listing

2019-01-02 Thread Steve Malikoff via cctalk
I timed myself how long it would take to clean up Mattis' supplied image so it might be able to be OCR'd more accurately. Using Paint.NET it took me 23 minutes to get to the following: http://web.aanet.com.au/~malikoff/pdp11/dvY973s_cleaned.png There are still a few little bits I missed, but hap

Re: OCR old software listing

2019-01-02 Thread Larry Kraemer via cctalk
The only way I've been able to get any type of readable ASCII TEXT from the .tif's is to do the following for each tif:

    convert -density 1200 -resize 40% xaaa.tif -density 1 xaaa120040.tif

Then, OCR it with Irfanview with the KADMOS Plugin Installed. For the first Page I get the following ASCII:
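The per-page conversion step quoted in this post could be scripted so it does not have to be typed for each page. The following is only a minimal sketch, not Larry's actual workflow: the helper names `convert_command` and `convert_pages` are hypothetical, the page naming pattern (`x???.tif`) is assumed from the single example file name in the post, and it assumes an ImageMagick installation that still provides the `convert` program (newer releases may prefer `magick`).

```python
import shutil
import subprocess
from pathlib import Path


def convert_command(src):
    """Build the argument list for the command quoted in the post, for one page.

    The output name appends "120040" to the stem, as in xaaa.tif -> xaaa120040.tif.
    """
    src = Path(src)
    dst = src.with_name(f"{src.stem}120040{src.suffix}")
    return ["convert", "-density", "1200", "-resize", "40%",
            str(src), "-density", "1", str(dst)]


def convert_pages(directory="."):
    """Run the command on every page named x???.tif; return the commands that ran.

    Returns an empty list if ImageMagick's convert is not on PATH (assumption:
    the tool is invoked as "convert").
    """
    if shutil.which("convert") is None:
        return []
    ran = []
    # The pattern matches xaaa.tif but not already-converted xaaa120040.tif.
    for page in sorted(Path(directory).glob("x[a-z][a-z][a-z].tif")):
        cmd = convert_command(page)
        subprocess.run(cmd, check=True)
        ran.append(cmd)
    return ran


if __name__ == "__main__":
    print(" ".join(convert_command("xaaa.tif")))
```

The OCR step itself (Irfanview with the KADMOS plugin) is a GUI operation and is not covered by this sketch.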

RE: OCR old software listing

2018-12-31 Thread Kevin Parker via cctalk
y, 1 January 2019 12:18 PM
To: dwight ; General Discussion: On-Topic and Off-Topic Posts
Subject: Re: OCR old software listing

> On Dec 31, 2018, at 7:13 PM, dwight via cctalk wrote:
>
> Fred is right, OCR is only worth it if the document is in perfect condition. I just finished getting a

Re: OCR old software listing

2018-12-31 Thread Fred Cisin via cctalk
have analysed what I could see and make a judgement, based on what I could see and the general context as I was typing it in.

Dwight

From: cctalk on behalf of Fred Cisin via cctalk
Sent: Monday, December 31, 2018 9:46 AM
To: General Discussion: On-Topic and Off-Topic Posts
Subject: Re: OCR old s

Re: OCR old software listing

2018-12-31 Thread Paul Koning via cctalk
> On Dec 31, 2018, at 7:13 PM, dwight via cctalk wrote:
>
> Fred is right, OCR is only worth it if the document is in perfect condition.
> I just finished getting an old 4004 listing working. I made only two mistakes
> on the 4K of code that were not the fault of the poorness of the listing.

Re: OCR old software listing

2018-12-31 Thread dwight via cctalk
-Topic and Off-Topic Posts
Subject: Re: OCR old software listing

On Mon, 31 Dec 2018, Larry Kraemer via cctalk wrote:
> I used the libtiff-tools (Debian 8.x - 32 Bit) to extract all 61 .TIF's
> from the Multipage .tif file. While the .tif's look decent, and
> RasterVect shows th

Re: OCR old software listing

2018-12-31 Thread Fred Cisin via cctalk
On Mon, 31 Dec 2018, Larry Kraemer via cctalk wrote:
> I used the libtiff-tools (Debian 8.x - 32 Bit) to extract all 61 .TIF's from the Multipage .tif file. While the .tif's look decent, and RasterVect shows the .tif properties to be Group 4 Fax (1bpp) with 5100 x 6600 pixels - 300 DPI, I can't

Re: OCR old software listing

2018-12-31 Thread Toby Thain via cctalk
On 2018-12-31 7:20 AM, Larry Kraemer via cctalk wrote:
> I used the libtiff-tools (Debian 8.x - 32 Bit) to extract all 61 .TIF's from the
> Multipage .tif file. While the .tif's look decent, and RasterVect shows the
> .tif properties to be Group 4 Fax (1bpp) with 5100 x 6600 pixels - 300 DPI,

Re: OCR old software listing

2018-12-31 Thread Larry Kraemer via cctalk
I used the libtiff-tools (Debian 8.x - 32 Bit) to extract all 61 .TIF's from the Multipage .tif file. While the .tif's look decent, and RasterVect shows the .tif properties to be Group 4 Fax (1bpp) with 5100 x 6600 pixels - 300 DPI, I can't get tesseract 3.x, TextBridge Classic 2.0, or Irfanview
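For anyone repeating the extraction step, here is a rough sketch of how it might be driven from a script. It assumes the libtiff tool in question is `tiffsplit` (the post only says "libtiff-tools"), which by default writes one file per page named with a prefix plus a three-letter suffix (xaaa.tif, xaab.tif, ...). The helper names `tiffsplit_names` and `split_multipage` and the input file name `scan.tif` are hypothetical.

```python
import itertools
import shutil
import string
import subprocess


def tiffsplit_names(page_count, prefix="x"):
    """Return the file names tiffsplit is expected to write for page_count pages.

    Suffixes run aaa, aab, ..., aaz, aba, ... in lexical order.
    """
    suffixes = itertools.product(string.ascii_lowercase, repeat=3)
    return [f"{prefix}{''.join(s)}.tif"
            for s in itertools.islice(suffixes, page_count)]


def split_multipage(src, prefix="x"):
    """Run tiffsplit on a multipage TIFF; return False if the tool is missing."""
    if shutil.which("tiffsplit") is None:
        return False
    subprocess.run(["tiffsplit", str(src), prefix], check=True)
    return True


if __name__ == "__main__":
    # 61 pages, as mentioned in the post; "scan.tif" is a placeholder name.
    names = tiffsplit_names(61)
    print(names[0], "...", names[-1])
    split_multipage("scan.tif")
```

Listing the expected names up front makes it easy to check afterwards that all 61 pages were actually written.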

Re: OCR old software listing.

2018-12-29 Thread Toby Thain via cctalk
On 2018-12-29 1:32 AM, Toby Thain via cctalk wrote:
> On 2018-12-29 12:47 AM, Toby Thain via cctalk wrote:
>> On 2018-12-26 4:29 PM, Mattis Lind via cctalk wrote:
>>> Finally I got hold of the sources for the PDP-11 SPACE WAR that was
>>> submitted to DECUS by Bill Seiler.
>>>
>>> The format is sca

Re: OCR old software listing.

2018-12-28 Thread Toby Thain via cctalk
On 2018-12-29 12:47 AM, Toby Thain via cctalk wrote:
> On 2018-12-26 4:29 PM, Mattis Lind via cctalk wrote:
>> Finally I got hold of the sources for the PDP-11 SPACE WAR that was
>> submitted to DECUS by Bill Seiler.
>>
>> The format is scans of the PAL-11S listing output. It is easy to crop the
>>

Re: OCR old software listing.

2018-12-28 Thread Toby Thain via cctalk
On 2018-12-26 4:29 PM, Mattis Lind via cctalk wrote:
> Finally I got hold of the sources for the PDP-11 SPACE WAR that was
> submitted to DECUS by Bill Seiler.
>
> The format is scans of the PAL-11S listing output. It is easy to crop the
> image to only contain actual source. Then running OCR on i

Re: OCR old software listing.

2018-12-27 Thread Paul Koning via cctalk
> On Dec 26, 2018, at 10:30 PM, Jon Elson via cctalk wrote:
>
> On 12/26/2018 03:29 PM, Mattis Lind via cctalk wrote:
>>
>> A good way to remove the black lines?
>>
>> https://i.imgur.com/dvY973s.png
>>
> Oh, boy! The printer was not properly aligned, so the lines actually o

Re: OCR old software listing.

2018-12-26 Thread Jon Elson via cctalk
On 12/26/2018 03:29 PM, Mattis Lind via cctalk wrote:
> A good way to remove the black lines?
>
> https://i.imgur.com/dvY973s.png

Oh, boy! The printer was not properly aligned, so the lines actually overlay the dot-matrix printed text! This is going to make OCR very difficult! I don't think

Re: OCR old software listing.

2018-12-26 Thread Eric Smith via cctalk
On Wed, Dec 26, 2018, 17:15 Chuck Guzis via cctalk wrote:
> On 12/26/18 3:17 PM, Al Kossow via cctalk wrote:
> > On 12/26/18 2:55 PM, Steve Malikoff via cctalk wrote:
> >> Scan them all as-is, put them up and 'crowd source' this list
> > And TYPE the programs in again
>
> I've found that it's oft

Re: OCR old software listing.

2018-12-26 Thread Kyle Owen via cctalk
On Wed, Dec 26, 2018 at 6:15 PM Chuck Guzis via cctalk <cctalk@classiccmp.org> wrote:
> On 12/26/18 3:17 PM, Al Kossow via cctalk wrote:
> >
> > And TYPE the programs in again
>
> I've found that it's often the best course of action and consumes the
> least time overall. You also have a better c

Re: OCR old software listing.

2018-12-26 Thread Chuck Guzis via cctalk
On 12/26/18 3:17 PM, Al Kossow via cctalk wrote:
>
> On 12/26/18 2:55 PM, Steve Malikoff via cctalk wrote:
>
>> Scan them all as-is, put them up and 'crowd source' this list
>
> And TYPE the programs in again

I've found that it's often the best course of action and consumes the least time ov

Re: OCR old software listing.

2018-12-26 Thread Al Kossow via cctalk
On 12/26/18 2:55 PM, Steve Malikoff via cctalk wrote:
> Scan them all as-is, put them up and 'crowd source' this list

And TYPE the programs in again

Re: OCR old software listing.

2018-12-26 Thread Steve Malikoff via cctalk
Mattis said
> Finally I got hold of the sources for the PDP-11 SPACE WAR that was
> submitted to DECUS by Bill Seiler.
>
> The format is scans of the PAL-11S listing output. It is easy to crop the
> image to only contain actual source. Then running OCR on it. Tried a few
> online versions and tesse

Re: OCR old software listing.

2018-12-26 Thread Toby Thain via cctalk
On 2018-12-26 4:29 PM, Mattis Lind via cctalk wrote:
> Finally I got hold of the sources for the PDP-11 SPACE WAR that was
> submitted to DECUS by Bill Seiler.
>
> The format is scans of the PAL-11S listing output. It is easy to crop the
> image to only contain actual source. Then running OCR on i

Re: OCR old software listing.

2018-12-26 Thread Will Cooke via cctalk
> On December 26, 2018 at 4:29 PM Mattis Lind via cctech wrote:
>
> Finally I got hold of the sources for the PDP-11 SPACE WAR that was
> submitted to DECUS by Bill Seiler.
>
> The format is scans of the PAL-11S listing output. It is easy to crop the
> image to only contain actual source.

OCR old software listing.

2018-12-26 Thread Mattis Lind via cctalk
Finally I got hold of the sources for the PDP-11 SPACE WAR that was submitted to DECUS by Bill Seiler. The format is scans of the PAL-11S listing output. It is easy to crop the image to only contain actual source. Then running OCR on it. Tried a few online versions and tesseract. The problem is t