It looks like you are losing 9 pixels from each array.
The easy answer would be to put them back. Ideally, you would try to put
some back on each side so that the result is not too far off center.
Thus:
hlines2=: 58 157 {. 58 _152 {. hlines
vlines2=: 58 157 {. _53 157 {. vlines
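For what it's worth, here is a small sketch of the arithmetic on made-up data. The names toy, h, v, h2 and v2 are hypothetical, and a 4 by 12 table with a window of 3 stands in for your 58 by 157 image with a window of 10. A window of length w gives n-w+1 results along the axis it slides over, so w-1 items go missing. Overtake ({.) pads with zeros, in front when the count is negative and behind when it is positive.

```j
NB. Hypothetical toy data: a 4 by 12 table of ones stands in for xb.
toy=: 4 12 $ 1
h=: 0 = 3 +/\"1 toy        NB. window of 3 along each row
v=: 0 = 3 +/\ toy          NB. window of 3 down the columns
$ h                        NB. 4 10: the column axis lost 3-1 = 2 items
$ v                        NB. 2 12: the row axis lost 2 items
NB. Put one item back on each side, mirroring the nested {. above.
h2=: 4 12 {. 4 _11 {. h    NB. 1 zero column in front, then 1 behind
v2=: 4 12 {. _3 12 {. v    NB. 1 zero row on top, then 1 below
$ h2                       NB. 4 12
$ v2                       NB. 4 12
```

With your sizes the same pattern gives 157-10+1 = 148 columns for hlines and 58-10+1 = 49 rows for vlines, hence the 9.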
Note that your matched content will be smaller than the original. That's
also fixable if it's an issue for you.
Good enough for now?
Thanks,
--
Raul
On Tue, Feb 18, 2014 at 9:12 PM, Joe Bogner wrote:
> I'm testing out OCRing PDF tables using Tesseract OCR. I'm borrowing the
> concept from here:
> http://craiget.com/extracting-table-data-from-pdfs-with-ocr/
>
> I've made good progress using the PPM examples on rosetta code and am
> having fun with it. (NOTE: I am starting off with an image resized down to
> 157x158. The original image is 5100x6601 -- 600 dpi)
>
> Here's an image of my progress.
> http://imgur.com/a/3fcKK
>
> I'm determining a "line" by seeing if the rolling sum of the previous 10
> points is zero. I don't want all black pixels, just the ones that
> constitute a line. I'm stuck because this simple approach is
> compressing the matrix with the infix, I think. I'm not yet savvy enough
> with matrices to figure out what to do from here.
>
> $ xb
> 58 157
> $ hlines
> 58 148
> $ vlines
> 49 157
>
> My next logical step (assuming the matrices were equal) was to essentially
> AND them together so that I had a combined image/matrix of black/white for
> the vertical and horizontal lines.
>
> I was then going to attempt to chop up the image like in the python blog
> post and feed it to Tesseract.
>
> Any tips on taking it further would be great. Thanks for the help.
>
> You can get the PPM here:
> https://www.dropbox.com/s/qoi1glkqs0tfezs/small.ppm
>
> require 'files'
>
> readppm=: monad define
> dat=. fread y NB. read from file
> msk=. 1 ,~ (*. 3 >: +/\) (LF&=@}: *. '#'&~:@}.) dat NB. mark field ends
> 't wbyh maxval dat'=. msk <;._2 dat NB. parse
> 'wbyh maxval'=. 2 1([ {. [: _99&". (LF,' ')&charsub)&.> wbyh;maxval NB. convert to numeric
> if. (_99 0 +./@e. wbyh,maxval) +. 'P6' -.@-: 2{.t do. _1 return. end.
> (a. i. dat) makeRGB |.wbyh NB. convert to basic bitmap format
> )
>
> makeRGB=: 0&$: : (($,)~ ,&3)
> fillRGB=: makeRGB }:@$
> setPixels=: (1&{::@[)`(<"1@(0&{::@[))`]}
> getPixels=: <"1@[ { ]
>
> NB. viewmat _50 (+ / % #) \ _50 (+ / % #)\"1 x2
>
> z=:readppm 'c:/temp/small.ppm'
>
> NB. compress the RGB into a single number
> x2=:+/"1 z
>
> NB. convert the RGB into a binary if it's black/white
> xb =: 500 <: x2
>
> hlines=:(10 (+/)\"1 xb) = 0
> vlines=:(10 (+/)\ xb) = 0
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>