Re: [tesseract-ocr] how can I get better results for this

Rick Leir Mon, 27 Oct 2014 05:38:34 -0700

Hi Rob,
My preprocessing is described in this post:
https://groups.google.com/forum/#!topic/tesseract-ocr/jONGSChLRv4


Maybe you would call it adaptive?

Thanks for mentioning Wolf. I tried to compile the latest version, but my
14.04 configuration must be wrong, because the link step fails:

 ~/ocr/wolf/binarizewolfjolion-src$ PKG_CONFIG_PATH=/usr/lib/pkgconfig/ make
g++ -I/usr/include/opencv binarizewolfjolion.cpp -o binarizewolfjolion `pkg-config opencv --libs`-lstdc++
/usr/bin/ld: /tmp/ccwdIqAh.o: undefined reference to symbol '_ZN2cv11_InputArrayC1ERKNS_3MatE'
//usr/lib/x86_64-linux-gnu/libopencv_core.so.2.4: error adding symbols: DSO missing from command line


On Friday, October 17, 2014 1:33:47 PM UTC-4, rkomar wrote:
>
> On Fri, 17 Oct 2014, Rick Leir wrote: 
>
> > I opened the jpg in Gimp, and you can see that it is about 
> > 100 pixels per text line: 
> > 
> > [gimpOriginal.png] 
>
> That image looks to be scanned at about 150 dpi.  With 
> such faint characters, scanning at 300 or 600 dpi would 
> have been better.  Anyway, try scaling the images up 
> by a factor of two.  Also try an "adaptive binarization" 
> algorithm to convert to black and white.  Google 
> "wolf binarization" for one example of such an 
> algorithm.  I tried myself on your example image, and 
> although it still didn't look that great, I can imagine 
> how bad it would look if a threshold binarization 
> algorithm was used. 
>
> Rob Komar 
>
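As a rough illustration of the two suggestions quoted above (scale up by a factor of two, then binarize adaptively), here is a minimal pure-Python sketch. It is not the Wolf algorithm: it swaps in a much simpler local-mean threshold, and the function names, the `radius` and `offset` parameters, and the synthetic test row are all assumptions made up for this example. Images are plain lists of rows of 0-255 gray values.

```python
def upscale2x(img):
    """Nearest-neighbour 2x enlargement: every pixel becomes a 2x2 block."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def global_binarize(img, threshold=128):
    """Plain threshold: one fixed cut-off for the whole image."""
    return [[0 if p < threshold else 255 for p in row] for row in img]


def local_mean_binarize(img, radius=2, offset=10):
    """Adaptive threshold: a pixel turns black (0) only if it is more than
    `offset` gray levels darker than the mean of its neighbourhood, a square
    window of side 2*radius+1 clipped at the image borders."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            window = [img[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            mean = sum(window) / len(window)
            row.append(0 if img[y][x] < mean - offset else 255)
        out.append(row)
    return out


# Synthetic "scan": the background brightens from left to right, with two
# faint strokes (40 gray levels darker than the background) at columns 2 and 9.
row = [90 + 12 * x for x in range(12)]
row[2] -= 40
row[9] -= 40
page = [list(row) for _ in range(5)]

black_cols_global = [x for x, p in enumerate(global_binarize(page)[0]) if p == 0]
black_cols_local = [x for x, p in enumerate(local_mean_binarize(page)[0]) if p == 0]
print("global:", black_cols_global)
print("local: ", black_cols_local)
```

On this synthetic row the fixed threshold blackens the whole dark left edge and misses the stroke on the bright side, while the local-mean version keeps only the two strokes. For a real scan you would load the image with an imaging library, call `upscale2x` (or a smoother interpolation) first, and probably use a larger `radius`.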
