Hi Rob,

My preprocessing is mentioned in this post:
https://groups.google.com/forum/#!topic/tesseract-ocr/jONGSChLRv4
Maybe you would call it adaptive?

Thanks for mentioning Wolf. I tried to compile the latest version, but my 14.04 config must be wrong and I get:

~/ocr/wolf/binarizewolfjolion-src$ PKG_CONFIG_PATH=/usr/lib/pkgconfig/ make
g++ -I/usr/include/opencv binarizewolfjolion.cpp -o binarizewolfjolion `pkg-config opencv --libs`-lstdc++
/usr/bin/ld: /tmp/ccwdIqAh.o: undefined reference to symbol '_ZN2cv11_InputArrayC1ERKNS_3MatE'
//usr/lib/x86_64-linux-gnu/libopencv_core.so.2.4: error adding symbols: DSO missing from command line

On Friday, October 17, 2014 1:33:47 PM UTC-4, rkomar wrote:
>
> On Fri, 17 Oct 2014, Rick Leir wrote:
>
> > I opened the jpg in Gimp, and you can see that it is about
> > 100 pixels per text line:
> >
> > [gimpOriginal.png]
>
> That image looks to be scanned at about 150 dpi. With
> such faint characters, scanning at 300 or 600 dpi would
> have been better. Anyway, try scaling the images up
> by a factor of two. Also try an "adaptive binarization"
> algorithm to convert to black and white. Google
> "wolf binarization" for one example of such an
> algorithm. I tried myself on your example image, and
> although it still didn't look that great, I can imagine
> how bad it would look if a threshold binarization
> algorithm was used.
>
> Rob Komar
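For anyone following along, here is a rough sketch of the two steps Rob suggests: scaling up by a factor of two and an adaptive (local) binarization. It is not the binarizewolfjolion program itself, only a pure-Python illustration that assumes the threshold formula as published by Wolf and Jolion, T = (1 - k)*m + k*M + k*(s/R)*(m - M), where m and s are the local mean and standard deviation, M is the darkest gray level in the image and R is the largest local standard deviation. The function names (`upscale2x`, `wolf_binarize`) and the defaults `radius=7` and `k=0.5` are my own choices for illustration, and the image is assumed to be a list of rows of 0-255 gray values.

```python
import math


def upscale2x(img):
    """Nearest-neighbour 2x enlargement (a real pipeline would interpolate)."""
    out = []
    for row in img:
        wide = [p for p in row for _ in (0, 1)]  # repeat each pixel twice
        out.append(wide)
        out.append(list(wide))                   # repeat each row twice
    return out


def integral(img, power=1):
    """Summed-area table of img**power, padded with a zero row and column."""
    h, w = len(img), len(img[0])
    table = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x] ** power
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def window_sum(table, y0, x0, y1, x1):
    """Sum over rows y0..y1-1 and columns x0..x1-1."""
    return table[y1][x1] - table[y0][x1] - table[y1][x0] + table[y0][x0]


def wolf_binarize(img, radius=7, k=0.5):
    """Local thresholding; radius and k are illustrative defaults."""
    h, w = len(img), len(img[0])
    s1, s2 = integral(img, 1), integral(img, 2)
    min_gray = min(min(row) for row in img)      # M in the formula
    means = [[0.0] * w for _ in range(h)]
    stds = [[0.0] * w for _ in range(h)]
    max_std = 0.0                                # R in the formula
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            n = (y1 - y0) * (x1 - x0)            # window is clipped at borders
            m = window_sum(s1, y0, x0, y1, x1) / n
            var = max(0.0, window_sum(s2, y0, x0, y1, x1) / n - m * m)
            s = math.sqrt(var)
            means[y][x], stds[y][x] = m, s
            max_std = max(max_std, s)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            m, s = means[y][x], stds[y][x]
            ratio = s / max_std if max_std > 0 else 0.0
            t = (1 - k) * m + k * min_gray + k * ratio * (m - min_gray)
            row.append(255 if img[y][x] >= t else 0)
        out.append(row)
    return out
```

The point of the local threshold is that a faint stroke on a bright part of the page can be lighter than the bare paper on a dark part, so no single global cutoff separates text from background. If the image is scaled up first, the window radius would presumably need to grow with it.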

