Straighten (deskew) the image before sending it to Tesseract. You can use
Scan Tailor or unpaper.
ImageMagick may also work; it has a -deskew option, but you'll have to
experiment with it.
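
If you want to script the straightening step, here is a rough sketch in Python. It assumes the ImageMagick "convert" binary (called "magick" in ImageMagick 7) and the "tesseract" binary are both on your PATH. The function names, the file names and the 40% deskew threshold are only illustrative choices, not anything from your setup.

```python
import subprocess


def deskew_command(src, dst, threshold="40%"):
    """Build an ImageMagick command that straightens src and writes dst.

    The 40% threshold is an assumed starting point; tune it for your images.
    """
    return ["convert", src, "-deskew", threshold, dst]


def ocr_command(image, out_base):
    """Build a Tesseract command; Tesseract writes its text to out_base + '.txt'."""
    return ["tesseract", image, out_base]


def deskew_and_ocr(src, work_image="deskewed.png", out_base="result"):
    """Deskew src, OCR the straightened copy, and return the recognized text."""
    subprocess.run(deskew_command(src, work_image), check=True)
    subprocess.run(ocr_command(work_image, out_base), check=True)
    with open(out_base + ".txt", encoding="utf-8") as fh:
        return fh.read()
```

Calling deskew_and_ocr("label.jpg") would run both tools in sequence. Scan Tailor or unpaper could be swapped in for the first command if ImageMagick's deskew is not good enough.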

See the attached images (output from Scan Tailor), which I then OCRed using
VietOCR (a GUI frontend to Tesseract). The OCR output was:


MODEL NAME 7
MOORE RF28HMEDBSR

ml.“
| mt RFQBHMEDBSH


MODEL NAML I
MODELE I RF34H996084
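
The output above is still noisy, so one possible follow-up is to compare each long token against your list of known model numbers and keep the closest match. The sketch below uses Python's standard difflib module. The function names, the 8-character minimum, the 0.6 cutoff and the placeholder list entry are all assumptions for illustration; the real list would come from your words file.

```python
import difflib
import re

# Hypothetical list; in practice, load it from the file of known model numbers.
KNOWN_MODELS = ["RF28HMEDBSR", "RF00EXAMPLE"]


def candidate_tokens(ocr_text, min_len=8):
    """Return upper-cased alphanumeric runs long enough to be a model number."""
    return [t for t in re.findall(r"[A-Z0-9]+", ocr_text.upper()) if len(t) >= min_len]


def best_model(ocr_text, known=KNOWN_MODELS, cutoff=0.6):
    """Return the known model closest to any OCR token, or None if nothing is close."""
    best, best_score = None, 0.0
    for token in candidate_tokens(ocr_text):
        for match in difflib.get_close_matches(token, known, n=1, cutoff=cutoff):
            score = difflib.SequenceMatcher(None, token, match).ratio()
            if score > best_score:
                best, best_score = match, score
    return best


print(best_model("MODEL NAME 7\nMOORE RF28HMEDBSR"))
```

A stricter cutoff reduces false matches between similar model numbers, at the cost of rejecting more misread labels.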


ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

On Thu, Nov 13, 2014 at 3:30 AM, Bill Garrison <[email protected]>
wrote:

> So if someone sends in labels like the attached ones, I need to grab the
> model number. So far results from straight tesseract usage are dismal. I
> used an ImageMagick library to clean up the image a bit and send it in and
> if it's rotated at ALL the results are still dismal. Overall, I am just
> looking to increase accuracy.
>
> Steps I have taken:
>
> 1) Used a pre-processing library to clean up the image
> 2) Added a new config that turns off the dictionary and loads a words file
> that has all the different Samsung model numbers in it
> 3) Tried to take my most promising pre-processed image and create a box
> file and then used "tesseract <image_name> <box_file_name> nobatch
> box.train" to train tesseract to not miss the two characters it missed.
> This caused a segmentation fault.
>
> Any hints or advice about how I can use tesseract to grab this information
> with at least 50% accuracy would be GREATLY appreciated.
>
> Thanks!!
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/aeb92e24-faa7-4a08-bcca-e7ab0c225776%40googlegroups.com
> .
> For more options, visit https://groups.google.com/d/optout.
>

