Hi,

I played with textcleaner:

http://www.fmwconcepts.com/imagemagick/textcleaner/

I used these options:

textcleaner -g -e stretch -f 25 -o 10 -u -s 1 -T -p 10 -t 80 page_0003.jpg \
    page_0003_clean.jpg

The "-t 80" option is what helped. From the textcleaner usage text:

-t .... threshold ....... text smoothing threshold; 0<=threshold<=100;
......................... nominal value is about 50; default is no smoothing

It thins the lines enough to make a difference for tesseract on the 
run-together characters.

I tried several settings from 50 to 100, and 80 worked best for me.
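
A sweep like that can be scripted rather than run by hand. The following is 
only a rough sketch, not exactly what I ran: the function name 
sweep_commands, the step of 10, and the "_tNN" output names are made up for 
illustration, and it assumes textcleaner and tesseract are both on the PATH. 
It only prints the commands, so nothing is touched until you choose to run 
them.

```shell
#!/bin/sh
# Print one textcleaner + tesseract command pair per -t value.
# Nothing is executed here; the commands are only written to stdout.
# sweep_commands and the _tNN naming are illustrative, not part of either tool.
sweep_commands() {
    img=$1                 # input image, e.g. page_0003.jpg
    base=${img%.*}         # input name without its extension
    for t in 50 60 70 80 90 100; do
        # Same options as above; only the smoothing threshold changes.
        echo "textcleaner -g -e stretch -f 25 -o 10 -u -s 1 -T -p 10 -t ${t} ${img} ${base}_t${t}.jpg"
        # tesseract writes its text to <outputbase>.txt
        echo "tesseract ${base}_t${t}.jpg ${base}_t${t}"
    done
}

sweep_commands page_0003.jpg
```

Piping the output to sh runs the whole sweep, after which the 
page_0003_tNN.txt files can be compared side by side. File names containing 
spaces would need quoting that this sketch does not do.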

It's still only about 75%, way below what tesseract handles on the normal 
text I have, but it's going to work out.

Thanks for the help,

Stuart



On Thursday, September 12, 2013 10:18:44 PM UTC-4, rkomar wrote:
>
> On Thu, 12 Sep 2013, Stuart wrote: 
>
> > Automatically subdividing each image into character cells 
> > and OCR'ing each character separately seems like the only 
> > way out of this. I am experimenting with makebox to define 
> > the boxes first. 
>
> Argh!  When I read "proportional font" I thought 
> "monospace font", assuming that that was what the code 
> had been printed in.  That was why I suggested creating 
> the character cells, because it would be easy then. 
> I'm not sure it's worth trying to figure out where 
> the bounds of each character are, in your case. 
> Sorry for reading the problem incorrectly. 
>
> Rob 
>

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
