I’ve been using PSM_SINGLE_CHAR as the PageSegMode, and I wasn’t able to
get any choices from ChoiceIterator.

I worked around this by changing this line in textord.cpp (in release # 724
this is line # 336):

  if (PSM_LINE_FIND_ENABLED(pageseg_mode))

to

  if (PSM_LINE_FIND_ENABLED(pageseg_mode) || pageseg_mode == PSM_SINGLE_CHAR)

I don’t know whether this has any side effects, but so far it is working
for me.
If this is not a bug, it might be worth introducing a variable that
enables this behavior.

Without this change I was having big problems: when trying to recognize a
single char, I only got the inner contour of the Q recognized
(see my example below). Now I get both options and can decide between
them based on the coordinates that GetBoxText() returns.
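
A minimal sketch of that decision step, in case it helps someone else.
It assumes the text from GetBoxText() uses the usual box layout
"<symbol> <left> <bottom> <right> <top> <page>", one line per box, and it
assumes the outer contour is simply the box with the largest area. The
names CharBox, parse_boxes and pick_outer_box are made up for this
example; they are not part of Tesseract.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One parsed line of box text (assumed layout:
// "<symbol> <left> <bottom> <right> <top> <page>").
struct CharBox {
    std::string symbol;
    int left = 0, bottom = 0, right = 0, top = 0, page = 0;

    // Box area in pixels; box coordinates have their origin at the
    // bottom-left, so top is expected to be >= bottom.
    long area() const {
        return static_cast<long>(right - left) * (top - bottom);
    }
};

// Parses box text line by line, skipping lines that do not match the layout.
std::vector<CharBox> parse_boxes(const std::string& box_text) {
    std::vector<CharBox> boxes;
    std::istringstream lines(box_text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        CharBox b;
        if (fields >> b.symbol >> b.left >> b.bottom >> b.right >> b.top >> b.page) {
            boxes.push_back(b);
        }
    }
    return boxes;
}

// Hypothetical heuristic: treat the largest box as the outer contour.
// Returns the index of that box, or -1 if there are no boxes.
int pick_outer_box(const std::vector<CharBox>& boxes) {
    int best = -1;
    for (std::size_t i = 0; i < boxes.size(); ++i) {
        if (best < 0 || boxes[i].area() > boxes[best].area()) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

In use, the string returned by GetBoxText() for each reading would be
passed to parse_boxes(), and pick_outer_box() over the combined boxes
would select the reading to keep. A different rule (for example, the box
closest to the expected character size) may suit other images better.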

These are my two results, for clarification:

Reading 1:
https://docs.google.com/file/d/0BxkuvS_LuBAzYm9IUDVKVDJPaUk/edit?usp=sharing

Reading 2:
https://docs.google.com/file/d/0BxkuvS_LuBAzVkdBaHRmNWtYMW8/edit?usp=sharing

By the way, I’ve found a few variables that were very useful for debugging
this:
tessedit_dump_choices
tessedit_debug_quality_metrics
tessedit_debug_doc_rejection

Finally, I have two questions for the list:

1)    I would have expected the results iterator to return more choices for
characters like “Q”, which is very close to an “O”. Is there a way to
increase the number of choices?

2)    I trained for a specific font (FE Schrift) by printing it on paper and
then scanning it. But as this project is for capturing with a camera, I now
need to improve the training with character images from real captures, which
turn out to look different. Should I use a separate TIFF page for them,
as if they were a different font? Or can I include them in the same single
page?

This follows up on my previous post:
https://groups.google.com/forum/?fromgroups#!topic/tesseract-ocr/et7bS5QRf2o


Thanks,

Andres Hurtis – www.visiondepatentes.com.ar - sorry for this, I’m in need
of SEO :)

-- 
-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en