Does nobody have any clues here? Any suggestions of where else I
could ask, or where to go in the code to work it out for myself?
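
For anyone who wants to compare the two files quickly, here is a minimal
Python sketch (illustration only, not part of Tesseract) that prints the
5th and 7th whitespace-separated fields of each unicharset entry. The
function name `column_report` is hypothetical, and the reading of the 7th
column as the opposite-case id is only my assumption from the quoted
message below, not something I have confirmed in the code.

```python
def column_report(lines):
    """Return (symbol, col5, col7) for each entry that has at least 7 fields.

    Fields are kept as strings so that unexpected values do not raise.
    What the columns mean is an assumption; this only extracts them.
    """
    report = []
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # e.g. the leading count line, or entries with fewer fields
        report.append((fields[0], fields[4], fields[6]))
    return report


# The four sample lines quoted in the message below.
provided = [
    "Α 5 39,70,132,255,39,204,0,44,52,288 Greek 25 0 101 Α>--# Α [391 ]A",
    "α 3 59,72,188,200,98,175,0,67,102,288 Greek 101 0 25 α>-# α [3b1 ]a",
]
generated = [
    "Α 5 0,255,0,255,0,32767,0,32767,0,32767 NULL 777 0 0 #>-# Α [391 ]A",
    "α 3 0,255,0,255,0,32767,0,32767,0,32767 NULL 766 0 0 #>-# α [3b1 ]a",
]

for name, lines in (("provided", provided), ("generated", generated)):
    for symbol, col5, col7 in column_report(lines):
        print(name, symbol, "col5=" + col5, "col7=" + col7)
```

To run it over a whole file, pass the open file object instead of a
list, e.g. `column_report(open("unicharset", encoding="utf-8"))`.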

Thanks again.

Nick

On Wed, May 23, 2012 at 05:33:37PM +0100, Nick White wrote:
> Hi again,
> 
> I recently added a wordlist to my training, and was disappointed to
> find that it didn't seem to substantially improve the results. I
> suspect this is in significant part due to the unicharset not
> recognising equivalent upper and lower case letters (and hence not
> matching dictionary words case insensitively).
> 
> Examining the provided unicharset file for ell.traineddata I see
> that the 7th column appears to refer to the id of the opposite case
> letter. So for example the two lines:
> 
> Α 5 39,70,132,255,39,204,0,44,52,288 Greek 25 0 101 Α>--# Α [391 ]A
> α 3 59,72,188,200,98,175,0,67,102,288 Greek 101 0 25 α>-# α [3b1 ]a
> 
> refer to each other as 101 and 25 respectively.
> 
> However my generated unicharset file includes no such references,
> with the 7th column being always 0. For example:
> 
> Α 5 0,255,0,255,0,32767,0,32767,0,32767 NULL 777 0 0 #>-# Α [391 ]A
> α 3 0,255,0,255,0,32767,0,32767,0,32767 NULL 766 0 0 #>-# α [3b1 ]a
> 
> Should this case information be handled automatically when the
> unicharset is created? If so, any clues as to how I might go about
> tracking down why it isn't working? If not, could a note about it be
> added to the wiki when it's updated for 3.02?
> 
> Thanks for any advice,
> 
> Nick
> 
> -- 
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en

