Hello there, Sorry if I sound naive, but I think the original question has not been answered yet: how do we include our own word list? After going through the FAQ page, I found that we can put our eng.user-words file in the tessdata folder.
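For anyone reproducing this, the word list file can be written with a short script. This is only a sketch in Python: the helper name `write_user_words` and the temporary directory standing in for the tessdata folder are made up for illustration, and the one-word-per-line layout is an assumption about what Tesseract expects rather than something confirmed in this thread.

```python
import tempfile
from pathlib import Path


def write_user_words(tessdata_dir, words, lang="eng"):
    """Write <lang>.user-words into tessdata_dir, one word per line, UTF-8.

    Hypothetical helper: the one-word-per-line layout is an assumption.
    """
    path = Path(tessdata_dir) / f"{lang}.user-words"
    # Drop blank entries and duplicates while keeping the original order.
    cleaned = list(dict.fromkeys(w.strip() for w in words if w.strip()))
    # newline="\n" avoids CRLF line endings on Windows, and "utf-8"
    # (unlike "utf-8-sig") writes no byte order mark.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write("\n".join(cleaned) + "\n")
    return path


if __name__ == "__main__":
    # A temporary directory stands in for the real tessdata folder.
    with tempfile.TemporaryDirectory() as tessdata:
        p = write_user_words(tessdata, ["abcdefghijklmnopqrstuvwxyz"])
        print(p.name, p.read_text(encoding="utf-8").splitlines())
```

Writing the file this way only rules out encoding and line-ending problems; it does not by itself show that Tesseract loads the list.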
I did exactly that. To test whether it works, I put the characters a through z as a single word in the eng.user-words file and saved it with UTF-8 encoding. Then I made an image in Paint containing the characters a through z as one word (in different fonts on different lines of the same image) and ran OCR on it. Unfortunately, it did not correct the output, even when only a single character out of a through z was misrecognized.

Could you please let me know if I am doing something wrong, or whether I need to retrain using my user-words? I would be grateful for an early reply.

Thanks and kind regards,
Parmeet

On May 10, 7:21 am, Max Cantor <[email protected]> wrote:
> Ok, I found the problem. the fix is described here:
> http://code.google.com/p/tesseract-ocr/issues/detail?id=356
>
> the output dir needs to end in a period.
>
> my bad.
>
> max
>
> On May 9, 2011, at 3:30 PM, zdenko podobny wrote:
>
> > no problem :-) I think you will like option "-o" too.
>
> > Zdenko
>
> > On Mon, May 9, 2011 at 8:27 AM, Max Cantor <[email protected]> wrote:
> > I feel really dumb now. Sorry for the bother.
>
> > Thanks, max
>
> > On May 9, 2011, at 14:01, zdenko podobny <[email protected]> wrote:
>
> >> Please try to read (to look is not enough ;-) ) [1] :
>
> >> // Specify option -u to unpack all the components to the specified path:
> >> //
> >> // combine_tessdata -u tessdata/eng.traineddata /home/$USER/temp/eng.
> >> //
> >> // This will create /home/$USER/temp/eng.* files with individual tessdata
> >> // components from tessdata/eng.traineddata.
> >> //
> >> [1]http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Puttin...
>
> >> On Mon, May 9, 2011 at 2:01 AM, Max Cantor <[email protected]> wrote:
> >> I was looking at that, but can't find the other component files in the
> >> source tree. is there somewhere to get the component files for the
> >> eng.traineddata?
>
> >> sorry if i'm missing something obvious...
>
> >> max
>
> >> On May 9, 2011, at 1:40 AM, zdenko podobny wrote:
>
> >> > see [1] or user-words on the same page.
>
> >> > [1]http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Puttin...
>
> >> > Zdenko
>
> >> > On Sun, May 8, 2011 at 5:53 PM, Max Cantor <[email protected]> wrote:
> >> > Is there a way to set up a custom wordlist without going through the
> >> > entire retraining process? our wordlists will change a bit at runtime,
> >> > so if there is an API variable to set, that would be perfect for us.
>
> >> > Thanks,
> >> > Max
>
> >> > Keep up the good work!
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

