Hi,

Well, it was answered well enough that I was able to build my own xxx.traineddata file. Unfortunately, even with that traineddata file, I'm running into the same problem you are: I can't get tesseract to use the freq-dawg that I included. I've been digging through the source code to find the right config setting but haven't succeeded yet. I'll let you and the group know when I do!
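
In case it helps anyone else hitting the trailing-period problem from earlier in this thread, here is a minimal Python sketch of how one might build the combine_tessdata command lines so that the component prefix always ends in a period. The helper names (component_prefix, unpack_command, combine_command) and the example paths are only illustrative and are not part of tesseract. The only real tool referenced is combine_tessdata with its -u option, and the sketch just assembles the commands rather than running them.

```python
def component_prefix(directory, lang):
    """Build the '<directory>/<lang>.' prefix (hypothetical helper).

    combine_tessdata treats this argument as a filename prefix, so it
    must end in a period (e.g. /home/user/temp/eng.), not in a slash.
    """
    return directory.rstrip("/") + "/" + lang + "."


def unpack_command(traineddata, directory, lang):
    """Command that unpacks every component of a .traineddata file."""
    return ["combine_tessdata", "-u", traineddata,
            component_prefix(directory, lang)]


def combine_command(directory, lang):
    """Command that packs <directory>/<lang>.* into <lang>.traineddata."""
    return ["combine_tessdata", component_prefix(directory, lang)]


if __name__ == "__main__":
    # Example paths are assumptions; adjust them to your own setup.
    print(" ".join(unpack_command("tessdata/eng.traineddata",
                                  "/home/user/temp", "eng")))
    print(" ".join(combine_command("/home/user/temp", "eng")))
```

The returned lists can be handed to subprocess.run once the tesseract training tools are on the PATH. Building the prefix in one place means the period cannot be forgotten on either the unpack or the combine step.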

thanks,
max

On May 10, 2011, at 4:32 PM, Parmeet wrote:

> Hello there,
>
> Sorry if I sound naive, but I think the original question has not been
> answered yet: how to include our own word list. After going through the
> FAQ page, I found that we can put our eng.user-words file in the
> tessdata folder.
>
> I did exactly that. To test whether it works, I put the characters a
> through z as a single word in the eng.user-words file and saved it with
> UTF-8 encoding. Then I made an image in Paint containing the characters
> a through z as one word (in different fonts on different lines of the
> same image) and ran OCR on it. Unfortunately, it did not correct the
> output, even when only a single character among a through z was
> wrongly identified. Could you please let me know if I am doing
> something wrong, or if I somehow need to retrain using my user-words?
>
> I shall be grateful for an early reply.
>
> Thanks and Kind Regards
> Parmeet
>
> On May 10, 7:21 am, Max Cantor <[email protected]> wrote:
>> Ok, I found the problem. The fix is described here:
>> http://code.google.com/p/tesseract-ocr/issues/detail?id=356
>>
>> The output dir needs to end in a period.
>>
>> my bad.
>>
>> max
>>
>> On May 9, 2011, at 3:30 PM, zdenko podobny wrote:
>>
>>> no problem :-) I think you will like option "-o" too.
>>>
>>> Zdenko
>>>
>>> On Mon, May 9, 2011 at 8:27 AM, Max Cantor <[email protected]> wrote:
>>> I feel really dumb now. Sorry for the bother.
>>>
>>> Thanks, max
>>>
>>> On May 9, 2011, at 14:01, zdenko podobny <[email protected]> wrote:
>>>
>>>> Please try to read (to look is not enough ;-) ) [1] :
>>>>
>>>> // Specify option -u to unpack all the components to the specified path:
>>>> //
>>>> // combine_tessdata -u tessdata/eng.traineddata /home/$USER/temp/eng.
>>>> //
>>>> // This will create /home/$USER/temp/eng.* files with individual tessdata
>>>> // components from tessdata/eng.traineddata.
>>>> //
>>>> [1]http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Puttin...
>>>>
>>>> On Mon, May 9, 2011 at 2:01 AM, Max Cantor <[email protected]> wrote:
>>>> I was looking at that, but can't find the other component files in the
>>>> source tree. Is there somewhere to get the component files for
>>>> eng.traineddata?
>>>>
>>>> sorry if i'm missing something obvious...
>>>>
>>>> max
>>>>
>>>> On May 9, 2011, at 1:40 AM, zdenko podobny wrote:
>>>>
>>>>> see [1] or user-words on the same page.
>>>>>
>>>>> [1]http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Puttin...
>>>>>
>>>>> Zdenko
>>>>>
>>>>> On Sun, May 8, 2011 at 5:53 PM, Max Cantor <[email protected]> wrote:
>>>>> Is there a way to set up a custom wordlist without going through the
>>>>> entire retraining process? Our wordlists will change a bit at runtime,
>>>>> so if there is an API variable to set, that would be perfect for us.
>>>>>
>>>>> Thanks,
>>>>> Max
>>>>>
>>>>> Keep up the good work!
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To post to this group, send email to [email protected]
>>>>> To unsubscribe from this group, send email to
>>>>> [email protected]
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/tesseract-ocr?hl=en

