Hi there,

@Max: Thanks, I hope you find the solution soon.
@Admin: It would be great if you could suggest something, as I think correcting the output against the user's own word list is a very important and useful feature.

Thanks and Regards,
Parmeet

On May 10, 1:51 pm, Max Cantor <[email protected]> wrote:
> Hi,
>
> Well, it was answered enough, in that I was able to make my own xxx.traineddata file. Unfortunately, even with that traineddata file, I'm running into the same problem that you are: I can't seem to get Tesseract to use the freq-dawg that I included. I've been digging through the source code to find the right config but haven't succeeded yet. I'll let you and the group know when I do!
>
> thanks,
> max
>
> On May 10, 2011, at 4:32 PM, Parmeet wrote:
>
> > Hello there,
> >
> > Sorry if I sound naive, but I think the original question has not been answered yet, namely how to include our own word list. After going through the FAQ page, I found that we can put our eng.user-words file in the tessdata folder.
> >
> > I did exactly that, and to test whether it works I put the characters a through z as a single word in the eng.user-words file and saved it with UTF-8 encoding. Then I made an image in Paint containing the characters a through z as one word (in different fonts on different lines of the same image) and ran OCR on it. Unfortunately, it did not correct the output, even when only a single character out of a through z was misrecognized. Could you please let me know whether I am doing something wrong, or whether I somehow need to retrain using my user-words?
> >
> > I would be grateful for an early reply.
> >
> > Thanks and Kind Regards
> > Parmeet
> >
> > On May 10, 7:21 am, Max Cantor <[email protected]> wrote:
> >> Ok, I found the problem. The fix is described here:
> >> http://code.google.com/p/tesseract-ocr/issues/detail?id=356
> >>
> >> The output dir needs to end in a period.
> >>
> >> My bad.
> >>
> >> max
> >>
> >> On May 9, 2011, at 3:30 PM, zdenko podobny <[email protected]> wrote:
> >>> No problem :-) I think you will like option "-o" too.
> >>>
> >>> Zdenko
> >>>
> >>> On Mon, May 9, 2011 at 8:27 AM, Max Cantor <[email protected]> wrote:
> >>> I feel really dumb now. Sorry for the bother.
> >>>
> >>> Thanks, max
> >>>
> >>> On May 9, 2011, at 14:01, zdenko podobny <[email protected]> wrote:
> >>>> Please try to read [1] (just looking is not enough ;-)):
> >>>>
> >>>> // Specify option -u to unpack all the components to the specified path:
> >>>> //
> >>>> // combine_tessdata -u tessdata/eng.traineddata /home/$USER/temp/eng.
> >>>> //
> >>>> // This will create /home/$USER/temp/eng.* files with individual tessdata
> >>>> // components from tessdata/eng.traineddata.
> >>>> //
> >>>> [1] http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Puttin...
> >>>>
> >>>> On Mon, May 9, 2011 at 2:01 AM, Max Cantor <[email protected]> wrote:
> >>>> I was looking at that, but I can't find the other component files in the source tree. Is there somewhere to get the component files for eng.traineddata?
> >>>>
> >>>> Sorry if I'm missing something obvious...
> >>>>
> >>>> max
> >>>> On May 9, 2011, at 1:40 AM, zdenko podobny <[email protected]> wrote:
> >>>>> See [1], or the user-words section on the same page.
> >>>>>
> >>>>> [1] http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Puttin...
> >>>>>
> >>>>> Zdenko
> >>>>>
> >>>>> On Sun, May 8, 2011 at 5:53 PM, Max Cantor <[email protected]> wrote:
> >>>>> Is there a way to set up a custom wordlist without going through the entire retraining process? Our wordlists will change a bit at runtime, so if there is an API variable to set, that would be perfect for us.
> >>>>>
> >>>>> Thanks,
> >>>>> Max
> >>>>>
> >>>>> Keep up the good work!
> >>>>>
> >>>>> --
> >>>>> You received this message because you are subscribed to the Google
> >>>>> Groups "tesseract-ocr" group.
> >>>>> To post to this group, send email to [email protected]
> >>>>> To unsubscribe from this group, send email to
> >>>>> [email protected]
> >>>>> For more options, visit this group at
> >>>>> http://groups.google.com/group/tesseract-ocr?hl=en
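The fix Max points to (issue 356) becomes clearer once you notice that the last argument to `combine_tessdata -u` acts as a file-name prefix rather than a directory. As far as we can tell, each component name is appended to it verbatim, which is why zdenko's example ends in `eng.`. The following minimal Python sketch illustrates that naming rule. The helpers `normalize_prefix`, `unpack_command` and `expected_component_paths` are hypothetical, the suffix list is a partial illustration, and the sketch only builds the command line rather than running it.

```python
from __future__ import annotations

# A few component suffixes seen in unpacked tessdata files. This is a partial,
# illustrative list, not the full set that combine_tessdata knows about.
EXAMPLE_SUFFIXES = ["unicharset", "word-dawg", "freq-dawg"]


def normalize_prefix(prefix: str) -> str:
    """Return the prefix with a trailing '.', appending one if it is missing."""
    return prefix if prefix.endswith(".") else prefix + "."


def unpack_command(traineddata: str, prefix: str) -> list[str]:
    """Build (but do not run) a `combine_tessdata -u` argument list."""
    return ["combine_tessdata", "-u", traineddata, normalize_prefix(prefix)]


def expected_component_paths(prefix: str) -> list[str]:
    """Names we would expect for the example components: prefix + suffix."""
    p = normalize_prefix(prefix)
    return [p + suffix for suffix in EXAMPLE_SUFFIXES]


if __name__ == "__main__":
    # Hypothetical paths; substitute your own tessdata file and scratch folder.
    print(unpack_command("tessdata/eng.traineddata", "/tmp/eng"))
    print(expected_component_paths("/tmp/eng"))
```

Without the trailing period, a prefix such as `temp/eng` would produce names like `temp/engfreq-dawg`, which is presumably the failure Max ran into. The returned list could be passed to `subprocess.run(..., check=True)` once the tool is installed.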

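Parmeet's test and Max's runtime-changing lists both involve producing the user-words file. It can be generated programmatically instead of edited by hand. This sketch writes `<lang>.user-words` into a tessdata folder, one word per line, as UTF-8 without a byte-order mark. The one-word-per-line layout and the BOM-free encoding are assumptions on our part, and `write_user_words` is a hypothetical helper. The thread itself leaves open whether Tesseract 3.0x actually applies this file during recognition.

```python
import os


def write_user_words(tessdata_dir: str, lang: str, words) -> str:
    """Write <lang>.user-words into tessdata_dir and return its path.

    Assumed format: one word per line, UTF-8 without a byte-order mark.
    """
    # Trim whitespace, skip blank entries, and drop duplicates while
    # keeping the first occurrence of each word.
    cleaned = (w.strip() for w in words)
    unique = list(dict.fromkeys(w for w in cleaned if w))
    path = os.path.join(tessdata_dir, f"{lang}.user-words")
    # encoding="utf-8" writes no BOM (unlike "utf-8-sig"); newline="\n"
    # keeps Unix line endings even on Windows.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for word in unique:
            f.write(word + "\n")
    return path


if __name__ == "__main__":
    import tempfile

    # Parmeet's test word, written into a throwaway folder.
    demo_dir = tempfile.mkdtemp()
    print(write_user_words(demo_dir, "eng", ["abcdefghijklmnopqrstuvwxyz"]))
```

If the list changes between runs, the file would presumably need to be rewritten before each Tesseract invocation, since nothing in the thread suggests it is reloaded while the engine is running.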

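zdenko's hint about option `-o` suggests a way to replace one dictionary component without full retraining: compile a word list with `wordlist2dawg`, then overwrite just that component in the existing traineddata. The sketch below only assembles the two command lines, under several assumptions:

- the Tesseract 3.0x training tools are on the PATH;
- `wordlist2dawg` takes `word_list_file dawg_file unicharset_file` in that order;
- the unicharset was first unpacked with `combine_tessdata -u`;
- `combine_tessdata -o` chooses the component to overwrite from the file's suffix.

`build_freq_dawg_commands` is a hypothetical name. As Max reports, getting Tesseract to actually consult the new freq-dawg is a separate, unresolved question.

```python
from __future__ import annotations

import os


def build_freq_dawg_commands(word_list: str, lang: str, workdir: str,
                             traineddata: str) -> list[list[str]]:
    """Return two argv lists (not executed) for rebuilding a freq-dawg.

    Hypothetical helper. workdir is where `combine_tessdata -u` unpacked
    <lang>.unicharset; the new <lang>.freq-dawg is written there too.
    """
    prefix = os.path.join(workdir, f"{lang}.")
    dawg = prefix + "freq-dawg"
    unicharset = prefix + "unicharset"
    return [
        # Step 1: compile the plain word list into a DAWG for this unicharset.
        ["wordlist2dawg", word_list, dawg, unicharset],
        # Step 2: overwrite only that component inside the existing traineddata.
        ["combine_tessdata", "-o", traineddata, dawg],
    ]


if __name__ == "__main__":
    import shlex

    for argv in build_freq_dawg_commands("words.txt", "eng", "/tmp/eng-parts",
                                         "tessdata/eng.traineddata"):
        print(shlex.join(argv))
```

Each list could be passed to `subprocess.run(argv, check=True)`. Because `-o` rewrites the given traineddata file in place, keeping a backup copy first seems prudent.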