Hi there,

@ Max: Thanks, I hope you find the solution soon.

@ Admin: It would be great if you could suggest something, as I think
using the user words to correct the output is a very important and
useful feature.

Thanks and Regards,
Parmeet

On May 10, 1:51 pm, Max Cantor <[email protected]> wrote:
> Hi,
>
> Well, it was answered well enough that I was able to make my own
> xxx.traineddata file. Unfortunately, even with that traineddata file, I'm
> running into the same problem that you are, and I can't seem to get
> Tesseract to use the freq-dawg that I included. I've been digging through
> the source code to find the right config but haven't succeeded yet. I'll let
> you and the group know when I do!
>
> thanks,
> max
>
> On May 10, 2011, at 4:32 PM, Parmeet wrote:
>
> > Hello there,
>
> > Sorry if I sound naive, but I think the original question has not been
> > answered yet, namely how to include our own word list. After going
> > through the FAQ page, I found that we can put our eng.user-words file in
> > the tessdata folder.
>
> > I did exactly that, and to test whether it works I put the characters a
> > through z as a single word in the eng.user-words file and saved it with
> > UTF-8 encoding. Then I made an image in Paint containing the characters a
> > through z as one word (in different fonts on different lines of the same
> > image) and ran OCR on it. Unfortunately, the output was not corrected,
> > even when only a single character out of a through z was misidentified.
> > Could you please let me know if I am doing something wrong, or whether I
> > need to retrain using my user-words?
>
> > I would be grateful for an early reply.
>
> > Thanks and Kind Regards
> > Parmeet
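A minimal shell sketch of the test setup Parmeet describes: it writes the a-through-z test word to tessdata/eng.user-words as plain UTF-8, one word per line. The `TESSDATA` variable and its `./tessdata` default are assumptions; point it at the tessdata folder your Tesseract build actually reads. One hedged guess about the symptom: if the file was saved with Windows Notepad, its "UTF-8" option prepends a byte-order mark, which would become part of the first word, so `printf` is used here to produce a file without one. This only prepares the file; it does not by itself guarantee Tesseract loads it.

```shell
#!/bin/sh
# Assumption: TESSDATA points at the tessdata folder Tesseract reads from.
TESSDATA="${TESSDATA:-./tessdata}"
mkdir -p "$TESSDATA"

# One word per line, UTF-8 without a byte-order mark, newline-terminated.
printf '%s\n' abcdefghijklmnopqrstuvwxyz > "$TESSDATA/eng.user-words"

# Dump the first three bytes as hex: a file saved with a BOM would start
# with "ef bb bf"; this one should start with "61 62 63" (a, b, c).
head -c 3 "$TESSDATA/eng.user-words" | od -An -tx1
```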
>
> > On May 10, 7:21 am, Max Cantor <[email protected]> wrote:
> >> OK, I found the problem. The fix is described here:
> >> http://code.google.com/p/tesseract-ocr/issues/detail?id=356
>
> >> The output path prefix passed to combine_tessdata -u needs to end in a
> >> period.
>
> >> my bad.
>
> >> max
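The fix Max mentions concerns the second argument to `combine_tessdata -u`. As the usage text quoted further down in this thread shows (`/home/$USER/temp/eng.`), that argument is a file-name prefix rather than a directory, and each component's suffix appears to be appended to it directly, so a missing period would yield names like `engunicharset`. The helper below is a hypothetical convenience, not part of Tesseract; it simply guarantees the trailing period.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract): append a trailing period to
# an unpack prefix if it lacks one, so components come out as
# eng.unicharset, eng.freq-dawg, and so on.
ensure_dot_prefix() {
  case "$1" in
    *.) printf '%s\n' "$1" ;;
    *)  printf '%s.\n' "$1" ;;
  esac
}

# Example use with the command quoted in this thread:
#   combine_tessdata -u tessdata/eng.traineddata "$(ensure_dot_prefix "/home/$USER/temp/eng")"
```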
>
> >> On May 9, 2011, at 3:30 PM, zdenko podobny wrote:
>
> >>> No problem :-) I think you will like the "-o" option too.
>
> >>> Zdenko
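The `-o` option zdenko mentions overwrites individual components of an existing traineddata file, which is the step needed to swap in your own word list without full retraining. Below is a sketch of the whole cycle, assuming Tesseract 3.0x tools and the `wordlist2dawg WORDLIST DAWG UNICHARSET` form described on the TrainingTesseract3 wiki page; the function names `run` and `rebuild_freq_dawg`, the `DRY_RUN` switch, and the example paths are all hypothetical. By default it only prints the commands; set `DRY_RUN=0` to execute them. It does not explain why Tesseract might still ignore the new freq-dawg, which is the open question at the top of this thread.

```shell
#!/bin/sh
# Sketch of the unpack / rebuild / repack cycle discussed in this thread.
# Assumptions: the word list is UTF-8, one word per line, and DRY_RUN
# defaults to 1 so the commands are printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# rebuild_freq_dawg TRAINEDDATA WORDLIST PREFIX
# PREFIX must end in a period (see issue 356), e.g. /home/$USER/temp/eng.
rebuild_freq_dawg() {
  traineddata=$1 wordlist=$2 prefix=$3
  # Keep a backup, because -o rewrites the traineddata file in place.
  run cp "$traineddata" "$traineddata.bak"
  # Unpack every component as PREFIXunicharset, PREFIXfreq-dawg, ...
  run combine_tessdata -u "$traineddata" "$prefix"
  # Compile the word list into a DAWG against the unpacked unicharset.
  run wordlist2dawg "$wordlist" "${prefix}freq-dawg" "${prefix}unicharset"
  # Replace only the freq-dawg component inside the traineddata file.
  run combine_tessdata -o "$traineddata" "${prefix}freq-dawg"
}

# Example (prints the four commands; prefix with DRY_RUN=0 to run them):
#   rebuild_freq_dawg tessdata/eng.traineddata my_words.txt "/home/$USER/temp/eng."
```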
>
> >>> On Mon, May 9, 2011 at 8:27 AM, Max Cantor <[email protected]> wrote:
> >>> I feel really dumb now. Sorry for the bother.
>
> >>> Thanks, max
>
> >>> On May 9, 2011, at 14:01, zdenko podobny <[email protected]> wrote:
>
> >>>> Please try to read it (looking is not enough ;-) ) [1]:
>
> >>>> // Specify option -u to unpack all the components to the specified path:
> >>>> //
> >>>> // combine_tessdata -u tessdata/eng.traineddata /home/$USER/temp/eng.
> >>>> //
> >>>> // This will create /home/$USER/temp/eng.* files with individual tessdata
> >>>> // components from tessdata/eng.traineddata.
>
> >>>> [1] http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Puttin...
>
> >>>> On Mon, May 9, 2011 at 2:01 AM, Max Cantor <[email protected]> wrote:
> >>>> I was looking at that, but can't find the other component files in the
> >>>> source tree. Is there somewhere to get the component files for
> >>>> eng.traineddata?
>
> >>>> Sorry if I'm missing something obvious...
>
> >>>> max
> >>>> On May 9, 2011, at 1:40 AM, zdenko podobny wrote:
>
> >>>>> See [1], or user-words on the same page.
>
> >>>>> [1] http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Puttin...
>
> >>>>> Zdenko
>
> >>>>> On Sun, May 8, 2011 at 5:53 PM, Max Cantor <[email protected]> wrote:
> >>>>> Is there a way to set up a custom word list without going through the
> >>>>> entire retraining process? Our word lists will change a bit at runtime,
> >>>>> so if there is an API variable to set, that would be perfect for us.
>
> >>>>> Thanks,
> >>>>> Max
>
> >>>>> Keep up the good work!
>
> >>>>> --
> >>>>> You received this message because you are subscribed to the Google
> >>>>> Groups "tesseract-ocr" group.
> >>>>> To post to this group, send email to [email protected]
> >>>>> To unsubscribe from this group, send email to
> >>>>> [email protected]
> >>>>> For more options, visit this group at
> >>>>>http://groups.google.com/group/tesseract-ocr?hl=en