Done. All the wikis will need a major update for 3.00 when it comes anyway.
Ray.

On Mon, Jun 1, 2009 at 3:51 PM, Matt Chan <[email protected]> wrote:

>
> I think I got around it. I wasn't copying over the word-dawg and
> freq-dawg files from another language or generating them; I had just
> touched empty files with the same names. Sorry for the trouble.
>
> Ray, could the training document be clarified so that where it says
> "You must create inttemp, normproto, pffmtable and unicharset using
> the procedure described below", it also says the *-dawg files need to
> be copied over or generated? I know it says that not using the *-dawg
> files can result in lower accuracy, but in my experience tess asks
> for them or won't run.
>
> Also, since I'm reading in nucleic acid strings which don't have word
> sequences, would not using a dictionary actually increase accuracy?
>
> Thanks,
> Matt
>
> On Jun 1, 6:20 pm, Matt Chan <[email protected]> wrote:
> > Hi,
> >
> > I'm training tesseract to recognize only a small subset of English
> > letters (A, C, T, G, U) for pulling nucleic acid sequences out of
> > journal publications.
> >
> > I'm having trouble with one paper because the font joins 'A's when
> > they are consecutive. I've tried creating boxes which split the
> > joined 'AA' into two, but tesseract gives me an error about having
> > "box overlaps blob in labelled word".
> >
> > I've managed to get around that by specifying 'AA' as a single letter
> > for those blobs, but I'm still having issues with an "Error: Illegal
> > malloc request size!" bug. I'm not sure if these are related to my
> > training process, or something else altogether.
> >
> > I'm hesitant to recompile because I'm moving the data files to a
> > closed-source program which uses a tesseract back-end.
> >
> > I can give more details if necessary.
> >
> > Thanks in advance for any replies.
> > Matt
> >
>
