Done. All the wikis will need a major update for 3.00 when it comes anyway. Ray.
On Mon, Jun 1, 2009 at 3:51 PM, Matt Chan <[email protected]> wrote:
>
> I think I got around it. I wasn't copying over the word-dawg and
> freq-dawg files from another language or generating them. I just
> touched empty files with the same names. Sorry for the trouble.
>
> Ray, could the training document be clarified so that where it says
> "You must create inttemp, normproto, pffmtable and unicharset using the
> procedure described below", it also says that the *-dawg files need to
> be copied over or generated? I know it says that not using the *-dawg
> files can result in lower accuracy, but my experience is that tess asks
> for them and won't run without them.
>
> Also, since I'm reading in nucleic acid strings, which don't form
> dictionary words, would not using a dictionary actually increase
> accuracy?
>
> Thanks,
> Matt
>
> On Jun 1, 6:20 pm, Matt Chan <[email protected]> wrote:
> > Hi,
> >
> > I'm training tesseract to recognize only a small subset of English
> > letters (A, C, T, G, U) for pulling nucleic acid sequences out of
> > journal publications.
> >
> > I'm having trouble with one paper because the font joins 'A's when
> > they are consecutive. I've tried creating boxes that split the joined
> > 'AA' apart, but tesseract gives me a "box overlaps blob in labelled
> > word" error.
> >
> > I've managed to get around that by specifying 'AA' as a single letter
> > for those blobs, but I'm still hitting an "Error: Illegal malloc
> > request size!" bug. I'm not sure whether these errors are related to
> > my training process or to something else altogether.
> >
> > I'm hesitant to recompile because I'm moving the data files to a
> > closed-source program that uses a tesseract back-end.
> >
> > I can give more details if necessary.
> >
> > Thanks in advance for any replies.
> > Matt

You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
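Matt's workaround (touching empty files named like the dictionary files so tesseract stops asking for them) can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not anything from the tesseract distribution: the Tesseract 2.x `<lang>.<name>` naming inside a `tessdata` directory, the example language code `nuc`, and the helpers `touch_placeholder_dawgs` and `missing_files` are all hypothetical. Empty placeholders only stand in for the missing files, as in Matt's report; they are not real dictionaries, and whether a particular tesseract build accepts them is not verified here.

```python
from pathlib import Path

# Assumption: Tesseract 2.x-style "<lang>.<name>" file names in a tessdata directory.
TRAINED_FILES = ["inttemp", "normproto", "pffmtable", "unicharset"]  # produced by training
DAWG_FILES = ["word-dawg", "freq-dawg"]  # dictionary files tesseract asked for


def touch_placeholder_dawgs(tessdata: Path, lang: str) -> list[Path]:
    """Create empty *-dawg files (like `touch`), skipping any that already exist.

    Hypothetical helper. Returns only the paths that were newly created.
    """
    created = []
    for name in DAWG_FILES:
        path = tessdata / f"{lang}.{name}"
        if not path.exists():
            path.touch()  # zero-byte placeholder, not a real dictionary
            created.append(path)
    return created


def missing_files(tessdata: Path, lang: str) -> list[Path]:
    """List trained and dictionary files that are still absent (hypothetical helper)."""
    return [
        tessdata / f"{lang}.{name}"
        for name in TRAINED_FILES + DAWG_FILES
        if not (tessdata / f"{lang}.{name}").exists()
    ]
```

For example, `touch_placeholder_dawgs(Path("tessdata"), "nuc")` creates `nuc.word-dawg` and `nuc.freq-dawg` only if they are absent, so real dictionaries copied from another language are never overwritten; `missing_files` then reports which trained files still need to be generated.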

