Hi, I'm training Tesseract to recognize only a small subset of English letters (A, C, T, G, U) so that I can pull nucleic acid sequences out of journal publications.
I'm having trouble with one paper because its font joins consecutive 'A's together. I've tried creating boxes that split the joined 'AA' into two letters, but Tesseract gives me an error: "box overlaps blob in labelled word". I've managed to get around that by labelling each joined blob as a single 'AA' character, but I'm still hitting an "Error: Illegal malloc request size!" bug. I'm not sure whether these problems are related to my training process or to something else altogether. I'm hesitant to recompile because I'm moving the data files to a closed-source program that uses a Tesseract back-end. I can give more details if necessary.

Thanks in advance for any replies.
Matt
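
In case it helps anyone reproduce the 'AA' workaround, here is a rough sketch of how two consecutive 'A' boxes could be merged into a single 'AA' entry. It assumes the usual box-file layout of one line per character (symbol, left, bottom, right, top, plus an optional trailing page number). The `merge_joined` helper, the `max_gap` threshold, and the sample coordinates are hypothetical and are not taken from the actual training data.

```python
def parse_box(line):
    """Split one box-file line into (symbol, [left, bottom, right, top], extra fields)."""
    parts = line.split()
    return parts[0], [int(v) for v in parts[1:5]], parts[5:]


def merge_joined(lines, symbol="A", max_gap=1):
    """Merge pairs of horizontally touching `symbol` boxes into one doubled-symbol box.

    Assumption: two boxes count as "joined" when the gap between the right
    edge of the first and the left edge of the second is at most `max_gap`
    pixels. Only pairs are merged; a run of three would become 'AA' + 'A'.
    """
    out = []
    i = 0
    while i < len(lines):
        sym, box, extra = parse_box(lines[i])
        if sym == symbol and i + 1 < len(lines):
            next_sym, next_box, _ = parse_box(lines[i + 1])
            if next_sym == symbol and next_box[0] - box[2] <= max_gap:
                # Bounding box that covers both characters.
                merged = [
                    min(box[0], next_box[0]),
                    min(box[1], next_box[1]),
                    max(box[2], next_box[2]),
                    max(box[3], next_box[3]),
                ]
                fields = [sym + next_sym] + [str(v) for v in merged] + extra
                out.append(" ".join(fields))
                i += 2
                continue
        out.append(" ".join(lines[i].split()))
        i += 1
    return out


# Hypothetical box lines: the two 'A' boxes share an edge at x = 52.
sample = [
    "C 10 20 30 50",
    "A 32 20 52 50",
    "A 52 20 72 50",
    "T 75 20 90 50",
]
for entry in merge_joined(sample):
    print(entry)
```

With the sample above, the two 'A' lines collapse into a single `AA 32 20 72 50` entry while the 'C' and 'T' lines pass through unchanged. This only rewrites the box file; it says nothing about the cause of the malloc error.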

