Indeed, ultimately Tesseract operates on contours, which are always extracted from blobs. Blobs are structures that are in turn extracted from binary images. That's why these contours are always closed. It is possible, however, to tinker inside the guts and make Tesseract match your contours as partial prototypes. (Refer to papers at http://code.google.com/p/tesseract-ocr/wiki/Documentation) You're going to have a hard time doing this, as the class hierarchy is really convoluted and a bit awkward. You'll also need to do a lot of R&D because, as far as I know, nothing has been done previously to check how well it works for such a task, and you should investigate accuracy for both the contour extraction and the Tesseract parts.
And I don't know if this info can really help ))

Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Tue, Nov 15, 2011 at 11:03 PM, daniel <[email protected]> wrote:
> Hi,
>
> I was just referring to your previous post, where you said I should
> just convert a list of blobs into a binary image. I don't think that
> will always work. Since I don't know in advance which segments are
> writing, I would have to generate a binary image from an arbitrary
> segmentation. That in general is the map coloring problem, for which
> you need up to four colors, given you find a good coloring algorithm.
> Basically I was wondering if it is not possible to give tesseract a
> list of contours that may not even have to be closed, i.e. edges.
> Well, I will just play around with giving it edge images. But I was
> just hoping I could go one level deeper, and give tesseract directly a
> list of contours, as I assume that is what it operates on, in the end.
>
> I don't have good example images ready right now, but as soon as I do
> I will post them here.
>
> Daniel
>
> On 15 Nov., 14:44, Dmitri Silaev <[email protected]> wrote:
>> I don't know either. Sample images are still wanted. In the worst case
>> you may end up needing to develop your own code, not just a sequence of
>> ready library calls.
>> --
>> Dmitri
>>
>> On Tue, Nov 15, 2011 at 1:13 PM, daniel <[email protected]>
>> wrote:
>> > Hey,
>> >
>> > I don't know. How about situations where more than two colors are
>> > involved? I would have to map the discovered segments to two colors,
>> > which may even be impossible. And with contours even more so, as the
>> > contours may not be closed...
>> >
>> > On 12 Nov., 18:26, Dmitri Silaev <[email protected]> wrote:
>> >> If you're able to use OpenCV then, given a list of contours or blobs,
>> >> you should be able to reconstruct a binary image. This is a general
>> >> thought. To get more practical advice, send us your sample image(s)
>> >>
>> >> Warm regards,
>> >> Dmitri Silaev
>> >> www.CustomOCR.com
>> >>
>> >> On Sat, Nov 12, 2011 at 4:37 PM, daniel <[email protected]>
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > I want to use tesseract to read text off things like posters and
>> >> > packages. The text will have different colors, there will be images
>> >> > and other mess, so it seems like a non-standard situation. I thought
>> >> > it would help if I use some opencv segmentation or contour finding
>> >> > algorithm instead of the thresholding that tesseract seems to do.
>> >> > That, however, will not provide a binary image, but a list of
>> >> > components/contours. How can I feed this to tesseract?
>> >> >
>> >> > Best
>> >> >
>> >> > Daniel
>> >> >
>> >> > --
>> >> > You received this message because you are subscribed to the Google
>> >> > Groups "tesseract-ocr" group.
>> >> > To post to this group, send email to [email protected]
>> >> > To unsubscribe from this group, send email to
>> >> > [email protected]
>> >> > For more options, visit this group at
>> >> > http://groups.google.com/group/tesseract-ocr?hl=en
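
For illustration only, here is a rough sketch of the "reconstruct a binary image from a list of contours" idea discussed in this thread. It is not Tesseract code and not anything posted by the participants. With OpenCV one would normally reach for cv2.drawContours with thickness=cv2.FILLED; the sketch below uses a plain-Python even-odd fill instead so that it runs without any dependencies. The function names, the point format (lists of (x, y) tuples) and the 0 = ink / 255 = background convention are assumptions made for the example.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd test: is (x, y) inside the closed polygon?

    `polygon` is a list of (x, y) vertices; the last vertex is
    implicitly joined back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def contours_to_binary(contours, width, height):
    """Rasterise closed contours into a black-on-white image.

    Returns a list of rows; 0 is ink, 255 is background (assumed
    convention). A pixel is ink if its centre lies inside an odd
    number of contours, so a contour nested inside another one
    becomes a hole (as in the letter "O").
    """
    image = [[255] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            # Sample at the pixel centre.
            hits = sum(
                point_in_polygon(col + 0.5, row + 0.5, contour)
                for contour in contours
            )
            if hits % 2 == 1:
                image[row][col] = 0
    return image


if __name__ == "__main__":
    outer = [(1, 1), (4, 1), (4, 4), (1, 4)]
    hole = [(2, 2), (3, 2), (3, 3), (2, 3)]
    for line in contours_to_binary([outer, hole], 6, 6):
        print("".join("#" if px == 0 else "." for px in line))
```

Note that this only works for closed contours. Open contours (plain edges), as raised earlier in the thread, have no inside to fill, so they would have to be closed or drawn as strokes first. The resulting image would then be handed to Tesseract like any other binary image.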

