Ok, so I thought more about this. What I will end up with is segments of possibly various colors. Instead of handing tesseract a binary image, I would like to hand it a set of (closed) contours. Is that possible?
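
For comparison, here is a rough sketch of the binary-image route suggested earlier in the thread, assuming the contours are closed polygons given as lists of (x, y) points. The name `fill_contours` and the pure-Python even-odd scanline fill are illustrative only; nothing here is Tesseract or OpenCV API.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


def fill_contours(contours: Sequence[Sequence[Point]],
                  width: int, height: int) -> List[List[int]]:
    """Rasterize closed contours into a binary image (1 = ink, 0 = background).

    Each contour is treated as a closed polygon. A pixel is set when its
    centre lies inside the polygon under the even-odd rule.
    """
    image = [[0] * width for _ in range(height)]
    for contour in contours:
        n = len(contour)
        if n < 3:
            continue  # fewer than 3 points cannot enclose an area
        for row in range(height):
            y = row + 0.5  # sample at the pixel centre
            crossings = []
            for i in range(n):
                (x0, y0), (x1, y1) = contour[i], contour[(i + 1) % n]
                # Half-open interval so a shared vertex is counted once;
                # horizontal edges never satisfy it and are skipped.
                if (y0 <= y < y1) or (y1 <= y < y0):
                    crossings.append(x0 + (y - y0) * (x1 - x0) / (y1 - y0))
            crossings.sort()
            # Pairs of crossings delimit the inside spans on this row.
            for left, right in zip(crossings[0::2], crossings[1::2]):
                for col in range(width):
                    if left <= col + 0.5 < right:
                        image[row][col] = 1
    return image


if __name__ == "__main__":
    square = [(1, 1), (4, 1), (4, 4), (1, 4)]
    for line in fill_contours([square], 6, 6):
        print("".join("#" if v else "." for v in line))
```

Calling it once per segment would give one binary image per candidate region, which sidesteps having to two-color the whole segmentation. With OpenCV available, `cv2.drawContours` with `thickness=cv2.FILLED` should do the same fill on a mask in one call.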

On 16 Nov., 08:09, Dmitri Silaev <[email protected]> wrote:
> Indeed, ultimately Tesseract operates on contours which are always
> extracted from blobs. Blobs are structures in turn extracted from
> binary images. That's why these contours are always closed. It is
> possible, however, to tinker inside the guts and make Tesseract match
> your contours as partial prototypes. (Refer to the papers at
> http://code.google.com/p/tesseract-ocr/wiki/Documentation) You're
> going to have a hard time doing this as the class hierarchy is really
> convoluted and a bit awkward. You're also going to do much R&D because
> (as far as I know) nothing has been done previously to check how it'll
> work for such a task, and you should investigate accuracy both for
> contour extraction and Tesseract parts.
>
> And I don't know if this info can really help ))
>
> Warm regards,
> Dmitri Silaev
> www.CustomOCR.com
>
> On Tue, Nov 15, 2011 at 11:03 PM, daniel <[email protected]> wrote:
> > Hi,
> >
> > I was just referring to your previous post, where you said I should
> > just convert a list of blobs into a binary image. I don't think that
> > will always work. Since I don't know in advance which segments are
> > writing, I would have to generate a binary image from an arbitrary
> > segmentation. That in general is the map coloring problem, for which
> > you need up to four colors, given you find a good coloring algorithm.
> > Basically I was wondering if it is not possible to give tesseract a
> > list of contours that may not even have to be closed, i.e. edges.
> > Well, I will just play around with giving it edge images. But I was
> > just hoping I could go one level deeper and give tesseract directly a
> > list of contours, as I assume that is what it operates on in the end.
> >
> > I don't have good example images ready right now, but as soon as I do
> > I will post them here.
> >
> > Daniel
> >
> > On 15 Nov., 14:44, Dmitri Silaev <[email protected]> wrote:
> >> I don't know either. Sample images are still wanted. In the worst case
> >> you may end up needing to develop your own code, not just a sequence
> >> of ready library calls.
> >> --
> >> Dmitri
> >>
> >> On Tue, Nov 15, 2011 at 1:13 PM, daniel <[email protected]> wrote:
> >> > Hey,
> >> >
> >> > I don't know. How about situations where more than two colors are
> >> > involved? I would have to map the discovered segments to two colors,
> >> > which may even be impossible. And with contours even more so, as the
> >> > contours may not be closed...
> >> >
> >> > On 12 Nov., 18:26, Dmitri Silaev <[email protected]> wrote:
> >> >> If you're able to use OpenCV then, given a list of contours or blobs,
> >> >> you should be able to reconstruct a binary image. This is a general
> >> >> thought. To get more practical advice, send us your sample image(s).
> >> >>
> >> >> Warm regards,
> >> >> Dmitri Silaev
> >> >> www.CustomOCR.com
> >> >>
> >> >> On Sat, Nov 12, 2011 at 4:37 PM, daniel <[email protected]> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > I want to use tesseract to read text off things like posters and
> >> >> > packages. The text will have different colors, there will be images
> >> >> > and other mess, so it seems like a non-standard situation. I thought
> >> >> > it would help if I use some OpenCV segmentation or contour finding
> >> >> > algorithm instead of the thresholding that tesseract seems to do.
> >> >> > That, however, will not provide a binary image, but a list of
> >> >> > components/contours. How can I feed this to tesseract?
> >> >> >
> >> >> > Best
> >> >> >
> >> >> > Daniel
> >> >> >
> >> >> > --
> >> >> > You received this message because you are subscribed to the Google
> >> >> > Groups "tesseract-ocr" group.
> >> >> > To post to this group, send email to [email protected]
> >> >> > To unsubscribe from this group, send email to
> >> >> > [email protected]
> >> >> > For more options, visit this group at
> >> >> > http://groups.google.com/group/tesseract-ocr?hl=en

