OCRopus, which can use Tesseract as its engine, can output some position information (segmentation and a few other things):
check out their docs on "file formats"
https://docs.google.com/View?id=dfxcv4vc_92c8xxp7
--Sven

On Tue, May 4, 2010 at 12:56 PM, lux <[email protected]> wrote:
> No, it must be something given by tesseract, because there could be
> more red than black (the font color in this example) and then it would
> all get screwed up!
> Anyway, I can just get the text from tesseract beforehand with the box
> positions... but the problem is that I also need the exact color of
> the word tesseract picked up.
>
> Tesseract surely stores the positions of the text when it processes the
> image, but the point is... is there a way to get these?
>
> On 3 May, 21:01, Sven Pedersen <[email protected]> wrote:
>> Using filters to cancel out colors other than the target color, it
>> should be possible to iteratively extract text of a certain color (say
>> red, green, blue, black, etc.). But that would be hard. Generally
>> people just want to get the text and fix the colors later.
>> --Sven
>>
>> On Sun, May 2, 2010 at 1:41 PM, Sandro Zahra <[email protected]> wrote:
>> > I think that OCR is not about colours.....
>>
>> > On 2 May 2010 17:35, lux <[email protected]> wrote:
>> >> I need the RIGHT position of the text or the RIGHT color, not an
>> >> average color :/.
>>
>> >> On 11 Apr, 20:48, MARTIN Pierre <[email protected]> wrote:
>> >> > > So how can I get the position of text?
>> >> > > I've tried with makebox but it's not really right: it gives me the
>> >> > > coordinates of the whole "letter box", so it's impossible for me
>> >> > > to get the right pixel of the letter
>> >> > > (e.g. it would work for an 'I', but for an 'A' it gives me the
>> >> > > top-left and bottom-right corners of the box, so I don't know how
>> >> > > to get the letter color, because the 'A' is neither at the start
>> >> > > nor at the end of the box).
>> >>
>> >> > That's the right method. If you want to know where the "pixels" are,
>> >> > do a histogram equalization of your picture, then apply a fairly
>> >> > aggressive threshold (if it's not already in 1bpp); this will give
>> >> > you a copy of your picture with only black and white pixels. It is
>> >> > on this picture (basically a 1bpp-depth picture) that you run
>> >> > tesseract.
>> >> > Then, given the boxes, you look in your black & white picture for
>> >> > where the black pixels are in the boxes, and with the same
>> >> > coordinates you can find them in your original picture. After that,
>> >> > do a color average of all pixels in a box in your original picture
>> >> > and you're good.
>> >>
>> >> > Pierre.
>>
>> >> --
>> >> You received this message because you are subscribed to the Google Groups
>> >> "tesseract-ocr" group.
>> >> To post to this group, send email to [email protected].
>> >> To unsubscribe from this group, send email to
>> >> [email protected].
>> >> For more options, visit this group at
>> >> http://groups.google.com/group/tesseract-ocr?hl=en.
>>
>> --
>> ``All that is gold does not glitter,
>> not all those who wander are lost;
>> the old that is strong does not wither,
>> deep roots are not reached by the frost.
>> From the ashes a fire shall be woken,
>> a light from the shadows shall spring;
>> renewed shall be blade that was broken,
>> the crownless again shall be king.”
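For what it's worth, here is a rough sketch of the box-plus-threshold idea Pierre describes in the thread, in plain Python with no imaging library. Everything in it is an assumption for illustration: the helper names `parse_box_line`, `luminance` and `glyph_color` are made up, the box-file line is assumed to look like `char left bottom right top [page]` with y measured from the bottom of the image, the right/top edges are treated as exclusive, and the text is assumed to be darker than the background. Because lux asked for the exact color rather than an average, it returns the most common color among the "ink" pixels instead of a mean.

```python
from collections import Counter


def parse_box_line(line, image_height):
    """Parse one box-file line, assumed to be 'char left bottom right top [page]'.

    The y values are assumed to count from the bottom of the image, so they
    are flipped here into top-left-origin rows: (x0, y0, x1, y1).
    """
    parts = line.split()
    char = parts[0]
    left, bottom, right, top = (int(v) for v in parts[1:5])
    return char, (left, image_height - top, right, image_height - bottom)


def luminance(pixel):
    """Perceived brightness of an (r, g, b) pixel, 0 (black) to 255 (white)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def glyph_color(pixels, box, threshold=128):
    """Return the most common colour among the dark pixels inside box.

    pixels: list of rows, each row a list of (r, g, b) tuples (the ORIGINAL
            colour image, not the thresholded copy).
    box:    (x0, y0, x1, y1) with a top-left origin; x1 and y1 are exclusive.
    A pixel counts as "ink" when its luminance is below threshold, which
    assumes dark text on a light background.
    """
    x0, y0, x1, y1 = box
    ink = [
        pixels[y][x]
        for y in range(y0, y1)
        for x in range(x0, x1)
        if luminance(pixels[y][x]) < threshold
    ]
    if not ink:
        return None  # nothing in the box passed the threshold
    return Counter(ink).most_common(1)[0][0]
```

With an 'A'-shaped glyph the corners of the box are background, which is exactly the problem lux ran into; masking by the threshold first means only the letter's own pixels are sampled. For light text on a dark background the comparison would have to be inverted.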

