OCRopus, which can use Tesseract as its engine, can output some position
information (segmentation and a few other things).

Check out their docs on "file formats":
https://docs.google.com/View?id=dfxcv4vc_92c8xxp7

--Sven
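If the position information is requested as hOCR (an HTML-based OCR output format in which each element's `title` attribute carries `bbox x0 y0 x1 y1`, origin at the top-left), it can be read with the Python standard library alone. Whether the linked OCRopus "file formats" docs describe exactly this output is an assumption. The sketch relies only on the hOCR `bbox` convention and the `ocr_page`/`ocr_line`/`ocrx_word` class names. The class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser

# hOCR stores geometry in the title attribute, e.g. title="bbox 10 20 50 40; x_wconf 91"
BBOX_RE = re.compile(r"\bbbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")
VOID_TAGS = {"br", "meta", "img", "link", "hr", "input"}  # elements with no end tag


class HocrBoxes(HTMLParser):
    """Collect [class, (x0, y0, x1, y1), text] for every element that has a bbox."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._open = []  # one entry per open element: index into self.items, or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        a = dict(attrs)
        m = BBOX_RE.search(a.get("title") or "")
        if m and a.get("class"):
            self.items.append([a["class"], tuple(int(v) for v in m.groups()), ""])
            self._open.append(len(self.items) - 1)
        else:
            self._open.append(None)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to every enclosing element that carries a bbox
        for idx in self._open:
            if idx is not None:
                self.items[idx][2] += data


def hocr_boxes(html: str):
    """Return (class, bbox, whitespace-normalized text) tuples in document order."""
    parser = HocrBoxes()
    parser.feed(html)
    parser.close()
    return [(cls, bbox, " ".join(text.split())) for cls, bbox, text in parser.items]
```

For example, filtering the result on `ocrx_word` gives one bounding box per recognized word. That box is the region you would then inspect for color.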


On Tue, May 4, 2010 at 12:56 PM, lux <[email protected]> wrote:
> No, it must be something given by Tesseract, because there could be
> more red than black (the font color in this example) and then it would
> all be screwed up!
> Anyway, I can already get the text from Tesseract along with the box
> positions... but the problem is that I also need the exact color of
> the word Tesseract picked up.
>
> Tesseract surely stores the positions of the text when it processes the
> image, but the point is... is there a way to get at them?
>
> On 3 May, 21:01, Sven Pedersen <[email protected]> wrote:
>> Using filters to cancel out colors other than the target color, it
>> should be possible to iteratively extract text of a certain color (say
>> red, green, blue, black, etc.). But that would be hard. Generally,
>> people just want to get the text and fix the colors later.
>> --Sven
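Sven's filtering idea can be sketched as a per-color mask. Every pixel close to a target RGB value becomes black and everything else becomes white, and each resulting image would then be OCR'd separately. The following are all assumptions for illustration: the pure-Python image representation (rows of `(r, g, b)` tuples), the Euclidean RGB tolerance, and the function names. Anti-aliased glyph edges blend into the background, so real text will likely need a looser tolerance than a flat color swatch.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]
Image = List[List[RGB]]  # rows of (r, g, b) pixels

BLACK: RGB = (0, 0, 0)
WHITE: RGB = (255, 255, 255)


def isolate_color(img: Image, target: RGB, tol: float = 60.0) -> Image:
    """Black-on-white copy of img keeping only pixels within tol of target."""
    tol2 = tol * tol  # compare squared distances to avoid a sqrt per pixel
    out = []
    for row in img:
        new_row = []
        for r, g, b in row:
            d2 = (r - target[0]) ** 2 + (g - target[1]) ** 2 + (b - target[2]) ** 2
            new_row.append(BLACK if d2 <= tol2 else WHITE)
        out.append(new_row)
    return out


def split_by_colors(img: Image, palette: Dict[str, RGB],
                    tol: float = 60.0) -> Dict[str, Image]:
    """One filtered image per named color; each could be fed to OCR on its own."""
    return {name: isolate_color(img, rgb, tol) for name, rgb in palette.items()}
```

Any text found in the "red" image is, by construction, red text. This is the iterative extraction Sven describes, and it explains why he calls it hard: one OCR pass per candidate color.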
>>
>> On Sun, May 2, 2010 at 1:41 PM, Sandro Zahra <[email protected]> wrote:
>> > I think OCR is not really about colours...
>>
>> > On 2 May 2010 17:35, lux <[email protected]> wrote:
>>
>> >> I need the RIGHT position of the text or the RIGHT color, not an
>> >> average color :/.
>>
>> >> On 11 Apr, 20:48, MARTIN Pierre <[email protected]> wrote:
>> >> > > So how can I get the position of the text?
>> >> > > I've tried with makebox but it's not quite right: it gives me the
>> >> > > coordinates of the whole "letter box", so it's impossible for me to get
>> >> > > the exact pixels of the letter
>> >> > > (e.g. it would work for an 'I', but for an 'A' it gives me the box's
>> >> > > upper-left and lower-right positions, so I don't know how to get the
>> >> > > letter's color, because the 'A' is neither at the start nor at the end
>> >> > > of the box).
>>
>> >> > That's the right method. If you want to know where the "pixels" are, do
>> >> > a histogram equalization of your picture, then apply a fairly
>> >> > aggressive threshold (if it's not already 1bpp). This gives you a copy
>> >> > of your picture with only black and white pixels. Run Tesseract on this
>> >> > picture (basically a 1bpp picture).
>> >> > Then, given the boxes, look in your black & white picture for the black
>> >> > pixels inside each box; the same coordinates locate them in your
>> >> > original picture. Finally, average the colors of those pixels in your
>> >> > original picture and you're good.
>>
>> >> > Pierre.
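A minimal sketch of Pierre's recipe in plain Python follows. Images are assumed to be rows of `(r, g, b)` tuples with the origin at the top-left. The BT.601 gray weights, the threshold of 128 and all function names are illustrative choices, not Tesseract API. Running Tesseract itself is not shown. The box line is assumed to come from a Tesseract box file, whose lines start `<char> <left> <bottom> <right> <top>` with the origin at the bottom-left. Treating right/top as exclusive is an assumption.

```python
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]
ColorImage = List[List[RGB]]  # rows of (r, g, b), origin at top-left
GrayImage = List[List[int]]   # rows of 0..255 values


def to_gray(img: ColorImage) -> GrayImage:
    # Luma with ITU-R BT.601 weights, rounded to 0..255
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in img]


def equalize(gray: GrayImage) -> GrayImage:
    # Histogram equalization: remap each level through the normalized CDF
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    cdf, running = [], 0
    for h in hist:
        running += h
        cdf.append(running)
    n = running
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:  # a single gray level: nothing to spread out
        return [row[:] for row in gray]
    lut = [max(0, round((c - cdf_min) * 255 / (n - cdf_min))) for c in cdf]
    return [[lut[v] for v in row] for row in gray]


def binarize(gray: GrayImage, threshold: int = 128) -> List[List[bool]]:
    # True marks a "black" (ink) pixel, assuming dark text on a light background
    return [[v < threshold for v in row] for row in gray]


def parse_box_line(line: str, image_height: int):
    # Box-file fields: char left bottom right top (bottom-left origin).
    # ASSUMPTION: right/top are exclusive. Extra fields (e.g. page) are ignored.
    ch, left, bottom, right, top = line.split()[:5]
    left, bottom, right, top = int(left), int(bottom), int(right), int(top)
    # Flip y so the ranges index rows of a top-left-origin image
    return ch, range(left, right), range(image_height - top, image_height - bottom)


def ink_color(img: ColorImage, ink: List[List[bool]],
              xs: range, ys: range) -> Optional[RGB]:
    # Average the ORIGINAL colors of pixels that are black in the 1bpp copy
    total, count = [0, 0, 0], 0
    for y in ys:
        for x in xs:
            if ink[y][x]:
                for i in range(3):
                    total[i] += img[y][x][i]
                count += 1
    if count == 0:
        return None  # no ink pixels inside this box
    return tuple(round(t / count) for t in total)
```

Averaging only the pixels that are black in the 1bpp copy keeps the background inside the box (the problem with the 'A') out of the result. If the mean is still too coarse, the same loop could collect the colors and return the most frequent one instead.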
>>
>> >> --
>> >> You received this message because you are subscribed to the Google Groups
>> >> "tesseract-ocr" group.
>> >> To post to this group, send email to [email protected].
>> >> To unsubscribe from this group, send email to
>> >> [email protected].
>> >> For more options, visit this group at
>> >> http://groups.google.com/group/tesseract-ocr?hl=en.
>>
>> --
>> ``All that is gold does not glitter,
>>   not all those who wander are lost;
>> the old that is strong does not wither,
>>   deep roots are not reached by the frost.
>> From the ashes a fire shall be woken,
>>   a light from the shadows shall spring;
>> renewed shall be blade that was broken,
>>   the crownless again shall be king.”