On Apr 8, 9:44 am, Anton Zorin <[email protected]> wrote:
> Hi Falke,
>
> Could you please be more specific:
>
> you want to specify an image + box file pair and, based on those, fill the
> contents of the text edit to enable playing with text formatting/styles? If so,
> it would be a bit complicated, since it is a challenge to extract formatting
> information from an image file.

Let me work backwards in my reply:

No, that challenge does not arise here. It WOULD, indeed, be a
challenge to AUTOMATE it, but I am talking about doing it MANUALLY.
This manual process would, in fact, be the primary purpose of the
utility (to do by hand what you said would be hard to automate):
mouse-drag-select regions, and then, with an additional command (key
combo or button), _MARK_ the selected region(s) (each representing a
series of boxes) as "bold", "italic", "some_font1", "some_font2", etc.
The annotation markup would be written back to an "enhanced" box file
("enhanced" in the sense of having an extra column for the style
code).
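
To make the idea concrete, here is a rough Python sketch of the marking step, not a finished tool. It assumes the usual tesseract box line layout (character, left, bottom, right, top, page) and appends a hypothetical seventh column for the style code. The names Box, parse_box_line, mark_region and format_box_line, and the default style "regular", are made up for illustration. The selection rectangle is assumed to be given in the same coordinates as the box file.

```python
from dataclasses import dataclass


@dataclass
class Box:
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int = 0
    style: str = "regular"  # hypothetical extra column, not part of the standard format


def parse_box_line(line: str) -> Box:
    """Parse 'char left bottom right top [page] [style]' into a Box."""
    parts = line.split()
    left, bottom, right, top = map(int, parts[1:5])
    page = int(parts[5]) if len(parts) > 5 else 0
    style = parts[6] if len(parts) > 6 else "regular"
    return Box(parts[0], left, bottom, right, top, page, style)


def mark_region(boxes, sel_left, sel_bottom, sel_right, sel_top, style):
    """Tag every box that lies entirely inside the selection rectangle."""
    for b in boxes:
        inside = (b.left >= sel_left and b.right <= sel_right
                  and b.bottom >= sel_bottom and b.top <= sel_top)
        if inside:
            b.style = style
    return boxes


def format_box_line(b: Box) -> str:
    """Write a box back out with the style code as the last column."""
    return f"{b.char} {b.left} {b.bottom} {b.right} {b.top} {b.page} {b.style}"
```

A GUI would call mark_region with the rectangle from the mouse drag and then rewrite the box file with format_box_line. Whether a box must be fully inside the rectangle or merely overlap it is a design choice; full containment is used here only because it is the simplest rule.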

So, why would this be useful?  It would enable one to easily rip a box
file apart into multiple, style-specific box files (one for bold, one
for italics, one for each font, etc.) -- in compliance with
tesseract's training requirements (which include "do not mix font
styles").
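
The splitting step could look roughly like the Python sketch below. It assumes each annotated line carries the six usual box fields (character, left, bottom, right, top, page) followed by a style code as a seventh field; that seventh field is the proposed extension, not something tesseract itself reads. The function name split_by_style and the fallback style "regular" are hypothetical.

```python
from collections import defaultdict


def split_by_style(enhanced_lines, default_style="regular"):
    """Group annotated box lines by style and strip the style column.

    Returns a dict mapping each style code to a list of plain box lines
    that could be written to a style-specific box file.
    """
    groups = defaultdict(list)
    for line in enhanced_lines:
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        # Assumption: the style code, if present, is the seventh field.
        style = parts[6] if len(parts) > 6 else default_style
        groups[style].append(" ".join(parts[:6]))
    return dict(groups)
```

Each list in the result would then be written to its own box file, for example one per style code. Note that this only splits the box data; the matching training image for each style would still have to be produced separately.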

>
> On Sun, Apr 8, 2012 at 4:38 PM, Falke <[email protected]> wrote:
> > Hi, Anton!
>
> > Looks very interesting.
>
> > Could you also do a utility that is the REVERSE of this ? :-))
>
> > I have been looking high and low for something that takes as input a
> > training bitmap+boxfile pair, and allows you to drag-mouse-select
> > multiple boxes (corresponding to words, sentences, etc.), and mark
> > them (annotate them) as a particular style (italics, bold, font1,
> > font2, etc.). There are already some box editors out there that
> > annotate -- but none with a full mouse-drag-select regions (sets of
> > boxes), performed in the bitmap display window, to annotate the
> > selected boxes at once.
>
> > (Discrete selection would, of course, be even better)
>
> > On Apr 5, 2:17 pm, Anton Zorin <[email protected]> wrote:
> > > Forgot to post a link. So here it is: http://code.google.com/p/txt2img/
>
> > > On Thu, Apr 5, 2012 at 10:13 PM, Anton Zorin <[email protected]
> > >wrote:
>
> > > > Hi All,
>
> > > > I think this tool could be useful for some of you. It allows you to
> > > > generate training images along with box files, using the text edit
> > > > control's contents as input. So simple formatting is possible, and font
> > > > antialiasing can be turned on/off. Currently compiled only for
> > > > windows (installer is available on the downloads page). Any comments/
> > > > remarks/bugs are welcome.
>
> > > > Thanks in advance.
>

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
