Hi Falke,

Now I get it. So the user selects a rectangle on the image, and the
corresponding lines from the box file are highlighted somehow, to enable
the user to attach some custom information to them or to save the fragment
as a separate image-box pair. Is this understanding correct?

If so, at first glance it seems to me that such functionality would be more
appropriate for another (separate) application. The GUI would be different:
 - a spreadsheet-like view is needed to work with box files
 - selecting image regions and displaying them should be convenient
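
For the second point, here is a rough sketch of the selection logic only
(no GUI), in Python for brevity. It assumes the standard Tesseract box line
layout "<char> <left> <bottom> <right> <top> <page>" with the origin at the
bottom-left of the image, and a selection rectangle given in top-left-origin
pixel coordinates, as a GUI widget would typically report it. The names Box,
parse_box_line and select_boxes are made up for illustration, and "fully
inside the rectangle" is just one possible selection rule.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    # Assumed layout: "<char> <left> <bottom> <right> <top> <page>"
    parts = line.split()
    left, bottom, right, top, page = (int(p) for p in parts[1:6])
    return Box(parts[0], left, bottom, right, top, page)


def select_boxes(boxes: List[Box],
                 rect: Tuple[int, int, int, int],
                 image_height: int) -> List[Box]:
    """Return the boxes lying fully inside rect.

    rect is (x0, y0, x1, y1) with a top-left origin (hypothetical GUI
    convention); box coordinates use a bottom-left origin, so the y axis
    is flipped before comparing.
    """
    x0, y0, x1, y1 = rect
    sel_bottom = image_height - y1
    sel_top = image_height - y0
    return [b for b in boxes
            if b.left >= x0 and b.right <= x1
            and b.bottom >= sel_bottom and b.top <= sel_top]
```

With a 100-pixel-high image, a rectangle dragged over the top strip, e.g.
select_boxes(boxes, (5, 5, 45, 35), 100), would pick up only the boxes whose
bottom/top values fall between 65 and 95.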

But in terms of implementation complexity, it doesn't look like a big deal
for Qt.
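
The box file side looks simple as well. As a minimal sketch, assuming the
"enhanced" box file from your message is an ordinary box line with one extra
trailing column holding the style code (that layout, the function name
split_by_style and the "unmarked" fallback are my assumptions, not an
existing format), splitting it into style-specific box files could look
like this:

```python
from collections import defaultdict


def split_by_style(lines):
    """Group enhanced box lines by style code.

    Returns {style: [plain six-column box lines]}. Lines without a
    seventh column are collected under the hypothetical key "unmarked".
    """
    groups = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 6:
            continue  # skip blank or malformed lines
        style = parts[6] if len(parts) > 6 else "unmarked"
        # Drop the style column so each output line is a regular box line.
        groups[style].append(" ".join(parts[:6]))
    return dict(groups)
```

Each resulting list could then be written to its own box file, one per
style.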

Thanks,

Anton Zorin
On Sun, Apr 8, 2012 at 7:49 PM, Falke <[email protected]> wrote:

> On Apr 8, 9:44 am, Anton Zorin <[email protected]> wrote:
> > Hi Falke,
> >
> > Could you please be more specific:
> >
> > you want to specify an image + box file pair and, based on those, fill the
> > contents of the text edit to enable playing with text formatting/styles? If
> > so, it would be a bit complicated, since it is a challenge to extract
> > formatting information from an image file.
>
> Let me work backwards, in my reply:
>
> No, that challenge is not there. It WOULD, indeed, be a challenge to
> AUTOMATE it, but I am talking about doing that MANUALLY.  And this
> manual process would, in fact, be the primary purpose of the utility
> (to manually do that which you said would be hard to automate): mouse-
> drag-select regions, and then (with an additional command (key-combo
> or button)) _MARK_ the selected region(s) (representing a series of
> boxes) as either "bold", "italic", "some_font1", "some_font2", etc.
> The actual annotation markups would be written back to the
> ("enhanced") box file ("enhanced", in the sense of having an extra
> column, for the style code).
>
> So, why would this be useful?  It would enable one to easily rip a box
> file apart into multiple, style-specific box files (one for bold, one
> for italics, one for each font, etc.) -- in compliance with
> tesseract's training requirements (which include "do not mix font
> styles")
>
> >
> > On Sun, Apr 8, 2012 at 4:38 PM, Falke <[email protected]> wrote:
> > > Hi, Anton!
> >
> > > Looks very interesting.
> >
> > > Could you also do a utility that is the REVERSE of this ? :-))
> >
> > > I have been looking high and low for something that takes as input a
> > > training bitmap+boxfile pair, and allows you to drag-mouse-select
> > > multiple boxes (corresponding to words, sentences, etc.), and mark
> > > them (annotate them) as a particular style (italics, bold, font1,
> > > font2, etc.). There are already some box editors out there that
> > > annotate -- but none with a full mouse-drag-select regions (sets of
> > > boxes), performed in the bitmap display window, to annotate the
> > > selected boxes at once.
> >
> > > (Discrete selection would, of course, be even better)
> >
> > > On Apr 5, 2:17 pm, Anton Zorin <[email protected]> wrote:
> > > > forgot to post a link. So here it is: http://code.google.com/p/txt2img/
> >
> > > > On Thu, Apr 5, 2012 at 10:13 PM, Anton Zorin <[email protected]> wrote:
> >
> > > > > Hi All,
> >
> > > > > I think this tool could be useful for some of you. It generates
> > > > > training images along with box files, using the contents of a text
> > > > > edit control as input, so simple formatting is possible (and font
> > > > > antialiasing can be turned on/off). Currently it is compiled only for
> > > > > Windows (an installer is available on the downloads page). Any
> > > > > comments/remarks/bugs are welcome.
> >
> > > > > Thanks in advance.
> >
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
