Hi Falke,

Now I get it. So the user selects a rectangle on the image, and the corresponding lines from the box file are highlighted somehow, to enable the user to put some custom information there or save the fragments as a separate image-box pair. Is this understanding correct?
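To make the idea concrete, here is a minimal Python sketch of the selection step as I understand it, not an actual implementation. It assumes the usual Tesseract box-file line layout (symbol, left, bottom, right, top, page, with the origin at the bottom-left corner of the image). The names `parse_box_line`, `boxes_in_rect` and `annotate`, and the trailing style column, are hypothetical and only illustrate the idea.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    """Parse one box-file line: '<symbol> <left> <bottom> <right> <top> <page>'."""
    symbol, left, bottom, right, top, page = line.rsplit(" ", 5)
    return Box(symbol, int(left), int(bottom), int(right), int(top), int(page))


def boxes_in_rect(boxes: List[Box], rect: Tuple[int, int, int, int]) -> List[int]:
    """Return indices of boxes lying entirely inside rect = (left, bottom, right, top)."""
    r_left, r_bottom, r_right, r_top = rect
    return [
        i
        for i, b in enumerate(boxes)
        if b.left >= r_left and b.bottom >= r_bottom
        and b.right <= r_right and b.top <= r_top
    ]


def annotate(lines: List[str], selected: List[int], style: str) -> List[str]:
    """Append a (hypothetical) style column to the selected box lines only."""
    chosen = set(selected)
    return [
        f"{line} {style}" if i in chosen else line
        for i, line in enumerate(lines)
    ]
```

For example, with the lines `H 10 10 20 30 0`, `i 22 10 26 30 0` and `x 100 10 110 30 0`, selecting the rectangle `(0, 0, 50, 40)` picks the first two boxes, and `annotate(lines, [0, 1], "bold")` tags only those two lines.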
If so, at first glance it seems to me that such functionality would be more appropriate for a separate application. The GUI would be different:
- a spreadsheet is needed to work with box files
- selecting image regions and displaying them should be convenient

But from the implementation complexity point of view it doesn't look like a big deal for Qt.

Thanks,
Anton Zorin

On Sun, Apr 8, 2012 at 7:49 PM, Falke <[email protected]> wrote:
> On Apr 8, 9:44 am, Anton Zorin <[email protected]> wrote:
> > Hi Falke,
> >
> > Could you please be more specific:
> >
> > you want to specify an image + box file pair and, based on those, fill
> > the contents of the text edit to enable playing with text
> > formatting/styles? If so, it would be a bit complicated since it is a
> > challenge to extract formatting information from an image file.
>
> Let me work backwards, in my reply:
>
> No, that challenge is not there. It WOULD, indeed, be a challenge to
> AUTOMATE it, but I am talking about doing that MANUALLY. And this
> manual process would, in fact, be the primary purpose of the utility
> (to manually do that which you said would be hard to automate):
> mouse-drag-select regions, and then (with an additional command
> (key-combo or button)) _MARK_ the selected region(s) (representing a
> series of boxes) as either "bold", "italic", "some_font1",
> "some_font2", etc. The actual annotation markups would be written back
> to the ("enhanced") box file ("enhanced", in the sense of having an
> extra column, for the style code).
>
> So, why would this be useful? It would enable one to easily rip a box
> file apart into multiple, style-specific box files (one for bold, one
> for italics, one for each font, etc.) -- in compliance with
> tesseract's training requirements (which include "do not mix font
> styles").
>
> > On Sun, Apr 8, 2012 at 4:38 PM, Falke <[email protected]> wrote:
> > > Hi, Anton!
> > >
> > > Looks very interesting.
> > >
> > > Could you also do a utility that is the REVERSE of this? :-))
> > >
> > > I have been looking high and low for something that takes as input a
> > > training bitmap+boxfile pair, and allows you to drag-mouse-select
> > > multiple boxes (corresponding to words, sentences, etc.), and mark
> > > them (annotate them) as a particular style (italics, bold, font1,
> > > font2, etc.). There are already some box editors out there that
> > > annotate -- but none with a full mouse-drag-select of regions (sets
> > > of boxes), performed in the bitmap display window, to annotate the
> > > selected boxes at once.
> > >
> > > (Discrete selection would, of course, be even better)
> > >
> > > On Apr 5, 2:17 pm, Anton Zorin <[email protected]> wrote:
> > > > Forgot to post a link. So here it is:
> > > > http://code.google.com/p/txt2img/
> > > >
> > > > On Thu, Apr 5, 2012 at 10:13 PM, Anton Zorin <[email protected]> wrote:
> > > > > Hi All,
> > > > >
> > > > > I think this tool could be useful for some of you. It allows
> > > > > generating training images along with box files using the text
> > > > > edit control's contents as input. Simple formatting is possible,
> > > > > and font antialiasing can be turned on/off. Currently compiled
> > > > > only for Windows (installer is available on the downloads page).
> > > > > Any comments/remarks/bugs are welcome.
> > > > >
> > > > > Thanks in advance.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

