Great idea!

I would suggest putting the documentation in a wiki instead of here. That
way it will be easier to refer to and find later.

Shree

Shree Devi Kumar
____________________________________________________________
Bhajans - Kirtans - Aartis (devotional songs) @ http://bhajans.ramparivar.com


On Mon, Jun 3, 2013 at 7:55 PM, Nick White <[email protected]> wrote:

> I wonder, would others here be interested in figuring out and
> documenting little bits of how the code works?
>
> I spent some time in the line segmentation code a little while ago,
> to figure out better configuration parameters for line segmentation
> for the Ancient Greek training (which ended up being pretty
> successful), and I could certainly contribute a partial description
> of how it works.
>
> If others are interested in doing this for key sections (like the
> parts Dmitri suggested), perhaps we should set up a wiki and get to
> work? It wouldn't be comprehensive, of course, but sharing what we
> know could still prove pretty useful.
>
> What do people think? Is anyone else interested in doing this?
>
> I'll dig out the (very scrappy) notes I made on line segmentation,
> clean them up, and post them here, when I get time. If anyone else
> is interested, I'll set up a wiki somewhere.
>
> Nick
>
> On Thu, May 30, 2013 at 07:32:52PM +0400, Dmitri Silaev wrote:
> > Excellent post, Nick! The more I read, the more I felt that I should
> > have asked these questions myself, but I hadn't yet. I'm afraid,
> > though, that many of them will remain unanswered.
> >
> > After several years of monitoring this forum and asking questions in
> > it, I have gotten used to the principal developers posting only new
> > release announcements. In the early years, they were much more active
> > in discussions. I can imagine that many forum questions are tedious to
> > answer over and over again, that the forum search could answer them,
> > and that many people are simply too lazy to use it. But some questions
> > are not like that and deserve answers.
> >
> > Now it looks like Google is doing us a favor by making a formerly
> > commercial engine open source and sharing its developments from time
> > to time. Community contributions are now limited to improving release
> > packages and fixing trivial bugs. Without proper documentation, or at
> > least some clues on how all of this (not only Cube) works, the
> > developers are keeping community contributions nominal. I personally
> > need more information and am ready to contribute once I understand the
> > code well enough. I used to explore the code alone, but the potential
> > of this approach is limited. Right off the bat, I'm interested in
> > segmentation, details of the class pruner and integer matcher, a
> > description of Cube, and best practices for generating training data.
> > I think there will be more to come once I get more information on
> > these.
> >
> > --
> > Dmitri
> >
> >
> > On Thu, May 30, 2013 at 6:48 PM, Nick White <[email protected]> wrote:
> > > Hi Tesseractors,
> > >
> > > I am feeling a bit fed up about the lack of openness with the
> > > Tesseract project.
> > >
> > > The addition of the cube mode, and several trainings, with
> > > absolutely no documentation and (as far as I can tell) no tools to
> > > create cube training files, is a good example of this.
> > >
> > > As is the lack of tif/box files for any of the core training files
> > > in the project.
> > >
> > > Keeping the cube tools and documentation private sucks royally. If
> > > they aren't perfect or polished, it doesn't matter; we could help
> > > to fix them up!
> > >
> > > I suspect some of the tif/box files for training aren't being
> > > released because of concerns about the copyright of the image files.
> > > If that's the case, please work to clear up those concerns, or
> > > create freely reusable versions.
> > >
> > > I love Tesseract; having a very high quality free software OCR
> > > package is awesome, and I'm very grateful for the amazing work being
> > > done on it. But I find the lack of parity between those inside
> > > Google and the wider community to be rather troubling.
> > >
> > > If there's anything I can do to help make cube training tools and
> > > documentation available, or the training source files, I'd be very
> > > happy to help. Replying offlist if appropriate is fine.
> > >
> > > Nick
> > >
> > > --
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "tesseract-ocr" group.
> > > To post to this group, send email to [email protected]
> > > To unsubscribe from this group, send email to
> > > [email protected]
> > > For more options, visit this group at
> > > http://groups.google.com/group/tesseract-ocr?hl=en
> > >
> > > ---
> > > You received this message because you are subscribed to the Google
> > > Groups "tesseract-ocr" group.
> > > To unsubscribe from this group and stop receiving emails from it, send
> > > an email to [email protected].
> > > For more options, visit https://groups.google.com/groups/opt_out.
> > >
> > >
> >
>
