Thanks, Nick.

It is good to have some cube info. Please add the list of languages that
use cube mode. I know that Hindi uses option 2, i.e. the combined cube and
tesseract mode.

Regarding neural networks, I have read that nn has been removed from
tesseract as it was not open source. That may explain why there is minimal
nn code in 3.02. Please see:
http://www.cedricve.me/2013/04/12/how-to-train-tesseract/

Shree

Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com


On Mon, Jun 10, 2013 at 10:28 PM, Nick White <[email protected]> wrote:

> Right then, I've created a wiki in Google Code for this collected
> effort.
>
> https://code.google.com/p/tesseract-ocr-extradocs/
>
> I have spent some time this last week reading some of the cube code
> and figuring out the purpose of the various cube training files. I
> still don't know the most interesting stuff, which is exactly how
> the .nn files are used, but it was taking me a while to read the
> code so I thought I'd just post what I have so far.
>
> If anyone wants to add to the wiki let me know and I'll gladly add
> you to the project.
>
> The next thing on my list to document is line segmentation, though I
> should probably try to add more information on how cube works first.
>
> I hope this looks useful to people, and inspires everyone to dig
> into all of the code :)
>
> Nick
>
> On Mon, Jun 03, 2013 at 10:49:46AM -0400, Sven Pedersen wrote:
> > Sounds good. I think we should make some attempt to reverse engineer the
> > Cube engine. I imagine Google will eventually release documentation, but
> > we don't know when. If we document it, they may be more inclined to give
> > their side of it more quickly. It is very possible they don't have much
> > internal documentation anyway.
> > --Sven
> >
> >
> > On Mon, Jun 3, 2013 at 10:25 AM, Nick White <[email protected]> wrote:
> >
> >     I wonder, would others here be interested in figuring out and
> >     documenting little bits of how the code works?
> >
> >     I spent some time in the line segmentation code a little while ago,
> >     to figure out better configuration parameters for line segmentation
> >     for the Ancient Greek training (which ended up being pretty
> >     successful), and I could certainly contribute a partial description
> >     of how it works.
> >
> >     If others are interested in doing this for key sections (like the
> >     parts Dmitri suggested), perhaps we should set up a wiki and get to
> >     work? It wouldn't be comprehensive, of course, but sharing what we
> >     know could still prove pretty useful.
> >
> >     What do people think? Is anyone else interested in doing this?
> >
> >     I'll dig out the (very scrappy) notes I made on line segmentation,
> >     clean them up, and post them here, when I get time. If anyone else
> >     is interested, I'll set up a wiki somewhere.
> >
> >     Nick
> >
> >     On Thu, May 30, 2013 at 07:32:52PM +0400, Dmitri Silaev wrote:
> >     > Excellent post, Nick! The more I read, the more I felt I had to ask
> >     > these questions myself, but didn't yet. I'm afraid, though, many of
> >     > them would remain unanswered.
> >     >
> >     > After several years of monitoring and asking in this forum, I
> >     > have got used to the feeling that the principal developers only
> >     > make new release announcements. In the early years, they were much
> >     > more active in discussions. I suppose many forum questions are
> >     > tedious to answer over and over again: the forum search could be
> >     > used, and many people are just too lazy to use it. But some
> >     > questions are not like that and deserve answers.
> >     >
> >     > Now it looks like Google is doing us a favor by making a formerly
> >     > commercial engine open source and sharing its developments from
> >     > time to time. Community contribution is now limited to enhancing
> >     > release packages and fixing trivial bugs. Without proper
> >     > documentation, or at least clues on how all this (not only Cube)
> >     > works, the developers keep community contribution nominal. I
> >     > personally need more info and am ready to contribute, if I begin
> >     > to understand the code well enough. I used to surf the code alone,
> >     > but the potential of this approach is limited. Off the bat, I'm
> >     > interested in segmentation, details on the class pruner and
> >     > integer matcher, a description of Cube, and best practices for
> >     > training data generation. I think there is more to come, once I
> >     > get more info on these.
> >     >
> >     > --
> >     > Dmitri
> >     >
> >     >
> >     > On Thu, May 30, 2013 at 6:48 PM, Nick White <[email protected]> wrote:
> >     > > Hi Tesseractors,
> >     > >
> >     > > I am feeling a bit fed up about the lack of openness with the
> >     > > Tesseract project.
> >     > >
> >     > > The addition of the cube mode, and several trainings, with
> >     > > absolutely no documentation, or (as far as I can tell) any tools
> >     > > to create cube training files, is a good example of this.
> >     > >
> >     > > As is the lack of tif/box files for any of the core training
> >     > > files in the project.
> >     > >
> >     > > Keeping the cube tools and documentation private sucks royally.
> >     > > If they aren't perfect or polished, it doesn't matter; we could
> >     > > help to fix them up!
> >     > >
> >     > > I suspect some of the tif/box files for training aren't being
> >     > > released because of concerns about copyright of the image files.
> >     > > If that's the case please work to clear them up, or create freely
> >     > > reusable versions.
> >     > >
> >     > > I love Tesseract; having a very high quality free software OCR
> >     > > package is awesome, and I'm very grateful for the amazing work
> >     > > being done on it. But I find the lack of parity between those
> >     > > inside Google and the wider community to be rather troubling.
> >     > >
> >     > > If there's anything I can do to help make cube training tools and
> >     > > documentation available, or the training source files, I'd be
> >     > > very happy to help. Replying offlist if appropriate is fine.
> >     > >
> >     > > Nick
> >     > >
> >     > > --
> >     > > --
> >     > > You received this message because you are subscribed to the
> >     > > Google Groups "tesseract-ocr" group.
> >     > > To post to this group, send email to [email protected]
> >     > > To unsubscribe from this group, send email to
> >     > > [email protected]
> >     > > For more options, visit this group at
> >     > > http://groups.google.com/group/tesseract-ocr?hl=en
> >     > >
> >     > > ---
> >     > > You received this message because you are subscribed to the
> >     > > Google Groups "tesseract-ocr" group.
> >     > > To unsubscribe from this group and stop receiving emails from it,
> >     > > send an email to [email protected].
> >     > > For more options, visit https://groups.google.com/groups/opt_out.
> >     > >
> >     > >
> >     >
> >
> >
> > --
> > ``All that is gold does not glitter,
> >   not all those who wander are lost;
> > the old that is strong does not wither,
> >   deep roots are not reached by the frost.
> > From the ashes a fire shall be woken,
> >   a light from the shadows shall spring;
> > renewed shall be blade that was broken,
> >   the crownless again shall be king.”
> >
>
