Re: Cube documentation, training source files, and openness

Dmitri Silaev Thu, 30 May 2013 08:33:47 -0700

Excellent post, Nick! The more I read, the more I felt I should have
asked these questions myself, but hadn't yet. I'm afraid, though, that
many of them will remain unanswered.

After several years of following this forum and asking questions in
it, I've grown used to the principal developers posting nothing but
new release announcements. In the early years, they were much more
active in discussions. I suppose many forum questions are tedious to
answer over and over again when the forum search could be used, and
many people are just too lazy to use it. But some questions are not
like that and deserve answers.

Now it looks like Google is doing us a favor by making a formerly
commercial engine open source and sharing its developments from time
to time. Community contribution is now limited to enhancing release
packages and fixing trivial bugs. Without proper documentation, or at
least clues on how all this (not only Cube) works, the developers keep
community contribution nominal. I personally need more info and am
ready to contribute once I understand the code well enough. I used to
explore the code on my own, but the potential of that approach is
limited. Off the top of my head, I'm interested in segmentation,
details of the class pruner and integer matcher, a description of
Cube, and best practices for generating training data. I think there
will be more to come once I get more info on these.

--
Dmitri

On Thu, May 30, 2013 at 6:48 PM, Nick White <[email protected]> wrote:
> Hi Tesseractors,
>
> I am feeling a bit fed up about the lack of openness with the
> Tesseract project.
>
> The addition of the cube mode, and several trainings, with
> absolutely no documentation, or (as far as I can tell) any tools to
> create cube training files, is a good example of this.
>
> As is the lack of tif/box files for any of the core training files
> in the project.
>
> Keeping the cube tools and documentation private sucks royally. If
> they aren't perfect or polished, it doesn't matter; we could help
> to fix them up!
>
> I suspect some of the tif/box files for training aren't being
> released because of concerns about copyright of the image files. If
> that's the case please work to clear them up, or create freely
> reusable versions.
>
> I love Tesseract; having a very high quality free software OCR
> package is awesome, and I'm very grateful for the amazing work being
> done on it. But I find the lack of parity between those inside
> Google and the wider community to be rather troubling.
>
> If there's anything I can do to help make cube training tools and
> documentation available, or the training source files, I'd be very
> happy to help. Replying offlist if appropriate is fine.
>
> Nick
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups 
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
