Sounds good. I think we should make some attempt to reverse engineer the Cube engine. I imagine Google will eventually release documentation, but we don't know when. If we document it ourselves, they may be more inclined to give their side of it more quickly. It is very possible they don't have much internal documentation anyway.

--Sven

On Mon, Jun 3, 2013 at 10:25 AM, Nick White <[email protected]> wrote:
> I wonder, would others here be interested in figuring out and
> documenting little bits of how the code works?
>
> I spent some time in the line segmentation code a little while ago,
> to figure out better configuration parameters for line segmentation
> for the Ancient Greek training (which ended up being pretty
> successful), and I could certainly contribute a partial description
> of how it works.
>
> If others are interested in doing this for key sections (like the
> parts Dmitri suggested), perhaps we should set up a wiki and get to
> work? It wouldn't be comprehensive, of course, but sharing what we
> know could still prove pretty useful.
>
> What do people think? Is anyone else interested in doing this?
>
> I'll dig out the (very scrappy) notes I made on line segmentation,
> clean them up, and post them here, when I get time. If anyone else
> is interested, I'll set up a wiki somewhere.
>
> Nick
>
> On Thu, May 30, 2013 at 07:32:52PM +0400, Dmitri Silaev wrote:
> > Excellent post, Nick! The more I read, the more I felt I had to ask
> > these questions myself, but didn't yet. I'm afraid, though, many of
> > them would remain unanswered.
> >
> > After several years of monitoring and asking in this forum, I
> > got used to the feeling that the principal developers only make new
> > release announcements. In the early years, they were much more active
> > in discussions. I suppose many forum questions are tedious to
> > answer over and over again, the forum search can be used, and many
> > people are just too lazy to use it. But some questions are not like
> > that and deserve answers.
> >
> > Now it looks like Google is doing us a favor by making a formerly
> > commercial engine open source and sharing its developments from time
> > to time. Community contribution is now limited to enhancing
> > release packages and fixing trivial bugs.
> > Without proper documentation, or at least clues on how all this
> > (not only Cube) works, developers keep community contribution
> > nominal. I personally need more info and am ready to contribute,
> > if I begin to understand the code well enough. I used to surf the
> > code alone, but the potential of this approach is limited. Off the
> > bat, I'm interested in segmentation, details on the class pruner and
> > integer matcher, a description of Cube, and best practices for
> > training data generation. I think there is more to come, once I get
> > more info on these.
> >
> > --
> > Dmitri
> >
> > On Thu, May 30, 2013 at 6:48 PM, Nick White <[email protected]> wrote:
> > > Hi Tesseractors,
> > >
> > > I am feeling a bit fed up about the lack of openness with the
> > > Tesseract project.
> > >
> > > The addition of the cube mode, and several trainings, with
> > > absolutely no documentation, or (as far as I can tell) any tools to
> > > create cube training files, is a good example of this.
> > >
> > > As is the lack of tif/box files for any of the core training files
> > > in the project.
> > >
> > > Keeping the cube tools and documentation private sucks royally. If
> > > they aren't perfect or polished, it doesn't matter; we could help
> > > to fix them up!
> > >
> > > I suspect some of the tif/box files for training aren't being
> > > released because of concerns about copyright of the image files. If
> > > that's the case please work to clear them up, or create freely
> > > reusable versions.
> > >
> > > I love Tesseract; having a very high quality free software OCR
> > > package is awesome, and I'm very grateful for the amazing work being
> > > done on it. But I find the lack of parity between those inside
> > > Google and the wider community to be rather troubling.
> > >
> > > If there's anything I can do to help make cube training tools and
> > > documentation available, or the training source files, I'd be very
> > > happy to help.
> > > Replying offlist if appropriate is fine.
> > >
> > > Nick
> > >
> > > --
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "tesseract-ocr" group.
> > > To post to this group, send email to [email protected]
> > > To unsubscribe from this group, send email to
> > > [email protected]
> > > For more options, visit this group at
> > > http://groups.google.com/group/tesseract-ocr?hl=en
> > >
> > > ---
> > > You received this message because you are subscribed to the Google
> > > Groups "tesseract-ocr" group.
> > > To unsubscribe from this group and stop receiving emails from it,
> > > send an email to [email protected].
> > > For more options, visit https://groups.google.com/groups/opt_out.

--
"All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king."

