Great idea! I would suggest putting the documentation in a wiki instead of here. That way it will be easier to refer to and find later.
Shree Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Jun 3, 2013 at 7:55 PM, Nick White <[email protected]> wrote:

> I wonder, would others here be interested in figuring out and
> documenting little bits of how the code works?
>
> I spent some time in the line segmentation code a little while ago,
> to figure out better configuration parameters for line segmentation
> for the Ancient Greek training (which ended up being pretty
> successful), and I could certainly contribute a partial description
> of how it works.
>
> If others are interested in doing this for key sections (like the
> parts Dmitri suggested), perhaps we should set up a wiki and get to
> work? It wouldn't be comprehensive, of course, but sharing what we
> know could still prove pretty useful.
>
> What do people think? Is anyone else interested in doing this?
>
> I'll dig out the (very scrappy) notes I made on line segmentation,
> clean them up, and post them here, when I get time. If anyone else
> is interested, I'll set up a wiki somewhere.
>
> Nick
>
> On Thu, May 30, 2013 at 07:32:52PM +0400, Dmitri Silaev wrote:
> > Excellent post, Nick! The more I read, the more I felt I had to ask
> > these questions myself, but didn't yet. I'm afraid, though, many of
> > them would remain unanswered.
> >
> > Because after several years of monitoring and asking in this forum I
> > got used to the feeling that principal developers make only new
> > release announcements. In the early years, they were much more active
> > in discussions. I can suppose many of forum questions are tedious to
> > answer over and over again, the forum search can be used, and many
> > people just feel lazy to use it. But some of them are not like that
> > and deserve answers.
> >
> > Now it looks like Google is doing us a favor making a formerly
> > commercial engine outsource and sharing its developments from time to
> > time.
> > The community contribution now is constrained by enhancing
> > release packages and fixing trivial bugs. Without a proper
> > documentation or at least clues on how all this (not only Cube) works,
> > developers keep community contribution nominal. I personally need more
> > info and am ready to contribute, if I begin to understand the code
> > enough. I used to surf the code alone, but the potential of this
> > approach is limited. Off the bat, I'm interested in segmentation,
> > details on class pruner and integer matcher, description of Cube, best
> > practices on training data generation. I think, there are more to
> > come, once I get more info on these.
> >
> > --
> > Dmitri
> >
> >
> > On Thu, May 30, 2013 at 6:48 PM, Nick White <[email protected]> wrote:
> > > Hi Tesseractors,
> > >
> > > I am feeling a bit fed up about the lack of openness with the
> > > Tesseract project.
> > >
> > > The addition of the cube mode, and several trainings, with
> > > absolutely no documentation, or (as far as I can tell) any tools to
> > > create cube training files, is a good example of this.
> > >
> > > As is the lack of tif/box files for any of the core training files
> > > in the project.
> > >
> > > Keeping the cube tools and documentation private sucks royally. If
> > > they aren't perfect or polished, it doesn't matter; we could help
> > > to fix them up!
> > >
> > > I suspect some of the tif/box files for training aren't being
> > > released because of concerns about copyright of the image files. If
> > > that's the case please work to clear them up, or create freely
> > > reusable versions.
> > >
> > > I love Tesseract; having a very high quality free software OCR
> > > package is awesome, and I'm very grateful for the amazing work being
> > > done on it. But I find the lack of parity between those inside
> > > Google and the wider community to be rather troubling.
> > >
> > > If there's anything I can do to help make cube training tools and
> > > documentation available, or the training source files, I'd be very
> > > happy to help. Replying offlist if appropriate is fine.
> > >
> > > Nick
> > >
> > > --
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "tesseract-ocr" group.
> > > To post to this group, send email to [email protected]
> > > To unsubscribe from this group, send email to
> > > [email protected]
> > > For more options, visit this group at
> > > http://groups.google.com/group/tesseract-ocr?hl=en
> > >
> > > ---
> > > You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
> > > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> > > For more options, visit https://groups.google.com/groups/opt_out.

