I had to edit a few Tesseract box files to generate training data recently and didn't find any of the existing tools <https://code.google.com/p/tesseract-ocr/wiki/AddOns#Box_file_editors> to my liking. I wanted something that ran on Mac OS X and showed letters inside their boxes.
So I built a web-based tool which I'm calling boxedit.

Here's the tool: http://www.danvk.org/boxedit/
Demo with preloaded data: http://www.danvk.org/boxedit/demo.html
Source code & instructions: https://github.com/danvk/boxedit/

A few things to like about it:
- It's entirely browser-based, so it runs on any platform and requires no installation.
- You can use the browser's zoom in/out features.
- It shows OCR'd letters on top of the source image, so the accuracy is easy to gauge.
- It can split boxes N ways.
- You can edit the raw box data or use the GUI; either works and they stay in sync.
- It's easy to get going: drag & drop an image and its box file to get started.

A few things to dislike:
- The UI could use some work: the overlaying of transcribed letters could be much clearer.
- Saving your changes back to disk is tedious (my best solution is to copy/paste back into the box file).
- Missing a few important features (e.g. n-way merge and moving/resizing boxes visually).

If people find this useful, I'm happy to polish it a bit more. Feel free to file issues <https://github.com/danvk/boxedit/issues> on GitHub.

- Dan
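
For anyone unfamiliar with the raw box data mentioned above: each line of a Tesseract box file describes one glyph as "<char> <left> <bottom> <right> <top> <page>", with pixel coordinates measured from the bottom-left corner of the image. The Python sketch below shows roughly what an N-way split of one box could look like. It is not taken from boxedit's source: the names Box, parse_box_line, split_box and format_box_line are hypothetical, and the equal-width horizontal split is an assumption about how such a feature might behave.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Box:
    """One line of a Tesseract box file (origin is the image's bottom-left)."""
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int = 0


def parse_box_line(line: str) -> Box:
    # Split from the right so the five numeric fields are always the last five.
    char, left, bottom, right, top, page = line.rstrip("\n").rsplit(" ", 5)
    return Box(char, int(left), int(bottom), int(right), int(top), int(page))


def format_box_line(box: Box) -> str:
    return f"{box.char} {box.left} {box.bottom} {box.right} {box.top} {box.page}"


def split_box(box: Box, n: int, chars: Optional[Sequence[str]] = None) -> List[Box]:
    """Split a box into n side-by-side boxes of near-equal width.

    Assumption: the split is horizontal and equal-width. If `chars` is
    omitted, every piece keeps the original character for later editing.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if chars is None:
        chars = [box.char] * n
    if len(chars) != n:
        raise ValueError("need exactly one character per piece")
    width = box.right - box.left
    # Integer edges, so adjacent pieces share a boundary with no gaps.
    edges = [box.left + (width * i) // n for i in range(n + 1)]
    return [
        Box(chars[i], edges[i], box.bottom, edges[i + 1], box.top, box.page)
        for i in range(n)
    ]


if __name__ == "__main__":
    # An "rn" pair that OCR misread as a single "m".
    merged = parse_box_line("m 10 20 30 60 0")
    for piece in split_box(merged, 2, ["r", "n"]):
        print(format_box_line(piece))
    # prints:
    # r 10 20 20 60 0
    # n 20 20 30 60 0
```

The pieces share their vertical extent with the original box, so only the left and right edges change; in practice you would still want to check each boundary against the image.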

