I really love your idea of making this package open source! I'm trying to experiment with your layout analysis system, but I'm stuck on some really basic steps and I could use a little help. I'm working on a 64-bit Windows Vista system. I tried simple strategies for porting to Visual Studio 2005, but quickly realized that I would have to do a significant amount of rewriting to make this code portable to both Linux and Windows. So I switched to Cygwin and the GNU toolset. After resolving a few issues I now have a fully compiled OCRopus project with tesseract and iulib (but not OpenFst, which has some issues with Cygwin that I have not yet resolved). This runs just fine on the alice_1.png file, and tesseract runs well enough on its test set.
So now I'm trying to run much more difficult files through OCRopus (large newspaper scans), which mostly fail with obscure warnings printed from the code (too many columns, various calculations not yielding plausible values, etc.). I'd like to debug the code by setting breakpoints, modifying the code, and so on. I've tried using Eclipse in conjunction with Cygwin as a development environment, but this hits a variety of bizarre errors that I suspect come from mixing Windows paths that start with a drive letter with Unix-style paths. Now I'm trying to use ddd running in an xterm, with gdb underneath. This works: I can run "ocroscript recognize data/pages/alice_1.png", set breakpoints in ocroscript.cc and ocrtoplevel.cc, single step, etc.
But I'm having trouble with the next step. The recognize.lua script works correctly, but many of the other scripts appear to be written for a different Lua interface and will not run with ocroscript. I can successfully augment the recognize.lua script by adding commands to write binary and gray-level files (though writing the output file from the segmenter does not produce the type of zoned image that I might have expected). Following the recognize.lua script sequence, I try to set breakpoints in files like ocr-binarize-sauvola.cc or even in image.cc (in functions that appear to implement write_image_gray(), for example, after making sure that I call this function repeatedly from recognize.lua), but these breakpoints are never reached. Perhaps gdb can't set breakpoints in the libocropus.a library that is linked to ocroscript? I've also considered compiling some of the top-level C++ programs such as main-ocr-binarize-sauvola.cc, but there aren't any Makefiles that make this easy. I can write these Makefiles, but then I'm puzzled as to precisely which routines I am supposed to be calling and in what order.
It seems like I ought to be following the recognize.lua script, but there are multiple possibilities for each phase of processing and I can't find clear guidance on how to proceed. The PowerPoint slide presentations are really interesting and informative, but many of the lua scripts that appear there either don't exist or don't work. I feel like I've missed something really basic and simple. I know that you are all working on Ubuntu and I will switch over to that system if I have to, but what development tools are you using that make debugging and compiling possible with this code? Is there some top level main program that I should be looking at to see how to call the library functions, and is there some documentation that I've missed? I would appreciate any help anyone on this forum can give me.
