It's taken a lot of trial and error for me to install OCRopus under
Windows.  Perhaps it would help for me to list the key steps that I
remember:

1) Install the full cygwin environment - this will take a lot of space
and time but it's worth it.  You will need all the GNU compiler tools
(gcc, g++, ar, ranlib, gdb, make, etc.) plus other stuff like the
Netpbm image conversion routines (portions of OCRopus use image file
formats that can only be understood by Netpbm routines).  Note that on my
machine (Vista 64-bit), g++ points to version 3.4.4 of the GNU C++
compiler by default, but compiling openFST will require using later
versions within cygwin.

2) Download the iulib library, untar (I use Winzip 12.0).  Bring up a
bash shell (you ought to configure your .bashrc scripts to provide
command aliases that you like), and run:
    ./configure
    make
    make install

3) Download OCRopus 0.3.1 (do not use the svn version)

4) Download Tesseract 2.03, untar.  You will need to patch
viewer/svutil.cpp by adding the line:

#include <netinet/in.h>

just before the line:

#include <pthread.h>

You also need to run a patch that is contained in the OCRopus main
directory (this is described in the INSTALL file in OCRopus).  You
need to be in the tesseract-2.03 directory for this to work.

    patch -p1 <../ocropus-0.3/tesseract-2.03-patch.diff     # check this path!

Now run
    ./configure
    make
    make install

5) I tried to compile openFST on Windows.  I got it to compile by
using command line arguments to configure to use the version 4.xxx GNU
compilers (on my machine these appear to be named gcc4 and g++4), but
I was not able to use the resulting installation with OCRopus because
the scripts that use FST assume that the include files are in a
different place than the latest openFST installs to.  Fortunately, you
do not need FST, leptonica, or SDL to use OCRopus.

6) Compile OCRopus by:
    ./configure --without-fst --without-leptonica --without-SDL
    make
    make install
    make check
The "check" target runs a bunch of tests.  The ones that depend on FST
won't run (that's OK).  One of the others has a minor error on my
system (one bounding box has a different size than expected).

7) You can test the result by running (from the OCRopus directory in
bash):
    ocroscript recognize data/pages/alice_1.png  >out_alice.html
    ocroscript recognize data/pages/snark01.png  >out_snark.html
This uses the recognize.lua script in
/usr/local/share/ocroscript/scripts.
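
The svutil.cpp edit from step 4 could also be scripted rather than made
by hand.  Below is a minimal sketch, assuming a POSIX shell with awk and
grep available (as in cygwin).  insert_before is a made-up helper name,
not something shipped with Tesseract or OCRopus, and it assumes that
#include <pthread.h> sits on a line by itself.

```shell
# Hypothetical helper (not part of Tesseract or OCRopus): insert NEW_LINE
# just before the first line of FILE whose text is exactly ANCHOR.
insert_before() {
    file=$1
    anchor=$2
    new_line=$3
    # Skip the edit if the new line is already present, so reruns are harmless.
    if grep -qxF -- "$new_line" "$file"; then
        return 0
    fi
    awk -v anchor="$anchor" -v new_line="$new_line" '
        {
            line = $0
            sub(/\r$/, "", line)      # tolerate Windows line endings when comparing
        }
        !done && line == anchor { print new_line; done = 1 }
        { print }
    ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Meant to be run from the Tesseract source directory; the guard makes
# this a no-op anywhere else.
if [ -f viewer/svutil.cpp ]; then
    insert_before viewer/svutil.cpp '#include <pthread.h>' '#include <netinet/in.h>'
fi
```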

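For step 5, one common way to point an autoconf-generated configure
script at a different compiler is to pass CC and CXX on its command
line.  A rough sketch follows, assuming the names gcc4 and g++4 from my
machine.  first_on_path is a hypothetical helper, and the script only
prints the configure command rather than running it.

```shell
# Hypothetical helper: print the first of the given command names that
# exists on PATH, or return non-zero if none of them do.
first_on_path() {
    for name in "$@"; do
        if command -v "$name" >/dev/null 2>&1; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1
}

# Prefer the version 4 compilers (named gcc4/g++4 on my machine; the names
# may differ elsewhere) and fall back to the default gcc/g++.
if cc=$(first_on_path gcc4 gcc) && cxx=$(first_on_path g++4 g++); then
    echo "./configure CC=$cc CXX=$cxx"
else
    echo "no suitable compiler found on PATH" >&2
fi
```
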
I'm stuck on the next step, namely figuring out how to use this
package and to modify it while working on a Windows platform.  Visual
Studio won't work with this code, of course.  I've tried Eclipse C++,
but I keep getting weird errors when I attempt to run the make file
within the Eclipse environment - I think this may be due to
incompatibilities between Windows file paths (e.g. D:\ocropus-0.3\...
vs. Linux file paths /cygdrive/d/ocropus-0.3/...)
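
To make the path mismatch concrete, here is a small sketch of the
mapping between the two forms.  win_to_cygdrive is a hypothetical helper
written only for illustration, and it assumes the default /cygdrive
prefix.  Inside cygwin the cygpath utility (cygpath -u, and cygpath -w
for the reverse) does this conversion properly.

```shell
# Hypothetical helper: turn a Windows drive path into its /cygdrive form
# using only POSIX shell features (illustration only; prefer cygpath).
win_to_cygdrive() {
    drive=${1%%:*}                                   # text before the first colon, e.g. "D"
    rest=${1#*:}                                     # text after it, e.g. "\ocropus-0.3"
    drive=$(printf '%s' "$drive" | tr 'A-Z' 'a-z')   # the drive letter is lower-cased
    rest=$(printf '%s' "$rest" | tr '\\' '/')        # backslashes become forward slashes
    printf '/cygdrive/%s%s\n' "$drive" "$rest"
}

win_to_cygdrive 'D:\ocropus-0.3'     # prints /cygdrive/d/ocropus-0.3
```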

Currently I can run ocroscript from within "ddd", a visual debugger
that runs on top of X-windows and gdb.  To use this, you first have to
start the X server from cygwin.  Then ddd runs in an Xterm client
(included with cygwin).  Unfortunately, I haven't yet figured out how
to set breakpoints in the OCRopus code in the layout analysis routines
that will actually trigger when I run ocroscript (I can only get
breakpoints to work in code modules like ocroscript.cc and
ocrotoplevel.cc).  I think that the problem has to do with the fact
that ocroscript doesn't call the C++ code directly, but instead runs a
Lua interpreter that links to the libocropus.a library.

Anyway, there are a lot of "main" C++ programs included in the OCRopus
system, spread out across quite a number of directories.  I have not
yet figured out, however, which of these is a "real" main.  All the
ones I've looked at so far import a file that is supposed to contain
intermediate results from earlier processing, and then output another
file with more calculations added to it.  Also, these programs are not
built by the Makefile that ships with OCRopus.
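
One quick way to get an overview of those programs is to search the
source tree for files that seem to define main().  The sketch below is
only a heuristic: list_mains is a made-up helper name, it assumes the
sources end in .cc or .cpp, and it will miss any main whose return type
sits on a separate line.

```shell
# Hypothetical helper: list C++ sources under DIR (default: current
# directory) that contain a line starting with "int main(".
list_mains() {
    grep -rlE '^[[:space:]]*int[[:space:]]+main[[:space:]]*\(' \
        --include='*.cc' --include='*.cpp' "${1:-.}" | sort
}

# Example: run "list_mains ." from the top of the OCRopus source directory.
```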

Also many of the Lua scripts don't appear to work with the latest
version of OCRopus, and I can't find any documentation that explains
what any of the scripts are doing or why they are different.  I'm
working through the code to figure this out, but the mapping between
C++ routines and Lua calls is not in any documentation that I've seen
yet, and it's not completely straightforward.

Of course, I've only been puzzling over this for a few days and I'm
sure I'll figure out a lot more over the next few days.

Good luck!