OK, I checked in a fixed version.
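
For anyone who would rather patch an existing checkout than update: the fix
is just the "if 0" to "if 1" change quoted below.  Here is a minimal sketch
in Python, assuming your SConstruct still has the "### tesseract" marker
shown in the quoted message.  enable_tesseract_block is a made-up helper
name for illustration, not part of OCRopus or SCons.

```python
import re


def enable_tesseract_block(sconstruct_text):
    """Flip the first 'if 0:' after the '### tesseract' marker to 'if 1:'.

    Hypothetical helper: it assumes the marker text and the unindented
    'if 0:' guard look like the SConstruct excerpt quoted in this thread.
    """
    marker = "### tesseract"
    start = sconstruct_text.find(marker)
    if start == -1:
        # Marker not found: leave the file contents untouched.
        return sconstruct_text
    head, tail = sconstruct_text[:start], sconstruct_text[start:]
    # Only the first guard after the marker is changed.
    tail = re.sub(r"^if 0:", "if 1:", tail, count=1, flags=re.MULTILINE)
    return head + tail
```

You would read SConstruct into a string, pass it through the function, and
write the result back; any "if 0:" before the marker is left alone.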

We're probably going to try to remove the Tesseract dependency before the
official 0.4 release; you'll still be able to use Tesseract, but it will be
provided as a separate command and a (small) interface library.

The reason is that Tesseract currently cannot be used from dynamically
linked libraries, so we can't provide working Python bindings if we link
against it.  Also, the different versions of Tesseract have incompatible
APIs, so it's hard to make OCRopus work with all of them.

In terms of performance, the Tesseract character recognizer provides
reasonable results on a wide range of documents, while OCRopus pre-0.4 has
less consistent performance but performs much better in our benchmarks than
Tesseract for document classes that it has been trained on.

Mostly, what OCRopus needs now for more consistent performance is a lot more
training on different document types.  We're aiming for a distributed
training model, where many different people can train OCRopus on their
documents and on their machines and submit the trained models.  We can then
build a "supermodel" out of the components that works well for a lot of
document types.  Again, the infrastructure for that is in place, and we hope
that it will be part of 0.5.

Tom

On Fri, May 15, 2009 at 10:20, Thomas Breuel <[email protected]> wrote:

> Oops, that change got checked in accidentally.  Just replace the "if 0"
> with "if 1"; I'll try to fix the repository later.
>
> Sorry about that.
>
> Tom
>
>
> On Fri, May 15, 2009 at 00:43, Taxman <[email protected]> wrote:
>
>>
>> Thanks 0mat, but tesseract-ocr-dev and tesseract-ocr are definitely
>> already installed (they were already in the ocropus/ubuntu script, Tom),
>> and the tesseract section of my SConstruct file looks like:
>>
>> ### tesseract
>>
>> if 0:
>>    env.Append(CPPPATH=["${tesseract}/include/tesseract"])
>>    env.Append(LIBPATH=["${tesseract}/lib"])
>>    env.Append(LIBS=["libtesseract_full.a","pthread"])
>>    env.Append(CPPDEFINES=["HAVE_TESSERACT"])
>>    assert conf.CheckLibWithHeader('tesseract_full', 'tesseract/baseapi.h', 'C++')
>>    assert conf.CheckLib('pthread')
>>
>> It's definitely not commented out. I'm guessing there must be some
>> other missing dependency that most people have on a running system
>> that I don't yet have on this clean one. Any more ideas?
>>
>
