> and get errors as below for each training file
> dataset/0000/0636.gt.txt: transcript doesn't agree with cseg
> (transcript 1, cseg 0) FIXME

This means that the transcript contains one character but the cseg
contains zero characters.

Why does the cseg contain zero characters?  Because your images appear
to be so low-resolution that the noise filter simply removes the few
bits they contain.
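
To illustrate the effect (this is not the actual OCRopus filter, just a
minimal sketch under my own assumptions), here is a toy noise filter that
erases connected components smaller than a size threshold. The function
names, the 4-connectivity, and the min_pixels value are all hypothetical;
the point is only that a glyph made of very few pixels looks like noise to
such a filter and disappears entirely.

```python
from collections import deque


def remove_small_components(image, min_pixels):
    """Return a copy of a binary image (list of rows of 0/1) in which every
    4-connected component with fewer than min_pixels pixels is erased.

    Hypothetical stand-in for a noise filter; not OCRopus code.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] == 0 or seen[y][x]:
                continue
            # Breadth-first flood fill to collect one connected component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Components below the threshold are treated as noise.
            if len(component) < min_pixels:
                for cy, cx in component:
                    cleaned[cy][cx] = 0
    return cleaned


def count_ink(image):
    """Number of foreground pixels in a binary image."""
    return sum(sum(row) for row in image)


# A "character" that is only three pixels large.
tiny_glyph = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

cleaned = remove_small_components(tiny_glyph, min_pixels=4)
print("ink before:", count_ink(tiny_glyph), "ink after:", count_ink(cleaned))
```

Running a check like this over the training images (with a threshold
chosen to match whatever filter is actually in use) would show which
files lose all of their ink before segmentation.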

If you really want to train on such low resolution images, you have two options:

* figure out which part of OCRopus is removing the bits and turn it
off (noise removal happens in several places, and I'm not sure which
one is responsible for this)

* write your own top-level loop to train the characters directly (by
copying and then greatly simplifying linerec.cc)

BTW, the "FIXME" comment is there because we changed the
representation of cseg files a little and that occasionally triggers
this exception; however, in your case, the exception is really due to
the bits getting deleted, rather than the changed cseg file.

Tom
