I tried ocropus on a character cut out from a scanned input, and got
the same error.
http://yaroslavvb.com/upload/ocropus/dataset2/0000/

I could figure out the problem if I had an example of a dataset where
trainseg works.

On Jun 12, 12:03 pm, Thomas Breuel <[email protected]> wrote:
> Again, you're trying to apply OCRopus to inputs it is not targeted at
> or tested on.  That character is not a character cut out from a 300dpi
> scanned input, it's a fuzzy, scaled up image of a low-resolution
> character.
>
> That means you will have some work to do in order to get it to work on
> such inputs: you can either write C++ code to use the classifiers
> inside OCRopus to handle this case, or you need to figure out whether
> the existing line recognizer can be made to work on these kinds of
> inputs.
>
> If you really just want to recognize isolated characters like this,
> your best bet is to feed them directly to the OCRopus classifiers in a
> separate C++ program.
>
> Tom
>
> On Fri, Jun 12, 2009 at 04:30, Yaroslav Bulatov<[email protected]> wrote:
>
> > I tried higher resolution images, and get the same error. In
> > particular using the following dataset
> > http://yaroslavvb.com/upload/ocropus/dataset/
>
> > I issue command
> > ocropus trainseg model.simple dataset
>
> > And get
> > dataset/0000/0000.gt.txt: transcript doesn't agree with cseg
> > (transcript 1, cseg 0) FIXME
>
> > On May 31, 1:27 pm, Thomas Breuel <[email protected]> wrote:
> >> > and get errors as below for each training file
> >> > dataset/0000/0636.gt.txt: transcript doesn't agree with cseg
> >> > (transcript 1, cseg 0) FIXME
>
> >> This means that the transcript contains one character and the cseg
> >> contains 0 characters.
>
> >> Why does the cseg contain zero characters?  Because your images appear
> >> to be so low resolution that the noise filter just removes the few
> >> bits that are in your image.
>
> >> If you really want to train on such low resolution images, you have two 
> >> options:
>
> >> * figure out which part of OCRopus is removing the bits and turn it
> >> off (noise removal happens in several places, and I'm not sure which
> >> one is responsible for this)
>
> >> * write your own top-level loop to train the characters directly (by
> >> copying and then greatly simplifying linerec.cc)
>
> >> BTW, the "FIXME" comment is there because we changed the
> >> representation of cseg files a little and that occasionally triggers
> >> this exception; however, in your case, the exception is really due to
> >> the bits getting deleted, rather than the changed cseg file.
>
> >> Tom
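
The explanation quoted above ("transcript 1, cseg 0" because noise removal
deletes the few pixels of a low-resolution character) can be illustrated
with a toy sketch. This is not OCRopus code: the size-based filter, the
min_pixels threshold, and all function names below are hypothetical and
are only assumed to behave roughly like a small-component noise filter.

```python
from collections import deque


def connected_components(bitmap):
    """Return the 4-connected foreground components of a 0/1 bitmap.

    Each component is a list of (row, col) pixel coordinates.
    """
    rows = len(bitmap)
    cols = len(bitmap[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not bitmap[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and bitmap[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def surviving_segments(bitmap, min_pixels):
    """Count components that a hypothetical size-based noise filter keeps."""
    return sum(1 for comp in connected_components(bitmap)
               if len(comp) >= min_pixels)


def check_agreement(transcript, bitmap, min_pixels):
    """Return (transcript length, segment count, whether the two agree)."""
    segments = surviving_segments(bitmap, min_pixels)
    return len(transcript), segments, len(transcript) == segments


# A 5-pixel "character", standing in for a very low-resolution glyph.
TINY_GLYPH = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]

# With an assumed threshold of 8 pixels the glyph is treated as noise.
print(check_agreement("a", TINY_GLYPH, min_pixels=8))  # (1, 0, False)
# With the filter effectively disabled the counts agree.
print(check_agreement("a", TINY_GLYPH, min_pixels=1))  # (1, 1, True)
```

The first result mirrors the reported message: one character in the
transcript, zero segments left after filtering. Whether the real filter
works by component size is an assumption here.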