Again, you're trying to apply OCRopus to inputs it is not targeted at or tested on. That character was not cut out of a 300 dpi scanned input; it's a fuzzy, scaled-up image of a low-resolution character.
That means you will have some work to do in order to get it to work on such inputs: you can either write C++ code to use the classifiers inside OCRopus to handle this case, or you need to figure out whether the existing line recognizer can be made to work on these kinds of inputs.

If you really just want to recognize isolated characters like this, your best bet is to feed them directly to the OCRopus classifiers in a separate C++ program.

Tom

On Fri, Jun 12, 2009 at 04:30, Yaroslav Bulatov <[email protected]> wrote:
>
> I tried higher resolution images, and get the same error. In
> particular using the following dataset
> http://yaroslavvb.com/upload/ocropus/dataset/
>
> I issue command
> ocropus trainseg model.simple dataset
>
> And get
> dataset/0000/0000.gt.txt: transcript doesn't agree with cseg
> (transcript 1, cseg 0) FIXME
>
>
> On May 31, 1:27 pm, Thomas Breuel <[email protected]> wrote:
>> > and get errors as below for each training file
>> > dataset/0000/0636.gt.txt: transcript doesn't agree with cseg
>> > (transcript 1, cseg 0) FIXME
>>
>> This means that the transcript contains one character and the cseg
>> contains 0 characters.
>>
>> Why does the cseg contain zero characters? Because your images appear
>> to be so low resolution that the noise filter just removes the few
>> bits that are in your image.
>>
>> If you really want to train on such low resolution images, you have two
>> options:
>>
>> * figure out which part of OCRopus is removing the bits and turn it
>> off (noise removal happens in several places, and I'm not sure which
>> one is responsible for this)
>>
>> * write your own top-level loop to train the characters directly (by
>> copying and then greatly simplifying linerec.cc)
>>
>> BTW, the "FIXME" comment is there because we changed the
>> representation of cseg files a little and that occasionally triggers
>> this exception; however, in your case, the exception is really due to
>> the bits getting deleted, rather than the changed cseg file.
>>
>> Tom
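
To make the failure mode in the quoted message concrete (a noise filter deleting the few ink pixels of a low-resolution glyph, so the cseg ends up with 0 characters against a 1-character transcript), here is a small standalone sketch. It does not use any OCRopus code or API: `make_bar`, `remove_small_components`, and the 8-pixel threshold are hypothetical stand-ins chosen for illustration, and the actual noise removal in OCRopus may work quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Binary image, row-major: 1 = ink, 0 = background.
using Image = std::vector<std::vector<int>>;

// Hypothetical test glyph: a solid bar of bar_h x bar_w ink pixels,
// placed one pixel in from the top-left corner of a blank image.
Image make_bar(int height, int width, int bar_h, int bar_w) {
    assert(bar_h + 1 <= height && bar_w + 1 <= width);
    Image img(height, std::vector<int>(width, 0));
    for (int y = 1; y <= bar_h; ++y)
        for (int x = 1; x <= bar_w; ++x)
            img[y][x] = 1;
    return img;
}

// Hypothetical stand-in for a noise filter (NOT the OCRopus one):
// erase every 4-connected ink component with fewer than min_pixels
// pixels and return how many components survive.
int remove_small_components(Image &img, std::size_t min_pixels) {
    const int h = static_cast<int>(img.size());
    const int w = h ? static_cast<int>(img[0].size()) : 0;
    std::vector<std::vector<char>> seen(h, std::vector<char>(w, 0));
    const int dy[4] = {1, -1, 0, 0};
    const int dx[4] = {0, 0, 1, -1};
    int survivors = 0;
    for (int y0 = 0; y0 < h; ++y0) {
        for (int x0 = 0; x0 < w; ++x0) {
            if (!img[y0][x0] || seen[y0][x0]) continue;
            // Breadth-first search to collect one connected component.
            std::vector<std::pair<int, int>> comp;
            std::queue<std::pair<int, int>> q;
            q.push(std::make_pair(y0, x0));
            seen[y0][x0] = 1;
            while (!q.empty()) {
                const std::pair<int, int> p = q.front();
                q.pop();
                comp.push_back(p);
                for (int k = 0; k < 4; ++k) {
                    const int ny = p.first + dy[k];
                    const int nx = p.second + dx[k];
                    if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
                    if (!img[ny][nx] || seen[ny][nx]) continue;
                    seen[ny][nx] = 1;
                    q.push(std::make_pair(ny, nx));
                }
            }
            if (comp.size() < min_pixels) {
                // Treated as noise: wipe the component from the image.
                for (std::size_t i = 0; i < comp.size(); ++i)
                    img[comp[i].first][comp[i].second] = 0;
            } else {
                ++survivors;
            }
        }
    }
    return survivors;
}
```

With a threshold of 8 pixels, a 3-pixel bar (a very low-resolution "1") is wiped out entirely and zero components survive, while a 120-pixel bar at something closer to scan resolution survives as one component. Under these assumptions, the first case is the "transcript 1, cseg 0" mismatch.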
