I put up a sample training set and describe how to do training in this document:
http://code.google.com/p/ocropus/wiki/Using
(Look under Training, then Getting Started)

Tom

On Sat, Jun 13, 2009 at 00:18, Yaroslav Bulatov <[email protected]> wrote:
>
> I tried ocropus on a character cut out from a scanned input, and got
> the same error.
> http://yaroslavvb.com/upload/ocropus/dataset2/0000/
>
> I could figure out the problem if I had an example of a dataset where
> trainseg works
>
> On Jun 12, 12:03 pm, Thomas Breuel <[email protected]> wrote:
>> Again, you're trying to apply OCRopus to inputs it is not targeted at
>> or tested on. That character is not a character cut out from a 300dpi
>> scanned input, it's a fuzzy, scaled up image of a low-resolution
>> character.
>>
>> That means you will have some work to do in order to get it to work on
>> such inputs: you can either write C++ code to use the classifiers
>> inside OCRopus to handle this case, or you need to figure out whether
>> the existing line recognizer can be made to work on these kinds of
>> inputs.
>>
>> If you really just want to recognize isolated characters like this,
>> your best bet is to feed them directly to the OCRopus classifiers in a
>> separate C++ program.
>>
>> Tom
>>
>> On Fri, Jun 12, 2009 at 04:30, Yaroslav Bulatov <[email protected]> wrote:
>>
>> > I tried higher resolution images, and get the same error. In
>> > particular using the following dataset
>> > http://yaroslavvb.com/upload/ocropus/dataset/
>>
>> > I issue command
>> > ocropus trainseg model.simple dataset
>>
>> > And get
>> > dataset/0000/0000.gt.txt: transcript doesn't agree with cseg
>> > (transcript 1, cseg 0) FIXME
>>
>> > On May 31, 1:27 pm, Thomas Breuel <[email protected]> wrote:
>> >> > and get errors as below for each training file
>> >> > dataset/0000/0636.gt.txt: transcript doesn't agree with cseg
>> >> > (transcript 1, cseg 0) FIXME
>>
>> >> This means that the transcript contains one character and the cseg
>> >> contains 0 characters.
>>
>> >> Why does the cseg contain zero characters?
>> >> Because your images appear to be so low resolution that the noise
>> >> filter just removes the few bits that are in your image.
>>
>> >> If you really want to train on such low resolution images, you have two
>> >> options:
>>
>> >> * figure out which part of OCRopus is removing the bits and turn it
>> >> off (noise removal happens in several places, and I'm not sure which
>> >> one is responsible for this)
>>
>> >> * write your own top-level loop to train the characters directly (by
>> >> copying and then greatly simplifying linerec.cc)
>>
>> >> BTW, the "FIXME" comment is there because we changed the
>> >> representation of cseg files a little and that occasionally triggers
>> >> this exception; however, in your case, the exception is really due to
>> >> the bits getting deleted, rather than the changed cseg file.
>>
>> >> Tom
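
The failure mode described in this thread (a size-based noise filter deleting every pixel of a tiny glyph, so the segmentation ends up with zero characters while the transcript has one) can be illustrated with a small Python sketch. This is not OCRopus code: the function names, the 4-connectivity choice, and the `min_pixels` threshold are all hypothetical, chosen only to reproduce the shape of the "transcript 1, cseg 0" message.

```python
from collections import deque


def connected_components(img):
    """Return the 4-connected foreground components of a binary image.

    `img` is a list of rows of 0/1 values; each component is a list of
    (row, col) pixel coordinates.
    """
    rows = len(img)
    cols = len(img[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, comp = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                comps.append(comp)
    return comps


def count_segments_after_noise_filter(img, min_pixels):
    """Count components that survive a (hypothetical) minimum-size filter."""
    return sum(1 for comp in connected_components(img) if len(comp) >= min_pixels)


def check_transcript(transcript, img, min_pixels=8):
    """Compare transcript length with the number of surviving segments.

    The threshold of 8 pixels is an arbitrary assumption for illustration.
    """
    n_text = len(transcript.replace(" ", ""))
    n_cseg = count_segments_after_noise_filter(img, min_pixels)
    if n_text != n_cseg:
        return f"transcript doesn't agree with cseg (transcript {n_text}, cseg {n_cseg})"
    return "ok"


# A 5-pixel "plus" glyph: smaller than the threshold, so it is filtered out.
tiny = [[0, 1, 0],
        [1, 1, 1],
        [0, 1, 0]]

# A 16-pixel filled square: large enough to survive the filter.
big = [[1] * 4 for _ in range(4)]

print(check_transcript("a", tiny))
print(check_transcript("a", big))
```

With these assumptions, the tiny glyph is treated as noise and the check reports a mismatch, while the larger glyph passes. Lowering `min_pixels` in the sketch plays the role of "turning off" the filter, which is the first of the two options suggested above; where the corresponding setting lives in OCRopus is not something this sketch can tell you.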
