I tried it as "ocropus trainseg may04 dataset" on the dataset at http://yaroslavvb.com/upload/ocropus04dataset/
and I get an error like the one below for each training file:

dataset/0000/0636.gt.txt: transcript doesn't agree with cseg (transcript 1, cseg 0) FIXME

On May 27, 4:19 am, Thomas Breuel <[email protected]> wrote:
> You can train on isolated characters using "ocropus trainseg"; it
> requires the input images (of the form 0000/0001.png), corresponding
> character segmentation files (of the form 0000/0001.cseg.gt.png) and
> the output (of the form 0000/0001.gt.txt). If you really have just
> one character per input, the 0001.cseg.gt.png is just a binary version
> of the grayscale image and the ground truth file contains only a
> single character. More commonly, you'd have many characters per line.
>
> Alternatively, if you really want full programmatic control, you can
> use any classifier (interface: IModel) and train it. You can train it
> either on the raw bitmap, or you can extract features with the
> built-in feature extractor (interface: IFeatureMap), or with your own
> feature extractor.
>
> Look in linerec.cc in the addTrainingLine and recognizeLine methods
> (although that contains a lot of segmentation-related code).
>
> Tom
>
> On Tue, May 26, 2009 at 20:38, Yaroslav Bulatov <[email protected]> wrote:
>
> > I'd like to train ocropus to recognize isolated digits. Version 0.3
> > had rec-bpnet-isolated Lua script, any suggestions where to start
> > looking for similar functionality in 0.4?
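
For anyone hitting the same error, a quick sanity check of the dataset layout may help rule out missing files before looking at the segmentation images themselves. The sketch below is not part of ocropus and does not reproduce its transcript/cseg comparison. It only assumes the file naming described in the quoted reply (0000/0001.png, 0000/0001.cseg.gt.png, 0000/0001.gt.txt); `check_layout` is a hypothetical helper name.

```python
from pathlib import Path

# Companion files expected next to each input image, per the naming
# convention in the quoted reply (an assumption about the on-disk layout).
COMPANION_SUFFIXES = (".cseg.gt.png", ".gt.txt")


def check_layout(root):
    """Return a list of (image_path, missing_suffixes) for incomplete samples."""
    problems = []
    for png in sorted(Path(root).glob("*/*.png")):
        # Skip the segmentation files themselves, e.g. 0001.cseg.gt.png.
        if png.name.endswith(".cseg.gt.png"):
            continue
        stem = png.name[: -len(".png")]
        missing = [
            suffix
            for suffix in COMPANION_SUFFIXES
            if not (png.parent / (stem + suffix)).exists()
        ]
        if missing:
            problems.append((str(png), missing))
    return problems
```

Calling `check_layout("dataset")` returns an empty list when every input image has both companion files. If it does, the mismatch is more likely in the contents of the cseg images (for example, no labeled segments) than in the file layout.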
