Re: Using ocropus 0.4 for isolated character recognition?

Thomas Breuel Sat, 13 Jun 2009 03:46:54 -0700

I've put up a sample training set and described how to train on it in this document:

http://code.google.com/p/ocropus/wiki/Using

(Look under Training, then Getting Started)

Tom

On Sat, Jun 13, 2009 at 00:18, Yaroslav Bulatov<[email protected]> wrote:
>
> I tried ocropus on a character cut out from a scanned input, and got
> the same error.
> http://yaroslavvb.com/upload/ocropus/dataset2/0000/
>
> I could figure out the problem if I had an example of a dataset where
> trainseg works.
>
> On Jun 12, 12:03 pm, Thomas Breuel <[email protected]> wrote:
>> Again, you're trying to apply OCRopus to inputs it is not targeted at
>> or tested on.  That character is not a character cut out from a 300dpi
>> scanned input, it's a fuzzy, scaled up image of a low-resolution
>> character.
>>
>> That means you will have some work to do in order to get it to work on
>> such inputs: you can either write C++ code to use the classifiers
>> inside OCRopus to handle this case, or you need to figure out whether
>> the existing line recognizer can be made to work on these kinds of
>> inputs.
>>
>> If you really just want to recognize isolated characters like this,
>> your best bet is to feed them directly to the OCRopus classifiers in a
>> separate C++ program.
>>
>> Tom
>>
>> On Fri, Jun 12, 2009 at 04:30, Yaroslav Bulatov<[email protected]> wrote:
>>
>> > I tried higher resolution images, and get the same error. In
>> > particular using the following dataset
>> > http://yaroslavvb.com/upload/ocropus/dataset/
>>
>> > I issue the command
>> > ocropus trainseg model.simple dataset
>>
>> > And get
>> > dataset/0000/0000.gt.txt: transcript doesn't agree with cseg
>> > (transcript 1, cseg 0) FIXME
>>
>> > On May 31, 1:27 pm, Thomas Breuel <[email protected]> wrote:
>> >> > and get errors as below for each training file
>> >> > dataset/0000/0636.gt.txt: transcript doesn't agree with cseg
>> >> > (transcript 1, cseg 0) FIXME
>>
>> >> This means that the transcript contains one character and the cseg
>> >> contains 0 characters.
>>
>> >> Why does the cseg contain zero characters?  Because your images appear
>> >> to be so low resolution that the noise filter just removes the few
>> >> bits that are in your image.
>>
>> >> If you really want to train on such low resolution images, you have two 
>> >> options:
>>
>> >> * figure out which part of OCRopus is removing the bits and turn it
>> >> off (noise removal happens in several places, and I'm not sure which
>> >> one is responsible for this)
>>
>> >> * write your own top-level loop to train the characters directly (by
>> >> copying and then greatly simplifying linerec.cc)
>>
>> >> BTW, the "FIXME" comment is there because we changed the
>> >> representation of cseg files a little and that occasionally triggers
>> >> this exception; however, in your case, the exception is really due to
>> >> the bits getting deleted, rather than the changed cseg file.
>>
>> >> Tom
> >
>
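
The explanation quoted above (a noise filter deleting the few pixels of a low-resolution character, leaving a cseg with zero segments against a transcript of one character) can be illustrated with a small sketch. This is not OCRopus code: the function names, the 4-connectivity rule, and the `min_pixels` threshold are all assumptions made for illustration, and the thread itself notes that noise removal happens in several places in OCRopus.

```python
from collections import deque


def connected_components(image):
    """Return the 4-connected foreground components of a binary image.

    `image` is a list of rows of 0/1 values; each component is a list of
    (row, col) pixel coordinates.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                pixels = []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(pixels)
    return components


def count_segments_after_noise_filter(image, min_pixels):
    """Count components that survive a size-based noise filter.

    `min_pixels` is a hypothetical threshold, not an OCRopus parameter.
    """
    return sum(1 for comp in connected_components(image) if len(comp) >= min_pixels)


def check_transcript(transcript, image, min_pixels):
    """Compare transcript length with the number of surviving segments."""
    nseg = count_segments_after_noise_filter(image, min_pixels)
    if len(transcript) != nseg:
        return "transcript doesn't agree with cseg (transcript %d, cseg %d)" % (
            len(transcript), nseg)
    return "ok"


# A tiny 5-pixel glyph standing in for a very low-resolution character.
tiny_glyph = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]

print(check_transcript("x", tiny_glyph, min_pixels=8))
print(check_transcript("x", tiny_glyph, min_pixels=1))
```

With the threshold set above the glyph's pixel count, the whole character is discarded and the check reports a mismatch of the same shape as the error in the thread; with the filter effectively disabled, the single segment survives and matches the one-character transcript.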
