Re: Image quality loss in pages2lines process as well as general errors.

Tom Fri, 14 May 2010 13:15:13 -0700

This was reported before and should be fixed in the latest version.

http://code.google.com/p/ocropus/issues/detail?id=255#c2


The reason behind it is that pages2lines splits up both the grayscale
and the binary versions of the image, but it used the original version
for one and the deskewed version for the other.  Since extraction from
the grayscale image involves masking, you're seeing JPEG noise through
character-shaped masks.
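To make the mechanism concrete, here is a toy sketch in Python. It is not OCRopus code: a "page" is a list of pixel rows, deskewing is faked as a one-row vertical shift, and `shift_down`, `binarize` and `extract_masked` are hypothetical helpers. The line box and the character-shaped mask both come from the deskewed page. Applying them to the deskewed grayscale recovers the ink, but applying them to the original grayscale shows only background noise through the mask.

```python
WHITE = 255

def shift_down(image, dy, fill=WHITE):
    """Toy stand-in for deskewing: move every row of the image down by dy pixels."""
    height, width = len(image), len(image[0])
    return [list(image[r - dy]) if 0 <= r - dy < height else [fill] * width
            for r in range(height)]

def binarize(image, threshold=128):
    """Mark dark pixels (ink) as True."""
    return [[pixel < threshold for pixel in row] for row in image]

def extract_masked(gray, mask, box):
    """Crop box=(top, left, bottom, right) from gray; pixels outside the mask become white."""
    top, left, bottom, right = box
    return [[gray[r][c] if mask[r][c] else WHITE for c in range(left, right)]
            for r in range(top, bottom)]

# Original grayscale page: light, noisy background with one short ink stroke in row 2.
gray = [
    [240, 250, 235, 245],
    [245, 238, 250, 242],
    [244,   0,   0, 247],
    [236, 248, 241, 239],
    [250, 244, 237, 246],
    [241, 235, 249, 243],
]

deskewed = shift_down(gray, 1)   # the stroke is now in row 3
mask = binarize(deskewed)        # character-shaped mask from the deskewed page
line_box = (1, 0, 5, 4)          # line bounding box, also from the deskewed page

consistent = extract_masked(deskewed, mask, line_box)
mismatched = extract_masked(gray, mask, line_box)

print(consistent[2])  # [255, 0, 0, 255]: the ink shows through the mask
print(mismatched[2])  # [255, 248, 241, 255]: background noise shows through instead
```

In the mismatched case the real stroke (original row 2) is masked out entirely, which is the kind of degraded line image described in the original report.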

If you give poor quality input to the rest of OCRopus, you will get
"index out of range" and "beam search" errors; the former usually
indicates that it couldn't find something, and the latter indicates
that the raw output from the character recognizer doesn't match the
language model.  We're trying to improve the error messages in future
versions.
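The "beam search failed" case can be pictured with a toy decoder. The sketch below is a hypothetical Python illustration, not how OCRopus implements its language model: the recognizer output is a list of per-position character costs, the "language model" is just a small word list, and `beam_search` and `BeamSearchFailed` are made-up names. A clean line decodes to a word; on a noisy line none of the candidate characters fit what the model allows, the beam empties, and the search fails.

```python
import heapq

class BeamSearchFailed(Exception):
    """Raised when no hypothesis survives the language-model constraint."""

def all_prefixes(words):
    """Every prefix (including the empty string) of every lexicon word."""
    return {word[:i] for word in words for i in range(len(word) + 1)}

def beam_search(char_costs, lexicon, beam_width=3):
    """Decode per-position character costs (lower is better), keeping only
    the beam_width cheapest prefixes that the lexicon allows."""
    allowed = all_prefixes(lexicon)
    beam = [(0.0, "")]
    for position in char_costs:
        candidates = [(cost + c, text + ch)
                      for cost, text in beam
                      for ch, c in position.items()
                      if text + ch in allowed]
        if not candidates:
            raise BeamSearchFailed("recognizer output does not match the language model")
        beam = heapq.nsmallest(beam_width, candidates)
    complete = [(cost, text) for cost, text in beam if text in lexicon]
    if not complete:
        raise BeamSearchFailed("no complete word at the end of the line")
    return min(complete)[1]

lexicon = {"cat", "car", "cut"}

# Clean input: the recognizer's best guesses spell a lexicon word.
clean = [{"c": 0.1, "e": 2.0}, {"a": 0.2, "u": 1.0}, {"t": 0.3, "r": 0.5}]
print(beam_search(clean, lexicon))  # cat

# Noisy input: no candidate character starts any lexicon word.
noisy = [{"x": 0.1, "k": 0.2}, {"a": 0.2}, {"t": 0.3}]
try:
    beam_search(noisy, lexicon)
except BeamSearchFailed as err:
    print("beam search failed:", err)
```

A narrow beam can also fail on borderline input, because the correct hypothesis may be pruned before it completes; widening the beam trades speed for robustness.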

Please have a look at the new commands in ocropy/ocropus-*; they are
functionally analogous, but much easier to read and modify, and they
take command line switches and have more built-in documentation.

Tom

On Apr 29, 11:28 pm, Ben <[email protected]> wrote:
> I want to start by thanking you for helping out. I have been trying
> for a while to get ocropus working and have been having some issues.
> My first issue is that the version of ocropus I compiled (0.4.4) is
> behaving very poorly on most images.  For example, the image
> "alice_1.png", which comes with ocropus in the "data/testimages"
> folder, returns very poor results when I run it through the
> recommended process of "book2pages", "pages2lines", "lines2fsts"...
> (see http://benhansen.me/ocropus/alice_1.png.html) but returns great
> results when I use the "ocropus page <imagefilename>" command line
> parameter (see http://benhansen.me/ocropus/alice_1.pngPageOutput.html).
> After looking into the problem, it seems that images are losing a lot
> of quality in the "pages2lines" step of the process. The binarization
> process seems to be very clean and effective
> (see http://benhansen.me/ocropus/0001.bin.png)
> but the segmented line images have lost a lot of quality
> (see http://benhansen.me/ocropus/010003.png).  I am confused about the
> reason behind this quality loss. Shouldn't the pages2lines process only
> split up the already touched-up binarized image? Any ideas? Robert B.
> posted "Which image for training?" on Apr 22, but I couldn't find a
> response.
> Also I keep getting a lot of "[error] narray: index out of range", and
> "[error] beam search failed" errors.  There was a discussion
> "introduction and request for help getting up and running" on May 9
> where these errors and garbage output were mentioned to be caused by a
> PPI of less than 300 or otherwise improperly formatted images. The
> images I have been using are 300 DPI and have a standard font size
> for that DPI. Are there any other image specifications I should know
> about?
> I would like to start working on these problems.  My feeling is that
> this might just be a glitch in the current version and I don't want to
> waste energy solving a problem that has already been solved.
> Thanks Again!
> -Ben
>
> --
> You received this message because you are subscribed to the Google Groups 
> "ocropus" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group 
> at http://groups.google.com/group/ocropus?hl=en.

