Hi,

sure, here you go:

http://roggisch.de/screenshots.tgz

A little bit of explanation:

On the right, you see the processed image with lines detected by
OpenCV.

At the end of the lines, you see the page numbers. These are extracted
and copied into the left half of the image, where you can see them as well.

In the terminal, you see the classification results: the dictionary
contains the keys left and right, and as you can see, the recognized
numbers vary wildly.
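
A result dictionary like this can be sanity-checked before it is trusted.
The following is only a minimal sketch, assuming the values under left and
right are the recognized digit strings and that facing pages carry
consecutive numbers. The function names are made up for illustration and
are not taken from the code that produced the screenshots.

```python
import re


def parse_page_number(text):
    """Return the page number as an int, or None if text is not 1-3 ASCII digits."""
    text = text.strip()
    if re.fullmatch(r"[0-9]{1,3}", text) is None:
        return None
    # int() ignores leading zeros, so "07" and "7" both give 7.
    return int(text)


def plausible_spread(result):
    """Check a {'left': ..., 'right': ...} result for consecutive page numbers."""
    left = parse_page_number(result.get("left", ""))
    right = parse_page_number(result.get("right", ""))
    if left is None or right is None:
        return False
    # Assumption: the right-hand page is always the left-hand page plus one.
    return right == left + 1
```

A spread such as {"left": "08", "right": "09"} passes this check, while
{"left": "08", "right": "90"} is rejected and could be re-run or flagged.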

Again, the very same code and setup works with really high accuracy
when shown the numbers from 10 to 28.

In the meantime, I'm trying to find the original data for the English
language training. Is it available somewhere?

Diez


On Nov 12, 10:08 am, zdenko podobny <[email protected]> wrote:
> Can you send an example image where Tesseract detects a number with a
> leading zero?
>
> Zdenko
>
> On Fri, Nov 11, 2011 at 9:56 PM, Diez B. Roggisch <[email protected]> wrote:
>
> > Hi,
>
> > I'm trying to detect page numbers in a book. Contrary to normal page
> > numbers, the ones below 10 are written with a leading zero: 01,
> > 02, ... 09.
>
> > For all numbers *above* 09, detection is much more stable.
>
> > I'm using English as the language, and limit the character set to 0-9.
>
> > So my question is: is it possible that the English language training
> > set contains numbers without leading zeros, and thus the detection is
> > better for those? If yes, is there a way to supply only a different
> > word set, without having to give the full image/box/stuff?
>
> > Also, I found a bug. I'm using baseapi with a custom tessdata-dir
> > (OSX app bundle). This however doesn't work, because mainblk.cpp
> > relies on hard-coded paths or an environment variable to determine
> > datadir, which is a global.
>
> > Setting the envdir in code is my work-around, but of course that's not
> > really cool.
>
> > Thanks for any suggestions,
>
> > Diez
>
> > --
> > You received this message because you are subscribed to the Google
> > Groups "tesseract-ocr" group.
> > To post to this group, send email to [email protected]
> > To unsubscribe from this group, send email to
> > [email protected]
> > For more options, visit this group at
> > http://groups.google.com/group/tesseract-ocr?hl=en

