I think your training data should be more than one line. Create a page of
text and see if that works.
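
In case it helps, here is a minimal sketch of one way to turn a single long
line of training text into a multi-line page before handing it to your
trainer. It is plain Python using only the standard library. The function
name make_training_page, the 40-character width, and the sample sentence are
hypothetical choices of mine, not anything Tesseract or TesseractTrainer
requires.

```python
import textwrap


def make_training_page(text, width=40):
    """Reflow training text into several lines of at most `width` characters."""
    # Collapse all runs of whitespace so the input may be one long line.
    normalized = " ".join(text.split())
    # break_long_words=False keeps every token intact instead of splitting it.
    lines = textwrap.wrap(normalized, width=width, break_long_words=False)
    return "\n".join(lines)


# Hypothetical sample text; substitute your own training text here.
sample = "The quick brown fox jumps over the lazy dog. " * 4
page = make_training_page(sample, width=40)
print(page)
```

You could then save the result to a text file and use that as the input
training text, adjusting the width until the rendered page has several full
lines.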

Shree Devi Kumar
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com


On Mon, Oct 28, 2013 at 7:59 AM, Jonathan Nikkel <[email protected]> wrote:

> Hey there,
>
> I am a Tesseract novice and would like to solicit some help and advice from
> you smart folks.  I will preface this by saying that I have read the FAQ,
> searched this forum profusely, read all of the topics, and tried all the
> suggestions I found, with no luck so far.  This is probably not a
> difficult one; I assume I must be missing something stupid, but hey, that
> is why we have forums like these =).
>
> What I am using:
> Windows 7 box
> Tesseract v3.02
> TesseractTrainer (auto-generates .tif files from the input training text
> and automates the training process)
>
> I am able to train successfully with the off-the-shelf Arial training data
> included with the Tesseract dev files.
>
> I am now trying to train a custom data set with the Arial font (no mods,
> as installed with Windows) using this setup, to make sure I understand
> the training process/code and am setting things up correctly before
> moving on to more complex fonts.
>
> I am getting 100% failures in blob recognition/box resegmentation, and am
> puzzled as to why.  I have tried numerous combinations of character
> spacing, line spacing, font size, image bit depth (I am now using a binary
> image), and DPI (now using 300 dpi, 3600x3600, to be consistent with the
> example trainings), and am trying to home in on a font size that
> gives an x-height of 25 pixels.  I have checked the box file accuracy
> using cowboxer, and the boxes appear to be accurate.
>
> Attached are some example files.  I have tried alternative character
> spacings, from nearly touching up to about double what you see here.  I
> have tried all of the pageseg modes, using the *{prefix}.tif {prefix}
> nobatch box.train* parameters.  Pageseg mode 4 crashes; the rest generate
> 100% resegmentation errors.
>
> Where am I going wrong?  Does anyone have a working example setup with
> TesseractTrainer that they can share?
>
> Regards,
>
> -Jon
>
>
>
>
>  --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>
