Please post it as an issue at
http://code.google.com/p/tesseract-ocr/issues/list
I am having similar problems.
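
When filing the issue, a per-character tally of the failed boxes might make the report easier to read. The following is only a minimal sketch, not part of Tesseract: the names `FAILURE_RE`, `tally_failures`, `box_sizes` and `SAMPLE` are made up for illustration, and it assumes the two coordinate pairs in each FAILURE line are (left, bottom) and (right, top), as in the box file format. The sample lines are copied from the 3.02 output quoted below.

```python
import re
from collections import Counter

# Matches log lines such as:
#   APPLY_BOXES: boxfile line 38/N ((756,1844),(766,1872)): FAILURE!
# Groups: box file line number, character label, then four coordinates.
FAILURE_RE = re.compile(
    r"APPLY_BOXES: boxfile line (\d+)/(\S+) "
    r"\(\((\d+),(\d+)\),\((\d+),(\d+)\)\): FAILURE!"
)

# Three failure lines taken from the quoted 3.02 output.
SAMPLE = """\
APPLY_BOXES: boxfile line 38/N ((756,1844),(766,1872)): FAILURE!
APPLY_BOXES: boxfile line 148/N ((1172,1808),(1182,1836)): FAILURE!
APPLY_BOXES: boxfile line 647/' ((1560,1648),(1562,1656)): FAILURE!
"""


def tally_failures(log_text):
    """Count failed boxes per character label."""
    return Counter(m.group(2) for m in FAILURE_RE.finditer(log_text))


def box_sizes(log_text):
    """Return (label, width, height) for each failed box.

    Assumes the log prints (left, bottom) then (right, top).
    """
    sizes = []
    for m in FAILURE_RE.finditer(log_text):
        left, bottom, right, top = (int(m.group(i)) for i in (3, 4, 5, 6))
        sizes.append((m.group(2), right - left, top - bottom))
    return sizes


if __name__ == "__main__":
    print(tally_failures(SAMPLE))
    for label, width, height in box_sizes(SAMPLE):
        print(label, width, height)
```

Running the same functions over the full saved output (rather than the three sample lines) would show which characters and box sizes fail most often, which could be pasted into the issue alongside the bmp and box files.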

Shree

Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com


On Sun, Jun 30, 2013 at 3:21 AM, TedJ <[email protected]> wrote:

> Hello,
>
> Can anyone tell me why 3.01 works flawlessly for the command line:
> 'tesseract 01.bmp 01 nobatch box.train' operating on the attached source
> bmp and box files with this output:
>
> *Tesseract Open Source OCR Engine v3.01 with Leptonica*
> *APPLY_BOXES:*
> *   Boxes read from boxfile:    4883*
> *   Boxes failed resegmentation:       0*
> *   Found 4883 good blobs and 0 unlabelled blobs in 0 words.*
> *   0 remaining unlabelled words deleted.*
> *TRAINING ... Font name = UnknownFont*
> *Generated training data for 1011 words*
>
> but 3.02 produces this output:
>
> *Tesseract Open Source OCR Engine v3.02 with Leptonica*
> *row xheight=11.3333, but median xheight = 19.4987*
> *row xheight=10, but median xheight = 19.4987*
> *row xheight=10, but median xheight = 19.4987*
> *row xheight=10, but median xheight = 19.4987*
> *row xheight=10, but median xheight = 19.4987*
> *row xheight=10, but median xheight = 19.4987*
> *row xheight=10.6667, but median xheight = 19.4987*
> *row xheight=16.5455, but median xheight = 19.4987*
> *row xheight=16.5455, but median xheight = 19.4987*
> *row xheight=16.5455, but median xheight = 19.4987*
> *row xheight=12, but median xheight = 19.4987*
> *row xheight=16.5455, but median xheight = 19.4987*
> *row xheight=16.5455, but median xheight = 19.4987*
> *row xheight=16.5455, but median xheight = 19.4987*
> *row xheight=16.5455, but median xheight = 19.4987*
> *row xheight=16.5455, but median xheight = 19.4987*
> *row xheight=13, but median xheight = 19.4987*
> *row xheight=16.5455, but median xheight = 19.4987*
> *row xheight=28.5, but median xheight = 19.4987*
> *row xheight=28.5, but median xheight = 19.4987*
> *row xheight=28.5, but median xheight = 19.4987*
> *row xheight=28.5, but median xheight = 19.4987*
> *row xheight=28.5, but median xheight = 19.4987*
> *row xheight=28.5, but median xheight = 19.4987*
> *row xheight=28.5, but median xheight = 19.4987*
> *row xheight=28.5, but median xheight = 19.4987*
> *row xheight=28.5, but median xheight = 19.4987*
> *row xheight=28.5, but median xheight = 19.4987*
> *row xheight=28.5, but median xheight = 19.4987*
> *row xheight=28.5, but median xheight = 19.4987*
> *row xheight=28.5, but median xheight = 19.4987*
> *row xheight=28.5, but median xheight = 19.4987*
> *row xheight=28.5, but median xheight = 19.4987*
> *row xheight=28.5, but median xheight = 19.4987*
> *row xheight=28.5, but median xheight = 19.4987*
> *FAIL!*
> *APPLY_BOXES: boxfile line 38/N ((756,1844),(766,1872)): FAILURE!
> Couldn't find a matching blob*
> *FAIL!*
> *APPLY_BOXES: boxfile line 148/N ((1172,1808),(1182,1836)): FAILURE!
> Couldn't find a matching blob*
> *FAIL!*
> *APPLY_BOXES: boxfile line 233/N ((948,1772),(958,1800)): FAILURE!
> Couldn't find a matching blob*
> *FAIL!*
> *APPLY_BOXES: boxfile line 331/N ((948,1736),(958,1764)): FAILURE!
> Couldn't find a matching blob*
> *FAIL!*
> *APPLY_BOXES: boxfile line 414/N ((644,1700),(654,1728)): FAILURE!
> Couldn't find a matching blob*
> *FAIL!*
> *APPLY_BOXES: boxfile line 463/N ((1668,1700),(1678,1728)): FAILURE!
> Couldn't find a matching blob*
> *FAIL!*
> *APPLY_BOXES: boxfile line 627/M ((1204,1628),(1214,1656)): FAILURE!
> Couldn't find a matching blob*
> *FAIL!*
> *APPLY_BOXES: boxfile line 647/' ((1560,1648),(1562,1656)): FAILURE!
> Couldn't find a matching blob*
> *FAIL!*
> *APPLY_BOXES: boxfile line 653/W ((1668,1628),(1678,1656)): FAILURE!
> Couldn't find a matching blob*
> *APPLY_BOXES:*
> *   Boxes read from boxfile:    4883*
> *   Boxes failed resegmentation:     173*
> *APPLY_BOXES: Unlabelled word at :Bounding box=(1622,1844)->(1630,1860)*
> *   Found 4710 good blobs.*
> *   Leaving 4 unlabelled blobs in 0 words.*
> *   1 remaining unlabelled words deleted.*
> *TRAINING ... Font name = UnknownFont*
> *Generated training data for 1272 words*
>
> I think I've ruled out character and line spacing issues, and the idealized
> fonts appear perfect.  What could be wrong?  3.02 fails first on a series
> of N's and then on other characters.  I changed the standard "than Phone:"
> training text from mixed case to all uppercase.  I changed nothing else
> between the two runs: on Windows, I installed 3.01, ran it and produced the
> language 01, then deleted 3.01 and installed 3.02 instead.  I've removed
> most of the APPLY_BOXES errors for brevity.  Can anyone tell me what the
> cryptic "row xheight" and "median xheight" output says about how Tesseract
> is interpreting the source bmp?  Could it have to do with relative line
> length?  Can anyone please tell me why 3.02.02 fails so often whereas 3.01
> does not?
>
> Thanks,
>   Ted
>
>  --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>
