I think you are mixing two different things: you can get box output
or hOCR output, but not both:

   - A box file is, IMO, mainly useful for Tesseract training; it contains
   only the symbols and their positions.
   - hOCR is, IMO, focused on page analysis (it identifies blocks,
   paragraphs, and words), and it shows word confidence (in x_wconf).
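
To make the x_wconf point concrete, here is a minimal sketch of how word
confidences could be pulled out of an hOCR file with Python's standard
library. The names WordConfidenceParser, word_confidences and SAMPLE are
hypothetical, and the sample markup and its confidence values are made up
for illustration. It assumes x_wconf appears in the title attribute of the
word's span and that any tags nested inside a word are properly closed; the
exact hOCR layout may differ between Tesseract versions.

```python
import re
from html.parser import HTMLParser

# Matches "x_wconf <number>" inside an hOCR title attribute.
WCONF_RE = re.compile(r"x_wconf\s+(-?\d+(?:\.\d+)?)")


class WordConfidenceParser(HTMLParser):
    """Collects (word text, confidence) pairs from hOCR markup."""

    def __init__(self):
        super().__init__()
        self.words = []     # results: list of (text, confidence)
        self._conf = None   # confidence of the span currently being read
        self._depth = 0     # nesting depth inside that span
        self._text = []     # text fragments seen inside that span

    def handle_starttag(self, tag, attrs):
        if self._conf is not None:
            # A tag nested inside a word span (for example <strong>).
            self._depth += 1
            return
        title = dict(attrs).get("title") or ""
        match = WCONF_RE.search(title)
        if match:
            self._conf = float(match.group(1))
            self._depth = 1
            self._text = []

    def handle_data(self, data):
        if self._conf is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._conf is None:
            return
        self._depth -= 1
        if self._depth == 0:
            # The span carrying x_wconf has been closed: record the word.
            self.words.append(("".join(self._text).strip(), self._conf))
            self._conf = None


def word_confidences(hocr_text):
    """Return a list of (word, confidence) tuples found in hocr_text."""
    parser = WordConfidenceParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


# Made-up hOCR fragment, for illustration only.
SAMPLE = """<span class='ocr_line' title='bbox 1883 3617 2500 3704'>
<span class='ocrx_word' title='bbox 1883 3617 2069 3704; x_wconf 87'>ade</span>
<span class='ocrx_word' title='bbox 2100 3617 2500 3704; x_wconf 62'><strong>bce</strong></span>
</span>"""

if __name__ == "__main__":
    for word, conf in word_confidences(SAMPLE):
        print(word, conf)
```

On a real run you would read the .html file that tesseract wrote and pass
its contents to word_confidences() instead of SAMPLE.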

Setting both variables at the same time does not make sense.

If you are not satisfied with the hOCR output, you can create your own
output using the tesseract-ocr API.

Zdenko


On Mon, Jun 24, 2013 at 7:10 PM, Perry Horwich <[email protected]> wrote:

> Hi,
>
> Thanks for the awesome opensource OCR application.
>
> I can generate html and box files using a config file like this:
>
> tessedit_char_whitelist abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
> tessedit_create_boxfile 1
> tessedit_create_hocr 1
>
> This does not seem to be producing confidence values, either by word or
> letter.
>
> The box file looks like this:
>
> a 1883 3619 1940 3684 0
> d 1946 3617 2007 3704 0
> e 2014 3618 2069 3684 0
>
> And the <body> of the html hocr file looks identical:
>
> a 1883 3619 1940 3684 0
> d 1946 3617 2007 3704 0
> e 2014 3618 2069 3684 0
>
> Is there a variable I can set in the config file to produce confidence
> values for words or letters?
>
> I am using:
> tesseract 3.02.02
>  leptonica-1.69
>   libjpeg 8d : libpng 1.5.14 : libtiff 4.0.3 : zlib 1.2.5
>
> ... compiled on a Mac, OS X 10.8.3. Works great.
>
> Many thanks -
>
> Perry
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>
