More information and a question:
 
- When the boxes that Tesseract found are drawn onto the image, it can be 
seen that one box is correct and the other is actually the 'inside' of an 
enclosed region in the character.
 
- Does anyone know how to stop Tesseract from doing the resegmentation and 
make it simply use the provided bounding box for training? (The bounding 
box is certain to be correct because it is generated, not manually marked 
or computer-detected.)
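
The relationship between the boxes in the quoted log can be checked with a little arithmetic. The following is only a sketch: it assumes that each logged "Bounding box=(a,b)->(c,d)" means (left,bottom)->(right,top) in the same bottom-left-origin coordinates that the box file uses. The `Box` type and the helper functions are hypothetical names for illustration, not part of Tesseract.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned box, origin at the bottom-left of the page (assumed)."""
    left: int
    bottom: int
    right: int
    top: int


def union(a: Box, b: Box) -> Box:
    """Smallest box that covers both a and b."""
    return Box(min(a.left, b.left), min(a.bottom, b.bottom),
               max(a.right, b.right), max(a.top, b.top))


def contains(outer: Box, inner: Box) -> bool:
    """True if inner lies entirely within outer (edges may touch)."""
    return (outer.left <= inner.left and outer.bottom <= inner.bottom
            and outer.right >= inner.right and outer.top >= inner.top)


def to_image_coords(box: Box, image_height: int) -> Tuple[int, int, int, int]:
    """Flip the y axis to get (left, upper, right, lower) with a top-left
    origin, the order most image libraries expect when drawing rectangles."""
    return (box.left, image_height - box.top,
            box.right, image_height - box.bottom)


# Values copied from the quoted Tesseract output.
box_file = Box(8, 14, 36, 41)   # the single box read from the box file
blob_a = Box(16, 23, 28, 32)    # first "Bounding box=" line
blob_b = Box(16, 15, 28, 24)    # second "Bounding box=" line

merged = union(blob_a, blob_b)
print(merged)                      # union of the two detected blobs
print(contains(box_file, merged))  # do both blobs sit inside the box-file box?
```

Under that assumption, the union of the two detected blobs is (16,15)->(28,32), the same box the log later reports as an unlabelled word, and it lies strictly inside the box-file box (8,14)->(36,41). `to_image_coords` is included only for drawing the boxes over the TIF; the image height is not stated in this thread, so it has to be supplied by the caller.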
 
 
Regards,
W. K. Lo
 
 

On Tuesday, February 19, 2013 1:03:27 PM UTC+8, W. K. LO wrote:

>  Dear Tesseract users/developers,
>
>  
>
> I have a problem training Tesseract for Chinese OCR. Two examples are 
> described below:
>
>  
>
> 1. Empty page, even though the TIF is not empty and the box file bounds 
> the character tightly
>
> .\tesseract test.ming.24.tif test.ming.24 batch.nochop box.train
>
>  
>
> === begin output ===
>
> Tesseract Open Source OCR Engine v3.02 with Leptonica
>
> Empty page!!
>
> Empty page!!
>
> === end output ===
>
>  
>
>  
>
> 2.  Failed resegmentation (even with -psm 10 explicitly telling Tesseract 
> that there is only one character)
>
> .\tesseract test.ming.24.tif test.ming.24 -psm 10 batch.nochop box.train
>
>  
>
> === begin output ===
>
> Tesseract Open Source OCR Engine v3.02 with Leptonica
>
> Bounding box=(16,23)->(28,32)
>
> Bounding box=(16,15)->(28,24)
>
> APPLY_BOXES: boxfile line 0/??((8,14),(36,41)): FAILURE! Couldn't find a matching blob
>
> APPLY_BOXES:
>
>   Boxes read from boxfile:       1
>
>   Boxes failed resegmentation:       1
>
> APPLY_BOXES: Unlabelled word at :Bounding box=(16,15)->(28,32)
>
> APPLY_BOXES: Unlabelled word at :Bounding box=(8,14)->(36,41)
>
>    Found 0 good blobs.
>
>    2 remaining unlabelled words deleted.
>
> Generated training data for 0 words
>  
> === end output ===
>
>  
>
> Can anyone help?
>
>  
>
>  
>
> Regards,
>
> W. K. Lo
>
>  
>

-- 
-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en
