Thanks for the information.
 
Actually, I did follow the wiki instructions closely in my original process.
The example given above was only meant to illustrate the problem I
faced.
 
The original process I followed looks like this.
Training text (chi.ming.exp0.txt):
- from http://ash.jp/code/cn/big5tbl.htm
- remove the row and column headers and any symbols not needed
- add common punctuation marks
- join lines
- repeat multiple times
- convert to UTF-8 without BOM using Notepad++
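
For reference, the text preparation steps above could be scripted roughly as follows. This is only a sketch of what I did by hand: the function names, the repeat count, and the choice to put each repeated copy on its own line are assumptions for illustration, and the sample rows are placeholders rather than the real table contents.

```python
from pathlib import Path


def build_training_text(lines, repeat=3):
    """Join cleaned character rows into one string and repeat it."""
    # Strip surrounding whitespace and skip empty rows before joining.
    joined = "".join(line.strip() for line in lines if line.strip())
    # Assumption: each repeated copy goes on its own line.
    return "\n".join([joined] * repeat) + "\n"


def write_utf8_no_bom(path, text):
    """Write text as UTF-8. The "utf-8" codec does not emit a BOM
    (the "utf-8-sig" codec is the one that does)."""
    Path(path).write_text(text, encoding="utf-8")


if __name__ == "__main__":
    # Placeholder rows standing in for the cleaned Big5 table.
    sample_rows = ["一乙丁七", "乃九了二", "，。、；"]
    write_utf8_no_bom("sample_training_text.txt",
                      build_training_text(sample_rows, repeat=2))
```

The output file name here is hypothetical; in my case the result was saved as chi.ming.exp0.txt.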

Training TIF and Box generation (chi.ming.exp0.tif and chi.ming.exp0.box):
- use jTessBoxEditor
- "ming" font, regular, 24pt
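
Before training, the generated box file can be sanity-checked with something like the sketch below. It assumes the usual Tesseract 3.x box line layout of "symbol left bottom right top page" and that the page width and height in pixels are supplied by hand; the names Box, parse_box_line and find_bad_boxes are made up for this example and are not part of Tesseract or jTessBoxEditor.

```python
from typing import NamedTuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line):
    """Parse one box file line of the form: symbol left bottom right top page."""
    # Split from the right so the symbol itself is kept intact.
    symbol, left, bottom, right, top, page = line.rstrip("\r\n").rsplit(" ", 5)
    return Box(symbol, int(left), int(bottom), int(right), int(top), int(page))


def find_bad_boxes(lines, width, height):
    """Return boxes that are empty, inverted, or fall outside the page."""
    bad = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        box = parse_box_line(line)
        x_ok = 0 <= box.left < box.right <= width
        y_ok = 0 <= box.bottom < box.top <= height
        if not (x_ok and y_ok):
            bad.append(box)
    return bad
```

An empty result only means the coordinates are plausible; it does not tell whether each box actually covers the intended character.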

Training process
- use Tesseract-OCR 3.02 portable version for Windows
- command: ..\Tesseract-OCR\tesseract chi.ming.exp0.tif chi.ming.exp0 batch.nochop box.train

Output
- long list of messages
- a partial list is attached in "partial messages from page 1 of 9.txt"
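
Since the message list is long, a small script along these lines could pull out just the failed boxes. It is a sketch that assumes the saved messages contain lines of the same form as the "APPLY_BOXES: boxfile line ... FAILURE!" line in my earlier post quoted below; the function name is hypothetical, and the four numbers are returned in the order printed without interpreting them further.

```python
import re

# Matches e.g.: APPLY_BOXES: boxfile line 0/??((8,14),(36,41)): FAILURE! ...
FAILURE_RE = re.compile(
    r"APPLY_BOXES: boxfile line (\d+)/(.*?)"
    r"\(\((\d+),(\d+)\),\((\d+),(\d+)\)\): FAILURE!"
)


def failed_boxes(log_text):
    """Return (box file line, symbol as printed, four coordinates) per failure."""
    results = []
    for match in FAILURE_RE.finditer(log_text):
        line_no, symbol, *coords = match.groups()
        results.append((int(line_no), symbol, tuple(int(c) for c in coords)))
    return results


if __name__ == "__main__":
    sample = ("APPLY_BOXES: boxfile line 0/??((8,14),(36,41)): FAILURE! "
              "Couldn't find a matching blob\n")
    for entry in failed_boxes(sample):
        print(entry)
```

Only the part of each line up to "FAILURE!" is matched, so console wrapping of the rest of the message does not matter.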

Files:
- chi.ming.exp0.txt
  https://docs.google.com/file/d/0Bz99K1Qj2HQ_TkdUNmJYTDF1V00/edit
- chi.ming.exp0.tif
  https://docs.google.com/file/d/0Bz99K1Qj2HQ_SVZ3QlpDczRLVW8/edit
- chi.ming.exp0.box
  https://docs.google.com/file/d/0Bz99K1Qj2HQ_RnBWejJWUVdFUGc/edit
- partial messages from page 1 of 10.txt
  https://docs.google.com/file/d/0Bz99K1Qj2HQ_Z0gwVmY2OFJtTkk/edit
 

Thanks a lot.

Regards,
W. K. Lo
 
 

On Sunday, February 24, 2013 4:37:09 AM UTC+8, zdenop wrote:

> Your input image does not follow the training wiki [1], so the result is a
> failure (yes, you can fail to train tesseract even if you follow the wiki ;-),
> but if you do not follow it, you can be sure you will fail, especially if you
> have no experience with tesseract training).
>
> [1] 
> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Generate_Training_Images
>
> Zdenko
>
>
> On Tue, Feb 19, 2013 at 7:29 AM, W. K. LO <[email protected]> wrote:
>
>> I have a problem training tesseract with a character image.
>> An example of the problem is described below.
>> Box and Tif files are attached.
>> Box: https://docs.google.com/file/d/0Bz99K1Qj2HQ_dkZKUW5RdDU1Tk0/edit
>> Tif: https://docs.google.com/file/d/0Bz99K1Qj2HQ_WkJqOHI0OHU3Nnc/edit
>>  
>> Case 1:
>> ===command===
>> tesseract test.ming.24.tif test.ming.24 batch.nochop box.train
>>  
>> ===output message===
>> Tesseract Open Source OCR Engine v3.02 with Leptonica
>> Empty page!!
>> Empty page!!
>>  
>> Case 2: Telling Tesseract there is only one single character
>> ===command===
>> .\tesseract test.ming.24.tif test.ming.24 -psm 10 batch.nochop box.train
>>  
>> ===output message===
>> Tesseract Open Source OCR Engine v3.02 with Leptonica
>> Bounding box=(16,23)->(28,32)
>> Bounding box=(16,15)->(28,24)
>> APPLY_BOXES: boxfile line 0/??((8,14),(36,41)): FAILURE! Couldn't find a matching blob
>> APPLY_BOXES:
>>    Boxes read from boxfile: 1
>>    Boxes failed resegmentation: 1
>> APPLY_BOXES: Unlabelled word at :Bounding box=(16,15)->(28,32)
>> APPLY_BOXES: Unlabelled word at :Bounding box=(8,14)->(36,41)
>>    Found 0 good blobs.
>>    2 remaining unlabelled words deleted.
>> Generated training data for 0 words
>>  
>> Are there any options that need to be specified to make it work?
>>  
>> Thanks a lot.
>>  
>> Regards,
>> W. K. Lo
>>
>> -- 
>> -- 
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en
>>  
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/groups/opt_out.
>>  
>>  
>>
>
>
