Hi W. K. Lo, did you ever solve this problem? I've run into the same segmentation issue recently.
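In case it helps anyone reproduce the three runs quoted below, here is a rough Python sketch. It only rebuilds the `makebox` command exactly as given in the original post and reads the resulting box files. The helper names (`makebox_command`, `run_makebox`, `parse_box_line`, `box_widths`) are my own and are not part of Tesseract. I am also assuming the usual box-file layout of `<glyph> <left> <bottom> <right> <top> <page>` per line, and that `chi` is the name of the traineddata installed locally, as in the post.

```python
import subprocess
from pathlib import Path


def makebox_command(image, lang="chi"):
    """Build the command from the post: tesseract <image> <base> -l <lang> makebox."""
    base = Path(image).stem  # "test01.tif" -> "test01"
    return ["tesseract", image, base, "-l", lang, "makebox"]


def run_makebox(image, lang="chi"):
    """Run tesseract for one image; requires tesseract on PATH (not called here)."""
    subprocess.run(makebox_command(image, lang), check=True)
    return Path(Path(image).stem + ".box")


def parse_box_line(line):
    """Parse one box line, assuming '<glyph> <left> <bottom> <right> <top> <page>'."""
    glyph, left, bottom, right, top, page = line.strip().rsplit(" ", 5)
    return glyph, int(left), int(bottom), int(right), int(top), int(page)


def box_widths(box_text):
    """Return (glyph, width) pairs so unusually narrow boxes are easy to spot."""
    widths = []
    for line in box_text.splitlines():
        if not line.strip():
            continue
        glyph, left, _bottom, right, _top, _page = parse_box_line(line)
        widths.append((glyph, right - left))
    return widths
```

Calling `run_makebox("test01.tif")` and then `box_widths(Path("test01.box").read_text(encoding="utf-8"))` should list one width per detected box. A character that was broken apart would presumably show up as two or more boxes much narrower than its neighbours, which makes test01 and test03 easier to compare against test02.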

On Wednesday, February 27, 2013 4:44:20 PM UTC+8, W. K. LO wrote:
> Dear all,
>
> I would like to know if there is/are option(s) for controlling the
> segmentation process during OCR.
>
> I am playing with a Chinese OCR and find that the segmentation is affected
> by neighbouring characters. I would like to try playing with the
> parameters/options to control the processes. An example is given as follows:
>
> test01
> ======
> test01.tif [https://docs.google.com/file/d/0Bz99K1Qj2HQ_anJKQXN4RTlmLXc/edit]
> command: tesseract test01.tif test01 -l chi makebox
> result: 4th, 5th, 8th characters are broken apart
> test01.box [https://docs.google.com/file/d/0Bz99K1Qj2HQ_aFQ2ekpVMy0wTWM/edit]
> screen of test01 segmentation [https://docs.google.com/file/d/0Bz99K1Qj2HQ_UmlUVFNLT0paZjA/edit]
>
> test02
> ======
> test02.tif [https://docs.google.com/file/d/0Bz99K1Qj2HQ_NVF6ZzhvSXBQZnc/edit]
> edit: remove first 3 characters of test01.tif
> command: tesseract test02.tif test02 -l chi makebox
> result: all characters are correctly segmented (only mixed up a
> punctuation mark)
> test02.box [https://docs.google.com/file/d/0Bz99K1Qj2HQ_N04zM1V2T2xvNWs/edit]
> screen of test02 segmentation [https://docs.google.com/file/d/0Bz99K1Qj2HQ_QXNKcGNzU3NxMDg/edit]
>
> test03
> ======
> test03.tif [https://docs.google.com/file/d/0Bz99K1Qj2HQ_cjZCbVE3ZVNWOUU/edit]
> edit: replace the 2nd last character of test02.tif
> command: tesseract test03.tif test03 -l chi makebox
> result: 1st, 2nd and 5th characters are broken apart
> test03.box [https://docs.google.com/file/d/0Bz99K1Qj2HQ_TVRzMXpmTDlwR00/edit]
> screen of test03 segmentation [https://docs.google.com/file/d/0Bz99K1Qj2HQ_UDlyOGIxU091SnM/edit]
>
> It seems that the combination in test02 favours tesseract's default
> setting. I would like to try if there are parameters/options for me to play
> around to control the segmentation process.
>
> Thanks.
>
> Regards,
> W. K. Lo

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

