Thanks, the problem is fixed now. The cause was that the font name in font_properties and the [fontname] part of the [lang].[fontname].exp[num] file name given on the command line must be the same.
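In case it helps someone else, here is a rough Python sketch of that rule. It is not part of Tesseract: the function names are made up for illustration, the regular expression is only my reading of the [lang].[fontname].exp[num] pattern from the wiki, and I assume the font name is the first whitespace-separated field on each font_properties line.

```python
import re

# Assumed shape of a training file name: [lang].[fontname].exp[num].<ext>
NAME_RE = re.compile(r"^(?P<lang>[^.]+)\.(?P<font>[^.]+)\.exp(?P<num>\d+)\.\w+$")


def font_from_filename(filename):
    """Return the [fontname] part, or None if the name does not fit the pattern."""
    match = NAME_RE.match(filename)
    return match.group("font") if match else None


def fonts_in_properties(text):
    """Collect the first field of every non-empty font_properties line."""
    return {line.split()[0] for line in text.splitlines() if line.strip()}


def check(filename, properties_text):
    """Hypothetical helper: report whether file name and font_properties agree."""
    font = font_from_filename(filename)
    if font is None:
        return "bad file name"
    if font not in fonts_in_properties(properties_text):
        return "font not in font_properties"
    return "ok"


if __name__ == "__main__":
    props = "TheFont 0 0 0 0 0\n"
    for name in ("oybab.A.0.tr", "oybab.A.exp0.tr", "oybab.TheFont.exp0.tr"):
        print(name, "->", check(name, props))
```

With my original files, the first name fails because the "exp" part is missing, and the second fails because "A" is not listed in font_properties. Only the third form passes this check.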
But there is one thing I don't understand. Is the fontname the name of a real font, or just a label? If it is a real font name, does the program actually use it? And if my font name has a space in the middle, for example <My Font>, what should I do? Thank you very much.

On Monday, January 14, 2013 at 2:55:49 AM UTC+8, zdenop wrote:
>
> On Sun, Jan 13, 2013 at 6:06 PM, zdenko podobny <[email protected]> wrote:
>
>> If you want help, then make sure you read the documentation [1], follow it
>> closely and search the forum/issues. Making multiple posts (forum + issues)
>> will not help you.
>>
>> Just from reading your post it is clear that you do not follow the wiki, at
>> least in these cases:
>>
>> - Name of input files. If the documentation states it should be
>>   "[lang].[fontname].exp[num].tif", why do you use
>>   "[lang].[fontname].[num].tif"?
>> - font_properties: it is not according to the documentation.
>>
>> If you want to run training for a non-Latin-based language, make sure you
>> are able to run it for English first. There are some reported problems with
>> LTR training,
>
> Oops, it should be RTL training...
>
>> so it will help you to eliminate problems with not following the
>> documentation and possible problems with a non-Latin-based language.
>>
>> [1] https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
>>
>> Zdenko
>>
>> On Sun, Jan 13, 2013 at 3:57 AM, gold snake <[email protected]> wrote:
>>
>>> help~~~~
>>>
>>> On Saturday, January 12, 2013 at 4:15:09 PM UTC+8, gold snake wrote:
>>>
>>>> The error output is:
>>>> D:\Little\Tesseract-OCR\build>shapeclustering -F font_properties -U unicharset -O oybab.unicharset oybab.A.0.tr
>>>> Reading oybab.A.0.tr ...
>>>> Font id = -1/0, class id = 1/2 on sample 0
>>>> font_id >= 0 && font_id < font_id_map_.SparseSize():Error:Assert failed:in file ..\..\classify\trainingsampleset.cpp, line 622
>>>>
>>>> This is the content of my font_properties file:
>>>> TheFont 0 0 0 0 0
>>>>
>>>> This is the command-line output when I generate the .tr files:
>>>> D:\Little\Tesseract-OCR\build>tesseract oybab.A.0.tif oybab.A.0 nobatch box.train
>>>> Tesseract Open Source OCR Engine v3.02 with Leptonica
>>>> TIFFReadDirectory: Warning, TIFFstream: wrong data type 7 for "RichTIFFIPTC"; tag ignored.
>>>> TIFFReadDirectory: Warning, TIFFstream: unknown field with tag 37724 (0x935c) encountered.
>>>> TIFFReadDirectory: Warning, TIFFstream: wrong data type 7 for "RichTIFFIPTC"; tag ignored.
>>>> TIFFReadDirectory: Warning, TIFFstream: unknown field with tag 37724 (0x935c) encountered.
>>>> TIFFReadDirectory: Warning, TIFFstream: wrong data type 7 for "RichTIFFIPTC"; tag ignored.
>>>> TIFFReadDirectory: Warning, TIFFstream: unknown field with tag 37724 (0x935c) encountered.
>>>> row xheight=120.333, but median xheight = 83.5
>>>> row xheight=46.6667, but median xheight = 83.5
>>>> APPLY_BOXES: boxfile line 3/卅 ((312,53),(385,204)): FAILURE! Couldn't find a matching blob
>>>> APPLY_BOXES:
>>>> Boxes read from boxfile: 4
>>>> Boxes failed resegmentation: 1
>>>> APPLY_BOXES: Unlabelled word at :Bounding box=(312,53)->(369,122)
>>>> Found 3 good blobs.
>>>> 1 remaining unlabelled words deleted.
>>>>
>>>> This is the content of my box file:
>>>> ئ 18 48 142 227 0
>>>> ئ 173 43 218 223 0
>>>> ئ 254 39 274 228 0
>>>> ئ 312 53 385 204 0
>>>>
>>>> PS: My language is similar to Arabic and is written right to left.
>>>> So what is the problem? Please help. Thanks so much.

