On Tue, Apr 30, 2013 at 2:43 PM, Ardian Nur Fazri <[email protected]> wrote:

> I have a unicharset with complex values. It's not like what is described at
> https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3,
> there is more to it.

*No matter what the documentation says, the source code is the ultimate truth, the best and most definitive and up-to-date documentation you're likely to find.*[1] ;-)

Suggested reading for today is unicharset.h[2] and unicharset.cpp[3].

[1] http://www.codinghorror.com/blog/2012/04/learn-to-read-the-source-luke.html
[2] https://code.google.com/p/tesseract-ocr/source/browse/trunk/ccutil/unicharset.h
[3] https://code.google.com/p/tesseract-ocr/source/browse/trunk/ccutil/unicharset.cpp

> This is a sample of my file:
>
>> a 3 0,255,0,255,0,32767,0,32767,0,32767 NULL -1 0 0 # a [61 ]a
>> n 3 0,255,0,255,0,32767,0,32767,0,32767 NULL -1 0 0 # n [6e ]a
>> c 3 0,255,0,255,0,32767,0,32767,0,32767 NULL -1 0 0 # c [63 ]a
>> r 3 0,255,0,255,0,32767,0,32767,0,32767 NULL -1 0 0 # r [72 ]a
>> k 3 0,255,0,255,0,32767,0,32767,0,32767 NULL -1 0 0 # k [6b ]a
>> f 3 0,255,0,255,0,32767,0,32767,0,32767 NULL -1 0 0 # f [66 ]a
>> d 3 0,255,0,255,0,32767,0,32767,0,32767 NULL -1 0 0 # d [64 ]a
>> s 3 0,255,0,255,0,32767,0,32767,0,32767 NULL -1 0 0 # s [73 ]a
>> w 3 0,255,0,255,0,32767,0,32767,0,32767 NULL -1 0 0 # w [77 ]a

In the past I did a "quick" examination[4], but I never found the time (reason, priority?) for a deeper evaluation of this file. So if anybody has the time or resources, please feel free to have a look at it and share your experience with the community.

Here are some hints:
- there can be several versions of the unicharset file[5]
- some parts of the data (e.g. ranges, script) are not filled in by the current training tools (i.e. they keep their default/initial values)
- extracting the unicharset from the data files provided by Google can help with the analysis
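
As a starting point for that kind of analysis, here is a minimal Python sketch that splits one line of the sample quoted above into named fields. Please treat it as a guess, not as the file format specification: the field names, the reading of the second column as a hexadecimal bit mask, and the bit meanings are my assumptions from reading the 3.02-era unicharset.cpp[3], and other versions of the file[5] may lay the line out differently. The names parse_unicharset_line and UnicharEntry are made up for this example.

```python
from typing import Dict, List, NamedTuple

# ASSUMPTION: bit meanings of the properties column (parsed as hex).
# Verify against unicharset.cpp for the Tesseract version you use.
PROPERTY_BITS = {
    0x01: "alpha",
    0x02: "lower",
    0x04: "upper",
    0x08: "digit",
    0x10: "punctuation",
}

# ASSUMPTION: order of the ten comma-separated numbers.
RANGE_NAMES = (
    "min_bottom", "max_bottom", "min_top", "max_top",
    "min_width", "max_width", "min_bearing", "max_bearing",
    "min_advance", "max_advance",
)


class UnicharEntry(NamedTuple):
    """Hypothetical container for one parsed unicharset line."""
    unichar: str
    properties: List[str]
    ranges: Dict[str, int]
    script: str
    other_case: int
    direction: int
    mirror: int


def parse_unicharset_line(line: str) -> UnicharEntry:
    """Parse a line shaped like the sample in this thread.

    Only the first seven whitespace-separated tokens are used, so the
    trailing "# a [61 ]a" comment is ignored.
    """
    tokens = line.split()
    if len(tokens) < 7:
        raise ValueError("expected at least 7 fields, got %d" % len(tokens))
    unichar, props, ranges_csv, script, other_case, direction, mirror = tokens[:7]

    bits = int(props, 16)
    values = [int(v) for v in ranges_csv.split(",")]
    if len(values) != len(RANGE_NAMES):
        raise ValueError("expected %d range values" % len(RANGE_NAMES))

    return UnicharEntry(
        unichar=unichar,
        properties=[name for bit, name in PROPERTY_BITS.items() if bits & bit],
        ranges=dict(zip(RANGE_NAMES, values)),
        script=script,
        other_case=int(other_case),
        direction=int(direction),
        mirror=int(mirror),
    )


sample = "a 3 0,255,0,255,0,32767,0,32767,0,32767 NULL -1 0 0 # a [61 ]a"
entry = parse_unicharset_line(sample)
print(entry.properties)  # ['alpha', 'lower'] under the bit assumption above
```

Under these assumptions, 0,255 and 0,32767 would simply be the widest possible ranges and NULL the unset script, which fits the hint above that the training tools leave these fields at their defaults.
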

[4] http://www.sk-spell.sk.cx/first-notes-for-tesseract-ocr-302-traning
[5] https://code.google.com/p/tesseract-ocr/source/browse/trunk/ccutil/unicharset.cpp?r=838#682

> Does anybody know what the values with the green background are for?

--
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
---
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.

