You are not using a 3.04 release ;-) There is a development version of Tesseract marked that way, but it is not finished yet (AFAIK Ray is still committing changes).
Zdenko

On Wed, Sep 24, 2014 at 10:56 AM, Paul <[email protected]> wrote:
> I use 3.04. You may try to upgrade.
>
> On Tuesday, 23 September 2014 02:12:17 UTC+2, 葉家忠 wrote:
>>
>> 3.02... I've found the code snippet you mentioned, but I can't get it to execute.
>>
>> On 23 Sep 2014 at 5:29 AM, "Paul" <[email protected]> wrote:
>>
>>> Those sections are definitely run. Which version of Tesseract are you
>>> using?
>>>
>>> On Monday, 1 September 2014 09:00:29 UTC+2, 葉家忠 wrote:
>>>>
>>>> Thank you very much for your kind help~
>>>>
>>>> I tried what you described above, but nothing changed.
>>>> When I traced the code in debug mode, I found that the code mentioned
>>>> above is never run.
>>>> Is there a parameter I should set to true?
>>>>
>>>> Please tell me more.
>>>> Thanks again~
>>>>
>>>> On Saturday, 30 August 2014 6:16:09 PM UTC+8, Paul wrote:
>>>>>
>>>>> I suffered from similar issues and fixed the problem by adding a line
>>>>> to textord/colfind.cpp:
>>>>>
>>>>> Between
>>>>>
>>>>>     #endif // GRAPHICS_DISABLED
>>>>>
>>>>> and
>>>>>
>>>>>     SetBlockRuleEdges(input_block);
>>>>>
>>>>> I added:
>>>>>
>>>>>     input_block->noise_blobs.clear(); // remove noise blobs
>>>>>
>>>>> This removes noise blobs during the segmentation of blocks and
>>>>> prevents them from being added to the text block around them. I think
>>>>> it is a dirty hack, but it will probably give you better results.
>>>>> Maybe we will have to tackle this problem with a more in-depth
>>>>> solution in the future.
>>>>>
>>>>> Changing the constant
>>>>>
>>>>>     const double kMinMediumSizeRatio = 0.25;
>>>>>
>>>>> to
>>>>>
>>>>>     const double kMinMediumSizeRatio = 0.15;
>>>>>
>>>>> in blobbox.cpp also helped to improve the results. You can try to
>>>>> adjust that constant to your needs.
>>>>>
>>>>> Paul
>>>>>
>>>>> On Thursday, 28 August 2014 10:04:33 UTC+2, 葉家忠 wrote:
>>>>>>
>>>>>> I use Tesseract to recognize simplified Chinese characters.
>>>>>>
>>>>>> Since some noise in the source image can't be removed, I decided
>>>>>> to modify the source code to remove the incorrect results.
>>>>>>
>>>>>> Since every Chinese character has a fixed size, the noise can be
>>>>>> found easily because it is much smaller than a normal character.
>>>>>>
>>>>>> I've tried setting the parameter "textord_heavy_nr" to true to
>>>>>> remove the noise, but it doesn't work because in some cases it
>>>>>> removes important parts of a Chinese character that are necessary
>>>>>> to form the complete character.
>>>>>>
>>>>>> Can anyone tell me how to change the code so that results finally
>>>>>> decided by Tesseract that are smaller than a specific blob size
>>>>>> are removed?
>>>>>>
>>>>>> Thank you very much for your help~
>>>>>>
>>>>>> PS: The attached file shows 3 characters, but it is recognized as
>>>>>> 4 characters because of the noise.
>>>>>>
>>> --
>>> You received this message because you are subscribed to a topic in the
>>> Google Groups "tesseract-ocr" group.
>>> To unsubscribe from this topic, visit
>>> https://groups.google.com/d/topic/tesseract-ocr/M80Et5GOZXA/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to
>>> [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/20489d50-3464-427d-b599-896f519d5599%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.
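For the original question (dropping results that are much smaller than a normal character), a minimal sketch of a size-based post-filter might look like the following. It does not touch Tesseract's internals and is not the patch Paul describes. `SymbolBox`, `MedianSize` and `DropSmallBoxes` are hypothetical names made up for illustration, and the 0.5 ratio used in the note below is only an assumption to tune. It relies on the statement in the thread that the characters are roughly fixed-size.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the bounding box of one recognized symbol.
// This is NOT a Tesseract type; fill it from whatever boxes you have.
struct SymbolBox {
  int left, top, right, bottom;
  int width() const { return right - left; }
  int height() const { return bottom - top; }
};

// Returns the median of the larger side of each box, or 0 for empty input.
// The larger side is used so that flat characters (e.g. a single horizontal
// stroke) are not mistaken for noise.
int MedianSize(const std::vector<SymbolBox>& boxes) {
  if (boxes.empty()) return 0;
  std::vector<int> sizes;
  sizes.reserve(boxes.size());
  for (const SymbolBox& b : boxes) {
    sizes.push_back(std::max(b.width(), b.height()));
  }
  const std::size_t mid = sizes.size() / 2;
  std::nth_element(sizes.begin(), sizes.begin() + mid, sizes.end());
  return sizes[mid];
}

// Keeps only the boxes whose larger side is at least min_ratio * median.
std::vector<SymbolBox> DropSmallBoxes(const std::vector<SymbolBox>& boxes,
                                      double min_ratio) {
  const int median = MedianSize(boxes);
  std::vector<SymbolBox> kept;
  for (const SymbolBox& b : boxes) {
    if (std::max(b.width(), b.height()) >= min_ratio * median) {
      kept.push_back(b);
    }
  }
  return kept;
}
```

With three 40x40 character boxes and one 5x5 speck, `DropSmallBoxes(boxes, 0.5)` keeps the three characters. The boxes could, for example, be collected per symbol through the iterator's `BoundingBox` call in the Tesseract API. Note that this filter would also drop small punctuation, and the median is only meaningful while noise boxes are the minority on the page.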

