Hey, I'm trying to build Tesseract from source now. I have built Leptonica, but building Tesseract fails with this error:
/bin/bash ../libtool --tag=CXX --mode=link g++ -g -O2 -std=c++11 -o tesseract tesseract-tesseractmain.o libtesseract.la -lrt -lpthread
libtool: link: g++ -g -O2 -std=c++11 -o .libs/tesseract tesseract-tesseractmain.o ./.libs/libtesseract.so -lrt -lpthread
/usr/bin/ld: tesseract-tesseractmain.o: undefined reference to symbol 'lept_free'
//usr/local/lib/liblept.so.5: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
Makefile:598: recipe for target 'tesseract' failed
make[2]: *** [tesseract] Error 1
make[2]: Leaving directory '/home/david/project/tesseract-3.05.01/api'
Makefile:489: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/home/david/project/tesseract-3.05.01'
Makefile:398: recipe for target 'all' failed
make: *** [all] Error 2

Any idea why?

On Monday, June 19, 2017 at 6:58:57 PM UTC+3, shree wrote:
>
> I would also suggest that you add spaces between words in your input text.
>
> ShreeDevi
> ____________________________________________________________
> Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com
>
> On Mon, Jun 19, 2017 at 9:19 PM, ShreeDevi Kumar wrote:
>
>> You could also try running training on your Windows PC with 3.05.01
>> using tesstrain.sh, with Git for Windows providing a shell
>> for running bash scripts.
>>
>> ShreeDevi
>>
>> On Mon, Jun 19, 2017 at 9:05 PM, ShreeDevi Kumar wrote:
>>
>>> Where are your source files for the English langdata?
>>>
>>> If they are in a directory such as ../langdata/eng/,
>>> then put common.unicharset, latin.unicharset, font_properties, etc.
>>> in ../langdata
>>>
>>> ShreeDevi
>>>
>>> On Mon, Jun 19, 2017 at 8:34 PM, David Barishev wrote:
>>>
>>>> Thanks for the reply.
>>>> If you mean whether I have the latin and common unicharsets in the tessdata
>>>> directory (/usr/share/tesseract-ocr/tessdata): I have downloaded them and
>>>> placed them there, and I'm still getting the same behavior.
>>>> I have also tried it on my Windows machine, which has version 3.05,
>>>> and got the same behavior.
>>>>
>>>> On Monday, June 19, 2017 at 2:58:40 PM UTC+3, shree wrote:
>>>>>
>>>>> Do you have the common and latin unicharsets in your langdata directory?
>>>>>
>>>>> See https://github.com/tesseract-ocr/langdata
>>>>>
>>>>> Try to build the latest 3.05.01 version.
>>>>>
>>>>> ShreeDevi
>>>>>
>>>>> On Mon, Jun 19, 2017 at 3:23 PM, David Barishev wrote:
>>>>>
>>>>>> Hello all!
>>>>>> I'm trying to train Tesseract to recognize a new font in English
>>>>>> (supercell-magic).
>>>>>> I have created a .tif file and a matching .box file using
>>>>>> jTessBoxEditor (eng.supercell-magic.exp0.tif and
>>>>>> eng.supercell-magic.exp0.box), and trained Tesseract with them.
>>>>>>
>>>>>> Here is Tesseract's output:
>>>>>> $ tesseract eng.supercell-magic.exp0.tif eng.supercell-magic.exp0 box.train
>>>>>> Tesseract Open Source OCR Engine v3.04.01 with Leptonica
>>>>>> Page 1
>>>>>> row xheight=30, but median xheight = 37.5455
>>>>>> APPLY_BOXES:
>>>>>> Boxes read from boxfile: 1559
>>>>>> Found 1559 good blobs.
>>>>>> Generated training data for 34 words
>>>>>> Page 2
>>>>>> APPLY_BOXES:
>>>>>> Boxes read from boxfile: 1677
>>>>>> Found 1677 good blobs.
>>>>>> Generated training data for 34 words
>>>>>> Page 3
>>>>>> APPLY_BOXES:
>>>>>> Boxes read from boxfile: 1362
>>>>>> Found 1362 good blobs.
>>>>>> Generated training data for 28 words
>>>>>>
>>>>>> So the next step is to extract the characters
>>>>>> using unicharset_extractor. Its output looked normal:
>>>>>> $ unicharset_extractor eng.supercell-magic.exp0.box
>>>>>> Extracting unicharset from eng.supercell-magic.exp0.box
>>>>>> Wrote unicharset file ./unicharset.
>>>>>>
>>>>>> But when I view the file, it's mostly 0 and 255, which does not look like
>>>>>> the example in the wiki
>>>>>> <https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract#an-example-of-the-unicharset-file>:
>>>>>>
>>>>>> An example of the unicharset file
>>>>>>
>>>>>> 110
>>>>>> NULL 0 NULL 0
>>>>>> N 5 59,68,216,255,87,236,0,27,104,227 Latin 11 0 1 N
>>>>>> Y 5 59,68,216,255,91,205,0,47,91,223 Latin 33 0 2 Y
>>>>>> 1 8 59,69,203,255,45,128,0,66,74,173 Common 3 2 3 1
>>>>>> 9 8 18,66,203,255,89,156,0,39,104,173 Common 4 2 4 9
>>>>>> a 3 58,65,186,198,85,164,0,26,97,185 Latin 56 0 5 a
>>>>>> ...
>>>>>>
>>>>>> Mine looks more like this:
>>>>>>
>>>>>> 74
>>>>>> NULL 0 NULL 0
>>>>>> Joined 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # Joined [4a 6f 69 6e 65 64 ]
>>>>>> |Broken|0|1 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # Broken
>>>>>> t 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # t [74 ]
>>>>>> h 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # h [68 ]
>>>>>> a 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # a [61 ]
>>>>>> n 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # n [6e ]
>>>>>> P 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # P [50 ]
>>>>>> o 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # o [6f ]
>>>>>> e 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # e [65 ]
>>>>>> : 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # : [3a ]
>>>>>> r 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # r [72 ]
>>>>>> l 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # l [6c ]
>>>>>> i 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # i [69 ]
>>>>>> 1 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # 1 [31 ]
>>>>>> N 0 0,255,0,255,0,0,0,0,0,0 NULL 0 0 0 # N [4e ]
>>>>>>
>>>>>> Why is that? Thanks in advance.
>>>>>>
>>>>>> I'm using Ubuntu 16.04 with this Tesseract version:
>>>>>>
>>>>>> tesseract 3.04.01
>>>>>> leptonica-1.73
>>>>>> libgif 5.1.2 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.0
>>>>>>
>>>>>> I have attached the box and TIFF files, the data file, and the
>>>>>> unicharset file.
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "tesseract-ocr" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To post to this group, send email to [email protected].
>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/cd052525-9eb7-4527-b75b-82e1a687997d%40googlegroups.com
>>>>>> For more options, visit https://groups.google.com/d/optout.
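On the build error in the first message: GNU ld prints "undefined reference to symbol ... DSO missing from command line" when an object file (here tesseract-tesseractmain.o) uses a symbol (lept_free) defined in a shared library (liblept.so.5) that the linker only reached as a dependency of another library on the link line (libtesseract.so), not because it was named on the link line itself. Recent GNU ld versions no longer resolve symbols through such indirect dependencies by default. The sketch below is a hypothetical helper, not part of Tesseract or its build system; it assumes a POSIX shell with sed and the exact GNU ld wording shown in the log, and prints the missing library as a -l flag.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract): find GNU ld's
# "DSO missing from command line" errors in a saved build log and print the
# matching -l flag, e.g. //usr/local/lib/liblept.so.5 -> -llept.
missing_dso_flags() {
  # Keep only lines with the DSO error and capture the library name that sits
  # between the last '/' (or space) and ".so"; then turn "libfoo" into "-lfoo".
  sed -n 's|^.*[ /]\(lib[A-Za-z0-9_+-]*\)\.so[.0-9]*: error adding symbols: DSO missing from command line.*$|\1|p' "$1" |
    sed 's/^lib/-l/' |
    sort -u
}
```

Usage, assuming the failing build is re-run with its output saved: `make 2>&1 | tee build.log`, then `missing_dso_flags build.log`. If it prints -llept, one possible workaround, not verified against the 3.05.01 build files, is to re-run ./configure with LIBS=-llept (LIBS is the standard autoconf variable for extra linker libraries) so the flag ends up on the link line, and then run make again.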

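ShreeDevi's advice in the thread is that common.unicharset, latin.unicharset and font_properties belong directly in the langdata root (for example ../langdata), not only in the per-language directory (../langdata/eng/). A minimal sketch for checking that layout before training, assuming a POSIX shell; check_langdata_layout is a hypothetical name, the file names are spelled as in the thread, and it only looks for those three files, not the "etc." that a full training run may also need.

```shell
#!/bin/sh
# Hypothetical helper: check that the shared training files named in the thread
# sit directly in the langdata root (e.g. ../langdata), not only in a
# per-language directory such as ../langdata/eng/.
check_langdata_layout() {
  root=$1
  missing=0
  for f in common.unicharset latin.unicharset font_properties; do
    if [ -f "$root/$f" ]; then
      echo "ok: $root/$f"
    else
      echo "missing: $root/$f"
      missing=$((missing + 1))
    fi
  done
  # Exit status is 0 only when every file was found.
  [ "$missing" -eq 0 ]
}
```

For the layout in the thread this would be run as `check_langdata_layout ../langdata`; a nonzero exit status means at least one file is still only in ../langdata/eng/ or not present at all.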

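On the unicharset question in the original post: in the wiki example, the third field of each entry holds measured glyph metrics (59,68,216,255,87,236,0,27,104,227 for N) and the fourth names a script such as Latin or Common, whereas every entry in the generated file has 0,255,0,255,0,0,0,0,0,0 and NULL. A small sketch, assuming a POSIX shell with awk and the whitespace-separated layout shown in both listings, counts how many entries still carry those all-default metrics; count_placeholder_metrics is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count unicharset entries whose glyph metrics (third
# field) are still the all-default 0,255,0,255,0,0,0,0,0,0 seen in the thread.
count_placeholder_metrics() {
  awk '
    NR == 1 { next }   # first line is the number of entries
    NF == 0 { next }   # ignore blank lines
    { total++ }
    $3 == "0,255,0,255,0,0,0,0,0,0" { placeholder++ }
    END { printf "%d of %d entries have placeholder metrics\n", placeholder, total }
  ' "$1"
}
```

Run it as `count_placeholder_metrics unicharset` in the directory where unicharset_extractor wrote ./unicharset; for a file like the one in the original post, every entry except the NULL line would be counted. This is an assumption rather than something confirmed in the thread: in the 3.04/3.05 training tools these fields appear to be filled in by a separate set_unicharset_properties step that reads script unicharsets from the langdata directory, which would explain why the replies ask about the common and latin unicharsets.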