Hi,

I want to use Tesseract for Chinese text. First, I installed the Simplified Chinese language data:

$ sudo apt install tesseract-ocr-chi-sim

After that, chi_sim.traineddata is in /usr/share/tesseract-ocr/4.00/tessdata, and I can confirm it is available like this (I also downloaded chi_tra and jpn):

$ tesseract --list-langs
List of available languages (5):
chi_sim
chi_tra
eng
jpn
osd

Tesseract works with this setup, but I want more accurate OCR, so I want to use the chi_sim.traineddata downloaded from here instead:

https://github.com/tesseract-ocr/tessdata/blob/master/chi_sim.traineddata

I removed the packaged language data:

$ sudo apt remove tesseract-ocr-chi-sim

Then I put the new chi_sim.traineddata in /usr/share/tesseract-ocr/4.00/tessdata and ran Tesseract again, but it fails like this:

$ tesseract 0.jpeg output -l chi_sim
Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/chi_sim.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'chi_sim'
Tesseract couldn't load any languages!
Could not initialize tesseract.

Next, I passed the data directory explicitly, with the same result:

$ tesseract 0.jpeg output -l chi_sim --tessdata-dir /usr/share/tesseract-ocr/4.00/tessdata
Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/chi_sim.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'chi_sim'
Tesseract couldn't load any languages!
Could not initialize tesseract.

Then I set TESSDATA_PREFIX to /usr/share/tesseract-ocr/4.00/tessdata and tried again, but it still fails:

$ export TESSDATA_PREFIX=/usr/share/tesseract-ocr/4.00/tessdata/
$ tesseract 0.jpeg output -l chi_sim
Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/chi_sim.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'chi_sim'
Tesseract couldn't load any languages!
Could not initialize tesseract.

If I list the languages, chi_sim still appears:

$ tesseract --list-langs
List of available languages (5):
chi_sim
chi_tra
eng
jpn
osd

Why can't I use the traineddata downloaded from https://github.com/tesseract-ocr/tessdata/blob/master/chi_sim.traineddata? Did I make a mistake?
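
A minimal sketch of how the replacement file could be sanity-checked before running Tesseract again. It rests on two assumptions that the output above does not confirm: that the failure comes either from the file not being readable by the current user, or from the download having saved a GitHub web page instead of the binary model (the URL used is a "blob" page link). The function name check_traineddata is hypothetical and is not part of Tesseract.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract): classify a traineddata file.
# Prints one of: missing, unreadable, html, ok
check_traineddata() {
    f="$1"
    if [ ! -e "$f" ]; then
        # No file at that path at all
        echo "missing"
    elif [ ! -r "$f" ]; then
        # File exists but the current user cannot read it
        echo "unreadable"
    elif head -c 1024 "$f" | LC_ALL=C grep -qiE '<!doctype|<html'; then
        # The first bytes look like a web page, not a binary model
        echo "html"
    else
        # Readable and not obviously HTML; this does not prove the model is valid
        echo "ok"
    fi
}

# Default path taken from the error message; pass another path as $1 to override.
check_traineddata "${1:-/usr/share/tesseract-ocr/4.00/tessdata/chi_sim.traineddata}"
```

A result of "unreadable" would point at permissions (ls -l on the file shows them), and "html" would suggest downloading the file again from the raw file link rather than the page link. A result of "ok" only rules out these two cases.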

