Thanks for your advice. I downloaded the file by clicking the "Download" button at https://github.com/tesseract-ocr/tessdata/blob/master/chi_sim.traineddata. I then moved chi_sim.traineddata to /usr/share/tesseract-ocr/4.00/tessdata/ and confirmed that the file is there and that its size is 42.3 MB. Even so, tesseract still cannot load it. As I said, tesseract works with the file installed by sudo apt install tesseract-ocr-chi-sim, but the file downloaded from Data Files does not work. I cannot understand why.
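For reference, the check described above can be sketched as a small shell function. This is only a sketch: check_traineddata is a hypothetical helper, not a tesseract command, and it assumes the "Error opening data file" message could come from a missing, unreadable, or empty file. That cause has not been confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesseract): basic sanity checks on a
# traineddata file. The possible causes tested here are assumptions.
check_traineddata() {
    file="$1"
    if [ ! -e "$file" ]; then
        echo "missing: $file"
        return 1
    fi
    if [ ! -r "$file" ]; then
        # A file moved from a home directory can keep restrictive permissions.
        echo "not readable by $(id -un): $file"
        return 2
    fi
    if [ ! -s "$file" ]; then
        echo "empty: $file"
        return 3
    fi
    # Size in bytes, to compare with the size shown on the repository page.
    size=$(wc -c < "$file" | tr -d ' ')
    echo "ok: $file ($size bytes)"
    # Owner and permission bits of the file.
    ls -l "$file"
    return 0
}

# Example call, using the path from this thread:
# check_traineddata /usr/share/tesseract-ocr/4.00/tessdata/chi_sim.traineddata
```

Running it as the same user who runs tesseract matters, because the readability test depends on that user's permissions.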
On Sunday, December 8, 2019 at 23:15:31 UTC+9, zdenop wrote:
>
> How did you download the files from the repository?
> Please check whether the files in /usr/share/tesseract-ocr/4.00/tessdata/
> have the same size as in the repository.
>
> Zdenko
>
>
> On Sat, Dec 7, 2019 at 17:34, 坂本聖 <[email protected]> wrote:
>
>> Hi,
>> I want to use tesseract for Chinese text. First, I ran the command
>> sudo apt install tesseract-ocr-chi-sim
>> After that, I could find chi_sim.traineddata in
>> /usr/share/tesseract-ocr/4.00/tessdata and could confirm it like this
>> (I also downloaded chi_tra and jpn):
>>
>> $ tesseract --list-langs
>> List of available languages (5):
>> chi_sim
>> chi_tra
>> eng
>> jpn
>> osd
>>
>> This works, but I want more accurate OCR, so I want to use the
>> chi_sim.traineddata downloaded from here:
>> https://github.com/tesseract-ocr/tessdata/blob/master/chi_sim.traineddata
>> After running
>> sudo apt remove tesseract-ocr-chi-sim
>> I put the new chi_sim.traineddata in
>> /usr/share/tesseract-ocr/4.00/tessdata and tried to run tesseract,
>> but it failed like this:
>>
>> $ tesseract 0.jpeg output -l chi_sim
>> Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/chi_sim.traineddata
>> Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
>> Failed loading language 'chi_sim'
>> Tesseract couldn't load any languages!
>> Could not initialize tesseract.
>>
>> Then I passed the directory explicitly, but it failed in the same way:
>>
>> $ tesseract 0.jpeg output -l chi_sim --tessdata-dir /usr/share/tesseract-ocr/4.00/tessdata
>> Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/chi_sim.traineddata
>> Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
>> Failed loading language 'chi_sim'
>> Tesseract couldn't load any languages!
>> Could not initialize tesseract.
>>
>> Then I set TESSDATA_PREFIX to /usr/share/tesseract-ocr/4.00/tessdata
>> and tried again, but it still failed:
>>
>> $ export TESSDATA_PREFIX=/usr/share/tesseract-ocr/4.00/tessdata/
>> $ tesseract 0.jpeg output -l chi_sim
>> Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/chi_sim.traineddata
>> Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
>> Failed loading language 'chi_sim'
>> Tesseract couldn't load any languages!
>> Could not initialize tesseract.
>>
>> If I list the languages, chi_sim appears again:
>>
>> $ tesseract --list-langs
>> List of available languages (5):
>> chi_sim
>> chi_tra
>> eng
>> jpn
>> osd
>>
>> Please tell me why I cannot use the traineddata downloaded from
>> https://github.com/tesseract-ocr/tessdata/blob/master/chi_sim.traineddata
>> Did I make a mistake?

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/fd0e48ec-412c-464d-85bb-5ed65d4419c3%40googlegroups.com.

