Hi,
I want to use Tesseract to recognize Chinese text. First I ran
sudo apt install tesseract-ocr-chi-sim
After that, chi_sim.traineddata was present in
/usr/share/tesseract-ocr/4.00/tessdata, and it showed up in the language
list (I also installed chi_tra and jpn):

$ tesseract --list-langs
List of available languages (5):
chi_sim
chi_tra
eng
jpn
osd


Tesseract works with this setup, but I want more accurate OCR, so I want to
use the chi_sim.traineddata downloaded from here:
https://github.com/tesseract-ocr/tessdata/blob/master/chi_sim.traineddata
I removed the packaged file with
sudo apt remove tesseract-ocr-chi-sim
then put the new chi_sim.traineddata in
/usr/share/tesseract-ocr/4.00/tessdata and ran tesseract again.
It now fails like this:

$ tesseract 0.jpeg output -l chi_sim

Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/chi_sim.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'chi_sim'
Tesseract couldn't load any languages!
Could not initialize tesseract.


Then I passed the tessdata directory explicitly, but it fails the same way:


$ tesseract 0.jpeg output -l chi_sim --tessdata-dir /usr/share/tesseract-ocr/4.00/tessdata

Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/chi_sim.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'chi_sim'
Tesseract couldn't load any languages!
Could not initialize tesseract.


Then I set TESSDATA_PREFIX to /usr/share/tesseract-ocr/4.00/tessdata and
tried again, but it still fails:


$ export TESSDATA_PREFIX=/usr/share/tesseract-ocr/4.00/tessdata/

$ tesseract 0.jpeg output -l chi_sim

Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/chi_sim.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'chi_sim'
Tesseract couldn't load any languages!
Could not initialize tesseract.


When I list the languages again, chi_sim still appears:

$ tesseract --list-langs
List of available languages (5):
chi_sim
chi_tra
eng
jpn
osd


Why can't I use the traineddata downloaded from
https://github.com/tesseract-ocr/tessdata/blob/master/chi_sim.traineddata?
Did I make a mistake somewhere?
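
In case it helps narrow this down, here is a minimal sketch of checks that
can be run on the file itself. check_traineddata is a hypothetical helper
of my own, not part of tesseract. It assumes (only a guess, the error
message does not say so) that loading could fail because the file is
missing, is not readable, or is a saved HTML page rather than binary model
data.

```shell
#!/bin/sh
# check_traineddata FILE
# Hypothetical helper (not part of tesseract): prints one word describing
# the first problem found with FILE, or "ok" if none of the checks trips.
check_traineddata() {
    f=$1
    # The file must exist as a regular file.
    if [ ! -f "$f" ]; then
        echo "missing"
        return 1
    fi
    # The current user must be able to read it.
    if [ ! -r "$f" ]; then
        echo "unreadable"
        return 1
    fi
    # Assumption: a web page saved by mistake starts with HTML markup,
    # while real traineddata is binary. Only the first 512 bytes are read.
    if head -c 512 "$f" | LC_ALL=C grep -Eqi '<html|<!doctype'; then
        echo "html"
        return 1
    fi
    echo "ok"
}
```

For example, check_traineddata
/usr/share/tesseract-ocr/4.00/tessdata/chi_sim.traineddata prints one of
missing, unreadable, html, or ok. Here ok only means that none of these
three problems was detected, not that the file is a valid model.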
