I know this thread is old, but I wanted to chime in with a more specific
answer so that anyone else who comes across it won't be stuck.

TLDR:
Training requires tesseract and its training tools (such as
unicharset_extractor) to be built from the same version. If you update one
without updating the other, you'll get this error. How you update each
depends on how you installed it. If you installed tesseract through a
package manager, the training utilities (including unicharset_extractor)
for the same version are probably also available from your package
manager. If you installed tesseract from source, you need to run
`make training && sudo make training-install` in the tesseract source
directory.
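
If you want to confirm the mismatch before starting a long tesstrain run, a
check along these lines may help. It's only a sketch under a few
assumptions: a POSIX shell, `tesseract -v` printing `tesseract <version>` on
its first line, and a mismatched `unicharset_extractor -v` exiting non-zero
with the "shared library version mismatch" error instead of printing a
version. `training_tools_match` is a hypothetical helper name, not something
that ships with tesseract.

```shell
#!/bin/sh
# Sketch: check that a training tool (unicharset_extractor by default) runs
# and reports the same version as tesseract. training_tools_match is a
# hypothetical helper, not part of tesseract.
training_tools_match() {
  tess=${1:-tesseract}
  tool=${2:-unicharset_extractor}

  # Last field of the first line, e.g. "tesseract 5.0.0-alpha-797-gec01"
  # -> "5.0.0-alpha-797-gec01".
  tess_ver=$("$tess" -v 2>&1 | head -n 1 | awk '{print $NF}')

  # Assumption: a mismatched tool exits non-zero with "shared library
  # version mismatch" instead of printing its version, so check that first.
  if ! tool_out=$("$tool" -v 2>&1); then
    printf '%s -v failed:\n%s\n' "$tool" "$tool_out" >&2
    return 1
  fi

  # Assumption: the tool's version is the last field of its first line.
  tool_ver=$(printf '%s\n' "$tool_out" | head -n 1 | awk '{print $NF}')
  if [ "$tool_ver" != "$tess_ver" ]; then
    printf 'version mismatch: tesseract %s, %s %s\n' \
      "$tess_ver" "$tool" "$tool_ver" >&2
    return 1
  fi
  printf 'OK: %s matches tesseract %s\n' "$tool" "$tess_ver"
}
```

Source it and run `training_tools_match`; if it fails, rebuild the training
tools for your install method. Passing another command as the second
argument checks a different training tool, assuming it handles `-v` the
same way.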

Regardless of how you installed tesseract, you can probably:
1. find your version with `tesseract -v`
2. clone the tesseract repo from https://github.com/tesseract-ocr/tesseract
3. check out the version that matches your tesseract version (for example, `git checkout 5.0.0-alpha`)
4. run `./autogen.sh && ./configure && make training && sudo make training-install`
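
Here's a rough shell sketch of those four steps. The helper names
(`version_from_output`, `git_ref_for_version`, `rebuild_training_tools`)
are hypothetical. One wrinkle it tries to handle: a from-source build
reports what looks like a `git describe`-style version such as
`5.0.0-alpha-797-gec01` (tag, commits since the tag, then `g` plus an
abbreviated commit hash), and checking out only the `5.0.0-alpha` tag would
build a different version, so the sketch checks out the hash instead. That
hash can be very short, so git may reject it as ambiguous; if so, you'll
need a longer hash. It also runs `./autogen.sh` first, since a fresh git
clone doesn't include a generated `configure` script.

```shell
#!/bin/sh
# Sketch of the four steps. version_from_output, git_ref_for_version and
# rebuild_training_tools are hypothetical helper names.

# Step 1: pull the version out of `tesseract -v` output, e.g.
# "tesseract 5.0.0-alpha-797-gec01" -> "5.0.0-alpha-797-gec01".
version_from_output() {
  printf '%s\n' "$1" | head -n 1 | awk '{print $NF}'
}

# Step 3: pick something `git checkout` accepts. A release tag ("4.1.1")
# is used as-is; a describe-style string ("<tag>-<n>-g<hash>") is reduced
# to its abbreviated commit hash.
git_ref_for_version() {
  case "$1" in
    *-g[0-9a-f]*) printf '%s\n' "${1##*-g}" ;;
    *) printf '%s\n' "$1" ;;
  esac
}

# Steps 2-4: clone, check out the matching version, then build and install
# only the training tools. Runs in a subshell so the `cd` doesn't leak.
rebuild_training_tools() (
  ref=$(git_ref_for_version "$(version_from_output "$(tesseract -v 2>&1)")")
  git clone https://github.com/tesseract-ocr/tesseract.git tesseract-src &&
    cd tesseract-src &&
    git checkout "$ref" &&
    ./autogen.sh &&   # a fresh git clone has no ./configure yet
    ./configure &&
    make training &&
    sudo make training-install
)
```

`rebuild_training_tools` isn't called automatically: source the file, check
what `git_ref_for_version "$(version_from_output "$(tesseract -v 2>&1)")"`
prints, and only then run it.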

How this problem arose for me:

I installed tesseract from source: I cloned the repo from
https://github.com/tesseract-ocr/tesseract and ran `./configure && make &&
sudo make install`. Then I installed the training utilities with
`make training && sudo make training-install`. Then I cloned the tesstrain
repo (https://github.com/tesseract-ocr/tesstrain), and everything worked.
Time passed, and when I came back to train on some new data, I updated
tesseract with a `git pull` and re-ran `./configure && make && sudo make
install`.

That's when I saw this error while running `make training` in the
`tesstrain` repo:
```
ERROR: shared library version mismatch (was 5.0.0-alpha-797-gec01, expected 5.0.0-alpha-647-g4a00b
Did you use a wrong shared tesseract library?
make: *** [Makefile:180: data/table-ocr/unicharset] Error 1
```

That message pointed me to line 180 of the tesstrain Makefile, which was
`unicharset_extractor --output_unicharset "$(OUTPUT_DIR)/my.unicharset"
--norm_mode $(NORM_MODE) "$(ALL_GT)"`, so the problem was related to the
`unicharset_extractor` utility. Running `unicharset_extractor -v` on its
own gave me the same version-mismatch error.

I searched the `tesseract` repo for `unicharset_extractor` to see why it is
needed, where it gets installed, and why it wasn't updated. Running
`grep -R 'unicharset_extractor' .` in the tesseract repo shows that it
comes from `src/training/Makefile`, i.e. it is built by the `training`
target rather than by a plain `make`.

That's when it dawned on me that I forgot to run `make training && sudo 
make training-install` after I updated tesseract.
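
So for a from-source install, the fix is to rebuild the training tools
every time you rebuild tesseract. A minimal sketch of that update routine,
meant to be run from the root of the tesseract checkout, is below. `run`
and `update_tesseract_from_source` are hypothetical helpers, and setting
`DRY_RUN=1` prints the commands instead of running them. The `./autogen.sh`
and `sudo ldconfig` steps follow the project's compile instructions for
Linux; `ldconfig` can be dropped on systems that don't use it.

```shell
#!/bin/sh
# Sketch of a from-source update that rebuilds the training tools together
# with tesseract. run and update_tesseract_from_source are hypothetical
# helpers; set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

update_tesseract_from_source() {
  run git pull &&
    run ./autogen.sh &&            # regenerate configure after the pull
    run ./configure &&
    run make &&
    run sudo make install &&
    run sudo ldconfig &&           # refresh the shared-library cache (Linux)
    run make training &&           # the step that's easy to forget
    run sudo make training-install
}
```

Try `DRY_RUN=1 update_tesseract_from_source` first to see exactly what it
will run.
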
On Thursday, May 9, 2019 at 1:42:19 AM UTC-7 anne wrote:

> Umm, can you please elaborate on what you mean by "similar way as you 
> installed it"? I actually installed tesseract twice, first by following the 
> instructions here: 
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 and 
> then when I encountered this error, I thought, hey maybe I should just 
> install it again. Although this time I found this command 
>
> sudo apt install tesseract-ocr 
>
> found in https://github.com/tesseract-ocr/tesseract/wiki/Compiling which 
> I used.

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/548fdb70-db4f-494f-a2f1-725b180e6a27n%40googlegroups.com.
