Hi again,
Using the instructions at
<https://www.endpoint.com/blog/2018/07/09/training-tesseract-models-from-scratch>,
I'm getting a bit further, but when my script gets to this part:
combine_lang_model \
--input_unicharset "${UNICHARSET_FILE}" \
--script_dir "${TESSDATA_PREFIX}" \
--output_dir "${OUTPUT_DIR}" \
--pass_through_recoder \
--lang "${LANG_CODE}"
it fails with this error:
Config file is optional, continuing...
Failed to read data from: /home/adam/sandboxes/TEST/tessdata/mem/mem.config
Failed to read data from: /home/adam/sandboxes/TEST/tessdata/radical-stroke.txt
Error reading radical code table /home/adam/sandboxes/TEST/tessdata/radical-stroke.txt
I can't figure out, from these instructions or from the Tesseract
documentation on GitHub, where the mem.config and radical-stroke.txt
files are supposed to come from. Any help would be greatly appreciated!
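
For anyone reproducing this, here is a minimal sketch of a pre-flight check
that could be run before combine_lang_model. It only looks for the two paths
named in the error output above. The function name check_script_dir is
hypothetical, and the layout <script_dir>/<lang>/<lang>.config is an
assumption read off the error message, not taken from the Tesseract
documentation. The config file is reported as optional because the log itself
says so; only a missing radical-stroke.txt gives a non-zero exit status.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract): report whether the files that
# combine_lang_model tried to read, per the error output, exist in the
# directory passed as --script_dir.
check_script_dir() {
    csd_dir=$1     # same value as --script_dir
    csd_lang=$2    # same value as --lang
    csd_status=0

    # The log says "Config file is optional", so a missing config only
    # produces a note and does not change the exit status.
    csd_config="${csd_dir}/${csd_lang}/${csd_lang}.config"
    if [ -r "${csd_config}" ]; then
        echo "found: ${csd_config}"
    else
        echo "missing (optional): ${csd_config}"
    fi

    # The run aborted with "Error reading radical code table", so this file
    # is treated as required here.
    csd_radical="${csd_dir}/radical-stroke.txt"
    if [ -r "${csd_radical}" ]; then
        echo "found: ${csd_radical}"
    else
        echo "missing (required): ${csd_radical}"
        csd_status=1
    fi

    return "${csd_status}"
}
```

With the variables from the script above, it would be called as
check_script_dir "${TESSDATA_PREFIX}" "${LANG_CODE}" just before the
combine_lang_model step.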
Also, the previous tesseract command is creating the *.lstmf files in
the same directory as the *.box and *.tif files. Are they supposed to
be in the TESSDATA_PREFIX directory instead?
Thanks,
Adam