On 29/11/2015 12:18, Marco Atzeri wrote:
On 27/11/2015 16:28, Sriranga(83yrsold) wrote:
In continuation of my previous post, I would like to report that I also succeeded
in generating the kan.traineddata file with tesseract-3.05.0Dev using
tesstrain.sh.
I am thankful to everyone who helped me solve the problem.
Good luck.
On Fri, Nov 27, 2015 at 6:45 PM, Sriranga(83yrsold)
<withblessing.sriranga.1...@gmail.com> wrote:
Hi,
After several attempts over more than two days, I have now
successfully generated the kan.traineddata file on Ubuntu 15.10 using
tesstrain.sh from tesseract-3.04.
I have attached a terminal extract for the benefit of other users. Since
kan.traineddata exceeds 25 MB, I could not attach it here. Please
note that the fonts listed in language-specific.sh did not work for kan,
which resulted in failures. I don't know why they do not work.
With best of luck,
sriranga(83)
Nice to hear you solved it.
I am testing the Cygwin version using the data you provided me,
and clearly there is something wrong in how the font directive is passed
from the script to the utilities.
Moreover, I see some segfaults in text2image, which should never
happen in any case.
As soon as I find out more, I will post an update here.
Regards
Marco
Using the latest git version of the scripts, with a typo correction,
I was able to process Sriranga's data with the 3.04 Cygwin version.
All the logs and data are here:
http://matzeri.altervista.org/works/tesseract/
Directory contents:
input = Sriranga's data
log = script and run logs
scripts = git version and patch for the typo
tessdata = output file
Additional notes:
- for this case the suggested Cygwin font is "Lohit Kannada"
- There was a mismatch in where temporary data was passed to text2image:
one step put it in "/tmp" and the next step expected it
in "/tmp/leptonica".
Workaround: link /tmp/leptonica -> /tmp
- The final step expected "font_properties" in the kan
directory.
Workaround: link
font_properties -> /usr/share/tessdata/font_properties
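The two linking workarounds above could be scripted roughly as follows. This is only a minimal sketch in Python, not the commands actually used: the helper names `ensure_symlink` and `apply_workarounds` are made up for illustration, and the location of the kan directory is an assumption that depends on where the training run keeps its files.

```python
import os
from pathlib import Path


def ensure_symlink(link: Path, target: Path) -> bool:
    """Create `link` pointing at `target`; return True if a new link was made.

    Anything already present at `link` (file, directory, or link) is left untouched.
    """
    if link.exists() or link.is_symlink():
        return False
    link.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(target, link)
    return True


def apply_workarounds(tmp_dir: Path, kan_dir: Path, tessdata_dir: Path) -> None:
    # Workaround 1: make <tmp_dir>/leptonica resolve to <tmp_dir> itself,
    # so files written to one location are found under the other.
    ensure_symlink(tmp_dir / "leptonica", tmp_dir)
    # Workaround 2: expose the shared font_properties file inside the kan
    # directory (kan_dir is a hypothetical path; adjust to the real one).
    ensure_symlink(kan_dir / "font_properties", tessdata_dir / "font_properties")
```

With the paths mentioned above, a call might look like `apply_workarounds(Path("/tmp"), Path("kan"), Path("/usr/share/tessdata"))`, where `"kan"` stands in for the actual kan directory. The helper skips any link that already exists, so running it twice does no harm.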
Regards
Marco