[tesseract-ocr] Re: Ground Truth from Box Files

2020-04-21 Thread Peyi Oyelo
Hello Shree, On Friday, January 6, 2017 at 12:09:15 PM UTC+1, shree wrote: > Does anyone know of any utilities to convert a box file to ground truth text file? > I am using tesstrain.sh, which uses text2image, for trying out LSTM training. However, because unrenderable words are not
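A minimal sketch of such a conversion, assuming LSTM-style box files where each line is `<symbol> <left> <bottom> <right> <top> <page>`, a symbol of a single space stands for a space, and a line whose symbol is a tab marks the end of a text line (the convention used by text2image output for LSTM training). The function name `box_to_text` is hypothetical and not part of Tesseract; check the format of your own box files before relying on it.

```python
def box_to_text(box_lines):
    """Rebuild ground-truth text lines from LSTM-style box file lines.

    Assumption: each line is '<symbol> <left> <bottom> <right> <top> <page>';
    a symbol of ' ' is a space and a symbol of '\t' ends the current text line.
    """
    text_lines = []
    current = []
    for raw in box_lines:
        raw = raw.rstrip("\r\n")  # drop line endings only, keep leading space/tab symbols
        if not raw:
            continue
        # Split the five numeric fields off the right-hand side; whatever
        # remains is the symbol, which may itself be a space or a tab.
        parts = raw.rsplit(" ", 5)
        if len(parts) != 6:
            raise ValueError(f"unexpected box line: {raw!r}")
        symbol = parts[0]
        if symbol == "\t":
            # End-of-line marker: flush the characters collected so far.
            text_lines.append("".join(current))
            current = []
        else:
            current.append(symbol)
    if current:
        # File ended without a trailing end-of-line marker.
        text_lines.append("".join(current))
    return text_lines
```

For example, `box_to_text(["H 10 10 20 30 0", "i 21 10 25 30 0", "  26 10 30 30 0", "A 31 10 40 30 0", "\t 41 10 42 30 0"])` gives `['Hi A']`. To convert a whole file, open it with `encoding="utf-8"` and pass the file object, then write one returned line per ground-truth line.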

[tesseract-ocr] Re: Ground Truth from Box Files

2020-04-21 Thread Peyi Oyelo
Hello Shree, and sorry for reviving an old, dead thread. I am currently trying to train Tesseract to recognize the Akan language. I have been able to create a traineddata file that can recognize Akan; however, it does not use Tesseract's LSTM network. I am now trying to perform LSTM training

[tesseract-ocr] Re: fine tuning a few characters generating training images error

2020-04-19 Thread Peyi Oyelo
Thanks for the insight. I am experiencing the same issue; my TIFF file was also 66MB. On Thursday, June 13, 2019 at 2:50:21 PM UTC-7, Jingjing Lin wrote: > turns out it is indeed because the chi_sim.training_text I was using was too large. I downloaded it from langdata_lstm repository

Re: [tesseract-ocr] Re: Ground Truth from Box Files

2020-04-22 Thread Peyi Oyelo
n.traineddata? > Do you need to train it only for one font? > On Tue, Apr 21, 2020 at 11:06 PM Peyi Oyelo wrote: >> Thank you for replying, Shree. I have zipped the entire document into Akan.zip. >> I have attach

Re: [tesseract-ocr] Re: Ground Truth from Box Files

2020-04-24 Thread Peyi Oyelo
@shree Hello sir/ma'am? On Wednesday, April 22, 2020 at 7:23:28 AM UTC-7, Peyi Oyelo wrote: > I created the akan.traineddata using the typical Tesseract 3 legacy workflow. I do not have word/freq/punc lists. As of now I would like to train using LSTM to support as many font