[tesseract-ocr] Training Tesseract4.0 (LSTM) on word level bounding boxes

2017-08-10 Thread 'Shoaib' via tesseract-ocr
Hi everyone, I would like to train Tesseract on my own dataset consisting of word images. I have bounding box information, but for whole words rather than per character. I referred to the following documentation available on the topic of training Tesseract 4.0.

[tesseract-ocr] Training Tesseract4.0 (LSTM) on word level bounding boxes

2017-08-10 Thread 'Shoaib Ahmed' via tesseract-ocr
Hi, I would like to train Tesseract 4.0 (LSTM) on word-level bounding boxes. Is it currently possible to train Tesseract 4.0 on word-level bounding boxes instead of character-level bounding boxes?

Re: [tesseract-ocr] Creation of encoded unicharset failed While constructing LSTM training data.

2017-08-10 Thread ShreeDevi Kumar
Seems to work fine for me. Are you sure that you have the relevant files in the directories listed in that command? Check the tessdata and langdata locations. Use tessdata/best/*.traineddata as the existing models.
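
One way to do the check suggested above is a small script that reports which expected inputs are missing before the training command is run. This is only a sketch: the function name `missing_training_inputs`, the `lang` default, and the assumed layout (a `<lang>.traineddata` file in the model directory and a `<lang>` subdirectory under langdata) are illustrative assumptions, not something stated in the thread.

```python
from pathlib import Path


def missing_training_inputs(model_dir, langdata_dir, lang="eng"):
    """Return the expected paths that do not exist.

    Assumed layout (hypothetical, adjust to your setup):
      <model_dir>/<lang>.traineddata   existing model to fine-tune from
      <langdata_dir>/<lang>/           per-language langdata directory
    """
    expected = [
        Path(model_dir) / f"{lang}.traineddata",
        Path(langdata_dir) / lang,
    ]
    # Keep only the paths that are absent on disk.
    return [str(p) for p in expected if not p.exists()]
```

Calling `missing_training_inputs("tessdata/best", "langdata")` returns an empty list when both paths exist; any entries it returns are the locations to fix first.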

[tesseract-ocr] Creation of encoded unicharset failed While constructing LSTM training data.

2017-08-10 Thread robertyoung0511
Hello, I'm trying to fine-tune the eng.traineddata model following the steps in the link: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-%C2%B1-a-few-characters As the tutorial shows, I am fine-tuning for ± a few characters following the steps. But when I execute