[tesseract-ocr] Re: Training Tesseract4.0 (LSTM) on word level bounding boxes

Tao Shatoo Mon, 21 May 2018 21:46:39 -0700

Not yet. I tried but failed. I'm waiting for the same API, like you.

On Friday, August 11, 2017 at 6:08:05 AM UTC+8, Shoaib wrote:
>
> Hi everyone,
>
> I would like to train Tesseract on my own dataset, which consists of word
> images. I have bounding box information, but for whole words rather than
> per character. I referred to the following documentation on training
> Tesseract 4.0:
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>
> The documentation states that "*The boxes only need to be at
> the textline level. It is thus far easier to make training data from
> existing image data.*" Later in the wiki, however, the box format that
> allows boxes at the text line level is marked as not yet implemented ("*Box
> File Format - Second Option (NOT YET IMPLEMENTED)*"). I would therefore
> like to know whether there is any way to train Tesseract using only word-level
> bounding boxes instead of character-level information.
>
> Thank you for your time.
>


