[tesseract-ocr] Training Tesseract4.0 (LSTM) on word level bounding boxes

2017-08-10 Thread 'Shoaib' via tesseract-ocr
Hi everyone,

I would like to train Tesseract on my own dataset consisting of word 
images. I have bounding box information, but only for whole words rather 
than per character. I referred to the following documentation on training 
Tesseract 4.0:
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00

The documentation says that "*The boxes only need to be at 
the textline level. It is thus far easier to make training data from 
existing image data.*". But later in the wiki, the box format that allows 
boxes at the text line level is described as not yet implemented ("*Box 
File Format - Second Option (NOT YET IMPLEMENTED)*"). I would therefore 
like to know whether there is any way to train Tesseract using only word 
bounding boxes instead of character-level boxes.
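
To make the question concrete, here is a minimal sketch of one possible 
workaround: emit one box-file line per character, each carrying the box of 
the whole word, followed by a tab line marking the end of the text line. It 
assumes the usual box-file layout "<symbol> <left> <bottom> <right> <top> 
<page>" with the origin at the bottom-left of the image. The function name, 
its parameters, and the coordinates chosen for the tab line are hypothetical, 
and whether lstmtraining accepts such boxes is exactly what is being asked, 
not something confirmed here.

```python
def word_box_lines(text, left, top, width, height, image_height, page=0):
    """Build box-file lines for one word from a single word-level box.

    The input box is given in image coordinates (origin at the top-left);
    box files are assumed to use an origin at the bottom-left, so the
    y axis is flipped. All names here are illustrative only.
    """
    right = left + width
    box_top = image_height - top
    box_bottom = image_height - (top + height)

    # Assumption: every character may reuse the box of the whole word.
    # Note: iterating a str splits by code point, not by grapheme.
    lines = [
        f"{ch} {left} {box_bottom} {right} {box_top} {page}" for ch in text
    ]

    # Assumption: a line starting with a tab marks the end of the text line;
    # the one-pixel-wide box placed at the right edge is a guess.
    lines.append(f"\t {right} {box_bottom} {right + 1} {box_top} {page}")
    return lines


if __name__ == "__main__":
    # Word "Hi" at x=10, y=20, 30 px wide, 40 px high, in a 100 px tall image.
    print("\n".join(word_box_lines("Hi", 10, 20, 30, 40, 100)))
```

For a dataset where each image holds exactly one word, left and top would 
be 0 and width and height would be the image size.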

Thanking you for your time in this regard.

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/8581d82d-5ec2-45b4-bdda-342152970014%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[tesseract-ocr] Training Tesseract4.0 (LSTM) on word level bounding boxes

2017-08-10 Thread 'Shoaib Ahmed' via tesseract-ocr
Hi,

I would like to train Tesseract 4.0 (LSTM) on word-level bounding boxes. 
Is it currently possible to train Tesseract 4.0 on word-level bounding 
boxes instead of character-level bounding boxes?
