You can see if generate_line_box.py <https://github.com/OCR-D/ocrd-train/blob/master/generate_line_box.py> from https://github.com/OCR-D/ocrd-train is helpful.
It requires single-line images and matching ground truth to create the box files.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Tue, May 22, 2018 at 8:14 AM, Tao Shatoo <[email protected]> wrote:

> Not yet. I tried but failed. I'm waiting for the same API as you.
>
> On Friday, August 11, 2017 at 6:08:05 AM UTC+8, Shoaib wrote:
>>
>> Hi everyone,
>>
>> I would like to train Tesseract on my own dataset consisting of word
>> images. I have the bounding box information, but for the whole word instead
>> of per character. I referred to the following documentation on
>> training Tesseract 4.0:
>> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>>
>> The documentation says "*The boxes only need to be
>> at the textline level. It is thus far easier to make training data from
>> existing image data.*" But later in the wiki, the box format that
>> allows boxes at text line level is said not to be implemented yet
>> ("*Box File Format - Second Option (NOT YET IMPLEMENTED)*"). I would
>> therefore like to know if there is any way to train Tesseract based on just
>> the word bounding box information instead of character-level information.
>>
>> Thank you for your time in this regard.
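To illustrate the line-level box files discussed above, here is a minimal sketch of the idea. It is not the generate_line_box.py script itself. It assumes the convention that every character in a line is given the bounding box of the whole line image, and that a tab entry marks the end of the line. The function name line_box_entries is hypothetical, and the width and height would normally be read from the line image (for example with Pillow's Image.open(path).size).

```python
import unicodedata


def line_box_entries(text, width, height):
    """Return box-file lines for one text line whose image is width x height pixels.

    Assumption: each glyph is assigned the box of the whole line, and a
    final tab entry marks the end of the text line.
    """
    # Group each base character with the combining marks that follow it,
    # as reported by unicodedata.combining().
    glyphs = []
    for ch in text.strip():
        if glyphs and unicodedata.combining(ch):
            glyphs[-1] += ch
        else:
            glyphs.append(ch)

    # Every glyph gets the same box: the full line image.
    entries = ["%s 0 0 %d %d 0" % (g, width, height) for g in glyphs]

    # A tab entry closes the text line.
    entries.append("\t %d %d %d %d 0" % (width, height, width + 1, height + 1))
    return entries


if __name__ == "__main__":
    # Hypothetical example: a 120x32 pixel image containing the line "Hi".
    print("\n".join(line_box_entries("Hi", 120, 32)))
```

The point of the sketch is that no per-character or per-word coordinates are needed: only the line image size and its ground-truth text. Word-level boxes would first have to be cropped or merged into single-line images.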

