Please submit as a PR to https://github.com/tesseract-ocr/tesstrain

On Wed, Sep 18, 2019 at 4:08 PM Lorenzo Bolzani <[email protected]> wrote:

> Hi,
> I wrote this small script to speed up OCRD-train
> <https://github.com/OCR-D/ocrd-train> training startup.
>
> It generates the boxes for all the images provided on the command line (it
> works only for single line images).
>
> It is a simple conversion of the generate_line_box.py from ocrd-train. I
> used it once, it seems to work fine.
>
> Currently with OCR-D the boxes and lstmf generation is very slow because
> it starts a new process for each image.
>
> I execute this script before calling the makefile.
>
> I do the "shell expansion" in python so that it can handle a very long
> list of files.
>
> So you need to call it in this way:
>
> python generate_all_line_boxes.py -i 'data/train/*.tif'
>
> with single quotes to prevent shell expansion.
>
> BTW, it would be nice to have the same thing for the lstmf files.
>
> Bye
>
> Lorenzo

--
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduVcZmOyS3bX0YeyLMBuM%2B4WMQvb%2Bir%2BRbK8k3p8NGGqnA%40mail.gmail.com.
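
For readers without the attachment, here is a rough sketch of the approach described in the quoted message: expand the quoted glob pattern inside Python and write one box file per single-line image in a single process. This is not Lorenzo's actual script. The function names (`image_size`, `box_lines`, `generate_all`), the `.gt.txt` ground-truth naming, and the exact box layout (every character spans the whole line, followed by a tab entry as end-of-line marker) are assumptions for illustration, and reading image sizes assumes Pillow is installed.

```python
import glob
import os
import unicodedata


def image_size(path):
    """Return (width, height) of an image. Assumes Pillow is installed."""
    from PIL import Image  # imported lazily so the rest works without Pillow
    with Image.open(path) as im:
        return im.size


def box_lines(text, width, height):
    """Build box entries for one text line (assumed layout, see lead-in)."""
    text = unicodedata.normalize("NFC", text.strip())
    # One entry per character, each covering the full line image.
    lines = ["%s 0 0 %d %d 0" % (ch, width, height) for ch in text]
    # Assumption: a tab entry marks the end of the text line.
    lines.append("\t %d %d %d %d 0" % (width, height, width + 1, height + 1))
    return lines


def generate_all(pattern, get_size=image_size):
    """Expand `pattern` in Python and write a .box file next to each image.

    Images without a matching <name>.gt.txt file (assumed naming) are skipped.
    Returns the list of box files written.
    """
    written = []
    # glob.glob does the expansion here, so the shell never sees a huge
    # argument list; the caller passes the pattern in single quotes.
    for image_path in sorted(glob.glob(pattern)):
        base = os.path.splitext(image_path)[0]
        gt_path = base + ".gt.txt"
        if not os.path.exists(gt_path):
            continue
        with open(gt_path, encoding="utf-8") as f:
            text = f.read()
        width, height = get_size(image_path)
        box_path = base + ".box"
        with open(box_path, "w", encoding="utf-8") as f:
            f.write("\n".join(box_lines(text, width, height)) + "\n")
        written.append(box_path)
    return written
```

Calling `generate_all('data/train/*.tif')` would then process every matching image in one process, which is the point of the speed-up: no new interpreter is started per image.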

