You can set up a loop that reads all the files and processes each one for hOCR output.
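A minimal sketch of such a loop, producing one hOCR file per tif. The function name hocr_batch, the OCRLANG variable, and the eng default are assumptions made for illustration; only the trailing hocr config name and the -l option come from tesseract itself.

```shell
#!/bin/sh
# Sketch: write one hOCR file per .tif found in a directory.
# hocr_batch and OCRLANG are illustrative names, not part of tesseract.
OCRLANG=${OCRLANG:-eng}   # assumed default; set OCRLANG=san etc. as needed

hocr_batch() {
    dir=${1:-.}
    for f in "$dir"/*.tif
    do
        # skip the unexpanded pattern when no .tif files match
        [ -e "$f" ] || continue
        base=${f%.tif}
        echo "Starting OCR for $f with -l $OCRLANG"
        # the trailing "hocr" config asks for hOCR output next to the image
        # (named $base.hocr on Tesseract 3.03 and later; older releases used .html)
        tesseract "$f" "$base" -l "$OCRLANG" hocr
    done
}

# process the directory given as first argument, or the current one
hocr_batch "${1:-.}"
```

Saved as, say, hocr_batch.sh, it could be run as `OCRLANG=san sh hocr_batch.sh /path/to/scans`. It sticks to plain sh so it should also work under Cygwin or MSYS on Windows.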
Here is an example script; this one was written for PDF output. You can change the output option to hocr. See https://tesseract-ocr.googlecode.com/git/doc/tesseract.1.html for the command syntax.

#Page Segmentation Modes
#3 = Fully automatic page segmentation, but no OSD. (Default)
#4 = Assume a single column of text of variable sizes.
#6 = Assume a single uniform block of text.
PSM=3
MYFILE=page
# OCRLANG rather than LANG, because LANG is the shell's locale variable
OCRLANG=san
OUTPUTOPTION=pdf
rm -f zzz.txt
for f in *"$MYFILE"*.tif
do
  echo "Starting OCR for $f file with -l $OCRLANG at $(date), please wait..."
  # Tesseract 3.x spells this option -psm; 4.x and later use --psm
  tesseract --tessdata-dir C:/Home/UserShree/tesseract-ocr/testing "$f" "$f-san" -l "$OCRLANG" -psm "$PSM" "$OUTPUTOPTION"
  # a .txt file exists only when text output was produced
  [ -f "$f-san.txt" ] && cat "$f-san.txt" >> zzz.txt
done
echo "OCR done"
gswin32c -dPDFA -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sProcessColorModel=DeviceCMYK -sPDFACompatibilityPolicy=2 -sOutputFile=zzz.pdf *"$MYFILE"*-san.pdf
echo "pdf merged"

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Jul 20, 2015 at 12:25 PM, Stathis L. wrote:
> Does anyone know how to make a script so that tesseract could process many
> files and output an hocr file for each tif given? Windows script or linux
> script would do:) Thanks

