TP used "config-page.txt" as the name of the config file, but you can name it any way you like. A config file is a file of control parameters used for tweaking Tesseract. You can find some examples in the "tessdata/configs" directory, and you can also create your own.
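To illustrate the "create the config file dynamically" approach from this thread, here is a minimal Python sketch. It assumes the tesseract executable is on the PATH and accepts the "tesseract <image> <outputbase> <configfile>" form used in TP's example. The function names (page_config_text, write_page_config, ocr_page) are hypothetical, made up for illustration; they are not part of Tesseract.

```python
import subprocess
from pathlib import Path


def page_config_text(page_number):
    """Return config-file contents selecting one page of a multipage TIFF.

    page_number is 1-based (as a person would count pages), while
    tessedit_page_number is zero-based, so page 4 is written as 3.
    """
    if page_number < 1:
        raise ValueError("page_number is 1-based and must be >= 1")
    return "tessedit_page_number {}\n".format(page_number - 1)


def write_page_config(page_number, config_path="config-page.txt"):
    """Write the one-line config file and return its path."""
    path = Path(config_path)
    path.write_text(page_config_text(page_number))
    return path


def ocr_page(image, page_number, exe="tesseract"):
    """Run Tesseract on a single page; returns the expected output file name.

    Assumption: 'exe' resolves to a Tesseract build that honours
    tessedit_page_number (reported in this thread for the SVN version).
    """
    config = write_page_config(page_number)
    output_base = "page{}".format(page_number)
    # Same argument order as in TP's example: image, output base, config file.
    subprocess.run([exe, image, output_base, str(config)], check=True)
    return output_base + ".txt"
```

With these helpers, ocr_page("example_multipage.tif", 4) would write "tessedit_page_number 3" to config-page.txt and then run the same command line as in TP's example. Keeping the 1-based to zero-based conversion in one place avoids off-by-one mistakes when picking the page.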
As for the existence and effects of specific parameters, currently I don't know any way to find that out other than digging in Tesseract's code. There's also some ancient documentation at http://tesseract-ocr.repairfaq.org/tess_variables_all.html, but one needs to check whether a given parameter is still valid, and the descriptions are often obscure.

Warm regards,
Dmitri Silaev
www.CustomOCR.com

On Thu, Mar 8, 2012 at 8:32 PM, Paul <[email protected]> wrote:
> Thank you gents that will work for me, I will give it a try. Is there
> somewhere I can find some documentation on things like config-page.txt
> etc. I have Googled it but am not finding a whole lot of info.
>
> Best Regards
>
> Paul
>
> On Mar 8, 8:47 am, Dmitri Silaev <[email protected]> wrote:
>> My bad, I had missed that feature. "tessedit_page_number" indeed
>> allows to specify a TIFF page. I can only add a bit of clarification:
>> the page number is zero-based. The value of -1 (default) instructs
>> Tesseract to process all TIFF pages.
>>
>> Warm regards,
>> Dmitri Silaev
>> www.CustomOCR.com
>>
>> On Thu, Mar 8, 2012 at 12:28 PM, TP <[email protected]> wrote:
>> > On Wed, Mar 7, 2012 at 9:33 AM, Dmitri Silaev <[email protected]>
>> > wrote:
>> >> No, at this time it is not possible to do via command line.
>> >
>> > As a matter of fact with the SVN version of tesseract at least (and
>> > probably earlier versions), it is possible to tell tesseract to OCR a
>> > particular page in a multipage tiff file via the command line. For
>> > example, run:
>> >
>> >     tesseract.exe example_multipage.tif page4 config-page.txt
>> >
>> > where the config file, config-page.txt, only has the following in it:
>> >
>> >     tessedit_page_number 3
>> >
>> > You'll see:
>> >
>> >     Tesseract Open Source OCR Engine v3.02 with Leptonica
>> >     Page 4 of 5
>> >
>> > and page4.txt will then contain the OCRed text of the fourth "page" in
>> > example_multipage.tif.
>> >
>> > So just dynamically create "config-page.txt" with the page # you want to
>> > OCR.
--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

