TP used "config-page.txt" as the name of the config file, but you can
name it whatever you like. A config file is a file of control
parameters used for tweaking Tesseract. You can find some examples in
the "tessdata/configs" directory, but you can also create your own.
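
For what it's worth, here is a minimal Python sketch of generating the
page-selection config from TP's example on the fly. The helper names
write_page_config and build_command are made up for illustration and are
not part of Tesseract; the sketch assumes one "name value" pair per line,
as in TP's file, and only builds the command line rather than running it.

```python
import os
import tempfile


def write_page_config(config_path, page_index):
    """Write a config file selecting one zero-based TIFF page (hypothetical helper)."""
    if page_index < -1:
        raise ValueError("page_index must be -1 (all pages) or >= 0")
    with open(config_path, "w") as f:
        # One "name value" pair per line, separated by whitespace.
        f.write("tessedit_page_number %d\n" % page_index)


def build_command(image_path, output_base, config_path, exe="tesseract"):
    """Return the argument list: image, output base name, then the config file."""
    return [exe, image_path, output_base, config_path]


if __name__ == "__main__":
    human_page = 4  # the fourth page, counting from one
    cfg = os.path.join(tempfile.mkdtemp(), "config-page.txt")
    write_page_config(cfg, human_page - 1)  # Tesseract counts pages from zero
    print(build_command("example_multipage.tif", "page%d" % human_page, cfg))
    # To actually run it (requires Tesseract on PATH), pass the list to
    # subprocess.check_call().
```

The config file name is arbitrary here too; only its position after the
output base name on the command line matters.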

As for the existence and effects of specific parameters, I currently
don't know of any way to find them out other than digging through
Tesseract's code. There's also some ancient documentation at
http://tesseract-ocr.repairfaq.org/tess_variables_all.html but you
need to check whether a given parameter is still valid, and the
descriptions are often obscure.
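
If you do go digging, a rough Python sketch like the one below can list
candidate parameter names from a source checkout. It rests on an
assumption: that parameters are declared with macros named along the
lines of INT_VAR, BOOL_VAR_H, STRING_MEMBER or double_MEMBER, with the
parameter name as the first argument. Please verify that against the
version of the code you have; the function names are made up for
illustration.

```python
import os
import re

# Assumed declaration style: TYPE_VAR(name, ...), TYPE_VAR_H(name, ...),
# TYPE_MEMBER(name, ...), TYPE_INIT_MEMBER(name, ...).
PARAM_DECL = re.compile(
    r"\b(INT|BOOL|STRING|double)_(?:INIT_)?(?:VAR|MEMBER)(?:_H)?\s*\(\s*(\w+)\s*,"
)


def find_params_in_text(text):
    """Return sorted (name, type) pairs for every declaration found in text."""
    return sorted({(m.group(2), m.group(1)) for m in PARAM_DECL.finditer(text)})


def find_params_in_tree(root):
    """Scan all .h/.cpp/.cc files under root and collect declared parameters."""
    found = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith((".h", ".cpp", ".cc")):
                path = os.path.join(dirpath, name)
                with open(path, errors="replace") as f:
                    found.update(find_params_in_text(f.read()))
    return sorted(found)


if __name__ == "__main__":
    # "." is a placeholder; point it at your Tesseract source directory.
    for name, kind in find_params_in_tree("."):
        print("%-8s %s" % (kind, name))
```

This only tells you that a name is declared somewhere; what the
parameter actually does still has to be read from the surrounding code.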

Warm regards,
Dmitri Silaev
www.CustomOCR.com



On Thu, Mar 8, 2012 at 8:32 PM, Paul <[email protected]> wrote:
> Thank you, gents, that will work for me; I will give it a try. Is there
> somewhere I can find some documentation on things like config-page.txt
> etc.? I have Googled it but am not finding a whole lot of info.
>
> Best Regards
>
> Paul
>
> On Mar 8, 8:47 am, Dmitri Silaev <[email protected]> wrote:
>> My bad, I had missed that feature. "tessedit_page_number" indeed
>> allows you to specify a TIFF page. I can only add a bit of clarification:
>> the page number is zero-based. The value of -1 (default) instructs
>> Tesseract to process all TIFF pages.
>>
>> Warm regards,
>> Dmitri Silaev
>> www.CustomOCR.com
>>
>> On Thu, Mar 8, 2012 at 12:28 PM, TP <[email protected]> wrote:
>> > On Wed, Mar 7, 2012 at 9:33 AM, Dmitri Silaev <[email protected]> wrote:
>> >> No, at this time it is not possible to do via command line.
>>
>> > As a matter of fact with the SVN version of tesseract at least (and
>> > probably earlier versions), it is possible to tell tesseract to OCR a
>> > particular page in a multipage tiff file via the command line. For
>> > example, run:
>>
>> >   tesseract.exe example_multipage.tif page4 config-page.txt
>>
>> > where the config file, config-page.txt, only has the following in it:
>>
>> >  tessedit_page_number    3
>>
>> > You'll see:
>>
>> >  Tesseract Open Source OCR Engine v3.02 with Leptonica
>> >  Page 4 of 5
>>
>> > and page4.txt will then contain the OCRed text of the fourth "page" in
>> > example_multipage.tif.
>>
>> > So just dynamically create "config-page.txt" with the page # you want
>> > to OCR.
>>
>> > --
>> > You received this message because you are subscribed to the Google
>> > Groups "tesseract-ocr" group.
>> > To post to this group, send email to [email protected]
>> > To unsubscribe from this group, send email to
>> > [email protected]
>> > For more options, visit this group at
>> > http://groups.google.com/group/tesseract-ocr?hl=en

