[tesseract-ocr] Re: how to use PDF as Input

Quan Nguyen Thu, 04 Jan 2018 08:45:43 -0800

You can specify a .uzn file that defines the zones to recognize, for example one zone per form field. See this thread:

https://groups.google.com/forum/#!topic/tesseract-ocr/M0o5az7Zoo8
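
As a rough sketch of what such a zone file might look like: as far as I know, each line of a .uzn file holds "left top width height label" in pixels, and the file sits next to the image with the same base name (form.png -> form.uzn). The coordinates, labels, and helper names below are made up for illustration; they are not taken from the linked thread.

```python
from pathlib import Path

# Hypothetical zones for a form: (left, top, width, height, label), in pixels,
# measured from the top-left corner of the page image.
ZONES = [
    (100, 200, 600, 50, "Name"),
    (100, 300, 600, 120, "Address"),
]


def format_uzn(zones):
    """Return the text of a .uzn file, one zone per line."""
    return "".join(
        f"{left} {top} {width} {height} {label}\n"
        for left, top, width, height, label in zones
    )


def write_uzn(image_path, zones):
    """Write the zones next to the image, e.g. form.png -> form.uzn."""
    uzn_path = Path(image_path).with_suffix(".uzn")
    uzn_path.write_text(format_uzn(zones))
    return uzn_path
```

My understanding is that Tesseract only picks up the .uzn file when it is run in a page segmentation mode that skips automatic layout analysis (psm 4 is the one usually suggested), so please check that against your version.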


On Thursday, January 4, 2018 at 7:37:48 AM UTC-6, Subhanshu Gupta wrote:
>
> Thanks Quan. One more thing: how can I use Tesseract to read a form with 
> different data fields like Name, Address, etc. and save the corresponding 
> data somewhere else?
>
>
> On Thursday, January 4, 2018 at 6:51:48 AM UTC+5:30, Quan Nguyen wrote:
>>
>> The Tesseract engine cannot read PDF files. You'll have to convert them to 
>> suitable images (TIFF or PNG) first. There are many tools for that: 
>> ImageMagick, Ghostscript, PDFBox, etc.
>>
>> On Wednesday, January 3, 2018 at 12:05:12 PM UTC-6, Subhanshu Gupta wrote:
>>>
>>> Dear All,
>>>
>>> I am new to Tesseract OCR and need to use it to read PDF forms, but I 
>>> have not been able to find good documentation on which method to use to 
>>> read a PDF or to do character segmentation.
>>> If any of you have a doc or manual explaining which method is used where, 
>>> it would be really helpful.
>>>
>>> Thanks. :)
>>>
>>
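
For the PDF-to-image step mentioned in the quoted reply, here is a minimal sketch using Ghostscript, one of the tools named there. It assumes the `gs` executable is on the PATH; the function names, output prefix, and the 300 dpi default are my own choices for illustration (300 dpi is a common setting for OCR, not a requirement).

```python
import subprocess


def ghostscript_command(pdf_path, out_prefix="page", dpi=300):
    """Build a Ghostscript command that renders each PDF page to a PNG."""
    return [
        "gs",
        "-dNOPAUSE",                            # do not pause between pages
        "-dBATCH",                              # exit when the file is done
        "-sDEVICE=png16m",                      # 24-bit colour PNG output
        f"-r{dpi}",                             # rendering resolution in dpi
        f"-sOutputFile={out_prefix}-%03d.png",  # page-001.png, page-002.png, ...
        pdf_path,
    ]


def pdf_to_png(pdf_path, out_prefix="page", dpi=300):
    """Run Ghostscript; raises CalledProcessError if the conversion fails."""
    subprocess.run(ghostscript_command(pdf_path, out_prefix, dpi), check=True)
```

Each resulting image can then be passed to Tesseract in the usual way, e.g. `tesseract page-001.png page-001`.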

