You can specify a .uzn file that defines the zones (one per form field) for Tesseract to read. See https://groups.google.com/forum/#!topic/tesseract-ocr/M0o5az7Zoo8
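
In case it helps, here is a minimal Python sketch for generating such a zone file. It rests on assumptions that are not stated in this thread, so please check them against your Tesseract version: that each line of the .uzn file has the form "left top width height label" in pixels, that the file must sit next to the image with the same base name (form.png -> form.uzn), and that Tesseract only picks it up in a non-automatic page segmentation mode such as 4. The helper names (format_uzn, write_uzn) and the coordinates in FORM_ZONES are made up for illustration; you would measure the real field positions on your own form image.

```python
from pathlib import Path


def format_uzn(zones):
    """Render zones as .uzn text.

    zones: iterable of (left, top, width, height, label) tuples in pixels.
    Assumed line format: "left top width height label".
    """
    lines = []
    for left, top, width, height, label in zones:
        # Fields are space-separated, so keep the label free of spaces.
        safe_label = label.replace(" ", "_")
        lines.append(f"{left} {top} {width} {height} {safe_label}")
    return "\n".join(lines) + "\n"


def write_uzn(image_path, zones):
    """Write the zone file next to the image, e.g. form.png -> form.uzn."""
    uzn_path = Path(image_path).with_suffix(".uzn")
    uzn_path.write_text(format_uzn(zones))
    return uzn_path


# Hypothetical field positions, for illustration only.
FORM_ZONES = [
    (100, 50, 400, 40, "Name"),
    (100, 120, 400, 90, "Address"),
]

if __name__ == "__main__":
    print(write_uzn("form.png", FORM_ZONES))
```

With form.uzn in place, running something like "tesseract form.png out --psm 4" ("-psm 4" on 3.x releases) should restrict recognition to those zones, if the assumptions above hold. Note that the recognized text is not tagged with the zone labels, so to store each field separately you may find it simpler to crop each zone and OCR it on its own.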

On Thursday, January 4, 2018 at 7:37:48 AM UTC-6, Subhanshu Gupta wrote:
>
> Thanks Quan. One more thing, how can I use Tesseract to read a form having
> different data fields like Name, Address, etc. and save the corresponding
> data to somewhere else?
>
>
> On Thursday, January 4, 2018 at 6:51:48 AM UTC+5:30, Quan Nguyen wrote:
>>
>> Tesseract engine cannot read PDF. You'll have to convert them to suitable
>> images (TIFF or PNG) first. There are many tools for that: ImageMagick,
>> GhostScript, PDFBox, etc.
>>
>> On Wednesday, January 3, 2018 at 12:05:12 PM UTC-6, Subhanshu Gupta wrote:
>>>
>>> Dear All,
>>>
>>> I am new to Tesseract OCR and need to implement it to Read PDF Forms but
>>> I am not able to find any good documentation for which method to use to
>>> read PDF as well as for Character Segmentation.
>>> If any of you have any doc/manual relating on which method is used where
>>> it will be really very helpful.
>>>
>>> Thanks. :)
>>>