On Friday, February 10, 2017 at 8:39:51 AM UTC-5, MUHAMMAD ADNAN wrote:
>
> Hi,
> I have some scanned PDF files that contain a table on each page. Some 
> tables have borders, and some have no borders or lines.
> I want to extract each formatted table, with its data, to a Word or Excel 
> format. I am totally new to tesseract-ocr and don't know how to use it in 
> C++ or C#.
> Guidance on detecting tables and saving the output using Tesseract would be 
> highly appreciated.
> Thanks
>
> Best Regards
> Adnan
>

You might want to use Tabula instead, provided that the PDF contains the 
actual text and numbers and not just images of them. Tabula reads the 
PDF's text layer, so it will not help with purely scanned pages.

https://github.com/tabulapdf/tabula 
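
If the pages really are just images, one possible route is to convert each 
page to an image, run Tesseract with its TSV output (for example 
"tesseract page.png out tsv"), and rebuild the table from the word boxes. 
Below is a rough sketch of that last step in Python rather than C++ or C#, 
just to show the idea. The sample data, the function names and the two 
pixel tolerances are my own assumptions, not anything from Tesseract or 
Tabula, and it does not align columns across rows, so treat it as a 
starting point only.

```python
import csv
import io

# Hypothetical sample of Tesseract TSV output. Word-level rows have level == 5.
SAMPLE_TSV = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "5\t1\t1\t1\t1\t1\t50\t100\t60\t20\t95\tItem\n"
    "5\t1\t1\t1\t1\t2\t300\t102\t50\t20\t94\tUnit\n"
    "5\t1\t1\t1\t1\t3\t360\t101\t60\t20\t93\tPrice\n"
    "5\t1\t1\t1\t2\t1\t50\t141\t55\t20\t96\tBolt\n"
    "5\t1\t1\t1\t2\t2\t300\t139\t50\t20\t91\t0.25\n"
)


def read_words(tsv_text):
    """Return the non-empty word boxes from Tesseract TSV text."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        text = (row["text"] or "").strip()
        if row["level"] == "5" and text:
            words.append({
                "left": int(row["left"]),
                "top": int(row["top"]),
                "width": int(row["width"]),
                "text": text,
            })
    return words


def group_rows(words, row_tol=10):
    """Group words whose top edges lie within row_tol pixels (assumed value)."""
    rows = []
    for w in sorted(words, key=lambda w: (w["top"], w["left"])):
        if rows and abs(w["top"] - rows[-1][0]["top"]) <= row_tol:
            rows[-1].append(w)
        else:
            rows.append([w])
    # Order each row left to right.
    return [sorted(r, key=lambda w: w["left"]) for r in rows]


def split_cells(row, gap_tol=25):
    """Start a new cell when the horizontal gap exceeds gap_tol pixels (assumed value)."""
    cells = []
    prev_right = None
    for w in row:
        if prev_right is not None and w["left"] - prev_right <= gap_tol:
            cells[-1] += " " + w["text"]
        else:
            cells.append(w["text"])
        prev_right = w["left"] + w["width"]
    return cells


def tsv_to_csv(tsv_text):
    """Convert Tesseract TSV text into CSV text, one line per detected row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in group_rows(read_words(tsv_text)):
        writer.writerow(split_cells(row))
    return out.getvalue()


if __name__ == "__main__":
    print(tsv_to_csv(SAMPLE_TSV), end="")
```

The resulting CSV can be opened directly in Excel. Real scans will need the 
tolerances tuned to the page resolution, and borderless tables with uneven 
spacing may still need manual cleanup.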
