Re: [tesseract-ocr] Tesseract 4.0 extracting multiple columns where one is wanted

2018-05-03 Thread ShreeDevi Kumar
Try with --psm 6

ShreeDevi
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Wed, May 2, 2018 at 9:26 PM, wrote:
> I am using Tesseract 4.0 to extract text from scanned PDF documents. I
>
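For reference, a minimal sketch of what that suggestion changes. Tesseract's help text describes `--psm 4` as "Assume a single column of text of variable sizes" and `--psm 6` as "Assume a single uniform block of text". The helper name `ocr_cmd` and the file name `page.png` are assumptions for illustration, not something from the thread; the helper only prints the command so the two invocations can be compared before running either.

```shell
#!/bin/sh
# Hypothetical helper: print the tesseract command for one page image.
#   $1 = image file (assumed name), $2 = page segmentation mode (default 6)
ocr_cmd() {
    printf 'tesseract %s stdout -l eng --psm %s\n' "$1" "${2:-6}"
}

# The invocation from the original question (single column, variable sizes).
ocr_cmd page.png 4

# The suggested invocation (single uniform block of text).
ocr_cmd page.png 6
```

Whether mode 6 actually keeps the section numbers and body text together depends on the page layout, so it is worth comparing both outputs on a sample page.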

[tesseract-ocr] Tesseract 4.0 extracting multiple columns where one is wanted

2018-05-02 Thread peter . bleackley
I am using Tesseract 4.0 to extract text from scanned PDF documents. I first use pdftoppm to split the document into pages represented as PNG files, and then use the following command to perform OCR:

    tesseract page.pdf stdout -l eng --psm 4

The pages generally have section numbers down the left
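
The two-step workflow described in this message could be scripted roughly as follows. This is a sketch under assumptions, not the poster's actual script: the input name `scan.pdf`, the output prefix `page`, and the 300 dpi resolution are all made up for illustration, and the sketch passes the PNG files produced by pdftoppm to tesseract. It runs in dry-run mode by default and only prints the commands; setting `RUN=1` executes them. File names are assumed to contain no spaces.

```shell
#!/bin/sh
# Sketch of the pipeline: PDF -> one PNG per page -> OCR text on stdout.
# scan.pdf, the "page" prefix, and 300 dpi are assumed values.

# Print the pdftoppm command that renders each PDF page as a PNG file.
#   $1 = input PDF, $2 = output file prefix
split_cmd() {
    printf 'pdftoppm -png -r 300 %s %s\n' "$1" "$2"
}

# Print the tesseract command for one page image.
#   $1 = image file, $2 = page segmentation mode
page_ocr_cmd() {
    printf 'tesseract %s stdout -l eng --psm %s\n' "$1" "$2"
}

# Dry run by default: print the command. With RUN=1, execute it instead.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        sh -c "$1"
    else
        printf '%s\n' "$1"
    fi
}

run "$(split_cmd scan.pdf page)"

# Process whatever page images exist; in a dry run there may be none yet.
for png in page-*.png; do
    [ -e "$png" ] || continue
    run "$(page_ocr_cmd "$png" 4)"
done
```

Changing the `4` in the loop to another `--psm` value is enough to compare segmentation modes across all pages.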