[tesseract-ocr] Extracting Text from Onscreen vs Image

2018-10-18 Thread DreadStarX
Hey Guys/Gals, I'm working on an application to help my colleagues and me with our day-to-day tasks. Here's what I want to know: can Tesseract extract text directly from the screen without using an image? If yes, how fast can it read and decipher the text? If Tesseract can't, then I planned on taking a
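As far as I know, Tesseract only reads images, so the usual workaround is to take a screenshot in memory and hand it straight to the OCR engine, with no file written to disk. The following is a minimal Python sketch of that idea. It assumes the third-party Pillow (`PIL.ImageGrab`) and `pytesseract` packages plus a local tesseract install, and the helper name `ocr_screen` is hypothetical. It also times each step, because "how fast" depends heavily on screen size, hardware and Tesseract version, so measuring on your own machine is the only reliable answer.

```python
import time
from typing import Callable, Optional, Tuple


def ocr_screen(
    grab: Optional[Callable[[], object]] = None,
    ocr: Optional[Callable[[object], str]] = None,
) -> Tuple[str, float, float]:
    """Capture the screen and OCR it in memory; return (text, grab_s, ocr_s).

    Hypothetical helper. The defaults assume Pillow's ImageGrab and
    pytesseract are installed; both callables can be swapped out
    (e.g. for a region grab, or stubs in tests).
    """
    if grab is None:
        from PIL import ImageGrab  # assumption: Pillow is installed
        grab = ImageGrab.grab  # full-screen screenshot as a PIL Image
    if ocr is None:
        import pytesseract  # assumption: pytesseract + tesseract binary installed
        ocr = pytesseract.image_to_string

    t0 = time.perf_counter()
    image = grab()  # screenshot stays in memory; nothing is saved to disk
    t1 = time.perf_counter()
    text = ocr(image)  # run Tesseract on the in-memory image
    t2 = time.perf_counter()
    return text, t1 - t0, t2 - t1
```

Calling `ocr_screen()` with no arguments grabs the whole screen. To OCR only one window or panel, pass something like `grab=lambda: ImageGrab.grab(bbox=(left, top, right, bottom))`. A smaller image generally means less work for Tesseract.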

[tesseract-ocr] Combining -psm 4 with OSD?

2018-10-18 Thread Jarl Arntzen
Hi, all. I am OCRing 10k invoices for AI training and, as it turns out, Tesseract's -psm 4 with txt output is ideal for this, as it gives each individual line item as one uninterrupted line of text across the page, including all columns. Example: Product Description
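If a single run cannot combine mode 4 with orientation detection (mode 1 adds OSD, but with fully automatic segmentation), one possible workaround is two passes. First, run `--psm 0` (OSD only) to read the page rotation. Then straighten the image and run `--psm 4` on it. Tesseract 4 spells the option `--psm`; 3.0x used `-psm`. The Python sketch below is not an established recipe. It assumes `tesseract` is on PATH, `osd.traineddata` is installed, and Pillow is available for the rotation step. The helper names are hypothetical.

```python
import os
import re
import subprocess
import tempfile


def parse_osd_rotation(osd_text: str) -> int:
    """Return the 'Rotate:' value (degrees) from Tesseract's --psm 0 report."""
    match = re.search(r"^Rotate:\s*(\d+)", osd_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Rotate:' line in OSD output")
    return int(match.group(1))


def detect_rotation(image_path: str) -> int:
    """Pass 1: orientation and script detection only (--psm 0)."""
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, "osd")
        # With --psm 0, tesseract writes its report to <base>.osd
        subprocess.run(["tesseract", image_path, base, "--psm", "0"],
                       check=True, capture_output=True)
        with open(base + ".osd", encoding="utf-8") as fh:
            return parse_osd_rotation(fh.read())


def ocr_lines_upright(image_path: str) -> str:
    """Pass 2: straighten the page if needed, then OCR it with --psm 4."""
    rotate = detect_rotation(image_path)
    with tempfile.TemporaryDirectory() as tmp:
        path = image_path
        if rotate:
            from PIL import Image  # assumption: Pillow is installed
            # Assumption: 'Rotate' is the clockwise correction, while PIL's
            # rotate() turns counterclockwise, hence the minus sign.
            # Verify the direction on a known sideways page first.
            path = os.path.join(tmp, "upright.png")
            with Image.open(image_path) as img:
                img.rotate(-rotate, expand=True).save(path)
        # "stdout" as the output base prints the recognized text directly
        result = subprocess.run(["tesseract", path, "stdout", "--psm", "4"],
                                check=True, capture_output=True, text=True)
        return result.stdout
```

The OSD pass costs an extra run per page. For 10k invoices that are mostly upright, it may be worth checking the `Orientation confidence` line in the same report and skipping the rotation when the confidence is low.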

Re: [tesseract-ocr] Server performance is 3x as slow versus local machine

2018-10-18 Thread Zdenko Podobny
Why? What is the Tesseract issue here? That Tesseract does not run at the same speed on different hardware? That is expected. David started the discussion in the right place: the forum. Please use the Tesseract issue tracker only for issues that can be fixed on the Tesseract side. We cannot fix the user side. Zdenko

[tesseract-ocr] Server performance is 3x as slow versus local machine

2018-10-18 Thread shree
Reply by @stweil in the issue tracker. Please continue further discussion there. It looks like the local machine is fairly new hardware, while the server is older, so the difference could come down to SIMD support: AVX, SSE, or none at all. The user can run tesseract --version on both machines to see whether SSE and AVX are found.
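The comparison can be scripted. The Python sketch below assumes `tesseract` is on PATH and that the build reports CPU features as lines such as ` Found AVX2` in its `--version` output, as Tesseract 4.x does. The helper names are hypothetical. Both output streams are checked because some releases may print the version information to stderr.

```python
import re
import subprocess
from typing import List


def simd_from_version_text(text: str) -> List[str]:
    """Extract the 'Found <feature>' entries from `tesseract --version` output."""
    return re.findall(r"^\s*Found (\S+)", text, re.MULTILINE)


def tesseract_simd() -> List[str]:
    """Run `tesseract --version` and list the features it reports as found."""
    result = subprocess.run(["tesseract", "--version"],
                            capture_output=True, text=True)
    # Combine both streams, since the version text may go to either one.
    return simd_from_version_text(result.stdout + result.stderr)
```

Run `tesseract_simd()` on both machines. If the local machine lists `AVX2` and `AVX` but the server lists only `SSE` or nothing, that alone could plausibly explain a large speed gap.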