[tesseract-ocr] Training - Output traineddata always has the same size as input

2020-04-07 Thread Luan Fernandes
Good morning everyone. First of all, I found a similar problem in this post, although the solutions didn't seem to help me: https://groups.google.com/forum/#!msg/tesseract-ocr/O8EEFSSj7_I/aRCIzGbvAgAJ So the question is, after various iterations on hundreds of pages, shouldn't the output

Re: [tesseract-ocr] The text is not recognized from png

2020-04-07 Thread Zdenko Podobny
You can start by reading the docs and then searching the issue tracker and forum for "table". Zdenko On Tue, 7 Apr 2020 at 7:38, amrapalli karan wrote: > I have this .pdf file which I am able to read only partially. I am using the R language to fetch the data from the pdf file which is uploaded in the

Re: [tesseract-ocr] How to split a3 in single page

2020-04-07 Thread Zdenko Podobny
No. Tesseract is an OCR engine, not an image processing tool. PDF export strictly follows the rule of not modifying the input image, so if you have this need you must use other tools to create the PDF. Zdenko On Mon, 6 Apr 2020 at 23:51, Teo wrote: > I have this page; can I split this A3 scan into 2 A4 pages during the
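Since the split has to happen outside Tesseract, one option is to cut the scan into two halves before OCR. The following is only a minimal sketch, assuming a landscape A3 scan with the two A4 pages side by side. The helper name `split_a3_boxes` is hypothetical, and the boxes use the (left, upper, right, lower) convention that image libraries such as Pillow accept for cropping.

```python
def split_a3_boxes(width, height):
    """Return crop boxes (left, upper, right, lower) for the left and
    right halves of a landscape scan of the given pixel size."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Assumption: the fold between the two A4 pages is at the horizontal centre.
    middle = width // 2
    left_page = (0, 0, middle, height)
    right_page = (middle, 0, width, height)
    return left_page, right_page


if __name__ == "__main__":
    # Example size only: roughly an A3 landscape scan at 300 dpi.
    for box in split_a3_boxes(4960, 3508):
        print(box)
```

Each box could then be passed to an image tool of your choice to write two separate page images, which are fed to Tesseract one at a time.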

[tesseract-ocr] The text is not recognized from png

2020-04-07 Thread Lakshay Saini
Hi, 1. Deskew the image to get straight text lines. 2. Use Tesseract's PSM 6 mode; this mode helps you scan the PDF horizontally, which can be very useful in table extraction. The Tesseract engine can provide great results depending on the quality of the image provided to it. It cannot give you 100%
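As a rough illustration of step 2, here is a minimal sketch that calls the `tesseract` command line with `--psm 6` from Python. It assumes Tesseract 4 or later is installed and on the PATH, and that the image has already been deskewed; the function names and the file name `table.png` are hypothetical.

```python
import shutil
import subprocess


def build_psm6_command(image_path, psm=6):
    """Build the argument list for OCR with a given page segmentation mode.

    "stdout" as the output base makes tesseract print the text
    instead of writing a file.
    """
    return ["tesseract", str(image_path), "stdout", "--psm", str(psm)]


def run_ocr(image_path):
    """Run tesseract on an already deskewed image and return the text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_psm6_command(image_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; call run_ocr("table.png") to execute it.
    print(" ".join(build_psm6_command("table.png")))
```

PSM 6 tells Tesseract to treat the page as a single uniform block of text, which is why rows of a table tend to come out line by line.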

[tesseract-ocr] How to use eng+equ traineddata

2020-04-07 Thread Seda Yılmaz
I am developing an Android project for my graduation project. I want to recognize mathematical expressions and symbols such as 3x ÷ 7 = 11, x^2 – 4 = 0, the integral sign, etc. I tried equ.traineddata but it returns absurd results. What I want to do exactly is use the equ and eng traineddata together. I think
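For reference, Tesseract accepts several languages joined with `+` in its language option, so `eng+equ` loads both models. A minimal sketch of the command-line form, assuming both `eng.traineddata` and `equ.traineddata` are present in the tessdata directory; the function name and `formula.png` are hypothetical, and whether this improves the results for equations is not something the sketch can promise.

```python
def build_multilang_command(image_path, languages=("eng", "equ")):
    """Build a tesseract command that loads several traineddata files.

    Languages are joined with "+", e.g. ("eng", "equ") -> "eng+equ".
    """
    if not languages:
        raise ValueError("at least one language is required")
    return ["tesseract", str(image_path), "stdout", "-l", "+".join(languages)]


if __name__ == "__main__":
    print(" ".join(build_multilang_command("formula.png")))
```

The same `eng+equ` string is what you would pass as the language when initializing the engine through a wrapper, assuming the wrapper forwards it unchanged.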

[tesseract-ocr] Re: Python Multiprocessing - Tesseract slows down running OCR (Solved)

2020-04-07 Thread Michael Keenan
More of a resolution: it looks like the issue arose because I happened to be using pytesseract 0.3.1, and there was a bug fix in 0.3.2 for properly cleaning up temp files: https://github.com/madmaze/pytesseract/releases. So upgrading pytesseract is most likely the best course of action. On Tuesday, 7
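If you want to check whether an environment is affected, a small sketch like the following compares the installed pytesseract version against 0.3.2. The helper names are hypothetical, and the version parsing is deliberately simple (it only looks at the first three numeric components).

```python
import re
from importlib import metadata

# Release that, per the post above, fixed the temp file cleanup.
FIXED_IN = (0, 3, 2)


def parse_version(text):
    """Turn a version string such as '0.3.1' into a tuple like (0, 3, 1)."""
    numbers = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def needs_upgrade(installed):
    """Return True if the given version is older than the fixed release."""
    return parse_version(installed) < FIXED_IN


def installed_pytesseract_version():
    """Return the installed pytesseract version, or None if not installed."""
    try:
        return metadata.version("pytesseract")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_pytesseract_version()
    if version is None:
        print("pytesseract is not installed")
    elif needs_upgrade(version):
        print(f"pytesseract {version} is older than 0.3.2; consider upgrading")
    else:
        print(f"pytesseract {version} already includes the fix")
```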

[tesseract-ocr] Re: The text is not recognized from png

2020-04-07 Thread amrapalli karan
Thanks for the post, but when I try to use deskew in R, it throws an error during installation. I do have a workaround which gave somewhat similar results. The magick package has image_deskew, but that didn't seem to work. The output contains a '|' and 'CATHODEFULL', and I am not