Re: [tesseract-ocr] Re: Box file generator combines vertical lines across rows of text

2018-04-24 ShreeDevi Kumar
Please provide a sample TIFF for testing; a single page will do. On 25-Apr-2018 2:00 AM, "Cameron McSweeney" wrote: > Yes, and the box files that 4.0 made still had the same problem. The accuracy with 4.0 was much better, but it still needs some tweaking, so I figured I would be …

Re: [tesseract-ocr] Re: Box file generator combines vertical lines across rows of text

2018-04-24 Cameron McSweeney
Yes, and the box files that 4.0 made still had the same problem. The accuracy with 4.0 was much better, but it still needs some tweaking, so I figured I would be better off fixing the problem in 3.05.

Re: [tesseract-ocr] Re: Box file generator combines vertical lines across rows of text

2018-04-24 ShreeDevi Kumar
Have you tried the latest version, Tesseract 4.0.0beta? On Wed 25 Apr, 2018, 12:03 AM Cameron McSweeney wrote: > Tesseract seems to be much too willing to find vertical lines. For example, Ds will be divided so that the straight, left portion is separate from the …

[tesseract-ocr] Re: Box file generator combines vertical lines across rows of text

2018-04-24 Cameron McSweeney
Tesseract seems much too willing to find vertical lines. For example, a D gets divided so that its straight left stroke is boxed separately from its curved right portion. The font is fixed, so that kind of split shouldn't happen.
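If "fixed" here means a fixed-pitch (monospaced) font, one possible cleanup pass is to merge a suspiciously narrow box with its right-hand neighbour when the two together are no wider than a typical character. The Python sketch below is a hypothetical post-processing step over box-file entries (`symbol left bottom right top page`, origin at the bottom-left of the image), not a Tesseract option; the name `merge_split_glyphs` and the `narrow`/`slack` thresholds are illustrative assumptions that would need tuning on real pages.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median


@dataclass
class Box:
    """One box-file entry: symbol left bottom right top page."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int

    @property
    def width(self) -> int:
        return self.right - self.left


def merge_split_glyphs(boxes: list[Box], narrow: float = 0.6,
                       slack: float = 1.2) -> list[Box]:
    """Hypothetical cleanup for a fixed-pitch font.

    Assumes `boxes` hold one text line in left-to-right order. A box narrower
    than `narrow` * median width is merged with its right neighbour when the
    union is no wider than `slack` * median width.
    """
    if not boxes:
        return []
    pitch = median(b.width for b in boxes)
    result: list[Box] = []
    i = 0
    while i < len(boxes):
        cur = boxes[i]
        if i + 1 < len(boxes):
            nxt = boxes[i + 1]
            if cur.width < narrow * pitch and nxt.right - cur.left <= slack * pitch:
                # Placeholder label: the true character must be fixed by hand.
                result.append(Box(cur.symbol + nxt.symbol, cur.left,
                                  min(cur.bottom, nxt.bottom), nxt.right,
                                  max(cur.top, nxt.top), cur.page))
                i += 2  # both halves consumed
                continue
        result.append(cur)
        i += 1
    return result


# Example: the straight stroke of a "D" was boxed as "I", the curve as ")".
line = [
    Box("A", 0, 10, 20, 40, 0),
    Box("B", 22, 10, 42, 40, 0),
    Box("I", 44, 10, 48, 40, 0),
    Box(")", 49, 10, 62, 40, 0),
    Box("C", 64, 10, 84, 40, 0),
]
for box in merge_split_glyphs(line):
    print(box.symbol, box.left, box.bottom, box.right, box.top, box.page)
```

The merged box carries the two old symbols concatenated as a placeholder, so its label still has to be corrected by hand (here to `D`) before the box file is used for training.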

[tesseract-ocr] Re: Install Tesseract 4 on CentOS and Red Hat [SOLVED!]

2018-04-24 Александр Поздняков
Hi. I built an RPM package of tesseract-ocr for CentOS, Fedora, Scientific Linux and openSUSE. It still needs testing: https://build.opensuse.org/project/show/home:Alexander_Pozdnyakov On Monday, April 23, 2018, 21:22:40 UTC+3, Eugene Huang wrote: > Hello! Most people are …

Re: [tesseract-ocr] Install Tesseract 4 on CentOS and Red Hat [SOLVED!]

2018-04-24 ShreeDevi Kumar
I have never used equ.traineddata. Judging from feedback in the forum, I don't think it works very well. Perhaps equ has not been trained with LSTM training; I have no way of knowing. Only Ray Smith or other developers at Google can answer that. Only LSTM models exist in tessdata_best and tessdata_fast.

Re: [tesseract-ocr] Install Tesseract 4 on CentOS and Red Hat [SOLVED!]

2018-04-24 Eugene Huang
@Shree Thanks for the tip. Just two quick questions. 1) The page https://github.com/tesseract-ocr/tesseract/wiki/Data-Files says that the "osd" and "equ" traineddata files are compatible between Tesseract 3 and 4. In the GitHub tessdata_fast repo (https://github.com/tesseract-ocr/tessdata_fast), …

[tesseract-ocr] Box file generator combines vertical lines across rows of text

2018-04-24 Cameron McSweeney
I am working on character recognition at work so that I can copy information from tables in giant TIFF files and write a program that uses that information automatically. The tables are computer-generated, but the information is not available to me in any format other than TIFF. The …
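For the problem in the subject line, where one box swallows vertical strokes from two adjacent rows, a quick way to locate the damage in a generated box file is to flag boxes much taller than the typical character. The sketch below assumes the standard box-file line layout `symbol left bottom right top page`; `parse_box_file`, `find_tall_boxes` and the `factor` of 1.8 are hypothetical choices for illustration, not part of Tesseract.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median


@dataclass
class Box:
    """One box-file line: symbol left bottom right top page."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int

    @property
    def height(self) -> int:
        return self.top - self.bottom


def parse_box_file(text: str) -> list[Box]:
    """Parse box-file text, skipping blank or malformed lines."""
    boxes = []
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) != 6:
            continue
        symbol, *coords = parts
        left, bottom, right, top, page = map(int, coords)
        boxes.append(Box(symbol, left, bottom, right, top, page))
    return boxes


def find_tall_boxes(boxes: list[Box], factor: float = 1.8) -> list[Box]:
    """Return boxes taller than `factor` times the median box height.

    A box that joins vertical strokes from two rows is roughly twice the
    normal height, so it stands out against the median.
    """
    if not boxes:
        return []
    typical = median(b.height for b in boxes)
    return [b for b in boxes if b.height > factor * typical]


# Two text rows (y 100-130 and y 60-90); the "l" box spans both of them.
sample = """\
T 10 100 30 130 0
A 32 100 52 130 0
B 54 100 74 130 0
1 10 60 30 90 0
l 76 60 80 130 0
"""
for box in find_tall_boxes(parse_box_file(sample)):
    print("suspect:", box)
```

Boxes flagged this way can then be split or redrawn in a box editor before the box file is used for training.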