Re: [tesseract-ocr] Tesseract 4.0 Korean detecting Chinese

2018-04-11 Thread Fanatico
After some research on Korean, I found that Koreans do use Chinese characters (Hanja) in their writing, so it is correct to set Chinese as a sublanguage. The problem is that kor.training_text doesn't contain any Chinese characters, so the code trains only Korean and ignores the Chinese. So if I tesseract
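To confirm whether a training text actually contains Hanja before training, it helps to count Hangul syllables against Chinese ideographs. The following is a minimal C++ sketch, not part of Tesseract's training tools. The function names are hypothetical. It assumes the file is UTF-8 and that Hanja fall in the CJK Unified Ideographs (U+4E00 to U+9FFF), Extension A (U+3400 to U+4DBF) and CJK Compatibility Ideographs (U+F900 to U+FAFF) blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode the UTF-8 sequence starting at text[i] and advance i past it.
// Malformed or truncated sequences yield U+FFFD and skip a single byte.
char32_t next_code_point(const std::string& text, std::size_t& i) {
    assert(i < text.size());
    const unsigned char b0 = static_cast<unsigned char>(text[i]);
    const std::size_t len = b0 < 0x80 ? 1
                          : (b0 >> 5) == 0x06 ? 2
                          : (b0 >> 4) == 0x0E ? 3
                          : (b0 >> 3) == 0x1E ? 4 : 0;
    if (len == 0 || i + len > text.size()) { ++i; return 0xFFFD; }
    // Keep only the payload bits of the leading byte.
    char32_t cp = len == 1 ? b0 : (b0 & (0x7F >> len));
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = static_cast<unsigned char>(text[i + k]);
        if ((b & 0xC0) != 0x80) { ++i; return 0xFFFD; }  // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    i += len;
    return cp;
}

// Precomposed Hangul syllables only (Jamo are not counted).
bool is_hangul_syllable(char32_t cp) { return cp >= 0xAC00 && cp <= 0xD7A3; }

// Assumed Hanja ranges; see the lead-in.
bool is_hanja(char32_t cp) {
    return (cp >= 0x4E00 && cp <= 0x9FFF)    // CJK Unified Ideographs
        || (cp >= 0x3400 && cp <= 0x4DBF)    // Extension A
        || (cp >= 0xF900 && cp <= 0xFAFF);   // CJK Compatibility Ideographs
}

struct ScriptCounts {
    std::size_t hangul = 0;
    std::size_t hanja = 0;
};

// Count Hangul syllables and Hanja in a UTF-8 string.
ScriptCounts count_scripts(const std::string& utf8) {
    ScriptCounts counts;
    for (std::size_t i = 0; i < utf8.size();) {
        const char32_t cp = next_code_point(utf8, i);
        if (is_hangul_syllable(cp)) ++counts.hangul;
        else if (is_hanja(cp)) ++counts.hanja;
    }
    return counts;
}
```

To use it, read kor.training_text into a std::string (for example with std::ifstream and std::istreambuf_iterator) and pass it to count_scripts. A Hanja count of zero would confirm that the training text gives the trainer no Chinese characters to learn.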

[tesseract-ocr] Re: How to train for multiple languages?

2018-04-11 Thread Fanatico
Thanks, I was going to do this; I just wanted to be sure there wasn't a way to train two traineddata files like the current one.

Re: [tesseract-ocr] Re: Error opening traineddata files on Mac High Sierra

2018-04-11 Thread Firlefanz
Thank you again. I think I'll stay with plain text; PDF looks too difficult to achieve. Now, the next problem: everything worked fine with my 1-page test PDF. I then tried to do the same with a 30 MB, 500-page PDF. After running convert -density 300 test.pdf -depth 8 -strip -background white
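The archived snippet cuts off before the actual problem is described. If the problem is the usual one with large inputs (ImageMagick rasterising all 500 pages in a single process and running short of memory or time), one workaround is to convert one page at a time using ImageMagick's "file.pdf[N]" page selector (0-based). The sketch below only builds the command strings. It reuses the flags visible in the quoted command. The rest of that command is truncated in the archive, so any remaining options would need to be appended. The function name and the page-NNNN.tif naming are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Build one ImageMagick `convert` command per page, so a large PDF is
// rasterised page by page instead of all at once.
// "file.pdf[N]" selects page N; it is single-quoted so the shell does not
// treat the brackets as a glob pattern.
// The output naming scheme "page-NNNN.tif" is a hypothetical choice.
std::vector<std::string> per_page_convert_commands(const std::string& pdf,
                                                   int page_count) {
    assert(page_count >= 0);
    std::vector<std::string> commands;
    commands.reserve(static_cast<std::size_t>(page_count));
    for (int page = 0; page < page_count; ++page) {
        char out_name[32];
        std::snprintf(out_name, sizeof out_name, "page-%04d.tif", page);
        // -density must precede the input so it applies when the PDF is read.
        commands.push_back("convert -density 300 '" + pdf + "[" +
                           std::to_string(page) + "]' -depth 8 -strip "
                           "-background white " + out_name);
    }
    return commands;
}
```

The page count has to come from elsewhere (for example the "Pages:" line printed by pdfinfo). Each command can then be run in turn from a shell script or via std::system.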

Re: [tesseract-ocr] Column splitting failed around fuzzy line

2018-04-11 Thread ShreeDevi Kumar
Try looking at the Leptonica sample programs for column splitting to see whether you can preprocess the image better before passing it to Tesseract. On Wed 11 Apr, 2018, 11:46 AM Ewan Mellor wrote: > Hi, > I am using Tesseract 4 (git 10f4998a) to process a file with two
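As an illustration of the kind of preprocessing meant here, the sketch below removes long vertical rules from a binarised page before it is handed to Tesseract. It is plain C++, not Leptonica's API. It assumes a bitmap stored as rows of booleans (true = ink) with equal row lengths, and the threshold is a value to tune. A pixel column crossing ordinary text is mostly background, while a separator line is dark over most of the page height, so columns whose ink fraction reaches the threshold are cleared.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A binarised page: rows of pixels, true = dark (ink).
// All rows are assumed to have the same width.
using Bitmap = std::vector<std::vector<bool>>;

// Clear every pixel column whose ink fraction is at least `min_fraction`.
// Returns the number of pixel columns that were cleared.
std::size_t remove_vertical_rules(Bitmap& page, double min_fraction) {
    assert(min_fraction > 0.0 && min_fraction <= 1.0);
    if (page.empty()) return 0;
    const std::size_t height = page.size();
    const std::size_t width = page.front().size();
    std::size_t cleared = 0;
    for (std::size_t x = 0; x < width; ++x) {
        // Vertical projection: count dark pixels in column x.
        std::size_t ink = 0;
        for (const auto& row : page) ink += row[x] ? 1 : 0;
        if (static_cast<double>(ink) / static_cast<double>(height) >= min_fraction) {
            for (auto& row : page) row[x] = false;  // whiten the rule
            ++cleared;
        }
    }
    return cleared;
}
```

A fuzzy or broken line has gaps, so a threshold below 1.0 is needed. A slanted line would not line up with a single pixel column, which is where Leptonica's deskew and morphology routines are the more robust choice.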

Re: [tesseract-ocr] Re: Error opening traineddata files on Mac High Sierra

2018-04-11 Thread ShreeDevi Kumar
https://github.com/tesseract-ocr/tesseract/issues/660 (regarding PDF) On Wed 11 Apr, 2018, 1:28 PM ShreeDevi Kumar wrote: > 1. Check the output tif and adjust convert command if needed > 2. Depending on your tesseract version you could try -l frk also. > 3. Yes, you

Re: [tesseract-ocr] Re: Error opening traineddata files on Mac High Sierra

2018-04-11 Thread ShreeDevi Kumar
1. Check the output TIFF and adjust the convert command if needed.
2. Depending on your Tesseract version, you could also try -l frk.
3. Yes, you can get a PDF as output. Search the GitHub issues; there is a long discussion thread about the best ways to create PDF output. Look for pdf and invisible
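Point 3 can be sketched as a Tesseract command line. The trailing "pdf" config file tells the tesseract CLI to write a searchable PDF (the output base name plus ".pdf"), and several language codes can be joined with "+". The helper below is a hypothetical convenience that only builds the string. The file names in the example are placeholders, and whether frk is available depends on the installed traineddata.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build a Tesseract command line that OCRs one page image and writes a
// searchable PDF. The final "pdf" argument selects Tesseract's PDF output
// config; Tesseract appends ".pdf" to output_base itself.
std::string tesseract_pdf_command(const std::string& image,
                                  const std::string& output_base,
                                  const std::vector<std::string>& langs) {
    assert(!langs.empty());
    std::string lang_arg;
    for (const auto& lang : langs) {
        if (!lang_arg.empty()) lang_arg += '+';  // multiple languages: "a+b"
        lang_arg += lang;
    }
    return "tesseract " + image + " " + output_base + " -l " + lang_arg + " pdf";
}
```

For example, tesseract_pdf_command("page-0001.tif", "page-0001", {"frk"}) yields "tesseract page-0001.tif page-0001 -l frk pdf", which would write page-0001.pdf.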

[tesseract-ocr] How to include Tesseract 4.00 in my Visual Studio C++ project?

2018-04-11 Thread abdelsalam . h . a . a
I have been using Tesseract 3.04, and I could use it just by adding the include file to my project, but when I downloaded the new version, Tesseract 4.00, there was no include file. Can anyone please help me with this? Thank you.

[tesseract-ocr] Column splitting failed around fuzzy line

2018-04-11 Thread Ewan Mellor
Hi, I am using Tesseract 4 (git 10f4998a) to process a file with two columns. A snippet of the image is shown below. The problem is that there is a fuzzy line between the two columns, and the column detector has got confused. I've ended up with one block covering the first column up to