[tesseract-ocr] Re: tesseract ignores single/short characters -> any ideas?

2019-10-05 Thread test0r man
--Push-- Does anyone have an idea? Thanks for any help! On Sunday, 8 September 2019 at 12:23:28 UTC+2, test0r man wrote: > > Hi, > I use this command: > > tesseract input/image.jpg output/output --dpi 72 --oem 1 -l deu+eng > > to scan images like "1_input.jpg" and "2_input.jpg". The OCR result is

Re: [tesseract-ocr] Re: tesseract ignores single/short characters -> any ideas?

2019-10-05 Thread Ravi Annaswamy
I haven't tried these images, but my first guess: can you try again without the --dpi 72 option? > On Oct 5, 2019, at 4:04 AM, test0r man wrote: > > --Push-- > > Does anyone have an idea? > > Thanks for any help! > > > On Sunday, 8 September 2019 at 12:23:28 UTC+2, test0r

RE: [tesseract-ocr] Re: tesseract ignores single/short characters -> any ideas?

2019-10-05 Thread Adrian Owen
https://stackoverflow.com/questions/9480013/image-processing-to-improve-tesseract-ocr-accuracy Gimp is your friend. From: tesseract-ocr@googlegroups.com [mailto:tesseract-ocr@googlegroups.com] On Behalf Of Ravi Annaswamy Sent: 05 October 2019 11:08 To: tesseract-ocr@googlegroups.com Subject:

Re: [tesseract-ocr] Re: tesseract ignores single/short characters -> any ideas?

2019-10-05 Thread Zdenko Podobny
tesseract 2_input_cropped.png - --psm 6 --oem 0 6. 7. 8. 9. 10. Zdenko On Sat, 5 Oct 2019 at 10:04, test0r man wrote: > --Push-- > > Does anyone have an idea? > > Thanks for any help! > > > On Sunday, 8 September 2019 at 12:23:28 UTC+2, test0r man wrote: >> >> Hi, >> I use this command: >> >>
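
The command in the message above can also be driven from a script. The following is a minimal sketch, not part of the thread: the helper names `build_tesseract_cmd` and `run_ocr` are hypothetical. It assumes the `tesseract` binary is on the PATH and, for `--oem 0`, that the installed traineddata includes the legacy engine model. The `-` output base sends the recognized text to stdout, and `--psm 6` treats the image as a single uniform block of text.

```python
import shutil
import subprocess


def build_tesseract_cmd(image, outputbase="-", psm=6, oem=0, lang=None):
    """Return the tesseract argument list for one image (hypothetical helper)."""
    cmd = ["tesseract", image, outputbase, "--psm", str(psm), "--oem", str(oem)]
    if lang:
        # Language codes as in the thread, e.g. "deu+eng".
        cmd += ["-l", lang]
    return cmd


def run_ocr(image, **kwargs):
    """Run tesseract and return its stdout, or None if the binary is not installed."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_tesseract_cmd(image, **kwargs),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


# Show the argument list corresponding to the command quoted in the message.
print(build_tesseract_cmd("2_input_cropped.png"))
```

Calling `run_ocr("2_input_cropped.png")` would return whatever tesseract prints for that image; the file name is taken from the message and is not included here.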

Re: [tesseract-ocr] Re: Training Sinhala fonts using Tesseract 4.0 version

2019-10-05 Thread Isurianuradha96
It seems this bash script (legacy.sh) handles the mapping of non-Unicode fonts via a legacy mapping (as a legacy-to-Unicode converter). It also seems to be responsible for generating the box, tif and lstmf files. Am I right? So where should I place this script file in

Re: [tesseract-ocr] Re: tesseract ignores single/short characters -> any ideas?

2019-10-05 Thread test0r man
I've tried without the 72 dpi option. The result on the first image is a bit worse; on the second image there is no change. On Saturday, 5 October 2019 at 12:08:35 UTC+2, Ravi Annaswamy wrote: > > I didn’t try these images but my first guess: can you not provide dpi 72 > as option and try? > > Sent from my

Re: [tesseract-ocr] Re: tesseract ignores single/short characters -> any ideas?

2019-10-05 Thread test0r man
Thanks for your test. I added the border with ImageMagick to get a better result on the first image. With psm 6, tesseract detects all numbers correctly, but only on the second image. Have you tried the first image too? On Saturday, 5 October 2019 at 14:52:15 UTC+2, zdenop wrote: > > > tesseract
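
The message does not say how the border was added. One possible way, sketched here under stated assumptions, is ImageMagick's `-bordercolor` and `-border` options. The helper name `build_border_cmd`, the 20-pixel white border and the output file name are illustrative choices, not values from the thread. ImageMagick 7 installations typically use `magick` instead of `convert`.

```python
import shutil
import subprocess


def build_border_cmd(src, dst, border_px=20, color="white", binary="convert"):
    """Return an ImageMagick argument list that pads src with a solid border.

    border_px and color are assumed values; the thread does not state them.
    """
    geometry = f"{border_px}x{border_px}"
    return [binary, src, "-bordercolor", color, "-border", geometry, dst]


def add_border(src, dst, **kwargs):
    """Run ImageMagick if it is installed; return True on success, else False."""
    cmd = build_border_cmd(src, dst, **kwargs)
    if shutil.which(cmd[0]) is None:
        return False
    return subprocess.run(cmd, check=False).returncode == 0


# Show the argument list for the first image named in the thread.
print(build_border_cmd("1_input.jpg", "1_input_border.png"))
```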

Re: [tesseract-ocr] Re: Training Sinhala fonts using Tesseract 4.0 version

2019-10-05 Thread Shree Devi Kumar
If you use Linux, you can try something similar to the attached bash script. On Thu, Oct 3, 2019 at 2:55 PM Shree Devi Kumar wrote: > There is no direct method for training from non-unicode fonts. Tesseract's > output is also Unicode text only. > > You can work from scanned images of text in non-unicode

Re: [tesseract-ocr] Re: tesseract ignores single/short characters -> any ideas?

2019-10-05 Thread Zdenko Podobny
"end" is a typo ;-) It should read "eng" :-) On Sat, 5 Oct 2019 at 21:31, test0r man wrote: > Hi Zdenko, > > Very good job! I've tried so many image manipulations, but that was the > wrong approach for problems 1-3. The idea with the uzn file is great and I > think the perfect solution. Thanks :-)

Re: [tesseract-ocr] Re: tesseract ignores single/short characters -> any ideas?

2019-10-05 Thread test0r man
Thanks for the link. I will read it and try it. On Saturday, 5 October 2019 at 14:38:26 UTC+2, Testing Windows Screenshots wrote: > > > https://stackoverflow.com/questions/9480013/image-processing-to-improve-tesseract-ocr-accuracy > > > > Gimp is your friend. > > > > *From:*

Re: [tesseract-ocr] Need Help Learning Howto Train Tesseract OCR on Fraktur Fonts - MAC - VietOCR v5.5.2 and Tesseract 4.1.0

2019-10-05 Thread Helmut Wollmersdorfer
Hi Akos, it depends on which period of Fraktur you want to OCR. For material from before 1750 you cannot expect very good results. This one is from around 1770, set in Fraktur (similar to Breitkopffraktur), and the result is not so bad:

Re: [tesseract-ocr] Re: tesseract ignores single/short characters -> any ideas?

2019-10-05 Thread test0r man
Hi Zdenko, very good job! I've tried so many image manipulations, but that was the wrong approach for problems 1-3. The idea with the uzn file is great and I think the perfect solution. Thanks :-) I can confirm that scaling these images didn't help (more than 30 pixels per letter is the right
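
The uzn file itself is not shown in this listing, so the following is only a sketch of how such a zone file might be generated. It assumes the commonly described UNLV zone layout: one zone per line as `left top width height label`, saved next to the image with the same base name and a `.uzn` extension, and picked up by tesseract when a page segmentation mode without automatic layout analysis (for example `--psm 4`) is used. The helper name `write_uzn` and the zone coordinates are hypothetical.

```python
import tempfile
from pathlib import Path


def write_uzn(image_path, zones):
    """Write a .uzn zone file next to image_path and return its path.

    zones is an iterable of (left, top, width, height, label) tuples in pixels.
    The line layout is an assumption about the UNLV zone format.
    """
    uzn_path = Path(image_path).with_suffix(".uzn")
    lines = [
        f"{left} {top} {width} {height} {label}"
        for left, top, width, height, label in zones
    ]
    uzn_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return uzn_path


# Demo in a temporary directory so nothing is left behind on disk.
with tempfile.TemporaryDirectory() as tmp:
    # The coordinates are made up for illustration, not taken from the thread.
    path = write_uzn(Path(tmp) / "1_input.jpg", [(10, 20, 40, 30, "Text")])
    content = path.read_text(encoding="utf-8")
    print(path.name)
    print(content)
```

Restricting recognition to known zones sidesteps layout analysis, which is one plausible reason the approach helped with isolated short characters.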