Hi Zdenko,
thanks for your response.
I am only a beginner with Tesseract, so can you tell me how I can
check it? (I use a Linux version of Tesseract...)
Thanks,
Salvo.
On Thursday, October 16, 2014 9:46:31 PM UTC+2, zdenop wrote:
fl is recognized as a ligature in English, so there
You probably got the source for a different version of Tesseract. This
might not matter, depending on what you are doing. Find out the version by
running it: you will see 'Tesseract Open Source OCR Engine v3.04.00 with
Leptonica' or similar.
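If you would rather check the version from a script, here is a minimal Python sketch. The function names are made up for this example, and it assumes the `tesseract` executable is on your PATH and accepts `--version`. Since the banner may go to stdout or stderr depending on the build, both streams are captured.

```python
import re
import shutil
import subprocess


def parse_tesseract_version(banner):
    """Return the first dotted version number found in a banner, or None."""
    match = re.search(r"v?(\d+\.\d+(?:\.\d+)?)", banner)
    return match.group(1) if match else None


def installed_tesseract_version():
    """Run the tesseract executable, if present, and parse its version."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None  # tesseract is not installed or not on PATH
    # Merge stderr into stdout so the banner is found on either stream.
    result = subprocess.run(
        [exe, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return parse_tesseract_version(result.stdout)


if __name__ == "__main__":
    print(installed_tesseract_version())
```

Compare the printed version with the version of the source tree you downloaded.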
How to train:
On Linux, try YAGF; it is a GUI front end for Tesseract. As zdenop said,
you have a Unicode problem. You need to use UTF-8 for strings.
On Friday, October 17, 2014 6:07:26 AM UTC-4, Salvo Piazza wrote:
Hi Zdenko,
thanks for your response.
I know tesseract at very beginning level, so can you
OCR a test image with your app and store the result in a text file. Then OCR
the same image with the tesseract executable (output goes to a text file by
default) and compare the results.
If the output from the tesseract executable is OK, but the output from your
app is wrong (e.g. there are only ASCII letters) = you have a problem
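The comparison described above can be sketched in Python roughly like this. The function names are hypothetical, and it assumes both outputs have already been read as UTF-8 text. It reports which non-ASCII characters appear in the tesseract executable's output but are missing from the app's output, plus a line diff.

```python
import difflib


def non_ascii_chars(text):
    """Return the set of characters outside the ASCII range."""
    return {ch for ch in text if ord(ch) > 127}


def compare_ocr_outputs(app_text, cli_text):
    """Compare app output against the tesseract executable's output."""
    lost = non_ascii_chars(cli_text) - non_ascii_chars(app_text)
    diff = list(
        difflib.unified_diff(
            cli_text.splitlines(),
            app_text.splitlines(),
            fromfile="tesseract",
            tofile="app",
            lineterm="",
        )
    )
    return {
        "identical": app_text == cli_text,
        "lost_non_ascii": sorted(lost),  # characters the app dropped
        "diff": diff,
    }
```

Read each file with `open(path, encoding="utf-8")` before calling it. A non-empty `lost_non_ascii` list suggests the app is mangling the encoding somewhere between Tesseract and the output file.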
I have been getting great results from Tesseract when the images are clear.
However, many of my images are crummy.
How would you get the best results for this? Maybe improved training, maybe
image pre-processing?
The original is like this:
If you like Perl you can parse values from the hOCR. You will need to
change this to suit:
sub saveStats {
    my ( $outHcr, $outStats ) = @_;
    open( STFILE, $outStats ) or die "Cannot open $outStats: $!";
    # get just the x_wconf values from the hocr file:
    # write to a stats file with a wconf per line
    my $confsum
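If Perl is not your thing, the same idea can be sketched in Python. This is not a port of the routine above (which is cut off here), just an illustration with made-up function names. It assumes the hOCR stores word confidence as `x_wconf NN` inside each word's `title` attribute.

```python
import re

# Matches e.g. "x_wconf 90" inside a title='bbox ...; x_wconf 90' attribute.
WCONF_RE = re.compile(r"x_wconf\s+(-?\d+(?:\.\d+)?)")


def word_confidences(hocr_text):
    """Return every x_wconf value in the hOCR text, in document order."""
    return [float(value) for value in WCONF_RE.findall(hocr_text)]


def mean_confidence(confs):
    """Average confidence, or None when no words were found."""
    return sum(confs) / len(confs) if confs else None


def save_stats(hocr_text, stats_path):
    """Write one confidence per line and return the mean confidence."""
    confs = word_confidences(hocr_text)
    with open(stats_path, "w", encoding="utf-8") as stats_file:
        for conf in confs:
            stats_file.write(f"{conf:g}\n")
    return mean_confidence(confs)
```

A low mean, or a long tail of low per-word values, is a quick way to spot pages that need better pre-processing.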
https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality
Try with an image at 300 dpi or higher. Resize it to 300%.
ShreeDevi
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
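As a rough Python sketch of the resizing advice: the helpers below work out the scale factor needed to reach a target dpi and the resulting pixel size. The names are hypothetical, and the last function assumes the third-party Pillow library is installed. The Lanczos filter is only one reasonable choice for upscaling text, not something recommended in this thread.

```python
def scale_factor(current_dpi, target_dpi=300):
    """Factor by which to enlarge an image to reach the target dpi."""
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    return target_dpi / current_dpi


def scaled_size(width, height, factor):
    """Pixel dimensions after scaling by the given factor."""
    return (round(width * factor), round(height * factor))


def upscale_image(src_path, dst_path, factor):
    """Resize an image file; requires Pillow (an assumption of this sketch)."""
    from PIL import Image  # imported here so the helpers above work without it

    with Image.open(src_path) as img:
        new_size = scaled_size(img.width, img.height, factor)
        img.resize(new_size, Image.LANCZOS).save(dst_path)
```

"Resize 300%" corresponds to `factor=3.0`. For a scan whose resolution you can estimate, `scale_factor(estimated_dpi)` gives the factor needed to reach 300 dpi.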
On Fri, Oct 17, 2014 at 8:35 PM, Rick Leir rich...@c7a.ca
Thanks, ShreeDevi
I opened the jpg in Gimp, and you can see that it is about 100 pixels per
text line:
https://lh5.googleusercontent.com/-jAAkrAFL_wE/VEE3pA5LMbI/ADs/1kExQh_pdiA/s1600/gimpOriginal.png
On Friday, October 17, 2014 11:23:37 AM UTC-4, shree wrote:
You have to experiment.
I got better results after some image processing and VietOCR:
that it has bcln dooi
transfer of a portzon
which has been leased
an. M- nan-ant.‘ 0n Mu
[image: Inline image 1]
ShreeDevi
On Fri, 17 Oct 2014, Rick Leir wrote:
I opened the jpg in Gimp, and you can see that it is about
100 pixels per text line:
[gimpOriginal.png]
That image looks to be scanned at about 150 dpi. With
such faint characters, scanning at 300 or 600 dpi would
have been better. Anyway, try scaling