You should *uninstall the old version fully* and then build the version
from git. It may be picking up some older libraries.
Also, this needs Leptonica 1.71. I'm not sure whether the documentation
mentions it.
ShreeDevi
Please see https://code.google.com/p/tesseract-ocr/issues/detail?id=1278
ShreeDevi
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com
On Fri, Jan 9, 2015 at 5:44 PM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
I am using the git version; output and messages are attached. The PDF seems
to have all the lines.
User@HP ~/tesseract-ocr/testing
$ tesseract 5.tif 5 pdf
Tesseract Open Source OCR Engine v3.04.00 with Leptonica
Page 1
OSD: Weak margin (5.78), horiz textlines, not CJK: Don't rotate.
Page 2
Too few
I don't think that's the intended behavior. What version of Tesseract are
you using? Could you post a sample image for testing?
ShreeDevi
On Thu, Jan 8, 2015 at 8:00 PM, C.
See
http://stackoverflow.com/questions/15067651/cannot-find-a-way-to-make-tessnet2-work
tessnet2 is a .NET wrapper for Tesseract 2.04.
Try a newer wrapper, for example from https://github.com/charlesw/tesseract
ShreeDevi
I think you need to deskew/dewarp the lines, increase the brightness, get
the images at 300dpi, and try again.
I tested your images with VietOCR (4.0 beta) and got the following
output:
--
East 133rd Street, cast from Cypress Ave. In the background is
the United Electric Light and
Which version of the source did you use?
The latest version is available from
https://code.google.com/p/tesseract-ocr/source/checkout
You need the pdf config files in the tessdata directory. See
https://code.google.com/p/tesseract-ocr/source/browse/tessdata
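A minimal sketch for checking that the pdf config file is where Tesseract will look for it. It rests on two assumptions that this thread does not confirm. First, in 3.04-era builds TESSDATA_PREFIX names the directory that *contains* tessdata/. Second, config files live in tessdata/configs/, following the tessdata/configs/bazaar path in the repository. The function name is hypothetical.

```python
import os
from pathlib import Path


def pdf_config_path(prefix=None):
    """Where a 3.04-era tesseract is assumed to look for the 'pdf' config.

    Assumptions (not confirmed in this thread): TESSDATA_PREFIX names the
    directory that *contains* tessdata/, and configs live in tessdata/configs/.
    """
    if prefix is None:
        prefix = os.environ.get("TESSDATA_PREFIX", "")
    return Path(prefix) / "tessdata" / "configs" / "pdf"


if __name__ == "__main__":
    path = pdf_config_path()
    status = "found" if path.is_file() else "MISSING"
    print(f"{path}: {status}")
```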
You also need to make sure that tessdata_prefix
Hi Chris,
I opened the PDFs in Adobe Reader and in Foxit Reader on Windows 7. The
page briefly flickers with large text but then displays normally; at 100%
zoom the output also looks normal.
Tesseract now has a 'pdf' option, so you don't need 'hocrpdf'. Try
the following:
Have you tried a version compiled from the latest source on git?
If you post a couple of sample images, I can give it a try and let you know
what results I get.
ShreeDevi
On Sun, Nov 23,
Hi,
Have you added the fonts to the font_properties file?
Try removing the 'narrow' font from your training set.
Test with just one or two similar fonts and see if the results are better.
ShreeDevi
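For reference, here is a minimal sketch of generating font_properties entries. Each line is assumed to have the form used in the TrainingTesseract3 instructions: fontname, then italic, bold, fixed, serif and fraktur as 0/1 flags. The font names and the helper names are hypothetical.

```python
def font_properties_line(name, italic=False, bold=False, fixed=False,
                         serif=False, fraktur=False):
    """One font_properties entry: <fontname> <italic> <bold> <fixed> <serif> <fraktur>.

    Each flag is written as 0 or 1. The font name is assumed to match the
    fontname part of the lang.fontname.expN training file names.
    """
    flags = (italic, bold, fixed, serif, fraktur)
    return " ".join([name] + [str(int(flag)) for flag in flags])


def write_font_properties(path, fonts):
    """fonts: hypothetical mapping of font name -> dict of flag keywords."""
    with open(path, "w", encoding="utf-8") as fh:
        for name, flags in fonts.items():
            fh.write(font_properties_line(name, **flags) + "\n")


if __name__ == "__main__":
    # Hypothetical font names, for illustration only.
    write_font_properties("font_properties", {
        "arial": {},
        "arialbold": {"bold": True},
        "timesitalic": {"italic": True, "serif": True},
    })
```

If you drop a font such as 'narrow' from the training set, remove its entry here as well so that the file lists exactly the fonts you train with.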
On Wed, Nov 19, 2014 at 7:47 PM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
Training 2 files
ShreeDevi
On Thu, Nov 20, 2014 at 9:15 AM, ShreeDevi Kumar shreesh...@gmail.com
I have not used Serak, but its issues page indicates problems with
RTL languages; see
https://code.google.com/p/serak-tesseract-trainer/issues/detail?id=6
Why are you not using jTessBoxEditor's trainer or the command-line programs?
I think the binaries are bundled with JTess...
here.
Question: am I giving the wrong file path for the Tesseract executable
and the training data, i.e. the ara box file? Or what else is going wrong?
Note: I have put no data in the words_list, frequent_words, and font_properties files.
On 20 November 2014 17:32, ShreeDevi Kumar shreesh...@gmail.com wrote:
I have
Take a look at the hOCR output
and the tsv option from https://code.google.com/r/email-hocr-tsv/
ShreeDevi
On Sat, Nov 15, 2014 at 3:39 PM, Simon Støvring simonstoevr...@gmail.com
wrote:
I
Amarjeet,
I'm glad that you are getting 70-80% correct OCR for Marathi using the
Konkani traineddata I posted.
The Hindi traineddata was trained by Google with the 'cube' method, but
that method is not available to us.
The training can be improved with better training text or a font similar to
the one being
Have you tried the existing English traineddata?
I get good recognition with your 'prepared-image'.
If that is the kind of image you need to OCR, you could run it with psm 6
and then split each letter separately.
ShreeDevi
Straighten (deskew) the image before sending it to Tesseract. You can use
Scan Tailor or unpaper.
ImageMagick may also have an option; you'll have to check.
See the attached images: output from Scan Tailor, then OCRed using VietOCR
(a GUI frontend to Tesseract).
MODEL NAME 7
MOORE RF28HMEDBSR
ml.“
| mt
.txt
.pdf
.hocr
pdf and hocr can be passed as config file options when using tesseract from
the command line,
and txt output is created automatically (in both cases, I think).
This is with the latest version of tesseract from git.
ShreeDevi
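The sketch below runs tesseract with config names passed as trailing arguments. This is the same form as "tesseract 5.tif 5 pdf" earlier in this thread. Afterwards it lists which outbase.txt / .pdf / .hocr files exist. Whether .txt is always written alongside .pdf or .hocr is exactly what the note above leaves open, so the code checks the disk instead of assuming. The function names are hypothetical, and tesseract must be on PATH.

```python
import subprocess
from pathlib import Path


def produced_outputs(outbase):
    """List which of outbase.txt / .pdf / .hocr exist on disk."""
    base = Path(outbase)
    return [ext for ext in (".txt", ".pdf", ".hocr")
            if base.with_name(base.name + ext).exists()]


def run_with_configs(image, outbase, configs):
    """Run: tesseract image outbase config1 config2 ...
    then report which output files appeared."""
    subprocess.run(["tesseract", image, outbase, *configs], check=True)
    return produced_outputs(outbase)


if __name__ == "__main__":
    # Same form as "tesseract 5.tif 5 pdf"; needs tesseract on PATH.
    print(run_with_configs("5.tif", "5", ["pdf"]))
```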
The asc traineddata does not have a wordlist or dictionary, so using eng as
well will help with that. Also, I trained it with only a few fonts that support
the whole range. If you train with the font you are using, you will get better
results.
You can use the 'combine_tessdata' command with the -u (unpack) option.
You need Leptonica 1.71 for the current version of tesseract.
liblept.so.4
ShreeDevi
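As a sketch of the combine_tessdata -u step, this assumes the command takes the traineddata file followed by an output prefix, and writes each component as prefix plus extension. For example, an output prefix of unpacked/asc. would give unpacked/asc.unicharset. The helper name and the paths are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def unpack_command(traineddata, outdir):
    """Build: combine_tessdata -u <lang>.traineddata <outdir>/<lang>.

    With the trailing "<lang>." prefix, components are expected to come out
    as e.g. <outdir>/<lang>.unicharset.
    """
    lang = Path(traineddata).name.split(".")[0]
    prefix = str(Path(outdir) / lang) + "."
    return ["combine_tessdata", "-u", str(traineddata), prefix]


if __name__ == "__main__":
    cmd = unpack_command("tessdata/asc.traineddata", "unpacked")
    Path("unpacked").mkdir(exist_ok=True)
    if shutil.which("combine_tessdata"):
        subprocess.run(cmd, check=True)
    else:
        print("combine_tessdata not on PATH:", " ".join(cmd))
```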
On Wed, Nov 12, 2014 at 5:05 PM, Patrick Vöhrs voe...@wesoma-consulting.com
wrote:
Hi at all,
Have you seen http://tess4j.sourceforge.net/? It is a Java JNA wrapper for
the Tesseract OCR API.
ShreeDevi
On Wed, Nov 12, 2014 at 6:18 PM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
You
];
On Wed, Nov 12, 2014 at 12:30 AM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
Are you able to pass a configuration variable with the iOS CocoaPod?
*-c configvar=value*
Set value for control parameter. Multiple -c arguments are allowed.
*configfile*
The name of a config to use. A config
, ShreeDevi Kumar shreesh...@gmail.com
wrote:
bazaar is just a config file that sets values for a set of config
variables; please see
https://code.google.com/p/tesseract-ocr/source/browse/tessdata/configs/bazaar
So, if patterns are helpful, you can use that as a config.
ShreeDevi
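A config file like bazaar and the "-c configvar=value" option described above carry the same kind of settings. This minimal sketch assumes a config file holds one "name value" pair per line. It writes the same settings either way. tessedit_char_whitelist is used only as an example variable. The output file name "myconfig" is hypothetical; you would copy it to wherever your tesseract looks for configs, such as tessdata/configs.

```python
def config_text(variables):
    """Serialize {name: value} as Tesseract config-file lines ("name value")."""
    return "".join(f"{name} {value}\n" for name, value in variables.items())


def dash_c_args(variables):
    """The same settings as repeated "-c name=value" command-line arguments."""
    args = []
    for name, value in variables.items():
        args += ["-c", f"{name}={value}"]
    return args


if __name__ == "__main__":
    settings = {"tessedit_char_whitelist": "0123456789"}  # example variable
    with open("myconfig", "w", encoding="utf-8") as fh:  # hypothetical name
        fh.write(config_text(settings))
    print(dash_c_args(settings))
```

If a wrapper such as the iOS CocoaPod cannot pass command-line arguments, the same name/value pairs would have to be set through whatever variable-setting call that wrapper exposes.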
You can look at the unicharset of the traineddata to see the coverage.
Try with eng+deu+iast.
iast is a traineddata that I generated for Sanskrit transliteration in
Roman/Latin script.
https://code.google.com/r/shreeshrii-langdata/source/browse/iast.unicharset?name=iast
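Checking coverage against a unicharset (for example, one unpacked with combine_tessdata -u) can be sketched as follows. The parser assumes a 3.x-style layout: line 1 is the entry count, every later line starts with the character followed by a space, and "NULL" stands for the space entry. It only reads that first column, and the check is per code point, so multi-code-point entries are not matched.

```python
def parse_unichars(lines):
    """Character column of a 3.x-style unicharset given as a list of lines.

    Assumptions: line 1 holds the entry count; every later line starts with
    the unichar followed by a space; the entry "NULL" stands for space.
    """
    chars = set()
    for line in lines[1:]:
        if not line.strip():
            continue
        ch = line.split(" ", 1)[0]
        chars.add(" " if ch == "NULL" else ch)
    return chars


def load_unichars(path):
    with open(path, encoding="utf-8") as fh:
        return parse_unichars(fh.read().splitlines())


def missing_chars(text, unichars):
    """Non-whitespace code points in text that are not unicharset entries.

    Per-code-point check only: entries made of several code points
    (ligatures, conjunct clusters) are not matched by this approximation.
    """
    return sorted({c for c in text if not c.isspace() and c not in unichars})
```

Usage, with a hypothetical path: missing_chars(sample_text, load_unichars("unpacked/iast.unicharset")).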
JTessBoxEditor has three tabs.
Use *Tiff/Box Generator* to generate TIFF and box files from a given text
file for the chosen font.
The box files created by the Tiff/Box Generator are based on the rendering of
the text in the chosen font and will be accurate; however, they may still
get errors 'blob
Please attach a copy of the image so that I can try.
ShreeDevi
On Tue, Nov 11, 2014 at 9:43 PM, misonis...@gmail.com wrote:
I was in PSM_SINGLE_LINE mode indeed, because my text is
Have you tested with the English traineddata from the git tessdata repo?
Please see
https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html
Try with these in
/path/to/eng.user-patterns:
1-\d\d\d-GOOG-411
www.\n\\\*.com
I haven't tried this personally, though.
ShreeDevi
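This small sketch writes the two example patterns above into a <lang>.user-patterns file, one pattern per line, and keeps them exactly as given. As the message says, this approach was not tested. Going by the bazaar discussion in this thread, the file presumably only takes effect when a config such as bazaar tells tesseract to load user patterns. The helper names and the target directory are hypothetical.

```python
from pathlib import Path

# The two example patterns from the message above, kept byte-for-byte.
PATTERNS = [
    r"1-\d\d\d-GOOG-411",
    r"www.\n\\\*.com",
]


def patterns_text(patterns=PATTERNS):
    """One pattern per line, newline-terminated."""
    return "".join(p + "\n" for p in patterns)


def write_user_patterns(directory, lang="eng", patterns=PATTERNS):
    """Write <directory>/<lang>.user-patterns and return its path."""
    path = Path(directory) / f"{lang}.user-patterns"
    path.write_text(patterns_text(patterns), encoding="utf-8")
    return path
```

Raw strings (r"...") keep the backslashes literal, which matters here because the pattern syntax itself uses backslash escapes.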
You don't need to train in order to extract text.
Have you tried the English traineddata, available from
https://code.google.com/p/tesseract-ocr/source/browse/?repo=tessdata
ShreeDevi
Also see https://groups.google.com/forum/#!topic/tesseract-ocr/et7bS5QRf2o
ShreeDevi
On Tue, Nov 11, 2014 at 11:02 PM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
Have you tested
You need to pre-process the image so that G shows up correctly. In the
attached image, G looks like a 6 because it is connected.
If that is the shape of G in the font and you need to OCR it, you may
need to either retrain or post-process the text.
You could also try a newer version of the program.
I checked with VietOCR beta4, which uses a newer version of Tesseract, and it
recognizes your TIFF correctly.
ShreeDevi
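If retraining is not an option, post-processing could look like the following hypothetical heuristic. It is not from the thread. It changes '6' to 'G' only in whitespace-separated tokens where '6' is the sole non-letter, so numbers and mixed codes such as model numbers are left alone. It will also miss tokens with attached punctuation.

```python
import re

TOKEN = re.compile(r"\S+")


def fix_g(text):
    """Replace '6' with 'G' inside tokens that are otherwise all letters."""
    def repl(match):
        tok = match.group(0)
        rest = tok.replace("6", "")
        # Only touch tokens where '6' is the only non-letter character.
        if "6" in tok and rest and rest.isalpha():
            return tok.replace("6", "G")
        return tok
    return TOKEN.sub(repl, text)
```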
On Wed, Nov 12, 2014 at 8:12 AM, ShreeDevi Kumar shreesh
, as the final version of
what I'm using will be using an iOS CocoaPod that does not support the
bazaar functionality of Tesseract.
On Tue, Nov 11, 2014 at 8:51 PM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
On Wed, Nov 12, 2014 at 2:13 AM, ste...@fortyau.com wrote:
The user-patterns looks
Look under the jtessboxeditor/samples/vie folder
and create similar files for your language.
ShreeDevi
On Mon, Nov 10, 2014 at 1:10 PM, iram akbar iramakb...@gmail.com wrote:
Quan,
i
What method are you using for training?
Which version of tesseract?
What platform?
Please see instructions on
https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
The following shell script will be useful if you are using the latest source
from git.
See
https://groups.google.com/forum/?utm_medium=emailutm_source=footer#!topic/tesseract-dev/8e0F2cK2YzU
for "Plans for 3.04 release".
For Training Instructions, please see
https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
Please see
https://code.google.com/p/tesseract-ocr/source/browse/?repo=langdata#git%2Fkat
Language codes:
ISO 639-1 (http://en.wikipedia.org/wiki/ISO_639-1): ka
ISO 639-2 (http://en.wikipedia.org/wiki/ISO_639-2): geo (B)
(http://www.sil.org/iso639-3/documentation.asp?id=geo), kat
https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
ShreeDevi
On Fri, Nov 7, 2014 at 4:26 PM, iram akbar iramakb...@gmail.com wrote:
Hi,
i want to make my own tessdata
Also see
https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0usp=sharing
for tutorial files that give an overview.
ShreeDevi
On Fri, Nov 7, 2014 at 5:04 PM, ShreeDevi Kumar shreesh
CC:ing Ray and the dev group.
That language data is part of the update done by Ray Smith on August 12.
Ray is planning an update to the language data and traineddata soon, so if you
have suggestions for improvement, please file an issue and provide more
details, samples of each script, etc.
ShreeDevi
Please also change the FONT under the TRAINER tab to Arabic.
ShreeDevi
On Thu, Nov 6, 2014 at 2:49 PM, iram akbar iramakb...@gmail.com wrote:
i have downloaded the lates version 1.1
You could also test with
gswin32c -q -dNOPAUSE -dBATCH -sDEVICE=tiffgray -sCompression=lzw -r300
ShreeDevi
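The Ghostscript line above stops before the output and input arguments. This sketch keeps its options and appends Ghostscript's -sOutputFile=... option followed by the input PDF. The file names are placeholders, not from the thread. On Linux the executable is usually gs rather than gswin32c.

```python
import shutil
import subprocess


def gs_to_tiff_command(pdf_path, tiff_path, dpi=300, exe="gswin32c"):
    """Options from the message above, plus placeholder output/input files."""
    return [exe, "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=tiffgray",
            "-sCompression=lzw", f"-r{dpi}", f"-sOutputFile={tiff_path}",
            pdf_path]


if __name__ == "__main__":
    exe = "gswin32c" if shutil.which("gswin32c") else "gs"
    cmd = gs_to_tiff_command("input.pdf", "output.tif", exe=exe)  # placeholders
    subprocess.run(cmd, check=True)
```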
On Thu, Nov 6, 2014 at 2:13 PM, Sébastien Cuendet
Click on the 'generate' box. With some Devanagari fonts I have found that the
text does not display, but the TIFF/box files are still generated. The same may
be true for the Arabic font you are using. Give it a try.
You can also try copying and pasting the text; sometimes that works.
ShreeDevi
I think you are using the wrong tools.
If you need to convert a JPG to TIFF, use an image tool such as
ImageMagick or IrfanView.
If you need to OCR the image, Tesseract accepts JPG as input as well as TIFF.
There already is Arabic traineddata for Tesseract; see
I had asked you to try VietOCR because its Java 4.0 beta uses a newer svn
version of Tesseract, and I find it easy to test under Windows with the GUI,
as I can change the image filter settings in it.
You will have to choose the tools based on your platform and other
requirements. You could use
Did you install the latest version from the following package?
http://packages.ubuntu.com/utopic/tesseract-ocr
If so, it should include the training tools.
Try
which text2image
to see if it is installed.
ShreeDevi
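The "which text2image" check can be extended to the other 3.x training programs with a short sketch like this one. shutil.which is Python's equivalent of the shell's which. The tool list is an assumption based on the TrainingTesseract3 workflow, not something stated in this message.

```python
import shutil

# Assumed set of 3.x training programs; adjust for your version.
TRAINING_TOOLS = ["text2image", "unicharset_extractor", "shapeclustering",
                  "mftraining", "cntraining", "wordlist2dawg",
                  "combine_tessdata"]


def missing_tools(tools=TRAINING_TOOLS, which=shutil.which):
    """Return the tools that are not found on PATH (like `which text2image`)."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("all training tools found" if not missing else f"missing: {missing}")
```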
On Wed, Nov 5, 2014 at 4:57 PM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
There already is language data for srp; please see
https://code.google.com/p/tesseract-ocr/source/browse/srp/?repo=langdata
and
https://code.google.com/p/tesseract-ocr/source/browse/srp.traineddata?repo=tessdata
Ray Smith, the lead developer of Tesseract at Google, is planning to
release
Thanks for clarifying and giving more details.
I am CC:ing this email to the Tesseract developers group and Ray for an answer
to your question about how to submit this file to Tesseract's repository.
Meanwhile, I suggest that you file an 'issue' and attach the traineddata.
Thanks!
ShreeDevi
On Tue, Nov 4, 2014 at 7:35 AM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
http://manpages.ubuntu.com/manpages/precise/man1/tesseract.1.html
*tesseract* *imagename* *outbase* [*-l* *lang*] [*-psm* *N*] [*configfile* ...]
ShreeDevi
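A minimal sketch of building an argument list that follows the synopsis above: tesseract imagename outbase [-l lang] [-psm N] [configfile ...]. The -psm spelling is the 3.x form shown in that man page. The file names in the usage line are hypothetical. The example combines psm 6 and the 'digits' config, both of which are mentioned elsewhere in this thread.

```python
import subprocess


def tesseract_args(imagename, outbase, lang=None, psm=None, configfiles=()):
    """argv for: tesseract imagename outbase [-l lang] [-psm N] [configfile ...]"""
    args = ["tesseract", imagename, outbase]
    if lang:
        args += ["-l", lang]
    if psm is not None:
        args += ["-psm", str(psm)]
    args += list(configfiles)
    return args


if __name__ == "__main__":
    # Hypothetical file names; requires tesseract on PATH.
    subprocess.run(tesseract_args("numbers.png", "out", lang="eng", psm=6,
                                  configfiles=["digits"]), check=True)
```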
On Sat, Nov 1, 2014 at
An updated version of the man page is at
https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html
ShreeDevi
On Sat, Nov 1, 2014 at 4:19 PM, ShreeDevi Kumar shreesh...@gmail.com
In VietOCR's image menu, check 'screenshot mode'.
Use the filters submenu to experiment with other settings to improve your
image.
Look under properties for the DPI, and convert your input images to 300 DPI,
as they are currently low resolution (72 DPI or so).
Experiment :-)
ShreeDevi
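The 72 to 300 DPI conversion is simple arithmetic. The scale factor is target_dpi / current_dpi, applied to both pixel dimensions. This sketch only computes the numbers (the function name is hypothetical). The actual resampling would be done in an image editor or VietOCR. Changing only the DPI field in the file, without resampling, does not add any pixels.

```python
def rescale_for_dpi(width, height, current_dpi, target_dpi=300):
    """Scale factor and new pixel size to go from current_dpi to target_dpi."""
    factor = target_dpi / current_dpi
    return factor, round(width * factor), round(height * factor)


if __name__ == "__main__":
    factor, w, h = rescale_for_dpi(800, 600, 72)
    print(f"scale by {factor:.2f}x to {w}x{h} pixels")
```

For a 72 DPI screenshot the factor is about 4.17, which is why "resize 300%" is a rough lower bound for such images.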
Change the image to 300 DPI.
Try VietOCR in screenshot mode.
Try with the Vietnamese traineddata.
With command-line tesseract, pass the 'digits' config file as a parameter.
Recognizing only numbers is answered in the Tesseract FAQ:
http://code.google.com/p/tesseract-ocr/wiki/FAQ
Do look at https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality
for pre-processing steps that improve recognition of your images, whichever
OCR engine you use.
ShreeDevi
On Wed,
Please choose German in the language dropdown on the right-hand side.
ShreeDevi
On Wed, Oct 29, 2014 at 9:08 PM, boris borisri...@gmail.com wrote:
Hi Shree,
many thanks for your
I was going to suggest the tips from
https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality
but just OCRing the image without any changes in VietOCR (a GUI frontend for
Tesseract) with the German traineddata gives a perfect result; see the image.
What version are you using, and on what platform?
I
Try a .NET wrapper with a newer version of Tesseract.
Invert the image, smooth/blur it, and make it greyscale. I tried with VietOCR;
the output is 'QBCDEFGHIJKL'.
ShreeDevi
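The three operations mentioned (greyscale, invert, smooth/blur) are shown below on plain nested lists of pixel values, so the sketch needs no image library. It is not how VietOCR implements its filters. The BT.601 luma weights and the 3x3 mean filter are common choices, assumed here for illustration. In practice you would apply the same steps with an image tool.

```python
def to_grey(rgb_rows):
    """RGB tuples -> 0..255 grey using BT.601 luma weights (an assumed choice)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_rows]


def invert(grey_rows):
    """Light text on dark background becomes dark text on light background."""
    return [[255 - v for v in row] for row in grey_rows]


def box_blur(grey_rows):
    """3x3 mean filter; edge pixels average only the neighbours that exist."""
    h, w = len(grey_rows), len(grey_rows[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [grey_rows[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(round(sum(vals) / len(vals)))
        out.append(row)
    return out
```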
On Thu, Oct 23, 2014 at
On Thu, Oct 23, 2014 at 12:24 PM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality
Try with the image at 300 DPI or higher, e.g. resize it to 300%.
ShreeDevi
On Fri, Oct 17, 2014 at 8:35 PM, Rick Leir rich...@c7a.ca
You have to experiment.
I got better results after some image processing and VietOCR:
that it has bcln dooi
transfer of a portzon
which has been leased
an. M- nan-ant.‘ 0n Mu
[image: Inline image 1]
ShreeDevi
Marathi traineddata should be in the next release, since there is now
langdata for it in the repo.
Meanwhile, you can try the traineddata file from
https://code.google.com/r/shreeshrii-tessdata/source/browse?name=knn which
is a start for Konkani.
ShreeDevi