Marathi traineddata should be in the next release, since there is langdata
for it now in the repo.
You can try the traineddata file from
https://code.google.com/r/shreeshrii-tessdata/source/browse?name=knn which
is a starting point for Konkani.
ShreeDevi
https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality
Try with the image at 300 dpi or higher; resize it to 300%.
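A minimal sketch of the resize step, assuming ImageMagick's `convert` is installed. The function name and file names are hypothetical and not from the original message.

```shell
# Sketch: upscale a low-resolution image before OCR.
# Assumes ImageMagick's `convert` is on PATH.
# $1 = input image, $2 = output image (both placeholders).
upscale_for_ocr() {
    # -resize 300% triples the pixel dimensions;
    # -units/-density record 300 dpi in the output metadata.
    convert "$1" -resize 300% -units PixelsPerInch -density 300 "$2"
}
```

Usage would look like `upscale_for_ocr scan.png scan-300.png`, after which the larger image is passed to tesseract.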
ShreeDevi
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
On Fri, Oct 17, 2014 at 8:35 PM, Rick Leir rich...@c7a.ca
You have to experiment.
I got better results after some image processing and VietOCR.
that it has bcln dooi
transfer of a portzon
which has been leased
an. M- nan-ant.‘ 0n Mu
ShreeDevi
Try a .NET wrapper with a newer version of Tesseract.
Invert the image, smooth/blur it, make it greyscale ... I tried with VietOCR;
the output is 'QBCDEFGHIJKL'.
ShreeDevi
On Thu, Oct 23, 2014 at
On Thu, Oct 23, 2014 at 12:24 PM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
Try .net wrapper with newer version of tesseract.
invert the image, smoothen/blur, make greyscale ... I tried
I was going to suggest the tips from
https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality
but, just OCRing the image without any changes in VietOCR (GUI frontend for
tesseract) with German traineddata gives perfect result - see image.
What version are you using, and on what platform?
I
Please choose German in the language dropdown on the right-hand side.
ShreeDevi
On Wed, Oct 29, 2014 at 9:08 PM, boris borisri...@gmail.com wrote:
Hi Shree,
many thanks for your
Do look at https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality
for pre-processing steps for your images to improve recognition regardless
of the OCR you use.
ShreeDevi
On Wed,
In VietOCR's image menu, check 'screenshot mode'
Use the filters submenu to experiment with other settings to improve your
image.
Look under Properties for the dpi; convert your input images to 300 dpi, as
they are currently low resolution (72 dpi or so).
experiment :-)
ShreeDevi
change image to 300 dpi
try vietocr - in screenshot mode -
try with the vietnamese traineddata
With command-line tesseract, pass the 'digits' config file as a parameter.
Recognizing only numbers is answered in the Tesseract FAQ:
http://code.google.com/p/tesseract-ocr/wiki/FAQ
http://manpages.ubuntu.com/manpages/precise/man1/tesseract.1.html
tesseract imagename outbase [-l lang] [-psm N] [configfile ...]
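As a sketch of the digits-only suggestion, the `digits` config name is passed as the trailing configfile argument of the synopsis. The function name and file names here are hypothetical, and the `digits` config is assumed to be present in the tessdata/configs directory.

```shell
# Sketch: restrict recognition to digits by passing the stock `digits`
# config file as the last argument.
# $1 = input image, $2 = output base name (tesseract appends the extension).
ocr_digits() {
    tesseract "$1" "$2" digits
}
```

For example, `ocr_digits meter.png meter` would be expected to write the recognized digits to `meter.txt`.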
ShreeDevi
On Sat, Nov 1, 2014 at
Updated version of man page is at
https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html
ShreeDevi
On Sat, Nov 1, 2014 at 4:19 PM, ShreeDevi Kumar shreesh...@gmail.com
There already is language data for srp - please see
https://code.google.com/p/tesseract-ocr/source/browse/srp/?repo=langdata
and
https://code.google.com/p/tesseract-ocr/source/browse/srp.traineddata?repo=tessdata
Ray Smith, the lead developer of tesseract at Google is planning to
release
Thanks for clarifying and giving more details.
I am cc:ing this email to the tesseract developers group and Ray for an answer
to your question of how to submit this file to Tesseract's repository.
Meanwhile, I suggest that you add an 'issue' and attach the traineddata.
Thanks!
ShreeDevi
On Tue, Nov 4, 2014 at 7:35 AM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
Thanks for clarifying and giving more details.
I am cc:ing this email to the tesseract developers group and Ray
I had asked you to try VietOCR because its Java 4.0 beta uses a newer svn
version of Tesseract, and I find it easy to test under Windows with the GUI,
as I can change the image filter settings in it.
You will have to choose the tools based on your platform and other
requirements. You could use
Did you install the latest version from
http://packages.ubuntu.com/utopic/tesseract-ocr ?
If so, it should include the training tools.
Try
which text2image
to see if it is installed.
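The same check can be extended to several tools at once. This is a small sketch; the function name is hypothetical, and the tool names in the usage line are the standard Tesseract training programs rather than a list taken from the original message.

```shell
# Sketch: report which of the named programs are on PATH.
# Returns 0 if all are found, 1 if any is missing.
check_training_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call might be `check_training_tools text2image unicharset_extractor mftraining cntraining combine_tessdata`.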
ShreeDevi
On Wed, Nov 5, 2014 at 4:57 PM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
I had asked to try vietocr because it is using a newer svn version for the
java 4.0beta and I find it easy to test under windows with the gui, as I
can
Please also change the FONT under the TRAINER tab to Arabic.
ShreeDevi
On Thu, Nov 6, 2014 at 2:49 PM, iram akbar iramakb...@gmail.com wrote:
I have downloaded the latest version 1.1
You could also test with
gswin32c -q -dNOPAUSE -dBATCH -sDEVICE=tiffgray -sCompression=lzw -r300
ShreeDevi
On Thu, Nov 6, 2014 at 2:13 PM, Sébastien Cuendet
Click on the 'generate' box - with some Devanagari fonts I have found that
text does not display but the tiff/box files are generated. The same may be
true for the Arabic font you are using. Give it a try.
You can also try to copy and paste the text, sometimes that works.
ShreeDevi
I think you are using the wrong tools ...
If you need to convert a jpg to tif, use an image editor such as
ImageMagick or IrfanView.
If you need to OCR the image, tesseract accepts jpg as input as well as tif.
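The two cases can be sketched as follows, assuming ImageMagick's `convert` and the `tesseract` command are installed. The function names and file names are hypothetical.

```shell
# Sketch: the two separate tasks described above.

# Convert photo.jpg to photo.tif with ImageMagick (format is inferred
# from the output file extension).
jpg_to_tif() {
    convert "$1" "${1%.jpg}.tif"
}

# OCR a jpg directly; no conversion to tif is needed.
# The output base name is the input name without its .jpg extension.
ocr_jpg() {
    tesseract "$1" "${1%.jpg}"
}
```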
There already is arabic traineddata for tesseract - see
Please see
https://code.google.com/p/tesseract-ocr/source/browse/?repo=langdata#git%2Fkat
Language codes:
ISO 639-1 (http://en.wikipedia.org/wiki/ISO_639-1): ka
ISO 639-2 (http://en.wikipedia.org/wiki/ISO_639-2): geo
(http://www.sil.org/iso639-3/documentation.asp?id=geo) (B), kat
https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
ShreeDevi
On Fri, Nov 7, 2014 at 4:26 PM, iram akbar iramakb...@gmail.com wrote:
Hi,
i want to make my own tessdata
Also see
https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&usp=sharing
tutorial files for overview
ShreeDevi
On Fri, Nov 7, 2014 at 5:04 PM, ShreeDevi Kumar shreesh
CC:ing Ray and Dev group
That language data is part of the update done by Ray Smith on August 12.
Ray is planning an update to language data and traineddata soon, so if you
have suggestions for improvement, please file an issue and provide more
details, samples of each script, etc.
ShreeDevi
See
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/tesseract-dev/8e0F2cK2YzU
for
Plans for 3.04 release
For Training Instructions, please see
https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
Look under jtessboxeditor/samples/vie folder
and create similar files for your language
ShreeDevi
On Mon, Nov 10, 2014 at 1:10 PM, iram akbar iramakb...@gmail.com wrote:
Quan,
i
What method are you using for training?
Which version of tesseract?
What platform?
Please see instructions on
https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
The following shell script will be useful, if using the latest source from
git.
JTessBoxEditor has three tabs
Use *Tiff/Box Generator* to generate tiff and box files from a given text
file for the chosen font
The Box files created by Box/Tiff Generator are based on the rendering of
the text in the chosen font and will be accurate - however they may still
get errors 'blob
Please attach a copy of the image so that I can try.
ShreeDevi
On Tue, Nov 11, 2014 at 9:43 PM, misonis...@gmail.com wrote:
I was in PSM_SINGLE_LINE mode indeed, because my text is
Have you tested with the English traineddata from the git tessdata repo?
Please see
https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html
try with these,
/path/to/eng.user-patterns:
1-\d\d\d-GOOG-411
www.\n\\\*.com
I haven't tried this personally though
ShreeDevi
You don't need to train in order to extract text.
Have you tried with the English traineddata, available from
https://code.google.com/p/tesseract-ocr/source/browse/?repo=tessdata
ShreeDevi
also see https://groups.google.com/forum/#!topic/tesseract-ocr/et7bS5QRf2o
ShreeDevi
On Tue, Nov 11, 2014 at 11:02 PM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
Have you tested
You need to pre-process the image so that G shows up correctly. In the
attached image G looks like a 6 as it is connected.
If that is the shape of G in the font and you need to OCR it, you may
either need to retrain or post-process the text.
You could also try with a newer version of the program.
I checked with VietOCR beta4, which uses a newer version of Tesseract - it
recognizes your tiff correctly.
ShreeDevi
On Wed, Nov 12, 2014 at 8:12 AM, ShreeDevi Kumar shreesh
, as the final version of
what I'm using will be using an iOS CocoaPod that does not support the
bazaar functionality of Tesseract.
On Tue, Nov 11, 2014 at 8:51 PM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
On Wed, Nov 12, 2014 at 2:13 AM, ste...@fortyau.com wrote:
The user-patterns looks
You need leptonica 1.71 for the current version of tesseract.
liblept.so.4
ShreeDevi
On Wed, Nov 12, 2014 at 5:05 PM, Patrick Vöhrs voe...@wesoma-consulting.com
wrote:
Hi at all,
Have you seen http://tess4j.sourceforge.net/ - A Java JNA wrapper for
Tesseract OCR API.
ShreeDevi
On Wed, Nov 12, 2014 at 6:18 PM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
You
];
On Wed, Nov 12, 2014 at 12:30 AM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
Are you able to pass a configuration variable with iOS CocoaPod ?
*-c configvar=value*
Set value for control parameter. Multiple -c arguments are allowed.
*configfile*
The name of a config to use. A config
, ShreeDevi Kumar shreesh...@gmail.com
wrote:
bazaar is nothing but a config file which sets values for a set of config
variables, please see
https://code.google.com/p/tesseract-ocr/source/browse/tessdata/configs/bazaar
So, if patterns are helpful, you can use that as a config.
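As a rough sketch of that idea, a custom config file is just a text file of `variable value` lines placed in the tessdata/configs directory. The config name `mypatterns` and the function name are hypothetical; the single line written here is assumed to match the corresponding line of the stock bazaar config, so check it against the linked file.

```shell
# Sketch: write a one-line config that points tesseract at a
# user-patterns file (hypothetical config name; see lead-in).
# $1 = path to the tessdata/configs directory, $2 = config name.
write_patterns_config() {
    printf 'user_patterns_suffix user-patterns\n' > "$1/$2"
}
```

The config would then be named on the command line like any other, for example `tesseract image.png out mypatterns`, with the patterns themselves in `eng.user-patterns` in the tessdata directory.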
ShreeDevi
You can look at the unicharset of the traineddata to see the coverage.
Try with eng+deu+iast.
iast is a traineddata that I generated for Sanskrit transliteration in
Roman/Latin script.
https://code.google.com/r/shreeshrii-langdata/source/browse/iast.unicharset?name=iast
Straighten the image before sending to tesseract. You can use scantailor or
unpaper.
Imagemagick may also have an option, you'll have to look.
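For the ImageMagick route, a possible sketch uses its `-deskew` operator. The 40% threshold is the value commonly suggested in ImageMagick's own documentation, not something from the original message, and the function and file names are hypothetical.

```shell
# Sketch: straighten a slightly rotated scan with ImageMagick.
# $1 = input image, $2 = output image.
deskew_image() {
    # -deskew 40% estimates and removes small rotations;
    # +repage resets the canvas offset left behind by the rotation.
    convert "$1" -deskew 40% +repage "$2"
}
```

scantailor or unpaper may still do better on pages that are warped rather than simply rotated.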
See attached images - output from scantailor - and then OCRed using Vietocr
(gui frontend to Tesseract)
MODEL NAME 7
MOORE RF28HMEDBSR
ml.“
| mt
.txt
.pdf
.hocr
pdf and hocr can be passed as CONFIG file options when using tesseract from
commandline
and txt output is created automatically (in both cases, I think)
This is with the latest version of tesseract from git.
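The three output types can be sketched as three separate runs. The function and file names are hypothetical, and the `hocr` and `pdf` config files are assumed to be present in tessdata/configs.

```shell
# Sketch: produce text, hOCR and PDF output for one image.
# $1 = input image, $2 = output base name.
ocr_all_formats() {
    tesseract "$1" "$2"        # plain text output
    tesseract "$1" "$2" hocr   # hOCR output
    tesseract "$1" "$2" pdf    # searchable PDF output
}
```

Running them separately avoids relying on whether the text file is also written when a config is given.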
ShreeDevi
asc traineddata does not have a wordlist or dictionary, so using eng will
help with that. Also, I just trained using a few fonts that support the
whole range. If you train with the font you are using, you will get better
results.
You can use 'combine_tessdata' command with the -u (unpack) option
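A minimal sketch of the unpack step, following the form shown in the combine_tessdata man page. The function name and paths are hypothetical.

```shell
# Sketch: unpack all components of a traineddata file.
# $1 = path to the .traineddata file
# $2 = output path prefix, ending in "<lang>." (for example ./unpacked/eng.)
unpack_traineddata() {
    combine_tessdata -u "$1" "$2"
}
```

After `unpack_traineddata tessdata/eng.traineddata ./unpacked/eng.`, the unicharset should appear as `./unpacked/eng.unicharset`, which can be inspected for coverage.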
Amarjeet,
Glad that you are getting 70-80% correct OCR for Marathi using the Konkani
traineddata I posted.
The Hindi traineddata was trained with 'cube' method by Google but that is
not available to us.
The training can be improved with better training text or font similar to
the one being
Have you tried with the existing english traineddata?
I get good recognition with your 'prepared-image'.
If that is the kind of image you need to OCR, you could do that with psm 6
and then split each letter separately?
ShreeDevi
take a look at hocr output
and tsv option from https://code.google.com/r/email-hocr-tsv/
ShreeDevi
On Sat, Nov 15, 2014 at 3:39 PM, Simon Støvring simonstoevr...@gmail.com
wrote:
I
I have not used Serak - but the issues page there indicates problems with
RTL languages - see
https://code.google.com/p/serak-tesseract-trainer/issues/detail?id=6
Why are you not using jTessBoxEditor's trainer or the command-line programs?
I think the binaries are bundled with JTess...
here.
Question: am I giving the wrong file in the path for the Tesseract executable
and the training data, i.e. the ara box file? Or what is going wrong?
Note: I have not provided any words_list, frequent_words or font_properties file.
On 20 November 2014 17:32, ShreeDevi Kumar shreesh...@gmail.com wrote:
I have
Hi,
Have you added the fonts to the font_properties file?
Try removing the 'narrow' font from your training set.
Test with just one or two similar fonts and see if results are better.
ShreeDevi
.
On Wed, Nov 19, 2014 at 7:47 PM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
Training 2 files
ShreeDevi
On Thu, Nov 20, 2014 at 9:15 AM, ShreeDevi Kumar shreesh...@gmail.com
Have you tried with version compiled from latest source on git?
If you post a couple of sample images I can give it a try and let you know
what results I get.
ShreeDevi
On Sun, Nov 23,
Hi Chris,
I opened the pdfs in Adobe Reader as well as Foxit Reader on Windows7, and
the page flickers with large size text but then seems to display normally -
zoom 100% also seems to be regular output only.
Tesseract now has a 'pdf' option, so you don't need to do 'hocrpdf'. Try
the following:
Which version of source have you used?
Latest version is available from
https://code.google.com/p/tesseract-ocr/source/checkout
You need the pdf config files in tessdata directory. See
https://code.google.com/p/tesseract-ocr/source/browse/tessdata
You also need to make sure that tessdata_prefix
I think you need to deskew/dewarp the lines, increase brightness, get the
images at 300 dpi and try.
I tested using your images with vietocr (4.0 beta) with the following
output ...
--
East 133rd Street, cast from Cypress Ave. In the background is
the United Electric Light and
https://code.google.com/p/tesseract-ocr/source/browse/?repo=langdata#git%2Feng
https://code.google.com/p/tesseract-ocr/source/browse?repo=tessdata#git
http://tesseract-ocr.googlecode.com/svn-history/trunk/doc/combine_tessdata.1.html
Specify option -u to unpack all the components to the specified
Have you looked at imagemagick and related scripts for pre-processing the
images?
ShreeDevi
On Wed, Jan 21, 2015 at 1:30 AM, newbie spens.mallang...@gmail.com wrote:
I found that
You can look at
http://zdenop.github.io/tesseract-doc/
http://fossies.org/dox/tesseract-ocr-3.02.02/index.html
https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&usp=sharing
https://code.google.com/p/tesseract-ocr/wiki/Documentation
ShreeDevi
On Fri, Jan 9, 2015 at 5:44 PM, ShreeDevi Kumar shree...@gmail.com
wrote:
you should *uninstall the old version fully* and then build the
version from git. It is possibly referring to some older libraries.
Also, this needs leptonica 1.71. Not sure if the documentation
I am using the git version -- output and messages attached. pdf seems to
have all the lines.
User@HP ~/tesseract-ocr/testing
$ tesseract 5.tif 5 pdf
Tesseract Open Source OCR Engine v3.04.00 with Leptonica
Page 1
OSD: Weak margin (5.78), horiz textlines, not CJK: Don't rotate.
Page 2
Too few
As far as I know, pdf creation is a new addition and the issues were ironed
out only recently. There have been over 100 commits to the code since 3.03
rc.
If you want the new functionality, you can try compiling the code from
https://code.google.com/p/tesseract-ocr/source/checkout
Instructions
you should *uninstall the old version fully* and then build the version
from git. It is possibly referring to some older libraries.
Also, this needs leptonica 1.71. Not sure if the documentation mentions it
or not.
ShreeDevi
please see https://code.google.com/p/tesseract-ocr/issues/detail?id=1278
ShreeDevi
On Fri, Jan 9, 2015 at 5:44 PM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
you should *uninstall
see
http://stackoverflow.com/questions/15067651/cannot-find-a-way-to-make-tessnet2-work
tessnet2 is .NET wrapper for Tesseract 2.04
Try newer versions - say from https://github.com/charlesw/tesseract
ShreeDevi
I don't think that's the intended behavior. What version of tesseract are
you using? Please post a sample image for testing.
ShreeDevi
On Thu, Jan 8, 2015 at 8:00 PM, C.
vietocr has bulkocr and batch options.
ShreeDevi
On Sun, Mar 22, 2015 at 6:39 AM, Dennis dennisg...@gmail.com wrote:
I'm using the latest version of tesseract: 3.02.
I
Please see
http://www.ucsc.cmb.ac.lk/sdu/research.html
http://192.248.22.122/ocrsinhala/upload.php
Here is the output from it:
ටුද්රණි:ල .ය්චත වැට වරීජන:: ඵාෂ්. ඨ:ර්චූකට පවන්චි:යගැ න ::න චූට කූ- එ0
දූකූ:ගයගැ
0පි පිශ්රීබඳව රජය:ෘන් ඉදීරිෂන් කූයරන ය:ට,රණ් ච්ඝ දූ0කට 9දාද්රඩා භ:තපිජං
.ාරීග
ාඝන්
The German language code is deu, NOT dau.
ShreeDevi
On Mon, Mar 9, 2015 at 9:06 AM, Ofer Rosenberg rosenberg.o...@gmail.com
wrote:
Hello,
I have a problem when running tesseract for a
have you followed the suggestions given on
https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality
ShreeDevi
On Mon, Mar 9, 2015 at 10:26 AM, Daniel danieluc...@gmail.com wrote:
http://sourceforge.net/projects/tesseracthindi/files/?source=navbar
you can take the training files from there and improve.
If the work is for an NGO, you can also contact IISC for Tamil and Kannada
OCR - please see
I have not done any additional work on that.
Not sure when the next release will be and which languages will be
supported in it.
ShreeDevi
On Sun, Apr 5, 2015 at 11:55 PM, Ash L
Please try the vietocr gui frontend for tesseract ocr available from
http://vietocr.sourceforge.net/
It uses a newer version of tesseract.
You can also try using the Bengali traineddata available on the Tesseract site
-
Did you try with the Latin traineddata
https://github.com/tesseract-ocr/tessdata/blob/master/lat.traineddata?raw=true
ShreeDevi
On Mon, Jul 6, 2015 at 5:46 PM, Brennan Nunamaker
Please see https://github.com/tesseract-ocr/langdata/tree/master/lat
which has the language data used for latin. You can use this as the basis
to create your own traineddata file for an old historical version of latin
ShreeDevi
On Mon, Aug 17, 2015 at 6:07 AM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
Ray was looking for comparative feedback regarding the new traineddata
for RTL languages, so this will be useful.
Ray -
https://groups.google.com/forum/#!msg/tesseract-dev/qcFtWCAAlT8/SZ4xBS5DHwwJ
Another
Ray was looking for comparative feedback regarding the new traineddata for
RTL languages, so this will be useful.
As far as I know, Google Docs does not use tesseract OCR engine for
recognizing the text. Its OCR accuracy is better than Tesseract for some
Indian languages also. However, it doesn't
On Sun, Aug 2, 2015 at 3:25 PM, Marco Atzeri marco.atz...@gmail.com wrote:
On 8/2/2015 10:31 AM, ShreeDevi Kumar wrote:
+ tesseract-dev google group
Thank you, Marco. I will download the training tools packages and
give them a try.
In future updates to the tesseract package, may I
I am assuming that FreeOCR is using an older version of tesseract engine
and hence does not support the newer traineddata files for grc etc.
On Windows, you can try the binaries built by Simon on Cygwin
with the latest code from GitHub - http://domasofan.spdns.eu/tesseract/
ShreeDevi
It may be best to post this as an issue
- sent from my phone. excuse the brevity and typos.
On 13 Aug 2015 15:30, Anshul Maheshwari anshul.ffm...@gmail.com wrote:
I have pasted valgrind output, where tesseract is just linked; not a
single tesseract API is used in my code.
then it have
- International Components for Unicode:
Layout library
icu-lx icu-lx - International Components for Unicode:
Paragraph Layout library
$ pkg-config --libs icu-i18n
-licui18n -licuuc -licudata -lpthread -lm
On 7/27/2015 9:05 AM, ShreeDevi Kumar wrote:
Marco,
Please see
to file.
greetings,
simon
Am 23.07.2015 um 04:55 schrieb ShreeDevi Kumar:
http://domasofan.spdns.eu/tesseract/how%20to%20install.txt
Excellent instructions, Simon.
I am downloading and will give it a try under Windows8.
I would suggest that you add 'Tesseract for Windows' as a heading
Zdenko,
Just to confirm,
Is it OK to use the newer releases from
https://github.com/tesseract-ocr/tesseract/releases for distribution or
is the latest code for distribution 3.04.00?
Thanks!
ShreeDevi
Thank you, Marco.
1. Is there a way to download just the tesseract package and dependencies
(like Simon had setup) for testing purposes for those who do not have a
cygwin install?
2. The pdf output option (as far as I understand it) adds the OCRed text
layer on top of copy of the original image,
**: training tools don't build #61*
- sent from my phone. excuse the brevity and typos.
On 27 Jul 2015 11:50, Marco Atzeri marco.atz...@gmail.com wrote:
On 7/27/2015 4:54 AM, ShreeDevi Kumar wrote:
Thank you, Marco.
1. Is there a way to download just the tesseract package and
dependencies (like
You can test the Cygwin compiled windows binaries by Simon. However pdf
output is not working in it.
- sent from my phone. excuse the brevity and typos.
On 25 Jul 2015 16:07, Sriranga(81+yrsold) withblessi...@gmail.com wrote:
thanks for the information.
On 21 July 2015 at 05:09, ShreeDevi
Mark,
3.04 is officially going to be released soon. Can you share your experience
with the Windows build to help in that process?
- sent from my phone. excuse the brevity.
On 11 Jul 2015 10:44, Mark Seidner topo...@gmail.com wrote:
Hi everyone,
I downloaded the latest 3.04 code from git and
as any other. Why it
should be tagged???
Zdenko
On Tue, Jul 21, 2015 at 6:44 AM, ShreeDevi Kumar shreesh...@gmail.com
wrote:
Zdenko,
How is this update tagged? Is there a version number with it for future
ref.
- sent from my phone. excuse the brevity.
On 21 Jul 2015 00:09, zdenko
I don't think a windows binary of 3.04.00 has been made available.
ShreeDevi
On Mon, Jul 20, 2015 at 6:11 PM, sriranga(82yrsold)
withblessing.sriranga.1...@gmail.com wrote:
From
for f in *.tif
do
    tesseract "$f" "$f" hocr
done
ShreeDevi
On Tue, Jul 21, 2015 at 4:29 PM, Stathis L. doombringer...@gmail.com
wrote:
Does anybody know how to process multiple
That for loop is for a bash script;
please see http://www.cyberciti.biz/faq/bash-for-loop/ for examples.
ShreeDevi
On Tue, Jul 21, 2015 at 6:36 PM, Stathis L.
There is Marathi traineddata. However, it is not trained with the cube engine
and hence may not be as accurate.
http://packages.ubuntu.com/wily/tesseract-ocr-mar
You can test with both hin and mar and report your experience.
Thanks!
- sent from my phone. excuse the brevity.
On 28 Oct 2015 14:16,
For Indian languages, also check out the OCR feature in Google Drive/Docs.
- sent from my phone. excuse the brevity.
On 28 Oct 2015 17:34, "ShreeDevi Kumar" <shreesh...@gmail.com> wrote:
> There is marathi traineddata. However that is not trained with cube engine
> and hence
manpages.ubuntu.com/manpages/precise/man1/tesseract.1.html
- sent from my phone. excuse the brevity.
On 14 Oct 2015 19:40, "Bill Wong" wrote:
> I've been comparing for the same image on PC and MAC, the results differ a
> lot.
> My images are PNG files, in french
To use a particular language the syntax is
-l fra
Not
-fra
- sent from my phone. excuse the brevity.
On 14 Oct 2015 19:40, "Bill Wong" wrote:
> I've been comparing for the same image on PC and MAC, the results differ a
> lot.
> My images are PNG files, in french
Have you tried with the new traineddata files at
https://github.com/tesseract-ocr/tessdata
ShreeDevi
On Thu, Jul 9, 2015 at 2:55 PM, wfxia...@gmail.com wrote:
Hi, Nade, thanks for
Usually, if you have multiple traineddata files for the same language, you
would give a distinct name to each, e.g. eng and en2.
Then, if you want to use both:
-l eng+en2
Or
-l en2+eng
depending on which one you want to give priority to.
To use only your own traineddata en2:
-l en2
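A small sketch of how the `-l` argument could be assembled from a priority-ordered list of language codes. The helper names and file names are hypothetical; `en2` is the example name used above.

```shell
# Sketch: join language codes with "+" in the order given
# (first code gets priority) and pass them to tesseract via -l.
join_langs() {
    # In a subshell so the changed IFS does not leak to the caller;
    # "$*" joins the arguments with the first character of IFS.
    (IFS=+; printf '%s\n' "$*")
}

# $1 = input image, $2 = output base name, remaining args = language codes.
ocr_multi() {
    img=$1
    out=$2
    shift 2
    tesseract "$img" "$out" -l "$(join_langs "$@")"
}
```

For instance, `ocr_multi page.png page en2 eng` runs tesseract with `-l en2+eng`, and `ocr_multi page.png page fra` with `-l fra`.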
- sent from my phone. excuse the
See https://tesseract-ocr.googlecode.com/git/doc/tesseract.1.html for
syntax of command
- sent from my phone. excuse the brevity
On 10 Jul 2015 12:11, ShreeDevi Kumar shreesh...@gmail.com wrote:
Usually if you have multiple traineddata for same language, you would give
a distinct name to each