Re: [tesseract-ocr] Tessdata for marathi

2014-10-16 Thread ShreeDevi Kumar
Marathi traineddata should be in the next release, since there is now langdata for it in the repo. You can try the traineddata file from https://code.google.com/r/shreeshrii-tessdata/source/browse?name=knn which is a start for Konkani. ShreeDevi

Re: [tesseract-ocr] how can I get better results for this

2014-10-17 Thread ShreeDevi Kumar
https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality Try with the image at 300 dpi or higher; resize to 300%. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Fri, Oct 17, 2014 at 8:35 PM, Rick Leir rich...@c7a.ca

Re: [tesseract-ocr] how can I get better results for this

2014-10-17 Thread ShreeDevi Kumar
You have to experiment. I got better results after some image processing and VietOCR: that it has bcln dooi transfer of a portzon which has been leased an. M- nan-ant.‘ 0n Mu ShreeDevi

Re: [tesseract-ocr] Training for plotter file

2014-10-19 Thread ShreeDevi Kumar
Which version of tesseract are you using? Try changing to 300/600 dpi, apply a blur/soften filter, decrease brightness, and convert to greyscale. I tried with the VietOCR GUI: the zero with the line across it gets recognized as @, the rest comes out OK. If you will not have @ in your plots, you could just
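The preprocessing steps suggested above (higher resolution, a light blur, greyscale) could be scripted with ImageMagick roughly as follows. This is only a sketch: the function name preprocess_for_ocr, the CONVERT_BIN override and the specific values (300%, blur 0x1) are assumptions for illustration, not settings taken from this thread, and each image will need its own tuning.

```shell
# Hypothetical helper: upscale, convert to greyscale, soften slightly,
# and tag the result as 300 dpi before handing it to tesseract.
# CONVERT_BIN is an assumed override (e.g. CONVERT_BIN=echo for a dry run).
preprocess_for_ocr() {
    local src="$1" dst="$2"
    "${CONVERT_BIN:-convert}" "$src" \
        -resize 300% \
        -colorspace Gray \
        -blur 0x1 \
        -units PixelsPerInch -density 300 \
        "$dst"
}
```

Something like preprocess_for_ocr plot.png plot.tif followed by tesseract plot.tif plot would then be the experiment to repeat while adjusting the values.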

Re: [tesseract-ocr] Reading dot matrix characters

2014-10-23 Thread ShreeDevi Kumar
Try a .NET wrapper with a newer version of tesseract. Invert the image, smooth/blur it, and make it greyscale. I tried with VietOCR; the output is 'QBCDEFGHIJKL'. ShreeDevi On Thu, Oct 23, 2014 at

Re: [tesseract-ocr] Reading dot matrix characters

2014-10-23 Thread ShreeDevi Kumar
On Thu, Oct 23, 2014 at 12:24 PM, ShreeDevi Kumar shreesh...@gmail.com wrote: Try .net wrapper with newer version of tesseract. invert the image, smoothen/blur, make greyscale ... I tried

Re: [tesseract-ocr] any chance to get this .tiff converted to text?

2014-10-28 Thread ShreeDevi Kumar
I was going to suggest the tips from https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality but just OCRing the image without any changes in VietOCR (a GUI frontend for tesseract) with the German traineddata gives a perfect result - see image. What version are you using, and on what platform? I

Re: [tesseract-ocr] Re: any chance to get this .tiff converted to text?

2014-10-29 Thread ShreeDevi Kumar
Please choose German in the language dropdown on the right-hand side. ShreeDevi On Wed, Oct 29, 2014 at 9:08 PM, boris borisri...@gmail.com wrote: Hi Shree, many thanks for your

Re: [tesseract-ocr] Re: any chance to get this .tiff converted to text?

2014-10-30 Thread ShreeDevi Kumar
Do look at https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality for pre-processing steps for your images to improve recognition regardless of the OCR you use. ShreeDevi On Wed,

Re: [tesseract-ocr] Re: any chance to get this .tiff converted to text?

2014-10-31 Thread ShreeDevi Kumar
In VietOCR's image menu, check 'screenshot mode'. Use the filters submenu to experiment with other settings to improve your image. Look under properties for the dpi, and convert your input images to 300 dpi, as they are currently low-res (72 dpi or so). Experiment :-) ShreeDevi

Re: [tesseract-ocr] Strange regocnition

2014-10-31 Thread ShreeDevi Kumar
Change the image to 300 dpi. Try VietOCR in screenshot mode. Try with the Vietnamese traineddata. With command-line tesseract, use the 'digits' config file as a parameter. Recognizing only numbers is actually answered in the tesseract FAQ: http://code.google.com/p/tesseract-ocr/wiki/FAQ
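Passing the 'digits' config on the command line, as suggested above, might look like the sketch below. The wrapper name ocr_digits and the TESSERACT_BIN override are made up for illustration; the only thing taken from the message is that 'digits' goes in the configfile position after the output base.

```shell
# Hypothetical helper: OCR one image using the stock 'digits' config,
# which limits recognition to digits and a little punctuation.
# TESSERACT_BIN is an assumed override for dry runs.
ocr_digits() {
    local img="$1" base="${2:-${1%.*}}"   # default output base: image name minus extension
    "${TESSERACT_BIN:-tesseract}" "$img" "$base" digits
}
```

For example, ocr_digits meter.png should leave the recognized text in meter.txt, assuming the digits config is present under tessdata/configs.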

Re: [tesseract-ocr] default mode PSM

2014-11-01 Thread ShreeDevi Kumar
http://manpages.ubuntu.com/manpages/precise/man1/tesseract.1.html tesseract imagename outbase [-l lang] [-psm N] [configfile ...] ShreeDevi On Sat, Nov 1, 2014 at

Re: [tesseract-ocr] default mode PSM

2014-11-01 Thread ShreeDevi Kumar
An updated version of the man page is at https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html ShreeDevi On Sat, Nov 1, 2014 at 4:19 PM, ShreeDevi Kumar shreesh...@gmail.com

Re: [tesseract-ocr] Re: Adding new language to Tesseract?

2014-11-03 Thread ShreeDevi Kumar
There already is language data for srp - please see https://code.google.com/p/tesseract-ocr/source/browse/srp/?repo=langdata and https://code.google.com/p/tesseract-ocr/source/browse/srp.traineddata?repo=tessdata Ray Smith, the lead developer of tesseract at Google is planning to release

Re: [tesseract-ocr] Re: Adding new language to Tesseract?

2014-11-03 Thread ShreeDevi Kumar
Thanks for clarifying and giving more details. I am cc:ing this email to the tesseract developers group and Ray for an answer to your question of how to submit this file to Tesseract's repository. Meanwhile, I suggest that you add an 'issue' and attach the traineddata. Thanks! ShreeDevi

[tesseract-ocr] Re: Contribution : Serbian Cyrillic traineddata file

2014-11-03 Thread ShreeDevi Kumar
On Tue, Nov 4, 2014 at 7:35 AM, ShreeDevi Kumar shreesh...@gmail.com wrote: Thanks for clarifying and giving more details. I am cc:ing this email to the tesseract developers group and Ray

Re: [tesseract-ocr] Reading dot matrix characters

2014-11-05 Thread ShreeDevi Kumar
I had asked to try vietocr because it is using a newer svn version for the java 4.0beta and I find it easy to test under windows with the gui, as I can change the image filter settings in it. You will have to choose the tools based on your platform and other requirements. You could use

Re: [tesseract-ocr] How to run make training for Repo installed Tesseract 3.03

2014-11-05 Thread ShreeDevi Kumar
Did you install the latest version from http://packages.ubuntu.com/utopic/tesseract-ocr ? If so, it should have the training tools. Try 'which text2image' to see if it is installed. ShreeDevi
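The 'which text2image' check above can be extended to the other training programs. A minimal sketch follows; the function name missing_tools is hypothetical, and the list of tools in the usage line is my assumption about what a training run needs, not something stated in this message.

```shell
# Print the name of every listed tool that is not found on PATH.
# Prints nothing when all of them are installed.
missing_tools() {
    local t
    for t in "$@"; do
        command -v "$t" >/dev/null 2>&1 || printf '%s\n' "$t"
    done
}
```

Usage could be missing_tools text2image unicharset_extractor mftraining cntraining combine_tessdata; any name it prints would need to be installed before training.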

Re: [tesseract-ocr] Reading dot matrix characters

2014-11-05 Thread ShreeDevi Kumar
On Wed, Nov 5, 2014 at 4:57 PM, ShreeDevi Kumar shreesh...@gmail.com wrote: I had asked to try vietocr because it is using a newer svn version for the java 4.0beta and I find it easy to test under windows with the gui, as I can

Re: [tesseract-ocr] Re: jTessBoxEditor 0.6 Beta release

2014-11-06 Thread ShreeDevi Kumar
Please also change the FONT under the TRAINER tab to Arabic. ShreeDevi On Thu, Nov 6, 2014 at 2:49 PM, iram akbar iramakb...@gmail.com wrote: i have downloaded the lates version 1.1

Re: [tesseract-ocr] Reducing the generated PDF size / compression PDF

2014-11-06 Thread ShreeDevi Kumar
You could also test with gswin32c -q -dNOPAUSE -dBATCH -sDEVICE=tiffgray -sCompression=lzw -r300 ShreeDevi On Thu, Nov 6, 2014 at 2:13 PM, Sébastien Cuendet
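The Ghostscript options above do not show the input and output files in this excerpt. A possible complete invocation is sketched below; the function name pdf_to_tiff, the GS_BIN override and the file arguments are assumptions added for illustration (on Windows the binary would be gswin32c as in the message, elsewhere usually gs).

```shell
# Hypothetical helper: render a PDF to a 300 dpi greyscale, LZW-compressed TIFF.
# GS_BIN is an assumed override, e.g. GS_BIN=gswin32c on Windows.
pdf_to_tiff() {
    local pdf="$1" tif="$2"
    "${GS_BIN:-gs}" -q -dNOPAUSE -dBATCH \
        -sDEVICE=tiffgray -sCompression=lzw -r300 \
        -sOutputFile="$tif" "$pdf"
}
```

For a multi-page PDF, an output name such as page-%03d.tif should give one TIFF per page.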

Re: [tesseract-ocr] Re: jTessBoxEditor 0.6 Beta release

2014-11-06 Thread ShreeDevi Kumar
Click on the 'generate' box - with some Devanagari fonts I have found that the text does not display but the tiff/box files are generated. Maybe it is the same for the Arabic font you are using. Give it a try. You can also try to copy and paste the text; sometimes that works. ShreeDevi

Re: [tesseract-ocr] Re: jTessBoxEditor 0.6 Beta release

2014-11-06 Thread ShreeDevi Kumar
I think you are using the wrong tools ... If you need to convert a jpg to tif, use an image editor such as ImageMagick or IrfanView. If you need to OCR the image, tesseract accepts jpg as input as well as tif. There already is Arabic traineddata for tesseract - see

Re: [tesseract-ocr] Support Language

2014-11-07 Thread ShreeDevi Kumar
Please see https://code.google.com/p/tesseract-ocr/source/browse/?repo=langdata#git%2Fkat Language codes: ISO 639-1 (http://en.wikipedia.org/wiki/ISO_639-1): ka; ISO 639-2 (http://en.wikipedia.org/wiki/ISO_639-2): geo (http://www.sil.org/iso639-3/documentation.asp?id=geo) (B), kat

Re: [tesseract-ocr] Re: Tesseract 3.02.02 Released

2014-11-07 Thread ShreeDevi Kumar
https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 ShreeDevi On Fri, Nov 7, 2014 at 4:26 PM, iram akbar iramakb...@gmail.com wrote: Hi, i want to make my own tessdata

Re: [tesseract-ocr] Re: Tesseract 3.02.02 Released

2014-11-07 Thread ShreeDevi Kumar
Also see https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&usp=sharing for tutorial files that give an overview. ShreeDevi On Fri, Nov 7, 2014 at 5:04 PM, ShreeDevi Kumar shreesh

Re: [tesseract-ocr] Support Georgian Language

2014-11-07 Thread ShreeDevi Kumar
CC:ing Ray and the Dev group. That language data is part of the update done by Ray Smith on August 12. Ray is planning an update to language data and traineddata soon, so if you have suggestions for improvement, please file an issue and provide more details, samples of each script, etc. ShreeDevi

Re: [tesseract-ocr] Support Language

2014-11-08 Thread ShreeDevi Kumar
See https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/tesseract-dev/8e0F2cK2YzU for the plans for the 3.04 release. For training instructions, please see https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3

Re: [tesseract-ocr] Re: jTessBoxEditor 0.6 Beta release

2014-11-10 Thread ShreeDevi Kumar
Look under the jtessboxeditor/samples/vie folder and create similar files for your language. ShreeDevi On Mon, Nov 10, 2014 at 1:10 PM, iram akbar iramakb...@gmail.com wrote: Quan, i

Re: [tesseract-ocr] Training Tesseract Can't Find Files

2014-11-10 Thread ShreeDevi Kumar
What method are you using for training? Which version of tesseract? What platform? Please see instructions on https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 The following shell script will be useful, if using the latest source from git.

Re: [tesseract-ocr] Re: jTessBoxEditor - Tesseract box editor trainer

2014-11-11 Thread ShreeDevi Kumar
JTessBoxEditor has three tabs. Use Tiff/Box Generator to generate tiff and box files from a given text file for the chosen font. The box files created by Tiff/Box Generator are based on the rendering of the text in the chosen font and will be accurate - however they may still get errors 'blob

Re: [tesseract-ocr] Re: 6od instead of God

2014-11-11 Thread ShreeDevi Kumar
Please attach a copy of the image so that I can try. ShreeDevi On Tue, Nov 11, 2014 at 9:43 PM, misonis...@gmail.com wrote: I was in PSM_SINGLE_LINE mode indeed, because my text is

Re: [tesseract-ocr] Train Tesseract to Only Find a Single 17 Character Word

2014-11-11 Thread ShreeDevi Kumar
Have you tested with the English traineddata from the git tessdata repo? Please see https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html try with these, /path/to/eng.user-patterns: 1-\d\d\d-GOOG-411 www.\n\\\*.com I haven't tried this personally though ShreeDevi
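For the 17-character case in this thread, a pattern file in the style of the man page example above might be generated as below. This is an untested sketch, like the suggestion itself: it assumes that \n in a user-patterns file stands for one alphanumeric character and that a config line 'user_patterns_suffix user-patterns' makes tesseract load eng.user-patterns from the tessdata directory, which is how I read the man page. The function name and the patterns.config file name are made up.

```shell
# Hypothetical helper: write a pattern of N alphanumeric characters (default 17)
# and a small config file that points tesseract at the user-patterns suffix.
make_pattern_files() {
    local outdir="$1" len="${2:-17}" i=0 pat=''
    while [ "$i" -lt "$len" ]; do
        pat="${pat}\\n"          # appends a literal backslash followed by n
        i=$((i + 1))
    done
    printf '%s\n' "$pat" > "$outdir/eng.user-patterns"
    printf '%s\n' 'user_patterns_suffix user-patterns' > "$outdir/patterns.config"
}
```

The generated eng.user-patterns would then go next to eng.traineddata and patterns.config into tessdata/configs, to be named as the configfile argument when running tesseract.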

Re: [tesseract-ocr] Re: jTessBoxEditor - Tesseract box editor trainer

2014-11-11 Thread ShreeDevi Kumar
You don't need to train in order to extract text. Have you tried with the English traineddata, available from https://code.google.com/p/tesseract-ocr/source/browse/?repo=tessdata ? ShreeDevi

Re: [tesseract-ocr] Train Tesseract to Only Find a Single 17 Character Word

2014-11-11 Thread ShreeDevi Kumar
Also see https://groups.google.com/forum/#!topic/tesseract-ocr/et7bS5QRf2o ShreeDevi On Tue, Nov 11, 2014 at 11:02 PM, ShreeDevi Kumar shreesh...@gmail.com wrote: Have you tested

Re: [tesseract-ocr] Re: 6od instead of God

2014-11-11 Thread ShreeDevi Kumar
You need to pre-process the image so that the G shows up correctly. In the attached image the G looks like a 6 because it is connected. If that is the shape of G in the font and you need to OCR it, you may need to either retrain or post-process the text. You could also try with a newer version of the program.

Re: [tesseract-ocr] Re: Train Tesseract to Only Find a Single 17 Character Word

2014-11-11 Thread ShreeDevi Kumar
On Wed, Nov 12, 2014 at 2:13 AM, ste...@fortyau.com wrote: The user-patterns looks helpful, but I can't find any documentation on formatting or how it works. Is there documentation on this somewhere? ​Did you see the man page? I had also sent link to a related discussion in the past.

Re: [tesseract-ocr] Re: 6od instead of God

2014-11-11 Thread ShreeDevi Kumar
I checked with vietocr beta4, which uses a newer version of tesseract - it recognizes your tiff correctly. ShreeDevi On Wed, Nov 12, 2014 at 8:12 AM, ShreeDevi Kumar shreesh

Re: [tesseract-ocr] Re: Train Tesseract to Only Find a Single 17 Character Word

2014-11-11 Thread ShreeDevi Kumar
, as the final version of what I'm using will be using an iOS CocoaPod that does not support the bazaar functionality of Tesseract. On Tue, Nov 11, 2014 at 8:51 PM, ShreeDevi Kumar shreesh...@gmail.com wrote: On Wed, Nov 12, 2014 at 2:13 AM, ste...@fortyau.com wrote: The user-patterns looks

Re: [tesseract-ocr] Exception in thread main java.lang.UnsatisfiedLinkError: liblept.so.4: Cannot load Shared-Object

2014-11-12 Thread ShreeDevi Kumar
You need leptonica 1.71 for the current version of tesseract. liblept.so.4 ShreeDevi On Wed, Nov 12, 2014 at 5:05 PM, Patrick Vöhrs voe...@wesoma-consulting.com wrote: Hi at all,

Re: [tesseract-ocr] Exception in thread main java.lang.UnsatisfiedLinkError: liblept.so.4: Cannot load Shared-Object

2014-11-12 Thread ShreeDevi Kumar
Have you seen http://tess4j.sourceforge.net/ - a Java JNA wrapper for the Tesseract OCR API? ShreeDevi On Wed, Nov 12, 2014 at 6:18 PM, ShreeDevi Kumar shreesh...@gmail.com wrote: You

Re: [tesseract-ocr] Re: Train Tesseract to Only Find a Single 17 Character Word

2014-11-12 Thread ShreeDevi Kumar
]; On Wed, Nov 12, 2014 at 12:30 AM, ShreeDevi Kumar shreesh...@gmail.com wrote: Are you able to pass a configuration variable with the iOS CocoaPod? -c configvar=value: Set value for control parameter. Multiple -c arguments are allowed. configfile: The name of a config to use. A config

Re: [tesseract-ocr] Re: Train Tesseract to Only Find a Single 17 Character Word

2014-11-12 Thread ShreeDevi Kumar
, ShreeDevi Kumar shreesh...@gmail.com wrote: bazaar is nothing but a config file which sets values for a set of config variables, please see https://code.google.com/p/tesseract-ocr/source/browse/tessdata/configs/bazaar So, if patterns are helpful, you can use that as a config. ShreeDevi

Re: [tesseract-ocr] Covering ASCII Extended range.

2014-11-12 Thread ShreeDevi Kumar
You can look at the unicharset of the traineddata to see the coverage. Try with eng+deu+iast; iast is a traineddata that I generated for Sanskrit transliteration in Roman/Latin script. https://code.google.com/r/shreeshrii-langdata/source/browse/iast.unicharset?name=iast

Re: [tesseract-ocr] Reading Device labels to get model number

2014-11-13 Thread ShreeDevi Kumar
Straighten the image before sending to tesseract. You can use scantailor or unpaper. Imagemagick may also have an option, you'll have to look. See attached images - output from scantailor - and then OCRed using Vietocr (gui frontend to Tesseract) MODEL NAME 7 MOORE RF28HMEDBSR ml.“ | mt

Re: [tesseract-ocr] What are the possible output file extensions?

2014-11-13 Thread ShreeDevi Kumar
.txt, .pdf, .hocr. pdf and hocr can be passed as config file options when using tesseract from the command line, and txt output is created automatically (in both cases, I think). This is with the latest version of tesseract from git. ShreeDevi

Re: [tesseract-ocr] Covering ASCII Extended range.

2014-11-13 Thread ShreeDevi Kumar
asc traineddata does not have a wordlist or dictionary, so using eng will help with that. Also, I just trained using a few fonts that support the whole range. If you train with the font you are using, you will get better results. You can use 'combine_tessdata' command with the -u (unpack) option

Re: [tesseract-ocr] मराठी ओसीआर (Marathi OCR)

2014-11-14 Thread ShreeDevi Kumar
Amarjeet, Glad that you are getting 70-80% correct OCR for Marathi using the Konkani traineddata I posted. The Hindi traineddata was trained with 'cube' method by Google but that is not available to us. The training can be improved with better training text or font similar to the one being

Re: [tesseract-ocr] Configure for single character recognition

2014-11-14 Thread ShreeDevi Kumar
Have you tried with the existing English traineddata? I get good recognition with your 'prepared-image'. If that is the kind of image you need to OCR, you could do that with psm 6 and then split each letter separately. ShreeDevi

Re: [tesseract-ocr] Configure for single character recognition

2014-11-15 Thread ShreeDevi Kumar
Take a look at the hocr output and the tsv option from https://code.google.com/r/email-hocr-tsv/ ShreeDevi On Sat, Nov 15, 2014 at 3:39 PM, Simon Støvring simonstoevr...@gmail.com wrote: I

Re: [tesseract-ocr] Re: jTessBoxEditor 0.6 Beta release

2014-11-20 Thread ShreeDevi Kumar
I have not used Serak, but the issues page there indicates problems with RTL languages - see https://code.google.com/p/serak-tesseract-trainer/issues/detail?id=6 Why are you not using jTessBoxEditor's trainer or the command-line programs? I think the binaries are bundled with JTess...

[tesseract-ocr] is it possible to use the latest source from git to train Arabic?

2014-11-20 Thread ShreeDevi Kumar
here. Question: m i giving the wrong file in the path in Tesseract executable and Training data i.e ara box file? or what goes wrong. note: i have put no data words_list, frequent_words, font_properties file. On 20 November 2014 17:32, ShreeDevi Kumar shreesh...@gmail.com wrote: I have

Re: [tesseract-ocr] Training data gets worse as I add characters

2014-11-21 Thread ShreeDevi Kumar
Hi, have you added the fonts to the font_properties file? Try removing the 'narrow' font from your training set. Test with just one or two similar fonts and see if the results are better. ShreeDevi

Re: [tesseract-ocr] Covering ASCII Extended range.

2014-11-21 Thread ShreeDevi Kumar
. On Wed, Nov 19, 2014 at 7:47 PM, ShreeDevi Kumar shreesh...@gmail.com wrote: Training 2 files ShreeDevi On Thu, Nov 20, 2014 at 9:15 AM, ShreeDevi Kumar shreesh...@gmail.com

Re: [tesseract-ocr] Re: Searchable PDF output with oversized font

2014-11-23 Thread ShreeDevi Kumar
Have you tried with a version compiled from the latest source on git? If you post a couple of sample images, I can give it a try and let you know what results I get. ShreeDevi On Sun, Nov 23,

Re: [tesseract-ocr] Re: Searchable PDF output with oversized font

2014-11-25 Thread ShreeDevi Kumar
Hi Chris, I opened the pdfs in Adobe Reader as well as Foxit Reader on Windows7, and the page flickers with large size text but then seems to display normally - zoom 100% also seems to be regular output only. Tesseract now has a 'pdf' option, so you don't need to do 'hocrpdf'. Try the following:

Re: [tesseract-ocr] tesseract 3 pdf error

2014-12-13 Thread ShreeDevi Kumar
Which version of the source have you used? The latest version is available from https://code.google.com/p/tesseract-ocr/source/checkout You need the pdf config files in the tessdata directory. See https://code.google.com/p/tesseract-ocr/source/browse/tessdata You also need to make sure that TESSDATA_PREFIX

Re: [tesseract-ocr] Odd behavior when trying to force a box to split

2015-01-01 Thread ShreeDevi Kumar
I think you need to deskew/dewarp the lines, increase brightness, get the images at 300 dpi and try. I tested using your images with vietocr (4.0 beta) with the following output ... -- East 133rd Street, cast from Cypress Ave. In the background is the United Electric Light and

Re: [tesseract-ocr] Unable to locate dictionary files

2015-02-02 Thread ShreeDevi Kumar
https://code.google.com/p/tesseract-ocr/source/browse/?repo=langdata#git%2Feng https://code.google.com/p/tesseract-ocr/source/browse?repo=tessdata#git http://tesseract-ocr.googlecode.com/svn-history/trunk/doc/combine_tessdata.1.html Specify option -u to unpack all the components to the specified
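Unpacking a traineddata file with the -u option mentioned above, to look at its dictionary components, could be wrapped like this. The function name unpack_traineddata and the COMBINE_BIN override are hypothetical; the 'outdir/lang.' prefix form follows the combine_tessdata man page as far as I recall it.

```shell
# Hypothetical helper: unpack every component of <lang>.traineddata into outdir.
# Components end up as outdir/<lang>.unicharset, outdir/<lang>.word-dawg, and so on.
# COMBINE_BIN is an assumed override for dry runs.
unpack_traineddata() {
    local file="$1" outdir="$2" lang
    lang=$(basename "$file" .traineddata)
    mkdir -p "$outdir"
    "${COMBINE_BIN:-combine_tessdata}" -u "$file" "$outdir/$lang."
}
```

For example, unpack_traineddata tessdata/eng.traineddata /tmp/eng_parts would be a starting point for checking which dictionary files are actually present.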

Re: [tesseract-ocr] Re: tessdata/eng.traineddata question

2015-01-20 Thread ShreeDevi Kumar
Have you looked at ImageMagick and related scripts for pre-processing the images? ShreeDevi On Wed, Jan 21, 2015 at 1:30 AM, newbie spens.mallang...@gmail.com wrote: I found that

Re: [tesseract-ocr] Combining traineddata files

2015-02-10 Thread ShreeDevi Kumar
You cannot combine two traineddata filesets, but you can give two traineddata sets for recognition using the -l option (for languages). Examples: -l eng+myeng, -l eng+jpn, -l jpn+eng, -l hin+san, -l eng+tam ShreeDevi
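When the language list comes from a script rather than being typed by hand, the value for -l can be built by joining the codes with '+'. A small sketch; the function name join_langs is made up for illustration.

```shell
# Join language codes with '+' to form the argument for tesseract's -l option.
join_langs() {
    local IFS='+'          # "$*" joins the arguments with the first character of IFS
    printf '%s\n' "$*"
}
```

A call such as tesseract page.tif page -l "$(join_langs eng myeng)" is then equivalent to -l eng+myeng from the examples above.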

Re: [tesseract-ocr] Help needed in understanding source. New to tesseract.

2015-02-01 Thread ShreeDevi Kumar
You can look at http://zdenop.github.io/tesseract-doc/ http://fossies.org/dox/tesseract-ocr-3.02.02/index.html https://drive.google.com/folderview?id=0B7l10Bj_LprhQnpSRkpGMGV2eE0&usp=sharing https://code.google.com/p/tesseract-ocr/wiki/Documentation ShreeDevi

Re: [tesseract-ocr] lines dissappear in resulting file

2015-01-09 Thread ShreeDevi Kumar
On Fri, Jan 9, 2015 at 5:44 PM, ShreeDevi Kumar shree...@gmail.com wrote: you should *uninstall the old version fully* and then build the version from git. It is possibly referring to some older libraries. Also, this needs leptonica 1.71. Not sure if the documentation

Re: [tesseract-ocr] lines dissappear in resulting file

2015-01-08 Thread ShreeDevi Kumar
I am using the git version; output and messages are attached. The pdf seems to have all the lines.
User@HP ~/tesseract-ocr/testing
$ tesseract 5.tif 5 pdf
Tesseract Open Source OCR Engine v3.04.00 with Leptonica
Page 1
OSD: Weak margin (5.78), horiz textlines, not CJK: Don't rotate.
Page 2
Too few

Re: [tesseract-ocr] lines dissappear in resulting file

2015-01-09 Thread ShreeDevi Kumar
As far as I know, pdf creation is a new addition and the issues were ironed out only recently. There have been over 100 commits to the code since 3.03 rc. If you want the new functionality, you can try compiling the code from https://code.google.com/p/tesseract-ocr/source/checkout Instructions

Re: [tesseract-ocr] lines dissappear in resulting file

2015-01-09 Thread ShreeDevi Kumar
You should *uninstall the old version fully* and then build the version from git. It is possibly referring to some older libraries. Also, this needs leptonica 1.71. Not sure if the documentation mentions it or not. ShreeDevi

Re: [tesseract-ocr] lines dissappear in resulting file

2015-01-09 Thread ShreeDevi Kumar
Please see https://code.google.com/p/tesseract-ocr/issues/detail?id=1278 ShreeDevi On Fri, Jan 9, 2015 at 5:44 PM, ShreeDevi Kumar shreesh...@gmail.com wrote: you should *uninstall

Re: [tesseract-ocr] Very wrong output Tessnet2 + Tesseract

2015-01-03 Thread ShreeDevi Kumar
See http://stackoverflow.com/questions/15067651/cannot-find-a-way-to-make-tessnet2-work tessnet2 is a .NET wrapper for Tesseract 2.04. Try newer versions, say from https://github.com/charlesw/tesseract ShreeDevi

Re: [tesseract-ocr] lines dissappear in resulting file

2015-01-08 Thread ShreeDevi Kumar
I don't think that's the intended behavior. What version of tesseract are you using? Please post a sample image for testing. ShreeDevi On Thu, Jan 8, 2015 at 8:00 PM, C.

Re: [tesseract-ocr] Training for plotter file

2015-03-22 Thread ShreeDevi Kumar
VietOCR has bulk OCR and batch options. ShreeDevi On Sun, Mar 22, 2015 at 6:39 AM, Dennis dennisg...@gmail.com wrote: I'm using the latest version of tesseract: 3.02. I

Re: [tesseract-ocr] Preparing training data for new language

2015-03-15 Thread ShreeDevi Kumar
Please see http://www.ucsc.cmb.ac.lk/sdu/research.html http://192.248.22.122/ocrsinhala/upload.php Here is the output from it: ටුද්‍රණි:ල .ය්චත වැට වරීජන:: ඵාෂ්. ඨ:ර්චූකට පවන්චි:යගැ න ::න චූට කූ- එ0 දූකූ:ගයගැ 0පි පිශ්‍රීබඳව රජය:ෘන් ඉදීරිෂන් කූයරන ය:ට,රණ් ච්ඝ දූ0කට 9දාද්‍රඩා භ:තපිජං .ාරීග ාඝන්

Re: [tesseract-ocr] German doucment

2015-03-09 Thread ShreeDevi Kumar
The German language code is deu, NOT dau. ShreeDevi On Mon, Mar 9, 2015 at 9:06 AM, Ofer Rosenberg rosenberg.o...@gmail.com wrote: Hello, I have a problem when running tesseract for a
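A mistyped code such as dau can be caught before running tesseract by checking that a matching traineddata file exists. This is just a sketch: the function name langs_available is hypothetical, and the tessdata directory is passed in explicitly because its location differs between installs.

```shell
# Return 0 only if every code in an 'a+b+c' language spec has a
# <code>.traineddata file in the given tessdata directory.
# Missing codes are reported on stderr.
langs_available() {
    local dir="$1" spec="$2" code rc=0
    local IFS='+'                       # split the spec on '+'
    for code in $spec; do
        if [ ! -f "$dir/$code.traineddata" ]; then
            printf 'missing: %s\n' "$code" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For example, langs_available /usr/share/tesseract-ocr/tessdata deu && tesseract letter.tif letter -l deu, where the directory shown is only one common location.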

Re: [tesseract-ocr] Re: Android OCR application looking for quality improvments

2015-03-09 Thread ShreeDevi Kumar
Have you followed the suggestions given on https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality ? ShreeDevi On Mon, Mar 9, 2015 at 10:26 AM, Daniel danieluc...@gmail.com wrote:

Re: [tesseract-ocr] Re: Steps to Configure Tesseract OCR for tamil Language

2015-03-01 Thread ShreeDevi Kumar
http://sourceforge.net/projects/tesseracthindi/files/?source=navbar you can take the training files from there and improve. If the work is for an NGO, you can also contact IISC for Tamil and Kannada OCR - please see

Re: [tesseract-ocr] Tessdata for marathi

2015-04-05 Thread ShreeDevi Kumar
I have not done any additional work on that. Not sure when the next release will be and which languages will be supported in it. ShreeDevi On Sun, Apr 5, 2015 at 11:55 PM, Ash L

Re: [tesseract-ocr] Tesseract 3.02 does not detect inter-word spacing for Bengali language.

2015-05-19 Thread ShreeDevi Kumar
Please try the vietocr gui frontend for tesseract ocr available from http://vietocr.sourceforge.net/ It uses a newer version of tesseract. you can also try using the bengali traineddata available on tesseract site -

Re: [tesseract-ocr] Tesseract returns empty result with custom language but not english

2015-07-06 Thread ShreeDevi Kumar
Did you try with the Latin traineddata https://github.com/tesseract-ocr/tessdata/blob/master/lat.traineddata?raw=true ? ShreeDevi On Mon, Jul 6, 2015 at 5:46 PM, Brennan Nunamaker

Re: [tesseract-ocr] Tesseract returns empty result with custom language but not english

2015-07-06 Thread ShreeDevi Kumar
Please see https://github.com/tesseract-ocr/langdata/tree/master/lat which has the language data used for Latin. You can use this as the basis to create your own traineddata file for an old historical version of Latin. ShreeDevi

Re: [tesseract-ocr] Tesseract returns empty result with custom language but not english

2015-07-06 Thread ShreeDevi Kumar
, ShreeDevi Kumar shreesh...@gmail.com wrote: Please see https://github.com/tesseract-ocr/langdata/tree/master/lat which has the language data used for latin. You can use this as the basis to create your own traineddata file for an old historical version of latin ShreeDevi

Re: [tesseract-ocr] Tesseract returns empty result with custom language but not english

2015-07-06 Thread ShreeDevi Kumar
, 2015 at 6:41 PM, ShreeDevi Kumar shree...@gmail.com wrote: Please see https://github.com/tesseract-ocr/langdata/tree/master/lat which has the language data used for latin. You can use this as the basis to create your own traineddata file for an old historical version of latin ShreeDevi

Re: [tesseract-ocr] Re: persian in tesseract-ocr

2015-08-17 Thread ShreeDevi Kumar
On Mon, Aug 17, 2015 at 6:07 AM, ShreeDevi Kumar shreesh...@gmail.com wrote: Ray was looking for comparative feedback regarding the new traineddata for RTL languages, so this will be useful. ​ Ray - https://groups.google.com/forum/#!msg/tesseract-dev/qcFtWCAAlT8/SZ4xBS5DHwwJ Another

Re: [tesseract-ocr] Re: persian in tesseract-ocr

2015-08-16 Thread ShreeDevi Kumar
Ray was looking for comparative feedback regarding the new traineddata for RTL languages, so this will be useful. As far as I know, Google Docs does not use tesseract OCR engine for recognizing the text. Its OCR accuracy is better than Tesseract for some Indian languages also. However, it doesn't

Re: [tesseract-ocr] building on cygwin with training data

2015-08-02 Thread ShreeDevi Kumar
On Sun, Aug 2, 2015 at 3:25 PM, Marco Atzeri marco.atz...@gmail.com wrote: On 8/2/2015 10:31 AM, ShreeDevi Kumar wrote: + tesseract-dev google group Thank you, Marco. I will download the training tools packages and and give it a try. In future updates to the tesseract package, may I

Re: [tesseract-ocr] FreeOCR stops working when trying to OCR in Greek (ell or grc) languages

2015-07-31 Thread ShreeDevi Kumar
I am assuming that FreeOCR is using an older version of tesseract engine and hence does not support the newer traineddata files for grc etc. On Windows, you can give a try to the binaries built by Simon on cygwin with the latest code from github - http://domasofan.spdns.eu/tesseract/ ShreeDevi

Re: [tesseract-ocr] Re: Memory leak

2015-08-14 Thread ShreeDevi Kumar
It may be best to post this as an issue - sent from my phone. excuse the brevity and typos. On 13 Aug 2015 15:30, Anshul Maheshwari anshul.ffm...@gmail.com wrote: I have pasted valgrind output, where tesseract is just linked not used any single api of tessearct in my code. then it have

Re: [tesseract-ocr] tesseract on cygwin

2015-07-27 Thread ShreeDevi Kumar
- International Components for Unicode: Layout library icu-lx icu-lx - International Components for Unicode: Paragraph Layout library $ pkg-config --libs icu-i18n -licui18n -licuuc -licudata -lpthread -lm On 7/27/2015 9:05 AM, ShreeDevi Kumar wrote: Marco, Please see

Re: [tesseract-ocr] tesseract on cygwin

2015-07-23 Thread ShreeDevi Kumar
to file. greetings, simon Am 23.07.2015 um 04:55 schrieb ShreeDevi Kumar: http://domasofan.spdns.eu/tesseract/how%20to%20install.txt Excellent instructions, Simon. I am downloading and will give it a try under Windows8. I would suggest that you add 'Tesseract for Windows' as a heading

Re: [tesseract-ocr] displayed version number of tesseract when compiled from git

2015-07-23 Thread ShreeDevi Kumar
Zdenko, Just to confirm, Is it OK to use the newer releases from https://github.com/tesseract-ocr/tesseract/releases for distribution or is the latest code for distribution 3.04.00? Thanks! ShreeDevi भजन - कीर्तन - आरती @

Re: [tesseract-ocr] tesseract on cygwin

2015-07-26 Thread ShreeDevi Kumar
Thank you, Marco. 1. Is there a way to download just the tesseract package and dependencies (like Simon had setup) for testing purposes for those who do not have a cygwin install? 2. The pdf output option (as far as I understand it) adds the OCRed text layer on top of copy of the original image,

Re: [tesseract-ocr] tesseract on cygwin

2015-07-27 Thread ShreeDevi Kumar
**: training tools don't build #61* - sent from my phone. excuse the brevity and typos. On 27 Jul 2015 11:50, Marco Atzeri marco.atz...@gmail.com wrote: On 7/27/2015 4:54 AM, ShreeDevi Kumar wrote: Thank you, Marco. 1. Is there a way to download just the tesseract package and dependencies (like

Re: [tesseract-ocr] require tesseract.exe of 3.04 version.

2015-07-25 Thread ShreeDevi Kumar
You can test the Cygwin-compiled Windows binaries by Simon. However, pdf output is not working in it. On 25 Jul 2015 16:07, Sriranga(81+yrsold) withblessi...@gmail.com wrote: thanks for the information. On 21 July 2015 at 05:09, ShreeDevi

Re: [tesseract-ocr] differences between version 3.03 and 3.04

2015-07-13 Thread ShreeDevi Kumar
Mark, 3.04 is officially going to be released soon. Can you share your experience with windows build to help in that process. - sent from my phone. excuse the brevity. On 11 Jul 2015 10:44, Mark Seidner topo...@gmail.com wrote: Hi everyone, I downloaded the latest 3.04 code from git and

Re: [tesseract-ocr] building tesseract on windows using cygwin

2015-07-21 Thread ShreeDevi Kumar
as any other. Why it should be tagged??? Zdenko On Tue, Jul 21, 2015 at 6:44 AM, ShreeDevi Kumar shreesh...@gmail.com wrote: Zdenko, How is this update tagged? Is there a version number with it for future ref. - sent from my phone. excuse the brevity. On 21 Jul 2015 00:09, zdenko

Re: [tesseract-ocr] require tesseract.exe of 3.04 version.

2015-07-21 Thread ShreeDevi Kumar
I don't think a windows binary of 3.04.00 has been made available. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Mon, Jul 20, 2015 at 6:11 PM, sriranga(82yrsold) withblessing.sriranga.1...@gmail.com wrote: From

Re: [tesseract-ocr] Multiple tiff processing

2015-07-21 Thread ShreeDevi Kumar
for f in *.tif; do tesseract "$f" "$f" hocr; done
ShreeDevi On Tue, Jul 21, 2015 at 4:29 PM, Stathis L. doombringer...@gmail.com wrote: Does anybody know how to process multiple

Re: [tesseract-ocr] Multiple tiff processing

2015-07-21 Thread ShreeDevi Kumar
That for loop is for a bash script - please see http://www.cyberciti.biz/faq/bash-for-loop/ for examples. ShreeDevi On Tue, Jul 21, 2015 at 6:36 PM, Stathis L.
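The for loop from the previous message can be put into a small script so that it handles an empty directory and file names with spaces. The sketch below is one way to do that; the function name ocr_dir and the TESSERACT_BIN override are assumptions, and unlike the original loop it strips the .tif extension from the output base.

```shell
# Hypothetical helper: run tesseract with the hocr config on every .tif in a directory.
# TESSERACT_BIN is an assumed override (e.g. TESSERACT_BIN=echo for a dry run).
ocr_dir() {
    local dir="${1:-.}" f
    for f in "$dir"/*.tif; do
        [ -e "$f" ] || continue                                   # glob matched nothing
        "${TESSERACT_BIN:-tesseract}" "$f" "${f%.tif}" hocr        # output base: name without .tif
    done
}
```

Running TESSERACT_BIN=echo ocr_dir scans first prints the commands without executing them, which is a cheap way to check the file list.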

Re: [tesseract-ocr] Re: How to use use tesseract to extract regional Indian text such as Marathi or Hindi?

2015-10-28 Thread ShreeDevi Kumar
There is Marathi traineddata. However, it is not trained with the cube engine and hence may not be as accurate. http://packages.ubuntu.com/wily/tesseract-ocr-mar You can test with both hin and mar and report your experience. Thanks! On 28 Oct 2015 14:16,

Re: [tesseract-ocr] Re: How to use use tesseract to extract regional Indian text such as Marathi or Hindi?

2015-10-28 Thread ShreeDevi Kumar
For Indian languages, also check out the OCR feature in Google Drive/Docs. On 28 Oct 2015 17:34, "ShreeDevi Kumar" <shreesh...@gmail.com> wrote: > There is marathi traineddata. However that is not trained with cube engine > and hence

Re: [tesseract-ocr] Is there any difference using Tesseract on a mac or pc ?

2015-10-14 Thread ShreeDevi Kumar
manpages.ubuntu.com/manpages/precise/man1/tesseract.1.html On 14 Oct 2015 19:40, "Bill Wong" wrote: > I've been comparing for the same image on PC and MAC, the results differ a > lot. > My images are PNG files, in french

Re: [tesseract-ocr] Is there any difference using Tesseract on a mac or pc ?

2015-10-14 Thread ShreeDevi Kumar
To use a particular language, the syntax is -l fra, not -fra. On 14 Oct 2015 19:40, "Bill Wong" wrote: > I've been comparing for the same image on PC and MAC, the results differ a > lot. > My images are PNG files, in french
