Training Tesseract 3

2010-04-12 Thread rkvsraman
Hello, I am not able to find the training manual for Tesseract 3. Please point me to one. Thanks -Raman -- You received this message because you are subscribed to the Google Groups tesseract-ocr group. To post to this group, send email to tesseract-...@googlegroups.com. To unsubscribe from

Traineddata Models for Indic Languages

2011-07-11 Thread rkvsraman
Hi, I am happy to announce the availability of Tesseract traineddata models for Indian languages. Currently the models are available for 9 Indic scripts: 1) Hindi 2) Tamil 3) Malayalam 4) Telugu 5) Kannada 6) Oriya 7) Gujarati 8) Gurmukhi 9) Bengali. Please download the models from

Re: Shirorekha Splitting for Bengali

2013-02-14 Thread rkvsraman
rkvsraman rkvs...@gmail.com wrote: What code changes do I make for Tesseract to understand that Shirorekha splitting is required for Bengali or Punjabi? Thanks -Raman

Re: Shirorekha Splitting for Bengali

2013-02-14 Thread rkvsraman
I am still not able to find the place in the Tesseract code where Tesseract decides to do shirorekha clipping based on the language.

Re: Shirorekha Splitting for Bengali

2013-02-14 Thread rkvsraman
at hin.config for clues. On Thu, Feb 14, 2013 at 12:05:52AM -0800, rkvsraman wrote: I do understand how clipping is done. What I need to know is how to direct Tesseract to do shirorekha clipping for a new language. For example, if I provide the -l hin argument to tesseract, it does

Re: [tesseract-ocr] Announcing the Indic-OCR Project

2017-01-19 Thread rkvsraman
ntermediate process. Aborting... ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Fri, Dec 9, 2016 at 1:13 AM, rkvsraman <rkvs...@gmail.com> wrote: > Over past c

[tesseract-ocr] ERROR: Unrecognized argument --run_shape_clustering

2016-09-14 Thread rkvsraman
Hello, I am training for Hindi using tesstrain.sh and I am getting this error: "ERROR: Unrecognized argument --run_shape_clustering". run_shape_clustering is recommended for Indic languages. A quick grep shows that the argument is not used anywhere. Is it not used anymore?

[tesseract-ocr] Number of Devnagari characters in unicharset

2016-09-14 Thread rkvsraman
Hello, The sample 'hin' langdata shows more than 1700 lines in hin.cube-unicharset, while when we generate the unicharset from the training_text, only about 1175 characters are generated. Are we missing something, or was a different training text used for the sample unicharset?
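
A quick way to sanity-check the character count on the training-text side is to count distinct code points directly. The following is a minimal sketch, not part of the Tesseract tooling: the function name and the sample string are hypothetical. It counts single code points only, so it is not directly comparable to a unicharset whose entries may be multi-code-point units; that caveat is an assumption about how the sample file was built, not something stated in the thread.

```python
def distinct_codepoints(text):
    """Return the set of distinct non-whitespace code points in text."""
    return {ch for ch in text if not ch.isspace()}


# Hypothetical sample; in practice, read the training text file as UTF-8.
sample = "कमल कम"
print(len(distinct_codepoints(sample)))
```

Comparing this number with the line count of a unicharset file only gives a rough indication, since the two files may not use the same unit of counting.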

[tesseract-ocr] Re: Improving accuracy for the specific file

2016-10-04 Thread rkvsraman
You need to scan the page at a resolution of at least 300 DPI for good-quality recognition. I ran Tesseract and this is what I got (please find attached). On Tuesday, October 4, 2016 at 10:12:46 AM UTC+5:30, Neeraj Prakash wrote: > Hi All, > I have uploaded the source image file (actual.jpg) which was OCR-ed
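
When rescanning is not possible, one workaround is to upscale the image so that its pixel dimensions match what a 300 DPI scan would have produced. The sketch below only does the arithmetic; the function names are hypothetical, and the current DPI has to be known or estimated by the user. Upscaling does not add real detail, so this is an approximation of a proper 300 DPI scan, not a substitute for one.

```python
def scale_factor_for_dpi(current_dpi, target_dpi=300):
    """Factor by which to enlarge an image to reach target_dpi (never below 1)."""
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    return max(1.0, target_dpi / current_dpi)


def resized_dimensions(width, height, current_dpi, target_dpi=300):
    """Pixel dimensions the image would have at target_dpi."""
    factor = scale_factor_for_dpi(current_dpi, target_dpi)
    return round(width * factor), round(height * factor)


# A hypothetical 800x600 image captured at 96 DPI.
print(resized_dimensions(800, 600, 96))
```

The resulting dimensions can then be passed to any image tool for the actual resize.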

Re: [tesseract-ocr] Cube models for Marathi and Sanskrit

2016-09-21 Thread rkvsraman
Thank you for that info. That helps. By the way, which GUI do you use for running Tesseract, or do you use the command line? On Wednesday, September 21, 2016 at 2:34:04 PM UTC+5:30, shree wrote: > Also see the san.config file in the langdata directory > ShreeDevi

Re: [tesseract-ocr] Shapeclustering crashes on linux

2016-09-22 Thread rkvsraman
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Thu, Sep 22, 2016 at 5:14 PM, rkvsraman <rkvs...@gmail.com> wrote: > Hello, > I am running the shape clustering command and it crashes with followin

[tesseract-ocr] Cube models for Marathi and Sanskrit

2016-09-20 Thread rkvsraman
Hello, In tessdata, I see cube models only for Hindi and not for Marathi and Sanskrit, though they use the same script. Any particular reason for this?

Re: [tesseract-ocr] Shapeclustering crashes on linux

2016-09-22 Thread rkvsraman
Hi, Shapeclustering doesn't crash after I added those Devanagari files, but it has now been running for the past 45 minutes and still has not finished. Is that normal? On Thursday, September 22, 2016 at 6:30:10 PM UTC+5:30, rkvsraman wrote: > Let me try with Devanagari.* files > Thank

RE: [tesseract-ocr] Help with unicharambigs

2016-09-27 Thread rkvsraman
I tried v1 too. It wasn't very useful.

[tesseract-ocr] Specify font in Tesseract

2016-11-09 Thread rkvsraman
Hello, If I know the font of the scanned document, can I specify it to Tesseract during recognition? Something like: tesseract tam.png tamreco -l tam -font "Noto Sans Tamil" Can I do this? -Raman

[tesseract-ocr] Script Detection

2016-11-07 Thread rkvsraman
Hello, I tried to detect the script of the above Bengali image with the command tesseract ben.png bensc - -psm 0 and I get the following output in bensc.osd, which detects the script as Latin. Page number: 0 Orientation in degrees: 90 Rotate: 270 Orientation confidence: 1.48 Script: Latin
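
To check the detected script and its confidence across many images, the .osd output can be read into a dictionary. This is a minimal sketch that assumes the file holds one "key: value" pair per line, with the field names quoted in the message above; the function name is hypothetical and the sample text is copied from that message.

```python
def parse_osd(text):
    """Parse 'key: value' lines from an .osd file into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[key.strip()] = value.strip()
    return fields


sample = """Page number: 0
Orientation in degrees: 90
Rotate: 270
Orientation confidence: 1.48
Script: Latin"""

osd = parse_osd(sample)
print(osd["Script"], float(osd["Orientation confidence"]))
```

A low orientation confidence such as the 1.48 reported here can be used as a signal to treat the detected script with caution.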

Re: [tesseract-ocr] Tesseract cannot recognize clean webpage screenshot

2016-11-10 Thread rkvsraman
Check whether the DPI is about 300. Screenshots generally have a lower DPI. On Friday, November 11, 2016 at 5:50:23 AM UTC+5:30, JF wrote: > I have an app that needs to recognize text in screenshots. > Does that matter? I think this image is clean enough for Tesseract to recognize? > On

[tesseract-ocr] APPLY_BOXES: boxfile line FAILURE! Couldn't find a matching blob for മ in malayalam

2016-10-22 Thread rkvsraman
Hello, I am training Tesseract for Malayalam. The tif and box files and the tesstrain log are shared here: https://drive.google.com/drive/folders/0Bz8Xp0bwrlkdblNWMEZnaGpWTEk?usp=sharing Surprisingly, I get errors only for the blobs which have the character മ in them. These blobs
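
To inspect only the affected entries, the box file can be filtered for lines whose symbol contains the character in question. The sketch below assumes the usual box file layout of "symbol left bottom right top page" per line; the function names and the sample coordinates are hypothetical and are not taken from the shared files.

```python
def parse_box_line(line):
    """Split one box file line into (symbol, left, bottom, right, top, page)."""
    symbol, left, bottom, right, top, page = line.rstrip("\n").rsplit(" ", 5)
    return symbol, int(left), int(bottom), int(right), int(top), int(page)


def boxes_containing(lines, target):
    """Return the parsed boxes whose symbol contains the target character."""
    return [box for box in map(parse_box_line, lines) if target in box[0]]


# Hypothetical sample lines; real input would come from the .box file.
sample_lines = [
    "മ 10 20 40 60 0",
    "ക 45 20 70 60 0",
    "മ്മ 75 20 120 60 0",
]
for box in boxes_containing(sample_lines, "മ"):
    print(box)
```

Listing the coordinates this way makes it easier to compare the failing boxes against the image and see whether they share a size or position pattern.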

Re: [tesseract-ocr] segdemo starting without the arguments "segdemo inter"

2016-10-17 Thread rkvsraman
No, it does not. On Monday, October 17, 2016 at 3:38:16 PM UTC+5:30, shree wrote: > Does this also happen when you use the Google-provided kan.traineddata in the tesseract/tessdata repository? > On 17 Oct 2016 11:48 a.m., "rkvsraman" <rkvs...@gmail.com>

[tesseract-ocr] java scrollview starts without segdemo inter

2016-10-16 Thread rkvsraman
Hello, I am training for Kannada, and when I try to run OCR on an image after that, the Java ScrollView starts automatically without my passing the arguments segdemo inter. Any clues as to why that happens?

[tesseract-ocr] Announcing the Indic-OCR Project

2016-12-08 Thread rkvsraman
Over the past couple of months, I have been developing some tools based on Tesseract. I am happy to announce the general availability of these tools on the Indic-OCR GitHub project site. The URL is https://indic-ocr.github.io/ I would love to get some feedback from folks at tesseract-ocr. Best