Hello,
I can't find the training manual for Tesseract 3. Please
point me to one.
Thanks
-Raman
--
You received this message because you are subscribed to the Google Groups
tesseract-ocr group.
To post to this group, send email to tesseract-...@googlegroups.com.
To unsubscribe from
Hi,
I am happy to announce the availability of Tesseract traineddata models
for Indian languages.
Currently, models are available for 9 Indic scripts:
1) Hindi
2) Tamil
3) Malayalam
4) Telugu
5) Kannada
6) Oriya
7) Gujarati
8) Gurmukhi
9) Bengali
Please download the models from
rkvsraman <rkvs...@gmail.com> wrote:
What code changes do I need to make so that Tesseract knows Shirorekha
splitting is required for Bengali or Punjabi?
Thanks
-Raman
I still can't find the place in the Tesseract code where it decides,
based on the language, whether to do shirorekha clipping.
at hin.config for clues.
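For anyone finding this thread later: as far as I can tell, Tesseract 3.x does not hard-code which languages get shirorekha splitting. It is switched on by two parameters, pageseg_devanagari_split_strategy (used during page segmentation) and ocr_devanagari_split_strategy (used during recognition), whose values follow the ShiroRekhaSplitter::SplitStrategy enum: 0 = NO_SPLIT, 1 = MINIMAL_SPLIT, 2 = MAXIMAL_SPLIT. Below is a minimal sketch of a config fragment for another headline script such as Bengali or Gurmukhi; the file name is hypothetical and the value 1 is an assumption, so compare it with what hin.config actually sets.

```
# Hypothetical ben.config (or pan.config): enable shirorekha splitting.
# 0 = NO_SPLIT, 1 = MINIMAL_SPLIT, 2 = MAXIMAL_SPLIT; the value 1 is an assumption.
pageseg_devanagari_split_strategy 1
ocr_devanagari_split_strategy 1
```

Assuming the file is named ben.config and sits next to the other ben.* training files, combine_tessdata should pack it into ben.traineddata as the language config, so it applies whenever -l ben is used. Alternatively, the same two lines can be saved under tessdata/configs/ and that config name passed as the last argument on the tesseract command line.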
On Thu, Feb 14, 2013 at 12:05:52AM -0800, rkvsraman wrote:
I do understand how clipping is done. What I need to know is how to
direct Tesseract to do shirorekha clipping for a new language.
For example, if I pass the -l hin argument to Tesseract, it does
…intermediate process. Aborting...
>
>
> ShreeDevi
>
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Fri, Dec 9, 2016 at 1:13 AM, rkvsraman <rkvs...@gmail.com> wrote:
>
>>
>> Over past c
Hello,
I am training for Hindi using tesstrain.sh and I get this error:
"ERROR: Unrecognized argument --run_shape_clustering". run_shape_clustering
is recommended for Indic languages, but a quick grep shows that the
argument is not used anywhere.
Is it no longer used?
Hello,
The sample 'hin' langdata has more than 1700 lines in hin.cube-unicharset,
but when we generate the unicharset from the training_text, only about
1175 characters are generated.
Are we missing something, or was a different training text used for the
sample unicharset?
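One way to see where the extra entries come from is to list the unicharset entries that never occur in the training text. The sketch below assumes the standard unicharset layout (first line is the entry count, each later line starts with the unichar followed by a space, and NULL stands for the space entry); I have not checked that hin.cube-unicharset follows exactly this layout, and the function name is hypothetical. Because Indic unicharsets can hold whole clusters (consonant plus vowel sign, conjuncts) as single entries, each entry is matched as a substring rather than code point by code point.

```python
from typing import List


def entries_missing_from_text(unicharset_text: str, training_text: str) -> List[str]:
    """List unicharset entries that never appear in the training text.

    Assumes the standard unicharset layout: line 1 is the entry count and
    every following line starts with the unichar, then a space.
    """
    lines = unicharset_text.splitlines()
    count = int(lines[0].split()[0])
    missing = []
    for line in lines[1:count + 1]:
        unichar = line.split(" ", 1)[0]
        if unichar == "NULL":  # placeholder entry for the space character
            continue
        # Entries may be multi-code-point clusters, so test as substrings.
        if unichar not in training_text:
            missing.append(unichar)
    return missing


if __name__ == "__main__":
    # Simplified sample: real unicharset lines carry more property fields.
    sample = "4\nNULL 0 Common 0\nक 1 Devanagari 1\nकि 1 Devanagari 2\nख 1 Devanagari 3\n"
    print(entries_missing_from_text(sample, "कि क"))  # prints ['ख']
```

To run it on the real files, read hin.cube-unicharset and the training text as UTF-8 strings and pass them in; a long result list would point to the sample unicharset having been built from more text than the shipped training_text.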
You need to scan the page at a resolution of at least 300 DPI for
good-quality recognition.
I ran Tesseract and this is what I got (see attached).
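To check whether an existing scan meets that 300 DPI guideline, divide the image size in pixels by the physical page size in inches. A small sketch; the function names are hypothetical and US Letter (8.5 x 11 inches) is only an example page size.

```python
from typing import Tuple


def effective_dpi(pixels: int, inches: float) -> float:
    """Resolution along one axis: image pixels per inch of paper."""
    return pixels / inches


def pixels_needed(width_in: float, height_in: float, dpi: int = 300) -> Tuple[int, int]:
    """Minimum image size in pixels for a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


if __name__ == "__main__":
    print(pixels_needed(8.5, 11))    # (2550, 3300): US Letter at 300 DPI
    print(effective_dpi(1275, 8.5))  # 150.0: a Letter page only 1275 px wide
```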
On Tuesday, October 4, 2016 at 10:12:46 AM UTC+5:30, Neeraj Prakash wrote:
>
> Hi All,
>
> I have uploaded the source image file which (actual.jpg) was OCR-ed
Thank you for that info; it helps.
By the way, which GUI do you use for running Tesseract, or is it the
command line?
On Wednesday, September 21, 2016 at 2:34:04 PM UTC+5:30, shree wrote:
>
> Also see the san.config file in the langdata directory
>
> ShreeDevi
>
>
> On Thu, Sep 22, 2016 at 5:14 PM, rkvsraman <rkvs...@gmail.com> wrote:
>
>>
>> Hello,
>>
>> I am running the shape clustering command and it crashes with followin
Hello,
In tessdata, I see cube models only for Hindi and not for Marathi and
Sanskrit, though they use the same script.
Is there any particular reason for this?
Hi,
The shapeclustering tool no longer crashes after I added those Devanagari
files, but it has now been running for 45 minutes and still hasn't finished.
Is that normal?
On Thursday, September 22, 2016 at 6:30:10 PM UTC+5:30, rkvsraman wrote:
>
> Let me try with Devanagari.* files
>
> Thank
I tried v1 too. It wasn't very useful.
From: rkvsraman
Hello,
If I know the font used in the scanned document, can I specify it to
Tesseract during recognition?
Something like
tesseract tam.png tamreco -l tam -font "Noto Sans Tamil"
Can I do this?
-Raman
Hello,
I tried to detect the script of the above Bengali image with the command
tesseract ben.png bensc - -psm 0
and I get the following output in bensc.osd, which detects the script as
Latin.
Page number: 0
Orientation in degrees: 90
Rotate: 270
Orientation confidence: 1.48
Script: Latin
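If this output needs to be checked programmatically (for example to flag results with a low orientation confidence like the one above), the .osd file is plain "Key: value" text. A minimal parsing sketch; parse_osd is a hypothetical helper, it only assumes the line format shown above, and any confidence threshold is left to the caller.

```python
def parse_osd(text: str) -> dict:
    """Turn 'Key: value' lines from Tesseract's OSD output into a dict.

    Values that parse as integers or decimals are converted; anything
    else (e.g. the script name) is kept as a string.
    """
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:  # skip blank or unexpected lines
            continue
        value = value.strip()
        for convert in (int, float):
            try:
                value = convert(value)
                break
            except ValueError:
                continue
        result[key.strip()] = value
    return result


if __name__ == "__main__":
    sample = """Page number: 0
Orientation in degrees: 90
Rotate: 270
Orientation confidence: 1.48
Script: Latin
"""
    osd = parse_osd(sample)
    print(osd["Script"], osd["Orientation confidence"])  # Latin 1.48
```

For the real file, pass in the contents of bensc.osd read as UTF-8 text.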
Check whether the DPI is about 300. Screenshots generally have a lower DPI.
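For PNG screenshots, the resolution a program recorded (if any) is stored in the optional pHYs chunk as pixels per metre, which converts to DPI by multiplying by 0.0254. A stdlib-only sketch, assuming the input is a PNG (JPEG and TIFF store resolution differently) and using the hypothetical name png_dpi. Some screenshot tools write no pHYs chunk at all, in which case it returns None.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
METRES_PER_INCH = 0.0254


def png_dpi(data: bytes):
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or None.

    None means there is no pHYs chunk before the image data, or its unit
    is 0 (aspect ratio only, no physical size).
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        if chunk_type == b"pHYs":
            x_ppm, y_ppm, unit = struct.unpack(">IIB", data[pos + 8:pos + 17])
            if unit != 1:  # 1 = pixels per metre
                return None
            return round(x_ppm * METRES_PER_INCH), round(y_ppm * METRES_PER_INCH)
        if chunk_type == b"IDAT":  # the PNG spec places pHYs before IDAT
            return None
        pos += 12 + length  # 4-byte length + 4-byte type + data + 4-byte CRC
    return None
```

Usage: open the screenshot in binary mode and call png_dpi(f.read()). A result such as (96, 96) suggests upscaling before OCR; the metadata only reports what the capturing program wrote, not how many pixels each glyph actually spans.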
On Friday, November 11, 2016 at 5:50:23 AM UTC+5:30, JF wrote:
>
> I have an app that needs to recognize text in screenshots.
>
> Does that matter? I think this image is clean enough for Tesseract to
> recognize?
>
> On
Hello,
I am training Tesseract for Malayalam. The tif and box files and the
tesstrain log are shared here:
https://drive.google.com/drive/folders/0Bz8Xp0bwrlkdblNWMEZnaGpWTEk?usp=sharing
Surprisingly, I get errors only for the blobs that contain the character മ.
These blobs
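To look at just those blobs, one option is to pull the boxes whose glyph contains മ out of the .box file and compare their coordinates with the tif. A sketch, assuming the Tesseract 3.x box-file line format "<glyph> <left> <bottom> <right> <top> <page>"; the Box type and helper names are hypothetical.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    glyph: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    # Split from the right: the last five fields are always integers,
    # so a glyph made of several code points stays in one piece.
    glyph, left, bottom, right, top, page = line.strip().rsplit(" ", 5)
    return Box(glyph, int(left), int(bottom), int(right), int(top), int(page))


def boxes_containing(box_text: str, char: str) -> List[Box]:
    """Return every box whose glyph contains the given character."""
    boxes = (parse_box_line(line) for line in box_text.splitlines() if line.strip())
    return [box for box in boxes if char in box.glyph]


if __name__ == "__main__":
    sample = "ക 10 20 30 40 0\nമ 35 20 55 40 0\nമ്മ 60 20 90 40 0\n"
    for box in boxes_containing(sample, "മ"):
        print(box)
```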
No, it does not.
On Monday, October 17, 2016 at 3:38:16 PM UTC+5:30, shree wrote:
>
> Does this also happen when you use the Google-provided kan.traineddata in
> the tesseract/tessdata repository?
>
> On 17 Oct 2016 11:48 a.m., "rkvsraman" <rkvs...@gmail.com>
Hello,
I am training for Kannada, and when I then try to OCR an image, the Java
ScrollView starts automatically even though I did not pass the segdemo or
inter arguments.
Any clues why that happens?
Over the past couple of months, I have been developing some tools based on
Tesseract.
I am happy to announce the general availability of these tools on the
Indic-OCR GitHub project site: https://indic-ocr.github.io/
I would love to get some feedback from folks at tesseract-ocr.
Best