Re: [tesseract-ocr] Covering ASCII Extended range.

2014-11-14 Thread Ryan Dev
> asc traineddata does not have a wordlist or dictionary, so using eng will help with that.

You mean unpack the wordlist from eng and pack it into the asc one? Or run tesseract with "eng+asc"? Currently I run each language in complete isolation from the others and figure out the results…
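Both options in this question map onto real Tesseract command-line tools. The sketch below is minimal and assumes a few things not stated in the thread: the Tesseract 3.02+ tools (`tesseract`, `combine_tessdata`, `dawg2wordlist`, `wordlist2dawg`) are on PATH, and both `.traineddata` files sit in the working directory. The image and output names are illustrative. It only builds the command lines, so every step can be inspected before anything runs.

```python
import subprocess
from typing import List


def combined_language_cmd(image: str, outbase: str, langs: List[str]) -> List[str]:
    # Option 1: load several traineddata files together by joining their
    # codes with '+', e.g. "-l eng+asc".
    return ["tesseract", image, outbase, "-l", "+".join(langs)]


def wordlist_transfer_cmds(eng: str = "eng", target: str = "asc") -> List[List[str]]:
    """Option 2: copy eng's word list into the target language's traineddata."""
    return [
        # 1. Unpack both models into component files (eng.word-dawg, asc.unicharset, ...).
        ["combine_tessdata", "-u", eng + ".traineddata", eng + "."],
        ["combine_tessdata", "-u", target + ".traineddata", target + "."],
        # 2. Turn eng's word DAWG back into a plain-text word list.
        ["dawg2wordlist", eng + ".unicharset", eng + ".word-dawg", eng + ".wordlist"],
        # 3. Rebuild a DAWG from that list against the target's unicharset.
        ["wordlist2dawg", eng + ".wordlist", target + ".word-dawg", target + ".unicharset"],
        # 4. Write the new component back into the target traineddata.
        ["combine_tessdata", "-o", target + ".traineddata", target + ".word-dawg"],
    ]


def run_all(cmds: List[List[str]]) -> None:
    for cmd in cmds:
        subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    print(combined_language_cmd("page.png", "out", ["eng", "asc"]))
    for step in wordlist_transfer_cmds():
        print(" ".join(step))
```

Words that contain characters missing from `asc.unicharset` may be skipped by `wordlist2dawg`. The transferred list can therefore end up smaller than eng's.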

Re: [tesseract-ocr] Configure for single character recognition

2014-11-14 Thread Ryan Dev
It looks like all your characters are uppercase, but in case that is not always so: in my experience with per-character OCR, Tesseract cannot handle capitalization properly. That is, is it a 'c' or a 'C'? I lay out all my characters in a straight line and get much better results us…
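The layout trick described here can be sketched without any imaging library. The sketch assumes each character crop is a rectangular grid of grayscale values (0 = black, 255 = white). The function name, gap and margin are hypothetical, and the rest of Ryan's method is cut off in the snippet. Crops are bottom-aligned and never rescaled, so relative height survives. That height is the cue that separates 'c' from 'C'.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixels: 0 = black, 255 = white


def lay_out_in_line(glyphs: List[Image], gap: int = 8, margin: int = 10,
                    background: int = 255) -> Image:
    """Paste rectangular glyph crops side by side on one blank strip.

    Crops are bottom-aligned and never rescaled, so a lowercase 'c' stays
    shorter than an uppercase 'C'. Bottom alignment is a simplification:
    it ignores descenders such as 'g' or 'p'.
    """
    if not glyphs:
        raise ValueError("need at least one glyph")
    height = max(len(g) for g in glyphs)
    widths = [len(g[0]) if g else 0 for g in glyphs]
    canvas_w = 2 * margin + sum(widths) + gap * (len(glyphs) - 1)
    canvas_h = 2 * margin + height
    canvas = [[background] * canvas_w for _ in range(canvas_h)]
    x = margin
    for glyph, w in zip(glyphs, widths):
        top = margin + height - len(glyph)  # shared bottom line
        for dy, row in enumerate(glyph):
            for dx, px in enumerate(row[:w]):
                canvas[top + dy][x + dx] = px
        x += w + gap
    return canvas
```

Save the resulting strip as an image and OCR it as a single text line (for example with psm 7, "treat the image as a single text line"). Writing the file is left to whatever imaging library is already in use.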

Re: [tesseract-ocr] Configure for single character recognition

2014-11-14 Thread ShreeDevi Kumar
Have you tried the existing English traineddata? I get good recognition with your 'prepared-image'. If that is the kind of image you need to OCR, you could run it with psm 6 and then split the letters apart. ShreeDevi
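Here is ShreeDevi's suggestion as a sketch. It OCRs the whole block with page segmentation mode 6 ("assume a single uniform block of text"), then splits the result into letters. Two assumptions apply. First, it targets Tesseract 4 or later, which spells the option `--psm` (3.x used `-psm`). Second, the installed release must accept `stdout` as the output base.

```python
import subprocess
from typing import List


def block_ocr_cmd(image: str, lang: str = "eng") -> List[str]:
    # psm 6 = "assume a single uniform block of text".
    # "stdout" as output base prints the text instead of writing a .txt file.
    return ["tesseract", image, "stdout", "-l", lang, "--psm", "6"]


def split_letters(text: str) -> List[str]:
    """Turn recognized text into one entry per non-whitespace character."""
    return [ch for ch in text if not ch.isspace()]


def ocr_letters(image: str, lang: str = "eng") -> List[str]:
    result = subprocess.run(block_ocr_cmd(image, lang), check=True,
                            capture_output=True, text=True)
    return split_letters(result.stdout)
```

Splitting after recognition lets Tesseract use line context (character heights and spacing) during OCR. Cropping first and OCRing each letter alone discards that context.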

[tesseract-ocr] baseline in hOCR output

2014-11-14 Thread Janusz S. Bien
What is the meaning of the baseline parameters? In my output I have, e.g., "baseline -0.013 0" and "baseline -0.003 -18". What do they mean? Best regards, Janusz -- Prof. dr hab. Janusz S. Bień - University of Warsaw (Department of Formal Linguistics)
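One reading of these numbers assumes the hOCR convention. Under it, the values are polynomial coefficients, highest degree first. They are measured from the bottom-left corner of the line's `bbox`, with y growing downward as in image pixels. `baseline -0.003 -18` is then a slope of -0.003 with an offset of -18 px. The baseline sits 18 px above the bottom of the box, which leaves room for descenders. `baseline -0.013 0` describes a line whose baseline starts exactly at the bottom edge. The sketch parses a `title` attribute and evaluates the baseline. The sample bbox is made up.

```python
from typing import Dict, List, Tuple

BBox = Tuple[int, int, int, int]  # x0, y0, x1, y1 in pixels; y grows downward


def parse_title(title: str) -> Tuple[BBox, List[float]]:
    """Read 'bbox' and 'baseline' from an hOCR title attribute."""
    props: Dict[str, List[str]] = {}
    for part in title.split(";"):
        fields = part.split()
        if fields:
            props[fields[0]] = fields[1:]
    x0, y0, x1, y1 = (int(v) for v in props["bbox"])
    coeffs = [float(v) for v in props.get("baseline", ["0", "0"])]
    return (x0, y0, x1, y1), coeffs


def baseline_y(bbox: BBox, coeffs: List[float], x: float) -> float:
    """Image y of the baseline at column x.

    The coefficients form a polynomial in (x - x0), highest degree first,
    whose value is added to the bottom edge y1 of the bbox.
    """
    x0, _, _, y1 = bbox
    dx = x - x0
    offset = 0.0
    for c in coeffs:  # Horner's rule
        offset = offset * dx + c
    return y1 + offset


if __name__ == "__main__":
    box, coeffs = parse_title("bbox 100 200 600 250; baseline -0.003 -18")
    print(baseline_y(box, coeffs, 100))  # left edge: 18 px above the bbox bottom
```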

[tesseract-ocr] Re: Exception in thread "main" java.lang.UnsatisfiedLinkError: liblept.so.4: Cannot load Shared-Object

2014-11-14 Thread Patrick Vöhrs
Now I get a new error: [workspace/project]bin/libtesseract.so: Ungültiger ELF-Header (invalid ELF header). But I use a 32-bit library on a 32-bit JRE. What's wrong with it?

On Wednesday, November 12, 2014 12:35:42 UTC+1, Patrick Vöhrs wrote:
> Hi all,
> It's my first time using Tesseract, but I get only erro…
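A loader's "invalid ELF header" usually means the file is not an ELF shared object at all. It might be a Windows DLL, a text file, or a truncated copy. A genuine 32/64-bit mismatch is normally reported by Java as "wrong ELF class" instead. The minimal sketch below reads the first 20 bytes of a file to tell these cases apart. Its machine table covers only a few common values.

```python
from typing import Dict

ELF_MAGIC = b"\x7fELF"
CLASSES = {1: "32-bit", 2: "64-bit"}  # EI_CLASS values
# e_machine values for a few common architectures.
MACHINES = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def describe_elf(header: bytes) -> Dict[str, str]:
    """Classify the first 20 bytes of a file."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        # Too short or wrong magic: the case behind an "invalid ELF header" error.
        return {"elf": "no"}
    byteorder = "little" if header[5] == 1 else "big"  # EI_DATA: 1 = LSB, 2 = MSB
    machine = int.from_bytes(header[18:20], byteorder)  # e_machine field
    return {
        "elf": "yes",
        "class": CLASSES.get(header[4], "unknown"),
        "machine": MACHINES.get(machine, str(machine)),
    }


def describe_elf_file(path: str) -> Dict[str, str]:
    with open(path, "rb") as f:
        return describe_elf(f.read(20))
```

Running `describe_elf_file` on the `libtesseract.so` that Java actually picks up shows which case applies: not ELF, the wrong word size, or the wrong architecture.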

Re: [tesseract-ocr] Re: query on French Script MT tif images

2014-11-14 Thread iram akbar
Hi, in the Tesseract tessdata folder there are several cube files (ara.cube (word frequency file), ara.cube.nm, ara.cube.LM, etc.). Can anyone tell me the purpose of these files and who created them? I am working on Arabic training and want to create those files, although until now I am su…

[tesseract-ocr] Configure for single character recognition

2014-11-14 Thread Simon Støvring
Hello, I am trying to recognize single characters written in the Gotham Bold font. I have trained Tesseract by following Michael Jay Lissner's guide "Adding New Fonts to Tesseract 3 OCR Engine". I t…
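For single glyphs, Tesseract has a dedicated page segmentation mode, 10 ("treat the image as a single character"). The sketch below builds that command line and assumes Tesseract 4+ option spelling (3.x used `-psm 10`). The language code `gotham` is hypothetical and stands in for the custom-trained font. `tessedit_char_whitelist` is a real Tesseract variable, but depending on version and engine it may be ignored.

```python
from typing import List, Optional


def single_char_cmd(image: str, lang: str,
                    whitelist: Optional[str] = None) -> List[str]:
    """Command line for OCR of one isolated character."""
    # psm 10 = "treat the image as a single character".
    cmd = ["tesseract", image, "stdout", "-l", lang, "--psm", "10"]
    if whitelist:
        # Limit results to the expected symbols; some versions/engines ignore this.
        cmd += ["-c", "tessedit_char_whitelist=" + whitelist]
    return cmd


if __name__ == "__main__":
    # 'gotham' is a hypothetical language code for the custom-trained font.
    print(single_char_cmd("char.png", "gotham", "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
```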

[tesseract-ocr] Re: Exception in thread "main" java.lang.UnsatisfiedLinkError: liblept.so.4: Cannot load Shared-Object

2014-11-14 Thread Patrick Vöhrs
Thanks for your reply. I have used Tess4J but cannot get it working: it starts, but the project is missing some libraries. I have already installed Leptonica.

On Wednesday, November 12, 2014 12:35:42 UTC+1, Patrick Vöhrs wrote:
> Hi all,
> It's my first time using Tesseract, but I get…
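Tess4J loads Tesseract and Leptonica through JNA, so "missing libraries" usually means one of two things. Either the dynamic loader cannot see `liblept`/`libtesseract`, or it finds an incompatible build. A quick check that does not involve Java is to ask the platform loader directly. This Python sketch uses `ctypes.util.find_library` and `ctypes.CDLL`. The library names it probes (`lept`, `leptonica`, `tesseract`) are assumptions meant to cover older and newer Leptonica naming.

```python
import ctypes
import ctypes.util
from typing import Dict, Optional, Sequence


def locate_native_libs(
    names: Sequence[str] = ("lept", "leptonica", "tesseract"),
) -> Dict[str, Optional[str]]:
    """Ask the platform's library search logic for each name.

    ctypes.util.find_library returns a name such as 'liblept.so.5' when it
    finds the library, or None when it does not.
    """
    return {name: ctypes.util.find_library(name) for name in names}


def try_load(path: str) -> str:
    """Attempt to load a shared library; return 'ok' or the loader's error text."""
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        # The loader message (missing dependency, bad ELF header, ...) is the
        # same kind of detail Java wraps in UnsatisfiedLinkError.
        return str(exc)
    return "ok"


if __name__ == "__main__":
    for name, hit in locate_native_libs().items():
        print(f"{name}: {hit or 'NOT FOUND'}")
        if hit:
            print("  load:", try_load(hit))
```

If a library is found here but not by Java, try pointing the JNA property `jna.library.path` (or `LD_LIBRARY_PATH`) at its directory. That is the usual next step.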

Re: [tesseract-ocr] मराठी ओसीआर (Marathi OCR)

2014-11-14 Thread ShreeDevi Kumar
Amarjeet, glad that you are getting 70-80% correct OCR for Marathi using the Konkani traineddata I posted. The Hindi traineddata was trained by Google with the 'cube' method, but that method is not available to us. The training can be improved with better training text or a font similar to the one being OCRed…

[tesseract-ocr] Re: Any multi-tiff tr file shapeclustering method?

2014-11-14 Thread summy00
I mean the training doesn't correct the misrecognition; as you can see, the last page of the TIFF is the original, full English text.

On Friday, November 14, 2014 4:26:37 PM UTC+8, summy00 wrote:
> Hi all, I had forgotten the "Compute the Character Set" step; after adding it, it works well.
> But I find it do…

[tesseract-ocr] मराठी ओसीआर (Marathi OCR)

2014-11-14 Thread Amarjeet Chopade
Respected Shreeshree-ji, many thanks for sending the Konkani traineddata file. I used it right away, and the OCR is 70 to 80 percent correct. I have one question: in the tessdata folder, besides the traineddata file for Hindi, there are seven other files. What are they? What if similar files were made for Konkani and pasted in there? Again, …

[tesseract-ocr] Re: Any multi-tiff tr file shapeclustering method?

2014-11-14 Thread summy00
> Hi all, I had forgotten the "Compute the Character Set" step; after adding it, it works well.
> But I find it doesn't recognize well after training. Any suggestions?
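The "Compute the Character Set" step (`unicharset_extractor`) and the clustering tools all accept several inputs at once. Multiple TIFF/box pairs can therefore be trained together. A multi-page TIFF with a matching box file should yield one `.tr` file. The sketch below lays out the Tesseract 3.x command sequence. It assumes files named `<stem>.tif`/`<stem>.box` and an existing `font_properties` file. The stems in the test are hypothetical.

```python
from typing import List, Sequence


def training_cmds(lang: str, stems: Sequence[str],
                  font_properties: str = "font_properties") -> List[List[str]]:
    """Tesseract 3.x training steps over several image/box pairs."""
    # One .tr per image; a multi-page TIFF with a matching multi-page box
    # file yields a single .tr covering all of its pages.
    cmds = [["tesseract", s + ".tif", s, "box.train"] for s in stems]
    boxes = [s + ".box" for s in stems]
    trs = [s + ".tr" for s in stems]
    cmds += [
        # "Compute the Character Set": writes ./unicharset from all box files.
        ["unicharset_extractor", *boxes],
        ["shapeclustering", "-F", font_properties, "-U", "unicharset", *trs],
        ["mftraining", "-F", font_properties, "-U", "unicharset",
         "-O", lang + ".unicharset", *trs],
        ["cntraining", *trs],
    ]
    return cmds
```

After `cntraining`, the outputs (`inttemp`, `pffmtable`, `normproto`, `shapetable`) are renamed with the language prefix and packed with `combine_tessdata`, as in the standard 3.x training procedure.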