That's a great idea -- I don't have spare time for new projects at the
moment, but I wonder if something like OpenOCR might be useful as a
starting point for an effort like this: https://github.com/tleyden/open-ocr
On Tuesday, March 21, 2017 at 4:03:52 PM UTC-4, Rich Jones wrote:
>
> Hello,
Hi Zdenko,
Sure, no problem -- I've made all the files, along with instructions, available at
https://github.com/ddohler/tesseract-georgian
Cheers,
Derek
On Fri, Apr 3, 2015 at 4:06 AM, zdenko podobny zde...@gmail.com wrote:
Can you create a repository for your training (in sourceforge or github
can
handle (~10px), so I doubled the image size. This resulted in much improved
recognition; there are still errors, but fewer of them and they make
sense now. Tesseract isn't able to segment the 5-column page layout very
well, but otherwise I'm pretty happy with the results.
Derek
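For anyone who wants to try the same resizing step, here is a minimal sketch of doubling an image with nearest-neighbour scaling. The message above doesn't say which tool or interpolation was used, so the function name `upscale_nearest` and the list-of-rows image representation are illustrative assumptions only; in practice an image library would do this job.

```python
def upscale_nearest(pixels, factor=2):
    """Return a copy of `pixels` (a list of rows of grayscale values)
    enlarged by an integer `factor` using nearest-neighbour scaling.

    Hypothetical helper for illustration; not taken from the thread.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    scaled = []
    for row in pixels:
        # Repeat every pixel `factor` times horizontally...
        wide_row = [value for value in row for _ in range(factor)]
        # ...then repeat the widened row `factor` times vertically.
        for _ in range(factor):
            scaled.append(list(wide_row))
    return scaled
```

With `factor=2`, glyphs that are roughly 10px tall come out roughly 20px tall, which is the doubling described above.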
On Thu, Apr 2
ShreeDevi,
Where did this training text come from? It includes two different Georgian
scripts (mkhedruli and asomtavruli). Only mkhedruli is in common usage
today, so it seems to me that it would be best to remove the asomtavruli to
increase accuracy on modern texts. If complete historical
this training set once it has been improved
somewhat.
Cheers,
Derek
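As a rough sketch of the clean-up suggested above: in Unicode, the asomtavruli capitals sit in U+10A0..U+10CF of the Georgian block, while mkhedruli letters start at U+10D0, so the two scripts can be separated by code point. The function name `strip_asomtavruli` is hypothetical and not part of any repository mentioned in this thread. Whether to drop individual characters (as here) or whole lines containing them is a judgment call that depends on the training text.

```python
# Code point range holding asomtavruli capitals in the Unicode Georgian block.
# Mkhedruli letters begin at U+10D0 and are left untouched.
ASOMTAVRULI_FIRST = 0x10A0
ASOMTAVRULI_LAST = 0x10CF


def strip_asomtavruli(text):
    """Return `text` with every asomtavruli character removed.

    Hypothetical helper for illustration only.
    """
    return "".join(
        ch for ch in text
        if not (ASOMTAVRULI_FIRST <= ord(ch) <= ASOMTAVRULI_LAST)
    )
```

For example, `strip_asomtavruli("\u10A0\u10D0")` keeps only the mkhedruli letter U+10D0.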
On Tuesday, June 3, 2014 12:03:43 PM UTC-4, Nick White wrote:
Hi Derek,
Thanks for this. It does indeed look pretty good, from my brief
testing (though I don't know Georgian at all, so I'm only basing it
on those shapes look
to the scripts and README to make this clear, so I
suggest doing 'git pull' to get the latest copy.
Hope that helps!
Derek
On Sunday, June 3, 2012 10:29:26 PM UTC+4, shikamuk wrote:
Hey, Derek.
Thank you for the scripts, they seem to work.
However, a couple of questions:
0. So, I've
on the list in case anyone else
finds them useful.
Just a heads-up: the default language is Georgian because that's what I'm
training for, so make sure to change it to your language when training.
https://github.com/ddohler/tess_school
Cheers,
Derek
--
You received this message because you
Strangely, the spammer who was sending tons of messages to the list is
also Georgian and claims that his software works on the Georgian
language. I'm planning to download his software tonight and (after
carefully checking for viruses) test it out.
Will respond to PM momentarily.
Derek
On May 17
Hi Roast,
It is locally adapted binarization; see here for more details:
http://www.leptonica.com/binarization.html
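To give a feel for the idea, here is a minimal sketch of one simple form of locally adaptive thresholding: each pixel is compared against the mean of its own neighbourhood rather than a single global threshold. This is not Leptonica's implementation (the page linked above describes its actual methods); the function name `binarize_local_mean` and the `radius` and `offset` parameters are assumptions made for illustration.

```python
def binarize_local_mean(gray, radius=1, offset=10):
    """Binarize `gray` (a list of rows of 0-255 values) pixel by pixel.

    A pixel becomes white (255) if it is brighter than the mean of its
    (2*radius+1)-wide neighbourhood minus `offset`, otherwise black (0).
    Hypothetical sketch, not Leptonica's algorithm.
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    result = []
    for y in range(height):
        out_row = []
        for x in range(width):
            # Clip the window at the image borders.
            y0, y1 = max(0, y - radius), min(height, y + radius + 1)
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            window = [gray[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            local_mean = sum(window) / len(window)
            out_row.append(255 if gray[y][x] > local_mean - offset else 0)
        result.append(out_row)
    return result
```

The `offset` keeps flat background regions white; without it, a pixel equal to its local mean would be classified as ink.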
On Mon, Feb 20, 2012 at 2:30 PM, Roast zhang.lib...@gmail.com wrote:
Hi, Derek Dohler, could you tell me in detail how you processed the image to get
the better result?
Thanks
that your results will improve significantly.
Derek
On Feb 19, 2012, at 4:58 , Jason Funk wrote:
My specific examples are screen captures of powerpoint slides. For
example, what would need to be done to this image?
http://jasonfunk.net/example2.jpeg
On Feb 18, 6:03 pm, Sven Pedersen
Hi Sriranga,
Many thanks for doing this -- I haven't had time to test it myself yet.
What is your assessment of the effect on processing time?
Cheers,
Derek
2012/2/9 Sriranga(78yrs) withblessing.sriranga.1...@gmail.com
Derek,
Again tested using version 3.02 for combinations of * four
I'm excited by this:
Added simultaneous multi-language capability.
Can you provide any info on how this works?
Cheers,
Derek
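For reference, my understanding is that from 3.02 onward several languages can be requested in one run by joining their codes with `+` in the `-l` option. The sketch below only builds such a command line; it says nothing about how the feature works internally, which is what the question above asks. The helper name `build_tesseract_command` and the file names in the usage note are hypothetical.

```python
def build_tesseract_command(image, output_base, languages):
    """Return an argument list for a multi-language tesseract run.

    `languages` is a list of traineddata codes such as "kat" or "eng".
    Hypothetical helper; it only assembles arguments and runs nothing.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    # Language codes are joined with '+' and passed to the -l option.
    return ["tesseract", image, output_base, "-l", "+".join(languages)]
```

For example, `build_tesseract_command("page.png", "page", ["kat", "eng"])` gives the arguments for a Georgian plus English run, which could then be passed to `subprocess.run`.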
On Fri, Feb 3, 2012 at 4:32 PM, Sriranga(78yrsold)
withblessi...@gmail.com wrote:
Attached release notes for 3.02. Download can be done from svn of the
project
to configure tesseract.
There isn't much explanation about what they *do*, but hopefully that's enough
to get you started.
Derek
On Jun 17, 2011, at 12:12 , Steve wrote:
Where can I find a complete list of [config] parameters for using in:
tesseract image outputbasename [configs]
I searched
in some cases. I am hoping that
providing a punc-dawg is the solution, but I haven't been able to find a good
resource for this, either in the list archives or in the source files.
Can anyone tell me what type of file I should use to create the punc-dawg and
number-dawg files?
Thanks!
Derek
+English, Georgian+English+Russian, and use the appropriate one. This
is my fallback option since it seems the most likely to work while maintaining
maximum accuracy.
Any advice, please let me know, thanks!
Derek