[tesseract-ocr] Re: FOSS Project Proposal: tesseract-cloud

2017-03-22 Thread Derek
That's a great idea -- I don't have spare time for new projects at the moment, but I wonder if something like OpenOCR might be useful as a starting point for an effort like this: https://github.com/tleyden/open-ocr On Tuesday, March 21, 2017 at 4:03:52 PM UTC-4, Rich Jones wrote: > > Hello,

Re: [tesseract-ocr] New Georgian (kartuli ena) traineddata for Tesseract

2015-04-03 Thread Derek Dohler
Hi Zdenko, Sure, no problem -- I've made all the files, along with instructions, available at https://github.com/ddohler/tesseract-georgian Cheers, Derek On Fri, Apr 3, 2015 at 4:06 AM, zdenko podobny zde...@gmail.com wrote: Can you create a repository for your training (in sourceforge or github

Re: [tesseract-ocr] New Georgian (kartuli ena) traineddata for Tesseract

2015-04-02 Thread Derek Dohler
can handle (~10px), so I doubled the image size. This resulted in much improved recognition; there are still errors, but fewer of them and they make sense now. Tesseract isn't able to segment the 5-column page layout very well, but otherwise I'm pretty happy with the results. Derek On Thu, Apr 2

Re: [tesseract-ocr] Support Language

2014-11-07 Thread Derek
ShreeDevi, Where did this training text come from? It includes two different Georgian scripts (mkhedruli and asomtavruli). Only mkhedruli is in common usage today, so it seems to me that it would be best to remove the asomtavruli to increase accuracy on modern texts. If complete historical

Re: [tesseract-ocr] Re: Anyone working on Georgian (kartuli ena)?

2014-10-10 Thread Derek
this training set once it has been improved somewhat. Cheers, Derek On Tuesday, June 3, 2014 12:03:43 PM UTC-4, Nick White wrote: Hi Derek, Thanks for this. It does indeed look pretty good, from my brief testing (though I don't know Georgian at all, so I'm only basing it on those shapes look

Re: Scripts to semi-automate training

2012-06-04 Thread Derek
to the scripts and README to make this clear, so I suggest doing 'git pull' to get the latest copy. Hope that helps! Derek On Sunday, June 3, 2012 10:29:26 PM UTC+4, shikamuk wrote: Hey, Derek. Thank you for scripts, they seem to work. However, a couple of questions: 0. So, I've

Scripts to semi-automate training

2012-05-24 Thread Derek Dohler
on the list in case anyone else finds them useful. Just a heads-up, the default language is Georgian because that's what I'm training for, so make sure to change that to your language when training. https://github.com/ddohler/tess_school Cheers, Derek -- You received this message because you

Re: Anyone working on Georgian (kartuli ena)?

2012-05-17 Thread Derek
Strangely, the spammer who was sending tons of messages to the list is also Georgian and claims that his software works on the Georgian language. I'm planning to download his software tonight and (after carefully checking for viruses) test it out. Will respond to PM momentarily. Derek On May 17

Re: Tesseract vs Commercial Products

2012-02-20 Thread Derek Dohler
Hi Roast, It is locally adapted binarization; see here for more details: http://www.leptonica.com/binarization.html On Mon, Feb 20, 2012 at 2:30 PM, Roast zhang.lib...@gmail.com wrote: Hi, Derek Dohler, could you tell me the detail of process the image to get the better result? Thanks

Re: Tesseract vs Commercial Products

2012-02-18 Thread Derek Dohler
that your results will improve significantly. Derek On Feb 19, 2012, at 4:58 , Jason Funk wrote: My specific examples are screen captures of powerpoint slides. For example, what would need to be done to this image? http://jasonfunk.net/example2.jpeg On Feb 18, 6:03 pm, Sven Pedersen

Re: Version 3.02 in alpha

2012-02-08 Thread Derek Dohler
Hi Sriranga, Many thanks for doing this -- I haven't had time to test it myself yet. What is your assessment of the effect on processing time? Cheers, Derek 2012/2/9 Sriranga(78yrs) withblessing.sriranga.1...@gmail.com Derek, Again tested using version 3.02 for combinations of * four

Re: Version 3.02 in alpha

2012-02-03 Thread Derek Dohler
I'm excited by this: Added simultaneous multi-language capability. Can you provide any info on how this works? Cheers, Derek On Fri, Feb 3, 2012 at 4:32 PM, Sriranga(78yrsold) withblessi...@gmail.com wrote: Attached release notes for 3.02. Download can be done from svn of the project

Re: List of Config Paramenters

2011-06-17 Thread Derek Dohler
to configure tesseract. There isn't much explanation about what they *do*, but hopefully that's enough to get you started. Derek On Jun 17, 2011, at 12:12 , Steve wrote: Where can I find a complete list of [config] parameters for using in: tesseract image outputbasename [configs] I searched

Making punc-dawg and number-dawg?

2011-06-14 Thread Derek Dohler
in some cases. I am hoping that providing a punc-dawg is the solution, but I haven't been able to find a good resource for this, either in the list archives or in the source files. Can anyone tell me what type of file I should use to create the punc-dawg and number-dawg files? Thanks! Derek

Best multiple language option?

2011-06-03 Thread Derek Dohler
+English, Georgian+English+Russian, and use the appropriate one. This is my fallback option since it seems the most likely to work while maintaining maximum accuracy. Any advice, please let me know, thanks! Derek -- You received this message because you are subscribed to the Google Groups