Re: Franken+ Released -- New Tool For Training Tesseract on Fonts from Page Images

2013-12-12 Thread Clemens Neudecker
Dear all, Thanks to Matt and Bryan for making me aware of this interesting discussion! My name is Clemens Neudecker and I have been the Technical Manager of the IMPACT project (www.impact-project.eu). Without going into greater detail about the points that have already been discussed at

2013-12-12 Thread Nick White
Hi Bryan,

On Tue, Dec 10, 2013 at 07:18:57PM -0600, Bryan Tarpley wrote:
> We've found that when two letters aren't touching, Tesseract has trouble identifying them together as a single ligature, /especially/ given that the character e by itself looks exactly the same as the one in ke.

Oh,

2013-12-12 Thread Nick White
Dear Clemens, There's lots of great stuff in your email, thanks so much for sending it! It'll take me a while to get through; I'm likely to reply again later on. I just took a look at the ocrevalUAtion tool, and read the pages at https://sites.google.com/site/textdigitisation/ - it looks very

2013-12-12 Thread Clemens Neudecker
Hi Nick, Thanks for the encouraging reply, looking forward to further feedback and comments! I think the main distinction the https://sites.google.com/site/textdigitisation/ OCR list was trying to make is between online (web-based) services and stand-alone tools, but I will point out to the