I'd like to announce an OCR challenge that will start soon; it is for an open source project and will include prizes.
I'm part of the http://OpenPlaques.org/ project. The site collects Flickr images of commemorative plaques, which are manually transcribed and added to the site. Plaques generally carry 20-100 words describing a piece of local history; the lettering is clear, often white on a blue background. These plaques exist all over the UK (and in many other countries). The goal of the project is to make these historic locations easily searchable. Here's an example entry for Sir Winston Churchill near me: http://www.openplaques.org/plaques/990

At present the project founders transcribe the plaque photos *manually*. This is a crazy situation, as they have several thousand plaques outstanding and more are added every day. The project is now international (it started in the UK less than a year ago) and an automatic transcription system is sorely needed.

As one of my play-time projects I've set up an Artificial Intelligence Cookbook site, where I'm building a community of like-minded folk who enjoy solving interesting challenges. I've already written a work-in-progress report on a manual solution to this problem using tesseract 2: http://blog.aicookbook.com/2010/06/optical-character-recognition-webservice-work-in-progress/ and I've just posted an outline of (bad!) automatic recognition software in Python: http://blog.aicookbook.com/2010/07/automatic-plaque-transcription-using-python-work-in-progress/

The OpenPlaques project is building a corpus of images with transcriptions for me; once we have a good set of images I'll begin the challenge. This should be within the next two weeks. You can see my demo code and a suggested solution here: http://aicookbook.com/wiki/Automatic_plaque_transcription and I'm *very* open to feedback in our Google Group: http://groups.google.com/group/aicookbook

I'll run the competition for several months, with a prize for the best solution each month.
Solutions will be open-sourced, so sooner or later a good automatic solution will emerge that can start transcribing the OpenPlaques corpus of images. Winners will also get their name listed on the OpenPlaques site. If you'd like to test your OCR skills, you'll find a good range of images to work on, from simple clean shots to angled, dark, smudged images of weather-beaten plaques taken at a distance.

Cheers,
Ian.

--
Ian Ozsvald (A.I. researcher, screencaster)
http://IanOzsvald.com
http://MorConsulting.com/
http://blog.AICookbook.com/
http://TheScreencastingHandbook.com
http://FivePoundApp.com/
http://twitter.com/IanOzsvald

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en.

