I'd like to announce an OCR challenge that will start soon. It is for
an open source project and will include prizes.

I'm part of the http://OpenPlaques.org/ project. The site collects
Flickr images of commemorative plaques, which are manually transcribed
and added to the site. Plaques generally carry 20-100 words describing
a historical person or event. The text is usually clear, often white
on a blue background. These plaques exist all over the UK (and in many
other countries). The goal of the project is to make these historic
locations easily searchable.

Here's an example entry for Sir Winston Churchill near me:
http://www.openplaques.org/plaques/990

At present the project founders transcribe the plaque photos
*manually*, which is a crazy situation: they have several thousand
plaques outstanding and more are added every day. The project is now
international (it started in the UK less than a year ago), and an
automatic transcription system is sorely needed.

As one of my play-time projects I've set up an Artificial
Intelligence Cookbook site, where I'm building a community of
like-minded folk who enjoy solving interesting challenges. I've already
written a work-in-progress report on a manual solution to this
problem using Tesseract 2:
http://blog.aicookbook.com/2010/06/optical-character-recognition-webservice-work-in-progress/
and I've just posted a software outline in Python for (bad!) automatic
recognition:
http://blog.aicookbook.com/2010/07/automatic-plaque-transcription-using-python-work-in-progress/

The OpenPlaques team are building a corpus of images with
transcriptions for me. Once we have a good set of images I'll begin
the challenge, which should be within the next two weeks.

You can see my demo code and a suggested solution here:
http://aicookbook.com/wiki/Automatic_plaque_transcription
and I'm *very* open to feedback in our Google Group:
http://groups.google.com/group/aicookbook

I'll run the competition for several months, with a prize for the best
solution each month. Solutions will be open sourced, and sooner or
later a good solution will emerge that can start automatically
transcribing the OpenPlaques corpus of images. Winners will also have
their names listed on the OpenPlaques site.

If you'd like to test your OCR skills, you'll find a good range of
images to work on, from simple clean shots to angled, dark, smudged
images of weather-beaten plaques taken at a distance.

Cheers,
Ian.

-- 
Ian Ozsvald (A.I. researcher, screencaster)
[email protected]

http://IanOzsvald.com
http://MorConsulting.com/
http://blog.AICookbook.com/
http://TheScreencastingHandbook.com
http://FivePoundApp.com/
http://twitter.com/IanOzsvald

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/tesseract-ocr?hl=en.
