The next release of OCRopus supports retraining much better than the current release.
The process for training will be:

(1) manually label a small number of characters and train a small model
(2) transcribe a larger amount of text at the line (or block) level and train a larger model
(3) refine the model on an even larger amount of unlabeled text

For (1) and (2), you can also use the output from an existing OCR engine, even if it has a fairly high error rate; OCRopus can use that for bootstrapping. For alphabetic scripts with a modest number of diacritics (and old Irish script qualifies), that should work quite well.

All the functionality is there and working. However, the next OCRopus release keeps getting pushed back because of changes imposed by external dependencies and requirements. We're doing a lot of refactoring and code cleanup right now, and we're now aiming for a mid-March release. The good news is that both the command line and the installation should be significantly simplified, and that it will be easier for external developers to contribute than it has been in the past.

Tom

On Mon, Feb 23, 2009 at 15:32, John Lunney <[email protected]> wrote:
>
> Hi guys,
> I'm working for an Irish dictionary project. As part of this, I'm
> looking into various scanning software. We already have OmniPage, but
> I'm having trouble teaching it to read the old Irish script (mainly
> due to its labyrinthine interface and predilection for racing ahead in
> the process).
>
> Here's a sample of the old Irish script ("cló Gaelach"):
> http://www.photopol.com/gaeilge/ar_nathar.jpg
>
> How is Ocropus's performance for non-Latin scripts? Would it be usable
> in a production setting?
> Should I look at any other packages?
>
> Any advice much appreciated,
> John Lunney
