The next release of OCRopus will support retraining much better than the
current release does.

The process for training will be:

(1) manually label a small number of characters and train a small model
(2) transcribe a larger amount of text at the line (or block) level and
train a larger model
(3) refine the model on even larger unlabeled text

For (1) and (2), you can also use the output from an existing OCR engine,
even if it has a fairly high error rate; OCRopus can use that for
bootstrapping.

For alphabetic scripts with a modest number of diacritics (and old Irish
script qualifies), that should work quite well.

All the functionality is there and working.  However, the next OCRopus
release keeps getting pushed back because of changes imposed by external
dependencies and requirements.  We're doing a lot of refactoring and code
cleanup right now, and we're now aiming for a mid-March release.

The good news is that both the command line and the installation should be
significantly simplified, and that it will be easier for external developers
to contribute than it has been in the past.

Tom


On Mon, Feb 23, 2009 at 15:32, John Lunney <[email protected]> wrote:

>
> Hi guys,
> I'm working for an Irish dictionary project. As part of this, I'm
> looking into various scanning software. We already have OmniPage, but
> I'm having trouble teaching it to read the old Irish script (mainly
> due to its labyrinthine interface and predilection for racing ahead in
> the process).
>
> Here's a sample of the old Irish script ("cló Gaelach"):
> http://www.photopol.com/gaeilge/ar_nathar.jpg
>
> How is Ocropus's performance for non-Latin scripts? Would it be usable
> in a production setting?
> Should I look at any other packages?
>
> Any advice much appreciated,
> John Lunney
