Here is something about automated corrections: http://ilk.uvt.nl/downloads/pub/papers/CICLING08.TICCL.MRE.postpublication.pdf
Unrelated to the above, I would like to use languagetool.org to automate corrections. So much to do, so little time...

On Tuesday, September 2, 2014 11:13:50 AM UTC-4, Pierre Lison wrote:
>
> Hi,
>
> I'm a researcher in statistical machine translation, and for my work I use
> a bunch of translated texts (in multiple languages), some of which were
> automatically generated via OCR. I recently noticed that some texts
> include substantial numbers of OCR errors, which I would of course like to
> correct to improve the quality of my data.
>
> I was therefore wondering whether I could use tesseract or some related
> software tool to correct at least some of these OCR-generated
> errors (e.g. through statistical language modelling techniques). Note that
> I unfortunately don't have access to the original scans; I only have the
> raw, OCR-produced text.
>
> Any suggestions?
>
> Thanks!
>
> Pierre
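
Since the question mentions statistical techniques on raw OCR text without the scans, here is a minimal sketch of what a word-level post-corrector might look like. It is not the method from the paper linked above, nor anything tesseract or LanguageTool provides. It is a simple frequency-based corrector in the style of Peter Norvig's spelling corrector, with an extra step that undoes a few typical OCR confusions. The word counts, the confusion pairs, and the function names are all made up for illustration; in practice the counts would come from clean text in the same language and domain.

```python
from collections import Counter

# Hypothetical toy frequency table. A real one would be built by counting
# words in a large, clean corpus of the target language.
WORD_COUNTS = Counter({
    "the": 500, "of": 300, "data": 60, "quality": 40,
    "modern": 20, "substantial": 10,
})

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

# Assumed (wrong, right) OCR confusion pairs, chosen for illustration only.
OCR_CONFUSIONS = [("rn", "m"), ("vv", "w"), ("1", "l"), ("0", "o"), ("cl", "d")]


def ocr_variants(word):
    """Strings obtained by undoing exactly one assumed OCR confusion."""
    variants = set()
    for wrong, right in OCR_CONFUSIONS:
        start = word.find(wrong)
        while start != -1:
            variants.add(word[:start] + right + word[start + len(wrong):])
            start = word.find(wrong, start + 1)
    return variants


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Return the most frequent known candidate, or word itself if none."""
    lowered = word.lower()
    if lowered in WORD_COUNTS:
        return word  # already a known word, leave it untouched
    # Prefer OCR-specific repairs, then fall back to generic single edits.
    candidates = {c for c in ocr_variants(lowered) if c in WORD_COUNTS}
    if not candidates:
        candidates = {c for c in edits1(lowered) if c in WORD_COUNTS}
    if not candidates:
        return word  # nothing plausible found, do not guess
    return max(candidates, key=WORD_COUNTS.get)


if __name__ == "__main__":
    for token in ["rnodern", "qua1ity", "subtantial", "xyzzy"]:
        print(token, "->", correct(token))
```

This only looks at single words, so it cannot fix errors that happen to produce another valid word, and corrected words come back in lowercase. A real setup would probably add context (e.g. an n-gram language model) to choose between candidates.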

