[tesseract-ocr] Re: Post-correction of OCR-generated text

Rick Leir Fri, 05 Sep 2014 06:50:13 -0700

Here is something about automated corrections:

http://ilk.uvt.nl/downloads/pub/papers/CICLING08.TICCL.MRE.postpublication.pdf


Unrelated to the above, I would like to use languagetool.org to automate 
corrections.  So much to do, so little time..
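
For the situation in the quoted question below (only the raw OCR text, no scans), here is a rough sketch of frequency-based post-correction. It is not the method from the paper linked above and not what LanguageTool does; it is a generic edit-distance-1 lookup against a word list counted from text you trust. The names `build_vocab`, `edits1`, `correct_word` and `correct_text` are made up for illustration, and the lowercase ASCII alphabet is an assumption that would need extending for other languages.

```python
import re
from collections import Counter

# Assumption: plain lowercase ASCII; extend for accented or non-Latin text.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def build_vocab(text):
    """Count lowercase word frequencies in a trusted reference text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct_word(word, vocab):
    """Return word if known, else the most frequent known word one edit away."""
    if word in vocab:
        return word
    candidates = [w for w in edits1(word) if w in vocab]
    if not candidates:
        return word  # nothing plausible found; leave the token alone
    # Break frequency ties alphabetically so the result is deterministic.
    return max(candidates, key=lambda w: (vocab[w], w))


def correct_text(text, vocab):
    """Correct all-lowercase tokens; leave capitalised words and the rest as-is."""
    def fix(match):
        token = match.group(0)
        return correct_word(token, vocab) if token.islower() else token
    return re.sub(r"[A-Za-z]+", fix, text)


# Toy reference text standing in for a real, clean corpus.
vocab = build_vocab("the quality of the data has substantial errors")
print(correct_text("Some subtantial errors", vocab))
```

A real setup would build the vocabulary from a large clean corpus in each language and would probably weight candidates by typical OCR confusions (for example "rn" read as "m"), which this sketch does not attempt.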


On Tuesday, September 2, 2014 11:13:50 AM UTC-4, Pierre Lison wrote:
>
>
> Hi,
>
> I'm a researcher in statistical machine translation, and for my work I use 
> a bunch of translated texts (in multiple languages), some of which were 
> automatically generated via OCR.  I recently noticed that some texts 
> included substantial numbers of OCR errors, which I would of course like to 
> correct to improve the quality of my data.
>
> I was therefore wondering if I could use tesseract or some related 
> software tool in order to correct at least some of these OCR-generated 
> errors (through e.g. statistical language modelling techniques).  Note that 
> I unfortunately don't have access to the original scans, I only have the 
> raw, OCR-produced text.  
>
> Any suggestions?
>
> Thanks!
>
> Pierre
>

