For a GUI, you can try VietOCR: http://sourceforge.net/projects/vietocr/files/vietocr/
For language data for Sanskrit transliteration, try http://sourceforge.net/projects/tesseracthindi/files/Tesseract-3-02-SanskritTransliteration/

Shree Devi Kumar
____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

On Tue, Nov 26, 2013 at 12:40 PM, Srivas <[email protected]> wrote:

> Hi!
> I have a bunch of PDF files of journals and I need to get the text out of
> them. They contain a lot of romanized Sanskrit diacritical marks, and that
> creates a difficulty. I tried FineReader and OmniPage, but they cannot be
> trained to recognize those symbols. I just need an OCR program that I can
> train to recognize any symbol required, and the above programs cannot do
> that.
>
> Where should I start? I feel like this program can do the job, but can
> you help me get started? I downloaded Tesseract and installed it
> (Windows). There are different GUIs available, and I think one would make
> the work easier. Can you suggest a good one? I tried gImageReader, but it's
> too primitive and leaves a lot of work to be done afterwards on the
> overall text.
>
> I don't think this kind of language pack is available. How do I create
> one?
>
> I will attach one PDF and the fonts that were used to create it. Maybe
> someone would like to try and let me know how to do it?
>
> Thank you for any help!
>
> Regards,
> Srivas
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.

