Please see http://www.ucsc.cmb.ac.lk/sdu/research.html
http://192.248.22.122/ocrsinhala/upload.php

Here is the output from it:

ටුද්රණි:ල .ය්චත වැට වරීජන:: ඵාෂ්. ඨ:ර්චූකට පවන්චි:යගැ න ::න චූට කූ- එ0 දූකූ:ගයගැ 0පි පිශ්රීබඳව රජය:ෘන් ඉදීරිෂන් කූයරන ය:ට,රණ් ච්ඝ දූ0කට 9දාද්රඩා භ:තපිජං .ාරීග ාඝන් ප්රශ",නය පිඝඳ: ග::චූටිට ාද්රංයහාර:ක්ත: වන ඛචද්ර තීාඝ. ථි 9තර ඉත.න් :0ද: ::ංළක් :: ව:ග චරීජනජෙි ළශද්රණු අ:: බීශින් න:ර:ණු ගැ: ක:ළරන බව කි::න අනචසූතකඅ ඝමඛන්ඩශයක් වෘන්කිළ ඝමින ඒක:බ6හ යණ්ගැංසූ 8: ත්රං.උළ ඩාය පද්ර ගට නි::බී. ට්රද්රන්ාඋ යහ්ච්ත ව'ඩා වජී චන ළග:ණීරණ් ක: ඝංළක්න ජන. පිශ්රීබදච රජය ත්රිභින් භළ ගැධබ්න්ඩළ::න් තවළන් වග වරජනා::න් පසූව ජංකික ඉදිථීපන. කූංරන ඟඋ මිඝඳු ල වීතඳු.ක් රජ:න් ඉදි5 ගප්ධපි80 ඝංශ,එ:යථී නල්පිපි:: ටික් ඝමබන්ඩාඟයන් වෘන්කි:: න් වි නතී ඛච්ළ වාන'කිය කමගළල් හට::ගැන කිගඛන යමිනි එ'"ත:බග්ඩ 0ණ්)ටලජෙි ථීත ඒළ:ඛද්ඩ ළණ්ඩලයළ:' ාශ්චක අචූලු අභූ 884::,) තර,ණු ගල්කඛ 0ංඩාලඝ ව:ශිදුරටක් ප්රතෘශකළ ::0 කාළ්ඝ. ද වෘක්ති. ඝමිති ඒඛද:ඛ ශ:තවජං යත:ප නිරණ්ළකච ාඝ:::ළ :ක්ෂය දිෘ: ළං:ාංංශ් -ංළ::ං ාංගං: එචූම්ළ,න් ළචූං ව:ක්තිළ ළටිති %ලළ ංං:ර:ළ, ෂ 8ළෘඳා ළශන් දැහෘළන් පූංල ෂං එක:ඛළ'ඩ ංණ්"ඩාළඝ ත:ඳම් තසූ::. ක්රිංෂ ට්:ජීග, තීද්රඩ: ාළන් ඝංකචජ: ක්ෂංච නෂෘ. ළටිඝ:න නිගළනළතප එළබ්ළ1ච ක්ංංගැ ංෂ්ප:::යප ෘදූ පූද්රණං ංංහාෘ ා:ා ඝ- ඛළ,ඥාළථ-න්ත ළපි.

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Sun, Mar 15, 2015 at 8:15 PM, Ruwanka De Silva <[email protected]> wrote:

> Hi All,
>
> I am trying to train Tesseract for the Sinhalese language, to recognize
> text in old Sinhalese newspapers. I am new to Tesseract and I have a few
> questions about how to prepare training data for best results. These are
> my questions:
>
> 1. What is the best resolution (dpi) for training data?
> 2. I plan to do binarization and some enhancements as preprocessing
>    before doing OCR. Will Tesseract give better results if I train it on
>    preprocessed images, or if I train it on raw images (attached
>    herewith)?
> 3. I don't have the font used in these images, so I couldn't create
>    training data myself. Is there any way to create training data other
>    than using scanned images of newspapers?
> 4. Sinhalese has a huge character set, which includes different
>    diacritics that modify the phonetic sound/meaning of a letter. What
>    steps do I have to take in order to increase accuracy?
>
> Any help would be appreciated.
>
> Regards,
> Ruwanka De Silva
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/8d8ad5b8-e3d7-4581-8972-1b631f5bc1c5%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

