If you look at the README files in the subdirectories whose names start
with OCR under https://github.com/Shreeshrii/imagessan/tree/master, you will
see the character-level and word-level accuracy results. Depending on the
font, character-level accuracy is around 80% and word-level accuracy around
60%.
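To reproduce this kind of measurement on your own output, the Python sketch below computes character-level and word-level accuracy as 1 minus the Levenshtein edit distance divided by the length of the ground truth. That definition is an assumption (one common choice), and the README files may compute their numbers differently. The function names are hypothetical.

```python
"""Sketch: character- and word-level OCR accuracy via edit distance.

Assumption: accuracy = 1 - (edit distance / length of ground truth),
clamped at 0. The README files may use a different definition.
"""
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance counting insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    if not ref:
        return 1.0 if not hyp else 0.0
    # Clamp: a very long hypothesis could otherwise give a negative score.
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def char_accuracy(truth: str, ocr: str) -> float:
    # Python strings iterate over Unicode code points, not grapheme
    # clusters, so a Devanagari conjunct counts as several characters.
    return accuracy(truth, ocr)


def word_accuracy(truth: str, ocr: str) -> float:
    # Words are whitespace-separated tokens.
    return accuracy(truth.split(), ocr.split())
```

Because a single wrong code point (for example a dropped anusvara) costs only one character-level error but makes the whole word wrong at word level, word-level accuracy tends to be lower than character-level accuracy.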

I have not used it for actual OCR of any text, because the SanskritOCR
software by Dr. Oliver Hellwig gives better results.

See https://sites.google.com/site/sanskritcode/ocr/1-ocr-ing
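The experiment suggested further down the thread (unpack hin.traineddata, remake the DAWG files from Sanskrit word lists, and combine them again) could be scripted roughly as below. This is a sketch under assumptions: the Tesseract 3.x training tools combine_tessdata, dawg2wordlist and wordlist2dawg are on PATH, only the word-dawg component is replaced (freq-dawg could be handled the same way), and san_words.txt is a hypothetical one-word-per-line Sanskrit word list. By default it only prints the commands.

```python
"""Sketch: rebuild the word DAWG of a traineddata file from a new word list.

Assumptions (not from the thread): Tesseract 3.x training tools on PATH,
only the word-dawg component is replaced, and the word list file name is
hypothetical. dry_run=True only prints the commands.
"""
import shlex
import subprocess
from typing import List


def dawg_rebuild_commands(lang: str, wordlist: str) -> List[List[str]]:
    traineddata = f"{lang}.traineddata"
    unicharset = f"{lang}.unicharset"
    word_dawg = f"{lang}.word-dawg"
    return [
        # Unpack every component into files named <lang>.<component>.
        ["combine_tessdata", "-u", traineddata, f"{lang}."],
        # Dump the existing word DAWG to a text file for inspection.
        ["dawg2wordlist", unicharset, word_dawg, f"{lang}.word-dawg.txt"],
        # Build a new word DAWG from the Sanskrit word list.
        ["wordlist2dawg", wordlist, word_dawg, unicharset],
        # Overwrite only the word-dawg component inside the traineddata.
        ["combine_tessdata", "-o", traineddata, word_dawg],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(dawg_rebuild_commands("hin", "san_words.txt"))
```

Work on a copy of hin.traineddata, since combine_tessdata -o rewrites the file in place. Words containing characters outside the Hindi unicharset may not make it into the new DAWG, so check the tool's output.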

- sent from my phone. excuse the brevity.
On 13-Jun-2016 6:53 pm, "ShreeDevi Kumar" <[email protected]> wrote:

> Yes, hin.traineddata with Cube gives better results than san.traineddata.
>
> I did some rudimentary testing with the new traineddata I made. It does
> not use Cube. Look at the config files; they have some options for
> Devanagari processing.
>
> As an experiment, you could unpack hin.traineddata, remake the DAWG files
> using Sanskrit wordlists, and combine the files again.
>
> If you have a Unicode version of the font used in the documents you want
> to OCR, train using that.
>
> - sent from my phone. excuse the brevity.
> On 13-Jun-2016 4:47 pm, "rohit saluja" <[email protected]> wrote:
>
>> Thanks again for replying. I will surely check them out.
>>
>> My experience is that OCR on Sanskrit data with hin.traineddata gives
>> better results than with san.traineddata. I do not know whether this is
>> due to Cube mode or to the Devanagari preprocessing (segmentation, I guess).
>>
>> I wonder why such preprocessing is not applied in san.traineddata.
>> Please let me know whether your traineddata uses Cube mode, and whether
>> it uses Devanagari preprocessing.
>>
>> On Mon, Jun 13, 2016 at 9:18 AM, ShreeDevi Kumar <[email protected]>
>> wrote:
>>
>>> Google has not provided images and box files for the san.traineddata
>>> released with 3.04.
>>>
>>> I tried training using text2image with a combination of different fonts
>>> and training text. Results are at
>>> https://github.com/Shreeshrii/imagessan/tree/master/tessdata
>>>
>>> You can give these a try to see if recognition is any better.
>>>
>>> You can unpack any traineddata file using the -u option of
>>> combine_tessdata to see the config files, etc.
>>>
>>> http://manpages.ubuntu.com/manpages/trusty/man1/combine_tessdata.1.html
>>>
>>> Use dawg2wordlist to look at the various dictionary word lists used.
>>>
>>> http://manpages.ubuntu.com/manpages/trusty/man1/dawg2wordlist.1.html
>>>
>>> - sent from my phone. excuse the brevity.
>>> On 12-Jun-2016 11:26 am, "rohit saluja" <[email protected]> wrote:
>>>
>>>> Hey, thanks for replying.
>>>> Which options should I use with the text2image command? Also, is there
>>>> a configuration file and a font list?
>>>>
>>>> I tried text2image with the default options, using the Tesseract
>>>> training data from GitHub and the Sanskrit 2003 font, but the
>>>> recognition results are far worse than those of the san.traineddata
>>>> file on GitHub.
>>>> Any help in matching the san.traineddata results, starting from
>>>> scratch, would be highly appreciated.
>>>>
>>>> Thanks in advance
>>>> Rohit
>>>>
>>>> On Friday, 6 May 2016 12:59:44 UTC+5:30, rohit saluja wrote:
>>>>
>>>>> Do we have Sanskrit training images and box files available online?
>>>>>
>>>>> Thanks
>>>>> Rohit
>>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To post to this group, send email to [email protected].
>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/45767a89-cd11-4f39-9622-3fe7b4d20a4a%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/45767a89-cd11-4f39-9622-3fe7b4d20a4a%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>
>
