After two months of experimenting with Tesseract OCR for Hindi/Sanskrit,
I would like to update the forum members, who have been very helpful in
providing information and guidance, on the results so far.
I have posted the training source files as well as traineddata for hindi -
On Thu, Apr 18, 2013 at 5:35 AM, sdk shreesh...@gmail.com wrote:
Zdenko,
You wrote:
He can create another data and use it together with data provided by
google.
Does this mean that we can use the ability of Tesseract to use multiple
languages for recognition to use multiple traineddata
Thanks,
Yes, Google has provided hin.traineddata which gives good results.
I was trying to see whether it was possible to further train it with
additional fonts.
On Tuesday, April 16, 2013 10:50:24 PM UTC+5:30, rākēśvara rāvu wrote:
I think Google has an internal traineddata file for
This is covered in the FAQ:
https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_add_just_one_character_or_one_font_to_my_favourite_lang
which links to the training WIKI
https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
--Sven
On Wed, Apr 17, 2013 at 7:24 AM, sdk
On Wed, 17 Apr 2013, Sven Pedersen wrote:
This is covered in the FAQ:
https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_do_I_add_just_one_character_or_one_font_to_my_favourite_lang
which links to the training WIKI
https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
--Sven
Rob,
You can add fonts to existing languages. Just follow the combine
instructions.
Sven
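
For anyone following along, the combine step mentioned above might look roughly like the sketch below. It only builds the command lines and does not run them. The `-u` (unpack) form and the bare-prefix (combine) form are how I understand the combine_tessdata tool that ships with Tesseract 3.0x; the file name hin.traineddata and the prefix work/hin. are illustrative assumptions, so please check both against your own install.

```python
# Sketch only: builds combine_tessdata command lines, does not execute them.
# Assumption: combine_tessdata accepts "-u <traineddata> <prefix>" to unpack
# and "<prefix>" alone to combine; file names and paths are hypothetical.

def unpack_command(traineddata, prefix):
    """Command to unpack every component of a traineddata file."""
    return ["combine_tessdata", "-u", traineddata, prefix]


def combine_command(prefix):
    """Command to recombine the component files that share this prefix."""
    if not prefix.endswith("."):
        # The tool expects a prefix such as "work/hin." (trailing dot).
        prefix += "."
    return ["combine_tessdata", prefix]


print(" ".join(unpack_command("hin.traineddata", "work/hin.")))
print(" ".join(combine_command("work/hin")))
```

Between the two commands you would replace or add the component files produced by your own training run.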
On Wednesday, April 17, 2013, Robert Komar wrote:
On Wed, 17 Apr 2013, Sven Pedersen wrote:
This is covered in the FAQ: https://code.google.**
On Wed, Apr 17, 2013 at 10:41 PM, Sven Pedersen sven.peder...@gmail.com wrote:
Rob,
You can add fonts to existing languages. Just follow the combine
instructions.
As far as I know, it is not possible. He can create another data and use it
together with data provided by google.
Sven
On Wed, Apr 17, 2013 at 10:36 PM, Robert Komar rko...@telus.net wrote:
On Wed, 17 Apr 2013, Sven Pedersen wrote:
This is covered in the FAQ: https://code.google.com/p/tesseract-ocr/wiki/FAQ#How_
Thanks. I did follow the training wiki.
However, since Hindi uses Cube mode, it is not possible to train for that.
I am trying to train for san (Sanskrit), which uses the same Devanagari
script, in non-Cube mode.
On Thu, Apr 18, 2013 at 1:34 AM, Sven Pedersen sven.peder...@gmail.com wrote:
Thanks, Zdenko!
I think it would be helpful to add this to the training pages wiki in the
next update.
If possible, also add a list of the languages that use the Cube mode.
On Thu, Apr 18, 2013 at 3:05 AM, zdenko podobny zde...@gmail.com wrote:
I remember one user post, that he
Zdenko,
You wrote:
He can create another data and use it together with data provided by
google.
Does this mean that we can use the ability of Tesseract to use multiple
languages for recognition to load multiple traineddata files for the same
'real' language but with different language codes?
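
To make the question concrete, here is a minimal sketch of such a call. It relies on the `-l lang1+lang2` syntax that Tesseract 3.02 offers for loading several traineddata files in one run. The language codes hin and san and the file names are illustrative assumptions, and the sketch says nothing about how good the combined results are.

```python
# Sketch only: builds the tesseract command line, does not execute it.
# Assumption: both hin.traineddata and a separately trained san.traineddata
# are present in the tessdata directory; page.png is a hypothetical input.

def tesseract_command(image, output_base, languages):
    """Build a tesseract argv that loads several traineddata files at once."""
    if not languages:
        raise ValueError("at least one language code is required")
    # Multiple language codes are joined with "+" after the -l option.
    return ["tesseract", image, output_base, "-l", "+".join(languages)]


cmd = tesseract_command("page.png", "page", ["hin", "san"])
print(" ".join(cmd))  # tesseract page.png page -l hin+san
```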
I think Google has an internal traineddata file for Devanagari, because
sometimes when you search for Sanskrit material it gives results from Google
Books. So it is possible.
On Sat, Mar 30, 2013 at 7:18 PM, sdk shreesh...@gmail.com wrote:
Hello,
I have recently installed tesseract-ocr 3.02 on Windows 7 and am training
it on the sanskrit2003 font for Hindi.
1. While running unicharset_extractor I received the error
Utf8 buffer too big, size=57 for
à☼½à☼_à☼¿à¥?à¥?à¥,ृà¥,à¥.à¥+à¥╪à¥^à¥%à¥Sà¥à¥ Oà¥?à¥Zà¥?
Is this just a warning or
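
One way to narrow down an error like this is to look for box file entries whose symbol is unusually long once encoded as UTF-8. A reported size of 57 would match 19 Devanagari code points at 3 bytes each, which suggests several glyphs merged into one box. The sketch below flags such entries. The box line layout (symbol left bottom right top page) and the default limit of 24 bytes are assumptions on my part; please check the actual limit (UNICHAR_LEN) in your Tesseract source.

```python
# Sketch only: scans box file lines for symbols with a long UTF-8 encoding.
# Assumptions: each line is "symbol left bottom right top page", and the
# limit of 24 bytes is illustrative, not confirmed for every version.

def long_box_entries(box_lines, max_bytes=24):
    """Return (line_number, symbol, byte_length) for over-long symbols."""
    flagged = []
    for number, line in enumerate(box_lines, start=1):
        # Split from the right so the symbol keeps any internal spaces.
        parts = line.rstrip("\n").rsplit(None, 5)
        if len(parts) != 6:
            continue  # not a box line in the assumed layout
        symbol = parts[0]
        size = len(symbol.encode("utf-8"))
        if size > max_bytes:
            flagged.append((number, symbol, size))
    return flagged


sample = ["क 10 10 20 20 0", "क" * 19 + " 30 10 90 20 0"]
for number, symbol, size in long_box_entries(sample):
    print(number, size)
```

With a real box file you would pass the lines read with `open(path, encoding="utf-8")`, then split the flagged boxes by hand in a box editor.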