Sir,

A, B, C, etc. are just the names of the .tif files; each one contains a subset of my character set, and I named them that way only for simplicity. I have 20 such files, named A.tif through T.tif, and I have successfully trained with files A, B, C, D, and E by combining them one after another. From your reply I now understand that I can use up to 32 files.

Thank you
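
For reference, the naming rule from the reply quoted below (box files named lang.fontname.expN.box, with the font name listed in font_properties) can be checked before training. This is only a minimal sketch: check_box_name and its status strings are hypothetical names, and it assumes that each font_properties line starts with the font name as its first whitespace-separated field.

```shell
#!/bin/sh
# Sketch only: check_box_name is a hypothetical helper, not part of Tesseract.
# Usage: check_box_name <box-file> <font_properties-file>
# Prints one of: ok, bad-name, font-not-listed.
check_box_name() {
  cbn_base=$(basename "$1")
  # Pull <fontname> out of <lang>.<fontname>.exp<N>.box.
  # sed prints nothing when the name does not fit that pattern.
  cbn_font=$(printf '%s\n' "$cbn_base" |
    sed -n 's/^[^.][^.]*\.\([^.][^.]*\)\.exp[0-9][0-9]*\.box$/\1/p')
  if [ -z "$cbn_font" ]; then
    echo "bad-name"
    return 1
  fi
  # Assumption: the font name is the first field of a font_properties line.
  if ! awk -v f="$cbn_font" '$1 == f { found = 1 } END { exit !found }' "$2"; then
    echo "font-not-listed"
    return 1
  fi
  echo "ok"
}
```

With this check, a name such as B.e0.box is reported as bad-name, while ori.B.exp0.box passes once B has an entry in font_properties.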

On Saturday, June 1, 2013 1:58:26 PM UTC+5:30, sdk wrote:
> Mamata,
>
> You need to look at http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 and follow the instructions there.
>
> Your box file names should be in the format:
> ori.lohit.exp0.box
> ori.A.exp0.box
> ori.B.exp0.box
>
> where lohit, A and B are font names that have been listed in the font_properties file.
>
> You should not cat training files of two different fonts. You can give up to 32 different tr files for the training process.
>
> In order to create the correct unicharset you need to provide the box file used for creating the LohitOriya.tr file.
>
> I am assuming that you have downloaded the Oriya files from http://code.google.com/p/parichit/downloads/list
>
> http://code.google.com/p/parichit/downloads/detail?name=ori_lohit_image_box.tar.gz&can=2&q= has the box and tif files.
>
> http://code.google.com/p/parichit/downloads/detail?name=Oriya.txt&can=2&q= has the ground truth/training text in Oriya.
>
> I would suggest that you first create the Oriya traineddata using the lohit files and then add your files one by one.
>
> Shree
>
> On Sat, Jun 1, 2013 at 11:35 AM, mamata nayak <[email protected]> wrote:
>
>> Sir,
>> Please help me.
>> The character set of my language consists of about 500 characters. I have divided these into subsets, i.e. about 10 .tif files, generated a box file for each, and edited those separately using the Qt editor. I then use the following command:
>>
>> $ cat >> LohitOriya.tr C.e0.tr
>>
>> to concatenate one .tr file with the previously generated LohitOriya.tr file, and
>>
>> $ unicharset_extractor A.3.box B.e0.box C.e0.box
>>
>> to generate the unicharset file.
>>
>> Please respond as early as possible.
>>
>> Eagerly waiting
>> $unicharset_extractor
>>
>> On Tue, May 21, 2013 at 3:38 PM, Shree Devi Kumar <[email protected]> wrote:
>>
>>> Mamata,
>>> Please see https://code.google.com/p/tesseract-ocr/downloads/list for the available language data files for tesseract 3.02. In case Odia is similar to Bangla, you can use the Bengali traineddata to bootstrap for Odia.
>>>
>>> Shree
>>>
>>> Shree Devi Kumar
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Tue, May 21, 2013 at 2:26 PM, mamata nayak <[email protected]> wrote:
>>>
>>>> Sir,
>>>> Can you please tell me the current list of Indian languages for which the tesseract-ocr engine has been trained?
>>>>
>>>> Thank you
>>>>
>>>> On Sun, May 12, 2013 at 12:23 PM, Shree Devi Kumar <[email protected]> wrote:
>>>>
>>>>> Are you training the Odia language?
>>>>>
>>>>> Have you seen http://tdil-dc.in/tdildcMain/articles/374232Odia%20Script%20Grammar_Ver1.0.pdf ?
>>>>>
>>>>> Shree Devi Kumar
>>>>>
>>>>> On Sat, May 11, 2013 at 9:01 PM, mamata nayak <[email protected]> wrote:
>>>>>
>>>>>> Thank you, sir.
>>>>>> I am now able to detect a set of characters of my language. However, one character among them, ଫୀ, is recognized as a pair of characters, and as a different pair at each place it appears in the training image: କ୍ଷୀଛୀ, ନୀନୀ, ଯୀଛୀ, ପୀଛୀ, ବୀନୀ (it occurs 5 times).
>>>>>> Then I used a unicharambigs file with the following contents:
>>>>>> v1
>>>>>> 2 କ୍ଷୀ ଛୀ 1 ଫୀ 1
>>>>>> 2 ନୀ ନୀ 1 ଫୀ 1
>>>>>> 2 ଯୀ ଛୀ 1 ଫୀ 1
>>>>>> 2 ପୀ ଛୀ 1 ଫୀ 1
>>>>>> 2 ବୀ ନୀ 1 ଫୀ 1
>>>>>> But the problem is that, during recognition, these pairs of characters are replaced with ଫୀ.
>>>>>> Please understand my problem and give a suggestion.
>>>>>> Thanking you
>>>>>>
>>>>>> On Wed, May 8, 2013 at 5:47 PM, Quan Nguyen <[email protected]> wrote:
>>>>>>
>>>>>>> You would need to run the tesseract command to generate the box file for your image, e.g.:
>>>>>>>
>>>>>>> tesseract eng.timesitalic.exp0.tif eng.timesitalic.exp0 batch.nochop makebox
>>>>>>>
>>>>>>> Check the Tesseract Training Wiki for more details.
>>>>>>>
>>>>>>> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
>>>>>>>
>>>>>>> Once you have the TIFF/Box pair, you can open it in jTessBoxEditor.
>>>>>>>
>>>>>>> On Wednesday, May 8, 2013 12:29:43 AM UTC-5, mama wrote:
>>>>>>>
>>>>>>>> Good morning, sir,
>>>>>>>> Thanks for your reply.
>>>>>>>> My problem now is that for some sets of characters of my language jTessBoxEditor can open the corresponding tif file and generate its box file, but for a few others it cannot generate the box coordinates. I have attached the file.
>>>>>>>>
>>>>>>>> On Sat, May 4, 2013 at 7:38 PM, Quan Nguyen <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> What Ubuntu and Java versions are installed on your machine? You probably have a headless Java, i.e., one without graphics libraries. Can you use Oracle Java 7, which is the version I tested with? Thanks.
>>>>>>>>>
>>>>>>>>> http://askubuntu.com/questions/55848/how-do-i-install-oracle-java-jdk-7
>>>>>>>>>
>>>>>>>>> On Saturday, May 4, 2013 8:10:33 AM UTC-5, mama wrote:
>>>>>>>>>
>>>>>>>>>> Sir,
>>>>>>>>>> After giving this command at the command prompt, the output is as follows:
>>>>>>>>>> java -Xms128m -Xmx512m -jar jTessBoxEditor.jar
>>>>>>>>>> 4 May, 2013 6:21:23 PM java.util.prefs.FileSystemPreferences$2 run
>>>>>>>>>> INFO: Created user preferences directory.
>>>>>>>>>> Exception in thread "AWT-EventQueue-0" java.awt.HeadlessException
>>>>>>>>>>     at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:173)
>>>>>>>>>>     at java.awt.Window.<init>(Window.java:546)
>>>>>>>>>>     at java.awt.Frame.<init>(Frame.java:419)
>>>>>>>>>>     at java.awt.Frame.<init>(Frame.java:384)
>>>>>>>>>>     at javax.swing.JFrame.<init>(JFrame.java:174)
>>>>>>>>>>     at net.sourceforge.tessboxeditor.Gui.<init>(Unknown Source)
>>>>>>>>>>     at net.sourceforge.tessboxeditor.GuiWithMRU.<init>(Unknown Source)
>>>>>>>>>>     at net.sourceforge.tessboxeditor.GuiWithEdit.<init>(Unknown Source)
>>>>>>>>>>     at net.sourceforge.tessboxeditor.GuiWithSpinner.<init>(Unknown Source)
>>>>>>>>>>     at net.sourceforge.tessboxeditor.GuiWithFont.<init>(Unknown Source)
>>>>>>>>>>     at net.sourceforge.tessboxeditor.GuiWithLaF.<init>(Unknown Source)
>>>>>>>>>>     at net.sourceforge.tessboxeditor.GuiWithTools.<init>(Unknown Source)
>>>>>>>>>>     at net.sourceforge.tessboxeditor.GuiWithTools$2.run(Unknown Source)
>>>>>>>>>>     at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:226)
>>>>>>>>>>     at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:673)
>>>>>>>>>>     at java.awt.EventQueue.access$300(EventQueue.java:96)
>>>>>>>>>>     at java.awt.EventQueue$2.run(EventQueue.java:634)
>>>>>>>>>>     at java.awt.EventQueue$2.run(EventQueue.java:632)
>>>>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>>     at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:105)
>>>>>>>>>>     at java.awt.EventQueue.dispatchEvent(EventQueue.java:643)
>>>>>>>>>>     at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:275)
>>>>>>>>>>     at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:200)
>>>>>>>>>>     at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:190)
>>>>>>>>>>     at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:185)
>>>>>>>>>>     at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:177)
>>>>>>>>>>     at java.awt.EventDispatchThread.run(EventDispatchThread.java:138)
>>>>>>>>>>
>>>>>>>>>> However, I could not work out how to open the jTessBoxEditor window.
>>>>>>>>>>
>>>>>>>>>> Please reply.
>>>>>>>>>> Thank you
>>>>>>>>>>
>>>>>>>>>> On Wed, May 1, 2013 at 3:32 AM, Quan Nguyen <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Version 0.9 Release:
>>>>>>>>>>>
>>>>>>>>>>> - Enhance Generate TIFF/Box functionality to allow for combining prepending symbols in addition to appending
>>>>>>>>>>> - Fix a bug that failed to persist changes to table in edit mode
>>>>>>>>>>> - Find function now supports partial matches
>>>>>>>>>>> - Fix a problem with table not scrolling along when row header has focus and scrolling
>>>>>>>>>>>
>>>>>>>>>>> http://sourceforge.net/projects/vietocr/files/jTessBoxEditor/
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>>>>>>>>>>> To post to this group, send email to [email protected]
>>>>>>>>>>> To unsubscribe from this group, send email to tesseract-oc...@googlegroups.com
>>>>>>>>>>> For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>>>>>>>
>>>>>>>>>>> ---
>>>>>>>>>>> You received this message because you are subscribed to a topic in the Google Groups "tesseract-ocr" group.
>>>>>>>>>>> To unsubscribe from this topic, visit https://groups.google.com/d/topic/tesseract-ocr/QQ8wC59YKUI/unsubscribe?hl=en.
>>>>>>>>>>> To unsubscribe from this group and all its topics, send an email to tesseract-oc...@googlegroups.com.
>>>>>>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
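
The unicharambigs entries quoted earlier in this thread can be sanity-checked mechanically. The sketch below assumes the v1 layout described in the Tesseract 3 training wiki: a first line reading v1, then five tab-separated fields per line (source count, space-separated source unichars, target count, space-separated target unichars, and a type flag, where 1 marks a mandatory substitution and 0 an optional one). The name check_ambigs and its messages are hypothetical; this is not a Tesseract tool.

```shell
#!/bin/sh
# Sketch only: check_ambigs is a hypothetical helper, not a Tesseract tool.
# Assumes the tab-separated v1 layout: count, unichars, count, unichars, type.
# Usage: check_ambigs <unicharambigs-file>; exit status is non-zero on problems.
check_ambigs() {
  awk -F '\t' '
    NR == 1 {
      if ($0 != "v1") { print "line 1: expected v1"; bad = 1 }
      next
    }
    NF != 5 {
      print "line " NR ": expected 5 tab-separated fields"
      bad = 1
      next
    }
    {
      # Unichars inside a field are space-separated, so count the tokens.
      src = split($2, a, " ")
      dst = split($4, b, " ")
      if (src != $1 + 0) { print "line " NR ": source count mismatch"; bad = 1 }
      if (dst != $3 + 0) { print "line " NR ": target count mismatch"; bad = 1 }
      if ($5 != "0" && $5 != "1") { print "line " NR ": type must be 0 or 1"; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

Counting space-separated tokens rather than bytes or code points is deliberate: an Odia unichar such as କ୍ଷୀ spans several code points but is still one token in the file.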

