Sir,
Actually, A, B, C, etc. are the names of the tif files, each containing a subset of my 
character set; I used short names only for simplicity.
I have 20 such files, named A.tif to T.tif, and I was 
able to train successfully on files A, B, C, D and E by combining them one after 
another.
From your reply I came to know that I can use up to 32 files.
Thank you

On Saturday, June 1, 2013 1:58:26 PM UTC+5:30, sdk wrote:
>
> Mamata,
>
> You need to look at 
> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
> and follow the instructions there.
>
> Your box file names should be in the format:
> ori.lohit.exp0.box
> ori.A.exp0.box
> ori.B.exp0.box
>
> where lohit, A and B are font names that have been listed in the 
> font_properties file.
>
> You should not cat training files of two different fonts. You can give 
> up to 32 different tr files for the training process.
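The naming rule and the file limit described above can be checked before training starts. The following is only a minimal sketch: `check_box_name` and `check_tr_count` are hypothetical helper names, not Tesseract tools, the pattern check is deliberately loose, and the limit of 32 is simply the figure quoted in the message above.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of Tesseract.

# Succeed if the name looks like lang.fontname.expN.box (a loose pattern check).
check_box_name() {
    case "$1" in
        *.*.exp[0-9]*.box) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed if between 1 and 32 .tr files are given (the limit quoted above).
check_tr_count() {
    [ "$#" -ge 1 ] && [ "$#" -le 32 ]
}

# Example: report which box files would need renaming.
for f in ori.lohit.exp0.box ori.A.exp0.box A.3.box C.e0.box; do
    if check_box_name "$f"; then
        echo "$f: ok"
    else
        echo "$f: rename to lang.fontname.expN.box"
    fi
done
```

Names such as A.3.box and C.e0.box fail the check because they lack the language prefix and the expN part. The font name in the middle would also need a matching entry in font_properties, which this sketch does not verify.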
>
> In order to create the correct unicharset you need to provide the box file 
> used for creating the LohitOriya.tr file.
>
> I am assuming that you have downloaded the oriya files from 
> http://code.google.com/p/parichit/downloads/list
>
>
> http://code.google.com/p/parichit/downloads/detail?name=ori_lohit_image_box.tar.gz&can=2&q=
> has the box and tif files
>
> http://code.google.com/p/parichit/downloads/detail?name=Oriya.txt&can=2&q=
> has the ground truth/training text in oriya.
>
> I would suggest that you first create the oriya traineddata using the 
> lohit files and then add your files one by one.
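The "lohit first, then add files one by one" suggestion can be sketched as a dry run. This assumes the command sequence from the TrainingTesseract3 wiki page linked above; `training_commands` is a hypothetical helper that only prints the commands, and it assumes every font uses the exp0 suffix. The wiki lists further steps (for example renaming the outputs and running combine_tessdata) that are left out here.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the main Tesseract 3.02 training
# commands for a language code followed by one or more font names.
training_commands() {
    lang=$1; shift
    boxes=""; trs=""
    for font in "$@"; do
        stem="$lang.$font.exp0"            # assumes every file uses exp0
        echo "tesseract $stem.tif $stem nobatch box.train"
        boxes="$boxes $stem.box"
        trs="$trs $stem.tr"
    done
    # One unicharset from ALL box files, then one training pass over ALL tr files.
    echo "unicharset_extractor$boxes"
    echo "mftraining -F font_properties -U unicharset -O $lang.unicharset$trs"
    echo "cntraining$trs"
}
```

For example, `training_commands ori lohit` prints the commands for the lohit files alone, and `training_commands ori lohit A` prints them again with font A added. Each .tr file is passed as a separate argument rather than being concatenated.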
>
> Shree
>
>
>
>
> On Sat, Jun 1, 2013 at 11:35 AM, mamata nayak <[email protected]> wrote:
>
>> Sir,
>> Please help me.
>> The character set of my language consists of about 500 characters.
>> I have divided these into subsets, i.e. about 10 .tif files, generated a 
>> box file for each, and edited them separately using the Qt editor. I then use the 
>> following command:
>>
>> $ cat >> LohitOriya.tr C.e0.tr
>>
>> to append each .tr file to the previously generated LohitOriya.tr 
>> file, and
>>
>> $ unicharset_extractor A.3.box B.e0.box C.e0.box
>>
>> to generate the unicharset file.
>>
>> Please respond as early as possible.
>>
>> Eagerly waiting
>> $unicharset_extractor        
>>
>>
>> On Tue, May 21, 2013 at 3:38 PM, Shree Devi Kumar <[email protected]> wrote:
>>
>>> Mamata,
>>> Please see https://code.google.com/p/tesseract-ocr/downloads/list for 
>>> the available language data files for tesseract 3.02. In case Odia is 
>>> similar to Bengali, you can use the Bengali traineddata to bootstrap for 
>>> Odia.
>>>
>>> Shree
>>>
>>> Shree Devi Kumar
>>> ____________________________________________________________
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>>
>>> On Tue, May 21, 2013 at 2:26 PM, mamata nayak <[email protected]> wrote:
>>>
>>>> Sir,
>>>> Can you please tell me the current list of Indian languages for which 
>>>> the tesseract-ocr engine has been trained?
>>>>
>>>> Thank you
>>>>
>>>>
>>>> On Sun, May 12, 2013 at 12:23 PM, Shree Devi Kumar <[email protected]> wrote:
>>>>
>>>>> Are you training Odia language?
>>>>>
>>>>> Have you seen 
>>>>> http://tdil-dc.in/tdildcMain/articles/374232Odia%20Script%20Grammar_Ver1.0.pdf
>>>>> ?
>>>>>
>>>>>  
>>>>> Shree Devi Kumar
>>>>> ____________________________________________________________
>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>  
>>>>>
>>>>> On Sat, May 11, 2013 at 9:01 PM, mamata nayak <[email protected]> wrote:
>>>>>
>>>>>> Thank you sir.
>>>>>> I am now able to detect a set of characters of my language.
>>>>>> However, one character among them, ଫୀ, is recognized as 
>>>>>> a different pair of characters at each place it occurs in the training image: 
>>>>>> କ୍ଷୀଛୀ, ନୀନୀ, ଯୀଛୀ, ପୀଛୀ and ବୀନୀ (it occurs 5 times).
>>>>>> I then used a unicharambigs file containing the following:
>>>>>> v1
>>>>>> 2    କ୍ଷୀ ଛୀ    1    ଫୀ    1    
>>>>>> 2    ନୀ ନୀ    1    ଫୀ    1
>>>>>> 2    ଯୀ ଛୀ    1    ଫୀ    1
>>>>>> 2    ପୀ ଛୀ    1    ଫୀ    1
>>>>>> 2    ବୀ ନୀ    1    ଫୀ    1
>>>>>> But the problem is that, during recognition, these pairs of characters are 
>>>>>> replaced with ଫୀ.
>>>>>> So please understand my problem and give a suggestion.
>>>>>> Thanking you
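For reference, each rule in the v1 unicharambigs format shown above has five tab-separated fields: the number of source unichars, the source unichars, the number of target unichars, the target unichars, and a type indicator. According to the training wiki, type 1 means a mandatory replacement and type 0 an optional one, so every rule listed above always replaces the pair. The sketch below prints each rule in readable form; `describe_ambigs` is a hypothetical helper name, and it assumes the fields really are separated by tabs.

```shell
#!/bin/sh
# Hypothetical helper: describe each rule of a v1-format unicharambigs file.
# Fields (tab-separated): source count, source unichars, target count,
# target unichars, type indicator (1 = mandatory, 0 = optional).
describe_ambigs() {
    awk -F '\t' '
        NR == 1 && $1 == "v1" { next }     # skip the version header line
        NF >= 5 {
            kind = ($5 == 1) ? "mandatory" : "optional"
            printf "%s -> %s (%s)\n", $2, $4, kind
        }
    ' "$1"
}
```

Running `describe_ambigs unicharambigs` on the file above would list each pair together with its replacement type. Whether changing the type to 0 gives the desired behaviour for these pairs is not confirmed in this thread; it is only something that may be worth trying.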
>>>>>>
>>>>>>
>>>>>> On Wed, May 8, 2013 at 5:47 PM, Quan Nguyen <[email protected]> wrote:
>>>>>>
>>>>>>> You would need to run the tesseract command to generate the box file 
>>>>>>> for your image, e.g.:
>>>>>>>
>>>>>>> tesseract eng.timesitalic.exp0.tif eng.timesitalic.exp0 batch.nochop makebox
>>>>>>>
>>>>>>>
>>>>>>> Check Tesseract Training Wiki for more details.
>>>>>>>
>>>>>>> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
>>>>>>>
>>>>>>> Once you have the TIFF/Box pair, you can open it in jTessBoxEditor.
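With many TIFF files, the makebox command above can be generated per file. This is a minimal sketch: `makebox_commands` is a hypothetical helper that only prints the commands (a dry run), and it assumes file names end in .tif and contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: print the makebox command for each TIFF given.
# It only echoes the commands; pipe the output to sh to actually run them.
makebox_commands() {
    for tif in "$@"; do
        base=${tif%.tif}                   # strip the .tif extension
        echo "tesseract $tif $base batch.nochop makebox"
    done
}
```

For example, `makebox_commands ori.A.exp0.tif ori.B.exp0.tif | sh` would run tesseract once per image and leave one .box file next to each TIFF.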
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wednesday, May 8, 2013 12:29:43 AM UTC-5, mama wrote:
>>>>>>>
>>>>>>>> Good Morning Sir,
>>>>>>>> Thanks for your reply.
>>>>>>>> Now my problem is that for some sets of characters of my language 
>>>>>>>> jTessBoxEditor can open the corresponding tif file and generate its 
>>>>>>>> box file, but for a few others it cannot generate the box coordinates. 
>>>>>>>> Sir, I have attached the file.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, May 4, 2013 at 7:38 PM, Quan Nguyen <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> What Ubuntu and Java versions are installed on your machine? You 
>>>>>>>>> probably have a headless Java, i.e., one without graphics libraries. 
>>>>>>>>> Can you use Oracle Java 7, which is the version I tested with? Thanks.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://askubuntu.com/questions/55848/how-do-i-install-oracle-java-jdk-7
>>>>>>>>>
>>>>>>>>> On Saturday, May 4, 2013 8:10:33 AM UTC-5, mama wrote:
>>>>>>>>>
>>>>>>>>>> Sir,
>>>>>>>>>> After giving this command at the command prompt, the output is as 
>>>>>>>>>> follows:
>>>>>>>>>> java -Xms128m -Xmx512m -jar jTessBoxEditor.jar
>>>>>>>>>> 4 May, 2013 6:21:23 PM java.util.prefs.FileSystemPreferences$2 run
>>>>>>>>>> INFO: Created user preferences directory.
>>>>>>>>>> Exception in thread "AWT-EventQueue-0" java.awt.HeadlessException
>>>>>>>>>>     at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:173)
>>>>>>>>>>     at java.awt.Window.<init>(Window.java:546)
>>>>>>>>>>     at java.awt.Frame.<init>(Frame.java:419)
>>>>>>>>>>     at java.awt.Frame.<init>(Frame.java:384)
>>>>>>>>>>     at javax.swing.JFrame.<init>(JFrame.java:174)
>>>>>>>>>>     at net.sourceforge.tessboxeditor.Gui.<init>(Unknown Source)
>>>>>>>>>>     at net.sourceforge.tessboxeditor.GuiWithMRU.<init>(Unknown Source)
>>>>>>>>>>     at net.sourceforge.tessboxeditor.GuiWithEdit.<init>(Unknown Source)
>>>>>>>>>>     at net.sourceforge.tessboxeditor.GuiWithSpinner.<init>(Unknown Source)
>>>>>>>>>>     at net.sourceforge.tessboxeditor.GuiWithFont.<init>(Unknown Source)
>>>>>>>>>>     at net.sourceforge.tessboxeditor.GuiWithLaF.<init>(Unknown Source)
>>>>>>>>>>     at net.sourceforge.tessboxeditor.GuiWithTools.<init>(Unknown Source)
>>>>>>>>>>     at net.sourceforge.tessboxeditor.GuiWithTools$2.run(Unknown Source)
>>>>>>>>>>     at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:226)
>>>>>>>>>>     at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:673)
>>>>>>>>>>     at java.awt.EventQueue.access$300(EventQueue.java:96)
>>>>>>>>>>     at java.awt.EventQueue$2.run(EventQueue.java:634)
>>>>>>>>>>     at java.awt.EventQueue$2.run(EventQueue.java:632)
>>>>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>>     at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:105)
>>>>>>>>>>     at java.awt.EventQueue.dispatchEvent(EventQueue.java:643)
>>>>>>>>>>     at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:275)
>>>>>>>>>>     at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:200)
>>>>>>>>>>     at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:190)
>>>>>>>>>>     at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:185)
>>>>>>>>>>     at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:177)
>>>>>>>>>>     at java.awt.EventDispatchThread.run(EventDispatchThread.java:138)
>>>>>>>>>>
>>>>>>>>>> However, I could not work out how to open the window.
>>>>>>>>>>
>>>>>>>>>> Please reply.
>>>>>>>>>> Thank you
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, May 1, 2013 at 3:32 AM, Quan Nguyen <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Version 0.9 Release:
>>>>>>>>>>>
>>>>>>>>>>> - Enhance Generate TIFF/Box functionality to allow for combining 
>>>>>>>>>>> prepending symbols in addition to appending
>>>>>>>>>>> - Fix a bug that failed to persist changes to table in edit mode
>>>>>>>>>>> - Find function now supports partial matches
>>>>>>>>>>> - Fix a problem with table not scrolling along when row header 
>>>>>>>>>>> has focus and scrolling
>>>>>>>>>>>
>>>>>>>>>>> http://sourceforge.net/projects/vietocr/files/jTessBoxEditor/
>>>>>>>>>>>  
>>>>>>>>>>> -- 
>>>>>>>>>>> You received this message because you are subscribed to the 
>>>>>>>>>>> Google Groups "tesseract-ocr" group.
>>>>>>>>>>> To post to this group, send email to [email protected]
>>>>>>>>>>> To unsubscribe from this group, send email to
>>>>>>>>>>> tesseract-oc...@googlegroups.com
>>>>>>>>>>> For more options, visit this group at
>>>>>>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>>>>>>>  
>>>>>>>>>>> --- 
>>>>>>>>>>> You received this message because you are subscribed to a topic 
>>>>>>>>>>> in the Google Groups "tesseract-ocr" group.
>>>>>>>>>>> To unsubscribe from this topic, visit 
>>>>>>>>>>> https://groups.google.com/d/topic/tesseract-ocr/QQ8wC59YKUI/unsubscribe?hl=en
>>>>>>>>>>> To unsubscribe from this group and all its topics, send an 
>>>>>>>>>>> email to tesseract-oc...@googlegroups.com.
>>>>>>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>>>>>>>
