Mamata,

You need to look at
http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
and follow the instructions there.

Your box file names should follow the format [lang].[fontname].exp[num].box,
for example:
ori.lohit.exp0.box
ori.A.exp0.box
ori.B.exp0.box

where lohit, A and B are font names that have been listed in the
font_properties file.
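A minimal Python sketch of the naming check, assuming the
[lang].[fontname].exp[num].box pattern above and a font_properties file whose
lines begin with the font name followed by five 0/1 flags (e.g.
`lohit 0 0 0 0 0`). The helper names are hypothetical; it only inspects file
names and runs no Tesseract tool.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical pattern for [lang].[fontname].exp[num].box
BOX_NAME = re.compile(r"^([^.]+)\.([^.]+)\.exp(\d+)\.box$")


def parse_box_name(name: str) -> Tuple[str, str, int]:
    """Split a box file name into (lang, font, exp number) or raise ValueError."""
    m = BOX_NAME.match(name)
    if m is None:
        raise ValueError(f"{name!r} does not match [lang].[fontname].exp[num].box")
    return m.group(1), m.group(2), int(m.group(3))


def fonts_in_font_properties(text: str) -> List[str]:
    """Return the first field (the font name) of each non-empty line."""
    return [line.split()[0] for line in text.splitlines() if line.strip()]


def check_box_names(names: List[str], font_properties: str) -> Dict[str, str]:
    """Map each box file name to its font, failing if the font is not listed."""
    known = set(fonts_in_font_properties(font_properties))
    result = {}
    for name in names:
        _lang, font, _exp = parse_box_name(name)
        if font not in known:
            raise ValueError(f"font {font!r} from {name!r} is not in font_properties")
        result[name] = font
    return result


if __name__ == "__main__":
    props = "lohit 0 0 0 0 0\nA 0 0 0 0 0\nB 0 0 0 0 0\n"
    boxes = ["ori.lohit.exp0.box", "ori.A.exp0.box", "ori.B.exp0.box"]
    print(check_box_names(boxes, props))
```

With this check, a name such as A.3.box is rejected because it has no expN part.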

You should not cat (concatenate) the .tr training files of two different
fonts. You can give up to 32 different .tr files to the training process.
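The following sketch groups .tr files by the font name in their file names, so
that each file can be passed to the training tools as its own argument instead
of being appended to another file with cat. It assumes the
[lang].[fontname].exp[num].tr naming; the 32-file limit is taken from the
advice above rather than from the Tesseract sources, and the function name is
hypothetical.

```python
from typing import Dict, List

# Limit taken from the advice in this thread; treat it as an assumption
# and check it against the Tesseract version you use.
MAX_TR_FILES = 32


def group_tr_files(tr_files: List[str]) -> Dict[str, List[str]]:
    """Group [lang].[fontname].exp[num].tr file names by font name.

    Nothing is concatenated: every file stays a separate argument, so the
    samples of one font never end up inside another font's file.
    """
    if len(tr_files) > MAX_TR_FILES:
        raise ValueError(f"{len(tr_files)} .tr files given; assumed limit is {MAX_TR_FILES}")
    groups: Dict[str, List[str]] = {}
    for name in tr_files:
        parts = name.split(".")
        well_formed = (
            len(parts) == 4
            and parts[2].startswith("exp")
            and parts[2][3:].isdigit()
            and parts[3] == "tr"
        )
        if not well_formed:
            raise ValueError(f"{name!r} is not named [lang].[fontname].exp[num].tr")
        groups.setdefault(parts[1], []).append(name)
    return groups


if __name__ == "__main__":
    print(group_tr_files(["ori.lohit.exp0.tr", "ori.A.exp0.tr", "ori.A.exp1.tr"]))
```

Names such as LohitOriya.tr or C.e0.tr are rejected, since the font name
cannot be read from them.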

To create the correct unicharset, you need to provide the box file that was
used to create the LohitOriya.tr file.
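Conceptually, the unicharset has to cover every symbol that appears in the box
files behind your .tr files. The sketch below is not a substitute for
unicharset_extractor (that tool also records character properties); it only
lists the distinct symbols in a set of box files, assuming the Tesseract 3 box
line layout `<symbol> <left> <bottom> <right> <top> <page>` and UTF-8
encoding. The function names are hypothetical. Comparing its output for the
LohitOriya box file with that for your own box files shows which symbols would
be missing if that file were left out.

```python
from pathlib import Path
from typing import Iterable, List


def box_labels(box_text: str) -> List[str]:
    """Return the symbol (first field) of every well-formed box-file line."""
    labels = []
    for line in box_text.splitlines():
        fields = line.split()
        # Assumed layout: <symbol> <left> <bottom> <right> <top> <page>
        if len(fields) >= 5:
            labels.append(fields[0])
    return labels


def distinct_labels(box_paths: Iterable[Path]) -> List[str]:
    """Collect the distinct symbols across all box files, in first-seen order."""
    seen = {}
    for path in box_paths:
        for label in box_labels(path.read_text(encoding="utf-8")):
            seen.setdefault(label, None)  # dict keeps insertion order
    return list(seen)
```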

I am assuming that you have downloaded the oriya files from
http://code.google.com/p/parichit/downloads/list

http://code.google.com/p/parichit/downloads/detail?name=ori_lohit_image_box.tar.gz&can=2&q=
has the box and TIFF files.

http://code.google.com/p/parichit/downloads/detail?name=Oriya.txt&can=2&q=
has the ground truth/training text in Oriya.

I would suggest that you first create the Oriya traineddata using the Lohit
files and then add your files one by one.
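A rough outline of that incremental workflow as a Python sketch that only
builds (and does not run) the command lines. It assumes Tesseract 3.02-era
tools, one [lang].[fontname].exp0.tif/.box pair per font, and a
font_properties file in the working directory; check each command against the
training wiki linked above before relying on it. The step that renames the
clustering outputs (for example inttemp to ori.inttemp) before combine_tessdata
is not included, and the function name is hypothetical.

```python
from typing import List


def training_commands(lang: str, fonts: List[str]) -> List[List[str]]:
    """Build (but do not run) a Tesseract 3.02-style training command sequence."""
    stems = [f"{lang}.{font}.exp0" for font in fonts]
    trs = [f"{stem}.tr" for stem in stems]
    cmds: List[List[str]] = []
    # One .tr file per TIFF/box pair; files of different fonts stay separate.
    for stem in stems:
        cmds.append(["tesseract", f"{stem}.tif", stem, "box.train"])
    # The unicharset comes from exactly the box files behind the .tr files.
    cmds.append(["unicharset_extractor"] + [f"{stem}.box" for stem in stems])
    # Clustering over all .tr files, each passed as its own argument.
    cmds.append(["mftraining", "-F", "font_properties", "-U", "unicharset",
                 "-O", f"{lang}.unicharset"] + trs)
    cmds.append(["cntraining"] + trs)
    # Assumes inttemp, pffmtable, normproto (and shapetable) were already
    # renamed with the "<lang>." prefix before this step.
    cmds.append(["combine_tessdata", f"{lang}."])
    return cmds


if __name__ == "__main__":
    # Start with Lohit only; later runs could use ["lohit", "A"], then ["lohit", "A", "B"].
    for cmd in training_commands("ori", ["lohit"]):
        print(" ".join(cmd))
```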

Shree




On Sat, Jun 1, 2013 at 11:35 AM, mamata nayak <[email protected]> wrote:

> Sir,
> Please help me.
> The character set of my language consists of about 500 characters.
> I have divided these into subsets, i.e. about 10 .tif files, generated the
> box files, edited them separately using the Qt editor, and then used the
> following command:
>
> $ cat >> LohitOriya.tr C.e0.tr
>
> to concatenate one .tr file with the previously generated LohitOriya.tr
> file, and
>
> $ unicharset_extractor A.3.box B.e0.box C.e0.box
>
> to generate the unicharset file.
>
> Please respond as early as possible.
>
> Eagerly waiting
> $unicharset_extractor
>
>
> On Tue, May 21, 2013 at 3:38 PM, Shree Devi Kumar <[email protected]>wrote:
>
>> Mamata,
>>  Please see https://code.google.com/p/tesseract-ocr/downloads/list for
>> the available language data friles for tesseract 3.02. In case Odia is
>> similar to bangala, you can use the bengali traineddata to bootstrap for
>> odia.
>>
>> Shree
>>
>> Shree Devi Kumar
>> ____________________________________________________________
>> Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com
>>
>>
>> On Tue, May 21, 2013 at 2:26 PM, mamata nayak <[email protected]>wrote:
>>
>>> Sir,
>>> Can you please tell me the current list of Indian languages for which
>>> the tesseract-ocr engine has been trained?
>>>
>>> Thank you
>>>
>>>
>>> On Sun, May 12, 2013 at 12:23 PM, Shree Devi Kumar <[email protected]> wrote:
>>>
>>>> Are you training for the Odia language?
>>>>
>>>> Have you seen
>>>> http://tdil-dc.in/tdildcMain/articles/374232Odia%20Script%20Grammar_Ver1.0.pdf
>>>> ?
>>>>
>>>>
>>>> Shree Devi Kumar
>>>>
>>>>
>>>> On Sat, May 11, 2013 at 9:01 PM, mamata nayak <[email protected]>wrote:
>>>>
>>>>> Thank you, sir.
>>>>> I was able to detect a set of characters of my language.
>>>>> However, one character among them, ଫୀ, which occurs 5 times in the
>>>>> training image, is recognized as a different character pair at each
>>>>> place: କ୍ଷୀଛୀ, ନୀନୀ, ଯୀଛୀ, ପୀଛୀ, ବୀନୀ.
>>>>> Then I used a unicharambigs file with the following contents:
>>>>> v1
>>>>> 2    କ୍ଷୀ ଛୀ    1    ଫୀ    1
>>>>> 2    ନୀ ନୀ    1    ଫୀ    1
>>>>> 2    ଯୀ ଛୀ    1    ଫୀ    1
>>>>> 2    ପୀ ଛୀ    1    ଫୀ    1
>>>>> 2    ବୀ ନୀ    1    ଫୀ    1
>>>>> But the problem is that, during recognition, it replaces these pairs
>>>>> of characters with ଫୀ.
>>>>> Please understand my problem and give me a suggestion.
>>>>> Thanking you
>>>>> thanking you
>>>>>
>>>>>
>>>>> On Wed, May 8, 2013 at 5:47 PM, Quan Nguyen <[email protected]>wrote:
>>>>>
>>>>>> You would need to run the tesseract command to generate the box file
>>>>>> for your image, e.g.:
>>>>>>
>>>>>> tesseract eng.timesitalic.exp0.tif eng.timesitalic.exp0 batch.nochop makebox
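For a batch of images, such as the roughly ten TIFF files mentioned elsewhere
in this thread, the same command can be generated per file. A minimal Python
sketch, assuming tesseract is on the PATH and the images already follow the
[lang].[fontname].exp[num].tif naming; the function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def makebox_commands(tif_files: List[str]) -> List[List[str]]:
    """Build one `tesseract <tif> <base> batch.nochop makebox` command per TIFF."""
    cmds = []
    for tif in tif_files:
        path = Path(tif)
        if path.suffix.lower() not in (".tif", ".tiff"):
            raise ValueError(f"{tif!r} is not a TIFF file")
        # Output base = image name without its extension, so tesseract
        # writes <base>.box next to the image.
        base = str(path.with_suffix(""))
        cmds.append(["tesseract", tif, base, "batch.nochop", "makebox"])
    return cmds


def run_all(cmds: List[List[str]]) -> None:
    """Run each command, stopping at the first failure (needs tesseract installed)."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Calling run_all(makebox_commands([...])) produces the box files that can then be
opened in jTessBoxEditor together with their images.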
>>>>>>
>>>>>>
>>>>>> Check the Tesseract training wiki for more details.
>>>>>>
>>>>>> http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
>>>>>>
>>>>>> Once you have the TIFF/Box pair, you can open it in jTessBoxEditor.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wednesday, May 8, 2013 12:29:43 AM UTC-5, mama wrote:
>>>>>>
>>>>>>> Good Morning Sir,
>>>>>>> Thanks for your reply.
>>>>>>> Now my problem is that, for some sets of characters of my language,
>>>>>>> jTessBoxEditor can open the corresponding TIFF file and generate its box
>>>>>>> file, but for a few others it cannot generate the box coordinates.
>>>>>>> Sir, I have attached the file; please take a look.
>>>>>>>
>>>>>>>
>>>>>>> On Sat, May 4, 2013 at 7:38 PM, Quan Nguyen <[email protected]>wrote:
>>>>>>>
>>>>>>>> What Ubuntu and Java versions are installed on your machine? You
>>>>>>>> probably have a headless Java, i.e., one without graphics libraries.
>>>>>>>> Can you use Oracle Java 7, which is the version I tested with? Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>> http://askubuntu.com/questions/55848/how-do-i-install-oracle-java-jdk-7
>>>>>>>>
>>>>>>>> On Saturday, May 4, 2013 8:10:33 AM UTC-5, mama wrote:
>>>>>>>>
>>>>>>>>> Sir,
>>>>>>>>> After I run this command at the command prompt, the output is as
>>>>>>>>> follows:
>>>>>>>>> java -Xms128m -Xmx512m -jar jTessBoxEditor.jar
>>>>>>>>> 4 May, 2013 6:21:23 PM java.util.prefs.FileSystemPreferences$2 run
>>>>>>>>> INFO: Created user preferences directory.
>>>>>>>>> Exception in thread "AWT-EventQueue-0" java.awt.HeadlessException
>>>>>>>>>     at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:173)
>>>>>>>>>     at java.awt.Window.<init>(Window.java:546)
>>>>>>>>>     at java.awt.Frame.<init>(Frame.java:419)
>>>>>>>>>     at java.awt.Frame.<init>(Frame.java:384)
>>>>>>>>>     at javax.swing.JFrame.<init>(JFrame.java:174)
>>>>>>>>>     at net.sourceforge.tessboxeditor.Gui.<init>(Unknown Source)
>>>>>>>>>     at net.sourceforge.tessboxeditor.GuiWithMRU.<init>(Unknown Source)
>>>>>>>>>     at net.sourceforge.tessboxeditor.GuiWithEdit.<init>(Unknown Source)
>>>>>>>>>     at net.sourceforge.tessboxeditor.GuiWithSpinner.<init>(Unknown Source)
>>>>>>>>>     at net.sourceforge.tessboxeditor.GuiWithFont.<init>(Unknown Source)
>>>>>>>>>     at net.sourceforge.tessboxeditor.GuiWithLaF.<init>(Unknown Source)
>>>>>>>>>     at net.sourceforge.tessboxeditor.GuiWithTools.<init>(Unknown Source)
>>>>>>>>>     at net.sourceforge.tessboxeditor.GuiWithTools$2.run(Unknown Source)
>>>>>>>>>     at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:226)
>>>>>>>>>     at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:673)
>>>>>>>>>     at java.awt.EventQueue.access$300(EventQueue.java:96)
>>>>>>>>>     at java.awt.EventQueue$2.run(EventQueue.java:634)
>>>>>>>>>     at java.awt.EventQueue$2.run(EventQueue.java:632)
>>>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>     at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:105)
>>>>>>>>>     at java.awt.EventQueue.dispatchEvent(EventQueue.java:643)
>>>>>>>>>     at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:275)
>>>>>>>>>     at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:200)
>>>>>>>>>     at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:190)
>>>>>>>>>     at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:185)
>>>>>>>>>     at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:177)
>>>>>>>>>     at java.awt.EventDispatchThread.run(EventDispatchThread.java:138)
>>>>>>>>>
>>>>>>>>> However, I could not figure out how to open the window.
>>>>>>>>> [image: jTessBoxEditor Swing UI] [image: Box View]
>>>>>>>>>
>>>>>>>>> Please reply.
>>>>>>>>> Thank you
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, May 1, 2013 at 3:32 AM, Quan Nguyen <[email protected]>wrote:
>>>>>>>>>
>>>>>>>>>> Version 0.9 Release:
>>>>>>>>>>
>>>>>>>>>> - Enhance Generate TIFF/Box functionality to allow for combining
>>>>>>>>>> prepending symbols in addition to appending
>>>>>>>>>> - Fix a bug that failed to persist changes to table in edit mode
>>>>>>>>>> - Find function now supports partial matches
>>>>>>>>>> - Fix a problem with table not scrolling along when row header
>>>>>>>>>> has focus and scrolling
>>>>>>>>>>
>>>>>>>>>> http://sourceforge.net/projects/vietocr/files/jTessBoxEditor/
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> --
>>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>>>> To post to this group, send email to [email protected]
>>>>>>>>>>
>>>>>>>>>> To unsubscribe from this group, send email to
>>>>>>>>>> tesseract-oc...@googlegroups.com
>>>>>>>>>>
>>>>>>>>>> For more options, visit this group at
>>>>>>>>>> http://groups.google.com/group/tesseract-ocr?hl=en
>>>>>>>>>>
>>>>>>>>>> ---
>>>>>>>>>> You received this message because you are subscribed to a topic
>>>>>>>>>> in the Google Groups "tesseract-ocr" group.
>>>>>>>>>> To unsubscribe from this topic, visit
>>>>>>>>>> https://groups.google.com/d/topic/tesseract-ocr/QQ8wC59YKUI/unsubscribe?hl=en
>>>>>>>>>> To unsubscribe from this group and all its topics, send an email
>>>>>>>>>> to tesseract-oc...@googlegroups.com.
>>>>>>>>>>
>>>>>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>>>
>>>>>
>>>>
>>>
>>
>
