Hi Rajesh,
A couple of questions:
1) When you use the sample PNG file to train, i.e. to create the language
data files, is there a complementary UTF-8 text file that must be present
alongside the TIFF file? Getting the image file is the part that confuses me.

2) If there are pairs of image and text files for training, how does one
name them so that the program knows what to do?

Or have I got it all wrong? I need someone to explain this as though I
were a 5-year-old.

Any help is appreciated.

Richard

PS: I am trying to use Tesseract to create my own Fraktur German language
data files to improve OCR accuracy.

On Tuesday, May 1, 2012 11:43:14 AM UTC+10, Rajesh wrote:
>
> Hi Falke, 
> Here is a sample image
>  
>
> On Mon, Apr 30, 2012 at 12:10 AM, Falke <[email protected]> wrote:
>
>>
>>
>> On Apr 26, 2:18 pm, Rajesh Pandey <[email protected]> wrote:
>> > > > Earlier I was interested in creating a Nepali OCR but I am these days more
>> >
>> > > You were going to write the whole engine, from scratch?  Wow.
>> >
>> > Yes indeed. We (as a team) were creating a complete OCR. We *were*
>> > researching and developing a full-fledged Nepali OCR.
>> >
>> > Some of the work is still there at code.google.com/p/nepaliocr
>> >
>> > I haven't tried to train again. I was asking if anyone had ever tried it
>> > for Nepali, because some people might have had luck. If I knew that
>> > people had had luck with training, it would be worth trying. It has been
>> > nearly 3 years since I attempted to train Tesseract for Nepali.
>> >
>> > Fossnepal is a Nepali open source community group.
>> >
>>
>> If you uploaded a sample scanned image to this forum, others
>> (including myself) could try it with tesseract.  I'm not sure how much
>> difference there is between font(s) in (older?) Nepali documents and
>> Hindi documents... While the alphabet is the same (correct me if I'm
>> wrong), maybe the styles (font variations) are different enough to
>> call for separate training (?)  But I don't think it should be SO
>> different as to negate the following deductive statement: "If
>> tesseract is trainable for Hindi, it should be trainable for Nepali".
>> Or, IOW: at best, you can piggyback on the Hindi training; at
>> worst, you'll need to train specifically for Nepali (thereby
>> achieving accuracy comparable to that with Hindi).
>>
>> Of course, not being an expert on this, I may have to eat my words ...
>>
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en
>>
>
>
>
> -- 
> Rajesh Pandey
>
