Hi all,

I came across the Khmer training data for Tesseract, and I also did a few training runs of my own to see if it is possible.
I tried testing both my own trained data and the data downloaded from here, but I got this error:

Tesseract khmer1.tif output -l khm
read_params_file: Can't open ûl
read_params_file: Can't open khm
Tesseract Open Source OCR Engine v3.02 with Leptonica

I have moved the file khm.traineddata to "C:\Program Files (x86)\Tesseract-OCR\tessdata".

Can anyone give me a clue about what could be wrong here?

Thanks,
Metrey

On Tuesday, August 23, 2011 4:26:07 PM UTC+7, Jane wrote:
>
> I love your first paragraph, Dmitri. Anyway, I didn't back up the training images, only the trained data. It's been nearly a year since I paused work on Tesseract, as the deadline for my project is tight. Most of the images and box files were shared with Sriranga, Dmitri, and also the group. Also, I do not want to share my image files with the group, as they were extracted from news/gossip that may be politically harmful. Bad luck for me, too, that I didn't back up the images.
>
> With the traineddata, any new people who want to work on it can be sure that Tesseract is usable with Khmer, and they can add more training data as they wish.
>
> Thanks to all the members, especially Dmitri and Sriranga, for giving me all the feedback, explanations and ideas.
>
> Sochenda
>
> On Tue, Aug 23, 2011 at 3:41 PM, Dmitri Silaev wrote:
>> We have no right to force people to give away everything. No doubt, it's better to have sources, but ask the project owners and Google why they are holding back sources for the latest traineddata files, huh? And this is when the whole project had been declared open source...
>>
>> But we can ask people whether they really intend to hold back information (they may just not realize it); that's more correct and polite. Anyways, the Khmer "traineddata" file Sochenda has shared is somewhat useful.
>>
>> That's my opinion.
>>
>> Warm regards,
>> Dmitri Silaev
>> www.CustomOCR.com
>>
>> On Tue, Aug 23, 2011 at 11:13 AM, zdenko podobny wrote:
>> >
>> > On Tue, Aug 23, 2011 at 9:01 AM, Dmitri Silaev wrote:
>> >>
>> >> He-he, IMHO is a way to seem a bit less all-knowing and all-seeing to people. At least I use it that way )) See http://en.wiktionary.org/wiki/IMHO
>> >>
>> >> Anyways, I've checked what you shared, thanks so much. Actually, you don't need to share the "normproto", "microfeat", etc. files, as they are generated. The main part is your source image and box files, which you didn't share. Is this intentional? If yes, it's your right and that's OK...
>> >>
>> > Well, it is right, but I would not say it is OK (my opinion) ;-). Because if somebody wants to improve it, he/she has to start from the beginning.
>> > I think the dan-fraktur project [1] is a good example of how to contribute language data (build script, tif, box, dictionary data files: everything is included...)
>> > Zdenko
>> > [1] https://github.com/paalberti/tesseract-dan-fraktur
>> >>
>> >> Warm regards,
>> >> Dmitri Silaev
>> >> www.CustomOCR.com
>> >>
>> >> On Tue, Aug 23, 2011 at 6:20 AM, KHEM Sochenda wrote:
>> >> > I don't know what IMHO is. I never use it. However, I share the links here.
>> >> >
>> >> > https://docs.google.com/leaf?id=0B9BTtR5QkyOgOTliYWMxY2YtNjJkOS00Mzg0LWI0OTctYzI1NGJhMGY1Mjk4&hl=en_US
>> >> > https://docs.google.com/leaf?id=0B9BTtR5QkyOgODFhZWU1NzAtNjc0OS00MThmLThjODItMGNlODM1ZDFjNzkx&hl=en_US
>> >> > https://docs.google.com/leaf?id=0B9BTtR5QkyOgNGY1MmI1ZTYtOTA0OS00NWFkLWI3Y2ItYWRiMWJhZDBjODQ1&hl=en_US
>> >> > https://docs.google.com/leaf?id=0B9BTtR5QkyOgNDU0NTAwMmQtNzFhZi00NGI0LTkxMjItYTRmMjZiNTkxYzQy&hl=en_US
>> >> > https://docs.google.com/leaf?id=0B9BTtR5QkyOgYjI3Zjk4YWItZjMxOC00MTAwLThiMjUtODdlZDI2N2ExMzYx&hl=en_US
>> >> > https://docs.google.com/leaf?id=0B9BTtR5QkyOgZmRhMWFkODAtZDQ5OS00OWY5LTk5ZmUtZWRlZTc0N2ExMGZi&hl=en_US
>> >> > https://docs.google.com/leaf?id=0B9BTtR5QkyOgZTE3N2RlZWMtYjRjNi00NTkyLTljZDQtOTgwNDljNmQ3ZDhi&hl=en_US
>> >> >
>> >> > On Wed, Aug 17, 2011 at 2:15 PM, zdenko podobny wrote:
>> >> >>
>> >> >> IMHO the best way is to create a public repository somewhere (e.g. code.google.com, github.com, sf.net ...) and send the link here. I will add it to http://code.google.com/p/tesseract-ocr/wiki/AddOns.
>> >> >> Zd.
>> >> >>
>> >> >> On Wed, Aug 17, 2011 at 9:11 AM, KHEM Sochenda wrote:
>> >> >>>
>> >> >>> Dear Dmitri,
>> >> >>>
>> >> >>> Do you know how to upload the training dataset? I want to upload what I did.
>> >> >>>
>> >> >>> Regards,
>> >> >>> Sochenda
>> >> >>>
>> >> >>> On Tue, Aug 9, 2011 at 3:50 PM, Dmitri Silaev wrote:
>> >> >>>>
>> >> >>>> Training for Khmer is really a challenging task.
>> >> >>>> You can refer to the following thread for some clues:
>> >> >>>>
>> >> >>>> https://groups.google.com/d/topic/tesseract-ocr/TzwbS3CwhGo/discussion
>> >> >>>> You can also contact Sochenda to ask if he made any progress on this.
>> >> >>>>
>> >> >>>> Warm regards,
>> >> >>>> Dmitri Silaev
>> >> >>>> www.CustomOCR.com
>> >> >>>>
>> >> >>>> On Tue, Aug 9, 2011 at 11:56 AM, Sovila Srun wrote:
>> >> >>>> > Thanks a lot, Zdenko! Now I have successfully configured it.
>> >> >>>> > I have a question for you. I would like to train the system for the Khmer language; do you have any comments about this? What do I need to start?
>> >> >>>> > Oh, by the way, can you speak Russian?
>> >> >>>> > Thanks, Cheyvarman!
>> >> >>>> > Best regards!
>> >> >>>> >
>> >> >>>> > 2011/8/9 zdenko podobny
>> >> >>>> >>
>> >> >>>> >> What do you want to configure, and what did you try?
>> >> >>>> >>
>> >> >>>> >> On Tue, Aug 9, 2011 at 6:53 AM, Cheyvarman wrote:
>> >> >>>> >>>
>> >> >>>> >>> Can anyone tell me how to configure any version of tesseract-ocr on Windows?
>> >> >>>> >>> Configuring it by following the instructions didn't work :(
>> >> >>>> >>> Thanks in advance
>> >> >>>> >>>
>> >> >>>> >>> --
>> >> >>>> >>> You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
>> >> >>>> >>> To post to this group, send email to [email protected]
>> >> >>>> >>> To unsubscribe from this group, send email to [email protected]
>> >> >>>> >>> For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
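The two "Can't open" lines in the error output point to one possible cause, offered here as an unverified assumption rather than a confirmed diagnosis. The Tesseract 3.02 command line has the form "tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]", and after the output base it appears to treat an argument as an option only if it begins with an ASCII "-"; anything else is taken as a config file name. If "-l" was typed or pasted (for example from a word processor or an e-mail) with an en dash (U+2013) instead of a hyphen, both "–l" and "khm" would be read as config files, which would explain "read_params_file: Can't open" appearing twice. The "ûl" fits this too: the en dash is byte 0x96 in code page 1252, and a console using code page 437 displays that byte as "û". The Python sketch below assumes that cp1252/cp437 mismatch; its helper names (console_rendering, suspicious_args, fix_args) are hypothetical and not part of Tesseract.

```python
"""Hypothetical helpers for spotting dash look-alikes in a Tesseract command line.

Assumption (not taken from Tesseract itself): the command was typed or pasted
on a Windows machine whose ANSI code page is cp1252, while the console shows
text using the OEM code page cp437.
"""

# Characters that look like "-" but are not ASCII hyphen-minus (U+002D).
DASH_LOOKALIKES = {
    "\u2010",  # HYPHEN
    "\u2011",  # NON-BREAKING HYPHEN
    "\u2012",  # FIGURE DASH
    "\u2013",  # EN DASH (often substituted by word processors and mail clients)
    "\u2014",  # EM DASH
    "\u2212",  # MINUS SIGN
}


def console_rendering(text: str, ansi: str = "cp1252", oem: str = "cp437") -> str:
    """Show how text encoded in the ANSI code page looks when decoded as OEM."""
    return text.encode(ansi, errors="replace").decode(oem, errors="replace")


def suspicious_args(args: list[str]) -> list[str]:
    """Return the arguments that begin with a dash look-alike instead of '-'."""
    return [a for a in args if a[:1] in DASH_LOOKALIKES]


def fix_args(args: list[str]) -> list[str]:
    """Replace a leading dash look-alike with an ASCII hyphen-minus."""
    return ["-" + a[1:] if a[:1] in DASH_LOOKALIKES else a for a in args]


if __name__ == "__main__":
    # The arguments as they may have been typed, with an en dash before "l".
    typed = ["khmer1.tif", "output", "\u2013l", "khm"]
    print("suspicious arguments:", len(suspicious_args(typed)))
    # Print only the corrected (pure ASCII) command line.
    print(" ".join(["tesseract"] + fix_args(typed)))
```

Running the sketch prints the corrected command, "tesseract khmer1.tif output -l khm". If the flagged argument is indeed the problem, retyping "-l" with a plain hyphen should let Tesseract look for khm.traineddata in the tessdata folder instead of searching for config files named "ûl" and "khm".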

