[tesseract-ocr] training font
Hi, I need your help. For Persian in tesseract-ocr, do we train a separate model for each font, or one model for all fonts? Thanks.
[tesseract-ocr] Re: training font
Thank you.

On Wednesday, March 22, 2017 at 10:31:41 PM UTC+4:30, Saurabh Srivastav wrote:
> You can train it for a single font.
[tesseract-ocr] what fonts does tesseract support?
Hi, sorry, I want to know which fonts Tesseract supports. Also, what are Tesseract's priorities for training? Thanks.
[tesseract-ocr] ERROR: Could not find training text file
Hi. Sorry, I used this command:

training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
  --noextract_font_properties --langdata_dir langdata \
  --tessdata_dir tessdata \
  --fontlist "Times New Roman," --output_dir engtrain

Before that I created the box file, the TIFF and the unicharset output. I cloned langdata for Tesseract v4.0, but I get this error:

=== Phase I: Generating training images ===
ERROR: Could not find training text file langdata/eng/eng.training_text

I can't solve it, and I don't know where I should put training_text.txt. It is the text file I actually want to train on. Thanks for your attention.
Re: [tesseract-ocr] Error:Assert failed:in file text2image.cpp, line 428
I am using Tesseract v4.0 on Ubuntu 16.04.

On Wednesday, July 26, 2017 at 11:20:25 AM UTC+4:30, shree wrote:
> Which version of tesseract are you using? Which platform?
>
> Try building the latest code from GitHub and use that.
>
> ShreeDevi
Re: [tesseract-ocr] Error:Assert failed:in file text2image.cpp, line 428
Thanks for your help.

On Wednesday, July 26, 2017 at 11:20:25 AM UTC+4:30, shree wrote:
> Which version of tesseract are you using? Which platform?
>
> Try building the latest code from GitHub and use that.
>
> ShreeDevi
Re: [tesseract-ocr] ERROR: Could not find training text file
Thanks a lot.

On Monday, July 31, 2017 at 4:10:14 PM UTC+4:30, shree wrote:
> Add a line similar to the following to your training command, pointing to where you have your training text:
>
>     --training_text ../langdata/eng/eng.training_text \
>
> ShreeDevi
Re: [tesseract-ocr] Re: Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110
Thanks a lot, you're right. The path should be complete: I used /home/zohreh/Desktop/tesseract-master/z/engtrian/eng/eng.traineddata instead of z/engtrain/eng/eng.traineddata. It only works when the path is written from the root.

On Wednesday, August 16, 2017 at 5:50:18 AM UTC+4:30, roberty...@gmail.com wrote:
> Hi, I don't encounter this error.
>
> But you may check whether your traineddata is in the correct directory, as well as some other paths.
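A small check along the lines roberty suggested: resolve the path and confirm the file is really there before starting lstmtraining. Note that the relative path in this message spells engtrain while the directory in the absolute path is engtrian, which may by itself explain why the relative path failed. The to_abs helper is hypothetical, written only for this sketch.

```shell
#!/usr/bin/env bash
# Sketch: make a traineddata path absolute and check that it exists.
set -euo pipefail

to_abs() {
  # Leave absolute paths alone; prefix relative ones with the current directory.
  case "$1" in
    /*) printf '%s\n' "$1" ;;
    *)  printf '%s\n' "$PWD/$1" ;;
  esac
}

# Default is the relative path from the thread (note: engtrain, not engtrian).
TRAINEDDATA="$(to_abs "${1:-z/engtrain/eng/eng.traineddata}")"

if [[ -f "$TRAINEDDATA" ]]; then
  echo "ok: $TRAINEDDATA"
else
  echo "missing: $TRAINEDDATA (check the spelling of every directory)" >&2
fi
```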
Re: [tesseract-ocr] Re: Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110
Sorry, I have a question: what is the output of this command? After running it I have a lot of files like base44.409_2195.checkpoint, but in the tutorials I saw eng.lstm, which I don't have. Which command creates eng.lstm? I must thank you for your support.

On Wednesday, August 16, 2017 at 5:50:18 AM UTC+4:30, roberty...@gmail.com wrote:
> Hi, I don't encounter this error.
>
> But you may check whether your traineddata is in the correct directory, as well as some other paths.
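As far as I know, lstmtraining only writes checkpoints while it runs. A checkpoint is turned into a usable model with `lstmtraining --stop_training`, while the eng.lstm used in the fine-tuning tutorials is extracted from an existing traineddata with `combine_tessdata -e`. The sketch below assumes the output directory and traineddata paths used elsewhere in this thread. It also assumes the newest checkpoint is named base_checkpoint, which lstmtraining derives from `--model_output .../base`. It prints the commands unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: checkpoint -> traineddata, and traineddata -> .lstm.
# All paths are assumptions based on the commands in this thread.
set -euo pipefail

OUT_DIR="${OUT_DIR:-$HOME/tesstutorial/engoutput}"
TRAINEDDATA="${TRAINEDDATA:-$HOME/tesstutorial/engtrain/eng/eng.traineddata}"

run() {
  # Print the command by default; execute it only when DRY_RUN=0.
  if [[ "${DRY_RUN:-1}" == 1 ]]; then printf '%q ' "$@"; echo; else "$@"; fi
}

# 1. Write a final model from the most recent checkpoint.
run lstmtraining --stop_training \
  --continue_from "$OUT_DIR/base_checkpoint" \
  --traineddata "$TRAINEDDATA" \
  --model_output "$OUT_DIR/eng.traineddata"

# 2. Extract the LSTM model from a released traineddata (assumed path).
run combine_tessdata -e tessdata/best/eng.traineddata "$OUT_DIR/eng.lstm"
```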
Re: [tesseract-ocr] Re: Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110
Hi, thanks for your help. I used your link, but I got this error:

mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110
Segmentation fault (core dumped)

I want to start training the Persian language, so I am trying English first. I created the box file and unicharset, and then eng.charset_size=110.txt, eng.Times_New_Roman.exp0.lstmf, eng.traineddata, eng.training_files.txt and eng.unicharset. All of those were created with this command:

training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
  --training_text training/langdata/eng/eng.training_text \
  --linedata_only \
  --noextract_font_properties --langdata_dir training/langdata \
  --tessdata_dir ./tessdata \
  --fontlist "Times New Roman," --output_dir ~/tesstutorial/engtrian

and now I get the error I mentioned.

On Monday, August 14, 2017 at 1:00:02 PM UTC+4:30, roberty...@gmail.com wrote:
> What problems do you encounter? Please give more information about the problems.
>
> I later used the new tutorial (https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-impact) to train data, and I didn't have any problems. Hope this helps.
Re: [tesseract-ocr] Re: Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110
I have the traineddata at this path: /home/zohreh/tesstutorial/engtrian/eng/eng.traineddata. I created it using:

training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
  --training_text training/langdata/eng/eng.training_text \
  --linedata_only \
  --noextract_font_properties --langdata_dir training/langdata \
  --tessdata_dir ./tessdata \
  --fontlist "Times New Roman," --output_dir ~/tesstutorial/engtrian

I also used the link you sent me. Sorry, shree, I tried a lot but I couldn't solve it.

On Monday, August 7, 2017 at 10:28:05 PM UTC+4:30, shree wrote:
> You also need to provide a traineddata file as input.
>
> Please review the updated training instructions in the wiki and change the training commands accordingly.
[tesseract-ocr] Re: training font
Sorry about my delay. I use Tesseract v4.0.

On Saturday, April 8, 2017 at 11:02:33 PM UTC+4:30, peiman F. wrote:
> Which version of tesseract are you using?
> Tesseract doesn't support Persian completely yet.
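A Persian model for Tesseract 4.0 (fas.traineddata) is available in the tessdata repository. The sketch below fetches it and runs OCR on one image. TESSDATA_DIR and the image name page.png are assumptions. The script prints the commands unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: download the Persian (fas) model and OCR a single image.
# TESSDATA_DIR and IMAGE are assumptions; adjust them locally.
set -euo pipefail

TESSDATA_DIR="${TESSDATA_DIR:-/usr/local/share/tessdata}"
URL="https://github.com/tesseract-ocr/tessdata/raw/master/fas.traineddata"
IMAGE="${IMAGE:-page.png}"   # hypothetical input image

run() {
  # Print the command by default; execute it only when DRY_RUN=0.
  if [[ "${DRY_RUN:-1}" == 1 ]]; then printf '%q ' "$@"; echo; else "$@"; fi
}

run wget -O "$TESSDATA_DIR/fas.traineddata" "$URL"
# Recognized text is written to out.txt.
run tesseract "$IMAGE" out -l fas --tessdata-dir "$TESSDATA_DIR"
```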
[tesseract-ocr] Re: train a new font for language of persian
Sorry about my delay. I need to train some words such as لا, because in the previous version words like this were detected incorrectly. Now I want to understand: in version 4.0, do we need font detection, or can we train any fonts together? And is there a batch file for Tesseract 4.0 that I can use? Thanks a lot.

On Friday, May 5, 2017 at 7:01:03 PM UTC+4:30, shree wrote:
> There is already Farsi/Persian traineddata for tesseract-ocr 4.0-alpha at
> https://github.com/tesseract-ocr/tessdata/raw/master/fas.traineddata
>
> Have you given it a try? Which font do you want to add to it?
>
> On Thursday, May 4, 2017 at 6:06:09 PM UTC+5:30, Ava Nimaee wrote:
>> Hi everyone. I want to start using Tesseract for the first time and need to learn where I should start. I want to train a new font for the Persian language, but I have been confused.
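tesstrain.sh is the script-based route for Tesseract 4.0 training. As far as I know, its `--fontlist` option accepts several fonts, one quoted argument each, so one LSTM training run can render the same text in multiple fonts instead of needing one model per font. The sketch below does this for Persian. It assumes a langdata checkout that contains fas/, and the font names are hypothetical placeholders; replace them with names exactly as `text2image --list_available_fonts` reports them. The script only prints the command unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: one tesstrain.sh run over several Persian fonts.
# Font names below are hypothetical placeholders.
set -euo pipefail

FONTS=("Persian Font A," "Persian Font B,")

cmd=(training/tesstrain.sh
  --fonts_dir /usr/share/fonts
  --lang fas
  --linedata_only
  --noextract_font_properties
  --langdata_dir langdata
  --tessdata_dir tessdata
  --fontlist "${FONTS[@]}"
  --output_dir "$HOME/tesstutorial/fastrain")

if [[ "${DRY_RUN:-1}" == 1 ]]; then
  printf '%q ' "${cmd[@]}"; echo   # print the command only
else
  "${cmd[@]}"
fi
```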
[tesseract-ocr] create boxfile and tiff
Hi, I used:

text2image --text=training_text.txt --outputbase=eng.Times_New_Roman,.exp0 --font='Times_New_Roman,' --fonts_dir=/usr/share/fonts

but it shows this:

FcInitiReinitialize failed!!
Could not find font named Arial. Pango suggested font 
Please correct --font arg.:Error:Assert failed:in file text2image.cpp, line 437
Segmentation fault (core dumped)

Sorry, I can't solve it. Can you help me?
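Arial appears to be text2image's default `--font`, so the message suggests the `--font` value was not picked up and Arial itself is not installed. The sketch below checks that the font is visible to text2image before rendering, assuming Times New Roman is installed under /usr/share/fonts. The `--font` value uses the display name with spaces. Underscores belong only in the output base, matching file names such as eng.Times_New_Roman.exp0.lstmf that tesstrain.sh produces. The font_slug helper is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check that text2image can see a font, then build the command.
set -euo pipefail

FONTS_DIR="${FONTS_DIR:-/usr/share/fonts}"
FONT="${FONT:-Times New Roman,}"

font_slug() {
  # Drop a trailing comma and replace spaces with underscores.
  local s="${1%,}"
  printf '%s\n' "${s// /_}"
}

OUTPUTBASE="eng.$(font_slug "$FONT").exp0"

if command -v text2image >/dev/null 2>&1; then
  # List the fonts text2image can render and look for ours.
  text2image --list_available_fonts --fonts_dir="$FONTS_DIR" \
    | grep -F -- "${FONT%,}" || echo "font not found: $FONT" >&2
fi

printf '%s\n' "text2image --text=training_text.txt --outputbase=$OUTPUTBASE --font='$FONT' --fonts_dir=$FONTS_DIR"
```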
[tesseract-ocr] Error:Assert failed:in file text2image.cpp, line 428
Hi, sorry, but I can't solve this error. When I used

text2image --text=training_text.txt –outputbase=eng.Times New Roman,.exp0 --font='Times New Roman,' --fonts_dir=/usr/share/fonts

it shows me this:

Output file missing!
!FLAGS_outputbase.empty():Error:Assert failed:in file text2image.cpp, line 428
Segmentation fault (core dumped)

Can you please help me?
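The assertion `!FLAGS_outputbase.empty()` means text2image ended up with no output base. In the command as posted, `–outputbase` begins with an en-dash (–), for example pasted from a word processor, instead of two ASCII hyphens, so it is not parsed as a flag. The unquoted value `eng.Times New Roman,.exp0` would also be split at the spaces. The sketch below uses a bash array so each argument stays whole. The output base name is an assumption chosen to match tesstrain.sh naming.

```shell
#!/usr/bin/env bash
# Sketch: text2image call with ASCII "--" flags and quoted values.
set -euo pipefail

args=(
  --text=training_text.txt
  --outputbase=eng.Times_New_Roman.exp0
  "--font=Times New Roman,"
  --fonts_dir=/usr/share/fonts
)

for a in "${args[@]}"; do
  # Reject arguments that start with an en-dash (U+2013) instead of "--".
  if [[ "$a" == –* ]]; then
    echo "flag starts with an en-dash: $a" >&2
    exit 1
  fi
done

if [[ "${DRY_RUN:-1}" == 1 ]]; then
  printf '%q ' text2image "${args[@]}"; echo   # print the command only
else
  text2image "${args[@]}"
fi
```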
Re: [tesseract-ocr] ERROR: Non-existent flag --traineddata
Thanks for your attention. I removed everything, installed the latest versions of Tesseract and Leptonica again, and used this command:

training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
  --training_text training/langdata/eng/eng.training_text \
  --linedata_only \
  --noextract_font_properties --langdata_dir training/langdata \
  --tessdata_dir ./tessdata \
  --fontlist "Times New Roman," --output_dir ~/tesstutorial/engtrian

but I got a new error. Everything was OK, but at the end I got this:

Setting unichar properties
Other case É of é is not in unicharset
Setting script properties
Failed to read data from: training/langdata/eng/eng.config
Null char=2
Invalid format in radical table at line 4: 3400 1.4
Creation of encoded unicharset failed!!
Error writing recoder!!
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg
Reducing Trie to SquishedDawg
Moving /tmp/tmp.GW5DOJr0rG/eng/eng.Times_New_Roman.exp0.lstmf to /home/zohreh/tesstutorial/engtrian
Completed training for language 'eng'

I don't have eng.config in my langdata. I cloned langdata from the tesseract GitHub repository.

On Saturday, August 5, 2017 at 5:50:59 PM UTC+4:30, shree wrote:
> tesseract -v
> tesseract 4.00.00dev-594-g044e06e-2085
>  leptonica-1.74.4
>   libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8
>
>  Found AVX
>  Found SSE
>
> The above version is working OK on Linux:
>
> nice lstmtraining \
>   --old_traineddata ../tessdata/best/san.traineddata \
>   --continue_from ../tessdata/best/san.lstm \
>   --traineddata ../tesstutorial/vedic/san/san.traineddata \
>   --train_listfile ../tesstutorial/vedic/san.training_files.txt \
>   --eval_listfile ../tesstutorial/vedic/san.eval_files.txt \
>   --model_output ../tesstutorial/vedic/santune \
>   --max_iterations 200 \
>   --debug_interval 0
>
> Loaded file ../tessdata/best/san.lstm, unpacking...
> Warning: LSTMTrainer deserialized an LSTMRecognizer!
> Code range changed from 145 to 2308!!
> Num (Extended) outputs,weights in Series:
>   1,36,0,1:1, 0
> Num (Extended) outputs,weights in Series:
>   C3,3:9, 0
>   Ft16:16, 160
> Total weights = 160
>   [C3,3Ft16]:16, 160
>   Mp3,3:16, 0
>   Lfys48:48, 12480
>   Lfx96:96, 55680
>   Lrx96:96, 74112
>   Lfx192:192, 221952
>   Fc2308:2308, 445444
> Total weights = 809828
> Previous null char=2 mapped to 2
> Continuing from ../tessdata/best/san.lstm
> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.AA_NAGARI_SHREE_L3.exp0.lstmf
> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.AA_NAGARI_SHREE_L3.exp-1.lstmf
> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.Adobe_Devanagari.exp-2.lstmf
> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.Adobe_Devanagari.exp1.lstmf
>
> ShreeDevi
>
> On Sat, Aug 5, 2017 at 6:43 PM, ShreeDevi Kumar wrote:
>> Did you build the training tools again?
[tesseract-ocr] Re: Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110
Hi, how did you solve it? I have this error too. Please help me.

On Friday, August 4, 2017 at 11:03:41 AM UTC+4:30, roberty...@gmail.com wrote:
> Hello,
>
> I used the 'git pull' command to update the code from https://github.com/tesseract-ocr/tesseract.git, and I recompiled and reinstalled Tesseract 4.0.
>
> But when I execute the command below to fine-tune the traineddata, an error appears:
> "mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110"
>
> lstmtraining --model_output ~/tesstutorial/chituned_from_chisim/chituned \
>   --continue_from ~/tesstutorial/chituned_from_chisim/chi_sim.lstm \
>   --train_listfile ~/tesstutorial/chitest/chi_sim.training_files.txt \
>   --eval_listfile ~/tesstutorial/chitest/chi_sim.training_files.txt \
>   --target_error_rate 0.01
>
> There was nothing wrong with Tesseract before I updated the code, but now it crashes with an assertion error. Why? Can you help me?
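In another message in this thread, shree points out that lstmtraining now needs a traineddata file as input. The sketch below is roberty's fine-tuning command with `--traineddata` added. It assumes chi_sim.lstm was extracted from the same chi_sim.traineddata passed here, so the two match, and that tessdata_best is checked out at $HOME/tessdata_best; both are assumptions. It prints the command unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch: fine-tune chi_sim with the --traineddata input added.
# TESSDATA_BEST is an assumed location of a tessdata_best checkout.
set -euo pipefail

T="$HOME/tesstutorial"
TESSDATA_BEST="${TESSDATA_BEST:-$HOME/tessdata_best}"

cmd=(lstmtraining
  --model_output "$T/chituned_from_chisim/chituned"
  --continue_from "$T/chituned_from_chisim/chi_sim.lstm"
  --traineddata "$TESSDATA_BEST/chi_sim.traineddata"
  --train_listfile "$T/chitest/chi_sim.training_files.txt"
  --eval_listfile "$T/chitest/chi_sim.training_files.txt"
  --target_error_rate 0.01)

if [[ "${DRY_RUN:-1}" == 1 ]]; then
  printf '%q ' "${cmd[@]}"; echo   # print the command only
else
  "${cmd[@]}"
fi
```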
Re: [tesseract-ocr] ERROR: Non-existent flag --traineddata
I'll do that, thank you.

On Monday, August 7, 2017 at 12:38:39 PM UTC+4:30, shree wrote:
> There have been changes since then.
>
> Either update your git repository via
>
> git pull origin
>
> or clone it again.
>
> ShreeDevi
> Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com
>
> On Mon, Aug 7, 2017 at 12:26 PM, Ava Nimaee <beigy@gmail.com> wrote:
>> About 3 weeks ago.
>>
>> On Sunday, August 6, 2017 at 7:59:44 AM UTC+4:30, shree wrote:
>>> > Invalid format in radical table at line 4: 3400 1.4
>>>
>>> When did you clone langdata?
>>>
>>> Ray updated radical-stroke.txt 11 days ago - see
>>> https://github.com/tesseract-ocr/langdata/commit/3e32be3dc07be0994f3687664a44cb3246b5aa11
>>>
>>> ShreeDevi
>>>
>>> On Sat, Aug 5, 2017 at 10:56 PM, Ava Nimaee <beigy@gmail.com> wrote:
>>>> Thanks for your attention.
>>>> I removed everything and installed the latest versions of tesseract and
>>>> leptonica again, then used this command:
>>>> training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
>>>>   --training_text training/langdata/eng/eng.training_text \
>>>>   --linedata_only \
>>>>   --noextract_font_properties --langdata_dir training/langdata \
>>>>   --tessdata_dir ./tessdata \
>>>>   --fontlist "Times New Roman," --output_dir ~/tesstutorial/engtrian
>>>>
>>>> But I got a new error. Everything went OK, but at the end I got this:
>>>>
>>>> Setting unichar properties
>>>> Other case É of é is not in unicharset
>>>> Setting script properties
>>>> Failed to read data from: training/langdata/eng/eng.config
>>>> Null char=2
>>>> Invalid format in radical table at line 4: 3400 1.4
>>>> Creation of encoded unicharset failed!!
>>>> Error writing recoder!!
>>>> Reducing Trie to SquishedDawg
>>>> Reducing Trie to SquishedDawg
>>>> Reducing Trie to SquishedDawg
>>>> Moving /tmp/tmp.GW5DOJr0rG/eng/eng.Times_New_Roman.exp0.lstmf to
>>>> /home/zohreh/tesstutorial/engtrian
>>>>
>>>> Completed training for language 'eng'
>>>>
>>>> And I don't have eng.config in my langdata. I cloned langdata from
>>>> tesseract's git.
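The advice in this exchange amounts to keeping the tesseract source and the langdata checkout at matching revisions. A minimal sketch, assuming both were cloned with git into directories you pass as arguments; the helper name update_checkouts and the directory names are illustrative, not part of Tesseract.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run "git pull origin" in each checkout given,
# stopping at the first one that is missing or fails to update.
update_checkouts() {
  local dir
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "error: no such directory: $dir" >&2
      return 1
    fi
    echo "updating $dir" >&2
    git -C "$dir" pull origin || return 1
  done
  return 0
}
```

For example, update_checkouts ~/tesseract ~/langdata, then rebuild and reinstall tesseract and its training tools before running tesstrain.sh again.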
Re: [tesseract-ocr] ERROR: Non-existent flag --traineddata
Yes, as you told me, I cloned the latest tesseract master and installed it and leptonica again, made the tiff and box files and the unicharset, and then used these commands:

training/tesstrain.sh \
  --fonts_dir /usr/share/fonts \
  --lang eng \
  --training_text langdata/eng/eng.training_text \
  --linedata_only \
  --noextract_font_properties --langdata_dir langdata \
  --tessdata_dir ./tessdata \
  --fontlist "Times New Roman," \
  --output_dir tesstutorial/engtrian

training/tesstrain.sh \
  --fonts_dir /usr/share/fonts \
  --lang eng \
  --training_text langdata/eng/eng.training_text \
  --linedata_only \
  --noextract_font_properties --langdata_dir langdata \
  --tessdata_dir ./tessdata \
  --output_dir tesstutorial/engeval

Finally, I ran the last command I mentioned, and it gave the error. For that last command, I put langdata/eng in the engtrian folder.

On Saturday, August 5, 2017 at 5:28:48 PM UTC+4:30, shree wrote:
> Are you using the latest source of programs from github for building
> tesseract?
>
> ShreeDevi
Re: [tesseract-ocr] ERROR: Non-existent flag --traineddata
Yes, but I just can't run these commands:

make ScrollView.jar
export SCROLLVIEW_PATH=$PWD/java

On Saturday, August 5, 2017 at 5:44:20 PM UTC+4:30, shree wrote:
> Did you build the training tools again?
>
> ShreeDevi
Re: [tesseract-ocr] ERROR: Non-existent flag --traineddata
Thanks a lot, I'll try again.

On Saturday, August 5, 2017 at 5:50:59 PM UTC+4:30, shree wrote:
> tesseract -v
> tesseract 4.00.00dev-594-g044e06e-2085
> leptonica-1.74.4
> libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8
>
> Found AVX
> Found SSE
>
> The above version is working OK on Linux:
>
> nice lstmtraining \
>   --old_traineddata ../tessdata/best/san.traineddata \
>   --continue_from ../tessdata/best/san.lstm \
>   --traineddata ../tesstutorial/vedic/san/san.traineddata \
>   --train_listfile ../tesstutorial/vedic/san.training_files.txt \
>   --eval_listfile ../tesstutorial/vedic/san.eval_files.txt \
>   --model_output ../tesstutorial/vedic/santune \
>   --max_iterations 200 \
>   --debug_interval 0
>
> Loaded file ../tessdata/best/san.lstm, unpacking...
> Warning: LSTMTrainer deserialized an LSTMRecognizer!
> Code range changed from 145 to 2308!!
> Num (Extended) outputs,weights in Series:
> 1,36,0,1:1, 0
> Num (Extended) outputs,weights in Series:
> C3,3:9, 0
> Ft16:16, 160
> Total weights = 160
> [C3,3Ft16]:16, 160
> Mp3,3:16, 0
> Lfys48:48, 12480
> Lfx96:96, 55680
> Lrx96:96, 74112
> Lfx192:192, 221952
> Fc2308:2308, 445444
> Total weights = 809828
> Previous null char=2 mapped to 2
> Continuing from ../tessdata/best/san.lstm
> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.AA_NAGARI_SHREE_L3.exp0.lstmf
> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.AA_NAGARI_SHREE_L3.exp-1.lstmf
> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.Adobe_Devanagari.exp-2.lstmf
> Loaded 138/138 pages (1-138) of document ../tesstutorial/vedic/san.Adobe_Devanagari.exp1.lstmf
>
> ShreeDevi
Re: [tesseract-ocr] ERROR: Non-existent flag --traineddata
I'm using Linux, Ubuntu 16.04.

On Saturday, August 5, 2017 at 5:57:01 PM UTC+4:30, shree wrote:
> Are you using Linux or Windows?
>
> ShreeDevi
[tesseract-ocr] ERROR: Non-existent flag --traineddata
Hi,
I used this command:

training/lstmtraining --debug_interval 100 \
  --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
  --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
  --model_output ~/tesstutorial/engoutput/base --learning_rate 20e-4 \
  --train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
  --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
  --max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log

and put eng.traineddata in the right path, but I get an error:

ERROR: Non-existent flag --traineddata

Can you help me?
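The replies in this archive suggest the installed lstmtraining predated the --traineddata flag, and shree's advice was to pull the latest code and rebuild the training tools. A hedged sketch for confirming which kind of binary you are running: the helper has_flag is hypothetical (not a Tesseract tool); it runs a program with --help and looks for a flag name in the output, assuming the program lists its flags there.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Tesseract: succeed if PROG mentions
# FLAG in its --help output. Assumption: PROG prints its flags when run
# with --help; the exit status of "--help" itself is ignored.
has_flag() {
  local prog="$1" name="${2#--}" help
  case "$prog" in
    */*) [ -x "$prog" ] || { echo "error: $prog is not executable" >&2; return 2; } ;;
    *) command -v "$prog" >/dev/null 2>&1 || { echo "error: $prog not in PATH" >&2; return 2; } ;;
  esac
  help=$("$prog" --help 2>&1) || true
  # Match the name as a whole word, so "traineddata" does not match
  # inside "old_traineddata".
  printf '%s\n' "$help" | grep -Eq -- "(^|[^_[:alnum:]])${name}([^_[:alnum:]]|$)"
}
```

For example, has_flag lstmtraining --traineddata || echo "rebuild the training tools". If the check still fails after rebuilding, which -a lstmtraining may reveal an older copy earlier on the PATH.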
Re: [tesseract-ocr] Failed to load list of training filenames from
We tried it, but for some words and fonts the results are not very good, so we decided to train it.

On Friday, August 4, 2017 at 7:30:04 PM UTC+4:30, shree wrote:
> Please try the OCR with the new tessdata/best/far.traineddata (Farsi,
> Persian) and provide your feedback for Ray to improve the training.
>
> ShreeDevi
[tesseract-ocr] Failed to load list of training filenames from
Hi, sorry, I have an error. Can you help me?
I used this command:

lstmtraining -U ../tesstutorial/englayer_from_eng/eng.unicharset \
  --script_dir langdata --debug_interval 0 \
  --continue_from ../tesstutorial/englayer_from_eng/eng.lstm \
  --append_index 5 --net_spec '[Lfx256 O1c105]' \
  --model_output ../tesstutorial/englayer_from_eng/englayer \
  --train_listfile ../tesstutorial/engtrain/eng.training_files.txt \
  --eval_listfile ../tesstutorial/engeval/eng.training_files.txt \
  --max_iterations 5

but I get an error:

Failed to load list of training filenames from ../tesstutorial/engtrain/eng.training_files.txt
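This message means lstmtraining could not read a usable list file at that path. The ../ prefix is resolved against the current directory, and the tesstrain.sh commands quoted elsewhere in this archive write to an output directory spelled engtrian, while this command reads engtrain; both are worth checking. A hedged pre-flight sketch: check_listfile is a hypothetical helper that assumes one .lstmf path per line (as tesstrain.sh writes) and checks the list file and every path in it.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Tesseract: verify an lstmtraining list
# file before training. Assumption: one .lstmf path per line; relative
# paths are resolved against the current directory.
check_listfile() {
  local list="$1" line missing=0 count=0
  if [ ! -r "$list" ]; then
    echo "error: cannot read list file: $list (current dir: $PWD)" >&2
    return 1
  fi
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    count=$((count + 1))
    if [ ! -r "$line" ]; then
      echo "missing: $line" >&2
      missing=$((missing + 1))
    fi
  done < "$list"
  if [ "$count" -eq 0 ]; then
    echo "error: list file is empty: $list" >&2
    return 1
  fi
  [ "$missing" -eq 0 ]
}
```

For example, check_listfile ../tesstutorial/engtrain/eng.training_files.txt from the same directory you run lstmtraining in.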
Re: [tesseract-ocr] Failed to load list of training filenames from
Thanks a lot. I'm very sorry; I have just started training Tesseract 4.0 for Persian and I don't have any experience with it. I've tried a lot, but I keep running into errors. Many thanks for your assistance with our project.

On Friday, August 4, 2017 at 4:12:34 PM UTC+4:30, shree wrote:
> Please check the tesseract training wiki for new instructions.
>
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
>
> Use the latest code from github.
>
> ShreeDevi
[tesseract-ocr] train a new font for language of persian
Hi everyone. I want to start using Tesseract for the first time and need to learn where to begin. I want to train a new font for the Persian language, but I am confused.
Re: [tesseract-ocr] Re: Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110
Hi shree,
Thanks a lot for your attention. I corrected all the commands, and now I can generate checkpoints such as base70.229_1900.checkpoint; I only have files like that. But the tutorials use eng.lstm. How can I create it? What actually is eng.lstm? And what is lstm-punc-dawg? Is it similar to the eng.punc file that Mr. Smith put in langdata/eng?

On Wednesday, August 16, 2017 at 8:07:47 PM UTC+4:30, shree wrote:
> Please check the updated tutorials in the wiki. There have been many
> changes.
>
> On 16-Aug-2017 3:50 PM, "Ava Nimaee" <beigy@gmail.com> wrote:
>> Sorry, I have a question: what is the output of this command? After running
>> it I have a lot of files like base44.409_2195.checkpoint, but in the
>> tutorials I saw eng.lstm and I don't have it. Which command creates
>> eng.lstm?
>>
>> I must thank you for your support.
>>
>> On Wednesday, August 16, 2017 at 5:50:18 AM UTC+4:30, roberty...@gmail.com wrote:
>>> Hi, I haven't encountered this error.
>>>
>>> But you may check whether your traineddata is in the correct directory,
>>> as well as some other paths.
>>>
>>> On Tuesday, August 15, 2017 at 5:45:17 PM UTC+8, Ava Nimaee wrote:
>>>> Hi, thanks for your help.
>>>> I used your link, but I got this error:
>>>> mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file
>>>> ../lstm/lstmtrainer.h, line 110
>>>> Segmentation fault (core dumped)
>>>> I want to start training the Persian language, so I'm trying English
>>>> first. I created the box file and unicharset, then eng.charset_size=110.txt,
>>>> eng.Times_New_Roman.exp0.lstmf, eng.traineddata, eng.training_files.txt
>>>> and eng.unicharset. All of those were created with this command:
>>>> training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng \
>>>>   --training_text training/langdata/eng/eng.training_text \
>>>>   --linedata_only \
>>>>   --noextract_font_properties --langdata_dir training/langdata \
>>>>   --tessdata_dir ./tessdata \
>>>>   --fontlist "Times New Roman," --output_dir ~/tesstutorial/engtrian
>>>> and now I have the error I told you about.
>>>>
>>>> On Monday, August 14, 2017 at 1:00:02 PM UTC+4:30, roberty...@gmail.com wrote:
>>>>> What problems do you encounter? Please give more information about
>>>>> the problems.
>>>>>
>>>>> I later used the new tutorial
>>>>> (https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-impact)
>>>>> to train data, and I didn't have any problems. Hope this helps.
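As shree explains later in this archive, the .lstm file is stored inside a traineddata file, so one way to obtain eng.lstm is to extract it from an existing eng.traineddata (for example the tessdata/best model that fine-tuning starts from). A minimal sketch, assuming combine_tessdata is installed; its -e option extracts the component named by the output file's suffix. The wrapper name extract_lstm is hypothetical and the paths are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around "combine_tessdata -e", which extracts the
# component named by the output file's suffix (here ".lstm") from a
# .traineddata file.
extract_lstm() {
  local traineddata="$1" out="$2"
  if [ ! -r "$traineddata" ]; then
    echo "error: cannot read $traineddata" >&2
    return 1
  fi
  case "$out" in
    *.lstm) ;;
    *) echo "error: output name must end in .lstm: $out" >&2; return 1 ;;
  esac
  if ! command -v combine_tessdata >/dev/null 2>&1; then
    echo "error: combine_tessdata not found in PATH" >&2
    return 1
  fi
  combine_tessdata -e "$traineddata" "$out"
}
```

For example, extract_lstm tessdata/best/eng.traineddata ~/tesstutorial/eng.lstm. The base70.229_1900.checkpoint files are intermediate training snapshots, not finished models; the 4.00 training wiki describes lstmtraining --stop_training for converting a checkpoint into a traineddata file.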
Re: [tesseract-ocr] Error:Assert failed:in file ../lstm/lstmtrainer.h, line 110
Hi shree,
I read the instructions on the training wiki page, but I don't have eng.lstm; none of the commands create it. How can I create it? I even checked the langdata I downloaded from git, and it isn't there. I have spent a lot of time on this but I don't know how to create it. Can you tell me?

On Monday, August 21, 2017 at 7:41:41 PM UTC+4:30, shree wrote:
> The lstm file is the language model. It is saved in the traineddata file.
>
> Dawgs are a kind of compressed file, created from lists of words,
> punctuation or numbers.
>
> You can use dawg2wordlist to unpack them.
>
> Please follow the instructions on the training wiki page.
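Following shree's pointer to dawg2wordlist, a hedged sketch of turning the LSTM dawgs back into readable lists: combine_tessdata -u writes every component of a traineddata file under a given prefix (for example eng.lstm-punc-dawg and eng.lstm-unicharset), and dawg2wordlist takes a unicharset, a dawg file and an output word list. Assumption: the lstm-*-dawg components decode with the lstm-unicharset from the same file. The helper name unpack_lstm_dawgs and the output directory are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Tesseract: unpack a traineddata file
# and convert its LSTM dawgs back into plain word lists.
# Assumption: the lstm-*-dawg components decode with the lstm-unicharset
# component of the same traineddata file.
unpack_lstm_dawgs() {
  local traineddata="$1" outdir="$2" lang kind dawg tool
  if [ ! -r "$traineddata" ]; then
    echo "error: cannot read $traineddata" >&2
    return 1
  fi
  for tool in combine_tessdata dawg2wordlist; do
    command -v "$tool" >/dev/null 2>&1 || { echo "error: $tool not in PATH" >&2; return 1; }
  done
  lang=$(basename "$traineddata" .traineddata)
  mkdir -p "$outdir" || return 1
  # -u writes each component as <prefix><component>, e.g. eng.lstm-punc-dawg
  combine_tessdata -u "$traineddata" "$outdir/$lang." || return 1
  for kind in word punc number; do
    dawg="$outdir/$lang.lstm-$kind-dawg"
    [ -f "$dawg" ] || continue  # not every traineddata has all three
    dawg2wordlist "$outdir/$lang.lstm-unicharset" "$dawg" \
      "$outdir/$lang.lstm-$kind.txt" || return 1
  done
}
```

For example, unpack_lstm_dawgs tessdata/best/eng.traineddata ~/tesstutorial/eng_unpacked, then compare eng.lstm-punc.txt with langdata/eng/eng.punc.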
[tesseract-ocr] create unicharset for RTL language
Hi, I need your help. I need to create a box file and unicharset for the Persian language. I used the same commands I used for Latin, but the results are reversed. Could you please tell me how to do this? Thanks.
[tesseract-ocr] Create boxfile and unicharset for RTL language
Hi, I need your help. I need to create a box file and unicharset for the Persian language. I used the same commands that I used for Latin, but the results are reversed. Could you please tell me how to do this? Thanks
[tesseract-ocr] unicharset and boxfile for tesseract 4
Hi, I want to know about the unicharset and box files in Tesseract 4 for RTL scripts. I trained, but the result is not good. Can anyone give me a link about it, and also about xheight?
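Shree's reply in the "Create boxfile and unicharset for RTL language" thread recommends tesstrain.sh over running the box and unicharset tools by hand, because it applies the RTL settings for Persian. Below is a minimal Python sketch that assembles such a command, modelled on the example in the Tesseract 4.00 training wiki. The directories are hypothetical.

As far as I know, tesstrain.sh reads per-language settings from language_specific.sh, where `fas` is marked as right-to-left, and then passes `--lang_is_rtl` to `combine_lang_model`. Treat that as an assumption and check the scripts in your own checkout.

```python
from __future__ import annotations

import shlex


def tesstrain_cmd(lang: str, langdata_dir: str, tessdata_dir: str, output_dir: str,
                  fonts_dir: str = "/usr/share/fonts",
                  fontlist: list[str] | None = None) -> list[str]:
    """Build a tesstrain.sh invocation in the style of the Tesseract 4.00
    training wiki (LSTM line data only)."""
    argv = [
        "src/training/tesstrain.sh",
        "--fonts_dir", fonts_dir,
        "--lang", lang,
        "--linedata_only",
        "--noextract_font_properties",
        "--langdata_dir", langdata_dir,
        "--tessdata_dir", tessdata_dir,
        "--output_dir", output_dir,
    ]
    if fontlist:
        # --fontlist takes one argument per font name, up to the next --flag.
        argv += ["--fontlist", *fontlist]
    return argv


if __name__ == "__main__":
    # "fas" is Tesseract's language code for Persian; the directories are placeholders.
    print(shlex.join(tesstrain_cmd("fas", "../langdata", "./tessdata", "/tmp/fas_train")))
```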
Re: [tesseract-ocr] Create boxfile and unicharset for RTL language
As I understand it, the only difference between an RTL language and an LTR one is in the unicharset. I created the unicharset with its tool, but how can I create the xheight data for Persian? Here is my unicharset after converting it to RTL:
36
NULL 0 NULL 0
Joined 7 0,69,188,255,486,1218,0,30,486,1188 Latin 1 0 1 Joined # Joined [4a 6f 69 6e 65 64 ]a
|Broken|0|1 f 0,69,186,255,892,2138,0,80,892,2058 Common 2 10 2 |Broken|0|1 # Broken
س 1 0,255,0,255,0,0,0,0,0,0 Arabic 3 13 3 س # س [633 200d ]x
ل 1 0,255,0,255,0,0,0,0,0,0 Inherited 4 18 4 ل # ل [200d 644 200d ]x
ا 1 0,255,0,255,0,0,0,0,0,0 Inherited 5 18 5 ا # ا [200d 627 ]x
م 1 0,64,134,241,51,272,0,46,56,313 Arabic 6 13 6 م # م [645 ]x
ع 1 0,255,0,255,0,0,0,0,0,0 Arabic 7 13 7 ع # ع [639 200d ]x
ی 1 0,255,0,255,0,0,0,0,0,0 Inherited 8 18 8 ی # ی [200d 6cc 200d ]x
ک 1 0,255,0,255,0,0,0,0,0,0 Inherited 9 18 9 ک # ک [200d 6a9 200d ]x
م 1 0,255,0,255,0,0,0,0,0,0 Inherited 10 18 10 م # م [200d 645 ]x
م 1 0,255,0,255,0,0,0,0,0,0 Arabic 11 13 11 م # م [645 200d ]x
ه 1 0,255,0,255,0,0,0,0,0,0 Inherited 12 18 12 ه # ه [200d 647 200d ]x
ذ 1 0,255,0,255,0,0,0,0,0,0 Inherited 13 18 13 ذ # ذ [200d 630 ]x
ا 1 26,117,200,255,11,181,7,82,33,222 Arabic 14 13 14 ا # ا [627 ]x
ک 1 0,255,0,255,0,0,0,0,0,0 Arabic 15 13 15 ک # ک [6a9 200d ]x
ج 1 0,255,0,255,0,0,0,0,0,0 Inherited 16 18 16 ج # ج [200d 62c 200d ]x
ی 1 0,255,0,255,0,0,0,0,0,0 Arabic 17 13 17 ی # ی [6cc 200d ]x
ی 1 0,255,0,255,0,0,0,0,0,0 Inherited 18 18 18 ی # ی [200d 6cc ]x
ش 1 0,255,0,255,0,0,0,0,0,0 Arabic 19 13 19 ش # ش [634 200d ]x
م 1 0,255,0,255,0,0,0,0,0,0 Inherited 20 18 20 م # م [200d 645 200d ]x
ل 1 0,255,0,255,0,0,0,0,0,0 Arabic 21 13 21 ل # ل [644 200d ]x
ن 1 0,255,0,255,0,0,0,0,0,0 Inherited 22 18 22 ن # ن [200d 646 ]x
ب 1 0,255,0,255,0,0,0,0,0,0 Inherited 23 18 23 ب # ب [200d 628 200d ]x
ز 1 0,255,0,255,0,0,0,0,0,0 Inherited 24 18 24 ز # ز [200d 632 ]x
ت 1 0,255,0,255,0,0,0,0,0,0 Inherited 25 18 25 ت # ت [200d 62a ]x
. 10 12,108,64,140,18,52,9,77,52,193 Common 26 6 26 . # . [2e ]p
و 1 0,68,137,238,65,290,0,27,62,256 Arabic 27 13 27 و # و [648 ]x
ن 1 0,255,0,255,0,0,0,0,0,0 Arabic 28 13 28 ن # ن [646 200d ]x
س 1 0,255,0,255,0,0,0,0,0,0 Inherited 29 18 29 س # س [200d 633 200d ]x
ن 1 0,88,163,255,68,321,0,52,76,354 Arabic 30 13 30 ن # ن [646 ]x
ب 1 0,255,0,255,0,0,0,0,0,0 Arabic 31 13 31 ب # ب [628 200d ]x
و 1 0,255,0,255,0,0,0,0,0,0 Inherited 32 18 32 و # و [200d 648 ]x
پ 1 0,255,0,255,0,0,0,0,0,0 Arabic 33 13 33 پ # پ [67e 200d ]x
ر 1 0,255,0,255,0,0,0,0,0,0 Inherited 34 18 34 ر # ر [200d 631 ]x
ی 1 0,71,148,225,95,253,0,45,103,279 Arabic 35 13 35 ی # ی [6cc ]x
But there is no unicharset for "Inherited" in langdata, and without it the training is not very good. For example, I fine-tuned for "لا", which is part of "Inherited". Can you please tell me how I can create the xheight data for Persian fonts, what "Inherited" means, and what the appropriate RTL flags for the Persian language are? Thanks On Thursday, August 31, 2017 at 5:32:59 PM UTC+4:30, shree wrote: > Use tesstrain.sh for training. > It should apply the appropriate RTL flags for the Persian language. > ShreeDevi > On Thu, Aug 31, 2017 at 2:39 PM, Ava Nimaee <beigy@gmail.com> wrote: >> Hi, I need your help. I need to create a box file and unicharset for the Persian language. I used the same commands that I used for Latin, but the results are reversed. Could you please tell me how to do this? >> Thanks
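Reading the pasted unicharset field by field suggests where "Inherited" comes from. According to the code points in each entry's comment, every entry tagged Inherited (direction 18) starts with U+200D ZERO WIDTH JOINER. U+200D has script Inherited and bidi class BN (boundary neutral). The entries that start with an Arabic letter are tagged Arabic with direction 13 (AL, right-to-left Arabic). So the Inherited entries appear to be the medial and final letter forms, marked with a leading joiner, rather than a separate script.

Below is a minimal Python sketch of a parser. It assumes the full unicharset line format `character properties glyph_metrics script other_case direction mirror normed_form`, followed by an optional `#` comment. The direction table is a partial copy of ICU's `UCharDirection` numbering, which I am assuming Tesseract reuses.

```python
from __future__ import annotations

import unicodedata
from dataclasses import dataclass

# Partial table of bidi direction codes, numbered as in ICU's UCharDirection
# (assumed to match Tesseract's UNICHARSET::Direction values).
DIRECTION_NAMES = {0: "L", 1: "R", 6: "CS", 10: "ON", 13: "AL", 17: "NSM", 18: "BN"}


@dataclass
class UnicharEntry:
    text: str            # the unichar itself (may include U+200D joiners)
    properties: int      # hex bitmask: 1 alpha, 2 lower, 4 upper, 8 digit, 0x10 punct
    metrics: list[int]   # the ten comma-separated glyph metrics
    script: str
    other_case: int
    direction: int
    mirror: int
    normed: str


def parse_unicharset_line(line: str) -> UnicharEntry:
    """Parse one full-format unicharset line. Anything after the first eight
    fields (the '# ...' comment) is ignored; the short NULL line is rejected."""
    fields = line.split()
    if len(fields) < 8:
        raise ValueError(f"not a full-format unicharset line: {line!r}")
    text, props, metrics, script, other_case, direction, mirror, normed = fields[:8]
    return UnicharEntry(
        text=text,
        properties=int(props, 16),
        metrics=[int(v) for v in metrics.split(",")],
        script=script,
        other_case=int(other_case),
        direction=int(direction),
        mirror=int(mirror),
        normed=normed,
    )


def describe(entry: UnicharEntry) -> str:
    """Show the code points and the bidi class of the first code point."""
    cps = " ".join(f"{ord(c):x}" for c in entry.text)
    first = unicodedata.bidirectional(entry.text[0])
    name = DIRECTION_NAMES.get(entry.direction, "?")
    return f"[{cps}] script={entry.script} direction={entry.direction}({name}) first-bidi={first}"


if __name__ == "__main__":
    # Entry 4 from the unicharset above, with its U+200D joiners written out.
    medial_lam = ("\u200d\u0644\u200d 1 0,255,0,255,0,0,0,0,0,0 Inherited 4 18 4 "
                  "\u200d\u0644\u200d # \u200d\u0644\u200d [200d 644 200d ]x")
    print(describe(parse_unicharset_line(medial_lam)))
```

For the medial lam entry, the printed line shows direction 18 (BN) and a first code point whose own bidi class is also BN. That is consistent with the script and direction being taken from the leading joiner.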
[tesseract-ocr] Size of font
Hi, I have been training Tesseract for the Persian language from scratch, and now I want to know how I can set the font size. Is the size actually important? Thanks.
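One place where font size enters Tesseract 4 training is when training images are rendered from text. Below is a minimal Python sketch that assumes you render with `text2image`, which takes the size in points via `--ptsize` and the resolution via `--resolution`. The 12 pt / 300 DPI defaults are what I believe text2image uses. The training-text file and font names are hypothetical. The helper converts a point size to pixels (1 pt = 1/72 inch), which helps when comparing training renders with the scans you want to recognize.

```python
from __future__ import annotations

import shlex


def text2image_cmd(text_file: str, outputbase: str, font: str, fonts_dir: str,
                   ptsize: int = 12, resolution: int = 300) -> list[str]:
    """Build a text2image call; --ptsize is the font size in points and
    --resolution the rendering DPI (12 and 300 assumed to be the defaults)."""
    return [
        "text2image",
        f"--text={text_file}",
        f"--outputbase={outputbase}",
        f"--font={font}",
        f"--fonts_dir={fonts_dir}",
        f"--ptsize={ptsize}",
        f"--resolution={resolution}",
    ]


def em_height_px(ptsize: float, dpi: float) -> float:
    """Height of one em in pixels: 1 pt = 1/72 inch, so ptsize * dpi / 72."""
    return ptsize * dpi / 72


if __name__ == "__main__":
    # Hypothetical names: replace with your own training text and Persian font.
    for size in (10, 12, 14):
        argv = text2image_cmd("fas.training_text", f"fas.font.exp{size}",
                              "Your Persian Font", "/usr/share/fonts", ptsize=size)
        print(shlex.join(argv), f"# em ~{em_height_px(size, 300):.1f} px")
```

At 300 DPI, a 12 pt font gives an em of 50 px.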