[tesseract-ocr] 2 Layer CNN
Hi,

I used a 2-layer CNN like this:

[1,48,0,1Ct3,3,8Mp3,3Ct3,3,16Mp2,2Lbys64Lbx512O1c1]

But the error rate is still high, even after 30 iterations. Can we use a 2-layer CNN in Tesseract?

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com. To post to this group, send email to tesseract-ocr@googlegroups.com. Visit this group at https://groups.google.com/group/tesseract-ocr. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/784278e4-d82b-4c4f-ae1d-35141e225c79%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
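The sketch below shows what the two pooling layers in this spec do to a line image. It divides the height and width by each Mp{y},{x} window in turn. It assumes, without checking Tesseract's code, that the Ct3,3 layers keep the size unchanged and that pooling uses integer division. The 600-pixel width is an arbitrary example. Under these assumptions the spec reduces the 48-pixel height to 8, and each output timestep covers 6 pixel columns instead of 3 for a single Mp3,3.

```bash
#!/usr/bin/env bash
# Sketch: trace height and width of a text-line image through the max-pool
# layers of a VGSL spec. Assumption: Ct layers keep the size unchanged and
# each Mp{y},{x} divides height by y and width by x (integer division).
trace_pools() {             # $1 = height, $2 = width, rest = pools like "3,3"
  local height=$1 width=$2; shift 2
  local pool py px
  for pool in "$@"; do
    py=${pool%,*}           # part before the comma: y factor
    px=${pool#*,}           # part after the comma: x factor
    height=$(( height / py ))
    width=$(( width / px ))
    echo "after Mp${pool}: height=${height} width=${width}"
  done
}

# [1,48,0,1Ct3,3,8Mp3,3Ct3,3,16Mp2,2Lbys64Lbx512O1c1] on a 48x600 line image
trace_pools 48 600 3,3 2,2
```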
[tesseract-ocr] What is the information in basetrain.log
Hi,

Does anyone know what the information in the log file created during training means?

Warning: given outputs 1 not equal to unicharset of 165.
Num outputs,weights in Series:
1,48,0,1:1, 0
Num outputs,weights in Series:
C3,3:9, 0
Ft16:16, 160
Total weights = 160
[C3,3Ft16]:16, 160
Mp3,3:16, 0
Lbys64:128, 41472
Lbx128:256, 263168
Lby256:512, 1050624
Lbx512:1024, 4198400
Fc165:165, 169125
Total weights = 5722949
Built network:[1,48,0,1[C3,3Ft16]Mp3,3Lbys64Lbx128Lby256Lbx512Fc165] from request [1,48,0,1Ct3,3,16Mp3,3Lbys64Lbx128Lby256Lbx512O1c1]

Especially the part from the first "Num outputs,weights in Series" line down to "Mp3,3:16, 0".

Thanks for your help.
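You can reproduce the numbers in this log by hand. Each entry reads layer:outputs, weights:

- The input layer 1,48,0,1 has depth 1 and no weights.
- C3,3 stacks the 3x3 neighbourhood of that single channel into 9 values and has no weights.
- Ft16 maps those 9 values plus a bias to 16 outputs, which gives (9+1)*16 = 160 weights.
- Mp3,3 has no weights.
- The warning and the "Built network" line show that the requested O1c1 output was built as Fc165, matching the 165-entry unicharset.

The sketch uses two formulas that were inferred from this log, not read from Tesseract's source:

- (inputs+1)*outputs for a fully connected layer.
- 4*(inputs+cells+1)*cells per direction for a 1-D LSTM. The Lb... layers are bidirectional, so their output count is twice the cell count.

With these formulas, every layer and the total of 5722949 come out as in the log.

```bash
#!/usr/bin/env bash
# Sketch: reproduce the "Num outputs,weights" numbers from basetrain.log.
# Formulas inferred from the log, not taken from Tesseract's source:
#   fully connected (Ft/Fc): (inputs + 1) * outputs        (+1 = bias)
#   1-D LSTM: 4 gates * (inputs + cells + 1) * cells * directions

fc_weights() {            # $1 = inputs, $2 = outputs
  echo $(( ($1 + 1) * $2 ))
}

lstm_weights() {          # $1 = inputs, $2 = cells, $3 = directions (1 or 2)
  echo $(( 4 * ($1 + $2 + 1) * $2 * $3 ))
}

print_weights() {
  local total=0 name w
  # Inputs of each LSTM = outputs of the previous layer (bidirectional = 2 * cells).
  while read -r name w; do
    printf '%-12s %8d\n' "$name" "$w"
    total=$(( total + w ))
  done <<EOF
[C3,3Ft16] $(fc_weights 9 16)
Mp3,3 0
Lbys64 $(lstm_weights 16 64 2)
Lbx128 $(lstm_weights 128 128 2)
Lby256 $(lstm_weights 256 256 2)
Lbx512 $(lstm_weights 512 512 2)
Fc165 $(fc_weights 1024 165)
EOF
  echo "Total weights = $total"
}

print_weights
```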
[tesseract-ocr] CNN and Tensorflow in Tesseract
I have some questions:
1- How many layers does the CNN have in TensorFlow?
2- What is the stride in the convolution layers and the pooling layers?
3- Does the convolution use zero padding?
I'm training the Persian language. My accuracy is already very good, but I need to increase it further. The convolution is very important in my training. Does anyone know the answers?
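For questions 2 and 3, the usual output-size relation for a sliding window is out = (in + 2*pad - window) / stride + 1, using integer division. The sketch below evaluates it for the first two layers of [1,48,0,1Ct3,3,16Mp3,3...], the spec used in other threads here. It rests on two assumptions that were not checked against Tesseract's source:

- Ct3,3 moves with stride 1 and pads one pixel on each side, so its output is the same size as its input.
- Mp3,3 uses non-overlapping 3x3 windows, that is, stride 3 and no padding.

The sketch cannot tell you what value the padding is filled with.

```bash
#!/usr/bin/env bash
# Sketch: standard output-size formula for a sliding window,
#   out = (in + 2*pad - window) / stride + 1   (integer division)
# Assumptions (not read from Tesseract's source): Ct3,3 has stride 1 and
# 1 pixel of padding per side; Mp3,3 has stride 3 and no padding.
out_size() {   # $1 = input size, $2 = window, $3 = stride, $4 = padding
  echo $(( ($1 + 2 * $4 - $2) / $3 + 1 ))
}

h=48
h=$(out_size "$h" 3 1 1); echo "after Ct3,3: height $h"
h=$(out_size "$h" 3 3 0); echo "after Mp3,3: height $h"
```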
[tesseract-ocr] the length of input to lstm
I'm training Tesseract from scratch for the Persian language. I need to know the output of the convolution layer, because it is the input of the LSTM.

The wiki gives Ct5,5,32 as an example. In this case 32 is the depth, but I couldn't understand what the number of outputs is. Elsewhere it says that 32 is the number of filters. Can anyone explain this to me?

In summary, when the network is:

[1,48,0,1[C3,3Ft16]Mp3,3Lfys64Lfx96Lrx96Lfx192Fc165]

what are the numbers of inputs?
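In VGSL the last number in Ct5,5,32 is the number of outputs at each position, which is the output depth. For a convolution that is the same as the number of filters, so both descriptions say the same thing. The basetrain.log thread above shows this concretely: Ct3,3,16 is built as [C3,3Ft16], a 3x3 neighbourhood stack followed by a 16-output tanh layer.

The sketch below chains the output depths of the spec in this post, so each layer's input depth is the previous layer's output depth. After Mp3,3 the 48-pixel height becomes 16 rows, and Lfys64 summarizes each column into 64 values. Assuming this reading of the spec, the x-direction LSTMs therefore see one 64-value vector per timestep, with about width/3 timesteps per line. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print the per-position input and output depth of each layer.
# Output depths are copied by hand from the spec; the input depth of each
# layer is simply the output depth of the layer before it.
trace_depths() {
  local depth=$1; shift        # $1 = input depth (1 for a greyscale image)
  local entry name out
  for entry in "$@"; do         # each entry is "layer:output_depth"
    name=${entry%%:*}
    out=${entry##*:}
    echo "${name}: ${depth} in -> ${out} out"
    depth=$out
  done
}

# [1,48,0,1[C3,3Ft16]Mp3,3Lfys64Lfx96Lrx96Lfx192Fc165]
trace_depths 1 "C3,3Ft16:16" "Mp3,3:16" "Lfys64:64" "Lfx96:96" \
  "Lrx96:96" "Lfx192:192" "Fc165:165"
```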
[tesseract-ocr] lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.
Hi,

I have been running about 130 GB of data in 4000 files. My command is:

/home/kddlab/Desktop/tesseract-master/src/training/lstmtraining \
  --traineddata /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/fas/fas.traineddata \
  --net_spec '[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx192O1c165]' \
  --model_output /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/Out_Checkpoint/base \
  --learning_rate 0.001 \
  --train_listfile /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/fas.training_files.txt \
  --eval_listfile /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/v/fas.training_files.txt \
  --max_iterations 15 &>/home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/Out_Checkpoint/basetrain.log

But after reading some files, Tesseract gives this error and stops training:

Loaded 821/10179 pages (1-821) of document /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/AllLstm/fas.B_Lotus.exp2778.lstmf
lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.

Could you please help me?
[tesseract-ocr] Train Persian fonts
Hi,

I was able to train Persian fonts, and the result is very good. But there is a problem when the text includes both English and Persian words. In the training phase, Tesseract removes all English words when rendering with Persian fonts. How can I generate tif and box files for Persian fonts when the text includes English words?

Thanks.
[tesseract-ocr] Re: error: Must provide a --traineddata see training wiki
On Sunday, September 30, 2018 at 12:20:19 PM UTC+3:30, Zohreh Khosrobeygi wrote:
> [...]

I pasted the wrong command. The correct one is:

combine_tessdata -e training/langdata/Test/best/fas.traineddata \
  training/langdata/Test/ext-best/fas.
[tesseract-ocr] error: Must provide a --traineddata see training wiki
I am trying to fine-tune Tesseract. I've created a new fas.traineddata, extracted the best trained data for Persian, and then ran the commands below:

combine_tessdata -e tessdata/best/fas.traineddata \
  /home/zohreh/Desktop/tesseract-master/tessdata/ext-best/fas.lstm

training/lstmtraining --model_output /training/langdata/Test/out \
  --continue_fromtraining /langdata/Test/ext-best/fas.lstm \
  --traineddata training/langdata/Test/ALLData/fas/fas.traineddata \
  --old_traineddata training/langdata/Test/best/fas.traineddata \
  --train_listfile training/langdata/Test/ALLData/fas.training_files.txt \
  --max_iterations 3600 --target_error_rate 0.01

But I get this error:

Must provide a --traineddata see training wiki

What is the problem?

Thanks
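In the command as posted, one flag reads `--continue_fromtraining /langdata/Test/ext-best/fas.lstm`. That looks like `--continue_from training/langdata/Test/ext-best/fas.lstm` with the space in the wrong place, though this is only a guess from the text of the post.

The sketch below keeps the same flags but collects them in a bash array. That way each flag and value can be printed and checked before running, and a stray space after a `\` cannot silently cut the command short. The root directory and file names are copied from the post and may not match the real layout. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: build lstmtraining fine-tuning arguments in a bash array.
# Paths mirror the post and are assumptions about the real layout.
build_finetune_args() {   # $1 = root directory of the training files
  local root=$1
  finetune_args=(         # global array, used by the caller below
    --model_output "$root/out"
    --continue_from "$root/ext-best/fas.lstm"
    --traineddata "$root/ALLData/fas/fas.traineddata"
    --old_traineddata "$root/best/fas.traineddata"
    --train_listfile "$root/ALLData/fas.training_files.txt"
    --max_iterations 3600
    --target_error_rate 0.01
  )
}

build_finetune_args training/langdata/Test
printf '%s\n' "${finetune_args[@]}"     # inspect the arguments first
# then run:  training/lstmtraining "${finetune_args[@]}"
```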
[tesseract-ocr] Compute CTC targets failed while training
Hi,

I use this version:

tesseract 4.0.0-beta.4
leptonica-1.74.4
libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
Found AVX2
Found AVX
Found SSE

I've trained on about 18000 lines for the Persian language. I use this command:

bash -x tesstrain.sh --fonts_dir /usr/share/fonts --lang fas \
  --training_text /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/fas.training_text.txt \
  --wordlist /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/fas.wordlist.txt \
  --linedata_only \
  --noextract_font_properties --langdata_dir /home/zohreh/Desktop/tesseract-master/src/training/langdata \
  --tessdata_dir /home/zohreh/Desktop/tesseract-master/tessdata \
  --fontlist "Arial" --output_dir /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2

and then run this:

sudo /home/zohreh/Desktop/tesseract-master/src/training/lstmtraining \
  --traineddata /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2/fas/fas.traineddata \
  --net_spec '[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx192O1c1]' \
  --model_output /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Out/base \
  --learning_rate 0.001 \
  --train_listfile /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2/fas.training_files.txt \
  --eval_listfile /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/v/fas.training_files.txt \
  --max_iterations 5000 &>/home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Out/basetrain.log

But it always shows "Compute CTC targets failed", and the model does not work well at all. I normalized my text, and each line of the text has at most 20 tokens. Could you please help me?
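A general CTC property may help when reasoning about this message. It is not taken from Tesseract's code. To align L labels, CTC needs at least L output timesteps, plus one blank timestep between any two identical neighbouring labels.

The sketch below counts that minimum and compares it with the number of timesteps a line image produces. It assumes the timestep count is the image width divided by the network's x-downsampling, which is 3 for the single Mp3,3 in this spec. The labels stand for the codes of one text line, which need not map one-to-one onto characters or onto the tokens mentioned in the post. min_ctc_steps and fits_ctc are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch of a general CTC property (not Tesseract's own check): L labels
# need at least L timesteps, plus one blank between identical neighbours.
min_ctc_steps() {        # arguments: the labels of one text line
  local steps=0 prev="" label
  for label in "$@"; do
    steps=$(( steps + 1 ))
    if [[ "$label" == "$prev" ]]; then
      steps=$(( steps + 1 ))   # a blank must separate the repeated label
    fi
    prev=$label
  done
  echo "$steps"
}

# Assumption: timesteps = image width / x-downsampling of the network.
fits_ctc() {             # $1 = width in pixels, $2 = x-downsampling, rest = labels
  local width=$1 factor=$2; shift 2
  local steps=$(( width / factor )) need
  need=$(min_ctc_steps "$@")
  echo "timesteps=${steps} needed=${need}"
  (( steps >= need ))
}

# Example: 20 labels on a 45-pixel-wide line give only 15 timesteps at factor 3.
fits_ctc 45 3 a b c d e f g h i j k l m n o p q r s t || echo "too many labels for this width"
```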
[tesseract-ocr] Text2image doesn't create font list
Hi,

I use:

tesseract 4.0.0-beta.4
leptonica-1.74.4
libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
Found AVX2
Found AVX
Found SSE

But when I run this command:

text2image --find_fonts \
  --fonts_dir /usr/share/fonts \
  --text ./langdata/fas/fas.training_text \
  --min_coverage 1 \
  --outputbase ./langdata/fas/fas \
  |& grep raw | sed -e 's/ :.*/" \\/g' | sed -e 's/^/ "/' >./langdata/fas/fas.fontslist.txt

fas.fontslist.txt is empty, even though I have some fonts installed on my Linux system.
[tesseract-ocr] tesstrain.sh doesn't create traineddata
Hi,

I had created traineddata for Persian before, but now my tesstrain.sh doesn't create fas.traineddata. I think this part of the bash file isn't being called:

make__traineddata() {
  tlog "\n=== Making final traineddata file ==="
  local lang_prefix=${LANGDATA_ROOT}/${LANG_CODE}/${LANG_CODE}

  # Combine available files for this language from the langdata dir.
  if [[ -r ${lang_prefix}.config ]]; then
    tlog "Copying ${lang_prefix}.config to ${TRAINING_DIR}"
    cp ${lang_prefix}.config ${TRAINING_DIR}
    chmod u+w ${TRAINING_DIR}/${LANG_CODE}.config
  fi
  if [[ -r ${lang_prefix}.params-model ]]; then
    tlog "Copying ${lang_prefix}.params-model to ${TRAINING_DIR}"
    cp ${lang_prefix}.params-model ${TRAINING_DIR}
    chmod u+w ${TRAINING_DIR}/${LANG_CODE}.params-model
  fi

  # Compose the traineddata file.
  run_command combine_tessdata ${TRAINING_DIR}/${LANG_CODE}.

  # Copy it to the output dir, overwriting only if allowed by the cmdline flag.
  if [[ ! -d ${OUTPUT_DIR} ]]; then
    tlog "Creating new directory ${OUTPUT_DIR}"
    mkdir -p ${OUTPUT_DIR}
  fi
  local destfile=${OUTPUT_DIR}/${LANG_CODE}${namestr}.traineddata;
  if [[ -f ${destfile} ]] && ((! OVERWRITE)); then
    err_exit "File ${destfile} exists and no --overwrite specified";
  fi
  tlog "Moving ${TRAINING_DIR}/${LANG_CODE}.traineddata to ${OUTPUT_DIR}"
  z=${TRAINING_DIR}/${LANG_CODE}.traineddata
  cp -f ${TRAINING_DIR}/${LANG_CODE}.traineddata ${destfile}
}

I reinstalled again with:

./autogen.sh
./configure
sudo make
sudo make install
sudo ldconfig
sudo make training
sudo make training-install

and tesseract -v now shows:

tesseract 4.0.0-beta.3
leptonica-1.76.0
libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
Found AVX2
Found AVX
Found SSE

Could someone please help me?
[tesseract-ocr] Error: Compute CTC targets failed
Hi,

I have a problem when I use the lstmtraining command:

lstmtraining \
  --traineddata /home/kddlab/Desktop/tesseract-master/training/langdata/fas/Phase2/fas/fas.traineddata \
  --net_spec '[1,36,0,1Ct3,3,16Mp3,3Lfys270Lfx540Lrx540Lfx192O1c1]' \
  --model_output /home/kddlab/Desktop/tesseract-master/training/langdata/fas/Phaseout/base \
  --learning_rate 0.001 \
  --train_listfile /home/kddlab/Desktop/tesseract-master/training/langdata/fas/Phase2/fas.training_files.txt \
  --eval_listfile /home/kddlab/Desktop/tesseract-master/training/langdata/fas/Phase3/fas.training_files.txt \
  --max_iterations 2762200 &>/home/kddlab/Desktop/tesseract-master/training/langdata/fas/Phaseout/basetrain.log

and I get:

Compute CTC targets failed!
Compute CTC targets failed!

How can I solve the problem?
[tesseract-ocr] Make lstm for some files
I have some tif and box files for each font, for example:

fas.B_Mitra.exp0.box
fas.B_Mitra.exp0.tif
fas.B_Mitra.exp1.box
fas.B_Mitra.exp1.tif
fas.B_Mitra.exp2.box
fas.B_Mitra.exp2.tif
...

How can I make an lstmf file for each of them?

Thx.
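Here is a minimal sketch of one way to do this. It assumes each .tif has a .box file with the same base name in the same directory. For every pair it runs `tesseract <image> <base> --psm 6 lstm.train`, which writes <base>.lstmf using the lstm.train config that ships with Tesseract 4. As far as I know, this is the same kind of call tesstrain.sh makes.

The function name is hypothetical. Depending on the setup, tesseract may also need -l or --tessdata-dir options, which are left out here.

```bash
#!/usr/bin/env bash
# Sketch: create one .lstmf per tif/box pair with the lstm.train config.
# Assumes the .box file sits next to the .tif with the same base name.
make_lstmf_all() {        # $1 = directory holding the *.tif / *.box pairs
  local dir=$1 tif base count=0
  for tif in "$dir"/*.tif; do
    [[ -e "$tif" ]] || continue          # glob matched no .tif files
    base=${tif%.tif}
    if [[ ! -r "$base.box" ]]; then
      echo "skipping $tif: no matching .box file"
      continue
    fi
    tesseract "$tif" "$base" --psm 6 lstm.train   # writes $base.lstmf
    count=$(( count + 1 ))
  done
  echo "processed $count image(s)"
}

# Example (hypothetical path): make_lstmf_all /path/to/fas/training/dir
```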
[tesseract-ocr] LSTM files
Hi,

I have been training the Persian language. My text is too large, so I had to generate 18 box files and 18 tifs for one text. Then I made one unicharset for all 18 files. Now, when I try to make the lstmf files, it creates only one, "fas.B_Mitra.exp0.lstmf", and can't create the files for 1 to 18.

I tested something else: I also created "fas.B_Nazanin.exp0.lstmf" and ran lstmtraining, but it used only fas.B_Mitra.exp0.lstmf and not the other one.

How can I make an lstmf file for all my boxes?

Thx.
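lstmtraining trains only on the .lstmf files named in the --train_listfile file, one path per line. A file that is missing from that list, such as fas.B_Nazanin.exp0.lstmf here, is never used.

The sketch below writes every non-empty .lstmf file from one directory into such a list. The function name and paths are hypothetical. If you pass an absolute directory, the list contains absolute paths, so it no longer depends on the directory lstmtraining is started from.

```bash
#!/usr/bin/env bash
# Sketch: list every non-empty .lstmf file in a directory, one path per
# line, for use with lstmtraining --train_listfile.
write_listfile() {        # $1 = directory with .lstmf files, $2 = list file to write
  local dir=$1 list=$2 f n=0
  : > "$list"                            # start with an empty list
  for f in "$dir"/*.lstmf; do
    [[ -e "$f" ]] || continue            # glob matched nothing
    if [[ -s "$f" ]]; then
      echo "$f" >> "$list"
      n=$(( n + 1 ))
    else
      echo "skipping empty file: $f" >&2
    fi
  done
  echo "$n file(s) written to $list"
}

# Example (hypothetical paths):
# write_listfile /path/to/lstmf/dir /path/to/fas.training_files.txt
```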
Re: [tesseract-ocr] Train 2 languages together
Thx, you're right.

On Sunday, July 1, 2018 at 10:02:55 PM UTC+4:30, shree wrote:
> The font being used does not support English.
>
> On Sun, Jul 1, 2018 at 10:06 PM Zohreh Khosrobeygi wrote:
>> [...]
>
> --
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
[tesseract-ocr] Train 2 languages together
Hi,

I have been training on this text:

272-135031- BECAUSE YOU WERE SLEEPING INSTEAD OWHILE POOR SHAGGY SITS THERE A COOING DOVE
فیلم و و , منابع سال آگهی آخرين آخرین بود. ساخت و کنی

That is, the text contains both Persian and English. But when the TIFF file is created, all the English text is removed. The TIFF file contains only this:

272-135031-
فیلم و و , منابع سال آگهی آخرين آخرین بود. ساخت و کنی

But for Persian we need to train both languages together. How can I solve this problem? How can I train two languages together?

Thanks a lot.
Re: [tesseract-ocr] parameter not found: tessedit_ocr_psm_mode
Has it changed? I used to write:

tessedit_ocr_engine_mode 1
tessedit_ocr_psm_mode 6

On Sunday, July 1, 2018 at 8:11:20 PM UTC+4:30, shree wrote:
> The correct variable is
>
> tessedit_pageseg_mode
>
> On Sun, Jul 1, 2018 at 8:51 PM Shree Devi Kumar wrote:
>> [...]
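Following shree's answer, here is a minimal sketch that writes a fas.config with the corrected variable name. The two settings come from this thread. The function name and target directory are hypothetical.

Judging by the make__traineddata function quoted in another thread in this archive, tesstrain.sh picks the file up from ${LANGDATA_ROOT}/${LANG_CODE}/${LANG_CODE}.config.

```bash
#!/usr/bin/env bash
# Sketch: write fas.config using tessedit_pageseg_mode (the name given in
# the reply above) instead of the unknown tessedit_ocr_psm_mode.
write_fas_config() {      # $1 = directory to write fas.config into
  cat > "$1/fas.config" <<'EOF'
tessedit_ocr_engine_mode 1
tessedit_pageseg_mode 6
EOF
}

# Example (hypothetical path): write_fas_config /path/to/langdata/fas
```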
Re: [tesseract-ocr] parameter not found: tessedit_ocr_psm_mode
tesseract 4.0.0-beta.3
leptonica-1.74.4
libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
Found AVX2
Found AVX
Found SSE

On Sunday, July 1, 2018 at 7:52:08 PM UTC+4:30, shree wrote:
> What's the output for:
>
> tesseract -v
>
> which tesseract
>
> which tesstrain.sh
>
> On Sun, Jul 1, 2018 at 8:39 PM Zohreh Khosrobeygi wrote:
>> [...]
[tesseract-ocr] parameter not found: tessedit_ocr_psm_mode
Hi,

When I use tesstrain.sh, I get this error, which is about my fas.config. My config file is:

tessedit_ocr_engine_mode 1
tessedit_ocr_psm_mode 6

The error is:

read_params_file: parameter not found: tessedit_ocr_psm_mode
+ [[ 0 -gt 0 ]]
+ export TESSDATA_PREFIX=
+ TESSDATA_PREFIX=
+ for img_file in '${img_files}'
+ check_file_readable /tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf
+ for file in '$@'
+ [[ ! -r /tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf ]]
+ err_exit '/tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf does not exist or is not readable'
+ echo -e 'ERROR: /tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf' does not exist or is not readable
+ tee -a /tmp/tmp.AjJgcthbHl/fas/tesstrain.log
ERROR: /tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf does not exist or is not readable
+ exit 1

Could you please help me?
Re: [tesseract-ocr] Unrecognized argument --linedata_only
Yes, I am using src/training/tesstrain.sh.

On Friday, June 8, 2018 at 6:44:27 PM UTC+4:30, shree wrote:
> Are you using the correct version of tesstrain.sh?
>
> It should be in src/training/tesstrain.sh
>
> ShreeDevi
>
> On Fri, Jun 8, 2018 at 6:49 PM Zohreh Khosrobeygi wrote:
>> [...]
[tesseract-ocr] Re: Unrecognized argument --linedata_only
On Friday, June 8, 2018 at 5:49:43 PM UTC+4:30, Zohreh Khosrobeygi wrote:
> [...]
[tesseract-ocr] Unrecognized argument --linedata_only
Hi,

I have been training Tesseract, but I get this error:

Unrecognized argument --linedata_only

This is my version of Tesseract:

tesseract 4.0.0-beta.1
leptonica-1.74.4
libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
Found AVX2
Found AVX
Found SSE

This is my command:

sudo tesstrain.sh --fonts_dir /usr/share/fonts --lang fas \
  --training_text /home/kddlab/Desktop/tesseract-master/1MyData/fas/fas.training_text \
  --linedata_only \
  --noextract_font_properties --langdata_dir /home/kddlab/Desktop/tesseract-master/langdata \
  --tessdata_dir /home/kddlab/Desktop/tesseract-master/tessdata \
  --fontlist "B Mitra" --output_dir /home/kddlab/Desktop/tesseract-master/1MyData/testfas

And this is my config file:

# Use LSTM
tessedit_ocr_engine_mode 1
tessedit_pageseg_mode 6

How can I solve this?
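Shree's reply in this thread asks whether the right tesstrain.sh is being used. The sketch below is a rough way to check a given copy. It only greps for the flag name in the script and in a tesstrain_utils.sh next to it, assuming the flag parsing lives in one of those two files, so treat the result as a hint. The function name is hypothetical.

Note also that `sudo` may run with a different PATH than your own shell, for example when sudoers sets secure_path. So `sudo tesstrain.sh` is not guaranteed to run the same copy as `tesstrain.sh`.

```bash
#!/usr/bin/env bash
# Sketch: report whether a tesstrain.sh (or the tesstrain_utils.sh next to
# it) mentions a given flag. A plain text search, so only a hint.
script_mentions_flag() {  # $1 = path to tesstrain.sh, $2 = flag, e.g. --linedata_only
  local script=$1 flag=$2 utils
  utils="$(dirname "$script")/tesstrain_utils.sh"
  if grep -q -e "$flag" "$script" 2>/dev/null \
     || grep -q -e "$flag" "$utils" 2>/dev/null; then
    echo "$script: mentions $flag"
  else
    echo "$script: does not mention $flag"
    return 1
  fi
}

# Compare the copy found on PATH with the one in the source tree, e.g.:
#   script_mentions_flag "$(command -v tesstrain.sh)" --linedata_only
#   script_mentions_flag src/training/tesstrain.sh --linedata_only
```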