[tesseract-ocr] 2 Layer CNN

2018-12-30 Thread Zohreh Khosrobeygi
Hi, I used a 2-layer CNN like this:
[1,48,0,1Ct3,3,8Mp3,3Ct3,3,16Mp2,2Lbys64Lbx512O1c1]
But the error rate is still high even after 30 iterations.
Can we use a 2-layer CNN in Tesseract?
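
For reference, here is a rough sketch of how the image height changes through the spec above. It assumes, following the VGSL spec documentation, that a C layer convolves with no shrinkage and that Mp<y>,<x> reduces each y,x rectangle to a single value. The helper name trace_height and the use of integer division are illustrative assumptions, not Tesseract code.

```python
import re

def trace_height(spec_layers, height):
    """Follow the image height through conv (C) and maxpool (Mp) layers.

    Assumes convolution keeps the size and Mp<y>,<x> divides the height by y.
    """
    trace = [("input", height)]
    for layer in spec_layers:
        pool = re.fullmatch(r"Mp(\d+),(\d+)", layer)
        if pool:
            height //= int(pool.group(1))
        trace.append((layer, height))
    return trace

# The convolution and pooling layers of the spec quoted above.
layers = ["Ct3,3,8", "Mp3,3", "Ct3,3,16", "Mp2,2"]
result = trace_height(layers, 48)
for name, h in result:
    print(f"{name:10s} height={h}")
```

Under these assumptions the two pooling layers also shrink the width by 3 * 2 = 6, so the LSTM layers see fewer time steps than with a single Mp3,3. Whether that explains the high error rate in this case is not something the sketch can show.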

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/784278e4-d82b-4c4f-ae1d-35141e225c79%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[tesseract-ocr] What is the information in basetrain.log

2018-12-09 Thread Zohreh Khosrobeygi
Hi,
Does anyone know what the information in the log file that is created
during training means?
Warning: given outputs 1 not equal to unicharset of 165.
Num outputs,weights in Series:
  1,48,0,1:1, 0
Num outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  Lbys64:128, 41472
  Lbx128:256, 263168
  Lby256:512, 1050624
  Lbx512:1024, 4198400
  Fc165:165, 169125
Total weights = 5722949
Built network:[1,48,0,1[C3,3Ft16]Mp3,3Lbys64Lbx128Lby256Lbx512Fc165] from request [1,48,0,1Ct3,3,16Mp3,3Lbys64Lbx128Lby256Lbx512O1c1]
Especially this part:
Num outputs,weights in Series:
  1,48,0,1:1, 0
Num outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0

Thanks for your help.
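
A possible reading of the log, offered as a sketch rather than an authoritative answer: each line seems to have the form layer:outputs, weights. The formulas below are assumptions (a bias per output, four gates per LSTM, two directions for the Lb layers), but they do reproduce every weight count printed in the log above, including the total. The log also shows the requested O1c1 output being built as Fc165, which matches the warning about the unicharset size of 165.

```python
def conv_weights(window_inputs, outputs):
    # Fully connected over the stacked window, plus one bias per output.
    return (window_inputs + 1) * outputs

def lstm_weights(inputs, states, bidirectional=True):
    # Four gates, each fed by the inputs, the previous state and a bias.
    one_direction = 4 * states * (inputs + states + 1)
    return 2 * one_direction if bidirectional else one_direction

def fc_weights(inputs, outputs):
    # One weight per input plus a bias, for each output class.
    return (inputs + 1) * outputs

counts = {
    "[C3,3Ft16]": conv_weights(3 * 3 * 1, 16),  # 3x3 window on 1 channel
    "Lbys64": lstm_weights(16, 64),             # input depth 16 after Mp3,3
    "Lbx128": lstm_weights(128, 128),           # 2 * 64 inputs from Lbys64
    "Lby256": lstm_weights(256, 256),
    "Lbx512": lstm_weights(512, 512),
    "Fc165": fc_weights(1024, 165),
}
for name, n in counts.items():
    print(name, n)
total = sum(counts.values())
print("Total weights =", total)
```

The Mp3,3 and input layers have no weights, which is why they show 0 in the log.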



[tesseract-ocr] CNN and Tensorflow in Tesseract

2018-12-09 Thread Zohreh Khosrobeygi
I have some questions:
1- How many layers does the CNN have in TensorFlow?
2- What is the stride in the convolution layers and pooling layers?
3- Does the convolution use zero padding?

I'm training the Persian language and my accuracy is good, but I need to 
increase it. Convolution is very important in my training.
Does anyone know the answers?



[tesseract-ocr] the length of input to lstm

2018-12-04 Thread Zohreh Khosrobeygi
I'm training Tesseract from scratch for the Persian language, but I need to 
understand the output of the TF convolution, because it is the input to the 
LSTM. The wiki gives, for example, Ct5,5,32. I couldn't work out the number 
of outputs. In this case 32 is the depth, but what is the number of outputs? 
Besides, elsewhere it says that 32 is the number of filters. Can anyone 
explain this to me?
In summary, when the network is:
[1,48,0,1[C3,3Ft16]Mp3,3Lfys64Lfx96Lrx96Lfx192Fc165]
what is the number of inputs to the LSTM?
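
As a hedged sketch of one way to read the spec: in the VGSL notation C<y>,<x>,<d> has d outputs, so the depth and the number of filters are the same number. Assuming Mp3,3 changes only height and width, each LSTM in the chain then takes the previous layer's output depth as its input size. The function name lstm_input_depths is made up for this illustration.

```python
def lstm_input_depths(first_depth, lstm_layers):
    """Return (layer, inputs, outputs) for a chain of unidirectional LSTMs."""
    rows, depth = [], first_depth
    for name, states in lstm_layers:
        rows.append((name, depth, states))
        depth = states  # the next layer reads this layer's outputs
    return rows

# [C3,3Ft16] produces depth 16; Mp3,3 is assumed to leave the depth unchanged.
chain = [("Lfys64", 64), ("Lfx96", 96), ("Lrx96", 96), ("Lfx192", 192)]
rows = lstm_input_depths(16, chain)
for name, n_in, n_out in rows:
    print(f"{name}: {n_in} inputs -> {n_out} outputs")
```

Under this reading the first LSTM, Lfys64, receives 16 values per pixel and the final Fc165 receives 192 values per time step.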



[tesseract-ocr] lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.

2018-11-26 Thread Zohreh Khosrobeygi
Hi, 
I have been running training on about 130 GB of data in 4000 files. My command is:

/home/kddlab/Desktop/tesseract-master/src/training/lstmtraining \
  --traineddata /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/fas/fas.traineddata \
  --net_spec '[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx192O1c165]' \
  --model_output /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/Out_Checkpoint/base \
  --learning_rate 0.001 \
  --train_listfile /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/fas.training_files.txt \
  --eval_listfile /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/v/fas.training_files.txt \
  --max_iterations 15 &>/home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/Out_Checkpoint/basetrain.log

but after reading some files, Tesseract gives this error and stops 
training:

Loaded 821/10179 pages (1-821) of document /home/kddlab/Desktop/tesseract-master/src/training/langdata/fas/AllLstm/fas.B_Lotus.exp2778.lstmf
lt-lstmtraining: genericvector.h:720: T& GenericVector::operator[](int) const [with T = char]: Assertion `index >= 0 && index < size_used_' failed.
Could you please help me?



[tesseract-ocr] Train Persian fonts

2018-10-24 Thread Zohreh Khosrobeygi
Hi, 
I was able to train Persian fonts and the result is very good. But there is 
a problem when the text includes both English and Persian words. In the 
training phase, Tesseract removes all English words rendered in Persian 
fonts. How can I generate tif and box files for Persian fonts when the text 
includes English words?
Thanks.



[tesseract-ocr] Re: error: Must provide a --traineddata see training wiki

2018-09-30 Thread Zohreh Khosrobeygi


On Sunday, September 30, 2018 at 12:20:19 PM UTC+3:30, Zohreh Khosrobeygi 
wrote:
>
> I am trying to fine-tune Tesseract.
> I've created a new fas.traineddata and extracted the best traineddata for 
> Persian, and then ran the commands below:
>
> combine_tessdata -e tessdata/best/fas.traineddata \
>   /home/zohreh/Desktop/tesseract-master/tessdata/ext-best/fas.lstm
>   
> training/lstmtraining --model_output /training/langdata/Test/out \ 
> --continue_fromtraining /langdata/Test/ext-best/fas.lstm \  --traineddata 
> training/langdata/Test/ALLData/fas/fas.traineddata \ --old_traineddata 
> training/langdata/Test/best/fas.traineddata \ --train_listfile 
> training/langdata/Test/ALLData/fas.training_files.txt \ --max_iterations 
> 3600  --target_error_rate 0.01
> But I've got this error:
>
> Must provide a --traineddata see training wiki
> What is the problem?
> Thanks
>
I pasted the wrong command. The correct one is:
combine_tessdata -e training/langdata/Test/best/fas.traineddata \
  training/langdata/Test/ext-best/fas. 



[tesseract-ocr] error: Must provide a --traineddata see training wiki

2018-09-30 Thread Zohreh Khosrobeygi
I am trying to fine-tune Tesseract.
I've created a new fas.traineddata and extracted the best traineddata for 
Persian, and then ran the commands below:

combine_tessdata -e tessdata/best/fas.traineddata \
  /home/zohreh/Desktop/tesseract-master/tessdata/ext-best/fas.lstm
  
training/lstmtraining --model_output /training/langdata/Test/out \ 
--continue_fromtraining /langdata/Test/ext-best/fas.lstm \  --traineddata 
training/langdata/Test/ALLData/fas/fas.traineddata \ --old_traineddata 
training/langdata/Test/best/fas.traineddata \ --train_listfile 
training/langdata/Test/ALLData/fas.training_files.txt \ --max_iterations 
3600  --target_error_rate 0.01
But I've got this error:

Must provide a --traineddata see training wiki
What is the problem?
Thanks
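
One thing that may be worth checking, stated as an assumption and not a confirmed diagnosis: in the command as posted, several backslashes are followed by a space and another flag on the same line instead of a line break. In POSIX shell quoting, a backslash before a space escapes the space, so the space becomes part of the next word. The sketch below uses Python's shlex module, which follows POSIX-style splitting, with shortened hypothetical paths to show the effect.

```python
import shlex

# Hypothetical, shortened versions of the posted command line.
posted = r"lstmtraining --model_output out \ --traineddata fas.traineddata"
fixed = "lstmtraining --model_output out --traineddata fas.traineddata"

posted_args = shlex.split(posted)
fixed_args = shlex.split(fixed)

print(posted_args)  # the flag arrives with a leading space
print(fixed_args)   # the flag arrives as a normal argument
```

If the real command was typed on one line with these backslashes, the program would receive " --traineddata" with a leading space, which it may not recognise as a flag. If the backslashes were originally at the ends of lines and only the mail formatting joined them, this does not apply.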



[tesseract-ocr] Compute CTC targets failed while training

2018-09-25 Thread Zohreh Khosrobeygi
Hi, I use this:
tesseract 4.0.0-beta.4
 leptonica-1.74.4
  libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 
1.2.8

 Found AVX2
 Found AVX
 Found SSE
I've trained on about 18000 lines for the Persian language. I use this command:

bash -x tesstrain.sh --fonts_dir /usr/share/fonts --lang fas \
  --training_text /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/fas.training_text.txt \
  --wordlist /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/fas.wordlist.txt \
  --linedata_only \
  --noextract_font_properties --langdata_dir /home/zohreh/Desktop/tesseract-master/src/training/langdata \
  --tessdata_dir /home/zohreh/Desktop/tesseract-master/tessdata \
  --fontlist "Arial" --output_dir /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2
and then run this:
sudo /home/zohreh/Desktop/tesseract-master/src/training/lstmtraining \
  --traineddata /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2/fas/fas.traineddata \
  --net_spec '[1,48,0,1Ct3,3,16Mp3,3Lfys64Lfx96Lrx96Lfx192O1c1]' \
  --model_output /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Out/base \
  --learning_rate 0.001 \
  --train_listfile /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Phase2/fas.training_files.txt \
  --eval_listfile /home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/v/fas.training_files.txt \
  --max_iterations 5000 &>/home/zohreh/Desktop/tesseract-master/src/training/langdata/fas/Out/basetrain.log
but it always shows "Compute CTC targets failed" and the model does not perform well at all.
I normalized my text, and each line of the text has at most 20 tokens.
Could you please help me?
 



[tesseract-ocr] Text2image doesn't create font list

2018-09-25 Thread Zohreh Khosrobeygi
Hi,
I use
 tesseract 4.0.0-beta.4
 leptonica-1.74.4
  libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 
1.2.8

 Found AVX2
 Found AVX
 Found SSE
But when I run this command:
 text2image --find_fonts \
--fonts_dir /usr/share/fonts \
--text ./langdata/fas/fas.training_text \
--min_coverage 1  \
--outputbase ./langdata/fas/fas \
|& grep raw | sed -e 's/ :.*/" \\/g'  | sed -e 's/^/  "/' >./langdata/fas/fas.fontslist.txt
fas.fontslist.txt is empty, although I have some fonts installed on my Linux system. 
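
To help narrow down where the output is lost, here is a small sketch of what the grep and sed stages in the pipeline above do to each line. The sample log lines are hypothetical, since the actual text2image output is not shown in this message, and fontlist_entries is a name invented for the illustration.

```python
def fontlist_entries(log_lines):
    r"""Mimic: grep raw | sed -e 's/ :.*/" \\/g' | sed -e 's/^/  "/'."""
    entries = []
    for line in log_lines:
        if "raw" not in line:          # grep raw: keep only matching lines
            continue
        cut = line.find(" :")          # first sed: replace from " :" to the end
        if cut != -1:
            line = line[:cut] + '" \\'
        entries.append('  "' + line)   # second sed: prefix two spaces and a quote
    return entries

sample = [
    "Example Font A : raw score 100",  # hypothetical log line
    "some unrelated message",
]
kept = fontlist_entries(sample)
nothing = fontlist_entries(["no matching lines here"])
print(kept)
print(nothing)
```

The second call shows that the file ends up empty whenever no line contains the word "raw". Running the text2image command without the grep and sed stages and looking at what it prints would show whether that is the case here.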

 



[tesseract-ocr] tesstrain.sh doesn't create traindata

2018-09-21 Thread Zohreh Khosrobeygi
Hi,
I had created traineddata for Persian before, but now my tesstrain.sh doesn't 
create fas.traineddata.
I think this part of the bash file is not being called:


make__traineddata() {
  tlog "*\n=== Making final traineddata file ===*"
  local lang_prefix=${LANGDATA_ROOT}/${LANG_CODE}/${LANG_CODE}

  # Combine available files for this language from the langdata dir.
  if [[ -r ${lang_prefix}.config ]]; then
    tlog "Copying ${lang_prefix}.config to ${TRAINING_DIR}"
    cp ${lang_prefix}.config ${TRAINING_DIR}
    chmod u+w ${TRAINING_DIR}/${LANG_CODE}.config
  fi
  if [[ -r ${lang_prefix}.params-model ]]; then
    tlog "Copying ${lang_prefix}.params-model to ${TRAINING_DIR}"
    cp ${lang_prefix}.params-model ${TRAINING_DIR}
    chmod u+w ${TRAINING_DIR}/${LANG_CODE}.params-model
  fi

  # Compose the traineddata file.
  run_command combine_tessdata ${TRAINING_DIR}/${LANG_CODE}.

  # Copy it to the output dir, overwriting only if allowed by the cmdline flag.
  if [[ ! -d ${OUTPUT_DIR} ]]; then
    tlog "Creating new directory ${OUTPUT_DIR}"
    mkdir -p ${OUTPUT_DIR}
  fi
  local destfile=${OUTPUT_DIR}/${LANG_CODE}${namestr}.traineddata;

  if [[ -f ${destfile} ]] && ((! OVERWRITE)); then
    err_exit "File ${destfile} exists and no --overwrite specified";
  fi
  tlog "Moving ${TRAINING_DIR}/${LANG_CODE}.traineddata to ${OUTPUT_DIR}"
  z=${TRAINING_DIR}/${LANG_CODE}.traineddata

  cp -f ${TRAINING_DIR}/${LANG_CODE}.traineddata ${destfile}
}
I reinstalled with
./autogen.sh
./configure
sudo make
sudo make install
sudo ldconfig
sudo make training
sudo make training-install
again, and the output of
 tesseract -v
is
tesseract 4.0.0-beta.3
 leptonica-1.76.0
  libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8
 Found AVX2
 Found AVX
 Found SSE
Could someone please help me?






[tesseract-ocr] Error: Compute CTC targets failed

2018-08-29 Thread Zohreh Khosrobeygi
Hi, I have a problem when I use the lstmtraining command:


lstmtraining \
  --traineddata /home/kddlab/Desktop/tesseract-master/training/langdata/fas/Phase2/fas/fas.traineddata \
  --net_spec '[1,36,0,1Ct3,3,16Mp3,3Lfys270Lfx540Lrx540Lfx192O1c1]' \
  --model_output /home/kddlab/Desktop/tesseract-master/training/langdata/fas/Phaseout/base \
  --learning_rate 0.001 \
  --train_listfile /home/kddlab/Desktop/tesseract-master/training/langdata/fas/Phase2/fas.training_files.txt \
  --eval_listfile /home/kddlab/Desktop/tesseract-master/training/langdata/fas/Phase3/fas.training_files.txt \
  --max_iterations 2762200 &>/home/kddlab/Desktop/tesseract-master/training/langdata/fas/Phaseout/basetrain.log


Compute CTC targets failed!
Compute CTC targets failed!

How can I solve the problem?
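
One general property of CTC that may be relevant, offered as a possibility rather than a diagnosis of this run: a label sequence can only be aligned if the network produces at least as many time steps as there are labels, plus one extra step for every pair of identical adjacent labels. The sketch below checks that condition. The function names and the idea of dividing the image width by the pooling factor (3 for a single Mp3,3) are illustrative assumptions.

```python
def min_ctc_timesteps(labels):
    """Smallest number of time steps that can emit `labels` under CTC.

    Each label needs one step, and two identical adjacent labels need a
    blank step between them.
    """
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return len(labels) + repeats

def fits(labels, image_width, x_reduction):
    # x_reduction is the total horizontal shrink, e.g. 3 for a single Mp3,3.
    return image_width // x_reduction >= min_ctc_timesteps(labels)

needed = min_ctc_timesteps("hello")   # the double "l" needs a blank
wide_ok = fits("hello", 30, 3)        # 10 steps available
narrow_ok = fits("hello", 15, 3)      # only 5 steps available
print(needed, wide_ok, narrow_ok)
```

If some training lines are very narrow relative to their text, or the ground-truth text does not match the image, this condition can fail for those lines. Other causes are possible and the sketch does not rule them out.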



[tesseract-ocr] Make lstm for some files

2018-08-16 Thread Zohreh Khosrobeygi
I have several tif and box files for each font, for example:
fas.B_Mitra.exp0.box
fas.B_Mitra.exp0.tif
fas.B_Mitra.exp1.box
fas.B_Mitra.exp1.tif
fas.B_Mitra.exp2.box
fas.B_Mitra.exp2.tif
.
.
.
How can I make an lstmf file for each of them?
Thx.
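
A minimal sketch of one way to do this, under the assumption that an lstmf file is produced by running tesseract on each image with the lstm.train config, with the output base equal to the image path without its extension. The exact invocation should be checked against the training documentation for the installed version. The function only builds the command lines and does not run them.

```python
import tempfile
from pathlib import Path

def lstmf_commands(directory):
    """Build one tesseract command per .tif that has a matching .box file."""
    commands = []
    for tif in sorted(Path(directory).glob("*.tif")):
        if not tif.with_suffix(".box").exists():
            continue  # skip images without ground-truth boxes
        # Output base is the image path without its extension.
        commands.append(["tesseract", str(tif), str(tif.with_suffix("")),
                         "--psm", "6", "lstm.train"])
    return commands

# Demo on empty placeholder files named like the ones in the message.
with tempfile.TemporaryDirectory() as tmp:
    for stem in ("fas.B_Mitra.exp0", "fas.B_Mitra.exp1"):
        Path(tmp, stem + ".tif").touch()
        Path(tmp, stem + ".box").touch()
    demo = lstmf_commands(tmp)
    for cmd in demo:
        print(" ".join(cmd))
```

Each printed command could then be run in a loop or passed to subprocess.run, one image at a time.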



[tesseract-ocr] LSTM files

2018-08-13 Thread Zohreh Khosrobeygi
Hi, 
I have been training the Persian language. My text is too large, so I had to 
generate 18 box files and 18 tifs for one text. Then I made one unicharset 
for all 18 files. Now when I want to make the lstmf files, it creates just 
one, "fas.B_Mitra.exp0.lstmf", and not the others (1 to 18). 
I tested something else: I also created "fas.B_Nazanin.exp0.lstmf" and 
ran lstmtraining, but it used only fas.B_Mitra.exp0.lstmf and not the 
other one.
How can I make an lstmf file for all my box files?
Thx.
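
If lstmtraining only picks up one file, it may be worth checking the list file given to --train_listfile, which holds one lstmf path per line. The sketch below, with a made-up function name and placeholder files, writes every lstmf file in a directory into such a list.

```python
import tempfile
from pathlib import Path

def write_listfile(lstmf_dir, list_path):
    """Write every .lstmf file in lstmf_dir to list_path, one path per line."""
    paths = sorted(str(p) for p in Path(lstmf_dir).glob("*.lstmf"))
    Path(list_path).write_text("\n".join(paths) + "\n")
    return paths

# Demo on empty placeholder files named like the ones in the message.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("fas.B_Mitra.exp0.lstmf", "fas.B_Nazanin.exp0.lstmf"):
        Path(tmp, name).touch()
    list_path = Path(tmp, "fas.training_files.txt")
    listed = write_listfile(tmp, list_path)
    written = list_path.read_text().splitlines()
    print("\n".join(written))
```

Whether a short list file is the cause here is an assumption. The sketch only shows how to make sure every generated file is listed.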



Re: [tesseract-ocr] Train 2 language together

2018-07-02 Thread Zohreh Khosrobeygi
Thanks, you're right.

On Sunday, July 1, 2018 at 10:02:55 PM UTC+4:30, shree wrote:
>
> The font being used does not support English.
>
> On Sun, Jul 1, 2018 at 10:06 PM Zohreh Khosrobeygi wrote:
>
>> Hi,
>> I have been training the text:
>>
>> 272-135031- BECAUSE YOU WERE SLEEPING INSTEAD OWHILE POOR SHAGGY 
>> SITS THERE A COOING DOVE
>> فیلم و و , منابع سال آگهی آخرين آخرین بود. ساخت و کنی
>>
>> That is, the text contains both Persian and English. But when the TIFF 
>> file is created, all the English text has been removed. The TIFF file 
>> contains this:
>>
>> 272-135031-
>> فیلم و و , منابع سال آگهی آخرين آخرین بود. ساخت و کنی
>>
>> But for Persian we need to train both languages together.
>> How can I solve the problem? How can I train 2 languages together?
>> Thanks a lot.
>>
>>
>
>
> -- 
>
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>



[tesseract-ocr] Train 2 language together

2018-07-01 Thread Zohreh Khosrobeygi
Hi,
I have been training on this text:

272-135031- BECAUSE YOU WERE SLEEPING INSTEAD OWHILE POOR SHAGGY SITS 
THERE A COOING DOVE
فیلم و و , منابع سال آگهی آخرين آخرین بود. ساخت و کنی

That is, the text contains both Persian and English. But when the TIFF file 
is created, all the English text has been removed. The TIFF file contains this:

272-135031-
فیلم و و , منابع سال آگهی آخرين آخرین بود. ساخت و کنی

But for Persian we need to train both languages together.
How can I solve the problem? How can I train 2 languages together?
Thanks a lot.
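
A reply in this thread points out that the font being used does not support English. As a rough aid for spotting training lines that need a font covering more than one script, here is a sketch that groups the letters of a line by the first word of their Unicode character name. It is a heuristic that works for Latin and Arabic-script letters, not a font coverage check, and scripts_in is a name made up for the illustration.

```python
import unicodedata

def scripts_in(text):
    """Return the set of script names (first word of the Unicode name) of letters."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}

# A shortened mix of the English and Persian text from the message.
line = "272-135031- BECAUSE YOU WERE SLEEPING فیلم و منابع"
found = scripts_in(line)
print(found)
```

A line that reports both LATIN and ARABIC would need to be rendered with a font that contains glyphs for both.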



Re: [tesseract-ocr] parameter not found: tessedit_ocr_psm_mode

2018-07-01 Thread Zohreh Khosrobeygi
Has it changed? I used to write:
tessedit_ocr_engine_mode 1
tessedit_ocr_psm_mode 6
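
Going by the reply quoted below, the parameter is called tessedit_pageseg_mode. A small sketch of updating an existing config file accordingly; the RENAMED table holds only the one rename mentioned in this thread, and fix_config is a name invented for the illustration.

```python
# Only the rename mentioned in this thread; extend with care.
RENAMED = {"tessedit_ocr_psm_mode": "tessedit_pageseg_mode"}

def fix_config(text):
    """Replace parameter names listed in RENAMED; keep everything else."""
    fixed = []
    for line in text.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0] in RENAMED:
            parts[0] = RENAMED[parts[0]]
            line = " ".join(parts)
        fixed.append(line)
    return "\n".join(fixed)

config = "tessedit_ocr_engine_mode 1\ntessedit_ocr_psm_mode 6"
updated = fix_config(config)
print(updated)
```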



On Sunday, July 1, 2018 at 8:11:20 PM UTC+4:30, shree wrote:
>
> correct variable is 
>
> tessedit_pageseg_mode 
>
> On Sun, Jul 1, 2018 at 8:51 PM Shree Devi Kumar wrote:
>
>> what's the output for ?
>>
>> tesseract -v
>>
>> which tesseract
>>
>> which tesstrain.sh
>>
>> On Sun, Jul 1, 2018 at 8:39 PM Zohreh Khosrobeygi wrote:
>>
>>> Hi, 
>>> when I use tesstrain.sh, I get this error about my fas.config. My 
>>> config file is:
>>>
>>> tessedit_ocr_engine_mode 1
>>> tessedit_ocr_psm_mode 6
>>>
>>> The error is:
>>>
>>> read_params_file: parameter not found: tessedit_ocr_psm_mode
>>> + [[ 0 -gt 0 ]]
>>> + export TESSDATA_PREFIX=
>>> + TESSDATA_PREFIX=
>>> + for img_file in '${img_files}'
>>> + check_file_readable /tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf
>>> + for file in '$@'
>>> + [[ ! -r /tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf ]]
>>> + err_exit '/tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf does not 
>>> exist or is not readable'
>>> + echo -e 'ERROR: /tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf' does 
>>> not exist or is not readable
>>> + tee -a /tmp/tmp.AjJgcthbHl/fas/tesstrain.log
>>> ERROR: /tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf does not exist 
>>> or is not readable
>>> + exit 1
>>>
>>> Could you please help me?
>>>
>>>
>>>  
>>>
>>>
>>
>>
>> -- 
>>
>> 
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>
>
> -- 
>
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>



Re: [tesseract-ocr] parameter not found: tessedit_ocr_psm_mode

2018-07-01 Thread Zohreh Khosrobeygi
tesseract 4.0.0-beta.3
 leptonica-1.74.4
  libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 
1.2.8

 Found AVX2
 Found AVX
 Found SSE
---

On Sunday, July 1, 2018 at 7:52:08 PM UTC+4:30, shree wrote:
>
> what's the output for ?
>
> tesseract -v
>
> which tesseract
>
> which tesstrain.sh
>
> On Sun, Jul 1, 2018 at 8:39 PM Zohreh Khosrobeygi wrote:
>
>> Hi, 
>> when I use tesstrain.sh, I get this error about my fas.config. My 
>> config file is:
>>
>> tessedit_ocr_engine_mode 1
>> tessedit_ocr_psm_mode 6
>>
>> The error is:
>>
>> read_params_file: parameter not found: tessedit_ocr_psm_mode
>> + [[ 0 -gt 0 ]]
>> + export TESSDATA_PREFIX=
>> + TESSDATA_PREFIX=
>> + for img_file in '${img_files}'
>> + check_file_readable /tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf
>> + for file in '$@'
>> + [[ ! -r /tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf ]]
>> + err_exit '/tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf does not 
>> exist or is not readable'
>> + echo -e 'ERROR: /tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf' does 
>> not exist or is not readable
>> + tee -a /tmp/tmp.AjJgcthbHl/fas/tesstrain.log
>> ERROR: /tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf does not exist or 
>> is not readable
>> + exit 1
>>
>> Could you please help me?
>>
>>
>>  
>>
>>
>
>
> -- 
>
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>



[tesseract-ocr] parameter not found: tessedit_ocr_psm_mode

2018-07-01 Thread Zohreh Khosrobeygi
Hi, 
when I use tesstrain.sh, I get this error about my fas.config. My 
config file is:

tessedit_ocr_engine_mode 1
tessedit_ocr_psm_mode 6

The error is:

read_params_file: parameter not found: tessedit_ocr_psm_mode
+ [[ 0 -gt 0 ]]
+ export TESSDATA_PREFIX=
+ TESSDATA_PREFIX=
+ for img_file in '${img_files}'
+ check_file_readable /tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf
+ for file in '$@'
+ [[ ! -r /tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf ]]
+ err_exit '/tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf does not exist 
or is not readable'
+ echo -e 'ERROR: /tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf' does 
not exist or is not readable
+ tee -a /tmp/tmp.AjJgcthbHl/fas/tesstrain.log
ERROR: /tmp/tmp.AjJgcthbHl/fas/fas.B_Nazanin.exp0.lstmf does not exist or 
is not readable
+ exit 1

Could you please help me?


 



Re: [tesseract-ocr] Unrecognized argument --linedata_only

2018-06-09 Thread Zohreh Khosrobeygi
Yes, I am using src/training/tesstrain.sh


On Friday, June 8, 2018 at 6:44:27 PM UTC+4:30, shree wrote:
>
> Are you using the correct version of tesstrain.sh?
>
> It should be in src/training/tesstrain.sh
>
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
>
> On Fri, Jun 8, 2018 at 6:49 PM Zohreh Khosrobeygi wrote:
>
>> Hi,
>> I have been training Tesseract, but I get this error:
>>
>> Unrecognized argument --linedata_only
>>
>> This is my version of Tesseract:
>> tesseract 4.0.0-beta.1
>>  leptonica-1.74.4
>>   libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 
>> 1.2.8
>>
>>  Found AVX2
>>  Found AVX
>>  Found SSE
>>
>> This is my command:
>> sudo tesstrain.sh --fonts_dir /usr/share/fonts --lang fas
>> --training_text 
>> /home/kddlab/Desktop/tesseract-master/1MyData/fas/fas.training_text
>>  --linedata_only \
>>   --noextract_font_properties --langdata_dir 
>> /home/kddlab/Desktop/tesseract-master/langdata \
>>   --tessdata_dir /home/kddlab/Desktop/tesseract-master/tessdata \
>>   --fontlist "B Mitra" --output_dir 
>> /home/kddlab/Desktop/tesseract-master/1MyData/testfas
>>
>> And i have config file:
>> # Use LSTM
>> tessedit_ocr_engine_mode 1
>> tessedit_pageseg_mode 6
>>
>> How can i solve this?
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesseract-oc...@googlegroups.com .
>> To post to this group, send email to tesser...@googlegroups.com 
>> .
>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/a692d903-34be-4a51-99c5-11ed34bb6cef%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/tesseract-ocr/a692d903-34be-4a51-99c5-11ed34bb6cef%40googlegroups.com?utm_medium=email_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>



[tesseract-ocr] Re: Unrecognized argument --linedata_only

2018-06-09 Thread Zohreh Khosrobeygi





[tesseract-ocr] Unrecognized argument --linedata_only

2018-06-08 Thread Zohreh Khosrobeygi
Hi,
I am trying to train Tesseract, but I get this error:

Unrecognized argument --linedata_only

This is my Tesseract version:
tesseract 4.0.0-beta.1
 leptonica-1.74.4
  libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8

 Found AVX2
 Found AVX
 Found SSE

This is my command:
sudo tesstrain.sh --fonts_dir /usr/share/fonts --lang fas \
  --training_text /home/kddlab/Desktop/tesseract-master/1MyData/fas/fas.training_text \
  --linedata_only \
  --noextract_font_properties \
  --langdata_dir /home/kddlab/Desktop/tesseract-master/langdata \
  --tessdata_dir /home/kddlab/Desktop/tesseract-master/tessdata \
  --fontlist "B Mitra" \
  --output_dir /home/kddlab/Desktop/tesseract-master/1MyData/testfas

And I have this config file:
# Use LSTM
tessedit_ocr_engine_mode 1
tessedit_pageseg_mode 6

How can I solve this?
