I am trying to get the second-best choice for each character from Tesseract, 
and I found the "lstm_choice_mode" parameter.
However, it appears to work only with the LSTM engine (only the LSTM engine 
returns more than one candidate for each character).
Can anyone give me a clue on how to get alternatives with the legacy engine?
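
For context, here is a minimal sketch of how the two runs below could be invoked from Python. It assumes the `tesseract` command-line tool is on the PATH. The helper names, the file names in the usage note, and the value `lstm_choice_mode=2` are illustrative assumptions, not necessarily the exact settings used for the results in this post.

```python
import subprocess


def tesseract_hocr_args(image_path, out_base, lang, oem, choice_mode=2):
    """Build a tesseract command line that writes <out_base>.hocr.

    choice_mode=2 is an assumed value for illustration only.
    """
    return [
        "tesseract", image_path, out_base,
        "--oem", str(oem),                        # 0 = legacy engine, 1 = LSTM engine
        "-l", lang,                               # traineddata name, e.g. chi_tra
        "-c", f"lstm_choice_mode={choice_mode}",  # request per-character alternatives
        "hocr",                                   # built-in config that enables hOCR output
    ]


def run_tesseract(args):
    """Run the command and raise CalledProcessError if tesseract fails."""
    subprocess.run(args, check=True)
```

Calling `run_tesseract(tesseract_hocr_args("page.png", "out_lstm", "chi_tra", 1))` would write `out_lstm.hocr`; passing `0` as the last argument selects the legacy engine instead. `page.png` and `out_lstm` are placeholder names.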

[image: business_tax_payment_001_shadow_out_pt_binary_42.JPG] 

*Results for the LSTM engine:*

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title></title>
  <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
  <meta name='ocr-system' content='tesseract v5.0.0-alpha.20200328' />
  <meta name='ocr-capabilities' content='ocr_page ocr_carea ocr_par ocr_line 
ocrx_word ocrp_wconf'/>
 </head>
 <body>
  <div class='ocr_page' id='page_1' title='image 
"C:\Users\YUNTSE~1\AppData\Local\Temp\tess_4yg1i978.PNG"; bbox 0 0 224 114; 
ppageno 0'>
   <div class='ocr_carea' id='block_1_1' title="bbox 35 37 154 66">
    <p class='ocr_par' id='par_1_1' lang='chi_tra' title="bbox 35 37 154 66">
     <span class='ocr_line' id='line_1_1' title="bbox 35 37 154 66; baseline 
0.017 -2; x_size 36; x_descenders 9; x_ascenders 9">
      <span class='ocrx_word' id='word_1_1' title='bbox 35 39 91 65; x_wconf 
92'>固定
       <span class='ocrx_cinfo' id='lstm_choices_1_1_1'>
        <span class='ocrx_cinfo' id='choice_1_1_1' title='x_confs 
0'>固</span></span>
       <span class='ocrx_cinfo' id='lstm_choices_1_1_2'>
        <span class='ocrx_cinfo' id='choice_1_1_2' title='x_confs 
95.486488'>定</span>
        <span class='ocrx_cinfo' id='choice_1_1_3' title='x_confs 0'>守</span>
        <span class='ocrx_cinfo' id='choice_1_1_4' title='x_confs 0'>放</span>
        <span class='ocrx_cinfo' id='choice_1_1_5' title='x_confs 0'>軍</span>
        <span class='ocrx_cinfo' id='choice_1_1_6' title='x_confs 0'>說</span>
        <span class='ocrx_cinfo' id='choice_1_1_7' title='x_confs 
0'>補</span></span>
      </span>
      <span class='ocrx_word' id='word_1_2' title='bbox 109 37 154 66; x_wconf 
96'>資產
       <span class='ocrx_cinfo' id='lstm_choices_1_2_1'>
        <span class='ocrx_cinfo' id='choice_1_2_1' title='x_confs 0'>資</span>
        <span class='ocrx_cinfo' id='choice_1_2_2' title='x_confs 0'>謂</span>
        <span class='ocrx_cinfo' id='choice_1_2_3' title='x_confs 0'>寬</span>
        <span class='ocrx_cinfo' id='choice_1_2_4' title='x_confs 0'>寅</span>
        <span class='ocrx_cinfo' id='choice_1_2_5' title='x_confs 
0'>宮</span></span>
       <span class='ocrx_cinfo' id='lstm_choices_1_2_2'>
        <span class='ocrx_cinfo' id='choice_1_2_6' title='x_confs 
96.327087'>產</span>
        <span class='ocrx_cinfo' id='choice_1_2_7' title='x_confs 0'>廬</span>
        <span class='ocrx_cinfo' id='choice_1_2_8' title='x_confs 0'>宙</span>
        <span class='ocrx_cinfo' id='choice_1_2_9' title='x_confs 0'>寬</span>
        <span class='ocrx_cinfo' id='choice_1_2_10' title='x_confs 
0'>放</span></span>
      </span>
     </span>
    </p>
   </div>
  </div>
 </body>
</html>
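
The alternatives in the LSTM output above can be pulled out of the hOCR programmatically. The following is a rough sketch using only the Python standard library. It assumes the structure seen in this output: each `lstm_choices_*` span groups `choice_*` spans whose `title` is `x_confs <value>`, and the candidates are listed best first. The class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class LstmChoiceParser(HTMLParser):
    """Collect (character, confidence) pairs from each lstm_choices_* span."""

    def __init__(self):
        super().__init__()
        self.groups = []            # one list of candidates per lstm_choices_* span
        self._pending_conf = None   # confidence of the choice_* span being read

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        attr = dict(attrs)
        span_id = attr.get("id") or ""
        if span_id.startswith("lstm_choices_"):
            self.groups.append([])
        elif span_id.startswith("choice_") and self.groups:
            # The title looks like "x_confs 95.486488" and may contain a line break.
            parts = (attr.get("title") or "").split()
            if len(parts) == 2 and parts[0] == "x_confs":
                self._pending_conf = float(parts[1])

    def handle_data(self, data):
        text = data.strip()
        if self._pending_conf is not None and text:
            self.groups[-1].append((text, self._pending_conf))
            self._pending_conf = None

    def handle_endtag(self, tag):
        if tag == "span":
            self._pending_conf = None  # an empty choice span contributes nothing


def extract_lstm_choices(hocr_text):
    """Return a list of candidate lists, in document order."""
    parser = LstmChoiceParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.groups
```

Under the best-first assumption, the second choice of a group `g` is `g[1]` whenever `len(g) > 1`. For the legacy output, which contains no `lstm_choices_*` spans, the function returns an empty list.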


*Results for the legacy engine:*

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title></title>
  <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
  <meta name='ocr-system' content='tesseract v5.0.0-alpha.20200328' />
  <meta name='ocr-capabilities' content='ocr_page ocr_carea ocr_par ocr_line 
ocrx_word ocrp_wconf'/>
 </head>
 <body>
  <div class='ocr_page' id='page_1' title='image 
"C:\Users\YUNTSE~1\AppData\Local\Temp\tess_s7_zf3u2.PNG"; bbox 0 0 224 114; 
ppageno 0'>
   <div class='ocr_carea' id='block_1_1' title="bbox 35 37 154 66">
    <p class='ocr_par' id='par_1_1' lang='business_tax_payment' title="bbox 35 
37 154 66">
     <span class='ocr_line' id='line_1_1' title="bbox 35 37 154 66; baseline 0 
-0.999; x_size 36; x_descenders 9; x_ascenders 9">
      <span class='ocrx_word' id='word_1_1' title='bbox 35 37 154 66; x_wconf 
38'>固定固定產
      </span>
     </span>
    </p>
   </div>
  </div>
 </body>
</html>
