[tesseract-ocr] Where to find the Tesseract.dll for Tesseract OCR version v5.0.0.

2021-04-21 Thread Sharp Subbu
Dear Friends,

We have looked for the Tesseract.dll for Tesseract OCR v5.0.0 in the
Tesseract GitHub repository () without success.
Please share the Tesseract.dll for v5.0.0 if you have it, or the steps to
build this DLL from the Tesseract GitHub source code.

Thanks in advance.

Regards,
Subramanyam

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/7b810d08-f52d-496b-a74b-dfa088f38be9n%40googlegroups.com.


Re: [tesseract-ocr] Re: tessedit_create_boxfile condensed like boxaGetBox

2021-04-21 Thread Zdenko Podobny
Use the TSV output, but you will still need to parse it to get line information.
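
A minimal sketch of that parsing step, assuming the TSV was produced with something like `tesseract image.png - tsv` and has the usual header row (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text). The helper name `lines_from_tsv` is hypothetical and not part of Tesseract; it groups the word rows (level 5) by their page/block/paragraph/line numbers and reports one bounding box, mean confidence, and text per line.

```python
import csv
import io
from collections import OrderedDict


def lines_from_tsv(tsv_text):
    """Group the word rows of Tesseract TSV output into text lines.

    Returns one dict per line with keys x, y, w, h, conf, text.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    lines = OrderedDict()
    for row in reader:
        text = (row.get("text") or "").strip()
        # Level 5 rows are words; skip structural rows and empty words.
        if row["level"] != "5" or not text:
            continue
        key = (row["page_num"], row["block_num"], row["par_num"], row["line_num"])
        left, top = int(row["left"]), int(row["top"])
        right, bottom = left + int(row["width"]), top + int(row["height"])
        entry = lines.setdefault(
            key, {"box": [left, top, right, bottom], "words": [], "confs": []})
        box = entry["box"]
        # Grow the line box to the union of its word boxes.
        box[0], box[1] = min(box[0], left), min(box[1], top)
        box[2], box[3] = max(box[2], right), max(box[3], bottom)
        entry["words"].append(text)
        entry["confs"].append(float(row["conf"]))

    result = []
    for entry in lines.values():
        left, top, right, bottom = entry["box"]
        result.append({
            "x": left, "y": top, "w": right - left, "h": bottom - top,
            "conf": sum(entry["confs"]) / len(entry["confs"]),
            "text": " ".join(entry["words"]),
        })
    return result
```

Each returned dict can then be printed in the same `Box[i]: x=..., y=..., w=..., h=...` style as the API loop. The mean of the word confidences is only an approximation of what `MeanTextConf()` reports.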

Zdenko





Re: [tesseract-ocr] detect decimal point in amount with psm 11

2021-04-21 Thread Zdenko Podobny
   1. The result I posted was obtained from the image you provided.
   2. I suggest you try a different OEM (OCR engine mode).
   3. As far as I know, invoice digitizers use different parameters for
   parsing numbers.
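
One possible workaround on the caller's side, sketched under loud assumptions: that psm 11 returns the integer part and the cents as separate words on roughly the same row, and that only the decimal point itself is lost. The helper `merge_amounts` and its `y_tol` tolerance are hypothetical, not a Tesseract or pytesseract feature; the function regroups words by their `top` coordinate and rejoins a trailing two-digit fragment with a dot.

```python
import re


def merge_amounts(words, y_tol=5):
    """Rejoin amounts that were split into '<integer part>' and '<cents>'.

    words: iterable of (left, top, text) tuples in pixel coordinates.
    Returns one string per row, ordered top to bottom.
    """
    rows = []  # each entry: [reference_top, [(left, text), ...]]
    for left, top, text in sorted(words, key=lambda w: w[1]):
        if rows and abs(top - rows[-1][0]) <= y_tol:
            rows[-1][1].append((left, text))  # same row as the previous word
        else:
            rows.append([top, [(left, text)]])  # start a new row

    amounts = []
    for _, items in rows:
        # Read each row left to right.
        joined = " ".join(text for _, text in sorted(items))
        # "250,941 00" -> "250,941.00": a digit, a space, then exactly two digits.
        joined = re.sub(r"(\d) (\d{2})$", r"\1.\2", joined)
        amounts.append(joined)
    return amounts
```

The `(left, top, text)` triples could be built from `pytesseract.image_to_data(..., output_type=pytesseract.Output.DICT)`, which returns `left`, `top`, and `text` lists. The tolerance would need tuning to the font size of the form.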


Zdenko





Re: [tesseract-ocr] detect decimal point in amount with psm 11

2021-04-21 Thread Kumar Rajwani
Hi Zdenop, as I said, I know psm 6 works better for numbers, but it cannot
pick up all the text in the image, whereas psm 11 does better. That is why I
want to use psm 11; the wrong amounts are the only problem I have with it.
Can you tell me how I can achieve the same result as yours with psm 11?
Thanks




Re: [tesseract-ocr] detect decimal point in amount with psm 11

2021-04-21 Thread Zdenko Podobny
Try better config parameters, e.g.:

$ tesseract download.png - --psm 6 --oem 0
will produce:
$ 250,941.00
$ -75,282.00
$ 175,659.00
$ -15,072 00
$ 2,860.00
$ 0.00
$ 163,447.00

The legacy engine could be better for numbers.
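
For pytesseract users, the same options can be passed through the `config` string. This is a minimal sketch; `build_config` and `ocr_amounts` are hypothetical helper names, not part of pytesseract. As far as I know, `--oem 0` only works when the installed traineddata file contains the legacy model, which the tessdata_fast and tessdata_best files do not.

```python
def build_config(psm, oem=None, **variables):
    """Build a Tesseract config string such as '--psm 6 --oem 0'.

    Keyword arguments become '-c name=value' options.
    """
    parts = [f"-c {name}={value}" for name, value in variables.items()]
    parts.append(f"--psm {psm}")
    if oem is not None:
        parts.append(f"--oem {oem}")
    return " ".join(parts)


def ocr_amounts(image, psm=6, oem=0):
    """Run OCR on a PIL image with the given page segmentation and engine mode."""
    import pytesseract  # imported lazily so build_config works without it

    return pytesseract.image_to_string(
        image, lang="eng", config=build_config(psm, oem))
```

`build_config(11, tessedit_do_invert=0)` reproduces the config string from the original question, so the two settings can be compared on the same crop.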

Zdenko





[tesseract-ocr] Re: tessedit_create_boxfile condensed like boxaGetBox

2021-04-21 Thread Baris Unsal
I want the opposite: to get RIL_TEXTLINE-like output by passing an
argument to the tesseract command.




[tesseract-ocr] Re: tessedit_create_boxfile condensed like boxaGetBox

2021-04-21 Thread Quan Nguyen
I think it would need to operate at RIL_SYMBOL level, not RIL_TEXTLINE.




Re: [tesseract-ocr] tessedit_create_boxfile condensed like boxaGetBox

2021-04-21 Thread Zdenko Podobny
Hello,

it is unclear what you are doing and what you want to do:

   - you wrote that you want individual chars, but you request lines from the
   API (RIL_TEXTLINE)
   - then you wrote "Is there any way to combine individual boxes to print
   like API", so what do you want to combine?

It would be better if you provided the input images and the desired output.

Zdenko





[tesseract-ocr] tessedit_create_boxfile condensed like boxaGetBox

2021-04-21 Thread Baris Unsal
Hi, when I pass the tessedit_create_boxfile 1 argument to tesseract, it
outputs the location of each individual character. But when I use the API like this:

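```
// Get the bounding boxes of all text lines on the page.
Boxa* boxes = api->GetComponentImages(tesseract::RIL_TEXTLINE, true, NULL, NULL);
for (int i = 0; i < boxes->n; i++) {
  BOX* box = boxaGetBox(boxes, i, L_CLONE);
  // Restrict recognition to this line, then read its text and confidence.
  api->SetRectangle(box->x, box->y, box->w, box->h);
  char* outText = api->GetUTF8Text();
  int conf = api->MeanTextConf();
  fprintf(stdout, "Box[%d]: x=%d, y=%d, w=%d, h=%d, confidence: %d, text: %s",
          i, box->x, box->y, box->w, box->h, conf, outText);
  boxDestroy(&box);  // release the cloned box (boxDestroy takes a BOX**)
  delete[] outText;
}
boxaDestroy(&boxes);  // release the box array itself
```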
it outputs the whole line, like this:
Box[1]: x=36, y=84, w=246, h=14, confidence: 44, text: #Spor #siyaset Fanket FIliskiler

Is there any way to combine the individual boxes so that the command-line
output looks like the API output? Thanks in advance.

### Environment

* **Tesseract Version**:
tesseract 4.1.1-rc2-25-g9707
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.2) : libpng 1.6.36 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.3.3 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 liblz4/1.8.3 libzstd/1.3.8

* **Platform**:
Linux pardus 4.19.0-13-amd64 #1 SMP Debian 4.19.160-2 (2020-11-28) x86_64 GNU/Linux



[tesseract-ocr] detect decimal point in amount with psm 11

2021-04-21 Thread Kumar Rajwani
Hey,
I am using Tesseract to identify amounts in my forms; see the image below
for a sample. With psm 6 I get the amounts correctly, including the decimals,
but with psm 11 I get the following output. I have to use psm 11 because it
identifies more text in my images than psm 6 does.
250,941
00
00
-75,282
175,659
00
-15,072
00
2,860
00
00
163,447
00
The code I am using:
print(pytesseract.image_to_string(image.crop((2000, 1570, 2500, 2000)),
                                  lang="eng",
                                  config="-c tessedit_do_invert=0 --psm 11").replace("\n\n", "\n"))

Is there any change I can make to get the decimal point with psm 11?
