[tesseract-ocr] Re: Tesseract page segmentation algorithm?

2019-09-30 Thread Balachandar Suresh
Hi,
If you are still looking into this, here you go:
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35094.pdf



On Saturday, September 22, 2018 at 11:23:03 PM UTC+5:30, chulwoo pack wrote:
>
> Hi everyone,
>
> Does anyone know what method or algorithm is used in Tesseract's fully
> automated page segmentation?
> I am specifically interested in the segmentation step rather than other
> pre-processing steps such as deskewing or noise removal. I have tried hard
> to find documentation that specifies the sequence of its processing, or
> whether the algorithm is based on a particular paper.
>
> Thank you.
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/31273629-e11b-41fe-890a-e7ca2103e92e%40googlegroups.com.


Re: [tesseract-ocr] Re: Tesseract Recognition using psm13 for characters like "t", "i", "j"

2019-09-30 Thread Purushotham Rao Eravalli
Thank you very much. I will look into the versions and get back to you.




Re: [tesseract-ocr] Re: Tesseract Recognition using psm13 for characters like "t", "i", "j"

2019-09-30 Thread Zdenko Podobny
 >tesseract front2-201-6.jpg -
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 148
Aimanam, Pulikkuttissery.

>tesseract front2-476-4.jpg -
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 170
S/O: ltvari Lal, Village patti kuki.


>tesseract -v
tesseract 5.0.0-alpha-456-g021f
 leptonica-1.79.0 (Sep 16 2019, 13:25:21) [MSC v.1916 LIB Release x64]
  libgif 5.1.2 : libjpeg 6b (libjpeg-turbo 2.0.2) : libpng 1.6.36 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.2 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.3.3 zlib/1.2.11 liblzma/5.2.4 libzstd/1.3.8

IMO 4.1 should produce the same result. I use the model from tessdata_best.

Zdenko
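
For anyone scripting runs like the ones above, here is a minimal sketch of building the same kind of command line from Python. It is only an illustration under stated assumptions: the helper names (`build_tesseract_cmd`, `run_ocr`), the default `--psm 13` (taken from the thread subject, not from the commands shown), the language `eng`, and the `--dpi` value in the usage comment are not from the original messages. Passing `--dpi` explicitly may avoid the "Invalid resolution 0 dpi" warning when the image carries no resolution metadata.

```python
import subprocess
from typing import List, Optional


def build_tesseract_cmd(image: str,
                        psm: int = 13,
                        dpi: Optional[int] = None,
                        lang: str = "eng",
                        tessdata_dir: Optional[str] = None) -> List[str]:
    """Build an argument list for the tesseract CLI; "-" sends text to stdout."""
    cmd = ["tesseract", image, "-", "--psm", str(psm), "-l", lang]
    if dpi is not None:
        # Assumed value supplied by the caller; the image itself reported 0 dpi.
        cmd += ["--dpi", str(dpi)]
    if tessdata_dir is not None:
        # Hypothetical path to a directory holding the tessdata_best models.
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


def run_ocr(cmd: List[str]) -> str:
    """Run the command and return the recognised text (requires tesseract on PATH)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Usage (not executed here, since it needs tesseract and the image file):
# text = run_ocr(build_tesseract_cmd("front2-201-6.jpg", dpi=150))
```

Keeping command construction separate from execution makes it easy to log the exact command when comparing results between versions such as 4.1 and 5.0.0-alpha.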


On Mon, Sep 30, 2019 at 15:12, Purushotham Rao Eravalli wrote:

> [image: 8aa8ea34feb16d5ee596e05fffe4c81f.jpg_front2-201-6.jpg]
>
> [image: 5e07a43c069f76fcb85505f8dcda1721.jpg_front2-476-4.jpg]


Re: [tesseract-ocr] Tesseract Recognition using psm13 for characters like "t", "i", "j"

2019-09-30 Thread Purushotham Rao Eravalli
Hi,
Please look at these images.


Thanks



Re: [tesseract-ocr] Tesseract Recognition using psm13 for characters like "t", "i", "j"

2019-09-30 Thread Zdenko Podobny
Can you provide test images?
I do not think there is any need to retrain Tesseract for a common font
like Arial.

Zdenko




[tesseract-ocr] Tesseract Recognition using psm13 for characters like "t", "i", "j"

2019-09-30 Thread Purushotham Rao Eravalli
Hi,

I retrained Tesseract with Calibri and Arial. While testing on cropped
text images, I am facing an issue where the characters "t", "i", and "j" are
all recognised as "l", and sometimes "e" is recognised as "a". Does anyone
have a solution for this?


Thanks,
Purushotham
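
One way to quantify confusions like the ones described above is to compare each OCR result with its known ground-truth text and count which characters were substituted for which. The sketch below is only an illustration: the function name `char_confusions` and the sample strings are hypothetical, and it uses Python's standard `difflib` alignment, which is an approximation and only counts substitutions where the differing spans have equal length.

```python
from collections import Counter
from difflib import SequenceMatcher


def char_confusions(truth: str, ocr: str) -> Counter:
    """Count (expected, recognised) character pairs for equal-length substitutions."""
    pairs = Counter()
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only pair characters one-to-one when both spans have the same length;
        # insertions, deletions, and uneven replacements are skipped.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            pairs.update(zip(truth[i1:i2], ocr[j1:j2]))
    return pairs


# Hypothetical ground truth and OCR output for one cropped image.
confusions = char_confusions("kit", "kll")
for (expected, got), count in confusions.most_common():
    print(f"{expected!r} read as {got!r}: {count}")
```

Summing these counters over a whole test set would show whether "t", "i", and "j" really collapse to "l" across the board or only for particular images or sizes.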
