Re: [tesseract-ocr] Re: tesseract in Windows

2019-11-14 Thread MATHANKUMAR m
Hi Entrak,
I have already done that: I used the same pre-trained data files on both
Ubuntu and Windows. But I am confused about why the results from my Windows
machine are so much worse.
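
One way to narrow down a difference like this is to record, on each machine, which Tesseract build is actually being run and whether the traineddata files are byte-for-byte identical. Below is a minimal Python sketch, assuming the `tesseract` executable is on the PATH. The names `file_digest` and `describe_install` are illustrative helpers, not part of Tesseract or pytesseract, and the tessdata directory has to be supplied by the caller.

```python
import hashlib
import shutil
import subprocess
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_install(tessdata_dir, exe="tesseract"):
    """Collect the version banner and traineddata checksums for one machine."""
    resolved = shutil.which(exe)  # None when the executable is not on PATH
    version = None
    if resolved is not None:
        # `tesseract --version` prints the version and linked libraries.
        # Some builds write it to stderr, so fall back to that stream.
        result = subprocess.run([resolved, "--version"],
                                capture_output=True, text=True)
        version = (result.stdout or result.stderr).strip()
    data = {path.name: file_digest(path)
            for path in sorted(Path(tessdata_dir).glob("*.traineddata"))}
    return {"executable": resolved, "version": version, "traineddata": data}
```

Running `describe_install(...)` with the tessdata directory of each machine and comparing the two dictionaries shows whether the version or the data files differ before looking for other causes.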

Thanks

On Thu, 14 Nov 2019 at 14:06, Entrak Entshuldiga  wrote:

> At first glance, it sounds like you've not included the traineddata-files
> for your chosen language.
>
> If you just have the standard one that comes with the installer, it needs
> to be trained.
> But you can import the pre-trained datafiles easily enough from here:
> https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
>
>
>
> On Thursday, 14 November 2019 06:08:19 UTC+1, MATHANKUMAR m wrote:
>>
>> Hi Entrak,
>> As per your suggestion, I added the command to call the tesseract exe
>> in my Python module. But I don't understand why the results differ
>> between Ubuntu 16.04 (more accurate) and Windows 7 (poor results) for the
>> same image. Is there anything else I need to do on Windows along with
>> the Tesseract installation? If so, please let me know.
>>
>> Thanks,
>> Mathan
>>
>> On Wed, 13 Nov 2019 at 13:58, Entrak Entshuldiga 
>> wrote:
>>
>>> The installer should install everything you need to run Tesseract.
>>>
>>> You may have to set the environmental path to the correct location
>>> manually, so that you can call tesseract.exe without having to type the
>>> full path every time, but no extra downloads should be needed.
>>>
>>> On Wednesday, 13 November 2019 09:02:54 UTC+1, MATHANKUMAR m wrote:

>>>> Hi,
>>>> Please share the proper installation instructions for Tesseract on
>>>> Windows 7. I installed it with UB-Mannheim's exe installer, but the
>>>> results on Ubuntu are far better than the results on Windows. If I
>>>> need Leptonica or any other related packages, please tell me how to
>>>> install them.
>>>>
>>>> Thanks,
>>>> Mathan


-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAOWu5P6ij1vvkrqZ1ZTnXNtr9x3jVEr1-adXT5CoXqYp4VFWfw%40mail.gmail.com.


[tesseract-ocr] Re: Get Character or word error rate

2019-11-14 Thread Quan Nguyen
Use the hOCR output format. The output file includes confidence values.
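
As a rough illustration: running `tesseract test_image.tiff output_text hocr` should write an hOCR file in which each word span carries an `x_wconf` value in its `title` attribute. The Python sketch below pulls those values out with a regular expression. It assumes the usual `ocrx_word` span layout with `class` before `title`, and `word_confidences` is a hypothetical helper name. Note that these are the engine's own confidence estimates, not a measured error rate.

```python
import re
from html import unescape

# Matches word spans such as:
# <span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 95'>Hello</span>
WORD_RE = re.compile(
    r"<span[^>]*class=['\"]ocrx_word['\"][^>]*"
    r"title=['\"][^'\"]*x_wconf\s+(\d+(?:\.\d+)?)[^'\"]*['\"][^>]*>(.*?)</span>",
    re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")  # strips inline tags such as <strong> or <em>


def word_confidences(hocr_text):
    """Return (word, confidence) pairs found in hOCR markup."""
    pairs = []
    for confidence, inner in WORD_RE.findall(hocr_text):
        word = unescape(TAG_RE.sub("", inner)).strip()
        if word:
            pairs.append((word, float(confidence)))
    return pairs
```

Words with low confidence are candidates for manual review, but a low value does not prove the word is wrong and a high one does not prove it is right.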

On Thursday, November 14, 2019 at 8:51:09 AM UTC-6, Mobeen Ali wrote:
>
> Hi everyone!
>
> I was wondering if there is a function or method to get the character or
> word error rate after running tesseract-ocr on an image.
>
> For example, when I run the command
>
>    - tesseract test_image.tiff output_text
>
> it would write the text to the output_text file and then return the word
> error rate or character error rate of that text.
>



[tesseract-ocr] Get Character or word error rate

2019-11-14 Thread Mobeen Ali
Hi everyone!

I was wondering if there is a function or method to get the character or
word error rate after running tesseract-ocr on an image.

For example, when I run the command

   - tesseract test_image.tiff output_text

it would write the text to the output_text file and then return the word
error rate or character error rate of that text.
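
For context, an error rate can only be measured against a reference (ground truth) transcription of the same image, so it has to be computed after the OCR run. The following is a minimal Python sketch of the usual definition, edit distance divided by reference length. The function names `edit_distance` and `error_rate` are illustrative and not part of Tesseract.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # delete an item from the reference
                current[j - 1] + 1,      # insert an item from the hypothesis
                previous[j - 1] + cost,  # substitute (or match when cost is 0)
            ))
        previous = current
    return previous[-1]


def error_rate(reference, hypothesis, unit="char"):
    """Character error rate (unit="char") or word error rate (unit="word")."""
    if unit == "word":
        ref, hyp = reference.split(), hypothesis.split()
    else:
        ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

Here `reference` would be the hand-typed correct text and `hypothesis` the contents of the output_text file.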



Re: [tesseract-ocr] tesstrain.sh hangs at Phase I: Generating Training Images, at a point and never proceed to next phase

2019-11-14 Thread Mobeen Ali
Thanks, Shree! I used the Python script of tesstrain and it worked like a
charm.

Thanks a lot



[tesseract-ocr] Re: How to identify distorted text ?

2019-11-14 Thread Entrak Entshuldiga
You need some sort of image enhancement before OCR to do that, because
Tesseract only scans the image file it is given for known patterns.
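
As one example of such an enhancement step, binarising the image with an automatically chosen threshold often helps separate text from background. The Python sketch below implements Otsu's method in plain Python on a greyscale image given as nested lists of 0-255 integers. That input format and the function names are assumptions for illustration. In practice an imaging library would normally do this, and warped or overlapping characters need more than thresholding.

```python
def otsu_threshold(pixels):
    """Pick the grey level (0-255) that maximises between-class variance."""
    histogram = [0] * 256
    for row in pixels:
        for value in row:
            histogram[value] += 1
    total = sum(histogram)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    dark_count, dark_sum = 0, 0
    for level in range(256):
        dark_count += histogram[level]
        dark_sum += level * histogram[level]
        if dark_count == 0:
            continue  # no pixels at or below this level yet
        bright_count = total - dark_count
        if bright_count == 0:
            break  # every pixel is already in the dark class
        dark_mean = dark_sum / dark_count
        bright_mean = (total_sum - dark_sum) / bright_count
        variance = dark_count * bright_count * (dark_mean - bright_mean) ** 2
        if variance > best_variance:
            best_threshold, best_variance = level, variance
    return best_threshold


def binarize(pixels):
    """Map pixels above the Otsu threshold to 255 and the rest to 0."""
    threshold = otsu_threshold(pixels)
    return [[255 if value > threshold else 0 for value in row] for row in pixels]
```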



On Monday, 11 November 2019 09:18:54 UTC+1, amo wrote:
>
> How to identify distorted text? I want to use tesseract for verification 
> code recognition.
> Thanks.
>



Re: [tesseract-ocr] Restricting OCR to not to read specific words

2019-11-14 Thread Hanny Narola

Thank you for your reply.

I have already implemented the method you suggested. I just wanted to
explore whether any other method can be used to get the desired result.

On Thursday, November 14, 2019 at 5:34:47 PM UTC+5:30, J Adam Funk wrote:
>
> Hi, 
>
> I think it might be easier for you to use regular expressions to filter
> the OCR output.  You could also use the field labels to check the
> alignment of the data.
>
> HTH, 
> Adam
>
> On 13/11/2019 06:16, Hanny Narola wrote: 
> > This is my first time working with Tesseract. I am using Python with
> > the pytesseract library.
> > My current task is to read a user form and store the user's data in a
> > database. I do not want to read the labels (First name, Last name,
> > etc.), just the values.
> > I don't know how to approach this.
> >
> > What should I write in the config?
> > Should I create a config file?
>
>



Re: [tesseract-ocr] Restricting OCR to not to read specific words

2019-11-14 Thread Adam Funk
Hi,

I think it might be easier for you to use regular expressions to filter
the OCR output.  You could also use the field labels to check the
alignment of the data.
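
A minimal Python sketch of that idea, assuming the OCR text (for example from `pytesseract.image_to_string`) has one "label value" pair per line. The label list and the `extract_values` helper are hypothetical and would need adjusting to the real form.

```python
import re

# Hypothetical field labels; adjust to match the labels printed on the form.
FIELD_LABELS = ["First name", "Last name", "Date of birth"]
CANONICAL = {label.lower(): label for label in FIELD_LABELS}

LINE_RE = re.compile(
    r"^\s*(?P<label>" + "|".join(re.escape(label) for label in FIELD_LABELS) + r")"
    r"\s*[:\-]?\s*(?P<value>.*?)\s*$",
    re.IGNORECASE,
)


def extract_values(ocr_text):
    """Return {label: value} for lines that start with a known field label."""
    values = {}
    for line in ocr_text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group("value"):
            label = CANONICAL[match.group("label").lower()]
            values[label] = match.group("value")
    return values
```

Any label in `FIELD_LABELS` that is missing from the result hints that a line was misread or misaligned, which is one way to use the labels as a check.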

HTH,
Adam






On 13/11/2019 06:16, Hanny Narola wrote:
> This is my first time working with Tesseract. I am using Python with
> the pytesseract library.
> My current task is to read a user form and store the user's data in a
> database. I do not want to read the labels (First name, Last name,
> etc.), just the values.
> I don't know how to approach this.
>
> What should I write in the config?
> Should I create a config file?
> 



[tesseract-ocr] tesseract Architecture

2019-11-14 Thread Kyungjun Lee
Hi,

I'm a newbie at Tesseract.

I would like to see a diagram of tesseract-ocr (with the LSTM model).

Could you show me the architecture of tesseract-ocr?

Thank you in advance for replying to my question.

Have a nice day~




Re: [tesseract-ocr] Re: tesseract in Windows

2019-11-14 Thread Entrak Entshuldiga
At first glance, it sounds like you've not included the traineddata-files 
for your chosen language.

If you just have the standard one that comes with the installer, it needs 
to be trained. 
But you can import the pre-trained datafiles easily enough from here: 
https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
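
To check which language data an installation actually sees, `tesseract --list-langs` can be called and its output parsed. Here is a small Python sketch, assuming the output consists of a "List of available languages" header line followed by one language code per line. The helper names are illustrative only.

```python
import shutil
import subprocess


def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.append(line)
    return langs


def installed_languages(exe="tesseract"):
    """Return installed language codes, or None if Tesseract is not on PATH."""
    resolved = shutil.which(exe)
    if resolved is None:
        return None
    result = subprocess.run([resolved, "--list-langs"],
                            capture_output=True, text=True)
    # Some builds print the list to stderr instead of stdout.
    return parse_list_langs(result.stdout or result.stderr)
```

If the language you pass with `-l` is missing from the returned list, the corresponding traineddata file is not in the tessdata directory that Tesseract is using.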



On Thursday, 14 November 2019 06:08:19 UTC+1, MATHANKUMAR m wrote:
>
> Hi Entrak,
> As per your suggestion, I added the command to call the tesseract exe
> in my Python module. But I don't understand why the results differ
> between Ubuntu 16.04 (more accurate) and Windows 7 (poor results) for the
> same image. Is there anything else I need to do on Windows along with
> the Tesseract installation? If so, please let me know.
>
> Thanks,
> Mathan
>
> On Wed, 13 Nov 2019 at 13:58, Entrak Entshuldiga wrote:
>
>> The installer should install everything you need to run Tesseract.
>>
>> You may have to set the environmental path to the correct location 
>> manually, so that you can call tesseract.exe without having to type the 
>> full path every time, but no extra downloads should be needed.
>>
>> On Wednesday, 13 November 2019 09:02:54 UTC+1, MATHANKUMAR m wrote:
>>>
>>> Hi,
>>> Please share the proper installation instructions for Tesseract on
>>> Windows 7. I installed it with UB-Mannheim's exe installer, but the
>>> results on Ubuntu are far better than the results on Windows. If I
>>> need Leptonica or any other related packages, please tell me how to
>>> install them.
>>>
>>> Thanks,
>>> Mathan
>>>
>
