1. Make sure you have the latest version of tesseract.
2. Then try this script and post the exact, full error message:
import tempfile

import cv2
import pytesseract
from PIL import Image
from pytesseract import Output

pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'

img = cv2.imread('images/invoice-sample.jpg')
if img is None:
    # cv2.imread returns None when the file cannot be read
    print("Can not open input file")
else:
    print("Image shape:", img.shape)

    # check temp file: can Python create and write files in the temp directory?
    temp_file = tempfile.NamedTemporaryFile(prefix='tess_')
    print(temp_file.name)
    image = Image.fromarray(img)
    image.save(temp_file.name + '.png', format='png', **image.info)
    temp_file.close()

    data_dict = pytesseract.image_to_data(img, output_type=Output.DICT)
    n_boxes = len(data_dict['level'])
    for i in range(n_boxes):
        (x, y, w, h) = (data_dict['left'][i], data_dict['top'][i],
                        data_dict['width'][i], data_dict['height'][i])
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 125, 125), 2)
    cv2.imshow('img', img)
    cv2.waitKey(0)

Zdenko
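
The traceback quoted further down fails in pytesseract's cleanup step, when the temporary tess_ file is removed. A narrower check, using only the standard library, is sketched here. It is an illustration and not pytesseract's own code: the helper name check_temp_cleanup and its return value are hypothetical, and it only assumes that the temp file is created with the tess_ prefix and deleted after it has been closed, as the traceback suggests.

```python
import os
import tempfile


def check_temp_cleanup(directory=None, prefix="tess_"):
    """Create, write, close and delete a file in the temp directory.

    Hypothetical helper. Returns (ok, path, error); ok is True when the
    file could be removed again.
    """
    directory = directory or tempfile.gettempdir()
    # delete=False: we remove the file ourselves, after it has been closed
    with tempfile.NamedTemporaryFile(prefix=prefix, dir=directory,
                                     delete=False) as f:
        f.write(b"test")
        path = f.name
    try:
        os.remove(path)
    except OSError as exc:  # PermissionError is a subclass of OSError
        return False, path, exc
    return True, path, None


if __name__ == "__main__":
    ok, path, error = check_temp_cleanup()
    print("temp dir:", tempfile.gettempdir())
    print("removed OK:" if ok else "could not remove:", path, error or "")
```

If this reports that the file could not be removed when run through the same interpreter (for example with pipenv run python), the problem is likely in the temp directory permissions or in another program holding the file, rather than in tesseract itself.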
On Sat, 29 Feb 2020 at 19:04, Supharerk Thawillarp <[email protected]>
wrote:
> Sure
>
> >>> pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
> >>> pytesseract.get_tesseract_version()
> LooseVersion ('5.0.0-alpha.20200223')
>
>
> On Sunday, 1 March 2020 at 00:21:26 UTC+7, zdenop wrote:
>>
>> This means there is a problem with pytesseract/Python permissions.
>>
>> Can you post the output of pytesseract.get_tesseract_version()?
>>
>> Zdenko
>>
>>
>> On Sat, 29 Feb 2020 at 12:10, Supharerk Thawillarp <[email protected]>
>> wrote:
>>
>>> No, tesseract runs successfully and writes its output to a text file.
>>>
>>> (base) PS C:\Users\Supharerk\ocr_server> & 'C:\Program Files\Tesseract-OCR\tesseract.exe' .\images\invoice-sample.jpg invoice-sample
>>> Tesseract Open Source OCR Engine v5.0.0-alpha.20200223 with Leptonica
>>>
>>>
>>>
>>> However, WinError 5 arises again when running from Python (with
>>> pipenv):
>>> (base) PS C:\Users\Supharerk\ocr_server> pipenv run python .\app2.py
>>> Traceback (most recent call last):
>>>   File ".\app2.py", line 10, in <module>
>>>     d=pytesseract.image_to_data(img,output_type=Output.DICT)
>>>   File "C:\Users\Supharerk\.virtualenvs\ocr_server-jUkFWk3u\lib\site-packages\pytesseract\pytesseract.py", line 426, in image_to_data
>>>     }[output_type]()
>>>   File "C:\Users\Supharerk\.virtualenvs\ocr_server-jUkFWk3u\lib\site-packages\pytesseract\pytesseract.py", line 424, in <lambda>
>>>     Output.DICT: lambda: file_to_dict(run_and_get_output(*args), '\t', -1),
>>>   File "C:\Users\Supharerk\.virtualenvs\ocr_server-jUkFWk3u\lib\site-packages\pytesseract\pytesseract.py", line 264, in run_and_get_output
>>>     return output_file.read().decode('utf-8').strip()
>>>   File "c:\users\supharerk\appdata\local\continuum\anaconda3\lib\contextlib.py", line 119, in __exit__
>>>     next(self.gen)
>>>   File "C:\Users\Supharerk\.virtualenvs\ocr_server-jUkFWk3u\lib\site-packages\pytesseract\pytesseract.py", line 176, in save
>>>     cleanup(f.name)
>>>   File "C:\Users\Supharerk\.virtualenvs\ocr_server-jUkFWk3u\lib\site-packages\pytesseract\pytesseract.py", line 136, in cleanup
>>>     raise e
>>>   File "C:\Users\Supharerk\.virtualenvs\ocr_server-jUkFWk3u\lib\site-packages\pytesseract\pytesseract.py", line 133, in cleanup
>>>     remove(filename)
>>> PermissionError: [WinError 5] Access is denied: 'C:\\Users\\SUPHAR~1\\AppData\\Local\\Temp\\tess_y3d570lt'
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Saturday, 29 February 2020 at 16:19:41 UTC+7, zdenop wrote:
>>>>
>>>> Can you replicate the problem with the command-line ("pure") tesseract? E.g.:
>>>> 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe' images/invoice-sample.jpg invoice-sample
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> On Fri, 28 Feb 2020 at 20:31, Supharerk Thawillarp <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>> I'm new to tesseract and am trying to follow a tutorial on Windows 10
>>>>> using the code below:
>>>>>
>>>>> import cv2
>>>>> import pytesseract
>>>>> from pytesseract import Output
>>>>>
>>>>> pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
>>>>>
>>>>> img = cv2.imread('images/invoice-sample.jpg')
>>>>> d = pytesseract.image_to_data(img, output_type=Output.DICT)
>>>>> # keys() must be called; print(d.keys) only prints the bound method
>>>>> print(d.keys())
>>>>>
>>>>> The problem is that I keep getting PermissionError: [WinError 5]
>>>>> Access is denied when calling image_to_data and image_to_string on
>>>>> Windows 10.
>>>>>
>>>>> The only advice I found on Stack Overflow is to set tesseract_cmd, PATH
>>>>> and TESSDATA_PREFIX, which did not work for me. Even running from an
>>>>> administrator cmd prompt did not help.
>>>>>
>>>>> After spending a couple of hours, I found that changing the permissions
>>>>> on tesseract.exe (right-click, select Properties, and go to the Security
>>>>> tab) and checking Full control and Modify makes it work.
>>>>>
>>>>> I hope this helps other people struggling with the same problem.
>>>>>
>>>>>
>>>>> [image: 1582917756731.jpg][image: 1582917788913.jpg]
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/2d9f9f66-40a5-4ce9-9f14-cca48307e9f5%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/2d9f9f66-40a5-4ce9-9f14-cca48307e9f5%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>