Anyway, please report it to the pytesseract project so it can be fixed; otherwise the next update will bring the problem back.
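
For reference, here is a minimal sketch of the pattern behind both suggestions in the quoted messages below (passing delete=False, and closing the file before cleanup). The helper name temp_path is made up for illustration and is not part of pytesseract. I have not confirmed that a still-open handle is the actual cause of the WinError 5 in this case, so treat it as an assumption.

```python
import os
from contextlib import contextmanager
from tempfile import NamedTemporaryFile


@contextmanager
def temp_path(prefix="tess_"):
    """Yield a temp file path that is closed before use and removed afterwards.

    Hypothetical helper for illustration only; this is not pytesseract's save().
    """
    # delete=False: closing the file does not remove it automatically.
    f = NamedTemporaryFile(prefix=prefix, delete=False)
    try:
        # Release our handle so other code (or another process) can open the path.
        f.close()
        yield f.name
    finally:
        # Remove the file ourselves; tolerate it already being gone.
        try:
            os.remove(f.name)
        except FileNotFoundError:
            pass


with temp_path() as path:
    with open(path, "w") as out:
        out.write("hello")
    existed_inside = os.path.exists(path)

print(existed_inside, os.path.exists(path))  # True False
```

The point of the design is that nothing holds the file open by the time os.remove runs, which is the situation the f.close() suggestion is also trying to reach.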
Zdenko

On Sun, 1 Mar 2020 at 18:17, Supharerk Thawillarp <[email protected]> wrote:

> After diving into pytesseract.py I found one possibly related issue in
> the NamedTemporaryFile usage.
>
> Following this Stack Overflow post (
> https://stackoverflow.com/questions/55081022/python-tempfile-with-a-context-manager-on-windows-10-leads-to-permissionerror),
> I added the delete=False argument to the NamedTemporaryFile call in
> pytesseract.py.
>
>
> @contextmanager
> # https://stackoverflow.com/questions/55081022/python-tempfile-with-a-context-manager-on-windows-10-leads-to-permissionerror
> def save(image):
>     try:
>         with NamedTemporaryFile(prefix='tess_', delete=False) as f:
>             if isinstance(image, str):
>                 yield f.name, realpath(normpath(normcase(image)))
>                 return
>
>
> It has been working since then.
>
> I will forward this thread and the issue to pytesseract.
>
> Thanks for your help.
>
>
> On Sunday, 1 March 2020 at 23:40:26 UTC+7, zdenop wrote:
>>
>> Hello,
>>
>> I am not able to reproduce the error. The errors come from here [1], where
>> pytesseract tries to clean up temporary files.
>> You should report it to the pytesseract project, as there is no option to
>> skip this code.
>> Maybe you can try to modify this part of the pytesseract code [2]:
>>
>> finally:
>>     cleanup(f.name)
>>
>> to
>>
>> finally:
>>     f.close()
>>     cleanup(f.name)
>>
>>
>> [1]
>> https://github.com/madmaze/pytesseract/blob/master/src/pytesseract.py#L131
>>
>> [2]
>> https://github.com/madmaze/pytesseract/blob/7fef19ff176bd9f837753dc4c0ebc76b16267775/src/pytesseract.py#L176
>>
>> Zdenko
>>
>>
>> On Sun, 1 Mar 2020 at 14:11, Supharerk Thawillarp <[email protected]>
>> wrote:
>>
>>> ok, it gave me WinErr5 again.
>>>
>>>
>>> PS C:\Users\Supharerk\ocr_server> pipenv run python .\test_tess.py
>>> C:\Users\SUPHAR~1\AppData\Local\Temp\tess_g9e7avw0
>>> Image shape: (1150, 835, 3)
>>> Traceback (most recent call last):
>>>   File ".\test_tess.py", line 19, in <module>
>>>     data_dict = pytesseract.image_to_data(img, output_type=Output.DICT)
>>>   File "C:\Users\Supharerk\.virtualenvs\ocr_server-jUkFWk3u\lib\site-packages\pytesseract\pytesseract.py", line 426, in image_to_data
>>>     }[output_type]()
>>>   File "C:\Users\Supharerk\.virtualenvs\ocr_server-jUkFWk3u\lib\site-packages\pytesseract\pytesseract.py", line 424, in <lambda>
>>>     Output.DICT: lambda: file_to_dict(run_and_get_output(*args), '\t', -1),
>>>   File "C:\Users\Supharerk\.virtualenvs\ocr_server-jUkFWk3u\lib\site-packages\pytesseract\pytesseract.py", line 264, in run_and_get_output
>>>     return output_file.read().decode('utf-8').strip()
>>>   File "c:\users\supharerk\appdata\local\continuum\anaconda3\lib\contextlib.py", line 119, in __exit__
>>>     next(self.gen)
>>>   File "C:\Users\Supharerk\.virtualenvs\ocr_server-jUkFWk3u\lib\site-packages\pytesseract\pytesseract.py", line 176, in save
>>>     cleanup(f.name)
>>>   File "C:\Users\Supharerk\.virtualenvs\ocr_server-jUkFWk3u\lib\site-packages\pytesseract\pytesseract.py", line 136, in cleanup
>>>     raise e
>>>   File "C:\Users\Supharerk\.virtualenvs\ocr_server-jUkFWk3u\lib\site-packages\pytesseract\pytesseract.py", line 133, in cleanup
>>>     remove(filename)
>>> PermissionError: [WinError 5] Access is denied: 'C:\\Users\\SUPHAR~1\\AppData\\Local\\Temp\\tess_69cggzq3'
>>>
>>>
>>> On Sunday, 1 March 2020 at 03:05:39 UTC+7, zdenop wrote:
>>>>
>>>> 1. Make sure you have the latest version of tesseract.
>>>> Then try this script and provide the exact, full error message:
>>>>
>>>> import tempfile
>>>>
>>>> import cv2
>>>> import pytesseract
>>>> from PIL import Image
>>>> from pytesseract import Output
>>>>
>>>> pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
>>>>
>>>> img = cv2.imread('images/invoice-sample.jpg')
>>>>
>>>> # check temp file
>>>> temp_file = tempfile.NamedTemporaryFile(prefix='tess_')
>>>> print(temp_file.name)
>>>> image = Image.fromarray(img)
>>>> image.save(temp_file.name + '.png', format='png', **image.info)
>>>> temp_file.close()
>>>>
>>>> if img.any():
>>>>     print("Image shape:", img.shape)
>>>>     data_dict = pytesseract.image_to_data(img, output_type=Output.DICT)
>>>>     n_boxes = len(data_dict['level'])
>>>>     for i in range(n_boxes):
>>>>         (x, y, w, h) = (data_dict['left'][i], data_dict['top'][i],
>>>>                         data_dict['width'][i], data_dict['height'][i])
>>>>         cv2.rectangle(img, (x, y), (x + w, y + h), (255, 125, 125), 2)
>>>>     cv2.imshow('img', img)
>>>>     cv2.waitKey(0)
>>>> else:
>>>>     print("Can not open input file")
>>>>
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> On Sat, 29 Feb 2020 at 19:04, Supharerk Thawillarp <[email protected]>
>>>> wrote:
>>>>
>>>>> Sure
>>>>>
>>>>> >>> pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
>>>>> >>> pytesseract.get_tesseract_version()
>>>>> LooseVersion ('5.0.0-alpha.20200223')
>>>>>
>>>>>
>>>>> On Sunday, 1 March 2020 at 00:21:26 UTC+7, zdenop wrote:
>>>>>>
>>>>>> This means there is a problem with pytesseract/Python permissions.
>>>>>>
>>>>>> Can you get the output of pytesseract.get_tesseract_version()?
>>>>>>
>>>>>> Zdenko
>>>>>>
>>>>>>
>>>>>> On Sat, 29 Feb 2020 at 12:10, Supharerk Thawillarp <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> No, tesseract ran successfully, with the output written to a text file.
>>>>>>>
>>>>>>> (base) PS C:\Users\Supharerk\ocr_server> & 'C:\Program Files\Tesseract-OCR\tesseract.exe' .\images\invoice-sample.jpg invoice-sample
>>>>>>> Tesseract Open Source OCR Engine v5.0.0-alpha.20200223 with Leptonica
>>>>>>>
>>>>>>>
>>>>>>> However, WinError 5 arises again when running from Python (with
>>>>>>> pipenv):
>>>>>>>
>>>>>>> (base) PS C:\Users\Supharerk\ocr_server> pipenv run python .\app2.py
>>>>>>> Traceback (most recent call last):
>>>>>>>   File ".\app2.py", line 10, in <module>
>>>>>>>     d=pytesseract.image_to_data(img,output_type=Output.DICT)
>>>>>>>   File "C:\Users\Supharerk\.virtualenvs\ocr_server-jUkFWk3u\lib\site-packages\pytesseract\pytesseract.py", line 426, in image_to_data
>>>>>>>     }[output_type]()
>>>>>>>   File "C:\Users\Supharerk\.virtualenvs\ocr_server-jUkFWk3u\lib\site-packages\pytesseract\pytesseract.py", line 424, in <lambda>
>>>>>>>     Output.DICT: lambda: file_to_dict(run_and_get_output(*args), '\t', -1),
>>>>>>>   File "C:\Users\Supharerk\.virtualenvs\ocr_server-jUkFWk3u\lib\site-packages\pytesseract\pytesseract.py", line 264, in run_and_get_output
>>>>>>>     return output_file.read().decode('utf-8').strip()
>>>>>>>   File "c:\users\supharerk\appdata\local\continuum\anaconda3\lib\contextlib.py", line 119, in __exit__
>>>>>>>     next(self.gen)
>>>>>>>   File "C:\Users\Supharerk\.virtualenvs\ocr_server-jUkFWk3u\lib\site-packages\pytesseract\pytesseract.py", line 176, in save
>>>>>>>     cleanup(f.name)
>>>>>>>   File "C:\Users\Supharerk\.virtualenvs\ocr_server-jUkFWk3u\lib\site-packages\pytesseract\pytesseract.py", line 136, in cleanup
>>>>>>>     raise e
>>>>>>>   File "C:\Users\Supharerk\.virtualenvs\ocr_server-jUkFWk3u\lib\site-packages\pytesseract\pytesseract.py", line 133, in cleanup
>>>>>>>     remove(filename)
>>>>>>> PermissionError: [WinError 5] Access is denied: 'C:\\Users\\SUPHAR~1\\AppData\\Local\\Temp\\tess_y3d570lt'
>>>>>>>
>>>>>>>
>>>>>>> On Saturday, 29 February 2020 at 16:19:41 UTC+7, zdenop wrote:
>>>>>>>>
>>>>>>>> Can you replicate the problem with command-line ("pure") tesseract? E.g.
>>>>>>>> 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe' images/invoice-sample.jpg invoice-sample
>>>>>>>>
>>>>>>>> Zdenko
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, 28 Feb 2020 at 20:31, Supharerk Thawillarp <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I'm new to tesseract and am trying to follow a tutorial on Windows 10
>>>>>>>>> using the code below:
>>>>>>>>>
>>>>>>>>> import cv2
>>>>>>>>> import pytesseract
>>>>>>>>> from pytesseract import Output
>>>>>>>>> pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> img=cv2.imread('images/invoice-sample.jpg')
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> d=pytesseract.image_to_data(img,output_type=Output.DICT)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> print(d.keys())
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The problem is, I keep getting the error "PermissionError: [WinError 5]
>>>>>>>>> Access is denied" when calling image_to_data and image_to_string on
>>>>>>>>> Windows 10.
>>>>>>>>>
>>>>>>>>> The only advice I found on Stack Overflow is to set tesseract_cmd,
>>>>>>>>> PATH and TESSDATA_PREFIX, which did not work for me. Not even running
>>>>>>>>> from an administrator cmd works.
>>>>>>>>>
>>>>>>>>> After spending a couple of hours, I found that setting permissions for
>>>>>>>>> tesseract.exe (right-click, select Properties and go to the Security
>>>>>>>>> tab) by checking Full control and Modify made it work.
>>>>>>>>>
>>>>>>>>> Hope this will help people struggling with the same problem.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [image: 1582917756731.jpg][image: 1582917788913.jpg]
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "tesseract-ocr" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to [email protected].
>>>>>>>>> To view this discussion on the web visit
>>>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/2d9f9f66-40a5-4ce9-9f14-cca48307e9f5%40googlegroups.com
>>>>>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/2d9f9f66-40a5-4ce9-9f14-cca48307e9f5%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>> .

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8x4ssEJed%2BZ0k7uuQC28qYg3AkK3jhB1ge-gLoDXEP1uA%40mail.gmail.com.

