Re: [tesseract-ocr] Re: Reading image from Rubber

محمود محمد Wed, 11 Dec 2024 05:23:08 -0800

Hello, I would like to create a simple traineddata file for Tesseract with
jTessBoxEditor, together with you, and test it. Can you let me know a time
to discuss the steps? Thanks.


On Tuesday, 26 November 2024 at 5:01 PM, Taresh Chaudhari <
[email protected]> wrote:

> Thanks, Mahmoud, for sharing. I applied these techniques, but the results
> are still not good and I am still trying to solve this problem. Let me see
> how it proceeds.
>
> On Tuesday, 26 November 2024 at 00:31:29 UTC+5:30 [email protected]
> wrote:
>
>> To improve the accuracy of text extraction, you can preprocess the image
>> before passing it to the OCR engine. Preprocessing techniques like
>> converting the image to grayscale, enhancing contrast, or applying filters
>> can help reduce noise and improve readability. Additionally, tweaking the
>> pytesseract settings like changing the --psm value may also improve the
>> results.
>>
>> Here’s an updated version of your code with some preprocessing steps:
>> import pytesseract
>> from PIL import Image, ImageEnhance, ImageFilter
>>
>> pytesseract.pytesseract.tesseract_cmd = (
>>     'C:\\Users\\M562765\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe'
>> )
>>
>> # Path to your image
>> image_path = 'C:/Users/M562765/Downloads/Unable-images/Unable/crop1.jpg'
>>
>> def extract_text_from_image(image_path):
>>     # Open the image
>>     img = Image.open(image_path)
>>
>>     # Convert the image to grayscale to improve text-background contrast
>>     img = img.convert('L')  # Convert image to grayscale
>>     img = ImageEnhance.Contrast(img).enhance(2)  # Increase contrast
>>     img = img.filter(ImageFilter.SHARPEN)  # Sharpen the image
>>
>>     # Use pytesseract to extract text (PSM 6 assumes a block of text)
>>     extracted_text = pytesseract.image_to_string(img, config='--psm 6')
>>     return extracted_text.strip()
>>
>> # Extract and print text
>> text = extract_text_from_image(image_path)
>> print(f"Text extracted from {image_path}: {text}")
>>
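
The suggestion above to change the --psm value can be tried systematically
rather than by hand. The following is a minimal sketch, not a tested fix for
this image: the helper names ocr_with_psm_sweep and sweep_image, the choice of
modes (6, 7, 11, 13), and the idea of also trying an inverted copy of the
image (in case the text is light on a dark background) are all assumptions
made for illustration.

```python
from typing import Callable, Dict, Iterable


def ocr_with_psm_sweep(
    image,
    ocr: Callable[..., str],
    psm_values: Iterable[int] = (6, 7, 11, 13),
) -> Dict[int, str]:
    """Run `ocr` once per page segmentation mode and collect the stripped text."""
    results: Dict[int, str] = {}
    for psm in psm_values:
        # --psm 6: uniform block, 7: single line, 11: sparse text, 13: raw line
        results[psm] = ocr(image, config=f"--psm {psm}").strip()
    return results


def sweep_image(image_path: str) -> None:
    """Print OCR output for the original and inverted image under each PSM."""
    # Imported here so the helper above can be exercised without Tesseract installed.
    import pytesseract
    from PIL import Image, ImageOps

    gray = ImageOps.autocontrast(Image.open(image_path).convert("L"))
    # Assumption: the text may be light on dark, so an inverted copy is tried too.
    for label, candidate in (("original", gray), ("inverted", ImageOps.invert(gray))):
        results = ocr_with_psm_sweep(candidate, pytesseract.image_to_string)
        for psm, text in results.items():
            print(f"[{label}, psm={psm}] {text!r}")
```

Calling sweep_image with the crop1.jpg path used in this thread would print
one line per image variant and mode, which makes it easier to see whether any
combination reads the characters more completely. tesseract_cmd still has to
be set as in the code above if Tesseract is not on the PATH.
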
>> On Monday, 25 November 2024 at 4:12 PM, Taresh Chaudhari <[email protected]>
>> wrote:
>>
>>> Attaching an image for reference.
>>>
>>> On Monday, 25 November 2024 at 15:52:27 UTC+5:30 Taresh Chaudhari wrote:
>>>
>>>> Hi,
>>>> I am trying to read the characters from an image in which the
>>>> characters are set against a black background. I am attaching the code
>>>> I used for extraction; currently it gives only partial output. Can you
>>>> help guide me on how to make it accurate?
>>>>
>>>>
>>>> import pytesseract
>>>> from PIL import Image
>>>> pytesseract.pytesseract.tesseract_cmd = (
>>>>     'C:\\Users\\M562765\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe'
>>>> )
>>>> # Paths to your images
>>>> image_paths = [
>>>>    'C:/Users/M562765/Downloads/Unable-images/Unable/crop1.jpg']
>>>>
>>>> # Function to process an image and extract text
>>>> def extract_text_from_image(image_path):
>>>>     # Open the image
>>>>     img = Image.open(image_path)
>>>>
>>>>     # Use pytesseract to perform OCR (PSM 6 assumes a block of text)
>>>>     extracted_text = pytesseract.image_to_string(img, config='--psm 6')
>>>>     return extracted_text.strip()
>>>>
>>>> # Process all images and print results
>>>> for img_path in image_paths:
>>>>     text = extract_text_from_image(img_path)
>>>>     print(f"Text extracted from {img_path}: {text}")
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/83985355-a349-4ed7-a2a9-c938fda1a5f4n%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/83985355-a349-4ed7-a2a9-c938fda1a5f4n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>>

