Hello,

I would like to work with you on creating a simple traineddata file for Tesseract with jTessBoxEditor and then testing it. Can you let me know a time when we could discuss the steps?

Thanks
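For the testing part, here is a minimal sketch of how a finished traineddata file could be tried out from Python by calling the Tesseract command line. It assumes the tesseract executable is on the PATH; the language name "mylang", the folder "tessdata" and the image "sample.png" are placeholders I made up for illustration, not real files from this thread.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, lang, tessdata_dir, psm=6):
    """Build the argument list for one Tesseract CLI run.

    `lang` is the traineddata file name without the ".traineddata" suffix.
    """
    return [
        "tesseract",
        str(image_path),
        "stdout",  # print the recognized text instead of writing a file
        "--tessdata-dir", str(tessdata_dir),
        "-l", lang,
        "--psm", str(psm),
    ]


def traineddata_exists(lang, tessdata_dir):
    """Return True if <tessdata_dir>/<lang>.traineddata is present."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


def run_ocr(image_path, lang, tessdata_dir, psm=6):
    """Run Tesseract with the custom traineddata and return the text."""
    if not traineddata_exists(lang, tessdata_dir):
        raise FileNotFoundError(
            f"{lang}.traineddata not found in {tessdata_dir}"
        )
    cmd = build_tesseract_command(image_path, lang, tessdata_dir, psm)
    # check=True raises CalledProcessError if Tesseract exits with an error
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Calling run_ocr("sample.png", "mylang", "tessdata") would then print whatever the new model recognizes, which can be compared by eye against the expected text.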
On Tuesday, 26 November 2024 at 5:01 PM, Taresh Chaudhari <tareshchaudh...@gmail.com> wrote:

> Thanks Mahmoud for sharing. I did apply these techniques, but the
> results are still not good and I am still trying to solve this problem.
> Let me see how it proceeds.
>
> On Tuesday, 26 November 2024 at 00:31:29 UTC+5:30 mahmoud...@gmail.com
> wrote:
>
>> To improve the accuracy of text extraction, you can preprocess the image
>> before passing it to the OCR engine. Preprocessing techniques like
>> converting the image to grayscale, enhancing contrast, or applying filters
>> can help reduce noise and improve readability. Additionally, tweaking the
>> pytesseract settings, for example changing the --psm value, may also
>> improve the results.
>>
>> Here’s an updated version of your code with some preprocessing steps:
>>
>> import pytesseract
>> from PIL import Image, ImageEnhance, ImageFilter
>>
>> pytesseract.pytesseract.tesseract_cmd = 'C:\\Users\\M562765\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe'
>>
>> # Path to your image
>> image_path = 'C:/Users/M562765/Downloads/Unable-images/Unable/crop1.jpg'
>>
>> def extract_text_from_image(image_path):
>>     # Open the image
>>     img = Image.open(image_path)
>>
>>     # Convert the image to grayscale to improve text-background contrast
>>     img = img.convert('L')  # Convert image to grayscale
>>     img = ImageEnhance.Contrast(img).enhance(2)  # Increase contrast
>>     img = img.filter(ImageFilter.SHARPEN)  # Sharpen the image
>>
>>     # Use pytesseract to extract text
>>     # PSM 6 assumes a single uniform block of text
>>     extracted_text = pytesseract.image_to_string(img, config='--psm 6')
>>     return extracted_text.strip()
>>
>> # Extract and print text
>> text = extract_text_from_image(image_path)
>> print(f"Text extracted from {image_path}: {text}")
>>
>> On Monday, 25 November 2024 at 4:12 PM, Taresh Chaudhari <tareshc...@gmail.com>
>> wrote:
>>
>>> Attaching an image for reference.
>>>
>>> On Monday, 25 November 2024 at 15:52:27 UTC+5:30 Taresh Chaudhari wrote:
>>>
>>>> Hi,
>>>> I am trying to read the characters from an image, which has characters
>>>> with black color in the background. I am attaching the code I used for
>>>> extraction; currently it gives only partial output. Can you guide me
>>>> on how to make it accurate?
>>>>
>>>> import pytesseract
>>>> from PIL import Image
>>>>
>>>> pytesseract.pytesseract.tesseract_cmd = 'C:\\Users\\M562765\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe'
>>>>
>>>> # Paths to your images
>>>> image_paths = [
>>>>     'C:/Users/M562765/Downloads/Unable-images/Unable/crop1.jpg',
>>>> ]
>>>>
>>>> # Function to process an image and extract text
>>>> def extract_text_from_image(image_path):
>>>>     # Open the image
>>>>     img = Image.open(image_path)
>>>>
>>>>     # Use pytesseract to perform OCR
>>>>     # PSM 6 assumes a single uniform block of text
>>>>     extracted_text = pytesseract.image_to_string(img, config='--psm 6')
>>>>     return extracted_text.strip()
>>>>
>>>> # Process all images and print results
>>>> for img_path in image_paths:
>>>>     text = extract_text_from_image(img_path)
>>>>     print(f"Text extracted from {img_path}: {text}")
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-oc...@googlegroups.com.
>>> To view this discussion visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/83985355-a349-4ed7-a2a9-c938fda1a5f4n%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/83985355-a349-4ed7-a2a9-c938fda1a5f4n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .