[tesseract-ocr] Re: Macron’s recognition in Tesseract (āĀēĒīĪōŌūŪ)

2020-04-28 Thread leo vince
I am using tesseract-ocr-lat, but no macrons are showing up, only grave and
circumflex accents...
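
When comparing language models on this, it may help to count which macron
letters actually survive in the OCR output. A small sketch using only the
standard library; `macron_letters` is a hypothetical helper name, not part of
Tesseract or pytesseract.

```python
import unicodedata
from collections import Counter


def macron_letters(text):
    """Count letters that carry a macron in the given text."""
    counts = Counter()
    # NFC folds "base letter + combining macron (U+0304)" into one code point,
    # so both spellings of the same letter are counted together.
    for ch in unicodedata.normalize("NFC", text):
        is_letter = unicodedata.category(ch).startswith("L")
        if is_letter and "MACRON" in unicodedata.name(ch, ""):
            counts[ch] += 1
    return counts
```

Running this on the text returned for the same image with different language
settings would show whether any of them emits macrons at all.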

On Friday, January 20, 2017 at 12:21:14 AM UTC-7, alter...@gmail.com wrote:
>
> Dear all,
> I frequently use Tesseract (3.04) and it’s great. 
> Still, I can’t find a way to get Tesseract to recognize macrons (āĀēĒīĪōŌūŪ).
> There was a discussion here about it 5 years ago, but at the time there
> wasn’t much of a solution.
> Things may have changed since then, and I’m wondering if somebody would
> have some hints.
> Macrons are used, among other things, when doing recognition of Japanese
> transcribed in the Latin alphabet (rōmaji).
> Thanks in advance for all possible ideas.
> For now, using fra or deu as one of the languages, I get ô or ö…
> Best,
> Nicolas
>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/be78cf8e-7a9e-444a-9220-b6ef150a4e28%40googlegroups.com.


Re: [tesseract-ocr] Tessaract not able to output detected text

2020-04-28 Thread payel roy
Hi Zdenko,

Thanks for your email. I have already tried multiple combinations of
different parameters, but I am still not able to get text from the image.
Attached is my pre-processing code, which I run before using Tesseract.
I am still unable to get any text. Please help.

On Tue, 28 Apr 2020 at 23:57, Zdenko Podobny wrote:

> https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html
>
> Zdenko
>
>
> On Tue, 28 Apr 2020 at 20:26, payel roy wrote:
>
>> Hi Team,
>>
>> I am new to Tesseract. The code snippet follows. When I run it, I can't
>> get a result back from Tesseract for the detected text. Please help.
>>
>> #!/usr/bin/python
>>
>> import cv2
>> import pytesseract
>> import sys
>> from PIL import Image
>>
>> filename=sys.argv[1]
>>
>> print(pytesseract.image_to_string(Image.open(filename)))
>>
>>
>> Both of the above images are detected by the Amazon Rekognition system
>> with an 80% confidence score. Could you please help me get this working
>> with Tesseract?
>>
>> Thanks
>>

#!/usr/bin/python

import os
import sys

import cv2
import numpy as np


# get grayscale image
def get_grayscale(image):
    return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)


# noise removal
def remove_noise(image):
    return cv2.medianBlur(image, 5)


# thresholding (Otsu picks the threshold automatically)
def thresholding(image):
    return cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]


# dilation
def dilate(image):
    kernel = np.ones((5, 5), np.uint8)
    return cv2.dilate(image, kernel, iterations=1)


# erosion
def erode(image):
    kernel = np.ones((5, 5), np.uint8)
    return cv2.erode(image, kernel, iterations=1)


# opening - erosion followed by dilation
def opening(image):
    kernel = np.ones((5, 5), np.uint8)
    return cv2.morphologyEx(image, cv2.MORPH_OPEN, kernel)


# canny edge detection
def canny(image):
    return cv2.Canny(image, 100, 200)


# skew correction
def deskew(image):
    coords = np.column_stack(np.where(image > 0))
    angle = cv2.minAreaRect(coords)[-1]
    if angle < -45:
        angle = -(90 + angle)
    else:
        angle = -angle
    (h, w) = image.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    rotated = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC,
                             borderMode=cv2.BORDER_REPLICATE)
    return rotated


# template matching
def match_template(image, template):
    return cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)


if __name__ == "__main__":
    filename = sys.argv[1]
    image = cv2.imread(filename)

    gray = get_grayscale(image)
    thresh = thresholding(gray)
    # Distinct names, so the results do not shadow the functions above.
    opened = opening(gray)
    edges = canny(gray)
    # deskewed = deskew(edges)

    # Prefix the file name only, so input paths with directories still work.
    directory, basename = os.path.split(filename)
    output_filename = os.path.join(directory, "pre-" + basename)
    cv2.imwrite(output_filename, edges)
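
One thing that may be worth checking in the script above: it writes the Canny
edge map, which keeps only the outlines of the characters, while the Otsu
result in `thresh` is computed but never used. A minimal sketch, assuming the
binarized image is what should be passed to Tesseract. The helper names
`tesseract_config` and `ocr_binarized` are hypothetical, and `--psm 6` (a
single uniform block of text) is only an assumed starting point, not a known
fix for these images.

```python
def tesseract_config(psm=6, oem=None):
    """Build the option string that pytesseract forwards to the tesseract CLI."""
    parts = ["--psm", str(psm)]
    if oem is not None:
        parts += ["--oem", str(oem)]
    return " ".join(parts)


def ocr_binarized(path, psm=6):
    """Grayscale + Otsu threshold, then OCR the black-and-white image."""
    # Imported here so tesseract_config() can be used without OpenCV installed.
    import cv2
    import pytesseract

    image = cv2.imread(path)
    if image is None:  # cv2.imread returns None instead of raising
        raise FileNotFoundError(path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    return pytesseract.image_to_string(binary, config=tesseract_config(psm))
```

Calling `print(ocr_binarized(sys.argv[1]))` and trying a few `psm` values
would show whether the edge detection step is what removes the text.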


Re: [tesseract-ocr] Tessaract not able to output detected text

2020-04-28 Thread Zdenko Podobny
https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html

Zdenko


On Tue, 28 Apr 2020 at 20:26, payel roy wrote:

> Hi Team,
>
> I am new to Tesseract. The code snippet follows. When I run it, I can't
> get a result back from Tesseract for the detected text. Please help.
>
> #!/usr/bin/python
>
> import cv2
> import pytesseract
> import sys
> from PIL import Image
>
> filename=sys.argv[1]
>
> print(pytesseract.image_to_string(Image.open(filename)))
>
>
> Both of the above images are detected by the Amazon Rekognition system
> with an 80% confidence score. Could you please help me get this working
> with Tesseract?
>
> Thanks
>
>



[tesseract-ocr] What does each column represent in the output of image_to_data method of pytesseract?

2020-04-28 Thread durga sai
I was trying to extract text using pytesseract. Calling image_to_data gives a
DataFrame as output, and I want to know the meaning of each column in it. I
could not find a description of those columns anywhere. I worked out the
meaning of some columns, but not all.
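
For reference, image_to_data passes through Tesseract's TSV output, which has
twelve columns. The sketch below lists them and shows one way to pull out the
word rows. It parses the TSV text directly, so it runs without Tesseract;
`COLUMN_MEANINGS` and `words_from_tsv` are hypothetical names introduced only
for this illustration.

```python
import csv
import io

# The twelve columns of Tesseract's TSV output, in order.
COLUMN_MEANINGS = {
    "level": "1 = page, 2 = block, 3 = paragraph, 4 = line, 5 = word",
    "page_num": "page the row belongs to",
    "block_num": "block number within the page",
    "par_num": "paragraph number within the block",
    "line_num": "line number within the paragraph",
    "word_num": "word number within the line",
    "left": "x of the bounding box's left edge, in pixels",
    "top": "y of the bounding box's top edge, in pixels",
    "width": "bounding box width, in pixels",
    "height": "bounding box height, in pixels",
    "conf": "word confidence from 0 to 100; -1 on rows that are not words",
    "text": "recognized text; empty on rows that are not words",
}


def words_from_tsv(tsv_text, min_conf=0.0):
    """Return (text, conf, (left, top, width, height)) for each word row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        if int(row["level"]) != 5:  # keep word rows, skip layout rows
            continue
        conf = float(row["conf"])
        text = (row["text"] or "").strip()
        if not text or conf < min_conf:
            continue
        box = tuple(int(row[k]) for k in ("left", "top", "width", "height"))
        words.append((text, conf, box))
    return words
```

With pytesseract, the input would be the string returned by
`pytesseract.image_to_data(image)`; passing
`output_type=pytesseract.Output.DATAFRAME` gives the same columns as a
pandas DataFrame.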



[tesseract-ocr] Re: Engineering drawings OCR

2020-04-28 Thread Piyush Chandra
Hi,
 
First of all, please make sure you have a good-quality image; check this link
for more info.

If you still don't get the required result, then it is suggested to train
Tesseract on that particular font. And yes, training helps improve
text detection. (Just try to fine-tune an existing trained data model.)

BR\ Piyush 

On Tuesday, 28 April 2020 10:44:05 UTC+5:30, pranaya mhatre wrote:
>
> Hi,
>
> I am using Tesseract v4.1.0-bibtag19 on Windows 10. I am extracting text
> from engineering drawings made in AutoCAD, and the images are clear, but I
> am unable to extract all the text from the drawings and am also getting
> some garbage text.
>
> Is it required to train Tesseract for the fonts used in engineering
> drawings? The fonts are Times New Roman, Romans, Simplex, and Arial.
> Does Tesseract training also help with text detection?
>
> Please help me.
>
> Thank you.
>
> Regards,
> Pranaya Mhatre
>
