I agree with the suggestion to try template matching. I already did some
experiments with Tesseract, so I will share those here.
Previous threads have brought up issues with part numbers that mix letters
and digits when using the default English training data. The same thing is
happening here: in one of your examples, R5 -> RS and T9 -> TS.
I tried a few experiments to alter the spacing in the original image.
(1) First, I tried increasing the horizontal spacing between characters. A
little bit of extra space does seem to help; however, if I added too much,
there was a "ringing" effect where Tesseract would read in characters that
aren't there. You can see that in some cases "V" got doubled into "Vv".
(2) Next, I tried putting each character on a separate line. Here too,
there was a "ringing" effect with the letter V.
(3) Third, I tried putting each character into its own image. (This is
slower because I believe pytesseract launches a new Tesseract process each
time you call it.)
(4) Finally, I tried running all three approaches and showing the results
side by side.
For each method I had to tune the parameters a little, so it is likely that
it will still fail on some cases in your data set.
For me, it was interesting to play with the different spacing parameters
and see how Tesseract reacts.
I did not experiment much with the Page Segmentation Mode (psm) parameter.
I haven't tried the legacy engine either, which was suggested.
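For anyone who wants to experiment with those settings, they are all passed through the config string that pytesseract hands to the Tesseract command line. A minimal sketch (the helper name `make_config` is mine; `--oem 0` selects the legacy engine and requires the legacy traineddata files):

```python
# Sketch: building Tesseract config strings for psm / engine experiments.
# Note that a character whitelist only takes effect when passed with "-c";
# without the "-c" prefix Tesseract ignores it.
def make_config(psm, oem=3, whitelist=None):
    config = f"--psm {psm} --oem {oem}"
    if whitelist:
        config += f" -c tessedit_char_whitelist={whitelist}"
    return config

single_line = make_config(psm=7)  # treat the image as one text line
legacy_digits = make_config(psm=8, oem=0, whitelist="0123456789")
# Usage (needs an image and a Tesseract install):
#   text = pytesseract.image_to_string(img, config=single_line)
```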
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
# Usage: python img.py <filename.png> [mode]
# mode is optional and can be:
# 1 : expand spacing between characters
# 2 : put characters on separate lines
# 3 : put characters in separate images
# 4 : ensemble - try all three modes, and show the results
from pytesseract import image_to_string
import pytesseract
import cv2
import re
import sys
import numpy as np
filename = sys.argv[1]
mode = 4
if len(sys.argv) > 2:
    mode = int(sys.argv[2])
img = cv2.imread(filename, cv2.IMREAD_GRAYSCALE)
height, width = img.shape
# Important: remove the line from the bottom as well as the top
# The re-spacing algorithm won't work unless it can find a whitespace gap from top to bottom between characters.
# i.e., this code only works with a single line of text, with no other content.
MARGIN_TOP_FROM_BOTTOM = 41
MARGIN_BOTTOM = 5
MARGIN_LEFT = 2
MARGIN_RIGHT = 2
roi = img[height - MARGIN_TOP_FROM_BOTTOM : height - MARGIN_BOTTOM, MARGIN_LEFT : width - MARGIN_RIGHT]
import char_spacing  # Helper module; its contents are included below
'''
tesseract --help-extra
Page segmentation modes:
0 Orientation and script detection (OSD) only.
1 Automatic page segmentation with OSD.
2 Automatic page segmentation, but no OSD, or OCR. (not implemented)
3 Fully automatic page segmentation, but no OSD. (Default)
4 Assume a single column of text of variable sizes.
5 Assume a single uniform block of vertically aligned text.
6 Assume a single uniform block of text.
7 Treat the image as a single text line.
8 Treat the image as a single word.
9 Treat the image as a single word in a circle.
10 Treat the image as a single character.
11 Sparse text. Find as much text as possible in no particular order.
12 Sparse text with OSD.
13 Raw line. Treat the image as a single text line,
bypassing hacks that are Tesseract-specific.
'''
roi = cv2.resize(roi, None, fx=2, fy=2)
# Experimental
def process_horizontal(roi):
    print('Expanding horizontal space between characters.')
    psm = 7
    roi = char_spacing.expand_horizontal_gaps(roi, min_run_length=64, white_threshold=220)
    # Note: a whitelist only takes effect when passed as "-c tessedit_char_whitelist=...";
    # without the "-c" it is ignored, which is why letters still appear in the output below.
    tess_config = f"--psm {psm} --oem 3 tessedit_char_whitelist=0123456789"
    _, roi = cv2.threshold(roi, 128 + 64, 255, cv2.THRESH_BINARY)
    roi = cv2.GaussianBlur(roi, (3, 3), 0)
    text_detected = image_to_string(roi, config=tess_config)
    return text_detected, roi
def process_vertical(roi):
    print('Expanding characters onto separate lines.')
    psm = 6
    roi = char_spacing.one_character_per_line(roi, line_spacing=20, white_threshold=220)
    tess_config = f"--psm {psm} --oem 3 tessedit_char_whitelist=0123456789"
    _, roi = cv2.threshold(roi, 128 + 64, 255, cv2.THRESH_BINARY)
    roi = cv2.GaussianBlur(roi, (3, 3), 0)
    text_detected = image_to_string(roi, config=tess_config)
    text_detected = text_detected.replace('\n', ' ')  # Mode 2 only: rejoin the lines
    return text_detected, roi
def process_separate(roi):
    print('Separating character clusters into separate images')
    psm = 7  # Single text line (or 10, single character)
    images = char_spacing.one_character_per_image(roi, new_margin=8, white_threshold=220)
    tess_config = f"--psm {psm} --oem 3 tessedit_char_whitelist=0123456789"
    text_list = []
    processed_images = []
    for roi in images:
        _, roi = cv2.threshold(roi, 128 + 64, 255, cv2.THRESH_BINARY)
        roi = cv2.GaussianBlur(roi, (3, 3), 0)
        processed_images.append(roi)
        text_detected = image_to_string(roi, config=tess_config)
        text_list.append(text_detected)
    # Use a space separator for readable output; the spaces are removed later
    # before number extraction.
    text_detected = ' '.join(text_list)
    return text_detected, images[0]
def extract_numbers(text_detected):
    print()
    print(text_detected, '(before correction)')
    # Correct common letter/digit confusions from the default English model
    for wrong, right in (('I', '1'), ('i', '1'), ('l', '1'), ('L', '1'),
                         ('Z', '2'), ('S', '5'), ('s', '5'), ('G', '6'),
                         ('O', '0'), ('o', '0')):
        text_detected = text_detected.replace(wrong, right)
    print(text_detected, '(after correction)')
    # Remove spacing before finding numbers, because the preprocessing may have separated digits
    text_detected = text_detected.replace(' ', '')
    numbers = re.findall("[0-9]+", text_detected)
    print(numbers)
    return numbers
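As an aside, the letter-for-digit corrections above can also be expressed as a single translation table. A compact equivalent for reference (`digits_from` is my name for it, not part of the script):

```python
import re

# Same letter -> digit confusion map as extract_numbers uses.
CONFUSIONS = str.maketrans("IilLZSsGOo", "1111255600")

def digits_from(text):
    corrected = text.translate(CONFUSIONS)
    # Drop spaces first, since the preprocessing may have separated digits.
    return re.findall("[0-9]+", corrected.replace(" ", ""))

print(digits_from("R 1 N 3 RS T9 vio R3"))
# -> ['1', '3', '5', '9', '10', '3']  (the 'RS' and 'vio' misreads become 5 and 10)
```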
if mode == 1: text_detected, roi = process_horizontal(roi)
if mode == 2: text_detected, roi = process_vertical(roi)
if mode == 3: text_detected, roi = process_separate(roi)
if mode == 4:
    t1, _ = process_horizontal(roi)
    t2, _ = process_vertical(roi)
    t3, _ = process_separate(roi)
    n1 = extract_numbers(t1)
    n2 = extract_numbers(t2)
    n3 = extract_numbers(t3)
    print('Results:')
    print(n1)
    print(n2)
    print(n3)
if mode != 4:
    print(extract_numbers(text_detected))
TIMEOUT = 45 * 1000
cv2.imshow("roi", roi)
cv2.waitKey(TIMEOUT)
# char_spacing.py -- Experimental
import numpy as np
def expand_horizontal_gaps(bin_img, min_run_length=16, white_threshold=240):
    gaps = []
    in_gap = False
    (height, width) = bin_img.shape
    for x in range(width):
        # Look for runs of all-white columns...
        is_white = True
        for y in range(height):
            if bin_img[y][x] < white_threshold:
                is_white = False
                break
        if is_white and not in_gap:
            in_gap = True
            gap_start = x
        if in_gap and not is_white:
            gaps.append((gap_start, x))
            in_gap = False
    if in_gap:
        gaps.append((gap_start, width))
    # Now 'gaps' contains a list of all gaps...
    # How many columns each gap falls short of min_run_length:
    gap_deficits = []
    for gap in gaps:
        gap_size = gap[1] - gap[0]
        if gap_size < min_run_length:
            gap_deficits.append(min_run_length - gap_size)
        else:
            gap_deficits.append(0)
    total_deficit = sum(gap_deficits)
    gap_centers = []
    for gap in gaps:
        gap_centers.append((gap[0] + gap[1]) // 2)
    gap_centers.append(-1)  # Sentinel: never matches a column index
    gap_index = 0
    WHITE = 255
    new_img = np.zeros((height, width + total_deficit), np.uint8)
    xx = 0
    for x in range(width):
        # Insert extra white columns at the center of a too-narrow gap
        if x == gap_centers[gap_index]:
            for _ in range(gap_deficits[gap_index]):
                for y in range(height):
                    new_img[y][xx] = WHITE
                xx += 1
            gap_index += 1
        # Copy the column over regularly
        for y in range(height):
            new_img[y][xx] = bin_img[y][x]
        xx += 1
    return new_img
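If the per-pixel loops ever become a bottleneck, the same column scan vectorizes nicely with NumPy. A sketch of the same idea under the same assumptions (light background, single text line; the function name is mine):

```python
import numpy as np

def expand_gaps_vectorized(bin_img, min_run_length=16, white_threshold=240):
    # A column is a "gap" column when every pixel in it is at/above the threshold.
    white_cols = (bin_img >= white_threshold).all(axis=0)
    # Find runs of consecutive gap columns from the transitions in the mask.
    padded = np.concatenate(([False], white_cols, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[0::2], edges[1::2]
    pieces, prev = [], 0
    for s, e in zip(starts, ends):
        deficit = max(0, min_run_length - (e - s))
        if deficit:
            center = (s + e) // 2
            # Split at the gap center and pad with all-white columns.
            pieces.append(bin_img[:, prev:center])
            pieces.append(np.full((bin_img.shape[0], deficit), 255, np.uint8))
            prev = center
    pieces.append(bin_img[:, prev:])
    return np.concatenate(pieces, axis=1)
```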
def get_runs(img, white_threshold=240):
    # Return (start, end) column ranges of the non-white (character) clusters.
    runs = []
    in_run = False
    (height, width) = img.shape
    for x in range(width):
        # A column is white only if every pixel in it is above the threshold
        is_white = True
        for y in range(height):
            if img[y][x] < white_threshold:
                is_white = False
                break
        if is_white and in_run:
            in_run = False
            runs.append((run_start, x))
        if not in_run and not is_white:
            run_start = x
            in_run = True
    if in_run:
        runs.append((run_start, width))
    return runs
def one_character_per_image(img, new_margin=16, white_threshold=240):
    images = []
    # Find the margins...
    top, bottom, left, right = get_margins(img, white_threshold)
    # Crop the blank margins away
    img = img[top:img.shape[0] - bottom, left:img.shape[1] - right]
    height = len(img)
    runs = get_runs(img, white_threshold)
    for run in runs:
        x_offset = new_margin
        y_offset = new_margin
        new_width = (run[1] - run[0]) + (2 * new_margin)
        new_height = (2 * new_margin) + height
        new_img = np.ones((new_height, new_width), np.uint8) * 255
        for y in range(height):
            yy = y_offset + y
            for x in range(run[1] - run[0]):
                xx = x + x_offset
                new_img[yy][xx] = img[y][x + run[0]]
        images.append(new_img)
    return images
def one_character_per_line(img, line_spacing=20, white_threshold=240):
    # Find the margins...
    top, bottom, left, right = get_margins(img, white_threshold)
    # Crop the blank margins away
    img = img[top:img.shape[0] - bottom, left:img.shape[1] - right]
    runs = get_runs(img, white_threshold)
    # Calculate the size of the new image
    new_margin = 16
    height = len(img)
    longest_run = max(run[1] - run[0] for run in runs)  # Longest run length
    new_width = longest_run + 2 * new_margin
    new_height = (2 * new_margin) + (height * len(runs)) + (line_spacing * (len(runs) - 1))
    new_img = np.ones((new_height, new_width), np.uint8) * 255
    for row_index, run in enumerate(runs):
        x_offset = new_margin
        y_offset = new_margin + (line_spacing + height) * row_index
        for y in range(height):
            yy = y_offset + y
            for x in range(run[1] - run[0]):
                xx = x + x_offset
                new_img[yy][xx] = img[y][x + run[0]]
    return new_img
def get_margins(gray, shade):
    # Identify margins (top, bottom, left, right) on a grayscale image,
    # using the given shade as a threshold, assuming a light background.
    height, width = gray.shape
    # Top margin:
    top_margin = 0
    for y in range(height):
        blank = True
        for x in range(width):
            if gray[y][x] < shade:
                blank = False
                break
        if blank:
            top_margin += 1
        else:
            break
    # Bottom margin:
    bottom_margin = 0
    for y in range(height - 1, -1, -1):
        blank = True
        for x in range(width):
            if gray[y][x] < shade:
                blank = False
                break
        if blank:
            bottom_margin += 1
        else:
            break
    # Right margin:
    right_margin = 0
    for x in range(width - 1, -1, -1):
        blank = True
        for y in range(height):
            if gray[y][x] < shade:
                blank = False
                break
        if blank:
            right_margin += 1
        else:
            break
    # Left margin:
    left_margin = 0
    for x in range(width):
        blank = True
        for y in range(height):
            if gray[y][x] < shade:
                blank = False
                break
        if blank:
            left_margin += 1
        else:
            break
    return (top_margin, bottom_margin, left_margin, right_margin)
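The four margin scans above can also be reduced to a couple of NumPy reductions. A sketch of the same idea (the function name is mine; an entirely blank image is not handled):

```python
import numpy as np

def get_margins_np(gray, shade):
    # Rows/columns containing any pixel darker than `shade` count as "ink".
    ink_rows = (gray < shade).any(axis=1)
    ink_cols = (gray < shade).any(axis=0)
    top = int(np.argmax(ink_rows))           # first ink row from the top
    bottom = int(np.argmax(ink_rows[::-1]))  # first ink row from the bottom
    left = int(np.argmax(ink_cols))
    right = int(np.argmax(ink_cols[::-1]))
    return (top, bottom, left, right)
```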
The transcript below shows the three methods run in succession, and then all
three results together.
>python img.py 2017-03-26_SecondPie.png
Expanding horizontal space between characters.
Expanding characters onto separate lines.
Separating character clusters into separate images
N 1 Vv N 2 Vv 2 N 2 Vv 2 T R Vv 3 T 3 R 3 R 1 N 8 R2 R 1 T 6 R2 T 2 T 1 (before correction)
N 1 Vv N 2 Vv 2 N 2 Vv 2 T R Vv 3 T 3 R 3 R 1 N 8 R2 R 1 T 6 R2 T 2 T 1 (after correction)
['1', '2', '2', '2', '2', '3', '3', '3', '1', '8', '2', '1', '6', '2', '2', '1']
N 1 Vv N 2 V2 N 2 V2 T R V3 T3 R3 R 1 N8 R2 R 1 T6 R2 T2 T1 (before correction)
N 1 Vv N 2 V2 N 2 V2 T R V3 T3 R3 R 1 N8 R2 R 1 T6 R2 T2 T1 (after correction)
['1', '2', '2', '2', '2', '3', '3', '3', '1', '8', '2', '1', '6', '2', '2', '1']
N 1 Vv N 2 V2 N 2 V2 T R V3 T3 R3 R 1 N8 R2 R 1 T6 R2 T2 T1 (before correction)
N 1 Vv N 2 V2 N 2 V2 T R V3 T3 R3 R 1 N8 R2 R 1 T6 R2 T2 T1 (after correction)
['1', '2', '2', '2', '2', '3', '3', '3', '1', '8', '2', '1', '6', '2', '2', '1']
Results:
['1', '2', '2', '2', '2', '3', '3', '3', '1', '8', '2', '1', '6', '2', '2', '1']
['1', '2', '2', '2', '2', '3', '3', '3', '1', '8', '2', '1', '6', '2', '2', '1']
['1', '2', '2', '2', '2', '3', '3', '3', '1', '8', '2', '1', '6', '2', '2', '1']
===
>python img.py 2007-04-12_SecondPie.png
Expanding horizontal space between characters.
Expanding characters onto separate lines.
Separating character clusters into separate images
R 1 N Vv T N 3 R 5 N 2 Vv 5 N 2 R4 R 1 N 3 T9 N 2 R4 R 1 N 3 Vv 1 0 R 3 (before correction)
R 1 N Vv T N 3 R 5 N 2 Vv 5 N 2 R4 R 1 N 3 T9 N 2 R4 R 1 N 3 Vv 1 0 R 3 (after correction)
['1', '3', '5', '2', '5', '2', '4', '1', '3', '9', '2', '4', '1', '3', '10', '3']
R 1 N Vv T N 3 RS N 2 V5 N 2 R4 R 1 N 3 T9 N 2 R4 R 1 N 3 vio R3 (before correction)
R 1 N Vv T N 3 R5 N 2 V5 N 2 R4 R 1 N 3 T9 N 2 R4 R 1 N 3 v10 R3 (after correction)
['1', '3', '5', '2', '5', '2', '4', '1', '3', '9', '2', '4', '1', '3', '10', '3']
R 1 N Vv T N 3 RS N 2 VS N 2 R4 R 1 N 3 T9 N 2 R4 R 1 N 3 vio R3 (before correction)
R 1 N Vv T N 3 R5 N 2 V5 N 2 R4 R 1 N 3 T9 N 2 R4 R 1 N 3 v10 R3 (after correction)
['1', '3', '5', '2', '5', '2', '4', '1', '3', '9', '2', '4', '1', '3', '10', '3']
Results:
['1', '3', '5', '2', '5', '2', '4', '1', '3', '9', '2', '4', '1', '3', '10', '3']
['1', '3', '5', '2', '5', '2', '4', '1', '3', '9', '2', '4', '1', '3', '10', '3']
['1', '3', '5', '2', '5', '2', '4', '1', '3', '9', '2', '4', '1', '3', '10', '3']