Re: [tesseract-ocr] Re: Dot-matrix woes

Slartybartfast Thu, 02 Nov 2023 15:36:19 -0700

Thank you! The original has much more border around it; I just cropped it 
for easier viewing here. I had already done a little pre-processing, but it 
looks like I need to do more. Interesting that scaling up improved things. 
According to one analysis, accuracy depends on character height, and by 
that measure I already had the optimum character height, but maybe things 
have changed. The original scan was done at 300 dpi. I'll try 600.
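
For anyone following along, here is a rough sketch of the pad-then-upscale 
step described in the quoted reply below. It is only an illustration: the 
function names, the border width, and the nearest-neighbour scaling are all 
assumptions, and the image is just a list of rows of grayscale values. A real 
pipeline would more likely use an image editor or imaging library with 
smoother interpolation.

```python
def pad_image(pixels, border, fill=255):
    """Return a copy of a 2-D grayscale image with a uniform border added."""
    width = len(pixels[0]) + 2 * border
    blank = [fill] * width
    padded = [blank[:] for _ in range(border)]
    for row in pixels:
        padded.append([fill] * border + list(row) + [fill] * border)
    padded.extend(blank[:] for _ in range(border))
    return padded


def upscale_nearest(pixels, factor):
    """Enlarge an image by an integer factor by repeating each pixel."""
    out = []
    for row in pixels:
        # Repeat every pixel horizontally, then repeat the widened row vertically.
        wide = [value for value in row for _ in range(factor)]
        out.extend(wide[:] for _ in range(factor))
    return out


def preprocess(pixels, border=1, factor=2):
    """Add a white margin, then scale the whole image up (hypothetical defaults)."""
    return upscale_nearest(pad_image(pixels, border), factor)
```

Calling `preprocess(image, border=20, factor=2)` would add a 20-pixel white 
margin and then double both dimensions; the border value is a guess, not what 
was actually used.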


Incidentally ... I got so frustrated I wrote my own OCR program today. Only 
took me a few hours. Much more accurate than Tesseract, though working with 
fixed-width fonts makes life a lot easier!! Just divide the image up into a 
grid, and pattern match each "cell". As I was only interested in the 
numbers, I only had 16 (hex digits) to match against.
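
A minimal sketch of the grid-and-match idea, for illustration only. This is 
not the actual program: it assumes the image has already been thresholded to 
0/1 pixels, that the text starts at the top-left corner, and that every 
character occupies a cell of known size. The function names and the template 
dictionary are hypothetical.

```python
def split_into_cells(pixels, cell_w, cell_h):
    """Cut a binary image (list of rows of 0/1) into a grid of fixed-size cells."""
    rows = len(pixels) // cell_h
    cols = len(pixels[0]) // cell_w
    grid = []
    for r in range(rows):
        band = pixels[r * cell_h:(r + 1) * cell_h]
        grid.append([[row[c * cell_w:(c + 1) * cell_w] for row in band]
                     for c in range(cols)])
    return grid


def mismatch(cell, template):
    """Count the pixels at which a cell and a template disagree."""
    return sum(a != b
               for cell_row, tmpl_row in zip(cell, template)
               for a, b in zip(cell_row, tmpl_row))


def classify(cell, templates):
    """Return the character whose template differs from the cell in the fewest pixels."""
    return min(templates, key=lambda ch: mismatch(cell, templates[ch]))


def read_grid(pixels, templates, cell_w, cell_h):
    """Classify every cell and return one string per text line."""
    return ["".join(classify(cell, templates) for cell in line)
            for line in split_into_cells(pixels, cell_w, cell_h)]
```

Here `templates` maps each of the 16 hex digits to a reference bitmap of the 
same cell size. Picking the closest template rather than requiring an exact 
match lets a few stray or missing dots through, which matters for dot-matrix 
print.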

Cheers


On Thursday, November 2, 2023 at 12:43:12 PM UTC piggy wrote:

> I added more white space around the target text by scaling the canvas to 
> 500 pixels wide, and then scaled up the whole image by a factor of 2.
>
> -230 6 5O
>
> 90 6 50
>
> 90 6 -100
> 130 6 -100
> 130 6 -150
>
> On Thu, Nov 2, 2023 at 8:35 AM La Monte H. P. Yarroll <[email protected]> 
> wrote:
>
>> I had a little success applying 2.5 pixels of blur and then thresholding 
>> at 217-255. FWIW, I used GIMP for the preprocessing. Here's what I got after 
>> just a few minutes:
>> a i @)
>>
>> -230 & 50
>> 90 6 50
>> 90 6 -100
>>
>> 130 6
>> 130 6
>>
>> ~100
>> -130
>>
>> I don't know what happened to the first column or why the last 2 lines 
>> got split the way they did.
>>
>>
>> On Wed, Nov 1, 2023 at 4:30 PM Slartybartfast <
>> [email protected]> wrote:
>>
>>> Doesn't anybody have any ideas?  :-(
>>>
>>> On Tuesday, October 24, 2023 at 5:40:20 PM UTC+1 Slartybartfast wrote:
>>>
>>>> Hi
>>>> I am a new tesseract user, and I'm really struggling to get it to 
>>>> produce any kind of sensible results, especially with numerical text. I 
>>>> have some text that looks like this:
>>>> [image: example_input.jpg]
>>>> I've read the documentation, and looked through the parameter list, and 
>>>> I added the following to the command line:
>>>> --psm 6
>>>> -c preserve_interword_spaces=1
>>>> -c textord_dotmatrix_gap=6
>>>> -c classify_bln_numeric_mode=1
>>>> -c rej_alphas_in_number_perm=1
>>>>
>>>> But I just get garbage out:
>>>>
>>>> Oo -250 6 3a
>>>> 190 & So
>>>> 190 6 -100
>>>> 1 $1290 6 ~140
>>>> 1 $130 6 ~150
>>>>
>>>> I've tried all sorts of additional image processing to try and improve 
>>>> the look of the text, but none of it works. In fact, this is the best 
>>>> output I've seen. It's usually worse. I'm really hoping someone who has 
>>>> worked with dot-matrix input can offer some magic incantation to make 
>>>> tesseract come to its senses. Thanks.
>>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/tesseract-ocr/15797f86-58c9-4e71-b316-54f663d04cbfn%40googlegroups.com
>>>
>>

