
Now, I suspect that in future development of the OpenAI API and their 
models it will become possible to query a custom GPT which you could 
pre-train manually by simply conversing with it about a sample text you 
provide and telling it what it transcribed wrong and how to correct it. I 
have already tested this in the browser interface, and my finding is that 
if you present GPT-4 in the browser with just one image and ask it to 
transcribe it, it does a fine job. You then proceed by pointing out its 
errors and asking it to correct them, thereby helping GPT-4 extend its 
knowledge base within that conversation. You then feed it a second text and 
repeat the process. After a certain number of images it will transcribe 
perfectly on the first attempt, simply because it has acquired the skills 
to read the documents based on your specific steering and instructions. If 
you could then query this model through the API, your prompt would only 
need to contain an image; it would know what to do and return the 
transcribed text perfectly. For now I still need the long prompt ;).

I thought this info might be useful to you, but I can imagine you are 
well aware of it already, seeing as you are already in this field.

I have attached my code and the docs you need to run it. All paths in the 
code need to be changed, of course, to the locations where you put the 
source files.
You also need your own OpenAI API key, which you can get here: 
https://help.openai.com/en/articles/4936850-where-do-i-find-my-api-key (it 
is a paid service, so you need to add a minimum of $5 to your account).
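One tip on the key: rather than pasting it into the script, a common pattern (and what the official openai Python package does by default) is to read it from the OPENAI_API_KEY environment variable, so it never ends up in the code you share:

```python
import os

def load_api_key():
    """Read the API key from the environment; the official openai
    package looks for this same variable on its own."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY before running the script")
    return key
```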

Some additional info:
- The script uses a combination of OCR, classic image recognition and GPT-4 
Vision to get all the different data points. Where OCR or image recognition 
sufficed I used those, because a deterministic procedure seems preferable 
when it is sufficient.
- The special symbols in the text were found with classical image 
recognition and put in a dictionary keyed by their location in the body 
text. I then used this dict to replace all symbol placeholders in the GPT-4 
transcription with the actual symbols from the image recognition dict. This 
is overly complex, but it was the only way to get the accuracy I wanted 
while only prompting vanilla GPT-4 Vision. Once you can query your own 
custom-trained GPT-4 Vision, the replacement step will no longer be 
necessary, since with training it can learn to recognize the symbols 
itself. I have tested this in the browser interface and that is indeed the 
case: when I correct a symbol transcription once, it remembers this for 
future transcriptions in the same chat.
- The GPT-4 Vision part of the script already implements a batching method, 
but as batching is not yet allowed on the OpenAI API side, the batch size 
is set to 1. However, in the browser interface you can upload up to 10 
images in one prompt, so I suspect this will become available via the API 
sometime in the future. It then suffices to increase BATCH_SIZE on line 
489 to start using this option.
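To give an idea of the symbol replacement step mentioned above, the mechanics boil down to something like this — a simplified sketch, since the placeholder token and function name here are illustrative and the real script keys the dict on the symbol's location in the body text:

```python
def restore_symbols(transcription, symbols_in_order, placeholder="[SYM]"):
    """Replace each placeholder the model emitted with the symbol that
    image recognition found at the corresponding position in the text."""
    parts = transcription.split(placeholder)
    if len(parts) - 1 != len(symbols_in_order):
        raise ValueError("placeholder count does not match detected symbols")
    out = [parts[0]]
    for symbol, tail in zip(symbols_in_order, parts[1:]):
        out.append(symbol)   # the symbol recovered by image recognition
        out.append(tail)     # the transcription text that follows it
    return "".join(out)
```

The length check matters: if GPT-4 drops or invents a placeholder, you want the script to fail loudly rather than shift every symbol by one position.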
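As for the batching remark: the chunking itself is trivial, so once multi-image prompts work over the API nothing else has to change. Roughly (illustrative — the real constant sits on line 489 of the attached script):

```python
BATCH_SIZE = 1  # bump this once the API accepts several images per request

def batches(items, size=BATCH_SIZE):
    """Yield consecutive slices of `items`, `size` elements at a time;
    the last batch may be shorter."""
    for i in range(0, len(items), size):
        yield items[i:i + size]
```

Each batch then becomes one request, with all of its images attached to a single prompt.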

So, this was a very long explanation, and I am sorry for that. I am, 
however, very enthusiastic about the results, and you certainly helped me 
along. Thanks again! I am also curious to see how you would use this in 
your field :)

Let me know what you think! :)

Kind regards
Paulus

>> Attachments

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
