Wow, thanks, it will take me a while to parse this but it sounds very promising.

art

From: [email protected] <[email protected]> On Behalf 
Of Paulus Present
Sent: Monday, November 20, 2023 4:56 AM
To: tesseract-ocr <[email protected]>
Subject: Re: [tesseract-ocr] Poor results of Tesseract performing a play card 
evaluation

You don't often get email from 
[email protected]<mailto:[email protected]>. Learn why this is 
important<https://aka.ms/LearnAboutSenderIdentification>
Previous message >>

Now, I suspect that in future development of the OpenAI API and their models it 
will become possible to query a custom GPT version which could be pretrained 
manually by simply coversing with it about a sample text you provide and 
telling it what it transcribed wrong and how to correct. I already tested this 
in the browser interface and my findings are that if you present GPT-4 in the 
browser just 1 image which you ask to transcribe it does a fine job. You then 
proceed by poinying out it's errors and ask it to correct thereby helping GPT-4 
to extend it's knowledge base in that conversation. You then feed it a 2nd text 
and do the same process. After a certain set of images it will transcribe 
perfectly from the first attempt, simply because it has acquired the skills to 
read the documentation based on your specific steering and instructions. When 
you would then be able to query this model in the API, your prompt would simply 
need to be an image and it would know what to do and result you the transcribed 
text back perfectly. For now I still need the long prompt ;).

I thought this info might be useful to you, but can imagine that you are well 
aware of this already seen as you are already in this field.

I have attached my code and the docs you need to run it. All paths in the code 
need to be changed of course to the locations where you will put the source 
files.
Also you need your own OpenAI API key which you can get here 
https://help.openai.com/en/articles/4936850-where-do-i-find-my-api-key (It is a 
paid service so you need to add min $5 on your account)

Some additional info:
- The script uses a combination of OCR, classic image recognition and GPT-4 
Vision to get all different datapoints. Where OCR or image recognition sufficed 
I applied this because a deterministic procedure seems preferably when 
sufficient.
- The special symbols in the text were found with classical image recognition 
and put in a dictionary based on location in the body text. I then used this 
dict to replace all symbol placeholders in the GPT-4 transcription with the 
actual symbols from the image recognition dict. This is overly complex, but was 
the only way to get the accuracy I wanted and being able to only prompt vanilla 
GPT-4 Vision. When you will be able to query your own custom trained GPT-4 
Vision the replacement step will not be necessary as with training, it can 
learn to recognize the symbols itself. I have tested this in the browser 
interface and this is the case. When I correct a symbol transcription once it 
remembers this for future transcriptions in the same chat.
- The script part for GPT-4 Vision already implements a batching method, but as 
batching is not yet allowed on the OpenAI API side, the batch size is set to 
'1'. However in the browser interface you can upload up to 10 images in 1 
prompt, so I suspect this will become available via the API sometime in the 
future. It suffices then to increase the BATCH_size on line 489 to start using 
this option.

So, this was a very long explanation and I am sorry for that. I am however very 
enthousiastic about the results and you surely helped me along. Thanks again! 
Also curious to see how you would use this in your field :)

Let me know what you think! :)

Kind regards
Paulus

>> Attachments
--
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to 
[email protected]<mailto:[email protected]>.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/1b5385ca-2607-4dd3-9fad-4aa3dc6cbc79n%40googlegroups.com<https://groups.google.com/d/msgid/tesseract-ocr/1b5385ca-2607-4dd3-9fad-4aa3dc6cbc79n%40googlegroups.com?utm_medium=email&utm_source=footer>.

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/YQBPR0101MB9902CD72783569A365696A0CDCB4A%40YQBPR0101MB9902.CANPRD01.PROD.OUTLOOK.COM.

Reply via email to