Hi Art

Thanks so much for the help and also for the mail with code you sent! My 
reply is late, but that is because I also do this behind the hours as a 
hobby and I had some stuff to figure out. I tested the HOCR method you 
suggested and it indeed came up with notably better results than out of the 
box OCR multiline analysis. However, the results were still far from 
perfect and I wanted to do as least post editing as possible. In this case, 
as it is about card instructions that follow a strict syntax, it is 
important that really every character is transcribed correctly, to the last 
accent. I do not imagine that this is any different at your end where you 
do this professionally and on historical scans. How cool that you wanted to 
help me and that there is technical overlap in such diverse topics 
(historical documenst vs play cards :D)!

Because HOCR did still give me lots of post editing work that I wanted to 
avoid as much as possible (Ravensburger is launching 204 cards every 3 
months - and this is just a hobby :)) I did look further for other 
solutions and wanted to let you know what I found as you did for me. Maybe 
you will have some use for it and I would be more then happy to hear it. 
For the last year I have been looking quite heavily into the GPT 
capabilities of GPT-4 and as of last Monday GPT-4 Vision (which still is in 
preview) comes with image input-output capabilities. Under the hood I 
suppose GPT-4 was linked with DALL-E. However it be, I decided to test 
GPT-4 Vision on the body text and this yielded incredibly good results. 
Based on the image recognition and language reading capabilities it has out 
of the box, it is able to really accurately transcribe the content of the 
body text of the card. Because I can only call the vanilla GPT-4 Vision 
from the API I had to provide it a very extensive prompt telling it what to 
do with the special symbols and layout elements that were encountered in 
the image. See my custom prompt below:

"I present you an image with the body text of a Lorcana play card. "
"Please transcribe the text from the image following the below 
instructions: "
"- You should ignore all non-textual information apart from numbers, 
punctuation marks and the special graphical symbols mentioned by me below. "
"- Punctuation marks like commas, hyphens or dashes can be positioned close 
to graphical symbols. Punctuation marks are never part of a graphical 
symbol and must always be transcribed. Make sure not to miss a single 
punctuation mark. "
"- You should transcribe all numbers you find and keep them in their number 
format. "
"- Each special graphical symbol (hexagon, diamond, sunburst, ...) that you 
encounter has to be transcribed as '{s}'. "
"- If there is a black rectangular background of a text, this can never be 
considered as a symbol. Ignore it. Only keep the white text you find 
therein. "
"- Symbols can never be at the start of a text line. If you think to see a 
symbol there, ignore it."
"- Symbols are to be used singularly and not in sequence. "
"- It could be there is an artistic horizontal divider line in the image. 
Don't consider it as a symbol even if it has intricate linework. "
"- All text under the artistic horizontal divider line is a flavor text and 
has to be prefixed by '/FlavorText: ' in the transcription. "
"- '/FlavorText: ' will only be written once. Don't repeat '/FlavorText: ' 
even if there is a new textline in the flavor text. "
"- In the FlavorText every punctuation mark must be transcribed as well. 
Don't forget any comma, point or other punctuation mark."
"- Provide the transcription clearly, with no repetitions, formatting or 
explanation. "
"- Don't use your inbuilt Python funcionality. "

>> Next message

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/55e0eff3-7700-495c-ac13-fe8dde56e512n%40googlegroups.com.

Reply via email to