Paul, I haven't had a chance to play around with that yet, but thanks for
linking it. I might very well have to go that route.

I am having a very confusing issue, though, that I'm hoping someone can
shed some light on.

I've been testing my language traineddata on a bunch of different
boards, and with no apparent rhyme or reason Tesseract sometimes produces
perfect output and other times total garbage, even though the files it is
seeing look the same. The result also changes depending on whether I add
the "-psm 6" flag. It makes sense that there would be a change, but I
don't understand why it changes the way it does. (I now know that
-psm 6 treats the image as a single uniform block of text.)

Examples

Here is the output when it's working the way I want it to.

This is the .tif file Tesseract sees, which I captured via the
"tessedit_write_images 1" config:

http://i.imgur.com/uQdrEsQ.jpg

Here is how it detects the characters (viewed in jTessBoxEditor) with the
"tesseract image.tif image -psm 6 -l lang batch.nochop makebox" command,
with the resulting output of "tesseract image.tif output -psm 6 -l lang"
shown alongside:

http://i.imgur.com/Abzq2LC.jpg

Recognition is near perfect, with only a couple of minor errors. The
boxes are clearly drawn around both the letter and the score, and in the
case of the wild card tiles it correctly detects the tile and recognizes
it as a lowercase character (which is what I trained it to do). If I
remove the -psm 6 flag, nothing at all is detected and I get an
"empty page!!" output.

Now here is another .tif file that is, as far as I can tell, functionally
identical (also grabbed via the write_images config):

http://i.imgur.com/ui1u8qk.jpg

This time, though, character recognition is terrible, and it doesn't
recognize that the letter and score parts of a tile belong to the same
character. This is using the identical
"tesseract image.tif image -psm 6 -l lang batch.nochop makebox" command,
with the resulting output of "tesseract image.tif output -psm 6 -l lang"
shown alongside:

http://i.imgur.com/anqdXGk.jpg

Curiously, however, if I do the same thing without the -psm 6 flag, it
does a decent job (not as good as in the first example, though) and gets
most of the letters right, but now it reads the .tif from top to bottom
and right to left. When I make a box file, though, the boxes are drawn the
same as before, which I don't understand, because it is definitely
detecting the characters differently.
("tesseract image.tif image -l lang batch.nochop makebox" and "tesseract
image.tif output -l lang")
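
One way to check whether the two box files really are the same would be to
diff them instead of comparing them by eye. Below is a minimal Python
sketch, assuming the usual box-file layout of one
"symbol left bottom right top page" entry per line. The names Box,
parse_box_text and compare_boxes are placeholders of my own, not anything
Tesseract provides.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int


def parse_box_text(text: str) -> List[Box]:
    """Parse box-file text, one 'symbol left bottom right top page' per line."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        left, bottom, right, top = (int(p) for p in parts[1:5])
        boxes.append(Box(parts[0], left, bottom, right, top))
    return boxes


def compare_boxes(a: List[Box], b: List[Box]) -> Tuple[bool, list]:
    """Compare two runs.

    Returns (same_geometry, label_changes):
      same_geometry  True when both runs list the same rectangles in the
                     same order.
      label_changes  (coords, symbol_in_a, symbol_in_b) for every rectangle
                     present in both runs but labelled differently.
    """
    same_geometry = [box[1:] for box in a] == [box[1:] for box in b]
    symbol_by_coords = {box[1:]: box.symbol for box in a}
    label_changes = [
        (box[1:], symbol_by_coords[box[1:]], box.symbol)
        for box in b
        if box[1:] in symbol_by_coords
        and symbol_by_coords[box[1:]] != box.symbol
    ]
    return same_geometry, label_changes
```

To use it, the two makebox runs would need different output base names
(say "with_psm" and "without_psm", both hypothetical) so the second run
does not overwrite the first .box file. Each file can then be read and
passed through parse_box_text. If same_geometry comes back True while
label_changes is non-empty, the segmentation is identical and only the
classification differs.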

http://i.imgur.com/o1Id32L.jpg

I am so confused. What is going on? I have about 4 screens that it
recognizes perfectly and 7 or so where the output is garbage, and the
effect of -psm 6 on each is the same as described here. I don't see any
functional differences between them. Tile distribution doesn't seem to
matter, and how much border I leave around the board doesn't seem to
matter. It just detects some and refuses to detect others. It never
flip-flops either: if it works on a board, it always works, and if it
doesn't, it never does.
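
For anyone who wants to reproduce this across all the boards, here is a
rough Python sketch that runs each image both with and without -psm 6 and
collects the text output. It assumes the tesseract binary is on the PATH
and uses the same "-psm" spelling as the commands above (newer Tesseract
releases spell it "--psm"). The function names and the "_psm6" and
"_default" output suffixes are my own placeholders.

```python
import subprocess
from pathlib import Path
from typing import Dict, List, Optional


def build_command(image: str, out_base: str, lang: str,
                  psm: Optional[int] = None) -> List[str]:
    """Build the tesseract argument list, adding -psm only when requested."""
    cmd = ["tesseract", image, out_base]
    if psm is not None:
        cmd += ["-psm", str(psm)]
    cmd += ["-l", lang]
    return cmd


def ocr_both_modes(image: str, lang: str) -> Dict[str, str]:
    """Run one image with -psm 6 and with the default mode.

    Returns the recognized text per mode, or an empty string when
    tesseract wrote no output file.
    """
    results = {}
    stem = Path(image).stem
    for label, psm in (("psm6", 6), ("default", None)):
        out_base = f"{stem}_{label}"
        out_file = Path(out_base + ".txt")
        if out_file.exists():
            out_file.unlink()  # avoid reading a stale result from an earlier run
        subprocess.run(build_command(image, out_base, lang, psm), check=False)
        results[label] = (
            out_file.read_text(encoding="utf-8") if out_file.exists() else ""
        )
    return results
```

Calling ocr_both_modes on every board image and printing the two results
side by side should make it easier to see which boards fall into the
"works" group and which into the "garbage" group.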

Here is my traineddata file, if it helps:
http://www.idspispopd.net/fnl.traineddata

Any ideas? I'm starting to go mad :)

Thanks!

Alex
