Paul, I haven't gotten a chance to play around with that yet, but thanks for linking it. I might very well have to go that route.
I am having a very confusing issue, though, that I'm hoping someone can shed some light on. I've been testing my language traineddata on a bunch of different boards, and with no apparent rhyme or reason tesseract sometimes outputs perfect results and other times total garbage, even though the files it's seeing look the same. The result also changes depending on whether I add the "-psm 6" flag. It makes sense that there would be a change (I now know that -psm 6 treats the image as a single uniform block of text), but I don't understand why it's changing the way that it is.

Examples

Here is the output when it's working how I want it to. This is the .tif file tesseract sees, which I captured via the "tessedit_write_images 1" config:
http://i.imgur.com/uQdrEsQ.jpg

Here is how it detects the characters (viewed in jTessBoxEditor) with the "tesseract image.tif image -psm 6 -l lang batch.nochop makebox" command, with the resulting output of "tesseract image.tif output -psm 6 -l lang" shown alongside:
http://i.imgur.com/Abzq2LC.jpg

It has near-perfect recognition with only a couple of minor errors. The boxes are clearly drawn around both the letter and the score, and for the wild card tiles it correctly detects the tile and recognizes it as a lowercase character (which is what I trained it to do). If I remove the -psm 6 flag, nothing at all is detected and I get an "empty page!!" output.

Now here is another .tif file that is, as far as I can tell, functionally identical (also grabbed via the write_images config):
http://i.imgur.com/ui1u8qk.jpg

This time, though, character recognition is terrible, and it's not recognizing that the letter and score parts of a tile are the same character.
Using the identical "tesseract image.tif image -psm 6 -l lang batch.nochop makebox" command, with the resulting output of "tesseract image.tif output -psm 6 -l lang" shown alongside:
http://i.imgur.com/anqdXGk.jpg

Curiously, if I do the same thing without the -psm 6 flag, it does a decent job (not as good as in the first example, though) and gets most of the letters right, but now it reads the .tif from top to bottom and right to left. When I make a box file, though, it draws the boxes the same, which I don't understand, because it's definitely detecting the characters differently ("tesseract image.tif image -l lang batch.nochop makebox" and "tesseract image.tif output -l lang"):
http://i.imgur.com/o1Id32L.jpg

I am so confused. What is going on? I have about 4 screens that it recognizes perfectly and 7 or so that come out as garbage, and the effect of the -psm flag on each is identical to what I described here. I don't see any functional differences between them. Tile distribution doesn't seem to matter, and how much border I leave around the board doesn't seem to matter. It just detects some and refuses to detect others. It never flip-flops either: if it works on a board, it always works, and if it doesn't, it never does.

Here is my traineddata file if it helps:
http://www.idspispopd.net/fnl.traineddata

Any ideas? I'm starting to go mad :)

Thanks!
Alex
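P.S. For anyone who wants to reproduce this, the comparison boils down to four runs per image: box file and plain text, each with and without -psm 6. Below is a rough shell sketch that only prints those four commands. The function name compare_cmds, the "_psm6"/"_default" output tags, and the example file name are placeholders made up for illustration; the tesseract arguments themselves are the ones quoted in this post.

```shell
#!/bin/sh
# Sketch only: prints the four tesseract invocations compared in this post
# for a single image. Nothing is executed here.
# compare_cmds and the _psm6/_default output tags are hypothetical names.

compare_cmds() {
    image="$1"          # input image, e.g. board1.tif
    lang="$2"           # traineddata name passed to -l
    base="${image%.*}"  # image name without its extension

    for tag in psm6 default; do
        # only the psm6 pass gets the page segmentation flag
        if [ "$tag" = psm6 ]; then opts=" -psm 6"; else opts=""; fi

        # box file run: shows how the characters are segmented
        echo "tesseract $image ${base}_$tag$opts -l $lang batch.nochop makebox"
        # plain run: shows what text is actually recognized
        echo "tesseract $image ${base}_$tag$opts -l $lang"
    done
}
```

Calling "compare_cmds board1.tif fnl" prints the four command lines, and piping that into sh would run them, assuming tesseract is on the PATH, fnl.traineddata is installed, and the file name contains no spaces (the sketch does no quoting). Tagging the output names keeps the two modes from overwriting each other's results.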