Hi Alex,

One quick thought: if you're still using .uzn files, they are only loaded with certain psm levels (with -psm 6, but not with -psm 3, the default). The file is loaded from <imagename_without_extension>.uzn. So if you have any .uzn files lying around, they will be applied when you pass -psm 6, but not when you leave -psm off.
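If it helps, here is a rough Python sketch for spotting leftover zone files. It assumes only the naming rule mentioned above (the image name with its extension swapped for .uzn); the function names are made up for illustration and are not part of tesseract.

```python
from pathlib import Path


def uzn_sidecar(image_path):
    """Return the path where a zone file for this image would be looked up.

    Assumption: the zone file sits next to the image, with the image's
    last extension replaced by .uzn (image.tif -> image.uzn).
    """
    return Path(image_path).with_suffix(".uzn")


def stray_uzn_files(image_paths):
    """Return the .uzn sidecar paths that actually exist on disk."""
    sidecars = (uzn_sidecar(p) for p in image_paths)
    return [s for s in sidecars if s.is_file()]
```

Running stray_uzn_files over the boards that fail and the boards that work would show whether a forgotten .uzn file lines up with the bad results.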
Nick

On Wed, Jul 09, 2014 at 04:17:47PM -0700, Alex Ryan wrote:
> Paul, I haven't gotten a chance to play around with that yet, but thanks for
> linking that, I might very well have to go that route.
>
> I am having a very confusing issue though that I'm hoping someone can shed
> some light on.
>
> I've been testing out my language traineddata on a bunch of different boards,
> and for what seems like no rhyme or reason, sometimes tesseract outputs
> perfect results and other times I get total garbage, even though the files
> it's seeing seem the same. It also changes depending on whether I have the
> "-psm 6" flag added or not. It makes sense that there would be a change, but
> I don't understand why it's changing the way that it is. (I now know that
> -psm 6 treats the image as a single uniform block of text.)
>
> Examples
>
> Here is the output when it's working how I want it to.
>
> This is the .tif file tesseract sees, which I captured via the
> "tessedit_write_images 1" config:
>
> http://i.imgur.com/uQdrEsQ.jpg
>
> Here is how it detects the characters (viewed in jTessBoxEditor) with the
> "tesseract image.tif image -psm 6 -l lang batch.nochop makebox" command, with
> the resulting output of "tesseract image.tif output -psm 6 -l lang" shown
> alongside:
>
> http://i.imgur.com/Abzq2LC.jpg
>
> It has near perfect recognition with only a couple of minor errors, the boxes
> are clearly drawn around both the letter and the score, and in the case of
> the wild card tiles it correctly detects the tile and recognizes it as a
> lowercase character (which is what I trained it to do). Remove the -psm 6
> flag and nothing at all is detected, and I get an "empty page!!" output.
>
> Now another tif file that is, as far as I can tell, functionally identical
> (grabbed via the write_images config):
>
> http://i.imgur.com/ui1u8qk.jpg
>
> This time though, character recognition is terrible and it's not recognizing
> that the letter and score parts of a tile are the same character. This is
> using the identical "tesseract image.tif image -psm 6 -l lang batch.nochop
> makebox" command, with the resulting output of "tesseract image.tif output
> -psm 6 -l lang" shown alongside:
>
> http://i.imgur.com/anqdXGk.jpg
>
> Curiously, however, if I do the same thing but this time without the -psm 6
> flag, it does a decent job (not as good as in the first example though) and
> gets most of the letters right, but now it reads the .tif from top to bottom
> and right to left. When I make a box file though, it draws it the same,
> which I don't understand because it's definitely detecting the characters
> differently. ("tesseract image.tif image -l lang batch.nochop makebox" and
> "tesseract image.tif output -l lang")
>
> http://i.imgur.com/o1Id32L.jpg
>
> I am sooo confused. What is going on? I have about 4 screens it recognizes
> perfectly and 7 or so where it's garbage, and the effect of -psm is
> identical to what is described here. I don't see any functional differences
> between them. Tile distribution doesn't seem to matter, and how much border
> I give around the board doesn't seem to matter. It just detects some and
> refuses to detect others. It never flip flops either: if it works on a
> board, it always works, and if it doesn't, it never does.
>
> Here is my traineddata file if it helps:
> http://www.idspispopd.net/fnl.traineddata
>
> Any ideas? I'm starting to go mad :)
>
> Thanks!
>
> Alex
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/6027b26d-cd8a-493f-a4a5-22609b1c00dc%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.