Hi Alex,

One quick thought: if you're still using .uzn files, they are only loaded 
at certain psm levels (they are with -psm 6, but not with -psm 3, the 
default). The zone file is loaded from <imagename_without_extension>.uzn. 
So if you have any .uzn files lying around, they will be applied when you 
pass -psm 6, but not when you leave -psm off.
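
In case it helps to check for strays, here is a minimal sketch in Python 
that lists which of your images have a same-named .uzn file next to them. 
It only relies on the naming rule above; the function name 
find_uzn_sidecars is my own invention, not anything from tesseract.

```python
from pathlib import Path


def find_uzn_sidecars(image_paths):
    """Return (image, uzn) pairs for images with a same-named .uzn file.

    Assumes the naming rule described above: the zone file is
    <imagename_without_extension>.uzn in the same directory as the image.
    """
    hits = []
    for image in map(Path, image_paths):
        uzn = image.with_suffix(".uzn")  # e.g. board1.tif -> board1.uzn
        if uzn.is_file():
            hits.append((image, uzn))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_uzn.py board1.tif board2.tif ...
    for image, uzn in find_uzn_sidecars(sys.argv[1:]):
        print(f"{image}: zone file {uzn} is present")
```

If any of the boards that fail with -psm 6 show up in that list, moving the 
.uzn file out of the way and rerunning would be a cheap test.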


On Wed, Jul 09, 2014 at 04:17:47PM -0700, Alex Ryan wrote:
> Paul, I haven't gotten a chance to play around with that yet, but thanks for
> linking that; I might very well have to go that route.
> I am having a very confusing issue, though, that I'm hoping someone can shed
> some light on.
> I've been testing out my language traineddata on a bunch of different boards,
> and with no apparent rhyme or reason tesseract sometimes outputs perfect
> results and other times total garbage, even though the files it's seeing seem
> the same. It also changes depending on whether I add the "-psm 6" flag.
> It makes sense that there would be a change, but I don't understand why it's
> changing the way that it is. (I now know that -psm 6 treats the image as a
> single uniform block of text.)
> Examples
> Here is output when it's working how I want it to.
> This is the .tif file tesseract sees, which I captured via the
> "tessedit_write_images 1" config:
> http://i.imgur.com/uQdrEsQ.jpg
> Here is how it detects the characters (viewed in jTessBoxEditor) with the
> "tesseract image.tif image -psm 6 -l lang batch.nochop makebox" command, with
> the resulting output of "tesseract image.tif output -psm 6 -l lang" shown
> alongside:
> http://i.imgur.com/Abzq2LC.jpg
> It has near-perfect recognition with only a couple of minor errors, the boxes
> are clearly drawn around both the letter and the score, and in the case of the
> wild card tiles it correctly detects the tile and recognizes it as a lowercase
> character (which is what I trained it to do). If I remove the -psm 6 flag,
> nothing at all is detected and I get an "empty page!!" output.
> Now another tif file that is, as far as I can tell, functionally identical
> (grabbed via the write_images config):
> http://i.imgur.com/ui1u8qk.jpg
> This time, though, character recognition is terrible and it's not recognizing
> that the letter and score parts of a tile are the same character. This uses the
> identical "tesseract image.tif image -psm 6 -l lang batch.nochop makebox"
> command, with the resulting output of "tesseract image.tif output -psm 6
> -l lang" shown alongside:
> http://i.imgur.com/anqdXGk.jpg
> Curiously, however, if I do the same thing without the -psm 6 flag, it does
> a decent job (not as good as in the first example, though) and gets most of
> the letters right, but now it reads the .tif from top to bottom and right to
> left. When I make a box file, though, it draws the boxes the same, which I
> don't understand because it's definitely detecting the characters
> differently. ("tesseract image.tif image -l lang batch.nochop makebox" and
> "tesseract image.tif output -l lang")
> http://i.imgur.com/o1Id32L.jpg
> I am sooo confused. What is going on? I have about 4 screens it recognizes
> perfectly and 7 or so that come out as garbage, and the effect of -psm on
> them is the same as described here. I don't see any functional differences
> between them. Tile distribution doesn't seem to matter, and how much border
> I leave doesn't seem to matter. It just detects some and refuses to detect
> others. It never flip-flops either: if it works on a board, it always works,
> and if it doesn't, it never does.
> Here is my traineddata file if it helps:
> http://www.idspispopd.net/fnl.traineddata
> Any ideas? I'm starting to go mad :)
> thanks!
> Alex
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email
> to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit https://groups.google.com/d/msgid/
> tesseract-ocr/6027b26d-cd8a-493f-a4a5-22609b1c00dc%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
