> So, my question is why does it matter if the bg is white and why the
> check in place?

It's in place because the recognizer can only recognize black-on-white
characters. It's a common programming mistake, and an occasional problem
in the input, that characters are input as white-on-black.

> I _could_ possibly set bgcheck = false, recompile and get away with
> it.

I don't think you need to recompile; you should be able to set that via
an environment variable.

So, you might ask: why is OCRopus not smart enough to do the right thing
for both kinds of characters? It's actually not hard to do, but it would
essentially mean trying each character both ways, which would mean that
OCRopus takes twice as long to complete (there are some other ways one
can do it).

> (2) A binarize'd image was provided but then
> "check_page_segmentation(seg)" crushed it right away:
> ocroscript: segmenter.lua:39: CHECK
> ./ocr-utils/ocr-segmentations.cc:275 (column > 0 && column < 32) ||
> column == 254 || column == 255
> stack traceback:
>         [C]: in function 'check_page_segmentation'
>         segmenter.lua:39: in main chunk
>         [C]: ?
>
> The binarize'd image was provided using:
>      28 input = bytearray();
>      29 iulib.read_image_gray(input,arg[1]);
>      30 binarizer:binarize(image,input);
>
> Why?

OCRopus used to be more lenient in what kinds of segmentation images it
accepted. Now it checks more carefully. However, the segmenters other
than RAST haven't been updated yet to give correct segmentation output
(since we use them rarely). Please file a bug report about this so that
we remember to fix it.

Tom
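
To illustrate the background check discussed above, here is a minimal standalone sketch in Python. It is not OCRopus's actual bgcheck code: the majority-of-pixels heuristic, the threshold of 128, and the function names (looks_white_on_black, invert, ensure_black_on_white) are all assumptions made for illustration. It shows one of the cheaper "other ways" of handling white-on-black input: decide once per image which polarity it has and invert if needed, instead of recognizing every character twice.

```python
def looks_white_on_black(image, threshold=128):
    """Heuristic (an assumption, not OCRopus's rule): the image is
    white-on-black if dark pixels outnumber light pixels."""
    pixels = [p for row in image for p in row]
    dark = sum(1 for p in pixels if p < threshold)
    light = len(pixels) - dark
    return dark > light


def invert(image, max_value=255):
    """Return a copy of a grayscale image with every pixel inverted."""
    return [[max_value - p for p in row] for row in image]


def ensure_black_on_white(image, threshold=128):
    """Invert the image only if it appears to be white-on-black."""
    if looks_white_on_black(image, threshold):
        return invert(image)
    return image


# A tiny white page with a short black stroke, and its inverse.
page = [
    [255, 255, 255, 255, 255],
    [255,   0,   0, 255, 255],
    [255, 255, 255, 255, 255],
]
inverted = invert(page)
```

Calling ensure_black_on_white(inverted) gives back the original page, while ensure_black_on_white(page) leaves the page untouched. A heuristic like this can be wrong on pages that are mostly ink (for example large dark figures), which is one reason a hard check that rejects suspicious input may be preferable to silent correction.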
