On Thu, Apr 19, 2012 at 10:39 PM, xdhmoore <[email protected]> wrote:
> I am having exactly the same issue. I am trying to train based on
> some very simple Czech sentences arranged plainly on black and white.
> It seems to not be recognizing the periods when creating the box file,
> and it is throwing this error whenever I try to add a box around a
> period.
>

Please post examples. A picture is worth a thousand words.

> On Apr 1, 5:18 pm, Overv <[email protected]> wrote:
> > Let me start by providing my data:
> >
> > math.tudelft.tif: http://puu.sh/nt5Q
> > math.tudelft.box: http://puu.sh/nt66
> >
> > I made my box file by editing the output from tesseract with cowboxer
> > from the Add-Ins page. I try to train it using this console command on
> > Windows 7 64-bit with Tesseract 3.0.1:
> >
> > tesseract math.tudelft.tif math.tudelft box.train
> >
> > The output:
> >
> > APPLY_BOXES: boxfile line 10/3 ((2605,1801),(2653,1869)): FAILURE! Couldn't find a matching blob
> > APPLY_BOXES: boxfile line 12/( ((2614,1650),(2646,1768)): FAILURE! Couldn't find a matching blob
> > APPLY_BOXES: boxfile line 14/+ ((2573,1527),(2684,1640)): FAILURE! Couldn't find a matching blob
> > APPLY_BOXES: boxfile line 16/x ((2583,1415),(2676,1498)): FAILURE! Couldn't find a matching blob
> > APPLY_BOXES: boxfile line 18/x ((2586,1290),(2673,1371)): FAILURE! Couldn't find a matching blob
> > APPLY_BOXES: boxfile line 19/o ((2589,1163),(2669,1246)): FAILURE! Couldn't find a matching blob
> > APPLY_BOXES: boxfile line 24/s ((2598,784),(2662,871)): FAILURE! Couldn't find a matching blob
> > ...
> >
> > I tried making the boxes bigger and smaller, but nothing except for
> > removing them helps. What is wrong with them?
> >

First of all: I am not aware of a version 3.0.1. There were 3.00,
3.00.1 (Windows only), 3.01, and 3.02 (currently in svn only).
If you are using 3.01: IMO this is the penalty for not following the
instructions [1] and training on artificial input. As far as I
remember, I did not have this problem with 3.01 or 3.02 when training
on real scans.
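
For what it's worth, the coordinates in the quoted output can be checked mechanically. The sketch below is only an illustration, not part of the Tesseract tooling: it assumes each APPLY_BOXES entry prints the box as ((left,bottom),(right,top)), and the helper names (parse_boxes, group_rows) are made up here. It groups the reported boxes into rows by vertical overlap, to see whether the symbols sit together in word-like rows.

```python
import re
from statistics import median

# Failure lines copied from the quoted output (spacing restored).
# Only the line number, symbol, and coordinates are used.
LOG = """\
APPLY_BOXES: boxfile line 10/3 ((2605,1801),(2653,1869)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 12/( ((2614,1650),(2646,1768)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 14/+ ((2573,1527),(2684,1640)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 16/x ((2583,1415),(2676,1498)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 18/x ((2586,1290),(2673,1371)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 19/o ((2589,1163),(2669,1246)): FAILURE! Couldn't find a matching blob
APPLY_BOXES: boxfile line 24/s ((2598,784),(2662,871)): FAILURE! Couldn't find a matching blob
"""

# Assumption: each entry reads "line N/symbol ((left,bottom),(right,top))".
PATTERN = re.compile(r"line (\d+)/(\S+)\s+\(\((\d+),(\d+)\),\((\d+),(\d+)\)\)")


def parse_boxes(log):
    """Extract one dict per reported box from the log text."""
    boxes = []
    for m in PATTERN.finditer(log):
        left, bottom, right, top = (int(v) for v in m.groups()[2:])
        boxes.append({
            "line": int(m.group(1)),
            "symbol": m.group(2),
            "left": left, "bottom": bottom, "right": right, "top": top,
        })
    return boxes


def group_rows(boxes):
    """Put boxes whose vertical extents overlap into the same row."""
    rows = []
    for box in sorted(boxes, key=lambda b: -b["top"]):
        for row in rows:
            if box["bottom"] < row["top"] and box["top"] > row["bottom"]:
                row["boxes"].append(box)
                row["bottom"] = min(row["bottom"], box["bottom"])
                row["top"] = max(row["top"], box["top"])
                break
        else:
            # No existing row overlaps this box, so it starts a new row.
            rows.append({"bottom": box["bottom"], "top": box["top"],
                         "boxes": [box]})
    return rows


if __name__ == "__main__":
    boxes = parse_boxes(LOG)
    heights = [b["top"] - b["bottom"] for b in boxes]
    for row in group_rows(boxes):
        symbols = " ".join(b["symbol"] for b in row["boxes"])
        print(f"row y={row['bottom']}..{row['top']}: {symbols}")
    print("box heights:", heights, "median:", median(heights))
```

Run on the seven quoted lines, this puts every box in its own row, with box heights between 68 and 118 pixels. That is only the failing part of the output, so it hints at isolated symbols of uneven size rather than proving it.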

Just from experience: if you create artificial image input, try to
group the symbols so they imitate words, and keep the row height the
same (3.02 checks the xheight of each row, so you will get an
additional warning, e.g. "row xheight=52, but median xheight = 68.5").

[1] http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3#Generate_Training_Images

--
Zdenko

