Thanks, Nick, unix is indeed cool, when one knows how :-)
Thanks so much for the commands. Appreciate the help.

Shree

Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Jun 3, 2013 at 4:34 PM, Nick White <[email protected]> wrote:

> Hi,
>
> I'm very glad you're finding the test suite useful :)
>
> I'll reply to you below.
>
> On Mon, Jun 03, 2013 at 12:32:32AM -0700, sdk wrote:
> > I would like to use the results to help create additional training texts.
> > Specifically I would like to delete the lines which have 100% recognition so
> > that what is left are the lines in error from the wordacc reports which look
> > like:
> >
> > 3 0 100.00 ख्य
> > 2 0 100.00 ख्या
> > 1 0 100.00 ख्याल
> > 1 1 0.00 ख्यि
> > 1 1 0.00 ख्यी
> > 1 0 100.00 ख्र
> > 1 0 100.00 ख्व
> > 1 0 100.00 ख्वा
> > 2 2 0.00 ख्स
> > 1 1 0.00 ख्सि
> > 1 0 100.00 खड़े
> > 5 0 100.00 ग
> > 2 2 0.00 गँ
> > 3 0 100.00 गं
> > 1 0 100.00 गंभीर
> >
> > It should be easy to say, ignore all lines that have 100.00 in them.
> >
> > Can you tell me what command I can use on Win7 - CYGWin installation to take
> > the report and output just the text in error.
>
> Sure. As you're in cygwin this is pretty easy, as it's exactly the
> sort of thing unix tools are good for.
>
> If you just want to remove all lines which have 100% recognition,
> you can add an 'awk' command like this:
>
> ocrevalutf8 wordacc ground.txt ocr.txt | awk '$3 != 100 {print $0}' > results.txt
>
> or if you've already got a results file you want to change, you can
> do this:
>
> awk '$3 != 100 {print $0}' < results.txt > newresults.txt
>
> If you only want the last sections where things are broken down by
> word, you can add a sed command, like this:
>
> ocrevalutf8 wordacc ground.txt ocr.txt | sed '/^ Count Missed %Right $/,$ !d' | awk '$3 != 100 {print $0}' > results.txt
>
> See, isn't unix cool? :)
>
> Your accuracy results look great - good job on the training!
>
> I hope this helps, let me know if you need anything else.
>
> Nick
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.

