Hi,
I'm very glad you're finding the test suite useful :)
I'll reply to you below.
On Mon, Jun 03, 2013 at 12:32:32AM -0700, sdk wrote:
> I would like to use the results to help create additional training texts.
> Specifically I would like to delete the lines which have 100% recognition so
> that what is left are the lines in error from the wordacc reports which look
> like:
>
> 3 0 100.00 ख्य
> 2 0 100.00 ख्या
> 1 0 100.00 ख्याल
> 1 1 0.00 ख्यि
> 1 1 0.00 ख्यी
> 1 0 100.00 ख्र
> 1 0 100.00 ख्व
> 1 0 100.00 ख्वा
> 2 2 0.00 ख्स
> 1 1 0.00 ख्सि
> 1 0 100.00 खड़े
> 5 0 100.00 ग
> 2 2 0.00 गँ
> 3 0 100.00 गं
> 1 0 100.00 गंभीर
>
> It should be easy to say, ignore all lines that have 100.00 in them.
>
> Can you tell me what command I can use on Win7 - CYGWin installation to take
> the report and output just the text in error.
Sure. As you're in cygwin this is pretty easy, as it's exactly the
sort of thing unix tools are good for.
If you just want to remove all lines which have 100% recognition,
you can pipe the report through an awk command like this:
ocrevalutf8 wordacc ground.txt ocr.txt | awk '$3 != 100 {print $0}' > results.txt
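Before running it on a whole report, you can check what the filter keeps by feeding it a few lines copied from the report in your message. This is a small sketch; sample.txt and errors.txt are just illustrative file names, not files ocrevalutf8 creates.

```shell
#!/bin/sh
# Illustrative input: lines copied from the wordacc report quoted above.
cat > sample.txt <<'EOF'
3 0 100.00 ख्य
1 1 0.00 ख्यि
1 1 0.00 ख्यी
2 2 0.00 ख्स
5 0 100.00 ग
2 2 0.00 गँ
EOF

# Keep only lines whose third column (%Right) is not 100.
# awk treats the field "100.00" as a number, so it compares equal to 100
# and those lines are dropped.
awk '$3 != 100 {print $0}' < sample.txt > errors.txt

cat errors.txt
```

Only the four 0.00 lines should be left. If you want just the words themselves, printing the fourth column instead (awk '$3 != 100 {print $4}') gives one word per line.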
Or, if you've already got a results file you want to change, you can
do this:
awk '$3 != 100 {print $0}' < results.txt > newresults.txt
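If you'd rather update the results file itself than keep a separate newresults.txt, don't redirect awk's output straight back into the file it is reading: the shell truncates the output file before awk starts, so you'd end up with an empty file. A common shell idiom is to write to a temporary file and move it over the original. A sketch; the file name below is a demo placeholder, so point it at your real results file when you use it.

```shell
#!/bin/sh
set -e  # stop before the mv if any step fails

# Demo file name (hypothetical); change to results.txt for your real report.
f=demo_results.txt
printf '%s\n' '1 0 100.00 ख्र' '1 1 0.00 ख्सि' '1 0 100.00 खड़े' > "$f"

# Filter into a temporary file first ...
tmp=$(mktemp)
awk '$3 != 100 {print $0}' < "$f" > "$tmp"
# ... then replace the original with the filtered version.
mv "$tmp" "$f"

cat "$f"
```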
If you only want the last section, where things are broken down by
word, you can add a sed command, like this:
ocrevalutf8 wordacc ground.txt ocr.txt | sed '/^ Count Missed %Right $/,$!d' | awk '$3 != 100 {print $0}' > results.txt
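The sed and awk steps can also be folded into a single awk program that sets a flag when it reaches the per-word header and filters from there on. This sketch assumes, as the sed command does, that the per-word section starts with a line reading exactly " Count Missed %Right " (with the leading and trailing space). The little report below is made up to show the shape; it is not real ocrevalutf8 output.

```shell
#!/bin/sh
# Made-up report (hypothetical): a summary line, then the per-word section.
# Only the header text is taken from the sed command in this mail.
printf '%s\n' 'Summary line (ignored)' ' Count Missed %Right ' \
    '3 0 100.00 ख्य' '1 1 0.00 ख्यि' '2 2 0.00 ख्स' > report.txt

# found becomes 1 at the header line; from then on, print every line whose
# third field is not 100. Nothing before the header is printed.
awk '/^ Count Missed %Right $/ {found = 1}
     found && $3 != 100' report.txt > results.txt

cat results.txt
```

Like the sed version, this keeps the header line itself, since "%Right" is not equal to 100.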
See, isn't unix cool? :)
Your accuracy results look great - good job on the training!
I hope this helps, let me know if you need anything else.
Nick
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en