UNLV testing is another interesting bit of Tesseract's history.
I don't know whether this code is still alive in recent revisions of
Tesseract, but it apparently was in 2.xx:
http://code.google.com/p/tesseract-ocr/wiki/TestingTesseract
There's also a project containing the evaluation tools, run by Ray Smith, at
http://code.google.com/p/isri-ocr-evaluation-tools/

All the links I could find that describe the UNLV output format are
currently dead, so the only thing I managed to get is the PDF
(originally called "AT-1995.pdf") from the Wayback Machine. Download it at
https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B4FRY5H4TwI8OWQ2MjM4MWItYjUwMy00MWI5LTg3YTctZWM1NzgwNWQ4Njhm&hl=en
Refer to Section 4 for format details. IMHO, at the moment this is the
only source of information on the UNLV format, aside from studying
Tesseract's code.

By the way, I'm curious: why do you need output in the UNLV format?

Warm regards,
Dmitri Silaev
www.CustomOCR.com





On Thu, May 5, 2011 at 2:35 PM, Lutz, Michael <[email protected]> wrote:
> Is there any information on the UNLV encoding?
> All I found was this:
> http://www.mail-archive.com/[email protected]/msg01540.html
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Jimmy O'Regan
> Sent: Thursday, 5 May 2011 11:47
> To: [email protected]
> Subject: Re: Best way to detect german mutated vowel (ü, ä, ö)
>
> On 5 May 2011 10:39, Lutz, Michael <[email protected]> wrote:
>> Thanks :).
>> So I have 2 options I guess, make it ISO-8859-1 to display it correctly in 
>> Windows or use the UNLV output string.
>
> Option 3: first use
> chcp 65001
> then run tesseract.
>
> --
> <Sefam> Are any of the mentors around?
> <jimregan> yes, they're the ones trolling you
>
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
