A uzn file is a simple text file with one area (zone) per line. Each area
needs to have this structure [1]:

x y width height description


x, y, width and height are integers separated by spaces.
description is text that is not used by tesseract, but it can help you
describe the area (e.g. header, footer, body...).
For examples, see some of the files from isri-ocr-evaluation-tools [2].
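
To make the format concrete, here is a minimal Python sketch that builds such lines and derives the zone file name from the image name (in.tif -> in.uzn, as described in the quoted thread below). The helper names and the pixel values are made up for illustration. Keeping the description free of whitespace is a cautious assumption on my side, not something stated above.

```python
from pathlib import Path


def format_zone(x, y, width, height, description):
    """Return one zone line: four integers and a label, separated by spaces."""
    # Assumption: collapse whitespace in the label so it stays a single token.
    label = "_".join(str(description).split()) or "zone"
    return f"{int(x)} {int(y)} {int(width)} {int(height)} {label}"


def uzn_path_for(image_path):
    """Zone file path with the same base name as the image (in.tif -> in.uzn)."""
    return Path(image_path).with_suffix(".uzn")


def write_uzn(image_path, zones):
    """Write one zone per line next to the image and return the file path."""
    path = uzn_path_for(image_path)
    path.write_text("\n".join(format_zone(*z) for z in zones) + "\n")
    return path


# Hypothetical two-column page; the numbers are invented pixel values.
zones = [
    (100, 150, 1000, 2800, "left column"),
    (1200, 150, 1000, 2800, "right column"),
]
lines = [format_zone(*z) for z in zones]
```

Calling write_uzn("in.tif", zones) would then create in.uzn next to the image with the two lines held in `lines`.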


[1] https://code.google.com/p/tesseract-ocr/source/browse/trunk/ccstruct/blread.cpp?r=1064#54

[2] https://code.google.com/p/isri-ocr-evaluation-tools/downloads/detail?name=zset.4B.tar.gz&can=2&q=


Zdenko


On Sat, May 24, 2014 at 6:54 AM, Glen Rubin <[email protected]> wrote:

> I would also like more information on how to make a UZN file appropriate
> to my image.  thanks!
>
> On Sunday, October 14, 2007 10:38:29 AM UTC-7, Christoph Reimmann wrote:
>>
>> Hi Ray,
>>
>> thx for your answer.
>>
>> I've tried out ocr with in.uzn and .... it worked very well. Thanks.
>>
>> But when is a zone file correctly formatted ? I can't find a
>> documentation. Do you know whether there is one ?
>>
>> Thx again in advance, Chris
>>
>> On 12 Oct., 18:47, "Ray Smith" <[email protected]> wrote:
>> > If you have made a correctly formatted UNLV zone file, then you should
>> > name it in.uzn and use this command line:
>> > tesseract in.tif out.txt -l deu
>> > The in.uzn file will be found based on the name of the input tif file.
>> > Ray.
>> >
>> > On 10/12/07, [email protected] <[email protected]> wrote:
>> >
>> > > Tess does not at this point support multiple columns. You can write a
>> > > zoning software yourself and then use the dll interface to recognize
>> > > those parts of it.
>> >
>> > > On Oct 12, 3:35 am, Reimmann <[email protected]> wrote:
>> > > > Hi,
>> >
>> > > > I'm trying out Tesseract 2.01. I have a document that has two columns
>> > > > of text. The quality of Tesseract's recognition is very good, but the
>> > > > columns are mixed, because tesseract recognizes the characters line by
>> > > > line. So I would like to have two different zones that are recognized
>> > > > one after the other. I have tried out a tiff-image and a "zone-file"
>> > > > that I found on the UNLV site, but this does not work. My command-line
>> > > > looks like this:
>> >
>> > > > tesseract in.tif out.txt -l deu in.zone
>> >
>> > > > in.tif is not compressed.
>> >
>> > > > When I debug this, the program exits at line 234 in variables.cpp
>> > > > when trying to read_variables.
>> >
>> > > > Can anyone help ?
>> >
>> > > > Has anyone a useful pair of tiff-file and configuration-file for
>> > > > recognizing parts of a document ?
>> >
>> > > > thx in advance,
>> >
>> > > > Chris from Aachen, Germany
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/8ec643c4-2e0b-4f62-8d52-183da1789cda%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

