Re: UIMA internals memory footprint

Marshall Schor Tue, 22 May 2007 06:47:05 -0700

The indexes use int[] arrays. Kirk - what indexes do you have defined (if any)? Do you "addToIndexes..." any of
the annotations you create?

-Marshall


Adam Lally wrote:

On 5/18/07, Thilo Goetz <[EMAIL PROTECTED]> wrote:
You can estimate data use on the heap as follows. Each FS uses at least one
int for the type information, plus whatever features it has. So a vanilla
annotation is 3 ints, one for the type, and one each for the start and end
features. If you have two additional features, that's 5 ints, so 20 bytes.
If you use the JCas, you incur an additional overhead of a Java object for
each annotation. It's small, but I can't say off the top of my head how small
exactly. Plus, the JCas objects are held in a HashMap (or some such; Marshall,
correct me if I'm wrong), which incurs additional memory overhead.

In my experience, the CAS can easily reach 10 to 20 times the size of the input
document. If you have information-rich token annotations, that's not really
surprising. (And this is without using JCas.) Imagine you were to manually
create Java objects that carry the same information; you would see roughly
the same kind of overhead.

Using these numbers can we account for the 9,300,000 bytes of integer arrays?
100,000 annotations of size 5 cells = 500,000 ints, which is exactly
the default heap size.  But with the Sofa FS this will exceed the
default heap size.  It will grow by another 500,000 (I think).

So that accounts for 1,000,000 ints = 4,000,000 bytes.

Where are the other 5,300,000?
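
For anyone who wants to redo this arithmetic with their own counts, here is a minimal sketch in Java. It is not UIMA code and calls no UIMA API; the class and method names are made up for illustration. It assumes 4 bytes per int, one heap cell for the type plus one per feature (as Thilo describes), and it takes the "grows by another 500,000" step as the guess it is in this thread.

```java
public class CasHeapEstimate {
    // A Java int is 4 bytes; each CAS heap cell is assumed to be one int.
    static final int BYTES_PER_INT = 4;

    // One cell for the type information, plus one cell per feature.
    static long cellsPerFs(int featureCount) {
        return 1L + featureCount;
    }

    // Total heap cells used by fsCount feature structures of the same shape.
    static long cellsUsed(long fsCount, int featureCount) {
        return fsCount * cellsPerFs(featureCount);
    }

    // Convert a number of int cells to bytes.
    static long bytesFor(long cells) {
        return cells * BYTES_PER_INT;
    }

    public static void main(String[] args) {
        long annotations = 100_000;   // annotation count used in this thread
        int features = 4;             // start, end, plus two additional features
        long used = cellsUsed(annotations, features);

        // Assumption taken from the thread, not verified: the heap starts at
        // 500,000 ints and grows by another 500,000 once that is exceeded.
        long allocated = 500_000 + 500_000;

        long observedBytes = 9_300_000; // int[] bytes reported in the thread
        System.out.println("cells used:        " + used);
        System.out.println("bytes allocated:   " + bytesFor(allocated));
        System.out.println("bytes unexplained: " + (observedBytes - bytesFor(allocated)));
    }
}
```

Changing `annotations` and `features` to match a real type system gives a quick lower bound for the heap itself; anything the indexes or the JCas add comes on top of that.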



Likewise, what about the 1,600,000 bytes of Integers?  The JCAS hash
map only accounts for one per annotation, which in this case should
only be 400,000 bytes.

Maybe it would be useful to get Kirk's test case so we can take a look
at where exactly the memory is being used.  I think it would need to
be attached to a JIRA issue with the "grant license to Apache" box
checked?

-Adam
