Hi Adam,

> Kirk,
>
> In this test are you running a CPE or just an AnalysisEngine? If it
> is a CPE do you know what your CAS Pool size is?
It's an AnalysisEngine.

> When a CAS is created it does allocate a large heap which is then
> filled as you create annotations. By default I believe this is
> 500,000 cells (2MB) per CAS, but this can be overridden (see
> UIMAFramework.getDefaultPerformanceTuningProperties()). So this can
> definitely be one source of memory overhead. As you saw it does not
> grow with larger documents, it will only grow if you create enough
> annotations to fill up the allocated space.

I noticed that this is tweakable and set it to something insanely
small (like 100). But, as you said, it grows as the number of
annotations grows. Since the parameter is under the umbrella of
performance, I'd assume that it would actually be better to
pre-allocate close to what we're going to use. (A sketch of how the
parameter can be overridden is at the end of this message.)

Thanks!
Kirk

> On 5/17/07, Kirk True <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > I have begun seeing heavy memory use when processing largish
> > documents through a UIMA pipeline. I wanted to make sure what I'm
> > seeing with regard to UIMA's internal memory use is on par with
> > expectations.
> >
> > It looks like for either a 1,500,000 byte or a 15,000,000 byte
> > document with the same annotations (100,000 10-character
> > annotations), we incur a ~13 MB "overhead" for internal UIMA data
> > structures. Is this in line with expectations?
> >
> > Details:
> >
> > In the interest of narrowing down the issue, I made a very simple
> > test annotator which mimics what my annotators do. The annotator
> > creates a document of N bytes which is set in a view in the CAS,
> > then it transforms the bytes to an HTML string that is then set in
> > a view in the CAS. Next, for each view, the annotator creates
> > 50,000 annotations. Each annotation has two 5-character attributes.
> > I profiled my application using two profilers (JProbe and YourKit)
> > and took heap snapshots before and after processing was performed
> > and saw similar results.
> >
> > I know there's a lot going on under the hood, so I'm trying to get
> > an idea of what kind of size factor I can expect for a given
> > document size. Right now, according to my calculations and verified
> > by the profiler, the expected memory usage for just my data (i.e.
> > the two views of the document and the strings making up the
> > annotations) is:
> >
> > For a 1,500,000 byte document:
> >
> >   Original document          1,500,000
> >   HTML document              2,800,000
> >   TestCaseAnnotation         1,600,000
> >   Annotation strings         4,800,000
> >   Annotation char[]s         2,400,000
> >   Integer                    1,600,000  (UIMA internal (Annotation))
> >   int[]                      9,300,000  (UIMA internal)
> >   java.util.HashMap$Entry    2,400,000  (UIMA internal)
> >   ------------------------------------
> >                             26,400,000
> >
> > For a 15,000,000 byte document:
> >
> >   Original document         15,000,000
> >   HTML document             28,000,000
> >   TestCaseAnnotation         1,600,000
> >   Annotation strings         4,800,000
> >   Annotation char[]s         2,400,000
> >   Integer                    1,600,000  (UIMA internal (Annotation))
> >   int[]                      9,300,000  (UIMA internal)
> >   java.util.HashMap$Entry    2,400,000  (UIMA internal)
> >   ------------------------------------
> >                             65,100,000
> >
> > I can post the code for the test cases if it helps.
> >
> > Thanks,
> > Kirk
> >
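P.S. For reference, here's roughly how the initial CAS heap size can be
overridden through the performance tuning settings when producing the
AnalysisEngine. This is just a minimal sketch: the descriptor path
"TestAnnotator.xml" and the cell count are placeholders, not taken from
my actual test code.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;

    import org.apache.uima.UIMAFramework;
    import org.apache.uima.analysis_engine.AnalysisEngine;
    import org.apache.uima.resource.Resource;
    import org.apache.uima.resource.ResourceSpecifier;
    import org.apache.uima.util.XMLInputSource;

    public class CasHeapTuningExample {
        public static void main(String[] args) throws Exception {
            // Parse the AE descriptor (path is a placeholder).
            ResourceSpecifier specifier = UIMAFramework.getXMLParser()
                    .parseResourceSpecifier(new XMLInputSource("TestAnnotator.xml"));

            // Start from the framework defaults and override the initial
            // CAS heap size. The value is a number of 4-byte heap cells,
            // so 100,000 cells is roughly 400KB instead of the 2MB default.
            Properties tuning = UIMAFramework.getDefaultPerformanceTuningProperties();
            tuning.setProperty(UIMAFramework.CAS_INITIAL_HEAP_SIZE, "100000");

            // Pass the tuning properties as an additional parameter when
            // producing the engine.
            Map<String, Object> params = new HashMap<String, Object>();
            params.put(Resource.PARAM_PERFORMANCE_TUNING_SETTINGS, tuning);

            AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(specifier, params);
            // ... create a CAS with ae.newCAS(), set the document text,
            // and call ae.process(cas) as usual ...
        }
    }

The heap still grows past the initial size once enough annotations are
created, so this only controls the starting allocation, not the ceiling.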
