[ 
https://issues.apache.org/jira/browse/UIMA-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513212
 ] 

Marshall Schor commented on UIMA-483:
-------------------------------------

Here's Eddie's comment from the email thread:

7/9/2007 12:03 PM

I tend to agree with this. The overall design will get simpler. The
downside will be having to create all the string objects at
deserialization, but this will still leave binary serialization much
faster than XML serialization. 

> JCas method like getSofaDataString that doesn't copy the chars from the 
> StringHeap
> ----------------------------------------------------------------------------------
>
>                 Key: UIMA-483
>                 URL: https://issues.apache.org/jira/browse/UIMA-483
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 2.2
>            Reporter: Greg Holmberg
>
> I process large documents--the String I pass to JCas.setSofaDataString may be 
> as large as 100 MB (50,000,000 chars).  This is causing the JVM to run out of 
> memory when we have many concurrent AnalysisEngines running.
> I traced JCas.getSofaDataString(), and it eventually calls 
> StringHeap.getStringForCode(), which does a "new String" from its private 
> char[] (which does a copy).
> This would happen for each annotator.  We have five, so the 100 MB has now 
> become 600 MB (the original plus five copies).  Multiply by 10 concurrent 
> AnalysisEngines, and that's 6,000 MB.
> Perhaps there could be a variation on getSofaDataString that returns one of 
> the other classes (besides String) that implement CharSequence.  A 
> CharBuffer perhaps, or even a new class that implements the CharSequence 
> interface but is read-only (just four methods).  Or even just return a char[], 
> or a char[] with begin/end offsets into the StringHeap.
> If nothing else, perhaps the document text should be treated differently from 
> all the little strings in the StringHeap, and be stored separately, so calls 
> to getSofaDataString() simply return a reference to an existing String 
> object, without copying.
> I'm open to possibilities, I just need the copying to end.
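
As a rough illustration of the "new class that implements CharSequence" idea in the description above, here is a minimal sketch of a read-only view over a shared char[]. The class name CharArrayView and its constructor are hypothetical and are not part of UIMA; the sketch also assumes the caller can get at the backing array and a begin/end offset, which the current StringHeap does not expose.

```java
// Hypothetical sketch, not a UIMA class: a read-only CharSequence view
// over a region of an existing char[], created without copying the chars.
final class CharArrayView implements CharSequence {
    private final char[] chars; // shared backing array (assumed to be the heap's array)
    private final int start;    // inclusive offset into chars
    private final int end;      // exclusive offset into chars

    CharArrayView(char[] chars, int start, int end) {
        if (start < 0 || end > chars.length || start > end) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
        this.chars = chars;
        this.start = start;
        this.end = end;
    }

    @Override
    public int length() {
        return end - start;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        return chars[start + index];
    }

    @Override
    public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > length() || from > to) {
            throw new IndexOutOfBoundsException("from=" + from + ", to=" + to);
        }
        // Another view over the same array; still no copy.
        return new CharArrayView(chars, start + from, start + to);
    }

    @Override
    public String toString() {
        // The only method that copies: it materializes a String on demand.
        return new String(chars, start, end - start);
    }
}
```

The view offers no mutating methods, but it is only as stable as the array behind it: whoever owns the char[] can still change its contents. An annotator that calls toString() on the full view would also reintroduce the copy, so this only helps code that can work against CharSequence directly.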

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
