[ 
https://issues.apache.org/jira/browse/UIMA-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Baessler updated UIMA-483:
----------------------------------

    Component/s: Core Java Framework

> JCas method like getSofaDataString that doesn't copy the chars from the 
> StringHeap
> ----------------------------------------------------------------------------------
>
>                 Key: UIMA-483
>                 URL: https://issues.apache.org/jira/browse/UIMA-483
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 2.1
>            Reporter: Greg Holmberg
>
> I process large documents--the String I pass to JCas.setSofaDataString may be 
> as large as 100 MB (50,000,000 chars).  This is causing the JVM to run out of 
> memory when we have many concurrent AnalysisEngines running.
> I traced JCas.getSofaDataString(), and it eventually calls 
> StringHeap.getStringForCode(), which does a "new String" from its private 
> char[] (which does a copy).
> This would happen for each annotator.  We have five, so now the 100 MB has 
> become 600 MB (the original plus five copies).  Multiply by 10 concurrent 
> AnalysisEngines, and that's 6,000 MB.
> Perhaps there could be a variation on getSofaDataString that returns one of 
> the other classes (besides String) that implements CharSequence.  A 
> CharBuffer perhaps, or even a new class that implements the CharSequence 
> interface but is read-only (just four methods).  Or even just return a char[], 
> or a char[] plus begin/end offsets into the StringHeap.
> If nothing else, perhaps the document text should be treated differently from 
> all the little strings in the StringHeap, and be stored separately, so calls 
> to getSofaDataString() simply return a reference to an existing String 
> object, without copying.
> I'm open to possibilities; I just need the copying to end.
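
To make the CharSequence suggestions above concrete, here is a minimal sketch in Java of two of the options: a small read-only view over a region of a shared char[], and the standard CharBuffer.wrap alternative. The names CharArrayView and SofaViewDemo are hypothetical and are not part of UIMA; the char[] in main merely stands in for the StringHeap's private array, whose real layout is not shown in this issue.

```java
import java.nio.CharBuffer;

/** Hypothetical read-only view over part of a shared char[]; not a UIMA class. */
final class CharArrayView implements CharSequence {
    private final char[] chars; // shared backing array, never copied by the view
    private final int start;    // inclusive offset into chars
    private final int end;      // exclusive offset into chars

    CharArrayView(char[] chars, int start, int end) {
        if (start < 0 || end > chars.length || start > end) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
        this.chars = chars;
        this.start = start;
        this.end = end;
    }

    @Override public int length() {
        return end - start;
    }

    @Override public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        return chars[start + index];
    }

    @Override public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > length() || from > to) {
            throw new IndexOutOfBoundsException("from=" + from + ", to=" + to);
        }
        // A sub-view shares the same array, so this does not copy either.
        return new CharArrayView(chars, start + from, start + to);
    }

    @Override public String toString() {
        // The only method that copies: it materializes a String on request.
        return new String(chars, start, end - start);
    }
}

class SofaViewDemo {
    public static void main(String[] args) {
        // Stand-in for a heap array holding the document text at offsets 2..13.
        char[] heap = "xxHello, UIMAyy".toCharArray();

        // Option 1: the custom view defined above.
        CharSequence view = new CharArrayView(heap, 2, 13);

        // Option 2: the JDK's CharBuffer, wrapped over the same region.
        CharSequence buffer = CharBuffer.wrap(heap, 2, 11).asReadOnlyBuffer();

        System.out.println(view.length() + " " + view.subSequence(7, 11));
        System.out.println(buffer.length() + " " + buffer.charAt(0));
    }
}
```

Neither view prevents the owner of the array from modifying it, so both are read-only only from the caller's side. Any annotator that calls toString() on the view still pays for a full copy, so the saving depends on annotators working against CharSequence directly.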

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
