[jira] Updated: (UIMA-483) JCas method like getSofaDataString that doesn't copy the chars from the StringHeap

Marshall Schor (JIRA) Tue, 17 Jul 2007 10:45:25 -0700

     [ https://issues.apache.org/jira/browse/UIMA-483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Marshall Schor updated UIMA-483:
--------------------------------

    Affects Version/s: 2.1

> JCas method like getSofaDataString that doesn't copy the chars from the 
> StringHeap
> ----------------------------------------------------------------------------------
>
>                 Key: UIMA-483
>                 URL: https://issues.apache.org/jira/browse/UIMA-483
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 2.1, 2.2
>            Reporter: Greg Holmberg
>
> I process large documents--the String I pass to JCas.setSofaDataString may be 
> as large as 100 MB (50,000,000 chars).  This is causing the JVM to run out of 
> memory when we have many concurrent AnalysisEngines running.
> I traced JCas.getSofaDataString(), and it eventually calls 
> StringHeap.getStringForCode(), which does a "new String" from its private 
> char[] (which makes a copy).
> This would happen for each annotator.  We have five, so the 100 MB (the 
> original plus five copies) has now become 600 MB.  Multiply by 10 
> concurrent AnalysisEngines, and that's 6,000 MB.
> Perhaps there could be a variation on getSofaDataString that returns one of 
> the other classes (besides String) that implement CharSequence.  A 
> CharBuffer perhaps, or even a new class that implements the CharSequence 
> interface but is read-only (just four methods).  Or even just return a 
> char[], or a char[] plus begin/end offsets into the StringHeap.
> If nothing else, perhaps the document text should be treated differently from 
> all the little strings in the StringHeap and be stored separately, so that 
> calls to getSofaDataString() simply return a reference to an existing String 
> object, without copying.
> I'm open to possibilities; I just need the copying to end.
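
The read-only CharSequence idea in the description could look roughly like the
sketch below. It is only an illustration, not UIMA code: the class name
CharArrayView and the helper asReadOnlyBuffer are hypothetical, and the sketch
assumes the caller can obtain the backing char[] together with begin/end
offsets, which the current StringHeap does not expose. The view implements the
four CharSequence methods and shares the array instead of copying it; the
helper shows the CharBuffer alternative using the standard java.nio API.

```java
import java.nio.CharBuffer;

/** Hypothetical read-only view over a shared char[]; not part of UIMA. */
final class CharArrayView implements CharSequence {
    private final char[] chars; // shared backing array, never copied by the view
    private final int start;    // inclusive offset into chars
    private final int end;      // exclusive offset into chars

    CharArrayView(char[] chars, int start, int end) {
        if (start < 0 || end > chars.length || start > end) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
        this.chars = chars;
        this.start = start;
        this.end = end;
    }

    @Override
    public int length() {
        return end - start;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index=" + index);
        }
        return chars[start + index];
    }

    @Override
    public CharSequence subSequence(int from, int to) {
        if (from < 0 || to > length() || from > to) {
            throw new IndexOutOfBoundsException("from=" + from + ", to=" + to);
        }
        // Another view over the same array, so still no copy.
        return new CharArrayView(chars, start + from, start + to);
    }

    @Override
    public String toString() {
        // The only method that copies: it materializes a new String.
        return new String(chars, start, end - start);
    }

    /** Alternative: wrap the same region in a read-only CharBuffer (also no copy). */
    static CharSequence asReadOnlyBuffer(char[] chars, int start, int end) {
        return CharBuffer.wrap(chars, start, end - start).asReadOnlyBuffer();
    }
}
```

An annotator written against CharSequence (for example via
java.util.regex.Pattern.matcher(CharSequence)) could consume either form
without the document text being duplicated. Because the array is shared, the
view is only safe as long as that region of the array is not modified while
annotators are reading it.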

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
