[ https://issues.apache.org/jira/browse/UIMA-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thilo Goetz reopened UIMA-1067:
-------------------------------


Fix in 2.2.2 hotfix 1.

> Remove char heap/ref heap in StringHeap of the CAS
> --------------------------------------------------
>
>                 Key: UIMA-1067
>                 URL: https://issues.apache.org/jira/browse/UIMA-1067
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 2.2.2
>            Reporter: Thilo Goetz
>            Assignee: Thilo Goetz
>             Fix For: 2.3
>
>
> The StringHeap class provides two ways to store strings: either as Java
> strings, or by copying their characters onto a character heap.  The second
> option is used only for deserialization from a binary CAS.  However, even when
> it is not used, this capability imposes a very significant memory overhead.
> To demonstrate this, I ran the following experiment.  As the analysis engine,
> I used our sandbox POS tagger, which sets just one string feature on each
> token.  As text, I used a 2.4MB input file (2x moby.txt).  To run this on IBM
> Java 1.5.0_7 (which happens to be the JVM I'm interested in), you need to
> specify -Xmx135M (I tested heap sizes in 5MB increments).  Then I patched the
> StringHeap implementation to work without the additional bookkeeping overhead
> and ran the experiment again.  This time I was able to run with -Xmx115M, a
> 20MB (roughly 15%) reduction in required heap.  This is a very significant
> gain, particularly given how little analysis I ran (only tokens and sentences
> are produced, and only a single string-valued feature is set).  The new code
> also ran slightly faster, though not by much.  Analysis that is less
> compute-intensive than the tagger might see a larger speedup.
> The challenge is to make sure that the serialization code still works after 
> this change.
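The change proposed above can be illustrated with a minimal Java sketch. The class name SimpleStringHeap and all of its methods are hypothetical and are not UIMA's actual StringHeap API; the sketch assumes strings are addressed by int handles, with handle 0 reserved for a null value. It keeps only Java String references (no separate character heap and no per-entry bookkeeping array mapping handles to character ranges), and its deserialization method builds a String directly from the serialized character buffer. That is one possible way for binary deserialization to keep working once the character heap is gone, not necessarily the approach taken in the actual fix.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified string store: every string is kept as a
 * java.lang.String and addressed by an int handle. There is no separate
 * character heap and no "ref heap" recording where each string's
 * characters live.
 */
final class SimpleStringHeap {
    // Slot 0 is reserved so that handle 0 can stand for a null string value.
    private final List<String> strings = new ArrayList<>();

    SimpleStringHeap() {
        strings.add(null);
    }

    /** Stores s and returns its handle; null maps to the reserved handle 0. */
    int addString(String s) {
        if (s == null) {
            return 0;
        }
        strings.add(s);
        return strings.size() - 1;
    }

    /** Returns the string for a handle previously returned by this heap. */
    String getString(int handle) {
        return strings.get(handle);
    }

    /**
     * Deserialization path (assumed): instead of copying characters onto a
     * shared char heap and recording their offset and length, create a
     * String from the serialized character buffer immediately.
     */
    int addStringFromChars(char[] buffer, int offset, int length) {
        return addString(new String(buffer, offset, length));
    }

    /** Number of stored strings, excluding the reserved null slot. */
    int size() {
        return strings.size() - 1;
    }
}
```

For example, a deserializer reading the character run "NNPVBD" could call heap.addStringFromChars(buf, 3, 3) to store "VBD". The trade-off in this sketch is one character copy per string at deserialization time, in exchange for not keeping a character heap and its index arrays alive for the lifetime of the CAS.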

-- 
This message is automatically generated by JIRA.