[ https://issues.apache.org/jira/browse/UIMA-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marshall Schor updated UIMA-1089:
---------------------------------
Affects Version/s: 2.3
defer beyond 2.3.0
> Space/Time tradeoffs in the CAS
> -------------------------------
>
> Key: UIMA-1089
> URL: https://issues.apache.org/jira/browse/UIMA-1089
> Project: UIMA
> Issue Type: Improvement
> Components: Core Java Framework
> Affects Versions: 2.2.2, 2.3
> Reporter: Marshall Schor
> Priority: Minor
>
> Investigate / implement optimizations that trade user-controllable time
> (running the optimizations) for space. One such optimization could be:
> sharing strings. To do the sharing requires additional computation and
> (temporary) storage to detect the sharing opportunities, but results in space
> savings. For instance, a common annotation might assign short strings like
> "noun" to a "part-of-speech" feature. If you are processing a large
> document, there may be a large number of these kinds of string valued
> features, picked from a small pool of allowable values. The CAS's string
> storage could be optimized to share the string references in this
> case, at the cost of temporarily creating a hash table of the unique strings
> and using it to identify sharing possibilities. A new API call to do this
> optimization would confine its time and space overhead to just those users
> and times where it makes sense.
> An alternative would be to automatically figure this out for some selected
> kinds of optimizations, but I'm not sure that could be done without impacting
> finely-tuned systems negatively.
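
A minimal sketch of the string-sharing idea described above, not the CAS
implementation. It assumes the string-valued features can be viewed as a plain
String[] (a stand-in for the CAS string storage); the class and method names
are hypothetical and are not part of any UIMA API. A temporary hash table maps
each distinct value to the first instance seen, and later equal strings are
replaced by a reference to that instance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not a UIMA class.
public class StringSharingSketch {

    /**
     * Replaces equal strings in the array with one shared instance.
     * Returns the number of references that were redirected.
     */
    public static int shareStrings(String[] values) {
        // Temporary table of unique strings; it becomes garbage once we return.
        Map<String, String> canonical = new HashMap<String, String>();
        int replaced = 0;
        for (int i = 0; i < values.length; i++) {
            String s = values[i];
            if (s == null) {
                continue; // unset feature value, nothing to share
            }
            String first = canonical.get(s);
            if (first == null) {
                canonical.put(s, s);   // first occurrence becomes the shared copy
            } else if (first != s) {
                values[i] = first;     // redirect this slot to the shared copy
                replaced++;
            }
        }
        return replaced;
    }

    public static void main(String[] args) {
        // new String(...) forces distinct instances, as separately created
        // feature values would be.
        String[] partOfSpeech = {
            new String("noun"), new String("verb"),
            new String("noun"), new String("noun")
        };
        int replaced = shareStrings(partOfSpeech);
        System.out.println("redirected references: " + replaced);
        System.out.println("slots 0 and 2 share one instance: "
            + (partOfSpeech[0] == partOfSpeech[2]));
    }
}
```

Because the table exists only for the duration of the call, the extra time and
storage are paid only by callers who explicitly ask for the optimization,
which matches the explicit API call proposed above.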