Thilo Goetz wrote:
Marshall Schor wrote:
Thilo Goetz wrote:
Bhavani Iyer wrote:
Hi Thilo,
There are two separate requirements being addressed here:
1) delta CAS for optimizing remote services.
Here it's agreed that there should be no measurable overhead when there
is no remoting.
There will be a single test against the high water mark. The high water
mark defaults to 0. Only when the high water mark is set to a value
greater than 0 is logging of CAS operations on FSs below the high water
mark enabled.
2) Journaling for debugging aggregate components.
This capability is for Core UIMA as well as for remote services. This
will have some additional overhead and will have to be explicitly
enabled by the aggregate controller for a component. Basically the
aggregate controller enables journaling by setting the high water mark
before the call to process.
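For illustration only, the single test described above could be sketched
roughly as follows. This is a minimal sketch, not the actual UIMA CAS heap
code: the class HighWaterMarkSketch and all of its methods are hypothetical
names, and the heap is modeled as a plain int array with cell 0 reserved
so that no FS ever has address 0.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a heap with a high water mark; not UIMA code. */
final class HighWaterMarkSketch {
    private final int[] heap;
    // Cell 0 is reserved, so the first FS is allocated at address 1.
    private int next = 1;
    // Defaults to 0: no address is below 0, so nothing is ever journaled.
    private int highWaterMark = 0;
    // Addresses of pre-existing FSs that were modified after the mark was set.
    private final List<Integer> journal = new ArrayList<>();

    HighWaterMarkSketch(int capacity) {
        heap = new int[capacity];
    }

    /** Allocates the given number of cells and returns the FS address. */
    int allocate(int cells) {
        int addr = next;
        next += cells;
        return addr;
    }

    /** What a controller would call before process(): mark the current top. */
    void markHighWater() {
        highWaterMark = next;
    }

    /** Writes a value; the only added cost is one integer comparison. */
    void setValue(int addr, int value) {
        if (addr < highWaterMark) {
            journal.add(addr); // change to an FS that existed before the mark
        }
        heap[addr] = value;
    }

    int getValue(int addr) {
        return heap[addr];
    }

    List<Integer> journal() {
        return journal;
    }
}
```

With the mark left at its default of 0 the comparison in setValue is always
false, which is the "no measurable overhead" case. Once markHighWater() has
been called, writes to FSs allocated earlier are recorded, while writes to
FSs allocated afterwards are not.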
Regarding using the high water mark, this is already being used for
merging CAS.
That's not a good thing, and certainly no justification for using
the same design here.
Can you say more about why this is not a good thing? I see it as an
internal design detail.
Precisely. It's an implementation detail of the CAS heap that
we should be able to change -- that we must be able to change
if we would like to improve on the heap.
We could change it if we found a better approach.
The CAS heap, and in particular the way it grows, is a major performance
bottleneck for large documents. If we have other parts of UIMA depend on
the (bad) implementation details now, we'll never be able to
improve on the design.
Hmmm, I guess I was thinking that if we wanted to change this in the
future, we could. I agree it would be more difficult; we've changed
things like this before using the refactoring tools that let you see
pretty clearly various dependencies...
-Marshall