Re: uv3 iterators - success in avoiding all concurrent modification exceptions

2016-09-16 Thread Marshall Schor
re: remove the need for the snapshot iterators then?

Yes, mostly.  There's one other use for those iterators, I think: in unusual
circumstances they can speed things up (though mostly they slow things down a
little).  The speed-up happens if you're iterating over a fully sorted index with
lots of interleaved subtypes and you do multiple moves forwards and backwards.
The snapshot "flattens" the interleaving (if I remember correctly), so the
forwards and backwards movement happens more efficiently, without "rattling" the
multiple iterators (one per type) as they interleave.
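A rough illustration of the "flattening" idea, assuming each type's part of a sorted index can be modeled as a sorted int array of keys (the real UIMA index structures are different). The snapshot pays for one k-way merge up front; after that, moving forwards or backwards is plain index arithmetic instead of re-comparing the heads of several per-type iterators on every move. `SnapshotSketch`, `flatten`, and `Cursor` are hypothetical names.

```java
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not UIMA code: per-type index parts modeled as sorted int arrays.
final class SnapshotSketch {

    // k-way merge of the per-type parts into one sorted array (done once, when the snapshot is made).
    static int[] flatten(List<int[]> perTypeParts) {
        int total = 0;
        for (int[] part : perTypeParts) total += part.length;
        int[] out = new int[total];
        // Heap entries are {value, partIndex, positionInPart}, ordered by value.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int p = 0; p < perTypeParts.size(); p++) {
            if (perTypeParts.get(p).length > 0) heap.add(new int[] {perTypeParts.get(p)[0], p, 0});
        }
        int i = 0;
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out[i++] = top[0];
            int[] part = perTypeParts.get(top[1]);
            int next = top[2] + 1;
            if (next < part.length) heap.add(new int[] {part[next], top[1], next});
        }
        return out;
    }

    // Cursor over the flattened snapshot: each move in either direction is O(1).
    static final class Cursor {
        private final int[] snapshot;
        private int pos = 0;
        Cursor(int[] snapshot) { this.snapshot = snapshot; }
        boolean isValid() { return pos >= 0 && pos < snapshot.length; }
        int get() { return snapshot[pos]; }
        void moveToNext() { pos++; }
        void moveToPrevious() { pos--; }
    }

    public static void main(String[] args) {
        int[] flat = flatten(List.of(new int[] {1, 4, 7}, new int[] {2, 5}, new int[] {3, 6}));
        Cursor c = new Cursor(flat);
        c.moveToNext();
        c.moveToNext();
        c.moveToPrevious();
        System.out.println(c.get()); // prints 2
    }
}
```

The trade-off matches the description: the merge is the up-front cost, so it only pays off when there is a lot of back-and-forth movement afterwards.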

-Marshall


On 9/16/2016 4:20 PM, Richard Eckart de Castilho wrote:
> On 16.09.2016, at 22:06, Marshall Schor  wrote:
> >>> Does this seem like a good thing to try?
> Definitely sounds promising. So that would remove the need for the snapshot 
> iterators then?
>
> Cheers,
>
> -- Richard



Re: uv3 iterators - success in avoiding all concurrent modification exceptions

2016-09-16 Thread Richard Eckart de Castilho
On 16.09.2016, at 22:06, Marshall Schor  wrote:
> 
>>> Does this seem like a good thing to try?

Definitely sounds promising. So that would remove the need for the snapshot 
iterators then?

Cheers,

-- Richard

Re: uv3 iterators - success in avoiding all concurrent modification exceptions

2016-09-16 Thread Marshall Schor
One other benefit: UIMA may automatically, "under the covers", remove and add back
some FSs if you update features used as keys in indexes.  This could cause a
ConcurrentModificationException in a loop that did such updates, even though the
loop contained no explicit index operations.
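A plain-Java analogy of that situation (not UIMA code): a `TreeSet` ordered by a `begin` key stands in for a sorted index, and a hypothetical `setBegin` helper hides the remove/add-back a key change needs. The loop body only "sets a feature", yet the iterator still fails because the set was structurally modified underneath it. `Ann`, `setBegin`, and `shiftAll` are illustrative names, not UIMA API.

```java
import java.util.Comparator;
import java.util.ConcurrentModificationException;
import java.util.TreeSet;

// Plain-Java analogy, not UIMA code: a TreeSet stands in for an index sorted on 'begin'.
final class KeyUpdateDemo {

    // Hypothetical stand-in for an annotation; 'begin' is the sort key, 'id' breaks ties.
    static final class Ann {
        int begin;
        final int id;
        Ann(int begin, int id) { this.begin = begin; this.id = id; }
    }

    static final Comparator<Ann> BY_BEGIN =
        Comparator.comparingInt((Ann a) -> a.begin).thenComparingInt(a -> a.id);

    // Mimics an "under the covers" update: changing a key requires remove + add-back
    // so that the sorted structure stays consistent.
    static void setBegin(TreeSet<Ann> index, Ann a, int newBegin) {
        index.remove(a);
        a.begin = newBegin;
        index.add(a);
    }

    // Returns true if the loop runs into a ConcurrentModificationException.
    static boolean shiftAll(TreeSet<Ann> index, int delta) {
        try {
            for (Ann a : index) {
                // The loop body only calls a feature setter; the index ops happen inside setBegin.
                setBegin(index, a, a.begin + delta);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        TreeSet<Ann> index = new TreeSet<>(BY_BEGIN);
        index.add(new Ann(0, 0));
        index.add(new Ann(5, 1));
        index.add(new Ann(10, 2));
        System.out.println(shiftAll(index, 100)); // prints true
    }
}
```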

-Marshall Schor


On 9/16/2016 3:59 PM, Marshall Schor wrote:
> As an experiment, I implemented a copy-on-write style of concurrent
> modification exception prevention in UV3.
> [...]



Re: uv3 iterators - success in avoiding all concurrent modification exceptions

2016-09-16 Thread Marshall Schor
As an experiment, I implemented a copy-on-write style of concurrent modification
exception prevention in UV3.

It does minimal copying, only copying part of the index related to the
particular type being updated; if no iterators are in use, there's no copying
(but see below).

The copy is done just once, even for multiple iterators, unless a subsequent
iterator is created after another update has happened to that part of the index.

With this, you get a trade-off: no more concurrent modification exceptions; you
can modify indexes within loops, but (incrementally) copies are made of index
parts if needed.  So it takes more space and time, due to copies sometimes being
made.
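A minimal sketch of the copy-once behavior described above, assuming one type's index part is a sorted int array and an "iterator" is just a view that keeps the array and size it started with. This is not the UV3 implementation; `CowIndexPart`, `View`, `iterate`, and the `shared` flag are hypothetical. Writes mutate the array in place until an iterator has been created over it; the next write then makes one copy that serves every iterator created so far, and only a later iterator makes the part "shared" again. Note that this flag-based version is conservative: it still copies if those iterators were already dropped.

```java
import java.util.Arrays;

// Hypothetical sketch, not the UV3 implementation: one type's index part,
// a sorted int array that is copied on write only while iterators share it.
final class CowIndexPart {
    private int[] keys = new int[4]; // backing array, sorted in [0, size)
    private int size = 0;
    private boolean shared = false;  // an iterator was created over the current array
    private int copies = 0;          // copy-on-write copies made, for illustration

    // Stands in for an iterator: it keeps the array and size it started with.
    static final class View {
        private final int[] keys;
        private final int size;
        View(int[] keys, int size) { this.keys = keys; this.size = size; }
        int[] toArray() { return Arrays.copyOf(keys, size); }
    }

    View iterate() {
        shared = true; // writes from now on must not touch this array
        return new View(keys, size);
    }

    void add(int key) {
        if (shared) {
            // Copy once; every view created so far keeps the old array.
            keys = Arrays.copyOf(keys, Math.max(keys.length, size + 1));
            shared = false;
            copies++;
        } else if (size == keys.length) {
            keys = Arrays.copyOf(keys, keys.length * 2); // ordinary growth; no view uses this array
        }
        int pos = Arrays.binarySearch(keys, 0, size, key);
        if (pos < 0) pos = -pos - 1; // insertion point
        System.arraycopy(keys, pos, keys, pos + 1, size - pos);
        keys[pos] = key;
        size++;
    }

    int copies() { return copies; }

    public static void main(String[] args) {
        CowIndexPart part = new CowIndexPart();
        part.add(5);
        part.add(1);             // no iterators: no copies
        View v1 = part.iterate();
        View v2 = part.iterate();
        part.add(3);             // one copy serves both v1 and v2
        part.add(4);             // no new iterator since that copy: no copy
        part.iterate();
        part.add(2);             // a newer iterator exists: copy again
        System.out.println(part.copies() + " " + Arrays.toString(v1.toArray())
            + " " + Arrays.toString(v2.toArray())); // prints 2 [1, 5] [1, 5]
    }
}
```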

In the following case, no copies will be made:

  a) modify the indexes

  b) create an iterator, iterate, then drop references to the iterator, and let
the garbage collector reclaim it.

  c) repeat a and b as much as you like.

If you're through with an iterator but it hasn't been GC'd yet, the
modification code can't tell you're through with it, and has to make a
copy.
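One way such a check could look, sketched under the assumption that iterators are registered through weak references (the thread does not say how UV3 does this; `LiveIteratorTracker`, `register`, `anyLive`, and `clear` are hypothetical). A write would copy only if `anyLive()` is true. Because a weak reference is cleared only when the garbage collector runs, an iterator the program has finished with still counts as live until then, which is exactly the case described above.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not UV3 code: remembers iterators only weakly, so a dropped
// iterator stops forcing copies once the garbage collector has cleared it.
final class LiveIteratorTracker {
    private final List<WeakReference<Object>> refs = new ArrayList<>();

    // Called when an iterator is created over the current index part.
    void register(Object iterator) {
        refs.add(new WeakReference<>(iterator));
    }

    // Called before a modification: true means "an iterator may still use this part, so copy it".
    // A finished iterator that has not been collected yet still makes this return true.
    boolean anyLive() {
        refs.removeIf(r -> r.get() == null); // forget iterators the GC has already cleared
        return !refs.isEmpty();
    }

    // Called after copying: the registered iterators now hold the old copy.
    void clear() {
        refs.clear();
    }

    public static void main(String[] args) {
        LiveIteratorTracker tracker = new LiveIteratorTracker();
        System.out.println(tracker.anyLive()); // prints false: nothing registered
        Object iterator = new Object();        // stands in for an iterator still in use
        tracker.register(iterator);
        System.out.println(tracker.anyLive()); // prints true
        Reference.reachabilityFence(iterator); // keep the stand-in strongly reachable up to here
        tracker.clear();
        System.out.println(tracker.anyLive()); // prints false
    }
}
```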

Is this a good trade-off to make?  Should we have two modes of running
pipelines, with and without this feature?

-Marshall

P.S. There's an edge case caught by the test cases.  In today's world, if you
do:
   a) modify the indexes
   b) start iterating
   c) modify the indexes
   d) do one of moveToFirst, moveToLast, or moveTo(fs)
then (d) "resets" the concurrent modification check and allows continued use of
the iterator, this time over the updated indexes.  I had to add some more
details in the impl to make this work the same way...
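A small sketch of that "reset" behavior, assuming a fail-fast iterator that compares a modification counter (the class and member names are hypothetical, not the UIMA implementation). `get` and `moveToNext` reject a stale position, while the repositioning methods accept the index as it is now; `moveTo(fs)` would follow the same pattern as `moveToFirst` and `moveToLast`.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical sketch, not UIMA code: a fail-fast iterator whose
// repositioning methods re-sync with the index.
final class ResettableIteratorDemo {

    static final class Index {
        final List<Integer> items = new ArrayList<>();
        int modCount = 0; // bumped on every modification
        void add(int v) { items.add(v); modCount++; }
    }

    static final class Iter {
        private final Index index;
        private int pos;
        private int expectedModCount;

        Iter(Index index) { this.index = index; moveToFirst(); }

        private void check() {
            if (expectedModCount != index.modCount) throw new ConcurrentModificationException();
        }

        // Repositioning accepts the current state of the index, so iteration can continue.
        void moveToFirst() { pos = 0; expectedModCount = index.modCount; }
        void moveToLast() { pos = index.items.size() - 1; expectedModCount = index.modCount; }

        boolean isValid() { return pos >= 0 && pos < index.items.size(); }
        int get() { check(); return index.items.get(pos); }
        void moveToNext() { check(); pos++; }
    }

    public static void main(String[] args) {
        Index index = new Index();
        index.add(1);                 // a) modify the indexes
        Iter it = new Iter(index);    // b) start iterating
        it.moveToNext();
        index.add(2);                 // c) modify the indexes again
        try {
            it.get();
        } catch (ConcurrentModificationException e) {
            System.out.println("stale iterator rejected");
        }
        it.moveToLast();              // d) reposition: resets the check
        System.out.println(it.get()); // prints 2
    }
}
```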

On 9/14/2016 10:11 AM, Marshall Schor wrote:
> Version 2 had snapshot iterators, used for two purposes:
>
> a) allowing underlying index modifications while iterating (over the
> snapshot).  Note that this includes even simple things like changing begin/end
> values in an annotation (which could cause a remove/add-back action on the
> indexes while those features are changed).
>
> b) performance (in some edge cases, though creating the snapshot also has an
> initial performance cost)
>
> It might be reasonable to support case (a) more automatically.  One approach
> might be to do a "copy on write" style for the index parts.  Java has, for
> instance, CopyOnWriteArrayList and CopyOnWriteArraySet.  This could add one more
> level of indirection in using UIMA indexes; details need to be worked out and
> could be complex (indexes need to be performant and thread-safe for reading).
>
> Does this seem like a good thing to try?
>
> -Marshall
>
>
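For reference, this is how the JDK's `CopyOnWriteArrayList` (mentioned in the quoted message) behaves: its iterator walks the array that existed when the iterator was created, so adding to the list inside the loop neither throws nor extends the loop. The JDK class copies its whole backing array on every write, whether or not an iterator is open, which is the cost the experiment in this thread avoids by copying only index parts, and only while iterators are in use. The demo class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Plain JDK behavior, not UIMA code.
final class CowListDemo {

    // Adds a primed copy of each element while iterating over the same list.
    static List<String> appendWhileIterating(List<String> list) {
        for (String s : list) {
            list.add(s + "'");
        }
        return list;
    }

    // The same loop over an ArrayList fails fast.
    static boolean failsOnArrayList() {
        try {
            appendWhileIterating(new ArrayList<>(List.of("a", "b")));
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The loop iterates over the array that existed when it started; each add copies the array.
        List<String> cow = new CopyOnWriteArrayList<>(List.of("a", "b"));
        System.out.println(appendWhileIterating(cow)); // prints [a, b, a', b']
        System.out.println(failsOnArrayList());        // prints true
    }
}
```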