1. Retrieve the keys (millions) to be evicted and pass this to
localData.removeAll()
MS: This option is probably not a good idea, for two reasons.
First, it will take quite a lot of memory to build and pass the key list to
localData.removeAll(), thereby generating a significant amount of garbage.
Second, it will result in one massive collection of GemFire Tombstones
roughly 10 minutes later, which will likely affect other processing.

2. Parallel stream over the local data, filtering and calling
localData.remove() on each entry. This is safe as no
ConcurrentModificationException should be thrown from entrySet().
MS: This is likely to be the most efficient in terms of memory usage. The
individual removes might be significantly slower than removeAll() calls with
batches of 100 or 1000. Also, with batched removeAll() calls there is
probably no need for multiple threads. It would be good to put some time
between batches so that the Tombstone collection happens in small batches
as well.
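To make the batching idea concrete, here is a minimal sketch. The batching helper is plain Java; the Region type, key type, batch size, and pause length are all assumptions for illustration, and the actual region.removeAll() call (GemFire Region API) is shown commented out since it would run inside a server-side function against localData.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedRemoveSketch {

    // Split the keys to evict into fixed-size batches. Each batch would be
    // passed to localData.removeAll(batch) on a single thread, with a short
    // pause between batches so tombstone collection also happens in small
    // batches instead of one massive sweep.
    static <K> List<List<K>> batches(List<K> keys, int batchSize) {
        List<List<K>> out = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            out.add(keys.subList(i, Math.min(i + batchSize, keys.size())));
        }
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical key set; in practice these come from filtering localData.
        List<Integer> keysToEvict = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            keysToEvict.add(i);
        }

        for (List<Integer> batch : batches(keysToEvict, 1000)) {
            // localData.removeAll(batch);          // GemFire Region API call
            // Thread.sleep(50);                    // breathing room between batches
            System.out.println("would remove batch of " + batch.size());
        }
    }
}
```

The batch size (1000) and pause (50 ms) are placeholders; as noted below, these are the kind of knobs that actual testing and measurement should set.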

3. Retrieve the keys to evict, then batch calling removeAll() for each
batch on a single thread.
MS: Actually, if this is done on the local data, it is the same as what I
described in my answer to #2.

Of course, these are just my best guesses; actual testing and measurement
is probably a good idea.

One more thing: when it comes to big batch removes, I always question
whether my objective is to get them removed as quickly as possible, or to
get them removed with the least impact on system resources. Usually the
answer ends up being the least impact on system resources. If that is the
case, doing things that make it go faster is worse than doing things that
make it go slower.


--
Mike Stolz
Principal Engineer, GemFire Product Lead
Mobile: +1-631-835-4771



On Thu, Nov 1, 2018 at 7:26 AM <phillip.appleg...@gmail.com> wrote:

> Hi,
>
> I am investigating removing a large amount of data from a partitioned
> region (with 1 redundant copy) and I am wondering what is the recommended
> way to run this operation. I understand the operation should be invoked as
> a function executing on the servers but as for the remove algorithm there
> appear to be multiple options:
>
> 1. Retrieve the keys (millions) to be evicted and pass this to
> localData.removeAll()
>
> 2. Parallel stream over the local data, filtering and calling
> localData.remove() on each entry. This is safe as no
> ConcurrentModificationException should be thrown from entrySet().
>
> 3. Retrieve the keys to evict, then batch calling removeAll() for each
> batch on a single thread.
>
> 4. Others?
>
> I appreciate this question is fairly open ended and subject to specific
> performance tests, but I am interested to hear opinions/recommendations on
> how this sort of operation should be performed.
>
> Thanks in advance,
>
> Phil
>
