1. Retrieve the keys (millions) to be evicted and pass this to localData.removeAll()

MS: This option is probably not a good idea for a few reasons. First, it will take quite a lot of memory to pass the list to localData.removeAll(), thereby generating a significant amount of garbage. Second, it will result in one massive collection of GemFire Tombstones exactly 10 minutes later, which will likely affect other processing.
2. Parallel stream over the local data, filtering and calling localData.remove() on each entry. This is safe as no ConcurrentModificationException should be thrown from entrySet().

MS: This is likely to be the most efficient in terms of memory usage. The individual removes might be significantly slower than removeAll() calls with batches of 100 or 1000, though. Also, with the batched removeAll() calls there is probably no need for multiple threads. It would be good to put some time between batches so that the Tombstone collection happens in small batches as well.

3. Retrieve the keys to evict, then batch calling removeAll() for each batch on a single thread.

MS: If this is done on the local data, it is actually the same as what I described in my answer to #2 (a rough sketch of that batched approach is at the end of this message). Of course these are just my best guesses; actual testing and measurement is probably a good idea.

One more thing: when it comes to big batch removes, I always question whether my objective is to get them removed as quickly as possible or to get them removed with the least impact to system resources. Usually the answer ends up being the least impact to system resources. If that is the case, doing things that make it go faster is worse than doing things that make it go slower.

--
Mike Stolz
Principal Engineer, GemFire Product Lead
Mobile: +1-631-835-4771

On Thu, Nov 1, 2018 at 7:26 AM <phillip.appleg...@gmail.com> wrote:

> Hi,
>
> I am investigating removing a large amount of data from a partitioned
> region (with 1 redundant copy) and I am wondering what is the recommended
> way to run this operation. I understand the operation should be invoked as
> a function executing on the servers, but as for the remove algorithm there
> appear to be multiple options:
>
> 1. Retrieve the keys (millions) to be evicted and pass this to
> localData.removeAll()
>
> 2. Parallel stream over the local data, filtering and calling
> localData.remove() on each entry. This is safe as no
> ConcurrentModificationException should be thrown from entrySet().
>
> 3. Retrieve the keys to evict, then batch calling removeAll() for each
> batch on a single thread.
>
> 4. Others?
>
> I appreciate this question is fairly open-ended and subject to specific
> performance tests, but I am interested to hear opinions/recommendations on
> how this sort of operation should be performed.
>
> Thanks in advance,
>
> Phil
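P.S. To make the batched approach concrete, here is a rough sketch, assuming a Geode/GemFire function executed onRegion against the partitioned region (org.apache.geode packages). The class name, the shouldEvict() predicate, and the batch size / pause values are illustrative placeholders, so treat this as an outline to test and tune rather than production code.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.geode.cache.Region;
import org.apache.geode.cache.execute.Function;
import org.apache.geode.cache.execute.FunctionContext;
import org.apache.geode.cache.execute.RegionFunctionContext;
import org.apache.geode.cache.partition.PartitionRegionHelper;

public class BatchedEvictionFunction implements Function<Void> {

  private static final int BATCH_SIZE = 1000;    // placeholder; tune by testing
  private static final long PAUSE_MILLIS = 250;  // placeholder; tune by testing

  @Override
  public boolean optimizeForWrite() {
    return true; // route execution to primary buckets, since we are modifying data
  }

  @Override
  public void execute(FunctionContext<Void> context) {
    RegionFunctionContext rfc = (RegionFunctionContext) context;
    // A view restricted to the data hosted on this member.
    Region<Object, Object> localData = PartitionRegionHelper.getLocalDataForContext(rfc);

    List<Object> batch = new ArrayList<>(BATCH_SIZE);
    int removed = 0;
    for (Map.Entry<Object, Object> entry : localData.entrySet()) {
      if (shouldEvict(entry)) {
        batch.add(entry.getKey());
      }
      if (batch.size() == BATCH_SIZE) {
        removed += flush(localData, batch);
      }
    }
    removed += flush(localData, batch); // final partial batch

    context.getResultSender().lastResult(removed);
  }

  // Remove one batch, then pause so the tombstone GC load is spread out too.
  private int flush(Region<Object, Object> localData, List<Object> batch) {
    int size = batch.size();
    if (size > 0) {
      localData.removeAll(batch);
      batch.clear();
      try {
        Thread.sleep(PAUSE_MILLIS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    return size;
  }

  private boolean shouldEvict(Map.Entry<Object, Object> entry) {
    return false; // placeholder: replace with the real eviction criteria
  }

  @Override
  public String getId() {
    return getClass().getSimpleName();
  }
}

It would be invoked along the lines of FunctionService.onRegion(region).execute(new BatchedEvictionFunction()).getResult(). The Thread.sleep() between batches is the "time between batches" mentioned above; it deliberately trades speed for lower impact on system resources.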