One other consideration is the impact of a large batch operation on the rest of your workload. Do you have enough CPU and network capacity to absorb the extra work, and how will it affect response times? I’ve seen some users break the work into smaller batches and amortize the cost over time.
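
To make the batching idea concrete, here is a minimal sketch, not a tested Geode function. It assumes a plain java.util.Map standing in for the local data set (Region is a Map, so the same shape should carry over), and the class name, method name, batch size, and pause are all hypothetical. It collects the keys to evict first, then removes them one batch at a time with a pause in between so that other traffic gets a chance to run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class BatchedRemoval {

    /**
     * Removes every entry matching shouldEvict from localData in batches of
     * at most batchSize keys, sleeping pauseMillis between batches.
     * Returns the number of batches that were processed.
     *
     * Assumption: localData is a plain Map here. Inside a Geode function you
     * would presumably obtain the local data via PartitionRegionHelper and
     * could swap the inner loop for a single region.removeAll(batch) call.
     */
    static <K, V> int removeInBatches(Map<K, V> localData,
                                      Predicate<Map.Entry<K, V>> shouldEvict,
                                      int batchSize,
                                      long pauseMillis) throws InterruptedException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }

        // Pass 1: collect the keys to evict without modifying the map.
        List<K> keys = new ArrayList<>();
        for (Map.Entry<K, V> entry : localData.entrySet()) {
            if (shouldEvict.test(entry)) {
                keys.add(entry.getKey());
            }
        }

        // Pass 2: remove the keys one batch at a time.
        int batches = 0;
        for (int from = 0; from < keys.size(); from += batchSize) {
            int to = Math.min(from + batchSize, keys.size());
            for (K key : keys.subList(from, to)) {
                localData.remove(key);
            }
            batches++;

            // Pause between batches (not after the last one) to spread the load.
            if (to < keys.size() && pauseMillis > 0) {
                Thread.sleep(pauseMillis);
            }
        }
        return batches;
    }
}
```

For example, calling removeInBatches(data, e -> e.getKey() % 2 == 0, 2, 10) on a map with integer keys 0 through 9 removes the five even keys in three batches. The right batch size and pause depend on your hardware and latency targets, so treat both as knobs to tune under a realistic load test.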
YMMV,
Anthony

> On Nov 1, 2018, at 4:26 AM, phillip.appleg...@gmail.com wrote:
>
> Hi,
>
> I am investigating removing a large amount of data from a partitioned region
> (with 1 redundant copy) and I am wondering what is the recommended way to run
> this operation. I understand the operation should be invoked as a function
> executing on the servers but as for the remove algorithm there appear to be
> multiple options:
>
> 1. Retrieve the keys (millions) to be evicted and pass this to
> localData.removeAll()
>
> 2. Parallel stream over the local data, filtering and calling
> localData.remove() on each entry. This is safe as no
> ConcurrentModificationException should be thrown from entrySet().
>
> 3. Retrieve the keys to evict, then batch calling removeAll() for each batch
> on a single thread.
>
> 4. Others?
>
> I appreciate this question is fairly open ended and subject to specific
> performance tests, but I am interested to hear opinions/recommendations on
> how this sort of operations should be performed.
>
> Thanks in advance,
>
> Phil