Re: Optimizing Local Cache Iteration

2016-09-19 Thread Taras Ledkov

Hi, Jaime

AFAIK there is currently no way to hook into rebalancing.

Have you tried the CacheMemoryMode.OFFHEAP_VALUES mode? I understand that 
this option does not reduce the time spent on serialization, but it may 
affect performance when the value objects are large.
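
For reference, a minimal configuration sketch of that suggestion. It assumes the Ignite 1.x API that was current when this thread was written (CacheMemoryMode was removed in Ignite 2.0). The cache name "largeValues" and the byte[] value type are placeholders chosen for illustration, not taken from the original application.

```java
import org.apache.ignite.cache.CacheMemoryMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class OffHeapValuesConfig {
    /** Builds a cache configuration that keeps keys on heap and values off heap. */
    public static CacheConfiguration<String, byte[]> cacheConfig() {
        // "largeValues" is a hypothetical cache name.
        CacheConfiguration<String, byte[]> cfg = new CacheConfiguration<>("largeValues");

        // Keys stay on heap, values are stored in off-heap memory (Ignite 1.x only).
        cfg.setMemoryMode(CacheMemoryMode.OFFHEAP_VALUES);

        // The optimizations already mentioned in the original post.
        cfg.setCopyOnRead(false);
        cfg.setBackups(0);

        return cfg;
    }
}
```

Whether this helps iteration speed for 10k values would need to be measured; the sketch only shows where the option is set.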

On 18.09.2016 6:44, jaime spicciati wrote:

All,
I put together a simple ignite application to iterate over all cache 
entries using broadcast() and scanQuery() (I am currently evaluating 
the two approaches). The goal is to iterate over all of the cache 
values local to the ignite instance as fast as possible.


The data I am storing in the grid is relatively large, 10k of data for 
each cache value and the keys are just strings. My initial benchmarks 
are decent, I am able to iterate over 133k entries/second per ignite 
instance. If I store just the keys and not the large cache values I 
can iterate over the keys at a rate of around 1.8 million 
entries/second (getting as close as possible to this performance is my goal).


The compromise I have found is to store the 10k of data via java 
unsafe() calls offheap, and annotate the field with transient 
(avoiding serialization). This approach is giving me around 1.4 
million entries/second, which is roughly ten times faster than the 
133k when the large data was serialized.


I believe the unsafe() approach will work but will break down if the 
Ignite framework attempts to rebalance which in turn will start 
copying the data around the cluster. If I go down this road are there 
hooks anywhere to deserialize the offheap data before it is shipped to 
another node during a rebalance? Or am I barking up the wrong tree on 
this one entirely?


I have done all of the typical optimizations such as turning off 
copyOnRead, reducing backups, setting a large heap, etc.


Thanks



--
Taras Ledkov
Mail-To: tled...@gridgain.com



Optimizing Local Cache Iteration

2016-09-17 Thread jaime spicciati
All,
I put together a simple ignite application to iterate over all cache
entries using broadcast() and scanQuery() (I am currently evaluating the
two approaches). The goal is to iterate over all of the cache values local
to the ignite instance as fast as possible.

The data I am storing in the grid is relatively large, 10k of data for each
cache value and the keys are just strings. My initial benchmarks are
decent, I am able to iterate over 133k entries/second per ignite instance.
If I store just the keys and not the large cache values I can iterate over
the keys at a rate of around 1.8 million entries/second (getting as close
as possible to this performance is my goal).

The compromise I have found is to store the 10k of data via java unsafe()
calls offheap, and annotate the field with transient (avoiding
serialization). This approach is giving me around 1.4 million
entries/second, which is roughly ten times faster than the 133k when the
large data was serialized.
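
To put the three figures above side by side, here is a small arithmetic sketch. The rates come from this message. The class and method names are made up for illustration, and reading "10k" as 10 KiB (10 * 1024 bytes) is an assumption.

```java
public class IterationThroughput {
    /** Ratio of two iteration rates given in entries per second. */
    public static double speedup(double fasterRate, double slowerRate) {
        return fasterRate / slowerRate;
    }

    /** Approximate value bytes touched per second for a given rate and value size. */
    public static double bytesPerSecond(double entriesPerSecond, double valueBytes) {
        return entriesPerSecond * valueBytes;
    }

    public static void main(String[] args) {
        double serialized = 133_000;      // entries/s with the large values serialized
        double unsafeOffHeap = 1_400_000; // entries/s with the transient off-heap field
        double keysOnly = 1_800_000;      // entries/s when only keys are stored
        double valueBytes = 10 * 1024;    // assumption: "10k" means 10 KiB

        System.out.printf("off-heap vs serialized speedup: %.1fx%n",
                speedup(unsafeOffHeap, serialized));
        System.out.printf("off-heap as a fraction of keys-only: %.0f%%%n",
                100.0 * unsafeOffHeap / keysOnly);
        System.out.printf("value data read per second when serialized: %.2f GiB%n",
                bytesPerSecond(serialized, valueBytes) / (1024.0 * 1024.0 * 1024.0));
    }
}
```

With these inputs the off-heap approach is about 10.5 times the serialized rate and a little under 80% of the keys-only rate.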

I believe the unsafe() approach will work but will break down if the Ignite
framework attempts to rebalance which in turn will start copying the data
around the cluster. If I go down this road are there hooks anywhere to
deserialize the offheap data before it is shipped to another node during a
rebalance? Or am I barking up the wrong tree on this one entirely?

I have done all of the typical optimizations such as turning off
copyOnRead, reducing backups, setting a large heap, etc.

Thanks