Add an index on the tenant id and run a local SQL query (setLocal(true)) on each node to fetch only the matching keys, instead of iterating over the whole cache. You might also want to look at the IgniteCompute#affinityRun overload that takes a partition number as a parameter, and run the job per partition rather than per node (higher degree of parallelism, and it potentially makes failover easier).
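A rough sketch of the first suggestion, as a broadcast job that runs a local SQL query on each node. The cache name "datasets", the query type DatasetEntry, and the indexed field tenantId are placeholders for your own model; this assumes the tenant id is exposed as an indexed @QuerySqlField on the key type.

```java
import java.util.List;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.lang.IgniteRunnable;
import org.apache.ignite.resources.IgniteInstanceResource;

public class TenantLocalQueryJob {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();
        String tenantId = "tenant-42"; // hypothetical tenant id

        // Map phase: broadcast a job to every node; each node uses the
        // tenant index to find only its local matching keys, so no node
        // scans its entire cache.
        ignite.compute().broadcast(new IgniteRunnable() {
            @IgniteInstanceResource
            private transient Ignite local;

            @Override public void run() {
                IgniteCache<Object, Object> cache = local.cache("datasets");

                SqlFieldsQuery qry = new SqlFieldsQuery(
                        "select _key from DatasetEntry where tenantId = ?")
                    .setArgs(tenantId)
                    .setLocal(true); // restrict to this node's data

                for (List<?> row : cache.query(qry)) {
                    Object key = row.get(0);
                    // cache.get(key): deserialize the binary object, run the
                    // per-entry computation, accumulate a partial result here,
                    // then reduce the partials on the caller.
                }
            }
        });
    }
}
```

For the per-partition variant, the same job body can instead be submitted with ignite.compute().affinityRun(cacheNames, partId, job) for each partition id, which also pins the partition to the node for the duration of the job.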
> On 23 Apr 2021, at 17:16, William.L <wil...@gmail.com> wrote:
>
> Hi,
>
> I am investigating whether the MapReduce API is the right tool for my
> scenario. Here's the context of the caches:
> * Multiple caches for different types of dataset
> * Each cache has multi-tenant data and the tenant id is part of the cache
> key
> * Each cache entry is a complex json/binary object that I want to do
> computation on (let's just say it is hard to do it in SQL) and return some
> complex results for each entry (e.g. a dictionary) that I want to do
> reduce/aggregation on.
> * The cluster is persistence-enabled because we have more data than memory
>
> My scenario is to do the MapReduce operation only on data for a specific
> tenant (a small subset of the data). From reading the forum about MapReduce,
> it seems like the best way to do this is using the IgniteCache.localEntries
> API and iterating through the node's local cache. My concern with this
> approach is that we are looping through the whole cache (K&V), which is very
> inefficient. Is there a more efficient way to filter only the relevant keys
> and then access the matching entries only?
>
> Thanks.
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/