Hi Stan,

Your understanding is correct.

I'm aware of the AffinityRun and AffinityCall methods, and their single-key limitation. My use case may require 100,000 or more elements of information to be processed, so I don't want to call AffinityRun/Call that often. Each of these elements is identified by a key that is very efficiently encoded into the request (at the ~1 bit per key level). Further, each of those elements identifies work units that could themselves have 100,000 or more different elements to be processed.

One approach would be to explicitly break up the request into smaller ones, each targeted at a server node. But that requires the requestor to have intimate knowledge of the composition of the grid resources deployed, which is not desirable.

The approach I'm looking into here is to have each server node receive the same request via Cluster.Broadcast(), and for those nodes to determine which elements in the overall request are their responsibility via the Key -> Partition affinity mapping. The mapping itself is very efficient, and as I noted in my original post, determining the partition -> node map seems simple enough to do. I'm unsure of the performance of requesting that mapping for every request, versus caching it and adding watchers for rebalancing and topology change events to invalidate that cached mapping as needed (and how to wire those up).

Thanks,
Raymond.

-----Original Message-----
From: Stanislav Lukyanov [mailto:stanlukya...@gmail.com]
Sent: Tuesday, April 17, 2018 12:02 AM
To: firstname.lastname@example.org
Subject: RE: Efficiently determining if cache keys belong to the local server node

// Bcc’ing off dev@ignite list for now as it seems to be rather a user-space discussion.

Hi,

Let me take a step back first. It seems a bit like an XY problem (https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem), so I’d like to clarify the goals before diving into your current solution.

AFAIU you want to process certain entries in your cache locally on the server that caches these entries.
Is that correct?

Have you looked at affinityRun and affinityCall (https://apacheignite.readme.io/docs/collocate-compute-and-data)? If yes, why don’t they work for you? One limitation with these methods is that they accept a single key to process. Can you process your keys one by one, or do you need to access multiple keys at once?

Thanks,
Stan

From: Raymond Wilson
Sent: 15 April 2018, 10:55
To: email@example.com
Cc: d...@ignite.apache.org
Subject: Efficiently determining if cache keys belong to the local server node

I have a type of query that asks for potentially large numbers of information elements to be computed. Each element has an affinity key that maps it to a server node through an IAffinityFunction.

The way the question is asked means that a single query broadcast to the compute projection (owning the cache containing the source data for the request) contains the identities of all the pieces of information that need to be processed. Each server node then scans the elements requested and identifies which ones are its responsibility according to the affinity key.

Calculating the partition ID from the affinity key is simple (I have an affinity function set up and supplied to the cache configuration, or I could use IAffinity.GetPartition()), so the question became: How do I know the server node executing the query is responsible for that partition, and so should process this element? I.e., I need to derive the vector of primary or backup partitions that this node is responsible for.
I can query the partition map and return it, like this:

    ICacheAffinity affinity = Cache.Ignite.GetAffinity(Cache.Name);
    Dictionary<int, bool> primaryPartitions =
        affinity.GetPrimaryPartitions(Cache.Ignite.GetCluster().GetLocalNode())
                .ToDictionary(k => k, v => true);

This lets me do a dictionary lookup, but it’s less efficient than having a complete partition map with simple array lookup semantics, like this:

    ICacheAffinity affinity = Cache.Ignite.GetAffinity(Cache.Name);
    bool[] partitionMap = new bool[affinity.Partitions];
    foreach (int partition in affinity.GetBackupPartitions(Cache.Ignite.GetCluster().GetLocalNode()))
        partitionMap[partition] = true;

This is a nice lookup for the query to determine which elements are its responsibility from the overall request.

I’m not sure of the performance profile of this approach if I end up doing it a lot, so I’m considering caching this lookup and invalidating it if any event occurs that could modify the partition -> node map.

Questions:

1. How big is the penalty when determining the full partition map like this?

2. If I decide to invalidate the cached map, what are all the events I’d need to listen to?

   1. Rebalancing events? I found CacheRebalancingEvent, but I’m not sure if this gives visibility into the points in time when a rebalanced partition becomes active on the new node and so the partition map changes.

   2. Topology change events? (E.g., adding a new backup node without rebalancing, if that is a thing.) I looked for an event like that but have not found it so far, though I do know the affinity function can respond to this via AssignPartitions().

3. How do I provide my own affinity key mapper for mapping keys to partition IDs, but allow Ignite to map the partitions to nodes? The IAffinityFunction implementation requires both steps to be implemented. I’d prefer not to have the partition -> server mapping responsibility, as this requires persistent configuration on the nodes to ensure stable mapping.

Thanks,
Raymond.
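
The "cache the lookup and invalidate it" option raised in the questions above could be sketched roughly as follows. This is only an illustration and makes several assumptions. LocalPartitionMap is a hypothetical helper, not an Ignite type. The loader delegate stands in for a call such as affinity.GetPrimaryPartitions(localNode) from the snippets above, and partitionCount for affinity.Partitions. Which Ignite events should trigger Invalidate() is exactly the open question in this thread, so the sketch leaves that wiring out.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of Ignite): caches a bool[] partition lookup
// and rebuilds it lazily on the first lookup after Invalidate() is called.
public sealed class LocalPartitionMap
{
    private sealed class Snapshot
    {
        public long Version;
        public bool[] Map;
    }

    private readonly int _partitionCount;
    private readonly Func<int[]> _loadLocalPartitions; // assumed to wrap the affinity query
    private long _version;                             // bumped on every invalidation
    private volatile Snapshot _snapshot;               // null until first use

    public LocalPartitionMap(int partitionCount, Func<int[]> loadLocalPartitions)
    {
        _partitionCount = partitionCount;
        _loadLocalPartitions = loadLocalPartitions;
    }

    // Intended to be called from whatever topology/rebalance handlers are chosen.
    public void Invalidate()
    {
        Interlocked.Increment(ref _version);
    }

    // Array lookup on the hot path; rebuilds only when the snapshot is stale.
    public bool IsLocal(int partition)
    {
        // Read the version before loading, so an invalidation that arrives
        // during the rebuild leaves the new snapshot marked as stale.
        long version = Interlocked.Read(ref _version);
        Snapshot current = _snapshot;
        if (current == null || current.Version != version)
        {
            var map = new bool[_partitionCount];
            foreach (int p in _loadLocalPartitions())
                map[p] = true;
            current = new Snapshot { Version = version, Map = map };
            _snapshot = current;
        }
        return current.Map[partition];
    }
}
```

Concurrent callers may occasionally rebuild the map twice after an invalidation; that is harmless here because each rebuild produces an independent array, and it avoids taking a lock on every lookup.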