RE: Efficiently determining if cache keys belong to the local server node

2018-04-16 Thread Stanislav Lukyanov
// Moving the dev@ignite list to Bcc for now, as this seems to be more of a 
user-space discussion.

Hi,

Let me take a step back first. It seems a bit like an XY problem 
(https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem),
so I’d like to clarify the goals before diving into your current solution.

AFAIU you want to process certain entries in your cache locally on the server 
that caches these entries. Is that correct?
Have you looked at affinityRun and affinityCall 
(https://apacheignite.readme.io/docs/collocate-compute-and-data)? If so, why 
don’t they work for you?
One limitation of these methods is that they accept only a single key to process. 
Can you process your keys one by one, or do you need to access multiple keys at 
once?

Thanks,
Stan 

From: Raymond Wilson
Sent: April 15, 2018, 10:55
To: u...@ignite.apache.org
Cc: dev@ignite.apache.org
Subject: Efficiently determining if cache keys belong to the local server node

I have a type of query that asks for potentially large numbers of
information elements to be computed. Each element has an affinity key that
maps it to a server node through an IAffinityFunction.



The way the question is asked means that a single query broadcast to the
compute projection (owning the cache containing the source data for the
request) contains the identities of all the pieces of information needed to
be processed.



Each server node then scans the elements requested and identifies which
ones are its responsibility according to the affinity key.



Calculating the partition ID from the affinity key is simple (I have an
affinity function set up and supplied to the cache configuration, or I
could use ICacheAffinity.GetPartition()), so the question becomes: how do I
know that the server node executing the query is responsible for that
partition, and so should process this element? I.e., I need to derive the
vector of primary or backup partitions that this node is responsible for.



I can query the partition map and return it, like this:



ICacheAffinity affinity = Cache.Ignite.GetAffinity(Cache.Name);

public Dictionary<int, bool> primaryPartitions =
    affinity.GetPrimaryPartitions(Cache.Ignite.GetCluster().GetLocalNode())
        .ToDictionary(k => k, v => true);



This lets me do a dictionary lookup, but it’s less efficient than having a
complete partition map with simple array lookup semantics, like this:



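ICacheAffinity affinity = Cache.Ignite.GetAffinity(Cache.Name);

// One flag per partition, indexed by partition ID.
bool[] partitionMap = new bool[affinity.Partitions];

// Note: this flags backup partitions only; GetPrimaryPartitions() or
// GetAllPartitions() would flag the other roles.
foreach (int partition in
         affinity.GetBackupPartitions(Cache.Ignite.GetCluster().GetLocalNode()))
    partitionMap[partition] = true;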



This is a nice lookup for the query to determine which elements are its
responsibility from the overall request.
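
As a rough illustration of the array lookup described above, the sketch below
filters a batch of requested elements down to those whose partition is flagged
in the local map. It is plain C# with no Ignite dependency: PartitionFilter,
BuildMap and SelectLocal are hypothetical names, and the partitionOf delegate
stands in for whatever maps an element to its partition ID (for example the
cache's affinity function).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Ignite.
public static class PartitionFilter
{
    // Builds a flag array indexed by partition ID from the partitions this node owns.
    public static bool[] BuildMap(int partitionCount, IEnumerable<int> ownedPartitions)
    {
        var map = new bool[partitionCount];
        foreach (int partition in ownedPartitions)
            map[partition] = true;
        return map;
    }

    // Returns the requested elements whose partition is flagged in the map,
    // preserving the order of the request.
    public static List<T> SelectLocal<T>(
        IEnumerable<T> requested, Func<T, int> partitionOf, bool[] map)
    {
        var local = new List<T>();
        foreach (T element in requested)
        {
            if (map[partitionOf(element)])
                local.Add(element);
        }
        return local;
    }
}
```

With Ignite, ownedPartitions would presumably come from
affinity.GetPrimaryPartitions(localNode) and partitionCount from
affinity.Partitions, as in the snippets above.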



I’m not sure of the performance profile of this approach if I end up doing
it a lot, so I’m considering caching this lookup and invalidating it if any
event occurs that could modify the key -> partition map.
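
A minimal sketch of the caching idea, under assumptions: CachedPartitionMap is
a hypothetical class, the build delegate stands in for the affinity query
shown earlier, and which Ignite events should call Invalidate() is exactly the
open question below, so nothing is assumed about that here.

```csharp
using System;

// Hypothetical wrapper, not part of Ignite: caches a partition map and
// rebuilds it lazily after an invalidation.
public sealed class CachedPartitionMap
{
    private readonly object _sync = new object();
    private readonly Func<bool[]> _build; // rebuilds the map, e.g. from the affinity API
    private bool[] _map;                  // null means "stale, rebuild on next read"

    public CachedPartitionMap(Func<bool[]> build)
    {
        _build = build ?? throw new ArgumentNullException(nameof(build));
    }

    // Returns the cached map, rebuilding it first if it was invalidated.
    public bool[] Get()
    {
        lock (_sync)
        {
            if (_map == null)
                _map = _build();
            return _map;
        }
    }

    // Marks the map stale. Taking the same lock means an invalidation that
    // arrives during a rebuild is applied after it, so the next Get() rebuilds.
    public void Invalidate()
    {
        lock (_sync)
        {
            _map = null;
        }
    }
}
```

A query would call Get() once per request and index into the returned array;
the event listener, whatever it ends up subscribing to, would only call
Invalidate().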



Questions:



   1. How big is the penalty when determining the full partition map like
   this?
   2. If I decide to invalidate the cached map, what are all the events I’d
   need to listen to?
      1. Rebalancing events? I found CacheRebalancingEvent, but I’m not
      sure if this gives visibility of the points in time when a rebalanced
      partition becomes active on the new node and so the partition map
      changes.
      2. Topology change events? (E.g., adding a new backup node without
      rebalancing, if that is a thing.) I looked for an event like that but
      have not found it so far, though I do know the affinity function can
      respond to this via AssignPartitions().
   3. How do I provide my own affinity key mapper for mapping keys to
   partition IDs, but allow Ignite to map the partitions to nodes? The
   IAffinityFunction implementation requires both steps to be implemented.
   I’d prefer not to have the partition -> server mapping responsibility, as
   this requires persistent configuration on the nodes to ensure stable
   mapping.



Thanks,

Raymond.


