Thanks Andrey,

1) If I understand correctly, the failover feature of the ComputeGrid mitigates an SPOF for the compute jobs, i.e. the Callables/Runnables/Closures which are distributed to multiple nodes. But my goal was also to mitigate a failure of the client node which is responsible for reading the external files and creating the collection of compute jobs. Can the FailoverSpi handle that as well?

My pseudo-code for the client node is as follows (a rough sketch of this loop is below):
- Check for files in the filesystem.
- If present, then for each line in the file, create a compute job. (If I understand correctly, this is the piece of code which falls under the scope of FailoverSpi.)
- Finally, loop back to wait for more files.
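To make this concrete, here is a minimal Java sketch of what I mean by the client node. The directory path, polling interval and processCdr() are placeholders of mine, not code from a working system:

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.lang.IgniteCallable;

public class CdrClientNode {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Start this JVM as a client node; server nodes execute the jobs.
        Ignition.setClientMode(true);
        Ignite ignite = Ignition.start();

        Path inbox = Paths.get("/data/cdr/inbox"); // hypothetical input directory

        while (true) {
            try (DirectoryStream<Path> files = Files.newDirectoryStream(inbox)) {
                for (Path file : files) {
                    Collection<IgniteCallable<Object>> jobs = new ArrayList<>();

                    // One compute job per CDR record (one record per line).
                    for (String line : Files.readAllLines(file))
                        jobs.add(() -> processCdr(line));

                    // My understanding: FailoverSpi re-maps a job if the node
                    // executing it fails, but it does not protect this loop
                    // itself if the client JVM dies.
                    ignite.compute().call(jobs);

                    Files.delete(file); // done; avoid reprocessing the same file
                }
            }

            Thread.sleep(1000); // loop back and wait for more files
        }
    }

    private static Object processCdr(String record) {
        return record.length(); // placeholder for the real per-record computation
    }
}

As the comments note, my understanding is that FailoverSpi would re-map the submitted jobs, but would not restart this loop if the client node itself fails; hence the question.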
2) Coming to my second question: let's say I cache the CDR file records/entries into a cache, e.g. "CDRFileCache", and then run 5 nodes, each with a listener waiting for new entries to be added to this cache.
- If I stream 3 entries into this cache, one after another, will all listeners process all 3 entries, i.e. will entries 1, 2 and 3 each be processed by listeners 1, 2, 3, 4 and 5?
- Or is it that once listener1 starts processing entry1, entry1 will not be processed by any other listener?
(To make this concrete, I have appended a sketch of such a listener below the quoted thread.)

Regards,
Neeraj

--------------------------------------------
On Fri, 24/2/17, Andrey Mashenkov <[email protected]> wrote:

Subject: Re: Real-time computing
To: [email protected], "Neeraj Vaidya" <[email protected]>
Date: Friday, 24 February, 2017, 2:20 AM

Hi Neeraj,

1. Why do you want to use Zookeeper to mitigate an SPOF instead of the Ignite ComputeGrid failover features?

2. If you need to reuse the data, then caching makes sense. For processing new entries you can use Events or Continuous Queries. You are free to choose the number of nodes in your grid: you can choose which nodes will hold data and which nodes will be used for computations.

I'm not sure I understand the last question. Would you please detail the last use case?

On Thu, Feb 23, 2017 at 3:23 AM, Neeraj Vaidya <[email protected]> wrote:

Hi,
I have a use case where I need to perform computation on records in files (specifically, files containing telecom CDRs). To this end, I have a few questions:

1) Should I have just one client node which reads these records and creates a Callable compute job for each record? With just 1 client node, I suppose this would be a single point of failure. I could use Zookeeper to manage a cluster of such nodes, thus possibly mitigating an SPOF.

2) Or should I stream/load these records, using a client, into a cache and then have the other cluster nodes read this cache for new entries and perform the computation on them? In this case, is there a way by which I can ensure that only one node computes each record?

Regards,
Neeraj

--
Best regards,
Andrey V. Mashenkov
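P.S. To make question 2) concrete, here is a minimal sketch of what I mean by "a listener" on each of the 5 nodes, using a continuous query as you suggested. Apart from the cache name, all identifiers and types are illustrative:

import javax.cache.event.CacheEntryEvent;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.ContinuousQuery;

public class CdrListenerNode {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        IgniteCache<Long, String> cache = ignite.getOrCreateCache("CDRFileCache");

        ContinuousQuery<Long, String> qry = new ContinuousQuery<>();

        // Local listener, invoked on this node for new/updated entries.
        qry.setLocalListener(events -> {
            for (CacheEntryEvent<? extends Long, ? extends String> e : events)
                System.out.println("Listener on node "
                    + ignite.cluster().localNode().id()
                    + " got entry: " + e.getValue());
        });

        // Keep the query open so this node keeps receiving new entries.
        cache.query(qry);
    }
}

My question is essentially whether 5 nodes each running this would all print all 3 streamed entries, or whether each entry would be delivered to only one of them.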
