Thanks Andrey,

1) If I understand correctly, the failover feature of the ComputeGrid mitigates an SPOF for the compute jobs, i.e. the Callables/Runnables/Closures which are distributed to multiple nodes. But my goal was also to mitigate a failure of the client node which is responsible for reading the external files and creating the collection of compute jobs. Can the FailoverSpi handle that as well?

My pseudo-code for the client node is as follows (a rough sketch of this loop is below):
- Check for files in the filesystem.
- If present, then for each line in the file, create a compute job. (If I understand correctly, this is the piece of code which falls under the scope of FailoverSpi.)
- Finally, loop back to wait for more files.
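To make this concrete, here is a minimal Java sketch of what I mean by the client node. The directory path, polling interval and processCdr() are placeholders of mine, not code from a working system:

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.lang.IgniteCallable;

public class CdrClientNode {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Start this JVM as a client node; server nodes execute the jobs.
        Ignition.setClientMode(true);
        Ignite ignite = Ignition.start();

        Path inbox = Paths.get("/data/cdr/inbox"); // hypothetical input directory

        while (true) {
            try (DirectoryStream<Path> files = Files.newDirectoryStream(inbox)) {
                for (Path file : files) {
                    Collection<IgniteCallable<Object>> jobs = new ArrayList<>();

                    // One compute job per CDR record (one record per line).
                    for (String line : Files.readAllLines(file))
                        jobs.add(() -> processCdr(line));

                    // My understanding: FailoverSpi re-maps a job if the node
                    // executing it fails, but it does not protect this loop
                    // itself if the client JVM dies.
                    ignite.compute().call(jobs);

                    Files.delete(file); // done; avoid reprocessing the same file
                }
            }

            Thread.sleep(1000); // loop back and wait for more files
        }
    }

    private static Object processCdr(String record) {
        return record.length(); // placeholder for the real per-record computation
    }
}

As the comments note, my understanding is that FailoverSpi would re-map the submitted jobs, but would not restart this loop if the client node itself fails; hence the question.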
2) Coming to my second question: let's say I cache the CDR file records/entries into a cache, e.g. "CDRFileCache", and then run 5 nodes, each with a listener waiting for new entries to be added to this cache.
- If I stream 3 entries into this cache, one after another, will all listeners process all 3 entries, i.e. will entries 1, 2 and 3 each be processed by listeners 1, 2, 3, 4 and 5?
- Or is it that once listener1 starts processing entry1, entry1 will not be processed by any other listener?
(To make this concrete, I have appended a sketch of such a listener below the quoted thread.)

Regards,
Neeraj

--------------------------------------------
On Fri, 24/2/17, Andrey Mashenkov <[email protected]> wrote:

Subject: Re: Real-time computing
To: [email protected], "Neeraj Vaidya" <[email protected]>
Date: Friday, 24 February, 2017, 2:20 AM

Hi Neeraj,

1. Why do you want to use Zookeeper to mitigate an SPOF instead of the Ignite ComputeGrid failover features?

2. If you need to reuse the data, then caching makes sense. For processing new entries you can use Events or Continuous Queries. You are free to choose the number of nodes in your grid: you can choose which nodes will hold data and which nodes will be used for computations.

I'm not sure I understand the last question. Would you please detail the last use case?

On Thu, Feb 23, 2017 at 3:23 AM, Neeraj Vaidya <[email protected]> wrote:

Hi,
I have a use case where I need to perform computation on records in files (specifically, files containing telecom CDRs). To this end, I have a few questions:

1) Should I have just one client node which reads these records and creates a Callable compute job for each record? With just 1 client node, I suppose this would be a single point of failure. I could use Zookeeper to manage a cluster of such nodes, thus possibly mitigating an SPOF.

2) Or should I stream/load these records, using a client, into a cache and then have the other cluster nodes read this cache for new entries and perform the computation on them? In this case, is there a way by which I can ensure that only one node computes each record?

Regards,
Neeraj

--
Best regards,
Andrey V. Mashenkov
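P.S. To make question 2) concrete, here is a minimal sketch of what I mean by "a listener" on each of the 5 nodes, using a continuous query as you suggested. Apart from the cache name, all identifiers and types are illustrative:

import javax.cache.event.CacheEntryEvent;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.ContinuousQuery;

public class CdrListenerNode {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        IgniteCache<Long, String> cache = ignite.getOrCreateCache("CDRFileCache");

        ContinuousQuery<Long, String> qry = new ContinuousQuery<>();

        // Local listener, invoked on this node for new/updated entries.
        qry.setLocalListener(events -> {
            for (CacheEntryEvent<? extends Long, ? extends String> e : events)
                System.out.println("Listener on node "
                    + ignite.cluster().localNode().id()
                    + " got entry: " + e.getValue());
        });

        // Keep the query open so this node keeps receiving new entries.
        cache.query(qry);
    }
}

My question is essentially whether 5 nodes each running this would all print all 3 streamed entries, or whether each entry would be delivered to only one of them.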
