Hi,

I have a use case where I need to perform computation on records in files 
(specifically files containing telecom CDRs).

I have a few questions about this:

1) Should I have just one client node that reads these records and creates a
Callable compute job for each record? With just one client node, I suppose it
would be a single point of failure. I could use ZooKeeper to manage a cluster of
such nodes, possibly mitigating the SPOF.

2) Or should I stream/load these records into a cache using a client, and then
have the other cluster nodes read this cache for new entries and perform the
computation? In that case, is there a way to ensure that each record is
computed by only one node?
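
To make the two options concrete, here is a rough single-JVM sketch of what I
mean. It is not tied to any particular grid product: a plain ExecutorService
stands in for the cluster, a Map stands in for the cache, and the class name
CdrSketch, the method names, and the "caller,callee,durationSeconds" record
layout are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class CdrSketch {

    // Hypothetical CDR layout (an assumption): "caller,callee,durationSeconds".
    static long durationOf(String cdrLine) {
        return Long.parseLong(cdrLine.split(",")[2].trim());
    }

    // Option 1: a single reader wraps every record in its own Callable and
    // submits the jobs to a pool (the pool stands in for the compute cluster).
    static long computePerRecord(List<String> cdrLines, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<Long>> jobs = new ArrayList<>();
            for (String line : cdrLines) {
                jobs.add(() -> durationOf(line));
            }
            long total = 0;
            for (Future<Long> result : pool.invokeAll(jobs)) {
                total += result.get();
            }
            return total;
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        } finally {
            pool.shutdown();
        }
    }

    // Option 2: records sit in a shared map (standing in for the cache). Every
    // worker scans all entries, but claims a record with putIfAbsent first;
    // that call returns null for exactly one caller per key, so each record
    // is computed once. Returns the number of records actually computed.
    static int claimAndCompute(Map<Integer, String> cache, int workers) {
        ConcurrentHashMap<Integer, String> owners = new ConcurrentHashMap<>();
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<Void>> scanners = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                String workerId = "worker-" + w;
                scanners.add(() -> {
                    for (Map.Entry<Integer, String> entry : cache.entrySet()) {
                        if (owners.putIfAbsent(entry.getKey(), workerId) == null) {
                            durationOf(entry.getValue());
                            processed.incrementAndGet();
                        }
                    }
                    return null;
                });
            }
            for (Future<Void> done : pool.invokeAll(scanners)) {
                done.get();
            }
            return processed.get();
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> cdrs = List.of("a,b,10", "c,d,20", "e,f,5");
        System.out.println("total duration: " + computePerRecord(cdrs, 2));
        Map<Integer, String> cache = new ConcurrentHashMap<>();
        for (int i = 0; i < cdrs.size(); i++) {
            cache.put(i, cdrs.get(i));
        }
        System.out.println("records computed: " + claimAndCompute(cache, 4));
    }
}
```

In a real cluster the claim step would need an atomic operation provided by the
distributed cache itself, and the claim would have to be released if the
owning node dies, neither of which this sketch covers.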

Regards,
Neeraj
