Hi, I have a use case where I need to perform computation on records in files (specifically files containing telecom CDRs).
I have a few questions about this:

1) Should I have just one client node that reads these records and creates a Callable compute job for each record? With only one client node, I suppose it becomes a single point of failure (SPOF). I could use ZooKeeper to manage a cluster of such nodes, which might mitigate the SPOF.

2) Or should I stream/load these records into a cache from a client, and have the other cluster nodes watch the cache for new entries and perform the computation? In that case, is there a way to ensure that each record is computed by exactly one node?

Regards,
Neeraj
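
P.S. To make option 1 concrete, here is a rough sketch of the shape I have in mind. It uses only plain java.util.concurrent, with a local thread pool standing in for the cluster nodes, so it is not the actual grid API. The class name CdrJobs, the method computeDuration, and the "number,duration" record layout are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CdrJobs {
    // Hypothetical per-record computation: parse the second CSV field as a duration.
    public static long computeDuration(String record) {
        String[] fields = record.split(",");
        return Long.parseLong(fields[1].trim());
    }

    // The single "client": builds one Callable per record and submits them all.
    public static List<Long> process(List<String> records)
            throws InterruptedException, ExecutionException {
        // Assumption: a local pool stands in for the compute nodes of the cluster.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Callable<Long>> jobs = new ArrayList<>();
            for (String record : records) {
                jobs.add(() -> computeDuration(record));
            }
            // invokeAll returns futures in the same order as the submitted jobs.
            List<Long> results = new ArrayList<>();
            for (Future<Long> future : pool.invokeAll(jobs)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> records = List.of("9811111111,42", "9822222222,17");
        System.out.println(process(records));
    }
}
```

In this shape each record is handed to exactly one worker, but the reader itself is the single point of failure I am worried about.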
