Hi,

This may be a very general question to ask here, but I'm curious why
Spark Streaming can achieve better throughput than Storm, as claimed in the
Spark Streaming paper. Does it depend on certain use cases and/or data
sources? What drives the better performance in the Spark Streaming case or,
put another way, what makes Storm less performant than Spark Streaming?

Also, to guarantee exactly-once semantics when a node fails, Spark
replicates RDDs and takes checkpoints so that data can be recomputed on the
fly, while Trident uses transactional objects to persist state and
results. It's not obvious to me which approach is more costly, and why.
Can anyone share some experience here?

Thanks a lot,

Weide
