I know Storm fairly well; Spark is probably new to everyone, and I haven't used it yet. Some thoughts on where Spark Streaming would be a more natural fit than Storm:

Spark Streaming
+ Counting messages or computing other statistics on them
+ Sliding windows on streams
- Programming model (Spark seems to be one big procedure for the entire process)
- Maturity?
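To make the counting and sliding-window points concrete, here is a plain-Python sketch of the semantics a windowed count operator provides; this is not Spark code, just an illustration, and the class and parameter names are made up for the example.

```python
from collections import deque

class SlidingWindowCount:
    """Count events seen in the last `window` seconds.

    Illustrative only: Spark Streaming gives you this kind of windowed
    count as a built-in operator; in Storm you would hand-roll something
    like it inside a bolt.
    """
    def __init__(self, window):
        self.window = window
        self.times = deque()  # timestamps of events still inside the window

    def add(self, ts):
        """Record an event that occurred at timestamp `ts`."""
        self.times.append(ts)
        self._evict(ts)

    def count(self, now):
        """Number of events in the interval (now - window, now]."""
        self._evict(now)
        return len(self.times)

    def _evict(self, now):
        # Drop events that have fallen out of the window.
        while self.times and self.times[0] <= now - self.window:
            self.times.popleft()

# e.g. count of orders in the last 60 seconds
w = SlidingWindowCount(window=60)
for ts in [0, 10, 30, 65, 70]:
    w.add(ts)
assert w.count(70) == 3  # events at 30, 65 and 70 are inside the last 60s
```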
Storm
+ More complex topologies (complexity is delegated to the bolts)
+ Always concurrent (Spark only for some operations)
+ Multiple receivers for one message
+ Maturity

Maybe it's the examples, but Spark seems to be geared towards ad-hoc programming. I'm curious what a complex application would look like. Performance compared to Storm is also a question mark. Storm has a great programming model which at first glance looks more universal than Spark's.

On 9 June 2014 22:04, Rajiv Onat <[email protected]> wrote:

> Thanks. Not sure why you say they are different; from a stream-processing
> use-case perspective both seem to accomplish the same thing, while the
> implementations may take different approaches. If I want to aggregate and
> do stats in Storm, I would have to micro-batch the tuples at some level.
> E.g., for a count of orders in the last 1 minute, in Storm I have to write
> code for sliding windows and state management, while Spark seems to
> provide operators to accomplish that. Tuple-level operations such as
> enrichment, filters etc. seem doable in both as well.
>
> On Mon, Jun 9, 2014 at 12:24 PM, Ted Dunning <[email protected]> wrote:
>
>> They are different.
>>
>> Storm allows right-now processing of tuples. Spark Streaming requires
>> micro-batching (which may be over a really short time). Spark Streaming
>> allows checkpointing of partial results in the stream, supported by the
>> framework. Storm says you should roll your own or use Trident.
>>
>> Applications that fit one like a glove are likely to bind a bit on the
>> other.
>>
>> On Mon, Jun 9, 2014 at 12:16 PM, Rajiv Onat <[email protected]> wrote:
>>
>>> I'm trying to figure out whether these are competing technologies for
>>> stream processing or complementary?
>>> From an initial read, as stream-processing frameworks both provide a
>>> framework for scaling, while Spark has window constructs; Apache Spark
>>> has Spark Streaming and promises one platform for batch, interactive,
>>> and stream processing.
>>>
>>> Any comments or thoughts?
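On the per-tuple versus micro-batch distinction raised above, here is a small sketch of the difference in plain Python; the function and names are illustrative only, not the API of either framework.

```python
def micro_batches(events, interval):
    """Group (timestamp, value) tuples into fixed-interval batches.

    Sketch of the micro-batching model: instead of handing each tuple
    to the processor the moment it arrives (Storm's per-tuple model),
    the stream is cut into small time slices and each slice is handed
    over as one batch (Spark Streaming's model). `interval` is the
    batch length in seconds.
    """
    batches = {}
    for ts, value in events:
        slot = int(ts // interval)  # which batch this tuple falls into
        batches.setdefault(slot, []).append(value)
    return [batches[k] for k in sorted(batches)]

events = [(0.1, "a"), (0.4, "b"), (1.2, "c"), (2.9, "d")]
# With a 1-second interval: [["a", "b"], ["c"], ["d"]]
assert micro_batches(events, 1) == [["a", "b"], ["c"], ["d"]]
```

Latency is the trade-off this makes visible: a tuple arriving at 0.1s is not processed until its 1-second slice closes, whereas a per-tuple system would process it immediately.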
