In left field.
On Mon, Jun 9, 2014 at 4:57 PM, Dan <[email protected]> wrote:
> Where would Akka fit on the Storm/Spark spectrum?
>
> Thanks
> Dan
>
> ------------------------------
> Date: Mon, 9 Jun 2014 15:48:49 -0700
> Subject: Re: Apache Storm vs Apache Spark
> From: [email protected]
> To: [email protected]
>
> Thanks Taylor. Storm seems more flexible in its framework: it provides the
> key primitives, and the onus is on developers to fine-tune it to their QoS
> needs. On the other hand, looking at the Lambda architecture, Storm only
> fulfills the speed layer, while Spark could cover batch/speed/serving
> (Spark SQL). Based on the use case and the compromises one is willing to
> make on throughput/latency/QoS, you have to pick the right one.
>
> My simple use case is:
> a) I have a stream of orders (keyed on customer id; the source is a socket).
> b) I filter for orders from my high-value customers (I have to make sure
> the list of high-value customers is available in memory on all bolt tasks
> for fast correlation/projection): the customer id in the stream is matched
> against the customer id in the list, and the customer type must be
> platinum or gold.
> c) Count the orders/amount for the last 5 minutes, grouped by product and
> customer type.
>
> On Mon, Jun 9, 2014 at 2:27 PM, P. Taylor Goetz <[email protected]> wrote:
>
> The way I usually describe the difference is that Spark is a batch
> processing framework that also does micro-batching (Spark Streaming),
> while Storm is a stream processing framework that also does micro-batching
> (Trident). So architecturally they are very different, but have some
> similarity on the functional side.
>
> With micro-batching you can achieve higher throughput at the cost of
> increased latency. With Spark this is unavoidable. With Storm you can use
> the core API (spouts and bolts) to do one-at-a-time processing and avoid
> the inherent latency overhead imposed by micro-batching.
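[Editor's note: the use case quoted above (filter a stream of orders down to platinum/gold customers, then count orders and sum amounts over the last 5 minutes grouped by product and customer type) can be sketched framework-neutrally. This is a minimal illustration only; the names, the in-memory customer table, and the single-process design are assumptions, not code from Storm or Spark.]

```python
import time
from collections import deque, defaultdict

# Illustrative in-memory lookup table of high-value customers; in Storm this
# would be loaded into every bolt task's memory for fast correlation.
HIGH_VALUE = {"c1": "platinum", "c2": "gold"}  # customer_id -> customer_type

WINDOW_SECONDS = 5 * 60  # "last 5 minutes"

events = deque()  # (timestamp, product, customer_type, amount)

def on_order(customer_id, product, amount, now=None):
    """Filter step: keep only orders from platinum/gold customers."""
    ctype = HIGH_VALUE.get(customer_id)
    if ctype is None:
        return  # not a high-value customer; drop the tuple
    now = time.time() if now is None else now
    events.append((now, product, ctype, amount))

def window_stats(now=None):
    """Aggregate step: order count and amount total over the last 5
    minutes, grouped by (product, customer_type)."""
    now = time.time() if now is None else now
    while events and events[0][0] < now - WINDOW_SECONDS:
        events.popleft()  # evict tuples that fell out of the window
    stats = defaultdict(lambda: [0, 0.0])  # key -> [count, total amount]
    for _, product, ctype, amount in events:
        entry = stats[(product, ctype)]
        entry[0] += 1
        entry[1] += amount
    return dict(stats)
```

In core Storm the filter and the aggregation would live in separate bolts, and the eviction loop plus the `stats` state are exactly the hand-rolled window/state code discussed later in this thread; Spark Streaming's windowed operators would express step (c) directly.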
> With Trident, you get state management out of the box, and sliding
> windows are supported as well.
>
> In terms of adoption and production deployments, Storm has been around
> longer and there are a LOT of production deployments. I'm not aware of
> that many production Spark deployments, but I'd expect that to change
> over time.
>
> In terms of performance, I can't really point to any valid comparisons.
> When I say "valid" I mean open and independently verifiable. There is one
> study that I'm aware of that claims Spark Streaming is insanely faster
> than Storm. The problem with that study is that none of the code or
> configurations used are publicly available (that I'm aware of). Without a
> way to independently verify those claims, I'd dismiss it as marketing
> fluff (the same goes for the IBM InfoStreams comparison). Storm is very
> tunable when it comes to performance, allowing it to be adapted to the
> use case at hand. However, it is also easy to cripple performance with
> the wrong config.
>
> I can personally verify that it is possible to process 1.2+ million
> (relatively small) messages per second with a 10-15 node cluster, and
> that includes writing to HBase and other components (I don't have the
> hardware specs handy, but can probably dig them up).
>
> - Taylor
>
> On Jun 9, 2014, at 4:04 PM, Rajiv Onat <[email protected]> wrote:
>
> Thanks. Not sure why you say they are different; from a stream-processing
> use-case perspective both seem to accomplish the same thing, even if the
> implementations take different approaches. If I want to aggregate and do
> stats in Storm, I would have to micro-batch the tuples at some level.
> E.g., for a count of orders in the last 1 minute, in Storm I have to
> write code for sliding windows and state management, while Spark seems to
> provide operators to accomplish that. Tuple-level operations such as
> enrichment, filters, etc. also seem doable in both.
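[Editor's note: the throughput-for-latency trade that micro-batching makes, mentioned in the quoted discussion, can be seen in a toy batcher: tuples are buffered and emitted together when either a size bound or a time bound is hit. This is an illustration of the idea only, not Spark or Storm code; all names and bounds are invented for the example.]

```python
class MicroBatcher:
    """Toy micro-batcher: buffer tuples and flush them as one batch when
    either max_size tuples have arrived or max_wait seconds have passed
    since the oldest buffered tuple. Per-tuple latency rises (by up to
    max_wait) in exchange for amortizing per-batch overhead over many
    tuples; one-at-a-time processing is the max_size == 1 degenerate case.
    Timestamps are passed in explicitly to keep the sketch deterministic."""

    def __init__(self, process_batch, max_size=100, max_wait=0.5):
        self.process_batch = process_batch
        self.max_size = max_size
        self.max_wait = max_wait
        self.buffer = []
        self.oldest = None  # arrival time of the oldest buffered tuple

    def submit(self, tup, now):
        if not self.buffer:
            self.oldest = now
        self.buffer.append(tup)
        if len(self.buffer) >= self.max_size or now - self.oldest >= self.max_wait:
            self.flush()

    def flush(self):
        if self.buffer:
            self.process_batch(self.buffer)
            self.buffer = []
```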
> On Mon, Jun 9, 2014 at 12:24 PM, Ted Dunning <[email protected]> wrote:
>
> They are different.
>
> Storm allows right-now processing of tuples. Spark Streaming requires
> micro-batching (which may cover a really short time). Spark Streaming
> allows checkpointing of partial results in the stream, supported by the
> framework; Storm says you should roll your own or use Trident.
>
> Applications that fit one like a glove are likely to bind a bit on the
> other.
>
> On Mon, Jun 9, 2014 at 12:16 PM, Rajiv Onat <[email protected]> wrote:
>
> I'm trying to figure out whether these are competing technologies for
> stream processing or complementary ones. From an initial read, both
> provide a framework for scaling stream processing; Spark has window
> constructs, and Apache Spark's Spark Streaming promises one platform for
> batch, interactive, and stream processing.
>
> Any comments or thoughts?
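[Editor's note: the point above about framework-supported checkpointing of partial results versus rolling your own can be made concrete with a toy running aggregate that periodically snapshots its state, so that after a crash it resumes from the last snapshot instead of from scratch. This is a sketch of the concept only; neither framework's API is used, and all names are illustrative.]

```python
import copy

class CheckpointedCounter:
    """Toy running aggregate that snapshots its partial results every
    `interval` tuples, mimicking framework-supported checkpointing of
    in-stream state. Tuples processed after the last snapshot are lost on
    failure and must be replayed by the source."""

    def __init__(self, interval=100):
        self.interval = interval
        self.counts = {}
        self.seen = 0
        self.checkpoint = ({}, 0)  # (snapshot of counts, tuples seen)

    def update(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        self.seen += 1
        if self.seen % self.interval == 0:
            # Deep-copy so later updates don't mutate the snapshot.
            self.checkpoint = (copy.deepcopy(self.counts), self.seen)

    def recover(self):
        """Restore the last snapshot after a failure; returns how many
        tuples the snapshot covers, so the source knows where to replay."""
        self.counts = copy.deepcopy(self.checkpoint[0])
        self.seen = self.checkpoint[1]
        return self.seen
```

Spark Streaming provides this kind of state checkpointing in the framework; in core Storm the application (or Trident) carries this responsibility, which is the trade-off Ted describes.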
