Regardless of what IBM does, there is clearly a lot that Storm is not doing for performance. This is largely by design, for simplicity. For instance, Storm could use reflection and byte code engineering to merge bolts while still allowing rearrangement and rebalancing. Likewise, when throughput gets high enough that some batching is worthwhile, technologies like empirical schema, column orientation, bit weaving and zero-serialization formats [1,2] could all be used to move records. Together, these would give several orders of magnitude speedup. The fact that these have not yet been worth looking at for Storm users indicates that the virtues and challenges of Storm probably live elsewhere.
The right thing to do is stick to Storm's knitting and make it the best that it can be within its own self-concept. No need to ignore other offerings, but consideration of alternatives should be limited to finding ways to improve Storm. As I see it, the thing driving Storm adoption is reasonably flexible parallel execution of streams with a very simple programming and fail-over model. Performance is not a major issue for most needs, since a million tuples per second is more than most people need and is pretty easy to achieve with Storm.

[1] See Apache Drill for empirical schema, on-the-fly code generation and columnar zero-serialization format

[2] See h2o for magical compression of columnar data with resulting monster speed

On Mon, May 12, 2014 at 12:09 PM, Klausen Schaefersinho <[email protected]> wrote:

> Hi,
>
> my guess is that 40k are per CPU or so... for sure not for an entire
> cluster.
>
>
> On Mon, May 12, 2014 at 4:46 PM, Marc Vaillant <[email protected]> wrote:
>
>> To play devil's advocate, if you believe the stream performance gains,
>> then the 40k will likely pay for itself in needing to deploy a fraction
>> of the resources for the same throughput.
>>
>> On Mon, May 12, 2014 at 09:02:53AM -0400, John Welcher wrote:
>> > Hi
>> >
>> > Streams also cost 40,000 US while Storm is free.
>> >
>> > John
>> >
>> >
>> > On Mon, May 12, 2014 at 3:49 AM, Klausen Schaefersinho <
>> > [email protected]> wrote:
>> >
>> > Hi,
>> >
>> > I found some interesting comparison of IBM Stream and Storm:
>> >
>> > https://www.ibmdw.net/streamsdev/2014/04/22/streams-apache-storm/
>> >
>> > It also includes an interesting comparison between ZeroMQ and the
>> > Netty Performance.
>> >
>> >
>> > Cheers,
>> >
>> > Klaus
