Hi, I'm currently trying out Storm with the Spout/Bolt API and Trident.
I have a 40-node cluster on Google Compute Engine. Each machine has 15 GB of memory, 4 cores and 1 TB of disk.

My test job looks like this:

Source --> bolt.fieldsGrouping --> bolt.fieldsGrouping --> Sink

The Source generates very simple tuples. They contain an id (long), a timestamp, the hostID and some byte[] payload. I'm running the Storm job with a parallelism of 120 on 30 machines.

With plain Storm, I'm getting a throughput of 30,000 elements/second/core. With fault tolerance turned on, Storm achieves 2,500 elements/second/core. I've set setMaxSpoutPending to 1000, because the elements would time out otherwise.

With Trident I'm getting only 500 elements/second/core. I've set the batch size to 5000.

Can you give me advice on improving these numbers? For Trident in particular, I think there is something wrong.

You can find the source of the Storm job here: https://gist.github.com/rmetzger/5d0b1d9553dd8f73ef8c
And Trident: https://gist.github.com/rmetzger/0ad20251d60a73426ebe

Thanks,
Robert
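
For reference, here is a rough back-of-the-envelope sketch of what the per-core rates quoted above would mean in aggregate. It is only an estimate under stated assumptions, not a measurement: it assumes each of the 120 parallel instances maps to one core and sustains the quoted rate, that the max-spout-pending limit applies per spout task, and that the pending window stays full so that Little's law (in-flight = rate * latency) can be applied. The class and method names are hypothetical and not part of the linked gists.

```java
public class ThroughputEstimate {

    // Total rate if every parallel instance sustains the quoted per-core rate.
    // Assumption: one parallel instance per core.
    static long aggregatePerSecond(long perCorePerSecond, int parallelism) {
        return perCorePerSecond * parallelism;
    }

    // Little's law: in-flight = rate * latency, so latency = in-flight / rate.
    // Assumption: the pending window is always full and the limit is per spout task.
    static double impliedLatencySeconds(int maxPending, double perTaskPerSecond) {
        return maxPending / perTaskPerSecond;
    }

    public static void main(String[] args) {
        int parallelism = 120; // parallelism stated in the post

        System.out.println("Storm, no fault tolerance: "
                + aggregatePerSecond(30_000, parallelism) + " elements/s");
        System.out.println("Storm, fault tolerance on: "
                + aggregatePerSecond(2_500, parallelism) + " elements/s");
        System.out.println("Trident:                   "
                + aggregatePerSecond(500, parallelism) + " elements/s");

        // With 1000 tuples pending and 2,500 elements/s per task, the average
        // time a tuple spends in flight would be about 1000 / 2500 seconds.
        System.out.println("Implied in-flight time:    "
                + impliedLatencySeconds(1000, 2_500.0) + " s");
    }
}
```

Under these assumptions the totals come to 3,600,000, 300,000 and 60,000 elements/second, and the fault-tolerant run implies roughly 0.4 seconds in flight per tuple. If that last figure is in the right ballpark, raising the pending limit would only help as long as the extra in-flight tuples do not push latency up toward the tuple timeout.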
